Jason Liu [Tue, 28 May 2019 14:37:59 +0000 (14:37 +0000)]
[XCOFF] Implement parsing symbol table for xcoffobjfile and output as yaml format
Summary:
This patch implements parsing the symbol table for xcoffobjfile and
output as yaml format. Parsing auxiliary entries of a symbol
will be in a separate patch.
The XCOFF object file (aix_xcoff.o) used in the test comes from
-bash-4.2$ cat test.c
extern int i;
extern int TestforXcoff;
int main()
{
i++;
TestforXcoff--;
}
Sanjay Patel [Tue, 28 May 2019 13:54:17 +0000 (13:54 +0000)]
[x86] split 256-bit store of concatenated vectors
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3
But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.
We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. But that is the reason I'm proposing this as
a lowering rather than a combine: we will infinite loop fighting the merge code if we try
this earlier.
David Stenberg [Tue, 28 May 2019 13:23:25 +0000 (13:23 +0000)]
Stop undef fragments from closing non-overlapping fragments
Summary:
When DwarfDebug::buildLocationList() encountered an undef debug value,
it would truncate all open values, regardless of whether they overlapped
or not. This patch makes it do that only for overlapping fragments.
This change unearthed a bug that I had introduced in D57511,
which I have fixed in this patch. The code in DebugHandlerBase that
changes labels for parameter debug values could break DwarfDebug's
assumption that the labels for the entries in the debug value history
are monotonically increasing. Before this patch, that bug could result
in location list entries whose ending address was lower than the
beginning address, and with the changes for undef debug values that this
patch introduces it could trigger an assertion, due to attempting to
emit location list entries with empty ranges. A reproducer for the bug
is added in param-reg-const-mix.mir.
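The overlap rule described above can be modelled with a minimal sketch. It is written in Python rather than LLVM's C++, and it assumes a fragment is just a hypothetical `(offset_in_bits, size_in_bits)` pair. The function names are illustrative and are not taken from DwarfDebug.

```python
def fragments_overlap(a, b):
    """Return True if two (offset_in_bits, size_in_bits) fragments share a bit."""
    a_off, a_size = a
    b_off, b_size = b
    return a_off < b_off + b_size and b_off < a_off + a_size


def close_overlapping(open_fragments, undef_fragment):
    """Model of the fixed behaviour: an undef fragment closes only the
    open fragments it overlaps and leaves the others open."""
    return [f for f in open_fragments
            if not fragments_overlap(f, undef_fragment)]


# Two open 32-bit fragments of a 64-bit variable; the low half becomes undef.
still_open = close_overlapping([(0, 32), (32, 32)], (0, 32))
print(still_open)  # [(32, 32)]
```

Under this model, the behaviour before the patch corresponds to returning an empty list regardless of the undef fragment's position.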
Sanjay Patel [Tue, 28 May 2019 12:58:07 +0000 (12:58 +0000)]
[x86] fix 256-bit vector store splitting to honor 'volatile'
Forking this out of the discussion in D62498
(and assuming that will be committed later, so adding the helper function here).
The LangRef says:
"the backend should never split or merge target-legal volatile load/store instructions."
Hans Wennborg [Tue, 28 May 2019 12:19:38 +0000 (12:19 +0000)]
Re-commit r357452 (take 2): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)"
This was reverted in r360086 as it was suspected of causing mysterious test
failures internally. However, it was never concluded that this patch was the
root cause.
> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936
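The change in the sinking condition quoted above can be written as two predicates. This is a simplified Python model, not the SimplifyCFG code: `num_uses` and `is_store` are hypothetical stand-ins for queries on an instruction.

```python
def old_can_sink(num_uses, is_store):
    """Previous rule: exactly one use, or a store (which has no uses)."""
    return is_store or num_uses == 1


def new_can_sink(num_uses):
    """Rule after the patch: zero or one use, no special case for stores."""
    return num_uses <= 1


# A call whose result is unused (zero uses, not a store):
print(old_can_sink(0, is_store=False))  # False
print(new_can_sink(0))                  # True
```

Stores have zero uses, so they still satisfy the new rule without a special case.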
Yevgeny Rouban [Tue, 28 May 2019 11:33:50 +0000 (11:33 +0000)]
[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst
This patch fixes the CorrelatedValuePropagation pass to keep
prof branch_weights metadata of SwitchInst consistent.
It makes use of SwitchInstProfUpdateWrapper.
New tests are added.
Craig Topper [Tue, 28 May 2019 07:25:27 +0000 (07:25 +0000)]
[InlineCost] Fix a couple comments. NFC
Replace "unary operator" with "unary instruction" in visitUnaryInstruction since
we now have a UnaryOperator class which might need its own visit function.
Fix a copy/paste in visitCastInst that appears to have been copied from
visitPtrToInt.
Don Hinton [Tue, 28 May 2019 06:26:58 +0000 (06:26 +0000)]
[test] Fix plugin tests
Summary:
The following changes were required to fix these tests:
1) Change LLVM_ENABLE_PLUGINS to an option and move it to
llvm/CMakeLists.txt with an appropriate default -- which matches
the original default behavior.
2) Move the plugins directory from clang/test/Analysis to
clang/lib/Analysis. It's not enough to add an exclude to the
lit.local.cfg file because add_lit_testsuites recurses the tree and
automatically adds the appropriate `check-` targets, which don't
make sense for the plugins because they aren't tests and don't
have `RUN` statements.
Here's a list of the `clang-check-analysis*` targets with this
change:
Matt Arsenault [Mon, 27 May 2019 20:37:31 +0000 (20:37 +0000)]
RegAllocFast: Set MayLiveAcrossBlocks when allocating uses
Setting mayLiveOut based only on use instructions after allocating the
def block did not work if the use block was allocated before the def
block, since the virtual register uses were already removed.
Sanjay Patel [Mon, 27 May 2019 20:26:21 +0000 (20:26 +0000)]
[SelectionDAG] fold concat of extract subvectors
This is derived from the related fold for build vectors.
We also have a version of this in DAGCombiner. The benefit of
having this fold at node creation time is (1) efficiency and
(2) preventing infinite looping from creating patterns that
should not exist in the first place.
Currently, the inf-loop could happen with MergeConsecutiveStores()
because it naively creates concat of extracts when forming a wider
vector store. That could fight with target-specific store narrowing.
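A rough model of this kind of fold, in Python rather than SelectionDAG's C++. It assumes each concat operand is described by a hypothetical `(source, start, length)` tuple, and that the fold applies only when the operands reproduce one whole source vector in order. The exact conditions checked by the patch are not stated in the message, so these are assumptions.

```python
def fold_concat_of_extracts(operands, source_lengths):
    """operands: list of (source, start, length) extract descriptors.
    source_lengths: mapping from source name to its element count.
    Returns the source name if the concat rebuilds that whole source
    vector in order, otherwise None (no fold)."""
    if not operands:
        return None
    source = operands[0][0]
    expected_start = 0
    for src, start, length in operands:
        # Every piece must come from the same source and be contiguous.
        if src != source or start != expected_start:
            return None
        expected_start += length
    # The pieces must cover the entire source vector.
    return source if expected_start == source_lengths[source] else None


lengths = {"X": 8, "Y": 8}
print(fold_concat_of_extracts([("X", 0, 4), ("X", 4, 4)], lengths))  # X
print(fold_concat_of_extracts([("X", 4, 4), ("X", 0, 4)], lengths))  # None
```

Doing this check when the node is created means the redundant concat-of-extracts pattern never exists for a later combine to undo.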
Sanjay Patel [Mon, 27 May 2019 18:26:43 +0000 (18:26 +0000)]
[SelectionDAG] fix formatting and redundant comments; NFC
There's a possible missing fold here for extracting from the
same source vector. It's similar to a check that we use to
squash a build vector with all extracted elements from the
same source vector.
Michael Liao [Mon, 27 May 2019 18:26:29 +0000 (18:26 +0000)]
[SelectionDAG] Enhance the simplification of `copyto` from `implicit-def`.
Summary:
- The current implementation simplifies the case where the source of
`copyto` is `implicit-def`ed. However, it only works when that
`implicit-def` is single-used since it detects that from
`implicit-def` and cannot determine which destination vreg should be
used if there are multiple uses.
- This patch changes that detection when `copyto` is being emitted. If
that `copyto`'s source is defined from `implicit-def`, it simplifies
it. Hence, it works even if that `implicit-def` is multi-used.
- Apart from simplifying the internal IR, it won't improve the quality of
code generation. However, it helps to detect `implicit-def` in a
straight-forward manner in some passes, such as `si-i1-copies`. A test
case is added.
Jacques Pienaar [Mon, 27 May 2019 17:38:41 +0000 (17:38 +0000)]
NFC: Change usage of 'DenseSet' to 'DenseSetImpl' in DenseSetImpl::ConstIterator.
Summary:
Change usage of 'DenseSet' to 'DenseSetImpl' in a friend declaration within DenseSetImpl::ConstIterator. 'ConstIterator' was never updated when DenseSet was split into an impl when adding support for DenseSetImpl.
This fixes build errors on MSVC when forward declaring DenseSet as this friend decl does not declare the template arguments as well.
Dmitri Gribenko [Mon, 27 May 2019 17:03:57 +0000 (17:03 +0000)]
Include what you use in AArch64AsmBackend.cpp
AArch64AsmBackend.cpp was not using any APIs from AArch64.h, and was
only including it for transitive dependencies. Doing so is problematic
from an include-what-you-use perspective, but it is also a layering issue
from include-what-you-use perspective, but it is also a layering issue
(it creates a dependency cycle between the primary AArch64 target
library and the MCTargetDesc library).
Simon Pilgrim [Mon, 27 May 2019 16:39:25 +0000 (16:39 +0000)]
[SelectionDAG] GetDemandedBits - add demanded elements wrapper implementation
The DemandedElts variable is pretty much inert at the moment - the original GetDemandedBits implementation calls it with an 'all ones' DemandedElts value so the function is active and behaves exactly as it used to.
LI is a loop-invariant load instruction that post-dominates for.outer, so it should be possible to move LI out of the loop nest. However, there is a bug in allLoopPathsLeadToBlock().
Current algorithm of allLoopPathsLeadToBlock():
1. get all the transitive predecessors of the basic block LI belongs to (for.inner) ==> for.outer, for.inner.latch
2. if any successors of any of the predecessors are not for.inner or for.inner's predecessors, then return false
3. return true
Although for.inner.latch is for.inner's predecessor, for.inner dominates for.inner.latch, which means that if for.inner.latch is ever executed, for.inner must have been executed as well. The function should not return false in cases like this.
Hans Wennborg [Mon, 27 May 2019 09:03:00 +0000 (09:03 +0000)]
Cmake: allow using LLVM_EXTERNAL_PROJECTS with LLVM_ENABLE_PROJECTS
The current code iterates over the combination of LLVM_EXTERNAL_PROJECTS
and LLVM_ENABLE_PROJECTS, but then disables projects that are only in
the former. If a project is in LLVM_EXTERNAL_PROJECTS, it should be
enabled.
Serge Guelton [Mon, 27 May 2019 08:24:06 +0000 (08:24 +0000)]
Make llvm-as --help great again
This is a follow-up to https://reviews.llvm.org/D60411, but for llvm-as.
New output:
OVERVIEW: llvm .ll -> .bc assembler
USAGE: llvm-as [options] <input .llvm file>
OPTIONS:
Generic Options:
-help - Display available options (-help-hidden for more)
-help-list - Display list of available options (-help-list-hidden for more)
-version - Display the version of this program
llvm-as Options:
-data-layout=<layout-string> - data layout string to use
-disable-output - Disable output
-f - Enable binary output on terminals
-module-hash - Emit module hash
-o=<filename> - Override output filename
Nico Weber [Mon, 27 May 2019 00:48:59 +0000 (00:48 +0000)]
llvm-undname: Make demangling of MD5 names more robust
Demangler::parse() for MD5 names would:
1. Put all remaining text into the MD5 name sight unseen
2. Not modify MangledName
This meant that if the demangler recursively called parse() (e.g. in
demangleLocallyScopedNamePiece()), every recursive call that started on
an MD5 name would add all remaining bytes to the output buffer but
only advance the input by a byte. For valid inputs, MD5 types are
never (well, see comments for 2 exceptions) nested, but for invalid
input this could cause memory use quadratic in the input size.
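The quadratic growth can be seen with a small cost model. This is a Python sketch of the worst case only, assuming that every recursive call starts on an MD5 name, copies all remaining input, and advances by one byte. It does not model the real demangler.

```python
def bytes_copied(input_len):
    """Total bytes appended to the output buffer when each call copies
    all remaining input but consumes only a single byte."""
    total = 0
    remaining = input_len
    while remaining > 0:
        total += remaining   # the call copies everything that is left
        remaining -= 1       # but advances the input by just one byte
    return total


# The total is n + (n - 1) + ... + 1 = n * (n + 1) / 2.
print(bytes_copied(4))     # 10
print(bytes_copied(1000))  # 500500
```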
[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven, it is necessary to assign
the correct register classes to the cross-block values beforehand. For divergent targets,
the same value type requires different register classes depending on the value's divergence.
This fixes a problem where back-pressure increases caused by register
dependencies were not correctly notified if execution was also delayed by memory
dependencies.
Andrea Di Biagio [Sun, 26 May 2019 18:41:35 +0000 (18:41 +0000)]
[MCA] Refactor the logic that computes the critical memory dependency info. NFCI
CriticalRegDep has been renamed CriticalDependency, and it is now used by class
Instruction to store information about the critical register dependency and the
critical memory dependency. No functional change intended.
Shawn Landden [Sun, 26 May 2019 13:55:14 +0000 (13:55 +0000)]
[SimplifyCFG] Run ReduceSwitchRange unconditionally, generalize
Rather than gating on "isSwitchDense" (resulting in necessarily
sparse lookup tables even when they were generated), always run
this quite cheap transform.
This transform is useful not just for generating tables.
LowerSwitch also wants this: read LowerSwitch.cpp:257.
Be careful to not generate worse code, by introducing a
SubThreshold heuristic.
Instead of just sorting by signed, generalize the finding of the
best base.
And now that it is run unconditionally, do not replicate its
functionality in SwitchToLookupTable (which could use a Sub
when having a hole is smaller, hence the SubThreshold
heuristic located in a single place).
This simplifies SwitchToLookupTable, and fixes
some ugly corner cases due to the use of signed numbers,
such as a table containing i16 32767 and 32768, of which
32768 would be interpreted as -32768, and now the code thinks
the table is size 65536.
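A worked example of how a signed reading inflates the table span, as a Python sketch. It assumes the table size is `max - min + 1` over the case values, and it uses the adjacent 16-bit patterns 32767 and 32768 as the illustration. The helper names are hypothetical.

```python
def as_signed(value, bits=16):
    """Reinterpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def table_size(cases, signed, bits=16):
    """Span covered by the case values under the chosen interpretation."""
    mask = (1 << bits) - 1
    keys = [as_signed(c, bits) if signed else c & mask for c in cases]
    return max(keys) - min(keys) + 1


print(as_signed(32768))                         # -32768
print(table_size([32767, 32768], signed=True))  # 65536
print(table_size([32767, 32768], signed=False)) # 2
```

Read as unsigned, the two cases are neighbours and the table needs two entries. Read as signed, they sit at opposite ends of the i16 range.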
(We still use unconditional subtraction when building a single-register mask,
but I think this whole block should go when the more general sparse
map is added, which doesn't leave empty holes in the table.)
And the reason test4 and test5 did not trigger was documented wrong:
it was because they were not considered sufficiently "dense".
Also, fix generation of invalid LLVM-IR: shl by bit-width.
Extract method to compute overflow based on binop and signedness,
and then make the result handling code generic. This extends the
always-overflow handling to signed muls, but currently has no effect,
as we don't compute always-overflow for them (thus NFC).
David Green [Sun, 26 May 2019 10:59:21 +0000 (10:59 +0000)]
[ARM] Promote various fp16 math intrinsics
Promote a number of fp16 math intrinsics to float, so that the relevant float
math routines can be used. Copysign is expanded so as to be handled in-place.
Fangrui Song [Sun, 26 May 2019 08:31:00 +0000 (08:31 +0000)]
[PowerPC] Add missing R_PPC_* relocation types
While people mostly care about 64-bit, some systems need basic lib32
support. The plan is to make lld (see PR40888) capable of linking some
applications (PR40888).
Sanjay Patel [Sat, 25 May 2019 15:28:55 +0000 (15:28 +0000)]
[SelectionDAG] define binops as a superset of commutative binops
The test diffs show improved vector narrowing for integer min/max opcodes because
those were all absent from the list. I'm not sure if we can expose functional diffs
for all of the moved/added opcodes though.
It seems like we are missing an AVX512 opportunity to use 256-bit ops in place of
512-bit ops on some tests/targets, but I think that can be a follow-up.
Preliminary steps to make sure the callers are not misusing these queries:
rL361268
rL361547
Sanjay Patel [Sat, 25 May 2019 13:48:07 +0000 (13:48 +0000)]
[SelectionDAG] soften assertion when legalizing narrow vector FP ops
The test based on PR42010:
https://bugs.llvm.org/show_bug.cgi?id=42010
...may show an inaccuracy for PPC's target defs, but we should not
be so aggressive with an assert here. There's no telling what out-of-tree
targets look like.