Matt Arsenault [Tue, 28 May 2019 16:46:02 +0000 (16:46 +0000)]
AMDGPU: Don't enable all lanes with non-CSR VGPR spills
If the only VGPRs used for SGPR spilling were not CSRs, this was
enabling all lanes and immediately restoring exec. This is the usual
situation in leaf functions.
Michael Liao [Tue, 28 May 2019 16:29:39 +0000 (16:29 +0000)]
[AMDGPU] Fix the mis-handling of `vreg_1` copied from scalar register.
Summary:
- Don't treat the use of a scalar register as `vreg_1` as a VGPR usage.
Otherwise, that promotes the scalar register into a vector one, which
breaks the assumption that the scalar register holds the lane mask.
- The issue is triggered in a complicated case, where the uses of that
(lane mask) scalar register are legalized before its definition, e.g.,
due to a mismatch between block placement and its topological order, or
a loop. In those cases, the legalization of PHI introduces the use of
that scalar register as `vreg_1`.
Simon Tatham [Tue, 28 May 2019 16:13:20 +0000 (16:13 +0000)]
[ARM] Replace fp-only-sp and d16 with fp64 and d32.
Those two subtarget features were awkward because their semantics are
reversed: each one indicates the _lack_ of support for something in
the architecture, rather than the presence. As a consequence, you
don't get the behavior you want if you combine two sets of feature
bits.
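As a hedged illustration of why inverted feature bits compose badly (the bit names below are made up for this sketch, not the real subtarget features):
  #include <cstdint>
  #include <cstdio>

  // Hypothetical feature bits; FEAT_NO_D32 encodes the *lack* of d32.
  enum : uint32_t { FEAT_VFP2 = 1u << 0, FEAT_NO_D32 = 1u << 1 };

  int main() {
    uint32_t FullFPU = FEAT_VFP2;                  // supports d32 (bit clear)
    uint32_t Restricted = FEAT_VFP2 | FEAT_NO_D32; // lacks d32
    // Unioning the two sets keeps the "lacks d32" bit, so the combination
    // claims d32 is missing; with a positive FEAT_D32 bit the union would
    // accumulate capabilities as expected.
    uint32_t Combined = FullFPU | Restricted;
    std::printf("combined lacks d32: %s\n",
                (Combined & FEAT_NO_D32) ? "yes" : "no");
    return 0;
  }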
Each SubtargetFeature for an FP architecture version now comes in four
versions, one for each combination of those options. So you can still
say (for example) '+vfp2' in a feature string and it will mean what
it's always meant, but there's a new string '+vfp2d16sp' meaning the
version without those extra options.
A lot of this change is just mechanically replacing positive checks
for the old features with negative checks for the new ones. But one
more interesting change is that I've rearranged getFPUFeatures() so
that the main FPU feature is appended to the output list *before*
rather than after the features derived from the Restriction field, so
that -fp64 and -d32 can override defaults added by the main feature.
Jason Liu [Tue, 28 May 2019 14:37:59 +0000 (14:37 +0000)]
[XCOFF] Implement parsing the symbol table for XCOFFObjectFile and outputting it in YAML format
Summary:
This patch implements parsing the symbol table for XCOFFObjectFile and
outputs it in YAML format. Parsing the auxiliary entries of a symbol
will be done in a separate patch.
The XCOFF object file (aix_xcoff.o) used in the test comes from:
-bash-4.2$ cat test.c
extern int i;
extern int TestforXcoff;
int main()
{
  i++;
  TestforXcoff--;
}
Sanjay Patel [Tue, 28 May 2019 13:54:17 +0000 (13:54 +0000)]
[x86] split 256-bit store of concatenated vectors
This shows up as a side issue to the main problem for the AVX target example from PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3
But as we can see in the pile of existing test diffs, it's actually a widespread problem
that affects any AVX or later target. Apart from a couple of oddballs, I think these are
all improvements for the reasons stated in the code comment: we do not want to enable YMM
unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit
stores anyway.
We could say that MergeConsecutiveStores() is going overboard on some of these examples,
but that won't solve the problem completely. It is also the reason I'm proposing this as
a lowering rather than a combine: we would infinite-loop fighting the merge code if we
tried this earlier.
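For reference, a hedged C++ sketch (not taken from the patch or its tests) of the source pattern this lowering targets: a 256-bit store whose value is just two 128-bit halves concatenated together.
  #include <immintrin.h>

  // A 256-bit store fed only by a concat of two 128-bit values. With this
  // change, an AVX target may emit it as two 16-byte stores instead of
  // touching a YMM register.
  void store_concat(float *dst, __m128 lo, __m128 hi) {
    __m256 v = _mm256_set_m128(hi, lo); // high half first, low half second
    _mm256_storeu_ps(dst, v);
  }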
David Stenberg [Tue, 28 May 2019 13:23:25 +0000 (13:23 +0000)]
Stop undef fragments from closing non-overlapping fragments
Summary:
When DwarfDebug::buildLocationList() encountered an undef debug value,
it would truncate all open values, regardless of whether they overlapped
or not. This patch fixes that so it is only done for overlapping fragments.
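A minimal sketch of the kind of overlap test this relies on (illustrative only; it assumes a fragment is described by a bit offset and size within the variable):
  #include <cstdint>

  struct Fragment {
    uint64_t OffsetInBits;
    uint64_t SizeInBits;
  };

  // Two fragments overlap iff neither ends at or before the other begins.
  bool fragmentsOverlap(const Fragment &A, const Fragment &B) {
    return A.OffsetInBits < B.OffsetInBits + B.SizeInBits &&
           B.OffsetInBits < A.OffsetInBits + A.SizeInBits;
  }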
This change unearthed a bug that I had introduced in D57511,
which I have fixed in this patch. The code in DebugHandlerBase that
changes labels for parameter debug values could break DwarfDebug's
assumption that the labels for the entries in the debug value history
are monotonically increasing. Before this patch, that bug could result
in location list entries whose ending address was lower than the
beginning address, and with the changes for undef debug values that this
patch introduces it could trigger an assertion, due to attempting to
emit location list entries with empty ranges. A reproducer for the bug
is added in param-reg-const-mix.mir.
Sanjay Patel [Tue, 28 May 2019 12:58:07 +0000 (12:58 +0000)]
[x86] fix 256-bit vector store splitting to honor 'volatile'
Forking this out of the discussion in D62498
(and assuming that will be committed later, so adding the helper function here).
The LangRef says:
"the backend should never split or merge target-legal volatile load/store instructions."
Hans Wennborg [Tue, 28 May 2019 12:19:38 +0000 (12:19 +0000)]
Re-commit r357452 (take 2): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)"
This was reverted in r360086 as it was suspected of causing mysterious test
failures internally. However, it was never concluded that this patch was the
root cause.
> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936
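A hedged C++ illustration (made-up function names) of what the relaxed check allows: both predecessors of the join point call the same function and ignore the result, so the calls can now be sunk into the common successor.
  // Stand-in definition so the sketch is self-contained; in real code this
  // would be some function whose result the callers ignore.
  int log_event(int kind) { return kind; }

  void notify(bool error) {
    if (error)
      log_event(1); // zero uses of the result: now a sinking candidate
    else
      log_event(2); // after sinking, the argument becomes a phi of 1 and 2
  }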
Yevgeny Rouban [Tue, 28 May 2019 11:33:50 +0000 (11:33 +0000)]
[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst
This patch fixes the CorrelatedValuePropagation pass to keep
prof branch_weights metadata of SwitchInst consistent.
It makes use of SwitchInstProfUpdateWrapper.
New tests are added.
Craig Topper [Tue, 28 May 2019 07:25:27 +0000 (07:25 +0000)]
[InlineCost] Fix a couple comments. NFC
Replace "unary operator" with "unary instruction" in visitUnaryInstruction since
we now have a UnaryOperator class which might need its own visit function.
Fix a comment in visitCastInst that appears to have been copy/pasted from
visitPtrToInt.
Don Hinton [Tue, 28 May 2019 06:26:58 +0000 (06:26 +0000)]
[test] Fix plugin tests
Summary:
The following changes were required to fix these tests:
1) Change LLVM_ENABLE_PLUGINS to an option and move it to
llvm/CMakeLists.txt with an appropriate default -- which matches
the original default behavior.
2) Move the plugins directory from clang/test/Analysis to
clang/lib/Analysis. It's not enough to add an exclude to the
lit.local.cfg file because add_lit_testsuites recurses the tree and
automatically adds the appropriate `check-` targets, which don't
make sense for the plugins because they aren't tests and don't
have `RUN` statements.
Here's a list of the `clang-check-analysis*` targets with this
change:
Matt Arsenault [Mon, 27 May 2019 20:37:31 +0000 (20:37 +0000)]
RegAllocFast: Set MayLiveAcrossBlocks when allocating uses
Setting mayLiveOut based only on use instructions after allocating the
def block did not work if the use block was allocated before the def
block, since the virtual register uses were already removed.
Sanjay Patel [Mon, 27 May 2019 20:26:21 +0000 (20:26 +0000)]
[SelectionDAG] fold concat of extract subvectors
This is derived from the related fold for build vectors.
We also have a version of this in DAGCombiner. The benefit of
having this fold at node creation time is (1) efficiency and
(2) preventing infinite looping from creating patterns that
should not exist in the first place.
Currently, the inf-loop could happen with MergeConsecutiveStores()
because it naively creates concat of extracts when forming a wider
vector store. That could fight with target-specific store narrowing.
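As a hedged intrinsic-level illustration of the pattern (not the SelectionDAG code itself): extracting both halves of a vector and concatenating them back is just the original vector, so the concat node never needs to be created.
  #include <immintrin.h>

  // concat(extract(v, 0), extract(v, 1)) == v for a 256-bit vector split
  // into its two 128-bit subvectors.
  __m256 concat_of_extracts(__m256 v) {
    __m128 lo = _mm256_castps256_ps128(v);   // extract subvector at index 0
    __m128 hi = _mm256_extractf128_ps(v, 1); // extract subvector at index 1
    return _mm256_set_m128(hi, lo);          // folds back to v
  }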
Sanjay Patel [Mon, 27 May 2019 18:26:43 +0000 (18:26 +0000)]
[SelectionDAG] fix formatting and redundant comments; NFC
There's a possible missing fold here for extracting from the
same source vector. It's similar to a check that we use to
squash a build vector with all extracted elements from the
same source vector.
Michael Liao [Mon, 27 May 2019 18:26:29 +0000 (18:26 +0000)]
[SelectionDAG] Enhance the simplification of `copyto` from `implicit-def`.
Summary:
- The current implementation simplifies the case where the source of
`copyto` is `implicit-def`ed. However, it only works when that
`implicit-def` is single-used since it detects that from
`implicit-def` and cannot determine which destination vreg should be
used if there are multiple uses.
- This patch changes that detection to happen when the `copyto` is being
emitted. If that `copyto`'s source is defined by an `implicit-def`, it is
simplified. Hence, it works even when that `implicit-def` has multiple uses.
- Apart from simplifying the internal IR, this does not improve the quality
of code generation. However, it helps some passes, such as `si-i1-copies`,
to detect `implicit-def` in a straightforward manner. A test case is added.
Jacques Pienaar [Mon, 27 May 2019 17:38:41 +0000 (17:38 +0000)]
NFC: Change usage of 'DenseSet' to 'DenseSetImpl' in DenseSetImpl::ConstIterator.
Summary:
Change usage of 'DenseSet' to 'DenseSetImpl' in a friend declaration within DenseSetImpl::ConstIterator. 'ConstIterator' was never updated when DenseSet was split into an impl class to add support for DenseSetImpl.
This fixes build errors on MSVC when forward declaring DenseSet, since that friend declaration does not spell out the template arguments.
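A simplified sketch of the friendship pattern after the change (illustrative names, not the real ADT code): the nested iterator befriends the Impl class it lives in, so nothing about the derived wrapper's template arguments needs to be visible.
  template <typename DerivedT, typename T> class SetImpl {
  public:
    class ConstIterator {
      // Befriend the Impl (with its template arguments spelled out) rather
      // than the derived wrapper, so a forward declaration of the wrapper
      // elsewhere is never needed to resolve this friendship.
      friend class SetImpl<DerivedT, T>;
      const T *Ptr = nullptr;
    public:
      explicit ConstIterator(const T *P) : Ptr(P) {}
      const T &operator*() const { return *Ptr; }
    };
  };

  template <typename T> class Set : public SetImpl<Set<T>, T> {};

  int main() {
    int X = 42;
    SetImpl<Set<int>, int>::ConstIterator It(&X);
    return (*It == 42) ? 0 : 1;
  }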
Dmitri Gribenko [Mon, 27 May 2019 17:03:57 +0000 (17:03 +0000)]
Include what you use in AArch64AsmBackend.cpp
AArch64AsmBackend.cpp was not using any APIs from AArch64.h, and was
only including it for transitive dependencies. Doing so is problematic
from include-what-you-use perspective, but it is also a layering issue
(it creates a dependency cycle between the primary AArch64 target
library and the MCTargetDesc library).
Simon Pilgrim [Mon, 27 May 2019 16:39:25 +0000 (16:39 +0000)]
[SelectionDAG] GetDemandedBits - add demanded elements wrapper implementation
The DemandedElts argument is pretty much inert at the moment: the original GetDemandedBits implementation calls the new wrapper with an 'all ones' DemandedElts value, so the function is active and behaves exactly as it used to.
LI is a loop-invariant load instruction that post-dominates for.outer, so LI
should be able to move out of the loop nest. However, there is a bug in
allLoopPathsLeadToBlock().
The current algorithm of allLoopPathsLeadToBlock() is:
1. Get all the transitive predecessors of the basic block LI belongs to
(for.inner) ==> for.outer, for.inner.latch
2. If any successor of any of those predecessors is neither for.inner nor one
of for.inner's predecessors, return false
3. Return true
Although for.inner.latch is a predecessor of for.inner, for.inner dominates
for.inner.latch, which means that if for.inner.latch is ever executed,
for.inner must be as well. The function should not return false for cases
like this.
Hans Wennborg [Mon, 27 May 2019 09:03:00 +0000 (09:03 +0000)]
Cmake: allow using LLVM_EXTERNAL_PROJECTS with LLVM_ENABLE_PROJECTS
The current code iterates over the combination of LLVM_EXTERNAL_PROJECTS
and LLVM_ENABLE_PROJECTS, but then disables projects that are only in
the former. If a project is in LLVM_EXTERNAL_PROJECTS, it should be
enabled.
Serge Guelton [Mon, 27 May 2019 08:24:06 +0000 (08:24 +0000)]
Make llvm-as --help great again
This is a follow-up to https://reviews.llvm.org/D60411, but for llvm-as.
New output:
OVERVIEW: llvm .ll -> .bc assembler
USAGE: llvm-as [options] <input .llvm file>
OPTIONS:
Generic Options:
  -help      - Display available options (-help-hidden for more)
  -help-list - Display list of available options (-help-list-hidden for more)
  -version   - Display the version of this program
llvm-as Options:
  -data-layout=<layout-string> - data layout string to use
  -disable-output              - Disable output
  -f                           - Enable binary output on terminals
  -module-hash                 - Emit module hash
  -o=<filename>                - Override output filename
Nico Weber [Mon, 27 May 2019 00:48:59 +0000 (00:48 +0000)]
llvm-undname: Make demangling of MD5 names more robust
Demangler::parse() for MD5 names would:
1. Put all remaining text into the MD5 name sight unseen
2. Not modify MangledName
This meant that if the demangler recursively called parse() (e.g. in
demangleLocallyScopedNamePiece()), every recursive call that started on
an MD5 name would add all remaining bytes to the output buffer but
only advance the input by a byte. For valid inputs, MD5 types are
never (well, see comments for 2 exceptions) nested, but for invalid
input this could cause memory use quadratic in the input size.
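A small standalone sketch of why that behaviour is quadratic (this only models the growth, it is not the demangler): if every step copies everything that remains and then consumes a single input character, n input bytes produce n + (n-1) + ... + 1 output bytes.
  #include <cstdio>
  #include <string>

  int main() {
    std::string Input(1000, 'A'); // stand-in for a malformed mangled name
    std::string Output;
    for (size_t Pos = 0; Pos < Input.size(); ++Pos)
      Output += Input.substr(Pos); // "put all remaining text into the name"
    std::printf("input %zu bytes -> output %zu bytes\n",
                Input.size(), Output.size());
    return 0;
  }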
[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven, it is necessary to assign
the correct register classes to the cross-block values beforehand. For divergent targets,
the same value type requires different register classes depending on the divergence of the value.
This fixes a problem where back-pressure increases caused by register
dependencies were not correctly notified if execution was also delayed by memory
dependencies.
Andrea Di Biagio [Sun, 26 May 2019 18:41:35 +0000 (18:41 +0000)]
[MCA] Refactor the logic that computes the critical memory dependency info. NFCI
CriticalRegDep has been renamed CriticalDependency, and it is now used by class
Instruction to store information about the critical register dependency and the
critical memory dependency. No functional change intended.
Shawn Landden [Sun, 26 May 2019 13:55:14 +0000 (13:55 +0000)]
[SimplifyCFG] Run ReduceSwitchRange unconditionally, generalize
Rather than gating on "isSwitchDense" (resulting in necessarily
sparse lookup tables even when they were generated), always run
this quite cheap transform.
This transform is useful not just for generating tables.
LowerSwitch also wants this: read LowerSwitch.cpp:257.
To avoid generating worse code, introduce a
SubThreshold heuristic.
Instead of just sorting by signed value, generalize the finding of the
best base.
And now that it is run unconditionally, do not replicate its
functionality in SwitchToLookupTable (which could use a Sub
when having a hole is smaller, hence the SubThreshold
heuristic located in a single place).
This simplifies SwitchToLookupTable, and fixes
some ugly corner cases due to the use of signed numbers,
such as a table containing i16 32767 and 32768, of which
32768 would be interpreted as -32768, making the code think
the table has size 65536.
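A small C++ illustration of that signed-interpretation hazard (numbers chosen for the example): two case values that are adjacent as unsigned 16-bit integers span almost the whole range once one of them is reinterpreted as signed.
  #include <cstdint>
  #include <cstdio>

  int main() {
    uint16_t Cases[] = {32767, 32768};  // dense as unsigned: span of 2
    int16_t Lo = (int16_t)Cases[1];     // 32768 wraps to -32768
    int16_t Hi = (int16_t)Cases[0];     // stays 32767
    std::printf("unsigned span: %u\n", (unsigned)(Cases[1] - Cases[0]) + 1);
    std::printf("signed   span: %u\n", (unsigned)((int)Hi - (int)Lo) + 1);
    return 0;
  }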
(We still use unconditional subtraction when building a single-register mask,
but I think this whole block should go when the more general sparse
map is added, which doesn't leave empty holes in the table.)
And the reason test4 and test5 did not trigger was documented incorrectly:
it was because they were not considered sufficiently "dense".
Also, fix generation of invalid LLVM-IR: shl by bit-width.
Extract method to compute overflow based on binop and signedness,
and then make the result handling code generic. This extends the
always-overflow handling to signed muls, but has currently no effect,
as we don't compute always overflow for them (thus NFC).
David Green [Sun, 26 May 2019 10:59:21 +0000 (10:59 +0000)]
[ARM] Promote various fp16 math intrinsics
Promote a number of fp16 math intrinsics to float, so that the relevant float
math routines can be used. Copysign is expanded so as to be handled in-place.
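A hedged sketch of what the promotion amounts to at the source level (assuming a target and compiler where the _Float16 type is available; this is not the backend code): the operation is computed in float via the float libm routine and the result is narrowed back to half.
  #include <cmath>
  #include <cstdio>

  // Half-precision sine computed by promoting to float, calling the float
  // math routine, and truncating the result back to half.
  _Float16 half_sin(_Float16 x) {
    return static_cast<_Float16>(std::sin(static_cast<float>(x)));
  }

  int main() {
    std::printf("%f\n",
                static_cast<double>(half_sin(static_cast<_Float16>(1.0f))));
    return 0;
  }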
Fangrui Song [Sun, 26 May 2019 08:31:00 +0000 (08:31 +0000)]
[PowerPC] Add missing R_PPC_* relocation types
While people mostly care about 64-bit, some systems need basic lib32
support. The plan is to make lld capable of linking some lib32
applications (see PR40888).