Hsiangkai Wang [Tue, 31 Jul 2018 16:22:09 +0000 (16:22 +0000)]
[DebugInfo] Fix build failed in 'clang-cmake-armv8-full'.
Builder clang-cmake-armv8-full failed because the assembly comment
marker is not '#' on that target. So, I use CHECK-SAME to avoid checking
the comment marker on the same line in the test case.
Jakub Kuderski [Tue, 31 Jul 2018 15:53:10 +0000 (15:53 +0000)]
[Dominators] Make slow walks shorter
Summary:
When DFS numbers are not yet calculated for a dominator tree, we have to walk it up to say whether one node dominates some other.
This patch makes the slow walks shorter by only walking until the level of the node we check against is reached. This is because a node cannot possibly dominate something higher in its tree.
When running opt with -O3, the patch results in:
* 25% fewer loop iterations for `opt` (fullLTO)
* 30% fewer loop iterations for sqlite
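The bounded walk described above can be sketched as follows. This is a minimal Python model, not the LLVM implementation: the `Node` class, `make_child`, and `dominates_slow` are hypothetical names, and the returned step count is only there to make the shortened walk visible.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical dominator tree node (not an LLVM type)."""
    name: str
    idom: Optional["Node"] = None  # immediate dominator, i.e. the tree parent
    level: int = 0                 # depth in the dominator tree; the root is 0


def make_child(name: str, parent: Node) -> Node:
    return Node(name, parent, parent.level + 1)


def dominates_slow(a: Node, b: Node) -> Tuple[bool, int]:
    """Return (does a dominate b, number of upward steps taken)."""
    steps = 0
    # Climb from b only while we are still strictly below a's level.
    # A node cannot dominate anything higher in the tree than itself.
    while b is not None and b.level > a.level:
        b = b.idom
        steps += 1
    return b is a, steps


root = Node("root")
a = make_child("a", root)
b = make_child("b", a)
c = make_child("c", b)
d = make_child("d", root)
```

With this tree, `dominates_slow(d, c)` stops after two steps at level 1 instead of walking all the way to the root, and `dominates_slow(c, a)` answers without walking at all.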
Work around a bug where the InstCombine pass was asserting on the IR added in the
lit test, where we have a bitcast instruction after a GEP from an addrspace cast.
The second bitcast in the test was getting combined into
`bitcast <16 x i32>* %0 to <16 x i32> addrspace(3)*`, which looks like it should
be an addrspace cast instruction instead. Otherwise if control flow is allowed
to continue as it is now we create a GEP instruction
`<badref> = getelementptr inbounds <16 x i32>, <16 x i32>* %0, i32 0`. However
because the type of this instruction doesn't match the address space we hit an
assert when replacing the bitcast with that GEP.
```
void llvm::Value::doRAUW(llvm::Value*, bool): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
```
We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.
The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.
We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.
It also generates label debug information under global isel.
David Bolvansky [Tue, 31 Jul 2018 14:25:24 +0000 (14:25 +0000)]
Enrich inline messages
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark printing and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
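The idea in point 2, a result that behaves like a boolean but carries the reason for a negative decision, can be sketched like this. It is a Python model under stated assumptions: the class layout, the `try_inline` helper, its parameters, and the message strings are all hypothetical and do not reproduce the actual LLVM interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InlineResult:
    """Hypothetical stand-in: success, or failure with a reason."""
    message: Optional[str] = None  # None means the inline decision is positive

    def __bool__(self) -> bool:
        return self.message is None


def try_inline(callee_is_noinline: bool, cost: int, threshold: int) -> InlineResult:
    # Each negative decision must name its cause (hypothetical messages).
    if callee_is_noinline:
        return InlineResult("noinline function attribute")
    if cost > threshold:
        return InlineResult(f"cost={cost} exceeds threshold={threshold}")
    return InlineResult()
```

Callers that only need a yes/no answer can keep writing `if try_inline(...)`, while remark and debug printing can read `message` for the cause.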
Andrea Di Biagio [Tue, 31 Jul 2018 14:23:49 +0000 (14:23 +0000)]
[llvm-mca] Remove README.txt
A detailed description of the tool has been recently added by Matt to
CommandGuide/llvm-mca.rst. File README.txt is now redundant and can be removed;
all the relevant user-guide information has been improved and then moved to
llvm-mca.rst.
In the future, we should add another .rst for the "llvm-mca developer manual" to
provide information about:
- llvm-mca internals.
- How to add custom stages to the simulated pipeline.
- How to provide extra processor info in the scheduling model to improve the
analysis performed by llvm-mca.
John Brawn [Tue, 31 Jul 2018 14:19:29 +0000 (14:19 +0000)]
[MemDep] Use PhiValuesAnalysis to improve alias analysis results
This is being done in order to make GVN able to better optimize certain inputs.
MemDep doesn't use PhiValues directly, but does need to notify it when things
get invalidated.
[SLP] Fix PR38339: Instruction does not dominate all uses!
Summary:
If the ExtractElement instructions can be optimized out during the
vectorization and we need to reshuffle the parent vector, this
ShuffleInstruction may be inserted in the wrong place, causing the compiler
to produce incorrect code.
Matt Arsenault [Tue, 31 Jul 2018 13:34:31 +0000 (13:34 +0000)]
AMDGPU: Fold undef fcanonicalize to qNaN
We could choose a free 0 for this, but this
matches the behavior for fmul undef, 1.0. Also,
the NaN use is more useful for folding use operations
although if it's not eliminated it is more expensive
in terms of code size.
Peter Smith [Tue, 31 Jul 2018 13:24:49 +0000 (13:24 +0000)]
[ARM] Complete enumeration values for Tag_ABI_VFP_args
The LLD implementation of Tag_ABI_VFP_args needs to check the rarely seen
values of 3 (toolchain specific) and 4 (compatible with both Base and VFP).
Add the missing enumeration values so that LLD can refer to them without
having to use the raw numbers.
Andrea Di Biagio [Tue, 31 Jul 2018 13:21:43 +0000 (13:21 +0000)]
[llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms.
This patch teaches llvm-mca how to identify dependency breaking instructions on
btver2.
An example of dependency breaking instructions is the zero-idiom XOR (example:
`XOR %eax, %eax`), which always generates zero regardless of the actual value of
the input register operands.
Dependency breaking instructions don't have to wait on their input register
operands before executing. This is because the computation is not dependent on
the inputs.
Not all dependency breaking idioms are also zero-latency instructions. For
example, `CMPEQ %xmm1, %xmm1` is independent of
the value of XMM1, and it generates a vector of all-ones.
That instruction is not eliminated at register renaming stage, and its opcode is
issued to a pipeline for execution. So, the latency is not zero.
This patch adds a new method named isDependencyBreaking() to the MCInstrAnalysis
interface. That method takes as input an instruction (i.e. MCInst) and a
MCSubtargetInfo.
The default implementation of isDependencyBreaking() conservatively returns
false for all instructions. Targets may override the default behavior for
specific CPUs, and return a value which better matches the subtarget behavior.
In the future, we should teach TableGen how to automatically generate the body of
isDependencyBreaking from scheduling predicate definitions. This would allow us
to expose the knowledge about dependency breaking instructions to the machine
schedulers (and, potentially, other codegen passes).
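The default-plus-override structure described above can be sketched as a small Python model. All names here (`Inst`, `InstrAnalysis`, `ToyBtVer2Analysis`) are hypothetical, the model omits the subtarget argument, and the idiom table only contains the two examples named in this message.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Inst:
    """Hypothetical instruction: an opcode plus its register source operands."""
    opcode: str
    srcs: Tuple[str, ...]


class InstrAnalysis:
    """Models the conservative default: nothing is dependency breaking."""

    def is_dependency_breaking(self, inst: Inst) -> bool:
        return False


class ToyBtVer2Analysis(InstrAnalysis):
    """Hypothetical per-CPU override for a few same-register idioms."""

    # Opcodes whose result does not depend on the inputs when both
    # source registers are the same (the examples from the text only).
    SAME_REG_IDIOMS = {"XOR", "CMPEQ"}

    def is_dependency_breaking(self, inst: Inst) -> bool:
        return (
            inst.opcode in self.SAME_REG_IDIOMS
            and len(inst.srcs) == 2
            and inst.srcs[0] == inst.srcs[1]
        )
```

Note that this predicate says nothing about latency: in the text above, the `CMPEQ` idiom breaks the dependency but is still issued to a pipeline.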
Peter Smith [Tue, 31 Jul 2018 13:03:54 +0000 (13:03 +0000)]
[ELF][ARM] Add Arm ABI names for float ABI ELF Header flags
The ELF for the Arm architecture document defines, for EF_ARM_EABI_VER5 and
above, the flags EF_ARM_ABI_FLOAT_HARD and EF_ARM_ABI_FLOAT_SOFT. These
have been defined to be compatible with the existing EF_ARM_VFP_FLOAT and
EF_ARM_SOFT_FLOAT used by gcc for EF_ARM_EABI_UNKNOWN.
This patch adds the flags in addition to the existing ones so that any code
depending on the old names will still work.
Jonas Paulsson [Tue, 31 Jul 2018 13:00:42 +0000 (13:00 +0000)]
[SystemZ] Improve decoding in case of instructions with four register operands.
Since z13, the max group size will be 2 if any μop has more than 3 register
sources.
This has been ignored so far in the SystemZHazardRecognizer, but is now
handled by recognizing those instructions and adjusting the tracking of
decoding and the cost heuristic for grouping.
[InstCombine] simplify code for A & (A ^ B) --> A & ~B
This fold was written in an odd way and tried to avoid
an endless loop by bailing out on all constants instead
of the supposedly problematic case of -1. But (X & -1)
should always be simplified before we reach here, so I'm
not sure how that is a problem.
There were no tests for the commuted patterns, so I added
those at rL338364.
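As a sanity check of the identity behind this fold, including the commuted patterns, the following Python sketch compares both sides exhaustively on 4-bit values. The helper names are hypothetical and the bit width is an arbitrary choice for the check; the identity is bitwise, so it does not depend on the width.

```python
from itertools import product

BITS = 4                  # small width chosen only to keep the check exhaustive
MASK = (1 << BITS) - 1


def folded(a: int, b: int) -> int:
    """A & ~B, with the complement truncated to BITS bits."""
    return a & (~b & MASK)


def all_patterns(a: int, b: int):
    """The original form and its three commuted variants."""
    return (
        a & (a ^ b),
        a & (b ^ a),
        (a ^ b) & a,
        (b ^ a) & a,
    )


def fold_holds_everywhere() -> bool:
    return all(
        pattern == folded(a, b)
        for a, b in product(range(1 << BITS), repeat=2)
        for pattern in all_patterns(a, b)
    )
```

The check includes B = -1 (all ones), the case the old code was trying to avoid.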
Martin Storsjo [Tue, 31 Jul 2018 09:27:01 +0000 (09:27 +0000)]
[ARM] Support the .inst directive for MachO and COFF targets
Contrary to ELF, we don't add any markers that distinguish data generated
with .short/.long from normal instructions, so the .inst directive only
adds compatibility with assembly that uses it.
Martin Storsjo [Tue, 31 Jul 2018 09:26:52 +0000 (09:26 +0000)]
[AArch64] Support the .inst directive for MachO and COFF targets
Contrary to ELF, we don't add any markers that distinguish data generated
with .long from normal instructions, so the .inst directive only adds
compatibility with assembly that uses it.
Diego Caballero [Tue, 31 Jul 2018 01:57:29 +0000 (01:57 +0000)]
[VPlan] Introduce VPLoopInfo analysis.
The patch introduces loop analysis (VPLoopInfo/VPLoop) for VPBlockBases.
This analysis will be necessary to perform some H-CFG transformations and
detect and introduce regions representing a loop in the H-CFG.
[X86] Stop accidentally running the Bonnell LEA fixup path on Goldmont.
In one place we checked X86Subtarget.slowLEA() to decide if the pass should run. But to decide what the pass should do we only checked isSLM. This resulted in Goldmont going down the Bonnell path.
[MS Demangler] Better demangling of template arguments.
This patch fixes demangling of template aliases as template-template
arguments, and also fixes function pointers and references as
non-type template parameters. All of these can be properly
demangled now, so I've ported over the test
clang/test/CodeGenCXX/ms-template-callbacks.cpp. All of these
tests pass.
[DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to BuildSDIV/BuildUDIV/etc.
The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation.
This patch adds support for demangling r-value references, new
operators such as the ""_foo operator, lambdas, alias types,
nullptr_t, and various other C++11'isms.
There is 1 failing test remaining in this file, which appears to
be related to back-referencing. This type of problem has the
potential to get ugly so I'd rather fix it in a separate patch.
Summary:
This patch mostly copies the existing Instruction Flow, and stage descriptions
from the mca README. I made a few text tweaks, but no semantic changes,
and made reference to the "default pipeline." I also removed the internals
references (e.g., reference to class names and header files). I did leave the
LSUnit name around, but only as an abbreviated word for the load-store unit.
[DAGCombiner] transform sub-of-shifted-signbit to add
This is exchanging a sub-of-1 with add-of-minus-1:
https://rise4fun.com/Alive/plKAH
This is another step towards improving select-of-constants codegen (see D48970).
x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral.
I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but
I think canonicalizing to 'add' is more likely to produce further transforms because we have more
folds for 'add'.
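The "sub-of-1 versus add-of-minus-1" equivalence can be checked with a short Python sketch. It assumes the fold has the shape `X - (Y >>u (BW-1))` to `X + (Y >>s (BW-1))`, which is my reading of the title rather than a quote from the patch; the helper names and the 8-bit width are hypothetical choices that keep the check exhaustive.

```python
BITS = 8                  # small width chosen only to keep the check exhaustive
MASK = (1 << BITS) - 1


def lshr(y: int, n: int) -> int:
    """Logical shift right on a BITS-wide value."""
    return (y & MASK) >> n


def ashr(y: int, n: int) -> int:
    """Arithmetic shift right on a BITS-wide value."""
    y &= MASK
    signed = y - (1 << BITS) if y & (1 << (BITS - 1)) else y
    return (signed >> n) & MASK


def sub_form(x: int, y: int) -> int:
    # The shifted sign bit is 0 or 1, so this subtracts 0 or 1.
    return (x - lshr(y, BITS - 1)) & MASK


def add_form(x: int, y: int) -> int:
    # The smeared sign bit is 0 or -1, so this adds 0 or -1.
    return (x + ashr(y, BITS - 1)) & MASK


def forms_agree_everywhere() -> bool:
    return all(
        sub_form(x, y) == add_form(x, y)
        for x in range(1 << BITS)
        for y in range(1 << BITS)
    )
```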
Diego Caballero [Mon, 30 Jul 2018 21:33:31 +0000 (21:33 +0000)]
[VPlan] Introduce VPlan-based dominator analysis.
The patch introduces dominator analysis for VPBlockBases and extends
VPlan's GraphTraits specialization with the required interfaces. Dominator
analysis will be necessary to perform some H-CFG transformations and
to introduce VPLoopInfo (LoopInfo analysis on top of the VPlan representation).
This fixes a crash when a second pass is required for the Codeview Type merging *and* the index points outside of the table (which should lead to an error being printed).
This occurs currently until MS precompiled headers .obj is added (see D45213)
Lang Hames [Mon, 30 Jul 2018 21:08:06 +0000 (21:08 +0000)]
[ORC] Add SerializationTraits for std::set and std::map.
Also, make SerializationTraits for pairs forward the actual pair
template type arguments to the underlying serializer. This allows, for example,
std::pair<StringRef, bool> to be passed as an argument to an RPC call expecting
a std::pair<std::string, bool>, since there is an underlying serializer from
StringRef to std::string that can be used.
Revert r338222 "[DAGCombiner] Remove unnecessary calls to AddToWorklist."
Thinking about it more it might be possible for the later nodes to be folded in getNode in such a way that the other created nodes are left dead. This can cause use counts to be incorrect on nodes that aren't dead.
[Inline] Copy "null-pointer-is-valid" attribute in caller.
Summary:
Normally, inlining does not happen if the caller does not have the
"null-pointer-is-valid"="true" attribute but the callee has it.
However, alwaysinline may force the callee to be inlined.
In this case, if the callee has the "null-pointer-is-valid"="true"
attribute, copy the attribute to the caller.
[MachineOutliner][AArch64] Add support for saving LR to a register
This teaches the outliner to save LR to a register rather than the stack when
possible. This allows us to avoid bumping the stack in outlined functions in
some cases. By doing this, in a later patch, we can teach the outliner to do
something like this:
f1:
  ...
  bl OUTLINED_FUNCTION
  ...
f2:
  ...
  move LR's contents to a register
  bl OUTLINED_FUNCTION
  move the register's contents back
instead of falling back to saving LR in both cases.
Add machine verifier to arm64-opt-remarks-lazy-bfi
Previously, I thought this was a Windows failure. Then I realized it failed on
every bot that used the verifier. This makes it use the verifier always, and
adds that pass to the pipeline checks so that it's consistent across all bots.
David Bolvansky [Mon, 30 Jul 2018 16:50:00 +0000 (16:50 +0000)]
[DAGCombiner] Bug 31275- Extract a shift from a constant mul or udiv if a rotate can be formed
Summary:
Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed. This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op.
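To make the targeted shape concrete, here is a Python sketch on 32-bit values. The constants (multiply or divide by 3, rotate by 4) and all function names are hypothetical examples chosen for illustration; they are not taken from the patch or its tests.

```python
MASK = 0xFFFFFFFF  # model 32-bit unsigned arithmetic


def rotl(v: int, n: int) -> int:
    v &= MASK
    return ((v << n) | (v >> (32 - n))) & MASK


def rotr(v: int, n: int) -> int:
    v &= MASK
    return ((v >> n) | (v << (32 - n))) & MASK


def merged_mul_form(x: int) -> int:
    # What is left after the shl by 4 was merged into the mul:
    # (x * 48) | ((x * 3) >> 28)
    return ((x * 48) & MASK) | (((x * 3) & MASK) >> 28)


def rotate_of_mul(x: int) -> int:
    # Extracting the shl again (x * 48 == (x * 3) << 4) exposes a rotate.
    return rotl(x * 3, 4)


def merged_udiv_form(x: int) -> int:
    # The same idea for udiv: ((x / 3) << 28) | (x / 48)
    return (((x // 3) << 28) & MASK) | (x // 48)


def rotate_of_udiv(x: int) -> int:
    # x / 48 == (x / 3) >> 4 for unsigned division.
    return rotr(x // 3, 4)
```

Both merged forms compute the same value as the rotate forms, which is why pulling the shift back out of the constant can be worthwhile when a rotate is available.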
Reapply "Fix crash on inline asm with 64bit matching input in 32bit GPR"
This reapplies commit r338206 reverted by r338214 since the bug that
r338206 uncovered has been fixed in r338268.
Add support for inline assembly with matching input operands that do not
naturally go in the register class they are constrained to (e.g. double in a
32-bit GPR). Note that regular input is already handled by existing
code.
Summary:
Fix read of uninitialized RC variable in ARM's PrintAsmOperand when
hasRegClassConstraint returns false. This was causing
inline-asm-operand-implicit-cast test to fail in r338206.
Attempt to fix Windows test failure caused by r338133
It seems like the pass pipeline on Windows is slightly different than on Linux
and macOS. As a result, the arm64-opt-remarks-lazy-bfi test has been failing.
This switches a CHECK-NEXT to a CHECK-DAG to try and get this running properly
again.
It'd be nice to switch it back to a CHECK-NEXT if possible, but the CHECK-NEXT
lines following the line we care about (the optimization remark emitter)
do a pretty good job of enforcing the ordering we want.
Hopefully this works, since I don't have a Windows machine. ;)
Example failure: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/11295
[AArch64][SVE] Asm: Enable instructions to be prefixed.
This patch enables instructions that are destructive on their
destination- and first source operand, to be prefixed with a
MOVPRFX instruction.
This patch also adds a variety of tests:
- positive tests for all instructions and forms that accept a
movprfx for either or both predicated and unpredicated forms.
- negative tests for all instructions and forms that do not accept
an unpredicated or predicated movprfx.
- negative tests for the diagnostics that get emitted when a MOVPRFX
instruction is used incorrectly.
This is patch [2/2] in a series to add MOVPRFX instructions:
- Patch [1/2]: https://reviews.llvm.org/D49592
- Patch [2/2]: https://reviews.llvm.org/D49593
This patch adds predicated and unpredicated MOVPRFX instructions, which
can be prepended to SVE instructions that are destructive on their first
source operand, to make them a constructive operation, e.g.
Not all instructions can be prefixed with the MOVPRFX instruction
which is why this patch also adds a mechanism to validate prefixed
instructions. The exact rules for when a MOVPRFX applies are detailed in
the SVE supplement of the Architectural Reference Manual.
This is patch [1/2] in a series to add MOVPRFX instructions:
- Patch [1/2]: https://reviews.llvm.org/D49592
- Patch [2/2]: https://reviews.llvm.org/D49593
John Brawn [Mon, 30 Jul 2018 14:26:24 +0000 (14:26 +0000)]
Adjust opt pass pipeline tests to cope with combination of r338240 and r338242
The combination of r338240 and r338242 causes the opt pass pipeline tests to
fail because of how r338242 makes BasicAA be invalidated more often. Adjust the
tests to reflect this.
John Brawn [Mon, 30 Jul 2018 11:52:08 +0000 (11:52 +0000)]
[BasicAA] Use PhiValuesAnalysis if available when handling phi alias
By using PhiValuesAnalysis we can get all the values reachable from a phi, so
we can be more precise instead of giving up when a phi has phi operands. We
can't make BasicAA directly use PhiValuesAnalysis though, as the user of
BasicAA may modify the function in ways that PhiValuesAnalysis can't cope with.
For this optional usage to work correctly BasicAAWrapperPass now needs to be not
marked as CFG-only (i.e. it is now invalidated even when CFG is preserved) due
to how the legacy pass manager handles dependent passes being invalidated,
namely the depending pass still has a pointer to the now-dead dependent pass.
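What "all the values reachable from a phi" means can be sketched as a small graph walk. This Python model is deliberately simplified and is not how the analysis is implemented in LLVM; the `phi_values` function and the dictionary encoding of phis are hypothetical.

```python
from typing import Dict, List, Set


def phi_values(phi: str, phi_operands: Dict[str, List[str]]) -> Set[str]:
    """Return the non-phi values reachable from `phi` through chains of phis.

    `phi_operands` maps each phi's name to its incoming values; any name
    that is not a key is treated as a non-phi value.
    """
    seen: Set[str] = set()
    result: Set[str] = set()
    stack = [phi]
    while stack:
        value = stack.pop()
        if value in seen:
            continue  # phis can form cycles, e.g. around a loop
        seen.add(value)
        if value in phi_operands:
            stack.extend(phi_operands[value])  # look through the phi
        else:
            result.add(value)
    return result


# Two phis that feed each other, as in a loop.
phis = {"p1": ["a", "p2"], "p2": ["b", "p1"]}
```

Here `phi_values("p1", phis)` yields `{"a", "b"}`, so a client can reason about those two values instead of giving up when it sees a phi operand that is itself a phi.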
My initial motivation for this came from https://reviews.llvm.org/D48122,
where it was pointed out that my change didn't fit well in SimplifyCFG and
therefore using GVNHoist was a better way to go. GVNHoist has been disabled
for a while as there was a list of bugs related to it.
I investigated this one and it proved to be unrelated to GVNHoist, but a genuine bug in NewGVN:
https://bugs.llvm.org/show_bug.cgi?id=37660
To convince myself GVNHoist is in a good state I made a successful bootstrap build of LLVM.
Merging this change now in order to make it to the LLVM 7.0.0 branch.
AMDGPU: Force skip over s_sendmsg and exp instructions
Summary:
These instructions interact with hardware blocks outside the shader core,
and they can have "scalar" side effects even when EXEC = 0. We don't
want these scalar side effects to occur when all lanes want to skip
these instructions, so always add the execz skip branch instruction
for basic blocks that contain them.
Also ensure that we skip scalar stores / atomics, though we don't
code-gen those yet.
Petr Pavlu [Mon, 30 Jul 2018 08:49:30 +0000 (08:49 +0000)]
[ARM] Fix over-alignment in arguments that are HA of 128-bit vectors
Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling
homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up
fully on stack, the function tries to pack all resulting items of the
aggregate as tightly as possible according to AAPCS.
Once the first item was laid out, the alignment used for consecutive
items was the size of one item. This logic went wrong for 128-bit
vectors because their alignment is normally only 64 bits, and so could
result in inserting unexpected padding between the first and second
element.
The patch fixes the problem by updating the alignment with the item size
only if this results in reducing it.
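The effect of the fix can be illustrated with a simplified stack layout model. This Python sketch is not the code in `CC_ARM_AAPCS_Custom_Aggregate()`: the `layout` function, its parameters, and the starting offset are hypothetical, and sizes and alignments are in bytes.

```python
from typing import List


def align_up(offset: int, align: int) -> int:
    return (offset + align - 1) // align * align


def layout(start: int, count: int, size: int, align: int, clamp: bool) -> List[int]:
    """Return the stack offsets of `count` items of `size` bytes each.

    clamp=False models the old behaviour: after the first item, the item
    size is used as the alignment. clamp=True models the fix: the alignment
    is only updated with the item size if that reduces it.
    """
    offsets = []
    offset = start
    for _ in range(count):
        offset = align_up(offset, align)
        offsets.append(offset)
        offset += size
        align = min(align, size) if clamp else size
    return offsets
```

For two 128-bit vectors (16 bytes, 8-byte alignment) starting at offset 8, the old rule gives offsets `[8, 32]`, leaving 8 bytes of padding, while the clamped rule gives the tightly packed `[8, 24]`.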
[MS Demangler] Demangle symbols in function scopes.
There are a couple of issues you run into when you start getting into
more complex names, especially with regards to function local statics.
When you've got something like:
  int x() {
    static int n = 0;
    return n;
  }
Then this needs to demangle to something like
int `int __cdecl x()'::`1'::n
The nested mangled symbols (e.g. `int __cdecl x()` in the above
example) also share state with regards to back-referencing, so
we need to be able to re-use the demangler in the middle of
demangling a symbol while sharing back-ref state.
To make matters more complicated, there are a lot of ambiguities
when demangling a symbol's qualified name, because a function local
scope pattern (usually something like `?1??name?`) looks suspiciously
like many other possible things that can occur, such as `?1` meaning
the second back-ref and disambiguating these cases is rather
interesting. The `?1?` in a local scope pattern is actually a special
case of the more general pattern of `? + <encoded number> + ?`, where
"encoded number" can itself have embedded `@` symbols, which is a
common delimiter in mangled names. So we have to take care during the
disambiguation, which is the reason for the overly complicated
`isLocalScopePattern` function in this patch.
I've added some pretty obnoxious tests to exercise all of this, which
exposed several other problems related to back-referencing, so those
are fixed here as well. Finally, I've uncommented some tests that were
previously marked as `FIXME`, since now these work.
[DAGCombiner] Remove unnecessary calls to AddToWorklist.
The DAGCombiner has a mechanism for ensuring all nodes have been visited at least once. Every time a node is visited, it makes sure its operands have been in the worklist at least once. This ensures that when multiple nodes are created by a combine, only the last node needs to be returned. The earlier nodes can all be found through this operand check. This means we don't need to explicitly add nodes to the worklist when a combine creates multiple nodes.
I've removed the most obvious cases here. There are probably more that can be removed.
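The operand check described above can be modelled with a toy worklist. This Python sketch is only an illustration of the mechanism as stated in this message: `visit_all` and the dictionary encoding of the DAG are hypothetical, and the model ignores node folding and deletion.

```python
from typing import Dict, List


def visit_all(returned_node: str, operands: Dict[str, List[str]]) -> List[str]:
    """Visit a node and, transitively, every operand not yet queued."""
    worklist = [returned_node]
    ever_queued = {returned_node}
    visited_order = []
    while worklist:
        node = worklist.pop()
        visited_order.append(node)
        # On each visit, make sure every operand has been queued at least once.
        for operand in operands.get(node, []):
            if operand not in ever_queued:
                ever_queued.add(operand)
                worklist.append(operand)
    return visited_order


# A combine creates n1, then n2 (using n1), then n3 (using n2),
# and returns only n3.
created = {"n3": ["n2"], "n2": ["n1"]}
```

Starting from the returned node alone, `visit_all("n3", created)` still reaches `n2` and `n1`, which is the argument for dropping the explicit AddToWorklist calls.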
These are reassociated versions of the same pattern and
similar transforms as in rL338200 and rL338118.
The motivation is identical to those commits:
Patterns with add/sub combos can be improved using
'not' ops. This is better for analysis and may lead
to follow-on transforms because 'xor' and 'add' are
commutative/associative. It can also help codegen.
[MS Demangler] NFC - Remove state from Demangler class.
We need to be able to initiate a nested demangling from inside
of an "outer" demangling. These need to be able to share some
state, such as back-references. As a result, we can't store
things like the output stream or the mangled name in the Demangler
class, since each demangling will have different values. So
remove this state and pass it through the necessary methods.
Dsymutil's update functionality was broken on Windows because we tried
to rename a file while we're holding open handles to that file. TempFile
provides a solution for this through its keep(Twine) method. This patch
changes dsymutil to make use of that functionality.
Example of bot failure:
http://lab.llvm.org:8011/builders/clang-cmake-armv8-quick/builds/5107/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Ainline-asm-operand-implicit-cast.ll