This is the final patch in the series of patches that improves
BUILD_VECTOR handling on PowerPC. This adds a few peephole optimizations
to remove redundant instructions. It also adds a large test case which
encompasses a large set of code patterns that build vectors - this test
case was the motivator for this series of patches.
[LCG] Add some much needed asserts and verify runs to uncover
a hilarious bug and fix it.
We somehow were never verifying the RefSCCs newly formed when
splitting an existing one apart, and when verifying them we weren't
really checking the SCC indices mapping effectively.
If we had been, it would have been blindingly obvious that right after
putting something int `RC.SCCs` we should update `RC.SCCIndices` instead
of `SCCIndices` which we were about to clear and rebuild anyways. =[
Anyways, this is thoroughly covered by existing tests now that we
actually verify things properly.
Craig Topper [Tue, 6 Dec 2016 08:08:12 +0000 (08:08 +0000)]
[X86] Remove another weird scalar sqrt/rcp/rsqrt pattern.
This pattern turned a vector sqrt/rcp/rsqrt operation of sse_load_f32/f64 into the the scalar instruction for the operation and put undef into the upper bits. For correctness, the resulting code should still perform the sqrt/rcp/rsqrt on the upper bits after the load is extended since that's what the operation asked for. Particularly in the case where the upper bits are 0, in that case we need calculate the sqrt/rcp/rsqrt of the zeroes and keep the result in the upper-bits. This implies we should be using the packed instruction still.
The only test case for this pattern is one I just added so there was no coverage of this.
Craig Topper [Tue, 6 Dec 2016 08:08:09 +0000 (08:08 +0000)]
[X86] Add test case demonstrating a case where a vector sqrt being passed (scalar_to_vector loadf64) uses a scalar sqrt instruction.
This occurs due to a pattern that uses sse_load_f32/f64 with vector sqrt/rcp/rsqrt operations and turns them into scalar instructions. Perhaps for the case were the upper bits come from undef this is ok. I believe a (vzmovl load64) would do the same thing but those seems to become vzload instead and selectScalarSSELoad doesn't handle that today. In that case we should be performing the vector operation on the zeros in the upper bits which is not equivalent to using a scalar instruction.
I will remove this pattern in a follow up patch. There appears to be no other test content for it.
Craig Topper [Tue, 6 Dec 2016 08:08:04 +0000 (08:08 +0000)]
[X86] Remove bad pattern that caused 128-bit loads being used by scalar sqrt/rcp/rsqrt intrinsics to select the memory form of the corresponding instruction and violate the semantics of the intrinsic.
The intrinsics are supposed to pass the upper bits straight through to their output register. This means we need to make sure we still perform the 128-bit load to get those upper bits to pass to give to the instruction since the memory form of the instruction only reads 32 or 64 bits.
Craig Topper [Tue, 6 Dec 2016 08:08:01 +0000 (08:08 +0000)]
[X86] Add test case that shows a scalar sqrtsd intrinsic of a 128-bit vector load using the load form of the sqrtsd instruction which violates the intrinsic semantics.
The sqrtsd instruction only loads 64-bits and writes bits 63:0 with the sqrt result. Bits 127:64 are preserved in the destination register. The semantics of the intrinsic indicate bits 127:64 should come from the intrinsic argument which in this case is a 128-bit load. So the generated code should have a 128-bit load and use a register form of sqrtsd.
Craig Topper [Tue, 6 Dec 2016 08:07:58 +0000 (08:07 +0000)]
[X86] Correct pattern for VSQRTSSr_Int, VSQRTSDr_Int, VRCPSSr_Int, and VRSQRTSSr_Int to not have an IMPLICIT_DEF on the first input. The semantics of the intrinsic are clear and not undefined.
The intrinsic takes one argument, the lower bits are affected by the operation and the upper bits should be passed through. The instruction itself takes two operands, the high bits of the first operand are passed through and the low bits of the second operand are modified by the operation. To match this to the intrinsic we should pass the single intrinsic input to both operands.
I had to remove the stack folding test for these instructions since they depended on the incorrect behavior. The same register is now used for both inputs so the load can't be folded.
Chris Bieneman [Tue, 6 Dec 2016 06:00:49 +0000 (06:00 +0000)]
[ObjectYAML] First bit of support for encoding DWARF in MachO
This patch adds the starting support for encoding data from the MachO __DWARF segment. The first section supported is the __debug_str section because it is the simplest.
Craig Topper [Tue, 6 Dec 2016 04:58:39 +0000 (04:58 +0000)]
[X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead
Summary:
This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently.
I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel.
I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available.
Chris Bieneman [Tue, 6 Dec 2016 04:45:11 +0000 (04:45 +0000)]
[CMake] Cleanup TableGen include flags
It is kinda crazy to have llvm/include and llvm/lib/Target in the include path for every tablegen invocation for every tablegen-like tool.
This patch removes those flags from the tablgen function that is called everywhere by instead creating a variable LLVM_TABLEGEN_FLAGS which is setup in the LLVM source directories.
This removes TableGen.cmake's dependency on LLVM_MAIN_SRC_DIR, and LLVM_MAIN_INCLUDE_DIR.
Philip Reames [Tue, 6 Dec 2016 03:34:33 +0000 (03:34 +0000)]
[LVI] Remove dead code in mergeIn
Integers are expressed in the lattice via constant ranges. They can never be represented by constants or not-constants; those are reserved for non-integer types. This code has been dead for literaly years.
Philip Reames [Tue, 6 Dec 2016 03:01:08 +0000 (03:01 +0000)]
[LVI] Hide the last markX function on LVILatticeVal
This completes a small series of patches to hide the stateful updates of LVILatticeVal from the consuming code. The only remaining stateful API is mergeIn.
Summary:
We recently introduced a feature that enforce at link-time that the
LLVM headers used by a clients are matching the ABI setting of the
LLVM library linked to.
However for clients that are using only headers from ADT and promise
they won't call into LLVM, this is forcing to link libSupport. This
new flag is intended to provide a way to configure LLVM with this
promise for such client.
Matt Arsenault [Tue, 6 Dec 2016 01:02:51 +0000 (01:02 +0000)]
AMDGPU: Don't required structured CFG
The structured CFG is just an aid to inserting exec
mask modification instructions, once that is done
we don't really need it anymore. We also
do not analyze blocks with terminators that
modify exec, so this should only be impacting
true branches.
[libFuzzer] refactor the code to allow collecting features in different ways. Also initialize a couple of Fuzzer:: members that might have been used uninitialized :(
Tim Northover [Mon, 5 Dec 2016 23:10:19 +0000 (23:10 +0000)]
GlobalISel: avoid looking too closely at PHIs when we bail.
The function used to finish off PHIs by adding the relevant basic blocks can
fail if we're aborting and still don't actually have the needed
MachineBasicBlocks. So avoid trying in that case.
Davide Italiano [Mon, 5 Dec 2016 23:04:21 +0000 (23:04 +0000)]
[SCCP] Remove manual folding of terminator instructions.
There are two cases handled here:
1) a branch on undef
2) a switch with an undef condition.
Both cases are currently handled by ResolvedUndefsIn. If we have
a branch on undef, we force its value to false (which is trivially
foldable). If we have a switch on undef, we force to the first
constant (which is also foldable).
Bob Haarman [Mon, 5 Dec 2016 22:44:00 +0000 (22:44 +0000)]
[pdb] handle missing pdb streams more gracefully
Summary: The code we use to read PDBs assumed that streams we ask it to read exist, and would read memory outside a vector and crash if this wasn't the case. This would, for example, cause llvm-pdbdump to crash on PDBs generated by lld. This patch handles such cases more gracefully: the PDB reading code in LLVM now reports errors when asked to get a stream that is not present, and llvm-pdbdump will report missing streams and continue processing streams that are present.
Tim Northover [Mon, 5 Dec 2016 22:40:13 +0000 (22:40 +0000)]
GlobalISel: place constants correctly in the entry block.
When the entry block was empty after arg lowering, we were always placing
constants at the end. This is probably hamrless while translating the same
block, but horribly wrong once its terminator has been translated. So switch to
inserting at the beginning.
Tim Northover [Mon, 5 Dec 2016 21:47:07 +0000 (21:47 +0000)]
GlobalISel: make G_CONSTANT take a ConstantInt rather than int64_t.
This makes it more similar to the floating-point constant, and also allows for
larger constants to be translated later. There's no real functional change in
this patch though, just syntax updates.
Tim Northover [Mon, 5 Dec 2016 21:40:33 +0000 (21:40 +0000)]
GlobalISel: improve translation fallback for constants.
Returning 0 (NoReg) from getOrCreateVReg leads to unexpected situations later
in the translation. It's better to return a valid (if undefined) register and
let the rest of the instruction carry on as planned.
Keno Fischer [Mon, 5 Dec 2016 21:25:03 +0000 (21:25 +0000)]
[LAA] Prevent invalid IR for loop-invariant bound in loop body
Summary:
If LAA expands a bound that is loop invariant, but not hoisted out
of the loop body, it used to use that value anyway, causing a
non-domination error, because the memcheck block is of course not
dominated by the scalar loop body. Detect this situation and expand
the SCEV expression instead.
[X86] Fix non-intrinsic roundss/roundsd to not read the destination register
This changes the scalar non-intrinsic non-avx roundss/sd instruction
definitions not to read their destination register - allowing partial dependency
breaking.
Matt Arsenault [Mon, 5 Dec 2016 20:23:10 +0000 (20:23 +0000)]
AMDGPU: Refactor exp instructions
Structure the definitions a bit more like the other classes.
The main change here is to split EXP with the done bit set
to a separate opcode, so we can set mayLoad = 1 so that it won't
be reordered before the other exp stores, since this has the special
constraint that if the done bit is set then this should be the last
exp in she shader.
Previously all exp instructions were inferred to have unmodeled
side effects.
Eric Fiselier [Mon, 5 Dec 2016 20:21:21 +0000 (20:21 +0000)]
[lit] Support custom parsers in parseIntegratedTestScript
Summary:
Libc++ frequently has the need to parse more than just the builtin *test keywords* (`RUN`, `REQUIRES`, `XFAIL`, ect). For example libc++ currently needs a new keyword `MODULES-DEFINES: macro list...`. Instead of re-implementing the script parsing in libc++ this patch allows `parseIntegratedTestScript` to take custom parsers.
This patch introduces a new class `IntegratedTestKeywordParser` which implements the logic to parse/process a test keyword. Parsing of various keyword "kinds" are supported out of the box, including 'TAG', 'COMMAND', and 'LIST', which parse keywords such as `END.`, `RUN:` and `XFAIL:` respectively.
As an example after this change libc++ can implement the `MODULES-DEFINES` simply using:
```
mparser = IntegratedTestKeywordParser('MODULES-DEFINES:', ParserKind.LIST)
parseIntegratedTestScript(test, additional_parsers=[mparser])
macro_list = mparser.getValue()
```
Matthias Braun [Mon, 5 Dec 2016 19:44:31 +0000 (19:44 +0000)]
TableGen/AsmMatcherEmitter: Bring sorting check back under EXPENSIVE_CHECKS
Bring the sorting check back that I removed in r288655 but put it under
EXPENSIVE_CHECKS this time. Also document that this the check isn't
purely about having a sorted list but also about operator < having the
correct transitive behavior.
Adrian Prantl [Mon, 5 Dec 2016 18:04:47 +0000 (18:04 +0000)]
[DIExpression] Introduce a dedicated DW_OP_LLVM_fragment operation
so we can stop using DW_OP_bit_piece with the wrong semantics.
The entire back story can be found here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html
The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's
offset field to mean the offset into the source variable rather than
the offset into the location at the top the DWARF expression stack. In
order to be able to fix this in a subsequent patch, this patch
introduces a dedicated DW_OP_LLVM_fragment operation with the
semantics that we used to apply to DW_OP_bit_piece, which is what we
actually need while inside of LLVM. This patch is complete with a
bitcode upgrade for expressions using the old format. It does not yet
fix the DWARF backend to use DW_OP_bit_piece correctly.
Implementation note: We discussed several options for implementing
this, including reserving a dedicated field in DIExpression for the
fragment size and offset, but using an custom operator at the end of
the expression works just fine and is more efficient because we then
only pay for it when we need it.
Sanjay Patel [Mon, 5 Dec 2016 15:58:21 +0000 (15:58 +0000)]
[TargetLowering] add special-case for demanded bits analysis of 'not'
We treat bitwise 'not' as a special operation and try not to reduce its all-ones mask.
Presumably, this is because a 'not' may be cheaper than a generic 'xor' or it may get
folded into another logic op if the target has those. However, if we can remove a logic
instruction by changing the xor's constant mask value, that should always be a win.
Note that the IR version of SimplifyDemandedBits() does not treat 'not' as a special-case
currently (although that's marked with a FIXME). So if you run this IR through -instcombine,
you should get the same end result. I'm hoping to add a different backend transform that
will expose this problem though, so I need to solve this first.
Diana Picus [Mon, 5 Dec 2016 10:40:33 +0000 (10:40 +0000)]
[GlobalISel] Extract handleAssignments out of AArch64CallLowering
This function seems target-independent so far: all the target-specific behaviour
is isolated in the CCAssignFn and the ValueHandler (which we're also extracting
into the generic CallLowering).
Matthias Braun [Mon, 5 Dec 2016 08:15:57 +0000 (08:15 +0000)]
TableGen/AsmMatcherEmitter: Trust that stable_sort works
A debug build of AsmMatcherEmitter would use a quadratic algorithm to
check whether std::stable_sort() actually sorted. Let's hope the authors
of our C++ standard library did that testing for us. Removing the check
gives a 3x speedup in the X86 case.
Matthias Braun [Mon, 5 Dec 2016 07:35:09 +0000 (07:35 +0000)]
TableGen/Record: Shortcut member access in hottest function
This may seem unusual, but makes most debug tblgen builds ~10% faster.
Usually we wouldn't care about speed that much in debug builds, but for
tblgen that also translates into build time.
Matthias Braun [Mon, 5 Dec 2016 06:00:36 +0000 (06:00 +0000)]
TableGen: Use more StringInit instead of StringRef
This forces the code to call StringInit::get on the string early and
avoids storing duplicates in std::string and sometimes allows pointer
comparisons instead of string comparisons.
Kuba Mracek [Mon, 5 Dec 2016 05:21:44 +0000 (05:21 +0000)]
Use Darwin libtool's -no_warning_for_no_symbols if available to silence the "has no symbols" link warning
Building compiler-rt on Darwin produces dozens of meaningless warnings about object files having no symbols during static archive creation. This is very intentional as compiler-rt uses #ifdefs to conditionally compile platform-specific code, and we even have a .cpp source file that only contains static asserts to make sure the environment is configured right. On Linux, this situation is fine and no warning is produced. This patch adds a libtool version detection and if it's new enough, we'll use the -no_warning_for_no_symbols flag that suppresses this warning. Build logs should be much cleaner now!
Matthias Braun [Mon, 5 Dec 2016 05:21:18 +0000 (05:21 +0000)]
TableGen: Factor out STRCONCAT constructor, add shortcut.
Introduce new constructor for STRCONCAT binop with a shortcut that
immediately concatenates if the two arguments are StringInits.
Makes the QualifyName code more readable and tablegen 2-3% faster.
Craig Topper [Mon, 5 Dec 2016 04:51:31 +0000 (04:51 +0000)]
[AVX-512] Teach fast isel to use masked compare and movss for handling scalar cmp and select sequence when AVX-512 is enabled. This matches the behavior of normal isel.
Craig Topper [Mon, 5 Dec 2016 04:51:28 +0000 (04:51 +0000)]
[AVX-512] Add avx512f command lines to fast isel SSE select test.
Currently the fast isel code emits an avx1 instruction sequence even with avx512. This is different than normal isel. A follow up commit will fix this.
Chris Bieneman [Mon, 5 Dec 2016 03:28:03 +0000 (03:28 +0000)]
[CMake] Refactor add_llvm_tool_symlink for reuse
The old implementation of add_llvm_tool_symlink could fail in odd ways when building out of tree. This version solves that problem by not using the LLVM_* variables, and instead reaeding the target's properties.