[PeepholeOptimizer] Look through PHIs to find additional register sources
Reapply r242295 with fixes in the implementation.
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
[ARM/AArch64] Fix cost model for interleaved accesses
Summary:
Fix the cost of interleaved accesses for ARM/AArch64.
We were calling getTypeAllocSize and using it to check
the number of bits, when we should have called
getTypeAllocSizeInBits instead.
This would pottentially cause the vectorizer to
generate loads/stores and shuffles which cannot
be matched with an interleaved access instruction.
No performance changes are expected for now since
matching/generating interleaved accesses is still
disabled by default.
r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build
slaves, but I couldn't reproduce this failure locally. Probably a false
positive as I saw this test was broken by r243246 or r243247 too but passed
later without people fixing anything.
[TTI/CostModel] improve TTI::getGEPCost and use it in CostModel::getInstructionCost
Summary:
This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider
addressing modes. It now returns TCC_Free when the GEP can be completely folded
to an addresing mode.
I started this patch as I refactored SLSR. Function isGEPFoldable looks common
and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost.
Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best
testing bed seems CostModel, but its getInstructionCost method invokes
getAddressComputationCost for GEPs which provides very coarse estimation. So
this patch also makes getInstructionCost call the updated getGEPCost for GEPs.
This change inevitably breaks some tests because the cost model changes, but
nothing looks seriously wrong -- if we believe the new cost model is the right
way to go, these tests should be updated.
This patch is not perfect yet -- the comments in some tests need to be updated.
I want to know whether this is a right approach before fixing those details.
bugpoint: make the number of trim iterations a compile-time constant
Around 10 year ago Chris limited this code to a single iteration by just
dropping a break into the loop body. We now make the number of trim iterations
a compile time constant to be able to play with it and see if this can
improve the bugpoint results. We currently use with '3' still a small and
conservative value, but this can be adjusted in the future, if needed.
I tried to look for a trivial test case, but did not succeed yet.
Igor Breger [Sun, 26 Jul 2015 14:41:44 +0000 (14:41 +0000)]
Implemented encoding and intrinsics of the following instructions
vunpckhps/pd, vunpcklps/pd,
vpunpcklbw, vpunpckhbw, vpunpcklwd, vpunpckhwd, vpunpckldq, vpunpckhdq, vpunpcklqdq, vpunpckhqdq
Added tests for intrinsics and encoding.
Davide Italiano [Sun, 26 Jul 2015 05:35:59 +0000 (05:35 +0000)]
[llvm-dwarfump] Don't rely on global state, part 3.
Some tools used to rely on a global static variable to keep track of the
return value for main(). I changed llvm-cxxdump to use exit(1)
and Rafael shortly after did the same with llvm-readobj. This is
(yet) another step towards the goal.
Adam Nemet [Sun, 26 Jul 2015 05:32:14 +0000 (05:32 +0000)]
[LAA] Begin moving the logic of generating checks out of addRuntimeCheck
Summary:
The goal is to start moving us closer to the model where
RuntimePointerChecking will compute and store the checks. Then a client
can filter the check according to its requirements and then use the
filtered list of checks with addRuntimeCheck.
Before the patch, this is all done in addRuntimeCheck. So the patch
starts to split up addRuntimeCheck while providing the old API under
what's more or less a wrapper now.
The new underlying addRuntimeCheck takes a collection of checks now,
expands the code for the bounds then generates the code for the checks.
I am not completely happy with making expandBounds static because now it
needs so many explicit arguments but I don't want to make the type
PointerBounds part of LAI. This should get fixed when addRuntimeCheck
is moved to LoopVersioning where it really belongs, IMO.
Audited the assembly diff of the testsuite (including externals). There
is a tiny bit of assembly churn that is due to the different order the
code for the bounds is expanded now
(MultiSource/Benchmarks/Prolangs-C/bison/conflicts.s and with LoopDist
on 456.hmmer/fast_algorithms.s).
Chen Li [Sat, 25 Jul 2015 03:21:06 +0000 (03:21 +0000)]
[LoopUnswitch] Improve loop unswitch pass to find trivial unswitch conditions more effectively
Summary:
This patch improves trivial loop unswitch.
The current trivial loop unswitch only checks if loop header's terminator contains a trivial unswitch condition. But if the loop header only has one reachable successor (due to intentionally or unintentionally missed code simplification), we should consider the successor as part of the loop header. Therefore, instead of stopping at loop header's terminator, we should keep traversing its successors within loop until reach a *real* conditional branch or switch (whose condition can not be constant folded). This change will enable a single -loop-unswitch pass to unswitch multiple trivial conditions (unswitch one trivial condition could open opportunity to unswitch another one in the same loop), while the old implementation can unswitch only one per pass.
[AArch64][FastISel] Always use an AND instruction when truncating to non-legal types.
When truncating to non-legal types (such as i16, i8 and i1) always use an AND
instruction to mask out the upper bits. This was only done when the source type
was an i64, but not when the source type was an i32.
This commit fixes this and adds the missing i32 truncate tests.
Eric Christopher [Sat, 25 Jul 2015 00:48:08 +0000 (00:48 +0000)]
Fix PPCMaterializeInt to check the size of the integer based on the
extension property we're requesting - zero or sign extended.
This fixes cases where we want to return a zero extended 32-bit -1
and not be sign extended for the entire register. Also updated the
already out of date comment with the current behavior.
[AArch64] Define subtarget feature "reserve-x18", which is used to decide
whether register x18 should be reserved.
This change is needed because we cannot use a backend option to set
cl::opt "aarch64-reserve-x18" when doing LTO.
Out-of-tree projects currently using cl::opt option "-aarch64-reserve-x18"
to reserve x18 should make changes to add subtarget feature "reserve-x18"
to the IR.
DI/Verifier: Fix argument bitrot in DILocalVariable
Add a verifier check that `DILocalVariable`s of tag
`DW_TAG_arg_variable` always have a non-zero 'arg:' field, and those of
tag `DW_TAG_auto_variable` always have a zero 'arg:' field. These are
the only configurations that are properly understood by the backend.
(Also, fix the bad examples in LangRef and test/Assembler, and fix the
bug in Kaleidoscope Ch8.)
A large number of testcases seem to have bitrotted their way forward
from some ancient version of the debug info hierarchy that didn't have
`arg:` parameters. If you have out-of-tree testcases that start failing
in the verifier and you don't care enough to get the `arg:` right, you
may have some luck just calling:
sed -e 's/, arg: 0/, arg: 1/'
or some such, but I hand-updated the ones in tree.
Alex Lorenz [Fri, 24 Jul 2015 22:22:50 +0000 (22:22 +0000)]
MIR Serialization: Serialize MachineFrameInfo's callee saved information.
This commit serializes the callee saved information from the class
'MachineFrameInfo'. This commit extends the YAML mappings for the fixed and
the ordinary stack objects and adds an optional 'callee-saved-register'
attribute. This attribute is used to serialize the callee save information.
Pete Cooper [Fri, 24 Jul 2015 21:38:01 +0000 (21:38 +0000)]
Remove unnecessary null check. NFC.
Since both places which set this variable do so with dyn_cast, and not
dyn_cast_or_null, its impossible to get a nullptr here, so we can remove
the check.
Remove unnecessary and confusing common base class for `DICompositeType`
and `DISubroutineType`.
While at a high-level `DISubroutineType` is a sort of composite of other
types, it has no shared code paths, and its fields are completely
disjoint. This relationship was left over from the old debug info
hierarchy.
Handle `DISubroutineType` up-front rather than as part of a branch for
`DICompositeTypeBase`. The only shared code path was looking through
the base type, but `DISubroutineType` can never have a base type.
This also removes the last use of `DICompositeTypeBase`, since we can
strengthen the cast to `DICompositeType`.
AsmPrinter: Use DICompositeType in updateAcceleratorTables(), NFC
`DISubroutineType` is impossible at this `dyn_cast` site, since we're
only dealing with named types and `DISubroutineType` cannot be named.
Strengthen the `dyn_cast` to `DICompositeType`.
Remove an unnecessary (and confusing) common subclass for
`DIDerivedType` and `DICompositeType`. These classes aren't really
related, and even in the old debug info hierarchy, there was a
long-standing FIXME to separate them.
Verifier: Sink filename check into visitMDCompositeType(), NFC
We really only want to check this for unions and classes (all the other
tags have been ruled out), so simplify the check and move it to the
right place.
Verifier: Remove unnecessary references to DW_TAG_subroutine_type, NFC
Remove unnecessary references to `DW_TAG_subroutine_type` in
`visitDICompositeType()` and `visitDIDerivedTypeBase()`, since
`visitDISubroutineType()` doesn't call either of those (and shouldn't,
since subroutine types are really quite special).
Refactor `isUnsignedDIType()` to deal with `DICompositeType` explicitly.
Since `DW_TAG_subroutine_type` isn't handled here (the assertions about
tags rule it out), this allows strengthening the `dyn_cast` to
`DIDerivedType`.
Besides making the code clearer, this it removes a use of
`DIDerivedTypeBase`.
DI: Strengthen some dyn_casts to DIDerivedType, NFC
The surrounding code proves in both cases that these must be
`DIDerivedType` if they're `DIDerivedTypeBase`, so strengthen the
`dyn_cast`s to the more specific type.
Remove the user-count threshold when analyzing read attributes
Summary:
This threshold limited FunctionAttrs ability to prove arguments to be read-only.
In NVPTX, a specialized instruction ld.global.nc can be used to load memory
with non-coherent texture cache. We notice that in SHOC [1] benchmark, some
function arguments are not marked with readonly because FunctionAttrs reaches
a hardcoded threshold when analysis uses.
Removing this threshold won't cause significant regression in compilation time, because the worst-case time complexity of the algorithm is still O(# of instructions) for each parameter.
Philip Reames [Fri, 24 Jul 2015 19:01:39 +0000 (19:01 +0000)]
[RewriteStatepointsForGC] Adjust naming scheme to be more stable
The names for instructions inserted were previous dependent on iteration order. By deriving the names from the original instructions, we can avoid instability in tests without resorting to ordered traversals. It also makes the IR mildly easier to read at large scale.
Pete Cooper [Fri, 24 Jul 2015 18:29:09 +0000 (18:29 +0000)]
Add const to a bunch of Type* in DataLayout. NFC.
Almost all methods in DataLayout took mutable pointers but didn't need to.
These were only accessing constant methods of the types, or using the Type*
to key a map. Neither of these needs a mutable pointer.
There is an assertion inside `DICompositeTypeBase::getElements()` that
`this` is not a `DISubroutineType`, leaving only `DICompositeType`.
Make that clear at the call sites.
Alex Lorenz [Fri, 24 Jul 2015 17:36:55 +0000 (17:36 +0000)]
MIR Tests: Add liveins and successors to make tests pass with machine verifier.
This commit adds the liveins and successors properties to machine basic blocks
in some of the MIR tests to ensure that the tests will pass when the MIR parser
will run the machine verifier after initializing a machine function.
Alex Lorenz [Fri, 24 Jul 2015 17:31:55 +0000 (17:31 +0000)]
MIR Tests: Make the basic block successor test an X86 specific test.
This commit moves and transforms the generic test
'CodeGen/MIR/successor-basic-blocks.mir' into an X86 specific test
'CodeGen/MIR/X86/successor-basic-blocks.mir'. This change is required in order
to enable the machine verifier for the MIR parser, as the machine verifier
verifies that the machine basic blocks contain instructions that actually
determine the machine basic block successors.
Igor Breger [Fri, 24 Jul 2015 17:24:15 +0000 (17:24 +0000)]
AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer Truncate with/without saturation
Added tests for DAG lowering ,encoding and intrinsic
Hans Wennborg [Fri, 24 Jul 2015 16:16:09 +0000 (16:16 +0000)]
test-release.sh: Defer test errors until the end
This makes the script run to the end and produce tarballs even on test
failures, and then highlights any errors afterwards.
(I first tried just storing the errors in a global variable, but that
didn't work as the "test_llvmCore" function invocation is actually
running as a sub-shell.)
Mehdi Amini [Fri, 24 Jul 2015 16:04:22 +0000 (16:04 +0000)]
Remove access to the DataLayout in the TargetMachine
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended to create a DataLayout only and not
accessing it for other purpose.
This change is the last of a series of commits dedicated to have a
single DataLayout during compilation by using always the one owned
by the module.
Luke Cheeseman [Fri, 24 Jul 2015 09:57:05 +0000 (09:57 +0000)]
[ARM] - Fix lowering of shufflevectors in AArch32
Some shufflevectors are currently being incorrectly lowered in the AArch32
backend as the existing checks for detecting the NEON operations from the
shufflevector instruction expects the shuffle mask and the vector operands to be
of the same length.
This is not always the case as the mask may be twice as long as the operand;
here only the lower half of the shufflemask gets checked, so provided the lower
half of the shufflemask looks like a vector transpose (or even is just all -1
for undef) then the intrinsics may get incorrectly lowered into a vector
transpose (VTRN) instruction.
This patch fixes this by accommodating for both cases and adds regression tests.
Luke Cheeseman [Fri, 24 Jul 2015 09:31:48 +0000 (09:31 +0000)]
When lowering vector shifts a check is performed to see if the value to shift by
is an immediate, in this check the value is negated and stored in and int64_t.
The value can be -2^63 yet the result cannot be stored in an int64_t and this
gives some undefined behaviour causing failures. The negation is only necessary
when the values is within a certain range and so it should not need to negate
-2^63, this patch introduces this and also a regression test.
[dsymutil] Implement support for universal mach-o object files.
This patch allows llvm-dsymutil to read universal (aka fat) macho object
files and archives. The patch touches nearly everything in the BinaryHolder,
but it is fairly mechinical: the methods that returned MemoryBufferRefs or
ObjectFiles now return a vector of those, and the high-level access function
takes a triple argument to select the architecture.
There is no support yet for handling fat executables and thus no support for
writing fat object files.
MachOObjectFile offers a method for detecting the correct triple, use
it instead of the previous approximation. This doesn't matter right
now, but it will become important for mach-o universal (fat) binaries.
Call a helper that resets all the internal state of the BinaryHolder
when we change the underlying memory buffer. Makes a followup patch
a tiny bit smaller.
Handle resolvable branches in complete loop unroll heuristic.
Summary:
Resolving a branch allows us to ignore blocks that won't be executed, and thus make our estimate more accurate.
This patch is intended to be applied after D10205 (though it could be applied independently).
Mehdi Amini [Fri, 24 Jul 2015 01:44:39 +0000 (01:44 +0000)]
Remove access to the DataLayout in the TargetMachine
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended to create a DataLayout only and not
accessing it for other purpose.
This change is the last of a series of commits dedicated to have a
single DataLayout during compilation by using always the one owned
by the module.
Philip Reames [Fri, 24 Jul 2015 00:02:11 +0000 (00:02 +0000)]
[RewriteStatepointsForGC] Use a worklist algorithm for first part of base pointer algorithm [NFC]
The new code should hopefully be equivalent to the old code; it just uses a worklist to track instructions which need to visited rather than iterating over all instructions visited each time. This should be faster, but the primary benefit is that the purpose should be more clear and the diff of adding another instruction type (forthcoming) much more obvious.
fix crash in machine trace metrics due to processing dbg_value instructions (PR24199)
The test in PR24199 ( https://llvm.org/bugs/show_bug.cgi?id=24199 ) crashes because machine
trace metrics was not ignoring dbg_value instructions when calculating data dependencies.
The machine-combiner pass asks machine trace metrics to calculate an instruction trace,
does some reassociations, and calls MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
along with MachineTraceMetrics::invalidate(). The dbg_value instructions have their operands
invalidated, but the instructions are not expected to be deleted.
On a subsequent loop iteration of the machine-combiner pass, machine trace metrics would be
called again and die while accessing the invalid debug instructions.
Philip Reames [Thu, 23 Jul 2015 22:25:26 +0000 (22:25 +0000)]
[RewriteStatepointsForGC] Use idomatic mechanisms for debug tracing [NFC]
Deleting much of the code using trace-rewrite-statepoints and use idiomatic DEBUG statements instead. This includes adding operator<< to a helper class.
Philip Reames [Thu, 23 Jul 2015 21:41:27 +0000 (21:41 +0000)]
[RewriteStatepointsForGC] Simplify code around meet of PhiStates [NFC]
We don't need to pass in the map from BDV to PhiStates; we can instead handle that externally and let the MeetPhiStates helper class just meet PhiStates.
Matt Wala [Thu, 23 Jul 2015 20:53:46 +0000 (20:53 +0000)]
[Scalarizer] Fix potential for stale data in Scattered across invocations
Summary:
Scalarizer has two data structures that hold information about changes
to the function, Gathered and Scattered. These are cleared in finish()
at the end of runOnFunction() if finish() detects any changes to the
function.
However, finish() was checking for changes by only checking if
Gathered was non-empty. The function visitStore() only modifies
Scattered without touching Gathered. As a result, Scattered could have
ended up having stale data if Scalarizer only scalarized store
instructions. Since the data in Scattered is used during the execution
of the pass, this introduced dangling pointer errors.
The fix is to check whether both Scattered and Gathered are empty
before deciding what to do in finish(). This also fixes a problem
where the Function can be modified although the pass returns false.