Alex Lorenz [Mon, 27 Jul 2015 22:42:41 +0000 (22:42 +0000)]
MIR Serialization: Serialize the unnamed basic block references.
This commit serializes the references from the machine basic blocks to the
unnamed basic blocks.
This commit adds a new attribute to the machine basic block's YAML mapping
called 'ir-block'. This attribute contains the actual reference to the
basic block.
Sanjoy Das [Mon, 27 Jul 2015 21:42:49 +0000 (21:42 +0000)]
[IndVars] Make loop varying predicates loop invariant.
Summary:
Was D9784: "Remove loop variant range check when induction variable is
strictly increasing"
This change re-implements D9784 with the two differences:
1. It does not use SCEVExpander and does not generate new
instructions. Instead, it does a quick local search for existing
`llvm::Value`s that it needs when modifying the `icmp`
instruction.
2. It is more general -- it deals with both increasing and decreasing
induction variables.
I've added all of the tests included with D9784, and two more.
As an example on what this change does (copied from D9784):
Given C code:
```
for (int i = M; i < N; i++) // i is known not to overflow
if (i < 0) break;
a[i] = 0;
}
```
This transformation produces:
```
for (int i = M; i < N; i++)
if (M < 0) break;
a[i] = 0;
}
```
Which can be unswitched into:
```
if (!(M < 0))
for (int i = M; i < N; i++)
a[i] = 0;
}
```
I went back and forth on whether the top level logic should live in
`SimplifyIndvar::eliminateIVComparison` or be put into its own
routine. Right now I've put it under `eliminateIVComparison` because
even though the `icmp` is not *eliminated*, it no longer is an IV
comparison. I'm open to putting it in its own helper routine if you
think that is better.
Alex Lorenz [Mon, 27 Jul 2015 20:29:27 +0000 (20:29 +0000)]
MIR Parser: Rename the standalone parsing methods. NFC.
This commit renames the methods 'parseMBB' and 'parseNamedRegister' to
'parseStandaloneMBB' and 'parseStandaloneNamedRegister' in order for their
names to be consistent with the method 'parseStandaloneVirtualRegister'.
Simon Pilgrim [Mon, 27 Jul 2015 18:52:15 +0000 (18:52 +0000)]
[InstCombine][X86][SSE] Replace sign/zero extension intrinsics with native IR
Now that we are generating sane codegen for vector sext/zext nodes on SSE targets, this patch uses instcombine to replace the SSE41/AVX2 pmovsx and pmovzx intrinsics with the equivalent native IR code.
Matt Arsenault [Mon, 27 Jul 2015 18:31:03 +0000 (18:31 +0000)]
Fix assert when inlining a constantexpr addrspacecast
The pointer size of the addrspacecasted pointer might not have matched,
so this would have hit an assert in accumulateConstantOffset.
I think this was here to allow constant folding of a load of an
addrspacecasted constant. Accumulating the offset through the
addrspacecast doesn't make much sense, so something else is necessary
to allow folding the load through this cast.
Diego Novillo [Mon, 27 Jul 2015 18:27:23 +0000 (18:27 +0000)]
Fix ODR violation. NFC.
There is an ODR conflict between lib/ExecutionEngine/ExecutionEngineBindings.cpp
and lib/Target/TargetMachineC.cpp. The inline definitions should simply
be marked static (thanks dblaikie for the hint).
Fix `llvm-config` to emit the linker flag for the combined shared object built by autoconfig/make instead of the individual components.
Summary:
When LLVM is configured to build shared libraries, CMake builds each component as it's own shared object, while autoconfig/make builds them statically and then links them all together to create a single shared object. This change adds compile time config flags to `llvm-config` so it can know whether LLVM's components are separated or not and act accordingly.
This fixes `llvm-config` instead of fixing the makefiles to behave like CMake because, AIUI, LLVM's autoconfig/make build system is on the way out anyway.
This change only affects `llvm-config` from builds that use autoconfig/make.
Marek Olsak [Mon, 27 Jul 2015 18:16:08 +0000 (18:16 +0000)]
AMDGPU: don't match vgpr loads for constant loads
Author: Dave Airlie <airlied@redhat.com>
In order to implement indirect sampler loads, we don't
want to match on a VGPR load but an SGPR one for constants,
as we cannot feed VGPRs to the sampler only SGPRs.
Alex Lorenz [Mon, 27 Jul 2015 17:51:59 +0000 (17:51 +0000)]
Reset the virtual registers in liveins when clearing the virtual registers.
This commit zeroes out the virtual register references in the machine
function's liveins in the class 'MachineRegisterInfo' when the virtual
register definitions are cleared.
[PeepholeOptimizer] Look through PHIs to find additional register sources
Reapply r242295 with fixes in the implementation.
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
[ARM/AArch64] Fix cost model for interleaved accesses
Summary:
Fix the cost of interleaved accesses for ARM/AArch64.
We were calling getTypeAllocSize and using it to check
the number of bits, when we should have called
getTypeAllocSizeInBits instead.
This would pottentially cause the vectorizer to
generate loads/stores and shuffles which cannot
be matched with an interleaved access instruction.
No performance changes are expected for now since
matching/generating interleaved accesses is still
disabled by default.
r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build
slaves, but I couldn't reproduce this failure locally. Probably a false
positive as I saw this test was broken by r243246 or r243247 too but passed
later without people fixing anything.
[TTI/CostModel] improve TTI::getGEPCost and use it in CostModel::getInstructionCost
Summary:
This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider
addressing modes. It now returns TCC_Free when the GEP can be completely folded
to an addresing mode.
I started this patch as I refactored SLSR. Function isGEPFoldable looks common
and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost.
Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best
testing bed seems CostModel, but its getInstructionCost method invokes
getAddressComputationCost for GEPs which provides very coarse estimation. So
this patch also makes getInstructionCost call the updated getGEPCost for GEPs.
This change inevitably breaks some tests because the cost model changes, but
nothing looks seriously wrong -- if we believe the new cost model is the right
way to go, these tests should be updated.
This patch is not perfect yet -- the comments in some tests need to be updated.
I want to know whether this is a right approach before fixing those details.
bugpoint: make the number of trim iterations a compile-time constant
Around 10 year ago Chris limited this code to a single iteration by just
dropping a break into the loop body. We now make the number of trim iterations
a compile time constant to be able to play with it and see if this can
improve the bugpoint results. We currently use with '3' still a small and
conservative value, but this can be adjusted in the future, if needed.
I tried to look for a trivial test case, but did not succeed yet.
Igor Breger [Sun, 26 Jul 2015 14:41:44 +0000 (14:41 +0000)]
Implemented encoding and intrinsics of the following instructions
vunpckhps/pd, vunpcklps/pd,
vpunpcklbw, vpunpckhbw, vpunpcklwd, vpunpckhwd, vpunpckldq, vpunpckhdq, vpunpcklqdq, vpunpckhqdq
Added tests for intrinsics and encoding.
Davide Italiano [Sun, 26 Jul 2015 05:35:59 +0000 (05:35 +0000)]
[llvm-dwarfump] Don't rely on global state, part 3.
Some tools used to rely on a global static variable to keep track of the
return value for main(). I changed llvm-cxxdump to use exit(1)
and Rafael shortly after did the same with llvm-readobj. This is
(yet) another step towards the goal.
Adam Nemet [Sun, 26 Jul 2015 05:32:14 +0000 (05:32 +0000)]
[LAA] Begin moving the logic of generating checks out of addRuntimeCheck
Summary:
The goal is to start moving us closer to the model where
RuntimePointerChecking will compute and store the checks. Then a client
can filter the check according to its requirements and then use the
filtered list of checks with addRuntimeCheck.
Before the patch, this is all done in addRuntimeCheck. So the patch
starts to split up addRuntimeCheck while providing the old API under
what's more or less a wrapper now.
The new underlying addRuntimeCheck takes a collection of checks now,
expands the code for the bounds then generates the code for the checks.
I am not completely happy with making expandBounds static because now it
needs so many explicit arguments but I don't want to make the type
PointerBounds part of LAI. This should get fixed when addRuntimeCheck
is moved to LoopVersioning where it really belongs, IMO.
Audited the assembly diff of the testsuite (including externals). There
is a tiny bit of assembly churn that is due to the different order the
code for the bounds is expanded now
(MultiSource/Benchmarks/Prolangs-C/bison/conflicts.s and with LoopDist
on 456.hmmer/fast_algorithms.s).
Chen Li [Sat, 25 Jul 2015 03:21:06 +0000 (03:21 +0000)]
[LoopUnswitch] Improve loop unswitch pass to find trivial unswitch conditions more effectively
Summary:
This patch improves trivial loop unswitch.
The current trivial loop unswitch only checks if loop header's terminator contains a trivial unswitch condition. But if the loop header only has one reachable successor (due to intentionally or unintentionally missed code simplification), we should consider the successor as part of the loop header. Therefore, instead of stopping at loop header's terminator, we should keep traversing its successors within loop until reach a *real* conditional branch or switch (whose condition can not be constant folded). This change will enable a single -loop-unswitch pass to unswitch multiple trivial conditions (unswitch one trivial condition could open opportunity to unswitch another one in the same loop), while the old implementation can unswitch only one per pass.
[AArch64][FastISel] Always use an AND instruction when truncating to non-legal types.
When truncating to non-legal types (such as i16, i8 and i1) always use an AND
instruction to mask out the upper bits. This was only done when the source type
was an i64, but not when the source type was an i32.
This commit fixes this and adds the missing i32 truncate tests.
Eric Christopher [Sat, 25 Jul 2015 00:48:08 +0000 (00:48 +0000)]
Fix PPCMaterializeInt to check the size of the integer based on the
extension property we're requesting - zero or sign extended.
This fixes cases where we want to return a zero extended 32-bit -1
and not be sign extended for the entire register. Also updated the
already out of date comment with the current behavior.
[AArch64] Define subtarget feature "reserve-x18", which is used to decide
whether register x18 should be reserved.
This change is needed because we cannot use a backend option to set
cl::opt "aarch64-reserve-x18" when doing LTO.
Out-of-tree projects currently using cl::opt option "-aarch64-reserve-x18"
to reserve x18 should make changes to add subtarget feature "reserve-x18"
to the IR.
DI/Verifier: Fix argument bitrot in DILocalVariable
Add a verifier check that `DILocalVariable`s of tag
`DW_TAG_arg_variable` always have a non-zero 'arg:' field, and those of
tag `DW_TAG_auto_variable` always have a zero 'arg:' field. These are
the only configurations that are properly understood by the backend.
(Also, fix the bad examples in LangRef and test/Assembler, and fix the
bug in Kaleidoscope Ch8.)
A large number of testcases seem to have bitrotted their way forward
from some ancient version of the debug info hierarchy that didn't have
`arg:` parameters. If you have out-of-tree testcases that start failing
in the verifier and you don't care enough to get the `arg:` right, you
may have some luck just calling:
sed -e 's/, arg: 0/, arg: 1/'
or some such, but I hand-updated the ones in tree.
Alex Lorenz [Fri, 24 Jul 2015 22:22:50 +0000 (22:22 +0000)]
MIR Serialization: Serialize MachineFrameInfo's callee saved information.
This commit serializes the callee saved information from the class
'MachineFrameInfo'. This commit extends the YAML mappings for the fixed and
the ordinary stack objects and adds an optional 'callee-saved-register'
attribute. This attribute is used to serialize the callee save information.
Pete Cooper [Fri, 24 Jul 2015 21:38:01 +0000 (21:38 +0000)]
Remove unnecessary null check. NFC.
Since both places which set this variable do so with dyn_cast, and not
dyn_cast_or_null, its impossible to get a nullptr here, so we can remove
the check.
Remove unnecessary and confusing common base class for `DICompositeType`
and `DISubroutineType`.
While at a high-level `DISubroutineType` is a sort of composite of other
types, it has no shared code paths, and its fields are completely
disjoint. This relationship was left over from the old debug info
hierarchy.
Handle `DISubroutineType` up-front rather than as part of a branch for
`DICompositeTypeBase`. The only shared code path was looking through
the base type, but `DISubroutineType` can never have a base type.
This also removes the last use of `DICompositeTypeBase`, since we can
strengthen the cast to `DICompositeType`.
AsmPrinter: Use DICompositeType in updateAcceleratorTables(), NFC
`DISubroutineType` is impossible at this `dyn_cast` site, since we're
only dealing with named types and `DISubroutineType` cannot be named.
Strengthen the `dyn_cast` to `DICompositeType`.
Remove an unnecessary (and confusing) common subclass for
`DIDerivedType` and `DICompositeType`. These classes aren't really
related, and even in the old debug info hierarchy, there was a
long-standing FIXME to separate them.
Verifier: Sink filename check into visitMDCompositeType(), NFC
We really only want to check this for unions and classes (all the other
tags have been ruled out), so simplify the check and move it to the
right place.
Verifier: Remove unnecessary references to DW_TAG_subroutine_type, NFC
Remove unnecessary references to `DW_TAG_subroutine_type` in
`visitDICompositeType()` and `visitDIDerivedTypeBase()`, since
`visitDISubroutineType()` doesn't call either of those (and shouldn't,
since subroutine types are really quite special).
Refactor `isUnsignedDIType()` to deal with `DICompositeType` explicitly.
Since `DW_TAG_subroutine_type` isn't handled here (the assertions about
tags rule it out), this allows strengthening the `dyn_cast` to
`DIDerivedType`.
Besides making the code clearer, this it removes a use of
`DIDerivedTypeBase`.
DI: Strengthen some dyn_casts to DIDerivedType, NFC
The surrounding code proves in both cases that these must be
`DIDerivedType` if they're `DIDerivedTypeBase`, so strengthen the
`dyn_cast`s to the more specific type.
Remove the user-count threshold when analyzing read attributes
Summary:
This threshold limited FunctionAttrs ability to prove arguments to be read-only.
In NVPTX, a specialized instruction ld.global.nc can be used to load memory
with non-coherent texture cache. We notice that in SHOC [1] benchmark, some
function arguments are not marked with readonly because FunctionAttrs reaches
a hardcoded threshold when analysis uses.
Removing this threshold won't cause significant regression in compilation time, because the worst-case time complexity of the algorithm is still O(# of instructions) for each parameter.
Philip Reames [Fri, 24 Jul 2015 19:01:39 +0000 (19:01 +0000)]
[RewriteStatepointsForGC] Adjust naming scheme to be more stable
The names for instructions inserted were previous dependent on iteration order. By deriving the names from the original instructions, we can avoid instability in tests without resorting to ordered traversals. It also makes the IR mildly easier to read at large scale.
Pete Cooper [Fri, 24 Jul 2015 18:29:09 +0000 (18:29 +0000)]
Add const to a bunch of Type* in DataLayout. NFC.
Almost all methods in DataLayout took mutable pointers but didn't need to.
These were only accessing constant methods of the types, or using the Type*
to key a map. Neither of these needs a mutable pointer.
There is an assertion inside `DICompositeTypeBase::getElements()` that
`this` is not a `DISubroutineType`, leaving only `DICompositeType`.
Make that clear at the call sites.
Alex Lorenz [Fri, 24 Jul 2015 17:36:55 +0000 (17:36 +0000)]
MIR Tests: Add liveins and successors to make tests pass with machine verifier.
This commit adds the liveins and successors properties to machine basic blocks
in some of the MIR tests to ensure that the tests will pass when the MIR parser
will run the machine verifier after initializing a machine function.
Alex Lorenz [Fri, 24 Jul 2015 17:31:55 +0000 (17:31 +0000)]
MIR Tests: Make the basic block successor test an X86 specific test.
This commit moves and transforms the generic test
'CodeGen/MIR/successor-basic-blocks.mir' into an X86 specific test
'CodeGen/MIR/X86/successor-basic-blocks.mir'. This change is required in order
to enable the machine verifier for the MIR parser, as the machine verifier
verifies that the machine basic blocks contain instructions that actually
determine the machine basic block successors.
Igor Breger [Fri, 24 Jul 2015 17:24:15 +0000 (17:24 +0000)]
AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer Truncate with/without saturation
Added tests for DAG lowering ,encoding and intrinsic
Hans Wennborg [Fri, 24 Jul 2015 16:16:09 +0000 (16:16 +0000)]
test-release.sh: Defer test errors until the end
This makes the script run to the end and produce tarballs even on test
failures, and then highlights any errors afterwards.
(I first tried just storing the errors in a global variable, but that
didn't work as the "test_llvmCore" function invocation is actually
running as a sub-shell.)
Mehdi Amini [Fri, 24 Jul 2015 16:04:22 +0000 (16:04 +0000)]
Remove access to the DataLayout in the TargetMachine
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended to create a DataLayout only and not
accessing it for other purpose.
This change is the last of a series of commits dedicated to have a
single DataLayout during compilation by using always the one owned
by the module.
Luke Cheeseman [Fri, 24 Jul 2015 09:57:05 +0000 (09:57 +0000)]
[ARM] - Fix lowering of shufflevectors in AArch32
Some shufflevectors are currently being incorrectly lowered in the AArch32
backend as the existing checks for detecting the NEON operations from the
shufflevector instruction expects the shuffle mask and the vector operands to be
of the same length.
This is not always the case as the mask may be twice as long as the operand;
here only the lower half of the shufflemask gets checked, so provided the lower
half of the shufflemask looks like a vector transpose (or even is just all -1
for undef) then the intrinsics may get incorrectly lowered into a vector
transpose (VTRN) instruction.
This patch fixes this by accommodating for both cases and adds regression tests.
Luke Cheeseman [Fri, 24 Jul 2015 09:31:48 +0000 (09:31 +0000)]
When lowering vector shifts a check is performed to see if the value to shift by
is an immediate, in this check the value is negated and stored in and int64_t.
The value can be -2^63 yet the result cannot be stored in an int64_t and this
gives some undefined behaviour causing failures. The negation is only necessary
when the values is within a certain range and so it should not need to negate
-2^63, this patch introduces this and also a regression test.
[dsymutil] Implement support for universal mach-o object files.
This patch allows llvm-dsymutil to read universal (aka fat) macho object
files and archives. The patch touches nearly everything in the BinaryHolder,
but it is fairly mechinical: the methods that returned MemoryBufferRefs or
ObjectFiles now return a vector of those, and the high-level access function
takes a triple argument to select the architecture.
There is no support yet for handling fat executables and thus no support for
writing fat object files.
MachOObjectFile offers a method for detecting the correct triple, use
it instead of the previous approximation. This doesn't matter right
now, but it will become important for mach-o universal (fat) binaries.
Call a helper that resets all the internal state of the BinaryHolder
when we change the underlying memory buffer. Makes a followup patch
a tiny bit smaller.