Philip Reames [Tue, 12 May 2015 23:39:23 +0000 (23:39 +0000)]
[PlaceSafepoints] Followup to commit L237172
Responding to review feedback from http://reviews.llvm.org/D9585
1) Remove a variable shadow by converting the outer loop to a range for loop. We never really used the 'i' variable which was being shadowed.
2) Reduce DominatorTree recalculations by passing the DT to SplitEdge.
Chandler Carruth [Tue, 12 May 2015 23:32:56 +0000 (23:32 +0000)]
[Unrolling] Refactor the start and step offsets to simplify overflow
checking and make the cache faster and smaller.
I had thought that using an APInt here would be useful, but I think
I was just wrong. Notably, we don't have to do any fancy overflow
checking, we can just bound the values as quite small and do the math in
a higher precision integer. I've switched to a signed integer so that
UBSan will even point out if we ever have integer overflow. I've added
various asserts to try to catch things as well and hoisted the overflow
checks so that we just leave the too-large offsets out of the SCEV-GEP
cache. This makes the value in the cache quite a bit smaller which is
probably worthwhile.
No functionality changed here (for trip counts under 1 billion).
Bjorn Steinbrink [Tue, 12 May 2015 22:31:47 +0000 (22:31 +0000)]
CVP: Improve handling of Selects used as incoming PHI values
Summary:
If the branch that leads to the PHI node and the Select instruction
depend on correlated conditions, we might be able to directly use the
corresponding value from the Select instruction as the incoming value
for the PHI node, allowing later removal of the select instruction.
Philip Reames [Tue, 12 May 2015 22:19:52 +0000 (22:19 +0000)]
[RewriteStatepointsForGC] Extend base pointer to handle more cases w/vectors
When relocating a pointer, we need to determine a base pointer for the derived pointer being relocated. We have limited support for handling a pointer extracted from a vector; the current code only handled the case where the entire vector was known to contain base pointers. This patch extends the reasoning to handle chains of insertelements where the indices are constants. This case turns out to be fairly common in vectorized code. We can now handle vectors which contains mixtures of base and derived pointers provided the insertelements use constant indices.
Note that this doesn't solve the general problem. To handle variable indexed insertelements, we'd need to scalarize and introduce conditional branching based on the index. Alternatively, we could eagerly scalarize, but the code structure doesn't currently make either fix easy. The patch also doesn't handle shufflevector or other vector manipulation for much the same reasons. I plan to defer this work until I have a motivating test case.
Philip Reames [Tue, 12 May 2015 21:21:18 +0000 (21:21 +0000)]
[PlaceSafepoints] Switch to being a FunctionPass
The pass doesn't actually modify the module outside of the function being processed. The only confusing piece is that it both inserts calls and then inlines the resulting calls. Given that, it definitely invalidates module level analysis results, but many FunctionPasses do that.
Philip Reames [Tue, 12 May 2015 21:09:36 +0000 (21:09 +0000)]
[PlaceSafepoints] Make internal helper pass a FunctionPass
Switch from using a LoopPass to using a FunctionPass for the internal helper analysis pass. The next step is going to be to make this a true analysis pass which is required by the PlaceSafepoints pass itself.
p.s. The interesting semantic part here is that we're changing the iteration order over the loops. It shouldn't matter, but that's the reason to separate this into it's own distinct patch.
Matthias Braun [Tue, 12 May 2015 21:07:54 +0000 (21:07 +0000)]
ARM: Remove Itineraries for swift CPU
They do more harm than good when used in the MachineScheduler as they
tend to take preference to register pressure minimsation which is more
important for swift.
Philip Reames [Tue, 12 May 2015 20:43:48 +0000 (20:43 +0000)]
[PlaceSafepoints] Remove dependence on LoopSimplify
As a step towards getting rid of internal pass manager hack entirely, remove the need for loop simplify to run in the inner pass manager. The new code does produce slightly different loop structures, so this isn't technically NFC.
Reimplement heuristic for estimating complete-unroll optimization effects.
Summary:
This patch reimplements heuristic that tries to estimate optimization beneftis
from complete loop unrolling.
In this patch I kept the minimal changes - e.g. I removed code handling
branches and folding compares. That's a promising area, but now there
are too many questions to discuss before we can enable it.
Petar Jovanovic [Tue, 12 May 2015 17:14:05 +0000 (17:14 +0000)]
[Mips] Return false for isFPCloseToIncomingSP()
On Mips, frame pointer points to the same side of the frame as the stack
pointer. This function is used to decide where to put register scavenging
spill slot. So far, it was put on the wrong side of the frame, and thus it
was too far away from $fp when frame was larger than 2^15 bytes.
Tom Stellard [Tue, 12 May 2015 17:13:02 +0000 (17:13 +0000)]
R600/SI: add pass to mark CF live ranges as non-spillable
Spilling can insert instructions almost anywhere, and this can mess
up control flow lowering in a multitude of ways, due to instruction
reordering. Let's sort this out the easy way: never spill registers
involved with control flow, i.e. saved EXEC masks.
Unfortunately, this does not work at all with optimizations disabled,
as the register allocator ignores spill weights. This should be
addressed in a future commit.
The test was reduced from the "stacks" shader of [1]. Some issues
trigger the machine verifier while another one is checked manually.
[1] http://madebyevan.com/webgl-path-tracing/
v2: only insert pass with optimizations enabled, merge test runs.
Sunil Srivastava [Tue, 12 May 2015 16:47:30 +0000 (16:47 +0000)]
Changed renaming of local symbols by inserting a dot vefore the numeric suffix.
One code change and several test changes to match that
details in http://reviews.llvm.org/D9481
Keith Walker [Tue, 12 May 2015 15:25:08 +0000 (15:25 +0000)]
[DWARF] Add CIE header fields address_size and segment_size when generating dwarf-4
The DWARF-4 specification added 2 new fields in the CIE header called
address_size and segment_size.
Create these 2 new fields when generating dwarf-4 CIE entries, print out
the new fields when dumping the CIE and update tests
Tom Stellard [Tue, 12 May 2015 15:00:53 +0000 (15:00 +0000)]
R600/SI: Update tablegen defs to avoid restoring spilled sgprs to m0
We had code to do this in SIRegisterInfo::eliminateFrameIndex(), but
it is easier to just change the definition of SI_SPILL_S32_RESTORE to
only allow numbered sgprs.
Tom Stellard [Tue, 12 May 2015 14:18:14 +0000 (14:18 +0000)]
R600/SI: Remove explicit m0 operand from s_sendmsg
Instead add m0 as an implicit operand. This allows us to avoid using
the M0Reg register class and eliminates a number of unnecessary spills
when using s_sendmsg instructions. This impacts one shader in the
shader-db:
AVX-512, X86: Added lowering for shift operations for SKX.
The other changes in the LowerShift() are not functional,
just to make the code more convenient.
So, the functional changes for SKX only.
John Brawn [Tue, 12 May 2015 13:13:38 +0000 (13:13 +0000)]
[ARM] Use AEABI aligned function variants
AEABI defines aligned variants of memcpy etc. that can be faster than
the default version due to not having to do alignment checks. When
emitting target code for these functions make use of these aligned
variants if possible. Also convert memset to memclr if possible.
Igor Laevsky [Tue, 12 May 2015 13:12:14 +0000 (13:12 +0000)]
Reverse ordering of base and derived pointer during safepoint lowering.
According to the documentation in StackMap section for the safepoint we should have:
"The first Location in each pair describes the base pointer for the object. The second is the derived pointer actually being relocated."
But before this change we emitted them in reverse order - derived pointer first, base pointer second.
Andrea Di Biagio [Tue, 12 May 2015 12:34:22 +0000 (12:34 +0000)]
[X86] Remove useless target specific combine on TRUNCATE dag nodes.
Before revision 171146, function 'PerformTruncateCombine' used to perform
a premature lowering of TRUNCATE dag nodes.
Revision 171146 then moved all the logic implemented by PerformTruncateCombine
to a custom lowering hook. However, that revision forgot to delete
function PerformTruncateCombine from the code.
This patch removes function 'PerformTruncateCombine' since it has no effect
on the SelectionDAG. No functional change intended.
AVX-512: select operation for i1 vectors
like: select i1 %cond, <16 x i1> %a, <16 x i1> %b.
I added pseudo-CMOV patterns to resolve the "select".
Added tests for KNL and SKX.
Eric Christopher [Tue, 12 May 2015 01:26:05 +0000 (01:26 +0000)]
Migrate existing backends that care about software floating point
to use the information in the module rather than TargetOptions.
We've had and clang has used the use-soft-float attribute for some
time now so have the backends set a subtarget feature based on
a particular function now that subtargets are created based on
functions and function attributes.
For the one middle end soft float check go ahead and create
an overloadable TargetLowering::useSoftFloat function that
just checks the TargetSubtargetInfo in all cases.
Also remove the command line option that hard codes whether or
not soft-float is set by using the attribute for all of the
target specific test cases - for the generic just go ahead and
add the attribute in the one case that showed up.
Sanjoy Das [Mon, 11 May 2015 23:47:30 +0000 (23:47 +0000)]
Refactoring gc_relocate related code in CodeGenPrepare.cpp
Summary:
The original code inserted new instructions by following a
Create->Remove->ReInsert flow. This patch removes the unnecessary
Remove->ReInsert part by setting up the InsertPoint correctly at the
very beginning. This change does not introduce any functionality change.
Ahmed Bougacha [Mon, 11 May 2015 23:09:46 +0000 (23:09 +0000)]
[MemCpyOpt] Look at any dependency -not just source- for memset+memcpy.
This fixes another miscompile introduced by r235232: when there was a
dependency on the memcpy destination other than the memset, we would
ignore it, because we only looked at the source dependency.
It was a mistake to use SrcDepInfo. Instead, just use DepInfo.
David Blaikie [Mon, 11 May 2015 22:20:48 +0000 (22:20 +0000)]
Readdress r236990, use of static members on a non-static variable.
The TargetRegistry is just a namespace-like class, instantiated in one
place to use a range-based for loop. Instead, expose access to the
registry via a range-based 'targets()' function instead. This makes most
uses a bit awkward/more verbose - but eventually we should just add a
range-based find_if function which will streamline these functions. I'm
happy to mkae them a bit awkward in the interim as encouragement to
improve the algorithms in time.
James Y Knight [Mon, 11 May 2015 22:17:13 +0000 (22:17 +0000)]
Fix tablegen's PrintFatalError function to run registered file
cleanups.
Also, change code in tablegen which printed a message and then called
"exit(1)" to use PrintFatalError, instead.
This fixes instances where an empty output file was left behind after
a failed tablegen invocation, which would confuse subsequent ninja
runs into not attempting to rebuild.
Alexey Samsonov [Mon, 11 May 2015 21:20:20 +0000 (21:20 +0000)]
Fix input validation issues in llvm-as/llvm-dis
Summary:
1. llvm-as/llvm-dis tools do not check for input filename length.
2. llvm-dis does not verify the `Streamer` variable against `nullptr` properly, so the `M` variable could be uninitialized (e.g. if the input file does not exist) leading to null dref.
Sanjay Patel [Mon, 11 May 2015 21:07:09 +0000 (21:07 +0000)]
propagate IR-level fast-math-flags to DAG nodes; 2nd try; NFC
This is a less ambitious version of:
http://reviews.llvm.org/rL236546
because that was reverted in:
http://reviews.llvm.org/rL236600
because it caused memory corruption that wasn't related to FMF
but was actually due to making nodes with 2 operands derive from a
plain SDNode rather than a BinarySDNode.
This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997
...which split the existing nsw / nuw / exact flags and FMF
into their own struct.
Davide Italiano [Mon, 11 May 2015 21:02:34 +0000 (21:02 +0000)]
[LoopIdiomRecognize] Transform backedge-taken count check into an assertion.
runOnCountable() allowed the caller to call on a loop without a
predictable backedge-taken count. Change the code so that only loops
with computable backdge-count can call this function, in order to catch
abuses.
[lib/Fuzzer] add a trace-based mutatation logic. Same idea as with DFSan-based mutator, but instead of relying on taint tracking, try to find the data directly in the input. More (logic and comments) to go.
Sanjoy Das [Mon, 11 May 2015 18:49:34 +0000 (18:49 +0000)]
[RewriteStatepointsForGC] Fix a bug on creating gc_relocate for pointer to vector of pointers
Summary:
In RewriteStatepointsForGC pass, we create a gc_relocate intrinsic for
each relocated pointer, and the gc_relocate has the same type with the
pointer. During the creation of gc_relocate intrinsic, llvm requires to
mangle its type. However, llvm does not support mangling of all possible
types. RewriteStatepointsForGC will hit an assertion failure when it
tries to create a gc_relocate for pointer to vector of pointers because
mangling for vector of pointers is not supported.
This patch changes the way RewriteStatepointsForGC pass creates
gc_relocate. For each relocated pointer, we erase the type of pointers
and create an unified gc_relocate of type i8 addrspace(1)*. Then a
bitcast is inserted to convert the gc_relocate to the correct type. In
this way, gc_relocate does not need to deal with different types of
pointers and the unsupported type mangling is no longer a problem. This
change would also ease further merge when LLVM erases types of pointers
and introduces an unified pointer type.
Some minor changes are also introduced to gc_relocate related part in
InstCombineCalls, CodeGenPrepare, and Verifier accordingly.
Summary:
r235215 adds support for f16 to be considered as a load/store type and
promote f16 operations to f32.
This patch has miscellaneous fixes for the X86 backend so all f16
operations are handled:
1. Set loadextaction for f16 vectors to expand.
2. Handle FP_EXTEND in a switch statement when handling v2f32
3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or
(store (SINT_TO_FP )) to a FILD.
Aaron Ballman [Mon, 11 May 2015 13:10:17 +0000 (13:10 +0000)]
Replacing a range-based for loop with an old-style for loop. This code was previously causing a warning with MSVC about a compiler-generated local variable because TargetRegistry::begin() and end() are static member functions. NFC.
Adam Nemet [Mon, 11 May 2015 08:30:28 +0000 (08:30 +0000)]
[Docs] Fix scoped noalias example
Summary:
As far as I understand the entire point of this example is to show that
if noalias is not a superset/equal to the alias.scope list on a scope
domain then load could reference locations that the store is not known
to not-alias i.e may alias.
The QPX single-precision load/store intrinsics have implied
truncation/extension from/to the declared value type of <4 x double> to the
memory type of <4 x float>. When we can prove the alignment of the pointer
argument, and thus replace the intrinsic with a regular load or store, we need
to load or store the correct data type (<4 x float>) instead of (<4 x double>).