Override the TLI callback enableAggressiveFMAFusion to return true. Since fmul, fmadd, and fadd nodes all cost the same number of cycles, we can enable more combining heuristics to produce more fmadd nodes.
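For illustration, here is a minimal sketch of such an override in a hypothetical target's lowering class (the class name is made up, and the header path varies between LLVM versions; this is not the code from this commit):

```cpp
// Sketch only: overriding the TargetLowering hook in a hypothetical target.
#include "llvm/Target/TargetLowering.h"  // header location varies by LLVM version

class MyTargetLowering : public llvm::TargetLowering {
public:
  // fmul, fmadd and fadd all take the same number of cycles on this target,
  // so let DAGCombine form fmadd nodes more aggressively.
  bool enableAggressiveFMAFusion(llvm::EVT /*VT*/) const override {
    return true;
  }
};
```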
Erik Eckstein [Wed, 14 Jan 2015 11:24:47 +0000 (11:24 +0000)]
reapply: SLPVectorizer: Cache results from memory alias checking.
This speeds up the dependency calculations for blocks with many load/store/call instructions.
Apart from the improved runtime, there is no functional change.
Compared to the original commit, this re-applied commit contains a bug fix which ensures that there are
no incorrect collisions in the alias cache.
Chandler Carruth [Wed, 14 Jan 2015 11:23:27 +0000 (11:23 +0000)]
[cleanup] Re-sort all the #include lines in LLVM using
utils/sort_includes.py.
I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.
Chandler Carruth [Wed, 14 Jan 2015 10:33:21 +0000 (10:33 +0000)]
[dom] Make the DominatorTreeBase not a dynamic class!
Now that the passes are wrappers around this, we no longer need
a vtable, virtual destructor, and other associated mess. This is
particularly nice to me as this is a class template. =]
Chandler Carruth [Wed, 14 Jan 2015 10:19:28 +0000 (10:19 +0000)]
[PM] Port domtree to the new pass manager (at last).
This adds the domtree analysis to the new pass manager. The analysis
returns the same DominatorTree result entity used by the old pass
manager and essentially all of the code is shared. We just have
different boilerplate for running and printing the analysis.
I've converted one test to run in both modes just to make sure this is
exercised while both are live in the tree.
This commit refines the pattern for the Octeon seq/seqi/sne/snei instructions.
The target register is set to 0 or 1 according to the result of the comparison.
In C, this is something like
rd = (unsigned long)(rs == rt)
This commit adds a zext to bring the result to i64. With this change the
instruction is selected for this type of code. (gcc produces the same code for
the above C code.)
Chandler Carruth [Wed, 14 Jan 2015 10:07:19 +0000 (10:07 +0000)]
[PM] Make DominatorTrees (correctly) movable so that we can move them
into the new pass manager's analysis cache which stores results
by-value.
Technically speaking, the dom trees were originally not movable but
copyable! This, unsurprisingly, didn't work at all -- the copy was
shallow and just resulted in rampant memory corruption. This change
explicitly forbids copying (as it would need to be a deep copy) and
makes them explicitly movable with the unsurprising boilerplate to
move them member-wise, because we can't rely on MSVC to generate this
code for us. =/
If there is no associated immediate (MS-style inline asm), do not try to access
the operand; just assume that it is valid. This should fix the buildbots after SVN
r225941.
Mehdi Amini [Wed, 14 Jan 2015 05:33:01 +0000 (05:33 +0000)]
Fold a loop for array processing in ComputeLinearIndex
When processing an array, every element has the same layout, so it is
useless to call ComputeLinearIndex recursively on each element.
Just do it once and multiply by the number of elements.
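A standalone toy illustration of the fold (this is not LLVM's actual ComputeLinearIndex, just the idea of counting one element's leaves once and scaling by the array length):

```cpp
#include <cstddef>

struct ToyArrayType {
  const ToyArrayType *ElemTy;  // nullptr means "scalar leaf"
  std::size_t NumElems;        // array length; ignored for scalars
};

std::size_t countLeaves(const ToyArrayType &Ty) {
  if (!Ty.ElemTy)
    return 1;  // a scalar contributes a single leaf
  // One recursive call for the shared element layout, then scale by the
  // number of elements instead of recursing once per element.
  return Ty.NumElems * countLeaves(*Ty.ElemTy);
}
// e.g. a [4 x [8 x i32]] analogue flattens to 4 * 8 = 32 leaves with only
// two recursive calls, rather than one call per element.
```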
NVPTX: Use MapMetadata() instead of custom/stale/untested logic
Copy the `GVMap` over to a standard `ValueToValueMapTy` so that we can
reuse the `MapMetadata()` logic. Unfortunately the `GVMap` can't just
be replaced, since `MapMetadata()` likes to modify the map, but at least
this will prevent NVPTX from bitrotting.
Chandler Carruth [Wed, 14 Jan 2015 03:34:55 +0000 (03:34 +0000)]
[dom] Add a basic dominator tree test.
Apparently, we have *zero* basic testing of the dominator tree in the
regression test suite. There is a single test that even prints it out,
and that test only checks a single line of the output. There are
a handful of tests that check post dominators, but all of those are
looking for bugs rather than just exercising the basic machinery.
This test is super boring and unexciting. But hey, it's something.
I needed there to be something so I could switch the basic test to run
with both the old and new pass manager.
Hao Liu [Wed, 14 Jan 2015 03:02:16 +0000 (03:02 +0000)]
Fix a wrong comment in LoopVectorize.
I.e., more than two -> exactly two
Fix a typo in a function name in LoopVectorize.
I.e., collectStrideAcccess() -> collectStrideAccess()
Hal Finkel [Wed, 14 Jan 2015 01:37:21 +0000 (01:37 +0000)]
[PowerPC] Fix the noop-insert test
The form of nops used is CPU-specific (some CPUs, such as the POWER7, have
special group-terminating nops). We probably want a different callback for this
kind of nop insertion (something more like MCAsmBackend::writeNopData), or for
PPC to use a different mechanism for scheduling nops, but this will stop the
test from failing for now.
Matt Arsenault [Wed, 14 Jan 2015 01:35:22 +0000 (01:35 +0000)]
R600/SI: Fix bad code with unaligned byte vector loads
Don't do the v4i8 -> v4f32 combine if the load will need to
be expanded due to alignment. This stops us from adding instructions
that repack the bytes into a single register for the v_cvt_ubyteN_f32
instructions to read.
Matt Arsenault [Wed, 14 Jan 2015 01:35:17 +0000 (01:35 +0000)]
Implement new way of expanding extloads.
Now that the source and destination types can be specified,
allow doing an expansion that doesn't use an EXTLOAD of the
result type. Try to do a legal extload to an intermediate type
and extend that if possible.
This generalizes the special-case custom lowering of extloads that
R600 has been using to work around this problem.
This also happens to fix a bug that would incorrectly use more-aligned
loads than should be used.
This re-applies r225808, fixed to avoid problems with SDAG dependencies along
with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs.
These problems caused the original regression tests to assert/segfault on many
(but not all) systems.
Original commit message:
This commit does two things:
1. Refactors PPCFastISel to use more of the common infrastructure for call
lowering (this lets us take advantage of this common code for lowering some
common intrinsics, stackmap/patchpoint among them).
2. Adds support for stackmap/patchpoint lowering. For the most part, this is
very similar to the support in the AArch64 target, with the obvious differences
(different registers, NOP instructions, etc.). The test cases are adapted
from the AArch64 test cases.
One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).
StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!
JF Bastien [Wed, 14 Jan 2015 01:07:26 +0000 (01:07 +0000)]
Insert random noops to increase security against ROP attacks (llvm)
A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks.
Command line options:
-noop-insertion // Enable noop insertion.
-noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion)
-max-noops-per-instruction=X // Randomly generate up to X noops per instruction, i.e., roll the dice X times with the probability set above (default: 1). This doesn't guarantee X noop instructions.
In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future).
-fdiversify
This is the llvm part of the patch.
clang part: D3393
http://reviews.llvm.org/D3392
Patch by Stephen Crane (@rinon)
Hal Finkel [Wed, 14 Jan 2015 01:07:03 +0000 (01:07 +0000)]
Adjust ScheduleDAGSDNodes::RegDefIter for patchpoints
PATCHPOINT is a strange pseudo-instruction. Depending on how it is used, and
whether or not the AnyReg calling convention is being used, it might or might
not define a value. However, its TableGen definition says that it defines one
value, and so when it doesn't, the code in ScheduleDAGSDNodes::RegDefIter
becomes confused and the code that uses the RegDefIter will try to get the
register class of the MVT::Other type associated with the PATCHPOINT's chain
result (under certain circumstances).
This will be covered by the PPC64 PatchPoint test cases once that support is
re-committed.
Reid Kleckner [Wed, 14 Jan 2015 01:05:27 +0000 (01:05 +0000)]
CodeGen support for x86_64 SEH catch handlers in LLVM
This adds handling for ExceptionHandling::MSVC, used by the
x86_64-pc-windows-msvc triple. It assumes that filter functions have
already been outlined in either the frontend or the backend. Filter
functions are used in place of the landingpad catch clause type info
operands. In catch clause order, the first filter to return true will
catch the exception.
The C-specific handler table expects the landing pad to be split into
one block per handler, but LLVM IR uses a single landing pad for all
possible unwind actions. This patch papers over the mismatch by
synthesizing single instruction BBs for every catch clause to fill in
the EH selector that the landing pad block expects.
Missing functionality:
- Accessing data in the parent frame from outlined filters
- Cleanups (from __finally) are unsupported, as they will require
outlining and parent frame access
- Filter clauses are unsupported, as there's no clear analogue in SEH
In other words, this is the minimal set of changes needed to write IR to
catch arbitrary exceptions and resume normal execution.
Although this makes the `cast<>` assert more often, the
`assert(Node->isResolved())` on the following line would assert in all
those cases. So, no functionality change here.
Chandler Carruth [Wed, 14 Jan 2015 00:27:45 +0000 (00:27 +0000)]
Revert r225854: [PM] Move the LazyCallGraph printing functionality to
a print method.
This was formulated on a bad idea, but sadly I didn't uncover how bad
this was until I got further down the path. I had hoped that we could
provide a low boilerplate way of printing analyses, but it just doesn't
seem like this really fits the needs of the analyses. Not all analyses
really want to do printing, and those that do don't all use the same
interface. Instead, with the new pass manager let's just take advantage
of the fact that creating an explicit printer pass like the LCG has is
pretty low boilerplate already and rely on that for testing.
Chandler Carruth [Tue, 13 Jan 2015 23:53:50 +0000 (23:53 +0000)]
[PM] Move the LazyCallGraph printing functionality to a print method.
I'm adding generic analysis printing utility pass support which will
require such a method (or a specialization) so this will let the
existing printing logic satisfy that.
Adrian Prantl [Tue, 13 Jan 2015 23:39:15 +0000 (23:39 +0000)]
Debug Info: Bail out of AddMachineRegPiece() if MachineReg is not a
physical register. The call to getMinimalPhysRegClass() later on asserts
on this condition.
Adrian Prantl [Tue, 13 Jan 2015 23:39:11 +0000 (23:39 +0000)]
Debug Info: Move the complex expression handling (=the remainder) of
emitDebugLocValue() into DwarfExpression.
Ought to be NFC, but it actually uncovered a bug in the debug-loc-asan.ll
testcase. The testcase checks that the address of variable "y" is stored
at [RSP+16], which also lines up with the comment.
It also check(ed) that the *value* of "y" is stored in RDI before that,
but that is actually incorrect: as the assembler output shows, RDI holds
the very value that is stored in [RSP+16].
Fixed the comment to spell out the correct register and the check to
expect an address rather than a value.
Note that the range that is emitted for the RDI location was and is still
wrong, it claims to begin at the function prologue, but really it should
start where RDI is first assigned.
Tom Stellard [Tue, 13 Jan 2015 22:59:41 +0000 (22:59 +0000)]
R600/SI: Add pattern for bitcasting fp immediates to integers
The backend now assumes that all immediates are integers. This allows
us to simplify immediate handling code, because we no longer need to
handle fp and integer immediates differently.
Chandler Carruth [Tue, 13 Jan 2015 22:45:13 +0000 (22:45 +0000)]
[PM] Remove the defunct CGSCC-specific debug flag.
Even before I sunk the debug flag into the opt tool this had been made
obsolete by factoring the pass and analysis managers into a single set
of templates that all used the core flag. No functionality changed here.
Chandler Carruth [Tue, 13 Jan 2015 22:42:38 +0000 (22:42 +0000)]
[PM] Push the debug option for the new pass manager into the opt tool
and expose the necessary hooks in the API directly.
This makes it much cleaner for example to log the usage of a pass
manager from a library. It also makes it more obvious that this
functionality isn't "optional" or "asserts-only" for the pass manager.
This adds assembly and bitcode support for `MDLocation`. The assembly
side is rather big, since this is the first `MDNode` subclass (that
isn't `MDTuple`). Part of PR21433.
(If you're wondering where the mountains of testcase updates are, we
don't need them until I update `DILocation` and `DebugLoc` to actually
use this class.)
Matt Arsenault [Tue, 13 Jan 2015 20:53:23 +0000 (20:53 +0000)]
R600: Implement getRecipEstimate
This requires a new hook to prevent expanding sqrt in terms
of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are
all the same rate, so this expansion would just double the number
of instructions and cycles.
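As a rough sketch (not the actual R600 code, and assuming this era's hook signature with an out-parameter for the refinement-step count), an implementation could look like the following; the class and opcode names are placeholders:

```cpp
// Sketch only: hypothetical target, placeholder opcode for a hardware rcp node.
#include "llvm/Target/TargetLowering.h"
using namespace llvm;

enum { MYGPU_RCP = ISD::BUILTIN_OP_END + 1 };  // stand-in target opcode

class MyGPUTargetLowering : public TargetLowering {
public:
  SDValue getRecipEstimate(SDValue Operand, DAGCombinerInfo &DCI,
                           unsigned &RefinementSteps) const override {
    EVT VT = Operand.getValueType();
    if (VT != MVT::f32)
      return SDValue();   // let the generic expansion handle other types
    RefinementSteps = 0;  // the hardware estimate needs no Newton-Raphson steps
    return DCI.DAG.getNode(MYGPU_RCP, SDLoc(Operand), VT, Operand);
  }
};
```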
Matt Arsenault [Tue, 13 Jan 2015 20:53:18 +0000 (20:53 +0000)]
R600: Implement getRsqrtEstimate
Only do this for f32, since I'm unclear both on what this expects for the
refinement steps in terms of accuracy, and on what the f64 instruction
actually provides.
Add a new subclass of `UniquableMDNode`, `MDLocation`. This will be the
IR version of `DebugLoc` and `DILocation`. The goal is to rename this
to `DILocation` once the IR classes supersede the `DI`-prefixed
wrappers.
Matt Arsenault [Tue, 13 Jan 2015 19:46:48 +0000 (19:46 +0000)]
R600: Make cttz / ctlz cheap to speculate
Speculating things is generally good. SI+ has instructions for these
for 32-bit values. This is still probably better even with the expansion
for 64-bit values, although it is odd that this callback doesn't have
the size as a parameter.
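A minimal sketch of what opting in might look like in a hypothetical target; these are the no-argument hooks the message refers to, but the class name and header path here are illustrative only:

```cpp
// Sketch only: hypothetical target marking ctlz/cttz as cheap to speculate.
#include "llvm/Target/TargetLowering.h"

class MyGPULowering : public llvm::TargetLowering {
public:
  // Native 32-bit instructions exist, and even the 64-bit expansion is
  // considered cheap enough to hoist past a branch.
  bool isCheapToSpeculateCttz() const override { return true; }
  bool isCheapToSpeculateCtlz() const override { return true; }
};
```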
Ulrich Weigand [Tue, 13 Jan 2015 19:43:45 +0000 (19:43 +0000)]
Use the integrated assembler as default on PowerPC
This was already done in clang, this commit now uses the integrated
assembler as default when using LLVM tools directly.
A number of test cases using inline asm had to be adapted, either by
updating the expected output, or by using -no-integrated-as (for tests
that deliberately use an invalid instruction in inline asm).
Will Schmidt [Tue, 13 Jan 2015 18:17:08 +0000 (18:17 +0000)]
Update multiline.ll testcase to handle (ppc64le) .localentry directive
The ppc64le platform will emit a .localentry directive, which is triggering
a false positive against a "CHECK-NOT: .loc" in multiline.ll.
Add a space ("{{ }}") to the CHECK-NOT line so that it allows for arguments
and prevents .localentry from matching.
Hal Finkel [Tue, 13 Jan 2015 17:48:12 +0000 (17:48 +0000)]
[PowerPC] Add StackMap/PatchPoint support
This commit does two things:
1. Refactors PPCFastISel to use more of the common infrastructure for call
lowering (this lets us take advantage of this common code for lowering some
common intrinsics, stackmap/patchpoint among them).
2. Adds support for stackmap/patchpoint lowering. For the most part, this is
very similar to the support in the AArch64 target, with the obvious differences
(different registers, NOP instructions, etc.). The test cases are adapted
from the AArch64 test cases.
One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).
StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!
Hal Finkel [Tue, 13 Jan 2015 17:48:07 +0000 (17:48 +0000)]
[StackMaps] Use CurrentFnSymForSize
When computing the call-site offset, use AP.CurrentFnSymForSize instead of
AP.CurrentFnSym. There should be no change for other targets, but this is
necessary for generating valid expressions for PPC64/ELF.