Philip Reames [Wed, 7 Jan 2015 19:07:50 +0000 (19:07 +0000)]
Introduce an example statepoint GC strategy
This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generate correct stackmaps that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.)
Most of the validation logic added here as proof of concept will soon move in to the Verifier. That move is dependent on http://reviews.llvm.org/D6811
There was discussion in the review thread about addrspace(1) being reserved for something. I'm going to follow up on a seperate llvmdev thread. If needed, I'll update all the code at once.
Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others.
Ahmed Bougacha [Wed, 7 Jan 2015 18:39:00 +0000 (18:39 +0000)]
[CodeGen] Add MVT::FIRST_VALUETYPE to avoid explicit 0. NFC.
Many places reference MVT::LAST_VALUETYPE when iterating over all
valid MVTs, but they usually start with 0.
With FIRST_VALUETYPE, we can avoid explicit constants when we really
should be using MVT::SimpleValueType.
David Majnemer [Wed, 7 Jan 2015 18:14:07 +0000 (18:14 +0000)]
X86: Allow the stack probe size to be configurable per function
LLVM emits stack probes on Windows targets to ensure that the stack is
correctly accessed. However, the amount of stack allocated before
emitting such a probe is hardcoded to 4096.
It is desirable to have this be configurable so that a function might
opt-out of stack probes. Our level of granularity is at the function
level instead of, say, the module level to permit proper generation of
code after LTO.
Patch by Andrew H!
N.B. The inliner needs to be updated to properly consider what happens
after inlining a function with a specific stack-probe-size into another
function with a different stack-probe-size.
Ahmed Bougacha [Wed, 7 Jan 2015 17:33:03 +0000 (17:33 +0000)]
[X86] Teach FCOPYSIGN lowering to recognize constant magnitudes.
For code like:
float foo(float x) { return copysign(1.0, x); }
We used to generate:
andps <-0.000000e+00,0,0,0>, %xmm0
movss <1.000000e+00>, %xmm1
andps <nan>, %xmm1
orps %xmm0, %xmm1
Basically doing an abs(1.0f) in the two middle instructions.
We now generate:
andps <-0.000000e+00,0,0,0>, %xmm0
orps <1.000000e+00,0,0,0>, %xmm0
Builds on cleanups r223415, r223542.
rdar://19049548
Differential Revision: http://reviews.llvm.org/D6555
Improvements to emacs packages for llvm and tablegen mode.
* Both files have valid package headers and footers (you can verify
with M-x checkdoc).
* Fixed style warnings generated by checkdoc.
* Fixed a byte-compiler warning in llvm-mode.el.
* Ensure that the modes are autoloaded, so users do not need to
(require 'llvm-mode) to use them.
Aaron Ballman [Wed, 7 Jan 2015 14:26:07 +0000 (14:26 +0000)]
Manually specify the folder that Kaleidescope should reside in for CMake-produced solutions that care about such things (like MSVC). This takes the Kaleidescope target out of the root solution folder and places it into the Examples folder where it belongs.
Aaron Ballman [Wed, 7 Jan 2015 14:19:15 +0000 (14:19 +0000)]
Manually specify the folder that llvm-ranlib should reside in for CMake-produced solutions that care about such things (like MSVC). This takes llvm-ranlib out of the root solution folder and places it into the Tools folder where it belongs.
Jonas Paulsson [Wed, 7 Jan 2015 13:38:29 +0000 (13:38 +0000)]
New method SDep::isNormalMemoryOrBarrier() in ScheduleDAGInstrs.cpp.
Used to iterate over previously added memory dependencies in
adjustChainDeps() and iterateChainSucc().
SDep::isCtrl() was previously used in these places, that also gave
anti and output edges. The code may be worse if these are followed,
because MisNeedChainEdge() will conservatively return true since a
non-memory instruction has no memory operands, and a false chain dep
will be added. It is also unnecessary since all memory accesses of
interest will be reached by memory dependencies, and there is a budget
limit for the number of edges traversed.
This problem was found on an out-of-tree target with enabled alias
analysis. No test case for an in-tree target has been found.
[PM] Give slightly less horrible names to the utility pass templates for
requiring and invalidating specific analyses. Also make their printed
names match their class names. Writing these out as prose really doesn't
make sense to me any more.
Craig Topper [Wed, 7 Jan 2015 08:10:38 +0000 (08:10 +0000)]
[X86] Merge a switch statement inside a default case of another switch statement on the same variable. There was no additional code in the default so this should be no functional change.
Karthik Bhat [Wed, 7 Jan 2015 06:34:34 +0000 (06:34 +0000)]
Revert r225165 and r225169
Even thouh gcc produces simialr instructions as Owen pointed out the two patterns aren’t equivalent in the case
where the original subtraction could have caused an overflow.
Reverting the same.
Ahmed Bougacha [Wed, 7 Jan 2015 02:42:01 +0000 (02:42 +0000)]
[ADT][SmallVector] Flip an assert comparison to avoid overflows yielding false-negatives. NFC.
r221973 changed SmallVector::operator[] to use size_t instead of unsigned.
Before that, on 64bit platforms, when a large index (say -1) was passed,
truncating it to unsigned avoided an overflow when computing 'begin() + idx',
and failed the range checking assertion, as expected.
With r221973, idx isn't truncated, so the addition wraps to
'(char*)begin() - 1', and doesn't fire anymore when it should have done so.
This commit changes the comparison to instead compute 'end() - begin()'
(i.e., 'size()'), which avoids potentially overflowing additions, and
correctly triggers the assertion when values such as -1 are passed.
Note that the problem already existed before that revision, on platforms
where sizeof(size_t) == sizeof(unsigned).
We can't drop support for RAUW entirely in `MDNode`s, since it's
required for graph construction. This comment was from before I'd done
the math on that (out-of-tree), and never should have been committed.
[PM] Fix a pretty nasty bug where the new pass manager would invalidate
passes too many time.
I think this is actually the issue that someone raised with me at the
developer's meeting and in an email, but that we never really got to the
bottom of. Having all the testing utilities made it much easier to dig
down and uncover the core issue.
When a pass manager is running many passes over a single function, we
need it to invalidate the analyses between each run so that they can be
re-computed as needed. We also need to track the intersection of
preserved higher-level analyses across all the passes that we run (for
example, if there is one module analysis which all the function analyses
preserve, we want to track that and propagate it). Unfortunately, this
interacted poorly with any enclosing pass adaptor between two IR units.
It would see the intersection of preserved analyses, and need to
invalidate any other analyses, but some of the un-preserved analyses
might have already been invalidated *and recomputed*! We would fail to
propagate the fact that the analysis had already been invalidated.
The solution to this struck me as really strange at first, but the more
I thought about it, the more natural it seemed. After a nice discussion
with Duncan about it on IRC, it seemed even nicer. The idea is that
invalidating an analysis *causes* it to be preserved! Preserving the
lack of result is trivial. If it is recomputed, great. Until something
*else* invalidates it again, we're good.
The consequence of this is that the invalidate methods on the analysis
manager which operate over many passes now consume their
PreservedAnalyses object, update it to "preserve" every analysis pass to
which it delivers an invalidation (regardless of whether the pass
chooses to be removed, or handles the invalidation itself by updating
itself). Then we return this augmented set from the invalidate routine,
letting the pass manager take the result and use the intersection of
*that* across each pass run to compute the final preserved set. This
accounts for all the places where the early invalidation of an analysis
has already "preserved" it for a future run.
I've beefed up the testing and adjusted the assertions to show that we
no longer repeatedly invalidate or compute the analyses across nested
pass managers.
David Majnemer [Wed, 7 Jan 2015 00:39:50 +0000 (00:39 +0000)]
Analysis: Reformulate WillNotOverflowUnsignedAdd for reusability
WillNotOverflowUnsignedAdd's smarts will live in ValueTracking as
computeOverflowForUnsignedAdd. It now returns a tri-state result:
never overflows, always overflows and sometimes overflows.
Hal Finkel [Wed, 7 Jan 2015 00:15:29 +0000 (00:15 +0000)]
[PowerPC] Transform a README.txt entry into a FIXME
Remove the README.txt entry regarding register allocation of CR logical ops,
and replace it with a FIXME in PPCInstrInfo.td. The text in the README.txt was
not really accurate, and thanks goes to Pat Haugen (and Bill Schmidt) from IBM
for clarifying what was intended and highlighting the relevant text in the ISA
specification.
Add the missing `DEPENDS` keyword. r225319 did almost the right thing
(I didn't notice the problem with it because `Kaleidoscope-Ch8` wasn't
building at all).
Lang Hames [Tue, 6 Jan 2015 23:04:36 +0000 (23:04 +0000)]
Revert r224935 "Refactor duplicated code. No intended functionality change."
This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin.
Reverting to get the bots green while I track down the source of the changed
behavior.
Change the .ll syntax for comdats and add a syntactic sugar.
In order to make comdats always explicit in the IR, we decided to make
the syntax a bit more compact for the case of a GlobalObject in a
comdat with the same name.
Hal Finkel [Tue, 6 Jan 2015 22:31:02 +0000 (22:31 +0000)]
[PowerPC] Reuse a load operand in int->fp conversions
int->fp conversions on PPC must be done through memory loads and stores. On a
modern core, this process begins by storing the int value to memory, then
loading it using a (sometimes special) FP load instruction. Unfortunately, we
would do this even when the value to be converted was itself a load, and we can
just use that same memory location instead of copying it to another first.
There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs,
because the fp_to_int operand has not been lowered when the int_to_fp is being
lowered. We handle this specially by invoking fp_to_int's lowering logic
(partially) and getting the necessary memory location (some trivial refactoring
was done to make this possible).
This is all somewhat ugly, and it would be nice if some later CodeGen stage
could just clean this stuff up, but because doing so would involve modifying
target-specific nodes (or instructions), it is not immediately clear how that
would work.
Also, remove a related entry from the README.txt for which we now generate
reasonable code.
Mehdi Amini [Tue, 6 Jan 2015 20:05:02 +0000 (20:05 +0000)]
Use a Factory Method for MachineFunctionInfo Creation
The goal is to allows MachineFunctionInfo to override this create
function to customize the creation.
No change intended in existing backend in this patch.
Tom Stellard [Tue, 6 Jan 2015 19:52:04 +0000 (19:52 +0000)]
R600/SI: Fix dependency calculation for DS writes instructions in SIInsertWaits
In DS write instructions, the address operand comes before the value
operand(s) which is reversed from every other instruction type.
The SIInsertWait assumed that the first use for each instruction
was the value, so for DS write it was protecting the address
operand with s_waitcnt instructions when it should have been
protecting the value operand.
Sanjoy Das [Tue, 6 Jan 2015 19:02:56 +0000 (19:02 +0000)]
This patch teaches IndVarSimplify to add nuw and nsw to certain kinds
of operations that provably don't overflow. For example, we can prove
%civ.inc below does not sign-overflow. With this change,
IndVarSimplify changes %civ.inc to an add nsw.
Tom Stellard [Tue, 6 Jan 2015 18:00:21 +0000 (18:00 +0000)]
R600/SI: Add a stub GCNTargetMachine
This is equivalent to the AMDGPUTargetMachine now, but it is the
starting point for separating R600 and GCN functionality into separate
targets.
It is recommened that users start using the gcn triple for GCN-based
GPUs, because using the r600 triple for these GPUs will be deprecated in
the future.
[CodeGenPrepare] Improved logic to speculate calls to cttz/ctlz.
This patch improves the logic added at revision 224899 (see review D6728) that
teaches the backend when it is profitable to speculate calls to cttz/ctlz.
The original algorithm conservatively avoided speculating more than one
instruction from a basic block in a control flow grap modelling an if-statement.
In particular, the only allowed instruction (excluding the terminator) was a
call to cttz/ctlz. However, there are cases where we could be less conservative
and still be able to speculate a call to cttz/ctlz.
With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the
result is zero extended/truncated in the same basic block, and the zext/trunc
instruction is "free" for the target.
Adrian Prantl [Tue, 6 Jan 2015 17:14:10 +0000 (17:14 +0000)]
Reapply: Teach SROA how to update debug info for fragmented variables.
This also rolls in the changes discussed in http://reviews.llvm.org/D6766.
Defers migrating the debug info for new allocas until after all partitions
are created.
Adrian Prantl [Tue, 6 Jan 2015 16:50:25 +0000 (16:50 +0000)]
Implement a very basic colored syntax highlighting for llvm-dwarfdump.
The color scheme is the same as the one used by the colorize dwarfdump
script on Darwin.
A new --color option can be used to forcibly turn color on or off.
Hal Finkel [Tue, 6 Jan 2015 16:46:37 +0000 (16:46 +0000)]
[PowerPC] Add a regression test for r225251
In r225251, I removed an old entry from the README.txt file. While there are
several contributing factors (including pieces in Clang's ABI code), upon
further reflection, the backend part deserves a regression test.
No functional changes. Support for ARM's modified immediate syntax was added
in r223113 and r223115 (review: D6408). That patch introduced the mod_imm*
tblegen definitions which renders the existing so_imm* definitions redundant.
This patch gets rid of them completely.
Matt Arsenault [Tue, 6 Jan 2015 15:50:59 +0000 (15:50 +0000)]
Convert fcmp with 0.0 from casted integers to icmp
This is already handled in general when it is known the
conversion can't lose bits with smaller integer types
casted into wider floating point types.
This pattern happens somewhat often in GPU programs that cast
workitem intrinsics to float, which are often compared with 0.
Specifically handle the special case of compares with zero which
should also be known to not lose information. I had a more general
version of this which allows equality compares if the casted float is
exactly representable in the integer, but I'm not 100% confident that
is always correct.
Also fold cases that aren't integers to true / false.
[PM] Introduce a utility pass that preserves no analyses.
Use this to test that path of invalidation. This test actually shows
redundant invalidation here that is really bad. I'm going to work on
fixing that next, but wanted to commit the test harness now that its all
working.
Craig Topper [Tue, 6 Jan 2015 08:59:30 +0000 (08:59 +0000)]
[X86] Add OpSize32 to XBEGIN_4. Add XBEGIN_2 with OpSize16.
Requires new AsmParserOperand types that detect 16-bit and 32/64-bit mode so that we choose the right instruction based on default sizing without predicates. This is necessary since predicates mess up the disassembler table building.
[PM] Simplify how we parse the outer layer of the pass pipeline text and
remove an extra, redundant pass manager wrapping every run.
I had kept seeing these when manually testing, but it was getting really
annoying and was going to cause problems with overly eager invalidation.
The root cause was an overly complex and unnecessary pile of code for
parsing the outer layer of the pass pipeline. We can instead delegate
most of this to the recursive pipeline parsing.
I've added some somewhat more basic and precise tests to catch this.
Craig Topper [Tue, 6 Jan 2015 07:35:50 +0000 (07:35 +0000)]
[X86] Make isel select the 2-byte register form of INC/DEC even in non-64-bit mode. Convert to the 1-byte form in non-64-bit mode as part of MCInst lowering.
Overall this seems simpler. It reduces duplication of patterns between both modes and it simplifies the memory folding/unfolding tables as they don't need to create fake instructions just to keep track of 64-bitness.
The old target DAG combine that allowed for performing int_to_fp(fp_to_int(x))
without a load/store pair is updated here with support for unsigned integers,
and to support single-precision values without a third rounding step, on newer
cores with the appropriate instructions.
[PM] Add a utility pass template that synthesizes the invalidation of
a specific analysis result.
This is quite handy to test things, and will also likely be very useful
for debugging issues. You could narrow down pass validation failures by
walking these invalidate pass runs up and down the pass pipeline, etc.
I've added support to the pass pipeline parsing to be able to create one
of these for any analysis pass desired.
Just adding this class uncovered one latent bug where the
AnalysisManager CRTP base class had a hard-coded Module type rather than
using IRUnitT.
I've also added tests for invalidation and caching of analyses in
a basic way across all the pass managers. These in turn uncovered two
more bugs where we failed to correctly invalidate an analysis -- its
results were invalidated but the key for re-running the pass was never
cleared and so it was never re-run. Quite nasty. I'm very glad to debug
this here rather than with a full system.
Also, yes, the naming here is horrid. I'm going to update some of the
names to be slightly less awful shortly. But really, I've no "good"
ideas for naming. I'll be satisfied if I can get it to "not bad".
[PM] Simplify how we use the registry by including it only once. Still
more verbose than I'd like, but the code really isn't that interesting,
and this still seems vastly simpler than any other solutions I've come
up with. =] Maybe if we get to the 10th IR unit, this will be a problem
in practice.
Craig Topper [Tue, 6 Jan 2015 04:23:57 +0000 (04:23 +0000)]
[X86] Remove 16-bit and 32-bit offset jump instructions from the AsmParser. We always select the 8-bit size and let the assembler backend relax to the larger size.
Craig Topper [Tue, 6 Jan 2015 04:23:53 +0000 (04:23 +0000)]
[X86] Make isel select the shorter form of jump instructions instead of the long form.
The assembler backend will relax to the long form if necessary. This removes a swap from long form to short form in the MCInstLowering code. Selecting the long form used to be required by the old JIT.
[PM] Sink the no-op pass parsing logic into the .def-based registry to
simplify things. This will become more important as I add no-op analyses
that want to re-use the logic we already have for analyses in the
registry. For now, no functionality changed.
[PM] Move the analysis registry into the Passes.cpp file and provide
a normal interface for it in Passes.h.
This gives us essentially a single interface for running pass managers
which are provided from the bottom of the LLVM stack through interfaces
at the top of the LLVM stack that populate them with all of the
different analyses available throughout. It also means there is a single
blob of code that needs to include all of the pass headers and needs to
deal with the registry of passes and parsing names.
No functionality changed intended, should just be cleanup.
[PM] Add a utility to the new pass manager for generating a pass which
is a no-op other than requiring some analysis results be available.
This can be used in real pass pipelines to force the usually lazy
analysis running to eagerly compute something at a specific point, and
it can be used to test the pass manager infrastructure (my primary use
at the moment).
I've also added bit of pipeline parsing magic to support generating
these directly from the opt command so that you can directly use these
when debugging your analysis. The syntax is:
require<analysis-name>
This can be used at any level of the pass manager. For example:
This would produce a no-op function pass requiring my-analysis, followed
by a fully no-op function pass, both of these in a function pass manager
which is nested inside of a bottom-up CGSCC pass manager which is in the
top-level (implicit) module pass manager.
I have zero attachment to the particular syntax I'm using here. Consider
it a straw man for use while I'm testing and fleshing things out.
Suggestions for better syntax welcome, and I'll update everything based
on any consensus that develops.
I've used this new functionality to more directly test the analysis
printing rather than relying on the cgscc pass manager running an
analysis for me. This is still minimally tested because I need to have
analyses to run first! ;] That patch is next, but wanted to keep this
one separate for easier review and discussion.
Now that `LLVMContextImpl` can call `MDNode::dropAllReferences()` to
prevent teardown madness, stop dropping uniquing just because an operand
drops to null.