Prior to this commit, physical registers defined implicitly were considered free
right after their definition, i.e., like dead definitions. Therefore, their uses
had to immediately follow their definitions; otherwise, the related register could
be reused to allocate a virtual register.
This commit fixes this assumption by keeping implicit definitions alive until
they are actually used. The downside is that if the implicit definition was dead
(and not marked as such), we block an otherwise available register. This is,
however, conservatively correct, and it makes the fast register allocator much
more robust, in particular with respect to instruction scheduling.
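Below is a minimal, self-contained sketch of the liveness rule adopted here. The data structures are hypothetical (not the actual fast-allocator code); it only illustrates that a physical register written by an implicit definition stays reserved until an instruction actually reads it, instead of being freed immediately as a dead definition would be.

  // Standalone model of the new rule: implicit defs keep their physical
  // register reserved until a use is seen (hypothetical types, illustration only).
  #include <cstdio>
  #include <set>
  #include <vector>

  struct Inst {
    std::vector<int> ImplicitDefs; // physical registers implicitly defined
    std::vector<int> Uses;         // physical registers read
  };

  int main() {
    // Reg 1 is implicitly defined by the first instruction but only used by
    // the third one; an unrelated instruction was scheduled in between.
    std::vector<Inst> Block = {{{1}, {}}, {{}, {}}, {{}, {1}}};

    std::set<int> Reserved; // registers the allocator must not hand out
    for (const Inst &I : Block) {
      for (int R : I.Uses)
        Reserved.erase(R);  // the pending implicit def has been consumed
      for (int R : I.ImplicitDefs)
        Reserved.insert(R); // keep the register alive until it is used
      std::printf("reserved after inst: %zu\n", Reserved.size());
    }
    return 0;
  }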
Split the set of identified struct types into opaque and non-opaque ones.
The non-opaque part can be structurally uniqued. To keep this to just
a hash lookup, we don't try to unique cyclic types.
Also change the type mapping algorithm to be optimistic about a type
not being recursive and only create a new type when proven to be wrong.
This is not as strong as trying to speculate that we can keep the source
type, but it is simpler (no speculation to revert) and more powerful
than what we had before (at least we no longer copy non-recursive types).
I initially wrote this to try to replace the name-based type merging.
It is not strong enough to replace it, but it is a useful addition.
With this patch, the number of named struct types in a clang LTO bootstrap goes
from 49674 to 15986.
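As a rough, self-contained illustration of the split (hypothetical data structures, not the linker's real ones), non-opaque identified structs can be keyed by their body and reused via a single lookup, while opaque ones are only collected:

  // Toy model: unique non-opaque structs structurally, keep opaque ones
  // separate. The real code uses a hash of the body; a plain map keeps this
  // sketch short, and cyclic (recursive) types would not be keyed this way.
  #include <map>
  #include <string>
  #include <vector>

  struct SimpleStruct {
    std::string Name;
    bool Opaque;
    std::vector<std::string> ElementSigs; // element type signatures
  };

  int main() {
    std::vector<SimpleStruct> Identified = {
        {"struct.A", false, {"i32", "i8*"}},
        {"struct.A.1", false, {"i32", "i8*"}}, // structurally identical copy
        {"struct.B", true, {}},                // opaque: cannot be uniqued
    };

    std::vector<const SimpleStruct *> OpaqueSet;
    std::map<std::vector<std::string>, const SimpleStruct *> NonOpaqueByBody;

    for (const SimpleStruct &S : Identified) {
      if (S.Opaque) {
        OpaqueSet.push_back(&S);
        continue;
      }
      // One lookup: the first struct seen with this body wins, so struct.A.1
      // maps to struct.A instead of surviving as a duplicate.
      NonOpaqueByBody.emplace(S.ElementSigs, &S);
    }
    return 0;
  }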
Philip Reames [Wed, 3 Dec 2014 19:53:15 +0000 (19:53 +0000)]
Make the Verifier more strict about gc.statepoints
The recently added documentation for statepoints claimed that we checked the parameters of the various intrinsics for validity. This patch adds the code to actually do so. I also removed a couple of redundant checks for conditions which are checked elsewhere in the Verifier and simplified the logic using the helper functions from Statepoint.h.
Erik Eckstein [Wed, 3 Dec 2014 10:39:15 +0000 (10:39 +0000)]
InstCombine: simplify signed range checks
Try to convert two compares of a signed range check into a single unsigned compare.
Examples:
(icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n
(icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n
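A small standalone C++ check of the equivalence behind the first fold, for a signed x and a non-negative bound n (the negative-x case works because x wraps to a large unsigned value):

  #include <cassert>
  #include <cstdint>
  #include <initializer_list>

  static bool signedRangeCheck(int32_t x, int32_t n) { return x >= 0 && x < n; }
  static bool unsignedForm(int32_t x, int32_t n) {
    // Matches 'icmp ult x, n': compare both sides as unsigned values.
    return static_cast<uint32_t>(x) < static_cast<uint32_t>(n);
  }

  int main() {
    for (int32_t x : {-5, -1, 0, 3, 9, 10, 11})
      assert(signedRangeCheck(x, 10) == unsignedForm(x, 10));
    return 0;
  }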
Hal Finkel [Wed, 3 Dec 2014 09:37:50 +0000 (09:37 +0000)]
[PowerPC] Print all inline-asm consts as signed numbers
Almost all immediates in PowerPC assembly (both 32-bit and 64-bit) are signed
numbers, and it is important that we print them as such. To make sure that
happens, we change PPCTargetLowering::LowerAsmOperandForConstraint so that it
does all intermediate checks on a sign-extended int64_t value, and then
creates the resulting target constant using MVT::i64. This will ensure that all
negative values are printed as negative values (mirroring what is done in other
backends to achieve the same sign-extension effect).
This came up in the context of inline assembly like this:
"add%I2 %0,%0,%2", ..., "Ir"(-1ll)
where we used to print:
addi 3,3,4294967295
and gcc would print:
addi 3,3,-1
and gas accepts both forms, but our builtin assembler (correctly) does not. Now
we print -1 like gcc does.
While here, I replaced a bunch of custom integer checks with isInt<16> and
friends from MathExtras.h.
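For reference, a minimal sketch of what those helpers check (this is not the PPC lowering code, just the MathExtras.h predicates applied to the sign-extended value):

  #include "llvm/Support/MathExtras.h"
  #include <cassert>
  #include <cstdint>

  int main() {
    int64_t NegOne = -1;              // the -1ll from the inline asm above
    assert(llvm::isInt<16>(NegOne));  // fits a signed 16-bit immediate
    assert(!llvm::isInt<16>(65535));  // 0xFFFF does not fit as a signed imm
    assert(llvm::isUInt<16>(65535));  // but it does fit as an unsigned one
    return 0;
  }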
Charlie Turner [Wed, 3 Dec 2014 08:12:26 +0000 (08:12 +0000)]
Emit ABI_FP_rounding attribute.
LLVM understands the -enable-sign-dependent-rounding-fp-math codegen option. When
the user has specified this option, the Tag_ABI_FP_rounding attribute should be
emitted with value 1. As far as I can tell, this option does not currently disable
transformations and optimizations that assume default floating-point rounding
behavior, but the intention should still be recorded in the build attributes,
regardless of what the compiler actually does with that intention.
When lazily reading a module, the types used in a function body are not visible to
a TypeFinder until the body is read.
This patch fixes that by asking the module for its identified struct types.
If a materializer is present, the module asks it. If not, it uses a TypeFinder.
This fixes pr21374.
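A rough sketch of that dispatch as a standalone helper (the real logic lives in Module, and the materializer method name mirrors the one this patch adds):

  #include "llvm/IR/GVMaterializer.h"
  #include "llvm/IR/Module.h"
  #include "llvm/IR/TypeFinder.h"
  #include <vector>
  using namespace llvm;

  static std::vector<StructType *> identifiedStructTypes(Module &M) {
    // A materializer knows about types used by function bodies that have not
    // been read yet, so it must be the one to answer.
    if (GVMaterializer *GVM = M.getMaterializer())
      return GVM->getIdentifiedStructTypes();

    // Otherwise the whole module is in memory and a TypeFinder walk suffices.
    TypeFinder Finder;
    Finder.run(M, /*OnlyNamed=*/true);
    return std::vector<StructType *>(Finder.begin(), Finder.end());
  }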
I will be the first to say that this is ugly, but it was the best I could find.
Some of the options I looked at:
* Asking the LLVMContext. This could be made to work for gold, but not currently
for ld64. ld64 will load multiple modules into a single context before merging
them. This causes us to see types from future merges. Unfortunately,
MappedTypes is not just a cache when it comes to opaque types. Once the
mapping has been made, we have to remember it for as long as the key may
be used. This would mean moving MappedTypes to the Linker class and having
to drop the Linker::LinkModules static methods, which are visible from C.
* Adding an option to ignore function bodies in the TypeFinder. This would
fix the PR by picking the worst result. It would work, but unfortunately
we are currently quite dependent on the upfront type merging. I will
try to reduce our dependency, but it is not clear that we will be able
to get rid of it for now.
The only clean solution I could think of is making the Module own the types.
This would have other advantages, but it is a much bigger change. I will
propose it, but it is nice to have this fixed while that is discussed.
With the gold plugin, this patch takes the number of types in the LTO clang
binary from 52817 to 49669.
Remove an unnecessary `MDNode::replaceAllUsesWith()`. In the preceding
line, `TheLoop->setLoopID()` visits all backedges and sets the new loop
ID. This sufficiently updates the loop metadata.
Matt Arsenault [Wed, 3 Dec 2014 05:22:35 +0000 (05:22 +0000)]
R600/SI: Remove i1 pseudo VALU ops
Select i1 logical ops directly to 64-bit SALU instructions.
Vector i1 values really always live in SGPRs, with one bit
for each item in the wave. This saves about 4 instructions
when and/or/xor'ing any condition, and also helps write conditions
that need to be passed in vcc.
This should work correctly now that the SGPR live range
fixing pass works. More work is needed to eliminate the VReg_1
pseudo regclass and possibly the entire SILowerI1Copies pass.
Matt Arsenault [Wed, 3 Dec 2014 05:22:32 +0000 (05:22 +0000)]
R600/SI: Fix suspicious indexing
The loop is over the operands of an instruction, and checks the
register with the sub-register index of the dest register. This was probably
meant to check the sub-register index of the same operand.
Matt Arsenault [Wed, 3 Dec 2014 05:22:29 +0000 (05:22 +0000)]
R600/SI: Fix live range error hidden by SIFoldOperands
m0 is treated as a virtual register class with a single register
rather than the physical register it really is. This was updating
the live range of the used virtual copy of m0 from the first ds_read
instruction, and leaving the unused copy unchanged. This resulted in a
"Live segment doesn't end at a valid instruction" verifier error because of
the erased instructions. Update the live range of the second copy (which
should be dead) instead.
No test since I'm not sure how to trigger this with SIFoldOperands
enabled.
Tom Stellard [Wed, 3 Dec 2014 04:28:32 +0000 (04:28 +0000)]
StructurizeCFG: Use LoopInfo analysis for better loop detection
We were assuming that each back-edge in a region represented a unique
loop, which is not always the case. We need to use LoopInfo to
correctly determine which back-edges are loops.
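A hedged sketch of the kind of check this enables (simplified, not the StructurizeCFG code itself): an edge Src -> Dst is a natural-loop back-edge only if Dst is the header of a loop that also contains Src.

  #include "llvm/Analysis/LoopInfo.h"
  #include "llvm/IR/BasicBlock.h"
  using namespace llvm;

  static bool isLoopBackEdge(const LoopInfo &LI, const BasicBlock *Src,
                             const BasicBlock *Dst) {
    const Loop *L = LI.getLoopFor(Dst);
    // Not every back-edge in the region belongs to a loop; require a real
    // natural loop headed by Dst that contains the edge's source.
    return L && L->getHeader() == Dst && L->contains(Src);
  }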
Nick Lewycky [Wed, 3 Dec 2014 02:45:01 +0000 (02:45 +0000)]
Emit the entry block first and the exit block second, then all the blocks in between. This is what gcc always does, and some out-of-tree tools depend on that.
This redefines the `prefix` attribute introduced previously and
introduces a `prologue` attribute. There are three primary use cases
that these attributes aim to serve:
1. Function prologue sigils
2. Function hot-patching: Enable the user to insert `nop` operations
at the beginning of the function which can later be safely replaced
with a call to some instrumentation facility
3. Runtime metadata: Allow a compiler to insert data for use by the
runtime during execution. GHC is one example of a compiler that
needs this functionality for its tables-next-to-code functionality.
Previously `prefix` served cases (1) and (2) quite well by allowing the user
to introduce arbitrary data at the entrypoint but before the function
body. Case (3), however, was poorly handled by this approach as it
required that prefix data be valid executable code.
Here we redefine the notion of prefix data to instead be data which
occurs immediately before the function entrypoint (i.e. the symbol
address). Since prefix data now occurs before the function entrypoint,
there is no need for the data to be valid code.
The previous notion of prefix data now goes under the name "prologue
data" to emphasize its duality with the function epilogue.
The intention here is to handle cases (1) and (2) with prologue data and
case (3) with prefix data.
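A hedged sketch of how the two look from the C++ API after this change (assuming a Function *F; the prologue constant here is an arbitrary, target-specific encoding chosen purely for illustration):

  #include "llvm/IR/Constants.h"
  #include "llvm/IR/Function.h"
  using namespace llvm;

  void annotate(Function *F) {
    LLVMContext &Ctx = F->getContext();
    // Case (3): runtime metadata placed immediately before the function's
    // symbol address; it never executes, so it need not be valid code.
    F->setPrefixData(ConstantInt::get(Type::getInt64Ty(Ctx), 0xdeadbeef));
    // Cases (1)/(2): prologue data is emitted at the entrypoint and falls
    // through into the body, so it must encode valid instructions (the value
    // below is only a placeholder).
    F->setPrologueData(ConstantInt::get(Type::getInt32Ty(Ctx), 0xd503201f));
  }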
References
----------
This idea arose out of discussions[1] with Reid Kleckner in response to a
proposal to introduce the notion of symbol offsets to enable handling of
case (3).
The X86AsmParser Intel-syntax handling was refactored in r216481, making it
try each different memory operand size to see which one matches.
Operand sizes larger than 80 bits ("[xyz]mmword ptr") were forgotten, which
led to an "invalid operand" error for code such as:
movdqa [rax], xmm0
Tim Northover [Tue, 2 Dec 2014 23:53:43 +0000 (23:53 +0000)]
AArch64: strengthen Darwin ABI alignment assumptions
A global variable without an explicit alignment specified should be assumed to
be ABI-aligned according to its type, like on other platforms. This allows us
to use better memory operations when accessing it.
Tim Northover [Tue, 2 Dec 2014 23:13:39 +0000 (23:13 +0000)]
AArch64: don't be too greedy when folding :lo12: accesses into mem ops.
This frequently leads to cases like:
ldr xD, [xN, :lo12:var]
add xA, xN, :lo12:var
ldr xD, [xA, #8]
where the ADD would have been needed anyway, and the two distinct addressing
modes can prevent the formation of an ldp. Because of how we handle ADRP
(aggressively forming an ADRP/ADD pseudo-inst at ISel time), this pattern also
results in duplicated ADRP instructions (one on its own to cover the ldr, and
one combined with the add).
Such loops shouldn't be vectorized due to the loops' form.
After applying loop-rotate (+simplifycfg), the tests again check
what they are intended to check.
Simon Pilgrim [Tue, 2 Dec 2014 22:31:23 +0000 (22:31 +0000)]
[X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets
4i32 shuffles for single insertions into zero vectors lower to X86vzmovl, which was using (v)blendps, causing domain-switch stalls. This patch fixes this by using (v)pblendw instead.
The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch.
Hal Finkel [Tue, 2 Dec 2014 22:01:00 +0000 (22:01 +0000)]
[PowerPC] Implement readcyclecounter for PPC32
We've long supported readcyclecounter on PPC64, but it is easier there (the
read of the 64-bit time-base register can be accomplished via a single
instruction). This now provides an implementation for PPC32 as well. On PPC32,
the time-base register is still 64 bits, but can only be read 32 bits at a time
via two separate SPRs. The ISA manual explains how to do this properly (it
involves re-reading the upper bits and looping if the counter has wrapped while
being read).
This requires PPC to implement a custom integer splitting legalization for the
READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then
gets turned into a pseudo-instruction, which is then expanded to the necessary
sequence (which has three SPR reads, the comparison and the branch).
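For illustration, here is the same retry idea written as user-level GCC-style inline asm for 32-bit PowerPC (not the SDAG expansion itself, and it only compiles when targeting PPC32):

  #include <cstdint>

  static inline uint64_t readTimeBase32() {
    uint32_t hi, lo, hi2;
    do {
      asm volatile("mftbu %0" : "=r"(hi));  // upper 32 bits of the time base
      asm volatile("mftb  %0" : "=r"(lo));  // lower 32 bits
      asm volatile("mftbu %0" : "=r"(hi2)); // re-read the upper half
    } while (hi != hi2);                    // retry if the upper half changed
    return (static_cast<uint64_t>(hi) << 32) | lo;
  }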
Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.
Lang Hames [Tue, 2 Dec 2014 21:36:24 +0000 (21:36 +0000)]
[AArch64][Stackmaps] Optimize stackmap shadows on AArch64.
Reduce the number of nops emitted for stackmap shadows on AArch64 by counting
non-stackmap instructions up to the next branch target towards the requested
shadow.
Daniel Sanders [Tue, 2 Dec 2014 20:40:27 +0000 (20:40 +0000)]
[mips] Fix passing of small structures for big-endian O32.
Summary:
Like N32/N64, they must be passed in the upper bits of the register.
The new code could be merged with the existing if-statements, but I've
refrained from doing so since it would make porting the O32 implementation
to tablegen harder later.
Roman Divacky [Tue, 2 Dec 2014 20:03:22 +0000 (20:03 +0000)]
Introduce CPUStringIsValid() into MCSubtargetInfo and use it for ARM .cpu parsing.
Previously, the .cpu directive in the ARM assembler didn't switch to the new CPU and
therefore acted as a nop. This implements a real action for .cpu and, e.g.,
allows assembling the FreeBSD kernel with -integrated-as.
Philip Reames [Tue, 2 Dec 2014 19:37:00 +0000 (19:37 +0000)]
[Statepoints 4/4] Statepoint infrastructure for garbage collection: Documentation
This is the fourth and final patch in the statepoint series. It contains the documentation for the statepoint intrinsics and their usage.
There's definitely still room to improve the documentation here, but I wanted to get this landed so it was available for others. There will likely be a series of small cleanup changes over the next few weeks as we work to clarify and revise the documentation. If you have comments or questions, please feel free to discuss them either in this commit thread, the original review thread, or on llvmdev. Comments are more than welcome.
Reid Kleckner [Tue, 2 Dec 2014 18:59:08 +0000 (18:59 +0000)]
cmake: Remove MAXPATHLEN define as autoconf does not provide it
Presumably it was added to the CMake system when MAXPATHLEN was still
used by code built for Windows. Currently only lib/Support/Path.inc uses
MAXPATHLEN, and it should be available on all Unices.
Philip Reames [Tue, 2 Dec 2014 18:50:36 +0000 (18:50 +0000)]
[Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder
This is the third patch in a small series. It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085). The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.
With this change, gc.statepoints should be functionally complete. The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.
I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated. The statepoint lowering code is split into its own files, and anyone not working on the statepoint support itself should be able to ignore it.
During the lowering process, we currently spill aggressively to the stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straightforward, and closely matches the implementation of the patchpoint intrinsics. Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints. Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack. The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.
In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator. In principle, we shouldn't need to eagerly spill at all. The register allocator should do any spilling required, and the statepoint should simply record that fact. Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stopgap measure.
Ahmed Bougacha [Tue, 2 Dec 2014 18:09:51 +0000 (18:09 +0000)]
[MachineCSE] Clear kill-flag on registers imp-def'd by the CSE'd instruction.
Go through implicit defs of CSMI and MI, and clear the kill flags on
their uses in all the instructions between CSMI and MI.
We might have made some of the kill flags redundant, consider:
subs ... %NZCV<imp-def> <- CSMI
csinc ... %NZCV<imp-use,kill> <- this kill flag isn't valid anymore
subs ... %NZCV<imp-def> <- MI, to be eliminated
csinc ... %NZCV<imp-use,kill>
Since we eliminated MI and reused a register imp-def'd by CSMI
(here %NZCV), that register, if it was killed before MI, should have
that kill flag removed, because its lifetime was extended.
Also, add an exhaustive testcase for the motivating example.
Tim Northover [Tue, 2 Dec 2014 17:15:22 +0000 (17:15 +0000)]
AArch64: make register block rules apply to vector types too.
The blocking code originated in ARM, which is more aggressive about casting
types to a canonical representative before doing anything else, so I missed
most vector HFAs and broke the ABI. This should fix it.
Tom Stellard [Tue, 2 Dec 2014 16:45:47 +0000 (16:45 +0000)]
Triple: Add AMDHSA operating system type
This operating system type represents the AMD HSA runtime,
and will be required by the R600 backend in order to generate
correct code for this runtime.
[LICM] Avoid store sinking if no preheader is available
Load instructions are inserted into loop preheaders when sinking stores
and later removed if not used by the SSA updater. Avoid sinking if the
loop has no preheader, which also avoids crashes. This fixes one more side effect
of not handling indirectbr instructions properly in LoopSimplify.
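A minimal sketch of the guard this amounts to (simplified; the real check sits inside LICM's store-sinking logic):

  #include "llvm/Analysis/LoopInfo.h"
  using namespace llvm;

  static bool canSinkStoresOutOf(const Loop *L) {
    // Without a preheader there is no block to host the loads the SSA
    // updater may need, so skip the transformation entirely.
    return L->getLoopPreheader() != nullptr;
  }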