Alex Lorenz [Tue, 19 May 2015 18:17:39 +0000 (18:17 +0000)]
MIR Serialization: print and parse LLVM IR using MIR format.
This commit is the initial commit for the MIR serialization project.
It creates a new library under CodeGen called 'MIR'. This new
library adds a new machine function pass that prints out the LLVM IR
using the MIR format. This pass is then added as a last pass when a
'stop-after' option is used in llc. The new library adds the initial
functionality for parsing of MIR files as well. This commit also
extends the llc tool so that it can recognize and parse MIR input files.
Reviewers: Duncan P. N. Exon Smith, Matthias Braun, Philip Reames
Wei Mi [Tue, 19 May 2015 16:09:11 +0000 (16:09 +0000)]
Remove the InstructionSimplifierPass immediately after InstructionCombiningPass.
InstructionCombiningPass was added after LoopUnrollPass in r237395. Because
InstructionCombiningPass is strictly more powerful than InstructionSimplifierPass,
remove the unnecessary InstructionSimplifierPass.
Igor Laevsky [Tue, 19 May 2015 15:59:05 +0000 (15:59 +0000)]
[RewriteStatepointsForGC] For some values (like gep's and bitcasts) it's cheaper to clone them after statepoint than to emit proper relocates for them. This change implements this logic. There is alredy similar optimization in CodeGenPrepare, but doing so during RewriteStatepointsForGC allows to capture more opprtunities such as relocates in loops and longer instruction chains.
Daniel Sanders [Tue, 19 May 2015 12:24:52 +0000 (12:24 +0000)]
[mips] Correct and improve special-case shuffle instructions.
Summary:
The documentation writes vectors highest-index first whereas LLVM-IR writes
them lowest-index first. As a result, instructions defined in terms of
left_half() and right_half() had the halves reversed.
In addition to correcting them, they have been improved to allow shuffles
that use the same operand twice or in reverse order. For example, ilvev
used to accept masks of the form:
<0, n, 2, n+2, 4, n+4, ...>
but now accepts:
<0, 0, 2, 2, 4, 4, ...>
<n, n, n+2, n+2, n+4, n+4, ...>
<0, n, 2, n+2, 4, n+4, ...>
<n, 0, n+2, 2, n+4, 4, ...>
One further improvement is that splati.[bhwd] is now the preferred instruction
for splat-like operations. The other special shuffles are no longer used
for splats. This lead to the discovery that <0, 0, ...> would not cause
splati.[hwd] to be selected and this has also been fixed.
This fixes the enc-3des test from the test-suite on Mips64r6 with MSA.
[X86] ABI change for x86-32: pass 3 vector arguments in-register instead of 4, except on Darwin.
This changes the ABI used on 32-bit x86 for passing vector arguments.
Historically, clang passes the first 4 vector arguments in-register, and additional vector arguments on the stack, regardless of platform. That is different from the behavior of gcc, icc, and msvc, all of which pass only the first 3 arguments in-register.
The 3-register convention is documented, unofficially, in Agner's calling convention guide, and, officially, in the recently released version 1.0 of the i386 psABI.
Darwin is kept as is because the OS X ABI Function Call Guide explicitly documents the current (4-register) behavior.
Matthias Braun [Tue, 19 May 2015 01:40:21 +0000 (01:40 +0000)]
SelectionDAG: Cleanup and simplify FoldConstantArithmetic
This cleans up the FoldConstantArithmetic code by factoring out the case
of two ConstantSDNodes into an own function. This avoids unnecessary
complexity for many callers who already have ConstantSDNode arguments.
This also avoids an intermeidate SmallVector datastructure and a loop
over that datastructure.
Pete Cooper [Tue, 19 May 2015 00:24:26 +0000 (00:24 +0000)]
Store intrinsic ID by value in Function instead of a string lookup. NFC.
On 64-bit targets, Function has 4-bytes of padding in its struct layout.
This uses the space for the intrinsic ID. It is set and recalculated whenever the function name is set. This is similar to the current behavior which clears the function from the intrinsic ID cache when its renamed.
The intrinsic cache itself is removed as the only purpose was to speedup calls to getIntrinsicID() which now just reading the new field in the struct.
Tim Northover [Mon, 18 May 2015 22:07:20 +0000 (22:07 +0000)]
AArch64: work around ld64 bug more aggressively.
ld64 currently mishandles internal pointer relocations (i.e.
ARM64_RELOC_UNSIGNED referred to by section & offset rather than symbol). The
existing __cfstring clause was an early discovery and workaround for this, but
the problem is wider and we should avoid such relocations wherever possible for
now.
This code should be reverted to allowing internal relocations as soon as
possible.
Extract the load/store type verification to a separate function.
Summary:
Added isLoadableOrStorableType to PointerType.
We were doing some checks in some places, occasionally assert()ing instead
of telling the caller. With this patch, I'm putting all type checking in
the same place for load/store type instructions, and verifying the same
thing every time.
I also added a check for load/store of a function type.
Applied extracted check to Load, Store, and Cmpxcg.
I don't have exhaustive tests for all of these, but all Error() calls in
TypeCheckLoadStoreInst are being tested (in invalid.test).
Matthias Braun [Mon, 18 May 2015 20:27:55 +0000 (20:27 +0000)]
MachineInstr: Change return value of getOpcode() to unsigned.
This was previously returning int. However there are no negative opcode
numbers and more importantly this was needlessly different from
MCInstrDesc::getOpcode() (which even is the value returned here) and
SDValue::getOpcode()/SDNode::getOpcode().
Chen Li [Mon, 18 May 2015 19:50:14 +0000 (19:50 +0000)]
[Verifier] Assert gc_relocate always return a pointer type
Summary: Add an assertion in verifier.cpp to make sure gc_relocate relocate a gc pointer, and its return type has the same address space with the relocated pointer.
Chen Li [Mon, 18 May 2015 19:02:25 +0000 (19:02 +0000)]
[PlaceSafepoints] Assertion on that gc_result can not have preceding phis should only apply to invoke statepoint
Summary: When PlaceSafepoints pass replaces old return result with gc_result from statepoint, it asserts that gc_result can not have preceding phis in its parent block. This is only true on invoke statepoint, which terminates the block and puts its result at the beginning of the normal successor block. Call statepoint does not terminate the block and thus its result is in the same block with it. There should be no restriction on whether there are phis or not.
Sanjoy Das [Mon, 18 May 2015 18:07:00 +0000 (18:07 +0000)]
Exploit dereferenceable_or_null attribute in LICM pass
Summary:
Allow hoisting of loads from values marked with dereferenceable_or_null
attribute. For values marked with the attribute perform
context-sensitive analysis to determine whether it's known-non-null or
not.
Tim Northover [Mon, 18 May 2015 17:10:40 +0000 (17:10 +0000)]
ARM: allow jump tables to be placed as constant islands.
Previously, they were forced to immediately follow the actual branch
instruction. This was usually OK (the LEAs actually accessing them got emitted
nearby, and weren't usually separated much afterwards). Unfortunately, a
sufficiently nasty phi elimination dumps many instructions right before the
basic block terminator, and this can increase the range too much.
This patch frees them up to be placed as usual by the constant islands pass,
and consequently has to slightly modify the form of TBB/TBH tables to refer to
a PC-relative label at the final jump. The other jump table formats were
already position-independent.
Andrew Trick [Mon, 18 May 2015 16:49:34 +0000 (16:49 +0000)]
indvars cruft: don't replace phi nodes for no reason.
Don't replace a phi with an identical phi. This was done long ago to
"preserve" IVUsers analysis. The code has already called
SE->forgetValue(PN) so I see no purpose in creating a new value for
the phi.
Hal Finkel [Mon, 18 May 2015 16:42:10 +0000 (16:42 +0000)]
Preserve the order of READ_REGISTER and WRITE_REGISTER
At the present time, we don't have a way to represent general dependency
relationships, so everything is represented using memory dependency. In order
to preserve the data dependency of a READ_REGISTER on WRITE_REGISTER, we need
to model WRITE_REGISTER as writing (which we had been doing) and model
READ_REGISTER as reading (which we had not been doing). Fix this, and also the
way that the chain operands were generated at the SDAG level.
Patch by Nicholas Paul Johnson, thanks! Test case by me.
James Y Knight [Mon, 18 May 2015 16:35:04 +0000 (16:35 +0000)]
Sparc: Add the "alternate address space" load/store instructions.
- Adds support for the asm syntax, which has an immediate integer
"ASI" (address space identifier) appearing after an address, before
a comma.
- Adds the various-width load, store, and swap in alternate address
space instructions. (ldsba, ldsha, lduba, lduha, lda, stba, stha,
sta, swapa)
This does not attempt to hook these instructions up to pointer address
spaces in LLVM, although that would probably be a reasonable thing to
do in the future.
Oliver Stannard [Mon, 18 May 2015 16:23:33 +0000 (16:23 +0000)]
[LLVM - ARM/AArch64] Add ACLE special register intrinsics
This patch implements LLVM support for the ACLE special register intrinsics in
section 10.1, __arm_{w,r}sr{,p,64}.
This patch is intended to lower the read/write_register instrinsics, used to
implement the special register intrinsics in the clang patch for special
register intrinsics (see http://reviews.llvm.org/D9697), to ARM specific
instructions MRC,MCR,MSR etc. to allow reading an writing of coprocessor
registers in AArch32 and AArch64. This is done by inspecting the register
string passed to the intrinsic and then lowering to the appropriate
instruction.
Hal Finkel [Mon, 18 May 2015 15:46:02 +0000 (15:46 +0000)]
[DAGCombine] Be more pedantic about use iteration in CombineToPreIndexedLoadStore
In CombineToPreIndexedLoadStore, when the offset is a constant, we have code
that looks for other uses of the pointer which are constant offset computations
so that they can be rewritten in terms of the updated pointer so that we don't
need to keep a copy of the base pointer to compute these constant offsets.
Unfortunately, when it iterated over the uses, it did so by SDNodes, and so we
could confuse ourselves if the base pointer was produced by a node that had
multiple results (because we would not immediately exclude uses of the other
node results). This was reported as PR22755. Unfortunately, we don't have a
test case (and I've also been unable to produce one thus far), but at least the
mistake is clear. The right way to fix this problem is to make use of the information
contained in the use iterators to filter out any uses of other results of the
node producing the base pointer.
This should be mostly NFC, but should also fix PR22755 (for which,
unfortunately, we have no in-tree test case).
Adam Nemet [Mon, 18 May 2015 15:37:03 +0000 (15:37 +0000)]
[LoopAccesses] If shouldRetryWithRuntimeCheck, reset InterestingDependences
When dependence analysis encounters a non-constant distance between
memory accesses it aborts the analysis and falls back to run-time checks
only. In this case we weren't resetting the array of dependences.
AVX-512: Added intrinsics for ADDSS/D, MULSS/D, SUBSS/D, DIVSS/D
instructions. These intrinsics are comming with rounding mode.
Added intrinsics for MAXSS/D, MINSS/D - with and without sae.
Hal Finkel [Mon, 18 May 2015 06:25:59 +0000 (06:25 +0000)]
[PowerPC] Add extra r2 read deps on @toc@l relocations
If some commits are happy, and some commits are sad, this is a sad commit. It
is sad because it restricts instruction scheduling to work around a binutils
linker bug, and moreover, one that may never be fixed. On 2012-05-21, GCC was
updated not to produce code triggering this bug, and now we'll do the same...
When resolving an address using the ELF ABI TOC pointer, two relocations are
generally required: one for the high part and one for the low part. Only
the high part generally explicitly depends on r2 (the TOC pointer). And, so,
we might produce code like this:
And there is nothing wrong with this code, as such, but there is a linker bug
in binutils (https://sourceware.org/bugzilla/show_bug.cgi?id=18414) that will
misoptimize this code sequence to this:
nop
std r2,40(r1)
ld r5,0(r27)
ld r2,8(r27)
ld r11,16(r27)
ld r3,-32472(r2)
clrldi r4,r4,32
mtctr r5
bctrl
ld r2,40(r1)
because the linker does not know (and does not check) that the value in r2
changed in between the instruction using the .LC12@toc@ha (TOC-relative)
relocation and the instruction using the .LC12@toc@l(3) relocation.
Because it finds these instructions using the relocations (and not by
scanning the instructions), it has been asserted that there is no good way
to detect the change of r2 in between. As a result, this bug may never be
fixed (i.e. it may become part of the definition of the ABI). GCC was
updated to add extra dependencies on r2 to instructions using the @toc@l
relocations to avoid this problem, and we'll do the same here.
This is done as a separate pass because:
1. These extra r2 dependencies are not really properties of the
instructions, but rather due to a linker bug, and maybe one day we'll be
able to get rid of them when targeting linkers without this bug (and,
thus, keeping the logic centralized here will make that
straightforward).
2. There are ISel-level peephole optimizations that propagate the @toc@l
relocations to some user instructions, and so the exta dependencies do
not apply only to a fixed set of instructions (without undesirable
definition replication).
The test case was reduced with the help of bugpoint, with minimal cleaning. I'm
looking forward to our upcoming MI serialization support, and with that, much
better tests can be created.
James Molloy [Sun, 17 May 2015 08:27:27 +0000 (08:27 +0000)]
Reapply r237520 with another fix for infinite looping
SimplifyDemandedBits was "simplifying" a constant by removing just sign bits.
This caused a canonicalization race between different parts of instcombine.
AVX-512: fixed a bug in mask operations - (i1 1) pattern
Filling k-reg with all-ones value was wrong,
(i1 1) should switch on only one bit in mask register
James Molloy [Sat, 16 May 2015 21:27:14 +0000 (21:27 +0000)]
Revert commits r237521 and r237520.
The AArch64 LNT bot is unhappy - I've found that the problem is in
SimpliftDemandedBits, but that's going to require another code review
so reverting in the meantime.
James Molloy [Sat, 16 May 2015 13:10:45 +0000 (13:10 +0000)]
Reapply r237453 with a fix for the test timeouts.
The test timeouts were due to instcombine fighting itself. Regression test added.
Original log message:
Canonicalize min/max expressions correctly.
This patch introduces a canonical form for min/max idioms where one operand
is extended or truncated. This often happens when the other operand is a
constant. For example:
This builds upon a patch posted by David Majenemer
(https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
passively stopped instcombine from ruining canonical patterns. This
patch additionally actively makes instcombine canonicalize too.
Canonicalization of expressions involving a change in type from int->fp
or fp->int are not yet implemented.
Daniel Sanders [Sat, 16 May 2015 12:09:54 +0000 (12:09 +0000)]
[x86] Distinguish the 'o', 'v', 'X', and 'i' inline assembly memory constraints.
Summary:
But still handle them the same way since I don't know how they differ on
this target.
Of these, 'o' and 'v' are not tested but were already implemented.
I'm not sure why 'i' is required for X86 since it's supposed to be an
immediate constraint rather than a memory constraint. A test asserts
without it so I've included it for now.
Craig Topper [Sat, 16 May 2015 05:42:03 +0000 (05:42 +0000)]
[TableGen] Restructure a loop to make it exit early instead of skipping a portion of the body based on what will also be the terminating condition. NFC
Matthias Braun [Sat, 16 May 2015 03:11:07 +0000 (03:11 +0000)]
MachineSink: Collect registers before clearing their killflags.
Currently whenever we sink any instruction, we do clearKillFlags for
every use of every use operand for that instruction, apparently there
are a lot of duplication, therefore compile time penalties.
This patch collect all the interested registers first, do clearKillFlags
for it all together at once at the end, so we only need to do
clearKillFlags once for one register, duplication is avoided.
Ahmed Bougacha [Sat, 16 May 2015 01:32:26 +0000 (01:32 +0000)]
[MemCpyOpt] Turn memcpy from just-memset'd source into memset.
There's no point in copying around constants, so, when all else fails,
we can still transform memcpy of memset into two independent memsets.
To quote the example, we can turn:
memset(dst1, c, dst1_size);
memcpy(dst2, dst1, dst2_size);
into:
memset(dst1, c, dst1_size);
memset(dst2, c, dst2_size);
When dst2_size <= dst1_size.
Like r235232 for copy constructors, this can occur in move constructors.