Juergen Ributzka [Wed, 27 Aug 2014 21:04:52 +0000 (21:04 +0000)]
[FastISel][AArch64] Use the zero register for stores.
Use the zero register directly when possible to avoid an unnecessary register
copy and a wasted register at -O0. This also uses integer stores to store a
positive floating-point zero. This saves us from materializing the positive zero
in a register and then storing it.
Juergen Ributzka [Wed, 27 Aug 2014 20:47:33 +0000 (20:47 +0000)]
[FastISel] Fix a potential bug in FastEmitInst_ri
FastEmitInst_ri was constraining the first operand without checking if it is
a virtual register. Use constrainOperandRegClass as all the other
FastEmitInst_* functions.
David Blaikie [Wed, 27 Aug 2014 20:14:18 +0000 (20:14 +0000)]
Convert a few more cases of direct intialization of unique_ptrs from MemoryBuffer::getMemBuffer to move initialization now that it returns by unique_ptr instead of raw pointer.
Reid Kleckner [Wed, 27 Aug 2014 20:10:38 +0000 (20:10 +0000)]
X86 MC: Handle instructions like fxsave that match multiple operand sizes
Instructions like 'fxsave' and control flow instructions like 'jne'
match any operand size. The loop I added to the Intel syntax matcher
assumed that using a different size would give a different instruction.
Now it handles the case where we get the same instruction for different
memory operand sizes.
This also allows us to remove the hack we had for unsized absolute
memory operands, because we can successfully match things like 'jnz'
without reporting ambiguity. Removing this hack uncovered test case
involving 'fadd' that was ambiguous. The memory operand could have been
single or double precision.
David Majnemer [Wed, 27 Aug 2014 20:08:37 +0000 (20:08 +0000)]
InstCombine: Combine gep X, (Y-X) to Y
We try to perform this transform in InstSimplify but we aren't always
able to. Sometimes, we need to insert a bitcast if X and Y don't have
the same time.
Alexey Samsonov [Wed, 27 Aug 2014 19:36:53 +0000 (19:36 +0000)]
Use BitVector instead of int in R600 SIISelLowering.
int may not have enough bits in it, which was detected by UBSan
bootstrap (it reported left shift by a too large constant).
David Majnemer [Wed, 27 Aug 2014 18:03:46 +0000 (18:03 +0000)]
InstSimplify: Compute comparison ranges for left shift instructions
'shl nuw CI, x' produces [CI, CI << CLZ(CI)]
'shl nsw CI, x' produces [CI << CLO(CI)-1, CI] if CI is negative
'shl nsw CI, x' produces [CI, CI << CLZ(CI)-1] if CI is non-negative
Oliver Stannard [Wed, 27 Aug 2014 16:16:04 +0000 (16:16 +0000)]
Teach the AArch64 backend about v4f16 and v8f16
This teaches the AArch64 backend to deal with the operations required
to deal with the operations on v4f16 and v8f16 which are exposed by
NEON intrinsics, plus the add, sub, mul and div operations.
Chandler Carruth [Wed, 27 Aug 2014 11:39:47 +0000 (11:39 +0000)]
[x86] Fix a regression introduced with r213897 for 32-bit targets where
we stopped efficiently lowering sextload using the SSE41 instructions
for that operation.
This is a consequence of a bad predicate I used thinking of the memory
access needs. The code actually handles the cases where the predicate
doesn't apply, and handles them much better. =] Simple fix and a test
case added. Fixes PR20767.
Chandler Carruth [Wed, 27 Aug 2014 11:22:16 +0000 (11:22 +0000)]
[SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine.
This combine is essentially combining target-specific nodes back into target
independent nodes that it "knows" will be combined yet again by a target
independent DAG combine into a different set of target-independent nodes that
are legal (not custom though!) and thus "ok". This seems... deeply flawed. The
crux of the problem is that we don't combine un-legalized shuffles that are
introduced by legalizing other operations, and thus we don't see a very
profitable combine opportunity. So the backend just forces the input to that
combine to re-appear.
However, for this to work, the conditions detected to re-form the unlegalized
nodes must be *exactly* right. Previously, failing this would have caused poor
code (if you're lucky) or a crasher when we failed to select instructions.
After r215611 we would fall back into the legalizer. In some cases, this just
"fixed" the crasher by produces bad code. But in the test case added it caused
the legalizer and the dag combiner to iterate forever.
The fix is to make the alignment checking in the x86 side of things match the
alignment checking in the generic DAG combine exactly. This isn't really a
satisfying or principled fix, but it at least make the code work as intended.
It also highlights that it would be nice to detect the availability of under
aligned loads for a given type rather than bailing on this optimization. I've
left a FIXME to document this.
Original commit message for r215611 which covers the rest of the chang:
[SDAG] Fix a case where we would iteratively legalize a node during
combining by replacing it with something else but not re-process the
node afterward to remove it.
In a truly remarkable stroke of bad luck, this would (in the test case
attached) end up getting some other node combined into it without ever
getting re-processed. By adding it back on to the worklist, in addition
to deleting the dead nodes more quickly we also ensure that if it
*stops* being dead for any reason it makes it back through the
legalizer. Without this, the test case will end up failing during
instruction selection due to an and node with a type we don't have an
instruction pattern for.
It took many million runs of the shuffle fuzz tester to find this.
We supported transforming:
(gep i8* X, -(ptrtoint Y))
to:
(inttoptr (sub (ptrtoint X), (ptrtoint Y)))
However, this only fired if 'X' had type i8*. Generalize this to
support various types of different sizes. This results in much better
CodeGen, especially for pointers to packed structs.
Juergen Ributzka [Wed, 27 Aug 2014 00:58:30 +0000 (00:58 +0000)]
[FastISel][AArch64] Fix address simplification.
When a shift with extension or an add with shift and extension cannot be folded
into the memory operation, then the address calculation has to be materialized
separately. While doing so the code forgot to consider a possible sign-/zero-
extension. This fix folds now also the sign-/zero-extension into the add or
shift instruction which is used to materialize the address.
Reid Kleckner [Tue, 26 Aug 2014 20:32:34 +0000 (20:32 +0000)]
MC: Split the x86 asm matcher implementations by dialect
The existing matcher has lots of AT&T assembly dialect assumptions baked
into it. In particular, the hack for resolving the size of a memory
operand by appending the four most common suffixes doesn't work at all.
The Intel assembly dialect mnemonic table has ambiguous entries, so we
need to try matching multiple times with different operand sizes, since
that's the only way to choose different instruction variants.
This makes us more compatible with gas's implementation of Intel
assembly syntax. MSVC assumes you want byte-sized operations for the
instructions that we reject as ambiguous.
Rafael Espindola [Tue, 26 Aug 2014 17:19:03 +0000 (17:19 +0000)]
Return a std::unique_ptr from parseInputFile and propagate. NFC.
The memory management in BugPoint is fairly convoluted, so this just unwraps
one layer by changing the return type of functions that always return
owned Modules.
James Molloy [Tue, 26 Aug 2014 13:41:31 +0000 (13:41 +0000)]
Change the return value of "getEnd()" from a MachineInstr* to a MachineBasicBlock::iterator.
It seems on Darwin the illegal round-trip ::iterator -> MachineInstr* -> ::iterator breaks execution horribly when the iterator is not a real MachineInstr, like ::end().
Reid Kleckner [Tue, 26 Aug 2014 00:33:28 +0000 (00:33 +0000)]
Declare that musttail calls in variadic functions forward the ellipsis
Summary:
There is no functionality change here except in the way we assemble and
dump musttail calls in variadic functions. There's really no need to
separate out the bits for musttail and "is forwarding varargs" on call
instructions. A musttail call by definition has to forward the ellipsis
or it would fail verification.
Reid Kleckner [Mon, 25 Aug 2014 23:58:48 +0000 (23:58 +0000)]
ArgPromotion: Don't touch variadic functions
Adding, removing, or changing non-pack parameters can change the ABI
classification of pack parameters. Clang and other frontends encode the
classification in the IR of the call site, but the callee side
determines it dynamically based on the number of registers consumed so
far. Changing the prototype affects the number of registers consumed
would break such code.
Dead argument elimination performs a similar task and already has a
similar check to avoid this problem.
Lang Hames [Mon, 25 Aug 2014 23:33:48 +0000 (23:33 +0000)]
[MCJIT][SystemZ] Use a simpler expression for indirect relocation offsets.
The expressions 'Reloc.Addend - Addend' and 'Reloc.Offset' should always be
equal in this context. The latter is prefered - we want to remove the
RelocationValueRef::Addend field in the future.
Rafael Espindola [Mon, 25 Aug 2014 22:53:21 +0000 (22:53 +0000)]
Fix bug in llvm::sys::argumentsFitWithinSystemLimits().
This patch fixes a subtle bug in the UNIX implementation of
llvm::sys::argumentsFitWithinSystemLimits() regarding the misuse of a static
variable. This bug causes our cached number that stores the system command line
maximum length to be halved after each call to the function. With a sufficient
number of calls to this function, it will eventually report any given command
line string to be over system limits.
Rafael Espindola [Mon, 25 Aug 2014 22:15:06 +0000 (22:15 +0000)]
Refactor argument serialization logic when executing process. NFC.
This patch refactors the argument serialization logic used in the Execute
function, used to launch new Windows processes. There is a critical step that
joins char** arguments into a single string, building the command line used to
launch the new process, and the readability of this code is improved if this
part is refactored in its own helper function.
Chandler Carruth [Mon, 25 Aug 2014 18:06:11 +0000 (18:06 +0000)]
[x86] Fix a bug in r216319 where I was missing a 'break'.
This actually was caught by existing tests but those tests were disabled
with an XFAIL because of PR20736. While working on fixing that,
I noticed the test failure, and tracked it down to this.
We even have a really nice Clang warning that would have caught this but
it isn't enabled in LLVM! =[ I may look at enabling it.
GlobalDCE deletes global vars and updates their initializers to nullptr
while leaving underlying constants to be cleaned up later by its uses.
The clean up may never happen, fix this by forcing it every time it's
safe to destroy constants.
Final patch by Rafael Espindola
http://reviews.llvm.org/D4931
Robert Khasanov [Mon, 25 Aug 2014 14:49:34 +0000 (14:49 +0000)]
[SKX] avx512_icmp_packed multiclass extension
Extended avx512_icmp_packed multiclass by masking versions.
Added avx512_icmp_packed_rmb multiclass for embedded broadcast versions.
Added corresponding _vl multiclasses.
Added encoding tests for CPCMP{EQ|GT}* instructions.
Add more fields for X86VectorVTInfo.
Added AVX512VLVectorVTInfo that include X86VectorVTInfo for 512/256/128-bit versions
Karthik Bhat [Mon, 25 Aug 2014 04:56:54 +0000 (04:56 +0000)]
Allow vectorization of division by uniform power of 2.
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)