Benjamin Kramer [Wed, 31 Oct 2012 16:30:03 +0000 (16:30 +0000)]
LCSSA: Try to recover compile time regressions due to SCEV updates.
- Use value handle tricks to communicate use replacements instead of forgetLoop; this is a lot faster.
- Move the "big hammer" out of the main loop so it's not called for every instruction.
This should recover most (if not all) compile time regressions introduced by this code.
Ulrich Weigand [Wed, 31 Oct 2012 16:18:02 +0000 (16:18 +0000)]
Disable all old-JIT unit tests on PowerPC.
These tests were all failing since the old JIT doesn't work
for PowerPC (any more), and there are no plans to attempt to
fix it again (instead, work focuses on MCJIT).
Hal Finkel [Wed, 31 Oct 2012 15:17:07 +0000 (15:17 +0000)]
BBVectorize: Choose pair ordering to minimize shuffles
BBVectorize would, except for loads and stores, always fuse instructions
so that the first instruction (in the current source order) would always
represent the low part of the input vectors and the second instruction
would always represent the high part. This led to too many shuffles
being produced because sometimes the opposite order produces fewer of them.
With this change, BBVectorize tracks the kind of pair connections that form
the DAG of candidate pairs, and uses that information to reorder the pairs to
avoid excess shuffles. Using this information, a future commit will be able
to add VTTI-based shuffle costs to the pair selection procedure. Importantly,
the number of remaining shuffles can now be estimated during pair selection.
There are some trivial instruction reorderings in the test cases, and one
simple additional test where we certainly want to do a reordering to
avoid an unnecessary shuffle.
Bill Schmidt [Wed, 31 Oct 2012 01:15:05 +0000 (01:15 +0000)]
This patch addresses an ABI compatibility issue with empty aggregate
parameters. Examples of these are:
struct { } a;
union { } b[256];
int a[0];
An empty aggregate has an address, although dereferencing that address is
pointless. When passed as a parameter, an empty aggregate does not consume
a protocol register, nor does it consume a doubleword in the parameter save
area. Passing an empty aggregate by reference passes an address just as
for any other aggregate. Returning an empty aggregate uses GPR3 as a hidden
address of the return value location, just as for any other aggregate.
The patch modifies PPCTargetLowering::LowerFormalArguments_64SVR4 and
PPCTargetLowering::LowerCall_64SVR4 to properly skip empty aggregate
parameters passed by value. The handling of return values and by-reference
parameters was already correct.
Built on powerpc64-unknown-linux-gnu and tested with no new regressions.
A test case is included to test proper handling of empty aggregate
parameters on both sides of the function call protocol.
Nadav Rotem [Wed, 31 Oct 2012 00:45:26 +0000 (00:45 +0000)]
Add support for loops that don't start at zero.
This is important for loops in the LAPACK test-suite.
These loops start at 1 because they are auto-converted from Fortran.
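As a rough illustration of the loop shape this commit is about, here is a minimal C++ sketch. The function name and data layout are hypothetical and are not taken from the LAPACK test-suite; the only point is that the induction variable begins at 1, as it would in code translated from 1-based Fortran.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example of a loop converted from 1-based Fortran indexing.
// The induction variable starts at 1 instead of 0, so element 0 is skipped.
void scale_from_one(std::vector<float> &a, float k) {
  for (std::size_t i = 1; i < a.size(); ++i)
    a[i] *= k;
}
```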
Meador Inge [Wed, 31 Oct 2012 00:20:56 +0000 (00:20 +0000)]
instcombine: Migrate stpcpy optimizations
This patch migrates the stpcpy optimizations from the simplify-libcalls
pass into the instcombine library call simplifier. Note that the
__stpcpy_chk simplifications were migrated in a previous commit.
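One simplification of this kind can be sketched as follows, assuming the length of the source string is known: the stpcpy call behaves like a memcpy of len + 1 bytes whose result is dst + len. The helper name stpcpy_as_memcpy is hypothetical and only illustrates the equivalence; it is not the code in the simplifier.

```cpp
#include <cassert>
#include <cstring>

// Sketch only: when strlen(src) is known to equal len, stpcpy(dst, src)
// can be expressed as a memcpy of len + 1 bytes that yields dst + len.
char *stpcpy_as_memcpy(char *dst, const char *src, std::size_t len) {
  assert(std::strlen(src) == len && "len must equal strlen(src)");
  std::memcpy(dst, src, len + 1);  // the + 1 copies the terminating NUL
  return dst + len;                // points at the NUL written into dst
}
```

For example, copying "abc" with len 3 into a buffer returns a pointer three bytes past the start of the buffer.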
Meador Inge [Wed, 31 Oct 2012 00:20:51 +0000 (00:20 +0000)]
instcombine: Split out the __stpcpy_chk simplifications from StrCpyChkOpt
r166198 migrated the strcpy optimization to instcombine. The strcpy
simplifier that was migrated from Transforms/Scalar/SimplifyLibCalls.cpp
was also doing some __strcpy_chk simplifications. Those fortified
simplifications were migrated as well, but introduced a bug in the
__stpcpy_chk simplifier in the process. This happened because the
__strcpy_chk and __stpcpy_chk simplifiers were both mapped to StrCpyChkOpt
which was updated with simplifications that worked for __strcpy_chk, but
not __stpcpy_chk.
This patch fixes the problem by adding proper test coverage and creating a
new simplifier for __stpcpy_chk (instead of sharing one with __strcpy_chk).
Chandler Carruth [Tue, 30 Oct 2012 20:52:40 +0000 (20:52 +0000)]
Fix PR14212: For some strange reason I treated vectors differently from
integers in that the code to handle split alloca-wide integer loads or
stores doesn't come first. It should, for the same reasons as with
integers, and the PR attests to that. Also had to fix a busted assert
that this test case also covers.
Hal Finkel [Tue, 30 Oct 2012 20:17:37 +0000 (20:17 +0000)]
BBVectorize: Cache fixed-order pairs instead of recomputing pointer info.
Instead of recomputing relative pointer information just prior to fusing,
cache this information (which also needs to be computed during the
candidate-pair selection process). This cuts down on the total number of
SE queries made, and also is a necessary intermediate step on the road toward
including shuffle costs in the pair selection procedure.
Akira Hatanaka [Tue, 30 Oct 2012 19:37:25 +0000 (19:37 +0000)]
Add code for saving formal argument information to MipsFunctionInfo. This
information will be used by IsEligibleForTailCallOptimization to determine
whether a call can be tail-call optimized.
Hal Finkel [Tue, 30 Oct 2012 19:35:29 +0000 (19:35 +0000)]
BBVectorize: Simplify how input swapping is handled.
Stop propagating the FlipMemInputs variable into the routines that
create the replacement instructions. Instead, just flip the arguments
of those routines. This allows for some associated cleanup (not all
of which is done here). No functionality change is intended.
Chad Rosier [Tue, 30 Oct 2012 19:11:54 +0000 (19:11 +0000)]
[inline asm] Implement mayLoad and mayStore for inline assembly. In general,
the MachineInstr MayLoad/MayStore flags are based on the tablegen implementation.
For inline assembly, however, we need to compute these based on the constraints.
Revert r166929 as this is no longer needed, but leave the test case in place.
rdar://12033048 and PR13504
PowerPC: More support for Altivec compare operations
This patch adds more support for vector type comparisons using altivec.
It adds correct support for v16i8, v8i16, v4i32, and v4f32 vector
types for comparison operators ==, !=, >, >=, <, and <=.
Duncan Sands [Tue, 30 Oct 2012 13:38:54 +0000 (13:38 +0000)]
Add a helper for telling whether a type is a pointer or vector of pointer type.
Simplify the implementation of the corresponding integer and float functions and
move them inline while there.
Hans Wennborg [Tue, 30 Oct 2012 11:23:25 +0000 (11:23 +0000)]
Use TargetTransformInfo to control switch-to-lookup table transformation
When the switch-to-lookup tables transform landed in SimplifyCFG, it
was pointed out that this could be inappropriate for some targets.
Since there was no way at the time for the pass to know anything about
the target, an awkward reverse-transform was added in CodeGenPrepare
that turned lookup tables back into switches for some targets.
This patch uses the new TargetTransformInfo to determine if a
switch should be transformed, and removes
CodeGenPrepare::ConvertLoadToSwitch.
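For readers unfamiliar with the transform, the following C++ sketch shows the general idea at source level. Both functions and their constants are hypothetical; the second function only approximates the shape of the result (a bounds check followed by a table load) and is not the IR that SimplifyCFG emits.

```cpp
// Source form: a dense switch whose cases only return constants.
int via_switch(unsigned x) {
  switch (x) {
  case 0: return 10;
  case 1: return 20;
  case 2: return 40;
  case 3: return 80;
  default: return -1;
  }
}

// Approximate shape after the transform: range check plus table lookup.
int via_table(unsigned x) {
  static const int table[4] = {10, 20, 40, 80};
  return x < 4 ? table[x] : -1;
}
```

Whether the table form is a win depends on the target, which is why the decision is delegated to TargetTransformInfo.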
Reed Kotler [Tue, 30 Oct 2012 00:54:49 +0000 (00:54 +0000)]
Change mips16 delay slot jumps to non delay slot forms by default.
We will make them delay slot forms if there is something that can be
placed in the delay slot during a separate pass. Mips16 extended instructions
cannot be placed in delay slots.
Nadav Rotem [Tue, 30 Oct 2012 00:40:39 +0000 (00:40 +0000)]
LoopVectorizer: change debug prints
Print the module identifier when deciding to vectorize. When deciding not to
vectorize, do not print the called function name because it can be null.
Kevin Enderby [Mon, 29 Oct 2012 23:27:20 +0000 (23:27 +0000)]
Fix ARM's b.w instruction for thumb 2 and the encoding T4. The branch target
is 24 bits not 20 and the decoding needed to correctly handle converting the
J1 and J2 bits to their I1 and I2 values to reconstruct the displacement.
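The displacement reconstruction described above can be sketched in C++ as follows. This is an illustration based on the architectural definition of encoding T4 (I1 = NOT(J1 EOR S), I2 = NOT(J2 EOR S), offset = SignExtend(S:I1:I2:imm10:imm11:'0')); the function name and the two-halfword interface are assumptions, not the LLVM disassembler's code.

```cpp
#include <cstdint>

// Sketch: decode the branch offset of a Thumb2 B.W (encoding T4) from its
// two halfwords. hw1 = 11110 S imm10, hw2 = 10 J1 1 J2 imm11.
std::int32_t decode_thumb2_b_t4(std::uint16_t hw1, std::uint16_t hw2) {
  std::uint32_t s = (hw1 >> 10) & 1;
  std::uint32_t imm10 = hw1 & 0x3FF;
  std::uint32_t j1 = (hw2 >> 13) & 1;
  std::uint32_t j2 = (hw2 >> 11) & 1;
  std::uint32_t imm11 = hw2 & 0x7FF;

  // Convert J1/J2 to I1/I2.
  std::uint32_t i1 = ~(j1 ^ s) & 1;
  std::uint32_t i2 = ~(j2 ^ s) & 1;

  // 24 encoded bits (S:I1:I2:imm10:imm11) followed by an implicit zero bit.
  std::uint32_t imm25 = (s << 24) | (i1 << 23) | (i2 << 22) |
                        (imm10 << 12) | (imm11 << 1);

  // Sign-extend from 25 bits.
  return static_cast<std::int32_t>(imm25 ^ 0x1000000u) - 0x1000000;
}
```

With this sketch, the halfword pair 0xF7FF, 0xBFFE decodes to an offset of -4.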
Bill Schmidt [Mon, 29 Oct 2012 21:18:16 +0000 (21:18 +0000)]
This patch solves a problem with passing varargs parameters under the PPC64
ELF ABI.
A varargs parameter consisting of a single-precision floating-point value,
or of a single-element aggregate containing a single-precision floating-point
value, must be passed in the low-order (rightmost) four bytes of the
doubleword stack slot reserved for that parameter. If there are GPR protocol
registers remaining, the parameter must also be mirrored in the low-order
four bytes of the reserved GPR.
Prior to this patch, such parameters were being passed in the high-order
four bytes of the stack slot and the mirrored GPR.
The patch adds a new test case to verify the correct code generation.
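The placement rule can be illustrated with a small C++ sketch that models the doubleword slot as a 64-bit integer value. The function names are hypothetical, and zeroing the unused half is an assumption made only to keep the example deterministic.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float as its 32-bit pattern.
static std::uint32_t float_bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Placement after the patch: the float occupies the low-order four bytes
// of the doubleword (the other half is zeroed here as an assumption).
std::uint64_t slot_low_order(float f) {
  return static_cast<std::uint64_t>(float_bits(f));
}

// Placement before the patch: the float sat in the high-order four bytes.
std::uint64_t slot_high_order(float f) {
  return static_cast<std::uint64_t>(float_bits(f)) << 32;
}
```

On a big-endian target the low-order four bytes are the rightmost ones, at byte offset 4 within the slot.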
Ulrich Weigand [Mon, 29 Oct 2012 18:09:01 +0000 (18:09 +0000)]
Implement arithmetic on APFloat with PPCDoubleDouble semantics by
treating it as if it were an IEEE floating-point type with 106-bit
mantissa.
This makes compile-time arithmetic on "long double" for PowerPC
in clang (in particular parsing of floating point constants)
work, and fixes all "long double" related failures in the test
suite.
Chad Rosier [Mon, 29 Oct 2012 18:01:54 +0000 (18:01 +0000)]
[ms-inline asm] Add support for the [] operator. Essentially, [expr1][expr2] is
equivalent to [expr1 + expr2]. See test cases for more examples.
rdar://12470392