[PowerPC] Swap arguments and adjust shift count for vsldoi on little endian
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments. The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.
Reviewed by Ulrich Weigand.
This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).
X86: restrict combine to when type sizes are safe.
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.
------------------------------------------------------------------------
My original LE implementation of the vsldoi instruction, with its
altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect
shufflevector operations in the LLVM IR. Correct code is generated
because the back end handles the incorrect shufflevector in a
consistent manner.
This patch and a companion patch for Clang correct this problem by
removing the fixup from altivec.h and the corresponding fixup from the
PowerPC back end. Several test cases are also modified to reflect the
now-correct LLVM IR.
[PowerPC] Swap arguments to vpkuhum/vpkuwum on little-endian
In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl*
by swapping the input arguments. As it turns out, the exact same fix
is also required for the vpkuhum/vpkuwum patterns.
This fixes another regression in llvmpipe when vector support is
enabled.
[PowerPC] MULHU/MULHS are not legal for vector types
I ran into some test failures where common code changed vector division
by constant into a multiply-high operation (MULHU). But these are not
implemented by the back-end, so we failed to recognize the insn.
Fixed by marking MULHU/MULHS as Expand for vector types.
This patch refactors code generation of vector comparisons.
This fixes a wrong code-gen bug for ISD::SETGE for floating-point types,
and improves generated code for vector comparisons in general.
Specifically, the patch moves all logic deciding how to implement vector
comparisons into getVCmpInst, which gets two extra boolean outputs
indicating to its caller whether its needs to swap the input operands
and/or negate the result of the comparison. Apart from implementing
these two modifications as directed by getVCmpInst, there is no need
to ever implement vector comparisons in any other manner; in particular,
there is never a need to perform two separate comparisons (e.g. one for
equal and one for greater-than, as code used to do before this patch).
X86: drop relocations on __eh_frame sections globally.
Without this, we produce non-extern relocations when targeting older OS X
versions that ld64 can't cope with in the particular context of __eh_frame
sections (who'd want generic relocation-processing anyway?).
This means that an updated linker (ld64 from Xcode 3.2.6 or later) may be
needed when targeting such platforms with a modern version of LLVM, but this is
probably the case anyway and a reasonable requirement.
We may visit a call that uses an alloca multiple times in callUsesLocalStack, sometimes with IsNocapture true and sometimes with IsNocapture false. We accidentally skipped work we needed to do in the IsNocapture=false case if we were called with IsNocapture=true the first time. Fixes PR20405!
The Microsoft ABI and MSVCRT are considered the canonical C runtime and ABI.
The long double routines are not part of this environment. However, cygwin and
MinGW both provide supplementary implementations. Change the condition to
reflect this reality.
------------------------------------------------------------------------
X86: correct library call setup for Windows itanium
This target is identical to the Windows MSVC (and follows Microsoft ABI for C).
Correct the library call setup for this target. The same set of library calls
are missing on this environment.
------------------------------------------------------------------------
Fix ScalarEvolutionExpander when creating a PHI in a block with duplicate predecessors
It seems that when I fixed this, almost exactly a year ago, I did not quite do
it correctly. When we have duplicate block predecessors, we can indeed not have
different incoming values for the same block, but we *must* have duplicate
entries. So, instead of skipping the duplicates, we explicitly add the
duplicate incoming values.
This is a follow-up to the activity in the bug at
http://llvm.org/bugs/show_bug.cgi?id=18663 . The underlying issue has
to do with how the KILL pseudo-instruction is handled. I defer to
Hal/Jakob/Uli for additional details and background.
This will disable the (bad?) assert, add an associated fixme comment,
and add a pair of tests.
The code change and the pr18663-2.ll test are copied from the referenced
bug. That test does not immediately fail in my environment, but I have
added the pr18663.ll test which does.
(Comment from Hal)
to provide everyone else with some context, this assert was not bad when
it was written. At that time, we only generated KILL pseudo instructions
around subregister copies. This logic, unfortunately, had its own problems.
In r199797, the relevant logic in MachineCopyPropagation was replaced to
generate KILLs for other kinds of copies too. This change in semantics broke
this now-problematic assumption in AggressiveAntiDepBreaker. The
AggressiveAntiDepBreaker really needs a proper cleanup to deal with the
change, but removing the assert (which just allows the function to return
false) is a safe conservative behavior, and should do for the time being.
Constant fold the lanes of the input constant build_vector individually
so we correctly handle when the vector elements are not all the same
constant value.
[NVPTX] Silence a GCC warning found by the buildbots
The cast to NVPTXTargetLowering was missing a 'const', but let's
just access the right pointer through the subtarget anyway.
------------------------------------------------------------------------
[NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabled
With optimizations disabled, we disable the isel patterns for mul.wide; but we
were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE
ISD nodes in DAGCombine if the optimization level is not zero.
------------------------------------------------------------------------
Don't manually (and forcibly) run the verifier on the entire module from
the jump instruction table pass. First, the verifier is already built
into all the tools. The test case is adapted to just run llvm-as
demonstrating that we still catch the broken module. Second, the
verifier is *extremely* slow. This was responsible for very significant
compile time regressions.
If you have deployed a Clang binary anywhere from r210280 to this
commit, you really want to re-deploy.
------------------------------------------------------------------------
Regenerate autoconf, previous updates to the configury haven't
been updating configure.
------------------------------------------------------------------------
Try to fix the bots. If this does not work, I am going to move it to X86 directory.
------------------------------------------------------------------------
SimplifyCFG: fix a bug in switch to table conversion
We use gep to access the global array "switch.table", and the table index
should be treated as unsigned. When the highest bit is 1, this commit
zero-extends the index to an integer type with larger size.
For a switch on i2, we used to generate:
%switch.tableidx = sub i2 %0, -2
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx
It is incorrect when %switch.tableidx is 2 or 3. The fix is to generate
%switch.tableidx = sub i2 %0, -2
%switch.tableidx.zext = zext i2 %switch.tableidx to i3
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext
Even if there's a file called c:\a, we want /? to be preserved as
an option, not expanded to a filename.
------------------------------------------------------------------------
Emit a warning if llvm_map_components_to_libraries() is used noting that its
use is deprecated in favour of llvm_map_components_to_libnames()
Although message(DEPRECATION "msg") would probably be a better fit this
does nothing if CMAKE_ERROR_DEPRECATED and CMAKE_WARNING_DEPRECATED are
both off, which is the default.
------------------------------------------------------------------------
Document the new LLVM CMake interface for building against LLVM
libraries. With many contributions from Brad King.
------------------------------------------------------------------------
[PATCH][PPC64LE] Correct little-endian usage of vmrgh* and vmrgl*.
Because the PowerPC vmrgh* and vmrgl* instructions have a built-in
big-endian bias, it is necessary to swap their inputs in little-endian
mode when using them to implement a vector shuffle. This was
previously missed in the vector LE implementation.
There was already logic to distinguish between unary and "normal"
vmrg* vector shuffles, so this patch extends that logic to use a third
option: "swapped" vmrg* vector shuffles that are used for little
endian in place of the "normal" ones.
I've updated the vec-shuffle-le.ll test to check for the expected
register ordering on the generated instructions.
This bug was discovered when testing the LE and ELFv2 patches for
safety if they were backported to 3.4. A different vectorization
decision was made in 3.4 than on mainline trunk, and that exposed the
problem. I've verified this fix takes care of that issue.
When we had a vector_shuffle where we had an input from each vector, we
could miscompile it because we were assuming the input from V2 wouldn't
be moved from where it was on the vector.
Export LLVM_ENABLE_RTTI and LLVM_ENABLE_EH in LLVMConfig.cmake so
clients of LLVM know if RTTI and/or EH were enabled in the build of
LLVM they are trying to link against.
------------------------------------------------------------------------
Added LLVM_ENABLE_RTTI and LLVM_ENABLE_EH options that allow RTTI and EH
to globally be controlled. Individual targets (e.g. ExceptionDemo) can
still override this by using LLVM_REQUIRE_RTTI and LLVM_REQUIRE_EH if
they need to be compiled with RTTI or exception handling respectively.
------------------------------------------------------------------------
Richard Smith [Tue, 22 Jul 2014 02:32:12 +0000 (02:32 +0000)]
Revert of r213521. This change introduced a non-hermetic test (depending on a
file not in the test/ area). Backing out now so that this test isn't part of
the 3.5 branch.
Original commit message: "TableGen: Allow AddedComplexity values to be negative
[...]"
Hal Finkel [Mon, 21 Jul 2014 21:30:22 +0000 (21:30 +0000)]
Match semantics of PointerMayBeCapturedBefore to its name by default
As it turns out, the capture tracker named CaptureBefore used by AA, and now
available via the PointerMayBeCapturedBefore function, would have been
more-aptly named CapturedBeforeOrAt, because it considers captures at the
instruction provided. This is not always what one wants, and it is difficult to
get the strictly-before behavior given only the current interface. This adds an
additional parameter which controls whether or not you want to include
captures at the provided instruction. The default is not to include the
instruction provided, so that 'Before' matches its name.
GCC believes it may be possible to not return a value from the switch:
lib/Target/R600/SIRegisterInfo.cpp:187:1: warning: control reaches end of non-void function [-Wreturn-type]
Add an unreachable label to indicate that this is not possible and still permit
switch coverage checking.
Replace the result usages while legalizing cmpxchg.
We should update the usages to all of the results;
otherwise, we might get assertion failure or SEGV during
the type legalization of ATOMIC_CMP_SWAP_WITH_SUCCESS
with two or more illegal types.
For example, in the following sequence, both i8 and i1
might be illegal in some target, e.g. armv5, mipsel, mips64el,
Since both i8 and i1 should be legalized, the corresponding
ATOMIC_CMP_SWAP_WITH_SUCCESS dag will be checked/replaced/updated
twice.
If we don't update the usage to *ALL* of the results in the
first round, the DAG for extractvalue might be processed earlier.
The GetPromotedInteger() will result in assertion failure,
because its operand (i.e. the success bit of cmpxchg) is not
promoted beforehand.
David Blaikie [Mon, 21 Jul 2014 16:26:24 +0000 (16:26 +0000)]
Correct the ownership passing semantics of object::createBinary and make them explicit in the type system.
createBinary documented that it destroyed the parameter in error cases,
though by observation it does not. By passing the unique_ptr by value
rather than lvalue reference, callers are now explicit about passing
ownership and the function implements the documented contract. Remove
the explicit documentation, since now the behavior cannot be anything
other than what was documented, so it's redundant.
Also drops a unique_ptr::release in llvm-nm that was always run on a
null unique_ptr anyway.
Tom Stellard [Mon, 21 Jul 2014 15:45:06 +0000 (15:45 +0000)]
R600/SI: Clean up some of the unused REGISTER_{LOAD,STORE} code
There are a few more cleanups to do, but I ran into some problems
with ext loads and trunc stores, when I tried to change some of the
vector loads and stores from custom to legal, so I wasn't able to
get rid of everything.
Daniel Sanders [Mon, 21 Jul 2014 15:25:24 +0000 (15:25 +0000)]
[mips] Do not emit '.module fp=...' unless we really need to.
We now emit this value when we need to contradict the default value. This
restores support for binutils 2.24.
When a suitable binutils has been released we can resume unconditionally
emitting .module directives. This is preferable to omitting the .module
directives since the .module directives protect against, for example,
accidentally assembling FP32 code with -mfp64 and producing an unusuable object.
Tom Stellard [Mon, 21 Jul 2014 14:01:14 +0000 (14:01 +0000)]
R600/SI: Store constant initializer data in constant memory
This implements a solution for constant initializers suggested
by Vadim Girlin, where we store the data after the shader code
and then use the S_GETPC instruction to compute its address.
This saves use the trouble of creating a new buffer for constant data
and then having to pass the pointer to the kernel via user SGPRs or the
input buffer.
Daniel Sanders [Mon, 21 Jul 2014 13:30:55 +0000 (13:30 +0000)]
[mips] Add MipsOptionRecord abstraction and use it to implement .reginfo/.MIPS.options
This abstraction allows us to support the various records that can be placed in
the .MIPS.options section in the future. We currently use it to record register
usage information (the ODK_REGINFO record in our ELF64 spec).
Each .MIPS.options record should subclass MipsOptionRecord and provide an
implementation of EmitMipsOptionRecord.
Tom Stellard [Mon, 21 Jul 2014 13:28:54 +0000 (13:28 +0000)]
TableGen: Allow AddedComplexity values to be negative
This is useful for cases when stand-alone patterns are preferred to the
patterns included in the instruction definitions. Instead of requiring
that stand-alone patterns set a larger AddedComplexity value, which
can be confusing to new developers, the allows us to reduce the
complexity of the included patterns to achieve the same result.
Hal Finkel [Mon, 21 Jul 2014 13:15:48 +0000 (13:15 +0000)]
Move the CapturesBefore tracker from AA into CaptureTracking
There were two generally-useful CaptureTracker classes defined in LLVM: the
simple tracker defined in CaptureTracking (and made available via the
PointerMayBeCaptured utility function), and the CapturesBefore tracker
available only inside of AA. This change moves the CapturesBefore tracker into
CaptureTracking, generalizes it slightly (by adding a ReturnCaptures
parameter), and makes it generally available via a PointerMayBeCapturedBefore
utility function.
This logic will be needed, for example, to perform noalias function parameter
attribute inference.
This declaration has no definition, which is causing MSVC to emit several "no suitable definition provided for explicit template instantiation request" C4661 warnings.
Hal Finkel [Mon, 21 Jul 2014 12:27:23 +0000 (12:27 +0000)]
Move isIdentifiedFunctionLocal from BasicAA to AA
The ability to identify function locals will exist outside of BasicAA (for
example, logic for inferring noalias function arguments will need this), so
make this concept generally accessible without code duplication.
Daniel Sanders [Mon, 21 Jul 2014 12:25:34 +0000 (12:25 +0000)]
[mips] Try to fix the test/ExecutionEngine tests on a MIPS host.
Fix a dangerous default case that caused MipsCodeEmitter to discard pseudo
instructions it didn't recognize. It will now call llvm_unreachable() for
unrecognized pseudo's and explicitly handles PseudoReturn, PseudoReturn64,
PseudoIndirectBranch, PseudoIndirectBranch64, CFI_INSTRUCTION, IMPLICIT_DEF,
and KILL.
There may be other pseudos that need handling but this was enough for the
ExecutionEngine tests to pass on my test system.
Daniel Sanders [Mon, 21 Jul 2014 10:45:47 +0000 (10:45 +0000)]
[mips] Do not emit '.module [no]oddspreg' unless we really need to.
We now emit this directive when we need to contradict the default value (e.g.
-mno-odd-spreg is given) or an option changed the default value (e.g. -mfpxx
is given).
This restores support for the currently available head of binutils. However,
at this point binutils 2.24 is still not sufficient since it does not support
'.module fp=...'.
Tim Northover [Mon, 21 Jul 2014 09:13:56 +0000 (09:13 +0000)]
CodeGen: emit IR-level f16 conversion intrinsics as fptrunc/fpext
This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc,
and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation
path is now uniform, regardless of the input IR:
Each target should be able to select the version that best matches its
operations and not be required to duplicate patterns for both fptrunc
and FP_TO_FP16 (for example).
As a result we can remove some redundant AArch64 patterns.
[SDAG,cleanup] Switch the DAG combiner over to use the spelling
'Worklist' consistently rather than a deeply confusing mixture of
'WorkList' and 'Worklist'.
Notably, the very 'WorkList' of the DAG combiner was exposed to target
specific DAG combines under an interface 'AddToWorklist' which was
implemented by in turn calling 'AddToWorkList' in the combiner. This has
sent me circling with the wrong case in grep one too many times.
I chose to normalize on 'Worklist' because that one won the grep-vote
for llvm/lib/... by a hundered hits or so, and it is used in places
relatively "canonical" such as InstCombine's Worklist. Let's all jsut
pick this casing, whether "correct", "good", or "bad" and be
consistent...
[SDAG] Rather than using a narrow test against the one dummy node on the
stack, filter all handle nodes from the DAG combiner worklist.
This will also handle cases where other handle nodes might be
(erroneously) added to the worklist and then cause bugs and explosions
when deleted. For example, when running the legalizer within the DAG
combiner, there are times when other handle nodes are used and can end
up here.
Andrea Di Biagio [Mon, 21 Jul 2014 07:30:54 +0000 (07:30 +0000)]
[DAGCombiner] Improve the shuffle-vector folding logic.
Canonicalize shuffles according to rules:
* shuffle(A, shuffle(A, B)) -> shuffle(shuffle(A,B), A)
* shuffle(B, shuffle(A, B)) -> shuffle(shuffle(A,B), B)
* shuffle(B, shuffle(A, Undef)) -> shuffle(shuffle(A, Undef), B)
This patch helps identifying more shuffle pairs that could be combined reusing
the already existing rules in the DAGCombiner.
Added new test 'combine-vec-shuffle-5.ll' to verify that the canonicalized
shuffles are now folded into a single shuffle node by the DAGCombiner.
Added more test cases to 'combine-vec-shuffle-4.ll'.
Andrea Di Biagio [Mon, 21 Jul 2014 07:28:51 +0000 (07:28 +0000)]
[DAG] Refactor some logic. No functional change.
This patch removes function 'CommuteVectorShuffle' from X86ISelLowering.cpp
and moves its logic into SelectionDAG.cpp as method 'getCommutedVectorShuffles'.
This refactoring is in preperation of an upcoming change to the DAGCombiner.
This patch adds infrastructure support for passing array types
directly. These can be used by the front-end to pass aggregate
types (coerced to an appropriate array type). The details of the
array type being used inform the back-end about ABI-relevant
properties. Specifically, the array element type encodes:
- whether the parameter should be passed in FPRs, VRs, or just
GPRs/stack slots (for float / vector / integer element types,
respectively)
- what the alignment requirements of the parameter are when passed in
GPRs/stack slots (8 for float / 16 for vector / the element type
size for integer element types) -- this corresponds to the
"byval align" field
Using the infrastructure provided by this patch, a companion patch
to clang will enable two features:
- In the ELFv2 ABI, pass (and return) "homogeneous" floating-point
or vector aggregates in FPRs and VRs (this is similar to the ARM
homogeneous aggregate ABI)
- As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates
that fit fully in registers without using the "byval" mechanism
The patch uses the functionArgumentNeedsConsecutiveRegisters callback
to encode that special treatment is required for all directly-passed
array types. The isInConsecutiveRegs / isInConsecutiveRegsLast bits set
as a results are then used to implement the required size and alignment
rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc.
As a related change, the ABI routines have to be modified to support
passing floating-point types in GPRs. This is necessary because with
homogeneous aggregates of 4-byte float type we can now run out of FPRs
*before* we run out of the 64-byte argument save area that is shadowed
by GPRs. Any extra floating-point arguments that no longer fit in FPRs
must now be passed in GPRs until we run out of those too.
Note that there was already code to pass floating-point arguments in
GPRs used with vararg parameters, which was done by writing the argument
out to the argument save area first and then reloading into GPRs. The
patch re-implements this, however, in favor of code packing float arguments
directly via extension/truncation, BITCAST, and BUILD_PAIR operations.
This is required to support the ELFv2 ABI, since we cannot unconditionally
write to the argument save area (which the caller might not have allocated).
The change does, however, affect ELFv1 varags routines too; but even here
the overall effect should be advantageous: Instead of loading the argument
into the FPR, then storing the argument to the stack slot, and finally
reloading the argument from the stack slot into a GPR, the new code now
just loads the argument into the FPR, and subsequently loads the argument
into the GPR (via BITCAST). That BITCAST might imply a save/reload from
a stack temporary (in which case we're no worse than before); but it
might be implemented more efficiently in some cases.
The final part of the patch enables up to 8 FPRs and VRs for argument
return in PPCCallingConv.td; this is required to support returning
ELFv2 homogeneous aggregates. (Note that this doesn't affect other ABIs
since LLVM wil only look for which register to use if the parameter is
marked as "direct" return anyway.)