Daniel Sanders [Mon, 1 Dec 2014 13:34:51 +0000 (13:34 +0000)]
Merged from r218036:
[mips] Remove custom versions of CCState::AnalyzeReturn() and CCState::AnalyzeCallReturn().
Summary:
The N32/N64 ABI's return f128 values in $f0 and $f2 for hard-float and $v0 and
$a0 for soft-float. The registers used in the soft-float case differ from the
usual $v0, and $v1 specified for return values.
Both cases were previously handled by duplicating the CCState::AnalyzeReturn()
and CCState::AnalyzeCallReturn() functions and modifying them to delegate to
a different assignment function for f128 and further replace the register type
for the hard-float case. There is a simpler way to do both of these.
We now use the common functions and select an initial assignment function based
on whether the original type is f128 or not. We then handle the hard-float case
using CCBitConvertToType<>.
Daniel Sanders [Mon, 1 Dec 2014 11:38:38 +0000 (11:38 +0000)]
Merged from r215211:
[mips] Invert the abicalls feature bit to be noabicalls so that it's possible for -mno-abicalls to take effect.
Also added the testcase that should have been in r215194.
This behaviour has surprised me a few times now. The problem is that the
generated MipsSubtarget::ParseSubtargetFeatures() contains code like this:
if ((Bits & Mips::FeatureABICalls) != 0) IsABICalls = true;
so '-abicalls' means 'leave it at the default' and '+abicalls' means 'set it to
true'. In this case, (and the similar -modd-spreg case) I'd like the code to be
[mips] Tolerate the use of the %z inline asm operand modifier with non-immediates.
Summary:
Currently, we give an error if %z is used with non-immediates, instead of continuing as if the %z isn't there.
For example, you use the %z operand modifier along with the "Jr" constraints ("r" makes the operand a register, and "J" makes it an immediate, but only if its value is 0).
In this case, you want the compiler to print "$0" if the inline asm input operand turns out to be an immediate zero and you want it to print the register containing the operand, if it's not.
We give an error in the latter case, and we shouldn't (GCC also doesn't).
Tom Stellard [Wed, 5 Nov 2014 20:14:56 +0000 (20:14 +0000)]
Merging r220401:
------------------------------------------------------------------------
r220401 | mail | 2014-10-22 14:18:54 -0400 (Wed, 22 Oct 2014) | 6 lines
test: Make this test runnable in directories with @ in their names
Jenkins likes to use directories with names involving the '@'
character, which breaks the sed expression in this test. Switch to use
'|' on the assumption that it's less likely to show up in a path.
This fixes the generation of broken LLVMExports.cmake file by
the Autoconf/Makefile build system when --enable-shared is passed to
configure.
When --enable_shared is passed the Makefile.rules does not set the
LLVMConfigLibs variable which cmake/modules/Makefile previously relied
on. Now it runs the llvm-config command itself to get the library names.
This still isn't perfect because the generated LLVM targets refer to the
static libraries and not the shared library but that is much larger
problem to fix.
------------------------------------------------------------------------
LLVM generates illegal `rbit r0, #352` instruction for rbit intrinsic.
According to ARM ARM, rbit only takes register as argument, not immediate.
The correct instruction should be rbit <Rd>, <Rm>.
[PPC64] Add missing dependency on X2 to LDinto_toc.
The LDinto_toc pattern has been part of 64-bit PowerPC for a long
time, and represents loading from a memory location into the TOC
register (X2). However, this pattern doesn't explicitly record that
it modifies that register. This patch adds the missing dependency.
It was very surprising to me that this has never shown up as a problem
in the past, and that we only saw this problem recently in a single
scenario when building a self-hosted clang. It turns out that in most
cases we have another dependency present that keeps the LDinto_toc
instruction tied in place. LDinto_toc is used for TOC restore
following a call site, so this is a typical sequence:
Because the LDinto_toc is inserted prior to the ADJCALLSTACKUP, there
is a natural anti-dependency between the two that keeps it in place.
Therefore we don't usually see a problem. However, in one particular
case, one call is followed immediately by another call, and the second
call requires a parameter that is a TOC-relative address. This is the
code sequence:
Now there is nothing to prevent the ADDIStocHA instruction from moving
ahead of the LDinto_toc instruction, and because of the longest-path
heuristic, this is what happens.
With the accompanying patch, %X2 is represented as an implicit def:
So now when the two stack adjustments are removed, ADDIStocHA is
prevented from being moved above LDinto_toc.
I have not yet created a test case for this, because the original
failure occurs on a relatively large function that needs reduction.
However, this is a fairly serious bug, despite its infrequency, and I
wanted to get this patch onto the list as soon as possible so that it
can be considered for a 3.5 backport. I'll work on whittling down a
test case.
[x86] Fix the test case added in r214670 and tweaked in r214674 further.
Fundamentally, there isn't a really portable way to test the constant
pool contents. Instead, pin this test to the bare-metal triple. This
also makes it a 64-bit triple which allows us to only match a single
constant pool rather than two. It can also just hard code the '.' prefix
as the format should be stable now that it has a fixed triple. Finally,
I've switched it to use CHECK-NEXT to be more precise in the instruction
sequence expected and to use variables rather than hard coding decisions
by the register allocator.
------------------------------------------------------------------------
fix for PR20354 - Miscompile of fabs due to vectorization
This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation.
This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon.
There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too.
[PowerPC] Generate unaligned vector loads using intrinsics instead of regular loads
Altivec vector loads on PowerPC have an interesting property: They always load
from an aligned address (by rounding down the address actually provided if
necessary). In order to generate an actual unaligned load, you can generate two
load instructions, one with the original address, one offset by one vector
length, and use a special permutation to extract the bytes desired.
When this was originally implemented, I generated these two loads using regular
ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with
this:
The alignment of a load does not contribute to its identity, and SDNodes
are uniqued. So, imagine that we have some unaligned load, L1, that is not
aligned. The routine will create two loads, L1(aligned) and (L1+16)(aligned).
Further imagine that there had already existed a load (L1+16)(unaligned) with
the same chain operand as the load L1. When (L1+16)(aligned) is created as part
of the lowering of L1, this load *is* also the (L1+16)(unaligned) node, just
now marked as aligned (because the new alignment overwrites the old). But the
original users of (L1+16)(unaligned) now get the data intended for the
permutation yielding the data for L1, and (L1+16)(unaligned) no longer exists
to get its own permutation-based expansion. This was PR19991.
A second potential problem has to do with the MMOs on these loads, which can be
used by AA during instruction scheduling to break chain-based dependencies. If
the new "aligned" loads get the MMO from the original unaligned load, this does
not represent the fact that it will load data from below the original address.
Normally, this would not matter, but this load might be combined with another
load pair for a previous vector, and then the dependency on the otherwise-
ignored lower bytes can matter.
To fix both problems, instead of generating the necessary loads using regular
ISD::LOAD instructions, ppc_altivec_lvx intrinsics are used instead. These are
provided with MMOs with a conservative address range.
Unfortunately, I no longer have a failing test case (since PR19991 was
reported, other changes in CodeGen have forced this bug back into hiding it
again). Nevertheless, this should fix the underlying problem.
------------------------------------------------------------------------
[PowerPC] Swap arguments and adjust shift count for vsldoi on little endian
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments. The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.
Reviewed by Ulrich Weigand.
This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).
X86: restrict combine to when type sizes are safe.
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.
------------------------------------------------------------------------
My original LE implementation of the vsldoi instruction, with its
altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect
shufflevector operations in the LLVM IR. Correct code is generated
because the back end handles the incorrect shufflevector in a
consistent manner.
This patch and a companion patch for Clang correct this problem by
removing the fixup from altivec.h and the corresponding fixup from the
PowerPC back end. Several test cases are also modified to reflect the
now-correct LLVM IR.
[PowerPC] Swap arguments to vpkuhum/vpkuwum on little-endian
In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl*
by swapping the input arguments. As it turns out, the exact same fix
is also required for the vpkuhum/vpkuwum patterns.
This fixes another regression in llvmpipe when vector support is
enabled.
[PowerPC] MULHU/MULHS are not legal for vector types
I ran into some test failures where common code changed vector division
by constant into a multiply-high operation (MULHU). But these are not
implemented by the back-end, so we failed to recognize the insn.
Fixed by marking MULHU/MULHS as Expand for vector types.
This patch refactors code generation of vector comparisons.
This fixes a wrong code-gen bug for ISD::SETGE for floating-point types,
and improves generated code for vector comparisons in general.
Specifically, the patch moves all logic deciding how to implement vector
comparisons into getVCmpInst, which gets two extra boolean outputs
indicating to its caller whether its needs to swap the input operands
and/or negate the result of the comparison. Apart from implementing
these two modifications as directed by getVCmpInst, there is no need
to ever implement vector comparisons in any other manner; in particular,
there is never a need to perform two separate comparisons (e.g. one for
equal and one for greater-than, as code used to do before this patch).
X86: drop relocations on __eh_frame sections globally.
Without this, we produce non-extern relocations when targeting older OS X
versions that ld64 can't cope with in the particular context of __eh_frame
sections (who'd want generic relocation-processing anyway?).
This means that an updated linker (ld64 from Xcode 3.2.6 or later) may be
needed when targeting such platforms with a modern version of LLVM, but this is
probably the case anyway and a reasonable requirement.
We may visit a call that uses an alloca multiple times in callUsesLocalStack, sometimes with IsNocapture true and sometimes with IsNocapture false. We accidentally skipped work we needed to do in the IsNocapture=false case if we were called with IsNocapture=true the first time. Fixes PR20405!
The Microsoft ABI and MSVCRT are considered the canonical C runtime and ABI.
The long double routines are not part of this environment. However, cygwin and
MinGW both provide supplementary implementations. Change the condition to
reflect this reality.
------------------------------------------------------------------------
X86: correct library call setup for Windows itanium
This target is identical to the Windows MSVC (and follows Microsoft ABI for C).
Correct the library call setup for this target. The same set of library calls
are missing on this environment.
------------------------------------------------------------------------
Fix ScalarEvolutionExpander when creating a PHI in a block with duplicate predecessors
It seems that when I fixed this, almost exactly a year ago, I did not quite do
it correctly. When we have duplicate block predecessors, we can indeed not have
different incoming values for the same block, but we *must* have duplicate
entries. So, instead of skipping the duplicates, we explicitly add the
duplicate incoming values.
This is a follow-up to the activity in the bug at
http://llvm.org/bugs/show_bug.cgi?id=18663 . The underlying issue has
to do with how the KILL pseudo-instruction is handled. I defer to
Hal/Jakob/Uli for additional details and background.
This will disable the (bad?) assert, add an associated fixme comment,
and add a pair of tests.
The code change and the pr18663-2.ll test are copied from the referenced
bug. That test does not immediately fail in my environment, but I have
added the pr18663.ll test which does.
(Comment from Hal)
to provide everyone else with some context, this assert was not bad when
it was written. At that time, we only generated KILL pseudo instructions
around subregister copies. This logic, unfortunately, had its own problems.
In r199797, the relevant logic in MachineCopyPropagation was replaced to
generate KILLs for other kinds of copies too. This change in semantics broke
this now-problematic assumption in AggressiveAntiDepBreaker. The
AggressiveAntiDepBreaker really needs a proper cleanup to deal with the
change, but removing the assert (which just allows the function to return
false) is a safe conservative behavior, and should do for the time being.
Constant fold the lanes of the input constant build_vector individually
so we correctly handle when the vector elements are not all the same
constant value.
[NVPTX] Silence a GCC warning found by the buildbots
The cast to NVPTXTargetLowering was missing a 'const', but let's
just access the right pointer through the subtarget anyway.
------------------------------------------------------------------------
[NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabled
With optimizations disabled, we disable the isel patterns for mul.wide; but we
were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE
ISD nodes in DAGCombine if the optimization level is not zero.
------------------------------------------------------------------------
Don't manually (and forcibly) run the verifier on the entire module from
the jump instruction table pass. First, the verifier is already built
into all the tools. The test case is adapted to just run llvm-as
demonstrating that we still catch the broken module. Second, the
verifier is *extremely* slow. This was responsible for very significant
compile time regressions.
If you have deployed a Clang binary anywhere from r210280 to this
commit, you really want to re-deploy.
------------------------------------------------------------------------
Regenerate autoconf, previous updates to the configury haven't
been updating configure.
------------------------------------------------------------------------
Try to fix the bots. If this does not work, I am going to move it to X86 directory.
------------------------------------------------------------------------
SimplifyCFG: fix a bug in switch to table conversion
We use gep to access the global array "switch.table", and the table index
should be treated as unsigned. When the highest bit is 1, this commit
zero-extends the index to an integer type with larger size.
For a switch on i2, we used to generate:
%switch.tableidx = sub i2 %0, -2
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx
It is incorrect when %switch.tableidx is 2 or 3. The fix is to generate
%switch.tableidx = sub i2 %0, -2
%switch.tableidx.zext = zext i2 %switch.tableidx to i3
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext
Even if there's a file called c:\a, we want /? to be preserved as
an option, not expanded to a filename.
------------------------------------------------------------------------
Emit a warning if llvm_map_components_to_libraries() is used noting that its
use is deprecated in favour of llvm_map_components_to_libnames()
Although message(DEPRECATION "msg") would probably be a better fit this
does nothing if CMAKE_ERROR_DEPRECATED and CMAKE_WARNING_DEPRECATED are
both off, which is the default.
------------------------------------------------------------------------
Document the new LLVM CMake interface for building against LLVM
libraries. With many contributions from Brad King.
------------------------------------------------------------------------
[PATCH][PPC64LE] Correct little-endian usage of vmrgh* and vmrgl*.
Because the PowerPC vmrgh* and vmrgl* instructions have a built-in
big-endian bias, it is necessary to swap their inputs in little-endian
mode when using them to implement a vector shuffle. This was
previously missed in the vector LE implementation.
There was already logic to distinguish between unary and "normal"
vmrg* vector shuffles, so this patch extends that logic to use a third
option: "swapped" vmrg* vector shuffles that are used for little
endian in place of the "normal" ones.
I've updated the vec-shuffle-le.ll test to check for the expected
register ordering on the generated instructions.
This bug was discovered when testing the LE and ELFv2 patches for
safety if they were backported to 3.4. A different vectorization
decision was made in 3.4 than on mainline trunk, and that exposed the
problem. I've verified this fix takes care of that issue.
When we had a vector_shuffle where we had an input from each vector, we
could miscompile it because we were assuming the input from V2 wouldn't
be moved from where it was on the vector.
Export LLVM_ENABLE_RTTI and LLVM_ENABLE_EH in LLVMConfig.cmake so
clients of LLVM know if RTTI and/or EH were enabled in the build of
LLVM they are trying to link against.
------------------------------------------------------------------------
Added LLVM_ENABLE_RTTI and LLVM_ENABLE_EH options that allow RTTI and EH
to globally be controlled. Individual targets (e.g. ExceptionDemo) can
still override this by using LLVM_REQUIRE_RTTI and LLVM_REQUIRE_EH if
they need to be compiled with RTTI or exception handling respectively.
------------------------------------------------------------------------
Richard Smith [Tue, 22 Jul 2014 02:32:12 +0000 (02:32 +0000)]
Revert of r213521. This change introduced a non-hermetic test (depending on a
file not in the test/ area). Backing out now so that this test isn't part of
the 3.5 branch.
Original commit message: "TableGen: Allow AddedComplexity values to be negative
[...]"
Hal Finkel [Mon, 21 Jul 2014 21:30:22 +0000 (21:30 +0000)]
Match semantics of PointerMayBeCapturedBefore to its name by default
As it turns out, the capture tracker named CaptureBefore used by AA, and now
available via the PointerMayBeCapturedBefore function, would have been
more-aptly named CapturedBeforeOrAt, because it considers captures at the
instruction provided. This is not always what one wants, and it is difficult to
get the strictly-before behavior given only the current interface. This adds an
additional parameter which controls whether or not you want to include
captures at the provided instruction. The default is not to include the
instruction provided, so that 'Before' matches its name.
GCC believes it may be possible to not return a value from the switch:
lib/Target/R600/SIRegisterInfo.cpp:187:1: warning: control reaches end of non-void function [-Wreturn-type]
Add an unreachable label to indicate that this is not possible and still permit
switch coverage checking.
Replace the result usages while legalizing cmpxchg.
We should update the usages to all of the results;
otherwise, we might get assertion failure or SEGV during
the type legalization of ATOMIC_CMP_SWAP_WITH_SUCCESS
with two or more illegal types.
For example, in the following sequence, both i8 and i1
might be illegal in some target, e.g. armv5, mipsel, mips64el,
Since both i8 and i1 should be legalized, the corresponding
ATOMIC_CMP_SWAP_WITH_SUCCESS dag will be checked/replaced/updated
twice.
If we don't update the usage to *ALL* of the results in the
first round, the DAG for extractvalue might be processed earlier.
The GetPromotedInteger() will result in assertion failure,
because its operand (i.e. the success bit of cmpxchg) is not
promoted beforehand.