Kevin Enderby [Wed, 16 Jul 2014 17:38:26 +0000 (17:38 +0000)]
Add the "-x" flag to llvm-nm for Mach-O files that prints the fields of a symbol in hex
(generally used for debugging the tools). This is the same functionality as Darwin’s
nm(1) "-x" flag.
Tim Northover [Wed, 16 Jul 2014 15:37:24 +0000 (15:37 +0000)]
CodeGen: don't form illegal EXTLOAD operations.
It turns out that in most cases (the main exception being i1-related
types) once these operations are formed we cannot separate them and
the targets end up having to deal with them whether they want to or
not.
This is not a good situation, and a more reasonable default can be
formed by acknowledging this and having targets leave them as Legal.
Only x86 seems to be affected (other targets don't even try marking
the operation Expand).
Mostly there's no visible change here yet, but it will be useful to
have truly expanded EXTLOADs for MVT::f16 softening support.
Daniel Sanders [Wed, 16 Jul 2014 15:34:07 +0000 (15:34 +0000)]
[mips][fp64a] Temporarily disable odd-numbered double-precision registers when using the FP64A ABI.
Summary:
A few instructions (mostly cvt.d.w and similar) are causing problems with
-mfp64 and -mno-odd-spreg and it looks like fixing it properly may
take several weeks. In the meantime, let's disable the odd-numbered
double-precision registers so that the generated code is at least valid.
The problem is that instructions like cvt.d.w read from the 32-bit low
subregister of a double-precision FPU register. This often leads the compiler
to insert moves that transfer a GPR32 to an FGR32 using mtc1. Such moves
violate the rules against 32-bit writes to odd-numbered FPU registers imposed
by -mno-odd-spreg. By disabling the odd-numbered double-precision registers, it
becomes impossible for the 32-bit low subregister to be odd-numbered.
This fixes numerous test-suite failures when compiling for the FP64A ABI
('-mfp64 -mno-odd-spreg'). There is no LLVM test case because it's difficult to
test that odd-numbered FPU registers are not allocatable. Instead, we depend on
the assembler (GAS and -fintegrated-as) raising errors when the rules are
violated.
Andrea Di Biagio [Wed, 16 Jul 2014 11:29:39 +0000 (11:29 +0000)]
[X86] Add a check for 'isMOVHLPSMask' within method 'isShuffleMaskLegal'.
Before this change, method 'isShuffleMaskLegal' didn't know that shuffles
implementing a 'movhlps' operation were perfectly legal for SSE targets.
This patch adds the missing check for 'isMOVHLPSMask' inside method
'isShuffleMaskLegal' to fix the problem.
This matters because the DAGCombiner conservatively avoids combining a pair
of shuffles if the resulting shuffle node has an illegal mask. Before this
patch, shuffles with a MOVHLPS mask were wrongly considered illegal. This was
the root cause of some poor code generation bugs.
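For illustration, a minimal Python sketch of such a mask test. It assumes 4-lane masks in which indexes 0-3 select from the first operand (%a), 4-7 from the second (%b), and -1 marks an undef lane; under that convention MOVHLPS (high quadword of %b into the low half, high half of %a kept) corresponds to <6, 7, 2, 3>. The helper names echo the commit, but this is not the X86 backend's code, and is_shuffle_mask_legal here is a hypothetical stand-in.

```python
# Sketch: does a 4-lane shuffle mask implement MOVHLPS?
# Assumption: lanes 0-3 index %a, lanes 4-7 index %b, -1 is undef.
MOVHLPS_MASK = (6, 7, 2, 3)  # b[2], b[3], a[2], a[3]

def is_undef_or_equal(val, expected):
    # An undef lane is compatible with any expected index.
    return val < 0 or val == expected

def is_movhlps_mask(mask):
    if len(mask) != 4:
        return False
    return all(is_undef_or_equal(m, e) for m, e in zip(mask, MOVHLPS_MASK))

def is_shuffle_mask_legal(mask, other_checks=()):
    # Hypothetical legality query: without the MOVHLPS test in this list,
    # a perfectly good movhlps mask would be reported as illegal.
    return is_movhlps_mask(mask) or any(check(mask) for check in other_checks)
```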
The "valgrind was whining" comment looked promising as a simpler-to-debug
case of the same errors. However, it appears that the
valgrind complaints the comment was referring to are distinct from the
ones in the frontend, since this updated test isn't complaining for me
under valgrind.
In any case, the disabled tests weren't helping anybody.
Roundtrip the inalloca bit on allocas through bitcode
This was an oversight in the original support. As it is, I stuffed this
bit into the alignment. The alignment is stored in log2 form, so it
doesn't need more than 5 bits, given that Value::MaximumAlignment is
1 << 29.
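A minimal Python sketch of this kind of packing. It assumes the alignment field holds log2(align) + 1 (so 0 can mean "unspecified") and that the inalloca flag sits in bit 5, directly above the five alignment bits; the +1 bias and the exact bit position are assumptions for illustration, since the commit only states that the log2 form needs no more than 5 bits.

```python
MAX_ALIGNMENT = 1 << 29          # Value::MaximumAlignment, per the commit
ALIGN_BITS = 5                   # log2 form fits in 5 bits (0..31)
INALLOCA_BIT = 1 << ALIGN_BITS   # assumed position of the inalloca flag

def encode_alloca_align(align, inalloca):
    # align == 0 means "unspecified"; otherwise it must be a power of two.
    if align == 0:
        field = 0
    else:
        assert align & (align - 1) == 0 and align <= MAX_ALIGNMENT
        field = align.bit_length()   # == log2(align) + 1, at most 30
    assert field < INALLOCA_BIT
    return field | (INALLOCA_BIT if inalloca else 0)

def decode_alloca_align(record):
    field = record & (INALLOCA_BIT - 1)
    align = 0 if field == 0 else 1 << (field - 1)
    return align, bool(record & INALLOCA_BIT)
```

Because 1 << 29 encodes as 30, the alignment never reaches bit 5, so the flag round-trips without disturbing it.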
Manuel Jacob [Wed, 16 Jul 2014 01:34:21 +0000 (01:34 +0000)]
Fix comment in InstCombiner::visitAddrSpaceCast.
In the original version of the patch the behaviour was as described in
the comment. This behaviour was changed before committing, without
updating the comment.
Hans Wennborg [Wed, 16 Jul 2014 00:52:11 +0000 (00:52 +0000)]
Perform wildcard expansion in Process::GetArgumentVector on Windows (PR17098)
On Windows, wildcard expansion isn't performed by the shell, but left to the
program itself. The common way to do this is to link with setargv.obj, which
performs the expansion on argc/argv before main is entered. However, we don't
use argv in Clang on Windows, but instead call GetCommandLineW so we can handle
Unicode arguments. This means we have to do wildcard expansion ourselves.
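A rough Python sketch of what such an expansion step does. It is not LLVM's implementation: list_dir is a hypothetical hook so the sketch runs without touching the disk, only the last path component is treated as a pattern, the program name is left alone, and a pattern with no matches is passed through unchanged (my understanding of setargv.obj's behaviour, treated here as an assumption). fnmatch.fnmatchcase also accepts [...] sets, which Windows wildcards do not; that is one of the sketch's simplifications.

```python
import fnmatch

def has_wildcard(arg):
    return "*" in arg or "?" in arg

def expand_wildcards(args, list_dir):
    """Expand '*' and '?' in command-line arguments (illustrative only).

    list_dir(directory) is a hypothetical callback returning entry names.
    """
    out = args[:1]  # leave the program name alone
    for arg in args[1:]:
        if not has_wildcard(arg):
            out.append(arg)
            continue
        # Split off the directory part; only the file name is a pattern.
        sep = max(arg.rfind("\\"), arg.rfind("/"))
        directory, pattern = arg[:sep + 1], arg[sep + 1:]
        matches = sorted(name for name in list_dir(directory or ".")
                         if fnmatch.fnmatchcase(name, pattern))
        if matches:
            out.extend(directory + m for m in matches)
        else:
            out.append(arg)  # assumed: unmatched patterns pass through
    return out
```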
Emit warnings if vectorization is forced and fails.
This patch modifies the existing DiagnosticInfo system to create a generic base
class that is subclassed to produce diagnostic-based warnings. This is used by
the loop vectorizer to trigger a warning when vectorization is forced and
fails. Several tests have been added to verify this behavior.
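The shape of that design can be sketched in Python: a generic base class that knows its severity and how to render itself, and a subclass for "a forced transformation failed". All class names, the pass name and the message text below are hypothetical; they illustrate the pattern, not LLVM's DiagnosticInfo API.

```python
from enum import Enum

class Severity(Enum):
    ERROR = "error"
    WARNING = "warning"
    REMARK = "remark"

class DiagnosticInfoBase:
    """Hypothetical generic base: a severity plus a rendered message."""
    def __init__(self, severity):
        self.severity = severity

    def message(self):
        raise NotImplementedError

    def render(self):
        return f"{self.severity.value}: {self.message()}"

class OptimizationFailureWarning(DiagnosticInfoBase):
    """Hypothetical subclass reporting that a forced transform did not happen."""
    def __init__(self, pass_name, location, reason):
        super().__init__(Severity.WARNING)
        self.pass_name, self.location, self.reason = pass_name, location, reason

    def message(self):
        return f"{self.location}: {self.pass_name}: {self.reason}"

def maybe_warn(forced, succeeded, emit, location):
    # Only a *forced* transformation that fails is worth a warning.
    if forced and not succeeded:
        emit(OptimizationFailureWarning("loop-vectorize", location,
                                        "loop not vectorized"))
```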
Move Post RA Scheduling flag bit into SchedMachineModel
Refactoring; no functional changes intended
Removed PostRAScheduler bits from subtargets (X86, ARM).
Added PostRAScheduler bit to MCSchedModel class.
This bit is set by a CPU's scheduling model (if it exists).
Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86:
a. ARM overrides the CPU's postRA settings by enabling postRA for any subtarget that is either non-Thumb or Thumb2.
b. MIPS overrides the CPU's postRA settings by enabling postRA for everything.
c. PPC overrides the CPU's postRA settings by enabling postRA for everything.
d. X86 is the only target that actually has postRA specified via sched model info.
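The resulting decision can be modelled in a few lines of Python: a scheduling-model bit supplies the default, and a target may override it. The class and field names are hypothetical, and the sketch leaves out the AntiDepBreakMode, CriticalPathRCs and OptLevel hooks mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SchedModel:
    post_ra_scheduler: bool = False  # bit set by a CPU's scheduling model

@dataclass
class Subtarget:
    sched_model: SchedModel
    # Hypothetical per-target override; None means "defer to the model".
    post_ra_override: Optional[Callable[["Subtarget"], bool]] = None

def enable_post_ra_scheduler(st: Subtarget) -> bool:
    if st.post_ra_override is not None:
        return st.post_ra_override(st)
    return st.sched_model.post_ra_scheduler
```

In this model an X86-like subtarget relies on the model bit alone, while a MIPS- or PPC-like subtarget installs an override that always returns True.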
David Blaikie [Tue, 15 Jul 2014 21:06:37 +0000 (21:06 +0000)]
Try out FileCheck's new (in r212810) -implicit-check-not in a DebugInfo test.
Just tried this on a few tests and this was the only one that was
easily ported to use the new feature, so we'll go with that for now.
Hopefully it can act as inspiration/reminder for other tests.
Not all debug info tests need to check for every DW_TAG or NULL child
terminator, but perhaps they should (just to ensure they don't accidentally
end up with tags nested inside other tags without the test failing, for example).
Alp Toker [Tue, 15 Jul 2014 21:04:12 +0000 (21:04 +0000)]
CMake: fix cross-compilation with external source directories
This adds support for building native artifacts when cross-compiling using the
popular side-by-side source directory layout (no symlinks, no nested
repositories).
The registration scheme used in r211652 violated the read-only contract of
MemoryBuffer. This caused crashes in llvm-rtdyld where Mach-O objects were backed
by read-only mmap'd memory.
Actually update the changed indexes in the map portion of `MapVector`
when erasing from the middle. Add a unit test that checks for this.
Note that `MapVector::erase()` is a linear time operation (it was and
still is). I'll commit a new method in a moment called
`MapVector::remove_if()` that deletes multiple entries in linear time,
which should be slightly less painful.
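A small Python model of the structure makes the bug easy to see: a map from key to position plus a vector of pairs. Erasing from the middle shifts every later pair down by one, so every stored index above the erased slot must be decremented. This is an illustration, not LLVM's C++ class, and the remove_if shown here is an assumed single-pass rebuild rather than the method announced in the commit.

```python
class MapVector:
    """Insertion-ordered map: key -> index dict plus a list of pairs."""
    def __init__(self):
        self.index = {}   # key -> position in self.items
        self.items = []   # (key, value) pairs in insertion order

    def __setitem__(self, key, value):
        if key in self.index:
            self.items[self.index[key]] = (key, value)
        else:
            self.index[key] = len(self.items)
            self.items.append((key, value))

    def __getitem__(self, key):
        return self.items[self.index[key]][1]

    def erase(self, key):
        # Linear time: later pairs move down one slot, so their
        # recorded indexes must move down too (the part that was missing).
        pos = self.index.pop(key)
        del self.items[pos]
        for k, i in list(self.index.items()):
            if i > pos:
                self.index[k] = i - 1

    def remove_if(self, pred):
        # One pass rebuilding both structures: linear however many go.
        self.items = [(k, v) for k, v in self.items if not pred(k, v)]
        self.index = {k: i for i, (k, _) in enumerate(self.items)}
```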
Chris Bieneman [Tue, 15 Jul 2014 17:18:41 +0000 (17:18 +0000)]
[RegisterCoalescer] Add new subtarget hook allowing targets to opt-out of coalescing.
The coalescer is very aggressive at propagating constraints on the register classes, and the register allocator doesn’t know how to split sub-registers later to recover. This patch provides an escape valve for targets that encounter this problem to limit coalescing.
This patch also implements the hook for ARM to lower register pressure when using lots of large register classes. This works around PR18825.
[AArch64] Add negative tests for the SIMD & FP LDP instructions.
LDP is unpredictable if the registers in the pair are identical; these tests check that we don't assemble such instructions and instead error out.
Andrea Di Biagio [Tue, 15 Jul 2014 13:26:28 +0000 (13:26 +0000)]
[DAGCombiner] Add more rules to fold shuffles.
This patch adds two new rules to the DAGCombiner:
1. shuffle (shuffle A, Undef, M0), B, M1 -> shuffle A, B, M2
2. shuffle (shuffle A, Undef, M0), A, M1 -> shuffle A, Undef, M2
We only do this if the combined shuffle is legal for the target.
Example:
;;
define <4 x float> @test(<4 x float> %a, <4 x float> %b) {
%1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32><i32 6, i32 0, i32 1, i32 7>
%2 = shufflevector <4 x float> %1, <4 x float> %b, <4 x i32><i32 1, i32 2, i32 4, i32 5>
ret <4 x float> %2
}
;;
(using llc -mcpu=corei7 -march=x86-64)
Before, the x86 backend generated:
pshufd $120, %xmm0, %xmm0
shufps $-108, %xmm0, %xmm1
movaps %xmm1, %xmm0
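The two rules can be sketched in Python as mask arithmetic. This is an illustrative model, not DAGCombiner code: it assumes 4-lane masks where 0-3 select from the first operand, 4-7 from the second, and -1 is undef, and it omits the target legality check the commit requires. Applied to the example, M0 = <6, 0, 1, 7> and M1 = <1, 2, 4, 5> fold to <0, 1, 4, 5>, i.e. a[0], a[1], b[0], b[1].

```python
def apply_shuffle(x, y, mask):
    """Evaluate a shufflevector; -1 (or a lane of an undef input) gives None."""
    both = list(x) + list(y)
    return [None if m < 0 else both[m] for m in mask]

def fold_undef_inner(m0, m1, n, outer_is_a):
    """Fold shuffle(shuffle(A, undef, M0), X, M1) into one mask M2.

    outer_is_a=False: X is B, result is shuffle(A, B, M2)      (rule 1)
    outer_is_a=True:  X is A, result is shuffle(A, undef, M2)  (rule 2)
    """
    m2 = []
    for idx in m1:
        if idx < 0:
            m2.append(-1)
        elif idx < n:                 # lane picked by the inner shuffle
            inner = m0[idx]
            m2.append(inner if 0 <= inner < n else -1)  # undef stays undef
        elif outer_is_a:              # lane idx - n of A itself
            m2.append(idx - n)
        else:                         # lane of B keeps its B-relative index
            m2.append(idx)
    return m2
```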
Andrea Di Biagio [Tue, 15 Jul 2014 10:53:44 +0000 (10:53 +0000)]
Silence a warning in conditional expression.
Fixes a gcc warning caused by a typo. A redundant assignment operation was
accidentally used as the third operand of a conditional expression.
No functional change intended.
This implements the FastLowerCall hook, which is based on the DoSelectCall
function. The implementation is very similar, but the target-independent call
lowering part has been factored out.
This should also enable patchpoint intrinsic lowering for FastISel on X86.
David Majnemer [Tue, 15 Jul 2014 02:34:12 +0000 (02:34 +0000)]
CodeGen: Handle ConstantVector and undef in WinCOFF constant pools
The constant pool entry code for WinCOFF assumed that vector constants
would be formed using ConstantDataVector; it did not expect to see a
ConstantVector. Furthermore, it did not expect undef as one of the
elements of the vector.
ConstantVectors should be handled like ConstantDataVectors, and undef
elements should be treated as zero.
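A Python sketch of treating both vector forms uniformly when turning a constant into the hex string that keys its pool entry. Both forms are just a list of element bit patterns here, and undef elements (None) become zero as described. The lane ordering (highest lane first) and the zero-padded lowercase hex per element are illustrative assumptions, not confirmed by the commit.

```python
def element_to_hex(value, bits):
    # Undef (None) is treated as zero; other values are masked to width.
    v = 0 if value is None else value & ((1 << bits) - 1)
    return format(v, "0{}x".format(bits // 4))

def vector_constant_to_hex(elements, bits):
    # Assumed ordering: highest lane first, so the string reads like the
    # whole vector printed as one integer, most significant digit first.
    return "".join(element_to_hex(e, bits) for e in reversed(elements))
```

For example, a <4 x i32> with an undef second lane, [1, undef, 0x3f800000, 2], would be keyed as 000000023f8000000000000000000001.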
[FastISel] Insert patchpoint instruction before the target generated call instruction.
The patchpoint instruction should have been inserted before the target
generated call instruction to be inside the ADJSTACKDOWN/ADJSTACKUP call
sequence window.
[FastISel] Fix patchpoint lowering to set the result register.
Always update the value map with the result register (if there is one), for the
patchpoint instruction we created to replace the target-specific call
instruction.
Matt Arsenault [Tue, 15 Jul 2014 02:06:31 +0000 (02:06 +0000)]
R600: Add dag combine for copy of an illegal type.
This helps avoid redundant instructions to unpack and repack
the vectors. Ideally we could recognize that pattern and eliminate
it. Currently v4i8 and other small element type vectors are scalarized,
so this has the added bonus of avoiding that.
Andrea Di Biagio [Tue, 15 Jul 2014 01:29:27 +0000 (01:29 +0000)]
Improve test 'CodeGen/X86/combine-vec-shuffle-3.ll'.
Now functions 'test4', 'test9', 'test14' and 'test19' correctly perform
a move of two packed values from the high quadword of vector %b to the low
quadword of vector %a (movhlps idiom).
Andrea Di Biagio [Tue, 15 Jul 2014 00:02:32 +0000 (00:02 +0000)]
[DAGCombiner] Avoid calling method 'isShuffleMaskLegal' on illegal vector types.
This patch fixes a crasher in method 'DAGCombiner::visitOR' due to an invalid
call to method 'isShuffleMaskLegal'. On x86, method 'isShuffleMaskLegal'
always expects a legal vector value type as input.
With this patch, we immediately check if the input OR dag node has a legal
vector type; we only try to fold an OR dag node into a single shufflevector
if we know that the resulting shuffle will have a legal type.
This is to avoid calling method 'isShuffleMaskLegal' on a potentially
illegal vector value type.
Added a new test-case to file 'CodeGen/X86/combine-or.ll' to verify that
DAGCombiner doesn't crash in the attempt to check/combine an OR between shuffles
with illegal types.
Lang Hames [Mon, 14 Jul 2014 23:19:50 +0000 (23:19 +0000)]
[RuntimeDyld] Handle endianness differences between the host and target while
reading the magic numbers of MachO files in RuntimeDyld.
This is required now that we're testing cross-platform JITing (via
RuntimeDyldChecker), and should fix some issues that David Fang has seen on PPC
builds.
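A Python sketch of a host-independent magic check. The four constants are the standard Mach-O magic values from <mach-o/loader.h>; the function itself is illustrative, not RuntimeDyld's code. Because struct is told the byte order explicitly, the result is the same on any host, and seeing the byte-swapped ("CIGAM") constant means the file was written big-endian.

```python
import struct

MH_MAGIC, MH_MAGIC_64 = 0xFEEDFACE, 0xFEEDFACF  # magic as a native word
MH_CIGAM, MH_CIGAM_64 = 0xCEFAEDFE, 0xCFFAEDFE  # the same, byte-swapped

def macho_magic_info(header):
    """Return (is_64_bit, is_little_endian) for a Mach-O header."""
    # Read the first four bytes as a little-endian word, regardless of host.
    (word,) = struct.unpack_from("<I", header, 0)
    if word in (MH_MAGIC, MH_MAGIC_64):
        return word == MH_MAGIC_64, True
    if word in (MH_CIGAM, MH_CIGAM_64):
        return word == MH_CIGAM_64, False
    raise ValueError("not a Mach-O file")
```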
Adam Nemet [Mon, 14 Jul 2014 23:18:39 +0000 (23:18 +0000)]
[X86] Specify all TSFlags bit-offsets symbolically
No functional change.
The offsets for the other bitfields are specified symbolically. I need to
increase the size for one of the earlier fields which is easier after this
cleanup.
Why these bits are relative to VEXShift is a bit strange but that is for
another cleanup.
I made sure that the values for the enums are unchanged after this change.
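The benefit of symbolic offsets can be sketched in Python: if each field's shift is derived from the previous field's shift plus its width, widening one field automatically moves every later one. The field names and widths below are hypothetical, not the real X86II layout.

```python
# Hypothetical (name, width) layout, in bit order from bit 0 upward.
FIELDS = [("form", 7), ("op_size", 2), ("ad_size", 1),
          ("prefix", 2), ("opcode_map", 3)]

def layout(fields):
    """Derive each field's (shift, mask) from the ones before it."""
    shifts, offset = {}, 0
    for name, width in fields:
        shifts[name] = (offset, (1 << width) - 1)
        offset += width
    return shifts, offset

SHIFTS, TOTAL_BITS = layout(FIELDS)

def encode(**values):
    flags = 0
    for name, value in values.items():
        shift, mask = SHIFTS[name]
        assert 0 <= value <= mask, name
        flags |= value << shift
    return flags

def decode(flags, name):
    shift, mask = SHIFTS[name]
    return (flags >> shift) & mask
```

Growing "form" from 7 to 8 bits in FIELDS is then a one-number change: every later shift moves up by one with no hand-edited constants.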
David Majnemer [Mon, 14 Jul 2014 22:57:27 +0000 (22:57 +0000)]
CodeGen: Stick constant pool entries in COMDAT sections for WinCOFF
COFF lacks a feature that other object file formats support: mergeable
sections.
To work around this, MSVC sticks constant pool entries in special COMDAT
sections so that each constant is in its own section. This permits
unused constants to be dropped and it also allows duplicate constants in
different translation units to get merged together.
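What this buys at link time can be simulated in Python: sections keyed by a name derived from the constant's value collapse to one copy, and unreferenced ones are discarded. This is a toy model of COMDAT "pick any" plus dead-section stripping, not a linker. The `__real@<hex>` naming is my understanding of MSVC's convention for double constants and should be treated as an assumption.

```python
def link_constant_pools(object_files, referenced):
    """Toy COMDAT merge: keep one copy per name, drop unreferenced ones.

    object_files: one dict per translation unit, mapping a COMDAT section
    name (derived from the constant's value) to its bytes.
    referenced: the set of section names some code actually uses.
    """
    kept = {}
    for sections in object_files:
        for name, data in sections.items():
            # Same name implies same value, so any copy will do.
            kept.setdefault(name, data)
    return {name: data for name, data in kept.items() if name in referenced}
```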
Andrea Di Biagio [Mon, 14 Jul 2014 22:46:26 +0000 (22:46 +0000)]
[DAGCombiner] Add more rules to combine shuffle vector dag nodes.
This patch teaches the DAGCombiner how to fold a pair of shuffles
according to the following rules:
1. shuffle(shuffle(A, B, M0), B, M1) -> shuffle(A, B, M2)
2. shuffle(shuffle(A, B, M0), A, M1) -> shuffle(A, B, M3)
The new rules only trigger if the resulting shuffle has a legal type and a
legal mask.
Added test 'combine-vec-shuffle-3.ll' to verify that DAGCombiner correctly
folds shuffles on x86 when the resulting mask is legal. Also added some negative
cases to verify that we avoid introducing illegal shuffles.
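A Python sketch of how the combined mask can be computed for both rules. It assumes 4-lane masks where 0..n-1 name lanes of A, n..2n-1 name lanes of B, and -1 is undef; it omits the type and mask legality checks, and is an illustration rather than DAGCombiner code.

```python
def apply_shuffle(x, y, mask):
    """Evaluate a shufflevector over two lists; -1 gives an undef (None) lane."""
    both = list(x) + list(y)
    return [None if m < 0 else both[m] for m in mask]

def fold_shuffles(m0, m1, n, outer_is_a):
    """Fold shuffle(shuffle(A, B, M0), X, M1) into shuffle(A, B, M').

    X is B for rule 1 (outer_is_a=False) or A for rule 2 (outer_is_a=True).
    """
    m2 = []
    for idx in m1:
        if idx < 0:
            m2.append(-1)
        elif idx < n:           # reuse whatever lane the inner shuffle took
            m2.append(m0[idx])
        elif outer_is_a:        # lane idx - n of A
            m2.append(idx - n)
        else:                   # lane idx - n of B: already B-relative
            m2.append(idx)
    return m2
```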
David Majnemer [Mon, 14 Jul 2014 21:56:54 +0000 (21:56 +0000)]
ADT: Surface LowerCase argument for utohexstr
The underlying function, utohex_buffer, already supports an argument for
deciding if the hex characters should be upper or lower case. Expose an
identical argument for utohexstr.
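A Python model of the behaviour being exposed: render an unsigned 64-bit value as hex digits with no prefix or padding, choosing the digit case via a flag. In C++ this presumably means utohexstr gains a defaulted bool LowerCase parameter; the upper-case default modelled here is an assumption about existing behaviour.

```python
def utohexstr(x, lower_case=False):
    """Hex digits of an unsigned 64-bit value, no '0x' prefix, no padding."""
    assert 0 <= x < 1 << 64
    digits = "0123456789abcdef" if lower_case else "0123456789ABCDEF"
    if x == 0:
        return "0"
    out = []
    while x:
        out.append(digits[x & 0xF])  # least significant nibble first
        x >>= 4
    return "".join(reversed(out))
```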
Support: Fix option handling when using cl::Required with aliasopt
Until now, attempting to create an alias of a required option would
complain if the user supplied the alias, because the required option
didn't have a value. Similarly, if you said the alias was required,
then using the base option would complain that the alias wasn't
supplied. Lastly, if you put required on both, *neither* option would
work.
By changing alias to overload addOccurrence and setting cl::Required
on the original option, we can get this to behave in a more useful
way. I've also added a test and updated a user that was getting this
wrong.
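A Python model of the approach: the alias owns no value and forwards every occurrence to its target, so marking only the base option as required is satisfied by either spelling. The classes and parse function are hypothetical and are not the cl:: API.

```python
class Option:
    def __init__(self, name, required=False):
        self.name, self.required = name, required
        self.occurrences, self.value = 0, None

    def add_occurrence(self, value):
        self.occurrences += 1
        self.value = value

class Alias(Option):
    """An alias is never required itself; it forwards to its target."""
    def __init__(self, name, target):
        super().__init__(name)
        self.target = target

    def add_occurrence(self, value):
        self.target.add_occurrence(value)

def parse(argv, options):
    by_name = {opt.name: opt for opt in options}
    for arg in argv:
        name, _, value = arg.lstrip("-").partition("=")
        by_name[name].add_occurrence(value)
    # Only options explicitly marked required are checked.
    missing = [o.name for o in options if o.required and o.occurrences == 0]
    if missing:
        raise ValueError("missing required option(s): " + ", ".join(missing))
```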
David Majnemer [Mon, 14 Jul 2014 20:38:45 +0000 (20:38 +0000)]
InstSimplify: Correct sdiv x / -1
Determining the bounds of x / -1 would start off with us dividing it by
INT_MIN. Suffice it to say, this would not work very well.
Instead, handle it upfront by checking for -1 and mapping it to the
range [INT_MIN + 1, INT_MAX]. This means that the result of our
division can be any value other than INT_MIN.
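A Python sketch of bounding `sdiv x, C` over all width-bit x, checked by brute force at small widths. It is not the InstSimplify code. Signed division truncates toward zero and is monotone in x for a fixed C, so the extremes come from x = INT_MIN and x = INT_MAX. C = -1 needs a special case because INT_MIN / -1 overflows; every other x maps to -x, covering [INT_MIN + 1, INT_MAX].

```python
def sdiv_trunc(a, b):
    """Signed division rounding toward zero, like LLVM's sdiv."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q

def sdiv_by_const_range(c, width):
    """Inclusive [lo, hi] of 'sdiv x, c' over all width-bit x (c != 0)."""
    assert c != 0
    int_min, int_max = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if c == -1:
        # INT_MIN / -1 overflows; all other x give -x.
        return int_min + 1, int_max
    a, b = sdiv_trunc(int_min, c), sdiv_trunc(int_max, c)
    return min(a, b), max(a, b)
```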
David Majnemer [Mon, 14 Jul 2014 19:49:57 +0000 (19:49 +0000)]
InstSimplify: The upper bound of X / C was missing a rounding step
Summary:
When calculating the upper bound of X / -8589934592, we would perform
the following calculation: Floor[INT_MAX / 8589934592]
However, flooring the result would lead us to wrongly conclude
that 1073741824 was not in the set of possible values.
Instead, use the ceiling of the result.
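The arithmetic behind this example, assuming 64-bit integers (so INT_MAX = 2^63 - 1) and noting that 8589934592 = 2^33:

```latex
\begin{aligned}
\left\lfloor \frac{2^{63}-1}{2^{33}} \right\rfloor &= 2^{30} - 1 = 1073741823,\\
\left\lceil \frac{2^{63}-1}{2^{33}} \right\rceil &= 2^{30} = 1073741824,\\
\frac{\mathrm{INT\_MIN}}{-2^{33}} = \frac{-2^{63}}{-2^{33}} &= 2^{30} = 1073741824.
\end{aligned}
```

So x = INT_MIN attains 1073741824, which the floored bound excludes and the ceiling includes.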
We would emit a libcall for a 64-bit atomic on x86 after SVN r212119. This was
due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a
32-bit target. They were added at different times and would result in the
border condition being mishandled.
This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic
operations on x86 at the cost of restoring a long-standing bug in the codegen.
We emit a cmpxchg8b on all x86 targets even where the CPU does not support this
instruction (pre-Pentium CPUs). Although this bug should be fixed, this was
present prior to SVN r212119 and this change, so this is not really introducing
a regression.
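A Python model of the border case, for illustration only. The fixed rule: a 64-bit target inlines atomics up to 128 bits only with cmpxchg16b, while a 32-bit target always assumes cmpxchg8b (64 bits), including on pre-Pentium CPUs, the caveat noted above. buggy_needs_libcall is a hypothetical reconstruction of the described mix-up, where the cmpxchg16b feature bit stood in for cmpxchg8b support on 32-bit targets.

```python
def atomic_needs_libcall(size_bits, is_64bit_target, has_cmpxchg16b):
    """Fixed rule: does an atomic of this width have to become a libcall?"""
    if is_64bit_target:
        max_inline = 128 if has_cmpxchg16b else 64
    else:
        max_inline = 64  # cmpxchg8b assumed present on every x86-32 CPU
    return size_bits > max_inline

def buggy_needs_libcall(size_bits, is_64bit_target, has_cmpxchg16b):
    """Hypothetical model of the bug described above."""
    if is_64bit_target:
        max_inline = 128 if has_cmpxchg16b else 64
    else:
        # Wrong feature bit consulted for the cmpxchg8b question.
        max_inline = 64 if has_cmpxchg16b else 32
    return size_bits > max_inline
```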
Found during Windows unwinding work. This header is indirectly included through
a chain leading through Support/Win64EH.h. Explicitly include the header. NFC.
Tim Northover [Mon, 14 Jul 2014 15:31:13 +0000 (15:31 +0000)]
X86: remove temporary atomicrmw used during lowering.
We construct a temporary "atomicrmw xchg" instruction when lowering atomic
stores for widths that aren't supported natively. This isn't on the top-level
worklist though, so it won't be removed automatically and we have to do it
ourselves once that itself has been lowered.