Eric Christopher [Mon, 14 Oct 2013 21:52:26 +0000 (21:52 +0000)]
Revert part of a fix from 2010, changes since then:
a) x86-64 TLS has been documented
b) the code path should use movq for the correct relocation
to be generated.
I've also added a fixme for the test case that we should improve
the code generated, it should look something like is documented
in the tls abi document.
Andrew Trick [Mon, 14 Oct 2013 20:45:19 +0000 (20:45 +0000)]
LiveRegUnits::removeRegsInMask safety.
Clobbering is exclusive not inclusive on register units.
For liveness, we need to consider all the preserved registers.
e.g. A regmask that clobbers YMM0 may preserve XMM0.
Units are only clobbered when all super-registers are clobbered.
Andrew Trick [Mon, 14 Oct 2013 20:45:17 +0000 (20:45 +0000)]
Use a SparseSet in LiveRegUnits.
Some clients may add block live ins and may track liveness over a
large scope. This guarantees an efficient implementation in all cases
with no memory allocation/deallocation, independent of the number of
target registers. It could be slightly less convenient but is fine in
the expected case.
Manman Ren [Mon, 14 Oct 2013 20:33:57 +0000 (20:33 +0000)]
Debug Info: static member DIE creation.
Clean up creation of static member DIEs. We can create static member DIEs from
two places, so we call getOrCreateStaticMemberDIE from the two places.
getOrCreateStaticMemberDIE will get or create the context DIE first, then it
will check if the DIE already exists, if not, we create the static member DIE
and add it to the context.
Creation of static member DIEs are handled in a similar way as subprogram DIEs.
Will Dietz [Mon, 14 Oct 2013 16:57:17 +0000 (16:57 +0000)]
MachineSink: Fix and tweak critical-edge breaking heuristic.
Per original comment, the intention of this loop
is to go ahead and break the critical edge
(in order to sink this instruction) if there's
reason to believe doing so might "unblock" the
sinking of additional instructions that define
registers used by this one. The idea is that if
we have a few instructions to sink "together"
breaking the edge might be worthwhile.
This commit makes a few small changes
to help better realize this goal:
First, modify the loop to ignore registers
defined by this instruction. We don't
sink definitions of physical registers,
and sinking an SSA definition isn't
going to unblock an upstream instruction.
Second, ignore uses of physical registers.
Instructions that define physical registers are
rejected for sinking, and so moving this one
won't enable moving any defining instructions.
As an added bonus, while virtual register
use-def chains are generally small due
to SSA goodness, iteration over the uses
and definitions (used by hasOneNonDBGUse)
for physical registers like EFLAGS
can be rather expensive in practice.
(This is the original reason for looking at this)
Finally, to keep things simple continue
to only consider this trick for registers that
have a single use (via hasOneNonDBGUse),
but to avoid spuriously breaking critical edges
only do so if the definition resides
in the same MBB and therefore this one directly
blocks it from being sunk as well.
If sinking them together is meant to be,
let the iterative nature of this pass
sink the definition into this block first.
Update tests to accomodate this change,
add new testcase where sinking avoids pipeline stalls.
Evgeniy Stepanov [Mon, 14 Oct 2013 15:16:25 +0000 (15:16 +0000)]
[msan] Instrument x86.*_cvt* intrinsics.
Currently MSan checks that arguments of *cvt* intrinsics are fully initialized.
That's too much to ask: some of them only operate on lower half, or even
quarter, of the input register.
Matheus Almeida [Mon, 14 Oct 2013 11:49:30 +0000 (11:49 +0000)]
[mips][msa] Direct Object Emission of INSERT.{B,H,W} instruction.
INSERT is the first type of MSA instruction that requires a change to the way
MSA registers are parsed. This happens because MSA registers may be suffixed by
an index in the form of an immediate or a general purpose register. The changes
to parseMSARegs reflect that requirement.
Craig Topper [Mon, 14 Oct 2013 04:55:01 +0000 (04:55 +0000)]
Allow pinsrw/pinsrb/pextrb/pextrw/movmskps/movmskpd/pmovmskb/extractps instructions to parse either GR32 or GR64 without resorting to duplicating instructions.
Craig Topper [Mon, 14 Oct 2013 01:42:32 +0000 (01:42 +0000)]
Add disassembler support for SSE4.1 register/register form of PEXTRW. There is a shorter encoding that was part of SSE2, but a memory form was added in SSE4.1. This is the register form of that encoding.
Craig Topper [Mon, 14 Oct 2013 00:24:33 +0000 (00:24 +0000)]
Don't use 64-bit versions of MOVMSKPD in CodeGen. The instructions only produce a 1-bit result so we can just use SUBREG_TO_REG to extend the 32-bit versions.
Will Dietz [Sun, 13 Oct 2013 22:09:26 +0000 (22:09 +0000)]
MC: Don't assume incoming StringRef's are null terminated.
This can happen when processing command line arguments, which
are often stored as std::string's and later turned into
StringRef's via std::string::data(). Unfortunately this
is not guaranteed to return a null-terminated string
until C++11, causing breakage on platforms that don't do this.
David Majnemer [Sun, 13 Oct 2013 10:34:21 +0000 (10:34 +0000)]
Windows: Use GetModuleHandleEx instead of LoadLibrary
We were using an anti-pattern of:
- LoadLibrary
- GetProcAddress
- FreeLibrary
This is problematic because of several reasons:
- We are holding on to pointers into a library we just unloaded.
- Calling LoadLibrary results in an increase in the reference count of
the library in question and any libraries that it depends on and
so-on and so-forth. This is none too quick.
Instead, use GetModuleHandleEx with GET_MODULE_HANDLE_EX_FLAG_PIN. This
is done because because we didn't bring the reference for the library
into existence and therefor shouldn't count on it being around later.
SLPVectorizer: Sort PHINodes based on their opcode
Before this patch we relied on the order of phi nodes when we looked for phi
nodes of the same type. This could prevent vectorization of cases where there
was a phi node of a second type in between phi nodes of some type.
This is important for vectorization of an internal graphics kernel. On the test
suite + external on x86_64 (and on a run on armv7s) it showed no impact on
either performance or compile time.
Reed Kotler [Sat, 12 Oct 2013 02:19:08 +0000 (02:19 +0000)]
For Mips16, start to consolidate all forms of 32 bit literal loading so that
they can be better handled and optimized in the Mips16 constant island code.
Andrew Kaylor [Fri, 11 Oct 2013 22:47:10 +0000 (22:47 +0000)]
Fixing problems in lli's RemoteMemoryManager.
This fixes a problem from a previous check-in where a return value was omitted.
Previously the remote/stubs-remote.ll and remote/stubs-sm-pic.ll tests were reporting passes, but they should have been failing. Those tests attempt to link against an external symbol and remote symbol resolution is not supported. The old RemoteMemoryManager implementation resulted in local symbols being used for resolution and the child process crashed but the test didn't notice. With this check-in remote symbol resolution fails, and so the test (correctly) fails.
Matthias Braun [Fri, 11 Oct 2013 19:04:37 +0000 (19:04 +0000)]
Remove kill flags after if conversion if necessary
When if converting something like:
true:
... = R0<kill>
false:
... = R0<kill>
then the instructions of the true block must not have a <kill> flag
anymore, as the instruction of the false block follow and do still read
the R0 value.
Specifically this patch determines the set of register live-in in the
false block (possibly after simulating the liveness changes of the
duplicated instructions). Each of these live-in registers mustn't be
killed.
Stephen Lin [Fri, 11 Oct 2013 18:38:36 +0000 (18:38 +0000)]
Really fix CHECK-LABEL and CHECK-DAG interaction. This actually just restores the initial implementation that was in r186162 but got lost in some subsequent refactoring. More explicit variable names and comments are present now to hopefully prevent repeat regression, as well as another test.
Quentin Colombet [Fri, 11 Oct 2013 18:29:42 +0000 (18:29 +0000)]
[DAGCombiner] Reapply load slicing (192471) with a test that explicitly set sse4.2 support.
This should fix the buildbots.
Original commit message:
[DAGCombiner] Slice a big load in two loads when the element are next to each
other in memory and the target has paired load and performs post-isel loads
combining.
E.g., this optimization will transform something like this:
a = load i64* addr
b = trunc i64 a to i32
c = lshr i64 a, 32
d = trunc i64 c to i32
into:
b = load i32* addr1
d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.
One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.
Quentin Colombet [Fri, 11 Oct 2013 18:01:14 +0000 (18:01 +0000)]
[DAGCombiner] Slice a big load in two loads when the element are next to each
other in memory and the target has paired load and performs post-isel loads
combining.
E.g., this optimization will transform something like this:
a = load i64* addr
b = trunc i64 a to i32
c = lshr i64 a, 32
d = trunc i64 c to i32
into:
b = load i32* addr1
d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.
One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.
Matheus Almeida [Fri, 11 Oct 2013 13:29:36 +0000 (13:29 +0000)]
[mips][msa] Direct Object Emission of INSERT.{B,H,W} instruction.
INSERT is the first type of MSA instruction that requires a change to the way MSA registers are parsed.
This happens because MSA registers may be suffixed by an index in the form of an immediate or a
general purpose register. The changes to parseMSARegs reflect that requirement.
Make AsmPrinter::emitImplicitDef a virtual method so targets can emit custom comments for implicit defs
For NVPTX, this fixes a crash where the emitImplicitDef implementation was expecting physical registers,
while NVPTX uses virtual registers (with a couple of exceptions). Now, the implicit def comment will be
emitted as a true PTX register name. Other targets can use this to customize the output of implicit def
comments.
Robert Lytton [Fri, 11 Oct 2013 10:26:48 +0000 (10:26 +0000)]
XCore target: fix bug in XCoreLowerThreadLocal.cpp
When a ConstantExpr which uses a thread local is part of a PHI node
instruction, the insruction that replaces the ConstantExpr must
be inserted in the predecessor block, in front of the terminator instruction.
If the predecessor block has multiple successors, the edge is first split.