Hans Wennborg [Wed, 16 Oct 2013 01:20:40 +0000 (01:20 +0000)]
MC: Better handling of tricky symbol and section names
Because of win32 mangling, we produce symbol and section names with
funny characters in them, most notably @ characters.
MC would choke on trying to parse its own assembly output. This patch addresses
that by:
- Making @ trigger quoting of symbol names
- Also quote section names in the same way
- Just parse section names like other identifiers (to allow for quotes)
- Don't assume @ signifies a symbol variant if it is in a string.
Andrew Trick [Tue, 15 Oct 2013 23:33:07 +0000 (23:33 +0000)]
Enable MI Sched for x86.
This changes the SelectionDAG scheduling preference to source
order. Soon, the SelectionDAG scheduler can be bypassed saving
a nice chunk of compile time.
Performance differences that result from this change are often a
consequence of register coalescing. The register coalescer is far from
perfect. Bugs can be filed for deficiencies.
On x86 SandyBridge/Haswell, the source order schedule is often
preserved, particularly for small blocks.
Register pressure is generally improved over the SD scheduler's ILP
mode. However, we are still able to handle large blocks that require
latency hiding, unlike the SD scheduler's BURR mode. MI scheduler also
attempts to discover the critical path in single-block loops and
adjust heuristics accordingly.
The MI scheduler relies on the new machine model. This is currently
unimplemented for AVX, so we may not be generating the best code yet.
Unit tests are updated so they don't depend on SD scheduling heuristics.
Eric Christopher [Tue, 15 Oct 2013 23:31:38 +0000 (23:31 +0000)]
Make sure we're not attempting to construct a subprogram DIE
twice and just look up the value. Fix the one case where
we were trying to create a subprogram DIE and we should already
have had one. Reflow formatting in collectDeadVariables while fixing.
Eric Christopher [Tue, 15 Oct 2013 23:31:36 +0000 (23:31 +0000)]
Add an assert that we have a scope that matters for methods
and remove a call to getNonCompileUnitScope as a method
shouldn't be in the compile unit scope.
Rui Ueyama [Tue, 15 Oct 2013 22:45:38 +0000 (22:45 +0000)]
Path: Recognize Windows compiled resource file.
Some background: One can pass compiled resource files (.res files) directly
to the linker on Windows. If a resource file is given, the linker will run
"cvtres" command in background to convert the resource file to a COFF file
to link it.
What I'm trying to do with this patch is to make the linker to recognize
the resource file by file magic, so that it can run cvtres command.
Michael Liao [Tue, 15 Oct 2013 17:51:58 +0000 (17:51 +0000)]
Fix PR17546
- Type of index used in extract_vector_elt or insert_vector_elt supposes
to be TLI.getVectorIdxTy() which is pointer type on most targets. It'd
better to truncate (or zero-extend in case it's changed later) it to
mask element type to guarantee they are matching instead of asserting
that.
Michael Liao [Tue, 15 Oct 2013 17:51:02 +0000 (17:51 +0000)]
Fix PR16807
- Lower signed division by constant powers-of-2 to target-independent
DAG operators instead of target-dependent ones to support them better
on targets where vector types are legal but shift operators on that
types are illegal. E.g., on AVX, PSRAW is only available on <8 x i16>
though <16 x i16> is a legal type.
Add AllTargetsBindings sublibrary instead of having static inlines in the llvm-c headers.
This new library will be linked in when using the "all-targets"
component and contains the LLVMInitializeAll* functions.
This means that those functions will exist as real symbols in
the shared library, and can therefore can be called from
bindings that are using ffi the shared library.
David Majnemer [Tue, 15 Oct 2013 08:30:07 +0000 (08:30 +0000)]
docs: Remove incompatibility with Solaris shell
There doesn't seem to be a need in checking if a directory exists if we
will just rm -rf it once we affirm that it does. Instead, just blindly
try to delete it.
Craig Topper [Tue, 15 Oct 2013 05:20:47 +0000 (05:20 +0000)]
Remove x86_sse42_crc32_64_8 intrinsic. It has no functional difference from x86_sse42_crc32_32_8 and was not mapped to a clang builtin. I'm not even sure why this form of the instruction is even called out explicitly in the docs. Also add AutoUpgrade support to convert it into the other intrinsic with appropriate trunc and zext.
Quentin Colombet [Mon, 14 Oct 2013 22:32:09 +0000 (22:32 +0000)]
[X86][FastISel] During X86 fastisel, the address of indirect call was resolved
through bitcast, ptrtoint, and inttoptr instructions. This is valid
only if the related instructions are in that same basic block, otherwise
we may reference variables that were not live accross basic blocks
resulting in undefined virtual registers.
The bug was exposed when both SDISel and FastISel were used within the same
function, i.e., one basic block is issued with FastISel and another with SDISel,
as demonstrated with the testcase.
Andrew Trick [Mon, 14 Oct 2013 22:19:03 +0000 (22:19 +0000)]
Fix the ExecutionDepsFix pass to handle AVX instructions.
This pass is needed to break false dependencies. Without it, unlucky
register assignment can result in wild (5x) swings in
performance. This pass was trying to handle AVX but not getting it
right. AVX doesn't have partial register defs, it has unused register
reads in which the high bits of a source operand are copied into the
unused bits of the dest.
Fixing this requires conservative liveness analysis. This is awkard
because the pass already has its own pseudo-liveness. However, proper
liveness is expensive, and we would like to use a generic utility to
compute it. The fix only invokes liveness on-demand. It is rare to
detect a case that needs undef-read dependence breaking, but when it
happens, it can be needed many times within a very large block.
I think the existing heuristic which uses a register window of 16 is
too conservative for loop-carried false dependencies. If the loop is a
reduction. The out-of-order engine may be able to execute several loop
iterations in parallel. However, I'll leave this tuning exercise for
next time.
Eric Christopher [Mon, 14 Oct 2013 21:52:26 +0000 (21:52 +0000)]
Revert part of a fix from 2010, changes since then:
a) x86-64 TLS has been documented
b) the code path should use movq for the correct relocation
to be generated.
I've also added a fixme for the test case that we should improve
the code generated, it should look something like is documented
in the tls abi document.
Andrew Trick [Mon, 14 Oct 2013 20:45:19 +0000 (20:45 +0000)]
LiveRegUnits::removeRegsInMask safety.
Clobbering is exclusive not inclusive on register units.
For liveness, we need to consider all the preserved registers.
e.g. A regmask that clobbers YMM0 may preserve XMM0.
Units are only clobbered when all super-registers are clobbered.
Andrew Trick [Mon, 14 Oct 2013 20:45:17 +0000 (20:45 +0000)]
Use a SparseSet in LiveRegUnits.
Some clients may add block live ins and may track liveness over a
large scope. This guarantees an efficient implementation in all cases
with no memory allocation/deallocation, independent of the number of
target registers. It could be slightly less convenient but is fine in
the expected case.
Manman Ren [Mon, 14 Oct 2013 20:33:57 +0000 (20:33 +0000)]
Debug Info: static member DIE creation.
Clean up creation of static member DIEs. We can create static member DIEs from
two places, so we call getOrCreateStaticMemberDIE from the two places.
getOrCreateStaticMemberDIE will get or create the context DIE first, then it
will check if the DIE already exists, if not, we create the static member DIE
and add it to the context.
Creation of static member DIEs are handled in a similar way as subprogram DIEs.
Will Dietz [Mon, 14 Oct 2013 16:57:17 +0000 (16:57 +0000)]
MachineSink: Fix and tweak critical-edge breaking heuristic.
Per original comment, the intention of this loop
is to go ahead and break the critical edge
(in order to sink this instruction) if there's
reason to believe doing so might "unblock" the
sinking of additional instructions that define
registers used by this one. The idea is that if
we have a few instructions to sink "together"
breaking the edge might be worthwhile.
This commit makes a few small changes
to help better realize this goal:
First, modify the loop to ignore registers
defined by this instruction. We don't
sink definitions of physical registers,
and sinking an SSA definition isn't
going to unblock an upstream instruction.
Second, ignore uses of physical registers.
Instructions that define physical registers are
rejected for sinking, and so moving this one
won't enable moving any defining instructions.
As an added bonus, while virtual register
use-def chains are generally small due
to SSA goodness, iteration over the uses
and definitions (used by hasOneNonDBGUse)
for physical registers like EFLAGS
can be rather expensive in practice.
(This is the original reason for looking at this)
Finally, to keep things simple continue
to only consider this trick for registers that
have a single use (via hasOneNonDBGUse),
but to avoid spuriously breaking critical edges
only do so if the definition resides
in the same MBB and therefore this one directly
blocks it from being sunk as well.
If sinking them together is meant to be,
let the iterative nature of this pass
sink the definition into this block first.
Update tests to accomodate this change,
add new testcase where sinking avoids pipeline stalls.
Evgeniy Stepanov [Mon, 14 Oct 2013 15:16:25 +0000 (15:16 +0000)]
[msan] Instrument x86.*_cvt* intrinsics.
Currently MSan checks that arguments of *cvt* intrinsics are fully initialized.
That's too much to ask: some of them only operate on lower half, or even
quarter, of the input register.
Matheus Almeida [Mon, 14 Oct 2013 11:49:30 +0000 (11:49 +0000)]
[mips][msa] Direct Object Emission of INSERT.{B,H,W} instruction.
INSERT is the first type of MSA instruction that requires a change to the way
MSA registers are parsed. This happens because MSA registers may be suffixed by
an index in the form of an immediate or a general purpose register. The changes
to parseMSARegs reflect that requirement.
Craig Topper [Mon, 14 Oct 2013 04:55:01 +0000 (04:55 +0000)]
Allow pinsrw/pinsrb/pextrb/pextrw/movmskps/movmskpd/pmovmskb/extractps instructions to parse either GR32 or GR64 without resorting to duplicating instructions.
Craig Topper [Mon, 14 Oct 2013 01:42:32 +0000 (01:42 +0000)]
Add disassembler support for SSE4.1 register/register form of PEXTRW. There is a shorter encoding that was part of SSE2, but a memory form was added in SSE4.1. This is the register form of that encoding.
Craig Topper [Mon, 14 Oct 2013 00:24:33 +0000 (00:24 +0000)]
Don't use 64-bit versions of MOVMSKPD in CodeGen. The instructions only produce a 1-bit result so we can just use SUBREG_TO_REG to extend the 32-bit versions.
Will Dietz [Sun, 13 Oct 2013 22:09:26 +0000 (22:09 +0000)]
MC: Don't assume incoming StringRef's are null terminated.
This can happen when processing command line arguments, which
are often stored as std::string's and later turned into
StringRef's via std::string::data(). Unfortunately this
is not guaranteed to return a null-terminated string
until C++11, causing breakage on platforms that don't do this.
David Majnemer [Sun, 13 Oct 2013 10:34:21 +0000 (10:34 +0000)]
Windows: Use GetModuleHandleEx instead of LoadLibrary
We were using an anti-pattern of:
- LoadLibrary
- GetProcAddress
- FreeLibrary
This is problematic because of several reasons:
- We are holding on to pointers into a library we just unloaded.
- Calling LoadLibrary results in an increase in the reference count of
the library in question and any libraries that it depends on and
so-on and so-forth. This is none too quick.
Instead, use GetModuleHandleEx with GET_MODULE_HANDLE_EX_FLAG_PIN. This
is done because because we didn't bring the reference for the library
into existence and therefor shouldn't count on it being around later.
SLPVectorizer: Sort PHINodes based on their opcode
Before this patch we relied on the order of phi nodes when we looked for phi
nodes of the same type. This could prevent vectorization of cases where there
was a phi node of a second type in between phi nodes of some type.
This is important for vectorization of an internal graphics kernel. On the test
suite + external on x86_64 (and on a run on armv7s) it showed no impact on
either performance or compile time.