Chandler Carruth [Sun, 15 Feb 2015 12:42:15 +0000 (12:42 +0000)]
[x86] Teach the decomposed shuffle/blend lowering to use an early blend
when that will allow it to lower with a single permute instead of
multiple permutes.
It tries to detect when it will only have to do a single permute in
either case to maximize folding of loads and such.
This cuts a *lot* of the avx2 shuffle permute counts in half. =]
Chandler Carruth [Sun, 15 Feb 2015 12:18:12 +0000 (12:18 +0000)]
[SDAG] Teach the SelectionDAG to canonicalize vector shuffles of splats
directly into blends of the splats.
These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.
Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.
Chandler Carruth [Sun, 15 Feb 2015 12:07:55 +0000 (12:07 +0000)]
[x86] Teach the shuffle mask equivalence test to look through build
vectors and detect equivalent inputs.
This lets the code match unpck-style instructions when only one of the
inputs are lined up but the other input is a splat and so which lanes we
pull from doesn't matter. Today, this doesn't really happen, but just by
accident. I have a patch that normalizes how we shuffle splats, and with
that patch this will be necessary for a lot of the mask equivalence
tests to work.
I don't really know how to write a test case for this specific change
until the other change lands though.
Chandler Carruth [Sun, 15 Feb 2015 12:01:14 +0000 (12:01 +0000)]
[x86] Tweak the ordering of unpack matching vs. element insertion, and
don't try to do element insertion for non-zero-index floating point
vectors.
We don't have any useful patterns or lowering for element insertion into
high elements of a floating point vector, and the generic shuffle
lowering will end up being better -- namely it will fall back to unpck.
But we should try to handle other forms of element insertion before
matching unpck patterns.
While this doesn't matter much right now, I'm working on a patch that
makes unpck matching much more powerful, and that patch will break
without this re-ordering.
Chandler Carruth [Sun, 15 Feb 2015 10:34:52 +0000 (10:34 +0000)]
[x86] Stop shuffling zero vectors. =]
I was somewhat surprised this pattern really came up, but it does. It
seems better to just directly handle it than try to special case every
place where we end up forming a shuffle that devolves to a shuffle of
a zero vector.
Chandler Carruth [Sun, 15 Feb 2015 10:12:02 +0000 (10:12 +0000)]
[x86] When splitting 256-bit vectors into 128-bit vectors, don't extract
subvectors from buildvectors. That doesn't really make any sense and it
breaks all of the down-stream matching of buildvectors to cleverly lower
shuffles.
With this, we now get the shift-based lowering of 256-bit vector
shuffles with AVX1 when we split them into 128-bit vectors. We also do
much better on the zero-extension patterns, although there remains quite
a bit of room for improvement here.
Chandler Carruth [Sun, 15 Feb 2015 09:33:36 +0000 (09:33 +0000)]
[x86] Make computing the zeroable elements slightly more powerful, at
least in theory.
I don't actually have a test case that benefits from this, but
theoretically, it could come up, and I don't want to try to think about
whether this is the culprit or something else is, so I'd rather just
make this code powerful. =/ Makes me sad that I can't really test it
though.
gold-plugin: fix test to allow default visibility on local symbols
GNU ld sets default, not hidden, visibility on local symbols.
Having default or hidden visibility on local symbols makes no difference in run-time behavior.
Chandler Carruth [Sun, 15 Feb 2015 08:26:30 +0000 (08:26 +0000)]
[x86] Add a slight variation on some of the other generic shuffle
lowerings -- one which decomposes into an initial blend followed by
a permute.
Particularly on newer chips, blends are handled independently of
shuffles and so this is much less bottlenecked on the single port that
floating point shuffles are executed with on Intel.
I'll be adding this lowering to a bunch of other code paths in
subsequent commits to handle still more places where we can effectively
leverage blends when they're available in the ISA.
Chandler Carruth [Sun, 15 Feb 2015 07:05:50 +0000 (07:05 +0000)]
[x86] Add a test case for PR22390 which was a dup of PR22377 and fixed
by r229285. This is a nice different test case though, so I'd like to
have the extra testing of these kinds of patterns.
Chandler Carruth [Sun, 15 Feb 2015 07:01:10 +0000 (07:01 +0000)]
[x86] Fix PR22377, a regression with the new vector shuffle legality
test.
This was just a matter of the DAG combine for vector shuffles being too
aggressive. This is a bit of a grey area, but I think generally if we
can re-use intermediate shuffles, we should. Certainly, given the test
cases I have available, this seems like the right call.
Chandler Carruth [Sun, 15 Feb 2015 06:37:21 +0000 (06:37 +0000)]
[x86] Switch a collection of tests explicitly to the new vector shuffle
legality test (essentially, everything is legal).
I'm planning to make this the default shortly, but I'd like to fix
a collection of the bugs it exposes first, and this will let me easily
test them. It also showcases both the improvements and a few of the
regressions triggered by the change. The biggest improvements by far are
the significantly reduced shuffling and domain crossing in the combining
test case. The biggest regressions are missing some clever blending
patterns.
Craig Topper [Sun, 15 Feb 2015 04:54:55 +0000 (04:54 +0000)]
[X86] Add assembler predicates for the rest of the AVX512 feature flags. This makes the assembly matching consistent across all AVX512 instructions. Without this we were allowing some AVX512 instructions to be parsed always, but not the foundation instructions.
Chandler Carruth [Sun, 15 Feb 2015 00:08:01 +0000 (00:08 +0000)]
[x86] Teach my test updating script about another quirk of the printed
asm and port the mmx vector shuffle test to it.
Not thrilled with how it handles the stack manipulation logic, but I'm
much less bothered by that than I am by updating the test manually. =]
If anyone wants to teach the test checks management script about stack
adjustment patterns, that'd be cool too.
Simon Pilgrim [Sat, 14 Feb 2015 22:40:46 +0000 (22:40 +0000)]
[X86][XOP] Enable commutation for XOP instructions
Patch to allow XOP instructions (integer comparison and integer multiply-add) to be commuted. The comparison instructions sometimes require the compare mode to be flipped but the remaining instructions can use default commutation modes.
This patch also sets the SSE domains of all the XOP instructions.
Craig Topper [Sat, 14 Feb 2015 21:54:03 +0000 (21:54 +0000)]
[X86] Improve parsing support AVX/SSE floating point compare instruction mnemonic aliases. They'll now print with the alias the parser received instead of converting to the explicit immediate form.
InstCombine: propagate deref via new addDereferenceableAttr
The "dereferenceable" attribute cannot be added via .addAttribute(),
since it also expects a size in bytes. AttrBuilder#addAttribute or
AttributeSet#addAttribute is wrapped by classes Function, InvokeInst,
and CallInst. Add corresponding wrappers to
AttrBuilder#addDereferenceableAttr.
Having done this, propagate the dereferenceable attribute via
gc.relocate, adding a test to exercise it. Note that -datalayout is
required during execution over and above -instcombine, because
InstCombine only optionally requires DataLayoutPass.
Chandler Carruth [Sat, 14 Feb 2015 09:43:57 +0000 (09:43 +0000)]
[gold] Consolidate the gold plugin options and actually search for
a gold binary explicitly. Substitute this binary into the tests rather
than just directly executing the 'ld' binary.
This should allow folks to inject a cross compiling gold binary, or in
my case to use a gold binary built and installed somewhere other than
/usr/bin/ld. It should also allow the tests to find 'ld.gold' so that
things work even if gold isn't the default on the system.
I've only stubbed out support in the makefile to preserve the existing
behavior with none of the fancy logic. If someone else wants to add
logic here, they're welcome to do so.
Chandler Carruth [Sat, 14 Feb 2015 09:05:58 +0000 (09:05 +0000)]
Back out two accidental changes that snuck in with r229245. Sorry these
snuck in, they weren't ready for prime time and had *nothing* to do
with that commit.
Chandler Carruth [Sat, 14 Feb 2015 07:05:15 +0000 (07:05 +0000)]
[lit] Make the 'llvm-lit' utility defend against a system where Python3
is the default.
The lit.cfg files are not all valid Python3 and I've no idea if anyone
is really prepared to update them. The easiest way I know of to ensure
that this script uses Python 2 is to use 'python2.7' in the command. Mac
and Linux are definitely fine with this and I think other platforms will
be as well, but if anyone struggles with this set up and has better
ideas, let me know.
Zachary Turner [Sat, 14 Feb 2015 03:54:28 +0000 (03:54 +0000)]
llvm-pdbdump: Only dump whitelisted global symbols.
Dumping the global scope contains a lot of very uninteresting
things and is generally polluted with a lot of random junk.
Furthermore, it dumps values unsorted, making it hard to read.
This patch dumps known interesting types only, and as a side
effect sorts the list by symbol type.
Justin Bogner [Sat, 14 Feb 2015 02:01:24 +0000 (02:01 +0000)]
llvm-cov: Simplify coverage reports, fixing PR22575 in the process
PR22575 occurred because we were unsafely storing references into a
std::vector. If the vector moved because it grew, we'd be left
iterating through garbage memory. This avoids the issue by simplifying
the logic to gather coverage information as we go, rather than storing
it and iterating over it.
I'm relying on the existing tests showing that this is semantically
NFC, since it's difficult to hit the issue this fixes without
relatively large covered programs.
Philip Reames [Sat, 14 Feb 2015 00:05:36 +0000 (00:05 +0000)]
[InstCombine] When canonicalizing gep indices, prefer zext when possible
If we know that the sign bit of a value being sign extended is zero, we can use a zero extension instead. This is motivated by the fact that zero extensions are generally cheaper on x86 (and most other architectures?). We already apply a similar transform in DAGCombine, this just extends that to the IR level.
This comes up when we eagerly canonicalize gep indices to the width of a machine register (i64 on x86_64). To do so, we insert sign extensions (sext) to promote smaller types.
Frederic Riss [Fri, 13 Feb 2015 23:18:34 +0000 (23:18 +0000)]
[dsymutil] Add DIE selection algorithm.
With this commit, llvm-dsymutil learns how to choose which DIEs
it will link in the final output and which ones it won't. This
is based on the 'valid relocation' information that has been
built in the previous commits.
The test only tests that we choose the right 'root DIEs'. The
selection algorithm (and especially the part that walk the
dependencies of a root DIE) lacks a bit test coverage. This
will be much easier to cover when we output actual Dwarf and
thus can use llvm-dwarfdump to verify the structure of the
emitted DIE trees. I'll add more tests then.
Philip Reames [Fri, 13 Feb 2015 23:08:37 +0000 (23:08 +0000)]
Minor tweak to MDA
Two minor tweaks I noticed when reading through the code:
- No need to recompute begin() on every iteration. We're not modifying the instructions in this loop.
- We can ignore PHINodes and Dbg intrinsics. The current code does this anyways, but it will spend slightly more time doing so and will count towards the limit of instructions in the block. It seems really silly to give up due the presence of PHIs...