[X86] Convert esp-relative movs of function arguments into pushes, step 1
This handles the simplest case for mov -> push conversion:
1. x86-32 calling convention, everything is passed through the stack.
2. There is no reserved call frame.
3. Only registers or immediates are pushed, no attempt to combine a mem-reg-mem sequence into a single PUSHmm.
NAKAMURA Takumi [Tue, 9 Dec 2014 01:03:27 +0000 (01:03 +0000)]
Revert r223709, "[PowerPC]Activate FeatureVSX for the Power target", to unbreak bots.
CodeGen/PowerPC/vsx-p8.ll was failing.
'+power8-vector' is not a recognized feature for this target (ignoring feature)
llvm/test/CodeGen/PowerPC/vsx-p8.ll:33:14: error: expected string not found in input
; CHECK-REG: lxvw4x 34, 0, 3
^
<stdin>:50:2: note: scanning from here
.align 3
^
<stdin>:61:2: note: possible intended match here
lvx 3, 0, 3
^
Hal Finkel [Tue, 9 Dec 2014 01:00:59 +0000 (01:00 +0000)]
Handle early-clobber registers in the aggressive anti-dep breaker
The aggressive anti-dep breaker, used by the PowerPC backend during post-RA
scheduling (but is available to all targets), did not handle early-clobber MI
operands (at all). When constructing the list of available registers for the
replacement of some def operand, check the using instructions, and remove
registers assigned to early-clobbered defs from the set.
Tom Stellard [Mon, 8 Dec 2014 23:36:48 +0000 (23:36 +0000)]
MISched: Fix moving stores across barriers
This fixes an issue with ScheduleDAGInstrs::buildSchedGraph
where stores without an underlying object would not be added
as a predecessor to the current BarrierChain.
Hal Finkel [Mon, 8 Dec 2014 22:54:22 +0000 (22:54 +0000)]
[PowerPC] Don't use a non-allocatable register to implement the 'cc' alias
GCC accepts 'cc' as an alias for 'cr0', and we need to do the same when
processing inline asm constraints. This had previously been implemented using a
non-allocatable register, named 'cc', that was listed as an alias of 'cr0', but
the infrastructure does not seem to support this properly (neither the register
allocator nor the scheduler properly accounts for the alias). Instead, we can
just process this as a naming alias inside of the inline asm
constraint-processing code, so we'll do that instead.
There are two regression tests, one where the post-RA scheduler did the wrong
thing with the non-allocatable alias, and one where the register allocator did
the wrong thing. Fixes PR21742.
Fix a compact unwind encoding logic bug which would try to encode
more callee saved registers than it should, leading to early bail out
in the encoding logic and abusive use of DWARF frame mode unnecessarily.
Also remove no-compact-unwind.ll which was testing the wrong thing
based on this bug and move it to valid 'compact unwind' tests. Added
other few more tests too.
Justin Bogner [Mon, 8 Dec 2014 18:02:35 +0000 (18:02 +0000)]
InstrProf: An intrinsic and lowering for instrumentation based profiling
Introduce the ``llvm.instrprof_increment`` intrinsic and the
``-instrprof`` pass. These provide the infrastructure for writing
counters for profiling, as in clang's ``-fprofile-instr-generate``.
The implementation of the instrprof pass is ported directly out of the
CodeGenPGO classes in clang, and with the followup in clang that rips
that code out to use these new intrinsics this ends up being NFC.
Doing the instrumentation this way opens some doors in terms of
improving the counter performance. For example, this will make it
simple to experiment with alternate lowering strategies, and allows us
to try handling profiling specially in some optimizations if we want
to.
Finally, this drastically simplifies the frontend and puts all of the
lowering logic in one place.
[X86] Improved tablegen patters for matching TZCNT/LZCNT.
Teach ISel how to match a TZCNT/LZCNT from a conditional move if the
condition code is X86_COND_NE.
Existing tablegen patterns only allowed to match TZCNT/LZCNT from a
X86cond with condition code equal to X86_COND_E. To avoid introducing
extra rules, I added an 'ImmLeaf' definition that checks if the
condition code is COND_E or COND_NE.
[X86] Improved lowering of packed v8i16 vector shifts by non-constant count.
Before this patch, the backend sub-optimally expanded the non-constant shift
count of a v8i16 shift into a sequence of two 'movd' plus 'movzwl'.
With this patch the backend checks if the target features sse4.1. If so, then
it lets the shuffle legalizer deal with the expansion of the shift amount.
Example:
;;
define <8 x i16> @test(<8 x i16> %A, <8 x i16> %B) {
%shamt = shufflevector <8 x i16> %B, <8 x i16> undef, <8 x i32> zeroinitializer
%shl = shl <8 x i16> %A, %shamt
ret <8 x i16> %shl
}
;;
X86 intrinsics moved form X86ISelLowering.cpp to X86IntrinsicsInfo.h
X86ISelLowering.cpp has a long switch for intrinsics. I moved a part of
this long switch to the new intrinsics table in X86IntrinsicsInfo.h.
No functional changes, just code and compile time optimization.
IR: Revert r223618 behaviour of MDNode::concatenate()
r223618 including special handling of `MDNode::intersect()`: if the
first operand is a self-reference with the same operands you're trying
to return, return it instead.
Reuse that handling in `MDNode::concatenate()` in the hopes that it
fixes a polly test that seems to rely on the old behaviour [1].
It doesn't make sense to unique self-referencing nodes. Drop uniquing
for them.
Note that `MDNode::intersect()` occasionally returns self-referencing
nodes. Previously these would be returned by `MDNode::get()`. I'm not
convinced this was intended behaviour -- to me it seems it should return
a node whose only operand is the self-reference -- but I don't know much
about alias scopes so I'm preserving it for now.
Apparently `MDNode` uniquing used to be optional. I suppose the
configure flag must have disappeared at some point. Change the test so
it actually tests uniquing, and remove the check for
`ENABLE_MDNODE_UNIQUING`.
Add assembly and bitcode tests that I neglected to add in r223564 (IR:
Disallow complicated function-local metadata) and r223574 (IR: Disallow
function-local metadata attachments).
Found a couple of bugs:
- The error message for function-local attachments gave the wrong line
number -- it indicated the next token (typically on the next line)
instead of the token that started the attachment. Fixed.
- Metadata arguments of the form `!{i32 0, i32 %v}` (or with the
arguments reversed) fired an assertion in `ValueEnumerator` in LLVM
v3.5, so I suppose this never really worked. I suppose this was
"fixed" by r223564.
(Thanks to dblaikie for pointing out my omission.)
[x86] Clean up the SSE1 test to use a slightly different pattern for
matching offsets. I don't expect this to really matter, but its what the
latest incarnation of my script for maintaining these tests happens to
produce, and so its simpler for me if everything matches.
[x86] Switch a constant selection test to use positive assertions and to
store to real pointers so that its clear that the right code is in fact
being generated.
[x86] Clean up the shift lowering vector shuffle tests a bit using my
script. Notably this folds all the SSE cases together into a single
FileCheck block. It also adds a vex prefix.
Benjamin Kramer [Sat, 6 Dec 2014 19:22:44 +0000 (19:22 +0000)]
Make the DenseMap bucket type configurable and use a smaller bucket for DenseSet.
DenseSet used to be implemented as DenseMap<Key, char>, which usually doubled
the memory footprint of the map. Now we use a compressed set so the second
element uses no memory at all. This required some surgery on DenseMap as
all accesses to the bucket now have to go through methods; this should
have no impact on the behavior of DenseMap though. The new default bucket
type for DenseMap is a slightly extended std::pair as we expose it through
DenseMap's iterator and don't want to break any existing users.
Hans Wennborg [Sat, 6 Dec 2014 01:28:50 +0000 (01:28 +0000)]
SelectionDAG switch lowering: Replace unreachable default with most popular case.
This can significantly reduce the size of the switch, allowing for more
efficient lowering.
I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.
SimplifyCFG currently does this transformation, but I'm working towards changing
that so we can optimize harder based on unreachable defaults.
Disallow complex types of function-local metadata. The only valid
function-local metadata is an `MDNode` whose sole argument is a
non-metadata function-local value.