Tyler Nowicki [Wed, 25 Jun 2014 17:50:15 +0000 (17:50 +0000)]
Add Rpass-missed and Rpass-analysis reports to the loop vectorizer. The remarks give the vector width of vectorized loops and a brief analysis of loops that fail to be vectorized. For example, an analysis will be generated for loops containing control flow that cannot be simplified to a select. The optimization remarks also give the debug location of expressions that cannot be vectorized, for example the location of an unvectorizable call.
Andrea Di Biagio [Wed, 25 Jun 2014 17:41:58 +0000 (17:41 +0000)]
[X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP (or VPERM2X128).
This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to
the check for 'isBlendMask'; the idea is that, when possible, we should firstly
check if a shuffle performs a blend, and in case, try to lower it into a BLENDI
instead of selecting a SHUFP or (worse) a VPERM2X128.
In general:
- AVX VBLENDPS/D always have better latency and throughput than VPERM2F128;
- BLENDPS/D instructions tend to always have better 'reciprocal throughput'
than the equivalent SHUFPS/D;
- Both BLENDPS/D and SHUFPS/D are often decoded into the same number of
m-ops; however, a m-op obtained from a BLENDPS/D can be scheduled to more
than one execution port.
This patch:
- Moves the check for 'isBlendMask' immediately before the check for
'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE';
- Updates existing tests for sse/avx shuffle/blend instructions to verify
that we select (v)blendps/d when possible (instead of (v)shufps/d or
vperm2f128).
JF Bastien [Wed, 25 Jun 2014 15:21:42 +0000 (15:21 +0000)]
Random Number Generator (llvm)
Provides an abstraction for a random number generator (RNG) that produces a stream of pseudo-random numbers.
The current implementation uses C++11 facilities and is therefore not cryptographically secure.
The RNG is salted with the text of the current command line invocation.
In addition, a user may specify a seed (reproducible builds).
In clang, the seed can be set via
-frandom-seed=X
In the back end, the seed can be set via
-rng-seed=X
This is the llvm part of the patch.
clang part: D3391
URL: http://reviews.llvm.org/D3390
Author: yln
I'm landing this for the second time, it broke Windows bots the first time around.
Evgeniy Stepanov [Wed, 25 Jun 2014 14:41:57 +0000 (14:41 +0000)]
[msan] Fix bad interaction between with-calls mode and chained origin tracking.
Origin history should only be recorded for uninitialized values, because it is
meaningless otherwise. This change moves __msan_chain_origin to the runtime
library side and makes it conditional on the corresponding shadow value.
Previous code was correct, but _very_ inefficient.
NAKAMURA Takumi [Wed, 25 Jun 2014 12:41:52 +0000 (12:41 +0000)]
Re-apply r211399, "Generate native unwind info on Win64" with a fix to ignore SEH pseudo ops in X86 JIT emitter.
--
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI. It handles all corner cases (I hope), including stack
realignment.
Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.
Andrea Di Biagio [Wed, 25 Jun 2014 10:02:21 +0000 (10:02 +0000)]
[X86] Add target combine rule to select ADDSUB instructions from a build_vector
This patch teaches the backend how to combine a build_vector that implements
an 'addsub' between packed float vectors into a sequence of vector add
and vector sub followed by a VSELECT.
The new VSELECT is expected to be lowered into a BLENDI.
At ISel stage, the sequence 'vector add + vector sub + BLENDI' is
pattern-matched against ISel patterns added at r211427 to select
'addsub' instructions.
Added three more ISel patterns for ADDSUB.
Added test sse3-avx-addsub-2.ll to verify that we correctly emit 'addsub'
instructions.
Evgeniy Stepanov [Wed, 25 Jun 2014 07:54:58 +0000 (07:54 +0000)]
[LICM] Don't create more than one copy of an instruction per loop exit block when sinking.
Fixes exponential compilation complexity in PR19835, caused by
LICM::sink not handling the following pattern well:
f = op g
e = op f, g
d = op e
c = op d, e
b = op c
a = op b, c
When an instruction with N uses is sunk, each of its operands gets N
new uses (all of them - phi nodes). In the example above, if a had 1
use, c would have 2, e would have 4, and g would have 8.
Tom Stellard [Tue, 24 Jun 2014 23:33:07 +0000 (23:33 +0000)]
R600/SI: Use a ComplexPattern for MUBUF stores
Now that non-leaf ComplexPatterns are allowed we can fold all the MUBUF
store patterns into the instruction definition. We will also be able to
reuse this new ComplexPattern for MUBUF loads and atomic operations.
Rafael Espindola [Tue, 24 Jun 2014 22:45:16 +0000 (22:45 +0000)]
Print a=b as an assignment.
In assembly the expression a=b is parsed as an assignment, so it should be
printed as one.
This remove a truly horrible hack for producing a label with "a=.". It would
be used by codegen but would never be reached by the asm parser. Sorry I
missed this when it was first committed.
Matt Arsenault [Tue, 24 Jun 2014 22:13:39 +0000 (22:13 +0000)]
R600: Fix inconsistency in rsq instructions.
R600 was using a clamped version of rsq, but SI was not. Add a
new rsq_clamped intrinsic and use them consistently.
It's unclear to me from the documentation what behavior
the R600 instructions have, so I assume they have the legacy behavior
described by the SI documents. For R600, use RECIPSQRT_IEEE
for both llvm.AMDGPU.rsq.legacy and llvm.AMDGPU.rsq. R600 also
has RECIPSQRT_FF, which I'm not sure how it fits in here.
David Blaikie [Tue, 24 Jun 2014 20:10:27 +0000 (20:10 +0000)]
Fix up scoping in a few tests (and delete one that validates unnecessary behavior).
Most of this is just tests that were silently succeeding in spite of
schema changes I made over a year ago. Cleaning them up as they lead to
failures in a change I'm working on/will come soon.
test/DebugInfo/2010-01-19-DbgScope.ll was removed as it tested miscoping
where a DebugLoc described a location not in the current function. The
test case doesn't describe why this is a valid situation and should be
supported, so I'm removing it and shortly going to commit changes that
make this firmly unsupported/assert-fail.
Bill Schmidt [Tue, 24 Jun 2014 20:05:18 +0000 (20:05 +0000)]
[PPC64] Fix PR20071 (fctiduz generated for targets lacking that instruction)
PR20071 identifies a problem in PowerPC's fast-isel implementation for
floating-point conversion to integer. The fctiduz instruction was added in
Power ISA 2.06 (i.e., Power7 and later). However, this instruction is being
generated regardless of which 64-bit PowerPC target is selected.
The intent is for fast-isel to punt to DAG selection when this instruction is
not available. This patch implements that change. For testing purposes, the
existing fast-isel-conversion.ll test adds a RUN line for -mcpu=970 and tests
for the expected code generation. Additionally, the existing test
fast-isel-conversion-p5.ll was found to be incorrectly expecting the
unavailable instruction to be generated. I've removed these test variants
since we have adequate coverage in fast-isel-conversion.ll.
Diego Novillo [Tue, 24 Jun 2014 17:02:03 +0000 (17:02 +0000)]
Add new debug kind LocTrackingOnly.
Summary:
This new debug emission kind supports emitting line location
information in all instructions, but stops code generation
from emitting debug info to the final output.
This mode is useful when the backend wants to track source
locations during code generation, but it does not want to
produce debug info. This is currently used by optimization
remarks (-pass-remarks, -pass-remarks-missed and
-pass-remarks-analysis).
To prevent debug info emission, DIBuilder never inserts the
annotation 'llvm.dbg.cu' when LocTrackingOnly is enabled.
Daniel Sanders [Tue, 24 Jun 2014 13:53:56 +0000 (13:53 +0000)]
Revert: r211588 - [mips] Use __clear_cache builtin instead of cacheflush() in Unix Memory::InvalidateInstructionCache()
Buildbot reports a test failure on the llvm-mips-linux builder and blames r211588.
Although it doesn't appear in the blamelist, it seems it could also be r211587
(because it's committed to compiler-rt?) since they were tested together.
Reverting the most likely suspect (r211588) to confirm one way or the other.
Daniel Sanders [Tue, 24 Jun 2014 13:00:32 +0000 (13:00 +0000)]
[mips] Added support for assembling sdbbp.
Summary:
This instruction is re-encoded in MIPS32r6/MIPS64r6 without changing the
restrictions. We hadn't implemented it for earlier ISA's so it has been added to those too.
ScaledNumber has been cleaned up enough to pull out of BFI now. Still
work to do there (tests for shifting, bloated printing code, etc.), but
it seems clean enough for its new home.
Rafael Espindola [Mon, 23 Jun 2014 22:00:37 +0000 (22:00 +0000)]
Pass a std::unique_ptr& to the create??? methods is lib/Object.
This makes the buffer ownership on error conditions very natural. The buffer
is only moved out of the argument if an object is constructed that now
owns the buffer.
Juergen Ributzka [Mon, 23 Jun 2014 21:55:44 +0000 (21:55 +0000)]
[FastISel][X86] Lower unsupported selects to control-flow.
The extends the select lowering coverage by emiting pseudo cmov
instructions. These insturction will be later on lowered to control-flow to
simulate the select.
Juergen Ributzka [Mon, 23 Jun 2014 21:55:40 +0000 (21:55 +0000)]
[FastISel][X86] Add support for floating-point select.
This extends the select lowering to support floating-point selects. The
lowering depends on SSE instructions and that the conditon comes from a
floating-point compare. Under this conditions it is possible to emit an
optimized instruction sequence that doesn't require any branches to
simulate the select.
Rafael Espindola [Mon, 23 Jun 2014 21:53:12 +0000 (21:53 +0000)]
Make ObjectFile and BitcodeReader always own the MemoryBuffer.
This allows us to just use a std::unique_ptr to store the pointer to the buffer.
The flip side is that they have to support releasing the buffer back to the
caller.
Overall this looks like a more efficient and less brittle api.
Weiming Zhao [Mon, 23 Jun 2014 20:44:16 +0000 (20:44 +0000)]
Fix PR20056: Implement pseudo LDR <reg>, =<literal/label> for AArch64
This patch is based on the changes from ARM target [1,2]
Based on ARM doc [3], if the literal value can be loaded with a valid MOV,
it can emit that instruction. This is implemented in this patch.
[1] Fix PR18345: ldr= pseudo instruction produces incorrect code when using in inline assembly
Author: David Peixotto <dpeixott@codeaurora.org>
commit b92cca222898d87bbc764fa22e805adb04ef7f13 (r200777)
[2] Implement the ldr-pseudo opcode for ARM assembly
Author: David Peixotto <dpeixott@codeaurora.org>
commit 0fa193b08627927ccaa0804a34d80480894614b8 (r197708)
[3] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/CJAHAIBC.html
David Blaikie [Mon, 23 Jun 2014 18:28:53 +0000 (18:28 +0000)]
Recommit 211309 (StringMap::insert), reverted in 211328 due to issues with private, but non-deleted, move members.
Certain versions of GCC (~4.7) couldn't handle the SFINAE on access
control, but with "= delete" (hidden behind a macro for portability)
this issue is worked around/addressed.