Alp Toker [Fri, 27 Jun 2014 04:33:58 +0000 (04:33 +0000)]
ParseIR: don't take ownership of the MemoryBuffer
clang was needlessly duplicating whole memory buffer contents in an attempt to
satisfy unclear ownership semantics. Let's just hide internal LLVM quirks and
present a simple non-owning interface.
The public C API preserves previous behaviour for stability.
Add the new AppContainer characteristic which is import for Windows Store
(Metro) compatible applications. Add the new Control Flow Guard flag to bring
the enumeration up to date with the current values as of Windows 8.1.
Rafael Espindola [Fri, 27 Jun 2014 02:51:21 +0000 (02:51 +0000)]
Don't force the build of toos/lto as a static lib.
Any uses of tools/lto as a static lib should probably move to lib/LTO.
This was also never implemented in the configure build, so this reduces
the differences among the two.
Eric Christopher [Fri, 27 Jun 2014 01:14:50 +0000 (01:14 +0000)]
Remove uses and caches of the target machine and subtarget from
both MSP430InstrInfo and MSP430RegisterInfo. Remove unused member
variable StackAlign from MSP430RegisterInfo. Update constructors
accordingly.
Juergen Ributzka [Thu, 26 Jun 2014 23:39:44 +0000 (23:39 +0000)]
[Stackmaps] Remove the liveness calculation for stackmap intrinsics.
There is no need to calculate the liveness information for stackmaps. The
liveness information is still available for the patchpoint intrinsic and
that is also the intended usage model.
Eric Christopher [Thu, 26 Jun 2014 22:33:55 +0000 (22:33 +0000)]
Move the various Subtarget dependent members down to the subtarget
for the Sparc port. Use the same initializeSubtargetDependencies
function to handle initialization similar to the other ports to
handle dependencies.
Matt Arsenault [Thu, 26 Jun 2014 17:22:30 +0000 (17:22 +0000)]
R600/SI: Add FP mode bits to binary.
The default rounding mode to initialize the mode register needs
to be reported to the runtime. Fill in other bits a kernel
may be interested in setting for future use.
Will Schmidt [Thu, 26 Jun 2014 13:36:19 +0000 (13:36 +0000)]
add ppc64/pwr8 as target
includes handling DIR_PWR8 where appropriate
The P7Model Itinerary is currently tied in for use under the P8Model, and will be updated later.
Renato Golin [Thu, 26 Jun 2014 13:10:53 +0000 (13:10 +0000)]
Added parsing co-processor names starting with "cr"
Additional compliant GAS names for coprocessor register name
are enabled for all instruction with parameter MCK_CoprocReg:
LDC,LDC2,STC,STC2,CDP,CDP2,MCR,MCR2,MCRR,MCRR2,MRC,MRC2,MRRC,MRRC2
Andrea Di Biagio [Thu, 26 Jun 2014 10:45:21 +0000 (10:45 +0000)]
[X86] Improve the selection of SSE3/AVX addsub instructions.
This patch teaches the backend how to canonicalize a shuffle vectors
according to the rule:
- (shuffle (FADD A, B), (FSUB A, B), Mask) ->
(shuffle (FSUB A, -B), (FADD A, -B), Mask)
Where 'Mask' is:
<0,5,2,7> ;; for v4f32 and v4f64 shuffles.
<0,3> ;; for v2f64 shuffles.
<0,9,2,11,4,13,6,15> ;; for v8f32 shuffles.
In general, ISel only knows how to pattern-match a canonical
'fadd + fsub + blendi' dag node sequence into an ADDSUB instruction.
This new rule allows to convert a non-canonical dag sequence into a
canonical one that will be matched by a single ADDSUB at ISel stage.
The idea of converting a non-canonical ADDSUB into a canonical one by
swapping the first two operands of the shuffle, and then negating the
second operand of the FADD and FSUB, was originally proposed by Hal Finkel.
David Majnemer [Thu, 26 Jun 2014 03:02:19 +0000 (03:02 +0000)]
GlobalOpt: Don't optimize thread_local for initializers
Folding a reference to a thread_local variable into another global
variable's initializer is very problematic, there is no relocation that
exists to represent such an access.
Adam Nemet [Thu, 26 Jun 2014 00:21:12 +0000 (00:21 +0000)]
[X86] AVX512: Fix asm syntax for packed vcmp
The *_alt defs for vcmp are used by the InstParser (the asm string in the main
def is used by the InstPrinter) . The former was accepting vector registers
as destination rather than mask registers.
Alp Toker [Thu, 26 Jun 2014 00:00:48 +0000 (00:00 +0000)]
Introduce a string_ostream string builder facilty
string_ostream is a safe and efficient string builder that combines opaque
stack storage with a built-in ostream interface.
small_string_ostream<bytes> additionally permits an explicit stack storage size
other than the default 128 bytes to be provided. Beyond that, storage is
transferred to the heap.
This convenient class can be used in most places an
std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair
would previously have been used, in order to guarantee consistent access
without byte truncation.
The patch also converts much of LLVM to use the new facility. These changes
include several probable bug fixes for truncated output, a programming error
that's no longer possible with the new interface.
Juergen Ributzka [Wed, 25 Jun 2014 20:06:12 +0000 (20:06 +0000)]
[FastISel][X86] Only fold the cmp into the select when both instructions are in the same basic block.
If the cmp is in a different basic block, then it is possible that not all
operands of that compare have defined registers. This can happen when one of
the operands to the cmp is a load and the load gets folded into the cmp. In
this case FastISel will skip the load instruction and the vreg is never
defined.
David Blaikie [Wed, 25 Jun 2014 18:03:10 +0000 (18:03 +0000)]
PR20038: DebugInfo: Inlined call sites where the caller has debug info but the call itself has no debug location.
This situation does bad things when inlined, so I've fixed Clang not to
produce inlinable call sites without locations when the caller has debug
info (in the one case where I could find that this occurred). This
updates the PR20038 test case to be what clang now produces, and readds
the assertion that had to be removed due to this bug.
I've also beefed up the debug info verifier to help diagnose these
issues in the future, and I hope to add checks to the inliner to just
assert-fail if it encounters this situation. If, in the future, we
decide we have to cope with this situation, the right thing to do is
probably to just remove all the DebugLocs from the inlined instructions.
Tyler Nowicki [Wed, 25 Jun 2014 17:50:15 +0000 (17:50 +0000)]
Add Rpass-missed and Rpass-analysis reports to the loop vectorizer. The remarks give the vector width of vectorized loops and a brief analysis of loops that fail to be vectorized. For example, an analysis will be generated for loops containing control flow that cannot be simplified to a select. The optimization remarks also give the debug location of expressions that cannot be vectorized, for example the location of an unvectorizable call.
Andrea Di Biagio [Wed, 25 Jun 2014 17:41:58 +0000 (17:41 +0000)]
[X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP (or VPERM2X128).
This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to
the check for 'isBlendMask'; the idea is that, when possible, we should firstly
check if a shuffle performs a blend, and in case, try to lower it into a BLENDI
instead of selecting a SHUFP or (worse) a VPERM2X128.
In general:
- AVX VBLENDPS/D always have better latency and throughput than VPERM2F128;
- BLENDPS/D instructions tend to always have better 'reciprocal throughput'
than the equivalent SHUFPS/D;
- Both BLENDPS/D and SHUFPS/D are often decoded into the same number of
m-ops; however, a m-op obtained from a BLENDPS/D can be scheduled to more
than one execution port.
This patch:
- Moves the check for 'isBlendMask' immediately before the check for
'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE';
- Updates existing tests for sse/avx shuffle/blend instructions to verify
that we select (v)blendps/d when possible (instead of (v)shufps/d or
vperm2f128).
JF Bastien [Wed, 25 Jun 2014 15:21:42 +0000 (15:21 +0000)]
Random Number Generator (llvm)
Provides an abstraction for a random number generator (RNG) that produces a stream of pseudo-random numbers.
The current implementation uses C++11 facilities and is therefore not cryptographically secure.
The RNG is salted with the text of the current command line invocation.
In addition, a user may specify a seed (reproducible builds).
In clang, the seed can be set via
-frandom-seed=X
In the back end, the seed can be set via
-rng-seed=X
This is the llvm part of the patch.
clang part: D3391
URL: http://reviews.llvm.org/D3390
Author: yln
I'm landing this for the second time, it broke Windows bots the first time around.
Evgeniy Stepanov [Wed, 25 Jun 2014 14:41:57 +0000 (14:41 +0000)]
[msan] Fix bad interaction between with-calls mode and chained origin tracking.
Origin history should only be recorded for uninitialized values, because it is
meaningless otherwise. This change moves __msan_chain_origin to the runtime
library side and makes it conditional on the corresponding shadow value.
Previous code was correct, but _very_ inefficient.
NAKAMURA Takumi [Wed, 25 Jun 2014 12:41:52 +0000 (12:41 +0000)]
Re-apply r211399, "Generate native unwind info on Win64" with a fix to ignore SEH pseudo ops in X86 JIT emitter.
--
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI. It handles all corner cases (I hope), including stack
realignment.
Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.
Andrea Di Biagio [Wed, 25 Jun 2014 10:02:21 +0000 (10:02 +0000)]
[X86] Add target combine rule to select ADDSUB instructions from a build_vector
This patch teaches the backend how to combine a build_vector that implements
an 'addsub' between packed float vectors into a sequence of vector add
and vector sub followed by a VSELECT.
The new VSELECT is expected to be lowered into a BLENDI.
At ISel stage, the sequence 'vector add + vector sub + BLENDI' is
pattern-matched against ISel patterns added at r211427 to select
'addsub' instructions.
Added three more ISel patterns for ADDSUB.
Added test sse3-avx-addsub-2.ll to verify that we correctly emit 'addsub'
instructions.
Evgeniy Stepanov [Wed, 25 Jun 2014 07:54:58 +0000 (07:54 +0000)]
[LICM] Don't create more than one copy of an instruction per loop exit block when sinking.
Fixes exponential compilation complexity in PR19835, caused by
LICM::sink not handling the following pattern well:
f = op g
e = op f, g
d = op e
c = op d, e
b = op c
a = op b, c
When an instruction with N uses is sunk, each of its operands gets N
new uses (all of them - phi nodes). In the example above, if a had 1
use, c would have 2, e would have 4, and g would have 8.