Tyler Nowicki [Mon, 10 Aug 2015 23:01:55 +0000 (23:01 +0000)]
Extend late diagnostics to include late test for runtime pointer checks.
This patch moves checking the threshold of runtime pointer checks to the vectorization requirements (late diagnostics) and emits a diagnostic that infroms the user the loop would be vectorized if not for exceeding the pointer-check threshold. Clang will also append the options that can be used to allow vectorization.
JF Bastien [Mon, 10 Aug 2015 22:36:48 +0000 (22:36 +0000)]
WebAssembly: print immediates
Summary:
For now output using C99's hexadecimal floating-point representation.
This patch also cleans up how machine operands are printed: instead of special-casing per type of machine instruction, the code now handles operands generically.
Alex Lorenz [Mon, 10 Aug 2015 21:47:36 +0000 (21:47 +0000)]
MachineVerifier: Handle the optional def operand in a PATCHPOINT instruction.
The PATCHPOINT instructions have a single optional defined register operand,
but the machine verifier can't verify the optional defined register operands.
This commit makes sure that the machine verifier won't report an error when a
PATCHPOINT instruction doesn't have its optional defined register operand.
This change will allow us to enable the machine verifier for the code
generation tests for the patchpoint intrinsics.
Reid Kleckner [Mon, 10 Aug 2015 21:47:11 +0000 (21:47 +0000)]
[llvm-symbolizer] Remove underscores and other C mangling on Windows
Summary:
This makes it so that reports symbolized after the fact with
llvm-symbolizer are more similar to the ones we generate at runtime with
in-process dbghelp.
Alex Lorenz [Mon, 10 Aug 2015 21:27:03 +0000 (21:27 +0000)]
StackMap: FastISel: Add an appropriate number of immediate operands to the
frame setup instruction.
This commit ensures that the stack map lowering code in FastISel adds an
appropriate number of immediate operands to the frame setup instruction.
The previous code added just one immediate operand, which was fine for a target
like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit
operands. This caused the machine verifier to report an error when the old code
added just one.
JF Bastien [Mon, 10 Aug 2015 20:59:36 +0000 (20:59 +0000)]
x86: Emit LAHF/SAHF instead of PUSHF/POPF
NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (priviledged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS, this commit now generated LAHF/SAHF instead, for all of x86 (not just NaCl) because it leads to an overall performance gain over PUSHF/POPF.
As with the previous patch this code generation is pretty bad because it occurs very later, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire.
I did [[ https://github.com/jfbastien/benchmark-x86-flags | a bit of benchmarking ]], the results on an Intel Haswell E5-2690 CPU at 2.9GHz are:
Simon Pilgrim [Mon, 10 Aug 2015 20:21:15 +0000 (20:21 +0000)]
[InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.
Tyler Nowicki [Mon, 10 Aug 2015 19:51:46 +0000 (19:51 +0000)]
Late evaluation of the fast-math vectorization requirement.
This patch moves the verification of fast-math to just before vectorization is done. This way we can tell clang to append the command line options would that allow floating-point commutativity. Specifically those are enableing fast-math or specifying a loop hint.
Tyler Nowicki [Mon, 10 Aug 2015 19:14:16 +0000 (19:14 +0000)]
Modify diagnostic messages to clearly indicate the why interleaving wasn't done.
Sometimes interleaving is not beneficial, as determined by the cost-model and sometimes it is disabled by a loop hint (by the user). This patch modifies the diagnostic messages to make it clear why interleaving wasn't done.
James Y Knight [Mon, 10 Aug 2015 19:11:39 +0000 (19:11 +0000)]
[Sparc] Implement i64 load/store support for 32-bit sparc.
The LDD/STD instructions can load/store a 64bit quantity from/to
memory to/from a consecutive even/odd pair of (32-bit) registers. They
are part of SparcV8, and also present in SparcV9. (Although deprecated
there, as you can store 64bits in one register).
As recommended on llvmdev in the thread "How to enable use of 64bit
load/store for 32bit architecture" from Apr 2015, I've modeled the
64-bit load/store operations as working on a v2i32 type, rather than
making i64 a legal type, but with few legal operations. The latter
does not (currently) work, as there is much code in llvm which assumes
that if i64 is legal, operations like "add" will actually work on it.
The same assumption does not hold for v2i32 -- for vector types, it is
workable to support only load/store, and expand everything else.
This patch:
- Adds a new register class, IntPair, for even/odd pairs of registers.
- Modifies the list of reserved registers, the stack spilling code,
and register copying code to support the IntPair register class.
- Adds support in AsmParser. (note that in asm text, you write the
name of the first register of the pair only. So the parser has to
morph the single register into the equivalent paired register).
- Adds the new instructions themselves (LDD/STD/LDDA/STDA).
- Hooks up the instructions and registers as a vector type v2i32. Adds
custom legalizer to transform i64 load/stores into v2i32 load/stores
and bitcasts, so that the new instructions can actually be
generated, and marks all operations other than load/store on v2i32
as needing to be expanded.
- Copies the unfortunate SelectInlineAsm hack from ARMISelDAGToDAG.
This hack undoes the transformation of i64 operands into two
arbitrarily-allocated separate i32 registers in
SelectionDAGBuilder. and instead passes them in a single
IntPair. (Arbitrarily allocated registers are not useful, asm code
expects to be receiving a pair, which can be passed to ldd/std.)
Also adds a bunch of test cases covering all the bugs I've added along
the way.
Jonathan Roelofs [Mon, 10 Aug 2015 19:01:27 +0000 (19:01 +0000)]
Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI
I looked into adding a warning / error for this to FileCheck, but there doesn't
seem to be a good way to avoid it triggering on the instances of it in RUN lines.
Yaron Keren [Mon, 10 Aug 2015 18:03:35 +0000 (18:03 +0000)]
Really implement David Blaikie suggestion in full of seperating
variable initialization from its usage in the push_back making
collapse of the two statements unlikely even without a comment.
Mark Heffernan [Mon, 10 Aug 2015 17:28:08 +0000 (17:28 +0000)]
Add new llvm.loop.unroll.enable metadata.
This change adds the unroll metadata "llvm.loop.unroll.enable" which directs
the optimizer to unroll a loop fully if the trip count is known at compile time, and
unroll partially if the trip count is not known at compile time. This differs from
"llvm.loop.unroll.full" which explicitly does not unroll a loop if the trip count is not
known at compile time.
The "llvm.loop.unroll.enable" is intended to be added for loops annotated with
"#pragma unroll".
Silviu Baranga [Mon, 10 Aug 2015 14:50:54 +0000 (14:50 +0000)]
[TTI] Add a hook for specifying per-target defaults for Interleaved Accesses
Summary:
This adds a hook to TTI which enables us to selectively turn on by default
interleaved access vectorization for targets on which we have have performed
the required benchmarking.
Fraser Cormack [Mon, 10 Aug 2015 14:48:47 +0000 (14:48 +0000)]
Prevent the scalarizer from caching incorrect entries
The scalarizer can cache incorrect entries when walking up a chain of
insertelement instructions. This occurs when it encounters more than one
instruction that it is not actively searching for, as it unconditionally caches
every element it finds. The fix is to only cache the first element that it
isn't searching for so we don't overwrite correct entries.
Michael Kruse [Mon, 10 Aug 2015 13:21:59 +0000 (13:21 +0000)]
[RegionInfo] Add debug-time region viewer functions
Summary:
Analogously to Function::viewCFG(), RegionInfo::view() and RegionInfo::viewOnly() are meant to be called in debugging sessions. They open a viewer to show how RegionInfo currently understands the region hierarchy.
The functions viewRegion(Function*) and viewRegionOnly(Function*) invoke a fresh region analysis of the function in contrast to viewRegion(RegionInfo*) and viewRegionOnly(RegionInfo*) which show the current analysis result.
Robert Lougher [Mon, 10 Aug 2015 11:59:44 +0000 (11:59 +0000)]
Trace copies when checking for rematerializability in spill weight calculation
PR24139 contains an analysis of poor register allocation. One of the findings
was that when calculating the spill weight, a rematerializable interval once
split is no longer rematerializable. This is because the isRematerializable
check in CalcSpillWeights.cpp does not follow the copies introduced by live
range splitting (after splitting, the live interval register definition is a
copy which is not rematerializable).
The SP was always unconditionally assigned to later, but initialised early.
This delays the initialisation, and avoids the dead store. Identified by
clang static analysis. No functional change intended.
David Majnemer [Sun, 9 Aug 2015 15:43:02 +0000 (15:43 +0000)]
[PHITransAddr] Don't assume that instruction operands are translatable
We can only PHI translate instructions. In our attempt to PHI translate
a bitcast, we attempt to translate its operand; however, the operand
might be an argument or a global instead of an instruction. Benignly
bail out when this happens.
Justin Bogner [Sat, 8 Aug 2015 21:04:45 +0000 (21:04 +0000)]
cmake: Error on invalid CMAKE_BUILD_TYPE
Apparently if you make a typo in the argument to CMAKE_BUILD_TYPE,
cmake silently accepts this but doesn't apply any particular build
type to your build. This means you get a build that doesn't really
make any sense - it's sort of a debug build with asserts disabled.
Yaron Keren [Sat, 8 Aug 2015 21:03:19 +0000 (21:03 +0000)]
Fix dangling reference in DwarfLinker.cpp. The original code
Seq.emplace_back(Seq.back());
does not work as planned, since Seq.back() may become a dangling reference
when emplace_back is called and possibly reallocates vector. To avoid this,
the vector allocation should be reserved first and only then used.
This broke test/tools/dsymutil/X86/custom-line-table.test with Visual C++ 2013.
Craig Topper [Sat, 8 Aug 2015 07:20:04 +0000 (07:20 +0000)]
Add SlowBTMem to Sandy Bridge and newer Intel CPUs. Reading through Agner Fog's table suggests there have been no improvements to these processors relative to Westmere for bit test instructions.