Sanjay Patel [Mon, 5 Dec 2016 15:58:21 +0000 (15:58 +0000)]
[TargetLowering] add special-case for demanded bits analysis of 'not'
We treat bitwise 'not' as a special operation and try not to reduce its all-ones mask.
Presumably, this is because a 'not' may be cheaper than a generic 'xor' or it may get
folded into another logic op if the target has those. However, if we can remove a logic
instruction by changing the xor's constant mask value, that should always be a win.
Note that the IR version of SimplifyDemandedBits() does not treat 'not' as a special-case
currently (although that's marked with a FIXME). So if you run this IR through -instcombine,
you should get the same end result. I'm hoping to add a different backend transform that
will expose this problem though, so I need to solve this first.
Diana Picus [Mon, 5 Dec 2016 10:40:33 +0000 (10:40 +0000)]
[GlobalISel] Extract handleAssignments out of AArch64CallLowering
This function seems target-independent so far: all the target-specific behaviour
is isolated in the CCAssignFn and the ValueHandler (which we're also extracting
into the generic CallLowering).
Matthias Braun [Mon, 5 Dec 2016 08:15:57 +0000 (08:15 +0000)]
TableGen/AsmMatcherEmitter: Trust that stable_sort works
A debug build of AsmMatcherEmitter would use a quadratic algorithm to
check whether std::stable_sort() actually sorted. Let's hope the authors
of our C++ standard library did that testing for us. Removing the check
gives a 3x speedup in the X86 case.
Matthias Braun [Mon, 5 Dec 2016 07:35:09 +0000 (07:35 +0000)]
TableGen/Record: Shortcut member access in hottest function
This may seem unusual, but makes most debug tblgen builds ~10% faster.
Usually we wouldn't care about speed that much in debug builds, but for
tblgen that also translates into build time.
Matthias Braun [Mon, 5 Dec 2016 06:00:36 +0000 (06:00 +0000)]
TableGen: Use more StringInit instead of StringRef
This forces the code to call StringInit::get on the string early and
avoids storing duplicates in std::string and sometimes allows pointer
comparisons instead of string comparisons.
Kuba Mracek [Mon, 5 Dec 2016 05:21:44 +0000 (05:21 +0000)]
Use Darwin libtool's -no_warning_for_no_symbols if available to silence the "has no symbols" link warning
Building compiler-rt on Darwin produces dozens of meaningless warnings about object files having no symbols during static archive creation. This is very intentional as compiler-rt uses #ifdefs to conditionally compile platform-specific code, and we even have a .cpp source file that only contains static asserts to make sure the environment is configured right. On Linux, this situation is fine and no warning is produced. This patch adds a libtool version detection and if it's new enough, we'll use the -no_warning_for_no_symbols flag that suppresses this warning. Build logs should be much cleaner now!
Matthias Braun [Mon, 5 Dec 2016 05:21:18 +0000 (05:21 +0000)]
TableGen: Factor out STRCONCAT constructor, add shortcut.
Introduce new constructor for STRCONCAT binop with a shortcut that
immediately concatenates if the two arguments are StringInits.
Makes the QualifyName code more readable and tablegen 2-3% faster.
Craig Topper [Mon, 5 Dec 2016 04:51:31 +0000 (04:51 +0000)]
[AVX-512] Teach fast isel to use masked compare and movss for handling scalar cmp and select sequence when AVX-512 is enabled. This matches the behavior of normal isel.
Craig Topper [Mon, 5 Dec 2016 04:51:28 +0000 (04:51 +0000)]
[AVX-512] Add avx512f command lines to fast isel SSE select test.
Currently the fast isel code emits an avx1 instruction sequence even with avx512. This is different than normal isel. A follow up commit will fix this.
Chris Bieneman [Mon, 5 Dec 2016 03:28:03 +0000 (03:28 +0000)]
[CMake] Refactor add_llvm_tool_symlink for reuse
The old implementation of add_llvm_tool_symlink could fail in odd ways when building out of tree. This version solves that problem by not using the LLVM_* variables, and instead reaeding the target's properties.
lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::spillSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger*) const':
lib/Target/AMDGPU/SIRegisterInfo.cpp:572:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable]
const TargetRegisterClass *SubRC = nullptr;
^
lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::restoreSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger*) const':
lib/Target/AMDGPU/SIRegisterInfo.cpp:723:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable]
const TargetRegisterClass *SubRC = nullptr;
^
The variable was assigned to, but never used. The functions called did not
mutate state. Simplify the logic and remove the variable. Identified by gcc
5.4.0.
build: allow specifying the component to `llvm_install_symlink`
Add an optional parameter to `llvm_install_symlink` which allows the symlink
installation to be placed into a specific component rather than the default
value.
Matthias Braun [Sat, 3 Dec 2016 00:52:56 +0000 (00:52 +0000)]
AArch64CollectLOH: Rewrite as block-local analysis.
Previously this pass was using up to 5% compile time in some cases which
is a bit much for what it is doing. The pass featured a full blown
data-flow analysis which in the default configuration was restricted to a
single block.
This rewrites the pass under the assumption that we only ever work on a
single block. This is done in a single pass maintaining a state machine
per general purpose register to catch LOH patterns.
Guozhi Wei [Sat, 3 Dec 2016 00:41:43 +0000 (00:41 +0000)]
[ppc] Correctly compute the cost of loading 32/64 bit memory into VSR
VSX has instructions lxsiwax/lxsdx that can load 32/64 bit value into VSX register cheaply. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial.
Ivan Krasin [Fri, 2 Dec 2016 23:30:16 +0000 (23:30 +0000)]
Support escaping in TrigramIndex.
Summary:
This is a follow up to r288303, where I have introduced TrigramIndex
to speed up SpecialCaseList for the cases when all rules are
simple wildcards, like *hello*wor.d*.
Here, I add support for escaping, so that it's possible to
specify rules like *c\+\+abi*.
Zachary Turner [Fri, 2 Dec 2016 23:02:01 +0000 (23:02 +0000)]
Resubmit "[LibFuzzer] Split FuzzerUtil for Posix and Windows."
This resubmits r288529, which was resubmitted because it broke a
fuzzer bot. According to kcc@ the test that broke was flakey
and it is unlikely to be a result of this patch.
Zachary Turner [Fri, 2 Dec 2016 19:41:17 +0000 (19:41 +0000)]
[LibFuzzer] Introduce a portable WeakAlias implementation.
Windows doesn't really support weak aliases, but with some
linker magic we can get something that's pretty close on
Windows. This introduces an interface to accessing weakly
aliased symbols that will work on any platform. Linker
magic changes to come in a separate patch.
Patch by Marcos Pividori
Differential Revision: https://reviews.llvm.org/D27235
Rong Xu [Fri, 2 Dec 2016 19:10:29 +0000 (19:10 +0000)]
[PGO] Fix PGO use ICE when there are unreachable BBs
For -O0 there might be unreachable BBs, which breaks the assumption that all the
BBs have an auxiliary data structure. In this patch, we add another interface
called findBBInfo() so that a nullptr can be returned for the unreachable BBs
(and the callers can ignore those BBs).
This fixes the bug reported
https://llvm.org/bugs/show_bug.cgi?id=31209
Ulrich Weigand [Fri, 2 Dec 2016 18:24:16 +0000 (18:24 +0000)]
[SystemZ] Support remaining atomic instructions
Add assembler support for all atomic instructions that weren't already
supported. Some of those could be used to implement codegen for 128-bit
atomic operations, but this isn't done here yet.
Ulrich Weigand [Fri, 2 Dec 2016 18:19:22 +0000 (18:19 +0000)]
[SystemZ] Refactor hasSideEffects setting
Move setting of hasSideEffects out of SystemZInstrFormats.td,
to allow use of the format classes for instructions where this
flag shouldn't be set. NFC.
Renato Golin [Fri, 2 Dec 2016 16:56:26 +0000 (16:56 +0000)]
Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements."
This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's
builtins (twice: once in r288412 and once in r288497). We should investigate
this offline.
Nicolai Haehnle [Fri, 2 Dec 2016 16:06:18 +0000 (16:06 +0000)]
[DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.
Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating point numbers with two bits of mantissa:
x = 1.01
y = 111
x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)
x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)
The example relies on rounding towards zero at least in the second step.
Alexey Bataev [Fri, 2 Dec 2016 12:20:22 +0000 (12:20 +0000)]
[SLP] Fix for PR6246: vectorization for scalar ops on vector elements.
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.
Simon Pilgrim [Fri, 2 Dec 2016 11:58:05 +0000 (11:58 +0000)]
[X86] Refactored getTargetConstantBitsFromNode to allow for expansion. NFCI.
getTargetConstantBitsFromNode currently only extracts constant pool vector data, but it will need to be generalized to support broadcast and scalar constant pool data as well.
Converted Constant bit extraction and Bitset splitting to helper lambda functions.