Matthias Braun [Mon, 5 Dec 2016 06:00:36 +0000 (06:00 +0000)]
TableGen: Use more StringInit instead of StringRef
This forces the code to call StringInit::get on the string early and
avoids storing duplicates in std::string and sometimes allows pointer
comparisons instead of string comparisons.
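A minimal sketch of the uniquing idea (generic code, not the actual StringInit implementation): each distinct string is stored exactly once, so equality checks on the returned pointers can replace string comparisons.

    #include <cassert>
    #include <string>
    #include <unordered_set>

    // Hypothetical interning helper illustrating why uniqued strings allow
    // pointer comparisons; StringInit::get plays this role in TableGen.
    const std::string *intern(const std::string &S) {
      static std::unordered_set<std::string> Pool;
      return &*Pool.insert(S).first;
    }

    int main() {
      const std::string *A = intern("Name");
      const std::string *B = intern("Name");
      assert(A == B); // one stored copy, so a pointer compare suffices
      return 0;
    }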
Kuba Mracek [Mon, 5 Dec 2016 05:21:44 +0000 (05:21 +0000)]
Use Darwin libtool's -no_warning_for_no_symbols if available to silence the "has no symbols" link warning
Building compiler-rt on Darwin produces dozens of meaningless warnings about object files having no symbols during static archive creation. This is intentional, as compiler-rt uses #ifdefs to conditionally compile platform-specific code, and we even have a .cpp source file that only contains static asserts to make sure the environment is configured right. On Linux, this situation is fine and no warning is produced. This patch adds libtool version detection, and if the version is new enough, we use the -no_warning_for_no_symbols flag to suppress this warning. Build logs should be much cleaner now!
Matthias Braun [Mon, 5 Dec 2016 05:21:18 +0000 (05:21 +0000)]
TableGen: Factor out STRCONCAT constructor, add shortcut.
Introduce new constructor for STRCONCAT binop with a shortcut that
immediately concatenates if the two arguments are StringInits.
Makes the QualifyName code more readable and tablegen 2-3% faster.
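A sketch of what such a shortcut can look like (hedged: it follows the 2016-era TableGen signatures but is not the committed code): if both operands are already StringInits, fold the concatenation immediately instead of building a binop node.

    #include "llvm/ADT/Twine.h"
    #include "llvm/TableGen/Record.h"
    using namespace llvm;

    // Sketch only: immediate fold when both sides are plain strings,
    // otherwise fall back to a regular STRCONCAT binop node.
    static Init *makeStrConcat(Init *LHS, Init *RHS) {
      if (auto *L = dyn_cast<StringInit>(LHS))
        if (auto *R = dyn_cast<StringInit>(RHS))
          return StringInit::get((Twine(L->getValue()) + R->getValue()).str());
      return BinOpInit::get(BinOpInit::STRCONCAT, LHS, RHS, StringRecTy::get());
    }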
Craig Topper [Mon, 5 Dec 2016 04:51:31 +0000 (04:51 +0000)]
[AVX-512] Teach fast isel to use masked compare and movss for handling scalar cmp and select sequence when AVX-512 is enabled. This matches the behavior of normal isel.
Craig Topper [Mon, 5 Dec 2016 04:51:28 +0000 (04:51 +0000)]
[AVX-512] Add avx512f command lines to fast isel SSE select test.
Currently the fast isel code emits an avx1 instruction sequence even with avx512. This differs from normal isel. A follow-up commit will fix this.
Chris Bieneman [Mon, 5 Dec 2016 03:28:03 +0000 (03:28 +0000)]
[CMake] Refactor add_llvm_tool_symlink for reuse
The old implementation of add_llvm_tool_symlink could fail in odd ways when building out of tree. This version solves that problem by not using the LLVM_* variables and instead reading the target's properties.
lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::spillSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger*) const':
lib/Target/AMDGPU/SIRegisterInfo.cpp:572:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable]
const TargetRegisterClass *SubRC = nullptr;
^
lib/Target/AMDGPU/SIRegisterInfo.cpp: In member function 'void llvm::SIRegisterInfo::restoreSGPR(llvm::MachineBasicBlock::iterator, int, llvm::RegScavenger*) const':
lib/Target/AMDGPU/SIRegisterInfo.cpp:723:30: warning: variable 'SubRC' set but not used [-Wunused-but-set-variable]
const TargetRegisterClass *SubRC = nullptr;
^
The variable was assigned to, but never used. The functions called did not
mutate state. Simplify the logic and remove the variable. Identified by gcc
5.4.0.
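A minimal, generic illustration of the warning (hypothetical code, not the AMDGPU sources): the value written to the variable is never read, so the variable and the assignment can be dropped as long as the called function has no side effects.

    int computeIndex(int X) { return X * 2; } // pure helper, no side effects

    void before(int X) {
      int Idx = 0;           // warning: variable 'Idx' set but not used
      Idx = computeIndex(X);
    }

    void after(int X) {
      // Since computeIndex has no side effects and Idx was never read,
      // both the variable and the call can simply be removed.
      (void)X;
    }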
build: allow specifying the component to `llvm_install_symlink`
Add an optional parameter to `llvm_install_symlink` which allows the symlink
installation to be placed into a specific component rather than the default
value.
Matthias Braun [Sat, 3 Dec 2016 00:52:56 +0000 (00:52 +0000)]
AArch64CollectLOH: Rewrite as block-local analysis.
Previously this pass used up to 5% of compile time in some cases, which
is a bit much for what it is doing. The pass featured a full-blown
data-flow analysis which in the default configuration was restricted to a
single block.
This rewrites the pass under the assumption that we only ever work on a
single block. This is done in a single pass maintaining a state machine
per general purpose register to catch LOH patterns.
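A toy sketch of the block-local scheme (the instruction kinds and the pattern are simplified stand-ins, not the real AArch64 LOH patterns): the block is walked once, one small state is kept per register, and a candidate is recorded whenever a use completes a pattern started by an earlier def.

    #include <unordered_map>
    #include <utility>
    #include <vector>

    enum class Kind { Adrp, AddLow, Other };
    struct Insn { Kind K; unsigned Def; unsigned Use; };

    // Single forward walk over one block; PendingAdrp is the per-register state.
    std::vector<std::pair<int, int>> collectPairs(const std::vector<Insn> &Block) {
      std::unordered_map<unsigned, int> PendingAdrp; // reg -> index of its ADRP
      std::vector<std::pair<int, int>> Candidates;
      for (int I = 0, E = (int)Block.size(); I != E; ++I) {
        const Insn &MI = Block[I];
        if (MI.K == Kind::AddLow) {
          auto It = PendingAdrp.find(MI.Use);
          if (It != PendingAdrp.end())
            Candidates.push_back({It->second, I}); // ADRP ... ADD pair found
        }
        PendingAdrp.erase(MI.Def); // any redefinition resets that register
        if (MI.K == Kind::Adrp)
          PendingAdrp[MI.Def] = I;
      }
      return Candidates;
    }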
Guozhi Wei [Sat, 3 Dec 2016 00:41:43 +0000 (00:41 +0000)]
[ppc] Correctly compute the cost of loading 32/64 bit memory into VSR
VSX has instructions lxsiwax/lxsdx that can load a 32/64 bit value into a VSX register cheaply. This patch makes that known to the memory cost model, so that vectorizing the test case in pr30990 becomes beneficial.
Ivan Krasin [Fri, 2 Dec 2016 23:30:16 +0000 (23:30 +0000)]
Support escaping in TrigramIndex.
Summary:
This is a follow up to r288303, where I have introduced TrigramIndex
to speed up SpecialCaseList for the cases when all rules are
simple wildcards, like *hello*wor.d*.
Here, I add support for escaping, so that it's possible to
specify rules like *c\+\+abi*.
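A small sketch of the unescaping step (illustrative only, not the committed TrigramIndex code): backslashes are dropped before the rule is cut into trigrams, so a rule like *c\+\+abi* is indexed on the literal characters it matches.

    #include <cstddef>
    #include <string>

    // Drop escape backslashes but keep the characters they protect.
    std::string unescapeRule(const std::string &Rule) {
      std::string Out;
      for (std::size_t I = 0; I < Rule.size(); ++I) {
        if (Rule[I] == '\\' && I + 1 < Rule.size())
          ++I; // skip the backslash, keep the escaped character
        Out += Rule[I];
      }
      return Out;
    }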
Zachary Turner [Fri, 2 Dec 2016 23:02:01 +0000 (23:02 +0000)]
Resubmit "[LibFuzzer] Split FuzzerUtil for Posix and Windows."
This resubmits r288529, which was reverted because it broke a
fuzzer bot. According to kcc@ the test that broke was flaky
and the failure is unlikely to be a result of this patch.
Zachary Turner [Fri, 2 Dec 2016 19:41:17 +0000 (19:41 +0000)]
[LibFuzzer] Introduce a portable WeakAlias implementation.
Windows doesn't really support weak aliases, but with some
linker magic we can get something that's pretty close.
This introduces an interface for accessing weakly
aliased symbols that will work on any platform. Linker
magic changes to come in a separate patch.
Patch by Marcos Pividori
Differential Revision: https://reviews.llvm.org/D27235
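A minimal sketch of the weak-alias idea as it works on POSIX-style toolchains (the Windows linker mechanism mentioned above is different and not shown here; the hook name is made up): the symbol is declared weak and the caller checks whether any strong definition was provided.

    // Optional hook a user may (or may not) define somewhere in the program.
    extern "C" __attribute__((weak)) void UserProvidedCallback();

    void maybeCallCallback() {
      if (&UserProvidedCallback) // null when no strong definition exists
        UserProvidedCallback();
    }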
Rong Xu [Fri, 2 Dec 2016 19:10:29 +0000 (19:10 +0000)]
[PGO] Fix PGO use ICE when there are unreachable BBs
For -O0 there might be unreachable BBs, which breaks the assumption that all the
BBs have an auxiliary data structure. In this patch, we add another interface
called findBBInfo() so that a nullptr can be returned for the unreachable BBs
(and the callers can ignore those BBs).
This fixes the bug reported at
https://llvm.org/bugs/show_bug.cgi?id=31209
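A sketch of the difference between the two accessors (the surrounding class and map are hypothetical; only the findBBInfo name comes from the patch): the existing accessor assumes every block has profile info, while findBBInfo returns nullptr so callers can skip unreachable blocks.

    #include <cassert>
    #include <unordered_map>

    struct BasicBlock;
    struct BBInfo { /* per-block profile counters */ };

    struct FuncPGOInfo {
      std::unordered_map<const BasicBlock *, BBInfo> BBInfos;

      // Existing interface: asserts that the block was seen.
      BBInfo &getBBInfo(const BasicBlock *BB) {
        auto It = BBInfos.find(BB);
        assert(It != BBInfos.end() && "no BBInfo for this block");
        return It->second;
      }

      // New interface: unreachable blocks simply yield nullptr.
      BBInfo *findBBInfo(const BasicBlock *BB) {
        auto It = BBInfos.find(BB);
        return It == BBInfos.end() ? nullptr : &It->second;
      }
    };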
Ulrich Weigand [Fri, 2 Dec 2016 18:24:16 +0000 (18:24 +0000)]
[SystemZ] Support remaining atomic instructions
Add assembler support for all atomic instructions that weren't already
supported. Some of those could be used to implement codegen for 128-bit
atomic operations, but this isn't done here yet.
Ulrich Weigand [Fri, 2 Dec 2016 18:19:22 +0000 (18:19 +0000)]
[SystemZ] Refactor hasSideEffects setting
Move setting of hasSideEffects out of SystemZInstrFormats.td,
to allow use of the format classes for instructions where this
flag shouldn't be set. NFC.
Renato Golin [Fri, 2 Dec 2016 16:56:26 +0000 (16:56 +0000)]
Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements."
This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's
builtins (twice: once in r288412 and once in r288497). We should investigate
this offline.
Nicolai Haehnle [Fri, 2 Dec 2016 16:06:18 +0000 (16:06 +0000)]
[DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.
Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating point numbers with two bits of mantissa:
x = 1.01
y = 111
x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)
x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)
The example relies on rounding towards zero at least in the second step.
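The X = 0, Y = inf case can also be reproduced with plain C++ fma (a standalone illustration, not the DAG combine itself): (X + 1) * Y stays inf, while the fused X * Y + Y starts from 0 * inf and ends up NaN.

    #include <cmath>
    #include <cstdio>

    int main() {
      double X = 0.0, Y = INFINITY;
      std::printf("%f\n", (X + 1.0) * Y);     // inf: the original expression
      std::printf("%f\n", std::fma(X, Y, Y)); // nan: 0 * inf is invalid
      return 0;
    }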
Alexey Bataev [Fri, 2 Dec 2016 12:20:22 +0000 (12:20 +0000)]
[SLP] Fix for PR6246: vectorization for scalar ops on vector elements.
When trying to vectorize trees that start at insertelement instructions,
function tryToVectorizeList() uses a vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes this does not work, because
the tree cost for this fixed vectorization factor is too high.
This patch tries to improve the situation: it tries vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) down to MinVecRegSize/ScalarTypeSize and
chooses the best one.
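A rough sketch of the search described above (names and the cost callback are illustrative, not the SLP vectorizer's code): candidate factors are tried from the upper bound down to the register-based minimum, and the cheapest profitable one wins.

    #include <algorithm>
    #include <functional>

    static unsigned powerOf2Floor(unsigned N) {
      unsigned P = 1;
      while (P * 2 <= N)
        P *= 2;
      return P;
    }

    // Returns the best vectorization factor, or 0 if none is profitable.
    unsigned pickBestVF(unsigned NumValues, unsigned MinVF,
                        const std::function<int(unsigned)> &CostOf) {
      unsigned Best = 0;
      int BestCost = 0; // negative cost means the tree is profitable
      unsigned MaxVF = std::max(powerOf2Floor(NumValues), MinVF);
      for (unsigned VF = MaxVF; VF >= MinVF && VF >= 2; VF /= 2) {
        int Cost = CostOf(VF);
        if (Cost < BestCost) {
          BestCost = Cost;
          Best = VF;
        }
      }
      return Best;
    }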
Simon Pilgrim [Fri, 2 Dec 2016 11:58:05 +0000 (11:58 +0000)]
[X86] Refactored getTargetConstantBitsFromNode to allow for expansion. NFCI.
getTargetConstantBitsFromNode currently only extracts constant pool vector data, but it will need to be generalized to support broadcast and scalar constant pool data as well.
Converted Constant bit extraction and Bitset splitting to helper lambda functions.
IR: Move NumElements field from {Array,Vector}Type to SequentialType.
Now that PointerType is no longer a SequentialType, all SequentialTypes
have an associated number of elements, so we can move that information to
the base class, allowing for a number of simplifications.
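A simplified sketch of the resulting shape (toy types, not the real llvm::Type hierarchy): with the count in the common base, code that only needs an element type plus an element count can operate on SequentialType directly.

    #include <cstdint>

    struct Type {};

    // Common base now carries both the element type and the element count.
    struct SequentialType : Type {
      Type *ElementType;
      uint64_t NumElements;
      SequentialType(Type *ET, uint64_t N) : ElementType(ET), NumElements(N) {}
    };

    struct ArrayType : SequentialType {
      using SequentialType::SequentialType;
    };
    struct VectorType : SequentialType {
      using SequentialType::SequentialType;
    };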
IR: Change PointerType to derive from Type rather than SequentialType.
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106640.html
This is for a couple of reasons:
- Values of type PointerType are unlike the other SequentialTypes (arrays
and vectors) in that they do not hold values of the element type. By moving
PointerType we can unify certain aspects of how the other SequentialTypes
are handled.
- PointerType will have no place in the SequentialType hierarchy once
pointee types are removed, so this is a necessary step towards removing
pointee types.
IR: Change the gep_type_iterator API to avoid always exposing the "current" type.
Instead, expose whether the current type is an array or a struct; if an array,
what the upper bound is; and if a struct, the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.
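A sketch of the consumer-side pattern this enables (hedged: the accessor name follows the description above and may not match the final API exactly): instead of asking for the current type, the caller asks which case it is in.

    #include "llvm/IR/GetElementPtrTypeIterator.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    void walkGEPIndices(GetElementPtrInst *GEP) {
      for (auto GTI = gep_type_begin(GEP), E = gep_type_end(GEP); GTI != E; ++GTI) {
        if (StructType *STy = GTI.getStructTypeOrNull()) {
          (void)STy; // struct case: the operand is a constant field index into STy
        } else {
          // array/vector case: the operand indexes a sequence with a known bound
        }
      }
    }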
Paul Robinson [Fri, 2 Dec 2016 01:55:17 +0000 (01:55 +0000)]
[DWARF] Put linkage-name on abstract origin even when there's a declaration.
In r266692, we made it possible to emit linkage names for just inlined
functions, putting the attribute on the abstract origin. Make sure we
don't think the linkage-name was already emitted on a declaration.
Teresa Johnson [Fri, 2 Dec 2016 01:02:30 +0000 (01:02 +0000)]
[ThinLTO] Stop importing constant global vars as copies in the backend
Summary:
We were doing an optimization in the ThinLTO backends of importing
constant unnamed_addr globals unconditionally as a local copy (regardless
of whether the thin link decided to import them). This should be done in
the thin link instead, so that resulting exported references are marked
and promoted appropriately, but will need a summary enhancement to mark
these variables as constant unnamed_addr.
The function import logic during the thin link was trying to handle
this proactively, by conservatively marking all values referenced in
the initializer lists of exported global variables as also exported.
However, this only handled values referenced directly from the
initializer list of an exported global variable. If the value is itself
a constant unnamed_addr variable, we could end up exporting its
references as well. This caused multiple issues. The first is that the
transitively exported references weren't promoted. Secondly, some could
not be promoted/renamed (e.g. they had a section or other constraint).
Handling this correctly would require walking the references recursively,
instead of just adding the first level of initializer list references to
the ExportList directly.
Remove this optimization and the associated handling in the function
import backend. SPEC measurements indicate we weren't getting much
from it in any case.
Matt Arsenault [Fri, 2 Dec 2016 00:54:45 +0000 (00:54 +0000)]
AMDGPU: Use wider scalar spills for SGPR spilling
Since the spill is for the whole wave, these
don't have the swizzling problems that vector stores do
and a single 4-byte allocation is enough to spill a 64-element
register. This should reduce the number of spill instructions and
put all the spills for a register in the same cacheline.
This should save allocated private size, but for now it doesn't.
The extra slots are allocated for each component, but never used
because the frame layout is essentially finalized before frame
indices are replaced. To always use the scalar store path,
this should probably be moved into processFunctionBeforeFrameFinalized.
Wolfgang Pieb [Fri, 2 Dec 2016 00:37:57 +0000 (00:37 +0000)]
When instructions are hoisted out of loops by MachineLICM, remove their debug loc.
This prevents erratic stepping behavior as well as incorrect source attribution
for sample profiling.
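A minimal sketch of the mechanism (not the actual MachineLICM change): clearing the DebugLoc on the hoisted MachineInstr is what prevents the preheader copy from being attributed to a line inside the loop.

    #include "llvm/CodeGen/MachineInstr.h"
    using namespace llvm;

    // Sketch only: called for an instruction that is about to be hoisted.
    void clearLocBeforeHoist(MachineInstr &MI) {
      MI.setDebugLoc(DebugLoc()); // empty location: no line, no column
      // ... the instruction is then moved to the loop preheader ...
    }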
Justin Bogner [Fri, 2 Dec 2016 00:11:01 +0000 (00:11 +0000)]
SDAG: Avoid a large, usually empty SmallVector in a recursive function
This SmallVector is using up 128 bytes on the stack every time despite
almost always being empty[1], and since this function can recurse quite
deeply that adds up to a lot of overhead. We've seen this run afoul of
ulimits in some cases with ASAN on.
Replacing the SmallVector with a std::vector trades an occasional heap
allocation for vastly less stack usage.
[1]: I gathered some stats on an internal test suite and the vector
was non-empty in only 45,000 of 10,000,000 calls to this function.
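An illustrative comparison (toy functions, not the SDAG code; the element type and inline count are chosen only to reach about 128 bytes): the SmallVector's inline storage is paid on the stack by every recursive frame even when empty, while std::vector costs a few pointers per frame and only touches the heap in the rare non-empty case.

    #include <cstdint>
    #include <vector>
    #include "llvm/ADT/SmallVector.h"

    void recurseSmall(int Depth) {
      llvm::SmallVector<uint64_t, 16> Tmp; // ~128 bytes of inline storage per frame
      if (Depth > 0)
        recurseSmall(Depth - 1);
    }

    void recurseStd(int Depth) {
      std::vector<uint64_t> Tmp; // a few pointers on the stack per frame
      if (Depth > 0)
        recurseStd(Depth - 1);
    }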