Tim Northover [Mon, 6 Mar 2017 18:23:04 +0000 (18:23 +0000)]
GlobalISel: refactor legalization of G_INSERT.
Now that G_INSERT instructions can only insert one register, this code was
overly general. In another direction it didn't handle registers that crossed
split boundaries properly, which needed to be fixed.
Dehao Chen [Mon, 6 Mar 2017 17:49:59 +0000 (17:49 +0000)]
Remove the sample pgo annotation heuristic that uses call count to annotate basic block count.
Summary: We do not need that special handling because the debug info is more accurate now. Performance testing shows no regression on google internal benchmarks.
[Hexagon] Early-if-convert branches that may exit the loop
Merge the tail block into the loop in cases where the main loop body
exits early, subject to profitability constraints. This will coalesce
the loop body into fewer blocks.
For example:
loop: loop:
// loop body // loop body
if (...) jump exit --> // more body
more: if (...) jump exit
// more body jump loop
jump loop
[Hexagon] Mark dead defs as <dead> in expand-condsets
The code in updateDeadFlags removed unnecessary <dead> flags, but there
can be cases where such a flag is not set, and yet a register has become
dead. For example, if a mux with identical inputs is replaced with a COPY,
the predicate register may no longer be used after that.
Sanjay Patel [Mon, 6 Mar 2017 16:36:42 +0000 (16:36 +0000)]
[DAGCombiner] simplify div/rem-by-0
Refactoring of duplicated code and more fixes to follow.
This is motivated by the post-commit comments for r296699:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html
Ie, we can crash if we're missing obvious simplifications like this that
exist in the IR simplifier or if these occur later than expected.
The x86 change for non-splat division shows a potential opportunity to improve
vector codegen: we assumed that since only one lane had meaningful results, we
should do the math in scalar. But that means moving back and forth from vector
registers.
Sanjay Patel [Mon, 6 Mar 2017 15:50:07 +0000 (15:50 +0000)]
[x86] add tests to show missing div/rem simplifications; NFC
These are not x86-specific, but the problem is not visible for all targets
because it is masked by other transforms. These can lead to compiler crashes.
Michael Kruse [Mon, 6 Mar 2017 15:33:05 +0000 (15:33 +0000)]
[BasicBlockUtils] Check for nullptr before updating LoopInfo.
LoopInfo::getLoopFor returns nullptr if a BB is not in a loop and only
then can the loop be updated to contain the newly created BBs. Add the
missing nullptr check to SplitBlockAndInsertIfThen.
Within LLVM, the only user of this function that also passes a LoopInfo
to be updated is InnerLoopVectorizer::predicateInstructions().
As the method's name implies, the BB operataten on will always be within
a loop, but out-of-tree users may also use it differently (here: Polly).
All other uses of LoopInfo::getLoopFor in the file properly check its
return value for nullptr.
Tom Stellard [Mon, 6 Mar 2017 14:26:50 +0000 (14:26 +0000)]
CMake: Add a build target for generating a source RPM
Summary:
'make srpm' or 'ninja srpm' will tar up the current source code and then
build a source RPM package.
By default it will use the llvm.spec file to generate the source RPM,
but you can specify your own custom spec file with the
LLVM_SRPM_USER_BINARY_SPECFILE option. CMake will perform variable
substitution on your custom specfile, so you can reference CMake
variables in it. For example:
Version: @LLVM_RPM_SPEC_VERSION@
Note that everything in the source directory will be included in the
tarball so if you have a clang check out in tools/clang, then all
the clang source will end up in the tarball to. It is recommended
to only use this build target with a clean source tree.
[XRay] Allow logging the first argument of a function call.
Summary:
Functions with the "xray-log-args" attribute will have a special XRay sled kind
emitted, for compiler-rt to copy any call arguments to your logging handler.
For practical and performance reasons, only the first argument is supported, and
only up to 64 bits.
Craig Topper [Mon, 6 Mar 2017 06:30:47 +0000 (06:30 +0000)]
[APInt] Move operator~ out of line to make it better able to reused memory allocation from temporary objects
Summary:
This makes operator~ take the APInt by value so if it came from a temporary APInt the move constructor will get invoked and it will be able to reuse the memory allocation from the temporary.
This is similar to what was already done for 2s complement negation.
Sanjoy Das [Sun, 5 Mar 2017 23:49:17 +0000 (23:49 +0000)]
[SCEV] Decrease the recursion threshold for CompareValueComplexity
Fixes PR32142.
r287232 accidentally increased the recursion threshold for
CompareValueComplexity from 2 to 32. This change reverses that change
by introducing a separate flag for CompareValueComplexity's threshold.
Tobias Grosser [Sun, 5 Mar 2017 14:08:28 +0000 (14:08 +0000)]
New Test-Case for Region Analysis
While working on improvements to the region info analysis, this test case caused
an incorrect region 1 => 2 to be detected. It is incorrect because entry has an
outgoing edge to 3. This is interesting because 1 dom 2 and 2 pdom 1, which
should have been enough to prevent incoming forward edges into the region and
outgoing forward edges from the region.
Simon Pilgrim [Sun, 5 Mar 2017 09:57:20 +0000 (09:57 +0000)]
[X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.
We're missing a couple of shuffle combines that will be added in a future patch for review.
Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.
Craig Topper [Sun, 5 Mar 2017 01:08:16 +0000 (01:08 +0000)]
[DAGCombine] Use APInt::operator|(uint64_t) instead of creating a temporary APInt and calling APInt::Or. NFC
This is more efficient by itself. But this is prep for a future patch that may remove APInt::Or while making operator| support rvalue references similar to add/sub.
Sanjay Patel [Sat, 4 Mar 2017 20:35:19 +0000 (20:35 +0000)]
[x86] don't require a zext when forming ADC/SBB
The larger goal is to move the ADC/SBB transforms currently in
combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're
creating ADC/SBB in lots of places where we shouldn't.
This was intended to be an NFC change, but avx-512 has something
strange going on. It doesn't seem like any of the affected tests
should really be using SET+TEST or ADC; a simple ADD could replace
several instructions. But that's another bug...
Sanjay Patel [Sat, 4 Mar 2017 19:18:09 +0000 (19:18 +0000)]
[DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.
This is part of the ongoing process to obsolete D24480. The motivation is to
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
PowerPC is an obvious winner for this kind of transform because it has fast and complete
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).
x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86
changes just shows that that code is a mess of special-case holes. We may be able to remove
some of that logic now.
My guess is that other targets will want to enable this hook for most cases. The likely
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to
math/logic, but the general transform for any 2 constants needs one more instruction
(multiply or 'and').
ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.
Simon Pilgrim [Sat, 4 Mar 2017 12:50:47 +0000 (12:50 +0000)]
[X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targets
Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks.
WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users.
Any unsuccessful llvm.type.checked.load devirtualizations will be translated
into uses of llvm.type.test, so we need to add the resulting llvm.type.test
intrinsics to the function summaries so that the LowerTypeTests pass will
export them.
Matthias Braun [Fri, 3 Mar 2017 23:27:20 +0000 (23:27 +0000)]
RegAllocGreedy: Follow-up to r296722
We can now end up in situations where we initiate LiveIntervalUnion
queries with different SubRanges against the same register unit, so the
assert() no longer holds in all cases. Just recalculate now when we know
the cache is out of date.
Tim Northover [Fri, 3 Mar 2017 22:46:09 +0000 (22:46 +0000)]
GlobalISel: add merge/unmerge nodes for legalization.
These are simplified variants of the current G_SEQUENCE and G_EXTRACT, which
assume the individual parts will be contiguous, homogeneous, and occupy the
entirity of the larger register. This makes reasoning about them much easer
since you only have to look at the first register being merged and the result
to know what the instruction is doing.
I intend to gradually replace all uses of the more complicated sequence/extract
with these (or single-element insert/extracts), and then remove the older
variants. For now we start with legalization.
Sanjay Patel [Fri, 3 Mar 2017 22:35:11 +0000 (22:35 +0000)]
[x86] refactor combineAddOrSubToADCOrSBB(); NFCI
The comments were wrong, and this is not an obvious transform.
This hopefully makes it clearer that we're missing the commuted
patterns for adds. It's less clear that this is actually a good
transform for all micro-arch.
This is prep work for trying to clean up the current adc/sbb
codegen because it's definitely not happening optimally.
In the DWARF 4 Spec section 7.2.2, data in many DWARF sections, and some DWARF structures start with "Initial Length Values", which are a 32-bit length, and an optional 64-bit length if the 32 bit value == UINT32_MAX.
This patch abstracts the Initial Length type in YAML, and extends its use to all the DWARF structures that are supported in the DWARFYAML code that have Initial Length values.
LTO: Hash the set of imported symbols for each module.
This set may affect code generation and is sensitive to link order (and
possibly in the future to the linker's choice of prevailing symbol), so we
need to include it.
Zachary Turner [Fri, 3 Mar 2017 20:21:59 +0000 (20:21 +0000)]
[Windows] Remove the #include <eh.h> hack.
Prior to MSVC 2015 we had to manually include this header any
time we were going to include <thread> or <future> due to a
bug in MSVC's STL implementation. This has been fixed in MSVC
for some time now, and we require VS 2015 minimum, so we can
remove this across all subprojects.
Zachary Turner [Fri, 3 Mar 2017 18:55:24 +0000 (18:55 +0000)]
Teach lit to expand glob expressions.
This will enable removing hacks throughout the codebase
in clang and compiler-rt that feed multiple inputs to a
testing utility by globbing, all of which are either disabled
on Windows currently or using xargs / find hacks.
Sanjoy Das [Fri, 3 Mar 2017 18:19:15 +0000 (18:19 +0000)]
[LoopUnrolling] Peel loops with invariant backedge Phi input
Summary:
If a loop contains a Phi node which has an invariant input from back
edge, it is profitable to peel such loops (rather than unroll them) to
use the advantage that this Phi is always invariant starting from 2nd
iteration. After the 1st iteration is peeled, other optimizations can
potentially simplify calculations with this invariant.
Sanjoy Das [Fri, 3 Mar 2017 18:19:10 +0000 (18:19 +0000)]
[LoopUnrolling] Re-prioritize Peeling and Partial unrolling
Summary:
In current implementation the loop peeling happens after trip-count based partial unrolling and may
sometimes not happen at all due to it (for example, if trip count is known, but UP.Partial = false). This
is generally bad, the more than there are some situations where peeling is profitable even if the partial
unrolling is disabled.
This patch is a NFC which reorders peeling and partial unrolling application and prepares the code for
implementation of the said optimizations.
Zachary Turner [Fri, 3 Mar 2017 17:56:14 +0000 (17:56 +0000)]
Try to appease the FreeBSD bots.
pthread_self() returns a pthread_t, but we were setting it to
an int. It seems the cast to int when calling sysctl is still
the correct thing to do, though.
Zachary Turner [Fri, 3 Mar 2017 17:39:24 +0000 (17:39 +0000)]
Don't bring in llvm/Support/thread.h in Threading.cpp
Doing so defines the type llvm::thread. On FreeBSD, we need
to call a macro which references its own ::thread type, which
causes an ambiguity due to ADL when inside of the llvm namespace.
Since we don't even need this unless LLVM_ENABLE_THREADS == 1,
we don't even need this type anyway, as it is always equal to
std::thread, so we can just use that directly.
Zachary Turner [Fri, 3 Mar 2017 17:15:17 +0000 (17:15 +0000)]
[Support] Provide access to current thread name/thread id.
Applications often need the current thread id when making
system calls, and some operating systems provide the notion
of a thread name, which can be useful in enabling better
diagnostics when debugging or logging.
This patch adds an accessor for the thread id, and "best effort"
getters and setters for the thread name. Since this is
non critical functionality, no error is returned to indicate
that a platform doesn't support thread names.
Added code to check constant bus restrictions for VOP formats (only one SGPR value or literal-constant may be used by the instruction).
Note that the same checks are performed by SIInstrInfo::verifyInstruction (used by lowering code).
Added LIT tests.
Ranjeet Singh [Fri, 3 Mar 2017 11:40:07 +0000 (11:40 +0000)]
[ARM] fpscr read/write intrinsics not aware of each other
The intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr read and
write to the fpscr (Floating-Point Status and Control Register) register.
A bug exists in the __builtin_arm_get_fpscr intrinsic definition in llvm which
treats this intrinsic as a IntroNoMem which means it's not a memory access and
doesn't have any other side-effects. Having this property on this intrinsic
means that various optimizations can be done on this such as common
sub-expression elimination with other reads. This can cause issues if there has
been write to this register, e.g.
in the above example the second read is currently CSE'd into the first read,
this is because llvm isn't aware that the write done by __builtin_arm_set_fpscr
effects the same register that __builtin_arm_get_fpscr reads from, to fix this
problem I've removed the property IntrNoMem so that __builtin_arm_get_fpscr is
treated as a memory access.