NAKAMURA Takumi [Sun, 8 Nov 2015 09:45:06 +0000 (09:45 +0000)]
Appease hosts without HAVE_BACKTRACE nor ENABLE_BACKTRACES.
llvm/lib/Support/Signals.cpp:66:13: warning: unused function 'printSymbolizedStackTrace' [-Wunused-function]
llvm/lib/Support/Signals.cpp:52:13: warning: function 'findModulesAndOffsets' has internal linkage but is not defined [-Wundefined-internal]
Hal Finkel [Sun, 8 Nov 2015 08:04:40 +0000 (08:04 +0000)]
[PowerPC] Fix LoopPreIncPrep not to depend on SCEV constant simplifications
Under most circumstances, if SCEV can simplify X-Y to a constant, then it can
also simplify Y-X to a constant. However, there is no guarantee that this is
always true, and concensus is not to consider that a correctness bug in SCEV
(although it is undesirable).
PPCLoopPreIncPrep gathers pointers used to access memory (via loads, stores and
prefetches) into buckets, where in each bucket the relative pointer offsets are
constant. We used to keep each bucket as a multimap, where SCEV's subtraction
operation was used to define the ordering predicate. Instead, use a fixed SCEV
base expression for each bucket, record the constant offsets from that base
expression, and adjust it later, if desirable, once all pointers have been
collected.
Doing it this way should be more compile-time efficient than the previous
scheme (in addition to making the implementation less sensitive to SCEV
simplification quirks).
David Majnemer [Sun, 8 Nov 2015 02:36:00 +0000 (02:36 +0000)]
[WinEH] Update PHIs of CATCHRET successors
The TailDuplication machine pass ran across a malformed CFG: a PHI node
referred it's predecessor's predecessor instead of it's predecessor.
This occurred because we split the edge in X86ISelLowering when we
processed the CATCHRET but forgot to do something about the PHI nodes.
Sanjoy Das [Sat, 7 Nov 2015 01:55:53 +0000 (01:55 +0000)]
[FunctionAttrs] Fix an iterator wraparound bug
Summary:
This change fixes an iterator wraparound bug in
`determinePointerReadAttrs`.
Ideally, ++'ing off the `end()` of an iplist should result in a failed
assert, but currently iplist seems to silently wrap to the head of the
list on `end()++`. This is why the bad behavior is difficult to
demonstrate.
Disallow implicit conversions between ilist iterators and element
points. Explicit conversions still work of course.
This is the first step toward removing the undefined behaviour in
`ilist` and `iplist`:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/091115.html
The motivation for removing the implicit iterators is that I came across
real bugs (that were *really* getting lucky). More details and some
brief discussion later in that thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/091617.html
Note: if you have out-of-tree code, it should be fairly easy to revert
this patch downstream while you update your out-of-tree call sites.
Note that these conversions are occasionally latent bugs (that may
happen to "work" now, but only because of getting lucky with UB;
follow-ups will change your luck). When they are valid, I suggest using
`->getIterator()` to go from pointer to iterator, and `&*` to go from
iterator to pointer.
David Majnemer [Sat, 7 Nov 2015 00:52:53 +0000 (00:52 +0000)]
[InstCombine] Teach FoldPHIArgZextsIntoPHI about EHPads
FoldPHIArgZextsIntoPHI cannot insert an instruction after the PHI if
there is an EHPad in the BB. Doing so would result in an instruction
inserted after a terminator.
This reverts commit r252372. Apparently I missed clang-tools-extra.
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/2534/steps/build/logs/stdio
Disallow implicit conversions between ilist iterators and element
points. Explicit conversions still work of course.
This is the first step toward removing the undefined behaviour in
`ilist` and `iplist`:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/091115.html
The motivation for removing the implicit iterators is that I came across
real bugs (that were *really* getting lucky). More details and some
brief discussion later in that thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/091617.html
Note: if you have out-of-tree code, it should be fairly easy to revert
this patch downstream while you update your out-of-tree call sites.
Note that these conversions are occasionally latent bugs (that may
happen to "work" now, but only because of getting lucky with UB;
follow-ups will change your luck). When they are valid, I suggest using
`->getIterator()` to go from pointer to iterator, and `&*` to go from
iterator to pointer.
Akira Hatanaka [Fri, 6 Nov 2015 23:55:38 +0000 (23:55 +0000)]
Add 'notail' marker for call instructions.
This marker prevents optimization passes from adding 'tail' or
'musttail' markers to a call. Is is used to prevent tail call
optimization from being performed on the call.
Pawel Bylica [Fri, 6 Nov 2015 23:21:49 +0000 (23:21 +0000)]
[Support] Use GetTempDir to get the temporary dir path on Windows.
Summary:
In general GetTempDir follows the same logic as the replaced code: checks env variables TMP, TEMP, USERPROFILE in order. However, it also perform other checks like making separators native (\), making the path absolute, etc.
This change fixes FileSystemTest.CreateDir unittest that had been failing when run from Unix-like shell on Windows (Unix-like path separator (/) used in env variables).
Ahmed Bougacha [Fri, 6 Nov 2015 23:16:53 +0000 (23:16 +0000)]
[AArch64][FastISel] Don't even try to select vector icmps.
We used to try to constant-fold them to i32 immediates.
Given that fast-isel doesn't otherwise support vNi1, when selecting
the result users, we'd fallback to SDAG anyway.
However, if the users were in another block, we'd insert broken
cross-class copies (GPR32 to FPR64).
Give up, let SDAG agree with itself on a vNi1 legalization strategy.
Ahmed Bougacha [Fri, 6 Nov 2015 23:16:48 +0000 (23:16 +0000)]
[X86] Fold (trunc (i32 (zextload i16))) into vbroadcast.
When matching non-LSB-extracting truncating broadcasts, we now insert
the necessary SRL. If the scalar resulted from a load, the SRL will be
folded into it, creating a narrower, offset, load.
However, i16 loads aren't Desirable, so we get i16->i32 zextloads.
We already catch i16 aextloads; catch these as well.
Ahmed Bougacha [Fri, 6 Nov 2015 23:16:43 +0000 (23:16 +0000)]
[X86] SRL non-LSB extracts when folding to truncating broadcasts.
Now that we recognize this, we can support it instead of bailing out.
That is, we can fold:
(v8i16 (shufflevector
(v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
<1,1,...,1>))
into:
(v8i16 (vbroadcast (i16 (trunc (srl Y, 16)))))
Ahmed Bougacha [Fri, 6 Nov 2015 23:16:38 +0000 (23:16 +0000)]
[X86] Don't fold non-LSB extracts into truncating broadcasts.
We used to incorrectly assume that the offset we're extracting from
was a multiple of the element size. So, we'd fold:
(v8i16 (shufflevector
(v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
<1,1,...,1>))
into:
(v8i16 (vbroadcast (i16 (trunc Y))))
whereas we should have extracted the higher bits from X.
Mehdi Amini [Fri, 6 Nov 2015 20:17:51 +0000 (20:17 +0000)]
Fix SLPVectorizer commutativity reordering
The SLPVectorizer had a very crude way of trying to benefit
from associativity: it tried to optimize for splat/broadcast
or in order to have the same operator on the same side.
This is benefitial to the cost model and allows more vectorization
to occur.
This patch improve the logic and make the detection optimal (locally,
we don't look at the full tree but only at the immediate children).
Should fix https://llvm.org/bugs/show_bug.cgi?id=25247
Sanjoy Das [Fri, 6 Nov 2015 19:01:08 +0000 (19:01 +0000)]
[ValueTracking] Add parameters to isImpliedCondition; NFC
Summary:
This change makes the `isImpliedCondition` interface similar to the rest
of the functions in ValueTracking (in that it takes a DataLayout,
AssumptionCache etc.). This is an NFC, intended to make a later diff
less noisy.
Sanjoy Das [Fri, 6 Nov 2015 19:01:03 +0000 (19:01 +0000)]
[ValueTracking] De-pessimize isImpliedCondition around unsigned compares
Summary:
Currently `isImpliedCondition` will optimize "I +_nuw C < L ==> I < L"
only if C is positive. This is an unnecessary restriction -- the
implication holds even if `C` is negative.
Sanjoy Das [Fri, 6 Nov 2015 19:00:57 +0000 (19:00 +0000)]
[ValueTracking] Add a framework for encoding implication rules
Summary:
This change adds a framework for adding more smarts to
`isImpliedCondition` around inequalities. Informally,
`isImpliedCondition` will now try to prove "A < B ==> C < D" by proving
"C <= A && B <= D", since then it follows "C <= A < B <= D".
While this change is in principle NFC, I could not think of a way to not
handle cases like "i +_nsw 1 < L ==> i < L +_nsw 1" (that ValueTracking
did not handle before) while keeping the change understandable. I've
added tests for these cases.
Matt Arsenault [Fri, 6 Nov 2015 18:01:57 +0000 (18:01 +0000)]
AMDGPU: Add pass to detect used kernel features
Mark kernels that use certain features that require user
SGPRs to support with kernel attributes. We need to know
before instruction selection begins because it impacts
the kernel calling convention lowering.
For now this only detects the workitem intrinsics.
Teresa Johnson [Fri, 6 Nov 2015 17:50:53 +0000 (17:50 +0000)]
Restore "Move metadata linking after lazy global materialization/linking."
Summary:
This reverts commit r251965.
Restore "Move metadata linking after lazy global materialization/linking."
This restores commit r251926, with fixes for the LTO bootstrapping bot
failure.
The bot failure was caused by references from debug metadata to
otherwise unreferenced globals. Previously, this caused the lazy linking
to link in their defs, which is unnecessary. With this patch, because
lazy linking is complete when we encounter the metadata reference, the
materializer created a declaration. For definitions such as aliases and
comdats, it is illegal to have a declaration. Furthermore, metadata
linking should not change code generation. Therefore, when linking of
global value bodies is complete, the materializer will simply return
nullptr as the new reference for the linked metadata.
This change required fixing a different test to ensure there was a
real reference to a linkonce global that was only being reference from
metadata.
Note that the new changes to the only-needed-named-metadata.ll test
illustrate an issue with llvm-link -only-needed handling of comdat
groups, whereby it may result in an incomplete comdat group. I note this
in the test comments, but the issue is orthogonal to this patch (it can
be reproduced without any metadata at head).
Reid Kleckner [Fri, 6 Nov 2015 17:06:38 +0000 (17:06 +0000)]
[WinEH] Mark funclet entries and exits as clobbering all registers
Summary:
In this implementation, LiveIntervalAnalysis invents a few register
masks on basic block boundaries that preserve no registers. The nice
thing about this is that it prevents the prologue inserter from thinking
it needs to spill all XMM CSRs, because it doesn't see any explicit
physreg defs in the MI.
Jun Bum Lim [Fri, 6 Nov 2015 16:27:47 +0000 (16:27 +0000)]
[AArch64]Enable the narrow ld promotion only on profitable microarchitectures
The benefit from converting narrow loads into a wider load (r251438) could be
micro-architecturally dependent, as it assumes that a single load with two bitfield
extracts is cheaper than two narrow loads. Currently, this conversion is
enabled only in cortex-a57 on which performance benefits were verified.
Daniel Sanders [Fri, 6 Nov 2015 12:41:43 +0000 (12:41 +0000)]
[mips][ias] Range check uimm4 operands and fixed a bug this revealed.
Summary:
The bug was that the sldi instructions have immediate widths dependant on
their element size. So sldi.d has a 1-bit immediate and sldi.b has a 4-bit
immediate. All of these were using 4-bit immediates previously.
Daniel Sanders [Fri, 6 Nov 2015 12:22:31 +0000 (12:22 +0000)]
[mips][ias] Range check uimm2 operands and fix a bug this revealed.
Summary:
The bug was that the MIPS32R6/MIPS64R6/microMIPS32R6 versions of LSA and DLSA
(unlike the MSA version) failed to account for the off-by-one encoding of the
immediate. The range is actually 1..4 rather than 0..3.
[mips] Define patterns for the atomic_{load,store}_{8,16,32,64} nodes.
Summary:
Without these patterns we would generate a complete LL/SC sequence.
This would be problematic for memory regions marked as WRITE-only or
READ-only, as the instructions LL/SC would read/write to the protected
memory regions correspondingly.
James Molloy [Fri, 6 Nov 2015 10:32:53 +0000 (10:32 +0000)]
Add a new attribute: norecurse
This attribute allows the compiler to assume that the function never recurses into itself, either directly or indirectly (transitively). This can be used among other things to demote global variables to locals.
Reid Kleckner [Fri, 6 Nov 2015 01:49:05 +0000 (01:49 +0000)]
[WinEH] Split EH_RESTORE out of CATCHRET for 32-bit EH
This adds the EH_RESTORE x86 pseudo instr, which is responsible for
restoring the stack pointers: EBP and ESP, and ESI if stack realignment
is involved. We only need this on 32-bit x86, because on x64 the runtime
restores CSRs for us.
Previously we had to keep the CATCHRET instruction around during SEH so
that we could convince X86FrameLowering to restore our frame pointers.
Now we can split these instructions earlier.
This was confusing, because we had a return instruction which wasn't
really a return and was ultimately going to be removed by
X86FrameLowering. This change also simplifies X86FrameLowering, which
really shouldn't be building new MBBs.
No observable functional change currently, but with the new register
mask stuff in D14407, CATCHRET will become a register allocator barrier,
and our existing tests rely on us having reasonable register allocation
around SEH.
Andrew Kaylor [Fri, 6 Nov 2015 00:20:50 +0000 (00:20 +0000)]
[WinEH] Clone funclets with multiple parents
Windows EH funclets need to always return to a single parent funclet. However, it is possible for earlier optimizations to combine funclets (probably based on one funclet having an unreachable terminator) in such a way that this condition is violated.
These changes add code to the WinEHPrepare pass to detect situations where a funclet has multiple parents and clone such funclets, fixing up the unwind and catch return edges so that each copy of the funclet returns to the correct parent funclet.
Keno Fischer [Fri, 6 Nov 2015 00:12:50 +0000 (00:12 +0000)]
[bugpoint] Add a named metadata (+their operands) reducer
Summary:
We frequently run bugpoint on a linked module that consists of all
modules we create while jitting the julia standard library. This module
has a very large number of compile units (10000+) in `llvm.dbg.cu`,
which didn't get reduced at all, requiring manual post processing.
This is an attempt to have bugpoint go through and attempt to reduce
the number of global named metadata nodes as well as their operands,
to cut down the number of roots for such metadata.
DI: Reverse direction of subprogram -> function edge.
Previously, subprograms contained a metadata reference to the function they
described. Because most clients need to get or set a subprogram for a given
function rather than the other way around, this created unneeded inefficiency.
For example, many passes needed to call the function llvm::makeSubprogramMap()
to build a mapping from functions to subprograms, and the IR linker needed to
fix up function references in a way that caused quadratic complexity in the IR
linking phase of LTO.
This change reverses the direction of the edge by storing the subprogram as
function-level metadata and removing DISubprogram's function field.
Since this is an IR change, a bitcode upgrade has been provided.
Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is
attached to the PR.
Alexey Samsonov [Thu, 5 Nov 2015 21:18:41 +0000 (21:18 +0000)]
[ASan] Disable instrumentation for inalloca variables.
inalloca variables were not treated as static allocas, therefore didn't
participate in regular stack instrumentation. We don't want them to
participate in dynamic alloca instrumentation as well.
Reid Kleckner [Thu, 5 Nov 2015 21:09:49 +0000 (21:09 +0000)]
[WinEH] Fix funclet prologues with stack realignment
We already had a test for this for 32-bit SEH catchpads, but those don't
actually create funclets. We had a bug that only appeared in funclet
prologues, where we would establish EBP and ESI as our FP and BP, and
then downstream prologue code would overwrite them.
While I was at it, I fixed Win64+funclets+stackrealign. This issue
doesn't come up as often there due to the ABI requring 16 byte stack
alignment, but now we can rest easy that AVX and WinEH will work well
together =P.