Reapply "Verifier: Check for incompatible bit piece expressions"
This reverts commit r234717, reapplying r234698 (in spirit).
As described in r234717, the original `Verifier` check had a
use-after-free. Instead of storing pointers to "interesting" debug info
intrinsics whose bit piece expressions should be verified once we have
typerefs, do a second traversal. I've added a testcase to catch the
`llc` crasher.
Original commit message:
Verifier: Check for incompatible bit piece expressions
Convert an assertion into a `Verifier` check. Bit piece expressions
must fit inside the variable, and mustn't be the entire variable.
Catching this in the verifier will help us find bugs sooner, and makes
`DIVariable::getSizeInBits()` dead code.
Philip Reames [Mon, 13 Apr 2015 18:07:21 +0000 (18:07 +0000)]
[RewriteStatepointsForGC] Fix a latent bug in normalization for invoke statepoint [NFC]
Since we're restructuring the CFG, we also need to make sure to update the analsis passes. While I'm touching the code, I dedicided to restructure it a bit. The code involved here was very confusing. This change moves the normalization to essentially being a pre-pass before the main insertion work and updates a few comments to actually say what is happening and *why*.
The restructuring should be covered by existing tests. I couldn't easily see how to create a test for the invalidation bug. Suggestions welcome.
Jan Vesely [Mon, 13 Apr 2015 17:47:15 +0000 (17:47 +0000)]
Revert revisions r234755, r234759, r234760
Revert "Remove default in fully-covered switch (to fix Clang -Werror -Wcovered-switch-default)"
Revert "R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO"
Revert "LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB"
Using overflow operations fails CodeGen/Generic/2011-07-07-ScheduleDAGCrash.ll
on hexagon, nvptx, and r600. Revert while I investigate.
Philip Reames [Mon, 13 Apr 2015 17:35:55 +0000 (17:35 +0000)]
[RewriteStatepointsForGC] Strengthen assertions around liveness
This is related to the issues addressed in 234651. These assertions check the properties ensured by that change at the place of use. Note that a similiar property is checked in checkBasicSSA, but without the reachability constraint. Technically, the liveness would be correct to include unreachable values, but this would be problematic for actual relocation.
Philip Reames [Mon, 13 Apr 2015 16:41:32 +0000 (16:41 +0000)]
[RewriteStatepointsForGC] Move an expensive debugging check to XDEBUG
The check in question is attempting to help find cases where we haven't relocated a pointer at a safepoint we should have. It does this by coercing the value to null at any safepoint which doesn't relocate it.
Unfortunately, this turns out to be rather expensive in terms of memory usage and time. The number of stores inserted can grow with O(number of values x number of statepoints). On at least one example I looked at, over half of peak memory usage was coming from this check.
With this change, the check is no longer enabled by default in Asserts builds. It is enabled for expensive asserts builds and has a command line option to enable it in both Asserts and non-Asserts builds.
Jan Vesely [Mon, 13 Apr 2015 16:26:00 +0000 (16:26 +0000)]
R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO
v2: tighten the sub64 tests
v3: rename to CARRY/BORROW
v4: fixup test cmdline
add known bits computation
use sign extend instead of sub 0,x
better add test
v5: remove redundant break
move lowering to separate functions
fix comments
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewers: arsenm
David Blaikie [Mon, 13 Apr 2015 16:04:17 +0000 (16:04 +0000)]
[docs] Update outdated ExtendingLLVM.rst
Summary:
The document is still incomplete in some degrees, but updated to reflect the
latest changes. Anyway we can detail it if any one think it is not enough. For
the sake of it, some useful examples are listed below:
Refer to r113618 "Add X86 MMX type to bitcode and Type" for how to add a new
type.
> One notable change from then is only one thing that ``lib/VMCore`` is renamed
to ``lib/IR``.
Refer to r194760 "Add addrspacecast instruction" for how to add a new
instruction.
John Brawn [Mon, 13 Apr 2015 10:47:39 +0000 (10:47 +0000)]
[ARM] Align global variables passed to memory intrinsics
Fill in the TODO in CodeGenPrepare::OptimizeCallInst so that global
variables that are passed to memory intrinsics are aligned in the same
way that allocas are.
Revert "Verifier: Check for incompatible bit piece expressions"
This reverts commit r234698.
This caused a use-after-free: `QueuedBitPieceExpressions` holds onto
references to `DbgInfoIntrinsic`s and references them past where they're
deleted (this is because the verifier is run as a function pass, and
then `verifyTypeRefs()` is called during `doFinalization()`).
I'll include a reduced crasher for `llc` when I recommit the check.
Petr Hosek [Sun, 12 Apr 2015 23:42:25 +0000 (23:42 +0000)]
[MC] Write padding into fragments when -mc-relax-all flag is used
Summary:
When instruction bundling is enabled and the -mc-relax-all flag is
set, we can write bundle padding directly into fragments and avoid
creating large number of fragments significantly reducing LLVM MC
memory usage.
Hal Finkel [Sun, 12 Apr 2015 17:18:56 +0000 (17:18 +0000)]
[PowerPC] Really iterate over all loops in PPCLoopDataPrefetch/PPCLoopPreIncPrep
When I fixed these a couple of days ago to iterate over all loops, not just
depth == 1 loops, I inadvertently made it such that we'd only look at the first
top-level loop. Make sure that we really look at all of them.
Sanjoy Das [Sun, 12 Apr 2015 01:24:01 +0000 (01:24 +0000)]
[LoopUnrollRuntime] Clean up a predicate.
Clean up a predicate I added in r229731, fix the relevant comment and
add a test case. The earlier version is confusing to read and was also
buggy (probably not a coincidence) till Alexey fixed it in r233881.
Verifier: Check for incompatible bit piece expressions
Convert an assertion into a `Verifier` check. Bit piece expressions
must fit inside the variable, and mustn't be the entire variable.
Catching this in the verifier will help us find bugs sooner, and makes
`DIVariable::getSizeInBits()` dead code.
Add `DIBuilder::replaceTemporary()` as a replacement for
`DIDescriptor::replaceAllUsesWith()`. I'll update clang to use the new
method, and then come back to delete the original.
This method dispatches to `replaceAllUsesWith()` or
`replaceWithUniqued()`, depending on whether the replacement is actually
a different node from the original.
DebugInfo: Move DIScope::getName() and getContext() to MDScope
Continue gutting the `DIDescriptor` hierarchy. In this case, move the
guts of `DIScope::getName()` and `DIScope::getContext()` to
`MDScope::getName()` and `MDScope::getScope()`.
DebugInfo: Rewrite atSameLineAs() as MDLocation::canDiscriminate()
Rewrite `DILocation::atSameLineAs()` as `MDLocation::canDiscriminate()`
with a doxygen comment explaining its purpose. I've added a few FIXMEs
where I think this check is too weak; fixing that is tracked by PR23199.
DebugInfo: Add forwarding getFilename() accessor to new hierarchy
Add forwarding `getFilename()` and `getDirectory()` accessors to nodes
in the new hierarchy that define a `getFile()`. Use that to
re-implement existing functionality in the `DIDescriptor` hierarchy.
Hal Finkel [Sat, 11 Apr 2015 00:33:08 +0000 (00:33 +0000)]
[PowerPC] Fix PPCLoopPreIncPrep for depth > 1 loops
This pass had the same problem as the data-prefetching pass: it was only
checking for depth == 1 loops in practice. Fix that, add some debugging
statements, and make sure that, when we grab an AddRec, it is for the loop we
expect.
Ahmed Bougacha [Sat, 11 Apr 2015 00:06:36 +0000 (00:06 +0000)]
[CodeGen] Split -enable-global-merge into ARM and AArch64 options.
Currently, there's a single flag, checked by the pass itself.
It can't force-enable the pass (and is on by default), because it
might not even have been created, as that's the targets decision.
Instead, have separate explicit flags, so that the decision is
consistently made in the target.
Keep the flag as a last-resort "force-disable GlobalMerge" for now,
for backwards compatibility.
[AArch64] Strengthen the code for the prologue insertion.
The spilled registers are pristine and thus, correctly handled by
the register scavenger and so on, but the liveness information is
strictly speaking wrong at this point.
Fix that.
[WinEH] Recognize SEH finally block inserted by the frontend
This allows winehprepare to build sensible llvm.eh.actions calls for SEH
finally blocks. The pattern matching in this change is brittle and
should be replaced with something more robust soon. In the meantime,
this will let us write the code that produces __C_specific_handler xdata
tables, which we need regardless of how we decide to get finally blocks
through EH preparation.
Tim Northover [Fri, 10 Apr 2015 22:58:48 +0000 (22:58 +0000)]
Generic: Make isMask_N and isShiftedMask_N consistent over 0
Previously, isMask_N returned false for 0 but isShiftedMask_N returned true.
Almost all uses are for pattern matching bitfield operations in the backends,
and expect false (this was discovered because of AArch64's copy of this logic).
Unfortunately, I couldn't put together a small non-fragile test for this. The
nature of the bitfield operations means that this edge case is only really
triggered for nodes like "(and N, 0)", which the DAG combiner is usually very
good at folding away before they get to this stage.
Philip Reames [Fri, 10 Apr 2015 22:53:14 +0000 (22:53 +0000)]
[RewriteStatepointsForGC] Use an actual liveness algorithm
When rewriting statepoints to make relocations explicit, we need to have a conservative but consistent notion of where a particular pointer is live at a particular site. The old code just used dominance, which is correct, but decidedly more conservative then it needed to be. This patch implements a simple dataflow algorithm that's run one per function (well, twice counting fixup after base pointer insertion). There's still lots of room to make this faster, but it's fast enough for all practical purposes today.
Philip Reames [Fri, 10 Apr 2015 22:16:58 +0000 (22:16 +0000)]
[RewriteStatepointsForGC] Missed review comment from 234651 & build fix
After submitting 234651, I noticed I hadn't responded to a review comment by mjacob. This patch addresses that comment and fixes a Release only build problem due to an unused variable.
Philip Reames [Fri, 10 Apr 2015 22:07:04 +0000 (22:07 +0000)]
[RewriteStatepointsForGC] Preprocess the IR to remove unreachable blocks and single entry phis
Two related small changes:
Various dominance based queries about liveness can get confused if we're talking about unreachable blocks. To avoid reasoning about such cases, just remove them before rewriting statepoints.
Remove single entry phis (likely left behind by LCSSA) to reduce the number of live values.
Both of these are motivated by http://reviews.llvm.org/D8674 which will be submitted shortly.
Philip Reames [Fri, 10 Apr 2015 21:48:25 +0000 (21:48 +0000)]
[RewriteStatepointsForGC] Limited support for vectors of pointers
This patch adds limited support for inserting explicit relocations when there's a vector of pointers live over the statepoint. This doesn't handle the case where the vector contains a mix of base and non-base pointers; that's future work.
The current implementation just scalarizes the vector over the gc.statepoint before doing the explicit rewrite. An alternate approach would be to plumb the vector all the way though the backend lowering, but doing that appears challenging. In particular, the size of the indirect spill slot is currently assumed to be sizeof(pointer) throughout the backend.
In practice, this is enough to allow running the SLP and Loop vectorizers before RewriteStatepointsForGC.
Sanjoy Das [Fri, 10 Apr 2015 21:07:09 +0000 (21:07 +0000)]
[InstCombine][CodeGenPrep] Create llvm.uadd.with.overflow in CGP.
Summary:
This change moves creating calls to `llvm.uadd.with.overflow` from
InstCombine to CodeGenPrep. Combining overflow check patterns into
calls to the said intrinsic in InstCombine inhibits optimization because
it introduces an intrinsic call that not all other transforms and
analyses understand.
DebugInfo: Stop leaking temporaries in DIBuilder::createCompileUnit()
Stop leaking temporary nodes from `DIBuilder::createCompileUnit()`.
`replaceAllUsesWith()` doesn't delete the nodes, so we need to delete
them "manually" (well, `TempMDTuple` does that for us).
Similarly, stop leaking the temporary nodes used for variables of
subprograms.
[FS] Report errors from llvm::sys::fs::rename on Windows
Previously we would always report success, which is pretty bogus.
I'm too lazy to write a test where rename will portably fail on all
platforms. I'm just trying to fix breakage introduced by r234597, which
happened to tickle this.
[WinEH] Try to make outlining invokes work a little better
WinEH currently turns invokes into calls. Long term, we will reconsider
this, but for now, make sure we remap the operands and clone the
successors of the new terminator.
Benjamin Kramer [Fri, 10 Apr 2015 14:50:08 +0000 (14:50 +0000)]
[CallSite] Make construction from Value* (or Instruction*) explicit.
CallSite roughly behaves as a common base CallInst and InvokeInst. Bring
the behavior closer to that model by making upcasts explicit. Downcasts
remain implicit and work as before.
Following dyn_cast as a mental model checking whether a Value *V isa
CallSite now looks like this:
if (auto CS = CallSite(V)) // think dyn_cast
instead of:
if (CallSite CS = V)
This is an extra token but I think it is slightly clearer. Making the
ctor explicit has the advantage of not accidentally creating nullptr
CallSites, e.g. when you pass a Value * to a function taking a CallSite
argument.
Chad Rosier [Fri, 10 Apr 2015 13:19:27 +0000 (13:19 +0000)]
[AArch64] Changes some SchedAlias to WriteRes for Cortex-A57.
Using SchedAliases is convenient and works well for latency and resource
lookup for instructions. However, this creates an entry in
AArch64WriteLatencyTable with a WriteResourceID of 0, breaking any
SchedReadAdvance since the lookup will fail.
http://reviews.llvm.org/D8043
Patch by Dave Estes <cestes@codeaurora.org>!
Benjamin Kramer [Fri, 10 Apr 2015 12:46:44 +0000 (12:46 +0000)]
Microoptimize DenseMap::clear.
Cache NumEntries locally, it's only used in an assert and using the member
variable prevents the compiler from eliminating the tombstone check for types
with trivial destructors. No functionality change intended.
Summary:
Some optimizations such as jump threading and loop unswitching can negatively
affect performance when applied to divergent branches. The divergence analysis
added in this patch conservatively estimates which branches in a GPU program
can diverge. This information can then help LLVM to run certain optimizations
selectively.
Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll
David Majnemer [Fri, 10 Apr 2015 04:56:17 +0000 (04:56 +0000)]
[WinEHPrepare] Don't rely on the order of IR
The IPToState table must be emitted after we have generated labels for
all functions in the table. Don't rely on the order of the list of
globals. Instead, utilize WinEHFuncInfo to tell us how many catch
handlers we expect to outline. Once we know we've visited all the catch
handlers, emit the cppxdata.
Hal Finkel [Fri, 10 Apr 2015 03:39:00 +0000 (03:39 +0000)]
[PowerPC] Don't crash on PPC32 i64 fp_to_uint on modern cores
When we have an instruction for this (and, thus, don't generate a runtime
call), we need to custom type legalize this (in a trivial way, just as we do
for fp_to_sint).
Ahmed Bougacha [Thu, 9 Apr 2015 20:04:47 +0000 (20:04 +0000)]
[CodeGen] Combine concat_vector of trunc'd scalar to scalar_to_vector.
We already do:
concat_vectors(scalar, undef) -> scalar_to_vector(scalar)
When the scalar is legal.
When it's not, but is a truncated legal scalar, we can also do:
concat_vectors(trunc(scalar), undef) -> scalar_to_vector(scalar)
Which is equivalent, since the upper lanes are undef anyway.
While there, teach the combine to look at more than 2 operands.
The integer extend optimization tries to fold the extend into the load
instruction. This requires us to identify if the extend has already been
emitted or not and act accordingly on it.
The check that was originally performed for this was not sufficient. Besides
checking the ValueMap for a mapped register we also need to check if the
virtual register has already an associated machine instruction that defines it.