Hal Finkel [Sun, 12 Apr 2015 17:18:56 +0000 (17:18 +0000)]
[PowerPC] Really iterate over all loops in PPCLoopDataPrefetch/PPCLoopPreIncPrep
When I fixed these a couple of days ago to iterate over all loops, not just
depth == 1 loops, I inadvertently made it such that we'd only look at the first
top-level loop. Make sure that we really look at all of them.
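Walking every loop means visiting each top-level loop and, inside each one, every nested loop at any depth. The sketch below is not the pass's code. It assumes a hypothetical `Loop` struct (a depth plus a list of subloops) in place of LLVM's `Loop`/`LoopInfo`, and it uses an explicit worklist so that no top-level loop or nesting depth is skipped.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for llvm::Loop: just a depth and its subloops.
struct Loop {
  unsigned Depth;
  std::vector<Loop *> SubLoops;
};

// Visit every loop in the forest: all top-level loops and, depth-first,
// every loop nested inside them. Returns the loops in visit order.
std::vector<Loop *> collectAllLoops(const std::vector<Loop *> &TopLevel) {
  std::vector<Loop *> Result;
  for (Loop *Top : TopLevel) {        // every top-level loop, not just the first
    std::vector<Loop *> Worklist{Top};
    while (!Worklist.empty()) {
      Loop *L = Worklist.back();
      Worklist.pop_back();
      Result.push_back(L);
      for (Loop *Sub : L->SubLoops)   // nested loops at any depth
        Worklist.push_back(Sub);
    }
  }
  return Result;
}
```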
Sanjoy Das [Sun, 12 Apr 2015 01:24:01 +0000 (01:24 +0000)]
[LoopUnrollRuntime] Clean up a predicate.
Clean up a predicate I added in r229731, fix the relevant comment and
add a test case. The earlier version is confusing to read and was also
buggy (probably not a coincidence) till Alexey fixed it in r233881.
Verifier: Check for incompatible bit piece expressions
Convert an assertion into a `Verifier` check. Bit piece expressions
must fit inside the variable, and mustn't be the entire variable.
Catching this in the verifier will help us find bugs sooner, and makes
`DIVariable::getSizeInBits()` dead code.
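The verifier condition boils down to arithmetic on bit offsets and sizes. A minimal sketch, using a hypothetical free function `isValidBitPiece` rather than the actual `Verifier` code (which reads these values from debug-info metadata):

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the bit piece [OffsetInBits, OffsetInBits + SizeInBits)
// is acceptable for a variable of VarSizeInBits bits: it must fit inside
// the variable and must not cover the entire variable.
bool isValidBitPiece(uint64_t OffsetInBits, uint64_t SizeInBits,
                     uint64_t VarSizeInBits) {
  // Written this way to avoid overflow in OffsetInBits + SizeInBits.
  bool Fits = SizeInBits <= VarSizeInBits &&
              OffsetInBits <= VarSizeInBits - SizeInBits;
  bool IsWholeVariable = OffsetInBits == 0 && SizeInBits == VarSizeInBits;
  return Fits && !IsWholeVariable;
}
```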
Add `DIBuilder::replaceTemporary()` as a replacement for
`DIDescriptor::replaceAllUsesWith()`. I'll update clang to use the new
method, and then come back to delete the original.
This method dispatches to `replaceAllUsesWith()` or
`replaceWithUniqued()`, depending on whether the replacement is actually
a different node from the original.
DebugInfo: Move DIScope::getName() and getContext() to MDScope
Continue gutting the `DIDescriptor` hierarchy. In this case, move the
guts of `DIScope::getName()` and `DIScope::getContext()` to
`MDScope::getName()` and `MDScope::getScope()`.
DebugInfo: Rewrite atSameLineAs() as MDLocation::canDiscriminate()
Rewrite `DILocation::atSameLineAs()` as `MDLocation::canDiscriminate()`
with a doxygen comment explaining its purpose. I've added a few FIXMEs
where I think this check is too weak; fixing that is tracked by PR23199.
DebugInfo: Add forwarding getFilename() accessor to new hierarchy
Add forwarding `getFilename()` and `getDirectory()` accessors to nodes
in the new hierarchy that define a `getFile()`. Use that to
re-implement existing functionality in the `DIDescriptor` hierarchy.
Hal Finkel [Sat, 11 Apr 2015 00:33:08 +0000 (00:33 +0000)]
[PowerPC] Fix PPCLoopPreIncPrep for depth > 1 loops
This pass had the same problem as the data-prefetching pass: it was only
checking for depth == 1 loops in practice. Fix that, add some debugging
statements, and make sure that, when we grab an AddRec, it is for the loop we
expect.
Ahmed Bougacha [Sat, 11 Apr 2015 00:06:36 +0000 (00:06 +0000)]
[CodeGen] Split -enable-global-merge into ARM and AArch64 options.
Currently, there's a single flag, checked by the pass itself.
It can't force-enable the pass (and is on by default), because the pass
might not even have been created, as that's the target's decision.
Instead, have separate explicit flags, so that the decision is
consistently made in the target.
Keep the flag as a last-resort "force-disable GlobalMerge" for now,
for backwards compatibility.
[AArch64] Strengthen the code for the prologue insertion.
The spilled registers are pristine and thus correctly handled by
the register scavenger and so on, but the liveness information is
strictly speaking wrong at this point.
Fix that.
[WinEH] Recognize SEH finally block inserted by the frontend
This allows winehprepare to build sensible llvm.eh.actions calls for SEH
finally blocks. The pattern matching in this change is brittle and
should be replaced with something more robust soon. In the meantime,
this will let us write the code that produces __C_specific_handler xdata
tables, which we need regardless of how we decide to get finally blocks
through EH preparation.
Tim Northover [Fri, 10 Apr 2015 22:58:48 +0000 (22:58 +0000)]
Generic: Make isMask_N and isShiftedMask_N consistent over 0
Previously, isMask_N returned false for 0 but isShiftedMask_N returned true.
Almost all uses are for pattern matching bitfield operations in the backends,
and expect false (this was discovered because of AArch64's copy of this logic).
Unfortunately, I couldn't put together a small non-fragile test for this. The
nature of the bitfield operations means that this edge case is only really
triggered for nodes like "(and N, 0)", which the DAG combiner is usually very
good at folding away before they get to this stage.
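For reference, both predicates can be written with the usual bit tricks. This is a sketch modeled on the 32-bit variants. The leading `Value &&` test in `isShiftedMask_32` is what makes it return false for 0: without it, `(0 - 1) | 0` is all ones and would be accepted as a mask.

```cpp
#include <cassert>
#include <cstdint>

// True if Value is a non-empty run of ones starting at bit 0 (e.g. 0x000000FF).
constexpr bool isMask_32(uint32_t Value) {
  return Value && ((Value + 1) & Value) == 0;
}

// True if Value is a non-empty run of contiguous ones anywhere (e.g. 0x0000FF00).
// Filling in the low zero bits with (Value - 1) | Value turns a shifted mask
// into a plain mask starting at bit 0.
constexpr bool isShiftedMask_32(uint32_t Value) {
  return Value && isMask_32((Value - 1) | Value);
}
```

With this formulation, both functions agree that 0 is not a mask.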
Philip Reames [Fri, 10 Apr 2015 22:53:14 +0000 (22:53 +0000)]
[RewriteStatepointsForGC] Use an actual liveness algorithm
When rewriting statepoints to make relocations explicit, we need to have a conservative but consistent notion of where a particular pointer is live at a particular site. The old code just used dominance, which is correct, but decidedly more conservative than it needed to be. This patch implements a simple dataflow algorithm that's run once per function (well, twice counting fixup after base pointer insertion). There's still lots of room to make this faster, but it's fast enough for all practical purposes today.
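A textbook backward liveness computation illustrates the kind of dataflow described here. This sketch is not the patch's implementation. It assumes a hypothetical `Block` summary with integer value IDs (upward-exposed uses, definitions, successor indices) and iterates the standard equations until nothing changes.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical basic block summary: values used before being defined in the
// block, values defined in the block, and the indices of successor blocks.
struct Block {
  std::set<int> Uses;  // upward-exposed uses
  std::set<int> Defs;
  std::vector<int> Succs;
};

struct Liveness {
  std::vector<std::set<int>> LiveIn, LiveOut;
};

// Backward liveness, iterated to a fixed point:
//   LiveOut(B) = union of LiveIn(S) over all successors S
//   LiveIn(B)  = Uses(B) union (LiveOut(B) minus Defs(B))
Liveness computeLiveness(const std::vector<Block> &CFG) {
  Liveness L;
  L.LiveIn.resize(CFG.size());
  L.LiveOut.resize(CFG.size());
  bool Changed = true;
  while (Changed) {
    Changed = false;
    // Reverse index order tends to converge faster for a backward problem
    // when blocks are numbered roughly in program order; any order reaches
    // the same fixed point.
    for (int B = static_cast<int>(CFG.size()) - 1; B >= 0; --B) {
      std::set<int> Out;
      for (int S : CFG[B].Succs)
        Out.insert(L.LiveIn[S].begin(), L.LiveIn[S].end());
      std::set<int> In = CFG[B].Uses;
      for (int V : Out)
        if (!CFG[B].Defs.count(V))
          In.insert(V);
      if (In != L.LiveIn[B] || Out != L.LiveOut[B]) {
        L.LiveIn[B] = std::move(In);
        L.LiveOut[B] = std::move(Out);
        Changed = true;
      }
    }
  }
  return L;
}
```

Compared with a dominance-based approximation, a value is only treated as live at a point if some path from that point actually reaches a use.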
Philip Reames [Fri, 10 Apr 2015 22:16:58 +0000 (22:16 +0000)]
[RewriteStatepointsForGC] Missed review comment from 234651 & build fix
After submitting 234651, I noticed I hadn't responded to a review comment by mjacob. This patch addresses that comment and fixes a Release-only build problem due to an unused variable.
Philip Reames [Fri, 10 Apr 2015 22:07:04 +0000 (22:07 +0000)]
[RewriteStatepointsForGC] Preprocess the IR to remove unreachable blocks and single entry phis
Two related small changes:
Various dominance-based queries about liveness can get confused if we're talking about unreachable blocks. To avoid reasoning about such cases, just remove them before rewriting statepoints.
Remove single entry phis (likely left behind by LCSSA) to reduce the number of live values.
Both of these are motivated by http://reviews.llvm.org/D8674 which will be submitted shortly.
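Finding the blocks to remove is a reachability walk from the entry block. A minimal sketch over a CFG given as successor lists, with block 0 assumed to be the entry; it only identifies unreachable blocks and leaves out the actual IR surgery and the single-entry phi cleanup:

```cpp
#include <cassert>
#include <vector>

// Returns, for each block index, whether it is reachable from the entry
// block (index 0). Blocks marked false are the candidates for deletion.
std::vector<bool> findReachableBlocks(const std::vector<std::vector<int>> &Succs) {
  std::vector<bool> Reachable(Succs.size(), false);
  if (Succs.empty())
    return Reachable;
  std::vector<int> Worklist{0};
  Reachable[0] = true;
  while (!Worklist.empty()) {
    int B = Worklist.back();
    Worklist.pop_back();
    for (int S : Succs[B])
      if (!Reachable[S]) {  // visit each block at most once
        Reachable[S] = true;
        Worklist.push_back(S);
      }
  }
  return Reachable;
}
```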
Philip Reames [Fri, 10 Apr 2015 21:48:25 +0000 (21:48 +0000)]
[RewriteStatepointsForGC] Limited support for vectors of pointers
This patch adds limited support for inserting explicit relocations when there's a vector of pointers live over the statepoint. This doesn't handle the case where the vector contains a mix of base and non-base pointers; that's future work.
The current implementation just scalarizes the vector over the gc.statepoint before doing the explicit rewrite. An alternate approach would be to plumb the vector all the way through the backend lowering, but doing that appears challenging. In particular, the size of the indirect spill slot is currently assumed to be sizeof(pointer) throughout the backend.
In practice, this is enough to allow running the SLP and Loop vectorizers before RewriteStatepointsForGC.
Sanjoy Das [Fri, 10 Apr 2015 21:07:09 +0000 (21:07 +0000)]
[InstCombine][CodeGenPrep] Create llvm.uadd.with.overflow in CGP.
Summary:
This change moves creating calls to `llvm.uadd.with.overflow` from
InstCombine to CodeGenPrep. Combining overflow check patterns into
calls to the said intrinsic in InstCombine inhibits optimization because
it introduces an intrinsic call that not all other transforms and
analyses understand.
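The overflow checks in question follow the familiar unsigned-add idiom. The sketch below contrasts the compare-based form with the `__builtin_add_overflow` builtin available in Clang and GCC, which Clang lowers to `llvm.uadd.with.overflow` for unsigned operands of the same width. The helper names are hypothetical, and neither function is code from InstCombine or CodeGenPrep.

```cpp
#include <cassert>
#include <cstdint>

// The source-level idiom: compute the sum, then detect unsigned wraparound
// by comparing it against one of the operands.
bool addOverflowsByCompare(uint32_t A, uint32_t B, uint32_t &Sum) {
  Sum = A + B;     // wraps modulo 2^32
  return Sum < A;  // the sum wrapped iff it is smaller than an operand
}

// The same check through the checked-arithmetic builtin.
bool addOverflowsByBuiltin(uint32_t A, uint32_t B, uint32_t &Sum) {
  return __builtin_add_overflow(A, B, &Sum);
}
```

Keeping the first form as plain `add` plus `icmp` in the middle end leaves it visible to passes that do not understand the intrinsic.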
DebugInfo: Stop leaking temporaries in DIBuilder::createCompileUnit()
Stop leaking temporary nodes from `DIBuilder::createCompileUnit()`.
`replaceAllUsesWith()` doesn't delete the nodes, so we need to delete
them "manually" (well, `TempMDTuple` does that for us).
Similarly, stop leaking the temporary nodes used for variables of
subprograms.
[FS] Report errors from llvm::sys::fs::rename on Windows
Previously we would always report success, which is pretty bogus.
I'm too lazy to write a test where rename will portably fail on all
platforms. I'm just trying to fix breakage introduced by r234597, which
happened to tickle this.
[WinEH] Try to make outlining invokes work a little better
WinEH currently turns invokes into calls. Long term, we will reconsider
this, but for now, make sure we remap the operands and clone the
successors of the new terminator.
Benjamin Kramer [Fri, 10 Apr 2015 14:50:08 +0000 (14:50 +0000)]
[CallSite] Make construction from Value* (or Instruction*) explicit.
CallSite roughly behaves as a common base of CallInst and InvokeInst. Bring
the behavior closer to that model by making upcasts explicit. Downcasts
remain implicit and work as before.
Following dyn_cast as a mental model, checking whether a Value *V isa
CallSite now looks like this:
  if (auto CS = CallSite(V)) // think dyn_cast
instead of:
  if (CallSite CS = V)
This is an extra token but I think it is slightly clearer. Making the
ctor explicit has the advantage of not accidentally creating nullptr
CallSites, e.g. when you pass a Value * to a function taking a CallSite
argument.
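The effect of marking the constructor `explicit` can be shown with simplified stand-ins. In this sketch `Value` and `CallSite` are hypothetical mock classes, not LLVM's. The `static_assert`s check that a `Value *` no longer converts implicitly, while explicit construction and the `if (auto CS = CallSite(V))` idiom still work.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins; these are not LLVM's real classes.
struct Value {
  bool IsCall = false;
};

class CallSite {
  Value *I = nullptr;

public:
  CallSite() = default;
  // explicit: a Value * no longer silently becomes a (possibly null) CallSite.
  explicit CallSite(Value *V) : I(V && V->IsCall ? V : nullptr) {}
  // Lets "if (auto CS = CallSite(V))" test whether V was a call.
  explicit operator bool() const { return I != nullptr; }
  Value *getInstruction() const { return I; }
};

// Accidental conversion from Value * to CallSite is now a compile error.
static_assert(!std::is_convertible<Value *, CallSite>::value,
              "construction from Value * must be explicit");
static_assert(std::is_constructible<CallSite, Value *>::value,
              "explicit construction is still allowed");
```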
Chad Rosier [Fri, 10 Apr 2015 13:19:27 +0000 (13:19 +0000)]
[AArch64] Changes some SchedAlias to WriteRes for Cortex-A57.
Using SchedAliases is convenient and works well for latency and resource
lookup for instructions. However, this creates an entry in
AArch64WriteLatencyTable with a WriteResourceID of 0, breaking any
SchedReadAdvance since the lookup will fail.
http://reviews.llvm.org/D8043
Patch by Dave Estes <cestes@codeaurora.org>!
Benjamin Kramer [Fri, 10 Apr 2015 12:46:44 +0000 (12:46 +0000)]
Microoptimize DenseMap::clear.
Cache NumEntries locally; it's only used in an assert, and using the member
variable prevents the compiler from eliminating the tombstone check for types
with trivial destructors. No functionality change intended.
Summary:
Some optimizations such as jump threading and loop unswitching can negatively
affect performance when applied to divergent branches. The divergence analysis
added in this patch conservatively estimates which branches in a GPU program
can diverge. This information can then help LLVM to run certain optimizations
selectively.
Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll
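At its core, a conservative divergence analysis propagates "divergent" from sources such as thread-ID reads to everything computed from them. A simplified sketch over a hypothetical `Program` summary of value IDs and def-use edges; it follows only data dependences, whereas a complete analysis must also account for values that differ because control flow diverged, so treat it as an illustration of the idea rather than the algorithm in this patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical program summary: value IDs 0..N-1, the users of each value
// (data dependences), and the values that are sources of divergence.
struct Program {
  std::vector<std::vector<int>> Users;  // Users[V] = values computed from V
  std::vector<int> DivergentSources;    // e.g. thread-ID reads
};

// Marks every value that transitively depends on a divergent source.
// A branch whose condition is marked may diverge.
// NOTE: only data dependences are followed here.
std::vector<bool> findDivergentValues(const Program &P) {
  std::vector<bool> Divergent(P.Users.size(), false);
  std::vector<int> Worklist;
  for (int V : P.DivergentSources)
    if (!Divergent[V]) {
      Divergent[V] = true;
      Worklist.push_back(V);
    }
  while (!Worklist.empty()) {
    int V = Worklist.back();
    Worklist.pop_back();
    for (int U : P.Users[V])
      if (!Divergent[U]) {
        Divergent[U] = true;
        Worklist.push_back(U);
      }
  }
  return Divergent;
}
```

For example, with value 0 a thread-ID read, value 1 = 0 + 1, value 2 a kernel parameter, value 3 = (1 < 2) and value 4 = 2 * 2, a branch on value 3 is divergent and a branch on value 4 is not.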
David Majnemer [Fri, 10 Apr 2015 04:56:17 +0000 (04:56 +0000)]
[WinEHPrepare] Don't rely on the order of IR
The IPToState table must be emitted after we have generated labels for
all functions in the table. Don't rely on the order of the list of
globals. Instead, utilize WinEHFuncInfo to tell us how many catch
handlers we expect to outline. Once we know we've visited all the catch
handlers, emit the cppxdata.
Hal Finkel [Fri, 10 Apr 2015 03:39:00 +0000 (03:39 +0000)]
[PowerPC] Don't crash on PPC32 i64 fp_to_uint on modern cores
When we have an instruction for this (and, thus, don't generate a runtime
call), we need to custom type legalize this (in a trivial way, just as we do
for fp_to_sint).
Ahmed Bougacha [Thu, 9 Apr 2015 20:04:47 +0000 (20:04 +0000)]
[CodeGen] Combine concat_vector of trunc'd scalar to scalar_to_vector.
We already do:
  concat_vectors(scalar, undef) -> scalar_to_vector(scalar)
when the scalar is legal.
When it's not, but is a truncated legal scalar, we can also do:
  concat_vectors(trunc(scalar), undef) -> scalar_to_vector(scalar)
which is equivalent, since the upper lanes are undef anyway.
While there, teach the combine to look at more than 2 operands.
The integer extend optimization tries to fold the extend into the load
instruction. This requires us to identify if the extend has already been
emitted or not and act accordingly on it.
The check that was originally performed for this was not sufficient. Besides
checking the ValueMap for a mapped register, we also need to check if the
virtual register already has an associated machine instruction that defines it.
Currently, the LLVM backend doesn't know cortex-r4, even though it is the
default target for armv7r. Using "--target=armv7r-arm-none-eabi" provokes the
error "'cortex-r4' is not a recognized processor for this target" from LLVM.
This patch adds support for cortex-r4 and, very closely related, r4f.
[mips] Refactor saved-registers bitmask creation in MipsAsmPrinter::printSavedRegsBitmask. NFC.
Summary:
Make the code more readable by fusing the for-loops together and explicitly checking for each register class.
Also, this version is more straightforward because it doesn't assume that FPU registers always come before CPU registers in the CalleeSavedInfo vector.
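A fused single loop over the callee-saved registers might look like the sketch below. The register descriptor, class names, and helper are hypothetical simplifications rather than the MIPS backend's types, and only two register classes are modeled. Because each entry's class is checked explicitly, the order of entries in the input does not matter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical register descriptor; not the MIPS backend's real types.
enum class RegClass { GPR32, FGR32 };
struct SavedReg {
  RegClass Class;
  unsigned Encoding;  // hardware register number, 0..31
};

struct SavedRegMasks {
  uint32_t CPUBitmask = 0;
  uint32_t FPUBitmask = 0;
};

// One loop over the callee-saved registers, checking each register class
// explicitly instead of assuming FPU registers precede CPU registers.
SavedRegMasks buildSavedRegsBitmask(const std::vector<SavedReg> &CSI) {
  SavedRegMasks M;
  for (const SavedReg &R : CSI) {
    assert(R.Encoding < 32 && "register encoding out of range");
    uint32_t Bit = 1u << R.Encoding;
    if (R.Class == RegClass::FGR32)
      M.FPUBitmask |= Bit;
    else if (R.Class == RegClass::GPR32)
      M.CPUBitmask |= Bit;
  }
  return M;
}
```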
Lang Hames [Thu, 9 Apr 2015 03:40:33 +0000 (03:40 +0000)]
[AArch64] Teach AArch64TargetLowering::getOptimalMemOpType to consider alignment
restrictions when choosing a type for small-memcpy inlining in
SelectionDAGBuilder.
This ensures that the loads and stores output for the memcpy won't be further
expanded during legalization, which would cause the total number of instructions
for the memcpy to exceed (often significantly) the inlining thresholds.
The bug manifests when two loads and two stores are chained in a DAG as
follows:
  (ld v3f32) -> (st f32) -> (ld v3f32) -> (st f32)
and the stores' values are extracted from the preceding vector loads.
MergeConsecutiveStores would replace the first store in the chain with the
merged vector store, which would create a cycle between the merged store node
and the last load node that appears in the chain.
This commit fixes the bug by replacing the last store in the chain instead.
Go bindings: make various DIBuilder arguments optional.
r234262 changed some code in DIBuilderBindings.cpp to use the unwrap function
to unwrap debug metadata. The problem with this is that unwrap asserts that
its argument is non-null, which is not what we want in a number of places
in DIBuilder where the argument is optional. This change makes certain
arguments optional by adding null checks where they are required,
fixing the llgo build.
Eliminate O(n^2) worst-case behavior in SSA construction
The code uses a priority queue and a worklist, which share the same
visited set, but the visited set is only updated when inserting into
the priority queue. Instead, switch to using separate visited sets
for the priority queue and worklist.