Adrian Prantl [Fri, 21 Nov 2014 00:39:43 +0000 (00:39 +0000)]
Verifier: Check that all instructions have their parent pointers set up
correctly. This helps with catching problems caused by IRBuilder abuse
such as the one fixed in CFE r222487.
Reid Kleckner [Thu, 20 Nov 2014 23:37:18 +0000 (23:37 +0000)]
Add out of line virtual destructors to all LLVMTargetMachine subclasses
These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878. When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.
This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.
Mehdi Amini [Thu, 20 Nov 2014 22:40:25 +0000 (22:40 +0000)]
SimplifyCFG: Refactor GatherConstantCompares() result in a struct
Code seems cleaner and easier to understand this way
This is basically r222416, after fixes for MSVC lack of standard
support, and a few cleaning (got rid of a warning).
Thanks Nakamura Takumi and Nico Weber for the MSVC fixes.
Currently LoopUnroll generates a prologue loop before the main loop
body to execute first N%UnrollFactor iterations. Also, this loop is
used if trip-count can overflow - it's determined by a runtime check.
However, we've been mistakenly optimizing this loop to a linear code for
UnrollFactor = 2, not taking into account that it also serves as a safe
version of the loop if its trip-count overflows.
Michael Ilseman [Thu, 20 Nov 2014 19:33:33 +0000 (19:33 +0000)]
Compilation test for PostOrderIterator.
If the template specialization for externally managed sets in
PostOrderIterator call too far out of sync with each other, this unit
test will fail to build. This is especially useful for developers who
may not build Clang (the only in-tree user) every time.
Michael Ilseman [Thu, 20 Nov 2014 19:33:30 +0000 (19:33 +0000)]
Update template specialization to reflect API changes.
po_iterator_storage's insertEdge was updated to reflect the API
changes from many of our insert methods in r222334, however the
template specialization for external storage was not updated. This
updates the specialization.
X86: use the correct alloca symbol for Windows Itanium
Windows itanium targets the MSVCRT, and the stack probe symbol is provided by
MSVCRT. This corrects the emission of stack probes on i686-windows-itanium.
Frederic Riss [Thu, 20 Nov 2014 15:52:34 +0000 (15:52 +0000)]
Do not create a replaceable Variables MDNode for function forward decls.
These fields would need to be explicitly deleted before we RAUW the temporary
node anyway (this was done in cfe commit r222373). Instead, do not create
these useless nodes in the first place.
Alexey Samsonov [Thu, 20 Nov 2014 01:27:19 +0000 (01:27 +0000)]
Remove support for undocumented SpecialCaseList entries.
"global-init", "global-init-src" and "global-init-type" were originally
used to blacklist entities in ASan init-order checker. However, they
were never documented, and later were replaced by "=init" category.
Old blacklist entries should be converted as follows:
* global-init:foo -> global:foo=init
* global-init-src:bar -> src:bar=init
* global-init-type:baz -> type:baz=init
Matthias Braun [Wed, 19 Nov 2014 19:46:17 +0000 (19:46 +0000)]
RegisterCoalescer: Improve debug messages
- Show "Considering..." message after flipping so you actually see the final
destination vreg as destination.
- Add a message on final join, so you can grep for "Success" messages to obtain
a list of which register got merged with which.
Andrea Di Biagio [Wed, 19 Nov 2014 19:34:29 +0000 (19:34 +0000)]
[X86] Improved lowering of v4x32 build_vector dag nodes.
This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes
that are known to have at least two non-zero elements.
With this patch, a build_vector that performs a blend with zero is
converted into a shuffle. This is done to let the shuffle legalizer expand
the dag node in a optimal way. For example, if we know that a build_vector
performs a blend with zero, we can try to lower it as a movq/blend instead of
always selecting an insertps.
This patch also improves the logic that lowers a build_vector into a insertps
with zero masking. See for example the extra test cases added to test sse41.ll.
Lang Hames [Wed, 19 Nov 2014 19:15:41 +0000 (19:15 +0000)]
[ADT] Fix PR20728 - Incorrect APFloat::fusedMultiplyAdd results for x86_fp80.
As detailed at http://llvm.org/PR20728, due to an internal overflow in
APFloat::multiplySignificand the APFloat::fusedMultiplyAdd method can return
incorrect results for x87DoubleExtended (x86_fp80) values. This commonly
manifests as incorrect constant folding of libm fmal calls on x86. E.g.
fmal(1.0L, 1.0L, 3.0L) == 0.0L (should be 4.0L)
This patch fixes PR20728 by adding an extra bit to the significand for
intermediate results of APFloat::multiplySignificand, avoiding the overflow.
Tom Stellard [Wed, 19 Nov 2014 16:58:49 +0000 (16:58 +0000)]
R600/SI: Make SIInstrInfo::isOperandLegal() more strict
A register operand that has a common sub-class with its instruction's
defined register class is not always legal. For example,
SReg_32 and M0Reg both have a common sub-class, but we can't
use an SReg_32 in instructions that expect a M0Reg.
This prevents the llvm.SI.sendmsg.ll test from failing when the fold
operand pass is added.
When the BasicBlock containing the return instrution has a PHI with 2
incoming values, FoldReturnIntoUncondBranch will remove the no longer
used incoming value and remove the no longer needed phi as well. This
leaves us with a BB that no longer has a PHI, but the subsequent call
to FoldReturnIntoUncondBranch from FoldReturnAndProcessPred will not
remove the return instruction (which still uses the result of the call
instruction). This prevents EliminateRecursiveTailCall to remove
the value, as it is still being used in a basicblock which has no
predecessors.
The basicblock can not be erased on the spot, because its iterator is
still being used in runTRE.
This issue was exposed when removing the threshold on size for lifetime
marker insertion for named temporaries in clang. The testcase is a much
reduced version of peelOffOuterExpr(const Expr*, const ExplodedNode *)
from clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp.
Evgeniy Stepanov [Wed, 19 Nov 2014 10:30:02 +0000 (10:30 +0000)]
Use ninja pools to limit the number of concurrent compile/link jobs.
This change makes use of the new "job pool" capability in cmake 3.0
with ninja generator to allow limiting the number of concurrent jobs
of a certain type.
Simon Pilgrim [Wed, 19 Nov 2014 10:06:49 +0000 (10:06 +0000)]
[X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.
I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.
David Majnemer [Wed, 19 Nov 2014 09:41:05 +0000 (09:41 +0000)]
AliasSetTracker: UnknownInsts should contribute to the refcount
AliasSetTracker::addUnknown may create an AliasSet devoid of pointers
just to contain an instruction if no suitable AliasSet already exists.
It will then AliasSet::addUnknownInst and we will be done.
However, it's possible for addUnknown to choose an existing AliasSet to
addUnknownInst.
If this were to occur, we are in a bit of a pickle: removing pointers
from the AliasSet can cause the entire AliasSet to become destroyed,
taking our unknown instructions out with them.
Instead, keep track whether or not our AliasSet has any unknown
instructions.
David Blaikie [Wed, 19 Nov 2014 07:49:26 +0000 (07:49 +0000)]
Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
Hao Liu [Wed, 19 Nov 2014 06:48:56 +0000 (06:48 +0000)]
[AArch64] Disable useAA for Cortex-A57.
Using AA during CodeGen is very useful for in-order cores. It is less useful for ooo cores. Also I find
enabling useAA for Cortex-A57 may generate worse code for some test cases. If useAA in codegen is improved
and benefical for ooo cores, we can enable it again.
Hao Liu [Wed, 19 Nov 2014 06:39:53 +0000 (06:39 +0000)]
[AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on AArch64 backend.
SeparateConstOffsetFromGEP can gives more optimizaiton opportunities related to GEPs, which benefits EarlyCSE
and LICM. By enabling these passes we can have better address calculations and generate a better addressing
mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious improvements on Cortex-A57.
Hao Liu [Wed, 19 Nov 2014 06:24:44 +0000 (06:24 +0000)]
[SeparateConstOffsetFromGEP] Allow SeparateConstOffsetFromGEP pass to lower GEPs.
If LowerGEP is enabled, it can lower a GEP with multiple indices into GEPs with a single index
or arithmetic operations. Lowering GEPs can always extract structure indices. Lowering GEPs can
also give use more optimization opportunities. It can benefit passes like CSE, LICM and CGP.
David Blaikie [Wed, 19 Nov 2014 05:49:42 +0000 (05:49 +0000)]
Remove StringMap::GetOrCreateValue in favor of StringMap::insert
Having two ways to do this doesn't seem terribly helpful and
consistently using the insert version (which we already has) seems like
it'll make the code easier to understand to anyone working with standard
data structures. (I also updated many references to the Entry's
key and value to use first() and second instead of getKey{Data,Length,}
and get/setValue - for similar consistency)
Also removes the GetOrCreateValue functions so there's less surface area
to StringMap to fix/improve/change/accommodate move semantics, etc.
Teach llvm-build to avoid touching LibraryDependencies.inc unless the contents
change. This saves us from rebuilding llvm-config each time we reconfigure.
David Blaikie [Wed, 19 Nov 2014 02:56:00 +0000 (02:56 +0000)]
Make StringSet::insert return pair<iterator, bool> like other self-associative containers
StringSet is still a bit dodgy in that it exposes the raw iterator of
the StringMap parent, which exposes the weird detail that StringSet
actually has a 'value'... but anyway, this is useful for a handful of
clients that want to reference the newly inserted/persistent string data
in the StringSet/Map/Entry/thing.
Summary:
move the code from BreakCriticalEdges::runOnFunction()
into a separate utility function llvm::SplitAllCriticalEdges()
so that it can be used independently.
No functionality change intended.
Under many circumstances the stack is not 32-byte aligned, resulting in the use of the vmovups/vmovupd/vmovdqu instructions when inserting ymm reloads/spills.
This minor patch adds these instructions to the isFrameLoadOpcode/isFrameStoreOpcode helpers so that they can be correctly identified and not be treated as folded reloads/spills.
This has also been noticed by http://llvm.org/bugs/show_bug.cgi?id=18846 where it was causing redundant spills - I've added a reduced test case at test/CodeGen/X86/pr18846.ll
David Majnemer [Tue, 18 Nov 2014 22:06:45 +0000 (22:06 +0000)]
InstCombine: Fix another infinite loop caused by visitFPTrunc
We would attempt to replace an frem's operand with the same operand.
This would cause InstCombine to think real work was done, causing
InstCombine to enter an infinite loop.
Matt Arsenault [Tue, 18 Nov 2014 21:06:58 +0000 (21:06 +0000)]
R600/SI: Move SIFixSGPRCopies to inst selector passes
This should expose more of the actually used VALU
instructions to the machine optimization passes.
This also should help getting i1 handling into a better state.
For not entirly understood reasons, this fixes the split-scalar-i64-add.ll
test where a 64-bit add would only partially be moved to the VALU
resulting in use of undefined VCC.
Juergen Ributzka [Tue, 18 Nov 2014 21:02:40 +0000 (21:02 +0000)]
[AArch64] Don't optimize all compare instructions.
"optimizeCompareInstr" converts compares (cmp/cmn) into plain sub/add
instructions when the flags are not used anymore. This conversion is valid for
most instructions, but not all. Some instructions that don't set the flags
(e.g. sub with immediate) can set the SP, whereas the flag setting version uses
the same encoding for the "zero" register.
Update the code to also check for the return register before performing the
optimization to make sure that a cmp doesn't suddenly turn into a sub that sets
the stack pointer.
I don't have a test case for this, because it isn't easy to trigger.
Chad Rosier [Tue, 18 Nov 2014 20:34:01 +0000 (20:34 +0000)]
[Reassociate] Use test cases that can actually be optimized to verify optional
flags are cleared. The reassociation pass was just reordering the leaf nodes
in the previous test cases.
Juergen Ributzka [Tue, 18 Nov 2014 19:58:59 +0000 (19:58 +0000)]
[FastISel][AArch64] Fix shift-immediate emission for "zero" shifts.
This change emits a COPY for a shift-immediate with a "zero" shift value.
This fixes PR21594 where we emitted a shift instruction with an incorrect
immediate operand.
Philip Reames [Tue, 18 Nov 2014 17:46:32 +0000 (17:46 +0000)]
Tweak EarlyCSE to recognize series of dead stores
EarlyCSE is giving up on the current instruction immediately when it recognizes that the current instruction makes a previous store trivially dead. There's no reason to do this. Once the previous store has been deleted, it's perfectly legal to remember the value of the current store (for value forwarding) and the fact the store occurred (it could be dead too!).
Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D6301