Tom Stellard [Fri, 23 Jan 2015 22:05:45 +0000 (22:05 +0000)]
R600/SI: Move i64 -> v2i32 load promotion into AMDGPUDAGToDAGISel::Select()
We used to do this promotion during DAG legalization, but this
caused an infinite loop in ExpandUnalignedLoad() because it assumed
that i64 loads were legal if i64 was a legal type.
It also seems better to report i64 loads as legal, since they actually
are and we were just promoting them to simplify our tablegen files.
Lang Hames [Fri, 23 Jan 2015 21:25:00 +0000 (21:25 +0000)]
[Orc] New JIT APIs.
This patch adds a new set of JIT APIs to LLVM. The aim of these new APIs is to
cleanly support a wider range of JIT use cases in LLVM, and encourage the
development and contribution of re-usable infrastructure for LLVM JIT use-cases.
These APIs are intended to live alongside the MCJIT APIs, and should not affect
existing clients.
Included in this patch:
1) New headers in include/llvm/ExecutionEngine/Orc that provide a set of
components for building JIT infrastructure.
Implementation code for these headers lives in lib/ExecutionEngine/Orc.
2) A prototype re-implementation of MCJIT (OrcMCJITReplacement) built out of the
new components.
3) Minor changes to RTDyldMemoryManager needed to support the new components.
These changes should not impact existing clients.
4) A new flag for lli, -use-orcmcjit, which will cause lli to use the
OrcMCJITReplacement class as its underlying execution engine, rather than
MCJIT itself.
Tests to follow shortly.
Special thanks to Michael Ilseman, Pete Cooper, David Blaikie, Eric Christopher,
Justin Bogner, and Jim Grosbach for extensive feedback and discussion.
Kevin Enderby [Fri, 23 Jan 2015 21:02:44 +0000 (21:02 +0000)]
Fix the problem with llvm-objdump and -archive-headers in printing the archive header size field.
This problem showed up with the clang-cmake-armv7-a15-full bot. Thanks to Renato Golin for his help.
Hans Wennborg [Fri, 23 Jan 2015 20:43:51 +0000 (20:43 +0000)]
LowerSwitch: replace unreachable default with popular case destination
SimplifyCFG currently does this transformation, but I'm planning to remove that
to allow other passes, such as this one, to exploit the unreachable default.
This patch takes care to keep track of what case values are unreachable even
after the transformation, allowing for more efficient lowering.
In llvm-mode, with electric-pair-mode turned on, typing a literal '['
would print out '[[', and '(' would print a '(('. This was a very
annoying bug caused by overzealous syntax-table entries: the parens are
already part of the '(' and ')' class by default. Fix this.
While at it, notice that i32, i64, i1 etc. are not font-locked despite a
clear intent to do so. The issue is that regexp-opt doesn't accept
regular expressions. So, spell out the common literal integers with
different widths.
Toma Tabacu [Fri, 23 Jan 2015 10:40:19 +0000 (10:40 +0000)]
[mips] Add new error message and improve testing for parsing the .module directive.
Summary:
We used to silently ignore any empty .module's and we used to give an error saying that we found
an "unexpected token at start of statement" when the value of the option wasn't an identifier (e.g. if it was a number).
We now give an error saying that we "expected .module option identifier" in both of those cases.
I also fixed the other tests in mips-abi-bad.s, which all seemed to be broken.
Mehdi Amini [Fri, 23 Jan 2015 07:07:20 +0000 (07:07 +0000)]
DAGCombine: always constant fold FMA when target disable FP exceptions
Summary: When trying to constant fold an FMA in the DAG, getNode()
fails to fold the FMA if an operand is not finite. In this case this
patch allows the constant folding if !TLI->hasFloatingPointExceptions()
These things are potentially used for non-DWARF data (see the discussion
in PR22235), so take the `Dwarf` out of the name. Since the new name
gives fewer clues, update the doxygen to properly describe what they
are.
Chandler Carruth [Thu, 22 Jan 2015 21:53:09 +0000 (21:53 +0000)]
[PM] Actually add the new pass manager support for the assumption cache.
I had already factored this analysis specifically to enable doing this,
but hadn't actually committed the necessary wiring to get at this from
the new pass manager. This also nicely shows how the separate cache
object can be directly managed by the new pass manager.
This analysis didn't have any direct tests and so I've added a printer
pass and a boring test case. I chose to print the i1 value which is
being assumed rather than the call to llvm.assume as that seems much
more useful for testing... but suggestions on an even better printing
strategy welcome. My main goal was to make sure things actually work. =]
IR: Update references to temporaries before deleting
During `MDNode::deleteTemporary()`, call `replaceAllUsesWith(nullptr)`
to update all tracking references to `nullptr`.
This fixes PR22280, where inverted destruction order between tracking
references and the temporaries themselves caused a use-after-free in
`LLParser`.
An alternative fix would be to add an assertion that there are no users,
and continue to fix inverted destruction order in clients (like
`LLParser`), but instead I decided to make getting-teardown-right easy.
(If someone disagrees let me know.)
Chris Bieneman [Thu, 22 Jan 2015 21:01:12 +0000 (21:01 +0000)]
Refactoring cl::parser construction and initialization.
Summary:
Some parsers need references back to the option they are members of. This is used for handling the argument string as well as by the various pass name parsers for making pass names into flags.
Making parsers that need to refer back to the option have a reference to the option eliminates some of the members of various parsers, and enables further code cleanup.
Sanjay Patel [Thu, 22 Jan 2015 18:21:26 +0000 (18:21 +0000)]
merge consecutive stores of extracted vector elements (PR21711)
This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698.
That patch was checked in at r224611, but reverted at r225031 because it
caused a failure outside of the regression tests.
The cause of the crash was not recognizing consecutive stores that have mixed
source values (loads and vector element extracts), so this patch adds a check
to bail out if any store value is not coming from a vector element extract.
This patch also refactors the shared logic of the constant source and vector
extracted elements source cases into a helper function.
David Blaikie [Thu, 22 Jan 2015 17:49:59 +0000 (17:49 +0000)]
Revert "PR21408: Workaround the appearance of duplicate variables due to problems when inlining two calls to the same function from the same call site."
The underlying bug has been fixed in r226736 so there's no need to
workaround this anymore.
Tim Northover [Thu, 22 Jan 2015 17:23:04 +0000 (17:23 +0000)]
AArch64: decode all MRS/MSR forms early to avoid saving FeatureBits.
Currently, we're adding a uint64_t describing the current subtarget so
that matching can check whether the specified register is valid.
However, we want to move to a bitset for those bits (x86 has more than
64 of them).
This can't live in a union so it's probably better to do the checks
early (especially as there are only 3 of them).
Rafael Espindola [Thu, 22 Jan 2015 14:20:45 +0000 (14:20 +0000)]
[pr21886] Change MCJIT/ELF to support MSVC C++ mangled symbol.
The ELF format is used on Windows by the MCJIT engine. Thus, on Windows, the
ELFObjectWriter can encounter symbols mangled using the MS Visual Studio C++
name mangling. Symbols mangled using the MSVC C++ name mangling can legally
have "@@@" as a substring. The EFLObjectWriter should not interpret the "@@@"
substring as specifying GNU-style symbol versioning. The ELFObjectWriter
therefore check for the MSVC C++ name mangling prefix which is either "?", "@?",
"imp_?" or "imp_?@".
[DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
[DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
Fixed a bug in type legalizer for masked load/store intrinsics.
The problem occurs when after vectorization we have type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional efforts for expanding loads and truncating stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes in the cost estimation for masked memory
operations, it will be submitted separately.
Fixed a bug in narrowing store operation.
Type MVT::i1 became legal in KNL, but store operation can't be narrowed to this type,
since the size of VT (1 bit) is not equal to its actual store size(8 bits).
Sanjoy Das [Thu, 22 Jan 2015 09:32:02 +0000 (09:32 +0000)]
[NFC] Introduce a 'struct Range' for IRCE
Use the struct instead of a std::pair<Value *, Value *>. This makes a
Range an obviously immutable object, and we can now assert that a
range is well-typed (Begin->getType() == End->getType()) on its
construction.
Sanjoy Das [Thu, 22 Jan 2015 08:29:18 +0000 (08:29 +0000)]
Fix crashes in IRCE caused by mismatched types
There are places where the inductive range check elimination pass
depends on two llvm::Values or llvm::SCEVs to be of the same
llvm::Type when they do not need to be. This patch relaxes those
restrictions (by bailing out of the optimization if the types
mismatch), and adds test cases to trigger those paths.
These issues were found by bootstrapping clang with IRCE running in
the -O3 pass ordering.
Erik Eckstein [Thu, 22 Jan 2015 08:20:51 +0000 (08:20 +0000)]
SLPVectorizer: add a second limit for the number of alias checks.
Even with the current limit on the number of alias checks, the containing loop has quadratic complexity.
This begins to hurt for blocks containing > 1K load/store instructions.
This commit introduces a limit for the loop count. It reduces the runtime for such very large blocks.
Chandler Carruth [Thu, 22 Jan 2015 05:25:13 +0000 (05:25 +0000)]
[PM] Rename InstCombine.h to InstCombineInternal.h in preparation for
creating a non-internal header file for the InstCombine pass.
I thought about calling this InstCombiner.h or in some way more clearly
associating it with the InstCombiner clas that it is primarily defining,
but there are several other utility interfaces defined within this for
InstCombine. If, in the course of refactoring, those end up moving
elsewhere or going away, it might make more sense to make this the
combiner's header alone.
Naturally, this is a bikeshed to a certain degree, so feel free to lobby
for a different shade of paint if this name just doesn't suit you.
Chandler Carruth [Thu, 22 Jan 2015 05:08:12 +0000 (05:08 +0000)]
[canonicalize] Teach InstCombine to canonicalize loads which are only
ever stored to always use a legal integer type if one is available.
Regardless of whether this particular type is good or bad, it ensures we
don't get weird differences in generated code (and resulting
performance) from "equivalent" patterns that happen to end up using
a slightly different type.
After some discussion on llvmdev it seems everyone generally likes this
canonicalization. However, there may be some parts of LLVM that handle
it poorly and need to be fixed. I have at least verified that this
doesn't impede GVN and instcombine's store-to-load forwarding powers in
any obvious cases. Subtle cases are exactly what we need te flush out if
they remain.
Also note that this IR pattern should already be hitting LLVM from Clang
at least because it is exactly the IR which would be produced if you
used memcpy to copy a pointer or floating point between memory instead
of a variable.
ARM: fail less catastrophically on invalid Windows input
Windows supports a restricted set of relocations (compared to ARM ELF). In some
cases, we may end up generating an unsupported relocation. This can occur with
bad input to the assembler in particular (the frontend should never generate
code that cannot be compiled). Generate an error rather than just aborting.
The change in the API is driven by the desire to provide a slightly more helpful
message for debugging purposes.
This code was confusing, since it created a `DIExpressionIterator` from
an invalid start point (although it wasn't wrong: it never actually
iterated). Now that the underlying iterator has `getNumber()`, just use
it directly.
Chris Bieneman [Thu, 22 Jan 2015 01:49:59 +0000 (01:49 +0000)]
Assigning and copying command line option objects shouldn't be allowed.
Summary:
The default copy and assignment operators for these objects probably don't actually do what the clients intend, so they should be deleted.
Places using the assignment operator to set the value of an option should cast to the option's data type first to call into the override for operator=. Places using the copy constructor just need to be changed to not copy (i.e. passing by const reference instead of value).
Sanjoy Das [Thu, 22 Jan 2015 00:48:47 +0000 (00:48 +0000)]
Make ScalarEvolution less aggressive with respect to no-wrap flags.
ScalarEvolution currently lowers a subtraction recurrence to an add
recurrence with the same no-wrap flags as the subtraction. This is
incorrect because `sub nsw X, Y` is not the same as `add nsw X, -Y`
and `sub nuw X, Y` is not the same as `add nuw X, -Y`. This patch
fixes the issue, and adds two test cases demonstrating the bug.
Chandler Carruth [Wed, 21 Jan 2015 23:45:01 +0000 (23:45 +0000)]
[canonicalization] Refactor how we create new stores into a helper
function. This is a bit tidier anyways and will make a subsquent patch
simpler as I want to add another case to this combine.
Added most of the missing integer vector folding patterns for SSE (to SSE42) and AVX1.
The most useful of these are probably the i32/i64 extraction, i8/i16/i32/i64 insertions, zero/sign extension, unsigned saturation subtractions, i64 subtractions and the variable mask blends (pblendvb) - others include CLMUL, SSE42 string comparisons and bit tests.
David Blaikie [Wed, 21 Jan 2015 22:57:29 +0000 (22:57 +0000)]
DebugInfo: Use distinct inlinedAt MDLocations to avoid separate inlined calls being coalesced
When two calls from the same MDLocation are inlined they currently get
treated as one inlined function call (creating difficulty debugging,
duplicate variables, etc).
Clang worked around this by including column information on inline calls
which doesn't address LTO inlining or calls to the same function from
the same line and column (such as through a macro). It also didn't
address ctor and member function calls.
By making the inlinedAt locations distinct, every call site has an
explicitly distinct location that cannot be coalesced with any other
call.
This can produce linearly (2x in the worst case where every call is
inlined and the call instruction has a non-call instruction at the same
location) more debug locations. Any increase beyond that are in cases
where the Clang workaround was insufficient and the new scheme is
creating necessary distinct nodes that were being erroneously coalesced
previously.
After this change to LLVM the incomplete workarounds in Clang. That
should reduce the number of debug locations (in a build without column
info, the default on Darwin, not the default on Linux) by not creating
pseudo-distinct locations for every call to an inline function.
(oh, and I made the inlined-at chain rebuilding iterative instead of
recursive because I was having trouble wrapping my head around it the
way it was - open to discussion on the right design for that function
(including going back to a recursive solution))
Chris Bieneman [Wed, 21 Jan 2015 22:45:52 +0000 (22:45 +0000)]
Adding a new cl::HideUnrelatedOptions API to allow clang to migrate off cl::getRegisteredOptions.
Summary: cl::getRegisteredOptions really exposes some of the innards of how command line parsing is implemented. Exposing new APIs that allow us to disentangle client code from implementation details will allow us to make more extensive changes to command line parsing.
Simon Pilgrim [Wed, 21 Jan 2015 22:44:35 +0000 (22:44 +0000)]
[X86][SSE] Added support for SSE3 lane duplication shuffle instructions
This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. The big use of these being that they avoid many single source shuffles from needing to use (pre-AVX) dual source instructions such as SHUFPD/SHUFPS: causing extra moves and preventing load folds.
Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad which crashed on single operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify theses instructions as unary shuffles.