Summary:
Implemented an InstCombine transformation that takes a blendv* intrinsic
call and translates it into an IR select, if the mask is constant.
This will eventually get lowered into blends with immediates if possible,
or pblendvb (with an option to further optimize if we can transform the
pblendvb into a blend+immediate instruction, depending on the selector).
It will also enable optimizations by the IR passes, which currently
give up at the sight of the intrinsic.
Both the transformation and the lowering of its result to asm got shiny
new tests.
The transformation is a bit convoluted because of blendvp[sd]'s
definition:
Its mask is a floating point value! This forces us to convert it and get
the highest bit. I suppose this happened because the mask has type
__m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin.
I will send an email to llvm-dev to discuss if we want to change this or
not.
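As a rough illustration (a hedged sketch, not the patch's exact output;
the function names and mask lanes here are made up), a blendvps call
with a constant mask becomes a select on the mask's sign bits:

  declare <4 x float> @llvm.x86.sse41.blendvps(<4 x float>, <4 x float>,
                                               <4 x float>)

  define <4 x float> @before(<4 x float> %a, <4 x float> %b) {
    ; sign bits of the mask are <1,0,1,0>; a set bit selects from %b
    %r = call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %a,
             <4 x float> %b,
             <4 x float> <float -0.0, float 0.0, float -0.0, float 0.0>)
    ret <4 x float> %r
  }

  define <4 x float> @after(<4 x float> %a, <4 x float> %b) {
    %r = select <4 x i1> <i1 true, i1 false, i1 true, i1 false>,
                <4 x float> %b, <4 x float> %a
    ret <4 x float> %r
  }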
Tim Northover [Mon, 26 May 2014 17:22:07 +0000 (17:22 +0000)]
AArch64: force i1 to be zero-extended at an ABI boundary.
This commit is debatable. There are two possible approaches, neither
of which is really satisfactory:
1. Use "@foo(i1 zeroext)" to mean an extension to 32-bits on Darwin,
and 8 bits otherwise.
2. Redefine "@foo(i1)" to mean that the i1 is extended by the caller
to 8 bits. This goes against the spirit of "zeroext" I think, but
it's a bit of a vague construct anyway (by definition you're going
to extend to the amount required by the ABI, that's why it's the
ABI!).
This implements option 2. The DAG machinery really isn't set up for the
first (there's a fairly strong assumption that "zeroext" goes to at
least the smallest register size), and even if it were, the resulting
DAG looks like it would be inferior in many cases.
Theoretically we could add AssertZext nodes in the consumers of
ABI-passed values too now, but this actually seems to make the code
worse in practice by making truncation proceed in two steps. The code
produced is equally valid if we continue to assume only the low bit is
defined.
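To illustrate option 2 (a hedged sketch; the function names are made
up), the caller of a plain i1 argument now owes the callee an 8-bit
zero-extension:

  declare void @foo(i1)

  define void @caller(i32 %x) {
    %b = icmp eq i32 %x, 0
    ; Under option 2 the caller materializes %b as 0 or 1 in the low
    ; byte (e.g. via an AND on AArch64), so the callee may assume the
    ; remaining bits of that byte are zero.
    call void @foo(i1 %b)
    ret void
  }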
Tim Northover [Mon, 26 May 2014 17:21:53 +0000 (17:21 +0000)]
AArch64: simplify calling conventions slightly.
We can eliminate the custom C++ code in favour of some TableGen to
check the same things. Functionality should be identical, except for a
buffer overrun that was present in the C++ code and meant WebKit
failed if any small argument needed to be passed on the stack.
Tilmann Scheller [Mon, 26 May 2014 09:37:19 +0000 (09:37 +0000)]
[AArch64] Add a regression test for the load store optimizer.
We have a couple of regression tests for load/store pairing, but (to my knowledge) there are no regression tests for the load/store + add/sub folding.
As a first step towards increased test coverage of this area, this commit adds a test for one instance of a load + add to pre-indexed load transformation.
Owen Anderson [Mon, 26 May 2014 08:58:51 +0000 (08:58 +0000)]
Make the LoopRotate pass's maximum header size configurable both programmatically
and via the command line, mirroring similar functionality in LoopUnroll. In
situations where clients used custom unrolling thresholds, their intent could
previously be foiled by LoopRotate having a hardcoded threshold.
David Blaikie [Mon, 26 May 2014 06:44:52 +0000 (06:44 +0000)]
DebugInfo: Test linkonce-odr functions under LTO.
This was previously regressed/broken by r192749 (reverted due to this
issue in r192938) and I was about to break it again by accident with
some more invasive changes that deal with the subprogram lists. So to
avoid that and further issues - here's a test.
It's a pretty basic test - in both r192749 and my impending case, this
test would crash, but checking the basics (that we put a subprogram in
just one of the two CUs) seems like a good start.
We still get this wrong in weird ways if the linkonce-odr function
happens to not be identical in the metadata (because it's defined in two
different files (hence the # line directives in this test), etc) even
though it meets the language requirements (identical token stream) for
such a thing. That results in two subprogram DIEs, but only one of them
gets the parameter and high/low pc information, etc. We probably need to
use the DIRef infrastructure to deduplicate functions as we do types to
address this issue - or perhaps teach the BC linker to remove the
duplicate entries in subprogram lists?
Remove the use of std::function and replace the capturing lambda with a
non-capturing one, opting to pass the user data down to the context. This is
needed as std::function is not yet available on all hosted platforms (it
requires RTTI, which breaks on Windows).
Move the implementation of the Win64 EH printer from the COFFDumper into its own
class. This is in preparation for adding support to print ARM EH information.
The only real change here is in printUnwindInfo, where we now lambda-lift
the implicit this parameter for resolveFunction. Also set up the printing
to handle ARM. This sets the stage for introducing ARM EH printing.
tools: refactor COFFDumper symbol resolution logic
Make the use of the cache more transparent to the users. There is no reason
that the cached entries really need to be passed along. The overhead for doing
so is minimal: a single extra parameter. This requires that some standalone
functions be brought into the COFFDumper class so that they may access the
cache.
David Blaikie [Sun, 25 May 2014 18:11:35 +0000 (18:11 +0000)]
DebugInfo: Fix inlining with #file directives a little harder
Seems my previous fix was insufficient - we were still not adding the
inlined function to the abstract scope list. Which meant it wasn't
flagged as inline, didn't have nested lexical scopes in the abstract
definition, and didn't have abstract variables - so the inlined variable
didn't reference an abstract variable, instead being described
completely inline.
Rafael Espindola [Sun, 25 May 2014 12:49:07 +0000 (12:49 +0000)]
Emit data or code export directives based on the type.
Currently we look at the Aliasee to decide what type of export
directive to use. It seems better to use the type of the alias
directly. This is similar to how we handle the alias having the
same address as, but different attributes (linkage, visibility) from,
the aliasee.
With this patch it is now possible to do things like
This extension point allows adding passes that perform peephole optimizations
similar to the instruction combiner. These passes will be inserted after
each instance of the instruction combiner pass.
Benjamin Kramer [Sat, 24 May 2014 13:13:17 +0000 (13:13 +0000)]
CodeGen: Make MachineBasicBlock::back skip to the beginning of the last bundle.
This makes front/back symmetric with begin/end, avoiding some confusion.
Added instr_front/instr_back for the old behavior, corresponding to
instr_begin/instr_end. Audited all three in-tree users of back(); all
of them look like they don't want to look inside bundles.
Fixes an assertion (PR19815) when generating debug info on mips, where a
delay slot was bundled at the end of a branch.
Tim Northover [Sat, 24 May 2014 12:50:23 +0000 (12:50 +0000)]
AArch64/ARM64: move ARM64 into AArch64's place
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.
"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, while those from AArch64 use an
aarch64 triple. Both should be equivalent, though.
This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.
Tim Northover [Sat, 24 May 2014 12:42:26 +0000 (12:42 +0000)]
AArch64/ARM64: remove AArch64 from tree prior to renaming ARM64.
I'm doing this in two phases for a better "git blame" record. This
commit removes the previous AArch64 backend and redirects all
functionality to ARM64. It also deduplicates test-lines and removes
orphaned AArch64 tests.
The next step will be "git mv ARM64 AArch64" and rewire most of the
tests.
Hopefully LLVM is still functional, though it would be even better if
no-one ever had to care because the rename happens straight
afterwards.
David Blaikie [Fri, 23 May 2014 21:11:46 +0000 (21:11 +0000)]
DebugInfo: Generalize some tests to handle variations in attribute ordering.
In an effort to fix inlined debug info in situations where the out of
line definition of a function precedes any inlined usage, the order in
which some attributes are added to subprogram DIEs may change. (in
essence, definition-necessary attributes like DW_AT_low_pc/high_pc will
be added immediately, but the names, types, and other features will be
delayed to module end where they may either be added to the subprogram
DIE or instead reference an abstract definition for those values)
These tests can be generalized to be resilient to this change. 5 or so
tests actually have to be incompatibly changed to cope with this
reordering and will go along with the change that affects the order.
David Blaikie [Fri, 23 May 2014 20:25:15 +0000 (20:25 +0000)]
DebugInfo: Put concrete definitions referencing abstract definitions in the same scope as the abstract definition.
This seems like a simple cleanup/improved consistency, but also helps
lay the foundation to fix the bug mentioned in the test case: concrete
definitions preceding any inlined usage aren't properly split into
concrete + abstract (because they're not known to need it until it's too
late).
Once we start deferring this choice until later, we won't have the
choice to put concrete definitions for inlined subroutines in a
different scope from concrete definitions for non-inlined subroutines
(since we won't know at time-of-construction which one it'll be). This
change brings those two cases into alignment ahead of that future
change/fix.
Andrew Trick [Fri, 23 May 2014 19:47:13 +0000 (19:47 +0000)]
Fix and improve SCEV ComputeBackedgeTakenCount.
This is a follow-up to r209358: PR19799: Indvars miscompile due to an
incorrect max backedge taken count from SCEV.
That fix was incomplete as pointed out by Arnold and Michael Z. The
code was also too confusing. It needed a careful rewrite with more
unit tests. This version will also happen to optimize more cases.
Jingyue Wu [Fri, 23 May 2014 18:39:40 +0000 (18:39 +0000)]
Add the extracted constant offset using GEP
Fixed a TODO in r207783.
Add the extracted constant offset using GEP instead of ugly
ptrtoint+add+inttoptr. Using GEP simplifies future optimizations and makes IR
easier to understand.
Updated all affected tests, and added a new test in split-gep.ll to cover a
corner case where emitting uglygep is necessary.
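A minimal sketch of the difference (function names assumed; 4-byte
offset chosen for illustration), re-adding an extracted constant offset
to %base:

  ; Previously, via ptrtoint+add+inttoptr ("uglygep"):
  define float* @old(float* %base) {
    %tmp = ptrtoint float* %base to i64
    %sum = add i64 %tmp, 4
    %ptr = inttoptr i64 %sum to float*
    ret float* %ptr
  }

  ; Now, via a GEP, which keeps the pointer arithmetic visible to
  ; later optimizations:
  define float* @new(float* %base) {
    %ptr = getelementptr float* %base, i64 1  ; 1 element = 4 bytes
    ret float* %ptr
  }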
Lang Hames [Fri, 23 May 2014 18:35:44 +0000 (18:35 +0000)]
[RuntimeDyld] Remove relocation bounds check introduced in r208375 (MachO only).
We do all of our address arithmetic in 64-bit, and operations involving
logically negative 32-bit offsets (actually represented as unsigned 64 bit ints)
often overflow into higher bits. The overflow check could be preserved by
casting to uint32 at the callsite for applyRelocationValue, but this would
eliminate the value of the check.
The right way to handle overflow in relocations is to make relocation processing
target specific, and compute the values for RelocationEntry objects in the
appropriate types (32-bit for 32-bit targets, 64-bit for 64-bit targets). This
is coming as part of the cleanup I'm working on.
Aaron Ballman [Fri, 23 May 2014 15:33:39 +0000 (15:33 +0000)]
Teach the TableGen-generated emitPseudoExpansionLowering function not to emit a switch statement containing only a default statement (and no cases). Updated some of the code to use range-based for loops as well. No functional changes.
Daniel Sanders [Fri, 23 May 2014 13:35:24 +0000 (13:35 +0000)]
[mips] Work around inconsistency in llvm-mc's placement of fixup markers
Summary:
Add a second fixup table to MipsAsmBackend::getFixupKindInfo() to correctly
position llvm-mc's fixup placeholders for big-endian.
See PR19836 for full details of the issue. To summarize, the fixup placeholders
do not account for endianness properly and the implementations of
getFixupKindInfo() for each target are measuring MCFixupKindInfo.TargetOffset
from different ends of the instruction encoding to compensate.
Daniel Sanders [Fri, 23 May 2014 13:18:02 +0000 (13:18 +0000)]
[mips][mips64r6] [ls][dw][lr] are not available in MIPS32r6/MIPS64r6
Summary:
Instead the system is required to provide some means of handling unaligned
load/store without special instructions. Options include full hardware
support, full trap-and-emulate, and hybrids such as hardware support within
a cache line and trap-and-emulate for multi-line accesses.
MipsSETargetLowering::allowsUnalignedMemoryAccesses() has been configured to
assume that unaligned accesses are 'fast' on the basis that I expect few
hardware implementations will opt for pure-software handling of unaligned
accesses. The ones that do handle it purely in software can override this.
mips64-load-store-left-right.ll has been merged into load-store-left-right.ll
The stricter testing revealed a Bits!=Bytes bug in passByValArg(). This has
been fixed and the variables renamed to clarify the units they hold.
[asan] properly instrument memory accesses that have small alignment (smaller than min(8, size)) by making two checks instead of one. This may slow down some cases, e.g. long long on 32-bit or wide loads produced after loop unrolling. The benefit is higher sensitivity.
Updated the llvm.mem.parallel_loop_access semantics to allow only some of
the loop's memory instructions to be annotated and still _help_ the
loop-carried dependence analysis.
This was discussed in the llvmdev ML (topic: "parallel loop metadata question").
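A hedged sketch of the shape of such IR (3.5-era metadata syntax; the
values and trip count are made up): only the load is annotated, yet it
is still known not to participate in a loop-carried dependence of loop
!0.

  for.body:
    %i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
    %p = getelementptr i32* %a, i64 %i
    %v = load i32* %p, !llvm.mem.parallel_loop_access !0
    %q = getelementptr i32* %b, i64 %i
    store i32 %v, i32* %q            ; unannotated: analyzed as usual
    %i.next = add i64 %i, 1
    %cmp = icmp slt i64 %i.next, 128
    br i1 %cmp, label %for.body, label %exit, !llvm.loop !0

  !0 = metadata !{metadata !0}       ; self-referential loop id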
Simon Atanasyan [Fri, 23 May 2014 08:07:09 +0000 (08:07 +0000)]
[YAML] Add an optional argument `EnumMask` to the `yaml::IO::bitSetCase()`.
Some bit-set fields used in ELF file headers in fact contain two parts.
The first one is a regular bit-field. The second one is an enumeration.
For example ELF header `e_flags` for MIPS target might contain the
following values:
For printing bit-sets we use the `yaml::IO::bitSetCase()`. It does not
support bit-set/enumeration combinations and prints too many flags from
an enumeration part. This patch fixes this problem. The new method
`yaml::IO::maskedBitSetCase()` handles the "enumeration" part of a
bit-set defined by the provided mask.
Rafael correctly pointed out that the restriction is unnecessary. Although the
tests are intended to ensure that we don't abort due to an assertion, running the
tests in all modes is better since it also ensures that we don't crash without
assertions. Always run these tests to ensure that we can handle invalid input
correctly.
Justin Bogner [Fri, 23 May 2014 00:06:56 +0000 (00:06 +0000)]
ScalarEvolution: Fix handling of AddRecs in isKnownPredicate
ScalarEvolution::isKnownPredicate() can wrongly reduce a comparison
when both the LHS and RHS are SCEVAddRecExprs. This checks that both
LHS and RHS are guarded in the case when both are SCEVAddRecExprs.
The test case is against indvars because I could not find a way to
directly test SCEV.
[Graph Writer] Limit the length of the graph name because Windows can't handle it.
Windows can't handle paths longer than 260 code points without \\?\. Even
with \\?\ it can't handle path components longer than 255 code points. So
limit graph names to the arbitrary length of 140. Random characters are still
added to the end, so it's ok if graph names collide.
The implementation of processScatteredVANILLA is tasteless (*ba-dum-ching*),
but I'm working on a substantial tidy-up of RuntimeDyldMachO that should
improve things.
This patch also fixes a typo in RuntimeDyldMachO::processSECTDIFFRelocation,
and teaches that method to skip over the PAIR reloc following the SECTDIFF.
Matt Arsenault [Thu, 22 May 2014 18:27:07 +0000 (18:27 +0000)]
R600: Add definition for flat address space ID.
Use 4 since that's probably what it will be for SPIR.
Move ADDRESS_NONE to the end to keep the constant_buffer_* values
unchanged, since apparently a bunch of r600 tests use those directly.
Summary:
This patch moves the handling of -pass-remarks* over to
lib/DiagnosticInfo.cpp. This allows the removal of the
optimizationRemarkEnabledFor functions from LLVMContextImpl, as they're
not needed anymore.
Andrea Di Biagio [Thu, 22 May 2014 16:21:39 +0000 (16:21 +0000)]
[X86] Improve the lowering of BITCAST from MVT::f64 to MVT::v4i16/MVT::v8i8.
This patch teaches the x86 backend how to efficiently lower ISD::BITCAST DAG
nodes from MVT::f64 to MVT::v4i16 (and vice versa), and from MVT::f64 to
MVT::v8i8 (and vice versa).
This patch extends the logic from revision 208107 to also handle MVT::v4i16
and MVT::v8i8. Also, this patch correctly propagates Undef values when
performing the widening of a vector (example: when widening from v2i32 to
v4i32, the upper 64 bits of the resulting vector are 'undef').
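The IR-level casts that give rise to these ISD::BITCAST nodes look like
this (a minimal sketch; function names are made up):

  define <4 x i16> @f64_to_v4i16(double %d) {
    %r = bitcast double %d to <4 x i16>
    ret <4 x i16> %r
  }

  define double @v8i8_to_f64(<8 x i8> %v) {
    %r = bitcast <8 x i8> %v to double
    ret double %r
  }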
Diego Novillo [Thu, 22 May 2014 14:19:46 +0000 (14:19 +0000)]
Add support for missed and analysis optimization remarks.
Summary:
This adds two new diagnostics: -pass-remarks-missed and
-pass-remarks-analysis. They take the same values as -pass-remarks but
are intended to be triggered in different contexts.
-pass-remarks-missed is used by LLVMContext::emitOptimizationRemarkMissed,
which passes call when they try to apply a transformation but fail.
-pass-remarks-analysis is used by LLVMContext::emitOptimizationRemarkAnalysis,
which passes call when they want to inform the user about analysis
results.
The patch also:
1- Adds support in the inliner for the two new remarks and a
test case.
2- Moves emitOptimizationRemark* functions to the llvm namespace.
3- Adds an LLVMContext argument instead of making them member functions
of LLVMContext.
Tim Northover [Thu, 22 May 2014 11:56:09 +0000 (11:56 +0000)]
ARM64: separate load/store operands to simplify assembler
This changes ARM64 to use separate operands for each component of an
address, and look for separate '[', '$Rn, ..., ']' tokens when
parsing.
This allows us to do away with quite a bit of special C++ code to
handle monolithic "addressing modes" in the MC components. The more
incremental matching of the assembler operands also allows for better
diagnostics when LLVM is presented with invalid input.
Most of the complexity here is with the register-offset instructions,
which were extremely dodgy beforehand: even when the instruction used
wM, LLVM's model had xM as an operand. We papered over this
discrepancy before, but that approach doesn't work now so I split them
into separate X and W variants.
Daniel Sanders [Thu, 22 May 2014 11:55:04 +0000 (11:55 +0000)]
[mips] Make unalignedload.ll test stricter and easier to modify for MIPS32r6/MIPS64r6
Summary:
* Split into two functions, one to test each struct.
* R0 and R2 must be defined by an lw with a %got reference to the correct
symbol.
* Test for $4 (first argument) where appropriate instead of accepting any
register.
* Test that the two lbu's are correctly combined into $4