Now that we can create much more exhaustive X86 memory folding tests, this patch adds the missing AVX1/F16C floating point instruction stack foldings we can easily test for including the scalar intrinsics (add, div, max, min, mul, sub), conversions float/int to double, half precision conversions, rounding, dot product and bit test. The patch also adds a couple of obviously missing SSE instructions (more to follow once we have full SSE testing).
Now that scalar folding is working it broke a very old test (2006-10-07-ScalarSSEMiscompile.ll) - this test appears to make no sense as its trying to ensure that a scalar subtraction isn't folded as it 'would zero the top elts of the loaded vector' - this test just appears to be wrong to me.
IR: Allow temporary nodes to become uniqued or distinct
Add `MDNode::replaceWithUniqued()` and `MDNode::replaceWithDistinct()`,
which mutate temporary nodes to become uniqued or distinct. On uniquing
collisions, the unique version is returned and the node is deleted.
This takes advantage of temporary nodes being folded back in, and should
let me clean up some awkward logic in `MapMetadata()`.
r226504 added `TempMDNodeDeleter` to help with `std::unique_ptr<>`-izing
the `MDNode::getTemporary()` interface. It doesn't need to be
templated, though.
Change `MDTuple::getTemporary()` and `MDLocation::getTemporary()` to
return (effectively) `std::unique_ptr<T, MDNode::deleteTemporary>`, and
clean up call sites. (For now, `DIBuilder` call sites just call
`release()` immediately.)
There's an accompanying change in each of clang and polly to use the new
API.
Rafael Espindola [Mon, 19 Jan 2015 21:11:14 +0000 (21:11 +0000)]
Add r224985 back with fixes.
The fixes are to note that AArch64 has additional restrictions on when local
relocations can be used. In particular, ld64 requires that relocations to
cstring/cfstrings use linker visible symbols.
Original message:
In an assembly expression like
bar:
.long L0 + 1
the intended semantics is that bar will contain a pointer one byte past L0.
In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.
The solution used in ELF to use relocation with symbols if there is a non-zero
addend.
In MachO before this patch we would just keep all symbols in some sections.
This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.
This patch implements the non-zero addend logic for MachO too.
Remove `MDNodeFwdDecl` (as promised in r226481). Aside from API
changes, there's no real functionality change here.
`MDNode::getTemporary()` now forwards to `MDTuple::getTemporary()`,
which returns a tuple with `isTemporary()` equal to true.
The main point is that we can now add temporaries of other `MDNode`
subclasses, needed for PR22235 (I introduced `MDNodeFwdDecl` in the
first place because I didn't recognize this need, and thought they were
only needed to handle forward references).
A few things left out of (or highlighted by) this commit:
- I've had to remove the (few) uses of `std::unique_ptr<>` to deal
with temporaries, since the destructor is no longer public.
`getTemporary()` should probably return the equivalent of
`std::unique_ptr<T, MDNode::deleteTemporary>`.
- `MDLocation::getTemporary()` doesn't exist yet (worse, it actually
does exist, but does the wrong thing: `MDNode::getTemporary()` is
inherited and returns an `MDTuple`).
- `MDNode` now only has one subclass, `UniquableMDNode`, and the
distinction between them is actually somewhat confusing.
Merge `getDistinct()`'s implementation with those of `get()` and
`getIfExists()` for both `MDTuple` and `MDLocation`. This will make it
easier to scale to supporting temporaries.
IR: Assert that resolve() is only called on uniqued nodes, NFC
Add an assertion in `UniquableMDNode::resolve()` to prevent temporaries
from being resolved (once they're merged back in). Needed to shuffle
order of `resolve()` and `storeDistinctInContext()` to prevent it from
firing.
Unify the definitions of `MDNode::isResolved()` and
`UniquableMDNode::isResolved()`. Previously, `UniquableMDNode` could
answer this question more efficiently, but now that RAUW support has
been unified with `MDNodeFwdDecl`, `MDNode` doesn't need any casts to
figure out the answer.
IR: Store RAUW support and Context in the same pointer, NFC
Add an `LLVMContext &` to `ReplaceableMetadataImpl`, create a class that
either holds a reference to an `LLVMContext` or owns a
`ReplaceableMetadataImpl`, and use the new class in `MDNode`.
- This saves a pointer in `UniquableMDNode` at the cost of a pointer
in `ValueAsMetadata` (which didn't used to store the `LLVMContext`).
There are far more of the former.
- Unifies RAUW support between `MDNodeFwdDecl` (which is going away,
see r226481) and `UniquableMDNode`.
Change `MDNode::isDistinct()` to only apply to 'distinct' nodes (not
temporaries), and introduce `MDNode::isUniqued()` and
`MDNode::isTemporary()` for the other two possibilities.
More clearly describe the type of storage used for `Metadata`.
- `Uniqued`: uniqued, stored in the context.
- `Distinct`: distinct, stored in the context.
- `Temporary`: not owned by anyone.
This is the first in a series of commits to fix a design problem with
`MDNodeFwdDecl` that I need to solve for PR22235. While `MDNodeFwdDecl`
works well as a forward declaration, we use `MDNode::getTemporary()` for
more than forward declarations -- we also need to create early versions
of nodes (with fields not filled in) that we'll fill out later (see
`DIBuilder::finalize()` and `CGDebugInfo::finalize()` for examples).
This was a blind spot I had when I introduced `MDNodeFwdDecl` (which
David Blaikie (indirectly) highlighted in an unrelated review [1]).
In general, we need `MDTuple::getTemporary()` to give a temporary tuple
(like `MDNodeFwdDecl`), `MDLocation::getTemporary()` to give a temporary
location, and (the problem at hand) `GenericDebugMDNode::getTemporary()`
to give a temporary generic debug node.
So I need to fold the idea of "temporary" nodes back into
`UniquableMDNode`. (More commits to follow as I refactor.)
Adrian Prantl [Mon, 19 Jan 2015 17:57:29 +0000 (17:57 +0000)]
Remove support for DIVariable's FlagIndirectVariable and expect
frontends to use a DIExpression with a DW_OP_deref instead.
This is not only a much more natural place for this informationl; there
is also a technical reason: The FlagIndirectVariable is used to mark a
variable that is turned into a reference by virtue of the calling
convention; this happens for example to aggregate return values.
The inliner, for example, may actually need to undo this indirection to
correctly represent the value in its new context. This is impossible to
implement because the DIVariable can't be safely modified. We can however
safely construct a new DIExpression on the fly.
Chandler Carruth [Mon, 19 Jan 2015 12:36:53 +0000 (12:36 +0000)]
[PM] Replace the Pass argument to SplitEdge with specific analyses used
and updated.
This may appear to remove handling for things like alias analysis when
splitting critical edges here, but in fact no callers of SplitEdge
relied on this. Similarly, all of them wanted to preserve LCSSA if there
was any update of the loop info. That makes the interface much simpler.
With this, all of BasicBlockUtils.h is free of Pass arguments and
prepared for the new pass manager. This is tho majority of utilities
that relied on pass arguments.
Chandler Carruth [Mon, 19 Jan 2015 12:09:11 +0000 (12:09 +0000)]
[PM] Remove the Pass argument from all of the critical edge splitting
APIs and replace it and numerous booleans with an option struct.
The critical edge splitting API has a really large surface of flags and
so it seems worth burning a small option struct / builder. This struct
can be constructed with the various preserved analyses and then flags
can be flipped in a builder style.
The various users are now responsible for directly passing along their
analysis information. This should be enough for the critical edge
splitting to work cleanly with the new pass manager as well.
This API is still pretty crufty and could be cleaned up a lot, but I've
focused on this change just threading an option struct rather than
a pass through the API.
Chandler Carruth [Mon, 19 Jan 2015 10:43:00 +0000 (10:43 +0000)]
Suppress the newly added Clang warning for the inaccessible base in this
test. Do that after we suppress the warnings for unknown pragmas as this
warning flag is quite new in Clang and so old Clang's would warn all the
time on this file.
Chandler Carruth [Mon, 19 Jan 2015 10:23:00 +0000 (10:23 +0000)]
[PM] Relax asserts and always try to reconstruct loop simplify form when
we can while splitting critical edges.
The only code which called this and didn't require simplified loops to
be preserved is polly, and the code behaves correctly there anyways.
Without this change, it becomes really hard to share this code with the
new pass manager where things like preserving loop simplify form don't
make any sense.
If anyone discovers this code behaving incorrectly, what it *should* be
testing for is whether the loops it needs to be in simplified form are
in fact in that form. It should always be trying to preserve that form
when it exists.
Erik Eckstein [Mon, 19 Jan 2015 09:33:38 +0000 (09:33 +0000)]
SLPVectorizer: limit the number of alias checks to reduce the runtime.
In case of blocks with many memory-accessing instructions, alias checking can take lot of time
(because calculating the memory dependencies has quadratic complexity).
I chose a limit which resulted in no changes when running the benchmarks.
Hal Finkel [Mon, 19 Jan 2015 07:44:45 +0000 (07:44 +0000)]
[PowerPC] Minor correction to r226432
We don't need to exclude patchpoints from the implicit r2 dependence in
FastISel because it is added as an implicit operand and, thus, should not
confuse that StackMap code.
Hal Finkel [Mon, 19 Jan 2015 07:20:27 +0000 (07:20 +0000)]
[PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2
Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call
instructions in order to represent the dependency on the TOC base pointer
value. Restricting this to ELF V2, however, does not seem to make sense: calls
under ELF V1 have the same dependence, and indirect calls have an r2 dependence
just as direct ones. Make sure the dependence is noted for all calls under both
ELF V1 and ELF V2.
Craig Topper [Mon, 19 Jan 2015 06:07:27 +0000 (06:07 +0000)]
[x86] Change AVX512 intrinsics to take a 8-bit immediate for the comparision kind instead of a 32-bit immediate. This better aligns with the emitted instruction. It also matches SSE and AVX1 equivalents. Also add auto upgrade support.
Hide the state of TinyPtrVector and remove the single element constructor.
There is no reason for this state to be exposed as public. The single element
constructor was superfulous in light of the single element ArrayRef
constructor.
David Blaikie [Sun, 18 Jan 2015 20:43:57 +0000 (20:43 +0000)]
Attempt to fix the MSVC build by working around a layering issue
Since MCStreamer isn't part of Support, the dtor can't be called from
here - so just pass by reference instead. This is rather imperfect, but
will hopefully suffice.
Daniel Sanders [Sun, 18 Jan 2015 18:38:36 +0000 (18:38 +0000)]
[mips] Make whitespace in disassembler tests more consistent. NFC.
The tests for the ISA's should now be approximately diffable. That is, the
output of 'diff valid-mips1.txt valid-mips2.txt' should be emit the lines
for instructions that were added/removed to/from MIPS-I by MIPS-II. This
doesn't work perfectly at the moment due to ordering differences but it
should be close.
Hal Finkel [Sun, 18 Jan 2015 15:59:44 +0000 (15:59 +0000)]
[PowerPC] Don't hard-code R2 as register when processing TOC relocations
Instructions that have high-order TOC relocations always carry R2 as their base
register, so it does not matter whether we take the register from the
instruction or just hard-code it in PPCAsmPrinter. In the future, however, we
might want to apply these relocations to instructions using a different
register, so taking the register from the instruction is a better thing to do.
No change in functionality here, however.
Simon Pilgrim [Sun, 18 Jan 2015 12:56:39 +0000 (12:56 +0000)]
AVX1 stack folding tests. NFC.
Begun adding more exhaustive tests - all floating point instructions should now be either tested or have placeholders. We do seem to have a number of missing instructions, I will add a patch for review once the remaining working instructions are added.
I'll then move on to SSE tests and then the integer instructions.
Hal Finkel [Sun, 18 Jan 2015 12:08:47 +0000 (12:08 +0000)]
[PowerPC] Initial PPC64 calling-convention changes for fastcc
The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is
designed to work with both prototyped and non-prototyped/varargs functions. As
a result, GPRs and stack space are allocated for every argument, even those
that are passed in floating-point or vector registers.
GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that
do not have their address taken) to use the 'fast' calling convention.
When functions are using the 'fast' calling convention, don't allocate GPRs for
arguments passed in other types of registers, and don't allocate stack space for
arguments passed in registers. Other changes for the fast calling convention
may be added in the future.
Chandler Carruth [Sun, 18 Jan 2015 09:21:15 +0000 (09:21 +0000)]
[PM] Pull the analyses used for another utility routine into its API
rather than relying on the pass object.
This one is a bit annoying, but will pay off. First, supporting this one
will make the next one much easier, and for utilities like LoopSimplify,
this is moving them (slowly) closer to not having to pass the pass
object around throughout their APIs.
Chandler Carruth [Sun, 18 Jan 2015 02:08:05 +0000 (02:08 +0000)]
[PM] Refactor how the LoopRotation pass access the DominatorTree.
Instead of querying the pass every where we need to, do that once and
cache a pointer in the pass object. This is both simpler and I'm about
to add yet another place where we need to dig out that pointer.
Chandler Carruth [Sun, 18 Jan 2015 01:45:07 +0000 (01:45 +0000)]
[PM] Lift the actual analyses used into the inferface rather than
accepting a Pass and querying it for analyses.
This is necessary to allow the utilities to work both with the old and
new pass managers, and I also think this makes the interface much more
clear and helps the reader know what analyses the utility can actually
handle. I plan to repeat this process iteratively to clean up all the
pass utilities.
Chandler Carruth [Sun, 18 Jan 2015 01:25:51 +0000 (01:25 +0000)]
[PM] Now that LoopInfo isn't in the Pass type hierarchy, it is much
cleaner to derive from the generic base.
Thise removes a ton of boiler plate code and somewhat strange and
pointless indirections. It also remove a bunch of the previously needed
friend declarations. To fully remove these, I also lifted the verify
logic into the generic LoopInfoBase, which seems good anyways -- it is
generic and useful logic even for the machine side.
Chandler Carruth [Sat, 17 Jan 2015 14:31:35 +0000 (14:31 +0000)]
[PM] Remove a dead field.
This was dead even before I refactored how we initialized it, but my
refactoring made it trivially dead and it is now caught by a Clang
warning. This fixes the warning and should clean up the -Werror bot
failures (sorry!).
Chandler Carruth [Sat, 17 Jan 2015 14:16:18 +0000 (14:16 +0000)]
[PM] Split the LoopInfo object apart from the legacy pass, creating
a LoopInfoWrapperPass to wire the object up to the legacy pass manager.
This switches all the clients of LoopInfo over and paves the way to port
LoopInfo to the new pass manager. No functionality change is intended
with this iteration.
Hal Finkel [Sat, 17 Jan 2015 03:57:34 +0000 (03:57 +0000)]
[PowerPC] Don't list R11 as a patchpoint scratch register
R11's status is the same under both the PPC64 ELF V1 and V2 ABIs: it is
reserved for use as an "environment pointer" for compilation models that
require such a thing. We don't, we also don't need a second scratch register,
and because we support only "local" patchpoint call targets, we might as well
let R11 be used for anyregcc patchpoints.
Mehdi Amini [Sat, 17 Jan 2015 01:35:56 +0000 (01:35 +0000)]
Improve DAG combine pass on certain IR vector patterns
Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode with certain IR combinations.
In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
(undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
but then mysteriously generated rather bad code; the movq/movhpd combination
didn't match.
The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
would get promoted to 4f32 by the type legalizer, eventually resulting
in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
these were both half the output size, concatted them and then produced
a shuffle. However, the resulting concat + shuffle was more complex than
it should be; in the case where the upper half of the output is undef, we
probably want to generate shuffle + concat instead.
This enhancement causes the vector_shuffle combine step to recognize this
suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR
in case the same suboptimal pattern occurs for other reasons.
This results in the optimizer correctly producing the optimal movq + movhpd
sequence for all three variations on this IR, even with AVX2.
I've included a test case.
Radar link: rdar://problem/19287012
Fix for PR 21943.
Matthias Braun [Sat, 17 Jan 2015 00:33:09 +0000 (00:33 +0000)]
RegisterCoalescer: Cleanup comment style
- Consistenly put comments above the function declaration, not the
definition. To achieve this some duplicate comments got merged and
some comment parts describing implementation details got moved into their
functions.
- Consistently use doxygen comments above functions.
- Do not use doxygen comments inside functions.
Kevin Enderby [Fri, 16 Jan 2015 23:29:07 +0000 (23:29 +0000)]
Change the test case for llvm-objdump’s -archive-headers option to not check the size
while I once again try to figure out why only the clang-cmake-armv7-a15-full bot
is getting that value wrong.
Lang Hames [Fri, 16 Jan 2015 23:13:56 +0000 (23:13 +0000)]
[RuntimeDyld] Track symbol visibility in RuntimeDyld.
RuntimeDyld symbol info previously consisted of just a Section/Offset pair. This
patch replaces that pair type with a SymbolInfo class that also tracks symbol
visibility. A new method, RuntimeDyld::getExportedSymbolLoadAddress, is
introduced which only returns a non-zero result for exported symbols. For
non-exported or non-existant symbols this method will return zero. The
RuntimeDyld::getSymbolAddress method retains its current behavior, returning
non-zero results for all symbols regardless of visibility.
No in-tree clients of RuntimeDyld are changed. The newly introduced
functionality will be used by the Orc APIs.
No test case: Since this patch doesn't modify the behavior for any in-tree
clients we don't have a good tool to test this with yet. Once Orc is in we can
use it to write regression tests that test these changes.
Kevin Enderby [Fri, 16 Jan 2015 22:10:36 +0000 (22:10 +0000)]
Fix the Archive::Child::getRawSize() method used by llvm-objdump’s -archive-headers option
and tweak its use in llvm-objdump. Add back the test case for the -archive-headers option.