Use a smaller pragma unroll threshold to reduce test execution time.
When opt is compiled with AddressSanitizer it takes more than 30 seconds
to unroll the loop in unroll_1M().
Jozef Kolek [Wed, 21 Jan 2015 12:39:30 +0000 (12:39 +0000)]
[mips][microMIPS] MicroMIPS 16-bit unconditional branch instruction B
Implement microMIPS 16-bit unconditional branch instruction B.
Implemented 16-bit microMIPS unconditional instruction has real name B16, and
B is an alias which expands to either B16 or BEQ according to the rules:
b 256 --> b16 256 # R_MICROMIPS_PC10_S1
b 12256 --> beq $zero, $zero, 12256 # R_MICROMIPS_PC16_S1
b label --> beq $zero, $zero, label # R_MICROMIPS_PC16_S1
Chandler Carruth [Wed, 21 Jan 2015 11:38:17 +0000 (11:38 +0000)]
[PM] Refactor the InstCombiner interface to use an external worklist.
Because in its primary function pass the combiner is run repeatedly over
the same function until doing so produces no changes, it is essentially
to not re-allocate the worklist. However, as a utility, the more common
pattern would be to put a limited set of instructions in the worklist
rather than the entire function body. That is also the more likely
pattern when used by the new pass manager.
The result is a very light weight combiner that does the visiting with
a separable worklist. This can then be wrapped up in a helper function
for users that want a combiner utility, or as I have here it can be
wrapped up in a pass which manages the iterations used when combining an
entire function's instructions.
Hopefully this removes some of the worst of the interface warts that
became apparant with the last patch here. However, there is clearly more
work. I've again left some FIXMEs for the most egregious. The ones that
stick out to me are the exposure of the worklist and IR builder as
public members, and the use of pointers rather than references. However,
fixing these is likely to be much more mechanical and less interesting
so I didn't want to touch them in this patch.
Chandler Carruth [Wed, 21 Jan 2015 11:23:40 +0000 (11:23 +0000)]
[PM] Simplify (ha! ha!) the way that instcombine calls the
SimplifyLibCalls utility by sinking it into the specific call part of
the combiner.
This will avoid us needing to do any contortions to build this object in
a subsequent refactoring I'm doing and seems generally better factored.
We don't need this utility everywhere and it carries no interesting
state so we might as well build it on demand.
Vladimir Medic [Wed, 21 Jan 2015 10:47:36 +0000 (10:47 +0000)]
[Mips][Disassembler]When disassembler meets load/store from coprocessor 2 instructions for mips r6 it crashes as the access to operands array is out of range. This patch adds dedicated decoder method that properly handles decoding of these instructions.
Chandler Carruth [Wed, 21 Jan 2015 02:11:59 +0000 (02:11 +0000)]
[PM] Replace an abuse of inheritance to override a single function with
a more direct approach: a type-erased glorified function pointer. Now we
can pass a function pointer into this for the easy case and we can even
pass a lambda into it in the interesting case in the instruction
combiner.
I'll be using this shortly to simplify the interfaces to InstCombiner,
but this helps pave the way and seems like a better design for the
libcall simplifier utility.
Simon Pilgrim [Wed, 21 Jan 2015 00:02:13 +0000 (00:02 +0000)]
[X86][AVX] Simplified diff between AVX1 and SSE42 fp stack folding tests. NFC.
Changed the AVX1 tests register spill tail call to return a xmm like the SSE42 version - makes doing diffs between them a lot easier without affecting the spills themselves.
The SSE42 version of the AVX1 float stack folding tests will be added shortly, this renames the AVX1 file so that the files will be near each other in a directory listing to help ensure they are kept in sync.
Chandler Carruth [Tue, 20 Jan 2015 22:44:35 +0000 (22:44 +0000)]
[PM] Separate the InstCombiner from its pass.
This creates a small internal pass which runs the InstCombiner over
a function. This is the hard part of porting InstCombine to the new pass
manager, as at this point none of the code in InstCombine has access to
a Pass object any longer.
The resulting interface for the InstCombiner is pretty terrible. I'm not
planning on leaving it that way. The key thing missing is that we need
to separate the worklist from the combiner a touch more. Once that's
done, it should be possible for *any* part of LLVM to just create
a worklist with instructions, populate it, and then combine it until
empty. The pass will just be the (obvious and important) special case of
doing that for an entire function body.
For now, this is the first increment of factoring to make all of this
work.
Chandler Carruth [Tue, 20 Jan 2015 21:10:35 +0000 (21:10 +0000)]
[PM] Reformat this code with clang-format so that subsequent changes
don't get muddied up by formatting changes.
Some of these don't really seem like improvements to me, but they also
don't seem any worse and I care much more about not formatting them
manually than I do about the particular formatting. =]
Daniel Jasper [Tue, 20 Jan 2015 19:43:33 +0000 (19:43 +0000)]
Prevent binary-tree deterioration in sparse switch statements.
This addresses part of llvm.org/PR22262. Specifically, it prevents
considering the densities of sub-ranges that have fewer than
TLI.getMinimumJumpTableEntries() elements. Those densities won't help
jump tables.
This is not a complete solution but works around the most pressing
issue.
[GC] Verify-pass void vararg functions in gc.statepoint
With the appropriate Verifier changes, exactracting the result out of a
statepoint wrapping a vararg function crashes. However, a void vararg
function works fine: commit this first step.
Adrian Prantl [Tue, 20 Jan 2015 19:42:22 +0000 (19:42 +0000)]
Reapply: Teach SROA how to update debug info for fragmented variables.
This reapplies r225379.
ChangeLog:
- The assertion that this commit previously ran into about the inability
to handle indirect variables has since been removed and the backend
can handle this now.
- Testcases were upgrade to the new MDLocation format.
- Instead of keeping a DebugDeclares map, we now use
llvm::FindAllocaDbgDeclare().
Original commit message follows.
Debug info: Teach SROA how to update debug info for fragmented variables.
This allows us to generate debug info for extremely advanced code such as
Manman Ren [Tue, 20 Jan 2015 19:24:59 +0000 (19:24 +0000)]
[llvm link] Destroy ConstantArrays in LLVMContext if they are not used.
ConstantArrays constructed during linking can cause quadratic memory
explosion. An example is the ConstantArrays constructed when linking in
GlobalVariables with appending linkage.
Releasing all unused constants can cause a 20% LTO compile-time
slowdown for a large application. So this commit releases unused ConstantArrays
only.
rdar://19040716. It reduces memory footprint from 20+G to 6+G.
Tom Stellard [Tue, 20 Jan 2015 17:49:47 +0000 (17:49 +0000)]
R600/SI: Use external symbols for scratch buffer
We were passing the scratch buffer address to the shaders via user sgprs,
but now we use external symbols and have the driver patch the shader
using reloc information.
Tom Stellard [Tue, 20 Jan 2015 17:49:43 +0000 (17:49 +0000)]
R600/SI: Don't store scratch buffer frame index in MUBUF offset field
We don't have a good way of legalizing this if the frame index offset
is more than the 12-bits, which is size of MUBUF's offset field, so
now we store the frame index in the vaddr field.
Jozef Kolek [Tue, 20 Jan 2015 16:45:27 +0000 (16:45 +0000)]
[mips][microMIPS] MicroMIPS 16-bit unconditional branch instruction B
Implement microMIPS 16-bit unconditional branch instruction B.
Implemented 16-bit microMIPS unconditional instruction has real name B16, and
B is an alias which expands to either B16 or BEQ according to the rules:
b 256 --> b16 256 # R_MICROMIPS_PC10_S1
b 12256 --> beq $zero, $zero, 12256 # R_MICROMIPS_PC16_S1
b label --> beq $zero, $zero, label # R_MICROMIPS_PC16_S1
Evgeniy Stepanov [Tue, 20 Jan 2015 15:21:35 +0000 (15:21 +0000)]
[msan] Optimize -msan-check-constant-shadow.
The new code does not create new basic blocks in the case when shadow is a
compile-time constant; it generates either an unconditional __msan_warning
call or nothing instead.
Chandler Carruth [Tue, 20 Jan 2015 10:58:50 +0000 (10:58 +0000)]
[PM] Port LoopInfo to the new pass manager, adding both a LoopAnalysis
pass and a LoopPrinterPass with the expected associated wiring.
I've added a RUN line to the only test case (!!!) we have that actually
prints loops. Everything seems to be working.
This is somewhat exciting as this is the first analysis using another
analysis to go in for the new pass manager. =D I also believe it is the
last analysis necessary for porting instcombine, but of course I may yet
discover more.
Daniel Jasper [Tue, 20 Jan 2015 08:57:44 +0000 (08:57 +0000)]
Factor out a splitSwitchCase() function so that it can be reused.
This is in preparation for a fix to llvm.org/PR22262. One of the ideas
here is to first find a good jump table range first and then split
before and after it. Thereby, we don't need to use the
split-based-on-density heuristic at all, which can make the "binary
tree" deteriorate in various cases.
Chandler Carruth [Tue, 20 Jan 2015 08:35:24 +0000 (08:35 +0000)]
[PM] Move the LoopInfo analysis pointer into the InstCombiner class
along with the other analyses.
The most obvious reason why is because eventually I need to separate out
the pass layer from the rest of the instcombiner. However, it is also
probably a compile time win as every query through the pass manager
layer is pretty slow these days.
Karthik Bhat [Tue, 20 Jan 2015 06:11:00 +0000 (06:11 +0000)]
Fix Operandreorder logic in SLPVectorizer to generate longer vectorizable chain.
This patch fixes 2 issues in reorderInputsAccordingToOpcode
1) AllSameOpcodeLeft and AllSameOpcodeRight was being calculated incorrectly resulting in code not being vectorized in few cases.
2) Adds logic to reorder operands if we get longer chain of consecutive loads enabling vectorization. Handled the same for cases were we have AltOpcode.
Thanks Michael for inputs and review.
Review: http://reviews.llvm.org/D6677
IR: Move MDNode clone() methods from ValueMapper to MDNode, NFC
Now that the clone methods used by `MapMetadata()` don't do any
remapping (and return a temporary), they make more sense as member
functions on `MDNode` (and subclasses).
Change `HeaderBuilder` API to work well even when it's not starting with
a tag. There's already one case like this, and the tag is moving
elsewhere as part of PR22235.
Fix a leak in `LLVMContextImpl` teardown that the leak sanitizer tracked
down [1]. I've just switched to automatic dispatch here (since I'll
inevitably forget again with the next class).
IR: Detect whether to call recalculateHash() via SFINAE, NFC
Rather than relying on updating switch statements correctly, detect
whether `setHash()` exists in the subclass. If so, call
`recalculateHash()` and `setHash(0)` appropriately.
As part of PR22235, introduce `DwarfNode` and `GenericDwarfNode`. The
former is a metadata node with a DWARF tag. The latter matches our
current (generic) schema of a header with string (and stringified
integer) data and an arbitrary number of operands.
This doesn't move it into place yet; that change will require a large
number of testcase updates.
Swap usage of `SubclassData32` and `MDNodeSubclassData`, and rename
`MDNodeSubclassData` to `NumUnresolved`. Small drive-by cleanup to
`countUnresolvedOperands()` since otherwise the name clash with local
vars named `NumUnresolved` would be confusing.
As pointed out in r226501, the distinction between `MDNode` and
`UniquableMDNode` is confusing. When we need subclasses of `MDNode`
that don't use all its functionality it might make sense to break it
apart again, but until then this makes the code clearer.
Now that we can create much more exhaustive X86 memory folding tests, this patch adds the missing AVX1/F16C floating point instruction stack foldings we can easily test for including the scalar intrinsics (add, div, max, min, mul, sub), conversions float/int to double, half precision conversions, rounding, dot product and bit test. The patch also adds a couple of obviously missing SSE instructions (more to follow once we have full SSE testing).
Now that scalar folding is working it broke a very old test (2006-10-07-ScalarSSEMiscompile.ll) - this test appears to make no sense as its trying to ensure that a scalar subtraction isn't folded as it 'would zero the top elts of the loaded vector' - this test just appears to be wrong to me.
IR: Allow temporary nodes to become uniqued or distinct
Add `MDNode::replaceWithUniqued()` and `MDNode::replaceWithDistinct()`,
which mutate temporary nodes to become uniqued or distinct. On uniquing
collisions, the unique version is returned and the node is deleted.
This takes advantage of temporary nodes being folded back in, and should
let me clean up some awkward logic in `MapMetadata()`.
r226504 added `TempMDNodeDeleter` to help with `std::unique_ptr<>`-izing
the `MDNode::getTemporary()` interface. It doesn't need to be
templated, though.
Change `MDTuple::getTemporary()` and `MDLocation::getTemporary()` to
return (effectively) `std::unique_ptr<T, MDNode::deleteTemporary>`, and
clean up call sites. (For now, `DIBuilder` call sites just call
`release()` immediately.)
There's an accompanying change in each of clang and polly to use the new
API.
Rafael Espindola [Mon, 19 Jan 2015 21:11:14 +0000 (21:11 +0000)]
Add r224985 back with fixes.
The fixes are to note that AArch64 has additional restrictions on when local
relocations can be used. In particular, ld64 requires that relocations to
cstring/cfstrings use linker visible symbols.
Original message:
In an assembly expression like
bar:
.long L0 + 1
the intended semantics is that bar will contain a pointer one byte past L0.
In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.
The solution used in ELF to use relocation with symbols if there is a non-zero
addend.
In MachO before this patch we would just keep all symbols in some sections.
This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.
This patch implements the non-zero addend logic for MachO too.