Philip Reames [Fri, 13 Feb 2015 23:08:37 +0000 (23:08 +0000)]
Minor tweak to MDA
Two minor tweaks I noticed when reading through the code:
- No need to recompute begin() on every iteration. We're not modifying the instructions in this loop.
- We can ignore PHINodes and Dbg intrinsics. The current code does this anyways, but it will spend slightly more time doing so and will count towards the limit of instructions in the block. It seems really silly to give up due the presence of PHIs...
Reid Kleckner [Fri, 13 Feb 2015 22:05:50 +0000 (22:05 +0000)]
Triple: Make setEnvironment not override the object format
Discovered by Halide users who had C++ code like this:
Triple.setArch(Triple::x86);
Triple.setOS(Triple::Windows);
Triple.setObjectFormat(Triple::ELF);
Triple.setEnvironment(Triple::MSVC);
This would produce the stringified triple of x86-windows-msvc, instead
of the x86-windows-msvc-elf string needed to run MCJIT.
Sanjay Patel [Fri, 13 Feb 2015 21:52:42 +0000 (21:52 +0000)]
[SSE/AVX] Use multiclasses to reduce the mass of scalar math patterns; NFCI
This takes the preposterous number of patterns in this section
that were last added to in r219033 down to just plain obnoxious.
With a little more work, we might get this down to just comical.
I've added more test cases to the existing file that checks these
patterns, but it seems that some of these patterns simply don't
exist with today's shuffle lowering.
Reid Kleckner [Fri, 13 Feb 2015 21:27:28 +0000 (21:27 +0000)]
Fix R600 test deadlock on Windows by giving FileCheck an argument
llc would hang trying to write output to a full pipe that FileCheck
wasn't reading. FileCheck wasn't reading from stdin because it needs a
file as a positional argument.
Zachary Turner [Fri, 13 Feb 2015 17:57:09 +0000 (17:57 +0000)]
llvm-pdbdump: Improve printing of functions and signatures.
This correctly prints the function pointers, and also prints
function signatures for symbols as opposed to just types. So
actual functions in your program will now be printed with full
name and signature, as opposed to just name as before.
Jozef Kolek [Fri, 13 Feb 2015 17:51:27 +0000 (17:51 +0000)]
[mips][microMIPS] Delay slot filler: Replace the microMIPS JR with the JRC
This patch adds functionality in MIPS delay slot filler such as if delay slot
filler have to put NOP instruction into the delay slot of microMIPS JR
instruction, then instead of emitting NOP this instruction is replaced by
compact jump instruction JRC.
Andrea Di Biagio [Fri, 13 Feb 2015 16:33:34 +0000 (16:33 +0000)]
[InstCombine] Fix regression introduced at r227197.
This patch fixes a problem I accidentally introduced in an instruction combine
on select instructions added at r227197. That revision taught the instruction
combiner how to fold a cttz/ctlz followed by a icmp plus select into a single
cttz/ctlz with flag 'is_zero_undef' cleared.
However, the new rule added at r227197 would have produced wrong results in the
case where a cttz/ctlz with flag 'is_zero_undef' cleared was follwed by a
zero-extend or truncate. In that case, the folded instruction would have
been inserted in a wrong location thus leaving the CFG in an inconsistent
state.
This patch fixes the problem and add two reproducible test cases to
existing test 'InstCombine/select-cmp-cttz-ctlz.ll'.
Andrea Di Biagio [Fri, 13 Feb 2015 14:15:48 +0000 (14:15 +0000)]
[CodeGenPrepare] Removed duplicate logic. SimplifyCFG already knows how to speculate calls to cttz/ctlz.
SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are
'cheap' for the target. Therefore, some of the logic in CodeGenPrepare
that was originally added at revision 224899 can now be removed.
This patch is basically a no functional change. It removes the duplicated
logic in CodeGenPrepare and converts all the existing target specific tests
for cttz/ctlz into SimplifyCFG tests.
[PBQP] Conservativelly allocatable nodes can be spilled and give a better solution
Although such nodes are allocatable, the cost of spilling may be less than
allocating to register, so spilling the node may provide a better solution.
The assert does not account for this case, so remove it for now.
Toma Tabacu [Fri, 13 Feb 2015 10:30:57 +0000 (10:30 +0000)]
[mips] Improve support for the .set at/noat assembler directives.
Summary:
Made the following changes:
Added calls to emitDirectiveSetNoAt() and emitDirectiveSetAt().
Added special emit function for .set at=$reg, emitDirectiveSetAtWithArg(unsigned RegNo).
Improved parsing error checks for .set at.
Refactored parser code for .set at.
Improved testing of both directives.
Improved code readability and comments.
Chandler Carruth [Fri, 13 Feb 2015 10:21:05 +0000 (10:21 +0000)]
[PM] Update the examples to reflect the removal of the
llvm/PassManager.h wrapper header and its using declarations. These now
directly use the legacy namespace.
I had updated the #include lines in my large commit but forgot that the
examples weren't being built and didn't update the code to use the
correct namespace. Sorry for the noise here.
Chandler Carruth [Fri, 13 Feb 2015 10:01:29 +0000 (10:01 +0000)]
[PM] Remove the old 'PassManager.h' header file at the top level of
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.
This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.
The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".
Chandler Carruth [Fri, 13 Feb 2015 07:52:39 +0000 (07:52 +0000)]
Revert a series of commits starting at r228886 which is triggering some
regressions for LLDB on Linux. Rafael indicated on lldb-dev that we
should just go ahead and revert these but that he wasn't at a computer.
The patches backed out are as follows:
r228980: Add support for having multiple sections with the name and ...
r228889: Invert the section relocation map.
r228888: Use the existing SymbolTableIndex intsead of doing a lookup.
r228886: Create the Section -> Rel Section map when it is first needed.
These patches look pretty nice to me, so hoping its not too hard to get
them re-instated. =D
Zachary Turner [Fri, 13 Feb 2015 07:40:03 +0000 (07:40 +0000)]
llvm-pdbdump: Add more comprehensive dumping of symbol types.
In particular this patch adds the ability to dump complete
function signature information including argument types as
correctly formatted strings. A side effect of this is that
almost all symbol and meta types are now formatted.
Chandler Carruth [Fri, 13 Feb 2015 05:31:46 +0000 (05:31 +0000)]
[unroll] Concede defeat and disable the unroll analyzer for now.
The issues with the new unroll analyzer are more fundamental than code
cleanup, algorithm, or data structure changes. I've sent an email to the
original commit thread with details and a proposal for how to redesign
things. I'm disabling this for now so that we don't spend time
debugging issues with it in its current state.
Michael Liao [Fri, 13 Feb 2015 04:51:26 +0000 (04:51 +0000)]
[InstCombine] Fix a bug when combining `icmp` from `ptrtoint`
- First, there's a crash when we try to combine that pointers into `icmp`
directly by creating a `bitcast`, which is invalid if that two pointers are
from different address spaces.
- It's not always appropriate to cast one pointer to another if they are from
different address spaces as that is not no-op cast. Instead, we only combine
`icmp` from `ptrtoint` if that two pointers are of the same address space.
Chandler Carruth [Fri, 13 Feb 2015 04:30:44 +0000 (04:30 +0000)]
[unroll] Don't check the loop set for whether an instruction is
contained in it each time we try to add it to the worklist, just check
this when pulling it off the worklist. That way we do it at most once
per instruction with the cost of the worklist set we would need to pay
anyways.
Chandler Carruth [Fri, 13 Feb 2015 04:27:50 +0000 (04:27 +0000)]
[unroll] Change the other worklist in the unroll analyzer to be a set
vector.
In addition to dramatically reducing the work required for contrived
example loops, this also has to correct some serious latent bugs in the
cost computation. Previously, we might add an instruction onto the
worklist once for every load which it used and was simplified. Then we
would visit it many times and accumulate "savings" each time.
I mean, fortunately this couldn't matter for things like calls with 100s
of operands, but even for binary operators this code seems like it must
be double counting the savings.
I just noticed this by inspection and due to the runtime problems it can
introduce, I don't have any test cases for cases where the cost produced
by this routine is unacceptable.
Chandler Carruth [Fri, 13 Feb 2015 04:14:05 +0000 (04:14 +0000)]
[unroll] Directly query for dead instructions.
In the unroll analyzer, it is checking each user to see if that user
will become dead. However, it first checked if that user was missing
from the simplified values map, and then if was also missing from the
dead instructions set. We add everything from the simplified values map
to the dead instructions set, so the first step is completely subsumed
by the second. Moreover, the first step requires *inserting* something
into the simplified value map which isn't what we want at all.
This also replaces a dyn_cast with a cast as an instruction cannot be
used by a non-instruction.
Chandler Carruth [Fri, 13 Feb 2015 04:06:08 +0000 (04:06 +0000)]
[unroll] Replace a linear time check for no uses with a constant time
check.
Also hoist this into the enqueue process as it is faster even than
testing the worklist set, we should just directly filter these out much
like we filter out constants and such.
Chandler Carruth [Fri, 13 Feb 2015 03:57:40 +0000 (03:57 +0000)]
[unroll] Rather than an operand set, use a setvector for the worklist.
We don't just want to handle duplicate operands within an instruction,
but also duplicates across operands of different instructions. I should
have gone straight to this, but I had convinced myself that it wasn't
going to be necessary briefly. I've come to my senses after chatting
more with Nick, and am now happier here.
Chandler Carruth [Fri, 13 Feb 2015 03:49:41 +0000 (03:49 +0000)]
[unroll] Extract the code to enqueue operansd for the worklist in the
unroll analysis into a lambda and call it. That's much simpler than
duplicating all the code.
Chandler Carruth [Fri, 13 Feb 2015 03:48:38 +0000 (03:48 +0000)]
[unroll] Use a small set to de-duplicate operands prior to putting them
into the worklist. This avoids allocating lots of worklist memory for
them when there are large numbers of repeated operands.
Chandler Carruth [Fri, 13 Feb 2015 03:40:58 +0000 (03:40 +0000)]
[unroll] Make the unroll cost analysis terminate deterministically and
reasonably quickly.
I don't have a reduced test case, but for a version of FFMPEG, this
makes the loop unroller start finishing at all (after over 15 minutes of
running, it hadn't terminated for me, no idea if it was a true infloop
or just exponential work).
The key thing here is to check the DeadInstructions set when pulling
things off the worklist. Without this, we would re-walk the user list of
already dead instructions again and again and again. Consider phi nodes
with many, many operands and other patterns.
The other important aspect of this is that because we would keep
re-visiting instructions that were already known dead, we kept adding
their cost savings to this! This would cause our cost savings to be
*insanely* inflated from this.
While I was here, I also rotated the operand walk out of the worklist
loop to make the code easier to read. There is still work to be done to
minimize worklist traffic because we don't de-duplicate operands. This
means we may add the same instruction onto the worklist 1000s of times
if it shows up in 1000s of operansd to a PHI node for example.
Still, with this patch, the ffmpeg testcase I have finishes quickly and
I can't measure the runtime impact of the unroll analysis any more. I'll
probably try to do a few more cleanups to this code, but not sure how
much cleanup I can justify right now.
IR: Drop never-used defaults for DIBuilder::createTemplate*(), NFC
No caller specifies anything different; these parameters are dead code
and probably always have been. The new hierarchy doesn't bother with
the fields at all (see r228607 and r228652).
Chandler Carruth [Fri, 13 Feb 2015 02:45:17 +0000 (02:45 +0000)]
[unroll] Make range based for loops a bit more explicit and more
readable.
The biggest thing that was causing me problems is recognizing the
references vs. poniters here. I also found that for maps naming the loop
variable as KeyValue helps make it obvious why you don't actually use it
directly. Finally, using 'auto' instead of 'User *' doesn't seem like
a good tradeoff. Much like with the other cases, I like to know its
a pointer, and 'User' is just as long and tells the reader a lot more.
Chandler Carruth [Fri, 13 Feb 2015 02:30:01 +0000 (02:30 +0000)]
[IC] Fix a bug with the instcombine canonicalizing of loads and
propagating of metadata.
We were propagating !nonnull metadata even when the newly formed load is
no longer of a pointer type. This is clearly broken and results in LLVM
failing the verifier and aborting. This patch just restricts the
propagation of !nonnull metadata to when we actually have a pointer
type.
This bug report and the initial version of this patch was provided by
Charles Davis! Many thanks for finding this!
We still need to add logic to round-trip the metadata correctly if we
combine from pointer types to integer types and then back by using range
metadata for the integer type loads. But this is the minimal and safe
version of the patch, which is important so we can backport it into 3.6.
Chandler Carruth [Fri, 13 Feb 2015 02:17:39 +0000 (02:17 +0000)]
[unroll] Avoid the "Insn" abbreviation of Instruction. This is quite
hard to type and read for me, and is inconsistent with the other
abbreviation in the base class "Inst". For most of these (where they are
used widely) I prefer just spelling it out as Instruction. I've changed
two of the short-lived variables to use "Inst" to match the base class.
Chandler Carruth [Fri, 13 Feb 2015 02:10:56 +0000 (02:10 +0000)]
[unroll] Tidy up the integer we use to accumululate the number of
instructions optimized. NFC, just separating this out from the
functionality changing commit.
Zachary Turner [Fri, 13 Feb 2015 01:23:51 +0000 (01:23 +0000)]
Improve llvm-pdbdump output display.
This patch adds a number of improvements to llvm-pdbdump.
1) Dumping of the entire global scope, and not only those
symbols that live in individual compilands.
2) Prepend class name to member functions and data
3) Improved display of bitfields.
4) Support for dumping more kinds of data symbols.