Nadav Rotem [Sun, 10 Nov 2013 04:13:31 +0000 (04:13 +0000)]
SimplifyCFG has a heuristics for out-of-order processors that decides when it is worthwhile to merge branches. It tries to estimate if the operands of the instruction that we want to hoist are ready. This commit marks function arguments as 'ready' because they require no calculation. This boosts libquantum and a few other workloads from the testsuite.
Reed Kotler [Sun, 10 Nov 2013 00:09:26 +0000 (00:09 +0000)]
Mostly finish up constant islands port for Mips for load constants.
Still need to finish the branch part. Still lots more review of the code,
clean up and testing.
Logan Chien [Sat, 9 Nov 2013 14:16:52 +0000 (14:16 +0000)]
[arm] Refine ARMBuildAttrs.h.
This commit cleans up some comments in ARMBuildAttrs.h.
Besides, this commit fixes an error related to AllowWMMXv1
and AllowWMMXv2 (although they are not used currently.)
[PM] Start sketching out the new module and function pass manager.
This is still just a skeleton. I'm trying to pull together the
experimentation I've done into committable chunks, and this is the first
coherent one. Others will follow in hopefully short order that move this
more toward a useful initial implementation. I still expect the design
to continue evolving in small ways as I work through the different
requirements and features needed here though.
Keep in mind, all of this is off by default.
Currently, this mostly exercises the use of a polymorphic smart pointer
and templates to hide the polymorphism for the pass manager from the
pass implementation. The next step will be more significant, adding the
first framework of analysis support.
Move the old pass manager infrastructure into a legacy namespace and
give the files a legacy prefix in the right directory. Use forwarding
headers in the old locations to paper over the name change for most
clients during the transitional period.
No functionality changed here! This is just clearing some space to
reduce renaming churn later on with a new system.
Even when the new stuff starts to go in, it is going to be hidden behind
a flag and off-by-default as it is still WIP and under development.
This patch is specifically designed so that very little out-of-tree code
has to change. I'm going to work as hard as I can to keep that the case.
Only direct forward declarations of the PassManager class are impacted
by this change.
Use something really explicit to test "move semantics" on builds without
r-value references. I still want to test that when we have them,
llvm_move is actually a move.
Have I mentioned that I really want to move to C++11? ;]
Add a polymorphic_ptr<T> smart pointer data type. It's a somewhat silly
unique ownership smart pointer which is *deep* copyable by assuming it
can call a T::clone() method to allocate a copy of the owned data.
This is mostly useful with containers or other collections of uniquely
owned data in C++98 where they *might* copy. With C++11 we can likely
remove this in favor of move-only types and containers wrapped around
those types.
Akira Hatanaka [Sat, 9 Nov 2013 02:38:51 +0000 (02:38 +0000)]
[mips] Make sure there is a chain edge dependency between loads that read
formal arguments on the stack and stores created afterwards. We need this to
ensure tail call optimized function calls do not write over the argument area
of the stack before it is read out.
[Stackmap] Materialize the jump address within the patchpoint noop slide.
This patch moves the jump address materialization inside the noop slide. This
enables patching of the materialization itself or its complete removal. This
patch also adds the ability to define scratch registers that can be used safely
by the code called from the patchpoint intrinsic. At least one scratch register
is required, because that one is used for the materialization of the jump
address. This patch depends on D2009.
Lang Hames [Sat, 9 Nov 2013 00:14:07 +0000 (00:14 +0000)]
Rewrite the PBQP graph data structure.
The new graph structure replaces the node and edge linked lists with vectors.
Free lists (well, free vectors) are used for fast insertion/deletion.
The ultimate aim is to make PBQP graphs cheap to clone. The motivation is that
the PBQP solver destructively consumes input graphs while computing a solution,
forcing the graph to be fully reconstructed for each round of PBQP. This
imposes a high cost on large functions, which often require several rounds of
solving/spilling to find a final register allocation. If we can cheaply clone
the PBQP graph and incrementally update it between rounds then hopefully we can
reduce this cost. Further, once we begin pooling matrix/vector values (future
work), we can cache some PBQP solver metadata and share it between cloned
graphs, allowing the PBQP solver to re-use some of the computation done in
earlier rounds.
For now this is just a data structure update. The allocator and solver still
use the graph the same way as before, fully reconstructing it between each
round. I expect no material change from this update, although it may change
the iteration order of the nodes, causing ties in the solver to break in
different directions, and this could perturb the generated allocations
(hopefully in a completely benign way).
Thanks very much to Arnaud Allard de Grandmaison for encouraging me to get back
to work on this, and for a lot of discussion and many useful PBQP test cases.
[Stackmap] Add AnyReg calling convention support for patchpoint intrinsic.
The idea of the AnyReg Calling Convention is to provide the call arguments in
registers, but not to force them to be placed in a paticular order into a
specified set of registers. Instead it is up tp the register allocator to assign
any register as it sees fit. The same applies to the return value (if
applicable).
Pedro Artigas [Fri, 8 Nov 2013 22:46:28 +0000 (22:46 +0000)]
increase the accuracy of register pressure computation in the presence of dead definitions by using live intervals, if available, to identify dead definitions and proceed accordingly.
Jim Grosbach [Fri, 8 Nov 2013 22:33:06 +0000 (22:33 +0000)]
X86: Assembly files with .cfi_cfa_def shouldn't hit llvm_unreachable()
On darwin, when trying to create compact unwind info, a .cfi_cfa_def
directive would case an llvm_unreachable() to be hit. Back off when we
see this directive and generate the regular DWARF style eh_frame.
Hal Finkel [Fri, 8 Nov 2013 19:58:21 +0000 (19:58 +0000)]
Remove dead code from LoopUnswitch
LoopUnswitch's code simplification routine has logic to convert conditional
branches into unconditional branches, after unswitching makes the condition
constant, and then remove any blocks that renders dead. Unfortunately, this
code is dead, currently broken, and furthermore, has never been alive (at least
as far back at 2006).
[VirtRegMap] Fix for PR17825. Do not ignore noreturn definitions when setting
isPhysRegUsed if the unwind information is required.
Indeed, the runtime may need a correct stack to be able to unwind the call.
Based on discussions with Lang Hames and Jakob Stoklund Olesen at the hacker's lab, and in the light of upcoming work on the PBQP register allocator, it was though that CalcSpillWeights does not need to be a pass. This change will enable to customize / tune the spill weight computation depending on the allocator.
Tim Northover [Fri, 8 Nov 2013 17:18:07 +0000 (17:18 +0000)]
ARM: fold prologue/epilogue sp updates into push/pop for code size
ARM prologues usually look like:
push {r7, lr}
sub sp, sp, #4
If code size is extremely important, this can be optimised to the single
instruction:
push {r6, r7, lr}
where we don't actually care about the contents of r6, but pushing it subtracts
4 from sp as a side effect.
This should implement such a conversion, predicated on the "minsize" function
attribute (-Oz) since I've yet to find any code it actually makes faster.
Artyom Skrobov [Fri, 8 Nov 2013 09:16:31 +0000 (09:16 +0000)]
[ARM] In ARMAsmParser, MatchCoprocessorOperandName() permitted p10 and p11 as operands for coprocessor instructions, resulting in encodings that clash with FP/NEON instruction encodings
David Majnemer [Thu, 7 Nov 2013 22:15:53 +0000 (22:15 +0000)]
IR: Do not canonicalize constant GEPs into an out-of-bounds array access
Summary:
Consider a GEP of:
i8* getelementptr ({ [2 x i8], i32, i8, [3 x i8] }* @main.c, i32 0, i32 0, i64 0)
If we proceeded to GEP the aforementioned object by 8, would form a GEP of:
i8* getelementptr ({ [2 x i8], i32, i8, [3 x i8] }* @main.c, i32 0, i32 0, i64 8)
Note that we would go through the first array member, causing an
out-of-bounds accesses. This is problematic because we might get fooled
if we are trying to evaluate loads using this GEP, for example, based
off of an object with a constant initializer where the array is zero.
Bill Wendling [Thu, 7 Nov 2013 20:14:51 +0000 (20:14 +0000)]
Move copying of global initializers below the cloning of functions.
The BlockAddress doesn't have access to the correct basic blocks until the
functions have been cloned. This causes the BlockAddress to point to the old
values. Just wait until the functions have been cloned before copying the
initializers.
PR13163
Reed Kotler [Thu, 7 Nov 2013 11:56:33 +0000 (11:56 +0000)]
Disable some code that is causing some warnings. It's in the process
of being converted and this path is not relevant to anything at this time
so I have just disabled it for a few days while I'm at the LLVM conference
and don't have time to complete it or properly fix it.
Add the fact that we anticipate switching to use (some subset of) C++11
after the 3.4 release to the release notes. See the *lengthy* llvmdev
and cfe-dev threads on this subject. There will be more emails,
discussion and announcements, but I want to make noise in as many places
as I can to get everyone's concerns voiced and understood.
Andrew Trick [Wed, 6 Nov 2013 02:08:26 +0000 (02:08 +0000)]
Rewrite SCEV's backedge taken count computation.
Patch by Michele Scandale!
Rewrite of the functions used to compute the backedge taken count of a
loop on LT and GT comparisons.
I decided to split the handling of LT and GT cases becasue the trick
"a > b == -a < -b" in some cases prevents the trip count computation
due to the multiplication by -1 on the two operands of the
comparison. This issue comes from the conservative computation of
value range of SCEVs: taking the negative SCEV of an expression that
have a small positive range (e.g. [0,31]), we would have a SCEV with a
fullset as value range.
Indeed, in the new rewritten function I tried to better handle the
maximum backedge taken count computation when MAX/MIN expression are
used to handle the cases where no entry guard is found.
Some test have been modified in order to check the new value correctly
(I manually check them and reasoning on possible overflow the new
values seem correct).
I finally added a new test case related to the multiplication by -1
issue on GT comparisons.
Remove another unused, and IMHO, not very desirable feature of ErrorOr.
One of the uses of the IsValid flag is to support default constructing
a ErrorOr that is not a Error or a Value. There is not much value in
doing that IMHO. If ErrorOr was to have a default constructor, it
should be implemented by default constructing the value, but even that
looks unnecessary.
The other use is to avoid calling destructors on moved objects. This
looks wrong. If the data being moved has non trivial treatment of
moves (an std::vector for example), it is its destructor that should
handle it, not ~ErrorOr.
With this change ErrorOr becomes a fairly simple wrapper and should
always be better than using an error_code + value in an API.
Andrew Trick [Tue, 5 Nov 2013 22:44:04 +0000 (22:44 +0000)]
Slightly change the way stackmap and patchpoint intrinsics are lowered.
MorphNodeTo is not safe to call during DAG building. It eagerly
deletes dependent DAG nodes which invalidates the NodeMap. We could
expose a safe interface for morphing nodes, but I don't think it's
worth it. Just create a new MachineNode and replaceAllUsesWith.
My understaning of the SD design has been that we want to support
early target opcode selection. That isn't very well supported, but
generally works. It seems reasonable to rely on this feature even if
it isn't widely used.
Reed Kotler [Tue, 5 Nov 2013 22:34:29 +0000 (22:34 +0000)]
Get rid of all references to soimm in MipsConstantIslands pass because
we don't have such an operand.
Suprisingly enough, this is never actually accounted for in the
ARM version when determining offset ranges. In both places there is the
comment:
- // FIXME: Make use full range of soimm values.
(soimm = shift operand immediate).
Tim Northover [Tue, 5 Nov 2013 21:36:02 +0000 (21:36 +0000)]
ARM: permit bare dmb/dsb/isb aliases on Cortex-M0
Cortex-M0 supports these 32-bit instructions despite being Thumb1 only
(mostly). We knew about that but not that the aliases without the default "sy"
operand were also permitted.
[objc-arc] Convert the one directional retain/release relation assert to a conditional check + fail.
Due to the previously added overflow checks, we can have a retain/release
relation that is one directional. This occurs specifically when we run into an
additive overflow causing us to drop state in only one direction. If that
occurs, we should bail and not optimize that retain/release instead of
asserting.
Apologies for the size of the testcase. It is necessary to cause the additive
cfg overflow to trigger.