Dan Gohman [Tue, 8 Sep 2015 20:36:33 +0000 (20:36 +0000)]
[WebAssembly] Support running without a register allocator in the default CodeGen passes
This allows backends which don't use a traditional register allocator,
but do need PHI lowering and other passes, to use the default
TargetPassConfig::addFastRegAlloc and
TargetPassConfig::addOptimizedRegAlloc implementations.
Matt Arsenault [Tue, 8 Sep 2015 19:54:32 +0000 (19:54 +0000)]
AMDGPU: Mark s_barrier as a high latency instruction
These were marked as WriteSALU, which is low latency.
I'm guessing at the value to use, but it should probably
be considered the highest latency instruction.
I'm not sure this has any actual effect since hasSideEffects
probably is preventing any moving of these.
The old implementation assumed LP64 which is broken for x32. Specifically, the
MOVE8rm_NOREX and MOVE8mr_NOREX, when selected, would cause a 'Cannot emit
physreg copy instruction' error message to be reported.
This patch also enable the h-register*ll tests for x32.
Diego Novillo [Tue, 8 Sep 2015 19:22:17 +0000 (19:22 +0000)]
Fix PR 24723 - Handle 0-mass backedges in irreducible loops
This corner case happens when we have an irreducible SCC that is
deeply nested. As we work down the tree, the backedge masses start
getting smaller and smaller until we reach one that is down to 0.
Since we distribute the incoming mass using the backedge masses as
weight, the distributor does not allow zero weights. So, we simply
ignore them (which will just use the weights of the non-zero nodes).
Davide Italiano [Tue, 8 Sep 2015 18:59:47 +0000 (18:59 +0000)]
[MC/ELF] Accept zero for .align directive
.align directive refuses alignment 0 -- a comment in the code hints this is
done for GNU as compatibility, but it seems GNU as accepts .align 0
(and silently rounds up alignment to 1).
Benjamin Kramer [Tue, 8 Sep 2015 18:36:56 +0000 (18:36 +0000)]
Merge or combine tests and convert to FileCheck.
- Move tests only exercising instsimplify to instsimplify's apint-or.ll
- Actually test the CHECK lines in instsimplify's apint-or.ll
- Merge the remaining tests in apint-or1.ll and apint-or2.ll, use FileCheck
Jakub Kuderski [Tue, 8 Sep 2015 10:03:17 +0000 (10:03 +0000)]
There is a trunc(lshr (zext A), Cst) optimization in InstCombineCasts that
removes cast by performing the lshr on smaller types. However, currently there
is no trunc(lshr (sext A), Cst) variant.
This patch add such optimization by transforming trunc(lshr (sext A), Cst)
to ashr A, Cst.
David Majnemer [Sun, 6 Sep 2015 06:49:59 +0000 (06:49 +0000)]
[InstCombine] Don't divide by zero when evaluating a potential transform
Trivial multiplication by zero may survive the worklist. We tried to
reassociate the multiplication with a division instruction, causing us
to divide by zero; bail out instead.
Hal Finkel [Sun, 6 Sep 2015 05:42:13 +0000 (05:42 +0000)]
[SelectionDAG] Swap commutative binops before constant-based folding
In searching for a fix for the underlying code-quality bug highlighted by
r246937 (that SDAG simplification can lead to us generating an ISD::OR node
with a constant zero LHS), I ran across this:
We generically canonicalize commutative binary-operation nodes in SDAG getNode
so that, if only one operand is a constant, it will be on the RHS. However, we
were doing this only after a bunch of constant-based simplification checks that
all assume this canonical form (that any constant will be on the RHS). Moving
the operand-swapping canonicalization prior to these checks seems like the
right thing to do (and, as it turns out, causes SDAG to completely fold away the
computation in test/CodeGen/ARM/2012-11-14-subs_carry.ll, just like InstCombine
would do).
To commute a trivial rlwimi instructions (meaning one with a full mask and zero
shift), we'd need to ability to form an all-zero mask (instead of an all-one
mask) using rlwimi. We can't represent this, however, and we'll miscompile code
if we try.
The code quality problem that this highlights (that SDAG simplification can
lead to us generating an ISD::OR node with a constant zero LHS) will be fixed
as a follow-up.
Summary:
This diff attempts to address the concerns raised in
http://reviews.llvm.org/D12488.
We introduce a new USE_SHARED option to llvm_config,
which, if set, causes the target to be linked against
libLLVM.
add_llvm_utility now uniformly disables linking against
libLLVM. These utilities are not intended for distribution,
and this keeps the option handling more centralised.
llvm-shlib is now processes before any other "tools"
subdirectories, ensuring the libLLVM target is defined
before its dependents.
One main difference from what was requested: llvm_config
does not prune LLVM_DYLIB_COMPONENTS from the components
passed into explicit_llvm_config. This is because the "all"
component does something special, adding additional
libraries (namely libLTO). Adding the component libraries
after libLLVM should not be a problem, as symbols will be
resolved in libLLVM first.
Finally, I'm not really happy with the
DISABLE_LLVM_LINK_LLVM option, but I'm not sure of a
better way to get the following:
- link all tools and shared libraries to libLLVM if
LLVM_LINK_LLVM_DYLIB is set
- some way of explicitly *not* doing so for utilities
and libLLVM itself
Suggestions for improvement here are particularly welcome.
PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from
an input pattern that looks like this:
and(or(x, c1), c2)
but the associated logic does not work if there are bits that are 1 in c1 but 0
in c2 (these are normally canonicalized away, but that can't happen if the 'or'
has other users. Make sure we abort the transformation if such bits are
discovered.
Richard Smith [Fri, 4 Sep 2015 04:08:36 +0000 (04:08 +0000)]
Fix APInt value initialization to give a zero value as any sane integer type
should, rather than giving a broken value that doesn't even zero/sign-extend
properly.
Hal Finkel [Fri, 4 Sep 2015 00:10:41 +0000 (00:10 +0000)]
[PowerPC] Enable interleaved-access vectorization
This adds a basic cost model for interleaved-access vectorization (and a better
default for shuffles), and enables interleaved-access vectorization by default.
The relevant difference from the default cost model for interleaved-access
vectorization, is that on PPC, the shuffles that end up being used are *much*
cheaper than modeling the process with insert/extract pairs (which are
quite expensive, especially on older cores).
Hal Finkel [Thu, 3 Sep 2015 23:23:00 +0000 (23:23 +0000)]
[PowerPC] Always use aggressive interleaving on the A2
On the A2, with an eye toward QPX unaligned-load merging, we should always use
aggressive interleaving. It is generally superior to only using concatenation
unrolling.
Hal Finkel [Thu, 3 Sep 2015 22:37:44 +0000 (22:37 +0000)]
[PowerPC] Try harder to find a base+offset when looking for consecutive accesses
When forming permutation-based unaligned vector loads, we need to know whether
it is valid to read ahead of the requested address by a full vector length.
Doing so is more efficient (and allows for more CSE with later loads), but
could trigger a page fault if invalid. To determine validity, we look for other
loads in the same block that access the relevant address range.
The relevant point here is that we need to do this as part of the process of
forming permutation-based vector loads, and this happens quite early in the
SDAG pipeline - specifically before many of the address calculations are fully
canonicalized. As a result, we need to try harder to recognize base+offset
address computations, because they still might appear as chain of adds
(base+offset+offset, for example). To account for this, we'll look through
chains of adds, accumulating the constant offsets.
Philip Reames [Thu, 3 Sep 2015 21:34:30 +0000 (21:34 +0000)]
[RewriteStatepointsForGC] Strengthen invariants around BDVs
As a first step towards a new implementation of the base pointer inference algorithm, introduce an abstraction for BDVs, strengthen the assertions around them, and rewrite the BDV relation code in terms of the abstraction which includes an explicit notion of whether the BDV is also a base. The later is motivated by the fact we had a bug where insertelement was always assumed to be a base pointer even though the BDV code knew it wasn't. The strengthened assertions in this patch would have caught that bug.
The next step will be to separate the DefiningValueMap into a BDV use list cache (entirely within findBasePointers) and a base pointer cache. Having the former will allow me to use a deterministic visit order when visiting BDVs in the inference algorithm and remove a bunch of ordering related hacks. Before actually doing the last step, I'm likely going to extend the lattice with a 'BaseN' (seen only base inputs) state so that I can kill the post process optimization step.
Hal Finkel [Thu, 3 Sep 2015 21:23:18 +0000 (21:23 +0000)]
[PowerPC] Include the permutation cost for unaligned vector loads
Pre-P8, when we generate code for unaligned vector loads (for Altivec and QPX
types), even when accounting for the combining that takes place for multiple
consecutive such loads, there is at least one load instructions and one
permutation for each load. Make sure the cost reported reflects the cost of the
permutes as well.
Hal Finkel [Thu, 3 Sep 2015 21:12:15 +0000 (21:12 +0000)]
[PowerPC] Compute the MMO offset for an unaligned load with signed arithmetic
If you compute the MMO offset using unsigned arithmetic, you end up with a
large positive offset instead of a small negative one. In theory, this could
cause bad instruction-scheduling decisions later.
I noticed this by inspection from the debug output, and using that for the
regression test is the best I can do right now.
Philip Reames [Thu, 3 Sep 2015 20:24:29 +0000 (20:24 +0000)]
[RewriteStatepointsForGC] Workaround a lack of determinism in visit order
The visit order being used in the base pointer inference algorithm is currently non-deterministic. When working on http://reviews.llvm.org/D12583, I discovered that we were relying on a peephole optimization to get deterministic ordering in one of the test cases.
This change is intented to let me test and land http://reviews.llvm.org/D12583. The current code will not be long lived. I'm starting to investigate a rewrite of the algorithm which will combine the post-process step into the initial algorithm and make the visit order determistic. Before doing that, I wanted to make sure the existing code was complete and the test were stable. Hopefully, patches should be up for review for the new algorithm this week or early next.
Dan Liew [Thu, 3 Sep 2015 18:43:56 +0000 (18:43 +0000)]
Try to clarify the semantics of fptrunc
* ``the value cannot fit within the destination type`` is ambiguous.
It could mean overflow, underflow (not in the IEEE-754 sense) or a
result that cannot be exactly represented and requires rounding or it
could mean some combination of these. The semantics now state it means
overflow **only**.
* Using "truncation" in the semantics is very misleading given that it
doesn't necessarily truncate (i.e. round to zero). For example on
x86_64 with SSE2 this is currently mapped to cvtsd2ss instruction
who's rounding behaviour is dependent on the MXCSR register which
is usually set to round to nearest even by default. The semantics
now state that the rounding mode is undefined.
Karl Schimpf [Thu, 3 Sep 2015 18:06:44 +0000 (18:06 +0000)]
Allow global address space forward decls using IDs in .ll files.
Summary:
This fixes bugzilla bug 24656. Fixes the case where there is a forward
reference to a global variable using an ID (i.e. @0). It does this by
passing the address space of the initializer pointer for which the
forward referenced global is used.