David Majnemer [Sun, 16 Aug 2015 07:09:17 +0000 (07:09 +0000)]
[InstCombine] Replace an and+icmp with a trunc+icmp
Bitwise arithmetic can obscure a simple sign-test. If replacing the
mask with a truncate is preferable if the type is legal because it
permits us to rephrase the comparison more explicitly.
Chandler Carruth [Sun, 16 Aug 2015 06:35:19 +0000 (06:35 +0000)]
Revert r244127: [PM] Remove a failed attempt to port the CallGraph
analysis ...
It turns out that we *do* need the old CallGraph ported to the new pass
manager. There are times where this model of a call graph is really
superior to the one provided by the LazyCallGraph. For example,
GlobalsModRef very specifically needs the model provided by CallGraph.
While here, I've tried to make the move semantics actually work. =]
David Majnemer [Sun, 16 Aug 2015 04:52:11 +0000 (04:52 +0000)]
[X86] Widen the 'AND' mask if doing so shrinks the encoding size
We can set additional bits in a mask given that we know the other
operand of an AND already has some bits set to zero. This can be more
efficient if doing so allows us to use an instruction which implicitly
sign extends the immediate.
Simon Pilgrim [Sat, 15 Aug 2015 13:27:30 +0000 (13:27 +0000)]
[DAGCombiner] Attempt to mask vectors before zero extension instead of after.
For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after a ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors.
(zext (truncate x)) -> (zext (and(x, m))
Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code.
This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles.
Chandler Carruth [Sat, 15 Aug 2015 09:22:21 +0000 (09:22 +0000)]
[PM/AA] Delete the LibCallAliasAnalysis and all the associated
infrastructure.
This AA was never used in tree. It's infrastructure also completely
overlaps that of TargetLibraryInfo which is used heavily by BasicAA to
achieve similar goals to those stated for this analysis.
As has come up in several discussions, the use case here is still really
important, but this code isn't helping move toward that use case. Any
progress on better supporting rich AA information for runtime library
environments would likely be better off starting from scratch or
starting from TargetLibraryInfo than from this base.
Matt Arsenault [Sat, 15 Aug 2015 02:58:49 +0000 (02:58 +0000)]
AMDGPU/SI: Only look at live out SGPR defs
When trying to fix SGPR live ranges, skip defs that are
killed in the same block as the def. I don't think
we need to worry about these cases as long as the
live ranges of the SGPRs in dominating blocks are
correct.
This reduces the number of elements the second
loop over the function needs to look at, and makes
it generally easier to understand. The second loop
also only considers if the live range is live
in to a block, which logically means it
must have been live out from another.
David Majnemer [Sat, 15 Aug 2015 02:46:08 +0000 (02:46 +0000)]
[IR] Give catchret an optional 'return value' operand
Some personality routines require funclet exit points to be clearly
marked, this is done by producing a token at the funclet pad and
consuming it at the corresponding ret instruction. CleanupReturnInst
already had a spot for this operand but CatchReturnInst did not.
Other personality routines don't need to use this which is why it has
been made optional.
JF Bastien [Sat, 15 Aug 2015 01:23:28 +0000 (01:23 +0000)]
[WebAssembly] Add Relooper
This is just an initial checkin of an implementation of the Relooper algorithm, in preparation for WebAssembly codegen to utilize. It doesn't do anything yet by itself.
The Relooper algorithm takes an arbitrary control flow graph and generates structured control flow from that, utilizing a helper variable when necessary to handle irreducibility. The WebAssembly backend will be able to use this in order to generate an AST for its binary format.
JF Bastien [Sat, 15 Aug 2015 01:18:18 +0000 (01:18 +0000)]
Accelerate MergeFunctions with hashing
This patch makes the Merge Functions pass faster by calculating and comparing
a hash value which captures the essential structure of a function before
performing a full function comparison.
The hash is calculated by hashing the function signature, then walking the basic
blocks of the function in the same order as the main comparison function. The
opcode of each instruction is hashed in sequence, which means that different
functions according to the existing total order cannot have the same hash, as
the comparison requires the opcodes of the two functions to be the same order.
The hash function is a static member of the FunctionComparator class because it
is tightly coupled to the exact comparison function used. For example, functions
which are equivalent modulo a single variant callsite might be merged by a more
aggressive MergeFunctions, and the hash function would need to be insensitive to
these differences in order to exploit this.
The hashing function uses a utility class which accumulates the values into an
internal state using a standard bit-mixing function. Note that this is a different interface
than a regular hashing routine, because the values to be hashed are scattered
amongst the properties of a llvm::Function, not linear in memory. This scheme is
fast because only one word of state needs to be kept, and the mixing function is
a few instructions.
The main runOnModule function first computes the hash of each function, and only
further processes functions which do not have a unique function hash. The hash
is also used to order the sorted function set. If the hashes differ, their
values are used to order the functions, otherwise the full comparison is done.
Both of these are helpful in speeding up MergeFunctions. Together they result in
speedups of 9% for mysqld (a mostly C application with little redundancy), 46%
for libxul in Firefox, and 117% for Chromium. (These are all LTO builds.) In all
three cases, the new speed of MergeFunctions is about half that of the module
verifier, making it relatively inexpensive even for large LTO builds with
hundreds of thousands of functions. The same functions are merged, so this
change is free performance.
Matt Arsenault [Sat, 15 Aug 2015 00:53:06 +0000 (00:53 +0000)]
LoopStrengthReduce: Try to pass address space to isLegalAddressingMode
This seems to only work some of the time. In some situations,
this seems to use a nonsensical type and isn't actually aware of the
memory being accessed. e.g. if branch condition is an icmp of a pointer,
it checks the addressing mode of i1.
Matt Arsenault [Sat, 15 Aug 2015 00:12:30 +0000 (00:12 +0000)]
AMDGPU/SI: Make comments more precise.
True branch instructions do behave as expected with liveness.
Avoid the phrasing "branch decision is based on a value in an SGPR"
because this could be misleading. A VALU compare instruction's
result is still based on an SGPR, even though that condition
may be divergent.
[SCEV] Apply NSW and NUW flags via poison value analysis for sub, mul and shl
Summary:
http://reviews.llvm.org/D11212 made Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs for add instructions. This patch expands that to sub, mul and shl instructions.
This change makes LSR able to generate pointer induction variables for loops like these, where the index is 32 bit and the pointer is 64 bit:
for (int i = 0; i < numIterations; ++i)
sum += ptr[i - offset];
for (int i = 0; i < numIterations; ++i)
sum += ptr[i * stride];
for (int i = 0; i < numIterations; ++i)
sum += ptr[3 * (i << 7)];
Pat Gavlin [Fri, 14 Aug 2015 22:41:43 +0000 (22:41 +0000)]
Add a target environment for CoreCLR.
Although targeting CoreCLR is similar to targeting MSVC, there are
certain important differences that the backend must be aware of
(e.g. differences in stack probes, EH, and library calls).
We canonicalize V64 vectors to V128 through insert_subvector: the other
FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't,
so we'd fail to match fmls and generate fneg+fmla instead.
The vector equivalents are already tested and functional.
Michael Kruse [Fri, 14 Aug 2015 20:20:00 +0000 (20:20 +0000)]
[RegionInfo] Remove unused and broken function splitBlock
Summary:
It always makes NewBB the entry of the region instead of OldBB. This breaks if there are edges from inside the region to OldBB. OldBB is moved out of the region and hence there are exiting edges to OldBB and the region's exit block, contradicting the single-exit condition for regions.
The only use from Polly is going to be removed, hence I propose to remove the function completely.
Vedant Kumar [Fri, 14 Aug 2015 18:36:47 +0000 (18:36 +0000)]
[ARM] Fix MachO CPU Subtype selection
This patch makes the Darwin ARM backend take advantage of TargetParser. It
also teaches TargetParser about ARMV7K for the first time. This makes target
triple parsing more consistent across llvm.
This patch fixes the x86 implementation of allowsMisalignedMemoryAccess() to correctly
return the 'Fast' output parameter for 32-byte accesses. To test that, an existing load
merging optimization is changed to use the TLI hook. This exposes a shortcoming in the
current logic and results in the regression test update. Changing other direct users of
the isUnalignedMem32Slow() x86 CPU attribute would be a follow-on patch.
Without the fix in allowsMisalignedMemoryAccesses(), we will infinite loop when targeting
SandyBridge because LowerINSERT_SUBVECTOR() creates 32-byte loads from two 16-byte loads
while PerformLOADCombine() splits them back into 16-byte loads.
This change adds RTTI and Exception flags to llvm-config's cxxflags. This solution is a minimal patch to solve the issue, and is recommended for the 3.7 release branch. Tom Stellard's outstanding work is the longer term solution.
Rafael Espindola [Fri, 14 Aug 2015 13:31:17 +0000 (13:31 +0000)]
Centralize the information about which object format we are using.
Other than some places that were handling unknown as ELF, this should
have no change. The test updates are because we were detecting
arm-coff or x86_64-win64-coff as ELF targets before.
It is not clear if the enum should live on the Triple. At least now it lives
in a single location and should be easier to move somewhere else.
James Molloy [Fri, 14 Aug 2015 09:08:50 +0000 (09:08 +0000)]
[AArch64] FMINNAN/FMAXNAN on f16 is not legal.
Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal.
I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet.
Lang Hames [Fri, 14 Aug 2015 06:26:42 +0000 (06:26 +0000)]
[RuntimeDyld] Make sure code-sections aren't under-aligned.
Code-section alignment should be at least as high as the minimum
stub alignment. If the section alignment is lower it can cause
padding to be emitted resulting in alignment errors if the section
is mapped to a higher alignment on the target.
E.g. If a text section with a 4-byte alignment gets 4-bytes of
padding to guarantee 8-byte alignment for stubs but is re-mapped to
an 8-byte alignment on the target, the 4-bytes of padding will push
the stubs to 4-byte alignment causing a crash.
No test case: There is currently no way to control host section
alignment in llvm-rtdyld. This could be made testable by adding
a custom memory manager. I'll look at that in a follow-up patch.
David Majnemer [Fri, 14 Aug 2015 05:09:07 +0000 (05:09 +0000)]
[IR] Add token types
This introduces the basic functionality to support "token types".
The motivation stems from the need to perform operations on a Value
whose provenance cannot be obscured.
There are several applications for such a type but my immediate
motivation stems from WinEH. Our personality routine enforces a
single-entry - single-exit regime for cleanups. After several rounds of
optimizations, we may be left with a terminator whose "cleanup-entry
block" is not entirely clear because control flow has merged two
cleanups together. We have experimented with using labels as operands
inside of instructions which are not terminators to indicate where we
came from but found that LLVM does not expect such exotic uses of
BasicBlocks.
Instead, we can use this new type to clearly associate the "entry point"
and "exit point" of our cleanup. This is done by having the cleanuppad
yield a Token and consuming it at the cleanupret.
The token type makes it impossible to obscure or otherwise hide the
Value, making it trivial to track the relationship between the two
points.
What is the burden to the optimizer? Well, it turns out we have already
paid down this cost by accepting that there are certain calls that we
are not permitted to duplicate, optimizations have to watch out for
such instructions anyway. There are additional places in the optimizer
that we will probably have to update but early examination has given me
the impression that this will not be heroic.
Chandler Carruth [Fri, 14 Aug 2015 02:50:34 +0000 (02:50 +0000)]
[PM/AA] Hoist the value handle definition for CFLAA into the header to
satisfy libc++'s std::forward_list which requires the value type to be
complete.
Chandler Carruth [Fri, 14 Aug 2015 02:42:20 +0000 (02:42 +0000)]
[PM/AA] Extract a minimal interface for CFLAA to its own header file.
I've used forward declarations and reorderd the source code some to make
this reasonably clean and keep as much of the code as possible in the
source file, including all the stratified set details. Just the basic AA
interface and the create function are in the header file, and the header
file is now included into the relevant locations.
Chandler Carruth [Fri, 14 Aug 2015 02:16:12 +0000 (02:16 +0000)]
[PM/AA] Delete two pointlessly overridden methods on the AA interface by
the AA counter pass.
For pointsToConstantMemory, I think this is a "bug fix" as I think the
code as written will actually infloop if ever reached. For the
getModRefInfo, this is a no-op change but with a significantly simpler
form.
Chandler Carruth [Fri, 14 Aug 2015 02:05:41 +0000 (02:05 +0000)]
[PM/AA] Hoist the AA counter pass into a header to match the analysis
pattern.
Also hoist the creation routine out of the generic header and into the
pass header now that we have one.
I've worked to not make any changes, even formatting ones here. I'll
clean up the formatting and other things in a follow-up patch now that
the code is in the right place.
Jingyue Wu [Fri, 14 Aug 2015 02:02:05 +0000 (02:02 +0000)]
[SeparateConstOffsetFromGEP] sext(a)+sext(b) => sext(a+b) when a+b can't sign-overflow.
Summary:
This patch implements my promised optimization to reunites certain sexts from
operands after we extract the constant offset. See the header comment of
reuniteExts for its motivation.
One key building block that enables this optimization is Bjarke's poison value
analysis (D11212). That helps to prove "a +nsw b" can't overflow.
Chandler Carruth [Fri, 14 Aug 2015 00:21:10 +0000 (00:21 +0000)]
[LIR] Re-instate r244880, reverted in r244884, factoring the handling of
AliasAnalysis in LoopIdiomRecognize.
The previous commit to LIR, r244879, exposed some scary bug in the loop
pass pipeline with an assert failure that showed up on several bots.
This patch got reverted as part of getting that revision reverted, but
they're actually independent and unrelated. This patch has no functional
change and should be completely safe. It is also useful for my current
work on the AA infrastructure.
Alex Lorenz [Thu, 13 Aug 2015 23:10:16 +0000 (23:10 +0000)]
MIR Serialization: Change MIR syntax - use custom syntax for MBBs.
This commit modifies the way the machine basic blocks are serialized - now the
machine basic blocks are serialized using a custom syntax instead of relying on
YAML primitives. Instead of using YAML mappings to represent the individual
machine basic blocks in a machine function's body, the new syntax uses a single
YAML block scalar which contains all of the machine basic blocks and
instructions for that function.
This is an example of a function's body that uses the old syntax:
This syntax change is motivated by the fact that the bundled machine
instructions didn't map that well to the old syntax which was using a single
YAML sequence to store all of the machine instructions in a block. The bundled
machine instructions internally use flags like BundledPred and BundledSucc to
determine the bundles, and serializing them as MI flags using the old syntax
would have had a negative impact on the readability and the ease of editing
for MIR files. The new syntax allows me to serialize the bundled machine
instructions using a block construct without relying on the internal flags,
for example:
This commit also converts the MIR testcases to the new syntax. I developed
a script that can convert from the old syntax to the new one. I will post the
script on the llvm-commits mailing list in the thread for this commit.
Ahmed Bougacha [Thu, 13 Aug 2015 21:09:13 +0000 (21:09 +0000)]
[AArch64] Provide "too few operands" diags on short-form NEON also.
We used to just say "invalid type suffix for instruction", which is
misleading. This is because we fallback to the long-form matcher if the
short-form matcher failed, losing the error information on the way.
Save it, so that we can provide a little better diagnostics when the
long-form matcher thinks a suffix is the cause of the error.