James Molloy [Sat, 16 May 2015 21:27:14 +0000 (21:27 +0000)]
Revert commits r237521 and r237520.
The AArch64 LNT bot is unhappy - I've found that the problem is in
SimpliftDemandedBits, but that's going to require another code review
so reverting in the meantime.
James Molloy [Sat, 16 May 2015 13:10:45 +0000 (13:10 +0000)]
Reapply r237453 with a fix for the test timeouts.
The test timeouts were due to instcombine fighting itself. Regression test added.
Original log message:
Canonicalize min/max expressions correctly.
This patch introduces a canonical form for min/max idioms where one operand
is extended or truncated. This often happens when the other operand is a
constant. For example:
This builds upon a patch posted by David Majenemer
(https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
passively stopped instcombine from ruining canonical patterns. This
patch additionally actively makes instcombine canonicalize too.
Canonicalization of expressions involving a change in type from int->fp
or fp->int are not yet implemented.
Daniel Sanders [Sat, 16 May 2015 12:09:54 +0000 (12:09 +0000)]
[x86] Distinguish the 'o', 'v', 'X', and 'i' inline assembly memory constraints.
Summary:
But still handle them the same way since I don't know how they differ on
this target.
Of these, 'o' and 'v' are not tested but were already implemented.
I'm not sure why 'i' is required for X86 since it's supposed to be an
immediate constraint rather than a memory constraint. A test asserts
without it so I've included it for now.
Craig Topper [Sat, 16 May 2015 05:42:03 +0000 (05:42 +0000)]
[TableGen] Restructure a loop to make it exit early instead of skipping a portion of the body based on what will also be the terminating condition. NFC
Matthias Braun [Sat, 16 May 2015 03:11:07 +0000 (03:11 +0000)]
MachineSink: Collect registers before clearing their killflags.
Currently whenever we sink any instruction, we do clearKillFlags for
every use of every use operand for that instruction, apparently there
are a lot of duplication, therefore compile time penalties.
This patch collect all the interested registers first, do clearKillFlags
for it all together at once at the end, so we only need to do
clearKillFlags once for one register, duplication is avoided.
Ahmed Bougacha [Sat, 16 May 2015 01:32:26 +0000 (01:32 +0000)]
[MemCpyOpt] Turn memcpy from just-memset'd source into memset.
There's no point in copying around constants, so, when all else fails,
we can still transform memcpy of memset into two independent memsets.
To quote the example, we can turn:
memset(dst1, c, dst1_size);
memcpy(dst2, dst1, dst2_size);
into:
memset(dst1, c, dst1_size);
memset(dst2, c, dst2_size);
When dst2_size <= dst1_size.
Like r235232 for copy constructors, this can occur in move constructors.
Bill Schmidt [Sat, 16 May 2015 01:02:12 +0000 (01:02 +0000)]
[PPC64] Add vector pack/unpack support from ISA 2.07
This patch adds support for the following new instructions in the
Power ISA 2.07:
vpksdss
vpksdus
vpkudus
vpkudum
vupkhsw
vupklsw
These instructions are available through the vec_packs, vec_packsu,
vec_unpackh, and vec_unpackl built-in interfaces. These are
lane-sensitive instructions, so the built-ins have different
implementations for big- and little-endian, and the instructions must
be marked as killing the vector swap optimization for now.
The first three instructions perform saturating pack operations. The
fourth performs a modulo pack operation, which means it can be
represented with a vector shuffle, and conversely the appropriate
vector shuffles may cause this instruction to be generated. The other
instructions are only generated via built-in support for now.
Appropriate tests have been added.
There is a companion patch to clang for the rest of this support.
MC: Change MCAssembler::Symbols to store MCSymbol, NFC
Instead of storing a list of the `MCSymbolData` in use, store the
`MCSymbol`s. Churning in the direction of removing the back pointer
from `MCSymbolData`.
Turn `MCSymbolData` into a field inside of `MCSymbol`. Keep all the old
API alive for now, so that consumers can be updated in a later commit.
This means we still temporarily need the back pointer from
`MCSymbolData` to `MCSymbol`, but I'll remove it in a follow-up.
This optimizes for object emission over assembly emission. By removing
the `DenseMap` in `MCAssembler`, llc memory usage drops from around 1040
MB to 1001 MB (3.8%).
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
Instead of an intrusive double-linked linked list, use a
`std::vector<>`. This saves a pointer per symbol and simplifies
`MCSymbolData`. Otherwise, no functionality change here.
While I measured a memory drop from around 1047MB to 1040MB (0.6%) --
and this is a decent cleanup in its own right -- it's primarily a
preparation patch for merging `MCSymbol` and `MCSymbolData`. I'll post
an updated patch for that to the list in a moment.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
Stop exposing the storage for `MCAssembler::Symbols`, and have
`MCAssembler` add symbols directly to its list instead of using a hook
in `MCSymbolData`. This opens up room for a follow-up commit to switch
from a linked list to a vector.
David Majnemer [Fri, 15 May 2015 20:08:27 +0000 (20:08 +0000)]
[X86] Use a better sentinel offset for the FrameAddr index
Other pieces of CodeGen want to negate frame object offsets to account
for architectures where the stack grows down. Our object is a pseudo
object so it's offset doesn't matter. However, we shouldn't choose an
offset which results in undefined behavior if you negate it.
Jingyue Wu [Fri, 15 May 2015 17:54:48 +0000 (17:54 +0000)]
Add a speculative execution pass
Summary:
This is a pass for speculative execution of instructions for simple if-then (triangle) control flow. It's aimed at GPUs, but could perhaps be used in other contexts. Enabling this pass gives us a 1.0% geomean improvement on Google benchmark suites, with one benchmark improving 33%.
Credit goes to Jingyue Wu for writing an earlier version of this pass.
Patched by Bjarke Roune.
Test Plan:
This patch adds a set of tests in test/Transforms/SpeculativeExecution/spec.ll
The pass is controlled by a flag which defaults to having the pass not run.
James Molloy [Fri, 15 May 2015 17:41:29 +0000 (17:41 +0000)]
[SDAGBuilder] Make the AArch64 builder happier.
I intended this loop to only unwrap SplitVector actions, but it
was more broad than that, such as unwrapping WidenVector actions,
which makes operations seem legal when they're not.
James Molloy [Fri, 15 May 2015 16:10:59 +0000 (16:10 +0000)]
Canonicalize min/max expressions correctly.
This patch introduces a canonical form for min/max idioms where one operand
is extended or truncated. This often happens when the other operand is a
constant. For example:
This builds upon a patch posted by David Majenemer
(https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
passively stopped instcombine from ruining canonical patterns. This
patch additionally actively makes instcombine canonicalize too.
Canonicalization of expressions involving a change in type from int->fp
or fp->int are not yet implemented.
James Molloy [Fri, 15 May 2015 16:04:50 +0000 (16:04 +0000)]
Allow min/max detection to see through casts.
This teaches the min/max idiom detector in ValueTracking to see through
casts such as SExt/ZExt/Trunc. SCEV can already do this, so we're bringing
non-SCEV analyses up to the same level.
The returned LHS/RHS will not match the type of the original SelectInst
any more, so a CastOp is returned too to inform the caller how to
convert to the SelectInst's type.
No in-tree users yet; this will be used by InstCombine in a followup.
Nemanja Ivanovic [Fri, 15 May 2015 15:29:53 +0000 (15:29 +0000)]
NFC - Test case invokes llc on a file rather than redirected from a file.
This has caused some local failures. Updating the test case to be more
like the majority of the similar test cases.
Committing on behalf of Hubert Tong (hstong@ca.ibm.com).
Toma Tabacu [Fri, 15 May 2015 09:42:11 +0000 (09:42 +0000)]
[mips] [IAS] Fix expansion of negative 32-bit immediates for LI/DLI.
Summary:
To maintain compatibility with GAS, we need to stop treating negative 32-bit immediates as 64-bit values when expanding LI/DLI.
This currently happens because of sign extension.
To do this we need to choose the 32-bit value expansion for values which use their upper 33 bits only for sign extension (i.e. no 0's, only 1's).
Sanjoy Das [Fri, 15 May 2015 00:26:21 +0000 (00:26 +0000)]
[PlaceSafepoints] Fix a bug that came in with rL236672.
Transfer the calling convention from the invoke being replaced by
PlaceStatepoints to the new invoke to gc.statepoint created. Add a test
case that would have caught this issue.
Akira Hatanaka [Fri, 15 May 2015 00:20:44 +0000 (00:20 +0000)]
Stop resetting SanitizeAddress in TargetMachine::resetTargetOptions. NFC.
Instead of doing that, create a temporary copy of MCTargetOptions and reset its
SanitizeAddress field based on the function's attribute every time an InlineAsm
instruction is emitted in AsmPrinter::EmitInlineAsm.
This is part of the work to remove TargetMachine::resetTargetOptions (the FIXME
added to TargetMachine.cpp in r236009 explains why this function has to be
removed).
Alex Lorenz [Thu, 14 May 2015 23:08:22 +0000 (23:08 +0000)]
YAML: Add support for literal block scalar I/O.
This commit gives the users of the YAML Traits I/O library
the ability to serialize scalars using the YAML literal block
scalar notation by allowing them to implement a specialization
of the `BlockScalarTraits` struct for their custom types.
[lib/Fuzzer] Add SHA1 implementation from public domain.
Summary:
This adds a SHA1 implementation taken from public domain code.
The change is trivial, but as it involves third-party code I'd like
a second pair of eyes before commit.
LibFuzzer can not use SHA1 from openssl because openssl may not be available
and because we may be fuzzing openssl itself.
Using sha1sum via a pipe is too slow.
Alex Lorenz [Thu, 14 May 2015 20:46:12 +0000 (20:46 +0000)]
Fix memory leak introduced in r237314.
The commit r237314 that implements YAML block parsing
introduced a leak that was caught by the ASAN linux buildbot.
YAML Parser stores its tokens in an ilist, and allocates
tokens using a BumpPtrAllocator, but doesn't call the
destructor for the allocated tokens. R237314 added an
std::string field to a Token which leaked as the Token's
destructor wasn't called. This commit fixes this leak
by calling the Token's destructor when a Token is being
removed from an ilist of tokens.
Andrea Di Biagio [Thu, 14 May 2015 18:01:48 +0000 (18:01 +0000)]
[ConstantFolding] Fix wrong folding of intrinsic 'convert.from.fp16'.
Function 'ConstantFoldScalarCall' (in ConstantFolding.cpp) works under the
wrong assumption that a call to 'convert.from.fp16' returns a value of
type 'float'.
However, intrinsic 'convert.from.fp16' can be overloaded; for example, we
can call 'convert.from.fp16.f64' to convert from half to double; etc.
Before this patch, the following example would have triggered an assertion
failure in opt (with -constprop):
This patch fixes the problem in ConstantFolding.cpp. When folding a call to
convert.from.fp16, we perform a different kind of conversion based on the call
return type.
Added test 'Transform/ConstProp/convert-from-fp16.ll'.
Brendon Cahoon [Thu, 14 May 2015 17:31:40 +0000 (17:31 +0000)]
[Hexagon] Remove dead constant assignment in hardware loop pass
After converting a loop to a hardware loop, the pass should remove
any unnecessary instructions from the old compare-and-branch
code. This patch removes a dead constant assignment that was
used in the compare instruction.
[mips] Do not place users of $ra in the delay slot of call instructions.
Summary:
When we are trying to fill the delay slot of a call instruction, we must avoid
filler instructions that use the $ra register. This fixes the test
MultiSource/Applications/JM/lencod when we enable the forward delay slot filler.
Adam Nemet [Thu, 14 May 2015 12:05:18 +0000 (12:05 +0000)]
New Loop Distribution pass
Summary:
This implements the initial version as was proposed earlier this year
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080462.html).
Since then Loop Access Analysis was split out from the Loop Vectorizer
and was made into a separate analysis pass. Loop Distribution becomes
the second user of this analysis.
The pass is off by default and can be enabled
with -enable-loop-distribution. There is currently no notion of
profitability; if there is a loop with dependence cycles, the pass will
try to split them off from other memory operations into a separate loop.
I decided to remove the control-dependence calculation from this first
version. This and the issues with the PDT are actively discussed so it
probably makes sense to treat it separately. Right now I just mark all
terminator instruction required which keeps identical CFGs for each
distributed loop. This seems to be working pretty well for 456.hmmer
where even though there is an empty if-then block in the distributed
loop initially, it gets completely removed.
The pass keeps DominatorTree and LoopInfo updated. I've tested this
with -loop-distribute-verify with the testsuite where we distribute ~90
loops. SimplifyLoop is violated in some cases and I have a FIXME
covering this.
Toma Tabacu [Thu, 14 May 2015 10:02:58 +0000 (10:02 +0000)]
[mips] [IAS] Give expandLoadAddressSym() more specific arguments. NFC.
Summary:
If we only pass the necessary operands, we don't have to determine the position of the symbol operand when entering expandLoadAddressSym().
This simplifies the expandLoadAddressSym() code.
AVX-512: Added i1 type handling for calling conventions.
i1 type is a legal type on AVX-512 and can be passed as parameter or return value.
i1 is promoted to i8 on return and to i32 for call arguments (i8 is also promoted to i32 here).
The result code is similar to the previous X86 targets, where i1 is allways promoted to i8.
Craig Topper [Thu, 14 May 2015 05:54:02 +0000 (05:54 +0000)]
[TableGen] Remove an unnecessary outer 'if' around 3 separate inner ifs. No functional change intended.
The outer if had 3 separate conditions ORed together and then the inner ifs detected which of the three conditions it was by using only a portion of the specific condition. Just put the whole condition in each inner if and remove the outer if.
Ahmed Bougacha [Thu, 14 May 2015 01:00:51 +0000 (01:00 +0000)]
[CodeGen] Use standard -not gnueabi- naming for f16 libcalls on Darwin.
Other targets probably should as well. Since r237161, compiler-rt has
both, but I don't see why anything other than gnueabi would use a
gnueabi naming scheme.