Chandler Carruth [Mon, 26 Jun 2017 03:31:31 +0000 (03:31 +0000)]
[InstCombine] Factor the logic for propagating !nonnull and !range
metadata out of InstCombine and into helpers.
NFC, this just exposes the logic used by InstCombine when propagating
metadata from one load instruction to another. The plan is to use this
in SROA to address PR32902.
If anyone has better ideas about how to factor this or name variables,
I'm all ears, but this seemed like a pretty good start and lets us make
progress on the PR.
This is based on a patch by Ariel Ben-Yehuda (D34285).
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and requires the
function argument info to have more context about the
function's type and environment.
Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also encovers that the intrinsic
is broken unless there happen to be stack objects.
Chandler Carruth [Sun, 25 Jun 2017 22:45:31 +0000 (22:45 +0000)]
[LoopSimplify] Re-instate r306081 with a bug fix w.r.t. indirectbr.
This was reverted in r306252, but I already had the bug fixed and was
just trying to form a test case.
The original commit factored the logic for forming dedicated exits
inside of LoopSimplify into a helper that could be used elsewhere and
with an approach that required fewer intermediate data structures. See
that commit for full details including the change to the statistic, etc.
The code looked fine to me and my reviewers, but in fact didn't handle
indirectbr correctly -- it left the 'InLoopPredecessors' vector dirty.
If you have code that looks *just* right, you can end up leaking these
predecessors into a subsequent rewrite, and crash deep down when trying
to update PHI nodes for predecessors that don't exist.
I've added an assert that makes the bug much more obvious, and then
changed the code to reliably clear the vector so we don't get this bug
again in some other form as the code changes.
I've also added a test case that *does* manage to catch this while also
giving some nice positive coverage in the face of indirectbr.
The real code that found this came out of what I think is CPython's
interpreter loop, but any code with really "creative" interpreter loops
mixing indirectbr and other exit paths could manage to tickle the bug.
I was hard to reduce the original test case because in addition to
having a particular pattern of IR, the whole thing depends on the order
of the predecessors which is in turn depends on use list order. The test
case added here was designed so that in multiple different predecessor
orderings it should always end up going down the same path and tripping
the same bug. I hope. At least, it tripped it for me without
manipulating the use list order which is better than anything bugpoint
could do...
Chandler Carruth [Sun, 25 Jun 2017 22:24:02 +0000 (22:24 +0000)]
[LoopSimplify] Improve a test for loop simplify minorly. NFC.
I did some basic testing while looking for a bug in my recent change to
loop simplify and even though it didn't find the bug it seems like
a useful improvement anyways.
Anna Thomas [Sun, 25 Jun 2017 21:13:58 +0000 (21:13 +0000)]
[LoopDeletion] NFC: Move phi node value setting into prepass
Recommit NFC patch (rL306157) where I missed incrementing the basic block iterator,
which caused loop deletion tests to hang due to infinite loop.
Had reverted it in rL306162.
rL306157 commit message:
Currently, the implementation of delete dead loops has a special case
when the loop being deleted is never executed. This special case
(updating of exit block's incoming values for phis) can be
run as a prepass for non-executable loops before performing
the actual deletion.
Craig Topper [Sun, 25 Jun 2017 17:33:49 +0000 (17:33 +0000)]
[TableGen] Remove some copies around PatternToMatch.
Summary:
This patch does a few things that should remove some copies around PatternsToMatch. These were noticed while reviewing code for D34341.
Change constructor to take Dstregs by value and move it into the class. Change one of the callers to add std::move to the argument so that it gets moved.
Make AddPatternToMatch take PatternToMatch by rvalue reference so we can move it into the PatternsToMatch vector. I believe we should have a implicit default move constructor available on PatternToMatch. I chose rvalue reference because both callers call it with temporaries already.
Sanjay Patel [Sun, 25 Jun 2017 14:15:28 +0000 (14:15 +0000)]
[InstCombine] add (sext i1 X), 1 --> zext (not X)
http://rise4fun.com/Alive/i8Q
A narrow bitwise logic op is obviously better than math for value tracking,
and zext is better than sext. Typically, the 'not' will be folded into an
icmp predicate.
The IR difference would even survive through codegen for x86, so we would see
worse code:
AVX-512: Fixed a crash during legalization of <3 x i8> type
The compiler fails with assertion during legalization of SETCC for <3 x i8> operands.
The result is extended to <4 x i8> and then truncated <4 x i1>. It does not happen on AVX2, because the final result of SETCC is <4 x i32>.
Xin Tong [Sun, 25 Jun 2017 12:55:11 +0000 (12:55 +0000)]
[AST] Fix a bug in aliasesUnknownInst. Make sure we are comparing the unknown instructions in the alias set and the instruction interested in.
Summary:
Make sure we are comparing the unknown instructions in the alias set and the instruction interested in.
I believe this is clearly a bug (missed opportunity). I can also add some test cases if desired.
Igor Breger [Sun, 25 Jun 2017 11:42:17 +0000 (11:42 +0000)]
[GlobalISel][X86] Support vector type G_EXTRACT selection.
Summary:
Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer.
Split from https://reviews.llvm.org/D33665
Dorit Nuzman [Sun, 25 Jun 2017 08:26:25 +0000 (08:26 +0000)]
[AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2
The cost of an interleaved access was only implemented for AVX512. For other
X86 targets an overly conservative Base cost was returned, resulting in
avoiding vectorization where it is actually profitable to vectorize.
This patch starts to add costs for AVX2 for most prominent cases of
interleaved accesses (stride 3,4 chars, for now).
Note1: Improvements of up to ~4x were observed in some of EEMBC's rgb
workloads; There is also a known issue of 15-30% degradations on some of these
workloads, associated with an interleaved access followed by type
promotion/widening; the resulting shuffle sequence is currently inefficient and
will be improved by a series of patches that extend the X86InterleavedAccess pass
(such as D34601 and more to follow).
Note 2: The costs in this patch do not reflect port pressure penalties which can
be very dominant in the case of interleaved accesses since most of the shuffle
operations are restricted to a single port. Further tuning, that may incorporate
these considerations, will be done on top of the upcoming improved shuffle
sequences (that is, along with the abovementioned work to extend
X86InterleavedAccess pass).
Ed Schouten [Sun, 25 Jun 2017 08:19:37 +0000 (08:19 +0000)]
Add support for Ananas platform
Ananas is a home-brew operating system, mainly for amd64 machines. After
using GCC for quite some time, it has switched to clang and never looked
back - yet, having to manually patch things is annoying, so it'd be much
nicer if this was in the official tree.
Craig Topper [Sun, 25 Jun 2017 06:56:34 +0000 (06:56 +0000)]
[PatternMatch] Just check if value is a Constant before calling isAllOnesValue for not_match. We don't really need to check for a specific subclass of Constant. NFC
Note the error message about the corrupted line number.
It turns out that the problem is that cvdump cannot find the
/names stream (e.g. the global string table), and the reason it
can't find the /names stream is because it doesn't understand
the NameMap that we serialize which tells pdb consumers which
stream has the string table.
Some experimentation shows that if we add items to the hash
table in a specific order before serializing it, cvdump can read
it. This suggests that either we're using the wrong hash function,
or we're serializing something incorrectly, but it will take some
deeper investigation to figure out how / why. For now, this at
least allows cvdump to read our line information (and incidentally,
produces an identical byte sequence to what Microsoft tools
produce when writing the named stream map).
Zachary Turner [Sun, 25 Jun 2017 00:00:08 +0000 (00:00 +0000)]
[Support] Don't use std::iterator, it's deprecated in C++17.
In converting this over to iterator_facade_base, some member
operators and methods are no longer needed since iterator_facade
implements them in the base class using CRTP.
Craig Topper [Sat, 24 Jun 2017 23:34:50 +0000 (23:34 +0000)]
[SCEV] Avoid copying ConstantRange just to get the min/max value
Summary:
This patch changes getRange to getRangeRef and returns a reference to the ConstantRange object stored inside the DenseMap caches. We then take advantage of that to add new helper methods that can return min/max value of a signed or unsigned ConstantRange using that reference without first copying the ConstantRange.
getRangeRef calls itself recursively and I believe the reference return is fine for those calls.
I've left getSignedRange and getUnsignedRange returning a ConstantRange object so they will make a copy now. This is to ensure safety since the reference will be invalidated if the DenseMap changes.
I'm sure there are still more places that can take advantage of the reference and I'll submit future patches as I find them.
Craig Topper [Sat, 24 Jun 2017 22:59:10 +0000 (22:59 +0000)]
[IR] Implement commutable matchers without using combineOr
Summary:
Turns out creating matchers with combineOr isn't very efficient as we have to build matcher objects for both sides of the OR. Those objects aren't free, the trees usually contain several objects that contain a reference to a Value *, ConstantInt *, APInt * or some such thing. The compiler isn't always willing to inline all the matcher code to get rid of these member variables. Thus we end up loads and stores of these variables.
Using combineOR ends up creating two complete copies of the tree and the associated stores. I believe we're also paying for the opcode check twice.
This patch adds a commutable mode to several of the matcher objects as a bool template parameter that can be used to enable commutable support directly in the match functions of the corresponding objects. This avoids the duplicate object creation and the opcode checks.
This shows about an ~7-8k reduction in the opt binary size on my local build.
Hiroshi Inoue [Sat, 24 Jun 2017 15:17:38 +0000 (15:17 +0000)]
[SelectionDAG] set dereferenceable flag when expanding memcpy/memmove
When SelectionDAG expands memcpy (or memmove) call into a sequence of load and store instructions, it disregards dereferenceable flag even the source pointer is known to be dereferenceable.
This results in an assertion failure if SelectionDAG commonizes a load instruction generated for memcpy with another load instruction for the source pointer.
This patch makes SelectionDAG to set the dereferenceable flag for the load instructions properly to avoid the assertion failure.
Tobias Grosser [Sat, 24 Jun 2017 08:09:33 +0000 (08:09 +0000)]
Ensure backends available in 'opt' are also available in 'bugpoint'
This patch links LLVM back-ends into bugpoint the same way they are already
available in 'opt' and 'clang'. This resolves an inconsistency that allowed the
use of LLVM backends in loadable modules that run in 'opt', but that would
prevent the debugging of these modules with bugpoint due to unavailable /
unresolved symbols.
For e.g. In D31859, Polly requires the NVPTX back-end.
Rafael Espindola [Sat, 24 Jun 2017 05:12:29 +0000 (05:12 +0000)]
Remove a processFixupValue hack.
The intention of processFixupValue is not to redefine the semantics of
MCExpr. It is odd enough that a expression lowers to a PCRel MCExpr or
not depending on what it looks like. At least it is a local hack now.
I left a fix for anyone trying to figure out what producers should be
producing a different expression.
Rafael Espindola [Fri, 23 Jun 2017 22:52:36 +0000 (22:52 +0000)]
ARM: move some logic from processFixupValue to applyFixup.
processFixupValue is called on every relaxation iteration. applyFixup
is only called once at the very end. applyFixup is then the correct
place to do last minute changes and value checks.
While here, do proper range checks again for fixup_arm_thumb_bl. We
used to do it, but dropped because of thumb2. We now do it again, but
use the thumb2 range.
Rafael Espindola [Fri, 23 Jun 2017 22:50:24 +0000 (22:50 +0000)]
This reverts commit r306166 and r306168.
Revert "[ORC] Remove redundant semicolons from DEFINE_SIMPLE_CONVERSION_FUNCTIONS uses."
Revert "[ORC] Move ORC IR layer interface from addModuleSet to addModule and fix the module type as std::shared_ptr<Module>."
They broke ExecutionEngine/OrcMCJIT/test-global-ctors.ll on linux.
Petar Jovanovic [Fri, 23 Jun 2017 22:37:19 +0000 (22:37 +0000)]
Reland r306095: [mips] Fix reg positions in the aui/daui instructions
After fixing (r306173) a failing test in the lld test suite (r306173),
reland r306095.
Original commit message:
[mips] Fix register positions in the aui/daui instructions
Swapped the position of the rt and rs register in the aui/daui
instructions for mips32r6 and mips64r6. With this change, the format of
the generated instructions complies with specifications and GCC.
Patch by Milos Stojanovic.
Reid Kleckner [Fri, 23 Jun 2017 22:12:11 +0000 (22:12 +0000)]
[llvm-readobj] Fix COFF RVA table dumping bug
We would return an error in getVaPtr if the RVA table being dumped was
the last data in the .rdata section. Avoid the issue by subtracting one
from the offset and adding it back to get an open interval again.
Anna Thomas [Fri, 23 Jun 2017 21:30:48 +0000 (21:30 +0000)]
Revert "[LoopDeletion] NFC: Move phi node value setting into prepass"
This reverts commit r306157.
It caused some timeouts in clang tests. Perhaps unreachable loops have
far too many phi nodes.
Reverting and investigating.
Anna Thomas [Fri, 23 Jun 2017 20:38:50 +0000 (20:38 +0000)]
[LoopDeletion] NFC: Move phi node value setting into prepass
Currently, the implementation of delete dead loops has a special case
when the loop being deleted is never executed. This special case
(updating of exit block's incoming values for phis) can be
run as a prepass for non-executable loops before performing
the actual deletion.
Craig Topper [Fri, 23 Jun 2017 20:28:49 +0000 (20:28 +0000)]
[APInt] Use trailing bit counting methods instead of population count method in isAllOnesValue, isMaxSigendValue, and isMinSignedValue. NFCI
The trailing bit methods will early out if they find a bit of the opposite while popcount must always look at all bits. I also assume that more CPUs implement trailing bit counting with native instructions than population count.
Craig Topper [Fri, 23 Jun 2017 20:28:45 +0000 (20:28 +0000)]
[APInt] Move the single word cases of countTrailingZeros and countLeadingOnes inline for consistency with countTrailingOnes and countLeadingZeros. NFCI
Zachary Turner [Fri, 23 Jun 2017 20:28:14 +0000 (20:28 +0000)]
[llvm-pdbutil] Show what blocks a stream occupies.
This is useful when you want to look at a specific chunk of a
stream or look for discontinuities, and you need to know the
list of blocks occupied by a stream.
Zachary Turner [Fri, 23 Jun 2017 20:18:38 +0000 (20:18 +0000)]
[llvm-pdbutil] Dump raw bytes of pdb name map.
This patch dumps the raw bytes of the pdb name map which contains
the mapping of stream name to stream index for the string table
and other reserved streams.
Brian Gesiak [Fri, 23 Jun 2017 20:06:34 +0000 (20:06 +0000)]
[opt-viewer] Remove positional arg checks (NFC)
Summary:
opt-stats.py and opt-viewer.py's argument parsers both take a positional
argument 'yaml_files'. Positional arguments in Python's argparse module are
required by default, so the subsequent checks for `len(args.yaml_files) == 0`
are unnecessary -- if the length was zero, then the call to
`parser.parse_args()` would have thrown an error already.
Because there is no way for `len(args.yaml_files)` to be zero at these
points, removing the code is NFC.
Zachary Turner [Fri, 23 Jun 2017 19:54:44 +0000 (19:54 +0000)]
[llvm-pdbutil] Add the ability to dump raw bytes from the file.
Normally we can only make sense of the content of a PDB in terms
of streams and blocks, but in some cases it may be useful to dump
bytes at a specific absolute file offset. For example, if you
know that some interesting data is at a particular location and
you want to see some surrounding data.
Chad Rosier [Fri, 23 Jun 2017 19:20:12 +0000 (19:20 +0000)]
[AArch64] Prefer Bcc to CBZ/CBNZ/TBZ/TBNZ when NZCV flags can be set for "free".
This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a
conditional branch (Bcc), when the NZCV flags can be set for "free". This is
preferred on targets that have more flexibility when scheduling Bcc
instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are
equal). This can reduce register pressure and is also the default behavior for
GCC.
A few examples:
add w8, w0, w1 -> cmn w0, w1 ; CMN is an alias of ADDS.
cbz w8, .LBB_2 -> b.eq .LBB0_2 ; single def/use of w8 removed.
whitequark [Fri, 23 Jun 2017 18:58:10 +0000 (18:58 +0000)]
[X86] Fix SP adjustment in stack probes emitted on 32-bit Windows.
Commit r306010 adjusted the condition as follows:
- if (Is64Bit) {
+ if (!STI.isTargetWin32()) {
The intent was to preserve the behavior on all Windows platforms
but extend the behavior on 64-bit Windows platforms to every
other one. (Before r306010, emitStackProbeCall only ever executed
when emitting code for Windows triples.)
Unfortunately,
if (Is64Bit && STI.isOSWindows())
is not the same as
if (!STI.isTargetWin32())
because of the way isTargetWin32() is defined:
Zachary Turner [Fri, 23 Jun 2017 18:52:13 +0000 (18:52 +0000)]
[llvm-pdbutil] Add a function for formatting MSF data.
The goal here is to make it possible to display absolute
file offsets when dumping byets from an MSF. The problem is
that when dumping bytes from an MSF, often the bytes will
cross a block boundary and encounter a discontinuity. We
can't use the normal formatBinary() function for this because
this would just treat the sequence as entirely ascending, and
not account out-of-order blocks.
This patch adds a formatMsfData() function to our printer, and
then uses this function to improve the output of the -stream-data
command line option for dumping bytes from a particular stream.
Test coverage is also expanded to make sure to include all possible
scenarios of offsets, sizes, and crossing block boundaries.
Add a ThinLTO cache policy for controlling the maximum cache size in bytes.
This is useful when an upper limit on the cache size needs to be
controlled independently of the amount of the amount of free space.
One use case is a machine with a large number of cache directories
(e.g. a buildbot slave hosting a large number of independent build
jobs). By imposing an upper size limit on each cache directory,
users can more easily estimate the server's capacity.
Zachary Turner [Fri, 23 Jun 2017 16:38:40 +0000 (16:38 +0000)]
Add a BinarySubstreamRef, and a method to read one.
This is essentially just a BinaryStreamRef packaged with an
offset and the logic for reading one is no different than the
logic for reading a BinaryStreamRef, except that we save the
current offset.
Tim Northover [Fri, 23 Jun 2017 16:15:55 +0000 (16:15 +0000)]
GlobalISel: remove G_SEQUENCE instruction.
It was trying to do too many things. The basic lumping together of values for
legalization purposes is now handled by G_MERGE_VALUES. More complex things
involving gaps and odd sizes are handled by G_INSERT sequences.
Tim Northover [Fri, 23 Jun 2017 16:15:37 +0000 (16:15 +0000)]
GlobalISel: convert buildSequence to use non-deprecated instructions.
G_SEQUENCE is going away soon so as a first step the MachineIRBuilder needs to
be taught how to emulate it with alternatives. We use G_MERGE_VALUES where
possible, and a sequence of G_INSERTs if not.