[X86] Add patterns to turn an insert into lower subvector of a zero vector into a move instruction which will implicitly zero the upper elements.
Ideally we'd be able to emit the SUBREG_TO_REG without the explicit register->register move, but we'd need to be sure the producing operation would select something that guaranteed the upper bits were already zeroed.
In a future patch, I plan to teach isel to use a small vector move with implicit zeroing of the upper elements when it sees the (insert_subvector zero, X, 0) pattern.
Ayman Musa [Sun, 3 Sep 2017 13:53:44 +0000 (13:53 +0000)]
[X86][AVX512] Add simple tests for all AVX512 shuffle instructions.
As part of an effort to thoroughly test CodeGen's handling of the IR shufflevector instruction, we generated many tests while predicting the best X86 sequence that could be generated.
This is a subset of the generated tests that we think may add value to our X86 set of tests.
Some of the checks are not optimal and will be changed after fixing:
1. PR34394
2. PR34382
3. PR34380
4. PR34359
Ayman Musa [Sun, 3 Sep 2017 09:09:16 +0000 (09:09 +0000)]
[X86] Fix crash on assert of non-simple type after type-legalization
The function combineShuffleToVectorExtend in DAGCombine might generate an illegally typed node after the "legalize types" phase, causing an assertion on a non-simple type to fail afterwards.
Add a type check in case the combine is running after the type legalization pass.
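A minimal sketch of the kind of guard this describes, assuming a target-lowering handle TLI, the combine's output type OutVT, and a LegalTypes flag as carried by DAGCombiner (the names used in the actual patch may differ):

  // After type legalization, only form the extend if its type is legal;
  // otherwise bail out of this combine.
  if (LegalTypes && !TLI.isTypeLegal(OutVT))
    return SDValue();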
Lang Hames [Sun, 3 Sep 2017 00:50:42 +0000 (00:50 +0000)]
[ORC] Add an Error return to the JITCompileCallbackManager::grow method.
Calling grow may result in an error if, for example, this is a callback
manager for a remote target. We need to be able to return this error to the
caller.
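A minimal sketch of how a caller might now propagate the failure, assuming grow() returns llvm::Error and is reachable from the calling context (names are hypothetical):

  if (llvm::Error Err = CallbackMgr.grow())
    return std::move(Err); // e.g. a remote allocation failure, surfaced to our caller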
Keith Wyss [Sun, 3 Sep 2017 00:03:47 +0000 (00:03 +0000)]
[XRay][tools] Function call stack based analysis tooling for XRay traces
This change introduces a subcommand to the llvm-xray tool called
"stacks" which allows for analysing XRay traces provided as inputs and
accounting time to stacks instead of just individual functions. This
gives us a more precise view of where in a program the latency is
actually attributed.
The tool uses a trie data structure to keep track of the caller-callee
relationships as we process the XRay traces. In particular, we keep
track of the function call stack as we enter functions. While we're
doing this we're adding nodes to a trie and indicating a "calls"
relationship between the caller (current top of the stack) and the callee
(the new top of the stack). When we push function ids onto the stack, we
keep track of the timestamp (TSC) for the enter event.
When exiting functions, we are able to account the duration by getting
the difference between the timestamp of the exit event and the
corresponding entry event in the stack. This works even if we somehow
miss the exit events for intermediary functions (i.e. if the exit event
is not cleanly associated with the enter event at the top of the stack).
The output of the tool currently provides just the top N leaf functions
that contribute the most latency, and the top N most frequent stacks. In
the future we can provide more sophisticated query mechanisms and
potentially an export-to-database feature to make offline analysis of the
stack traces possible with existing tools.
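As a rough, self-contained illustration of the bookkeeping described above (not the tool's actual data structures; names are hypothetical), a stack trie can pair each function id with its callees and accumulate durations as enter/exit records are replayed:

  #include <cstdint>
  #include <map>
  #include <vector>

  // One node per function id reached through a particular call stack.
  struct StackTrieNode {
    int32_t FuncId = 0;
    uint64_t CumulativeTSC = 0;               // time attributed to this stack
    std::map<int32_t, StackTrieNode> Callees; // the "calls" relationship
  };

  struct StackEntry {
    StackTrieNode *Node;
    uint64_t EnterTSC; // timestamp recorded when the function was entered
  };

  void handleEnter(std::vector<StackEntry> &Stack, StackTrieNode &Root,
                   int32_t FuncId, uint64_t TSC) {
    StackTrieNode &Parent = Stack.empty() ? Root : *Stack.back().Node;
    StackTrieNode &Child = Parent.Callees[FuncId];
    Child.FuncId = FuncId;
    Stack.push_back({&Child, TSC});
  }

  void handleExit(std::vector<StackEntry> &Stack, int32_t FuncId, uint64_t TSC) {
    // Unwind until the matching enter is found, tolerating missed exit
    // events for intermediary functions.
    while (!Stack.empty()) {
      StackEntry Top = Stack.back();
      Stack.pop_back();
      Top.Node->CumulativeTSC += TSC - Top.EnterTSC;
      if (Top.Node->FuncId == FuncId)
        break;
    }
  }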
Move some CLI utils out of llvm-isel-fuzzer and into the library
FuzzMutate might not be the best place for these, but it makes more
sense than an entirely new library for now. This will make setting up
fuzz targets with consistent CLI handling easier.
[X86] Teach fastisel to handle zext/sext i8->i16 and sext i1->i8/i16/i32/i64
Summary:
ZExt and SExt from i8 to i16 aren't implemented in the autogenerated fast isel table because normal isel does a zext/sext to 32 bits and a subreg extract to avoid a partial register write or false dependency on the upper bits of the destination. This means that without handling in fast isel, we end up triggering a fast isel abort.
We had no custom sign extend handling at all so while I was there I went ahead and implemented sext i1->i8/i16/i32/i64 which was also missing. This generates an i1->i8 sign extend using a mask with 1, then an 8-bit negate, then continues with a sext from i8. A better sequence would be a wider and/negate, but would require more custom code.
Fast isel tests are a mess and I couldn't find a good home for the tests so I created a new one.
The test pr34381.ll had to have fast-isel removed because it was relying on a fast isel abort to hit the bug. The test case still seems valid with fast-isel disabled though some of the instructions changed.
[InstCombine] combine foldAndOfFCmps and foldOrOfFcmps; NFCI
In addition to removing chunks of duplicated code, we don't
want these to diverge. If there's a fold for one, there
should be a fold of the other via DeMorgan's Laws.
Don Hinton [Sat, 2 Sep 2017 17:28:39 +0000 (17:28 +0000)]
[CMAKE] Move version control macros to AddLLVM.cmake so they can be reused by clang, etc.
Summary:
Move the version control macros, find_first_existing_file and
find_first_existing_vc_file, to AddLLVM.cmake so they can be reused by
subprojects like clang.
[InstCombine] fix misnamed locals and use them to reduce code; NFCI
We had these locals:
Value *Op0RHS = LHS->getOperand(1);
Value *Op1LHS = RHS->getOperand(0);
...so we confusingly transposed the meaning of left/right and op0/op1.
[InstCombine] move related functions next to each other; NFC
This makes it easier to see that they're almost duplicates.
As with the similar icmp functions, there should be identical
folds for both logic ops because those are DeMorganized variants.
The binutils utility dwp has an option "-e"
(https://gcc.gnu.org/wiki/DebugFissionDWP)
to specify an executable/library from which to get the list
of *.dwo files. This option is particularly useful when
someone runs the tool manually outside of a build system.
This diff adds an implementation of "-e" to llvm-dwp.
llvm-mt: Fix memory management in WindowsManifestMergerImpl::getMergedManifest
Summary:
xmlDoc needs to be released with xmlFreeDoc.
XML_PARSE_NODICT is needed for safely moving nodes between documents.
The buffer returned from xmlDocDumpFormatMemoryEnc needs xmlFree, but it needs
to outlive users of getMergedManifest's results.
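A minimal sketch of these lifetime rules using the libxml2 calls named above (simplified: it copies the dump into a std::string rather than keeping the raw buffer alive for callers, and most error handling is elided):

  #include <libxml/parser.h>
  #include <string>

  std::string parseAndSerialize(const char *Buf, int Len) {
    // Parse without the shared dictionary so nodes can later be moved
    // safely between documents.
    xmlDocPtr Doc = xmlReadMemory(Buf, Len, "manifest.xml", nullptr,
                                  XML_PARSE_NODICT);
    if (!Doc)
      return std::string();

    xmlChar *Out = nullptr;
    int OutLen = 0;
    xmlDocDumpFormatMemoryEnc(Doc, &Out, &OutLen, "UTF-8", 1);
    if (!Out) {
      xmlFreeDoc(Doc);
      return std::string();
    }

    // The dump buffer must be released with xmlFree, and the document
    // with xmlFreeDoc.
    std::string Result(reinterpret_cast<const char *>(Out), OutLen);
    xmlFree(Out);
    xmlFreeDoc(Doc);
    return Result;
  }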
Petr Hosek [Sat, 2 Sep 2017 02:28:03 +0000 (02:28 +0000)]
[CMake][runtimes] Use target specific name for all runtimes targets
We need to use a target-specific name for all runtimes targets. A
target-specific name means the name of the target in the LLVM build is
different from the name in the runtimes build (in the LLVM build, it is
suffixed by the target itself). Previously we only used target-specific
names for check targets collected through SUB_CHECK_TARGETS, but that's
not sufficient; we need to use target-specific names for all targets
we're exposing in the LLVM build.
Daniel Berlin [Sat, 2 Sep 2017 02:18:44 +0000 (02:18 +0000)]
Fix PR33305, caused by trying to simplify expressions in phi-of-ops that should have no leaders.
Summary:
After a discussion with Rekka, I believe this (or a small variant)
should fix the remaining phi-of-ops problems.
Rekka's algorithm for completeness relies on looking up expressions
that should have no leader, and expecting the lookup to fail (i.e.,
looking up expressions that can't exist in a predecessor, and expecting
it to find nothing).
Unfortunately, sometimes these expressions can be simplified to
constants, but we need the lookup to fail anyway. Additionally, our
simplifier outsmarts this by taking these "not quite right"
expressions, and simplifying them into other expressions or walking
through phis, etc. In the past, we've sometimes been able to find
leaders for these expressions, incorrectly.
This change causes us not to try phi-of-ops on such expressions.
We determine safety by seeing if they depend on a phi node in our
block.
This is not perfect, we can do a bit better, but this should be a
"correctness start" that we can then improve. It also requires a
bunch of caching that I'd eventually like to eliminate.
The right solution, longer term, to the simplifier issues, is to make
the query interface for the instruction simplifier/constant folder
have the flags we need, so that we can keep most things going, but
turn off the possibly-invalid parts (threading through phis, etc).
This is an issue in another wrong code bug as well.
Disable 64-bit file position on old 32-bit Android.
This is needed for building LLVM on Android with a new NDK (newer
than r15c) and API level < 24. The Android C library (Bionic) didn't have
support for a 64-bit file position until Android N.
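For illustration, the shape of the conditional this implies (the exact guard used in the patch may differ):

  // Only request 64-bit file positions when Bionic can honor them;
  // before API level 24 (Android N) the support is missing.
  #if !defined(__ANDROID__) || __ANDROID_API__ >= 24
  #define _FILE_OFFSET_BITS 64
  #endif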
[MIParser] Ensure getHexUint doesn't produce APInts with a bitwidth of 0
If getHexUint reads in a hex 0, it will create an APInt with a value of 0.
The number of active bits on this APInt is used to calculate the bitwidth of
Result. The number of active bits is defined as an APInt's bitwidth - its
number of leading 0s. Since this APInt is 0, its bitwidth and number of leading
0s are equal.
Thus, Result is constructed with a bitwidth of 0, triggering an APInt assert.
This commit fixes that by checking if the APInt is equal to 0, and setting the
bitwidth to 32 if it is. Otherwise, it sets the bitwidth using getActiveBits.
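A minimal sketch of the guard using LLVM's APInt API (a hypothetical helper; the surrounding parser code is omitted):

  #include "llvm/ADT/APInt.h"

  // Never construct an APInt with bitwidth 0, even when the parsed hex
  // value is 0 (which has no active bits).
  static llvm::APInt sizeParsedHexValue(const llvm::APInt &A) {
    unsigned BitWidth = (A == 0) ? 32 : A.getActiveBits();
    return A.zextOrTrunc(BitWidth);
  }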
This caused issues when compiling MIR files with successor probabilities. In
the case that a successor is tagged with a probability of 0, this assert would
fire on debug builds.
[InstCombine][InstSimplify] Teach decomposeBitTestICmp to look through truncate instructions
This patch teaches decomposeBitTestICmp to look through truncate instructions on the input to the compare. If a truncate is found it will now return the pre-truncated Value and appropriately extend the APInt mask.
This allows some code to be removed from InstSimplify that was doing this functionality.
This allows InstCombine's bit test combining code to match a pre-truncated Value with the same Value appearing with an 'and' in another icmp. Or it allows us to combine a truncate to i16 and a truncate to i8. This also required removing the type check from the beginning of getMaskedTypeForICmpPair, but I believe that's ok because we still have to find two values from the input to each icmp that are equal before we'll do any transformation. So the type check was really just serving as an early out.
There was one user of decomposeBitTestICmp that didn't want to look through truncates, so I've added a flag to prevent that behavior when necessary.
[InstCombine] Don't require the compare types to be the same in getMaskedTypeForICmpPair.
A future patch will make the code look through truncates feeding the compare. So the compares might be different types but the pretruncated types might be the same.
This should be safe because we still require the same Value* to be used truncated or not in both compares. So that serves to ensure the types are the same.
[InstCombine] When converting decomposeBitTestICmp's APInt return to ConstantInt, make sure we use the type from the Value* that was also returned from decomposeBitTestICmp.
Previously we used the type from the LHS of the compare, but a future patch will change decomposeBitTestICmp to look through truncates so it will return a pretruncated Value* and the type needs to match that.
[x86] eliminate redundant shuffle of horizontal math ops when both inputs are the same
This is limited to a set of patterns based on the example in PR34111:
https://bugs.llvm.org/show_bug.cgi?id=34111
...but as I was investigating this, I saw that horizontal patterns can go wrong in many,
many other ways that would not be handled by this patch. Each data type may even go
a different way in the DAG after starting with the same basic IR pattern, so even proper IR
canonicalization won't fix it all.
[llvm-pdbutil] Support dumping CodeView from object files.
We have llvm-readobj for dumping CodeView from object files, and
llvm-pdbutil has always been more focused on PDB. However,
llvm-pdbutil has a lot of useful options for summarizing debug
information in aggregate and presenting high level statistical
views. Furthermore, it's arguably better as a testing tool since
we don't have to write tests to conform to a state-machine-like
structure where you match multiple lines in succession, each
depending on a previous match. llvm-pdbutil dumps much more
concisely, so it's possible to use single-line matches in many
cases, whereas with readobj tests you have to use multi-line
matches with an implicit state machine.
Because of this, I'm adding object file support to llvm-pdbutil.
In fact, this mirrors the cvdump tool from Microsoft, which also
supports both object files and pdb files. In the future we could
perhaps rename this tool llvm-cvutil.
In the meantime, this allows us to deep dive into object files
the same way we already can with PDB files.
Davide Italiano [Fri, 1 Sep 2017 19:54:08 +0000 (19:54 +0000)]
[TTI] Fix getGEPCost() for geps with a single operand.
Previously this would sporadically crash as TargetType
was never initialized. We special-case the single-operand
case, returning earlier and trying to mimic the behaviour of
isLegalAddressingMode as closely as possible.
llvm-isel-fuzzer: Weak functions invoke the ire of PE/COFF
It's non-trivial to use weak symbols in a cross-platform way (see
sanitizer_win_defs.h in compiler-rt), and doing it naively like we
have here causes some build failures.
Instead of going down the rabbit hole of emulating weak symbols for
this very trivial dummy fuzzer driver, we can just rely on the fact
that we know which hooks any given fuzz target implements and forward
declare a normal symbol.
Davide Italiano [Fri, 1 Sep 2017 19:36:34 +0000 (19:36 +0000)]
[TTI] Initialize a value to trigger a crash deterministically.
We expect the pointer to be initialized by the above loop, but
if that's not executed, the contents are garbage.
A fix for the crash will be committed immediately after.
LiveIntervalAnalysis: Fix alias regunit reserved definition
A register in CodeGen can be marked as reserved: In that case we
consider the register always live and do not use (or rather ignore)
kill/dead/undef operand flags.
LiveIntervalAnalysis however tracks liveness per register unit (not per
register). We already needed adjustments for this in r292871 to deal
with super/sub registers. However I did not look at aliased registers
there. Looking at ARM:
FPSCR (regunits FPSCR, FPSCR~FPSCR_NZCV) aliases with FPSCR_NZCV
(regunits FPSCR_NZCV, FPSCR~FPSCR_NZCV) hence they share a register unit
(FPSCR~FPSCR_NZCV) that represents the aliased parts of the registers.
This shared register unit was previously considered non-reserved.
However, given that uses of the reserved FPSCR potentially violate
some rules (like uses without defs), we should make FPSCR~FPSCR_NZCV
reserved too and stop tracking liveness for it.
This patch:
- Defines a register unit as reserved when, for at least one root
  register, the root register and all its super-registers are reserved.
- Adjusts LiveIntervals::computeRegUnitRange() for the new reserved
  definition.
- Adds MachineRegisterInfo::isReservedRegUnit() to have a canonical way
  of testing.
- Stops computing LiveRanges for reserved register units in HMEditor even
  with UpdateFlags enabled.
- Skips verification of uses of reserved register units in the machine
  verifier (this usually didn't happen because there would be no cached
  live range, but there is no guarantee of that and I ran into this case
  before the HMEditor tweak, so we may as well fix the verifier too).
Note that this should only affect ARM's FPSCR/FPSCR_NZCV registers today;
aliased registers are rarely used. The only other cases are Hexagon's
P0-P3/P3_0 and C8/USR pairs, which do not mix reserved/non-reserved
registers in an alias.
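A sketch of the reserved-unit test described in the first bullet above (includes omitted; the actual MachineRegisterInfo::isReservedRegUnit may differ in detail):

  // A register unit is reserved if, for at least one of its root registers,
  // the root and all of that root's super-registers are reserved.
  static bool isRegUnitReserved(unsigned Unit,
                                const llvm::MachineRegisterInfo &MRI,
                                const llvm::TargetRegisterInfo &TRI) {
    for (llvm::MCRegUnitRootIterator Root(Unit, &TRI); Root.isValid(); ++Root) {
      bool AllReserved = true;
      for (llvm::MCSuperRegIterator Super(*Root, &TRI, /*IncludeSelf=*/true);
           AllReserved && Super.isValid(); ++Super)
        if (!MRI.isReserved(*Super))
          AllReserved = false;
      if (AllReserved)
        return true;
    }
    return false;
  }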
Sam Clegg [Fri, 1 Sep 2017 17:24:19 +0000 (17:24 +0000)]
[WebAssembly] Fix getSymbolValue for exported globals
The code wasn't previously taking into account that the
global index space is not the same as the index into the Globals
array, since the latter does not include imported globals.
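A rough sketch of the index translation this implies (names hypothetical):

  #include <cassert>
  #include <cstdint>

  // The symbol's global index counts imported globals first, but the
  // Globals array only holds globals defined in this module.
  uint32_t definedGlobalIndex(uint32_t GlobalIndex, uint32_t NumImportedGlobals) {
    assert(GlobalIndex >= NumImportedGlobals && "imported globals have no entry");
    return GlobalIndex - NumImportedGlobals;
  }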
llvm-isel-fuzzer: Make buildable and testable without libFuzzer
This adds a dummy main so we can build and run the llvm-isel-fuzzer
functionality when we aren't building LLVM with coverage. The approach
here should serve as a template to stop in-tree fuzzers from
bitrotting (See llvm.org/pr34314).
Note that I'll probably move most of the logic in DummyISelFuzzer's
`main` to a library so it's easy to reuse it in other fuzz targets,
but I'm planning on doing that in a follow up that also consolidates
argument handling in our LLVMFuzzerInitialize implementations.
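For reference, the usual shape of such a stand-alone driver (this is the generic libFuzzer convention, not necessarily the exact code added here): it simply feeds each file named on the command line to the fuzz target.

  #include <cstddef>
  #include <cstdint>
  #include <fstream>
  #include <iterator>
  #include <vector>

  extern "C" int LLVMFuzzerInitialize(int *argc, char ***argv);
  extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size);

  int main(int argc, char **argv) {
    LLVMFuzzerInitialize(&argc, &argv);
    for (int I = 1; I < argc; ++I) {
      std::ifstream In(argv[I], std::ios::binary);
      std::vector<char> Bytes((std::istreambuf_iterator<char>(In)),
                              std::istreambuf_iterator<char>());
      LLVMFuzzerTestOneInput(reinterpret_cast<const uint8_t *>(Bytes.data()),
                             Bytes.size());
    }
    return 0;
  }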
AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait states
Summary:
This fixes a bug that was exposed on gfx9 in various
GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests,
e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment
ModuleSummaryAnalysis: Correctly handle refs from function inline asm to module inline asm.
If a function contains inline asm and the module-level inline asm
contains the definition of a local symbol, prevent the function from
being imported in case the function-level inline asm refers to a
symbol in the module-level inline asm.
[LoopVectorizer] Use two step casting for float to pointer types.
Summary:
LoopVectorizer is creating casts between vec<ptr> and vec<float> types
on ARM when compiling OpenCV. It is illegal to directly cast a
floating-point type to a pointer type even if the types have the same
size, causing a crash. Fix the crash by using a two-step cast: bitcast
to integer, then integer to pointer/float.
Fixes PR33804.
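A minimal sketch of the two-step cast with IRBuilder, assuming the vector value and destination types are already in hand (names hypothetical):

  #include "llvm/IR/IRBuilder.h"

  // Direct <N x float> -> <N x ptr> bitcasts are illegal; go through an
  // integer vector of the same bit width instead.
  llvm::Value *castFloatVecToPtrVec(llvm::IRBuilder<> &B, llvm::Value *FloatVec,
                                    llvm::VectorType *IntVecTy,
                                    llvm::VectorType *PtrVecTy) {
    llvm::Value *AsInt = B.CreateBitCast(FloatVec, IntVecTy); // <N x float> -> <N x iM>
    return B.CreateIntToPtr(AsInt, PtrVecTy);                 // <N x iM>  -> <N x ptr>
  }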
%q = udiv <ty> %t, %b
%s = mul <ty> nuw %q, %b
%r = sub <ty> nuw %t, %s ; (t / b) * b + (t % b) = t
As UDiv, Mul and Sub are already supported by SCEV, URem can be implemented
with minimal effort using that relation:
%r --> (-%b * (%t /u %b)) + %t
We implement two special cases:
- if %b is 1, the result is always 0
- if %b is a power-of-two, we produce a zext/trunc based expression instead
That is, the following code:
%r = urem i32 %t, 65536
Produces:
%r --> (zext i16 (trunc i32 %t to i16) to i32)
Note that while this helps get a tighter bound on the range analysis and the
known-bits analysis, it exposes some normalization shortcomings of SCEV:
Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues addressed since original review:
- Moved removal of dead instructions found by
LiveIntervals::shrinkToUses() outside of the loop iterating over
instructions, to avoid instructions being deleted while still pointed to
by the iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
Test constants as well in the PIC tests. These are also represented as
G_GLOBAL_VALUE, and although they are treated just like other globals
for PIC, they won't be for ROPI, so it's good to have this coverage.
Debug info for variables whose type is shrunk to bool
This patch provides debug information for integer
variables whose type is shrunk to bool, by emitting a
DWARF expression which returns either the constant initial
value or the other value.
[MergeICmps] MergeICmps is a new optimization pass that turns chains of integer
comparisons into memcmp.
Thanks to recent improvements in the LLVM codegen, the memcmp is typically
inlined as a chain of efficient hardware comparisons.
This typically benefits C++ member or nonmember operator==().
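For example, an equality operator over adjacent integer members is the kind of code this targets:

  struct Point { int X, Y, Z; };

  bool operator==(const Point &A, const Point &B) {
    // A chain of integer comparisons over contiguous members; MergeICmps can
    // turn this into a single block comparison, roughly equivalent to
    // memcmp(&A, &B, sizeof(Point)) == 0, which codegen then inlines as
    // efficient hardware comparisons.
    return A.X == B.X && A.Y == B.Y && A.Z == B.Z;
  }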
For now this is disabled by default until:
- https://bugs.llvm.org/show_bug.cgi?id=33329 is complete
- Benchmarks show that this is always useful.
[AVX512] Suppress duplicate register only FMA patterns.
Previously we generated a register-only pattern for each of the 3 instruction forms, but they are all identical as far as isel is concerned. So drop the others and just keep the 213 version.
Summary:
Before https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.9.46&id=84638335900f1995495838fe1bd4870c43ec1f67
the test worked because memory allocated with mmap was not counted against RLIMIT_DATA.
Craig Topper [Thu, 31 Aug 2017 21:39:23 +0000 (21:39 +0000)]
[X86] Don't pull carry through X86ISD::ADD carryin, -1 if we can't guarantee we're really using the carry flag from the add.
Prior to this patch we had a DAG combine that tried to bypass an X86ISD::ADD with -1 being added to the carry flag of some previous operation. We would then pass the carry flag directly to the user.
But this is only safe if the user is looking for the carry flag and not the zero flag.
So we need to only do this combine in a context where we know what flag the consumer is using.
This was caused by the Key value in DiagnosticInfoOptimizationBase being
deallocated before emitting the remarks defined in MachineOutliner.cpp. As of
r312277 this should no longer be an issue.
Jessica Paquette [Thu, 31 Aug 2017 20:47:37 +0000 (20:47 +0000)]
[NFC] Change Key in Argument to a std::string
Before, Key was a StringRef to avoid unnecessary copies. This commit changes
that to a std::string.
This was okay previously because when people called emit for remarks before,
they would create the remark *within* the call to emit. However, if you build
the remark up and call emit *afterward*, it's possible to end up freeing the
memory assigned to the StringRef before the call to emit.
This caused a test failure with https://reviews.llvm.org/D37085 on Linux.
Since building remarks before a call to emit is a valid use-case, it makes
sense to replace this with a std::string.
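A simplified illustration of the lifetime hazard, with hypothetical names (not the actual remark-building code):

  #include <string>
  #include "llvm/ADT/StringRef.h"

  struct RemarkArg {
    llvm::StringRef Key; // before this change: a non-owning view
    // std::string Key;  // after this change: the argument owns its key
  };

  RemarkArg makeArg() {
    std::string Key = "SomeKey";  // local owner of the bytes (hypothetical)
    return RemarkArg{Key};        // the StringRef dangles once Key is destroyed
  }
  // Emitting a remark built this way reads freed memory; owning the key as a
  // std::string keeps it alive until the later call to emit.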