The static analyzer is warning about potential null dereferences, but in these cases we should be able to use cast<SCEVConstant> directly and if not assert will fire for us.
James Molloy [Wed, 2 Oct 2019 12:46:44 +0000 (12:46 +0000)]
[ModuloSchedule] Peel out prologs and epilogs, generate actual code
Summary:
This extends the PeelingModuloScheduleExpander to generate prolog and epilog code,
and correctly stitch uses through the prolog, kernel, epilog DAG.
The key concept in this patch is to ensure that all transforms are *local*; only a
function of a block and its immediate predecessor and successor. By defining the problem in this way
we can inductively rewrite the entire DAG using only local knowledge that is easy to
reason about.
For example, we assume that all prologs and epilogs are near-perfect clones of the
steady-state kernel. This means that if a block has an instruction that is predicated out,
we can redirect all users of that instruction to that equivalent instruction in our
immediate predecessor. As all blocks are clones, every instruction must have an equivalent in
every other block.
Similarly we can make the assumption by construction that if a value defined in a block is used
outside that block, the only possible user is its immediate successors. We maintain this
even for values that are used outside the loop by creating a limited form of LCSSA.
This code isn't small, but it isn't complex.
Enabled a bunch of testing from Hexagon. There are a couple of tests not enabled yet;
I'm about 80% sure there isn't buggy codegen but the tests are checking for patterns
that we don't produce. Those still need a bit more investigation. In the meantime we
(Google) are happy with the code produced by this on our downstream SMS implementation,
and believe it generates correct code.
Florian Hahn [Wed, 2 Oct 2019 12:32:52 +0000 (12:32 +0000)]
[InstCombine] Simplify fma multiplication to nan for undef or nan operands.
In similar fashion to D67721, we can simplify FMA multiplications if any
of the operands is NaN or undef. In instcombine, we will simplify the
FMA to an fadd with a NaN operand, which in turn gets folded to NaN.
Note that this just changes SimplifyFMAFMul, so we still do not catch the
case where only the Add part of the FMA is NaN/undef.
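The fold rests on an IEEE-754 property: a fused multiply-add with a NaN in either multiplication operand evaluates to NaN. A minimal standalone C++ sketch of that property follows. It is not the InstCombine code, it covers only NaN (not undef), and `fmaMulHasNaN` and `checkedFma` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: true when either multiplication operand of
// fma(a, b, c) is NaN, i.e. when the whole FMA is known to produce NaN.
bool fmaMulHasNaN(double a, double b) {
  return std::isnan(a) || std::isnan(b);
}

// Evaluates the FMA and checks the prediction made by fmaMulHasNaN.
double checkedFma(double a, double b, double c) {
  double r = std::fma(a, b, c);
  // If a multiplication operand is NaN, the result must be NaN as well.
  assert(!fmaMulHasNaN(a, b) || std::isnan(r));
  return r;
}
```

Calling `checkedFma` with `std::numeric_limits<double>::quiet_NaN()` as the first or second argument returns NaN whatever the addend is, which is what lets a compiler replace such an FMA with a NaN constant.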
Sanjay Patel [Wed, 2 Oct 2019 12:12:02 +0000 (12:12 +0000)]
[InstSimplify] fold fma/fmuladd with a NaN or undef operand
This is intended to be similar to the constant folding results from
D67446
and earlier, but not all operands are constant in these tests, so the
responsibility for folding is left to InstSimplify.
Hans Wennborg [Wed, 2 Oct 2019 12:08:44 +0000 (12:08 +0000)]
Revert r373431 "Switch lowering: omit range check for bit tests when default is unreachable (PR43129)"
This broke http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/19967
> Switch lowering: omit range check for bit tests when default is unreachable (PR43129)
>
> This is modeled after the same functionality for jump tables, which was
> added in r357067.
>
> Differential revision: https://reviews.llvm.org/D68131
The static analyzer is warning about a potential null dereference, but we should be able to use cast<Function> directly and if not assert will fire for us.
The static analyzer is warning about a potential null dereference, but we know that the source won't be null so just use dyn_cast, which will assert if the value somehow is actually null.
David Green [Wed, 2 Oct 2019 11:40:51 +0000 (11:40 +0000)]
[ARM] Identity shuffles are legal
Identity shuffles, of the form (0, 1, 2, 3, ...), are perfectly OK under MVE
(they essentially just become bitcasts). We were not catching that in the
existing set of what we considered legal though. On NEON, they would be covered
by vext's, but that is not generally available in MVE.
This uses ShuffleVectorInst::isIdentityMask which is a little odd to use here
but does what we want and prevents us from just rewriting what is the same
function.
[IntrinsicEmitter] Add overloaded type VecOfBitcastsToInt for SVE intrinsics
Summary:
This allows intrinsics such as the following to be defined:
- declare <n x 4 x i32> @llvm.something.nxv4f32(<n x 4 x i32>, <n x 4 x i1>, <n x 4 x float>)
...where <n x 4 x i32> is derived from <n x 4 x float>, but
the element needs bitcasting to int.
Jay Foad [Wed, 2 Oct 2019 08:44:15 +0000 (08:44 +0000)]
[AMDGPU] Make printf lowering faster when there are no printfs
Summary:
Printf lowering unconditionally visited every instruction in the module.
To make it faster in the common case where there are no printfs, look up
the printf function (if any) and iterate over its users instead.
Florian Hahn [Wed, 2 Oct 2019 07:37:41 +0000 (07:37 +0000)]
[Local] Simplify function removeUnreachableBlocks() to avoid (re-)computation.
Two small changes in llvm::removeUnreachableBlocks() to avoid unnecessary (re-)computation.
First, replace the use of count() with find(), which has better time complexity.
Second, because we have already computed the set of dead blocks, replace the second loop over all basic blocks with a loop over only the already computed dead blocks. This simplifies the loop and avoids recomputation.
Patch by Rodrigo Caetano Rocha <rcor.cs@gmail.com>
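The shape of the second change can be sketched on a toy CFG. This is a standalone illustration, not the code in llvm::removeUnreachableBlocks(); `ToyCFG`, `findDeadBlocks` and `detachDeadBlocks` are hypothetical names, and the container used here is an assumption, so the complexity remark about count() versus find() does not carry over to it.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy CFG: block i has successor indices Succs[i]; block 0 is the entry.
using ToyCFG = std::vector<std::vector<std::size_t>>;

// Returns the unreachable blocks, in increasing index order.
std::vector<std::size_t> findDeadBlocks(const ToyCFG &Succs) {
  assert(!Succs.empty() && "expected an entry block");
  std::unordered_set<std::size_t> Reachable;
  std::vector<std::size_t> Worklist{0};
  while (!Worklist.empty()) {
    std::size_t BB = Worklist.back();
    Worklist.pop_back();
    if (!Reachable.insert(BB).second)
      continue; // already visited
    for (std::size_t S : Succs[BB])
      Worklist.push_back(S);
  }
  std::vector<std::size_t> Dead;
  for (std::size_t BB = 0; BB < Succs.size(); ++BB)
    if (Reachable.find(BB) == Reachable.end())
      Dead.push_back(BB);
  return Dead;
}

// Clears the successor lists of the dead blocks, visiting only Dead
// rather than rescanning every block in the function.
void detachDeadBlocks(ToyCFG &Succs, const std::vector<std::size_t> &Dead) {
  for (std::size_t BB : Dead)
    Succs[BB].clear();
}
```

The dead set is computed once and then reused, so the cleanup loop is proportional to the number of dead blocks instead of the size of the function.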
The tool reports verbose output for the DWARF debug location coverage.
For each variable or formal parameter DIE, llvm-locstats computes the
percentage of the code section bytes in its scope for which it has a
location description. Line 0 shows the number (and the percentage) of
DIEs with no location information, while line 100 shows the number (and
the percentage) of DIEs that have location information for all code
section bytes where the variable or parameter is in scope. Line 50..59
shows the number (and the percentage) of DIEs whose location information
covers between 50 and 59 percent of their scope.
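The bucketing described above can be sketched as a small function. This is an illustration only, not the tool's implementation: `coverageBucket` is a hypothetical name, only the "0", "100" and "50..59" labels come from the description, and the remaining bucket boundaries and the round-down behaviour are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Maps a DIE's coverage to a report line label. The "0", "100" and
// "50..59" labels come from the description above; the other bucket
// boundaries and the rounding are assumptions of this sketch.
std::string coverageBucket(std::uint64_t CoveredBytes,
                           std::uint64_t ScopeBytes) {
  assert(ScopeBytes > 0 && CoveredBytes <= ScopeBytes);
  if (CoveredBytes == 0)
    return "0";
  if (CoveredBytes == ScopeBytes)
    return "100";
  std::uint64_t Pct = CoveredBytes * 100 / ScopeBytes; // rounds down
  std::uint64_t Lo = Pct / 10 * 10;
  std::uint64_t Hi = Lo + 9;
  if (Lo == 0)
    Lo = 1; // assumed first partial bucket: 1..9
  return std::to_string(Lo) + ".." + std::to_string(Hi);
}
```

A variable with a location for 22 of the 40 bytes in its scope has 55 percent coverage and lands on the 50..59 line.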
Rui Ueyama [Wed, 2 Oct 2019 05:24:24 +0000 (05:24 +0000)]
[llvm-lib] Correctly handle .lib input files
If archive files are passed as input files, llvm-lib needs to append
the members of the input archive files to the output file. This patch
implements that behavior.
This patch splits an existing function into smaller functions.
Effectively, the new code is only `if (Magic == file_magic::archive)
{ ... }` part.
Craig Topper [Wed, 2 Oct 2019 04:45:02 +0000 (04:45 +0000)]
[X86] Add broadcast load folding patterns to the NoVLX compare patterns.
These patterns use zmm registers for 128/256-bit compares when
the VLX instructions aren't available. Previously we only
supported registers, but as PR36191 notes we can fold broadcast
loads, but not regular loads.
Matt Arsenault [Wed, 2 Oct 2019 01:02:24 +0000 (01:02 +0000)]
AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEX
In principle this should behave as any other constant. However
eliminateFrameIndex currently assumes a VALU use and uses a vector
shift. Work around this by selecting to VGPR for now until
eliminateFrameIndex is fixed.
Craig Topper [Tue, 1 Oct 2019 23:18:31 +0000 (23:18 +0000)]
[X86] Add a DAG combine to shrink vXi64 gather/scatter indices that are constant with sufficient sign bits to fit in vXi32
The gather/scatter instructions can implicitly sign extend the indices. If we're operating on 32-bit data, a v16i64 index can force a v16i32 gather to be split in two since the index needs 2 registers. If we can shrink the index to i32 we can avoid the split. It should always be safe to shrink the index regardless of the number of elements. We have gather/scatter instructions that can use a v2i32 index stored in a v4i32 register with v2i64 data size.
I've limited this to before legalize types to avoid creating a v2i32 after type legalization. We could check for it, but we'd also need testing. I'm also only handling build_vectors with no bitcasts to be sure the truncate will constant fold.
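The legality condition (every constant index has enough sign bits to fit in 32 bits) can be sketched outside of SelectionDAG. This is a standalone illustration, not the X86 DAG combine; `shrinkConstantIndices` is a hypothetical name and plain integer vectors stand in for build_vector nodes.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Narrows constant 64-bit indices to 32 bits. Returns true and fills
// Narrow when every index survives sign extension back to 64 bits;
// otherwise returns false and leaves Narrow empty.
bool shrinkConstantIndices(const std::vector<std::int64_t> &Indices,
                           std::vector<std::int32_t> &Narrow) {
  Narrow.clear();
  for (std::int64_t Idx : Indices) {
    if (Idx < std::numeric_limits<std::int32_t>::min() ||
        Idx > std::numeric_limits<std::int32_t>::max()) {
      Narrow.clear();
      return false; // too few sign bits to fit in i32
    }
    Narrow.push_back(static_cast<std::int32_t>(Idx));
  }
  assert(Narrow.size() == Indices.size());
  return true;
}
```

Because the hardware sign extends the narrowed index again, the addresses computed from `Narrow` match the ones computed from the original 64-bit constants.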
Craig Topper [Tue, 1 Oct 2019 22:40:03 +0000 (22:40 +0000)]
Revert r373172 "[X86] Add custom isel logic to match VPTERNLOG from 2 logic ops."
This seems to be causing some performance regressions that I'm
trying to investigate.
One thing that stands out is that this transform can increase
the live range of the operands of the earlier logic op. This
can be bad for register allocation. If there are two logic
op inputs we should really combine the one that is closest, but
SelectionDAG doesn't have a good way to do that. Maybe we need
to do this as a basic block transform in Machine IR.
[FileCheck] Move private interface to its own header
Summary:
Most of the class definitions in llvm/include/llvm/Support/FileCheck.h
are actually implementation details that should not be relied upon. This
commit moves all of them into a new header file under
llvm/lib/Support/FileCheck. It also takes advantage of the code movement
to put the code into a new llvm::filecheck namespace.
Leonard Chan [Tue, 1 Oct 2019 20:49:07 +0000 (20:49 +0000)]
[ASan][NFC] Address remaining comments for https://reviews.llvm.org/D68287
I submitted that patch after I got the LGTM, but the comments didn't
appear until after I submitted the change. This adds `const` to the
constructor argument and makes it a pointer.
Leonard Chan [Tue, 1 Oct 2019 20:30:46 +0000 (20:30 +0000)]
[ASan] Make GlobalsMD member a const reference.
PR42924 points out that copying the GlobalsMetadata type during
construction of AddressSanitizer can result in extremely lengthened
build times for translation units that have many globals. This can be addressed
by just making the GlobalsMD member in AddressSanitizer a reference to
avoid the copy. The GlobalsMetadata type is already passed to the
constructor as a reference anyway.
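The effect of holding the member by reference can be sketched with a toy type that counts its copies. None of this is the AddressSanitizer code; `ToyGlobalsMetadata`, `PassWithCopy` and `PassWithRef` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy metadata type that counts how often it is copied.
struct ToyGlobalsMetadata {
  static int Copies;
  std::map<std::string, bool> Entries;
  ToyGlobalsMetadata() = default;
  ToyGlobalsMetadata(const ToyGlobalsMetadata &Other)
      : Entries(Other.Entries) {
    ++Copies;
  }
};
int ToyGlobalsMetadata::Copies = 0;

// Member held by value: constructing the pass copies the whole table.
struct PassWithCopy {
  ToyGlobalsMetadata GlobalsMD;
  explicit PassWithCopy(const ToyGlobalsMetadata &MD) : GlobalsMD(MD) {}
};

// Member held by const reference: no copy is made. The caller must keep
// MD alive for as long as the pass object is in use.
struct PassWithRef {
  const ToyGlobalsMetadata &GlobalsMD;
  explicit PassWithRef(const ToyGlobalsMetadata &MD) : GlobalsMD(MD) {}
};
```

Constructing `PassWithCopy` increments `ToyGlobalsMetadata::Copies`, while constructing `PassWithRef` leaves it unchanged. The price of the reference member is a lifetime requirement on the referenced object.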
Bardia Mahjour [Tue, 1 Oct 2019 19:32:42 +0000 (19:32 +0000)]
[DDG] Data Dependence Graph - Root Node
Summary:
This patch adds Root Node to the DDG. The purpose of the root node is to create a single entry node that allows graph walk iterators to iterate through all nodes of the graph, making sure that no node is left unvisited during a graph walk (e.g. SCC or DFS). Once the DDG is fully constructed it will have exactly one root node. Every node in the graph is reachable from the root. The algorithm for connecting the root node is based on a depth-first search that keeps track of visited nodes to try to avoid creating unnecessary edges.
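One way to read the connection algorithm is sketched below on a toy adjacency-list graph. This is an assumption-laden illustration, not the DDG builder: `ToyGraph`, `markReachable` and `addRootNode` are hypothetical names, and the node visiting order is simply index order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy graph: Succs[i] lists the successors of node i.
using ToyGraph = std::vector<std::vector<std::size_t>>;

// Marks every node reachable from N as visited (iterative DFS).
void markReachable(const ToyGraph &G, std::size_t N,
                   std::vector<bool> &Visited) {
  assert(N < G.size() && Visited.size() == G.size());
  std::vector<std::size_t> Stack{N};
  while (!Stack.empty()) {
    std::size_t Cur = Stack.back();
    Stack.pop_back();
    if (Visited[Cur])
      continue;
    Visited[Cur] = true;
    for (std::size_t S : G[Cur])
      Stack.push_back(S);
  }
}

// Appends a root node and returns its index. The root gets an edge to
// each node that was not already reached from an earlier root edge.
std::size_t addRootNode(ToyGraph &G) {
  const std::size_t NumNodes = G.size();
  std::vector<bool> Visited(NumNodes, false);
  std::vector<std::size_t> RootEdges;
  for (std::size_t N = 0; N < NumNodes; ++N) {
    if (Visited[N])
      continue;
    RootEdges.push_back(N);
    markReachable(G, N, Visited);
  }
  G.push_back(RootEdges);
  return NumNodes;
}
```

Tracking visited nodes skips anything already reachable through an earlier root edge. The result is not guaranteed to be minimal: a node visited early may later turn out to be reachable from another root successor.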
Alina Sbirlea [Tue, 1 Oct 2019 19:09:50 +0000 (19:09 +0000)]
[MemorySSA] Check for unreachable blocks when getting last definition.
If a single predecessor is found, still check if the block is
unreachable. The test that found this had a self loop unreachable block.
Resolves PR43493.
Alina Sbirlea [Tue, 1 Oct 2019 18:34:39 +0000 (18:34 +0000)]
[MemorySSA] Update last_access_in_block check.
The check for "was there an access in this block" should be: is the last
access in this block and is it not a newly inserted phi.
Resolves new test in PR43438.
Also fix a typo when simplifying trivial Phis to match the comment.
Jakub Kuderski [Tue, 1 Oct 2019 18:27:17 +0000 (18:27 +0000)]
[Dominators][CodeGen] Fix MachineDominatorTree preservation in PHIElimination
Summary:
PHIElimination modifies the CFG and marks MachineDominatorTree as preserved. Therefore, if the CFG changes it should also update the MDT, when available. This patch teaches PHIElimination to recalculate the MDT when necessary.
This fixes the `tailmerging_in_mbp.ll` test failure discovered after switching to generic DomTree verification algorithm in MachineDominators in D67976.
Philip Reames [Tue, 1 Oct 2019 17:03:44 +0000 (17:03 +0000)]
[IndVars] An implementation of loop predication without a need for speculation
This patch implements a variation of a well known technique for JIT compilers - we have an implementation in tree as LoopPredication - but with an interesting twist. This version does not assume the ability to execute a path which wasn't taken in the original program (such as a guard or widenable.condition intrinsic). The benefit is that this works for arbitrary IR from any frontend (including C/C++/Fortran). The tradeoff is that it's restricted to read only loops without implicit exits.
This builds on SCEV, and can thus eliminate the loop varying portion of any early exit where all exits are understandable by SCEV. A key advantage is that fixing deficiencies exposed in SCEV - already found one while writing test cases - will also benefit all of full redundancy elimination (and most other loop transforms).
I haven't seen anything in the literature which quite matches this. Given that, I'm not entirely sure that keeping the name "loop predication" is helpful. Anyone have suggestions for a better name? This is analogous to partial redundancy elimination - since we remove the condition flowing around the backedge - and has some parallels to our existing transforms which try to make conditions invariant in loops.
Factoring wise, I chose to put this in IndVarSimplify since it's generally applicable to all workloads. I could split this off into its own pass, but we'd then probably want to add that new pass every place we use IndVars. One solid argument for splitting it off into its own pass is that this transform is "too good". It breaks a huge number of existing IndVars test cases as they tend to be simple read only loops. At the moment, I've left it off by default, but if we add this to IndVars and enable it, we'll have to update around 20 test files to add side effects or disable this transform.
Near term plan is to fuzz this extensively while off by default, reflect and discuss on the factoring issue mentioned just above, and then enable by default. I also need to give some thought to supporting widenable conditions in this framing.
Craig Topper [Tue, 1 Oct 2019 16:28:20 +0000 (16:28 +0000)]
[X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load broadcasted to a vector.
Summary:
This adds the ISD opcode and a DAG combine to create it. There are
probably some places where we can directly create it, but I'll
leave that for future work.
This updates all of the isel patterns to look for this new node.
I had to add a few additional isel patterns for aligned extloads
which we should probably fix with a DAG combine or something. This
does mean that the broadcast load folding for avx512 can no
longer match a broadcasted aligned extload.
There's still some work to do here for combining a broadcast of
a broadcast_load. We also need to improve extractelement or
demanded vector elements of a broadcast_load. I'll try to get
those done before I submit this patch.
Simon Pilgrim [Tue, 1 Oct 2019 15:32:04 +0000 (15:32 +0000)]
[DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks.
The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing them to work with target specific combines and opcodes (e.g. X86's FMA variants).
Unlike SimplifyDemandedBits, we can't just handle target nodes through a Target callback; we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations have to duplicate some checks (recursion depth etc.).
Partial reversion of rL372756 - I've identified the infinite loop issue inside the X86 override but haven't fixed it yet so I've only (re)committed the common TargetLowering refactoring part of the patch.
Simon Pilgrim [Tue, 1 Oct 2019 13:39:43 +0000 (13:39 +0000)]
Revert rL349624 : Let TableGen write output only if it changed, instead of doing so in cmake, attempt 2
Differential Revision: https://reviews.llvm.org/D55842
-----------------
As discussed on PR43385 this is causing Visual Studio msbuilds to perpetually rebuild all tablegen generated files
Michal Gorny [Tue, 1 Oct 2019 13:02:48 +0000 (13:02 +0000)]
[llvm-exegesis/lib] Fix missing linkage to MCParser
Otherwise, shared-lib build fails with:
lib64/libLLVMExegesis.a(SnippetFile.cpp.o): In function `llvm::exegesis::readSnippets(llvm::exegesis::LLVMState const&, llvm::StringRef)':
SnippetFile.cpp:(.text._ZN4llvm8exegesis12readSnippetsERKNS0_9LLVMStateENS_9StringRefE+0x31f): undefined reference to `llvm::createMCAsmParser(llvm::SourceMgr&, llvm::MCContext&, llvm::MCStreamer&, llvm::MCAsmInfo const&, unsigned int)'
SnippetFile.cpp:(.text._ZN4llvm8exegesis12readSnippetsERKNS0_9LLVMStateENS_9StringRefE+0x41c): undefined reference to `llvm::MCAsmParser::setTargetParser(llvm::MCTargetAsmParser&)'
collect2: error: ld returned 1 exit status
The static analyzer is warning about a potential null dereference, but we should be able to use cast<COFFObjectFile> directly and if not assert will fire for us.
The static analyzer is warning about a potential null dereference. As we already early-out for a null Constant pointer, I've just folded this into a dyn_cast_or_null<ConstantInt>.
Simon Pilgrim [Tue, 1 Oct 2019 10:22:01 +0000 (10:22 +0000)]
ConstantFold - ConstantFoldSelectInstruction - assume constant vector elements are constant. NFCI.
Goes a bit further than rL372743 which added the early out - elements should be Constant so use cast<Constant> instead (and rely on the assert if anything fails).
The tool reports verbose output for the DWARF debug location coverage.
For each variable or formal parameter DIE, llvm-locstats computes the
percentage of the code section bytes in its scope for which it has a
location description. Line 0 shows the number (and the percentage) of
DIEs with no location information, while line 100 shows the number (and
the percentage) of DIEs that have location information for all code
section bytes where the variable or parameter is in scope. Line 50..59
shows the number (and the percentage) of DIEs whose location information
covers between 50 and 59 percent of their scope.
Dmitri Gribenko [Tue, 1 Oct 2019 08:24:01 +0000 (08:24 +0000)]
Revert "GlobalISel: Handle llvm.read_register"
This reverts commit r373294. It broke Clang's
CodeGen/arm64-microsoft-status-reg.cpp:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/18483
Craig Topper [Tue, 1 Oct 2019 07:10:09 +0000 (07:10 +0000)]
[X86] Consider isCodeGenOnly in the EVEX2VEX pass to make VMAXPD/PS map to the non-commutable VEX instruction. Use EVEX2VEX override to fix the scalar instructions.
Previously the match was ambiguous and VMAXPS/PD and VMAXCPS/PD
were mapped to the same VEX instruction. But we should keep
the commutableness when changing the opcode.
Heejin Ahn [Tue, 1 Oct 2019 06:53:28 +0000 (06:53 +0000)]
[WebAssembly] Make sure EH pads are preferred in sorting
Summary:
In CFGSort, we try to make EH pads have higher priorities as soon as
they are ready to be sorted, to prevent creation of unwind destination
mismatches in CFGStackify. We did that by making priority queues'
comparison function prefer EH pads, but it was possible for an EH pad
to be popped from `Preferred` queue and then not sorted immediately and
enter `Ready` queue instead in a certain condition. This patch makes
sure that special condition does not consider EH pads as its candidates.
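The "comparison function prefers EH pads" part can be sketched with std::priority_queue. This is a toy model, not CFGSort: `ToyBlock`, `PreferEHPads` and `popOrder` are hypothetical, the tie-break on block number is an assumption, and the special condition fixed by this patch is not modelled.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Toy block: a number and a flag saying whether it is an EH pad.
struct ToyBlock {
  int Number;
  bool IsEHPad;
};

// priority_queue pops the "largest" element, so returning true means A
// is popped after B. EH pads outrank other blocks; ties go to the lower
// block number (an assumption of this sketch).
struct PreferEHPads {
  bool operator()(const ToyBlock &A, const ToyBlock &B) const {
    if (A.IsEHPad != B.IsEHPad)
      return B.IsEHPad;
    return A.Number > B.Number;
  }
};

using ToyQueue =
    std::priority_queue<ToyBlock, std::vector<ToyBlock>, PreferEHPads>;

// Drains a copy of the queue and returns the block numbers in pop order.
std::vector<int> popOrder(ToyQueue Q) {
  std::vector<int> Order;
  while (!Q.empty()) {
    Order.push_back(Q.top().Number);
    Q.pop();
  }
  assert(Q.empty());
  return Order;
}
```

With such a comparator an EH pad is always at the top of the queue once it is ready. The bug described above came from what happened after the pop, which a comparator alone cannot prevent.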
Heejin Ahn [Tue, 1 Oct 2019 06:21:53 +0000 (06:21 +0000)]
[WebAssembly] Unstackify regs after fixing unwinding mismatches
Summary:
Fixing unwind mismatches for exception handling can result in splicing
existing BBs and moving some of the instructions to new BBs. In this case
some of the stackified def registers in the original BB can be used in the
split BB. For example, we have this BB and suppose %r0 is a stackified
register.
```
bb.1:
%r0 = call @foo
... use %r0 ...
```
After fixing unwind mismatches in CFGStackify, `bb.1` can be split and
some instructions can be moved to a newly created BB:
```
bb.1:
%r0 = call @foo
bb.split (new):
... use %r0 ...
```
In this case we should make %r0 un-stackified, because its use is now in
another BB.
When splitting a BB, this CL unstackifies all def registers that have
uses in the new split BB.
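The rule can be sketched on a toy instruction list. This is an illustration under assumptions, not the CFGStackify code: `ToyInst` and `unstackifyAcrossSplit` are hypothetical, and registers are plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy instruction: the register it defines (-1 for none) and the
// registers it uses.
struct ToyInst {
  int Def;
  std::vector<int> Uses;
};

// Models splitting Block before instruction SplitAt: removes from
// Stackified every register that is defined in the first part and used
// in the second part.
void unstackifyAcrossSplit(const std::vector<ToyInst> &Block,
                           std::size_t SplitAt,
                           std::set<int> &Stackified) {
  assert(SplitAt <= Block.size());
  std::set<int> DefinedBefore;
  for (std::size_t I = 0; I < SplitAt; ++I)
    if (Block[I].Def >= 0)
      DefinedBefore.insert(Block[I].Def);
  for (std::size_t I = SplitAt; I < Block.size(); ++I)
    for (int U : Block[I].Uses)
      if (DefinedBefore.count(U))
        Stackified.erase(U);
}
```

A register whose def and uses all stay on the same side of the split keeps its stackified status; only values that now cross the block boundary are demoted.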
Aditya Kumar [Tue, 1 Oct 2019 03:45:09 +0000 (03:45 +0000)]
[OCaml] Handle nullptr in Llvm.global_initializer
LLVMGetInitializer returns nullptr in case there is no
initializer. There is not much that can be done with nullptr in OCaml,
not even test if it is null. Also, there does not seem to be a C or
OCaml API to test if there is an initializer. So this diff changes
Llvm.global_initializer to return an option.
Matt Arsenault [Tue, 1 Oct 2019 02:07:16 +0000 (02:07 +0000)]
GlobalISel: Handle llvm.read_register
SelectionDAG has a bunch of machinery to defer this to selection time
for some reason. Just directly emit a copy during IRTranslator. The
x86 usage does somewhat questionably check hasFP, which could depend
on the whole function being at minimum translated.
This does lose the convergent bit if the callsite had it, which may be
a problem. We also lose that in general for intrinsics, which may also
be a problem.
Matt Arsenault [Tue, 1 Oct 2019 01:44:39 +0000 (01:44 +0000)]
TLI: Remove DAG argument from getRegisterByName
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.
The X86 use of the function seems questionable to me. It checks hasFP,
before frame lowering.
Fangrui Song [Tue, 1 Oct 2019 01:31:15 +0000 (01:31 +0000)]
[llvm-readobj/llvm-readelf] Delete --arm-attributes (alias for --arch-specific)
D68110 added --arch-specific (supported by GNU readelf) and made
--arm-attributes an alias for it. The tests were later migrated to use
--arch-specific.
Note, llvm-readelf --arch-specific currently just uses llvm-readobj
style output for ARM attributes. The readelf-style output is not
implemented.
Craig Topper [Tue, 1 Oct 2019 01:27:52 +0000 (01:27 +0000)]
[X86] Add test case to show missed opportunity to shrink a constant index to a gather in order to avoid splitting.
Also add a test case for an index that could be shrunk, but
would create a narrow type. We can go ahead and do it; we just
need to be before type legalization.