[MemorySSA] Extend allowed behavior for simplified instructions.
Summary:
LoopRotate may simplify instructions, leading to the new instructions not having memory accesses created for them.
Allow this behavior, by allowing the new access to be null when the template is null, and looking upwards for the proper defined access when dealing with simplified instructions.
Michael Liao [Tue, 30 Jul 2019 19:52:01 +0000 (19:52 +0000)]
[NVPTX] Fix PR41651
Summary:
- Use the passed `DL` directly as retrieving data layout from CS by
checking the called function is not reliable. Under indirect function
call, there is no called function.
[dsymutil] Pass LinkOptions by value instead of const ref.
When looping over the difference architectures in a fat binary, we
modify the link options before dispatching the link step to a different
thread. Passing the options by cont reference is not thread safe, as we
might modify its fields before the whole sturct is copied over.
Given that the link options are already stored in the DwarfLinker, we
can easily fix this by passing a copy of the link options instead of a
reference, which would just get copied later on.
[FunctionAttrs] Annotate "willreturn" for AssumeLikeInst
Summary:
In D37215, AssumeLikeInstruction are regarded as `willreturn`. In this patch, annotation is added to those which don't have `willreturn` now(`sideeffect, object_size, experimental_widenable_condition`).
Thomas Lively [Tue, 30 Jul 2019 18:08:39 +0000 (18:08 +0000)]
[WebAssembly] Do not emit tail calls with return type mismatch
Summary:
return_call and return_call_indirect are only valid if the return
types of the callee and caller match. We were previously not enforcing
that, which was producing invalid modules.
LLVM_ALIGNAS was removed from this class in http://llvm.org/r338099 but the comment was left there. The class is still sommewhat relevant despite the comment, let's keep it there with its one use.
[GVN] Preserve loop related analysis/canonical forms.
LoopInfo can be easily preserved by passing it to the functions that
modify the CFG (SplitCriticalEdge and MergeBlockIntoPredecessor.
SplitCriticalEdge also preserves LoopSimplify and LCSSA form when when passing in
LoopInfo. The test case shows that we preserve LoopSimplify and
LoopInfo. Adding addPreservedID(LCSSAID) did not preserve LCSSA for some
reason.
Also I am not sure if it is possible to preserve those in the new pass
manager, as they aren't analysis passes.
[LoopFusion] Extend use of OptimizationRemarkEmitter
Summary:
This patch extends the use of the OptimizationRemarkEmitter to provide
information about loops that are not fused, and loops that are not eligible for
fusion. In particular, it uses the OptimizationRemarkAnalysis to identify loops
that are not eligible for fusion and the OptimizationRemarkMissed to identify
loops that cannot be fused.
It also reuses the statistics to provide the messages used in the
OptimizationRemarks. This provides common message strings between the
optimization remarks and the statistics.
I would like feedback on this approach, in general. If people are OK with this,
I will flesh out additional remarks in subsequent commits.
The @srem_of_srem_expanded case exposed a RAUW pitfall in D65298.
Right now these don't appear to fail verification,
so it should be safe to precommit them.
Sean Fertile [Tue, 30 Jul 2019 15:37:01 +0000 (15:37 +0000)]
Address post commit review comments on revision 366727.
Addresses number of comment made on D64652 after commiting:
- Reorders function decls in the TargetLoweringObjectFileXCOFF class.
- Fix comment in MCSectionXCOFF to include description of external reference
csects.
- Convert several llvm_unreachables to report_fatal_error
- Convert several dyn_casts to casts as they are expected not to fail.
- Avoid copying DataLayout object.
Roman Lebedev [Tue, 30 Jul 2019 15:28:22 +0000 (15:28 +0000)]
[InstCombine] Fold "x ?% y ==/!= 0" to "x & (y-1) ==/!= 0" iff y is power-of-two
Summary:
I have stumbled into this by accident while preparing to extend backend `x s% C ==/!= 0` handling.
While we did happen to handle this fold in most of the cases,
the folding is indirect - we fold `x u% y` to `x & (y-1)` (iff `y` is power-of-two),
or first turn `x s% -y` to `x u% y`; that does handle most of the cases.
But we can't turn `x s% INT_MIN` to `x u% -INT_MIN`,
and thus we end up being stuck with `(x s% INT_MIN) == 0`.
There is no such restriction for the more general fold:
https://rise4fun.com/Alive/IIeS
To be noted, the fold does not enforce that `y` is a constant,
so it may indeed increase instruction count.
This is consistent with what `x u% y`->`x & (y-1)` already does.
I think it makes sense, it's at most one (simple) extra instruction,
while `rem`ainder is really much more un-simple (and likely **very** costly).
Sam Elliott [Tue, 30 Jul 2019 13:40:51 +0000 (13:40 +0000)]
[RISCV] Attempt to make rv{32,64}i-aliases-invalid.s less flaky
These tests have been disabled on Linux and Windows due to failing
there. I think that could be down to a race condition between stdout
and stderr, so I have disabled output to stdout.
For the moment, only re-enable on linux, because I don't have a windows
machine to test on.
George Rimar [Tue, 30 Jul 2019 13:37:02 +0000 (13:37 +0000)]
[llvm-objcopy] - Stop using Inputs/alloc-symtab.o
Initially Inputs/alloc-symtab.o was added in D42222.
It contains an allocatable .symtab section. Today
we are able to create such sections using yaml2obj.
Later people started using this input for no solid reason in their tests.
Now multiple of tests are using it.
(And those tests do not need such a specific case actually).
In this patch I removed this binary and rewrote the few tests.
This is the compantion patch to https://reviews.llvm.org/D64482, needed to ensure
that builds with host compilers that don't yet predefine _FILE_OFFSET_BITS=64 on
Solaris succeed by always making the host and freshly built clang consistent.
Roman Lebedev [Tue, 30 Jul 2019 07:10:00 +0000 (07:10 +0000)]
[DivRemPairs] Handling for expanded-form rem - recomposition (PR42673)
Summary:
While `-div-rem-pairs` pass can decompose rem in div+rem pair when div-rem pair
is unsupported by target, nothing performs the opposite fold.
We can't do that in InstCombine or DAGCombine since neither of those has access to TTI.
So it makes most sense to teach `-div-rem-pairs` about it.
If we matched rem in expanded form, we know we will be able to place div-rem pair
next to each other so we won't regress the situation.
Also, we shouldn't decompose rem if we matched already-decomposed form.
This is surprisingly straight-forward otherwise.
Michael Pozulp [Tue, 30 Jul 2019 07:05:27 +0000 (07:05 +0000)]
Revert "[llvm-objdump] Add warning messages if disassembly + source for problematic inputs"
This reverts r367284 (git commit b1cbe51bdf44098c74f5c74b7bcd8c041a7c6772).
My changes to LLVMSymbolizer caused a test to fail:
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/29488
[NFC] use C++11 in AlignOf.h, remove AlignedCharArray
I removed all uses of AlignedCharArray since the minimum MSVC version can handle
alignas on char arrays correctly. We can therefore remove AlignedCharArray.
This patch also updates AlignedCharArrayUnion to use C++11.
Alex Lorenz [Mon, 29 Jul 2019 23:38:30 +0000 (23:38 +0000)]
[FileCollector] Add a VFS that records FS accesses using the FileCollector
This patch adds a VFS that can be overlaid on top of another VFS
to record file system accesses using the FileCollector.
This can help to gather files that are needed for reproducers.
As discussed in D65249, don't use AlignedCharArray or std::aligned_storage. Just use alignas(X) char Buf[Size];. This will allow me to remove AlignedCharArray entirely, and works on the current minimum version of Visual Studio.
David Bolvansky [Mon, 29 Jul 2019 17:41:00 +0000 (17:41 +0000)]
[UpdateTestChecks] Emit warning when invalid value for -check-prefix(es) option
Summary:
The script is silent for the following issue:
FileCheck %s -check-prefix=CHECK,POPCOUNT
FileCheck will catch it later, but I think we can warn here too.
Now it warns:
./update_llc_test_checks.py file.ll
WARNING: Supplied prefix 'CHECK,POPCOUNT' is invalid. Prefix must contain only alphanumeric characters, hyphens and underscores. Did you mean --check-prefixes=CHECK,POPCOUNT?
ThinLTOBitcodeWriter: Include globals associated with type metadata globals in the merged module.
Globals that are associated with globals with type metadata need to appear
in the merged module because they will reference the global's section directly.
Tom Stellard [Mon, 29 Jul 2019 16:40:58 +0000 (16:40 +0000)]
AMDGPU/LoadStoreOptimizer: combine MMOs when merging instructions
Summary:
The LoadStoreOptimizer was creating instructions with 2
MachineMemOperands, which meant they were assumed to alias with all other instructions,
because MachineInstr:mayAlias() returns true when an instruction has multiple
MachineMemOperands.
This was preventing these instructions from being merged again, and was
giving the scheduler less freedom to reorder them.
Simon Pilgrim [Mon, 29 Jul 2019 15:57:06 +0000 (15:57 +0000)]
[X86] combineX86ShufflesRecursively - start recursion at depth = 0. NFCI.
As discussed on rL367171, we have a problem where the depth recursion used in combineX86ShufflesRecursively was subtly different to computeKnownBits etc. - it starts at Depth=1 instead of Depth=0 like the others and has a different maximum recursion depth.
This NFC patch fixes the recursion depth to start at 0, so we can more easily reuse depth values in calls from combineX86ShufflesRecursively and its helper functions in computeKnownBits etc.
[RISCV] Fix uninitialized variable after call to evaluateConstantImm
For llvm/test/MC/RISCV/rv64i-aliases-invalid.s, UBSan reports:
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:371:9: runtime error:
load of value 3879186881, which is not a valid value for type
'RISCVMCExpr::VariantKind'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:371:9 in
It turns out that evaluateConstantImm does not set `VK` and it remains
unitialized when doing comparisons in `isImmXLenLI()`.
[InstCombine] fold fadd+fneg with fdiv/fmul betweena
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.
[ValueTracking] Remove volatile check in isGuaranteedToTransferExecutionToSuccessor
Summary: As clarified in D53184, volatile load and store do not trap. Therefore, we should remove volatile checks for instructions in `isGuaranteedToTransferExecutionToSuccessor`.
Jay Foad [Mon, 29 Jul 2019 10:22:09 +0000 (10:22 +0000)]
[DivergenceAnalysis] Add methods for querying divergence at use
Summary:
The existing isDivergent(Value) methods query whether a value is
divergent at its definition. However even if a value is uniform at its
definition, a use of it in another basic block can be divergent because
of divergent control flow between the def and the use.
This patch adds new isDivergent(Use) methods to DivergenceAnalysis,
LegacyDivergenceAnalysis and GPUDivergenceAnalysis.
This might allow D63953 or other similar workarounds to be removed.
Hans Wennborg [Mon, 29 Jul 2019 09:49:04 +0000 (09:49 +0000)]
Mark test/MC/RISCV/rv{32,64}i-aliases-invalid.s unsupported also on Windows
Because they fail there too.
FAIL: LLVM :: MC/RISCV/rv32i-aliases-invalid.s (24397 of 32659)
******************** TEST 'LLVM :: MC/RISCV/rv32i-aliases-invalid.s' FAILED ********************
Script:
--
: 'RUN: at line 2'; not c:\src\llvm.monorepo\build.release2\bin\llvm-mc.exe C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv32i-aliases-invalid.s -triple=riscv32 -riscv-no-aliases 2>&1 | c:\src\llvm.monorepo\build.release2\bin\filecheck.exe C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv32i-aliases-invalid.s
: 'RUN: at line 3'; not c:\src\llvm.monorepo\build.release2\bin\llvm-mc.exe C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv32i-aliases-invalid.s -triple=riscv32 2>&1 | c:\src\llvm.monorepo\build.release2\bin\filecheck.exe C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv32i-aliases-invalid.s
--
Exit Code: 1
Command Output (stdout):
--
$ ":" "RUN: at line 2"
$ "not" "c:\src\llvm.monorepo\build.release2\bin\llvm-mc.exe" "C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv32i-aliases-invalid.s" "-triple=riscv32" "-riscv-no-aliases"
$ "c:\src\llvm.monorepo\build.release2\bin\filecheck.exe" "C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv32i-aliases-invalid.s"
C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv32i-aliases-invalid.s:10:21: error: CHECK: expected string not found in input
li t4, foo # CHECK: :[[@LINE]]:8: error: immediate must be an integer in the range [-2147483648, 4294967295]
^
<stdin>:5:1: note: scanning from here
li x0, -2147483649 # CHECK: :[[@LINE]]:8: error: immediate must be an integer in the range [-2147483648, 4294967295]
^
<stdin>:5:1: note: with "@LINE" equal to "10"
li x0, -2147483649 # CHECK: :[[@LINE]]:8: error: immediate must be an integer in the range [-2147483648, 4294967295]
^
<stdin>:5:38: note: possible intended match here
li x0, -2147483649 # CHECK: :[[@LINE]]:8: error: immediate must be an integer in the range [-2147483648, 4294967295]
^
error: command failed with exit status: 1
--
--
********************
Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60.. 70
FAIL: LLVM :: MC/RISCV/rv64i-aliases-invalid.s (24416 of 32659)
******************** TEST 'LLVM :: MC/RISCV/rv64i-aliases-invalid.s' FAILED ********************
Script:
--
: 'RUN: at line 2'; not c:\src\llvm.monorepo\build.release2\bin\llvm-mc.exe C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv64i-aliases-invalid.s -triple=riscv64 -riscv-no-aliases 2>&1 | c:\src\llvm.monorepo\build.release2\bin\filecheck.exe C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv64i-aliases-invalid.s
: 'RUN: at line 3'; not c:\src\llvm.monorepo\build.release2\bin\llvm-mc.exe C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv64i-aliases-invalid.s -triple=riscv64 2>&1 | c:\src\llvm.monorepo\build.release2\bin\filecheck.exe C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv64i-aliases-invalid.s
--
Exit Code: 1
Command Output (stdout):
--
$ ":" "RUN: at line 2"
$ "not" "c:\src\llvm.monorepo\build.release2\bin\llvm-mc.exe" "C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv64i-aliases-invalid.s" "-triple=riscv64" "-riscv-no-aliases"
$ "c:\src\llvm.monorepo\build.release2\bin\filecheck.exe" "C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv64i-aliases-invalid.s"
C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv64i-aliases-invalid.s:6:21: error: CHECK: expected string not found in input
li t4, foo # CHECK: :[[@LINE]]:8: error: operand must be a constant 64-bit integer
^
<stdin>:2:1: note: scanning from here
li t5, 0x10000000000000000 # CHECK: :[[@LINE]]:8: error: unknown operand
^
<stdin>:2:1: note: with "@LINE" equal to "6"
li t5, 0x10000000000000000 # CHECK: :[[@LINE]]:8: error: unknown operand
^
<stdin>:13:67: note: possible intended match here
C:\src\llvm.monorepo\llvm\test\MC\RISCV\rv64i-aliases-invalid.s:12:13: error: immediate must be an integer in the range [0, 63]
^
Sam Parker [Mon, 29 Jul 2019 08:41:51 +0000 (08:41 +0000)]
[NFC][ARM[ParallelDSP] Cleanup of BinOpChain
- Remove some unused typedefs.
- Rename BinOpChain struct to MulCandidate.
- Remove the size method of MulCandidate.
- Store only the first input of the ValueList provided to
MulCandidate, as it's the only value we care about. This means we
don't have to perform any ugly (and unnecessary) iterations of the
list later on.
David Stuttard [Mon, 29 Jul 2019 08:15:10 +0000 (08:15 +0000)]
[AMDGPU] Enable v4f16 and above for v_pk_fma instructions
Summary:
If isel is presented with <2 x half> vectors then it will correctly select
v_pk_fma style instructions.
If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for
other instruction types (such as fadd, fmul etc.)
Added extra support to enable this. Updated one of the tests to include a test
for this (as well as extending the test to GFX9)
George Rimar [Mon, 29 Jul 2019 07:55:39 +0000 (07:55 +0000)]
[llvm-objcopy] - Reimplement strip-dwo-groups.test to stop using the precompiled object.
When llvm-copy removes .dwo sections the index of symbol table,
the indices of the symbols and the indices of the sections which go
after the removed ones changes. That affects on SHT_GROUP sections,
which needs to be updated.
Initially this test used a precompiled object, I rewrote it to use YAML
and improved a bit.
[X86] Don't use PMADDWD for vector add reductions of multiplies if the mul inputs have an additional user.
The pmaddwd inserts a truncate, if that truncate would end up
creating additional instructions instead of making a zext
narrower, then we shouldn't do it.
I've restricted this to only sse4.1 targets since on prior
targets the zext will be done in stages. So the truncate will
probably not create additional instructions. Might need some
more investigation of mul shrinking and the other pmaddwd
transform to be sure this is the right decision.
There might be a slight regression on AVX1 targets due to add
splitting. Hard to say for sure. Maybe we need to look into
using the vector reduction flag to use 2 narrow loads and a
blend instead of extracting and inserting.
[X86] In combineLoopMAddPattern and combineLoopSADPattern, preserve the vector reduction flag on the final add. Handle unrolled loops by letting DAG combine revisit.
This reverts r340478 and r340631 and replaces them with a simpler
method of just letting DAG combine revisit the nodes to handle
the other operand.
[InstCombine] fold fsub+fneg with fdiv/fmul between
The backend already does this via isNegatibleForFree(),
but we may want to alter the fneg IR canonicalizations
that currently exist, so we need to try harder to fold
fneg in IR to avoid regressions.
David Green [Sun, 28 Jul 2019 14:07:48 +0000 (14:07 +0000)]
[ARM] MVE VPNOT
This adds the patterns required to transform xor P0, -1 to a VPNOT. The
instruction operands have to change a little for this, adding an in and an out
VCCR reg and using a custom DecodeMVEVPNOT for the decode.
David Green [Sun, 28 Jul 2019 13:53:39 +0000 (13:53 +0000)]
[ARM] Better patterns for fp <> predicate vectors
These are some better patterns for converting between predicates and floating
points. Much like the extends, we select "1"/"-1" or "0" depending on the
predicate value. Or we perform a compare against 0 to convert to a predicate.
Summary:
In current getPointerAlignemnt implementation, CallBase.getPointerAlignement(..) checks only parameter attriutes in the callsite. For example,
[FunctionAttrs] Annotate "willreturn" for intrinsics
Summary:
In D62801, new function attribute `willreturn` was introduced. In short, a function with `willreturn` is guaranteed to come back to the call site(more precise definition is in LangRef).
In this patch, willreturn is annotated for LLVM intrinsics.
The current pattern would trigger for scheduling changes of the
post-load computation, since those are commutable with the inline asm.
Avoid this by explicitly check the order of load vs asm block.
[InstSimplify] remove quadratic time looping (PR42771)
The test case from:
https://bugs.llvm.org/show_bug.cgi?id=42771
...shows a ~30x slowdown caused by the awkward loop iteration (rL207302) that is
seemingly done just to avoid invalidating the instruction iterator. We can instead
delay instruction deletion until we reach the end of the block (or we could delay
until we reach the end of all blocks).
There's a test diff here for a degenerate case with llvm.assume that is not
meaningful in itself, but serves to verify this change in logic.
This change probably doesn't result in much overall compile-time improvement
because we call '-instsimplify' as a standalone pass only once in the standard
-O2 opt pipeline currently.
Simon Pilgrim [Sat, 27 Jul 2019 13:30:29 +0000 (13:30 +0000)]
[X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler (Reapplied)
Recommit rL367100 which was reverted at rL367141. Until PR42777 is fixed, we no longer get the benefits of peeking through bitcasts but it does still remove a GetDemandedBits user and gives us the equivalent combines.
Simon Pilgrim [Sat, 27 Jul 2019 12:48:46 +0000 (12:48 +0000)]
[SelectionDAG] Check for any recursion depth greater than or equal to limit instead of just equal the limit.
If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit.
This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it.
This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches.
Simon Pilgrim [Sat, 27 Jul 2019 12:23:36 +0000 (12:23 +0000)]
[TargetLowering] Add depth limit to SimplifyMultipleUseDemandedBits
We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and not earlying-out. No repro yet, but consider this a pre-emptive commit.