Nico Weber [Tue, 31 Oct 2017 16:39:47 +0000 (16:39 +0000)]
LTOModule::isBitcodeFile() shouldn't assert when returning false.
Fixes a bunch of assert-on-invalid-bitcode regressions after 315483.
Expected<> calls assertIsChecked() in its dtor, and operator bool() only calls
setChecked() if there's no error. So for functions that don't return an error
itself, the Expected<> version needs explicit code to disarm the error that the
ErrorOr<> code didn't need.
Florian Hahn [Tue, 31 Oct 2017 14:06:31 +0000 (14:06 +0000)]
[Reassociate] Remove FIXME from looptest.ll (NFC)
Summary: The loop invariant add (i+j) is reassoicated, I think the FIXME can be removed, because this is what the test case tries to check (AFAIK). I also changed the test to use FileCheck.
This patch adds the --threads option to dsymutil to process
architectures in parallel. The feature is already present in the version
distributed with Xcode, but was not yet upstreamed.
This is NFC as far as the linking behavior is concerned. As threads are
used automatically, the current tests cover the change in
implementation.
Teresa Johnson [Tue, 31 Oct 2017 12:56:09 +0000 (12:56 +0000)]
[ThinLTO] Double bits of module hash used for renaming
Summary:
Use 64 instead of 32 bits of the module hash as the suffix when renaming
after promotion to reduce the likelihood of a collision (which we
observed in a binary when using 32 bits).
Matthew Simpson [Tue, 31 Oct 2017 12:34:02 +0000 (12:34 +0000)]
[InstCombine] Simplify selects that test cmpxchg instructions
If a select instruction tests the returned flag of a cmpxchg instruction and
selects between the returned value of the cmpxchg instruction and its compare
operand, the result of the select will always be equal to its false value.
Ayman Musa [Tue, 31 Oct 2017 11:39:31 +0000 (11:39 +0000)]
Adding a shufflevector and select LLVM IR instructions fuzz tool
Based on similar python tool - utils/shuffle-fuzz.py - this tool extends the ability of it's previous by optionally attaching select instruction to the generated shufflevector instructions.
This was mainly developed to perform exhaustive testing of the X86 AVX512 masked shuffle instructions. But yet it can be used for various other targets.
The general design of the implementation is much modular than the original shuffle_fuzz.py tool, which makes it easier for anyone to extend it further.
David Green [Tue, 31 Oct 2017 10:47:46 +0000 (10:47 +0000)]
[LoopUnroll] Clean up remarks for unroll remainder
The optimisation remarks for loop unrolling with an unrolled remainder looks something like:
test.c:7:18: remark: completely unrolled loop with 3 iterations [-Rpass=loop-unroll]
C[i] += A[i*N+j];
^
test.c:6:9: remark: unrolled loop by a factor of 4 with run-time trip count [-Rpass=loop-unroll]
for(int j = 0; j < N; j++)
^
This removes the first of the two messages.
[AVX512] Adding new patterns for extract_subvector of vXi1
extract subvector of vXi1 from vYi1 is poorly supported by LLVM and most of the time end with an assertion.
This patch fixes this issue by adding new patterns to the TD file.
Max Kazantsev [Tue, 31 Oct 2017 06:19:05 +0000 (06:19 +0000)]
[IRCE][NFC] Rename fields of InductiveRangeCheck
Rename `Offset`, `Scale`, `Length` into `Begin`, `Step`, `End` respectively
to make naming of similar entities for Ranges and Range Checks more
consistent.
Craig Topper [Tue, 31 Oct 2017 06:01:04 +0000 (06:01 +0000)]
[X86] Make AVX512_512_SET0 XMM16-31 lower to 128-bit XOR when AVX512VL is enabled. Use 128-bit VLX instruction when VLX is enabled.
Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior.
Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used?
Philip Reames [Tue, 31 Oct 2017 05:16:46 +0000 (05:16 +0000)]
[IndVarSimplify] Simplify code using preheader assumption
As noted in the nice block comment, the previous code didn't actually handle multi-entry loops correctly, it just assumed SCEV didn't analyze such loops. Given SCEV has comments to the contrary, that seems a bit suspect. More importantly, the pass actually requires loopsimplify form which ensures a loop-preheader is available. Remove the excessive generaility and shorten the code greatly.
Note that we do successfully analyze many multi-entry loops, but we do so by converting them to single entry loops. See the added test case.
Max Kazantsev [Tue, 31 Oct 2017 05:07:56 +0000 (05:07 +0000)]
Reapply "[GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors"
This patch fixes the miscompile that happens when PRE hoists loads across guards and
other instructions that don't always pass control flow to their successors. PRE is now prohibited
to hoist across such instructions because there is no guarantee that the load standing after such
instruction is still valid before such instruction. For example, a load from under a guard may be
invalid before the guard in the following case:
int array[LEN];
...
guard(0 <= index && index < LEN);
use(array[index]);
Philip Reames [Tue, 31 Oct 2017 04:19:06 +0000 (04:19 +0000)]
[SimplifyIndVar] Extract out invariant expression handling
Previously, the code returned early from the *function* when it couldn't find a free expansion, it should be returning from the *transform*. I don't have a test case, noticed this via inspection.
As a follow up, I'm going to revisit the logic in the extract function. I think that essentially the whole helper routine can be replaced with SCEVExpander, but I wanted to do that in a series of separate commits.
Shoaib Meenai [Tue, 31 Oct 2017 01:30:46 +0000 (01:30 +0000)]
[cmake] Make check_linker_flags operate via linker flags
`check_linker_flags` currently sets the *compiler* flags (via
`CMAKE_REQUIRED_FLAGS`), and thus implicitly relies on cmake's default
behavior of passing the compiler flags to the linker. This breaks when
cmake's build rules have been altered to not pollute the link line with
compiler flags (which can be desirable for build cleanliness). Instead,
set `CMAKE_EXE_LINKER_FLAGS` explicitly and use `CMP0056` to ensure the
linker flags are passed along. Additionally, since we're inside a
function, we can just alter the variable directly (as the alteration
will be limited to the scope of the function) rather than saving and
restoring the old value.
Philip Reames [Mon, 30 Oct 2017 23:59:51 +0000 (23:59 +0000)]
[CGP] Fix crash on i96 bit multiply
Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725
If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area. There's a bunch of obviously wrong code in the same function. I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like.
Rui Ueyama [Mon, 30 Oct 2017 21:19:54 +0000 (21:19 +0000)]
Fix -fuse-ld feature detection error.
check_cxx_compiler_flag doesn't seem to try to link a program, so
the existing code doesn't correctly detect the availability of a given
linker. This patch uses check_cxx_source_compiles instead.
due to the construction not accounting for such variable. It led to the general
confusion and prevented setting linker-specific flags inside functions defined
in AddLLVM.cmake.
Davide Italiano [Mon, 30 Oct 2017 20:20:16 +0000 (20:20 +0000)]
[NewGVN] Stop assuming PHI args ordering when looking at phi-of-ops.
It's not guaranteed. There's a bug open to sort them in predecessor
order, but it won't happen anytime soon. In the meanwhile, passes
will have to do an O(#preds) scan. Such is life.
Daniel Neilson [Mon, 30 Oct 2017 19:51:48 +0000 (19:51 +0000)]
Create instruction classes for identifying any atomicity of memory intrinsic. (NFC)
Summary:
For reference, see: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116589.html
This patch fleshes out the instruction class hierarchy with respect to atomic and
non-atomic memory intrinsics. With this change, the relevant part of the class
hierarchy becomes:
This involves some class renaming:
ElementUnorderedAtomicMemCpyInst -> AtomicMemCpyInst
ElementUnorderedAtomicMemMoveInst -> AtomicMemMoveInst
ElementUnorderedAtomicMemSetInst -> AtomicMemSetInst
A script for doing this renaming in downstream trees is included below.
An example of where the Any* classes should be used in LLVM is when reasoning
about the effects of an instruction (ex: aliasing).
---
Script for renaming AtomicMem* classes:
PREFIXES="[<,([:space:]]"
CLASSES="MemIntrinsic|MemTransferInst|MemSetInst|MemMoveInst|MemCpyInst"
SUFFIXES="[;)>,[:space:]]"
Craig Topper [Mon, 30 Oct 2017 14:51:37 +0000 (14:51 +0000)]
[X86] Make sure we don't create locked inc/dec instructions when the carry flag is being used.
Summary:
INC/DEC don't update the carry flag so we need to make sure we don't try to use it.
This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested.
The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860.
This should fully fix PR35068 finishing the fix started in r316860.
Yaxun Liu [Mon, 30 Oct 2017 14:30:28 +0000 (14:30 +0000)]
[AMDGPU] Emit metadata for hidden arguments for kernel enqueue
Identifies kernels which performs device side kernel enqueues and emit
metadata for the associated hidden kernel arguments. Such kernels are
marked with calls-enqueue-kernel function attribute by
AMDGPUOpenCLEnqueueKernelLowering pass and later on
hidden kernel arguments metadata HiddenDefaultQueue and
HiddenCompletionAction are emitted for them.
Clement Courbet [Mon, 30 Oct 2017 14:19:33 +0000 (14:19 +0000)]
[CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).
- Targets that want to support memcmp expansions now return the list of
supported load sizes.
- Expansion codegen does not assume that all power-of-two load sizes
smaller than the max load size are valid. For examples, this is not the
case for x86(32bit)+sse2.
Florian Hahn [Mon, 30 Oct 2017 10:07:42 +0000 (10:07 +0000)]
Recommit r315288: [SCCP] Propagate integer range info for parameters in IPSCCP.
This version of the patch includes a fix addressing a stage2 LTO buildbot
failure and addressed some additional nits.
Original commit message:
This updates the SCCP solver to use of the ValueElement lattice for
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.
For the following function, f() can be optimized to ret i32 2 with
this change
Florian Hahn [Mon, 30 Oct 2017 09:04:18 +0000 (09:04 +0000)]
Recommit r315288: [SCCP] Propagate integer range info for parameters in IPSCCP.
This version of the patch includes a fix addressing a stage2 LTO buildbot
failure and addressed some additional nits.
Original commit message:
This updates the SCCP solver to use of the ValueElement lattice for
parameters, which provides integer range information. The range
information is used to remove unneeded icmp instructions.
For the following function, f() can be optimized to ret i32 2 with
this change
Craig Topper [Mon, 30 Oct 2017 04:39:18 +0000 (04:39 +0000)]
[X86] Rearrange code in X86InstrInfo.cpp to put all the foldMemoryOperandImpl methods together without partial/undef register handling in the middle. NFC
I have a future patch that wants to make use of the one of the partial functions in one of the earlier memory folding methods and the current ordering prevents that.
Sanjay Patel [Sun, 29 Oct 2017 20:49:31 +0000 (20:49 +0000)]
[(new) Pass Manager] instantiate SimplifyCFG with the same options as the old PM
The old PM sets the options of what used to be known as "latesimplifycfg" on the
instantiation after the vectorizers have run, so that's what we'redoing here.
FWIW, there's a later SimplifyCFGPass instantiation in both PMs where we do not
set the "late" options. I'm not sure if that's intentional or not.
Craig Topper [Sat, 28 Oct 2017 23:10:13 +0000 (23:10 +0000)]
[X86] Remove invalid code from LowerVSELECT.
This code attempted to say that v8i16/v16i16 VSELECT is legal if BWI and VLX are enabled, but the only way we could reach this point is if the condition was not a vXi1 type. Which means it really wasn't legal.
We don't have any tests that exercise this code. So I'm hoping it wasn't really reachable.
Craig Topper [Sat, 28 Oct 2017 19:56:57 +0000 (19:56 +0000)]
[X86] Fix a mistake in the X86ISelDAGToDAG.cpp code for MUL8r/IMUL8r.
I think this code is unreachable due to some promotions that occur elsewhere. I'll look into that to be sure, but for now I thought I should at least fix the obvious typo.
Sanjay Patel [Sat, 28 Oct 2017 18:43:07 +0000 (18:43 +0000)]
[SimplifyCFG] use pass options and remove the latesimplifycfg pass
This is no-functional-change-intended.
This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired).
The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.
Haicheng Wu [Sat, 28 Oct 2017 02:27:14 +0000 (02:27 +0000)]
[ConstantFold] Fix a crash when folding a GEP that has vector index
LLVM crashes when factoring out an out-of-bound index into preceding dimension
and the preceding dimension uses vector index. Simply bail out now when this
case happens.