Geoff Berry [Wed, 30 Aug 2017 18:41:07 +0000 (18:41 +0000)]
Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues identified by buildbots addressed since original review:
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
Adrian Prantl [Wed, 30 Aug 2017 18:06:51 +0000 (18:06 +0000)]
Canonicalize the representation of empty an expression in DIGlobalVariableExpression
This change simplifies code that has to deal with
DIGlobalVariableExpression and mirrors how we treat DIExpressions in
debug info intrinsics. Before this change there were two ways of
representing empty expressions on globals, a nullptr and an empty
!DIExpression().
If someone needs to upgrade out-of-tree testcases:
perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>
will catch 95%.
Bob Haarman [Wed, 30 Aug 2017 17:50:21 +0000 (17:50 +0000)]
[codeview] make DbgVariableLocation::extractFromMachineInstruction use Optional
Summary:
DbgVariableLocation::extractFromMachineInstruction originally
returned a boolean indicating success. This change makes it return
an Optional<DbgVariableLocation> so we cannot try to access the fields
of the struct if they aren't valid.
Craig Topper [Wed, 30 Aug 2017 16:38:33 +0000 (16:38 +0000)]
[AVX512] Don't use 32-bit elements version of AND/OR/XOR/ANDN during isel unless we're matching a masked op or broadcast
Selecting 32-bit element logical ops without a select or broadcast requires matching a bitconvert on the inputs to the and. But that's a weird thing to rely on. It's entirely possible that one of the inputs doesn't have a bitcast and one does.
Since there's no functional difference, just remove the extra patterns and save some isel table size.
Balaram Makam [Wed, 30 Aug 2017 14:57:12 +0000 (14:57 +0000)]
Re-land MachineInstr: Reason locally about some memory objects before going to AA.
Summary:
Reverts r311008 to reinstate r310825 with a fix.
Refine alias checking for pseudo vs value to be conservative.
This fixes the original failure in builtbot unittest SingleSource/UnitTests/2003-07-09-SignedArgs.
This code is double-dead:
1. We simplify all selects with constant true/false condition in InstSimplify.
I've minimized/moved the tests to show that works as expected.
2. All remaining vector selects with a constant condition are canonicalized to
shufflevector, so we really can't see this pattern.
Florian Hahn [Wed, 30 Aug 2017 10:54:21 +0000 (10:54 +0000)]
[InstCombine] Fold insert sequence if first ins has multiple users.
Summary:
If the first insertelement instruction has multiple users and inserts at
position 0, we can re-use this instruction when folding a chain of
insertelement instructions. As we need to generate the first
insertelement instruction anyways, this should be a strict improvement.
We could get rid of the restriction of inserting at position 0 by
creating a different shufflemask, but it is probably worth to keep the
first insertelement instruction with position 0, as this is easier to do
efficiently than at other positions I think.
Sjoerd Meijer [Wed, 30 Aug 2017 08:38:13 +0000 (08:38 +0000)]
[AArch64] allow v4f16 types when FullFP16 is supported
Support for scalars was committed in r311154, this adds support for allowing
v4f16 vector types (thus avoiding conversions from/to single precision for
these types).
Gadi Haber [Wed, 30 Aug 2017 08:08:50 +0000 (08:08 +0000)]
[X86][Skylake] Fixing duplicated prefixes in the run command of Code Gen regression tests
NFC.
Replaced duplicated HASWELL prefixes in run commands in the X86 Code Gen regression tests by the SKYLAKE prefix when the -mcpu is set to skylake.
The fix is needed in preparation of an upcoming patch containing the Skylake scheduling info.
Craig Topper [Wed, 30 Aug 2017 07:48:39 +0000 (07:48 +0000)]
[AVX512] Correct isel patterns to support selecting masked vbroadcastf32x2/vbroadcasti32x2
Summary:
This patch adjusts the patterns to make the result type of the broadcast node vXf64/vXi64. Then adds a bitcast to vXi32 after that. Intrinsic lowering was also adjusted to generate this new pattern.
Fixes PR34357
We should probably just drop the intrinsic entirely and use native IR, but I'll leave that for a future patch.
Any idea what instruction we should be lowering the floating point 128-bit result version of this pattern to? There's a 128-bit v2i32 integer broadcast but not an fp one.
Craig Topper [Wed, 30 Aug 2017 05:00:35 +0000 (05:00 +0000)]
[X86] Apply SlowIncDec feature to Sandybridge/Ivybridge CPUs as well
Currently we start applying this on Haswell and newer. I don't believe anything changed in the Haswell architecture to make this the right cutoff point. The partial flag handling around this has been roughly the same since Sandybridge.
Matt Arsenault [Wed, 30 Aug 2017 03:26:18 +0000 (03:26 +0000)]
AMDGPU: Don't look for DS merge candidates with one use address
The merge is only possible if the base address register is the
same for the two instructions. If there is only the one use,
there's no point in doing an expensive forward scan checking
for memory interference looking for a merge candidate.
This gives a signficant improvement in one extreme testcase.
The code to do the scan is still algorithmically terrible,
so this is still the slowest pass in that example.
If denorms are not flushed we can use max instead of multiplication
by 1. For double that is simply faster, while for float and half
it is shorter, because mul uses constant bus and VOP3.
Lang Hames [Wed, 30 Aug 2017 00:47:42 +0000 (00:47 +0000)]
[Orc] Fix member variable ordering issue in OrcMCJITReplacement.
https://reviews.llvm.org/D36888
From that review description:
When an OrcMCJITReplacement object gets destructed, LazyEmitLayer may still
contain a shared_ptr of a module, which requires ShouldDelete in the deleter.
But ShouldDelete gets destructed before LazyEmitLayer due to the order of
declaration in OrcMCJITReplacement, which leads to a crash, when the destructor
of LazyEmitLayer is executed. Changing the order of declaration fixes this.
Lang Hames [Tue, 29 Aug 2017 23:29:09 +0000 (23:29 +0000)]
[Error] Add an optional error message to cantFail.
cantFail is the moral equivalent of an assertion that the wrapped call must
return a success value. This patch allows clients to include an associated
error message (the same way they would for an assertion for llvm_unreachable).
If the error message is not specified it will default to: "Failure value
returned from cantFail wrapped call".
Evgeniy Stepanov [Tue, 29 Aug 2017 22:40:19 +0000 (22:40 +0000)]
[cfi] Avoid branch veneers in jump tables when possible.
Summary:
When jumptable encoding does not match target code encoding (arm vs
thumb), a veneer is inserted by the linker. We can not avoid this
in all cases, because entries within one jumptable must have the same
encoding, but we can make it less common by selecting the jumptable
encoding to match the majority of its targets.
Wei Mi [Tue, 29 Aug 2017 21:45:11 +0000 (21:45 +0000)]
[LoopUnswitch] Fix a simple bug which disables loop unswitch for select statement
This is to fix PR34257. rL309059 takes an early return when FindLIVLoopCondition
fails to find a loop invariant condition. This is wrong and it will disable loop
unswitch for select. The patch fixes the bug.
Reid Kleckner [Tue, 29 Aug 2017 21:44:21 +0000 (21:44 +0000)]
[cmake] Stop putting the revision info in LLVM_VERSION_STRING
Summary:
This reduces the number of build actions after a no-op commit from
thousands to about six, which should be acceptable. If six actions is
still too many, developers can disable the LLVM_APPEND_VC_REV cmake
option.
llvm-config.h is a widely included header that should rarely change.
Before this patch, it would change after every re-configure. Very few
users of llvm-config.h need to know the precise version, and those that
do can migrate to incorporating LLVM_REVISION as provided by
llvm/Support/VCSRevision.h.
This should bring LLVM back to the behavior that it had before r306858
from June 30 2017. Most LLVM tools will now print a version string like
"6.0.0svn" instead of "6.0.0-git-c40c2a23de4".
Bob Haarman [Tue, 29 Aug 2017 20:59:25 +0000 (20:59 +0000)]
Reland r311957 [codeview] support more DW_OPs for more complete debug info
Summary:
Some variables show up in Visual Studio as "optimized out" even in -O0
-Od builds. This change fixes two issues that would cause this to
happen. The first issue is that not all DIExpressions we generate were
recognized by the CodeView writer. This has been addressed by adding
support for DW_OP_constu, DW_OP_minus, and DW_OP_plus. The second
issue is that we had no way to encode DW_OP_deref in CodeView. We get
around that by changinge the type we encode in the debug info to be
a reference to the type in the source code.
This fixes PR34261.
The reland adds two extra checks to the original: It checks if the
DbgVariableLocation is valid before checking any of its fields, and
it only emits ranges with nonzero registers.
Alexey Bataev [Tue, 29 Aug 2017 20:06:24 +0000 (20:06 +0000)]
[SimplifyCFG] Fix for PR34219: Preserve alignment after merging conditional stores.
Summary:
If SimplifyCFG pass is able to merge conditional stores into single one,
it loses the alignment. This may lead to incorrect codegen. Patch
sets the alignment of the new instruction if it is set in the original
one.
Craig Topper [Tue, 29 Aug 2017 18:58:13 +0000 (18:58 +0000)]
[InstCombine] Support vector splats in transformZExtICmp
This patch adds splat support to transformZExtICmp. The test cases are vector versions of tests that failed when commenting out parts of the existing scalar code.
One test didn't vectorize optimize properly due to another bug so a TODO has been added.
Hans Wennborg [Tue, 29 Aug 2017 18:41:00 +0000 (18:41 +0000)]
[DAG] Bound loop dependence check in merge optimization.
The loop dependence check looks for dependencies between store merge
candidates not captured by the chain sub-DAG doing a check of
predecessors which may be very large. Conservatively bound number of
nodes checked for compilation time. (Resolves PR34326).
Landing on behalf of Nirav Dave to unblock the 5.0.0 release.
Teresa Johnson [Tue, 29 Aug 2017 18:15:34 +0000 (18:15 +0000)]
[ThinLTO] Clean up stale alias import handling
Summary:
Remove some code that was no longer needed. The first FIXME is
stale since we long ago started using the index to drive importing,
rather than doing force importing based on linkage type. And
now with r309278, we no longer import any aliases.
Davide Italiano [Tue, 29 Aug 2017 17:24:09 +0000 (17:24 +0000)]
[LoopUnroll] Make the test for PR33437 actually useful.
I forgot to specify -unroll-loop-peel, making this test not
really effective. While here, adjust some details (naming and
run line). Thanks to Sanjoy and Michael Z. for pointing out in
their post-commit reviews.
Justin Bogner [Tue, 29 Aug 2017 17:08:44 +0000 (17:08 +0000)]
Fix build of llvm-mc-assemble/disassemble-fuzzer
Since these aren't built by default unless building with coverage (and
even then they aren't built for the check target) they've managed to
bit rot a little.
This just fixes the build. See llvm.org/pr34314 for the plan on making
sure these don't bit rot again.
Dehao Chen [Tue, 29 Aug 2017 15:28:12 +0000 (15:28 +0000)]
Add null check for promoted direct call
Summary: We originally assume that in pgo-icp, the promoted direct call will never be null after strip point casts. However, stripPointerCasts is so smart that it could possibly return the value of the function call if it knows that the return value is always an argument. In this case, the returned value cannot cast to Instruction. In this patch, null check is added to ensure null pointer will not be accessed.
Diana Picus [Tue, 29 Aug 2017 09:47:55 +0000 (09:47 +0000)]
[ARM] GlobalISel: Select globals in PIC mode
Support the selection of G_GLOBAL_VALUE in the PIC relocation model. For
simplicity we use the same pseudoinstructions for both Darwin and ELF:
(MOV|LDRLIT)_ga_pcrel(_ldr).
This is new for ELF, so it requires a small update to the ARM pseudo
expansion pass to make sure it adds the correct constant pool modifier
and add-current-address in the case of ELF.
Eric Christopher [Tue, 29 Aug 2017 08:23:46 +0000 (08:23 +0000)]
Revert "The current version of LLVM X86 disassembler incorrectly interprets some possible sets of x86 prefixes. This patch is the first step to close PR7709 and PR17697. There will be next patch(es) to close relative PRs." temporarily while some regressions are addressed.
Max Kazantsev [Tue, 29 Aug 2017 07:32:20 +0000 (07:32 +0000)]
[LSR] Fix Shadow IV in case of integer overflow
When LSR processes code like
int accumulator = 0;
for (int i = 0; i < N; i++) {
accummulator += i;
use((double) accummulator);
}
It may decide to replace integer `accumulator` with a double Shadow IV to get rid
of casts. The problem with that is that the `accumulator`'s value may overflow.
Starting from this moment, the behavior of integer and double accumulators
will differ.
This patch strenghtens up the conditions of Shadow IV mechanism applicability.
We only allow it for IVs that are proved to be `AddRec`s with `nsw`/`nuw` flag.
Stephen Hines [Tue, 29 Aug 2017 02:07:28 +0000 (02:07 +0000)]
Enable building LLVMgold.dll under mingw.
Summary:
Plugins can (and should) be enabled under mingw if we are building
libLLVM.dll, so this is just a missed case. This allows LLVMgold.dll to
be built now under mingw.
Yuka Takahashi [Tue, 29 Aug 2017 02:01:56 +0000 (02:01 +0000)]
[Bash-autocompletion] Add support for -std=
Summary:
Add support for autocompleting values of -std= by including
LangStandards.def. This patch relies on D36782, and is using two-stage
code generation.
Juergen Ributzka [Tue, 29 Aug 2017 00:34:56 +0000 (00:34 +0000)]
Re-apply "Fix cmake check for futimens when deploying to earlier macOS releases."
This fixes an issue with the use of LLVM_PARALLEL_LINK_JOBS.
Original commit message:
macOS 10.13 added a new API (futimens). This API is only available on macOS 10.13
and later, but the cmake check we have in place only tests if the symbol is
present and ignores the availability attribute. Luckily we have new warning for
this and by making this warning an error the cmake check will return the correct
result.
Justin Bogner [Tue, 29 Aug 2017 00:22:08 +0000 (00:22 +0000)]
Implement llvm-isel-fuzzer for fuzzing instruction selection
This implements a fuzzer tool for instruction selection, as described
in my [EuroLLVM 2017 talk][1].
The fuzzer must be given both libFuzzer args and llc-like args to
configure the backend. For example, to fuzz AArch64 GlobalISel at -O0,
you could invoke like so:
If you would like to seed the fuzzer with an initial corpus, simply
provide a directory of valid LLVM bitcode (not textual IR) as one of
the corpus dirs.
Craig Topper [Tue, 29 Aug 2017 00:13:49 +0000 (00:13 +0000)]
[InstCombine] Teach foldSelectICmpAndOr to handle vector splats
This was pretty close to working already. While I was here I went ahead and passed the ICmpInst pointer from the caller instead of doing a dyn_cast that can never fail.
Justin Bogner [Tue, 29 Aug 2017 00:11:05 +0000 (00:11 +0000)]
[sanitizer-coverage] Mark the guard and 8-bit counter arrays as used
In r311742 we marked the PCs array as used so it wouldn't be dead
stripped, but left the guard and 8-bit counters arrays alone since
these are referenced by the coverage instrumentation. This doesn't
quite work if we want the indices of the PCs array to match the other
arrays though, since elements can still end up being dead and
disappear.
Instead, we mark all three of these arrays as used so that they'll be
consistent with one another.
Bob Haarman [Tue, 29 Aug 2017 00:06:59 +0000 (00:06 +0000)]
[codeview] support more DW_OPs for more complete debug info
Summary:
Some variables show up in Visual Studio as "optimized out" even in -O0
-Od builds. This change fixes two issues that would cause this to
happen. The first issue is that not all DIExpressions we generate were
recognized by the CodeView writer. This has been addressed by adding
support for DW_OP_constu, DW_OP_minus, and DW_OP_plus. The second
issue is that we had no way to encode DW_OP_deref in CodeView. We get
around that by changinge the type we encode in the debug info to be
a reference to the type in the source code.
Marek Sokolowski [Mon, 28 Aug 2017 23:46:30 +0000 (23:46 +0000)]
[llvm-rc] Add MENU parsing ability (parser, pt 4/8).
This extends llvm-rc parsing tool by MENU resource
(msdn.microsoft.com/en-us/library/windows/desktop/aa381025(v=vs.85).aspx).
As for now, MENUEX
(msdn.microsoft.com/en-us/library/windows/desktop/aa381023(v=vs.85).aspx)
seems unnecessary.
Thanks for Nico Weber for his original work in this area.
Justin Bogner [Mon, 28 Aug 2017 23:46:11 +0000 (23:46 +0000)]
[sanitizer-coverage] Return the array from CreatePCArray. NFC
Be more consistent with CreateFunctionLocalArrayInSection in the API
of CreatePCArray, and assign the member variable in the caller like we
do for the guard and 8-bit counter arrays.
This also tweaks the order of method declarations to match the order
of definitions in the file.
Juergen Ributzka [Mon, 28 Aug 2017 23:04:38 +0000 (23:04 +0000)]
Fix cmake check for futimens when deploying to earlier macOS releases.
macOS 10.13 added a new API (futimens). This API is only available on macOS 10.13
and later, but the cmake check we have in place only tests if the symbol is
present and ignores the availability attribute. Luckily we have new warning for
this and by making this warning an error the cmake check will return the correct
result.
Evandro Menezes [Mon, 28 Aug 2017 22:51:52 +0000 (22:51 +0000)]
[AArch64] Adjust the cost model for Exynos M1 and M2
Add new predicate to more accurately model the scheduling around branches
and function calls and of loads and stores of pairs and integer
multiplications.