Craig Topper [Wed, 6 Dec 2017 18:40:46 +0000 (18:40 +0000)]
[X86] Simplify the TTI code for getInterleavedMemoryOpCost around for AVX512BW. NFCI
Previously the lambda for AVX512 passed out a flag that indicated whether AVX512BW was required and that was checked against the AVX512BW subtarget flag outside.
This patch changes the interface to pass the AVX512BW subtarget bit in and return its value if we detect 16 or 8 bit types.
Shoaib Meenai [Wed, 6 Dec 2017 18:33:07 +0000 (18:33 +0000)]
[cmake] Remove unnecessary header include in atomics check
The header include was required to work around PR19898, as noted in that
comment. That PR has since been marked resolved fixed, and the
configuration check passes without the header inclusion both when
compiling on Windows with cl and when cross-compiling on Linux using
clang-cl.
I noticed this because the inclusion was cased incorrectly (Intrin.h
instead of intrin.h), which when cross-compiling on a case sensitive
file system would cause the intrin.h from the Windows SDK to be included
(which LLVM can't handle) instead of the one from clang's resource
directory, making the check fail. This is the same issue as r309980.
Correcting the case of the inclusion makes the check pass when cross
compiling, but it seems better to get rid of the inclusion entirely,
since it appears to be unnecessary now.
Adam Nemet [Wed, 6 Dec 2017 16:50:50 +0000 (16:50 +0000)]
[opt-viewer] Suppress noisy Swift remarks
Most likely, this is not how we want to handle this in the long term. This
code should probably be in the Swift repo and somehow plugged into the
opt-viewer. This is still however very experimental at this point so I don't
want to over-engineer it at this point.
Nirav Dave [Wed, 6 Dec 2017 15:30:13 +0000 (15:30 +0000)]
[ARM][AArch64][DAG] Reenable post-legalize store merge
Reenable post-legalize stores with constant merging computation and
corresponding test case.
* Properly truncate store merge constants
* Disable merging of truncated stores floating points
* Ensure merges of constant stores into a single vector are
constructed from legal elements.
Jonas Paulsson [Wed, 6 Dec 2017 13:53:24 +0000 (13:53 +0000)]
[SystemZ] Bugfix in expandRxSBG()
Csmith discovered a program that caused wrong code generation with -O0:
When handling a SIGN_EXTEND in expandRxSBG(), RxSBG.BitSize may be less than
the Input width (if a truncate was previously traversed), so maskMatters()
should be called with a masked based on the width of the sign extend result
instead.
Max Kazantsev [Wed, 6 Dec 2017 12:44:56 +0000 (12:44 +0000)]
[SCEV][NFC] Check NoWrap flags before lexicographical comparison of SCEVs
Lexicographical comparison of SCEV trees is potentially expensive for big
expression trees. We can define ordering between them for AddRecs and
N-ary operations by SCEV NoWrap flags to make non-equality check
cheaper.
This change does not prevent grouping eqivalent SCEVs together and is
not supposed to have any meaningful impact on behavior of any transforms.
Max Kazantsev [Wed, 6 Dec 2017 08:58:16 +0000 (08:58 +0000)]
[SCEV][NFC] Share value cache between SCEVs in GroupByComplexity
Current implementation of `compareSCEVComplexity` is being unreasonable with `SCEVUnknown`s:
every time it sees one, it creates a new value cache and tries to prove equality of two values using it.
This cache reallocates and gets lost from SCEV to SCEV.
This patch changes this behavior: now we create one cache for all values and share it between SCEVs.
Craig Topper [Wed, 6 Dec 2017 07:37:20 +0000 (07:37 +0000)]
[X86] Split 512-bit vector extends from types other than vXi1 out of LowerZERO_EXTEND_AVX512/LowerSIGN_EXTEND_AVX512. NFCI
Most of the code in these routines is for handling extends from vXi1 types. The 512-bit handling for other extends is very much like the AVX2 code. So make the special routines just do vXi1 types and move the other 512-bit handling to the place that handles AVX2.
Hans Wennborg [Wed, 6 Dec 2017 01:47:55 +0000 (01:47 +0000)]
Revert r319482 and r319483 "[memcpyopt] Teach memcpyopt to optimize across basic blocks"
This caused PR35519.
> [memcpyopt] Teach memcpyopt to optimize across basic blocks
>
> This teaches memcpyopt to make a non-local memdep query when a local query
> indicates that the dependency is non-local. This notably allows it to
> eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
>
> Fixes PR28958.
>
> Differential Revision: https://reviews.llvm.org/D38374
>
> [memcpyopt] Commit file missed in r319482.
>
> This change was meant to be included with r319482 but was accidentally
> omitted.
Zachary Turner [Wed, 6 Dec 2017 00:58:12 +0000 (00:58 +0000)]
Regex out the local hash comparison test.
Since the local hash is a different number of bytes depending
on host architecture, we don't have a consistent value. I
will need to re-do this test for both x86 and x64. For now
it accepts any value for the local hash.
Craig Topper [Tue, 5 Dec 2017 23:08:30 +0000 (23:08 +0000)]
[SelectionDAG] Don't promote mask operand when widening mstore and mscatter.
If the mask needs to be promoted that should occur by the legalizer detecting the mask operand needs to be promoted not as a side effect of another action.
Craig Topper [Tue, 5 Dec 2017 23:08:27 +0000 (23:08 +0000)]
[SelectionDAG] Don't promote mask operands of MGATHER and MLOAD to setcc result type while widening the result. Just widen the mask.
The mask will be promoted if necessary when operands are promoted. It's possible the mask type is legal, but the setcc result type is a different. We shouldn't promote to the setcc result type unless the mask needs to be promoted.
Shoaib Meenai [Tue, 5 Dec 2017 21:49:56 +0000 (21:49 +0000)]
[CMake] Use PRIVATE in target_link_libraries for executables
We currently use target_link_libraries without an explicit scope
specifier (INTERFACE, PRIVATE or PUBLIC) when linking executables.
Dependencies added in this way apply to both the target and its
dependencies, i.e. they become part of the executable's link interface
and are transitive.
Transitive dependencies generally don't make sense for executables,
since you wouldn't normally be linking against an executable. This also
causes issues for generating install export files when using
LLVM_DISTRIBUTION_COMPONENTS. For example, clang has a lot of LLVM
library dependencies, which are currently added as interface
dependencies. If clang is in the distribution components but the LLVM
libraries it depends on aren't (which is a perfectly legitimate use case
if the LLVM libraries are being built static and there are therefore no
run-time dependencies on them), CMake will complain about the LLVM
libraries not being in export set when attempting to generate the
install export file for clang. This is reasonable behavior on CMake's
part, and the right thing is for LLVM's build system to explicitly use
PRIVATE dependencies for executables.
Unfortunately, CMake doesn't allow you to mix and match the keyword and
non-keyword target_link_libraries signatures for a single target; i.e.,
if a single call to target_link_libraries for a particular target uses
one of the INTERFACE, PRIVATE, or PUBLIC keywords, all other calls must
also be updated to use those keywords. This means we must do this change
in a single shot. I also fully expect to have missed some instances; I
tested by enabling all the projects in the monorepo (except dragonegg),
and configuring both with and without shared libraries, on both Darwin
and Linux, but I'm planning to rely on the buildbots for other
configurations (since it should be pretty easy to fix those).
Even after this change, we still have a lot of target_link_libraries
calls that don't specify a scope keyword, mostly for shared libraries.
I'm thinking about addressing those in a follow-up, but that's a
separate change IMO.
Lang Hames [Tue, 5 Dec 2017 21:44:56 +0000 (21:44 +0000)]
[Orc] Add a SymbolStringPool data structure for efficient storage and fast
comparison of symbol names.
SymbolStringPool is a thread-safe string pool that will be used in upcoming Orc
APIs to facilitate efficient storage and fast comparison of symbol name strings.
Anna Thomas [Tue, 5 Dec 2017 21:39:37 +0000 (21:39 +0000)]
[SafepointIRVerifier] Allow deriving pointers from unrelocated base
Summary:
This patch allows to use derived pointers (GEPs/bitcasts) of unrelocated
base pointers. We care only about the uses of these derived pointers.
It is acheived by two changes:
1. When we have enough information to say if the pointer is unrelocated at some
point or not, we walk all BBs to remove from their Contributions all valid defs
of unrelocated pointers (GEP with unrelocated base or bitcast of unrelocated
pointer).
2. When it comes to verification we just ignore instructions that were removed
at stage 1.
Hans Wennborg [Tue, 5 Dec 2017 20:22:20 +0000 (20:22 +0000)]
Re-commit r319490 "XOR the frame pointer with the stack cookie when protecting the stack"
The patch originally broke Chromium (crbug.com/791714) due to its failing to
specify that the new pseudo instructions clobber EFLAGS. This commit fixes
that.
> Summary: This strengthens the guard and matches MSVC.
>
> Reviewers: hans, etienneb
>
> Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D40622
Ulrich Weigand [Tue, 5 Dec 2017 19:42:07 +0000 (19:42 +0000)]
[SystemZ] Validate shifted compare value in adjustForTestUnderMask
When folding a shift into a test-under-mask comparison, make sure that
there is no loss of precision when creating the shifted comparison
value. This usually never happens, except for certain always-true
comparisons in unoptimized code.
Dan Gohman [Tue, 5 Dec 2017 18:29:48 +0000 (18:29 +0000)]
[WebAssembly] Make stack-pointer imports mutable.
This is not currently valid by the wasm spec, however:
- It replaces doing set_global on an immutable global, which is also
not valid.
- It's expected be valid in the near future:
https://github.com/WebAssembly/threads/blob/master/proposals/threads/Globals.md
- This only occurs before linking, so a fully linked object will be
valid.
Matt Arsenault [Tue, 5 Dec 2017 18:23:17 +0000 (18:23 +0000)]
AMDGPU: Fix infinite loop with dbg_value
Surprisingly SIOptimizeExecMaskingPreRA can infinite loop
in some case with DBG_VALUE. Most tests using dbg_value are
run at -O0, so don't run this pass. This seems to only
happen when the value argument is undef.
Craig Topper [Tue, 5 Dec 2017 17:37:19 +0000 (17:37 +0000)]
[SelectionDAG] Remove the code that handles SETCC with a scalar result type from vector widening.
There's no such thing as a setcc with vector operands and scalar result. And if we're trying to widen the result we would have to already be looking at a vector result type.
So this patch renames the VSETCC function as the SETCC function and delete the original SETCC function.
Without this when lld failed to replace the output file it would leave
the temporary behind. The problem is that the existing logic is
- cancel the delete flag
- rename
We have to cancel first to avoid renaming and then crashing and
deleting the old version. What is missing then is deleting the
temporary file if the rename fails.
This can be an issue on both unix and windows, but I am not sure how
to cause the rename to fail reliably on unix. I think it can be done
on ZFS since it has an ACL system similar to what windows uses, but
adding support for checking that in llvm-lit is probably not worth it.
Sam Parker [Tue, 5 Dec 2017 15:13:47 +0000 (15:13 +0000)]
[DAGCombine] Move AND nodes to multiple load leaves
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.
[DAGCombine] Handle big endian correctly in CombineConsecutiveLoads
Summary:
Found out, at code inspection, that there was a fault in
DAGCombiner::CombineConsecutiveLoads for big-endian targets.
A BUILD_PAIR is always having the least significant bits of
the composite value in element 0. So when we are doing the checks
for consecutive loads, for big endian targets, we should check
if the load to elt 1 is at the lower address and the load
to elt 0 is at the higher address.
Normally this bug only resulted in missed oppurtunities for
doing the load combine. I guess that in some rare situation it
could lead to faulty combines, but I've not seen that happen.
Note that this patch actually will trigger load combine for
some big endian regression tests.
One example is test/CodeGen/PowerPC/anon_aggr.ll where we now get
t76: i64,ch = load<LD8[FixedStack-9]
instead of
t37: i32,ch = load<LD4[FixedStack-10]>
t35: i32,ch = load<LD4[FixedStack-9]>
t41: i64 = build_pair t37, t35
before legalization. Then the legalization will split the LD8
into two loads, so the end result is the same. That should
verify that the transfomation is correct now.
Mikael Holmen [Tue, 5 Dec 2017 14:14:00 +0000 (14:14 +0000)]
Bail out of a SimplifyCFG switch table opt at undef values.
Summary:
A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This is fixed by this patch by bailing out if we encounter undef.
The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization.
[XRay][docs] Document xray_mode and log registration API.
This marks certain flags in XRay as deprecated (in particular,
`xray_naive_log=` and `xray_fdr_log=`), and recommends the use of the
`xray_mode=` flag.
Simon Pilgrim [Tue, 5 Dec 2017 12:14:36 +0000 (12:14 +0000)]
[X86][AVX512] Tag VPCMP/VPCMPU instruction scheduler classes
Move hardcoded itinerary out to the instruction declarations. Not sure that IIC_SSE_ALU_F32P is the best schedule for integer comparisons, but I'm not going to change it right now.
Simon Pilgrim [Tue, 5 Dec 2017 12:02:22 +0000 (12:02 +0000)]
[X86][AVX512] Cleanup VPCMP scheduler classes
Move hardcoded itinerary out to the instruction declarations. Not sure that IIC_SSE_ALU_F32P is the best schedule for integer comparisons, but I'm not going to change it right now.
Jonas Paulsson [Tue, 5 Dec 2017 11:24:39 +0000 (11:24 +0000)]
[SystemZ] set 'guessInstructionProperties = 0' and set flags as needed.
This has proven a healthy exercise, as many cases of incorrect instruction
flags were corrected in the process. As part of this, IntrWriteMem was added
to several SystemZ instrinsics.
Furthermore, a bug was exposed in TwoAddress with this change (as incorrect
hasSideEffects flags were removed and instructions could now be sunk), and
the test case for that bugfix (r319646) is included here as
test/CodeGen/SystemZ/twoaddr-sink.ll.
One temporary test regression (one extra copy) which will hopefully go away
in upcoming patches for similar cases:
test/CodeGen/SystemZ/vec-trunc-to-i1.ll
Jonas Paulsson [Tue, 5 Dec 2017 10:52:24 +0000 (10:52 +0000)]
[Regalloc] Generate and store multiple regalloc hints.
MachineRegisterInfo used to allow just one regalloc hint per virtual
register. This patch extends this to a vector of regalloc hints, which is
filled in by common code with sorted copy hints. Such hints will make for
more ID copies that can be removed.
NB! This improvement is currently (and hopefully temporarily) *disabled* by
default, except for SystemZ. The only reason for this is the big impact this
has on tests, which has unfortunately proven unmanageable. It was a long
while since all the tests were updated and just waiting for review (which
didn't happen), but now targets have to enable this themselves
instead. Several targets could get a head-start by downloading the tests
updates from the Phabricator review. Thanks to those who helped, and sorry
you now have to do this step yourselves.
This should be an improvement generally for any target!
The target may still create its own hint, in which case this has highest
priority and is stored first in the vector. If it has target-type, it will
not be recomputed, as per the previous behaviour.
The temporary hook enableMultipleCopyHints() will be removed as soon as all
targets return true.
Pavel Labath [Tue, 5 Dec 2017 10:24:15 +0000 (10:24 +0000)]
Re-commit "[cmake] Enable zlib support on windows"
This recommits r319533 which was broken llvm-config --system-libs
output. The reason was that I used find_libraries for searching for the
z library. This returns absolute paths, and when these paths made it
into llvm-config, it made it produce nonsensical flags. To fix this, I
hand-roll a search for the library in the same way that we search for
the terminfo library a couple of lines below.
This is a bit less flexible than the find_library option, as it does not
allow the user to specify the path to the library at configure time
(which is important on windows, as zlib is unlikely to be found in any
of the standard places cmake searches), but I was able to guide the
build to find it with appropriate values of LIB and INCLUDE environment
variables.
George Rimar [Tue, 5 Dec 2017 10:09:59 +0000 (10:09 +0000)]
[Support/TarWriter] - Don't allow TarWriter to add the same file more than once.
This is for PR35460.
Currently when LLD adds files to TarWriter it may pass the same file
multiple times. For example it happens for clang reproduce file which specifies
archive (.a) files more than once in command line.
Patch makes TarWriter to ignore files with the same path, so it will
add only the first one to archive.
Guy Blank [Tue, 5 Dec 2017 09:08:24 +0000 (09:08 +0000)]
[X86] Fix a bug in handling GRXX subclasses in Domain Reassignment pass
When trying to determine the correct Mask register class corresponding
to a GPR register class, not all register classes were handled.
This caused an assertion to be raised on some scenarios.
Craig Topper [Tue, 5 Dec 2017 08:15:03 +0000 (08:15 +0000)]
[SelectionDAG] Use WidenTargetBoolean in WidenVecRes_MLOAD and WidenVecOp_MSTORE instead of implementing it manually and incorrectly.
The CONCAT_VECTORS operand get its type from getSetCCResultType, but if the mask type and the setcc have different scalar sizes this creates an illegal CONCAT_VECTORS operation. The concat type should be 2x the mask type, and then an extend should be added if needed.