Sam Clegg [Thu, 7 Dec 2017 02:55:51 +0000 (02:55 +0000)]
[WebAssembly] section kind can be code
Currently, when creating a named section, the Wasm
frontend forces it to use `SectionKind::Data`, whereas
in fact C++ does generate code sections with custom
names.
Dan Gohman [Wed, 6 Dec 2017 23:57:11 +0000 (23:57 +0000)]
[WebAssembly] Import the linear memory and function table.
Instead of having .o files contain linear-memory and function table
definitions, use imports. This is more consistent with the stack pointer
being imported, and it's consistent with the linker being the one to
decide whether linear memory and function table are imported or defined
in the linked output. This implements tool-conventions #23.
Florian Hahn [Wed, 6 Dec 2017 22:48:36 +0000 (22:48 +0000)]
[AArch64] Add patterns to replace fsub fmul with fma fneg.
Summary:
This patch adds MachineCombiner patterns for transforming
(fsub (fmul x y) z) into (fma x y (fneg z)). This has a lower
latency on micro architectures where fneg is cheap.
Matthew Simpson [Wed, 6 Dec 2017 21:22:54 +0000 (21:22 +0000)]
[PGO] Make indirect call promotion a utility
This patch factors out the main code transformation utilities in the pgo-driven
indirect call promotion pass and places them in Transforms/Utils. The change is
intended to be a non-functional change, letting non-pgo-driven passes share a
common implementation with the existing pgo-driven pass.
The common utilities are used to conditionally promote indirect call sites to
direct call sites. They perform the underlying transformation, and do not
consider profile information. The pgo-specific details (e.g., the computation
of branch weight metadata) have been left in the indirect call promotion pass.
Florian Hahn [Wed, 6 Dec 2017 20:27:33 +0000 (20:27 +0000)]
[MachineCombiner] Add up latencies of all instructions in new pattern.
Summary:
When calculating the RootLatency, we add up all the latencies of the
deleted instructions. But for NewRootLatency we only add the latency of
the new root instructions, ignoring the latencies of the other
instructions inserted. This leads the combiner to underestimate the cost
of patterns which add multiple instructions. This patch fixes that by
summing up the latencies of all new instructions. For NewRootNode, the
more complex getLatency function is used.
Note that we may be slightly more precise than just summing up
all latencies. For example, consider a pattern like
r1 = INS1 ..
r2 = INS2 ..
r3 = INS3 r1, r2
I think in some other places, the total latency of the pattern would be
estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider
that worth changing, I think it would be best to do in a follow-up
patch.
Alina Sbirlea [Wed, 6 Dec 2017 19:56:37 +0000 (19:56 +0000)]
[ModRefInfo] Do not use ModRefInfo result in if conditions as this makes
assumptions about the values in the enum. Replace with wrapper returning
bool [NFC].
Rui Ueyama [Wed, 6 Dec 2017 19:18:24 +0000 (19:18 +0000)]
[COFF] Ignore semicolons in module definition identifiers
Patch by David Major.
The NSS project's .def files make heavy use of semicolons in a
frightening attempt at portability:
https://hg.mozilla.org/projects/nss/raw-file/tip/lib/ckfw/capi/nsscapi.def
lld-link was treating the semicolon as part of the export name,
resulting in unresolved symbols. This patch includes ';' in the list of
characters to split on.
Craig Topper [Wed, 6 Dec 2017 18:40:46 +0000 (18:40 +0000)]
[X86] Simplify the TTI code for getInterleavedMemoryOpCost around for AVX512BW. NFCI
Previously the lambda for AVX512 passed out a flag that indicated whether AVX512BW was required and that was checked against the AVX512BW subtarget flag outside.
This patch changes the interface to pass the AVX512BW subtarget bit in and return its value if we detect 16 or 8 bit types.
Shoaib Meenai [Wed, 6 Dec 2017 18:33:07 +0000 (18:33 +0000)]
[cmake] Remove unnecessary header include in atomics check
The header include was required to work around PR19898, as noted in that
comment. That PR has since been marked resolved fixed, and the
configuration check passes without the header inclusion both when
compiling on Windows with cl and when cross-compiling on Linux using
clang-cl.
I noticed this because the inclusion was cased incorrectly (Intrin.h
instead of intrin.h), which when cross-compiling on a case sensitive
file system would cause the intrin.h from the Windows SDK to be included
(which LLVM can't handle) instead of the one from clang's resource
directory, making the check fail. This is the same issue as r309980.
Correcting the case of the inclusion makes the check pass when cross
compiling, but it seems better to get rid of the inclusion entirely,
since it appears to be unnecessary now.
Adam Nemet [Wed, 6 Dec 2017 16:50:50 +0000 (16:50 +0000)]
[opt-viewer] Suppress noisy Swift remarks
Most likely, this is not how we want to handle this in the long term. This
code should probably be in the Swift repo and somehow plugged into the
opt-viewer. This is still however very experimental at this point so I don't
want to over-engineer it at this point.
Nirav Dave [Wed, 6 Dec 2017 15:30:13 +0000 (15:30 +0000)]
[ARM][AArch64][DAG] Reenable post-legalize store merge
Reenable post-legalize stores with constant merging computation and
corresponding test case.
* Properly truncate store merge constants
* Disable merging of truncated stores floating points
* Ensure merges of constant stores into a single vector are
constructed from legal elements.
Jonas Paulsson [Wed, 6 Dec 2017 13:53:24 +0000 (13:53 +0000)]
[SystemZ] Bugfix in expandRxSBG()
Csmith discovered a program that caused wrong code generation with -O0:
When handling a SIGN_EXTEND in expandRxSBG(), RxSBG.BitSize may be less than
the Input width (if a truncate was previously traversed), so maskMatters()
should be called with a masked based on the width of the sign extend result
instead.
Max Kazantsev [Wed, 6 Dec 2017 12:44:56 +0000 (12:44 +0000)]
[SCEV][NFC] Check NoWrap flags before lexicographical comparison of SCEVs
Lexicographical comparison of SCEV trees is potentially expensive for big
expression trees. We can define ordering between them for AddRecs and
N-ary operations by SCEV NoWrap flags to make non-equality check
cheaper.
This change does not prevent grouping eqivalent SCEVs together and is
not supposed to have any meaningful impact on behavior of any transforms.
Max Kazantsev [Wed, 6 Dec 2017 08:58:16 +0000 (08:58 +0000)]
[SCEV][NFC] Share value cache between SCEVs in GroupByComplexity
Current implementation of `compareSCEVComplexity` is being unreasonable with `SCEVUnknown`s:
every time it sees one, it creates a new value cache and tries to prove equality of two values using it.
This cache reallocates and gets lost from SCEV to SCEV.
This patch changes this behavior: now we create one cache for all values and share it between SCEVs.
Craig Topper [Wed, 6 Dec 2017 07:37:20 +0000 (07:37 +0000)]
[X86] Split 512-bit vector extends from types other than vXi1 out of LowerZERO_EXTEND_AVX512/LowerSIGN_EXTEND_AVX512. NFCI
Most of the code in these routines is for handling extends from vXi1 types. The 512-bit handling for other extends is very much like the AVX2 code. So make the special routines just do vXi1 types and move the other 512-bit handling to the place that handles AVX2.
Hans Wennborg [Wed, 6 Dec 2017 01:47:55 +0000 (01:47 +0000)]
Revert r319482 and r319483 "[memcpyopt] Teach memcpyopt to optimize across basic blocks"
This caused PR35519.
> [memcpyopt] Teach memcpyopt to optimize across basic blocks
>
> This teaches memcpyopt to make a non-local memdep query when a local query
> indicates that the dependency is non-local. This notably allows it to
> eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%.
>
> Fixes PR28958.
>
> Differential Revision: https://reviews.llvm.org/D38374
>
> [memcpyopt] Commit file missed in r319482.
>
> This change was meant to be included with r319482 but was accidentally
> omitted.
Zachary Turner [Wed, 6 Dec 2017 00:58:12 +0000 (00:58 +0000)]
Regex out the local hash comparison test.
Since the local hash is a different number of bytes depending
on host architecture, we don't have a consistent value. I
will need to re-do this test for both x86 and x64. For now
it accepts any value for the local hash.
Craig Topper [Tue, 5 Dec 2017 23:08:30 +0000 (23:08 +0000)]
[SelectionDAG] Don't promote mask operand when widening mstore and mscatter.
If the mask needs to be promoted that should occur by the legalizer detecting the mask operand needs to be promoted not as a side effect of another action.
Craig Topper [Tue, 5 Dec 2017 23:08:27 +0000 (23:08 +0000)]
[SelectionDAG] Don't promote mask operands of MGATHER and MLOAD to setcc result type while widening the result. Just widen the mask.
The mask will be promoted if necessary when operands are promoted. It's possible the mask type is legal, but the setcc result type is a different. We shouldn't promote to the setcc result type unless the mask needs to be promoted.
Shoaib Meenai [Tue, 5 Dec 2017 21:49:56 +0000 (21:49 +0000)]
[CMake] Use PRIVATE in target_link_libraries for executables
We currently use target_link_libraries without an explicit scope
specifier (INTERFACE, PRIVATE or PUBLIC) when linking executables.
Dependencies added in this way apply to both the target and its
dependencies, i.e. they become part of the executable's link interface
and are transitive.
Transitive dependencies generally don't make sense for executables,
since you wouldn't normally be linking against an executable. This also
causes issues for generating install export files when using
LLVM_DISTRIBUTION_COMPONENTS. For example, clang has a lot of LLVM
library dependencies, which are currently added as interface
dependencies. If clang is in the distribution components but the LLVM
libraries it depends on aren't (which is a perfectly legitimate use case
if the LLVM libraries are being built static and there are therefore no
run-time dependencies on them), CMake will complain about the LLVM
libraries not being in export set when attempting to generate the
install export file for clang. This is reasonable behavior on CMake's
part, and the right thing is for LLVM's build system to explicitly use
PRIVATE dependencies for executables.
Unfortunately, CMake doesn't allow you to mix and match the keyword and
non-keyword target_link_libraries signatures for a single target; i.e.,
if a single call to target_link_libraries for a particular target uses
one of the INTERFACE, PRIVATE, or PUBLIC keywords, all other calls must
also be updated to use those keywords. This means we must do this change
in a single shot. I also fully expect to have missed some instances; I
tested by enabling all the projects in the monorepo (except dragonegg),
and configuring both with and without shared libraries, on both Darwin
and Linux, but I'm planning to rely on the buildbots for other
configurations (since it should be pretty easy to fix those).
Even after this change, we still have a lot of target_link_libraries
calls that don't specify a scope keyword, mostly for shared libraries.
I'm thinking about addressing those in a follow-up, but that's a
separate change IMO.
Lang Hames [Tue, 5 Dec 2017 21:44:56 +0000 (21:44 +0000)]
[Orc] Add a SymbolStringPool data structure for efficient storage and fast
comparison of symbol names.
SymbolStringPool is a thread-safe string pool that will be used in upcoming Orc
APIs to facilitate efficient storage and fast comparison of symbol name strings.
Anna Thomas [Tue, 5 Dec 2017 21:39:37 +0000 (21:39 +0000)]
[SafepointIRVerifier] Allow deriving pointers from unrelocated base
Summary:
This patch allows to use derived pointers (GEPs/bitcasts) of unrelocated
base pointers. We care only about the uses of these derived pointers.
It is acheived by two changes:
1. When we have enough information to say if the pointer is unrelocated at some
point or not, we walk all BBs to remove from their Contributions all valid defs
of unrelocated pointers (GEP with unrelocated base or bitcast of unrelocated
pointer).
2. When it comes to verification we just ignore instructions that were removed
at stage 1.
Hans Wennborg [Tue, 5 Dec 2017 20:22:20 +0000 (20:22 +0000)]
Re-commit r319490 "XOR the frame pointer with the stack cookie when protecting the stack"
The patch originally broke Chromium (crbug.com/791714) due to its failing to
specify that the new pseudo instructions clobber EFLAGS. This commit fixes
that.
> Summary: This strengthens the guard and matches MSVC.
>
> Reviewers: hans, etienneb
>
> Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D40622
Ulrich Weigand [Tue, 5 Dec 2017 19:42:07 +0000 (19:42 +0000)]
[SystemZ] Validate shifted compare value in adjustForTestUnderMask
When folding a shift into a test-under-mask comparison, make sure that
there is no loss of precision when creating the shifted comparison
value. This usually never happens, except for certain always-true
comparisons in unoptimized code.
Dan Gohman [Tue, 5 Dec 2017 18:29:48 +0000 (18:29 +0000)]
[WebAssembly] Make stack-pointer imports mutable.
This is not currently valid by the wasm spec, however:
- It replaces doing set_global on an immutable global, which is also
not valid.
- It's expected be valid in the near future:
https://github.com/WebAssembly/threads/blob/master/proposals/threads/Globals.md
- This only occurs before linking, so a fully linked object will be
valid.
Matt Arsenault [Tue, 5 Dec 2017 18:23:17 +0000 (18:23 +0000)]
AMDGPU: Fix infinite loop with dbg_value
Surprisingly SIOptimizeExecMaskingPreRA can infinite loop
in some case with DBG_VALUE. Most tests using dbg_value are
run at -O0, so don't run this pass. This seems to only
happen when the value argument is undef.