Nico Weber [Wed, 19 Dec 2018 23:52:16 +0000 (23:52 +0000)]
[gn build] Add check-lld target and make it work
Also add a build file for llvm-lit, which in turn needs llvm/tools/llvm-config.
With this, check-lld runs and passes all of lld's lit tests. It doesn't run any
of its unit tests yet.
Running just ninja -C out/gn will build all prerequisites needed to run tests,
but it won't run the tests (so that the build becomes clean after one build).
Running ninja -C out/gn check-lld will build prerequisites if needed and run
the tests. The check-lld target never becomes clean and runs tests every time.
llvm-config's build file is a bit gnarly: Everything not needed to run tests is
basically stubbed out. Also, to generate LibraryDependencies.inc we shell out
to llvm-build at build-time. It would be much nicer to get the library
dependencies by using the dependency data the GN build contains
(http://llvm-cs.pcc.me.uk/gen/tools/llvm-config/LibraryDependencies.inc#1).
Eli Friedman [Wed, 19 Dec 2018 22:52:04 +0000 (22:52 +0000)]
[CodeGenPrepare] Fix bad IR created by large offset GEP splitting.
Creating the IR builder, then modifying the CFG, leads to an IRBuilder
where the BB and insertion point are inconsistent, so new instructions
have the wrong parent.
Modified an existing test because the test wasn't covering anything
useful (the "invoke" was not actually an invoke by the time we hit the
code in question).
Nikita Popov [Wed, 19 Dec 2018 19:56:21 +0000 (19:56 +0000)]
[BDCE][DemandedBits] Detect dead uses of undead instructions
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits or bitwise or/and/xor in combination with known
bits information.
David Blaikie [Wed, 19 Dec 2018 19:34:24 +0000 (19:34 +0000)]
llvm-dwarfdump: Improve/fix pretty printing of array dimensions
This is to address post-commit feedback from Paul Robinson on r348954.
The original commit misinterprets count and upper bound as the same thing (I thought I saw GCC producing an upper bound the same as Clang's count, but GCC correctly produces an upper bound that's one less than the count (in C, that is, where arrays are zero indexed)).
I want to preserve the C-like output for the common case, so in the absence of a lower bound the count (or one greater than the upper bound) is rendered between []. In the trickier cases, where a lower bound is specified, a half-open range is used (eg: lower bound 1, count 2 would be "[1, 3)" and an unknown parts use a '?' (eg: "[1, ?)" or "[?, 7)" or "[?, ? + 3)").
Matthew Voss [Wed, 19 Dec 2018 19:07:45 +0000 (19:07 +0000)]
[ThinLTO] Remove dllimport attribute from locally defined symbols
Summary:
The LTO/ThinLTO driver currently creates invalid bitcode by setting
symbols marked dllimport as dso_local. The compiler often has access
to the definition (often dllexport) and the declaration (often
dllimport) of an object at link-time, leading to a conflicting
declaration. This patch resolves the inconsistency by removing the
dllimport attribute.
Jessica Paquette [Wed, 19 Dec 2018 19:01:36 +0000 (19:01 +0000)]
[GlobalISel][AArch64] Add support for @llvm.ceil
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.
It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.
Craig Topper [Wed, 19 Dec 2018 18:49:13 +0000 (18:49 +0000)]
[X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing
The (cmp (and X, Y) 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM like BLSR or BLSI.
This patch moves removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but its probably not perfect.
Craig Topper [Wed, 19 Dec 2018 18:45:57 +0000 (18:45 +0000)]
[X86] Fix assert fails in pass X86AvoidSFBPass
Fixes https://bugs.llvm.org/show_bug.cgi?id=38743
The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in lockingStoresDispSizeMap.
But it currently looks only at the previous one, which will miss some cases that result in assert.
This patch refine the function to check all previous layouts until find the uncontained one. So all redundant stores will be removed.
Yonghong Song [Wed, 19 Dec 2018 16:40:25 +0000 (16:40 +0000)]
[BPF] Generate BTF DebugInfo under BPF target
This patch implements BTF (BPF Type Format).
The BTF is the debug info format for BPF, introduced
in the below linux patch:
https://github.com/torvalds/linux/commit/69b693f0aefa0ed521e8bd02260523b5ae446ad7#diff-06fb1c8825f653d7e539058b72c83332
and further extended several times, e.g.,
https://www.spinics.net/lists/netdev/msg534640.html
https://www.spinics.net/lists/netdev/msg538464.html
https://www.spinics.net/lists/netdev/msg540246.html
The main advantage of implementing in LLVM is:
. better integration/deployment as no extra tools are needed.
. bpf JIT based compilation (like bcc, bpftrace, etc.) can get
BTF without much extra effort.
. BTF line_info needs selective source codes, which can be
easily retrieved when inside the compiler.
This patch implemented BTF generation by registering a BPF
specific DebugHandler in BPFAsmPrinter.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D55752
Peter Wu [Wed, 19 Dec 2018 16:15:05 +0000 (16:15 +0000)]
[Object] Deduplicate long archive member names
Summary:
Import libraries as created by llvm-dlltool always use the same archive
member name for every object file (namely, the DLL library name). Ensure
that long names are not repeatedly stored in the string table.
Simon Pilgrim [Wed, 19 Dec 2018 14:43:36 +0000 (14:43 +0000)]
[X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (llvm)
Now that we use the generic ISD opcodes, we can use the generic intrinsics directly as well. This fixes the poor fast-isel codegen by not expanding to an easily broken IR code sequence.
I'm intending to deal with the signed saturation equivalents as well.
Simon Pilgrim [Wed, 19 Dec 2018 13:37:59 +0000 (13:37 +0000)]
[TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero.
SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.
Nico Weber [Wed, 19 Dec 2018 13:35:53 +0000 (13:35 +0000)]
Let TableGen write output only if it changed, instead of doing so in cmake, attempt 2
This relands r330742:
"""
Let TableGen write output only if it changed, instead of doing so in cmake.
Removes one subprocess and one temp file from the build for each tablegen
invocation.
No intended behavior change.
"""
In particular, if you see rebuilds after this change that you didn't see
before this change, that's unintended and it's fine to revert this change
again (but let me know).
r330742 got reverted because some people reported that llvm-tblgen ran on every
build after it. This could happen if the depfile output got deleted without
deleting the main .inc output. To fix, make TableGen always write the depfile,
but keep writing the main .inc output only if it has changed. This matches what
we did in cmake before.
Simon Pilgrim [Wed, 19 Dec 2018 10:39:14 +0000 (10:39 +0000)]
[X86][SSE] Remove SSE ADDUS/SUBUS saturation intrinsics from schedule/stack tests
These are already being autoupgraded, currently to an IR sequence, but best to replace them with generic llvm uadd_sat/usub_sat intrinsics (which D55855 will be doing shortly anyhow).
The avx512 masked cases need doing as well but require a bit of tidyup first.
Carl Ritson [Wed, 19 Dec 2018 10:17:49 +0000 (10:17 +0000)]
AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged
Summary:
Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged.
This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds.
Irreducible loop test has been extended based on a CTS failure detected for GFX9.
Diana Picus [Wed, 19 Dec 2018 09:55:10 +0000 (09:55 +0000)]
[ARM GlobalISel] Support G_CONSTANT for Thumb2
All we have to do is mark it as legal.
This allows us to select a lot of new patterns handled by TableGen. This
patch adds tests for them and splits up the existing test file for
binary operators into 2 files, one for arithmetic ops and one for
logical ones.
Brian Gesiak [Wed, 19 Dec 2018 03:42:19 +0000 (03:42 +0000)]
[bugpoint][PR29027] Reduce function attributes
Summary:
In addition to reducing the functions in an LLVM module, bugpoint now
reduces the function attributes associated with each of the remaining
functions.
To test this, add a -bugpoint-crashfuncattr test pass, which crashes if
a function in the module has a "bugpoint-crash" attribute. A test case
demonstrates that the IR is reduced to just that one attribute.
Richard Smith [Wed, 19 Dec 2018 03:24:03 +0000 (03:24 +0000)]
Fix use-after-free with profile remapping.
We need to keep the underlying profile reader alive as long as the
profile data, because the profile data may contain StringRefs referring
to strings in the reader's name table.
Kewen Lin [Wed, 19 Dec 2018 03:04:07 +0000 (03:04 +0000)]
[PowerPC]Exploit P9 vabsdu for unsigned vselect patterns
For type v4i32/v8ii16/v16i8, do following transforms:
(vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b)
(vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b)
Alexandre Ganea [Wed, 19 Dec 2018 01:30:29 +0000 (01:30 +0000)]
Re-land "Fix MSVC dependency issue between Clang-tablegen and LLVM-tablegen"
Previously, when compiling Visual Studio targets, one could see random build errors. This was caused by tablegen projects using the same build folders.
This workaround simply chains tablegen projects.
Yonghong Song [Tue, 18 Dec 2018 23:10:17 +0000 (23:10 +0000)]
[DebugInfo] Move several private headers to include directory
This patch moved the following files in lib/CodeGen/AsmPrinter/
AsmPrinterHandler.h
DbgEntityHistoryCalculator.h
DebugHandlerBase.h
to include/llvm/CodeGen directory.
Such a change will enable Target to extend DebugHandlerBase
and emit Target specific debug info sections.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D55755
Sanjay Patel [Tue, 18 Dec 2018 22:51:06 +0000 (22:51 +0000)]
[InstCombine] add tests for extract of vector load; NFC
There's a mismatch internally about how we are handling these patterns.
We count loads as cheapToScalarize(), but then we don't actually
scalarize them, so that can leave extra instructions compared to where
we started when scalarizing other ops. If it's cheapToScalarize, then
we should be scalarizing.
Pete Cooper [Tue, 18 Dec 2018 22:31:34 +0000 (22:31 +0000)]
Add nonlazybind to objc_retain/objc_release when converting from intrinsics.
For performance reasons, clang set nonlazybind on these functions. Now that we
are using intrinsics instead of runtime calls, we should set this attribute when
creating the runtime functions.
Florian Hahn [Tue, 18 Dec 2018 22:25:11 +0000 (22:25 +0000)]
[LAA] Introduce enum for vectorization safety status (NFC).
This patch adds a VectorizationSafetyStatus enum, which will be extended
in a follow up patch to distinguish between 'safe with runtime checks'
and 'known unsafe' dependences.
Vitaly Buka [Tue, 18 Dec 2018 22:23:30 +0000 (22:23 +0000)]
[asan] Restore ODR-violation detection on vtables
Summary:
unnamed_addr is still useful for detecting of ODR violations on vtables
Still unnamed_addr with lld and --icf=safe or --icf=all can trigger false
reports which can be avoided with --icf=none or by using private aliases
with -fsanitize-address-use-odr-indicator
The first test claims to show that the vectorizer will
generate a vector load/loop, but then this file runs
other passes which might scalarize that op. I'm removing
instcombine from the RUN line here to break that dependency.
Also, I'm generating full checks to make it clear exactly
what the vectorizer has done.
Pete Cooper [Tue, 18 Dec 2018 22:20:03 +0000 (22:20 +0000)]
Rewrite objc intrinsics to runtime methods in PreISelIntrinsicLowering instead of SDAG.
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's. Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.
Kuba Mracek [Tue, 18 Dec 2018 21:20:17 +0000 (21:20 +0000)]
[asan] In llvm.asan.globals, allow entries to be non-GlobalVariable and skip over them
Looks like there are valid reasons why we need to allow bitcasts in llvm.asan.globals, see discussion at https://github.com/apple/swift-llvm/pull/133. Let's look through bitcasts when iterating over entries in the llvm.asan.globals list.
Alexandre Ganea [Tue, 18 Dec 2018 21:03:06 +0000 (21:03 +0000)]
Fix MSVC dependency issue between Clang-tablegen and LLVM-tablegen
Previously, when compiling Visual Studio targets, one could see random build errors. This was caused by tablegen projects using the same build folders.
This workaround simply chains tablegen projects.
Pete Cooper [Tue, 18 Dec 2018 20:32:49 +0000 (20:32 +0000)]
Change the objc ARC optimizer to use the new objc.* intrinsics
We're moving ARC optimisation and ARC emission in clang away from runtime methods
and towards intrinsics. This is the part which actually uses the intrinsics in the ARC
optimizer when both analyzing the existing calls and emitting new ones.
Nikita Popov [Tue, 18 Dec 2018 19:59:50 +0000 (19:59 +0000)]
[InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check
Checking whether a number has a certain number of trailing / leading
zeros means checking whether it is of the form XXXX1000 / 0001XXXX,
which can be done with an and+icmp.
Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next
step, this can be extended to non-equality predicates.
Farhana Aleen [Tue, 18 Dec 2018 19:58:39 +0000 (19:58 +0000)]
[AMDGPU] Removed the unnecessary operand size-check-assert from processBaseWithConstOffset().
Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and
AMDGPU::V_ADDC_U32_e64. Therefore, we don't any additional operand size-check-assert.
Sanjay Patel [Tue, 18 Dec 2018 19:07:38 +0000 (19:07 +0000)]
[InstCombine] refactor isCheapToScalarize(); NFC
As the FIXME indicates, this has the potential to go
overboard. So I'm not sure if it's even worth keeping
this vs. iteratively doing simple matches, but we might
as well clean it up.
Nikita Popov [Tue, 18 Dec 2018 18:28:22 +0000 (18:28 +0000)]
[X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS
Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic
ISD opcodes SADDSAT and SSUBSAT. This also improves scodegen for
@llvm.sadd.sat() and @llvm.ssub.sat() intrinsics.
Alexandre Ganea [Tue, 18 Dec 2018 18:17:00 +0000 (18:17 +0000)]
[CMake] Default options for faster executables on MSVC
- Disable incremental linking by default. /INCREMENTAL adds extra thunks in the EXE, which makes execution slower.
- Set /MT (static CRT lib) by default instead of CMake's default /MD (dll CRT lib). The previous default /MD makes all DLL functions to be thunked, thus making execution slower (memcmp, memset, etc.)
- Adds LLVM_ENABLE_INCREMENTAL_LINK which is set to OFF by default.
Michael Berg [Tue, 18 Dec 2018 17:54:52 +0000 (17:54 +0000)]
Add FMF management to common fp intrinsics in GlobalIsel
Summary: This the initial code change to facilitate managing FMF flags from Instructions to MI wrt Intrinsics in Global Isel. Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE.
Michael Kruse [Tue, 18 Dec 2018 17:46:09 +0000 (17:46 +0000)]
[LoopVectorize] Rename pass options. NFC.
Rename:
NoUnrolling to InterleaveOnlyWhenForced
and
AlwaysVectorize to !VectorizeOnlyWhenForced
Contrary to what the name 'AlwaysVectorize' suggests, it does not
unconditionally vectorize all loops, but applies a cost model to
determine whether vectorization is profitable to all loops. Hence,
passing false will disable the cost model, except when a loop is marked
with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested
by @hfinkel in D55716) better matches this behavior.
Similarly, 'NoUnrolling' disables the profitability cost model for
interleaving (a term to distinguish it from unrolling by the
LoopUnrollPass); rename it for consistency.
Michael Kruse [Tue, 18 Dec 2018 17:16:05 +0000 (17:16 +0000)]
[LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops.
When using clang with `-fno-unroll-loops` (implicitly added with `-O1`),
the LoopUnrollPass is not not added to the (legacy) pass pipeline. This
also means that it will not process any loop metadata such as
llvm.loop.unroll.enable (which is generated by #pragma unroll or
WarnMissedTransformationsPass emits a warning that a forced
transformation has not been applied (see
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html).
Such explicit transformations should take precedence over disabling
heuristics.
This patch unconditionally adds LoopUnrollPass to the optimizing
pipeline (that is, it is still not added with `-O0`), but passes a flag
indicating whether automatic unrolling is dis-/enabled. This is the same
approach as LoopVectorize uses.
The new pass manager's pipeline builder has no option to disable
unrolling, hence the problem does not apply.