Simon Pilgrim [Wed, 19 Dec 2018 13:37:59 +0000 (13:37 +0000)]
[TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero.
SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.
Nico Weber [Wed, 19 Dec 2018 13:35:53 +0000 (13:35 +0000)]
Let TableGen write output only if it changed, instead of doing so in cmake, attempt 2
This relands r330742:
"""
Let TableGen write output only if it changed, instead of doing so in cmake.
Removes one subprocess and one temp file from the build for each tablegen
invocation.
No intended behavior change.
"""
In particular, if you see rebuilds after this change that you didn't see
before this change, that's unintended and it's fine to revert this change
again (but let me know).
r330742 got reverted because some people reported that llvm-tblgen ran on every
build after it. This could happen if the depfile output got deleted without
deleting the main .inc output. To fix, make TableGen always write the depfile,
but keep writing the main .inc output only if it has changed. This matches what
we did in cmake before.
Simon Pilgrim [Wed, 19 Dec 2018 10:39:14 +0000 (10:39 +0000)]
[X86][SSE] Remove SSE ADDUS/SUBUS saturation intrinsics from schedule/stack tests
These are already being autoupgraded, currently to an IR sequence, but best to replace them with generic llvm uadd_sat/usub_sat intrinsics (which D55855 will be doing shortly anyhow).
The avx512 masked cases need doing as well but require a bit of tidyup first.
Carl Ritson [Wed, 19 Dec 2018 10:17:49 +0000 (10:17 +0000)]
AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged
Summary:
Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged.
This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds.
Irreducible loop test has been extended based on a CTS failure detected for GFX9.
Diana Picus [Wed, 19 Dec 2018 09:55:10 +0000 (09:55 +0000)]
[ARM GlobalISel] Support G_CONSTANT for Thumb2
All we have to do is mark it as legal.
This allows us to select a lot of new patterns handled by TableGen. This
patch adds tests for them and splits up the existing test file for
binary operators into 2 files, one for arithmetic ops and one for
logical ones.
Brian Gesiak [Wed, 19 Dec 2018 03:42:19 +0000 (03:42 +0000)]
[bugpoint][PR29027] Reduce function attributes
Summary:
In addition to reducing the functions in an LLVM module, bugpoint now
reduces the function attributes associated with each of the remaining
functions.
To test this, add a -bugpoint-crashfuncattr test pass, which crashes if
a function in the module has a "bugpoint-crash" attribute. A test case
demonstrates that the IR is reduced to just that one attribute.
Richard Smith [Wed, 19 Dec 2018 03:24:03 +0000 (03:24 +0000)]
Fix use-after-free with profile remapping.
We need to keep the underlying profile reader alive as long as the
profile data, because the profile data may contain StringRefs referring
to strings in the reader's name table.
Kewen Lin [Wed, 19 Dec 2018 03:04:07 +0000 (03:04 +0000)]
[PowerPC]Exploit P9 vabsdu for unsigned vselect patterns
For type v4i32/v8ii16/v16i8, do following transforms:
(vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b)
(vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b)
Alexandre Ganea [Wed, 19 Dec 2018 01:30:29 +0000 (01:30 +0000)]
Re-land "Fix MSVC dependency issue between Clang-tablegen and LLVM-tablegen"
Previously, when compiling Visual Studio targets, one could see random build errors. This was caused by tablegen projects using the same build folders.
This workaround simply chains tablegen projects.
Yonghong Song [Tue, 18 Dec 2018 23:10:17 +0000 (23:10 +0000)]
[DebugInfo] Move several private headers to include directory
This patch moved the following files in lib/CodeGen/AsmPrinter/
AsmPrinterHandler.h
DbgEntityHistoryCalculator.h
DebugHandlerBase.h
to include/llvm/CodeGen directory.
Such a change will enable Target to extend DebugHandlerBase
and emit Target specific debug info sections.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D55755
Sanjay Patel [Tue, 18 Dec 2018 22:51:06 +0000 (22:51 +0000)]
[InstCombine] add tests for extract of vector load; NFC
There's a mismatch internally about how we are handling these patterns.
We count loads as cheapToScalarize(), but then we don't actually
scalarize them, so that can leave extra instructions compared to where
we started when scalarizing other ops. If it's cheapToScalarize, then
we should be scalarizing.
Pete Cooper [Tue, 18 Dec 2018 22:31:34 +0000 (22:31 +0000)]
Add nonlazybind to objc_retain/objc_release when converting from intrinsics.
For performance reasons, clang set nonlazybind on these functions. Now that we
are using intrinsics instead of runtime calls, we should set this attribute when
creating the runtime functions.
Florian Hahn [Tue, 18 Dec 2018 22:25:11 +0000 (22:25 +0000)]
[LAA] Introduce enum for vectorization safety status (NFC).
This patch adds a VectorizationSafetyStatus enum, which will be extended
in a follow up patch to distinguish between 'safe with runtime checks'
and 'known unsafe' dependences.
Vitaly Buka [Tue, 18 Dec 2018 22:23:30 +0000 (22:23 +0000)]
[asan] Restore ODR-violation detection on vtables
Summary:
unnamed_addr is still useful for detecting of ODR violations on vtables
Still unnamed_addr with lld and --icf=safe or --icf=all can trigger false
reports which can be avoided with --icf=none or by using private aliases
with -fsanitize-address-use-odr-indicator
The first test claims to show that the vectorizer will
generate a vector load/loop, but then this file runs
other passes which might scalarize that op. I'm removing
instcombine from the RUN line here to break that dependency.
Also, I'm generating full checks to make it clear exactly
what the vectorizer has done.
Pete Cooper [Tue, 18 Dec 2018 22:20:03 +0000 (22:20 +0000)]
Rewrite objc intrinsics to runtime methods in PreISelIntrinsicLowering instead of SDAG.
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's. Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.
Kuba Mracek [Tue, 18 Dec 2018 21:20:17 +0000 (21:20 +0000)]
[asan] In llvm.asan.globals, allow entries to be non-GlobalVariable and skip over them
Looks like there are valid reasons why we need to allow bitcasts in llvm.asan.globals, see discussion at https://github.com/apple/swift-llvm/pull/133. Let's look through bitcasts when iterating over entries in the llvm.asan.globals list.
Alexandre Ganea [Tue, 18 Dec 2018 21:03:06 +0000 (21:03 +0000)]
Fix MSVC dependency issue between Clang-tablegen and LLVM-tablegen
Previously, when compiling Visual Studio targets, one could see random build errors. This was caused by tablegen projects using the same build folders.
This workaround simply chains tablegen projects.
Pete Cooper [Tue, 18 Dec 2018 20:32:49 +0000 (20:32 +0000)]
Change the objc ARC optimizer to use the new objc.* intrinsics
We're moving ARC optimisation and ARC emission in clang away from runtime methods
and towards intrinsics. This is the part which actually uses the intrinsics in the ARC
optimizer when both analyzing the existing calls and emitting new ones.
Nikita Popov [Tue, 18 Dec 2018 19:59:50 +0000 (19:59 +0000)]
[InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check
Checking whether a number has a certain number of trailing / leading
zeros means checking whether it is of the form XXXX1000 / 0001XXXX,
which can be done with an and+icmp.
Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next
step, this can be extended to non-equality predicates.
Farhana Aleen [Tue, 18 Dec 2018 19:58:39 +0000 (19:58 +0000)]
[AMDGPU] Removed the unnecessary operand size-check-assert from processBaseWithConstOffset().
Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and
AMDGPU::V_ADDC_U32_e64. Therefore, we don't any additional operand size-check-assert.
Sanjay Patel [Tue, 18 Dec 2018 19:07:38 +0000 (19:07 +0000)]
[InstCombine] refactor isCheapToScalarize(); NFC
As the FIXME indicates, this has the potential to go
overboard. So I'm not sure if it's even worth keeping
this vs. iteratively doing simple matches, but we might
as well clean it up.
Nikita Popov [Tue, 18 Dec 2018 18:28:22 +0000 (18:28 +0000)]
[X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS
Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic
ISD opcodes SADDSAT and SSUBSAT. This also improves scodegen for
@llvm.sadd.sat() and @llvm.ssub.sat() intrinsics.
Alexandre Ganea [Tue, 18 Dec 2018 18:17:00 +0000 (18:17 +0000)]
[CMake] Default options for faster executables on MSVC
- Disable incremental linking by default. /INCREMENTAL adds extra thunks in the EXE, which makes execution slower.
- Set /MT (static CRT lib) by default instead of CMake's default /MD (dll CRT lib). The previous default /MD makes all DLL functions to be thunked, thus making execution slower (memcmp, memset, etc.)
- Adds LLVM_ENABLE_INCREMENTAL_LINK which is set to OFF by default.
Michael Berg [Tue, 18 Dec 2018 17:54:52 +0000 (17:54 +0000)]
Add FMF management to common fp intrinsics in GlobalIsel
Summary: This the initial code change to facilitate managing FMF flags from Instructions to MI wrt Intrinsics in Global Isel. Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE.
Michael Kruse [Tue, 18 Dec 2018 17:46:09 +0000 (17:46 +0000)]
[LoopVectorize] Rename pass options. NFC.
Rename:
NoUnrolling to InterleaveOnlyWhenForced
and
AlwaysVectorize to !VectorizeOnlyWhenForced
Contrary to what the name 'AlwaysVectorize' suggests, it does not
unconditionally vectorize all loops, but applies a cost model to
determine whether vectorization is profitable to all loops. Hence,
passing false will disable the cost model, except when a loop is marked
with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested
by @hfinkel in D55716) better matches this behavior.
Similarly, 'NoUnrolling' disables the profitability cost model for
interleaving (a term to distinguish it from unrolling by the
LoopUnrollPass); rename it for consistency.
Michael Kruse [Tue, 18 Dec 2018 17:16:05 +0000 (17:16 +0000)]
[LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops.
When using clang with `-fno-unroll-loops` (implicitly added with `-O1`),
the LoopUnrollPass is not not added to the (legacy) pass pipeline. This
also means that it will not process any loop metadata such as
llvm.loop.unroll.enable (which is generated by #pragma unroll or
WarnMissedTransformationsPass emits a warning that a forced
transformation has not been applied (see
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html).
Such explicit transformations should take precedence over disabling
heuristics.
This patch unconditionally adds LoopUnrollPass to the optimizing
pipeline (that is, it is still not added with `-O0`), but passes a flag
indicating whether automatic unrolling is dis-/enabled. This is the same
approach as LoopVectorize uses.
The new pass manager's pipeline builder has no option to disable
unrolling, hence the problem does not apply.
Petar Avramovic [Tue, 18 Dec 2018 15:59:51 +0000 (15:59 +0000)]
[MIPS GlobalISel] Select G_SDIV, G_UDIV, G_SREM and G_UREM
Add support for s64 libcalls for G_SDIV, G_UDIV, G_SREM and G_UREM
and use integer type of correct size when creating arguments for
CLI.lowerCall.
Select G_SDIV, G_UDIV, G_SREM and G_UREM for types s8, s16, s32 and s64
on MIPS32.
Nico Weber [Tue, 18 Dec 2018 13:52:21 +0000 (13:52 +0000)]
[gn build] Add build files for llvm-ar, llvm-nm, llvm-objdump, llvm-readelf
Also add build files for deps DebugInfo/Symbolize, ToolDrivers/dll-tool.
Also add gn/build/libs/xar (needed by llvm-objdump).
Also delete an incorrect part of the symlink description in //BUILD.gn (it used
to be true before I made the symlink step write a stamp file; now it's no
longer true).
These are all binaries needed by check-lld that need symlinks.
Nikita Popov [Tue, 18 Dec 2018 13:23:03 +0000 (13:23 +0000)]
[X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS
Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes
UADDSAT and USUBSAT. As a side-effect, this also makes codegen for
the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable.
This only replaces use in the X86 backend, and does not move any of
the ADDUS/SUBUS X86 specific combines into generic codegen.
Peter Smith [Tue, 18 Dec 2018 12:40:19 +0000 (12:40 +0000)]
[docs] Improve HowToCrossCompilerBuiltinsOnArm
Some recent experience on llvm-dev pointed out some errors in the document:
- Assumption of ninja
- Use of --march rather than -march
- Problems with host include files when a multiarch setup was used
- Insufficient target information passed to assembler
- Instructions on using the cmake cache file BaremetalARM.cmake were
incomplete
There was also insufficient guidance on what to do when various stages
failed due to misconfiguration or missing steps.
Summary of changes:
- Fixed problems above
- Added a troubleshooting section with common errors.
- Cleared up one "at time of writing" that is no longer a problem.
George Rimar [Tue, 18 Dec 2018 12:15:01 +0000 (12:15 +0000)]
[llvm-dwarfdump] - Do not error out on R_X86_64_DTPOFF64/R_X86_64_DTPOFF32 relocations.
This is https://bugs.llvm.org/show_bug.cgi?id=39992,
If we have the following code (test.cpp):
thread_local int tdata = 24;
and build an .o file with debug information:
clang --target=x86_64-pc-linux -c bar.cpp -g
Then object produced may have R_X86_64_DTPOFF64/R_X86_64_DTPOFF32 relocations.
(clang emits R_X86_64_DTPOFF64 and gcc emits R_X86_64_DTPOFF32 for the code above for me)
Currently, llvm-dwarfdump fails to compute this TLS relocation when dumping
object and reports an
error:
failed to compute relocation: R_X86_64_DTPOFF64, Invalid data was encountered while parsing the file
This relocation represents the offset in the TLS block and resolved by the linker,
but this info is unavailable at the
point when the object file is dumped by this tool.
The patch adds the simple evaluation for such relocations to avoid emitting errors.
Resulting behavior seems to be equal to GNU dwarfdump.
Luke Cheeseman [Tue, 18 Dec 2018 10:37:42 +0000 (10:37 +0000)]
[AArch64] - Return address signing dwarf support
- Reapply changes intially introduced in r343089
- The archtecture info is no longer loaded whenever a DWARFContext is created
- The runtimes libraries (santiziers) make use of the dwarf context classes but
do not intialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
string printing later on
Tim Northover [Tue, 18 Dec 2018 09:29:39 +0000 (09:29 +0000)]
SROA: preserve alignment tags on loads and stores.
When splitting up an alloca's uses we were dropping any explicit
alignment tags, which means they default to the ABI-required default
alignment and this can cause miscompiles if the real value was smaller.
Also refactor the TBAA metadata into a parent class since it's shared by
both children anyway.
This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (its guaranteed to stay the same).
Kristof Beyls [Tue, 18 Dec 2018 08:50:02 +0000 (08:50 +0000)]
Introduce control flow speculation tracking pass for AArch64
The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating against SpectreV1-style vulnarabilities.
At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask of data loaded
in registers.
- using intrinsics to mask of data in registers as indicated by the
programmer (see https://lwn.net/Articles/759423/).
For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
relative abundance of registers, one register is reserved (X16) to be
the taint register. X16 is expected to not clash with other register
reservation mechanisms with very high probability because:
. The AArch64 ABI doesn't guarantee X16 to be retained across any call.
. The only way to request X16 to be used as a programmer is through
inline assembly. In the rare case a function explicitly demands to
use X16/W16, this pass falls back to hardening against speculation
by inserting a DSB SYS/ISB barrier pair which will prevent control
flow speculation.
- It is easy to insert mask operations at this late stage as we have
mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
and contains all-zeros when miss-speculation is detected. Therefore, when
masking, an AND instruction (which only changes the register to be masked,
no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional
select instruction (CSEL) to evaluate the flags that were also used to
make conditional branch direction decisions. Speculation of the CSEL
instruction can be limited with a CSDB instruction - so the combination of
CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
aren't speculated. When conditional branch direction gets miss-speculated,
the semantics of the inserted CSEL instruction is such that the taint
register will contain all zero bits.
One key requirement for this to work is that the conditional branch is
followed by an execution of the CSEL instruction, where the CSEL
instruction needs to use the same flags status as the conditional branch.
This means that the conditional branches must not be implemented as one
of the AArch64 conditional branches that do not use the flags as input
(CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
selectors to not produce these instructions when speculation hardening
is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
the taint register X16 to be encoded in the SP register as value 0.
Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
x86-slh-lfence option. This may be more useful for the intrinsics-based
approach than for the SLH approach to masking.
Note that this pass already inserts the full speculation barriers if the
function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
could be done for some indirect branches, such as switch jump tables.
Kewen Lin [Tue, 18 Dec 2018 08:11:32 +0000 (08:11 +0000)]
[PowerPC][NFC]Update vabsd cases with vselect test cases
Power9 VABSDU* instructions can be exploited for some special vselect sequences.
Check in the orignal test case here, later the exploitation patch will update this
and reviewers can check the differences easily.
Kewen Lin [Tue, 18 Dec 2018 03:16:43 +0000 (03:16 +0000)]
[PowerPC] Improve vec_abs on P9
Improve the current vec_abs support on P9, generate ISD::ABS node for vector types,
combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns,
do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity.
Reid Kleckner [Tue, 18 Dec 2018 01:14:05 +0000 (01:14 +0000)]
[codeview] Align symbol records to save 441MB during linking clang.pdb
In PDBs, symbol records must be aligned to four bytes. However, in the
object file, symbol records may not be aligned. MSVC does not pad out
symbol records to make sure they are aligned. That means the linker has
to do extra work to insert the padding. Currently, LLD calculates the
required space with alignment, and copies each record one at a time
while padding them out to the correct size. It has a fast path that
avoids this copy when the records are already aligned.
This change fixes a bug in that codepath so that the copy is actually
saved, and tweaks LLVM's symbol record emission to align symbol records.
Here's how things compare when doing a plain clang Release+PDB build:
- objs are 0.65% bigger (negligible)
- link is 3.3% faster (negligible)
- saves allocating 441MB
- new LLD high water mark is ~1.05GB
David Blaikie [Tue, 18 Dec 2018 01:06:09 +0000 (01:06 +0000)]
Recommit r348806: DebugInfo: Use symbol difference for CU length to simplify assembly reading/editing
Mucking about simplifying a test case ( https://reviews.llvm.org/D55261 ) I stumbled across something I've hit before - that LLVM's (GCC's does too, FWIW) assembly output includes a hardcode length for a DWARF unit in its header. Instead we could emit a label difference - making the assembly easier to read/edit (though potentially at a slight (I haven't tried to observe it) performance cost of delaying/sinking the length computation into the MC layer).
Fix: Predicated all the changes (including creating the labels, even if they aren't used/needed) behind the NVPTX useSectionsAsReferences, avoiding emitting labels in NVPTX where ptxas can't parse them.
Joel E. Denny [Tue, 18 Dec 2018 00:03:36 +0000 (00:03 +0000)]
[FileCheck] Annotate input dump (7/7)
This patch implements annotations for diagnostics reporting CHECK-NOT
failed matches. These diagnostics are enabled by -vv. As for
diagnostics reporting failed matches for other directives, these
annotations mark the search ranges using `X~~`. The difference here
is that failed matches for CHECK-NOT are successes not errors, so they
are green not red when colors are enabled.
For example:
```
$ FileCheck -dump-input=help
The following description was requested by -dump-input=help to
explain the input annotations printed by -dump-input=always and
-dump-input=fail:
- L: labels line number L of the input file
- T:L labels the only match result for a pattern of type T from line L of
the check file
- T:L'N labels the Nth match result for a pattern of type T from line L of
the check file
- ^~~ marks good match (reported if -v)
- !~~ marks bad match, such as:
- CHECK-NEXT on same line as previous match (error)
- CHECK-NOT found (error)
- CHECK-DAG overlapping match (discarded, reported if -vv)
- X~~ marks search range when no match is found, such as:
- CHECK-NEXT not found (error)
- CHECK-NOT not found (success, reported if -vv)
- CHECK-DAG not found after discarded matches (error)
- ? marks fuzzy match when no match is found
- colors success, error, fuzzy match, discarded match, unmatched input
If you are not seeing color above or in input dumps, try: -color