Reapply: [mips] Handle MipsMCExpr sub-expression for the MEK_DTPREL tag
This reapplies commit r351987 with a failed test fix. Now the test
accepts both the DW_OP_GNU_push_tls_address and the DW_OP_form_tls_address
opcodes.
Original commit message:
```
This is a fix for a regression introduced by the rL348194 commit. In
that change a new type (MEK_DTPREL) of MipsMCExpr expression was added,
but in some places in the code this type of expression was considered
unexpected.
This change fixes the bug. The MEK_DTPREL type of expression is used only
for marking a TLS DIEExpr and contains a regular sub-expression. Where we
need to handle the expression, we retrieve the sub-expression and
handle it in the common way.
```
------------------------------------------------------------------------
[mips] Emit .reloc R_{MICRO}MIPS_JALR along with j(al)r(c) $25
The callee address is added as an optional operand (MCSymbol) in
AdjustInstrPostInstrSelection() and then used by the asm printer to insert:
'.reloc tmplabel, R_MIPS_JALR, symbol
tmplabel:'.
Controlled with '-mips-jalr-reloc', default is true.
Hans Wennborg [Thu, 24 Jan 2019 22:07:01 +0000 (22:07 +0000)]
Merging r351930 and r351932:
------------------------------------------------------------------------
r351930 | kbeyls | 2019-01-23 09:18:39 +0100 (Wed, 23 Jan 2019) | 30 lines
[SLH] AArch64: correctly pick temporary register to mask SP
As part of speculation hardening, the stack pointer gets masked with the
taint register (X16) before a function call or before a function return.
Since there are no instructions that can directly mask writing to the
stack pointer, the stack pointer must first be transferred to another
register, where it can be masked, before that value is transferred back
to the stack pointer.
Before, that temporary register was always picked to be x17, since the
ABI allows clobbering x17 on any function call, resulting in the
following instruction pattern being inserted before function calls and
returns/tail calls:
mov x17, sp
and x17, x17, x16
mov sp, x17
However, x17 can be live in those locations, for example when the call
is an indirect call, using x17 as the target address (blr x17).
To fix this, this patch looks for an available register just before the
call or terminator instruction and uses that.
In the rare case when no register turns out to be available (this
situation is only encountered twice across the whole test-suite), just
insert a full speculation barrier at the start of the basic block where
this occurs.
[AArch64] add more tests for buildvec to shuffle transform; NFC
These are copied from the sibling x86 file. I'm not sure which
of the current outputs (if any) is considered optimal, but
someone more familiar with AArch64 may want to take a look.
[DAGCombiner] fix crash when converting build vector to shuffle
The regression test is reduced from the example shown in D56281.
This does raise a question as noted in the test file: do we want
to handle this pattern? I don't have a motivating example for
that on x86 yet, but it seems like we could have that pattern
there too, so we could avoid the back-and-forth using a shuffle.
Assertion in isAllocaPromotable due to extra bitcast goes into lifetime marker
For the given test, SROA detects a possible replacement and creates a correct alloca. After that, SROA adds lifetime markers for this new alloca. The function getNewAllocaSlicePtr tries to deduce the pointer type based on the original alloca, which is split, to use it later in the lifetime intrinsic.
For the test we ended up with code like this (rA is the initial alloca [10 x float], which is split, and rA.sroa.0.0 is a new split allocation):
```
%rA.sroa.0.0.rA.sroa_cast = bitcast i32* %rA.sroa.0 to [10 x float]* <----- this one causing the assertion and is an extra bitcast
%5 = bitcast [10 x float]* %rA.sroa.0.0.rA.sroa_cast to i8*
call void @llvm.lifetime.start.p0i8(i64 4, i8* %5)
```
The isAllocaPromotable code assumes that a user of an alloca may feed a lifetime marker through a bitcast, but it must be the only bitcast, to the i8* type. In the test it is not an i8* type, so we return false and hit the assertion.
As we are creating a pointer which will be used in lifetime markers only, the proposed fix is to create the bitcast to i8* immediately, to avoid creating an extra bitcast.
The test is greatly simplified to just reproduce the assertion.
Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>
Summary:
InstCombine's sinking algorithm only thinks about memory. It doesn't
think about non-memory constraints like stack object lifetime. It can
sink dynamic allocas across a stacksave call, which may be used with
stackrestore, which can incorrectly reduce the lifetime of the dynamic
alloca.
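For illustration, a minimal IR sketch of the hazard (the @use_and_return helper and the exact shape are hypothetical, not taken from the patch):
```
declare i8* @llvm.stacksave()
declare void @llvm.stackrestore(i8*)
declare i32* @use_and_return(i32*)  ; opaque helper, assumed for illustration

define void @f(i32 %n) {
entry:
  ; Dynamic alloca performed *before* the stacksave, so it must survive
  ; the matching stackrestore.
  %buf = alloca i32, i32 %n
  %sp = call i8* @llvm.stacksave()
  br label %body
body:
  ; Sole direct use of %buf: sinking would move the alloca down here,
  ; past the stacksave, so the stackrestore below would deallocate it.
  %p = call i32* @use_and_return(i32* %buf)
  call void @llvm.stackrestore(i8* %sp)
  store i32 0, i32* %p  ; use-after-free once the alloca has been sunk
  ret void
}
```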
[ARM64][Windows] Share unwind codes between epilogues
There are cases where we have multiple epilogues that have the exact same unwind
code sequence. In that case, the epilogues can share the same unwind codes in
the .xdata section. This should get us past the assert "SEH unwind data
splitting not yet implemented" in many cases.
We still need to add support for generating multiple .pdata/.xdata sections for
those functions that need to be split into fragments.
[COFF, ARM64] Implement support for SEH extensions __try/__except/__finally
Summary:
This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler.
We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape.
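As a rough sketch of how the two intrinsics communicate an escaped alloca (function names hypothetical; the SEH-specific frame-pointer plumbing is elided):
```
declare void @llvm.localescape(...)
declare i8* @llvm.localrecover(i8*, i8*, i32)

define void @try_body() {
entry:
  %x = alloca i32
  ; Publish %x as escaped local #0 so the handler can find it.
  call void (...) @llvm.localescape(i32* %x)
  store i32 42, i32* %x
  ret void
}

; The handler receives the parent's frame pointer and recovers slot #0.
define void @handler(i8* %parent_fp) {
  %raw = call i8* @llvm.localrecover(i8* bitcast (void ()* @try_body to i8*),
                                     i8* %parent_fp, i32 0)
  %x = bitcast i8* %raw to i32*
  %v = load i32, i32* %x
  ret void
}
```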
[X86] Add X86ISD::VSHLV and X86ISD::VSRLV nodes for psllv and psrlv
Previously we used ISD::SHL and ISD::SRL to represent these in SelectionDAG. ISD::SHL/SRL interpret an out-of-range shift amount as undefined behavior and will constant fold to undef, while the intrinsics are defined to return 0 for out-of-range shift amounts. A previous patch added a special node for VPSRAV to produce all sign bits.
This was previously believed safe because undefs frequently get turned into 0, either from the constant pool or from a desire to avoid a false register dependency. But undef is treated specially in some optimizations. For example, it's ignored in the detection of vector splats. So if the ISD::SHL/SRL can be constant folded and all of the elements with in-bounds shift amounts are the same, we might fold it to a single-element broadcast from the constant pool. This would not put 0s in the elements with out-of-bounds shift amounts.
We do have an existing InstCombine optimization to use shl/lshr when the shift amounts are all constant and in bounds. That should prevent some loss of constant folding from this change.
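For illustration, a sketch of the semantic gap using the avx2 variant of the intrinsic (the same reasoning applies to the avx512 forms):
```
declare <4 x i32> @llvm.x86.avx2.psllv.d(<4 x i32>, <4 x i32>)

define <4 x i32> @demo(<4 x i32> %x) {
  ; Intrinsic semantics: lanes with shift amounts >= 32 produce 0.
  %r = call <4 x i32> @llvm.x86.avx2.psllv.d(
           <4 x i32> %x, <4 x i32> <i32 1, i32 31, i32 32, i32 40>)
  ret <4 x i32> %r
  ; Modeled as 'shl <4 x i32> %x, <i32 1, i32 31, i32 32, i32 40>',
  ; the last two lanes would be undef instead of 0 and could be
  ; constant folded away.
}
```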
[SLP] Fix PR40310: The reduction nodes should stay scalar.
Summary:
Sometimes the SLP vectorizer tries to vectorize the horizontal reduction
nodes during regular vectorization. This may happen inside loops,
when there are some vectorizable PHIs. The patch fixes this by checking
whether the node is a reduction node; such a node must not be vectorized,
it must be gathered.
Pavel Labath [Wed, 16 Jan 2019 09:55:32 +0000 (09:55 +0000)]
[Support] Remove error return value from one overload of fs::make_absolute
Summary:
The version of make_absolute which accepted a specific directory to use
as the "base" for the computation could never fail, even though it
returned a std::error_code. The reason for that seems to be historical
-- the CWD flavour (which can fail due to failure to retrieve CWD) was
there first, and the new version was implemented by extending that.
This removes the error return value from the non-CWD overload and
reimplements the CWD version on top of that. This enables us to remove
some dead code where people were pessimistically trying to handle the
errors returned from this function.
Philip Pfaffe [Wed, 16 Jan 2019 09:28:01 +0000 (09:28 +0000)]
[NewPM][TSan] Reiterate the TSan port
Summary:
Second iteration of D56433, which got reverted in rL350719. The problem
in the previous version was that we dropped the thunk calling the tsan init
function. The new version keeps the thunk, which should appease dyld, but is
not actually OK with respect to the current semantics of function passes.
Hence, add a helper to insert the functions only the first time. The helper
allows hooking into the insertion to be able to append them to the
global ctors list.
Sam Parker [Wed, 16 Jan 2019 08:40:12 +0000 (08:40 +0000)]
[DAGCombine] Fix ReduceLoadWidth for shifted offsets
ReduceLoadWidth can trigger when a shifted mask is used, and this
requires that the function return a shl node to correct for the
offset. However, the way that this was implemented meant that the
returned result could be an existing node, which would be incorrect.
This fixes the method of inserting the new node and replacing uses.
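The shape of the pattern, written as IR for illustration (the combine itself runs on the SelectionDAG; little-endian layout assumed):
```
; A load masked with a *shifted* mask: only bits 16..31 are demanded.
define i32 @masked(i32* %p) {
  %x = load i32, i32* %p
  %y = and i32 %x, 4294901760  ; 0xFFFF0000
  ret i32 %y
}
; ReduceLoadWidth narrows this to an i16 zextload from %p+2 followed by
; 'shl ..., 16' to restore the offset; the fix ensures that shl is a
; freshly inserted node rather than a reused existing one.
```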
Tom Stellard [Wed, 16 Jan 2019 05:15:31 +0000 (05:15 +0000)]
Only promote args when function attributes are compatible
Summary:
Check to make sure that the caller and the callee have compatible
function arguments before promoting arguments. This uses the same
TargetTransformInfo queries that are used to determine if attributes
are compatible for inlining.
The goal here is to avoid breaking ABI when a called function's ABI
depends on a target feature that is not enabled in the caller.
This is a very conservative fix for PR37358. Ideally we would have a more
sophisticated check for ABI compatibility rather than checking if the
attributes are compatible for inlining.
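A sketch of the kind of mismatch this guards against (feature strings and types chosen purely for illustration):
```
; @callee may use YMM registers ("+avx"); @caller may not. Promoting %p
; to a by-value <8 x float> argument would change how the value is
; passed at the call site and break the ABI between the two functions.
define internal float @callee(<8 x float>* %p) "target-features"="+avx" {
  %v = load <8 x float>, <8 x float>* %p
  %e = extractelement <8 x float> %v, i32 0
  ret float %e
}

define float @caller(<8 x float>* %p) {
  %r = call float @callee(<8 x float>* %p)
  ret float %r
}
```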
Serguei Katkov [Wed, 16 Jan 2019 04:36:26 +0000 (04:36 +0000)]
[InstCombine] Avoid introduction of unaligned mem access
InstCombine is able to transform a mem transfer intrinsic into a lone store or a store/load pair.
This might result in the generation of an unaligned atomic load/store, which the backend
will later transform into a libcall. That is not an evident gain, so it is better to keep the intrinsic as-is
and handle it in the backend.
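For illustration, assuming the element-wise unordered-atomic memcpy is the mem transfer intrinsic in question, a sketch of the problematic fold:
```
declare void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(i8*, i8*, i32, i32)

define void @copy4(i8* align 1 %dst, i8* align 1 %src) {
  ; Four 1-byte atomic elements. Folding this into a single atomic i32
  ; load/store pair would create an unaligned atomic access, which the
  ; backend can only lower to a libcall.
  call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i32(
      i8* align 1 %dst, i8* align 1 %src, i32 4, i32 1)
  ret void
}
```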
[GISel]: Add support for CSEing continuously during GISel passes.
https://reviews.llvm.org/D52803
This patch adds support to continuously CSE instructions during
each of the GISel passes. It consists of a GISelCSEInfo analysis pass
that can be used by the CSEMIRBuilder.
[EH] Rename llvm.x86.seh.recoverfp intrinsic to llvm.eh.recoverfp
Summary:
Make recoverfp intrinsic target-independent so that it can be implemented for AArch64, etc.
Refer to D53541 for context. Clang counterpart: D56748.
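A sketch of the renamed intrinsic as a filter function might use it (the filter shape is illustrative):
```
declare i8* @llvm.eh.recoverfp(i8*, i8*)
declare void @parent()

; A filter receives the establisher frame pointer and uses it to recover
; the parent function's frame pointer.
define i8* @filter(i8* %exception_info, i8* %frame_pointer) {
  %parent_fp = call i8* @llvm.eh.recoverfp(i8* bitcast (void ()* @parent to i8*),
                                           i8* %frame_pointer)
  ret i8* %parent_fp
}
```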
The Android sanitizer tests are currently some of the most difficult
to run correctly, requiring at least 3 build directories which have
to be configured in just the right way and built in the correct order
(see e.g. [1] and the functions that it calls).
This patch adds a check-hwasan target which greatly simplifies running
the hwasan tests for gn users, taking advantage of its support for
multiple toolchains. With this the tests can be run simply by setting
an NDK path and running "ninja check-hwasan" with a compatible Android
device connected. The Linux/x86_64 and Android/aarch64 targets are
tested in parallel.
Craig Topper [Tue, 15 Jan 2019 23:36:25 +0000 (23:36 +0000)]
[X86] Add avx512 scatter intrinsics that use a vXi1 mask instead of a scalar integer.
We're trying to have the vXi1 types in IR as much as possible. This prevents the need for bitcasts when the producer of the mask was already a vXi1 value like an icmp. The bitcasts can be subject to code motion and interfere with basic-block-at-a-time isel in bad ways.
Changpeng Fang [Tue, 15 Jan 2019 23:12:36 +0000 (23:12 +0000)]
AMDGPU: Raise the priority of MAD24 in instruction selection.
Summary:
We have seen performance regressions when v_add3 is generated. The major reason is that the v_mad pattern
is broken when v_add3 is generated. We also see increased register pressure. Since we cannot properly
estimate register pressure during instruction selection, we give mad a higher priority instead.
In this work, we raise the priority of mad24 in selection and resolve the performance regression.
When generating a reproducer in LLDB we build up the mapping but don't
immediately copy over the files on the file system.
Rather than keeping a separate data structure with real and virtual
paths, we might as well reuse the entries already stored in the
YAMLVFSWriter to lazily copy over the files when needed.
[VFS] Move RedirectingFileSystem interface into header (NFC)
This moves the RedirectingFileSystem into the header so it can be
extended. This is needed because in LLDB we need a way to obtain the external
path to deal with FILE* and file descriptor APIs.
Discussion on the mailing list:
http://lists.llvm.org/pipermail/llvm-dev/2018-November/127755.html
The path to the resource directory will end up being used in several
more places once the support for running check-hwasan lands. This
moves the definition to a central location so that it can be used
from those places.
Roman Lebedev [Tue, 15 Jan 2019 21:31:18 +0000 (21:31 +0000)]
X86DAGToDAGISel::matchBitExtract() with truncation (PR36419)
Summary:
Previously, in D54095, I added support for extraction of `lshr` from `X` if we are to produce `BEXTR`.
That was good, but the fix was partial; there was still [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]].
That pattern can also appear, roughly, when you have a large (64-bit) storage and then consume bits from it.
It is not unexpected that you would then be doing further computations in 32-bit width.
And then the current code breaks, as the tests show.
The basic idea/pattern here is the following:
1. We have `i64` input
2. We perform `i64` right-shift on it.
3. We `trunc`ate that shifted value
4. We do all further work (masking) in `i32`
Since we see `trunc`ation and not `lshr`, we give up, and stop trying to extract that right-shift.
BUT. The mask is `i32`, therefore we can extend both of the operands of the masking (`and`) to `i64`
and truncate the result after masking: https://rise4fun.com/Alive/K4B
```
Name: @bextr64_32_b1 -> @bextr64_32_b0
%shiftedval = lshr i64 %val, %numskipbits
%truncshiftedval = trunc i64 %shiftedval to i32
%widenumlowbits1 = zext i8 %numlowbits to i32
%notmask1 = shl nsw i32 -1, %widenumlowbits1
%mask1 = xor i32 %notmask1, -1
%res = and i32 %truncshiftedval, %mask1
=>
%shiftedval = lshr i64 %val, %numskipbits
%widenumlowbits = zext i8 %numlowbits to i64
%notmask = shl nsw i64 -1, %widenumlowbits
%mask = xor i64 %notmask, -1
%wideres = and i64 %shiftedval, %mask
%res = trunc i64 %wideres to i32
```
Thus, we are again able to extract that `lshr` into `BEXTR`'s control.
David Callahan [Tue, 15 Jan 2019 21:26:51 +0000 (21:26 +0000)]
treat invoke like call
Summary:
InvokeInst should be treated like CallInst and
assigned a separate discriminator. This is particularly
important when an Invoke is converted to a Call
during compilation and so can invalidate sample profile
data collected with different link time optimizations.
Michael Trent [Tue, 15 Jan 2019 20:41:30 +0000 (20:41 +0000)]
llvm-objdump -m -D should disassemble all text segments
Summary:
When running llvm-objdump with the -macho option objdump will by default
disassemble only the __TEXT,__text section (or __TEXT_EXEC,__text when
disassembling MH_KEXT_BUNDLE files). The -disassemble-all option is
treated no differently than -disassemble.
This change updates llvm-objdump's MachO parsing code to disassemble all
__text sections found in a file when -disassemble-all is specified. This
is useful for disassembling files with more than one __text section, or
when disassembling files whose __text section is not present in __TEXT.
I added a lit test case that verifies "llvm-objdump -m -d" and
"llvm-objdump -m -D" produce the expected results on a reference binary.
I also updated the CommandGuide documentation for llvm-objdump.rst and
verified it renders correctly as man and html.
Craig Topper [Tue, 15 Jan 2019 20:12:33 +0000 (20:12 +0000)]
[X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar
In keeping with our general direction of having the vXi1 type present in IR, this patch converts the mask argument for avx512 gather to vXi1. This can avoid k-register to GPR to k-register transitions late in codegen.
I left the existing intrinsics behind because they have many out of tree users such as ISPC. They generate their own code and don't go through the autoupgrade path which only works for bitcode and ll parsing. Ideally we will get them to migrate to target independent intrinsics, but it might be easier for them to migrate to these new intrinsics.
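For illustration, the friction with the legacy scalar-mask form: the mask is naturally produced as <16 x i1> and has to be bitcast to i16 for the old intrinsic (the new variants take the <16 x i1> operand directly; their exact names are not reproduced here):
```
declare <16 x float> @llvm.x86.avx512.gather.dps.512(<16 x float>, i8*, <16 x i32>, i16, i32)

define <16 x float> @demo(<16 x float> %src, i8* %base, <16 x i32> %idx,
                          <16 x i32> %a, <16 x i32> %b) {
  %m = icmp eq <16 x i32> %a, %b     ; mask produced as <16 x i1>
  %mi = bitcast <16 x i1> %m to i16  ; k-register -> GPR round trip
  %r = call <16 x float> @llvm.x86.avx512.gather.dps.512(
           <16 x float> %src, i8* %base, <16 x i32> %idx, i16 %mi, i32 4)
  ret <16 x float> %r
}
```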
Craig Topper [Tue, 15 Jan 2019 19:59:19 +0000 (19:59 +0000)]
[Nios2] Remove Nios2 backend
As mentioned in http://lists.llvm.org/pipermail/llvm-dev/2019-January/129121.html, this backend is incomplete and has not been maintained in several months.
Yury Delendik [Tue, 15 Jan 2019 18:14:12 +0000 (18:14 +0000)]
[WebAssembly] Fix updating/moving DBG_VALUEs in RegStackify
Summary:
As described in PR40209, there can be issues in DBG_VALUE handling when multiple defs are present in a BB. This patch
adds logic to detect the DBG_VALUEs related to a def, and localizes register updates and movement to the DBG_VALUEs found.
Nirav Dave [Tue, 15 Jan 2019 17:09:14 +0000 (17:09 +0000)]
[X86] Fix register class for assembly constraints to ST(7). NFCI.
Modify getRegForInlineAsmConstraint to return a special singleton
register class when a constraint references ST(7), rather than RFP80, of which
ST(7) is not a member.
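For reference, the kind of inline asm that exercises this path (a sketch; the exact constraint spelling used in the affected tests is assumed):
```
define void @f(x86_fp80 %x) {
  ; An input pinned to st(7): resolving this constraint needs a singleton
  ; register class, because ST(7) is not a member of RFP80.
  call void asm sideeffect "# uses $0", "{st(7)}"(x86_fp80 %x)
  ret void
}
```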
Jordan Rupprecht [Tue, 15 Jan 2019 17:04:40 +0000 (17:04 +0000)]
[llvm-readelf] Allow single-letter flags to be merged.
Summary:
This patch adds support for merged arguments (e.g. -SW == -S -W) for llvm-readelf.
No changes are intended for llvm-readobj. There are a few short flags (-sd, -sr, -st, -dt) that would conflict with grouped single letter flags, and having only some grouped flags might be confusing. So, allow merged flags for readelf compatibility, but force separate args for llvm-readobj. From what I can tell, these two-letter flags are only used with llvm-readobj, not llvm-readelf.
Jordan Rupprecht [Tue, 15 Jan 2019 16:57:23 +0000 (16:57 +0000)]
[llvm-objcopy] Use SHT_NOTE for added note sections.
Summary:
Fix llvm-objcopy to add .note sections as SHT_NOTEs. GNU objcopy overrides section flags for special sections. For `.note` sections (with the exception of `.note.GNU-stack`), SHT_NOTE is used.
Many other sections are special cased by libbfd, but `.note` is the only special section I can seem to find being used with objcopy --add-section.
See `.note` in context of the full list of special sections here: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/elf.c;h=eb3e1828e9c651678b95a1dcbc3b124783d1d2be;hb=HEAD#l2675
Sanjay Patel [Tue, 15 Jan 2019 16:11:05 +0000 (16:11 +0000)]
[DAGCombiner] reduce buildvec of zexted extracted element to shuffle
The motivating case for this is shown in the first regression test. We are
transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'.
That's a special-case for a more general pattern as shown here. In all tests,
we're avoiding the vector-scalar-vector moves in favor of vector ops.
We aren't producing optimal shuffle code in some cases though, so the patch is
limited to reduce regressions.
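The shape of the motivating pattern (a sketch; on x86 the equivalent should become a single vpmovzxdq instead of moves through scalar registers):
```
define <2 x i64> @zext_extract_buildvec(<4 x i32> %v) {
  %e0 = extractelement <4 x i32> %v, i32 0
  %e1 = extractelement <4 x i32> %v, i32 1
  %z0 = zext i32 %e0 to i64
  %z1 = zext i32 %e1 to i64
  %b0 = insertelement <2 x i64> undef, i64 %z0, i32 0
  %b1 = insertelement <2 x i64> %b0, i64 %z1, i32 1
  ret <2 x i64> %b1
}
```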
Zaara Syeda [Tue, 15 Jan 2019 15:08:01 +0000 (15:08 +0000)]
[SimpleLoopUnswitch] Increment stats counter for unswitching switch instruction
Increment the statistics counter NumSwitches in unswitchNontrivialInvariants() for
unswitching a non-trivial switch instruction. This fixes a bug where
NumBranches was incremented even for the case of a switch instruction.
There is no functional change in this patch.
Roman Lebedev [Tue, 15 Jan 2019 09:44:13 +0000 (09:44 +0000)]
[llvm][IRBuilder] Introspection for CreateAlignmentAssumption*() functions
Summary:
Clang calls these functions to produce IR for assume-aligned attributes.
I would like to teach UBSAN to verify these assumptions.
For that, I need to access the final pointer on which the check is performed,
and the actual `icmp` that does the check.
The alternative to this would be to fully re-implement this in clang.
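The emitted pattern looks roughly like this (32-byte alignment chosen as an arbitrary example); %maskcond is the `icmp` and %p the final pointer mentioned above:
```
declare void @llvm.assume(i1)

define i8* @assume_aligned_32(i8* %p) {
  %ptrint = ptrtoint i8* %p to i64
  %maskedptr = and i64 %ptrint, 31
  %maskcond = icmp eq i64 %maskedptr, 0
  call void @llvm.assume(i1 %maskcond)
  ret i8* %p
}
```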
This is a second commit, the original one was r351104,
which was mass-reverted in r351159 because 2 compiler-rt tests were failing.
George Rimar [Tue, 15 Jan 2019 09:19:18 +0000 (09:19 +0000)]
[llvm-objdump] - Cleanup the code. NFCI.
This is a cosmetic cleanup for the llvm-objdump code.
This patch:
* Renames things to match the official LLVM code style (lower case -> upper case).
* Removes a few obviously excessive variables.
* Moves a few lines closer to their place of use, reorders the code a bit to simplify it,
to avoid excessive returns and to avoid using `else` after returns.
I focused only on the llvm-objdump.h/llvm-objdump.cpp files. The few changes in the
MachODump.cpp and COFFDump.cpp files are a result of the llvm-objdump.h modification.
gn build: Rename llvm_host_triple to llvm_current_triple and have it use current_{cpu,os}.
This makes e.g. ToolChain::isCrossCompiling() in
clang/lib/Driver/ToolChain.cpp return the correct result
if the compiler was cross-compiled. This change also affects
llvm_default_target_triple, so cross-compiled compilers default to
targeting the cross-compilation target, which makes more sense than
the host that the compiler was compiled on.
This change will also be necessary in order for the correct triples
to appear in generated lit files for non-native targets.
gn build: Make a couple of improvements to the unix toolchain.
Add an asm tool (will be required for building sanitizer_common on
x64) and set a soname for DSOs so that anything that links against
them gets the correct DT_NEEDED.
Dan Gohman [Tue, 15 Jan 2019 06:58:13 +0000 (06:58 +0000)]
[WebAssembly] Support multilibs for wasm32 and add a wasm OS that uses it
This adds support for multilib paths for wasm32 targets, following
[Debian's Multiarch conventions], and also adds an experimental OS name in
order to test it.
Craig Topper [Tue, 15 Jan 2019 06:39:49 +0000 (06:39 +0000)]
[X86] Switch the triple on avx2-intrinsics-x86.ll to be -unknown-unknown instead of darwin so the constant pool entries will be filtered better by the script.
Darwin uses LCPI instead of .LCPI so the filter doesn't work.
This is silly, but it will help reduce some future test diffs.
Thomas Lively [Tue, 15 Jan 2019 02:16:03 +0000 (02:16 +0000)]
[WebAssembly] Expand SIMD shifts while V8's implementation disagrees
Summary:
V8 currently implements SIMD shifts as taking an immediate operand,
which disagrees with the spec proposal and the toolchain
implementation. As a stopgap measure to get things working, unroll all
vector shifts. Since this is a temporary measure, there are no tests.
Marek Olsak [Tue, 15 Jan 2019 02:13:18 +0000 (02:13 +0000)]
AMDGPU: Add a fast path for icmp.i1(src, false, NE)
Summary:
This allows moving the condition from the intrinsic to the standard ICmp
opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic
is an identity for retrieving the SGPR mask.
We can also get the mask from `and i1`, `or i1`, and `xor i1`.
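A sketch of the identity being exploited (the i1 overload's signature is assumed here for the wave64 case; predicate 33 is `icmp ne`):
```
declare i64 @llvm.amdgcn.icmp.i1(i1, i1, i32)

define i64 @lane_mask(i32 %x) {
  %c = icmp sgt i32 %x, 0
  ; icmp.i1(%c, false, ne) is an identity on %c, materializing its lane
  ; mask as a 64-bit SGPR value.
  %mask = call i64 @llvm.amdgcn.icmp.i1(i1 %c, i1 false, i32 33)
  ret i64 %mask
}
```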
Reid Kleckner [Tue, 15 Jan 2019 01:24:18 +0000 (01:24 +0000)]
[X86] Avoid clobbering ESP/RSP in the epilogue.
Summary:
In r345197 ESP and RSP were added to GR32_TC/GR64_TC, allowing them to
be used for tail calls, but this also caused `findDeadCallerSavedReg` to
think they were acceptable targets for clobbering. Filter them out.