Craig Topper [Fri, 8 Feb 2019 20:48:56 +0000 (20:48 +0000)]
Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Vedant Kumar [Fri, 8 Feb 2019 20:48:04 +0000 (20:48 +0000)]
[CodeExtractor] Restore outputs after creating exit stubs
When CodeExtractor saves the result of InvokeInst at the first insertion
point of the 'normal destination' basic block, this block can be omitted
in the outlined region, so store is placed outside of the function. The
suggested solution is to process saving outputs after creating exit
stubs for new function, and stores will be placed in that blocks before
return in this case.
Matt Arsenault [Fri, 8 Feb 2019 19:59:32 +0000 (19:59 +0000)]
AMDGPU: Eliminate GPU specific SubtargetFeatures
Inline compatability is determined from the individual feature
bits. These are just sets of the separate features, but will always be
treated as incompatible unless they are specifically ignored.
Defining the ISA version number here in tablegen would be nice, but it
turns out this wasn't actually used.
[DAGCombine] Optimize pow(X, 0.75) to sqrt(X) * sqrt(sqrt(X))
The sqrt case is faster and we already do this for the case where
the exponent is 0.25. This adds the 0.75 case which is also not
sensitive to signed zeros.
Summary:
The motivating use case is eliminating duplicate profile data registered
for the same inline function in two object files. Before this change,
users would observe multiple symbol definition errors with VC link, but
links with LLD would succeed.
Users (Mozilla) have reported that PGO works well with clang-cl and LLD,
but when using LLD without this static registration, we would get into a
"relocation against a discarded section" situation. I'm not sure what
happens in that situation, but I suspect that duplicate, unused profile
information was retained. If so, this change will reduce the size of
such binaries with LLD.
Now, Windows uses static registration and is in line with all the other
platforms.
Simon Pilgrim [Fri, 8 Feb 2019 18:57:38 +0000 (18:57 +0000)]
[TargetLowering] Use ISD::FSHR in expandFixedPointMul
Replace OR(SHL,SRL) pattern with ISD::FSHR (legalization expands this later if necessary) - this helps with the scale == 0 'undefined' drop-through case that was discussed on D55720.
Teresa Johnson [Fri, 8 Feb 2019 17:08:27 +0000 (17:08 +0000)]
ArgumentPromotion should copy all metadata to new Function
Summary:
ArgumentPromotion had code to specifically move the dbg metadata over to
the new function, but other metadata such as the function_entry_count
!prof metadata was not. Replace code that moved dbg metadata with a call
to copyMetadata. The old metadata is automatically removed when the old
Function is removed.
Craig Topper [Fri, 8 Feb 2019 17:07:54 +0000 (17:07 +0000)]
[X86] Remove isReMaterializable from X87 floating point constant loads and constant pool loads.
Summary: These instructions update FPSW so they aren't generically safe to rematerialize into any location if FPSW is live for a comparison result. They also use FPCW for exception masking control. Though the only exception they can generate is stack overflow and we manage the stack ourselves so that's not really going to occur.
Carl Ritson [Fri, 8 Feb 2019 15:41:11 +0000 (15:41 +0000)]
[AMDGPU] Fix CS scratch setup on pre-GCN3 ASICs
Summary:
Prior to GCN3 s_load_dword offsets are in dwords rather than bytes.
Thus the scratch buffer descriptor offset must be adjusted for pre-GCN3 ASICs.
Petar Avramovic [Fri, 8 Feb 2019 14:27:23 +0000 (14:27 +0000)]
[MIPS GlobalISel] Select any extending load and truncating store
Make behavior of G_LOAD in widenScalar same as for G_ZEXTLOAD and
G_SEXTLOAD. That is perform widenScalarDst to size given by the target
and avoid additional checks in common code. Targets can reorder or add
additional rules in LegalizeRuleSet for the opcode to achieve desired
behavior.
Select extending load that does not have specified type of extension
into zero extending load.
Select truncating store that stores number of bytes indicated by size
in MachineMemoperand.
dpp move with uses and old reg initializer should be in the same BB.
bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Otherwise the old register value is checked for identity.
Added add, subrev, and, or instructions to the old folding function.
Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.
[DWARF] LLVM ERROR: Broken function found, while removing Debug Intrinsics.
Check that when SimplifyCFG is flattening a 'br', all their debug intrinsic instructions are removed, including any dbg.label referencing a label associated with the basic blocks being removed.
Hans Wennborg [Fri, 8 Feb 2019 10:16:45 +0000 (10:16 +0000)]
Revert r353424 "[llvm-ar][libObject] Fix relative paths when nesting thin archives."
This broke the Chromium build on Windows, see https://crbug.com/930058
> Summary:
> When adding one thin archive to another, we currently chop off the relative path to the flattened members. For instance, when adding `foo/child.a` (which contains `x.txt`) to `parent.a`, whe
> lattening it we should add it as `foo/x.txt` (which exists) instead of `x.txt` (which does not exist).
>
> As a note, this also undoes the `IsNew` parameter of handling relative paths in r288280. The unit test there still passes.
>
> This was reported as part of testing the kernel build with llvm-ar: https://patchwork.kernel.org/patch/10767545/ (see the second point).
>
> Reviewers: mstorsjo, pcc, ruiu, davide, david2050
>
> Subscribers: hiraditya, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D57842
Max Kazantsev [Fri, 8 Feb 2019 08:12:41 +0000 (08:12 +0000)]
[LoopSimplifyCFG] Use DTU.applyUpdates instead of insert/deleteEdge
`insert/deleteEdge` methods in DTU can make updates incorrectly in some cases
(see https://bugs.llvm.org/show_bug.cgi?id=40528), and it is recommended to
use `applyUpdates` methods instead when it is needed to make a mass update in CFG.
Sam Parker [Fri, 8 Feb 2019 07:57:42 +0000 (07:57 +0000)]
[ARM] Add OptMinSize to ARMSubtarget
In many places in the backend, we like to know whether we're
optimising for code size and this is performed by checking the
current machine function attributes. A subtarget is created on a
per-function basis, so it's possible to know when we're compiling for
code size on construction so record this in the new object.
Sergey Dmitriev [Fri, 8 Feb 2019 06:55:18 +0000 (06:55 +0000)]
[CodeExtractor] Update function's assumption cache after extracting blocks from it
Summary: Assumption cache's self-updating mechanism does not correctly handle the case when blocks are extracted from the function by the CodeExtractor. As a result function's assumption cache may have stale references to the llvm.assume calls that were moved to the outlined function. This patch fixes this problem by removing extracted llvm.assume calls from the function’s assumption cache.
Caroline Tice [Fri, 8 Feb 2019 00:51:33 +0000 (00:51 +0000)]
lvm-dwarfdump: Stop counting out-of-line subprogram in the "inlined functions" statistic.
DW_TAG_subprogram DIEs should not be counted in the inlined function statistic. This also addresses the source variables count, as that uses the inlined function count in its calculations.
Craig Topper [Fri, 8 Feb 2019 00:44:39 +0000 (00:44 +0000)]
[X86] Add FPCW as a register and start using it as an implicit use on floating point instructions.
Summary:
FPCW contains the rounding mode control which we manipulate to implement fp to integer conversion by changing the roudning mode, storing the value to the stack, and then changing the rounding mode back. Because we didn't model FPCW and its dependency chain, other instructions could be scheduled into the middle of the sequence.
This patch introduces the register and adds it as an implciit def of FLDCW and implicit use of the FP binary arithmetic instructions and store instructions. There are more instructions that need to be updated, but this is a good start. I believe this fixes at least the reduced test case from PR40529.
Eli Friedman [Fri, 8 Feb 2019 00:23:35 +0000 (00:23 +0000)]
[AArch64] Fix condition for "high-vector" DUP optimizations.
AArch64 NEON has a bunch of instructions with a "2" suffix that extract
the top half of the source vectors, instead of the bottom half. We have
some DAGCombines to try to take advantage of that. However, they
assumed that any EXTRACT_VECTOR was extracting the high half of the
vector in question.
This issue has apparently existed since the AArch64 backend was merged.
Petar Jovanovic [Thu, 7 Feb 2019 22:57:33 +0000 (22:57 +0000)]
[mips][micromips] Fix how values in .gcc_except_table are calculated
When a landing pad is calculated in a program that is compiled for micromips
with -fPIC flag, it will point to an even address.
Such an error will cause a segmentation fault, as the instructions in
micromips are aligned on odd addresses. This patch sets the last bit of the
offset where a landing pad is, to 1, which will effectively be an odd
address and point to the instruction exactly.
Dan Gohman [Thu, 7 Feb 2019 22:03:32 +0000 (22:03 +0000)]
[WebAssembly] Fix imported function symbol names that differ from their import names in the .o format
Add a flag to allow symbols to have a wasm import name which differs from the
linker symbol name, allowing the linker to link code using the import_module
attribute.
[InstCombine] Optimize `atomicrmw <op>, 0` into `load atomic` when possible
This commit teaches InstCombine how to replace an atomicrmw operation
into a simple load atomic.
For a given `atomicrmw <op>`, this is possible when:
1. The ordering of that operation is compatible with a load (i.e.,
anything that doesn't have a release semantic).
2. <op> does not modify the value being stored
Sanjay Patel [Thu, 7 Feb 2019 21:12:01 +0000 (21:12 +0000)]
[InstCombine] Fix crashing from (icmp (bitcast ([su]itofp X)), Y)
This fixes a class of bugs introduced by D44367,
which transforms various cases of icmp (bitcast ([su]itofp X)), Y to icmp X, Y.
If the bitcast is between vector types with a different number of elements,
the current code will produce bad IR along the lines of: icmp <N x i32> ..., <M x i32> <...>.
This patch suppresses the transform if the bitcast changes the number of vector elements.
This is part of https://bugs.llvm.org/show_bug.cgi?id=40442.
Vector legalization is implemented for the add/sub overflow opcodes.
UMULO/SMULO are also handled as far as legalization is concerned, but
they don't support vector expansion yet (so no tests for them).
The vector result widening implementation is suboptimal, because it
could result in a legalization loop.
Matt Arsenault [Thu, 7 Feb 2019 19:37:44 +0000 (19:37 +0000)]
GlobalISel: Implement narrowScalar for shift main type
This is pretty much directly ported from SelectionDAG. Doesn't include
the shift by non-constant but known bits version, since there isn't a
globalisel version of computeKnownBits yet.
This shows a disadvantage of targets not specifically which type
should be used for the shift amount. If type 0 is legalized before
type 1, the operations on the shift amount type use the wider type
(which are also less likely to legalize). This can be avoided by
targets specifying legalization actions on type 1 earlier than for
type 0.
Teresa Johnson [Thu, 7 Feb 2019 17:50:35 +0000 (17:50 +0000)]
[HotColdSplit] With PGO add profile entry metadata to split cold function
Summary:
When compiling with profile data, ensure the split cold function gets
cold function_entry_count metadata (just use 0 since it should be cold).
Otherwise with function sections it will not be placed in the unlikely
text section with other cold code.
Sanjay Patel [Thu, 7 Feb 2019 17:43:34 +0000 (17:43 +0000)]
[DAGCombiner] fold add/sub with bool operand based on target's boolean contents
I noticed that we are missing this canonicalization in IR:
rL352515
...and then realized that we don't get this right in SDAG either,
so this has to be fixed first regardless of what we choose to do in IR.
The existing fold was limited to scalars and using the wrong predicate
to guard the transform. We have a boolean contents TLI query that can
be used to decide which direction to fold.
This may eventually lead back to the problems/question in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but it makes no difference to that yet.
Sanjay Patel [Thu, 7 Feb 2019 17:10:49 +0000 (17:10 +0000)]
[x86] split more 256/512-bit shuffles in lowering
This is intentionally a small step because it's hard to know exactly
where we might introduce a conflicting transform with the code that
tries to form wider shuffles. But I think this is safe - if we have
a wide shuffle with 2 operands, then we should do better with an
extract + narrow shuffle.
[llvm-ar][libObject] Fix relative paths when nesting thin archives.
Summary:
When adding one thin archive to another, we currently chop off the relative path to the flattened members. For instance, when adding `foo/child.a` (which contains `x.txt`) to `parent.a`, when flattening it we should add it as `foo/x.txt` (which exists) instead of `x.txt` (which does not exist).
As a note, this also undoes the `IsNew` parameter of handling relative paths in r288280. The unit test there still passes.
This was reported as part of testing the kernel build with llvm-ar: https://patchwork.kernel.org/patch/10767545/ (see the second point).
Alexandre Ganea [Thu, 7 Feb 2019 15:24:18 +0000 (15:24 +0000)]
[CodeView] Fix cycles in debug info when merging Types with global hashes
When type streams with forward references were merged using GHashes, cycles
were introduced in the debug info. This was caused by
GlobalTypeTableBuilder::insertRecordAs() not inserting the record on the second
pass, thus yielding an empty ArrayRef at that record slot. Later on, upon PDB
emission, TpiStreamBuilder::commit() would skip that empty record, thus
offseting all indices that came after in the stream.
This solution comes in two steps:
1. Fix the hash calculation, by doing a multiple-step resolution, iff there are
forward references in the input stream.
2. Fix merge by resolving with multiple passes, therefore moving records with
forward references at the end of the stream.
This patch also adds support for llvm-readoj --codeview-ghash.
Finally, fix dumpCodeViewMergedTypes() which previously could reference deleted
memory.
Sam Parker [Thu, 7 Feb 2019 13:32:54 +0000 (13:32 +0000)]
[LSR] Generate cross iteration indexes
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance DSP-like loops.
Jiong Wang [Thu, 7 Feb 2019 10:43:09 +0000 (10:43 +0000)]
[BPF] add code-gen support for JMP32 instructions
JMP32 instructions has been added to eBPF ISA. They are 32-bit variants of
existing BPF conditional jump instructions, but the comparison happens on
low 32-bit sub-register only, therefore some unnecessary extensions could
be saved.
JMP32 instructions will only be available for -mcpu=v3. Host probe hook has
been updated accordingly.
JMP32 instructions will only be enabled in code-gen when -mattr=+alu32
enabled, meaning compiling the program using sub-register mode.
For JMP32 encoding, it is a new instruction class, and is using the
reserved eBPF class number 0x6.
This patch has been tested by compiling and running kernel bpf selftests
with JMP32 enabled.
Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353384 91177308-0d34-0410-b5e6-96231b3b80d8
Craig Topper [Thu, 7 Feb 2019 06:21:28 +0000 (06:21 +0000)]
[BranchFolding] Remove dead code for handling EHPad blocks
Summary: This code tries to handle the case where IBB is an EHPad, but there's an earlier check that uses PBB->hasEHPadSuccessor(). Where PBB is a predecessor of IBB. The hasEHPadSuccessor function would have visited IBB and seen that it was an EHPad and returned false. This would prevent us from reaching this code with IBB as an EHPad.
Looks like this code was originally added in rL37427 (ancient) and made dead in rL143001.