Sanjay Patel [Tue, 21 May 2019 21:45:24 +0000 (21:45 +0000)]
[InstCombine] add more tests for shuffle folding; NFC
As discussed in D62024, we want to limit any potential IR
transforms of shuffles to cases where we know the SDAG
conversion would result in equivalent patterns for these
IR variants.
Enable CMake policy 77. This alters the behavior of option. The old behavior
would remove the value of the option from the cache and create a new one. The
new behavior does not create the variable if it is defined already. This ensures
that subsequent reconfigures will behave identically. This seems better than the
setting of OLD - the desire is to ensure that it is set to OLD or NEW.
Register coalescer fails for the test in the patch with the assertion in
JoinVals::ConflictResolution `DefMI != nullptr'. It attempts to join
live intervals for two adjacent instructions and erase the copy:
The LI needs to be adjusted to kill subrange for the erased instruction
and extend the subrange of the original def. That was done for the main
interval only but not for the subrange. As a result subrange had a VNI
pointing to the erased slot resulting in the above failure.
Leonard Chan [Tue, 21 May 2019 19:17:19 +0000 (19:17 +0000)]
[Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them. The
result is saturated and clamped between the largest and smallest representable
values of the first 2 operands.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Sanjay Patel [Tue, 21 May 2019 18:28:22 +0000 (18:28 +0000)]
[SelectionDAG] remove redundant code; NFCI
getNode() squashes concatenation of undefs via FoldCONCAT_VECTORS():
// Concat of UNDEFs is UNDEF.
if (llvm::all_of(Ops, [](SDValue Op) { return Op.isUndef(); }))
return DAG.getUNDEF(VT);
Don Hinton [Tue, 21 May 2019 17:56:45 +0000 (17:56 +0000)]
[cmake] Add custom command to touch archives on Darwin so ninja won't rebuild them.
Summary:
clang and newer versions of ninja use high-resolutions timestamps, but
older versions of libtool on Darwin don't, so the archive will often
get an older timestamp than the last object that was added or updated.
To fix this, we add a custom command to touch the archive after it's
been built so that ninja won't rebuild it unnecessarily the next time
it's run.
Chris Bieneman [Tue, 21 May 2019 16:29:31 +0000 (16:29 +0000)]
[docs] Add new document on building distributions
Summary:
This document is an attempt to provide a guide for best practices for using the LLVM build system to generate distributable LLVM-based tools.
Most of the document is geared toward distributions of LLVM-based toolchains, but much of it also applies to distributing other LLVM-based tools and libraries.
Reviewers: tstellar, phosek, jroelofs, hans, sylvestre.ledru
Clement Courbet [Tue, 21 May 2019 13:34:12 +0000 (13:34 +0000)]
[MergeICmps][NFC] Make BCEAtom move-only.
And handle for self-move. This is required so that llvm::sort can work
with EXPENSIVE_CHECKS, as it will do a random shuffle of the input
which can result in self-moves.
Florian Hahn [Tue, 21 May 2019 13:04:53 +0000 (13:04 +0000)]
[ScheduleDAGInstrs] Compute topological ordering on demand.
In most cases, the topological ordering does not get changed in
ScheduleDAGInstrs. We can compute the ordering on demand, similar to
D60125.
This drastically cuts down the number of times we need to compute the
topological ordering, e.g. for SPEC2006, SPEC2k and MultiSource, we get
the following stats for -O3 -flto on X86 (showing the top reductions,
with small absolute values filtered). The smallest reduction is -50%.
Slightly positive impact on compile-time (-0.1 % geomean speedup for
test-suite + SPEC & co, with -O1 on X86)
Paul Robinson [Tue, 21 May 2019 11:59:03 +0000 (11:59 +0000)]
[DebugInfo] Handle '# line "file"' correctly for asm source.
This provides the correct file path for the original source, rather
than the preprocessed source.
Bob Haarman [Tue, 21 May 2019 11:53:41 +0000 (11:53 +0000)]
Revert r360902 "Resubmit: [Salvage] Change salvage debug info ..."
This reverts commit rr360902. It caused an assertion failure in
lib/IR/DebugInfoMetadata.cpp: Assertion `(OffsetInBits + SizeInBits <=
FragmentSizeInBits) && "new fragment outside of original fragment"'
failed.
Fangrui Song [Tue, 21 May 2019 10:41:25 +0000 (10:41 +0000)]
[PPC64] Update LocalEntry from assigned symbols
On PowerPC64 ELFv2 ABI, functions may have 2 entry points: global and local.
The local entry point location of a function is stored in the st_other field of the symbol, as an offset relative to the global entry point.
In order to make symbol assignments (e.g. .equ/.set) work properly with this, PPCTargetELFStreamer already copies the local entry bits from the source symbol to the destination one, on emitAssignment(). The problem is that this copy is performed only at the assignment location, where the source symbol may not yet have processed the .localentry directive, that sets the local entry. This may cause the destination symbol to end up with wrong local entry information. Other symbol info is not affected by this because, in this case, the destination symbol value is actually a symbol reference.
This change keeps track of these assignments, and update all needed st_other fields when finish() is called.
Florian Hahn [Tue, 21 May 2019 10:05:26 +0000 (10:05 +0000)]
[AArch64] Skip mask checks for masks with an odd number of elements.
Some checks in isShuffleMaskLegal expect an even number of elements,
e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access
invalid elements and crash. This patch adds checks to the impacted
functions.
Sam Parker [Tue, 21 May 2019 07:56:47 +0000 (07:56 +0000)]
[ARM][CGP] Skip nuw in PrepareConstants
PrepareConstants step converts add/sub with 'negative' immediates to
sub/add with a 'positive' imm to make promotion more simple. nuw
already states that the add shouldn't cause an unsigned wrap, so
it shouldn't need any tweaking. Plus, we also don't allow a sub with
a 'negative' immediate to be safe wrap, so this functionality has
been removed. The PrepareConstants step now just handles the add
instructions that we've determined would be safe if they wrap around
zero.
Petr Hosek [Tue, 21 May 2019 07:13:58 +0000 (07:13 +0000)]
[CMake] Specify component for all target types
This addresses an issue introduced in r360230 which broke existing
use cases of LLVM_DISTRIBUTION_COMPONENTS since ARCHIVE and LIBRARY
target types are no longer handled as components.
Dylan McKay [Tue, 21 May 2019 06:38:02 +0000 (06:38 +0000)]
Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianess
Summary:
The endianess used in the calling convention does not always match the
endianess of the target on all architectures, namely AVR.
When an argument is too large to be legalised by the architecture and is
split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian
is queried to find the endianess that function arguments must be laid
out in.
This approach was recommended by Eli Friedman.
Originally reported in https://github.com/avr-rust/rust/issues/129.
Nico Weber [Tue, 21 May 2019 03:01:01 +0000 (03:01 +0000)]
Tweaks for setting CMAKE_LINKER to lld-link
- Just look for "lld-link", not "lld-link.exe".
llvm/cmake/platforms/WinMsvc.cmake for example sets CMAKE_LINKER to
lld-link without .exe
- Stop passing -gwarf to the compiler in sanitizer options when lld is
enabled -- there's no reason to use different debug information keyed
off the linker. (If this was for MinGW, we should check for that
instead.)
Nick Desaulniers [Mon, 20 May 2019 22:17:43 +0000 (22:17 +0000)]
[ORC] fix use-after-move. NFC
Summary:
scan-build flagged a potential use-after-move in debug builds. It's not
safe that a moved from value contains anything but garbage. Manually
DRY up these repeated expressions.
Matt Arsenault [Mon, 20 May 2019 22:04:42 +0000 (22:04 +0000)]
AMDGPU: Force skip branches over calls
Unfortunately the way SIInsertSkips works is backwards, and is
required for correctness. r338235 added handling of some special cases
where skipping is mandatory to avoid side effects if no lanes are
active. It conservatively handled asm correctly, but the same logic
needs to apply to calls.
Usually the call sequence code is larger than the skip threshold,
although the way the count is computed is really broken, so I'm not
sure if anything was likely to really hit this.
Lang Hames [Mon, 20 May 2019 20:53:05 +0000 (20:53 +0000)]
[Support] Renamed member 'Size' to 'AllocatedSize' in MemoryBlock and OwningMemoryBlock.
Rename member 'Size' to 'AllocatedSize' in order to provide a hint that the
allocated size may be different than the requested size. Comments are added to
clarify this point. Updated the InMemoryBuffer in FileOutputBuffer.cpp to track
the requested buffer size.
Nikita Popov [Mon, 20 May 2019 19:13:04 +0000 (19:13 +0000)]
[LFTR] Add additional PR31181 test cases
One case where overflow happens in the first loop iteration, and
two cases where we switch to a dynamically dead IV with post/pre
increment, respectively.
Craig Topper [Mon, 20 May 2019 17:37:52 +0000 (17:37 +0000)]
[X86] Add test case for r361177.
That commit makes sure we flush PendingExports in SelectDAGBuilder
before we create INLINEASM_BR. Unfortunatley, I haven't yet found
a CodeGen failure without that change.
This commit uses the debug output from SelectionDAG to at least
ensure we build the DAG correctly.
Craig Topper [Mon, 20 May 2019 17:08:02 +0000 (17:08 +0000)]
[SelectionDAGBuilder] Flush PendingExports before creating INLINEASM_BR node for asm goto.
Since INLINEASM_BR is a terminator we need to flush the pending exports before
emitting it. If we don't do this, a TokenFactor can be inserted between it and
the BR instruction emitted to finish the callbr lowering.
It looks like nodes are glued to the INLINEASM_BR so I had to make sure we emit
the TokenFactor before that.
Nick Desaulniers [Mon, 20 May 2019 16:58:59 +0000 (16:58 +0000)]
[DWARF] hoist nullptr checks. NFC
Summary:
This was flagged in https://www.viva64.com/en/b/0629/ under "Snippet No.
15" (see under #13). It looks like PVS studio flags nullptr checks where
the ptr is used inbetween creation and checking against nullptr.
Nick Desaulniers [Mon, 20 May 2019 16:48:09 +0000 (16:48 +0000)]
[INLINER] allow inlining of blockaddresses if sole uses are callbrs
Summary:
It was supposed that Ref LazyCallGraph::Edge's were being inserted by
inlining, but that doesn't seem to be the case. Instead, it seems that
there was no test for a blockaddress Constant in an instruction that
referenced the function that contained the instruction. Ex:
When iterating blockaddresses, do not add the function they refer to
back to the worklist if the blockaddress is referring to the contained
function (as opposed to an external function).
Because blockaddress has sligtly different semantics than GNU C's
address of labels, there are 3 cases that can occur with blockaddress,
where only 1 can happen in GNU C due to C's scoping rules:
* blockaddress is within the function it refers to (possible in GNU C).
* blockaddress is within a different function than the one it refers to
(not possible in GNU C).
* blockaddress is used in to declare a global (not possible in GNU C).
This patch adjusts the iteration of blockaddresses in
LazyCallGraph::visitReferences to not revisit the blockaddresses
function in the first case.
The Linux kernel contains code that's not semantically valid at -O0;
specifically code passed to asm goto. It requires that asm goto be
inline-able. This patch conservatively does not attempt to handle the
more general case of inlining blockaddresses that have non-callbr users
(pr/39560).
Bjorn Pettersson [Mon, 20 May 2019 16:41:08 +0000 (16:41 +0000)]
[AMDGPU] Fix std::array initializers to avoid warnings with older tool chains. NFC
A std::array is implemented as a template with an array
inside a struct. Older versions of clang, like 3.6,
require an extra set of curly braces around std::array
initializations to avoid warnings.
The C++ language was changed regarding this by CWG 1270.
So more modern tool chains does not complaing even if
leaving out one level of braces.
Craig Topper [Mon, 20 May 2019 16:27:09 +0000 (16:27 +0000)]
[Intrinsics] Merge lround.i32 and lround.i64 into a single intrinsic with overloaded result type. Make result type for llvm.llround overloaded instead of fixing to i64
We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well.
Nikita Popov [Mon, 20 May 2019 16:09:22 +0000 (16:09 +0000)]
[SDAG] Vector op legalization for overflow ops
Fixes issue reported by aemerson on D57348. Vector op legalization
support is added for uaddo, usubo, saddo and ssubo (umulo and smulo
were already supported). As usual, by extracting TargetLowering methods
and calling them from vector op legalization.
Vector op legalization doesn't really deal with multiple result nodes,
so I'm explicitly performing a recursive legalization call on the
result value that is not being legalized.
There are some existing test changes because expansion happens
earlier, so we don't get a DAG combiner run in between anymore.
George Rimar [Mon, 20 May 2019 15:41:48 +0000 (15:41 +0000)]
[llvm-readelf] - Rework how we parse the .dynamic section.
This is a result of what I found during my work on https://bugs.llvm.org/show_bug.cgi?id=41679.
Previously LLVM readelf took the information about .dynamic section
from its PT_DYNAMIC segment only. GNU tools have a bit different logic.
They also use the information from the .dynamic section header if it is available.
This patch changes the code to improve the compatibility with the GNU Binutils.
Matt Arsenault [Mon, 20 May 2019 14:09:36 +0000 (14:09 +0000)]
RegAlloc: Fix verifier error with undef identity copies
The code did not match the example in the comment, and was checking
the undef flag on the copy dest instead of source. The existing tests
were only hitting the > 2 operands case.
[DebugInfo] Update loop metadata for inlined loops
Currently, when a loop is cloned while inlining function (A) into function (B)
the loop metadata is copied and then not modified at all. The loop metadata can
encode the loop's start and end DILocations. Therefore, the new inlined loop in
function (B) may have loop metadata which shows start and end locations residing
in function (A).
This patch ensures loop metadata is updated while inlining so that the start and
end DILocations are given the "inlinedAt" operand. I've also added a regression
test for this.
This fix is required for D60831 because that patch uses loop metadata to
determine the DILocation for the branches of new loop preheaders.
Sander de Smalen [Mon, 20 May 2019 09:54:06 +0000 (09:54 +0000)]
Match types of accumulator and result for llvm.experimental.vector.reduce.fadd/fmul
The scalar start/accumulator value of the fadd- and fmul reduction
should match the result type of the reduction, as well as the vector
element-type of the input vector. Although this was not explicitly
specified in the LangRef, it was taken for granted in code implementing
the reductions. The patch also fixes the LangRef by adding this
constraint.
[DebugInfo] Update loop metadata for inlined loops
Summary:
Currently, when a loop is cloned while inlining function (A) into function (B) the loop metadata is copied and then not modified at all. The loop metadata can encode the loop's start and end DILocations. Therefore, the new inlined loop in function (B) may have loop metadata which shows start and end locations residing in function (A).
This patch ensures loop metadata is updated while inlining so that the start and end DILocations are given the "inlinedAt" operand. I've also added a regression test for this.
This fix is required for D60831 because that patch uses loop metadata to determine the DILocation for the branches of new loop preheaders.
Carl Ritson [Mon, 20 May 2019 07:20:12 +0000 (07:20 +0000)]
[AMDGPU] gfx1010 Avoid SMEM WAR hazard for some s_waitcnt values
Summary:
Avoid introducing hazard mitigation when lgkmcnt is reduced to 0.
Clarify code comments to explain assumptions made for this hazard
mitigation. Expand and correct test cases to cover variants of
s_waitcnt.