Philip Reames [Fri, 15 Feb 2019 21:23:51 +0000 (21:23 +0000)]
[InstCombine] Convert atomicrmws to xchg or store where legal
Implement two more transforms of atomicrmw:
1) We can convert an atomicrmw which produces a known value in memory into an xchg instead.
2) We can convert an atomicrmw xchg w/o users into a store for some orderings.
Craig Topper [Fri, 15 Feb 2019 19:21:39 +0000 (19:21 +0000)]
[X86] Move all the SSE legality checks out of FP_TO_INTHelper and up to LowerFP_TO_INT. NFCI
These checks aren't needed on the call to FP_TO_INTHelper from the type legalizer for splitting i64. We always want to use X87 FIST/FISTT to memory there.
Moving up the SSE checks will allow this routine to focus on what it cares about and makes its return semantics cleaner.
Jonas Paulsson [Fri, 15 Feb 2019 19:13:55 +0000 (19:13 +0000)]
Recommit "[SystemZ] Do not emit VEXTEND or VROUND nodes without vector support."
It seems there were some problem with using a .mir test. For some reason
doing '-stop-before=codegenprepare' and then '-start-before=codegenprepare'
on the output .mir file results in the NoVRegs Property after instruction
selection.
Recommitting the same test as an .ll file instead.
Vedant Kumar [Fri, 15 Feb 2019 18:46:58 +0000 (18:46 +0000)]
[CodeExtractor] Do not lift lifetime.end markers for region inputs
If a lifetime.end marker occurs along one path through the extraction
region, but not another, then it's still incorrect to lift the marker,
because there is some path through the extracted function which would
ordinarily not reach the marker. If the call to the extracted function
is in a loop, unrolling can cause inputs to the function to become
optimized out as undef after the first iteration.
To prevent incorrect stack slot merging in the calling function, it
should be sufficient to lift lifetime.start markers for region inputs.
I've tested this theory out by doing a stage2 check-all with randomized
splitting enabled.
This is a follow-up to r353973, and there's additional context for this
change in https://reviews.llvm.org/D57834.
Vedant Kumar [Fri, 15 Feb 2019 18:46:44 +0000 (18:46 +0000)]
[HotColdSplit] Schedule splitting late to fix perf regression
With or without PGO data applied, splitting early in the pipeline
(either before the inliner or shortly after it) regresses performance
across SPEC variants. The cause appears to be that splitting hides
context for subsequent optimizations.
Schedule splitting late again, in effect reversing r352080, which
scheduled the splitting pass early for code size benefits (documented in
https://reviews.llvm.org/D57082).
Matt Arsenault [Fri, 15 Feb 2019 15:24:34 +0000 (15:24 +0000)]
GlobalISel: Fix inadequate verification of g_build_vector
Testing based on the total size of the elements failed to catch a few
invalid scenarios, so explicitly check the number of elements/operands
and types.
This failed to catch situations like
<4 x s16> = G_BUILD_VECTOR s32, s32 since the total size added
up. This also would fail to catch an implicit conversion between
pointers and scalars.
Matt Arsenault [Fri, 15 Feb 2019 15:24:31 +0000 (15:24 +0000)]
Try to organize MachineVerifier tests
The Verifier is separate from the MachineVerifier, so move it to a
different directory. Some other verifier tests were scattered in
target codegen tests as well (although I'm sure I missed some). Work
towards using a more consistent naming scheme to make it clearer where
the gaps still are for generic instructions.
Serge Guelton [Fri, 15 Feb 2019 15:17:29 +0000 (15:17 +0000)]
OptionalStorage implementation for trivial type, take III
This is another attempt at implementating optional storage
for trivially copyable type, using an union instead of a
raw buffer to hold the actual storage. This make it possible
to get rid of the reinterpret_cast, and hopefully to fix the UB
of the previous attempts.
This validates fine on my laptop for gcc 8.2 and gcc 4.8, I'll
revert if it breaks the validation.
Clement Courbet [Fri, 15 Feb 2019 14:17:17 +0000 (14:17 +0000)]
[MergeICmps] Make base ordering really deterministic.
Summary:
The idea is that we now manipulate bases through a `unsigned BaseID` based on
order of appearance in the comparison chain rather than through the `Value*`.
Fixes 40714.
Reviewers: gchatelet
Subscribers: mgrang, jfb, jdoerfert, llvm-commits, hans
Hans Wennborg [Fri, 15 Feb 2019 12:20:33 +0000 (12:20 +0000)]
Speculatively revert r354051 "Recommit Optional specialization for trivially copyable types"
and
r354055 "Optional specialization for trivially copyable types, part2"
These are suspected to cause Clang to get miscompiled on Ubuntu 14.04
(Trusty) which uses GCC 4.8.4. Reverting for an hour to see if this
helps. See llvm-commits thread.
> Recommit Optional specialization for trivially copyable types
>
> Unfortunately the original code gets misscompiled by GCC (at least 8.1),
> this is a tentative workaround using std::memcpy instead of inplace new
> for trivially copyable types. I'll revert if it breaks.
>
> Original revision: https://reviews.llvm.org/D57097
Sam Parker [Fri, 15 Feb 2019 11:50:21 +0000 (11:50 +0000)]
[BPI] Look through bitcasts in calcZeroHeuristic
Constant hoisting may have hidden a constant behind a bitcast so that
it isn't folded into its users. However, this prevents BPI from
calculating some of its heuristics that are based upon constant
values. So, I've added a simple helper function to look through these
casts.
Simon Pilgrim [Fri, 15 Feb 2019 11:39:21 +0000 (11:39 +0000)]
[X86][AVX] lowerShuffleAsLanePermuteAndPermute - fully populate the lane shuffle mask (PR40730)
As detailed on PR40730, we are not correctly filling in the lane shuffle mask (D53148/rL344446) - we fill in for the correct src lane but don't add it to the correct mask element, so any reference to the correct element is likely to see an UNDEF mask index.
This allows constant folding to propagate UNDEFs prior to the lane mask being (correctly) lowered to vperm2f128.
This patch fixes the issue by fully populating the lane shuffle mask - this is more than is necessary (if we only filled in the required mask elements we might be able to match other shuffle instructions - broadcasts etc.), but its the most cautious approach as this needs to be cherrypicked into the 8.0.0 release branch.
Diana Picus [Fri, 15 Feb 2019 10:50:02 +0000 (10:50 +0000)]
[ARM GlobalISel] Style fix. NFCI
Add the opcode for ADDrr / t2ADDrr to the Opcode cache, as we did for
all other opcodes where the handling is otherwise the same between arm
mode and thumb2.
Sam Parker [Fri, 15 Feb 2019 09:04:39 +0000 (09:04 +0000)]
[ARM CGP] Fix ConvertTruncs
ConvertTruncs is used to replace a trunc for an AND mask, however
this function wasn't working as expected. By performing the change
later, we can create a wide type integer mask instead of a narrow -1
value, which could then be simply removed (incorrectly). Because we
now perform this action later, it's necessary to cache the trunc type
before we perform the promotion.
[GISel][NFC]: Add methods to speed up insertion into GISelWorklist
https://reviews.llvm.org/D58073
Speed up insertion during the initial populating phase into the
GISelWorkList by deferring repeatedly resizing the DenseMap.
This results in ~10% improvement in the combiner passes, and
~3% speedup in the Legalizer.
Matt Davis [Thu, 14 Feb 2019 23:50:35 +0000 (23:50 +0000)]
[symbolizer] Avoid collecting symbols belonging to invalid sections.
Summary:
llvm-symbolizer would originally report symbols that belonged to an invalid object file section.
Specifically the case where: `*Symbol.getSection() == ObjFile.section_end()`
This patch prevents the Symbolizer from collecting symbols that belong to invalid sections.
The test (from PR40591) introduces a case where two symbols have address 0,
one symbol is defined, 'foo', and the other is not defined, 'bar'. This patch will cause
the Symbolizer to keep 'foo' and ignore 'bar'.
As a side note, the logic for adding symbols to the Symbolizer's store
(`SymbolizableObjectFile::addSymbol`) replaces symbols with the
same <address, size> pair. At some point that logic should be revisited as in the
aforementioned case, 'bar' was overwriting 'foo' in the Symbolizer's store,
and 'foo' was forgotten.
Nick Desaulniers [Thu, 14 Feb 2019 23:35:53 +0000 (23:35 +0000)]
[INLINER] allow inlining of address taken blocks
as long as their uses does not contain calls to functions that capture
the argument (potentially allowing the blockaddress to "escape" the
lifetime of the caller).
TODO:
- add more tests
- fix crash in llvm::updateCGAndAnalysisManagerForFunctionPass when
invoking Transforms/Inline/blockaddress.ll
Matt Arsenault [Thu, 14 Feb 2019 22:41:01 +0000 (22:41 +0000)]
Replace gcroot verifier tests
These haven't been checking anything useful and have been testing the
wrong failure reason for many years. Replace them with something which
stresses what is actually implemented in the verifier now.
[AMDGPU] Ressociate 'add (add x, y), z' to use SALU
Reassociate adds to collect scalar operands in a single
instruction when possible. That will result in a scalar
add followed by vector instead of two vector adds, thus
better utilizing SALU.
Teresa Johnson [Thu, 14 Feb 2019 21:22:50 +0000 (21:22 +0000)]
[ThinLTO] Detect partially split modules during the thin link
Summary:
The changes to disable LTO unit splitting by default (r350949) and
detect inconsistently split LTO units (r350948) are causing some crashes
when the inconsistency is detected in multiple threads simultaneously.
Fix that by having the code always look for the inconsistently split
LTO units during the thin link, by checking for the presence of type
tests recorded in the summaries.
Modify test added in r350948 to remove single threading required to fix
a bot failure due to this issue (and some debugging options added in the
process of diagnosing it).
Chris Bieneman [Thu, 14 Feb 2019 20:57:17 +0000 (20:57 +0000)]
[CMake] Fix ability to use LLVM_ENABLE_PROJECTS with LLVM_EXTERNAL_PROJECTS
LLVM r353148, changed the circumstances in which the project source directory variables are created to only create them for LLVM projects. This patch initializes the directory variables for projects specified in `LLVM_EXTERNAL_PROJECTS` as well.
Philip Reames [Thu, 14 Feb 2019 20:41:17 +0000 (20:41 +0000)]
Canonicalize all integer "idempotent" atomicrmw ops
For "idempotent" atomicrmw instructions which we can't simply turn into load, canonicalize the operation and constant. This reduces the matching needed elsewhere in the optimizer, but doesn't directly impact codegen.
For any architecture where OR/Zero is not a good default choice, you can extend the AtomicExpand lowerIdempotentRMWIntoFencedLoad mechanism. I reviewed X86 to make sure this works well, haven't audited other backends.
Serge Guelton [Thu, 14 Feb 2019 19:17:00 +0000 (19:17 +0000)]
Recommit Optional specialization for trivially copyable types
Unfortunately the original code gets misscompiled by GCC (at least 8.1),
this is a tentative workaround using std::memcpy instead of inplace new
for trivially copyable types. I'll revert if it breaks.
Original revision: https://reviews.llvm.org/D57097
Philip Reames [Thu, 14 Feb 2019 18:39:14 +0000 (18:39 +0000)]
Teach instcombine about remaining "idempotent" atomicrmw types
Expand on Quentin's r353471 patch which converts some atomicrmws into loads. Handle remaining operation types, and fix a slight bug. Atomic loads are required to have alignment. Since this was within the InstCombine fixed point, somewhere else in InstCombine was adding alignment before the verifier saw it, but still, we should fix.
Terminology wise, I'm using the "idempotent" naming that is used for the same operations in AtomicExpand and X86ISelLoweringInfo. Once this lands, I'll add similar tests for AtomicExpand, and move the pattern match function to a common location. In the review, there was seemingly consensus that "idempotent" was slightly incorrect for this context. Once we setle on a better name, I'll update all uses at once.
Jordan Rupprecht [Thu, 14 Feb 2019 18:35:13 +0000 (18:35 +0000)]
[llvm-ar] Implement the P modifier.
Summary:
GNU ar has a `P` modifier that changes filename comparisons to use full paths instead of the basename. As noted in the GNU docs, regular archives are not created with full path names, so P is used to deal with archives created by other archive programs (e.g. see the updated `absolute-paths.test` test case).
Since thin archives use full path names -- paths are relative to the archive -- it seems very error prone to not imply P when dealing with thin archives, so P is implied in those cases. (I think this is a deviation from GNU ar that makes sense).
This fixes PR37436 via https://github.com/ClangBuiltLinux/linux/issues/33.
Teresa Johnson [Thu, 14 Feb 2019 14:14:24 +0000 (14:14 +0000)]
Refine ArgPromotion metadata handling
Summary:
In r353537 we now copy all metadata to the new function, with the old
being removed when the old function is eliminated. In some cases the old
function is dropped to a declaration (seems to only occur with the old
PM). Go ahead and clear all metadata from the old function to handle that
case, since verification will complain otherwise. This is consistent
with what was being done for debug metadata before r353537.
Florian Hahn [Thu, 14 Feb 2019 13:59:39 +0000 (13:59 +0000)]
[LoopUnrollPeel] Add case where we should forget the peeled loop from SCEV.
The test case requires the peeled loop to be forgotten after peeling,
even though it does not have a parent. When called via the unroller,
SE->forgetTopmostLoop is also called, so the test case would also pass
without any SCEV invalidation, but peelLoop is exposed as utility
function. Also, in the test case, simplifyLoop will make changes,
removing the loop from SCEV, but it is better to not rely on this
behavior.
Sam McCall [Thu, 14 Feb 2019 12:57:01 +0000 (12:57 +0000)]
Reapply [VFS] Allow multiple RealFileSystem instances with independent CWDs.
This reverts commit r351091.
The original mac breakages are addressed by ensuring the root directory
we're working from is fully symlink-resolved before starting.
Petar Avramovic [Thu, 14 Feb 2019 11:39:53 +0000 (11:39 +0000)]
[MIPS GlobalISel] Select branch instructions
Select G_BR and G_BRCOND for MIPS32.
Unconditional branch G_BR does not have register operand,
for that reason we only add tests.
Since conditional branch G_BRCOND compares register to zero on MIPS32,
explicit extension must be performed on i1 condition in order to set
high bits to appropriate value.
Max Kazantsev [Thu, 14 Feb 2019 11:10:29 +0000 (11:10 +0000)]
Make widenable condition transparent for MemoryWriteTracking
Side effects of widenable condition intrinsic are modelled via
InaccessibleMemOnly, and there is no way to say that it isn't
really writing any memory. This patch teaches MemoryWriteTracking
ignore this intrinsic.
Jeremy Morse [Thu, 14 Feb 2019 11:09:24 +0000 (11:09 +0000)]
Fix an accidentally flipped pair of arguments, NFCI
While rebasing a refactor in r353950 I accidentally swapped two function
arguments; one is SelectionDAGBuilders "current" DebugLoc, the other is the one
from the "current" debug intrinsic. They're probably always identical, but I
haven't proved that yet.
David Green [Thu, 14 Feb 2019 11:09:24 +0000 (11:09 +0000)]
[ARM] Ensure we update the correct flags in the peephole optimiser
The Arm peephole optimiser code keeps track of both an MI and a SubAdd that can
be used to optimise away a CMP. In the rare case that both are found and not
ruled-out as valid, we could end up setting the flags on the wrong one.
Instead make sure we are using SubAdd if it exists, as it will be closer to the
CMP.
The testcase here is a little theoretical, with a dead def of cpsr. It should
hopefully show the point.
Andrew Ng [Thu, 14 Feb 2019 11:08:49 +0000 (11:08 +0000)]
[Support] Fix TempFile::discard to not leave behind temporary files
Moved the remove of the temporary file to after the close to avoid
remove failures caused by ETXTBSY errors.
This issue was seen when FileOutputBuffer falls back to an in memory
buffer due to the inability to mmap the on disk file. This occurred when
running LLD on an Ubuntu VM in VirtualBox on a Windows host attempting
to write the output to a VirtualBox shared folder.
Daniel Sanders [Thu, 14 Feb 2019 00:15:28 +0000 (00:15 +0000)]
[globalisel][combine] Split existing rules into a match and apply step
Summary:
The declarative tablegen definitions split rules into match and apply steps.
Prepare for that by doing the same in the C++ implementations. This aids
some of the migration effort while the tablegen version is incomplete.
Jordan Rupprecht [Wed, 13 Feb 2019 23:39:41 +0000 (23:39 +0000)]
[llvm-ar][libObject] Fix relative paths when nesting thin archives.
Summary:
When adding one thin archive to another, we currently chop off the relative path to the flattened members. For instance, when adding `foo/child.a` (which contains `x.txt`) to `parent.a`, when flattening it we should add it as `foo/x.txt` (which exists) instead of `x.txt` (which does not exist).
As a note, this also undoes the `IsNew` parameter of handling relative paths in r288280. The unit test there still passes.
This was reported as part of testing the kernel build with llvm-ar: https://patchwork.kernel.org/patch/10767545/ (see the second point).
Stefan Pintilie [Wed, 13 Feb 2019 23:37:23 +0000 (23:37 +0000)]
[PowerPC][NFC] Added tests for prologue and epilogue code gen.
Added four test files to check the existing behaviour of prologue
and epilogue code generation. This patch was done as a setup for
the upcoming patch listed on Phabricator that will change how the
prologue and epilogue work.
The upcoming patch is: https://reviews.llvm.org/D42590
Philip Reames [Wed, 13 Feb 2019 23:01:11 +0000 (23:01 +0000)]
[SelectionDAG] Inline a single use helper function, and remove last non-MMO interface [NFC]
For D57601, we need to know whether the instruction is volatile. We'd either have to pass yet another parameter, or just standardize on the MMO interface. I chose the second.
Mark Lacey [Wed, 13 Feb 2019 22:56:43 +0000 (22:56 +0000)]
[RegAllocGreedy] Take last chance recoloring into account in evicting.
Last chance recoloring inserts into FixedRegisters those virtual
registers it is attempting to assign a physical register to.
We must consider these when we consider candidates for eviction so that
we do not end up evicting something while we are attempting to recolor
to assign it.
This is hitting in an out-of-tree target and no longer reproduces on
trunk. That does not appear to be a result of it having been fixed, but
rather, it appears that optimization changes and/or other changes to
register allocation mask the problem.
I haven't found a way to come up with a reasonable test case for this
(i.e. one that I can actually commit to open source, is reasonable
in size, and actually reproduces the issue).
Leonard Chan [Wed, 13 Feb 2019 22:22:48 +0000 (22:22 +0000)]
[NewPM] Second attempt at porting ASan
This is the second attempt to port ASan to new PM after D52739. This takes the
initialization requried by ASan from the Module by moving it into a separate
class with it's own analysis that the new PM ASan can use.
Changes:
- Split AddressSanitizer into 2 passes: 1 for the instrumentation on the
function, and 1 for the pass itself which creates an instance of the first
during it's run. The same is done for AddressSanitizerModule.
- Add new PM AddressSanitizer and AddressSanitizerModule.
- Add legacy and new PM analyses for reading data needed to initialize ASan with.
- Removed DominatorTree dependency from ASan since it was unused.
- Move GlobalsMetadata and ShadowMapping out of anonymous namespace since the
new PM analysis holds these 2 classes and will need to expose them.
Serge Guelton [Wed, 13 Feb 2019 22:11:09 +0000 (22:11 +0000)]
Revert r353962
Specialization of Optional for trivially copyable types yields failure on the buildbots I fail to reproduce locally.
Better safe than sorry, reverting.
Philip Reames [Wed, 13 Feb 2019 20:42:59 +0000 (20:42 +0000)]
[SelectionDAG] Kill last uses of getAtomic w/o a MMO operand [NFC]
The helper function was used by only two callers, and largely ended up providing distinct functionality based on optional arguments and opcode. Inline and simply to make the functionality much more clear.
Vedant Kumar [Wed, 13 Feb 2019 19:53:38 +0000 (19:53 +0000)]
[CodeExtractor] Only lift lifetime markers present in the extraction region
When CodeExtractor finds liftime markers referencing inputs to the
extraction region, it lifts these markers out of the region and inserts
them around the call to the extracted function (see r350420, PR39671).
However, it should *only* lift lifetime markers that are actually
present in the extraction region. I.e., if a start marker is present in
the extraction region but a corresponding end marker isn't (or vice
versa), only the start marker (or end marker, resp.) should be lifted.
Philip Reames [Wed, 13 Feb 2019 19:49:17 +0000 (19:49 +0000)]
[Tests] More unordered atomic lowering tests
This time, focused around narrowing and widening transformations. Also, include a few simple memory optimization tests to highlight missed oppurtunities. This is part of building up the test base for D57601.