Simon Pilgrim [Fri, 15 Feb 2019 11:39:21 +0000 (11:39 +0000)]
[X86][AVX] lowerShuffleAsLanePermuteAndPermute - fully populate the lane shuffle mask (PR40730)
As detailed on PR40730, we are not correctly filling in the lane shuffle mask (D53148/rL344446) - we fill in for the correct src lane but don't add it to the correct mask element, so any reference to the correct element is likely to see an UNDEF mask index.
This allows constant folding to propagate UNDEFs prior to the lane mask being (correctly) lowered to vperm2f128.
This patch fixes the issue by fully populating the lane shuffle mask - this is more than is necessary (if we only filled in the required mask elements we might be able to match other shuffle instructions - broadcasts etc.), but its the most cautious approach as this needs to be cherrypicked into the 8.0.0 release branch.
Diana Picus [Fri, 15 Feb 2019 10:50:02 +0000 (10:50 +0000)]
[ARM GlobalISel] Style fix. NFCI
Add the opcode for ADDrr / t2ADDrr to the Opcode cache, as we did for
all other opcodes where the handling is otherwise the same between arm
mode and thumb2.
Sam Parker [Fri, 15 Feb 2019 09:04:39 +0000 (09:04 +0000)]
[ARM CGP] Fix ConvertTruncs
ConvertTruncs is used to replace a trunc for an AND mask, however
this function wasn't working as expected. By performing the change
later, we can create a wide type integer mask instead of a narrow -1
value, which could then be simply removed (incorrectly). Because we
now perform this action later, it's necessary to cache the trunc type
before we perform the promotion.
[GISel][NFC]: Add methods to speed up insertion into GISelWorklist
https://reviews.llvm.org/D58073
Speed up insertion during the initial populating phase into the
GISelWorkList by deferring repeatedly resizing the DenseMap.
This results in ~10% improvement in the combiner passes, and
~3% speedup in the Legalizer.
Matt Davis [Thu, 14 Feb 2019 23:50:35 +0000 (23:50 +0000)]
[symbolizer] Avoid collecting symbols belonging to invalid sections.
Summary:
llvm-symbolizer would originally report symbols that belonged to an invalid object file section.
Specifically the case where: `*Symbol.getSection() == ObjFile.section_end()`
This patch prevents the Symbolizer from collecting symbols that belong to invalid sections.
The test (from PR40591) introduces a case where two symbols have address 0,
one symbol is defined, 'foo', and the other is not defined, 'bar'. This patch will cause
the Symbolizer to keep 'foo' and ignore 'bar'.
As a side note, the logic for adding symbols to the Symbolizer's store
(`SymbolizableObjectFile::addSymbol`) replaces symbols with the
same <address, size> pair. At some point that logic should be revisited as in the
aforementioned case, 'bar' was overwriting 'foo' in the Symbolizer's store,
and 'foo' was forgotten.
Nick Desaulniers [Thu, 14 Feb 2019 23:35:53 +0000 (23:35 +0000)]
[INLINER] allow inlining of address taken blocks
as long as their uses does not contain calls to functions that capture
the argument (potentially allowing the blockaddress to "escape" the
lifetime of the caller).
TODO:
- add more tests
- fix crash in llvm::updateCGAndAnalysisManagerForFunctionPass when
invoking Transforms/Inline/blockaddress.ll
Matt Arsenault [Thu, 14 Feb 2019 22:41:01 +0000 (22:41 +0000)]
Replace gcroot verifier tests
These haven't been checking anything useful and have been testing the
wrong failure reason for many years. Replace them with something which
stresses what is actually implemented in the verifier now.
[AMDGPU] Ressociate 'add (add x, y), z' to use SALU
Reassociate adds to collect scalar operands in a single
instruction when possible. That will result in a scalar
add followed by vector instead of two vector adds, thus
better utilizing SALU.
Teresa Johnson [Thu, 14 Feb 2019 21:22:50 +0000 (21:22 +0000)]
[ThinLTO] Detect partially split modules during the thin link
Summary:
The changes to disable LTO unit splitting by default (r350949) and
detect inconsistently split LTO units (r350948) are causing some crashes
when the inconsistency is detected in multiple threads simultaneously.
Fix that by having the code always look for the inconsistently split
LTO units during the thin link, by checking for the presence of type
tests recorded in the summaries.
Modify test added in r350948 to remove single threading required to fix
a bot failure due to this issue (and some debugging options added in the
process of diagnosing it).
Chris Bieneman [Thu, 14 Feb 2019 20:57:17 +0000 (20:57 +0000)]
[CMake] Fix ability to use LLVM_ENABLE_PROJECTS with LLVM_EXTERNAL_PROJECTS
LLVM r353148, changed the circumstances in which the project source directory variables are created to only create them for LLVM projects. This patch initializes the directory variables for projects specified in `LLVM_EXTERNAL_PROJECTS` as well.
Philip Reames [Thu, 14 Feb 2019 20:41:17 +0000 (20:41 +0000)]
Canonicalize all integer "idempotent" atomicrmw ops
For "idempotent" atomicrmw instructions which we can't simply turn into load, canonicalize the operation and constant. This reduces the matching needed elsewhere in the optimizer, but doesn't directly impact codegen.
For any architecture where OR/Zero is not a good default choice, you can extend the AtomicExpand lowerIdempotentRMWIntoFencedLoad mechanism. I reviewed X86 to make sure this works well, haven't audited other backends.
Serge Guelton [Thu, 14 Feb 2019 19:17:00 +0000 (19:17 +0000)]
Recommit Optional specialization for trivially copyable types
Unfortunately the original code gets misscompiled by GCC (at least 8.1),
this is a tentative workaround using std::memcpy instead of inplace new
for trivially copyable types. I'll revert if it breaks.
Original revision: https://reviews.llvm.org/D57097
Philip Reames [Thu, 14 Feb 2019 18:39:14 +0000 (18:39 +0000)]
Teach instcombine about remaining "idempotent" atomicrmw types
Expand on Quentin's r353471 patch which converts some atomicrmws into loads. Handle remaining operation types, and fix a slight bug. Atomic loads are required to have alignment. Since this was within the InstCombine fixed point, somewhere else in InstCombine was adding alignment before the verifier saw it, but still, we should fix.
Terminology wise, I'm using the "idempotent" naming that is used for the same operations in AtomicExpand and X86ISelLoweringInfo. Once this lands, I'll add similar tests for AtomicExpand, and move the pattern match function to a common location. In the review, there was seemingly consensus that "idempotent" was slightly incorrect for this context. Once we setle on a better name, I'll update all uses at once.
Jordan Rupprecht [Thu, 14 Feb 2019 18:35:13 +0000 (18:35 +0000)]
[llvm-ar] Implement the P modifier.
Summary:
GNU ar has a `P` modifier that changes filename comparisons to use full paths instead of the basename. As noted in the GNU docs, regular archives are not created with full path names, so P is used to deal with archives created by other archive programs (e.g. see the updated `absolute-paths.test` test case).
Since thin archives use full path names -- paths are relative to the archive -- it seems very error prone to not imply P when dealing with thin archives, so P is implied in those cases. (I think this is a deviation from GNU ar that makes sense).
This fixes PR37436 via https://github.com/ClangBuiltLinux/linux/issues/33.
Teresa Johnson [Thu, 14 Feb 2019 14:14:24 +0000 (14:14 +0000)]
Refine ArgPromotion metadata handling
Summary:
In r353537 we now copy all metadata to the new function, with the old
being removed when the old function is eliminated. In some cases the old
function is dropped to a declaration (seems to only occur with the old
PM). Go ahead and clear all metadata from the old function to handle that
case, since verification will complain otherwise. This is consistent
with what was being done for debug metadata before r353537.
Florian Hahn [Thu, 14 Feb 2019 13:59:39 +0000 (13:59 +0000)]
[LoopUnrollPeel] Add case where we should forget the peeled loop from SCEV.
The test case requires the peeled loop to be forgotten after peeling,
even though it does not have a parent. When called via the unroller,
SE->forgetTopmostLoop is also called, so the test case would also pass
without any SCEV invalidation, but peelLoop is exposed as utility
function. Also, in the test case, simplifyLoop will make changes,
removing the loop from SCEV, but it is better to not rely on this
behavior.
Sam McCall [Thu, 14 Feb 2019 12:57:01 +0000 (12:57 +0000)]
Reapply [VFS] Allow multiple RealFileSystem instances with independent CWDs.
This reverts commit r351091.
The original mac breakages are addressed by ensuring the root directory
we're working from is fully symlink-resolved before starting.
Petar Avramovic [Thu, 14 Feb 2019 11:39:53 +0000 (11:39 +0000)]
[MIPS GlobalISel] Select branch instructions
Select G_BR and G_BRCOND for MIPS32.
Unconditional branch G_BR does not have register operand,
for that reason we only add tests.
Since conditional branch G_BRCOND compares register to zero on MIPS32,
explicit extension must be performed on i1 condition in order to set
high bits to appropriate value.
Max Kazantsev [Thu, 14 Feb 2019 11:10:29 +0000 (11:10 +0000)]
Make widenable condition transparent for MemoryWriteTracking
Side effects of widenable condition intrinsic are modelled via
InaccessibleMemOnly, and there is no way to say that it isn't
really writing any memory. This patch teaches MemoryWriteTracking
ignore this intrinsic.
Jeremy Morse [Thu, 14 Feb 2019 11:09:24 +0000 (11:09 +0000)]
Fix an accidentally flipped pair of arguments, NFCI
While rebasing a refactor in r353950 I accidentally swapped two function
arguments; one is SelectionDAGBuilders "current" DebugLoc, the other is the one
from the "current" debug intrinsic. They're probably always identical, but I
haven't proved that yet.
David Green [Thu, 14 Feb 2019 11:09:24 +0000 (11:09 +0000)]
[ARM] Ensure we update the correct flags in the peephole optimiser
The Arm peephole optimiser code keeps track of both an MI and a SubAdd that can
be used to optimise away a CMP. In the rare case that both are found and not
ruled-out as valid, we could end up setting the flags on the wrong one.
Instead make sure we are using SubAdd if it exists, as it will be closer to the
CMP.
The testcase here is a little theoretical, with a dead def of cpsr. It should
hopefully show the point.
Andrew Ng [Thu, 14 Feb 2019 11:08:49 +0000 (11:08 +0000)]
[Support] Fix TempFile::discard to not leave behind temporary files
Moved the remove of the temporary file to after the close to avoid
remove failures caused by ETXTBSY errors.
This issue was seen when FileOutputBuffer falls back to an in memory
buffer due to the inability to mmap the on disk file. This occurred when
running LLD on an Ubuntu VM in VirtualBox on a Windows host attempting
to write the output to a VirtualBox shared folder.
Daniel Sanders [Thu, 14 Feb 2019 00:15:28 +0000 (00:15 +0000)]
[globalisel][combine] Split existing rules into a match and apply step
Summary:
The declarative tablegen definitions split rules into match and apply steps.
Prepare for that by doing the same in the C++ implementations. This aids
some of the migration effort while the tablegen version is incomplete.
Jordan Rupprecht [Wed, 13 Feb 2019 23:39:41 +0000 (23:39 +0000)]
[llvm-ar][libObject] Fix relative paths when nesting thin archives.
Summary:
When adding one thin archive to another, we currently chop off the relative path to the flattened members. For instance, when adding `foo/child.a` (which contains `x.txt`) to `parent.a`, when flattening it we should add it as `foo/x.txt` (which exists) instead of `x.txt` (which does not exist).
As a note, this also undoes the `IsNew` parameter of handling relative paths in r288280. The unit test there still passes.
This was reported as part of testing the kernel build with llvm-ar: https://patchwork.kernel.org/patch/10767545/ (see the second point).
Stefan Pintilie [Wed, 13 Feb 2019 23:37:23 +0000 (23:37 +0000)]
[PowerPC][NFC] Added tests for prologue and epilogue code gen.
Added four test files to check the existing behaviour of prologue
and epilogue code generation. This patch was done as a setup for
the upcoming patch listed on Phabricator that will change how the
prologue and epilogue work.
The upcoming patch is: https://reviews.llvm.org/D42590
Philip Reames [Wed, 13 Feb 2019 23:01:11 +0000 (23:01 +0000)]
[SelectionDAG] Inline a single use helper function, and remove last non-MMO interface [NFC]
For D57601, we need to know whether the instruction is volatile. We'd either have to pass yet another parameter, or just standardize on the MMO interface. I chose the second.
Mark Lacey [Wed, 13 Feb 2019 22:56:43 +0000 (22:56 +0000)]
[RegAllocGreedy] Take last chance recoloring into account in evicting.
Last chance recoloring inserts into FixedRegisters those virtual
registers it is attempting to assign a physical register to.
We must consider these when we consider candidates for eviction so that
we do not end up evicting something while we are attempting to recolor
to assign it.
This is hitting in an out-of-tree target and no longer reproduces on
trunk. That does not appear to be a result of it having been fixed, but
rather, it appears that optimization changes and/or other changes to
register allocation mask the problem.
I haven't found a way to come up with a reasonable test case for this
(i.e. one that I can actually commit to open source, is reasonable
in size, and actually reproduces the issue).
Leonard Chan [Wed, 13 Feb 2019 22:22:48 +0000 (22:22 +0000)]
[NewPM] Second attempt at porting ASan
This is the second attempt to port ASan to new PM after D52739. This takes the
initialization requried by ASan from the Module by moving it into a separate
class with it's own analysis that the new PM ASan can use.
Changes:
- Split AddressSanitizer into 2 passes: 1 for the instrumentation on the
function, and 1 for the pass itself which creates an instance of the first
during it's run. The same is done for AddressSanitizerModule.
- Add new PM AddressSanitizer and AddressSanitizerModule.
- Add legacy and new PM analyses for reading data needed to initialize ASan with.
- Removed DominatorTree dependency from ASan since it was unused.
- Move GlobalsMetadata and ShadowMapping out of anonymous namespace since the
new PM analysis holds these 2 classes and will need to expose them.
Serge Guelton [Wed, 13 Feb 2019 22:11:09 +0000 (22:11 +0000)]
Revert r353962
Specialization of Optional for trivially copyable types yields failure on the buildbots I fail to reproduce locally.
Better safe than sorry, reverting.
Philip Reames [Wed, 13 Feb 2019 20:42:59 +0000 (20:42 +0000)]
[SelectionDAG] Kill last uses of getAtomic w/o a MMO operand [NFC]
The helper function was used by only two callers, and largely ended up providing distinct functionality based on optional arguments and opcode. Inline and simply to make the functionality much more clear.
Vedant Kumar [Wed, 13 Feb 2019 19:53:38 +0000 (19:53 +0000)]
[CodeExtractor] Only lift lifetime markers present in the extraction region
When CodeExtractor finds liftime markers referencing inputs to the
extraction region, it lifts these markers out of the region and inserts
them around the call to the extracted function (see r350420, PR39671).
However, it should *only* lift lifetime markers that are actually
present in the extraction region. I.e., if a start marker is present in
the extraction region but a corresponding end marker isn't (or vice
versa), only the start marker (or end marker, resp.) should be lifted.
Philip Reames [Wed, 13 Feb 2019 19:49:17 +0000 (19:49 +0000)]
[Tests] More unordered atomic lowering tests
This time, focused around narrowing and widening transformations. Also, include a few simple memory optimization tests to highlight missed oppurtunities. This is part of building up the test base for D57601.
Craig Topper [Wed, 13 Feb 2019 18:21:36 +0000 (18:21 +0000)]
[X86] Add 'fxsr' to the getHostCPUFeatures detection code.
We implicitly mark this feature as enabled when the target is 64-bits, but our detection code for -march=native didn't support it so you can't detect it on 32-bit targets.
Serge Guelton [Wed, 13 Feb 2019 18:12:04 +0000 (18:12 +0000)]
Re-commit rL353927, patch included
Make llvm::Optional<T> trivially copyable when T is trivially copyable
This is an ever-recurring issue (see https://bugs.llvm.org/show_bug.cgi?id=39427 and https://bugs.llvm.org/show_bug.cgi?id=35978)
but I believe that thanks to https://reviews.llvm.org/D54472 we can now ship a decent implementation of this.
Basically the fact that llvm::is_trivially_copyable has a consistent behavior across compilers should prevent any ABI issue,
and using in-place new instead of memcpy should keep compiler bugs away.
This patch is slightly different from the original revision https://reviews.llvm.org/rL353927 but achieves the same goal. It just avoids
going through std::conditional which may the code more explicit.
Petr Hosek [Wed, 13 Feb 2019 17:28:47 +0000 (17:28 +0000)]
[AArch64] Support reserving arbitrary general purpose registers
This is a follow up to D48580 and D48581 which allows reserving
arbitrary general purpose registers with the exception of registers
with special purpose (X8, X16-X18, X29, X30) and registers used by LLVM
(X0, X19). This change also generalizes some of the existing logic to
rely entirely on values generated from tablegen.
Jeremy Morse [Wed, 13 Feb 2019 16:33:05 +0000 (16:33 +0000)]
[DebugInfo][DAG] Either salvage dangling debug info or emit Undef DBG_VALUEs
In this patch SelectionDAG tries to salvage any dbg.values that are going to be
dropped, in case they can be recovered from Values in the current BB. It also
strengthens SelectionDAGs handling of dangling debug data, so that dbg.values
are *always* emitted (as Undef or otherwise) instead of dangling forever.
The motivation behind this patch exists in the new test case: a memory address
(here a bitcast and GEP) exist in one basic block, and a dbg.value referring to
the address is left in the 'next' block. The base pointer is live across all
basic blocks. In current llvm trunk the dbg.value cannot be encoded, and it
isn't even emitted as an Undef DBG_VALUE.
The change is simply: if we're definitely going to drop a dbg.value, repeatedly
apply salvageDebugInfo to its operand until either we find something that can
be encoded, or we can't salvage any further in which case we produce an Undef
DBG_VALUE. To know when we're "definitely going to drop a dbg.value",
SelectionDAG signals SelectionDAGBuilder when all IR instructions have been
encoded to force salvaging. This ensures that any dbg.value that's dangling
after DAG creation will have a corresponding DBG_VALUE encoded.
Jeremy Morse [Wed, 13 Feb 2019 15:53:10 +0000 (15:53 +0000)]
[DebugInfo][DAG] Refactor dbg.value lowering into its own method
This is a pure copy-and-paste job, moving the logic for lowering dbg.value
intrinsics to SDDbgValues into its own function. This is ahead of adding some
more users of this logic.
Jeremy Morse [Wed, 13 Feb 2019 13:37:33 +0000 (13:37 +0000)]
[DebugInfo][DAG] Limit special-casing of dbg.values for Arguments
SelectionDAGBuilder has special handling for dbg.value intrinsics that are
understood to define the location of function parameters on entry to the
function. To enable this, we avoid recording a dbg.value as a virtual register
reference if it might be such a parameter, so that it later hits
EmitFuncArgumentDbgValue.
This patch reduces the set of circumstances where we avoid recording a
dbg.value as a virtual register reference, to allow more "normal" variables
to be recorded that way. We now only bypass for potential parameters if:
* The dbg.value operand is an Argument,
* The Variable is a parameter, and
* The Variable is not inlined.
meaning it's very likely that the dbg.value is a function-entry parameter
location.
Andrea Di Biagio [Wed, 13 Feb 2019 11:02:42 +0000 (11:02 +0000)]
[MCA][Scheduler] Use latency information to further classify busy instructions.
This patch introduces a new instruction stage named 'IS_PENDING'.
An instruction transitions from the IS_DISPATCHED to the IS_PENDING stage if
input registers are not available, but their latency is known.
This patch also adds a new set of instructions named 'PendingSet' to class
Scheduler. The idea is that the PendingSet will only contain instructions that
have reached the IS_PENDING stage.
By construction, an instruction in the PendingSet is only dependent on
instructions that have already reached the execution stage. The plan is to use
this knowledge to identify bottlenecks caused by data dependencies (see
PR37494).
Jeremy Morse [Wed, 13 Feb 2019 10:54:53 +0000 (10:54 +0000)]
[DebugInfo][InstCombine] Prefer to salvage debuginfo over sinking it
When instcombine sinks an instruction between two basic blocks, it sinks any
dbg.value users in the source block with it, to prevent debug use-before-free.
However we can do better by attempting to salvage the debug users, which would
avoid moving where the variable location changes. If we successfully salvage,
still sink a (cloned) dbg.value with the sunk instruction, as the sunk
instruction is more likely to be "live" later in the compilation process.
If we can't salvage dbg.value users of a sunk instruction, mark the dbg.values
in the original block as being undef. This terminates any earlier variable
location range, and represents the fact that we've optimized out the variable
location for a portion of the program.
David Stenberg [Wed, 13 Feb 2019 09:34:07 +0000 (09:34 +0000)]
[DebugInfo] Stop changing labels for register-described parameter DBG_VALUEs
Summary:
This is a follow-up to D57510. This patch stops DebugHandlerBase from
changing the starting label for the first non-overlapping,
register-described parameter DBG_VALUEs to the beginning of the
function. That code did not consider what defined the registers, which
could result in the ranges for the debug values starting before their
defining instructions. We currently do not emit debug values for
constant values directly at the start of the function, so this code is
still useful for such values, but my intention is to remove the code
from DebugHandlerBase completely when we get there. One reason for
removing it is that the code violates the history map's ranges, which I
think can make it quite confusing when troubleshooting.
In D57510, PrologEpilogInserter was amended so that parameter DBG_VALUEs
now are kept at the start of the entry block, even after emission of
prologue code. That was done to reduce the degradation of debug
completeness from this patch. PR40638 is another example, where the
lexical-scope trimming that LDV does, in combination with scheduling,
results in instructions after the prologue being left without locations.
There might be other cases where the DBG_VALUEs are pushed further down,
for which the DebugHandlerBase code may be helpful, but as it now quite
often result in incorrect locations, even after the prologue, it seems
better to remove that code, and try to work our way up with accurate
locations.
In the long run we should maybe not aim to provide accurate locations
inside the prologue. Some single location descriptions, at least those
referring to stack values, generate inaccurate values inside the
epilogue, so we maybe should not aim to achieve accuracy for location
lists. However, it seems that we now emit line number programs that can
result in GDB and LLDB stopping inside the prologue when doing line
number stepping into functions. See PR40188 for more information.
A summary of some of the changed test cases is available in PR40188#c2.
Serge Guelton [Wed, 13 Feb 2019 09:31:22 +0000 (09:31 +0000)]
Make llvm::Optional<T> trivially copyable when T is trivially copyable
This is an ever-recurring issue (see https://bugs.llvm.org/show_bug.cgi?id=39427 and https://bugs.llvm.org/show_bug.cgi?id=35978)
but I believe that thanks to https://reviews.llvm.org/D54472 we can now ship a decent implementation of this.
Basically the fact that llvm::is_trivially_copyable has a consistent behavior across compilers should prevent any ABI issue,
and using in-place new instead of memcpy should keep compiler bugs away.
Michal Gorny [Wed, 13 Feb 2019 08:34:40 +0000 (08:34 +0000)]
[llvm] [cmake] Provide split include paths in LLVMConfig
Modify LLVMConfig to provide split variables for in-source and generated
include paths. Currently, it uses a single value for both
LLVM_INCLUDE_DIRS and LLVM_INCLUDE_DIR which works for install tree but
fails hard at build tree (where LLVM_INCLUDE_DIR incorrectly contains
multiple values).
Instead, put the generated directory in LLVM_INCLUDE_DIR, and the source
tree in LLVM_MAIN_INCLUDE_DIR which is consistent with in-LLVM builds.
For install tree, both variables will have the same value.
Anton Afanasyev [Wed, 13 Feb 2019 08:26:43 +0000 (08:26 +0000)]
[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)
Try to use 64-bit SLP vectorization. In addition to horizontal instrs
this change triggers optimizations for partial vector operations (for instance,
using low halfs of 128-bit registers xmm0 and xmm1 to multiply <2 x float> by
<2 x float>).