Adrian Prantl [Tue, 18 Jun 2019 19:42:29 +0000 (19:42 +0000)]
Add debug location verification for !llvm.loop attachments.
This patch teaches the Verifier how to detect broken !llvm.loop
attachments as discussed in https://reviews.llvm.org/D60831. This
allows LLVM to warn and strip out the broken debug info before
attempting an LTO compilation with input generated by LLVM predating
https://reviews.llvm.org/rL361149.
Reid Kleckner [Tue, 18 Jun 2019 19:41:25 +0000 (19:41 +0000)]
[PDB] Ignore .debug$S subsections with high bit set
Some versions of the Visual C++ 2015 runtime have line tables with the
subsection kind of 0x800000F2. In cvinfo.h, 0x80000000 is documented to
be DEBUG_S_IGNORE. This appears to implement the intended behavior.
Craig Topper [Tue, 18 Jun 2019 19:04:03 +0000 (19:04 +0000)]
[X86] Remove unnecessary line that makes v4f32 FP_ROUND Legal. NFC
FP_ROUND defaults to Legal for all MVT types and nothing changes
the v4f32 entry way from this default. If we needed this line
we'd also need one for v8f32 with AVX512 which we don't have.
Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.
Michael Liao [Tue, 18 Jun 2019 17:58:49 +0000 (17:58 +0000)]
[SROA] Enhance SROA to handle `addrspacecast`ed allocas
Summary:
- After `addrspacecast` is allowed to be eliminated in SROA, the
adjusting of storage pointer (from `alloca) needs to handle the
potential different address spaces between the storage pointer (from
alloca) and the pointer being used.
Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.
Simon Atanasyan [Tue, 18 Jun 2019 16:59:57 +0000 (16:59 +0000)]
[mips] Add PTR_64 and GPR_64 predicates to some MIPS 64-bit instructions
Add `IsGP64bit` and `IsPTR64bit` to the list of `UnsupportedFeatures`
of the P5600 scheduling definitions. Also mark some MIPS 64-bit
instructions by PTR_64 and GPR_64 predicates. This reduces number
of "No schedule information for" and "lacks information for" errors
in case of marking this scheduler model as complete.
This patch is one of a series of patches. The goal is to make P5600
scheduler model complete and turn on the `CompleteModel` flag.
Simon Atanasyan [Tue, 18 Jun 2019 16:59:47 +0000 (16:59 +0000)]
[mips] Set the hasNoSchedulingInfo flag for the `MipsAsmPseudoInst`
Set the hasNoSchedulingInfo flag for the`MipsAsmPseudoInst`. These
pseudo-instructions are never used by codegen. This flag allows to
reduce number of "No schedule information for" and "lacks information
for" errors in case of marking a scheduler model as complete.
This patch is one of a series of patches. The goal is to make P5600
scheduler model complete and turn on the `CompleteModel` flag.
Adrian McCarthy [Tue, 18 Jun 2019 16:36:57 +0000 (16:36 +0000)]
Fix some lit test ResourceWarnings on Windows
When running LLDB lit tests on Windows, the system selects a debug version
of Python, which was issuing lots of ResourceWarnings about files that
weren't closed. There are two kinds of them, and each test triggered one
of each.
This patch fixes one kind by ensuring TestRunner explicitly close the
temporary files created for routing stderr. This is important on Windows
but has no net effect on Posix systems.
The remaining ResourceWarnings are more elusive; the bug may lie in
the Python library subprocess.py, and it may be Windows-specific.
Simon Tatham [Tue, 18 Jun 2019 16:19:59 +0000 (16:19 +0000)]
[ARM] Add MVE vector shift instructions.
This includes saturating and non-saturating shifts, both with
immediate shift count and with the shift counts given by another
vector register; VSHLC (in which the bits shifted out of each active
vector lane are shifted in to the next active lane); and also VMOVL,
which is enough like an immediate shift that it didn't fit too badly
in this category.
Summary:
These form a small family of their own, to go with the floating-point
VMINNM/VMAXNM instructions added in a previous commit.
They introduce the first of many special cases in the mnemonic
recognition code, because VMIN with the E suffix used by the VPT
predication system needs to avoid being interpreted as the nonexistent
instruction 'VMI' with an ordinary 'NE' condition suffix.
Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here.
Simon Tatham [Tue, 18 Jun 2019 15:05:42 +0000 (15:05 +0000)]
[ARM] Rename MVE instructions in Tablegen for consistency.
Summary:
Their names began with a mishmash of `MVE_`, `t2` and no prefix at
all. Now they all start with `MVE_`, which seems like a reasonable
choice on the grounds that (a) NEON is the thing they're most at risk
of being confused with, and (b) MVE implies Thumb-2, so a prefix
indicating MVE is strictly more specific than one indicating Thumb-2.
Fangrui Song [Tue, 18 Jun 2019 14:01:03 +0000 (14:01 +0000)]
[llvm-readobj] Allow --hex-dump/--string-dump to dump multiple sections
1) `-x foo` currently dumps one `foo`. This change makes it dump all `foo`.
2) `-x foo -x foo` currently dumps `foo` twice. This change makes it dump `foo` once.
In addition, if foo has section index 9, `-x foo -x 9` dumps `foo` once.
3) Give a warning instead of an error if `foo` does not exist.
The new behaviors match GNU readelf.
Also, print a new line as a separator between two section dumps.
GNU readelf uses two lines, but one seems good enough.
Andrea Di Biagio [Tue, 18 Jun 2019 12:59:46 +0000 (12:59 +0000)]
[MCA] Slightly refactor the bottleneck analysis view. NFCI
This patch slightly refactors data structures internally used by the bottleneck
analysis to track data and resource dependencies.
This patch also updates methods used to print out information about dependency
edges when in debug mode.
This is the last of a sequence of commits done in preparation for an upcoming
patch that fixes PR37494. No functional change intended.
Matt Arsenault [Tue, 18 Jun 2019 12:48:36 +0000 (12:48 +0000)]
AMDGPU: Change API for checking for exec modification
Invert the name and return value to better reflect the imprecise
nature.
Force passing in the DefMI, since it's known in the 2 users and could
possibly fail for an arbitrary vreg.
Allow specifying a specific user instruction. Scan through use
instructions, instead of use operands. Add scan thresholds instead of
searching infinitely.
Stop using a set to track seen uses. I didn't understand this usage,
or why it would not check the last use. I don't think the use list has
any particular order.
Graham Hunter [Tue, 18 Jun 2019 10:11:56 +0000 (10:11 +0000)]
[SVE][IR] Scalable Vector IR Type with pr42210 fix
Recommit of D32530 with a few small changes:
- Stopped recursively walking through aggregates in
the verifier, so that we don't impose too much
overhead on large modules under LTO (see PR42210).
- Changed tests to match; the errors are slightly
different since they only report the array or
struct that actually contains a scalable vector,
rather than all aggregates which contain one in
a nested member.
- Corrected an older comment
Diogo N. Sampaio [Tue, 18 Jun 2019 10:04:36 +0000 (10:04 +0000)]
[NFC] Improve triple match of scripts that update tests
Summary:
The prior behavior of the triple matcher would stop
in the first matched triple. It was not possible to
create specific matches for sub-sets of a triple
(e.g aarch64-apple-darwin would never be used after
aarch64 was matched).
This patch:
1) Allows that specialized triples take priority,
considering that the string lenght of the triple
indentifies how specialized a triple is. If two
triples of same lenght match, the one matched first
prevails, preserving the old behavior.
2) Remove 20 duplicated triples of arm, thumb,
aarch64 options with same arguments, matching
the common prefix (aarch64, arm, thumb) of them.
3) Creates three new function matching regexes and
five triple options for arm64-apple-ios,
(arm|thumb)-apple-ios and thumb(v5)?-macho
Simon Pilgrim [Tue, 18 Jun 2019 09:50:13 +0000 (09:50 +0000)]
[X86] Replace any_extend* vector extensions with zero_extend* equivalents
First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal.
In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep......
Jeremy Morse [Tue, 18 Jun 2019 08:52:38 +0000 (08:52 +0000)]
[DebugInfo][Docs] Document that prologue/epilogue variable location changes are ignored
This patch documents that LLVM does not describe all changes in variable
locations during the prologue and the epilogue. The debugger doesn't /
shouldn't step through that portion of the function anyway, and describing
every location through such stages would bloat location lists.
Perform some minor cleanup at the same time,
* Fix an enumerated list
* Document that dbg.declare intrinsics have their variable location recorded
in a MachineFunction table, not with DBG_VALUE meta-insts
* Adds frame-indexes to the list of things that can be operands to
DBG_VALUEs.
Craig Topper [Tue, 18 Jun 2019 03:23:11 +0000 (03:23 +0000)]
[X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class.
Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt.
Use the new versions in patterns that previously used a COPY_TO_REGCLASS
to VR128. These patterns expect the upper bits to be zero. The
current set up appears to work, but I'm not sure we should be
enforcing upper bits being zero through a COPY_TO_REGCLASS.
I wanted to flip the arrangement and use a COPY_TO_REGCLASS to
FR32/FR64 for the patterns that need an f32/f64 result, but that
complicated fastisel and globalisel.
I've been doing some experiments with reducing some isel patterns
and ended up in a situation where I had a
(SUBREG_TO_REG (COPY_TO_RECLASS (VMOVSSrm), VR128)) and our
post-isel peephole was unable to avoid using an instruction for
the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128
instruction removes the COPY_TO_REGCLASS that was breaking this.
Tom Stellard [Tue, 18 Jun 2019 02:05:06 +0000 (02:05 +0000)]
GlobalISel: Remove redundant pass initialization
Summary:
All the GlobalISel passes are initialized when the target calls
initializeGlobalISel(), so we don't need to call the initializers
from the pass constructors.
Alex Brachet [Tue, 18 Jun 2019 00:39:10 +0000 (00:39 +0000)]
[llvm-strip] Error when using stdin twice
Summary: Implements bug [[ https://bugs.llvm.org/show_bug.cgi?id=42204 | 42204 ]]. llvm-strip now warns when the same input file is used more than once, and errors when stdin is used more than once.
hwasan: Use bits [3..11) of the ring buffer entry address as the base stack tag.
This saves roughly 32 bytes of instructions per function with stack objects
and causes us to preserve enough information that we can recover the original
tags of all stack variables.
Now that stack tags are deterministic, we no longer need to pass
-hwasan-generate-tags-with-calls during check-hwasan. This also means that
the new stack tag generation mechanism is exercised by check-hwasan.
hwasan: Add a tag_offset DWARF attribute to instrumented stack variables.
The goal is to improve hwasan's error reporting for stack use-after-return by
recording enough information to allow the specific variable that was accessed
to be identified based on the pointer's tag. Currently we record the PC and
lower bits of SP for each stack frame we create (which will eventually be
enough to derive the base tag used by the stack frame) but that's not enough
to determine the specific tag for each variable, which is the stack frame's
base tag XOR a value (the "tag offset") that is unique for each variable in
a function.
In IR, the tag offset is most naturally represented as part of a location
expression on the llvm.dbg.declare instruction. However, the presence of the
tag offset in the variable's actual location expression is likely to confuse
debuggers which won't know about tag offsets, and moreover the tag offset
is not required for a debugger to determine the location of the variable on
the stack, so at the DWARF level it is represented as an attribute so that
it will be ignored by debuggers that don't know about it.
Amara Emerson [Mon, 17 Jun 2019 23:20:29 +0000 (23:20 +0000)]
[GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra block.
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.
The second phase is a new intra-block localization phase which attempts to
re-sink the already localized instructions further right before one of the
multiple uses.
One additional change is to also localize G_GLOBAL_VALUE as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced attempt to prevent code size regressions when localizing these.
Overall, these changes improve CTMark code size on arm64 by 1.2%.
This is part of the approved D63204 pending parent revision.
This small change is in fact a part of the VOP2b legalization which
does not technically belong to wave32 support, so extracted
separately.
Philip Reames [Mon, 17 Jun 2019 21:06:17 +0000 (21:06 +0000)]
Teach getSCEVAtScope how to handle loop phis w/invariant operands in loops w/taken backedges
This patch really contains two pieces:
Teach SCEV how to fold a phi in the header of a loop to the value on the backedge when a) the backedge is known to execute at least once, and b) the value is safe to use globally within the scope dominated by the original phi.
Teach IndVarSimplify's rewriteLoopExitValues to allow loop invariant expressions which already exist (and thus don't need new computation inserted) even in loops where we can't optimize away other uses.
Daniel Sanders [Mon, 17 Jun 2019 20:56:31 +0000 (20:56 +0000)]
[globalisel] Fix iterator invalidation in the extload combines
Summary:
Change the way we deal with iterator invalidation in the extload combines as it
was still possible to neglect to visit a use. Even worse, it happened in the
in-tree test cases and the checks weren't good enough to detect it.
We now take a cheap copy of the use list before iterating over it. This
prevents iterator invalidation from occurring and has the nice side effect
of making the existing schedule-for-erase/schedule-for-insert mechanism
moot.
Philip Reames [Mon, 17 Jun 2019 20:32:22 +0000 (20:32 +0000)]
Fix a bug w/inbounds invalidation in LFTR (recommit)
Recommit r363289 with a bug fix for crash identified in pr42279. Issue was that a loop exit test does not have to be an icmp, leading to a null dereference crash when new logic was exercised for that case. Test case previously committed in r363601.
Original commit comment follows:
This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV.
The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program.
As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well.
(Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.)
Nicolai Haehnle [Mon, 17 Jun 2019 19:28:43 +0000 (19:28 +0000)]
AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printer
Summary:
The purpose of the padding is to guard against stale code being
fetched into the instruction cache by the lowest level prefetching.
We're generating relocatable ELF here, and so the padding should
arguably be added by the linker. This is in fact what Mesa does.
Nicolai Haehnle [Mon, 17 Jun 2019 19:25:57 +0000 (19:25 +0000)]
AMDGPU: Explicitly define a triple for some tests
Summary:
This is related to the changes to the groupstaticsize intrinsic in
D61494 which would otherwise make the related tests in these files
fail or much less useful.
Note that for some reason, SOPK generation is less effective in the
amdhsa OS, which is why I chose PAL. I haven't investigated this
deeper.
Joseph Tremoulet [Mon, 17 Jun 2019 19:11:28 +0000 (19:11 +0000)]
[EarlyCSE] Fix hashing of self-compares
Summary:
Update compare normalization in SimpleValue hashing to break ties (when
the same value is being compared to itself) by switching to the swapped
predicate if it has a lower numerical value. This brings the hashing in
line with isEqual, which already recognizes the self-compares with
swapped predicates as equal.
Alina Sbirlea [Mon, 17 Jun 2019 18:58:40 +0000 (18:58 +0000)]
[MemorySSA] Don't use template when the clone is a simplified instruction.
Summary:
LoopRotate doesn't create a faithful clone of an instruction, it may
simplify it beforehand. Hence the clone of an instruction that has a
MemoryDef associated may not be a definition, but a use or not a memory
alternig instruction.
Don't rely on the template when the clone may be simplified.
If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64).
Alina Sbirlea [Mon, 17 Jun 2019 18:16:53 +0000 (18:16 +0000)]
[MemorySSA] Add all MemoryPhis before filling their values.
Summary:
Add all MemoryPhis in IDF before filling in their incomign values.
Otherwise, a new Phi can be added that needs to become the incoming
value of another Phi.
Test fails the verification in verifyPrevDefInPhis.
[AMDGPU] Pass to propagate ABI attributes from kernels to the functions
The pass works in two modes:
Mode 1: Just set attributes starting from kernels. This can work at
the very beginning of opt and llc pipeline, but cannot clone functions
because it must be a function pass.
Mode 2: Actually clone functions for new attributes. This can only work
after all function passes in the opt pipeline because it has to be a
module pass.
Simon Pilgrim [Mon, 17 Jun 2019 17:22:38 +0000 (17:22 +0000)]
[X86][AVX] Split under-aligned vector nt-stores.
If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it.
Warren Ristow [Mon, 17 Jun 2019 17:20:08 +0000 (17:20 +0000)]
[LV] Suppress vectorization in some nontemporal cases
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the aligment of the original scaler
memory op is not supported with the nontemporal hint on the target.
This adds two new functions:
bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;
to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.
Matt Arsenault [Mon, 17 Jun 2019 17:01:35 +0000 (17:01 +0000)]
GlobalISel: Ignore callsite attributes when picking intrinsic type
A target intrinsic may be defined as possibly reading memory, but the
call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic assumption
of the intrinsic definition, so the chain should still be used.
Matt Arsenault [Mon, 17 Jun 2019 17:01:32 +0000 (17:01 +0000)]
GlobalISel: Verify intrinsics
I keep using the wrong instruction when manually writing tests. This
really needs to check the number of operands, but I don't see an easy
way to do that right now.
Whitney Tsang [Mon, 17 Jun 2019 14:38:56 +0000 (14:38 +0000)]
PHINode: introduce setIncomingValueForBlock() function, and use it.
Summary:
There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue()
but no function to replace incoming value for a specified BasicBlock*
predecessor.
Clearly, there are a lot of places that could use that functionality.
For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for a xmm we'll just end up using the regular unaligned vector loads anyway.
First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets.
Matt Arsenault [Mon, 17 Jun 2019 14:13:29 +0000 (14:13 +0000)]
InferAddressSpaces: Fix cloning original addrspacecast
If an addrspacecast needed to be inserted again, this was creating a
clone of the original cast for each user. Just use the original, which
also saves losing the value name.