Anna Thomas [Tue, 8 Aug 2017 18:07:44 +0000 (18:07 +0000)]
[LoopVectorize] Fix assertion failure in Fcmp vectorization
Summary:
When vectorizing fcmps we can trip an incorrect-cast assertion when setting the
FastMathFlags after generating the vectorized FCmp.
This can happen if the FCmp can be folded directly to true or false. The fix
here is to set the FastMathFlags on the builder *before* creating
the FCmp instruction. This is what's done by other optimizations such as
InstCombine.
Added a test case which trips on cast assertion without this patch.
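For illustration, here is a minimal sketch of the safe ordering (the function and variable names are assumed, not taken from the patch): set the flags on the builder first, so they are attached at creation time and nothing needs to be cast to an Instruction if the compare constant-folds.

  #include "llvm/IR/IRBuilder.h"
  using namespace llvm;

  Value *emitVectorFCmp(IRBuilder<> &Builder, FastMathFlags FMF,
                        Value *A, Value *B) {
    Builder.setFastMathFlags(FMF);   // flags applied when the compare is created
    return Builder.CreateFCmp(CmpInst::FCMP_OLT, A, B);
    // The fragile ordering would be:
    //   Value *C = Builder.CreateFCmp(CmpInst::FCMP_OLT, A, B);
    //   cast<Instruction>(C)->setFastMathFlags(FMF); // asserts if C folded to a constant
  }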
Tim Northover [Tue, 8 Aug 2017 17:16:46 +0000 (17:16 +0000)]
Revert "[ARM] Fix assembly and disassembly for VMRS/VMSR"
This reverts r310243. Only MVFR2 is actually restricted to v8 and it'll be a
little while before we can get a proper fix together. Better to accept
incorrect code than to reject correct code in the meantime.
[PowerPC] Don't crash on larger splats achieved through 1-byte splats
We've implemented a 1-byte splat using XXSPLTISB on P9. However, LLVM will
produce a 1-byte splat even for wider element BUILD_VECTOR nodes. This patch
prevents crashing in that situation.
Amjad Aboud [Tue, 8 Aug 2017 12:17:56 +0000 (12:17 +0000)]
[X86] Improved X86::CMOV to Branch heuristic.
Resolved PR33954.
This patch adds two more constraints that aim to reduce the noise cases where we convert a CMOV into a branch for a small gain and end up spending more cycles due to overhead.
Daniel Sanders [Tue, 8 Aug 2017 10:44:31 +0000 (10:44 +0000)]
[globalisel][tablegen] Add support for importing 'imm' operands.
Summary:
This patch enables the import of rules containing 'imm' operands that do not
constrain the acceptable values using predicates. Support for ImmLeaf will
arrive in a later patch.
[PM] Fix a likely more critical infloop bug in the CGSCC pass manager.
This was just a bad oversight on my part. The code in question should
never have worked without this fix. But it turns out that there are
relatively few places where libfunctions participate in
a single SCC, and unless they do, this happens not to matter.
The effect of not having this correct is that each time through this
routine, the edge from write_wrapper to write toggles between a call
edge and a ref edge. The first time through, it is a demoted call edge
and gets turned into a ref edge. The next time, it is a ref edge that gets
promoted back to a call edge. On and on it goes, forever.
I've added the asserts which should have always been here to catch silly
mistakes like this in the future as well as a test case that will
actually infloop without the fix.
I think the other (much scarier) infinite-inlining issue didn't actually
occur in practice; I simply misdiagnosed this minor issue as that
much scarier one. The other issue *is* still real, but I'm
somewhat relieved that so far it hasn't shown up in real-world code
yet...
[PM] Fix new LoopUnroll function pass by invalidating loop analysis
results when a loop is completely removed.
This is very hard to manifest as a visible bug. You need to arrange for
there to be a subsequent allocation of a 'Loop' object which gets the
exact same address as the one which the unroll deleted, and you need the
LoopAccessAnalysis results to be significant in the way that they're
stale. And you need a million other things to align.
But when it does, you get a deeply mysterious crash due to actually
finding a stale analysis result. This fixes the issue and tests for it
by directly checking we successfully invalidate things. I have not been
able to get *any* test case to reliably trigger this. Changes to LLVM
itself caused the only test case I ever had to cease to crash.
I've looked pretty extensively at less brittle ways of fixing this and
they are actually very, very hard to do. This is a somewhat strange and
unusual case as we have a pass which is deleting an IR unit, but is not
running within that IR unit's pass framework (which is what handles this
cleanly for the normal loop unroll). And where there isn't a definitive
way to clear *all* of the stale cache entries. And where the pass *is*
updating the core analysis that provides the IR units!
For example, we don't have any of these problems with Function analyses
because it is easy to clear out function analyses when the functions
themselves may have been deleted -- we clear an entire module's worth!
But that is too heavy of a hammer down here in the LoopAnalysisManager
layer.
A better long-term solution IMO is to require that AnalysisManagers
make their keys durable to this kind of thing. Specifically, when
caching an analysis for one IR unit that is conceptually "owned" by
a higher level IR unit, the AnalysisManager should incorporate this into
its data structures so that we can reliably clear these results without
having to teach each and every pass to do so manually as we do here. But
that is a change for another day as it will be a fairly invasive change
to the AnalysisManager infrastructure. Until then, this fortunately
seems to be quite rare.
Reid Kleckner [Mon, 7 Aug 2017 21:23:38 +0000 (21:23 +0000)]
[Object] Initialize LoadConfig member to null
Executables may not contain a load config, and clients should be able to
test for nullability. Previously we'd return uninitialized memory. Now
getLoadConfig32/64 return valid pointers or null.
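For example, a minimal sketch of the intended client-side check (the accessor and struct names follow the message; the surrounding code is assumed):

  #include "llvm/Object/COFF.h"
  using namespace llvm::object;

  void visitLoadConfig(const COFFObjectFile &Obj) {
    // A null result now reliably means "no load config directory present".
    if (const coff_load_configuration32 *LC = Obj.getLoadConfig32()) {
      // ... inspect *LC ...
      (void)LC;
    }
  }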
Dehao Chen [Mon, 7 Aug 2017 20:23:20 +0000 (20:23 +0000)]
Move the SampleProfileLoader right after EarlyFPM.
Summary: The SampleProfileLoader pass needs to happen after some early cleanup passes so that inlining can happen correctly inside the SampleProfileLoader pass.
Connor Abbott [Mon, 7 Aug 2017 19:10:56 +0000 (19:10 +0000)]
[AMDGPU] Add pseudo "old" source to all DPP instructions
Summary:
Instructions with the DPP modifier may leave certain lanes of the output
unwritten if bound_ctrl=1 is set or if any bits in bank_mask or row_mask
are clear, so the destination register may be both defined and modified.
The right way to handle this is to add a constraint that the destination
register is the same as one of the inputs. We could tie the destination
to the first source, but that would be too restrictive for some use-cases
where we want the destination to be some other value before the
instruction executes. Instead, add a fake "old" source and tie it to the
destination. Effectively, the "old" source defines what value unwritten
lanes will get. We'll expose this functionality to users with a new
intrinsic later.
Also, we want to use DPP instructions for computing derivatives, which
means we need to set WQM for them. We also need to enable the entire
wavefront when using DPP intrinsics to implement nonuniform subgroup
reductions, since otherwise we'll get incorrect results in some cases.
To accommodate this, add a new operand to all DPP instructions which will
be interpreted by the SI WQM pass. This will be exposed with a new
intrinsic later. We'll also add support for Whole Wavefront Mode later.
I also fixed llvm.amdgcn.mov.dpp to overwrite the source and fixed up
the test. However, I could also keep the old behavior (where lanes that
aren't written are undefined) if people want it.
Matt Arsenault [Mon, 7 Aug 2017 18:12:47 +0000 (18:12 +0000)]
AMDGPU: Remove FixControlFlowLiveIntervals pass
This pass hasn't done anything in a long time. It was
running after the control flow pseudos were expanded,
so it would never find them. The control flow pseudo
expansion was moved to solve the problem this pass was
supposed to solve in the first place, except handling
it earlier also fixes it for fast regalloc which doesn't
use LiveIntervals.
Craig Topper [Mon, 7 Aug 2017 18:10:39 +0000 (18:10 +0000)]
[InstCombine] Support (X | C1) & C2 --> (X & C2^(C1&C2)) | (C1&C2) for vector splats
Note that the original code I deleted incorrectly listed this as (X | C1) & C2 --> (X & C2^(C1&C2)) | C1, which is only valid if C1 is a subset of C2. It relied on SimplifyDemandedBits to remove any extra bits from C1 before we got to that code.
My new implementation avoids relying on that behavior so that it can be naively verified with Alive.
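As a quick standalone sanity check of the corrected form on concrete scalar values (illustrative only, not part of the patch), note that it holds even when C1 is not a subset of C2:

  #include <cassert>
  #include <cstdint>

  int main() {
    const uint32_t C1 = 0xF0F0, C2 = 0x0FF0;   // C1 is deliberately not a subset of C2
    for (uint32_t X : {0u, 0x1234u, 0xFFFFFFFFu}) {
      uint32_t LHS = (X | C1) & C2;
      uint32_t RHS = (X & (C2 ^ (C1 & C2))) | (C1 & C2);
      assert(LHS == RHS && "the rewritten form matches for arbitrary C1, C2");
    }
    return 0;
  }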
Alexey Bataev [Mon, 7 Aug 2017 15:25:49 +0000 (15:25 +0000)]
[SLP] General improvements of SLP vectorization process.
This patch tries to improve the two-pass vectorization analysis in the SLP vectorizer. What it does:
1. Defines key nodes, which are the vectorization roots. Previously, vectorization started only if a StoreInst or ReturnInst was found. Now, vectorization starts for all instructions with no users and void type (terminators, StoreInsts) plus CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in an
array. This array is processed only after vectorization of the
key node that first follows these instructions is finished. Vectorization goes
in reverse order to try to vectorize as much code as possible.
Matt Arsenault [Mon, 7 Aug 2017 14:58:04 +0000 (14:58 +0000)]
AMDGPU: Cleanup subtarget features
Try to avoid mutually exclusive features. Don't use
a real default GPU, and use a fake "generic". The goal
is to make it easier to see which sets of features are
incompatible between feature strings.
Most of the test changes are due to random scheduling changes
from not having a default fullspeed model.
Nirav Dave [Mon, 7 Aug 2017 14:07:49 +0000 (14:07 +0000)]
[DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector.
Relanding after adding a case to insert explicit truncation as necessary.
Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to
EXTRACT_SUBVECTOR of a vector shuffle when the output is smaller. Marginally
improves vector shuffle computations.
Alexey Bataev [Mon, 7 Aug 2017 14:03:17 +0000 (14:03 +0000)]
[SLP] General improvements of SLP vectorization process.
Summary:
This patch tries to improve the two-pass vectorization analysis in the SLP vectorizer. What it does:
1. Defines key nodes, which are the vectorization roots. Previously, vectorization started only if a StoreInst or ReturnInst was found. Now, vectorization starts for all instructions with no users and void type (terminators, StoreInsts) plus CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in an array. This array is processed only after vectorization of the key node that first follows these instructions is finished. Vectorization goes in reverse order to try to vectorize as much code as possible.
Nirav Dave [Mon, 7 Aug 2017 13:55:27 +0000 (13:55 +0000)]
[TableGen] AsmMatcher: fix OpIdx computation when HasOptionalOperands is true
Relanding after fixing a UB issue with DefaultsOffset.
Consider the following instruction: "inst.eq $dst, $src" where ".eq"
is an optional flag operand. The $src and $dst operands are
registers. If we parse the instruction "inst r0, r1", the flag is not
present and it will be marked in the "OptionalOperandsMask" variable.
After the matching is complete we call the "convertToMCInst" method.
The current implementation works only if the optional operands are at
the end of the array. The "Operands" array looks like [token:"inst",
reg:r0, reg:r1]. The first operand that must be added to the MCInst
is the destination, the r0 register. The "OpIdx" (in the Operands
array) for this register is 2. However, since the flag is not present
in the Operands, the actual index for r0 should be 1. The flag is not
present since we rely on the default value.
This patch removes the "NumDefaults" variable and replaces it with an
array (DefaultsOffset). This array contains an entry for each operand
(excluding the mnemonic). Each entry holds the
number of omitted optional operands that should be subtracted. For the
previous example, this array looks like this: [0, 1, 1]. When we need
to access the r0 register, we compute its index as 2 -
DefaultsOffset[1] = 1.
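For illustration, DefaultsOffset is just an exclusive prefix sum over the "this optional operand was omitted" flags. The helper below is hypothetical, not the TableGen-emitted code; for the example above it yields [0, 1, 1], matching the 2 - DefaultsOffset[1] = 1 computation for r0.

  #include <vector>

  std::vector<unsigned>
  computeDefaultsOffset(const std::vector<bool> &OperandOmitted) {
    // OperandOmitted[i] is true when operand i (mnemonic excluded) is an
    // optional operand the user did not write.
    std::vector<unsigned> Offsets(OperandOmitted.size(), 0);
    unsigned Omitted = 0;
    for (size_t I = 0; I < OperandOmitted.size(); ++I) {
      Offsets[I] = Omitted;          // omitted optional operands seen so far
      if (OperandOmitted[I])
        ++Omitted;
    }
    return Offsets;
  }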
[X86][LLVM] Expand lowerInterleavedStore() support in X86InterleavedAccess (VF16 stride 4).
This patch expands lowerInterleavedStore support to the 16x8i, stride-4 case.
LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=4, VF=16), and we plan to include more patterns in the future.
The goal of the patch is to optimize the sequence where, at the end of the computation,
ymm2, ymm0, ymm12 and ymm3 each hold 16 chars.
Guy Blank [Mon, 7 Aug 2017 05:51:14 +0000 (05:51 +0000)]
[SelectionDAG] reset NewNodesMustHaveLegalTypes flag between basic blocks
The NewNodesMustHaveLegalTypes flag is set to false at the beginning of CodeGenAndEmitDAG, and set to true after legalizing types.
But before calling CodeGenAndEmitDAG we build the DAG for the basic block.
So for the first basic block NewNodesMustHaveLegalTypes would be 'false' during the SDAG building, and for all other basic blocks it would be 'true'.
This patch sets the flag to false before building the SDAG for each basic block.
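A minimal sketch of the placement (surrounding code elided; CurDAG is assumed to be the SelectionDAG under construction for the current block):

  // Allow nodes with illegal types while the block's initial DAG is built;
  // type legalization flips the flag back to true for the rest of the pipeline.
  CurDAG->NewNodesMustHaveLegalTypes = false;
  // ... lower the basic block's IR into SelectionDAG nodes ...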
Sanjay Patel [Sun, 6 Aug 2017 16:27:07 +0000 (16:27 +0000)]
[x86] use more shift or LEA for select-of-constants
We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d
For this patch, I'm enhancing an existing x86 transform that uses fake multiplies
(they always become shl/lea) to avoid cmov or branching. The current code misses
cases where we have a negative constant and a positive constant, so this is just
trying to plug that hole.
The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start
with a select in IR, create a select DAG node, convert it into a sext, convert it
back into a select, and then lower it to sext machine code.
Some notes about the test diffs:
1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choosing -1 or 1 is the case that got me looking at this again. I
think we could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on
a different reg? That's a post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if
that's a regression, but I think those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so I don't think we actually care about these diffs.
5. sbb.ll - This shows a win for what I think is a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.
Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.
Meador Inge [Sun, 6 Aug 2017 12:02:17 +0000 (12:02 +0000)]
[AVR] Compute code model if one is not provided
The patch from r310028 fixed things to work with the new
`LLVMTargetMachine` constructor that came in on r309911.
However, the fix was partial since an object of type
`CodeModel::Model` must be passed to `LLVMTargetMachine`
(not one of `Optional<CodeModel::Model>`).
This patch fixes the problem in the same fashion that r309911
did for other machines: by checking if the passed optional
code model has a value and using `CodeModel::Small` if not.
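A minimal sketch of the defaulting logic (hypothetical helper name; the patch applies the same idea when constructing the AVR target machine):

  #include "llvm/ADT/Optional.h"
  #include "llvm/Support/CodeGen.h"
  using namespace llvm;

  static CodeModel::Model effectiveCodeModel(Optional<CodeModel::Model> CM) {
    // Fall back to the small code model when the frontend passed none.
    return CM.hasValue() ? *CM : CodeModel::Small;
  }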
Craig Topper [Sat, 5 Aug 2017 23:34:44 +0000 (23:34 +0000)]
[X86] Enable isel to use the PAUSE instruction even when SSE2 is disabled
Summary:
On older processors this instruction encoding is treated as a NOP.
MSVC doesn't disable intrinsics based on features the way clang/gcc does. Because the PAUSE instruction encoding doesn't crash older processors, some software out there uses these intrinsics without checking for SSE2.
This change also seems to be consistent with gcc behavior.
[ADT] Add a much simpler loop to DenseMap::clear when the types are
POD-like and we can just splat the empty key across memory.
Sadly we can't optimize the normal loop well enough because we can't
turn the conditional store into an unconditional store according to the
memory model.
This loop actually showed up in a profile of code that was calling clear
as a serious source of time. =[
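For illustration, a standalone sketch of the fast path's shape (not DenseMap's actual code; the names and bucket layout are made up):

  #include <type_traits>
  #include <utility>

  template <typename KeyT, typename ValueT>
  void fastClear(std::pair<KeyT, ValueT> *Buckets, unsigned NumBuckets,
                 const KeyT &EmptyKey) {
    static_assert(std::is_trivially_destructible<ValueT>::value,
                  "only valid when values need no destructor calls");
    // Unconditionally splat the empty key: no per-bucket "is this slot live?"
    // branch, so the loop is trivial for the compiler to vectorize.
    for (unsigned I = 0; I != NumBuckets; ++I)
      Buckets[I].first = EmptyKey;
  }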
Craig Topper [Sat, 5 Aug 2017 20:00:41 +0000 (20:00 +0000)]
[InstCombine] Support vector splats in foldSelectICmpAnd.
Unfortunately, it looks like there are some other missed optimizations in the generated code for some of these cases. I'll try to look at some of those next.
Florian Hahn [Sat, 5 Aug 2017 12:13:13 +0000 (12:13 +0000)]
[ARM] Add registers to debuginfo MIR test cases.
Summary:
MIRParserImpl::computeFunctionProperties uses MRI.getNumVirtRegs() to
set the NoVReg property. Adding a bunch of registers to the MIR test
cases keeps the NoVReg property from being set when importing the MIR. Otherwise,
NoVReg is set while the machine instructions still contain virtual registers
after instruction selection, causing expensive checks to fail.
[LCG] Completely remove the parent set and leaf tracking for RefSCCs.
After the previous series of patches, this is now trivial and deletes
a pretty astonishing amount of complexity. This has been a long time
coming, as the move toward a PO sequence of RefSCCs started eroding the
underlying use cases for this half of the data structure.
Among the biggest advantages here is that now there aren't two
independent data structures that need to stay in sync.
Some of my profiling has also indicated that updating the parent sets
was among the most expensive parts of the lazy call graph. Eliminating
it wholesale is likely to be a nice win in terms of compile time.
Last but not least, I had previously discussed with some folks keeping
it around for asserts and other correctness checking, but once the
fundamentals of the parent and child checking were implemented without
the parent sets, their value for correctness checking was tiny and nowhere
near worth the cost of the complexity required to keep everything
up-to-date.
[LCG] Re-implement the basic isParentOf, isAncestorOf, isChildOf, and
isDescendantOf methods on RefSCCs in terms of the forward edges rather
than the parent sets.
This is technically slower, but probably not interestingly slower, and
all of these routines were already so expensive that they're guarded
behind both !NDEBUG and EXPENSIVE_CHECKS.
This removes another non-critical usage of parent sets.
I've also added some comments to try and help clarify to any potential
users the costs of these routines. They're mostly useful for debugging,
asserts, or other queries.
[LCG] Add the concept of a "dead" node and use it to avoid a complex
walk over the parent set.
When removing a single function from the call graph, we previously would
walk the entire RefSCC's parent set and then walk every outgoing edge
just to find the ones to remove. In addition to this being quite high
complexity in theory, it is also the last fundamental use of the parent
sets.
With this change, when we remove a function we transform the node
containing it to be recognizably "dead" and then teach the edge
iterators to recognize edges to such nodes and skip them the same way
they skip null edges.
We can't move fully to using "dead" nodes -- when disconnecting two live
nodes we need to null out the edge. But the complexity this adds to the
edge sequence isn't too bad and the simplification of lazily handling
this seems like a significant win.
[LCG] Replace an implicit bool operator with a named function. (NFC)
The definition of 'false' here was already pretty vague and debatable,
and I'm about to add another potential 'false' that would actually make
much more sense in a bool operator. Especially given how rarely this is
used, a nicely named method seems better.
[LCG] When removing a dead function and clearing out the data
structures, actually null out the graph pointers as well. We won't ever
update these, and we certainly shouldn't be calling any methods on them,
so it seems good to defensively nuke them.
[LCG] Remove the use of the parent sets to compute connectivity when
merging RefSCCs.
The logic to directly use the reference edges is simpler and not
substantially slower (despite the comments to the contrary) because this
is not actually an especially hot part of LCG in practice.