granicus.if.org Git

Reduced test case for pr42279 in advance of the relevant re-commit + fix

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363601 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Explicitly define a triple for some tests

Summary:
This is related to the changes to the groupstaticsize intrinsic in
D61494 which would otherwise make the related tests in these files
fail or much less useful.

Note that for some reason, SOPK generation is less effective in the
amdhsa OS, which is why I chose PAL. I haven't investigated this
deeper.

Change-Id: I6bb99569338f7a433c28b4c9eb1e3e036b00d166

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63392

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363600 91177308-0d34-0410-b5e6-96231b3b80d8

[EarlyCSE] Fix hashing of self-compares

Summary:
Update compare normalization in SimpleValue hashing to break ties (when
the same value is being compared to itself) by switching to the swapped
predicate if it has a lower numerical value. This brings the hashing in
line with isEqual, which already recognizes the self-compares with
swapped predicates as equal.

Fixes PR 42280.

Reviewers: spatel, efriedma, nikic, fhahn, uabelho

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63349

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363598 91177308-0d34-0410-b5e6-96231b3b80d8

[MemorySSA] Don't use template when the clone is a simplified instruction.

Summary:
LoopRotate doesn't create a faithful clone of an instruction, it may
simplify it beforehand. Hence the clone of an instruction that has a
MemoryDef associated may not be a definition, but a use or not a memory
alternig instruction.
Don't rely on the template when the clone may be simplified.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63355

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363597 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel][AArch64] Fold G_SUB into G_ICMP when it's safe to do so

Basically porting over the behaviour in AArch64ISelLowering to GISel. See
emitComparison for reference.

When we have something like this:

```
  lhs = G_SUB 0, y
  ...
  G_ICMP lhs, rhs
```

We can fold away the G_SUB and produce a cmn instead, given that we produce
the same value in NZCV.

Add a test showing that the transformation works, and also showing that we
don't perform the transformation when it's unsafe.

Also factor out the CSet emission into emitCSetForICMP.

Differential Revision: https://reviews.llvm.org/D63163

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363596 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add TB_NO_REVERSE to some memory folding table entries where the register form requires 64-bit mode, but the memory form does not.

We don't know if its safe to unfold if we're in 32-bit mode.

This is simlar to what was done to some load opcodes in r363523.

I think its pretty unlikely we will try to unfold these anyway so
I don't think this is testable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363595 91177308-0d34-0410-b5e6-96231b3b80d8

LiveInterval.h: add LiveRange::findIndexesLiveAt function - return a list of SlotIndexes the LiveRange live at.

Differential revision: https://reviews.llvm.org/D62411

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363593 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026)

If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363592 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Make getreg intrinsic inaccessiblememonly

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363591 91177308-0d34-0410-b5e6-96231b3b80d8

[MemorySSA] Add all MemoryPhis before filling their values.

Summary:
Add all MemoryPhis in IDF before filling in their incomign values.
Otherwise, a new Phi can be added that needs to become the incoming
value of another Phi.
Test fails the verification in verifyPrevDefInPhis.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63353

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363590 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] gfx1010 wavefrontsize intrinsic folding

Differential Revision: https://reviews.llvm.org/D63206

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363588 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fold readlane/readfirstlane calls

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363587 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Pass to propagate ABI attributes from kernels to the functions

The pass works in two modes:

Mode 1: Just set attributes starting from kernels. This can work at
the very beginning of opt and llc pipeline, but cannot clone functions
because it must be a function pass.

Mode 2: Actually clone functions for new attributes. This can only work
after all function passes in the opt pipeline because it has to be a
module pass.

Differential Revision: https://reviews.llvm.org/D63208

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363586 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r363541

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363583 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] Split under-aligned vector nt-stores.

If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363582 91177308-0d34-0410-b5e6-96231b3b80d8

[LV] Suppress vectorization in some nontemporal cases

When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the aligment of the original scaler
memory op is not supported with the nontemporal hint on the target.

This adds two new functions:
bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;

to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.

This fixes https://llvm.org/PR40759

Differential Revision: https://reviews.llvm.org/D61764

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363581 91177308-0d34-0410-b5e6-96231b3b80d8

GlobalISel: Ignore callsite attributes when picking intrinsic type

A target intrinsic may be defined as possibly reading memory, but the
call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic assumption
of the intrinsic definition, so the chain should still be used.

I fixed the same bug in SelectionDAG in r287593.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363580 91177308-0d34-0410-b5e6-96231b3b80d8

GlobalISel: Verify intrinsics

I keep using the wrong instruction when manually writing tests. This
really needs to check the number of operands, but I don't see an easy
way to do that right now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363579 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GlobalISel: Account for multiple defs when finding intrinsic ID

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363578 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] gfx1010 wave32 metadata

Differential Revision: https://reviews.llvm.org/D63207

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363577 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECT

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60640

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363576 91177308-0d34-0410-b5e6-96231b3b80d8

[Remarks] Extend -fsave-optimization-record to specify the format

Use -fsave-optimization-record=<format> to specify a different format
than the default, which is YAML.

For now, only YAML is supported.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363573 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] combineLoad - begun making the load split code more generic. NFCI.

This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment.

This is necessary for an upcoming patch to split under-aligned non-temporal vector loads.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363570 91177308-0d34-0410-b5e6-96231b3b80d8

PHINode: introduce setIncomingValueForBlock() function, and use it.

Summary:
There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue()
but no function to replace incoming value for a specified BasicBlock*
predecessor.
Clearly, there are a lot of places that could use that functionality.

Reviewer: craig.topper, lebedev.ri, Meinersbur, kbarton, fhahn
Reviewed By: Meinersbur, fhahn
Subscribers: fhahn, hiraditya, zzheng, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63338

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363566 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Add tests for underaligned nt loads

Test both 'unaligned' (which we should just use regular unaligned loads) and 'subvector aligned' (which we should split)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363565 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Prevent misaligned non-temporal vector load/store combines

For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for a xmm we'll just end up using the regular unaligned vector loads anyway.

First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets.

Differential Revision: https://reviews.llvm.org/D63246

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363564 91177308-0d34-0410-b5e6-96231b3b80d8

InferAddressSpaces: Fix cloning original addrspacecast

If an addrspacecast needed to be inserted again, this was creating a
clone of the original cast for each user. Just use the original, which
also saves losing the value name.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363562 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Ignore subtarget for InferAddressSpaces

Even if the target doesn't have flat instructions, addrspace(0) is
still flat. It just happens to not work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363561 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Mark exp/exp.compr as inaccessiblememonly

Should also be marked writeonly, but I think that would require
splitting the version with done set to a separate intrinsic

Test change is only from renumbering the attribute group numbers,
which for some reason the generated check lines consider.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363560 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GlobalISel: Fix default mapping for non-register operands

Tests will be in future commits when new intrinsics are handled here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363559 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Cleanup custom PseudoSourceValue definitions

Use separate enums for each kind, avoid repeating overloads, and add
missing classof implementation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363558 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeGen] Check for HardwareLoop Latch ExitBlock

The HardwareLoops pass finds exit blocks with a scevable exit count.
If the target specifies to update the loop counter in a register,
through a phi, we need to ensure that the exit block is a latch so
that we can insert the phi with the correct value for the incoming
edge.

Differential Revision: https://reviews.llvm.org/D63336

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363556 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Avoid unnecessary stack codegen in NT store codegen tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363552 91177308-0d34-0410-b5e6-96231b3b80d8

AsmPrinter: add doc-string for EmitLinkage

Change-Id: I376fcbd58f84a2aac6aaf744bc1665c92d312b25

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363550 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r363530

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363549 91177308-0d34-0410-b5e6-96231b3b80d8

[LV] Deny irregular types in interleavedAccessCanBeWidened

Summary:
Avoid that loop vectorizer creates loads/stores of vectors
with "irregular" types when interleaving. An example of
an irregular type is x86_fp80 that is 80 bits, but that
may have an allocation size that is 96 bits. So an array
of x86_fp80 is not bitcast compatible with a vector
of the same type.

Not sure if interleavedAccessCanBeWidened is the best
place for this check, but it solves the problem seen
in the added test case. And it is the same kind of check
that already exists in memoryInstructionCanBeWidened.

Reviewers: fhahn, Ayal, craig.topper

Reviewed By: fhahn

Subscribers: hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63386

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363547 91177308-0d34-0410-b5e6-96231b3b80d8

Test forward references in IntrinsicEmitter on Neon LD(2|3|4)

This patch tests the forward-referencing added in D62995 by changing
some existing intrinsics to use forward referencing of overloadable
parameters, rather than backward referencing.

This patch changes the TableGen definition/implementation of
llvm.aarch64.neon.ld2lane and llvm.aarch64.neon.ld2lane intrinsics
(and similar for ld3 and ld4). This change is intended to be
non-functional, since the behaviour of the intrinsics is
expected to be the same.

Reviewers: arsenm, dmgreen, RKSimon, greened, rnk

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D63189

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363546 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting

Some GEPs were not being split, presumably because that split would just be
undone by the DAGCombiner. Not performing those splits can prevent important
optimizations, such as preventing the element indices / member offsets from
being (partially) folded into load/store instruction immediates. This patch:

- Makes the splits also occur in the cases where the base address and the GEP
are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.

Differential Revision: https://reviews.llvm.org/D60294

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363544 91177308-0d34-0410-b5e6-96231b3b80d8

Fix clang -Wcovered-switch-default after stack-id change by D60137

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363543 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAG] Fold insert_subvector(undef, extract_subvector(v, c), c) -> v in getNode

This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363542 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Use NoWrapFlags when expanding a simple mul

Second functional change following on from rL362687. Pass the
NoWrapFlags from the MulExpr to InsertBinop when we're generating a
shl or mul.

Differential Revision: https://reviews.llvm.org/D61934

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363540 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objdump] Use %08 instead of %016 to print leading addresses for 32-bit binaries

Reviewed By: grimar

Differential Revision: https://reviews.llvm.org/D63398

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363539 91177308-0d34-0410-b5e6-96231b3b80d8

[lit] Delete empty lines at the end of lit.local.cfg NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363538 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][Codegen] Standalone tests for icmp eq/ne (urem %x, C), 0 -> icmp eq/ne %x, 0 fold (D63390)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363537 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Fix another -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363535 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363534 91177308-0d34-0410-b5e6-96231b3b80d8

Describe stack-id as an enum

This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.

This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.

Reviewers: arsenm, thegameg, qcolombet

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D60137

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363533 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Remove ARMComputeBlockSize

Forgot to remove file!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363532 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Add ARMBasicBlockInfo.cpp

Forgot to add file!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363531 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Extract some code from ARMConstantIslandPass

Create the ARMBasicBlockUtils class for tracking and querying basic
blocks sizes so we can use them when generating low-overhead loops.

Differential Revision: https://reviews.llvm.org/D63265

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363530 91177308-0d34-0410-b5e6-96231b3b80d8

Re-commit r357452 (take 3): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)"

Third time's the charm.

This was reverted in r363220 due to being suspected of an internal benchmark
regression and a test failure, none of which turned out to be caused by this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363529 91177308-0d34-0410-b5e6-96231b3b80d8

[SimplifyCFG] Fix prof branch_weights MD while removing unreachable switch cases

SimplifyCFG has a bug that results in inconsistent prof branch_weights metadata
if unreachable switch cases are removed. This patch fixes this bug by making use
of the newly introduced SwitchInstProfUpdateWrapper class (see patch D62122).
A new test is created.

Differential Revision: https://reviews.llvm.org/D62186

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363527 91177308-0d34-0410-b5e6-96231b3b80d8

PowerPC: Optimize SPE double parameter calling setup

Summary:
SPE passes doubles the same as soft-float, in register pairs as i32
types.  This is all handled by the target-independent layer.  However,
this is not optimal when splitting or reforming the doubles, as it
pushes to the stack and loads from, on either side.

For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:

    evstdd      5, X(1)
    lwz         3, X(1)
    lwz         4, X+4(1)

Likewise, to form a double into r5 from args in r3 and r4:

    stw         3, X(1)
    stw         4, X+4(1)
    evldd       5, X(1)

This optimizes the fence to use SPE instructions.  Now, to pass a double
to a function:

    mr          4, 5
    evmergehi   3, 5, 5

And to form a double into r5 from args in r3 and r4:

    evmergelo   5, 3, 4

This is comparable to the way that gcc generates the double splits.

This also fixes a bug with expanding builtins to libcalls, where the
LowerCallTo() code path was generating intermediate illegal type nodes.

Reviewers: nemanjai, hfinkel, joerg

Subscribers: kbarton, jfb, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D54583

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363526 91177308-0d34-0410-b5e6-96231b3b80d8

[yaml2obj][MachO] Don't fill dummy data for virtual sections

Summary:
Currently, MachOWriter::writeSectionData writes dummy data (0xdeadbeef) to fill section data areas in the file even if the section is a virtual one. Since virtual sections don't occupy any space in the file, writing dummy data could results the "OS.tell() - fileStart <= Sec.offset" assertion failure.

This patch fixes the bug by simply not writing any dummy data for virtual sections.

Reviewers: beanz, jhenderson, rupprecht, alexshap

Reviewed By: alexshap

Subscribers: compnerd, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62991

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363525 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] Add elf32-sparc and elf32-sparcel target

Summary:
The "sparc"/"sparcel" architectures appears in ArchMap (used by -B option) but not in OutputFormatMap (used by -I/-O option). Add their targets into OutputFormatMap for consistency.

Note that AFAIK there're no targets for 32-bit little-endian SPARC ("elf32-sparcel") in GNU binutils.

Reviewers: espindola, alexshap, rupprecht, jhenderson, compnerd, jakehehrlich

Reviewed By: jhenderson, compnerd, jakehehrlich

Subscribers: jyknight, emaste, arichardson, fedor.sergeev, jakehehrlich, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63238

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363524 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add TB_NO_REVERSE to some folding table entries where the register from uses the REX prefix, but the memory form does not.

It would not be safe to unfold the memory form the register form
without checking that we are compiling for 64-bit mode.

This probaby isn't a real functional issue since we are unlikely
to unfold any of these instructions since they don't have any
tied registers, aren't commutable, and don't have any inputs
other than the address.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363523 91177308-0d34-0410-b5e6-96231b3b80d8

[InstSimplify] Fix addo/subo undef folds (PR42209)

Fix folds of addo and subo with an undef operand to be:

`@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`,
as per LLVM undef rules.
Same for commuted variants.

Based on the original version of the patch by @nikic.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42209 | PR42209 ]]

Differential Revision: https://reviews.llvm.org/D63065

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363522 91177308-0d34-0410-b5e6-96231b3b80d8

[AsmPrinter] Make EmitLinkage and EmitVisibility public

Summary:
This allows target to implement custom emit of global variables if
required. See subsequent patch for a use case.

Change-Id: I9654197e3df24503104a54c41fff06845aed37fe

Reviewers: arsenm, kzhuravl

Subscribers: wdng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61650

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363519 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Prepare for explicit absolute relocations in code generation

Summary:
We will use absolute relocations for LDS symbols.

Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61492

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363517 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0

Summary:
Instead of encoding a high-word of 0 using a fake TargetGlobalAddress,
just use a literal target constant. This simplifies some subsequent changes.

The generated assembly is now more explicit about the kind of relocation
that is to be used.

Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61491

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363516 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic

Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62486

Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363514 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] gfx10 conditional registers handling

This is cpp source part of wave32 support, excluding overriden
getRegClass().

Differential Revision: https://reviews.llvm.org/D63351

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363513 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeGenPrepare][x86] shift both sides of a vector select when profitable

This is based on the example/discussion in PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428

Proper vector shift instructions don't appear until AVX2, so we may generate several
extra instructions within a loop trying to compensate for that. It's difficult to
recover from that shift expansion later than this, so use the existing TLI hook and
splat analysis to enable better codegen.

This extends CGP functionality introduced with:
rL201655

Differential Revision: https://reviews.llvm.org/D63233

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363511 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] split 256-bit vector selects if operands are vector concats

This is similar logic/motivation to the select splitting in D62969.

In D63233, the pattern changes so that we no longer have an extract_subvector of vselect,
but the operands of the select are still being concatenated.

The closest case is represented in either the first or last test diffs here - we have an
extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions.
I think that's the right trade-off for most AVX1 targets.

In the example based on PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx).

Differential Revision: https://reviews.llvm.org/D63364

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363508 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] CombineShuffleWithExtract - handle cases with different vector extract sources

Insert the shorter vector source into an undef vector of the longer vector source's type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363507 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r363444

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363505 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] CombineShuffleWithExtract - assert all src ops types are multiples of rootsize. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363501 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles

Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well.

Fixes PR34380

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363500 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3)

This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363499 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][MCA][X86] Add one more 'clear super register' pattern - movss/movsd load clears high XMM bits

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363498 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][MCA][X86] Add baseline test coverage for AMD Barcelona (aka K10, fam10h)

Looking into sched model for that CPU ...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363497 91177308-0d34-0410-b5e6-96231b3b80d8

[Clang] Harmonize Split DWARF options with llc

Summary:
With Split DWARF the resulting object file (then called skeleton CU)
contains the file name of another ("DWO") file with the debug info.
This can be a problem for remote compilation, as it will contain the
name of the file on the compilation server, not on the client.

To use Split DWARF with remote compilation, one needs to either

* make sure only relative paths are used, and mirror the build directory
structure of the client on the server,
* inject the desired file name on the client directly.

Since llc already supports the latter solution, we're just copying that
over. We allow setting the actual output filename separately from the
value of the DW_AT_[GNU_]dwo_name attribute in the skeleton CU.

Fixes PR40276.

Reviewers: dblaikie, echristo, tejohnson

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D59673

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363496 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Set the innermost hot loop to align 32 bytes

Summary:
If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that
we can decrease cache misses and branch-prediction misses. Actual alignment of
the loop will depend on the hotness check and other logic in alignBlocks.

The old code will only align hot loop to 32 bytes when the LoopSize larger than
16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop
to 32 bytes not only for the hot loop whose size is 16~32 bytes.

Reviewed By: steven.zhang, jsji

Differential Revision: https://reviews.llvm.org/D61228

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363495 91177308-0d34-0410-b5e6-96231b3b80d8

[clang] Add storage for APValue in ConstantExpr

Summary:
When using ConstantExpr we often need the result of the expression to be kept in the AST. Currently this is done on a by the node that needs the result and has been done multiple times for enumerator, for constexpr variables... . This patch adds to ConstantExpr the ability to store the result of evaluating the expression. no functional changes expected.

Changes:
- Add trailling object to ConstantExpr that can hold an APValue or an uint64_t. the uint64_t is here because most ConstantExpr yield integral values so there is an optimized layout for integral values.
- Add basic* serialization support for the trailing result.
- Move conversion functions from an enum to a fltSemantics from clang::FloatingLiteral to llvm::APFloatBase. this change is to make it usable for serializing APValues.
- Add basic* Import support for the trailing result.
- ConstantExpr created in CheckConvertedConstantExpression now stores the result in the ConstantExpr Node.
- Adapt AST dump to print the result when present.

basic* : None, Indeterminate, Int, Float, FixedPoint, ComplexInt, ComplexFloat,
the result is not yet used anywhere but for -ast-dump.

Reviewers: rsmith, martong, shafik

Reviewed By: rsmith

Subscribers: rnkovacs, hiraditya, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D62399

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363493 91177308-0d34-0410-b5e6-96231b3b80d8

[BranchProbability] Delete a redundant overflow check

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363492 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Use unsigned/signed intersection type in SCEV

Based on D59959, this switches SCEV to use unsigned/signed range
intersection based on the sign hint. This will prefer non-wrapping
ranges in the relevant domain. I've left the one intersection in
getRangeForAffineAR() to use the smallest intersection heuristic,
as there doesn't seem to be any obvious preference there.

Differential Revision: https://reviews.llvm.org/D60035

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363490 91177308-0d34-0410-b5e6-96231b3b80d8

[SimplifyIndVar] Simplify non-overflowing saturating add/sub

If we can detect that saturating math that depends on an IV cannot
overflow, replace it with simple math. This is similar to the CVP
optimization from D62703, just based on a different underlying
analysis (SCEV vs LVI) that catches different cases.

Differential Revision: https://reviews.llvm.org/D62792

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363489 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Regenerate remat.ll and atomic-rmw.ll after D43256

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363487 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Simplify RISCVAsmBackend::writeNopData(). NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363486 91177308-0d34-0410-b5e6-96231b3b80d8

[objcopy] Error when --preserve-dates is specified with standard streams

Summary: llvm-objcopy/strip now error when -p is specified when reading from stdin or writing to stdout

Reviewers: jhenderson, rupprecht, espindola, alexshap

Reviewed By: jhenderson, rupprecht

Subscribers: emaste, arichardson, jakehehrlich, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63090

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363485 91177308-0d34-0410-b5e6-96231b3b80d8

adding more fmf propagation for selects plus updated tests

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363484 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "adding more fmf propagation for selects plus tests"

This reverts rL363474. -debug-only=isel was added to some tests that
don't specify `REQUIRES: asserts`. This causes failures on
-DLLVM_ENABLE_ASSERTIONS=off builds.

I chose to revert instead of fixing the tests because I'm not sure
whether we should add `REQUIRES: asserts` to more tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363482 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Add tests to show missing fold opportunity for "icmp and shift" (nfc).

Summary:
For icmp pred (and (sh X, Y), C), 0

  When C is signbit, expect to fold (X << Y) & signbit ==/!= 0 into (X << Y) >=/< 0,
  rather than (X & (signbit >> Y)) != 0.

  When C+1 is power of 2, expect to fold (X << Y) & ~C ==/!= 0 into (X << Y) </>= C+1,
  rather than (X & (~C >> Y)) == 0.

For icmp pred (and X, (sh signbit, Y)), 0

  Expect to fold (X & (signbit l>> Y)) ==/!= 0 into (X << Y) >=/< 0
  Expect to fold (X & (signbit << Y)) ==/!= 0 into (X l>> Y) >=/< 0

  Reviewers: lebedev.ri, efriedma, spatel, craig.topper

  Reviewed By: lebedev.ri

  Subscribers: llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D63025

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363479 91177308-0d34-0410-b5e6-96231b3b80d8

Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect"

This reapplies r363410, avoiding null dereference if there is no
AltRegBank.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363478 91177308-0d34-0410-b5e6-96231b3b80d8

Add a map_range function for applying map_iterator to a range.

In preparation for use in Clang.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363477 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect"

This patch breaks UBSan build bots. See
https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for
a guide as to how to reproduce the error.

This reverts commit c2864c0de07efb5451d32d27a7d4ff2984830929.
This reverts rL363410.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363476 91177308-0d34-0410-b5e6-96231b3b80d8

adding more fmf propagation for selects plus tests

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363474 91177308-0d34-0410-b5e6-96231b3b80d8

[MBP] Move a latch block with conditional exit and multi predecessors to top of loop

Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:

    * a latch block
    * it has two successors, one is loop header, another is exit
    * it has more than one predecessors

If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.

Differential Revision: https://reviews.llvm.org/D43256

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363471 91177308-0d34-0410-b5e6-96231b3b80d8

[ObjC][ARC] Delete ObjC runtime calls on global variables annotated
with 'objc_arc_inert'

Those calls are no-ops, so they can be safely deleted.

rdar://problem/49839633

Differential Revision: https://reviews.llvm.org/D62433

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363468 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Avoid most waitcnts before calls

Currently you get extra waits, because waits are inserted for the
register dependencies of the call, and the function prolog waits on
everything.

Currently waits are still inserted on returns. It may make sense to
not do this, and wait in the caller instead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363465 91177308-0d34-0410-b5e6-96231b3b80d8

Add --print-supported-cpus flag for clang.

This patch allows clang users to print out a list of supported CPU models using
clang [--target=<target triple>] --print-supported-cpus

Then, users can select the CPU model to compile to using
clang --target=<triple> -mcpu=<model> a.c

It is a handy feature to help cross compilation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363464 91177308-0d34-0410-b5e6-96231b3b80d8

[Remarks][NFC] Improve testing and documentation of -foptimization-record-passes

This adds:

* documentation to the user manual
* nicer error message
* test for the error case
* test for the gold plugin

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363463 91177308-0d34-0410-b5e6-96231b3b80d8

SROA: Allow eliminating addrspacecasted allocas

There is a circular dependency between SROA and InferAddressSpaces
today that requires running both multiple times in order to be able to
eliminate all simple allocas and addrspacecasts. InferAddressSpaces
can't remove addrspacecasts when written to memory, and SROA helps
move pointers out of memory.

This should avoid inserting new commuting addrspacecasts with GEPs,
since there are unresolved questions about pointer wrapping between
different address spaces.

For now, don't replace volatile operations that don't match the alloca
addrspace, as it would change the address space of the access. It may
be still OK to insert an addrspacecast from the new alloca, but be
more conservative for now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363462 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC][NFC] Comments update and remove some unused def

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363461 91177308-0d34-0410-b5e6-96231b3b80d8

SROA: Add baseline test for addrspacecast changes

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363460 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix capitalized register names in asm constraints

This was a workaround a long time ago, but the canonical lower case
names work now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363459 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix dropping memref for ds append/consume

The way SelectionDAG treats memory operands is very frustrating, and
by default drops them unless a property is set on the pattern. There
is no pattern for manually selected instructions, so this requires
manually setting them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363455 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Set isTrap on S_TRAP

This seems to only be used for generating some kind
of documentation, but might as well set it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363454 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Add baseline test for call waitcnt insertion

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363453 91177308-0d34-0410-b5e6-96231b3b80d8

UpdateTestChecks: Consider .section as end of function for AMDGPU

Kernels seem to go directly to a section switch instead of emitting
.Lfunc_end. This fixes including all of the kernel metadata in the
check lines, which is undesirable most of the time.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363452 91177308-0d34-0410-b5e6-96231b3b80d8