granicus.if.org Git

gn build: Merge r357383

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357398 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Add v8.5-a Memory Tagging STZGM instruction

This instruction writes a block of allocation tags
and stores zero to the associated data locations.

It differs from STGM by 1 bit and has the same
arguments.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

Differential Revision: https://reviews.llvm.org/D60065

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357397 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Attach VK_RISCV_CALL to symbols upon creation

This patch replaces the addition of VK_RISCV_CALL in RISCVMCCodeEmitter by
creating the RISCVMCExpr when tail/call are parsed, or in the codegen case
when the callee symbols are created.

This required adding a new CallSymbol operand to allow only adding
VK_RISCV_CALL to tail/call instructions.

This patch will allow further expansion of parsing and codegen to easily
include PLT symbols which must generate the R_RISCV_CALL_PLT relocation.

Differential Revision: https://reviews.llvm.org/D55560
Patch by Lewis Revill.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357396 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Add v8.5-a Memory Tagging STGM/LDGM instructions

The STGV/LDGV instructions were replaced with
STGM/LDGM. The encodings remain the same but there
is no longer writeback so there are no unpredictable
encodings to check for.

The specfication can be found here:
https://developer.arm.com/docs/ddi0596/c

Differential Revision: https://reviews.llvm.org/D60064

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357395 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Generate address sequences suitable for mcmodel=medium

This patch adds an implementation of a PC-relative addressing sequence to be
used when -mcmodel=medium is specified. With absolute addressing, a 'medium'
codemodel may cause addresses to be out of range. This is because while
'medium' implies a 2 GiB addressing range, this 2 GiB can be at any offset as
opposed to 'small', which implies the first 2 GiB only.

Note that LLVM/Clang currently specifies code models differently to GCC, where
small and medium imply the same functionality as GCC's medlow and medany
respectively.

Differential Revision: https://reviews.llvm.org/D54143
Patch by Lewis Revill.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357393 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Add v8.5-a Memory Tagging GMID_EL1 register

The latest version of the MTE spec added a system
register 'GMID_EL1'. It contains the block size used
by the LDGM and STGM instructions and is read only.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357392 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder

Summary:
This fixes PR41270.

The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.

Reviewers: reames, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60058

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357389 91177308-0d34-0410-b5e6-96231b3b80d8

X86: Fix override warning

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357388 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder"

This reverts commit 75216a6dbcfe5fb55039ef06a07e419fa875f4a5.

I'll recommit with a better commit message with reference to the
phabricator review.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357387 91177308-0d34-0410-b5e6-96231b3b80d8

InstSimplify: Add baseline test for upcoming change

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357386 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder

This fixes PR41270.

The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357385 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Make post-ra scheduling macrofusion-aware.

Subscribers: MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59688

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357384 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] eliminate commuted select-shuffles + binop (PR41304)

If we have a commutable vector binop with inverted select-shuffles,
we don't care about the order of the operands in each vector lane:

LHS = shuffle V1, V2, <0, 5, 6, 3>
RHS = shuffle V2, V1, <0, 5, 6, 3>
LHS + RHS --> <V1[0]+V2[0], V2[1]+V1[1], V2[2]+V1[2], V1[3]+V2[3]> --> V1 + V2

PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...is currently titled as an SLP enhancement, but at least for the
given example, we can reduce that in instcombine because we are just
eliminating shuffles.

As noted in the TODO, this could be generalized, but I haven't thought
through those patterns completely, so this is limited to what appears
to be always safe.

Differential Revision: https://reviews.llvm.org/D60048

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357382 91177308-0d34-0410-b5e6-96231b3b80d8

[X86MacroFusion][NFC] Add more tests.

In preparation for D59688.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357381 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Fix a test from r357317

Summary:
The missing `<` causes the lld command to override the test file, which fails in
environments marking the test files as readonly.

Reviewers: bkramer

Reviewed By: bkramer

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60060

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357380 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Add fcmp constant folding tests

Initial test coverage for D60006

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357379 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Add seto pattern expansion

Adds a `seto` pattern expansion. Without it the lowerings of `fcmp one` and
`fcmp ord` would be inefficient due to an unoptimized double negation.

Differential Revision: https://reviews.llvm.org/D59699

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357378 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Use ISD::INTRINSIC_VOID in getTgtMemIntrinsic for truncating stores and scatter intrinsics.

This is the appropriate opcode for only having a chain output. Though I'm not
sure it matters much.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357375 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Don't evaluatePCRelLo if a relocation will be forced (e.g. due to linker relaxation)

A pcrel_lo will point to the associated pcrel_hi fixup which in turn points to
the real target. RISCVMCExpr::evaluatePCRelLo will work around this
indirection in order to allow the fixup to be evaluate properly. However, if
relocations are forced (e.g. due to linker relaxation is enabled) then its
evaluation is undesired and will result in a relocation with the wrong target.

This patch modifies evaluatePCRelLo so it will not try to evaluate if the
fixup will be forced as a relocation. A new helper method is added to
RISCVAsmBackend to query this.

Differential Revision: https://reviews.llvm.org/D59686

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357374 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Add build files for most clang-tools-extra unit tests

Differential Revision: https://reviews.llvm.org/D60038

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357369 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add tests for inverted select-shuffles + binop (PR41304); NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357368 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] allow movmsk with 2-element reductions

One motivation for making this change is that the lack of using movmsk is likely
a main source of perf difference between clang and gcc on the C-Ray benchmark as
shown here:
https://www.phoronix.com/scan.php?page=article&item=gcc-clang-2019&num=5
...but this change alone isn't enough to solve that problem.

The 'all-of' examples show what is likely the worst case trade-off: we end up with
an extra instruction (or 2 if we count the 'xor' register clearing). The 'any-of'
examples look clearly better using movmsk because we've traded 2 vector instructions
for 2 scalar instructions, and movmsk may have better timing than the generic 'movq'.

If we examine the llvm-mca output for these cases, it appears that even though the
'all-of' movmsk variant looks worse on paper, it would perform better on both
Haswell and Jaguar.

  $ llvm-mca -mcpu=haswell no_movmsk.s -timeline
  Iterations:        100
  Instructions:      400
  Total Cycles:      504
  Total uOps:        400

  Dispatch Width:    4
  uOps Per Cycle:    0.79
  IPC:               0.79
  Block RThroughput: 1.0

  $ llvm-mca -mcpu=haswell movmsk.s -timeline
  Iterations:        100
  Instructions:      600
  Total Cycles:      358
  Total uOps:        600

  Dispatch Width:    4
  uOps Per Cycle:    1.68
  IPC:               1.68
  Block RThroughput: 1.5

  $ llvm-mca -mcpu=btver2 no_movmsk.s -timeline
  Iterations:        100
  Instructions:      400
  Total Cycles:      407
  Total uOps:        400

  Dispatch Width:    2
  uOps Per Cycle:    0.98
  IPC:               0.98
  Block RThroughput: 2.0

  $ llvm-mca -mcpu=btver2 movmsk.s -timeline
  Iterations:        100
  Instructions:      600
  Total Cycles:      311
  Total uOps:        600

  Dispatch Width:    2
  uOps Per Cycle:    1.93
  IPC:               1.93
  Block RThroughput: 3.0

Finally, there may be CPUs where movmsk is horribly slow (old AMD small cores?), but if
that's true, then we're also almost certainly making the wrong transform already for
reductions with >2 elements, so that should be fixed independently.

Differential Revision: https://reviews.llvm.org/D59997

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357367 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] canonicalize select shuffles by commuting

In PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...we have a case where we want to fold a binop of select-shuffle (blended) values.

Rather than try to match commuted variants of the pattern, we can canonicalize the
shuffles and check for mask equality with commuted operands.

We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a
special case that the backend is required to handle because we already canonicalize
vector select to this shuffle form.

So there should be no codegen difference from this change. It's possible that this
improves CSE in IR though.

Differential Revision: https://reviews.llvm.org/D60016

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357366 91177308-0d34-0410-b5e6-96231b3b80d8

fix typo: "\t" => " "

Reviewers: llvm.org, Jim

Reviewed By: Jim

Subscribers: arsenm, jvesely, nhaehnle, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59983

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357365 91177308-0d34-0410-b5e6-96231b3b80d8

SafepointIRVerifier port to new Pass Manager

Straightforward port of StatepointIRVerifier pass to new Pass Manager framework.

Fix By: skatkov
Reviewed By: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D59825

This is a re-land of r357147/r357148 with LLVM_ENABLE_MODULES build fixed.
Adding IR/SafepointIRVerifier.h into its own module.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357361 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][InstCombine] Add tests for combining icmp of no-wrap sub w/ constant.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357360 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r357340

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357358 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r357326

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357357 91177308-0d34-0410-b5e6-96231b3b80d8

[SystemZ] Remove fcmp undef from reduced test

Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @uweigand (Ulrich Weigand)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357355 91177308-0d34-0410-b5e6-96231b3b80d8

[MIPS] Remove fcmp undef from reduced test

Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @atanasyan (Simon Atanasyan)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357354 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Teach isel for RMW binops to handle negate

Negate updates flags like a subtract. We should be able to use the flags from the RMW form of negate when we have (store (X86ISD::SUB 0, load A), A)

Differential Revision: https://reviews.llvm.org/D60007

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357353 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Add codegen support for ilp32f, ilp32d, lp64f, and lp64d ("hard float") ABIs

This patch adds support for the RISC-V hard float ABIs, building on top of
rL355771, which added basic target-abi parsing and MC layer support. It also
builds on some re-organisations and expansion of the upstream ABI and calling
convention tests which were recently committed directly upstream.

A number of aspects of the RISC-V float hard float ABIs require frontend
support (e.g. flattening of structs and passing int+fp for fp+fp structs in a
pair of registers), and will be addressed in a Clang patch.

As can be seen from the tests, it would be worthwhile extending
RISCVMergeBaseOffsets to handle constant pool as well as global accesses.

Differential Revision: https://reviews.llvm.org/D59357

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357352 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] detectAVGPattern - Match zext(or(x,y)) 'add like' patterns (PR41316)

Fixes PR41316 where the expanded PAVG intrinsic had had one of its ADDs turned into an OR due to its operands having no conflicting bits.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357351 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Add RV64 CHECK lines to test/CodeGen/RISCV/vararg.ll and prepare for hard float tests

vararg.ll previously missed RV64 tests. This patch also prepares for using
vararg.ll to test handling of varargs for the ilp32f/ilp32d/lp64f/lp64d hard
float ABIs. In these ABIs, varargs are passed as in either the ilp32 or lp64
ABI. Due to some slight codegen differences, different check lines are needed
for when RV32D is enabled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357350 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] detectAVGPattern - begin generalizing ADD matches

Move the ADD matching into a helper - first NFC stage towards supporting 'ADD like' cases such as in PR41316

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357349 91177308-0d34-0410-b5e6-96231b3b80d8

[cmake] Change deprecated $<CONFIG> to $<CONFIGURATION>. NFC

See rL357338 for a similar change. The informational expression
$<CONFIGURATION> has been deprecated since CMake 3.0

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357348 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] Replace the size() helper with SectionTableRef::size

Summary:
BTW, STLExtras.h provides llvm::size() which is similar to std::size()
for random access iterators. However, if we prefer qualified
llvm::size(), the member function .size() will be more convenient.

Reviewers: jhenderson, jakehehrlich, rupprecht, grimar, alexshap, espindola

Reviewed By: grimar

Subscribers: emaste, arichardson, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60028

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357347 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Add PAVG test case from PR41316

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357346 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Fix unwind destination mismatches in CFG stackify

Summary:
Linearing the control flow by placing `try`/`end_try` markers can create
mismatches in unwind destinations. This patch resolves these mismatches
by wrapping those instructions with an incorrect unwind destination with
a nested `try`/`catch`/`end_try` and branching to the right destination
within the new catch block.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, chrib, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D48345

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357343 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Run ExplicitLocals pass after CFGStackify

Summary:
While this does not change any final output, this will greatly simplify
ixing unwind destination mismatches in CFGStackify (D48345), because we
have to create some new registers there.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59652

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357342 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Add DAGCombine for (SplitF64 (ConstantFP x))

The SplitF64 node is used on RV32D to convert an f64 directly to a pair of i32
(necessary as bitcasting to i64 isn't legal). When performed on a ConstantFP,
this will result in a FP load from the constant pool followed by a store to
the stack and two integer loads from the stack (necessary as there is no way
to directly move between f64 FPRs and i32 GPRs on RV32D). It's always cheaper
to just materialise integers for the lo and hi parts of the FP constant, so do
that instead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357341 91177308-0d34-0410-b5e6-96231b3b80d8

Adds `-ftime-trace` option to clang that produces Chrome `chrome://tracing` compatible JSON profiling output dumps.

This change adds hierarchical "time trace" profiling blocks that can be visualized in Chrome, in a "flame chart" style. Each profiling block can have a "detail" string that for example indicates the file being processed, template name being instantiated, function being optimized etc.

This is taken from GitHub PR: https://github.com/aras-p/llvm-project-20170507/pull/2

Patch by Aras Pranckevičius.

Differential Revision: https://reviews.llvm.org/D58675

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357340 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV][NFC] Remove floating point operations from test/CodeGen/RISCV/vararg.ll

This minimises differences in output when compiling with hardware floating
point support, which will be done in a future patch (to demonstrate the same
vararg calling convention is used).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357339 91177308-0d34-0410-b5e6-96231b3b80d8

[cmake] Remove use of deprecated generator expression. NFC

Use $<CONFIG> instead of $<CONFIGURATION>, since the latter has been
deprecated since CMake 3.0, and the former is entirely equivalent.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357338 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Optimize the number of routing blocks in FixIrreducibleCFG

Summary:
Currently we create a routing block to the dispatch block for every
predecessor of every entry. So the total number of routing blocks
created will be (# of preds) * (# of entries). But we don't need to do
this: we need at most 2 routing blocks per loop entry, one for when the
predecessor is inside the loop and one for it is outside the loop. (We
can't merge these into one because this will creates another loop cycle
between blocks inside and blocks outside) This patch fixes this and
creates at most 2 routing blocks per entry.

This also renames variable `Split` to `Routing`, which I think is a bit
clearer.

Reviewers: kripken

Subscribers: sunfish, dschuff, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59462

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357337 91177308-0d34-0410-b5e6-96231b3b80d8

[Support] Implement is_local_impl with AIX mntctl

Summary:
On AIX, we can determine whether a filesystem is remote using `mntctl`.

If the information is not found, then claim that the file is remote
(since that is the more restrictive case). Testing for the associated
interface is restored with a modified version of the unit test from
rL295768.

Reviewers: jasonliu, xingxue

Reviewed By: xingxue

Subscribers: jsji, apaprocki, Hahnfeld, zturner, krytarowski, kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58801

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357333 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopPredication] Remove stale TODO

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357331 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopPredication] Use the builder's insertion point everywhere [NFC]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357330 91177308-0d34-0410-b5e6-96231b3b80d8

[MemorySSA] Temporary fix assert when reaching 0 limit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357327 91177308-0d34-0410-b5e6-96231b3b80d8

Try to fix buildbot error

Error is:

llvm/lib/Analysis/ScalarEvolution.cpp:3534:10: error: chosen constructor is explicit in copy-initialization
  return {UniqueSCEVs.FindNodeOrInsertPos(ID, IP), std::move(ID), IP};
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/bin/../lib/gcc/aarch64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here
        constexpr tuple(_UElements&&... __elements)
                  ^
1 error generated.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357324 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Add mutable globals feature

Summary:
This feature is not actually used for anything in the WebAssembly
backend, but adding it allows users to get it into the target features
sections of their objects, which makes these objects
future-compatible.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jdoerfert, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60013

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357321 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Check the cache in get{S|U}MaxExpr before doing any work

Summary:
This lets us avoid e.g. checking if A >=s B in getSMaxExpr(A, B) if we've
already established that (A smax B) is the best we can do.

Fixes PR41225.

Reviewers: asbirlea

Subscribers: mcrosier, jlebar, bixia, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60010

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357320 91177308-0d34-0410-b5e6-96231b3b80d8

[MemorySSA] Limit clobber walks.

Summary: This patch limits all getClobberingMemoryAccess() walks to MaxCheckLimit.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59569

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357319 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel][AArch64] Add isel support for G_INSERT_VECTOR_ELT on v2s32s

This adds support for v2s32 vector inserts, and updates the selection +
regbankselect tests for G_INSERT_VECTOR_ELT.

Differential Revision: https://reviews.llvm.org/D59910

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357318 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] When using Win64 ABI, exit with error if SSE is disabled for varargs

We need XMM registers to handle varargs with the Win64 ABI. Before we would
silently generate bad code resulting in an assertion failure elsewhere in the
backend.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357317 91177308-0d34-0410-b5e6-96231b3b80d8

[MemorySSA] Don't optimize incomplete phis.

Summary:
MemoryPhis cannot be optimized out until they are complete.
Resolves PR41254.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59966

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357315 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop.

Avoid EXPENSIVE_CHECK failure. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357309 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Handle END_LOOP in unreachable BB in CFGStackify

Summary:
This fixes crashes when a BB in which an END_LOOP is to be placed is
unreachable and does not have any predecessors. Fixes PR41307.

Reviewers: dschuff

Subscribers: yurydelendik, sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60004

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357303 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Remove dx10-clamp from subtarget features

Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and Introduce
a new amdgpu-dx10-clamp attribute to override this if desired.

Also introduce a new amdgpu-ieee attribute to match.

The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
definining the attribute in the generic Attributes.td.

Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357302 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Remove fcmp undef from reduced tests

Pre-commit for D60006 (Add fcmp UNDEF handling to SelectionDAG::FoldSetCC)

Approved by @kparzysz (Krzysztof Parzyszek)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357301 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add test cases showing failure to use RMW form of negate when only flags are used. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357300 91177308-0d34-0410-b5e6-96231b3b80d8

[DAG] Avoid redundancy in StoreMerge TokenFactor generation.

Avoid generating redundant TokenFactor when all merged stores have
the same chain.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357299 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Use cached OptForSize in X86ISelDAGToDAG.cpp instead of pulling it from the function attribute. NFCI

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357297 91177308-0d34-0410-b5e6-96231b3b80d8

[SystemZ] Regenerate double constant comparison test

Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357295 91177308-0d34-0410-b5e6-96231b3b80d8

[MIPS] Regenerate double constant comparison test

Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357294 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Regenerate execute-only float comparison tests

Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357293 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] autogenerate complete checks; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357291 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Add an additional Code Object V3 assembler example

Document the intended use of the `.amdgcn.next_free_{s,v}gpr` in the
context of multiple kernels and functions.

Differential Revision: https://reviews.llvm.org/D59949

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357289 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] regenerate test checks; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357288 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Regenerate half precision tests

Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357286 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm][NFC] Factor out logic for getting incoming & back Loop edges

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59967

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357284 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] Prune unnused nodes.

Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.

Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight

Reviewed By: jyknight

Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58070

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357283 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Regenerate vector comparison tests

Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357281 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeGen] Refactor the option for the maximum jump table size

Refactor the option `max-jump-table-size` to default to the maximum
representable number. Essentially, NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357280 91177308-0d34-0410-b5e6-96231b3b80d8

[DAG] Set up infrastructure to avoid smart constructor-based dangling nodes

Summary:
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations without fully pruning unused result values. This results
in nodes that are never added to the worklist and therefore can not be
pruned.

Add a node inserter for the combiner to make sure such nodes have the
chance of being pruned. This allows a number of additional peephole
optimizations.

Reviewers: efriedma, RKSimon, craig.topper, jyknight

Reviewed By: jyknight

Subscribers: msearles, jyknight, sdardis, nemanjai, javed.absar, hiraditya, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58068

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357279 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Fix some tests using fcmp with undef arguments

Prep work for PR40800 (Add UNDEF handling to SelectionDAG::FoldSetCC)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357278 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] move shuffle canonicalizations before other transforms

This may not be NFC, but I'm not sure how to expose any diffs in
tests. In theory, it should be slightly more efficient and possibly
more profitable to do the canonicalizations (which can increase the
undef elements in the mask) ahead of SimplifyDemandedVectorElts().

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357272 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-readobj] Add some generic notes (e.g. NT_VERSION)

Summary: Support reading notes that don't have a standard note name.

Reviewers: MaskRay

Reviewed By: MaskRay

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59969

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357271 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-readelf] Allow prefix flags for -p and -x

Summary: This allows syntax like `llvm-readelf -p.data1 -x.data2`.

Reviewers: jhenderson

Reviewed By: jhenderson

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59965

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357270 91177308-0d34-0410-b5e6-96231b3b80d8

[SLP] Add support for commutative icmp/fcmp predicates

For the cases where the icmp/fcmp predicate is commutative, use reorderInputsAccordingToOpcode to collect and commute the operands.

This requires a helper to recognise commutativity in both general Instruction and CmpInstr types - the CmpInst::isCommutative doesn't overload the Instruction::isCommutative method for reasons I'm not clear on (maybe because its based on predicate not opcode?!?).

Differential Revision: https://reviews.llvm.org/D59992

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357266 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] Fix case style of LayoutSegments. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357265 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Fix lowering a signed immediate for *.d MSA instructions

The `lowerMSASplatImm` function zero-extends `i32` immediates while
building constant. If target type is `i64`, negative immediate loses
the sign. As a result, for example `__builtin_msa_ldi_d(-1)` lowered
to series of instruction loads incorrect value 0xffffffff to the `$w0`
register instead of single `ldi.d $w0, -1` instruction.

The fix zero-extends unsigned immediates and signed-extend signed
immediates.

Differential Revision: http://reviews.llvm.org/D59884

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357264 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][llvm-exegesis] Also promote getSchedClassPoint() into ResolvedSchedClass.

Summary:
It doesn't need anything from Analysis::SchedClassCluster class,
and takes ResolvedSchedClass as param, so this seems rather fitting.

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59994

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357263 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU][MC] Corrected conversion rules for inlinable constants to match rules for literals

See bug 40806: https://bugs.llvm.org/show_bug.cgi?id=40806

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D59786

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357262 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r357248

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357261 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r357259

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357260 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][llvm-exegesis] Refactor ResolvedSchedClass & friends

Summary:
`ResolvedSchedClass` will need to be used outside of `Analysis`
(before `InstructionBenchmarkClustering` even), therefore promote
it into a non-private top-level class, and while there also
move all of the functions that are only called by `ResolvedSchedClass`
into that same new file.

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: mgorny, tschuett, mgrang, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59993

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357259 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] simplify shuffle of shuffle

After investigating the examples from D59777 targeting an SSE4.1 machine,
it looks like a very different problem due to how we map illegal types (256-bit in these cases).

We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand.
We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that
generality means it is limited to patterns with a one-use constraint, and the examples here have
2 uses. We don't need any uses or legality limitations for a simplification (no new value is
created).

It looks like we miss this pattern in IR too.

In one of the zext examples here, we have shuffle masks like this:

Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7>
Shuf = vector_shuffle<4,u,6,7,u,u,u,u>

...so that's moving the high half of the 1st vector into the low half. But the high half of the
1st vector is already identical to the low half.

Differential Revision: https://reviews.llvm.org/D59961

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357258 91177308-0d34-0410-b5e6-96231b3b80d8

Recommit "[DSE] Preserve basic block ordering using OrderedBasicBlock."

Updated to use DenseMap::insert instead of [] operator for insertion, to
avoid a crash caused by epoch checks.

This reverts commit 2b85de438326f9d27bc96dc934ec98b98abdb337.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357257 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] Improve Lifetime node chains.

Improve both start and end lifetime nodes chain dependencies.

Reviewers: courbet

Reviewed By: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59795

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357256 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] fold sext into decrement

This is a sibling to rL357178 that I noticed we'd hit if we chose
an alternate transform in D59818.

  %z = zext i8 %x to i32
  %dec = add i32 %z, -1
  %r = sext i32 %dec to i64
  =>
  %z2 = zext i8 %x to i64
  %r = add i64 %z2, -1

https://rise4fun.com/Alive/kPP

The x86 vector diffs show a slight regression, so there's a chance
that we should limit this and the previous transform to scalars.

But given that we allowed vectors before, I'm matching that behavior
here. We should change both transforms together if that's the right
thing to do.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357254 91177308-0d34-0410-b5e6-96231b3b80d8

Switch lowering: exploit unreachable fall-through when lowering case range cluster

In the example below, we would previously emit two range checks, one for cases
1--3 and one for 4--6. This patch makes us exploit the fact that the
fall-through is unreachable and only one range check is necessary.

  switch i32 %i, label %default [
    i32 1,  label %bb1
    i32 2,  label %bb1
    i32 3,  label %bb1
    i32 4,  label %bb2
    i32 5,  label %bb2
    i32 6,  label %bb2
  ]
  default: unreachable

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357252 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add tests for decrement+sext; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357251 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes

See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D59878

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357249 91177308-0d34-0410-b5e6-96231b3b80d8

[MCA] Add an experimental MicroOpQueue stage.

This patch adds an experimental stage named MicroOpQueueStage.
MicroOpQueueStage can be used to simulate a hardware micro-op queue (basically,
a decoupling queue between 'decode' and 'dispatch').  Users can specify a queue
size, as well as a optional MaxIPC (which - in the absence of a "Decoders" stage
- can be used to simulate a different throughput from the decoders).

This stage is added to the default pipeline between the EntryStage and the
DispatchStage only if PipelineOption::MicroOpQueue is different than zero. By
default, llvm-mca sets PipelineOption::MicroOpQueue to the value of hidden flag
-micro-op-queue-size.

Throughput from the decoder can be simulated via another hidden flag named
-decoder-throughput.  That flag allows us to quickly experiment with different
frontend throughputs.  For targets that declare a loop buffer, flag
-decoder-throughput allows users to do multiple runs, each time simulating a
different throughput from the decoders.

This stage can/will be extended in future. For example, we could add a "buffer
full" event to notify bottlenecks caused by backpressure. flag
-decoder-throughput would probably go away if in future we delegate to another
stage (DecoderStage?) the simulation of a (potentially variable) throughput from
the decoders. For now, flag -decoder-throughput is "good enough" to run some
simple experiments.

Differential Revision: https://reviews.llvm.org/D59928

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357248 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Make sram-ecc off by default for Vega20

Differential Revision: https://reviews.llvm.org/D59718

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357247 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-readelf]Merge dynamic and static relocation printing to avoid code duplication

The majority of the printRelocation and printDynamicRelocation functions
were identical. This patch factors this all out into a new function.
There are a couple of minor differences to do with printing of symbols
without names, but I think these are harmless, and in some cases a small
improvement.

Reviewed by: grimar, rupprecht, Higuoxing

Differential Revision: https://reviews.llvm.org/D59823

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357246 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][llvm-exegesis] Refactor Analysis::SchedClassCluster::measurementsMatch()

Summary:
The diff looks scary but it really isn't:
1. I moved the check for the number of measurements into `SchedClassClusterCentroid::validate()`
2. While there, added a check that we can only have a single inverse throughput measurement. I missed that when adding it initially.
3. In `Analysis::SchedClassCluster::measurementsMatch()` is called with the current LLVM values from schedule class and the values from Centroid.
3.1. The values from centroid we can already get from `SchedClassClusterCentroid::getAsPoint()`.
     This isn't 100% a NFC, because previously for inverse throughput we used `min()`. I have asked whether i have done that correctly in
     https://reviews.llvm.org/D57647?id=184939#inline-510384 but did not hear back. I think `avg()` should be used too, thus it is a fix.
3.2. Finally, refactor the computation of the LLVM-specified values into `Analysis::SchedClassCluster::getSchedClassPoint()`
     I will need that function for [[ https://bugs.llvm.org/show_bug.cgi?id=41275 | PR41275 ]]

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59951

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357245 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add X86TargetLowering::isCommutativeBinOp override.

We currently just have test coverage for PMULUDQ - will add more in the future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357244 91177308-0d34-0410-b5e6-96231b3b80d8

[SLP] Add support for swapping icmp/fcmp predicates to permit vectorization

We should be able to match elements with the swapped predicate as well - as long as we commute the source operands.

Differential Revision: https://reviews.llvm.org/D59956

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357243 91177308-0d34-0410-b5e6-96231b3b80d8