granicus.if.org Git

[X86] Make a bunch of merge masked binops commutable for loading folding.

This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg.

We already commuted the unmasked and zero masked versions.

I've added 512-bit stack folding tests for most of the instructions
affected. I've tested needing commuting and not commuting across
unmasked, merged masked, and zero masked. The 128/256 bit instructions
should behave similarly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362746 91177308-0d34-0410-b5e6-96231b3b80d8

[InstSimplify] add tests for fcmp with known-never-nan operands; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362742 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][CodeGen] Add unary fneg tests to X86/fma-scalar-combine.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362741 91177308-0d34-0410-b5e6-96231b3b80d8

[CFLGraph] Add support for unary fneg instruction.

Differential Revision: https://reviews.llvm.org/D62791

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362737 91177308-0d34-0410-b5e6-96231b3b80d8

[LV] Wrap LV illegality reporting in a function. NFC.

A function for loop vectorization illegality reporting has been
introduced:

void LoopVectorizationLegality::reportVectorizationFailure(
const StringRef DebugMsg, const StringRef OREMsg,
const StringRef ORETag, Instruction * const I) const;

The function prints a debug message when the debug for the compilation
unit is enabled as well as invokes the optimization report emitter to
generate a message with a specified tag. The function doesn't cover any
complicated logic when a custom lambda should be passed to the emitter,
only generating a message with a tag is supported.

The function always prints the instruction `I` after the debug message
whenever the instruction is specified, otherwise the debug message
ends with a dot: 'LV: Not vectorizing: Disabled/already vectorized.'

Patch by Pavel Samolysov <samolisov@gmail.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362736 91177308-0d34-0410-b5e6-96231b3b80d8

[AIX] Implement function descriptor on SDAG

Summary:
(1) Function descriptor on AIX
On AIX, a called routine may have 2 distinct symbols associated with it:
* A function descriptor (Name)
* A function entry point (.Name)

The descriptor structure on AIX is the same as those in the ELF V1 ABI:
* The address of the entry point of the function.
* The TOC base address for the function.
* The environment pointer.

The descriptor symbol uses the same name as the source level function in C.
The function entry point is analogous to the symbol we would generate for a
function in a non-descriptor-based ABI, except that it is renamed by
prepending a ".".

Which symbol gets referenced depends on the context:
* Taking the address of the function references the descriptor symbol.
* Calling the function references the entry point symbol.

(2) Speaking of implementation on AIX, for direct function call target, we
create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to
replace original TargetGlobalAddress SDNode. Then down the path, we can
take advantage of this MCSymbol.

Patch by: Xiangling_L

Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara

Differential Revision: https://reviews.llvm.org/D62532

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362735 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][CodeGen] Add unary fneg tests to X86/fma4-fneg-combine.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362733 91177308-0d34-0410-b5e6-96231b3b80d8

[InlineCost] Add support for unary fneg.

This adds support for unary fneg based on the implementation of BinaryOperator without the soft float FP cost.

Previously we would just delegate to visitUnaryInstruction. I think the only real change is that we will pass the FastMath flags to SimplifyFNeg now.

Differential Revision: https://reviews.llvm.org/D62699

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362732 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][CodeGen] Add unary fneg tests to X86/fma_patterns.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362730 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopPred] Fix a bug in unconditional latch bailout introduced in r362284

This is a really silly bug that even a simple test w/an unconditional latch would have caught. I tried to guard against the case, but put it in the wrong if check. Oops.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362727 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] MergeConsecutiveStores - improve non-temporal load\store handling (PR42123)

This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores:

1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another.

2 - The merged load\store node must retain the non-temporal flag.

Differential Revision: https://reviews.llvm.org/D62910

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362723 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][CodeGen] Add unary fneg tests to X86/fma_patterns_wide.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362720 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r362685

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362719 91177308-0d34-0410-b5e6-96231b3b80d8

Remove unused PPC.h includes under llvm/lib/Target/PowerPC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362718 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Make masked floating point equality/ordered compares commutable for load folding purposes.

Same as what is supported for the unmasked form.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362717 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][CodeGen] Add unary fneg tests to fmul-combines.ll fnabs.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362715 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Add R_PPC_IRELATIVE

This will be used by lld's powerpc port.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362713 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][CodeGen] Add unary fneg tests to fp-fast.ll fp-fold.ll fp-in-intregs.ll fp-stack-compare-cmov.ll fp-stack-compare.ll fsxor-alignment.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362712 91177308-0d34-0410-b5e6-96231b3b80d8

[DA] Add an option to control delinearization validity checks

Summary: Dependence Analysis performs static checks to confirm validity
of delinearization. These checks often fail for 64-bit targets due to
type conversions and integer wrapping that prevent simplification of the
SCEV expressions. These checks would also fail at compile-time if the
lower bound of the loops are compile-time unknown.

For example:

void foo(int n, int m, int a[][m]) {
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < m; ++j) {
      a[i][j] = a[i+1][j-2];
    }
}

opt -mem2reg -instcombine -indvars -loop-simplify -loop-rotate -inline
-pass-remarks=.* -debug-pass=Arguments
-da-permissive-validity-checks=false k3.ll -analyze -da
will produce the following by default:

da analyze - anti [* *|<]!
but will produce the following expected dependence vector if the
validity checks are disabled:

da analyze - consistent anti [1 -2]!
This revision will introduce a debug option that will leave the validity
checks in place by default, but allow them to be turned off. New tests
are added for cases where it cannot be proven at compile-time that the
individual subscripts stay in-bound with respect to a particular
dimension of an array. These tests enable the option to provide user
guarantee that the subscripts do not over/under-flow into other
dimensions, thereby producing more accurate dependence vectors.

For prior discussion on this topic, leading to this change, please see
the following thread:
http://lists.llvm.org/pipermail/llvm-dev/2019-May/132372.html

Reviewers: Meinersbur, jdoerfert, kbarton, dmgreen, fhahn
Reviewed By: Meinersbur, jdoerfert, dmgreen
Subscribers: fhahn, hiraditya, javed.absar, llvm-commits, Whitney,
etiotto
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D62610

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362711 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][CodeGen] Remove duplicate test in fp-fast.ll

@test10 is the same as @test11.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362710 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Add new tidy checks to gn files

The checks were added in r362673 and r362672.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362709 91177308-0d34-0410-b5e6-96231b3b80d8

[AIX] Implement call lowering with parameters could pass onto GPRs

Summary:
This patch implements SDAG call lowering on AIX for functions
which only have parameters that could fit into GPRs.

Reviewers: hubert.reinterpretcast, syzaara

Differential Revision: https://reviews.llvm.org/D62823

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362708 91177308-0d34-0410-b5e6-96231b3b80d8

FileCheck [6/12]: Introduce numeric variable definition

Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch introduces support for defining
numeric variable in a CHECK directive.

This commit introduces support for defining numeric variable from a
litteral value in the input text. Numeric expressions can then use the
variable provided it is on a later line.

Copyright:
    - Linaro (changes up to diff 183612 of revision D55940)
    - GraphCore (changes in later versions of revision D55940 and
                 in new revision created off D55940)

Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk

Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60386

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362705 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-ar] Create thin archives with MRI scripts

This patch implements the "CREATE_THIN" MRI script command, allowing thin archives to be created via MRI scripts.

Differential Revision: https://reviews.llvm.org/D62919

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362704 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add tests for loads of bitcasted vector pointer; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362703 91177308-0d34-0410-b5e6-96231b3b80d8

AArch64] Handle ISD::LRINT and ISD::LLRINT for float16

This patch is a follow up for D62018 to add lrint/llrint
support for float16.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62863

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362700 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[SCEV] Use wrap flags in InsertBinop"

This reverts commit r362687. Miscompiles llvm-profdata during selfhost.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362699 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Handle ISD::LROUND and ISD::LLROUND for float16

This patch is a follow up for D61391 to add lround/llround
support for float16.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D62861

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362698 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Add nonuniform constant vector test for PR42105

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362697 91177308-0d34-0410-b5e6-96231b3b80d8

Include what you use in LanaiAsmParser.cpp

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362696 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] Cleanup isNegatibleForFree/GetNegatedExpression. NFCI.

Prep work for PR42105 - clang-format, use auto for cast and merge nested if()s

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362695 91177308-0d34-0410-b5e6-96231b3b80d8

Fix whitespace indentation. NFCI.

Tabs are not our friends.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362694 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Disable test/Analysis/CostModel/RISCV tests if RISCV backend not built

Adds missing lit.local.cfg. Fixes rL362691.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362693 91177308-0d34-0410-b5e6-96231b3b80d8

[MIPS GlobalISel] Select sqrt

Select G_FSQRT for MIPS32.

Differential Revision: https://reviews.llvm.org/D62905

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362692 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Add CostModel GEP tests

Differential Revision: https://reviews.llvm.org/D61185

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362691 91177308-0d34-0410-b5e6-96231b3b80d8

[MIPS GlobalISel] Select fabs

Select G_FABS for MIPS32.

Differential Revision: https://reviews.llvm.org/D62903

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362690 91177308-0d34-0410-b5e6-96231b3b80d8

[MIPS GlobalISel] Select fpext and fptrunc

Select G_FPEXT and G_FPTRUNC for MIPS32.

Differential Revision: https://reviews.llvm.org/D62902

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362689 91177308-0d34-0410-b5e6-96231b3b80d8

[MIPS GlobalISel] Select floor and ceil

Select G_FFLOOR and G_FCEIL for MIPS32.

Differential Revision: https://reviews.llvm.org/D62901

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362688 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Use wrap flags in InsertBinop

If the given SCEVExpr has no (un)signed flags attached to it, transfer
these to the resulting instruction or use them to find an existing
instruction.

Differential Revision: https://reviews.llvm.org/D61934

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362687 91177308-0d34-0410-b5e6-96231b3b80d8

[AVR] Fix the 'load.ll' test after r362351

In that commit, the 'load.ll' test was modified, but still failed.

This commit updates the test so that it now passes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362684 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][GlobalISel] Add manual selection support for G_ZEXTLOADs to s64.

We already get support for G_ZEXTLOAD to s32 from the importer, but it can't
deal with the SUBREG_TO_REG in the pattern. Tweaking the existing manual
selection code for G_LOAD to handle an additional SUBREG_TO_REG when dealing
with G_ZEXTLOAD isn't much work.

Also add tests to check the imported pattern selections to s32 work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362681 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][GlobalISel] Add the new changes to fix PR42129 that were supposed to go into r362666.

The changes weren't staged so ended up just re-commiting the unmodified reverted change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362677 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Don't turn avx masked.load with constant mask into masked.load+vselect when passthru value is all zeroes.

This is intended to enable the use of an immediate blend or
more optimal instruction. But if the passthru is zero we don't
need any additional instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362675 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add test case for masked load with constant mask and all zeros passthru.

avx/avx2 masked loads only support all zeros for passthru in hardware.
So we have to emit a blend for all other values. We have an optimization
that tries to optimize this blend if the mask is constant. But we
don't need to perform this optimization if the passthru value is zero
which doesn't need the blend at all.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362674 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp""

When looking through copies, make sure to not try to find the vreg def of a physreg.
Normally getVRegDef will return nullptr in this case, but if there happens to be
multiple defs then it will assert.

This fixes PR42129.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362666 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Don't fix emergency stack slot at offset 0

This forced the caller to be aware of this, which is an ugly ABI
feature.

Partially reverts r295877. The original reasons for doing this are
mostly fixed. Alloca is now in a non-0 address space, so it should be
OK to have 0 as a valid pointer. Since we treat the absolute address
as the pointer value, this part only really needed to apply to
kernels.

Since r357093, we avoid the need to increment/decrement the offset
register in more cases, and since r354816 the scavenger can fail
without spilling, so it's less critical that we try to avoid an offset
that fits in the MUBUF offset.

Restrict to callable functions for now to split this into 2 steps to
limit thte number of test updates and in case anything breaks.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362665 91177308-0d34-0410-b5e6-96231b3b80d8

[MSAN] Add unary FNeg visitor to the MemorySanitizer

Differential Revision: https://reviews.llvm.org/D62909

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362664 91177308-0d34-0410-b5e6-96231b3b80d8

Allow target to handle STRICT floating-point nodes

The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).

This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.

To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:

- A new MCID flag "mayRaiseFPException" that the target should set on any
  instruction that possibly can raise FP exception according to the
  architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
  instruction resulting from expansion of any constrained FP intrinsic.

Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).

Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.

The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.

This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.

Reviewed By: andrew.w.kaylor

Differential Revision: https://reviews.llvm.org/D55506

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362663 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp"

This reverts commit r362435 as this triggers ICE, see PR42129 for details.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362662 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Invert frame index offset interpretation

Since the beginning, the offset of a frame index has been consistently
interpreted backwards. It was treating it as an offset from the
scratch wave offset register as a frame register. The correct
interpretation is the offset from the SP on entry to the function,
before the prolog. Frame index elimination then should select either
SP or another register as an FP.

Treat the scratch wave offset on kernel entry as the pre-incremented
SP. Rely more heavily on the standard hasFP and frame pointer
elimination logic, and clean up the private reservation code. This
saves a copy in most callee functions.

The kernel prolog emission code is still kind of a mess relying on
checking the uses of physical registers, which I would prefer to
eliminate.

Currently selection directly emits MUBUF instructions, which require
using a reference to some register. Use the register chosen for SP,
and then ignore this later. This should probably be cleaned up to use
pseudos that don't refer to any specific base register until frame
index elimination.

Add a workaround for shaders using large numbers of SGPRs. I'm not
sure these cases were ever working correctly, since as far as I can
tell the logic for figuring out which SGPR is the scratch wave offset
doesn't match up with the shader input initialization in the shader
programming guide.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362661 91177308-0d34-0410-b5e6-96231b3b80d8

[EarlyCSE] Add tests for negated min/max/abs [NFC]

Summary:
I'm planning to update the hashing logic to recognize their equivalence
in a subsequent change (D62644).

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62918

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362657 91177308-0d34-0410-b5e6-96231b3b80d8

[CallSite removal] Refactoring llvm::InlineFunction APIs

Summary:
This change only unifies the API previous API pair accepting
CallInst and InvokeInst, thus making it easier to refactor
inliner pass ode to CallBase. The implementation of the unified
API still relies on the CallSite implementation.

Reviewers: eraman, chandlerc, jdoerfert

Reviewed By: jdoerfert

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62283

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362656 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] simplify code for bitcast of insertelement; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362655 91177308-0d34-0410-b5e6-96231b3b80d8

NewGVN: Handle addrspacecast

The AllConstant check needs to be moved out of the if/else if chain to
avoid a test regression. The "there is no SimplifyZExt" comment
puzzles me, since there is SimplifyCastInst. Additionally, the
Simplify* calls seem to not see the operand as constant, so this needs
to be tried if the simplify failed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362653 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Fix mistake that marked VADDSSrrb_Int/VADDSDrrb_Int/VMULSSrrb_Int/VMULSDrrb_Int as commutable.

One of the sources controls the pass through value for the upper bits
of the result so we can't really commute it.

In practice this problem isn't a functional issue because we would
only try to commute this instruction in order to fold a load. But
we can't do embedded rounding and fold a load at the same time. So
the load fold would never succeed so I don't think we would ever
commute or at least keep the version after commuting.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362647 91177308-0d34-0410-b5e6-96231b3b80d8

[LOOPINFO] Extend Loop object to add utilities to get the loop bounds,
step, and loop induction variable.

Summary: This PR extends the loop object with more utilities to get loop
bounds, step, and loop induction variable. There already exists passes
which try to obtain the loop induction variable in their own pass, e.g.
loop interchange. It would be useful to have a common area to get these
information.

/// Example:
/// for (int i = lb; i < ub; i+=step)
///   <loop body>
/// --- pseudo LLVMIR ---
/// beforeloop:
///   guardcmp = (lb < ub)
///   if (guardcmp) goto preheader; else goto afterloop
/// preheader:
/// loop:
///   i1 = phi[{lb, preheader}, {i2, latch}]
///   <loop body>
///   i2 = i1 + step
/// latch:
///   cmp = (i2 < ub)
///   if (cmp) goto loop
/// exit:
/// afterloop:
///
/// getBounds
///   getInitialIVValue      --> lb
///   getStepInst            --> i2 = i1 + step
///   getStepValue           --> step
///   getFinalIVValue        --> ub
///   getCanonicalPredicate  --> '<'
///   getDirection           --> Increasing
/// getInductionVariable          --> i1
/// getAuxiliaryInductionVariable --> {i1}
/// isCanonical                   --> false

Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara,
fhahn
Reviewed By: kbarton
Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya,
llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D60565

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362644 91177308-0d34-0410-b5e6-96231b3b80d8

InstCombine: correctly change byval type attribute alongside call args.

When the byval attribute has a type, it must match the pointee type of
any parameter; but InstCombine was not updating the attribute when
folding casts of various kinds away.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362643 91177308-0d34-0410-b5e6-96231b3b80d8

IR: make getParamByValType Just Work. NFC.

Most parts of LLVM don't care whether the byval type is derived from an
explicit Attribute or from the parameter's pointee type, so it makes
sense for the main access function to just return the right value.

The very few users who do care (only BitcodeReader so far) can find out
how it's specified by accessing the Attribute directly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362642 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Remove amdgpu-max-work-group-size attribute

This has been deprecated for a long time, and mesa recently switched
to amdgpu-flat-work-group-size.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362641 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix using 2 different enums for same operand flags

These enums are really for the same namespace of flags set on
arbitrary MachineOperands, so merge them to avoid value collisions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362640 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Limit PIC support to the Emscripten target

The current PIC support currently only works with Emscripten, so
disable it for other targets.

This is the PIC portion of https://reviews.llvm.org/D62542.

Reviewed By: dschuff, sbc100

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362638 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Add vector tests to cover more isNegatibleForFree/GetNegatedExpression cases (PR42105)

Some already combine correctly, but vector constant analysis is weak.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362633 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][Reassociate] Fix mistake in 468b2ad

Missed 2 'fast fsub(0.0,X) -> fneg(X)' changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362631 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][Reassociate] Add unary fneg tests to fast-basictest.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362630 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add the vector integer min/max instructions to isAssociativeAndCommutative.

As far as I know these should be freely reassociatable just like
the floating point MAXC/MINC instructions.

The *reduce* test changes are largely regressions and caused by
the "generic" CPU we default to not having a scheduler model.

The machine-combiner-int-vec.ll test shows the positive benefits
of this change.

Differential Revision: https://reviews.llvm.org/D62787

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362629 91177308-0d34-0410-b5e6-96231b3b80d8

[Tests] Add poison inference tests for indvars showing both existing transforms, and some room for improvement

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362628 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][Reassociate] Regenerate CHECKs for fast-basictest.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362627 91177308-0d34-0410-b5e6-96231b3b80d8

Fix shadow local variable warning. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362622 91177308-0d34-0410-b5e6-96231b3b80d8

[dsymutil] Support more than 4 architectures

When running dsymutil on a fat binary, we use temporary files in a small
vector of size four. When processing more than 4 architectures, this
resulted in a user-after-move, because the temporary files got moved to
the heap. Instead of storing an optional temp file, we now use a unique
pointer, so the location of the actual temp file doesn't change.

We could test this by checking in 5 binaries for 5 different
architectures, but this seems wasteful, especially since the number of
elements in the small vector is arbitrary.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362621 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] split more 256-bit stores of concatenated vectors

As suggested in D62498 - collectConcatOps() matches both
concat_vectors and insert_subvector patterns, and we see
more test improvements by using the more general match.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362620 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] Generalize split256BitStore to splitVectorStore. NFCI.

Enables us to use this to split 512-bit vectors in future patches.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362617 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Add additional nt-load test cases as discussed on D62910

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362616 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "Title: [LOOPINFO] Extend Loop object to add utilities to get the loop"

This reverts commit d34797dfc26c61cea19f45669a13ea572172ba34.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362615 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-readobj] - Remove TODOs from gnu-hash-symbols.test and demangle.test test cases.

We can remove this TODOs now.

Differential revision: https://reviews.llvm.org/D62846

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362614 91177308-0d34-0410-b5e6-96231b3b80d8

[SLP] Fix regression in broadcasts caused by operand reordering patch D59973.

This patch fixes a regression caused by the operand reordering refactoring patch https://reviews.llvm.org/D59973 .
The fix changes the strategy to Splat instead of Opcode, if broadcast opportunities are found.
Please see the lit test for some examples.

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D62427

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362613 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopUtils][SLPVectorizer] clean up management of fast-math-flags

Instead of passing around fast-math-flags as a parameter, we can set those
using an IRBuilder guard object. This is no-functional-change-intended.

The motivation is to eventually fix the vectorizers to use and set the
correct fast-math-flags for reductions. Examples of that not behaving as
expected are:
https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast')
https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0)
D61802 (should be able to reduce with IR-level FMF)

Differential Revision: https://reviews.llvm.org/D62272

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362612 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopInfo] Fix unused variable warning. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362610 91177308-0d34-0410-b5e6-96231b3b80d8

Title: [LOOPINFO] Extend Loop object to add utilities to get the loop
bounds, step, and loop induction variable.

Summary: This PR extends the loop object with more utilities to get loop
bounds, step, and loop induction variable. There already exists passes
which try to obtain the loop induction variable in their own pass, e.g.
loop interchange. It would be useful to have a common area to get these
information.

/// Example:
/// for (int i = lb; i < ub; i+=step)
///   <loop body>
/// --- pseudo LLVMIR ---
/// beforeloop:
///   guardcmp = (lb < ub)
///   if (guardcmp) goto preheader; else goto afterloop
/// preheader:
/// loop:
///   i1 = phi[{lb, preheader}, {i2, latch}]
///   <loop body>
///   i2 = i1 + step
/// latch:
///   cmp = (i2 < ub)
///   if (cmp) goto loop
/// exit:
/// afterloop:
///
/// getBounds
///   getInitialIVValue      --> lb
///   getStepInst            --> i2 = i1 + step
///   getStepValue           --> step
///   getFinalIVValue        --> ub
///   getCanonicalPredicate  --> '<'
///   getDirection           --> Increasing
/// getInductionVariable          --> i1
/// getAuxiliaryInductionVariable --> {i1}
/// isCanonical                   --> false

Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara,
fhahn
Reviewed By: kbarton
Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya,
llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D60565

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362609 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][Codegen][X86] Add AVX2 runline for '(X & (C l>> Y)) ==/!= 0' tests

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362606 91177308-0d34-0410-b5e6-96231b3b80d8

UpdateTestChecks: hexagon support

Summary:
These tests are being affected by an upcoming patch,
so having an understandable (autogenerated) diff is helpful.

This target, again, prefers `-march`:
```
llvm/test/CodeGen/Hexagon$ grep -r triple | wc -l
467
llvm/test/CodeGen/Hexagon$ grep -r march | wc -l
1167
```

Reviewers: RKSimon, kparzysz

Reviewed By: kparzysz

Subscribers: xbolva00, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62867

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362605 91177308-0d34-0410-b5e6-96231b3b80d8

[MIPS GlobalISel] Select fcmp

Select floating point compare for MIPS32.

Differential Revision: https://reviews.llvm.org/D62721

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362603 91177308-0d34-0410-b5e6-96231b3b80d8

[yaml2obj] - Change how we handle implicit sections.

We have a few sections that can be added implicitly to the output:
".dynsym", ".dynstr", ".symtab", ".strtab" and ".shstrtab".

Problem appears when such section is listed explicitly in YAML.
In that case it's content is written twice:
first time during writing of regular sections listed in the document
and second time during special handling.

Because of that their file offsets can become unexpectedly broken:
(yaml file for sample below lists .dynsym explicitly before .text.foo)

Before patch:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .dynsym           DYNSYM           0000000000000100  00000250
       0000000000000030  0000000000000018   A       6     0     8
  [ 2] .text.foo         PROGBITS         0000000000000200  00000200
       0000000000000000  0000000000000000  AX       0     0     0

After patch:
Section Headers:
  [Nr] Name         Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .dynsym           DYNSYM           0000000000000100  00000200
       0000000000000030  0000000000000018   A       6     0     8
  [ 2] .text.foo         PROGBITS         0000000000000200  00000230
       0000000000000000  0000000000000000  AX       0     0     0

This patch reorganizes our code and fixes the issue described.

Differential revision: https://reviews.llvm.org/D62809

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362602 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Allow "-march=foo+fp" to vary with foo

This is the LLVM part of this change, the Clang part contains the full
description in its commit message.

Differential Revision: https://reviews.llvm.org/D60697

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362600 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] combineX86ShuffleChain - combine shuffle(extractsubvector(x),extractsubvector(y))

We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)), this relaxes the requirement to permit different sources as long as they have the same value type.

This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast so we don't see any increase in constant size.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362599 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r362578

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362598 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objdump] - Disassemble non-executable sections if specifically requested.

This is https://bugs.llvm.org/show_bug.cgi?id=41897.

Previously -d + -j .data had no effect, that wasn't consistent with GNU,
which proccesses .data in that case. With this patch we follow this behavior.

Diffeential revision: https://reviews.llvm.org/D62848

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362596 91177308-0d34-0410-b5e6-96231b3b80d8

[TargetLowering] SimplifyDemandedBits - pull out shift value type. NFCI.

Will be used more in an upcoming patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362595 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Add some nt-store test cases inspired by PR42123

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362594 91177308-0d34-0410-b5e6-96231b3b80d8

Sanitize llvm-size help

Remove irrelevant options from standard help output.

New output:

    OVERVIEW: llvm object size dumper

    USAGE: llvm-size [options] <input files>

    OPTIONS:

    Generic Options:

      --help           - Display available options (--help-hidden for more)
      --help-list      - Display list of available options (--help-list-hidden for more)
      --version        - Display the version of this program

    llvm-size Options:

      Specify output format
          -A             - System V format
          -B             - Berkeley format
          -m             - Darwin -m format
      --arch=<string>  - architecture(s) from a Mach-O file to dump
      --common         - Print common symbols in the ELF file.  When using Berkely format, this is added to bss.
      Print size in radix:
          -o             - Print size in octal
          -d             - Print size in decimal
          -x             - Print size in hexadecimal
      --format=<value> - Specify output format
        =sysv          -   System V format
        =berkeley      -   Berkeley format
        =darwin        -   Darwin -m format
      -l               - When format is darwin, use long format to include addresses and offsets.
      --radix=<value>  - Print size in radix
        =8             -   Print size in octal
        =10            -   Print size in decimal
        =16            -   Print size in hexadecimal
      --totals         - Print totals of all objects - Berkeley format only

Differential Revision: https://reviews.llvm.org/D62482

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362593 91177308-0d34-0410-b5e6-96231b3b80d8

[IPO] Disabled 'default only' switch statements to fix MSVC warnings.

@jdoerfert Looks like these are placeholders for incoming abstract attributes patches so I've just commented the code out, even though this is usually frowned upon.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362592 91177308-0d34-0410-b5e6-96231b3b80d8

Include what you use in PPCFrameLowering.h

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362590 91177308-0d34-0410-b5e6-96231b3b80d8

[CMake] Export CMAKE_CONFIGURATION_TYPES for the LLVM build-tree

Summary: Useful info for standalone builds of subprojects. If a multi-configuration generator was used for the provided LLVM build-tree, standalone builds should consider actual subdirectories per configuration in `find_program()` (e.g. looking for `llvm-lit` or `llvm-tblgen`).

Reviewers: labath, beanz, mgorny

Subscribers: lldb-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62878

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362588 91177308-0d34-0410-b5e6-96231b3b80d8

Resubmit "[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst"

This reverts commit 5b32f60ec31ce136edac6f693538aeb6039f4ad0.
The fix is in commit 4f9e68148bd0dada2d6997625432385918ac2e2c.

This patch fixes the CorrelatedValuePropagation pass to keep
prof branch_weights metadata of SwitchInst consistent.
It makes use of SwitchInstProfUpdateWrapper.
New tests are added.

Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D62126

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362583 91177308-0d34-0410-b5e6-96231b3b80d8

Suppress false-positive GCC -Wreturn-type warning.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362582 91177308-0d34-0410-b5e6-96231b3b80d8

Read .note.gnu.property sections and emit a merged .note.gnu.property section.

This patch also adds `--require-cet` option for the sake of testing.
The actual feature for IBT-aware PLT is not included in this patch.

This is a part of https://reviews.llvm.org/D59780. Submitting this
first should make it easy to work with a related change
(https://reviews.llvm.org/D62609).

Differential Revision: https://reviews.llvm.org/D62853

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362579 91177308-0d34-0410-b5e6-96231b3b80d8

[Attributor] Pass infrastructure and fixpoint framework

NOTE: Note that no attributes are derived yet. This patch will not go in
alone but only with others that derive attributes. The framework is
split for review purposes.

This commit introduces the Attributor pass infrastructure and fixpoint
iteration framework. Further patches will introduce abstract attributes
into this framework.

In a nutshell, the Attributor will update instances of abstract
arguments until a fixpoint, or a "timeout", is reached. Communication
between the Attributor and the abstract attributes that are derived is
restricted to the AbstractState and AbstractAttribute interfaces.

Please see the file comment in Attributor.h for detailed information
including design decisions and typical use case. Also consider the class
documentation for Attributor, AbstractState, and AbstractAttribute.

Reviewers: chandlerc, homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes, nicholas, reames

Subscribers: mehdi_amini, mgorny, hiraditya, bollu, steven_wu, dexonsmith, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59918

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362578 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][FnAttrs] Stress tests for attribute deduction

This commit is a preparation of upcoming patches on attribute deduction.
It will shorten the diffs and make it clear what we inferred before.

Reviewers: chandlerc, homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes

Subscribers: bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59903

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362577 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Collapse RLDICL/RLDICR into RLDIC when possible

Generally speaking, we lower to an optimal rotate sequence for nodes visible in
the SDAG. However, there are instances where the two rotates are not visible at
ISEL time - most notably those in a very common sequence when lowering switch
statements to jump tables.

A common situation is a switch on a 32-bit integer. This value has to have the
upper 32 bits cleared and because jump table offsets are word offsets, the value
needs to be shifted left by 2 bits. We currently emit the clear and the left
shift as two separate instructions, but this is not needed as we can lower it to
a single RLDIC.

This patch just cleans that up.

Differential revision: https://reviews.llvm.org/D60402

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362576 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC][NFC] Add codegen test for consecutive stores of vector elements

NFC commit of a test case in order for the subsequent review to show differences
in codegen.

Differential revision: https://reviews.llvm.org/D62843

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362573 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objdump/llvm-readobj/obj2yaml/yaml2obj] Support DT_PPC_GOT and DT_PPC_OPT

In glibc, DT_PPC_GOT indicates that PowerPC32 Secure PLT ABI is used.
I plan to use it in D62464.

DT_PPC_OPT currently indicates if a TLSDESC inspired TLS optimization is
enabled.

Reviewed By: grimar, jhenderson, rupprecht

Differential Revision: https://reviews.llvm.org/D62851

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362569 91177308-0d34-0410-b5e6-96231b3b80d8