granicus.if.org Git

[LV] Merge floating-point and integer induction widening code

This patch merges the existing floating-point induction variable widening code
into the integer induction variable widening code, creating a single set of
functions for both kinds of inductions. The primary motivation for doing this
is to enable vector phi node creation for floating-point induction variables.

Differential Revision: https://reviews.llvm.org/D30211

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296145 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Use subfic instruction for subtract from immediate

Provide a 64-bit pattern to use SUBFIC for subtracting from a 16-bit immediate.
The corresponding pattern already exists for 32-bit integers.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29387

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296144 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Use rldicr instruction for AND with an immediate if possible

Emit clrrdi (extended mnemonic for rldicr) for AND-ing with masks that
clear bits from the right hand size.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29388

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296143 91177308-0d34-0410-b5e6-96231b3b80d8

[APInt] Add APInt::extractBits() method to extract APInt subrange

The current pattern for extract bits in range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296141 91177308-0d34-0410-b5e6-96231b3b80d8

Fixed IntOperandMatcher::emitCxxPredicateExpr arguments

Extra const in the StringRef argument meant that MSVC complained about it not correctly overriding from OperandPredicateMatcher::emitCxxPredicateExpr (which didn't have the const)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296138 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] add missing folds for scalar select of {-1,0,1}

The motivation for filling out these select-of-constants cases goes back to D24480,
where we discussed removing an IR fold from add(zext) --> select. And that goes back to:
https://reviews.llvm.org/rL75531
https://reviews.llvm.org/rL159230

The idea is that we should always canonicalize patterns like this to a select-of-constants
in IR because that's the smallest IR and the best for value tracking. Note that we currently
do the opposite in some cases (like the cases in *this* patch). Ie, the proposed folds in
this patch already exist in InstCombine today:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151

As this patch shows, most targets generate better machine code for simple ext/add/not ops
rather than a select of constants. So the follow-up steps to make this less of a patchwork
of special-case folds and missing IR canonicalization:

1. Have DAGCombiner convert any select of constants into ext/add/not ops.
2 Have InstCombine canonicalize in the other direction (create more selects).

Differential Revision: https://reviews.llvm.org/D30180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296137 91177308-0d34-0410-b5e6-96231b3b80d8

Recommit "[mips] Fix atomic compare and swap at O0."

This time with the missing files.

Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30257

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296134 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[mips] Fix atomic compare and swap at O0."

This reverts r296132. I forgot to include the tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296133 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Fix atomic compare and swap at O0.

Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30257

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296132 91177308-0d34-0410-b5e6-96231b3b80d8

[globalisel] Decouple src pattern operands from dst pattern operands.

Summary:
This isn't testable for AArch64 by itself so this patch also adds
support for constant immediates in the pattern and physical
register uses in the result.

The new IntOperandMatcher matches the constant in patterns such as
'(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold
immediates into an instruction so this is the first rule that will match
across multiple BB's.

The Renderer hierarchy is responsible for adding operands to the result
instruction. Renderers can copy operands (CopyRenderer) or add physical
registers (in particular %wzr and %xzr) to the result instruction
in any order (OperandMatchers now import the operand names from
SelectionDAG to allow renderers to access any operand). This allows us to
emit the result instruction for:
%1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0
%1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0
although the latter is untested since the matcher/importer has not been
taught about commutativity yet.

Added BuildMIAction which can build new instructions and mutate them where
possible. W.r.t the mutation aspect, MatchActions are now told the name of
an instruction they can recycle and BuildMIAction will emit mutation code
when the renderers are appropriate. They are appropriate when all operands
are rendered using CopyRenderer and the indices are the same as the matcher.
This currently assumes that all operands have at least one matcher.

Finally, this change also fixes a crash in
AArch64InstructionSelector::select() caused by an immediate operand
passing isImm() rather than isCImm(). This was uncovered by the other
changes and was detected by existing tests.

Depends on D29711

Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar

Reviewed By: rovka

Subscribers: aemerson, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29712

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296131 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Target shuffle combine can try to combine up to 16 vectors

Noticed while profiling PR32037, the target shuffle ops were being stored in SmallVector<*,8> types but the combiner could store as many as 16 ops at maximum depth (2 per depth).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296130 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] don't try SimplifyDemandedInstructionBits from zext/sext because it's slow and unnecessary

This one seems more obvious than D30270 that it can't make improvements because an extension always needs
all of the incoming bits. There's one specific transform in SimplifyDemandedInstructionBits of converting
a sext to a zext when the sign-bit is known zero, but that is handled explicitly in visitSext() with
ComputeSignBit().

Like D30270, there are no IR differences (other than instruction names) for the case in PR32037:
https://bugs.llvm.org//show_bug.cgi?id=32037
...and no regression test differences.

Zext/sext are a smaller part of the profile, but this still appears to shave off another 0.5% or so from
'opt -O2'.

Differential Revision: https://reviews.llvm.org/D30280

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296129 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] use DAG.getAllOnesConstant(); NFCI

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296128 91177308-0d34-0410-b5e6-96231b3b80d8

Fix missing call to base class constructor in r296121.

The 'Kind' member used in RTTI for InstructionPredicateMatcher was not
initialized but went undetected since I always ended up with the correct value.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296126 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Handle 64 bit immediate in and/or/xor pseudo instructions on mips64

Previously LLVM was assuming 32-bit signed immediates which results in and with
a bitmask that has bit 31 set to incorrectly include bits 63-32 in the result.
After applying this patch I can now compile all of the FreeBSD mips assembly
code with clang.

This issue also affects the nor, slt and sltu macros and I will fix those in a
separate review.

Patch By: Alexander Richardson

Commit message reformatted by sdardis.

Reviewers: atanasyan, theraven, sdardis

Differential Revision: https://reviews.llvm.org/D30298

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296125 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] GlobalISel: Select G_STORE

Same as selecting G_LOAD.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296122 91177308-0d34-0410-b5e6-96231b3b80d8

[globalisel] Sort RuleMatchers by priority.

Summary:
This makes more important rules have priority over less important rules.
For example, '%a = G_ADD $b:s64, $c:s64' has priority over
'%a = G_ADD $b:s32, $c:s32'. Previously these rules were emitted in the
correct order by chance.

NFC in this patch but it is required to make the next patch work correctly.

Depends on D29710

Reviewers: t.p.northover, ab, qcolombet, aditya_nandakumar, rovka

Reviewed By: ab, rovka

Subscribers: javed.absar, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D29711

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296121 91177308-0d34-0410-b5e6-96231b3b80d8

Minor test fix

The test was using a size of 8 for loading/storing pointers. It should be 4.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296120 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] GlobalISel: Add reg bank mappings for stores

Same as the ones for loads.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296115 91177308-0d34-0410-b5e6-96231b3b80d8

[mips][mc] Fix a crash when disassembling odd sized sections

Attempt to fix failing test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296112 91177308-0d34-0410-b5e6-96231b3b80d8

Fixup r296105 - only run tests on Mips

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296111 91177308-0d34-0410-b5e6-96231b3b80d8

Fix signed/unsigned comparison warnings

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296109 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] GlobalISel: Legalize stores

Allow the same types that we allow for loads.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296108 91177308-0d34-0410-b5e6-96231b3b80d8

[mips][mc] Fix a crash when disassembling odd sized sections

Corresponding test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296106 91177308-0d34-0410-b5e6-96231b3b80d8

[mips][mc] Fix a crash when disassembling odd sized sections

Make the MIPS disassembler consistent with the other targets in returning
a Size of zero when the input buffer cannot contain an instruction due
to it's size. Previously it reported the minimum instruction size when
it failed due to the buffer not being big enough for an instruction
causing llvm-objdump to crash when disassembling all sections.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D29984

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296105 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[ARM] GlobalISel: Legalize stores"

This reverts commit r296103 because the test broke on one of the bots. Sorry!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296104 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] GlobalISel: Legalize stores

Allow the same types that we allow for loads.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296103 91177308-0d34-0410-b5e6-96231b3b80d8

[APInt] Add APInt::setBits() method to set all bits in range

The current pattern for setting bits in range is typically:

Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation memory for the temporary variable.

This is one of the key compile time issues identified in PR32037.

This patch adds the APInt::setBits() helper method which avoids the temporary memory allocation completely, this first implementation uses setBit() internally instead but already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible.

I investigated whether there is need for APInt::clearBits() and APInt::flipBits() equivalents but haven't seen these patterns to be particularly common, but reusing the code would be trivial.

Differential Revision: https://reviews.llvm.org/D30265

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296102 91177308-0d34-0410-b5e6-96231b3b80d8

Add missing initialization for MachineOptimizationRemarkEmitter

This was missed in r293110.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296096 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Add a README.txt entry for mergeable sections.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296095 91177308-0d34-0410-b5e6-96231b3b80d8

[AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296094 91177308-0d34-0410-b5e6-96231b3b80d8

[ExecutionDepsFix] Use range-based for loop. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296093 91177308-0d34-0410-b5e6-96231b3b80d8

[IR][X86] Fix llvm version number in comments in AutoUpgrade. Forgot the next release is 5.0 not 4.1

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296092 91177308-0d34-0410-b5e6-96231b3b80d8

[AVX-512] Remove lzcnt intrinsics and autoupgrade them to generic ctlz intrinsics with select.

Clang has been emitting cltz intrinsics for a while now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296091 91177308-0d34-0410-b5e6-96231b3b80d8

[AVX-512] Move lzcnt and conflict intrinsic tests to avx512cd intrinsic test file since that's their feature.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296090 91177308-0d34-0410-b5e6-96231b3b80d8

[AVX-512] Use update_llc_test_checks.py to generate a test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296089 91177308-0d34-0410-b5e6-96231b3b80d8

[Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack

The Fuchsia ABI defines slots from the thread pointer where the
stack-guard value for stack-protector, and the unsafe stack pointer
for safe-stack, are stored. This parallels the Android ABI support.

Patch by Roland McGrath

Differential Revision: https://reviews.llvm.org/D30237

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296081 91177308-0d34-0410-b5e6-96231b3b80d8

Add some testcases for bitfields with illegal widths.

clang will generate IR like this for input using packed bitfields;
very simple semantically, but it's a bit tricky to actually
generate good code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296080 91177308-0d34-0410-b5e6-96231b3b80d8

Fix old testcase for dead store to match the original intent.

The x86 backend has a special case for load+xor+store, which isn't really
what this is trying to test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296077 91177308-0d34-0410-b5e6-96231b3b80d8

Fix an iterator invalidation bug when simplifying LIC user.

LoopUnswitch/simplify-with-nonvalness.ll is the test case for this.
The LIC has 2 users and deleting the 1st user when it can be simplified
invalidated the iterator for the 2nd user.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296069 91177308-0d34-0410-b5e6-96231b3b80d8

[LazyMachineBFI] Add testcase

This is based on Justin's testcase and checking whether BFI is not populated
in case hotness is off.

This is a patch meant on top of Justin's patch to enable Machine opt-remarks
in the
AsmPrinter (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170130/426595.html)

Differential Revision: https://reviews.llvm.org/D29837

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296065 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r269060 to pacify bots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296064 91177308-0d34-0410-b5e6-96231b3b80d8

OptDiag: Add test for r296053

Forgot to commit this with the change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296061 91177308-0d34-0410-b5e6-96231b3b80d8

[CGP] Split some critical edges coming out of indirect branches

Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296060 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel] Use the same name for all remarks.

While there, switch to the explicit ctor.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296059 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel] Use the DISubprogram for translation failure remarks.

Justin added support for DISubprogram locs in r295531 and r296052.
Use that instead of no-loc for constants and arguments.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296058 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel] Remove now-unnecessary variable. NFC.

Since r296047, we're able to return early on failures.
Don't track whether we succeeded.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296057 91177308-0d34-0410-b5e6-96231b3b80d8

Fix unit tests after r296049.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296055 91177308-0d34-0410-b5e6-96231b3b80d8

OptDiag: Summarize the instruction count in asm-printer

Add an optimization remark to asm-printer that summarizes the number
of instructions emitted per function.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296053 91177308-0d34-0410-b5e6-96231b3b80d8

OptDiag: Use DiagnosticLocation in MachineOptimizationRemarks

DiagnosticInfo switched from DebugLoc to DiagnosticLocation in
r295519, update these subclasses to match.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296052 91177308-0d34-0410-b5e6-96231b3b80d8

[msan] Fix instrumentation of array allocas.

Before this, MSan poisoned exactly one element of any array alloca,
even if the number of elements was zero.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296050 91177308-0d34-0410-b5e6-96231b3b80d8

Implement some methods for NativeRawSymbol

This allows the ability to call IPDBSession::getGlobalScope with a NativeSession and
to then query it for some basic fields from the PDB's InfoStream.
Note that the symbols now have non-const references back to the Session so that
NativeRawSymbol can access the PDBFile through the Session.

Differential Revision: https://reviews.llvm.org/D30314

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296049 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel] Don't translate other blocks when one failed.

We were stopping the translation of the parent block when the
translation of an instruction failed, but we were still trying to
translate the other blocks of the parent function.

Don't do that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296047 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel] Finalize translated function on scope exit. NFC.

This is the compromise between having a per-function IRTranslator
and manually managing the per-function state.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296046 91177308-0d34-0410-b5e6-96231b3b80d8

fix 80-column violation

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296045 91177308-0d34-0410-b5e6-96231b3b80d8

Delete outdated comment. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296043 91177308-0d34-0410-b5e6-96231b3b80d8

LoopUnswitch - Simplify based on known not to a be constant.

Summary: In case we do not know what the condition is in an unswitched loop, but we know its definitely NOT a known constant. We can perform simplifcations based on this information.

Reviewers: sanjoy, hfinkel, chenli, efriedma

Reviewed By: efriedma

Subscribers: david2050, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D28968

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296041 91177308-0d34-0410-b5e6-96231b3b80d8

[OptDiag] Comment about the legacy status of emitOptimizationRemark*
functions

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296039 91177308-0d34-0410-b5e6-96231b3b80d8

[OptDiag] Remove hotness parameter from legacy remark ctors

Anything using hotness should be using ORE.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296038 91177308-0d34-0410-b5e6-96231b3b80d8

[OptDiag] Hide legacy remark ctors

These are only used when emitting remarks without ORE directly using the free
functions emitOptimizationRemark*.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296037 91177308-0d34-0410-b5e6-96231b3b80d8

[ADT] Fix zip iterator interface.

This commit provides `zip_{first,shortest}` with the standard member types and
methods expected of iterators (e.g., `difference_type`), in order for zip to be
used with other adaptors, such as `make_filter_range`.

Support for reverse iteration has also been added.

Differential Revision: https://reviews.llvm.org/D30246

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296036 91177308-0d34-0410-b5e6-96231b3b80d8

[IR] Add a Instruction::dropPoisonGeneratingFlags helper

Summary:
The helper will be used in a later change. This change itself is NFC
since the only user of this new function is its unit test.

Reviewers: majnemer, efriedma

Reviewed By: efriedma

Subscribers: efriedma, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D30184

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296035 91177308-0d34-0410-b5e6-96231b3b80d8

[NVPTX] Added support for .f16x2 instructions.

This patch enables support for .f16x2 operations.

Added new register type Float16x2.
Added support for .f16x2 instructions.
Added handling of vectorized loads/stores of v2f16 values.

Differential Revision: https://reviews.llvm.org/D30057
Differential Revision: https://reviews.llvm.org/D30310

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296032 91177308-0d34-0410-b5e6-96231b3b80d8

ARM: make sure FastISel bails on f64 operations for Cortex-M4.

FastISel wasn't checking the isFPOnlySP subtarget feature before emitting
double-precision operations, so it got completely invalid CodeGen for doubles
on Cortex-M4F.

The normal ISel testing wasn't spectacular either so I added a second RUN line
to improve that while I was in the area.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296031 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r282872 "CVP. Turn marking adds as no wrap on by default"

While not CVP's fault, this caused miscompiles (PR31181). Reverting
until those are resolved.

(This also reverts the follow-ups r288154 and r288161 which removed the
flag.)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296030 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-cov] Strip redundant path components from filenames (fix PR31982)

Instead of stripping the longest common prefix off of the filenames in a
report, strip out the longest chain of redundant path components. This
fixes the case in PR31982, where there are two files with the same
prefix, and stripping out the LCP makes things less intelligible.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296029 91177308-0d34-0410-b5e6-96231b3b80d8

Add call branch annotation for ICP promoted direct call in SamplePGO mode.

Summary: SamplePGO uses branch_weight annotation to represent callsite hotness. When ICP promotes an indirect call to direct call, we need to make sure the direct call is annotated with branch_weight in SamplePGO mode, so that downstream function inliner can use hot callsite heuristic.

Reviewers: davidxl, eraman, xur

Reviewed By: davidxl, xur

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D30282

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296028 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Handle saturations in Hexagon bit tracker

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296026 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Allow setting register in BitVal without storing into map

In the bit tracker, references to other bit values in which the register
is 0 are prohibited. This means that generating self-referential register
cells like { w:32 [0-15]:s[0-15] [16-31]:s[15] } is impossible. In order
to get a self-referential cell, it had to be stored into a map and then
reloaded from it. To avoid this step, add a function that will set the
register to a given value without going through the map.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296025 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Shut the warning "getRegUnitWeight hides overload...". NFC.

Clang issues warning about hidden overload. That was intended, so
add "using AMDGPUGenRegisterInfo::getRegUnitWeight;" to mute it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296021 91177308-0d34-0410-b5e6-96231b3b80d8

[ORE] Remove ORE.emit{{.+}} functions

Last use was killed in my previous patch. The preferred way is now to
construct the remark, pipe things to it and pass it to ORE.emit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296019 91177308-0d34-0410-b5e6-96231b3b80d8

CodeGen: MachineBlockPlacement: Rename member to more general name. NFC.

Rename ComputedTrellisEdges to ComputedEdges to allow for other methods of
pre-computing edges.

Differential Revision: https://reviews.llvm.org/D30308

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296018 91177308-0d34-0410-b5e6-96231b3b80d8

[LAA] Remove unused LoopAccessReport

The need for this removed when I converted everything to use the opt-remark
classes directly with the streaming interface.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296017 91177308-0d34-0410-b5e6-96231b3b80d8

[LV] Remove unused VectorizationReport

The need for this removed when I converted everything to use the opt-remark
classes directly with the streaming interface.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296016 91177308-0d34-0410-b5e6-96231b3b80d8

Disable TLS for stack protector on Android API<17.

The TLS slot did not exist back then.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296014 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel] Emit opt remarks on isel fallbacks.

Having more fine-grained information on the specific construct that
caused us to fallback is valuable for large-scale data collection.

We still have the fallback warning, that's also used for FastISel.
We still need to remove the fallback warning, and teach FastISel to also
emit remarks (it currently has a combination of the warning, stats, and
debug prints: the remarks could unify all three).

The abort-on-fallback path could also be better handled using remarks:
one could imagine a "-Rpass-error", analoguous to "-Werror", which would
promote missed/failed remarks to errors. It's not clear whether that
would be useful for other remarks though, so we're not there yet.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296013 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeGen] Teach opt remarks how to print MI instructions.

This will be used with GISel opt remarks.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296012 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeGen] Print MI without a newline when skipping debugloc. NFC.

This matches the behavior for skip-operands. While there, document it.
This is a follow-up to r296007.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296011 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeGen] Use const MBBs in the opt remark diagnostics. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296010 91177308-0d34-0410-b5e6-96231b3b80d8

Correct register pressure calculation in presence of subregs

If a subreg is used in an instruction it counts as a whole superreg
for the purpose of register pressure calculation. This patch corrects
improper register pressure calculation by examining operand's lane mask.

Differential Revision: https://reviews.llvm.org/D29835

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296009 91177308-0d34-0410-b5e6-96231b3b80d8

[ORE] Use const CodeRegions in the remark diagnostics. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296008 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeGen] Add a way to SkipDebugLoc in MachineInstr::print(). NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296007 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel] Simplify Select type cleanup using a ScopeExit. NFC.

This lets us use more natural early-returns when selection fails.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296006 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "Teach the IR verifier to reject conflicting debug info for function arguments."

This reverts commit r295749 while investigating PR32042.

It looks like this check uncovered a problem in the frontend that
needs to be fixed before the check can be enabled again.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296005 91177308-0d34-0410-b5e6-96231b3b80d8

[DAG] add convenience function to get -1 constant; NFCI

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296004 91177308-0d34-0410-b5e6-96231b3b80d8

[Reassociate] Add negated value of negative constant to the Duplicates list.

In OptimizeAdd, we scan the operand list to see if there are any common factors
between operands that can be factored out to reduce the number of multiplies
(e.g., 'A*A+A*B*C+D' -> 'A*(A+B*C)+D'). For each operand of the operand list, we
only consider unique factors (which is tracked by the Duplicate set). Now if we
find a factor that is a negative constant, we add the negated value as a factor
as well, because we can percolate the negate out. However, we mistakenly don't
add this negated constant to the Duplicates set.

Consider the expression A*2*-2 + B. Obviously, nothing to factor.

For the added value A*2*-2 we over count 2 as a factor without this change,
which causes the assert reported in PR30256. The problem is that this code is
assuming that all the multiply operands of the add are already reassociated.
This change avoids the issue by making OptimizeAdd tolerate multiplies which
haven't been completely optimized; this sort of works, but we're doing wasted
work: we'll end up revisiting the add later anyway.

Another possible approach would be to enforce RPO iteration order more strongly.
If we have RedoInsts, we process them immediately in RPO order, rather than
waiting until we've finished processing the whole function. Intuitively, it
seems like the natural approach: reassociation works on expression trees, so
the optimization only works in one direction. That said, I'm not sure how
practical that is given the current Reassociate; the "optimal" form for an
expression depends on its use list (see all the uses of "user_back()"), so
Reassociate is really an iterative optimization of sorts, so any changes here
would probably get messy.

PR30256

Differential Revision: https://reviews.llvm.org/D30228

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296003 91177308-0d34-0410-b5e6-96231b3b80d8

Use base discriminator in sample pgo profile matching.

Summary: The discriminator has been encoded, and only the base discriminator should be used during profile matching.

Reviewers: dblaikie, davidxl

Reviewed By: dblaikie, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30218

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295999 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Avoid IMPLICIT_DEFs as new-value producers

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295997 91177308-0d34-0410-b5e6-96231b3b80d8

[LazyMachineBFI] Reimplement with getAnalysisIfAvailable

Since LoopInfo is not available in machine passes as universally as in IR
passes, using the same approach for OptimizationRemarkEmitter as we did for IR
will run LoopInfo and DominatorTree unnecessarily.  (LoopInfo is not used
lazily by ORE.)

To fix this, I am modifying the approach I took in D29836.  LazyMachineBFI now
uses its client passes including MachineBFI itself that are available or
otherwise compute them on the fly.

So for example GreedyRegAlloc, since it's already using MBFI, will reuse that
instance.  On the other hand, AsmPrinter in Justin's patch will generate DT,
LI and finally BFI on the fly.

(I am of course wondering now if the simplicity of this approach is even
preferable in IR.  I will do some experiments.)

Testing is provided by an updated version of D29837 which requires Justin's
patch to bring ORE to the AsmPrinter.

Differential Revision: https://reviews.llvm.org/D30128

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295996 91177308-0d34-0410-b5e6-96231b3b80d8

[AddressSanitizer] Add PS4 offset

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295994 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] use loop instead of recursion to peek through FPExt; NFCI

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295992 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] use 'match' to reduce code; NFCI

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295991 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/SI: Fix trunc i16 pattern

Hit on ASICs that support 16bit instructions.

Differential Revision: https://reviews.llvm.org/D30281

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295990 91177308-0d34-0410-b5e6-96231b3b80d8

Strip trailing whitespace.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295989 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Patterns for CTPOP, BSWAP and BITREVERSE

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295981 91177308-0d34-0410-b5e6-96231b3b80d8

[docs] Add information about how to checkout polly to getting started page

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295974 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] GlobalISel: Lower call returns

Introduce a common ValueHandler for call returns and formal arguments, and
inherit two different versions for handling the differences (at the moment the
only difference is the way physical registers are marked as used).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295973 91177308-0d34-0410-b5e6-96231b3b80d8

[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong
result

Summary:
If the same value is used several times as an extra value, SLP
vectorizer takes it into account only once instead of actual number of
using.
For example:
```
int val = 1;
for (int y = 0; y < 8; y++) {
  for (int x = 0; x < 8; x++) {
    val = val + input[y * 8 + x] + 3;
  }
}
```
We have 2 extra rguments: `1` - initial value of horizontal reduction
and `3`, which is added 8*8 times to the reduction. Before the patch we
added `1` to the reduction value and added once `3`, though it must be
added 64 times.

Reviewers: mkuper, mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30262

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295972 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] GlobalISel: Lower call parameters in regs

Add support for lowering calls with parameters than can fit into regs. Use the
same ValueHandler that we used for function returns, but rename it to match its
new, extended purpose.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295971 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel.

(Quick fix to buildbot failure after rL295940 commit).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295970 91177308-0d34-0410-b5e6-96231b3b80d8