granicus.if.org Git

[TableGen][GlobalISel] Make the arguments of the Instruction and Operand Matchers consistent

Move InsnVarID and OpIdx at the beginning of the list of arguments
for all the constructors of the OperandMatcher subclasses.
This matches what we do for the InstructionMatcher.

NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321031 91177308-0d34-0410-b5e6-96231b3b80d8

Fix buffer overrun in WindowsResourceCOFFWriter::writeSymbolTable()

Summary:
We were using sprintf(..., "$R06X", <some uint32_t>) to create strings
that are expected to be exactly length 8, but this results in longer
strings if the uint32_t is greater than 0xffffff. This change modifies
the behavior as follows:

- Uses the loop counter instead of the data offset. This gives us
sequential symbol names, avoiding collisions as much as possible.

- Masks the value to 0xffffff to avoid generating names longer than 8
bytes.

- Uses formatv instead of sprintf.

Fixes PR35581.

Reviewers: ruiu, zturner

Reviewed By: ruiu

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41270

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321030 91177308-0d34-0410-b5e6-96231b3b80d8

Add test for .req directive starting with 'p'

Reduced test case from libjpeg_turbo.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321029 91177308-0d34-0410-b5e6-96231b3b80d8

[MachineOutliner][NFC] Gardening: use std::any_of instead of bool + loop

River Riddle suggested to use std::any_of instead of the bool + loop thing on
r320229. This commit does that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321028 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Don't use NOPL when the assembler is passed an empty CPU string. Update tests to force a CPU with NOPL

Empty string should be equivalent to "generic" which doesn't allow NOPL. Force tests to use specificy 'pentiumpro' to guarantee NOPL.

Fixes PR35686

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321026 91177308-0d34-0410-b5e6-96231b3b80d8

[TableGen][GlobalISel] Refactor optimizeRules related bit to allow code reuse

In theory, reapplying optimizeRules on each group matchers should give
us a second nesting level on the matching table. In practice, we need
more work to make that happen because all the predicates are actually
not directly available through the predicate matchers list.

NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321025 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[AArch64][SVE] Asm" changes, they broke libjpeg_turbo

This reverts changes r320992, r320986, r320973, and r320970.

r320970 by itself breaks the test case, and the rest depend on it.

Test case will land soon.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321024 91177308-0d34-0410-b5e6-96231b3b80d8

[Analysis] Generate more precise TBAA tags when one access encloses the other

There are cases when two tags with different base types denote
accesses to the same direct or indirect member of a structure
type. Currently, merging of such tags results in a tag that
represents an access to an object that has the type of that
member. This patch changes this so that if one of the accesses
encloses the other, then the generic tag is the one of the
enclosed access.

Differential Revision: https://reviews.llvm.org/D39557

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321019 91177308-0d34-0410-b5e6-96231b3b80d8

[PGO] Fix handling of cold entry count for instrumented PGO

Summary:
In r277849, getEntryCount was changed to return None when the entry
count was 0, specifically for SamplePGO where it means no samples were
recorded. However, for instrumentation PGO a 0 entry count should be
returned directly, since it does mean that the function was completely
cold. Otherwise we end up treating these functions conservatively
in isFunctionEntryCold() and isColdBB().

Instead, for SamplePGO use -1 when there are no samples, and change
getEntryCount to return None when the value is -1.

Reviewers: danielcdh, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41307

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321018 91177308-0d34-0410-b5e6-96231b3b80d8

[TableGen][GlobalISel] Optimize MatchTable for faster instruction selection

*** Context ***

Prior to this patchw, the table generated for matching instruction was
straight forward but highly inefficient.

Basically, each pattern generates its own set of self contained checks
and actions.
E.g., TableGen generated:
// First pattern
CheckNumOperand 3
CheckOpcode G_ADD
...
Build ADDrr
// Second pattern
CheckNumOperand 3
CheckOpcode G_ADD
...
Build ADDri
// Third pattern
CheckNumOperand 3
CheckOpcode G_SUB
...
Build SUBrr

*** Problem ***

Because of that generation, a *lot* of check were redundant between each
pattern and were checked every single time until we reach the pattern
that matches.
E.g., Taking the previous table, let say we are matching a G_SUB, that
means we were going to check all the rules for G_ADD before looking at
the G_SUB rule. In particular we are going to do:
check 3 operands; PASS
check G_ADD; FAIL
; Next rule
check 3 operands; PASS (but we already knew that!)
check G_ADD; FAIL (well it is still not true)
; Next rule
check 3 operands; PASS (really!!)
check G_SUB; PASS (at last :P)

*** Proposed Solution ***

This patch introduces a concept of group of rules (GroupMatcher) that
share some predicates and only get checked once for the whole group.

This patch only creates groups with one nesting level. Conceptually
there is nothing preventing us for having deeper nest level. However,
the current implementation is not smart enough to share the recording
(aka capturing) of values. That limits its ability to do more sharing.

For the given example the current patch will generate:
// First group
CheckOpcode G_ADD

// First pattern
CheckNumOperand 3
...
Build ADDrr
// Second pattern
CheckNumOperand 3
...
Build ADDri

// Second group
CheckOpcode G_SUB

// Third pattern
CheckNumOperand 3
...
Build SUBrr

But if we allowed several nesting level, it could create a sub group
for the checknumoperand 3.
(We would need to call optimizeRules on the rules within a group.)

*** Result ***

With only one level of nesting, the instruction selection pass is up
to 4x faster. For instance, one instruction now takes 500 checks,
instead of 24k! With more nesting we could get in the tens I believe.

Differential Revision: https://reviews.llvm.org/D39034

rdar://problem/34670699

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321017 91177308-0d34-0410-b5e6-96231b3b80d8

Fix more inconsistent line endings. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321016 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Minor formatting fix to getHostCPUFeatures. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321015 91177308-0d34-0410-b5e6-96231b3b80d8

[MachineOutliner] Recommit r320229

LR was undefined entering outlined functions that contain calls. This made the
machine verifier unhappy when expensive checks were enabled. This fixes that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321014 91177308-0d34-0410-b5e6-96231b3b80d8

[PPC] Also disable the pre-emit version of reg+reg to reg+imm transformation.

This has the same issue as the early pass disabled in r321010.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321013 91177308-0d34-0410-b5e6-96231b3b80d8

[cmake] Update experimental target error message

Summary:
Update this error message indicate this test only ensures experimental
targets were passed via LLVM_EXPERIMENTAL_TARGETS_TO_BUILD.

Originally, this test validated all targets, but in r184923, it was moved
after the LLVMBUILDTOOL test, which also validates all targets, making
that part of the test redundant.

Differential Revision: https://reviews.llvm.org/D41273

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321012 91177308-0d34-0410-b5e6-96231b3b80d8

Recommit "[DWARFv5] Dump an MD5 checksum in the line-table header."
Adds missing support for DW_FORM_data16.

Update of r320852/r320886, fixing the unittest again, this time use a
raw char string for the test data.

Differential Revision: https://reviews.llvm.org/D41090

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321011 91177308-0d34-0410-b5e6-96231b3b80d8

[PPC] Disable reg+reg to reg+imm transformation.

It creates invalid instructions. PR35688.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321010 91177308-0d34-0410-b5e6-96231b3b80d8

Fix inconsistent line endings in HexagonVectorLoopCarriedReuse.cpp. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321009 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Higher versions of HVX imply presence of lower versions

The code in Hexagon_MC::completeHVXFeatures wasn't setting all HVX-
related features correctly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321008 91177308-0d34-0410-b5e6-96231b3b80d8

[IR] Support the new TBAA metadata format in IR verifier

Differential Revision: https://reviews.llvm.org/D40438

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321007 91177308-0d34-0410-b5e6-96231b3b80d8

Fix inconsistent line endings in ARCDisassembler.cpp. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321006 91177308-0d34-0410-b5e6-96231b3b80d8

i[Hexagon] ANY_EXTEND_VECTOR_INREG should be Custom, not Legal in r321004

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321005 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Generate HVX code for vector sign-, zero- and any-extends

Implement any-extend as zero-extend.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321004 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Regenerate test to improve codegen testing for D41350

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321003 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Prefer to widen HVX vectors instead of promoting

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321002 91177308-0d34-0410-b5e6-96231b3b80d8

Removed unused DominanceFrontier

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321001 91177308-0d34-0410-b5e6-96231b3b80d8

[ThinLTO] Make distributed indexes test more robust

Modify test so that it passes in the reverse-iteration bot.
We use DenseMap instead of std::map for the summaries to emit into
distributed index files. The iteration order is not defined, but
it is deterministic, which is good enough.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321000 91177308-0d34-0410-b5e6-96231b3b80d8

[PGO] add MST min edge selection heuristic to ensure non-zero entry count

Differential Revision: http://reviews.llvm.org/D41059

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320998 91177308-0d34-0410-b5e6-96231b3b80d8

[YAML] Add support for non-printable characters

LLVM IR function names which disable mangling start with '\01'
(https://www.llvm.org/docs/LangRef.html#identifiers).

When an identifier like "\01@abc@" gets dumped to MIR, it is quoted, but
only with single quotes.

http://www.yaml.org/spec/1.2/spec.html#id2770814:

"The allowed character range explicitly excludes the C0 control block
allowed), the surrogate block #xD800-#xDFFF, #xFFFE, and #xFFFF."

http://www.yaml.org/spec/1.2/spec.html#id2776092:

"All non-printable characters must be escaped.
[...]
Note that escape sequences are only interpreted in double-quoted scalars."

This patch adds support for printing escaped non-printable characters
between double quotes if needed.

Should also fix PR31743.

Differential Revision: https://reviews.llvm.org/D41290

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320996 91177308-0d34-0410-b5e6-96231b3b80d8

[IR] Add MDBuilder helpers for the new TBAA metadata format

The new helpers are supposed to be used in clang to generate TBAA
information in the new format proposed in this thread:

http://lists.llvm.org/pipermail/llvm-dev/2017-November/118748.html

Differential Revision: https://reviews.llvm.org/D39956

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320993 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][SVE] Asm: Improve diagnostics further when +sve is not specified

Summary: Patch [4/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions. This patch further improves diagnostic messages for when the SVE feature is not specified.

Reviewers: rengolin, fhahn, olista01, echristo, efriedma

Reviewed By: fhahn

Subscribers: sdardis, aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D40363

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320992 91177308-0d34-0410-b5e6-96231b3b80d8

Reland "[mips] Fix the target specific instruction verifier"

Fix an off by one error in the bounds checking for 'dinsu' and update
the ranges in the test comments so that they are accurate.

This version has the correct commit message.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D41183

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320991 91177308-0d34-0410-b5e6-96231b3b80d8

[Memcpy Loop Lowering] Remove the fixed int8 lowering.

Switch over to the lowering that uses target supplied operand types.

Differential Revision: https://reviews.llvm.org/D41201

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320989 91177308-0d34-0410-b5e6-96231b3b80d8

[TableGen][AsmMatcherEmitter] Only choose specific diagnostic for enabled instruction

Summary:
When emitting a diagnostic for an invalid operand, a specific diagnostic
should only be reported when the instruction being matched is actually
enabled by the feature flags.

Patch [3/4] in a series to add parsing of predicates and properly parse SVE
ZIP1/ZIP2 instructions. This patch fixes bogus diagnostic messages for when
the SVE feature is not specified.

Reviewers: rengolin, craig.topper, olista01, sdardis, stoklund

Reviewed By: olista01, sdardis

Subscribers: fhahn, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D40362

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320986 91177308-0d34-0410-b5e6-96231b3b80d8

[LVI] Support for ashr in LVI

Enhance LVI to analyze the ‘ashr’ binary operation. This leverages the infrastructure in ConstantRange for the ashr operation.

Patch by Surya Kumari Jangala!

Differential Revision: https://reviews.llvm.org/D40886

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320983 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM GlobalISel] Fix G_(UN)MERGE_VALUES handling after r319524

r319524 has made more G_MERGE_VALUES/G_UNMERGE_VALUES pairs legal than
are supported by the rest of the pipeline. Restrict that to only the
cases that we can currently handle: packing 32-bit values into 64-bit
ones, when we have hardware FP.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320980 91177308-0d34-0410-b5e6-96231b3b80d8

Constexprify LaneBitmask factory methods.

This avoids global constructors when they're used in a global constant.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320979 91177308-0d34-0410-b5e6-96231b3b80d8

[ConstantRange] Support for ashr in ConstantRange computation

Extend the ConstantRange implementation to compute the range of possible values resulting from an arithmetic right shift operation.
There will be a follow up patch to leverage this constant range infrastructure in LazyValueInfo.

Patch by Surya Kumari Jangala!

Differential Revision: https://reviews.llvm.org/D40881

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320976 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[mips] Fix the target specific instruction verifier"

This reverts commit r320974. The commit message lacked the Differential Revison: line.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320975 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Fix the target specific instruction verifier

Fix an off by one error in the bounds checking for 'dinsu' and update
the ranges in the test comments so that they are accurate.

Reviewers: atanasyan

https://reviews.llvm.org/D41183

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320974 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][SVE] Asm: Add ZIP1/ZIP2 instructions (predicate/data vectors)

Summary: Patch [2/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions.

Reviewers: rengolin, kristof.beyls, fhahn, mcrosier, evandro

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D40361

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320973 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][SVE] Asm: Add SVE predicate register definitions and parsing support

Summary: Patch [1/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions.

Reviewers: rengolin, kristof.beyls, fhahn, mcrosier, evandro, echristo, efriedma

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D40360

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320970 91177308-0d34-0410-b5e6-96231b3b80d8

[ThinLTO] Remove unused code

This is a re-commit of r320464, after patch for gold plugin
was landed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320968 91177308-0d34-0410-b5e6-96231b3b80d8

AArch64: work around how Cyclone handles "movi.2d vD, #0".

For Cylone, the instruction "movi.2d vD, #0" is executed incorrectly in some rare
circumstances. Work around the issue conservatively by avoiding the instruction entirely.

This patch changes CodeGen so that problematic instructions are never
generated, and the AsmParser so that an equivalent instruction is used (with a
warning).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320965 91177308-0d34-0410-b5e6-96231b3b80d8

[TargetLibraryInfo] Discard library functions with incorrectly sized integers

Differential Revision: https://reviews.llvm.org/D41184

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320964 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Adjust test checks

Correct the CHECK-LABELS of a couple of dag combine tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320963 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] Move AND nodes to multiple load leaves

Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.

Differential Revision: https://reviews.llvm.org/D41177

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320962 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC][CodeGen][ExpandMemCmp] Fix documentation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320960 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Use mattr instead of mcpu in some of the cost model tests.

Based on the names of the check lines, features seems more appropriate that cpu.

Spotted while prototyping my patch to make 512-bit vectors illegal on SKX sometimes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320959 91177308-0d34-0410-b5e6-96231b3b80d8

[SROA] Disable non-whole-alloca splits by default

This patch introduce a switch to control splitting of non-whole-alloca slices with default off.
The switch will be default on again after fixing an issue reported in PR35657.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320958 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Fix mistake that I made when splitting up the setOperationAction calls recently.

The block I moved things that need BWI and 512-bit or VLX is incorrectly qualified with just hasBWI || hasVLX. Here I've qualified it with hasBWI && (hasAVX512 || hasVLX) where the hasAVX512 will be replaced with allowing 512-bit vectors in an upcoming patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320957 91177308-0d34-0410-b5e6-96231b3b80d8

[CGP] Fix the handling select inst in complex addressing mode

When we put the value in select placeholder we must pass
the value through simplification tracker due to the value might
be already simplified and erased.

This is a fix for PR35658.

Reviewers: john.brawn, uabelho
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41251

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320956 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add tests for finite libcall lowering (PR35672); NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320955 91177308-0d34-0410-b5e6-96231b3b80d8

Re-commit "Properly handle multi-element and dynamically sized allocas in getPointerDereferenceableBytes()""

llvm-clang-x86_64-expensive-checks-win is still broken, so the failure
seems unrelated.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320953 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add test cases that show cases where buildvector of extract and inserts should be turned into fmsubadd.

This is a follow up to the fmaddsub support added in r320950. Hopefully in the future we can fix lowering to handle this fmsubadd too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320951 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Make the code that creates fmaddsub from build_vector of extracts and inserts functional and add tests.

Summary:
We had no tests for this and we couldn't do the optimization because of a bad use count check. We need to know how many non-undef pieces of the build vector were filled in and ensure our use count is equal to that. But on the shuffle combine version we need the use count to be 2.

The missing coverage was noticed during the review of D40335.

Reviewers: RKSimon, zvi, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41133

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320950 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Regenerate truncated rotation tests + add missing 32-bit checks

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320949 91177308-0d34-0410-b5e6-96231b3b80d8

use uint32_t

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320947 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Export some more info on wasm funtions

Summary:
These fields are useful for lld's gc-sections support

Also remove an unused field.

Subscribers: jfb, dschuff, jgravelle-google, aheejin, sunfish

Differential Revision: https://reviews.llvm.org/D41320

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320946 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "Properly handle multi-element and dynamically sized allocas in getPointerDereferenceableBytes()"

This reverts commit 217067d5179882de9deb60d2e866befea4c126e7.

Fails on llvm-clang-x86_64-expensive-checks-win

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320945 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "Treat sret arguments as being dereferenceable in getPointerDereferenceableBytes()"

This reverts commit 8b7a7660a3904b2088bc594311bcea2c651def08.

I didn't mean to commit this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320944 91177308-0d34-0410-b5e6-96231b3b80d8

Treat sret arguments as being dereferenceable in getPointerDereferenceableBytes()

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320943 91177308-0d34-0410-b5e6-96231b3b80d8

Remove superfluous break after a return. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320941 91177308-0d34-0410-b5e6-96231b3b80d8

[X86DomainReassignment] Store legal domains in a std::bitset instead of using a SmallVector that really only ever has one element as a set.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320940 91177308-0d34-0410-b5e6-96231b3b80d8

Properly handle byval arguments in getPointerDereferenceableBytes()

Summary:
For byval arguments, the number of dereferenceable bytes is equal to
the size of the pointee, not the pointer.

Reviewers: hfinkel, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41305

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320939 91177308-0d34-0410-b5e6-96231b3b80d8

Properly handle multi-element and dynamically sized allocas in getPointerDereferenceableBytes()

Reviewers: hfinkel, rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41288

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320938 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Use extract_vector_elt instead of X86ISD::VEXTRACT for isel of vXi1 extractions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320937 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Canonicalize extract_vector_elt from vXi1 to always return MVT::i32.

This allows us to remove some isel patterns that allowed MVT::i8 result type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320936 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Don't create X86ISD::VEXTRACT nodes directly. Use EXTRACT_VECTOR_ELT and allow that to be legaized to VEXTRACT.

I think we can remove the VEXTRACT node completely and use a canonicalized EXTRACT_VECTOR_ELT instead. This is a first step.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320935 91177308-0d34-0410-b5e6-96231b3b80d8

Fix unused variable warning.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320934 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] lowerVectorShuffleAsBroadcast - aggressively peek through BITCASTs

Assuming we can safely adjust the broadcast index for the new type to keep it suitably aligned, then peek through BITCASTs when looking for the broadcast source.

Fixes PR32007

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320933 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] Use extract128BitVector helper. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320932 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] Fix failed broadcast fold

Strip excess BITCASTs from EXTRACT_SUBVECTOR input

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320930 91177308-0d34-0410-b5e6-96231b3b80d8

[Memcpy Loop Lowering] Only calculate residual size/bytes copied when needed.

If the loop operand type is int8 then there will be no residual loop for the
unknown size expansion. Dont create the residual-size and bytes-copied values
when they are not needed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320929 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Don't pass a zero input to the passthru operand of getVectorMaskingNode/getScalarMaskingNode when its going to emit an ISD::OR/ISD::AND. NFCI

In those cases, the pass thru operand of the methods isn't used. The calls to the scalar version were passing a MVT::i1 zero, which is an illegal type at the stage this code runs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320928 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Have getVectorMaskingNode return an ISD::AND for X86ISD::VPSHUFBITQMB instead of creating a select with one input being 0.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320927 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] When using vpopcntdq for ctpop of v8i16 vectors, only promote to v8i32.

Previously we promoted to v8i64, but we don't need to go all the way to 512-bits. If we have VLX we can use the 256-bit instruction. And even if we don't have VLX we can widen v8i32 to v16i32 and drop the upper half.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320926 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Combine some more scheduler model entries using regular expressions.

We had a lot of separate 32 and 64 instructions that had the same scheduling data. This merges them into the same regular expression. This is pretty consistent with a lot of other instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320924 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Use instrs instead of instregex for gather/scatter instructions in the scheduler models. Combine into single InstrRW entries.

The reduces the number of scheduler groups in subtarget info.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320923 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Regenerate FMUL/FMA combine tests with update_test_checks.py

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320922 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel

We want to do this for 2 reasons:
1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766.
2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern.

More detail about what happens in the backend:
1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs
   into the shift variant. That is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs
   into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2
   into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node
   when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a
   variety of ways.
   a. For #2, the vector path is missing the case for setlt with a '1' constant.
   b. For #3, we are missing a match for commuted versions of the shift variants.
5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel
   produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the
   shift sequence when not.
6. In the following examples with this patch applied, we may get conditional moves rather than the shift
   produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific
   decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate.

define i32 @abs_shifty(i32 %x) {
  %signbit = ashr i32 %x, 31
  %add = add i32 %signbit, %x
  %abs = xor i32 %signbit, %add
  ret i32 %abs
}

define i32 @abs_cmpsubsel(i32 %x) {
  %cmp = icmp slt i32 %x, zeroinitializer
  %sub = sub i32 zeroinitializer, %x
  %abs = select i1 %cmp, i32 %sub, i32 %x
  ret i32 %abs
}

define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
  %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
  %add = add <4 x i32> %signbit, %x
  %abs = xor <4 x i32> %signbit, %add
  ret <4 x i32> %abs
}

define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
  %cmp = icmp slt <4 x i32> %x, zeroinitializer
  %sub = sub <4 x i32> zeroinitializer, %x
  %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
  ret <4 x i32> %abs
}

> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx
> abs_shifty:
> movl %edi, %eax
> negl %eax
> cmovll %edi, %eax
> retq
>
> abs_cmpsubsel:
> movl %edi, %eax
> negl %eax
> cmovll %edi, %eax
> retq
>
> abs_shifty_vec:
> vpabsd %xmm0, %xmm0
> retq
>
> abs_cmpsubsel_vec:
> vpabsd %xmm0, %xmm0
> retq
>
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
> abs_shifty:
> cmp w0, #0                  // =0
> cneg w0, w0, mi
> ret
>
> abs_cmpsubsel:
> cmp w0, #0                  // =0
> cneg w0, w0, mi
> ret
>
> abs_shifty_vec:
> abs v0.4s, v0.4s
> ret
>
> abs_cmpsubsel_vec:
> abs v0.4s, v0.4s
> ret
>
> $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le
> abs_shifty:
> srawi 4, 3, 31
> add 3, 3, 4
> xor 3, 3, 4
> blr
>
> abs_cmpsubsel:
> srawi 4, 3, 31
> add 3, 3, 4
> xor 3, 3, 4
> blr
>
> abs_shifty_vec:
> vspltisw 3, -16
> vspltisw 4, 15
> vsubuwm 3, 4, 3
> vsraw 3, 2, 3
> vadduwm 2, 2, 3
> xxlxor 34, 34, 35
> blr
>
> abs_cmpsubsel_vec:
> vspltisw 3, -16
> vspltisw 4, 15
> vsubuwm 3, 4, 3
> vsraw 3, 2, 3
> vadduwm 2, 2, 3
> xxlxor 34, 34, 35
> blr
>

Differential Revision: https://reviews.llvm.org/D40984

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320921 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove GCCBuiltin from kand/kandn/kor/kxor/kxnor/knot intrinsics so clang can implement with native IR.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320918 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove unneeded code for handling the old kunpck intrinsics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320917 91177308-0d34-0410-b5e6-96231b3b80d8

Move Transforms/LoopVectorize/consecutive-ptr-cg-bug.ll into the X86 subdirectory

This test depends on X86's TTI; move into the X86 subdirectory.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320914 91177308-0d34-0410-b5e6-96231b3b80d8

[LV] Extend InstWidening with CM_Widen_Recursive

Changes to the original scalar loop during LV code gen cause the return value
of Legal->isConsecutivePtr() to be inconsistent with the return value during
legal/cost phases (further analysis and information of the bug is in D39346).
This patch is an alternative fix to PR34965 following the CM_Widen approach
proposed by Ayal and Gil in D39346. It extends InstWidening enum with
CM_Widen_Reverse to properly record the widening decision for consecutive
reverse memory accesses and, consequently, get rid of the
Legal->isConsetuviePtr() call in LV code gen. I think this is a simpler/cleaner
solution to PR34965 than the one in D39346.

Fixes PR34965.

Patch by Diego Caballero, thanks!

Differential Revision: https://reviews.llvm.org/D40742

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320913 91177308-0d34-0410-b5e6-96231b3b80d8

Fixed warning 'function declaration isn’t a prototype [-Werror=strict-prototypes]'

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320912 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC, AsmParser] Enable the mnemonic spell corrector

r307148 added an assembly mnemonic spelling correction support and enabled it
on ARM. This enables that support on PowerPC as well.

Patch by Dmitry Venikov, thanks!

Differential Revision: https://reviews.llvm.org/D40552

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320911 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add 128 and 256-bit VPOPCNTDQ instructions. Adjust some tablegen classes LZCNT/POPCNT.

I think when this instruction was first published it was only for a Knights CPU and thus VLX version was missing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320910 91177308-0d34-0410-b5e6-96231b3b80d8

[LTO] Update tests for r320905

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320909 91177308-0d34-0410-b5e6-96231b3b80d8

Remove trailing whitespace

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320907 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Return ArrayRef's rather than const std::vector&

From working on lld I've learned this is generally the
preferred way for several reasons (e.g. more concise, improves
encapsulation).

Differential Revision: https://reviews.llvm.org/D41265

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320906 91177308-0d34-0410-b5e6-96231b3b80d8

[LTO] Make processing of combined module more consistent

Summary:
1. Use stream 0 only for combined module. Previously if combined module was not
processes ThinLTO used the stream for own output. However small changes in input,
could trigger combined module and shuffle outputs making life of llvm::LTO harder.

2. Always process combined module and write output to stream 0. Processing empty
combined module is cheap and allows llvm::LTO users to avoid implementing processing
which is already done in llvm::LTO.

Subscribers: mehdi_amini, inglorion, eraman, hiraditya

Differential Revision: https://reviews.llvm.org/D41267

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320905 91177308-0d34-0410-b5e6-96231b3b80d8

Add another missing -enable-import-metadata to test

r320895 modified a test so that it needs -enable-import-metadata which
is false by default for NDEBUG, found another place that needs this
added.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320903 91177308-0d34-0410-b5e6-96231b3b80d8

[SimplifyLibCalls] Inline calls to cabs when it's safe to do so

When unsafe algerbra is allowed calls to cabs(r) can be replaced by:

sqrt(creal(r)*creal(r) + cimag(r)*cimag(r))

Patch by Paul Walker, thanks!

Differential Revision: https://reviews.llvm.org/D40069

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320901 91177308-0d34-0410-b5e6-96231b3b80d8

[LV] NFC patch for moving VP*Recipe class definitions from LoopVectorize.cpp to VPlan.h

This is a small step forward to move VPlan stuff to where it should belong (i.e., VPlan.*):

  1. VP*Recipe classes in LoopVectorize.cpp are moved to VPlan.h.
  2. Many of VP*Recipe::print() and execute() definitions are still left in
     LoopVectorize.cpp since they refer to things declared in LoopVectorize.cpp. To
     be moved to VPlan.cpp at a later time.
  3. InterleaveGroup class is moved from anonymous namespace to llvm namespace.
     Referencing it in anonymous namespace from VPlan.h ended up in warning.

Patch by Hideki Saito, thanks!

Differential Revision: https://reviews.llvm.org/D41045

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320900 91177308-0d34-0410-b5e6-96231b3b80d8

Add -enable-import-metadata to test

r320895 modified a test so that it needs -enable-import-metadata which
is false by default for NDEBUG.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320899 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add back the assert from r320830 that was reverted in r320850

Hopefully r320864 has fixed the offending case that failed the assert.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320898 91177308-0d34-0410-b5e6-96231b3b80d8

Fix NDEBUG build problem in r320895

Fix incorrect placement of #endif causing NDEBUG build failures.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320897 91177308-0d34-0410-b5e6-96231b3b80d8

[ThinLTO] Enable importing of aliases as copy of aliasee

Summary:
This implements a missing feature to allow importing of aliases, which
was previously disabled because alias cannot be available_externally.
We instead import an alias as a copy of its aliasee.

Some additional work was required in the IndexBitcodeWriter for the
distributed build case, to ensure that the aliasee has a value id
in the distributed index file (i.e. even when it is not being
imported directly).

This is a performance win in codes that have many aliases, e.g. C++
applications that have many constructor and destructor aliases.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D40747

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320895 91177308-0d34-0410-b5e6-96231b3b80d8

Fix WebAssembly backend for some LLVM API changes

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320893 91177308-0d34-0410-b5e6-96231b3b80d8