granicus.if.org Git

[tblgen] GCC/MS builtin to target intrisics map.

Patch by Ettore Speziale

Allow TableGen to generate static functions to perform GCC/MS builtin name to
target specific intrinsic ID mapping.

https://reviews.llvm.org/D31150

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300735 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU][mc][tests][NFC] Update bulk ISA tests for Gfx7 and Gfx8

Added approx. 1100 gfx7 and 1040 gfx8 test cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300734 91177308-0d34-0410-b5e6-96231b3b80d8

StructurizeCFG: Directly invert cmp instructions

The most common case for a branch condition is
a single use compare. Directly invert the branch
predicate rather than adding a lot of xor i1 true
which the DAG will have to fold later.

This produces nicer to read structurizer output.

This produces some random changes in codegen
due to the DAG swapping branch conditions itself,
and then does a poor job of dealing with those
inverts.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300732 91177308-0d34-0410-b5e6-96231b3b80d8

[GVN] Don't coerce non-integral pointers to integers or vice versa

Summary:
See http://llvm.org/docs/LangRef.html#non-integral-pointer-type

The NewGVN test does not fail without these changes (perhaps it does
try to coerce pointers <-> integers to begin with?), but I added the
test case anyway.

Reviewers: dberlin

Subscribers: mcrosier, llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D32208

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300730 91177308-0d34-0410-b5e6-96231b3b80d8

Update comment to match r300252.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300728 91177308-0d34-0410-b5e6-96231b3b80d8

ARM: TLS calling convention doesn't preserve r9 or r12 on Darwin.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300726 91177308-0d34-0410-b5e6-96231b3b80d8

[DAG] add splat vector support for 'and' in SimplifyDemandedBits

The patch itself is simple: stop discriminating against vectors in visitAnd() and again in
SimplifyDemandedBits().

Some notes for reference:

1. We're not consistent about calls to SimplifyDemandedBits in the various visitXXX functions.
   Sometimes, we check if the RHS is a constant first. Other times (like here), we just dive in.
2. I'd like to break the vector shackles in steps for the sake of risk minimization, but we could
    make similar simultaneous changes in other places if we think that would be better.
3. I don't know what the intent of the changed tests in this patch was supposed to be, but since
   they wiggled in a positive way, I'm just going with that. :)
4. In the rotate tests, note that we can see through non-splat constants. This is a result of D24253.
5. My motivation for being here now is to make D31944 look better, so this is step 1 of N towards
   improving the vector codegen in that patch without writing any actual new code.

Differential Revision: https://reviews.llvm.org/D32230

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300725 91177308-0d34-0410-b5e6-96231b3b80d8

IR: Remove some comments that are documenting the obvious. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300724 91177308-0d34-0410-b5e6-96231b3b80d8

[MathExtras] Fix undefined behavior (shift by bit width)

While there add some unit tests for uint64_t. Found by ubsan.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300721 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Don't align callable functions to 256

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300720 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Change DivergenceAnalysis for function arguments

Stop assuming all functions are kernels.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300719 91177308-0d34-0410-b5e6-96231b3b80d8

Prefer addAttr(Attribute::AttrKind) over the AttributeList overload

This should simplify the call sites, which typically want to tweak one
attribute at a time. It should also avoid creating ephemeral
AttributeLists that live forever.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300718 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Reduce visitLoadInst() code duplication. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300717 91177308-0d34-0410-b5e6-96231b3b80d8

[APInt] Move the 'return *this' from the slow cases of assignment operators inline. We should let the compiler see that the fast/slow cases both return *this.

I don't think we chain assignments together very often so this shouldn't matter much.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300715 91177308-0d34-0410-b5e6-96231b3b80d8

[InstSimplify] fold identity shuffles (recursing if needed)

This patch simplifies the examples from D31509 and D31927 (PR30630) and catches
the basic identity shuffle tests that Zvi recently added.

I'm not sure if we have something like this in DAGCombiner, but we should?

It's worth noting that "MaxRecurse / RecursionLimit" is only 3 on entry at the moment.
We might want to bump that up if there are longer shuffle chains like this in the wild.

For now, we're ignoring shuffles that have undef mask elements because it's not
clear how those should be handled.

Differential Revision: https://reviews.llvm.org/D31960

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300714 91177308-0d34-0410-b5e6-96231b3b80d8

use 'auto' with 'dyn_cast' and fix formatting; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300713 91177308-0d34-0410-b5e6-96231b3b80d8

Add an #include for <climits> for CHAR_BIT.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300711 91177308-0d34-0410-b5e6-96231b3b80d8

[Support] Add some helpers to generate bitmasks.

Frequently you you want a bitmask consisting of a specified
number of 1s, either at the beginning or end of a word.

The naive way to do this is to write

template<typename T>
T leadingBitMask(unsigned N) {
return (T(1) << N) - 1;
}

but using this function you cannot produce a word with every
bit set to 1 (i.e. leadingBitMask<uint8_t>(8)) because left
shift is undefined when N is greater than or equal to the
number of bits in the word.

This patch provides an efficient, branch-free implementation
that works for all values of N in [0, CHAR_BIT*sizeof(T)]

Differential Revision: https://reviews.llvm.org/D32212

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300710 91177308-0d34-0410-b5e6-96231b3b80d8

Remove eol-style:native from MathExtras.h

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300709 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r300697 which causes buildbot failure.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300708 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Generate proper offset in opt-addr-mode

Also, make a few changes to allow using the pass in .mir testcases.
Among other things, change the abbreviation from opt-amode to amode-opt,
because otherwise lit would expand the "opt" part to the full path to
the opt binary.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300707 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Remove RDefMap, use Liveness:getNearestAliasedRef instead

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300706 91177308-0d34-0410-b5e6-96231b3b80d8

[RDF] Switch NodeList to SmallVector from std::vector

The list has a single element 75+% of the time, reservation of 4 elements
is sufficient in 95% of cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300705 91177308-0d34-0410-b5e6-96231b3b80d8

[RDF] Use faster version of findBlock

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300704 91177308-0d34-0410-b5e6-96231b3b80d8

[RDF] Cache register units for reg masks instead of recalculating them

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300702 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Cache reached blocks in bit tracker instead of scanning list

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300701 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] add test and auto-generate checks; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300700 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] add test and auto-generate checks; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300698 91177308-0d34-0410-b5e6-96231b3b80d8

Using address range map to speedup finding inline stack for address.

Summary:
In the current implementation, to find inline stack for an address incurs expensive linear search in 2 places:

* linear search for the top-level DIE
* recursive linear traverse the DIE tree to find the path to the leaf DIE

In this patch, a map is built from address to its corresponding leaf DIE. The inline stack is built by traversing from the leaf DIE up to the root DIE. This speeds up batch symbolization by ~10X without noticible memory overhead.

Reviewers: dblaikie

Reviewed By: dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32177

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300697 91177308-0d34-0410-b5e6-96231b3b80d8

[InstSimplify] Deduce correct type for vector GEP.

InstSimplify returned the wrong type when simplifying a vector GEP
and we ended up crashing when trying to replace all uses with the
new value. Fixes PR32697.

Differential Revision: https://reviews.llvm.org/D32180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300693 91177308-0d34-0410-b5e6-96231b3b80d8

[DAG] Loop over remaining candidates on successful merge of stores of
extracted vectors types. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300688 91177308-0d34-0410-b5e6-96231b3b80d8

[AVR] Remove the 'multibyte' asm test

It tests registers which are not actually used on AVR.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300684 91177308-0d34-0410-b5e6-96231b3b80d8

Regenerate test. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300683 91177308-0d34-0410-b5e6-96231b3b80d8

[AVR] Fix the test suite

A bunch of tests failed because memory operations have been reordered.

I am unsure which commit changed this behaviour as the AVR build was
failing at that point with an unrelated error.

This commit just reoders some of the CHECK lines in some tests to suit
current llc output.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300682 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalIsel][X86] support G_TRUNC selection.

Summary:
[GlobalIsel][X86] support G_TRUNC selection.
Add regbank-select and legalizer tests. Currently legalization of trunc i64 on 32bit platform not supported.

Reviewers: ab, zvi, rovka

Reviewed By: zvi

Subscribers: dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D32115

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300678 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add D32039/PR31357 tests to show current BSWAP codegen

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300672 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Add scheduling latency/throughput tests for (most) SSE2 instructions

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300671 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "ARMFrameLowering: Reserve emergency spill slot for large arguments"

This reverts commit r300639, as it broke self-hosting on ARM. PR32709.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300668 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel][X86] Split select tests. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300666 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] GlobalISel: Add support for G_MUL

Support G_MUL, very similar to G_ADD and G_SUB. The only difference is
in the instruction selector, where we have to select either MUL or MULv5
depending on the target.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300665 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel] Support vector-of-pointers in LLT

This fixes PR32471.

As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.

I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.

Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; also needs quite a bit of code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300664 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel] Remove non-determinism from IRTranslator.

This showed up in r300535/r300537, which were reverted in r300538 due to
some of the introduced tests in there failing on some bots, due to the
non-determinism fixed in this commit.

Re-committing r300535/r300537 will add 2 tests for the change in this
commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300663 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r300657 due to crashes in stage2 of bootstraps:
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/2476/steps/build-stage2-LLVMgold.so/logs/stdio
http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/15036/steps/build_llvmclang/logs/stdio

I've updated the commit thread, reverting to get the bots back to green.

Original commit summary:
[JumpThread] We want to fold (not thread) when all predecessor go to single BB's successor.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300662 91177308-0d34-0410-b5e6-96231b3b80d8

[JumpThread] We want to fold (not thread) when all predecessor go to single BB's successor. .

Summary: In case all predecessor go to a single successor of current BB. We want to fold (not thread).

Reviewers: efriedma, sanjoy

Reviewed By: sanjoy

Subscribers: dberlin, majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D30869

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300657 91177308-0d34-0410-b5e6-96231b3b80d8

Cleanup some GraphTraits iteration code

Use children<> and nodes<> in appropriate places to cleanup the code.

Also, as part of the cleanup,
change the signature of DominatorTreeBase's Split.
It is a protected non-virtual member function called only twice,
both from within the class,
and the removed passed argument in both cases is '*this'.
The reason for the existence of that argument seems to be that
back before r43115 Split was a free function,
so an argument to get '*this' was needed - but now that is no longer the
case.

Patch by Yoav Ben-Shalom!

Differential Revision: https://reviews.llvm.org/D32118

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300656 91177308-0d34-0410-b5e6-96231b3b80d8

ARM: Use methods to access data stored with frame instructions

In r300196 several methods were added to TarfetInstrInfo to access
data stored with call frame setup/destroy instructions. This change
replaces calls to getOperand with calls to such special methods in
ARM target.

Differential Revision: https://reviews.llvm.org/D32127

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300655 91177308-0d34-0410-b5e6-96231b3b80d8

Remove buggy 'addAttributes(unsigned, AttrBuilder)' overload

The 'addAttributes(unsigned, AttrBuilder)' overload delegated to 'get'
instead of 'addAttributes'.

Since we can implicitly construct an AttrBuilder from an AttributeSet,
just standardize on AttrBuilder.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300651 91177308-0d34-0410-b5e6-96231b3b80d8

[libFuzzer] update -help: mention -exact_artifact_path in help for -minimize_crash and -cleanse_crash

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300642 91177308-0d34-0410-b5e6-96231b3b80d8

[AVR] Migrate to new MCAsmInfo CodePointerSize

Reviewers: dylanmckay, rengolin, kzhuravl, jroelofs

Reviewed By: kzhuravl, jroelofs

Subscribers: kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D32154

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300641 91177308-0d34-0410-b5e6-96231b3b80d8

ARMFrameLowering: Reserve emergency spill slot for large arguments

We need to reserve an emergency spill slot in cases with large argument
types that could overflow immediate offsets for FP relative address
calculations.

rdar://31317893

Differential Revision: https://reviews.llvm.org/D31643

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300639 91177308-0d34-0410-b5e6-96231b3b80d8

[DataLayout] Removed default value from a variable that isn't used without being overwritten. Make variable an enum instead of an int to avoid a cast later. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300634 91177308-0d34-0410-b5e6-96231b3b80d8

[XRay][tools] Fix yaml matching to be more permissive

Account for a potentially empty function name.

Follow-up to D32153.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300631 91177308-0d34-0410-b5e6-96231b3b80d8

Allow suppressing host and target info in VersionPrinter

Summary:
VersionPrinter by default outputs information about the Host CPU
and Default target. Printing this information requires linking in
a large amount of data, such as supported target triples as C
strings, which in turn bloats the binary size.

Enable a new CMake option LLVM_VERSION_PRINTER_SHOW_HOST_TARGET_INFO
which controls printing of the host and target info. This allows
the target triple names to be dead-code stripped. This is a nice
win for LLVM clients that wish to minimize their binary size, such
as graphics drivers.

By default this is ON, so there is no change in the default behavior.
Clients who wish to suppress this printing can do so by setting this
option to off via CMake.

A test app on Linux that uses ParseCommandLineOptions() shows a binary
size reduction of 23KB (from 149K to 126K) for a Release build, and 24KB
(from 135K to 111K) in a MinSizeRel build.

Reviewers: klimek, beanz, bogner, chandlerc, compnerd

Reviewed By: compnerd

Patch by pammon (Peter Ammon) !

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30904

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300630 91177308-0d34-0410-b5e6-96231b3b80d8

[AVR] Fix the build

'PointerSize' was renamed to 'CodePointerSize'.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300629 91177308-0d34-0410-b5e6-96231b3b80d8

[XRay][tools] Add option to llvm-xray extract to symbolize functions

Summary:
This allows us to, if the symbol names are available in the binary, be
able to provide the function name in the YAML output.

Reviewers: dblaikie, pelikan

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32153

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300624 91177308-0d34-0410-b5e6-96231b3b80d8

[ConstantRange] Optimize APInt creation in getSignedMax/getSignedMin.

We were creating an APInt at the top of these methods that isn't always returned. For ranges wider than 64-bits this results in an allocation and deallocation when its not used.

In getSignedMax we were creating Upper-1 to use in a compare and then creating it again for a return value. The compiler is unable to determine that these can be shared. So help it out and create the Upper-1 in a temporary that can be reused.

This provides a little compile time improvement.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300621 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add tests for potential andn optimization; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300617 91177308-0d34-0410-b5e6-96231b3b80d8

Fix crash in AttributeList::addAttributes, add test

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300614 91177308-0d34-0410-b5e6-96231b3b80d8

Add a getPointerOperandType() helper to LoadInst and StoreInst; NFC

I will use this in a later change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300613 91177308-0d34-0410-b5e6-96231b3b80d8

[MemoryBuiltins] Add isMallocOrCallocLikeFn so BasicAA can check for both at the same time

BasicAA wants to know if a function is either a malloc or calloc like function. Currently we have to check both separately. This means both calls check if its an intrinsic, query TLI, check the nobuiltin attribute, scan the AllocationFnData, etc.

This patch adds a isMallocOrCallocLikeFn so we can go through all of the checks once per call.

This also changes the one other location I saw that called both together.

Differential Revision: https://reviews.llvm.org/D32188

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300608 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopReroll] Prefer hasNUses/hasNUses or more as they're cheaper. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300607 91177308-0d34-0410-b5e6-96231b3b80d8

DAG: Make mayBeEmittedAsTailCall parameter const

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300603 91177308-0d34-0410-b5e6-96231b3b80d8

Fix typo

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300597 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Make MFI fields private

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300596 91177308-0d34-0410-b5e6-96231b3b80d8

[MemoryBuiltins] Use ImmutableCallSite instead of CallSite to remove a const_cast and const correct. NFCI

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300585 91177308-0d34-0410-b5e6-96231b3b80d8

NewGVN: Fix memory congruence verification. The return true should be a return false. Merge the appropriate if statements so it doesn't happen again.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300584 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Keep EXTRACT_VECTOR_ELT result type as f128 for Android x86_64.

Android x86_64 target uses f128 type and stores f128 values in %xmm* registers.
SoftenFloatRes_EXTRACT_VECTOR_ELT should not convert result value
from f128 to i128.

Differential Revision: http://reviews.llvm.org/D32102

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300583 91177308-0d34-0410-b5e6-96231b3b80d8

[APInt] Inline the single word case of lshrInPlace similar to what we do for <<=.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300577 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Add scheduling latency/throughput tests for (most) SSE1 instructions

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300576 91177308-0d34-0410-b5e6-96231b3b80d8

[SLP vectorizer] Allow phi node reordering in tryToVectorizeList.

In tryToVectorizeList, under a very limited circumstance (when entered
from tryToVectorizePair), the values may be reordered (swapped) and the
SLP tree is built with the new order. This extends that to the case when
starting from phis in vectorizeChainsInBlock when there are exactly two
phis. The textual order of phi nodes shouldn't really matter. Without
this change, the loop body in the accompnaying test case is fully vectorized
when we swap the orde of the phis but not with this order. While this
doesn't solve the phi-ordering problem in a general way (for more than 2
phis), this is simple fix that piggybacks on an existing mechanism and
is useful in cases like multiplying two complex numbers.

Differential revision: https://reviews.llvm.org/D32065

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300574 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Use for-range loop. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300567 91177308-0d34-0410-b5e6-96231b3b80d8

[APInt] Use lshrInPlace to replace lshr where possible

This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.

This adds an lshrInPlace(const APInt &) version as well.

Differential Revision: https://reviews.llvm.org/D32155

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300566 91177308-0d34-0410-b5e6-96231b3b80d8

NewGVN: Don't waste time value numbering unreachable blocks

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300565 91177308-0d34-0410-b5e6-96231b3b80d8

[DAG] Improve store merge candidate pruning.

Remove non-consecutive stores from store merge candidate search as
they cannot be merged and will prevent us from finding subsequent
mergeable store cases.

Reviewers: jyknight, bogner, javed.absar, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32086

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300561 91177308-0d34-0410-b5e6-96231b3b80d8

Add base-index-based store merge test

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300559 91177308-0d34-0410-b5e6-96231b3b80d8

LoopRerollPass: Prefer Value::hasOneUse() over Value::getNumUses(). NFC.

getNumUses() can be more expensive as it iterates over all list's elements.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300558 91177308-0d34-0410-b5e6-96231b3b80d8

[LV] Cache block mask values

This patch is part of D28975's breakdown.

Add caching for block masks similar to the cache already used for edge masks,
replacing generation per user with reusing the first generated value which
dominates all uses.

Differential Revision: https://reviews.llvm.org/D32054

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300557 91177308-0d34-0410-b5e6-96231b3b80d8

[ConstantRange] fix doxygen comment formatting; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300554 91177308-0d34-0410-b5e6-96231b3b80d8

Make globalaa-retained.ll test catching more cases.

Summary:
* Add checks for store. That is needed because GlobalsAA is called
  twice in the current pipeline with different sets of Function passes
  following it. However, the loads are eliminated using instcombine
  which happens everywhere. On the other hand, DeadStoreElimination is
  performed only once so by checking for store we'll be able to catch
  more cases when GlobalsAA is invalidated unintentionally.
* Add empty function above/below the test so that we don't depend on
  the relative order of instcombine/dead-store-elimination and the
  pass that invalidates the analysis (inside the same
  FunctionPassManager).

Reviewers: kristof.beyls

Reviewed By: kristof.beyls

Subscribers: llvm-commits, n.bozhenov

Differential Revision: https://reviews.llvm.org/D32015
Patch by Andrei Elovikov <andrei.elovikov@intel.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300553 91177308-0d34-0410-b5e6-96231b3b80d8

[GVNHoist] Mark GlobalsAA as preserved by GVNHoist.

Reviewers: sebpop, hiraditya

Reviewed By: sebpop

Subscribers: n.bozhenov, llvm-commits

Differential Revision: https://reviews.llvm.org/D32158
Patch by Andrei Elovikov <andrei.elovikov@intel.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300552 91177308-0d34-0410-b5e6-96231b3b80d8

Add store Merge test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300551 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Add hardware build attributes in assembler

In the assembler, we should emit build attributes based on the target
selected with command-line options. This matches the GNU assembler's
behaviour. We only do this for build attributes which describe the
hardware that is expected to be available, not the ones that describe
ABI compatibility.

This is done by moving some of the attribute emission code to
ARMTargetStreamer, so that it can be shared between the assembly and
code-generation code paths. Since the assembler only creates a
MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to
check raw features, and not use the convenience functions in
ARMSubtarget.

If different attributes are later specified using the .eabi_attribute
directive, then they will take precedence, as happens when the same
.eabi_attribute is specified twice.

This must be enabled by an option, because we don't want to do this when
parsing inline assembly. The attributes would match the ones emitted at
the start of the file, so wouldn't actually change the emitted object
file, but the extra directives would be added to every inline assembly
block when emitting assembly, which we'd like to avoid.

The majority of the changes in the build-attributes.ll test are just
re-ordering the directives, because the hardware attributes are now
emitted before the ABI ones. However, I did fix one bug which I spotted:
Tag_CPU_arch_profile was not being emitted for v6M.

Differential revision: https://reviews.llvm.org/D31812

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300547 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] GlobalISel: Add support for G_SUB

Support G_SUB throughout the GlobalISel pipeline. It is exactly the same
as G_ADD, nothing fancy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300546 91177308-0d34-0410-b5e6-96231b3b80d8

[SampleProfile] Don't assert when printing the DebugLoc of a branch. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300544 91177308-0d34-0410-b5e6-96231b3b80d8

[SampleProfile] Skip intrinsic calls when visiting callsites in InlineHotFunctions.

Before this patch, we always called method 'findCalleeFunctionSamples()' on
intrinsic calls. However, intrinsic calls like llvm.dbg.value() are not viable
candidates for obvious reasons.

No functional change intended.

Differential Revision: https://reviews.llvm.org/D32008

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300541 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[GlobalISel] Support vector-of-pointers in LLT"

This reverts r300535 and r300537.
The newly added tests in test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll
produces slightly different code between LLVM versions being built with different compilers.
E.g., dependent on the compiler LLVM is built with, either one of the following
can be produced:

remark: <unknown>:0:0: unable to legalize instruction: %vreg0<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg2; (in function: vector_of_pointers_extractelement)
remark: <unknown>:0:0: unable to legalize instruction: %vreg2<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg0; (in function: vector_of_pointers_extractelement)

Non-determinism like this is clearly a bad thing, so reverting this until
I can find and fix the root cause of the non-determinism.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300538 91177308-0d34-0410-b5e6-96231b3b80d8

Fix gcc build after r300535.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300537 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Check for correct HW div when lowering divmod

For subtargets that use the custom lowering for divmod, e.g. gnueabi,
we used to check if the subtarget has hardware divide and then lower to
a div-mul-sub sequence if true, or to a libcall if false.

However, judging by the usage of hasDivide vs hasDivideInARMMode, it
seems that hasDivide only refers to Thumb. For instance, in the
ARMTargetLowering constructor, the code that specifies whether to use
libcalls for (S|U)DIV looks like this:

bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide()
: Subtarget->hasDivideInARMMode();

In the case of divmod for arm-gnueabi, using only hasDivide() to
determine what to do means that instead of lowering to __aeabi_idivmod
to get the remainder, we lower to div-mul-sub and then further lower the
div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but
not in Thumb, we generate a libcall instead of using it (this is not an
issue in practice since AFAICT none of the cores that we support have
hardware divide in ARM but not Thumb).

This patch fixes the code dealing with custom lowering to take into
account the mode (Thumb or ARM) when deciding whether or not hardware
division is available.

Differential Revision: https://reviews.llvm.org/D32005

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300536 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel] Support vector-of-pointers in LLT

This fixes PR32471.

As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.

I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.

Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; also needs quite a bit of code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300535 91177308-0d34-0410-b5e6-96231b3b80d8

test commit

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300532 91177308-0d34-0410-b5e6-96231b3b80d8

[APInt] Cleanup the reverseBits slow case a little.

Use lshrInPlace. Use single bit extract and operator|=(uint64_t) to avoid a few temporary APInts.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300527 91177308-0d34-0410-b5e6-96231b3b80d8

[APInt] Make operator<<= shift in place. Improve the implementation of tcShiftLeft and use it to implement operator<<=.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300526 91177308-0d34-0410-b5e6-96231b3b80d8

PR32382: Fix emitting complex DWARF expressions.

The DWARF specification knows 3 kinds of non-empty simple location
descriptions:
1. Register location descriptions
  - describe a variable in a register
  - consist of only a DW_OP_reg
2. Memory location descriptions
  - describe the address of a variable
3. Implicit location descriptions
  - describe the value of a variable
  - end with DW_OP_stack_value & friends

The existing DwarfExpression code is pretty much ignorant of these
restrictions. This used to not matter because we only emitted very
short expressions that we happened to get right by accident.  This
patch makes DwarfExpression aware of the rules defined by the DWARF
standard and now chooses the right kind of location description for
each expression being emitted.

This would have been an NFC commit (for the existing testsuite) if not
for the way that clang describes captured block variables. Based on
how the previous code in LLVM emitted locations, DW_OP_deref
operations that should have come at the end of the expression are put
at its beginning. Fixing this means changing the semantics of
DIExpression, so this patch bumps the version number of DIExpression
and implements a bitcode upgrade.

There are two major changes in this patch:

I had to fix the semantics of dbg.declare for describing function
arguments. After this patch a dbg.declare always takes the *address*
of a variable as the first argument, even if the argument is not an
alloca.

When lowering a DBG_VALUE, the decision of whether to emit a register
location description or a memory location description depends on the
MachineLocation — register machine locations may get promoted to
memory locations based on their DIExpression. (Future) optimization
passes that want to salvage implicit debug location for variables may
do so by appending a DW_OP_stack_value. For example:
  DBG_VALUE, [RBP-8]                        --> DW_OP_fbreg -8
  DBG_VALUE, RAX                            --> DW_OP_reg0 +0
  DBG_VALUE, RAX, DIExpression(DW_OP_deref) --> DW_OP_reg0 +0

All testcases that were modified were regenerated from clang. I also
added source-based testcases for each of these to the debuginfo-tests
repository over the last week to make sure that no synchronized bugs
slip in. The debuginfo-tests compile from source and run the debugger.

https://bugs.llvm.org/show_bug.cgi?id=32382
<rdar://problem/31205000>

Differential Revision: https://reviews.llvm.org/D31439

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300522 91177308-0d34-0410-b5e6-96231b3b80d8

Add const to a const method. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300520 91177308-0d34-0410-b5e6-96231b3b80d8

[Target] Use hasOneUse() instead of getNumUses().

The latter does a liner scan over a linked list, therefore is
much more expensive.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300518 91177308-0d34-0410-b5e6-96231b3b80d8

Object: Shrink the size of irsymtab::Symbol by a word. NFCI.

Instead of storing an UncommonIndex on the Symbol, use a flag bit to store
whether the Symbol has an Uncommon. This shrinks Chromium's .bc files (after
D32061) by about 1%.

Differential Revision: https://reviews.llvm.org/D32070

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300514 91177308-0d34-0410-b5e6-96231b3b80d8

Build SymbolMap in SampleProfileLoader to help matchin function names with suffix.

Summary: If there is suffix added in the function name (e.g. module hash added by thinLTO), we will not be able to find a match in profile as the suffix does not exist in profile. This patch build a map from function name to Function *. The map includes the entry for the stripped function name so that inlineHotFunctions can find the corresponding function to promote/inline.

Reviewers: davidxl, dnovillo, tejohnson

Reviewed By: davidxl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D31952

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300507 91177308-0d34-0410-b5e6-96231b3b80d8

Change the testcase tail-merge-after-mbp.ll to tail-merge-after-mbp.mir

Differential Revision: https://reviews.llvm.org/D32037

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300506 91177308-0d34-0410-b5e6-96231b3b80d8

[SimplifyCFG] Use hasNUses instead of comparing getNumUses to a constant."

The use list is a linked list so getNumUses requires a linear scan through the whole list. hasNUses will stop scanning at N and see if that is the end.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300505 91177308-0d34-0410-b5e6-96231b3b80d8

[APInt] Merge the multiword code from lshrInPlace and tcShiftRight into a single implementation

This merges the two different multiword shift right implementations into a single version located in tcShiftRight. lshrInPlace now calls tcShiftRight for the multiword case.

I retained the memmove fast path from lshrInPlace and used a memset for the zeroing. The for loop is basically tcShiftRight's implementation with the zeroing and the intra-shift of 0 removed.

Differential Revision: https://reviews.llvm.org/D32114

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300503 91177308-0d34-0410-b5e6-96231b3b80d8