granicus.if.org Git

Make block placement deterministic

We fail to produce bit-to-bit matching stage2 and stage3 compiler in PGO
bootstrap build. The reason is because LoopBlockSet is of SmallPtrSet type
whose iterating order depends on the pointer value.

This patch fixes this issue by changing to use SmallSetVector.

Differential Revision: http://reviews.llvm.org/D26634

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287148 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] replace unreachable with assert and remove unreachable code; NFCI

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287147 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Enable ConstrainCopy DAG mutation

This fixes a probably unintended divergence from the default
scheduler behavior.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287146 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] fix formatting and add FIXMEs to foldOperationIntoSelectOperand(); NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287145 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Handle vector types in replaceZeroVectorStore.

Summary:
Extend replaceZeroVectorStore to handle more vector type stores,
floating point zero vectors and set alignment more accurately on split
stores.

This is a follow-up change to r286875.

This change fixes PR31038.

Reviewers: MatzeB

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D26682

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287142 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopVectorize] Fix for non-determinism in codegen

Summary: This patch fixes issues in codegen uncovered due to https://reviews.llvm.org/D26718

Reviewers: mssimpso

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D26727

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287135 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass

Summary:
1. Don't try to copy values to and from the same register class.
2. Replace copies with of registers with immediate values with v_mov/s_mov
instructions.

The main purpose of this change is to make MachineSink do a better job of
determining when it is beneficial to split a critical edge, since the pass
assumes that copies will become move instructions.

This prevents a regression in uniform-cfg.ll if we enable critical edge
splitting for AMDGPU.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23408

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287131 91177308-0d34-0410-b5e6-96231b3b80d8

[ExecutionEngine] Fix examples build broken in r287126 and other Include What You Use warnings.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287130 91177308-0d34-0410-b5e6-96231b3b80d8

fix comment formatting; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287127 91177308-0d34-0410-b5e6-96231b3b80d8

[ExecutionEngine] Fix some Clang-tidy modernize-use-default, modernize-use-equals-delete and Include What You Use warnings; other minor fixes.

Differential revision: https://reviews.llvm.org/D26729

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287126 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add fake scalar FP logic instructions to ReplaceableInstrs to save some bytes

We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions.
Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of
compilers, but logically equivalent int, float, and double variants of bitwise-logic
instructions are reality in x86, and the float variant may be a shorter instruction
depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all
the time.

This is a preliminary step towards solving PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137

Differential Revision:
https://reviews.llvm.org/D26712

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287122 91177308-0d34-0410-b5e6-96231b3b80d8

[Orc] Re-enable the RPC unit test disabled in r286917.

This unit test infinite-looped on s390x due to a thread_yield being optimized
out. I've updated the QueueChannel class (where thread_yield was called) to use
a condition variable instead. This should cause the unit test to behave
correctly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287121 91177308-0d34-0410-b5e6-96231b3b80d8

[sancov] Name the global containing the main source file name

If the global name doesn't start with __sancov_gen, ASan will insert
unecessary red zones around it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287117 91177308-0d34-0410-b5e6-96231b3b80d8

test commit, changed tab to spaces, NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287116 91177308-0d34-0410-b5e6-96231b3b80d8

Add a little endian variant of TCE.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287111 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add integer division test for PR23590

Shows missed opportunity to recognise reduced integer division result size

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287110 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IR

Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen.

LLVM counterpart to D26686

Differential Revision: https://reviews.llvm.org/D26736

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287108 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX512] Added some mask/maskz tests for sitofp/uitofp i32 to f64

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287106 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Regenerated integer divide tests to test on 32 and 64 bit targets

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287104 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Added PSUBUS from SELECT tests from D25987

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287103 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Fix unsigned/signed type error

MipsFastISel uses a a class to represent addresses with a signed member
to represent the offset. MipsFastISel::emitStore, emitLoad and computeAddress
all treated the offset as being positive. In cases where the offset was
actually negative and a frame pointer was used, this would cause the constant
synthesis routine to crash as it would generate an unexpected instruction
sequence when frame indexes are replaced.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D26192

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287099 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] not instruction alias

This patch adds the single operand form of the not alias to microMIPS and
MIPS along with additional tests.

This partially resolves PR/30381.

Thanks to Sean Bruno for reporting the issue!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287097 91177308-0d34-0410-b5e6-96231b3b80d8

Remove TimeValue class

Summary:
All uses have been replaced by appropriate std::chrono types, and the class is
now unused.

Reviewers: zturner, mehdi_amini

Subscribers: llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D26447

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287094 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics.

Differential Revision: https://reviews.llvm.org/D26128

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287087 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul

Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file.

Reviewers: zvi, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26660

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287083 91177308-0d34-0410-b5e6-96231b3b80d8

[ELF] Convert ELF.h to Expected<T>.

This has two advantages:
1) We slowly move away from ErrorOr to the new handling interface,
in the hope of having an uniform error handling in LLVM, eventually.
2) We're starting to have *meaningful* error messages for invalid
object ELF files, rather than a generic "parse error". At some point
we should include also the offset to improve the quality of the
diagnostic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287081 91177308-0d34-0410-b5e6-96231b3b80d8

test: use separate input file for test

Rather than using sed to generate the input and pipe the result to
strings, use the static input instead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287079 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Refactor v_mac_{f16, f32} patterns into a class NFC

Differential Revision: https://reviews.llvm.org/D26711

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287077 91177308-0d34-0410-b5e6-96231b3b80d8

AArch64: Use DeadRegisterDefinitionsPass before regalloc.

Doing this before register allocation reduces register pressure as we do
not even have to allocate a register for those dead definitions.

Differential Revision: https://reviews.llvm.org/D26111

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287076 91177308-0d34-0410-b5e6-96231b3b80d8

Fix build break when the host C compiler is C89.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287075 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Handle f16 select{_cc}

- Select `select` to `v_cndmask_b32`
- Expand `select_cc`
- Refactor patterns

Differential Revision: https://reviews.llvm.org/D26714

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287074 91177308-0d34-0410-b5e6-96231b3b80d8

[XRay][docs] Define requirements on installed log handlers.

Summary:
We update the documentation to define what the requirements are for the
provided XRay log handler. This is to make it clear that the function
pointer provided must do internal synchronisation and that there are no
guarantees provided by XRay on when the function shall be invoked once
it has been installed as a log handler.

Reviewers: rSerge, rengolin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26651

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287073 91177308-0d34-0410-b5e6-96231b3b80d8

[RegAllocGreedy] Record missed hint for late recoloring.

In https://reviews.llvm.org/D25347, Geoff noticed that we still have
useless copy that we can eliminate after register allocation. At the
time the allocation is chosen for those copies, they are not useless
but, because of changes in the surrounding code, later on they might
become useless.
The Greedy allocator already has a mechanism to deal with such cases
with a late recoloring. However, we missed to record the some of the
missed hints.

This commit fixes that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287070 91177308-0d34-0410-b5e6-96231b3b80d8

Align Modi and FileInfo substreams on 32-byte offsets.

This is required by DbiStream, but DbiStreamBuilder didn't align
these substreams, so the output of DbiSTreamBuilder couldn't be
read by DbiStream.

Test will be added to LLD.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287067 91177308-0d34-0410-b5e6-96231b3b80d8

Fixed the lost FastMathFlags for CALL operations in SLPVectorizer.
Reviewer: Michael Zolotukhin.
Differential Revision: https://reviews.llvm.org/D26575

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287064 91177308-0d34-0410-b5e6-96231b3b80d8

[BypassSlowDivision] Handle division by constant numerators better.

Summary:
We don't do BypassSlowDivision when the denominator is a constant, but
we do do it when the numerator is a constant.

This patch makes two related changes to BypassSlowDivision when the
numerator is a constant:

* If the numerator is too large to fit into the bypass width, don't
   bypass slow division (because we'll never run the smaller-width
   code).

* If we bypass slow division where the numerator is a constant, don't
   OR together the numerator and denominator when determining whether
   both operands fit within the bypass width.  We need to check only the
   denominator.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26699

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287062 91177308-0d34-0410-b5e6-96231b3b80d8

[BypassSlowDivision] Simplify partially-tautological if statement.

if (A || (B && A)) --> if (A).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287061 91177308-0d34-0410-b5e6-96231b3b80d8

Fix Modi and File count if there are more than 65535 modules/files.

These numbers are intended to be capped at 65535, but
`std::max<uint16_t>(UINT16_MAX, N)` always returns N for any N because
the expression is the same as `std::max((uint16_t)UINT16_MAX, (uint16_t)N)`.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287060 91177308-0d34-0410-b5e6-96231b3b80d8

Always use relative jump table encodings on PowerPC64.

For the default, small and medium code model, use the existing
difference from the jump table towards the label. For all other code
models, setup the picbase and use the difference between the picbase and
the block address.

Overall, this results in smaller data tables at the expensive of one or
two more arithmetic operation at the jump site. Given that we only create
jump tables with a lot more than two entries, it is a net win in size.
For larger code models the assumption remains that individual functions
are no larger than 2GB.

Differential Revision: https://reviews.llvm.org/D26336

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287059 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argument

wbinvl.* are vector instruction that do not sue vector registers.

v2: check only M?BUF instructions

Differential Revision: https://reviews.llvm.org/D26633

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287056 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] regenerate checks; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287051 91177308-0d34-0410-b5e6-96231b3b80d8

General clean up of Mach-O error handling in llvm-objdump.

To get a good error message for all files that could contain Mach-O
files the code in llvm-objdump needs to use the archive member name
and name of the architecture of a slice of a universal file in those cases
where the error come from a Mach-O file in an archive or a universal file.

Most of this is fixed by moving the call to checkSymbolTable() into
ProcessMachO() and calling it when the operation needs the symbol
table.  And then calling the form of report_error() that has the
ArchiveName and ArchitectureName arguments.  One other place
needed to call this form of report_error() also with these arguments.

Also changed the code in MachODump.cpp to not use report_fatal_error()
and use report_error() instead to make the code smaller and cleaner.  All
cases of this are for errors with the symbol table which should now never
be tripped since checkSymbolTable() should be called first to get a good
error message in these cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287050 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] auto-generate better checks; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287049 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] auto-generate better checks; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287048 91177308-0d34-0410-b5e6-96231b3b80d8

[AddressSanitizer] Add support for (constant-)masked loads and stores.

This patch adds support for instrumenting masked loads and stores under
ASan, if they have a constant mask.

isInterestingMemoryAccess now supports returning a mask to be applied to
the loads, and instrumentMop will use it to generate additional checks.

Added tests for v4i32 v8i32, and v4p0i32 (~v4i64) for both loads and
stores (as well as a test to verify we don't add checks to non-constant
masks).

Differential Revision: https://reviews.llvm.org/D26230

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287047 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] auto-generate better checks; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287046 91177308-0d34-0410-b5e6-96231b3b80d8

[C API] Prevent nullptr dereferences in C API for counting attributes.

See https://reviews.llvm.org/D26392

Patch by @maleadt

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287044 91177308-0d34-0410-b5e6-96231b3b80d8

Object: replace backslashes with slashes in embedded relative thin archive paths on Windows.

This makes these thin archives portable between *nix and Windows.

Differential Revision: https://reviews.llvm.org/D26696

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287038 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Add support for Qualcomm's Falkor CPU.

Differential Revision: https://reviews.llvm.org/D26673

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287036 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/SI: Fix pattern for i16 = sign_extend i1

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26670

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287035 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add tests for FP-logic equivalent instruction replacement

The ANDN test needs at least 3 different fixes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287032 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Refactor test per Matthias' request.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287031 91177308-0d34-0410-b5e6-96231b3b80d8

[sanitizer-coverage] make sure asan does not instrument coverage guards (reported in https://github.com/google/oss-fuzz/issues/84)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287030 91177308-0d34-0410-b5e6-96231b3b80d8

Fix llvm-symbolizer to correctly sort a symbol array and calculate symbol sizes

Sometimes, llvm-symbolizer gives wrong results due to incorrect sizes of some symbols. The reason for that was an incorrectly sorted array in computeSymbolSizes. The comparison function used subtraction of unsigned types, which is incorrect. Let's change this to return explicit -1 or 1.

Differential Revision: https://reviews.llvm.org/D26537

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287028 91177308-0d34-0410-b5e6-96231b3b80d8

GlobalISel: remove unused variable to silence warning.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287027 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-objdump: deal with unexpected object files more gracefully.

Specifically, we don't want to segfault on release builds, so print the problem
instead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287022 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Enable store clustering

Also respect the TII hook for these like the generic code does
in case we want a flag later to disable this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287021 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64]  Lower multiplication by a constant int to shl+add+shl

Lower a = b * C where C = (2^n + 1) * 2^m to

add     w0, w0, w0, lsl n
lsl     w0, w0, m

Differential Revision: https://reviews.llvm.org/D229245

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287019 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Analyze mubuf with immediate soffset

Fixes giving up on clustering common addr64 accesses with
constant 0 soffset.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287018 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix return after else

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287015 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287014 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Replace assert(false) with unreachable

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287013 91177308-0d34-0410-b5e6-96231b3b80d8

[ELF] Rewrite isMips64EL() using isMipsELF64(). NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287011 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Add wave barrier builtin

The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.

Differential Revision: https://reviews.llvm.org/D26585

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287007 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] auto-generate checks; NFC

Also, fix the test params to use an attribute rather than a CPU model
and remove the AVX run because that does nothing but check for a 'v'
prefix in all of these tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287003 91177308-0d34-0410-b5e6-96231b3b80d8

[LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.

In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.

Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.

Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.

But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.

From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.

Differential Revision: https://reviews.llvm.org/D26429

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286999 91177308-0d34-0410-b5e6-96231b3b80d8

Integer legalization: fix MUL expansion

Summary:
This fixes the runtime results produces by the fallback multiplication expansion introduced in r270720.

For tests I created a fuzz tester that compares the results with Boost.Multiprecision.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26628

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286998 91177308-0d34-0410-b5e6-96231b3b80d8

vector load store with length (left justified) llvm portion

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286993 91177308-0d34-0410-b5e6-96231b3b80d8

fix formatting; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286989 91177308-0d34-0410-b5e6-96231b3b80d8

[IndVars] Change the order to compute WidenAddRec in widenIVUse.

When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence
return non-null but different WideAddRec, if getWideRecurrence is called
before getExtendedOperandRecurrence, we won't bother to call
getExtendedOperandRecurrence again. But As we know it is possible that after
SCEV folding, we cannot prove the legality using the SCEVAddRecExpr returned
by getWideRecurrence. Meanwhile if getExtendedOperandRecurrence returns non-null
WideAddRec, we know for sure that it is legal to do widening for current instruction.
So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which
will increase the chance of successful widening.

Differential Revision: https://reviews.llvm.org/D26059

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286987 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] GlobalISel: Remove unused members. NFCI

This silences some warnings that I didn't see with my host compiler.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286981 91177308-0d34-0410-b5e6-96231b3b80d8

[AVX-512] Add AVX-512 vector shift intrinsics to memory santitizer.

Just needed to add the intrinsics to the exist switch. The code is generic enough to support the wider vectors with no changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286980 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][SSE] Improve SINT_TO_FP of boolean vector results (signum)

This patch helps avoids poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so help improve the truncation logic.

This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases.

Fix for PR13248

Differential Revision: https://reviews.llvm.org/D26583

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286979 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add tests for bitcasted selects; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286978 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[JumpThreading] Unfold selects that depend on the same condition"

This reverts commit ac54d0066c478a09c7cd28d15d0f9ff8af984afc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286976 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[JumpThreading] Prevent non-deterministic use lists"

This reverts commit f2c2f5354070469dac253373c66527ca971ddc66.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286975 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Make sure GlobalISel is only initialized once. NFCI

Move some code inside the proper 'if' block to make sure it is only run once,
when the subtarget is first created. Things can still break if we use different
ARM target machines or if we have functions with different 'target-cpu' or
'target-features', we should fix that too in the future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286974 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopVectorizer] When estimating reg usage, unused insts may "end" another use

The register usage algorithm incorrectly treats instructions whose value is
not used within the loop (e.g. those that do not produce a value).

The algorithm first calculates the usages within the loop.  It iterates over
the instructions in order, and records at which instruction index each use
ends (in fact, they're actually recorded against the next index, as this is
when we want to delete them from the open intervals).

The algorithm then iterates over the instructions again, adding each
instruction in turn to a list of open intervals.  Instructions are then
removed from the list of open intervals when they occur in the list of uses
ended at the current index.

The problem is, instructions which are not used in the loop are skipped.
However, although they aren't used, the last use of a value may have been
recorded against that instruction index.  In this case, the use is not deleted
from the open intervals, which may then bump up the estimated register usage.

This patch fixes the issue by simply moving the "is used" check after the loop
which erases the uses at the current index.

Differential Revision: https://reviews.llvm.org/D26554

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286969 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Implement BE VSX load/store builtins - llvm portion.

This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286967 91177308-0d34-0410-b5e6-96231b3b80d8

Get GlobalISel to build on Linux after r286407

r286407 has introduced calls to llvm::AddLandingPadInfo, which lives in the
SelectionDAG component. Add it to LLVMBuild to avoid linker failures on Linux.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286962 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][FastISel] Assert that we are dealing with arithmetic with overflow intrinsics. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286961 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] TableGen: change individual instruction flags to bit type from bits<1>

Summary: This is needed to be able to use this flags in InstrMappings.

Reviewers: tstellarAMD, vpykhtin

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D26666

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286960 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][FastISel] Fix lowering of overflow result on AVX512 targets

    Summary:
    Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class.
    We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction.

    Fixes pr30981.

    Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet

    Subscribers: qcolombet, llvm-commits

    Differential Revision: https://reviews.llvm.org/D26620

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286958 91177308-0d34-0410-b5e6-96231b3b80d8

Test commit, remove trailing space.

This commit is used to test commit access.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286957 91177308-0d34-0410-b5e6-96231b3b80d8

clang format include/llvm/Support/ELF.h. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286956 91177308-0d34-0410-b5e6-96231b3b80d8

DWARFAbbreviationDeclaration.h: Fix a typo in r286924. [-Wdocumentation]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286954 91177308-0d34-0410-b5e6-96231b3b80d8

Introduce TLI predicative for base-relative Jump Tables.

For 64bit ABIs it is common practice to use relative Jump Tables with
potentially different relocation bases. As the logic for the jump table
itself doesn't depend on the relocation base, make it easier for targets
to use the generic logic. Start by dropping the now redundant MIPS logic.

Differential Revision: https://reviews.llvm.org/D26578

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286951 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Add machine scheduler for Cortex-R52

This patch adds the Sched Machine Model for Cortex-R52.

Details of the pipeline and descriptions are in comments
in file ARMScheduleR52.td included in this patch.

Reviewers: rengolin, jmolloy

Differential Revision: https://reviews.llvm.org/D26500

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286949 91177308-0d34-0410-b5e6-96231b3b80d8

Fix -Wunused introduced in r286945 for release builds.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286946 91177308-0d34-0410-b5e6-96231b3b80d8

[tablegen] Extract portions of AsmMatcherEmitter for re-use by another generator. NFC.

Summary:
This change is preparation for a change that will allow targets to verify that the instructions
they emit meet the predicates they specify. This is useful to ensure that C++
legalization/lowering/instruction-selection doesn't incorrectly select code for a different
subtarget than intended. Such cases are not caught by the integrated assembler when emitting
instructions directly to an object file.

Reviewers: qcolombet

Subscribers: qcolombet, beanz, mgorny, llvm-commits, modocache

Differential Revision: https://reviews.llvm.org/D25614

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286945 91177308-0d34-0410-b5e6-96231b3b80d8

[opt-viewer] Add support for libYAML for faster parsing

This results in a speed-up of over 6x on sqlite3.

Before:

$ time -p /org/llvm/utils/opt-viewer/opt-viewer.py ./MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.opt.yaml html
  real 415.07
  user 410.00
  sys 4.66

After with libYAML:

$ time -p /org/llvm/utils/opt-viewer/opt-viewer.py ./MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.opt.yaml html
  real 63.96
  user 60.03
  sys 3.67

I followed these steps to get libYAML working with PyYAML: http://rmcgibbo.github.io/blog/2013/05/23/faster-yaml-parsing-with-libyaml/

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286942 91177308-0d34-0410-b5e6-96231b3b80d8

DAGCombiner: fix combine of trunc and select

bugzilla:
https://llvm.org/bugs/show_bug.cgi?id=29002
pr29002

Differential Revision: https://reviews.llvm.org/D26449

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286938 91177308-0d34-0410-b5e6-96231b3b80d8

TableGen: Add operator !or

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286936 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][GlobalISel] Add minimal call lowering support to the IRTranslator

Summary:
    Add basic functionality to support call lowering for X86.
    Currently only supports functions which return void and take zero arguments.
    Inspired by commit 286573.

Reviewers: ab, qcolombet, t.p.northover

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26593

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286935 91177308-0d34-0410-b5e6-96231b3b80d8

[AVX-512] Add an example test case for PR31018.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286934 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add LLVM version number for each intrinsic handled by auto upgrade for age tracking.

One day we'd like to remove some of this autoupgrade support and it will be easier if we know how long some of it has been around.

Differential Revision: https://reviews.llvm.org/D26321

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286933 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Fix f16 fabs/fneg

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286931 91177308-0d34-0410-b5e6-96231b3b80d8

[ORC] Work around an apparent modules/linkage issue.

<rdar://problem/29247092>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286930 91177308-0d34-0410-b5e6-96231b3b80d8

Simplify identify_magic.

This patch defines a memcmp-ish helper function to simplify identify_magic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286928 91177308-0d34-0410-b5e6-96231b3b80d8

Improve DWARF parsing speed by improving DWARFAbbreviationDeclaration

This patch gets a DWARF parsing speed improvement by having DWARFAbbreviationDeclaration instances know if they have a fixed byte size. If an abbreviation has a fixed byte size that can be calculated given a DWARFUnit, then parsing a DIE becomes two steps: parse ULEB128 abbrev code, and then add constant size to the offset.

This patch also adds a fixed byte size to each DWARFAbbreviationDeclaration::AttributeSpec so that attributes can quickly skip their values if needed without the need to lookup the fixed for size.

Notable improvements:

- DWARFAbbreviationDeclaration::findAttributeIndex() now returns an Optional<uint32_t> instead of a uint32_t and we no longer have to look for the magic -1U return value
- Optional<uint32_t> DWARFAbbreviationDeclaration::findAttributeIndex(dwarf::Attribute attr) const;
- DWARFAbbreviationDeclaration now has a getAttributeValue() function that extracts an attribute value given a DIE offset that takes advantage of the DWARFAbbreviationDeclaration::AttributeSpec::ByteSize
- bool DWARFAbbreviationDeclaration::getAttributeValue(const uint32_t DIEOffset, const dwarf::Attribute Attr, const DWARFUnit &U, DWARFFormValue &FormValue) const;
- A DWARFAbbreviationDeclaration instance can return a fixed byte size for itself so DWARF parsing is faster:
- Optional<size_t> DWARFAbbreviationDeclaration::getFixedAttributesByteSize(const DWARFUnit &U) const;
- Any functions that used to take a "const DWARFUnit *U" that would crash if U was NULL now take a "const DWARFUnit &U" and are only called with a valid DWARFUnit

Differential Revision: https://reviews.llvm.org/D26567

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286924 91177308-0d34-0410-b5e6-96231b3b80d8