granicus.if.org Git

[PPC] Fix atomics lowering in DAG lowering.

I forgot to forward the chain, causing some missing instruction
dependencies. The test crashes the compiler without this patch.

Inspired by the test case, D33519 also tries to remove the extra sync.

Differential Revision: https://reviews.llvm.org/D33573

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303931 91177308-0d34-0410-b5e6-96231b3b80d8

Fix test to handle running on platforms which don't enable pubnames at all

Check that there are no entries in the pub sections, but that they may
either be not present or present-but-empty.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303927 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Add an InstCombine specific wrapper around isKnownToBeAPowerOfTwo to shorten code. NFC

We have wrappers for several other ValueTracking methods that take care of passing all of the analysis and assumption cache parameters. This extends it to isKnownToBeAPowerOfTwo.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303924 91177308-0d34-0410-b5e6-96231b3b80d8

[GVN] Add phi-translate support in scalarpre.

Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.

  long a[100], b[100], g1, g2, g3;
  __attribute__((pure)) long goo();

  void foo(long a, long b, long c, long d) {
    g1 = a * b;
    if (__builtin_expect(g2 > 3, 0)) {
      a = c;
      b = d;
      g2 = a * b;
    }
    g3 = a * b;      // fully redundant.
  }

The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.

Differential Revision: https://reviews.llvm.org/D32252

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303923 91177308-0d34-0410-b5e6-96231b3b80d8

Add constrained intrinsics for some libm-equivalent operations

Differential revision: https://reviews.llvm.org/D32319

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303922 91177308-0d34-0410-b5e6-96231b3b80d8

CodeGen: Rename DEBUG_TYPE to match passnames

Rename the DEBUG_TYPE to match the names of corresponding passes where
it makes sense. Also establish the pattern of simply referencing
DEBUG_TYPE instead of repeating the passname where possible.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303921 91177308-0d34-0410-b5e6-96231b3b80d8

[lld] Fix a bug where we continually re-follow type servers.

Originally this was intended to be set up so that when linking
a PDB which refers to a type server, it would only visit the
PDB once, and on subsequent visitations it would just skip it
since all the records had already been added.

Due to some C++ scoping issues, this was not occurring and it
was revisiting the type server every time, which caused every
record to end up being thrown away on all subsequent visitations.

This doesn't affect the performance of linking clang-cl generated
object files because we don't use type servers, but when linking
object files and libraries generated with /Zi via MSVC, this means
only 1 object file has to be linked instead of N object files, so
the speedup is quite large.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303920 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeView Type Merging] Don't keep re-allocating temp serializer.

Previously, every time we wanted to serialize a field list record, we
would create a new copy of FieldListRecordBuilder, which would in turn
create a temporary instance of TypeSerializer, which itself had a
std::vector<> that was about 128K in size. So this 128K allocation was
happening every time. We can re-use the same instance over and over, we
just have to clear its internal hash table and seen records list between
each run. This saves us from the constant re-allocations.

This is worth an ~18.5% speed increase (3.75s -> 3.05s) in my tests.

Differential Revision: https://reviews.llvm.org/D33506

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303919 91177308-0d34-0410-b5e6-96231b3b80d8

Make BinaryStreamReader::readCString a bit faster.

Previously it would do a character by character search for a null
terminator, to account for the fact that an arbitrary stream need not
store its data contiguously so you couldn't just do a memchr. However, the
stream API has a function which will return the longest contiguous chunk
without doing a copy, and by using this function we can do a memchr on the
individual chunks. For certain types of streams like data from object
files etc, this is guaranteed to find the null terminator with only a
single memchr, but even with discontiguous streams such as
MappedBlockStream, it's rare that any given string will cross a block
boundary, so even those will almost always be satisfied with a single
memchr.

This optimization is worth a 10-12% reduction in link time (4.2 seconds ->
3.75 seconds)

Differential Revision: https://reviews.llvm.org/D33503

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303918 91177308-0d34-0410-b5e6-96231b3b80d8

[pdb] pad source file name buffer at the end instead of the beginning

Summary:
DbiStreamBuilder calculated the offset of the source file names inside
the file info substream as the size of the file info substream minus
the size of the file names. Since the file info substream is padded to
a multiple of 4 bytes, this caused the first file name to be aligned
on a 4-byte boundary. By contrast, DbiModuleList would read the file
names immediately after the file name offset table, without skipping
to the next 4-byte boundary. This change makes it so that the file
names are written to the location where DbiModuleList expects them,
and puts any necessary padding for the file info substream after the
file names instead of before it.

Reviewers: amccarth, rnk, zturner

Reviewed By: amccarth, zturner

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33475

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303917 91177308-0d34-0410-b5e6-96231b3b80d8

Fix a bug in MappedBlockStream.

It was using the number of blocks of the entire PDB file as the number
of blocks of each stream that was created. This was only an issue in
the readLongestContiguousChunk function, which was never called prior.
This bug surfaced when I updated an algorithm to use this function and
the algorithm broke.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303916 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] MC: Include unnamed data when writing wasm files

Also, include global entries for all data symbols, not
just external ones, since these are referenced by the
relocation records.

Add a test case that includes unnamed data.

Differential Revision: https://reviews.llvm.org/D33079

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303915 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeView Type Merging] Avoid record deserialization when possible.

A profile shows the majority of time doing type merging is spent
deserializing records from sequences of bytes into friendly C++ structures
that we can easily access members of in order to find the type indices to
re-write.

Records are prefixed with their length, however, and most records have
type indices that appear at fixed offsets in the record. For these
records, we can save some cycles by just looking at the right place in the
byte sequence and re-writing the value, then skipping the record in the
type stream. This saves us from the costly deserialization of examining
every field, including potentially null terminated strings which are the
slowest, even though it was unnecessary to begin with.

In addition, we apply another optimization. Previously, after
deserializing a record and re-writing its type indices, we would
unconditionally re-serialize it in order to compute the hash of the
re-written record. This would result in an alloc and memcpy for every
record. If no type indices were re-written, however, this was an
unnecessary allocation. In this patch re-writing is made two phase. The
first phase discovers the indices that need to be rewritten and their new
values. This information is passed through to the de-duplication code,
which only copies and re-writes type indices in the serialized byte
sequence if at least one type index is different.

Some records have type indices which only appear after variable length
strings, or which have lists of type indices, or various other situations
that can make it tricky to make this optimization. While I'm not giving up
on optimizing these cases as well, for now we can get the easy cases out
of the way and lay the groundwork for more complicated cases later.

This patch yields another 50% speedup on top of the already large speedups
submitted over the past 2 days. In two tests I have run, I went from 9
seconds to 3 seconds, and from 16 seconds to 8 seconds.

Differential Revision: https://reviews.llvm.org/D33480

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303914 91177308-0d34-0410-b5e6-96231b3b80d8

Update the documentation and CMake file for Visual Studio generators.

By default, CMake uses a 32-bit toolchain, even when on a 64-bit platform targeting a 64-bit build. However, due to the size of the binaries involved, this can cause linker instabilities (such as the linker running out of memory). Guide people to the correct solution to get CMake to use the native toolchain.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303912 91177308-0d34-0410-b5e6-96231b3b80d8

PPC: Correct Size for GETtlsADDR

PPC::GETtlsADDR is lowered to a branch and a nop, by the assembly
printer. Its size was incorrectly marked as 4, correct it to 8. The
incorrect size can cause incorrect branch relaxation in
PPCBranchSelector under the right conditions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303904 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303902 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64]: add 'a' inline asm operand modifier.

Summary:
This is used in the Linux kernel, and effectively just means "print an
address". This brings back r193593.

Reviewed by: Renato Golin

Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls

Subscribers: aemerson, javed.absar, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D33558

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303901 91177308-0d34-0410-b5e6-96231b3b80d8

Fix SelectionDAGBuilder::getDbgValue to not expect DW_OP_deref on FI vars

This fixes an oversight in r300522, which changed alloca
dbg.values to no longer emit a DW_OP_deref.

The array.ll testcase was regenerated from source.

Fixes PR33166:
https://bugs.llvm.org/show_bug.cgi?id=33166

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303897 91177308-0d34-0410-b5e6-96231b3b80d8

Delete an obsolete paragraph in LangRef.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303896 91177308-0d34-0410-b5e6-96231b3b80d8

DebugInfo: Produce debug_{gnu_}pub{names,types} entries when explicitly requested, even in -gmlt or when empty

Turns out gold doesn't use the DW_AT_GNU_pubnames to decide whether to
parse the rest of the DIEs when building gdb-index. This causes gold to
trip over LLVM's output when there are DW_FORM_ref_addr present.

Gold does use the presence of a debug_gnu_pub{names,types} entry for the
CU to skip parsing the debug_info portion, so make sure that's included
even when empty (technically, when empty there couldn't be any ref_addr
anyway - it only came up when gmlt didn't produce any (even non-empty)
pubnames - but given what that reveals about gold's implementation, this
seems like a good thing to do for consistency).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303894 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-pdbdump] [yaml2pdb] always include object file name in module info

Summary:
Previously, the yaml2pdb subcommand of llvm-pdbdump only
included object file names in module info if a module info stream was
present. This change makes it so that we include the object file name
even if there is no module info stream for the module. As a result,
running
llvm-pdbdump pdb2yaml -dbi-module-info original.pdb > original.yaml &&
llvm-pdbdump yaml2pdb -pdb=new.pdb original.yaml && llvm-pdbdump
pdb2yaml -dbi-module-info new.pdb > new.yaml now produces identical
original.yaml and new.yaml files.

Reviewers: amccarth, zturner

Reviewed By: zturner

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D33463

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303891 91177308-0d34-0410-b5e6-96231b3b80d8

NewGVN: Fix PR 33119, PR 33129, due to regressed undef handling
Fix PR33120 and others by eliminating self-cycles a different way.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303875 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Teach isAllocSiteRemovable to look through addrspacecasts

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D28565

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303870 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] make icmp-mul fold more efficient

There's probably a lot more like this (see also comments in D33338 about responsibility),
but I suspect we don't usually get a visible manifestation.

Given the recent interest in improving InstCombine efficiency, another potential micro-opt
that could be repeated several times in this function: morph the existing icmp pred/operands
instead of creating a new instruction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303860 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] add intrinsic for s_getpc

Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D32862

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303859 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Adding vpopcntd and vpopcntq instructions

AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq).

Differential Revision: https://reviews.llvm.org/D33169

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303858 91177308-0d34-0410-b5e6-96231b3b80d8

[GVNSink] Pacify MSVC

Don't convert an unsigned to a pointer for a sentinel, use a size_t instead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303855 91177308-0d34-0410-b5e6-96231b3b80d8

[GVNSink] Don't define operator<< in NDEBUG

Without debug macros enabled, the raw_ostream operator<< overload
is unused.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303852 91177308-0d34-0410-b5e6-96231b3b80d8

[GVNSink] GVNSink pass

This patch provides an initial prototype for a pass that sinks instructions based on GVN information, similar to GVNHoist. It is not yet ready for commiting but I've uploaded it to gather some initial thoughts.

This pass attempts to sink instructions into successors, reducing static
instruction count and enabling if-conversion.
We use a variant of global value numbering to decide what can be sunk.
Consider:

[ %a1 = add i32 %b, 1  ]   [ %c1 = add i32 %d, 1  ]
[ %a2 = xor i32 %a1, 1 ]   [ %c2 = xor i32 %c1, 1 ]
                 \           /
           [ %e = phi i32 %a2, %c2 ]
           [ add i32 %e, 4         ]

GVN would number %a1 and %c1 differently because they compute different
results - the VN of an instruction is a function of its opcode and the
transitive closure of its operands. This is the key property for hoisting
and CSE.

What we want when sinking however is for a numbering that is a function of
the *uses* of an instruction, which allows us to answer the question "if I
replace %a1 with %c1, will it contribute in an equivalent way to all
successive instructions?". The (new) PostValueTable class in GVN provides this
mapping.

This pass has some shown really impressive improvements especially for codesize already on internal benchmarks, so I have high hopes it can replace all the sinking logic in SimplifyCFG.

Differential revision: https://reviews.llvm.org/D24805

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303850 91177308-0d34-0410-b5e6-96231b3b80d8

[PM] Teach the PGO instrumentation pasess to run GlobalDCE before
instrumenting code.

This is important in the new pass manager. The old pass manager's
inliner has a small DCE routine embedded within it. The new pass manager
relies on the actual GlobalDCE pass for this.

Without this patch, instrumentation profiling with the new PM results in
massive code bloat in the object files because the instrumentation
itself ends up preventing DCE from working to remove the code.

We should probably change the instrumentation (and/or DCE) so that we
can eliminate dead code even if instrumented, but we shouldn't even
spend the time generating instrumentation for that code so this still
seems like a good patch.

Differential Revision: https://reviews.llvm.org/D33535

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303845 91177308-0d34-0410-b5e6-96231b3b80d8

[PM/Unswitch] Fix a bug in the domtree update logic for the new unswitch
pass.

The original logic only considered direct successors of the hoisted
domtree nodes, but that isn't really enough. If there are other basic
blocks that are completely within the subtree, their successors could
just as easily be impacted by the hoisting.

The more I think about it, the more I think the correct update here is
to hoist every block on the dominance frontier which has an idom in the
chain we hoist across. However, this is subtle enough that I'd
definitely appreciate some more eyes on it.

Sadly, if this is the correct algorithm, it requires computing a (highly
localized) dominance frontier. I've done this in the simplest (IE, least
code) way I could come up with, but that may be too naive. Suggestions
welcome here, dominance update algorithms are not an area I've studied
much, so I don't have strong opinions.

In good news, with this patch, turning on simple unswitch passes the
LLVM test suite for me with asserts enabled.

Differential Revision: https://reviews.llvm.org/D32740

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303843 91177308-0d34-0410-b5e6-96231b3b80d8

[MVT] Fix the identation of the start of the MVT class. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303841 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAG] Fix off by one in a compare in getOperationAction.

If Op is equal to array_lengthof, the lookup would be out of bounds, but we were only checking for greater than. I suspect nothing ever passes in the equal value because its a sentinel to mark the end of the builtin opcodes and not a real opcode.

So really this fix is just so that the code looks right and makes sense.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303840 91177308-0d34-0410-b5e6-96231b3b80d8

[LegacyPM] Make the 'addLoop' method accept a loop to add rather than
having it internally allocate the loop.

This is a much more flexible API and necessary in the new loop unswitch
to reasonably support both new and old PMs in common code. It also just
seems like a cleaner separation of concerns.

NFC, this should just be a pure refactoring.

Differential Revision: https://reviews.llvm.org/D33528

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303834 91177308-0d34-0410-b5e6-96231b3b80d8

Fixed nondeterminism in RuleMatcher::emit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303829 91177308-0d34-0410-b5e6-96231b3b80d8

[libFuzzer] Don't replace custom signal handlers.

Summary:
This allows to keep handlers installed by sanitizers.
In other cases third-party code can replace handlers after libFuzzer
initialization anyway.

Reviewers: kcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33522

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303828 91177308-0d34-0410-b5e6-96231b3b80d8

Fix coverage check for full post-dominator basic blocks.

Coverage instrumentation which does not instrument full post-dominators
and full-dominators may skip valid paths, as the reasoning for skipping
blocks may become circular.
This patch fixes that, by only skipping
full post-dominators with multiple predecessors, as such predecessors by
definition can not be full-dominators.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303827 91177308-0d34-0410-b5e6-96231b3b80d8

[coroutines] CoroFrame.cpp conform to coding convention (s/repeat/Repeat) (NFC)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303826 91177308-0d34-0410-b5e6-96231b3b80d8

[coroutines] Relocate instructions that maybe spilled after coro.begin

Summary:
Frontend generates store instructions after allocas, for example:

```
define i8* @f(i64 %this) "coroutine.presplit"="1" personality i32 0 {
entry:
  %this.addr = alloca i64
  store i64 %this, i64* %this.addr
  ..
  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)

```
Such instructions may require spilling into coro.frame, but, coro-frame address is only available after coro.begin and thus needs to be moved after coro.begin.
The only instructions that should not be moved are the arguments of coro.begin and all of their operands.

Reviewers: GorNishanov, majnemer

Reviewed By: GorNishanov

Subscribers: llvm-commits, EricWF

Differential Revision: https://reviews.llvm.org/D33527

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303825 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Fix a performance bug for PPC::XXSLDWI.

There are some VectorShuffle Nodes in SDAG which can be selected to XXSLDWI
instruction, this patch recognizes them and does the selection to improve the
PPC performance.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303822 91177308-0d34-0410-b5e6-96231b3b80d8

Print symbols from COFF import libraries.

This change allows llvm-nm to print symbols found in import libraries,
in part by allowing COFFImportFiles to be casted to SymbolicFiles.

Patch by Dave Lee!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303821 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303820 91177308-0d34-0410-b5e6-96231b3b80d8

[coroutines] Allow rematerialization upto 4 times. Remove incorrect assert

Reviewers: majnemer

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D33524

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303819 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] use m_APInt to allow icmp-mul-mul vector fold

The swapped operands in the first test is a manifestation of an
inefficiency for vectors that doesn't exist for scalars because
the IRBuilder checks for an all-ones mask for scalars, but not
vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303818 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add tests for icmp eq (mul X, C), (mul Y, C); NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303816 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] move tests and use FileCheck; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303808 91177308-0d34-0410-b5e6-96231b3b80d8

[DAG] Prevent crashes when merging constant stores with high-bit set. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303802 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Prevent nested ADDs from address calc in splitStoreSplat. NFC

In preparation for late-stage store merging.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303800 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "Revert "Attempt to pacify ASan and UBSan reports in CrashRecovery tests""

This dependents on r303729 which was reverted.

This reverts commit r303783.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303796 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Merge together the SimplifyDemandedUseBits implementations for ZExt and Trunc. NFC

While there avoid resizing the DemandedMask twice. Make a copy into a separate variable instead. This potentially removes an allocation on large bit widths.

With the use of the zextOrTrunc methods on APInt and KnownBits these can be made almost source identical. The only difference is the zero of the upper bits for ZExt. This is similar to how its done in computeKnownBits in ValueTracking.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303791 91177308-0d34-0410-b5e6-96231b3b80d8

Prevent UBSan report in CrashRecovery tests
Reverted by mistake with r303783.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303785 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "Attempt to pacify ASan and UBSan reports in CrashRecovery tests"

It's not needed after r303729.

This reverts commit r303311.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303783 91177308-0d34-0410-b5e6-96231b3b80d8

Fix a couple of typos in memory intrinsic optimization output (NFC)

s/instrinsic/intrinsic

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303782 91177308-0d34-0410-b5e6-96231b3b80d8

P9: D-form vector load/store. Differential Revision: https://reviews.llvm.org/D33248

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303780 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Use less bitwise operations to handle Instruction::SExt in SimplifyDemandedUseBits. Other improvements.

The current code created a NewBits mask and used it as a mask several times. One of them just before a call to trunc making it unnecessary. A call to getActiveBits can get us the same information for the case. We also ORed with this mask later when we should have just sign extended the known bits.

We also called trunc on the guaranteed to be zero KnownZeros/Ones masks entering this code. Creating appropriately sized temporary APInts is probably better.

Differential Revision: https://reviews.llvm.org/D32098

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303779 91177308-0d34-0410-b5e6-96231b3b80d8

Move machine-cse-physreg.mir to test/CodeGen/Thumb

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303778 91177308-0d34-0410-b5e6-96231b3b80d8

[InstSimplify] Simplify uadd/sadd/umul/smul with overflow intrinsics when the Zero or Undef is on the LHS.

Summary: This code was migrated from InstCombine a few years ago. InstCombine had nearby code that would move Constants to the RHS for these, but InstSimplify doesn't have such code on this path.

Reviewers: spatel, majnemer, davide

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33473

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303774 91177308-0d34-0410-b5e6-96231b3b80d8

[ValueTracking] Convert most of the calls to computeKnownBits to use the version that returns the KnownBits object.

This continues the changes started when computeSignBit was replaced with this new version of computeKnowBits.

Differential Revision: https://reviews.llvm.org/D33431

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303773 91177308-0d34-0410-b5e6-96231b3b80d8

[ValueTracking] Add OptimizationRemarkEmitter to the other signature for commuteKnownBits.

This is needed for an upcoming patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303772 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r291254: [AArch64] Reduce vector insert/extract cost for Falkor

The default vector insert/extract cost is more profitable on Falkor than the
reduced cost.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303771 91177308-0d34-0410-b5e6-96231b3b80d8

Add some tips on benchmarking.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303769 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Prevent too large store merges in AMDGPU Subtargets. NFCI.

Various address spaces on the SI and R600 subtargets have stricter
limits on memory access size that other address spaces. Use
canMergeStoresTo predicate to prevent the DAGCombiner from creating
these stores as they will be split up during legalization.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303767 91177308-0d34-0410-b5e6-96231b3b80d8

[LV] Update type in cost model for scalarization

For non-uniform instructions marked for scalarization, we should update
`VectorTy` when computing instruction costs to reflect the scalar type. In
addition to determining instruction costs, this type is also used to signal
that all instructions in the loop will be scalarized. This currently affects
memory instructions and non-pointer induction variables and their updates. (We
also mark GEPs scalar after vectorization, but their cost is computed together
with memory instructions.) For scalarized induction updates, this patch also
scales the scalar cost by the vectorization factor, corresponding to each
induction step.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303763 91177308-0d34-0410-b5e6-96231b3b80d8

[MSP430] Fix PR33050: Don't use ADD16ri to lower FrameIndex.

Use ADDframe pseudo instruction instead.
This will fix machine verifier error, and will help to fix PR32146.

Differential Revision: https://reviews.llvm.org/D33452

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303758 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add tests to show potential missing folds; NFC

As noted in https://bugs.llvm.org/show_bug.cgi?id=33138 and
the comments, there are multiple ways to view this. If we
choose not to solve this in InstCombine, these tests will
serve as documentation of that choice.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303755 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns"

This reverts commit e065977c4b5f68ab845400b256f6a3822b1325fa.

It doesn't work. S_LOAD_DWORD_IMM_ci and friends aren't selected by any of
the patterns, so it was putting 32-bit literals into the 8-bit field.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303754 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add tests to document bitcast + bitwise-logic behavior; NFC

The solution for PR26702 ( https://bugs.llvm.org/show_bug.cgi?id=26702 )
added a canonicalization rule, but the minimal regression tests don't
demonstrate how that rule interacts with other folds.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303750 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start"

This reverts commit r303730 because it broke all the buildbots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303747 91177308-0d34-0410-b5e6-96231b3b80d8

[Hexagon] Fix comment in HexagonPacketizer::runOnMachineFunction

Patch by Wei-Ren Chen.

Differential Revision: https://reviews.llvm.org/D33439

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303745 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopVectorizer] Let target prefer scalar addressing computations.

The loop vectorizer usually vectorizes any instruction it can and then
extracts the elements for a scalarized use. On SystemZ, all elements
containing addresses must be extracted into address registers (GRs). Since
this extraction is not free, it is better to have the address in a suitable
register to begin with. By forcing address arithmetic instructions and loads
of addresses to be scalar after vectorization, two benefits result:

* No need to extract the register
* LSR optimizations trigger (LSR isn't handling vector addresses currently)

Benchmarking show improvements on SystemZ with this new behaviour.

Any other target could try this by returning false in the new hook
prefersVectorizedAddressing().

Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand
https://reviews.llvm.org/D32422

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303744 91177308-0d34-0410-b5e6-96231b3b80d8

[SystemZ] Fix register modelling in expandLoadStackGuard()

EXPENSIVE_CHECKS found this bug (https://bugs.llvm.org/show_bug.cgi?id=33047), which
this patch fixes. The EAR instruction defines a GR32, not a GR64.

Review: Ulrich Weigand

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303743 91177308-0d34-0410-b5e6-96231b3b80d8

Demangler: Fix constructor cv qualifier handling

Previously if we parsed a constructor then we set parsed_ctor_dtor_cv
to true and never reseted it. This causes issue when a template argument
references a constructor (e.g. type of lambda defined inside a
constructor) as we will have the parsed_ctor_dtor_cv flag set what will
cause issues when parsing later arguments.

Differential Revision: https://reviews.llvm.org/D33385
libcxxabi change: https://reviews.llvm.org/rL303737

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303738 91177308-0d34-0410-b5e6-96231b3b80d8

Strip trailing whitespace. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303736 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Remove ThumbTargetMachines. (NFC)

Summary:
Thumb code generation is controlled by ARMSubtarget and the concrete
ThumbLETargetMachine and ThumbBETargetMachine are not needed.

Eric Christopher suggested removing the unneeded target machines in
https://reviews.llvm.org/D33287.

I think it still makes sense to keep separate TargetMachines for big and
little endian as we probably do not want to have different endianess for
difference functions in a single compilation unit. The MIPS backend has
two separate TargetMachines for big and little endian as well.

Reviewers: echristo, rengolin, kristof.beyls, t.p.northover

Reviewed By: echristo

Subscribers: aemerson, javed.absar, arichardson, llvm-commits

Differential Revision: https://reviews.llvm.org/D33318

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303733 91177308-0d34-0410-b5e6-96231b3b80d8

MachineCSE: Respect interblock physreg liveness

Summary:
This is a fix for PR32538. MachineCSE first looks at MO.isDead(), but
if it is not marked dead, MachineCSE still wants to do its own check
to see if it is trivially dead. This check for the trivial case
assumed that physical registers cannot be live out of a block.

Patch by Mattias Eriksson.

Reviewers: qcolombet, jbhateja

Reviewed By: qcolombet, jbhateja

Subscribers: jbhateja, llvm-commits

Differential Revision: https://reviews.llvm.org/D33408

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303731 91177308-0d34-0410-b5e6-96231b3b80d8

[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start

When folding arguments of AddExpr or MulExpr with recurrences, we rely on the fact that
the loop of our base recurrency is the bottom-lost in terms of domination. This assumption
may be broken by an expression which is treated as invariant, and which depends on a complex
Phi for which SCEVUnknown was created. If such Phi is a loop Phi, and this loop is lower than
the chosen AddRecExpr's loop, it is invalid to fold our expression with the recurrence.

Another reason why it might be invalid to fold SCEVUnknown into Phi start value is that unlike
other SCEVs, SCEVUnknown are sometimes position-bound. For example, here:

for (...) { // loop
phi = {A,+,B}
}
X = load ...
Folding phi + X into {A+X,+,B}<loop> actually makes no sense, because X does not exist and cannot
exist while we are iterating in loop (this memory can be even not allocated and not filled by this moment).
It is only valid to make such folding if X is defined before the loop. In this case the recurrence {A+X,+,B}<loop>
may be existant.

This patch prohibits folding of SCEVUnknown (and those who use them) into the start value of an AddRecExpr,
if this instruction is dominated by the loop. Merging the dominating unknown values is still valid. Some tests that
relied on the fact that some SCEVUnknown should be folded into AddRec's are changed so that they no longer
expect such behavior.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303730 91177308-0d34-0410-b5e6-96231b3b80d8

Explicitly set CPU and -slow-incdec to try to fix r303678's test on llvm-clang-x86_64-expensive-checks-win.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303727 91177308-0d34-0410-b5e6-96231b3b80d8

[APInt] Use std::end to avoid mentioning the size of a local buffer repeatedly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303726 91177308-0d34-0410-b5e6-96231b3b80d8

Revert r303720: Tweak r303678's test to try to fix llvm-clang-x86_64-expensive-checks-win.

It doesn't fix that builder.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303721 91177308-0d34-0410-b5e6-96231b3b80d8

Tweak r303678's test to try to fix llvm-clang-x86_64-expensive-checks-win.

I suspect this buildbot has slow-incdec set by default, most likely due to
the default CPU having this set. This feature bit can prevent optsize from
having an effect on this IR.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303720 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Add VLDx/VSTx sched defs for machine-schedulers. NFCI

This patch adds missing scheds for Neon VLDx/VSTx instructions.
This will help one write schedulers easier/faster in the future for ARM sub-targets.
Existing models will not affected by this patch.
Reviewed by: Renato Golin, Diana Picus
Differential Revision: https://reviews.llvm.org/D33120

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303717 91177308-0d34-0410-b5e6-96231b3b80d8

[NewGVN] Update additionalUsers when we simplify to a value.

Otherwise we don't revisit an instruction that could be simplified,
and when we verify, we discover there's something that changed, i.e.
what we had wasn't a maximal fixpoint.

Fixes PR32836.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303715 91177308-0d34-0410-b5e6-96231b3b80d8

Fix broken build.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303711 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "Disable coverage opt-out for strong postdominator blocks."

This reverts commit 2ed06f05fc10869dd1239cff96fcdea2ee8bf4ef.
Buildbots do not like this on Linux.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303710 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "Fixes for tests for r303698"

This reverts commit 69bfaf72e7502eb08bbca88a57925fa31c6295c6.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303709 91177308-0d34-0410-b5e6-96231b3b80d8

git-llvm script should add .exe on Windows.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303708 91177308-0d34-0410-b5e6-96231b3b80d8

Don't do a full scan of the type stream before processing records.

LazyRandomTypeCollection is designed for random access, and in
order to provide this it lazily indexes ranges of types. In the
case of types from an object file, there is no partial index
to build off of, so it has to index the full stream up front.
However, merging types only requires sequential access, and when
that is needed, this extra work is simply wasted. Changing the
algorithm to work on sequential arrays of types rather than
random access type collections eliminates this up front scan.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303707 91177308-0d34-0410-b5e6-96231b3b80d8

[SCCP] Use the `hasAddressTaken()` version defined in `Function`.

Instead of using the SCCP homegrown one. We should eventually
make the private SCCP version disappear, but that wont' be today.
PR33143 tracks this issue.

Add braces for consistency while here. No functional change intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303706 91177308-0d34-0410-b5e6-96231b3b80d8

[LIR] Use the newly `getRecurrenceVar()` helper. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303704 91177308-0d34-0410-b5e6-96231b3b80d8

Fixes for tests for r303698

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303701 91177308-0d34-0410-b5e6-96231b3b80d8

[LIR] Strengthen the check for recurrence variable in popcnt/CTLZ.

Fixes PR33114.
Differential Revision: https://reviews.llvm.org/D33420

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303700 91177308-0d34-0410-b5e6-96231b3b80d8

Disable coverage opt-out for strong postdominator blocks.

Coverage instrumentation has an optimization not to instrument extra
blocks, if the pass is already "accounted for" by a
successor/predecessor basic block.
However (https://github.com/google/sanitizers/issues/783) this
reasoning may become circular, which stops valid paths from having
coverage.
In the worst case this can cause fuzzing to stop working entirely.

This change simplifies logic to something which trivially can not have
such circular reasoning, as losing valid paths does not seem like a
good trade-off for a ~15% decrease in the # of instrumented basic blocks.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303698 91177308-0d34-0410-b5e6-96231b3b80d8

Revert LLVM changes for "Sema: allow imaginary constants via GNU extension if UDL overloads not present."

The changes accidentally crept into a Clang commit I was making.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303697 91177308-0d34-0410-b5e6-96231b3b80d8

[git-llvm] Check if svn is installed.

The error message that git-llvm script prints out when svn is missing
is very cryptic. I spent a fair amount of time to find what was wrong
with my environment. It looks like many newcomers also exprienced a
hard time to submit their first patches due to this error.

This patch adds a more user-friendly error message.

Differential Revision: https://reviews.llvm.org/D33458

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303696 91177308-0d34-0410-b5e6-96231b3b80d8

[MSP430] Add subtarget features for hardware multiplier.

Also add more processors to make -mcpu option behave similar to gcc.

Differential Revision: https://reviews.llvm.org/D33335

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303695 91177308-0d34-0410-b5e6-96231b3b80d8

Sema: allow imaginary constants via GNU extension if UDL overloads not present.

C++14 added user-defined literal support for complex numbers so that you can
write something like "complex<double> val = 2i". However, there is an existing
GNU extension supporting this syntax and interpreting the result as a _Complex
type.

This changes parsing so that such literals are interpreted in terms of C++14's
operators if an overload is present but otherwise falls back to the original
GNU extension.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303694 91177308-0d34-0410-b5e6-96231b3b80d8

Silence MSVC warning about unsigned integer overflow, which has defined behavior

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303693 91177308-0d34-0410-b5e6-96231b3b80d8

abtest: remove duplicate script

This is fixing a mistake from r303690.

Differential Revision: https://reviews.llvm.org/D33303

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303692 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045)

This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045

Differential Revision: https://reviews.llvm.org/D33451

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303691 91177308-0d34-0410-b5e6-96231b3b80d8

AsmPrinter: mark the beginning and the end of a function in verbose mode

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303690 91177308-0d34-0410-b5e6-96231b3b80d8