granicus.if.org Git

[DAGCombiner] improve throughput of shift+logic+shift

The motivating case for this is a long way from here:
https://bugs.llvm.org/show_bug.cgi?id=43146
...but I think this is where we have to start.

We need to canonicalize/optimize sequences of shift and logic to ease
pattern matching for things like bswap and improve perf in general.
But without the artificial limit of '!LegalTypes' (early combining),
there are a lot of test diffs, and not all are good.

In the minimal tests added for this proposal, x86 should have better
throughput in all cases. AArch64 is neutral for scalar tests because
it can fold shifts into bitwise logic ops.

There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns:
https://rise4fun.com/Alive/VlI
https://rise4fun.com/Alive/n1m
https://rise4fun.com/Alive/1Vn

Differential Revision: https://reviews.llvm.org/D67021

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370617 91177308-0d34-0410-b5e6-96231b3b80d8

Fix MSVC unreferenced formal parameter warning. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370615 91177308-0d34-0410-b5e6-96231b3b80d8

Fix MSVC unreferenced formal parameter warning. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370614 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] Rename + cleanup lowerShuffleAsLanePermuteAndBlend. NFCI.

Rename to lowerShuffleAsLanePermuteAndShuffle to make it clear that not just blends are performed.

Cleanup the in-lane shuffle mask generation to make it more obvious what's going on.

Some prep work noticed while investigating the poor shuffle code mentioned in D66004.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370613 91177308-0d34-0410-b5e6-96231b3b80d8

Fix shadow variable warning. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370610 91177308-0d34-0410-b5e6-96231b3b80d8

[ConstantFolding] Fix 'undef' folding for @llvm.[us]{add,sub}.with.overflow ops (PR43188)

As we have already established/fixed in
  https://bugs.llvm.org/show_bug.cgi?id=42209
  https://reviews.llvm.org/D63065
  https://reviews.llvm.org/rL363522
the InstSimplify handling for @llvm.with.overflow ops with undefs
is correct. Therefore if ConstantFolding produces different results,
then it is wrong.

This duplication of code hints at the need for some refactoring,
but for now address the brokenness of ConstantFolding by
copying the known-good handling from rL363522.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43188

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370608 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Remove MVE masked loads/stores

These were never enabled correctly and are causing other problems. Taking them
out for the moment, whilst we work on the issues.

This reverts r370329.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370607 91177308-0d34-0410-b5e6-96231b3b80d8

[TargetLowering] Fix Bugzilla ID 43183 to avoid soften comparison broken with constant inputs

Summary:
This fixes the bugzilla id 43183 which triggerd by the following commit:
[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370604 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Remove unused custom node definition

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370603 91177308-0d34-0410-b5e6-96231b3b80d8

[GlobalISel][NFC] Regression test cases for aarch64 legalizer (s128 sext+icmp).

There were legalizer asserts in aarch64 globalisel (in debug mode) with s128
sext+icmp before r367060 and r366943 landed. These are just a couple reduced
mir and ir regression tests that came from a build where these were encountered.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370602 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Replace some COPY_TO_REGCLASS from GR32/GR64 to VR128 in isel patterns with VMOVDI2PDIrr/VMOV64toPQIrr.

This is what the copies will eventually be turned into. We don't
use COPY_TO_REGCLASS for scalar_to_vector patterns. So we should
use the real instruction here too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370601 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Compress the flag bits in the folding tables to make room for more bits in an upcoming patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370600 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC] Fixed -Wdocumentation warning

/srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/lib/Target/AMDGPU/AMDGPUGenRegisterBankInfo.def:98:1: warning: not a Doxygen trailing comment [-Wdocumentation]
1 warning generated.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370596 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] mempcpy(d,s,n) to memcpy(d,s,n) + n

Summary:
Back-end currently expands mempcpy, but middle-end should work with memcpy instead of mempcpy to enable more memcpy-optimization.

GCC backend emits mempcpy, so LLVM backend could form it too, if we know mempcpy libcall is better than memcpy + n.
https://godbolt.org/z/dOCG96

Reviewers: efriedma, spatel, craig.topper, RKSimon, jdoerfert

Reviewed By: efriedma

Subscribers: hjl.tools, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65737

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370593 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] EltsFromConsecutiveLoads - Don't confuse elt count with vector element count (PR43170)

EltsFromConsecutiveLoads was assuming that the number of input elts was the same as the number of elements in the output vector type when creating a zeroing shuffle, causing an assert when subvectors were being combined instead of just scalars.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370592 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX512] Regenerate tests with common prefixes

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370591 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][x86] increase value type coverage in tests; NFC
This goes with D67021.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370590 91177308-0d34-0410-b5e6-96231b3b80d8

Fix shadow variable warning by making CondCodes names more explicit. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370589 91177308-0d34-0410-b5e6-96231b3b80d8

Revert [Clang Interpreter] Initial patch for the constexpr interpreter

This reverts r370584 (git commit afcb3de117265a69d21e5673356e925a454d7d02)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370588 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] clean up code in visitShiftByConstant()

This is not quite NFC because the SDLoc propagation is changed,
but there are no regression test diffs from that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370587 91177308-0d34-0410-b5e6-96231b3b80d8

Fix shadow variable warning. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370585 91177308-0d34-0410-b5e6-96231b3b80d8

[Clang Interpreter] Initial patch for the constexpr interpreter

Summary:
This patch introduces the skeleton of the constexpr interpreter,
capable of evaluating a simple constexpr functions consisting of
if statements. The interpreter is described in more detail in the
RFC. Further patches will add more features.

Reviewers: Bigcheese, jfb, rsmith

Subscribers: bruno, uenoku, ldionne, Tyker, thegameg, tschuett, dexonsmith, mgorny, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D64146

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370584 91177308-0d34-0410-b5e6-96231b3b80d8

[X86ISelLowering] combineCMov - cleanup CMOV->LEA codegen. NFCI.

Only compute the diff once and we don't need the truncation code (assert the bitwidth is correct just to be safe).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370583 91177308-0d34-0410-b5e6-96231b3b80d8

[X86ISelLowering] LowerSELECT - remove duplicate value type. NFCI.

VT of SELECT result and selection ops will be the same.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370581 91177308-0d34-0410-b5e6-96231b3b80d8

Fix cppcheck shadow variable and variable scope warnings. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370580 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] Match (add X, X) as (shl X, 1) when detecting rotate.

Summary: The combiner transforms (shl X, 1) into (add X, X).

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66882

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370578 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] Simplify alignToAddr with llvm::alignTo

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370577 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombiner] Don't create illegal narrow stores

Narrowing stores when the target doesn't support the narrow version
forces the target to expand into a load-modify-store sequence, which
is highly suboptimal. The information narrowing throws away (legality
of the inverse transform) is hard to re-analyze. If the target doesn't
support a store of the narrow type, don't narrow even in pre-legalize
mode.

No test as this is DAGCombiner and depends on target bits.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370576 91177308-0d34-0410-b5e6-96231b3b80d8

[LVI] Extract solveBlockValueExtractValue(); NFC

Extract this method in preparation for additional extractvalue
support.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370575 91177308-0d34-0410-b5e6-96231b3b80d8

[CVP] Add tests for simplified with.overflow + icmp; NFC

These tests are based on D19867.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370574 91177308-0d34-0410-b5e6-96231b3b80d8

[CVP] Generate simpler code for elided with.overflow intrinsics

Use a { iN undef, i1 false } struct as the base, and only insert
the first operand, instead of using { iN undef, i1 undef } as the
base and inserting both. This is the same as what we do in InstCombine.

Differential Revision: https://reviews.llvm.org/D67034

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370573 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeGen] Refactor DAGTypeLegalizer::ExpandIntRes_MULFIX. NFC

Restructured the code a little bit in preparation for adding
UMULFIXSAT. I think it will be easier to understand the code
if not interleaving the codegen for signed/unsigned/saturated
cases that much.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370569 91177308-0d34-0410-b5e6-96231b3b80d8

[LangRef] Update saturating examples for llvm.smul.fix.sat. NFC

Some saturation examples for llvm.smul.fix.sat were not showing
the correct result. I've adjusted the operands to make sure that
we actually trigger overflow in those examples.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370566 91177308-0d34-0410-b5e6-96231b3b80d8

Fix some errors introduced by rL370563 which were not exposed on my local machine.
1. zlib::compress accept &size_t but the param is an uint64_t.
2. Some systems don't have zlib installed. Don't use compression by default.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370564 91177308-0d34-0410-b5e6-96231b3b80d8

[SampleFDO] Add profile symbol list section to discriminate function being
cold versus function being newly added.

This is the second half of https://reviews.llvm.org/D66374.

Profile symbol list is the collection of function symbols showing up in
the binary which generates the current profile. It is used to discriminate
function being cold versus function being newly added. Profile symbol list
is only added for profile with ExtBinary format.

During profile use compilation, when profile-sample-accurate is enabled,
a function without profile will be regarded as cold only when it is
contained in that list.

Differential Revision: https://reviews.llvm.org/D66766

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370563 91177308-0d34-0410-b5e6-96231b3b80d8

llvm-dwarfdump: Cache CU low_pc when computing statistics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370559 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Add SIMD QFMA/QFMS

Summary:
Adds clang builtins and LLVM intrinsics for these experimental
instructions. They are not implemented in engines yet, but that is ok
because the user must opt into using them by calling the builtins.

Reviewers: aheejin, dschuff

Reviewed By: aheejin

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D67020

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370556 91177308-0d34-0410-b5e6-96231b3b80d8

[lit] Only set DYLD_LIBRARY_PATH for shared builds

In r370135 I committed a temporary workaround for the sanitized bot to
not set (DY)LD_LIBRARY_PATH when (DY)LD_INSERT_LIBRARIES was set.
Setting (DY)LD_LIBRARY_PATH is only necessary for (standalone)
shared-library builds, so a better solution is to only set the
environment variable when necessary.

Differential revision: https://reviews.llvm.org/D67012

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370549 91177308-0d34-0410-b5e6-96231b3b80d8

[MemorySSA] Rename all phi entries.

When renaming Phis incoming values, there may be multiple edges incoming
from the same block (switch). Rename all.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370548 91177308-0d34-0410-b5e6-96231b3b80d8

[GVN] Verify value equality before doing phi translation for call instruction

This is an updated version of https://reviews.llvm.org/D66909 to fix PR42605.

Basically, current phi translatation translates an old value number to an new
value number for a call instruction based on the literal equality of call
expression, without verifying there is no clobber in between. This is incorrect.

To get a finegrain check, use MachineDependence analysis to do the job. However,
this is still not ideal. Although given a call instruction,
`MemoryDependenceResults::getCallDependencyFrom` returns identical call
instructions without clobber in between using MemDepResult with its DepType to
be `Def`. However, identical is too strict here and we want it to be relaxed a
little to consider phi-translation -- callee is the same, param operands can be
different. That means changing the semantic of `MemDepResult::Def` and I don't
know the potential impact.

So currently the patch is still conservative to only handle
MemDepResult::NonFuncLocal, which means the current call has no function local
clobber. If there is clobber, even if the clobber doesn't stand in between the
current call and the call with the new value, we won't do phi-translate.

Differential Revision: https://reviews.llvm.org/D67013

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370547 91177308-0d34-0410-b5e6-96231b3b80d8

Fix SEH_NoReturn machine verifier error

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370543 91177308-0d34-0410-b5e6-96231b3b80d8

[MC] Avoid crashes from improperly nested or wrong target .seh_handlerdata directives

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370540 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Print register names in .seh_* directives

Also improve assembler parser register validation for .seh_ directives.
This requires moving X86-specific seh directive handling into the x86
backend, which addresses some assembler FIXMEs.

Differential Revision: https://reviews.llvm.org/D66625

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370533 91177308-0d34-0410-b5e6-96231b3b80d8

[x86] add tests for shift-logic-shift; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370529 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] add tests for shift-logic-shift; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370528 91177308-0d34-0410-b5e6-96231b3b80d8

[Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn

Users have complained llvm.trap produce two ud2 instructions on Win64,
one for the trap, and one for unreachable. This change fixes that.

TrapUnreachable was added and enabled for Win64 in r206684 (April 2014)
to avoid poorly understood issues with the Windows unwinder.

There seem to be two major things in play:
- the unwinder
- C++ EH, _CxxFrameHandler3 & co

The unwinder disassembles forward from the return address to scan for
epilogues. Inserting a ud2 had the effect of stopping the unwinder, and
ensuring that it ran the EH personality function for the current frame.
However, it's not clear what the unwinder does when the return address
happens to be the last address of one function and the first address of
the next function.

The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out
what the current EH state number is. It does this by consulting the
ip2state table, which maps from PC to state number. This seems to go
wrong when the return address is the last PC of the function or catch
funclet.

I'm not sure precisely which system is involved here, but in order to
address these real or hypothetical problems, I believe it is enough to
insert int3 after a call site if it would otherwise be the last
instruction in a function or funclet. I was able to reproduce some
similar problems locally by arranging for a noreturn call to appear at
the end of a catch block immediately before an unrelated function, and I
confirmed that the problems go away when an extra trailing int3
instruction is added.

MSVC inserts int3 after every noreturn function call, but I believe it's
only necessary to do it if the call would be the last instruction. This
change inserts a pseudo instruction that expands to int3 if it is in the
last basic block of a function or funclet. I did what I could to run the
Microsoft compiler EH tests, and the ones I was able to run showed no
behavior difference before or after this change.

Differential Revision: https://reviews.llvm.org/D66980

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370525 91177308-0d34-0410-b5e6-96231b3b80d8

[IFS][NFC] llvm-ifs: Fixing build bot build break: revert r370517 and r370510.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370522 91177308-0d34-0410-b5e6-96231b3b80d8

[Thumb2] tighten CHECK lines in test; NFC

The sequence between the function call and the asm start
may change without affecting what this test is looking for,
but we should have a better idea about what that sequence
looks like.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370518 91177308-0d34-0410-b5e6-96231b3b80d8

[IFS][NFC] llvm-ifs: Fixing build bot error due to commit conflicts.

r370510 and r370504

Again only on gcc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370517 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r370512

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370516 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Fix mul test cases in avx512-broadcast-unfold.ll to not get canonicalized to fadd. Remove the fsub test cases which were also testing fadd.

Not sure how to prevent an fsub by constant getting turned into an fadd by negative constant.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370515 91177308-0d34-0410-b5e6-96231b3b80d8

[IFS][NFC] llvm-ifs: Fixing build errors for bots using GCC.

gcc produces the error:

error: specialization of
‘template<class T, class Enable> struct llvm::yaml::ScalarTraits’ in
different namespace

For all specializations outside of llvm::yaml. So I added llvm::yaml to these
specializations to fix the errors on the bots building with gcc (/usr/bin/c++).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370510 91177308-0d34-0410-b5e6-96231b3b80d8

[DFAPacketizer] Allow namespacing of automata per-itinerary

The Hexagon itineraries are cunningly crafted such that functional units between
itineraries do not clash. Because all itineraries are bundled into the same DFA,
a functional unit index clash would cause an incorrect DFA to be generated.

A workaround for this is to ensure all itineraries declare the universe of all
possible functional units, but this isn't ideal for three reasons:
  1) We only have a limited number of FUs we can encode in the packetizer, and
     using the universe causes us to hit the limit without care.
  2) Silent codegen faults are bad, and careful triage of the FU list shouldn't
     be required.
  3) Smooshing all itineraries into the same automaton allows combinations of
     instruction classes that cannot exist, which bloats the table.

A simple solution is to allow "namespacing" packetizers.

Differential Revision: https://reviews.llvm.org/D66940

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370508 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Regenerate the test cases added in r370506.

Something weird happened with the v2i64/v2f64 test cases which
don't use broadcast. So they should already be hoisted, but
weren't in the version I submitted in r370506. This fixes that.
Not sure if something changed or I screwed up.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370507 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add test caes for opportunities for machine LICM to unfold broadcasted constant pool loads.

MachineLICM is able to unfold loads to move an invariant load out
a loop, but X86 infrastructure currently lacks the ability to do
this when avx512 embedded broadcasting is used.

This test adds examples for the basic float point operations,
add, mul, and, or, and xor.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370506 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC][NFC] Avoid checking non-relevant .cfi instructions

Summary:
This is brought up in
https://reviews.llvm.org/D64662?id=209923#inline-599490

CFI information are non-relevant to quite some testcases,
we should get rid of checking them when its unecessary.

This patch avoid generating cfi info in testcases that are not
testing prolog/epilog or exception handling.

Reviewers: kbarton, hfinkel, nemanjai, #powerpc

Reviewed By: hfinkel

Subscribers: MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67016

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370505 91177308-0d34-0410-b5e6-96231b3b80d8

Fix compilation warnings. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370504 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r370500

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370501 91177308-0d34-0410-b5e6-96231b3b80d8

[MachinePipeliner] Separate schedule emission, NFC

This is the first stage in refactoring the pipeliner and making it more
accessible for backends to override and control. This separates the logic and
state required to *emit* a scheudule from the logic that *computes* and
validates a schedule.

This will enable (a) new schedule emitters and (b) new modulo scheduling
implementations to coexist.

NFC.

Differential Revision: https://reviews.llvm.org/D67006

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370500 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-ifs][IFS] llvm Interface Stubs merging + object file generation tool.

This tool merges interface stub files to produce a merged interface stub file
or a stub library. Currently it for stub library generation it can produce an
ELF .so stub file, or a TBD file (experimental). It will be used by the clang
-emit-interface-stubs compilation pipeline to merge and assemble the per-CU
stub files into a stub library.

The new IFS format is as follows:

--- !experimental-ifs-v1
IfsVersion:      1.0
Triple:          <llvm triple>
ObjectFileFormat: <ELF | TBD>
Symbols:
  _ZSymbolName: { Type: <type>, etc... }
...

Differential Revision: https://reviews.llvm.org/D66405

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370499 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] ReduceLoadWidth - remove duplicate SDLoc. NFCI.

SDLoc(N0) and SDLoc(cast<LoadSDNode>(N0)) should be equivalent.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370498 91177308-0d34-0410-b5e6-96231b3b80d8

[TargetLowering] SimplifyDemandedBits ADD/SUB/MUL - correctly inherit SDNodeFlags from the original node.

Just disable NSW/NUW flags. This matches what we're already doing for the other situations for these nodes, it was just missed for the demanded constant case.

Noticed by inspection - confirmed in offline discussion with @spatel. I've checked we have test coverage in the x86 extract-bits.ll and extract-lowbits.ll tests

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370497 91177308-0d34-0410-b5e6-96231b3b80d8

GlobalISel: Fix missing pass dependency

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370496 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Pass v32i16/v64i8 in zmm registers on KNL target.

gcc and icc pass these types in zmm registers in zmm registers.

This patch implements a quick hack to override the register
type before calling convention handling to one that is legal.
Longer term we might want to do something similar to 256-bit
integer registers on AVX1 where we just split all the operations.

Fixes PR42957

Differential Revision: https://reviews.llvm.org/D66708

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370495 91177308-0d34-0410-b5e6-96231b3b80d8

[ValueTypes] Add v16f16 and v32f16 to EVT::getEVTString and Tablegen's getEnumName

Missed these when I hadded the enum entries

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370494 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r370490

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370492 91177308-0d34-0410-b5e6-96231b3b80d8

MemTag: unchecked load/store optimization.

Summary:
MTE allows memory access to bypass tag check iff the address argument
is [SP, #imm]. This change takes advantage of this to demote uses of
tagged addresses to regular FrameIndex operands, reducing register
pressure in large functions.

MO_TAGGED target flag is used to signal that the FrameIndex operand
refers to memory that might be tagged, and needs to be handled with
care. Such operand must be lowered to [SP, #imm] directly, without a
scratch register.

The transformation pass attempts to predict when the offset will be
out of range and disable the optimization.
AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case
this prediction has been wrong, but it is quite inefficient and should
be avoided.

Reviewers: pcc, vitalybuka, ostannard

Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66457

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370490 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] visitVSELECT - remove equivalent getValueType() call. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370489 91177308-0d34-0410-b5e6-96231b3b80d8

[INSTRUCTIONS] Add support of const for getLoadStorePointerOperand() and
getLoadStorePointerOperand().
Reviewer: hsaito, sebpop, reames, hfinkel, mkuper, bogner, haicheng,
arsenm, lattner, chandlerc, grosser, rengolin
Reviewed By: reames
Subscribers: wdng, llvm-commits, bmahjour
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D66595

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370486 91177308-0d34-0410-b5e6-96231b3b80d8

[Attributor] Fix: do not pretend to preserve the CFG

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370485 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Merge X86InstrInfo::loadRegFromAddr/storeRegToAddr into their only call site.

I'm looking at unfolding broadcast loads on AVX512 which will
require refactoring this code to select broadcast opcodes instead
of regular load/stores in some cases. Merging them to avoid
further complicating their interfaces.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370484 91177308-0d34-0410-b5e6-96231b3b80d8

[Attributor] Use existing function information for the call site

Summary:
Instead of recomputing information for call sites we now use the
function information directly. This is always valid and once we have
call site specific information we can improve here.

This patch also bootstraps attributes that are created on-demand through
an initial update call. Information that is known will then directly be
available in the new attribute without causing an iteration delay.

The tests show how this improves the iteration count.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66781

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370480 91177308-0d34-0410-b5e6-96231b3b80d8

[Attributor] Manifest load/store alignment generally

Summary:
Any pointer could have load/store users not only floating ones so we
move the manifest logic for alignment into the AAAlignImpl class.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66922

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370479 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] visitVSELECT - remove duplicate getOperand calls. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370478 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine][AMDGPU] Simplify tbuffer loads

Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66926

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370475 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-nm] Small fix to Exected<StringRef>

Differential Revision: https://reviews.llvm.org/D66976

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370474 91177308-0d34-0410-b5e6-96231b3b80d8

[yaml2obj][obj2yaml] - Use a single "Other" field instead of "Other", "Visibility" and "StOther".

Currenly we can encode the 'st_other' field of symbol using 3 fields.
'Visibility' is used to encode STV_* values.
'Other' is used to encode everything except the visibility, but it can't handle arbitrary values.
'StOther' is used to encode arbitrary values when 'Visibility'/'Other' are not helpfull enough.

'st_other' field is used to encode symbol visibility and platform-dependent
flags and values. Problem to encode it is that it consists of Visibility part (STV_* values)
which are enumeration values and the Other part, which is different and inconsistent.

For MIPS the Other part contains flags for all STO_MIPS_* values except STO_MIPS_MIPS16.
(Like comment in ELFDumper says: "Someones in their infinite wisdom decided to make
STO_MIPS_MIPS16 flag overlapped with other ST_MIPS_xxx flags."...)

And for PPC64 the Other part might actually encode any value.

This patch implements custom logic for handling the st_other and removes
'Visibility' and 'StOther' fields.

Here is an example of a new YAML style this patch allows:

- Name:  foo
  Other: [ 0x4 ]
- Name:  bar
  Other: [ STV_PROTECTED, 4 ]
- Name:  zed
  Other: [ STV_PROTECTED, STO_MIPS_OPTIONAL, 0xf8 ]

Differential revision: https://reviews.llvm.org/D66886

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370472 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] visitVSELECT - use getShiftAmountTy for shift amounts.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370471 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] visitMULHS - use getScalarValueSizeInBits() to make safe for vector types.

This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bitwidth, not the total.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370468 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Merge common checkings under the same check prefix. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370467 91177308-0d34-0410-b5e6-96231b3b80d8

[RISCV] Fix a couple of tests' CHECKs

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370466 91177308-0d34-0410-b5e6-96231b3b80d8

Remove an extra ";", NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370465 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add tests for rotate matching. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370464 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeGen] Introduce MachineBasicBlock::replacePhiUsesWith helper and use it. NFC

Summary:
Found a couple of places in the code where all the PHI nodes
of a MBB is updated, replacing references to one MBB by
reference to another MBB instead.

This patch simply refactors the code to use a common helper
(MachineBasicBlock::replacePhiUsesWith) for such PHI node
updates.

Reviewers: t.p.northover, arsenm, uabelho

Subscribers: wdng, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66750

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370463 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node is all zeros

Return a proper zero vector, just in case some elements are undef.

Noticed by inspection after dealing with a similar issue in PR43159.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370460 91177308-0d34-0410-b5e6-96231b3b80d8

Fix Wdocumentation warning. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370459 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] Allow the visibility of symbols created by --binary and
--add-symbol to be specified with --new-symbol-visibility

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370458 91177308-0d34-0410-b5e6-96231b3b80d8

[Attributor] Implement AANoAliasCallSiteArgument initialization

Summary: This patch adds an appropriate `initialize` method for `AANoAliasCallSiteArgument`.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66927

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370456 91177308-0d34-0410-b5e6-96231b3b80d8

[LoopIdiomRecognize] BCmp loop idiom recognition

Summary:
@mclow.lists brought up this issue up in IRC.
It is a reasonably common problem to compare some two values for equality.
Those may be just some integers, strings or arrays of integers.

In C, there is `memcmp()`, `bcmp()` functions.
In C++, there exists `std::equal()` algorithm.
One can also write that function manually.

libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for
various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ

libc++ does not do anything like that, it simply relies on simple C++'s
`operator==()`. https://godbolt.org/z/er0Zwf (GOOD!)

So likely, there exists a certain performance opportunities.
Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that
is using `memcmp()` (in this case, compiled with modified compiler). {F8768213}

```
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

#include "benchmark/benchmark.h"

template <class T>
bool equal(T* a, T* a_end, T* b) noexcept {
  for (; a != a_end; ++a, ++b) {
    if (*a != *b) return false;
  }
  return true;
}

template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                       std::numeric_limits<T>::max());
  std::vector<T> v;
  v.reserve(count);
  std::generate_n(std::back_inserter(v), count,
                  [&dis, &gen]() { return dis(gen); });
  assert(v.size() == count);
  return v;
}

struct Identical {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto Tmp = getVectorOfRandomNumbers<T>(count);
    return std::make_pair(Tmp, std::move(Tmp));
  }
};

struct InequalHalfway {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto V0 = getVectorOfRandomNumbers<T>(count);
    auto V1 = V0;
    V1[V1.size() / size_t(2)]++;  // just change the value.
    return std::make_pair(std::move(V0), std::move(V1));
  }
};

template <class T, class Gen>
void BM_bcmp(benchmark::State& state) {
  const size_t Length = state.range(0);

  const std::pair<std::vector<T>, std::vector<T>> Data =
      Gen::template Gen<T>(Length);
  const std::vector<T>& a = Data.first;
  const std::vector<T>& b = Data.second;
  assert(a.size() == Length && b.size() == a.size());

  benchmark::ClobberMemory();
  benchmark::DoNotOptimize(a);
  benchmark::DoNotOptimize(a.data());
  benchmark::DoNotOptimize(b);
  benchmark::DoNotOptimize(b.data());

  for (auto _ : state) {
    const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
    benchmark::DoNotOptimize(is_equal);
  }
  state.SetComplexityN(Length);
  state.counters["eltcnt"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
  state.counters["eltcnt/sec"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
  const size_t BytesRead = 2 * sizeof(T) * Length;
  state.counters["bytes_read/iteration"] =
      benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                         benchmark::Counter::OneK::kIs1024);
  state.counters["bytes_read/sec"] = benchmark::Counter(
      BytesRead, benchmark::Counter::kIsIterationInvariantRate,
      benchmark::Counter::OneK::kIs1024);
}

template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
  const size_t L2SizeBytes = []() {
    for (const benchmark::CPUInfo::CacheInfo& I :
         benchmark::CPUInfo::Get().caches) {
      if (I.level == 2) return I.size;
    }
    return 0;
  }();
  // What is the largest range we can check to always fit within given L2 cache?
  const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                        /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
  b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
    ->Apply(CustomArguments<uint64_t>);

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
    ->Apply(CustomArguments<uint64_t>);
```
{F8768210}
```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
2019-04-25 21:17:11
Running build-old/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.65, 3.90, 4.14
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000           432131 ns       432101 ns         1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
BM_bcmp<uint8_t, Identical>_BigO               0.86 N          0.86 N
BM_bcmp<uint8_t, Identical>_RMS                   8 %             8 %
<...>
BM_bcmp<uint16_t, Identical>/256000          161408 ns       161409 ns         4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
BM_bcmp<uint16_t, Identical>_BigO              0.67 N          0.67 N
BM_bcmp<uint16_t, Identical>_RMS                 25 %            25 %
<...>
BM_bcmp<uint32_t, Identical>/128000           81497 ns        81488 ns         8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
BM_bcmp<uint32_t, Identical>_BigO              0.71 N          0.71 N
BM_bcmp<uint32_t, Identical>_RMS                 42 %            42 %
<...>
BM_bcmp<uint64_t, Identical>/64000            50138 ns        50138 ns        10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
BM_bcmp<uint64_t, Identical>_BigO              0.84 N          0.84 N
BM_bcmp<uint64_t, Identical>_RMS                 27 %            27 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000      192405 ns       192392 ns         3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.38 N          0.38 N
BM_bcmp<uint8_t, InequalHalfway>_RMS              3 %             3 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000     127858 ns       127860 ns         5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint16_t, InequalHalfway>_RMS             0 %             0 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000      49140 ns        49140 ns        14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.40 N          0.40 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            18 %            18 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000       32101 ns        32099 ns        21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.50 N          0.50 N
BM_bcmp<uint64_t, InequalHalfway>_RMS             1 %             1 %
RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
2019-04-25 21:19:29
Running build-new/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.01, 2.85, 3.71
---------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000            18593 ns        18590 ns        37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
BM_bcmp<uint8_t, Identical>_BigO               0.04 N          0.04 N
BM_bcmp<uint8_t, Identical>_RMS                  37 %            37 %
<...>
BM_bcmp<uint16_t, Identical>/256000           18950 ns        18948 ns        37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
BM_bcmp<uint16_t, Identical>_BigO              0.08 N          0.08 N
BM_bcmp<uint16_t, Identical>_RMS                 34 %            34 %
<...>
BM_bcmp<uint32_t, Identical>/128000           18627 ns        18627 ns        37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
BM_bcmp<uint32_t, Identical>_BigO              0.16 N          0.16 N
BM_bcmp<uint32_t, Identical>_RMS                 35 %            35 %
<...>
BM_bcmp<uint64_t, Identical>/64000            18855 ns        18855 ns        37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
BM_bcmp<uint64_t, Identical>_BigO              0.32 N          0.32 N
BM_bcmp<uint64_t, Identical>_RMS                 33 %            33 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000        9570 ns         9569 ns        73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO          0.02 N          0.02 N
BM_bcmp<uint8_t, InequalHalfway>_RMS             29 %            29 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000       9547 ns         9547 ns        74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO         0.04 N          0.04 N
BM_bcmp<uint16_t, InequalHalfway>_RMS            29 %            29 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000       9396 ns         9394 ns        73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO         0.08 N          0.08 N
BM_bcmp<uint32_t, InequalHalfway>_RMS            30 %            30 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000        9499 ns         9498 ns        73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO         0.16 N          0.16 N
BM_bcmp<uint64_t, InequalHalfway>_RMS            28 %            28 %
Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
Benchmark                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000                      -0.9570         -0.9570        432131         18593        432101         18590
<...>
BM_bcmp<uint16_t, Identical>/256000                     -0.8826         -0.8826        161408         18950        161409         18948
<...>
BM_bcmp<uint32_t, Identical>/128000                     -0.7714         -0.7714         81497         18627         81488         18627
<...>
BM_bcmp<uint64_t, Identical>/64000                      -0.6239         -0.6239         50138         18855         50138         18855
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000                 -0.9503         -0.9503        192405          9570        192392          9569
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000                -0.9253         -0.9253        127858          9547        127860          9547
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000                -0.8088         -0.8088         49140          9396         49140          9394
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000                 -0.7041         -0.7041         32101          9499         32099          9498
```

What can we tell from the benchmark?
* Performance of naive equality check somewhat improves with element size,
  maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s
  for uint64_t. I think, that instability implies performance problems.
* Performance of `memcmp()`-aware benchmark always maxes out at around
  bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the
  naive variant!
* eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at
  eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and
  linearly decreases with element size.
  For uint64_t, it's ~4x+ the elements/second.
* The call obvious is more pricey than the loop, with small element count.
  As it can be seen from the full output {F8768210}, the `memcmp()` is almost
  universally worse, independent of the element size (and thus buffer size) when
  element count is less than 8.

So all in all, bcmp idiom does indeed pose untapped performance headroom.
This diff does implement said idiom recognition. I think a reasonable test
coverage is present, but do tell if there is anything obvious missing.

Now, quality. This does succeed to build and pass the test-suite, at least
without any non-bundled elements. {F8768216} {F8768217}
This transform fires 91 times:
```
$ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
Tests: 1149
Metric: loop-idiom.NumBCmp

Program                                         result-new

MultiSourc...Benchmarks/7zip/7zip-benchmark    79.00
MultiSource/Applications/d/make_dparser         3.00
SingleSource/UnitTests/vla                      2.00
MultiSource/Applications/Burg/burg              1.00
MultiSourc.../Applications/JM/lencod/lencod     1.00
MultiSource/Applications/lemon/lemon            1.00
MultiSource/Benchmarks/Bullet/bullet            1.00
MultiSourc...e/Benchmarks/MallocBench/gs/gs     1.00
MultiSourc...gs-C/TimberWolfMC/timberwolfmc     1.00
MultiSourc...Prolangs-C/simulator/simulator     1.00
```
The size changes are:
I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look.
```
$ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
Tests: 1149
Same hash: 907 (filtered out)
Remaining: 242
Metric: size..text

Program                                        result-old result-new diff
test-suite...ingleSource/UnitTests/vla.test   753.00     833.00     10.6%
test-suite...marks/7zip/7zip-benchmark.test   1001697.00 966657.00  -3.5%
test-suite...ngs-C/simulator/simulator.test   32369.00   32321.00   -0.1%
test-suite...plications/d/make_dparser.test   89585.00   89505.00   -0.1%
test-suite...ce/Applications/Burg/burg.test   40817.00   40785.00   -0.1%
test-suite.../Applications/lemon/lemon.test   47281.00   47249.00   -0.1%
test-suite...TimberWolfMC/timberwolfmc.test   250065.00  250113.00   0.0%
test-suite...chmarks/MallocBench/gs/gs.test   149889.00  149873.00  -0.0%
test-suite...ications/JM/lencod/lencod.test   769585.00  769569.00  -0.0%
test-suite.../Benchmarks/Bullet/bullet.test   770049.00  770049.00   0.0%
test-suite...HMARK_ANISTROPIC_DIFFUSION/128    NaN        NaN        nan%
test-suite...HMARK_ANISTROPIC_DIFFUSION/256    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/64    NaN        NaN        nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/32    NaN        NaN        nan%
test-suite...ENCHMARK_BILATERAL_FILTER/64/4    NaN        NaN        nan%
Geomean difference                                                   nan%
         result-old    result-new       diff
count  1.000000e+01  10.00000      10.000000
mean   3.152090e+05  311695.40000  0.006749
std    3.790398e+05  372091.42232  0.036605
min    7.530000e+02  833.00000    -0.034981
25%    4.243300e+04  42401.00000  -0.000866
50%    1.197370e+05  119689.00000 -0.000392
75%    6.397050e+05  639705.00000 -0.000005
max    1.001697e+06  966657.00000  0.106242
```

I don't have timings though.

And now to the code. The basic idea is to completely replace the whole loop.
If we can't fully kill it, don't transform.
I have left one or two comments in the code, so hopefully it can be understood.

Also, there is a few TODO's that i have left for follow-ups:
* widening of `memcmp()`/`bcmp()`
* step smaller than the comparison size
* Metadata propagation
* more than two blocks as long as there is still a single backedge?
* ???

Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet

Reviewed By: courbet

Subscribers: hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61144

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370454 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC] SCEVExpander: add SetCurrentDebugLocation() / getCurrentDebugLocation() wrappers

Summary:
The internal `Builder` is private, which means there is
currently no way to set the debuginfo locations for `SCEVExpander`.
This only adds the wrappers, but does not use them anywhere.

Reviewers: mkazantsev, sanjoy, gberry, jyknight, dneilson

Reviewed By: sanjoy

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61007

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370453 91177308-0d34-0410-b5e6-96231b3b80d8

[LiveDebugValues] Insert entry values after bundles

Summary:
Change LiveDebugValues so that it inserts entry values after the bundle
which contains the clobbering instruction. Previously it would insert
the debug value after the bundle head using insertAfter(), breaking the
bundle.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D66888

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370448 91177308-0d34-0410-b5e6-96231b3b80d8

vim: add `immarg` keyword

The `immarg` attribute was added in r355981.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370443 91177308-0d34-0410-b5e6-96231b3b80d8

gn build: Merge r370441

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370442 91177308-0d34-0410-b5e6-96231b3b80d8

[ADT] Removed VariadicFunction

Summary:
It is not used. It uses macro-based unrolling instead of variadic
templates, so it is not idiomatic anymore, and therefore it is a
questionable API to keep "just in case".

Subscribers: mgorny, dmgreen, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66961

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370441 91177308-0d34-0410-b5e6-96231b3b80d8

[LLD] [COFF] Support merging resource object files

Extend WindowsResourceParser to support using a ResourceSectionRef for
loading resources from an object file.

Only allow merging resource object files in mingw mode; keep the
existing error on multiple resource objects in link mode.

If there only is one resource object file and no .res resources,
don't parse and recreate the .rsrc section, but just link it in without
inspecting it. This allows users to produce any .rsrc section (outside
of what the parser supports), just like before. (I don't have a specific
need for this, but it reduces the risk of this new feature.)

Separate out the .rsrc section chunks in InputFiles.cpp, and only include
them in the list of section chunks to link if we've determined that there
only was one single resource object. (We need to keep other chunks from
those object files, as they can legitimately contain other sections as
well, in addition to .rsrc section chunks.)

Differential Revision: https://reviews.llvm.org/D66824

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370436 91177308-0d34-0410-b5e6-96231b3b80d8

[WindowsResource] Remove use of global variables in WindowsResourceParser

Instead of updating a global variable counter for the next index of
strings and data blobs, pass along a reference to actual data/string
vectors and let the TreeNode insertion methods add their data/strings to
the vectors when a new entry is needed.

Additionally, if the resource tree had duplicates, that were ignored
with -force:multipleres in lld, we no longer store all versions of the
duplicated resource data, now we only keep the one that actually ends
up referenced.

Differential Revision: https://reviews.llvm.org/D66823

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370435 91177308-0d34-0410-b5e6-96231b3b80d8

[WindowsResource] Avoid duplicating the input filenames for each resource. NFC.

Differential Revision: https://reviews.llvm.org/D66821

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370434 91177308-0d34-0410-b5e6-96231b3b80d8

[COFF] Add a ResourceSectionRef method for getting resource contents

This allows llvm-readobj to print the contents of each resource
when printing resources from an object file or executable, like it
already does for plain .res files.

This requires providing the whole COFFObjectFile to ResourceSectionRef.

This supports both object files and executables. For executables,
the DataRVA field is used as is to look up the right section.

For object files, ideally we would need to complete linking of them
and fix up all relocations to know what the DataRVA field would end up
being. In practice, the only thing that makes sense for an RVA field
is an ADDR32NB relocation. Thus, find a relocation pointing at this
field, verify that it has the expected type, locate the symbol it
points at, look up the section the symbol points at, and read from the
right offset in that section.

This works both for GNU windres object files (which use one single
.rsrc section, with all relocations against the base of the .rsrc
section, with the original value of the DataRVA field being the
offset of the data from the beginning of the .rsrc section) and
cvtres object files (with two separate .rsrc$01 and .rsrc$02 sections,
and one symbol per data entry, with the original pre-relocated DataRVA
field being set to zero).

Differential Revision: https://reviews.llvm.org/D66820

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370433 91177308-0d34-0410-b5e6-96231b3b80d8

[MIPS GlobalISel] Lower uitofp

Add custom lowering for G_UITOFP for MIPS32.

Differential Revision: https://reviews.llvm.org/D66930

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370432 91177308-0d34-0410-b5e6-96231b3b80d8

[MIPS GlobalISel] Lower fptoui

Add lower for G_FPTOUI. Algorithm is similar to the SDAG version
in TargetLowering::expandFP_TO_UINT.
Lower G_FPTOUI for MIPS32.

Differential Revision: https://reviews.llvm.org/D66929

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370431 91177308-0d34-0410-b5e6-96231b3b80d8