granicus.if.org Git

[DebugInfo] Apply subprogram attributes on behalf of owner CU

When using full LTO, it is possible for a template function's definition DIE
to be bound to one compilation unit and its declaration to another. We should
add the function declaration attributes on behalf of its owner CU; otherwise
we may end up with a malformed file identifier in the function declaration's
DW_AT_decl_file attribute.

Differential revision: https://reviews.llvm.org/D58538

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354978 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU][MC] Added register size check for VOP3/SDWA/DPP operands

See bug 37943: https://bugs.llvm.org/show_bug.cgi?id=37943

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D58287

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354974 91177308-0d34-0410-b5e6-96231b3b80d8

[DebugInfo] add SectionedAddress to DebugInfo interfaces.

This patch fixes https://bugs.llvm.org/show_bug.cgi?id=40703,
"wrong line number info for obj file compiled with -ffunction-sections".
The problem occurred only with .o files: if an object file contains
several .text sections, line number information was shown incorrectly.
The reason is that DwarfLineTable could not detect which section
corresponds to a specified address (because the address is local to the
section), and as a result it could not select the proper sequence in the
line table. The fix is to pass the SectionIndex along with the address, so
that addresses from different sections can be distinguished. With this
fix, llvm-objdump shows correct line numbers for disassembled code.
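
A minimal sketch of the idea, assuming a simplified line table: the names below (`SectionedAddr`, `Sequence`, `lookupFirstLine`) are hypothetical and are not the LLVM DebugInfo API. In a relocatable object each .text section starts at address 0, so two sequences can cover the same address; carrying the section index with the address removes the ambiguity.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical pair of an address and the index of the section it is
// relative to (in a .o file, every .text section starts at address 0).
struct SectionedAddr {
  uint64_t Address;
  uint64_t SectionIndex;
};

// Hypothetical, simplified line-table sequence covering [LowPC, HighPC)
// inside one section.
struct Sequence {
  uint64_t SectionIndex;
  uint64_t LowPC;
  uint64_t HighPC;
  unsigned FirstLine;
};

// Finds the sequence containing Addr. Matching on the address alone would
// be ambiguous, because sequences from different sections can cover the
// same address range; the section index disambiguates them.
std::optional<unsigned> lookupFirstLine(const std::vector<Sequence> &Seqs,
                                        SectionedAddr Addr) {
  for (const Sequence &S : Seqs)
    if (S.SectionIndex == Addr.SectionIndex && S.LowPC <= Addr.Address &&
        Addr.Address < S.HighPC)
      return S.FirstLine;
  return std::nullopt;
}
```

With two sequences that both start at address 0 (one per section), the same address 0x10 resolves to different lines depending on the section index passed with it.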

Differential review: https://reviews.llvm.org/D58194

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354972 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU][MC][GFX8+] Added syntactic sugar for 'vgpr index' operand of instructions s_set_gpr_idx_on and s_set_gpr_idx_mode

See bug 39331: https://bugs.llvm.org/show_bug.cgi?id=39331

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D58288

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354969 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] - Check for invalidated relocations when removing a section.

This is https://bugs.llvm.org/show_bug.cgi?id=40818

Removing a section that is referenced by a relocation is an error
that we did not report. This patch fixes that.

Differential revision: https://reviews.llvm.org/D58625

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354962 91177308-0d34-0410-b5e6-96231b3b80d8

[X86][AVX] Only combine loads to broadcasts for legal types

Thanks to @echristo for spotting this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354961 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-readobj]Fix error messages for bad archive members and add testing for archive handling

llvm-readobj's error messages were broken for bad archive members. This
patch fixes them, and also adds testing for archive and thin archive
handling within llvm-readobj.

Reviewed by: rupprecht, grimar, higuoxing

Differential Revision: https://reviews.llvm.org/D58681

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354960 91177308-0d34-0410-b5e6-96231b3b80d8

Fix Wenum-compare gcc7 warning. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354958 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-readobj] Print DF_1_DISPRELPND

The test will be added by D58677.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354955 91177308-0d34-0410-b5e6-96231b3b80d8

[BPF] Don't fail for static variables

Currently, LLVM prints an error like
  Unsupported relocation: try to compile with -O2 or above,
  or check your static variable usage
if the user defines more than one static variable in a single
ELF section (e.g., .bss or .data).

There is an ongoing effort to support static and global
variables in libbpf and the kernel. This patch removes the
assertion so that user programs with static variables won't
fail compilation.

The static variable's in-section offset is written to
the "imm" field of the corresponding to-be-relocated
BPF instruction. Below is an example showing how an
application (e.g., libbpf) can relate variables to relocations.

  -bash-4.4$ cat g1.c
  static volatile long a = 2;
  static volatile int b = 3;
  int test() { return a + b; }
  -bash-4.4$ clang -target bpf -O2 -c g1.c
  -bash-4.4$ llvm-readelf -r g1.o

  Relocation section '.rel.text' at offset 0x158 contains 2 entries:
      Offset             Info             Type               Symbol's Value  Symbol's Name
  0000000000000000  0000000400000001 R_BPF_64_64            0000000000000000 .data
  0000000000000018  0000000400000001 R_BPF_64_64            0000000000000000 .data
  -bash-4.4$ llvm-readelf -s g1.o

  Symbol table '.symtab' contains 6 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS g1.c
       2: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    4 a
       3: 0000000000000008     4 OBJECT  LOCAL  DEFAULT    4 b
       4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
       5: 0000000000000000    64 FUNC    GLOBAL DEFAULT    2 test
  -bash-4.4$ llvm-objdump -d g1.o

  g1.o:   file format ELF64-BPF

  Disassembly of section .text:
  0000000000000000 test:
       0:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00         r1 = 0 ll
       2:       79 11 00 00 00 00 00 00         r1 = *(u64 *)(r1 + 0)
       3:       18 02 00 00 08 00 00 00 00 00 00 00 00 00 00 00         r2 = 8 ll
       5:       61 20 00 00 00 00 00 00         r0 = *(u32 *)(r2 + 0)
       6:       0f 10 00 00 00 00 00 00         r0 += r1
       7:       95 00 00 00 00 00 00 00         exit
  -bash-4.4$

  . from symbol table, static variable "a" is in section #4, offset 0.
  . from symbol table, static variable "b" is in section #4, offset 8.
  . the first relocation is against symbol #4:
    4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
    and in-section offset 0 (see llvm-objdump result)
  . the second relocation is against symbol #4:
    4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
    and in-section offset 8 (see llvm-objdump result)
  . therefore, the first relocation is for variable "a", and
    the second relocation is for variable "b".
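
The same matching can be sketched in code. This is a hypothetical illustration, not libbpf's implementation: the `Symbol`/`Relocation` structs and `resolveStaticVar` are invented here and only model the fields used in the walkthrough above (symbol type, section index, value, and the offset stored in the instruction's imm field).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified views of the ELF data printed above.
enum class SymType { NoType, File, Object, Section, Func };

struct Symbol {
  std::string Name;
  uint16_t SectionIndex; // "Ndx" column of llvm-readelf -s
  uint64_t Value;        // in-section offset in a relocatable object
  SymType Type;
};

struct Relocation {
  uint32_t SymbolIndex; // symbol the relocation is against
  int64_t Imm;          // in-section offset taken from the insn's imm field
};

// Maps a relocation to the static variable it refers to, or "" if none.
std::string resolveStaticVar(const std::vector<Symbol> &SymTab,
                             const Relocation &R) {
  const Symbol &Target = SymTab.at(R.SymbolIndex);
  if (Target.Type != SymType::Section)
    return Target.Name; // already against a named symbol
  if (R.Imm < 0)
    return "";
  // Against a section symbol: find the OBJECT symbol in the same section
  // whose value equals the offset stored in the instruction.
  for (const Symbol &S : SymTab)
    if (S.Type == SymType::Object && S.SectionIndex == Target.SectionIndex &&
        S.Value == static_cast<uint64_t>(R.Imm))
      return S.Name;
  return "";
}
```

For the g1.o tables above, a relocation against symbol #4 with imm 0 resolves to "a", and one with imm 8 resolves to "b".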

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354954 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[PGO] Context sensitive PGO (part 1)"

This reverts commit r354930, it was causing UBSan failures.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354953 91177308-0d34-0410-b5e6-96231b3b80d8

Support: enable backtraces on Windows

Some platforms, e.g. Windows, support backtraces but don't have
BACKTRACE. Checking for BACKTRACE prevents Windows from having
backtraces.

Patch by Jason Mittertreiner!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354951 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Fix ScopeTops info in CFGStackify for EH pads

Summary:
When creating `ScopeTops` info for `try` ~ `catch` ~ `end_try`, we
should create not only the `end_try` -> `try` mapping but also the
`catch` -> `try` mapping. If this is not created, `block` and `end_block`
markers added later may span across an existing `catch`, resulting in
incorrect code like:
```
try
  block     --|  (X)
catch         |
  end_block --|
end_try
```

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58605

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354945 91177308-0d34-0410-b5e6-96231b3b80d8

[DWARFFormValue] Cleanup DWARFFormValue interface. (NFC)

DWARFFormValues can be created from a data extractor or by passing its
value directly. Until now this was done by member functions that
modified an existing object's internal state. This patch replaces a
subset of these methods with static methods that return a new
DWARFFormValue.
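
A minimal sketch of the interface style being adopted, using a hypothetical, heavily simplified `FormValue` class (not LLVM's actual `DWARFFormValue`): a static factory returns a fully initialized value instead of a setter mutating an existing object. The factory name follows the pattern described above but is an assumption, not a quote of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified form value (not LLVM's DWARFFormValue).
class FormValue {
public:
  // Static factory: builds a complete value in one step, instead of
  // default-constructing an object and then mutating its state.
  static FormValue createFromUValue(uint16_t F, uint64_t V) {
    return FormValue(F, V);
  }

  uint16_t getForm() const { return Form; }
  uint64_t getRawUValue() const { return Value; }

private:
  FormValue(uint16_t F, uint64_t V) : Form(F), Value(V) {}

  uint16_t Form;  // DW_FORM_* code
  uint64_t Value; // raw unsigned payload
};
```

Making the constructor private forces callers through the factory, so a half-initialized value can never escape.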

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354941 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Remove unnecessary instructions after TRY marker placement

Summary:
This removes unnecessary instructions after TRY marker placement. There
are two cases:
- `end`/`end_block` can be removed if they overlap with `try`/`end_try`
and they have the same return types.
- `br` right before `catch` that branches to after `end_try` can be
deleted.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58591

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354939 91177308-0d34-0410-b5e6-96231b3b80d8

[SystemZ] Pass regalloc hints to help Load-and-Test transformations.

Since there is no "Load-and-Test-High" instruction, the 32 bit load of a
register to be compared with 0 can only be implemented with LT if the virtual
GRX32 register ends up in a low part (GR32 register).

This patch detects these cases and passes the GR32 registers (low parts) as
(soft) hints in getRegAllocationHints().

Review: Ulrich Weigand.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354935 91177308-0d34-0410-b5e6-96231b3b80d8

vim: `swiftself` is an attribute

Highlight the `swiftself` attribute on parameters.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354934 91177308-0d34-0410-b5e6-96231b3b80d8

[HotColdSplit] Disable splitting for sanitized functions

Splitting can make sanitizer errors harder to understand, as the
trapping instruction may not be in the function where the bug was
detected.

rdar://48142697

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354931 91177308-0d34-0410-b5e6-96231b3b80d8

[PGO] Context sensitive PGO (part 1)

Current PGO profile counts are not context sensitive. The branch probabilities
for inlined functions are kept the same for all call sites, and they might
be very different from the actual branch probabilities. These suboptimal
profiles can greatly affect some downstream optimizations, in particular
the machine basic block placement optimization.

In this patch, we propose a post-inline PGO instrumentation/use pass,
which we call Context Sensitive PGO (CSPGO). Users who want the best
possible performance can perform a second round of PGO instrumentation/use
on top of the regular PGO. They will have two sets of profile counts. The
first-pass profile will be mainly for inlining, indirect-call promotion, and
CGSCC simplification pass optimizations. The second-pass profile is for
post-inline optimizations and code-gen optimizations.

A typical usage:
// Regular PGO instrumentation and generate pass1 profile.
> clang -O2 -fprofile-generate source.c -o gen
> ./gen
> llvm-profdata merge default.*profraw -o pass1.profdata
// CSPGO instrumentation.
> clang -O2 -fprofile-use=pass1.profdata -fcs-profile-generate source.c -o gen2
> ./gen2
// Merge the two sets of profiles.
> llvm-profdata merge default.*profraw pass1.profdata -o profile.profdata
// Use the combined profile. The pass manager will invoke two PGO use passes.
> clang -O2 -fprofile-use=profile.profdata -o use

This change touches many components in the compiler. The reviewed patch
(D54175) will be committed in phases.

Differential Revision: https://reviews.llvm.org/D54175

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354930 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Fixed hang during DAG combine

SITargetLowering::reassociateScalarOps() does not touch constants
so that DAGCombiner::ReassociateOps() does not revert the combine.
However, a global address is not a ConstantSDNode.

Switched to the method used by DAGCombiner::ReassociateOps() itself
to detect constants.

Differential Revision: https://reviews.llvm.org/D58695

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354926 91177308-0d34-0410-b5e6-96231b3b80d8

Fix a small comment typo.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354923 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Fix bug in vectorcall calling convention

The original implementation could not correctly handle __m256 and __m512
types passed by reference through the stack. This patch fixes it.

Patch by Wei Xiao!

Differential Revision: https://reviews.llvm.org/D57643

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354921 91177308-0d34-0410-b5e6-96231b3b80d8

[MemorySSA & SimpleLoopUnswitch] Update MemorySSA in ReplaceUsesOfWith.

SimpleLoopUnswitch must update MemorySSA when removing instructions.
Resolves PR39197.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354919 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Use X86_CPU_SUBTYPE_COMPAT for 'cascadelake' cpu.

This CPU is supported by at least libgcc trunk now so we should make it available to __builtin_cpu_is.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354913 91177308-0d34-0410-b5e6-96231b3b80d8

[lit] Allow setting parallelism groups to None

Check that we do not crash if a parallelism group is explicitly set to
None. Permits usage of the following pattern.

[lit.common.cfg]
  lit_config.parallelism_groups['my_group'] = None
  if <condition>:
    lit_config.parallelism_groups['my_group'] = 3

[project/lit.cfg]
  config.parallelism_group = 'my_group'

Reviewers: rnk

Differential Revision: https://reviews.llvm.org/D58305

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354912 91177308-0d34-0410-b5e6-96231b3b80d8

Update docs of memcpy/move/set wrt. align and len

Fix https://bugs.llvm.org/show_bug.cgi?id=38583: Describe
how memcpy/memmove/memset behave when len=0. Also fix
some fallout from when the alignment parameter was
replaced by an attribute.

This closes PR38583.

Patch by RalfJung (Ralf)

Differential Revision: https://reviews.llvm.org/D57600

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354911 91177308-0d34-0410-b5e6-96231b3b80d8

[TableGen] Make OpcodeMappings sort comparator deterministic NFCI

The previous sort comparator was not deterministic, i.e. in some
situations it would be possible for lhs < rhs && rhs < lhs. This was
discovered by an STL assertion in a Windows debug build of llvm-tblgen.
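
A hypothetical illustration of this class of bug (not the actual TableGen code; `Mapping`, `brokenLess` and `fixedLess` are invented names): a comparator that ORs per-field comparisons is not a strict weak ordering, while a lexicographic comparison via `std::tie` is.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical record standing in for an opcode mapping entry.
struct Mapping {
  unsigned Opcode;
  unsigned Group;
};

// BROKEN: not a strict weak ordering. For A = {1, 2} and B = {2, 1},
// both brokenLess(A, B) and brokenLess(B, A) are true.
bool brokenLess(const Mapping &L, const Mapping &R) {
  return L.Opcode < R.Opcode || L.Group < R.Group;
}

// FIXED: compare Opcode first and only fall back to Group on a tie.
bool fixedLess(const Mapping &L, const Mapping &R) {
  return std::tie(L.Opcode, L.Group) < std::tie(R.Opcode, R.Group);
}
```

Passing a comparator like `brokenLess` to `std::sort` is undefined behavior; some debug STL builds check for it, which is how the original problem surfaced.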

Differential Revision: https://reviews.llvm.org/D58687

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354910 91177308-0d34-0410-b5e6-96231b3b80d8

[InstSimplify] remove zero-shift-guard fold for general funnel shift

As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130491.html

We can't remove the compare+select in the general case because
we are treating funnel shift like a standard instruction (as
opposed to a special instruction like select/phi).

That means that if one of the operands of the funnel shift is
poison, the result is poison regardless of whether we know that
the operand is actually unused based on the instruction's
particular semantics.

The motivating case for this transform is the more specific
rotate op (rather than funnel shift), and we are preserving the
fold for that case because there is no chance of introducing
extra poison when there is no anonymous extra operand to the
funnel shift.
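
At the plain-integer level the zero-shift guard really is redundant: `fshl(X, Y, 0)` returns X and ignores Y. The sketch below (hypothetical helper names, emulating `llvm.fshl.i32` with the shift amount taken modulo 32) checks that. What C++ integers cannot model is poison: if Y is poison, the guarded select still yields X for a zero shift, while the bare funnel shift is poison. A rotate passes the same value as both inputs, so there is no extra operand that could introduce poison.

```cpp
#include <cassert>
#include <cstdint>

// Emulates llvm.fshl.i32 on plain integers: concatenate X:Y, shift left
// by (Z mod 32), and keep the high 32 bits.
uint32_t fshl32(uint32_t X, uint32_t Y, uint32_t Z) {
  uint32_t S = Z % 32;
  if (S == 0)
    return X; // a 32-bit shift by 32 would be undefined in C++
  return (X << S) | (Y >> (32 - S));
}

// The guarded form the removed fold simplified: Z == 0 ? X : fshl(X, Y, Z).
uint32_t guardedFshl32(uint32_t X, uint32_t Y, uint32_t Z) {
  return Z == 0 ? X : fshl32(X, Y, Z);
}

// Rotate is the special case where both inputs are the same value.
uint32_t rotl32(uint32_t X, uint32_t Z) { return fshl32(X, X, Z); }
```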

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354905 91177308-0d34-0410-b5e6-96231b3b80d8

[MIPS GlobalISel] Select G_UADDO

Lower G_UADDO.
Legalize G_UADDO for MIPS32

Differential Revision: https://reviews.llvm.org/D58671

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354900 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] AMD znver2 enablement

This patch enables the following

1) AMD family 17h "znver2" tune flag (-march, -mcpu).
2) ISAs that are enabled for "znver2" architecture.
3) For the time being, it uses the znver1 scheduler model.
4) Tests are updated.
5) Scheduler descriptions are yet to be put in place.

Reviewers: craig.topper

Differential Revision: https://reviews.llvm.org/D58343

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354897 91177308-0d34-0410-b5e6-96231b3b80d8

[SystemZ] Wait with selection of legal vector/FP constants until Select().

This patch aims to make sure that any such constant that can be generated
with a vector instruction (for example VGBM) is recognized as such during
legalization and kept as a target independent node through post-legalize
DAGCombining.

Two new functions named isVectorConstantLegal() and loadVectorConstant()
replace old ways of handling vector/FP constants.

A new struct named SystemZVectorConstantInfo is used to cache the results of
isVectorConstantLegal() and pass them on to loadVectorConstant().

Support for fp128 constants in the presence of FeatureVectorEnhancements1
(z14) has been added.

Review: Ulrich Weigand
https://reviews.llvm.org/D58270

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354896 91177308-0d34-0410-b5e6-96231b3b80d8

[InstSimplify] add tests for rotate; NFC

Rotate is a special-case of funnel shift that has different
poison constraints than the general case. That's not visible
yet in the existing tests, but it needs to be corrected.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354894 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] remove duplicate (but not updated) tests; NFC

Not sure how it happened, but rL354886 was a duplicate of rL354881
that was not updated with rL354887.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354889 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] canonicalize more unsigned saturated add with 'not'

Yet another pattern variation suggested by:
https://bugs.llvm.org/show_bug.cgi?id=14613

There are 8 more potential commuted patterns here on top of the
8 that were already handled (rL354221, rL354276, rL354393).
We have the obvious commute of the 'add' + commute of the cmp
predicate/operands (ugt/ult) + commute of the select operands:

Name: base
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ult i32 %x, %y
%r = select i1 %c, i32 -1, i32 %a
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

Name: ugt
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ugt i32 %y, %x
%r = select i1 %c, i32 -1, i32 %a
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

Name: commute select
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ult i32 %y, %x
%r = select i1 %c, i32 %a, i32 -1
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

Name: ugt + commute select
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ugt i32 %x, %y
%r = select i1 %c, i32 %a, i32 -1
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/den
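
The Alive proofs can also be spot-checked exhaustively at a smaller bit width. This sketch (an illustration, not part of the patch; all function names are hypothetical) scales the four patterns from i32 down to i8 and compares each against the canonical form and against an unsigned saturating add of `~x` and `y`, which is what the canonical form computes.

```cpp
#include <cassert>
#include <cstdint>

// %a = add (xor %x, -1), %y, computed modulo 256.
static uint8_t addNot(uint8_t X, uint8_t Y) {
  return static_cast<uint8_t>(static_cast<uint8_t>(~X) + Y);
}

// Canonical form: %c2 = icmp ult %a, %y; select %c2, -1, %a
uint8_t canonical(uint8_t X, uint8_t Y) {
  uint8_t A = addNot(X, Y);
  return A < Y ? 0xFF : A;
}

// The four source patterns from the Alive transforms above.
uint8_t patternBase(uint8_t X, uint8_t Y) { return X < Y ? 0xFF : addNot(X, Y); }
uint8_t patternUgt(uint8_t X, uint8_t Y) { return Y > X ? 0xFF : addNot(X, Y); }
uint8_t patternCommuteSelect(uint8_t X, uint8_t Y) {
  return Y < X ? addNot(X, Y) : 0xFF;
}
uint8_t patternUgtCommuteSelect(uint8_t X, uint8_t Y) {
  return X > Y ? addNot(X, Y) : 0xFF;
}

// Unsigned saturating add, i.e. what llvm.uadd.sat.i8(A, B) computes.
uint8_t uaddSat(uint8_t A, uint8_t B) {
  unsigned Sum = unsigned(A) + unsigned(B);
  return Sum > 0xFF ? 0xFF : static_cast<uint8_t>(Sum);
}
```

The x == y corner case is why the commuted-select patterns still match: there `~x + y` is already all-ones, so both arms of the select agree.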

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354887 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add more tests for saturated add; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354886 91177308-0d34-0410-b5e6-96231b3b80d8

[DAG] Fix constant store folding to handle non-byte sizes.

Avoid crashes from zero-byte values due to sub-byte store sizes.

Reviewers: uabelho, courbet, rnk

Reviewed By: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58626

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354884 91177308-0d34-0410-b5e6-96231b3b80d8

[mips] Emit `.module softfloat` directive

This change fixes an assertion failure when using the
`soft float` ABI for the mips32r6 target.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354882 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] add more tests for saturated add; NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354881 91177308-0d34-0410-b5e6-96231b3b80d8

[MCA] Always check if scheduler resources are unavailable when reporting dispatch stalls.

Dispatch stall cycles may be associated with multiple dispatch stall events.
Before this patch, each stall cycle was associated with a single stall event.
This patch also improves a couple of code comments, and adds a helper method to
query the Scheduler for dispatch stalls.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354877 91177308-0d34-0410-b5e6-96231b3b80d8

[yaml2obj][obj2yaml] - Add support for the architecture specific dynamic tags.

This allows tools to parse/dump the architecture specific tags
like DT_MIPS_*, DT_PPC64_* and DT_HEXAGON_*

Also fixes a bug in DynamicTags.def which was revealed in this patch.

Differential revision: https://reviews.llvm.org/D58667

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354876 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Add arithmetic zext bswap tests.

As requested on D58017.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354872 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objdump] Add `Version Definitions` dumper

Summary: `llvm-objdump` needs a `Version Definitions` dumper.

Reviewers: grimar, jhenderson

Reviewed By: grimar, jhenderson

Subscribers: rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58615

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354871 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objdump] Implement -Mreg-names-raw/-std options.

The --disassembler-options, or -M, are used to customize
the disassembler and affect its output.

The two implemented options allow selecting register names on ARM:
* With -Mreg-names-raw, the disassembler uses rNN for all registers.
* With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15,
which is the default behavior of llvm-objdump.
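
A hypothetical helper showing the two naming schemes described above (this is not llvm-objdump's code; `armRegName` is an invented name and only core registers r0-r15 are modeled):

```cpp
#include <cassert>
#include <string>

// Returns the printed name of ARM core register rN (0..15).
// Raw mode always prints rNN; standard mode uses the aliases
// sp, lr and pc for r13, r14 and r15.
std::string armRegName(unsigned N, bool Raw) {
  if (!Raw) {
    switch (N) {
    case 13:
      return "sp";
    case 14:
      return "lr";
    case 15:
      return "pc";
    default:
      break;
    }
  }
  return "r" + std::to_string(N);
}
```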

Differential Revision: https://reviews.llvm.org/D57680

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354870 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Add 'free' zext bswap tests.

As requested on D58017.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354869 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Add Cortex-M35P

- Add LLVM backend support for Cortex-M35P
- Documentation can be found at
https://developer.arm.com/products/processors/cortex-m/cortex-m35p

Differential Revision: https://reviews.llvm.org/D57763

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354868 91177308-0d34-0410-b5e6-96231b3b80d8

[LegalizeDAG] Use APInt::getSplat helper to create bitreverse masks. NFCI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354867 91177308-0d34-0410-b5e6-96231b3b80d8

[LegalizeDAG] Expand SADDO/SSUBO using SADDSAT/SSUBSAT (PR37763)

If SADDSAT/SSUBSAT are legal, then we can expand SADDO/SSUBO by performing an ADD/SUB and a SADDSAT/SSUBSAT and then comparing the results.

I looked at doing this for UADDO/USUBO as well, but as we don't have to do as many range comparisons there, I didn't see any/much benefit.
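
A sketch of the equivalence the expansion relies on, written for i8 in plain C++ rather than as SelectionDAG code (`addWrap`, `addSat` and `saddOverflowViaSat` are hypothetical names): signed addition overflows exactly when the wrapping sum differs from the saturating sum, since saturation only changes the result when the true sum is out of range.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Wrapping i8 add (like ISD::ADD): add in unsigned arithmetic, where
// wraparound is well defined, then reinterpret as signed.
int8_t addWrap(int8_t A, int8_t B) {
  uint8_t Sum = static_cast<uint8_t>(static_cast<uint8_t>(A) +
                                     static_cast<uint8_t>(B));
  return static_cast<int8_t>(Sum); // modular since C++20
}

// Saturating i8 add (like ISD::SADDSAT).
int8_t addSat(int8_t A, int8_t B) {
  int Sum = int(A) + int(B);
  if (Sum > std::numeric_limits<int8_t>::max())
    return std::numeric_limits<int8_t>::max();
  if (Sum < std::numeric_limits<int8_t>::min())
    return std::numeric_limits<int8_t>::min();
  return static_cast<int8_t>(Sum);
}

// SADDO expressed via ADD + SADDSAT: the overflow flag is set exactly
// when the two results differ.
bool saddOverflowViaSat(int8_t A, int8_t B, int8_t &Result) {
  Result = addWrap(A, B);
  return Result != addSat(A, B);
}
```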

Differential Revision: https://reviews.llvm.org/D58637

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354866 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Regenerate bswap/bitreverse tests.

Make codegen changes more obvious in D58017

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354863 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-exegesis] Teach llvm-exegesis to handle instructions with multiple tied variables.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58285

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354862 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] Add --set-start, --change-start and --adjust-start

Differential revision: https://reviews.llvm.org/D58173

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354854 91177308-0d34-0410-b5e6-96231b3b80d8

[ThinLTO] Use defined node and edge order when dumping DOT file

Differential revision: https://reviews.llvm.org/D58631

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354850 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "Improve "llvm-nm -f sysv" output for Elf files"

This reverts commit r354833, it was causing ASan test failures on
sanitizer-x86_64-linux-fast.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354849 91177308-0d34-0410-b5e6-96231b3b80d8

[NFC] Add to contributor list.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354847 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Properly align fp128 arguments in outgoing varargs arguments

For outgoing varargs arguments, it's necessary to check the OrigAlign field
of the corresponding OutputArg entry to determine argument alignment, rather
than just computing an alignment from the argument value type. This is
because types like fp128 are split into multiple argument values, with
narrower types that don't reflect the ABI alignment of the full fp128.

This fixes the printf("printfL: %4.*Lf\n", 2, lval); testcase.

Differential Revision: https://reviews.llvm.org/D58656

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354846 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Be super conservative about atomics

As requested during review of D57601 (https://reviews.llvm.org/D57601), be equally conservative for atomic MMOs as for volatile MMOs in all in-tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.

Differential Revision: https://reviews.llvm.org/D58490

Note: D58498 landed in several pieces as individual backends were approved. This is the last chunk.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354845 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Fix a bug deleting instruction in a ranged for loop

Summary: We shouldn't delete elements while iterating over them with a range-based for loop.
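
A minimal sketch of the safe pattern, using `std::list<int>` as a stand-in for an instruction list (`removeEvens` is a hypothetical function): iterate explicitly and continue from the iterator returned by `erase()`, since erasing inside a range-based for loop invalidates the loop's hidden iterator.

```cpp
#include <cassert>
#include <list>

// Removes every even value from L without invalidating the iterator
// that drives the loop.
void removeEvens(std::list<int> &L) {
  for (auto It = L.begin(); It != L.end();) {
    if (*It % 2 == 0)
      It = L.erase(It); // erase() returns the next valid iterator
    else
      ++It;
  }
}
```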

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58519

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354844 91177308-0d34-0410-b5e6-96231b3b80d8

[WebAssembly] Improve readability of EH tests

Summary:
- Indent check lines to easily figure out try-catch-end structure
- Add the original C++ code the tests were generated from
- Add a few more lines to make the structure more readable
- Rename a couple of functions / structures
- Add label and branch annotations to cfg-stackify-eh.ll
- Temporarily delete check lines for `test1` in `cfg-stackify-eh.ll`
because it will be updated in a later CL soon and there's no point in
making it look better here

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58562

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354842 91177308-0d34-0410-b5e6-96231b3b80d8

[CodeView] Emit HasConstructorOrDestructor class option for non-trivial constructors

Reviewers: zturner, rnk, llvm-commits, aleksandr.urakov

Reviewed By: zturner, rnk

Subscribers: jdoerfert, majnemer, asmith

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D44406

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354841 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-cov] Fix llvm-cov on Windows and un-XFAIL test

Summary:
The llvm-cov tool needs to be able to find coverage names in the
executable, so the .lprfn and .lcovmap sections cannot be merged into
.rdata.

Also, the linker merges .lprfn$M into .lprfn, so llvm-cov needs to
handle that when looking up sections. It has to support running on both
relocatable object files and linked PE files.

Lastly, when loading .lprfn from a PE file, llvm-cov needs to skip the
leading zero byte added by the profile runtime.

Reviewers: vsk

Subscribers: hiraditya, #sanitizers, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D58661

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354840 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Fix bug in x86_intrcc with arg copy elision

Summary:
Use a custom calling convention handler for interrupts instead of fixing
up the locations in LowerMemArgument. This way, the offsets are correct
when constructed and we don't need to account for them in as many
places.

Depends on D56883

Replaces D56275

Reviewers: craig.topper, phil-opp

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56944

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354837 91177308-0d34-0410-b5e6-96231b3b80d8

Improve "llvm-nm -f sysv" output for Elf files

Specifically, compute and print the Type and Section columns.

Differential Revision: https://reviews.llvm.org/D58263

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354833 91177308-0d34-0410-b5e6-96231b3b80d8

[AMDGPU] Added target to mir test. NFC.

The test was run without -mcpu, although the tested instructions
are not available on all ASICs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354830 91177308-0d34-0410-b5e6-96231b3b80d8

RegBankSelect: Handle slightly more complex value mappings

Try to use concat_vectors. Also remove unnecessary assert on
pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers
for AMDGPU.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354828 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354825 91177308-0d34-0410-b5e6-96231b3b80d8

Revert "[Support] Make raw_string_ostream unbuffered"

Shame on me, did not run all the tests, bots are angry.

This reverts commit r354819.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354822 91177308-0d34-0410-b5e6-96231b3b80d8

[LangRef] *.overflow intrinsics now support vectors

We have all the necessary legalization, expansion and unrolling support required for the *.overflow intrinsics with vector types, so update the docs to make that clear.

Note: vectorization is not in place yet (the non-homogeneous return types aren't well supported), so we still must explicitly use the vector intrinsics and not rely on SLP/loop vectorization.

Differential Revision: https://reviews.llvm.org/D58618

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354821 91177308-0d34-0410-b5e6-96231b3b80d8

[Support] Make raw_string_ostream unbuffered

Summary:
In D58580 I noted that `llvm::to_string()` is a memory hog.
It uses `raw_string_ostream`, and since that was buffered,
every `raw_string_ostream` had a cost of `BUFSIZ` bytes
(which is `8192` at least here). So every `llvm::to_string()`
call, even just to print an `int`, cost `8192` bytes.

In D58580, getting rid of that buffering //had// significant
performance and memory consumption improvements for `llvm-xray convert`.

Also, in D58580 @rnk pointed out that `raw_svector_ostream`
is already unbuffered, and `write_unsigned_impl` and friends
do internal buffering. So it should be OK performance-wise to just
make `raw_string_ostream` itself unbuffered.

Here, I don't have any perf measurements.
Another letdown is that I'm leaving a loose end: not deleting the
`flush()` method. I don't expect that cleanup to be anything more
than just fixing every new compiler error, but I'm presently unable
to do that. Will look into that later.

Reviewers: rnk, zturner

Reviewed By: rnk

Subscribers: kristina, jdoerfert, llvm-commits, rnk

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58643

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354819 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU/GlobalISel: Clamp max implicit_def elements

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354818 91177308-0d34-0410-b5e6-96231b3b80d8

RegisterScavenger: Allow fail without spill

AMDGPU wants to use this in some contexts where
the spilling is either impossible, or a worse alternative
to doing something else.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354816 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Remove IntrReadMem from memtime/memrealtime intrinsics

EarlyCSE with MemorySSA was able to use this to merge multiple calls
with no intervening store.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354814 91177308-0d34-0410-b5e6-96231b3b80d8

GlobalISel: Make legalizer/regbankselect clear NoPHIs property

If no phi existed in the original MIR and these introduced one, the
verifier would fail.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354813 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Improve detection of unneeded shift amount masking to also handle the case that the LHS has known zeroes in it

If the LHS has known zeros, the RHS immediate will have had bits removed. So call computeKnownBits to get the known zeroes so we can handle this case.
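A minimal sketch of the check, assuming the shift amount is `(and Amt, Imm)` and that the known-zero bits of `Amt` are already available (in the real code they come from computeKnownBits). x86 shifts only use the low 5 bits of the count for 32-bit operands (6 for 64-bit), so the mask is redundant when each of those bits is either kept by `Imm` or already known to be zero. The helper name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper, not the X86 backend's code.
// ShAmtBits: number of count bits the hardware uses (5 or 6).
// KnownZero: bits of Amt proven to be zero.
bool isShiftMaskUnneeded(uint64_t Imm, uint64_t KnownZero, unsigned ShAmtBits) {
  const uint64_t LowBits = (uint64_t(1) << ShAmtBits) - 1;
  // A bit cleared from Imm is harmless if Amt's bit is already known zero.
  return ((Imm | KnownZero) & LowBits) == LowBits;
}
```

For example, with `Imm = 15` and bit 4 of `Amt` known zero, the mask still counts as unneeded: the immediate only lost bit 4 because that bit is zero anyway.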

Differential Revision: https://reviews.llvm.org/D58475

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354811 91177308-0d34-0410-b5e6-96231b3b80d8

Fix a sign compare warning breaking the -Werror build.

The warning was introduced at r354793.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354810 91177308-0d34-0410-b5e6-96231b3b80d8

AMDGPU: Correct definitions for bitset instructions

These really read and write the result register, so these need a tied
input.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354809 91177308-0d34-0410-b5e6-96231b3b80d8

[Mips] Fix missing masking in fast-isel of br (PR40325)

Fixes https://bugs.llvm.org/show_bug.cgi?id=40325 by zero extending
(and x, 1) the condition before branching on it.

To avoid regressing trivial cases, I'm combining emission of cmp+br
sequences for the single-use + same block case (similar to what we
do in x86). icmpbr1.ll still regresses due to the cross-bb usage
of the condition.
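Since the fix zero-extends the condition with `(and x, 1)`, the register holding an `i1` evidently cannot be assumed to have zero upper bits. A small illustration in plain C++, modelling the register as a `uint32_t` (the function names are hypothetical; this is not Mips fast-isel code):

```cpp
#include <cstdint>

// Only bit 0 of the register carries the i1 condition; the upper bits
// may hold leftovers.
bool branchTakenUnmasked(uint32_t CondReg) {
  return CondReg != 0; // wrong: also looks at the upper bits
}

bool branchTakenMasked(uint32_t CondReg) {
  return (CondReg & 1u) != 0; // zero-extend via (and x, 1) before testing
}
```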

Differential Revision: https://reviews.llvm.org/D58576

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354808 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64][GlobalISel] Refactor selectBuildVector to use MachineIRBuilder. NFC.

This is a preparatory change as I want to use emitScalarToVector() elsewhere,
and in general we want to transition to MIRBuilder instead of using BuildMI
directly.

Differential Revision: https://reviews.llvm.org/D58528

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354807 91177308-0d34-0410-b5e6-96231b3b80d8

[Lanai] Be super conservative about atomics

As requested during review of D57601 <https://reviews.llvm.org/D57601>, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.

Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354800 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAG] Add demanded elts variants to isConstOrConstSplat helpers. NFCI.

These helpers extend the existing isConstOrConstSplat helper checks to support DemandedElts masks as well.

We already had a local version of this in SelectionDAG that computeKnownBits/ComputeNumSignBits made use of, but this adds the functionality directly to the BuildVectorSDNode node and extends isConstOrConstSplat etc. to use that.

This will allow us to reuse the functionality in SimplifyDemandedVectorElts/SimplifyDemandedBits.
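A minimal sketch of a demanded-elements splat check, assuming a simplified build_vector model where each lane is either a known constant or not. The function name is hypothetical, and undef lanes (which the real helpers also handle) are not modelled.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Each lane is a known constant or std::nullopt (not a constant).
// Bit I of DemandedElts says whether lane I is used; assumes <= 64 lanes.
std::optional<int64_t>
getDemandedSplat(const std::vector<std::optional<int64_t>> &Lanes,
                 uint64_t DemandedElts) {
  std::optional<int64_t> Splat;
  for (std::size_t I = 0; I < Lanes.size(); ++I) {
    if (!((DemandedElts >> I) & 1))
      continue; // undemanded lanes may hold anything
    if (!Lanes[I] || (Splat && *Splat != *Lanes[I]))
      return std::nullopt;
    Splat = Lanes[I];
  }
  return Splat; // nullopt if no lane was demanded
}
```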

Differential Revision: https://reviews.llvm.org/D58503

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354797 91177308-0d34-0410-b5e6-96231b3b80d8

[DAGCombine] Add undef shuffle elt support to partitionShuffleOfConcats

Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold
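A minimal sketch of the mask partitioning with undef support, assuming both shuffle inputs are concatenations of equally sized pieces numbered consecutively across the two inputs. The function name and the -1-for-undef encoding are assumptions of this sketch, not the DAGCombiner code.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Mask entries index the combined elements of both inputs; -1 means undef.
// Assumes SubSize > 0 and Mask.size() is a multiple of SubSize.
// Returns, per SubSize-sized chunk, the piece it copies (-1 if all undef),
// or nullopt if some chunk is not a whole piece.
std::optional<std::vector<int>>
partitionShuffleMask(const std::vector<int> &Mask, int SubSize) {
  std::vector<int> Pieces;
  for (std::size_t Begin = 0; Begin < Mask.size(); Begin += SubSize) {
    int Start = -1; // first element of the piece this chunk must copy
    for (int I = 0; I < SubSize; ++I) {
      int M = Mask[Begin + I];
      if (M < 0)
        continue; // undef lanes are compatible with any piece
      if (Start < 0) {
        Start = M - I;
        if (Start < 0 || Start % SubSize != 0)
          return std::nullopt;
      } else if (M != Start + I) {
        return std::nullopt;
      }
    }
    Pieces.push_back(Start < 0 ? -1 : Start / SubSize);
  }
  return Pieces;
}
```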

Differential Revision: https://reviews.llvm.org/D58585

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354793 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Add some more missing T1 opcodes for the peephole optimiser

This adds a few extra Thumb1 opcodes to improve the peephole optimiser's
ability to remove redundant cmp instructions. tADC and tSBC require
a small fixup to prevent MOVS being moved past the instruction, which
would give the wrong flags.

Differential Revision: https://reviews.llvm.org/D58281

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354791 91177308-0d34-0410-b5e6-96231b3b80d8

[Vectorizer] Add vectorization support for fixed smul/umul intrinsics

This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix it's the third argument.
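A sketch of the generalized "which operand must stay scalar" query, using a hypothetical trimmed-down enum of intrinsic IDs; LLVM's real helper works on `Intrinsic::ID` and its exact form may differ.

```cpp
// Hypothetical, trimmed-down intrinsic IDs for illustration.
enum class IntrinsicID { Ctlz, Cttz, Powi, SMulFix, UMulFix, Other };

// True if operand OpIdx must stay scalar (same value for every lane)
// when the call is vectorized.
bool hasScalarOperand(IntrinsicID ID, unsigned OpIdx) {
  switch (ID) {
  case IntrinsicID::Ctlz:
  case IntrinsicID::Cttz:
  case IntrinsicID::Powi:
    return OpIdx == 1; // is_zero_undef flag / integer exponent
  case IntrinsicID::SMulFix:
  case IntrinsicID::UMulFix:
    return OpIdx == 2; // the fixed-point scale
  default:
    return false;
  }
}
```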

Differential Revision: https://reviews.llvm.org/D58616

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354790 91177308-0d34-0410-b5e6-96231b3b80d8

[AArch64] Add support for Cortex-A76 and Cortex-A76AE

- Add LLVM backend support for Cortex-A76 and Cortex-A76AE
- Documentation can be found at
https://developer.arm.com/products/processors/cortex-a/cortex-a76

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354788 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objcopy] Add --add-symbol

Differential revision: https://reviews.llvm.org/D58234

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354787 91177308-0d34-0410-b5e6-96231b3b80d8

Fixed typos in tests: s/CHEKC/CHECK/

Reviewers: ilya-biryukov

Subscribers: nemanjai, javed.absar, jsji, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D58611

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354785 91177308-0d34-0410-b5e6-96231b3b80d8

[TTI] Add generic cost model for smul/umul overflow intrinsics

Based off smul/umul fixed costs and the implementation in TargetLowering::expandMULO.
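For reference, a minimal sketch of one way a 32-bit multiply-with-overflow can be expanded when a type twice as wide is available: multiply in 64 bits, then check whether the product fits back into 32 bits. This illustrates the kind of sequence being costed; it is not the code of `TargetLowering::expandMULO`.

```cpp
#include <cstdint>

struct MulResult {
  uint32_t Value; // low 32 bits of the product
  bool Overflow;
};

MulResult umulWithOverflow(uint32_t A, uint32_t B) {
  uint64_t Wide = uint64_t(A) * uint64_t(B);
  return {uint32_t(Wide), (Wide >> 32) != 0}; // high half must be zero
}

MulResult smulWithOverflow(int32_t A, int32_t B) {
  int64_t Wide = int64_t(A) * int64_t(B);
  // Overflow unless the product equals the sign extension of its low half.
  // (Narrowing to int32_t wraps on mainstream compilers and in C++20.)
  return {uint32_t(Wide), Wide != int64_t(int32_t(Wide))};
}
```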

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354784 91177308-0d34-0410-b5e6-96231b3b80d8

[SLPVectorizer][X86] Add fixed smul/umul tests

Baseline tests - fixed mul intrinsics aren't flagged as vectorizable yet

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354783 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-objdump] Add `Version References` dumper

Summary: Add symbol version dumper for [#30241](https://bugs.llvm.org/show_bug.cgi?id=30241)

Reviewers: jhenderson, MaskRay, kristina, emaste, grimar

Reviewed By: jhenderson, grimar

Subscribers: grimar, rupprecht, jakehehrlich, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D54697

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354782 91177308-0d34-0410-b5e6-96231b3b80d8

Fixed typos in tests: s/CEHCK/CHECK/

Reviewers: ilya-biryukov

Subscribers: sanjoy, sdardis, javed.absar, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58608

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354781 91177308-0d34-0410-b5e6-96231b3b80d8

Test commit (remove a blank space)

Change-Id: I69175571d3b1defeb85e96fdd87db5c3ccadcb63

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354775 91177308-0d34-0410-b5e6-96231b3b80d8

[TTI] Add generic cost model for fixed point smul/umul

Based on an IR equivalent of target lowering's generic expansion - target specific costs will typically be lower (IR doesn't have a good mull/mulh equivalent) but we need a baseline.
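A minimal sketch of what 32-bit `llvm.smul.fix`/`llvm.umul.fix` compute, written roughly as the widen, multiply, shift, truncate sequence a generic expansion performs. Overflow handling is left out, and rounding toward negative infinity (from the arithmetic right shift) is a choice of this sketch.

```cpp
#include <cstdint>

// Scale is the number of fractional bits (e.g. 16 for Q16.16).
int32_t smulFix(int32_t A, int32_t B, unsigned Scale) {
  int64_t Wide = int64_t(A) * int64_t(B);
  // Arithmetic shift on mainstream compilers; guaranteed since C++20.
  return int32_t(Wide >> Scale);
}

uint32_t umulFix(uint32_t A, uint32_t B, unsigned Scale) {
  uint64_t Wide = uint64_t(A) * uint64_t(B);
  return uint32_t(Wide >> Scale);
}
```

For example, in Q16.16, `smulFix(0x18000, 0x20000, 16)` multiplies 1.5 by 2.0 and yields `0x30000` (3.0).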

Differential Revision: https://reviews.llvm.org/D57925

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354774 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Merge ISD::ADD/SUB nodes into X86ISD::ADD/SUB equivalents (PR40483)

Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results.

Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB.
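A toy model of the reuse, with hypothetical types and function name: a plain `ISD::ADD`/`ISD::SUB` looks for an existing flag-setting node with the same opcode and operands, and an ADD may also match with its operands swapped.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

enum class Opc { Add, Sub };

// An existing X86ISD::ADD/SUB node: opcode plus operand ids.
struct FlagNode {
  Opc Op;
  int LHS, RHS;
};

std::optional<std::size_t> findReusable(const std::vector<FlagNode> &Nodes,
                                        Opc Op, int LHS, int RHS) {
  for (std::size_t I = 0; I < Nodes.size(); ++I) {
    const FlagNode &N = Nodes[I];
    if (N.Op != Op)
      continue;
    if (N.LHS == LHS && N.RHS == RHS)
      return I;
    // ADD commutes, so add(y, x) can reuse X86ISD::ADD(x, y).
    // SUB would additionally need a NEG, which is not attempted here.
    if (Op == Opc::Add && N.LHS == RHS && N.RHS == LHS)
      return I;
  }
  return std::nullopt;
}
```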

Differential Revision: https://reviews.llvm.org/D58597

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354771 91177308-0d34-0410-b5e6-96231b3b80d8

[yaml2obj] Re-allow dynamic sections to have raw content

Recently, support was added to yaml2obj to allow dynamic sections to
have a list of entries, to make it easier to write tests with dynamic
sections. However, this change also removed the ability to provide
custom contents to the dynamic section, making it hard to test
malformed contents (e.g. because the section is not a valid size to
contain an array of entries). This change reinstates this. An error is
emitted if raw content and dynamic entries are both specified.
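A minimal sketch of the validation rule, assuming a simplified in-memory form of the YAML description; the struct names and the error text are hypothetical, not yaml2obj's.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct DynamicEntry {
  uint64_t Tag;
  uint64_t Value;
};

struct DynamicSectionDesc {
  // Arbitrary bytes, possibly malformed (e.g. not a whole number of entries).
  std::optional<std::vector<uint8_t>> RawContent;
  std::vector<DynamicEntry> Entries;
};

// Returns an error message, or an empty string if the description is usable.
std::string validate(const DynamicSectionDesc &D) {
  if (D.RawContent && !D.Entries.empty())
    return "cannot specify both raw content and dynamic entries";
  return "";
}
```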

Reviewed by: grimar, ruiu

Differential Review: https://reviews.llvm.org/D58543

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354770 91177308-0d34-0410-b5e6-96231b3b80d8

[ARM] Make fullfp16 instructions not conditionalisable.

More or less all the instructions defined in the v8.2a full-fp16
extension are defined as UNPREDICTABLE if you put them in an IT block
(Thumb) or use with any condition other than AL (ARM). LLVM didn't
know that, and was happy to conditionalise them.

In order to force these instructions to count as not predicable, I had
to make a small TableGen change. The code generation back end mostly
decides if an instruction is predicable by looking for something it
can identify as a predicate operand; there's an isPredicable bit flag
that overrides that check in the positive direction, but nothing that
overrides it in the negative direction.

(I considered the alternative approach of actually removing the
predicate operand from those instructions, but thought that it would
be more painful overall for instructions differing only in data type
to have different shapes of operand list. This way, the only code that
has to notice the difference is the if-converter.)

So I've added an isUnpredicable bit alongside isPredicable, and set
that bit on the right subset of FP16 instructions, and also on the
VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
unpredicable for all data types.
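A minimal sketch of how the two flags can combine, using a hypothetical simplified instruction description rather than TableGen's real data structures. Letting isUnpredicable win when both bits are set is a choice of this sketch.

```cpp
struct InstrDesc {
  bool HasPredicateOperand; // something recognizable as a predicate operand
  bool IsPredicable;        // forces "predicable" on
  bool IsUnpredicable;      // forces "predicable" off
};

bool isPredicable(const InstrDesc &D) {
  if (D.IsUnpredicable)
    return false; // override in the negative direction
  return D.IsPredicable || D.HasPredicateOperand;
}
```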

I've included a couple of representative regression tests, both of
which previously caused an fp16 instruction to be conditionalised in
ARM state and (with -arm-no-restrict-it) to be put in an IT block in
Thumb.

Reviewers: SjoerdMeijer, t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57823

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354768 91177308-0d34-0410-b5e6-96231b3b80d8

[llvm-exegesis] Split Epsilon param into two (PR40787)

Summary:
This eps param is used for two distinct things:
* initial point clusterization
* checking clusters against the llvm values

What if one wants to only look at highly different clusters, without changing
the clustering itself? In particular, this helps to weed out noisy measurements
(since the clusterization epsilon is still small, there is a better chance
that noisy measurements from the same opcode will go into different clusters).

Splitting it into two params makes this possible.
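A toy 1-D model of the two-epsilon scheme, much simpler than llvm-exegesis's real clustering and with hypothetical names: one epsilon decides which measurements share a cluster, the other decides how far a cluster may sit from the scheduling model's value before it is reported.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sorted points closer than ClusteringEps to their neighbour share a cluster.
std::vector<std::vector<double>> clusterize(std::vector<double> Pts,
                                            double ClusteringEps) {
  std::sort(Pts.begin(), Pts.end());
  std::vector<std::vector<double>> Clusters;
  for (double P : Pts) {
    if (Clusters.empty() || P - Clusters.back().back() > ClusteringEps)
      Clusters.emplace_back();
    Clusters.back().push_back(P);
  }
  return Clusters;
}

// Report only clusters whose mean is further than InconsistencyEps from
// the predicted value.
std::vector<double>
inconsistentClusterMeans(const std::vector<std::vector<double>> &Clusters,
                         double Predicted, double InconsistencyEps) {
  std::vector<double> Out;
  for (const auto &C : Clusters) {
    double Sum = 0;
    for (double V : C)
      Sum += V;
    double Mean = Sum / C.size();
    if (std::fabs(Mean - Predicted) > InconsistencyEps)
      Out.push_back(Mean);
  }
  return Out;
}
```

Raising the second epsilon hides mildly off clusters without changing how the points were grouped.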

This is nearly free performance-wise:
Old:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

            390.01 msec task-clock                #    0.998 CPUs utilized            ( +-  0.25% )
                12      context-switches          #   31.735 M/sec                    ( +- 27.38% )
                 0      cpu-migrations            #    0.000 K/sec
              4745      page-faults               # 12183.732 M/sec                   ( +-  0.54% )
        1562711900      cycles                    # 4012303.327 GHz                   ( +-  0.24% )  (82.90%)
         185567822      stalled-cycles-frontend   #   11.87% frontend cycles idle     ( +-  0.52% )  (83.30%)
         392106234      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.31% )  (33.79%)
        1839236666      instructions              #    1.18  insn per cycle
                                                  #    0.21  stalled cycles per insn  ( +-  0.15% )  (50.37%)
         407035764      branches                  # 1045074878.710 M/sec              ( +-  0.12% )  (66.80%)
          10896459      branch-misses             #    2.68% of all branches          ( +-  0.17% )  (83.20%)

          0.390629 +- 0.000972 seconds time elapsed  ( +-  0.25% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):

           6803.36 msec task-clock                #    0.999 CPUs utilized            ( +-  0.96% )
               262      context-switches          #   38.546 M/sec                    ( +- 23.06% )
                 0      cpu-migrations            #    0.065 M/sec                    ( +- 76.03% )
             13287      page-faults               # 1953.206 M/sec                    ( +-  0.32% )
       27252537904      cycles                    # 4006024.257 GHz                   ( +-  0.95% )  (83.31%)
        1496314935      stalled-cycles-frontend   #    5.49% frontend cycles idle     ( +-  0.97% )  (83.32%)
       16128404524      stalled-cycles-backend    #   59.18% backend cycles idle      ( +-  0.30% )  (33.37%)
       17611143370      instructions              #    0.65  insn per cycle
                                                  #    0.92  stalled cycles per insn  ( +-  0.05% )  (50.04%)
        3894906599      branches                  # 572537147.437 M/sec               ( +-  0.03% )  (66.69%)
         116314514      branch-misses             #    2.99% of all branches          ( +-  0.20% )  (83.35%)

            6.8118 +- 0.0689 seconds time elapsed  ( +-  1.01%)
```
New:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):

            400.14 msec task-clock                #    0.998 CPUs utilized            ( +-  0.66% )
                12      context-switches          #   29.429 M/sec                    ( +- 25.95% )
                 0      cpu-migrations            #    0.100 M/sec                    ( +-100.00% )
              4714      page-faults               # 11796.496 M/sec                   ( +-  0.55% )
        1603131306      cycles                    # 4011840.105 GHz                   ( +-  0.66% )  (82.85%)
         199538509      stalled-cycles-frontend   #   12.45% frontend cycles idle     ( +-  2.40% )  (83.10%)
         402249109      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.19% )  (34.05%)
        1847783963      instructions              #    1.15  insn per cycle
                                                  #    0.22  stalled cycles per insn  ( +-  0.18% )  (50.64%)
         407162722      branches                  # 1018925730.631 M/sec              ( +-  0.12% )  (67.02%)
          10932779      branch-misses             #    2.69% of all branches          ( +-  0.51% )  (83.28%)

           0.40077 +- 0.00267 seconds time elapsed  ( +-  0.67% )

```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):

           6947.79 msec task-clock                #    1.000 CPUs utilized            ( +-  0.90% )
               217      context-switches          #   31.236 M/sec                    ( +- 36.16% )
                 1      cpu-migrations            #    0.096 M/sec                    ( +- 50.00% )
             13258      page-faults               # 1908.389 M/sec                    ( +-  0.34% )
       27830796523      cycles                    # 4006032.286 GHz                   ( +-  0.89% )  (83.30%)
        1504554006      stalled-cycles-frontend   #    5.41% frontend cycles idle     ( +-  2.10% )  (83.32%)
       16716574843      stalled-cycles-backend    #   60.07% backend cycles idle      ( +-  0.65% )  (33.38%)
       17755545931      instructions              #    0.64  insn per cycle
                                                  #    0.94  stalled cycles per insn  ( +-  0.09% )  (50.04%)
        3897255686      branches                  # 560980426.597 M/sec               ( +-  0.06% )  (66.70%)
         117045395      branch-misses             #    3.00% of all branches          ( +-  0.47% )  (83.34%)

            6.9507 +- 0.0627 seconds time elapsed  ( +-  0.90% )
```

I.e. it's a +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps.
Within noise, I'd say.

Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58476

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354767 91177308-0d34-0410-b5e6-96231b3b80d8

[XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert"

Summary:
This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert

Abstractions are great.
Readable code is great.
JSON support library is a *good* idea.

Unfortunately, however, there is an internal detail that one needs
to be aware of in `llvm::json::Object`: it uses `llvm::DenseMap`.
So for **every** `llvm::json::Object`, even if you only store a single `int`
entry there, you pay the whole price of `llvm::DenseMap`.

Unfortunately, it matters for `llvm-xray`.

I was trying to analyse the `llvm-exegesis` analysis mode performance,
and for that I wanted to view the LLVM X-Ray log visualization in the Chrome
trace viewer. But `llvm-xray convert` was sluggish, and sometimes it
even ended up being killed due to OOM.

`xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis`
(compiled with `-fxray-instruction-threshold=128`) running in
analysis mode over a `-benchmarks-file` with 10099 points (one full
latency measurement set), with a normal runtime of 0.387s.

Timings:
Old: (copied from D58580)
```
$ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT

Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):

          21346.24 msec task-clock                #    1.000 CPUs utilized            ( +-  0.28% )
               314      context-switches          #   14.701 M/sec                    ( +- 59.13% )
                 1      cpu-migrations            #    0.037 M/sec                    ( +-100.00% )
           2181354      page-faults               # 102191.251 M/sec                  ( +-  0.02% )
       85477442102      cycles                    # 4004415.019 GHz                   ( +-  0.28% )  (83.33%)
       14526427066      stalled-cycles-frontend   #   16.99% frontend cycles idle     ( +-  0.70% )  (83.33%)
       32371533721      stalled-cycles-backend    #   37.87% backend cycles idle      ( +-  0.27% )  (33.34%)
       67896890228      instructions              #    0.79  insn per cycle
                                                  #    0.48  stalled cycles per insn  ( +-  0.03% )  (50.00%)
       14592654840      branches                  # 683631198.653 M/sec               ( +-  0.02% )  (66.67%)
         212207534      branch-misses             #    1.45% of all branches          ( +-  0.94% )  (83.34%)

           21.3502 +- 0.0585 seconds time elapsed  ( +-  0.27% )
```
New:
```
$ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT

Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs):

           7178.38 msec task-clock                #    1.000 CPUs utilized            ( +-  0.26% )
               182      context-switches          #   25.402 M/sec                    ( +- 28.84% )
                 0      cpu-migrations            #    0.046 M/sec                    ( +- 70.71% )
             33701      page-faults               # 4694.994 M/sec                    ( +-  0.88% )
       28761053971      cycles                    # 4006833.933 GHz                   ( +-  0.26% )  (83.32%)
        2028297997      stalled-cycles-frontend   #    7.05% frontend cycles idle     ( +-  1.61% )  (83.32%)
       10773154901      stalled-cycles-backend    #   37.46% backend cycles idle      ( +-  0.38% )  (33.36%)
       36199132874      instructions              #    1.26  insn per cycle
                                                  #    0.30  stalled cycles per insn  ( +-  0.03% )  (50.02%)
        6434504227      branches                  # 896420204.421 M/sec               ( +-  0.03% )  (66.68%)
          73355176      branch-misses             #    1.14% of all branches          ( +-  1.46% )  (83.33%)

            7.1807 +- 0.0190 seconds time elapsed  ( +-  0.26% )
```

So using `llvm::json` nearly triples run-time on that test case.
(+3x is times, not percent.)

Memory:
Old:
```
total runtime: 39.88s.
bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s)
calls to allocation functions: 33267816 (834135/s)
temporary memory allocations: 5832298 (146235/s)
peak heap memory consumption: 9.21GB
peak RSS (including heaptrack overhead): 147.98GB
total memory leaked: 1.09MB
```
New:
```
total runtime: 17.42s.
bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s)
calls to allocation functions: 21382982 (1227284/s)
temporary memory allocations: 232858 (13364/s)
peak heap memory consumption: 350.69MB
peak RSS (including heaptrack overhead): 2.55GB
total memory leaked: 79.95KB
```
Diff:
```
total runtime: -22.46s.
bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s)
calls to allocation functions: -11884834 (529155/s)
temporary memory allocations: -5599440 (249307/s)
peak heap memory consumption: -8.86GB
peak RSS (including heaptrack overhead): 0B
total memory leaked: -1.01MB
```
So using `llvm::json` increases *peak* memory consumption on *this* testcase ~+27x,
and total bytes allocated ~+15x. Both of these numbers are times, *not* percent.

And note that memory usage is clearly unbounded with `llvm::json`: it directly depends
on the length of the log, so peak memory consumption keeps growing with it.
This isn't so with the dumb code: there is no accumulating memory consumption, and
peak memory consumption is fixed. Naturally, that means it will handle *much*
larger logs without OOM'ing.
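A minimal sketch of the "dumb" streaming approach, assuming a hypothetical event record and writing a subset of the Chrome `trace_event` JSON fields: each event is formatted straight to the output stream, so nothing accumulates per event. Names are assumed to need no JSON escaping; this is not the actual `llvm-xray convert` code.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream> // std::ostringstream, handy for capturing the output
#include <string>

struct TraceEvent {
  std::string Name; // assumed to need no JSON escaping
  char Phase;       // 'B' (begin) or 'E' (end)
  uint64_t TimestampUs;
  uint32_t Tid;
};

// Writes each event as soon as it arrives; no per-event JSON object or map
// is kept, so memory use does not grow with the length of the log.
class TraceEventWriter {
public:
  explicit TraceEventWriter(std::ostream &OS) : OS(OS) {
    OS << "{\"traceEvents\":[";
  }
  void write(const TraceEvent &E) {
    if (!First)
      OS << ',';
    First = false;
    OS << "{\"name\":\"" << E.Name << "\",\"ph\":\"" << E.Phase
       << "\",\"ts\":" << E.TimestampUs << ",\"tid\":" << E.Tid << '}';
  }
  void finish() { OS << "]}"; }

private:
  std::ostream &OS;
  bool First = true;
};
```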

Readability is good, but the price is simply unacceptable here.
Too bad none of this analysis was done as part of the development/review of D50129 itself.

Reviewers: dberris, kpw, sammccall

Reviewed By: dberris

Subscribers: riccibruno, hans, courbet, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58584

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354764 91177308-0d34-0410-b5e6-96231b3b80d8

[SelectionDAG] Add an OPC_CheckChild2CondCode to SelectionDAGISel to remove a MoveChild and MoveParent pair.

OPC_CheckCondCode is always used as operand 2 of a setcc, and it's always surrounded by a MoveChild2 and a MoveParent. By having a dedicated opcode for this case, we can reduce the number of bytes needed for this pattern from 4 to 2.

This saves ~3000 bytes in the X86 table.
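A minimal sketch of the fusion, using a hypothetical simplified op list rather than the real SelectionDAGISel matcher encoding. It assumes one byte per opcode plus one byte for the condition code, which matches the 4-to-2 byte count above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Kind { MoveChild2, CheckCondCode, MoveParent, CheckChild2CondCode, Other };

struct Op {
  Kind K;
  uint8_t CC = 0; // only meaningful for the two cond-code checks
};

std::size_t encodedBytes(const std::vector<Op> &Ops) {
  std::size_t N = 0;
  for (const Op &O : Ops)
    N += (O.K == Kind::CheckCondCode || O.K == Kind::CheckChild2CondCode) ? 2 : 1;
  return N;
}

// Fuse "MoveChild2, CheckCondCode CC, MoveParent" into "CheckChild2CondCode CC".
std::vector<Op> fuseChild2CondCode(const std::vector<Op> &Ops) {
  std::vector<Op> Out;
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    if (I + 2 < Ops.size() && Ops[I].K == Kind::MoveChild2 &&
        Ops[I + 1].K == Kind::CheckCondCode && Ops[I + 2].K == Kind::MoveParent) {
      Out.push_back({Kind::CheckChild2CondCode, Ops[I + 1].CC});
      I += 2; // skip the two ops folded into the new one
      continue;
    }
    Out.push_back(Ops[I]);
  }
  return Out;
}
```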

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354763 91177308-0d34-0410-b5e6-96231b3b80d8

[PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts

Summary:
Fast selection of LLVM fptoi & fptrunc instructions did not handle VSX
instruction support well.
We now use the VSX float-to-integer convert instruction instead of the non-VSX
one if the operand register class is VSSRC or VSFRC, because i32 and i64
are mapped to VSSRC and VSFRC respectively if the VSX feature is enabled.
For the float trunc instruction, we do similar work to the float-to-integer
case, trying to use a VSX instruction.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D58430

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354762 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Add tests for PR40846; NFC

The icmps are the same as the overflow result of the intrinsic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354760 91177308-0d34-0410-b5e6-96231b3b80d8

[InstCombine] Move with.overflow tests to separate file; NFC

And regenerate checks. I had to rename some variables, because
update_test_checks can't deal with the same variable names used
in lower and upper case. I've also dropped the result type aliases,
as just using the type directly gives a cleaner result.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354759 91177308-0d34-0410-b5e6-96231b3b80d8

[X86] Add PR40483 test cases

Demonstrate failure to merge ISD::ADD(x,y)/X86ISD::ADD(x,y) + ISD::SUB(x,y)/X86ISD::SUB(x,y) equivalent ops

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354758 91177308-0d34-0410-b5e6-96231b3b80d8