Nick Desaulniers [Fri, 24 May 2019 18:58:21 +0000 (18:58 +0000)]
[ARM] additionally check for ARM::INLINEASM_BR w/ ARM::INLINEASM
Summary:
We were observing failures for arm32 allyesconfigs of the Linux kernel
with the asm goto Clang patch, where ldr's were being generated to
offsets too far away to encode in imm12.
It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.
Sanjay Patel [Fri, 24 May 2019 16:46:09 +0000 (16:46 +0000)]
[LoopVectorize] update test to be independent of instcombine; NFC
This is a regression test for vectorization, so remove instcombine
from the RUN line and adjust the comparison predicates to show what
the vectorizer is creating rather than how instcombine cleans it up.
Chris Bieneman [Fri, 24 May 2019 16:21:38 +0000 (16:21 +0000)]
[CMake] Fix issues building runtimes
This resolves two issues:
(1) LIBCXX_HEADER_DIR is a very misleadingly named variable because it shouldn't be set to the header directory, instead it needs to be the root binary dir.
(2) If you build runtimes without libcxx, we can't depend on the libcxx header target, so we should instaed refer to it by the variable name which will be unset if libcxx isn't present.
[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Simon Atanasyan [Fri, 24 May 2019 12:22:53 +0000 (12:22 +0000)]
[llvm-readobj] Implement GNU-style output for dynamic table
GNU readelf tool prints slightly different dynamic table "header" and
surrounds dynamic tag names by brackets. This patch implements the same
formatting for GNU-style output of the `llvm-readobj`.
LLVM
```
DynamicSection [ (13 entries)
Tag Type Name/Value
0x00000006 SYMTAB 0x168
...
]
```
GNU
```
Dynamic section at offset 0x1d0 contains 13 entries:
Tag Type Name/Value
0x00000006 (SYMTAB) 0x168
...
```
Stefan Pintilie [Fri, 24 May 2019 12:05:37 +0000 (12:05 +0000)]
[PowerPC] Remove CRBits Copy Of Unset/set CBit
For the situation, where we generate the following code:
crxor 8, 8, 8
< Some instructions>
.LBB0_1:
< Some instructions>
cror 1, 8, 8
cror (COPY of CRbit) depends on the result of the crxor instruction.
CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use
crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on
any previous instruction.
This patch will optimize it to:
< Some instructions>
.LBB0_1:
< Some instructions>
cror 1, 1, 1
George Rimar [Fri, 24 May 2019 11:12:50 +0000 (11:12 +0000)]
[llvm-readelf] - Allow dumping of the .dynamic section even if there is no PT_DYNAMIC header.
It is now possible after D61937 was landed and was discussed
in it's review comments. It is not consistent with GNU, which
does not output .dynamic section content in this case for
no visible reason.
James Henderson [Fri, 24 May 2019 10:07:24 +0000 (10:07 +0000)]
[llvm-objdump][test] Fix for spurious matches against file paths
r361479 added tests that did --implicit-check-not=main, but a user found
that they failed on his machine, due to it having 'main' in a file path
printed earlier in the output. This test fixes this issue by making the
check pattern more explicit.
Simon Pilgrim [Fri, 24 May 2019 10:03:11 +0000 (10:03 +0000)]
[SelectionDAG] computeKnownBits - support constant pool values from target
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.
computeKnownBits then uses this function to improve codegen, notably vector code after legalization.
A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.
This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.
Summary:
This patch adds support for the polynomial multiplication instructions
PMULLB/PMULLT. The 64-bit source and 128-bit destination element
variants are enabled with crypto extensions (+sve2-aes), similar to the
NEON PMULL2 instruction. All other variants are enabled with +sve2.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Bjorn Pettersson [Fri, 24 May 2019 09:20:20 +0000 (09:20 +0000)]
Use the DataLayout::typeSizeEqualsStoreSize helper. NFC
Just a minor refactoring to use the new helper method
DataLayout::typeSizeEqualsStoreSize(). This is done when
checking if getTypeSizeInBits is equal/non-equal to
getTypeStoreSizeInBits.
Immediate forms of the SQSHL and UQSHL instructions are also added to
the existing SVE bitwise shift by immediate (predicated) group, as well
as three new instructions SRSHR/URSHR/SQSHLU. The new instructions in
this group are encoded similarly and are implemented using the same
TableGen class with a minimal change (1 bit in encoding).
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Neil Henning [Fri, 24 May 2019 08:59:17 +0000 (08:59 +0000)]
StructurizeCFG: Relax uniformity checks.
This change relaxes the checks for hasOnlyUniformBranches such that our
region is uniform if:
1. All conditional branches that are direct children are uniform.
2. And either:
a. All sub-regions are uniform.
b. There is one or less conditional branches among the direct
children.
Cullen Rhodes [Fri, 24 May 2019 08:45:37 +0000 (08:45 +0000)]
[AArch64][SVE2] Asm: fix overlapping bit
Summary:
Bit 20 in sve2_int_arith_pred TableGen class was overlapping. The
encodings are not affected as bit 20 is defined by the opc bits
and this was overwriting the earlier error of setting bit 20 to 0.
Raised by Momchil: https://reviews.llvm.org/D62130
Tim Northover [Fri, 24 May 2019 08:40:13 +0000 (08:40 +0000)]
GlobalISel: support swifterror attribute on AArch64.
swifterror marks an argument as a register pretending to be a pointer, so we
need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the
infrastructure can be reused from the DAG world.
Simon Atanasyan [Fri, 24 May 2019 08:39:40 +0000 (08:39 +0000)]
[mips] Always check that `shift and add` optimization is efficient.
The D45316 introduced the `shouldTransformMulToShiftsAddsSubs` function
to check that breaking down constant multiplications into a series
of shifts, adds, and subs is efficient. Unfortunately, this function
does not check maximum number of steps on all paths of the algorithm.
This patch fixes this bug.
Bjorn Pettersson [Fri, 24 May 2019 08:32:02 +0000 (08:32 +0000)]
[DSE] Bugfix to avoid PartialStoreMerging involving non byte-sized stores
Summary:
The DeadStoreElimination pass now skips doing
PartialStoreMerging when stores overlap according to
OW_PartialEarlierWithFullLater and at least one of
the stores is having a store size that is different
from the size of the type being stored.
This solves problems seen in
https://bugs.llvm.org/show_bug.cgi?id=41949
for which we in the past could end up with
mis-compiles or assertions.
The content and location of the padding bits is not
formally described (or undefined) in the LangRef
at the moment. So the solution is chosen based on
that we cannot assume anything about the padding bits
when having a store that clobbers more memory than
indicated by the type of the value that is stored
(such as storing an i6 using an 8-bit store instruction).
Sjoerd Meijer [Fri, 24 May 2019 08:25:02 +0000 (08:25 +0000)]
[ARM] ARMExpandPseudoInsts: add debug messages
This pass wasn't printing any messages at all, which I find really inconvenient
while debugging/tracing things. It now dumps the before and after of expanded
instructions. It doesn't do this yet for all instructions, but this is a good
start I guess.
QingShan Zhang [Fri, 24 May 2019 05:30:09 +0000 (05:30 +0000)]
[Power9] Add a specific heuristic to schedule the addi before the load
When we are scheduling the load and addi, if all other heuristic didn't take effect,
we will try to schedule the addi before the load, to hide the latency, and avoid the
true dependency added by RA. And this only take effects for Power9.
Yevgeny Rouban [Fri, 24 May 2019 04:34:23 +0000 (04:34 +0000)]
[NFC] SwitchInst: Introduce wrapper for prof branch_weights handling
This patch introduces a wrapper class that re-implements
several mutator methods of SwitchInst to handle changes
of prof branch_weights metadata along with remove/add
switch case methods.
Subsequent patches will use this wrapper to implement
prof branch_weights metadata handling for SwitchInst.
Reid Kleckner [Fri, 24 May 2019 01:27:20 +0000 (01:27 +0000)]
[AArch64] Preserve X8 for thunks ending in variadic musttail calls
Summary:
On Windows, X8 may be used to pass in the address of an aggregate that
is returned indirectly. Therefore, it should be forwarded to variadic
musttail calls and preserved in thunks.
David Blaikie [Fri, 24 May 2019 01:05:52 +0000 (01:05 +0000)]
dwarfdump: Add a bit more DWARF64 support
This test case was incorrect because it mixed DWARF32 and DWARF64 for a
single unit (DWARF32 unit referencing a DWARF64 str_offsets section). So
fix enough of the unit parsing for DWARF64 and make the test valid.
(not sure if anyone needs DWARF64 support though - support in
libDebugInfoDWARF has been added piecemeal and LLVM doesn't produce it
at all)
llvm-objcopy: Change sectionWithinSegment() to use virtual addresses instead of file offsets for SHT_NOBITS sections.
Without this, sectionWithinSegment() will return the wrong answer for bss
sections. This doesn't seem to matter now (for non-broken ELF files), but
it will matter with a change that I'm working on.
Daniel Sanders [Thu, 23 May 2019 23:02:56 +0000 (23:02 +0000)]
Break false dependencies on target libraries
Summary:
For the most part this consists of replacing ${LLVM_TARGETS_TO_BUILD} with
some combination of AllTargets* so that they depend on specific components
of a target backend rather than all of it. The overall effect of this is
that, for example, tools like opt no longer falsely depend on the
disassembler, while tools like llvm-ar no longer depend on the code
generator.
There's a couple quirks to point out here:
* AllTargetsCodeGens is a bit more prevalent than expected. Tools like dsymutil
seem to need it which I was surprised by.
* llvm-xray linked to all the backends but doesn't seem to need any of them.
It builds and passes the tests so that seems to be correct.
* I left gold out as it's not built when binutils is not available so I'm
unable to test it
Bob Haarman [Thu, 23 May 2019 22:28:18 +0000 (22:28 +0000)]
fix accidental implicit matches in elf-disassemble-symbol-labels-rel.test
llvm/test/tools/llvm-objdump/X86/elf-disassemble-symbol-labels-rel.test
uses --implicit-check-not to verify that certain patterns do not occur
in llvm-objdump's output, except in places where they are explicitly
checked. Unfortunately, the patterns are generic enough that they may
be part of the file name which is also output by llvm-objdump. This
change matches the line with the filename explicitly so that the
implicit patterns are not applied to it.
Sanjay Patel [Thu, 23 May 2019 21:49:47 +0000 (21:49 +0000)]
[InstSimplify] insertelement V, undef, ? --> V
This was part of InstCombine, but it's better placed in
InstSimplify. InstCombine also had an unreachable but weaker
fold for insertelement with undef index, so that is deleted.
Sanjay Patel [Thu, 23 May 2019 20:17:25 +0000 (20:17 +0000)]
[DAGCombiner] make folds of binops safe for opcodes that produce >1 value
This is no-functional-change-intended currently because the definition
of isBinOp() only includes opcodes that produce 1 value. But if we
share that implementation with isCommutativeBinOp() as proposed in
D62191, then we need to make sure that the callers bail out for
opcodes that they are not prepared to handle correctly.
Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.
Remove the corresponding subtarget feature and option that made
this configurable.
Alina Sbirlea [Thu, 23 May 2019 19:07:41 +0000 (19:07 +0000)]
[SLPVectorizer] Set flag to previous default.
Summary:
The refactoring in r360276 moved the `RunSLPVectorization` flag and added the default explicitly. The default should have been `false`, as before.
The new pass manager used to have SLPVectorization on by default, now it's off in opt, and needs D61617 checked in to enable it in clang.
Sanjay Patel [Thu, 23 May 2019 18:46:03 +0000 (18:46 +0000)]
[InstCombine] be more careful when transforming a shuffle mask
This is reduced from a fuzzer test:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14890
Usually, demanded elements should be able to simplify shuffle
mask elements that are pointing to undef elements of its source
operands, but that doesn't happen in the test case.
Jordan Rupprecht [Thu, 23 May 2019 18:43:19 +0000 (18:43 +0000)]
[git] Be more specific when looking for llvm-svn
Summary:
A commit may, for some reason, have `llvm-svn:` in it multiple times. It may even take up the whole line and look identical to what gets added automatically when svn commits land in github.
To workaround this, make changes to both lookups:
1) When doing the git -> svn lookup, make sure to go through the whole message, and:
a) Only look for llvm-svn starting at the beginning of the line (excluding the whitespace that `git log` adds).
b) Take the last one (at the end of the commit message), if there are multiple matches.
2) When doing the svn -> git lookup, look through a sizeable but still reasonably small number of git commits (10k, about 4-5 months right now), and:
a) Only consider commits with the '^llvm-svn: NNNNNN' we expect, and
b) Only consider those that also follow the same git -> svn matching above. (Error if it's not exactly one commit).
The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.
This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.
Kit Barton [Thu, 23 May 2019 17:56:35 +0000 (17:56 +0000)]
[LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, induction variable, and guard branch.
Summary:
This PR extends the loop object with more utilities to get loop bounds, step, induction variable, and guard branch. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information. Moreover, loop fusion (https://reviews.llvm.org/D55851) is planning to use getGuard() to extend the kind of loops it is able to fuse, e.g. rotated loop with non-constant upper bound, which would have a loop guard.
Summary:
Remove all llvm:: prefixes in FileCheck library header and
implementation except for calls to make_unique and make_shared since
both files already use the llvm namespace.
Chris Bieneman [Thu, 23 May 2019 17:06:46 +0000 (17:06 +0000)]
[CMake] Copy C++ headers before configuring runtimes build
Summary: On some platforms C++ headers are packaged with the compiler not the sysroot. If you don't copy C++ headers into the build include directory during configuraiton of the outer build the C++ check during the runtime configuration may get inaccurate results.
Andrea Di Biagio [Thu, 23 May 2019 16:32:19 +0000 (16:32 +0000)]
[MCA] Add the ability to compute critical register dependency of an instruction.
This patch adds the methods `getCriticalRegDep()` and `computeCriticalRegDep()` to
class InstructionBase.
The goal is to allow users to obtain information about the critical register
dependency that most affects the latency of an instruction.
These methods are currently unused. However, the long term plan is to use them
in order to allow the computation of a critical-path as part of the bottleneck
analysis. So, this is yet another step towards fixing PR37494.
Shoaib Meenai [Thu, 23 May 2019 16:29:09 +0000 (16:29 +0000)]
[AsmPrinter] Treat a narrowing PtrToInt like Trunc
When printing assembly for PtrToInt, AsmPrinter::lowerConstant
incorrectly assumed that if PtrToInt was not converting to an
int with exactly the same number of bits, it must be widening
to a larger int. But this isn't necessarily true; PtrToInt can
also shrink the size, which is useful when you want to produce
a known 32-bit pointer on a 64-bit platform (on x86_64 ELF
this yields a R_X86_64_32 relocation).
The old behavior of falling through to the widening case for a
narrowing PtrToInt yields bogus assembly code like this, which
fails to assemble because the no-op bit and it accidentally
creates is not a valid relocation:
```
.long a&-1
```
The fix is to treat a narrowing PtrToInt exactly the same as
it already treats Trunc: just emit the expression and let
the assembler deal with truncating it in the appropriate way.
Fangrui Song [Thu, 23 May 2019 16:01:59 +0000 (16:01 +0000)]
[Object] object::ELFObjectFile::symbol_begin(): skip symbol index 0
For clients iterating the symbol table, none expects to handle index 0
(STN_UNDEF). Skip it to improve consistency with other binary formats.
Clients that need STN_UNDEF (e.g. lld) can use
getSectionContentsAsArray(). A test will be added in D62148.
Don Hinton [Thu, 23 May 2019 15:03:22 +0000 (15:03 +0000)]
[cmake] When getting Ninja version, don't include CMakeNinjaFindMake
which doesn't play well with passing CMAKE_MAKE_PROGRAM from the
commandline without a path.
Lewis Revill [Thu, 23 May 2019 14:46:27 +0000 (14:46 +0000)]
[RISCV] Support assembling TLS LA pseudo instructions
This patch adds the pseudo instructions la.tls.ie and la.tls.gd, used in
the initial-exec and global-dynamic TLS models respectively when
addressing a global. The pseudo instructions are expanded in the
assembly parser.
Andrea Di Biagio [Thu, 23 May 2019 13:42:47 +0000 (13:42 +0000)]
[MCA] Introduce class LSUnitBase and let LSUnit derive from it.
Class LSUnitBase provides a abstract interface for all the concrete LS units in
llvm-mca.
Methods exposed by the public abstract LSUnitBase interface are:
- Status isAvailable(const InstRef&);
- void dispatch(const InstRef &);
- const InstRef &isReady(const InstRef &);
LSUnitBase standardises the API, but not the data structures internally used by
LS units. This allows for more flexibility.
Previously, only method `isReady()` was declared virtual by class LSUnit.
Also, derived classes had to inherit all the internal data members of LSUnit.
James Henderson [Thu, 23 May 2019 12:38:06 +0000 (12:38 +0000)]
[llvm-objdump][test] Make test names consistent
This change renames a number of the disassembly tests to standardise
disasm/diassemble/disassembly to disassemble. Requested in
https://reviews.llvm.org/D62255.
James Henderson [Thu, 23 May 2019 12:30:39 +0000 (12:30 +0000)]
[llvm-objdump][test] Improve testing of some switches #3
This is the third commit in a series of patches to improve test coverage
of llvm-objdump. In this patch I have added a number of tests testing
various aspects of disassembly.
Konrad Kleine [Thu, 23 May 2019 11:14:47 +0000 (11:14 +0000)]
[lldb] NFC modernize codebase with modernize-use-nullptr
Summary:
NFC = [[ https://llvm.org/docs/Lexicon.html#nfc | Non functional change ]]
This commit is the result of modernizing the LLDB codebase by using
`nullptr` instread of `0` or `NULL`. See
https://clang.llvm.org/extra/clang-tidy/checks/modernize-use-nullptr.html
for more information.
This is the command I ran and I to fix and format the code base:
NOTE: There were also changes to `llvm/utils/unittest` but I did not
include them because I felt that maybe this library shall be updated in
isolation somehow.
NOTE: I know this is a rather large commit but it is a nobrainer in most
parts.
Petar Jovanovic [Thu, 23 May 2019 10:37:13 +0000 (10:37 +0000)]
[DwarfExpression] Refactor dwarf expression (NFC)
Refactor location description kind in order to be easier for extensions
(needed for D60866).
In addition, cut off some bits from the other class fields.
James Henderson [Thu, 23 May 2019 10:17:10 +0000 (10:17 +0000)]
[llvm-objdump][test] Improve testing of some switches #2
This patch focuses on adding additional testing for the --source switch.
For reference, the source-interleave-x86_64.ll test file has been split
into two parts - the input (shared with the other tests) and the test
itself.
George Rimar [Thu, 23 May 2019 09:18:57 +0000 (09:18 +0000)]
[llvm-objcopy] - Many minor NFC changes to cleanup/improve the code in ELF/Object.cpp.
The code in ELF/Object.cpp is sometimes a bit hard to read because of
lots of auto used everywhere. The main intention of this patch is
to replace them with the real type for places where it is not obvious.
Also it cleanups few places.
It is NFC change, but I want to be sure that there is no objections to do that since it
is massive.
Sam Parker [Thu, 23 May 2019 07:46:39 +0000 (07:46 +0000)]
[ARM][CGP] Clear SafeWrap before each search
The previous patch added a member set to store instructions that we
could allow to wrap. But this wasn't cleared between searches meaning
that they could get promoted, incorrectly, during the promotion of a
separate valid chain.