George Rimar [Fri, 24 May 2019 11:12:50 +0000 (11:12 +0000)]
[llvm-readelf] - Allow dumping of the .dynamic section even if there is no PT_DYNAMIC header.
It is now possible after D61937 was landed and was discussed
in it's review comments. It is not consistent with GNU, which
does not output .dynamic section content in this case for
no visible reason.
James Henderson [Fri, 24 May 2019 10:07:24 +0000 (10:07 +0000)]
[llvm-objdump][test] Fix for spurious matches against file paths
r361479 added tests that did --implicit-check-not=main, but a user found
that they failed on his machine, due to it having 'main' in a file path
printed earlier in the output. This test fixes this issue by making the
check pattern more explicit.
Simon Pilgrim [Fri, 24 May 2019 10:03:11 +0000 (10:03 +0000)]
[SelectionDAG] computeKnownBits - support constant pool values from target
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.
computeKnownBits then uses this function to improve codegen, notably vector code after legalization.
A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.
This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.
Summary:
This patch adds support for the polynomial multiplication instructions
PMULLB/PMULLT. The 64-bit source and 128-bit destination element
variants are enabled with crypto extensions (+sve2-aes), similar to the
NEON PMULL2 instruction. All other variants are enabled with +sve2.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Bjorn Pettersson [Fri, 24 May 2019 09:20:20 +0000 (09:20 +0000)]
Use the DataLayout::typeSizeEqualsStoreSize helper. NFC
Just a minor refactoring to use the new helper method
DataLayout::typeSizeEqualsStoreSize(). This is done when
checking if getTypeSizeInBits is equal/non-equal to
getTypeStoreSizeInBits.
Immediate forms of the SQSHL and UQSHL instructions are also added to
the existing SVE bitwise shift by immediate (predicated) group, as well
as three new instructions SRSHR/URSHR/SQSHLU. The new instructions in
this group are encoded similarly and are implemented using the same
TableGen class with a minimal change (1 bit in encoding).
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Neil Henning [Fri, 24 May 2019 08:59:17 +0000 (08:59 +0000)]
StructurizeCFG: Relax uniformity checks.
This change relaxes the checks for hasOnlyUniformBranches such that our
region is uniform if:
1. All conditional branches that are direct children are uniform.
2. And either:
a. All sub-regions are uniform.
b. There is one or less conditional branches among the direct
children.
Cullen Rhodes [Fri, 24 May 2019 08:45:37 +0000 (08:45 +0000)]
[AArch64][SVE2] Asm: fix overlapping bit
Summary:
Bit 20 in sve2_int_arith_pred TableGen class was overlapping. The
encodings are not affected as bit 20 is defined by the opc bits
and this was overwriting the earlier error of setting bit 20 to 0.
Raised by Momchil: https://reviews.llvm.org/D62130
Tim Northover [Fri, 24 May 2019 08:40:13 +0000 (08:40 +0000)]
GlobalISel: support swifterror attribute on AArch64.
swifterror marks an argument as a register pretending to be a pointer, so we
need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the
infrastructure can be reused from the DAG world.
Simon Atanasyan [Fri, 24 May 2019 08:39:40 +0000 (08:39 +0000)]
[mips] Always check that `shift and add` optimization is efficient.
The D45316 introduced the `shouldTransformMulToShiftsAddsSubs` function
to check that breaking down constant multiplications into a series
of shifts, adds, and subs is efficient. Unfortunately, this function
does not check maximum number of steps on all paths of the algorithm.
This patch fixes this bug.
Bjorn Pettersson [Fri, 24 May 2019 08:32:02 +0000 (08:32 +0000)]
[DSE] Bugfix to avoid PartialStoreMerging involving non byte-sized stores
Summary:
The DeadStoreElimination pass now skips doing
PartialStoreMerging when stores overlap according to
OW_PartialEarlierWithFullLater and at least one of
the stores is having a store size that is different
from the size of the type being stored.
This solves problems seen in
https://bugs.llvm.org/show_bug.cgi?id=41949
for which we in the past could end up with
mis-compiles or assertions.
The content and location of the padding bits is not
formally described (or undefined) in the LangRef
at the moment. So the solution is chosen based on
that we cannot assume anything about the padding bits
when having a store that clobbers more memory than
indicated by the type of the value that is stored
(such as storing an i6 using an 8-bit store instruction).
Sjoerd Meijer [Fri, 24 May 2019 08:25:02 +0000 (08:25 +0000)]
[ARM] ARMExpandPseudoInsts: add debug messages
This pass wasn't printing any messages at all, which I find really inconvenient
while debugging/tracing things. It now dumps the before and after of expanded
instructions. It doesn't do this yet for all instructions, but this is a good
start I guess.
QingShan Zhang [Fri, 24 May 2019 05:30:09 +0000 (05:30 +0000)]
[Power9] Add a specific heuristic to schedule the addi before the load
When we are scheduling the load and addi, if all other heuristic didn't take effect,
we will try to schedule the addi before the load, to hide the latency, and avoid the
true dependency added by RA. And this only take effects for Power9.
Yevgeny Rouban [Fri, 24 May 2019 04:34:23 +0000 (04:34 +0000)]
[NFC] SwitchInst: Introduce wrapper for prof branch_weights handling
This patch introduces a wrapper class that re-implements
several mutator methods of SwitchInst to handle changes
of prof branch_weights metadata along with remove/add
switch case methods.
Subsequent patches will use this wrapper to implement
prof branch_weights metadata handling for SwitchInst.
Reid Kleckner [Fri, 24 May 2019 01:27:20 +0000 (01:27 +0000)]
[AArch64] Preserve X8 for thunks ending in variadic musttail calls
Summary:
On Windows, X8 may be used to pass in the address of an aggregate that
is returned indirectly. Therefore, it should be forwarded to variadic
musttail calls and preserved in thunks.
David Blaikie [Fri, 24 May 2019 01:05:52 +0000 (01:05 +0000)]
dwarfdump: Add a bit more DWARF64 support
This test case was incorrect because it mixed DWARF32 and DWARF64 for a
single unit (DWARF32 unit referencing a DWARF64 str_offsets section). So
fix enough of the unit parsing for DWARF64 and make the test valid.
(not sure if anyone needs DWARF64 support though - support in
libDebugInfoDWARF has been added piecemeal and LLVM doesn't produce it
at all)
llvm-objcopy: Change sectionWithinSegment() to use virtual addresses instead of file offsets for SHT_NOBITS sections.
Without this, sectionWithinSegment() will return the wrong answer for bss
sections. This doesn't seem to matter now (for non-broken ELF files), but
it will matter with a change that I'm working on.
Daniel Sanders [Thu, 23 May 2019 23:02:56 +0000 (23:02 +0000)]
Break false dependencies on target libraries
Summary:
For the most part this consists of replacing ${LLVM_TARGETS_TO_BUILD} with
some combination of AllTargets* so that they depend on specific components
of a target backend rather than all of it. The overall effect of this is
that, for example, tools like opt no longer falsely depend on the
disassembler, while tools like llvm-ar no longer depend on the code
generator.
There's a couple quirks to point out here:
* AllTargetsCodeGens is a bit more prevalent than expected. Tools like dsymutil
seem to need it which I was surprised by.
* llvm-xray linked to all the backends but doesn't seem to need any of them.
It builds and passes the tests so that seems to be correct.
* I left gold out as it's not built when binutils is not available so I'm
unable to test it
Bob Haarman [Thu, 23 May 2019 22:28:18 +0000 (22:28 +0000)]
fix accidental implicit matches in elf-disassemble-symbol-labels-rel.test
llvm/test/tools/llvm-objdump/X86/elf-disassemble-symbol-labels-rel.test
uses --implicit-check-not to verify that certain patterns do not occur
in llvm-objdump's output, except in places where they are explicitly
checked. Unfortunately, the patterns are generic enough that they may
be part of the file name which is also output by llvm-objdump. This
change matches the line with the filename explicitly so that the
implicit patterns are not applied to it.
Sanjay Patel [Thu, 23 May 2019 21:49:47 +0000 (21:49 +0000)]
[InstSimplify] insertelement V, undef, ? --> V
This was part of InstCombine, but it's better placed in
InstSimplify. InstCombine also had an unreachable but weaker
fold for insertelement with undef index, so that is deleted.
Sanjay Patel [Thu, 23 May 2019 20:17:25 +0000 (20:17 +0000)]
[DAGCombiner] make folds of binops safe for opcodes that produce >1 value
This is no-functional-change-intended currently because the definition
of isBinOp() only includes opcodes that produce 1 value. But if we
share that implementation with isCommutativeBinOp() as proposed in
D62191, then we need to make sure that the callers bail out for
opcodes that they are not prepared to handle correctly.
Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.
Remove the corresponding subtarget feature and option that made
this configurable.
Alina Sbirlea [Thu, 23 May 2019 19:07:41 +0000 (19:07 +0000)]
[SLPVectorizer] Set flag to previous default.
Summary:
The refactoring in r360276 moved the `RunSLPVectorization` flag and added the default explicitly. The default should have been `false`, as before.
The new pass manager used to have SLPVectorization on by default, now it's off in opt, and needs D61617 checked in to enable it in clang.
Sanjay Patel [Thu, 23 May 2019 18:46:03 +0000 (18:46 +0000)]
[InstCombine] be more careful when transforming a shuffle mask
This is reduced from a fuzzer test:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14890
Usually, demanded elements should be able to simplify shuffle
mask elements that are pointing to undef elements of its source
operands, but that doesn't happen in the test case.
Jordan Rupprecht [Thu, 23 May 2019 18:43:19 +0000 (18:43 +0000)]
[git] Be more specific when looking for llvm-svn
Summary:
A commit may, for some reason, have `llvm-svn:` in it multiple times. It may even take up the whole line and look identical to what gets added automatically when svn commits land in github.
To workaround this, make changes to both lookups:
1) When doing the git -> svn lookup, make sure to go through the whole message, and:
a) Only look for llvm-svn starting at the beginning of the line (excluding the whitespace that `git log` adds).
b) Take the last one (at the end of the commit message), if there are multiple matches.
2) When doing the svn -> git lookup, look through a sizeable but still reasonably small number of git commits (10k, about 4-5 months right now), and:
a) Only consider commits with the '^llvm-svn: NNNNNN' we expect, and
b) Only consider those that also follow the same git -> svn matching above. (Error if it's not exactly one commit).
The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.
This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.
Kit Barton [Thu, 23 May 2019 17:56:35 +0000 (17:56 +0000)]
[LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, induction variable, and guard branch.
Summary:
This PR extends the loop object with more utilities to get loop bounds, step, induction variable, and guard branch. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information. Moreover, loop fusion (https://reviews.llvm.org/D55851) is planning to use getGuard() to extend the kind of loops it is able to fuse, e.g. rotated loop with non-constant upper bound, which would have a loop guard.
Summary:
Remove all llvm:: prefixes in FileCheck library header and
implementation except for calls to make_unique and make_shared since
both files already use the llvm namespace.
Chris Bieneman [Thu, 23 May 2019 17:06:46 +0000 (17:06 +0000)]
[CMake] Copy C++ headers before configuring runtimes build
Summary: On some platforms C++ headers are packaged with the compiler not the sysroot. If you don't copy C++ headers into the build include directory during configuraiton of the outer build the C++ check during the runtime configuration may get inaccurate results.
Andrea Di Biagio [Thu, 23 May 2019 16:32:19 +0000 (16:32 +0000)]
[MCA] Add the ability to compute critical register dependency of an instruction.
This patch adds the methods `getCriticalRegDep()` and `computeCriticalRegDep()` to
class InstructionBase.
The goal is to allow users to obtain information about the critical register
dependency that most affects the latency of an instruction.
These methods are currently unused. However, the long term plan is to use them
in order to allow the computation of a critical-path as part of the bottleneck
analysis. So, this is yet another step towards fixing PR37494.
Shoaib Meenai [Thu, 23 May 2019 16:29:09 +0000 (16:29 +0000)]
[AsmPrinter] Treat a narrowing PtrToInt like Trunc
When printing assembly for PtrToInt, AsmPrinter::lowerConstant
incorrectly assumed that if PtrToInt was not converting to an
int with exactly the same number of bits, it must be widening
to a larger int. But this isn't necessarily true; PtrToInt can
also shrink the size, which is useful when you want to produce
a known 32-bit pointer on a 64-bit platform (on x86_64 ELF
this yields a R_X86_64_32 relocation).
The old behavior of falling through to the widening case for a
narrowing PtrToInt yields bogus assembly code like this, which
fails to assemble because the no-op bit and it accidentally
creates is not a valid relocation:
```
.long a&-1
```
The fix is to treat a narrowing PtrToInt exactly the same as
it already treats Trunc: just emit the expression and let
the assembler deal with truncating it in the appropriate way.
Fangrui Song [Thu, 23 May 2019 16:01:59 +0000 (16:01 +0000)]
[Object] object::ELFObjectFile::symbol_begin(): skip symbol index 0
For clients iterating the symbol table, none expects to handle index 0
(STN_UNDEF). Skip it to improve consistency with other binary formats.
Clients that need STN_UNDEF (e.g. lld) can use
getSectionContentsAsArray(). A test will be added in D62148.
Don Hinton [Thu, 23 May 2019 15:03:22 +0000 (15:03 +0000)]
[cmake] When getting Ninja version, don't include CMakeNinjaFindMake
which doesn't play well with passing CMAKE_MAKE_PROGRAM from the
commandline without a path.
Lewis Revill [Thu, 23 May 2019 14:46:27 +0000 (14:46 +0000)]
[RISCV] Support assembling TLS LA pseudo instructions
This patch adds the pseudo instructions la.tls.ie and la.tls.gd, used in
the initial-exec and global-dynamic TLS models respectively when
addressing a global. The pseudo instructions are expanded in the
assembly parser.
Andrea Di Biagio [Thu, 23 May 2019 13:42:47 +0000 (13:42 +0000)]
[MCA] Introduce class LSUnitBase and let LSUnit derive from it.
Class LSUnitBase provides a abstract interface for all the concrete LS units in
llvm-mca.
Methods exposed by the public abstract LSUnitBase interface are:
- Status isAvailable(const InstRef&);
- void dispatch(const InstRef &);
- const InstRef &isReady(const InstRef &);
LSUnitBase standardises the API, but not the data structures internally used by
LS units. This allows for more flexibility.
Previously, only method `isReady()` was declared virtual by class LSUnit.
Also, derived classes had to inherit all the internal data members of LSUnit.
James Henderson [Thu, 23 May 2019 12:38:06 +0000 (12:38 +0000)]
[llvm-objdump][test] Make test names consistent
This change renames a number of the disassembly tests to standardise
disasm/diassemble/disassembly to disassemble. Requested in
https://reviews.llvm.org/D62255.
James Henderson [Thu, 23 May 2019 12:30:39 +0000 (12:30 +0000)]
[llvm-objdump][test] Improve testing of some switches #3
This is the third commit in a series of patches to improve test coverage
of llvm-objdump. In this patch I have added a number of tests testing
various aspects of disassembly.
Konrad Kleine [Thu, 23 May 2019 11:14:47 +0000 (11:14 +0000)]
[lldb] NFC modernize codebase with modernize-use-nullptr
Summary:
NFC = [[ https://llvm.org/docs/Lexicon.html#nfc | Non functional change ]]
This commit is the result of modernizing the LLDB codebase by using
`nullptr` instread of `0` or `NULL`. See
https://clang.llvm.org/extra/clang-tidy/checks/modernize-use-nullptr.html
for more information.
This is the command I ran and I to fix and format the code base:
NOTE: There were also changes to `llvm/utils/unittest` but I did not
include them because I felt that maybe this library shall be updated in
isolation somehow.
NOTE: I know this is a rather large commit but it is a nobrainer in most
parts.
Petar Jovanovic [Thu, 23 May 2019 10:37:13 +0000 (10:37 +0000)]
[DwarfExpression] Refactor dwarf expression (NFC)
Refactor location description kind in order to be easier for extensions
(needed for D60866).
In addition, cut off some bits from the other class fields.
James Henderson [Thu, 23 May 2019 10:17:10 +0000 (10:17 +0000)]
[llvm-objdump][test] Improve testing of some switches #2
This patch focuses on adding additional testing for the --source switch.
For reference, the source-interleave-x86_64.ll test file has been split
into two parts - the input (shared with the other tests) and the test
itself.
George Rimar [Thu, 23 May 2019 09:18:57 +0000 (09:18 +0000)]
[llvm-objcopy] - Many minor NFC changes to cleanup/improve the code in ELF/Object.cpp.
The code in ELF/Object.cpp is sometimes a bit hard to read because of
lots of auto used everywhere. The main intention of this patch is
to replace them with the real type for places where it is not obvious.
Also it cleanups few places.
It is NFC change, but I want to be sure that there is no objections to do that since it
is massive.
Sam Parker [Thu, 23 May 2019 07:46:39 +0000 (07:46 +0000)]
[ARM][CGP] Clear SafeWrap before each search
The previous patch added a member set to store instructions that we
could allow to wrap. But this wasn't cleared between searches meaning
that they could get promoted, incorrectly, during the promotion of a
separate valid chain.
Christian Bruel [Thu, 23 May 2019 05:53:10 +0000 (05:53 +0000)]
[GlobalOpt] recognize dead struct fields and propagate values
Summary:
Allow struct fields SRA and dead stores. This works by considering fields accesses from getElementPtr to be considered as a possible pointer root that can be cleaned up.
We check that the variable can be SRA by recursively checking the sub expressions with the new isSafeSubSROAGEP function.
basically this allows the array in following C code to be optimized out
Thomas Lively [Thu, 23 May 2019 01:24:01 +0000 (01:24 +0000)]
[WebAssembly] Implement __builtin_return_address for emscripten
Summary:
In this patch, `ISD::RETURNADDR` is lowered on the emscripten target
to the new Emscripten runtime function `emscripten_return_address`, which
implements the functionality.
Fangrui Song [Thu, 23 May 2019 01:05:13 +0000 (01:05 +0000)]
[X86] Support -fno-plt __tls_get_addr calls
In general dynamic/local dynamic TLS models, with -fno-plt,
* x86: emit `calll *___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT`
Note, on x86, if we can get rid of %ebx as the PIC register,
it may be better to use a register not preserved across function calls.
* x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT`
Reorganize the code by separating 32-bit and 64-bit.
Summary:
With now a clear distinction between string and numeric substitutions,
this patch introduces separate classes to represent them with a parent
class implementing the common interface. Diagnostics in
printSubstitutions() are also adapted to not require knowing which
substitution is being looked at since it does not hinder clarity and
makes the implementation simpler.
Summary:
Terminology introduced by [[#]] blocks is confusing and does not
integrate well with existing terminology.
First, variables referred by [[]] blocks are called "pattern variables"
while the text a CHECK directive needs to match is called a "CHECK
pattern". This is inconsistent with variables in [[#]] blocks since
[[#]] blocks are also found in CHECK pattern yet those variables are
called "numeric variable".
Second, the replacing of both [[]] and [[#]] blocks by the value of the
variable or expression they contain is represented by a
FileCheckPatternSubstitution class. The naming refers to being a
substitution in a CHECK pattern but could be wrongly understood as being
a substitution of a pattern variable.
Third and lastly, comments use "numeric expression" to refer both to the
[[#]] blocks as well as to the numeric expressions these blocks contain
which get evaluated at match time.
This patch solves these confusions by
- calling variables in [[]] and [[#]] blocks as string and numeric
variables respectively;
- referring to [[]] and [[#]] as substitution *blocks*, with the former
being a string substitution block and the latter a numeric
substitution block;
- calling [[]] and [[#]] blocks to be replaced by the value of a
variable or expression they contain a substitution (as opposed to
definition when these blocks are used to defined a variable), with the
former being a string substitution and the latter a numeric
substitution;
- renaming the FileCheckPatternSubstitution as a FileCheckSubstitution
class with FileCheckStringSubstitution and
FileCheckNumericSubstitution subclasses;
- restricting the use of "numeric expression" to refer to the expression
that is evaluated in a numeric substitution.
While numeric substitution blocks only support numeric substitutions of
numeric expressions at the moment there are plans to augment numeric
substitution blocks to support numeric definitions as well as both a
numeric definition and numeric substitution in the same numeric
substitution block.
Chris Bieneman [Wed, 22 May 2019 21:42:06 +0000 (21:42 +0000)]
[Runtimes] If LLVM_INCLUDE_TESTS=On depend on gtest
Summary: If we are building the tests for the runtimes we should make them depend on gtest so that gtest is built and ready before we run any of the check-* targets.
Matt Arsenault [Wed, 22 May 2019 21:28:20 +0000 (21:28 +0000)]
TableGen: Handle nontrivial foreach range bounds
This allows using anything that isn't a literal integer as the bounds
for a foreach. Some of the diagnostics aren't perfect, but nobody ever
accused tablegen of having good errors. For example, the existing
wording suggests a bitrange is valid, but as far as I can tell this
has never worked.