QingShan Zhang [Fri, 24 May 2019 05:30:09 +0000 (05:30 +0000)]
[Power9] Add a specific heuristic to schedule the addi before the load
When we are scheduling the load and addi, if all other heuristic didn't take effect,
we will try to schedule the addi before the load, to hide the latency, and avoid the
true dependency added by RA. And this only take effects for Power9.
Yevgeny Rouban [Fri, 24 May 2019 04:34:23 +0000 (04:34 +0000)]
[NFC] SwitchInst: Introduce wrapper for prof branch_weights handling
This patch introduces a wrapper class that re-implements
several mutator methods of SwitchInst to handle changes
of prof branch_weights metadata along with remove/add
switch case methods.
Subsequent patches will use this wrapper to implement
prof branch_weights metadata handling for SwitchInst.
Reid Kleckner [Fri, 24 May 2019 01:27:20 +0000 (01:27 +0000)]
[AArch64] Preserve X8 for thunks ending in variadic musttail calls
Summary:
On Windows, X8 may be used to pass in the address of an aggregate that
is returned indirectly. Therefore, it should be forwarded to variadic
musttail calls and preserved in thunks.
David Blaikie [Fri, 24 May 2019 01:05:52 +0000 (01:05 +0000)]
dwarfdump: Add a bit more DWARF64 support
This test case was incorrect because it mixed DWARF32 and DWARF64 for a
single unit (DWARF32 unit referencing a DWARF64 str_offsets section). So
fix enough of the unit parsing for DWARF64 and make the test valid.
(not sure if anyone needs DWARF64 support though - support in
libDebugInfoDWARF has been added piecemeal and LLVM doesn't produce it
at all)
llvm-objcopy: Change sectionWithinSegment() to use virtual addresses instead of file offsets for SHT_NOBITS sections.
Without this, sectionWithinSegment() will return the wrong answer for bss
sections. This doesn't seem to matter now (for non-broken ELF files), but
it will matter with a change that I'm working on.
Daniel Sanders [Thu, 23 May 2019 23:02:56 +0000 (23:02 +0000)]
Break false dependencies on target libraries
Summary:
For the most part this consists of replacing ${LLVM_TARGETS_TO_BUILD} with
some combination of AllTargets* so that they depend on specific components
of a target backend rather than all of it. The overall effect of this is
that, for example, tools like opt no longer falsely depend on the
disassembler, while tools like llvm-ar no longer depend on the code
generator.
There's a couple quirks to point out here:
* AllTargetsCodeGens is a bit more prevalent than expected. Tools like dsymutil
seem to need it which I was surprised by.
* llvm-xray linked to all the backends but doesn't seem to need any of them.
It builds and passes the tests so that seems to be correct.
* I left gold out as it's not built when binutils is not available so I'm
unable to test it
Bob Haarman [Thu, 23 May 2019 22:28:18 +0000 (22:28 +0000)]
fix accidental implicit matches in elf-disassemble-symbol-labels-rel.test
llvm/test/tools/llvm-objdump/X86/elf-disassemble-symbol-labels-rel.test
uses --implicit-check-not to verify that certain patterns do not occur
in llvm-objdump's output, except in places where they are explicitly
checked. Unfortunately, the patterns are generic enough that they may
be part of the file name which is also output by llvm-objdump. This
change matches the line with the filename explicitly so that the
implicit patterns are not applied to it.
Sanjay Patel [Thu, 23 May 2019 21:49:47 +0000 (21:49 +0000)]
[InstSimplify] insertelement V, undef, ? --> V
This was part of InstCombine, but it's better placed in
InstSimplify. InstCombine also had an unreachable but weaker
fold for insertelement with undef index, so that is deleted.
Sanjay Patel [Thu, 23 May 2019 20:17:25 +0000 (20:17 +0000)]
[DAGCombiner] make folds of binops safe for opcodes that produce >1 value
This is no-functional-change-intended currently because the definition
of isBinOp() only includes opcodes that produce 1 value. But if we
share that implementation with isCommutativeBinOp() as proposed in
D62191, then we need to make sure that the callers bail out for
opcodes that they are not prepared to handle correctly.
Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.
Remove the corresponding subtarget feature and option that made
this configurable.
Alina Sbirlea [Thu, 23 May 2019 19:07:41 +0000 (19:07 +0000)]
[SLPVectorizer] Set flag to previous default.
Summary:
The refactoring in r360276 moved the `RunSLPVectorization` flag and added the default explicitly. The default should have been `false`, as before.
The new pass manager used to have SLPVectorization on by default, now it's off in opt, and needs D61617 checked in to enable it in clang.
Sanjay Patel [Thu, 23 May 2019 18:46:03 +0000 (18:46 +0000)]
[InstCombine] be more careful when transforming a shuffle mask
This is reduced from a fuzzer test:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14890
Usually, demanded elements should be able to simplify shuffle
mask elements that are pointing to undef elements of its source
operands, but that doesn't happen in the test case.
Jordan Rupprecht [Thu, 23 May 2019 18:43:19 +0000 (18:43 +0000)]
[git] Be more specific when looking for llvm-svn
Summary:
A commit may, for some reason, have `llvm-svn:` in it multiple times. It may even take up the whole line and look identical to what gets added automatically when svn commits land in github.
To workaround this, make changes to both lookups:
1) When doing the git -> svn lookup, make sure to go through the whole message, and:
a) Only look for llvm-svn starting at the beginning of the line (excluding the whitespace that `git log` adds).
b) Take the last one (at the end of the commit message), if there are multiple matches.
2) When doing the svn -> git lookup, look through a sizeable but still reasonably small number of git commits (10k, about 4-5 months right now), and:
a) Only consider commits with the '^llvm-svn: NNNNNN' we expect, and
b) Only consider those that also follow the same git -> svn matching above. (Error if it's not exactly one commit).
The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.
This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.
Kit Barton [Thu, 23 May 2019 17:56:35 +0000 (17:56 +0000)]
[LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, induction variable, and guard branch.
Summary:
This PR extends the loop object with more utilities to get loop bounds, step, induction variable, and guard branch. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information. Moreover, loop fusion (https://reviews.llvm.org/D55851) is planning to use getGuard() to extend the kind of loops it is able to fuse, e.g. rotated loop with non-constant upper bound, which would have a loop guard.
Summary:
Remove all llvm:: prefixes in FileCheck library header and
implementation except for calls to make_unique and make_shared since
both files already use the llvm namespace.
Chris Bieneman [Thu, 23 May 2019 17:06:46 +0000 (17:06 +0000)]
[CMake] Copy C++ headers before configuring runtimes build
Summary: On some platforms C++ headers are packaged with the compiler not the sysroot. If you don't copy C++ headers into the build include directory during configuraiton of the outer build the C++ check during the runtime configuration may get inaccurate results.
Andrea Di Biagio [Thu, 23 May 2019 16:32:19 +0000 (16:32 +0000)]
[MCA] Add the ability to compute critical register dependency of an instruction.
This patch adds the methods `getCriticalRegDep()` and `computeCriticalRegDep()` to
class InstructionBase.
The goal is to allow users to obtain information about the critical register
dependency that most affects the latency of an instruction.
These methods are currently unused. However, the long term plan is to use them
in order to allow the computation of a critical-path as part of the bottleneck
analysis. So, this is yet another step towards fixing PR37494.
Shoaib Meenai [Thu, 23 May 2019 16:29:09 +0000 (16:29 +0000)]
[AsmPrinter] Treat a narrowing PtrToInt like Trunc
When printing assembly for PtrToInt, AsmPrinter::lowerConstant
incorrectly assumed that if PtrToInt was not converting to an
int with exactly the same number of bits, it must be widening
to a larger int. But this isn't necessarily true; PtrToInt can
also shrink the size, which is useful when you want to produce
a known 32-bit pointer on a 64-bit platform (on x86_64 ELF
this yields a R_X86_64_32 relocation).
The old behavior of falling through to the widening case for a
narrowing PtrToInt yields bogus assembly code like this, which
fails to assemble because the no-op bit and it accidentally
creates is not a valid relocation:
```
.long a&-1
```
The fix is to treat a narrowing PtrToInt exactly the same as
it already treats Trunc: just emit the expression and let
the assembler deal with truncating it in the appropriate way.
Fangrui Song [Thu, 23 May 2019 16:01:59 +0000 (16:01 +0000)]
[Object] object::ELFObjectFile::symbol_begin(): skip symbol index 0
For clients iterating the symbol table, none expects to handle index 0
(STN_UNDEF). Skip it to improve consistency with other binary formats.
Clients that need STN_UNDEF (e.g. lld) can use
getSectionContentsAsArray(). A test will be added in D62148.
Don Hinton [Thu, 23 May 2019 15:03:22 +0000 (15:03 +0000)]
[cmake] When getting Ninja version, don't include CMakeNinjaFindMake
which doesn't play well with passing CMAKE_MAKE_PROGRAM from the
commandline without a path.
Lewis Revill [Thu, 23 May 2019 14:46:27 +0000 (14:46 +0000)]
[RISCV] Support assembling TLS LA pseudo instructions
This patch adds the pseudo instructions la.tls.ie and la.tls.gd, used in
the initial-exec and global-dynamic TLS models respectively when
addressing a global. The pseudo instructions are expanded in the
assembly parser.
Andrea Di Biagio [Thu, 23 May 2019 13:42:47 +0000 (13:42 +0000)]
[MCA] Introduce class LSUnitBase and let LSUnit derive from it.
Class LSUnitBase provides a abstract interface for all the concrete LS units in
llvm-mca.
Methods exposed by the public abstract LSUnitBase interface are:
- Status isAvailable(const InstRef&);
- void dispatch(const InstRef &);
- const InstRef &isReady(const InstRef &);
LSUnitBase standardises the API, but not the data structures internally used by
LS units. This allows for more flexibility.
Previously, only method `isReady()` was declared virtual by class LSUnit.
Also, derived classes had to inherit all the internal data members of LSUnit.
James Henderson [Thu, 23 May 2019 12:38:06 +0000 (12:38 +0000)]
[llvm-objdump][test] Make test names consistent
This change renames a number of the disassembly tests to standardise
disasm/diassemble/disassembly to disassemble. Requested in
https://reviews.llvm.org/D62255.
James Henderson [Thu, 23 May 2019 12:30:39 +0000 (12:30 +0000)]
[llvm-objdump][test] Improve testing of some switches #3
This is the third commit in a series of patches to improve test coverage
of llvm-objdump. In this patch I have added a number of tests testing
various aspects of disassembly.
Konrad Kleine [Thu, 23 May 2019 11:14:47 +0000 (11:14 +0000)]
[lldb] NFC modernize codebase with modernize-use-nullptr
Summary:
NFC = [[ https://llvm.org/docs/Lexicon.html#nfc | Non functional change ]]
This commit is the result of modernizing the LLDB codebase by using
`nullptr` instread of `0` or `NULL`. See
https://clang.llvm.org/extra/clang-tidy/checks/modernize-use-nullptr.html
for more information.
This is the command I ran and I to fix and format the code base:
NOTE: There were also changes to `llvm/utils/unittest` but I did not
include them because I felt that maybe this library shall be updated in
isolation somehow.
NOTE: I know this is a rather large commit but it is a nobrainer in most
parts.
Petar Jovanovic [Thu, 23 May 2019 10:37:13 +0000 (10:37 +0000)]
[DwarfExpression] Refactor dwarf expression (NFC)
Refactor location description kind in order to be easier for extensions
(needed for D60866).
In addition, cut off some bits from the other class fields.
James Henderson [Thu, 23 May 2019 10:17:10 +0000 (10:17 +0000)]
[llvm-objdump][test] Improve testing of some switches #2
This patch focuses on adding additional testing for the --source switch.
For reference, the source-interleave-x86_64.ll test file has been split
into two parts - the input (shared with the other tests) and the test
itself.
George Rimar [Thu, 23 May 2019 09:18:57 +0000 (09:18 +0000)]
[llvm-objcopy] - Many minor NFC changes to cleanup/improve the code in ELF/Object.cpp.
The code in ELF/Object.cpp is sometimes a bit hard to read because of
lots of auto used everywhere. The main intention of this patch is
to replace them with the real type for places where it is not obvious.
Also it cleanups few places.
It is NFC change, but I want to be sure that there is no objections to do that since it
is massive.
Sam Parker [Thu, 23 May 2019 07:46:39 +0000 (07:46 +0000)]
[ARM][CGP] Clear SafeWrap before each search
The previous patch added a member set to store instructions that we
could allow to wrap. But this wasn't cleared between searches meaning
that they could get promoted, incorrectly, during the promotion of a
separate valid chain.
Christian Bruel [Thu, 23 May 2019 05:53:10 +0000 (05:53 +0000)]
[GlobalOpt] recognize dead struct fields and propagate values
Summary:
Allow struct fields SRA and dead stores. This works by considering fields accesses from getElementPtr to be considered as a possible pointer root that can be cleaned up.
We check that the variable can be SRA by recursively checking the sub expressions with the new isSafeSubSROAGEP function.
basically this allows the array in following C code to be optimized out
Thomas Lively [Thu, 23 May 2019 01:24:01 +0000 (01:24 +0000)]
[WebAssembly] Implement __builtin_return_address for emscripten
Summary:
In this patch, `ISD::RETURNADDR` is lowered on the emscripten target
to the new Emscripten runtime function `emscripten_return_address`, which
implements the functionality.
Fangrui Song [Thu, 23 May 2019 01:05:13 +0000 (01:05 +0000)]
[X86] Support -fno-plt __tls_get_addr calls
In general dynamic/local dynamic TLS models, with -fno-plt,
* x86: emit `calll *___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT`
Note, on x86, if we can get rid of %ebx as the PIC register,
it may be better to use a register not preserved across function calls.
* x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT`
Reorganize the code by separating 32-bit and 64-bit.
Summary:
With now a clear distinction between string and numeric substitutions,
this patch introduces separate classes to represent them with a parent
class implementing the common interface. Diagnostics in
printSubstitutions() are also adapted to not require knowing which
substitution is being looked at since it does not hinder clarity and
makes the implementation simpler.
Summary:
Terminology introduced by [[#]] blocks is confusing and does not
integrate well with existing terminology.
First, variables referred by [[]] blocks are called "pattern variables"
while the text a CHECK directive needs to match is called a "CHECK
pattern". This is inconsistent with variables in [[#]] blocks since
[[#]] blocks are also found in CHECK pattern yet those variables are
called "numeric variable".
Second, the replacing of both [[]] and [[#]] blocks by the value of the
variable or expression they contain is represented by a
FileCheckPatternSubstitution class. The naming refers to being a
substitution in a CHECK pattern but could be wrongly understood as being
a substitution of a pattern variable.
Third and lastly, comments use "numeric expression" to refer both to the
[[#]] blocks as well as to the numeric expressions these blocks contain
which get evaluated at match time.
This patch solves these confusions by
- calling variables in [[]] and [[#]] blocks as string and numeric
variables respectively;
- referring to [[]] and [[#]] as substitution *blocks*, with the former
being a string substitution block and the latter a numeric
substitution block;
- calling [[]] and [[#]] blocks to be replaced by the value of a
variable or expression they contain a substitution (as opposed to
definition when these blocks are used to defined a variable), with the
former being a string substitution and the latter a numeric
substitution;
- renaming the FileCheckPatternSubstitution as a FileCheckSubstitution
class with FileCheckStringSubstitution and
FileCheckNumericSubstitution subclasses;
- restricting the use of "numeric expression" to refer to the expression
that is evaluated in a numeric substitution.
While numeric substitution blocks only support numeric substitutions of
numeric expressions at the moment there are plans to augment numeric
substitution blocks to support numeric definitions as well as both a
numeric definition and numeric substitution in the same numeric
substitution block.
Chris Bieneman [Wed, 22 May 2019 21:42:06 +0000 (21:42 +0000)]
[Runtimes] If LLVM_INCLUDE_TESTS=On depend on gtest
Summary: If we are building the tests for the runtimes we should make them depend on gtest so that gtest is built and ready before we run any of the check-* targets.
Matt Arsenault [Wed, 22 May 2019 21:28:20 +0000 (21:28 +0000)]
TableGen: Handle nontrivial foreach range bounds
This allows using anything that isn't a literal integer as the bounds
for a foreach. Some of the diagnostics aren't perfect, but nobody ever
accused tablegen of having good errors. For example, the existing
wording suggests a bitrange is valid, but as far as I can tell this
has never worked.
Petr Hosek [Wed, 22 May 2019 21:08:33 +0000 (21:08 +0000)]
[runtimes] Move libunwind, libc++abi and libc++ to lib/$target/c++ and include/c++
This change is a consequence of the discussion in "RFC: Place libs in
Clang-dedicated directories", specifically the suggestion that
libunwind, libc++abi and libc++ shouldn't be using Clang resource
directory. Tools like clangd make this assumption, but this is
currently not true for the LLVM_ENABLE_PER_TARGET_RUNTIME_DIR build.
This change addresses that by moving the output of these libraries to
lib/$target/c++ and include/c++ directories, leaving resource directory
only for compiler-rt runtimes and Clang builtin headers.
Craig Topper [Wed, 22 May 2019 21:00:18 +0000 (21:00 +0000)]
[X86] Explcitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary.
We effectively had a second set of isel patterns that tried to use a
regular store instruction and an extract_subreg instruction. Or a masked move
and an extract_subreg. These patterns were intended to override the
matching of VEXTRACT instructions by taking advantage of the priority
of the explicit immediate 0 for the index.
This patch instaed just disables the immediate 0 matchin the VEXTRACT
patterns. This each of the component pieces of the larger patterns will
match by themselves.
This found a bug of sorts were we didn't use 128-bit store for 512->128
extract on KNL. Its unclear what the right thing here should be.
Using the vextract avoids constraining the register allocator to use
xmm0-15. But it always results in a longer encoding if the register
allocator ends up choosing xmm0-15 anyway.
Craig Topper [Wed, 22 May 2019 20:04:55 +0000 (20:04 +0000)]
[X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening.
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions. For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.
We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.
Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.
I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.
It copies 32-bit value from gpr32 into gpr64. But Live DEBUG_VALUE analysis
is not able to create debug location record for that instruction. So the problem
is in that debug info for argc variable is incorrect. The fix is
to write custom isCopyInstrImpl() which would recognize the ORRWrs instr.
Xing Xue [Wed, 22 May 2019 17:41:27 +0000 (17:41 +0000)]
Disable EHFrameSupport in JITLink/RuntimeDyld on AIX
Summary:
EH Frames aren't supported on AIX with the system compiler, but the definition of HAVE_EHTABLE_SUPPORT misses this which causes linking problems on AIX. This patch updates the definition of HAVE_EHTABLE_SUPPORT in both JITLink and RuntimeDyld.
Matt Arsenault [Wed, 22 May 2019 16:28:41 +0000 (16:28 +0000)]
MC: Allow getMaxInstLength to depend on the subtarget
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.
For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.
Nico Weber [Wed, 22 May 2019 15:53:23 +0000 (15:53 +0000)]
llvm-undname: Fix an assert-on-invalid, found by oss-fuzz
If a template parameter refers to a pointer to member, but the mangling
of that was a string literal instead of a real symbol, llvm-undname used
to crash instead of rejecting the input.
Sanjay Patel [Wed, 22 May 2019 15:50:46 +0000 (15:50 +0000)]
[IR] allow fast-math-flags on select of FP values
This is a minimal start to correcting a problem most directly discussed in PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
We have been hacking around a limitation for FP select patterns by using the
fast-math-flags on the condition of the select rather than the select itself.
This patch just allows FMF to appear with the 'select' opcode. No changes are
needed to "FPMathOperator" because it already includes select-of-FP because
that definition is based on the (return) value type.
Once we have this ability, we can start correcting and adding IR transforms
to use the FMF on a 'select' instruction. The instcombine and vectorizer test
diffs only show that the IRBuilder change is behaving as expected by applying
an FMF guard value to 'select'.
For reference:
rL241901 - allowed FMF with fcmp
rL255555 - allowed FMF with FP calls