Zachary Turner [Tue, 8 Jan 2019 21:05:51 +0000 (21:05 +0000)]
[llvm-undname] Add support for demangling msvc's noexcept types.
Starting in C++17, MSVC introduced a new mangling for function
parameters that are themselves noexcept functions. This patch
makes llvm-undname properly demangle them.
Patch by Zachary Henkel
Differential Revision: https://reviews.llvm.org/D55769
Anna Thomas [Tue, 8 Jan 2019 17:16:25 +0000 (17:16 +0000)]
[UnrollRuntime] Fix domTree failures in multiexit unrolling
Summary:
This fixes the IDom for exit blocks and all blocks reachable from the exit blocks, when runtime unrolling under multiexit/exiting case.
We initially had a restrictive check that the IDom is only updated when
it is the header of the loop.
However, we also need to update the IDom to the correct one when the
IDom is any block within the original loop. See added test cases (which
fail dom tree verification without the patch).
Yonghong Song [Tue, 8 Jan 2019 16:36:06 +0000 (16:36 +0000)]
[BPF] Fix .BTF.ext reloc type assigment issue
Commit f1db33c5c1a9 ("[BPF] Disable relocation for .BTF.ext section")
assigned relocation type R_BPF_NONE if the fixup type
is FK_Data_4 and the symbol is temporary.
The reason is we use FK_Data_4 as a fixup type
for insn offsets in .BTF.ext section.
Just checking whether the symbol is temporary is not enough.
For example, .debug_info may reference some strings whose
fixup is FK_Data_4 with a temporary symbol as well.
To truely reflect the case for .BTF.ext section,
this patch further checks that the section associateed with the symbol
must be SHF_ALLOC and SHF_EXECINSTR, i.e., in the text section.
This fixed the above-mentioned problem.
Signed-off-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@350637 91177308-0d34-0410-b5e6-96231b3b80d8
Florian Hahn [Tue, 8 Jan 2019 15:16:23 +0000 (15:16 +0000)]
[MachineVerifier] Include offending register in allocatable live-in error msg.
This patch adds a convenience report() method for physical registers and
uses it to print the offending register with the 'MBB has allocatable
live-in' error.
Nico Weber [Tue, 8 Jan 2019 15:16:14 +0000 (15:16 +0000)]
[gn build] Add build files for llvm/lib/Target/PowerPC + tests
The PowerPC target itself is similar to the X86 target in https://reviews.llvm.org/rL348903
The llvm-exegesis unittests bits are similar to the corresponding AArch64 in https://reviews.llvm.org/rL350499
The whole patch is very similar to the WebAssembly target being added in https://reviews.llvm.org/rL350628
Also add a dep from tools/llvm-exegesis/lib to the AArch64 subdir, which I
failed to do in r350499.
The motivation for this target is solely that it has a unit test and I want to
enable the GN<->CMake unittest syncing check for llvm.
Nico Weber [Tue, 8 Jan 2019 15:12:42 +0000 (15:12 +0000)]
[gn build] Add build files for llvm/lib/Target/WebAssembly + tests
The WebAssembly target itself is similar to the X86 target in https://reviews.llvm.org/rL348903
The unittests bits are similar to the corresponding AArch64 in https://reviews.llvm.org/rL350499
The motivation for this target is solely that it has a unit test and I want to
enable the GN<->CMake unittest syncing check for llvm. (After this, only the
PowerPC target is needed and I can turn it on.)
Petr Pavlu [Tue, 8 Jan 2019 14:19:06 +0000 (14:19 +0000)]
[GlobalISel] Fix choice of instruction selector for AArch64 at -O0 with -global-isel=0
Commit rL347861 introduced an unintentional change in the behaviour when
compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly
disabling GlobalISel resulted in using FastISel but an updated condition
in the commit changed it to using SelectionDAG. The patch fixes this
condition and slightly better organizes the code that chooses the
instruction selector.
However, we are falling back to DWARF in this case because we cannot
encode %rax as a saved register.
This requirement is wrong, since we don't care about the contents of
%rax, it is the equivalent of a sub.
In order to specify that we care about the contents of %rax, we would
need a .cfi_offset %rax, <offset>.
It's also overzealous in the case where there are pushes for callee saved
registers followed by a "push rax/eax" instead of "sub", in which case we should
also be able to encode the callee saved regs and everything else using compact
unwind.
We have code to split vector splats (of zero and non-zero) for performance
reasons, but it ignores the fact that a store might be truncating.
Actually, truncating stores are formed for vNi8 and vNi16 types. Since the
truncation is from a legal type, the size of the store is always <= 64-bits and
so they don't actually benefit from being split up anyway, so this patch just
disables that transformation.
Sam Parker [Tue, 8 Jan 2019 10:12:36 +0000 (10:12 +0000)]
[ARM] Add missing patterns for DSP muls
Using a PatLeaf for sext_16_node allowed matching smulbb and smlabb
instructions once the operands had been sign extended. But we also
need to use sext_inreg operands along with sext_16_node to catch a
few more cases that enable use to remove the unnecessary sxth.
Matt Arsenault [Tue, 8 Jan 2019 06:30:53 +0000 (06:30 +0000)]
AMDGPU/GlobalISel: Introduce vcc reg bank
I'm not entirely sure this is the correct thing
to do with the global isel philosophy, but I think
this is necessary to handle how differently SGPRs
are used normally vs. from a condition.
For example, it makes sense to allow a copy
from a VGPR to an SGPR, but it makes no sense
to allow a copy from VGPRs to SGPRs used as
select mask.
This avoids regbankselecting strange code with
a truncate feeding directly into a condition field.
Now a copy is forced from sgpr(s1) to vcc, which is
more sensible to handle.
Some of these issues could probably avoided with making enough
operations resulting in i1 illegal. I think we can't avoid
this register bank for legality.
For example, an i1 and where one source is from a truncate, and
one source is a compare needs some kind of copy inserted to
make sure both are in condition registers.
Thomas Lively [Tue, 8 Jan 2019 06:25:55 +0000 (06:25 +0000)]
[WebAssembly] Massive instruction renaming
Summary:
An automated renaming of all the instructions listed at
https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329
as well as some similarly-named identifiers.
Robert Widmann [Tue, 8 Jan 2019 06:24:19 +0000 (06:24 +0000)]
[LLVM-C] Allow For Creating a BasicBlock without a Parent Function
Summary: Add a utility function for creating a basic block without a parent function. A useful operation for compilers that need to synthesize and conditionally insert code without having to bother with appending and immediately unlinking a block.
Robert Widmann [Tue, 8 Jan 2019 06:23:22 +0000 (06:23 +0000)]
[LLVM-C] Allow Specifying Signedness in Int Cast
Summary: Fix an old outstanding problem with the int cast builder binding always assuming the cast is signed by introducing a new LLVMBuildIntCast2 operation and deprecating the old prototype.
Heejin Ahn [Tue, 8 Jan 2019 01:25:12 +0000 (01:25 +0000)]
[WebAssembly] Move CFG-changing passes before RegStackify
Summary:
FixIrreducibleControlFlow and LateEHPrepare both possibly modify CFG and
create new registers. There seems to be no reason these passes go after
register-related optimization passes (PrepareForLiveIntervals,
OptimizeLiveIntervals, StoreResults, RegStackify, and RegColoring), and
this also possibly create new optimization opportunities. I think we
should put all current and future optimization passes before RegStackify
(and related passes) unless there's a reason not to.
Matt Arsenault [Tue, 8 Jan 2019 01:22:47 +0000 (01:22 +0000)]
RegBankSelect: Fix copy insertion point for terminators
If a copy was needed to handle the condition of brcond, it was being
inserted before the defining instruction. Add tests for iterator edge
cases.
I find the existing code here suspect for the case where it's looking
for terminators that modify the register. It's going to insert a copy
in the middle of the terminators, which isn't allowed (it might be
necessary to have a COPY_terminator if anybody actually needs this).
[dsymutil] Fix assertion triggered by empty address range.
An assertion was hit when running dsymutil on a gcc generated binary
that contained an empty address range. Address ranges are stored in an
interval map of half open intervals. Since the interval is empty and
therefore meaningless, we simply don't add it to the map.
Wei Mi [Tue, 8 Jan 2019 00:26:11 +0000 (00:26 +0000)]
[RegisterCoalescer] dst register's live interval needs to be updated when
merging a src register in ToBeUpdated set.
This is to fix PR40061 related with https://reviews.llvm.org/rL339035.
In https://reviews.llvm.org/rL339035, live interval of source pseudo register
in rematerialized copy may be saved in ToBeUpdated set and its update may be
postponed.
In PR40061, %t2 = %t1 is rematerialized and %t1 is added into toBeUpdated set
to postpone its live interval update. After the rematerialization, the live
interval of %t1 is larger than necessary. Then %t1 is merged into %t3 and %t1
gets removed. After the merge, %t3 contains live interval larger than necessary.
Because %t3 is not in toBeUpdated set, its live interval is not updated after
register coalescing and it will break some assumption in regalloc.
The patch requires the live interval of destination register in a merge to be
updated if the source register is in ToBeUpdated.
The unobufscation support for BCSymbolMaps was the last piece of code
that hasn't been upstreamed yet. This patch contains a reworked version
of the existing code and relevant tests.
Rong Xu [Mon, 7 Jan 2019 23:25:56 +0000 (23:25 +0000)]
[PGO] Use SourceFileName rather module name in PGOFuncName
In LTO or Thin-lto mode (though linker plugin), the module
names are of temp file names which are different for
different compilations. Using SourceFileName avoids the issue.
This should not change any functionality for current PGO as
all the current callers of getPGOFuncName() is before LTO.
Craig Topper [Mon, 7 Jan 2019 20:13:45 +0000 (20:13 +0000)]
[X86][AutoUpgrade] Make some tweaks to reduce the number of nested if/else in the intrinsic upgrade code to avoid an MSVC compiler limit.
MSVC has a nesting limit of around 110-130. An if/else if/else if counts against this next level. The autoupgrade code consists a long chain of these checking matches against strings.
This commit moves some code to a helper function to move out a large if/else chain that was inside of one of the blocks into a separate function. There are more of these we could move or we could change some to lookup tables.
I've also merged together a few similar blocks in the outer chain. This should buy us some margin for a little bit.
Craig Topper [Mon, 7 Jan 2019 19:30:43 +0000 (19:30 +0000)]
[TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24.
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.
Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.
This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.
Alina Sbirlea [Mon, 7 Jan 2019 18:40:27 +0000 (18:40 +0000)]
[MemorySSA] Extend the clobber walker with the option to skip the starting access.
Summary:
The option enables loop transformations to hoist accesses that do not
have clobbers in the loop. If the clobber queries skips the starting
access, the result may be outside the loop instead of the header Phi.
Adding the walker that uses this option in a separate patch.
Nikita Popov [Mon, 7 Jan 2019 18:03:36 +0000 (18:03 +0000)]
[DemandedBits] Use SetVector for Worklist
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions too be recomputed very often. To avoid this, switch
to a SetVector.
[elfabi] Add option to manually specify file read format
Although llvm-elfabi will attempt to read input files without needing the format to be manually specified, doing so has the potential to introduce extraneous errors that can hinder debugging (since multiple readers may fail in attempts to read the file). This change allows the input file format to be manually specified to force elfabi to use a single reader. This makes it easier to test and debug errors specific to a given reader.
Summary:
The -O flag is currently being mostly ignored; it's only checked whether or not the output format is "binary". This adds support for a few formats (e.g. elf64-x86-64), so that when specified, the output can change between 32/64 bit and sizes/alignments are updated accordingly.
David Greene [Mon, 7 Jan 2019 16:24:37 +0000 (16:24 +0000)]
[lit] Respect PYTHONPATH
If a user has PYTHONPATH set in the environment, append new entries to
it rather than blindly setting PYTHONPATH to a fixed string. This
allows tests to, for example, find psutil if it is in
PYTHONPATH. Without this change, lit will detect psutil but then
various tests will fail because PYTHONPATH has been overwritten and
psutil cannot be found.
Rhys Perry [Mon, 7 Jan 2019 15:52:28 +0000 (15:52 +0000)]
AMDGPU: test for uniformity of branch instruction, not its condition
Summary:
If a divergent branch instruction is marked as divergent by propagation
rule 2 in DivergencePropagator::exploreSyncDependency() and its condition
is uniform, that branch would incorrectly be assumed to be uniform.
[CallSite removal] Migrate all Alias Analysis APIs to use the newly
minted `CallBase` class instead of the `CallSite` wrapper.
This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.
Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.
[CallSite removal] Add `CallBase` support to the `InstVisitor` in such
a way that it still supports `CallSite` but users can be ported to rely
on `CallBase` instead.
This will unblock the ports across the analysis and transforms libraries
(and out-of-tree users) and once done we can clean this up by removing
the `CallSite` layer.
Nico Weber [Mon, 7 Jan 2019 01:26:12 +0000 (01:26 +0000)]
[gn build] Add build files for llvm/lib/Target/ARM + tests
The ARM target itself is similar to the X86 target in https://reviews.llvm.org/rL348903
The llvm-exegesis unittests ARM bits are similar to the X86 bits in https://reviews.llvm.org/rL350413
Both are similar to the corresponding AArch64 bits in https://reviews.llvm.org/rL350499 too
After this, everything in my local GN branch is upstreamed to LLVM.
Nico Weber [Mon, 7 Jan 2019 01:23:11 +0000 (01:23 +0000)]
[gn build] Add build files for llvm/lib/Target/AArch64 + tests
The AArch64 target itself is similar to the X86 target in https://reviews.llvm.org/rL348903
The llvm-exegesis AArch64 bits are similar to the X86 bits in http://reviews.llvm.org/rL350184
The llvm-exegesis unittests AArch64 bits are similar to the X86 bits in https://reviews.llvm.org/rL350413
llvm/unittests/Target/AArch64 doesn't have an equivalent since the X86 Target
only has lit tests, no unittests.
Sanjay Patel [Sun, 6 Jan 2019 16:21:42 +0000 (16:21 +0000)]
[x86] explicitly set cost of integer add/sub
There are no test changes here in the existing cost model
regression tests because integer add/sub have a default
legal cost of 1 already. This would break, however, if
we custom lower those ops because the default cost model
assumes that custom-lowered ops are more expensive.
This is similar to the change in rL350403. See discussion
in D56011 for more details. When we enhance that patch to
handle integer ops, we need this cost model change to avoid
unintended diffs here from the custom lowering.
Lama Saba [Sun, 6 Jan 2019 15:45:40 +0000 (15:45 +0000)]
Resubmit rL345008 "Split MachinePipeliner code into header and cpp files"
Resubmitted in rL345290 and reverted in rL350345 due to failures in
http://green.lab.llvm.org/green/job/lldb-cmake/
Resubmitting after a workaround to lldb-cmake failure was
committed in rL350346, more info in https://reviews.llvm.org/D56084
Craig Topper [Sat, 5 Jan 2019 23:30:28 +0000 (23:30 +0000)]
[X86][AsmParser] Don't allow X86::DX in CheckBaseRegAndIndexRegAndScale.
This was here because out and in instructions allow '(%dx)' even though its not a memory reference. To handle this we build a special operand for the DX register reference before we get to the call to CheckBaseRegAndIndexRegAndScale. So we no longer need this special case.
Craig Topper [Sat, 5 Jan 2019 21:40:07 +0000 (21:40 +0000)]
[X86] Allow combinevxi1Bitcast to use pmovmskb on avx512 targets if the input is a truncate from v16i8/v32i8.
This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both.