Philip Reames [Tue, 29 Jan 2019 06:34:46 +0000 (06:34 +0000)]
[Tests] Regen to remove future test diffs
This file appears to have been manually editted at some point after being auto-updated. A future change adjusts this file slightly, and all of the updates makes the diff super confusing.
Max Kazantsev [Tue, 29 Jan 2019 05:37:59 +0000 (05:37 +0000)]
[SCEV] Take correct loop in AddRec simplification. PR40420
The code of AddRec simplification is using wrong loop when it creates a new
AddRecExpr. It should be using AddRecLoop which we have saved and against which
all gate checks are made, and not calling AddRec->getLoop() over and over
again because AddRec may change and become an AddRecurrency from outer loop
during the transform iterations.
Considering this change trivial, commiting for postcommit review.
Teresa Johnson [Tue, 29 Jan 2019 02:04:01 +0000 (02:04 +0000)]
Try to make new test more resilient to different orderings
New test added in r352441 getting a bot failure which I believe is
due to different ordering in the dumping which isn't being handled
well. Try to make test more resilient to ordering differences.
Sam Clegg [Tue, 29 Jan 2019 00:30:46 +0000 (00:30 +0000)]
[WebAssembly] Handle more types of uses in WebAssemblyAddMissingPrototypes
Previously we were only handling bitcast operations, however
prototypeless functions can also appear in other places such as
comparisons and as function params.
Switch to using replaceAllUsesWith() to replace the prototype-less
function uses. This new approach results in some redundant bitcasting
but is much simpler and handles all cases.
Reid Kleckner [Tue, 29 Jan 2019 00:30:35 +0000 (00:30 +0000)]
[PPC] Include tablegenerated PPCGenCallingConv.inc once
Move the CC analysis implementation to its own .cpp file instead of
duplicating it and artificually using functions in PPCISelLowering.cpp
and PPCFastISel.cpp. Follow-up to the same change done for X86, ARM, and
AArch64.
Teresa Johnson [Mon, 28 Jan 2019 23:43:26 +0000 (23:43 +0000)]
[ThinLTO] Add option to dump per-module summary dot graph
Summary:
I found that there currently isn't a way to invoke exportToDot from
the command line for a per-module summary index, and therefore no
testing of that case. Add an internal option and use it to test dumping
of per module summary indexes.
In particular, I am looking at fixing the limitation that causes the
aliasee GUID in the per-module summary to be 0, and want to be able to
test that change.
Philip Reames [Mon, 28 Jan 2019 23:24:49 +0000 (23:24 +0000)]
Demanded elements support for vector GEPs
GEPs can produce either scalar or vector results. If we're extracting only a subset of the vector lanes, simplifying the operands is helpful in eliminating redundant computation, and (eventually) allowing further optimizations
Teresa Johnson [Mon, 28 Jan 2019 22:27:05 +0000 (22:27 +0000)]
[ThinLTO] Refine reachability check to fix compile time increase
Summary:
A recent fix to the ThinLTO whole program dead code elimination (D56117)
increased the thin link time on a large MSAN'ed binary by 2x.
It's likely that the time increased elsewhere, but was more noticeable
here since it was already large and ended up timing out.
That change made it so we would repeatedly scan all copies of linkonce
symbols for liveness every time they were encountered during the graph
traversal. This was needed since we only mark one copy of an aliasee as
live when we encounter a live alias. This patch fixes the issue in a
more efficient manner by simply proactively visiting the aliasee (thus
marking all copies live) when we encounter a live alias.
Two notes: One, this requires a hash table lookup (finding the aliasee
summary in the index based on aliasee GUID). However, the impact of this
seems to be small compared to the original pre-D56117 thin link time. It
could be addressed if we keep the aliasee ValueInfo in the alias summary
instead of the aliasee GUID, which I am exploring in a separate patch.
Second, we only populate the aliasee GUID field when reading summaries
from bitcode (whether we are reading individual summaries and merging on
the fly to form the compiled index, or reading in a serialized combined
index). Thankfully, that's currently the only way we can get to this
code as we don't yet support reading summaries from LLVM assembly
directly into a tool that performs the thin link (they must be converted
to bitcode first). I added a FIXME, however I have the fix under test
already. The easiest fix is to simply populate this field always, which
isn't hard, but more likely the change I am exploring to store the
ValueInfo instead as described above will subsume this. I don't want to
hold up the regression fix for this though.
Craig Topper [Mon, 28 Jan 2019 21:38:47 +0000 (21:38 +0000)]
Recommit r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer"
This did not cause the buildbot failure it was previously reverted for.
Original commit message:
I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.
This patch changes this and then modifies the X86 post legalization combine to emit a extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inre
On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.
Yonghong Song [Mon, 28 Jan 2019 21:35:23 +0000 (21:35 +0000)]
[RuntimeDyld] load all sections with ProcessAllSections
This patch tried to address the following use case.
. bcc (https://github.com/iovisor/bcc) utilizes llvm JIT to
compile for BTF target.
. with -g, .BTF and .BTF.ext sections (BPF debug info)
will be generated by LLVM.
. .BTF does not have relocations and .BTF.ext has some
relocations.
. With ProcessAllSections, .BTF.ext is loaded by JIT dynamic linker
and is available to application. But .BTF is not loaded.
The bcc application needs both .BTF.ext and .BTF for debugging
purpose, and .BTF is not loaded. This patch addressed this issue
by iterating over all sections and loading any missing
sections, after symbol/relocation processing in loadObjectImpl().
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D55943
Reid Kleckner [Mon, 28 Jan 2019 21:28:40 +0000 (21:28 +0000)]
[AArch64] Include AArch64GenCallingConv.inc once
Summary:
Avoids duplicating generated static helpers for calling convention
analysis.
This also means you can modify AArch64CallingConv.td without recompiling
the AArch64ISelLowering.cpp monolith, so it provides faster incremental
rebuilds.
Saves 12K in llc.exe, but adds a new object file, which is large.
Matt Arsenault [Mon, 28 Jan 2019 20:14:49 +0000 (20:14 +0000)]
AMDGPU: Add DS append/consume intrinsics
Since these pass the pointer in m0 unlike other DS instructions, these
need to worry about whether the address is uniform or not. This
assumes the address is dynamically uniform, and just uses
readfirstlane to get a copy into an SGPR.
I don't know if these have the same 16-bit add for the addressing mode
offset problem on SI or not, but I've just assumed they do.
Also includes some misc. changes to avoid test differences between the
LDS and GDS versions.
Jessica Paquette [Mon, 28 Jan 2019 19:53:14 +0000 (19:53 +0000)]
[GlobalISel][AArch64] Add instruction selection support for @llvm.log10
This adds instruction selection support for @llvm.log10 in AArch64. It teaches
GISel to lower it to a library call, updates the relevant tests, and adds a
legalizer test for log10.
Nico Weber [Mon, 28 Jan 2019 19:32:52 +0000 (19:32 +0000)]
gn build: Fix `lld-link: unknown flag: -fuse-ld=lld` warnings on Windows
Fixes a minor regression from r351248.
While here, also make it possible to opt out of lld by saying
use_lld=false when clang_base_path is set. (use_lld still defaults to
true if clang_base_path is set.)
Jessica Paquette [Mon, 28 Jan 2019 19:22:29 +0000 (19:22 +0000)]
[GlobalISel] Add ISel support for @llvm.lifetime.start and @llvm.lifetime.end
This adds ISel support for lifetime markers in opt levels above O0.
It also updates the arm64-irtranslator test, and updates some AArch64 tests that
use them for added coverage.
It also adds a testcase taken from the X86 codegen tests which verified a bug
caused by lifetime markers + stack colouring in the past. This is intended to
make sure that GISel doesn't re-introduce the bug.
(This is basically a straight copy from what SelectionDAG does in
SelectionDAGBuilder.cpp)
Jessica Paquette [Mon, 28 Jan 2019 18:34:18 +0000 (18:34 +0000)]
[GlobalISel][AArch64] Add instruction selection support for G_FCOS and G_FSIN
This contains all of the legalizer changes from D57197 necessary to select
G_FCOS and G_FSIN. It also updates several existing IR tests in
test/CodeGen/AArch64 that verify that we correctly lower the G_FCOS and G_FSIN
instructions.
Alina Sbirlea [Mon, 28 Jan 2019 17:48:45 +0000 (17:48 +0000)]
[SimpleLoopUnswitch] Early check exit for trivial unswitch with MemorySSA.
Summary:
If MemorySSA is avaiable, we can skip checking all instructions if block has any Defs.
(volatile loads are also Defs).
We still need to check all instructions for "canThrow", even if no Defs are found.
George Rimar [Mon, 28 Jan 2019 16:36:12 +0000 (16:36 +0000)]
[llvm-objdump] - Restore a piece of code removed by mistake in r352366.
Seems when committed the r352366
("[llvm-objdump] - Print LMAs when dumping section headers.")
I resolved merge conflict incorrectly and removed this piece by mistake.
Bots did not catch this yet, seems they are slow today,
but the `X86/adjust-vma.test` test case fails locally for me without that.
Sanjay Patel [Mon, 28 Jan 2019 15:51:34 +0000 (15:51 +0000)]
[x86] allow more shuffle splitting to avoid vpermps (PR40434)
This is tricky to make optimal: sometimes we're better off using
a single wider op, but other times it makes more sense to combine
a narrow ops to achieve the same result.
This solves the case from:
https://bugs.llvm.org/show_bug.cgi?id=40434
There's potentially a similar change for vectors with 64-bit elements,
but it needs adjustments similar to rL352333 to avoid creating infinite
loops.
Ranjeet Singh [Mon, 28 Jan 2019 15:48:07 +0000 (15:48 +0000)]
VERSION_GREATER_EQUAL not supported in llvm cmake.
Patch https://reviews.llvm.org/D56329 caused build failures for me when
building on Windows because of the use of cmake operator
'VERSION_GREATER_EQUAL' which isn't supported in older versions of cmake. The
llvm website states that minimum required version of cmake for building llvm is
3.4.3 https://llvm.org/docs/CMake.html
Remove no longer needed Arm specific LICENSE.TXT file.
As the codebase is now under the Apache 2.0 license with LLVM
Exceptions, and all Arm's contributions, past or future, are under that
new license, this Arm specific LICENSE.TXT is no longer needed, thus
removing it.
Michal Gorny [Mon, 28 Jan 2019 15:16:03 +0000 (15:16 +0000)]
[cmake] Fix get_llvm_lit_path() to respect LLVM_EXTERNAL_LIT always
Refactor the get_llvm_lit_path() logic to respect LLVM_EXTERNAL_LIT,
and require the fallback to be defined explicitly
as LLVM_DEFAULT_EXTERNAL_LIT. This fixes building libcxx standalone
after r346888.
The old logic was using LLVM_EXTERNAL_LIT both as user-defined cache
variable and an optional pre-definition of default value from caller
(e.g. libcxx). It included a hack to make this work by assigning
the value back and forth but it was fragile and stopped working
in libcxx.
The new logic is simpler and more transparent. Default value is
provided in a separate variable, and used only when user-specified
variable is empty (i.e. not overriden).
Jordan Rupprecht [Mon, 28 Jan 2019 15:02:40 +0000 (15:02 +0000)]
[llvm-objcopy] Fix crash when writing empty binary output
Summary: When using llvm-objcopy -O binary and the resulting file will be empty (e.g. removing the only section that would be written, or using --only-keep with a section that doesn't exist/isn't SHF_ALLOC), we crash because FileOutputBuffer expects Size > 0. Add a regression test, and change Buffer to open/truncate the output file in this case.
Instruction abs.[ds] is not generating correct result when working
with NaNs for revisions prior mips32r6 and mips64r6.
To generate a sequence which always produce a correct result, but also
to allow user more control on how his code is compiled, attribute
+abs2008 is added, so user can choose legacy or 2008.
By default legacy mode is used on revisions prior R6. Mips32r6 and
mips64r6 use abs2008 mode by default.
George Rimar [Mon, 28 Jan 2019 14:11:35 +0000 (14:11 +0000)]
[llvm-objdump] - Print LMAs when dumping section headers.
When --section-headers is used, GNU objdump prints both LMA and VMA for sections.
llvm-objdump does not do that what makes it's output be slightly inconsistent.
Patch teaches llvm-objdump to print LMA/VMA for ELF file formats.
The behavior for other formats remains unchanged.
James Y Knight [Mon, 28 Jan 2019 13:25:57 +0000 (13:25 +0000)]
[opaque pointer types] Remove GraphTraits specialization for Type.
The only caller has been deleted in r352076, and I'd like to minimize
the amount of code walking Type hierarchies generically, to make it
easier to identify code depending on pointee types.
Jeremy Morse [Mon, 28 Jan 2019 12:08:31 +0000 (12:08 +0000)]
[DebugInfo][DAG] Avoid re-ordering of DBG_VALUEs
This patch improves the placement of DBG_VALUEs when by SelectionDAG, which
as documented in PR40427 can go very wrong. At the core of this is
ProcessSourceNode, which assumes the last instruction in a BB is the start
of the last processed IR instruction, which isn't always true.
Instead, use a helper function to call InstrEmitter::EmitNode, that records
before-and-after iterators and determines the first of any new instruction
created during emission. This is passed to ProcessSourceNode, which can
then make more elightened decisions about ordering for DBG_VALUE placement.
George Rimar [Mon, 28 Jan 2019 10:44:01 +0000 (10:44 +0000)]
[llvm-objdump] - Implement the --adjust-vma option.
GNU objdump's help says: "--adjust-vma: Add OFFSET to all displayed section addresses"
In real life what it does is a bit more complicated
(and IMO not always reasonable. For example, GNU objdump prints not only VMA, but also LMA
for sections. And with --adjust-vma it adjusts LMA, but only when a section has relocations.
llvm-objsump does not seem to support printing LMAs yet, but GNU's logic anyways does not
make sense for me here).
This patch tries to adjust VMA. I tried to implement a reasonable approach.
I am not adjusting sections that are not allocatable. As, for example, adjusting debug sections
VA's and rel[a] sections VA's should not make sense. This behavior seems to be GNU compatible.
Craig Topper [Mon, 28 Jan 2019 05:42:39 +0000 (05:42 +0000)]
[X86] Add vbmi2 compressstore and expandload tests that aren't fast-isel tests.
These got removed when we autoupgraded to target independent intrinsics, but we didn't have coverage anywhere else. The avx512f/avx512vl versions do have coverage.
Also move some tests back from the upgrade file that aren't really upgraded.
Petr Hosek [Mon, 28 Jan 2019 04:12:54 +0000 (04:12 +0000)]
[CMake] Use __libc_start_main rather than fopen when checking for C library
The check_library_exists CMake uses a custom symbol definition. This
is a problem when checking for C library symbols because Clang
recognizes many of them as builtins, and returns the
-Wbuiltin-requires-header (or -Wincompatible-library-redeclaration)
error. When building with -Werror which is the default, this causes
the check_library_exists check fail making the build think that C
library isn't available.
To avoid this issue, we should use a symbol that isn't recognized by
Clang and wouldn't cause the same issue. __libc_start_main seems like
reasonable choice that fits the bill.
Sanjay Patel [Sun, 27 Jan 2019 21:53:33 +0000 (21:53 +0000)]
[x86] add restriction for lowering to vpermps
This transform was added with rL351346, and we had
an escape for shufps, but we also want one for
unpckps vs. vpermps because vpermps doesn't take
an immediate shuffle index operand.
Sanjay Patel [Sun, 27 Jan 2019 18:12:03 +0000 (18:12 +0000)]
[x86] refactor logic in lowerShuffleWithUndefHalf
Although this is longer code, this is no-functional-change-intended.
The goal is to untangle the conditions under which we bail out, so
that's easier to adjust.
Roman Lebedev [Sun, 27 Jan 2019 15:36:35 +0000 (15:36 +0000)]
[X86][NFC] Replace "<%s" with "< %s" in run-lines.
While i have no intention of actually commiting regeneration
of the check lines in these test files with update_llc_test_checks,
lack of that whitespace breaks that util, which is mildly inconvenient.
Simon Pilgrim [Sun, 27 Jan 2019 13:51:59 +0000 (13:51 +0000)]
[TTI] Add generic SADDSAT/SSUBSAT costs
Add generic costs calculation for SADDSAT/SSUBSAT intrinsics, this uses generic costs for sadd_with_overflow/ssub_with_overflow, an extra sign comparison + a selects based on the sign/overflow.
Amara Emerson [Sun, 27 Jan 2019 10:56:20 +0000 (10:56 +0000)]
[AArch64][GlobalISel] Fix the G_EXTLOAD combiner creating non-extending illegal instructions.
This fixes loads like 's1 = load %p (load 1 from %p)' being combined with an
extend into an illegal 's8 = g_extload %p (load 1 from %p)' which doesn't do any
extension, by avoiding touching those < s8 size loads.
This bug was uncovered by a verifier update r351584, which I reverted it to keep
the bots green.
[ValueTracking] Look through casts when determining non-nullness
Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the
value of their operand and can therefore be looked through when we
determine non-nullness.