Jessica Paquette [Mon, 28 Jan 2019 19:22:29 +0000 (19:22 +0000)]
[GlobalISel] Add ISel support for @llvm.lifetime.start and @llvm.lifetime.end
This adds ISel support for lifetime markers in opt levels above O0.
It also updates the arm64-irtranslator test, and updates some AArch64 tests that
use them for added coverage.
It also adds a testcase taken from the X86 codegen tests which verified a bug
caused by lifetime markers + stack colouring in the past. This is intended to
make sure that GISel doesn't re-introduce the bug.
(This is basically a straight copy from what SelectionDAG does in
SelectionDAGBuilder.cpp)
Jessica Paquette [Mon, 28 Jan 2019 18:34:18 +0000 (18:34 +0000)]
[GlobalISel][AArch64] Add instruction selection support for G_FCOS and G_FSIN
This contains all of the legalizer changes from D57197 necessary to select
G_FCOS and G_FSIN. It also updates several existing IR tests in
test/CodeGen/AArch64 that verify that we correctly lower the G_FCOS and G_FSIN
instructions.
Alina Sbirlea [Mon, 28 Jan 2019 17:48:45 +0000 (17:48 +0000)]
[SimpleLoopUnswitch] Early check exit for trivial unswitch with MemorySSA.
Summary:
If MemorySSA is avaiable, we can skip checking all instructions if block has any Defs.
(volatile loads are also Defs).
We still need to check all instructions for "canThrow", even if no Defs are found.
George Rimar [Mon, 28 Jan 2019 16:36:12 +0000 (16:36 +0000)]
[llvm-objdump] - Restore a piece of code removed by mistake in r352366.
Seems when committed the r352366
("[llvm-objdump] - Print LMAs when dumping section headers.")
I resolved merge conflict incorrectly and removed this piece by mistake.
Bots did not catch this yet, seems they are slow today,
but the `X86/adjust-vma.test` test case fails locally for me without that.
Sanjay Patel [Mon, 28 Jan 2019 15:51:34 +0000 (15:51 +0000)]
[x86] allow more shuffle splitting to avoid vpermps (PR40434)
This is tricky to make optimal: sometimes we're better off using
a single wider op, but other times it makes more sense to combine
a narrow ops to achieve the same result.
This solves the case from:
https://bugs.llvm.org/show_bug.cgi?id=40434
There's potentially a similar change for vectors with 64-bit elements,
but it needs adjustments similar to rL352333 to avoid creating infinite
loops.
Ranjeet Singh [Mon, 28 Jan 2019 15:48:07 +0000 (15:48 +0000)]
VERSION_GREATER_EQUAL not supported in llvm cmake.
Patch https://reviews.llvm.org/D56329 caused build failures for me when
building on Windows because of the use of cmake operator
'VERSION_GREATER_EQUAL' which isn't supported in older versions of cmake. The
llvm website states that minimum required version of cmake for building llvm is
3.4.3 https://llvm.org/docs/CMake.html
Remove no longer needed Arm specific LICENSE.TXT file.
As the codebase is now under the Apache 2.0 license with LLVM
Exceptions, and all Arm's contributions, past or future, are under that
new license, this Arm specific LICENSE.TXT is no longer needed, thus
removing it.
Michal Gorny [Mon, 28 Jan 2019 15:16:03 +0000 (15:16 +0000)]
[cmake] Fix get_llvm_lit_path() to respect LLVM_EXTERNAL_LIT always
Refactor the get_llvm_lit_path() logic to respect LLVM_EXTERNAL_LIT,
and require the fallback to be defined explicitly
as LLVM_DEFAULT_EXTERNAL_LIT. This fixes building libcxx standalone
after r346888.
The old logic was using LLVM_EXTERNAL_LIT both as user-defined cache
variable and an optional pre-definition of default value from caller
(e.g. libcxx). It included a hack to make this work by assigning
the value back and forth but it was fragile and stopped working
in libcxx.
The new logic is simpler and more transparent. Default value is
provided in a separate variable, and used only when user-specified
variable is empty (i.e. not overriden).
Jordan Rupprecht [Mon, 28 Jan 2019 15:02:40 +0000 (15:02 +0000)]
[llvm-objcopy] Fix crash when writing empty binary output
Summary: When using llvm-objcopy -O binary and the resulting file will be empty (e.g. removing the only section that would be written, or using --only-keep with a section that doesn't exist/isn't SHF_ALLOC), we crash because FileOutputBuffer expects Size > 0. Add a regression test, and change Buffer to open/truncate the output file in this case.
Instruction abs.[ds] is not generating correct result when working
with NaNs for revisions prior mips32r6 and mips64r6.
To generate a sequence which always produce a correct result, but also
to allow user more control on how his code is compiled, attribute
+abs2008 is added, so user can choose legacy or 2008.
By default legacy mode is used on revisions prior R6. Mips32r6 and
mips64r6 use abs2008 mode by default.
George Rimar [Mon, 28 Jan 2019 14:11:35 +0000 (14:11 +0000)]
[llvm-objdump] - Print LMAs when dumping section headers.
When --section-headers is used, GNU objdump prints both LMA and VMA for sections.
llvm-objdump does not do that what makes it's output be slightly inconsistent.
Patch teaches llvm-objdump to print LMA/VMA for ELF file formats.
The behavior for other formats remains unchanged.
James Y Knight [Mon, 28 Jan 2019 13:25:57 +0000 (13:25 +0000)]
[opaque pointer types] Remove GraphTraits specialization for Type.
The only caller has been deleted in r352076, and I'd like to minimize
the amount of code walking Type hierarchies generically, to make it
easier to identify code depending on pointee types.
Jeremy Morse [Mon, 28 Jan 2019 12:08:31 +0000 (12:08 +0000)]
[DebugInfo][DAG] Avoid re-ordering of DBG_VALUEs
This patch improves the placement of DBG_VALUEs when by SelectionDAG, which
as documented in PR40427 can go very wrong. At the core of this is
ProcessSourceNode, which assumes the last instruction in a BB is the start
of the last processed IR instruction, which isn't always true.
Instead, use a helper function to call InstrEmitter::EmitNode, that records
before-and-after iterators and determines the first of any new instruction
created during emission. This is passed to ProcessSourceNode, which can
then make more elightened decisions about ordering for DBG_VALUE placement.
George Rimar [Mon, 28 Jan 2019 10:44:01 +0000 (10:44 +0000)]
[llvm-objdump] - Implement the --adjust-vma option.
GNU objdump's help says: "--adjust-vma: Add OFFSET to all displayed section addresses"
In real life what it does is a bit more complicated
(and IMO not always reasonable. For example, GNU objdump prints not only VMA, but also LMA
for sections. And with --adjust-vma it adjusts LMA, but only when a section has relocations.
llvm-objsump does not seem to support printing LMAs yet, but GNU's logic anyways does not
make sense for me here).
This patch tries to adjust VMA. I tried to implement a reasonable approach.
I am not adjusting sections that are not allocatable. As, for example, adjusting debug sections
VA's and rel[a] sections VA's should not make sense. This behavior seems to be GNU compatible.
Craig Topper [Mon, 28 Jan 2019 05:42:39 +0000 (05:42 +0000)]
[X86] Add vbmi2 compressstore and expandload tests that aren't fast-isel tests.
These got removed when we autoupgraded to target independent intrinsics, but we didn't have coverage anywhere else. The avx512f/avx512vl versions do have coverage.
Also move some tests back from the upgrade file that aren't really upgraded.
Petr Hosek [Mon, 28 Jan 2019 04:12:54 +0000 (04:12 +0000)]
[CMake] Use __libc_start_main rather than fopen when checking for C library
The check_library_exists CMake uses a custom symbol definition. This
is a problem when checking for C library symbols because Clang
recognizes many of them as builtins, and returns the
-Wbuiltin-requires-header (or -Wincompatible-library-redeclaration)
error. When building with -Werror which is the default, this causes
the check_library_exists check fail making the build think that C
library isn't available.
To avoid this issue, we should use a symbol that isn't recognized by
Clang and wouldn't cause the same issue. __libc_start_main seems like
reasonable choice that fits the bill.
Sanjay Patel [Sun, 27 Jan 2019 21:53:33 +0000 (21:53 +0000)]
[x86] add restriction for lowering to vpermps
This transform was added with rL351346, and we had
an escape for shufps, but we also want one for
unpckps vs. vpermps because vpermps doesn't take
an immediate shuffle index operand.
Sanjay Patel [Sun, 27 Jan 2019 18:12:03 +0000 (18:12 +0000)]
[x86] refactor logic in lowerShuffleWithUndefHalf
Although this is longer code, this is no-functional-change-intended.
The goal is to untangle the conditions under which we bail out, so
that's easier to adjust.
Roman Lebedev [Sun, 27 Jan 2019 15:36:35 +0000 (15:36 +0000)]
[X86][NFC] Replace "<%s" with "< %s" in run-lines.
While i have no intention of actually commiting regeneration
of the check lines in these test files with update_llc_test_checks,
lack of that whitespace breaks that util, which is mildly inconvenient.
Simon Pilgrim [Sun, 27 Jan 2019 13:51:59 +0000 (13:51 +0000)]
[TTI] Add generic SADDSAT/SSUBSAT costs
Add generic costs calculation for SADDSAT/SSUBSAT intrinsics, this uses generic costs for sadd_with_overflow/ssub_with_overflow, an extra sign comparison + a selects based on the sign/overflow.
Amara Emerson [Sun, 27 Jan 2019 10:56:20 +0000 (10:56 +0000)]
[AArch64][GlobalISel] Fix the G_EXTLOAD combiner creating non-extending illegal instructions.
This fixes loads like 's1 = load %p (load 1 from %p)' being combined with an
extend into an illegal 's8 = g_extload %p (load 1 from %p)' which doesn't do any
extension, by avoiding touching those < s8 size loads.
This bug was uncovered by a verifier update r351584, which I reverted it to keep
the bots green.
[ValueTracking] Look through casts when determining non-nullness
Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the
value of their operand and can therefore be looked through when we
determine non-nullness.
Simon Pilgrim [Sat, 26 Jan 2019 20:23:04 +0000 (20:23 +0000)]
[X86] combineCarryThroughADD - add support for X86::COND_A commutations (PR24545)
As discussed on PR24545, we should try to commute X86::COND_A 'icmp ugt' cases to X86::COND_B 'icmp ult' to more optimally bind the carry flag output to a SBB instruction.
Sanjay Patel [Sat, 26 Jan 2019 16:20:22 +0000 (16:20 +0000)]
[x86] add helper for creating a half-width shuffle; NFC
This reduces a bit of duplication between the combining and
lowering places that use it, but the primary motivation is
to make it easier to rearrange the lowering logic and solve
PR40434:
https://bugs.llvm.org/show_bug.cgi?id=40434
Craig Topper [Sat, 26 Jan 2019 02:41:54 +0000 (02:41 +0000)]
[X86] Remove GCCBuiltins from 512-bit cvt(u)qqtops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics. Add new variadic uitofp/sitofp with rounding mode intrinsics.
Summary: See clang patch D56998 for a full description.
Matt Arsenault [Sat, 26 Jan 2019 01:42:13 +0000 (01:42 +0000)]
GlobalISel: Fix address space limit in LLT
The IR enforced limit for the address space is 24-bits, but LLT was
only using 23-bits. Additionally, the argument to the constructor was
truncating to 16-bits.
A similar problem still exists for the number of vector elements. The
IR enforces no limit, so if you try to use a vector with > 65535
elements the IRTranslator asserts in the LLT constructor.
Nemanja Ivanovic [Sat, 26 Jan 2019 01:18:48 +0000 (01:18 +0000)]
[PowerPC] Update Vector Costs for P9
For the power9 CPU, vector operations consume a pair of execution units rather
than one execution unit like a scalar operation. Update the target transform
cost functions to reflect the higher cost of vector operations when targeting
Power9.
Craig Topper [Sat, 26 Jan 2019 01:17:09 +0000 (01:17 +0000)]
[X86] Add DAG combine to merge vzext_movl with the various fp<->int conversion operations that only write the lower 64-bits of an xmm register and zero the rest.
Summary: We have isel patterns for this, but we're missing some load patterns and all broadcast patterns. A DAG combine seems like a better fit for this.
Artem Belevich [Sat, 26 Jan 2019 00:28:32 +0000 (00:28 +0000)]
[NVPTX] Some nvvm.read.ptx.sreg intrinsics should have IntrInaccessibleMemOnly attribute.
These intrinsics may return different values every time they are called
and should not be CSE'd. IntrInaccessibleMemOnly appears to be the right
attribute to model this behavior.
Craig Topper [Sat, 26 Jan 2019 00:26:37 +0000 (00:26 +0000)]
[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer
Summary:
I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits.
This patch changes this and then modifies the X86 post legalization combine to emit a extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization.
On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all.
Hans Wennborg [Fri, 25 Jan 2019 22:45:17 +0000 (22:45 +0000)]
Build LLVM-C.dll by default on windows and enable in release package
With the fixes to the building of LLVM-C.dll in D56781 this should now
be safe to land. This will greatly simplify dealing with LLVM for people
that just want to use the C API on windows. This is a follow up from
D35077.
As shown in double-bitmanip-dagcombines.ll, this can be advantageous. But
RV32FD doesn't use bitcast directly (as i64 isn't a legal type), and instead
uses RISCVISD::SplitF64. This patch adds an equivalent DAG combine for
SplitF64.
Mircea Trofin [Fri, 25 Jan 2019 21:49:54 +0000 (21:49 +0000)]
[llvm] Opt-in flag for X86DiscriminateMemOps
Summary:
Currently, if an instruction with a memory operand has no debug information,
X86DiscriminateMemOps will generate one based on the first line of the
enclosing function, or the last seen debug info.
This may cause confusion in certain debugging scenarios. The long term
approach would be to use the line number '0' in such cases, however, that
brings in challenges: the base discriminator value range is limited
(4096 values).
For the short term, adding an opt-in flag for this feature.
See bug 40319 (https://bugs.llvm.org/show_bug.cgi?id=40319)