Matt Arsenault [Fri, 25 Aug 2017 01:26:13 +0000 (01:26 +0000)]
DAG: Fix naming crime
Because isOperationCustom was only checking for custom
lowering on illegal types, this was behaving inconsistently
with the other isOperation* functions, so that
isOperationLegalOrCustom != (isOperationLegal || isOperationCustom)
Luckily this is only used in one place which already checks the
type legality on its own.
[unittests] Remove reverse iteration tests which use pointer-like keys
Summary: The expected order of pointer-like keys is hash-function-dependent which in turn depends on the platform/environment. Need to come up with a better way to test reverse iteration of containers with pointer-like keys.
Chandler Carruth [Fri, 25 Aug 2017 00:56:05 +0000 (00:56 +0000)]
[x86] Back out one aspect of r311318: don't generically set
FeatureSlowUAMem32.
The idea was to mark things that are slow on widely available processors
as slow in the generic CPU so that the code generated for that CPU would
be fast across those processors. However, for this feature that doesn't
work out very well at all.
The problem here is that you can very easily enable AVX or AVX2 on top
of this generic CPU. For example, this can happen just by using AVX2
intrinsics from Clang within a region of code guarded by a dynamic CPU
feature test. When you do that, the generated code with SlowUAMem32 set
is ... amazingly slower. The problem is that there really aren't very
good alternatives to the unaligned loads, and so our vector codegen
regresses significantly.
The other issue is that there are plenty of AMD CPUs with AVX1 that
don't set FeatureSlowUAMem32 and so we shouldn't just check for AVX2
instead of this special feature. =/
It would be nice to have the target attriute logic be able to
enable/disable more than just one feature at a time and control this in
a more fine grained and useful way, but that doesn't seem easy. Given
that it is only Sandybridge and Ivybridge that set this feature, for now
I'm just backing it out of the generic CPU. That has the additional
advantage of going back to the previous state that people seemed vaguely
happy with.
Stephen Hines [Fri, 25 Aug 2017 00:48:21 +0000 (00:48 +0000)]
Fix two (three) more issues with unchecked Error.
Summary:
If assertions are disabled, but LLVM_ABI_BREAKING_CHANGES is enabled,
this will cause an issue with an unchecked Success. Switching to
consumeError() is the correct way to bypass the check. This patch also
includes disabling 2 tests that can't work without assertions enabled,
since llvm_unreachable() with NDEBUG won't crash.
Chandler Carruth [Fri, 25 Aug 2017 00:34:07 +0000 (00:34 +0000)]
[x86] Fix an amazing goof in the handling of sub, or, and xor lowering.
The comment for this code indicated that it should work similar to our
handling of add lowering above: if we see uses of an instruction other
than flag usage and store usage, it tries to avoid the specialized
X86ISD::* nodes that are designed for flag+op modeling and emits an
explicit test.
Problem is, only the add case actually did this. In all the other cases,
the logic was incomplete and inverted. Any time the value was used by
a store, we bailed on the specialized X86ISD node. All of this appears
to have been historical where we had different logic here. =/
Turns out, we have quite a few patterns designed around these nodes. We
should actually form them. I fixed the code to match what we do for add,
and it has quite a positive effect just within some of our test cases.
The only thing close to a regression I see is using:
notl %r
testl %r, %r
instead of:
xorl -1, %r
But we can add a pattern or something to fold that back out. The
improvements seem more than worth this.
I've also worked with Craig to update the comments to no longer be
actively contradicted by the code. =[ Some of this still remains
a mystery to both Craig and myself, but this seems like a large step in
the direction of consistency and slightly more accurate comments.
Many thanks to Craig for help figuring out this nasty stuff.
Sanjay Patel [Thu, 24 Aug 2017 23:24:43 +0000 (23:24 +0000)]
[DAG] convert vector select-of-constants to logic/math
This goes back to a discussion about IR canonicalization. We'd like to preserve and convert
more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming in
to the backend and transform it to better DAG ops.
Steps in this patch:
1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely
enable this transform. Other targets will probably want that anyway to distinguish scalars
from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.
2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the
vector ext is often free.
Implementing a more general fold using xor+and can be a follow-up for targets that don't have
a legal vselect. It's also possible that we can remove the TLI hook for the special case fold
implemented here because we're eliminating a constant, but it needs to be tested on other
targets.
Sanjay Patel [Thu, 24 Aug 2017 22:54:01 +0000 (22:54 +0000)]
[InstCombine] fix and enhance udiv/urem narrowing
There are 3 small independent changes here:
1. Account for multiple uses in the pattern matching: avoid the transform if it increases the instruction count.
2. Add a missing fold for the case where the numerator is the constant: http://rise4fun.com/Alive/E2p
3. Enable all folds for vector types.
There's still one more potential change - use "shouldChangeType()" to keep from transforming to an illegal integer type.
Jacob Gravelle [Thu, 24 Aug 2017 19:53:44 +0000 (19:53 +0000)]
[WebAssembly] FastISel : Bail to SelectionDAG for constexpr calls
Summary: Currently FastISel lowers constexpr calls as indirect calls.
We'd like those to direct calls, and falling back to SelectionDAGISel
handles that.
Heejin Ahn [Thu, 24 Aug 2017 19:43:09 +0000 (19:43 +0000)]
[WebAssembly] Update GCC test suite failure expectations
Summary:
Update GCC test suite failure expectations as we add -O0 to the bare tests in
WebAssembly waterfall. There are still several untriaged lld failures.
[Hexagon] Generate correct runtime check when recognizing memmove
The check (assuming positive stride) for validity of memmove should be
(a) the destination is at a lower address than the source, or
(b) the distance between the source and destination is greater than or
equal the number of bytes copied.
For the second part it is sufficient to assume that the destination
is at a higher address, since the opposite case is covered by (a).
The distance calculation was previously done by subtracting the
pointers in the wrong order.
[ARM, Thumb1] Prevent ARMTargetLowering::isLegalAddressingMode from accepting illegal modes
ARMTargetLowering::isLegalAddressingMode can accept illegal addressing modes
for the Thumb1 target. This causes generation of redundant code and affects
performance.
This fixes PR34106: https://bugs.llvm.org/show_bug.cgi?id=34106
Polly uses since several months a performance model for BLAS computations that
derives optimal cache and register tile sizes from cache and latency
information (based on ideas from "Analytical Modeling Is Enough for High-Performance BLIS", by Tze Meng Low published at TOMS 2016).
While bootstrapping this model, these target values have been kept in Polly.
However, as our implementation is now rather mature, it seems time to teach
LLVM itself about cache sizes.
Interestingly, L1 and L2 cache sizes are pretty constant across
micro-architectures, hence a set of architecture specific default values
seems like a good start. They can be expanded to more target specific values,
in case certain newer architectures require different values. For now a set
of Intel architectures are provided.
Just as a little teaser, for a simple gemm kernel this model allows us to
improve performance from 1.2s to 0.27s. For gemm kernels with less optimal
memory layouts even larger speedups can be reported.
Sjoerd Meijer [Thu, 24 Aug 2017 09:21:10 +0000 (09:21 +0000)]
[AArch64] Custom lowering of copysign f16
This is a follow up patch of r311154 and introduces custom lowering of copysign
f16 to avoid promotions to single precision types when the subtarget supports
fullfp16.
Daniel Sanders [Thu, 24 Aug 2017 09:11:20 +0000 (09:11 +0000)]
Re-commit: [globalisel][tablegen] Add support for ImmLeaf without SDNodeXForm
Summary:
This patch adds support for predicates on imm nodes but only for ImmLeaf and not
for PatLeaf or PatFrag and only where the value does not need to be transformed
before being rendered into the instruction.
The limitation on PatLeaf/PatFrag/SDNodeXForm is due to differences in the
necessary target-supplied C++ for GlobalISel.
Depends on D36085
The previous commit was reverted for breaking the build but this appears to have
been the recurring problem on the Windows bots with tablegen not being re-run
when llvm-tblgen is changed but the .td's aren't. If it re-occurs then forcing a
build with clean=True should fix it but this string should do this in advance:
Requires a clean build.
Coby Tayree [Thu, 24 Aug 2017 09:08:33 +0000 (09:08 +0000)]
[LLVM][x86][Inline Asm] support for GCC style inline asm - Y<x> constraints
This patch is intended to enable the use of basic double letter constraints used in GCC extended inline asm {Yi Y2 Yz Y0 Ym Yt}.
Supersedes D35204
Clang counterpart: D36371
Mikael Holmen [Thu, 24 Aug 2017 09:05:00 +0000 (09:05 +0000)]
[Reassociate] Do not drop debug location if replacement is missing
Summary:
When reassociating an expression, do not drop the instruction's
original debug location in case the replacement location is
missing.
The debug location must at least not be dropped for inlinable
callsites of debug-info-bearing functions in debug-info-bearing
functions. Failing to do so would result in an "inlinable function "
"call in a function with debug info must have a !dbg location"
error in the verifier.
As preserving the original debug location is not expected
to result in overly jumpy debug line information, it is
preserved for all other cases too.
This fixes PR34231:
https://bugs.llvm.org/show_bug.cgi?id=34231
Coby Tayree [Thu, 24 Aug 2017 08:46:25 +0000 (08:46 +0000)]
[X86AsmParser] Refactoring, (almost) NFC.
Some refactoring to X86AsmParser, mostly regarding the way rewrites are conducted.
Mainly, we try to concentrate all the rewrite effort under one hood, so it'll hopefully be less of a mess and easier to maintain and understand.
naturally, some frontend tests were affected: D36794
Chandler Carruth [Thu, 24 Aug 2017 07:38:36 +0000 (07:38 +0000)]
[x86] NFC: Clean up two tests and generate precise checks for them.
Mostly this involved giving unnamed values names and running the IR
through `opt` to re-format it but merging in any important comments in
the original. I then deleted pointless comments and inlined the function
attributes for ease of reading and editting.
All of this is to make it much easier to see the instructions being
generated here and evaluate any updates to the tests.
Lang Hames [Thu, 24 Aug 2017 05:35:27 +0000 (05:35 +0000)]
[Support] Rewrite handleAllErrors in terms of cantFail.
This just switches handleAllErrors from using custom assertions that all errors
have been handled to using cantFail. This change involves moving some of the
class and function definitions around though.
Hans Wennborg [Thu, 24 Aug 2017 01:08:27 +0000 (01:08 +0000)]
[DAG] Fix Node Replacement in PromoteIntBinOp
When one operand is a user of another in a promoted binary operation
we may replace and delete the returned value before returning
triggering an assertion. Reorder node replacements to prevent this.
Tim Northover [Wed, 23 Aug 2017 22:07:10 +0000 (22:07 +0000)]
ARM: use internal relocations for local symbols after all.
Switching to external relocations for ARM-mode branches (to allow Thumb
interworking when the offset is unencodable) causes calls to temporary symbols
to be miscompiled and instead go to the parent externally visible symbol.
Calling a temporary never happens in compiled code, but can occasionally in
hand-written assembly.
Rong Xu [Wed, 23 Aug 2017 21:36:02 +0000 (21:36 +0000)]
[PGO] Set edge weights for indirectbr instruction with profile counts
Current PGO only annotates the edge weight for branch and switch instructions
with profile counts. We should also annotate the indirectbr instruction as
all the information is there. This patch enables the annotating for indirectbr
instructions. Also uses this annotation in branch probability analysis.
G_PHI has the same semantics as PHI but also has types.
This lets us verify that the types in the G_PHI are consistent.
This also allows specifying legalization actions for G_PHIs.
Lei Huang [Wed, 23 Aug 2017 19:25:04 +0000 (19:25 +0000)]
Update branch coalescing to be a PowerPC specific pass
Implementing this pass as a PowerPC specific pass. Branch coalescing utilizes
the analyzeBranch method which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.
Hans Wennborg [Wed, 23 Aug 2017 15:43:28 +0000 (15:43 +0000)]
LowerAtomic: Don't skip optnone functions; atomic still need lowering (PR34020)
The lowering isn't really an optimization, so optnone shouldn't make a
difference. ARM relies on the pass running when using "-mthread-model
single", because in that mode, it doesn't run AtomicExpand. See bug for
more details.
Victor Leschuk [Wed, 23 Aug 2017 14:59:09 +0000 (14:59 +0000)]
Make lit :: shtest-format.py supported on Windows again
It was marked as unsupported on Windows in r311230 because on some Win10
machines it failed or caused hang. The problem was that on these machines
system bash (C:\Windows\System32\bash.exe) was used which requires paths to be
passed like '/mnt/c/path/to/my/script' instead of 'C:\path\to\my\script'.
TODO: we should make lit detect if system bash is used instead of msys and set
appropriate path format.
Gor Nishanov [Wed, 23 Aug 2017 14:47:52 +0000 (14:47 +0000)]
[coroutines] CoroBegin from inner coroutines should be considered for spills
Summary:
If a coroutine outer calls another coroutine inner and the inner coroutine body is inlined into the outer, coro.begin from the inner coroutine should be considered for spilling if accessed across suspends.
Prior to this change, coroutine frame building code was not considering any coro.begins for spilling.
With this change, we only ignore coro.begin for the current coroutine, but, any coro.begins that were inlined into the current coroutine are eligible for spills.
Yuka Takahashi [Wed, 23 Aug 2017 13:39:47 +0000 (13:39 +0000)]
[Bash-autocompletion] Add support for static analyzer flags
Summary:
This is a patch for clang autocomplete feature.
It will collect values which -analyzer-checker takes, which is defined in
clang/StaticAnalyzer/Checkers/Checkers.inc, dynamically.
First, from ValuesCode class in Options.td, TableGen will generate C++
code in Options.inc. Options.inc will be included in DriverOptions.cpp, and
calls OptTable's addValues function. addValues function will add second
argument to Option's Values class. Values contains string like "foo,bar,.."
which is handed to Values class
in OptTable.
Daniel Sanders [Wed, 23 Aug 2017 12:14:18 +0000 (12:14 +0000)]
[globalisel][tablegen] Add support for ImmLeaf without SDNodeXForm
Summary:
This patch adds support for predicates on imm nodes but only for ImmLeaf and not for PatLeaf or PatFrag and only where the value does not need to be transformed before being rendered into the instruction.
The limitation on PatLeaf/PatFrag/SDNodeXForm is due to differences in the necessary target-supplied C++ for GlobalISel.
Florian Hahn [Wed, 23 Aug 2017 11:53:24 +0000 (11:53 +0000)]
[ARM] Check for assembler instructions in test.
Currently this test causes test failures on some machines, due to isel not being registered. Update the test to run all passes and check emitted assembly instructions for now.
Florian Hahn [Wed, 23 Aug 2017 10:20:59 +0000 (10:20 +0000)]
[ARM] Add missing patterns for insert_subvector.
Summary: In some cases, shufflevector instruction can be transformed involving insert_subvector instructions. The ARM backend was missing some insert_subvector patterns, causing a failure during instruction selection. AArch64 has similar patterns.
Davide Italiano [Wed, 23 Aug 2017 09:14:37 +0000 (09:14 +0000)]
[InstCombine] Fold branches with irrelevant conditions to a constant.
InstCombine folds instructions with irrelevant conditions to undef.
This, as Nuno confirmed is a bug.
(see https://bugs.llvm.org/show_bug.cgi?id=33409#c1 )
Given the original motivation for the change is that of removing an
USE, we now fold to false instead (which reaches the same goal
without undesired side effects).
Hiroshi Inoue [Wed, 23 Aug 2017 08:55:18 +0000 (08:55 +0000)]
[PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
- recommitting after fixing a test failure on MacOS
On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But the current LLVM generates three or four instructions for this purpose (and also it clobbers one GPR).
This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.
e.g. (x | 0xFFFFFFFF) should be
ori 3, 3, 65535
oris 3, 3, 65535
but LLVM generates without this patch
li 4, 0
oris 4, 4, 65535
ori 4, 4, 65535
or 3, 3, 4
Sjoerd Meijer [Wed, 23 Aug 2017 08:18:37 +0000 (08:18 +0000)]
[AArch64] ISel legalization debug messages. NFCI.
Debugging AArch64 instruction legalization and custom lowering is really an
unpleasant experience because it shows nodes that appear out of thin air.
In commit r311444, some debug messages have been added to SelectionDAG, the
target independent part, and this patch adds some AArch64 specific messages.
Craig Topper [Wed, 23 Aug 2017 05:46:07 +0000 (05:46 +0000)]
[InstCombine] Remove an unnecessary dyn_cast to Instruction and a switch over two opcodes. Just dyn_cast to the specific instruction classes individually. NFC
Change the helper methods to take the more specific class as well.
Hiroshi Inoue [Wed, 23 Aug 2017 05:15:15 +0000 (05:15 +0000)]
[PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But the current LLVM generates three or four instructions for this purpose (and also it clobbers one GPR).
This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.
e.g. (x | 0xFFFFFFFF) should be
ori 3, 3, 65535
oris 3, 3, 65535
but LLVM generates without this patch
li 4, 0
oris 4, 4, 65535
ori 4, 4, 65535
or 3, 3, 4
[XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references in .text
Summary:
This change achieves two things:
- Redefine the Custom Event handling instrumentation points emitted by
the compiler to not require dynamic relocation of references to the
__xray_CustomEvent trampoline.
- Remove the synthetic reference we emit at the end of a function that
we used to keep auxiliary sections alive in favour of SHF_LINK_ORDER
associated with the section where the function is defined.
To achieve the custom event handling change, we've had to introduce the
concept of sled versioning -- this will need to be supported by the
runtime to allow us to understand how to turn on/off the new version of
the custom event handling sleds. That change has to land first before we
change the way we write the sleds.
To remove the synthetic reference, we rely on a relatively new linker
feature that preserves the sections that are associated with each other.
This allows us to limit the effects on the .text section of ELF
binaries.
Because we're still using absolute references that are resolved at
runtime for the instrumentation map (and function index) maps, we mark
these sections write-able. In the future we can re-define the entries in
the map to use relative relocations instead that can be statically
determined by the linker. That change will be a bit more invasive so we
defer this for later.
Yonghong Song [Wed, 23 Aug 2017 04:25:57 +0000 (04:25 +0000)]
bpf: add variants of -mcpu=# and support for additional jmp insns
-mcpu=# will support:
. generic: the default insn set
. v1: insn set version 1, the same as generic
. v2: insn set version 2, version 1 + additional jmp insns
. probe: the compiler will probe the underlying kernel to
decide proper version of insn set.
We did not not use -mcpu=native since llc/llvm will interpret -mcpu=native
as the underlying hardware architecture regardless of -march value.
Currently, only x86_64 supports -mcpu=probe. Other architecture will
silently revert to "generic".
Also added -mcpu=help to print available cpu parameters.
llvm will print out the information only if there are at least one
cpu and at least one feature. Add an unused dummy feature to
enable the printout.
Examples for usage:
$ llc -march=bpf -mcpu=v1 -filetype=asm t.ll
$ llc -march=bpf -mcpu=v2 -filetype=asm t.ll
$ llc -march=bpf -mcpu=generic -filetype=asm t.ll
$ llc -march=bpf -mcpu=probe -filetype=asm t.ll
$ llc -march=bpf -mcpu=v3 -filetype=asm t.ll
'v3' is not a recognized processor for this target (ignoring processor)
...
$ llc -march=bpf -mcpu=help -filetype=asm t.ll
Available CPUs for this target:
generic - Select the generic processor.
probe - Select the probe processor.
v1 - Select the v1 processor.
v2 - Select the v2 processor.
Available features for this target:
dummy - unused feature.
Use +feature to enable a feature, or -feature to disable it.
For example, llc -mcpu=mycpu -mattr=+feature1,-feature2
...
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311522 91177308-0d34-0410-b5e6-96231b3b80d8
Matthias Braun [Wed, 23 Aug 2017 03:17:59 +0000 (03:17 +0000)]
Add test case for r311511
This also changes the TailDuplicator to be configured explicitely
pre/post regalloc rather than relying on the isSSA() flag. This was
necessary to have `llc -run-pass` work reliably.