Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Roman Lebedev [Mon, 21 Oct 2019 08:21:44 +0000 (08:21 +0000)]
[CVP] Deduce no-wrap on `mul`
Summary:
`ConstantRange::makeGuaranteedNoWrapRegion()` knows how to deal with `mul`
since rL335646, there is exhaustive test coverage.
This is already used by CVP's `processOverflowIntrinsic()`,
and by SCEV's `StrengthenNoWrapFlags()`
Piotr Sobczak [Mon, 21 Oct 2019 08:12:47 +0000 (08:12 +0000)]
[InstCombine] Allow values with multiple users in SimplifyDemandedVectorElts
Summary:
Allow for ignoring the check for a single use in SimplifyDemandedVectorElts
to be able to simplify operands if DemandedElts is known to contain
the union of elements used by all users.
It is a responsibility of a caller of SimplifyDemandedVectorElts to
supply correct DemandedElts.
Simplify a series of extractelement instructions if only a subset of
elements is used.
Yevgeny Rouban [Mon, 21 Oct 2019 06:52:08 +0000 (06:52 +0000)]
[IR] Fix mayReadFromMemory() for writeonly calls
Current implementation of Instruction::mayReadFromMemory()
returns !doesNotAccessMemory() which is !ReadNone. This
does not take into account that the writeonly attribute
also indicates that the call does not read from memory.
The patch changes the predicate to !doesNotReadMemory()
that reflects the intended behavior.
[Attributor] Teach AANoCapture to use information in-flight more aggressively
AAReturnedValues, AAMemoryBehavior, and AANoUnwind, can provide
information that helps during the tracking or even justifies no-capture.
We now use this information and enable no-capture in some test cases
designed a long while a ago for these cases.
Philip Reames [Sun, 20 Oct 2019 23:38:02 +0000 (23:38 +0000)]
[IndVars] Eliminate loop exits with equivalent exit counts
We can end up with two loop exits whose exit counts are equivalent, but whose textual representation is different and non-obvious. For the sub-case where we have a series of exits which dominate one another (common), eliminate any exits which would iterate *after* a previous exit on the exiting iteration.
As noted in the TODO being removed, I'd always thought this was a good idea, but I've now seen this in a real workload as well.
Interestingly, in review, Nikita pointed out there's let another oppurtunity to leverage SCEV's reasoning. If we kept track of the min of dominanting exits so far, we could discharge exits with EC >= MDE. This is less powerful than the existing transform (since later exits aren't considered), but potentially more powerful for any case where SCEV can prove a >= b, but neither a == b or a > b. I don't have an example to illustrate that oppurtunity, but won't be suprised if we find one and return to handle that case as well.
Roman Lebedev [Sun, 20 Oct 2019 20:52:06 +0000 (20:52 +0000)]
[InstCombine] conditional sign-extend of high-bit-extract: 'or' pattern.
In this pattern, all the "magic" bits that we'd `add` are all
high sign bits, and in the value we'd be adding to they are all unset,
not unexpectedly, so we can have an `or` there:
https://rise4fun.com/Alive/ups
It is possible that `haveNoCommonBitsSet()` should be taught about this
pattern so that we never have an `add` variant, but the reasoning would
need to be recursive (because of that `select`), so i'm not really sure
that would be worth it just yet.
Roman Lebedev [Sun, 20 Oct 2019 20:51:37 +0000 (20:51 +0000)]
[NFC][InstCombine] conditional sign-extend of high-bit-extract: 'and' pat. can be 'or' pattern.
In this pattern, all the "magic" bits that we'd add are all
high sign bits, and in the value we'd be adding to they are all unset,
not unexpectedly, so we can have an `or` there:
https://rise4fun.com/Alive/ups
Roman Lebedev [Sun, 20 Oct 2019 19:38:50 +0000 (19:38 +0000)]
[InstCombine] Shift amount reassociation in shifty sign bit test (PR43595)
Summary:
This problem consists of several parts:
* Basic sign bit extraction - `trunc? (?shr %x, (bitwidth(x)-1))`.
This is trivial, and easy to do, we have a fold for it.
* Shift amount reassociation - if we have two identical shifts,
and we can simplify-add their shift amounts together,
then we likely can just perform them as a single shift.
But this is finicky, has one-use restrictions,
and shift opcodes must be identical.
But there is a super-pattern where both of these work together.
to produce sign bit test from two shifts + comparison.
We do indeed already handle this in most cases.
But since we get that fold transitively, it has one-use restrictions.
And what's worse, in this case the right-shifts aren't required to be
identical, and we can't handle that transitively:
If the total shift amount is bitwidth-1, only a sign bit will remain
in the output value. But if we look at this from the perspective of
two shifts, we can't fold - we can't possibly know what bit pattern
we'd produce via two shifts, it will be *some* kind of a mask
produced from original sign bit, but we just can't tell it's shape:
https://rise4fun.com/Alive/cM0 https://rise4fun.com/Alive/9IN
But it will *only* contain sign bit and zeros.
So from the perspective of sign bit test, we're good:
https://rise4fun.com/Alive/FRz https://rise4fun.com/Alive/qBU
Superb!
So the simplest solution is to extend `reassociateShiftAmtsOfTwoSameDirectionShifts()` to also have a
sudo-analysis mode that will ignore extra-uses, and will only check
whether a) those are two right shifts and b) they end up with bitwidth(x)-1
shift amount and return either the original value that we sign-checking,
or null.
This does not have any functionality change for
the existing `reassociateShiftAmtsOfTwoSameDirectionShifts()`.
All that being said, as disscussed in the review, this yet again
increases usage of instsimplify in instcombine as utility.
Some day that may need to be reevaluated.
Roman Lebedev [Sun, 20 Oct 2019 19:36:55 +0000 (19:36 +0000)]
[ConstantRange] makeGuaranteedNoWrapRegion(): `shl` support
Summary:
If all the shifts amount are already poison-producing,
then we can add more poison-producing flags ontop:
https://rise4fun.com/Alive/Ocwi
Otherwise, we should only consider the possible range of shift amts that don't result in poison.
For unsigned range not not overflow, we must not shift out any set bits,
and the actual limit for `x` can be computed by backtransforming
the maximal value we could ever get out of the `shl` - `-1` through
`lshr`. If the `x` is any larger than that then it will overflow.
Likewise for signed range, but just in signed domain..
This is based on the general idea outlined by @nikic in https://reviews.llvm.org/D68672#1714990
Nikita Popov [Sun, 20 Oct 2019 18:59:14 +0000 (18:59 +0000)]
[ConstantRange] Optimize nowrap region test, remove redundant tests; NFC
Enumerate one less constant range in TestNoWrapRegionExhaustive,
which was unnecessary. This allows us to bump the bit count from
3 to 5 while keeping reasonable timing.
Drop four tests for multiply nowrap regions, as these cover subsets
of the exhaustive test. They do use a wider bitwidth, but I don't
think it's worthwhile to have them additionally now.
Matt Arsenault [Sun, 20 Oct 2019 17:34:44 +0000 (17:34 +0000)]
AMDGPU: Split flat offsets that don't fit in DAG
We handle it this way for some other address spaces.
Since r349196, SILoadStoreOptimizer has been trying to do this. This
is after SIFoldOperands runs, which can change the addressing
patterns. It's simpler to just split this earlier.
George Rimar [Sun, 20 Oct 2019 14:47:17 +0000 (14:47 +0000)]
[yaml2obj][obj2yaml] - Do not create a symbol table by default.
This patch tries to resolve problems faced in D68943
and uses some of the code written by Konrad Wilhelm Kleine
in that patch.
Previously, yaml2obj tool always created a .symtab section.
This patch changes that. With it we only create it when
have a "Symbols:" tag in the YAML document or when
we need to create it because it is used by another section(s).
obj2yaml follows the new behavior and does not print "Symbols:"
anymore when there is no symbol table.
Philip Reames [Sat, 19 Oct 2019 17:23:02 +0000 (17:23 +0000)]
[SCEV] Simplify umin/max of zext and sext of the same value
This is a common idiom which arises after induction variables are widened, and we have two or more exit conditions. Interestingly, we don't have instcombine or instsimplify support for this either.
Sanjay Patel [Sat, 19 Oct 2019 16:57:02 +0000 (16:57 +0000)]
[TargetLowering][DAGCombine][MSP430] add/use hook for Shift Amount Threshold (1/2)
Provides a TLI hook to allow targets to relax the emission of shifts, thus enabling
codegen improvements on targets with no multiple shift instructions and cheap selects
or branches.
Contributes to a Fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559
tryToWidenViaDuplication lowers using the shuffle_v8i16(unpack_v16i8(shuffle_v8i16(x),shuffle_v8i16(x))) pattern, but the unpack only needs the even/odd 16i8 args if the original v16i8 shuffle mask references the even/odd elements - which isn't true for many extension style shuffles.
Reid Kleckner [Sat, 19 Oct 2019 00:48:11 +0000 (00:48 +0000)]
Move endian constant from Host.h to SwapByteOrder.h, prune include
Works on this dependency chain:
ArrayRef.h ->
Hashing.h -> --CUT--
Host.h ->
StringMap.h / StringRef.h
ArrayRef is very popular, but Host.h is rarely needed. Move the
IsBigEndianHost constant to SwapByteOrder.h. Clients of that header are
more likely to need it.
Matt Arsenault [Fri, 18 Oct 2019 23:24:25 +0000 (23:24 +0000)]
LiveIntervals: Fix handleMoveUp with subreg def moving across a def
If a subregister def was moved across another subregister def and
another use, the main range was not correctly updated. The end point
of the moved interval ended too early and missed the use from theh
other lanes in the subreg def.
Wei Mi [Fri, 18 Oct 2019 22:35:20 +0000 (22:35 +0000)]
[SampleFDO] Add profile remapping support for profile on-demand loading used
by ExtBinary format profile
Profile on-demand loading was added for ExtBinary format profile in rL374233,
but currently profile on-demand loading doesn't work well with profile
remapping. The patch adds the support.
Suppose a function in the current module has outline instance in the profile.
The function name in the module is different from the name of the outline
instance, but remapper knows the two names are equal. When loading profile
on-demand, the outline instance has to be loaded with remapper's help.
At the same time SampleProfileReaderItaniumRemapper is changed from a proxy
of SampleProfileReader to a helper member in SampleProfileReader.
Vedant Kumar [Fri, 18 Oct 2019 21:05:30 +0000 (21:05 +0000)]
Disable exit-on-SIGPIPE in lldb
Occasionally, during test teardown, LLDB writes to a closed pipe.
Sometimes the communication is inherently unreliable, so LLDB tries to
avoid being killed due to SIGPIPE (it calls `signal(SIGPIPE, SIG_IGN)`).
However, LLVM's default SIGPIPE behavior overrides LLDB's, causing it to
exit with IO_ERR.
Opt LLDB out of the default SIGPIPE behavior. I expect that this will
resolve some LLDB test suite flakiness (tests randomly failing with
IO_ERR) that we've seen since r344372.
Reid Kleckner [Fri, 18 Oct 2019 21:01:41 +0000 (21:01 +0000)]
[X86] Fix register parsing in .seh_* in Intel syntax
Previously, the parser checked for a '%' prefix to indicate a register.
In Intel syntax mode, LLVM does not print a '%' prefix on registers, so
LLVM could not parse its own assembly output. Instead, require that
register numbers be integer literals, or at least start with an integer
literal, which is consistent with .cfi_* directive register parsing.
Thomas Lively [Fri, 18 Oct 2019 20:27:30 +0000 (20:27 +0000)]
[WebAssembly] Allow multivalue signatures in object files
Summary:
Also changes the wasm YAML format to reflect the possibility of having
multiple return types and to put the returns after the params for
consistency with the binary encoding.
Quentin Colombet [Fri, 18 Oct 2019 20:13:42 +0000 (20:13 +0000)]
[GISel][CallLowering] Make isIncomingArgumentHandler a pure virtual method
The default implementation of isIncomingArgumentHandler could lead
to generating incorrect code.
Make it a pure virtual method, so that targets know they have to
override it to produce correct code.
Roman Lebedev [Fri, 18 Oct 2019 19:32:47 +0000 (19:32 +0000)]
[CVP] After proving that @llvm.with.overflow()/@llvm.sat() don't overflow, also try to prove other no-wrap
Summary:
CVP, unlike InstCombine, does not run till exaustion.
It only does a single pass.
When dealing with those special binops, if we prove that they can
safely be demoted into their usual binop form,
we do set the no-wrap we deduced. But when dealing with usual binops,
we try to deduce both no-wraps.
So if we convert e.g. @llvm.uadd.with.overflow() to `add nuw`,
we won't attempt to check whether it can be `add nuw nsw`.
This patch proposes to call `processBinOp()` on newly-created binop,
which is identical to what we do for div/rem already.
Lang Hames [Fri, 18 Oct 2019 18:25:15 +0000 (18:25 +0000)]
[examples] Add an example of how to use JITLink and small-code-model with LLJIT.
JITLink is LLVM's newer jit-linker. It is an alternative to (and hopefully
eventually a replacement for) LLVM's older jit-linker, RuntimeDyld. Unlike
RuntimeDyld which requries JIT'd code to be complied with the large code
model, JITlink can link code compiled with the small code model, which is
the native code model for a number of targets (including all supported MachO
targets).
This example shows how to:
-- Create a JITLink InProcessMemoryManager
-- Set the code model to small
-- Use a JITLink backed ObjectLinkingLayer as the linking layer for LLJIT
(rather than the default RTDyldObjectLinkingLayer).
Note: This example will only work on platforms supported by JITLink. As of
this commit that's MachO/x86-64 and MachO/arm64.
Julian Lettner [Fri, 18 Oct 2019 17:59:46 +0000 (17:59 +0000)]
[lit] Reduce value of synthesized timeouts
Large timeout values (one year, positive infinity) trip up Python on
Windows with "OverflowError: timeout value is too large". One week
seems to work and is still large enough in practice.
Thanks to Simon Pilgrim for helping me test this.
https://reviews.llvm.org/rL375171
Jay Foad [Fri, 18 Oct 2019 16:16:36 +0000 (16:16 +0000)]
[IR] Reimplement FPMathOperator::classof as a whitelist.
Summary:
This makes it much easier to verify that the implementation matches the
documentation. It uncovered a bug in the unit tests where we were
accidentally setting fast math flags on a load instruction.
James Molloy [Fri, 18 Oct 2019 14:48:35 +0000 (14:48 +0000)]
[DFAPacketizer] Fix large compile-time regression for VLIW targets
D68992 / rL375086 refactored the packetizer and removed a bunch of logic. Unfortunately it creates an Automaton object whenever a DFAPacketizer is required. These objects have no longevity, and in particular on a debug build the population of the Automaton's transition map from the underlying table is very slow (because it is called ~10 times per MachineFunction, in the testcase I'm looking at).
This patch changes Automaton to wrap its underlying constant data in std::shared_ptr, which allows trivial copy construction. The DFAPacketizer creation function now creates a static archetypical Automaton and copies that whenever a new DFAPacketizer is required.
This takes a testcase down from ~20s to ~0.5s in debug mode.
Roman Lebedev [Fri, 18 Oct 2019 13:20:16 +0000 (13:20 +0000)]
[NFC][CVP] Count all the no-wraps we proved
Summary:
It looks like this is the only missing statistic in the CVP pass.
Since we prove NSW and NUW separately i'd think we should count them separately too.
Graham Hunter [Fri, 18 Oct 2019 11:48:35 +0000 (11:48 +0000)]
[AArch64][SVE] Add SPLAT_VECTOR ISD Node
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.
Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.
At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.
I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.
David Green [Fri, 18 Oct 2019 10:35:46 +0000 (10:35 +0000)]
[AArch64] Don't combine callee-save and local stack adjustment when optimizing for size
For arm64, D18619 introduced the ability to combine bumping the stack pointer
upfront in case it needs to be bumped for both the callee-save area as well as
the local stack area.
That diff already remarks that "This change can cause an increase in
instructions", but argues that even when that happens, it should be still be a
performance benefit because the number of micro-ops is reduced.
We have observed that this code-size increase can be significant in practice.
This diff disables combining stack bumping for methods that are marked as
optimize-for-size.
Example of a prologue with the behavior before this diff (combining stack bumping when possible):
sub sp, sp, #0x40
stp d9, d8, [sp, #0x10]
stp x20, x19, [sp, #0x20]
stp x29, x30, [sp, #0x30]
add x29, sp, #0x30
[... compute x8 somehow ...]
stp x0, x8, [sp]
And after this diff, if the method is marked as optimize-for-size:
stp d9, d8, [sp, #-0x30]!
stp x20, x19, [sp, #0x10]
stp x29, x30, [sp, #0x20]
add x29, sp, #0x20
[... compute x8 somehow ...]
stp x0, x8, [sp, #-0x10]!
Note that without combining the stack bump there are two auto-decrements,
nicely folded into the stp instructions, whereas otherwise there is a single
sub sp, ... instruction, but not folded.
David Green [Fri, 18 Oct 2019 09:47:48 +0000 (09:47 +0000)]
[Codegen] Alter the default promotion for saturating adds and subs
The default promotion for the add_sat/sub_sat nodes currently does:
ANY_EXTEND iN to iM
SHL by M-N
[US][ADD|SUB]SAT
L/ASHR by M-N
If the promoted add_sat or sub_sat node is not legal, this can produce code
that effectively does a lot of shifting (and requiring large constants to be
materialised) just to use the overflow flag. It is simpler to just do the
saturation manually, using the higher bitwidth addition and a min/max against
the saturating bounds. That is what this patch attempts to do.
Kerry McLaughlin [Fri, 18 Oct 2019 09:40:16 +0000 (09:40 +0000)]
[AArch64][SVE] Implement unpack intrinsics
Summary:
Implements the following intrinsics:
- int_aarch64_sve_sunpkhi
- int_aarch64_sve_sunpklo
- int_aarch64_sve_uunpkhi
- int_aarch64_sve_uunpklo
This patch also adds AArch64ISD nodes for UNPK instead of implementing
the intrinsics directly, as they are required for a future patch which
implements the sign/zero extension of legal vectors.
This patch includes tests for the Subdivide2Argument type added by D67549