Roman Lebedev [Tue, 2 Jul 2019 14:48:52 +0000 (14:48 +0000)]
[NFC][Codegen][X86][AArch64][ARM][PowerPC] Add test coverage for "add-of-inc" vs "sub-of-not"
As it is pointed out in https://reviews.llvm.org/D63992,
before we get to pick canonical variant in middle-end
we should ensure best codegen in backend.
Matt Arsenault [Tue, 2 Jul 2019 14:40:22 +0000 (14:40 +0000)]
AMDGPU/GlobalISel: Fix G_GEP with mixed SGPR/VGPR operands
The register bank for the destination of the sample argument copy was
wrong. We shouldn't be constraining each source to the result register
bank. Allow constraining the original register to the right size.
Simon Pilgrim [Tue, 2 Jul 2019 13:30:04 +0000 (13:30 +0000)]
[X86][AVX] combineX86ShuffleChain - pull out CombineShuffleWithExtract lambda. NFCI.
Pull out CombineShuffleWithExtract lambda to new combineX86ShuffleChainWithExtract wrapper and refactored it to handle more than 2 shuffle inputs - this will allow combineX86ShufflesRecursively to call this in a future patch.
I was actually wondering if there was some nicer way than m_Value()+cast,
but apparently what i was really "subconsciously" thinking about
was correctness issue.
hasNoUnsignedWrap()/hasNoUnsignedWrap() exist for Instruction,
not for BinaryOperator, so let's just use m_Instruction(),
thus both avoiding a cast, and a crash.
Michal Gorny [Tue, 2 Jul 2019 11:32:03 +0000 (11:32 +0000)]
[llvm] [Support] Clean PrintStackTrace() ptr arithmetic up
Use '%tu' modifier for pointer arithmetic since we are using C++11
already. Prefer static_cast<> over C-style cast. Remove unnecessary
conversion of result, and add const qualifier to converted pointers,
to silence the following warning:
In file included from /home/mgorny/llvm-project/llvm/lib/Support/Signals.cpp:220:0:
/home/mgorny/llvm-project/llvm/lib/Support/Unix/Signals.inc: In function ‘void llvm::sys::PrintStackTrace(llvm::raw_ostream&)’:
/home/mgorny/llvm-project/llvm/lib/Support/Unix/Signals.inc:546:53: warning: cast from type ‘const void*’ to type ‘char*’ casts away qualifiers [-Wcast-qual]
(char*)dlinfo.dli_saddr));
^~~~~~~~~
[IDF] Generalize IDFCalculator to be used with Clang's CFG
I'm currently working on a GSoC project that aims to improve the the bug reports
of the analyzer. The main heuristic I plan to use is to explain values that are
a control dependency of the bug location better.
01 bool b = messyComputation();
02 int i = 0;
03 if (b) // control dependency of the bug site, let's explain why we assume val
04 // to be true
05 10 / i; // warn: division by zero
Because of this, I'd like to generalize IDFCalculator so that I could use it for
Clang's CFG: D62883.
In detail:
* Rename IDFCalculator to IDFCalculatorBase, make it take a general CFG node
type as a template argument rather then strictly BasicBlock (but preserve
ForwardIDFCalculator and ReverseIDFCalculator)
* Move IDFCalculatorBase from llvm/include/llvm/Analysis to
llvm/include/llvm/Support (but leave the BasicBlock variants in
llvm/include/llvm/Analysis)
* clang-format the file since this patch messes up git blame anyways
* Change typedef to using
* Add the new type ChildrenGetterTy, and store an instance of it in
IDFCalculatorBase. This is important because I'll have to specialize it for
Clang's CFG to filter out nullpointer successors, similarly to D62507.
Simon Tatham [Tue, 2 Jul 2019 11:26:11 +0000 (11:26 +0000)]
[ARM] MVE: allow soft-float ABI to pass vector types.
Passing a vector type over the soft-float ABI involves it being split
into four GPRs, so the first thing that has to happen at the start of
the function is to recombine those into a vector register. The ABI
types all vectors as v2f64, so we need to support BUILD_VECTOR for
that type, which I do in this patch by allowing it to be expanded in
terms of INSERT_VECTOR_ELT, and writing an ISel pattern for that in
turn. Similarly, I provide a rule for EXTRACT_VECTOR_ELT so that a
returned vector can be marshalled back into GPRs.
While I'm here, I've also added ISD::UNDEF to the list of operations
we turn back on in `setAllExpand`, because I noticed that otherwise it
gets expanded into a BUILD_VECTOR with explicit zero inputs, leading
to pointless machine instructions to zero out a vector register that's
about to have every lane overwritten of in any case.
Simon Tatham [Tue, 2 Jul 2019 11:26:00 +0000 (11:26 +0000)]
[ARM] Stop using scalar FP instructions in integer-only MVE mode.
If you compile with `-mattr=+mve` (enabling integer MVE instructions
but not floating-point ones), then the scalar FP //registers// exist
and it's legal to move things in and out of them, load and store them,
but it's not legal to do arithmetic on them.
In D60708, the calls to `addRegisterClass` in ARMISelLowering that
enable use of the scalar FP registers became conditionalised on
`Subtarget->hasFPRegs()` instead of `Subtarget->hasVFP2Base()`, so
that loads, stores and moves of those registers would work. But I
didn't realise that that would also enable all the operations on those
types by default.
Now, if the target doesn't have basic VFP, we follow up those
`addRegisterClass` calls by turning back off all the nontrivial
operations you can perform on f32 and f64. That causes several
knock-on failures, which are fixed by allowing the `VMOVDcc` and
`VMOVScc` instructions to be selected even if all you have is
`HasFPRegs`, and adjusting several checks for 'is this a double in a
single-precision-only world?' to the more general 'is this any FP type
we can't do arithmetic on?'. Between those, the whole of the
`float-ops.ll` and `fp16-instructions.ll` tests can now run in
MVE-without-FP mode and generate correct-looking code.
One odd side effect is that I had to relax the check lines in that
test so that they permit test functions like `add_f` to be generated
as tailcalls to software FP library functions, instead of ordinary
calls. Doing that is entirely legal, but the mystery is why this is
the first RUN line that's needed the relaxation: on the usual kind of
non-FP target, no tailcalls ever seem to be generated. Going by the
llc messages, I think `SoftenFloatResult` must be perturbing the code
generation in some way, but that's as much as I can guess.
Simon Pilgrim [Tue, 2 Jul 2019 10:53:17 +0000 (10:53 +0000)]
[X86] resolveTargetShuffleInputsAndMask - add repeated input handling.
We were relying on combineX86ShufflesRecursively to handle this - this patch gets it done earlier which should make it easier for other code to use resolveTargetShuffleInputsAndMask.
[TailDuplicator] Fix copy instruction emitting into the wrong block.
The code for duplicating instructions could sometimes try to emit copies
intended to deal with unconstrainable register classes to the tail block of the
original instruction, rather than before the newly cloned instruction in the
predecessor block.
[PowerPC] Implement the areMemAccessesTriviallyDisjoint hook
After implemented this hook, we will model the memory dependency in the scheduling dependency graph more precise,
and will have more opportunity to reorder the load/stores, as they didn't have the dependency at some condition
Zi Xuan Wu [Tue, 2 Jul 2019 02:54:52 +0000 (02:54 +0000)]
[DAGCombiner] Exploiting more about the transformation of TransformFPLoadStorePair function
For a given floating point load / store pair, if the load value isn't used by any other operations,
then consider transforming the pair to integer load / store operations if the target deems the transformation profitable.
And we can exploiting much more when there are other operation nodes with chain operand between the load/store pair
so long as we keep the chain ordering original. We only replace the register used to load/store from float to integer.
I only add testcase in ARM because the TLI.isDesirableToTransformToIntegerOp hook is only enabled in ARM target.
[cmake] With utils disabled, don't build tblgen in cross mode
Summary:
In cross mode, we build a separate NATIVE tblgen that runs on the
host and is used during the build. Separately, we have a flag that
disables building all executables in utils/. Of course generally,
this doesn't turn off tblgen, since we need that during the build.
In cross mode, however, that tblegen is useless since we never
actually use it. Furthermore, it can be actively problematic if the
cross toolchain doesn't like building executables for whatever reason.
And even if building executables works fine, we can at least save
compile time by omitting it from the target build. There's two changes
needed to make this happen:
- Stop creating a dependency from the native tool to the target tool.
No such dependency is required for a correct build, so I'm not entirely
sure why it was there in the first place.
- If utils were disabled on the CMake command line and we're in cross mode,
respect that by excluding it from the install target (using EXCLUDE_FROM_ALL).
[InstCombine] reduce more checks for power-of-2-or-zero using ctpop
Extends the transform from:
rL364341
...to include another (more common?) pattern that tests whether a
value is a power-of-2 (including or excluding zero).
Matt Arsenault [Mon, 1 Jul 2019 19:36:10 +0000 (19:36 +0000)]
GlobalISel: Try to widen merges with other merges
If the requested source type an be used as a merge source type, create
a merge of merges. This avoids creating large, illegal extensions and
bit-ops directly to the result type.
Matt Arsenault [Mon, 1 Jul 2019 18:40:23 +0000 (18:40 +0000)]
AMDGPU/GlobalISel: Custom lower control flow intrinsics
Replace the brcond for the 2 cases that act as branches. For now
follow how the current system works, although I think we can
eventually get rid of the pseudos.
Matt Arsenault [Mon, 1 Jul 2019 18:33:37 +0000 (18:33 +0000)]
AMDGPU/GlobalISel: Handle 16-bit SALU min/max
This needs to be extended to s32, and expanded into cmp+select. This
is relying on the fact that widenScalar happens to leave the
instruction in place, but this isn't a guaranteed property of
LegalizerHelper.
The function findPotentialBlockers may consider debug info instructions as
potential blockers and may stop searching for a store-load pair prematurely.
This patch corrects this and tests the cases where the store is separated
from the load by more than InspectionLimit debug instructions.
Summary:
ds_ordered_count can now simultaneously operate on up to 4 dwords
in a single instruction, which are taken from (and returned to)
lanes 0..3 of a single VGPR.
Matt Arsenault [Mon, 1 Jul 2019 16:27:32 +0000 (16:27 +0000)]
AMDGPU/GlobalISel: Try to select VOP3 form of add
There are several things broken, but at least emit the right thing for
gfx9.
The import of the pattern with the unused carry out seems to not
work. Needs a special class for clamp, because OperandWithDefaultOps
doesn't really work.
Previously, the llvm-readelf documentation was essentially just a list
of differences to llvm-readobj. Since llvm-readelf is the more likely
goto tool for many people migrating to the LLVM toolchain, it seems like
it would be helpful to document all the switches in the llvm-readelf
document too. This change expands the options listed accordingly.
Additionally, they are unlikely to care what the differences are to
llvm-readobj, since they won't be familiar with the latter as there is
no GNU equivalent, so this change moves the "differences" section to
llvm-readobj's documentation.
Summary:
According to the ARMARM, the VQDMLADH, VQRDMLADH, VQDMLSDH and
VQRDMLSDH instructions handle their results as follows: "The base
variant writes the results into the lower element of each pair of
elements in the destination register, whereas the exchange variant
writes to the upper element in each pair". I.e., the initial content
of the output register affects the result, as usual, we model this
with an additional input.
Also, for 32-bit variants Qd is not allowed to be the same register as
Qm and Qn, we use @earlyclobber to indicate this.
This patch also changes vpred_r to vpred_n because the instructions
don't have an explicit 'inactive' operand.
[ARM] MVE: support QQPRRegClass and QQQQPRRegClass
Summary:
QQPRRegClass and QQQQPRRegClass are used by the
interleaving/deinterleaving loads/stores to represent sequences of
consecutive SIMD registers.
Roman Lebedev [Mon, 1 Jul 2019 15:55:24 +0000 (15:55 +0000)]
[InstCombine] (Y + ~X) + 1 --> Y - X fold (PR42459)
Summary:
To be noted, this pattern is not unhandled by instcombine per-se,
it is somehow does end up being folded when one runs opt -O3,
but not if it's just -instcombine. Regardless, that fold is
indirect, depends on some other folds, and is thus blind
when there are extra uses.
This does address the regression being exposed in D63992.
Roman Lebedev [Mon, 1 Jul 2019 15:55:15 +0000 (15:55 +0000)]
[InstCombine] Shift amount reassociation in bittest (PR42399)
Summary:
Given pattern:
`icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0`
we should move shifts to the same hand of 'and', i.e. rewrite as
`icmp eq/ne (and (x shift (Q+K)), y), 0` iff `(Q+K) u< bitwidth(x)`
It might be tempting to not restrict this to situations where we know
we'd fold two shifts together, but i'm not sure what rules should there be
to avoid endless combine loops.
We pick the same shift that was originally used to shift the variable we picked to shift:
https://rise4fun.com/Alive/6x1v
Should fix [[ https://bugs.llvm.org/show_bug.cgi?id=42399 | PR42399]].
Fix stack-use-after-scope errors from r364512. One instance was already
fixed in r364611 - this patch simplifies that fix and addresses one more
instance of similar code.
Jinsong Ji [Mon, 1 Jul 2019 14:37:48 +0000 (14:37 +0000)]
[UpdateTestChecks][PowerPC] Avoid empty string when scrubbing loop comments
Summary:
SCRUB_LOOP_COMMENT_RE was introduced in https://reviews.llvm.org/D31285
This works for some loops.
However, we may generate lines with loop comments only.
And since we don't scrub leading white spaces, this will leave an empty
line there, and FileCheck will complain it.
eg: llvm/test/CodeGen/PowerPC/PR35812-neg-cmpxchg.ll:27:15:
error: found empty check string with prefix 'CHECK:'
; CHECK-NEXT:
This prevented us from using the `update_llc_test_checks.py` for quite some cases.
We should still keep the comment token there, so that we can safely
scrub the loop comment without breaking FileCheck.