[cmake] With utils disabled, don't build tblgen in cross mode
Summary:
In cross mode, we build a separate NATIVE tblgen that runs on the
host and is used during the build. Separately, we have a flag that
disables building all executables in utils/. Of course generally,
this doesn't turn off tblgen, since we need that during the build.
In cross mode, however, that tblegen is useless since we never
actually use it. Furthermore, it can be actively problematic if the
cross toolchain doesn't like building executables for whatever reason.
And even if building executables works fine, we can at least save
compile time by omitting it from the target build. There's two changes
needed to make this happen:
- Stop creating a dependency from the native tool to the target tool.
No such dependency is required for a correct build, so I'm not entirely
sure why it was there in the first place.
- If utils were disabled on the CMake command line and we're in cross mode,
respect that by excluding it from the install target (using EXCLUDE_FROM_ALL).
[InstCombine] reduce more checks for power-of-2-or-zero using ctpop
Extends the transform from:
rL364341
...to include another (more common?) pattern that tests whether a
value is a power-of-2 (including or excluding zero).
Matt Arsenault [Mon, 1 Jul 2019 19:36:10 +0000 (19:36 +0000)]
GlobalISel: Try to widen merges with other merges
If the requested source type an be used as a merge source type, create
a merge of merges. This avoids creating large, illegal extensions and
bit-ops directly to the result type.
Matt Arsenault [Mon, 1 Jul 2019 18:40:23 +0000 (18:40 +0000)]
AMDGPU/GlobalISel: Custom lower control flow intrinsics
Replace the brcond for the 2 cases that act as branches. For now
follow how the current system works, although I think we can
eventually get rid of the pseudos.
Matt Arsenault [Mon, 1 Jul 2019 18:33:37 +0000 (18:33 +0000)]
AMDGPU/GlobalISel: Handle 16-bit SALU min/max
This needs to be extended to s32, and expanded into cmp+select. This
is relying on the fact that widenScalar happens to leave the
instruction in place, but this isn't a guaranteed property of
LegalizerHelper.
The function findPotentialBlockers may consider debug info instructions as
potential blockers and may stop searching for a store-load pair prematurely.
This patch corrects this and tests the cases where the store is separated
from the load by more than InspectionLimit debug instructions.
Summary:
ds_ordered_count can now simultaneously operate on up to 4 dwords
in a single instruction, which are taken from (and returned to)
lanes 0..3 of a single VGPR.
Matt Arsenault [Mon, 1 Jul 2019 16:27:32 +0000 (16:27 +0000)]
AMDGPU/GlobalISel: Try to select VOP3 form of add
There are several things broken, but at least emit the right thing for
gfx9.
The import of the pattern with the unused carry out seems to not
work. Needs a special class for clamp, because OperandWithDefaultOps
doesn't really work.
Previously, the llvm-readelf documentation was essentially just a list
of differences to llvm-readobj. Since llvm-readelf is the more likely
goto tool for many people migrating to the LLVM toolchain, it seems like
it would be helpful to document all the switches in the llvm-readelf
document too. This change expands the options listed accordingly.
Additionally, they are unlikely to care what the differences are to
llvm-readobj, since they won't be familiar with the latter as there is
no GNU equivalent, so this change moves the "differences" section to
llvm-readobj's documentation.
Summary:
According to the ARMARM, the VQDMLADH, VQRDMLADH, VQDMLSDH and
VQRDMLSDH instructions handle their results as follows: "The base
variant writes the results into the lower element of each pair of
elements in the destination register, whereas the exchange variant
writes to the upper element in each pair". I.e., the initial content
of the output register affects the result, as usual, we model this
with an additional input.
Also, for 32-bit variants Qd is not allowed to be the same register as
Qm and Qn, we use @earlyclobber to indicate this.
This patch also changes vpred_r to vpred_n because the instructions
don't have an explicit 'inactive' operand.
[ARM] MVE: support QQPRRegClass and QQQQPRRegClass
Summary:
QQPRRegClass and QQQQPRRegClass are used by the
interleaving/deinterleaving loads/stores to represent sequences of
consecutive SIMD registers.
Roman Lebedev [Mon, 1 Jul 2019 15:55:24 +0000 (15:55 +0000)]
[InstCombine] (Y + ~X) + 1 --> Y - X fold (PR42459)
Summary:
To be noted, this pattern is not unhandled by instcombine per-se,
it is somehow does end up being folded when one runs opt -O3,
but not if it's just -instcombine. Regardless, that fold is
indirect, depends on some other folds, and is thus blind
when there are extra uses.
This does address the regression being exposed in D63992.
Roman Lebedev [Mon, 1 Jul 2019 15:55:15 +0000 (15:55 +0000)]
[InstCombine] Shift amount reassociation in bittest (PR42399)
Summary:
Given pattern:
`icmp eq/ne (and ((x shift Q), (y oppositeshift K))), 0`
we should move shifts to the same hand of 'and', i.e. rewrite as
`icmp eq/ne (and (x shift (Q+K)), y), 0` iff `(Q+K) u< bitwidth(x)`
It might be tempting to not restrict this to situations where we know
we'd fold two shifts together, but i'm not sure what rules should there be
to avoid endless combine loops.
We pick the same shift that was originally used to shift the variable we picked to shift:
https://rise4fun.com/Alive/6x1v
Should fix [[ https://bugs.llvm.org/show_bug.cgi?id=42399 | PR42399]].
Fix stack-use-after-scope errors from r364512. One instance was already
fixed in r364611 - this patch simplifies that fix and addresses one more
instance of similar code.
Jinsong Ji [Mon, 1 Jul 2019 14:37:48 +0000 (14:37 +0000)]
[UpdateTestChecks][PowerPC] Avoid empty string when scrubbing loop comments
Summary:
SCRUB_LOOP_COMMENT_RE was introduced in https://reviews.llvm.org/D31285
This works for some loops.
However, we may generate lines with loop comments only.
And since we don't scrub leading white spaces, this will leave an empty
line there, and FileCheck will complain it.
eg: llvm/test/CodeGen/PowerPC/PR35812-neg-cmpxchg.ll:27:15:
error: found empty check string with prefix 'CHECK:'
; CHECK-NEXT:
This prevented us from using the `update_llc_test_checks.py` for quite some cases.
We should still keep the comment token there, so that we can safely
scrub the loop comment without breaking FileCheck.
Matt Arsenault [Mon, 1 Jul 2019 13:22:07 +0000 (13:22 +0000)]
AMDGPU/GlobalISel: Fix scc->vcc copy handling
This was checking the size of the register with the value of the size,
which happens to be exec. Also fix assuming VCC is 64-bit to fix
wave32.
Also remove some untested handling for physical registers which is
skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical
copy source. I'm not sure if this should be trying to handle this
special case instead of dealing with this in copyPhysReg.
Roman Lebedev [Mon, 1 Jul 2019 12:22:06 +0000 (12:22 +0000)]
[NFC][InstCombine] Tests for ((~x) + y) + 1 -> y - x fold fold (PR42459)
To be noted, this pattern is not unhandled by instcombine per-se,
it is somehow does end up being folded when one runs opt -O3,
but not if it's just -instcombine. Regardless, that fold is
indirect, depends on some other folds, and is thus blind
when there are extra uses.
Benjamin Kramer [Mon, 1 Jul 2019 11:00:23 +0000 (11:00 +0000)]
[SelectionDAG] Do minnum->minimum at legalization time instead of building time
The SDAGBuilder behavior stems from the days when we didn't have fast
math flags available in SDAG. We do now and doing the transformation in
the legalizer has the advantage that it also works for vector types.
Andrew Ng [Mon, 1 Jul 2019 10:58:20 +0000 (10:58 +0000)]
[benchmark] Disable CMake get_git_version
Disabled CMake get_git_version as it is meaningless for this in-tree
build, and hardcoded a null version.
Not using get_git_version avoids a refresh of the git index that is
executed by get_git_version. Refreshing the index can take a
considerable amount of time if the index needs to be refreshed
(particularly with the mono repo). This situation can arise when
building shared source on a host in VMs.
Jeremy Morse [Mon, 1 Jul 2019 09:38:23 +0000 (09:38 +0000)]
[DebugInfo] Avoid adding too much indirection to pointer-valued variables
This patch addresses PR41675, where a stack-pointer variable is dereferenced
too many times by its location expression, presenting a value on the stack as
the pointer to the stack.
The difference between a stack *pointer* DBG_VALUE and one that refers to a
value on the stack, is currently the indirect flag. However the DWARF backend
will also try to guess whether something is a memory location or not, based
on whether there is any computation in the location expression. By simply
prepending the stack offset to existing expressions, we can accidentally
convert a register location into a memory location, which introduces a
suprise (and unintended) dereference.
The solution is to add DW_OP_stack_value whenever we add a DIExpression
computation to a stack *pointer*. It's an implicit location computed on the
expression stack, thus needs to be flagged as a stack_value.
For the edge case where the offset is zero and the location could be a register
location, DIExpression::prepend will still generate opcodes, and thus
DW_OP_stack_value must still be added.
Sam Parker [Mon, 1 Jul 2019 08:21:28 +0000 (08:21 +0000)]
[ARM] WLS/LE Code Generation
Backend changes to enable WLS/LE low-overhead loops for armv8.1-m:
1) Use TTI to communicate to the HardwareLoop pass that we should try
to generate intrinsics that guard the loop entry, as well as setting
the loop trip count.
2) Lower the BRCOND that uses said intrinsic to an Arm specific node:
ARMWLS.
3) ISelDAGToDAG the node to a new pseudo instruction:
t2WhileLoopStart.
4) Add support in ArmLowOverheadLoops to handle the new pseudo
instruction.
[X86] Improve the type checking fast-isel handling of vector bitcasts.
We had a bunch of vector size legality checks for the source type
based on feature flags, but we didn't check the destination type at
all beyond ensuring that it was a "simple" type. But this allowed
the destination to be i128 which isn't legal.
This commit changes the code to use TLI's isTypeLegal logic in
place of the all the subtarget checks. Then additionally checks
that the source and dest are vectors.
[X86] Add a DAG combine to replace vector loads feeding a v4i32->v2f64 CVTSI2FP/CVTUI2FP node with a vzload.
But only when the load isn't volatile.
This improves load folding during isel where we only have vzload
and scalar_to_vector+load patterns. We can't have full vector load
isel patterns for the same volatile load issue.
Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns.
Mike Spertus [Sun, 30 Jun 2019 21:54:34 +0000 (21:54 +0000)]
Clean up MSVC visualization of LLVM pointer types
Create separate natvis ptr and int views for PointerIntPair.
These are convenient in watch Windows and will be used by
Clang visualizers to be checked in shortly
Also, removed deref views as the MSVC na format has
done the same thing natively since MSVC2013.
Sanjay Patel [Sun, 30 Jun 2019 13:40:31 +0000 (13:40 +0000)]
[InstCombine] canonicalize fcmp+select to minnum/maxnum intrinsics
This is the opposite direction of D62158 (we have to choose 1 form or the other).
Now that we have FMF on the select, this becomes more palatable. And the benefits
of having a single IR instruction for this operation (less chances of missing folds
based on extra uses, etc) overcome my previous comments about the potential advantage
of larger pattern matching/analysis.