Craig Topper [Wed, 2 Jan 2019 17:58:27 +0000 (17:58 +0000)]
[DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them.
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.
The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.
Nico Weber [Wed, 2 Jan 2019 17:38:22 +0000 (17:38 +0000)]
[gn build] Add build files for bugpoint-passes and LLVMHello plugins
These two plugins are loaded into a host process that contains all LLVM
symbols, so they don't link against anything. This required minor readjustments
to the tablegen() setup of IR.
Wei Mi [Wed, 2 Jan 2019 17:07:23 +0000 (17:07 +0000)]
[PowerPC] Remove SeenUse check when optimizing conditional branch in
PPCPreEmitPeephole pass.
PPCPreEmitPeephole will convert a BC to B when the conditional branch is
based on a constant CR by CRSET or CRUNSET. This is added in
https://reviews.llvm.org/rL343100.
When the conditional branch is known to be always taken, all branches will
be removed and a new unconditional branch will be inserted. However, when
SeenUse is false the original patch will not remove the branches, but still
insert the new unconditional branch, update the successors and create
inconsistent IR. Compiling the synthetic testcase included can show the
problem we run into.
The patch simply removes the SeenUse condition when adding branches into
InstrsToErase set.
Hal Finkel [Wed, 2 Jan 2019 16:28:09 +0000 (16:28 +0000)]
[BasicAA] Support arbitrary pointer sizes (and fix an overflow bug)
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).
Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).
After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).
As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.
Patch by me with contributions from Michael Ferguson (mferguson@cray.com).
Philip Pfaffe [Wed, 2 Jan 2019 15:41:47 +0000 (15:41 +0000)]
Extend Module::getOrInsertGlobal to control the construction of the
GlobalVariable
Summary:
Extend Module::getOrInsertGlobal to accept a callback for creating a new
GlobalVariable if necessary instead of calling the GV constructor
directly using default arguments. Additionally overload
getOrInsertGlobal for the previous default behavior.
[MCA] Minor refactoring of method DefaultResourceStrategy::select. NFCI
Common code used by the default resource strategy to select pipeline resources
has been moved to an helper function.
The new selection logic has been slightly rewritten to get rid of a redundant
zero check on the `ReadyMask` value. Before this patch, method select internally
called function `PowerOf2Floor` to compute the next ready pipeline resource.
However, `PowerOf2Floor` forces an implicit (redundant) zero check on the input
value. By construction, `ReadyMask` can never be zero. This patch replaces the
call to `PowerOf2Floor` with an equivalent block of code which avoids the
redundant zero check. This gives a minor 3-3.5% speedup on a release build.
Craig Topper [Wed, 2 Jan 2019 06:40:11 +0000 (06:40 +0000)]
[X86] Remove the separate SMUL8/UMUL8 X86ISD opcodes by merging with SMUL/UMUL. Remove the second result from X86ISD::UMUL.
All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering.
Craig Topper [Wed, 2 Jan 2019 05:46:03 +0000 (05:46 +0000)]
[X86] Allow LowerSELECT and LowerBRCOND to directly lower i8 UMULO/SMULO.
These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code.
Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll
Craig Topper [Wed, 2 Jan 2019 05:46:00 +0000 (05:46 +0000)]
[X86] Remove KNL specific check prefix from xmulo.ll test. NFC
This was added at a time when i1 was a legal type with avx512f and there was a bug. i1 is no longer considered a legal type with avx512f so there should be no codegen difference.
Sanjay Patel [Tue, 1 Jan 2019 21:51:39 +0000 (21:51 +0000)]
[InstCombine] canonicalize raw IR rotate patterns to funnel shift
The final piece of IR-level analysis to allow this was committed with:
rL350188
Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.
The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal or better to the original code even if the
target does not have an actual rotate instruction.
Craig Topper [Tue, 1 Jan 2019 19:34:11 +0000 (19:34 +0000)]
[X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code.
This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions.
This is inspired by the helper functions used by ARM and AArch64 for the same purpose.
The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO.
Craig Topper [Tue, 1 Jan 2019 18:44:44 +0000 (18:44 +0000)]
[X86] Remove KNL specific check prefix from xaluo.ll test. NFC
This was added at a time when i1 was a legal type with avx512f and there was a bug. i1 is no longer considered a legal type with avx512f so there should be no codegen difference.
Craig Topper [Tue, 1 Jan 2019 18:44:42 +0000 (18:44 +0000)]
[X86] Add test cases to show where LowerSELECT doesn't select SADDO/SSUBO to INC/DEC, but LowerXALUOOp does. Leading to duplicate code.
When SADDO/SSUBO is used as a part of a condition, the X86 backend has to lower the instruction twice. One for the flags use and then once for the data use. These two selections should be kept in sync so they end up with one node providing the data and the flags. This doesn't seem to be happening for INC/DEC.
Nikita Popov [Tue, 1 Jan 2019 10:17:35 +0000 (10:17 +0000)]
[BDCE] Remove -instsimplify from BDCE test; NFC
To make it more obvious which part of the transformation is carried
out by BDCE. Also drop the CHECK-IO lines which only run -instsimplify
as they don't really seem meaningful if the main check doesn't run
-instsimplify either.
Nikita Popov [Tue, 1 Jan 2019 10:05:26 +0000 (10:05 +0000)]
Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.
Ayonam Ray [Tue, 1 Jan 2019 06:37:50 +0000 (06:37 +0000)]
Omit range checks from jump tables when lowering switches with unreachable
default
During the lowering of a switch that would result in the generation of a jump
table, a range check is performed before indexing into the jump table, for the
switch value being outside the jump table range and a conditional branch is
inserted to jump to the default block. In case the default block is
unreachable, this conditional jump can be omitted. This patch implements
omitting this conditional branch for unreachable defaults.
Nico Weber [Mon, 31 Dec 2018 23:48:22 +0000 (23:48 +0000)]
[gn build] Add some llvm/tools: llvm-exegesis, llvm-extract, llvm-link
Also add build file for dependency llvm/lib/ExecutionEngine/MCJIT.
The exegesis stuff is pretty hairy and knows a lot about Target internals (in
general, not specifically in the GN build). I put the llvm-tblgen -gen-exegesis
call in llvm/tools/llvm-exegesis/lib/X86, instead of in llvm/lib/Target/X86
where it is in CMake land, and asked on D52932 why it's in that place in the
CMake build.
Michal Gorny [Mon, 31 Dec 2018 13:48:12 +0000 (13:48 +0000)]
[test] Fix propagating HOME envvar to unittests
Propagate HOME environment variable to unittests. This is necessary
to fix test failures resulting from pw_home pointing to a non-existing
directory while being overriden with HOME. Apparently Gentoo users
hit this sometimes when they override build directory for Portage.
Original bug report: https://bugs.gentoo.org/674088
MSan used to report false positives in the case the argument of
llvm.is.constant intrinsic was uninitialized.
In fact checking this argument is unnecessary, as the intrinsic is only
used at compile time, and its value doesn't depend on the value of the
argument.
Nico Weber [Mon, 31 Dec 2018 00:10:47 +0000 (00:10 +0000)]
[gn build] Make `ninja check-clang` also run Clang's unit tests
Also add a build file for clang/lib/ASTMatchers/Dynamic, which is only needed
by tests (and clang/tools/extra).
Also make llvm/utils/gn/build/sync_source_lists_from_cmake.py check that every
CMakeLists.txt file below {lld,clang}/unittests has a corresponding BUILD.gn
file, so we notice if new test binaries get added (since the failure mode for
missing GN build files for tests is just the tests silently not running in the
GN build).
Also add a unittest() macro for defining unit test targets, and add a lengthy
comment there about where the unit test binaries go and why.
With this, the build files for //clang are complete.
Kang Zhang [Sun, 30 Dec 2018 15:13:51 +0000 (15:13 +0000)]
[PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code
Summary:
For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo.
Then the verifier runs and it seems like we have a use of an undefined register (the register will
be reserved later, but the verifier doesn't know that).
So this patch call setUsesTOCBasePtr before emit the code for the pseudo, so verifier can know
X2 is a reserved register.
Craig Topper [Sun, 30 Dec 2018 03:05:07 +0000 (03:05 +0000)]
[X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1.
This seems to be getting in the way more than its helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better.
Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it.
Craig Topper [Sun, 30 Dec 2018 02:30:34 +0000 (02:30 +0000)]
[X86] Add custom type legalization for SIGN_EXTEND_VECTOR_INREG from 16i16/v32i8 to v4i64 when v4i64 needs splitting.
This allows us to sign extend to v4i32 first. And then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh.
We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization.
Nemanja Ivanovic [Sat, 29 Dec 2018 16:13:11 +0000 (16:13 +0000)]
[PowerPC][NFC] Macro for register set defs for the Asm Parser
We have some unfortunate code in the back end that defines a bunch of register
sets for the Asm Parser. Every time another class is needed in the parser, we
have to add another one of those definitions with explicit lists of registers.
This NFC patch simply provides macros to use to condense that code a little bit.
Nemanja Ivanovic [Sat, 29 Dec 2018 13:40:48 +0000 (13:40 +0000)]
[PowerPC] Complete the custom legalization of vector int to fp conversion
A recent patch has added custom legalization of vector conversions of
v2i16 -> v2f64. This just rounds it out for other types where the input vector
has an illegal (narrower) type than the result vector. Specifically, this will
handle the following conversions:
Nemanja Ivanovic [Sat, 29 Dec 2018 11:43:54 +0000 (11:43 +0000)]
[PowerPC] Fix CR Bit spill pseudo expansion
The current CRBIT spill pseudo-op expansion creates a KILL instruction
that kills the CRBIT and defines the enclosing CR field. However, this
paints a false picture to the register allocator that all bits in the CR
field are killed so copies of other bits out of the field become dead and
removable.
This changes the expansion to preserve the KILL flag on the CRBIT as an
implicit use and to treat the CR field as an undef input.
Thanks to Hal Finkel for the review and Uli Weigand for implementation input.
Simon Atanasyan [Sat, 29 Dec 2018 10:10:02 +0000 (10:10 +0000)]
[mips] Show an error on attempt to use 64-bit PC-relative relocation
The following code requests 64-bit PC-relative relocations unsupported
by MIPS ABI. Now it triggers an assertion. It's better to show an error
message.
```
foo:
.quad bar - foo
```
Craig Topper [Sat, 29 Dec 2018 01:17:11 +0000 (01:17 +0000)]
[X86] Don't mark SEXTLOAD v4i8->v4i64 and v8i8->v8i64 as custom under vector widening legalization.
This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better.
Craig Topper [Fri, 28 Dec 2018 19:19:39 +0000 (19:19 +0000)]
[X86] Directly emit X86ISD::PMULUDQ from the ReplaceNodeResults handling of v2i8/v2i16/v2i32 multiply.
Previously we emitted a multiply and some masking that was supposed to matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly.
Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll
Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs.
I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization
Diogo N. Sampaio [Fri, 28 Dec 2018 17:14:58 +0000 (17:14 +0000)]
[AArch64] Add command-line option for SB
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.
This patch also moves to FeatureSB the old FeatureSpecRestrict.
Hiroshi Inoue [Fri, 28 Dec 2018 08:00:39 +0000 (08:00 +0000)]
[PowerPC] handle ISD:TRUNCATE in BitPermutationSelector
This is the last one in a series of patches to support better code generation for bitfield insert.
BitPermutationSelector already support ISD::ZERO_EXTEND but not TRUNCATE.
This patch adds support for ISD:TRUNCATE in BitPermutationSelector.
For example of this test case,
struct s64b {
int a:4;
int b:16;
int c:24;
};
void bitfieldinsert64b(struct s64b *p, unsigned char v) {
p->b = v;
}
By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18.
So the generated sequences with and without this patch are
without this patch
rlwinm 5, 5, 0, 28, 11 # corresponding to t18
rlwimi 5, 4, 4, 20, 27
with this patch
rlwimi 5, 4, 4, 12, 27
Max Kazantsev [Fri, 28 Dec 2018 06:08:51 +0000 (06:08 +0000)]
[LoopSimplifyCFG] Delete dead blocks in RPO
Deletion of dead blocks in arbitrary order may lead to failure
of assertion in `DeleteDeadBlock` that requires that we have
deleted all predecessors before we can delete the current block.
We should instead delete them in RPO order.
QingShan Zhang [Fri, 28 Dec 2018 03:38:09 +0000 (03:38 +0000)]
[PowerPC] Remove the implicit use of the register if it is replaced by Imm
If we are changing the MI operand from Reg to Imm, we need also handle its implicit use if have.
Zi Xuan Wu [Fri, 28 Dec 2018 02:12:55 +0000 (02:12 +0000)]
[PowerPC] Fix assert from machine verify pass that atomic pseudo expanding causes mismatched register class
For atomic value operand which less than 4 bytes need to be masked.
And the related operation to calculate the newvalue can be done in 32 bit gprc.
So just use gprc for mask and value calculation.
Chandler Carruth [Thu, 27 Dec 2018 23:40:17 +0000 (23:40 +0000)]
[CallSite removal] Add and flesh out APIs on the new `CallBase` base class that previously were only available on the `CallSite` wrapper.
Summary:
This will make migrating code easier and generally seems like a good collection
of API improvements.
Some of these APIs seem like more consistent / better naming of existing
ones. I've retained the old names for migration simplicit and am just
adding the new ones in this commit. I'll try to garbage collect these
once CallSite is gone.
Nico Weber [Thu, 27 Dec 2018 23:38:58 +0000 (23:38 +0000)]
[gn build] Add check-clang target and make it work
With this, check-clang runs and passes all of clang's lit tests. It doesn't run
any of its unit tests yet.
Like with check-lld, running just ninja -C out/gn will build all prerequisites
needed to run tests, but it won't run the tests (so that the build becomes
clean after one build). Running ninja -C out/gn check-clang will build
prerequisites if needed and run the tests. The check-clang target never becomes
clean and runs tests every time.
Craig Topper [Thu, 27 Dec 2018 01:50:40 +0000 (01:50 +0000)]
[X86] Factor the core code out of LowerSETCC into a helper that can create CMP/BT/PTEST/KORTEST etc. without making an X86ISD::SETCC node. NFCI
Make each of the helper functions only return their comparison node and the condition code. Leave X86ISD::SETCC creation to the LowerSETCC function itself.
Looking into whether we can use this code directly in BRCOND and SELECT lowering instead of going through LowerSETCC which creates an X86ISD::SETCC node we need to look through.
Craig Topper [Thu, 27 Dec 2018 01:50:38 +0000 (01:50 +0000)]
[X86] Merge getBitTestCondition into LowerAndToBT. Don't create X86ISD::SETCC node in the merged function. NFCI
Only one of the 3 callers of LowerAndToBT need the SETCC node. Two of them have to look through it to find the operands they really need. Instead create it after the one call that needs it.
LowerAndToBT now returns both the BT node and the X86 specific condition code separately.
[WebAssembly] Added basic support for if/else/end_if in MC layer.
Summary:
These instructions are currently unused in our backend, but for
completeness it is good to support them, so they can be used with
the assembler in hand-written code.
Tests are very basic, signature support missing much like other blocks.
[WebAssembly] Make assembler check for proper nesting of control flow.
Summary:
It does so using a simple nesting stack, and gives clear errors upon
violation. This is unique to wasm, since most CPUs do not have
any nested constructs.
Had to add an end of file check to the general assembler for this.
Note: if/else/end instructions are not currently supported in our
tablegen defs, so these tests will be enabled in a follow-up.
They already pass the nesting check.
Reid Kleckner [Wed, 26 Dec 2018 21:52:17 +0000 (21:52 +0000)]
[codeview] Check if this 'this' type of a method is a pointer
Fixes crash reported after r347354 for frontends that don't always emit
'this' pointers for methods. Now we will silently produce debug info
that makes functions like this look like static methods, which seems
reasonable.
Justin Lebar [Wed, 26 Dec 2018 19:12:31 +0000 (19:12 +0000)]
[NVPTX] Allow libcalls that are defined in the current module.
The patch adds a possibility to make library calls on NVPTX.
An important thing about library functions - they must be defined within
the current module. This basically should guarantee that we produce a
valid PTX assembly (without calls to not defined functions). The one who
wants to use the libcalls is probably will have to link against
compiler-rt or any other implementation.
Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.
Also, there was an issue with a DAG during legalisation. When we expand
instruction into libcall, the inner call-chain isn't being "integrated"
into outer chain. Since the last "data-flow" (call retval load) node is
located in call-chain earlier than CALLSEQ_END node, the latter becomes
a leaf and therefore a dead node (and is being removed quite fast).
Proposed here solution relies on another data-flow pseudo nodes
(ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and
instruction selection phases - we remove the pseudo instructions before
register scheduling phase.
Kang Zhang [Tue, 25 Dec 2018 03:29:51 +0000 (03:29 +0000)]
[PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue
Summary:
This patch is to fix the bug imported by rL341634.
In above submit , the the return type of ISD::ADDE is
14224: SDVTList VTs = DAG.getVTList(MVT::i64, MVT::i64),
but in fact, the second return type of ISD::ADDE should be
MVT::Glue not MVT::i64.
Nico Weber [Mon, 24 Dec 2018 23:06:29 +0000 (23:06 +0000)]
[gn build] Make NOSORT line actually work
GN wants the NOSORT line to be the first line of a comment block, not the last
line.
I sent https://gn-review.googlesource.com/c/gn/+/3560 to support having it in
the last line too, but since it will be a while until everyone has that change
even if it's expected, use the form that works today.
Craig Topper [Mon, 24 Dec 2018 19:40:20 +0000 (19:40 +0000)]
[X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ.
This is an alternative to what I attempted in D56057.
GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.
I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.
Based on a patch that Simon Pilgrim sent me in email.