Updates the MSP430 target to generate EABI-compatible libcall names.
As a byproduct, adjusts the hardware multiplier options available in
the MSP430 target, adds support for promotion of the ISD::MUL operation
for 8-bit integers, and correctly marks R11 as used by call instructions.
Davide Italiano [Thu, 11 May 2017 19:37:43 +0000 (19:37 +0000)]
[LiveVariables] Switch Kill/Defs sets to be DenseSet(s).
The testcase in PR32984 shows a non linear compile time increase
after a change that made the LoopUnroll pass more aggressive
(increasing the threshold).
My profiling shows all the time of PHI elimination goes to
llvm::LiveVariables::addNewBlock. This is because we keep
Defs/Kills registers in a SmallSet and vfind(const T &V); is O(N).
Switching to a DenseSet reduces the time spent in the pass from
297 seconds to 97 seconds. Profiling still shows a lot of time is
spent iterating the data structure, so I guess there's room for
improvement.
Dan tells me GCC uses real set operations for live registers and
it takes no-time on this testcase. Matthias points out we might
want to switch all this to LiveIntervalAnalysis so it's not entirely
sure if a rewrite is worth it.
Adam Nemet [Thu, 11 May 2017 17:06:17 +0000 (17:06 +0000)]
[SLP] Emit optimization remarks
The approach I followed was to emit the remark after getTreeCost concludes
that SLP is profitable. I initially tried emitting them after the
vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely
remember missing a few cases for example in HorizontalReduction::tryToReduce.
ORE is placed in BoUpSLP so that it's available from everywhere (notably
HorizontalReduction::tryToReduce).
We use the first instruction in the root bundle as the locator for the remark.
In order to get a sense how far the tree is spanning I've include the size of
the tree in the remark. This is not perfect of course but it gives you at
least a rough idea about the tree. Then you can follow up with -view-slp-tree
to really see the actual tree.
This patch is the first in a series of patches to provide code gen for
doing compares in GPRs when the compare result is required in a GPR.
It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64
extensions. This first patch handles equality comparison on i32 operands with
the result sign or zero extended.
Tim Northover [Thu, 11 May 2017 14:51:43 +0000 (14:51 +0000)]
Modules: fix modules build.
A recent commit made GlobalVariable.h depend on intrinsics generation, so (I
think) it needs to be in the lower-level module. I'll confirm with others, but
this should fix the bots.
Javed Absar [Thu, 11 May 2017 12:28:08 +0000 (12:28 +0000)]
[IR] Allow attributes with global variables
This patch extends llvm-ir to allow attributes to be set on global variables.
An RFC was sent out earlier by my colleague James Molloy: http://lists.llvm.org/pipermail/cfe-dev/2017-March/053100.html
A key part of that proposal was to extend LLVM-IR to carry attributes on global variables.
This generic feature could be useful for multiple purposes.
In our present context, it would be useful to carry user specified sections for bss/rodata/data.
Reviewed by: Jonathan Roelofs, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D32009
Ayal Zaks [Thu, 11 May 2017 11:36:33 +0000 (11:36 +0000)]
[LV] Refactor ILV.vectorize{Loop}() by introducing LVP.executePlan(); NFC
Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and
refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from
ILV to LVP. These changes facilitate building VPlans and using them to generate
code, following https://reviews.llvm.org/D28975 and its tentative breakdown.
Method ILV.createEmptyLoop() is renamed ILV.createVectorizedLoopSkeleton() to
improve clarity; it's contents remain intact.
It turned out that MSan was incorrectly calculating the shadow for int comparisons: it was done by truncating the result of (Shadow1 OR Shadow2) to i1, effectively rendering all bits except LSB useless.
This approach doesn't work e.g. in the case where the values being compared are even (i.e. have the LSB of the shadow equal to zero).
Instead, if CreateShadowCast() has to cast a bigger int to i1, we replace the truncation with an ICMP to 0.
This patch doesn't affect the code generated for SPEC 2006 binaries, i.e. there's no performance impact.
For the test case reported in PR32842 MSan with the patch generates a slightly more efficient code:
Chandler Carruth [Thu, 11 May 2017 10:52:16 +0000 (10:52 +0000)]
[x86] Fix a failure to select with AVX-512 when the type legalizer
manages to form a VSELECT with a non-i1 element type condition. Those
are technically allowed in SDAG (at least, the generic type legalization
logic will form them and I wouldn't want to try to audit everything te
preclude forming them) so we need to be able to lower them.
This isn't too hard to implement. We mark VSELECT as custom so we get
a chance in C++, add a fast path for i1 conditions to get directly
handled by the patterns, and a fallback when we need to manually force
the condition to be an i1 that uses the vptestm instruction to turn
a non-mask into a mask.
This, unsurprisingly, generates awful code. But it at least doesn't
crash. This was actually impacting open source packages built with LLVM
for AVX-512 in the wild, so quickly landing a patch that at least stops
the immediate bleeding.
I think I've found where to fix the codegen quality issue, but less
confident of that change so separating it out from the thing that
doesn't change the result of any existing test case but causes mine to
not crash.
Diana Picus [Thu, 11 May 2017 09:45:57 +0000 (09:45 +0000)]
[ARM][GlobalISel] Legalize narrow scalar ops by widening
This is the same as r292827 for AArch64: we widen 8- and 16-bit ADD, SUB
and MUL to 32 bits since we only have TableGen patterns for 32 bits.
See the commit message for r292827 for more details.
At this point we could just remove some of the tests for regbankselect
and instruction-select, since we're not going to see any narrow
operations at those levels anymore. Instead I decided to update them
with G_ANYEXT/G_TRUNC operations, so we can validate the full sequences
generated by the legalizer.
Diana Picus [Thu, 11 May 2017 08:28:31 +0000 (08:28 +0000)]
[ARM][GlobalISel] Support for G_ANYEXT
G_ANYEXT can be introduced by the legalizer when widening scalars. Add
support for it in the register bank info (same mapping as everything
else) and in the instruction selector.
When selecting it, we treat it as a COPY, just like G_TRUNC. On this
occasion we get rid of some assertions in selectCopy so we can reuse it.
This shouldn't be a problem at the moment since we're not supporting any
complicated cases (e.g. FPR, different register banks). We might want to
separate the paths when we do.
Igor Breger [Thu, 11 May 2017 06:36:37 +0000 (06:36 +0000)]
[X86] Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp. NFC
Summary:
Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp so it can be used by GloabalIsel instruction selector.
This is a pre-commit for a patch I'm working on to support G_ICMP. NFC.
Sanjay Patel [Wed, 10 May 2017 21:33:55 +0000 (21:33 +0000)]
[InstCombine] remove fold that swaps xor/or with constants; NFCI
// (X ^ C1) | C2 --> (X | C2) ^ (C1&~C2)
This canonicalization was added at:
https://reviews.llvm.org/rL7264
By moving xors out/down, we can more easily combine constants. I'm adding
tests that do not change with this patch, so we can verify that those kinds
of transforms are still happening.
This is no-functional-change-intended because there's a later fold:
// (X^C)|Y -> (X|Y)^C iff Y&C == 0
...and demanded-bits appears to guarantee that any fold that would have
hit the fold we're removing here would be caught by that 2nd fold.
Similar reasoning was used in:
https://reviews.llvm.org/rL299384
The larger motivation for removing this code is that it could interfere with
the fix for PR32706:
https://bugs.llvm.org/show_bug.cgi?id=32706
Ie, we're not checking if the 'xor' is actually a 'not', so we could reverse
a 'not' optimization and cause an infinite loop by altering an 'xor X, -1'.
Craig Topper [Wed, 10 May 2017 20:01:48 +0000 (20:01 +0000)]
[ConstantRange] Fix the early out in ConstantRange::multiply for positive numbers to really do what the comment says
r271020 added an early out to skip the signed multiply portion of ConstantRange::multiply. The comment says we don't need to do signed multiply if the range is only positive numbers, but the implemented check only ensures that the start of the range is positive. It doesn't look at the end of the range.
This patch checks the end of the range instead. Because Upper is one more than the end we have to see if its positive or if its one past the last positive number.
Nirav Dave [Wed, 10 May 2017 19:53:41 +0000 (19:53 +0000)]
[SDAG] Relax conditions under stores of loaded values can be merged
Summary:
Allow consecutive stores whose values come from consecutive loads to
merged in the presense of other uses of the loads. Previously this was
disallowed as in general the merged load cannot be shared with the
other uses. Merging N stores into 1 may cause as many as N redundant
loads. However in the context of caching this should have neglible
affect on memory pressure and reduce instruction count making it
almost always a win.
Teresa Johnson [Wed, 10 May 2017 18:52:16 +0000 (18:52 +0000)]
Ensure non-null ProfileSummaryInfo passed to ModuleSummaryIndex builder
This fixes a ubsan bot failure after r302597, which made getProfileCount
non-static, but ended up invoking it on a null ProfileSummaryInfo object
in some cases from buildModuleSummaryIndex.
Most testing passed because the non-static getProfileCount currently
doesn't access any member variables, but I found this when testing a
follow on patch (D32877) that adds a member variable access.
Craig Topper [Wed, 10 May 2017 18:15:24 +0000 (18:15 +0000)]
[APInt] Make toString use udivrem instead of calling the divide helper method directly. Do a better job of reusing allocations while looping. NFCI
This lets toString take advantage of the degenerate case checks in udivrem and is just generally cleaner.
One minor downside of this is that the divisor APInt now needs to be the same size as Tmp which requires an additional allocation. But we were doing a poor job of reusing allocations before so the new code should still be an improvement.
Craig Topper [Wed, 10 May 2017 18:15:20 +0000 (18:15 +0000)]
[APInt] Use uint32_t instead of unsigned for the storage type throughout the divide code. Use Lo_32/Hi_32/Make_64 helpers instead of casts and shifts. NFCI
Amara Emerson [Wed, 10 May 2017 15:15:38 +0000 (15:15 +0000)]
[AArch64] Enable use of reduction intrinsics.
The new experimental reduction intrinsics can now be used, so I'm enabling this
for AArch64. We will need this for SVE anyway, so it makes sense to do this for
NEON reductions as well.
The existing code to match shufflevector patterns are replaced with a direct
lowering of the reductions to AArch64-specific nodes. Tests updated with the
new, simpler, representation.
Sanjay Patel [Wed, 10 May 2017 14:54:49 +0000 (14:54 +0000)]
[InstCombine] remove redundant tests
The first test in this file is duplicated exactly in and.ll -> test33.
We have commuted and vector variants there too.
The second test is a composite of 2 folds. The first fold is tested
independently in add.ll -> flip_and_mask (including vector variant).
After that transform fires, the IR is identical to the first transform.
Ulrich Weigand [Wed, 10 May 2017 14:20:15 +0000 (14:20 +0000)]
[SystemZ] Add miscellaneous instructions
This adds a few missing instructions for the assembler and
disassembler. Those should be the last missing general-
purpose (Chapter 7) instructions for the z10 ISA.
Ulrich Weigand [Wed, 10 May 2017 14:18:47 +0000 (14:18 +0000)]
[SystemZ] Add missing arithmetic instructions
This adds the remaining general arithmetic instructions
for assembler / disassembler use. Most of these are not
useful for codegen; a few might be, and those are listed
in the README.txt for future improvements.
Sam Clegg [Wed, 10 May 2017 14:18:11 +0000 (14:18 +0000)]
[llvm-readobj] Improve errors on invalid binary
The previous code was discarding the error message from
createBinary() by calling errorToErrorCode().
This meant that such error were always reported unhelpfully
as "Invalid data was encountered while parsing the file".
Other tools such as llvm-objdump already produce a more
the error message in this case.
This is another step towards favoring 'not' ops over random 'xor' in IR:
https://bugs.llvm.org/show_bug.cgi?id=32706
This transformation may have occurred in longer IR sequences using computeKnownBits,
but that could be much more expensive to calculate.
As the scalar result shows, we do not currently favor 'not' in all cases. The 'not'
created by the transform is transformed again (unnecessarily). Vectors don't have
this problem because vectors are (wrongly) excluded from several other combines.
[LLVM][inline-asm] Altmacro string escape character '!'
This patch is the fourth patch in a series of reviews for the Altmacro feature.
This patch introduces a new escape character '!' and it depends on D32701.
according to https://sourceware.org/binutils/docs/as/Altmacro.html:
"single-character string escape
To include any single character literally in a string (even if the character would otherwise have some special meaning), you can prefix the character with !' (an exclamation mark). For example, you can write <4.3 !> 5.4!!>' to get the literal text `4.3 > 5.4!'. "
Mikael Holmen [Wed, 10 May 2017 13:06:13 +0000 (13:06 +0000)]
[IfConversion] Add missing check in IfConversion/canFallThroughTo
Summary:
When trying to figure out if MBB could fallthrough to ToMBB (possibly by
falling through a bunch of other MBBs) we didn't actually check if there
was fallthrough between the last two blocks in the chain.
Jonas Paulsson [Wed, 10 May 2017 13:03:25 +0000 (13:03 +0000)]
[SystemZ] Implement getRepRegClassFor()
This method must return a valid register class, or the list-ilp isel
scheduler will crash. For MVT::Untyped nullptr was previously returned, but
now ADDR128BitRegClass is returned instead. This is needed just as long as
list-ilp (and probably also list-hybrid) is still there.
Review: Ulrich Weigand, A Trick
https://reviews.llvm.org/D32802
Ulrich Weigand [Wed, 10 May 2017 12:39:11 +0000 (12:39 +0000)]
[SystemZ] Reformat assembler/disassembler tests
The assembler and disassmebler test cases started out formatted and
sorted in a particular way, but this got lost over time as patches
were added. Reformat them again. NFC.
Chandler Carruth [Wed, 10 May 2017 12:30:07 +0000 (12:30 +0000)]
Revert r301950: SpeculativeExecution: Stop using whitelist for costs
This pass doesn't correctly handle testing for when it is legal to hoist
arbitrary instructions. The whitelist happens to make it safe, so before
it is removed the pass's legality checks will need to be enhanced.
Details have been added to the code review thread for the patch.
Martin Storsjo [Wed, 10 May 2017 10:51:32 +0000 (10:51 +0000)]
[AArch64] Fix a comment to match the code. NFC.
For the ELF case, the default/preferred form is the generic one, not
the short one as used for Apple - fix the comment to say so. Currently
it is a copy-paste typo.
Make the comments on the darwin default a bit more verbose.
Use enum names instead of literal 0/1 to further increase readability
and reduce fragility.
Mikael Holmen [Wed, 10 May 2017 06:33:43 +0000 (06:33 +0000)]
[UnreachableBlockElim] Check return value of constrainRegClass().
Summary:
MachineRegisterInfo::constrainRegClass() can fail if two register classes
don't have a common subclass or if the register class doesn't contain
enough registers. Check the return value before trying to remove Phi nodes,
and if we can't constrain, we output a COPY instead of simply replacing
registers.
Ahmed Bougacha [Wed, 10 May 2017 00:39:30 +0000 (00:39 +0000)]
[CodeGen] Don't require AA in SDAGISel at -O0.
Before r247167, the pass manager builder controlled which AA
implementations were used, exporting them all in the AliasAnalysis
analysis group.
Now, AAResultsWrapperPass always uses BasicAA, but still uses other AA
implementations if made available in the pass pipeline.
But regardless, SDAGISel is required at O0, and really doesn't need to
be doing fancy optimizations based on useful AA results.
Don't require AA at CodeGenOpt::None, and only use it otherwise.
This does have a functional impact (and one testcase is pessimized
because we can't reuse a load). But I think that's desirable no matter
what.
Note that this alone doesn't result in less DT computations: TwoAddress
was previously able to reuse the DT we computed for SDAG. That will be
fixed separately.
Ahmed Bougacha [Wed, 10 May 2017 00:39:25 +0000 (00:39 +0000)]
[CodeGen] Compute DT/LI lazily in SafeStackLegacyPass. NFC.
We currently require SCEV, which requires DT/LI. Those are expensive to
compute, but the pass only runs for functions that have the safestack
attribute.
Compute DT/LI to build SCEV lazily, only when the pass is actually going
to transform the function.
Ahmed Bougacha [Wed, 10 May 2017 00:39:17 +0000 (00:39 +0000)]
[CodeGen] Add an -O0 backend pipeline test. NFC.
This should hopefully makes changes to the O0 pipeline obvious; it's
easy to require expensive passes, and this helps make informed
decisions.
Case in point: in the few weeks separating the time when I initially
wrote this patch to the time when I committed, the test regressed as
r302103 added another use of DT!
Sam Clegg [Wed, 10 May 2017 00:14:04 +0000 (00:14 +0000)]
[WebAssembly] Fix build error in wasm YAML code
This warning didn't show up on my local build
but is causing the bots to fail. Seems like a
bad idea to have types and variables with the
same name anyhow.