[AArch64][GlobalISel] Don't do extending loads combine for non-pow-2 types.
Since non-pow-2 types are going to get split up into multiple loads anyway,
don't do the [SZ]EXTLOAD combine for those and save us trouble later in
legalization.
[LSR] Rewrite misses some fixup locations if it splits critical edge
If LSR splits a critical edge while rewriting phi operands and the
phi node has other pending fixup operands, we need to
update those pending fixups. Otherwise the formulae will not be
implemented completely and some instructions will not be eliminated.
Response file expansion limits the amount of expansion to prevent
potential infinite recursion. However, the current logic assumes that
any argument beginning with @ is a response file, which is not true for
e.g. `-Xlinker -rpath -Xlinker @executable_path/../lib` on Darwin.
Having too many of these non-response file arguments beginning with @
prevents actual response files from being expanded. Instead, limit based
on the number of successful response file expansions, which should still
prevent infinite recursion but also avoid false positives.
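The counting scheme described above can be sketched as a small standalone Python model. This is not the code from the Support library: the `files` dictionary is a hypothetical in-memory stand-in for the file system, and the `max_expansions` default is an assumed value chosen only for illustration.

```python
def expand_response_files(args, files, max_expansions=20):
    """Expand @file arguments, capping the number of successful expansions.

    `files` is a hypothetical stand-in for the file system: it maps a
    response file name to the list of arguments that file contains.
    Returns (expanded_args, ok); ok is False if the cap was hit.
    """
    result = list(args)
    expansions = 0
    i = 0
    while i < len(result):
        arg = result[i]
        if not arg.startswith("@") or arg[1:] not in files:
            # Not a response file (e.g. @executable_path/../lib): keep it
            # verbatim and do not count it against the limit.
            i += 1
            continue
        if expansions == max_expansions:
            # Too many successful expansions, most likely a
            # self-referential response file. Give up.
            return result, False
        expansions += 1
        # Splice the file contents in place and do not advance, so that
        # nested @file arguments are expanded on the next iteration.
        result[i:i + 1] = files[arg[1:]]
    return result, True
```

With this scheme, any number of `@executable_path/...` arguments can precede a real response file without blocking its expansion, while a file that names itself still terminates once the cap is reached.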
[Support] Add a test for recursive response file expansion
I'm going to be modifying the logic to avoid infinitely recursing on
self-referential response files, so add a unit test to verify the
expected behavior.
[X86] Fix a stack folding test to have a full xmm2-31 clobber list instead of stopping at xmm15. Add an additional dependency to keep instruction below inline asm block.
Philip Reames [Mon, 15 Apr 2019 18:15:08 +0000 (18:15 +0000)]
[LoopPred] Stop passing around builders [NFC]
This is a preparatory patch for D60093. This patch itself is NFC, but while preparing this I noticed and committed a small hoisting change in rL358419.
The basic structure of the new scheme is that we pass around the guard ("the using instruction"), and select an optimal insert point by examining operands at each construction point. This seems conceptually a bit cleaner to start with as it isolates the knowledge about insertion safety at the actual insertion point.
Note that the non-hoisting path is not actually used at the moment. That's not exercised until D60093 is rebased on this one.
Don Hinton [Mon, 15 Apr 2019 17:18:10 +0000 (17:18 +0000)]
[CommandLineParser] Add DefaultOption flag
Summary: Add DefaultOption flag to CommandLineParser which provides a
default option or alias, but allows users to override it for some
other purpose as needed.
Also, add `-h` as a default alias to `-help`, which can be seamlessly
overridden by applications like llvm-objdump and llvm-readobj which
use `-h` as an alias for other options.
(relanding after revert, r358414)
Added DefaultOptions.clear() to reset().
llvm-undname: Fix nullptr deref on invalid conversion operator names in template args
A ConversionOperatorIdentifierNode has a TargetType which is read when
printing it, but if the ConversionOperatorIdentifierNode appears in a
template argument there's nothing that can provide the TargetType.
Normally the COIN is a symbol (leaf) name and takes its TargetType from the
symbol's type, but in a template argument context the COIN can only be
either a non-leaf name piece or a type, and must hence be invalid.
Similar to the COIN check in demangleDeclarator().
Philip Reames [Mon, 15 Apr 2019 15:53:25 +0000 (15:53 +0000)]
[LoopPred] Hoist and of predicated checks where legal
If we have multiple range checks which can be predicated, hoist the and of the results outside the loop. This minorly cleans up the resulting IR, but the main motivation is as a building block for D60093.
Fangrui Song [Mon, 15 Apr 2019 13:32:41 +0000 (13:32 +0000)]
[llvm-objdump] Align instructions to a tab stop in disassembly output
Summary:
In GNU objdump, -w/--wide aligns instructions in the disassembly output.
This patch does the same to llvm-objdump. However, we always use the
wide format (-w/--wide is ignored), because the narrow format
(instructions are misaligned) is probably not very useful.
In llvm-readobj, we made a similar decision: always use the wide format,
accept but ignore -W/--wide.
To save some columns, we change the tab before hex bytes (controlled by
--[no-]show-raw-insn) to a space.
Tim Northover [Mon, 15 Apr 2019 12:04:10 +0000 (12:04 +0000)]
DAG: propagate ConsecutiveRegs flags to returns too.
Arguments already have a flag to inform backends when they have been split up.
The AArch64 arm64_32 ABI makes use of these on return types too, so that code
emitted for armv7k can be ABI-compliant.
There should be no CodeGen changes yet, just making more information available.
Tim Northover [Mon, 15 Apr 2019 12:03:54 +0000 (12:03 +0000)]
DAG: propagate whether an arg is a pointer for CallingConv decisions.
The arm64_32 ABI specifies that pointers (despite being 32-bits) should be
zero-extended to 64-bits when passed in registers for efficiency reasons. This
means that the SelectionDAG needs to be able to tell the backend that an
argument was originally a pointer, which is implemented here.
Additionally, some memory intrinsics need to be declared as taking an i8*
instead of an iPTR.
There should be no CodeGen change yet, but it will be triggered when AArch64
backend support for ILP32 is added.
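As an illustration of why the backend needs to know that a value was a pointer, the following Python sketch contrasts zero and sign extension of a 32-bit value into a 64-bit register. The helper names are hypothetical and the example address is arbitrary.

```python
def zero_extend(value, from_bits):
    """Keep the low from_bits bits; all higher bits become zero."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value, from_bits, to_bits):
    """Replicate the top bit of a from_bits value into a to_bits value."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


# A 32-bit pointer with its top bit set (arbitrary example address).
ptr = 0x80001000
print(hex(zero_extend(ptr, 32)))      # 0x80001000
print(hex(sign_extend(ptr, 32, 64)))  # 0xffffffff80001000
```

Only the zero-extended form is directly usable as a 64-bit address for a pointer in the upper half of the 32-bit space, so an integer and a pointer of the same width cannot be treated identically.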
This patch changes the error message when the section specified by
--string-dump cannot be found by including the name of the section in
the error message and changing the prefix text to not imply that the
file itself was invalid. As part of this change some uses of
std::error_code have been replaced with the llvm Error class to better
encapsulate the error info (rather than passing File strings around),
and the WithColor class replaces string literal error prefixes.
Jeremy Morse [Mon, 15 Apr 2019 10:23:22 +0000 (10:23 +0000)]
[Docs] Switch a code block from LLVM to text
While I can't replicate this locally, it looks like the buildbots don't
recognize the IR block in r358385 l764 as IR. Downgrade it to being just
text while I look into it.
FileCheck [1/12]: Move variable table in new object
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch adds a new class to hold
pattern matching global state.
The table holding the values of FileCheck variables constitutes some sort
of global state for the matching phase, yet is passed as a parameter to
all functions using it. This commit creates a new FileCheckPatternContext
class pointed at from FileCheckPattern. While it increases the line
count, it separates local data from global state. Later commits build
on that to add numeric expression global state to that class.
Copyright:
- Linaro (changes up to diff 183612 of revision D55940)
- GraphCore (changes in later versions of revision D55940 and
in new revision created off D55940)
Simon Tatham [Mon, 15 Apr 2019 10:06:26 +0000 (10:06 +0000)]
[TableGen] Include schedule model name in diagnostic.
If you have more than one schedule model in your TableGen target
definitions, then the diagnostic "No schedule information for
instruction 'foo'" is rather unhelpful, because it doesn't tell you
_which_ schedule model is missing the necessary information (or, as it
might be, missing the UnsupportedFeatures definition that would stop
it thinking it needed it).
Extended the message to include the name of the schedule model that
it's complaining about.
This patch adds documentation explaining how variable location information is
compiled from the IR representation down to the end of the codegen pipeline,
but avoiding discussion of file formats and encoding.
This should make it clearer how the dbg.value / dbg.declare etc intrinsics
are transformed and arranged into DBG_VALUE instructions, and their meaning.
[SelectionDAG] Use KnownBits::computeForAddSub/computeForAddCarry
Summary:
Use KnownBits::computeForAddSub/computeForAddCarry
in SelectionDAG::computeKnownBits when doing value
tracking for addition/subtraction.
This should improve the precision of the known bits,
as we only used to make a simple estimate of known
zeroes. The KnownBits support functions are also
able to deduce bits that are known to be one in the
result.
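The following Python sketch models the kind of carry-aware computation that makes it possible to deduce known-one bits for an addition. It is a standalone model under assumptions, not the KnownBits source: a value's known bits are represented as a hypothetical `(zero, one)` pair of masks, and the carry-in is assumed to be zero.

```python
def known_bits_add(lhs, rhs, width):
    """Known bits of lhs + rhs, where each operand is a (zero, one) mask pair."""
    mask = (1 << width) - 1
    lz, lo = lhs
    rz, ro = rhs
    # Largest and smallest possible sums (truncated to width bits).
    max_sum = ((~lz & mask) + (~rz & mask)) & mask
    min_sum = (lo + ro) & mask
    # A carry into a bit is known zero if even the maximal operands produce
    # no carry there, and known one if even the minimal operands produce one.
    carry_known_zero = ~(max_sum ^ lz ^ rz) & mask
    carry_known_one = (min_sum ^ lo ^ ro) & mask
    # A result bit is known only if both operand bits and the carry are known.
    known = (carry_known_zero | carry_known_one) & (lz | lo) & (rz | ro)
    return (~max_sum & known & mask, min_sum & known)


# Both operands look like xx01 in binary, so the sum must look like xx10.
zero, one = known_bits_add((0b0010, 0b0001), (0b0010, 0b0001), 4)
print(bin(zero), bin(one))  # 0b1 0b10
```

A simple estimate based only on known zeroes would learn nothing about this sum, since neither operand has trailing zero bits; tracking the carry lets bit 1 be proven one and bit 0 be proven zero.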
[X86] Regenerate checks for domain-reassignment.mir
Apparently there are some stray IMPLICIT_DEF operations that weren't in the
checks. Not sure if they've always been there or something changed at some
point.
[GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.
This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.
I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.
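The constant-only policy can be pictured with a toy builder in Python. This is a simplified model, not the CSEMIRBuilder API: the class, its method names, and the tuple encoding of instructions are all hypothetical, and real CSE has scoping rules that this sketch ignores.

```python
class ConstantCSEBuilder:
    """Toy builder that reuses virtual registers for identical constants only."""

    def __init__(self):
        self.instructions = []   # emitted instructions, as plain tuples
        self._constants = {}     # (type, value) -> vreg already holding it
        self._next_vreg = 0

    def _new_vreg(self):
        vreg = f"%{self._next_vreg}"
        self._next_vreg += 1
        return vreg

    def build_constant(self, ty, value):
        key = (ty, value)
        if key in self._constants:
            # Return the vreg of the existing G_CONSTANT; no COPY is needed
            # because the caller did not pre-create a destination vreg.
            return self._constants[key]
        vreg = self._new_vreg()
        self.instructions.append(("G_CONSTANT", vreg, ty, value))
        self._constants[key] = vreg
        return vreg

    def build_add(self, ty, lhs, rhs):
        # Other opcodes are never CSE'd in this model.
        vreg = self._new_vreg()
        self.instructions.append(("G_ADD", vreg, ty, lhs, rhs))
        return vreg


b = ConstantCSEBuilder()
c1 = b.build_constant("s32", 42)
c2 = b.build_constant("s32", 42)   # same vreg as c1
a1 = b.build_add("s32", c1, c2)
a2 = b.build_add("s32", c1, c2)    # a fresh vreg, not shared with a1
```

Letting the builder choose the destination register is what allows the second `build_constant` call to emit nothing at all.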
[GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs.
Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE
configs that can be passed between TargetPassConfig and the targets' custom
pass configs. This CSEConfigBase allows targets to create custom CSE configs
which are then used by the GISel passes for the CSEMIRBuilder.
This support will be used in a follow up commit to allow constant-only CSE for
-O0 compiles in D60580.
[X86] Redefine KUNPCK instructions to take a narrower source register class than destination register class. Remove copies from the isel output pattern.
There's no reason for the inputs to be the destination register class. This just
forces an unnecessary copy in the output patterns.
[X86] Change the order of IMUL with immediate instructions so that ri8 instructions come before ri/ri32 instructions.
This will ensure IMUL64ri8 is tried before IMUL64ri32. For IMUL32 and IMUL16 the
order doesn't really matter because only the ri8 versions use a predicate. That
automatically gives them priority.
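Why the order matters for the 64-bit form can be shown with a deliberately simplified Python model in which candidates are tried in list order and the first one whose immediate predicate holds is selected. The pattern table and helper names are hypothetical; real instruction selection is driven by a generated matcher table, not by a list like this.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


# (instruction name, width of the sign-extended immediate it accepts)
RI8_FIRST = [("IMUL64ri8", 8), ("IMUL64ri32", 32)]


def select_imul64(imm, patterns):
    """Return the first pattern whose immediate predicate accepts imm."""
    for name, bits in patterns:
        if fits_signed(imm, bits):
            return name
    return None


print(select_imul64(5, RI8_FIRST))                  # IMUL64ri8
print(select_imul64(5, list(reversed(RI8_FIRST))))  # IMUL64ri32
print(select_imul64(1000, RI8_FIRST))               # IMUL64ri32
```

Because a small immediate satisfies both predicates, the narrower encoding is chosen only when it is tried first.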
[X86] Move VPTESTM matching from the isel table to custom code in X86ISelDAGToDAG.
We had many tablegen patterns for these instructions. And due to the
commutability of the patterns, tablegen expands them to even more patterns. All
together VPTESTMD patterns accounted for more than 50K of the 610K isel table.
This had gotten bad when we stopped canonicalizing AND to vXi64. This required
a pattern for every combination of bitcast input type.
This change moves the matching to custom code where it is easier to look through
the bitcasts without being concerned with the specific types.
The test changes are because we are now stricter with one use checks as it's
required to make load folding legal. We now require the AND and any BITCAST to
only have a single use. This prevents forming VPTESTM and a VPAND with the same
inputs.
We now support broadcast loads for 128/256 patterns without VLX. We'll widen to
512-bit and still fold the broadcast since the amount of memory read
doesn't change.
There are a few tests that got slightly longer because we are now preferring
load + VPTESTM over XOR+VPCMPEQ for (seteq (load), allzeros). Previously we were
able to share the XOR with multiple VPTESTM instructions.
[X86] Don't form masked vpcmp/vcmp/vptestm operations if the setcc node has more than one use.
We're better off emitting a single compare + kand rather than a compare for the
other use and a masked compare.
I'm looking into using custom instruction selection for VPTESTM to reduce the
ridiculous number of permutations of patterns in the isel table. Putting a one
use check on all masked compare folding makes load fold matching in the custom
code easier.
Fangrui Song [Sun, 14 Apr 2019 07:20:03 +0000 (07:20 +0000)]
[Mem2Reg] Simplify and micro optimize
* Rearrange continue/break
* BBNumbers.lookup(A) -> BBNumbers.find(A)->second
BBNumbers has been computed, thus we can assume the value exists in the predicate.
Fangrui Song [Sun, 14 Apr 2019 04:45:04 +0000 (04:45 +0000)]
[ConstantRange] Delete unused getSetSize
getSetSize returns an APInt that is 1 bit wider. The APInt is typically 65-bit and requires memory allocation. isSizeStrictlySmallerThan and isSizeLargerThan are preferred. The last use of this helper method was removed by rL302385.
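The extra bit is needed because the full set of N-bit values has 2^N elements, which does not fit in N bits. A short Python check of that arithmetic (the function name is hypothetical):

```python
def set_size_bits(width):
    """Bits needed to store the size of the full set of width-bit values."""
    full_set_size = 1 << width   # 2**width distinct values
    return full_set_size.bit_length()


print(set_size_bits(64))  # 65
print(set_size_bits(8))   # 9
```

For the common 64-bit case the size therefore needs 65 bits, one more than fits in a single 64-bit word.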
Philip Reames [Sat, 13 Apr 2019 22:12:56 +0000 (22:12 +0000)]
[Tests] Add tests for D60659, and make adjustments to others to make diff clear
Three related changes:
1) auto-gen several test files
2) Add the new tests at the bottom of said files
3) Adjust a couple of other test files not to use stores to constants when trying to test constexpr address handling
[ConstantRange] Disallow NUW | NSW in makeGuaranteedNoWrapRegion()
As motivated in D60598, this drops support for specifying both NUW and
NSW in makeGuaranteedNoWrapRegion(). None of the users of this function
currently make use of this.
When both NUW and NSW are specified, the exact nowrap region has two
disjoint parts and makeGNWR() returns one of them. This result doesn't
seem to be useful for anything, but makes the semantics of the function
fuzzier.
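The two disjoint parts can be seen by brute force on a small example. The Python sketch below enumerates, for an 8-bit add of the constant 1, every value for which the addition wraps neither unsigned nor signed. It is a standalone illustration with hypothetical helper names, not the ConstantRange implementation.

```python
def to_signed(x, width):
    """Reinterpret an unsigned width-bit value as two's complement."""
    return x - (1 << width) if x >> (width - 1) else x


def nowrap_add_values(c, width, nuw, nsw):
    """All unsigned values x for which x + c obeys the requested flags."""
    result = []
    for x in range(1 << width):
        if nuw and x + c >= (1 << width):
            continue  # unsigned wrap
        if nsw:
            s = to_signed(x, width) + to_signed(c, width)
            if not -(1 << (width - 1)) <= s < (1 << (width - 1)):
                continue  # signed wrap
        result.append(x)
    return result


def contiguous_runs(values):
    """Group a sorted list of integers into inclusive (first, last) runs."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v - 1:
            runs[-1][1] = v
        else:
            runs.append([v, v])
    return [tuple(r) for r in runs]


print(contiguous_runs(nowrap_add_values(1, 8, nuw=True, nsw=True)))
# [(0, 126), (128, 254)]
```

NUW alone excludes only 255 and NSW alone excludes only 127, so each is a single (possibly wrapped) range. Requiring both removes two non-adjacent points from the circle of 8-bit values, which a single range cannot describe.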
As pointed out in D60518 folding mulo(%x, undef) to {undef, undef}
isn't correct. As a correct version of this already exists in
InstructionSimplify (https://github.com/llvm-mirror/llvm/blob/bd8056ef326e075cc500f3f0cfcd1193bc200594/lib/Analysis/InstructionSimplify.cpp#L4750-L4757) this is just
dead code though. Drop it together with the mul(%x, 0) -> {0, false}
fold that is also already handled by InstSimplify.
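One way to see the problem with a {undef, undef} result is to enumerate what an unsigned multiply-with-overflow can actually return for a fixed %x. The Python sketch below is an illustration under assumptions (4-bit unsigned multiply, hypothetical helper names) and is not taken from the D60518 discussion.

```python
def umul_with_overflow(x, y, width):
    """Return (truncated product, overflow flag) for an unsigned multiply."""
    product = x * y
    return product & ((1 << width) - 1), product >= (1 << width)


def achievable_results(x, width):
    """Every (result, overflow) pair obtainable by choosing the other operand."""
    return {umul_with_overflow(x, y, width) for y in range(1 << width)}


pairs = achievable_results(1, 4)
print(len(pairs))                   # 16
print(any(ov for _, ov in pairs))   # False
```

With %x = 1 the overflow bit can never be set, so only 16 of the 32 pairs that {undef, undef} would permit are reachable. The pair (0, False) is reachable for every %x by choosing 0 for the other operand.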
Don Hinton [Sat, 13 Apr 2019 16:55:28 +0000 (16:55 +0000)]
[CommandLineParser] Add DefaultOption flag
Summary: Add DefaultOption flag to CommandLineParser which provides a
default option or alias, but allows users to override it for some
other purpose as needed.
Also, add `-h` as a default alias to `-help`, which can be seamlessly
overridden by applications like llvm-objdump and llvm-readobj which
use `-h` as an alias for other options.
Philip Reames [Sat, 13 Apr 2019 03:55:13 +0000 (03:55 +0000)]
[StackMaps] Update llvm-readobj to parse V3 Stackmaps
This updates the StackMap parser in the llvm-readobj tool to parse version 3 StackMaps, which were bumped in https://reviews.llvm.org/D32629.
Version 3 StackMaps differ in that they have a uint16 sized "location size" field which was added to the Location block in a StackMap record. The record has additional padding for alignment. This was a backwards incompatible change resulting in a StackMap version bump.
Patch By: jacob.hughes@kcl.ac.uk (with a rewrite of tests by me)
Differential Revision: https://reviews.llvm.org/D59020
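A Location entry with the added size field could be decoded along the following lines. This Python sketch assumes the version 3 field order as I understand it from the stack map format documentation (kind, reserved byte, 16-bit location size, 16-bit DWARF register number, reserved 16 bits, 32-bit offset or small constant) and assumes little-endian data; it is not the llvm-readobj parser, and the names are hypothetical.

```python
import struct

# Assumed V3 Location layout: uint8 kind, uint8 reserved, uint16 size,
# uint16 DWARF register number, uint16 reserved, int32 offset/constant.
LOCATION_V3 = struct.Struct("<BBHHHi")

# Assumed encoding of the location kind byte.
KINDS = {1: "Register", 2: "Direct", 3: "Indirect", 4: "Constant", 5: "ConstantIndex"}


def parse_location(data, offset=0):
    """Decode one Location entry starting at the given byte offset."""
    kind, _reserved0, size, regnum, _reserved1, value = LOCATION_V3.unpack_from(data, offset)
    return {
        "kind": KINDS.get(kind, "Unknown"),
        "size": size,
        "dwarf_regnum": regnum,
        "offset_or_constant": value,
    }


raw = LOCATION_V3.pack(3, 0, 8, 7, 0, -16)
print(parse_location(raw))
```

Under these assumptions each entry occupies 12 bytes, so a parser written for the earlier, shorter layout would misread every entry after the first.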
Philip Reames [Sat, 13 Apr 2019 03:08:45 +0000 (03:08 +0000)]
[StackMaps] Add location size to llvm-readobj -stackmap output
The size field of a location can be different for each entry, so it is useful to have this displayed in the output of llvm-readobj -stackmap. Below is an example of how the output would look: