Simon Pilgrim [Tue, 16 Apr 2019 19:18:53 +0000 (19:18 +0000)]
[X86][AVX] X86ISD::PERMV/PERMV3 node types can never fold index ops
Improves codegen demonstrated by D60512 - instructions represented by X86ISD::PERMV/PERMV3 can never memory fold the operand used for their index register.
This patch updates the 'isUseOfShuffle' helper into the more capable 'isFoldableUseOfShuffle' that recognises that the op is used for a X86ISD::PERMV/PERMV3 index mask and can't be folded - allowing us to use broadcast/subvector-broadcast ops to reduce the size of the mask constant pool data.
[InstCombine] Prune fshl/fshr with masked operands
If a constant shift amount is used, then only some of the LHS/RHS
operand bits are demanded and we may be able to simplify based on
that. InstCombineSimplifyDemanded already had the necessary support
for that, we just weren't calling it with fshl/fshr as root.
In particular, this allows us to relax some masked funnel shifts
into simple shifts, as shown in the tests.
This adds a WithOverflowInst class with a few helper methods to get
the underlying binop, signedness and nowrap type and makes use of it
where sensible. There will be two more uses in D60650/D60656.
The refactorings are all NFC, though I left some TODOs where things
could be improved. In particular we have two places where add/sub are
handled but mul isn't.
Luis Marques [Tue, 16 Apr 2019 15:09:18 +0000 (15:09 +0000)]
[DAGCombiner] Add missing flag to addressing mode check
The checks in `canFoldInAddressingMode` tested for addressing modes that have a
base register but didn't set the `HasBaseReg` flag to true (it's false by
default). This patch fixes that. Although the omission of the flag was
technically incorrect it had no known observable impact, so no tests were
changed by this patch.
[SystemZ] Add missing intrinsics to intrinsics-immarg.ll
As of r356091, support for the ImmArg intrinsics was added,
including a SystemZ test case. However, that test case doesn't
actually verify all SystemZ intrinsics with immediate arguments,
only a subset. The rest of them actually works correctly, there's
just no test for them. This patch add all missing intrinsics.
[llvm][Support] Provide interface to set thread priorities
Summary:
We have a multi-platform thread priority setting function(last piece
landed with D58683), I wanted to make this available to all llvm community,
there seem to be other users of such functionality with portability fixmes:
lib/Support/CrashRecoveryContext.cpp
tools/clang/tools/libclang/CIndex.cpp
llvm-undname: Fix nullptr deref on invalid structor names in template args
Similar to r358421: A StructorIndentifierNode has a Class field which
is read when printing it, but if the StructorIndentifierNode appears in
a template argument then demangleFullyQualifiedSymbolName() which sets
Class isn't called. Since StructorIndentifierNodes are always leaf
names, we can just reject them as well.
- Make `allocUnalignedBuffer` look more like `allocArray` and `alloc`.
No behavior change.
- Change `Head->Used < Head->Capacity` to `Head->Used <= Head->Capacity`
in `allocArray` and `alloc`. No intended behavior change, might be a
minuscule memory usage improvement. Noticed this since it was the logic
used in `allocUnalignedBuffer`.
- Don't let `allocArray` alloc too small buffers for names that have
more than 512 levels of nesting (in 64-bit builds). Fixes a heap
buffer overflow found by oss-fuzz.
llvm-undname: Add a -raw-file flag to pass a raw buffer to microsoftDemangle
The default handling splits input into lines. Since
llvm-microsoft-demangle-fuzzer doesn't do this, oss-fuzz produces inputs
that only trigger crashes if the input isn't split into lines. This adds
a hidden flag -raw-file which passes file contents to microsoftDemangle() in
the same way the fuzzer does, for reproducing oss-fuzz reports.
Also change llvm-undname to have a non-0 exit code for invalid symbols.
Hans Wennborg [Tue, 16 Apr 2019 12:13:25 +0000 (12:13 +0000)]
Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)
The original commit caused false positives from AddressSanitizer's
use-after-scope checks, which have now been fixed in r358478.
> The code was previously checking that candidates for sinking had exactly
> one use or were a store instruction (which can't have uses). This meant
> we could sink call instructions only if they had a use.
>
> That limitation seemed a bit arbitrary, so this patch changes it to
> "instruction has zero or one use" which seems more natural and removes
> the need to special-case stores.
>
> Differential revision: https://reviews.llvm.org/D59936
Dmitri Gribenko [Tue, 16 Apr 2019 09:46:02 +0000 (09:46 +0000)]
Removed CMake cache upgrade code from 2011
Summary:
This code was added in r141266 to make a breaking change to CMake, but
still be compatible with existing cache files. The cache files from
2011 are irrelevant today in 2019.
Hans Wennborg [Tue, 16 Apr 2019 07:54:20 +0000 (07:54 +0000)]
Asan use-after-scope: don't poison allocas if there were untraced lifetime intrinsics in the function (PR41481)
If there are any intrinsics that cannot be traced back to an alloca, we
might have missed the start of a variable's scope, leading to false
error reports if the variable is poisoned at function entry. Instead, if
there are some intrinsics that can't be traced, fail safe and don't
poison the variables in that function.
Fangrui Song [Tue, 16 Apr 2019 03:56:55 +0000 (03:56 +0000)]
[llvm-objdump] Align instructions to a tab stop in disassembly output
This relands D60376/rL358405, with the difference: sed 'y/\t/ /' -> tr '\t' ' '
BSD sed doesn't support escape characters for the 'y' command.
I didn't use it in rL358405 because it was not listed at
https://llvm.org/docs/GettingStarted.html#software but it
should be available.
Original description:
In GNU objdump, -w/--wide aligns instructions in the disassembly output.
This patch does the same to llvm-objdump. However, we always use the
wide format (-w/--wide is ignored), because the narrow format
(instructions are misaligned) is probably not very useful.
In llvm-readobj, we made a similar decision: always use the wide format,
accept but ignore -W/--wide.
To save some columns, we change the tab before hex bytes (controlled by
--[no-]show-raw-insn) to a space.
Fangrui Song [Tue, 16 Apr 2019 02:37:29 +0000 (02:37 +0000)]
[llvm-objdump] Simplify PrintHelpMessage() logic
This relands rL358418. It missed one test that should also use -macho
Note, all the other -private-header -exports-trie tests are used
together with -macho.
[CodeExtractor] Add a few debug lines to understand why a region is not extracted
The CodeExtractor is not smart enough to compute which basic block is
the entry of a region. Instead it relies on the order of the list
of basic blocks that is handed to it and assumes that the entry
is the first block in the list.
Without the additional debug information, it is hard to understand
why a valid region does not get extracted, because we would miss
that the order of in the list just doesn't match what the CodeExtractor
wants.
The test in the dependent revision has been fixed for Windows.
Original commit message:
Response file expansion limits the amount of expansion to prevent
potential infinite recursion. However, the current logic assumes that
any argument beginning with @ is a response file, which is not true for
e.g. `-Xlinker -rpath -Xlinker @executable_path/../lib` on Darwin.
Having too many of these non-response file arguments beginning with @
prevents actual response files from being expanded. Instead, limit based
on the number of successful response file expansions, which should still
prevent infinite recursion but also avoid false positives.
Reapply [Support] Add a test for recursive response file expansion
Use the appropriate tokenizer to fix the test on Windows.
Original commit message:
I'm going to be modifying the logic to avoid infinitely recursing on
self-referential response files, so add a unit test to verify the
expected behavior.
The test breaks a Windows buildbot:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/17016/steps/test-check-all/logs/stdio
[AArch64][GlobalISel] Don't do extending loads combine for non-pow-2 types.
Since non-pow-2 types are going to get split up into multiple loads anyway,
don't do the [SZ]EXTLOAD combine for those and save us trouble later in
legalization.
[LSR] Rewrite misses some fixup locations if it splits critical edge
If LSR split critical edge during rewriting phi operands and
phi node has other pending fixup operands, we need to
update those pending fixups. Otherwise formulae will not be
implemented completely and some instructions will not be eliminated.
Response file expansion limits the amount of expansion to prevent
potential infinite recursion. However, the current logic assumes that
any argument beginning with @ is a response file, which is not true for
e.g. `-Xlinker -rpath -Xlinker @executable_path/../lib` on Darwin.
Having too many of these non-response file arguments beginning with @
prevents actual response files from being expanded. Instead, limit based
on the number of successful response file expansions, which should still
prevent infinite recursion but also avoid false positives.
[Support] Add a test for recursive response file expansion
I'm going to be modifying the logic to avoid infinitely recursing on
self-referential response files, so add a unit test to verify the
expected behavior.
[X86] Fix a stack folding test to have a full xmm2-31 clobber list instead of stopping at xmm15. Add an additional dependency to keep instruction below inline asm block.
Philip Reames [Mon, 15 Apr 2019 18:15:08 +0000 (18:15 +0000)]
[LoopPred] Stop passing around builders [NFC]
This is a preparatory patch for D60093. This patch itself is NFC, but while preparing this I noticed and committed a small hoisting change in rL358419.
The basic structure of the new scheme is that we pass around the guard ("the using instruction"), and select an optimal insert point by examining operands at each construction point. This seems conceptually a bit cleaner to start with as it isolates the knowledge about insertion safety at the actual insertion point.
Note that the non-hoisting path is not actually used at the moment. That's not exercised until D60093 is rebased on this one.
Don Hinton [Mon, 15 Apr 2019 17:18:10 +0000 (17:18 +0000)]
[CommandLineParser] Add DefaultOption flag
Summary: Add DefaultOption flag to CommandLineParser which provides a
default option or alias, but allows users to override it for some
other purpose as needed.
Also, add `-h` as a default alias to `-help`, which can be seamlessly
overridden by applications like llvm-objdump and llvm-readobj which
use `-h` as an alias for other options.
(relanding after revert, r358414)
Added DefaultOptions.clear() to reset().
llvm-undname: Fix nullptr deref on invalid conversion operator names in template args
A ConversionOperatorIdentifierNode has a TargetType which is read when
printing it, but if the ConversionOperatorIdentifierNode appears in a
template argument there's nothing that can provide the TargetType.
Normally the COIN is a symbol (leaf) name and takes its TargetType from the
symbol's type, but in a template argument context the COIN can only be
either a non-leaf name piece or a type, and must hence be invalid.
Similar to the COIN check in demangleDeclarator().
Philip Reames [Mon, 15 Apr 2019 15:53:25 +0000 (15:53 +0000)]
[LoopPred] Hoist and of predicated checks where legal
If we have multiple range checks which can be predicated, hoist the and of the results outside the loop. This minorly cleans up the resulting IR, but the main motivation is as a building block for D60093.
Fangrui Song [Mon, 15 Apr 2019 13:32:41 +0000 (13:32 +0000)]
[llvm-objdump] Align instructions to a tab stop in disassembly output
Summary:
In GNU objdump, -w/--wide aligns instructions in the disassembly output.
This patch does the same to llvm-objdump. However, we always use the
wide format (-w/--wide is ignored), because the narrow format
(instructions are misaligned) is probably not very useful.
In llvm-readobj, we made a similar decision: always use the wide format,
accept but ignore -W/--wide.
To save some columns, we change the tab before hex bytes (controlled by
--[no-]show-raw-insn) to a space.
Tim Northover [Mon, 15 Apr 2019 12:04:10 +0000 (12:04 +0000)]
DAG: propagate ConsecutiveRegs flags to returns too.
Arguments already have a flag to inform backends when they have been split up.
The AArch64 arm64_32 ABI makes use of these on return types too, so that code
emitted for armv7k can be ABI-compliant.
There should be no CodeGen changes yet, just making more information available.
Tim Northover [Mon, 15 Apr 2019 12:03:54 +0000 (12:03 +0000)]
DAG: propagate whether an arg is a pointer for CallingConv decisions.
The arm64_32 ABI specifies that pointers (despite being 32-bits) should be
zero-extended to 64-bits when passed in registers for efficiency reasons. This
means that the SelectionDAG needs to be able to tell the backend that an
argument was originally a pointer, which is implmented here.
Additionally, some memory intrinsics need to be declared as taking an i8*
instead of an iPTR.
There should be no CodeGen change yet, but it will be triggered when AArch64
backend support for ILP32 is added.
This patch changes the error message when the section specified by
--string-dump cannot be found by including the name of the section in
the error message and changing the prefix text to not imply that the
file itself was invalid. As part of this change some uses of
std::error_code have been replaced with the llvm Error class to better
encapsulate the error info (rather than passing File strings around),
and the WithColor class replaces string literal error prefixes.
Jeremy Morse [Mon, 15 Apr 2019 10:23:22 +0000 (10:23 +0000)]
[Docs] Switch a code block from LLVM to text
While I can't replicate this locally, it looks like the buildbots don't
recognize the IR block in r358385 l764 as IR. Downgrade it to being just
text while I look into it.
FileCheck [1/12]: Move variable table in new object
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch adds a new class to hold
pattern matching global state.
The table holding the values of FileCheck variable constitutes some sort
of global state for the matching phase, yet is passed as parameters of
all functions using it. This commit create a new FileCheckPatternContext
class pointed at from FileCheckPattern. While it increases the line
count, it separates local data from global state. Later commits build
on that to add numeric expression global state to that class.
Copyright:
- Linaro (changes up to diff 183612 of revision D55940)
- GraphCore (changes in later versions of revision D55940 and
in new revision created off D55940)
Simon Tatham [Mon, 15 Apr 2019 10:06:26 +0000 (10:06 +0000)]
[TableGen] Include schedule model name in diagnostic.
If you have more than one schedule model in your TableGen target
definitions, then the diagnostic "No schedule information for
instruction 'foo'" is rather unhelpful, because it doesn't tell you
_which_ schedule model is missing the necessary information (or, as it
might be, missing the UnsupportedFeatures definition that would stop
it thinking it needed it).
Extended the message to include the name of the schedule model that
it's complaining about.
This patch adds documentation explaining how variable location information is
compiled from the IR representation down to the end of the codegen pipeline,
but avoiding discussion of file formats and encoding.
This should make it clearer how the dbg.value / dbg.declare etc intrinsics
are transformed and arranged into DBG_VALUE instructions, and their meaning.
[SelectionDAG] Use KnownBits::computeForAddSub/computeForAddCarry
Summary:
Use KnownBits::computeForAddSub/computeForAddCarry
in SelectionDAG::computeKnownBits when doing value
tracking for addition/subtraction.
This should improve the precision of the known bits,
as we only used to make a simple estimate of known
zeroes. The KnownBits support functions are also
able to deduce bits that are known to be one in the
result.
[X86] Regenerate checks for domain-reassignment.mir
Apparently there are some stray IMPLICIT_DEF operations that weren't in the
checks. Not sure if they've always been there or something changed at some
point.
[GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.
This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.
I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.
[GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs.
Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE
configs that can be passed between TargetPassConfig and the targets' custom
pass configs. This CSEConfigBase allows targets to create custom CSE configs
which is then used by the GISel passes for the CSEMIRBuilder.
This support will be used in a follow up commit to allow constant-only CSE for
-O0 compiles in D60580.
[X86] Redefine KUNPCK instructions to take a narrower source register class than destination register class. Remove copies from the isel output pattern.
There's no reason for the inputs to be the destination register class. This just
forces an unnecessary copy in the output patterns.