trunc (load X) --> load (bitcast X to narrow type)
We have this transform in DAGCombiner::ReduceLoadWidth(), but the truncated
load pattern can interfere with other instcombine transforms, so I'd like to
allow the fold sooner.
Example:
https://bugs.llvm.org/show_bug.cgi?id=16739
...in that report, we have bitcasts bracketing these ops, so those could get
eliminated too.
We've generally ruled out widening of loads early in IR ( LoadCombine -
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105291.html ), but
that reasoning may not apply to narrowing if we can preserve information
such as the dereferenceable range.
Pablo Barrio [Thu, 25 Jul 2019 10:59:45 +0000 (10:59 +0000)]
[ARM][AArch64] Support for Cortex-A65 & A65AE, Neoverse E1 & N1
Summary:
Add support for Cortex-A65, Cortex-A65AE, Neoverse E1 and Neoverse N1.
Neoverse E1 and Cortex-A65(&AE) only implement the AArch64 state of the
Arm architecture. Neoverse N1 implements both AArch32 and AArch64.
Simon Pilgrim [Thu, 25 Jul 2019 10:20:39 +0000 (10:20 +0000)]
Revert rL366946 : [Remarks] Add support for serializing metadata for every remark streamer
This allows every serializer format to implement metaSerializer() and
return the corresponding meta serializer.
........
Fix windows build bots
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-win-fast
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win
George Rimar [Thu, 25 Jul 2019 10:19:23 +0000 (10:19 +0000)]
Recommit "rL366894: [yaml2obj] - Allow custom fields for the SHT_UNDEF sections."
With fix: do not use `stat` tool.
Original commit message:
This is a follow-up refactoring patch for recently
introduced functionality which which reduces the code duplication
and also makes possible to redefine all possible fields of
the first SHT_NULL section (previously it was only possible to set
sh_link and sh_size).
[IPSCCP] Add assertion to surface cases where we zap returns with overdefined users.
We should only zap returns in functions, where all live users have a
replace-able value (are not overdefined). Unused return values should be
undefined.
This should make it easier to detect bugs like in PR42738.
Alternatively we could bail out of zapping the function returns, but I
think it would be better to address those divergences between function
and call-site values where they are actually caused.
This refactors boolean 'OptForSize' that was passed around in a lot of places.
It controlled folding of the tail loop, the scalar epilogue, into the main loop
but code-size reasons may not be the only reason to do this. Thus, this is a
first step to generalise the concept of tail-loop folding, and hence OptForSize
has been renamed and is using an enum ScalarEpilogueStatus that holds the
status how the epilogue should be lowered.
This will be followed up by D65197, that picks up the predicate loop hint and
performs the tail-loop folding.
Kai Luo [Thu, 25 Jul 2019 07:47:52 +0000 (07:47 +0000)]
[PowerPC][NFC] Added `getDefMIPostRA` method
Summary:
In PostRA phase, we often have to find out the most recent definition
of a register. This patch adds getDefMIPostRA so that other methods
can use it rather than implementing it repeatedly.
that can be used to indicate to the vectoriser that all (load/store)
instructions should be predicated (masked). This allows, for example, folding
of the remainder loop into the main loop.
This patch will be followed up with D64916 and D65197. The former is a
refactoring in the loopvectorizer and the groundwork to make tail loop folding
a more general concept, and in the latter the actual tail loop folding
transformation will be implemented.
These tests are breaking three independent upstream buildbots (as well
downstream ones). These breakages have appeared mysteriously,
consistently, and during different revisions. Sadly, none of
{ASAN,TSAN,MSAN,UBSAN} flag anything, so the cause here is nonobvious.
Until we've figured this out, it seems best to disable these tests
entirely, so that the affected bots don't remain silent about any other,
unrelated failures.
[llvm-objdump][NFC] Make the PrettyPrinter::printInst() output buffered
Summary:
Every time PrettyPrinter::printInst is called, stdout is flushed and it makes llvm-objdump slow. This patches adds a string
buffer to prevent stdout from being flushed.
Benchmark results (./llvm-objdump-master: without this patch, ./bin/llvm-objcopy: with this patch):
$ hyperfine --warmup 10 './llvm-objdump-master -d ./bin/llvm-objcopy' './bin/llvm-objdump -d ./bin/llvm-objcopy'
Benchmark #1: ./llvm-objdump-master -d ./bin/llvm-objcopy
Time (mean ± σ): 2.230 s ± 0.050 s [User: 1.533 s, System: 0.682 s]
Range (min … max): 2.115 s … 2.278 s 10 runs
Benchmark #2: ./bin/llvm-objdump -d ./bin/llvm-objcopy
Time (mean ± σ): 386.4 ms ± 13.0 ms [User: 376.6 ms, System: 6.1 ms]
Range (min … max): 366.1 ms … 407.0 ms 10 runs
Summary
'./bin/llvm-objdump -d ./bin/llvm-objcopy' ran
5.77 ± 0.23 times faster than './llvm-objdump-master -d ./bin/llvm-objcopy'
Joel E. Denny [Thu, 25 Jul 2019 03:14:32 +0000 (03:14 +0000)]
[lit] Protect full test suite from FILECHECK_OPTS
lit's test suite calls lit multiple times for various sample test
suites. `FILECHECK_OPTS` is safe for FileCheck calls in lit's test
suite. It's not safe for FileCheck calls in the sample test suites,
whose output affects the results of lit's test suite.
Without this patch, only one such sample test suite is protected from
`FILECHECK_OPTS`, and I admit I haven't discovered other cases for
which I can produce false failures using `FILECHECK_OPTS`. However,
it's hard to predict the future, especially false passes. Thus, this
patch protects all existing and future sample test suites from
`FILECHECK_OPTS` (and the deprecated
`FILECHECK_DUMP_INPUT_ON_FAILURE`).
[FileCollector] Change coding style from LLDB to LLVM (NFC)
This patch changes the coding style of the FileCollector from the LLDB
to the LLVM coding style. Alex recently lifted it into LLVM and I
volunteered to do the conversion.
Philip Reames [Wed, 24 Jul 2019 23:24:13 +0000 (23:24 +0000)]
Define some basic terminology around loops in our documentation
I've noticed a lot of confusion around this area recently with key terms being misused in a number of threads. To help reign that in, let's go ahead and document the current terminology and meaning thereof.
My hope is to grow this over time into a broader discussion of canonical loop forms - yes, there are more than one ... many more than one - but for the moment, simply having the key terminology is a good stopping place.
Note: I am landing this *without* an LGTM. All feedback so far has been positive, and trying to apply all of the suggested changes/extensions would cause the review to never end. Instead, I decided to land it with the obvious fixes made based on reviewer comments, then iterate from there.
[AArch64][GlobalISel] Don't try to use GISel if subtarget doesn't have neon or fp.
Throughout the legalizerinfo we currently make the assumption that the target
has neon and FP target features available. Fixing it will require a refactor of
the whole thing, so until then make sure we fall back.
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH
InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as it is seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.
Much like with my recent "hoist add/sub by/from const" patches,
we should get almost universal win if we hoist constant,
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by constant, but by something else,
the live-range of that something else may reduce.
Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit`
instruction pattern. And to not get into endless combine loop.
IR: Teach GlobalIndirectSymbol::getBaseObject() to handle more kinds of expressions.
For aliases, any expression that lowers at the MC level to global_object or
global_object+constant is valid at the object file level. getBaseObject()
should return a result if the aliasee ends up being of that form even if
the IR used to produce it is somewhat unconventional.
Note that this is different from what stripInBoundsOffsets() and that family
of functions is doing. Those functions are concerned about semantic properties
of IR, whereas here we only care about the lowering result.
Therefore reimplement getBaseObject() in a way that matches the lowering
result. This fixes a crash when producing a summary for aliases such as
that in the included test case.
[GlobalISel] Support for inlining memcpy, memset and memmove calls.
This introduces a new family of combiner helper routines that re-use the
target specific cost model from SelectionDAG, and generate inline implementations
of the memcpy family of intrinsics.
The combines are only enabled at optimization levels higher than -O0, and give
very substantial performance improvements.
[AArch64][GlobalISel] Fix a crash during s128 G_ICMP legalization due to r366317.
r366317 added a legalization for s128 G_ICMP narrow scalar which tried to hard
code the result type of the new legalized G_SELECT. Change this to instead use
type of the original G_ICMP result and allow the target to legalize it if necessary
later.
David Bolvansky [Wed, 24 Jul 2019 20:27:32 +0000 (20:27 +0000)]
Let CorrelatedValuePropagation preserve LazyValueInfo
Summary:
This patch makes CorrelatedValuePropagation preserve LazyValueInfo by adding LazyValueInfo::eraseValue & calling it whenever an instruction is erased.
Passes `make check` , test-suite, and SPECrate 2017.
David Green [Wed, 24 Jul 2019 17:36:47 +0000 (17:36 +0000)]
[ARM] Rewrite how VCMP are lowered, using a single node
This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, just using two called VCMP and
VCMPZ with an extra operand as the condition code. I believe this will make
some combines simpler, allowing us to just look at these codes and not the
operands. It also helps fill in a missing VCGTUZ MVE selection without adding
extra nodes for it.
matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction.
I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think its worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction.
Fixes the x86 partial reduction sum cases in PR33758 and PR42023.
David Green [Wed, 24 Jul 2019 17:26:26 +0000 (17:26 +0000)]
[ARM] Disable MVE fptosi and friends
The prevents us from trying to convert an i1 predicate vector to a float, or
vice-versa. Better patterns are possible, which will follow in a subsequent
commit. For now we just expand them.
David Green [Wed, 24 Jul 2019 16:42:09 +0000 (16:42 +0000)]
[ARM] Better OR's for MVE compares
This adds a DeMorgan combine for OR's of compares to turn them into AND's,
helping prevent them from going into and out of gpr registers. It also fills in
the VCLE and VCLT nodes that MVE can select, allowing it to invert more
compares.
David Tenty [Wed, 24 Jul 2019 15:04:27 +0000 (15:04 +0000)]
[AIX][lit] Don't depend on psutil on AIX
Summary:
On AIX psutil can run into problems with permissions to read the process
tree, which causes problems for python timeout tests which need to kill off
a test and it's children.
This patch adds a workaround by invoking shell via subprocess and using a
platform specific option to ps to list all the descendant processes so we can
kill them. We add some checks so lit can tell whether timeout tests are
supported with out exposing whether we are utilizing the psutil
implementation or the alternative.
Anton Afanasyev [Wed, 24 Jul 2019 14:55:40 +0000 (14:55 +0000)]
[Support] Fix `-ftime-trace-granularity` option
Summary:
Move `-ftime-trace-granularity` option to frontend options. Without patch
this option is showed up in the help for any tool that links libSupport.
David Green [Wed, 24 Jul 2019 14:28:22 +0000 (14:28 +0000)]
[ARM] MVE floating point compares and selects
Much like integers, this adds MVE floating point compares and select. It
requires a lot more buildvector/shuffle code because we may need to expand the
compares without mve.fp, and requires support for and/or because of the way we
lower llvm condition codes.
Simi Pallipurath [Wed, 24 Jul 2019 13:54:14 +0000 (13:54 +0000)]
[ARM] Make sure that the constant pool does not keep in the middle of an IT block.
This change make sure that llvm does not emit an invalid IT block
by putting the constant pool in the middle of an IT block.
We have code to try to avoid putting a constant island in the middle of an
IT block, but it only works if we see an IT between the one currently
referencing CPE and possible insertion point. If the first instruction
we look at is the VLDRD after the IT , we never see the IT and does not
realize that the instruction doing the load could be in an IT block itself.
Jay Foad [Wed, 24 Jul 2019 12:50:10 +0000 (12:50 +0000)]
[InstSimplify] Rename SimplifyFPUnOp and SimplifyFPBinOp
Summary:
SimplifyFPBinOp is a variant of SimplifyBinOp that lets you specify
fast math flags, but the name is misleading because both functions
can simplify both FP and non-FP ops. Instead, overload SimplifyBinOp
so that you can optionally specify fast math flags.
Summary:
A number of EXPECT statements in FileCheck's unit tests are dependent
from results of other values being tested. This commit changes those
earlier test to use ASSERT instead of EXPECT to avoid cascade errors
when they are all related to the same issue.
Summary:
Testing of caret location in diagnostic message is currently made with
CHECK directive with the following general format:
CHECK: {{^ \^$}}
James Henderson suggested the following would be more readable:
CHECK: {{^}} ^{{$}}
and when whole lines can be matched (as is the case for command-line
testing where error messages do not include path):
CHECK: ^
using the option --match-full-lines.
This commit implements these 2 changes on all existing caret position
tests. It also aligns the caret to the character it is trying to match
in the above line.
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch lift the restriction for a
numeric expression to either be a variable definition or a numeric
expression to try to match.
This commit allows a numeric variable to be set to the result of the
evaluation of a numeric expression after it has been matched
successfully. When it happens, the variable is allowed to be used on
the same line since its value is known at match time.
It also makes use of this possibility to reuse the parsing code to
parse a command-line definition by crafting a mirror string of the
-D option with the equal sign replaced by a colon sign, e.g. for option
'-D#NUMVAL=10' it creates the string
'-D#NUMVAL=10 (parsed as [[#NUMVAL:10]])' where the numeric expression
is parsed to define NUMVAL. This result in a few tests needing updating
for the location diagnostics on top of the tests for the new feature.
It also enables empty numeric expression which match any number without
defining a variable. This is done here rather than in commit #5 of the
patch series because it requires to dissociate automatic regex insertion
in RegExStr from variable definition which would make commit #5 even
bigger than it already is.
Copyright:
- Linaro (changes up to diff 183612 of revision D55940)
- GraphCore (changes in later versions of revision D55940 and
in new revision created off D55940)
George Rimar [Wed, 24 Jul 2019 12:20:42 +0000 (12:20 +0000)]
[Object/llvm-readobj] - Cleanup testing of the dynamic objects.
This patch touches a few test cases:
It removes dtflags.elf-x86-64 binary and elf-dtflags.test.
elf-dtflags.test is excessive because we have the
elf-dynamic-tags.test which test all non-machine specific tags.
It removes testing of --dynamic-table from test\Object\readobj-shared-object.test
(we have the elf-dynamic-tags.test for that), and simplifies this test case.
It moves testing of the headers from readobj-shared-object.test
to elf-file-headers.test.
Adds test/tools/llvm-readobj/elf-file-types.test and test/tools/llvm-readobj/elf-loadname.test.
It opens road for removing the readobj-shared-object.test completely soon.
George Rimar [Wed, 24 Jul 2019 12:16:22 +0000 (12:16 +0000)]
[yaml2obj] - Allow custom fields for the SHT_UNDEF sections.
This is a follow-up refactoring patch for recently
introduced functionality which which reduces the code duplication
and also makes possible to redefine all possible fields of
the first SHT_NULL section (previously it was only possible to set
sh_link and sh_size).
David Green [Wed, 24 Jul 2019 11:51:36 +0000 (11:51 +0000)]
[ARM] MVE predicate register support
This adds support code for building and shuffling i1 predicate registers. It
generally uses two basic principles, either converting the predicate into an
scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or
by converting the register to an full vector register and back.
Some of the code here is a not super efficient but will hopefully cover most
cases of moving i1 vectors around and can be improved in subsequent patches.
David Green [Wed, 24 Jul 2019 11:08:14 +0000 (11:08 +0000)]
[ARM] MVE integer compares and selects
This adds the very basics for MVE vector predication, adding integer VCMP and
VSEL instruction support. This is done through predicate registers (MVT::v16i1,
MVT::v8i1, MVT::v4i1), but otherwise using same mechanics as NEON to custom
lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc).
An extra VCNE was added, as this can be handled sensibly by MVE's expanded
number of VCMP condition codes. (There are also VCLE and VCLT which are added
later).
VPSEL is also added here, simply selecting on the vselect.
Sam Parker [Wed, 24 Jul 2019 09:38:39 +0000 (09:38 +0000)]
[ARM][ParallelDSP] Fix pointer operand reordering
While combining two loads into a single load, we often need to
reorder the pointer operands for the new load. This reordering was
broken in the cases where there was a chain of values that built up
the pointer.
Petr Hosek [Wed, 24 Jul 2019 00:16:23 +0000 (00:16 +0000)]
[SafeStack] Insert the deref before remaining elements
This is a follow up to D64971. While we need to insert the deref after
the offset, it needs to come before the remaining elements in the
original expression since the deref needs to happen before the LLVM
fragment if present.
Copying them is expensive. This allows the tables to be moved around at
lower cost, and allows a remarks::StringTable to be constructed from
a remarks::ParsedStringTable.
Summary:
A number of EXPECT statements in FileCheck's unit tests are dependent
from results of other values being tested. This commit changes those
earlier test to use ASSERT instead of EXPECT to avoid cascade errors
when they are all related to the same issue.
Summary:
Testing of caret location in diagnostic message is currently made with
CHECK directive with the following general format:
CHECK: {{^ \^$}}
James Henderson suggested the following would be more readable:
CHECK: {{^}} ^{{$}}
and when whole lines can be matched (as is the case for command-line
testing where error messages do not include path):
CHECK: ^
using the option --match-full-lines.
This commit implements these 2 changes on all existing caret position
tests. It also aligns the caret to the character it is trying to match
in the above line.
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch lift the restriction for a
numeric expression to either be a variable definition or a numeric
expression to try to match.
This commit allows a numeric variable to be set to the result of the
evaluation of a numeric expression after it has been matched
successfully. When it happens, the variable is allowed to be used on
the same line since its value is known at match time.
It also makes use of this possibility to reuse the parsing code to
parse a command-line definition by crafting a mirror string of the
-D option with the equal sign replaced by a colon sign, e.g. for option
'-D#NUMVAL=10' it creates the string
'-D#NUMVAL=10 (parsed as [[#NUMVAL:10]])' where the numeric expression
is parsed to define NUMVAL. This result in a few tests needing updating
for the location diagnostics on top of the tests for the new feature.
It also enables empty numeric expression which match any number without
defining a variable. This is done here rather than in commit #5 of the
patch series because it requires to dissociate automatic regex insertion
in RegExStr from variable definition which would make commit #5 even
bigger than it already is.
Copyright:
- Linaro (changes up to diff 183612 of revision D55940)
- GraphCore (changes in later versions of revision D55940 and
in new revision created off D55940)