[AArch64] NFC: Add generic StackOffset to describe scalable offsets.
To support spilling/filling of scalable vectors we need a more generic
representation of a stack offset than simply 'int'.
For this we introduce the StackOffset struct, which comprises multiple
offsets sized by their respective MVTs. Byte-offsets will thus be a simple
tuple such as { offset, MVT::i8 }. Adding two byte-offsets will result in a
byte offset { offsetA + offsetB, MVT::i8 }. When two offsets have different
types, we can canonicalise them to use the same MVT, as long as the ratio
of their runtime sizes is guaranteed to match the ratio of their
compile-time sizes.
When we have both scalable- and fixed-size objects on the stack, we can
create an offset that combines the two, e.g. a fixed byte component plus a
scalable component such as { offsetFixed, MVT::i8 } + { offsetScalable, MVT::nxv1i8 }.
The struct also contains a getForFrameOffset() method that is specific to
AArch64 and decomposes the frame-offset to be used directly in instructions
that operate on the stack or index into the stack.
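As a minimal sketch of the idea (member and method names below are illustrative,
not necessarily the ones in the patch), such a struct keeps the fixed-size and
scalable components separate and only ever adds like terms:
```
// Illustrative only: a stack offset with a fixed (byte) part and a
// scalable part that is known only up to a runtime multiple (e.g. the
// SVE vector length). Adding two offsets adds the parts component-wise.
#include <cstdint>

class StackOffsetSketch {
  int64_t Bytes = 0;         // fixed-size part, e.g. { offset, MVT::i8 }
  int64_t ScalableBytes = 0; // scalable part, e.g. { offset, MVT::nxv1i8 }

public:
  StackOffsetSketch() = default;
  StackOffsetSketch(int64_t Offset, bool Scalable) {
    (Scalable ? ScalableBytes : Bytes) = Offset;
  }
  StackOffsetSketch &operator+=(const StackOffsetSketch &RHS) {
    Bytes += RHS.Bytes;                 // { a, i8 } + { b, i8 } = { a+b, i8 }
    ScalableBytes += RHS.ScalableBytes; // scalable terms add among themselves
    return *this;
  }
  int64_t getFixed() const { return Bytes; }
  int64_t getScalable() const { return ScalableBytes; }
};
```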
Note: This patch adds StackOffset as an AArch64-only concept, but we would
like to make this a generic concept/struct that is supported by all
interfaces that take or return stack offsets (currently as 'int'). Since
that would be a bigger change, which is currently pending on D32530 landing,
we thought it made sense to first show/prove the concept in the AArch64
target before proposing to roll this out further.
Igor Kudrin [Tue, 6 Aug 2019 10:47:20 +0000 (10:47 +0000)]
Support 64-bit offsets in utility classes (1/5)
Using 64-bit offsets is required to fully implement 64-bit DWARF.
As these classes are used in many different libraries they should
temporarily support both 32- and 64-bit offsets.
Ulrich Weigand [Tue, 6 Aug 2019 10:43:13 +0000 (10:43 +0000)]
[Strict FP] Allow custom operation actions
This patch changes the DAG legalizer to respect the operation actions
set by the target for strict floating-point operations. (Currently, the
legalizer will usually fall back to mutating to the non-strict operation
(which is assumed to be legal), and only skip mutation if the strict
operation is marked legal.)
With this patch, whenever a strict operation is marked as Legal or
Custom, it is passed to the target as usual. Only if it is marked as
Expand will the legalizer attempt to mutate it to the non-strict operation.
Note that this will now fail if the non-strict operation is itself
marked as Custom -- the target will have to provide a Custom definition
for the strict operation then as well.
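As a hedged illustration (the opcode and type are just examples, not taken from
the patch), a target can now mark a strict operation Custom in its
TargetLowering constructor and lower it itself; note that marking the
non-strict counterpart Custom now also obliges the target to handle the strict
node:
```
// Inside a target's TargetLowering constructor (illustrative choices):
setOperationAction(ISD::STRICT_FADD, MVT::f128, Custom);
// Because ISD::FADD for f128 is itself Custom on this hypothetical target,
// the legalizer will no longer silently mutate STRICT_FADD into FADD; the
// target must handle ISD::STRICT_FADD in its LowerOperation hook too.
setOperationAction(ISD::FADD, MVT::f128, Custom);
```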
Cullen Rhodes [Tue, 6 Aug 2019 09:46:13 +0000 (09:46 +0000)]
[SelectionDAG] Extend base addressing modes supported by MGATHER/MSCATTER
Summary:
Before this patch MGATHER/MSCATTER are capable of representing all
common addressing modes, but only when illegal types are used.
This patch adds an IndexType property so more representations
are available when using legal types only.
Original modes:
vector of bases
base + vector of signed scaled offsets
New modes:
base + vector of signed unscaled offsets
base + vector of unsigned scaled offsets
base + vector of unsigned unscaled offsets
The current behaviour of addressing modes for gather/scatter remains
unchanged.
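For reference, a sketch of the kind of index-type property this adds (treat the
exact enumerator spelling as an assumption; the comments map to the modes
listed above):
```
// How a gather/scatter vector index is interpreted relative to the base:
enum MemIndexType {
  SIGNED_SCALED,    // Addr = Base + (signed)Index * sizeof(element)
  SIGNED_UNSCALED,  // Addr = Base + (signed)Index
  UNSIGNED_SCALED,  // Addr = Base + (unsigned)Index * sizeof(element)
  UNSIGNED_UNSCALED // Addr = Base + (unsigned)Index
};
```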
[LLVM][Alignment] Introduce Alignment In Attributes
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
[LLVM][Alignment] Introduce Alignment In GlobalObject
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
George Rimar [Tue, 6 Aug 2019 08:02:25 +0000 (08:02 +0000)]
[llvm/test/Object] - Cleanup and move out the yaml2obj tests.
There are multiple yaml2obj-* tests in the llvm/test/Object
folder. This is not the correct place to have them, and my intention
was to move them out to the test\tools\yaml2obj folder. I reviewed
them, made some changes, and my comments are below.
For all tests I:
Added comments where needed.
Moved them from llvm/test/Object to the yaml2obj tests.
Other changes performed:
1) yaml2obj-invalid.yaml. It was a test for an invalid YAML input.
I just moved it.
2) yaml2obj-coff-multi-doc.test/yaml2obj-elf-multi-doc.test:
these were tests for the --docnum=x functionality,
one for COFF and one for ELF. I merged them into one.
3) yaml2obj-elf-bits-endian.test:
I removed its 4 YAML inputs (merged into the main test).
4) yaml2obj-readobj.test:
This file has a long history. It was initially added to check the
parsing of header characteristics. Then it was used to test
how yaml2obj writes the relocations. Then it was upgraded to check how
yaml2obj handles the "-o" option. I think it should be heavily split up
and refactored in a separate patch. For now I left it as is, but restyled it
to reduce the changes in follow-ups.
5) yaml2obj-elf-alignment.yaml: its intention was to check that we
can set the sh_addralign field. I moved, renamed (to elf-sh-addralign.yaml),
and updated this test.
6) yaml2obj-elf-file-headers.yaml: I removed it.
Its intention was to check that
yaml2obj handles OS/ABI and ELF type (e.g. Relocatable).
We are testing this already, for example in D64800. We might want
to add a better (more complete) test, but keeping the existing test
does not make much sense, I think.
7) yaml2obj-elf-file-headers-with-e_flags.yaml: I would describe its intention
as "testing MIPS e_flags". It is far from being complete and tests only
a few flags. I left it alone for now.
8) yaml2obj-elf-rel.yaml: its intention is to check the MIPS32 relocations.
We have a version for MIPS64 here: test\Object\Mips\elf-mips64-rel.yaml.
Both seem incomplete. I left them alone for now.
9) yaml2obj-elf-rel-noref.yaml: was introduced to check the support of the arm32
R_ARM_V4BX relocation. I left it alone for now.
10) yaml2obj-elf-section-basic.yaml: it just checked that we are able to recognise
trivial fields like the section 'Name', 'Type', 'Flags' and others. All of our yaml2obj
tests already exercise this heavily, so I just removed this test.
11) yaml2obj-elf-section-invalid-size.yaml: its intention was to check the
"Section size must be greater than or equal to the content size" error.
I moved this test to tools\yaml2obj\section-size-content.yaml.
12) yaml2obj-elf-symbol-basic.yaml: its intention seems to have been to test
declaring symbols in yaml2obj. I removed it; we use this in almost every test we already have.
13) yaml2obj-elf-symbol-LocalGlobalWeak.yaml: its intention was to check that we can
declare different symbol bindings. I moved it to tools\yaml2obj\elf-symbol-binding.yaml.
14) yaml2obj-coff-invalid-alignment.test: checks that an error is reported for a too-large
COFF section alignment. Moved it to tools\yaml2obj\coff-invalid-alignment.test.
15) yaml2obj-elf-symbol-visibility.yaml: tests ELF symbol visibility. I improved it and
moved it to tools\yaml2obj\elf-symbol-visibility.yaml and tools\obj2yaml\elf-symbol-visibility.yaml.
[Attributor] Provide a generic interface to check live instructions
Summary:
Similar to `Attributor::checkForAllCallSites`, we now provide such
functionality for instructions of a certain opcode through
`Attributor::checkForAllInstructions` and the convenient wrapper
`Attributor::checkForAllCallLikeInstructions`. This cleans up code,
avoids duplication, and simplifies the usage of liveness information.
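A rough usage sketch (the predicate and the surrounding deduction are
illustrative, not from the patch): an abstract attribute can ask the Attributor
to verify a property on every call-like instruction and give up otherwise.
```
// Hypothetical update step of some AbstractAttribute (names are made up):
auto CallIsSafe = [](Instruction &I) {
  // Return true while the property being deduced still holds for I.
  return !I.mayThrow();
};
// Dead (unreachable) call sites are skipped by the liveness-aware walk.
if (!A.checkForAllCallLikeInstructions(CallIsSafe, *this))
  return indicatePessimisticFixpoint();
return ChangeStatus::UNCHANGED;
```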
Fixes e.g.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-ubsan/builds/14273
We can't left shift here because left shifting of a negative number is UB.
The same doesn't apply to unsigned arithmetic, but switching to unsigned
doesn't appear to stop ubsan from complaining, so we need to mask out the
high bits.
[Attributor] Introduce the IRAttribute helper struct
Summary:
Certain properties, e.g., an AttrKind, are not shared among all abstract
attributes. This patch extracts the functionality into a helper struct.
To remove boilerplate, mostly passing through values to the
AbstractAttribute base class, we extract the state into an IRPosition
helper. There is no functional change intended, but the IRPosition struct
will provide more functionality down the line.
Summary:
The new scheme is similar to the pass manager and dyn_cast scheme where
we identify classes by the address of a static member. This is better
than the old scheme in which we had to "invent" new Attributor enums if
there was no corresponding one.
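A minimal, self-contained sketch of this identification pattern (class and
member names are illustrative, not the Attributor's actual ones): each class
exposes a unique static member and classof() compares its address instead of an
enum value.
```
// Identify classes by the address of a static member, dyn_cast-style.
struct AbstractAttributeSketch {
  virtual const char *getIdAddr() const = 0;
  virtual ~AbstractAttributeSketch() = default;
};

struct AANoReturnSketch : AbstractAttributeSketch {
  static const char ID; // its address is the unique class identifier
  const char *getIdAddr() const override { return &ID; }
  static bool classof(const AbstractAttributeSketch *AA) {
    return AA->getIdAddr() == &ID; // no enum needed for new classes
  }
};
const char AANoReturnSketch::ID = 0;
```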
[Attributor] Deduce the "no-return" attribute for functions
A function is "no-return" if we never reach a return instruction, either
because there are none or the ones that exist are dead.
Tests have been adjusted:
- either noreturn was added, or
- noreturn was avoided by modifying the code.
The new noreturn_{sync,async} tests make sure we handle invoke
instructions with a noreturn (and potentially nounwind) callee
correctly, even in the presence of potential asynchronous exceptions.
Philip Reames [Mon, 5 Aug 2019 22:34:59 +0000 (22:34 +0000)]
Add a note to the release notes about a potentially breaking optimization
This has come up twice already (once in PR42763 and once in the commit thread), so this gives warning of a new way in which UB can result in unexpected program behavior.
Keno Fischer [Mon, 5 Aug 2019 21:36:09 +0000 (21:36 +0000)]
[WebAssembly] Fix conflict between ret legalization and sjlj
Summary:
When the WebAssembly backend encounters a return type that doesn't
fit within i32, SelectionDAG performs sret demotion, adding an
additional argument to the start of the function that contains
a pointer to an sret buffer to use instead. However, this conflicts
with the emscripten sjlj lowering pass. There we translate calls like:
```
call {i32, i32} @foo()
```
into (in pseudo-llvm)
```
%addr = @foo
call {i32, i32} @__invoke_{i32,i32}(%addr)
```
i.e. we perform an indirect call through an extra function.
However, the sret transform now transforms this into
the equivalent of
```
%addr = @foo
%sret = alloca {i32, i32}
call {i32, i32} @__invoke_{i32,i32}(%sret, %addr)
```
(while simultaneously translating the implementation of @foo as well).
Unfortunately, this doesn't work out. The __invoke_ ABI expects
the function address to be the first argument, causing crashes.
There are several possible ways to fix this:
1. Implementing the sret rewrite at the IR level as well and performing
it as part of lowering to __invoke
2. Fixing the wasm backend to recognize that __invoke has a special ABI
3. A change to the binaryen/emscripten ABI to recognize this situation
This revision implements the middle option, teaching the backend to
treat __invoke_ functions specially in sret lowering. This is achieved
by
1) Introducing a new CallingConv ID for invoke functions
2) When this CallingConv ID is seen in the backend and the first argument
is marked as sret (a function pointer would never be marked as sret),
swapping the first two arguments.
[Attributor][Fix] Do not remove instructions during manifestation
When we remove instructions, cached references could still be live. This
patch avoids removing invoke instructions that are replaced by calls and
instead keeps them around but in a dead block.
[Attributor][Fix] Keep invokes if handlers catch asynchronous exceptions
Similar to other places where we transform invokes to calls, we need to
be careful if the handler (=personality) can catch asynchronous
exceptions, as they are not modeled as part of nounwind.
BMI2 support is indicated in bit eight of EBX, not nine.
See Intel SDM, Vol 2A, Table 3-8:
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-2a-manual.pdf#page=296
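For clarity, the corresponding check amounts to the following (a minimal
illustration, not the patched code itself):
```
#include <cstdint>

// CPUID leaf 7, sub-leaf 0 reports BMI2 in EBX bit 8 (mask 0x100).
bool hasBMI2(uint32_t LeafSevenEBX) { return (LeafSevenEBX & (1u << 8)) != 0; }
```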
llvm-symbolizer: Untag addresses in object files by default.
Any addresses that we pass to llvm-symbolizer are going to be untagged,
while any HWASAN instrumented globals are going to be tagged in the
symbol table. Therefore we need to untag the addresses before using them.
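A minimal sketch of what untagging means here, assuming AArch64-style HWASAN
tags in the top byte of the pointer (the helper name is made up):
```
#include <cstdint>

// HWASAN stores its tag in bits 63:56 of an AArch64 pointer; clearing
// them yields the untagged address used for symbol lookup.
uint64_t untagAddress(uint64_t Addr) { return Addr & ~(0xffULL << 56); }
```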
Daniel Sanders [Mon, 5 Aug 2019 19:50:25 +0000 (19:50 +0000)]
Register/MCRegister: Add conversion operators to avoid use of implicit conversion to unsigned. NFC
Summary:
This has no functional effect but makes it more obvious which parts of the
compiler do not use Register/MCRegister when you mark the implicit conversion
deprecated.
Implicit conversions for comparisons accounted for ~20% (~3k of ~13k) of
the implicit conversions when I first measured it. I haven't maintained
those numbers as other patches have landed, though, so they may be out of date.
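A minimal sketch of the pattern (an illustrative class, not the real
Register/MCRegister definitions): keep the implicit conversion as a migration
aid, add explicit accessors and comparisons so call sites can avoid it, and
eventually mark the implicit operator deprecated to surface remaining uses.
```
// Illustrative register wrapper; not the actual LLVM definitions.
class RegisterSketch {
  unsigned Reg = 0;

public:
  constexpr RegisterSketch(unsigned R) : Reg(R) {}
  constexpr unsigned id() const { return Reg; } // explicit alternative
  constexpr bool operator==(RegisterSketch O) const { return Reg == O.Reg; }
  constexpr bool operator!=(RegisterSketch O) const { return Reg != O.Reg; }
  // Migration aid; marking it deprecated points at remaining implicit uses:
  // [[deprecated("use id() instead")]]
  constexpr operator unsigned() const { return Reg; }
};
```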
Louis Dionne [Mon, 5 Aug 2019 18:29:14 +0000 (18:29 +0000)]
[libc++] Take 2: Integrate the PSTL into libc++
Summary:
This commit allows specifying LIBCXX_ENABLE_PARALLEL_ALGORITHMS when
configuring libc++ in CMake. When that option is enabled, libc++ will
assume that the PSTL can be found somewhere on the CMake module path,
and it will provide the C++17 parallel algorithms based on the PSTL
(that is assumed to be available).
The commit also adds support for running the PSTL tests as part of
the libc++ test suite.
The first attempt to commit this failed because it exposed a bug in the
tests for modules. Now that this has been fixed, it should be safe to
commit this.
Craig Topper [Mon, 5 Aug 2019 18:25:36 +0000 (18:25 +0000)]
[X86] Enable -x86-experimental-vector-widening-legalization by default.
This patch changes our default legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the element widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.
This has the potential to cause regressions, so we wanted to get
it in early in the 10.0 cycle to have plenty of time to
address them.
Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.
Philip Reames [Mon, 5 Aug 2019 18:25:08 +0000 (18:25 +0000)]
Robustify update_test_checks.py for non-autogenerated tests, and add a mode to skip non-autogenerated ones
Intended use case is:
./utils/update_test_checks.py test/Transform/PassDir/* --update-only
(i.e. rapidly be able to see changes in autogenerated files, before handling non-autogenerated tests individually)
Evandro Menezes [Mon, 5 Aug 2019 18:09:14 +0000 (18:09 +0000)]
[AArch64] Expand bcmp() for small block lengths
Patch D56593 by @courbet results in calls to `bcmp()` in some cases, should
the target support it, unless `TTI::MemCmpExpansionOptions()`
is overridden by the target.
Before this patch, in a proprietary benchmark we see a performance drop of
about 12% on PNG compression, though it passes all tests.
This patch mirrors X86 for AArch64 and initializes
`TTI::MemCmpExpansionOptions()` to then expand calls to `bcmp()` when
appropriate. No tuning of the parameters was performed, but, at this point,
it's enough to recover the performance drop above.
This problem also exists on ARM. Once a consensus is reached for AArch64, we
can work to fix ARM as well.
Chris Bieneman [Mon, 5 Aug 2019 17:50:08 +0000 (17:50 +0000)]
NATIVE tablegen needs to depend on target tablegen
This dependency was removed in r357486, which has led to a stream of difficult-to-diagnose bugs.
Without this dependency, when building with `LLVM_OPTIMIZED_TABLEGEN=On` the native tablegen executable may not be rebuilt at all, and often won't get rebuilt before targets that use the tablegen headers. In the best case this results in a build-time failure, in the worst case it results in runtime failures.
Pablo Barrio [Mon, 5 Aug 2019 17:38:58 +0000 (17:38 +0000)]
[AArch64] Set preferred function alignment to 16 bytes on Neoverse N1
Summary:
The Arm Neoverse N1 Software Optimization Guide [1], Section "4.8 Branch
instruction alignment" states:
"Consider aligning subroutine entry points and branch targets to 32B
boundaries, within the bounds of the code-density requirements of the
program."
This patch sets the preferred function alignment on Neoverse N1 to 2^4=16B.
This was already the case in some of the latest Cortex-A CPUs. Benchmarking
in previous Cortex-A CPUs suggested that 16B alignment is already better
than the default. See commit d04ee305.
The reason we don't set it to 32B right now (as the optimisation guide
suggests) is that this will impact code size and perhaps the instruction
cache performance. Therefore we need benchmark numbers first.
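For concreteness, the change implied here is the sort of per-CPU tuning
sketched below (the symbol name and log2 encoding are assumptions for
illustration; 2^4 = 16 bytes):
```
// Illustrative only: preferred function alignment expressed as log2.
constexpr unsigned NeoverseN1PrefFunctionLogAlignment = 4;
static_assert((1u << NeoverseN1PrefFunctionLogAlignment) == 16,
              "2^4 = 16-byte preferred function alignment");
```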
I have also added testing for A75 and A76 that we were missing.
Sanjay Patel [Mon, 5 Aug 2019 16:59:58 +0000 (16:59 +0000)]
[InstCombine] combine mul+shl separated by zext
This appears to slightly help patterns similar to what's
shown in PR42874:
https://bugs.llvm.org/show_bug.cgi?id=42874
...but not in the way requested.
That fix will require some later IR and/or backend pass to
decompose multiply/shifts into something more optimal per
target. Those transforms already exist in some basic forms,
but probably need enhancing to catch more cases.
Tom Stellard [Mon, 5 Aug 2019 16:08:44 +0000 (16:08 +0000)]
AMDGPU/LoadStoreOptimizer: Set the correct offset when merging MMOs
Summary:
This is a follow up to r367237. MachineFunction::getMachineMemOperand()
adds the offset parameter to the existing offset instead of resetting it.
So we need to reset the offset to the correct value after calling this
function.
Summary:
Core files have different descriptions for note values. llvm-readelf currently prints the generic note type, which is wrong when using it to read a core file.
To verify the constants/strings, see:
Values: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=include/elf/common.h;h=75c4fb7e9d7c0f780d635ac305f579546b7b071b;hb=HEAD#l571
Strings: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=binutils/readelf.c;h=c31a5c1266b7bb62a485895b01b49e1f832ade35;hb=HEAD#l16881
Note: this does not handle printing the note data for NT_FILE; it just fixes the descriptions.
Matt Arsenault [Mon, 5 Aug 2019 14:40:26 +0000 (14:40 +0000)]
AMDGPU/GlobalISel: Alternative mappings for constants
Without context we assume SGPR. Allowing VGPR constants theoretically
helps avoid a copy. This seems to not actually work now, and the
choice isn't based on the use bank.
Matt Arsenault [Mon, 5 Aug 2019 14:40:23 +0000 (14:40 +0000)]
AMDGPU/GlobalISel: Don't reject shader types
I'm not sure what complications these present, but the current
argument lowering is pretty much directly copied from the DAG
lowering, so I assume these work as they should.
No tests because I'm lazy and things are getting pretty close to the
point where the existing calling-conventions.ll can be shared with
SelectionDAG.
Hubert Tong [Mon, 5 Aug 2019 13:55:41 +0000 (13:55 +0000)]
[yaml2obj][tests] Fix overly restrictive od output check
Summary:
rL364517 introduced further instances of `od` output checking of the
kind previously corrected by rL363829. This patch corrects the issue by
suppressing output of the input offset. The check remains sufficiently
sensitive to test for the intended value of the specific byte since the
relevant byte value is the only output we are expecting from `od`.
Cullen Rhodes [Mon, 5 Aug 2019 13:44:10 +0000 (13:44 +0000)]
[AArch64] Implement initial SVE calling convention support
Summary:
This patch adds initial support for the SVE calling convention such that
SVE types can be passed as arguments and return values to/from a
subroutine.
The SVE AAPCS states [1]:
z0-z7 are used to pass scalable vector arguments to a subroutine,
and to return scalable vector results from a function. If a
subroutine takes arguments in scalable vector or predicate
registers, or if it is a function that returns results in such
registers, it must ensure that the entire contents of z8-z23 are
preserved across the call. In other cases it need only preserve the
low 64 bits of z8-z15, as described in §5.1.2.
p0-p3 are used to pass scalable predicate arguments to a subroutine
and to return scalable predicate results from a function. If a
subroutine takes arguments in scalable vector or predicate
registers, or if it is a function that returns results in these
registers, it must ensure that p4-p15 are preserved across the call.
In other cases it need not preserve any scalable predicate register
contents.
SVE predicate and data registers are passed indirectly (i.e. spilled to the
stack and the address passed) if they exceed the registers used for argument
passing defined by the PCS referenced above. Until SVE stack support is merged
we can't spill SVE registers to the stack, so currently an llvm_unreachable is
used where we will eventually handle this.
Hans Wennborg [Mon, 5 Aug 2019 13:04:12 +0000 (13:04 +0000)]
test-release.sh: Perform the sed substitution on both files (PR42739)
The comparison would otherwise fail if Phase2 occurs naturally in the
object file. It would get replaced with Phase3 in the one .o, but not
in the other.
We were already running both files through sed to have them processed in
this same way; this is a logical extension of that.
Sanjay Patel [Mon, 5 Aug 2019 11:27:07 +0000 (11:27 +0000)]
[DAGCombiner][x86] prevent infinite loop from truncate/extend transforms
The test case is based on the example from the post-commit thread for:
https://reviews.llvm.org/rGc9171bd0a955
This replaces the x86-specific simple-type check from:
rL367766
with a check in the DAGCombiner. Adding the check isn't
strictly necessary after the fix from:
rL367768
...but it seems likely that we're heading for trouble if
we are creating weird types in this transform.
I combined the earlier legality check into the initial
clause to simplify the code.
So we should only try the trunc/sext transform at the
earliest combine stage, but we limit the transform to
simple types anyway because the TLI hook is probably
too lax about what it considers a free truncate.