Simon Pilgrim [Thu, 18 Jul 2019 14:33:25 +0000 (14:33 +0000)]
[X86] EltsFromConsecutiveLoads - support common source loads
This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.
A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte offsetted loads then attempt to matched against a previous load that has already been confirmed to be a consecutive match.
Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.
Summary:
Commit r365249 changed usage of FileCheckNumericVariable to have one
instance of that class per variable as opposed to one instance per
definition of a given variable as was done before. However, it retained
the safety check in setValue that it should only be called with the
variable unset, even after r365625.
However this causes assert failure when a non-pseudo variable is being
redefined. And while redefinition of @LINE at each CHECK line work in
the general case, it caused problem when a substitution failed (fixed in
r365624) and still causes problem when a CHECK line does not match since
@LINE's value is cleared after substitutions in match() happened but
printSubstitutions also attempts a substitution.
This commit solves the root of the problem by changing setValue to set a
new value regardless of whether a value was set or not, thus fixing all
the aforementioned issues.
[x86] try harder to form LEA from ADD to avoid flag conflicts (PR40483)
LEA doesn't affect flags, so use it more liberally to replace an ADD when
we know that the ADD operands affect flags.
In the motivating example from PR40483:
https://bugs.llvm.org/show_bug.cgi?id=40483
...this lets us avoid duplicating a math op just to avoid flag conflict.
As mentioned in the TODO comments, this heuristic can be extended to
fire more often if that leads to more improvements.
George Rimar [Thu, 18 Jul 2019 12:14:36 +0000 (12:14 +0000)]
[llvm-readelf] - Remove the precompiled binary from gnu-hash-symbols.test
I am working on https://bugs.llvm.org/show_bug.cgi?id=42622
and this patch reworks the gnu-hash-symbols.test so that it
will be easier to expand it with x86_64 case.
[X86] Disable combineConcatVectors for vXi1 vectors.
I'm not convinced the code this calls is properly vetted for
vXi1 vectors. Experimental vector widening legalization testing
for D55251 is now hitting an assertion failure inside
EltsFromConsecutiveLoads. This is occurring from a v2i1 load
having a store size different than its VT size. Hopefully
this commit will keep such issues from happening.
Alex Bradbury [Thu, 18 Jul 2019 05:22:55 +0000 (05:22 +0000)]
[DWARF][RISCV] Add support for RISC-V relocations needed for debug info
When code relaxation is enabled many RISC-V fixups are not resolved but
instead relocations are emitted. This happens even for DWARF debug
sections. Therefore, to properly support the parsing of DWARF debug info
we need to be able to resolve RISC-V relocations. This patch adds:
* Support for RISC-V relocations in RelocationResolver
* DWARF support for two relocations per object file offset
* DWARF changes to support relocations in more DIE fields
The two relocations per offset change is needed because some RISC-V
relocations (used for label differences) come in pairs.
Relocations can also be emitted for DWARF fields where relocations were
not yet evaluated. Adding relocation support for some of these fields is
essencial. On the other hand, LLVM currently emits RISC-V relocations
for fixups that could be safely evaluated, since they can never be
affected by code relaxations. This patch also adds relocation support
for the fields affected by those extraneous relocations (the DWARF unit
entry Length, and the DWARF debug line entry TotalLength and
PrologueLength), for testing purposes.
Differential Revision: https://reviews.llvm.org/D62062
Patch by Luís Marques.
[llvm-bcanalyzer] Fixed error 'Expected<T> must be checked before access or destruction'
After rL365286 I had failing test:
LLVM :: tools/gold/X86/v1.12/thinlto_emit_linked_objects.ll
It was failing with the output:
$ llvm-bcanalyzer --dump llvm/test/tools/gold/X86/v1.12/Output/thinlto_emit_linked_objects.ll.tmp3.o.thinlto.bc
Expected<T> must be checked before access or destruction.
Unchecked Expected<T> contained error:
Unexpected end of file reading 0 of 0 bytesStack dump:
llvm-pdbdump: Fix several smaller issues with injected source compression handling
- getCompression() used to return a PDB_SourceCompression even though
the docs for IDiaInjectedSource are explicit about the return value
being compiler-dependent. Return an uint32_t instead, and make the
printing code handle unknown values better by printing "Unknown" and
the int value instead of not printing any compression.
- Print compressed contents as hex dump, not as string.
- Add compression type "DotNet", which is used (at least) by csc.exe,
the C# compiler. Also add a lengthy comment describing the stream
contents (derived from looking at the raw hex contents long enough
to see the GUIDs, which led me to the roslyn and mono implementations
for handling this).
- The native injected source dumper was dumping the contents of the
whole data stream -- but csc.exe writes a stream that's padded with
zero bytes to the next 512 boundary, and the dia api doesn't display
those padding bytes. So make NativeInjectedSource::getCode() do the
same thing.
This will let us instrument globals during initialization. This required
making the new PM pass a module pass, which should still provide access to
analyses via the ModuleAnalysisManager.
[PEI] Don't re-allocate a pre-allocated stack protector slot
The LocalStackSlotPass pre-allocates a stack protector and makes sure
that it comes before the local variables on the stack.
We need to make sure that later during PEI we don't re-allocate a new
stack protector slot. If that happens, the new stack protector slot will
end up being **after** the local variables that it should be protecting.
Therefore, we would have two slots assigned for two different stack
protectors, one at the top of the stack, and one at the bottom. Since
PEI will overwrite the assigned slot for the stack protector, the load
that is used to compare the value of the stack protector will use the
slot assigned by PEI, which is wrong.
For this, we need to check if the object is pre-allocated, and re-use
that pre-allocated slot.
Matt Arsenault [Wed, 17 Jul 2019 20:22:44 +0000 (20:22 +0000)]
GlobalISel: Handle widenScalar of arbitrary G_MERGE_VALUES sources
Extract the sources to the GCD of the original size and target size,
padding with implicit_def as necessary.
Also fix the case where the requested source type is wider than the
original result type. This was ignoring the type, and just using the
destination. Do the operation in the requested type and truncate back.
Matt Arsenault [Wed, 17 Jul 2019 20:22:38 +0000 (20:22 +0000)]
GlobalISel: Handle more cases for widenScalar of G_MERGE_VALUES
Use an anyext to the requested type for the leftover operand to
produce a slightly wider type, and then truncate the final merge.
I have another implementation almost ready which handles arbitrary
widens, but I think it produces worse code in this example (which I
think is 90% due to not folding redundant copies or folding out
implicit_def users), so I wanted to add this as a baseline first.
Lang Hames [Wed, 17 Jul 2019 16:40:52 +0000 (16:40 +0000)]
[ORC] Add deprecation warnings to ORCv1 layers and utilities.
Summary:
ORCv1 is deprecated. The current aim is to remove it before the LLVM 10.0
release. This patch adds deprecation attributes to the ORCv1 layers and
utilities to warn clients of the change.
These tests that failed on Darwin but passed on other machines due to the default archive format differing
on a Darwin machine, and what looks to be bugs in the output of this format.
I can not investigate these issue further so the tests are considered expected failures on Darwin.
Alex Bradbury [Wed, 17 Jul 2019 14:32:25 +0000 (14:32 +0000)]
[RISCV] Add RISCV to LLVM_ALL_TARGETS so it s built by default
This follows the RFC <http://lists.llvm.org/pipermail/llvm-dev/2019-July/133724.html>.
Follow-on commits will add appropriate release notes changes etc.
Pushing this now and in a minimal form so there is reasonable time before 9.0
branches to resolve any issues arising from e.g. the backend being exposed on
different sanitizer setups.
The current builder for RISC-V is on the staging build-bot
<http://lab.llvm.org:8014/builders/llvm-riscv-linux>, however with the RISCV
backend being built by default it won't provide any real additional coverage.
We will shortly set up a builder that runs the test-suite in qemu-user.
Alex Bradbury [Wed, 17 Jul 2019 14:00:35 +0000 (14:00 +0000)]
[AsmPrinter] Make the encoding of call sites in .gcc_except_table configurable and use for RISC-V
The original behavior was to always emit the offsets to each call site in the
call site table as uleb128 values, however on some architectures (eg RISCV)
these uleb128 offsets into the code cannot always be resolved until link time
(because relaxation will invalidate any calculated offsets), and there are no
appropriate relocations for uleb128 values. As a consequence it needs to be
possible to specify an alternative.
This also switches RISCV to use DW_EH_PE_udata4 for call side encodings in
.gcc_except_table
Differential Revision: https://reviews.llvm.org/D63415
Patch by Edward Jones.
Matt Arsenault [Wed, 17 Jul 2019 13:55:01 +0000 (13:55 +0000)]
Mips: Remove immarg from copy and insert intrinsics
These intrinsics do in fact work with non-constant index arguments.
These are lowered to either the generic
ISD::INSERT_VECTOR_ELT/ISD::EXTRACT_VECTOR_ELT, or to
VEXTRACT_SEXT_ELT. The handling of these all accept variable
indexes. Turning these into generic instructions which do allow
variables introduces complications in a future change to immarg
handling.
Since these just turn into generic instructions, these are kind of
pointless and should probably just be autoupgraded to
extractelement/insertelement.
Alex Bradbury [Wed, 17 Jul 2019 13:54:38 +0000 (13:54 +0000)]
[RISCV] Set correct encodings for DWARF exception handling
This patch sets correct encodings for DWARF exception handling for RISC-V
(other than call site encoding, which must be udata4 rather than uleb128 and
is handled by D63415).
This has the same intend as D63409, except this version matches GCC/binutils
behaviour which uses the same encodings regardless of PIC/non-PIC and
medlow/medany code model.
Summary:
Missed in the original commit, use the correct callee-saved register
list for spilling, instead of the standard SVR432 list. This avoids
needlessly spilling the SPE non-volatile registers when they're not used.
As part of this, also add where missing, and sort, the spill opcode
checks for SPE and SPE4 register classes.
Summary:
Pointed out in a comment for D49754, register spilling will currently
spill SPE registers at almost any offset. However, the instructions
`evstdd` and `evldd` require a) 8-byte alignment, and b) a limit of 256
(unsigned) bytes from the base register, as the offset must fix into a
5-bit offset, which ranges from 0-31 (indexed in double-words).
The update to the register spill test is taken partially from the test
case shown in D49754.
Additionally, pointed out by Kei Thomsen, globals will currently use
evldd/evstdd, though the offset isn't known at compile time, so may
exceed the 8-bit (unsigned) offset permitted. This fixes that as well,
by forcing it to always use evlddx/evstddx when accessing globals.
Petar Avramovic [Wed, 17 Jul 2019 12:08:01 +0000 (12:08 +0000)]
[MIPS GlobalISel] ClampScalar and select pointer G_ICMP
Add narrowScalar to half of original size for G_ICMP.
ClampScalar G_ICMP's operands 2 and 3 to to s32.
Select G_ICMP for pointers for MIPS32. Pointer compare is same
as for integers, it is enough to declare them as legal type.
[TableGen] Do not set ReadNone attribute on intrinsics with side effects
If an intrinsic is defined without outputs, but having side effects,
it still can be removed completely from the program. This patch makes
TableGen not set Attribute::ReadNone for intrinsics which
are declared with IntrHasSideEffects.
Migrate CallLowering::lowerReturnVal to use the same infrastructure as
lowerCall/FormalArguments and remove the now obsolete code path from
splitToValueTypes.
Simon Atanasyan [Wed, 17 Jul 2019 08:11:31 +0000 (08:11 +0000)]
[mips] Implement .cplocal directive
This directive forces to use the alternate register for context pointer.
For example, this code:
.cplocal $4
jal foo
expands to:
ld $25, %call16(foo)($4)
jalr $25
Simon Atanasyan [Wed, 17 Jul 2019 08:11:15 +0000 (08:11 +0000)]
[mips] Support the "o" inline asm constraint
As well as other LLVM targets we do not handle "offsettable"
memory addresses in any special way. In other words, the "o" constraint
is an exact equivalent of the "m" one. But some existing code require
the "o" constraint support.
It is possible that exit block has two predecessors and one of them is a latch
block while another is not.
Current algorithm is based on the assumption that all exits are dedicated
and therefore we can check only first predecessor of loop exit to find all unique
exits.
However if we do not consider latch block and it is first predecessor of some
exit then this exit will be found.
Regression test is added.
As a side effect of algorithm re-writing, the restriction that all exits are dedicated
is eliminated.
[TableGen] Generate offsets into a flat array for getOperandType
Rather than an array of std::initializer_list, generate a table of
offsets and a flat array of the operands for getOperandType. This is a
bit more efficient on platforms that don't manage to get the array of
inintializer_lists initialized at link time (I'm looking at you
macOS). It's also quite quite a bit faster to compile.
[WebAssembly] Compile all TLS on Emscripten as local-exec
Summary:
Currently, on Emscripten, dynamic linking is not supported with threads.
This means that if thread-local storage is used, it must be used in a
statically-linked executable. Hence, local-exec is the only possible model.
This diff compiles all TLS variables to use local-exec on Emscripten as a
temporary measure until dynamic linking is supported with threads.
The goal for this is to allow C++ types with constructors to be thread-local.
Currently, when `clang` compiles a `thread_local` variable with a constructor,
it generates `__tls_guard` variable:
@__tls_guard = internal thread_local global i8 0, align 1
As no TLS model is specified, this is treated as general-dynamic, which we do
not support (and cannot support without implementing dynamic linking support
with threads in Emscripten). As a result, any C++ constructor in `thread_local`
variables would not compile.
By compiling all `thread_local` as local-exec, `__tls_guard` will compile and
we can support C++ constructors with TLS without implementing dynamic linking
with threads.
[TableGen] Add "getOperandType" to get operand types from opcode/opidx
The InstrInfoEmitter outputs an enum called "OperandType" which gives
numerical IDs to each operand type. This patch makes use of this enum
to define a function called "getOperandType", which allows looking up
the type of an operand given its opcode and operand index.
Summary:
Thread local variables are placed inside a `.tdata` segment. Their symbols are
offsets from the start of the segment. The address of a thread local variable
is computed as `__tls_base` + the offset from the start of the segment.
`.tdata` segment is a passive segment and `memory.init` is used once per thread
to initialize the thread local storage.
`__tls_base` is a wasm global. Since each thread has its own wasm instance,
it is effectively thread local. Currently, `__tls_base` must be initialized
at thread startup, and so cannot be used with dynamic libraries.
`__tls_base` is to be initialized with a new linker-synthesized function,
`__wasm_init_tls`, which takes as an argument a block of memory to use as the
storage for thread locals. It then initializes the block of memory and sets
`__tls_base`. As `__wasm_init_tls` will handle the memory initialization,
the memory does not have to be zeroed.
To help allocating memory for thread-local storage, a new compiler intrinsic
is introduced: `__builtin_wasm_tls_size()`. This instrinsic function returns
the size of the thread-local storage for the current function.
The expected usage is to run something like the following upon thread startup:
This is part of what is requested by PR42023:
https://bugs.llvm.org/show_bug.cgi?id=42023
There's an extension needed for FP add, but exactly how we would specify
that using flags is not clear to me, so I left that as a TODO.
We're still missing patterns for partial reductions when the input vector
is 256-bit or 512-bit, but I think that's a failure of vector narrowing.
If we can reduce the widths, then this matching should work on those tests.
David Blaikie [Tue, 16 Jul 2019 21:15:19 +0000 (21:15 +0000)]
DWARF: Skip zero column for inline call sites
D64033 <https://reviews.llvm.org/D64033> added DW_AT_call_column for
inline sites. However, that change wasn't aware of "-gno-column-info".
To avoid adding column info when "-gno-column-info" is used, now
DW_AT_call_column is only added when we have non-zero column (when
"-gno-column-info" is used, column will be zero).
Matt Arsenault [Tue, 16 Jul 2019 20:15:30 +0000 (20:15 +0000)]
AMDGPU/GlobalISel: Select G_SHL
I think this manages to not break the DAG handling with the divergent
predicates because the stadalone divergent patterns end up with a
higher priority than the pattern on the instruction definition.
Philip Reames [Tue, 16 Jul 2019 18:23:49 +0000 (18:23 +0000)]
[IndVars] Speculative fix for an assertion failure seen in bots
I don't have an IR sample which is actually failing, but the issue described in the comment is theoretically possible, and should be guarded against even if there's a different root cause for the bot failures.
Matt Arsenault [Tue, 16 Jul 2019 18:05:29 +0000 (18:05 +0000)]
AMDGPU/GlobalISel: Select flat loads
Now that the patterns use the new PatFrag address space support, the
only blocker to importing most load patterns is the addressing mode
complex patterns.