Gadi Haber [Wed, 13 Dec 2017 09:13:53 +0000 (09:13 +0000)]
[X86][BMI]: Adding full coverage of MC encoding for the BMI isa set.<NFC>
NFC.
Adding MC regressions tests to cover the BMI1 and BMI2 ISA sets both 32 and 64 bit.
This patch is part of a larger task to cover MC encoding of all X86 ISA Sets.
started in revision: https://reviews.llvm.org/D39952
Alex Bradbury [Wed, 13 Dec 2017 09:02:13 +0000 (09:02 +0000)]
[cmake] Fix host tools build in when LLVM_EXPERIMENTAL_TARGETS_TO_BUILD is set
r320413 triggered cmake configure failures when building with
-DLLVM_OPTIMIZED_TABLEGEN=True and with LLVM_EXPERIMENTAL_TARGETS_TO_BUILD set
(e.g. to RISCV). This is because that patch moved to passing through
LLVM_TARGETS_TO_BUILD, and at that point LLVM_EXPERIMENTAL_TARGETS_TO_BUILD
has been merged in to it. LLVM_EXPERIMENTAL_TARGETS_TO_BUILD must be also be
passed through to avoid errors like below:
-- Constructing LLVMBuild project information
CMake Error at CMakeLists.txt:682 (message):
The target `RISCV' does not exist.
Craig Topper [Wed, 13 Dec 2017 07:26:17 +0000 (07:26 +0000)]
[Targets] Don't automatically include the scheduler class enum from *GenInstrInfo.inc with GET_INSTRINFO_ENUM. Make targets request is separately.
Most of the targets don't need the scheduler class enum.
I have an X86 scheduler model change that causes some names in the enum to become about 18000 characters long. This is because using instregex in scheduler models causes the scheduler class to get named with every instruction that matches the regex concatenated together. MSVC has a limit of 4096 characters for an identifier name. Rather than trying to come up with way to reduce the name length, I'm just going to sidestep the problem by not including the enum in X86.
This change makes XRay print the log file output only when the verbosity
level is higher than 0. It reduces the log spam in the default case when
we want XRay running silently, except when there are actual
fatal/serious errors.
We also update the documentation to show how to get the information
after the change to the default behaviour.
Serguei Katkov [Wed, 13 Dec 2017 05:32:46 +0000 (05:32 +0000)]
[NFC] Refactor SafepointIRVerifier
Now two classes are responsible for verification: one of them can track GC
pointers and know whether a pointer is relocated or not and another based on
that information can verify uses of GC pointers.
Mohammad Shahid [Wed, 13 Dec 2017 03:08:29 +0000 (03:08 +0000)]
[SLP] Vectorize jumbled memory loads.
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Florian Hahn [Wed, 13 Dec 2017 03:05:20 +0000 (03:05 +0000)]
[CallSiteSplitting] Refactor creating callsites.
Summary:
This change makes the call site creation more general if any of the
arguments is predicated on a condition in the call site's predecessors.
If we find a callsite, that potentially can be split, we collect the set
of conditions for the call site's predecessors (currently only 2
predecessors are allowed). To do that, we traverse each predecessor's
predecessors as long as it only has single predecessors and record the
condition, if it is relevant to the call site. For each condition, we
also check if the condition is taken or not. In case it is not taken,
we record the inverse predicate.
We use the recorded conditions to create the new call sites and split
the basic block.
This has 2 benefits: (1) it is slightly easier to see what is going on
(IMO) and (2) we can easily extend it to handle more complex control
flow.
Michael Trent [Tue, 12 Dec 2017 23:53:46 +0000 (23:53 +0000)]
Updated llvm-objdump to display local relocations in Mach-O binaries
Summary:
llvm-objdump's Mach-O parser was updated in r306037 to display external
relocations for MH_KEXT_BUNDLE file types. This change extends the Macho-O
parser to display local relocations for MH_PRELOAD files. When used with
the -macho option relocations will be displayed in a historical format.
Alexey Bataev [Tue, 12 Dec 2017 20:28:46 +0000 (20:28 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Shuffle generation uses vmux to collapse vectors resulting from two
individual shuffles into one. The indexes of the elements selected
from the first operand were indicated by 0xFF in the constant vector
used in the compare instruction, but the compare (veqb) set the bits
corresponding to the 0x00 elements, thus inverting the selection.
Reverse the order of operands to vmux to get the correct output.
Sanjoy Das [Tue, 12 Dec 2017 19:11:31 +0000 (19:11 +0000)]
Reapply "[X86] Flag BroadWell scheduler model as complete"
This reverts commit r320508, in effect re-applying r320308. Simon has already
reverted the parts that caused the crash that motivated the revert in r320492.
Hiroshi Yamauchi [Tue, 12 Dec 2017 19:07:43 +0000 (19:07 +0000)]
Split IndirectBr critical edges before PGO gen/use passes.
Summary:
The PGO gen/use passes currently fail with an assert failure if there's a
critical edge whose source is an IndirectBr instruction and that edge
needs to be instrumented.
To avoid this in certain cases, split IndirectBr critical edges in the PGO
gen/use passes. This works for blocks with single indirectbr predecessors,
but not for those with multiple indirectbr predecessors (splitting an
IndirectBr critical edge isn't always possible.)
Alexey Bataev [Tue, 12 Dec 2017 18:47:00 +0000 (18:47 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Craig Topper [Tue, 12 Dec 2017 18:39:04 +0000 (18:39 +0000)]
[X86] Add a couple TODOs about missing coverage/features motivated by D40335
D40335 was wanting to add FMSUBADD support, but it discovered that there are two pieces of code to make FMADDSUB and only one of those is tested. So I've asked that review to implement the one path until we get tests that test the existing code.
Nirav Dave [Tue, 12 Dec 2017 18:25:48 +0000 (18:25 +0000)]
[X86] Cleanup type conversion of 64-bit load-store pairs.
Summary:
Simplify and generalize chain handling and search for 64-bit load-store pairs.
Nontemporal test now converts 64-bit integer load-store into f64 which it realizes directly instead of splitting into two i32 pairs.
Geoff Berry [Tue, 12 Dec 2017 17:53:59 +0000 (17:53 +0000)]
[MachineOperand][MIR] Add isRenamable to MachineOperand.
Summary:
Add isRenamable() predicate to MachineOperand. This predicate can be
used by machine passes after register allocation to determine whether it
is safe to rename a given register operand. Register operands that
aren't marked as renamable may be required to be assigned their current
register to satisfy constraints that are not captured by the machine
IR (e.g. ABI or ISA constraints).
Alexey Bataev [Tue, 12 Dec 2017 17:19:15 +0000 (17:19 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Alexey Bataev [Tue, 12 Dec 2017 16:58:48 +0000 (16:58 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Simon Pilgrim [Tue, 12 Dec 2017 16:12:53 +0000 (16:12 +0000)]
[X86] Remove CompleteModel tags from CPU targets until we have better error checking (PR35636)
The checks we have for complete models are not great and miss many cases - e.g. in PR35636 it failed to recognise that only the first output (of 2) was actually tagged by the InstRW
Alexey Bataev [Tue, 12 Dec 2017 15:54:49 +0000 (15:54 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Alex Bradbury [Tue, 12 Dec 2017 15:46:15 +0000 (15:46 +0000)]
[RISCV] Implement assembler pseudo instructions for RV32I and RV64I
Adds the assembler pseudo instructions of RV32I and RV64I which can
be mapped to a single canonical instruction. The missing pseudo
instructions (e.g., call, tail, ...) are marked as TODO. Other
things, like for example PCREL_LO, have to be implemented first.
Currently, alias emission is disabled by default to keep the patch
minimal. Alias emission by default will be enabled in a subsequent
patch which also updates all affected tests. Note that this patch
should actually break the floating point MC tests. However, the
used FileCheck configuration is not tight enought to detect the
breakage.
Alex Bradbury [Tue, 12 Dec 2017 15:17:45 +0000 (15:17 +0000)]
[RISCV] MC layer support for the instructions added in the privileged spec
Adds support for the instructions added in the RISC-V privileged ISA
(https://content.riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf):
uret, sret, mret, wfi, and sfence.vma.
Note from the committer: I made very minor formatting changes prior to commit,
which didn't seem worth creating another review round-trip for.
Alexey Bataev [Tue, 12 Dec 2017 15:03:17 +0000 (15:03 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
Ayman Musa [Tue, 12 Dec 2017 14:13:51 +0000 (14:13 +0000)]
[X86] Recognize constant arrays with special values and replace loads from it with subtract and shift instructions, which then will be replaced by X86 BZHI machine instruction.
Recognize constant arrays with the following values:
0x0, 0x1, 0x3, 0x7, 0xF, 0x1F, .... , 2^(size - 1) -1
where //size// is the size of the array.
the result of a load with index //idx// from this array is equivalent to the result of the following:
(0xFFFFFFFF >> (sub 32, idx)) (assuming the array of type 32-bit integer).
And the result of an 'AND' operation on the returned value of such a load and another input, is exactly equivalent to the X86 BZHI instruction behavior.
See test cases in the LIT test for better understanding.
Anna Thomas [Tue, 12 Dec 2017 14:12:33 +0000 (14:12 +0000)]
[InstComineLoadStoreAlloca] Optimize stores to GEP off null base
Summary:
Currently, in InstCombineLoadStoreAlloca, we have simplification
rules for the following cases:
1. load off a null
2. load off a GEP with null base
3. store to a null
This patch adds support for the fourth case which is store into a
GEP with null base. Since this is UB as well (and directly analogous to
the load off a GEP with null base), we can substitute the stored val
with undef in instcombine, so that SimplifyCFG can optimize this code
into unreachable code.
Note: Right now, simplifyCFG hasn't been taught about optimizing
this to unreachable and adding an llvm.trap (this is already done for
the above 3 cases).
Nemanja Ivanovic [Tue, 12 Dec 2017 12:33:09 +0000 (12:33 +0000)]
[PowerPC] Add branch flag on asm parser-only branch instructions
This flag was missing but it wasn't an issue as nothing depended on it
for these asm parser-only instructions. Now that LLDB support is slowly
landing, it is important to get this right.
Committing on behalf of Leonardo Bianconi.
Nemanja Ivanovic [Tue, 12 Dec 2017 12:09:34 +0000 (12:09 +0000)]
[PowerPC] Follow-up to r318436 to get the missed CSE opportunities
The last of the three patches that https://reviews.llvm.org/D40348 was
broken up into.
Canonicalize the materialization of constants so that they are more likely
to be CSE'd regardless of the bit-width of the use. If a constant can be
materialized using PPC::LI, materialize it the same way always.
For example:
li 4, -1
li 4, 255
li 4, 65535
are equivalent if the uses only use the low byte. Canonicalize it to the
first form.
Simon Pilgrim [Tue, 12 Dec 2017 11:34:25 +0000 (11:34 +0000)]
Revert r320461 - causing ICE in windows buildss
[X86] Use regular expressions more aggressively to reduce the number of scheduler entries needed for FMA3 instructions.
When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models.
This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do.
Craig Topper [Tue, 12 Dec 2017 08:17:04 +0000 (08:17 +0000)]
[X86] Use regular expressions more aggressively to reduce the number of scheduler entries needed for FMA3 instructions.
When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models.
This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do.
Mikael Holmen [Tue, 12 Dec 2017 07:29:57 +0000 (07:29 +0000)]
[CallSiteSplitting] Don't let debug intrinsics affect optimizations
Summary:
This solves PR35616.
We don't want the compiler to generate different code when we compile
with/without -g, so we now ignore debug intrinsics when determining if
the optimization can trigger or not.
Tony Tye [Tue, 12 Dec 2017 05:47:00 +0000 (05:47 +0000)]
[AMDGPU] Rename Bonaire target to be gfx704; remove gfx800 and make Iceland and Tonga both use gfx802; update target feature handling
Correct committed version to match intended accepted review D40051 id=123417
- Rename Bonaire target to be gfx704.
- Eliminate gfx800 and make Iceland and Tonga both use gfx802 as they use the same code.
- List target features supported by each processor in the processor table together with the default value.
- Add xnack flag to e_flags.
- Remove xnack from kernel metadata and kernel descriptor since it is now a whole code object property.
Hans Wennborg [Mon, 11 Dec 2017 21:15:27 +0000 (21:15 +0000)]
Revert r320407 "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast."
The tests fail (opt asserts) on Windows.
> Summary:
> If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
> &V2)))), bitcast)`, but the load is used in other instructions, it leads
> to looping in InstCombiner. Patch adds additional check that all users
> of the load instructions are stores and then replaces all uses of load
> instruction by the new one with new type.
>
> Reviewers: RKSimon, spatel, majnemer
>
> Subscribers: llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D41072
Evandro Menezes [Mon, 11 Dec 2017 21:09:27 +0000 (21:09 +0000)]
[CodeGen] Improve the consistency of instruction fusion*
When either instruction in a fused pair has no other dependency, besides on
the other instruction, make sure that other instructions do not get
scheduled between them. Additionally, avoid fusing an instruction more than
once along the same dependency chain.
Adrian Prantl [Mon, 11 Dec 2017 20:43:21 +0000 (20:43 +0000)]
ASAN: Provide reliable debug info for local variables at -O0.
The function stack poisioner conditionally stores local variables
either in an alloca or in malloc'ated memory, which has the
unfortunate side-effect, that the actual address of the variable is
only materialized when the variable is accessed, which means that
those variables are mostly invisible to the debugger even when
compiling without optimizations.
This patch stores the address of the local stack base into an alloca,
which can be referred to by the debug info and is available throughout
the function. This adds one extra pointer-sized alloca to each stack
frame (but mem2reg can optimize it away again when optimizations are
enabled, yielding roughly the same debug info quality as before in
optimized code).
Tony Jiang [Mon, 11 Dec 2017 20:42:37 +0000 (20:42 +0000)]
[PowerPC] Partially enable the ISEL expansion pass.
The pass to expand ISEL instructions into if-then-else sequences in patch D23630
is currently disabled. This patch partially enable it by always removing the
unnecessary ISELs (all registers used by the ISELs are the same one) and folding
the ISELs which have the same input registers into unconditional copies.
Justin Bogner [Mon, 11 Dec 2017 19:53:23 +0000 (19:53 +0000)]
[cmake] Pass TARGETS_TO_BUILD through to host tools build
In r319620, the host build was changed to use Native for
TARGETS_TO_BUILD because passing semicolons through add_custom_command
is surprisingly difficult. However, Native really doesn't make any
sense here, and it only works because we don't technically do any
codegen in the host tools so pretty well anything will "work".
The problem here is that passing something other than the correct
value is very fragile - as evidence note how the llvm-config in the
host tools acts differently than the target one now, and misreports
the targets to build. Similarly, if there is any logic conditional on
the targets in tablegen (now or in the future), it will do the wrong
thing.
To fix this, we need to escape the semicolons in the targets string
and pass it through to the child cmake invocation.
In all cases except for this optimistic attempt to reuse memory, the
moved-from TinyPtrVector was left `empty()` at the end of this
assignment. Though using a container after it's been moved from can be a
bit sketchy, it's probably best to just be consistent here.
Alexey Bataev [Mon, 11 Dec 2017 19:11:16 +0000 (19:11 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. Patch adds additional check that all users
of the load instructions are stores and then replaces all uses of load
instruction by the new one with new type.
[dwarfdump] Fix off-by-one bug in accelerator table extractor.
This fixes a bug where the verifier was complaining about empty
accelerator tables. When the table is empty, its size is not a valid
offset as it points after the end of the section.
This patch also makes the extractor return llvm:Error instead of bool
for better error reporting in the verifier.
Amara Emerson [Mon, 11 Dec 2017 16:58:29 +0000 (16:58 +0000)]
[GlobalISel] Disable GISel for big endian.
This is due to PR26161 needing to be resolved before we can fix
big endian bugs like PR35359. The work to split aggregates into smaller LLTs
instead of using one large scalar will take some time, so in the mean time
we'll fall back to SDAG.
Tony Tye [Mon, 11 Dec 2017 15:35:27 +0000 (15:35 +0000)]
[AMDGPU] Rename Bonaire target to be gfx704; update target feature handling
- Rename Bonaire target to be gfx704.
- Eliminate gfx800 and make Iceland and Tonga both use gfx802 as they use the same code.
- List target features supported by each processor in the processor table together with the default value.
- Add xnack flag to e_flags.
- Remove xnack from kernel metadata and kernel descriptor since it is now a whole code object property.
Sanjay Patel [Mon, 11 Dec 2017 15:19:31 +0000 (15:19 +0000)]
[DAGCombiner] protect against an infinite loop between shl <--> mul (PR35579)
At first, I tried to thread the x86 needle and use a target hook (isVectorShiftByScalarCheap())
to disable the transform only for non-splat pow-of-2 constants, but not AVX2, but only some
element types, but...it's difficult.
Here we just avoid the loop with the x86 vector transform that conflicts with the general DAG
combine and preserve all of the existing behavior AFAICT otherwise.
Some tests that will probably fail if someone does try to restrict this in a more targeted way
for x86-only may be found in:
This patch introduces getShadowOriginPtr(), a method that obtains both the shadow and origin pointers for an address as a Value pair.
The existing callers of getShadowPtr() and getOriginPtr() are updated to use getShadowOriginPtr().
The rationale for this change is to simplify KMSAN instrumentation implementation.
In KMSAN origins tracking is always enabled, and there's no direct mapping between the app memory and the shadow/origin pages.
Both the shadow and the origin pointer for a given address are obtained by calling a single runtime hook from the instrumentation,
therefore it's easier to work with those pointers together.
[Hexagon] Crash in instruction selection for insert_vector_elt for HVX
A wrong type was passed to insertVector, causing an out-of-bounds value
to be added an an operand to HexagonISD::INSERT. This later failed in
instruction selection.
Nemanja Ivanovic [Mon, 11 Dec 2017 14:35:48 +0000 (14:35 +0000)]
[PowerPC] Sign-extend negative constant stores
Second part of https://reviews.llvm.org/D40348.
Revision r318436 has extended all constants feeding a store to 64 bits
to allow for CSE on the SDAG. However, negative constants were zero extended
which made the constant being loaded appear to be a positive value larger than
16 bits. This resulted in long sequences to materialize such constants
rather than simply a "load immediate". This patch just sign-extends those
updated constants so that they remain 16-bit signed immediates if they started
out that way.