Shoaib Meenai [Wed, 13 Dec 2017 21:12:37 +0000 (21:12 +0000)]
[cmake] Explicitly set VS 2017 compatibility
When cross-compiling using clang-cl 5.0 (which is currently the latest
stable release of the compiler), the default MS compatibility level is
set to VS 2013, which is too low to build LLVM. Explicitly set the
compatibility level to VS 2017 to support cross-compiling LLVM for
Windows using clang-cl 5.0. This will be a no-op when using clang-cl 6.0
and above, where the default MS compatibility level is already VS 2017.
Shoaib Meenai [Wed, 13 Dec 2017 21:11:14 +0000 (21:11 +0000)]
[cmake] Determine MSVC host triple correctly when cross-compiling
CMAKE_CL_64 will never be set when cross-compiling with clang-cl, since
CMake relies on an actual VS environment in order to determine it.
Instead, use the size of a void pointer to determine the bit width of
the host compiler (and therefore the host triple), which works for both
native and cross compilation.
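For illustration, the same decision can be sketched in C++. This is not the CMake code from the commit: `hostArchFromPointerSize` and `hostTriple` are hypothetical helpers. The mapping assumes an x86 Windows host (8-byte pointers mean x86_64, 4-byte pointers mean i686), which is the assumption the following note flags as wrong once AArch64 hosts appear.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the idea of the CMake change: derive the host
// architecture from the pointer width rather than from CMAKE_CL_64.
// Assumes an x86 Windows host; AArch64 is deliberately not handled.
std::string hostArchFromPointerSize(std::size_t pointerSize) {
  assert((pointerSize == 4 || pointerSize == 8) && "unexpected pointer size");
  return pointerSize == 8 ? "x86_64" : "i686";
}

// Triple for the compiler building this code, e.g. "x86_64-pc-windows-msvc".
std::string hostTriple() {
  return hostArchFromPointerSize(sizeof(void *)) + "-pc-windows-msvc";
}
```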
Note that, with the impending advent of Windows on AArch64, assuming
that a 64-bit host == x86_64 isn't correct either, but that's something
to be addressed in a follow-up.
Matt Arsenault [Wed, 13 Dec 2017 21:07:51 +0000 (21:07 +0000)]
AMDGPU: Partially fix disassembly of MIMG instructions
Stores failed to decode at all since they didn't have a
DecoderNamespace set. Loads worked, but did not change
the register width displayed to match the number of
enabled channels.
The number of printed registers for vaddr is still wrong,
but I don't think that's encoded in the instruction so
there's not much we can do about that.
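A minimal sketch of the width calculation, assuming each enabled dmask channel occupies exactly one 32-bit VGPR (packed d16 data and the extra TFE/LWE register are ignored). `vdataDwords` and `formatVData` are hypothetical names, not functions from the AMDGPU disassembler.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Number of 32-bit registers needed for vdata: one per enabled channel
// (assumption: no d16 packing, no TFE/LWE status register).
unsigned vdataDwords(unsigned dmask) {
  assert(dmask != 0 && dmask <= 0xF && "expected a non-zero 4-bit dmask");
  return static_cast<unsigned>(std::bitset<4>(dmask).count());
}

// Formats vdata as a single register ("v4") or a range ("v[0:2]").
std::string formatVData(unsigned firstReg, unsigned dmask) {
  unsigned n = vdataDwords(dmask);
  if (n == 1)
    return "v" + std::to_string(firstReg);
  return "v[" + std::to_string(firstReg) + ":" +
         std::to_string(firstReg + n - 1) + "]";
}
```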
Image atomics are still broken. MIMG is the same
encoding for SI/VI, but the image atomic classes
are split up into encoding specific versions unlike
every other MIMG instruction. They have isAsmParserOnly
set on them for some reason. dmask is also special for
these, so we probably should not have it as an explicit
operand as it is now.
Brian M. Rzycki [Wed, 13 Dec 2017 20:52:26 +0000 (20:52 +0000)]
[JumpThreading] Preservation of DT and LVI across the pass
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.
Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.
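The deferral idea can be sketched with a hypothetical `DeferredUpdates` queue; it is not the real `DeferredDominance` interface. Edge insertions and deletions are recorded, an update that exactly undoes a pending one on the same edge cancels it, and `flush()` hands the remaining updates, in order, to a caller-supplied routine standing in for the DominatorTree update.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

enum class UpdateKind { Insert, Delete };

struct EdgeUpdate {
  UpdateKind kind;
  int from;
  int to;
};

// Hypothetical deferred-update queue (not LLVM's DeferredDominance API).
class DeferredUpdates {
public:
  void insertEdge(int from, int to) { record({UpdateKind::Insert, from, to}); }
  void deleteEdge(int from, int to) { record({UpdateKind::Delete, from, to}); }

  // Hands every pending update, in order, to `apply` and empties the queue.
  void flush(const std::function<void(const EdgeUpdate &)> &apply) {
    assert(apply && "flush needs an update routine");
    for (const EdgeUpdate &u : pending_)
      apply(u);
    pending_.clear();
  }

  std::size_t pendingCount() const { return pending_.size(); }

private:
  // If the most recent pending update on the same edge is the opposite
  // operation, the two cancel out and neither is kept.
  void record(const EdgeUpdate &u) {
    auto sameEdge = [&u](const EdgeUpdate &p) {
      return p.from == u.from && p.to == u.to;
    };
    auto it = std::find_if(pending_.rbegin(), pending_.rend(), sameEdge);
    if (it != pending_.rend() && it->kind != u.kind) {
      pending_.erase(std::next(it).base());
      return;
    }
    pending_.push_back(u);
  }

  std::vector<EdgeUpdate> pending_;
};
```

In this sketch, redundant insert/delete pairs never reach the expensive update routine, and a single flush at the end applies only the net changes.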
LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perform the
preservation was minimally altered and was simply marked
as preserved for the PassManager to be informed.
This extends the analysis available to JumpThreading for
future enhancements. One example is loop boundary threading.
w.r.t. the paper
"A Practical Improvement to the Partial Redundancy Elimination in SSA Form"
(https://sites.google.com/site/jongsoopark/home/ssapre.pdf)
Proper dominance check was missing here, so having a loopinfo should not be required.
Committing this diff as it fixes the bug; if there are
further concerns, I'll be happy to work on them.
[dsymutil][test] Fix failing test when no lipo binary available
The invocation without -no-output would try to lipo the different debug
objects together. This wouldn't work on platforms that don't provide
that utility.
Threading was disabled in r317263 because it broke a test in combination
with `-DLLVM_ENABLE_THREADS=OFF`. This was because a ThreadPool warning
was piped to llvm-dwarfdump which was expecting to read an object from
stdin.
This patch re-enables threading and fixes the offending test.
Unfortunately this required more than just moving the ThreadPool out of
the for loop because of the TempFile refactoring that took place in the
meantime.
Nemanja Ivanovic [Wed, 13 Dec 2017 14:47:35 +0000 (14:47 +0000)]
[PowerPC] MachineSSA pass to reduce the number of CR-logical operations
The initial implementation of an MI SSA pass to reduce cr-logical operations.
Currently, the only operations handled by the pass are binary operations where
both CR-inputs come from the same block and the single use is a conditional
branch (also in the same block).
Committing this off by default to allow for a period of field testing. Will
enable it by default in a follow-up patch soon.
Alex Bradbury [Wed, 13 Dec 2017 12:46:55 +0000 (12:46 +0000)]
[RISCV] Define sfence.vma InstAliases to match the GNU RISC-V tools
Unfortunately these aren't defined explicitly in the privileged spec, but the
GNU assembler does accept `sfence.vma` and `sfence.vma rs` as well as the
usual `sfence.vma rs, rt`.
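As a sketch of what the aliases mean, assuming (as in the GNU tools) that an omitted operand defaults to x0, the hypothetical `expandSfenceVma` helper below rewrites the short forms into the canonical two-operand instruction:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical alias expansion: missing operands default to x0, so
// "sfence.vma" and "sfence.vma rs" become "sfence.vma rs1, rs2".
std::string expandSfenceVma(const std::vector<std::string> &operands) {
  assert(operands.size() <= 2 && "sfence.vma takes at most two operands");
  const std::string rs1 = operands.size() >= 1 ? operands[0] : "x0";
  const std::string rs2 = operands.size() == 2 ? operands[1] : "x0";
  return "sfence.vma " + rs1 + ", " + rs2;
}
```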
Alex Bradbury [Wed, 13 Dec 2017 11:37:19 +0000 (11:37 +0000)]
[RISCV] Implement floating point assembler pseudo instructions
Adds the assembler aliases for the floating point instructions
which can be mapped to a single canonical instruction. The missing
pseudo instructions (flw, fld, fsw, fsd) are marked as TODO. Other
things, like for example PCREL_LO, have to be implemented first.
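Several of the single-instruction FP pseudos are the sign-injection aliases listed in the RISC-V ISA manual; for example, `fneg.s rd, rs` is `fsgnjn.s rd, rs, rs`. A table-driven sketch (the `expandFpPseudo` helper is hypothetical and covers only six aliases) returns `std::nullopt` for anything outside its table, such as the still-TODO symbol forms of `flw`:

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical expander for the FP move/abs/negate pseudos. Each maps to a
// sign-injection instruction with the source register repeated.
std::optional<std::string> expandFpPseudo(const std::string &mnemonic,
                                          const std::string &rd,
                                          const std::string &rs) {
  static const std::map<std::string, std::string> kCanonical = {
      {"fmv.s", "fsgnj.s"}, {"fabs.s", "fsgnjx.s"}, {"fneg.s", "fsgnjn.s"},
      {"fmv.d", "fsgnj.d"}, {"fabs.d", "fsgnjx.d"}, {"fneg.d", "fsgnjn.d"}};
  auto it = kCanonical.find(mnemonic);
  if (it == kCanonical.end())
    return std::nullopt;  // not one of the sign-injection aliases
  return it->second + " " + rd + ", " + rs + ", " + rs;
}
```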
[CodeGen] Print target index operands as target-index(target-specific) + 8 in both MIR and debug output
Work towards the unification of MIR and debug output by printing `target-index(target-specific) + 8` instead of `<ti#0+8>` and `target-index(target-specific) - 8` instead of `<ti#0-8>`.
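A sketch of the offset formatting, assuming a zero offset is simply omitted; `printTargetIndex` is a hypothetical helper, not the `MachineOperand` printer itself:

```cpp
#include <cstdint>
#include <string>

// Hypothetical formatter for the new textual form of target index operands.
std::string printTargetIndex(const std::string &name, std::int64_t offset) {
  std::string text = "target-index(" + name + ")";
  if (offset > 0) {
    text += " + " + std::to_string(offset);
  } else if (offset < 0) {
    // Negate in unsigned arithmetic so that INT64_MIN cannot overflow.
    text += " - " + std::to_string(0 - static_cast<std::uint64_t>(offset));
  }
  return text;
}
```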
Pavel Labath [Wed, 13 Dec 2017 10:00:38 +0000 (10:00 +0000)]
[Testing/Support] Make the HasValue matcher composable
Summary:
This makes it possible to run an arbitrary matcher on the value
contained within the Expected<T> object.
To do this, I needed to fully spell out the matcher, instead of using
the shorthand MATCHER_P macro.
The slight gotcha here is that standard template deduction will fail if
one tries to match HasValue(47) against an Expected<int &> -- the
workaround is to use HasValue(testing::Eq(47)).
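The composition idea can be illustrated without gtest. In this sketch `std::optional<T>` stands in for `Expected<T>`, and `hasValue`/`eq` are hypothetical lambdas rather than the matchers from the patch: the outer matcher checks that a value is present and delegates the rest to whatever inner matcher it was given.

```cpp
#include <optional>
#include <utility>

// Outer matcher: succeeds only if a value is present AND the inner matcher
// accepts it. Any callable predicate can be plugged in as `inner`.
template <typename InnerMatcher>
auto hasValue(InnerMatcher inner) {
  return [inner = std::move(inner)](const auto &holder) {
    return holder.has_value() && inner(*holder);
  };
}

// A small inner matcher, similar in spirit to testing::Eq.
template <typename T>
auto eq(T expected) {
  return [expected](const auto &actual) { return actual == expected; };
}
```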
The explanations produced by this matcher have changed a bit, since now
we delegate to the nested matcher to print the value. Since these don't
put quotes around the value, I've changed our PrintTo methods to match.
Gadi Haber [Wed, 13 Dec 2017 09:13:53 +0000 (09:13 +0000)]
[X86][BMI]: Adding full coverage of MC encoding for the BMI isa set.<NFC>
NFC.
Adding MC regression tests to cover the BMI1 and BMI2 ISA sets in both 32-bit and 64-bit modes.
This patch is part of a larger task to cover the MC encoding of all X86 ISA sets,
started in revision: https://reviews.llvm.org/D39952
Alex Bradbury [Wed, 13 Dec 2017 09:02:13 +0000 (09:02 +0000)]
[cmake] Fix host tools build when LLVM_EXPERIMENTAL_TARGETS_TO_BUILD is set
r320413 triggered cmake configure failures when building with
-DLLVM_OPTIMIZED_TABLEGEN=True and with LLVM_EXPERIMENTAL_TARGETS_TO_BUILD set
(e.g. to RISCV). This is because that patch moved to passing through
LLVM_TARGETS_TO_BUILD, and at that point LLVM_EXPERIMENTAL_TARGETS_TO_BUILD
has been merged into it. LLVM_EXPERIMENTAL_TARGETS_TO_BUILD must also be
passed through to avoid errors like the one below:
-- Constructing LLVMBuild project information
CMake Error at CMakeLists.txt:682 (message):
The target `RISCV' does not exist.
Craig Topper [Wed, 13 Dec 2017 07:26:17 +0000 (07:26 +0000)]
[Targets] Don't automatically include the scheduler class enum from *GenInstrInfo.inc with GET_INSTRINFO_ENUM. Make targets request it separately.
Most of the targets don't need the scheduler class enum.
I have an X86 scheduler model change that causes some names in the enum to become about 18000 characters long. This is because using instregex in scheduler models causes the scheduler class to get named with every instruction that matches the regex concatenated together. MSVC has a limit of 4096 characters for an identifier name. Rather than trying to come up with a way to reduce the name length, I'm just going to sidestep the problem by not including the enum in X86.
This change makes XRay print the log file output only when the verbosity
level is higher than 0. It reduces the log spam in the default case when
we want XRay running silently, except when there are actual
fatal/serious errors.
We also update the documentation to show how to get the information
after the change to the default behaviour.
Serguei Katkov [Wed, 13 Dec 2017 05:32:46 +0000 (05:32 +0000)]
[NFC] Refactor SafepointIRVerifier
Now two classes are responsible for verification: one tracks GC
pointers and knows whether each pointer is relocated or not, and the other
uses that information to verify uses of GC pointers.
Mohammad Shahid [Wed, 13 Dec 2017 03:08:29 +0000 (03:08 +0000)]
[SLP] Vectorize jumbled memory loads.
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted due to a basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
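One way to picture the 'use mask', as a minimal sketch assuming each lane's load is described only by its element offset from a common base: if the offsets form a consecutive range in some permuted order, one vector load of that range plus a shuffle in which lane `i` takes element `offsets[i] - min` restores the original lane order. `jumbledLoadMask` is hypothetical and is not the SLP vectorizer's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the shuffle mask for a jumbled-but-consecutive set of loads, or
// std::nullopt if the offsets have a gap or a duplicate.
std::optional<std::vector<int>>
jumbledLoadMask(const std::vector<std::int64_t> &offsets) {
  if (offsets.empty())
    return std::nullopt;
  std::vector<std::int64_t> sorted = offsets;
  std::sort(sorted.begin(), sorted.end());
  for (std::size_t i = 1; i < sorted.size(); ++i)
    if (sorted[i] != sorted[i - 1] + 1)
      return std::nullopt;  // not a consecutive range
  std::vector<int> mask;
  for (std::int64_t off : offsets)
    mask.push_back(static_cast<int>(off - sorted.front()));
  return mask;
}
```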
Florian Hahn [Wed, 13 Dec 2017 03:05:20 +0000 (03:05 +0000)]
[CallSiteSplitting] Refactor creating callsites.
Summary:
This change makes the call site creation more general if any of the
arguments is predicated on a condition in the call site's predecessors.
If we find a call site that can potentially be split, we collect the set
of conditions for the call site's predecessors (currently only 2
predecessors are allowed). To do that, we traverse each predecessor's
predecessors as long as it only has single predecessors and record the
condition, if it is relevant to the call site. For each condition, we
also check if the condition is taken or not. In case it is not taken,
we record the inverse predicate.
We use the recorded conditions to create the new call sites and split
the basic block.
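The collection step can be sketched over a toy CFG. `Block`, `Cond`, and `collectConditions` are hypothetical; conditions are reduced to `value == constant` or `value != constant`, and the pass's check that a condition is relevant to the call site is omitted.

```cpp
#include <optional>
#include <string>
#include <vector>

// value == constant when isEq is true, value != constant otherwise.
struct Cond {
  std::string value;
  long constant;
  bool isEq;
};

struct Block {
  const Block *singlePred;       // nullptr when the block has 0 or >1 preds
  std::optional<Cond> edgeCond;  // branch condition on the edge from singlePred
  bool onTrueEdge;               // whether this block is the taken successor
};

Cond invert(Cond c) {
  c.isEq = !c.isEq;
  return c;
}

// Walks up single-predecessor chains from one of the call site's
// predecessors, recording each branch condition; a condition reached via
// its false edge is recorded inverted.
std::vector<Cond> collectConditions(const Block *pred) {
  std::vector<Cond> conds;
  for (const Block *b = pred; b != nullptr && b->singlePred != nullptr;
       b = b->singlePred) {
    if (!b->edgeCond)
      continue;
    conds.push_back(b->onTrueEdge ? *b->edgeCond : invert(*b->edgeCond));
  }
  return conds;
}
```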
This has 2 benefits: (1) it is slightly easier to see what is going on
(IMO) and (2) we can easily extend it to handle more complex control
flow.
Michael Trent [Tue, 12 Dec 2017 23:53:46 +0000 (23:53 +0000)]
Updated llvm-objdump to display local relocations in Mach-O binaries
Summary:
llvm-objdump's Mach-O parser was updated in r306037 to display external
relocations for MH_KEXT_BUNDLE file types. This change extends the Mach-O
parser to display local relocations for MH_PRELOAD files. When used with
the -macho option, relocations will be displayed in a historical format.
Alexey Bataev [Tue, 12 Dec 2017 20:28:46 +0000 (20:28 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with the new load of the new type.
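The added guard amounts to an all-of check over the load's users. A minimal sketch with a hypothetical `Opcode` enum in place of real LLVM instructions:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the opcode of each user of the load.
enum class Opcode { Load, Store, Add, Call };

// The rewrite is only attempted when every user of the load is a store;
// any other kind of user blocks the transform.
bool allUsersAreStores(const std::vector<Opcode> &userOpcodes) {
  return std::all_of(userOpcodes.begin(), userOpcodes.end(),
                     [](Opcode op) { return op == Opcode::Store; });
}
```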
Shuffle generation uses vmux to collapse vectors resulting from two
individual shuffles into one. The indexes of the elements selected
from the first operand were indicated by 0xFF in the constant vector
used in the compare instruction, but the compare (veqb) set the bits
corresponding to the 0x00 elements, thus inverting the selection.
Reverse the order of operands to vmux to get the correct output.
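The bug and the fix can be modelled with a scalar sketch, assuming `vmux(q, x, y)` takes lane `i` from `x` when `q[i]` is set and from `y` otherwise, and that the selector is compared against a zero vector. The 8-lane width and all names are illustrative, not Hexagon intrinsics.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8;  // illustrative width only
using Bytes = std::array<std::uint8_t, kLanes>;
using Pred = std::array<bool, kLanes>;

// Byte-wise equality compare, modelled on veqb.
Pred veqb(const Bytes &a, const Bytes &b) {
  Pred q{};
  for (std::size_t i = 0; i < kLanes; ++i)
    q[i] = a[i] == b[i];
  return q;
}

// Lane i comes from x when q[i] is set, otherwise from y.
Bytes vmux(const Pred &q, const Bytes &x, const Bytes &y) {
  Bytes r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = q[i] ? x[i] : y[i];
  return r;
}

// `sel` holds 0xFF for lanes taken from `first` and 0x00 for lanes taken
// from `second`. Comparing with zero sets the predicate on the 0x00 lanes,
// so `second` must be the operand chosen when the predicate is set.
Bytes combine(const Bytes &sel, const Bytes &first, const Bytes &second) {
  const Bytes zero{};
  return vmux(veqb(sel, zero), second, first);
}
```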
Sanjoy Das [Tue, 12 Dec 2017 19:11:31 +0000 (19:11 +0000)]
Reapply "[X86] Flag BroadWell scheduler model as complete"
This reverts commit r320508, in effect re-applying r320308. Simon has already
reverted the parts that caused the crash that motivated the revert in r320492.
Hiroshi Yamauchi [Tue, 12 Dec 2017 19:07:43 +0000 (19:07 +0000)]
Split IndirectBr critical edges before PGO gen/use passes.
Summary:
The PGO gen/use passes currently fail with an assert failure if there's a
critical edge whose source is an IndirectBr instruction and that edge
needs to be instrumented.
To avoid this in certain cases, split IndirectBr critical edges in the PGO
gen/use passes. This works for blocks with single indirectbr predecessors,
but not for those with multiple indirectbr predecessors (splitting an
IndirectBr critical edge isn't always possible).
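For reference, a critical edge is one whose source has more than one successor and whose destination has more than one predecessor, so no existing block can hold instrumentation for that edge alone. A sketch of the test over a hypothetical adjacency-list CFG:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CFG: cfg[b] lists the successors of block b.
using CFG = std::vector<std::vector<int>>;

// Counts incoming edges of `block` (duplicate edges counted separately).
std::size_t numPreds(const CFG &cfg, int block) {
  std::size_t n = 0;
  for (const auto &succs : cfg)
    for (int s : succs)
      if (s == block)
        ++n;
  return n;
}

bool isCriticalEdge(const CFG &cfg, int from, int to) {
  return cfg[static_cast<std::size_t>(from)].size() > 1 &&
         numPreds(cfg, to) > 1;
}
```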
Alexey Bataev [Tue, 12 Dec 2017 18:47:00 +0000 (18:47 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with the new load of the new type.
Craig Topper [Tue, 12 Dec 2017 18:39:04 +0000 (18:39 +0000)]
[X86] Add a couple TODOs about missing coverage/features motivated by D40335
D40335 wanted to add FMSUBADD support, but it discovered that there are two pieces of code to make FMADDSUB and only one of those is tested. So I've asked that review to implement just the one path until we get tests that cover the existing code.
Nirav Dave [Tue, 12 Dec 2017 18:25:48 +0000 (18:25 +0000)]
[X86] Cleanup type conversion of 64-bit load-store pairs.
Summary:
Simplify and generalize chain handling and search for 64-bit load-store pairs.
The nontemporal test now converts the 64-bit integer load-store into an f64 load-store, which it realizes directly instead of splitting it into two i32 pairs.
Geoff Berry [Tue, 12 Dec 2017 17:53:59 +0000 (17:53 +0000)]
[MachineOperand][MIR] Add isRenamable to MachineOperand.
Summary:
Add isRenamable() predicate to MachineOperand. This predicate can be
used by machine passes after register allocation to determine whether it
is safe to rename a given register operand. Register operands that
aren't marked as renamable may be required to be assigned their current
register to satisfy constraints that are not captured by the machine
IR (e.g. ABI or ISA constraints).
Alexey Bataev [Tue, 12 Dec 2017 17:19:15 +0000 (17:19 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with the new load of the new type.
Alexey Bataev [Tue, 12 Dec 2017 16:58:48 +0000 (16:58 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with the new load of the new type.
Simon Pilgrim [Tue, 12 Dec 2017 16:12:53 +0000 (16:12 +0000)]
[X86] Remove CompleteModel tags from CPU targets until we have better error checking (PR35636)
The checks we have for complete models are not great and miss many cases, e.g. in PR35636 it failed to recognise that only the first output (of 2) was actually tagged by the InstRW.
Alexey Bataev [Tue, 12 Dec 2017 15:54:49 +0000 (15:54 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
Summary:
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with the new load of the new type.
Alex Bradbury [Tue, 12 Dec 2017 15:46:15 +0000 (15:46 +0000)]
[RISCV] Implement assembler pseudo instructions for RV32I and RV64I
Adds the assembler pseudo instructions of RV32I and RV64I which can
be mapped to a single canonical instruction. The missing pseudo
instructions (e.g., call, tail, ...) are marked as TODO. Other
things, like for example PCREL_LO, have to be implemented first.
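A few of these single-instruction mappings, written as a hypothetical `expandIntPseudo` helper. The expansions follow the pseudo instruction table in the RISC-V user-level ISA manual; anything needing more than one instruction (such as `call` or `tail`) is left unhandled.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical expander for some RV32I/RV64I pseudos that map onto exactly
// one canonical instruction.
std::optional<std::string> expandIntPseudo(const std::string &mnemonic,
                                           const std::vector<std::string> &ops) {
  if (mnemonic == "nop" && ops.empty())
    return "addi x0, x0, 0";
  if (ops.size() != 2)
    return std::nullopt;
  const std::string &rd = ops[0], &rs = ops[1];
  if (mnemonic == "mv")
    return "addi " + rd + ", " + rs + ", 0";
  if (mnemonic == "not")
    return "xori " + rd + ", " + rs + ", -1";
  if (mnemonic == "neg")
    return "sub " + rd + ", x0, " + rs;
  if (mnemonic == "seqz")
    return "sltiu " + rd + ", " + rs + ", 1";
  if (mnemonic == "snez")
    return "sltu " + rd + ", x0, " + rs;
  return std::nullopt;  // not a single-instruction alias in this sketch
}
```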
Currently, alias emission is disabled by default to keep the patch
minimal. Alias emission by default will be enabled in a subsequent
patch which also updates all affected tests. Note that this patch
should actually break the floating point MC tests. However, the
used FileCheck configuration is not tight enough to detect the
breakage.
Alex Bradbury [Tue, 12 Dec 2017 15:17:45 +0000 (15:17 +0000)]
[RISCV] MC layer support for the instructions added in the privileged spec
Adds support for the instructions added in the RISC-V privileged ISA
(https://content.riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf):
uret, sret, mret, wfi, and sfence.vma.
Note from the committer: I made very minor formatting changes prior to commit,
which didn't seem worth creating another review round-trip for.
Alexey Bataev [Tue, 12 Dec 2017 15:03:17 +0000 (15:03 +0000)]
[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast.
If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1,
&V2)))), bitcast)`, but the load is used in other instructions, it leads
to looping in InstCombiner. The patch adds an additional check that all users
of the load instruction are stores and then replaces all uses of the load
instruction with the new load of the new type.
Ayman Musa [Tue, 12 Dec 2017 14:13:51 +0000 (14:13 +0000)]
[X86] Recognize constant arrays with special values and replace loads from it with subtract and shift instructions, which then will be replaced by X86 BZHI machine instruction.
Recognize constant arrays with the following values:
0x0, 0x1, 0x3, 0x7, 0xF, 0x1F, ..., 2^(size - 1) - 1
where //size// is the size of the array.
The result of a load with index //idx// from this array is equivalent to the result of the following:
(0xFFFFFFFF >> (sub 32, idx)) (assuming the array is of 32-bit integer type).
And the result of an 'AND' operation on the returned value of such a load and another input is exactly equivalent to the X86 BZHI instruction behavior.
See test cases in the LIT test for better understanding.
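A small sketch of the equivalence for 32-bit elements, comparing the table form with the shift form directly; `makeMaskTable`, `andWithTable`, and `andWithShift` are hypothetical helpers. The shift form needs care at `idx == 0`: a shift by 32 is undefined in C++ (and poison in LLVM IR), so the sketch treats that index separately.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// table[i] == 2^i - 1, i.e. 0x0, 0x1, 0x3, 0x7, ..., 0x7FFFFFFF.
constexpr std::array<std::uint32_t, 32> makeMaskTable() {
  std::array<std::uint32_t, 32> table{};
  for (std::uint32_t i = 0; i < 32; ++i)
    table[i] = (std::uint32_t{1} << i) - 1;
  return table;
}

// Original pattern: AND with a value loaded from the constant table.
std::uint32_t andWithTable(std::uint32_t x, std::uint32_t idx) {
  static constexpr std::array<std::uint32_t, 32> kTable = makeMaskTable();
  assert(idx < kTable.size());
  return x & kTable[idx];
}

// Shift form: 0xFFFFFFFF >> (32 - idx), with idx == 0 special-cased.
std::uint32_t andWithShift(std::uint32_t x, std::uint32_t idx) {
  assert(idx < 32);
  std::uint32_t mask = idx == 0 ? 0u : (0xFFFFFFFFu >> (32 - idx));
  return x & mask;
}
```

For idx from 0 to 31, both functions compute BZHI(x, idx), which clears every bit of x at position idx and above.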
Anna Thomas [Tue, 12 Dec 2017 14:12:33 +0000 (14:12 +0000)]
[InstCombineLoadStoreAlloca] Optimize stores to GEP off null base
Summary:
Currently, in InstCombineLoadStoreAlloca, we have simplification
rules for the following cases:
1. load off a null
2. load off a GEP with null base
3. store to a null
This patch adds support for the fourth case which is store into a
GEP with null base. Since this is UB as well (and directly analogous to
the load off a GEP with null base), we can substitute the stored value
with undef in instcombine, so that SimplifyCFG can optimize this code
into unreachable code.
Note: Right now, SimplifyCFG hasn't been taught about optimizing
this to unreachable and adding an llvm.trap (this is already done for
the above 3 cases).