Craig Topper [Thu, 14 Dec 2017 08:25:58 +0000 (08:25 +0000)]
[SelectionDAG][X86] Improve legalization of v32i1 CONCAT_VECTORS of v16i1 for AVX512F.
A v32i1 CONCAT_VECTORS of v16i1 uses promotion to v32i8 to legalize the v32i1. This results in a bunch of extract_vector_elts and a build_vector that ultimately gets scalarized.
This patch checks to see if v16i8 is legal and inserts a any_extend to that so that we can concat v16i8 to v32i8 and avoid creating the extracts.
Dorit Nuzman [Thu, 14 Dec 2017 07:56:31 +0000 (07:56 +0000)]
[LV] Support efficient vectorization of an induction with redundant casts
D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose
update chain involves casts; PSCEV can now build an AddRecurrence for some
forms of such phi nodes, under the proper runtime overflow test. This means
that we can identify such phi nodes as an induction, and the loop-vectorizer
can now vectorize such inductions, however inefficiently. The vectorizer
doesn't know that it can ignore the casts, and so it vectorizes them.
This patch records the casts in the InductionDescriptor, so that they could
be marked to be ignored for cost calculation (we use VecValuesToIgnore for
that) and ignored for vectorization/widening/scalarization (i.e. treated as
TriviallyDead).
In addition to marking all these casts to be ignored, we also need to make
sure that each cast is mapped to the right vector value in the vector loop body
(be it a widened, vectorized, or scalarized induction). So whenever an
induction phi is mapped to a vector value (during vectorization/widening/
scalarization), we also map the respective cast instruction (if exists) to that
vector value. (If the phi-update sequence of an induction involves more than one
cast, then the above mapping to vector value is relevant only for the last cast
of the sequence as we allow only the "last cast" to be used outside the
induction update chain itself).
Gadi Haber [Thu, 14 Dec 2017 07:26:08 +0000 (07:26 +0000)]
[X86][AES]: Adding full coverage of MC encoding for the AES and AVXAES isa sets.<NFC>
NFC.
Adding MC regressions tests to cover the AES and AVXAES ISA sets both 32 and 64 bit.
This patch is part of a larger task to cover MC encoding of all X86 ISA Sets.
started in revision: https://reviews.llvm.org/D39952
Craig Topper [Thu, 14 Dec 2017 06:49:07 +0000 (06:49 +0000)]
[SelectionDAG] When legalizing the result type of CONCAT_VECTORS, take into account whether the input type also needs to be promoted.
If so go ahead and get the promoted input vector to extract from. Previously, we would create a bunch of any_extends of extract_vector_elts with illegal input type that needs to be promoted. The legalization of those extract_vector_elts would then potentially introduce a truncate. So now we have a bunch of any_extends of truncates. By legalizing both parts together we avoid creating these extra nodes.
The test changes seem to be because we were previously combining the build_vector with the any_extend before the any_extend got combined with the truncate.
Matthias Braun [Thu, 14 Dec 2017 03:59:24 +0000 (03:59 +0000)]
MC/AsmPrinter: Reduce code duplication.
Factor out duplicated code emitting mach-o version-min specifiers.
This should be NFC but happens to fix a bug where the code in
MCMachoStreamer didn't take the version skew between darwin and macos
versions into account.
Matthias Braun [Thu, 14 Dec 2017 00:12:46 +0000 (00:12 +0000)]
MC: Add support for mach-o build_version
LC_BUILD_VERSION is a new load command superseding the previously used
LC_XXX_MIN_VERSION commands. This adds an assembler directive along with
encoding/streaming support.
Shoaib Meenai [Wed, 13 Dec 2017 23:38:12 +0000 (23:38 +0000)]
[cmake] Add support for case-sensitive Windows SDKs
When the Windows SDK is hosted on a case-sensitive filesystem (e.g. when
compiling on Linux and not using ciopfs), we can automatically generate
a VFS overlay for headers and symlinks for libraries.
Craig Topper [Wed, 13 Dec 2017 23:11:30 +0000 (23:11 +0000)]
Recommit r320461 "[X86] Use regular expressions more aggressively to reduce the number of scheduler entries needed for FMA3 instructions."
I've hopefully sidestepped the MSVC issue that caused it to be reverted. We no longer include the Sched enum from X86GenInstrInfo.inc on the X86 target. So hopefully MSVC's preprocessor will skip over it and nothing will notice the 11000 character enum name.
Original commit message:
When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models.
This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do.
Yaxun Liu [Wed, 13 Dec 2017 22:38:09 +0000 (22:38 +0000)]
CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value
Two issues were found about machine inst scheduler when compiling ProRender
with -g for amdgcn target:
GCNScheduleDAGMILive::schedule tries to update LiveIntervals for DBG_VALUE, which it
should not since DBG_VALUE is not mapped in LiveIntervals.
when DBG_VALUE is the last instruction of MBB, ScheduleDAGInstrs::buildSchedGraph and
ScheduleDAGMILive::scheduleMI does not move RPTracker properly, which causes assertion.
Zachary Turner [Wed, 13 Dec 2017 22:33:58 +0000 (22:33 +0000)]
[CodeView] Teach clang to emit the .debug$H COFF section.
Currently this is an LLVM extension to the COFF spec which is
experimental and intended to speed up linking. For now it is
behind a hidden cl::opt flag, but in the future we can move it
to a "real" cc1 flag and have the driver pass it through whenever
it is appropriate.
The patch to actually make use of this section in lld will come
in a followup.
Sanjay Patel [Wed, 13 Dec 2017 21:58:15 +0000 (21:58 +0000)]
[EarlyCSE] recognize commuted and swapped variants of min/max as equivalent (PR35642)
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=35642
...we can have different forms of min/max, so we should recognize those here in EarlyCSE
similar to how we already handle binops and compares that can commute.
Shoaib Meenai [Wed, 13 Dec 2017 21:12:37 +0000 (21:12 +0000)]
[cmake] Explicitly set VS 2017 compatibility
When cross-compiling using clang-cl 5.0 (which is currently the latest
stable release of the compiler), the default MS compatibility level is
set to VS 2013, which is too low to build LLVM. Explicitly set the
compatibility level to VS 2017 to support cross-compiling LLVM for
Windows using clang-cl 5.0. This will be a no-op when using clang-cl 6.0
and above, where the default MS compatibility level is already VS 2017.
Shoaib Meenai [Wed, 13 Dec 2017 21:11:14 +0000 (21:11 +0000)]
[cmake] Determine MSVC host triple correctly when cross-compiling
CMAKE_CL_64 will never be set when cross-compiling with clang-cl, since
CMake relies on an actual VS environment in order to determine it.
Instead, use the size of a void pointer to determine the bit width of
the host compiler (and therefore the host triple), which works for both
native and cross compilation.
Note that, with the impending advent of Windows on AArch64, assuming
that a 64-bit host == x86_64 isn't correct either, but that's something
to be addressed in a follow-up.
Matt Arsenault [Wed, 13 Dec 2017 21:07:51 +0000 (21:07 +0000)]
AMDGPU: Partially fix disassembly of MIMG instructions
Stores failed to decode at all since they didn't have a
DecoderNamespace set. Loads worked, but did not change
the register width displayed to match the numbmer of
enabled channels.
The number of printed registers for vaddr is still wrong,
but I don't think that's encoded in the instruction so
there's not much we can do about that.
Image atomics are still broken. MIMG is the same
encoding for SI/VI, but the image atomic classes
are split up into encoding specific versions unlike
every other MIMG instruction. They have isAsmParserOnly
set on them for some reason. dmask is also special for
these, so we probably should not have it as an explicit
operand as it is now.
Brian M. Rzycki [Wed, 13 Dec 2017 20:52:26 +0000 (20:52 +0000)]
[JumpThreading] Preservation of DT and LVI across the pass
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.
Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.
LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perfom the
preversation was minimally altered and was simply marked
as preserved for the PassManager to be informed.
This extends the analysis available to JumpThreading for
future enhancements. One example is loop boundary threading.
w.r.t. the paper
"A Practical Improvement to the Partial Redundancy Elimination in SSA Form"
(https://sites.google.com/site/jongsoopark/home/ssapre.pdf)
Proper dominance check was missing here, so having a loopinfo should not be required.
Committing this diff as this fixes the bug, if there are
further concerns, I'll be happy to work on them.
[dsymutil][test] Fix failing test when no lipo binary available
The invocation without -no-output would try to lipo the different debug
objects together. This wouldn't work on platforms that don't provide
that utility.
Threading was disabled in r317263 because it broke a test in combination
with `-DLLVM_ENABLE_THREADS=OFF`. This was because a ThreadPool warning
was piped to llvm-dwarfdump which was expecting to read an object from
stdin.
This patch re-enables threading and fixes the offending test.
Unfortunately this required more than just moving the ThreadPool out of
the for loop because of the TempFile refactoring that took place in the
meantime.
Nemanja Ivanovic [Wed, 13 Dec 2017 14:47:35 +0000 (14:47 +0000)]
[PowerPC] MachineSSA pass to reduce the number of CR-logical operations
The initial implementation of an MI SSA pass to reduce cr-logical operations.
Currently, the only operations handled by the pass are binary operations where
both CR-inputs come from the same block and the single use is a conditional
branch (also in the same block).
Committing this off by default to allow for a period of field testing. Will
enable it by default in a follow-up patch soon.
Alex Bradbury [Wed, 13 Dec 2017 12:46:55 +0000 (12:46 +0000)]
[RISCV] Define sfence.vma InstAliases to match the GNU RISC-V tools
Unfortunately these aren't defined explicitly in the privileged spec, but the
GNU assembler does accept `sfence.vma` and `sfence.vma rs` as well as the
usual `sfence.vma rs, rt`.
Alex Bradbury [Wed, 13 Dec 2017 11:37:19 +0000 (11:37 +0000)]
[RISCV] Implement floating point assembler pseudo instructions
Adds the assembler aliases for the floating point instructions
which can be mapped to a single canonical instruction. The missing
pseudo instructions (flw, fld, fsw, fsd) are marked as TODO. Other
things, like for example PCREL_LO, have to be implemented first.
[CodeGen] Print target index operands as target-index(target-specific) + 8 in both MIR and debug output
Work towards the unification of MIR and debug output by printing `target-index(target-specific) + 8` instead of `<ti#0+8>` and `target-index(target-specific) + 8` instead of `<ti#0-8>`.
Pavel Labath [Wed, 13 Dec 2017 10:00:38 +0000 (10:00 +0000)]
[Testing/Support] Make the HasValue matcher composable
Summary:
This makes it possible to run an arbitrary matcher on the value
contained within the Expected<T> object.
To do this, I've needed to fully spell out the matcher, instead of using
the shorthand MATCHER_P macro.
The slight gotcha here is that standard template deduction will fail if
one tries to match HasValue(47) against an Expected<int &> -- the
workaround is to use HasValue(testing::Eq(47)).
The explanations produced by this matcher have changed a bit, since now
we delegate to the nested matcher to print the value. Since these don't
put quotes around the value, I've changed our PrintTo methods to match.
Gadi Haber [Wed, 13 Dec 2017 09:13:53 +0000 (09:13 +0000)]
[X86][BMI]: Adding full coverage of MC encoding for the BMI isa set.<NFC>
NFC.
Adding MC regressions tests to cover the BMI1 and BMI2 ISA sets both 32 and 64 bit.
This patch is part of a larger task to cover MC encoding of all X86 ISA Sets.
started in revision: https://reviews.llvm.org/D39952
Alex Bradbury [Wed, 13 Dec 2017 09:02:13 +0000 (09:02 +0000)]
[cmake] Fix host tools build in when LLVM_EXPERIMENTAL_TARGETS_TO_BUILD is set
r320413 triggered cmake configure failures when building with
-DLLVM_OPTIMIZED_TABLEGEN=True and with LLVM_EXPERIMENTAL_TARGETS_TO_BUILD set
(e.g. to RISCV). This is because that patch moved to passing through
LLVM_TARGETS_TO_BUILD, and at that point LLVM_EXPERIMENTAL_TARGETS_TO_BUILD
has been merged in to it. LLVM_EXPERIMENTAL_TARGETS_TO_BUILD must be also be
passed through to avoid errors like below:
-- Constructing LLVMBuild project information
CMake Error at CMakeLists.txt:682 (message):
The target `RISCV' does not exist.
Craig Topper [Wed, 13 Dec 2017 07:26:17 +0000 (07:26 +0000)]
[Targets] Don't automatically include the scheduler class enum from *GenInstrInfo.inc with GET_INSTRINFO_ENUM. Make targets request is separately.
Most of the targets don't need the scheduler class enum.
I have an X86 scheduler model change that causes some names in the enum to become about 18000 characters long. This is because using instregex in scheduler models causes the scheduler class to get named with every instruction that matches the regex concatenated together. MSVC has a limit of 4096 characters for an identifier name. Rather than trying to come up with way to reduce the name length, I'm just going to sidestep the problem by not including the enum in X86.
This change makes XRay print the log file output only when the verbosity
level is higher than 0. It reduces the log spam in the default case when
we want XRay running silently, except when there are actual
fatal/serious errors.
We also update the documentation to show how to get the information
after the change to the default behaviour.
Serguei Katkov [Wed, 13 Dec 2017 05:32:46 +0000 (05:32 +0000)]
[NFC] Refactor SafepointIRVerifier
Now two classes are responsible for verification: one of them can track GC
pointers and know whether a pointer is relocated or not and another based on
that information can verify uses of GC pointers.
Mohammad Shahid [Wed, 13 Dec 2017 03:08:29 +0000 (03:08 +0000)]
[SLP] Vectorize jumbled memory loads.
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Florian Hahn [Wed, 13 Dec 2017 03:05:20 +0000 (03:05 +0000)]
[CallSiteSplitting] Refactor creating callsites.
Summary:
This change makes the call site creation more general if any of the
arguments is predicated on a condition in the call site's predecessors.
If we find a callsite, that potentially can be split, we collect the set
of conditions for the call site's predecessors (currently only 2
predecessors are allowed). To do that, we traverse each predecessor's
predecessors as long as it only has single predecessors and record the
condition, if it is relevant to the call site. For each condition, we
also check if the condition is taken or not. In case it is not taken,
we record the inverse predicate.
We use the recorded conditions to create the new call sites and split
the basic block.
This has 2 benefits: (1) it is slightly easier to see what is going on
(IMO) and (2) we can easily extend it to handle more complex control
flow.
Michael Trent [Tue, 12 Dec 2017 23:53:46 +0000 (23:53 +0000)]
Updated llvm-objdump to display local relocations in Mach-O binaries
Summary:
llvm-objdump's Mach-O parser was updated in r306037 to display external
relocations for MH_KEXT_BUNDLE file types. This change extends the Macho-O
parser to display local relocations for MH_PRELOAD files. When used with
the -macho option relocations will be displayed in a historical format.