Nemanja Ivanovic [Fri, 15 Dec 2017 01:38:03 +0000 (01:38 +0000)]
Disabling r312514 as it causes miscompiles that show up on bootstrap
The compare elimination peephole introduced in https://reviews.llvm.org/rL312514
causes a miscompile in AMDGPUInstrInfo.cpp which in turn causes some AMDGPU
test case failures in stage2 bootstrap testing. This miscompile didn't cause any
test case failures until https://reviews.llvm.org/rL320614, so it appeared as if
that patch caused these failures.
Disabling this transformation for now to bring the build bots back to green and
the author of the patch will investigate the miscompile.
Shoaib Meenai [Fri, 15 Dec 2017 01:05:48 +0000 (01:05 +0000)]
[cmake] Fix clang-cl cross-compilation on macOS
macOS paths usually start with /Users, which clang-cl interprets as a
macro undefine, leading to pretty much everything failing to compile.
CMake should be taught to put a -- in its compilation rules for clang-cl
(and I've been meaning to submit that upstream for a while). In the
meantime, however, and to support older CMake versions, we can just
create a custom make rules override to fix the compilation rules.
Craig Topper [Fri, 15 Dec 2017 01:03:45 +0000 (01:03 +0000)]
[SelectionDAG] Make getNode calls that take an ArrayRef of SDValue for operands call NewSDValueDbgMsg.
This makes it work better with some build_vector and concat_vectors creations.
Adjust the NewSDValueDbgMsg in getConstant to avoid duplicating the print when it calls getSplatBuildVector since getSplatBuildVector didn't trigger a print before.
Craig Topper [Fri, 15 Dec 2017 01:03:43 +0000 (01:03 +0000)]
[X86] Further rearrange the setOperationAction calls to separate the ones that require 512-bit registers OR VLX into separate sections. NFCI
We have several instructions that were introduced in AVX512F that are only available in 512-bit form on KNL. We still make use of them for 128/256 by artificially widening and extracting during isel.
This commit separates these operations from the true 512-bit operations. This way we can qualify the normal 512-bit operations with needing 512-bit register support. And these special operations will get qualified with needing 512-bit registers OR VLX.
The 512-bit register qualification will be introduced in a future patch this just gets everything grouped to minimize deltas on that patch.
Craig Topper [Fri, 15 Dec 2017 01:03:38 +0000 (01:03 +0000)]
[X86] Move some of the hasVLX qualified code out of the main hasAVX512 block in the X86ISelLowering constructor. NFCI
Move it into the separate hasVLX block later in the constructor.
I'm trying to separate 128/256 and 512-bit related code so we can eventually qualify the hasAVX512 block with support for 512-bit vectors required by the prefer-vector-width feature support being talked about in D41096.
Add support for properly handling PIC code with no-PLT. This equates to
`-fpic -fno-plt -O0` with the clang frontend. External functions are
marked with nonlazybind, which must then be indirected through the GOT.
This allows code to be built without optimizations in PIC mode without
going through the PLT. Addresses PR35653!
Zachary Turner [Fri, 15 Dec 2017 00:27:49 +0000 (00:27 +0000)]
Don't crash in llvm-pdbutil when dumping TypeIndexes with high bit set.
This is a special code that indicates that it's a function id.
While I'm still not certain how to interpret these, we definitely
should *not* be using these values as indices into an array directly.
For now, when we encounter one of these, just print the numeric value.
Sam Clegg [Fri, 15 Dec 2017 00:17:10 +0000 (00:17 +0000)]
[WebAssembly] Implement @llvm.global_ctors and @llvm.global_dtors
Summary:
- lowers @llvm.global_dtors by adding @llvm.global_ctors
functions which register the destructors with `__cxa_atexit`.
- impements @llvm.global_ctors with wasm start functions and linker metadata
See [here](https://github.com/WebAssembly/tool-conventions/issues/25) for more background.
Don Hinton [Fri, 15 Dec 2017 00:06:26 +0000 (00:06 +0000)]
[debuginfo] Remove obsolete test_debuginfo.pl that was moved to debuginfo-tests.
Summary:
Now that r320495, "[debuginfo-tests] Support moving
debuginfo-tests to llvm/projects," has landed, which includes a local
copy of test_debuginfo.pl, remove the obsolete copy.
[ProfileData] Use a different data structure to save memory.
This change swaps FunctionSamples to a std::map. This saves us around
17% of the memory required to parse sample profiles. To put hard numbers
on this, clang now eats around 1.3GB of RAM instead of 1.6GB while
parsing a 50MB profile.
The CPU time taken by a large profile merge (3.1GB of data across 226
files) is also reduced by ~11% by this patch (1:09.08 vs 1:01.11).
This was split out at the request of reviewers in D41152.
Adrian Prantl [Thu, 14 Dec 2017 22:55:06 +0000 (22:55 +0000)]
EmitFuncArgumentDbgValue: Prefer stack slots over registers for stack arguments
While investigating LLVM PR22316 (http://llvm.org/bugs/show_bug.cgi?id=22316)
I started wondering if it were not always preferable to emit the
initial DBG_VALUEs for stack arguments as FI locations instead of
describing the first register they get copied into. The advantage of
doing this is that the arguments will be available as soon as the
stack is setup. As illustrated by the testcase in the PR, the first
copy of the FI into a register may be sunk by MachineSink.cpp into a
later basic block. By describing the argument on the stack, we nicely
circumvent this problem.
Zachary Turner [Thu, 14 Dec 2017 22:07:03 +0000 (22:07 +0000)]
Fix many -Wsign-compare and -Wtautological-constant-compare warnings.
Most of the -Wsign-compare warnings are due to the fact that
enums are signed by default in the MS ABI, while the
tautological comparison warnings trigger on x86 builds where
sizeof(size_t) is 4 bytes, so N > numeric_limits<unsigned>::max()
is always false.
Sanjay Patel [Thu, 14 Dec 2017 22:05:20 +0000 (22:05 +0000)]
[SimplifyCFG] don't sink common insts too soon (PR34603)
This should solve:
https://bugs.llvm.org/show_bug.cgi?id=34603
...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run.
It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the
sinking transform later in the optimization pipeline.
Matt Arsenault [Thu, 14 Dec 2017 21:39:51 +0000 (21:39 +0000)]
DAG: Expose all MMO flags in getTgtMemIntrinsic
Rather than adding more bits to express every
MMO flag you could want, just directly use the
MMO flags. Also fixes using a bunch of bool arguments to
getMemIntrinsicNode.
On AMDGPU, buffer and image intrinsics should always
have MODereferencable set, but currently there is no
way to do that directly during the initial intrinsic
lowering.
Sam Clegg [Thu, 14 Dec 2017 21:10:03 +0000 (21:10 +0000)]
[WebAssembly] Add support for init functions linking metadata
Summary:
This change lays the groundwork lowering of @llvm.global_ctors
and @llvm.global_dtors for the wasm object format. Some parts
of this patch are subset of: https://reviews.llvm.org/D40759
See https://github.com/WebAssembly/tool-conventions/issues/25
Apparently this doesn't cover all the bases, so some compilers
and standard libraries still think this is not trivially copyable
even though it is. Reverting this back to an MSVC-only check for
now so that at least we have some coverage.
Guozhi Wei [Thu, 14 Dec 2017 19:35:43 +0000 (19:35 +0000)]
[SLPVectorizer] Don't ignore scalar extraction instructions of aggregate value
In SLPVectorizer, the vector build instructions (insertvalue for aggregate type) is passed to BoUpSLP.buildTree, it is treated as UserIgnoreList, so later in cost estimation, the cost of these instructions are not counted.
For aggregate value, later usage are more likely to be done in scalar registers, either used as individual scalars or used as a whole for function call or return value. Ignore scalar extraction instructions may cause too aggressive vectorization for aggregate values, and slow down performance. So for vectorization of aggregate value, the scalar extraction instructions are required in cost estimation.
Shoaib Meenai [Thu, 14 Dec 2017 18:41:49 +0000 (18:41 +0000)]
[cmake] Only attempt to install MSVC system libraries on Windows
Newer versions of CMake (I'm on 3.10, but I believe 3.9 behaves the same
way) attempt to query the system for information about the VS 2017
install. Unfortunately, this query fails on non-Windows systems:
cmake_host_system_information does not recognize <key> VS_15_DIR
CMake isn't going to find these system libraries on non-Windows anyway
(and we were previously silencing the resultant warnings in our
cross-compilation toolchain), so it makes sense to just omit the
attempted installation entirely on non-Windows.
Craig Topper [Thu, 14 Dec 2017 18:35:25 +0000 (18:35 +0000)]
[X86] Don't zero the upper bits of the k-register before extracting a single bit from a vXi1.
This doesn't match the semantics of the extract_vector_elt operation. Nothing downstream knows the bits were zeroed so they still get masked or sign extended after the extrat anyway.
Zachary Turner [Thu, 14 Dec 2017 18:07:04 +0000 (18:07 +0000)]
[COFF] Teach LLD to use the COFF .debug$H section.
This adds the /DEBUG:GHASH option to LLD which will look for
the existence of .debug$H sections in linker inputs and use them
to accelerate type merging. The clang-cl side has already been
added, so this completes the work necessary to begin experimenting
with this feature.
Gadi Haber [Thu, 14 Dec 2017 16:46:47 +0000 (16:46 +0000)]
[X86][AVX][AVX2]: Adding full coverage of MC encoding for the AVX, AVX2 isa set.<NFC>
NFC.
Adding MC regressions tests to cover the AVX and AVX2 ISA sets.
This patch is part of a larger task to cover MC encoding of all X86 ISA Sets.
See revision: https://reviews.llvm.org/D39952
Sander de Smalen [Thu, 14 Dec 2017 16:09:48 +0000 (16:09 +0000)]
Re-commit: [TableGen] AsmMatcher: Fix bug with reported diagnostic for operand.
Summary:
The generated diagnostic by the AsmMatcher isn't always applicable to the AsmOperand.
This is because the code will only update the diagnostic if it is more
specific than the previous diagnostic. However, when having validated
operands and 'moved on' to a next operand (for some instruction/alias for
which all previous operands are valid), if the diagnostic is InvalidOperand,
than that should be set as the diagnostic, not the more specific message
about a previous operand for some other instruction/alias candidate.
(Re-committed with an extra whitespace in SVEInstrFormats.td to trigger rebuild
of AArch64GenAsmMatcher.inc, since the llvm-clang-x86_64-expensive-checks-win
builder does not seem to rebuild AArch64GenAsmMatcher.inc with the
newly built TableGen due to a missing dependency somewhere (see:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119555.html))
Simon Dardis [Thu, 14 Dec 2017 14:55:25 +0000 (14:55 +0000)]
[mips] Add partial support for R6 in the long branch pass
MIPSR6 introduced several new jump instructions and deprecated
the use of the 'j' instruction. For microMIPS32R6, 'j' was removed
entirely and it only has non delay slot jumps.
This patch adds support for MIPSR6 by using some R6 instructions--
'bc' instead of 'j', 'jic $reg, 0' instead of 'jalr $zero, $reg'--
and modifies the sequences not to use delay slots for R6.
Bjorn Pettersson [Thu, 14 Dec 2017 14:47:52 +0000 (14:47 +0000)]
[ScalarEvolution] Fix base condition in isNormalAddRecPHI.
Summary:
The function is meant to recurse until it comes upon the
phi it's looking for. However, with the current condition,
it will recurse until it finds anything _but_ the phi.
The function will even fail for simple cases like:
%i = phi i32 [ %inc, %loop ], ...
...
%inc = add i32 %i, 1
because the base condition will not happen when the phi
is recursed to, and the recursion will end with a 'false'
result since the previous instruction is a phi.
Haicheng Wu [Thu, 14 Dec 2017 14:36:18 +0000 (14:36 +0000)]
[InlineCost] Tracking Values through PHI Nodes
This patch fix this FIXME in visitPHI()
FIXME: We should potentially be tracking values through phi nodes,
especially when they collapse to a single value due to deleted CFG edges
during inlining.
[AVX512] Adding support for load truncate store of I1
store operation on a truncated memory (load) of vXi1 is poorly supported by LLVM and most of the time end with an assertion.
This patch fixes this issue.
Fedor Sergeev [Thu, 14 Dec 2017 10:36:31 +0000 (10:36 +0000)]
[PM][InstCombine] fixing omission of AliasAnalysis in new-pass-manager's version of InstCombine
Summary:
Passing AliasAnalysis results instead of nullptr appears to work just fine.
A couple new-pass-manager tests updated to align with new order of analyses.
Sam Parker [Thu, 14 Dec 2017 09:31:01 +0000 (09:31 +0000)]
[DAGCombine] Move AND nodes to multiple load leaves
Recommitting rL319773, which was reverted due to a recursive issue
causing timeouts. This happened because I failed to check whether
the discovered loads could be narrowed further. In the case of a tree
with one or more narrow loads, that could not be further narrowed, as
well as a node that would need masking, an AND could be introduced
which could then be visited and recombined again with the same load.
This could again create the masking load, with would be combined
again... We now check that the load can be narrowed so that this
process stops.
Original commit message:
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.
Craig Topper [Thu, 14 Dec 2017 08:25:58 +0000 (08:25 +0000)]
[SelectionDAG][X86] Improve legalization of v32i1 CONCAT_VECTORS of v16i1 for AVX512F.
A v32i1 CONCAT_VECTORS of v16i1 uses promotion to v32i8 to legalize the v32i1. This results in a bunch of extract_vector_elts and a build_vector that ultimately gets scalarized.
This patch checks to see if v16i8 is legal and inserts a any_extend to that so that we can concat v16i8 to v32i8 and avoid creating the extracts.
Dorit Nuzman [Thu, 14 Dec 2017 07:56:31 +0000 (07:56 +0000)]
[LV] Support efficient vectorization of an induction with redundant casts
D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose
update chain involves casts; PSCEV can now build an AddRecurrence for some
forms of such phi nodes, under the proper runtime overflow test. This means
that we can identify such phi nodes as an induction, and the loop-vectorizer
can now vectorize such inductions, however inefficiently. The vectorizer
doesn't know that it can ignore the casts, and so it vectorizes them.
This patch records the casts in the InductionDescriptor, so that they could
be marked to be ignored for cost calculation (we use VecValuesToIgnore for
that) and ignored for vectorization/widening/scalarization (i.e. treated as
TriviallyDead).
In addition to marking all these casts to be ignored, we also need to make
sure that each cast is mapped to the right vector value in the vector loop body
(be it a widened, vectorized, or scalarized induction). So whenever an
induction phi is mapped to a vector value (during vectorization/widening/
scalarization), we also map the respective cast instruction (if exists) to that
vector value. (If the phi-update sequence of an induction involves more than one
cast, then the above mapping to vector value is relevant only for the last cast
of the sequence as we allow only the "last cast" to be used outside the
induction update chain itself).
Gadi Haber [Thu, 14 Dec 2017 07:26:08 +0000 (07:26 +0000)]
[X86][AES]: Adding full coverage of MC encoding for the AES and AVXAES isa sets.<NFC>
NFC.
Adding MC regressions tests to cover the AES and AVXAES ISA sets both 32 and 64 bit.
This patch is part of a larger task to cover MC encoding of all X86 ISA Sets.
started in revision: https://reviews.llvm.org/D39952
Craig Topper [Thu, 14 Dec 2017 06:49:07 +0000 (06:49 +0000)]
[SelectionDAG] When legalizing the result type of CONCAT_VECTORS, take into account whether the input type also needs to be promoted.
If so go ahead and get the promoted input vector to extract from. Previously, we would create a bunch of any_extends of extract_vector_elts with illegal input type that needs to be promoted. The legalization of those extract_vector_elts would then potentially introduce a truncate. So now we have a bunch of any_extends of truncates. By legalizing both parts together we avoid creating these extra nodes.
The test changes seem to be because we were previously combining the build_vector with the any_extend before the any_extend got combined with the truncate.
Matthias Braun [Thu, 14 Dec 2017 03:59:24 +0000 (03:59 +0000)]
MC/AsmPrinter: Reduce code duplication.
Factor out duplicated code emitting mach-o version-min specifiers.
This should be NFC but happens to fix a bug where the code in
MCMachoStreamer didn't take the version skew between darwin and macos
versions into account.
Matthias Braun [Thu, 14 Dec 2017 00:12:46 +0000 (00:12 +0000)]
MC: Add support for mach-o build_version
LC_BUILD_VERSION is a new load command superseding the previously used
LC_XXX_MIN_VERSION commands. This adds an assembler directive along with
encoding/streaming support.
Shoaib Meenai [Wed, 13 Dec 2017 23:38:12 +0000 (23:38 +0000)]
[cmake] Add support for case-sensitive Windows SDKs
When the Windows SDK is hosted on a case-sensitive filesystem (e.g. when
compiling on Linux and not using ciopfs), we can automatically generate
a VFS overlay for headers and symlinks for libraries.
Craig Topper [Wed, 13 Dec 2017 23:11:30 +0000 (23:11 +0000)]
Recommit r320461 "[X86] Use regular expressions more aggressively to reduce the number of scheduler entries needed for FMA3 instructions."
I've hopefully sidestepped the MSVC issue that caused it to be reverted. We no longer include the Sched enum from X86GenInstrInfo.inc on the X86 target. So hopefully MSVC's preprocessor will skip over it and nothing will notice the 11000 character enum name.
Original commit message:
When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models.
This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do.
Yaxun Liu [Wed, 13 Dec 2017 22:38:09 +0000 (22:38 +0000)]
CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value
Two issues were found about machine inst scheduler when compiling ProRender
with -g for amdgcn target:
GCNScheduleDAGMILive::schedule tries to update LiveIntervals for DBG_VALUE, which it
should not since DBG_VALUE is not mapped in LiveIntervals.
when DBG_VALUE is the last instruction of MBB, ScheduleDAGInstrs::buildSchedGraph and
ScheduleDAGMILive::scheduleMI does not move RPTracker properly, which causes assertion.
Zachary Turner [Wed, 13 Dec 2017 22:33:58 +0000 (22:33 +0000)]
[CodeView] Teach clang to emit the .debug$H COFF section.
Currently this is an LLVM extension to the COFF spec which is
experimental and intended to speed up linking. For now it is
behind a hidden cl::opt flag, but in the future we can move it
to a "real" cc1 flag and have the driver pass it through whenever
it is appropriate.
The patch to actually make use of this section in lld will come
in a followup.
Sanjay Patel [Wed, 13 Dec 2017 21:58:15 +0000 (21:58 +0000)]
[EarlyCSE] recognize commuted and swapped variants of min/max as equivalent (PR35642)
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=35642
...we can have different forms of min/max, so we should recognize those here in EarlyCSE
similar to how we already handle binops and compares that can commute.