Zachary Turner [Mon, 12 Jun 2017 21:46:51 +0000 (21:46 +0000)]
Slightly better fix for dealing with no-id-stream PDBs.
The last fix required the user to manually add the required
feature. This caused an LLD test to fail because I failed to
update LLD. In practice we can hide this logic so it can just
be transparently added when we write the PDB.
Zachary Turner [Mon, 12 Jun 2017 21:34:53 +0000 (21:34 +0000)]
[llvm-pdbdump] Don't fail on PDBs with no ID stream.
Older PDBs don't have this. Its presence is detected by using
the various "feature" flags that come at the end of the PDB
Stream. Detect this, and don't try to dump the ID stream if the
features tells us it's not present.
Anna Thomas [Mon, 12 Jun 2017 21:26:53 +0000 (21:26 +0000)]
[RS4GC] Drop invalid metadata after pointers are relocated
Summary:
After RS4GC, we should drop metadata that is no longer valid. These metadata
is used by optimizations scheduled after RS4GC, and can cause a miscompile.
One such metadata is invariant.load which is used by LICM sinking transform.
After rewriting statepoints, the address of a load maybe relocated. With
invariant.load metadata on a load instruction, LICM sinking assumes the
loaded value (from a dererenceable address) to be invariant, and
rematerializes the load operand and the load at the exit block.
This transforms the IR to have an unrelocated use of the
address after a statepoint, which is incorrect.
Other metadata we conservatively remove are related to
dereferenceability and noalias metadata.
This patch drops such metadata on store and load instructions after
rewriting statepoints.
[ADT] Reduce duplication between {Contextual,}FoldingSet; NFC
This is a precursor to another change (coming soon) that aims to make
FoldingSet's API more type-safe. Without this, the type-safety change
would just duplicate 4 more public methods between the already very
similar classes.
This renames FoldingSetImpl to FoldingSetBase so it's consistent with
the FooBase -> FooImpl<T> -> Foo<T> convention we seem to have with
other containers.
Zachary Turner [Mon, 12 Jun 2017 20:46:35 +0000 (20:46 +0000)]
Fix a null pointer dereference in llvm-pdbutil pretty.
Static data members were causing a problem because I mistakenly
assumed all members would affect a class's layout and so the
Layout member would be non-null.
Matthias Braun [Mon, 12 Jun 2017 20:30:52 +0000 (20:30 +0000)]
SplitKit: Fix partially live subreg splitting
Fix thinko/typo in subreg aware liverange splitting logic. I'm not sure
how to write a proper testcase for this. The original problem only
happens on an out-of-tree target. Forcing subreg enabled targets to
spill and split in a predictable way is near impossible.
Reid Kleckner [Mon, 12 Jun 2017 19:45:35 +0000 (19:45 +0000)]
[llvm-ar] Make llvm-lib behave more like the MSVC archiver
Summary:
Use the filepath used to open the archive member as the archive member
name instead of the file basename. This path might be absolute or
relative. This is important because the archive member name will show
up in the PDB, and we want our PDBs to look as much like MSVC's as
possible.
This also helps avoid an issue in our PDB module descriptor writing
code, which assumes that all module names are unique. Relative paths
still aren't guaranteed to be unique, but they're much better than
basenames, which definitely aren't unique.
Tony Jiang [Mon, 12 Jun 2017 18:24:36 +0000 (18:24 +0000)]
[PowerPC] Match vec_revb builtins to P9 instructions.
Power9 has instructions that will reverse the bytes within an element for all
sizes (half-word, word, double-word and quad-word). These can be used for the
vec_revb builtins in altivec.h. However, we implement these to match vector
shuffle nodes as that will cover both the builtins and vector shuffles that
occur in the SDAG through other means.
Tony Jiang [Mon, 12 Jun 2017 17:58:42 +0000 (17:58 +0000)]
[Power9] Added support for the modsw, moduw, modsd, modud hardware instructions.
Note that if we need the result of both the divide and the modulo then we
compute the modulo based on the result of the divide and not using the new
hardware instruction.
Commit on behalf of STEFAN PINTILIE.
Differential Revision: https://reviews.llvm.org/D33940
Sanjay Patel [Mon, 12 Jun 2017 17:44:30 +0000 (17:44 +0000)]
[utils] remove ability to generate llc check lines from update_test_checks.py
The dream of a unified check-line auto-generator for all phases of compilation is dead.
The llc script has already diverged to be better at its goal, so having 2 scripts that
do almost the same thing just causes confusion. Now, this script will only work with
opt to produce check lines for IR transforms.
Sanjay Patel [Mon, 12 Jun 2017 17:31:36 +0000 (17:31 +0000)]
[x86] regenerate checks with update_llc_test_checks.py
The dream of a unified check-line auto-generator for all phases of compilation is dead.
The llc script has already diverged to be better at its goal, so having 2 scripts that
do almost the same thing is just causing confusion.
We can rip out the llc ability in update_test_checks.py next and rename it, so it will
be clear that we have one script for llc check auto-generation and another for opt.
Geoff Berry [Mon, 12 Jun 2017 17:15:41 +0000 (17:15 +0000)]
[SelectionDAG] Allow sin/cos -> sincos optimization on GNU triples w/ just -fno-math-errno
Summary:
This change enables the sin(x) cos(x) -> sincos(x) optimization on GNU
target triples. This optimization was being inhibited when -ffast-math
wasn't set because sincos in GLibC does not set errno, while sin and cos
do. However, this optimization will only run if the attributes on the
sin/cos calls include readnone, which is how clang represents the fact
that it doesn't care about the errno values set by these functions (via
the -fno-math-errno flag).
Sanjay Patel [Mon, 12 Jun 2017 17:05:43 +0000 (17:05 +0000)]
[x86] regenerate checks with update_llc_test_checks.py
The dream of a unified check-line auto-generator for all phases of compilation is dead.
The llc script has already diverged to be better at its goal, so having 2 scripts that
do almost the same thing is just causing confusion for newcomers. I plan to fix up more
x86 tests in a next commit. We can rip out the llc ability in update_test_checks.py after
that.
Than McIntosh [Mon, 12 Jun 2017 14:56:02 +0000 (14:56 +0000)]
StackColoring: smarter check for slot overlap
Summary:
The old check for slot overlap treated 2 slots `S` and `T` as
overlapping if there existed a CFG node in which both of the slots could
possibly be active. That is overly conservative and caused stack blowups
in Rust programs. Instead, check whether there is a single CFG node in
which both of the slots are possibly active *together*.
Daniel Neilson [Mon, 12 Jun 2017 14:22:21 +0000 (14:22 +0000)]
Const correctness for TTI::getRegisterBitWidth
Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation.
Export the required symbol from DynamicLibraryTests
Running unittests/Support/DynamicLibrary/DynamicLibraryTests fails
when LLVM is configured with -DLLVM_EXPORT_SYMBOLS_FOR_PLUGINS=ON, because
the test's version script only contains symbols extracted from the static libraries,
that the test links with, but not those from the main object/executable itself.
The patch moves the one symbol, needed by the test, to a static library.
/public/pkgsrc-tmp/wip/lldb-netbsd/work/.buildlink/include/llvm/ADT/Triple.h:44:7: runtime error: load of value 32639, which is not a valid value for type 'SubArchType'
/public/pkgsrc-tmp/wip/lldb-netbsd/work/.buildlink/include/llvm/ADT/Triple.h:44:7: runtime error: load of value 3200171710, which is not a valid value for type 'SubArchType'
/public/pkgsrc-tmp/wip/lldb-netbsd/work/.buildlink/include/llvm/ADT/Triple.h:44:7: runtime error: load of value 3200171710, which is not a valid value for type 'SubArchType'
Correct this issue with initialization of SubArch() in the class Triple constructor.
Sanjay Patel [Sun, 11 Jun 2017 21:18:58 +0000 (21:18 +0000)]
[x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load
I was looking closer at the x86 test diffs in D33866, and the first change seems like it
shouldn't happen in the first place. So this patch will resolve that.
Using Agner's tables and AMD docs, vperm2f128 and vinsertf128 have identical timing for
any given CPU model, so we should be able to interchange those without affecting perf.
But as we can see in some of the diffs here, using vperm2f128 allows load folding, so
we should take that opportunity to reduce code size and register pressure.
A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128
was introduced with AVX1, we should be selecting it in all of the same situations that we
would with AVX2. If there's some reason that an AVX1 CPU would not want to use this
instruction, that should be fixed up in a later pass.
NAKAMURA Takumi [Sun, 11 Jun 2017 00:57:30 +0000 (00:57 +0000)]
TableGen.cmake: Try to fix build breakage introduce in r305142.
LLVM_TABLEGEN_TARGET is undefined in clang standalone build.
STREQUAL cannot omit LHS. Then I saw an error;
CMake Error at /path/to/install/llvm/lib/cmake/llvm/TableGen.cmake:40 (if):
if given arguments:
"STREQUAL" "/path/to/install/llvm/bin/llvm-tblgen.exe"
Unknown arguments specified
Davide Italiano [Sat, 10 Jun 2017 23:18:32 +0000 (23:18 +0000)]
[SmallVector] Reinstate the typedefs.
They're unused with recent versions of libstdc++ but older ones
(e.g. libstdc++ 4.9 still requires them). Maybe we should bump
the requirements on the minimum version to make GCC 7 happy, but
in the meanwhile we need to live with the warning.
Brian Gesiak [Sat, 10 Jun 2017 21:33:27 +0000 (21:33 +0000)]
[opt-viewer] Include default values in help output
Summary:
Python's argparse module includes a `%(default)s` format specifier that
can be used to print the default value of an option in its help text.
Use this for opt-viewer utilities' `--jobs` arguments.
Geoff Berry [Sat, 10 Jun 2017 15:20:03 +0000 (15:20 +0000)]
[EarlyCSE] Add option to use MemorySSA for function simplification run of EarlyCSE (off by default).
Summary:
Use MemorySSA for memory dependency checking in the EarlyCSE pass at the
start of the function simplification portion of the pipeline. We rely
on the fact that GVNHoist runs just after this pass of EarlyCSE to
amortize the MemorySSA construction cost since GVNHoist uses MemorySSA
and EarlyCSE preserves it.
This is turned off by default. A follow-up change will turn it on to
allow for easier reversion in case it breaks something.
Galina Kistanova [Sat, 10 Jun 2017 07:48:49 +0000 (07:48 +0000)]
Added dependency on the TableGen executable file.
For the case when LLVM_OPTIMIZED_TABLEGEN is ON (enables LLVM_USE_HOST_TOOLS),
we need both _TABLEGEN_TARGET and _TABLEGEN_EXE in the DEPENDS list
to have .inc files rebuilt on a tablegen change, as cmake does not propagate
file-level dependencies of custom targets.
We could always have just one dependency on both the target and
the file, but the 2 cases would produce cleaner cmake files.
Craig Topper [Sat, 10 Jun 2017 06:58:24 +0000 (06:58 +0000)]
[IR] Delete operator new(size_t, unsigned) for ShuffleVector making it consistent with other instructions that declare another operator new with a different signature. NFC
Sanjay Patel [Fri, 9 Jun 2017 23:01:05 +0000 (23:01 +0000)]
[CGP] add a reference to DataLayout in MemCmpExpansion; NFCI
We're currently passing endian-ness around as a param (and not uniformly),
so this eliminates the need for that. I'd like to add a constant fold
call too, and that requires a DL.
[AArch64] Add fallback in FastISel fp16 conversions
Summary:
- Fix assertion failures on F16 to/from int types in FastISel by falling
back to regular ISel
- Add a testcase of various conversion cases with FastISel (-O0)
Yaxun Liu [Fri, 9 Jun 2017 20:46:29 +0000 (20:46 +0000)]
[SROA] Fix APInt size when load/store have different address space
Currently there is a bug in SROA::presplitLoadsAndStores which causes assertion in
GEPOperator::accumulateConstantOffset.
Basically it does not consider the situation that the pointer operand of load or store
may be in a non-zero address space and its size may be different from the size of
a pointer in address space 0.
This patch fixes assertion when compiling Blender Cycles kernels for amdgpu backend.
Francis Ricci [Fri, 9 Jun 2017 20:31:53 +0000 (20:31 +0000)]
[ADT] Make iterable SmallVector template overrides more specific
Summary:
This prevents the iterator overrides from being selected in
the case where non-iterator types are used as arguments, which
is of particular importance in cases where other overrides with
identical types exist.
Keno Fischer [Fri, 9 Jun 2017 19:31:10 +0000 (19:31 +0000)]
[Sink] Fix predicate in legality check
Summary:
isSafeToSpeculativelyExecute is the wrong predicate to use here.
All that checks for is whether it is safe to hoist a value due to
unaligned/un-dereferencable accesses. However, not only are we doing
sinking rather than hoisting, our concern is that the location
we're loading from may have been modified. Instead forbid sinking
any load across a critical edge.
Zachary Turner [Fri, 9 Jun 2017 17:54:36 +0000 (17:54 +0000)]
Allow VarStreamArray to use stateful extractors.
Previously extractors tried to be stateless with any additional
context information needed in order to parse items being passed
in via the extraction method. This led to quite cumbersome
implementation challenges and awkwardness of use. This patch
brings back support for stateful extractors, making the
implementation and usage simpler.
Craig Topper [Fri, 9 Jun 2017 16:16:20 +0000 (16:16 +0000)]
[LazyValueInfo] Don't run the more complex predicate handling code for EQ and NE in getPredicateResult
Summary:
Unless I'm mistaken, the special handling for EQ/NE should cover everything and there is no reason to fallthrough to the more complex code. For that matter I'm not sure there's any reason to special case EQ/NE other than avoiding creating temporary ConstantRanges.
This patch moves the complex code into an else so we only do it when we are handling a predicate other than EQ/NE.
Zvi Rackover [Fri, 9 Jun 2017 14:53:45 +0000 (14:53 +0000)]
SelectionDAG: Remove deleted nodes from legalized set to avoid clash with newly created nodes
Summary:
During DAG legalization loop in SelectionDAG::Legalize(),
bookkeeping of the SDNodes that were already legalized is implemented
with SmallPtrSet (LegalizedNodes). This kind of set stores only pointers
to objects, not the objects themselves. Unfortunately, if SDNode is
deleted during legalization for some reason, LegalizedNodes set is not
informed about this fact. This wouldn’t be so bad, if SelectionDAG wouldn’t reuse
space deallocated after deletion of unused nodes, for creation of new
ones. Because of this, new nodes, created during legalization often can
have pointers identical to ones that have been previously legalized,
added to the LegalizedNodes set, and deleted afterwards. This in turn
causes, that newly created nodes, sharing the same pointer as deleted
old ones, are present in LegalizedNodes *already at the moment of
creation*, so we never call Legalize on them.
The fix facilitates the fact, that DAG notifies listeners about each
modification. I have registered DAGNodeDeletedListener inside
SelectionDAG::Legalize, with a callback function that removes any
pointer of any deleted SDNode from the LegalizedNodes set. With this
modification, LegalizeNodes set does not contain pointers to nodes that
were deleted, so newly created nodes can always be inserted to it, even
if they share pointers with old deleted nodes.
Patch by pawel.szczerbuk@intel.com
The issue this patch addresses causes failures in an out-of-tree target,
and i was not able to create a reproducer for an in-tree target, hence
there is no test-case.
Simon Dardis [Fri, 9 Jun 2017 14:37:08 +0000 (14:37 +0000)]
Reland "[SelectionDAG] Enable target specific vector scalarization of calls and returns"
By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown,
backends can request that LLVM to scalarize vector types for calls
and returns.
The MIPS vector ABI requires that vector arguments and returns are passed in
integer registers. With SelectionDAG's new hooks, the MIPS backend can now
handle LLVM-IR with vector types in calls and returns. E.g.
'call @foo(<4 x i32> %4)'.
Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for
calls and returns if vector types were not legal. If vector types were legal,
a single 128bit vector argument would be assigned to a single 32 bit / 64 bit
integer register.
By teaching the MIPS backend to inspect the original types, it can now
implement the MIPS vector ABI which requires a particular method of
scalarizing vectors.
Previously, the MIPS backend relied on clang to scalarize types such as "call
@foo(<4 x float> %a) into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3,
i32 inreg %4)".
This patch enables the MIPS backend to take either form for vector types.
The previous version of this patch had a "conditional move or jump depends on
uninitialized value".
Javed Absar [Fri, 9 Jun 2017 14:07:21 +0000 (14:07 +0000)]
[ARM] Custom machine-scheduler. NFCI.
This patch creates a customised machine-scheduler for ARM targets,
so that subsequently DAG mutations etc can be added.
Reviewed by: hahn, rengolin, rovka.
Differential Revision: https://reviews.llvm.org/D34039
Nirav Dave [Fri, 9 Jun 2017 14:04:03 +0000 (14:04 +0000)]
[MC] Fix compiler crash in AsmParser::Lex
When an empty comment is present in an assembly file, the compiler will crash because it checks the first character for '\n' or '\r'.
The fix consists of also checking if the string is empty before accessing the *front* method of the StringRef.
A test is included for the x86 target, but this issue is reproducible with other targets as well.
Serge Rogatch [Fri, 9 Jun 2017 13:23:23 +0000 (13:23 +0000)]
[XRay] Fix computation of function size subject to XRay threshold
Summary:
Currently XRay compares its threshold against `Function::size()` . However, `Function::size()` returns the number of basic blocks (as I understand, such as cycle bodies, if/else bodies, switch-case bodies, etc.), rather than the number of instructions.
The name of the parameter `-fxray-instruction-threshold=N`, as well as XRay documentation at http://llvm.org/docs/XRay.html , suggests that instructions should be counted, rather than the number of basic blocks.
I see two options:
1. Count the number of MachineInstr`s in MachineFunction : this gives better estimate for the number of assembly instructions on the target. So a user can check in disassembly that the threshold works more or less correctly.
2. Count the number of Instruction`s in a Function : AFAIK, this gives correct number of IR instructions, which the user can check in IR listing. However, this number may be far (several times for small functions) from the number of assembly instructions finally emitted.
Option 1 is implemented in this patch because I think that having the closer estimate for the number of assembly instructions emitted is more important than to have a clear definition of the metric.
Nirav Dave [Fri, 9 Jun 2017 12:57:35 +0000 (12:57 +0000)]
Prevent RemoveDeadNodes from deleted already deleted node.
This prevents against assertion errors like PR32659 which occur from a
replacement deleting a node after it's been added to the list argument
of RemoveDeadNodes. The specific failure from PR32659 does not
currently happen, but it is still potentially possible. The underlying
cause is that the callers of the change dfunction builds up a list of
nodes to delete after having moved their uses and it possible that a
move of a later node will cause a previously deleted nodes to be
deleted.
Oliver Stannard [Fri, 9 Jun 2017 09:19:09 +0000 (09:19 +0000)]
[ARM] Add scheduling info for VFMS
The scalar VFMS instructions did not have scheduling information attached (but
VFMA did), which was causing assertion failures with the Cortex-A57 scheduling
model and -fp-contract=fast.