Matt Arsenault [Wed, 26 Oct 2016 14:53:50 +0000 (14:53 +0000)]
Fix nondeterministic output in local stack slot alloc pass
This finds all of the references to a frame index in a function, and
sorts by the offset. If multiple instructions use the same offset,
nothing was breaking the tie for sorting.
This avoids the test failures the reverted r282999 introduced.
Victor Leschuk [Wed, 26 Oct 2016 11:59:03 +0000 (11:59 +0000)]
DebugInfo: support for DWARFv5 DW_AT_alignment attribute
* Assume that clang passes non-zero alignment value to DIBuilder
only in case when it was forced by C++11 'alignas', C11 '_Alignas'
or compiler attribute '__attribute__((aligned (N)))'.
* Emit DW_AT_alignment if alignment is specified for type/object.
Andrea Di Biagio [Wed, 26 Oct 2016 10:28:32 +0000 (10:28 +0000)]
[IndVarSimplify][DebugLoc] When widening the exit loop condition, correctly reuse the debug location of the original comparison.
When the loop exit condition is canonicalized as a != compaison, reuse the
debug location of the original (non canonical) comparison.
Before this patch, the debug location of the new icmp was obtained from the
loop latch terminator. This patch fixes the issue by correctly setting the
IRBuilder's "current debug location" to the location of the original compare.
Victor Leschuk [Wed, 26 Oct 2016 08:55:27 +0000 (08:55 +0000)]
DebugInfo: support for DWARFv5 DW_AT_alignment attribute
* Assume that clang passes non-zero alignment value to DIBuilder
only in case when it was forced by C++11 'alignas', C11 '_Alignas'
or compiler attribute '__attribute__((aligned (N)))'.
* Emit DW_AT_alignment if alignment is specified for type/object.
Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions.
[XRay] Implement `llvm-xray extract`, start of the llvm-xray tool
Usage:
llvm-xray extract <object file> [-o <filename or '-'>]
The tool gets the XRay instrumentation map from an object file and turns
it into YAML. We first support ELF64 sleds on x86_64 binaries, with
provision for supporting other supported platforms and formats later.
This is the first of a many-part change to fully implement the
`llvm-xray` tool.
We also define a subcommand registration and dispatch mechanism to be
used by other further subcommand implementations for llvm-xray.
[XRay] Implement `llvm-xray extract`, start of the llvm-xray tool
Usage:
llvm-xray extract <object file> [-o <filename or '-'>]
The tool gets the XRay instrumentation map from an object file and turns
it into YAML. We first support ELF64 sleds on x86_64 binaries, with
provision for supporting other supported platforms and formats later.
This is the first of a many-part change to fully implement the
`llvm-xray` tool.
We also define a subcommand registration and dispatch mechanism to be
used by other further subcommand implementations for llvm-xray.
James Y Knight [Tue, 25 Oct 2016 22:13:28 +0000 (22:13 +0000)]
[Sparc] Don't overlap variable-sized allocas with other stack variables.
On SparcV8, it was previously the case that a variable-sized alloca
might overlap by 4-bytes the last fixed stack variable, effectively
because 92 (the number of bytes reserved for the register spill area) !=
96 (the offset added to SP for where to start a DYNAMIC_STACKALLOC).
It's not as simple as changing 96 to 92, because variables that should
be 8-byte aligned would then be misaligned.
For now, simply increase the allocation size by 8 bytes for each dynamic
allocation -- wastes space, but at least doesn't overlap. As the large
comment says, doing this more efficiently will require larger changes in
llvm.
Also adds some test cases showing that we continue to not support
dynamic stack allocation and over-alignment in the same function.
Bob Haarman [Tue, 25 Oct 2016 22:11:52 +0000 (22:11 +0000)]
[codeview] support emitting indirect virtual base class information
Summary:
Fixes PR28281.
MSVC lists indirect virtual base classes in the field list of a class,
using LF_IVBCLASS records. This change makes LLVM emit such records
when processing DW_TAG_inheritance tags with the DIFlagVirtual and
(newly introduced) DIFlagIndirect tags.
Rong Xu [Tue, 25 Oct 2016 21:47:24 +0000 (21:47 +0000)]
[PGO] Fix select instruction annotation
Summary:
Select instruction annotation in IR PGO uses the edge count to infer the
branch count. It's currently placed in setInstrumentedCounts() where
no all the BB counts have been computed. This leads to wrong branch weights.
Move the annotation after all BB counts are populated.
Lang Hames [Tue, 25 Oct 2016 21:19:30 +0000 (21:19 +0000)]
[docs] Add more Error documentation to the Programmer's Manual.
This patch updates some of the existing Error examples, expands on the
documentation for handleErrors, and includes new sections that cover
a number of helpful utilities and common error usage idioms.
Guozhi Wei [Tue, 25 Oct 2016 20:43:42 +0000 (20:43 +0000)]
[InstCombine] Resubmit the combine of A->B->A BitCast and fix for pr27996
The original patch of the A->B->A BitCast optimization was reverted by r274094 because it may cause infinite loop inside compiler https://llvm.org/bugs/show_bug.cgi?id=27996.
The problem is with following code
xB = load (type B);
xA = load (type A);
+yA = (A)xB; B -> A
+zAn = PHI[yA, xA]; PHI
+zBn = (B)zAn; // A -> B
store zAn;
store zBn;
optimizeBitCastFromPhi generates
+zBn = (B)zAn; // A -> B
and expects it will be combined with the following store instruction to another
store zAn
Unfortunately before combineStoreToValueType is called on the store instruction, optimizeBitCastFromPhi is called on the new BitCast again, and this pattern repeats indefinitely.
optimizeBitCastFromPhi only generates BitCast for load/store instructions, only the BitCast before store can cause the reexecution of optimizeBitCastFromPhi, and BitCast before store can easily be handled by InstCombineLoadStoreAlloca.cpp. So the solution to the problem is if all users of a CI are store instructions, we should not do optimizeBitCastFromPhi on it. Then optimizeBitCastFromPhi will not be called on the new BitCast instructions.
Robert Lougher [Tue, 25 Oct 2016 20:17:58 +0000 (20:17 +0000)]
revert: "Remove debug location from common tail when tail-merging"
This reverts r285093, as it caused unexpected buildbot failures on
clang-ppc64le-linux, clang-ppc64be-linux, clang-ppc64be-linux-multistage
and clang-ppc64be-linux-lnt. Failing test ubsan/TestCases/TypeCheck/vptr.cpp.
Tim Shen [Tue, 25 Oct 2016 19:55:59 +0000 (19:55 +0000)]
[APFloat] Make APFloat an interface class to the internal IEEEFloat. NFC.
Summary:
The intention is to make APFloat an interface class, so that later I can add a second implementation class DoubleAPFloat to correctly implement PPCDoubleDouble semantic. The interface of IEEEFloat is not public, and can be simplified (currently it's exactly the same as the old APFloat), but that belongs to a separate patch.
DoubleAPFloat should look like:
class DoubleAPFloat {
const fltSemantics *Semantics;
std::unique_ptr<APFloat> APFloats; // Two heap-allocated APFloats.
};
There is no functional change, nor public interface change.
Evandro Menezes [Tue, 25 Oct 2016 19:53:51 +0000 (19:53 +0000)]
Add option to specify minimum number of entries for jump tables
Add an option to allow easier experimentation by target maintainers with the
minimum number of entries to create jump tables. Also clarify the name of
the other existing option governing the creation of jump tables.
Evandro Menezes [Tue, 25 Oct 2016 19:11:43 +0000 (19:11 +0000)]
Switch lowering: improve partitioning of jump tables
When there's a tie between partitionings of jump tables, consider also cases
that result in no jump tables, but in one or a few cases. The motivation is
that many contemporary processors typically perform case switches fairly
quickly.
Matthew Simpson [Tue, 25 Oct 2016 18:59:45 +0000 (18:59 +0000)]
[LV] Sink scalar operands of predicated instructions
When we predicate an instruction (div, rem, store) we place the instruction in
its own basic block within the vectorized loop. If a predicated instruction has
scalar operands, it's possible to recursively sink these scalar expressions
into the predicated block so that they might avoid execution. This patch sinks
as much scalar computation as possible into predicated blocks. We previously
were able to sink such operands only if they were extractelement instructions.
Michael Ilseman [Tue, 25 Oct 2016 18:44:13 +0000 (18:44 +0000)]
Add -strip-nonlinetable-debuginfo capability
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.
The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches. For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.
The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.
Thanks to Adrian Prantl for stewarding this patch!
Robert Lougher [Tue, 25 Oct 2016 18:44:07 +0000 (18:44 +0000)]
Remove debug location from common tail when tail-merging
The branch folding pass tail merges blocks into a common-tail. However, the
tail retains the debug information from one of the original inputs to the
merge (chosen randomly). This is a problem for sampled-based PGO, as hits
on the common-tail will be attributed to whichever block was chosen,
irrespective of which path was actually taken to the common-tail.
This patch fixes the issue by nulling the debug location for the common-tail.
Nico Weber [Tue, 25 Oct 2016 17:46:29 +0000 (17:46 +0000)]
Revert 285087.
The sanitizer-windows bot turned red with:
FAILED: utils/TableGen/CMakeFiles/obj.llvm-tblgen.dir/IntrinsicEmitter.cpp.obj
C:\PROGRA~2\MICROS~1.0\VC\bin\AMD64_~2\cl.exe ... -c
C:\...\llvm\utils\TableGen\IntrinsicEmitter.cpp
c:\...\llvm\utils\tablegen\intrinsicemitter.cpp(254) :
fatal error C1001: An internal error has occurred in the compiler.
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/114/steps/build%20clang%20lld/logs/stdio
Andrea Di Biagio [Tue, 25 Oct 2016 16:45:17 +0000 (16:45 +0000)]
[IndVarSimplify][Dwarf] When widening the IV increment, correctly set the debug loc.
When indvars widened an induction variable, the debug location for the loop
increment computation was incorrectly set equal to the debug loc of the loop
latch terminator.
This patch fixes the issue by propagating the correct location from the
original loop increment instruction to the new widened increment.
Geoff Berry [Tue, 25 Oct 2016 16:18:47 +0000 (16:18 +0000)]
[EarlyCSE] Make MemorySSA memory dependency check more aggressive.
Now that MemorySSA keeps track of whether MemoryUses are optimized, use
getClobberingMemoryAccess() to check MemoryUse memory dependencies since
it should no longer be so expensive.
This is a follow-up change to https://reviews.llvm.org/D25881
Ulrich Weigand [Tue, 25 Oct 2016 15:39:15 +0000 (15:39 +0000)]
[SystemZ] Do not use LOC(G) for volatile loads
It is not safe to use LOAD ON CONDITION to implement access to a memory
location marked "volatile", since the architecture leaves it unspecified
whether or not an access happens if the condition is false.
The current code already appears to care about that:
def LOC : CondUnaryRSY<"loc", 0xEBF2, nonvolatile_load, GR32, 4>;
Unfortunately, that "nonvolatile_load" operator is simply ignored
by the CondUnaryRSY class, and there was no test to catch it.
Simon Pilgrim [Tue, 25 Oct 2016 14:29:25 +0000 (14:29 +0000)]
[X86][SSE] Add support for (V)PMOVSX* constant folding
We already have (V)PMOVZX* combining support, this is the beginning of handling (V)PMOVSX* similarly - other combines in combineVSZext can be generalized in future patches.
This unearthed an interesting bug in that we were generating illegal build vectors on 32-bit targets - it was proving difficult to create a test for it from PMOVZX, but it fired immediately with PMOVSX. I've created a more general form of the existing getConstVector to handle these cases - ideally this should be handled in non-target-specific code but I couldn't find an equivalent.
Keeping the shuffle allows lowering the constant build_vector to a materialized
constant vector (such as a vector-load from the constant-pool or some other idiom).
This patch implements an API as close to that as possible. The
implementation on top of the current IRObjectFile is a bit hackish,
but should map just fine over a symbol table and is very convenient to
use.
Craig Topper [Tue, 25 Oct 2016 04:00:29 +0000 (04:00 +0000)]
[AVX-512] Add support for creating SIGN_EXTEND_VECTOR_INREG and ZERO_EXTEND_VECTOR_INREG for 512-bit vectors to support vpmovzxbq and vpmovsxbq.
Summary: The one tricky thing about this is that the sign/zero_extend_inreg uses v64i8 as an input type which isn't legal without BWI support. Though the vpmovsxbq and vpmovzxbq instructions themselves don't require BWI. To support this we need to add custom lowering for ZERO_EXTEND_VECTOR_INREG with v64i8 input. This can mostly reuse the existing sign extend code with a couple checks for sign extend vs zero extend added.
Matthias Braun [Tue, 25 Oct 2016 02:55:17 +0000 (02:55 +0000)]
MachineInstrBundle: Pass iterators to getBundle(Start|End); NFC
This is a function to go backwards in a block to find the first
instruction in a bundle, so iterator is a more natural choice for
parameter/return rather than a reference to a MachineInstruction.
Vedant Kumar [Tue, 25 Oct 2016 00:08:33 +0000 (00:08 +0000)]
[llvm-cov] Do not print out the filename of the object file
When we load coverage data from multiple objects, we don't have a way to
attribute a source object to a function record. Printing out the object
filename next to the source filename is already not very useful: soon,
it'll actually become misleading. Stop printing out the filename now.
Dan Gohman [Mon, 24 Oct 2016 23:27:49 +0000 (23:27 +0000)]
[WebAssembly] Implement more WebAssembly binary encoding.
This changes locals from being declared by the emitLocal hook in
WebAssemblyTargetStreamer, rather than with an instruction. After exploring
the infastructure in LLVM more, this seems to make more sense since
declaring locals doesn't use an encoded opcode.
This also adds more 0xd opcodes, type encodings, and miscellaneous
binary encoding bits.
Matthias Braun [Mon, 24 Oct 2016 23:23:02 +0000 (23:23 +0000)]
CodeGen/Passes: Pass MachineFunction as functor arg; NFC
Passing a MachineFunction as argument is more natural and avoids an
unnecessary round-trip through the logic determining the correct
Subtarget because MachineFunction already has a reference anyway.
Justin Bogner [Mon, 24 Oct 2016 21:58:58 +0000 (21:58 +0000)]
cmake: Rename installhdrs to install-llvm-headers and fix the dependencies
The installhdrs target was inconsistently named and would behave
differently depending on whether or not you ran a build first. This
renames it to install-llvm-headers to match other target names and
adds a dependency on intrinsics_gen so that it will always install the
same set of things.
Eli Friedman [Mon, 24 Oct 2016 21:47:44 +0000 (21:47 +0000)]
Fix regression from my recent GlobalsAA fix.
There are two fixes here: one, AnalyzeUsesOfPointer can't return
false until it has checked all the uses of the pointer. Two, if a
global uses another global, we have to assume the address of the
first global escapes.