Sam Parker [Thu, 27 Oct 2016 09:47:10 +0000 (09:47 +0000)]
[ARM] Predicate UMAAL selection on hasDSP.
UMAAL is a DSP instruction and it is not available on thumbv7m
(Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong
CHECK prefix in longMAC.ll test.
Nicolai Haehnle [Thu, 27 Oct 2016 08:15:07 +0000 (08:15 +0000)]
AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due register dependencies
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.
The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.
Justin Bogner [Wed, 26 Oct 2016 22:37:52 +0000 (22:37 +0000)]
llvm-objdump: Make some error messages more consistent
Most of the version of report_error were quoting the filename and
printing a colon between the file name and the error message, but this
one wasn't doing either of those. Fix the output to be more
consistent.
Vedant Kumar [Wed, 26 Oct 2016 22:07:39 +0000 (22:07 +0000)]
[utils] Add a '--unified-report' option to the code coverage prep script
In --unified-report mode, a single coverage report is prepared for all
specified binaries and written to *report_dir*. This mode is compatible
with all existing script options, including the --restrict mode which is
used to limit coverage reporting to certain files or directories.
This should not break any existing users of the script.
Vedant Kumar [Wed, 26 Oct 2016 22:07:35 +0000 (22:07 +0000)]
[utils] Add an '--only-merge' option to the code coverage prep script
In --only-merge mode, the script terminates after the profile merging
step. This makes the script less stateful: it's more natural to split
the merge out into a separate step instead of relying on the first
invocation of the script to do it.
This should not break any existing users of the script.
Evandro Menezes [Wed, 26 Oct 2016 22:06:20 +0000 (22:06 +0000)]
[AArch64] Create feature set for Samsung Exynos-M2
Since Exynos-M2 improved the FP square root unit a bit over the one in
Exynos-M1, it does not benefit from using the Newton series for such
operations.
Ehsan Amiri [Wed, 26 Oct 2016 20:16:59 +0000 (20:16 +0000)]
[PPC] Remove testcase from incorrect directory
During my last commit this testcase was put in an incorrect directory. Removing
it. Will put it in the right directory when I can verify everything is correct.
Tim Northover [Wed, 26 Oct 2016 20:01:00 +0000 (20:01 +0000)]
ARM: don't rely on push/pop reglists being in order when folding SP adjust.
It would be a very nice invariant to rely on, but unfortunately it doesn't
necessarily hold (and the causes of mis-sorted reglists appear to be quite
varied) so to be robust the frame lowering code can't assume that the first
register in the list is also the first one that actually gets pushed.
Should fix an issue where we were turning something like:
Nemanja Ivanovic [Wed, 26 Oct 2016 19:51:35 +0000 (19:51 +0000)]
Do not assume that FP vector operands are never legalized by expanding
This patch ensures that if a floating point vector operand is legalized by
expanding, it is legalized through the stack rather than by calling
DAGTypeLegalizer::IntegerToVector which will cause a failure since the operand
is a non-integer type.
Sanjoy Das [Wed, 26 Oct 2016 19:18:43 +0000 (19:18 +0000)]
Simplify `x >=u x >> y` and `x >=u x udiv y`
Summary:
Extends InstSimplify to handle both `x >=u x >> y` and `x >=u x udiv y`.
This is a folloup of rL258422 and
https://github.com/rust-lang/rust/pull/30917 where llvm failed to
optimize away the bounds checking in a binary search.
Dan Gohman [Wed, 26 Oct 2016 17:44:09 +0000 (17:44 +0000)]
[WebAssembly] Update the README.txt.
Update the README.txt with newer information, add a link to the Emscripten
page explaining the current easiest way to use the LLVM wasm backend, and
mention that other ways of using the LLVM wasm backend are in development.
Robert Lougher [Wed, 26 Oct 2016 17:01:47 +0000 (17:01 +0000)]
Reapply: "Remove debug location from common tail when tail-merging"
This reapplies revision 285093. Original commit message:
The branch folding pass tail merges blocks into a common-tail. However, the
tail retains the debug information from one of the original inputs to the
merge (chosen randomly). This is a problem for sampled-based PGO, as hits
on the common-tail will be attributed to whichever block was chosen,
irrespective of which path was actually taken to the common-tail.
This patch fixes the issue by nulling the debug location for the common-tail.
Dehao Chen [Wed, 26 Oct 2016 15:48:45 +0000 (15:48 +0000)]
Introduce updateDiscriminator interface to DILocation to make it cleaner assigning discriminators.
Summary: This patch introduces updateDiscriminator to DILocation so that it can be directly called by AddDiscriminator. It also makes it easier to update the discriminator later.
Matt Arsenault [Wed, 26 Oct 2016 14:53:50 +0000 (14:53 +0000)]
Fix nondeterministic output in local stack slot alloc pass
This finds all of the references to a frame index in a function, and
sorts by the offset. If multiple instructions use the same offset,
nothing was breaking the tie for sorting.
This avoids the test failures the reverted r282999 introduced.
Victor Leschuk [Wed, 26 Oct 2016 11:59:03 +0000 (11:59 +0000)]
DebugInfo: support for DWARFv5 DW_AT_alignment attribute
* Assume that clang passes non-zero alignment value to DIBuilder
only in case when it was forced by C++11 'alignas', C11 '_Alignas'
or compiler attribute '__attribute__((aligned (N)))'.
* Emit DW_AT_alignment if alignment is specified for type/object.
Andrea Di Biagio [Wed, 26 Oct 2016 10:28:32 +0000 (10:28 +0000)]
[IndVarSimplify][DebugLoc] When widening the exit loop condition, correctly reuse the debug location of the original comparison.
When the loop exit condition is canonicalized as a != compaison, reuse the
debug location of the original (non canonical) comparison.
Before this patch, the debug location of the new icmp was obtained from the
loop latch terminator. This patch fixes the issue by correctly setting the
IRBuilder's "current debug location" to the location of the original compare.
Victor Leschuk [Wed, 26 Oct 2016 08:55:27 +0000 (08:55 +0000)]
DebugInfo: support for DWARFv5 DW_AT_alignment attribute
* Assume that clang passes non-zero alignment value to DIBuilder
only in case when it was forced by C++11 'alignas', C11 '_Alignas'
or compiler attribute '__attribute__((aligned (N)))'.
* Emit DW_AT_alignment if alignment is specified for type/object.
Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions.
[XRay] Implement `llvm-xray extract`, start of the llvm-xray tool
Usage:
llvm-xray extract <object file> [-o <filename or '-'>]
The tool gets the XRay instrumentation map from an object file and turns
it into YAML. We first support ELF64 sleds on x86_64 binaries, with
provision for supporting other supported platforms and formats later.
This is the first of a many-part change to fully implement the
`llvm-xray` tool.
We also define a subcommand registration and dispatch mechanism to be
used by other further subcommand implementations for llvm-xray.
[XRay] Implement `llvm-xray extract`, start of the llvm-xray tool
Usage:
llvm-xray extract <object file> [-o <filename or '-'>]
The tool gets the XRay instrumentation map from an object file and turns
it into YAML. We first support ELF64 sleds on x86_64 binaries, with
provision for supporting other supported platforms and formats later.
This is the first of a many-part change to fully implement the
`llvm-xray` tool.
We also define a subcommand registration and dispatch mechanism to be
used by other further subcommand implementations for llvm-xray.
James Y Knight [Tue, 25 Oct 2016 22:13:28 +0000 (22:13 +0000)]
[Sparc] Don't overlap variable-sized allocas with other stack variables.
On SparcV8, it was previously the case that a variable-sized alloca
might overlap by 4-bytes the last fixed stack variable, effectively
because 92 (the number of bytes reserved for the register spill area) !=
96 (the offset added to SP for where to start a DYNAMIC_STACKALLOC).
It's not as simple as changing 96 to 92, because variables that should
be 8-byte aligned would then be misaligned.
For now, simply increase the allocation size by 8 bytes for each dynamic
allocation -- wastes space, but at least doesn't overlap. As the large
comment says, doing this more efficiently will require larger changes in
llvm.
Also adds some test cases showing that we continue to not support
dynamic stack allocation and over-alignment in the same function.
Bob Haarman [Tue, 25 Oct 2016 22:11:52 +0000 (22:11 +0000)]
[codeview] support emitting indirect virtual base class information
Summary:
Fixes PR28281.
MSVC lists indirect virtual base classes in the field list of a class,
using LF_IVBCLASS records. This change makes LLVM emit such records
when processing DW_TAG_inheritance tags with the DIFlagVirtual and
(newly introduced) DIFlagIndirect tags.
Rong Xu [Tue, 25 Oct 2016 21:47:24 +0000 (21:47 +0000)]
[PGO] Fix select instruction annotation
Summary:
Select instruction annotation in IR PGO uses the edge count to infer the
branch count. It's currently placed in setInstrumentedCounts() where
no all the BB counts have been computed. This leads to wrong branch weights.
Move the annotation after all BB counts are populated.
Lang Hames [Tue, 25 Oct 2016 21:19:30 +0000 (21:19 +0000)]
[docs] Add more Error documentation to the Programmer's Manual.
This patch updates some of the existing Error examples, expands on the
documentation for handleErrors, and includes new sections that cover
a number of helpful utilities and common error usage idioms.
Guozhi Wei [Tue, 25 Oct 2016 20:43:42 +0000 (20:43 +0000)]
[InstCombine] Resubmit the combine of A->B->A BitCast and fix for pr27996
The original patch of the A->B->A BitCast optimization was reverted by r274094 because it may cause infinite loop inside compiler https://llvm.org/bugs/show_bug.cgi?id=27996.
The problem is with following code
xB = load (type B);
xA = load (type A);
+yA = (A)xB; B -> A
+zAn = PHI[yA, xA]; PHI
+zBn = (B)zAn; // A -> B
store zAn;
store zBn;
optimizeBitCastFromPhi generates
+zBn = (B)zAn; // A -> B
and expects it will be combined with the following store instruction to another
store zAn
Unfortunately before combineStoreToValueType is called on the store instruction, optimizeBitCastFromPhi is called on the new BitCast again, and this pattern repeats indefinitely.
optimizeBitCastFromPhi only generates BitCast for load/store instructions, only the BitCast before store can cause the reexecution of optimizeBitCastFromPhi, and BitCast before store can easily be handled by InstCombineLoadStoreAlloca.cpp. So the solution to the problem is if all users of a CI are store instructions, we should not do optimizeBitCastFromPhi on it. Then optimizeBitCastFromPhi will not be called on the new BitCast instructions.
Robert Lougher [Tue, 25 Oct 2016 20:17:58 +0000 (20:17 +0000)]
revert: "Remove debug location from common tail when tail-merging"
This reverts r285093, as it caused unexpected buildbot failures on
clang-ppc64le-linux, clang-ppc64be-linux, clang-ppc64be-linux-multistage
and clang-ppc64be-linux-lnt. Failing test ubsan/TestCases/TypeCheck/vptr.cpp.
Tim Shen [Tue, 25 Oct 2016 19:55:59 +0000 (19:55 +0000)]
[APFloat] Make APFloat an interface class to the internal IEEEFloat. NFC.
Summary:
The intention is to make APFloat an interface class, so that later I can add a second implementation class DoubleAPFloat to correctly implement PPCDoubleDouble semantic. The interface of IEEEFloat is not public, and can be simplified (currently it's exactly the same as the old APFloat), but that belongs to a separate patch.
DoubleAPFloat should look like:
class DoubleAPFloat {
const fltSemantics *Semantics;
std::unique_ptr<APFloat> APFloats; // Two heap-allocated APFloats.
};
There is no functional change, nor public interface change.
Evandro Menezes [Tue, 25 Oct 2016 19:53:51 +0000 (19:53 +0000)]
Add option to specify minimum number of entries for jump tables
Add an option to allow easier experimentation by target maintainers with the
minimum number of entries to create jump tables. Also clarify the name of
the other existing option governing the creation of jump tables.
Evandro Menezes [Tue, 25 Oct 2016 19:11:43 +0000 (19:11 +0000)]
Switch lowering: improve partitioning of jump tables
When there's a tie between partitionings of jump tables, consider also cases
that result in no jump tables, but in one or a few cases. The motivation is
that many contemporary processors typically perform case switches fairly
quickly.
Matthew Simpson [Tue, 25 Oct 2016 18:59:45 +0000 (18:59 +0000)]
[LV] Sink scalar operands of predicated instructions
When we predicate an instruction (div, rem, store) we place the instruction in
its own basic block within the vectorized loop. If a predicated instruction has
scalar operands, it's possible to recursively sink these scalar expressions
into the predicated block so that they might avoid execution. This patch sinks
as much scalar computation as possible into predicated blocks. We previously
were able to sink such operands only if they were extractelement instructions.
Michael Ilseman [Tue, 25 Oct 2016 18:44:13 +0000 (18:44 +0000)]
Add -strip-nonlinetable-debuginfo capability
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.
The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches. For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.
The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.
Thanks to Adrian Prantl for stewarding this patch!
Robert Lougher [Tue, 25 Oct 2016 18:44:07 +0000 (18:44 +0000)]
Remove debug location from common tail when tail-merging
The branch folding pass tail merges blocks into a common-tail. However, the
tail retains the debug information from one of the original inputs to the
merge (chosen randomly). This is a problem for sampled-based PGO, as hits
on the common-tail will be attributed to whichever block was chosen,
irrespective of which path was actually taken to the common-tail.
This patch fixes the issue by nulling the debug location for the common-tail.