[mips] Do not allow -opt-bisect-limit to skip the PIC call optimization pass.
r282428 added the MipsOptimizePICCall as an opt-in pass that can be
skipped when using the -opt-bisect-limit option. However, this pass is
needed because it generates code that conforms to the o32 ABI
specification by using the $t9 register for PIC calls with JALR
instructions.
This bug was exposed by the fact that skipFunction() also checks for
the "optnone" attribute. This caused functions with that attribute to
break the requirements of the o32 ABI.
Bjorn Pettersson [Thu, 27 Oct 2016 14:48:09 +0000 (14:48 +0000)]
Fix memory issue in AttrBuilder::removeAttribute uses.
Summary:
Found when running Valgrind.
This removes two unnecessary assignments when using
AttrBuilder::removeAttribute.
AttrBuilder::removeAttribute returns a reference to the object.
As the LHSes were the same as the callees, the assignments
resulted in memcpy calls where dst = src.
Simon Pilgrim [Thu, 27 Oct 2016 14:29:28 +0000 (14:29 +0000)]
[DAGCombiner] Add vector demanded elements support to computeKnownBits
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.
The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.
I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.
DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.
Alexey Bataev [Thu, 27 Oct 2016 12:02:28 +0000 (12:02 +0000)]
[SLP] Fix for PR30626: Compiler crash inside SLP Vectorizer.
After successfull horizontal reduction vectorization attempt for PHI node
vectorizer tries to update root binary op by combining vectorized tree
and the ReductionPHI node. But during vectorization this ReductionPHI
can be vectorized itself and replaced by the `undef` value, while the
instruction itself is marked for deletion. This 'marked for deletion'
PHI node then can be used in new binary operation, causing "Use still
stuck around after Def is destroyed" crash upon PHI node deletion.
Also the test is fixed to make it perform actual testing.
George Rimar [Thu, 27 Oct 2016 11:50:04 +0000 (11:50 +0000)]
[Object/ELF] - Fixed behavior when SectionHeaderTable->sh_size is too large.
Elf.h already has code checking that section table does not go past end of file.
Problem is that this check may not work on values greater than UINT64_MAX / Header->e_shentsize
because of calculation overflow.
Sam Parker [Thu, 27 Oct 2016 09:47:10 +0000 (09:47 +0000)]
[ARM] Predicate UMAAL selection on hasDSP.
UMAAL is a DSP instruction and it is not available on thumbv7m
(Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong
CHECK prefix in longMAC.ll test.
Nicolai Haehnle [Thu, 27 Oct 2016 08:15:07 +0000 (08:15 +0000)]
AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due register dependencies
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.
The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.
Justin Bogner [Wed, 26 Oct 2016 22:37:52 +0000 (22:37 +0000)]
llvm-objdump: Make some error messages more consistent
Most of the version of report_error were quoting the filename and
printing a colon between the file name and the error message, but this
one wasn't doing either of those. Fix the output to be more
consistent.
Vedant Kumar [Wed, 26 Oct 2016 22:07:39 +0000 (22:07 +0000)]
[utils] Add a '--unified-report' option to the code coverage prep script
In --unified-report mode, a single coverage report is prepared for all
specified binaries and written to *report_dir*. This mode is compatible
with all existing script options, including the --restrict mode which is
used to limit coverage reporting to certain files or directories.
This should not break any existing users of the script.
Vedant Kumar [Wed, 26 Oct 2016 22:07:35 +0000 (22:07 +0000)]
[utils] Add an '--only-merge' option to the code coverage prep script
In --only-merge mode, the script terminates after the profile merging
step. This makes the script less stateful: it's more natural to split
the merge out into a separate step instead of relying on the first
invocation of the script to do it.
This should not break any existing users of the script.
Evandro Menezes [Wed, 26 Oct 2016 22:06:20 +0000 (22:06 +0000)]
[AArch64] Create feature set for Samsung Exynos-M2
Since Exynos-M2 improved the FP square root unit a bit over the one in
Exynos-M1, it does not benefit from using the Newton series for such
operations.
Ehsan Amiri [Wed, 26 Oct 2016 20:16:59 +0000 (20:16 +0000)]
[PPC] Remove testcase from incorrect directory
During my last commit this testcase was put in an incorrect directory. Removing
it. Will put it in the right directory when I can verify everything is correct.
Tim Northover [Wed, 26 Oct 2016 20:01:00 +0000 (20:01 +0000)]
ARM: don't rely on push/pop reglists being in order when folding SP adjust.
It would be a very nice invariant to rely on, but unfortunately it doesn't
necessarily hold (and the causes of mis-sorted reglists appear to be quite
varied) so to be robust the frame lowering code can't assume that the first
register in the list is also the first one that actually gets pushed.
Should fix an issue where we were turning something like:
Nemanja Ivanovic [Wed, 26 Oct 2016 19:51:35 +0000 (19:51 +0000)]
Do not assume that FP vector operands are never legalized by expanding
This patch ensures that if a floating point vector operand is legalized by
expanding, it is legalized through the stack rather than by calling
DAGTypeLegalizer::IntegerToVector which will cause a failure since the operand
is a non-integer type.
Sanjoy Das [Wed, 26 Oct 2016 19:18:43 +0000 (19:18 +0000)]
Simplify `x >=u x >> y` and `x >=u x udiv y`
Summary:
Extends InstSimplify to handle both `x >=u x >> y` and `x >=u x udiv y`.
This is a folloup of rL258422 and
https://github.com/rust-lang/rust/pull/30917 where llvm failed to
optimize away the bounds checking in a binary search.
Dan Gohman [Wed, 26 Oct 2016 17:44:09 +0000 (17:44 +0000)]
[WebAssembly] Update the README.txt.
Update the README.txt with newer information, add a link to the Emscripten
page explaining the current easiest way to use the LLVM wasm backend, and
mention that other ways of using the LLVM wasm backend are in development.
Robert Lougher [Wed, 26 Oct 2016 17:01:47 +0000 (17:01 +0000)]
Reapply: "Remove debug location from common tail when tail-merging"
This reapplies revision 285093. Original commit message:
The branch folding pass tail merges blocks into a common-tail. However, the
tail retains the debug information from one of the original inputs to the
merge (chosen randomly). This is a problem for sampled-based PGO, as hits
on the common-tail will be attributed to whichever block was chosen,
irrespective of which path was actually taken to the common-tail.
This patch fixes the issue by nulling the debug location for the common-tail.
Dehao Chen [Wed, 26 Oct 2016 15:48:45 +0000 (15:48 +0000)]
Introduce updateDiscriminator interface to DILocation to make it cleaner assigning discriminators.
Summary: This patch introduces updateDiscriminator to DILocation so that it can be directly called by AddDiscriminator. It also makes it easier to update the discriminator later.
Matt Arsenault [Wed, 26 Oct 2016 14:53:50 +0000 (14:53 +0000)]
Fix nondeterministic output in local stack slot alloc pass
This finds all of the references to a frame index in a function, and
sorts by the offset. If multiple instructions use the same offset,
nothing was breaking the tie for sorting.
This avoids the test failures the reverted r282999 introduced.
Victor Leschuk [Wed, 26 Oct 2016 11:59:03 +0000 (11:59 +0000)]
DebugInfo: support for DWARFv5 DW_AT_alignment attribute
* Assume that clang passes non-zero alignment value to DIBuilder
only in case when it was forced by C++11 'alignas', C11 '_Alignas'
or compiler attribute '__attribute__((aligned (N)))'.
* Emit DW_AT_alignment if alignment is specified for type/object.
Andrea Di Biagio [Wed, 26 Oct 2016 10:28:32 +0000 (10:28 +0000)]
[IndVarSimplify][DebugLoc] When widening the exit loop condition, correctly reuse the debug location of the original comparison.
When the loop exit condition is canonicalized as a != compaison, reuse the
debug location of the original (non canonical) comparison.
Before this patch, the debug location of the new icmp was obtained from the
loop latch terminator. This patch fixes the issue by correctly setting the
IRBuilder's "current debug location" to the location of the original compare.
Victor Leschuk [Wed, 26 Oct 2016 08:55:27 +0000 (08:55 +0000)]
DebugInfo: support for DWARFv5 DW_AT_alignment attribute
* Assume that clang passes non-zero alignment value to DIBuilder
only in case when it was forced by C++11 'alignas', C11 '_Alignas'
or compiler attribute '__attribute__((aligned (N)))'.
* Emit DW_AT_alignment if alignment is specified for type/object.
Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions.
[XRay] Implement `llvm-xray extract`, start of the llvm-xray tool
Usage:
llvm-xray extract <object file> [-o <filename or '-'>]
The tool gets the XRay instrumentation map from an object file and turns
it into YAML. We first support ELF64 sleds on x86_64 binaries, with
provision for supporting other supported platforms and formats later.
This is the first of a many-part change to fully implement the
`llvm-xray` tool.
We also define a subcommand registration and dispatch mechanism to be
used by other further subcommand implementations for llvm-xray.
[XRay] Implement `llvm-xray extract`, start of the llvm-xray tool
Usage:
llvm-xray extract <object file> [-o <filename or '-'>]
The tool gets the XRay instrumentation map from an object file and turns
it into YAML. We first support ELF64 sleds on x86_64 binaries, with
provision for supporting other supported platforms and formats later.
This is the first of a many-part change to fully implement the
`llvm-xray` tool.
We also define a subcommand registration and dispatch mechanism to be
used by other further subcommand implementations for llvm-xray.
James Y Knight [Tue, 25 Oct 2016 22:13:28 +0000 (22:13 +0000)]
[Sparc] Don't overlap variable-sized allocas with other stack variables.
On SparcV8, it was previously the case that a variable-sized alloca
might overlap by 4-bytes the last fixed stack variable, effectively
because 92 (the number of bytes reserved for the register spill area) !=
96 (the offset added to SP for where to start a DYNAMIC_STACKALLOC).
It's not as simple as changing 96 to 92, because variables that should
be 8-byte aligned would then be misaligned.
For now, simply increase the allocation size by 8 bytes for each dynamic
allocation -- wastes space, but at least doesn't overlap. As the large
comment says, doing this more efficiently will require larger changes in
llvm.
Also adds some test cases showing that we continue to not support
dynamic stack allocation and over-alignment in the same function.
Bob Haarman [Tue, 25 Oct 2016 22:11:52 +0000 (22:11 +0000)]
[codeview] support emitting indirect virtual base class information
Summary:
Fixes PR28281.
MSVC lists indirect virtual base classes in the field list of a class,
using LF_IVBCLASS records. This change makes LLVM emit such records
when processing DW_TAG_inheritance tags with the DIFlagVirtual and
(newly introduced) DIFlagIndirect tags.
Rong Xu [Tue, 25 Oct 2016 21:47:24 +0000 (21:47 +0000)]
[PGO] Fix select instruction annotation
Summary:
Select instruction annotation in IR PGO uses the edge count to infer the
branch count. It's currently placed in setInstrumentedCounts() where
no all the BB counts have been computed. This leads to wrong branch weights.
Move the annotation after all BB counts are populated.
Lang Hames [Tue, 25 Oct 2016 21:19:30 +0000 (21:19 +0000)]
[docs] Add more Error documentation to the Programmer's Manual.
This patch updates some of the existing Error examples, expands on the
documentation for handleErrors, and includes new sections that cover
a number of helpful utilities and common error usage idioms.
Guozhi Wei [Tue, 25 Oct 2016 20:43:42 +0000 (20:43 +0000)]
[InstCombine] Resubmit the combine of A->B->A BitCast and fix for pr27996
The original patch of the A->B->A BitCast optimization was reverted by r274094 because it may cause infinite loop inside compiler https://llvm.org/bugs/show_bug.cgi?id=27996.
The problem is with following code
xB = load (type B);
xA = load (type A);
+yA = (A)xB; B -> A
+zAn = PHI[yA, xA]; PHI
+zBn = (B)zAn; // A -> B
store zAn;
store zBn;
optimizeBitCastFromPhi generates
+zBn = (B)zAn; // A -> B
and expects it will be combined with the following store instruction to another
store zAn
Unfortunately before combineStoreToValueType is called on the store instruction, optimizeBitCastFromPhi is called on the new BitCast again, and this pattern repeats indefinitely.
optimizeBitCastFromPhi only generates BitCast for load/store instructions, only the BitCast before store can cause the reexecution of optimizeBitCastFromPhi, and BitCast before store can easily be handled by InstCombineLoadStoreAlloca.cpp. So the solution to the problem is if all users of a CI are store instructions, we should not do optimizeBitCastFromPhi on it. Then optimizeBitCastFromPhi will not be called on the new BitCast instructions.