Handle non-~0 lane masks on live-in registers in LivePhysRegs
When LivePhysRegs adds live-in registers, it recognizes ~0 as a special
lane mask indicating the entire register. If the lane mask is not ~0,
it will only add the subregisters that overlap the specified lane mask.
The problem is that if a live-in register does not have subregisters,
and the lane mask is not ~0, it will not be added to the live set.
(The given lane mask may simply be the lane mask of its register class.)
If a register does not have subregisters, add it to the live set if
the lane mask is non-zero.
Matt Arsenault [Fri, 28 Oct 2016 19:43:31 +0000 (19:43 +0000)]
AMDGPU: Fix using incorrect private resource with no allocation
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.
Teresa Johnson [Fri, 28 Oct 2016 19:36:00 +0000 (19:36 +0000)]
[ThinLTO] Use flags from summary when writing variable summary (NFC)
We already read the flags out of the summary when writing the summary
records for functions and aliases, do the same for variables.
This is an NFC change for now since the flags computed on the fly from
the GlobalValue currently will always match those in the summary
already, but once I send a follow-on patch to set the NoRename flag for
locals in the llvm.used set this becomes a necessary change.
Lang Hames [Fri, 28 Oct 2016 18:24:15 +0000 (18:24 +0000)]
[Error] Unify +Asserts/-Asserts behavior for checked flags in Error/Expected<T>.
(1) Switches to raw pointer and bitmasking operations for Error payload.
(2) Always includes the 'unchecked' bitfield in Expected<T>, even in -Asserts.
(3) Always propagates checked bit status in move-ops for both classes, even in
-Asserts.
This should allow debug programs to link against release libraries without
encountering spurious 'unchecked error' terminations.
Error checks still aren't verified in release mode so this doesn't introduce
any new control flow, but it does require new bit-masking ops in release mode
to preserve the flag values during move ops. I expect the overhead to be
minimal, but if we discover any corner cases where it matters we could fix
this by making flag propagation conditional on a new build option.
Matthias Braun [Fri, 28 Oct 2016 18:05:05 +0000 (18:05 +0000)]
TargetPassConfig: Move addPass of IPRA RegUsageInfoProp down.
TargetPassConfig::addMachinePasses() does some housekeeping first:
Handling the -print-machineinstrs flag and doing an initial printing
"After Instruction Selection". There is no reason for RegUsageInfoProp
to run before those two steps.
Tom Stellard [Fri, 28 Oct 2016 15:32:28 +0000 (15:32 +0000)]
[Loads] Fix crash in is isDereferenceableAndAlignedPointer()
Summary:
We were trying to add APInt values with different bit sizes after
visiting an addrspacecast instruction which changed the bit width
of the pointer.
Teresa Johnson [Fri, 28 Oct 2016 15:30:27 +0000 (15:30 +0000)]
[cmake] Temporarily revert enforcement of minimum GCC version increase
Summary:
This is temporary, until bot that builds public facing LLVM
documentation is upgraded. It reverts only the cmake change in r284497,
but leaves the doc changes in place to preserve intent.
Davide Italiano [Fri, 28 Oct 2016 02:47:09 +0000 (02:47 +0000)]
[Reassociate] Removing instructions mutates the IR.
Fixes PR 30784. Discussed with Justin, who pointed out that
in the new PassManager infrastructure we can have more fine-grained
control on which analyses we want to preserve, but this is the
best we can do with the current infrastructure.
Teresa Johnson [Fri, 28 Oct 2016 02:39:38 +0000 (02:39 +0000)]
[ThinLTO] Create AliasSummary when building index
Summary:
Previously we were creating the alias summary on the fly while writing
the summary to bitcode. This moves the creation of these summaries to
the module summary index builder where we build the rest of the summary
index.
This is going to be necessary for setting the NoRename flag for values
possibly used in inline asm or module level asm.
Vedant Kumar [Thu, 27 Oct 2016 23:17:51 +0000 (23:17 +0000)]
[Coverage] Darwin: Move __llvm_covmap from __DATA to __LLVM_COV
Programs with very large __llvm_covmap sections may fail to link on
Darwin because because of out-of-range 32-bit RIP relative references.
It isn't possible to work around this by using the large code model
because it isn't supported on Darwin. One solution is to move the
__llvm_covmap section past the end of the __DATA segment.
=== Testing ===
In addition to check-{llvm,clang,profile}, I performed a link test on a
simple object after injecting ~4GB of padding into __llvm_covmap:
This patch should not pose any backwards-compatibility concerns. LLVM
is expected to scan all of the sections in a binary for __llvm_covmap,
so changing its segment shouldn't affect anything. I double-checked this
by loading coverage produced by an unpatched compiler with a patched
llvm-cov.
Update .debug_line section version information to match DWARF version.
In the past the compiler always emitted .debug_line version 2, though some opcodes from DWARF 3 (e.g. DW_LNS_set_prologue_end, DW_LNS_set_epilogue_begin or DW_LNS_set_isa) and from DWARF 4 could be emitted by the compiler.
This patch changes version information of .debug_line to exactly match the DWARF version. For .debug_line version 4, a new field maximum_operations_per_instruction is emitted.
Tim Shen [Thu, 27 Oct 2016 21:39:51 +0000 (21:39 +0000)]
[APFloat] Add DoubleAPFloat mode to APFloat. NFC.
Summary:
This patch adds DoubleAPFloat mode to APFloat.
Now, an APFloat with semantics PPCDoubleDouble will have DoubleAPFloat layout
(APFloat.U.Double), which contains two underlying APFloats as
PPCDoubleDoubleImpl and IEEEdouble semantics. Currently the IEEEdouble APFloat
is not used, and the first APFloat behaves exactly the same before this change.
This patch consists of three kinds of logics:
1) Construction and destruction of APFloat. Now the ctors, dtor, assign
opertors and factory functions construct different underlying layout
based on the semantics passed in.
2) s/IEEE/getIEEE()/ for normal, lifetime-unrelated computation functions.
These functions only access Floats[0] in DoubleAPFloat, which is the
same as today's semantic.
3) A "Double dispatch" function, APFloat::convert. Converting between two
different layouts requires appropriate logic.
BitcodeReader: Require clients to read the block info block at most once.
This change makes it the client's responsibility to call ReadBlockInfoBlock()
at most once. This is in preparation for a future change that will allow
there to be multiple block info blocks.
See also: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106512.html
Kyle Butt [Thu, 27 Oct 2016 21:37:20 +0000 (21:37 +0000)]
CodeGen: Handle missed case of block removal during BlockPlacement.
There is a use after free bug in the existing code. Loop layout selects
a preferred exit block, and then lays out the loop. If this block is
removed during layout, it needs to be invalidated to prevent a use after
free.
Kevin Enderby [Thu, 27 Oct 2016 20:59:10 +0000 (20:59 +0000)]
Another additional error check for invalid Mach-O files for the
obsolete load commands.
Again the philosophy of the error checking in libObject for
Mach-O files, the idea behind the checking is that we never
will return a Mach-O file out of libObject that contains unknown
things the library code can’t operate on. So known obsolete
load commands will cause a hard error.
Also to make things clear I have added comments to the
values and structures in Support/Mach-O.h and
Support/MachO.def as to what is obsolete.
As noted in a TODO in the code, there may need to be a
non-default mode to allow some unknown values for well
structured Mach-O files with things like unknown load
load commands. So things like using an old lldb on a newer
Mach-O file could still provide some limited functionality.
Ehsan Amiri [Thu, 27 Oct 2016 19:10:09 +0000 (19:10 +0000)]
[PPC] Adding the removed testcase again
This testcase was originally part of r284995, but I put it in a wrong directory.
So I removed it. Before adding it back I did some small enhancements. Also I
changed the assertions a little bit, to take into account the impact of some
changes performed since code review is done.
This is similar to changes done for another testcase in the original commit.
See: https://reviews.llvm.org/D23614#577749
Basically for instead of vxor we now generate xxlxor in some cases, which is
better.
This works great most of the time. However, if the body of the function is
greater than 127 bytes, the branch target limitation of cbz becomes an issue.
This occurs in the unoptimized code generation cases sometimes (like in
compiler-rt).
Since this is a matter of correctness, possibly pay a small penalty instead. We
now form this slightly differently:
cbnz .Lbody
udf #249 @ __brkdiv0
.Lbody:
...
The positive case is through the branch instead of being the next instruction.
However, because of the basic block layout, the negated branch is going to be
a short distance always (2 bytes away, after the inserted __brkdiv0).
The new t__brkdiv0 instruction is required to explicitly mark the instruction as
a terminator as the generic UDF instruction is not a terminator.
Greg Clayton [Thu, 27 Oct 2016 16:32:04 +0000 (16:32 +0000)]
Switch all DWARF variables for tags, attributes and forms over to use the llvm::dwarf enumerations instead of using raw uint16_t values. This allows easier debugging as users can see the values of the enumerations in the variables view that will show the enumeration string instead of just a number.
Dehao Chen [Thu, 27 Oct 2016 16:30:08 +0000 (16:30 +0000)]
Add Loop Sink pass to reverse the LICM based of basic block frequency.
Summary: LICM may hoist instructions to preheader speculatively. Before code generation, we need to sink down the hoisted instructions inside to loop if it's beneficial. This pass is a reverse of LICM: looking at instructions in preheader and sinks the instruction to basic blocks inside the loop body if basic block frequency is smaller than the preheader frequency.
[mips] Do not allow -opt-bisect-limit to skip the PIC call optimization pass.
r282428 added the MipsOptimizePICCall as an opt-in pass that can be
skipped when using the -opt-bisect-limit option. However, this pass is
needed because it generates code that conforms to the o32 ABI
specification by using the $t9 register for PIC calls with JALR
instructions.
This bug was exposed by the fact that skipFunction() also checks for
the "optnone" attribute. This caused functions with that attribute to
break the requirements of the o32 ABI.
Bjorn Pettersson [Thu, 27 Oct 2016 14:48:09 +0000 (14:48 +0000)]
Fix memory issue in AttrBuilder::removeAttribute uses.
Summary:
Found when running Valgrind.
This removes two unnecessary assignments when using
AttrBuilder::removeAttribute.
AttrBuilder::removeAttribute returns a reference to the object.
As the LHSes were the same as the callees, the assignments
resulted in memcpy calls where dst = src.
Simon Pilgrim [Thu, 27 Oct 2016 14:29:28 +0000 (14:29 +0000)]
[DAGCombiner] Add vector demanded elements support to computeKnownBits
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.
The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.
I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.
DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.
Alexey Bataev [Thu, 27 Oct 2016 12:02:28 +0000 (12:02 +0000)]
[SLP] Fix for PR30626: Compiler crash inside SLP Vectorizer.
After successfull horizontal reduction vectorization attempt for PHI node
vectorizer tries to update root binary op by combining vectorized tree
and the ReductionPHI node. But during vectorization this ReductionPHI
can be vectorized itself and replaced by the `undef` value, while the
instruction itself is marked for deletion. This 'marked for deletion'
PHI node then can be used in new binary operation, causing "Use still
stuck around after Def is destroyed" crash upon PHI node deletion.
Also the test is fixed to make it perform actual testing.
George Rimar [Thu, 27 Oct 2016 11:50:04 +0000 (11:50 +0000)]
[Object/ELF] - Fixed behavior when SectionHeaderTable->sh_size is too large.
Elf.h already has code checking that section table does not go past end of file.
Problem is that this check may not work on values greater than UINT64_MAX / Header->e_shentsize
because of calculation overflow.
Sam Parker [Thu, 27 Oct 2016 09:47:10 +0000 (09:47 +0000)]
[ARM] Predicate UMAAL selection on hasDSP.
UMAAL is a DSP instruction and it is not available on thumbv7m
(Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong
CHECK prefix in longMAC.ll test.
Nicolai Haehnle [Thu, 27 Oct 2016 08:15:07 +0000 (08:15 +0000)]
AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due register dependencies
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.
The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.
Justin Bogner [Wed, 26 Oct 2016 22:37:52 +0000 (22:37 +0000)]
llvm-objdump: Make some error messages more consistent
Most of the version of report_error were quoting the filename and
printing a colon between the file name and the error message, but this
one wasn't doing either of those. Fix the output to be more
consistent.
Vedant Kumar [Wed, 26 Oct 2016 22:07:39 +0000 (22:07 +0000)]
[utils] Add a '--unified-report' option to the code coverage prep script
In --unified-report mode, a single coverage report is prepared for all
specified binaries and written to *report_dir*. This mode is compatible
with all existing script options, including the --restrict mode which is
used to limit coverage reporting to certain files or directories.
This should not break any existing users of the script.
Vedant Kumar [Wed, 26 Oct 2016 22:07:35 +0000 (22:07 +0000)]
[utils] Add an '--only-merge' option to the code coverage prep script
In --only-merge mode, the script terminates after the profile merging
step. This makes the script less stateful: it's more natural to split
the merge out into a separate step instead of relying on the first
invocation of the script to do it.
This should not break any existing users of the script.