Jonas Paulsson [Wed, 12 Apr 2017 12:41:37 +0000 (12:41 +0000)]
[LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore()
Since SystemZ supports vector element load/store instructions, there is no
need for extracts/inserts if a vector load/store gets scalarized.
This patch lets a target specify that it supports such instructions by means of
a new TTI hook that defaults to false.
The hook is used in the LoopVectorizer getScalarizationOverhead() method,
which with this patch produces a smaller cost for a vector load/store on
SystemZ.
New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll
Review: Adam Nemet
https://reviews.llvm.org/D30680
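A minimal sketch of how a target might opt in via the new hook (illustrative only, not the exact in-tree code):

  // In the target's TTI implementation:
  bool supportsEfficientVectorElementLoadStore() const {
    // Vector element load/store instructions mean a scalarized vector
    // load/store needs no extra extract/insert instructions.
    return true;
  }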
George Rimar [Wed, 12 Apr 2017 08:59:15 +0000 (08:59 +0000)]
[DWARF] - Refactoring of DWARFContextInMemory implementation.
This change is related to D31136, where I initially wanted to
implement a relocation-handling optimization which can give a
significant boost. Though even without any caching algorithm, it looks
like the code can use some cleanup first.
The refactoring separates out the code for obtaining a symbol's address,
which is used in the relocation computation.
[IR] Rename the class templates for the case iterator and case handle to
not collide with the naming convention for template *arguments*. In at
least one case they actually collided and this confuses MSVC.
Daniel Sanders [Wed, 12 Apr 2017 08:23:08 +0000 (08:23 +0000)]
[globalisel][tablegen] Add experimental support for OperandWithDefaultOps, PredicateOperand, and OptionalDefOperand
Summary:
As far as instruction selection is concerned, all three appear to be the same thing.
Support for these operands is experimental since AArch64 doesn't make use
of them and the in-tree targets that do use them (AMDGPU for
OperandWithDefaultOps, AMDGPU/ARM/Hexagon/Lanai for PredicateOperand, and ARM
for OptionalDefOperand) are not using tablegen-erated GlobalISel yet.
Reviewers: rovka, aditya_nandakumar, t.p.northover, qcolombet, ab
Summary:
Dead basic blocks may form a loop for which SSA form is
fulfilled, but with a circular def-use chain. LoadCombine could
enter an infinite loop when analysing such dead code. This patch
solves the problem by simply not analysing basic blocks
that aren't forward reachable from the function entry in LoadCombine.
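A rough sketch of the idea, using illustrative helper names rather than the actual pass code:

  #include "llvm/ADT/DepthFirstIterator.h"
  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/IR/Function.h"
  using namespace llvm;

  static void combineLoadsInBlock(BasicBlock &BB); // hypothetical per-block analysis

  static void runOnReachableBlocks(Function &F) {
    // Collect blocks that are forward reachable from the entry block; dead
    // blocks may contain circular def-use chains, so skip them entirely.
    SmallPtrSet<BasicBlock *, 16> Reachable;
    for (BasicBlock *BB : depth_first(&F.getEntryBlock()))
      Reachable.insert(BB);
    for (BasicBlock &BB : F)
      if (Reachable.count(&BB))
        combineLoadsInBlock(BB);
  }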
Piotr Padlewski [Wed, 12 Apr 2017 07:59:35 +0000 (07:59 +0000)]
Invariant.group and mustalias docs fixes
Summary:
Alias analysis would like to know that
invariant.group.barrier returns a pointer that MustAlias its argument,
but this cannot imply that we can replace one pointer with the other.
[IR] Redesign the case iterator in SwitchInst to actually be an iterator
and to expose a handle to represent the actual case rather than having
the iterator return a reference to itself.
All of this allows the iterator to be used with common STL facilities,
standard algorithms, etc.
Doing this exposed some missing facilities in the iterator facade that
I've fixed, and required some work on the actual iterator to fully
support the necessary API.
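A rough sketch of how the redesigned iterator reads in client code (illustrative; the member names follow the description above):

  #include "llvm/IR/Constants.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  void visitCases(SwitchInst *SI) {
    // Each element of the range is a case handle rather than the iterator
    // itself, so range-based for and standard algorithms compose naturally.
    for (auto Case : SI->cases()) {
      ConstantInt *Val = Case.getCaseValue();
      BasicBlock *Dest = Case.getCaseSuccessor();
      (void)Val; (void)Dest;
    }
  }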
[BPI] Refactor post domination calculation and simple fix for ColdCall
Collection of PostDominatedByUnreachable and PostDominatedByColdCall has been
split out of the heuristics themselves. The data is now updated for each basic
block (previously, the update of PostDominatedByColdCall might be skipped if the
unreachable or metadata heuristic had already handled the basic block).
This separation allows re-ordering of heuristics without losing
the post-domination information.
CodeGen: BlockPlacement: Clear ComputedEdges between functions.
Not clearing was causing non-deterministic compiles for large files. Addresses
for MachineBasicBlocks would end up colliding and we would lay out a block that
we assumed had been pre-computed when it had not been.
Bob Haarman [Wed, 12 Apr 2017 01:43:07 +0000 (01:43 +0000)]
ThinLTOBitcodeWriter: keep comdats together, rename if leader is renamed
Summary:
COFF requires that every comdat contain a symbol with the same name as
the comdat. ThinLTOBitcodeWriter renames symbols, which may cause this
requirement to be violated. This change avoids such violations by
renaming comdats if their leaders are renamed. It also keeps comdats
together when splitting modules.
[IR] Add AttributeSet to hide AttributeSetNode* again, NFC
Summary:
For now, it just wraps AttributeSetNode*. Eventually, it will hold
AvailableAttrs as an inline bitset, and adding and removing enum
attributes will be super cheap.
This sinks AttributeSetNode back down to lib/IR/AttributeImpl.h.
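A rough shape of the wrapper described above (a sketch, not the exact in-tree class):

  class AttributeSet {
    AttributeSetNode *SetNode = nullptr; // implementation detail, now hidden
    // Planned follow-up: an inline AvailableAttrs bitset so that querying
    // enum attributes never has to dereference SetNode.
  public:
    AttributeSet() = default;
    explicit AttributeSet(AttributeSetNode *N) : SetNode(N) {}
    bool hasAttributes() const { return SetNode != nullptr; }
  };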
Anna Thomas [Tue, 11 Apr 2017 21:02:00 +0000 (21:02 +0000)]
[LV] Avoid vectorizing first order recurrence when phi uses are outside loop
In the vectorization of a first-order recurrence, we vectorize such
that the last element in the vector will be the one extracted to pass into the
scalar remainder loop. However, this is not true when a phi (other
than the primary induction variable) is used outside the loop.
In such a case, we need the value from the second-to-last iteration (i.e.
the phi value), not the last iteration (which would be the phi update).
I've added a test case for this. Also see PR32396.
A follow up patch would generate the correct code gen for such cases,
and turn this vectorization on.
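A hypothetical C++ example of the pattern (function and variable names are illustrative, not taken from the test case):

  // recur is a first-order recurrence whose phi value (not its update) is
  // used after the loop, so vectorization must extract the second-to-last
  // element for the outside-loop user.
  int lastButOne(const int *a, int n, int &sum) {
    int recur = 0, tmp = 0;
    for (int i = 0; i < n; ++i) {
      recur = tmp;   // phi: value carried from the previous iteration
      tmp = a[i];    // phi update
      sum += recur;  // in-loop use of the recurrence
    }
    return recur;    // outside-loop use of the phi: a[n-2] for n >= 2,
                     // not the last update a[n-1]
  }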
Daniel Berlin [Tue, 11 Apr 2017 20:06:36 +0000 (20:06 +0000)]
MemorySSA: Move to Analysis, from Transforms/Utils. It's used as
Analysis, it has Analysis passes, and once NewGVN is made an Analysis,
this removes the cross dependency from Analysis to Transforms/Utils.
NFC.
If you run llc -stop-after=codegenprepare and feed the resulting MIR
to llc -start-after=codegenprepare, you'll have an empty machine
function since we haven't run any isel yet. Of course, this only works
if the MIRParser accepts that this is okay.
This is essentially a revert of r241862 with a fix for the problem it
was papering over.
This patch assumes that the dependents to be scanned for the ExitSU are its
predecessors; otherwise, the successors of the instr are scanned.
Furthermore, sometimes the ExitSU was being fused twice, since it may be
fused once when scanning the successors from the beginning of the BB and
then again when scanning the predecessors of ExitSU. Thus, when scanning
the successors of an instr, skip the ExitSU.
Andrea Di Biagio [Tue, 11 Apr 2017 19:07:30 +0000 (19:07 +0000)]
[AddDiscriminators] Assign discriminators to MemIntrinsic calls.
Before this patch, pass AddDiscriminators always avoided assigning
discriminators to intrinsic calls. This was done mainly for two reasons:
1) We wanted to minimize the number of base discriminators used.
2) We wanted to avoid non-deterministic discriminator assignment for
different debug levels.
Unfortunately, that approach was problematic for MemIntrinsic calls.
MemIntrinsic calls can be split by SROA into loads and stores, and each new
load/store instruction would obtain the debug location from the original
intrinsic call.
If we don't assign a discriminator to MemIntrinsic calls, then we cannot
correctly set the discriminator for the newly created loads and stores.
This may have a negative impact on the basic block weight computation
performed by the SampleLoader.
This patch fixes the issue by letting MemIntrinsic calls have a discriminator.
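A hypothetical illustration of the scenario (not code from the commit):

  // A struct copy that clang typically lowers to a call to llvm.memcpy.
  // If SROA later splits the memcpy into scalar loads and stores, each new
  // instruction inherits the call's debug location, and now also its
  // discriminator, so the SampleLoader can still compute block weights.
  struct Pair { int a, b; };
  void copyPair(Pair &Dst, const Pair &Src) {
    Dst = Src; // aggregate copy -> llvm.memcpy -> may be split by SROA
  }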
llvm-lto2: Move the LTO::run() action behind a subcommand.
Move LTO::run() to a "run" subcommand so that we can introduce new subcommands
for testing different parts of the LTO implementation.
This doesn't use llvm::cl subcommands because it doesn't appear to be currently
possible to pass an argument not associated with a subcommand to a subcommand
(e.g. -lto-use-new-pm, -mcpu=yonah).
[InstCombine] Use ConstantExpr::getBinOpIdentity to implement getIdentityValue.
This removes a TODO in getIdentityValue and may allow some transforms to occur earlier. But I was unable to find any transforms we didn't already handle.
[PDB] Emit index/offset pairs for TPI and IPI streams
Summary:
This lets PDB readers look up type record data by type index in O(log n)
time. It also makes `cvdump -t` work on PDBs produced by LLD.
cvdump will not dump a PDB that doesn't have an index-to-offset table.
The table is sorted by type index, and has an entry every 8KB. Looking
up a type record by index is a binary search of this table, followed by
a scan of at most 8KB.
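A rough sketch of the lookup scheme described above, with illustrative types rather than the actual PDB reader code:

  #include <algorithm>
  #include <cassert>
  #include <cstdint>
  #include <vector>

  struct IndexOffsetPair { uint32_t TypeIndex; uint32_t ByteOffset; };

  // Table is sorted by TypeIndex and has roughly one entry per 8KB of records.
  uint32_t seekToRecord(const std::vector<IndexOffsetPair> &Table, uint32_t TI) {
    // Binary search for the last entry whose TypeIndex is <= TI...
    auto It = std::upper_bound(
        Table.begin(), Table.end(), TI,
        [](uint32_t V, const IndexOffsetPair &P) { return V < P.TypeIndex; });
    assert(It != Table.begin() && "type index precedes the first table entry");
    --It;
    // ...then the caller scans forward from It->ByteOffset through at most
    // ~8KB of record data until it reaches the record numbered TI.
    return It->ByteOffset;
  }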
Module::getOrInsertFunction is using C-style vararg instead of variadic templates.
From a user perspective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's no type checking on the arguments.
The variadic template is an obvious solution to both issues.
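Roughly, the change in caller code looks like this (a sketch; M is assumed to be a Module, "callee" is a placeholder name, and RetTy/ArgTy0/ArgTy1 are Type*):

  // Before: C-style varargs, terminated by a nullptr sentinel, no type checks.
  M.getOrInsertFunction("callee", RetTy, ArgTy0, ArgTy1, nullptr);
  // After: variadic template, no sentinel, every argument must be a Type*.
  M.getOrInsertFunction("callee", RetTy, ArgTy0, ArgTy1);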
Summary:
In rL299692 I improved strip-dead-debug-info's ability to drop CUs that are not
referenced from the current module. However, in doing so I neglected to realize
that some SPs could be referenced entirely from inlined functions. It appears
I was not the only one to make this mistake, because DebugInfoFinder doesn't
find those SPs either. Fix this in DebugInfoFinder and then use that to make
sure not to drop those CUs in strip-dead-debug-info.
[GlobalISel] LegalizerInfo: Enable legalization of non-power-of-2 types
Summary: Legalize only if the type is marked as Legal or Custom. If not, return Unsupported as LegalizerHelper is not able to handle non-power-of-2 types right now.
Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, kristof.beyls, javed.absar, ab
Sam Parker [Tue, 11 Apr 2017 08:43:32 +0000 (08:43 +0000)]
[SelectionDAG] Check CALLSEQ_BEGIN nodes in DelayForLiveRegs
A fix for the bug reported in PR30911.
The issue arises when multiple CALLSEQ_BEGIN nodes are unscheduled as
the last node to be unscheduled will gain access to the CallResource
register. But when a node is being picked, only CALLSEQ_END nodes are
checked against the CallResource and have their chains evaluated.
This then means that other CALLSEQ_BEGIN nodes can be scheduled
before the existing call sequence has been finalised. This patch adds
a check against the FrameSetup nodes in DelayForLiveRegs to prevent
this from happening.
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.
From a user perspective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's no type checking on the
arguments. The variadic template is an obvious solution to both
issues.
Sanjoy Das [Tue, 11 Apr 2017 04:11:47 +0000 (04:11 +0000)]
[LoopUnswitch] Fix a test case
(h/t to Chandler for pointing this out)
The test in question was not at all testing what it was supposed to
test. We do not //care// about placing `!make.implicit` in the inner
constant branch (since it will be folded away anyway). We care about
placing `!make.implicit` in the outer branch that switches between
either version of the loop.
Having said that, it is _correct_ to leave behind the `!make.implicit`
in the inner branch, but there is no need to do so.
Hal Finkel [Tue, 11 Apr 2017 02:03:17 +0000 (02:03 +0000)]
[PowerPC] multiply-with-overflow might use the CTR register
Check the legality of ISD::[US]MULO to see whether
Intrinsic::[us]mul_with_overflow will legalize into a function call (and, thus,
will use the CTR register). Fixes PR32485.
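An illustrative form of the check (a sketch; TLI and VT stand for the relevant TargetLowering and value type, not the exact in-tree code):

  #include "llvm/CodeGen/ISDOpcodes.h"
  #include "llvm/Target/TargetLowering.h"

  // If [US]MULO is not legal or custom for this type, the corresponding
  // overflow intrinsic is legalized into a libcall, and the call may use CTR.
  static bool overflowMulMightUseCTR(const llvm::TargetLowering &TLI,
                                     llvm::EVT VT) {
    return !TLI.isOperationLegalOrCustom(llvm::ISD::SMULO, VT) ||
           !TLI.isOperationLegalOrCustom(llvm::ISD::UMULO, VT);
  }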
Hal Finkel [Tue, 11 Apr 2017 00:18:42 +0000 (00:18 +0000)]
[bugpoint] Also remove comdat's from externalized GVs
We were removing comdats from externalized functions (function declarations
can't be in a comdat), but were not doing the same for variables. Failure to do this
would cause bugpoint to fail ("Declaration may not be in a Comdat!").
Remove AttributeSetNode::get(AttributeList, unsigned) and sink constructor
The getter was equivalent to AttributeList::getAttributes(unsigned),
which seems like a better way to express getting the AttributeSet for a
given index. This static helper was only used in one place anyway.
The constructor doesn't benefit from inlining and doesn't need to be in
a header.
Reland "[IR] Make AttributeSetNode public, avoid temporary AttributeList copies"
This re-lands r299875.
I introduced a bug in the Clang code responsible for replacing K&R, no-prototype
declarations with a real function definition that has a prototype.
The bug was here:
// Collect any return attributes from the call.
- if (oldAttrs.hasAttributes(llvm::AttributeList::ReturnIndex))
- newAttrs.push_back(llvm::AttributeList::get(newFn->getContext(),
- oldAttrs.getRetAttributes()));
+ newAttrs.push_back(oldAttrs.getRetAttributes());
Previously getRetAttributes() carried AttributeList::ReturnIndex in its
AttributeList. Now that we return the AttributeSetNode* directly, it no
longer carries that index, and we call this overload with a single node:
AttributeList::get(LLVMContext&, ArrayRef<AttributeSetNode*>)
That aborted with an assertion on x86_32 targets. I added an explicit
triple to the test and added CHECKs to help find issues like this in the
future sooner.
CodeGen: BlockPlacement: Don't always tail-duplicate with no other successor.
The math works out such that it can actually be counter-productive. The probability
calculations correctly handle the case where the alternative has 0 probability, so
rely on those calculations.
Includes a test case that demonstrates the problem.
CodeGen: BranchFolding: Merge identical blocks, even if they are short.
Merge identical blocks when doing so doesn't reduce fallthrough. It is common for
the blocks created from critical edge splitting to be identical. We would like
to merge these blocks whenever doing so would not reduce fallthrough.
Matt Arsenault [Mon, 10 Apr 2017 22:27:50 +0000 (22:27 +0000)]
Allow DataLayout to specify addrspace for allocas.
LLVM makes several assumptions about address space 0. However,
alloca is presently constrained to always return this address space.
There's no real way to avoid using alloca, so without this
there is no way to opt out of these assumptions.
The problematic assumptions include:
- That the pointer size used for the stack is the same size as
the code size pointer, which is also the maximum sized pointer.
- That 0 is an invalid, non-dereferenceable pointer value.
These are problems for AMDGPU because alloca is used to
implement the private address space, which uses a 32-bit
index as the pointer value. Other pointers are 64-bit
and behave more like LLVM's notion of generic address
space. By changing the address space used for allocas,
we can change our generic pointer type to be LLVM's generic
pointer type which does have similar properties.
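For illustration (a sketch; the address-space number 5 is an example, not a claim about any particular target's datalayout string):

  #include "llvm/IR/DataLayout.h"

  void example() {
    // The "A<n>" datalayout component sets the address space used for allocas.
    llvm::DataLayout DL("e-p:64:64-A5");
    unsigned AllocaAS = DL.getAllocaAddrSpace(); // == 5
    (void)AllocaAS;
  }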
[GVNHoist] Call isGuaranteedToTransferExecutionToSuccessor on each instruction
w.r.t. https://bugs.llvm.org/show_bug.cgi?id=32153
The consensus seems to be isGuaranteedToTransferExecutionToSuccessor should be called for each function.
Revert "[IR] Make AttributeSetNode public, avoid temporary AttributeList copies"
This reverts r299875. A Linux bot came back with a test failure:
http://bb.pgr.jp/builders/test-clang-i686-linux-RA/builds/741/steps/test_clang/logs/Clang%20%3A%3A%20CodeGen__2006-05-19-SingleEltReturn.c
[IR] Make AttributeSetNode public, avoid temporary AttributeList copies
Summary:
AttributeList::get(Fn|Ret|Param)Attributes no longer creates a temporary
AttributeList just to hide the AttributeSetNode type.
I've also added a factory method to create AttributeLists from a
parallel array of AttributeSetNodes. I think this simplifies
construction of AttributeLists when rewriting function prototypes.
Previously we would test if a particular index had attributes, and
conditionally add a temporary attribute list to a vector. Now the
attribute set vector is parallel to the argument vector that these
passes already construct.
My long term vision is to wrap AttributeSetNode* inside an AttributeSet
type that holds the enum attributes, but that will come in a follow up
change.
I haven't done any performance measurements for this change because
profiling hasn't shown that any of the affected code is hot.