Tom Stellard [Thu, 9 Mar 2017 19:24:07 +0000 (19:24 +0000)]
CMake: Don't install llvm-tblgen twice
Summary:
The add_tablegen macros defines its own install target, and it was also calling
add_llvm_utility which adds another install target.
Configuring with -DLLVM_TOOLS_INSTALL_DIR set to something other than
'bin' along with -DLLVM_INSTALL_UTILS=ON was causing llvm-tablgen
to be installed to two separate directories.
Artem Belevich [Thu, 9 Mar 2017 17:59:04 +0000 (17:59 +0000)]
[FileCheck] Added --enable-var-scope option to enable scope for regex variables.
If `--enable-var-scope` is in effect, variables with names that
start with `$` are considered to be global. All other variables are
local. All local variables get undefined at the beginning of each
CHECK-LABEL block. Global variables are not affected by CHECK-LABEL.
This makes it easier to ensure that individual tests are not affected
by variables set in preceding tests.
Sanjay Patel [Thu, 9 Mar 2017 16:20:52 +0000 (16:20 +0000)]
[InstSimplify] vector div/rem with any zero element in divisor is undef
This was suggested as a DAG simplification in the review for rL297026 :
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435253.html
...but let's start with IR since we have actual docs for IR (LangRef).
Sanjay Patel [Thu, 9 Mar 2017 15:02:25 +0000 (15:02 +0000)]
[DAG] recognize div/rem by 0 as undef before trying constant folding
As discussed in the review thread for rL297026, this is actually 2 changes that
would independently fix all of the test cases in the patch:
1. Return undef in FoldConstantArithmetic for div/rem by 0.
2. Move basic undef simplifications for div/rem (simplifyDivRem()) before
foldBinopIntoSelect() as a matter of efficiency.
I will handle the case of vectors with any zero element as a follow-up. That change
is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic().
I'm deleting the test for PR30693 because it does not test for the actual bug any more
(dangers of using bugpoint).
Joey Gouly [Thu, 9 Mar 2017 13:38:06 +0000 (13:38 +0000)]
[SelectionDAG] Make SelectCode return void
SelectCode has been returning nullptr since 182dac0 ("SDAG: Make
SelectCodeCommon return void", 2016-05-10). Make SelectCode also
return void instead, as all callers have been updated.
[PM/Inliner] Make the new PM's inliner process call edges across an
entire SCC before iterating on newly-introduced call edges resulting
from any inlined function bodies.
This more closely matches the behavior of the old PM's inliner. While it
wasn't really clear to me initially, this behavior is actually essential
to the inliner behaving reasonably in its current design.
Because the inliner is fundamentally a bottom-up inliner and all of its
cost modeling is designed around that it often runs into trouble within
an SCC where we don't have any meaningful bottom-up ordering to use. In
addition to potentially cyclic, infinite inlining that we block with the
inline history mechanism, it can also take seemingly simple call graph
patterns within an SCC and turn them into *insanely* large functions by
accidentally working top-down across the SCC without any of the
threshold limitations that traditional top-down inliners use.
Consider this diabolical monster.cpp file that Richard Smith came up
with to help demonstrate this issue:
```
template <int N> extern const char *str;
When compiled with '-DMAX=N' for various values of N, this will create an SCC
with a reasonably large number of functions. Previously, the inliner would try
to exhaust the inlining candidates in a single function before moving on. This,
unfortunately, turns it into a top-down inliner within the SCC. Because our
thresholds were never built for that, we will incrementally decide that it is
always worth inlining and proceed to flatten the entire SCC into that one
function.
What's worse, we'll then proceed to the next function, and do the exact same
thing except we'll skip the first function, and so on. And at each step, we'll
also make some of the constant factors larger, which is awesome.
The fix in this patch is the obvious one which makes the new PM's inliner use
the same technique used by the old PM: consider all the call edges across the
entire SCC before beginning to process call edges introduced by inlining. The
result of this is essentially to distribute the inlining across the SCC so that
every function incrementally grows toward the inline thresholds rather than
allowing the inliner to grow one of the functions vastly beyond the threshold.
The code for this is a bit awkward, but it works out OK.
We could consider in the future doing something more powerful here such as
prioritized order (via lowest cost and/or profile info) and/or a code-growth
budget per SCC. However, both of those would require really substantial work
both to design the system in a way that wouldn't break really useful
abstraction decomposition properties of the current inliner and to be tuned
across a reasonably diverse set of code and workloads. It also seems really
risky in many ways. I have only found a single real-world file that triggers
the bad behavior here and it is generated code that has a pretty pathological
pattern. I'm not worried about the inliner not doing an *awesome* job here as
long as it does *ok*. On the other hand, the cases that will be tricky to get
right in a prioritized scheme with a budget will be more common and idiomatic
for at least some frontends (C++ and Rust at least). So while these approaches
are still really interesting, I'm not in a huge rush to go after them. Staying
even closer to the existing PM's behavior, especially when this easy to do,
seems like the right short to medium term approach.
I don't really have a test case that makes sense yet... I'll try to find a
variant of the IR produced by the monster template metaprogram that is both
small enough to be sane and large enough to clearly show when we get this wrong
in the future. But I'm not confident this exists. And the behavior change here
*should* be unobservable without snooping on debug logging. So there isn't
really much to test.
The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there
really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that
we consider for inlining.
Simon Dardis [Thu, 9 Mar 2017 11:19:48 +0000 (11:19 +0000)]
[mips] Fix return lowering
Fix a machine verifier issue where a instruction was using a invalid
register. The return pseudo is expanded and has the return address
register added to it. The return register may have been spuriously
mark as killed earlier.
This partially resolves PR/27458
Thanks to Quentin Colombet for reporting the issue!
Teresa Johnson [Thu, 9 Mar 2017 00:19:49 +0000 (00:19 +0000)]
Perform symbol binding for .symver versioned symbols
Summary:
In a .symver assembler directive like:
.symver name, name2@@nodename
"name2@@nodename" should get the same symbol binding as "name".
While the ELF object writer is updating the symbol binding for .symver
aliases before emitting the object file, not doing so when the module
inline assembly is handled by the RecordStreamer is causing the wrong
behavior in *LTO mode.
E.g. when "name" is global, "name2@@nodename" must also be marked as
global. Otherwise, the symbol is skipped when iterating over the LTO
InputFile symbols (InputFile::Symbol::shouldSkip). So, for example,
when performing any *LTO via the gold-plugin, the versioned symbol
definition is not recorded by the plugin and passed back to the
linker. If the object was in an archive, and there were no other symbols
needed from that object, the object would not be included in the final
link and references to the versioned symbol are undefined.
The llvm-lto2 tests added will give an error about an unused symbol
resolution without the fix.
Don't merge global constants with non-dbg metadata.
!type metadata can not be dropped. An alternative to this is adding
!type metadata from the replaced globals to the replacement, but that
may weaken type tests and make them slower at the same time.
The merged global gets !dbg metadata from replaced globals, and can
end up with multiple debug locations.
This commit changes the BumpPtrAllocator for suffix tree nodes to a SpecificBumpPtrAllocator.
Before, node construction was leaking memory because of the DenseMap in SuffixTreeNodes.
Changing this to a SpecificBumpPtrAllocator allows this memory to properly be released.
Javed Absar [Wed, 8 Mar 2017 23:01:50 +0000 (23:01 +0000)]
[ConstantFold] Fix defect in constant folding computation for GEP
When the array indexes are all determined by GVN to be constants,
a call is made to constant-folding to optimize/simplify the address
computation.
The constant-folding, however, makes a mistake in that it sometimes reads
back stale Idxs instead of NewIdxs, that it re-computed in previous iteration.
This leads to incorrect addresses coming out of constant-folding to GEP.
A test case is included. The error is only triggered when indexes have particular
patterns that the stale/new index updates interplay matters.
Reviewers: Daniel Berlin
Differential Revision: https://reviews.llvm.org/D30642
Zachary Turner [Wed, 8 Mar 2017 22:49:32 +0000 (22:49 +0000)]
[Support] Add llvm::sys::fs::remove_directories.
We already have a function create_directories() which can create
an entire tree, and remove() which can remove an empty directory,
but we do not have remove_directories() which can remove an entire
tree. This patch adds such a function.
Because removing a directory tree can have dangerous consequences
when the tree contains a directory symlink, the patch here updates
the existing directory_iterator construct to optionally not follow
symlinks (previously it would always follow symlinks). The delete
algorithm uses this flag so that for symlinks, only the links are
removed, and not the targets.
On Windows this is implemented with SHFileOperation, which also
does not recurse into symbolic links or junctions.
Jordan Rose [Wed, 8 Mar 2017 21:53:12 +0000 (21:53 +0000)]
Add red zones to BumpPtrAllocator under ASan
To help catch buffer overruns, this patch changes BumpPtrAllocator to
insert an extra unused byte between allocations when building with
ASan. SpecificBumpPtrAllocator opts out of this behavior, since it
needs to destroy its items later by walking the allocated memory.
Adam Nemet [Wed, 8 Mar 2017 18:47:50 +0000 (18:47 +0000)]
[SLP] Visualize SLP trees with -view-slp-tree
Analyzing larger trees is extremely difficult with the current debug output so
this adds GraphTraits and DOTGraphTraits on top of the VectorizableTree data
structure. We can now display the SLP trees with Graphviz as in
https://reviews.llvm.org/F3132765.
I decorated the graph where a value needs to be gathered for one reason or
another. These are the red nodes.
There are other improvement I am planning to make as I work through my case
here. For example, I would also like to mark nodes that need to be extracted.
Matthew Simpson [Wed, 8 Mar 2017 18:18:20 +0000 (18:18 +0000)]
[LV] Select legal insert point when fixing first-order recurrences
Because IRBuilder performs constant-folding, it's not guaranteed that an
instruction in the original loop map to an instruction in the vector loop. It
could map to a constant vector instead. The handling of first-order recurrences
was incorrectly making this assumption when setting the IRBuilder's insert
point.
Volkan Keles [Wed, 8 Mar 2017 18:09:14 +0000 (18:09 +0000)]
[GlobalISel] Add default action for G_FNEG
Summary: rL297171 introduced G_FNEG for floating-point negation instruction and IRTranslator started to translate `FSUB -0.0, X` to `FNEG X`. This patch adds a default action for G_FNEG to avoid breaking existing targets.
Zachary Turner [Wed, 8 Mar 2017 17:56:08 +0000 (17:56 +0000)]
Resubmit FileSystem changes.
This was originall reverted due to some test failures in
ModuleCache and TestCompDirSymlink. These issues have all
been resolved and the code now passes all tests.
Matthew Simpson [Wed, 8 Mar 2017 17:03:38 +0000 (17:03 +0000)]
[LV] Make the test case for PR30183 less fragile
This patch also renames the PR number the test points to. The previous
reference was PR29559, but that bug was somehow deleted and recreated under
PR30183.
Daniel Cederman [Wed, 8 Mar 2017 15:23:10 +0000 (15:23 +0000)]
[Sparc] Check register use with isPhysRegUsed() instead of reg_nodbg_empty()
Summary: By using reg_nodbg_empty() to determine if a function can be
treated as a leaf function or not, we miss the case when the register
pair L0_L1 is used but not L0 by itself. This has the effect that
use_all_i32_regs(), a test in reserved-regs.ll which tries to use all
registers, gets treated as a leaf function.
John Brawn [Wed, 8 Mar 2017 12:49:18 +0000 (12:49 +0000)]
[ARM] Split up lsl-zero test into two tests
On Windows stderr and stdout happen to get interleaved in a way that causes the
test to fail, so split it up into a test that checks for errors and a test that
doesn't.
Jonas Hahnfeld [Wed, 8 Mar 2017 08:36:21 +0000 (08:36 +0000)]
[Support] Remove unit test for fs::is_local
rL295768 introduced this test that fails if LLVM is built and tested on
an NFS share. Delete the test as discussed on the corresponing commit
thread. The only feasible solution would have been to introduce
environment variables and to en/disable the test conditionally.
Zachary Turner [Wed, 8 Mar 2017 01:57:40 +0000 (01:57 +0000)]
Work around an ICE on MSVC 2017.
MSVC 2017 was released today, and I found one bug in the
compiler which prevents a successful build of LLVM. This
patch works around the bug in a fairly benign way.
Matt Arsenault [Wed, 8 Mar 2017 01:06:58 +0000 (01:06 +0000)]
AMDGPU: Don't wait at end of block with a trivial successor
If there is only one successor, and that successor only
has one predecessor the wait can obviously be delayed until
uses or the end of the next block. This avoids code quality
regressions when there are trivial fallthrough blocks inserted
for structurization.
Sanjay Patel [Tue, 7 Mar 2017 23:27:14 +0000 (23:27 +0000)]
[InstCombine] shrink truncated insertelement into undef vector
This is the 2nd part of solving:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html
D30123 moves the trunc ahead of the shuffle, and this moves the trunc ahead of the insertelement.
We're limiting this transform to undef rather than any constant to avoid backend problems.
Daniel Sanders [Tue, 7 Mar 2017 23:20:35 +0000 (23:20 +0000)]
Recommit: [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.
Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.
The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h.
Fix one-after-the-end type metadata handling in globalsplit.
Itanium ABI may have an address point one byte after the end of a
vtable. When such vtable global is split, the !type metadata needs to
follow the right vtable.
Fix SmallPtrSet::iterator behaviour and creation ReverseIterate is true.
- Any function that creates an iterator now uses
SmallPtrSet::makeIterator, which creates an iterator that
dereferences to the given pointer.
- In reverse-iterate mode, initialze iterator::End with "CurArray"
instead of EndPointer.
- In reverse-iterate mode, the current node is iterator::Buffer[-1].
iterator::operator* and SmallPtrSet::makeIterator are the only ones
that need to know.
- Fix the assertions for reverse-iterate mode.
This fixes the tests Danny B added in r297182, and adds a couple of
others to confirm that dereferencing does the right thing, regardless of
how the iterator was found, and that iteration works correctly from each
return from find.
This was committed at r297155 and reverted at r297166 because of an
over-reaching clang test. That should be fixed with r297189.
This is one part of solving a recent bug report:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html
This keeps with our general approach: changing arbitrary shuffles is off-limts,
but changing splat is ok. The transform is very similar to the existing
shrinkBitwiseLogic() canonicalization.
Chris Bieneman [Tue, 7 Mar 2017 21:34:35 +0000 (21:34 +0000)]
[ObjectYAML] Fix issue with DWARF2 AddrSize 8
In my refactoring I introduced a bug where we were using the reference size instead of the offset size for DW_FORM_strp and similar forms.
This patch resolves the error and adds a test case testing all the DWARF forms for DWARF2 AddrSize 8. There is similar coverage already in the DWARFDebugInfoTest sources that covers the parser. Once I migrate the DWARFGenerator APIs to be built on the YAML tools they will be fully covered under the same tests.
Tim Northover [Tue, 7 Mar 2017 21:24:33 +0000 (21:24 +0000)]
GlobalISel: fix legalization of G_INSERT
We were calculating incorrect extract/insert offsets by trying to be too
tricksy with min/max. It's clearer to just split the logic up into "register
starts before this segment" vs "after".
Gor Nishanov [Tue, 7 Mar 2017 21:00:54 +0000 (21:00 +0000)]
[coroutines] Add handling for unwind coro.ends
Summary:
The purpose of coro.end intrinsic is to allow frontends to mark the cleanup and
other code that is only relevant during the initial invocation of the coroutine
and should not be present in resume and destroy parts.
In landing pads coro.end is replaced with an appropriate instruction to unwind to
caller. The handling of coro.end differs depending on whether the target is
using landingpad or WinEH exception model.
For landingpad based exception model, it is expected that frontend uses the
`coro.end`_ intrinsic as follows:
```
The `CoroSpit` pass replaces `coro.end` with ``True`` in the resume functions,
thus leading to immediate unwind to the caller, whereas in start function it
is replaced with ``False``, thus allowing to proceed to the rest of the cleanup
code that is only needed during initial invocation of the coroutine.
For Windows Exception handling model, a frontend should attach a funclet bundle
referring to an enclosing cleanuppad as follows:
The `CoroSplit` pass, if the funclet bundle is present, will insert
``cleanupret from %tok unwind to caller`` before
the `coro.end`_ intrinsic and will remove the rest of the block.
Ahmed Bougacha [Tue, 7 Mar 2017 20:53:09 +0000 (20:53 +0000)]
[GlobalISel] Don't translate intrinsics with metadata parameters.
Some intrinsics take metadata parameters. These all need custom
handling of some form, and cannot possibly be lowered generically to
G_INTRINSIC calls with vreg operands.
Reject them, instead of hitting an assert later in getOrCreateVReg.
Ahmed Bougacha [Tue, 7 Mar 2017 20:53:06 +0000 (20:53 +0000)]
[GlobalISel] Avoid invalidating ValToVReg when translating no-op bitcast.
When we translate a no-op (same type) bitcast, we try to be clever and
only emit a COPY if we already assigned a vreg to the defined value.
However, when we didn't, we tried to assign to a reference into the
ValToVReg DenseMap, even though the RHS of the assignment
(getOrCreateVReg) could potentially grow that DenseMap, invalidating the
reference.
Avoid that by getting the source vreg first.
I audited the rest of the translator; this is the only tricky case.
The test is quite unwieldy, as the problem is caused by the DenseMap
growing, which happens after the 47th mapped value.
Ahmed Bougacha [Tue, 7 Mar 2017 20:53:03 +0000 (20:53 +0000)]
[GlobalISel] Relax vector G_SELECT assertion.
For vector operands, the `select` instruction supports both vector and
non-vector conditions. The MIR builder had an overly restrictive
assertion, that only accepted vector conditions for vector selects
(in effect implementing ISD::VSELECT).
Make it possible to express the full range of G_SELECTs.
Ahmed Bougacha [Tue, 7 Mar 2017 20:34:20 +0000 (20:34 +0000)]
[GlobalISel] Emit DBG_VALUE %noreg for non-int/fp constant values.
When a dbg_value has a constant operand that isn't representable in MI,
there isn't much we can do. Use %noreg (0) for those situations.
This matches the SelectionDAG behavior.
SjLjEHPrepare: Fix the pass for swifterror arguments
We cannot leave the identity copies 'select true, arg, undef' that this pass
inserts for arguments to simplify handling of values on swifterror arguments.
swifterror arguments have restrictions on their uses.
[cmake] Include openmp with add_llvm_external_project
Summary:
Include projects/openmp into the build using add_llvm_external_project
instead of add_subdirectory. This creates an option
LLVM_TOOL_OPENMP_BUILD that selects whether this project gets included
in an in-tree build.
Daniel Berlin [Tue, 7 Mar 2017 18:47:50 +0000 (18:47 +0000)]
Add PointerLikeTypeTraits for const things, as long as there is one for the non-const version. Clang and other users have a number of types they use as pointers, and this avoids having to define both const and non-const versions of PointerLikeTraits.
Daniel Berlin [Tue, 7 Mar 2017 18:47:48 +0000 (18:47 +0000)]
Make SmallPtrSet count and find able to take const PtrType's
Summary:
For our set/map types, count/find normally take const references.
This works well for non-pointer types, but can suck for pointer
types.
DenseSet<int *> foo;
const int *b = nullptr;
foo.count(b) does not work
but the equivalent reference version does work
(patch to fix DenseSet/DenseMap coming up)
For SmallPtrSet, you have no such option.
The following will not work right now:
SmallPtrSet<int *> foo;
const int *b = nullptr;
foo.count(b);
This makes const correctness hard in some cases.
Example:
SmallPtrSet<Instruction *> InstructionsToErase;
You can't make this SmallPtrSet<const Instruction *> because then you
can't erase the instruction. If I want to see if something is in the
set, I may only have a const Instruction *. Given that count and find
are non-mutating, this should just work.
The places in our code base that do this resort to const_cast :(.
This patch makes count and find able to be used with const Instruction
* in the above SmallPtrSet examples.
This is a bit annoying because of where C++ applies the const, so we
have to remove the pointer type from the passed-in-type and rebuild it
with const.
Matthew Simpson [Tue, 7 Mar 2017 18:47:30 +0000 (18:47 +0000)]
[LV] Consider users that are memory accesses in uniforms expansion step
When expanding the set of uniform instructions beyond the seed instructions
(e.g., consecutive pointers), we mark a new instruction uniform if all its
loop-varying users are uniform. We should also allow users that are consecutive
or interleaved memory accesses. This fixes cases where we have an instruction
that is used as the pointer operand of a consecutive access but also used by a
non-memory instruction that later becomes uniform as part of the expansion.