Alexey Bataev [Mon, 11 Jan 2016 11:52:29 +0000 (11:52 +0000)]
[X86] Reduce complexity of the LEA optimization pass, by Andrey Turetsky.
In the OptimizeLEA pass keep instructions' positions in the basic block saved and use them for calculation of the distance between two instructions instead of std::distance. This reduces complexity of the pass from O(n^3) to O(n^2) and thus the compile time.
Differential Revision: http://reviews.llvm.org/D15692
Craig Topper [Mon, 11 Jan 2016 05:13:41 +0000 (05:13 +0000)]
[TableGen] Allow asm writer to use up to 3 OpInfo tables instead of 2. This allows x86 to use 56 total bits made up of a 32-bit, 16-bit, and 8-bit table. Previously we were using 64 total bits.
This saves 14K from the x86 table size. And saves space on other targets as well.
Craig Topper [Mon, 11 Jan 2016 05:13:38 +0000 (05:13 +0000)]
[TableGen] Remove unnecessary 0 terminator from an array that only existed to prevent ending an array with a comma. But that's perfectly legal and not something we need to prevent. NFC
Lang Hames [Mon, 11 Jan 2016 01:40:11 +0000 (01:40 +0000)]
[Orc] Add support for remote JITing to the ORC API.
This patch adds utilities to ORC for managing a remote JIT target. It consists
of:
1. A very primitive RPC system for making calls over a byte-stream. See
RPCChannel.h, RPCUtils.h.
2. An RPC API defined in the above system for managing memory, looking up
symbols, creating stubs, etc. on a remote target. See OrcRemoteTargetRPCAPI.h.
3. An interface for creating high-level JIT components (memory managers,
callback managers, stub managers, etc.) that operate over the RPC API. See
OrcRemoteTargetClient.h.
4. A helper class for building servers that can handle the RPC calls. See
OrcRemoteTargetServer.h.
The system is designed to work neatly with the existing ORC components and
functionality. In particular, the ORC callback API (and consequently the
CompileOnDemandLayer) is supported, enabling lazy compilation of remote code.
Assuming this doesn't trigger any builder failures, a follow-up patch will be
committed which tests these utilities by using them to replace LLI's existing
remote-JITing demo code.
Craig Topper [Mon, 11 Jan 2016 00:44:56 +0000 (00:44 +0000)]
[AVX-512] Remove unused Round and Itinerary from the maskable_cmp multiclasses. They weren't used and there were extra spaces in the asm string to prepare for the concatenations of the round string that wasn't ever used.
Lang Hames [Sun, 10 Jan 2016 23:59:41 +0000 (23:59 +0000)]
[RuntimeDyld] Add a notifyObjectLoaded method to RuntimeDyld::MemoryManager.
This is a more generic version of the MCJITMemoryManager::notifyObjectLoaded
method: It provides only a RuntimeDyld reference (rather than an
ExecutionEngine), and so can be used with ORC JIT stacks.
Lang Hames [Sun, 10 Jan 2016 18:51:50 +0000 (18:51 +0000)]
[RuntimeDyld] Add alignment arguments to the reserveAllocationSpace method of
RuntimeDyld::MemoryManager.
The RuntimeDyld::MemoryManager::reserveAllocationSpace method is called when
object files are loaded, and gives clients a chance to pre-allocate memory for
all segments. Previously only the size of each segment (code, ro-data, rw-data)
was supplied but not the alignment. This hasn't caused any problems so far, as
most clients allocate via the MemoryBlock interface which returns page-aligned
blocks. Adding alignment arguments enables finer grained allocation while still
satisfying alignment restrictions.
Keno Fischer [Sun, 10 Jan 2016 18:17:12 +0000 (18:17 +0000)]
[SectionMemoryManager] Don't just drop the RO free list
In r255760, I optimized the SectionMemoryManager to make better use
of virtual memory on platforms where the allocation granularity was
bigger than the protection granularity. As part of this, fixing up
the free list became more complicated and was moved into
`applyMemoryGroupPermissions`. Unfortunately, I forgot to actually
remove the call that drops the free list for RO memory (I did
remove the corresponding one for RX memory), defeating the whole
optimization.
NAKAMURA Takumi [Sun, 10 Jan 2016 15:56:49 +0000 (15:56 +0000)]
OrcJITTests//ObjectLinkingLayerTest.cpp: Appease msc18's C2327. It seems definition of nested class would confuse the context.
llvm\unittests\ExecutionEngine\Orc\ObjectLinkingLayerTest.cpp(115) : error C2327: 'llvm::OrcExecutionTest::TM' : is not a type name, static, or enumerator
llvm\unittests\ExecutionEngine\Orc\ObjectLinkingLayerTest.cpp(115) : error C2065: 'TM' : undeclared identifier
FYI, "this->TM" was valid even before moving class SectionMemoryManagerWrapper.
Chandler Carruth [Sun, 10 Jan 2016 09:40:13 +0000 (09:40 +0000)]
[ADT] Add an abstraction for embedding an integer within a pointer-like
type.
This makes it easy and safe to use a set of flags as one elmenet of
a tagged union with pointers. There is quite a bit of code that has
historically done this by casting arbitrary integers to "pointers" and
assuming that this was safe and reliable. It is neither, and has started
to rear its head by triggering safety asserts in various abstractions
like PointerLikeTypeTraits when the integers chosen are invariably poor
choices for *some* platform and *some* situation. Not to mention the
(hopefully unlikely) prospect of one of these integers actually getting
allocated!
With this, it will be straightforward to build type safe abstractions
like this without being error prone. The abstraction itself is also
remarkably simple thanks to the implicit conversion.
This use case and pattern was also independently created by the folks
working on Swift, and they're going to incrementally add any missing
functionality they find.
Chandler Carruth [Sun, 10 Jan 2016 08:48:23 +0000 (08:48 +0000)]
[ADT] Add a sum type abstraction for pointer-like types.
This is a much more general and powerful form of PointerUnion. It
provides a reasonably complete sum type (from type theory) for
pointer-like types. It has several significant advantages over the
existing PointerUnion infrastructure:
1) It allows more than two pointer types to participate without awkward
nesting structures.
2) It directly exposes the tag so that it is convenient to write
switches over the possible members.
3) It can re-use the same type for multiple tag values, something that
has been worked around by either abusing PointerIntPair or defining
nonce types and doing unsafe pointer casting.
4) It supports customization of the PointerLikeTypeTraits used for
specific member types. This means it could (in theory) be used even
with types that are over-aligned on allocation to expose larger
numbers of bits to the tag.
All in all, I think it is at least complimentary to the existing
infrastructure, and a strict improvement for some use cases.
David Majnemer [Sun, 10 Jan 2016 07:13:04 +0000 (07:13 +0000)]
[JumpThreading] Don't forget to report that the IR changed
JumpThreading's runOnFunction is supposed to return true if it made any
changes. JumpThreading has a call to removeUnreachableBlocks which may
result in changes to the IR but runOnFunction didn't appropriate account
for this possibility, leading to badness.
While we are here, make sure to call LazyValueInfo::eraseBlock in
removeUnreachableBlocks; JumpThreading preserves LVI.
Joseph Tremoulet [Sun, 10 Jan 2016 04:32:03 +0000 (04:32 +0000)]
[WinEH] Fix catchpad pred verification
Summary:
The code was simply ensuring that the catchpad's pred is its catchswitch,
which was letting cases slip through where the flow edge was the unwind
edge of the catchswitch rather than one of its catch clauses.
Joseph Tremoulet [Sun, 10 Jan 2016 04:31:05 +0000 (04:31 +0000)]
[WinEH] Disallow cyclic unwinds
Summary:
Funclet-based EH personalities/tables likely can't handle these, and they
can't be generated at source, so make them officially illegal in IR as
well.
Joseph Tremoulet [Sun, 10 Jan 2016 04:30:02 +0000 (04:30 +0000)]
[WinEH] Verify consistent funclet unwind exits
Summary:
A funclet EH pad may be exited by an unwind edge, which may be a
cleanupret exiting its cleanuppad, an invoke exiting a funclet, or an
unwind out of a nested funclet transitively exiting its parent. Funclet
EH personalities require all such exceptional exits from a given funclet to
have the same unwind destination, and EH preparation / state numbering /
table generation implicitly depends on this. Formalize it as a rule of
the IR in the LangRef and verifier.
Joseph Tremoulet [Sun, 10 Jan 2016 04:28:38 +0000 (04:28 +0000)]
[WinEH] Verify unwind edges against EH pad tree
Summary:
Funclet EH personalities require a tree-like nesting among funclets
(enforced by the ParentPad linkage in the IR), and also require that
unwind edges conform to certain rules with respect to the tree:
- An unwind edge may exit 0 or more ancestor pads
- An unwind edge must enter exactly one EH pad, which must be distinct
from any exited pads
- A cleanupret's edge must exit its cleanuppad
Describe these rules in the LangRef, and enforce them in the verifier.
Lang Hames [Sat, 9 Jan 2016 19:50:40 +0000 (19:50 +0000)]
[Orc][RuntimeDyld] Prevent duplicate calls to finalizeMemory on shared memory
managers.
Prior to this patch, recursive finalization (where finalization of one
RuntimeDyld instance triggers finalization of another instance on which the
first depends) could trigger memory access failures: When the inner (dependent)
RuntimeDyld instance and its memory manager are finalized, memory allocated
(but not yet relocated) by the outer instance is locked, and relocation in the
outer instance fails with a memory access error.
This patch adds a latch to the RuntimeDyld::MemoryManager base class that is
checked by a new method: RuntimeDyld::finalizeWithMemoryManagerLocking, ensuring
that shared memory managers are only finalized by the outermost RuntimeDyld
instance.
This allows ORC clients to supply the same memory manager to multiple calls to
addModuleSet. In particular it enables the use of user-supplied memory managers
with the CompileOnDemandLayer which must reuse the supplied memory manager for
each function that is lazily compiled.
Manuel Jacob [Sat, 9 Jan 2016 04:02:16 +0000 (04:02 +0000)]
[RS4GC] Update and simplify handling of Constants in findBaseDefiningValueOfVector().
Summary:
This is analogous to r256079, which removed an overly strong assertion, and
r256812, which simplified the code by replacing three conditionals by one.
Philip Reames [Sat, 9 Jan 2016 01:31:13 +0000 (01:31 +0000)]
[rs4gc] Optionally directly relocated vector of pointers
This patch teaches rewrite-statepoints-for-gc to relocate vector-of-pointers directly rather than trying to split them. This builds on the recent lowering/IR changes to allow vector typed gc.relocates.
The motivation for this is that we recently found a bug in the vector splitting code where depending on visit order, a vector might not be relocated at some safepoint. Specifically, the bug is that the splitting code wasn't updating the side tables (live vector) of other safepoints. As a result, a vector which was live at two safepoints might not be updated at one of them. However, if you happened to visit safepoints in post order over the dominator tree, everything worked correctly. Weirdly, it turns out that post order is actually an incredibly common order to visit instructions in in practice. Frustratingly, I have not managed to write a test case which actually hits this. I can only reproduce it in large IR files produced by actual applications.
Rather than continue to make this code more complicated, we can remove all of the complexity by just representing the relocation of the entire vector natively in the IR.
At the moment, the new functionality is hidden behind a flag. To use this code, you need to pass "-rs4gc-split-vector-values=0". Once I have a chance to stress test with this option and get feedback from other users, my plan is to flip the default and remove the original splitting code. I would just remove it now, but given the rareness of the bug, I figured it was better to leave it in place until the new approach has been stress tested.
Dan Liew [Fri, 8 Jan 2016 22:36:22 +0000 (22:36 +0000)]
Teach the CMake build system to run lit's test suite. These can be run
directy with ``make check-lit`` and are run as part of
``make check-all``.
In principle we should run lit's testsuite before testing LLVM using lit
so that any problems with lit get discovered before testing LLVM so we
can bail out early. However this implementation (``check-all`` runs all
tests together) seemed simpler and will still report failing lit tests.
Note that the tests and the configured ``lit.site.cfg`` have to be
copied into the build directory to avoid polluting the source tree.
And expand the select into a branch structure. This later enables
jump-threading over bb in this pass.
Using the similar approach of SimplifyCFG::FoldCondBranchOnPHI(), unfold
select if the associated PHI has at least one constant. If the unfolded
select is not jump-threaded, it will be folded again in the later
optimizations.
Justin Bogner [Fri, 8 Jan 2016 19:08:53 +0000 (19:08 +0000)]
LoopInfo: Simplify ownership of Loop objects
It's strange that LoopInfo mostly owns the Loop objects, but that it
defers deleting them to the loop pass manager. Instead, change the
oddly named "updateUnloop" to "markAsRemoved" and have it queue the
Loop object for deletion. We can't delete the Loop immediately when we
remove it, since we need its pointer identity still, so we'll mark the
object as "invalid" so that clients can see what's going on.
Weiming Zhao [Fri, 8 Jan 2016 18:37:43 +0000 (18:37 +0000)]
Disable shrink-wrap for Thumb1
Summary: In ARMConstantIslandPass, which runs after Shrink Wrap pass, long jumps will be fixed up as BL (tBfar) which depends on spilling LR in epilogue. However, shrink-wrap may remove the LR, which causes issues when the function returns.
Do not ASSERTZEXT for i16 result of bitcast from f16 operand
Summary:
During legalization if i16, do not ASSERTZEXT the result of FP_TO_FP16.
Directly return an FP_TO_FP16 node with return type as the
promote-to-type of i16.
This patch also removes extraneous length check. This legalization
should be valid even if integer and float types are of different
lengths.
This patch breaks a hard-float test for fp16 args. The test is changed
to allow a vmov to zero-out the top bits, and also ensure that the
return value is in an FP register.
David Majnemer [Fri, 8 Jan 2016 17:24:47 +0000 (17:24 +0000)]
[WinEH] CatchHandler which don't have catch objects in StackColoring
StackColoring rewrites the frame indicies of operations involving
allocas if it can find that the life time of two objects do not overlap.
MSVC EH needs to be kept aware of this if happens in the event that a
catch object has moved around. However, we represent the non-existance
of a catch object with a sentinel frame index (INT_MAX). This sentinel
also happens to be the EmptyKey of the SlotRemap DenseMap. Testing for
whether or not we need to translate the frame index fails in this case
because we call the count method on the DenseMap with the EmptyKey,
leading to assertions. Instead, check if it is our sentinel value
before trying to look into the DenseMap.
Teresa Johnson [Fri, 8 Jan 2016 17:06:29 +0000 (17:06 +0000)]
[ThinLTO] Use new in-place symbol changes for exporting module
Due to the new in-place ThinLTO symbol handling support added in
r257174, we now invoke renameModuleForThinLTO on the current
module from within the FunctionImport pass.
Additionally, renameModuleForThinLTO no longer needs to return the
Module as it is performing the renaming in place on the one provided.
This commit will be immediately preceeded by a companion clang patch to
remove its invocation of renameModuleForThinLTO.
Teresa Johnson [Fri, 8 Jan 2016 15:00:00 +0000 (15:00 +0000)]
[ThinLTO] Enable in-place symbol changes for exporting module
Summary:
Move ThinLTO global value processing functions out of ModuleLinker and
into a new ThinLTOGlobalProcessor class, which performs any necessary
linkage and naming changes on the given module in place.
As a result, renameModuleForThinLTO no longer needs to create a new
Module when performing any necessary local to global promotion on a
module that we are possibly exporting from during a ThinLTO backend
compilation.
During function importing the ThinLTO processing is still invoked from
the ModuleLinker (via the new class), as it needs to perform renaming and
linkage changes on the source module, e.g. in order to get the correct
renaming during local to global promotion.
Teresa Johnson [Fri, 8 Jan 2016 14:17:41 +0000 (14:17 +0000)]
[ThinLTO] Delay metadata materializtion in function importer
The function importer was still materializing metadata when modules were
loaded for function importing. We only want to materialize it when we
are going to invoke the metadata linking postpass. Materializing it
before function importing is not only unnecessary, but also causes
metadata referenced by imported functions to be mapped in early, and
then not connected to the rest of the module level metadata when it is
ultimately linked in.
Augmented the test case to specifically check for the metadata being
properly connected, which it wasn't before this fix.
Prevent renaming of CR fields in AADB when a CR restore is present
This patch corresponds to review:
http://reviews.llvm.org/D15930
Moves to and from CR fields depend on shifts/masks that depend on the
target/source CR field. Thus, post-ra anti-dep breaking must not later
change that CR register assignment.
Silviu Baranga [Fri, 8 Jan 2016 11:11:04 +0000 (11:11 +0000)]
Re-commit r257064, this time with a fixed assert
In setInsertionPoint if the value is not a PHI, Instruction or
Argument it should be a Constant, not a ConstantExpr.
Original commit message:
[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.
InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden to the optimizer and this transformation will be disabled.
This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.
[attrs] Split the late-revisit pattern for deducing norecurse in
a top-down manner into a true top-down or RPO pass over the call graph.
There are specific patterns of function attributes, notably the
norecurse attribute, which are most effectively propagated top-down
because all they us caller information.
Walk in RPO over the call graph SCCs takes the form of a module pass run
immediately after the CGSCC pass managers postorder walk of the SCCs,
trying again to deduce norerucrse for each singular SCC in the call
graph.
This removes a very legacy pass manager specific trick of using a lazy
revisit list traversed during finalization of the CGSCC pass. There is
no analogous finalization step in the new pass manager, and a lazy
revisit list is just trying to produce an RPO iteration of the call
graph. We can do that more directly if more expensively. It seems
unlikely that this will be the expensive part of any compilation though
as we never examine the function bodies here. Even in an LTO run over
a very large module, this should be a reasonable fast set of operations
over a reasonably small working set -- the function call graph itself.
In the future, if this really is a compile time performance issue, we
can look at building support for both post order and RPO traversals
directly into a pass manager that builds and maintains the PO list of
SCCs.
[LCG] Re-order the lazy node iterator below the node type to make some
subsequent work I'm doing not have its delta obscured by boring code
motion. NFC.
David Majnemer [Fri, 8 Jan 2016 08:03:55 +0000 (08:03 +0000)]
[WinEH] Update WinEHFuncInfo if StackColoring merges allocas
Windows EH keeping track of which frame index corresponds to a catchpad
in order to inform the runtime where the catch parameter should be
initialized. LLVM's optimizations are able to prove that the memory
used by the catch parameter can be reused with another memory
optimization, changing it's frame index.
We need to keep WinEHFuncInfo up to date with respect to this or we will
miscompile/assert.