Dan Gohman [Tue, 12 Jan 2016 02:58:12 +0000 (02:58 +0000)]
[WebAssembly] Define a custom segment type for function definitions.
Since function definitions are not loaded into the address space, PT_LOAD is
inappropriate. PT_WEBASSEMBLY_FUNCTIONS is used to identify where the function
definitions are so that they can be processed at program startup time.
Justin Bogner [Tue, 12 Jan 2016 00:55:26 +0000 (00:55 +0000)]
LoopUnroll: Clean up the maze of initialization for unroll parameters. NFC
The layering of where the various loop unroll parameters are
initialized and overridden here was very confusing, making it pretty
difficult to tell just how the various sources interacted. Instead, we
put all of the initialization logic together in a single function so
that it's obvious what overrides what.
[libFuzzer] extend the weak memcmp/strcmp/strncmp interceptors to receive the result of the computations. With that, don't do any mutations if memcmp/etc returned 0
Function::copyAttributesFrom will copy the personality function, prefix
data and prolog data from the source function to the new function, and
is invoked when the IRMover copies the function prototype. This puts a
reference to a constant in the source module on a function in the dest
module, which causes an error when deleting the source module after
importing, since the personality function in the source module still has
uses (this would presumably also be an issue for the prologue and prefix
data). Remove the copies added to the dest copy when creating the new
prototype, as they are mapped properly when/if we link the function body.
Currently WebAssembly has two kinds of relocations: data addresses and
function addresses. This adds ELF relocations for them, as well as an
MC symbol kind to indicate which type of relocation is needed.
Sanjay Patel [Mon, 11 Jan 2016 23:31:48 +0000 (23:31 +0000)]
[LibCallSimplifier] use instruction-level fast-math-flags to transform log calls
Also, add tests to verify that we're checking 'fast' on both calls of each transform pair,
tighten the CHECK lines, and give the tests more meaningful names.
This is a continuation of:
http://reviews.llvm.org/rL255555
http://reviews.llvm.org/rL256871
http://reviews.llvm.org/rL256964
http://reviews.llvm.org/rL257400
http://reviews.llvm.org/rL257404
Sanjay Patel [Mon, 11 Jan 2016 22:34:19 +0000 (22:34 +0000)]
[LibCallSimplifier] use instruction-level fast-math-flags to transform sqrt calls
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555
The intent of the patch is to preserve the current behavior of the transform except
that we use the sqrt instruction's 'fast' attribute as a trigger rather than the
function-level attribute.
But this exposes a bug noted by the new FIXME comment.
In order to do this transform:
sqrt((x * x) * y) ---> fabs(x) * sqrt(y)
...we need all of the sqrt, the first fmul, and the second fmul to be 'fast'.
If any of those ops is strict, we should bail out.
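A minimal sketch of why every operation involved has to be 'fast': the two forms agree on ordinary inputs but can differ once the intermediate product overflows. The function names are hypothetical and this is plain C++, not the LibCallSimplifier code.

```cpp
#include <cmath>

// Original form: sqrt((x * x) * y). The product x * x is computed first
// and can overflow to infinity even when the final result is representable.
double sqrtOriginal(double x, double y) {
  return std::sqrt((x * x) * y);
}

// Transformed form: fabs(x) * sqrt(y). No intermediate x * x is formed,
// so the overflow above does not occur.
double sqrtTransformed(double x, double y) {
  return std::fabs(x) * std::sqrt(y);
}
```

With x = 1e200 and y = 1e-300 the original form yields infinity while the transformed form yields a finite value, so the rewrite is only valid when strict IEEE results are not required.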
Dan Gohman [Mon, 11 Jan 2016 22:05:44 +0000 (22:05 +0000)]
[WebAssembly] Reorganize address offset folding.
Always expect tglobaladdr and texternalsym to be wrapped in
WebAssemblywrapper nodes. Also, split out a regPlusGA from regPlusImm so
that it can special-case global addresses, as they can be folded in more
cases.
Unfortunately this doesn't enable any new optimizations yet due to
SelectionDAG limitations. I'll be submitting changes to the SelectionDAG
infrastructure, along with tests, in a separate patch.
Teresa Johnson [Mon, 11 Jan 2016 21:37:41 +0000 (21:37 +0000)]
Split resolveCycles(bool AllowTemps) into two interfaces and document
Address review feedback from r255909.
Move body of resolveCycles(bool AllowTemps) to
resolveRecursivelyImpl(bool AllowTemps). Revert resolveCycles back
to asserting on temps, and add new resolveNonTemporaries interface
to invoke the new implementation with AllowTemps=true. Document
the differences between these interfaces, specifically the effect
on RAUW support and uniquing. Call appropriate interface from
ValueMapper.
Reid Kleckner [Mon, 11 Jan 2016 20:35:45 +0000 (20:35 +0000)]
Use ::GetVersionEx directly rather than the Win8.1 SDK helpers
This removes ifdefs and fixes the build for users of the Win8.0 SDK,
which I happen to be. Upgrading is not hard, but executing the same code
everywhere seems better.
Dimitry Andric [Mon, 11 Jan 2016 20:12:53 +0000 (20:12 +0000)]
Ensure -mcpu=xscale works for arm targets, after rL252903 and rL252904
After these revisions, for arm targets, the -mcpu=xscale option caused
an error: "the clang compiler does not support '-mcpu=xscale'". Adding
"v5e" as a SUB_ARCH in ARMTargetParser.def helps.
Submitted by: Andrew Turner
Differential Revision: http://reviews.llvm.org/D16043
[sanitizer] [msan] Fix origin store of array types
This patch fixes the memory sanitizer origin store instrumentation for
array types. This can be triggered by cases where the frontend lowers a
function return to an array type instead of an aggregate.
--
int foo (int p)
{
  mypair z = my_make_pair(p, 0);
  return z.y + z.x;
}
--
It will be lowered with target set to aarch64-linux and -O0 to:
--
[...]
define i32 @_Z3fooi(i32 %p) #0 {
[...]
%call = call [2 x i64] @_Z12my_make_pairxi(i64 %conv, i32 0)
%1 = bitcast %struct.mypair* %z to [2 x i64]*
store [2 x i64] %call, [2 x i64]* %1, align 8
[...]
--
The origin store will emit an 'icmp' to test each stored value against the
TLS origin array. However, since 'icmp' does not support ArrayType, the
memory instrumentation phase will bail out with an error.
This patch changes that by using the same strategy for array types as is
used for struct types.
It fixes the 'test/msan/insertvalue_origin.cc' for aarch64 (the -O0 case).
David Blaikie [Mon, 11 Jan 2016 19:26:01 +0000 (19:26 +0000)]
Fix some GCC 4.7 issues with the new Orc remote JIT support
I'm still seeing GCC ICE locally, but figured I'd throw this at the wall
& see if it sticks for the bots at least. Will continue investigating
the ICE in any case.
Lang Hames [Mon, 11 Jan 2016 17:09:58 +0000 (17:09 +0000)]
XFAIL the remote small code model tests on x86. Small code model is not properly
supported, and only worked previously because we weren't really running them
out-of-process.
Matt Arsenault [Mon, 11 Jan 2016 17:02:00 +0000 (17:02 +0000)]
AMDGPU: Pattern match ffbh pattern to instruction.
The hardware instruction's output on 0 is -1 rather than 32.
Eliminate a test and select to -1. This removes an extra instruction
from the compatibility function with HSAIL's firstbit instruction.
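A rough model of the pattern being matched, assuming only what the message above states (the instruction yields -1 for an input of 0). All names here are hypothetical; this is not the AMDGPU backend code.

```cpp
#include <cstdint>

// Count-leading-zeros with ctlz-like semantics: returns 32 for an input of 0.
int32_t ctlz32(uint32_t x) {
  int32_t n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// Source-level pattern: an explicit test and a select to -1.
int32_t firstBitWithSelect(uint32_t x) {
  return x == 0 ? -1 : ctlz32(x);
}

// Model of the hardware instruction as described: the index of the first
// set bit counted from the most significant end, or -1 if no bit is set.
int32_t ffbhModel(uint32_t x) {
  for (int32_t i = 0; i < 32; ++i)
    if (x & (0x80000000u >> i))
      return i;
  return -1;
}
```

Because the model already produces -1 for 0, the compare and select in firstBitWithSelect add nothing and can be folded into the single instruction.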
Lang Hames [Mon, 11 Jan 2016 16:35:55 +0000 (16:35 +0000)]
[LLI] Replace the LLI remote-JIT support with the new ORC remote-JIT components.
The new ORC remote-JITing support provides a superset of the old code's
functionality, so we can replace the old stuff. As a bonus, a couple of
previously XFAILed tests have started passing.
Alexey Bataev [Mon, 11 Jan 2016 11:52:29 +0000 (11:52 +0000)]
[X86] Reduce complexity of the LEA optimization pass, by Andrey Turetsky.
In the OptimizeLEA pass, save each instruction's position in the basic block and use these positions to calculate the distance between two instructions instead of calling std::distance. This reduces the complexity of the pass from O(n^3) to O(n^2) and thus the compile time.
Differential Revision: http://reviews.llvm.org/D15692
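The idea can be sketched as follows, using a std::list of integers as a hypothetical stand-in for a basic block. On a linked list std::distance walks the nodes, so each query is linear; numbering the elements once makes each later query a subtraction. This is an illustration of the technique, not the pass itself.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>

// Hypothetical stand-in for a basic block: a linked list of "instructions".
using Block = std::list<int>;
using PositionMap = std::unordered_map<const int *, std::ptrdiff_t>;

// Walk the block once and record the position of every instruction.
PositionMap numberInstructions(const Block &block) {
  PositionMap pos;
  std::ptrdiff_t index = 0;
  for (const int &inst : block)
    pos[&inst] = index++;
  return pos;
}

// Distance from instruction a to instruction b using the saved positions,
// replacing a linear std::distance walk with two lookups and a subtraction.
std::ptrdiff_t distanceCached(const PositionMap &pos, const int *a,
                              const int *b) {
  return pos.at(b) - pos.at(a);
}
```

The saved positions must be refreshed whenever instructions are inserted or removed, which is the usual cost of this kind of caching.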
Craig Topper [Mon, 11 Jan 2016 05:13:41 +0000 (05:13 +0000)]
[TableGen] Allow asm writer to use up to 3 OpInfo tables instead of 2. This allows x86 to use 56 total bits made up of a 32-bit, 16-bit, and 8-bit table. Previously we were using 64 total bits.
This saves 14K from the x86 table size, and saves space on other targets as well.
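A small sketch of how 56 bits per opcode could be split across a 32-bit, a 16-bit, and an 8-bit table and reassembled on lookup. The table contents and the helper name are made up for illustration; the real tables are generated by TableGen.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-opcode tables; the values are arbitrary.
static const uint32_t OpInfo0[] = {0xDEADBEEFu, 0x00000001u};
static const uint16_t OpInfo1[] = {0xCAFEu, 0x0002u};
static const uint8_t OpInfo2[] = {0x7Fu, 0x03u};

// Reassemble the 56 bits for one opcode from the three tables.
uint64_t opInfoBits(std::size_t Opcode) {
  uint64_t Bits = 0;
  Bits |= static_cast<uint64_t>(OpInfo0[Opcode]);        // bits 0..31
  Bits |= static_cast<uint64_t>(OpInfo1[Opcode]) << 32;  // bits 32..47
  Bits |= static_cast<uint64_t>(OpInfo2[Opcode]) << 48;  // bits 48..55
  return Bits;
}

// Storage per opcode: 4 + 2 + 1 = 7 bytes, versus 8 bytes for 64 total bits.
constexpr std::size_t BytesPerEntry =
    sizeof(uint32_t) + sizeof(uint16_t) + sizeof(uint8_t);
```

Each opcode then costs 7 bytes of table space rather than 8, which is where the size reduction comes from.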
Craig Topper [Mon, 11 Jan 2016 05:13:38 +0000 (05:13 +0000)]
[TableGen] Remove unnecessary 0 terminator from an array that only existed to prevent ending an array with a comma. But that's perfectly legal and not something we need to prevent. NFC
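For reference, a trailing comma in a C++ array initializer is accepted and does not add an element. The table below is a made-up example, not TableGen output.

```cpp
#include <cstddef>

// The trailing comma after the last initializer is legal and adds no element,
// so no dummy 0 terminator is needed to avoid it.
static const int Table[] = {
    1,
    2,
    3,
};

constexpr std::size_t TableSize = sizeof(Table) / sizeof(Table[0]);
static_assert(TableSize == 3, "trailing comma must not add an element");
```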
Lang Hames [Mon, 11 Jan 2016 01:40:11 +0000 (01:40 +0000)]
[Orc] Add support for remote JITing to the ORC API.
This patch adds utilities to ORC for managing a remote JIT target. It consists
of:
1. A very primitive RPC system for making calls over a byte-stream. See
RPCChannel.h, RPCUtils.h.
2. An RPC API defined in the above system for managing memory, looking up
symbols, creating stubs, etc. on a remote target. See OrcRemoteTargetRPCAPI.h.
3. An interface for creating high-level JIT components (memory managers,
callback managers, stub managers, etc.) that operate over the RPC API. See
OrcRemoteTargetClient.h.
4. A helper class for building servers that can handle the RPC calls. See
OrcRemoteTargetServer.h.
The system is designed to work neatly with the existing ORC components and
functionality. In particular, the ORC callback API (and consequently the
CompileOnDemandLayer) is supported, enabling lazy compilation of remote code.
Assuming this doesn't trigger any builder failures, a follow-up patch will be
committed which tests these utilities by using them to replace LLI's existing
remote-JITing demo code.
Craig Topper [Mon, 11 Jan 2016 00:44:56 +0000 (00:44 +0000)]
[AVX-512] Remove unused Round and Itinerary from the maskable_cmp multiclasses. They weren't used and there were extra spaces in the asm string to prepare for the concatenations of the round string that wasn't ever used.
Lang Hames [Sun, 10 Jan 2016 23:59:41 +0000 (23:59 +0000)]
[RuntimeDyld] Add a notifyObjectLoaded method to RuntimeDyld::MemoryManager.
This is a more generic version of the MCJITMemoryManager::notifyObjectLoaded
method: It provides only a RuntimeDyld reference (rather than an
ExecutionEngine), and so can be used with ORC JIT stacks.
Lang Hames [Sun, 10 Jan 2016 18:51:50 +0000 (18:51 +0000)]
[RuntimeDyld] Add alignment arguments to the reserveAllocationSpace method of
RuntimeDyld::MemoryManager.
The RuntimeDyld::MemoryManager::reserveAllocationSpace method is called when
object files are loaded, and gives clients a chance to pre-allocate memory for
all segments. Previously only the size of each segment (code, ro-data, rw-data)
was supplied but not the alignment. This hasn't caused any problems so far, as
most clients allocate via the MemoryBlock interface which returns page-aligned
blocks. Adding alignment arguments enables finer grained allocation while still
satisfying alignment restrictions.
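A minimal sketch of what a client could do once it knows both size and alignment for each segment: pack the three segments into one reserved region without page-aligning each of them. The struct, the function names, and the parameter order are assumptions for illustration and are not the RuntimeDyld interface; alignments are assumed to be powers of two.

```cpp
#include <cstdint>

// Round Addr up to the next multiple of Align (Align must be a power of two).
uint64_t alignTo(uint64_t Addr, uint64_t Align) {
  return (Addr + Align - 1) & ~(Align - 1);
}

// Hypothetical result type: offsets of each segment within one region.
struct SegmentLayout {
  uint64_t CodeOffset;
  uint64_t RODataOffset;
  uint64_t RWDataOffset;
  uint64_t TotalSize;
};

// Place code, read-only data, and read-write data back to back, padding
// only as much as each segment's alignment requires.
SegmentLayout layoutSegments(uint64_t CodeSize, uint64_t CodeAlign,
                             uint64_t RODataSize, uint64_t RODataAlign,
                             uint64_t RWDataSize, uint64_t RWDataAlign) {
  SegmentLayout L;
  L.CodeOffset = alignTo(0, CodeAlign);
  L.RODataOffset = alignTo(L.CodeOffset + CodeSize, RODataAlign);
  L.RWDataOffset = alignTo(L.RODataOffset + RODataSize, RWDataAlign);
  L.TotalSize = L.RWDataOffset + RWDataSize;
  return L;
}
```

A real memory manager would still need to keep segments with different permissions on separate pages; the sketch only shows the alignment arithmetic.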
Keno Fischer [Sun, 10 Jan 2016 18:17:12 +0000 (18:17 +0000)]
[SectionMemoryManager] Don't just drop the RO free list
In r255760, I optimized the SectionMemoryManager to make better use
of virtual memory on platforms where the allocation granularity was
bigger than the protection granularity. As part of this, fixing up
the free list became more complicated and was moved into
`applyMemoryGroupPermissions`. Unfortunately, I forgot to actually
remove the call that drops the free list for RO memory (I did
remove the corresponding one for RX memory), defeating the whole
optimization.
NAKAMURA Takumi [Sun, 10 Jan 2016 15:56:49 +0000 (15:56 +0000)]
OrcJITTests//ObjectLinkingLayerTest.cpp: Appease msc18's C2327. It seems the definition of a nested class confuses the context.
llvm\unittests\ExecutionEngine\Orc\ObjectLinkingLayerTest.cpp(115) : error C2327: 'llvm::OrcExecutionTest::TM' : is not a type name, static, or enumerator
llvm\unittests\ExecutionEngine\Orc\ObjectLinkingLayerTest.cpp(115) : error C2065: 'TM' : undeclared identifier
FYI, "this->TM" was valid even before moving class SectionMemoryManagerWrapper.
Chandler Carruth [Sun, 10 Jan 2016 09:40:13 +0000 (09:40 +0000)]
[ADT] Add an abstraction for embedding an integer within a pointer-like
type.
This makes it easy and safe to use a set of flags as one element of
a tagged union with pointers. There is quite a bit of code that has
historically done this by casting arbitrary integers to "pointers" and
assuming that this was safe and reliable. It is neither, and has started
to rear its head by triggering safety asserts in various abstractions
like PointerLikeTypeTraits when the integers chosen are invariably poor
choices for *some* platform and *some* situation. Not to mention the
(hopefully unlikely) prospect of one of these integers actually getting
allocated!
With this, it will be straightforward to build type safe abstractions
like this without being error prone. The abstraction itself is also
remarkably simple thanks to the implicit conversion.
This use case and pattern was also independently created by the folks
working on Swift, and they're going to incrementally add any missing
functionality they find.
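A simplified sketch of the idea, assuming an unsigned integer type and a hypothetical class name; it is not the ADT class added by this commit. The integer is stored in the high bits of a pointer-sized value so the low bits stay zero, and an implicit conversion gives the integer back.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical sketch: embeds a Bits-wide unsigned integer in a
// pointer-sized value, leaving the low bits zero like an aligned pointer.
template <typename IntT, int Bits> class EmbeddedInt {
  static_assert(Bits < static_cast<int>(sizeof(uintptr_t) * CHAR_BIT),
                "need at least one spare low bit");
  static constexpr int Shift =
      static_cast<int>(sizeof(uintptr_t) * CHAR_BIT) - Bits;

  uintptr_t Value = 0;

public:
  // Number of low bits guaranteed to be zero in the stored value.
  static constexpr int NumLowBitsAvailable = Shift;

  EmbeddedInt() = default;

  // Implicit construction from the integer; bits above Bits are discarded.
  EmbeddedInt(IntT I) : Value(static_cast<uintptr_t>(I) << Shift) {}

  // Implicit conversion back to the integer.
  operator IntT() const { return static_cast<IntT>(Value >> Shift); }

  // Pointer-like view, for storing alongside real pointers in a tagged union.
  void *getAsVoidPointer() const { return reinterpret_cast<void *>(Value); }
};
```

For example, EmbeddedInt<unsigned, 8> holds values 0 to 255 and converts back to unsigned implicitly, while its pointer view keeps its low bits clear for use as a tag.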