Lang Hames [Mon, 9 Feb 2015 07:22:56 +0000 (07:22 +0000)]
[Orc] Tweak lambda capture lists to try to avoid an ICE on gcc-4.7.2. NFC.
Apparently gcc-4.7.2 is touchy about 'this' appearing in a lambda capture list
along with other captures. I've rewritten my captures to try to avoid the issue.
Akira Hatanaka [Mon, 9 Feb 2015 06:38:23 +0000 (06:38 +0000)]
Fix a bug in DemoteRegToStack where a reload instruction was inserted into the
wrong basic block.
This would happen when the result of an invoke was used by a phi instruction
in the invoke's normal destination block. An instruction to reload the invoke's
value would get inserted before the critical edge was split and a new basic
block (which is the correct insertion point for the reload) was created. This
commit fixes the bug by splitting the critical edge before all the reload
instructions are inserted.
Also, hoist up the code which computes the insertion point to the only place
that need that computation.
Tim Northover [Mon, 9 Feb 2015 01:21:00 +0000 (01:21 +0000)]
DeadArgElim: fix mismatch in accounting of array return types.
Some parts of DeadArgElim were only considering the individual fields
of StructTypes separately, but others (where insertvalue &
extractvalue instructions occur) also looked into ArrayTypes.
This one is an actual bug; the mismatch can lead to an argument being
considered used by a return sub-value that isn't being tracked (and
hence is dead by default). It then gets incorrectly eliminated.
Tim Northover [Mon, 9 Feb 2015 01:20:53 +0000 (01:20 +0000)]
DeadArgElim: assess uses of entire return value aggregate.
Previously, a non-extractvalue use of an aggregate return value meant
the entire return was considered live (the algorithm gave up
entirely). This was correct, but conservative. It's better to actually
look at that Use, making the analysis results apply to all sub-values
under consideration.
Lang Hames [Mon, 9 Feb 2015 01:20:51 +0000 (01:20 +0000)]
[Orc] Add a JITSymbol class to the Orc APIs, refactor APIs, update clients.
This patch refactors a key piece of the Orc APIs: It removes the
*::getSymbolAddress and *::lookupSymbolAddressIn methods, which returned target
addresses (uint64_ts), and replaces them with *::findSymbol and *::findSymbolIn
respectively, which return instances of the new JITSymbol type. Unlike the old
methods, calling findSymbol or findSymbolIn does not cause the symbol to be
immediately materialized when found. Instead, the symbol will be materialized
if/when the getAddress method is called on the returned JITSymbol. This allows
us to query for the existence of symbols without actually materializing them. In
the future I expect more information to be attached to the JITSymbol class, for
example whether the returned symbol is a weak or strong definition. This will
allow us to properly handle weak symbols and multiple definitions.
Sanjoy Das [Sun, 8 Feb 2015 22:52:17 +0000 (22:52 +0000)]
Bugfix: ScalarEvolution incorrectly assumes that the start of certain
add recurrences don't overflow.
This change makes the optimization more restrictive. It still assumes
that an overflowing `add nsw` is undefined behavior; and this change
will need revisiting once we have a consistent semantics for poison
values.
Craig Topper [Sun, 8 Feb 2015 22:38:25 +0000 (22:38 +0000)]
[X86] Remove the remaining uses of memop from AVX and AVX2 instruction patterns. AVX and AVX2 can handle unaligned loads being folded so we can just use 'load'
Zachary Turner [Sun, 8 Feb 2015 20:58:09 +0000 (20:58 +0000)]
DebugInfoPDB: Make the symbol base case hold an IPDBSession ref.
Dumping a symbol often requires access to data that isn't inside
the symbol hierarchy, but which is only accessible through the
top-level session. This patch is a pure interface change to give
symbols a reference to the session.
Correctly combine alias.scope metadata by a union instead of intersecting
Summary:
The alias.scope metadata represents sets of things an instruction might
alias with. When generically combining the metadata from two
instructions the result must be the union of the original sets, because
the new instruction might alias with anything any of the original
instructions aliased with.
Gather and Scatter are new introduced intrinsics, comming after recently implemented masked load and store.
This is the first patch for Gather and Scatter intrinsics. It includes only the syntax, parsing and verification.
Gather and Scatter intrinsics allow to perform multiple memory accesses (read/write) in one vector instruction.
The intrinsics are not target specific and will have the following syntax:
Gather:
declare <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> <vector of ptrs>, i32 <alignment>, <16 x i1> <mask>, <16 x i32> <passthru>)
declare <8 x float> @llvm.masked.gather.v8f32(<8 x float*><vector of ptrs>, i32 <alignment>, <8 x i1> <mask>, <8 x float><passthru>)
Scatter:
declare void @llvm.masked.scatter.v8i32(<8 x i32><vector value to be stored> , <8 x i32*><vector of ptrs> , i32 <alignment>, <8 x i1> <mask>)
declare void @llvm.masked.scatter.v16i32(<16 x i32> <vector value to be stored> , <16 x i32*> <vector of ptrs>, i32 <alignment>, <16 x i1><mask> )
Vector of ptrs - a set of source/destination addresses, to load/store the value.
Mask - switches on/off vector lanes to prevent memory access for switched-off lanes
vector of ptrs, value and mask should have the same vector width.
These are code examples where gather / scatter should be used and will allow function vectorization
;void foo1(int * restrict A, int * restrict B, int * restrict C) {
; for (int i=0; i<SIZE; i++) {
; A[i] = B[C[i]];
; }
;}
;void foo3(int * restrict A, int * restrict B) {
; for (int i=0; i<SIZE; i++) {
; A[B[i]] = i+5;
; }
;}
Tests will come in the following patches, with CodeGen and Vectorizer.
Tim Northover [Sun, 8 Feb 2015 00:50:47 +0000 (00:50 +0000)]
ARM & AArch64: teach LowerVSETCC that output type size may differ from input.
While various DAG combines try to guarantee that a vector SETCC
operation will have the same output size as input, there's nothing
intrinsic to either creation or LegalizeTypes that actually guarantees
it, so the function needs to be ready to handle a mismatch.
Fortunately this is easy enough, just extend or truncate the naturally
compared result.
I couldn't reproduce the failure in other backends that I know have
SIMD, so it's probably only an issue for these two due to shared
heritage.
Zachary Turner [Sun, 8 Feb 2015 00:29:29 +0000 (00:29 +0000)]
Some cleanup for libpdb.
This patch implements a few of the optional suggestions from the
initial patch comitting libpdb. In particular, it implements a
virtual function out of line for each of the concrete classes.
A few other minor cleanups exist as well, such as using override
instead of virtual, etc.
Benjamin Kramer [Sat, 7 Feb 2015 21:37:08 +0000 (21:37 +0000)]
LoopIdiom: Use utility functions.
The only difference between deleteIfDeadInstruction and
RecursivelyDeleteTriviallyDeadInstructions is that the former also
manually invalidates SCEV. That's unnecessary because SCEV automatically
gets informed when an instruction is deleted via a ValueHandle. NFC.
Avoid integer overflows around realloc calls resulting in potential
heap. Problem identified by Guido Vranken. Changes differ from original
OpenBSD sources by not depending on non-portable reallocarray.
Ahmed Bougacha [Sat, 7 Feb 2015 17:04:29 +0000 (17:04 +0000)]
[BasicAA] Try to disambiguate GEPs through arrays of structs into
different fields.
We can show that two GEPs off of the same (possibly multidimensional)
array of structs, into different fields, can't alias. Quoting:
For two GEPOperators GEP1 and GEP2, if we find that:
- both GEPs begin indexing from the exact same pointer;
- the last indices in both GEPs are constants, indexing into a struct;
- said indices are different, hence,the pointed-to fields are different;
- and both GEPs only index through arrays prior to that;
this lets us determine that the struct that GEP1 indexes into and the
struct that GEP2 indexes into must either precisely overlap or be
completely disjoint. Because they cannot partially overlap, indexing
into different non-overlapping fields of the struct will never alias.
The other BasicAA::aliasGEP rules worked in some cases, but not all
(for example, the i32x3 struct in the testcase).
We can add this simple ad-hoc rule to complement them.
David Majnemer [Sat, 7 Feb 2015 08:26:40 +0000 (08:26 +0000)]
MC: Emit COFF section flags in the "proper" order
COFF section flags are not idempotent:
'rd' will make a read-write section because 'd' implies write
'dr' will make a read-only section because 'r' disables write
Hal Finkel [Sat, 7 Feb 2015 07:32:58 +0000 (07:32 +0000)]
[PowerPC] Handle loop predecessor invokes
If a loop predecessor has an invoke as its terminator, and the return value
from that invoke is used to determine the loop iteration space, then we can't
insert a computation based on that value in the loop predecessor prior to the
terminator (oops). If there's such an invoke, or just no predecessor for that
matter, insert a new loop preheader.
Zachary Turner [Sat, 7 Feb 2015 01:47:14 +0000 (01:47 +0000)]
Resubmit unittests for DebugInfoPDB.
These were originally submitted as part of r228428, but this part
caused a build breakage in LLVMConfig. The library portion was
resubmitted independently since it was not causing breakage.
There were two reasons this was causing the build to fail. The
first is that there were no Makefiles added for the PDB tests. And
the second is that the DebugInfoPDB library was only being built by
CMake behind an "if (MSVC)" check. This is wrong since this the
library hides platform specific details, and it was causing
LLVM-Config to not find the library when trying to build unittests.
Lang Hames [Fri, 6 Feb 2015 23:26:33 +0000 (23:26 +0000)]
[Orc] Add a Kaleidoscope/Orc tutorial demonstrating lazy-irgen.
This tutorial builds on the lazy_codegen kaleidoscope/orc tutorial by making
a small set of changes (~75 lines diff) to defer ir-generation for function
definitions until functions are actually referenced.
Ahmed Bougacha [Fri, 6 Feb 2015 23:15:39 +0000 (23:15 +0000)]
[AArch64] Use the source location of the IR branch when creating Bcc
from a conditional branch fed by an add/sub/mul-with-overflow node.
We previously used the SDLoc of the overflow node, for no good reason.
In some cases, this led to the Bcc and B terminators having different
source orders, and DBG_VALUEs being inserted between them.
The real issue is with the code that can't handle DBG_VALUEs between
terminators: the few places affected by this will be fixed soon.
In the meantime, fixing the SDLoc is a positive change no matter what.
No tests, as I have no idea how to get .loc emitted for branches?
Hal Finkel [Fri, 6 Feb 2015 23:07:40 +0000 (23:07 +0000)]
Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups
Unfortunately, even with the workaround of disabling the linker TLS
optimizations in Clang restored (which has already been done), this still
breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native).
Bill is currently working on an alternate implementation to address the TLS
issue in a way that also fully elides the linker bug (which, unfortunately,
this approach did not fully), so I'm reverting this now.
Lang Hames [Fri, 6 Feb 2015 23:04:53 +0000 (23:04 +0000)]
[Orc] Add a Kaleidoscope/Orc tutorial demonstrating lazy-codegen.
This tutorial builds on the initial kaleidoscope/orc tutorial by adding a
LazyEmittingLayer to the custom stack. This extra layer defers compilation
of modules in the JIT until they are statically referenced.
Lang Hames [Fri, 6 Feb 2015 22:52:04 +0000 (22:52 +0000)]
[Orc] Add a Kaleidoscope tutorial for Orc demonstrating eager compilation.
This tutorial demonstrates a very basic custom Orc JIT stack that performs eager
compilation: All modules are CodeGen'd immediately upon being added to the JIT.
Remove unnecessary restriction of 24-bits for line numbers in
`MDLocation`.
The rest of the debug info schema (with the exception of local
variables) uses 32-bits for line numbers. As I introduce the
specialized nodes, it makes sense to canonicalize on one size or the
other.
An atomic store always make the target location fully initialized (in the
current implementation). It should not store origin. Initialized memory can't
have meaningful origin, and, due to origin granularity (4 bytes) there is a
chance that this extra store would overwrite meaningfull origin for an adjacent
location.
Zachary Turner [Fri, 6 Feb 2015 20:30:52 +0000 (20:30 +0000)]
Resubmit "Create lib/DebugInfo/PDB" (r228428)
This change resubmits the patch that broke the build, this time
without unittests. The unittests will be submitted separately
after the problem has been addressed:
--Original Commit Message--
Create lib/DebugInfo/PDB.
This patch creates a platform-independent interface to a PDB reader.
There is currently no implementation of this interface, which will
be provided in future patches. This defines the basic object model
which any implementation must conform to.
Reviewed by: David Blaikie
Differential Revision: http://reviews.llvm.org/D7356
Use estimated number of optimized insns in unroll-threshold computation.
If complete-unroll could help us to optimize away N% of instructions, we
might want to do this even if the final size would exceed loop-unroll
threshold. However, we don't want to unroll huge loop, and we are add
AbsoluteThreshold to avoid that - this threshold will never be crossed,
even if we expect to optimize 99% instructions after that.
It is a variation of SimplifyBinOp, but it takes into account
FastMathFlags.
It is needed in inliner and loop-unroller to accurately predict the
transformation's outcome (previously we dropped the flags and were too
conservative in some cases).
Example:
float foo(float *a, float b) {
float r;
if (a[1] * b)
r = /* a lot of expensive computations */;
else
r = 1;
return r;
}
float boo(float *a) {
return foo(a, 0.0);
}
Without this patch, we don't inline 'foo' into 'boo'.
Zachary Turner [Fri, 6 Feb 2015 19:44:09 +0000 (19:44 +0000)]
Create lib/DebugInfo/PDB.
This patch creates a platform-independent interface to a PDB reader.
There is currently no implementation of this interface, which will
be provided in future patches. This defines the basic object model
which any implementation must conform to.
Reviewed by: David Blaikie
Differential Revision: http://reviews.llvm.org/D7356