Pawel Wodnicki [Mon, 19 Nov 2012 22:17:54 +0000 (22:17 +0000)]
Merging r167855 into 3.2 relase branch
Do not consider a machine instruction that uses and defines the same
physical register as candidate for common subexpression elimination
in MachineCSE.
This fixes a bug on PowerPC in MultiSource/Applications/oggenc/oggenc
caused by MachineCSE invalidly merging two separate DYNALLOC insns.
Hans Wennborg [Fri, 16 Nov 2012 20:21:42 +0000 (20:21 +0000)]
Merge r168147 from trunk:
Constant::IsThreadDependent(): Use dyn_cast<Constant> instead of cast
It turns out that the operands of a Constant are not always themselves
Constant. For example, one of the operands of BlockAddress is
BasicBlock, which is not a Constant.
This should fix the dragonegg-x86_64-linux-gcc-4.6-test build which
broke in r168037.
Hans Wennborg [Fri, 16 Nov 2012 20:14:40 +0000 (20:14 +0000)]
Merge r168037 from trunk:
Make GlobalOpt be conservative with TLS variables (PR14309)
For global variables that get the same value stored into them
everywhere, GlobalOpt will replace them with a constant. The problem is
that a thread-local GlobalVariable looks like one value (the address of
the TLS var), but is different between threads.
This patch introduces Constant::isThreadDependent() which returns true
for thread-local variables and constants which depend on them (e.g. a GEP
into a thread-local array), and teaches GlobalOpt not to track such
values.
These changes fix a serious interaction problem with the cost model on x86 that
could cause the vectorizer to enter an infinite loop (and sometimes crash in
other ways).
Each SM and PTX version is modeled as a subtarget feature/CPU. Additionally,
PTX 3.1 is added as the default PTX version to be out-of-the-box compatible
with CUDA 5.0.
Available CPUs for this target:
sm_10 - Select the sm_10 processor.
sm_11 - Select the sm_11 processor.
sm_12 - Select the sm_12 processor.
sm_13 - Select the sm_13 processor.
sm_20 - Select the sm_20 processor.
sm_21 - Select the sm_21 processor.
sm_30 - Select the sm_30 processor.
sm_35 - Select the sm_35 processor.
Available features for this target:
ptx30 - Use PTX version 3.0.
ptx31 - Use PTX version 3.1.
sm_10 - Target SM 1.0.
sm_11 - Target SM 1.1.
sm_12 - Target SM 1.2.
sm_13 - Target SM 1.3.
sm_20 - Target SM 2.0.
sm_21 - Target SM 2.1.
sm_30 - Target SM 3.0.
sm_35 - Target SM 3.5.
Meador Inge [Sun, 11 Nov 2012 07:10:25 +0000 (07:10 +0000)]
Remove hard-coded constant in Transforms/InstCombine/memcmp-1.ll
Transforms/InstCombine/memcmp-1.ll has a test case that looks like:
@foo = constant [4 x i8] c"foo\00"
@hel = constant [4 x i8] c"hel\00"
...
%mem1 = getelementptr [4 x i8]* @hel, i32 0, i32 0
%mem2 = getelementptr [4 x i8]* @foo, i32 0, i32 0
%ret = call i32 @memcmp(i8* %mem1, i8* %mem2, i32 3)
ret i32 %ret
; CHECK: ret i32 2
The folded return value (2 above) is computed using the system memcmp
that the compiler is linked with. This can return different values on
different systems. The test was originally written on an OS X 10.7.5
x86-64 box and passed. However, it failed on one of the x86-64 FreeBSD
buildbots because the system memcpy on that machine returned a different
value (1 instead of 2).
I fixed the test by checking the folding constants with regexes.
Meador Inge [Sun, 11 Nov 2012 03:51:43 +0000 (03:51 +0000)]
Add method for replacing instructions to LibCallSimplifier
In some cases the library call simplifier may need to replace instructions
other than the library call being simplified. In those cases it may be
necessary for clients of the simplifier to override how the replacements
are actually done. As such, a new overrideable method for replacing
instructions was added to LibCallSimplifier.
A new subclass of LibCallSimplifier is also defined which overrides
the instruction replacement method. This is because the instruction
combiner defines its own replacement method which updates the worklist
when instructions are replaced.
Meador Inge [Sat, 10 Nov 2012 03:11:10 +0000 (03:11 +0000)]
instcombine: Query target library information to gate libcall simplifications
Several of the simplifiers migrated from the simplify-libcalls pass to
the instcombine pass were not correctly checking the target library
information to gate the simplifications. This patch ensures that the
check is made.
Meador Inge [Sat, 10 Nov 2012 03:11:06 +0000 (03:11 +0000)]
Add more functions to the target library information.
In the process of migrating optimizations from the simplify-libcalls pass
to the instcombine pass I noticed that a few functions are missing from
the target library information. These functions need to be available for
querying in the instcombine library call simplifiers. More functions will
probably be added in the future as more simplifiers are migrated to
instcombine.
Nadav Rotem [Fri, 9 Nov 2012 07:09:44 +0000 (07:09 +0000)]
Add support for memory runtime check. When we can, we calculate array bounds.
If the arrays are found to be disjoint then we run the vectorized version of
the loop. If they are not, we run the scalar code.
Michael Liao [Thu, 8 Nov 2012 07:28:54 +0000 (07:28 +0000)]
Add support of RTM from TSX extension
- Add RTM code generation support throught 3 X86 intrinsics:
xbegin()/xend() to start/end a transaction region, and xabort() to abort a
tranaction region
Add a relocation visitor to lib object. This works via caching relocated
values in a map that can be passed to consumers. Add a testcase that
ensures this works for llvm-dwarfdump.
Andrew Trick [Wed, 7 Nov 2012 07:05:09 +0000 (07:05 +0000)]
misched: Heuristics based on the machine model.
misched is disabled by default. With -enable-misched, these heuristics
balance the schedule to simultaneously avoid saturating processor
resources, expose ILP, and minimize register pressure. I've been
analyzing the performance of these heuristics on everything in the
llvm test suite in addition to a few other benchmarks. I would like
each heuristic check to be verified by a unit test, but I'm still
trying to figure out the best way to do that. The heuristics are still
in considerable flux, but as they are refined we should be rigorous
about unit testing the improvements.
Bill Wendling [Wed, 7 Nov 2012 04:42:18 +0000 (04:42 +0000)]
When we're updating the subprogram scope DIE, we want to determine if we're
updating an abstract DIE or not. If we are, then we use that. Its children will
be added on later, as well as the object pointer attribute. Otherwise, this
function may be called with a concrete DIE twice and adding the children and
object pointer attribute to it twice.
<rdar://problem/12401423&12600340>
Chad Rosier [Wed, 7 Nov 2012 00:13:01 +0000 (00:13 +0000)]
[arm fast-isel] Appease the machine verifier by using the proper register
classes. For my test case the number of errors drop from 356 to 21.
Part of rdar://12594152
Chad Rosier [Tue, 6 Nov 2012 23:05:24 +0000 (23:05 +0000)]
Mark the Int_eh_sjlj_dispatchsetup pseudo instruction as clobbering all
registers. Previously, the register we being marked as implicitly defined, but
not killed. In some cases this would cause the register scavenger to spill a
dead register.
Also, use an empty register mask to simplify the logic and to reduce the memory
footprint.
rdar://12592448
Chad Rosier [Tue, 6 Nov 2012 22:52:42 +0000 (22:52 +0000)]
[regallocfast] Make sure the MachineRegisterInfo is aware of clobbers from a
register masks. This is an obvious and necessary fix for a soon to be committed
patch. No test case possible at this time. Reviewed by Jakob.
[c-index-test] When building with BUILD_CLANG_ONLY=YES, include c-index-test.
It is part of libclang and has other uses besides running the clang tests.
Andrew Kaylor [Tue, 6 Nov 2012 18:51:59 +0000 (18:51 +0000)]
Add interface for object-based JIT events.
This patch adds the interface to expose events from MCJIT when an object is emitted or freed and implements the MCJIT functionality to send those events. The IntelJITEventListener implementation is left empty for now. It will be fleshed out in a future patch.
Andrew Trick [Tue, 6 Nov 2012 07:10:38 +0000 (07:10 +0000)]
misched: TargetSchedule interface for machine resources.
Expose the processor resources defined by the machine model to the
scheduler and other clients through the TargetSchedule interface.
Normalize each resource count with respect to other kinds of
resources. This allows scheduling heuristics to balance resources
against other kinds of resources and latency.