[objc-arc] Extract out state specific to a ref count from the main objc arc sequence dataflow. This will allow me to separate the actual ARC queries from the meat of the dataflow algorithm.
Philip Reames [Thu, 5 Mar 2015 22:28:06 +0000 (22:28 +0000)]
[RewriteStatepointsForGC] Yet more test cases for relocation
At this point, we should have decent coverage of the involved code. I've got a few more test cases to cleanup and submit, but what's here is already reasonable.
I've got a collection of liveness tests which will be posted for review along with a decent liveness algorithm in the next few days. Once those are in, the code in this file should be well tested and I can start renaming things without risk of serious breakage.
This patch reduces code size for all AVX targets and increases speed for some chips.
SSE 4.1 introduced the useless (see code comments) 2-register form of BLENDV and
only in the packed float/double flavors.
AVX subsequently made the instruction useful by adding a 4-register operand form.
So we just need to paper over the lack of scalar forms of this instruction, complicate
the code to choose float or double forms, and use blendv on scalars since all FP is in
xmm registers anyway.
This gives us an approximately 50% speed up for a blendv microbenchmark sequence
on SandyBridge and Haswell:
blendv : 29.73 cycles/iter
logic : 43.15 cycles/iter
No new test cases with this patch because:
1. fast-isel-select-sse.ll tests the positive side for regular X86 lowering and fast-isel
2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX
3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants.
Ahmed Bougacha [Thu, 5 Mar 2015 19:37:53 +0000 (19:37 +0000)]
[ARM] Enable vector extload combine for legal types.
This commit enables forming vector extloads for ARM.
It only does so for legal types, and when we can't fold the extension
in a wide/long form of the user instruction.
Enabling it for larger types isn't as good an idea on ARM as it is on
X86, because:
- we pretend that extloads are legal, but end up generating vld+vmov
- we have instructions like vld {dN, dM}, which can't be generated
when we "manually expand" extloads to vld+vmov.
For legal types, the combine doesn't fire that often: in the
integration tests only in a big endian testcase, where it removes a
pointless AND.
Related to rdar://19723053
Differential Revision: http://reviews.llvm.org/D7423
Reid Kleckner [Thu, 5 Mar 2015 18:26:58 +0000 (18:26 +0000)]
Silence -Wmissing-braces warning from clang-cl
The first element of STACKFRAME64 is a struct and Clang wants us to put
braces around it's initialization. Instead, drop the zero. The result
should be the same.
Zachary Turner [Thu, 5 Mar 2015 17:47:52 +0000 (17:47 +0000)]
[Windows] Implement PrintStackTrace(FILE*)
llvm::sys::PrintBacktrace(FILE*) is supposed to print a backtrace
of the current thread given the current PC. This function was
unimplemented on Windows, and instead the only time we could
print a backtrace was as the result of an exception through
LLVMUnhandledExceptionFilter.
This patch implements backtracing of self by using
RtlCaptureContext to get a CONTEXT for the current thread, and
moving the printing and StackWalk64 code to a common method that
printing own stack trace and printing stack trace of an exception
can use.
Simon Pilgrim [Thu, 5 Mar 2015 17:14:04 +0000 (17:14 +0000)]
[DagCombiner] Allow shuffles to merge through bitcasts
Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening).
This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.
Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow.
Kit Barton [Thu, 5 Mar 2015 16:24:38 +0000 (16:24 +0000)]
While reviewing the changes to Clang to add builtin support for the vsld, vsrd, and vsrad instructions, it was pointed out that the builtins are generating the LLVM opcodes (shl, lshr, and ashr) not calls to the intrinsics. This patch changes the implementation of the vsld, vsrd, and vsrad instructions from from intrinsics to VXForm_1 instructions and makes them legal with P8 Altivec. It also removes the definition of the int_ppc_altivec_vsld, int_ppc_altivec_vsrd, and int_ppc_altivec_vsrad intrinsics.
Igor Laevsky [Thu, 5 Mar 2015 14:11:21 +0000 (14:11 +0000)]
Teach lowering to correctly handle invoke statepoint and gc results tied to them. Note that we still can not lower gc.relocates for invoke statepoints.
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.
[PBQP] Use a local bit-matrix to speedup searching an edge in the graph.
Build time (user time) for building llvm+clang+lldb in release mode:
- default allocator: 9086 seconds
- with PBQP: 9126 seconds
- with PBQP + local bit matrix cache: 9097 seconds
Craig Topper [Thu, 5 Mar 2015 07:17:52 +0000 (07:17 +0000)]
Revert "[TableGen] Implement at least some support for multiple explicit results in an instruction pattern. No functional change to existing patterns."
Craig Topper [Thu, 5 Mar 2015 07:11:36 +0000 (07:11 +0000)]
[TableGen] Implement at least some support for multiple explicit results in an instruction pattern. No functional change to existing patterns.
This should help with the AVX512 masked gather changes Elena is working on. This patch is derived from some of the changes Elena made to tablegen, but modified by me to support arbitrary number of results.
Craig Topper [Thu, 5 Mar 2015 07:11:34 +0000 (07:11 +0000)]
[TableGen] Add support constraining a vector type in a pattern to have a specific element type and for constraining a vector type to have the same number of elements as another vector type. This is useful for AVX512 mask operations so we relate the mask type to the type of the other arguments.
Frederic Riss [Thu, 5 Mar 2015 05:29:05 +0000 (05:29 +0000)]
Revert "[dsymutil] MSVC does generate move constructors, but it should accept to default them"
This reverts commit r231350.
It turns out MSVC doesn't generate implicit move constructors and also doesn't accept to default them...
See for example http://lab.llvm.org:8011/builders/lldb-x86-windows-msvc/builds/2786
[MBP] Use range based for-loops throughout this code. Several had
already been added and the inconsistency made choosing names and
changing code more annoying. Plus, wow are they better for this code!
[MBP] NFC, run clang-format over this code and tweak things to make the
result reasonable.
This code predated clang-format and so there was a reasonable amount of
crufty formatting that had accumulated. This should ensure that neither
myself nor others end up with formatting-only changes sneaking into
other fixes.
[MBP] Revert r231238 which attempted to fix a nasty bug where MBP is
just arbitrarily interleaving unrelated control flows once they get
moved "out-of-line" (both outside of natural CFG ordering and with
diamonds that cannot be fully laid out by chaining fallthrough edges).
This easy solution doesn't work in practice, and it isn't just a small
bug. It looks like a very different strategy will be required. I'm
working on that now, and it'll again go behind some flag so that
everyone can experiment and make sure it is working well for them.
Remove the conditional addition of the execution dependency fixing
pass from the ARM backend as the pass itself will detect any use
of the appropriate register class.
Cleanup and remove a chunk of getARMSubtarget calls in the
ARM TargetMachine pass pipeline construction by pushing them down
into the appropriate pass.
Matthias Braun [Wed, 4 Mar 2015 22:31:18 +0000 (22:31 +0000)]
Improve test robustness
Improve test robustness in preparation of coming commits:
- Avoid undefs which may get propagated too much.
- Remove several pointless add 0, instructions
Sanjoy Das [Wed, 4 Mar 2015 22:24:23 +0000 (22:24 +0000)]
[IndVarSimplify] use the "canonical" way to infer no-wrap.
Summary:
rL225282 introduced an ad-hoc way to promote some additions to nuw or
nsw. Since then SCEV has become smarter in directly proving no-wrap;
and using the canonical "ext(A op B) == ext(A) op ext(B)" method of
proving no-wrap is just as powerful now. Rip out the existing
complexity in favor of getting SCEV to do all the heaving lifting
internally.
This change does not add any unit tests because it is supposed to be a
non-functional change. Tests added in rL225282 and rL226075 are valid
tests for this change.
Sanjoy Das [Wed, 4 Mar 2015 22:24:17 +0000 (22:24 +0000)]
[SCEV] make SCEV smarter about proving no-wrap.
Summary:
Teach SCEV to prove no overflow for an add recurrence by proving
something about the range of another add recurrence a loop-invariant
distance away from it.
Frederic Riss [Wed, 4 Mar 2015 22:07:44 +0000 (22:07 +0000)]
[dsymutil] Add minimal code to emit DIE trees.
This commit adds code to emit DIE trees that have been pruned from the
parts that haven't been marked as kept in the previous pass.
It works by 'cloning' the input DIE tree (as read by libDebugInfoDwarf)
into a tree of DIE objects. Cloning the DIEs means essentially cloning
their attributes. The code in this commit does only handle scalar and
block attributes (scalar because they are trivial, blocks because they
can't be easily replaced by a scalr placeholder), all the other ones
are replaced by placeholder zero values and will be handled in
further commits.
The added tests mostly check that the DIE tree has the correct layout and
also verify that a few chosen scalar and block attributes correctly make
their way into the output.
Frederic Riss [Wed, 4 Mar 2015 22:07:36 +0000 (22:07 +0000)]
Teach DIEInteger to emit FORM_strp and FORM_ref_addr attributes.
To be used/tested by llvm-dsymutil. (llvm-dsymutil does a 'static' link,
no need for relocations for most things, so it'll just emit raw integers
for most attributes)
David Blaikie [Wed, 4 Mar 2015 22:02:58 +0000 (22:02 +0000)]
Update LangRef for getelementptr explicit type changes
Here's a rough/first draft - it at least hits the actual textual IR
examples and some of the phrasing. It's probably worth a full pass over,
but I'm not sure how much these docs should reflect the strange
intermediate state we're in anyway.
Totally open to lots of review/feedback/suggestions.
Erik Eckstein [Wed, 4 Mar 2015 18:57:11 +0000 (18:57 +0000)]
Add a lock() function in PassRegistry to speed up multi-thread synchronization.
When calling lock() after all passes are registered, the PassRegistry doesn't need a mutex anymore to look up passes.
This speeds up multithreaded llvm execution by ~5% (tested with 4 threads).
In an asserts build of llvm this has an even bigger impact.
Note that it's not required to use the lock function.
Mehdi Amini [Wed, 4 Mar 2015 18:43:29 +0000 (18:43 +0000)]
Make DataLayout Non-Optional in the Module
Summary:
DataLayout keeps the string used for its creation.
As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().
Get rid of DataLayoutPass: the DataLayout is in the Module
The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.
Make DataLayout Non-Optional in the Module
Module->getDataLayout() will never returns nullptr anymore.
Reid Kleckner [Wed, 4 Mar 2015 18:31:10 +0000 (18:31 +0000)]
Revert "unique_ptrify ValID::ConstantStructElts"
This reverts r231200 and r231204. The second one added an explicit move
ctor for MSVC.
This change broke the clang-cl self-host due to weirdness in MSVC's
implementation of std::map::insert. Somehow we lost our rvalue ref-ness
when going through variadic placement new:
Adrian Prantl [Wed, 4 Mar 2015 17:39:33 +0000 (17:39 +0000)]
Fix DwarfExpression::AddMachineRegExpression so it doesn't read past the
end of an expression that ends with DW_OP_plus.
Caught by the ASAN build bots.
JF Bastien [Wed, 4 Mar 2015 15:47:57 +0000 (15:47 +0000)]
Mutate TargetLowering::shouldExpandAtomicRMWInIR to specifically dictate how AtomicRMWInsts are expanded.
Summary:
In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all.
This does not represent a behavioural change and as such no tests were added.
[X86][FastISel] Simplify the logic in method X86SelectSIToFP.
The target-independent selection algorithm in FastISel already knows how
to select a SINT_TO_FP if the target is SSE but not AVX.
On targets that have SSE but not AVX, the tablegen'd 'fastEmit' functions
for ISD::SINT_TO_FP know how to select instruction X86::CVTSI2SSrr
(for an i32 to f32 conversion) and X86::CVTSI2SDrr (for an i32 to f64
conversion).
This patch simplifies the logic in method X86SelectSIToFP knowing that
the code would not be reachable if the subtarget doesn't have AVX.
No functional change intended.
Dmitry Vyukov [Wed, 4 Mar 2015 13:27:53 +0000 (13:27 +0000)]
asan: do not instrument direct inbounds accesses to stack variables
Do not instrument direct accesses to stack variables that can be
proven to be inbounds, e.g. accesses to fields of structs on stack.
But it eliminates 33% of instrumentation on webrtc/modules_unittests
(number of memory accesses goes down from 290152 to 193998) and
reduces binary size by 15% (from 74M to 64M) and improved compilation time by 6-12%.
The optimization is guarded by asan-opt-stack flag that is off by default.
Toma Tabacu [Wed, 4 Mar 2015 13:01:14 +0000 (13:01 +0000)]
[mips] Rename the LA/LI/DLI TableGen definitions and classes. NFC.
Summary:
Use more reasonable names for these pseudo-instructions.
As there's only one definition tied to any one of these classes, I named them with abbreviated versions of their respective class' name.