In the case where just tables are part of the function section, this produces
more readable assembly by avoiding switching to the eh section and back
to .text.
This would also break with non unique section names, as trying to switch to
a unique section actually creates a new one.
Owen Anderson [Mon, 9 Mar 2015 07:13:42 +0000 (07:13 +0000)]
Fix a bug in the LLParser where we failed to diagnose landingpads with non-constant clause operands.
Fixing this also exposed a related issue where the landingpad under construction was not
cleaned up when an error was raised, which would cause bad reference errors before the
error could actually be printed.
Kevin Qin [Mon, 9 Mar 2015 06:14:28 +0000 (06:14 +0000)]
[AArch64] Enable partial & runtime unrolling on cortex-a57
For inner one of nested loops, it is more likely to be a hot loop,
and the runtime check can be promoted out from patch 0001, so the
overhead is less, we can try a doubled threshold to unroll more loops.
Kevin Qin [Mon, 9 Mar 2015 06:14:18 +0000 (06:14 +0000)]
Introduce runtime unrolling disable matadata and use it to mark the scalar loop from vectorization.
Runtime unrolling is an expensive optimization which can bring benefit
only if the loop is hot and iteration number is relatively large enough.
For some loops, we know they are not worth to be runtime unrolled.
The scalar loop from vectorization is one of the cases.
Kevin Qin [Mon, 9 Mar 2015 06:14:07 +0000 (06:14 +0000)]
Run LICM pass after loop unrolling pass.
Runtime unrollng will introduce a runtime check in loop prologue.
If the unrolled loop is a inner loop, then the proglogue will be inside
the outer loop. LICM pass can help to promote the runtime check out if
the checked value is loop invariant.
Mehdi Amini [Mon, 9 Mar 2015 03:20:25 +0000 (03:20 +0000)]
InstCombine: fix fold "fcmp x, undef" to account for NaN
Summary:
See the two test cases.
; Can fold fcmp with undef on one side by choosing NaN for the undef
; Can fold fcmp with undef on both side
; fcmp u_pred undef, undef -> true
; fcmp o_pred undef, undef -> false
; because whatever you choose for the first undef
; you can choose NaN for the other undef
David Majnemer [Sat, 7 Mar 2015 21:47:46 +0000 (21:47 +0000)]
Fix the autoconf build
lib/ExecutionEngine/Targets has no Makefile, causing the autoconf build
to fail. Solve this by bringing the COFF implementation of RuntimeDyld
in line like the Mach-O and ELF implementations.
Benjamin Kramer [Sat, 7 Mar 2015 17:41:00 +0000 (17:41 +0000)]
Make constant arrays that are passed to functions as const.
In theory this allows the compiler to skip materializing the array on
the stack. In practice clang often fails to do that, but that's a
different story. NFC.
This patch fixes the logic in the DAGCombiner that folds an AND node according
to rule: (and (X (load V)), C) -> (X (load V))
An AND between a vector load 'X' and a constant build_vector 'C' can be folded
into the load itself only if we can prove that the AND operation is redundant.
The algorithm implemented by 'visitAND' firstly computes the splat value 'S'
from C, and then checks if S has the lower 'B' bits set (where B is the size in
bits of the vector element type). The algorithm takes into account also the
'undef' bits in the splat mask.
Unfortunately, the algorithm only worked under the assumption that the size of S
is a multiple of the vector element type. With this patch, we conservatively
avoid folding the AND if the splat bits are not compatible with the vector
element type.
Teach the LLVM CMake build how to explicitly use libc++abi when using
libc++. This lets me almost self-host on Linux with libc++ and libc++abi
very simply.
Currently, MCJIT and OrcJIT are failing due to uncaught exceptions, and
the Go binding tests are failing to build due to not linking in the
correct C++ standard library.
[PM] Create a separate library for high-level pass management code.
This will provide the analogous replacements for the PassManagerBuilder
and other code long term. This code is extracted from the opt tool
currently, and I plan to extend it as I build up support for using the
new pass manager in Clang and other places.
Mailing this out for review in part to let folks comment on the terrible names
here. A brief word about why I chose the names I did.
The library is called "Passes" to try and make it clear that it is a high-level
utility and where *all* of the passes come together and are registered in
a common library. I didn't want it to be *limited* to a registry though, the
registry is just one component.
The class is a "PassBuilder" but this name I'm less happy with. It doesn't
build passes in any traditional sense and isn't a Builder-style API at all. The
class is a PassRegisterer or PassAdder, but neither of those really make a lot
of sense. This class is responsible for constructing passes for registry in an
analysis manager or for population of a pass pipeline. If anyone has a better
name, I would love to hear it. The other candidate I looked at was
PassRegistrar, but that doesn't really fit either. There is no register of all
the passes in use, and so I think continuing the "registry" analog outside of
the registry of pass *names* and *types* is a mistake. The objects themselves
are just objects with the new pass manager.
Frederic Riss [Sat, 7 Mar 2015 01:25:09 +0000 (01:25 +0000)]
[dsymutil] Apply relocations to DIE data before cloning.
Doing this gets function's low_pc and global variable's locations right
in the output debug info. It also could get right other attributes
that need to be relocated (in linker terms), but I don't know of any
other than the address attributes.
This doesn't fixup low_pc attributes in compile_unit, lexical_block
or inlined subroutine, nor does it get right high_pc attributes
for function. This will come in a subsequent commit.
Recommit r231324 with a fix to the ARM execution domain code
to disable lane switching if we don't actually have the instruction
set we want to switch to. Models the earlier check above the
conditional for the pass.
The testcase is one that triggered with the assert that's added
as part of the fix, use it to avoid adding a new testcase as it
highlights the same problem.
Frederic Riss [Fri, 6 Mar 2015 23:22:53 +0000 (23:22 +0000)]
[dsymutil] Support cloning DIE reference attributes.
Reference attributes are mainly handled by just creating DIEEntry
attributes for them. There is a special case for DW_FORM_ref_addr
attributes though, because the DIEEntry code needs a DwarfDebug
code to emit them (and we don't have one as we do no CodeGen).
In that case, just use DIEInteger attributes with the right form.
Frederic Riss [Fri, 6 Mar 2015 23:22:50 +0000 (23:22 +0000)]
[dsymutil] Set linked unit start offset early. NFC.
The start offset of a linked unit is known before starting to clone
its DIEs. Handling DW_FORM_ref_addr attributes requires that this
offset is set while cloning the unit. Split CompileUnit::computeOffsets()
into setStartOffset() and computeNextUnitOffset() and call them
repsectively before cloning the DIEs and right after.
[AArch64][LoadStoreOptimizer] Generate LDP + SXTW instead of LD[U]R + LD[U]RSW.
Teach the load store optimizer how to sign extend a result of a load pair when
it helps creating more pairs.
The rational is that loads are more expensive than sign extensions, so if we
gather some in one instruction this is better!
Matthias Braun [Fri, 6 Mar 2015 19:49:10 +0000 (19:49 +0000)]
DAGCombiner: Canonicalize select(and/or,x,y) depending on target.
This is based on the following equivalences:
select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y)
select(C0 | C1, X, Y) <=> select(C0, X, select(C1, X, Y))
Many target cannot perform and/or on the CPU flags and therefore the
right side should be choosen to avoid materializign the i1 flags in an
integer register. If the target can perform this operation efficiently
we normalize to the left form.
Matthias Braun [Fri, 6 Mar 2015 19:49:06 +0000 (19:49 +0000)]
DAGCombiner: Factor out some and/or combines.
This is in preparation for changing visitSELECT to normalize towards
select(Cond0, select(Cond1, X, Y), Y);
select(Cond0, X, select(Cond1, X, Y)) which perfom an implicit and/or of
the conditions.
The factored function contains all DAGCombine rules which reduce two values
combined by an And/Or operation to a single value. This does not include rules
involving constants as visitSELECT already handles that case.
[AsmPrinter][TLOF] XFAIL AArch64 test to appease buildbots
The checking for extgotequiv and localgotequiv rely on the emission
order, which is not guaranteed because we use DenseMap to hold the GOT
equivalents. XFAIL this now until I get time to use MapVector and test
out the solution. In the meantime, appease buildbots.
Chad Rosier [Fri, 6 Mar 2015 16:15:04 +0000 (16:15 +0000)]
Avoid calls to dumpPassInfo and RegionBase<Tr>::getNameStr() in RGPassManager if
-debug-pass is not specified, as the string is only used when dumping pass
information. There is a big cost of determining the name in
ReginBase<Tr>:getNameStr() if the region's entry or exit block doesn't have a
name. This is the case for the Release build, as names are not preserved by the
front-end.
RegionPass is mainly used by Polly, resulting in long compile time for one file
of a customer application with the Release build (1m24s) vs Release+Asserts
build (10s) when Polly is used. With this change, the compile time with the
Release build went down to 8s.
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator: http://reviews.llvm.org/D8076
James Molloy [Fri, 6 Mar 2015 15:50:47 +0000 (15:50 +0000)]
[ConstantRange] Teach multiply to be cleverer about signed ranges.
Multiplication is not dependent on signedness, so just treating
all input ranges as unsigned is not incorrect. However it will cause
overly pessimistic ranges (such as full-set) when used with signed
negative values.
Teach multiply to try to interpret its inputs as both signed and
unsigned, and then to take the most specific (smallest population)
as its result.
[AsmPrinter][TLOF] 32-bit MachO support for replacing GOT equivalents
Add MachO 32-bit (i.e. arm and x86) support for replacing global GOT equivalent
symbol accesses. Unlike 64-bit targets, there's no GOTPCREL relocation, and
access through a non_lazy_symbol_pointers section is used instead.
[AsmPrinter][TLOF] ARM64 MachO support for replacing GOT equivalents
Follow up r230264 and add ARM64 support for replacing global GOT
equivalent symbol accesses by references to the GOT entry for the final
symbol instead, example:
Toma Tabacu [Fri, 6 Mar 2015 12:15:12 +0000 (12:15 +0000)]
[mips] [IAS] Add missing constraints and improve testing for the .module directive.
Summary:
None of the .set directives can be used before the .module directives. The .set mips0/pop/push were not triggering this constraint.
Also added testing for all the other implemented directives which are supposed to trigger this constraint.
Daniel Jasper [Fri, 6 Mar 2015 10:39:14 +0000 (10:39 +0000)]
Change the way in which error case is being handled.
Specifically this:
* Prevents an "unused" warning in non-assert builds.
* In that error case return with out removing a child loop instead of
looping forever.
Karthik Bhat [Fri, 6 Mar 2015 10:11:25 +0000 (10:11 +0000)]
Add a new pass "Loop Interchange"
This pass interchanges loops to provide a more cache-friendly memory access.
For e.g. given a loop like -
for(int i=0;i<N;i++)
for(int j=0;j<N;j++)
A[j][i] = A[j][i]+B[j][i];
is interchanged to -
for(int j=0;j<N;j++)
for(int i=0;i<N;i++)
A[j][i] = A[j][i]+B[j][i];
This pass is currently disabled by default.
To give a brief introduction it consists of 3 stages-
LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.
LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks.
TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.
A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499
David Majnemer [Fri, 6 Mar 2015 08:11:32 +0000 (08:11 +0000)]
X86: Form IMGREL relocations for LLVM Functions
We supported forming IMGREL relocations from ConstantExprs involving
__ImageBase if the minuend was a GlobalVariable. Extend this
functionality to all GlobalObjects.
Rui Ueyama [Fri, 6 Mar 2015 06:07:32 +0000 (06:07 +0000)]
Support: Improve performance of FileOutputBuffer on Windows
We extend an underlying file before mmap'ing it, but it's not needed
on Windows. Extending file is slow on Windows, so we should avoid doing that.
The difference gets larger as the size of an output file gets larger.
It shove off 2 seconds out of 25 seconds when linking chrome.dll with LLD,
for example.
[objc-arc] Move the checking of whether or not we can match onto PtrStates and out of the main dataflow.
These refactored computations check whether or not we are at a stage
of the sequence where we can perform a match. This patch moves the
computation out of the main dataflow and into
{BottomUp,TopDown}PtrState.
[objc-arc] Refactor (Re-)initialization of PtrState from dataflow -> {TopDown,BottomUp}PtrState Class.
This initialization occurs when we see a new retain or release. Before
we performed the actual initialization inline in the dataflow. That is
just messy.
[objc-arc] Create two subclasses of PtrState in preparation for moving per ptr state change behavior onto a PtrState class.
This will enable the main ObjCARCOpts dataflow to work with higher
level concepts such as "can this ptr state be modified by this ref
count" and not need to understand the nitty gritty details of how that
is determined. This makes the dataflow cleaner.