test-release.sh: tweak RPATH for the binary packages.
libtool sets RPATH to "$ORIGIN/../lib:/the/directory/where/it/was/built/lib" so that a developper can use the built or the installed version seamlessly. Our binary packages should not have this developer friendly tweak, as the users of the binaries will not have the build tree.
Beside, in case the development tree is a possibly on an automounted share, this can create very bad user experience : they will incur an automount timeout penalty and will get a very bad feeling of llvm/clang's speed.
Alexey Samsonov [Mon, 18 Nov 2013 09:31:53 +0000 (09:31 +0000)]
Revert r194865 and r194874.
This change is incorrect. If you delete virtual destructor of both a base class
and a subclass, then the following code:
Base *foo = new Child();
delete foo;
will not cause the destructor for members of Child class. As a result, I observe
plently of memory leaks. Notable examples I investigated are:
ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl.
Manman Ren [Mon, 18 Nov 2013 03:19:31 +0000 (03:19 +0000)]
Debug Info Verifier: disable it by default.
Debug info verifier is part of the verifier which is a Function Pass.
Tot currently tries to pull all reachable debug info MDNodes in each function,
which is too time-consuming. The correct fix seems to be separating debug info
verification to its own module pass.
I will disable the debug info verifier until a correct fix is found.
For Bill's testing case, enabling debug info verifier increase compile
time from 11s to 11m.
Manman Ren [Sun, 17 Nov 2013 18:48:57 +0000 (18:48 +0000)]
Debug Info Verifier: fix when to find debug info nodes and when to verify them.
We used to collect debug info MDNodes in doInitialization and verify them in
doFinalization. That is incorrect since MDNodes can be modified by passes run
between doInitialization and doFinalization.
To fix the problem, we handle debug info MDNodes that can be reached from a
function in runOnFunction (i.e we collect those nodes by calling processDeclare,
processValue and processLocation, and then verify them in runOnFunction).
We handle debug info MDNodes that can be reached from named metadata in
doFinalization. This is in line with how Verifier handles module-level data
(they are verified in doFinalization).
Manman Ren [Sun, 17 Nov 2013 18:42:37 +0000 (18:42 +0000)]
Debug Info Verifier: enable public functions of Finder to update the type map.
We used to depend on running processModule before the other public functions
such as processDeclare, processValue and processLocation. We are now relaxing
the constraint by adding a module argument to the three functions and
letting the three functions to initialize the type map. This will be used in
a follow-on patch that collects nodes reachable from a Function.
Hal Finkel [Sun, 17 Nov 2013 16:02:50 +0000 (16:02 +0000)]
Add a loop rerolling flag to the PassManagerBuilder
This adds a boolean member variable to the PassManagerBuilder to control loop
rerolling (just like we have for unrolling and the various vectorization
options). This is necessary for control by the frontend. Loop rerolling remains
disabled by default at all optimization levels.
python: Properly initialize before trying to create disasm
As the "LLVMInitializeAll*" functions are not available as symbols in
the shared library they can't be used, and as a workaround a list of
the targets is kept and the individual symbols tried. As soon as the
"All"-functions are changed to proper symbols (as opposed to static
inlines in the headers) this hack will be replace with simple calls
to the corresponding "LLVMInitializeAll*" functions.
[block-freq] Add BlockFrequency::scale that returns a remainder from the division and make the private scale in BlockFrequency more performant.
This change is the first in a series of changes improving LLVM's Block
Frequency propogation implementation to not lose probability mass in
branchy code when propogating block frequency information from a basic
block to its successors. This patch is a simple infrastructure
improvement that does not actually modify the block frequency
algorithm. The specific changes are:
1. Changes the division algorithm used when scaling block frequencies by
branch probabilities to a short division algorithm. This gives us the
remainder for free as well as provides a nice speed boost. When I
benched the old routine and the new routine on a Sandy Bridge iMac with
disabled turbo mode performing 8192 iterations on an array of length
32768, I saw ~600% increase in speed in mean/median performance.
2. Exposes a scale method that returns a remainder. This is important so
we can ensure that when we scale a block frequency by some branch
probability BP = N/D, the remainder from the division by D can be
retrieved and propagated to other children to ensure no probability mass
is lost (more to come on this).
Chandler Carruth [Sun, 17 Nov 2013 03:18:05 +0000 (03:18 +0000)]
[PM] Completely remove support for explicit 'require' methods on the
AnalysisManager. All this method did was assert something and we have
a perfectly good way to trigger that assert from the query path.
Hal Finkel [Sun, 17 Nov 2013 02:06:35 +0000 (02:06 +0000)]
Add the cold attribute to error-reporting call sites
Generally speaking, control flow paths with error reporting calls are cold.
So far, error reporting calls are calls to perror and calls to fprintf,
fwrite, etc. with stderr as the stream. This can be extended in the future.
The primary motivation is to improve block placement (the cold attribute
affects the static branch prediction heuristics).
Andrew Trick [Sun, 17 Nov 2013 01:36:23 +0000 (01:36 +0000)]
Added a size field to the stack map record to handle subregister spills.
Implementing this on bigendian platforms could get strange. I added a
target hook, getStackSlotRange, per Jakob's recommendation to make
this as explicit as possible.
for (int i = 0; i < 3200; ++i) {
a[i] += alpha * b[i];
}
and loops like this:
for (int i = 0; i < 500; ++i) {
x[3*i] = foo(0);
x[3*i+1] = foo(0);
x[3*i+2] = foo(0);
}
and turn them into this:
for (int i = 0; i < 1500; ++i) {
x[i] = foo(0);
}
There are two motivations for this transformation:
1. Code-size reduction (especially relevant, obviously, when compiling for
code size).
2. Providing greater choice to the loop vectorizer (and generic unroller) to
choose the unrolling factor (and a better ability to vectorize). The loop
vectorizer can take vector lengths and register pressure into account when
choosing an unrolling factor, for example, and a pre-unrolled loop limits that
choice. This is especially problematic if the manual unrolling was optimized
for a machine different from the current target.
The current implementation is limited to single basic-block loops only. The
rerolling recognition should work regardless of how the loop iterations are
intermixed within the loop body (subject to dependency and side-effect
constraints), but the significant restriction is that the order of the
instructions in each iteration must be identical. This seems sufficient to
capture all current use cases.
This pass is not currently enabled by default at any optimization level.
Hal Finkel [Sat, 16 Nov 2013 21:29:08 +0000 (21:29 +0000)]
Apply the InstCombine fptrunc sqrt optimization to llvm.sqrt
InstCombine, in visitFPTrunc, applies the following optimization to sqrt calls:
(fptrunc (sqrt (fpext x))) -> (sqrtf x)
but does not apply the same optimization to llvm.sqrt. This is a problem
because, to enable vectorization, Clang generates llvm.sqrt instead of sqrt in
fast-math mode, and because this optimization is being applied to sqrt and not
applied to llvm.sqrt, sometimes the fast-math code is slower.
This change makes InstCombine apply this optimization to llvm.sqrt as well.
This fixes the specific problem in PR17758, although the same underlying issue
(optimizations applied to libcalls are not applied to intrinsics) exists for
other optimizations in SimplifyLibCalls.
Owen Anderson [Sat, 16 Nov 2013 00:20:01 +0000 (00:20 +0000)]
Small improvement to InstrinsicEmitter::EmitAttributes. This change removes the “pushing” and “clearing” of the SmallVector and instead uses const arrays to pass the attributeKinds to AttributeSet::get .
LoopVectorizer: Use abi alignment for accesses with no alignment
When we vectorize a scalar access with no alignment specified, we have to set
the target's abi alignment of the scalar access on the vectorized access.
Using the same alignment of zero would be wrong because most targets will have a
bigger abi alignment for vector types.
Manman Ren [Fri, 15 Nov 2013 20:41:15 +0000 (20:41 +0000)]
ArgumentPromotion: correctly transfer TBAA tags and alignments.
We used to use std::map<IndicesVector, LoadInst*> for OriginalLoads, and when we
try to promote two arguments, they will both write to OriginalLoads causing
created loads for the two arguments to have the same original load. And the same
tbaa tag and alignment will be put to the created loads for the two arguments.
The fix is to use std::map<std::pair<Argument*, IndicesVector>, LoadInst*>
for OriginalLoads, so each Argument will write to different parts of the map.
Rui Ueyama [Fri, 15 Nov 2013 20:23:25 +0000 (20:23 +0000)]
Readobj: If NumbersOfSections is 0xffff, it's an COFF import library.
0xffff does not mean that there are 65535 sections in a COFF file but
indicates that it's a COFF import library. This patch fixes SEGV error
when an import library file is passed to llvm-readobj.
Bob Wilson [Fri, 15 Nov 2013 19:09:27 +0000 (19:09 +0000)]
Avoid illegal integer promotion in fastisel
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow. Results are different when:
sext(a) + sext(b) != sext(a + b)
Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.
Daniel Sanders [Fri, 15 Nov 2013 17:24:41 +0000 (17:24 +0000)]
[mips][msa] Merge basic_operations_little.ll into basic_operations.ll.
Now that FileCheck supports multiple check prefixes, we don't need to keep the
little and big endian versions of this test separate anymore. Merge them back
together.
Daniel Sanders [Fri, 15 Nov 2013 12:56:49 +0000 (12:56 +0000)]
Fix illegal DAG produced by SelectionDAG::getConstant() for v2i64 type
Summary:
When getConstant() is called for an expanded vector type, it is split into
multiple scalar constants which are then combined using appropriate build_vector
and bitcast operations.
In addition to the usual big/little endian differences, the case where the
element-order of the vector does not have the same endianness as the elements
themselves is also accounted for. For example, for v4i32 on big-endian MIPS,
the byte-order of the vector is <3210,7654,BA98,FEDC>. For little-endian, it is
<0123,4567,89AB,CDEF>.
Handling this case turns out to be a nop since getConstant() returns a splatted
vector (so reversing the element order doesn't change the value)
This fixes a number of cases in MIPS MSA where calling getConstant() during
operation legalization introduces illegal types (e.g. to legalize v2i64 UNDEF
into a v2i64 BUILD_VECTOR of illegal i64 zeros). It should also handle bigger
differences between illegal and legal types such as legalizing v2i64 into v8i16.
lowerMSASplatImm() in the MIPS backend no longer needs to avoid calling
getConstant() so this function has been updated in the same patch.
For the sake of transparency, the steps I've taken since the review are:
* Added 'virtual' to isVectorEltOrderLittleEndian() as requested. This revealed
that the MIPS tests were falsely passing because a polymorphic function was
not actually polymorphic in the reviewed patch.
* Fixed the tests that were now failing. This involved deleting the code to
handle the MIPS MSA element-order (which was previously doing an byte-order
swap instead of an element-order swap). This left
isVectorEltOrderLittleEndian() unused and it was deleted.
* Fixed build failures caused by rebasing beyond r194467-r194472. These build
failures involved the bset, bneg, and bclr instructions added in these commits
using lowerMSASplatImm() in a way that was no longer valid after this patch.
Some of these were fixed by calling SelectionDAG::getConstant() instead,
others were fixed by a new function getBuildVectorSplat() that provided the
removed functionality of lowerMSASplatImm() in a more sensible way.
Daniel Sanders [Fri, 15 Nov 2013 11:04:16 +0000 (11:04 +0000)]
[mips][msa] Build all the tests in little and big endian modes and correct an incorrect test.
Summary:
This patch (correctly) breaks some MSA tests by exposing the cases when
SelectionDAG::getConstant() produces illegal types. These have been temporarily
marked XFAIL and the XFAIL flag will be removed when
SelectionDAG::getConstant() is fixed.
There are three categories of failure:
* Immediate instructions are not selected in one endian mode.
* Immediates used in ldi.[bhwd] must be different according to endianness.
(this only affects cases where the 'wrong' ldi is used to load the correct
bitpattern. E.g. (bitcast:v2i64 (build_vector:v4i32 ...)))
* Non-immediate instructions that rely on immediates affected by the
previous two categories as part of their match pattern.
For example, the bset match pattern is the vector equivalent of
'ws | (1 << wt)'.
One test needed correcting to expect different output depending on whether big
or little endian was in use. This test was
test/CodeGen/Mips/msa/basic_operations.ll and experiences the second category
of failure shown above. The little endian version of this test is named
basic_operations_little.ll and will be merged back into basic_operations.ll in
a follow up commit now that FileCheck supports multiple check prefixes.
Chandler Carruth [Fri, 15 Nov 2013 10:20:45 +0000 (10:20 +0000)]
Move all of the GoogleTest files back to the same locations they occupy
externally to simplify our integration of GoogleTest into LLVM. Also,
build the single source file gtest-all.cc instead of the individual
source files as we don't expect these to change and thus gain nothing
from increased incrementality in compiles.
This makes our standard build of googletest exactly like upstream's
recommended build and the sanitizer's build. It also simplifies the
steps of importing a new version should we ever want one.