Hal Finkel [Tue, 11 Dec 2012 19:59:32 +0000 (19:59 +0000)]
Add -fslp-vectorize to enable bb-vectorize
Add -fslp-vectorize (with -ftree-slp-vectorize as an alias for gcc compatibility)
to provide a way to enable the basic-block vectorization pass. This uses the same
acronym as gcc, superword-level parallelism (SLP), also common in the literature,
to refer to basic-block vectorization.
Nadav suggested this as a follow-up to the adding of -fvectorize.
objective-C blocks: Check for record type when deciding if
byref variable requires extended layout info. to prevent
a crash involving arrays declared __block. // rdar://12787751
Benjamin Kramer [Tue, 11 Dec 2012 18:00:22 +0000 (18:00 +0000)]
Speed up looking up static diagnostic infos.
Instead of doing a binary search over the whole diagnostic table (which weighs
a whopping 48k on x86_64), use the existing enums to compute the index in the
table. This avoids loading any unneeded data from the table and avoids littering
CPU caches with it. This code is in a hot path for code with many diagnostics.
1% speedup on -fsyntax-only gcc.c, which emits a lot of warnings.
Summary:
A few small coding style changes for StmtDumper, including:
- rename Dump* methods to dump*
- uninline some methods
- comment fixes
- whitespace fixes
Summary:
Also rename DumpDeclarator() to dumpDecl(). Once Decl dumping is added, these will be the two main methods of the class, so this is just for consistency in naming.
There was a DumpStmt() method already, but there was no point in having it, so I have merged it into VisitStmt(). Similarly, DumpExpr() is merged into VisitExpr().
Richard Smith [Tue, 11 Dec 2012 01:14:52 +0000 (01:14 +0000)]
PR14558: Compute triviality of special members (etc) at the end of the class
definition, rather than at the end of the definition of the set of nested
classes. We still defer checking of the user-specified exception specification
to the end of the nesting -- we can't check that until we've parsed the
in-class initializers for non-static data members.
Note that there is no test suite update. This was found by a couple of
tests failing when the test suite was run on a powerpc64 host (thanks
Roman!). The tests don't specify a triple, which might seem surprising
for a codegen test. But in fact, these tests don't even inspect their
output. Not at all. I could add a bunch of triples to these tests so
that we'd get the test coverage for normal builds, but really someone
needs to go through and add actual *tests* to these tests. =[ The ones
in question are:
Add a test case that I've been using to clarify the bitfield layout for
both LE and BE targets.
AFAICT, Clang get's this correct for PPC64. I've compared it to GCC 4.8
output for PPC64 (thanks Roman!) and to my limited ability to read power
assembly, it looks functionally equivalent. It would be really good to
fill in the assertions on this test case for x86-32, PPC32, ARM, etc.,
but I've reached the limit of my time and energy... Hopefully other
folks can chip in as it would be good to have this in place to test any
subsequent changes.
To those who care about PPC64 performance, a side note: there is some
*obnoxiously* bad code generated for these test cases. It would be worth
someone's time to sit down and teach the PPC backend to pattern match
these IR constructs better. It appears that things like '(shr %foo,
<imm>)' turn into 'rldicl R, R, 64-<imm>, <imm>' or some such. They
don't even get combined with other 'rldicl' instructions *immediately
adjacent*. I'll add a couple of these patterns to the README, but
I think it would be better to look at all the patterns produced by this
and other bitfield access code, and systematically build up a collection
of patterns that efficiently reduce them to the minimal code.
Fix the bitfield record layout in codegen for big endian targets.
This was an egregious bug due to the several iterations of refactorings
that took place. Size no longer meant what it original did by the time
I finished, but this line of code never got updated. Unfortunately we
had essentially zero tests for this in the regression test suite. =[
I've added a PPC64 run over the bitfield test case I've been primarily
using. I'm still looking at adding more tests and making sure this is
the *correct* bitfield access code on PPC64 linux, but it looks pretty
close to me, and it is *worlds* better than before this patch as it no
longer asserts! =] More commits to follow with at least additional tests
and maybe more fixes.
Richard Smith [Sun, 9 Dec 2012 06:48:56 +0000 (06:48 +0000)]
Fix overload resolution for the initialization of a multi-dimensional
array from a braced-init-list. There seems to be a core wording wart
here (it suggests we should be testing whether the elements of the init
list are implicitly convertible to the array element type, not whether
there is an implicit conversion sequence) but our prior behavior appears
to be a bug, not a deliberate effort to implement the standard as written.
David Chisnall [Sat, 8 Dec 2012 09:06:08 +0000 (09:06 +0000)]
long double should be 64 bits on FreeBSD/MIPS64. It possibly should be on
Linux too, as I think we inherited it from there. The ABI spec says 128-bit,
although I think SGI's compiler on IRIX may be the only thing ever to support
this.
Richard Smith [Sat, 8 Dec 2012 08:32:28 +0000 (08:32 +0000)]
Finish implementing 'selected constructor' rules for triviality in C++11. In
the cases where we can't determine whether special members would be trivial
while building the class, we eagerly declare those special members. The impact
of this is bounded, since it does not trigger implicit declarations of special
members in classes which merely *use* those classes.
In order to determine whether we need to apply this rule, we also need to
eagerly declare move operations and destructors in cases where they might be
deleted. If a move operation were supposed to be deleted, it would instead
be suppressed, and we could need overload resolution to determine if we fall
back to a trivial copy operation. If a destructor were implicitly deleted,
it would cause the move constructor of any derived classes to be suppressed.
As discussed on cxx-abi-dev, C++11's selected constructor rules are also
retroactively applied as a defect resolution in C++03 mode, in order to
identify that class B has a non-trivial copy constructor (since it calls
A's constructor template, not A's copy constructor):
struct A { template<typename T> A(T &); };
struct B { mutable A a; };
Richard Smith [Sat, 8 Dec 2012 02:53:02 +0000 (02:53 +0000)]
Properly compute triviality for explicitly-defaulted or deleted special members.
Remove pre-standard restriction on explicitly-defaulted copy constructors with
'incorrect' parameter types, and instead just make those special members
non-trivial as the standard requires.
This required making CXXRecordDecl correctly handle classes which have both a
trivial and a non-trivial special member of the same kind.
This also fixes PR13217 by reimplementing DiagnoseNontrivial in terms of the
new triviality computation technology.
[libclang] Resolve a cursor that points to a macro name inside a #ifdef/#ifndef
directive as a macro expansion.
This is more of a "macro reference" than a macro expansion but it's close enough
for libclang's purposes. If it causes issues we can revisit and introduce a new
kind of cursor.
Richard Smith [Sat, 8 Dec 2012 02:01:17 +0000 (02:01 +0000)]
Implement C++03 [dcl.init]p5's checking for value-initialization of references
properly, rather than faking it up by pretending that a reference member makes
the default constructor non-trivial. That leads to rejects-valids when putting
such types inside unions.
Fix analysis based warnings so that all warnings are emitted when compiling
with -Werror. Previously, compiling with -Werror would emit only the first
warning in a compilation unit, because clang assumes that once an error occurs,
further analysis is unlikely to return valid results. However, warnings that
have been upgraded to errors should not be treated as "errors" in this sense.
Anna Zaks [Fri, 7 Dec 2012 21:51:47 +0000 (21:51 +0000)]
[analyzer] Optimization heuristic: do not reanalyze every ObjC method as
top level.
This heuristic is already turned on for non-ObjC methods
(inlining-mode=noredundancy). If a method has been previously analyzed,
while being inlined inside of another method, do not reanalyze it as top
level.
This commit applies it to ObjCMethods as well. The main caveat here is
that to catch the retain release errors, we are still going to reanalyze
all the ObjC methods but without inlining turned on.
Gives 21% performance increase on one heavy ObjC benchmark, which
suffered large performance regressions due to ObjC inlining.
Daniel Jasper [Fri, 7 Dec 2012 20:34:49 +0000 (20:34 +0000)]
AST matcher tutorial (Part I)
This an AST matcher tutorial based on Sam Panzer's document
(https://docs.google.com/a/google.com/document/d/1oTkVLhCdRJUEH1_LDaQdXqe8-aOqT5GLDL9e4MhoFF8/edit).
Checking in now although some parts might be a bit rough so others can
help improving it.
Jordan Rose [Fri, 7 Dec 2012 19:56:29 +0000 (19:56 +0000)]
[analyzer] Fix r168019 to work with unpruned paths as well.
This is the case where the analyzer tries to print out source locations
for code within a synthesized function body, which of course does not have
a valid source location. The previous fix attempted to do this during
diagnostic path pruning, but some diagnostics have pruning disabled, and
so any diagnostic with a path that goes through a synthesized body will
either hit an assertion or emit invalid output.
John McCall [Fri, 7 Dec 2012 07:03:17 +0000 (07:03 +0000)]
Fix the required args count for variadic blocks.
We were emitting calls to blocks as if all arguments were
required --- i.e. with signature (A,B,C,D,...) rather than
(A,B,...). This patch fixes that and accounts for the
implicit block-context argument as a required argument.
In addition, this patch changes the function type under which
we call unprototyped functions on platforms like x86-64 that
guarantee compatibility of variadic functions with unprototyped
function types; previously we would always call such functions
under the LLVM type T (...)*, but now we will call them under
the type T (A,B,C,D,...)*. This last change should have no
material effect except for making the type conventions more
explicit; it was a side-effect of the most convenient implementation.
Ted Kremenek [Fri, 7 Dec 2012 01:55:21 +0000 (01:55 +0000)]
Change RegionStore to always use ImmutableMapRef for processing cluster bindings.
This reduces analysis time by 1.2% on one test case (Objective-C), but
also cleans up some of the code conceptually as well. We can possible
just make RegionBindingsRef -> RegionBindings, but I wanted to stage
things.
After this, we should revisit Jordan's optimization of not canonicalizing
the immutable AVL trees for the cluster bindings as well.
Jordan Rose [Fri, 7 Dec 2012 01:54:38 +0000 (01:54 +0000)]
[analyzer] Remove possible pessimizations from r169563.
Thanks for reminding me about copy-elision, David. Passing references here
doesn't help when we could get move construction in C++11. If we really
cared, we'd use std::swap to steal the reference from the temporary arg,
but it's probably not /that/ critical outside of Profile anyway.
Suggested by David Blaikie. ExplodedNode, CallEvent, and CheckerContext all
hang onto their ProgramState, so the accessors can return a reference to the
internal state rather than preemptively copying it. This helps avoid
temporary ProgramStateRefs, though local variables will still (correctly)
do an extra retain and release.
[libclang] Introduce a new indexing mode where we skip function bodies
that were already parsed in the same "indexing session".
An indexing session is defined as using the same CXIndexAction object
for multiple clang_indexSourceFile calls.
Passing CXIndexOpt_SkipParsedBodiesInSession as an indexing option will
enable the mode where we try to skip bodies that were already parsed in
another translation unit.
If a function's body was skipped, the "flags" field in the CXIdxDeclInfo
structure will have "CXIdxDeclFlag_Skipped" bit was set.
Jordan Rose [Thu, 6 Dec 2012 18:58:18 +0000 (18:58 +0000)]
[analyzer] Simplify RetainCountChecker's handling of dead symbols.
Previously we made three passes over the set of dead symbols, and removed
them from the state /twice/. Now we combine the autorelease pass and the
symbol death pass, and only have to remove the bindings for the symbols
that leaked.
Jordan Rose [Thu, 6 Dec 2012 18:58:15 +0000 (18:58 +0000)]
[analyzer] Use a smarter algorithm to find the last block in an inlined call.
Previously we would search for the last statement, then back up to the
entrance of the block that contained that statement. Now, while we're
scanning for the statement, we just keep track of which blocks are being
exited (in reverse order).