Note that there is no test suite update. This was found by a couple of
tests failing when the test suite was run on a powerpc64 host (thanks
Roman!). The tests don't specify a triple, which might seem surprising
for a codegen test. But in fact, these tests don't even inspect their
output. Not at all. I could add a bunch of triples to these tests so
that we'd get the test coverage for normal builds, but really someone
needs to go through and add actual *tests* to these tests. =[ The ones
in question are:
Add a test case that I've been using to clarify the bitfield layout for
both LE and BE targets.
AFAICT, Clang get's this correct for PPC64. I've compared it to GCC 4.8
output for PPC64 (thanks Roman!) and to my limited ability to read power
assembly, it looks functionally equivalent. It would be really good to
fill in the assertions on this test case for x86-32, PPC32, ARM, etc.,
but I've reached the limit of my time and energy... Hopefully other
folks can chip in as it would be good to have this in place to test any
subsequent changes.
To those who care about PPC64 performance, a side note: there is some
*obnoxiously* bad code generated for these test cases. It would be worth
someone's time to sit down and teach the PPC backend to pattern match
these IR constructs better. It appears that things like '(shr %foo,
<imm>)' turn into 'rldicl R, R, 64-<imm>, <imm>' or some such. They
don't even get combined with other 'rldicl' instructions *immediately
adjacent*. I'll add a couple of these patterns to the README, but
I think it would be better to look at all the patterns produced by this
and other bitfield access code, and systematically build up a collection
of patterns that efficiently reduce them to the minimal code.
Fix the bitfield record layout in codegen for big endian targets.
This was an egregious bug due to the several iterations of refactorings
that took place. Size no longer meant what it original did by the time
I finished, but this line of code never got updated. Unfortunately we
had essentially zero tests for this in the regression test suite. =[
I've added a PPC64 run over the bitfield test case I've been primarily
using. I'm still looking at adding more tests and making sure this is
the *correct* bitfield access code on PPC64 linux, but it looks pretty
close to me, and it is *worlds* better than before this patch as it no
longer asserts! =] More commits to follow with at least additional tests
and maybe more fixes.
Richard Smith [Sun, 9 Dec 2012 06:48:56 +0000 (06:48 +0000)]
Fix overload resolution for the initialization of a multi-dimensional
array from a braced-init-list. There seems to be a core wording wart
here (it suggests we should be testing whether the elements of the init
list are implicitly convertible to the array element type, not whether
there is an implicit conversion sequence) but our prior behavior appears
to be a bug, not a deliberate effort to implement the standard as written.
David Chisnall [Sat, 8 Dec 2012 09:06:08 +0000 (09:06 +0000)]
long double should be 64 bits on FreeBSD/MIPS64. It possibly should be on
Linux too, as I think we inherited it from there. The ABI spec says 128-bit,
although I think SGI's compiler on IRIX may be the only thing ever to support
this.
Richard Smith [Sat, 8 Dec 2012 08:32:28 +0000 (08:32 +0000)]
Finish implementing 'selected constructor' rules for triviality in C++11. In
the cases where we can't determine whether special members would be trivial
while building the class, we eagerly declare those special members. The impact
of this is bounded, since it does not trigger implicit declarations of special
members in classes which merely *use* those classes.
In order to determine whether we need to apply this rule, we also need to
eagerly declare move operations and destructors in cases where they might be
deleted. If a move operation were supposed to be deleted, it would instead
be suppressed, and we could need overload resolution to determine if we fall
back to a trivial copy operation. If a destructor were implicitly deleted,
it would cause the move constructor of any derived classes to be suppressed.
As discussed on cxx-abi-dev, C++11's selected constructor rules are also
retroactively applied as a defect resolution in C++03 mode, in order to
identify that class B has a non-trivial copy constructor (since it calls
A's constructor template, not A's copy constructor):
struct A { template<typename T> A(T &); };
struct B { mutable A a; };
Richard Smith [Sat, 8 Dec 2012 02:53:02 +0000 (02:53 +0000)]
Properly compute triviality for explicitly-defaulted or deleted special members.
Remove pre-standard restriction on explicitly-defaulted copy constructors with
'incorrect' parameter types, and instead just make those special members
non-trivial as the standard requires.
This required making CXXRecordDecl correctly handle classes which have both a
trivial and a non-trivial special member of the same kind.
This also fixes PR13217 by reimplementing DiagnoseNontrivial in terms of the
new triviality computation technology.
[libclang] Resolve a cursor that points to a macro name inside a #ifdef/#ifndef
directive as a macro expansion.
This is more of a "macro reference" than a macro expansion but it's close enough
for libclang's purposes. If it causes issues we can revisit and introduce a new
kind of cursor.
Richard Smith [Sat, 8 Dec 2012 02:01:17 +0000 (02:01 +0000)]
Implement C++03 [dcl.init]p5's checking for value-initialization of references
properly, rather than faking it up by pretending that a reference member makes
the default constructor non-trivial. That leads to rejects-valids when putting
such types inside unions.
Fix analysis based warnings so that all warnings are emitted when compiling
with -Werror. Previously, compiling with -Werror would emit only the first
warning in a compilation unit, because clang assumes that once an error occurs,
further analysis is unlikely to return valid results. However, warnings that
have been upgraded to errors should not be treated as "errors" in this sense.
Anna Zaks [Fri, 7 Dec 2012 21:51:47 +0000 (21:51 +0000)]
[analyzer] Optimization heuristic: do not reanalyze every ObjC method as
top level.
This heuristic is already turned on for non-ObjC methods
(inlining-mode=noredundancy). If a method has been previously analyzed,
while being inlined inside of another method, do not reanalyze it as top
level.
This commit applies it to ObjCMethods as well. The main caveat here is
that to catch the retain release errors, we are still going to reanalyze
all the ObjC methods but without inlining turned on.
Gives 21% performance increase on one heavy ObjC benchmark, which
suffered large performance regressions due to ObjC inlining.
Daniel Jasper [Fri, 7 Dec 2012 20:34:49 +0000 (20:34 +0000)]
AST matcher tutorial (Part I)
This an AST matcher tutorial based on Sam Panzer's document
(https://docs.google.com/a/google.com/document/d/1oTkVLhCdRJUEH1_LDaQdXqe8-aOqT5GLDL9e4MhoFF8/edit).
Checking in now although some parts might be a bit rough so others can
help improving it.
Jordan Rose [Fri, 7 Dec 2012 19:56:29 +0000 (19:56 +0000)]
[analyzer] Fix r168019 to work with unpruned paths as well.
This is the case where the analyzer tries to print out source locations
for code within a synthesized function body, which of course does not have
a valid source location. The previous fix attempted to do this during
diagnostic path pruning, but some diagnostics have pruning disabled, and
so any diagnostic with a path that goes through a synthesized body will
either hit an assertion or emit invalid output.
John McCall [Fri, 7 Dec 2012 07:03:17 +0000 (07:03 +0000)]
Fix the required args count for variadic blocks.
We were emitting calls to blocks as if all arguments were
required --- i.e. with signature (A,B,C,D,...) rather than
(A,B,...). This patch fixes that and accounts for the
implicit block-context argument as a required argument.
In addition, this patch changes the function type under which
we call unprototyped functions on platforms like x86-64 that
guarantee compatibility of variadic functions with unprototyped
function types; previously we would always call such functions
under the LLVM type T (...)*, but now we will call them under
the type T (A,B,C,D,...)*. This last change should have no
material effect except for making the type conventions more
explicit; it was a side-effect of the most convenient implementation.
Ted Kremenek [Fri, 7 Dec 2012 01:55:21 +0000 (01:55 +0000)]
Change RegionStore to always use ImmutableMapRef for processing cluster bindings.
This reduces analysis time by 1.2% on one test case (Objective-C), but
also cleans up some of the code conceptually as well. We can possible
just make RegionBindingsRef -> RegionBindings, but I wanted to stage
things.
After this, we should revisit Jordan's optimization of not canonicalizing
the immutable AVL trees for the cluster bindings as well.
Jordan Rose [Fri, 7 Dec 2012 01:54:38 +0000 (01:54 +0000)]
[analyzer] Remove possible pessimizations from r169563.
Thanks for reminding me about copy-elision, David. Passing references here
doesn't help when we could get move construction in C++11. If we really
cared, we'd use std::swap to steal the reference from the temporary arg,
but it's probably not /that/ critical outside of Profile anyway.
Suggested by David Blaikie. ExplodedNode, CallEvent, and CheckerContext all
hang onto their ProgramState, so the accessors can return a reference to the
internal state rather than preemptively copying it. This helps avoid
temporary ProgramStateRefs, though local variables will still (correctly)
do an extra retain and release.
[libclang] Introduce a new indexing mode where we skip function bodies
that were already parsed in the same "indexing session".
An indexing session is defined as using the same CXIndexAction object
for multiple clang_indexSourceFile calls.
Passing CXIndexOpt_SkipParsedBodiesInSession as an indexing option will
enable the mode where we try to skip bodies that were already parsed in
another translation unit.
If a function's body was skipped, the "flags" field in the CXIdxDeclInfo
structure will have "CXIdxDeclFlag_Skipped" bit was set.
Jordan Rose [Thu, 6 Dec 2012 18:58:18 +0000 (18:58 +0000)]
[analyzer] Simplify RetainCountChecker's handling of dead symbols.
Previously we made three passes over the set of dead symbols, and removed
them from the state /twice/. Now we combine the autorelease pass and the
symbol death pass, and only have to remove the bindings for the symbols
that leaked.
Jordan Rose [Thu, 6 Dec 2012 18:58:15 +0000 (18:58 +0000)]
[analyzer] Use a smarter algorithm to find the last block in an inlined call.
Previously we would search for the last statement, then back up to the
entrance of the block that contained that statement. Now, while we're
scanning for the statement, we just keep track of which blocks are being
exited (in reverse order).
Jordan Rose [Thu, 6 Dec 2012 18:58:06 +0000 (18:58 +0000)]
[analyzer] Aggressively cut back on the canonicalization in RegionStore.
Whenever we touch a single bindings cluster multiple times, we can delay
canonicalizing it until the final access. This has some interesting
implications, in particular that we shouldn't remove an /empty/ cluster
from the top-level map until canonicalization.
This is good for a 2% speedup or so on the test case in
<rdar://problem/12810842>
Jordan Rose [Thu, 6 Dec 2012 18:58:01 +0000 (18:58 +0000)]
[analyzer] Remove bindExprAndLocation, which does extra work for no gain.
This feature was probably intended to improve diagnostics, but was currently
only used when dumping the Environment. It shows what location a given value
was loaded from, e.g. when evaluating an LValueToRValue cast.
Rework the bitfield access IR generation to address PR13619 and
generally support the C++11 memory model requirements for bitfield
accesses by relying more heavily on LLVM's memory model.
The primary change this introduces is to move from a manually aligned
and strided access pattern across the bits of the bitfield to a much
simpler lump access of all bits in the bitfield followed by math to
extract the bits relevant for the particular field.
This simplifies the code significantly, but relies on LLVM to
intelligently lowering these integers.
I have tested LLVM's lowering both synthetically and in benchmarks. The
lowering appears to be functional, and there are no really significant
performance regressions. Different code patterns accessing bitfields
will vary in how this impacts them. The only real regressions I'm seeing
are a few patterns where the LLVM code generation for loads that feed
directly into a mask operation don't take advantage of the x86 ability
to do a smaller load and a cheap zero-extension. This doesn't regress
any benchmark in the nightly test suite on my box past the noise
threshold, but my box is quite noisy. I'll be watching the LNT numbers,
and will look into further improvements to the LLVM lowering as needed.
Don't require that, during template deduction, a template specialization type
as a function parameter has at least as many template arguments as one used in
a function argument (not even if the argument has been resolved to an exact
type); the additional parameters might be provided by default template
arguments in the template. We don't need this check, since we now implement
[temp.deduct.call]p4 with an additional check after deduction.
Chad Rosier [Wed, 5 Dec 2012 23:08:09 +0000 (23:08 +0000)]
[driver, ms-inline asm] Have -fms-extensions enable the AsmBlocks language
option. MS-style inline asm can now be enabled by either -fasm-blocks or
-fms-extensions.
rdar://12808010