Serge Pavlov [Sat, 18 Feb 2017 06:04:15 +0000 (06:04 +0000)]
Process attributes 'ifunc' and 'alias' when checking for redefinition
These attributes effectively turn a non-defining declaration into a
definition, so the case when the declaration already has a body must
be diagnosed properly.
Vedant Kumar [Sat, 18 Feb 2017 02:02:55 +0000 (02:02 +0000)]
[profiling] Make a test more explicit. NFC.
The cxx-structors.cpp test checks that some instrumentation doesn't
appear, but it should be more explicit about which instrumentation it
actually expects to appear.
Vedant Kumar [Sat, 18 Feb 2017 01:50:14 +0000 (01:50 +0000)]
[profiling] Tighten test cases which refer to "profn" vars. NFC.
The frontend can't see "__profn" profile name variables after IRGen
because llvm throws these away now. Tighten up some test cases which
checked for the non-existence of those variables.
Richard Smith [Sat, 18 Feb 2017 00:32:02 +0000 (00:32 +0000)]
[modules] Load the ModuleOffsetMap from the module header lazily.
If we never need to map any ID within the module to its global ID, we don't
need the module offset map. If a compilation transitively depends on lots of
unused module files, this can result in a modest performance improvement.
Vedant Kumar [Fri, 17 Feb 2017 23:22:59 +0000 (23:22 +0000)]
Retry^2: [ubsan] Reduce null checking of C++ object pointers (PR27581)
This patch teaches ubsan to insert exactly one null check for the 'this'
pointer per method/lambda.
Previously, given a load of a member variable from an instance method
('this->x'), ubsan would insert a null check for 'this', and another
null check for '&this->x', before allowing the load to occur.
Similarly, given a call to a method from another method bound to the
same instance ('this->foo()'), ubsan would a redundant null check for
'this'. There is also a redundant null check in the case where the
object pointer is a reference ('Ref.foo()').
This patch teaches ubsan to remove the redundant null checks identified
above.
Testing: check-clang, check-ubsan, and a stage2 ubsan build.
I also compiled X86FastISel.cpp with -fsanitize=null using
patched/unpatched clangs based on r293572. Here are the number of null
checks emitted:
Vedant Kumar [Fri, 17 Feb 2017 23:22:55 +0000 (23:22 +0000)]
[ubsan] Pass a set of checks to skip to EmitTypeCheck() (NFC)
CodeGenFunction::EmitTypeCheck accepts a bool flag which controls
whether or not null checks are emitted. Make this a bit more flexible by
changing the bool to a SanitizerSet.
Needed for an upcoming change which deals with a scenario in which we
only want to emit null checks.
Carlo Bertolli [Fri, 17 Feb 2017 21:29:13 +0000 (21:29 +0000)]
[OpenMP] Prepare Sema for initial implementation for pragma 'distribute parallel for'
https://reviews.llvm.org/D29922
This patch adds two fields for use in the implementation of 'distribute parallel for':
The increment expression for the distribute loop. As the chunk assigned to a team is executed by multiple threads within the 'parallel for' region, the increment expression has to correspond to the value returned by the related runtime call (for_static_init).
The upper bound of the innermost loop ('for' in 'distribute parallel for') is not the globalUB expression normally used for pragma 'for' when found in isolation. It is instead the upper bound of the chunk assigned to the team ('distribute' loop). In this way, we prevent teams from executing chunks assigned to other teams.
The use of these two fields can be see in a related explanatory patch:
https://reviews.llvm.org/D29508
Jonas Hahnfeld [Fri, 17 Feb 2017 18:32:51 +0000 (18:32 +0000)]
[OpenMP] Remove barriers at cancel and cancellation point
This resolves a deadlock with the cancel directive when there is no explicit
cancellation point. In that case, the implicit barrier acts as cancellation
point. After removing the barrier after cancel, the now unmatched barrier for
the explicit cancellation point has to go as well.
This has probably worked before rL255992: With the calls for the explicit
barrier, it was sure that all threads passed a barrier before exiting.
was turned into:
ALWAYS_INLINE::std::string getName() ...
If it turns out that clang-format is failing to clean up a lot of the
existing spaces now, we can add more analyses of the identifier. It
should not currently. Cases where clang-format breaks nested name
specifiers should be fine as clang-format wraps after the "::". Thus, a
line getting longer and then shorter again should lead to the same
original code.
Richard Trieu [Fri, 17 Feb 2017 05:54:30 +0000 (05:54 +0000)]
Add better ODR checking for modules.
A slightly weaker form of ODR checking than previous attempts, but hopefully
won't break the modules build bot. Future work will be needed to catch all
cases.
When objects are imported for modules, there is a chance that a name collision
will cause an ODR violation. Previously, only a small number of such
violations were detected. This patch provides a stronger check based on
AST nodes.
The information needed to uniquely identify an object is taken from the AST and
put into a one-dimensional byte stream. This stream is then hashed to give
a value to represent the object, which is stored with the other object data
in the module.
When modules are loaded, and Decl's are merged, the hash values of the two
Decl's are compared. Only Decl's with matched hash values will be merged.
Mismatch hashes will generate a module error, and if possible, point to the
first difference between the two objects.
The transform from AST to byte stream is a modified depth first algorithm.
Due to references between some AST nodes, a pure depth first algorithm could
generate loops. For Stmt nodes, a straight depth first processing occurs.
For Type and Decl nodes, they are replaced with an index number and only on
first visit will these nodes be processed. As an optimization, boolean
values are saved and stored together in reverse order at the end of the
byte stream to lower the ammount of data that needs to be hashed.
Compile time impact was measured at 1.5-2.0% during module building, and
negligible during builds without module building.
Vedant Kumar [Fri, 17 Feb 2017 02:03:51 +0000 (02:03 +0000)]
Retry: [ubsan] Reduce null checking of C++ object pointers (PR27581)
This patch teaches ubsan to insert exactly one null check for the 'this'
pointer per method/lambda.
Previously, given a load of a member variable from an instance method
('this->x'), ubsan would insert a null check for 'this', and another
null check for '&this->x', before allowing the load to occur.
Similarly, given a call to a method from another method bound to the
same instance ('this->foo()'), ubsan would a redundant null check for
'this'. There is also a redundant null check in the case where the
object pointer is a reference ('Ref.foo()').
This patch teaches ubsan to remove the redundant null checks identified
above.
Testing: check-clang and check-ubsan. I also compiled X86FastISel.cpp
with -fsanitize=null using patched/unpatched clangs based on r293572.
Here are the number of null checks emitted:
Vedant Kumar [Fri, 17 Feb 2017 01:05:42 +0000 (01:05 +0000)]
[ubsan] Reduce null checking of C++ object pointers (PR27581)
This patch teaches ubsan to insert exactly one null check for the 'this'
pointer per method/lambda.
Previously, given a load of a member variable from an instance method
('this->x'), ubsan would insert a null check for 'this', and another
null check for '&this->x', before allowing the load to occur.
Similarly, given a call to a method from another method bound to the
same instance ('this->foo()'), ubsan would a redundant null check for
'this'. There is also a redundant null check in the case where the
object pointer is a reference ('Ref.foo()').
This patch teaches ubsan to remove the redundant null checks identified
above.
Testing: check-clang and check-ubsan. I also compiled X86FastISel.cpp
with -fsanitize=null using patched/unpatched clangs based on r293572.
Here are the number of null checks emitted:
This patch implements codegen for the reduction clause on
any teams construct for elementary data types. It builds
on parallel reductions on the GPU. Subsequently,
the team master writes to a unique location in a global
memory scratchpad. The last team to do so loads and
reduces this array to calculate the final result.
This patch emits two helper functions that are used by
the OpenMP runtime on the GPU to perform reductions across
teams.
Patch by Tian Jin in collaboration with Arpith Jacob
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types. An efficient
implementation requires hierarchical reduction within a
warp and a threadblock. It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.
The patch creates a struct to hold reduction variables and
a number of helper functions. The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team. Variables are
shared between CUDA threads using shuffle intrinsics.
An implementation of reductions on the NVPTX device is
substantially different to that of CPUs. However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.
The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions. The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.
While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.
Patch by Tian Jin in collaboration with Arpith Jacob
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types. An efficient
implementation requires hierarchical reduction within a
warp and a threadblock. It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.
The patch creates a struct to hold reduction variables and
a number of helper functions. The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team. Variables are
shared between CUDA threads using shuffle intrinsics.
An implementation of reductions on the NVPTX device is
substantially different to that of CPUs. However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.
The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions. The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.
While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.
Patch by Tian Jin in collaboration with Arpith Jacob
[OpenCL][Doc] Added OpenCL vendor extension description to user manual doc
Added description of a new feature that allows to specify
vendor extension in flexible way using compiler pragma instead
of modifying source code directly (committed in clang@r289979).
Erik Verbruggen [Thu, 16 Feb 2017 09:49:30 +0000 (09:49 +0000)]
Cache FileID when translating diagnostics in PCH files
Modules/preambles/PCH files can contain diagnostics, which, when used,
are added to the current ASTUnit. For that to work, they are translated
to use the current FileManager's FileIDs. When the entry is not the
main file, all local source locations will be checked by a linear
search. Now this is a problem, when there are lots of diagnostics (say,
25000) and lots of local source locations (say, 440000), and end up
taking seconds when using such a preamble.
The fix is to cache the last FileID, because many subsequent diagnostics
refer to the same file. This reduces the time spent in
ASTUnit::TranslateStoredDiagnostics from seconds to a few milliseconds
for files with many slocs/diagnostics.
This fixes PR31353.
Differential Revision: https://reviews.llvm.org/D29755
Richard Trieu [Thu, 16 Feb 2017 04:53:40 +0000 (04:53 +0000)]
Add better ODR checking for modules.
Recommit r293585 that was reverted in r293611 with new fixes. The previous
issue was determined to be an overly aggressive AST visitor from forward
declared objects. The visitor will now only deeply visit certain Decl's and
only do a shallow information extraction from all other Decl's.
When objects are imported for modules, there is a chance that a name collision
will cause an ODR violation. Previously, only a small number of such
violations were detected. This patch provides a stronger check based on
AST nodes.
The information needed to uniquely identify an object is taken from the AST and
put into a one-dimensional byte stream. This stream is then hashed to give
a value to represent the object, which is stored with the other object data
in the module.
When modules are loaded, and Decl's are merged, the hash values of the two
Decl's are compared. Only Decl's with matched hash values will be merged.
Mismatch hashes will generate a module error, and if possible, point to the
first difference between the two objects.
The transform from AST to byte stream is a modified depth first algorithm.
Due to references between some AST nodes, a pure depth first algorithm could
generate loops. For Stmt nodes, a straight depth first processing occurs.
For Type and Decl nodes, they are replaced with an index number and only on
first visit will these nodes be processed. As an optimization, boolean
values are saved and stored together in reverse order at the end of the
byte stream to lower the ammount of data that needs to be hashed.
Compile time impact was measured at 1.5-2.0% during module building, and
negligible during builds without module building.
Faisal Vali [Thu, 16 Feb 2017 04:12:21 +0000 (04:12 +0000)]
[cxx1z-constexpr-lambda] Implement captures - thus completing implementation of constexpr lambdas.
Enable evaluation of captures within constexpr lambdas by using a strategy similar to that used in CodeGen:
- when starting evaluation of a lambda's call operator, create a map from VarDecl's to a closure's FieldDecls
- every time a VarDecl (or '*this) that represents a capture is encountered while evaluating the expression via the expression evaluator (specifically the LValueEvaluator) in ExprConstant.cpp - it is replaced by the corresponding FieldDecl LValue (an Lvalue-to-Rvalue conversion on this LValue representation then determines the right rvalue when needed).
Thanks to Richard Smith and Hubert Tong for their review and feedback!
Richard Smith [Thu, 16 Feb 2017 03:49:44 +0000 (03:49 +0000)]
Add missing "deduced A == A" check for function template partial ordering.
This appears to be the only template argument deduction context where we were
missing this check. Surprisingly, other implementations also appear to miss
the check in this case; it may turn out that important code is relying on
the widespread non-conformance here, in which case we'll need to reconsider.
Vedant Kumar [Thu, 16 Feb 2017 01:20:00 +0000 (01:20 +0000)]
[Sema] Add lvalue-to-rvalue cast in direct-list-initialization of enum
After r264564, we allowed direct-list-initialization of an enum from an
integral value in C++1z mode, so long as that value can convert to the
enum's underlying type.
In this kind of initialization, we need a lvalue-to-rvalue conversion
for the initializer value if it is not a rvalue. This lets us accept the
following code:
enum class A : unsigned {};
A foo(unsigned x) { return A{x}; }
Hans Wennborg [Wed, 15 Feb 2017 23:28:10 +0000 (23:28 +0000)]
[dllimport] Check for dtor references in functions
Destructor references are not modelled explicitly in the AST. This adds
checks for destructor calls due to variable definitions and temporaries.
If a dllimport function references a non-dllimport destructor, it must
not be emitted available_externally, as the referenced destructor might
live across the DLL boundary and isn't exported.
[Modules] Consider enable_if attrs in isSameEntity.
Two functions that differ only in their enable_if attributes are
considered overloads, so we should check for those when we're trying to
figure out if two functions are mergeable.
We need to do the same thing for pass_object_size, as well. Looks like
that'll be a bit less trivial, since we sometimes do these merging
checks before we have pass_object_size attributes available (see the
merge checks in ASTDeclReader::VisitFunctionDecl that happen before we
read parameters, and merge checks in calls to ReadDeclAs<>()).
Gabor Horvath [Wed, 15 Feb 2017 15:35:56 +0000 (15:35 +0000)]
[analyzer] Proper caching in CallDescription objects.
During the review of D29567 it turned out the caching in CallDescription is not implemented properly. In case an identifier does not exist in a translation unit, repeated identifier lookups will be done which might have bad impact on the performance. This patch guarantees that the lookup is only executed once. Moreover this patch fixes a corner case when the identifier of CallDescription does not exist in the translation unit and the called function does not have an identifier (e.g.: overloaded operator in C++).
Akira Hatanaka [Wed, 15 Feb 2017 05:15:28 +0000 (05:15 +0000)]
[Sema] Disallow returning a __block variable via a move.
r274291 made changes to prefer calling a move constructor to calling a
copy constructor when returning from a function. This caused programs to
crash when a __block variable in the heap was moved out and used later.
This commit fixes the bug by disallowing moving out of __block variables
implicitly.
Richard Smith [Wed, 15 Feb 2017 01:16:48 +0000 (01:16 +0000)]
Don't look for GCC versions in /usr/lib/<triple> except when <triple> is a
freescale triple.
On multiarch systems, this previously caused us to stat every file in
/usr/lib/<triple> (typically several thousand files). This change halves
the runtime of a clang invocation on an empty file on my system.
Tim Shen [Tue, 14 Feb 2017 23:46:37 +0000 (23:46 +0000)]
[VLA] Handle VLA size expression in a full-expression context.
Summary: Previously the cleanups (e.g. dtor calls) are inserted into the
outer scope (e.g. function body scope), instead of it's own scope. After
the fix, the cleanups are inserted right after getting the size value.
Richard Smith [Tue, 14 Feb 2017 23:27:44 +0000 (23:27 +0000)]
Do not implicitly instantiate the definition of a class template specialization
that has been explicitly specialized!
We assume in various places that we can tell the template specialization kind
of a class type by looking at the declaration produced by TagType::getDecl.
That was previously not quite true: for an explicit specialization, we could
have first seen a template-id denoting the specialization (with a use that does
not trigger an implicit instantiation of the defintiion) and then seen the
first explicit specialization declaration. TagType::getDecl would previously
return an arbitrary declaration when called on a not-yet-defined class; it
now consistently returns the most recent declaration in that case.