Justin Lebar [Thu, 5 Jan 2017 16:53:21 +0000 (16:53 +0000)]
[CUDA] More correctly inherit primitive types from the host during device compilation.
Summary:
CUDA lets users share structs between the host and device, so for that
and other reasons, primitive types such as ptrdiff_t should be the same
on both sides of the compilation.
Our code to do this wasn't entirely successful. In particular, we did a
bunch of work during the NVPTXTargetInfo constructor, only to override
it in the NVPTX{32,64}TargetInfo constructors. It worked well enough on
Linux and Mac, but Windows is LLP64, which is different enough to break
it.
This patch removes the NVPTX{32,64}TargetInfo classes entirely and fixes
the bug described above.
Justin Lebar [Thu, 5 Jan 2017 16:52:29 +0000 (16:52 +0000)]
[Driver] Driver changes to support CUDA compilation on Windows.
Summary:
For the most part this is straightforward: Just add a CudaInstallation
object to the MSVC and MinGW toolchains.
CudaToolChain has to override computeMSVCVersion so that
Clang::constructJob passes the right version flag to cc1. We have to
modify IsWindowsMSVC and friends in Clang::constructJob to be true when
compiling CUDA device code on Windows for the same reason.
Justin Lebar [Thu, 5 Jan 2017 16:52:11 +0000 (16:52 +0000)]
[CUDA] Make CUDAInstallationDetector take the host triple in its constructor.
Summary:
Previously it was taking the true target triple, which is not really
what it needs: The location of the CUDA installation depends on the host
OS.
Justin Lebar [Thu, 5 Jan 2017 16:51:54 +0000 (16:51 +0000)]
[TableGen] Only normalize the spelling of GNU-style attributes.
Summary:
When Sema looks up an attribute name, it strips off leading and trailing
"__" if the attribute is GNU-style. That is, __attribute__((foo)) and
__attribute__((__foo__)) are equivalent.
This is only true for GNU-style attributes. In particular,
__declspec(__foo__) is not equivalent to __declspec(foo), and Sema
respects this difference.
This patch fixes TableGen to match Sema's behavior. The spelling
'GNU<"__foo__">' should be normalized to 'GNU<"foo">', but
'Declspec<"__foo__">' should not be changed.
This is necessary to make CUDA compilation work on Windows, because e.g.
the __device__ attribute is spelled __declspec(__device__).
Attr.td does not contain any Declspec spellings that start or end with
"__", so this change should not affect any other attributes.
Justin Lebar [Thu, 5 Jan 2017 16:51:37 +0000 (16:51 +0000)]
[Windows] Remove functions in intrin.h that are defined in Builtin.def.
Summary:
These duplicate declarations cause a problem for CUDA compiles on
Windows. All implicitly-defined functions are host+device, and this
applies to the declarations in Builtin.def. But then when we see the
declarations in intrin.h, they have no attributes, so are host-only
functions. This is an error.
(A better fix might be to make these builtins host-only, but that is a
much bigger change.)
Samuel Antao [Thu, 5 Jan 2017 16:02:49 +0000 (16:02 +0000)]
[OpenMP] Add fields for flags in the offload entry descriptor.
Summary:
This patch adds two fields to the offload entry descriptor. One field is meant to signal Ctors/Dtors and `link` global variables, and the other is reserved for runtime library use.
Currently, these fields are only filled with zeros in the current code generation, but that will change when `declare target` is added.
The reason, we are adding these fields now is to make the code generation consistent with the runtime library proposal under review in https://reviews.llvm.org/D14031.
inline assembly may use the `.include` directive to include other
content into the file. Without the integrated assembler, the `-I` group
gets passed to the assembler. Emulate this by collecting the header
search paths and passing them to the IAS.
This patch includes updates for codegen of the target region for the NVPTX
device. It moves initializers from the compiler to the runtime and updates
the worker loop to assume parallel work is retrieved from the runtime. A
subsequent patch will update the codegen to retrieve the parallel work using
calls to the runtime. It includes the removal of the inline attribute
for the worker loop and disabling debug info in it.
This allows codegen for a target directive and serial execution on the
NVPTX device.
Dylan McKay [Thu, 5 Jan 2017 05:20:27 +0000 (05:20 +0000)]
Add AVR target and toolchain to Clang
Summary:
Authored by Senthil Kumar Selvaraj
This patch adds barebones support in Clang for the (experimental) AVR target. It uses the integrated assembler for assembly, and the GNU linker for linking, as lld doesn't know about the target yet.
The DataLayout string is the same as the one in AVRTargetMachine.cpp. The alignment specs look wrong to me, as it's an 8 bit target and all types only need 8 bit alignment. Clang failed with a datalayout mismatch error when I tried to change it, so I left it that way for now.
Richard Smith [Thu, 5 Jan 2017 02:31:32 +0000 (02:31 +0000)]
Fix assertion failure on deduction failure due to too short template argument list.
We were previously incorrectly using TDK_TooFewArguments to report a template
argument list that's too short, but it actually means that the number of
arguments in a top-level function call was insufficient. When diagnosing the
problem, SemaOverload would (rightly) assert that the failure kind didn't make
any sense.
Looks like these functions exist just to prevent bad implicit
conversions. Rather than waiting for the linker to complain about
undefined references to them, we can mark them as deleted.
Reid Kleckner [Thu, 5 Jan 2017 01:08:22 +0000 (01:08 +0000)]
[MS] Instantiate default args during instantiation of exported default ctors
Summary:
Replace some old code that probably pre-dated the change to delay
emission of dllexported code until after the closing brace of the
outermost record type. Only uninstantiated default argument expressions
need to be handled now. It is enough to instantiate default argument
expressions when instantiating dllexported default ctors. This also
fixes some double-diagnostic issues in this area.
Erich Keane [Thu, 5 Jan 2017 00:20:51 +0000 (00:20 +0000)]
Correct Vectorcall Register passing and HVA Behavior
Front end component (back end changes are D27392). The vectorcall
calling convention was broken subtly in two cases. First,
it didn't properly handle homogeneous vector aggregates (HVAs).
Second, the vectorcall specification requires that only the
first 6 parameters be eligible for register assignment.
This patch fixes both issues.
[gtest] The way EXPECT_TEST now works after upgrading gtest triggers an
ODR use. These traits don't have a definition as they're intended to be
used strictly at compile time. Change the tests to use static_assert to
move the entire thing into compile-time.
Richard Smith [Wed, 4 Jan 2017 23:14:16 +0000 (23:14 +0000)]
Bail out if we try to build a DeclRefExpr naming an invalid declaration.
Most code paths would already bail out in this case, but certain paths,
particularly overload resolution and typo correction, would not. Carrying on
with an invalid declaration could in some cases result in crashes due to
downstream code relying on declaration invariants that are not necessarily
met for invalid declarations, and in other cases just resulted in undesirable
follow-on diagnostics.
[Parse] Don't ignore attributes after a late-parsed attr.
Without this, we drop everything after the first late-parsed attribute
in a single __attribute__. (Where "drop" means "stuff everything into
LA->Toks.")
David Blaikie [Wed, 4 Jan 2017 22:36:39 +0000 (22:36 +0000)]
Remove use of intrusive ref count ownership acquisition
The one use of CheckerManager (AnalysisConsumer, calling
createCheckerManager) keeps a strong reference to the AnalysisOptions
anyway, so this ownership wasn't necessary.
(I'm not even sure AnalysisOptions needs ref counting at all - but
that's more involved)
Richard Smith [Wed, 4 Jan 2017 22:03:59 +0000 (22:03 +0000)]
Fix failure to treat overloaded function in braced-init-list as a non-deduced context.
Previously, if an overloaded function in a braced-init-list was encountered in
template argument deduction, and the overload set couldn't be resolved to a
particular function, we'd immediately produce a deduction failure. That's not
correct; this situation is supposed to result in that particular P/A pair being
treated as a non-deduced context, and deduction can still succeed if the type
can be deduced from elsewhere.
Reid Kleckner [Wed, 4 Jan 2017 19:15:53 +0000 (19:15 +0000)]
Support -fno-delayed-template-parsing in clang-cl.exe
Summary:
This change adds support for the -fno-delayed-template-parsing option in
clang-cl.exe. This allows developers using clang-cl.exe to opt out of
emulation of MSVC's non-conformant template instantiation implementation
while continuing to use clang-cl.exe for its emulation of cl.exe
command-line options. The default behavior of clang-cl.exe
(-fdelayed-template-parsing) is unchanged.
The MSVC Standard Library implementation uses clang-cl.exe with this
switch in its tests to ensure that the library headers work on compilers
with the conformant two-phase-lookup behavior.
This patch includes updates for codegen of the target region for the NVPTX
device. It moves initializers from the compiler to the runtime and updates
the worker loop to assume parallel work is retrieved from the runtime. A
subsequent patch will update the codegen to retrieve the parallel work using
calls to the runtime. It includes the removal of the inline attribute
for the worker loop and disabling debug info in it.
This allows codegen for a target directive and serial execution on the
NVPTX device.
Alex Lorenz [Wed, 4 Jan 2017 13:40:34 +0000 (13:40 +0000)]
Add -f[no-]strict-return flag that can be used to avoid undefined behaviour
in non-void functions that fall off at the end without returning a value when
compiling C++.
Clang uses the new compiler flag to determine when it should treat control flow
paths that fall off the end of a non-void function as unreachable. If
-fno-strict-return is on, the code generator emits the ureachable and trap
IR only when the function returns either a record type with a non-trivial
destructor or another non-trivially copyable type.
The primary goal of this flag is to avoid treating falling off the end of a
non-void function as undefined behaviour. The burden of undefined behaviour
is placed on the caller instead: if the caller ignores the returned value then
the undefined behaviour is avoided. This kind of behaviour is useful in
several cases, e.g. when compiling C code in C++ mode.
Martin Probst [Wed, 4 Jan 2017 13:36:43 +0000 (13:36 +0000)]
clang-format: [JS] avoid indent after ambient function declarations.
Summary:
Before:
declare function foo();
let x = 1;
After:
declare function foo();
let x = 1;
The problem was that clang-format would unconditionally try to parse a child block, even though ambient function declarations do not have a body (similar to forward declarations).
This diff replaces --driver-mode=cpp in
utils/perf-training/order-files.lit.cfg and
utils/perf-training/lit.cfg with --driver-mode=g++.
clang --driver-mode=cpp will call the preprocessor and will not
trigger compilation.
Richard Smith [Wed, 4 Jan 2017 02:59:16 +0000 (02:59 +0000)]
Fix deduction of pack elements after a braced-init-list.
Previously, if the arguments for a parameter pack contained a braced-init-list,
we would abort deduction (keeping the pack deductions from prior arguments) at
the point when we reached the braced-init-list, resulting in wrong deductions
and rejects-valids. We now just leave a "hole" in the pack for such an argument,
which needs to be filled by another deduction of the same pack.
Richard Smith [Wed, 4 Jan 2017 01:48:55 +0000 (01:48 +0000)]
Fix template argument deduction when only some of a parameter pack is a non-deduced context.
When a parameter pack has multiple corresponding arguments, and some subset of
them are overloaded functions, it's possible that some subset of the parameters
are non-deduced contexts. In such a case, keep deducing from the remainder of
the arguments, and resolve the incomplete pack against whatever other
deductions we've performed for the pack.
GCC, MSVC, and ICC give three different bad behaviors for this case; what we do
now (and what we did before) don't exactly match any of them, sadly :( I'm
getting a core issue opened to specify more precisely how this should be
handled.
Richard Trieu [Wed, 4 Jan 2017 00:46:30 +0000 (00:46 +0000)]
Extend -Wtautological-overlap-compare to more cases.
Previously, -Wtautological-overlap-compare did not warn on cases where the
boolean expression was in an assignment or return statement. This patch
should cause all boolean statements to be passed to the tautological compare
checks in the CFG analysis.
Reid Kleckner [Tue, 3 Jan 2017 21:23:35 +0000 (21:23 +0000)]
[Win64] Don't widen integer literal zero arguments to unprototyped function calls
The special case to widen the integer literal zero when passed to
variadic function calls should only apply to variadic functions, not
unprototyped functions. This is consistent with what MSVC does. In this
test case, MSVC uses a 4-byte store to pass the 5th argument to 'kr' and
an 8-byte store to pass the zero to 'v':
This patch cleans up private methods for NVPTX OpenMP codegen. It converts private
members to static functions to follow the coding style of CGOpenMPRuntime.cpp and
declutter the header file.
Carlo Bertolli [Tue, 3 Jan 2017 18:24:42 +0000 (18:24 +0000)]
[OPENMP] Private, firstprivate, and lastprivate clauses for distribute, host code generation
https://reviews.llvm.org/D17840
This patch enables private, firstprivate, and lastprivate clauses for the OpenMP distribute directive.
Regression tests differ from the similar case of the same clauses on the for directive, by removing a reference to two global variables g and g1. This is necessary because: 1. a distribute pragma is only allowed inside a target region; 2. referring a global variable (e.g. g and g1) in a target region requires the program to enclose the variable in a "declare target" region; 3. declare target pragmas, which are used to define a declare target region, are currently unavailable in clang (patch being prepared).
For this reason, I moved the global declarations into local variables.
Add a field indicating the associated check for every replacement to the YAML
report generated with the '-export-fixes' option. Update
clang-apply-replacements to handle the new format.
Daniel Jasper [Mon, 2 Jan 2017 22:55:45 +0000 (22:55 +0000)]
Remove isIgnored()-test that is more expensive than the analysis behind it
In many translation units I have tried, the calls to isIgnored() removed
in this patch are more expensive than doing the analysis that is behind
it. The speed-up in translation units I have tried is between 10 and
20%.
Renato Golin [Mon, 2 Jan 2017 11:15:42 +0000 (11:15 +0000)]
Revert "DR1391: Check for implicit conversion sequences for non-dependent function template parameters between deduction and substitution. The idea is to accept as many cases as possible, on the basis that substitution failure outside the immediate context is much more common during substitution than during implicit conversion sequence formation."
This reverts commit r290808, as it broken all ARM and AArch64 test-suite
test: MultiSource/UnitTests/C++11/frame_layout
Also, please, next time, try to write a commit message in according to
our guidelines:
Richard Smith [Mon, 2 Jan 2017 02:42:17 +0000 (02:42 +0000)]
DR1391: Check for implicit conversion sequences for non-dependent function
template parameters between deduction and substitution. The idea is to accept
as many cases as possible, on the basis that substitution failure outside
the immediate context is much more common during substitution than during
implicit conversion sequence formation.
This does not implement the partial ordering portion of DR1391, which so
far appears to be misguided.
The MS ABI RTTI has a reserved field which is used as a cache for the
demangled name. It must be zero-initialized, which is used as a hint by
the runtime to say that the cache has not been populated. Since this
field is populated at runtime, the RTTI structures must be placed in the
.data section rather than .rdata. NFC
Richard Smith [Sat, 31 Dec 2016 21:41:23 +0000 (21:41 +0000)]
[c++17] Implement P0522R0 as written. This allows a template template argument
to be specified for a template template parameter whenever the parameter is at
least as specialized as the argument (when there's an obvious and correct
mapping from uses of the parameter to uses of the argument). For example, a
template with more parameters can be passed to a template template parameter
with fewer, if those trailing parameters have default arguments.
This is disabled by default, despite being a DR resolution, as it's fairly
broken in its current state: there are no partial ordering rules to cope with
template template parameters that have different parameter lists, meaning that
code that attempts to decompose template-ids based on arity can hit unavoidable
ambiguity issues.
The diagnostics produced on a non-matching argument are also pretty bad right
now, but I aim to improve them in a subsequent commit.
This diff fixes the clean build of the target generate-order-file.
In llvm/tools/clang/CMakeLists.txt
add_subdirectory(utils/perf-training) should go after the block where
the value of the variable CLANG_ORDER_FILE is set - otherwise
(tested with cmake's version 3.6.2) the arguments of perf-helper.py gen-order-file
will be ill-formed (CLANG_ORDER_FILE will be empty).
Justin Lebar [Thu, 29 Dec 2016 19:59:26 +0000 (19:59 +0000)]
[ADT] Delete RefCountedBaseVPTR.
Summary:
This class is unnecessary.
Its comment indicated that it was a compile error to allocate an
instance of a class that inherits from RefCountedBaseVPTR on the stack.
This may have been true at one point, but it's not today.
Moreover you really do not want to allocate *any* refcounted object on
the stack, vptrs or not, so if we did have a way to prevent these
objects from being stack-allocated, we'd want to apply it to regular
RefCountedBase too, obviating the need for a separate RefCountedBaseVPTR
class.
It seems that the main way RefCountedBaseVPTR provides safety is by
making its subclass's destructor virtual. This may have been helpful at
one point, but these days clang will emit an error if you define a class
with virtual functions that inherits from RefCountedBase but doesn't
have a virtual destructor.
Teresa Johnson [Wed, 28 Dec 2016 18:00:08 +0000 (18:00 +0000)]
[ThinLTO] No need to rediscover imports in distributed backend
Summary:
We can simply import all external values with summaries included in
the individual index file created for the distributed backend job,
as only those are added to the individual index file created by the
WriteIndexesThinBackend (in addition to summaries for the original
module, which are skipped here).
While computing the cross module imports on this index would come to
the same conclusion as the original thin link import logic, it is
unnecessary work. And when tuning, it avoids the need to pass the
same function importing parameters (e.g. -import-instr-limit) to
both the thin link and the backends (otherwise they won't make the
same decisions).
Our newly aggressive constant folding logic makes it possible for
CGExprConstant to see the same CompoundLiteralExpr more than once. So,
emitting a new GlobalVariable every time we see a CompoundLiteral is no
longer correct.
We had a similar issue with BlockExprs that was caught while testing
said aggressive folding, so I applied the same style of fix (see D26410)
here. If we find yet another case where this needs to happen, we should
probably refactor this so we don't have a third DenseMap+getter+setter.
As a design note: getAddrOfConstantCompoundLiteralIfEmitted is really
only intended to be called by ConstExprEmitter::EmitLValue. So,
returning a GlobalVariable* instead of a ConstantAddress costs us
effectively nothing, and saves us either a few bytes per entry in our
map or a bit of code duplication.
Richard Smith [Wed, 28 Dec 2016 06:27:18 +0000 (06:27 +0000)]
Mark 'auto' as dependent when instantiating the type of a non-type template
parameter. Fixes failed deduction for 'auto' non-type template parameters
nested within templates.
Richard Smith [Wed, 28 Dec 2016 02:37:25 +0000 (02:37 +0000)]
DR1315: a non-type template argument in a partial specialization is permitted
to make reference to template parameters. This is only a partial
implementation; we retain the restriction that the argument must not be
type-dependent, since it's unclear how that would work given the existence of
other language rules requiring an exact type match in this context, even for
type-dependent cases (a question has been raised on the core reflector).
David Blaikie [Tue, 27 Dec 2016 22:05:35 +0000 (22:05 +0000)]
DebugInfo: Don't include size/alignment on class declarations
This seems like it must've been a leftover by accident - no tests were
backing it up & it doesn't make much sense to include size/alignment on
class declarations (it'd only be on those declarations for which the
definition was available - otherwise the size/alignment would not be
known).
Richard Smith [Tue, 27 Dec 2016 20:03:09 +0000 (20:03 +0000)]
Add warning flag for "partial specialization is not more specialized than primary template" error (since Eigen hits it), and while I'm here also add a warning flag for "partial specialization is not usable because one or more of its parameters cannot be deduced" warning.
[DOXYGEN] Improved doxygen comments for xmmintrin.h intrinsics.
Added \n commands to insert a line breaks where necessary, since one long line of documentation is nearly unreadable.
Formatted comments to fit into 80 chars.
In some cases added \a command in front of the parameter names to display them in italics.
Richard Smith [Tue, 27 Dec 2016 07:56:27 +0000 (07:56 +0000)]
DR1495: A partial specialization is ill-formed if it is not (strictly) more
specialized than the primary template. (Put another way, if we imagine there
were a partial specialization matching the primary template, we should never
select it if some other partial specialization also matches.)
Richard Smith [Tue, 27 Dec 2016 06:14:37 +0000 (06:14 +0000)]
Work around a standard defect: template argument deduction for non-type
template parameters of reference type basically doesn't work, because we're
always deducing from an argument expression of non-reference type, so the type
of the deduced expression never matches. Instead, compare the type of an
expression naming the parameter to the type of the argument.