Tim Northover [Tue, 3 May 2016 19:22:41 +0000 (19:22 +0000)]
AArch64: simplify illegal vector check. NFC.
Use a utility function to check whether the number of elements is a power of 2
and drop the redundant upper limit (a 128-bit vector with more than 16 elements
would have each element < 8 bits, not possible).
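
The reasoning in this message can be sketched in standalone C++. The helper names below are hypothetical and this is not the backend code itself; LLVM has its own power-of-two helpers (for example `isPowerOf2_32`), and which utility the commit actually calls is not stated here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a power-of-two utility (not the LLVM helper
// itself): true when exactly one bit of Value is set.
constexpr bool isPowerOfTwo(uint32_t Value) {
  return Value != 0 && (Value & (Value - 1)) == 0;
}

// Largest element count a vector can have when every element is at least
// MinElementBits wide.
constexpr unsigned maxElementCount(unsigned VectorBits,
                                   unsigned MinElementBits) {
  return VectorBits / MinElementBits;
}

// A 128-bit vector of 8-bit elements tops out at 16 elements, so a separate
// "more than 16 elements" check can never fire.
static_assert(maxElementCount(128, 8) == 16, "upper limit is implied");

// Small usage example.
inline void checkElementCounts() {
  assert(isPowerOfTwo(4));   // e.g. four 32-bit elements
  assert(!isPowerOfTwo(3));  // e.g. three elements: rejected
}
```

The `static_assert` captures why the upper limit is redundant: once element width is bounded below by 8 bits, the element count is bounded above automatically.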
Reid Kleckner [Tue, 3 May 2016 18:44:29 +0000 (18:44 +0000)]
[MS] Pass CalleeDecl to adjustThisArgumentForVirtualFunctionCall
If we are devirtualizing, then we want to compute the 'this' adjustment
of the devirtualized target, not the adjustment of the base's method
definition, which is usually zero.
Pete Cooper [Tue, 3 May 2016 18:32:01 +0000 (18:32 +0000)]
Change test to use regex instead of explicit value numbers. NFC.
We were seeing an internal failure when running this test. I can't
see a good reason for the difference, but the simple fix is to use
%{{.*}} instead of %1.
Reid Kleckner [Mon, 2 May 2016 22:42:34 +0000 (22:42 +0000)]
Fix argument expansion of reference fields of structs
r268261 made Clang "expand" more struct arguments on Windows. It removed
the check for 'RD->isCLike()', which was preventing us from attempting
to expand structs with reference type fields.
Our expansion code was attempting to load and pass each field of the
type in turn. We were accidentally doing one too many loads on reference
type fields.
On the function prologue side, we can use
EmitLValueForFieldInitialization, which obviously gets the address of
the field. On the call side, I tweaked EmitRValueForField directly,
since this is the only use of this method.
Akira Hatanaka [Mon, 2 May 2016 22:29:40 +0000 (22:29 +0000)]
Remove unneeded test in tryCaptureAsConstant.
It isn't necessary to call hasDefaultArg because we can't rematerialize
a captured variable that is a function parameter, regardless of whether
or not it has a default argument. NFC.
Artem Belevich [Mon, 2 May 2016 20:30:03 +0000 (20:30 +0000)]
[CUDA] Make sure device-side __global__ functions are always visible.
__global__ functions are a special case in CUDA.
Even when the symbol would normally not be externally
visible according to C++ rules, they still must be visible
in the CUDA GPU object so the host-side stub can launch them.
While using it in the shell is fine, this is a problem when cc1as is
invoked directly by the driver because single quoting the clang full
version makes cc1as write out the version with the quotes in the final
binary.
If the user wants copy-n-pastable output, they can use either -###
or CC_PRINT_OPTIONS=1 clang -v ...
Reid Kleckner [Mon, 2 May 2016 17:41:07 +0000 (17:41 +0000)]
Expand aggregate arguments more often on 32-bit Windows
Before this change, we would pass all non-HFA record arguments on
Windows with byval. Byval often blocks optimizations and results in bad
code generation. Windows now uses the existing workaround that other
x86_32 platforms use.
I also expanded the workaround to handle C++ records with constructors
on Windows. On non-Windows platforms, we have to keep generating the
same LLVM IR prototypes if we want our bitcode to be ABI compatible.
Otherwise we will encounter mismatch issues like PR21573.
Essentially fixes PR27522 in Clang instead of LLVM.
This exposes the Clang API bindings clang_getChildDiagnostics (which returns a
CXDiagnosticSet) and clang_getNumDiagnosticsInSet / clang_getDiagnosticInSet (to
traverse the CXDiagnosticSet), and adds a helper children property in the Python
Diagnostic wrapper.
Also, this adds the missing OVERLOAD_CANDIDATE (700) cursor type.
[NFC] Initialize a variable to make buildbot green.
In r268085 "[MS] Make #pragma pack use PragmaStack<> class." there was an
uninitialized variable 'Alignment', which caused the following failure:
http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/1758
Zero-initialize the variable to fix this failure.
Chris Bieneman [Fri, 29 Apr 2016 22:28:34 +0000 (22:28 +0000)]
Add a new warning to notify users of mismatched SDK and deployment target
Summary:
This patch adds a new driver warning, -Wincompatible-sdk, which notifies the user when the version-min options and the sysroot do not match.
The patch works by checking the sysroot (if present) for an SDK name, then matching that against the target platform. In the case of a mismatch it logs a warning.
Method Pool in modules: we make sure that if a module contains an entry for
a selector, the entry should be complete, containing everything introduced by
that module and all modules it imports.
Before writing out the method pool of a module, we sync up the out of date
selectors by pulling in methods for the selectors, from all modules it imports.
In ReadMethodPool, after pulling in the method pool entry for module A, this
lets us skip the modules that module A imports.
Make implementation of #pragma pack consistent with other "stack" pragmas.
Use PragmaStack<> class instead of old representation of internal stack.
Don't change compiler's behavior.
TODO:
1. Introduce diagnostics on popping named slots from pragma stacks.
Robert Lougher [Fri, 29 Apr 2016 17:44:29 +0000 (17:44 +0000)]
Improve test coverage of -Wdouble-promotion
This patch adds coverage for additional cases where implicit conversion can
occur (assignment and return). It also adds tests for some cases where a
warning should occur but none is produced. These are marked as FIXME.
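
For illustration only, here are the two implicit-conversion sites named above (assignment and return), written as standalone C++. The function names are hypothetical, and nothing here claims which of these lines a particular compiler version warns on under -Wdouble-promotion.

```cpp
#include <cassert>

// Implicit float -> double conversion in a return statement.
double promoteOnReturn(float F) { return F; }

// Implicit float -> double conversion in an assignment.
double promoteOnAssign(float F) {
  double D = 0.0;
  D = F;
  return D;
}

// Small usage example: both values are exactly representable as float,
// so the widened results compare equal to the double literals.
inline void checkPromotions() {
  assert(promoteOnReturn(0.5f) == 0.5);
  assert(promoteOnAssign(2.0f) == 2.0);
}
```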
Paul Robinson [Fri, 29 Apr 2016 17:03:34 +0000 (17:03 +0000)]
Add a Subjects line to NoDebugAttr [NFC].
The 'nodebug' attribute had hand-coded constraints; replace those with
a Subjects line in Attr.td.
Also add a missing test to verify the attribute is okay on an
Objective-C method.
[ARM] Guard the declarations of f16 to f32 vcvt intrinsics in arm_neon.h by testing __ARM_FP
Summary:
Conversions between float and half are only available when the
target has the half-precision extension. Guard these intrinsics
so that they don't cause crashes in the backend.
Recommit "[MS] Improved implementation of stack pragmas (vtordisp, *_seg)"
Slightly updated version, double-checked build and tests.
Improve implementation of MS pragmas that use stack + compatibility fixes.
This patch:
1. Changes implementation of #pragma vtordisp to use PragmaStack class
that other stack pragmas use;
2. Fixes "#pragma vtordisp()" behavior - it shouldn't affect the stack;
3. Supports "save-restore" of pragma stacks on entering / exiting a C++
method body, as MSVC does.
TODO:
1. Change implementation of #pragma pack to use the same approach;
2. Introduce diagnostics on popping named stack slots, as MSVC does.
[Parser] Clear the TemplateParamScope bit of the current scope's flag
if we are parsing a template specialization.
This commit makes changes to clear the TemplateParamScope bit and set
the TemplateParamParent field of the current scope to null if a template
specialization is being parsed.
Before this commit, Sema::ActOnStartOfLambdaDefinition would check
whether the parent template scope had any decls to determine whether
or not a template specialization was being parsed. This wasn't correct
since it couldn't distinguish between a real template specialization and
a template definition with an unnamed template parameter (only template
parameters with names are added to the scope's decl list). To fix the
bug, this commit changes the code to check the pointer to the parent
template scope rather than the decl list.
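
The two situations the old check could not tell apart look roughly like this. This is a hedged sketch with hypothetical names, not the reproducer from the commit: one template definition has an unnamed template parameter, the other function is a real explicit specialization, and each contains a lambda.

```cpp
#include <cassert>

// A template definition whose only template parameter is unnamed, so the
// template scope's decl list is empty even though this is not a
// specialization.
template <typename>
int fromUnnamedParam() {
  auto L = [] { return 1; };
  return L();
}

// Primary template with a named parameter.
template <typename T>
int fromNamedParam() {
  auto L = [] { return 2; };
  return L();
}

// A real explicit specialization ('template <>'), also containing a lambda.
template <>
int fromNamedParam<int>() {
  auto L = [] { return 3; };
  return L();
}

// Small usage example.
inline void checkLambdasInTemplates() {
  assert(fromUnnamedParam<char>() == 1);
  assert(fromNamedParam<int>() == 3);
}
```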
Carlo Bertolli [Fri, 29 Apr 2016 01:37:30 +0000 (01:37 +0000)]
[OPENMP] Enable correct generation of runtime call when target directive is separated from teams directive by multiple curly brackets
http://reviews.llvm.org/D18474
This patch fixes a bug in code generation of the OpenMP runtime library call in the presence of target and teams, when target is separated from teams by multiple curly brackets. The current implementation does not see the teams directive inside target and issues a call to tgt_target instead of the correct tgt_target_teams.
Richard Smith [Fri, 29 Apr 2016 01:23:20 +0000 (01:23 +0000)]
PR27549: fix bug that resulted in us giving a translation-unit-scope variable a
mangled name if it happened to be declared in an 'extern "C++"' context. This
also causes us to use the '_ZL' mangling rather than the '_Z' mangling for
internal-linkage entities that are wrapped in a language linkage construct.
Avoid -Wshadow warnings about constructor parameters named after fields
Usually these parameters are used solely to initialize the field in the
initializer list, and there is no real shadowing confusion.
There is a new warning under -Wshadow called
-Wshadow-field-in-constructor-modified. It attempts to find
modifications of such constructor parameters that probably intended to
modify the field.
It has some false negatives, though, so there is another warning group,
-Wshadow-field-in-constructor, which always warns on this special case.
For users who just want the old behavior and don't care about these fine
grained groups, we have a new warning group called -Wshadow-all that
activates everything.
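
A small sketch of the two patterns described above, using hypothetical types. `Point` is the common, harmless case; `Counter` modifies the constructor parameter where the field was probably intended, which is the kind of code -Wshadow-field-in-constructor-modified is described as looking for.

```cpp
#include <cassert>

struct Point {
  int x;
  // The parameter 'x' shadows the field 'x' but is used only to
  // initialize it, so there is no real shadowing confusion.
  explicit Point(int x) : x(x) {}
};

struct Counter {
  int count;
  explicit Counter(int count) : count(count) {
    // This modifies the parameter, not the field; the field keeps the
    // value it was initialized with.
    count += 1;
  }
};

// Small usage example.
inline void checkShadowing() {
  assert(Point(3).x == 3);
  assert(Counter(5).count == 5);  // not 6: the increment hit the parameter
}
```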
[Sema] Fix a crash that occurs when a variable template is initialized
with a generic lambda.
This patch fixes Sema::InstantiateVariableInitializer to switch to the
context of the variable before instantiating its initializer, which is
necessary to set the correct type for VarTemplateSpecializationDecl.
This is the first part of the patch that was reviewed here:
http://reviews.llvm.org/D19175
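
For readers unfamiliar with the construct, a variable template initialized with a generic lambda looks like the following. This is a minimal C++14 sketch with a hypothetical name, not the test case from the review.

```cpp
#include <cassert>

// Requires C++14: a variable template whose initializer is a generic
// lambda. Each specialization gets its own closure object.
template <typename T>
auto addDefault = [](auto Value) { return Value + T(); };

// Small usage example: T() is a value-initialized T, i.e. zero here.
inline void checkVariableTemplateLambda() {
  assert(addDefault<int>(3) == 3);
}
```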
Adrian Prantl [Thu, 28 Apr 2016 17:21:56 +0000 (17:21 +0000)]
Debug info: Apply an artificial debug location to __cyg_profile_func.* calls.
The LLVM Verifier expects all inlinable calls in debuggable functions to
have a location.
Tim Northover [Thu, 28 Apr 2016 13:59:55 +0000 (13:59 +0000)]
ARMv7k: define __ARM_PCS_VFP since we're hard-float.
It's a little debatable because we're not truly AAPCS, so I'm
certainly not going to define __ARM_PCS, but __ARM_PCS_VFP seems to be
really a "hard-float" define, which is a useful thing to have.
[OPENMP 4.5] Initial codegen for 'taskloop simd' directive.
OpenMP 4.5 defines 'taskloop simd' directive, which is combined
directive for 'taskloop' and 'simd' directives. Patch adds initial
codegen support for this directive and its 2 basic clauses 'safelen' and
'simdlen'.
Benjamin Kramer [Thu, 28 Apr 2016 12:14:47 +0000 (12:14 +0000)]
Revert r267784, r267824 and r267830.
It makes compiler-rt tests fail if the gold plugin is enabled.
Revert "Rework interface for bitset-using features to use a notion of LTO visibility."
Revert "Driver: only produce CFI -fvisibility= error when compiling."
Revert "clang/test/CodeGenCXX/cfi-blacklist.cpp: Exclude ms targets. They would be non-cfi."
PR27216: Only define __ARM_FEATURE_FMA when the target has VFPv4
Summary:
According to the ACLE spec, "__ARM_FEATURE_FMA is defined to 1 if
the hardware floating-point architecture supports fused floating-point
multiply-accumulate".
This changes clang's behaviour from emitting this macro for v7-A and v7-R
cores to only emitting it when the target has VFPv4 (and therefore support
for the floating point multiply-accumulate instruction).
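
User code typically consumes this macro with a preprocessor guard. The helper below is a hypothetical sketch of that pattern, not code from the patch: it takes the fused path only when __ARM_FEATURE_FMA is defined and otherwise falls back to a separate multiply and add.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: use the fused operation only when the ACLE macro
// reports hardware support; otherwise multiply and add separately.
inline float multiplyAdd(float A, float B, float C) {
#if defined(__ARM_FEATURE_FMA)
  return std::fma(A, B, C);
#else
  return A * B + C;
#endif
}

// Small usage example with values that are exact on either path.
inline void checkMultiplyAdd() {
  assert(multiplyAdd(2.0f, 3.0f, 1.0f) == 7.0f);
}
```

With the change described above, a v7-A or v7-R target without VFPv4 would take the fallback branch, since the macro is no longer defined there.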
[MS] Improved implementation of MS stack pragmas (vtordisp, *_seg)
Rework implementation of several MS pragmas that use internal stack:
vtordisp, {bss|code|const|data}_seg.
This patch:
1. Makes #pragma vtordisp use PragmaStack class as *_seg pragmas do;
2. Fixes "#pragma vtordisp()" behavior: it shouldn't affect stack;
3. Saves/restores the stacks on entering/exiting a C++ method body.
[OPENMP 4.5] Codegen for 'grainsize/num_tasks' clauses of 'taskloop'
directive.
OpenMP 4.5 defines 'taskloop' directive and 2 additional clauses
'grainsize' and 'num_tasks' for this directive. Patch adds codegen for
these clauses.
These clauses are generated as arguments of the '__kmpc_taskloop'
libcall and are encoded the following way:
void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val, kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup, int sched, kmp_uint64 grainsize, void *task_dup);
If 'grainsize' is specified, 'sched' argument must be set to '1' and
'grainsize' argument must be set to the value of the 'grainsize' clause.
If 'num_tasks' is specified, 'sched' argument must be set to '2' and
'grainsize' argument must be set to the value of the 'num_tasks' clause.
This is possible because these 2 clauses are mutually exclusive and can't
be used at the same time on the same directive.
If none of these clauses is specified, 'sched' argument must be set to
'0'.
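
The encoding rules above can be summarized in a small table-like function. This is a sketch with hypothetical type and function names, not the Clang codegen. Passing 0 as the 'grainsize' argument when neither clause is present is an assumption; the message only specifies 'sched' for that case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of which clause appears on a 'taskloop' directive.
enum class TaskloopClause { None, Grainsize, NumTasks };

// The pair of values passed as the 'sched' and 'grainsize' arguments.
struct SchedEncoding {
  int Sched;
  uint64_t Grainsize;
};

SchedEncoding encodeTaskloopSchedule(TaskloopClause Clause, uint64_t Value) {
  switch (Clause) {
  case TaskloopClause::Grainsize:
    return {1, Value};  // 'grainsize' clause: sched = 1
  case TaskloopClause::NumTasks:
    return {2, Value};  // 'num_tasks' clause: sched = 2, same argument slot
  case TaskloopClause::None:
    break;
  }
  return {0, 0};  // no clause: sched = 0 (the 0 grainsize is an assumption)
}

// Small usage example.
inline void checkTaskloopEncoding() {
  assert(encodeTaskloopSchedule(TaskloopClause::Grainsize, 8).Sched == 1);
  assert(encodeTaskloopSchedule(TaskloopClause::NumTasks, 4).Sched == 2);
}
```

Because the two clauses are mutually exclusive, one argument slot can carry either value and 'sched' tells the runtime how to interpret it.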
Samuel Antao [Wed, 27 Apr 2016 23:14:30 +0000 (23:14 +0000)]
[OpenMP] Code generation for target exit data directive
Summary:
This patch adds support for the target exit data directive code generation.
Given that, apart from the employed runtime call, target exit data requires the same code generation pattern as target enter data, the OpenMP codegen entry point was renamed and reused for both.
Samuel Antao [Wed, 27 Apr 2016 22:40:57 +0000 (22:40 +0000)]
[OpenMP] Map clause codegeneration.
Summary:
Implement codegen for the map clause. All the new list items in the 4.5 specification are supported.
Fix a bug in the generation of array sections that was exposed by some of the map clause tests: for pointer types, the offsets have to be calculated from the pointee, not the pointer.
Richard Smith [Wed, 27 Apr 2016 21:57:05 +0000 (21:57 +0000)]
[modules] When diagnosing a missing module import, suggest adding a #include if
the current language doesn't have an import syntax and we can figure out a
suitable file to include.
Rework interface for bitset-using features to use a notion of LTO visibility.
Bitsets, and the compiler features they rely on (vtable opt, CFI),
only have visibility within the LTO'd part of the linkage unit. Therefore,
only enable these features for classes with hidden LTO visibility. This
notion is based on object file visibility or (on Windows)
dllimport/dllexport attributes.
We provide the [[clang::lto_visibility_public]] attribute to override the
compiler's LTO visibility inference in cases where the class is defined
in the non-LTO'd part of the linkage unit, or where the ABI supports
calling classes derived from abstract base classes with hidden visibility
in other linkage units (e.g. COM on Windows).
If the cross-DSO CFI mode is enabled, bitset checks are emitted even for
classes with public LTO visibility, as that mode uses a separate mechanism
to cause bitsets to be exported.
This mechanism replaces the whole-program-vtables blacklist, so remove the
-fwhole-program-vtables-blacklist flag.
Because __declspec(uuid()) now implies [[clang::lto_visibility_public]], the
support for the special attr:uuid blacklist entry is removed.