Hans Wennborg [Tue, 24 May 2016 20:40:51 +0000 (20:40 +0000)]
[Driver] Add support for -finline-functions and /Ob2 flags
-finline-functions and /Ob2 are currently ignored by Clang. The only way to
enable inlining is to use the global O flags, which also enable other options,
or to emit LLVM bitcode using Clang, then running opt by hand with the inline
pass.
This patch allows to simply use the -finline-functions flag (same as GCC) or
/Ob2 in clang-cl mode to enable inlining without other optimizations.
This is the first patch of a serie to improve support for the /Ob flags.
Summary:
Following patch D19265 which enable software floating point support in the Sparc backend, this patch enables the option to be enabled in the front-end using the -msoft-float option.
The user should ensure a library (such as the builtins from Compiler-RT) that includes the software floating point routines is provided.
Alexey Bataev [Tue, 24 May 2016 07:40:12 +0000 (07:40 +0000)]
[OPENMP] Fixed codegen for firstprivate vars in standalone worksharing
directives.
If firstprivate variable is is captured by value in outlined region and then used as firstprivate variable in inner worksharing directive, the copy for this firstprivate variable was not created. Fixed this bug.
Dmitry Polukhin [Tue, 24 May 2016 06:37:14 +0000 (06:37 +0000)]
[MSVC2015] dllexport for defaulted special class members
Clang doesn't dllexport defaulted special member function defaulted
inside class but does it if they defaulted outside class. MSVC doesn't
make any distinction where they were defaulted. Also MSVC 2013 and 2015
export different set of members. MSVC2015 doesn't emit trivial defaulted
x-tors but does emit copy assign operator.
The statement constructed an anonymous structure which was typedefed. The
anonymous structure has internal linkage, and that would cause an error when
building with modules. Give the type declaration a tag name to address the
error when building with modules.
Simon Pilgrim [Mon, 23 May 2016 22:13:02 +0000 (22:13 +0000)]
[X86][SSE] Replace lossless i32/f32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD(Y) (i32 to f64) and (V)CVTPS2PD(Y) (f32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.
This patch removes the clang builtins and their use in the sse2/avx headers - a future patch will deal with removing the llvm intrinsics, but that will require a bit more work.
Richard Smith [Mon, 23 May 2016 20:03:04 +0000 (20:03 +0000)]
Fix filtering of prior declarations when checking for a tag redeclaration to
map to the redecl context for both decls, not just one of them, and to properly
check that the decl contexts are equivalent.
David Majnemer [Mon, 23 May 2016 17:32:35 +0000 (17:32 +0000)]
Address post-commit review feedback to r270457
Add two tests which show our error handling behavior for invalid
parameters in the layout_version and empty_bases attributes.
Amend our documentation to make it more clear that
__declspec(empty_bases) and __declspec(layout_version) can only apply to
classes, structs, and unions.
David Majnemer [Mon, 23 May 2016 17:21:55 +0000 (17:21 +0000)]
Clang support for __is_assignable intrinsic
MSVC now supports the __is_assignable type trait intrinsic,
to enable easier and more efficient implementation of the
Standard Library's is_assignable trait.
As of Visual Studio 2015 Update 3, the VC Standard Library
implementation uses the new intrinsic unconditionally.
The implementation is pretty straightforward due to the previously
existing is_nothrow_assignable and is_trivially_assignable.
We handle __is_assignable via the same code as the other two except
that we skip the extra checks for nothrow or triviality.
David Majnemer [Mon, 23 May 2016 17:16:12 +0000 (17:16 +0000)]
[MS ABI] Implement __declspec(empty_bases) and __declspec(layout_version)
The layout_version attribute is pretty straightforward: use the layout
rules from version XYZ of MSVC when used like
struct __declspec(layout_version(XYZ)) S {};
The empty_bases attribute is more interesting. It tries to get the C++
empty base optimization to fire more often by tweaking the MSVC ABI
rules in subtle ways:
1. Disable the leading and trailing zero-sized object flags if a class
is marked __declspec(empty_bases) and is empty.
This means that given:
struct __declspec(empty_bases) A {};
struct __declspec(empty_bases) B {};
struct C : A, B {};
'C' will have size 1 and nvsize 0 despite not being annotated
__declspec(empty_bases).
2. When laying out virtual or non-virtual bases, disable the injection
of padding between classes if the most derived class is marked
__declspec(empty_bases).
This means that given:
struct A {};
struct B {};
struct __declspec(empty_bases) C : A, B {};
'C' will have size 1 and nvsize 0.
3. When calculating the offset of a non-virtual base, choose offset zero
if the most derived class is marked __declspec(empty_bases) and the
base is empty _and_ has an nvsize of 0.
Because of the ABI rules, this does not mean that empty bases
reliably get placed at offset 0!
For example:
struct A {};
struct B {};
struct __declspec(empty_bases) C : A, B { virtual ~C(); };
'C' will be pointer sized to account for the vfptr at offset 0.
'A' and 'B' will _not_ be at offset 0 despite being empty!
Instead, they will be located right after the vfptr.
This occurs due to the interaction betweeen non-virtual base layout
and virtual function pointer injection: injection occurs after the
nv-bases and shifts them down by the size of a pointer.
Exherbo has an alternative file system layout to accommodate multiarch. The
loader is located at /usr/${triple}/lib/${loader}. Adjust the Linux toolchain
to support that on exherbo.
Driver: sink getLinuxDynamicLoader into the Toolchain
The parameter already requires the toolchain, sink the method into the class.
This also enables the use of the distro detection logic which will be needed to
support Exherbo's multiarch approach.
Convert the cascading if/else to a switch. This makes it easier to follow the
logic. Minor difference is that we no longer default to x86_64 but rather to
the architecture specified by the architecture.
OpenCL builtin functions to_{global|local|private} accepts argument of pointer type to arbitrary pointee type, and return a pointer to the same pointee type in different addr space, i.e.
global gentype *to_global(gentype *p);
It is not desirable to declare it as
global void *to_global(void *);
in opencl header file since it misses diagnostics.
This patch implements these builtin functions as Clang builtin functions. In the builtin def file they are defined to have signature void*(void*). When handling call expressions, their declarations are re-written to have correct parameter type and return type corresponding to the call argument.
In codegen call to addr void *to_addr(void*) is generated with addrcasts or bitcasts to facilitate implementation in builtin library.
Dimitry Andric [Fri, 20 May 2016 17:27:22 +0000 (17:27 +0000)]
Make __FreeBSD_cc_version predefined macro configurable at build time
The `FreeBSDTargetInfo` class has always set the `__FreeBSD_cc_version`
predefined macro to a rather static value, calculated from the major OS
version.
In the FreeBSD base system, we will start incrementing the value of this
macro whenever we make any signifant change to clang, so we need a way
to configure the macro's value at build time.
Use `FREEBSD_CC_VERSION` for this, which we can define in the FreeBSD
build system using either the `-D` command line option, or an include
file. Stock builds will keep the earlier value.
Benjamin Kramer [Fri, 20 May 2016 15:21:08 +0000 (15:21 +0000)]
Add all the avx512 flavors to __builtin_cpu_supports's list.
This is matching what trunk gcc is accepting. Also adds a missing ssse3
case. PR27779. The amount of duplication here is annoying, maybe it
should be factored into a separate .def file?
Martin Probst [Fri, 20 May 2016 11:24:24 +0000 (11:24 +0000)]
clang-format: [JS] sort ES6 imports.
Summary:
This change automatically sorts ES6 imports and exports into four groups:
absolute imports, parent imports, relative imports, and then exports. Exports
are sorted in the same order, but not grouped further.
To keep JS import sorting out of Format.cpp, this required extracting the
TokenAnalyzer infrastructure to separate header and implementation files.
Vedant Kumar [Thu, 19 May 2016 23:44:02 +0000 (23:44 +0000)]
[Lexer] Don't merge macro args from different macro files
The lexer sets the end location of macro arguments incorrectly *if*,
while merging consecutive args to fit into a single SLocEntry, it finds
args which come from different macro files.
Fix the issue by using separate SLocEntries in this situation.
This fixes a code coverage crasher (rdar://problem/26181005). Because
the lexer reported end locations for certain macro args incorrectly, we
would generate bogus coverage mappings with negative line offsets.
Anton Yartsev [Thu, 19 May 2016 23:03:49 +0000 (23:03 +0000)]
[analyzer] Fix for PR23790 : constrain return value of strcmp() rather than returning a concrete value.
The function strcmp() can return any value, not just {-1,0,1} : "The strcmp(const char *s1, const char *s2) function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2." [C11 7.24.4.2p3]
https://llvm.org/bugs/show_bug.cgi?id=23790
http://reviews.llvm.org/D16317
Justin Lebar [Thu, 19 May 2016 22:49:13 +0000 (22:49 +0000)]
[CUDA] Implement __ldg using intrinsics.
Summary:
Previously it was implemented as inline asm in the CUDA headers.
This change allows us to use the [addr+imm] addressing mode when
executing ld.global.nc instructions. This translates into a 1.3x
speedup on some benchmarks that call this instruction from within an
unrolled loop.
Artem Belevich [Thu, 19 May 2016 20:13:53 +0000 (20:13 +0000)]
[CUDA] Do not allow non-empty destructors for global device-side variables.
According to Cuda Programming guide (v7.5, E2.3.1):
> __device__, __constant__ and __shared__ variables defined in namespace
> scope, that are of class type, cannot have a non-empty constructor or a
> non-empty destructor.
Clang already deals with device-side constructors (see D15305).
This patch enforces similar rules for destructors.
Artem Belevich [Thu, 19 May 2016 20:13:39 +0000 (20:13 +0000)]
[CUDA] Split device-var-init.cu tests into separate Sema and CodeGen parts.
Codegen tests for device-side variable initialization are subset of test
cases used to verify Sema's part of the job.
Including CodeGenCUDA/device-var-init.cu from SemaCUDA makes it easier to
keep both sides in sync.
Simon Atanasyan [Thu, 19 May 2016 15:07:21 +0000 (15:07 +0000)]
[driver] Do not pass install dir to the MultilibSet include dirs callback
All additional include directories are relative to the toolchain install
folder. So let's do not pass this folder to each callback to simplify
and slightly reduce the code.
Simon Atanasyan [Thu, 19 May 2016 15:05:22 +0000 (15:05 +0000)]
[driver][mips] Hardcode triple name in case of CodeSourcery toolchain. NFC
CodeSourcery toolchain is a standalone toolchain which always uses
the same triple name in its paths. It is independent from target
triple used by the driver.
Ranjeet Singh [Thu, 19 May 2016 13:04:34 +0000 (13:04 +0000)]
[ARM] Fix cdp intrinsic
- Fixed cdp intrinsic to only accept compile time
constant values previously you could pass in a
variable to the builtin which would result in
illegal llvm assembly output
Richard Smith [Thu, 19 May 2016 01:39:10 +0000 (01:39 +0000)]
Make Sema::getPrintingPolicy less ridiculously expensive. This used to perform
an identifier table lookup, *and* copy the LangOptions (including various
std::vector<std::string>s). Twice. We call this function once each time we start
parsing a declaration specifier sequence, and once for each call to Sema::Diag.
This reduces the compile time for a sample .c file from the linux kernel by 20%.
Steven Wu [Wed, 18 May 2016 17:04:52 +0000 (17:04 +0000)]
[Driver] Fix the case when use -fembed-bitcode and -flto= together
Summary:
-fembed-bitcode was only checking for old style LTO flag (-flto) but not
considering the new -flto= style option. That makes clang output bitcode
embedded in bitcode object when using -flto= and -fembed-bitcode= together.
Now clang should output normal bitcode file when using LTO and ignores
-fembed-bitcode option.