Added doxygen comments to avxintrin.h's intrinsics. As of now, all the intrinsics in this file that were documented by Sony's intrinsics guide should have corresponding doxygen comments.
Note: The doxygen comments are automatically generated based on Sony's intrinsic
s document.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream.
This triggers an assert in CodeGenFunction::EmitCallExprLValue because
it doesn't expect to see a call to a function with a non-reference
scalar return type.
This commit prevents the assert by making VisitUnaryAddrOf error out if
the address-of operator is applied to a call to a function with
__unknown_anytype return type.
Oleg Ranevskyy [Fri, 18 Nov 2016 21:00:08 +0000 (21:00 +0000)]
[ARM] Fix sema check of ARM special register names
Summary:
This is a simple sema check patch for arguments of `__builtin_arm_rsr` and the related builtins, which currently do not allow special registers with indexes >7.
Some of the possible register name formats these builtins accept are:
```
{c}p<coprocessor>:<op1>:c<CRn>:c<CRm>:<op2>
```
```
o0:op1:CRn:CRm:op2
```
where `op1` / `op2` are integers in the range [0, 7] and `CRn` / `CRm` are integers in the range [0, 15].
The current sema check does not allow `CRn` > 7 and accepts `op2` up to 15.
Add doxygen comments to fxsrintrin.h's intrinsics.
The doxygen comments are automatically generated based on Sony's intrinsics document.
I got an OK from Eric Christopher to commit doxygen comments without prior code
review upstream. This patch was internally reviewed by Paul Robinson and Charles Li.
John McCall [Fri, 18 Nov 2016 01:08:24 +0000 (01:08 +0000)]
Forward ns_consumed delegate arguments with a move.
StartFunction enters a release cleanup for ns_consumed arguments in
ARC, so we need to balance that somehow. We could teach StartFunction
that it's emitting a delegating function, so that the cleanup is
unnecessary, but that would be invasive and somewhat fraught. We could
balance the consumed argument with an extra retain, but clearing the
original variable should be easier to optimize and avoid some extra work
at -O0. And there shouldn't be any difference as long as nothing else
uses the argument, which should always be true for the places we emit
delegate arguments.
Justin Lebar [Fri, 18 Nov 2016 00:41:27 +0000 (00:41 +0000)]
[CUDA] Initialize our header search using the host triple.
Summary:
This used to work because system headers are found in a (somewhat)
predictable set of locations on Linux. But this is not the case on
MacOS; without this change, we don't look in the right places for our
headers when doing device-side compilation on Mac.
Justin Lebar [Fri, 18 Nov 2016 00:41:22 +0000 (00:41 +0000)]
[CUDA] Driver changes to support CUDA compilation on MacOS.
Summary:
Compiling CUDA device code requires us to know the host toolchain,
because CUDA device-side compiles pull in e.g. host headers.
When we only supported Linux compilation, this worked because
CudaToolChain, which is responsible for device-side CUDA compilation,
inherited from the Linux toolchain. But in order to support MacOS,
CudaToolChain needs to take a HostToolChain pointer.
Because a CUDA toolchain now requires a host TC, we no longer will
create a CUDA toolchain from Driver::getToolChain -- you have to go
through CreateOffloadingDeviceToolChains. I am *pretty* sure this is
correct, and that previously any attempt to create a CUDA toolchain
through getToolChain() would eventually have resulted in us throwing
"error: unsupported use of NVPTX for host compilation".
In any case hacking getToolChain to create a CUDA+host toolchain would
be wrong, because a Driver can be reused for multiple compilations,
potentially with different host TCs, and getToolChain will cache the
result, causing us to potentially use a stale host TC.
So that's the main change in this patch.
In addition, we have to pull CudaInstallationDetector out of Generic_GCC
and into a top-level class. It's now used by the Generic_GCC and MachO
toolchains.
I made several changes for consistency with the rest of x86 instrinsics header files. Some of these changes help to render doxygen comments better.
1. avxintrin.h – Moved the opening bracket on a separate line for several
intrinsics (for consistency with the rest of the intrinsics).
2. emmintrin.h - Moved the doxygen comment next to the body of the function;
- Added braces after extern "C" even though there is only
one declaration each time
3. xmmintrin.h - Moved the doxygen comment next to the body of the function;
- Added intrinsic prototypes for a couple of macro definitions
into the doxygen comment;
- Added braces after extern "C" even though there is only one
declaration each time
4. ammintrin.h – Removed extra line between the doxygen comment and the body
of the functions (for consistency with the rest of the files).
Implement the -dI as supported by GCC: Output ‘#include’ directives in addition
to the result of preprocessing.
This change aims to add this option, pass it through to the preprocessor via
the options class, and when inclusions occur we output some information (+ test
cases).
In addition to the preprocessed sources file and reproducer script, also
point to the .crash diagnostic files on Darwin. Example:
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-4.0: note: diagnostic msg: /var/folders/bk/1hj20g8j4xvdj5gd25ywhd3m0000gq/T/RegAllocGreedy-238f28.cpp
clang-4.0: note: diagnostic msg: /var/folders/bk/1hj20g8j4xvdj5gd25ywhd3m0000gq/T/RegAllocGreedy-238f28.cache
clang-4.0: note: diagnostic msg: /var/folders/bk/1hj20g8j4xvdj5gd25ywhd3m0000gq/T/RegAllocGreedy-238f28.sh
clang-4.0: note: diagnostic msg: /var/folders/bk/1hj20g8j4xvdj5gd25ywhd3m0000gq/T/RegAllocGreedy-238f28.crash
When no match is found for the .crash, point the user to a directory
where those can be found. Example:
clang-4.0: note: diagnostic msg: Crash backtrace is located in
clang-4.0: note: diagnostic msg: /Users/bruno/Library/Logs/DiagnosticReports/clang-4.0_<YYYY-MM-DD-HHMMSS>_<hostname>.crash
clang-4.0: note: diagnostic msg: (choose the .crash file that corresponds to your crash)
Malcolm Parsons [Thu, 17 Nov 2016 17:52:58 +0000 (17:52 +0000)]
Use unique_ptr for cached tokens for default arguments in C++.
Summary:
This changes pointers to cached tokens for default arguments in C++ from raw pointers to unique_ptrs. There was a fixme in the code where the cached tokens are created about using a smart pointer.
The change is straightforward, though I did have to track down and fix a memory corruption caused by the change. memcpy was being used to copy parameter information. This duplicated the unique_ptr, which led to the cached token buffer being deleted prematurely.
Sema: correct typo correction for ivars in @implementation
The previous typo correction handling assumed that ivars are only declared in
the interface declaration rather than as a private ivar in the implementation.
Adjust the handling to permit both interfaces. Assert earlier that the
interface has been acquired to ensure that we can identify when both possible
casts have failed.
Benjamin Kramer [Thu, 17 Nov 2016 15:22:36 +0000 (15:22 +0000)]
Link include-fixer into libclang if clang-tools-extra is checked out.
include-fixer only slightly bloats the size of libclang, but since
libclang has no explicit plugin mechanism it's the only way of getting
this to work. Clang-tidy is already there and so far there weren't many
complaints ;)
This is designed to be easy to remove again if libclang ever grows
proper plugin support.
Alexey Bataev [Thu, 17 Nov 2016 15:12:05 +0000 (15:12 +0000)]
[OPENMP] Fixed codegen for 'omp cancel' construct.
If 'omp cancel' construct is used in a worksharing construct it may
cause hanging of the software in case if reduction clause is used. Patch fixes this problem by avoiding extra reduction processing for branches that were canceled.
Richard Smith [Thu, 17 Nov 2016 02:16:09 +0000 (02:16 +0000)]
Remove -Wsigned-enum-bitfield from -Wmost. On a wide set of ABIs, this warning
is completely irrelevant, producing (effectively) false positives, and -Wmost
is used pretty widely. We should somehow turn it back on by default when
targeting the MS ABI, however, since it indicates the program will not do as
intended in those cases.
(Or perhaps we should just treat enum bitfields as having the signedness of the
enum, even when targeting the MS ABI...)
[Sema] Fix a bug in enable_if condition instantiation.
During template instantiation, we currently fall back to just calling
Sema::SubstExpr for enable_if attributes that aren't value-dependent or
type-dependent. Since Sema::SubstExpr strips off any implicit casts
we've added to an expression, it's possible that this behavior will
leave us with an enable_if condition that's just a DeclRefExpr.
Conditions like that deeply confuse Sema::CheckEnableIf.
Ivan Krasin [Thu, 17 Nov 2016 00:39:48 +0000 (00:39 +0000)]
Insert a type check before reading vtable.
Summary:
this is to prevent a situation when a pointer is invalid or null,
but we get to reading from vtable before we can check that
(possibly causing a segfault without a good diagnostics).
Reid Kleckner [Wed, 16 Nov 2016 23:40:00 +0000 (23:40 +0000)]
Add warning when assigning enums to bitfields without an explicit unsigned underlying type
Summary:
Add a warning when assigning enums to bitfields without an explicit
unsigned underlying type. This is to prevent problems with MSVC
compatibility, since the Microsoft ABI defaults to storing enums with a
signed type, causing inconsistencies with saving to/reading from
bitfields.
Also disabled the warning in the dr0xx.cpp test which throws the error,
and added a test for the warning.
The warning can be disabled with -Wno-signed-enum-bitfield.
Sean Callanan [Wed, 16 Nov 2016 18:21:00 +0000 (18:21 +0000)]
Fixed layout of test/ASTMerge.
As outlined in a previous RFC, the test/ASTMerge/Inputs folder is getting full and the tests are starting to become interdependent. This is undesirable because
- it makes it harder to write new tests
- it makes it harder to figure out at a glance what old tests are doing, and
- it adds the risk of breaking one test while changing a different one, because of the interdependencies.
To fix this, according to the conversation in the RFC, I have changed the layout from
a.c
Inputs/a1.c
Inputs/a2.c
to
a/test.c
a/Inputs/a1.c
a/Inputs/a2.c
for all existing tests. I have also eliminated interdependencies by replicating the input files for each test that uses them.
Simon Pilgrim [Wed, 16 Nov 2016 09:27:40 +0000 (09:27 +0000)]
[X86][AVX512] Replace lossless i32/u32 to f64 conversion intrinsics with generic IR
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.
This patch removes the clang builtins and their use in the headers - a future patch will deal with removing the llvm intrinsics.
This is an extension patch to D20528 which dealt with the equivalent sse/avx cases.
Steven Wu [Wed, 16 Nov 2016 06:06:44 +0000 (06:06 +0000)]
[Driver] Infer the correct option to ld64 for -fembed-bitcode
Summary:
-fembed-bitcode infers -bitcode_bundle to ld64 but it is not correctly
passed when using LTO. LTO is a special case of -fembed-bitcode which
it doesn't require embed the bitcode in a special section in the object
file but it requires linker to save that as part of the final executable.
Richard Smith [Wed, 16 Nov 2016 00:57:23 +0000 (00:57 +0000)]
Outline evaluation of calls to builtins to avoid inflating stack usage for the
common case of a call to a non-builtin, particularly for unoptimized ASan
builds (where the per-variable stack usage can be quite high).
Artem Dergachev [Tue, 15 Nov 2016 22:22:57 +0000 (22:22 +0000)]
[analyzer] NumberObjectConversion: Workaround for a linker error with modules.
A combination of C++ modules, variadic functions with more than one argument,
and const globals in headers (all three being necessary) causes some releases
of clang to misplace the matcher objects, which causes the linker to fail.
No functional change - the extra allOf() matcher is no-op here.
The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.
Devin Coughlin [Tue, 15 Nov 2016 18:40:46 +0000 (18:40 +0000)]
[analyzer] Add check for when block is called with too few arguments.
The CallAndMessageChecker has an existing check for when a function pointer
is called with too few arguments. Extend this logic to handle the block
case, as well. While we're at it, do a drive-by grammar correction
("less" --> "fewer") on the diagnostic text.
Tony Jiang [Tue, 15 Nov 2016 14:30:56 +0000 (14:30 +0000)]
[PowerPC] Implement BE VSX load/store builtins - clang portion.
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.
Michal Gorny [Tue, 15 Nov 2016 12:54:10 +0000 (12:54 +0000)]
[test] Correctly include build llvm_shlib_dir in stand-alone builds
Add the build llvm_shlib_dir into LD_LIBRARY_PATH before the directory
specified as llvm_libs_dir, in order to fix stand-alone builds
attempting to use installed clang libraries.
In case of stand-alone builds llvm_libs_dir specifies the location of
installed LLVM libraries which can also contain an older version
(previous build) of clang libraries. Therefore, ensure to specify
llvm_shlib_dir (which is always the build tree path) before
the potentially-system llvm_libs_dir.
Alexey Bataev [Tue, 15 Nov 2016 09:11:50 +0000 (09:11 +0000)]
[OPENMP] Fixed codegen for 'omp cancel' construct.
If 'omp cancel' construct is used in a worksharing construct it may cause
hanging of the software in case if reduction clause is used. Patch fixes
this problem by avoiding extra reduction processing for branches that
were canceled.
Dominic Chen [Tue, 15 Nov 2016 01:54:41 +0000 (01:54 +0000)]
[analyzer] Rename assumeWithinInclusiveRange*()
Summary: The name is slightly confusing, since the constraint is not necessarily within the range unless `Assumption` is true. Split out renaming for ConstraintManager.h from D26061
Kuba Brecka [Mon, 14 Nov 2016 22:38:57 +0000 (22:38 +0000)]
[sanitizer] Passthrough CMAKE_OSX_DEPLOYMENT_TARGET when building compiler-rt from clang/runtime/CMakeLists.txt
This breaks some Swift builds, because Swift's build scripts explicitly set CMAKE_OSX_DEPLOYMENT_TARGET. This however isn't propagated to the compiler-rt build, causing build errors.
Sean Fertile [Mon, 14 Nov 2016 18:47:15 +0000 (18:47 +0000)]
[PPC] altivec.h functions for converting half precision to single precision.
Adds 2 vector functions for converting from a vector of unsigned short to a
vector of float. One converts the low 4 halfwords and one converts the high
4 halfwords.
[OpenCL] Fix for integer parameters of enqueue_kernel
Make handling integer parameters more flexible:
- For the number of events argument allow to pass larger
integers than 32 bits as soon as compiler can prove that
the range fits in 32 bits. If not, the diagnostic will be given.
- Change type of the arguments specifying the sizes of
the corresponding block arguments to be size_t.
[OpenCL] Change to clk_event parameter in enqueue_kernel.
- Accept NULL pointer as a valid parameter value for clk_event.
- Generate clk_event_t arguments of internal
__enqueue_kernel_XXX function as pointers in generic address space.
Sean Fertile [Mon, 14 Nov 2016 14:43:27 +0000 (14:43 +0000)]
[PPC] add extract sig/exp test data class for vec float and vec double.
Add vector extract exponent/significand functions to altivec.h, as well as
functions (and related constants) to test the data class of vector float
and vector double.
[OpenCL] always use SPIR address spaces for kernel_arg_addr_space MD
It doesn't make sense to use the target's address space ids in this context as
this is metadata that should be referring to the "logical" OpenCL address spaces.
For flat AS machines like all "CPUs" in general, the logical AS info gets lost as
there's only one address space (0).
This commit changes the logic such that we always use the SPIR address space
ids for the argument metadata. It thus allows implementing the clGetKernelArgInfo()
and the other detection needs.