Craig Topper [Fri, 14 Dec 2018 00:21:02 +0000 (00:21 +0000)]
[Builltins][X86] Provide implementations of __lzcnt16, __lzcnt, __lzcnt64 for MS compatibility. Remove declarations from intrin.h and implementations from lzcntintrin.h
intrin.h had forward declarations for these and lzcntintrin.h had implementations that were only available with -mlzcnt or a -march that supported the lzcnt feature.
For MS compatibility we should always have these builtins available regardless of X86 being the target or the CPU support the lzcnt instruction. The backends should be able to gracefully fallback to something support even if its just shifts and bit ops.
Unfortunately, gcc also implements 2 of the 3 function names here on X86 when lzcnt feature is enabled.
This patch adds builtins for these for MSVC compatibility and drops the forward declarations from intrin.h. To keep the gcc compatibility the two intrinsics that collided have been turned into macros that use the X86 specific builtins with the lzcnt feature check. These macros are only defined when _MSC_VER is not defined. Without them being macros we can get a redefinition error because -ms-extensions doesn't seem to set _MSC_VER but does make the MS builtins available.
Artem Belevich [Thu, 13 Dec 2018 21:43:04 +0000 (21:43 +0000)]
[CUDA] Make all host-side shadows of device-side variables undef.
The host-side code can't (and should not) access the values that may
only exist on the device side. E.g. address of a __device__ function
does not exist on the host side as we don't generate the code for it there.
Aaron Ballman [Thu, 13 Dec 2018 20:55:34 +0000 (20:55 +0000)]
Update the scan-build to generate SARIF.
This updates the scan-build perl script to allow outputting to sarif in a more natural fashion by specifying -sarif as a command line argument, similar to how -plist is already supported.
Adrian Prantl [Thu, 13 Dec 2018 17:53:29 +0000 (17:53 +0000)]
Reinstate DW_AT_comp_dir support after D55519.
The DIFile used by the CU is special and distinct from the main source
file. Its directory part specifies what becomes the DW_AT_comp_dir
(the compilation directory), even if the source file was specified
with an absolute path.
To support the .dwo workflow, a valid DW_AT_comp_dir is necessary even
if source files were specified with an absolute path.
Found the case in the clang codebase where the assertion fires.
To avoid crashing assertion-enabled builds before I re-add the missing
operation.
Will restore the assertion alongside the upcoming fix.
David Green [Thu, 13 Dec 2018 17:20:06 +0000 (17:20 +0000)]
Fix CodeCompleteTest.cpp for older gcc plus ccache builds
Some versions of gcc, especially when invoked through ccache (-E), can have
trouble with raw string literals inside macros. This moves the string out of
the macro.
Vitaly Buka [Thu, 13 Dec 2018 09:47:39 +0000 (09:47 +0000)]
[asan] Don't check ODR violations for particular types of globals
Summary:
private and internal: should not trigger ODR at all.
unnamed_addr: current ODR checking approach fail and rereport false violation if
a linker merges such globals
linkonce_odr, weak_odr: could cause similar problems and they are already not
instrumented for ELF.
Artem Dergachev [Thu, 13 Dec 2018 01:30:47 +0000 (01:30 +0000)]
[analyzer] RunLoopAutoreleaseLeakChecker: Come up with a test for r348822.
Statement memoization was removed in r348822 because it was noticed to cause
memory corruption. This was happening because a reference to an object
in a DenseMap was used after being invalidated by inserting a new key
into the map.
This test case crashes reliably under ASan (i.e., when Clang is built with
-DLLVM_USE_SANITIZER="Address") on at least some machines before r348822
and doesn't crash after it.
Haibo Huang [Wed, 12 Dec 2018 22:04:12 +0000 (22:04 +0000)]
Declares __cpu_model as dso local
__builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide cpu features. Before this change, __cpu_model was not declared as dso local. The generated code looks up the address in GOT when reading __cpu_model. This makes it impossible to use these functions in ifunc, because at that time GOT entries have not been relocated. This change makes it dso local.
Eric Fiselier [Wed, 12 Dec 2018 21:50:55 +0000 (21:50 +0000)]
[AST] Store "UsesADL" information in CallExpr.
Summary:
Currently the Clang AST doesn't store information about how the callee of a CallExpr was found. Specifically if it was found using ADL.
However, this information is invaluable to tooling. Consider a tool which renames usages of a function. If the originally CallExpr was formed using ADL, then the tooling may need to additionally qualify the replacement.
Without information about how the callee was found, the tooling is left scratching it's head. Additionally, we want to be able to match ADL calls as quickly as possible, which means avoiding computing the answer on the fly.
This patch changes `CallExpr` to store whether it's callee was found using ADL. It does not change the size of any AST nodes.
Erich Keane [Wed, 12 Dec 2018 20:30:53 +0000 (20:30 +0000)]
Teach __builtin_unpredictable to work through implicit casts.
The __builtin_unpredictable implementation is confused by any implicit
casts, which happen in C++. This patch strips those off so that
if/switch statements now work with it in C++.
Erich Keane [Wed, 12 Dec 2018 18:11:36 +0000 (18:11 +0000)]
Change CallGraph print to show the fully qualified name
CallGraph previously would just show the normal name of a function,
which gets really confusing when using it on large C++ projects. This
patch switches the printName call to a printQualifiedName, so that the
namespaces are included.
When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.
and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.
This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,
defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.
Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.
For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.
Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.
To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.
With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).
Erich Keane [Wed, 12 Dec 2018 17:22:52 +0000 (17:22 +0000)]
Make clang::CallGraph look into template instantiations
Clang's CallGraph analysis doesn't use the RecursiveASTVisitor's setting
togo into template instantiations. The result is that anything wanting
to do call graph analysis ends up missing any template function calls.
Basic: make `int_least64_t` and `int_fast64_t` match on Darwin
The Darwin targets use `int64_t` and `uint64_t` to define the `int_least64_t`
and `int_fast64_t` types. The underlying type is actually a `long long`. Match
the types to allow the printf specifiers to work properly and have the compiler
vended macros match the implementation on the target.
Hubert Tong [Wed, 12 Dec 2018 16:53:43 +0000 (16:53 +0000)]
[ExprConstant] Improve memchr/memcmp for type mismatch and multibyte element types
Summary:
`memchr` and `memcmp` operate upon the character units of the object
representation; that is, the `size_t` parameter expresses the number of
character units. The constant folding implementation is updated in this
patch to account for multibyte element types in the arrays passed to
`memchr`/`memcmp` and, in the case of `memcmp`, to account for the
possibility that the arrays may have differing element types (even when
they are byte-sized).
Actual inspection of the object representation is not implemented.
Comparisons are done only between elements with the same object size;
that is, `memchr` will fail when inspecting at least one character unit
of a multibyte element. The integer types are assumed to have two's
complement representation with 0 for `false`, 1 for `true`, and no
padding bits.
`memcmp` on multibyte elements will only be able to fold in cases where
enough elements are equal for the answer to be 0.
Various tests are added to guard against incorrect folding for cases
that miscompile on some system or other prior to this patch. At the same
time, the unsigned 32-bit `wchar_t` testing in
`test/SemaCXX/constexpr-string.cpp` is restored.
Alexey Bataev [Wed, 12 Dec 2018 14:52:27 +0000 (14:52 +0000)]
[CUDA][OPENMP][NVPTX]Improve logic of the debug info support.
Summary:
Added support for the -gline-directives-only option + fixed logic of the
debug info for CUDA devices. If optimization level is O0, then options
--[no-]cuda-noopt-device-debug do not affect the debug info level. If
the optimization level is >O0, debug info options are used +
--no-cuda-noopt-device-debug is used or no --cuda-noopt-device-debug is
used, the optimization level for the device code is kept and the
emission of the debug directives is used.
If the opt level is > O0, debug info is requested +
--cuda-noopt-device-debug option is used, the optimization is disabled
for the device code + required debug info is emitted.
Fangrui Song [Wed, 12 Dec 2018 08:02:18 +0000 (08:02 +0000)]
Add explicit dependency on clangSerialization for a bunch of components to fix -DBUILD_SHARED_LIBS=on build
This is a more thorough fix of rC348911.
The story about -DBUILD_SHARED_LIBS=on build after rC348907 (Move PCHContainerOperations from Frontend to Serialization) is:
1. libclangSerialization.so defines PCHContainerReader dtor, ...
2. clangFrontend and clangTooling define classes inheriting from PCHContainerReader, thus their DSOs have undefined references on PCHContainerReader dtor
3. Components depending on either clangFrontend or clangTooling cannot be linked unless they have explicit dependency on clangSerialization due to the default linker option -z defs. The explicit dependency could be avoided if libclang{Frontend,Tooling}.so had these undefined references.
This patch adds the explicit dependency on clangSerialization to make them build.
Erich Keane [Tue, 11 Dec 2018 21:54:52 +0000 (21:54 +0000)]
Replace Const-Member checking with non-recursive version.
As reported in PR39946, these two implementations cause stack overflows
to occur when a type recursively contains itself. While this only
happens when an incomplete version of itself is used by membership (and
thus an otherwise invalid program), the crashes might be surprising.
The solution here is to replace the recursive implementation with one
that uses a std::vector as a queue. Old values are kept around to
prevent re-checking already checked types.
Aaron Ballman [Tue, 11 Dec 2018 19:30:49 +0000 (19:30 +0000)]
Stop stripping comments from AST matcher example code.
The AST matcher documentation dumping script was being a bit over-zealous about stripping comment markers, which ended up causing comments in example code to stop being comments. Fix that by only stripping comments at the start of a line, rather than removing any forward slash (which also impacts prose text).
Aaron Ballman [Tue, 11 Dec 2018 19:18:01 +0000 (19:18 +0000)]
Emit -Wformat properly for bit-field promotions.
Only explicitly look through integer and floating-point promotion where the result type is actually a promotion, which is not always the case for bit-fields in C.
- explicit_bzero has limited scope/usage only for security/crypto purposes but is non-optimisable version of memset/0 and bzero.
- explicit_memset has similar signature and semantics as memset but is also a non-optimisable version.
Adrian Prantl [Tue, 11 Dec 2018 16:58:43 +0000 (16:58 +0000)]
Reuse code from CGDebugInfo::getOrCreateFile() when creating the file
for the DICompileUnit.
This addresses post-commit feedback for D55085. Without this patch, a
main source file with an absolute paths may appear in different
DIFiles, once with the absolute path and once with the common prefix
between the absolute path and the current working directory.
George Karpenkov [Tue, 11 Dec 2018 01:14:17 +0000 (01:14 +0000)]
[analyzer] Remove memoization from RunLoopAutoreleaseLeakChecker
Memoization dose not seem to be necessary, as other statement visitors
run just fine without it,
and in fact seems to be causing memory corruptions.
Just removing it instead of investigating the root cause.
Fangrui Song [Mon, 10 Dec 2018 18:10:35 +0000 (18:10 +0000)]
ComputeLineNumbers: delete SSE2 vectorization
Summary:
SSE2 vectorization was added in 2012, but it is 2018 now and I can't
observe any performance boost (testing clang -E [all Sema/* CodeGen/* with proper -I options]) with the existing _mm_movemask_epi8+countTrailingZeros or the following SSE4.2 (compiling with -msse4.2):
Michael Kruse [Mon, 10 Dec 2018 15:16:37 +0000 (15:16 +0000)]
Use zip_longest for iterator range comparisons. NFC.
Use zip_longest in two locations that compare iterator ranges.
zip_longest allows the iteration using a range-based for-loop and to be
symmetric over both ranges instead of prioritizing one over the other.
In that latter case code have to handle the case that the first is
longer than the second, the second is longer than the first, and both
are of the same length, which must partially be checked after the loop.
With zip_longest, this becomes an element comparison within the loop
like the comparison of the elements themselves. The symmetry makes it
clearer that neither the first and second iterators are handled
differently. The iterators are not event used directly anymore, just
the ranges.
[OpenCL][CodeGen] Fix replacing memcpy with addrspacecast
Summary:
If a function argument is byval and RV is located in default or alloca address space
an optimization of creating addrspacecast instead of memcpy is performed. That is
not correct for OpenCL, where that can lead to a situation of address space casting
from __private * to __global *. See an example below:
Craig Topper [Mon, 10 Dec 2018 06:07:59 +0000 (06:07 +0000)]
[X86] Remove the addcarry builtins. Leaving only the addcarryx builtins since that matches gcc.
The addcarry and addcarryx builtins do the same thing. The only difference is that addcarryx previously required adx feature.
This commit removes the adx feature check from addcarryx and removes the addcarry builtin. This matches the builtins that gcc has. We don't guarantee compatibility in builtins, but we generally try to be consistent if its not a burden.
Stephen Kelly [Sun, 9 Dec 2018 13:33:30 +0000 (13:33 +0000)]
NFC: Rename TemplateDecl dump utilities
There is a clang::TemplateDecl AST type, so a method called
VisitTemplateDecl looks like it should 'override' the method from the
base visitor, but it does not because of the extra parameters it takes.
In reality, these methods are utilities, so name them like utilities.
Pete Cooper [Sat, 8 Dec 2018 05:13:50 +0000 (05:13 +0000)]
Convert some ObjC msgSends to runtime calls.
It is faster to directly call the ObjC runtime for methods such as alloc/allocWithZone instead of sending a message to those functions.
This patch adds support for converting messages to alloc/allocWithZone to their equivalent runtime calls.
Tests included for the positive case of applying this transformation, negative tests that we ensure we only convert "alloc" to objc_alloc, not "alloc2", and also a driver test to ensure we enable this only for supported runtime versions.
Richard Trieu [Sat, 8 Dec 2018 05:05:03 +0000 (05:05 +0000)]
Move diagnostic enums into Basic.
Move enums from */*Diagnostic.h to Basic/Diagnostic*.h. Basic/AllDiagnostics.h
needs all the enums and moving the sources to Basic prevents a Basic->*->Basic
dependency loop. This also allows each Basic/Diagnostics*Kinds.td to have a
header at Basic/Diagnostic*.h (except for Common). The old headers are kept in place since other packages are still using them.
Stop tracking retain count of OSObject after escape to void * / other primitive types
Escaping to void * / uint64_t / others non-OSObject * should stop tracking,
as such functions can have heterogeneous semantics depending on context,
and can not always be annotated.
[tests] Fix the FileManagerTest getVirtualFile test on Windows
Summary: The test passes on Windows only when it is executed on the C: drive. If the build and tests run on a different drive, the test is currently failing.
[Preprocessor] Don't avoid entering included files after hitting a fatal error.
Change in r337953 violated the contract for `CXTranslationUnit_KeepGoing`:
> Do not stop processing when fatal errors are encountered.
Use different approach to fix long processing times with multiple inclusion
cycles. Instead of stopping preprocessing for fatal errors, do this after
reaching the max allowed include depth and only for the files that were
processed already. It is likely but not guaranteed those files cause a cycle.