Matthew Voss [Tue, 12 Jun 2018 22:22:35 +0000 (22:22 +0000)]
[analyzer] Ensure that loop widening does not invalidate references
Loop widening can invalidate a reference. If the analyzer attempts to visit the
destructor to a non-existent reference, it will crash. This patch ensures that
the reference is preserved.
George Karpenkov [Tue, 12 Jun 2018 20:51:19 +0000 (20:51 +0000)]
[analyzer] [NFC] Remove "removeInvalidation" from visitor API
removeInvalidation is a very problematic API, as it makes suppression
order-dependent.
Moreover, it was used only once, and could be rewritten in a much
cleaner way.
George Karpenkov [Tue, 12 Jun 2018 20:51:01 +0000 (20:51 +0000)]
[analyzer] [NFC] Move ::dump methods from BugReporter.cpp to PathDiagnostics.cpp
BugReporter.cpp is already severely overloaded, and those dump methods
are on PathDiagnostics and should belong in the corresponding
implementation file.
George Karpenkov [Tue, 12 Jun 2018 20:50:44 +0000 (20:50 +0000)]
[analyzer] [NFC] Remove most usages of getEndPath
getEndPath is a problematic API, because it's not clear when it's called
(hint: not always at the end of the path), it crashes at runtime with
more than one non-nullptr returning implementation, and diagnostics
internal depend on it being called at some exact place.
However, most visitors don't actually need that: all they want is a
function consistently called after all nodes are traversed, to perform
finalization and to decide whether invalidation is needed.
Petr Hosek [Tue, 12 Jun 2018 20:00:50 +0000 (20:00 +0000)]
[AArch64] Support reserving x20 register
Register x20 is a callee-saved register which may be used for other
purposes in certain contexts, for example to hold special variables
within the kernel. This change adds support for reserving this register
both to frontend and backend to make this register usable for these
purposes.
[clang-format] Fix crash while reflowing backslash in comments
Summary:
The added test case was currently crashing with an assertion:
```
krasimir@krasimir> cat test.cc ~
// How to run:
// bbbbb run \
// rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr \
// <log_file> -- --output_directory="<output_directory>"
krasimir@krasimir> ~/work/llvm-build/bin/clang-format test.cc ~
clang-format: /usr/local/google/home/krasimir/work/llvm/tools/clang/lib/Format/WhitespaceManager.cpp:117: void clang::format::WhitespaceManager::calculateLineBreakInformation(): Assertion `PreviousOriginalWhitespaceEndOffset <= OriginalWhitespaceStartOffset' failed.
```
The root cause was that BreakableToken was not considering the case of a reflow between an unescaped newline in a line comment.
George Karpenkov [Tue, 12 Jun 2018 19:08:00 +0000 (19:08 +0000)]
[analyzer] [NFC] Unify Minimal and Extensive diagnostics.
Once we removed AlternateExtensive, I've looked closer into the
difference between Minimal and Extensive, and turns out, the difference
was not that large.
Rename AlternateExtensive to Extensive.
In 2013, five years ago, we have switched to AlternateExtensive
diagnostics by default, and Extensive was available under unused,
undocumented flag.
This change remove the flag, renames the Alternate
diagnostic to Extensive (as it's no longer Alternate), and ports the
test.
Zachary Turner [Tue, 12 Jun 2018 17:43:52 +0000 (17:43 +0000)]
Refactor ExecuteAndWait to take StringRefs.
This simplifies some code which had StringRefs to begin with, and
makes other code more complicated which had const char* to begin
with.
In the end, I think this makes for a more idiomatic and platform
agnostic API. Not all platforms launch process with null terminated
c-string arrays for the environment pointer and argv, but the api
was designed that way because it allowed easy pass-through for
posix-based platforms. There's a little additional overhead now
since on posix based platforms we'll be takign StringRefs which
were constructed from null terminated strings and then copying
them to null terminate them again, but from a readability and
usability standpoint of the API user, I think this API signature
is strictly better.
[clang-format] Discourage breaks in submessage entries, hard rule
Summary:
Currently clang-format allows this for text protos:
```
submessage:
{ key: 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' }
```
when it is under the column limit and when putting it all on one line exceeds the column limit.
This is not a very intuitive formatting, so I'd prefer having
```
submessage: {
key: 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
}
```
instead, even if it takes one line more.
This patch prevents clang-format from inserting a break between `: {` and similar cases.
Erich Keane [Tue, 12 Jun 2018 13:59:32 +0000 (13:59 +0000)]
Fix overload resolution between Ptr-To-Member and Bool
As reported here (https://bugs.llvm.org/show_bug.cgi?id=19808)
and discovered independently when looking at plum-hall tests,
we incorrectly implemented over.ics.rank, which says "A conversion
that is not a conversion of a pointer, or pointer to member, to bool
is better than another conversion that is such a conversion.".
In the current Draft (N4750), this is phrased slightly differently in
paragraph 4.1: A conversion that does not convert a pointer, a pointer
to member, or std::nullptr_t to bool is better than one that does.
The comment on isPointerConversionToBool (the changed function)
also confirms that this is the case (note outdated reference):
isPointerConversionToBool - Determines whether this conversion is
a conversion of a pointer or pointer-to-member to bool. This is
used as part of the ranking of standard conversion sequences
(C++ 13.3.3.2p4).
However, despite this comment, it didn't check isMemberPointerType
on the 'FromType', presumably incorrectly assuming that 'isPointerType'
matched it. This patch fixes this by adding isMemberPointerType to
this function. Additionally, member function pointers are just
MemberPointerTypes that point to functions insted of data, so that
is fixed in this patch as well.
Luke Geeson [Tue, 12 Jun 2018 09:54:27 +0000 (09:54 +0000)]
[AArch64] Corrected FP16 Intrinsic range checks in Clang + added Sema tests
Summary:
This fixes the ranges for the vcvth family of FP16 intrinsics in the clang front end. Previously it was accepting incorrect ranges
-Changed builtin range checking in SemaChecking
-added tests SemaCheck changes - included in their own file since no similar one exists
-modified existing tests to reflect new ranges
Raphael Isemann [Tue, 12 Jun 2018 03:43:21 +0000 (03:43 +0000)]
Fix that AlignedAllocation.h doesn't compile because of VersionTuple
Summary:
rL334399 put VersionTuple in the llvm namespace, but this header still assumes it's in the clang namespace.
This leads to compilation failures with enabled modules when building Clang.
Yaxun Liu [Tue, 12 Jun 2018 00:16:33 +0000 (00:16 +0000)]
[CUDA][HIP] Set kernel calling convention before arrange function
Currently clang set kernel calling convention for CUDA/HIP after
arranging function, which causes incorrect kernel function type since
it depends on calling convention.
This patch moves setting kernel convention before arranging
function.
Petr Hosek [Mon, 11 Jun 2018 22:06:44 +0000 (22:06 +0000)]
[CMake] Use libc++ and compiler-rt for bootstrap Fuchsia Clang
We want to build the second stage compiler with libc++ and compiler-rt,
also include builtins and runtimes into extra bootstrap components to
ensure these get built.
Reid Kleckner [Mon, 11 Jun 2018 16:49:43 +0000 (16:49 +0000)]
[MS] Use mangled names and comdats for string merging with ASan
This should reduce the binary size penalty of ASan on Windows. After
r334313, ASan will add red zones to globals in comdats, so we will still
find OOB accesses to string literals.
Martin Probst [Mon, 11 Jun 2018 16:20:13 +0000 (16:20 +0000)]
clang-format: [JS] strict prop init annotation.
Summary:
TypeScript uses the `!` token for strict property initialization
assertions, as in:
class X {
strictPropAsserted!: string;
}
Previously, clang-format would wrap between the `!` and the `:` for
overly long lines. This patch fixes that by generally preventing the
wrap in that location.
Summary:
This option replaces the BreakBeforeInheritanceComma option with an
enum, thus introducing a mode where the colon stays on the same line as
constructor declaration:
// When it fits on line:
class A : public B, public C {
...
};
// When it does not fit:
class A :
public B,
public C {
...
};
This matches the behavior of the `BreakConstructorInitializers` option,
introduced in https://reviews.llvm.org/D32479.
[clang-format] text protos: put entries on separate lines if there is a submessage
Summary:
This patch updates clang-format text protos to put entries of a submessage into separate lines if the submessage contains at least two entries and contains at least one submessage entry.
For example, the entries here are kept on separate lines even if putting them on a single line would be under the column limit:
```
message: {
entry: 1
submessage: { key: value }
}
```
Messages containing a single submessage or several scalar entries can still be put on one line if they fit:
```
message { submessage { key: value } }
message { x: 1 y: 2 z: 3 }
```
Pavel Labath [Mon, 11 Jun 2018 10:28:04 +0000 (10:28 +0000)]
Move VersionTuple from clang/Basic to llvm/Support
Summary:
This kind of functionality is useful to other project apart from clang.
LLDB works with version numbers a lot, but it does not have a convenient
abstraction for this. Moving this class to a lower level library allows
it to be freely used within LLDB.
Since this class is used in a lot of places in clang, and it used to be
in the clang namespace, it seemed appropriate to add it to the list of
adopted classes in LLVM.h to avoid prefixing all uses with "llvm::".
Also, I didn't find any tests specific for this class, so I wrote a
couple of quick ones for the more interesting bits of functionality.
Craig Topper [Sun, 10 Jun 2018 17:27:05 +0000 (17:27 +0000)]
[X86] Use target independent masked expandload and compressstore intrinsics to implement expandload/compressstore builtins.
Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.
Reka Kovacs [Sat, 9 Jun 2018 21:08:27 +0000 (21:08 +0000)]
[analyzer] Clean up the program state map of DanglingInternalBufferChecker.
Symbols are cleaned up from the program state map when they go out of scope.
Memory regions are cleaned up when the corresponding object is destroyed, and
additionally in 'checkDeadSymbols' in case destructor modeling was incomplete.
Reka Kovacs [Sat, 9 Jun 2018 13:03:49 +0000 (13:03 +0000)]
[analyzer] Add dangling internal buffer check.
This check will mark raw pointers to C++ standard library container internal
buffers 'released' when the objects themselves are destroyed. Such information
can be used by MallocChecker to warn about use-after-free problems.
In this first version, 'std::basic_string's are supported.
Craig Topper [Sat, 9 Jun 2018 00:30:45 +0000 (00:30 +0000)]
Use SmallPtrSet instead of SmallSet in places where we iterate over the set.
SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work.
These places were found by hiding the begin/end methods in the SmallSet forwarding.
Craig Topper [Fri, 8 Jun 2018 22:19:42 +0000 (22:19 +0000)]
[X86] Add avx512 feature flags to __builtin_ia32_select*.
There are many masked intrinsics that just wrap a select around a legacy intrinsic from a pre-avx512 instruciton set. If that intrinsic is implemented as a macro, nothing prevents it from being used when only the older feature was enabled. This likely generates very poor code since we don't have a good way to convert from the scalar masked type used by the intrinsic into a vector control for a legacy blend instruction. If we even have a blend instruction to use.
By adding a feature to the select builtins we can prevent and diagnose misuse of these intrinsics.
Craig Topper [Fri, 8 Jun 2018 21:50:08 +0000 (21:50 +0000)]
[X86] Add back some masked vector truncate builtins. Custom IRgen a a few others.
I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl fature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl.
By using special buitlins we can emit a select without using the 128/256-bit select builtins.
Jonas Hahnfeld [Fri, 8 Jun 2018 11:17:08 +0000 (11:17 +0000)]
[CUDA] Fix emission of constant strings in sections
CGM.GetAddrOfConstantCString() sets the adress of the created GlobalValue
to unnamed. When emitting the object file LLVM will mark the surrounding
section as SHF_MERGE iff the string is nul-terminated and contains no
other nuls (see IsNullTerminatedString). This results in problems when
saving temporaries because LLVM doesn't set an EntrySize, so reading in
the serialized assembly file fails.
This never happened for the GPU binaries because they usually contain
a nul-character somewhere. Instead this only affected the module ID
when compiling relocatable device code.
However, this points to a potentially larger problem: If we put a
constant string into a named section, we really want the data to end
up in that section in the object file. To avoid LLVM merging sections
this patch unmarks the GlobalVariable's address as unnamed which also
fixes the problem of invalid serialized assembly files when saving
temporaries.
Craig Topper [Fri, 8 Jun 2018 03:24:47 +0000 (03:24 +0000)]
[X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.
Shoaib Meenai [Fri, 8 Jun 2018 00:41:01 +0000 (00:41 +0000)]
[CodeGen] Always use MSVC personality for windows-msvc targets
The windows-msvc target is meant to be ABI compatible with MSVC,
including the exception handling. Ensure that a windows-msvc triple
always equates to the MSVC personality being used.
This mostly affects the GNUStep and ObjFW Obj-C runtimes. To the best of
my knowledge, those are normally not used with windows-msvc triples. I
believe WinObjC is based on GNUStep (or it at least uses libobjc2), but
that also takes the approach of wrapping Obj-C exceptions in C++
exceptions, so the MSVC personality function is the right one to use
there as well.
Matt Morehouse [Fri, 8 Jun 2018 00:33:35 +0000 (00:33 +0000)]
[clang-fuzzer] Made loop_proto more "vectorizable".
Edited loop_proto and its converter to make more "vectorizable" code
according to kcc's comment in D47666
- Removed all while loops
- Can only index into array with induction variable
Shoaib Meenai [Fri, 8 Jun 2018 00:30:00 +0000 (00:30 +0000)]
Reapply "[Parse] Use CapturedStmt for @finally on MSVC"
This reapplies r334224 and adds explicit triples to some tests to fix
them on Windows (where otherwise they would have run with the default
windows-msvc triple, which I'm changing the behavior for).
Original commit message:
The body of a `@finally` needs to be executed on both exceptional and
non-exceptional paths. On landingpad platforms, this is straightforward:
the `@finally` body is emitted as a normal (non-exceptional) cleanup,
and then a catch-all is emitted which branches to that cleanup (the
cleanup has code to conditionally re-throw based on a flag which is set
by the catch-all).
Unfortunately, we can't use the same approach for MSVC exceptions, where
the catch-all will be emitted as a catchpad. We can't just branch to the
cleanup from within the catchpad, since we can only exit it via a
catchret, at which point the exception is destroyed and we can't
rethrow. We could potentially emit the finally body inside the catchpad
and have the normal cleanup path somehow branch into it, but that would
require some new IR construct that could branch into a catchpad.
Instead, after discussing it with Reid Kleckner, we decided that
frontend outlining was the best approach, similar to how SEH `__finally`
works today. We decided to use CapturedStmt (which was also suggested by
Reid) rather than CaptureFinder (which is what `__finally` uses) since
the latter doesn't handle a lot of cases we care about, e.g. self
accesses, property accesses, block captures, etc. Extending
CaptureFinder to handle those additional cases proved unwieldy, whereas
CapturedStmt already took care of all of those. In theory `__finally`
could also be moved over to CapturedStmt, which would remove some
existing limitations (e.g. the inability to capture this), although
CaptureFinder would still be needed for SEH filters.
The one case supported by `@finally` but not CapturedStmt (or
CaptureFinder for that matter) is arbitrary control flow out of the
`@finally`, e.g. having a return statement inside a `@finally`. We can
add that support as a follow-up, but in practice we've found it to be
used very rarely anyway.
Shoaib Meenai [Thu, 7 Jun 2018 22:54:54 +0000 (22:54 +0000)]
[Frontend] Disallow non-MSVC exception models for windows-msvc targets
The windows-msvc target is used for MSVC ABI compatibility, including
the exceptions model. It doesn't make sense to pair a windows-msvc
target with a non-MSVC exception model. This would previously cause an
assertion failure; explicitly error out for it in the frontend instead.
This also allows us to reduce the matrix of target/exception models a
bit (see the modified tests), and we can possibly simplify some of the
personality code in a follow-up.
Shoaib Meenai [Thu, 7 Jun 2018 22:24:20 +0000 (22:24 +0000)]
Revert "[Parse] Use CapturedStmt for @finally on MSVC"
This reverts commit r334224.
This is causing buildbot failures on Windows, presumably because some
tests don't specify a triple. I'll test this on Windows locally and
recommit with the tests fixed.
Reid Kleckner [Thu, 7 Jun 2018 21:39:04 +0000 (21:39 +0000)]
[MS] Re-add support for the ARM interlocked bittest intrinscs
Adds support for these intrinsics, which are ARM and ARM64 only:
_interlockedbittestandreset_acq
_interlockedbittestandreset_rel
_interlockedbittestandreset_nf
_interlockedbittestandset_acq
_interlockedbittestandset_rel
_interlockedbittestandset_nf
Refactor the bittest intrinsic handling to decompose each intrinsic into
its action, its width, and its atomicity.
Craig Topper [Thu, 7 Jun 2018 21:27:41 +0000 (21:27 +0000)]
[X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking.
We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature.
I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.
Shoaib Meenai [Thu, 7 Jun 2018 20:07:52 +0000 (20:07 +0000)]
[Parse] Use CapturedStmt for @finally on MSVC
The body of a `@finally` needs to be executed on both exceptional and
non-exceptional paths. On landingpad platforms, this is straightforward:
the `@finally` body is emitted as a normal (non-exceptional) cleanup,
and then a catch-all is emitted which branches to that cleanup (the
cleanup has code to conditionally re-throw based on a flag which is set
by the catch-all).
Unfortunately, we can't use the same approach for MSVC exceptions, where
the catch-all will be emitted as a catchpad. We can't just branch to the
cleanup from within the catchpad, since we can only exit it via a
catchret, at which point the exception is destroyed and we can't
rethrow. We could potentially emit the finally body inside the catchpad
and have the normal cleanup path somehow branch into it, but that would
require some new IR construct that could branch into a catchpad.
Instead, after discussing it with Reid Kleckner, we decided that
frontend outlining was the best approach, similar to how SEH `__finally`
works today. We decided to use CapturedStmt (which was also suggested by
Reid) rather than CaptureFinder (which is what `__finally` uses) since
the latter doesn't handle a lot of cases we care about, e.g. self
accesses, property accesses, block captures, etc. Extending
CaptureFinder to handle those additional cases proved unwieldy, whereas
CapturedStmt already took care of all of those. In theory `__finally`
could also be moved over to CapturedStmt, which would remove some
existing limitations (e.g. the inability to capture this), although
CaptureFinder would still be needed for SEH filters.
The one case supported by `@finally` but not CapturedStmt (or
CaptureFinder for that matter) is arbitrary control flow out of the
`@finally`, e.g. having a return statement inside a `@finally`. We can
add that support as a follow-up, but in practice we've found it to be
used very rarely anyway.
Zachary Turner [Thu, 7 Jun 2018 19:58:58 +0000 (19:58 +0000)]
[FileSystem] Split up the OpenFlags enumeration.
This breaks the OpenFlags enumeration into two separate
enumerations: OpenFlags and CreationDisposition. The first
controls the behavior of the API depending on whether or not
the target file already exists, and is not a flags-based
enum. The second controls more flags-like values.
This yields a more easy to understand API, while also allowing
flags to be passed to the openForRead api, where most of the
values didn't make sense before. This also makes the apis more
testable as it becomes easy to enumerate all the configurations
which make sense, so I've added many new tests to exercise all
the different values.
Vitaly Buka [Thu, 7 Jun 2018 19:17:46 +0000 (19:17 +0000)]
Introducing single for loop into clang_proto_fuzzer
Summary:
Created a new protobuf and protobuf-to-C++ "converter" that wraps the entire C++ code in a single for loop.
- Slightly changed cxx_proto.proto -> cxx_loop_proto.proto
- Made some changes to proto_to_cxx files to handle the new kind of protobuf
- Created ExampleClangLoopProtoFuzzer to test new protobuf and "converter"
Craig Topper [Thu, 7 Jun 2018 17:28:03 +0000 (17:28 +0000)]
[X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics.
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.
This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.
Gabor Buella [Thu, 7 Jun 2018 08:48:36 +0000 (08:48 +0000)]
[CodeGen] Improve diagnostics related to target attributes
Summary:
When requirement imposed by __target__ attributes on functions
are not satisfied, prefer printing those requirements, which
are explicitly mentioned in the attributes.
This makes such messages more useful, e.g. printing avx512f instead of avx2
in the following scenario:
int main(void)
{
x();
}
$ clang foo.c
foo.c:7:2: error: always_inline function 'x' requires target feature 'avx2', but would be inlined into function 'main' that is compiled without support for 'avx2'
x();
^
1 error generated.
```
Richard Trieu [Thu, 7 Jun 2018 03:20:30 +0000 (03:20 +0000)]
Change return value of trivial visibility check.
Previous, if no Decl's were checked, visibility was set to false. Switch it
so that in cases of no Decl's, return true. These are the Decl's after being
filtered. Also remove an unreachable return statement since it is directly
after another return statement.
Craig Topper [Thu, 7 Jun 2018 02:46:02 +0000 (02:46 +0000)]
[X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins.
Summary:
We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't.
I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.
The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.
I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.
Shoaib Meenai [Wed, 6 Jun 2018 23:09:02 +0000 (23:09 +0000)]
[Driver] Stop passing -fseh-exceptions for x86_64-windows-msvc
-fseh-exceptions is only meaningful for MinGW targets, and that driver
already has logic to pass either -fdwarf-exceptions or -fseh-exceptions
as appropriate. -fseh-exceptions is just a no-op for MSVC triples, and
passing it to cc1 causes unnecessary confusion.
Yaxun Liu [Wed, 6 Jun 2018 19:44:10 +0000 (19:44 +0000)]
[HIP] Fix unbundling
HIP uses clang-offload-bundler to bundle intermediate files for host
and different gpu archs together. When a file is unbundled,
clang-offload-bundler should be called only once, and the objects
for host and different gpu archs should be passed to the next
jobs. This is because Driver maintains CachedResults which maps
triple-arch string to output files for each job.
This patch fixes a bug in Driver::BuildJobsForActionNoCache which
uses incorrect key for CachedResults for HIP which causes
clang-offload-bundler being called mutiple times and incorrect
output files being used.
Reid Kleckner [Wed, 6 Jun 2018 01:35:08 +0000 (01:35 +0000)]
Implement bittest intrinsics generically for non-x86 platforms
I tested these locally on an x86 machine by disabling the inline asm
codepath and confirming that it does the same bitflips as we do with the
inline asm.
Craig Topper [Wed, 6 Jun 2018 00:24:55 +0000 (00:24 +0000)]
[X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics.
Previously we were just using extended vector operations in the header file.
This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load.
By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count.
User code still has the option to use extended vector operations themselves if they need non-constant indexing.
Craig Topper [Tue, 5 Jun 2018 22:40:03 +0000 (22:40 +0000)]
[X86] Implement __builtin_ia32_vec_ext_v2si correctly even though we only use it with an index of 0.
This builtin takes an index as its second operand, but the codegen hardcodes an index of 0 and doesn't use the operand. The only use of the builtin in the header file passes 0 to the operand so this works for that usage. But its more correct to use the real operand.
Craig Topper [Tue, 5 Jun 2018 21:54:35 +0000 (21:54 +0000)]
[X86] Make __builtin_ia32_vec_ext_v2si require ICE for its index argument. Add warnings for out of range indices for __builtin_ia32_vec_ext_v2si, __builtin_ia32_vec_ext_v4hi, and __builtin_ia32_vec_set_v4hi.
These should take a constant value for an index and that constant should be a valid element number.
Yaxun Liu [Tue, 5 Jun 2018 15:11:02 +0000 (15:11 +0000)]
[CUDA][HIP] Do not emit type info when compiling for device
CUDA/HIP does not support RTTI on device side, therefore there
is no point of emitting type info when compiling for device.
Emitting type info for device not only clutters the IR with useless
global variables, but also causes undefined symbol at linking
since vtable for cxxabiv1::class_type_info has external linkage.