Artur Pilipenko [Mon, 27 Jun 2016 16:54:33 +0000 (16:54 +0000)]
Revert -r273892 "Support arbitrary addrspace pointers in masked load/store intrinsics" since some of the clang tests don't expect to see the updated signatures.
Artur Pilipenko [Mon, 27 Jun 2016 16:29:26 +0000 (16:29 +0000)]
Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details).
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
Kuba Brecka [Mon, 27 Jun 2016 15:57:08 +0000 (15:57 +0000)]
[asan] fix false dynamic-stack-buffer-overflow report with constantly-sized dynamic allocas, LLVM part
See the bug report at https://github.com/google/sanitizers/issues/691. When a dynamic alloca has a constant size, ASan instrumentation will treat it as a regular dynamic alloca (insert calls to poison and unpoison), but the backend will turn it into a regular stack variable. The poisoning/unpoisoning is then broken. This patch will treat such allocas as static.
Renato Golin [Mon, 27 Jun 2016 14:42:20 +0000 (14:42 +0000)]
[ARM] Fix Thumb text sections' flags under COFF/Windows
The main issue here is that the "thumb" flag wasn't set for some of these
sections, making MSVC's link.exe fails to correctly relocate code
against the symbols inside these sections. link.exe could fail for
instance with the "fixup is not aligned for target 'XX'" error. If
linking doesn't fail, the relocation process goes wrong in the end and
invalid code is generated by the linker.
This patch adds Thumb/ARM information so that the right flags are set
on COFF/Windows.
Diana Picus [Mon, 27 Jun 2016 09:08:23 +0000 (09:08 +0000)]
[ARM] Do not test for CPUs, use SubtargetFeatures (Part 2). NFCI
This is a follow-up for r273544.
The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.
Since the ARM backend seems to have quite a lot of calls to these methods, I
intend to submit 5-6 subtarget features at a time, instead of one big lump.
Pawel Bylica [Mon, 27 Jun 2016 08:46:23 +0000 (08:46 +0000)]
CachePruning: correct comment about file order. NFC
Summary: Actually the list of cached files is sorted by file size, not by last accessed time. Also remove unused file access time param for a helper function.
Simon Pilgrim [Mon, 27 Jun 2016 07:44:32 +0000 (07:44 +0000)]
[X86][AVX] Peek through bitcasts to find the source of broadcasts
AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors.
This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected.
Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node.
[lit] Add SANITIZER_IGNORE_CVE_2016_2143 to pass_vars.
This variable is used by ASan (and other sanitizers in the future)
on s390x-linux to override a check for CVE-2016-2143 in the running
kernel (see revision 267747 on compiler-rt). Since the check simply
checks if the kernel version is in a whitelist of known-good versions,
it may miss distribution kernels, or manually-patched kernels - hence
the need for this variable. To enable running the ASan testsuite on
such kernels, this variable should be passed from the environment
down to the testcases.
Craig Topper [Sun, 26 Jun 2016 05:10:56 +0000 (05:10 +0000)]
[X86] Rewrite lowerVectorShuffleWithPSHUFB to not require a ZeroableMask to be created. We can do everything with the starting mask and zeroable bit vector. This removes the last usage of isSingleInputShuffleMask. NFC
Craig Topper [Sun, 26 Jun 2016 05:10:53 +0000 (05:10 +0000)]
[X86] Replace calls to isSingleInputShuffleMask with just checking if V2 is UNDEF. Canonicalization and creation of shuffle vector ensures this is equivalent.
Sanjoy Das [Sun, 26 Jun 2016 05:10:45 +0000 (05:10 +0000)]
[LoopUnswitch] Unswitch on conditions feeding into guards
Summary:
This is a straightforward extension of what LoopUnswitch does to
branches to guards. That is, we unswitch
```
for (;;) {
...
guard(loop_invariant_cond);
...
}
```
into
```
if (loop_invariant_cond) {
for (;;) {
...
// There is no need to emit guard(true)
...
}
} else {
for (;;) {
...
guard(false);
// SimplifyCFG will clean this up by adding an
// unreachable after the guard(false)
...
}
}
```
Vedant Kumar [Sun, 26 Jun 2016 02:45:13 +0000 (02:45 +0000)]
[llvm-cov] Simplify the way expansion views are rendered (NFC)
If a sub-view has already been rendered, it's helpful to re-render the
expansion site before rendering the next expansion view. Make this fact
explicit in the rendering interface, instead of hiding it behind an
awkward Optional<LineRef> parameter.
Craig Topper [Sat, 25 Jun 2016 19:05:29 +0000 (19:05 +0000)]
[X86] Convert ==/!= comparisons with -1 for checking undef in shuffle lowering to comparisons of <0 or >=0. While there do the same for other kinds of index checks that can just check for greater than 0. No functional change intended.
Jan Vesely [Sat, 25 Jun 2016 18:24:16 +0000 (18:24 +0000)]
AMDGPU/R600: Fix GlobalValue regressions.
Don't cast GV expression to MCSymbolRefExpr. r272705 changed GV to binary
expressions by including offset even if the offset it 0
(we haven't hit this sooner since tested workloads don't include static offsets)
We don't really care about the type of expression, so set it directly. Fixes: r272705
Consider section relative relocations. Since all const as data is in one boffer section relative is equivalent to abs32. Fixes: r273166
Differential Revision: http://reviews.llvm.org/D21633
David Majnemer [Sat, 25 Jun 2016 08:04:19 +0000 (08:04 +0000)]
[SimplifyCFG] Stop inserting calls to llvm.trap for UB
SimplifyCFG had logic to insert calls to llvm.trap for two very
particular IR patterns: stores and invokes of undef/null.
While InstCombine canonicalizes certain undefined behavior IR patterns
to stores of undef, phase ordering means that this cannot be relied upon
in general.
There are much better tools than llvm.trap: UBSan and ASan.
N.B. I could be argued into reverting this change if a clear argument as
to why it is important that we synthesize llvm.trap for stores, I'd be
hard pressed to see why it'd be useful for invokes...
[AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.
Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
- offset 0: work group ID x
- offset 4: work group ID y
- offset 8: work group ID z
- offset 16: work item ID x
- offset 20: work item ID y
- offset 24: work item ID z
Set
- amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
- amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
- amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled
Vedant Kumar [Sat, 25 Jun 2016 02:58:30 +0000 (02:58 +0000)]
[llvm-cov] Separate presentation logic from formatting logic, NFC
This makes it easier to add renderers for new kinds of output formats.
- Define and document a pure-virtual coverage rendering interface.
- Move the text-based rendering logic into its a new file.
- Re-work the API to better reflect the presentation/formatting split.
Matthias Braun [Sat, 25 Jun 2016 02:03:36 +0000 (02:03 +0000)]
MachineScheduler: Remember top/bottom choice in bidirectional scheduling
Remember the last choice for the top/bottom scheduling boundary in
bidirectional scheduling mode. The top choice should not change if we
schedule at the bottom and vice versa.
This allows us to improve compiletime: We only recalculate the best pick
for one border and re-use the cached top-pick from the other border.
David Majnemer [Sat, 25 Jun 2016 00:55:12 +0000 (00:55 +0000)]
The absence of noreturn doesn't ensure mayReturn
There are two separate issues:
- LLVM doesn't consider infinite loops to be side effects: we happily
hoist/sink above/below loops whose bounds are unknown.
- The absence of the noreturn attribute is insufficient for us to know
if a function will definitely return. Relying on noreturn in the
middle-end for any property is an accident waiting to happen.
This intrinsic safely loads a function pointer from a virtual table pointer
using type metadata. This intrinsic is used to implement control flow integrity
in conjunction with virtual call optimization. The virtual call optimization
pass will optimize away llvm.type.checked.load intrinsics associated with
devirtualized calls, thereby removing the type check in cases where it is
not needed to enforce the control flow integrity constraint.
This patch also introduces the capability to copy type metadata between
global variables, and teaches the virtual call optimization pass to do so.
In bidirectional scheduling this gives more stable results than just
comparing the "reason" fields of the top/bottom node because the reason
field may be higher depending on what other nodes are in the queue.
David Majnemer [Sat, 25 Jun 2016 00:04:10 +0000 (00:04 +0000)]
Reinstate r273711
r273711 was reverted by r273743. The inliner needs to know about any
call sites in the inlined function. These were obscured if we replaced
a call to undef with an undef but kept the call around.
Matthias Braun [Fri, 24 Jun 2016 23:52:11 +0000 (23:52 +0000)]
AMDGPU: Define a schedule class for COPY.
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.