Pawel Bylica [Mon, 27 Jun 2016 08:46:23 +0000 (08:46 +0000)]
CachePruning: correct comment about file order. NFC
Summary: Actually the list of cached files is sorted by file size, not by last accessed time. Also remove unused file access time param for a helper function.
Simon Pilgrim [Mon, 27 Jun 2016 07:44:32 +0000 (07:44 +0000)]
[X86][AVX] Peek through bitcasts to find the source of broadcasts
AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors.
This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected.
Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node.
[lit] Add SANITIZER_IGNORE_CVE_2016_2143 to pass_vars.
This variable is used by ASan (and other sanitizers in the future)
on s390x-linux to override a check for CVE-2016-2143 in the running
kernel (see revision 267747 on compiler-rt). Since the check simply
checks if the kernel version is in a whitelist of known-good versions,
it may miss distribution kernels, or manually-patched kernels - hence
the need for this variable. To enable running the ASan testsuite on
such kernels, this variable should be passed from the environment
down to the testcases.
Craig Topper [Sun, 26 Jun 2016 05:10:56 +0000 (05:10 +0000)]
[X86] Rewrite lowerVectorShuffleWithPSHUFB to not require a ZeroableMask to be created. We can do everything with the starting mask and zeroable bit vector. This removes the last usage of isSingleInputShuffleMask. NFC
Craig Topper [Sun, 26 Jun 2016 05:10:53 +0000 (05:10 +0000)]
[X86] Replace calls to isSingleInputShuffleMask with just checking if V2 is UNDEF. Canonicalization and creation of shuffle vector ensures this is equivalent.
Sanjoy Das [Sun, 26 Jun 2016 05:10:45 +0000 (05:10 +0000)]
[LoopUnswitch] Unswitch on conditions feeding into guards
Summary:
This is a straightforward extension of what LoopUnswitch does to
branches to guards. That is, we unswitch
```
for (;;) {
...
guard(loop_invariant_cond);
...
}
```
into
```
if (loop_invariant_cond) {
for (;;) {
...
// There is no need to emit guard(true)
...
}
} else {
for (;;) {
...
guard(false);
// SimplifyCFG will clean this up by adding an
// unreachable after the guard(false)
...
}
}
```
Vedant Kumar [Sun, 26 Jun 2016 02:45:13 +0000 (02:45 +0000)]
[llvm-cov] Simplify the way expansion views are rendered (NFC)
If a sub-view has already been rendered, it's helpful to re-render the
expansion site before rendering the next expansion view. Make this fact
explicit in the rendering interface, instead of hiding it behind an
awkward Optional<LineRef> parameter.
Craig Topper [Sat, 25 Jun 2016 19:05:29 +0000 (19:05 +0000)]
[X86] Convert ==/!= comparisons with -1 for checking undef in shuffle lowering to comparisons of <0 or >=0. While there do the same for other kinds of index checks that can just check for greater than 0. No functional change intended.
Jan Vesely [Sat, 25 Jun 2016 18:24:16 +0000 (18:24 +0000)]
AMDGPU/R600: Fix GlobalValue regressions.
Don't cast GV expression to MCSymbolRefExpr. r272705 changed GV to binary
expressions by including offset even if the offset it 0
(we haven't hit this sooner since tested workloads don't include static offsets)
We don't really care about the type of expression, so set it directly. Fixes: r272705
Consider section relative relocations. Since all const as data is in one boffer section relative is equivalent to abs32. Fixes: r273166
Differential Revision: http://reviews.llvm.org/D21633
David Majnemer [Sat, 25 Jun 2016 08:04:19 +0000 (08:04 +0000)]
[SimplifyCFG] Stop inserting calls to llvm.trap for UB
SimplifyCFG had logic to insert calls to llvm.trap for two very
particular IR patterns: stores and invokes of undef/null.
While InstCombine canonicalizes certain undefined behavior IR patterns
to stores of undef, phase ordering means that this cannot be relied upon
in general.
There are much better tools than llvm.trap: UBSan and ASan.
N.B. I could be argued into reverting this change if a clear argument as
to why it is important that we synthesize llvm.trap for stores, I'd be
hard pressed to see why it'd be useful for invokes...
[AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.
Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
- offset 0: work group ID x
- offset 4: work group ID y
- offset 8: work group ID z
- offset 16: work item ID x
- offset 20: work item ID y
- offset 24: work item ID z
Set
- amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
- amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
- amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled
Vedant Kumar [Sat, 25 Jun 2016 02:58:30 +0000 (02:58 +0000)]
[llvm-cov] Separate presentation logic from formatting logic, NFC
This makes it easier to add renderers for new kinds of output formats.
- Define and document a pure-virtual coverage rendering interface.
- Move the text-based rendering logic into its a new file.
- Re-work the API to better reflect the presentation/formatting split.
Matthias Braun [Sat, 25 Jun 2016 02:03:36 +0000 (02:03 +0000)]
MachineScheduler: Remember top/bottom choice in bidirectional scheduling
Remember the last choice for the top/bottom scheduling boundary in
bidirectional scheduling mode. The top choice should not change if we
schedule at the bottom and vice versa.
This allows us to improve compiletime: We only recalculate the best pick
for one border and re-use the cached top-pick from the other border.
David Majnemer [Sat, 25 Jun 2016 00:55:12 +0000 (00:55 +0000)]
The absence of noreturn doesn't ensure mayReturn
There are two separate issues:
- LLVM doesn't consider infinite loops to be side effects: we happily
hoist/sink above/below loops whose bounds are unknown.
- The absence of the noreturn attribute is insufficient for us to know
if a function will definitely return. Relying on noreturn in the
middle-end for any property is an accident waiting to happen.
This intrinsic safely loads a function pointer from a virtual table pointer
using type metadata. This intrinsic is used to implement control flow integrity
in conjunction with virtual call optimization. The virtual call optimization
pass will optimize away llvm.type.checked.load intrinsics associated with
devirtualized calls, thereby removing the type check in cases where it is
not needed to enforce the control flow integrity constraint.
This patch also introduces the capability to copy type metadata between
global variables, and teaches the virtual call optimization pass to do so.
In bidirectional scheduling this gives more stable results than just
comparing the "reason" fields of the top/bottom node because the reason
field may be higher depending on what other nodes are in the queue.
David Majnemer [Sat, 25 Jun 2016 00:04:10 +0000 (00:04 +0000)]
Reinstate r273711
r273711 was reverted by r273743. The inliner needs to know about any
call sites in the inlined function. These were obscured if we replaced
a call to undef with an undef but kept the call around.
Matthias Braun [Fri, 24 Jun 2016 23:52:11 +0000 (23:52 +0000)]
AMDGPU: Define a schedule class for COPY.
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.
Adrian Prantl [Fri, 24 Jun 2016 21:35:09 +0000 (21:35 +0000)]
Fix the type signature of DwarfExpression::Add.*Constant to support values >32 bits.
This fixes an embarrassing bug when emitting .debug_loc entries for 64-bit+ constants,
which were previously silently truncated to 32 bits.
IR: New representation for CFI and virtual call optimization pass metadata.
The bitset metadata currently used in LLVM has a few problems:
1. It has the wrong name. The name "bitset" refers to an implementation
detail of one use of the metadata (i.e. its original use case, CFI).
This makes it harder to understand, as the name makes no sense in the
context of virtual call optimization.
2. It is represented using a global named metadata node, rather than
being directly associated with a global. This makes it harder to
manipulate the metadata when rebuilding global variables, summarise it
as part of ThinLTO and drop unused metadata when associated globals are
dropped. For this reason, CFI does not currently work correctly when
both CFI and vcall opt are enabled, as vcall opt needs to rebuild vtable
globals, and fails to associate metadata with the rebuilt globals. As I
understand it, the same problem could also affect ASan, which rebuilds
globals with a red zone.
This patch solves both of those problems in the following way:
1. Rename the metadata to "type metadata". This new name reflects how
the metadata is currently being used (i.e. to represent type information
for CFI and vtable opt). The new name is reflected in the name for the
associated intrinsic (llvm.type.test) and pass (LowerTypeTests).
2. Attach metadata directly to the globals that it pertains to, rather
than using the "llvm.bitsets" global metadata node as we are doing now.
This is done using the newly introduced capability to attach
metadata to global variables (r271348 and r271358).
See also: http://lists.llvm.org/pipermail/llvm-dev/2016-June/100462.html
This patch moves MSSA's caching walker into MemorySSA, and moves the
actual definition of MSSA's caching walker out of MemorySSA.h. This is
done in preparation for the new walker, which should be out for review
soonish.
Also, this patch removes a field from UpwardsMemoryQuery and has a few
lines of diff from clang-format'ing MemorySSA.cpp.
Chris Bieneman [Fri, 24 Jun 2016 20:42:28 +0000 (20:42 +0000)]
[obj2yaml] [yaml2obj] Support for MachO Universal binaries
This patch adds round-trip support for MachO Universal binaries to obj2yaml and yaml2obj. Universal binaries have a header and list of architecture structures, followed by a the individual object files at specified offsets.
Sanjay Patel [Fri, 24 Jun 2016 18:26:02 +0000 (18:26 +0000)]
[InstCombine] consolidate commutation variants of matchSelectFromAndOr() in one place; NFCI
By putting all the possible commutations together, we simplify the code.
Note that this is NFCI, but I'm adding tests that actually exercise each
commutation pattern because we don't have this anywhere else.
Kevin Enderby [Fri, 24 Jun 2016 18:24:42 +0000 (18:24 +0000)]
Thread Expected<...> up from libObject’s getSymbolAddress() for symbols to allow
a good error message to be produced.
This is nearly the last libObject interface that used ErrorOr and the last one
that appears in llvm/include/llvm/Object/MachO.h . For Mach-O objects this is
just a clean up because it’s version of getSymbolAddress() can’t return an
error.
I will leave it to the experts on COFF and ELF to actually add meaning full
error messages in their tests if they wish. And also leave it to these experts
to change the last two ErrorOr interfaces in llvm/include/llvm/Object/ObjectFile.h
for createCOFFObjectFile() and createELFObjectFile() if they wish.
Since there are no test cases for COFF and ELF error cases with respect to
getSymbolAddress() in the test suite this is no functional change (NFC).