Ahmed Bougacha [Sat, 4 Feb 2017 00:47:08 +0000 (00:47 +0000)]
[GlobalISel] Print the matched patterns using an action.
This lets us split out PatternToMatch from the top-level RuleMatcher,
where it doesn't really belong. That, in turn, lets us eventually
generate RuleMatchers from non-SelectionDAG sources.
Brendon Cahoon [Sat, 4 Feb 2017 00:10:22 +0000 (00:10 +0000)]
[RegisterCoalescer] Do not call getInstructionIndex with DBG_VALUE
An assert occurs when calling SlotIndexes::getInstructionIndex with
a DBG_VALUE instruction because the function expects an instruction
with a slot index. However, there is no slot index for a DBG_VALUE
instruction.
Sanjay Patel [Fri, 3 Feb 2017 23:13:11 +0000 (23:13 +0000)]
[InstCombine] treat i1 as a special type in shouldChangeType()
This patch is based on the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/109631.html
Folding to i1 should always be desirable because that's better for value tracking
and we have special folds for i1 types.
I checked for other users of shouldChangeType() where this might have an effect,
but we already handle the i1 case differently than other types in all of those cases.
Side note: the default datalayout includes i1, so it seems we only find this gap in
shouldChangeType + phi folding for the case when there is (1) an explicit datalayout
without i1, (2) casting to i1 from a legal type, and (3) a phi with exactly 2 incoming
casted operands (as Björn mentioned).
The code comments didn't match the code logic, and we didn't actually distinguish the fake unary (not/neg/fneg)
operators from arguments. Adding another level to the weighting scheme provides more structure and can help
simplify the pattern matching in InstCombine and other places.
I fixed regressions that would have shown up from this change in:
rL290067
rL290127
But that doesn't mean there are no pattern-matching logic holes left; some combines may just be missing regression tests.
Should fix:
https://llvm.org/bugs/show_bug.cgi?id=28296
[AMDGPU] Bump -amdgpu-unroll-threshold-private to 2000
This has quite positive performance impact according to measurements.
Before previous fixes to limit the optimization that was too high
and blowed compile time and scratch usage, but now this is gone and
we can bump the threshold.
Ahmed Bougacha [Fri, 3 Feb 2017 19:11:19 +0000 (19:11 +0000)]
[TLI] Robustize SDAG LibFunc proto checking by merging it into TLI.
This re-applies commit r292189, reverted in r292191.
SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.
Use TLI instead, removing another heap of redundant code.
This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to a few tests:
- calling a non-variadic function via a variadic prototype isn't OK;
it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter; the SDAG code accepts any integer
type, which meant using i32 on x86_64 worked.
- a handful of SystemZ tests check the SDAG support for lax prototype
checking: Ulrich agrees on removing them.
I don't think it's worth supporting any of these (IMO) invalid
testcases. Instead, fix them to be more meaningful.
This generalizes memory access sorting to use differences between SCEVs,
instead of relying on constant offsets. That allows us to properly do
SLP vectorization of non-sequentially ordered loads within loops bodies.
Simon Dardis [Fri, 3 Feb 2017 15:48:53 +0000 (15:48 +0000)]
[mips] Remove absolute size assertion for end directive
The .end <symbol> directive for MIPS marks the end of a symbol and sets the
symbol's size. Previously, the corresponding emitDirective handler asserted
that a function's size could be evaluated to an absolute value at that point
in time.
This cannot be done with when directives like .align have been encountered,
instead set the function's size to the corresponding symbolic expression and
let ELFObjectWriter resolve the expression to an absolute value. This avoids
a redundant call to evaluateAsAbsolute.
Sanne Wouda [Fri, 3 Feb 2017 11:15:53 +0000 (11:15 +0000)]
[ARM] Change TCReturn to tBL if tailcall optimization fails.
Summary:
The tail call optimisation is performed before register allocation, so
at that point we don't know if LR is being spilt or not. If LR was spilt
to the stack, then we cannot do a tail call optimisation. That would
involve popping back into LR which is not possible in Thumb1 code.
Alexey Bataev [Fri, 3 Feb 2017 08:08:50 +0000 (08:08 +0000)]
[SLP] Fix for PR31690: Allow using of extra values in horizontal reductions.
Currently LLVM supports vectorization of horizontal reduction
instructions with initial value set to 0. Patch supports vectorization
of reduction with non-zero initial values. Also it supports a
vectorization of instructions with some extra arguments, like:
float f(float x[], int a, int b) {
float p = a % b;
p += x[0] + 3;
for (int i = 1; i < 32; i++)
p += x[i];
return p;
}
Patch allows vectorization of this kind of horizontal reductions.
Exit loop analysis early if suitable private access found.
Do not account for GEPs which are invariant to loop induction variable.
Do not account for Allocas which are too big to fit into register file anyway.
Add option for tuning: -amdgpu-unroll-threshold-private.
Marcos Pividori [Fri, 3 Feb 2017 01:08:06 +0000 (01:08 +0000)]
[sanitizer coverage] Fix Instrumentation to work on Windows.
On Windows, the symbols "___stop___sancov_guards" and "___start___sancov_guards"
are not defined automatically. So, we need to take a different approach.
We define 3 sections:
Section ".SCOV$A" will only hold a variable ___start___sancov_guard.
Section ".SCOV$M" will hold the main data.
Section ".SCOV$Z" will only hold a variable ___stop___sancov_guards.
When linking, they will be merged sorted by the characters after the $, so we
can use the pointers of the variables ___[start|stop]___sancov_guard to know the
actual range of addresses of that section.
In this diff, I updated instrumentation to include all the guard arrays in
section ".SCOV$M".
David Blaikie [Fri, 3 Feb 2017 00:44:18 +0000 (00:44 +0000)]
DebugInfo: ensure type and namespace names are included in pubnames/pubtypes even when they are only present in type units
While looking to add support for placing singular types (types that will
only be emitted in one place (such as attached to a strong vtable or
explicit template instantiation definition)) not in type units (since
type units have overhead) I stumbled across that change causing an
increase in pubtypes.
Turns out we were missing some types from type units if they were only
referenced from other type units and not from the debug_info section.
This fixes that, following GCC's line of describing the offset of such
entities as the CU die (since there's no compile unit-relative offset
that would describe such an entity - they aren't in the CU). Also like
GCC, this change prefers to describe the type stub within the CU rather
than the "just use the CU offset" fallback where possible. This may give
the DWARF consumer some opportunity to find the extra info in the type
stub - though I'm not sure GDB does anything with this currently.
The size of the pubnames/pubtypes sections now match exactly with or
without type units enabled.
This nearly triples (+189%) the pubtypes section for a clang self-host
and grows pubnames by 0.07% (without compression). For a total of 8%
increase in debug info sections of the objects of a Split DWARF build
when using type units.
Mehdi Amini [Fri, 3 Feb 2017 00:32:38 +0000 (00:32 +0000)]
[ThinLTO] Add an auto-hide feature
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.
This is a recommit of r293912 after fixing build failures,
and a recommit of r293918 after fixing LLD tests.
Bob Haarman [Thu, 2 Feb 2017 23:00:49 +0000 (23:00 +0000)]
[lto] add getLinkerOpts()
Summary: Some compilers, including MSVC and Clang, allow linker options to be specified in source files. In the legacy LTO API, there is a getLinkerOpts() method that returns linker options for the bitcode module being processed. This change adds that method to the new API, so that the COFF linker can get the right linker options when using the new LTO API.
Craig Topper [Thu, 2 Feb 2017 22:02:57 +0000 (22:02 +0000)]
[X86] Move turning 256-bit INSERT_SUBVECTORS into BLENDI from legalize to DAG combine.
On one test this seems to have given more chance for DAG combine to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. Looks like we can still improve more by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI.
1. Added comments for options
2. Added missing option cl::desc field
3. Uniified function filter option for graph viewing.
Now PGO count/raw-counts share the same
filter option: -view-bfi-func-name=.
On ELF every section can have a corresponding section symbol. When in
an assembly file we have
.quad .text
the '.text' refers to that symbol.
The way we used to handle them is to leave .text an undefined symbol
until the very end when the object writer would map them to the
actual section symbol.
The problem with that is that anything before the end would see an
undefined symbol. This could result in bad diagnostics
(test/MC/AArch64/label-arithmetic-diags-elf.s), or incorrect results
when using the asm streamer (est/MC/Mips/expansion-jal-sym-pic.s).
Fixing this will also allow using the section symbol earlier for
setting sh_link of SHF_METADATA sections.
This patch includes a few hacks to avoid changing our behaviour when
handling conflicts between section symbols and other symbols. I
reported pr31850 to track that.
Javed Absar [Thu, 2 Feb 2017 21:08:12 +0000 (21:08 +0000)]
[ARM] Classification Improvements to ARM Sched-Model. NFCI.
This is the second in the series of patches to enable adding
of machine sched-models for ARM processors easier and compact.
This patch focuses on integer instructions and adds missing
sched definitions.
[LiveRangeEdit] Don't mess up with LiveInterval when a new vreg is created.
In r283838, we added the capability of splitting unspillable register.
When doing so we had to make sure the split live-ranges were also
unspillable and we did that by marking the related live-ranges in the
delegate method that is called when a new vreg is created.
However, by accessing the live-range there, we also triggered their lazy
computation (LiveIntervalAnalysis::getInterval) which is not what we
want in general. Indeed, later code in LiveRangeEdit is going to build
the live-ranges this lazy computation may mess up that computation
resulting in assertion failures. Namely, the createEmptyIntervalFrom
method expect that the live-range is going to be empty, not computed.
Thanks to Mikael Holmén <mikael.holmen@ericsson.com> for noticing and
reporting the problem.
Marcos Pividori [Thu, 2 Feb 2017 19:07:53 +0000 (19:07 +0000)]
[libFuzzer] Properly handle exceptions with UnhandledExceptionFilter.
Use SetUnhandledExceptionFilter instead of AddVectoredExceptionHandler.
According to the documentation on Structured Exception Handling, this is the
order for the Exception Dispatching:
+ If the process is being debugged, the system notifies the debugger.
+ The Vectored Exception Handler is called.
+ The system attempts to locate a frame-based exception handler by searching the
stack frames of the thread in which the exception occurred.
+ If no frame-based handler can be found, the UnhandledExceptionFilter filter is
called.
+ Default handling based on the exception type.
So, similar to what we do for asan, we should use SetUnhandledExceptionFilter
instead of AddVectoredExceptionHandler, so user's code that is being fuzzed can
execute frame-based exception handlers before we catch them . We want to catch
unhandled exceptions, not all the exceptions.
Mehdi Amini [Thu, 2 Feb 2017 18:31:35 +0000 (18:31 +0000)]
[ThinLTO] Add an auto-hide feature
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.
This is a recommit of r293912 after fixing build failures.
Mehdi Amini [Thu, 2 Feb 2017 18:13:46 +0000 (18:13 +0000)]
[ThinLTO] Add an auto-hide feature
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.
Jun Bum Lim [Thu, 2 Feb 2017 15:12:34 +0000 (15:12 +0000)]
[JumpThread] Enhance finding partial redundant loads by continuing scanning single predecessor
Summary: While scanning predecessors to find an available loaded value, if the predecessor has a single predecessor, we can continue scanning through the single predecessor.