Craig Topper [Sat, 25 Mar 2017 06:52:52 +0000 (06:52 +0000)]
[InstCombine] Change the interface of SimplifyDemandedBits so that it takes the instruction and operand instead of the Use.
The first thing it did was get the User for the Use to get the instruction back. This requires looking through the Uses for the User using the waymarking walk. That's pretty fast, but its probably still better to just pass the Instruction we already had.
Evgeniy Stepanov [Sat, 25 Mar 2017 01:01:11 +0000 (01:01 +0000)]
[asan] Put ctor/dtor in comdat.
When possible, put ASan ctor/dtor in comdat.
The only reason not to is global registration, which can be
TU-specific. This is not the case when there are no instrumented
globals. This is also limited to ELF targets, because MachO does
not have comdat, and COFF linkers may GC comdat constructors.
The benefit of this is a lot less __asan_init() calls: one per DSO
instead of one per TU. It's also necessary for the upcoming
gc-sections-for-globals change on Linux, where multiple references to
section start symbols trigger quadratic behaviour in gold linker.
[libFuzzer] read asan's dedup_token while minimizing a crash and stop minimization if another bug was found during minimization (https://github.com/google/oss-fuzz/issues/452)
Reid Kleckner [Fri, 24 Mar 2017 23:28:42 +0000 (23:28 +0000)]
[codeview] Don't assert when the user violates the ODR
If we have an array of a user-defined aggregates for which there was an
ODR violation, then the array size will not necessarily match the number
of elements times the size of the element.
Jessica Paquette [Fri, 24 Mar 2017 23:00:21 +0000 (23:00 +0000)]
[Outliner] Revert r298734.
When I tested r298734, I thought that red zones were enabled by default like in
X86. Since red zones are behind a flag on AArch64 the testing wasn't true.
Move spill size and alignment info from MC to TargetRegisterInfo
This is another step towards implementing register classes with
parametrized register/spill sizes and value types.
This is an updated version of r298652. The difference is that MCRegister-
Class still contains register size, available as getPhysRegSize(). The
old function getSize was retained as a temporary measure to avoid build
breakage for out-of-tree targets.
Matt Arsenault [Fri, 24 Mar 2017 20:57:10 +0000 (20:57 +0000)]
AMDGPU: Fix annotating loops with nested loop conditions
If the branch condition for a loop was a phi which itself
was fed from a phi from a loop, it isn't safe to try
to delete the phi until after the loop is handled.
Ivan Krasin [Fri, 24 Mar 2017 20:49:43 +0000 (20:49 +0000)]
Revert r298620: [LV] Vectorize GEPs
Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes)
LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413
Original change description:
[LV] Vectorize GEPs
This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.
With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.
The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.
Original Differential Revision: https://reviews.llvm.org/D30710
Evgeniy Stepanov [Fri, 24 Mar 2017 20:42:15 +0000 (20:42 +0000)]
[asan] Delay creation of asan ctor.
Create the constructor in the module pass.
This in needed for the GC-friendly globals change, where the constructor can be
put in a comdat in some cases, but we don't know about that in the function
pass.
Matt Arsenault [Fri, 24 Mar 2017 19:52:05 +0000 (19:52 +0000)]
AMDGPU: Unify divergent function exits.
StructurizeCFG can't handle cases with multiple
returns creating regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that only
unifies exit nodes that skips exit nodes
with uniform branch sources.
Craig Topper [Fri, 24 Mar 2017 16:56:51 +0000 (16:56 +0000)]
[InstCombine] Provide a way to calculate KnownZero/One for Add/Sub in SimplifyDemandedUseBits without recursing into ComputeKnownBits
SimplifyDemandedUseBits for Add/Sub already recursed down LHS and RHS for simplifying bits. If that didn't provide any simplifications we fall back to calling computeKnownBits which will recurse again. Instead just take the known bits for LHS and RHS we already have and call into a new function in ValueTracking that can calculate the known bits given the LHS/RHS bits.
Teresa Johnson [Fri, 24 Mar 2017 12:50:45 +0000 (12:50 +0000)]
Remove stale and unused (MC)TargetOptions comparators.
Summary:
I discovered accidentally that the operator== for TargetOptions
is stale - it is missing many fields that have been added over
the recent years. It isn't used, so remove it. Ditto for the
comparator in MCTargetOptions, which doesn't seem stale yet but is
unused.
Max Kazantsev [Fri, 24 Mar 2017 06:19:00 +0000 (06:19 +0000)]
[ScalarEvolution] Re-enable Predicate implication from operations
The patch rL298481 was reverted due to crash on clang-with-lto-ubuntu build.
The reason of the crash was type mismatch between either a or b and RHS in the following situation:
LHS = sext(a +nsw b) > RHS.
This is quite rare, but still possible situation. Normally we need to cast all {a, b, RHS} to their widest type.
But we try to avoid creation of new SCEV that are not constants to avoid initiating recursive analysis that
can take a lot of time and/or cache a bad value for iterations number. To deal with this, in this patch we
reject this case and will not try to analyze it if the type of sum doesn't match with the type of RHS. In this
situation we don't need to create any non-constant SCEVs.
This patch also adds an assertion to the method IsProvedViaContext so that we could fail on it and not
go further into range analysis etc (because in some situations these analyzes succeed even when the passed
arguments have wrong types, what should not normally happen).
The patch also contains a fix for a problem with too narrow scope of the analysis caused by wrong
usage of predicates in recursive invocations.
The regression test on the said failure: test/Analysis/ScalarEvolution/implied-via-addition.ll
Daniel Berlin [Fri, 24 Mar 2017 05:30:34 +0000 (05:30 +0000)]
NewGVN: Fix PR32403 - Handling of undef in phis was not quite correct
due to LLVM's view of phi nodes. It would cause NewGVN not to fixpoint
in some interesting edge cases.
Petr Hosek [Fri, 24 Mar 2017 02:21:11 +0000 (02:21 +0000)]
[CMake] Support single target builtins build on Darwin
This change allows cross-compiling compiler-rt builtins for
multiple targets as part of runtimes on Darwin. This functionality
is already supported on other platforms.
[libFuzzer] increase kFeatureSetSize to 2^21 and make InputCorpus scale to that size. This will potentially make libFuzzer more sensitive on targets with lots of signals
Reid Kleckner [Thu, 23 Mar 2017 23:30:41 +0000 (23:30 +0000)]
[sancov] Don't instrument blocks with no insertion point
This prevents crashes when attempting to instrument functions containing
C++ try.
Sanitizer coverage will still fail at runtime when an exception is
thrown through a sancov instrumented function, but that seems marginally
better than what we have now. The full solution is to color the blocks
in LLVM IR and only instrument blocks that have an unambiguous color,
using the appropriate token.
Dehao Chen [Thu, 23 Mar 2017 23:26:00 +0000 (23:26 +0000)]
Set the prof weight correctly for call instructions in DeadArgumentElimination.
Summary: In DeadArgumentElimination, the call instructions will be replaced. We also need to set the prof weights so that function inlining can find the correct profile.
Bryant Wong [Thu, 23 Mar 2017 23:21:07 +0000 (23:21 +0000)]
[MetaRenamer] Don't rename library functions.
Library functions can have specific semantics that affect the behavior of
certain passes. DSE, for instance, gives special treatment to malloc-ed pointers
but not to pointers returned from an equivalently typed (but differently named)
function.
MetaRenamer ought not to alter program semantics, so library functions must
remain untouched.
Dehao Chen [Thu, 23 Mar 2017 23:14:11 +0000 (23:14 +0000)]
Use isFunctionHotInCallGraph to set the function section prefix.
Summary: The current prefix based function layout algorithm only looks at function's entry count, which is not sufficient. A function should be grouped together if its entry count or any call edge count is hot.
[Hexagon] Avoid infinite loops in HexagonLoopIdiomRecognition
- Avoid explosive growth of the simplification queue by not queuing
expressions that are alredy in it.
- Add an iteration counter and abort after a sufficiently large number
of iterations (assuming that it's a symptom of an infinite loop).
Reid Kleckner [Thu, 23 Mar 2017 21:36:25 +0000 (21:36 +0000)]
[PDB] Use two DBs when dumping the IPI stream
Summary:
When dumping these records from an object file section, we should use
only one type database. However, when dumping from a PDB, we should use
two: one for the type stream and one for the IPI stream.
Certain type records that normally live in the .debug$T object file
section get moved over to the IPI stream of the PDB file and they get
new indices.
So far, I've noticed that the MSVC linker always moves these records
into IPI:
- LF_FUNC_ID
- LF_MFUNC_ID
- LF_STRING_ID
- LF_SUBSTR_LIST
- LF_BUILDINFO
- LF_UDT_MOD_SRC_LINE
These records have index fields that can point into TPI or IPI. In
particular, LF_SUBSTR_LIST and LF_BUILDINFO point to LF_STRING_ID
records to describe compilation command lines.
I've modified the dumper to have an optional pointer to the item DB, and
to do type name lookup of these fields in that DB. See printItemIndex.
The result is that our pdbdump-headers.test is more faithful to the PDB
contents and the output is less confusing.
Jessica Paquette [Thu, 23 Mar 2017 21:27:38 +0000 (21:27 +0000)]
[Outliner] Fix compile-time overhead for candidate choice
The old candidate collection method in the outliner caused some very large
regressions in compile time on large tests. For MultiSource/Benchmarks/7zip it
caused a 284.07 s or 1156% increase in compile time. On average, using the
SingleSource/MultiSource tests, it caused an average increase of 8 seconds in
compile time (something like 1000%).
This commit replaces that candidate collection method with a new one which
only visits each node in the tree once. This reduces the worst compile time
increase (still 7zip) to a 0.542 s overhead (22%) and the average compile time
increase on SingleSource and MultiSource to 0.018 s (4%).
Dehao Chen [Thu, 23 Mar 2017 21:20:05 +0000 (21:20 +0000)]
Disable loop unrolling and icp in SamplePGO ThinLTO compile phase
Summary:
loop unrolling and icp will make the sample profile annotation much harder in the backend. So disable these 2 optimization in the ThinLTO compile phase.
Will add a test in cfe in a separate patch.
Craig Topper [Thu, 23 Mar 2017 21:00:13 +0000 (21:00 +0000)]
[InstCombine] Remove some code from visitAnd that dealt with trying to reduce the LHS of a sub to 0. This should now be fully handled by SimplifyDemandedInstructionBits now.
Now that we call ShrinkDemandedConstant on the RHS of sub this should be taken care of. This code doesn't trigger on any in tree regressions, but did before ShrinkDemandedConstant was added to the RHS.
Erich Keane [Thu, 23 Mar 2017 20:38:34 +0000 (20:38 +0000)]
LLVM Changes for alloc_align
GCC has the alloc_align attribute, which is similar to assume_aligned, except the attribute's parameter is the index of the integer parameter that needs aligning to.
This is the required LLVM changes to make this happen.
Adrian Prantl [Thu, 23 Mar 2017 20:23:42 +0000 (20:23 +0000)]
Zero-Initialize PrevInstBB when entering a new MachineFunction.
It is not guaranteed that the memory used for MachineBasicBlocks in
the previous MachineFunction hasn't been freed, so holding on to a
pointer to the last function's isn't correct. Particularly I have
observed the sret.ll testcase failing because the first BasicBlock in
the new function happened to be allocated to the exact same memory as
the previously saved and (deleted) PrevInstBB.
Anna Thomas [Thu, 23 Mar 2017 20:00:54 +0000 (20:00 +0000)]
[LVIPrinterPass] Print LVI info for function arguments
Using AssemblyAnnotationWriter for LVI printer prints
for instructions and basic blocks.
So, we explicitly need to print LVI info for the arguments of the function (these
are values and not instructions).
Teresa Johnson [Thu, 23 Mar 2017 19:47:39 +0000 (19:47 +0000)]
[ThinLTO] Add support for emitting minimized bitcode for thin link
Summary:
The cumulative size of the bitcode files for a very large application
can be huge, particularly with -g. In a distributed build environment,
all of these files must be sent to the remote build node that performs
the thin link step, and this can exceed size limits.
The thin link actually only needs the summary along with a bitcode
symbol table. Until we have a proper bitcode symbol table, simply
stripping the debug metadata results in significant size reduction.
Add support for an option to additionally emit minimized bitcode
modules, just for use in the thin link step, which for now just strips
all debug metadata. I plan to add a cc1 option so this can be invoked
easily during the compile step.
However, care must be taken to ensure that these minimized thin link
bitcode files produce the same index as with the original bitcode files,
as these original bitcode files will be used in the backends.
Specifically:
1) The module hash used for caching is typically produced by hashing the
written bitcode, and we want to include the hash that would correspond
to the original bitcode file. This is because we want to ensure that
changes in the stripped portions affect caching. Added plumbing to emit
the same module hash in the minimized thin link bitcode file.
2) The module paths in the index are constructed from the module ID of
each thin linked bitcode, and typically is automatically generated from
the input file path. This is the path used for finding the modules to
import from, and obviously we need this to point to the original bitcode
files. Added gold-plugin support to take a suffix replacement during the
thin link that is used to override the identifier on the MemoryBufferRef
constructed from the loaded thin link bitcode file. The assumption is
that the build system can specify that the minimized bitcode file has a
name that is similar but uses a different suffix (e.g. out.thinlink.bc
instead of out.o).
Added various tests to ensure that we get identical index files out of
the thin link step.
Zhaoshi Zheng [Thu, 23 Mar 2017 18:06:09 +0000 (18:06 +0000)]
Model ashr(shl(x, n), m) as mul(x, 2^(n-m)) when n > m
Given below case:
%y = shl %x, n
%z = ashr %y, m
when n = m, SCEV models it as sext(trunc(x)). This patch tries to handle
the case where n > m by using sext(mul(trunc(x), 2^(n-m)))) as the SCEV
expression.
Summary:
The true and false operands for the CMOV are operands 0 and 1.
ARMISelLowering.cpp::computeKnownBits was looking at operands 1 and 2
instead. This can cause CMOV instructions to be incorrectly folded into
BFI if value set by the CMOV is another CMOV, whose known bits are
computed incorrectly.
Adrian McCarthy [Thu, 23 Mar 2017 16:45:20 +0000 (16:45 +0000)]
Re-land: Make NativeExeSymbol a concrete subclass of NativeRawSymbol [PDB]
The new test should pass on all platforms now that llvm-pdbdump has the
`-color-output` option.
This moves exe symbol-specific method implementations out of NativeRawSymbol
into a concrete subclass. Also adds implementations for hasCTypes and
hasPrivateSymbols and a simple test to ensure the native reader can access
the summary information for the executable from the PDB.
Original Differential Revision: https://reviews.llvm.org/D31059
Matthew Simpson [Thu, 23 Mar 2017 16:29:58 +0000 (16:29 +0000)]
[LV] Vectorize GEPs
This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.
With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.
The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.
Matthew Simpson [Thu, 23 Mar 2017 16:07:21 +0000 (16:07 +0000)]
[LV] Delete unneeded scalar GEP creation code
The code for generating scalar base pointers in vectorizeMemoryInstruction is
not needed. We currently scalarize all GEPs and maintain the scalarized values
in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as
that in scalarizeInstruction. The test cases that changed as a result of this
patch changed because we were able to reuse the scalarized GEP that we
previously generated instead of cloning a new one.
Adrian McCarthy [Thu, 23 Mar 2017 15:28:15 +0000 (15:28 +0000)]
Add option to control whether llvm-pdbdump outputs in color
Adds -color-output option to llvm-pdbdump pretty commands that lets the user
specify whether the output should have color. The default depends on whether
the output is going to a TTY (per prior discussion in
https://reviews.llvm.org/D31246).
This will enable tests that pipe llvm-pdbdump output to FileCheck to work
across platforms without regard to the differences in ANSI codes.
Igor Breger [Thu, 23 Mar 2017 15:25:57 +0000 (15:25 +0000)]
[GlobalISel][X86] Support G_STORE/G_LOAD operation
Summary:
1. Support pointer type as function argumnet and return value
2. G_STORE/G_LOAD - set legal action for i8/i16/i32/i64/f32/f64/vec128
3. RegisterBank - support typeless operations like G_STORE/G_LOAD, for scalar use GPR bank.
4. Support instruction selection for G_LOAD/G_STORE