Anna Thomas [Thu, 23 Mar 2017 20:00:54 +0000 (20:00 +0000)]
[LVIPrinterPass] Print LVI info for function arguments
The LVI printer uses AssemblyAnnotationWriter, which only prints annotations
for instructions and basic blocks.
So, we explicitly need to print LVI info for the arguments of the function (these
are values, not instructions).
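For illustration, a minimal sketch of the constraint (assumed names, not the
pass's actual code): the writer offers per-instruction and per-basic-block
hooks, but nothing per-Argument, so the pass must print argument LVI info
itself.

  #include "llvm/IR/AssemblyAnnotationWriter.h"
  #include "llvm/IR/Instruction.h"
  #include "llvm/Support/FormattedStream.h"

  namespace {
  // Illustrative annotator: only instruction/basic-block hooks exist.
  struct LVIAnnotator : llvm::AssemblyAnnotationWriter {
    void emitInstructionAnnot(const llvm::Instruction *I,
                              llvm::formatted_raw_ostream &OS) override {
      // ... look up and print the LVI-derived info for I here ...
    }
    // There is no emitArgumentAnnot-style hook, hence the explicit
    // per-argument printing in the pass.
  };
  } // namespace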
Teresa Johnson [Thu, 23 Mar 2017 19:47:39 +0000 (19:47 +0000)]
[ThinLTO] Add support for emitting minimized bitcode for thin link
Summary:
The cumulative size of the bitcode files for a very large application
can be huge, particularly with -g. In a distributed build environment,
all of these files must be sent to the remote build node that performs
the thin link step, and this can exceed size limits.
The thin link actually only needs the summary along with a bitcode
symbol table. Until we have a proper bitcode symbol table, simply
stripping the debug metadata results in significant size reduction.
Add support for an option to additionally emit minimized bitcode
modules, just for use in the thin link step, which for now just strips
all debug metadata. I plan to add a cc1 option so this can be invoked
easily during the compile step.
However, care must be taken to ensure that these minimized thin link
bitcode files produce the same index as with the original bitcode files,
as these original bitcode files will be used in the backends.
Specifically:
1) The module hash used for caching is typically produced by hashing the
written bitcode, and we want to include the hash that would correspond
to the original bitcode file. This is because we want to ensure that
changes in the stripped portions affect caching. Added plumbing to emit
the same module hash in the minimized thin link bitcode file.
2) The module paths in the index are constructed from the module ID of
each thin linked bitcode file, which is typically generated automatically from
the input file path. This is the path used for finding the modules to
import from, and obviously we need this to point to the original bitcode
files. Added gold-plugin support to take a suffix replacement during the
thin link that is used to override the identifier on the MemoryBufferRef
constructed from the loaded thin link bitcode file. The assumption is
that the build system can specify that the minimized bitcode file has a
name that is similar but uses a different suffix (e.g. out.thinlink.bc
instead of out.o).
Added various tests to ensure that we get identical index files out of
the thin link step.
Zhaoshi Zheng [Thu, 23 Mar 2017 18:06:09 +0000 (18:06 +0000)]
Model ashr(shl(x, n), m) as mul(x, 2^(n-m)) when n > m
Given the case below:
%y = shl %x, n
%z = ashr %y, m
when n = m, SCEV models it as sext(trunc(x)). This patch handles
the case where n > m by using sext(mul(trunc(x), 2^(n-m))) as the SCEV
expression.
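To see the identity concretely, here is a self-contained check for one example
choice of width and shifts (not code from the patch; it assumes the usual
arithmetic right shift for negative signed values):

  #include <cassert>
  #include <cstdint>

  // Check ashr(shl(x, 6), 2) == sext(mul(trunc(x to i26), 2^(6-2)) to i32)
  // for i32 and n = 6, m = 2 (example values).
  int main() {
    for (int64_t I = -100000; I <= 100000; I += 7) {
      int32_t X = (int32_t)I;
      int32_t LHS = (int32_t)((uint32_t)X << 6) >> 2;   // ashr(shl(x, 6), 2)
      int32_t Trunc = (int32_t)((uint32_t)X << 6) >> 6; // sext(trunc(x to i26))
      int32_t RHS = Trunc * (1 << (6 - 2));             // mul by 2^(n-m)
      assert(LHS == RHS);
    }
    return 0;
  }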
Summary:
The true and false operands for the CMOV are operands 0 and 1.
ARMISelLowering.cpp::computeKnownBits was looking at operands 1 and 2
instead. This can cause CMOV instructions to be incorrectly folded into
BFI if the value set by the CMOV is another CMOV, whose known bits are
then computed incorrectly.
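A self-contained illustration of the known-bits rule involved (hypothetical
helper, not the ARMISelLowering code):

  #include <cstdint>

  struct Known { uint32_t Zero = 0, One = 0; };

  // A bit of a CMOV's result is known only if both *value* operands agree
  // on it; for ARMISD::CMOV those values are operands 0 and 1. Reading
  // operands 1 and 2 instead mixes in a non-value operand, producing
  // bogus known bits.
  Known knownBitsOfCMOV(Known TrueVal, Known FalseVal) {
    return {TrueVal.Zero & FalseVal.Zero, TrueVal.One & FalseVal.One};
  }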
Adrian McCarthy [Thu, 23 Mar 2017 16:45:20 +0000 (16:45 +0000)]
Re-land: Make NativeExeSymbol a concrete subclass of NativeRawSymbol [PDB]
The new test should pass on all platforms now that llvm-pdbdump has the
`-color-output` option.
This moves exe symbol-specific method implementations out of NativeRawSymbol
into a concrete subclass. Also adds implementations for hasCTypes and
hasPrivateSymbols and a simple test to ensure the native reader can access
the summary information for the executable from the PDB.
Original Differential Revision: https://reviews.llvm.org/D31059
Matthew Simpson [Thu, 23 Mar 2017 16:29:58 +0000 (16:29 +0000)]
[LV] Vectorize GEPs
This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.
With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.
The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. The patch also
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before we vectorize the memory accesses.
Matthew Simpson [Thu, 23 Mar 2017 16:07:21 +0000 (16:07 +0000)]
[LV] Delete unneeded scalar GEP creation code
The code for generating scalar base pointers in vectorizeMemoryInstruction is
not needed. We currently scalarize all GEPs and maintain the scalarized values
in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as
that in scalarizeInstruction. The test cases that changed as a result of this
patch did so because we can now reuse the previously generated scalarized GEP
instead of cloning a new one.
Adrian McCarthy [Thu, 23 Mar 2017 15:28:15 +0000 (15:28 +0000)]
Add option to control whether llvm-pdbdump outputs in color
Adds a -color-output option to llvm-pdbdump's pretty command, which lets the user
specify whether the output should have color. The default depends on whether
the output is going to a TTY (per prior discussion in
https://reviews.llvm.org/D31246).
This will enable tests that pipe llvm-pdbdump output to FileCheck to work
across platforms without regard to the differences in ANSI codes.
Igor Breger [Thu, 23 Mar 2017 15:25:57 +0000 (15:25 +0000)]
[GlobalISel][X86] Support G_STORE/G_LOAD operation
Summary:
1. Support pointer type as function argument and return value.
2. G_STORE/G_LOAD - set legal action for i8/i16/i32/i64/f32/f64/vec128.
3. RegisterBank - support typeless operations like G_STORE/G_LOAD; for scalars, use the GPR bank.
4. Support instruction selection for G_LOAD/G_STORE
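A hedged sketch of the legal-action part in a GlobalISel LegalizerInfo
constructor (illustrative, not the exact X86LegalizerInfo code; note that LLT
is typeless, so s32/s64 also cover f32/f64, and v4s32 stands in for the
128-bit vector case):

  #include "llvm/CodeGen/GlobalISel/LegalizerInfo.h"
  #include "llvm/Target/TargetOpcodes.h"

  // Inside the LegalizerInfo subclass constructor (sketch):
  const LLT s8 = LLT::scalar(8);
  const LLT s16 = LLT::scalar(16);
  const LLT s32 = LLT::scalar(32);
  const LLT s64 = LLT::scalar(64);
  const LLT v4s32 = LLT::vector(4, 32); // one 128-bit vector example

  for (unsigned Op : {TargetOpcode::G_LOAD, TargetOpcode::G_STORE})
    for (LLT Ty : {s8, s16, s32, s64, v4s32})
      setAction({Op, Ty}, Legal);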
Dehao Chen [Thu, 23 Mar 2017 14:43:10 +0000 (14:43 +0000)]
Do not set branch weight if the branch weight annotation is present.
Summary: ThinLTO will annotate the CFG twice. If the branch weight is set by the first annotation, we should not set it again in the second annotation, because the first annotation is more accurate: less optimization has run at that point that could affect debug info accuracy.
[X86][TD][vpmovm2] New TD pattern for the vpmovm2 instruction
Up until now, the vpmovm2 instruction described its destination operand size
by the source operand size. This patch adds a new pattern for the vpmovm2
instruction. The node describes a new expansion of the destination (from
{128|256} to 512 bits).
Craig Topper [Thu, 23 Mar 2017 06:15:56 +0000 (06:15 +0000)]
[IR] Use a binary search in DataLayout::getAlignmentInfo
Summary:
We currently do a linear scan through all of the Alignments array entries any time getAlignmentInfo is called. I noticed while profiling compile time on a -O2 opt run that this function can be called quite frequently, accounting for about 1% of the time in callgrind.
This patch puts the Alignments array into sorted order by type and then by bit width. We can then do a binary search and use the sorted order to handle the special cases for INTEGER_ALIGN. Some of this is modeled after the sorting/searching we already do for pointers.
This reduced the time spent in this routine by about 2/3 in the one compilation I was looking at.
We could maybe improve this more by using a DenseMap to cache the results, but just sorting was easy and didn't require an extra data structure. And I think it made the integer handling simpler.
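A minimal sketch of the sorted-array lookup (illustrative types and names,
not the actual DataLayout internals):

  #include <algorithm>
  #include <cstdint>
  #include <tuple>
  #include <vector>

  struct LayoutAlignElem {
    char AlignType;   // e.g. 'i' = integer, 'f' = float, 'v' = vector
    uint32_t BitWidth;
    uint32_t ABIAlign;
  };

  // Alignments is kept sorted by (AlignType, BitWidth), so lookup is
  // O(log n) instead of a linear scan.
  const LayoutAlignElem *findAlignment(
      const std::vector<LayoutAlignElem> &Alignments, char AlignType,
      uint32_t BitWidth) {
    auto I = std::lower_bound(
        Alignments.begin(), Alignments.end(),
        std::make_pair(AlignType, BitWidth),
        [](const LayoutAlignElem &E, const std::pair<char, uint32_t> &K) {
          return std::make_tuple(E.AlignType, E.BitWidth) <
                 std::make_tuple(K.first, K.second);
        });
    if (I != Alignments.end() && I->AlignType == AlignType &&
        I->BitWidth == BitWidth)
      return &*I;
    // The sorted order also makes the INTEGER_ALIGN fallback (use the
    // next larger integer width) easy to find; elided here.
    return nullptr;
  }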
Reid Kleckner [Thu, 23 Mar 2017 00:14:23 +0000 (00:14 +0000)]
[codeview] Move type index remapping logic to type merger
Summary:
This removes the 'remapTypeIndices' method on every TypeRecord class. My
original idea was that this would be the beginning of some kind of
generic entry point that would enumerate all of the TypeIndices inside
of a TypeRecord, so that we could write generic graph algorithms for
them without duplicating the knowledge of which fields are type index
fields everywhere. This never happened, and nothing else uses this
method. I need to change the API to deal with merging into IPI streams,
so let's move it into the file that uses it first.
[AMDGPU] Restructure code object metadata creation
- Rename runtime metadata -> code object metadata
- Make metadata not flow
- Switch enums to use ScalarEnumerationTraits
- Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc
- Introduce in-memory representation for attributes
- Code object metadata streamer
- Create metadata for isa and printf during EmitStartOfAsmFile
- Create metadata for kernel during EmitFunctionBodyStart
- Finalize and emit metadata to .note during EmitEndOfAsmFile
- Other minor improvements/bug fixes
[libFuzzer] Add two experimental flags to make corpus merging more scalable: -save_coverage_summary and -load_coverage_summary. This is still WIP; documentation will come later if these flags survive.
Anna Thomas [Wed, 22 Mar 2017 19:27:12 +0000 (19:27 +0000)]
[LVI] Add an LVI printer pass to help test the LVI cache after transformations
Summary:
Adding a printer pass for printing the LVI cache values after transformations
that use LVI.
This will help us identify cases where LVI
invariants are violated, or transforms that leave LVI in an incorrect state.
Right now, I have added two test cases to show that the printer pass is working.
I will be adding more test cases in a later change, once this change is
checked in upstream.
SROA was dropping the nonnull metadata on loads from allocas that got optimized out. This patch simply preserves nonnull metadata on loads through SROA and mem2reg.
IPO: Const correctness for summaries passed into passes.
Pass const qualified summaries into importers and unqualified summaries into
exporters. This lets us const-qualify the summary argument to thinBackend.
Sanjay Patel [Wed, 22 Mar 2017 17:10:44 +0000 (17:10 +0000)]
[InstCombine] canonicalize insertelement of scalar constant ahead of insertelement of variable
insertelement (insertelement X, Y, IdxC1), ScalarC, IdxC2 -->
insertelement (insertelement X, ScalarC, IdxC2), Y, IdxC1
As noted in the code comment and seen in the test changes, the motivation is that by pulling
constant insertion up, we may be able to constant fold some insertelement instructions.
Adrian Prantl [Wed, 22 Mar 2017 16:50:16 +0000 (16:50 +0000)]
Fix PR32298 by adding an early exit to getFrameIndexExprs().
Also add an assertion for the case where there are multiple FI
expressions with a DW_OP_LLVM_fragment, which would violate internal
constraints in DbgVariable.
Zachary Turner [Wed, 22 Mar 2017 16:30:06 +0000 (16:30 +0000)]
Make the home_directory test a little more resilient.
It's possible (albeit strange) for $HOME to intentionally
point somewhere other than the user's home directory as
reported by the password database. Our test shouldn't fail
in this case. This patch updates the test to pull directly
from the password database before unsetting $HOME, rather
than comparing the return value of home_directory() to the
original value of the environment variable.
Zachary Turner [Wed, 22 Mar 2017 15:24:59 +0000 (15:24 +0000)]
Make home_directory look in the password database in addition to $HOME.
This is something of an edge case, but when the $HOME environment
variable is not set, we can still look in the password database
to get the current user's home directory.
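A minimal sketch of this fallback logic (POSIX-only; the real
llvm::sys::path::home_directory has more error handling):

  #include <pwd.h>
  #include <unistd.h>
  #include <cstdlib>
  #include <string>

  bool homeDirectory(std::string &Result) {
    if (const char *H = std::getenv("HOME")) {
      Result = H;
      return true;
    }
    // $HOME unset: consult the password database for the current user.
    if (struct passwd *PW = ::getpwuid(::getuid())) {
      if (PW->pw_dir) {
        Result = PW->pw_dir;
        return true;
      }
    }
    return false;
  }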
Added a test for this by getting the value of $HOME, then unsetting
it, then calling home_directory() and verifying that it succeeds
and that the value is the same as what we originally read from
the environment.
Artyom Skrobov [Wed, 22 Mar 2017 15:09:30 +0000 (15:09 +0000)]
[ARM] Fix a subtle bug in the t2_so_imm_neg conversion, which could trigger UB by negating (int)-2147483648. By pure luck, none of the pre-existing tests triggered this, so I'm adding one.
Summary: Thanks to Vitaly Buka for helping catch this.
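The hazard in isolation (a generic illustration, not the actual conversion
code):

  #include <cstdint>

  // Negating INT32_MIN in signed arithmetic is undefined behaviour, so
  // the negation must happen on the unsigned representation instead.
  int32_t negSigned(int32_t V) { return -V; }         // UB when V == INT32_MIN
  uint32_t negUnsigned(uint32_t V) { return 0u - V; } // defined wraparound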
r286814 made it possible for CallPenalty to be subtracted twice:
- First time, during calculation of the cost in InlineCost.cpp
- Second time, during calculation of the cost in Inliner.cpp
Max Kazantsev [Wed, 22 Mar 2017 04:48:46 +0000 (04:48 +0000)]
[ScalarEvolution] Predicate implication from operations
This patch allows SCEV predicate analysis to prove that predicates on some
expressions are implied by context predicates on the arguments of those expressions.
It introduces three new rules:
For addition:
(A > X && B >= 0) || (A >= 0 && B > X) ===> (A + B) > X.
For division:
(A > X) && (0 < B <= X + 1) ===> (A / B > 0).
(A > X) && (-B <= X < 0) ===> (A / B >= 0).
Using these rules, SCEV is able to prove facts like "if X > 1 then X / 2 > 0"
(instantiate the first division rule with A = X, B = 2, and 1 for the rule's X).
They can also be combined within the same context to prove more complex facts like
"if X > 1 then X/2 + 1 > 1".
Craig Topper [Wed, 22 Mar 2017 04:03:53 +0000 (04:03 +0000)]
[InstCombine] Teach SimplifyDemandedUseBits to shrink Constants on the left side of subtracts
Summary: Subtracts can have constants on the left side, but we don't shrink them based on demanded bits. This patch fixes that to match what we already do for the right-hand side.
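Why shrinking the left constant of a subtract is sound (a self-contained
check with example values, not the InstCombine code): the demanded low bits
of a difference depend only on the low bits of its operands, since borrows
propagate upward only.

  #include <cassert>
  #include <cstdint>

  int main() {
    const uint32_t DemandedMask = 0xFF;  // only the low byte is used
    const uint32_t C = 0x12345678;       // constant LHS of the subtract
    const uint32_t Shrunk = C & DemandedMask;
    for (uint32_t X = 0; X < 100000; ++X)
      assert(((C - X) & DemandedMask) == ((Shrunk - X) & DemandedMask));
    return 0;
  }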
This patch changes how intrinsics are handled by the IRTranslator: we
now create a VREG + G_CONSTANT for ConstantInt values, as we already do
for floating-point values. This makes it easier for the backends to
select code, since they won't have to de-duplicate the creation and
selection of constants.
Adrian Prantl [Wed, 22 Mar 2017 01:16:01 +0000 (01:16 +0000)]
Don't compose DWARF expressions with multiple subregisters.
If a register location can only be described by a complex expression
(i.e., multiple subregisters) it doesn't safely compose with another
complex expression. For example, it is not possible to apply a
DW_OP_deref operation to multiple DW_OP_pieces.
Quentin points out that r298358 would cause us to emit different code
with and without debug info. That's a big no-no; also erase the instructions
that only stay live thanks to DBG_VALUE users.
Adrian explained how this is an existing problem and an OK thing to do:
clang has allocas for all variables so shouldn't be affected at -O0, but
swift uses a bit of inlineasm to explicitly keep values live for the
purpose of debug info quality. I'm not sure there is a better scheme.
Craig Topper [Tue, 21 Mar 2017 23:04:23 +0000 (23:04 +0000)]
[IR] Remove the validAlignment and validPointer methods from DataLayout, as they aren't used.
I don't think validAlignment has been used since r34358 in 2007. I think validPointer was copied from validAlignment some time later, but it definitely wasn't used even in the first commit that contained it.
Matthias Braun [Tue, 21 Mar 2017 21:58:08 +0000 (21:58 +0000)]
SplitKit: Fix subreg copy related problems
Fix two problems related to r298025:
- SplitKit would create duplicate VNIs in some cases, leading to crashes
when hoisting copies.
- VirtRegMap could fail expanding copies at the beginning of a basic
block.
Matt Arsenault [Tue, 21 Mar 2017 21:39:51 +0000 (21:39 +0000)]
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
Zachary Turner [Tue, 21 Mar 2017 20:27:36 +0000 (20:27 +0000)]
Improve StringMap iterator support.
StringMap's iterators did not support LLVM's
iterator_facade_base, which made it unusable in various
STL algorithms or with some of our range adapters.
This patch makes both StringMapConstIterator as well as
StringMapIterator support iterator_facade_base.
With this in place, it is easy to make an iterator adapter
that iterates over only keys, and whose value_type is
StringRef. So I add StringMapKeyIterator as well, and
provide the method StringMap::keys() that returns a
range that can be iterated.
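A usage sketch of the new facilities (illustrative only):

  #include "llvm/ADT/STLExtras.h"
  #include "llvm/ADT/StringMap.h"
  #include "llvm/ADT/StringRef.h"

  void demo(const llvm::StringMap<int> &Counts) {
    // keys() yields a range of StringRef via StringMapKeyIterator.
    for (llvm::StringRef Key : Counts.keys())
      (void)Key;
    // iterator_facade_base support means the iterators now compose with
    // STL-style algorithms and LLVM's range helpers.
    auto It = llvm::find_if(Counts.keys(), [](llvm::StringRef K) {
      return K.startswith("x");
    });
    (void)It;
  }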
Dehao Chen [Tue, 21 Mar 2017 19:55:36 +0000 (19:55 +0000)]
Do not inline hot callsites for samplepgo in thinlto compile phase.
Summary: SamplePGO passes are invoked twice in a ThinLTO build: once in the compile phase and once in the backend. We want the IR in the second phase to match the hot parts of the profile, so we do not inline hot callsites in the first phase.