NAKAMURA Takumi [Sat, 22 Aug 2015 04:53:52 +0000 (04:53 +0000)]
[CMake] Make LLVM_EXTERNAL_*_SOURCE_DIR consistent against older buildsites.
If corresponding in-tree subdirectory exists, just ignore LLVM_EXTERNAL* stuff.
Otherwise, set LLVM_TOOL_*_BUILD ON/OFF properly according to LLVM_EXTERNAL_*.
This makes easier to walk among old revisions *without* deleteing CMakeCache.txt.
Before r242059, LLVM_EXTERNAL_* was working like;
if(EXISTS ${*_SOURCE_DIR}/CMakeLists.txt)
set(*_BUILD ON CACHE)
if(*_BUILD is ON)
add_subdirectory(*_SOURCE_DIR)
endif()
endif()
JF Bastien [Fri, 21 Aug 2015 23:27:24 +0000 (23:27 +0000)]
Improve the determinism of MergeFunctions
Summary:
Merge functions previously relied on unsigned comparisons of pointer values to
order functions. This caused observable non-determinism in the compiler for
large bitcode programs. Basically, opt -mergefuncs program.bc | md5sum produces
different hashes when run repeatedly on the same machine. Differing output was
observed on three large bitcodes, but it was less frequent on the smallest file.
It is possible that this only manifests on the large inputs, hence remaining
undetected until now.
This patch fixes this by removing (almost, see below) all places where
comparisons between pointers are used to order functions. Most of these changes
are local, but the comparison of global values requires assigning an identifier
to each local in the order it is visited. This is very similar to the way the
comparison function identifies Value*'s defined within a function. Because the
order of visiting the functions and their subparts is deterministic, the
identifiers assigned to the globals will be as well, and the order of functions
will be deterministic.
With these changes, there is no more observed non-determinism. There is also
only minor slowdowns (negligible to 4%) compared to the baseline, which is
likely a result of the fact that global comparisons involve hash lookups and not
just pointer comparisons.
The one caveat so far is that programs containing BlockAddress constants can
still be non-deterministic. It is not clear what the right solution is here. In
particular, even if the global numbers are used to order by function, we still
need a way to order the BasicBlock*'s. Unfortunately, we cannot just bail out
and fail to order the functions or consider them equal, because we require a
total order over functions. Note that programs with BlockAddress constants are
relatively rare, so the impact of leaving this in is minor as long as this pass
is opt-in.
Adam Nemet [Fri, 21 Aug 2015 23:19:57 +0000 (23:19 +0000)]
[LAA] Hold bounds via ValueHandles during SCEV expansion
SCEV expansion can invalidate previously expanded values. For example
in SCEVExpander::ReuseOrCreateCast, if we already have the requested
cast value but it's not at the desired location, a new cast is inserted
and the old cast will be invalidated.
Therefore, when expanding the bounds for the pointers, a later entry can
invalidate the IR value for an earlier one. The fix is to store a value
handle rather than the value itself.
The newly added test has a more detailed description of how the bug
triggers.
This bug can have a negative but potentially highly variable performance
impact in Loop Distribution. Because one of the bound values was
invalidated and is an undef expression now, InstCombine is free to
transform the array overlap check:
Start0 <= End1 && Start1 <= End0
into:
Start0 <= End1
So depending on the runtime location of the arrays, we would detect a
conflict and fall back on the original loop of the versioned loop.
Also tested compile time with SPEC2006 LTO bc files.
Tom Stellard [Fri, 21 Aug 2015 22:47:27 +0000 (22:47 +0000)]
AMDGPU/SI: Better handle s_wait insertion
We can wait on either VM, EXP or LGKM.
The waits are independent.
Without this patch, a wait inserted because of one of them
would also wait for all the previous others.
This patch makes s_wait only wait for the ones we need for the next
instruction.
Here's an example of subtle perf reduction this patch solves:
The s_waitcnt vmcnt(1) is useless.
The reason it is added is because the last
buffer_load_format_xyzw needs s[44:47], which was issued
by the first s_load_dwordx4. It waits for all VM
before that call to have finished.
Internally after every instruction, 3 counters (for VM, EXP and LGTM)
are updated after every instruction. For example buffer_load_format_xyzw
will
increase the VM counter, and s_load_dwordx4 the LGKM one.
Without the patch, for every defined register,
the current 3 counters are stored, and are used to know
how long to wait when an instruction needs the register.
Because of that, the s[44:47] counter includes that to use the register
you need to wait for the previous buffer_load_format_xyzw.
Instead this patch stores only the counters that matter for the
register,
and puts zero for the other ones, since we don't need any wait for them.
Hal Finkel [Fri, 21 Aug 2015 21:34:24 +0000 (21:34 +0000)]
[PowerPC] PPCVSXFMAMutate should not segfault on undef input registers
When PPCVSXFMAMutate would look at the input addend register, it would get its
input value number. This would fail, however, if the register was undef,
causing a segfault. Don't segfault (just skip such FMA instructions).
Fixes the test case from PR24542 (although that may have been over-reduced).
Alex Lorenz [Fri, 21 Aug 2015 21:32:39 +0000 (21:32 +0000)]
AsmParser: Save and restore the parsing state for types using SlotMapping.
This commit extends the 'SlotMapping' structure and includes mappings for named
and numbered types in it. The LLParser is extended accordingly to fill out
those mappings at the end of module parsing.
This information is useful when we want to parse standalone constant values
at a later stage using the 'parseConstantValue' method. The constant values
can be constant expressions, which can contain references to types. In order
to parse such constant values, we have to restore the internal named and
numbered mappings for the types in LLParser, otherwise the parser will report
a parsing error. Therefore, this commit also introduces a new method called
'restoreParsingState' to LLParser, which uses the slot mappings to restore
some of its internal parsing state.
This commit is required to serialize constant value pointers in the machine
memory operands for the MIR format.
Alex Lorenz [Fri, 21 Aug 2015 21:12:44 +0000 (21:12 +0000)]
MIR Serialization: Print MCSymbol operands.
This commit allows the MIR printer to print the MCSymbol machine operands.
Unfortunately they can't be parsed at this time. I will create a bug that will
track the fact that the MCSymbol operands can't be parsed yet.
Sanjay Patel [Fri, 21 Aug 2015 20:17:26 +0000 (20:17 +0000)]
[x86] invert logic for attribute 'FeatureFastUAMem'
This is a 'no functional change intended' patch. It removes one FIXME, but adds several more.
Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if any
sized misaligned memory access under 32-bytes is 'fast'. From the added FIXME comments, however,
you can see that we're not consistent about this. Changing the name of the attribute makes it
clearer to see the logic holes.
Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute
to new chips; fast unaligned accesses have been standard for several generations of CPUs now.
David Blaikie [Fri, 21 Aug 2015 20:16:51 +0000 (20:16 +0000)]
[opaque pointer type]: Pass explicit pointee type when building a constant GEP.
Gets a bit tricky in the ValueMapper, of course - not sure if we should
just expose a list of explicit types for each Value so that the
ValueMapper can be neutral to these special cases (it's OK for things
like load, where the explicit type is the result type - but when that's
not the case, it means plumbing through another "special" type... )
Dan Liew [Fri, 21 Aug 2015 18:10:57 +0000 (18:10 +0000)]
Filter libraries that are not installed out of CMake exports (currently
gtest and gtest_main) when generating ``Makefile.llvmbuild``.
Libraries that are not installed should not be exported because they
won't be available from an install tree. Rather than filtering out the
gtest libraries in cmake/modules/Makefile, simply teach llvm-build to
filter out libraries that will not be installed from its generated list
of exported libraries.
Note that LLVMBUILD_LIB_DEPS_* are used during our own CMake build
process so we cannot filter LLVMBUILD_LIB_DEPS_gtest* out in llvm-build.
We must leave this gtest filter logic in cmake/modules/Makefile.
Dan Liew [Fri, 21 Aug 2015 18:10:55 +0000 (18:10 +0000)]
llvm-build: Adopt generation of LLVM_LIBS_TO_EXPORT. Patch by
Brad King.
Move `LLVM_LIBS_TO_EXPORT` over to Makefile.llvmbuild and generate it
from `llvm-build` using the same logic used to export the dependencies
of these libraries. This avoids depending on `llvm-config`.
This refactoring was originally motivated by issue #24154 due to commit
r243297 (Fix `llvm-config` to emit the linker flag for the combined
shared object, 2015-07-27) changing the output of `llvm-config --libs`
to not have the individual libraries when we configure with
`--enable-shared`. That change was reverted by r244108 but this
refactoring makes sense on its own anyway.
Dan Liew [Fri, 21 Aug 2015 18:10:51 +0000 (18:10 +0000)]
llvm-build: Factor out duplicate cmake export listing. Patch by
Brad King.
The write_cmake_fragment and write_cmake_exports_fragment methods share
some logic for selecting libraries that CMake needs to know about.
Factor it out into a helper to avoid duplication.
Yaron Keren [Fri, 21 Aug 2015 17:31:03 +0000 (17:31 +0000)]
Disable Visual C++ 2013 Debug mode assert on null pointer in some STL algorithms,
such as std::equal on the third argument. This reverts previous workarounds.
Predefining _DEBUG_POINTER_IMPL disables Visual C++ 2013 headers from defining
it to a function performing the null pointer check. In practice, it's not that
bad since any function actually using the nullptr will seg fault. The other
iterator sanity checks remain enabled in the headers.
Reviewed by Aaron Ballmanþ and Duncan P. N. Exon Smith.
John Brawn [Fri, 21 Aug 2015 10:48:17 +0000 (10:48 +0000)]
[DAGCombiner] Fold together mul and shl when both are by a constant
This is intended to improve code generation for GEPs, as the index value is
shifted by the element size and in GEPs of multi-dimensional arrays the index
of higher dimensions is multiplied by the lower dimension size.
James Y Knight [Fri, 21 Aug 2015 04:17:56 +0000 (04:17 +0000)]
[Sparc] Support user-specified stack object overalignment.
Note: I do not implement a base pointer, so it's still impossible to
have dynamic realignment AND dynamic alloca in the same function.
This also moves the code for determining the frame index reference
into getFrameIndexReference, where it belongs, instead of inline in
eliminateFrameIndex.
[Begin long-winded screed]
Now, stack realignment for Sparc is actually a silly thing to support,
because the Sparc ABI has no need for it -- unlike the situation on
x86, the stack is ALWAYS aligned to the required alignment for the CPU
instructions: 8 bytes on sparcv8, and 16 bytes on sparcv9.
However, LLVM unfortunately implements user-specified overalignment
using stack realignment support, so for now, I'm going to go along
with that tradition. GCC instead treats objects which have alignment
specification greater than the maximum CPU-required alignment for the
target as a separate block of stack memory, with their own virtual
base pointer (which gets aligned). Doing it that way avoids needing to
implement per-target support for stack realignment, except for the
targets which *actually* have an ABI-specified stack alignment which
is too small for the CPU's requirements.
Further unfortunately in LLVM, the default canRealignStack for all
targets effectively returns true, despite that implementing that is
something a target needs to do specifically. So, the previous behavior
on Sparc was to silently ignore the user's specified stack
alignment. Ugh.
Yet MORE unfortunate, if a target actually does return false from
canRealignStack, that also causes the user-specified alignment to be
*silently ignored*, rather than emitting an error.
(I started looking into fixing that last, but it broke a bunch of
tests, because LLVM actually *depends* on having it silently ignored:
some architectures (e.g. non-linux i386) have smaller stack alignment
than spilled-register alignment. But, the fact that a register needs
spilling is not known until within the register allocator. And by that
point, the decision to not reserve the frame pointer has been frozen
in place. And without a frame pointer, stack realignment is not
possible. So, canRealignStack() returns false, and
needsStackRealignment() then returns false, assuming everyone can just
go on their merry way assuming the alignment requirements were
probably just suggestions after-all. Sigh...)
The module splitter splits a module into linkable partitions. It will
be used to implement parallel LTO code generation.
This initial version of the splitter does not attempt to deal with the
somewhat subtle symbol visibility issues around module splitting. These
will be dealt with in a future change.
Matthias Braun [Thu, 20 Aug 2015 23:33:34 +0000 (23:33 +0000)]
AArch64: Fix cmp;ccmp ordering
When producing conditional compare sequences for or operations we need
to negate the operands and the finally tested flags. The thing is if we negate
the finally tested flags this equals a logical negation of all previously
emitted expressions. There was a case missing where we have to order OR
expressions so they get emitted first.
Matthias Braun [Thu, 20 Aug 2015 23:33:31 +0000 (23:33 +0000)]
AArch64: Do not create CCMP on multiple users.
Create CMP;CCMP sequences from and/or trees does not gain us anything if
the and/or tree is materialized to a GP register anyway. While most of
the code already checked for hasOneUse() there was one important case
missing.
Adrian Prantl [Thu, 20 Aug 2015 22:00:30 +0000 (22:00 +0000)]
Rename Instruction::dropUnknownMetadata() to dropUnknownNonDebugMetadata()
and make it always preserve debug locations, since all callers wanted this
behavior anyway.
This is addressing a post-commit review feedback for r245589.
We don't currently fold this, but now that we use native IR for
the intrinsics (r245605), we can look through one bitcast to find
the broadcast scalar.
Ahmed Bougacha [Thu, 20 Aug 2015 20:36:19 +0000 (20:36 +0000)]
[X86] Replace avx2 broadcast intrinsics with native IR.
Since r245605, the clang headers don't use these anymore.
r245165 updated some of the tests already; update the others, add
an autoupgrade, remove the intrinsics, and cleanup the definitions.
This patch adds support for asan on aarch64-linux with 42-bit VMA
(current default config for 64K pagesize kernels). The support is
enabled by defining the SANITIZER_AARCH64_VMA to 42 at build time
for both clang/llvm and compiler-rt. The default VMA is 39 bits.
Adrian Prantl [Thu, 20 Aug 2015 18:23:56 +0000 (18:23 +0000)]
Fix a debug location handling bug in GVN.
Caught by the famous "DebugLoc describes the currect SubProgram" assertion.
When GVN is removing a nonlocal load it updates the debug location of the
SSA value it replaced the load with with the one of the load. In the
testcase this actually overwrites a valid debug location with an empty one.
In reality GVN has to make an arbitrary choice between two equally valid
debug locations. This patch changes to behavior to only update the
location if the value doesn't already have a debug location.
Adam Nemet [Thu, 20 Aug 2015 17:22:29 +0000 (17:22 +0000)]
[LVer] Fix FIXME: hide addPHINodes, NFC
Since Ashutosh made findDefsUsedOutsideOfLoop public, we can clean this
up.
Now clients that don't compute DefsUsedOutsideOfLoop can just call
versionLoop() and computing DefsUsedOutsideOfLoop will happen
implicitly. With that there is no reason to expose addPHINodes anymore.
Ashutosh, you can now drop the calls to findDefsUsedOutsideOfLoop and
addPHINodes in LVerLICM and things should just work.
James Molloy [Thu, 20 Aug 2015 16:33:44 +0000 (16:33 +0000)]
[ARM] Don't try and custom lower a vNi64 SETCC.
It won't go well. We've already marked 64-bit SETCCs as non-Custom, but it's just possible that a SETCC has a legal result type but an illegal operand type. If this happens, bail out before we create unselectable nodes.
Fixes PR24292. I tried to create a testcase but in 99% of cases we can't trigger this - not surprising that this bug has been latent since 2009.
Bjorn Steinbrink [Thu, 20 Aug 2015 08:25:28 +0000 (08:25 +0000)]
[DSE] Enable removal of lifetime intrinsics in terminating blocks
Usually DSE is not supposed to remove lifetime intrinsics, but it's
actually ok to remove them for dead objects in terminating blocks,
because they convey no extra information there. Until we hit a lifetime
start that cannot be removed, that is. Because from that point on the
lifetime intrinsics become interesting again, e.g. for stack coloring.
Chandler Carruth [Thu, 20 Aug 2015 08:06:03 +0000 (08:06 +0000)]
[ARC] Pull the ObjC ARC components that really serve the role of
analyses into LLVM's Analysis library rather than having them in
a Transforms library.
This is motivated by the need to have the core AliasAnalysis
infrastructure be aware of the ObjCARCAliasAnalysis. However, it also
seems like a nice and clean separation. Everything was very easy to move
and this doesn't create much clutter in the analysis library IMO.
Hal Finkel [Thu, 20 Aug 2015 01:18:20 +0000 (01:18 +0000)]
[PowerPC] Fix the int2fp(fp2int(x)) DAGCombine to ignore ppc_fp128
This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128
operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)),
but shouldn't (it should only apply to f32/f64 types). The result was a crash.
Alex Lorenz [Thu, 20 Aug 2015 00:20:03 +0000 (00:20 +0000)]
MIR Serialization: Use the global value syntax for global value memory operands.
This commit modifies the serialization syntax so that the global IR values in
machine memory operands use the global value '@<name>' syntax instead of the
current '%ir.<name>' syntax.
The unnamed global IR values are handled by this commit as well, as the
existing global value parsing method can parse the unnamed globals already.
Alex Lorenz [Thu, 20 Aug 2015 00:12:57 +0000 (00:12 +0000)]
MIR Serialization: Change syntax for the call entry pseudo source values.
The global IR values in machine memory operands should use the global value
'@<name>' syntax instead of the current '%ir.<name>' syntax.
However, the global value call entry pseudo source values use the global value
syntax already. Therefore, the syntax for the call entry pseudo source values
has to be changed so that the global values and call entry global value PSVs
can be parsed without ambiguities.
David Blaikie [Wed, 19 Aug 2015 23:07:27 +0000 (23:07 +0000)]
Allow Optionals to be compared to None
This is something like nullopt in std::experimental::optional. Optional
could already be constructed from None, so this seems like an obvious
extension from there.
I have a use in a future patch for Clang, though it may not go that
way/end up used - so this seemed worth committing now regardless.