Target: Change various section classifiers in TargetLoweringObjectFile to take a GlobalObject.
These functions are about classifying a global which will actually be
emitted, so it does not make sense for them to take a GlobalValue which may
for example be an alias.
Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to
look through aliases before using TargetLoweringObjectFile interfaces. These
are functional changes but all appear to be bug fixes.
Adrian Prantl [Mon, 24 Oct 2016 18:23:51 +0000 (18:23 +0000)]
add-discriminators: Fix handling of lexical scopes.
This fixes a bug in the handling of lexical scopes, when more than one
scope is defined on the same line or functions are inlined into call
sites that are on the same line as the function definition. This
situation can easily happen in macro expansions.
The problem is solved by introducing a SmallDenseMap<DIScope *,
DILexicalBlockFile *, 1> that keeps track of all the different lexical
scopes that share a line/file location.
Geoff Berry [Mon, 24 Oct 2016 15:54:00 +0000 (15:54 +0000)]
[EarlyCSE] Optimize MemoryPhis and reduce memory clobber queries w/ MemorySSA
Summary:
When using MemorySSA, re-optimize MemoryPhis when removing a store since
this may create MemoryPhis with all identical arguments.
Also, when using MemorySSA to check if two MemoryUses are reading from
the same version of the heap, use the defining access instead of calling
getClobberingAccess, since the latter can currently result in many more
AA calls. Once the MemorySSA use optimization tracking changes are
done, we can remove this limitation, which should result in more loads
being CSE'd.
Ehsan Amiri [Mon, 24 Oct 2016 15:46:58 +0000 (15:46 +0000)]
[PPC] Better codegen for AND, ANY_EXT, SRL sequence
https://reviews.llvm.org/D24924
This improves the code generated for a sequence of AND, ANY_EXT, SRL instructions. This is a targetted fix for this special pattern. The pattern is generated by target independet dag combiner and so a more general fix may not be necessary. If we come across other similar cases, some ideas for handling it are discussed on the code review.
Nicolai Haehnle [Mon, 24 Oct 2016 14:56:02 +0000 (14:56 +0000)]
AMDGPU: Fix Two Address problems with v_movreld
Summary:
The v_movreld machine instruction is used with three operands that are
in a sense tied to each other (the explicit VGPR_32 def and the implicit
VGPR_NN def and use). There is no way to express that using the currently
available operand bits, and indeed there are cases where the Two Address
instructions pass does the wrong thing.
This patch introduces a new set of pseudo instructions that are identical
in intended semantics as v_movreld, but they only have two tied operands.
Having to add a new set of pseudo instructions is admittedly annoying, but
it's a fairly straightforward and solid approach. The only alternative I
see is to try to teach the Two Address instructions pass about Three Address
instructions, and I'm afraid that's trickier and is going to end up more
fragile.
Note that v_movrels does not suffer from this problem, and so this patch
does not touch it.
This fixes several GL45-CTS.shaders.indexing.* tests.
Nico Weber [Mon, 24 Oct 2016 14:52:04 +0000 (14:52 +0000)]
Revert 284971.
It seems to break selfhost on some bots, see e.g.
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/21
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/20
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/22
Pavel Labath [Mon, 24 Oct 2016 14:19:28 +0000 (14:19 +0000)]
[Chrono] Fix !HAVE_FUTIMENS build
If we don't have futimens(), we fall back to futimes(), which only supports
microsecond timestamps. In that case, we need to explicitly cast away the extra
precision in setLastModificationAndAccessTime().
Pavel Labath [Mon, 24 Oct 2016 13:38:27 +0000 (13:38 +0000)]
[Object] Replace TimeValue with std::chrono
Summary:
Most of the changes are very straight-forward. The only choice I had to make was
to use second-precision time points in the Archive classes. I did this because
the archive files use that precision in the on-disk representation anyway.
Joel Jones [Mon, 24 Oct 2016 13:37:13 +0000 (13:37 +0000)]
AArch64 ILP32 relocations for assembly and ELF
Summary:
Add relocations for AArch64 ILP32. Includes:
- Addition of definitions for R_AARCH32_*
- Definition of new -target-abi: ilp32
- Definition of data layout string
- Tests for added relocations. Not comprehensive, but matches
existing tests for 64-bit. Renames "CHECK-OBJ" to "CHECK-OBJ-LP64".
- Tests for llvm-readobj
Pablo Barrio [Mon, 24 Oct 2016 13:04:45 +0000 (13:04 +0000)]
[JumpThreading] Unfold selects that depend on the same condition
Summary:
These are good candidates for jump threading. This enables later opts
(such as InstCombine) to combine instructions from the selects with
instructions out of the selects. SimplifyCFG will fold the select
again if unfolding wasn't worth it.
Pavel Labath [Mon, 24 Oct 2016 10:59:17 +0000 (10:59 +0000)]
Remove TimeValue usage from llvm/Support
Summary:
This is a follow-up to D25416. It removes all usages of TimeValue from
llvm/Support library (except for the actual TimeValue declaration), and replaces
them with appropriate usages of std::chrono. To facilitate this, I have added
small utility functions for converting time points and durations into appropriate
OS-specific types (FILETIME, struct timespec, ...).
Simon Dardis [Mon, 24 Oct 2016 10:23:59 +0000 (10:23 +0000)]
[mips] synci microMIPS instruction definition.
Add synci to the microMIPS instruction definitions, mark the MIPS sync & synci
as not being part of microMIPS. This does not cover the sync instruction alias,
as that will be handled with a different patch. Add sync to the valid tests for
microMIPS.
Simon Pilgrim [Sun, 23 Oct 2016 16:49:04 +0000 (16:49 +0000)]
[X86][SSE] Add SSE41/AVX1 costs for vector shifts.
We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results.
Brian Gesiak [Sat, 22 Oct 2016 17:27:31 +0000 (17:27 +0000)]
[lit] Add more testing instructions to README
Summary:
r283710 introduced two regressions, one to llvm-lit, and the other to
lit executables that were installed via setuptools. Add instructions on
how to test for these regressions in the future.
Craig Topper [Sat, 22 Oct 2016 06:51:52 +0000 (06:51 +0000)]
[X86] Add support for lowering v4i64 and v8i64 shuffles directly to PALIGNR. I think shuffle combine can figure it out later, but we should try to get it right up front.
Craig Topper [Sat, 22 Oct 2016 06:51:44 +0000 (06:51 +0000)]
[X86] Remove 128-bit lane handling from the main loop of matchVectorShuffleAsByteRotate. Instead check for is128LaneRepeatedSuffleMask before the loop and just loop over the repeated mask.
I plan to use the loop to support VALIGND/Q shuffles so this makes it easier to reuse.
Gerolf Hoflehner [Sat, 22 Oct 2016 02:41:39 +0000 (02:41 +0000)]
[BasicAA] Fix - missed alias in GEP expressions
In BasicAA GEP operand values get adjusted ("wrap-around") based on the
pointersize. Otherwise, in non-64b modes, AA could report false negatives.
However, a wrap-around is valid only for a fully evaluated expression.
It had been introduced to fix an alias problem in
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160118/326163.html.
This commit restricts the wrap-around to constant gep operands only where the
value is known at compile-time.
David L. Jones [Fri, 21 Oct 2016 23:30:39 +0000 (23:30 +0000)]
Fix map insertion that is elided in release build.
The assert() macro doesn't actually execute its body in Release builds, so using
it to check cache invariants requires that the insertion be outside of the
assert() statement. This change does that, and also makes sure to return the
actual map contents.
Justin Lebar [Fri, 21 Oct 2016 22:10:23 +0000 (22:10 +0000)]
[ADT] Don't rely on string literals not being convertible to non-const char* in CachedHashString.
The build was breaking on some platforms because we assumed that
CachedHashString("foo") would match the CachedHashString(StringRef)
constructor rather than the CachedHashString(char*) constructor.
To fix this, provide a CachedHashString(const char*) constructor, and
add a dummy argument to the old CachedHashString(char*) constructor.
Justin Lebar [Fri, 21 Oct 2016 21:45:01 +0000 (21:45 +0000)]
Switch SmallSetVector to use DenseSet when it overflows its inline space.
Summary:
SetVector already used DenseSet, but SmallSetVector used std::set. This
leads to surprising performance differences. Moreover, it means that
the set of key types accepted by SetVector and SmallSetVector are
quite different!
In order to make this change, we had to convert some callsites that used
SmallSetVector<std::string, N> to use SmallSetVector<CachedHashString, N>
instead.
Li Huang [Fri, 21 Oct 2016 20:05:21 +0000 (20:05 +0000)]
[SCEV] Memoize visitMulExpr results in SCEVRewriteVisitor.
Summary:
When SCEVRewriteVisitor traverses the SCEV DAG, it may visit the same SCEV
multiple times if this SCEV is referenced by multiple other SCEVs. This has
exponential time complexity in the worst case. Memoizing the results will
avoid re-visiting the same SCEV. Add a map to save the results, and override
the visit function of SCEVVisitor. Now SCEVRewriteVisitor only visit each
SCEV once and thus returns the same result for the same input SCEV.
This patch fixes PR18606, PR18607.
Reviewers: Sanjoy Das, Mehdi Amini, Michael Zolotukhin
Kevin Enderby [Fri, 21 Oct 2016 20:03:14 +0000 (20:03 +0000)]
Fix a bug in the code of llvm-cxxdump in dumpArchive() when
iterating over an archive with object and non-object members that
would cause an Abort because to was not calling consumeError()
when the code was wanting to ignore a non-object file.
X86: Improve BT instruction selection for 64-bit values.
If a 64-bit value is tested against a bit which is known to be in the range
[0..31) (modulo 64), we can use the 32-bit BT instruction, which has a slightly
shorter encoding.
Anna Thomas [Fri, 21 Oct 2016 18:43:16 +0000 (18:43 +0000)]
[StripGCRelocates] New pass to remove gc.relocates added by RS4GC
Summary:
Utility pass to remove gc.relocates created by rewrite statepoints for GC.
With respect to safepoint verification, the IR generated would be incorrect, and cannot run
as such.
This would be a single transformation on the final optimized IR.
The benefit of the pass is for easy analysis when the IRs are 'polluted' by too
many gc.relocates.
Added tests.
test run: All RS4GC tests with -verify option. Local downstream tests on large
IR files. This also works when the pointer being gc.relocated is another
gc.relocate.
Sanjay Patel [Fri, 21 Oct 2016 17:24:26 +0000 (17:24 +0000)]
[DAG] fold negation of sign-bit
0 - X --> 0, if the sub is NUW
0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
0 - X --> X, if X is 0 or the minimum signed value
This is the DAG equivalent of:
https://reviews.llvm.org/rL284649
plus the fold for the NUW case which already existed in InstSimplify.
Note that we miss a vector fold because of a deficiency in the DAG version of
computeKnownBits().
[Hexagon] Handle spills of partially defined double vector registers
After register allocation it is possible to have a spill of a register
that is only partially defined. That in itself it fine, but creates a
problem for double vector registers. Stores of such registers are pseudo
instructions that are expanded into pairs of individual vector stores,
and in case of a partially defined source, one of the stores may use
an entirely undefined register. To avoid this, track the defined parts
and only generate actual stores for those.
Derek Schuff [Fri, 21 Oct 2016 16:38:07 +0000 (16:38 +0000)]
[WebAssembly] Fix for 0xc call_indirect changes
Summary:
Need to reorder the operands to have the callee as the last argument.
Adds a pseudo-instruction, and a pass to lower it into a real
call_indirect.
This is the first of two options for how to fix the problem.
Sanjay Patel [Fri, 21 Oct 2016 14:58:30 +0000 (14:58 +0000)]
fix variable names; NFCI
Because we're just 'or-ing' these 2 variables later in the code, I
don't think there's a logical bug here, but of course the string with
"no size" is the one that should have the size suffix stripped off.
Sanjay Patel [Fri, 21 Oct 2016 14:36:58 +0000 (14:36 +0000)]
[DAG] use SDNode flags 'nsz' to enable fadd/fsub with zero folds
As discussed in D24815, let's start the process of killing off the broken fast-math global
state housed in TargetOptions and eliminate the need for function-level fast-math attributes.
Here we enable two similar folds that are possible when we don't care about signed-zero:
fadd nsz x, 0 --> x
fsub nsz 0, x --> -x
Note that although the test cases include a 'sin' function call, I'm side-stepping the
FMF-on-calls question (and lack of support in the DAG) for now. It's not needed for these
tests - isNegatibleForFree/GetNegatedExpression just look through a ISD::FSIN node.
Also, when we create an FNEG node and propagate the Flags of the FSUB to it, this doesn't
actually do anything today because Flags are silently dropped for any node that is not a
binary operator.
John Brawn [Fri, 21 Oct 2016 11:08:48 +0000 (11:08 +0000)]
[LoopUnroll] Keep the loop test only on the first iteration of max-or-zero loops
When we have a loop with a known upper bound on the number of iterations, and
furthermore know that either the number of iterations will be either exactly
that upper bound or zero, then we can fully unroll up to that upper bound
keeping only the first loop test to check for the zero iteration case.
Most of the work here is in plumbing this 'max-or-zero' information from the
part of scalar evolution where it's detected through to loop unrolling. I've
also gone for the safe default of 'false' everywhere but howManyLessThans which
could probably be improved.