This is a version of D32090 that unifies all of the
`getInstrProf*SectionName` helper functions. (Note: the build failures
which D32090 would have addressed were fixed with r300352.)
We should unify these helper functions because they are hard to use in
their current form. E.g we recently introduced more helpers to fix
section naming for COFF files. This scheme doesn't totally succeed at
hiding low-level details about section naming, so we should switch to an
API that is easier to maintain.
This is not an NFC commit because it fixes llvm-cov's testing support
for COFF files (this falls out of the API change naturally). This is an
area where we lack tests -- I will see about adding one as a follow up.
[InstCombine] MakeAnd/Or/Xor handling to reuse previous APInt computations
When checking if we should return a constant, we create some temporary APInts to see if we know all bits. But the exact computations we do are needed in several other locations in the same code.
This patch moves them to named temporaries so we can reuse them.
Ideally we'd write directly to KnownZero/One, but we currently seem to only write those variables after all the simplifications checks and I didn't want to change that with this patch.
[InstCombine] (X != C1 && X != C2) --> (X | (C1 ^ C2)) != C2
...when C1 differs from C2 by one bit and C1 <u C2:
http://rise4fun.com/Alive/Vuo
And move related folds to a helper function. This reduces code duplication and
will make it easier to remove the scalar-only restriction as a follow-up step.
[InstCombine] Support folding a subtract with a constant LHS into a phi node
We currently only support folding a subtract into a select but not a PHI. This fixes that.
I had to fix an assumption in FoldOpIntoPhi that assumed the PHI node was always in operand 0. Now we pass it in like we do for FoldOpIntoSelect. But we still require some dancing to find the Constant when we create the BinOp or ConstantExpr. This is based code is similar to what we do for selects.
Since I touched all call sites, this also renames FoldOpIntoPhi to foldOpIntoPhi to match coding standards.
[ValueTracking] Avoid undefined behavior in unittest by not making a named ArrayRef from a std::initializer_list
One of the ValueTracking unittests creates a named ArrayRef initialized by a std::initializer_list. The underlying array for an std::initializer_list is only guaranteed to have a lifetime as long as the initializer_list object itself. So this can leave the ArrayRef pointing at an array that no long exists.
This fixes this to just create an explicit array instead of an ArrayRef.
[InstCombine] Refactor SimplifyUsingDistributiveLaws to more explicitly skip code when LHS/RHS aren't BinaryOperators
Currently this code always makes 2 or 3 calls to tryFactorization regardless of whether the LHS/RHS are BinaryOperators. We make 3 calls when both operands are BinaryOperators with the same opcode. Or surprisingly, when neither are BinaryOperators. This is because getBinOpsForFactorization returns Instruction::BinaryOpsEnd when the operand is not a BinaryOperator. If both LHS and RHS are not BinaryOperators then they both have an Opcode of Instruction::BinaryOpsEnd. When this happens we rely on tryFactorization to early out due to A/B/C/D being null. Similar behavior occurs for the other calls, we rely on getBinOpsForFactorization having made A/B or C/D null to get tryFactorization to early out.
We also rely on these null checks to check the result of getIdentityValue and early out for it.
This patches refactors this to pull these checks up to SimplifyUsingDistributiveLaws so we don't rely on BinaryOpsEnd as a sentinel or this A/B/C/D null behavior. I think this makes this code easier to reason about. Should also give a tiny performance improvement for cases where the LHS or RHS isn't a BinaryOperator.
Sanjoy Das [Fri, 14 Apr 2017 17:42:08 +0000 (17:42 +0000)]
Make SCEVRewriteVisitor smarter about when it trys to create SCEVs
This change really saves just one foldingset lookup, but makes
SCEVRewriteVisitor "feature compatible" with the handwritten logic in
ScalarEvolutionNormalization, so that I can change
ScalarEvolutionNormalization to use SCEVRewriteVisitor in a next step.
This is a non-functional change, but _may_ improve performance in some
pathological cases, but that's unlikely.
Sanjoy Das [Fri, 14 Apr 2017 16:47:15 +0000 (16:47 +0000)]
Remove "#if 0"ed out assert
It won't compile after the recent changes I've made, and I think
keeping it in provides very little value.
Instead I've added (in an earlier commit) a C++ unit test to check the
Denormalize(Normalized(X)) == X property for specific instances of X,
which is what the assert was trying to do anyway.
Sanjoy Das [Fri, 14 Apr 2017 16:47:12 +0000 (16:47 +0000)]
Delete some unnecessary boilerplate
The PostIncTransform class was not pulling its weight, so delete it
and use free functions instead.
This also makes the use of `function_ref` more idiomatic. We were
storing an instance of function_ref in the PostIncTransform class
before, which was fine in that specific case, but the usage after this
change is more obviously okay.
Simon Pilgrim [Fri, 14 Apr 2017 15:05:35 +0000 (15:05 +0000)]
[X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (LLVM)
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
Gil Rapaport [Fri, 14 Apr 2017 07:30:23 +0000 (07:30 +0000)]
[LV] Remove implicit single basic block assumption
This patch is part of D28975's breakdown - no change in output intended.
LV's code currently assumes the vectorized loop is a single basic block up
until predicateInstructions() is called. This patch removes two manifestations
of this assumption (loop phi incoming values, dominator tree update) by
replacing the use of vectorLoopBody with the vectorized loop's latch/header.
[ValueTracking] Calculate the KnownZeros for Intrinsic::ctpop without using a temporary APInt to count leading zeros on.
The APInt was created from an 'unsigned' and we just wanted to know how many bits the value needed to represent it. We can just use Log2_32 from MathExtras.h to get the info.
Object, LTO: Add target triple to irsymtab and LTO API.
Start using it in LLD to avoid needing to read bitcode again just to get the
target triple, and in llvm-lto2 to avoid printing symbol table information
that is inappropriate for the target.
Lang Hames [Fri, 14 Apr 2017 00:06:12 +0000 (00:06 +0000)]
[ORC] Re-enable the Error/Expected unit tests that were disabled in r300177.
The tests were failing due to an occasional deadlock in SerializationTraits
for Error: Both serializers and deserializers were protected by a single
mutex and in the unit test (where both ends of the RPC are in the same
process) one side might obtain the mutex, then block waiting for input,
leaving the other side of the connection unable to obtain the mutex to
write the data the first side was waiting for. Splitting the mutex into
two (one for serialization, one for deserialization) appears to have fixed the
issue.
Simplify some Verifier attribute checks with AttributeSet
Now that we have a type that can represent the attributes on a single
return, function, or parameter, we can pass it around directly rather
than passing around AttributeList and Idx. Removes some more one-based
argument attribute index counting.
[bpf] Fix memory offset check for loads and stores
If the offset cannot fit into the instruction, an addition to the
pointer is emitted before the actual access. However, BPF offsets are
16-bit but LLVM considers them to be, for the matter of this check,
to be 32-bit long.
This causes the following program:
int bpf_prog1(void *ign)
{
volatile unsigned long t = 0x8983984739ull;
return *(unsigned long *)((0xffffffff8fff0002ull) + t);
- Refer to options by `-option` instead of `option`
- Use `-mtriple=` instead of `-march` in the example (-march will still
target the default operating system which is usually not what you want
in a test)
- Rephrase sentence because output does not go to stdout by default (you
need -o - for that as should be expected).
[ValueTracking] Remove duplicate call to computeKnownBits for the operands of Select.
We call it unconditionally on the operands of the select. Then decide if its a min/max and call it on the min/max operands or on the select operands again. Either of those second calls will overwrite the results of the initial call so we can just delete the first call.
Davide Italiano [Thu, 13 Apr 2017 20:36:59 +0000 (20:36 +0000)]
[LCSSA] Efficiently compute blocks dominating at least one exit.
For LCSSA purposes, loop BBs not dominating any of the exits aren't
interesting, as none of the values defined in these blocks can be
used outside the loop.
The way the code computed this information was by comparing each
BB of the loop with each of the exit blocks and ask the dominator tree
about their dominance relation. This is slow.
A more efficient way, implemented here, is that of starting from the
exit blocks and walking the dom upwards until we hit an header. By
transitivity, all the blocks we encounter in our path dominate an exit.
For the testcase provided in PR31851, this reduces compile time on
`opt -O2` by ~25%, going from 1m47s to 1m22s.
Thanks to Dan/MichaelZ for discussions/suggesting the approach/review.
Richard Smith [Thu, 13 Apr 2017 20:29:59 +0000 (20:29 +0000)]
Remove all allocation and divisions from GreatestCommonDivisor
Switch from Euclid's algorithm to Stein's algorithm for computing GCD. This
avoids the (expensive) APInt division operation in favour of bit operations.
Remove all memory allocation from within the GCD loop by tweaking our `lshr`
implementation so it can operate in-place.
SamplePGO: convert callsite samples map key from callsite_location to callsite_location+callee_name
Summary: For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls and get inlined. All these promoted direct calls will share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and its inlined instance. This patch adds callee_name to the key of the callsite sample map. And added helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote correct targets and inline it before annotation, and ensures all indirect call targets to be annotated correctly.
[ValueTracking] Prevent a call to computeKnownBits if we already know the state of the bit we would calculate. Also reuse a temporary APInt instead of creating a new one.
Anna Thomas [Thu, 13 Apr 2017 18:59:25 +0000 (18:59 +0000)]
[LV] Fix the vector code generation for first order recurrence
Summary:
In first order recurrences where phi's are used outside the loop,
we should generate an additional vector.extract of the second last element from
the vectorized phi update.
This is because we require the phi itself (which is the value at the second last
iteration of the vector loop) and not the phi's update within the loop.
Also fix the code gen when we just unroll, but don't vectorize.
Fixes PR32396.
[InstCombine] fold X == 0 || X == -1 to one compare (PR32524)
This is effectively a retry of:
https://reviews.llvm.org/rL299851
but now we have tests and an assert to make sure the bug
that was exposed with that attempt will not happen again.
I'll fix the code duplication and missing sibling fold next,
but I want to make this change as small as possible to reduce
risk since I messed it up last time.
This should fix:
https://bugs.llvm.org/show_bug.cgi?id=32524
[ArgPromotion] Don't drop !prof metadata on promoted calls
Noticed by inspection while doing attribute work. DAE, InstCombineCalls,
and ArgPromotion have a fair amount of duplicated code for hacking on
call sites, and you can find bugs by comparing them.
[InstCombine] use similar ops for related folds; NFCI
It's less efficient to produce 'ule' than 'ult' since we know we're going to
canonicalize to 'ult', but we shouldn't have duplicated code for these folds.
As a trade-off, this was a pretty terrible way to make a '2'. :)
if (LHSC == SubOne(RHSC))
AddC = ConstantExpr::getSub(AddOne(RHSC), LHSC);
The next steps are to share the code to fix PR32524 and add the missing 'and'
fold that was left out when PR14708 was fixed:
https://bugs.llvm.org/show_bug.cgi?id=14708
Brian Gesiak [Thu, 13 Apr 2017 16:44:25 +0000 (16:44 +0000)]
[Analysis] Support bitreverse in -demanded-bits pass
Summary:
* Add a bitreverse case in the demanded bits analysis pass.
* Add tests for the bitreverse (and bswap) intrinsic in the
demanded bits pass.
* Add a test case to the BDCE tests: that manipulations to
high-order bits are eliminated once the bits are reversed
and then right-shifted.
LTO: Pass SF_Executable flag through to InputFile::Symbol
Summary:
The linker needs to be able to determine whether a symbol is text or data to
handle the case of a common being overridden by a strong definition in an
archive. If the archive contains a text member of the same name as the common,
that function is discarded. However, if the archive contains a data member of
the same name, that strong definition overrides the common. This is a behavior
of ld.bfd, which the Qualcomm linker also supports in LTO.
Here's a test case to illustrate:
####
cat > 1.c << \!
int blah;
!
cat > 2.c << \!
int blah() {
return 0;
}
!
cat > 3.c << \!
int blah = 20;
!
clang -c 1.c
clang -c 2.c
clang -c 3.c
ar cr lib.a 2.o 3.o
ld 1.o lib.a -t
####
The correct output is:
1.o
(lib.a)3.o
Thanks to Shankar Easwaran and Hemant Kulkarni for the test case!