Dehao Chen [Thu, 19 Nov 2015 19:53:05 +0000 (19:53 +0000)]
Reimplement discriminator assignment algorithm.
Summary: The new algorithm is more efficient (O(n), where n is the number of basic blocks), and it is guaranteed to cover all cases of multiple basic blocks mapped to the same line.
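A minimal sketch of the single-pass idea (the helper names here are hypothetical, not the patch's actual code):
  // One pass over all basic blocks, O(n): the first block seen for a
  // source line keeps discriminator 0; each later block mapped to the
  // same line gets the next unused discriminator for that line.
  llvm::DenseMap<unsigned, unsigned> BlocksSeenForLine; // line -> count
  for (llvm::BasicBlock &BB : F)
    if (unsigned Line = firstLineOf(BB))        // hypothetical helper
      if (unsigned N = BlocksSeenForLine[Line]++)
        assignDiscriminator(BB, Line, N);       // hypothetical helper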
Jun Bum Lim [Thu, 19 Nov 2015 18:41:27 +0000 (18:41 +0000)]
[AArch64] Refactoring aarch64-ldst-opt. NFC.
Summary:
* Rename isSmallTypeLdMerge() to isNarrowLoad().
* Rename NumSmallTypeMerged to NumNarrowTypePromoted.
* Use Subtarget defined as a member variable.
James Molloy [Thu, 19 Nov 2015 18:04:33 +0000 (18:04 +0000)]
[GlobalOpt] Localize some globals that have non-instruction users
We currently bail out of global localization if the global has non-instruction users. However, often these can be simple bitcasts or constant-GEPs, which we can easily turn into instructions before localizing. Be a bit more aggressive.
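Roughly, a constant-expression user can be materialized as an instruction first; a hedged sketch of the idea (UserInst names the instruction using the constant expression; this is not the patch's exact code):
  // Rewrite a ConstantExpr user (e.g. a bitcast or constant GEP) as an
  // equivalent instruction, so the global is left with only instruction
  // users and can be localized.
  if (auto *CE = llvm::dyn_cast<llvm::ConstantExpr>(U)) {
    llvm::Instruction *NewInst = CE->getAsInstruction();
    NewInst->insertBefore(UserInst);
    UserInst->replaceUsesOfWith(CE, NewInst);
  }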
Jun Bum Lim [Thu, 19 Nov 2015 17:21:41 +0000 (17:21 +0000)]
[AArch64]Extend merging narrow loads into a wider load
This change extends r251438 to handle more narrow load promotions
including byte type, unscaled, and signed. For example, this change will
convert:
  ldursh w1, [x0, #-2]
  ldurh  w2, [x0, #-4]
into
  ldur w2, [x0, #-4]
  asr  w1, w2, #16
  and  w2, w2, #0xffff
Sanjay Patel [Thu, 19 Nov 2015 16:37:10 +0000 (16:37 +0000)]
[CGP] despeculate expensive cttz/ctlz intrinsics
This is another step towards allowing SimplifyCFG to speculate harder, but then have
CGP clean things up if the target doesn't like it.
Previous patches in this series:
http://reviews.llvm.org/D12882
http://reviews.llvm.org/D13297
D13297 should catch most expensive ops, but speculation of cttz/ctlz requires special
handling because of weirdness in the intrinsic definition for handling a zero input
(that definition can probably be blamed on x86).
For example, if we have the usual speculated-by-select expensive op pattern, it looks roughly like this (a representative C sketch, not the patch's own example):
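  // cttz is speculated: it executes unconditionally, even for X == 0,
  // where its result is undefined; the select supplies that answer.
  unsigned Count = __builtin_ctz(X);
  unsigned Result = (X == 0) ? 32 : Count;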
Despeculating the op back into explicit control flow unfortunately may lead to
poorer codegen (see the changes in the existing x86 test), but if we increase
speculation in SimplifyCFG (the next step in this patch series), then we
should avoid those kinds of cases in the first place.
The need for this patch was originally mentioned here:
http://reviews.llvm.org/D7506
with follow-up here:
http://reviews.llvm.org/D7554
Diego Novillo [Thu, 19 Nov 2015 15:33:08 +0000 (15:33 +0000)]
SamplePGO - Sort samples by source location when emitting as text.
When dumping function samples or writing them out as text format, it
helps if the samples are emitted sorted by source location. The sorting
of the maps is a bit slow, so we only do it on demand.
AVX-512: Fixed COPY_TO_REGCLASS for mask registers
Copying one mask register to another under BW should be done with the kmovq instruction, otherwise we can lose some bits.
Copying 8 bits under DQ may be done with kmovb.
Simon Pilgrim [Thu, 19 Nov 2015 12:18:37 +0000 (12:18 +0000)]
[X86][AVX] Fix lowering of X86ISD::VZEXT_MOVL for 128-bit -> 256-bit extension
The lowering patterns for X86ISD::VZEXT_MOVL for 128-bit to 256-bit vectors were just copying the lower xmm instead of actually masking off the first scalar using a blend.
Alexey Bataev [Thu, 19 Nov 2015 11:44:35 +0000 (11:44 +0000)]
Alternative to long nops for X86 CPUs, by Andrey Turetsky
Make X86AsmBackend generate smarter nops, instead of a sequence of 0x90 bytes, for code alignment on CPUs which don't support long nop instructions.
Differential Revision: http://reviews.llvm.org/D14178
Dan Liew [Thu, 19 Nov 2015 11:35:42 +0000 (11:35 +0000)]
[lit] Fix bug when using Python3 where a failing test would not show
the script when running a ShTest with an external or internal shell.
This bug is caused by use of the ``map`` function in Python 3, which
returns an iterator (rather than a list as in Python 2). After the
iterator is exhausted it won't return any more output, and consequently
when ``_runShTest()`` tries to access the ``script``, which has already
been iterated over, it is empty. Converting to a list immediately after
calling ``map()`` fixes this.
This fixes the ``tests/shtest-format.py`` test when running under
Python3 which was previously failing.
James Molloy [Thu, 19 Nov 2015 08:49:57 +0000 (08:49 +0000)]
[FunctionAttrs] Provide a mechanism for adding function attributes from the command line
This provides a way to force a function to have certain attributes from the command line. This can be useful when debugging or doing workload exploration, where manually editing IR is tedious or not possible (due to build systems, etc.).
The syntax is -force-attribute=function_name:attribute_name (for example, -force-attribute=foo:noinline).
All function attributes are parsed except alignstack, as it requires an argument.
Pointers in Masked Load, Store, Gather, Scatter intrinsics
The masked intrinsics support all integer and floating point data types. I added the pointer type to this list.
Added tests for CodeGen and for the Loop Vectorizer.
Updated the Language Reference.
Pete Cooper [Thu, 19 Nov 2015 05:56:52 +0000 (05:56 +0000)]
Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.
This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787
Weiming Zhao [Thu, 19 Nov 2015 02:45:18 +0000 (02:45 +0000)]
Fix bug 25440: GVN assertion after coercing loads
Optimizations like LoadPRE in GVN will insert new instructions.
If the insertion point is in an already-processed BB, they should
get a value number explicitly; if the insertion point is after the
current instruction, it can just be left alone. However, the current
GVN framework has no support for either.
In this patch, we just bail out if a VN can't be found.
Reid Kleckner [Thu, 19 Nov 2015 00:05:21 +0000 (00:05 +0000)]
Don't search for third party libraries while using MSan
On the average user's system, those libraries will not be compiled with
MSan. Prior to this change, the LLVM test suite was full of false
positives from calls from third party libraries to MSan interceptors
like strlen.
We can remove this check if MSan ever grows a suppression mechanism
similar to TSan's.
Mehdi Amini [Wed, 18 Nov 2015 22:49:49 +0000 (22:49 +0000)]
Fix returned value for GVN: could return "false" even after modifying the IR
This bug would manifest in some very specific cases where all the following
conditions are fulfilled:
- GVN didn't remove any block
- The regular GVN iteration didn't change the IR
- PRE is enabled
- PRE will not split critical edge
- The last instruction processed by PRE didn't change the IR
Because the CallGraph PassManager relies on this returned value to decide
if it needs to recompute a node after the execution of Function passes,
not returning the right value can lead to unexpected results.
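A rough sketch of the shape of the fix (helper names are hypothetical; this is not the actual diff): the result of PRE must feed the value the pass returns.
  bool Changed = runGVNIterations(F); // hypothetical helper
  if (EnablePRE)
    Changed |= performPRE(F);         // this result must not be dropped
  return Changed;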
Chris Bieneman [Wed, 18 Nov 2015 22:49:26 +0000 (22:49 +0000)]
[CMake] Support -fvisibility-inlines-hidden when LLVM_ENABLE_PIC=Off
I'm unaware of any reasons why -fvisibility-inlines-hidden would depend on PIC, and since autoconf supports this flag without PIC, we should support it in CMake too.
Pete Cooper [Wed, 18 Nov 2015 22:17:24 +0000 (22:17 +0000)]
Change memcpy/memset/memmove to have dest and source alignments.
Note: this was reviewed (and more details are available) at http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
These intrinsics currently have an explicit alignment argument which is
required to be a constant integer. It represents the alignment of the
source and dest, and so must be the minimum of those.
This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments. The alignment
argument itself is removed.
There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe. For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)
For out-of-tree owners, I was able to strip alignment from calls using sed by replacing:
(call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
$1i1 false)
and similarly for memmove and memcpy.
I then added back in alignment to test cases which needed it.
A similar commit will be made to clang, which actually has many differences
in alignment now that IRBuilder can generate different source/dest alignments on calls.
In IRBuilder itself, a new argument was added. Instead of calling:
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)
There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool. This is to prevent isVolatile here from passing its default
parameter to the source alignment.
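A sketch of such a guard class (illustrative; the actual class may differ in detail):
  // Accepts integer alignments but rejects bool, so a stray `false`
  // meant for isVolatile cannot silently convert to a source alignment.
  struct IntegerAlignment {
    uint64_t Align;
    IntegerAlignment(uint64_t Align) : Align(Align) {}
    IntegerAlignment(bool) = delete;
  };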
Note: changes can now be made to codegen in the future. I didn't change anything here, but this
change should enable better memcpy code sequences.
Simon Pilgrim [Wed, 18 Nov 2015 21:17:19 +0000 (21:17 +0000)]
[DAGCombiner] Vector constant folding for comparisons
This patch adds support for vector constant folding of integer/float comparisons.
This requires FoldConstantVectorArithmetic to support scalar constant operands (in this case ISD::CONDCODE). In future we should be able to support other scalar constant types as necessary (and possibly start calling FoldConstantVectorArithmetic for all node creations).
Tim Northover [Wed, 18 Nov 2015 21:10:39 +0000 (21:10 +0000)]
ARM: make sure backend is consistent about exception handling method.
It turns out we decide whether to use SjLj exceptions or some alternative in
two separate places in the backend, and they disagreed with each other. This
led to inconsistent code and is generally a terrible idea.
So make them consistent and add an assert that they *do* match (unfortunately
MCAsmInfo isn't available in opt, so it can't be used to initialise the CodeGen
version directly).
Summary:
This change adds MathExtras helper functions for handling unsigned, saturating addition and multiplication. It also updates the instrumentation and sample profile merge implementations to use them.
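A minimal sketch of the saturating-add helper for unsigned types (illustrative; the real helpers live in llvm/Support/MathExtras.h):
  #include <limits>

  // Add two unsigned values, clamping at the maximum instead of wrapping.
  template <typename T>
  T SaturatingAdd(T X, T Y) {
    T Z = X + Y;
    // Unsigned overflow wraps, making the sum smaller than an operand.
    return Z < X ? std::numeric_limits<T>::max() : Z;
  }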
Betul Buyukkurt [Wed, 18 Nov 2015 18:14:55 +0000 (18:14 +0000)]
[PGO] Value profiling support
This change introduces an instrumentation intrinsic instruction for
value profiling purposes, the lowering of the instrumentation intrinsic
and raw reader updates. The raw profile data files for llvm-profdata
testing are updated.
Matthew Simpson [Wed, 18 Nov 2015 18:03:06 +0000 (18:03 +0000)]
[AArch64] Add cost for missing extensions.
This patch adds a cost estimate for some missing sign and zero extensions. The
costs were determined by counting the number of shift instructions generated
without context for each new extension.
Dan Gohman [Wed, 18 Nov 2015 16:12:01 +0000 (16:12 +0000)]
[WebAssembly] Enable register coloring and register stackifying.
This also takes the push/pop syntax another step forward, introducing stack
slot numbers to make it easier to see how expressions are connected. For
example, the value pushed in $push7 is popped in $pop7.
And, this begins an experiment with making get_local and set_local implicit
when an operation directly uses or defines a register. This greatly reduces
clutter. If this experiment succeeds, it may make sense to do this for
const instructions as well.
And, this introduces more special code for ARGUMENTS; hopefully this code
will soon be obviated by proper support for live-in virtual registers.
Jonas Paulsson [Wed, 18 Nov 2015 14:59:00 +0000 (14:59 +0000)]
[SelectionDAGBuilder] Make sure DemoteReg ends up in right reg-class.
The virtual register containing the address of a value returned on the
stack should be represented in the DAG with a CopyFromReg node and not
a Register node. Otherwise, InstrEmitter will not make sure that it
ends up in the right register class for the target instruction.
SystemZ needs this, because the reg class for address registers is a
subset of the general 64-bit register class.
test/SystemZ/CodeGen/args-07.ll and args-04.ll updated to run with
-verify-machineinstrs.
James Molloy [Wed, 18 Nov 2015 12:08:24 +0000 (12:08 +0000)]
[LTO] Appease buildbots take 3
This time I've found a linux box and checked it there. This test now passes.
Because I'd introduced an undefined reference in @bar, gold now returns an error. This doesn't matter for the test itself, because gold still emits the remarks the test is checking for. But it does cause lit to notice a nonzero return code, which it faults on.
Rafael Espindola [Wed, 18 Nov 2015 06:52:18 +0000 (06:52 +0000)]
Default SetVector to use a DenseSet.
We used to have an odd difference between MapVector and SetVector. The map
used a DenseMap, but the set used a SmallSet, which in turn uses a
std::set.
I have changed SetVector to use a DenseSet. If you were depending on the
old behaviour you can pass an explicit set type or use SmallSetVector (see the sketch after the list below).
The common cases for needing to do it are:
* Optimizing for small sets.
* Sets for types not supported by DenseSet.
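For illustration (a sketch, not from the commit):
  #include "llvm/ADT/SetVector.h"
  #include <set>
  #include <string>
  #include <vector>

  llvm::SetVector<int> Defaulted;     // deduplicates via DenseSet now
  llvm::SmallSetVector<int, 8> Small; // optimizing for small sets
  llvm::SetVector<std::string,        // element type with no DenseMapInfo:
                  std::vector<std::string>,
                  std::set<std::string>> Explicit; // explicit set type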
Sanjoy Das [Wed, 18 Nov 2015 06:23:38 +0000 (06:23 +0000)]
Teach the inliner to track deoptimization state
Summary:
This change teaches LLVM's inliner to track and suitably adjust
deoptimization state (tracked via deoptimization operand bundles) as it
inlines through call sites. The operation is described in more detail
in the LangRef changes.
Rafael Espindola [Wed, 18 Nov 2015 06:02:15 +0000 (06:02 +0000)]
Stop producing .data.rel sections.
If a section is rw, it is irrelevant if the dynamic linker will write to
it or not.
It looks like llvm implemented this because gcc was doing it. It looks
like gcc implemented this in the hope that it would put all the
relocated items close together and speed up the dynamic linker.
There are two problems with this:
* It doesn't work. Both bfd and gold will map .data.rel to .data and
concatenate the input sections in the order they are seen.
* If we want a feature like that, it can be implemented directly in the
linker, since it knows where the dynamic relocations are.
David Majnemer [Wed, 18 Nov 2015 02:49:19 +0000 (02:49 +0000)]
[llvm-objdump] Use the COFF export table for additional symbols
Most linked executables do not have a symbol table in COFF.
However, it is pretty typical to have some export entries. Use those
entries to inform the disassembler about potential function definitions
and call targets.
Cong Hou [Wed, 18 Nov 2015 01:20:37 +0000 (01:20 +0000)]
Let += and -= operators in BranchProbability have saturation behaviors.
This commit is needed by a later patch that depends on it. The sum of two
branch probabilities can be greater than 1 due to rounding, so it is safer
to saturate the results of addition and subtraction.
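A hedged sketch of the saturating += in the fixed-point view, assuming a numerator N over a denominator D (member names are illustrative, not the exact code):
  BranchProbability &operator+=(BranchProbability RHS) {
    // Compute in 64 bits so the check cannot itself overflow, and clamp
    // at "certain" (N == D) instead of wrapping past it.
    uint64_t Sum = uint64_t(N) + RHS.N;
    N = Sum >= D ? D : uint32_t(Sum);
    return *this;
  }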
Cong Hou [Wed, 18 Nov 2015 00:52:52 +0000 (00:52 +0000)]
Improving edge probabilities computation when choosing the best successor in machine block placement.
When looking for the best successor from the outer loop for a block
belonging to an inner loop, the edge probability computation can be
improved so that edges in the inner loop are ignored. For example,
suppose we are building chains for the non-loop part of the following
code, and looking for B1's best successor. Assume the if's true body is
very hot; then B3 should be the best candidate. However, because of the
existence of the back edge from B1 to B0, the probability from B1 to B3
can be very small, preventing B3 from being chosen as its successor. In
this patch, when computing the probability of the edge from B1 to B3, the
weight on the back edge B1->B0 is ignored, so that B1->B3 will have 100%
probability (a sketch of the adjustment follows the example below).
if (...)
  do {
    B0;
    ... // some branches
    B1;
  } while(...);
else
  B2;
B3;
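A hedged sketch of the adjusted computation (helper names are hypothetical):
  // When choosing B1's successor for the outer chain, drop the weight
  // of the inner-loop back edge B1->B0 before normalizing.
  uint64_t Total = 0;
  for (MachineBasicBlock *Succ : B1->successors())
    if (!isInnerLoopBackEdge(B1, Succ)) // hypothetical helper
      Total += getEdgeWeight(B1, Succ); // hypothetical helper
  // Probability(B1->B3) = weight(B1,B3) / Total, i.e. 100% here.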
David Blaikie [Wed, 18 Nov 2015 00:34:10 +0000 (00:34 +0000)]
Generalize ownership/passing semantics to allow dsymutil to own abbreviations via unique_ptr
While still allowing CodeGen/AsmPrinter in llvm to own them using a bump
ptr allocator. (might be nice to replace the pointers there with
something that at least automatically calls their dtors, if that's
necessary/useful, rather than having it done explicitly (I think a typed
BumpPtrAllocator already does this, or maybe a unique_ptr with a custom
deleter, etc))
The logic for handling the pattern without a shift is identical
to the logic for handling the pattern with a shift if you set
the shift amount to zero for the former.
This should make it easier to see that we probably don't even need
optimizeIntToFloatBitCast().
If we call something like foldVecTruncToExtElt() from visitTrunc(),
we'll solve PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543
[llvm-profdata] Show hint for other mismatch errors when merging instr profdata
Missed bit of feedback from D14720.
Show the same "Make sure that all profile
data to be merged is generated from the same binary." hint for hash mismatch
and value site count mismatch as we now do for counter mismatch when merging
incompatible instrumentation profile data.