clang enables vectorization at optimization levels > 1 and size level < 2. opt
should behave similarly.
Loop vectorization and SLP vectorization can be disabled with the flags
-disable-(loop/slp)-vectorization.
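The rule stated above can be written as a small predicate. This is only a sketch in Python; the function name `vectorization_enabled` is hypothetical, and the mapping of -Os to size level 1 and -Oz to size level 2 (both at optimization level 2) is an assumption stated here for the example.

```python
def vectorization_enabled(opt_level: int, size_level: int) -> bool:
    """Vectorize only at optimization levels > 1 and size levels < 2."""
    return opt_level > 1 and size_level < 2


# (opt_level, size_level) pairs; the flag-to-level mapping is an assumption.
levels = {"-O1": (1, 0), "-O2": (2, 0), "-O3": (3, 0), "-Os": (2, 1), "-Oz": (2, 2)}
for flag, (opt, size) in levels.items():
    print(flag, vectorization_enabled(opt, size))
```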
------------------------------------------------------------------------
SLPVectorizer: An in-tree vectorized entry cannot also be a scalar external use
We were creating external uses for scalar values in MustGather entries that also
had a ScalarToTreeEntry (they also are present in a vectorized tuple). This
meant we would keep a value 'alive' both as a scalar and as a vector, causing havoc.
This is not necessary because when we create a MustGather vector we explicitly
create external uses entries for the insertelement instructions of the
MustGather vector elements.
[SystemZ] Fix choice of known-zero mask in insertion optimization
The backend converts 64-bit ORs into subreg moves if the upper 32 bits
of one operand and the low 32 bits of the other are known to be zero.
It then tries to peel away redundant ANDs from the upper 32 bits.
Since AND masks are canonicalized to exclude known-zero bits,
the test ORs the mask and the known-zero bits together before
checking for redundancy. The problem was that it was using the
wrong node when checking for known-zero bits, so could drop ANDs
that were still needed.
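The redundancy test described above can be sketched as follows. This is an illustration in Python, not the backend's code; the function name `and_is_redundant`, the 32-bit width and the sample masks are all assumptions made for the example.

```python
def and_is_redundant(mask: int, known_zero: int, width: int = 32) -> bool:
    """Return True if `value & mask` cannot change `value`.

    `known_zero` has a 1 in every bit position where `value` is known to be 0.
    Canonicalized masks may omit those bits, so they are ORed back in first.
    """
    all_ones = (1 << width) - 1
    return ((mask | known_zero) & all_ones) == all_ones


mask = 0x000000FF
# Known-zero bits of the AND's own input (hypothetical values).
right_node_known_zero = 0xFFFF0000
# Known-zero bits taken from some other node (hypothetical values).
wrong_node_known_zero = 0xFFFFFF00

print(and_is_redundant(mask, right_node_known_zero))  # the AND is still needed
print(and_is_redundant(mask, wrong_node_known_zero))  # the AND wrongly looks redundant
```

Querying the wrong node for known-zero bits is enough to make a required AND look removable, which is the failure the message describes.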
[AArch64] Implemented vcopy_lane patterns using scalar DUP instruction.
Patch by Ana Pazos!
------------------------------------------------------------------------
Bill Wendling [Mon, 2 Dec 2013 19:21:10 +0000 (19:21 +0000)]
Merging r196069:
------------------------------------------------------------------------
r196069 | alp | 2013-12-01 23:15:33 -0800 (Sun, 01 Dec 2013) | 6 lines
Update the LTO GoldPlugin documentation
* Update build instructions to reflect the current source tree layout.
* Don't inflict CVS on readers; there's a perfectly good git mirror.
* configure with --disable-werror making it possible to build using clang.
* ar and nm-new now support the -plugin option.
------------------------------------------------------------------------
Bill Wendling [Mon, 2 Dec 2013 19:20:45 +0000 (19:20 +0000)]
Merging r196100:
------------------------------------------------------------------------
r196100 | alp | 2013-12-02 06:17:47 -0800 (Mon, 02 Dec 2013) | 4 lines
Cut the gold plugin README down to size
This file hasn't been updated in years. Remove old information and point to
the current documentation at GoldPlugin.rst.
------------------------------------------------------------------------
Bill Wendling [Mon, 2 Dec 2013 19:14:12 +0000 (19:14 +0000)]
Merging r196129:
------------------------------------------------------------------------
r196129 | kkhoo | 2013-12-02 10:43:59 -0800 (Mon, 02 Dec 2013) | 1 line
Conservative fix for PR17827 - don't optimize a shift + and + compare sequence where the shift is logical unless the comparison is unsigned
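A small sketch of why signedness matters for a fold of this shape. The 8-bit width, the constants and the function names are assumptions chosen for illustration; this is not the exact pattern from the PR. Folding the logical shift into the mask moves the tested bit into the sign position, so a signed comparison changes meaning while an unsigned one does not.

```python
WIDTH = 8
ALL_ONES = (1 << WIDTH) - 1


def to_signed(value: int) -> int:
    """Interpret the low WIDTH bits of `value` as a two's-complement integer."""
    value &= ALL_ONES
    return value - (1 << WIDTH) if value & (1 << (WIDTH - 1)) else value


def before_fold(x: int) -> int:
    """(x lshr 1) & 0x40: tests bit 7 of x, result lands in bit 6."""
    return ((x & ALL_ONES) >> 1) & 0x40


def after_fold(x: int) -> int:
    """x & 0x80: the shift has been folded into the mask."""
    return x & 0x80


x = 0x80
print(before_fold(x) > 0, after_fold(x) > 0)                        # unsigned compare
print(to_signed(before_fold(x)) > 0, to_signed(after_fold(x)) > 0)  # signed compare
```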
------------------------------------------------------------------------
Previously, we clobbered callee-saved registers when folding an "add
sp, #N" into a "pop {rD, ...}" instruction. This change checks whether
a register we're going to add to the "pop" could actually be live
outside the function before doing so and should fix the issue.
This should fix PR18081.
------------------------------------------------------------------------
Review of this commit by Matheus Almeida revealed that it is still possible to
emit invalid code (when the offset is not a multiple of the element size).
However, we agreed that this commit still represents an improvement since it
fixes many cases that previously emitted invalid code, and does not cause any
cases that previously emitted valid code to emit invalid code.
Daniel Sanders [Sun, 1 Dec 2013 10:45:26 +0000 (10:45 +0000)]
Merged from r195975 and r195976.
------------------------------------------------------------------------
r195975 | zjovanovic | 2013-11-30 19:12:28 +0000 (Sat, 30 Nov 2013) | 1 line
Fixed issue with microMIPS long branch.
------------------------------------------------------------------------
r195976 | zjovanovic | 2013-11-30 19:13:15 +0000 (Sat, 30 Nov 2013) | 1 line
Test case for issue with microMIPS long branch.
------------------------------------------------------------------------
To expand on those commit messages:
The immediate in a MIPS branch is multiplied by the instruction size before use
as an offset. For many MIPS ISAs this is 4 bytes, but for microMIPS it is 2
bytes. This commit corrects the scale factor used for microMIPS so that attempts
to use large offsets result in a valid sequence of instructions.
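The scaling described above amounts to the following arithmetic. This is a minimal sketch; the function name `branch_byte_offset` and its `micromips` parameter are hypothetical and do not correspond to the backend's code.

```python
def branch_byte_offset(immediate: int, micromips: bool) -> int:
    """Scale a branch immediate by the instruction size to get a byte offset."""
    scale = 2 if micromips else 4  # microMIPS: 2 bytes; many other MIPS ISAs: 4 bytes
    return immediate * scale


# The same immediate reaches half as far in microMIPS, so the range check
# that decides when a long-branch sequence is needed must use the right scale.
print(branch_byte_offset(100, micromips=False))
print(branch_byte_offset(100, micromips=True))
```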
Bill Wendling [Sun, 1 Dec 2013 04:40:32 +0000 (04:40 +0000)]
--- Reverse-merging r195823 into '.':
U lib/MC/MCSectionCOFF.cpp
U lib/CodeGen/TargetLoweringObjectFileImpl.cpp
U test/MC/COFF/weak-symbol.ll
U test/MC/COFF/tricky-names.ll
G .
--- Recording mergeinfo for reverse merge of r195823 into '.':
G .
AArch64: The pattern match should check the range of the immediate value.
Otherwise we can generate illegal instructions,
e.g. shrn2 v0.4s, v1.2d, #35. The legal range should be in [1, 16].
ARM integrated assembler generates incorrect nop opcode
This patch fixes a bug in the assembler that was causing bad code to
be emitted. When switching modes in an assembly file (e.g. arm to
thumb mode) we would always emit the opcode from the original mode.
This shows that we have actually emitted an arm nop (e320f000)
instead of a thumb nop. Unfortunately, this encodes to a thumb
branch which causes bad things to happen when compiling assembly
code with align directives.
The fix is to notify the ARMAsmBackend when we switch mode. The
MCMachOStreamer was already doing this correctly. This patch makes
the same change for the MCElfStreamer.
There is still a bug in the way nops are emitted for alignment
because the MCAlignment fragment does not store the correct mode.
The ARMAsmBackend will emit nops for the last mode it knew about. In
the example above, we still generate an arm nop if we add a `.code
32` to the end of the file.
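The effect of notifying the backend on a mode switch can be modelled roughly as below. This is a toy sketch, not the MC layer: the class and function names are hypothetical, the ARM nop value is the one quoted in the message, and the Thumb nop value 0xBF00 is an assumption (the Thumb-2 NOP hint).

```python
ARM_NOP = 0xE320F000   # ARM nop, as quoted in the commit message
THUMB_NOP = 0xBF00     # assumption: Thumb-2 NOP hint encoding


class NopBackend:
    """Toy stand-in for a backend that remembers the current mode."""

    def __init__(self) -> None:
        self.is_thumb = False

    def set_is_thumb(self, is_thumb: bool) -> None:
        self.is_thumb = is_thumb

    def nop_opcode(self) -> int:
        return THUMB_NOP if self.is_thumb else ARM_NOP


def nops_for_alignment(directives, notify_backend: bool):
    """Return the nop chosen at each `.align`, with or without mode notification."""
    backend = NopBackend()
    emitted = []
    for directive in directives:
        if directive in (".thumb", ".arm"):
            if notify_backend:
                backend.set_is_thumb(directive == ".thumb")
        elif directive == ".align":
            emitted.append(backend.nop_opcode())
    return emitted


source = [".thumb", ".align"]
print([hex(op) for op in nops_for_alignment(source, notify_backend=False)])
print([hex(op) for op in nops_for_alignment(source, notify_backend=True)])
```

The sketch also shows the remaining limitation: the backend only knows the last mode it was told about, so a nop chosen after a later mode switch follows that later mode.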
[AArch64] Add support for NEON scalar floating-point to integer convert
instructions.
------------------------------------------------------------------------
LoopVectorizer: Truncate i64 trip counts of i32 phis if necessary
In signed arithmetic we could end up with an i64 trip count for an i32 phi.
Because it is signed arithmetic we know that this is only defined if the i32
does not wrap. It is therefore safe to truncate the i64 trip count to an i32
value.
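A small sketch of the truncation argument. The helper name `trunc_to_i32` is hypothetical and the values are made up: when the count fits in a signed 32-bit integer, truncation is lossless; it loses information only for values the no-wrap assumption already rules out.

```python
def trunc_to_i32(value: int) -> int:
    """Keep the low 32 bits of `value` and reinterpret them as a signed i32."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


fits = 1000             # a trip count representable in i32
too_big = (1 << 32) + 5  # would require the i32 induction variable to wrap

print(trunc_to_i32(fits) == fits)        # value preserved
print(trunc_to_i32(too_big) == too_big)  # value not preserved
```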
[OCaml] Embed rpath into stub libraries and native executables
This commit embeds a set of linker flags with hardcoded paths to
the LLVM shared library on --enable-shared builds into .cmxa files
and stub dynamic libraries. This solution closely follows existing
rules for rpath in the LLVM tools, which had to be modified because
of differences in toolchain.
Without this patch, OCaml tests as well as opam bindings broke,
as neither of those updates LD_LIBRARY_PATH to include
the $prefix/lib directory.
------------------------------------------------------------------------
Bill Wendling [Wed, 27 Nov 2013 19:42:48 +0000 (19:42 +0000)]
Merging r195782:
------------------------------------------------------------------------
r195782 | whitequark | 2013-11-26 12:40:34 -0800 (Tue, 26 Nov 2013) | 1 line
[OCaml] Embed the flags necessary for linking with libLLVM.so into .cmxa files
------------------------------------------------------------------------
Bill Wendling [Wed, 27 Nov 2013 06:44:18 +0000 (06:44 +0000)]
Merging r195148:
------------------------------------------------------------------------
r195148 | rafael | 2013-11-19 11:52:52 -0800 (Tue, 19 Nov 2013) | 15 lines
Support multiple COFF sections with the same name but different COMDAT.
This is the first step to fix pr17918.
It extends the .section directive a bit, inspired by what the ELF one looks
like. The problem with using linkonce is that given
.section foo
.linkonce....
.section foo
.linkonce
we would already have switched sections when getting to .linkonce. The cleanest
solution seems to be to add the comdat information in the .section itself.
------------------------------------------------------------------------
- Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG
lowering where we need to check whether x is a vector type (in-reg
type) of i8, i16 or i32; otherwise, that optimization is not valid.
PR1860 - We can't save a list of ExtractElement instructions to CSE because some of these instructions
may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE.
PR18060 - When we RAUW values with ExtractElement instructions in some cases
we generate PHI nodes with multiple entries from the same basic block but
with different values. Enabling CSE on ExtractElement instructions makes sure
that all of the RAUWed instructions are the same.
This issue concerns the treatment of pointers as integers.
We treat pointers as different if they reference different address spaces.
At the same time, we treat pointers as equal to integers (of machine address
width). This was a source of false positives. Consider this case on a 32-bit machine:
foo0 != foo1, while
foo1 == foo2 and foo0 == foo2.
As you can see, this breaks transitivity. That means the result depends on the
order in which functions appear in the module. The following order causes foo0
and foo1 to be merged: foo2, foo0, foo1
First foo0 will be merged with foo2 and foo0 will be erased. Then foo1 will be
merged with foo2.
Depending on the order, things we don't expect to be merged could be merged.
The fix:
Forbid treating any pointer as an integer, except for those that belong to address space 0.
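The transitivity problem and the fix can be modelled with a toy type comparison. Everything here is hypothetical: types are tuples such as `("ptr", addrspace)` and `("int", bits)`, and `t0`, `t1`, `t2` stand in for the types that distinguish foo0, foo1 and foo2. It is a sketch of the idea, not the pass's comparison code.

```python
POINTER_BITS = 32  # assumption: 32-bit machine, as in the message


def old_equivalent(a, b) -> bool:
    """Old rule: any pointer equals a pointer-width integer."""
    if a[0] == b[0]:
        return a[1] == b[1]  # same kind: compare address space or bit width
    ptr, integer = (a, b) if a[0] == "ptr" else (b, a)
    return integer[1] == POINTER_BITS


def normalize(t):
    """New rule: only address-space-0 pointers are treated as integers."""
    return ("int", POINTER_BITS) if t == ("ptr", 0) else t


def new_equivalent(a, b) -> bool:
    return normalize(a) == normalize(b)


t0, t1, t2 = ("ptr", 1), ("ptr", 2), ("int", 32)
print(old_equivalent(t0, t1), old_equivalent(t1, t2), old_equivalent(t0, t2))
print(new_equivalent(t0, t1), new_equivalent(t1, t2), new_equivalent(t0, t2))
```

Because the new rule compares normalized forms for equality, it is transitive by construction, so the outcome no longer depends on the order in which functions are visited.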
Fix bugs in AArch64 load/store of vector types and in bitcasts between i64 and vector types.
e.g. "%tmp = load <2 x i64>* %ptr" can't be selected.
"%tmp = bitcast i64 %in to <2 x i32>" can't be selected.
[mips][msa] Fix a corner case in performORCombine() when combining nodes into VSELECT.
Mask == ~InvMask asserts if the width of Mask and InvMask differ.
The combine isn't valid (with two exceptions, see below) if the widths differ
so test for this before testing Mask == ~InvMask.
In the specific cases of Mask=~0 and InvMask=0, as well as Mask=0 and
InvMask=~0, the combine is still valid. However, there are more appropriate
combines that could be used in these cases such as folding x & 0 to 0, or
x & ~0 to x.
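The order of the two tests can be sketched as below. Python integers carry no bit width, so the widths are passed explicitly; the function name `is_inverse_mask` and its parameters are hypothetical and only mirror the check described above.

```python
def is_inverse_mask(mask: int, mask_bits: int, inv_mask: int, inv_bits: int) -> bool:
    """Return True if `mask` is the bitwise complement of `inv_mask`.

    The widths are compared first, so the complement is never taken
    across values of different widths.
    """
    if mask_bits != inv_bits:
        return False
    all_ones = (1 << mask_bits) - 1
    return mask == (~inv_mask & all_ones)


print(is_inverse_mask(0x00FF, 16, 0xFF00, 16))  # same width, complementary
print(is_inverse_mask(0x00FF, 16, 0xFF00, 32))  # widths differ: rejected early
```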
[SystemZ] Fix incorrect use of RISBG for a zero-extended right shift
We would wrongly transform the testcase into the equivalent of an AND with 1.
The problem was that, when testing whether the shifted-in bits of the right
shift were significant, we used the width of the final zero-extended result
rather than the width of the shifted value.
Refactored the implementation of AArch64 NEON instruction ZIP, UZP
and TRN.
Fix a bug triggered by mixed use of vget_high_u8() and vuzp_u8().
------------------------------------------------------------------------
Bill Wendling [Tue, 26 Nov 2013 10:46:15 +0000 (10:46 +0000)]
Merging r195679:
------------------------------------------------------------------------
r195679 | rafael | 2013-11-25 12:15:14 -0800 (Mon, 25 Nov 2013) | 12 lines
Don't use nopl in cpus that don't support it.
Patch by Mikulas Patocka. I added the test. I checked that for cpu names that
gas knows about, it also doesn't generate nopl.
The modified cpus:
i686 - there are i686-class CPUs that don't have nopl: Via c3, Transmeta
Crusoe, Microsoft VirtualBox - see
https://bbs.archlinux.org/viewtopic.php?pid=775414
k6, k6-2, k6-3, winchip-c6, winchip2 - these are 586-class CPUs
via c3 c3-2 - see https://bugs.archlinux.org/task/19733 as a proof that
Via c3 and c3-Nehemiah don't have nopl
------------------------------------------------------------------------
Bill Wendling [Mon, 25 Nov 2013 18:34:26 +0000 (18:34 +0000)]
Merging r195379:
------------------------------------------------------------------------
r195379 | hans | 2013-11-21 14:47:21 -0800 (Thu, 21 Nov 2013) | 7 lines
CMake: Some changes to package version names:
- Allow overriding PACKAGE_VERSION from the command-line
- Use PACKAGE_VERSION to set CPACK_PACKAGE_VERSION (used by the Win installer)
- Don't include the version number in the CPack install dir or registry key.
Fixed tryFoldToZero() for vector types that need expansion.
Summary:
Moved the requirement for SelectionDAG::getConstant() to return legally
typed nodes slightly earlier. There were two optional DAGCombine passes
that were missed out and were required to produce type-legal DAGs.
Simplified a code-path in tryFoldToZero() to use SelectionDAG::getConstant().
This provides support for both promoted and expanded vector types whereas the
previous code only supported promoted vector types.
Fixes a "Type for zero vector elements is not legal" assertion detected by
an llvm-stress generated test.
Fixed a bug in disassembling AArch64 post-index load/store single element instructions.
i.e. echo "0x00 0x04 0x80 0x0d" | ../bin/llvm-mc -triple=aarch64 -mattr=+neon -disassemble
echo "0x00 0x00 0x80 0x0d" | ../bin/llvm-mc -triple=aarch64 -mattr=+neon -disassemble
will be disassembled into the same instruction st1 {v0b}[0], [x0], x0.
Migrate metadata information from scalar to vector instructions during
SLP vectorization. Based on the code in BBVectorizer.
Fixes PR17741.
Patch by Raul Silvera, reviewed by Hal and Nadav. Reformatted by my
driving of clang-format. =]
------------------------------------------------------------------------
Bill Wendling [Mon, 25 Nov 2013 05:23:29 +0000 (05:23 +0000)]
Merging r195479:
------------------------------------------------------------------------
r195479 | hans | 2013-11-22 10:25:43 -0800 (Fri, 22 Nov 2013) | 4 lines
VS integration: use the correct registry key after r195379
I changed the registry key in that commit, but forgot to update
the integration files. This change makes them use the same variable.
------------------------------------------------------------------------
StructurizeCFG: Fix verification failure with some loops.
If the beginning of the loop was also the entry block
of the function, branches were inserted to the entry block
which isn't allowed. If this occurs, create a new dummy
function entry block that branches to the start of the loop.
------------------------------------------------------------------------
Teach ISel not to optimize 'optnone' functions (revised).
Improvements over r195317:
- Set/restore EnableFastISel flag instead of just running FastISel within
SelectAllBasicBlocks; the flag is checked in various places, and
FastISel won't run properly if those places don't do the right thing.
- Test looks for normal ISel versus FastISel behavior, and not
something more subtle that doesn't work everywhere.
Bill Wendling [Mon, 25 Nov 2013 05:20:58 +0000 (05:20 +0000)]
Merging r195477:
------------------------------------------------------------------------
r195477 | rafael | 2013-11-22 09:58:12 -0800 (Fri, 22 Nov 2013) | 13 lines
Add a fixed version of r195470 back.
The fix is simply to use CurI instead of I when handling aliases to
avoid accessing an invalid iterator.
original message:
Convert linkonce* to weak* instead of strong.
Also refactor the logic into a helper function. This is an important improvement
on mingw where the linker complains about mixed weak and strong symbols.
Converting to weak ensures that the symbol is not dropped, but keeps it in a
comdat, making the linker happy.
------------------------------------------------------------------------
- When simplifying the mask generation for BLEND, check whether that mask is
also consumed by other non-BLEND insns. If true, skip that simplification.
Fix a Cygwin build failure caused by enum values starting with '_', which conflict with some platform macros.
This patch only renames variables, no functional change.