Kyle Butt [Tue, 4 Oct 2016 00:00:09 +0000 (00:00 +0000)]
Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
[MSSA] Allow unittests to use BasicAA when building.
We now build MemorySSA in its ctor, instead of waiting until the user
calls MemorySSA::getWalker. This silently changed our unittests, since
we add BasicAA to AAResults *after* constructing MemorySSA (...but
before calling MemorySSA::getWalker).
None of them broke because we do most of our "did this get optimized
correctly?" tests in .ll files.
Chris Bieneman [Mon, 3 Oct 2016 22:12:42 +0000 (22:12 +0000)]
[lit] Use argparse instead of optparse
Summary:
optparse is deprecated in Python 2.7, which is the minimum version of
Python required to run the LLVM test suite. Replace its usage in lit
with argparse, optparse's 2.7 replacement module.
argparse has several benefits over optparse, but this commit does not
make use of those benefits yet. Instead, it simply uses the new API,
and attempts to keep the number of changes to a minimum.
Confirmed that lit's test suite, as well as LLVM's regression test suite,
still pass with these changes.
Matthias Braun [Mon, 3 Oct 2016 22:12:37 +0000 (22:12 +0000)]
TargetMachine: Make the win32-macho workaround more specific.
This is to avoid problems with win32 + ELF which surprisingly happens a
lot in practice: If a user just specifies -march on the commandline the
object format changes along with the architecture to ELF in many
instances while the OS stays with the default/host OS.
Matthias Braun [Mon, 3 Oct 2016 21:58:20 +0000 (21:58 +0000)]
Set some tests to an unknown vendor and OS
This avoids llc using the hosts OS/vendor as defaults and triggering
unwanted behaviour in the tests. This should deal with the buildbot
breakages on windows after r283140.
Each shadow only represents data flow that is restricted to its reaching
def. Propagating more than that could lead to spurious register liveness,
resulting in extra (incorrectly) block live-ins.
Matthias Braun [Mon, 3 Oct 2016 20:11:24 +0000 (20:11 +0000)]
X86: Do not produce GOT relocations on windows
Windows has no GOT relocations the way elf/darwin has. Some people use
x86_64-pc-win32-macho to build EFI firmware; Do not produce GOT
relocations for this target.
Dehao Chen [Mon, 3 Oct 2016 18:52:08 +0000 (18:52 +0000)]
Refactor LICM pass in preparation for LoopSink pass.
Summary: LoopSink pass uses some common function in LICM. This patch refactor the LICM code to make it usable by LoopSink pass (https://reviews.llvm.org/D22778).
Sanjay Patel [Mon, 3 Oct 2016 16:38:27 +0000 (16:38 +0000)]
[x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433
There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?
If the llvm. prefix is dropped other parts of llvm don't see this as
an intrinsic. This means that the number of regular symbols depends
on the context the module is loaded into, which causes LTO to abort.
Nirav Dave [Mon, 3 Oct 2016 13:48:27 +0000 (13:48 +0000)]
Prevent out of order HashDirective lexing in AsmLexer.
Retrying after buildbot reset.
To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.
Sjoerd Meijer [Mon, 3 Oct 2016 10:12:32 +0000 (10:12 +0000)]
[ARM] Code size optimisation to lower udiv+urem to udiv+mls instead of a
library call to __aeabi_uidivmod. This is an improved implementation of
r280808, see also D24133, that got reverted because isel was stuck in a loop.
That was caused by the optimisation incorrectly triggering on i64 ints, which
shouldn't happen because there is no 64bit hwdiv support; that put isel's type
legalization and this optimisation in a loop. A native ARM compiler and testing
now shows that this is fixed.
Alexey Bataev [Mon, 3 Oct 2016 07:47:01 +0000 (07:47 +0000)]
[CodeGen] Adding a test showing the current state of poor code gen of
search loop, by Andrey Tischenko
PR27136 shows failure to hoist constant out of loop. This test is used
as start point to fix the failure: it shows the current state of codegen
and discovers what should be fixed
Chris Bieneman [Mon, 3 Oct 2016 04:48:22 +0000 (04:48 +0000)]
[lit] Throw in unimplemented method (NFC)
Summary:
lit's `OneCommandFileTest` class implements an abstract method that
raises if called. However, it raises by referencing an undefined
symbol. Instead, raise explicitly by throwing a `NotImplementedError`.
This is clearer, and appeases Python linters.
Chris Bieneman [Mon, 3 Oct 2016 04:48:13 +0000 (04:48 +0000)]
[lit] Compare to None using identity, not equality
Summary:
In Python, `None` is a singleton, so checking whether a variable is
`None` may be done with `is` or `is not`. This has a slight advantage
over equiality comparisons `== None` and `!= None`, since `__eq__` may
be overridden in Python to produce sometimes unexpected results.
Using `is None` and `is not None` is also recommended practice in
https://www.python.org/dev/peps/pep-0008:
> Comparisons to singletons like `None` should always be done with `is` or
> `is not`, never the equality operators.
Hal Finkel [Mon, 3 Oct 2016 04:06:44 +0000 (04:06 +0000)]
[PowerPC] Account for the ELFv2 function prologue during branch selection
The PPC branch-selection pass, which performs branch relaxation, needs to
account for the padding that might be introduced to satisfy block alignment
requirements. We were assuming that the first block was at offset zero (i.e.
had the alignment of the function itself), but under the ELFv2 ABI, a global
entry function prologue is added to the first block, and it is a
two-instruction sequence (i.e. eight-bytes long). If the function has 16-byte
alignment, the fact that the first block is eight bytes offset from the start
of the function is relevant to calculating where padding will be added in
between later blocks.
Craig Topper [Mon, 3 Oct 2016 02:00:29 +0000 (02:00 +0000)]
[X86] Mark all sizes of (V)MOVUPD as trivially rematerializable.
I don't know for sure that we truly needs this, but its the only vector load that isn't rematerializable. Making it consistent allows it to not be a special case in the td files.
Sanjoy Das [Sun, 2 Oct 2016 20:59:05 +0000 (20:59 +0000)]
[ConstantRange] Make getEquivalentICmp smarter
This change teaches getEquivalentICmp to be smarter about generating
ICMP_NE and ICMP_EQ predicates.
An earlier version of this change was landed as rL283057 which had a
use-after-free bug. This new version has a fix for that bug, and a (C++
unittests/) test case that would have triggered it rL283057.
To allow broadcast loads of a non-zero'th vector element, lowerVectorShuffleAsBroadcast can replace a load with a new load with an adjusted address, but unfortunately we weren't ensuring that the new load respected the same dependencies.
This patch adds a TokenFactor and updates all dependencies of the old load to reference the new load instead.
Craig Topper [Sun, 2 Oct 2016 06:13:43 +0000 (06:13 +0000)]
[X86] Don't set i64 ADDC/ADDE/SUBC/SUBE as Custom if the target isn't 64-bit. This way we don't have to catch them and do nothing with them in ReplaceNodeResults.
Hal Finkel [Sun, 2 Oct 2016 02:10:20 +0000 (02:10 +0000)]
[PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.
Simon Pilgrim [Sat, 1 Oct 2016 14:26:11 +0000 (14:26 +0000)]
[X86][SSE] Enable commutation from MOVSD/MOVSS to BLENDPD/BLENDPS on SSE41+ targets
Instead of selecting between MOVSD/MOVSS and BLENDPD/BLENDPS at shuffle lowering by subtarget this will help us select the instruction based on actual commutation requirements.
We could possibly add BLENDPD/BLENDPS -> MOVSD/MOVSS commutation and MOVSD/MOVSS memory folding using a similar approach if it proves useful
I avoided adding AVX512 handling as I'm not sure when we should be making use of VBLENDPD/VBLENDPS on EVEX targets
Michal Gorny [Sat, 1 Oct 2016 13:15:56 +0000 (13:15 +0000)]
Revert r283029 - [cmake] Make LIT_COMMAND configurable and improve fallback support
Revert the change in r283029 (and the fixup in r283033) due to buildbot
breakage. The fixup is ineffective for the bots that do not force clean
build since the wrong value is already cached in CMakeCache.txt.
Reverting it should result in the cache variable being removed
and therefore it should be possible to re-introduce it after all
buildbots build this revision.
Michal Gorny [Sat, 1 Oct 2016 09:28:05 +0000 (09:28 +0000)]
[cmake] Make LIT_COMMAND configurable and improve fallback support
Make LIT_COMMAND configurable, use source tree only when actually
available and extend the default search to other common executable names
'lit.py' and 'lit', in order to increase uniformity between all LLVM
projects and support using installed lit.
Changing the conditional used to determine whether in-tree or external
lit is being used covers the case when LLVM_MAIN_SRC_DIR is defined but
does not exist (anymore). In this case, the functions falls back to
looking for installed lit rather than attempting to use a non-existing
path. The same conditional is used in clang already.
Making LIT_COMMAND a cache variable in case the source tree variant is
used serves two purposes. Firstly, it increases uniformity between
the two branches since find_program() implicitly makes LIT_COMMAND
a cache variable. Secondly, it allows overriding the lit executable used
to run the tests when the LLVM source tree is provided. Gentoo is
planning to use this to use installed (and byte-compiled) lit instead of
re-compiling it in every LLVM project.
Extending default search is meant to increase uniformity between
different LLVM projects. The 'lit.py' name is already used by a few of
them, and 'lit' is the name used by utils/lit/setup.py when installing.
Michal Gorny [Sat, 1 Oct 2016 09:26:23 +0000 (09:26 +0000)]
[OCaml] Install .mli (interface) files
Install the OCaml interface .mli files. Those files were most likely
omitted because they are input files for the compiled .cmi files.
However, installing them is reasonable since -- unlike .cmi files --
they are human-readable.
It got disconnected during the cmake conversion. For Miscompilation.cpp,
it was purely advisory for the user and the ToolRunner.cpp version was
trying to compensate for libs and bins in the same directory, which
hasn't been the case for a very long time.
Craig Topper [Sat, 1 Oct 2016 07:11:24 +0000 (07:11 +0000)]
[X86] Cleanup patterns for using VMOVDDUP for broadcasts.
-Remove OptForSize. Not all of the backend follows the same rules for creating broadcasts and there is no conflicting pattern.
-Don't stop selecting VEX VMOVDDUP when AVX512 is supported. We need VLX for EVEX VMOVDDUP.
-Only use VMOVDDUP for v2i64 broadcasts if AVX2 is not supported.