Dehao Chen [Mon, 3 Oct 2016 18:52:08 +0000 (18:52 +0000)]
Refactor LICM pass in preparation for LoopSink pass.
Summary: LoopSink pass uses some common function in LICM. This patch refactor the LICM code to make it usable by LoopSink pass (https://reviews.llvm.org/D22778).
Sanjay Patel [Mon, 3 Oct 2016 16:38:27 +0000 (16:38 +0000)]
[x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433
There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?
If the llvm. prefix is dropped other parts of llvm don't see this as
an intrinsic. This means that the number of regular symbols depends
on the context the module is loaded into, which causes LTO to abort.
Nirav Dave [Mon, 3 Oct 2016 13:48:27 +0000 (13:48 +0000)]
Prevent out of order HashDirective lexing in AsmLexer.
Retrying after buildbot reset.
To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.
Sjoerd Meijer [Mon, 3 Oct 2016 10:12:32 +0000 (10:12 +0000)]
[ARM] Code size optimisation to lower udiv+urem to udiv+mls instead of a
library call to __aeabi_uidivmod. This is an improved implementation of
r280808, see also D24133, that got reverted because isel was stuck in a loop.
That was caused by the optimisation incorrectly triggering on i64 ints, which
shouldn't happen because there is no 64bit hwdiv support; that put isel's type
legalization and this optimisation in a loop. A native ARM compiler and testing
now shows that this is fixed.
Alexey Bataev [Mon, 3 Oct 2016 07:47:01 +0000 (07:47 +0000)]
[CodeGen] Adding a test showing the current state of poor code gen of
search loop, by Andrey Tischenko
PR27136 shows failure to hoist constant out of loop. This test is used
as start point to fix the failure: it shows the current state of codegen
and discovers what should be fixed
Chris Bieneman [Mon, 3 Oct 2016 04:48:22 +0000 (04:48 +0000)]
[lit] Throw in unimplemented method (NFC)
Summary:
lit's `OneCommandFileTest` class implements an abstract method that
raises if called. However, it raises by referencing an undefined
symbol. Instead, raise explicitly by throwing a `NotImplementedError`.
This is clearer, and appeases Python linters.
Chris Bieneman [Mon, 3 Oct 2016 04:48:13 +0000 (04:48 +0000)]
[lit] Compare to None using identity, not equality
Summary:
In Python, `None` is a singleton, so checking whether a variable is
`None` may be done with `is` or `is not`. This has a slight advantage
over equiality comparisons `== None` and `!= None`, since `__eq__` may
be overridden in Python to produce sometimes unexpected results.
Using `is None` and `is not None` is also recommended practice in
https://www.python.org/dev/peps/pep-0008:
> Comparisons to singletons like `None` should always be done with `is` or
> `is not`, never the equality operators.
Hal Finkel [Mon, 3 Oct 2016 04:06:44 +0000 (04:06 +0000)]
[PowerPC] Account for the ELFv2 function prologue during branch selection
The PPC branch-selection pass, which performs branch relaxation, needs to
account for the padding that might be introduced to satisfy block alignment
requirements. We were assuming that the first block was at offset zero (i.e.
had the alignment of the function itself), but under the ELFv2 ABI, a global
entry function prologue is added to the first block, and it is a
two-instruction sequence (i.e. eight-bytes long). If the function has 16-byte
alignment, the fact that the first block is eight bytes offset from the start
of the function is relevant to calculating where padding will be added in
between later blocks.
Craig Topper [Mon, 3 Oct 2016 02:00:29 +0000 (02:00 +0000)]
[X86] Mark all sizes of (V)MOVUPD as trivially rematerializable.
I don't know for sure that we truly needs this, but its the only vector load that isn't rematerializable. Making it consistent allows it to not be a special case in the td files.
Sanjoy Das [Sun, 2 Oct 2016 20:59:05 +0000 (20:59 +0000)]
[ConstantRange] Make getEquivalentICmp smarter
This change teaches getEquivalentICmp to be smarter about generating
ICMP_NE and ICMP_EQ predicates.
An earlier version of this change was landed as rL283057 which had a
use-after-free bug. This new version has a fix for that bug, and a (C++
unittests/) test case that would have triggered it rL283057.
To allow broadcast loads of a non-zero'th vector element, lowerVectorShuffleAsBroadcast can replace a load with a new load with an adjusted address, but unfortunately we weren't ensuring that the new load respected the same dependencies.
This patch adds a TokenFactor and updates all dependencies of the old load to reference the new load instead.
Craig Topper [Sun, 2 Oct 2016 06:13:43 +0000 (06:13 +0000)]
[X86] Don't set i64 ADDC/ADDE/SUBC/SUBE as Custom if the target isn't 64-bit. This way we don't have to catch them and do nothing with them in ReplaceNodeResults.
Hal Finkel [Sun, 2 Oct 2016 02:10:20 +0000 (02:10 +0000)]
[PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.
Simon Pilgrim [Sat, 1 Oct 2016 14:26:11 +0000 (14:26 +0000)]
[X86][SSE] Enable commutation from MOVSD/MOVSS to BLENDPD/BLENDPS on SSE41+ targets
Instead of selecting between MOVSD/MOVSS and BLENDPD/BLENDPS at shuffle lowering by subtarget this will help us select the instruction based on actual commutation requirements.
We could possibly add BLENDPD/BLENDPS -> MOVSD/MOVSS commutation and MOVSD/MOVSS memory folding using a similar approach if it proves useful
I avoided adding AVX512 handling as I'm not sure when we should be making use of VBLENDPD/VBLENDPS on EVEX targets
Michal Gorny [Sat, 1 Oct 2016 13:15:56 +0000 (13:15 +0000)]
Revert r283029 - [cmake] Make LIT_COMMAND configurable and improve fallback support
Revert the change in r283029 (and the fixup in r283033) due to buildbot
breakage. The fixup is ineffective for the bots that do not force clean
build since the wrong value is already cached in CMakeCache.txt.
Reverting it should result in the cache variable being removed
and therefore it should be possible to re-introduce it after all
buildbots build this revision.
Michal Gorny [Sat, 1 Oct 2016 09:28:05 +0000 (09:28 +0000)]
[cmake] Make LIT_COMMAND configurable and improve fallback support
Make LIT_COMMAND configurable, use source tree only when actually
available and extend the default search to other common executable names
'lit.py' and 'lit', in order to increase uniformity between all LLVM
projects and support using installed lit.
Changing the conditional used to determine whether in-tree or external
lit is being used covers the case when LLVM_MAIN_SRC_DIR is defined but
does not exist (anymore). In this case, the functions falls back to
looking for installed lit rather than attempting to use a non-existing
path. The same conditional is used in clang already.
Making LIT_COMMAND a cache variable in case the source tree variant is
used serves two purposes. Firstly, it increases uniformity between
the two branches since find_program() implicitly makes LIT_COMMAND
a cache variable. Secondly, it allows overriding the lit executable used
to run the tests when the LLVM source tree is provided. Gentoo is
planning to use this to use installed (and byte-compiled) lit instead of
re-compiling it in every LLVM project.
Extending default search is meant to increase uniformity between
different LLVM projects. The 'lit.py' name is already used by a few of
them, and 'lit' is the name used by utils/lit/setup.py when installing.
Michal Gorny [Sat, 1 Oct 2016 09:26:23 +0000 (09:26 +0000)]
[OCaml] Install .mli (interface) files
Install the OCaml interface .mli files. Those files were most likely
omitted because they are input files for the compiled .cmi files.
However, installing them is reasonable since -- unlike .cmi files --
they are human-readable.
It got disconnected during the cmake conversion. For Miscompilation.cpp,
it was purely advisory for the user and the ToolRunner.cpp version was
trying to compensate for libs and bins in the same directory, which
hasn't been the case for a very long time.
Craig Topper [Sat, 1 Oct 2016 07:11:24 +0000 (07:11 +0000)]
[X86] Cleanup patterns for using VMOVDDUP for broadcasts.
-Remove OptForSize. Not all of the backend follows the same rules for creating broadcasts and there is no conflicting pattern.
-Don't stop selecting VEX VMOVDDUP when AVX512 is supported. We need VLX for EVEX VMOVDDUP.
-Only use VMOVDDUP for v2i64 broadcasts if AVX2 is not supported.
Nirav Dave [Sat, 1 Oct 2016 00:42:32 +0000 (00:42 +0000)]
[MC] Prevent out of order HashDirective lexing in AsmLexer.
To lex hash directives we peek ahead to find component tokens, create a
unified token, and unlex the peeked tokens so the parser does not need
to parse the tokens then. Make sure we do not to lex another hash
directive during peek operation.
Mehdi Amini [Sat, 1 Oct 2016 00:05:34 +0000 (00:05 +0000)]
[ASAN] Add the binder globals on Darwin to llvm.compiler.used to avoid LTO dead-stripping
The binder is in a specific section that "reverse" the edges in a
regular dead-stripping: the binder is live as long as a global it
references is live.
This is a big hammer that prevents LLVM from dead-stripping these,
while still allowing linker dead-stripping (with special knowledge
of the section).