Matthew Simpson [Wed, 14 Sep 2016 14:47:40 +0000 (14:47 +0000)]
[LV] Process pointer IVs with PHINodes in collectLoopUniforms
This patch moves the processing of pointer induction variables in
collectLoopUniforms from the consecutive pointer phase of the analysis to the
phi node phase. Previously, if a pointer induction variable was used by both a
scalarized non-memory instruction as well as a vectorized memory instruction,
we would incorrectly identify the pointer as uniform. Pointer induction
variables should be treated the same as other phi nodes. That is, they are
uniform if all users of the induction variable and induction variable update
are uniform.
James Molloy [Wed, 14 Sep 2016 14:47:27 +0000 (14:47 +0000)]
[ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
MCInstrDesc: this fixes an issue setting/getting member Flags, which
is an uint64_t. However, getter function getFlags returned an unsigned,
and in function hasProperty (1 << MCFlag) was used instead of (1ULL << MCFlag).
Fix code-gen crash on Power9 for insert_vector_elt with variable index (PR30189)
This patch corresponds to review:
https://reviews.llvm.org/D24021
In the initial implementation of this instruction, I forgot to account for
variable indices. This patch fixes PR30189 and should probably be merged into
3.9.1 (I'll open a bug according to the new instructions).
Adding missing directive for Power9.
There is currently no codegen for Power9 that depends on the directive
so this is NFC for now but will be important in the future. This was
missed in r268950 so I'm adding it now.
Kuba Brecka [Wed, 14 Sep 2016 14:06:33 +0000 (14:06 +0000)]
[asan] Enable -asan-use-private-alias on Darwin/Mach-O, add test for ODR false positive with LTO (llvm part)
The '-asan-use-private-alias’ option (disabled by default) option is currently only enabled for Linux and ELF, but it also works on Darwin and Mach-O. This option also fixes a known problem with LTO on Darwin (https://github.com/google/sanitizers/issues/647). This patch enables the support for Darwin (but still keeps it off by default) and adds the LTO test case.
Wei Mi [Wed, 14 Sep 2016 04:39:50 +0000 (04:39 +0000)]
Create a getelementptr instead of sub expr for ValueOffsetPair if the
value is a pointer.
This patch is to fix PR30213. When expanding an expr based on ValueOffsetPair,
if the value is of pointer type, we can only create a getelementptr instead
of sub expr.
Ensure Polly linking works without BUILD_SHARED_LIBS
This change ensures all necessary symbols are resolved correctly. Before this
change on some systems, the linker may have eliminated some symbols not directly
used in bugpoint, but used in Polly.
Suggested-by: Michael Kruse <lvm@meinersbur.de>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281438 91177308-0d34-0410-b5e6-96231b3b80d8
[sanitizer-coverage] add yet another flavour of coverage instrumentation: trace-pc-guard. The intent is to eventually replace all of {bool coverage, 8bit-counters, trace-pc} with just this one. LLVM part
[ObjCARC] Traverse chain downwards to replace uses of argument passed to
ObjC library call with call return.
ARC contraction tries to replace uses of an argument passed to an
objective-c library call with the call return value. For example, in the
following IR, it replaces uses of argument %9 and uses of the values
discovered traversing the chain upwards (%7 and %8) with the call return
%10, if they are dominated by the call to @objc_autoreleaseReturnValue.
This transformation enables code-gen to tail-call the call to
@objc_autoreleaseReturnValue, which is necessary to enable auto release
return value optimization.
ARC contraction no longer does the optimization described above since it
only traverses the chain upwards and fails to recognize that the
function return can be replaced by the call return. This commit changes
ARC contraction to traverse the chain downwards too and replace uses of
bitcasts with the call return.
Summary: When expanding mul in type legalization make sure the type for shift amount can actually fit the value. This fixes PR30354 https://llvm.org/bugs/show_bug.cgi?id=30354.
[DAG] Allow build-to-shuffle combine to combine builds from two wide vectors.
This allows us to, in some cases, create a vector_shuffle out of a build_vector, when
the inputs to the build are extract_elements from two different vectors, at least one
of which is wider than the output. (E.g. a <8 x i16> being constructed out of
elements from a <16 x i16> and a <8 x i16>).
Kevin Enderby [Tue, 13 Sep 2016 21:42:28 +0000 (21:42 +0000)]
Next set of additional error checks for invalid Mach-O files for bad load commands
that use the Mach::dyld_info_command type for the load commands that are
currently use in the MachOObjectFile constructor.
This contains the missing checks for LC_DYLD_INFO and
LC_DYLD_INFO_ONLY load commands and the fields for the
Mach::dyld_info_command type.
AArch64: Cleanup tailcall CC check, enable swiftcc.
Cleanup/change the code that checks for possible tailcall conventions to
look the same as the one in the X86 target. This makes the distinction
between calling conventions that can guarnatee tailcalls and the ones
that may tailcall more obvious.
- Add Swift to the mayTailCall list
- PreserveMost seemed to be incorrectly part of the guarnteed tail call
list, move it to the mayTailCall list.
Simon Pilgrim [Tue, 13 Sep 2016 18:33:29 +0000 (18:33 +0000)]
[DAGCombiner] Use APInt directly in (shl (zext (srl x, C)), C) combine range test
To avoid assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range.
Andrea Di Biagio [Tue, 13 Sep 2016 14:50:47 +0000 (14:50 +0000)]
[ConstantFold] Improve the bitcast folding logic for constant vectors.
The constant folder didn't know how to always fold bitcasts of constant integer
vectors. In particular, it was unable to handle the case where a constant vector
had some undef elements, and the resulting (i.e. bitcasted) vector type had more
elements than the original vector type.
Example:
%cast = bitcast <2 x i64><i64 undef, i64 2> to <4 x i32>
On a little endian target, %cast could have been folded to:
<4 x i32><i32 undef, i32 undef, i32 2, i32 0>
This patch improves the folding logic by teaching how to correctly propagate
undef elements in the folded vector.
Nirav Dave [Tue, 13 Sep 2016 13:55:06 +0000 (13:55 +0000)]
Defer asm errors to post-statement failure
Recommitting after fixing AsmParser Initialization.
Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.
As part of this many minor cleanups to the Parser:
* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
now fixed.
These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.
Andrea Di Biagio [Tue, 13 Sep 2016 13:17:42 +0000 (13:17 +0000)]
[InstSimplify] Add tests to show missed bitcast folding opportunities.
InstSimplify doesn't always know how to fold a bitcast of a constant vector.
In particular, the logic in InstSimplify doesn't know how to handle the case
where the constant vector in input contains some undef elements, and the
number of elements is smaller than the number of elements of the bitcast
vector type.
James Molloy [Tue, 13 Sep 2016 12:45:51 +0000 (12:45 +0000)]
Revert "[ARM] Promote small global constants to constant pools"
This reverts commit r281314. Speculatively revert as it's possible this caused linker errors: http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/19656
Pablo Barrio [Tue, 13 Sep 2016 12:18:15 +0000 (12:18 +0000)]
[ARM] Add ".code 32" to functions in the ARM instruction set
Before, only Thumb functions were marked as ".code 16". These
".code x" directives are effective until the next directive of its
kind is encountered. Therefore, in code with interleaved ARM and
Thumb functions, it was possible to declare a function as ARM and
end up with a Thumb function after assembly. A test has been added.
An existing test has also been fixed to take this change into
account.
James Molloy [Tue, 13 Sep 2016 12:12:32 +0000 (12:12 +0000)]
[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
Sam Parker [Tue, 13 Sep 2016 12:10:14 +0000 (12:10 +0000)]
Enable simplify libcalls for ARM PCS
Teach SimplifyLibcalls that in can treat functions annotated with
apcs, aapcs or aapcs_vfp like normal C functions if they only take
and return integer or pointer values, and the target is not iOS.
Ying Yi [Tue, 13 Sep 2016 11:28:31 +0000 (11:28 +0000)]
[llvm-cov] - Included footer "Generated by llvm-cov -- llvm version <version number>" in the coverage report.
The llvm-cov version information will be useful to the user when comparing the code coverage across different versions of llvm-cov. This patch provides the llvm-cov version information in the generated coverage report.
Peter Smith [Tue, 13 Sep 2016 11:15:51 +0000 (11:15 +0000)]
[ARM] Support ldr.w in pseudo instruction ldr rd,=immediate
The changes made in r269352, r269353 and r269354 to support the
transformation of the ldr rd,=immediate to mov introduced a regression
from 3.8 (ldr.w rd, =immediate) not supported.
This change puts support back in for ldr.w by means of a t2InstAlias for
the .w form. The .w is ignored in ARM state and propagated to the ldr in
Thumb2.
James Molloy [Tue, 13 Sep 2016 10:28:11 +0000 (10:28 +0000)]
[ARM] Promote small global constants to constant pools
If a constant is unamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
Revert of r281304 as it is causing build bot failures in hexagon
hwloop regression tests. These tests pass locally; will be investigating
where these differences come from.
This adds a new field isAdd to MCInstrDesc. The ARM and Hexagon instruction
descriptions now tag add instructions, and the Hexagon backend is using this to
identify loop induction statements.
AVX-512: Fix for PR28175 - Scalar code optimization.
Optimized (truncate (assertzext x) to i1) and anyext i1 to i8/16/32.
Optimization of this patterns is a one more step towards i1 optimization on AVX-512.
[AArch64] Support stackmap/patchpoint in getInstSizeInBytes
We currently return 4 for stackmaps and patchpoints, which is very optimistic
and can in rare cases cause the branch relaxation pass to fail to relax certain
branches.
This patch causes getInstSizeInBytes to return a pessimistic estimate of the
size as the number of bytes requested in the stackmap/patchpoint. In the future,
we could provide a more accurate estimate by sharing some of the logic in
AArch64::LowerSTACKMAP/PATCHPOINT.
Fixes part of https://llvm.org/bugs/show_bug.cgi?id=28750
[X86] Remove masked shufpd/shufps intrinsics and autoupgrade to native vector shuffles. They were removed from clang previously but accidentally left in the backend.
This should allow users of the library to get a range to iterate through
all the subcommands that are registered to the global parser. This
allows users to define subcommands in libraries that self-register to
have dispatch done at a different stage (like main). It allows for
writing code like the following:
for (auto *S : cl::getRegisteredSubcommands()) {
if (*S) {
// Dispatch on S->getName().
}
}
This change also contains tests that show this usage pattern.
DebugInfo: New metadata representation for global variables.
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.
Fixes PR30362. A program for upgrading test cases is attached to that
bug.
Hans Wennborg [Tue, 13 Sep 2016 00:21:32 +0000 (00:21 +0000)]
X86: Conditional tail calls should not have isBarrier = 1
That confuses e.g. machine basic block placement, which then doesn't
realize that control can fall through a block that ends with a conditional
tail call. Instead, isBranch=1 should be set.
Philip Reames [Mon, 12 Sep 2016 22:38:44 +0000 (22:38 +0000)]
[LVI] Complete the abstract of the cache layer [NFCI]
Convert the previous introduced is-a relationship between the LVICache and LVIImple clases into a has-a relationship and hide all the implementation details of the cache from the lazy query layer.
The only slightly concerning change here is removing the addition of a queried block into the SeenBlock set in LVIImpl::getBlockValue. As far as I can tell, this was effectively dead code. I think it *used* to be the case that getCachedValueInfo wasn't const and might end up inserting elements in the cache during lookup. That's no longer true and hasn't been for a while. I did fixup the const usage to make that more obvious.
Philip Reames [Mon, 12 Sep 2016 21:46:58 +0000 (21:46 +0000)]
[LVI] Abstract out the actual cache logic [NFCI]
Seperate the caching logic from the implementation of the lazy analysis. For the moment, the lazy analysis impl has a is-a relationship with the cache; this will change to a has-a relationship shortly. This was done as two steps merely to keep the changes simple and the diff understandable.
Lang Hames [Mon, 12 Sep 2016 20:34:41 +0000 (20:34 +0000)]
[ORC] Replace the serialize/deserialize function pair with a SerializationTraits
class.
SerializationTraits provides serialize and deserialize methods corresponding to
the earlier functions, but also provides a name for the type. In future, this
name will be used to render function signatures as strings, which will in turn
be used to negotiate and verify API support between RPC clients and servers.
Summary: If consecutive select instructions are lowered separately in CGP, it will introduce redundant condition check and branches that cannot be removed by later optimization phases. This patch lowers all consecutive select instructions at the same to to avoid inefficent code as demonstrated in https://llvm.org/bugs/show_bug.cgi?id=29095
Nirav Dave [Mon, 12 Sep 2016 20:03:02 +0000 (20:03 +0000)]
[MC] Defer asm errors to post-statement failure
Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.
As part of this many minor cleanups to the Parser:
* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
now fixed.
These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.