[PowerPC] Sign extend sub-word values for atomic comparisons
Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.
Speculative fix for build failures due to consumeInteger.
A recent patch added support for consumeInteger() and made
getAsInteger delegate to this function. A few buildbots are
failing as a result with an assertion failure. On a hunch,
I tested what happens if I call getAsInteger() on an empty
string, and sure enough it crashes the same way that the
buildbots are crashing.
I confirmed that getAsInteger() on an empty string did not
crash before my patch, so I suspect this to be the cause.
Sebastian Pop [Thu, 22 Sep 2016 15:33:51 +0000 (15:33 +0000)]
GVN-hoist: fix store past load dependence analysis (PR30216)
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.
The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().
Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.
StringRef::getInteger() exists and treats the entire string as
an integer of the specified radix, failing if any invalid characters
are encountered or the number overflows.
Sometimes you might have something like "123456foo" and you want
to get the number 123456 and leave the string "foo" remaining.
This is similar to what would be possible by using the standard
runtime library functions strtoul et al and specifying an end
pointer.
This patch adds consumeInteger(), which does exactly that. It
consumes as much as possible until an invalid character is found,
and modifies the StringRef in place so that upon return only
the portion of the StringRef after the number remains.
Sebastian Pop [Thu, 22 Sep 2016 14:45:40 +0000 (14:45 +0000)]
GVN-hoist: only hoist relevant scalar instructions
Without this patch, GVN-hoist would think that a branch instruction is a scalar instruction
and would try to value number it. The patch filters out all such kind of irrelevant instructions.
A bit frustrating is that there is no easy way to discard all those very infrequent instructions,
a bit like isa<TerminatorInst> that stands for a large family of instructions. I'm thinking that
checking for those very infrequent other instructions would cost us more in compilation time
than just letting those instructions getting numbered, so I'm still thinking that a simpler check:
if (isa<TerminatorInst>(I))
return false;
is better than listing all the other less frequent instructions.
Keith Walker [Thu, 22 Sep 2016 14:13:25 +0000 (14:13 +0000)]
Reapplying r281895 (and follow-up r281964) after fixing pr30468.
The additional fix is:
When adding debug information to a lowered phi node in mem2reg
check that we have a valid insertion point after the phi for adding
the debug information.
This change addresses the issue in pr30468 where a lowered phi was
added before a catchswitch and no debug information should be added
after the phi in this case.
[PowerPC] Remove LE patterns matching generic stores/loads to VSX permuting ops
This patch corresponds to:
https://reviews.llvm.org/D21409
The LXVD2X, LXVW4X, STXVD2X and STXVW4X instructions permute the two doublewords
in the vector register when in little-endian mode. Custom code ensures that the
necessary swaps are inserted for these. This patch simply removes the possibilty
that a load/store node will match one of these instructions in the SDAG as that
would not insert the necessary swaps.
[Power9] Add exploitation of non-permuting memory ops
This patch corresponds to review:
https://reviews.llvm.org/D19825
The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.
[AVX-512] Add support for commuting VPTERNLOG instructions.
VPTERNLOG is a ternary instruction with an immediate specifying the logical operation to perform. For each bit position in the 3 source vectors the bit from each source is concatenated together and the resulting 3-bit value is used to select a bit in the immediate. This bit value is written to the result vector.
We can commute this by swapping operands and modifying the immediate. To modify the immediate we need to swap two pairs of bits. The pairs correspond to the locations in the immediate where the commuted operands bits have opposite values and the uncommuted operand has the same value. Bits 0 and 7 will never be swapped since the relevant bits from all sources are the same value.
This refactors and reuses parts of the FMA3 commuting code which is also a three operand instruction.
[RegisterBankInfo] Move to statically allocated RegisterBank.
This commit is basically the first step toward what will
RegisterBankInfo look when it gets TableGen'ed.
It introduces a XXXGenRegisterBankInfo.def file that is what TableGen
will issue at some point. Moreover, the RegBanks field in
RegisterBankInfo changed to reflect the static (compile time) aspect of
the information.
[RegisterBankInfo] Take advantage of the extra argument of SmallVector::resize.
When initializing an instance of OperandsMapper, instead of using
SmallVector::resize followed by std::fill, use the function that
directly does that in SmallVector.
Kevin Enderby [Wed, 21 Sep 2016 20:03:09 +0000 (20:03 +0000)]
Next set of additional error checks for invalid Mach-O files for bad LC_UUID
load commands. Added a missing check and made the check for more than
one like other other “more than one” checks. And of course added test cases.
Chad Rosier [Wed, 21 Sep 2016 19:16:47 +0000 (19:16 +0000)]
[LoopInterchange] Track all dependencies, not just anti dependencies.
Currently, we give up on loop interchange if we encounter a flow dependency
anywhere in the loop list. Worse yet, we don't even track output dependencies.
This patch updates the dependency matrix computation to track flow and output
dependencies in the same way we track anti dependencies.
This improves an internal workload by 2.2x.
Note the loop interchange pass is off by default and it can be enabled with
'-mllvm -enable-loopinterchange'
Teresa Johnson [Wed, 21 Sep 2016 19:12:05 +0000 (19:12 +0000)]
[ThinLTO] Emit files for distributed builds for all modules
With the new LTO API in r278338, we stopped emitting the individual
index files and imports files for some modules in the distributed backend
case (thinlto-index-only plugin option).
Specifically, this is when the linker decides not to include a module in the
link, because it was in an archive library and did not have a strong
reference to it. Not creating the expected output files makes the
distributed build system implementation more difficult, in terms of
checking for the expected outputs of the thin link, and scheduling the
backend jobs. To address this, the gold-plugin will write dummy empty
.thinlto.bc and .imports files for modules not included in the link
(which LTO never sees).
Augmented a gold v1.12+ test, since that version of gold has the handling
for notifying on modules not being included in the link.
Matthew Simpson [Wed, 21 Sep 2016 16:50:24 +0000 (16:50 +0000)]
[LV] Don't emit unused scalars for uniform instructions
If we identify an instruction as uniform after vectorization, we know that we
should only use the value corresponding to the first vector lane of each unroll
iteration. However, when scalarizing such instructions, we still produce values
for the other vector lanes. This patch prevents us from generating the unused
scalars.
Tim Northover [Wed, 21 Sep 2016 12:57:45 +0000 (12:57 +0000)]
GlobalISel: produce correct code for signext/zeroext ABI flags.
We still don't really have an equivalent of "AssertXExt" in DAG, so we don't
exploit the guarantees on the receiving side yet, but this should produce
conservatively correct code on iOS ABIs.
Tim Northover [Wed, 21 Sep 2016 12:57:35 +0000 (12:57 +0000)]
GlobalISel: pass Function to lowerFormalArguments directly (NFC).
The only implementation that exists immediately looks it up anyway, and the
information is needed to handle various parameter attributes (stored on the
function itself).
The postRA scheduler performs alias analysis to determine if stores and loads
can moved past each other. When a function has more arguments than argument
registers for the calling convention used, excess arguments are spilled onto the
stack. LLVM by default assumes that argument slots are immutable, unless the
function contains a tail call. Without the knowledge of that a function contains
a tail call site, stores and loads to fixed stack slots may be re-ordered
causing the out-going arguments to clobber the incoming arguments before the
incoming arguments are supposed to be dead.
Lei Liu [Wed, 21 Sep 2016 07:41:41 +0000 (07:41 +0000)]
AArch64: Set shift bit of TLSLE HI12 add instruction
Summary: AArch64 LLVM assembler emits add instruction without shift bit to calculate the higher 12-bit address of TLS variables in local exec model. This generates wrong code sequence to access TLS variables with thread offset larger than 0x1000.
[AVX-512] Split the 3 different usages of the X86ISD::FSETCC opcode into 3 different opcodes.
It turns out isel is really not robust against having different type profiles for the same opcode. It turns out that if you put an illegal rounding mode(i.e. not CUR_DIRECTION or NO_EXC) on a comiss intrinsic we would generate the FSETCC form with the rounding mode added, but then pattern match to an instruction with ROUND_CUR_DIRECTION.
We can probably get away with just one FSETCCM opcode that always contains the rounding mode and explicitly put ROUND_CUR_DIRECTION in the pattern, but I'll leave that for future work.
With this change the clang tests for the comiss intrinsics that used an incorrect rounding mode of 3 properly fail isel instead of silently doing the wrong thing. Those clang tests will be fixed in a follow up commit and I also plan to add rounding mode checking to clang.
[AVX-512] Don't add an additional rounding mode operand to the avx512 vcvtps2ph intrinsic lowering.
There was no way to control its value so it was always FROUND_CURRENT making it unnecessary. The true rounding mode is encoded in the immediate operand of the instruction.
This also removes the pattern from the rb form of the instructions since there is no way to specify the FROUND_NO_EXC rounding mode it required.
[AVX-512] Simplify handling of INTR_TYPE_1OP_MASK_RM to remove support for the second opcode since its never used. This makes it consistent with INTR_TYPE_2OP_MASK_RM and INTR_TYPE_3OP_MASK_RM.
And even if it was used we were passing the same operands to both so it wouldn't make sense to have two opcodes.
[AVX-512] Don't lower avx512 vcvtps2ph/vcvtph2ps nodes to ISD::FP16_TO_FP/ISD::FP_TO_FP16 with an extra x86 specific rounding mode operand. We should use a target specific ISD opcode.
Jacques Pienaar [Wed, 21 Sep 2016 01:57:57 +0000 (01:57 +0000)]
[NVPTX] Check if callsite is defined when computing argument allignment
Summary: In getArgumentAlignment check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is 0x0 fall back to the ABI type alignment else compute the alignment as before.
Teresa Johnson [Tue, 20 Sep 2016 23:07:17 +0000 (23:07 +0000)]
[ThinLTO] Always emit a summary when compiling in ThinLTO mode
Summary:
Emit an empty summary section, instead of no summary section, when
there are no global variables in the index. This ensures that LTO
will treat these files as ThinLTO inputs, instead of as regular
LTO inputs.
In addition to not being what the user likely intended when
compiling with -flto=thin, the current behavior is problematic for
distributed build systems that expect to get ThinLTO index and imports
files back for each input compiled with -flto=thin. Combining into
a single regular LTO module also reduces the backend parallelism.
And in the case where the index was suppressed due to uses in
inline assembly, combining into a single LTO module could provoke
renaming of duplicates that we were trying to prevent by suppressing
the index.
This change required a couple of fixes to handle the empty summary
section.
Eric Christopher [Tue, 20 Sep 2016 22:03:28 +0000 (22:03 +0000)]
Revert "Remove extra argument used once on
TargetMachine::getNameWithPrefix and inline the result into the singular
caller." and "Remove more guts of TargetMachine::getNameWithPrefix and
migrate one check to the TLOF mach-o version." temporarily until I can
get the whole call migrated out of the TargetMachine as we could hit
places where TLOF isn't valid.
Anna Thomas [Tue, 20 Sep 2016 21:36:02 +0000 (21:36 +0000)]
[RS4GC] Refactor code for Rematerializing in presence of phi. NFC
Summary:
This is an NFC refactoring change as a precursor to the actual fix for rematerializing in
presence of phi.
https://reviews.llvm.org/D24399
Pasted from review:
findRematerializableChainToBasePointer changed to return the root of the
chain. instead of true or false.
move the PHI matching logic into the caller by inspecting the root return value.
This includes an assertion that the alternate root is in the liveset for the
call.
[CodeGen] stop short-circuiting the SSP code for sspstrong.
This check caused us to skip adding layout information for calls to
alloca in sspreq/sspstrong mode. We check properly for sspstrong later
on (and add the correct layout info when doing so), so removing this
shouldn't hurt.
No test is included, since testing this using lit seems to require
checking for exact offsets in asm, which is something that the lit tests
for this avoid. If someone cares deeply, I'm happy to write a unittest
or something to cover this, but that feels like overkill.
Petr Hosek [Tue, 20 Sep 2016 20:21:13 +0000 (20:21 +0000)]
Mark ELF sections whose name start with .note as note
Previously, such section would be marked as SHT_PROGBITS which
makes it impossible to use an initialized C variable declaration
to emit an (allocated) ELF note. The new behavior is also consistent
with ELF assembly parser.
Kevin Enderby [Tue, 20 Sep 2016 20:14:14 +0000 (20:14 +0000)]
Next set of additional error checks for invalid Mach-O files for bad load commands
that use the Mach::dylib_command type for the load commands that are
currently used in the MachOObjectFile constructor.
This contains the missing checks for LC_ID_DYLIB, LC_ID_DYLIB, etc.
load commands and the fields for the Mach::dylib_command type.
Also checks that only an MH_DYLIB or MH_STUB_DYLIB has an
LC_ID_DYLIB load command (and others filetype don’t) and there
is not more than one of these load commands.
Revert part of "AArch64: Do not test for CPUs, use SubtargetFeatures"
This reverts part of commit 119e358d9635c8d1f3e7aee67e3ea3b8a62f8db6 by
removing FeatureUseRSqrt et al per request by Eric Christopher
<echristo@gmail.com> (v. http://bit.ly/2cmz6kW).
Adrian Prantl [Tue, 20 Sep 2016 18:28:42 +0000 (18:28 +0000)]
ASAN: Don't drop debug info attachements for global variables.
This is a follow-up to r281284. Global Variables now can have
!dbg attachements, so ASAN should clone these when generating a
sanitized copy of a global variable.
Adrian McCarthy [Tue, 20 Sep 2016 17:20:51 +0000 (17:20 +0000)]
Emit S_COMPILE3 CodeView record
CodeView has an S_COMPILE3 record to identify the compiler and source language of the compiland. This record comes first in the debug$S section for the compiland. The debuggers rely on this record to know the source language of the code.
There was a little test fallout from introducing a new record into the symbols subsection.
We would assert that the FP setup CFI used esp/rsp always. This held up in
practice when the code was generated from IR. However, with the integrated
assembler, it is possible to have the input be user specified assembly. In such
a case, we cannot assume that the function implementation has a compact unwind
representation. Loosen the assertion into a check and bail if we cannot
represent the frame pointer in the compact unwinding.
Tim Northover [Tue, 20 Sep 2016 15:20:36 +0000 (15:20 +0000)]
GlobalISel: split aggregates for PCS lowering
This should match the existing behaviour for passing complicated struct and
array types, in particular HFAs come through like that from Clang.
For C & C++ we still need to somehow support all the weird ABI flags, or at
least those that are present in the IR (signext, byval, ...), and stack-based
parameter passing.
Keith Walker [Tue, 20 Sep 2016 10:36:17 +0000 (10:36 +0000)]
Make llvm::ConvertDebugDeclareToDebugValue() be a void function (NFC)
The routines llvm::ConvertDebugDeclareToDebugValue() always returned
a true value which was never checked at the call site; change the
function return type to void.
This NFC cleanup was approved in the review https://reviews.llvm.org/D23715
Nikolay Haustov [Tue, 20 Sep 2016 09:04:51 +0000 (09:04 +0000)]
AMDGPU: Improve documentation.
Summary:
Add links to ISA manuals and ABI.
Add text about assembler syntax.
Add info about instructions operands.
Add instruction examples for each encoding.
Update directives section, add missing .amdgpu_hsa_kernel.