[ARM] Loop Strength Reduction crashes when targeting ARM or Thumb.
Scalar Evolution asserts when not all the operands of an Add Recurrence
Expression are loop invariants. Loop Strength Reduction should only
create affine Add Recurrences, so that both the start and the step of
the expression are loop invariants.
Craig Topper [Wed, 9 Nov 2016 07:48:51 +0000 (07:48 +0000)]
[AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.
If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.
It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.
Craig Topper [Wed, 9 Nov 2016 07:31:32 +0000 (07:31 +0000)]
[X86] Lower AVX512 and SSE intrinsics for CVTTPD2DQ to X86ISD::CVTTPD2DQ.
Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves.
Craig Topper [Wed, 9 Nov 2016 05:31:53 +0000 (05:31 +0000)]
[AVX-512] Add test cases to demonstrate PR30947. We accidentally use 32 byte aligned store instructions when the original store was only 16 byte aligned if the store is from the lower bits of a subvector extract.
Bitcode: Remove the remnants of the BitcodeDiagnosticInfo class.
The BitcodeReader no longer produces BitcodeDiagnosticInfo diagnostics.
The only remaining reference was in the gold plugin; the code there has been
dead since we stopped producing InvalidBitcodeSignature error codes in r225562.
While at it remove the InvalidBitcodeSignature error code.
Summary:
This is the initial version of the documentation for how to use XRay as
it stands in LLVM, Clang, and compiler-rt. We leave some room for later
expansion mentioining what is work in progress and what could be
expected moving forward.
We also give a high level overview of future work that's both ongoing
and planned.
Sanjay Patel [Wed, 9 Nov 2016 00:24:44 +0000 (00:24 +0000)]
[ValueTracking] recognize obfuscated variants of umin/umax
The smallest tests that expose this are codegen tests (because SelectionDAGBuilder::visitSelect() uses matchSelectPattern
to create UMAX/UMIN nodes), but it's also possible to see the effects in IR alone with folds of min/max pairs.
If these were written as unsigned compares in IR, InstCombine canonicalizes the unsigned compares to signed compares.
Ie, running the optimizer pessimizes the codegen for this case without this patch:
Example output for raw hex bytes with ASCII with offsets:
llvm::outs() << format_hex_bytes_with_ascii(Bytes, 0x100000d10);
0x0000000100000d10: 554889e54881ec70040000488d051002 |UH.?H.?p...H....|
0x0000000100000d20: 00004c8d05fd0100004c8b0dd0020000 |..L..?...L..?...|
The default groups bytes into 4 byte groups, but this can be changed to 1 byte:
llvm::outs() << format_hex_bytes(Bytes, 0x100000d10, 16 /*NumPerLine*/, 1 /*ByteGroupSize*/);
0x0000000100000d10: 55 48 89 e5 48 81 ec 70 04 00 00 48 8d 05 10 02
0x0000000100000d20: 00 00 4c 8d 05 fd 01 00 00 4c 8b 0d d0 02 00 00
Sanjay Patel [Wed, 9 Nov 2016 00:13:11 +0000 (00:13 +0000)]
[InstCombine] fix profitability equation for max-of-nots transform
As the test change shows, we can increase the critical path by adding
a 'not' instruction, so make sure that we're actually removing an
instruction if we do this transform.
This transform could also cause us to miss folds of min/max pairs.
Zachary Turner [Tue, 8 Nov 2016 22:24:53 +0000 (22:24 +0000)]
[CodeView] Hook up CodeViewRecordIO to type serialization path.
Previously support had been added for using CodeViewRecordIO
to read (deserialize) CodeView type records. This patch adds
support for writing those same records. With this patch,
reading and writing of CodeView type records finally uses a single
codepath.
Adrian Prantl [Tue, 8 Nov 2016 22:11:38 +0000 (22:11 +0000)]
Emit the DW_AT_type for a C++ static member definition
if it is more specific than the one in its DW_AT_specification.
If a static member is an array, the translation unit containing the
member definition may have a more specific type (including its length)
than TUs only seeing the class declaration. This patch adds a
DW_AT_type to the member's DW_TAG_variable in addition to the
DW_AT_specification in these cases. The member type in the
DW_AT_specification still shows the more generic type (without the
length) to avoid defeating type uniquing.
The DWARF standard discourages “duplicating” a DW_AT_type in a member
variable definition but doesn’t explicitly forbid it. Having the more
specific type (with the array length) available is what allows the
debugger to print the contents of a static array member variable.
Teresa Johnson [Tue, 8 Nov 2016 21:53:35 +0000 (21:53 +0000)]
[ThinLTO] Prevent exporting of locals used/defined in module level asm
Summary:
This patch uses the same approach added for inline asm in r285513 to
similarly prevent promotion/renaming of locals used or defined in module
level asm.
All static global values defined in normal IR and used in module level asm
should be included on either the llvm.used or llvm.compiler.used global.
The former were already being flagged as NoRename in the summary, and
I've simply added llvm.compiler.used values to this handling.
Module level asm may also contain defs of values. We need to prevent
export of any refs to local values defined in module level asm (e.g. a
ref in normal IR), since that also requires renaming/promotion of the
local. To do that, the summary index builder looks at all values in the
module level asm string that are not marked Weak or Global, which is
exactly the set of locals that are defined. A summary is created for
each of these local defs and flagged as NoRename.
This required adding handling to the BitcodeWriter to look at GV
declarations to see if they have a summary (rather than skipping them
all).
Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to
ensure that an MCAsmParser is available, otherwise the module asm parse
would silently fail. Initialized the asm parser in the opt tool for use
in testing this fix.
Kuba Brecka [Tue, 8 Nov 2016 21:30:41 +0000 (21:30 +0000)]
[asan] Speed up compilation of large C++ stringmaps (tons of allocas) with ASan
This addresses PR30746, <https://llvm.org/bugs/show_bug.cgi?id=30746>. The ASan pass iterates over entry-block instructions and checks each alloca whether it's in NonInstrumentedStaticAllocaVec, which is apparently slow. This patch gathers the instructions to move during visitAllocaInst.
Sanjoy Das [Tue, 8 Nov 2016 20:46:01 +0000 (20:46 +0000)]
[TBAA] Drop support for "old style" scalar TBAA tags
Summary:
We've had support for auto upgrading old style scalar TBAA access
metadata tags into the "new" struct path aware TBAA metadata for 3 years
now. The only way to actually generate old style TBAA was explicitly
through the IRBuilder API. I think this is a good time for dropping
support for old style scalar TBAA.
I'm not removing support for textual or bitcode upgrade -- if you have
IR with the old style scalar TBAA tags that go through the AsmParser orf
the bitcode parser before LLVM sees them, they will keep working as
usual.
Tim Northover [Tue, 8 Nov 2016 20:39:03 +0000 (20:39 +0000)]
GlobalISel: allow CodeGen to fallback on VReg type/class issues.
After instruction selection we perform some checks on each VReg just before
discarding the type information. These checks were assertions before, but that
breaks the fallback path so this patch moves the logic into the main flow and
reports a better error on failure.
Ulrich Weigand [Tue, 8 Nov 2016 20:18:41 +0000 (20:18 +0000)]
[SystemZ] Add missing FP extension instructions
This completes assembler / disassembler support for all BFP
instructions provided by the floating-point extensions facility.
The instructions added here are not currently used for codegen.
Ulrich Weigand [Tue, 8 Nov 2016 20:17:02 +0000 (20:17 +0000)]
[SystemZ] Add program mask and addressing mode instructions
Add several instructions that operate on the program mask
or the addressing mode. These are not really needed for
code generation under Linux, but are provided for completeness
for the assembler/disassembler.
Ulrich Weigand [Tue, 8 Nov 2016 20:15:26 +0000 (20:15 +0000)]
[SystemZ] Model access registers as LLVM registers
Add the 16 access registers as LLVM registers. This allows removing
a lot of special cases in the assembler and disassembler where we
were handling access registers; this can all just use the generic
register code now.
Also add a bunch of instructions to operate on access registers,
for assembler/disassembler use only. No change in code generation
intended.
Dan Gohman [Tue, 8 Nov 2016 19:40:38 +0000 (19:40 +0000)]
[WebAssembly] Convert stackified IMPLICIT_DEF into constant 0.
Since IMPLIFIT_DEF instructions are omitted in the output, when the output
of an IMPLICIT_DEF instruction is stackified, the resulting register lacks
an explicit push, leading to a push/pop mismatch. Fix this by converting
such IMPLICIT_DEFs into CONST_I32 0 instructions so that they have explicit
pushes.
Ahmed Bougacha [Tue, 8 Nov 2016 19:27:10 +0000 (19:27 +0000)]
[GlobalISel] Permit select() to erase.
Erasing reverse_iterators is problematic; iterate manually.
While there, keep track of the range of inserted instructions.
It can miss instructions inserted elsewhere, but those are harder
to track.
Davide Italiano [Tue, 8 Nov 2016 19:18:20 +0000 (19:18 +0000)]
[LibcallsShrinkWrap] This pass doesn't preserve the CFG.
For example, it invalidates the domtree, causing assertions
in later passes which need dominator infos. Make it preserve
GlobalsAA, as suggested by Eli.
Ulrich Weigand [Tue, 8 Nov 2016 18:36:31 +0000 (18:36 +0000)]
[SystemZ] Refactor InstRR* instruction format patterns
This changes the InstRR (and related) patterns to no longer
automatically add an "r" at the end of the mnemonic. This
makes the .td files more obviously understandable, and also
allows using the patterns for those few instructions that
do not follow the *r scheme.
Also add some more sub-formats of the RRF format class, to
match operand names and sequence from the PoP better.
Wei Mi [Tue, 8 Nov 2016 18:19:36 +0000 (18:19 +0000)]
[RegAllocGreedy] Another fix about NewVRegs for last chance recoloring after r281783.
About when we should move a vreg from CurrentNewVRegs to NewVRegs,
if the vreg in CurrentNewVRegs was added into RecoloringCandidate and was
evicted, it shouldn't be added to NewVRegs because its physical register
will be restored at the end of tryLastChanceRecoloring after the recoloring
failed. If the vreg in CurrentNewVRegs was not in RecoloringCandidate, i.e.
it was evicted in selectOrSplitImpl inside tryRecoloringCandidates, its
physical register will not be restored even if the recoloring failed. In
that case, we need to add the vreg to NewVRegs.
Same as r281783, the problem was seen on out-of-tree target and we didn't
have a test case that reproduce the problem with in-tree targets.
Simon Pilgrim [Tue, 8 Nov 2016 15:07:01 +0000 (15:07 +0000)]
[TargetLowering] Fix undef vector element issue with true/false result handling
Fixed an issue with vector usage of TargetLowering::isConstTrueVal / TargetLowering::isConstFalseVal boolean result matching.
The comment said we shouldn't handle constant splat vectors with undef elements. But the the actual code was returning false if the build vector contained no undef elements....
This patch now ignores the number of undefs (getConstantSplatNode will return null if the build vector is all undefs).
The change has also unearthed a couple of missed opportunities in AVX512 comparison code that will need to be addressed.
Pablo Barrio [Tue, 8 Nov 2016 14:53:30 +0000 (14:53 +0000)]
[JumpThreading] Unfold selects that depend on the same condition
Summary:
These are good candidates for jump threading. This enables later opts
(such as InstCombine) to combine instructions from the selects with
instructions out of the selects. SimplifyCFG will fold the select
again if unfolding wasn't worth it.
Under -enable-unsafe-fp-math, SELECT_CC lowering in AArch64
transforms floating point comparisons of the form "a == 0.0 ? 0.0 : x" to
"a == 0.0 ? a : x". But it incorrectly assumes that 'x' and 'a' have
the same type which can lead to a wrong CSEL node that crashes later
due to nonsensical copies.
Craig Topper [Tue, 8 Nov 2016 06:58:53 +0000 (06:58 +0000)]
[AVX-512] Add an avx512f without avx512vl command line to vec_fp_to_int.ll and regenerate. This will make a change in a future patch easier to see. NFC
IR, Bitcode: Change bitcode reader to no longer own its memory buffer.
Unique ownership is just one possible ownership pattern for the memory buffer
underlying the bitcode reader. In practice, as this patch shows, ownership can
often reside at a higher level. With the upcoming change to allow multiple
modules in a single bitcode file, it will no longer be appropriate for
modules to generally have unique ownership of their memory buffer.
The C API exposes the ownership relation via the LLVMGetBitcodeModuleInContext
and LLVMGetBitcodeModuleInContext2 functions, so we still need some way for
the module to own the memory buffer. This patch does so by adding an owned
memory buffer field to Module, and using it in a few other places where it
is convenient.
Justin Bogner [Tue, 8 Nov 2016 05:02:18 +0000 (05:02 +0000)]
cmake: Don't try to install exports if there aren't any
When using LLVM_DISTRIBUTION_COMPONENTS, it's possible for LLVM's
export list to be empty. If this happens the install(EXPORTS) command
will fail, but since there isn't anything to install anyway we really
just want to skip it.
Bitcode: Decouple block info block state from reader.
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106630.html
Move block info block state to a new class, BitstreamBlockInfo.
Clients may set the block info for a particular cursor with the
BitstreamCursor::setBlockInfo() method.
At this point BitstreamReader is not much more than a container for an
ArrayRef<uint8_t>, so remove it and replace all uses with direct uses
of memory buffers.
Summary:
Set _install_rpath to CMAKE_INSTALL_RPATH if it is defined, so that eventually
INSTALL_RPATH is set to CMAKE_INSTALL_RPATH.
The "if(NOT DEFINED CMAKE_INSTALL_RPATH)" was missing a corresponding else
clause.
This also cleans up the fix made in r285908.
Tim Northover [Tue, 8 Nov 2016 00:34:06 +0000 (00:34 +0000)]
GlobalISel: constrain PHI registers on AArch64.
Self-referencing PHI nodes need their destination operands to be constrained
because nothing else is likely to do so. For now we just pick a register class
naively.
[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies
Codegen prepare sinks comparisons close to a user is we have only one register
for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions.
Changed BE to report we have many condition registers. That way IR LICM pass
would hoist an invariant comparison out of a loop and codegen prepare will not
sink it.
With that done a condition is calculated in one block and used in another.
Current behavior is to store workitem's condition in a VGPR using v_cndmask
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue a forward propagation of a v_cmp 64 bit result
to an user is implemented. Additional side effect of this is that we may
consume less VGPRs in a cost of more SGPRs in case if holding of multiple
conditions is needed, and that is a clear win in most cases.
Adam Nemet [Mon, 7 Nov 2016 22:41:13 +0000 (22:41 +0000)]
[OptDiag, opt-viewer] Save callee's location and display as link
With this we get a new field in the YAML record if the value being
streamed out has a debug location. For examples, please see the changes
to the tests.
This is then used in opt-viewer to display a link for the callee
function in the inlining remarks.
Sanjin Sijaric [Mon, 7 Nov 2016 22:39:02 +0000 (22:39 +0000)]
[AArch64] Transfer memory operands when lowering vector load/store intrinsics
Summary:
Some vector loads and stores generated from AArch64 intrinsics alias each other
unnecessarily, preventing better scheduling. We just need to transfer memory
operands during lowering.
Lang Hames [Mon, 7 Nov 2016 22:33:13 +0000 (22:33 +0000)]
[docs] Add a pointer to ExitOnError to the discussion of handleErrors in the
programmer's manual.
ExitOnError is often a better alternative to handleErrors for tool code. This
patch makes it easier to find the ExitOnError discussion when reading the
handleErrors section.
Mehdi Amini [Mon, 7 Nov 2016 22:13:38 +0000 (22:13 +0000)]
Add experimental support for unofficial monorepo-like directory layout
Summary:
This allows to have clang and llvm and the other subprojects
side-by-side instead of nested. This can be used with the monorepo or
multiple repos.
It will help having a single set of sources checked out but allows to
have a build directory with llvm and another one with llvm+clang.
Basically it abstracts LLVM_EXTERNAL_xxxx_SOURCE_DIR making it more
convenient by adopting a convention.
Derek Schuff [Mon, 7 Nov 2016 22:00:48 +0000 (22:00 +0000)]
[WebAssembly] Emit a BasePointer when we have overly-aligned stack objects
Because we shift the stack pointer by an unknown amount, we need an
additional pointer. In the case where we have variable-size objects
as well, we can't reuse the frame pointer, thus three pointers.
Sanjoy Das [Mon, 7 Nov 2016 21:01:49 +0000 (21:01 +0000)]
Avoid tail recursion elimination across calls with operand bundles
Summary:
In some specific scenarios with well understood operand bundle types
(like `"deopt"`) it may be possible to go ahead and convert recursion to
iteration, but TailRecursionElimination does not have that logic today
so avoid doing the right thing for now.
I need some input on whether `"funclet"` operand bundles should also
block tail recursion elimination. If not, I'll allow TRE across calls
with `"funclet"` operand bundles and add a test case.