This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter.
This is one of 3 commits to different repositories of XRay ARM port. The other 2 are:
[llvm-cov] Track function and instantiation coverage separately
These are distinct statistics which are useful to look at separately.
Example: say you have a template function "foo" with 5 instantiations
and only 3 of them are covered. Then this contributes (1/1) to the total
function coverage and (3/5) to the total instantiation coverage. I.e,
the old "Function Coverage" column has been renamed to "Instantiation
Coverage", and the new "Function Coverage" aggregates information from
the various instantiations of a function.
One benefit of making this switch is that the Line and Region coverage
columns will start making sense. Let's continue the example and assume
that the 5 instantiations of "foo" cover {2, 4, 6, 8, 10} out of 10
lines respectively. The new line coverage for "foo" is (10/10), not
(30/50). The old scenario got confusing because we'd report that there
were more lines in a file than what was actually possible.
This drops some redundant calls to get{UniqueSourceFiles,
CoveredFunctions}. We can figure out the right column widths without
re-doing this expensive work.
This isn't NFC, but I don't want to check in another binary *.covmapping
file with long filenames in it. I tested this locally on a project with
some long filenames (FileCheck).
Handle Invoke during sample profiler annotation: make it inlinable.
Summary: Previously we reline on inst-combine to remove inlinable invoke instructions. This causes trouble because a few extra optimizations are schedule early that could introduce too much CFG change (e.g. simplifycfg removes too much control flow). This patch handles invoke instruction in-place during sample profile annotation, so that we do not rely on instcombine to remove those invoke instructions.
[AVX-512] Don't lower CVTPD2PS intrinsics to ISD::FP_ROUND with an X86 rounding mode encoding in the second operand. This immediate should only be 0 or 1 and indicates if the truncation loses precision.
Also enhance an assert in SelectionDAG::getNode to flag this sort of problem in the future.
[AVX-512] Stop lowering avx512_mask_sqrt intrinsics to ISD:FSQRT with a second operand containing an X86 specific rounding mode encoding that doesn't belong.
Simon Pilgrim [Sun, 18 Sep 2016 12:45:23 +0000 (12:45 +0000)]
[X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp
With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations.
This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly support on pre-AVX512 hardware).
While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions.
Simplified GEP cloning in vectorizeMemoryInstruction().
Added an assertion that checks consecutive GEP, which should have only one loop-variant operand.
Wei Mi [Sun, 18 Sep 2016 06:10:32 +0000 (06:10 +0000)]
Change the order of the splitted store from high - low to low - high.
It is a trivial change which could make the testcase easier to be reused
for the store splitting in CodeGenPrepare.
Teresa Johnson [Sat, 17 Sep 2016 20:40:16 +0000 (20:40 +0000)]
[ThinLTO] Ensure anonymous globals renamed even at -O0
Summary:
This fixes an issue when files are compiled with -flto=thin
at default -O0. We need to rename anonymous globals before attempting
to write the module summary because all values need names for
the summary. This was happening at -O1 and above, but not before
the early exit when constructing the pipeline for -O0.
Also add an internal -prepare-for-thinlto option to enable this
to be tested via opt.
Matt Arsenault [Sat, 17 Sep 2016 16:09:55 +0000 (16:09 +0000)]
AMDGPU: Fix broken FrameIndex handling
We were trying to avoid using a FrameIndex operand in non-pointer
operands in a convoluted way, and would break because of
using TargetFrameIndex. The TargetFrameIndex should only be used
in the case where it makes sense to fold it as part of the addressing
mode, otherwise it requires materialization like a normal constant.
This wasn't working reliably and failed in the added testcase, hitting
the assert when processing the frame index.
The TargetFrameIndex was coming from trying to produce an AssertZext
limiting the maximum stack size. I'm not sure this was correct to begin
with, because it is apparently possible to have a single workitem
dispatch that requires all 4G of private memory.
Matt Arsenault [Sat, 17 Sep 2016 15:44:16 +0000 (15:44 +0000)]
AMDGPU: Push bitcasts through build_vector
This reduces the number of copies and reg_sequences
when using fp constant vectors. This significantly
reduces the code size in local-stack-alloc-bug.ll
Mehdi Amini [Sat, 17 Sep 2016 06:00:02 +0000 (06:00 +0000)]
Don't create a SymbolTable in Function when the LLVMContext discards value names (NFC)
The ValueSymbolTable is used to detect name conflict and rename
instructions automatically. This is not needed when the value
names are automatically discarded by the LLVMContext.
No functional change intended, just saving a little bit of memory.
This is a recommit of r281806 after fixing the accessor to return
a pointer instead of a reference and updating all the call-sites.
Mehdi Amini [Sat, 17 Sep 2016 03:39:01 +0000 (03:39 +0000)]
Don't create a SymbolTable in Function when the LLVMContext discards value names (NFC)
The ValueSymbolTable is used to detect name conflict and rename
instructions automatically. This is not needed when the value
names are automatically discarded by the LLVMContext.
No functional change intended, just saving a little bit of memory.
Chris Bieneman [Fri, 16 Sep 2016 22:19:19 +0000 (22:19 +0000)]
[CMake] Support symlinks with the same name as the binary
This supports creating symlinks to tools in different directories than
the tool is built to. This is useful for the LLDB framework build which
I’m sending patches for shortly.
[InstCombine] canonicalize vector select with constant vector condition to shuffle
As discussed on llvm-dev ( http://lists.llvm.org/pipermail/llvm-dev/2016-August/104210.html ):
turn a vector select with constant condition operand into a shuffle as a canonicalization step.
Shuffles may be easier to reason about in conjunction with other shuffles and insert/extract.
Possible known (minor?) regressions from this change are filed as:
https://llvm.org/bugs/show_bug.cgi?id=28530
https://llvm.org/bugs/show_bug.cgi?id=28531
https://llvm.org/bugs/show_bug.cgi?id=30371
If something terrible happens to perf after this commit, feel free to revert until a backend
fix is in place.
[safestack] Fix assertion failure in stack coloring.
This is a fix for PR30318.
Clang may generate IR where an alloca is already live when entering a
BB with lifetime.start. In this case, conservatively extend the
alloca lifetime all the way back to the block entry.
[RegAllocGreedy] Fix the list of NewVRegs for last chance recoloring.
When trying to recolor a register we may split live-ranges in the
process. When we create new live-ranges we will have to process them,
but when we move a register from Assign to Split, the allocation is not
changed until the whole recoloring session is successful.
Therefore, only push the live-ranges that changed from Assign to
Split when the recoloring is successful.
Same as the previous commit, I was not able to produce a test case that
reproduce the problem with in-tree targets.
Note: The bug has been here since the recoloring scheme has been added
back in r200883 (Feb 2014).
[RegAllocGreedy] Fix an assertion and condition when last chance recoloring is used.
When last chance recoloring is used, the list of NewVRegs may not be
empty when calling selectOrSplitImpl. Indeed, another coloring may have
taken place with splitting/spilling in the same recoloring session.
Relax an assertion to take this into account and adapt a condition to
act as if the NewVRegs were local to this selectOrSplitImpl instance.
Unfortunately I am unable to produce a test case for this, I was only
able to reproduce the conditions on an out-of-tree target.
Tom Stellard [Fri, 16 Sep 2016 21:53:00 +0000 (21:53 +0000)]
AMDGPU: Refactor kernel argument lowering
Summary:
The main challenge in lowering kernel arguments for AMDGPU is determing the
memory type of the argument. The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.
This consolidates all the logic AMDGPU uses for deducing memory types into a single
function. This will make it much easier to support different ABIs in the future.
[InstCombine] allow vector types for constant folding / computeKnownBits (PR24942)
computeKnownBits() already works for integer vectors, so allow vector types when calling that from InstCombine.
I don't think the change to use m_APInt in computeKnownBits is strictly necessary because we do check for
ConstantVector later, but it's more efficient to handle the splat case without needing to loop on vector elements.
This should work with InstSimplify, but doesn't yet, so I made that a FIXME comment on the test for PR24942:
https://llvm.org/bugs/show_bug.cgi?id=24942
Install libLLVM if needed with LLVM_INSTALL_TOOLCHAIN_ONLY
Summary:
When LLVM_LINK_LLVM_DYLIB is set, the libLLVM shared
library needs to be installed in the toolchain. Without
this chanage LLVM_INSTALL_TOOLCHAIN_ONLY combined with
LLVM_LINK_LLVM_DYLIB results in a broken install.
Nirav Dave [Fri, 16 Sep 2016 18:30:20 +0000 (18:30 +0000)]
Defer asm errors to post-statement failure
Recommitting after fixing AsmParser initialization and X86 inline asm
error cleanup.
Allow errors to be deferred and emitted as part of clean up to simplify
and shorten Assembly parser code. This will allow error messages to be
emitted in helper functions and be modified by the caller which has
better context.
As part of this many minor cleanups to the Parser:
* Unify parser cleanup on error
* Add Workaround for incorrect return values in ParseDirective instances
* Tighten checks on error-signifying return values for parser functions
and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are
now fixed.
These changes should be backwards compatible with current Target Parsers
so long as the error status are correctly returned in appropriate
functions.
Teresa Johnson [Fri, 16 Sep 2016 17:12:48 +0000 (17:12 +0000)]
[LTO] Use llvm-nm instead of nm in new tests
The use of nm in the new tests added with r281725 caused a couple
of bot failures:
http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/15701
http://bb.pgr.jp/builders/ninja-clang-i686-msc19-R/builds/6939
Keith Walker [Fri, 16 Sep 2016 14:07:29 +0000 (14:07 +0000)]
Place the lowered phi instruction(s) before the DEBUG_VALUE entry
When a phi node is finally lowered to a machine instruction it is
important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.
Renamed the existing SkipPHIsAndLabels to SkipPHIsLabelsAndDebug to
more fully describe that it also skips debug entries. Then used the
"new" function SkipPHIsAndLabels when the debug information should not
be skipped when placing the lowered "load" instructions so that it is
placed before the debug entries.
Teresa Johnson [Fri, 16 Sep 2016 13:54:19 +0000 (13:54 +0000)]
[LTO] Fix handling of mixed (regular and thin) mode LTO
Summary:
In runThinLTO we start the task numbering for ThinLTO backend
tasks depending on whether there was also a regular LTO object
(CombinedModule). However, the CombinedModule is moved at
the end of runRegularLTO, so we need to save this information and
pass it into runThinLTO. Otherwise the AddOutput callback to the client
will use the same task number for both the regular LTO object
and the first ThinLTO object, which in gold-plugin caused only
one to be end up in the output filename array and therefore passed
back to gold for the final native link.
Simon Dardis [Fri, 16 Sep 2016 13:50:43 +0000 (13:50 +0000)]
[mips] Fix aui/daui/dahi/dati for MIPSR6
For compatiblity with binutils, define these instructions to take
two registers with a 16bit unsigned immediate. Both of the registers
have to be same for dahi and dati.