Summary:
This document is an attempt at showing how XRay could be used to debug
latency issues with LLVM tools, and how to use the llvm-xray tool to
analyse XRay traces.
Eric Christopher [Thu, 30 Mar 2017 22:34:20 +0000 (22:34 +0000)]
getPristineRegs is not accurately considering shrink wrapping puts
registers not saved in certain blocks. Use explicit getCalleeSavedInfo
and isLiveIn instead.
Rafael Espindola [Thu, 30 Mar 2017 21:05:31 +0000 (21:05 +0000)]
Use os.path.realpath when tracking the cwd.
This is needed by TestCases/Posix/coverage-direct.cc
The problem is that the test does:
mkdir <dir>
cd <dir>
cd ..
rm -rf <dir>
<more commands>
the current directory currently looks like "/.../<dir>/../" which
doesn't exist when dir is deleted.
at some point we should probably switch to using the os current
directory (specially if we want to add subshell), but this is a small
incremental improvement.
Juergen Ributzka [Thu, 30 Mar 2017 19:56:50 +0000 (19:56 +0000)]
[Object] Remove check for BIND_OPCODE_DONE/REBASE_OPCODE_DONE.
BIND_OPCODE_DONE/REBASE_OPCODE_DONE may appear at the end of the opcode array,
but they are not required to. The linker only adds them as padding to align the
opcodes to pointer size.
Yaron Keren [Thu, 30 Mar 2017 19:30:51 +0000 (19:30 +0000)]
Following r297661, disable dup workaround to disable duplicate STDOUT fd closing and instead directly prevent closing of STD* file descriptors.
We do not want to close STDOUT as there may have been several uses of it
such as the case: llc %s -o=- -pass-remarks-output=- -filetype=asm
which cause multiple closes of STDOUT_FILENO and/or use-after-close of it.
Using dup() in getFD doesn't work as we end up with original STDOUT_FILENO
open anyhow.
Adam Nemet [Thu, 30 Mar 2017 18:53:04 +0000 (18:53 +0000)]
[DAGCombiner] Initial support for the fast-math flag contract
Now alternatively to the TargetOption.AllowFPOpFusion global flag, FMUL->FADD
can also use the per operation FMF to allow fusion.
The idea here is not to port everything to the new scheme (e.g. fused
multiply-and-sub will be ported later) but that this work all the way from
clang.
The transformation is conditionalized on *both* the FADD and the FMUL having
the FMF contract flag.
Ahmed Bougacha [Thu, 30 Mar 2017 17:49:58 +0000 (17:49 +0000)]
[CodeGen] Pass SDAG an ORE, and replace FastISel stats with remarks.
In the long-term, we want to replace statistics with something
finer-grained that lets us gather per-function data.
Remarks are that replacement.
Create an ORE instance in SelectionDAGISel, and pass it to
SelectionDAG.
SelectionDAG was used so that we can emit remarks from all
SelectionDAG-related code, including TargetLowering and DAGCombiner.
This isn't used in the current patch but Adam tells me he's interested
for the fp-contract combines.
Use the ORE instance to emit FastISel failures as remarks (instead of
the mix of dbgs() dumps and statistics that we currently have).
Eventually, we want to have an API that tells us whether remarks are
enabled (http://llvm.org/PR32352) so that we don't emit expensive
remarks (in this case, dumping IR) when it's not needed. For now, use
'isEnabled' as a crude replacement.
This does mean that the replacement for '-fast-isel-verbose' is now
'-pass-remarks-missed=isel'. Additionally, clang users also need to
enable remark diagnostics, using '-Rpass-missed=isel'.
This also removes '-fast-isel-verbose2': there are no static statistics
that we want to only enable in asserts builds, so we can always use
the remarks regardless of the build type.
Sanjay Patel [Thu, 30 Mar 2017 17:32:42 +0000 (17:32 +0000)]
[DAGCombiner] add helper function for visitORLike; NFCI
This combines all of the equivalent clean-ups for foldAndOfSetCCs:
https://reviews.llvm.org/rL298938
https://reviews.llvm.org/rL298940
https://reviews.llvm.org/rL298944
https://reviews.llvm.org/rL298949
https://reviews.llvm.org/rL298950
https://reviews.llvm.org/rL299002
https://reviews.llvm.org/rL299013
The sins of code duplication are on full display here:
each function is missing a fold that wasn't copied over from its logical sibling.
Kristof Beyls [Thu, 30 Mar 2017 11:06:25 +0000 (11:06 +0000)]
Revert "Make naming in Host.h in line with coding standards."
This reverts r299062, which caused build failures on Windows.
It also reverts the attempts to fix the windows builds in r299064 and r299065.
The introduction of namespace llvm::sys::detail makes MSVC, and seemingly also
mingw, complain about ambiguity with the existing namespace llvm::detail.
E.g.:
C:\b\slave\sanitizer-windows\llvm\include\llvm/Support/MathExtras.h(184): error C2872: 'detail': ambiguous symbol
C:\b\slave\sanitizer-windows\llvm\include\llvm/Support/PointerLikeTypeTraits.h(31): note: could be 'llvm::detail'
C:\b\slave\sanitizer-windows\llvm\include\llvm/Support/Host.h(80): note: or 'llvm::sys::detail'
In r299064 and r299065 I tried to fix these ambiguities, based on the errors
reported in the log files. It seems however that the build stops early when
this kind of error is encountered, and many build-then-fix-iterations on
Windows may be needed to fix this. Therefore reverting r299062 for now to
get the build working again on Windows.
Kristof Beyls [Thu, 30 Mar 2017 07:24:49 +0000 (07:24 +0000)]
Refactor getHostCPUName to allow testing on non-native hardware.
This refactors getHostCPUName so that for the architectures that get the
host cpu info on linux from /proc/cpuinfo, the /proc/cpuinfo parsing
logic is present in the build, even if it wasn't built on a linux system
for that architecture.
Since the code is present in the build, we can then test that code also
on other systems, i.e. we don't need to have buildbots setup for all
architectures on linux to be able to test this. Instead, developers will
test this as part of the regression test run.
As an example, a few unit tests are added to test getHostCPUName for ARM
running linux. A unit test is preferred over a lit-based test, since the
expectation is that in the future, the functionality here will grow over
what can be tested with "llc -mcpu=native".
This is a preparation step to enable implementing the range of
improvements discussed on PR30516, such as adding AArch64 support,
support for big.LITTLE systems, reducing code duplication.
Craig Topper [Thu, 30 Mar 2017 05:49:03 +0000 (05:49 +0000)]
[APInt] Remove references to integerPartWidth outside of APFloat implentation.
Turns out integerPartWidth only explicitly defines the width of the tc functions in the APInt class. Functions that aren't used by APInt implementation itself. Many places in the code base already assume APInt is made up of 64-bit pieces. Explicitly assuming 64-bit here doesn't make that situation much worse. A full audit would need to be done if it ever changes.
[libFuzzer] best effort support for -fsanitize-coverage=trace-pc instrumentation. It is less efficient and precise than -fsanitize-coverage=trace-pc-guard, but still works
Eric Christopher [Wed, 29 Mar 2017 23:34:27 +0000 (23:34 +0000)]
If the DIUnit has flags passed on it then have DW_AT_producer be a combination of DICompileUnit::Producer and Flags.
The darwin behavior is unchanged and will continue to use DW_AT_APPLE_flags.
Reid Kleckner [Wed, 29 Mar 2017 22:51:22 +0000 (22:51 +0000)]
[codeview] Fix buggy BeginIndexMapSize assertion
This assert is just trying to test that processing each record adds
exactly one entry to the index map. The assert logic was wrong when the
first record in the type stream was a field list.
I've simplified the code by moving the LF_FIELDLIST-specific logic into
the callback for that record type.
Adrian McCarthy [Wed, 29 Mar 2017 19:27:08 +0000 (19:27 +0000)]
Re-land: "Make NativeExeSymbol a concrete subclass of NativeRawSymbol [PDB]"
This should work on all platforms now that r299006 has landed. Tested locally
on Windows and Linux.
This moves exe symbol-specific method implementations out of NativeRawSymbol
into a concrete subclass. Also adds implementations for hasCTypes and
hasPrivateSymbols and a simple test to ensure the native reader can access the
summary information for the executable from the PDB.
Original Differential Revision: https://reviews.llvm.org/D31059
Matthew Simpson [Wed, 29 Mar 2017 18:23:08 +0000 (18:23 +0000)]
[InstCombine] Correct the check for vector GEPs
Some of the GEP combines (e.g., descaling) can't handle vector GEPs. We have an
existing check that attempts to bail out if given a vector GEP. However, the
check only tests the GEP's pointer operand. A GEP results in a vector of
pointers if at least one of its operands is vector-typed (e.g., its pointer
operand could be a scalar, but its index could be a vector). We should just
check the type of the GEP itself. This should fix PR32414.
Daniel Sanders [Wed, 29 Mar 2017 15:37:18 +0000 (15:37 +0000)]
[tablegen][globalisel] Convert the SelectionDAG importer to a tree walking approach. NFC
Summary:
But don't actually inspect the tree any deeper than we already do. This
change is NFC but the next one will enable full traversal of the
source/destination patterns.
Depends on D30535
Reviewers: t.p.northover, qcolombet, aditya_nandakumar, rovka, ab
Instantiation of the MachineVerifierPass through
PassInfo::getNormalCtor would yield a segfault since the default
constructor of the MachineVerifierPass takes a reference to nullptr.
Craig Topper [Wed, 29 Mar 2017 06:55:28 +0000 (06:55 +0000)]
[AVX-512] Remove explicit KMOVWrk/KMOVWKr instructions from patterns where we can just use COPY_TO_REGCLASS instead.
This will result in a KMOVW or KMOVD being emitted during register allocation. And in at least some cases this might allow the register coalescer to remove the copy all together.
[XRay] Update FDR log reader to be aware of buffer sizes per thread.
Summary:
It is problematic for this reader that it expects to read data from
several threads, but the header or message format does not define
framing. Since the buffers are reused, we can't rely on skipping
zeroed out data as a synchronization method either.
There is an argument that this is not version compatible with the format
the reader expected previously. I argue that since the writer wrote garbage
past the end of buffer record, there is no currently working reader to
compromise.
The corresponding writer change is posted to D31384.
[XRay][tools] Handle "no subcommand" case for llvm-xray
Summary:
Currently the llvm-xray commandline tool fails to handle the case for
when no subcommand is provided in a graceful manner. This fixes that to
print the help message explaining the subcommands and the available
options.
Craig Topper [Tue, 28 Mar 2017 23:20:37 +0000 (23:20 +0000)]
[AVX-512] Punt on fast-isel of truncates to i1 when AVX512 is enabled.
We should be masking the value and emitting a register copy like we do in non-fast isel. Instead we were just updating the value map and emitting nothing.
After r298928 we started seeing cases where we would create a copy from GR8 to GR32 because the source register in a VK1 to GR32 copy was replaced by the GR8 going into a truncate.
Guozhi Wei [Tue, 28 Mar 2017 22:55:01 +0000 (22:55 +0000)]
[PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64
In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64.
[AMDGPU] Boost unroll threshold for loops reading local memory
This is less important than increase threshold for private memory,
but still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows
to combine ds operations if offset becomes static, saves registers
used for offsets in case of static offsets, and allows better lds
latency hiding.
[AMDGPU] Fix recorded region boundaries in max-occupancy scheduler
This is incorrect to record region boundaries before scheduling,
it may change after scheduling. As a result second pass may see less
instructions to schedule than it should.
Simon Pilgrim [Tue, 28 Mar 2017 21:32:11 +0000 (21:32 +0000)]
[X86][MMX] Match MMX fp_to_sint conversions from XMM registers
We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack).
This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc.
Adam Nemet [Tue, 28 Mar 2017 20:11:52 +0000 (20:11 +0000)]
[IR] Add AllowContract to FastMathFlags
-ffp-contract=fast does not currently work with LTO because it's passed as a
TargetOption to the backend rather than in the IR. This adds it to
FastMathFlags.
Previously it was covered by the internalization. It turns out we cannot
run internalizer in FE, it break separate compilation tests. Thus early
inliner gets its own option.
Craig Topper [Tue, 28 Mar 2017 16:35:29 +0000 (16:35 +0000)]
[AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers
We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.
This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8.
I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.
Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition.
This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.