Sam Kolton [Fri, 31 Mar 2017 11:42:43 +0000 (11:42 +0000)]
[AMDGPU] SDWA Peephole: improve search for immediates in SDWA patterns
Previously the compiler often extracted common immediates into a separate register, e.g.:
```
%vreg0 = S_MOV_B32 0xff;
%vreg2 = V_AND_B32_e32 %vreg0, %vreg1
%vreg4 = V_AND_B32_e32 %vreg0, %vreg3
```
Because of this, the SDWA peephole failed to find SDWA-convertible patterns. E.g. the previous example could be converted into 2 SDWA src operands:
```
SDWA src: %vreg2 src_sel:BYTE_0
SDWA src: %vreg4 src_sel:BYTE_0
```
With this change the peephole checks whether an operand is either an immediate or a register that is a copy of an immediate.
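The check can be illustrated with a toy Python model. This is only a sketch and not the pass itself: `resolve_immediate` and the dictionary-based instruction representation are hypothetical names introduced for illustration.

```python
# Toy model: each virtual register maps to the instruction that defines it.
# An instruction is a tuple (opcode, operands); operands are ints
# (immediates) or strings (register names). All names are hypothetical.

def resolve_immediate(operand, defs):
    """Return the immediate value of `operand`, or None if it has none.

    An operand counts as an immediate if it is a literal, or if it is a
    register defined by a move of a literal (a copy of an immediate).
    """
    if isinstance(operand, int):
        return operand
    opcode, operands = defs.get(operand, (None, ()))
    if opcode == "S_MOV_B32" and len(operands) == 1 and isinstance(operands[0], int):
        return operands[0]
    return None


defs = {
    "%vreg0": ("S_MOV_B32", (0xFF,)),
    "%vreg2": ("V_AND_B32_e32", ("%vreg0", "%vreg1")),
}

# The mask operand of the AND is a register, but it resolves to 0xff.
mask = resolve_immediate(defs["%vreg2"][1][0], defs)
```

A register with no known defining move (here `%vreg1`) resolves to None, so the pattern is not matched for it.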
Craig Topper [Fri, 31 Mar 2017 06:30:25 +0000 (06:30 +0000)]
[APInt] Add unittests that demonstrate how very broken APIntOps::isShiftedMask is.
Did you know that 0 is a shifted mask? But 0x0000ff00 and 0x000000ff aren't? At least we get 0xff000000 right.
I only see one usage of this function in the code base today and it's in InstCombine. I think it's protected against 0 being misreported as a mask. I guess we just don't have tests for the missed cases.
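For reference, the behavior the tests expect follows from the usual definition of a shifted mask: a non-empty contiguous run of ones with all other bits zero. A minimal Python sketch of that definition (not the APIntOps code; `is_mask` and `is_shifted_mask` are illustrative names, and inputs are assumed non-negative):

```python
def is_mask(value):
    """True if `value` is a non-empty run of ones starting at bit 0."""
    return value != 0 and (value & (value + 1)) == 0


def is_shifted_mask(value):
    """True if `value` is a non-empty contiguous run of ones at any offset."""
    # Filling the trailing zeros with ones turns a shifted mask into a mask.
    return value != 0 and is_mask((value - 1) | value)
```

Under this definition 0 is not a shifted mask, while 0x000000ff, 0x0000ff00 and 0xff000000 all are.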
Andrew Wilkins [Fri, 31 Mar 2017 04:59:57 +0000 (04:59 +0000)]
Go binding: Add GetCurrentDebugLocation to obtain debug location from builder
Summary:
Currently the Go binding only has a SetCurrentDebugLocation method.
I added a GetCurrentDebugLocation method to the IRBuilder instance.
I added this because I want to save the current debug location, change the debug location temporarily and finally restore the saved one.
This is useful when the source location jumps and later goes back during LLVM IR generation.
I also added tests for this to ir_test.go.
I confirmed that all tests passed with this patch based on r298890
LTO: Reduce memory consumption by creating an in-memory symbol table for InputFiles. NFCI.
Introduce symbol table data structures that can be potentially written to
disk, have the LTO library build those data structures using temporarily
constructed modules and redirect the LTO library implementation to go through
those data structures. This allows us to remove the LLVMContext and Modules
owned by InputFile.
With this change I measured a peak memory consumption decrease from 5.4GB to
2.8GB in a no-op incremental ThinLTO link of Chromium on Linux. The impact on
memory consumption is larger in COFF linkers where we are currently forced
to materialize all metadata in order to read linker options. Peak memory
consumption linking a large piece of Chromium for Windows with full LTO and
debug info decreases from >64GB (OOM) to 15GB.
Eric Christopher [Fri, 31 Mar 2017 02:16:54 +0000 (02:16 +0000)]
Temporarily revert "[PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64" as it's causing test failures, I've given Carrot a testcase offline.
[XRay][tools] Remove some assertions in llvm-xray graph
Summary:
Assertions assuming that function calls may not have zero durations do
not seem to hold in the wild. There are valid cases where the conversion
of the tsc counters ends up producing zero-length durations. The
algorithms don't need these assertions to be true in order to work.
Dan Gohman [Thu, 30 Mar 2017 23:58:19 +0000 (23:58 +0000)]
[WebAssembly] Initial linking metadata support
Add support for the new relocations and linking metadata section support in
https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md. In
particular, this allows LLVM to indicate which variable is the stack pointer,
so that it can be linked with other objects.
This also adds support for emitting type relocations for call_indirect
instructions.
Right now, this is mainly tested by using wabt and hexdump to examine the
output on selected testcases. We'll add more tests as the design stabilizes
and more of the pieces are in place.
Summary:
This document is an attempt at showing how XRay could be used to debug
latency issues with LLVM tools, and how to use the llvm-xray tool to
analyse XRay traces.
Eric Christopher [Thu, 30 Mar 2017 22:34:20 +0000 (22:34 +0000)]
getPristineRegs does not accurately account for shrink wrapping, which
leaves registers unsaved in certain blocks. Use explicit
getCalleeSavedInfo and isLiveIn instead.
Rafael Espindola [Thu, 30 Mar 2017 21:05:31 +0000 (21:05 +0000)]
Use os.path.realpath when tracking the cwd.
This is needed by TestCases/Posix/coverage-direct.cc
The problem is that the test does:
mkdir <dir>
cd <dir>
cd ..
rm -rf <dir>
<more commands>
The tracked current directory then looks like "/.../<dir>/../", which
no longer exists once <dir> is deleted.
At some point we should probably switch to using the OS current
directory (especially if we want to add subshells), but this is a small
incremental improvement.
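The failure mode can be reproduced outside the test runner with a short script. This is a sketch under assumptions: `track_cd` is a hypothetical helper standing in for the cwd tracking, not the actual code, and the behavior shown is for POSIX path resolution.

```python
import os
import shutil
import tempfile


def track_cd(cwd, target, resolve):
    """Return the new tracked cwd after `cd target` (hypothetical helper)."""
    new = os.path.join(cwd, target)
    return os.path.realpath(new) if resolve else new


def demo():
    """Run mkdir/cd/cd ../rm -rf and report whether the tracked cwd survives."""
    base = os.path.realpath(tempfile.mkdtemp())
    try:
        sub = os.path.join(base, "dir")
        os.mkdir(sub)
        tracked = {}
        for resolve in (False, True):
            cwd = track_cd(base, "dir", resolve)   # cd <dir>
            cwd = track_cd(cwd, "..", resolve)     # cd ..
            tracked[resolve] = cwd
        shutil.rmtree(sub)                         # rm -rf <dir>
        return {
            "naive_exists": os.path.exists(tracked[False]),
            "resolved_exists": os.path.exists(tracked[True]),
            "resolved_is_base": tracked[True] == base,
        }
    finally:
        shutil.rmtree(base, ignore_errors=True)
```

Without realpath the tracked path still contains the deleted component, so later commands would run in a directory that cannot be resolved; with realpath the ".." is collapsed while the directory still exists.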
Juergen Ributzka [Thu, 30 Mar 2017 19:56:50 +0000 (19:56 +0000)]
[Object] Remove check for BIND_OPCODE_DONE/REBASE_OPCODE_DONE.
BIND_OPCODE_DONE/REBASE_OPCODE_DONE may appear at the end of the opcode array,
but they are not required to. The linker only adds them as padding to align the
opcodes to pointer size.
Yaron Keren [Thu, 30 Mar 2017 19:30:51 +0000 (19:30 +0000)]
Following r297661, disable the dup workaround that was meant to prevent duplicate STDOUT fd closing and instead directly prevent closing of STD* file descriptors.
We do not want to close STDOUT, as there may have been several uses of it,
such as in the case: llc %s -o=- -pass-remarks-output=- -filetype=asm
which causes multiple closes of STDOUT_FILENO and/or use-after-close of it.
Using dup() in getFD doesn't work, as we end up with the original
STDOUT_FILENO open anyhow.
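The idea of guarding the close itself, rather than duplicating the descriptor, can be sketched in a few lines of Python. This is an illustration only: `close_unless_std` is a hypothetical name and does not correspond to a function in the patch.

```python
import os

STD_FDS = (0, 1, 2)  # stdin, stdout, stderr


def close_unless_std(fd):
    """Close `fd` unless it is one of the standard descriptors.

    Returns True if the descriptor was closed, False if it was left open.
    """
    if fd in STD_FDS:
        return False
    os.close(fd)
    return True
```

With such a guard, several output streams that all map to "-" can each run their cleanup without closing stdout more than once, or at all.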
Adam Nemet [Thu, 30 Mar 2017 18:53:04 +0000 (18:53 +0000)]
[DAGCombiner] Initial support for the fast-math flag contract
As an alternative to the TargetOption.AllowFPOpFusion global flag, FMUL->FADD
can now also use the per-operation FMF to allow fusion.
The idea here is not to port everything to the new scheme (e.g. fused
multiply-and-sub will be ported later) but to have this work all the way from
clang.
The transformation is conditionalized on *both* the FADD and the FMUL having
the FMF contract flag.
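The resulting condition can be written as a small predicate. This is a simplified sketch: the global option is modelled as a single boolean meaning "the global setting permits fusion", and the function name is hypothetical.

```python
def may_fuse_fmul_fadd(global_allows_fusion, fadd_has_contract, fmul_has_contract):
    """Decide whether an FMUL feeding an FADD may be fused (toy model)."""
    # The global option permits fusion on its own; otherwise both
    # operations must carry the per-operation contract flag.
    return global_allows_fusion or (fadd_has_contract and fmul_has_contract)
```

A contract flag on only one of the two operations is not enough to enable fusion.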
Ahmed Bougacha [Thu, 30 Mar 2017 17:49:58 +0000 (17:49 +0000)]
[CodeGen] Pass SDAG an ORE, and replace FastISel stats with remarks.
In the long-term, we want to replace statistics with something
finer-grained that lets us gather per-function data.
Remarks are that replacement.
Create an ORE instance in SelectionDAGISel, and pass it to
SelectionDAG.
SelectionDAG was used so that we can emit remarks from all
SelectionDAG-related code, including TargetLowering and DAGCombiner.
This isn't used in the current patch but Adam tells me he's interested
for the fp-contract combines.
Use the ORE instance to emit FastISel failures as remarks (instead of
the mix of dbgs() dumps and statistics that we currently have).
Eventually, we want to have an API that tells us whether remarks are
enabled (http://llvm.org/PR32352) so that we don't emit expensive
remarks (in this case, dumping IR) when it's not needed. For now, use
'isEnabled' as a crude replacement.
This does mean that the replacement for '-fast-isel-verbose' is now
'-pass-remarks-missed=isel'. Additionally, clang users also need to
enable remark diagnostics, using '-Rpass-missed=isel'.
This also removes '-fast-isel-verbose2': there are no static statistics
that we want to only enable in asserts builds, so we can always use
the remarks regardless of the build type.
Sanjay Patel [Thu, 30 Mar 2017 17:32:42 +0000 (17:32 +0000)]
[DAGCombiner] add helper function for visitORLike; NFCI
This combines all of the equivalent clean-ups for foldAndOfSetCCs:
https://reviews.llvm.org/rL298938
https://reviews.llvm.org/rL298940
https://reviews.llvm.org/rL298944
https://reviews.llvm.org/rL298949
https://reviews.llvm.org/rL298950
https://reviews.llvm.org/rL299002
https://reviews.llvm.org/rL299013
The sins of code duplication are on full display here:
each function is missing a fold that wasn't copied over from its logical sibling.
Kristof Beyls [Thu, 30 Mar 2017 11:06:25 +0000 (11:06 +0000)]
Revert "Make naming in Host.h in line with coding standards."
This reverts r299062, which caused build failures on Windows.
It also reverts the attempts to fix the windows builds in r299064 and r299065.
The introduction of namespace llvm::sys::detail makes MSVC, and seemingly also
mingw, complain about ambiguity with the existing namespace llvm::detail.
E.g.:
C:\b\slave\sanitizer-windows\llvm\include\llvm/Support/MathExtras.h(184): error C2872: 'detail': ambiguous symbol
C:\b\slave\sanitizer-windows\llvm\include\llvm/Support/PointerLikeTypeTraits.h(31): note: could be 'llvm::detail'
C:\b\slave\sanitizer-windows\llvm\include\llvm/Support/Host.h(80): note: or 'llvm::sys::detail'
In r299064 and r299065 I tried to fix these ambiguities, based on the errors
reported in the log files. It seems however that the build stops early when
this kind of error is encountered, and many build-then-fix-iterations on
Windows may be needed to fix this. Therefore reverting r299062 for now to
get the build working again on Windows.
Kristof Beyls [Thu, 30 Mar 2017 07:24:49 +0000 (07:24 +0000)]
Refactor getHostCPUName to allow testing on non-native hardware.
This refactors getHostCPUName so that for the architectures that get the
host cpu info on linux from /proc/cpuinfo, the /proc/cpuinfo parsing
logic is present in the build, even if it wasn't built on a linux system
for that architecture.
Since the code is present in the build, we can then test that code also
on other systems, i.e. we don't need to have buildbots setup for all
architectures on linux to be able to test this. Instead, developers will
test this as part of the regression test run.
As an example, a few unit tests are added to test getHostCPUName for ARM
running linux. A unit test is preferred over a lit-based test, since the
expectation is that in the future, the functionality here will grow over
what can be tested with "llc -mcpu=native".
This is a preparation step to enable implementing the range of
improvements discussed on PR30516, such as adding AArch64 support,
support for big.LITTLE systems, reducing code duplication.
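The testability gain comes from passing the /proc/cpuinfo contents in as data instead of reading the file inside the function. A rough Python sketch of that shape, under assumptions: `cpu_name_from_cpuinfo` and the small `ARM_PARTS` table are hypothetical and far less complete than the real lookup.

```python
# Hypothetical lookup table; only a few well-known ARM part numbers.
ARM_PARTS = {
    "0xc0f": "cortex-a15",
    "0xd03": "cortex-a53",
    "0xd07": "cortex-a57",
}


def cpu_name_from_cpuinfo(text):
    """Return a CPU name for the given /proc/cpuinfo contents."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    if fields.get("CPU implementer") == "0x41":  # 0x41 is ARM Ltd.
        return ARM_PARTS.get(fields.get("CPU part"), "generic")
    return "generic"


SAMPLE = """\
processor       : 0
CPU implementer : 0x41
CPU architecture: 8
CPU part        : 0xd03
"""
```

Because the function only sees a string, a unit test can feed it `SAMPLE` on any host, regardless of the architecture or OS the test runs on.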
Craig Topper [Thu, 30 Mar 2017 05:49:03 +0000 (05:49 +0000)]
[APInt] Remove references to integerPartWidth outside of APFloat implementation.
Turns out integerPartWidth only explicitly defines the width of the tc functions in the APInt class, functions that aren't used by the APInt implementation itself. Many places in the code base already assume APInt is made up of 64-bit pieces. Explicitly assuming 64-bit here doesn't make that situation much worse. A full audit would need to be done if it ever changes.
[libFuzzer] best effort support for -fsanitize-coverage=trace-pc instrumentation. It is less efficient and precise than -fsanitize-coverage=trace-pc-guard, but still works
Eric Christopher [Wed, 29 Mar 2017 23:34:27 +0000 (23:34 +0000)]
If the DIUnit has flags passed on it then have DW_AT_producer be a combination of DICompileUnit::Producer and Flags.
The darwin behavior is unchanged and will continue to use DW_AT_APPLE_flags.
Reid Kleckner [Wed, 29 Mar 2017 22:51:22 +0000 (22:51 +0000)]
[codeview] Fix buggy BeginIndexMapSize assertion
This assert is just trying to test that processing each record adds
exactly one entry to the index map. The assert logic was wrong when the
first record in the type stream was a field list.
I've simplified the code by moving the LF_FIELDLIST-specific logic into
the callback for that record type.
Adrian McCarthy [Wed, 29 Mar 2017 19:27:08 +0000 (19:27 +0000)]
Re-land: "Make NativeExeSymbol a concrete subclass of NativeRawSymbol [PDB]"
This should work on all platforms now that r299006 has landed. Tested locally
on Windows and Linux.
This moves exe symbol-specific method implementations out of NativeRawSymbol
into a concrete subclass. Also adds implementations for hasCTypes and
hasPrivateSymbols and a simple test to ensure the native reader can access the
summary information for the executable from the PDB.
Original Differential Revision: https://reviews.llvm.org/D31059
Matthew Simpson [Wed, 29 Mar 2017 18:23:08 +0000 (18:23 +0000)]
[InstCombine] Correct the check for vector GEPs
Some of the GEP combines (e.g., descaling) can't handle vector GEPs. We have an
existing check that attempts to bail out if given a vector GEP. However, the
check only tests the GEP's pointer operand. A GEP results in a vector of
pointers if at least one of its operands is vector-typed (e.g., its pointer
operand could be a scalar, but its index could be a vector). We should just
check the type of the GEP itself. This should fix PR32414.
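The difference between the two checks can be shown with a toy model in which each operand is reduced to a "is vector-typed" boolean. Both function names are hypothetical and this does not mirror the InstCombine code.

```python
def gep_is_vector_pointer_only(pointer_is_vector, indices_are_vector):
    """The old-style check: looks only at the pointer operand."""
    return pointer_is_vector


def gep_is_vector(pointer_is_vector, indices_are_vector):
    """A GEP yields a vector of pointers if any operand is vector-typed."""
    return pointer_is_vector or any(indices_are_vector)
```

For a scalar pointer with a vector index, the pointer-only check reports a scalar GEP while the result is actually a vector of pointers.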
Daniel Sanders [Wed, 29 Mar 2017 15:37:18 +0000 (15:37 +0000)]
[tablegen][globalisel] Convert the SelectionDAG importer to a tree walking approach. NFC
Summary:
But don't actually inspect the tree any deeper than we already do. This
change is NFC but the next one will enable full traversal of the
source/destination patterns.
Depends on D30535
Reviewers: t.p.northover, qcolombet, aditya_nandakumar, rovka, ab
Instantiation of the MachineVerifierPass through
PassInfo::getNormalCtor would yield a segfault since the default
constructor of the MachineVerifierPass takes a reference to nullptr.
Craig Topper [Wed, 29 Mar 2017 06:55:28 +0000 (06:55 +0000)]
[AVX-512] Remove explicit KMOVWrk/KMOVWKr instructions from patterns where we can just use COPY_TO_REGCLASS instead.
This will result in a KMOVW or KMOVD being emitted during register allocation. And in at least some cases this might allow the register coalescer to remove the copy altogether.
[XRay] Update FDR log reader to be aware of buffer sizes per thread.
Summary:
It is problematic for this reader that it expects to read data from
several threads, but the header or message format does not define
framing. Since the buffers are reused, we can't rely on skipping
zeroed out data as a synchronization method either.
There is an argument that this is not version compatible with the format
the reader expected previously. I argue that since the writer wrote garbage
past the end of buffer record, there is no currently working reader to
compromise.
The corresponding writer change is posted to D31384.
[XRay][tools] Handle "no subcommand" case for llvm-xray
Summary:
Currently the llvm-xray commandline tool fails to handle the case for
when no subcommand is provided in a graceful manner. This fixes that to
print the help message explaining the subcommands and the available
options.
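The same graceful fallback is a common pattern in subcommand-based tools. A minimal Python sketch using argparse, not the llvm-xray code: the program name "tool" and the single example subcommand are assumptions made for illustration.

```python
import argparse
import io


def build_parser():
    """Build a parser with one example subcommand (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="tool")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.add_parser("graph", help="example subcommand")
    return parser


def run(argv, out):
    """Parse `argv`; print help to `out` when no subcommand was given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command is None:
        parser.print_help(file=out)
        return 1
    return 0
```

Calling `run([], sys.stdout)` prints the usage text listing the subcommands instead of failing.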
Craig Topper [Tue, 28 Mar 2017 23:20:37 +0000 (23:20 +0000)]
[AVX-512] Punt on fast-isel of truncates to i1 when AVX512 is enabled.
We should be masking the value and emitting a register copy like we do in non-fast isel. Instead we were just updating the value map and emitting nothing.
After r298928 we started seeing cases where we would create a copy from GR8 to GR32 because the source register in a VK1 to GR32 copy was replaced by the GR8 going into a truncate.