Daniel Sanders [Wed, 29 Mar 2017 15:37:18 +0000 (15:37 +0000)]
[tablegen][globalisel] Convert the SelectionDAG importer to a tree walking approach. NFC
Summary:
But don't actually inspect the tree any deeper than we already do. This
change is NFC but the next one will enable full traversal of the
source/destination patterns.
Depends on D30535
Reviewers: t.p.northover, qcolombet, aditya_nandakumar, rovka, ab
Instantiation of the MachineVerifierPass through
PassInfo::getNormalCtor would yield a segfault since the default
constructor of the MachineVerifierPass takes a reference to nullptr.
Craig Topper [Wed, 29 Mar 2017 06:55:28 +0000 (06:55 +0000)]
[AVX-512] Remove explicit KMOVWrk/KMOVWKr instructions from patterns where we can just use COPY_TO_REGCLASS instead.
This will result in a KMOVW or KMOVD being emitted during register allocation. And in at least some cases this might allow the register coalescer to remove the copy all together.
[XRay] Update FDR log reader to be aware of buffer sizes per thread.
Summary:
It is problematic for this reader that it expects to read data from
several threads, but the header or message format does not define
framing. Since the buffers are reused, we can't rely on skipping
zeroed out data as a synchronization method either.
There is an argument that this is not version compatible with the format
the reader expected previously. I argue that since the writer wrote garbage
past the end of buffer record, there is no currently working reader to
compromise.
The corresponding writer change is posted to D31384.
[XRay][tools] Handle "no subcommand" case for llvm-xray
Summary:
Currently the llvm-xray commandline tool fails to handle the case for
when no subcommand is provided in a graceful manner. This fixes that to
print the help message explaining the subcommands and the available
options.
Craig Topper [Tue, 28 Mar 2017 23:20:37 +0000 (23:20 +0000)]
[AVX-512] Punt on fast-isel of truncates to i1 when AVX512 is enabled.
We should be masking the value and emitting a register copy like we do in non-fast isel. Instead we were just updating the value map and emitting nothing.
After r298928 we started seeing cases where we would create a copy from GR8 to GR32 because the source register in a VK1 to GR32 copy was replaced by the GR8 going into a truncate.
Guozhi Wei [Tue, 28 Mar 2017 22:55:01 +0000 (22:55 +0000)]
[PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64
In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64.
[AMDGPU] Boost unroll threshold for loops reading local memory
This is less important than increase threshold for private memory,
but still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows
to combine ds operations if offset becomes static, saves registers
used for offsets in case of static offsets, and allows better lds
latency hiding.
[AMDGPU] Fix recorded region boundaries in max-occupancy scheduler
This is incorrect to record region boundaries before scheduling,
it may change after scheduling. As a result second pass may see less
instructions to schedule than it should.
Simon Pilgrim [Tue, 28 Mar 2017 21:32:11 +0000 (21:32 +0000)]
[X86][MMX] Match MMX fp_to_sint conversions from XMM registers
We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack).
This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc.
Adam Nemet [Tue, 28 Mar 2017 20:11:52 +0000 (20:11 +0000)]
[IR] Add AllowContract to FastMathFlags
-ffp-contract=fast does not currently work with LTO because it's passed as a
TargetOption to the backend rather than in the IR. This adds it to
FastMathFlags.
Previously it was covered by the internalization. It turns out we cannot
run internalizer in FE, it break separate compilation tests. Thus early
inliner gets its own option.
Craig Topper [Tue, 28 Mar 2017 16:35:29 +0000 (16:35 +0000)]
[AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers
We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.
This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8.
I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.
Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition.
This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.
Sanne Wouda [Tue, 28 Mar 2017 10:02:56 +0000 (10:02 +0000)]
[AArch64] [Assembler] option to disable negative immediate conversions
Summary:
Similar to the ARM target in https://reviews.llvm.org/rL298380, this
patch adds identical infrastructure for disabling negative immediate
conversions, and converts the existing aliases to the new infrastucture.
Igor Breger [Tue, 28 Mar 2017 09:35:06 +0000 (09:35 +0000)]
[GlobalISel][X86] support G_FRAME_INDEX instruction selection.
Summary:
G_LOAD/G_STORE, add alternative RegisterBank mapping.
For G_LOAD, Fast and Greedy mode choose the same RegisterBank mapping (GprRegBank ) for the G_GLOAD + G_FADD , can't get rid of cross register bank copy GprRegBank->VecRegBank.
Craig Topper [Tue, 28 Mar 2017 05:32:48 +0000 (05:32 +0000)]
[APInt] Use 'unsigned' instead of 'unsigned int' in the interface to the APInt tc functions. This is more consistent with the rest of the codebase. NFC
Kevin Enderby [Mon, 27 Mar 2017 20:09:23 +0000 (20:09 +0000)]
Add the error handling for Mach-O dyld compact lazy bind, weak bind and
rebase entry errors and test cases for each of the error checks.
Also verified with Nick Kledzik that a BIND_OPCODE_SET_ADDEND_SLEB
opcode is legal in a lazy bind table, so code that had that as an error
check was removed.
With MachORebaseEntry and MachOBindEntry classes now returning
an llvm::Error in all cases for malformed input the variables Malformed
and logic to set use them is no longer needed and has been removed
from those classes.
Also in a few places, removed the redundant Done assignment to true
when also calling moveToEnd() as it does that assignment.
This only leaves the dyld compact export entries left to have
error handling yet to be added for the dyld compact info.
Matthew Simpson [Mon, 27 Mar 2017 20:07:38 +0000 (20:07 +0000)]
[LV] Transform truncations of non-primary induction variables
The vectorizer tries to replace truncations of induction variables with new
induction variables having the smaller type. After r295063, this optimization
was applied to all integer induction variables, including non-primary ones.
When optimizing the truncation of a non-primary induction variable, we still
need to transform the new induction so that it has the correct start value.
This should fix PR32419.
[TableGen] Print #nnn as a name of an non-native reg unit with id nnn
When using -debug with -gen-register-info, tablegen will crash when
trying to print a name of a non-native register unit. This patch only
affects the debug information generated while running llvm-tblgen,
and has no impact on the compilable code coming out of it.
[Support] Avoid concurrency hazard in signal handler registration
Several static functions from the signal API can be invoked
simultaneously; RemoveFileOnSignal for instance can be called indirectly
by multiple parallel loadModule() invocations, which might lead to
the assertion:
Assertion failed: (NumRegisteredSignals < array_lengthof(RegisteredSignalInfo) && "Out of space for signal handlers!"),
function RegisterHandler, file /llvm/lib/Support/Unix/Signals.inc, line 105.
RemoveFileOnSignal calls RegisterHandlers(), which isn't currently
mutex protected, leading to the behavior above. This potentially affect
a few other users of RegisterHandlers() too.
Craig Topper [Mon, 27 Mar 2017 18:16:17 +0000 (18:16 +0000)]
[APInt] Move operator&=(uint64_t) inline and use memset to clear the upper words.
This method is pretty new and probably isn't use much in the code base so this should have a negligible size impact. The OR and XOR operators are already inline.