Daniel Sanders [Mon, 23 Oct 2017 18:19:24 +0000 (18:19 +0000)]
[globalisel][tablegen] Import stores and allow GISel to automatically substitute zero regs like WZR/XZR/$zero.
This patch enables the import of stores. Unfortunately, doing so by itself,
loses an optimization where storing 0 to memory makes use of WZR/XZR.
To mitigate this, this patch also introduces a new feature that allows register
operands to nominate a zero register. When this is done, GlobalISel will
substitute (G_CONSTANT 0) with the nominated register automatically. This
is currently configured to only apply to the stores.
Applying it to GPR32/GPR64 register classes in general will be done after
review see (https://reviews.llvm.org/D39150).
Vedant Kumar [Mon, 23 Oct 2017 18:04:34 +0000 (18:04 +0000)]
[wasm] readSection: Avoid reading past eof (fixes oss-fuzz #3219)
A wasm file crafted with a bogus section size can trigger an ASan issue
in the DWARFObjInMemory constructor. Nip the problem in the bud when we
read the wasm section.
Found by OSS-Fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3219
Matt Arsenault [Mon, 23 Oct 2017 17:09:35 +0000 (17:09 +0000)]
AMDGPU: Fix default range in non-kernel functions
The range should be assumed to be the hardware maximum
if a workitem intrinsic is used in a callable function
which does not know the restricted limit of the calling
kernel.
Craig Topper [Mon, 23 Oct 2017 16:11:33 +0000 (16:11 +0000)]
[X86] Change XRSTOR to use PS instead of TB to match XSAVE.
I don't think this changes anything functionally yet, but I plan to fix the disassembler to use this to disable matching certain instructions with 0xf3/0xf2/0x66 prefixes.
Simon Pilgrim [Mon, 23 Oct 2017 15:48:08 +0000 (15:48 +0000)]
[DAGCombine] Permit combining of shuffles of equivalent splat BUILD_VECTORs
combineShuffleOfScalars is very conservative about shuffled BUILD_VECTORs that can be combined together.
This patch adds one additional case - if both BUILD_VECTORs represent splats of the same scalar value but with different UNDEF elements, then we should create a single splat BUILD_VECTOR, sharing only the UNDEF elements defined by the shuffle mask.
Fix for Bug 30718 - Failure to disassemble certain MOV with rex.R. The issue was in illegal segment register index.
Differential Revision: https://reviews.llvm.org/D38786
Sam Parker [Mon, 23 Oct 2017 08:05:14 +0000 (08:05 +0000)]
[ARM] Allow unrolling of multi-block loops.
Before, loop unrolling was only enabled for loops with a single
block. This restriction has been removed and replaced by:
- allow a maximum of two exiting blocks,
- a four basic block limit for cores with a branch predictor.
Yichao Yu [Sun, 22 Oct 2017 20:28:17 +0000 (20:28 +0000)]
Fix invalid ptrtoint in InstCombine
Summary:
It's unclear if this is the only thing we can do but at least this is consistent with the check
of address space agreement in `isBitCastable`.
The code is used at least in both instcombine and jumpthreading though
I could only find a way to trigger the invalid cast in instcombine.
Sanjay Patel [Sun, 22 Oct 2017 19:10:07 +0000 (19:10 +0000)]
[SimplifyCFG] delay switch condition forwarding to -latesimplifycfg
As discussed in D39011:
https://reviews.llvm.org/D39011
...replacing constants with a variable is inverting the transform done
by other IR passes, so we definitely don't want to do this early.
In fact, it's questionable whether this transform belongs in SimplifyCFG
at all. I'll look at moving this to codegen as a follow-up step.
Scenario #1:
vreg0 is evicted from physreg0 by vreg1
Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from)
Region splitting creates a local interval because of interference with the evictor vreg1 (normally region spliiting creates 2 interval, the "by reg" and "by stack" intervals. Local interval created when interference occurs.)
one of the split intervals ends up evicting vreg2 from physreg1
Evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills
Scenario #2
vreg0 is evicted from physreg0 by vreg1
vreg2 is evicted from physreg2 by vreg3 etc
Evictee vreg0 is intended for region splitting with split candidate physreg1
Region splitting creates a local interval because of interference with the evictor vreg1
one of the split intervals ends up evicting back original evictor vreg1 from physreg0 (the reg vreg0 was evicted from)
Another evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills
As compile time was a concern, I've added a flag to control weather we do cost calculations for local intervals we expect to be created (it's on by default for X86 target, off for the rest).
Momchil Velikov [Sun, 22 Oct 2017 11:56:35 +0000 (11:56 +0000)]
[ARM] Dynamic stack alignment for 16-bit Thumb
This patch implements dynamic stack (re-)alignment for 16-bit Thumb. When
targeting processors, which support only the 16-bit Thumb instruction set
the compiler ignores the alignment attributes of automatic variables and may
silently generate incorrect code.
Guy Blank [Sun, 22 Oct 2017 11:43:08 +0000 (11:43 +0000)]
[X86] Add a pass to convert instruction chains between domains.
The pass scans the function to find instruction chains that define
registers in the same domain (closures).
It then calculates the cost of converting the closure to another domain.
If found profitable, the instructions are converted to instructions in
the other domain and the register classes are changed accordingly.
This commit adds the pass infrastructure and a simple conversion from
the GPR domain to the Mask domain.
Craig Topper [Sun, 22 Oct 2017 06:18:26 +0000 (06:18 +0000)]
[X86] Teach the disassembler that some instructions use VEX.W==0 without a corresponding VEX.W==1 instruction and we shouldn't treat them as if VEX.W is ignored.
Craig Topper [Sat, 21 Oct 2017 20:03:20 +0000 (20:03 +0000)]
[X86] Fix disassembling of EVEX instructions to stop accidentally decoding the SIB index register as an XMM/YMM/ZMM register.
This introduces a new operand type to encode the whether the index register should be XMM/YMM/ZMM. And new code to fixup the results created by readSIB.
This has the nice effect of removing a bunch of code that hard coded the name of every GATHER and SCATTER instruction to map the index type.
Craig Topper [Sat, 21 Oct 2017 16:35:39 +0000 (16:35 +0000)]
[ValueTracking] Simplify the known bits code for constant vectors a little.
Neither of these cases really require a temporary APInt outside the loop. For the ConstantDataSequential case the APInt will never be larger than 64-bits so its fine to just call getElementAsAPInt. For ConstantVector we can get the APInt by reference and only make a copy where the inversion is needed.
The way that splitInnerLoopHeader splits blocks requires that
the induction PHI will be the first PHI in the inner loop
header. This makes sure that is actually the case when there
are both IV and reduction phis.
Craig Topper [Sat, 21 Oct 2017 03:22:13 +0000 (03:22 +0000)]
[SelectionDAG] Don't subject ConstantSDNodes to the depth limit in computeKnownBits and ComputeNumSignBits.
We don't need to do any additional recursion, we just need to analyze the APInt stored in the node. This matches what the ValueTracking versions do for IR.
Craig Topper [Sat, 21 Oct 2017 02:27:19 +0000 (02:27 +0000)]
[SelectionDAG] Don't subject ISD:Constant to the depth limit in TargetLowering::SimplifyDemandedBits.
Summary:
We shouldn't recurse any further but it doesn't mean we shouldn't be able to give the known bits for a constant. The caller would probably like that we always return the right answer for a constant RHS. This matches what InstCombine does in this case.
I don't have a test case because this showed up while trying to revive D31724.
Craig Topper [Sat, 21 Oct 2017 02:26:00 +0000 (02:26 +0000)]
[X86] Do not generate __multi3 for mul i128 on X86
Summary: __multi3 is not available on x86 (32-bit). Setting lib call name for MULI_128 to nullptr forces DAGTypeLegalizer::ExpandIntRes_MUL to generate instructions for 128-bit multiply instead of a call to an undefined function. This fixes PR20871 though it may be worth looking at why licm and indvars combine to generate 65-bit multiplies in that test.
[Hexagon] Allow redefinition with immediates for hw loop conversion
Normally, if the registers holding the induction variable's bounds
are redefined inside of the loop's body, the loop cannot be converted
to a hardware loop. However, if the redefining instruction is actually
loading an immediate value into the register, this conversion is both
possible and legal (since the immediate itself will be used in the
loop setup in the preheader).
Nikolai Bozhenov [Fri, 20 Oct 2017 10:08:47 +0000 (10:08 +0000)]
[ValueTracking] Enabling ValueTracking patch by default
(recommit #2 after checking for timeout issue).
The original patch was an improvement to IR ValueTracking on
non-negative integers. It has been checked in to trunk (D18777,
r284022). But was disabled by default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
Alex Bradbury [Thu, 19 Oct 2017 21:37:38 +0000 (21:37 +0000)]
[RISCV] Initial codegen support for ALU operations
This adds the minimum necessary to support codegen for simple ALU operations
on RV32. Prolog and epilog insertion, support for memory operations etc etc
follow in future patches.
Leave guessInstructionProperties=1 until https://reviews.llvm.org/D37065 is
reviewed and lands.