[X86][LLVM] Expand lowerInterleavedStore() support in X86InterleavedAccess.

This patch extends lowerInterleavedStore() to handle 32 x i8 vectors with stride 4.

LLVM currently generates suboptimal shuffle code for this case on AVX2. This patch is a targeted fix for one pattern (stride = 4, VF = 32); we plan to add more patterns in the future. To make that easier, the patch adds two mask creators. The first creates a shuffle mask equivalent to the unpacklo/unpackhi instructions. The second creates a mask equivalent to a concatenation of two half vectors (high or low).
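The two mask creators described above can be sketched in standalone C++ as follows. This is an illustration only: the names makeUnpackMask and makeConcatHalvesMask, their signatures, and the use of std::vector are assumptions made for this example and are not taken from the patch. Mask indices 0..numElts-1 select from the first shuffle operand and numElts..2*numElts-1 from the second, following the shufflevector convention.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (name and signature are assumptions, not the patch's API).
// Builds a shuffle mask that mimics unpacklo/unpackhi: within every 128-bit
// lane, elements from the lower (lo) or upper half of the two operands are
// interleaved.
std::vector<uint32_t> makeUnpackMask(unsigned numElts, unsigned eltBits, bool lo) {
  assert(eltBits != 0 && 128 % eltBits == 0 && "element must divide a 128-bit lane");
  const unsigned eltsPerLane = 128 / eltBits;
  assert(numElts % eltsPerLane == 0 && "vector must be a whole number of lanes");
  std::vector<uint32_t> mask;
  mask.reserve(numElts);
  for (unsigned i = 0; i < numElts; ++i) {
    const unsigned laneStart = (i / eltsPerLane) * eltsPerLane;
    unsigned pos = laneStart + (i % eltsPerLane) / 2; // source slot inside the lane
    if (!lo)
      pos += eltsPerLane / 2; // unpackhi reads the upper half of each lane
    if (i % 2 == 1)
      pos += numElts;         // odd result slots come from the second operand
    mask.push_back(pos);
  }
  return mask;
}

// Hypothetical helper (assumption): builds a mask that concatenates the low
// (or high) half of the first operand with the same half of the second.
std::vector<uint32_t> makeConcatHalvesMask(unsigned numElts, bool low) {
  assert(numElts % 2 == 0 && "vector must split into two halves");
  const unsigned half = numElts / 2;
  const unsigned offset = low ? 0 : half;
  std::vector<uint32_t> mask;
  mask.reserve(numElts);
  for (unsigned i = 0; i < half; ++i)
    mask.push_back(offset + i);           // chosen half of the first operand
  for (unsigned i = 0; i < half; ++i)
    mask.push_back(numElts + offset + i); // same half of the second operand
  return mask;
}
```

For a 32 x i8 vector, makeUnpackMask(32, 8, true) starts 0, 32, 1, 33 in the first lane and 16, 48, 17, 49 in the second, while makeConcatHalvesMask(32, true) yields 0..15 followed by 32..47.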

The goal of the patch is to optimize the following sequence.
At the end of the computation, ymm2, ymm0, ymm12 and ymm3 each hold
32 chars:

c0, c1, ..., c31
m0, m1, ..., m31
y0, y1, ..., y31
k0, k1, ..., k31

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
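
As a worked illustration of the layout above, the following standalone C++ model shows one plausible way to reach it with byte unpacks, word unpacks and half-vector concatenation on 256-bit registers. It is a scalar simulation under stated assumptions: the type Vec and the functions unpack, concatHalves and interleave4 are hypothetical names introduced for this example, and the sequence is not necessarily the exact one the patch emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec = std::array<uint8_t, 32>; // models one 256-bit ymm register of bytes

// Models unpacklo/unpackhi: inside each 128-bit lane, interleave
// eltBytes-wide elements taken from the lower (lo) or upper half of a and b.
Vec unpack(const Vec &a, const Vec &b, std::size_t eltBytes, bool lo) {
  assert(eltBytes == 1 || eltBytes == 2 || eltBytes == 4 || eltBytes == 8);
  Vec out{};
  const std::size_t laneBytes = 16;
  for (std::size_t lane = 0; lane < out.size(); lane += laneBytes) {
    const std::size_t src = lane + (lo ? 0 : laneBytes / 2);
    std::size_t dst = lane;
    for (std::size_t e = 0; e < laneBytes / 2; e += eltBytes) {
      for (std::size_t k = 0; k < eltBytes; ++k) out[dst++] = a[src + e + k];
      for (std::size_t k = 0; k < eltBytes; ++k) out[dst++] = b[src + e + k];
    }
  }
  return out;
}

// Models a concatenation of the low (or high) 128-bit halves of a and b.
Vec concatHalves(const Vec &a, const Vec &b, bool low) {
  Vec out{};
  const std::size_t off = low ? 0 : 16;
  for (std::size_t i = 0; i < 16; ++i) {
    out[i] = a[off + i];
    out[16 + i] = b[off + i];
  }
  return out;
}

// Interleaves four 32-byte vectors into c0 m0 y0 k0 c1 m1 y1 k1 ...
std::array<uint8_t, 128> interleave4(const Vec &c, const Vec &m,
                                     const Vec &y, const Vec &k) {
  // Step 1: byte unpacks pair up c with m and y with k.
  const Vec loCM = unpack(c, m, 1, true), hiCM = unpack(c, m, 1, false);
  const Vec loYK = unpack(y, k, 1, true), hiYK = unpack(y, k, 1, false);
  // Step 2: 16-bit unpacks merge the pairs into c m y k groups.
  const Vec t0 = unpack(loCM, loYK, 2, true);  // groups 0..3   | 16..19
  const Vec t1 = unpack(loCM, loYK, 2, false); // groups 4..7   | 20..23
  const Vec t2 = unpack(hiCM, hiYK, 2, true);  // groups 8..11  | 24..27
  const Vec t3 = unpack(hiCM, hiYK, 2, false); // groups 12..15 | 28..31
  // Step 3: concatenating halves puts the groups in memory order.
  const Vec rows[4] = {concatHalves(t0, t1, true), concatHalves(t2, t3, true),
                       concatHalves(t0, t1, false), concatHalves(t2, t3, false)};
  std::array<uint8_t, 128> out{};
  for (std::size_t r = 0; r < 4; ++r)
    for (std::size_t i = 0; i < 32; ++i)
      out[r * 32 + i] = rows[r][i];
  return out;
}
```

The per-lane behaviour of the unpack steps is why the final half-vector concatenation is needed: after step 2 each register holds groups from two non-adjacent ranges, one per 128-bit lane.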

Reviewers:
dorit
Farhana
RKSimon
guyblank
DavidKreitzer

Differential Revision: https://reviews.llvm.org/D34601

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309086 91177308-0d34-0410-b5e6-96231b3b80d8

author	Michael Zuckerman <Michael.zuckerman@intel.com>
	Wed, 26 Jul 2017 08:10:14 +0000 (08:10 +0000)
committer	Michael Zuckerman <Michael.zuckerman@intel.com>
	Wed, 26 Jul 2017 08:10:14 +0000 (08:10 +0000)
commit	9d7507a837610cb6458acfed617a15487b64bbff
tree	a26e5e59ac09b7aa68da89d80a5f3c0007a44757
parent	8849d30d5f8d21d872fb78e35ae5b2b302113ab0

lib/Target/X86/X86InterleavedAccess.cpp
test/CodeGen/X86/x86-interleaved-access.ll
test/Transforms/InterleavedAccess/X86/interleavedStore.ll