granicus.if.org Git - libvpx/commit
Merge transpose and permute in Neon SDOT vertical convolution
author Jonathan Wright <jonathan.wright@arm.com>
Sat, 22 May 2021 21:07:25 +0000 (22:07 +0100)
committer James Zern <jzern@google.com>
Tue, 25 May 2021 00:08:32 +0000 (17:08 -0700)
commit 10823f54681747b9f64deb3002531c95cc67d17f
tree 94b82a45a96d29498e9cf7fdd06250f576af68b4
parent 66c1ff6850fd53bcf5c17247569bea1d700d6247

The original dot-product implementation of vpx_convolve8_vert_neon
used a separate transpose before and after the convolution operation.
This patch merges the first transpose with the TBL permute (necessary
before using SDOT to compute the convolution) to significantly reduce
the amount of data re-arrangement. This new approach also allows for
more effective data re-use between loop iterations.
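
The layout idea described above can be illustrated with a portable scalar
model. This is only a sketch and not the code in vpx_convolve8_neon.c: the
helper names (sdot4, pack_column, slide_column, vert_dot) are hypothetical,
a plain loop stands in for one 32-bit SDOT lane, and a byte shift stands in
for the TBL permute. It assumes an 8-tap filter whose taps fit in int8_t and
pixels that are shifted into signed range before the dot product, with the
shift compensated by a correction term.

```c
#include <stdint.h>
#include <string.h>

#define TAPS 8

/* Scalar stand-in for one 32-bit SDOT lane: acc += dot(a[0..3], b[0..3]). */
static int32_t sdot4(int32_t acc, const int8_t a[4], const int8_t b[4]) {
  for (int i = 0; i < 4; ++i) acc += (int32_t)a[i] * b[i];
  return acc;
}

/* Reference: plain 8-tap vertical filter for one output pixel. */
static int32_t vert_ref(const uint8_t *src, int stride, int x, int y,
                        const int8_t filter[TAPS]) {
  int32_t sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += src[(y + k) * stride + x] * filter[k];
  return sum;
}

/* Gather rows r..r+3 of column x into one 4-byte group. This is the
 * transposed layout a dot product needs: vertically adjacent samples sit in
 * adjacent bytes. Samples are shifted to signed range (assumption). */
static void pack_column(const uint8_t *src, int stride, int x, int r,
                        int8_t out[4]) {
  for (int i = 0; i < 4; ++i) {
    out[i] = (int8_t)(src[(r + i) * stride + x] - 128);
  }
}

/* Advance a packed group by one row: drop the oldest sample and append the
 * newest. Models a byte permute that reuses data from the last iteration. */
static void slide_column(int8_t group[4], uint8_t new_sample) {
  memmove(group, group + 1, 3);
  group[3] = (int8_t)(new_sample - 128);
}

/* Filter h output rows of column x. Reads source rows 0..h+6. */
static void vert_dot(const uint8_t *src, int stride, int x, int h,
                     const int8_t filter[TAPS], int32_t *dst) {
  int8_t lo[4], hi[4];
  int32_t correction = 0;
  /* Undo the -128 shift: sum((s - 128) * f) + 128 * sum(f) == sum(s * f). */
  for (int k = 0; k < TAPS; ++k) correction += 128 * filter[k];

  pack_column(src, stride, x, 0, lo); /* rows 0..3 pair with taps 0..3 */
  pack_column(src, stride, x, 4, hi); /* rows 4..7 pair with taps 4..7 */

  for (int y = 0; y < h; ++y) {
    int32_t acc = sdot4(correction, lo, filter);
    dst[y] = sdot4(acc, hi, filter + 4);
    if (y + 1 < h) {
      /* Only one new sample per group is fetched for the next row. */
      slide_column(lo, src[(y + 4) * stride + x]);
      slide_column(hi, src[(y + 8) * stride + x]);
    }
  }
}
```

In this model the initial pack_column calls are the only full
re-arrangement; each later output row reuses three of the four bytes in
every group, which is the kind of re-use between loop iterations the
message refers to. Calling vert_dot on any column should give the same
sums as vert_ref for the same rows.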

Co-authored by: James Greenhalgh <james.greenhalgh@arm.com>

Bug: b/181236880
Change-Id: I87fe4dadd312c3ad6216943b71a5410ddf4a1b5b

vpx_dsp/arm/vpx_convolve8_neon.c
vpx_dsp/arm/vpx_convolve8_neon.h