author     Salome Thirot <salome.thirot@arm.com>
           Thu, 2 Feb 2023 16:06:38 +0000 (16:06 +0000)
committer  Salome Thirot <salome.thirot@arm.com>
           Mon, 6 Feb 2023 21:04:52 +0000 (21:04 +0000)
commit     6b8e9e1f3eb5fe8420d49d0b4df146fb1e91e1cf
tree       6a6187096138a48bca0772a38a9474d0f1631229
parent     9a5cbfbc087210eabfac5b0c2d72d12852ac56ae
Optimize Neon implementation of high bitdepth SAD4D functions

Optimizations take a similar form to those implemented for Armv8.0
standard bitdepth SAD4D:

- Use ABD and UADALP instead of ABAL and ABAL2 (this doubles the
  throughput on modern out-of-order Arm-designed cores).
- Use more accumulator registers to make better use of Neon pipeline
  resources on Arm CPUs that have four Neon pipes.
- Compute the four SAD sums in parallel so that we only load the source
  block once instead of four times.
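The three optimizations listed above can be modeled in portable C. This is a scalar sketch for illustration, not the code added in highbd_sad4d_neon.c: `highbd_sad4d_model`, `abd_u16x8` and `uadalp_u32x4` are hypothetical names, each "vector" is a plain array of eight 16-bit lanes, and using exactly two accumulator sets per reference is an assumption. It shows each source vector being read once and compared against all four references, an absolute difference (ABD) followed by a pairwise widening accumulate into 32-bit lanes (UADALP), and alternating accumulators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES16 8 /* a 128-bit vector holds 8 x 16-bit samples */
#define LANES32 4 /* ... or 4 x 32-bit accumulator lanes */

/* Scalar model of a lane-wise absolute difference (as with vabdq_u16). */
static void abd_u16x8(const uint16_t *a, const uint16_t *b,
                      uint16_t out[LANES16]) {
  for (int i = 0; i < LANES16; ++i)
    out[i] = a[i] > b[i] ? (uint16_t)(a[i] - b[i]) : (uint16_t)(b[i] - a[i]);
}

/* Scalar model of UADALP (as with vpadalq_u16): add each pair of adjacent
 * 16-bit lanes into one 32-bit accumulator lane. */
static void uadalp_u32x4(uint32_t acc[LANES32], const uint16_t d[LANES16]) {
  for (int i = 0; i < LANES32; ++i)
    acc[i] += (uint32_t)d[2 * i] + (uint32_t)d[2 * i + 1];
}

/* Hypothetical helper: SADs of one width x height source block against
 * four reference blocks. width must be a multiple of LANES16. */
void highbd_sad4d_model(const uint16_t *src, int src_stride,
                        const uint16_t *const ref[4], int ref_stride,
                        int width, int height, uint32_t sad[4]) {
  /* Two accumulator sets per reference, used alternately, standing in for
   * spreading the work over more accumulator registers. */
  uint32_t acc[4][2][LANES32];
  memset(acc, 0, sizeof(acc));
  int k = 0;

  assert(width % LANES16 == 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; x += LANES16) {
      /* Read the source vector once and reuse it for all four references. */
      uint16_t s[LANES16];
      memcpy(s, src + (size_t)y * src_stride + x, sizeof(s));
      for (int r = 0; r < 4; ++r) {
        uint16_t d[LANES16];
        abd_u16x8(s, ref[r] + (size_t)y * ref_stride + x, d);
        uadalp_u32x4(acc[r][k], d);
      }
      k ^= 1; /* switch accumulator set for the next vector */
    }
  }

  /* Final reduction: fold both accumulator sets and all lanes per reference. */
  for (int r = 0; r < 4; ++r) {
    uint32_t total = 0;
    for (int j = 0; j < 2; ++j)
      for (int i = 0; i < LANES32; ++i) total += acc[r][j][i];
    sad[r] = total;
  }
}
```

Because the horizontal reduction runs only once per reference after the loops, keeping extra accumulators adds little cost while letting independent accumulate chains proceed in parallel.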

Change-Id: Ica45c44fd167e5fcc83871d8c138fc72ed3a9723
vpx_dsp/arm/highbd_sad4d_neon.c [new file with mode: 0644]
vpx_dsp/arm/highbd_sad_neon.c
vpx_dsp/vpx_dsp.mk