granicus.if.org Git - libvpx/commit

author	Salome Thirot <salome.thirot@arm.com>
	Thu, 2 Feb 2023 16:06:38 +0000 (16:06 +0000)
committer	Salome Thirot <salome.thirot@arm.com>
	Mon, 6 Feb 2023 21:04:52 +0000 (21:04 +0000)
commit	6b8e9e1f3eb5fe8420d49d0b4df146fb1e91e1cf
tree	6a6187096138a48bca0772a38a9474d0f1631229
parent	9a5cbfbc087210eabfac5b0c2d72d12852ac56ae

Optimize Neon implementation of high bitdepth SAD4D functions

Optimizations take a similar form to those implemented for Armv8.0
standard bitdepth SAD4D:

- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
  modern out-of-order Arm-designed cores).
- Use more accumulator registers to make better use of Neon pipeline
  resources on Arm CPUs that have four Neon pipes.
- Compute the four SAD sums in parallel so that we only load the source
  block once instead of four times.
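
The first and third points above can be sketched as follows. This is an
illustrative sketch only, not the code added by this commit: the function
name sad8xh_x4_sketch, its signature, and the fixed block width of 8 are
assumptions made for the example. On AArch64 it uses the vabdq_u16 (UABD)
and vpadalq_u16 (UADALP) intrinsics and loads each source row once for all
four references; elsewhere it falls back to scalar code with the same loop
structure.

```c
#include <stdint.h>
#include <stdlib.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not the libvpx function): SAD of one 8-wide block of
 * 16-bit samples against four reference blocks. Strides are in samples. */
static void sad8xh_x4_sketch(const uint16_t *src, int src_stride,
                             const uint16_t *ref[4], int ref_stride, int h,
                             uint32_t res[4]) {
#if defined(__aarch64__) && defined(__ARM_NEON)
  /* One 32-bit accumulator vector per reference block. */
  uint32x4_t sum[4] = { vdupq_n_u32(0), vdupq_n_u32(0), vdupq_n_u32(0),
                        vdupq_n_u32(0) };
  for (int i = 0; i < h; i++) {
    /* The source row is loaded once and reused for all four references. */
    const uint16x8_t s = vld1q_u16(src + i * src_stride);
    for (int j = 0; j < 4; j++) {
      const uint16x8_t r = vld1q_u16(ref[j] + i * ref_stride);
      /* UABD: per-lane absolute difference.
       * UADALP: add adjacent pairs and accumulate into 32-bit lanes. */
      sum[j] = vpadalq_u16(sum[j], vabdq_u16(s, r));
    }
  }
  /* Horizontal reduction of each accumulator to a single sum. */
  for (int j = 0; j < 4; j++) res[j] = vaddvq_u32(sum[j]);
#else
  /* Scalar fallback: each source sample is read once per four references. */
  uint32_t sum[4] = { 0, 0, 0, 0 };
  for (int i = 0; i < h; i++) {
    for (int x = 0; x < 8; x++) {
      const int s = src[i * src_stride + x];
      for (int j = 0; j < 4; j++) {
        sum[j] += (uint32_t)abs(s - (int)ref[j][i * ref_stride + x]);
      }
    }
  }
  for (int j = 0; j < 4; j++) res[j] = sum[j];
#endif
}
```

The sketch keeps one accumulator per reference block. Splitting each of
those into several independent accumulators, as the second point describes,
is left out for brevity.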

Change-Id: Ica45c44fd167e5fcc83871d8c138fc72ed3a9723

vpx_dsp/arm/highbd_sad4d_neon.c	[new file with mode: 0644]
vpx_dsp/arm/highbd_sad_neon.c
vpx_dsp/vpx_dsp.mk