Marco Paniconi [Sat, 14 Jan 2023 03:46:10 +0000 (19:46 -0800)]
Fix to segfault for external resize test in vp9
Failure occurs for 1 pass non-realtime mode at speed 0.
Caused by the speed feature rd_ml_partition.var_pruning, which
doesn't check for a scaled reference in simple_motion_search().
Jonathan Wright [Thu, 5 Jan 2023 12:20:03 +0000 (12:20 +0000)]
Use lane-referencing intrinsics in Neon convolution kernels
The Neon convolution helper functions take a pointer to a filter and
load the 8 values into a single Neon register. For some reason,
filter values 3 and 4 are then duplicated into their own separate
registers.
This patch modifies these helper functions so that they access filter
values 3 and 4 via the lane-referencing versions of the various Neon
multiply instructions. This reduces register pressure and tidies up
the source code quite a bit.
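As a rough illustration of the lane-referencing form, the sketch below
multiply-accumulates four samples by filter tap 3 without first duplicating
that tap into its own register. The helper name mla_tap3_sketch and the scalar
fallback are assumptions made so the snippet also builds off Arm; this is not
the code from the patch.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: acc[i] += src[i] * filter[3] for four samples. */
void mla_tap3_sketch(int16_t acc[4], const int16_t src[4],
                     const int16_t filter[8]) {
#if defined(__ARM_NEON)
  /* All 8 filter taps live in a single Neon register. */
  const int16x8_t filters = vld1q_s16(filter);
  int16x4_t sum = vld1_s16(acc);
  /* Tap 3 is lane 3 of the low half; no separate vdup register is needed. */
  sum = vmla_lane_s16(sum, vld1_s16(src), vget_low_s16(filters), 3);
  vst1_s16(acc, sum);
#else
  /* Scalar fallback (assumption, for non-Arm builds only). */
  int i;
  for (i = 0; i < 4; ++i) acc[i] = (int16_t)(acc[i] + src[i] * filter[3]);
#endif
}
```

In the same scheme, tap 4 would be lane 0 of vget_high_s16(filters).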
Jerome Jiang [Wed, 21 Dec 2022 16:13:40 +0000 (11:13 -0500)]
Remove references to deprecated NumPy type aliases
This change replaces references to a number of deprecated NumPy type
aliases (np.bool, np.int, np.float, np.complex, np.object, np.str)
with their recommended replacements
(bool, int, float, complex, object, str).
NumPy 1.24 drops the deprecated aliases
so we must remove uses before updating NumPy.
Anton Venema [Tue, 13 Dec 2022 18:27:37 +0000 (10:27 -0800)]
Add additional ARM targets for Visual Studio.
configure: Add an armv7-win32-vs16 target
configure: Add an armv7-win32-vs17 target
configure: Add an arm64-win64-vs16 target
configure: Add an arm64-win64-vs17 target
Hirokazu Honda [Thu, 17 Nov 2022 07:05:28 +0000 (16:05 +0900)]
vp9/rate_ctrl_rtc: Improve get cyclic refresh data
A client of the vp9 rate controller needs to know whether
segmentation is enabled and the size of delta_q. It is also useful to
know the size of the map. This CL changes the interface to expose these.
Marco Paniconi [Tue, 15 Nov 2022 06:11:19 +0000 (22:11 -0800)]
vp9-svc: Fixes to make SVC work with VBR
Prior to this CL, SVC with VBR mode was broken.
The fixes here make VBR rate control work for SVC.
Rename is_one_pass_cbr_svc() --> is_one_pass_svc(),
as it can be used now for both CBR and VBR.
Johann [Mon, 14 Nov 2022 08:59:45 +0000 (17:59 +0900)]
quantize: remove vp9_regular_quantize_b_4x4
This was just a helper function which called vpx_quantize_b or
vpx_highbd_quantize_b. It also checked for skip_block, which was
necessary when webm:1439 was filed but does not appear to be
necessary now.
Removes a quantize variant and makes subsequent cleanups easier.
Refactor & optimize FHT functions further, use new butterfly functions
4x4 5% faster, 8x8 & 16x16 10% faster than previous versions.
Highbd 4x4 FHT version 2.27x faster than C version for --rt.
Sam James [Sun, 6 Nov 2022 04:11:59 +0000 (04:11 +0000)]
build: fix -Wimplicit-int (Clang 16)
Clang 16 will make -Wimplicit-int an error by default, which can, among
other things, cause some configure tests to fail silently or return the wrong result.
Fixes this error:
```
+/var/tmp/portage/media-libs/libvpx-1.12.0/temp/vpx-conf-1802-30624.c:1:15: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
```
For more information, see LWN.net [0] or LLVM's Discourse [1], gentoo-dev@ [2],
or the (new) c-std-porting mailing list [3].
[0] https://lwn.net/Articles/913505/
[1] https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-implicit-function-declaration/65213
[2] https://archives.gentoo.org/gentoo-dev/message/dd9f2d3082b8b6f8dfbccb0639e6e240
[3] hosted at lists.linux.dev.
Bug: https://bugs.gentoo.org/879705
Change-Id: Id73a98944ab3c99a368b9da7a5e902ddff9d937f
Signed-off-by: Sam James <sam@gentoo.org>
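For illustration, the kind of declaration that trips -Wimplicit-int, and its
explicit form, might look like the following. The names probe_value and
probe_result are hypothetical and are not taken from the configure script.

```c
/*
 * Pre-C99 style relying on implicit int; Clang 16 rejects this by default:
 *
 *     static probe_value = 0;
 *     probe_result(void) { return probe_value; }
 *
 * Explicit form, where every declaration names its type:
 */
static int probe_value = 0; /* hypothetical variable */

int probe_result(void) { return probe_value; }
```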
For --best quality, resulting function
vpx_highbd_fdct32x32_rd_neon takes 0.27% of cpu time in
profiling, vs 6.27% for the sum of scalar functions:
vpx_fdct32, vpx_fdct32.constprop.0, vpx_fdct32x32_rd_c for rd.
For --rt quality, the function takes 0.19% vs 4.57% for the scalar
version.
Overall, this improves highbd encoding time by ~6% for --best
and ~9% for --rt.
[NEON] Optimize and homogenize Butterfly DCT functions
Provide a set of commonly used Butterfly DCT functions for use in
DCT 4x4, 8x8, 16x16, 32x32 functions. These are provided in various
forms, using vqrdmulh_s16/vqrdmulh_s32 for _fast variants, which
unfortunately are only usable in pass1 of most DCTs, as they do not
provide the necessary precision in pass2.
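To make the precision remark more concrete, here is a scalar model of what one
lane of vqrdmulh_s16 computes: a doubled product, rounded, with only the high
16 bits kept and the result saturated. The function name is hypothetical, and
this is a reading of the instruction's arithmetic, not code from the patch.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of vqrdmulh_s16(a, b). */
int16_t sqrdmulh_s16_model(int16_t a, int16_t b) {
  /* Doubled product plus rounding constant, in 64 bits to avoid overflow. */
  const int64_t doubled = 2 * (int64_t)a * b + (1 << 15);
  /* Keep the high half; assumes arithmetic right shift of negatives. */
  const int64_t high = doubled >> 16;
  /* Only INT16_MIN * INT16_MIN exceeds the int16 range; saturate it. */
  return (int16_t)(high > INT16_MAX ? INT16_MAX : high);
}
```

Each product is rounded to 16 bits on its own, which is one plausible reason
the result can differ from a full-width multiply followed by a single rounding
step.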
This gave a performance gain ranging from 5% to 15% in the 16x16 case.
Also, for 32x32, the loads were rearranged; together with the butterfly
optimizations, this gave a 10% gain in the 32x32_rd function.
This refactoring was necessary to allow easier porting of the highbd
32x32 functions, which follows this patchset.
Marco Paniconi [Wed, 12 Oct 2022 07:10:47 +0000 (00:10 -0700)]
Fix to VP8 external RC for dynamic update of layers
On change/update of rc_cfg: when the number of temporal
layers changes, call vp8_reset_temporal_layer_change(),
which in turn calls vp8_init_temporal_layer_context()
only for the new layers.
Scott LaVarnway [Wed, 21 Sep 2022 18:37:04 +0000 (11:37 -0700)]
post_proc_sse2.c: quiet -Wuninitialized
In file included from ../libvpx/vpx_dsp/x86/post_proc_sse2.c:12:
In function ‘_mm_add_epi16’,
inlined from ‘vpx_mbpost_proc_down_sse2’ at ../libvpx/vpx_dsp/x86/post_proc_sse2.c:88:13:
/usr/lib/gcc/x86_64-linux-gnu/12/include/emmintrin.h:1060:35: warning: ‘below_context’ may be used uninitialized [-Wmaybe-uninitialized]
1060 | return (__m128i) ((__v8hu)__A + (__v8hu)__B);
| ^~~~~~~~~~~
../libvpx/vpx_dsp/x86/post_proc_sse2.c: In function ‘vpx_mbpost_proc_down_sse2’:
../libvpx/vpx_dsp/x86/post_proc_sse2.c:39:13: note: ‘below_context’ was declared here
39 | __m128i below_context;
L2E: Rework recode decisions for external max frame size and q
Allow external q and external max frame size to be handled separately.
Rely on libvpx's decision to catch overshoot/undershoot and recode frames.
Previously, when the external max frame size was set, undershoot cases
were not handled. Now we fall back to libvpx's decision to recode a
frame if overshoot/undershoot is seen.
James Zern [Thu, 8 Sep 2022 01:41:13 +0000 (18:41 -0700)]
vp8_decode: declare 2 variables volatile
fixes -Wclobbered warnings with gcc 12.1.0:
vp8/vp8_dx_iface.c|278 col 16| warning: variable 'w' might be clobbered
by 'longjmp' or 'vfork' [-Wclobbered]
vp8/vp8_dx_iface.c|278 col 19| warning: variable 'h' might be clobbered
by 'longjmp' or 'vfork' [-Wclobbered]
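A minimal sketch of the pattern behind this warning, assuming a decoder that
reports errors with longjmp: a local that is modified after setjmp and read
after the jump has an indeterminate value unless it is declared volatile. The
names below are hypothetical and do not come from vp8_dx_iface.c.

```c
#include <setjmp.h>

static jmp_buf error_env; /* hypothetical error-recovery context */

static void fail_decode(void) { longjmp(error_env, 1); }

/* Returns the width recorded before the simulated decode failure. */
int width_after_error(void) {
  /* volatile keeps w in memory, so its value survives the longjmp. */
  volatile int w = 0;
  if (setjmp(error_env)) {
    return w; /* reached via longjmp */
  }
  w = 640; /* modified between setjmp and longjmp */
  fail_decode();
  return -1; /* not reached */
}
```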
missed in 447e27588 vpx_dsp,neon: simplify __ARM_FEATURE_DOTPROD check
+ fix #if comments
only check that the macro is defined, the value doesn't have any effect.
from https://arm-software.github.io/acle/main/acle.html:
5.5.7.7. Dot Product extension
__ARM_FEATURE_DOTPROD is defined if the dot product data manipulation
instructions are supported and the vector intrinsics are available.
Note that this implies:
- __ARM_NEON == 1
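The resulting guard can be sketched as follows. HAVE_DOTPROD_SKETCH and
have_dotprod_sketch are hypothetical names used only for this illustration.

```c
/* Test only whether the macro is defined; its value is not inspected. */
#if defined(__ARM_FEATURE_DOTPROD)
#define HAVE_DOTPROD_SKETCH 1
#else
#define HAVE_DOTPROD_SKETCH 0
#endif /* defined(__ARM_FEATURE_DOTPROD) */

/* Returns 1 when the dot product intrinsics may be used, else 0. */
int have_dotprod_sketch(void) { return HAVE_DOTPROD_SKETCH; }
```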
James Zern [Fri, 2 Sep 2022 01:47:50 +0000 (18:47 -0700)]
neon,load_unaligned_*: use dup for lane 0
this produces better assembly with gcc (11.3.0-3); no change in assembly
using clang from the r24 android sdk (Android (8075178, based on
r437112b) clang version 14.0.1
(https://android.googlesource.com/toolchain/llvm-project 8671348b81b95fc603505dfc881b45103bee1731)