Hirokazu Honda [Thu, 17 Nov 2022 07:05:28 +0000 (16:05 +0900)]
vp9/rate_ctrl_rtc: Improve get cyclic refresh data
A client of the vp9 rate controller needs to know whether the
segmentation is enabled and the size of delta_q. It is also useful to
know the size of the map. This CL changes the interface to provide these.
Marco Paniconi [Tue, 15 Nov 2022 06:11:19 +0000 (22:11 -0800)]
vp9-svc: Fixes to make SVC work with VBR
Prior to this CL SVC with VBR mode was broken.
Fixes made here to make VBR rate control work for SVC.
Rename is_one_pass_cbr_svc() --> is_one_pass_svc(),
as it can be used now for both CBR and VBR.
Johann [Mon, 14 Nov 2022 08:59:45 +0000 (17:59 +0900)]
quantize: remove vp9_regular_quantize_b_4x4
This was just a helper function which called vpx_quantize_b or
vpx_highbd_quantize_b. It also checked for skip_block, which was
necessary when webm:1439 was filed but does not appear to be
necessary now.
Removes a quantize variant and makes subsequent cleanups easier.
Refactor & optimize FHT functions further, use new butterfly functions
4x4 5% faster, 8x8 & 16x16 10% faster than previous versions.
Highbd 4x4 FHT version 2.27x faster than C version for --rt.
Sam James [Sun, 6 Nov 2022 04:11:59 +0000 (04:11 +0000)]
build: fix -Wimplicit-int (Clang 16)
Clang 16 will make -Wimplicit-int an error by default, which can, among
other things, cause some configure tests to silently fail or return the wrong result.
Fixes this error:
```
+/var/tmp/portage/media-libs/libvpx-1.12.0/temp/vpx-conf-1802-30624.c:1:15: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
```
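As an illustration only (not necessarily the exact probe that failed), a configure test of the form `static inline function() {}` relies on implicit int for its return type. Naming the type explicitly keeps the probe valid C99. The `int` return type and `return 0` body below are assumptions made for the sketch.

```c
#include <assert.h>

/* Illustrative only. With implicit int (invalid in C99 and later, and an
 * error by default in Clang 16), a probe might read:
 *
 *   static inline function() {}
 *
 * Spelling out the return type keeps the probe valid C: */
static inline int function(void) { return 0; }
```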
For more information, see LWN.net [0] or LLVM's Discourse [1], gentoo-dev@ [2],
or the (new) c-std-porting mailing list [3].
[0] https://lwn.net/Articles/913505/
[1] https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-implicit-function-declaration/65213
[2] https://archives.gentoo.org/gentoo-dev/message/dd9f2d3082b8b6f8dfbccb0639e6e240
[3] hosted at lists.linux.dev.
Bug: https://bugs.gentoo.org/879705
Change-Id: Id73a98944ab3c99a368b9da7a5e902ddff9d937f
Signed-off-by: Sam James <sam@gentoo.org>
For --best quality, the resulting function
vpx_highbd_fdct32x32_rd_neon takes 0.27% of CPU time in
profiling, vs 6.27% for the sum of the scalar functions
vpx_fdct32, vpx_fdct32.constprop.0 and vpx_fdct32x32_rd_c for rd.
For --rt quality, the function takes 0.19% vs 4.57% for the scalar
version.
Overall, this reduces highbd encoding time by ~6% for --best
and ~9% for --rt.
[NEON] Optimize and homogenize Butterfly DCT functions
Provide a set of commonly used Butterfly DCT functions for use in
DCT 4x4, 8x8, 16x16, 32x32 functions. These are provided in various
forms, using vqrdmulh_s16/vqrdmulh_s32 for _fast variants, which
unfortunately are only usable in pass1 of most DCTs, as they do not
provide the necessary precision in pass2.
This gave a performance gain ranging from 5% to 15% in the 16x16 case.
For 32x32, the loads were also rearranged; together with the butterfly
optimizations, this gave a 10% gain in the 32x32_rd function.
This refactoring was necessary to allow easier porting of the highbd
32x32 functions, which follow this patchset.
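The scalar sketch below is not libvpx's NEON code. It models vqrdmulh_s16 in plain C to show one way a fast Q15 path can lose precision: rounding each product separately instead of rounding the accumulated sum once. The function names and the choice of a two-coefficient butterfly are assumptions for illustration; the text does not say that this is the specific precision issue that limits the _fast variants to pass1.

```c
#include <assert.h>
#include <stdint.h>

/* 14-bit fixed-point cosine constants, as used by libvpx's C DCTs. */
#define DCT_CONST_BITS 14
static const int32_t cospi_8_64 = 15137; /* round(16384 * cos(8 * pi / 64)) */
static const int32_t cospi_24_64 = 6270; /* round(16384 * cos(24 * pi / 64)) */

/* Precise butterfly output: accumulate both products in 32 bits, round once.
 * Relies on arithmetic right shift for negative values, as libvpx's C does. */
static int32_t butterfly_two_coeff_precise(int16_t a, int16_t b, int32_t c0,
                                           int32_t c1) {
  const int32_t sum = a * c0 + b * c1;
  return (sum + (1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS;
}

/* Scalar model of vqrdmulh_s16: saturate((2 * x * y + (1 << 15)) >> 16). */
static int16_t qrdmulh_s16(int16_t x, int16_t y) {
  const int64_t p = 2 * (int64_t)x * y + (1 << 15);
  int64_t r = p >> 16;
  if (r > INT16_MAX) r = INT16_MAX;
  if (r < INT16_MIN) r = INT16_MIN;
  return (int16_t)r;
}

/* Hypothetical fast variant: constants doubled to Q15 so that qrdmulh
 * matches a 14-bit rounding shift per product, but each product is
 * rounded on its own before the two are summed. */
static int16_t butterfly_two_coeff_fast(int16_t a, int16_t b, int16_t c0,
                                        int16_t c1) {
  return (int16_t)(qrdmulh_s16(a, (int16_t)(2 * c0)) +
                   qrdmulh_s16(b, (int16_t)(2 * c1)));
}
```

With a = 7, b = 1 and the cospi_8_64/cospi_24_64 pair, the exact value is about 6.85: the precise form returns 7, while the separately rounded form returns 6.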
Marco Paniconi [Wed, 12 Oct 2022 07:10:47 +0000 (00:10 -0700)]
Fix to VP8 external RC for dynamic update of layers
On change/update of rc_cfg: when the number of temporal
layers changes, call vp8_reset_temporal_layer_change(),
which in turn calls vp8_init_temporal_layer_context()
only for the new layers.
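A minimal sketch of the "initialize only the new layers" behavior, using hypothetical stand-ins (LayerContext, init_layer_context, reset_temporal_layer_change, MAX_TEMPORAL_LAYERS) rather than libvpx's actual structures. It assumes layers that already existed keep their state, and it only models the case where the layer count grows.

```c
#include <assert.h>
#include <string.h>

#define MAX_TEMPORAL_LAYERS 5 /* hypothetical bound for this sketch */

/* Hypothetical stand-in for the per-layer rate-control state. */
typedef struct {
  int initialized;
  double buffer_level;
} LayerContext;

/* Hypothetical stand-in for vp8_init_temporal_layer_context(). */
static void init_layer_context(LayerContext *lc) {
  lc->initialized = 1;
  lc->buffer_level = 0.0;
}

/* Initialize only the layers that did not exist before the change; layers
 * [0, old_layers) are left untouched. No-op if the count does not grow. */
static void reset_temporal_layer_change(LayerContext *layers, int old_layers,
                                        int new_layers) {
  for (int i = old_layers; i < new_layers; ++i) init_layer_context(&layers[i]);
}
```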
Scott LaVarnway [Wed, 21 Sep 2022 18:37:04 +0000 (11:37 -0700)]
post_proc_sse2.c: quiet -Wuninitialized
```
In file included from ../libvpx/vpx_dsp/x86/post_proc_sse2.c:12:
In function ‘_mm_add_epi16’,
inlined from ‘vpx_mbpost_proc_down_sse2’ at ../libvpx/vpx_dsp/x86/post_proc_sse2.c:88:13:
/usr/lib/gcc/x86_64-linux-gnu/12/include/emmintrin.h:1060:35: warning: ‘below_context’ may be used uninitialized [-Wmaybe-uninitialized]
1060 | return (__m128i) ((__v8hu)__A + (__v8hu)__B);
| ^~~~~~~~~~~
../libvpx/vpx_dsp/x86/post_proc_sse2.c: In function ‘vpx_mbpost_proc_down_sse2’:
../libvpx/vpx_dsp/x86/post_proc_sse2.c:39:13: note: ‘below_context’ was declared here
39 | __m128i below_context;
```
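GCC cannot always prove that a loop-carried value is written before it is read. A common remedy is to give the variable an initial value at its declaration; the text does not say exactly how this commit quieted the warning. The scalar function below is a hypothetical analog of the SSE2 code (the name sum_with_context is invented), and whether a particular GCC version actually warns on it depends on optimization level and inlining.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar analog of a loop-carried "context" value. The value
 * is only read after an earlier iteration has written it, but a compiler
 * may not prove that. The explicit initial value is never used, so it does
 * not change the result; it only makes the variable defined on every path. */
static int32_t sum_with_context(const int16_t *src, int n) {
  int32_t below_context = 0;
  int32_t acc = 0;
  for (int i = 0; i < n; ++i) {
    if (i > 0) acc += below_context; /* value stored on iteration i - 1 */
    below_context = src[i];
  }
  return acc;
}
```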
L2E: Rework recode decisions for external max frame size and q
Allow external q and external max frame size to be handled separately.
Rely on libvpx's decision to catch overshoot/undershoot and recode frames.
Previously, when an external max frame size was set, we didn't handle
undershoot cases; now we fall back to libvpx's decision to
recode a frame if overshoot/undershoot is seen.
James Zern [Thu, 8 Sep 2022 01:41:13 +0000 (18:41 -0700)]
vp8_decode: declare 2 variables volatile
fixes -Wclobbered warnings with gcc 12.1.0:
```
vp8/vp8_dx_iface.c|278 col 16| warning: variable 'w' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
vp8/vp8_dx_iface.c|278 col 19| warning: variable 'h' might be clobbered by 'longjmp' or 'vfork' [-Wclobbered]
```
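In C, an automatic local that is modified between setjmp() and longjmp() and is not volatile has an indeterminate value after the jump (C11 7.13.2.1); this is what -Wclobbered warns about. The sketch below uses hypothetical names (decode_jmp, abort_decode, decode_size) and is not the vp8_decode code itself; it only shows why declaring w and h volatile makes the error path well defined.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf decode_jmp; /* hypothetical error-recovery point */

/* Hypothetical error path, analogous to a decoder bailing out via longjmp. */
static void abort_decode(void) { longjmp(decode_jmp, 1); }

static int decode_size(void) {
  /* volatile: values stored after setjmp() remain valid after longjmp(). */
  volatile int w = 0, h = 0;
  if (setjmp(decode_jmp)) {
    return w * h; /* error path sees the last values stored */
  }
  w = 320;
  h = 240;
  abort_decode();
  return -1; /* not reached */
}
```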
missed in 447e27588 vpx_dsp,neon: simplify __ARM_FEATURE_DOTPROD check
+ fix #if comments
only check that the macro is defined, the value doesn't have any effect.
from https://arm-software.github.io/acle/main/acle.html:
5.5.7.7. Dot Product extension
__ARM_FEATURE_DOTPROD is defined if the dot product data manipulation
instructions are supported and the vector intrinsics are available.
Note that this implies:
- __ARM_NEON == 1
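A minimal sketch of the pattern described: test only whether __ARM_FEATURE_DOTPROD is defined (a check such as `#if __ARM_FEATURE_DOTPROD == 1` would also depend on the value, which the ACLE does not give any meaning), and make the #endif comment name the same condition as its #if. HAVE_DOTPROD_PATH and dotprod_path_enabled are hypothetical names for this sketch.

```c
#include <assert.h>

#if defined(__ARM_FEATURE_DOTPROD)
#define HAVE_DOTPROD_PATH 1 /* hypothetical name for this sketch */
#else
#define HAVE_DOTPROD_PATH 0
#endif  /* defined(__ARM_FEATURE_DOTPROD) */

/* Reports which code path this build would select. */
static int dotprod_path_enabled(void) { return HAVE_DOTPROD_PATH; }
```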
James Zern [Fri, 2 Sep 2022 01:47:50 +0000 (18:47 -0700)]
neon,load_unaligned_*: use dup for lane 0
this produces better assembly with gcc (11.3.0-3); no change in assembly
using clang from the r24 android sdk (Android (8075178, based on
r437112b) clang version 14.0.1
(https://android.googlesource.com/toolchain/llvm-project 8671348b81b95fc603505dfc881b45103bee1731)).
Also update the comment for CLASS_GRAPH by running "doxygen -u", because
the original comment for CLASS_GRAPH mentions the obsolete tag
'CLASS_DIAGRAMS'.
James Zern [Wed, 24 Aug 2022 22:48:24 +0000 (15:48 -0700)]
vp8_ratectrl_rtc_test.cc: ensure frame_type is initialized
this fixes a valgrind failure:
```
==1095597== Conditional jump or move depends on uninitialised value(s)
==1095597== at 0x12E0CC: (anonymous namespace)::Vp8RcInterfaceTest::PreEncodeFrameHook(libvpx_test::VideoSource*, libvpx_test::Encoder*) (vp8_ratectrl_rtc_test.cc:131)
==1095597== by 0x1255A9: libvpx_test::EncoderTest::RunLoop(libvpx_test::VideoSource*) (encode_test_driver.cc:205)
```
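Valgrind reports a conditional jump on an uninitialised value when code branches on a field that no path has written. A minimal sketch of the fix pattern, using hypothetical types and names (FrameTypeSketch, FrameParamsSketch, make_frame_params) rather than the test's real ones: give frame_type a value at declaration so every later branch reads a defined value.

```c
#include <assert.h>

/* Hypothetical frame-type enum and per-frame parameters for this sketch. */
typedef enum { FRAME_TYPE_KEY = 0, FRAME_TYPE_INTER = 1 } FrameTypeSketch;

typedef struct {
  FrameTypeSketch frame_type;
  int temporal_layer_id;
} FrameParamsSketch;

/* frame_type gets a default at declaration, so it is defined even on paths
 * that never assign it explicitly. */
static FrameParamsSketch make_frame_params(int is_key) {
  FrameParamsSketch p = {FRAME_TYPE_INTER, 0};
  if (is_key) p.frame_type = FRAME_TYPE_KEY;
  return p;
}
```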