granicus.if.org Git - libvpx/log
libvpx
2 years agoREADME: update release version to 1.13.0 ugly-duckling
James Zern [Sat, 11 Feb 2023 03:04:41 +0000 (19:04 -0800)]
README: update release version to 1.13.0

this was missed in the v1.13.0 tag

Bug: webm:1780
Change-Id: I3044534123bf67861174970e6241f6586055358e
(cherry picked from commit 184a886917529e8a9d23ab564b05b0cc13e29f2b)

2 years agoFix unsigned integer overflow in sse computation v1.13.0
Cheng Chen [Mon, 30 Jan 2023 19:51:58 +0000 (11:51 -0800)]
Fix unsigned integer overflow in sse computation

Basically port the fix from libaom:
https://aomedia-review.googlesource.com/c/aom/+/169361

Change-Id: Id06a5db91372037832399200ded75d514e096726
(cherry picked from commit a94cdd57ffd95ee7beb48d2794dae538f25da46c)
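
The class of overflow named in the title can be sketched as below. This
is an illustration only, not the libvpx or libaom code: the function
names sse_u32 and sse_u64 are hypothetical, and 12-bit samples in a
64x64 block are assumed as the worst case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: accumulates squared differences in 32 bits.
 * Unsigned wraparound is well defined in C, but the returned sum is
 * wrong once the true total exceeds UINT32_MAX. */
uint32_t sse_u32(const uint16_t *a, const uint16_t *b, size_t n) {
  uint32_t sum = 0;
  size_t i;
  for (i = 0; i < n; ++i) {
    const int32_t d = (int32_t)a[i] - (int32_t)b[i];
    /* Multiply as unsigned so the product itself cannot overflow a
     * signed int; |d| <= 65535, so d*d fits in 32 bits. */
    sum += (uint32_t)d * (uint32_t)d;
  }
  return sum;
}

/* Hypothetical helper: the same loop with a 64-bit accumulator. */
uint64_t sse_u64(const uint16_t *a, const uint16_t *b, size_t n) {
  uint64_t sum = 0;
  size_t i;
  for (i = 0; i < n; ++i) {
    const int64_t d = (int64_t)a[i] - (int64_t)b[i];
    sum += (uint64_t)(d * d);
  }
  return sum;
}
```

With 4096 samples that each differ by 4095, the true sum is
4096 * 4095 * 4095, which is larger than 2^32, so only the 64-bit
accumulator returns it.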

2 years agoUpdate AUTHORS .mailmap and version v1.13.0-rc1
Jerome Jiang [Wed, 1 Feb 2023 16:38:42 +0000 (11:38 -0500)]
Update AUTHORS .mailmap and version

Bug: webm:1780
Change-Id: I75a24bdd076dc1746b23bababfaafccbce3b4214

2 years agoFix per frame qp for temporal layers
Jerome Jiang [Thu, 26 Jan 2023 00:25:12 +0000 (19:25 -0500)]
Fix per frame qp for temporal layers

Also add tests with fixed temporal layering mode.

Change-Id: If516fe94e3fb7f5a745821d1788bfe6cf90edaac
(cherry picked from commit db69ce6aea278bee88668fd9cc2af2e544516fdb)

2 years agoUpdate CHANGELOG
Jerome Jiang [Tue, 31 Jan 2023 17:16:38 +0000 (12:16 -0500)]
Update CHANGELOG

Bug: webm:1780
Change-Id: I3ab4729bff1d27ef7127ef26e780a469e9278c21

2 years agoSkip calculating internal stats when frame dropped
Jerome Jiang [Tue, 24 Jan 2023 19:08:17 +0000 (14:08 -0500)]
Skip calculating internal stats when frame dropped

Bug: webm:1771
Change-Id: I30cd5b7ec0945b521a1cc03999d39ec6a25f1696

2 years agoSpecialize Neon averaging subpel variance by filter value
Salome Thirot [Fri, 20 Jan 2023 11:42:06 +0000 (11:42 +0000)]
Specialize Neon averaging subpel variance by filter value

Use the same specialization for averaging subpel variance functions
as used for the non-averaging variants. The rationale for the
specialization is as follows:

The optimal implementation of the bilinear interpolation depends on
the filter values being used. For both horizontal and vertical
interpolation this can simplify to just taking the source values, or
averaging the source and reference values - which can be computed
more easily than a bilinear interpolation with arbitrary filter
values.

This patch introduces tests to find the most optimal bilinear
interpolation implementation based on the filter values being used.
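This new specialization is only used for larger block sizes
(>= 16x16) as we need to be doing enough work to make the cost of
finding the optimal implementation worth it.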

This is a backport of this libaom change[1].

After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].

[1] https://aomedia-review.googlesource.com/c/aom/+/166962

Change-Id: I7860c852db94a7c9c3d72ae4411316685f3800a4

2 years agoRefactor Neon averaging subpel variance functions
Salome Thirot [Fri, 20 Jan 2023 11:21:02 +0000 (11:21 +0000)]
Refactor Neon averaging subpel variance functions

Merge the computation of vpx_comp_avg_pred into the second pass of the
bilinear filter - avoiding the overhead of loading and storing the
entire block again.

This is a backport of this libaom change[1].

[1] https://aomedia-review.googlesource.com/c/aom/+/166961

Change-Id: I9327ff7382a46d50c42a5213a11379b957146372
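
The idea of folding the averaging into the second filter pass can be
sketched in scalar C. This is not the Neon code from the patch: the
function names, the 7-bit filter precision and the contiguous w*h
output layout are assumptions made for the illustration.

```c
#include <stdint.h>

/* Two-tap blend with taps summing to 128 and 7-bit rounding (assumed). */
static uint8_t blend2(uint8_t a, uint8_t b, int f0, int f1) {
  return (uint8_t)((a * f0 + b * f1 + 64) >> 7);
}

/* Unfused: the vertical filter writes tmp, then a second loop reloads
 * tmp and averages it with pred. */
void vert_then_avg(const uint8_t *src, int stride, int w, int h, int f0,
                   int f1, const uint8_t *pred, uint8_t *tmp, uint8_t *dst) {
  int r, c;
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      tmp[r * w + c] =
          blend2(src[r * stride + c], src[(r + 1) * stride + c], f0, f1);
    }
  }
  for (r = 0; r < h * w; ++r) {
    dst[r] = (uint8_t)((tmp[r] + pred[r] + 1) >> 1);
  }
}

/* Fused: the rounded average is applied while the filtered value is
 * still live, so the block is not stored and loaded a second time. */
void vert_avg_fused(const uint8_t *src, int stride, int w, int h, int f0,
                    int f1, const uint8_t *pred, uint8_t *dst) {
  int r, c;
  for (r = 0; r < h; ++r) {
    for (c = 0; c < w; ++c) {
      const uint8_t v =
          blend2(src[r * stride + c], src[(r + 1) * stride + c], f0, f1);
      dst[r * w + c] = (uint8_t)((v + pred[r * w + c] + 1) >> 1);
    }
  }
}
```

Both functions produce the same output; the fused form only removes
the intermediate store and reload of the block.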

2 years agoSpecialize Neon subpel variance by filter value for large blocks
Salome Thirot [Fri, 20 Jan 2023 10:35:34 +0000 (10:35 +0000)]
Specialize Neon subpel variance by filter value for large blocks

The optimal implementation of the bilinear interpolation depends on
the filter values being used. For both horizontal and vertical
interpolation this can simplify to just taking the source values, or
averaging the source and reference values - which can be computed
more easily than a bilinear interpolation with arbitrary filter
values.

This patch introduces tests to find the most optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes
(>= 16x16) as we need to be doing enough work to make the cost of
finding the optimal implementation worth it.

This is a backport of this libaom change[1].

After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].

[1] https://aomedia-review.googlesource.com/c/aom/+/162463

Change-Id: Ia818e148f6fd126656e8411d59c184b55dd43094
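
The specialization described above can be sketched in scalar C for one
row of a horizontal pass. This is not the Neon code: the function
names are hypothetical, and the sketch assumes eighth-pel offsets 0..7
with taps (128 - 16 * offset, 16 * offset) and 7-bit rounding.

```c
#include <stdint.h>

/* General path: two-tap bilinear blend for any offset in 0..7.
 * Reads w + 1 input samples. */
void bilinear_row_generic(const uint8_t *src, uint8_t *dst, int w,
                          int offset) {
  const int f1 = 16 * offset;
  const int f0 = 128 - f1;
  int i;
  for (i = 0; i < w; ++i) {
    dst[i] = (uint8_t)((src[i] * f0 + src[i + 1] * f1 + 64) >> 7);
  }
}

/* Specialized path: test the offset once per row, then run the
 * cheapest loop that gives the same result. */
void bilinear_row(const uint8_t *src, uint8_t *dst, int w, int offset) {
  int i;
  if (offset == 0) {
    /* Taps (128, 0): the output is just the input. */
    for (i = 0; i < w; ++i) dst[i] = src[i];
  } else if (offset == 4) {
    /* Taps (64, 64): a rounded average of the two inputs. */
    for (i = 0; i < w; ++i) {
      dst[i] = (uint8_t)((src[i] + src[i + 1] + 1) >> 1);
    }
  } else {
    bilinear_row_generic(src, dst, w, offset);
  }
}
```

The offset test costs a couple of branches per call, which is why the
message restricts the specialization to blocks large enough to
amortize it.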

2 years agoRefactor Neon subpel variance functions
Salome Thirot [Thu, 19 Jan 2023 18:02:52 +0000 (18:02 +0000)]
Refactor Neon subpel variance functions

Refactor the Neon implementation of the sub-pixel variance bilinear
filter helper functions - effectively backporting this libaom patch[1].

[1] https://aomedia-review.googlesource.com/c/aom/+/162462

Change-Id: I3dee32e8125250bbeffeb63d1fef5da559bacbf1

2 years agoMerge "Add codec control to set per frame QP" into main
Jerome Jiang [Fri, 20 Jan 2023 17:14:04 +0000 (17:14 +0000)]
Merge "Add codec control to set per frame QP" into main

2 years agoAdd codec control to set per frame QP
Jerome Jiang [Thu, 12 Jan 2023 20:58:00 +0000 (15:58 -0500)]
Add codec control to set per frame QP

Use case is for 1 pass encoding.
Forces max_quantizer = min_quantizer and aq-mode = 0.
Applicable to spatial layers, where the user may set
the QP per spatial layer.

Change-Id: Idfcb7daefde94c475ed1bc0eb8af47c9f309110b

2 years agoMerge "Refactor Neon implementation of variance functions" into main
James Zern [Thu, 19 Jan 2023 19:44:43 +0000 (19:44 +0000)]
Merge "Refactor Neon implementation of variance functions" into main

2 years ago*/Android.mk: add a check for NDK_ROOT
James Zern [Thu, 19 Jan 2023 03:19:01 +0000 (19:19 -0800)]
*/Android.mk: add a check for NDK_ROOT

This simplifies integration with the Android platform and prevents the
files from being used when a non-NDK build is performed. In that case
Android.bp is preferred.

Change-Id: I803912146dac788b7f0af27199c7613cabbc9fa0

2 years agoRefactor Neon implementation of variance functions
Salome Thirot [Mon, 16 Jan 2023 16:44:04 +0000 (16:44 +0000)]
Refactor Neon implementation of variance functions

Refactor and optimize the Neon implementation of variance functions -
effectively backporting these libaom changes[1,2].

After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].

[1] https://aomedia-review.googlesource.com/c/aom/+/162241
[2] https://aomedia-review.googlesource.com/c/aom/+/162262

Change-Id: Ia4e8fff4d53297511d1a1e43bca8053bf811e551
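
The ISO C90 restriction mentioned above is easy to show with a small
pair of functions. The names mean_c99 and mean_c90 are hypothetical and
the code is unrelated to the variance functions themselves.

```c
/* C99 style: `mean` is declared after the loop statement. Compilers
 * accept this in C99 and later, and -Wdeclaration-after-statement
 * reports it when the code has to stay C90 compatible. */
int mean_c99(const int *v, int n) {
  int sum = 0;
  int i;
  for (i = 0; i < n; ++i) sum += v[i];
  const int mean = n > 0 ? sum / n : 0;
  return mean;
}

/* C90 style: every declaration sits at the top of the block, so `mean`
 * is declared first and assigned later. */
int mean_c90(const int *v, int n) {
  int sum = 0;
  int mean;
  int i;
  for (i = 0; i < n; ++i) sum += v[i];
  mean = n > 0 ? sum / n : 0;
  return mean;
}
```

One visible cost of the C90 form is that a value computed late in a
block can no longer be declared const at the point it is computed.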

2 years agoMerge "Fix to segfault for external resize test in vp9" into main
Marco Paniconi [Wed, 18 Jan 2023 02:04:18 +0000 (02:04 +0000)]
Merge "Fix to segfault for external resize test in vp9" into main

2 years agoFix to segfault for external resize test in vp9
Marco Paniconi [Sat, 14 Jan 2023 03:46:10 +0000 (19:46 -0800)]
Fix to segfault for external resize test in vp9

The failure occurs in 1 pass non-realtime mode at speed 0. It is due to
the speed feature rd_ml_partition.var_pruning, which doesn't check for
a scaled reference in simple_motion_search().

Bug: webm:1768

Change-Id: Iddcb56033bac042faebb5196eed788317590b23f

2 years agovariance_test.cc: Enable HBDMse speed test.
Scott LaVarnway [Fri, 13 Jan 2023 15:30:07 +0000 (07:30 -0800)]
variance_test.cc: Enable HBDMse speed test.

Change-Id: If0226307a6efd704f8a35cb986f570304d698b95

2 years agoMerge "variance_test.cc: Enable VpxHBDMseTest for C and SSE2." into main
Scott LaVarnway [Fri, 13 Jan 2023 13:36:15 +0000 (13:36 +0000)]
Merge "variance_test.cc: Enable VpxHBDMseTest for C and SSE2." into main

2 years agovariance_test.cc: Enable VpxHBDMseTest for C and SSE2.
Scott LaVarnway [Thu, 12 Jan 2023 19:03:28 +0000 (11:03 -0800)]
variance_test.cc: Enable VpxHBDMseTest for C and SSE2.

Change-Id: I66c0db6c605876d6757684fd715614881ca261e7

2 years agoMerge changes Ifbf46768,If19f5872 into main
James Zern [Thu, 12 Jan 2023 18:41:27 +0000 (18:41 +0000)]
Merge changes Ifbf46768,If19f5872 into main

* changes:
  Implement vertical convolutions using Neon USDOT instruction
  Implement horizontal convolutions using Neon USDOT instruction

2 years agoImplement vertical convolutions using Neon USDOT instruction
Jonathan Wright [Wed, 18 May 2022 15:58:50 +0000 (16:58 +0100)]
Implement vertical convolutions using Neon USDOT instruction

Add additional AArch64 paths for vpx_convolve8_vert_neon and
vpx_convolve8_avg_vert_neon that use the Armv8.6-A USDOT (mixed-sign
dot-product) instruction. The USDOT instruction takes an 8-bit
unsigned operand vector and a signed 8-bit operand vector to produce
a signed 32-bit result. This is helpful because convolution filters
often have both positive and negative values, while the 8-bit pixel
channel data being filtered is all unsigned. As a result, the USDOT
convolution paths added here do not have to do the "transform the
pixel channel data to [-128, 128) and correct for it later" dance
that we have to do with the SDOT paths.

The USDOT instruction is optional from Armv8.2 to Armv8.5 but
mandatory from Armv8.6 onwards. The availability of the USDOT
instruction is indicated by the feature macro
__ARM_FEATURE_MATMUL_INT8. The SDOT paths are retained for use on
target CPUs that do not implement the USDOT instructions.

Change-Id: Ifbf467681dd53bb1d26e22359885e6edde3c5c72
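
The range transform that the SDOT paths need, and that USDOT avoids,
can be modelled in scalar C. This sketch does not use the Neon
instructions; dot4_biased and dot4_mixed are hypothetical names that
only model a 4-element signed-by-signed and unsigned-by-signed dot
product.

```c
#include <stdint.h>

/* Models a signed x signed dot product: pixels are first moved from
 * [0, 255] to [-128, 127], and the bias is added back afterwards as
 * 128 * sum(filter). */
int32_t dot4_biased(const uint8_t *px, const int8_t *f) {
  int32_t acc = 0;
  int32_t fsum = 0;
  int i;
  for (i = 0; i < 4; ++i) {
    const int8_t s = (int8_t)(px[i] - 128);
    acc += s * f[i];
    fsum += f[i];
  }
  return acc + 128 * fsum;
}

/* Models an unsigned x signed dot product: the pixels are used as they
 * are and no correction term is needed. */
int32_t dot4_mixed(const uint8_t *px, const int8_t *f) {
  int32_t acc = 0;
  int i;
  for (i = 0; i < 4; ++i) acc += px[i] * f[i];
  return acc;
}
```

The two functions agree because sum((p - 128) * f) + 128 * sum(f)
equals sum(p * f); the mixed-sign form simply skips both the
subtraction and the correction.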

2 years agoImplement horizontal convolutions using Neon USDOT instruction
Jonathan Wright [Wed, 18 May 2022 13:14:56 +0000 (14:14 +0100)]
Implement horizontal convolutions using Neon USDOT instruction

Add additional AArch64 paths for vpx_convolve8_horiz_neon and
vpx_convolve8_avg_horiz_neon that use the Armv8.6-A USDOT (mixed-sign
dot-product) instruction. The USDOT instruction takes an 8-bit
unsigned operand vector and a signed 8-bit operand vector to produce
a signed 32-bit result. This is helpful because convolution filters
often have both positive and negative values, while the 8-bit pixel
channel data being filtered is all unsigned. As a result, the USDOT
convolution paths added here do not have to do the "transform the
pixel channel data to [-128, 128) and correct for it later" dance
that we have to do with the SDOT paths.

The USDOT instruction is optional from Armv8.2 to Armv8.5 but
mandatory from Armv8.6 onwards. The availability of the USDOT
instruction is indicated by the feature macro
__ARM_FEATURE_MATMUL_INT8. The SDOT paths are retained for use on
target CPUs that do not implement the USDOT instructions.

Change-Id: If19f5872c3453458a8cfb7c7d2be82a2c0eab46a

2 years agobuild: replace egrep with grep -E
James Zern [Tue, 10 Jan 2023 21:49:15 +0000 (13:49 -0800)]
build: replace egrep with grep -E

avoids a warning on some platforms:
egrep: warning: egrep is obsolescent; using grep -E

Bug: webm:1786
Change-Id: Ia434297731303aacb0b02cf3dcbfd8e03936485d
Fixed: webm:1786

2 years agoUse Neon load/store helper functions consistently
Jonathan Wright [Thu, 5 Jan 2023 15:04:53 +0000 (15:04 +0000)]
Use Neon load/store helper functions consistently

Define all Neon load/store helper functions in mem_neon.h and use
them consistently in Neon convolution functions.

Change-Id: I57905bc0a3574c77999cf4f4a73442c3420fa2be

2 years agoUse lane-referencing intrinsics in Neon convolution kernels
Jonathan Wright [Thu, 5 Jan 2023 12:20:03 +0000 (12:20 +0000)]
Use lane-referencing intrinsics in Neon convolution kernels

The Neon convolution helper functions take a pointer to a filter and
load the 8 values into a single Neon register. For some reason,
filter values 3 and 4 are then duplicated into their own separate
registers.

This patch modifies these helper functions so that they access filter
values 3 and 4 via the lane-referencing versions of the various Neon
multiply instructions. This reduces register pressure and tidies up
the source code quite a bit.

Change-Id: Ia4aeee8b46fe218658fb8577dc07ff04a9324b3e

2 years agoRemove references to deprecated NumPy type aliases
Jerome Jiang [Wed, 21 Dec 2022 16:13:40 +0000 (11:13 -0500)]
Remove references to deprecated NumPy type aliases

This change replaces references to a number of deprecated NumPy type
aliases (np.bool, np.int, np.float, np.complex, np.object, np.str)
with their recommended replacement
(bool, int, float, complex, object, str).

NumPy 1.24 drops the deprecated aliases
so we must remove uses before updating NumPy.

Change-Id: I9f5dfcbb11fe6534fce358054f210c7653f278c3

2 years ago[x86]: Add vpx_highbd_comp_avg_pred_sse2().
Scott LaVarnway [Tue, 20 Dec 2022 23:43:44 +0000 (15:43 -0800)]
[x86]: Add vpx_highbd_comp_avg_pred_sse2().

C vs SSE2

4x4: 3.38x
8x8: 3.45x
16x16: 2.06x
32x32: 2.19x
64x64: 1.39x

Change-Id: I46638fe187b49a78fee554114fac51c485d74474

2 years agoAdd vpx_highbd_comp_avg_pred_c() test.
Scott LaVarnway [Fri, 16 Dec 2022 18:21:00 +0000 (10:21 -0800)]
Add vpx_highbd_comp_avg_pred_c() test.

Change-Id: I6b2c3379c49a62e56e5ac56fd4782a50b3c4e12a

2 years agoMerge "rc-svc: Add tests for dynamic svc in external RC" into main
Marco Paniconi [Wed, 14 Dec 2022 17:08:21 +0000 (17:08 +0000)]
Merge "rc-svc: Add tests for dynamic svc in external RC" into main

2 years agorc-svc: Add tests for dynamic svc in external RC
Marco Paniconi [Wed, 7 Dec 2022 08:17:22 +0000 (00:17 -0800)]
rc-svc: Add tests for dynamic svc in external RC

Test to verify RC for going down and back up in
spatial layers. Going back up has an issue so added
a TODO.

Make the test more flexible to handle dynamic layers.
Test for dynamic change in temporal layers to follow.

Change-Id: Ic5542f7b274135277429e116f56ba54e682e96a0

2 years agoAdd additional ARM targets for Visual Studio.
Anton Venema [Tue, 13 Dec 2022 18:27:37 +0000 (10:27 -0800)]
Add additional ARM targets for Visual Studio.

configure: Add an armv7-win32-vs16 target
configure: Add an armv7-win32-vs17 target
configure: Add an arm64-win64-vs16 target
configure: Add an arm64-win64-vs17 target

Change-Id: I11d6cd6e51f7703939d6fd3fc6a7469591e3b09d

2 years agoMerge "L2E: Add a new interface to control rdmult" into main
Cheng Chen [Tue, 13 Dec 2022 01:24:00 +0000 (01:24 +0000)]
Merge "L2E: Add a new interface to control rdmult" into main

2 years ago[x86]: Add vpx_highbd_subtract_block_avx2().
Scott LaVarnway [Tue, 6 Dec 2022 21:13:30 +0000 (13:13 -0800)]
[x86]: Add vpx_highbd_subtract_block_avx2().

Up to 4x faster than "sse2 vectorized C".

Change-Id: Ie9b3c12a437c5cddf92c4d5349c4f659ca6b82ea

2 years agoAdd vpx highbd subtract test.
Scott LaVarnway [Tue, 6 Dec 2022 22:18:03 +0000 (14:18 -0800)]
Add vpx highbd subtract test.

Change-Id: I069ae0fe22bfc82ad5083df85a7fdf9058a285eb

2 years agoL2E: Add a new interface to control rdmult
Cheng Chen [Sat, 3 Dec 2022 02:04:32 +0000 (18:04 -0800)]
L2E: Add a new interface to control rdmult

Allow external model to control frame rdmult.

A function is called per frame to get the value of rdmult from
the external model.

The external rdmult will overwrite libvpx's default rdmult unless
a reserved value is selected.

A unit test is added to test when the default rdmult value is set.

Change-Id: I2f17a036c188de66dc00709beef4bf2ed86a919a
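
The override rule described above can be sketched as follows. All names
here are hypothetical (EXT_RDMULT_USE_DEFAULT, ext_rdmult_fn,
select_frame_rdmult, toy_model_rdmult), and -1 as the reserved value is
an assumption of the sketch, not taken from the libvpx headers.

```c
#include <stddef.h>

/* Hypothetical reserved value meaning "keep the encoder's default". */
#define EXT_RDMULT_USE_DEFAULT (-1)

/* Hypothetical per-frame callback supplied by the external model. */
typedef int (*ext_rdmult_fn)(void *model, int frame_index);

/* Called once per frame: the external value wins unless the model
 * returns the reserved value or no callback is installed. */
int select_frame_rdmult(ext_rdmult_fn get_rdmult, void *model,
                        int frame_index, int default_rdmult) {
  int ext;
  if (get_rdmult == NULL) return default_rdmult;
  ext = get_rdmult(model, frame_index);
  return ext == EXT_RDMULT_USE_DEFAULT ? default_rdmult : ext;
}

/* Toy model for illustration: overrides even frames only. */
int toy_model_rdmult(void *model, int frame_index) {
  (void)model;
  return (frame_index % 2 == 0) ? 500 : EXT_RDMULT_USE_DEFAULT;
}
```

For example, select_frame_rdmult(toy_model_rdmult, NULL, 1, 100) falls
back to the default of 100 because the toy model returns the reserved
value for odd frames.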

2 years agorc-rtc: Test for periodic key in SVC external RC
Marco Paniconi [Mon, 5 Dec 2022 22:30:40 +0000 (14:30 -0800)]
rc-rtc: Test for periodic key in SVC external RC

This test catches the fix merged in here:
https://chromium-review.googlesource.com/c/webm/libvpx/+/4022904

Change-Id: Ib68fbcba694b5d465a9faf3ca7d6880bfe8eabb3

2 years agorc-rtc: Remove frame_flags_ change in svc ratectrl rtc test
Marco Paniconi [Mon, 5 Dec 2022 19:54:33 +0000 (11:54 -0800)]
rc-rtc: Remove frame_flags_ change in svc ratectrl rtc test

SVC test is only in CBR and the frame_flags are
set by the SVC pattern, so we shouldn't undo them
for svc mode.

Change-Id: I5ffa65dd58a7b47f287d124d9e71ba1dc7c5a549

2 years agoMerge "vp9/rate_ctrl_rtc: Improve get cyclic refresh data" into main
Marco Paniconi [Fri, 18 Nov 2022 04:16:26 +0000 (04:16 +0000)]
Merge "vp9/rate_ctrl_rtc: Improve get cyclic refresh data" into main

2 years agovp9/rate_ctrl_rtc: Improve get cyclic refresh data
Hirokazu Honda [Thu, 17 Nov 2022 07:05:28 +0000 (16:05 +0900)]
vp9/rate_ctrl_rtc: Improve get cyclic refresh data

A client of the vp9 rate controller needs to know whether the
segmentation is enabled and the size of delta_q. It is also nicer to
know the size of map. This CL changes the interface to achieve these.

Bug: b:259487065
Test: Build

Change-Id: If05854530f97e1430a7b97788910f277ab673a87

2 years agoMerge "vp9-svc: Fixes to make SVC work with VBR" into main
Marco Paniconi [Tue, 15 Nov 2022 21:45:07 +0000 (21:45 +0000)]
Merge "vp9-svc: Fixes to make SVC work with VBR" into main

2 years agovp9-svc: Fixes to make SVC work with VBR
Marco Paniconi [Tue, 15 Nov 2022 06:11:19 +0000 (22:11 -0800)]
vp9-svc: Fixes to make SVC work with VBR

Prior to this CL SVC with VBR mode was broken.
Fixes made here to make VBR rate control work for SVC.
Rename is_one_pass_cbr_svc() --> is_one_pass_svc(),
as it can be used now for both CBR and VBR.

Added rate targeting unittest for (2SL, 3TL).

Bug: chromium:1375111
Change-Id: I5a62ffe7fbea29dc5949c88a284768386b1907a9

2 years agoMerge "[NEON] Optimize FHT functions, add highbd FHT 4x4" into main
James Zern [Tue, 15 Nov 2022 19:19:43 +0000 (19:19 +0000)]
Merge "[NEON] Optimize FHT functions, add highbd FHT 4x4" into main

2 years agoquantize: remove vp9_regular_quantize_b_4x4
Johann [Mon, 14 Nov 2022 08:59:45 +0000 (17:59 +0900)]
quantize: remove vp9_regular_quantize_b_4x4

This was just a helper function which called vpx_quantize_b or
vpx_highbd_quantize_b. It also checked for skip_block, which was
necessary when webm:1439 was filed but does not appear to be
necessary now.

Removes a quantize variant and makes subsequent cleanups easier.

Change-Id: Ibe545eccd19370f07ff26c8e151f290c642efd2a

2 years ago[NEON] Optimize FHT functions, add highbd FHT 4x4
Konstantinos Margaritis [Wed, 9 Nov 2022 09:30:58 +0000 (09:30 +0000)]
[NEON] Optimize FHT functions, add highbd FHT 4x4

Refactor and optimize FHT functions further, using the new butterfly functions.
4x4 is 5% faster; 8x8 and 16x16 are 10% faster than the previous versions.
The highbd 4x4 FHT version is 2.27x faster than the C version for --rt.

Change-Id: I3ebcd26010f6c5c067026aa9353cde46669c5d94

2 years agovp9-rc: Fix key frame setting in external RC
Marco Paniconi [Fri, 11 Nov 2022 02:50:19 +0000 (18:50 -0800)]
vp9-rc: Fix key frame setting in external RC

Bug: b/257368998

Change-Id: I03e35915ac99b50cb6bdf7bce8b8f9ec5aef75b7

2 years agoMerge "Add Neon implementation of vpx_hadamard_32x32" into main
James Zern [Mon, 7 Nov 2022 21:48:50 +0000 (21:48 +0000)]
Merge "Add Neon implementation of vpx_hadamard_32x32" into main

2 years agobuild: fix -Wimplicit-int (Clang 16)
Sam James [Sun, 6 Nov 2022 04:11:59 +0000 (04:11 +0000)]
build: fix -Wimplicit-int (Clang 16)

Clang 16 will make -Wimplicit-int error by default which can, in addition to
other things, lead to some configure tests silently failing/returning the wrong result.

Fixes this error:
```
+/var/tmp/portage/media-libs/libvpx-1.12.0/temp/vpx-conf-1802-30624.c:1:15: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
```

For more information, see LWN.net [0] or LLVM's Discourse [1], gentoo-dev@ [2],
or the (new) c-std-porting mailing list [3].

[0] https://lwn.net/Articles/913505/
[1] https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-implicit-function-declaration/65213
[2] https://archives.gentoo.org/gentoo-dev/message/dd9f2d3082b8b6f8dfbccb0639e6e240
[3] hosted at lists.linux.dev.

Bug: https://bugs.gentoo.org/879705
Change-Id: Id73a98944ab3c99a368b9da7a5e902ddff9d937f
Signed-off-by: Sam James <sam@gentoo.org>
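
A minimal illustration of what -Wimplicit-int rejects, with the
pre-C99 forms kept in a comment so the block still compiles. The names
counter and bump_counter are hypothetical and are not taken from the
libvpx configure probes.

```c
/* Pre-C99 code could omit the type specifier and get `int` implicitly:
 *
 *   static counter;          (implicit int object)
 *   main() { return 0; }     (implicit int return type)
 *
 * Once -Wimplicit-int is an error, a configure probe written this way
 * fails to compile, and the feature it tests for is reported as
 * missing even when it is present. */

/* The portable form spells the type out. */
static int counter;

int bump_counter(void) { return ++counter; }
```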

2 years agoAdd Neon implementation of vpx_hadamard_32x32
Andrew Salkeld [Thu, 13 Oct 2022 15:28:41 +0000 (16:28 +0100)]
Add Neon implementation of vpx_hadamard_32x32

Add an Arm Neon implementation of vpx_hadamard_32x32 and use it
instead of the scalar C implementation.

Also add test coverage for the new Neon implementation.

Change-Id: Iccc018eec4dbbe629fb0c6f8ad6ea8554e7a0b13

2 years ago[NEON] Optimize highbd 32x32 DCT
Konstantinos Margaritis [Wed, 26 Oct 2022 22:09:32 +0000 (22:09 +0000)]
[NEON] Optimize highbd 32x32 DCT

For --best quality, resulting function
vpx_highbd_fdct32x32_rd_neon takes 0.27% of cpu time in
profiling, vs 6.27% for the sum of scalar functions:
vpx_fdct32, vpx_fdct32.constprop.0, vpx_fdct32x32_rd_c for rd.
For --rt quality, the function takes 0.19% vs 4.57% for the scalar
version.
Overall, this improves highbd encoding time by ~6% for --best and
~9% for --rt.

Change-Id: I1ce4bbef6e364bbadc76264056aa3f86b1a8edc5

2 years agoMerge "[NEON] Optimize and homogenize Butterfly DCT functions" into main
James Zern [Wed, 2 Nov 2022 02:21:18 +0000 (02:21 +0000)]
Merge "[NEON] Optimize and homogenize Butterfly DCT functions" into main

2 years ago[NEON] Optimize and homogenize Butterfly DCT functions
Konstantinos Margaritis [Wed, 26 Oct 2022 21:37:31 +0000 (21:37 +0000)]
[NEON] Optimize and homogenize Butterfly DCT functions

Provide a set of commonly used Butterfly DCT functions for use in
DCT 4x4, 8x8, 16x16, 32x32 functions. These are provided in various
forms, using vqrdmulh_s16/vqrdmulh_s32 for _fast variants, which
unfortunately are only usable in pass1 of most DCTs, as they do not
provide the necessary precision in pass2.
This gave a performance gain ranging from 5% to 15% in the 16x16 case.
Also, for 32x32, the loads were rearranged; together with the butterfly
optimizations, this gave a 10% gain in the 32x32_rd function.
This refactoring was necessary to allow easier porting of the highbd
32x32 functions, which follows this patchset.

Change-Id: I6282e640b95a95938faff76c3b2bace3dc298bc3

2 years agoMerge "MacOS 13 is darwin22" into main
Johann Koenig [Thu, 27 Oct 2022 08:38:48 +0000 (08:38 +0000)]
Merge "MacOS 13 is darwin22" into main

2 years agoMerge "rtcd: allow disabling neon on armv8" into main
Johann Koenig [Thu, 27 Oct 2022 08:38:18 +0000 (08:38 +0000)]
Merge "rtcd: allow disabling neon on armv8" into main

2 years agoMacOS 13 is darwin22
Johann [Thu, 27 Oct 2022 02:40:19 +0000 (11:40 +0900)]
MacOS 13 is darwin22

Bug: webm:1783
Change-Id: I97d94ab8c8aebe13aedb58e280dc37474814ad5d

2 years agortcd: allow disabling neon on armv8
Johann [Wed, 26 Oct 2022 23:49:37 +0000 (08:49 +0900)]
rtcd: allow disabling neon on armv8

Change-Id: Idef943775456eb95b46be5c92c114c1d215f38d7

2 years agomailmap: add johann@duck.com
Johann [Wed, 26 Oct 2022 08:14:21 +0000 (17:14 +0900)]
mailmap: add johann@duck.com

Change-Id: I3b48951e69ba1f4a9fafdbb81fac48f79587a342

2 years agoMerge changes I36545ff4,Id1aa29da into main
James Zern [Tue, 25 Oct 2022 19:16:46 +0000 (19:16 +0000)]
Merge changes I36545ff4,Id1aa29da into main

* changes:
  vp9_highbd_quantize_fp*_neon: normalize fn param name
  highbd_sad_avx2: normalize function param names

2 years agoMerge "SAD*Test: mark virtual Run() as overridden" into main
James Zern [Tue, 25 Oct 2022 19:16:08 +0000 (19:16 +0000)]
Merge "SAD*Test: mark virtual Run() as overridden" into main

2 years agoMerge "quantize: consolidate sse2 conditionals" into main
Johann Koenig [Tue, 25 Oct 2022 13:26:37 +0000 (13:26 +0000)]
Merge "quantize: consolidate sse2 conditionals" into main

2 years agoMerge "vp9 quantize: rewrite ssse3 in intrinsics" into main
Johann Koenig [Tue, 25 Oct 2022 13:26:22 +0000 (13:26 +0000)]
Merge "vp9 quantize: rewrite ssse3 in intrinsics" into main

2 years agoSAD*Test: mark virtual Run() as overridden
James Zern [Mon, 24 Oct 2022 22:37:26 +0000 (15:37 -0700)]
SAD*Test: mark virtual Run() as overridden

this comes from AbstractBench

Change-Id: Ie0b5a26a68bfbffd80f132125d15a1bdfc990c22

2 years agovp9_highbd_quantize_fp*_neon: normalize fn param name
James Zern [Mon, 24 Oct 2022 22:28:47 +0000 (15:28 -0700)]
vp9_highbd_quantize_fp*_neon: normalize fn param name

count -> n_coeffs. aligns the name with the rtcd header; clears a
clang-tidy warning

Change-Id: I36545ff479df92b117c95e494f16002e6990f433

2 years agohighbd_sad_avx2: normalize function param names
James Zern [Mon, 24 Oct 2022 22:24:51 +0000 (15:24 -0700)]
highbd_sad_avx2: normalize function param names

(src|ref)8_ptr -> (src|ref)_ptr. aligns the names with the rtcd header;
clears some clang-tidy warnings

Change-Id: Id1aa29da8c0fa5860b46ac902f5b2620c0d3ff54

2 years agoFix to VP8 external RC for buffer levels
Marco Paniconi [Tue, 18 Oct 2022 05:36:25 +0000 (22:36 -0700)]
Fix to VP8 external RC for buffer levels

On a dynamic change of temporal layers,
starting/maximum/optimal were being set twice,
causing incorrectly large values.

Bug: b/253927937
Change-Id: I204e885cff92530336a9ed9a4363c486c5bf80ae

2 years agoquantize: consolidate sse2 conditionals
Johann [Mon, 17 Oct 2022 07:22:23 +0000 (16:22 +0900)]
quantize: consolidate sse2 conditionals

Change-Id: I43de579e30f2967b97064063e29676e0af1a498f

2 years agovp9 quantize: rewrite ssse3 in intrinsics
Johann [Sat, 1 Oct 2022 02:47:05 +0000 (11:47 +0900)]
vp9 quantize: rewrite ssse3 in intrinsics

Change-Id: I3177251a5935453a23a23c39ea5f6fd41254775e

2 years agoMerge "Fix to VP8 external RC for dynamic update of layers" into main
Marco Paniconi [Sat, 15 Oct 2022 01:56:46 +0000 (01:56 +0000)]
Merge "Fix to VP8 external RC for dynamic update of layers" into main

2 years agoFix to VP8 external RC for dynamic update of layers
Marco Paniconi [Wed, 12 Oct 2022 07:10:47 +0000 (00:10 -0700)]
Fix to VP8 external RC for dynamic update of layers

On change/update of rc_cfg: when the number of temporal
layers changes, call vp8_reset_temporal_layer_change(),
which in turn will call vp8_init_temporal_layer_context()
only for the new layers.

Bug: b/249644737

Change-Id: Ib20d746c7eacd10b78806ca6a5362c750d9ca0b3

2 years ago[NEON] fix clang compile warnings
Konstantinos Margaritis [Thu, 13 Oct 2022 15:19:46 +0000 (15:19 +0000)]
[NEON] fix clang compile warnings

Change-Id: Ib7ce7a774ec89ba51169ea64d24c878109ef07d1

2 years agoMerge "Add vpx_highbd_sad64x{64,32}_avg_avx2." into main
Scott LaVarnway [Thu, 13 Oct 2022 11:31:51 +0000 (11:31 +0000)]
Merge "Add vpx_highbd_sad64x{64,32}_avg_avx2." into main

2 years ago[NEON] Add highbd FDCT 16x16 function
Konstantinos Margaritis [Fri, 7 Oct 2022 15:13:29 +0000 (15:13 +0000)]
[NEON] Add highbd FDCT 16x16 function

90-95% faster than C version in best/rt profiles

Change-Id: I41d5e9acdc348b57153637ec736498a25ed84c25

2 years agoMerge "[NEON] Add highbd FDCT 8x8 function" into main
James Zern [Wed, 12 Oct 2022 20:07:51 +0000 (20:07 +0000)]
Merge "[NEON] Add highbd FDCT 8x8 function" into main

2 years agoMerge "Add vpx_highbd_sad32x{64,32,16}_avg_avx2." into main
Scott LaVarnway [Wed, 12 Oct 2022 19:50:55 +0000 (19:50 +0000)]
Merge "Add vpx_highbd_sad32x{64,32,16}_avg_avx2." into main

2 years agoMerge "Add vpx_highbd_sad16x{32,16,8}_avg_avx2." into main
Scott LaVarnway [Wed, 12 Oct 2022 19:44:44 +0000 (19:44 +0000)]
Merge "Add vpx_highbd_sad16x{32,16,8}_avg_avx2." into main

2 years ago[NEON] Add highbd FDCT 8x8 function
Konstantinos Margaritis [Thu, 6 Oct 2022 16:00:43 +0000 (16:00 +0000)]
[NEON] Add highbd FDCT 8x8 function

50% faster than C version in best/rt profiles

Change-Id: I0f9504ed52b5d5f7722407e91108ed4056d66bc2

2 years agoAdd vpx_highbd_sad64x{64,32}_avg_avx2.
Scott LaVarnway [Wed, 12 Oct 2022 17:26:43 +0000 (10:26 -0700)]
Add vpx_highbd_sad64x{64,32}_avg_avx2.

~2.8x faster than the sse2 version.

Bug: b/245917257

Change-Id: Ib727ba8a8c8fa4df450bafdde30ed99fd283f06d

2 years ago[NEON] Add highbd FDCT 4x4 function
Konstantinos Margaritis [Thu, 6 Oct 2022 14:53:56 +0000 (14:53 +0000)]
[NEON] Add highbd FDCT 4x4 function

~80% faster than C version for both best/rt profiles.

Change-Id: Ibb3c8e1862131d2a020922420d53c66b31d5c2c3

2 years agoAdd vpx_highbd_sad32x{64,32,16}_avg_avx2.
Scott LaVarnway [Wed, 12 Oct 2022 13:05:46 +0000 (06:05 -0700)]
Add vpx_highbd_sad32x{64,32,16}_avg_avx2.

2.1x to 2.8x faster than the sse2 version.

Bug: b/245917257

Change-Id: I1aaffa4a1debbe5559784e854b8fc6fba07e5000

2 years agoAdd vpx_highbd_sad16x{32,16,8}_avg_avx2.
Scott LaVarnway [Mon, 10 Oct 2022 15:38:44 +0000 (08:38 -0700)]
Add vpx_highbd_sad16x{32,16,8}_avg_avx2.

1.6x to 2.1x faster than the sse2 version.

Bug: b/245917257

Change-Id: I56c467a850297ae3abcca4b4843302bb8d5d0ac1

2 years ago[NEON] Move helper functions for reuse
Konstantinos Margaritis [Thu, 6 Oct 2022 13:05:01 +0000 (13:05 +0000)]
[NEON] Move helper functions for reuse

Move all butterfly functions to fdct_neon.h
Slightly optimize load/scale/cross functions
in fdct 16x16.
These will be reused in highbd variants.

Change-Id: I28b6e0cc240304bab6b94d9c3f33cca77b8cb073

2 years agoMerge "SADavgTest: Add speed test." into main
Scott LaVarnway [Mon, 10 Oct 2022 20:34:02 +0000 (20:34 +0000)]
Merge "SADavgTest: Add speed test." into main

2 years agoSADavgTest: Add speed test.
Scott LaVarnway [Mon, 10 Oct 2022 19:20:37 +0000 (12:20 -0700)]
SADavgTest: Add speed test.

Change-Id: Ie14c0f6d15f410adf749f7ab74cf9f2bf35f3d5f

2 years ago[NEON] move transpose_8x8 to reuse
Konstantinos Margaritis [Thu, 6 Oct 2022 10:58:27 +0000 (10:58 +0000)]
[NEON] move transpose_8x8 to reuse

Change-Id: I3915b6c9971aedaac9c23f21fdb88bc271216208

2 years agoMerge "[NEON] highbd partial DCT functions" into main
James Zern [Mon, 10 Oct 2022 18:37:05 +0000 (18:37 +0000)]
Merge "[NEON] highbd partial DCT functions" into main

2 years ago[NEON] highbd partial DCT functions
Konstantinos Margaritis [Thu, 6 Oct 2022 10:26:05 +0000 (10:26 +0000)]
[NEON] highbd partial DCT functions

Change-Id: I7dd4e698469562f5b1f948cc36f8403b490dcb6a

2 years agoAdd vpx_highbd_sad64x{64,32}_avx2.
Scott LaVarnway [Fri, 7 Oct 2022 12:53:50 +0000 (05:53 -0700)]
Add vpx_highbd_sad64x{64,32}_avx2.

~2.8x faster than the sse2 version.

Bug: b/245917257

Change-Id: Ibc8e5d030ec145c9a9b742fff98fbd9131c9ede4

2 years agoMerge "vp9 quantize: change index" into main
Johann Koenig [Fri, 7 Oct 2022 08:17:03 +0000 (08:17 +0000)]
Merge "vp9 quantize: change index" into main

2 years agoAdd vpx_highbd_sad32x{64,32,16}_avx2.
Scott LaVarnway [Wed, 5 Oct 2022 21:03:55 +0000 (14:03 -0700)]
Add vpx_highbd_sad32x{64,32,16}_avx2.

2.7x to 3.1x faster than the sse2 version.

Bug: b/245917257

Change-Id: Idff3284932f7ee89d036f38893205bf622a159a3

2 years agoAdd vpx_highbd_sad16x{32,16,8}_avx2.
Scott LaVarnway [Wed, 5 Oct 2022 14:04:27 +0000 (07:04 -0700)]
Add vpx_highbd_sad16x{32,16,8}_avx2.

1.9x to 2.4x faster than the sse2 version.

Bug: b/245917257

Change-Id: I686452772f9b72233930de2207af36a0cd72e0bb

2 years agoMerge "L2E: Rework recode decisions for external max frame size and q" into main
Cheng Chen [Tue, 4 Oct 2022 16:15:49 +0000 (16:15 +0000)]
Merge "L2E: Rework recode decisions for external max frame size and q" into main

2 years agovp9 quantize: change index
Johann [Sat, 1 Oct 2022 02:18:09 +0000 (11:18 +0900)]
vp9 quantize: change index

In assembly it made sense to iterate using n_coeffs.
In intrinsics it's just as fast to use index and
easier to read.

Change-Id: I403c959709309dad68123d0a3d0efe183874543d
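
The two loop styles can be compared in plain C. The negative-counter
form below is one common assembly idiom and is assumed here for
illustration; the function names and the halving operation are
hypothetical and stand in for the real quantizer body.

```c
#include <stddef.h>
#include <stdint.h>

/* Assembly-style loop: both pointers are advanced to the end of the
 * arrays and a negated count runs up towards zero, so one register
 * serves as both index and loop condition. */
void halve_neg_counter(const int16_t *in, int16_t *out, ptrdiff_t n_coeffs) {
  in += n_coeffs;
  out += n_coeffs;
  n_coeffs = -n_coeffs;
  while (n_coeffs < 0) {
    out[n_coeffs] = (int16_t)(in[n_coeffs] / 2);
    ++n_coeffs;
  }
}

/* Index-style loop: same work, with a plain forward index. */
void halve_indexed(const int16_t *in, int16_t *out, ptrdiff_t n_coeffs) {
  ptrdiff_t index;
  for (index = 0; index < n_coeffs; ++index) {
    out[index] = (int16_t)(in[index] / 2);
  }
}
```

Both functions write the same output; the indexed form leaves the
pointers and the count untouched, which is what makes it easier to
read.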

2 years agovpx_subpixel_8t_intrin_avx2.c: quiet -Wuninitialized
Scott LaVarnway [Mon, 19 Sep 2022 12:09:23 +0000 (05:09 -0700)]
vpx_subpixel_8t_intrin_avx2.c: quiet -Wuninitialized

warning: ‘s2[3]’ may be used uninitialized
and
warning: ‘s1[3]’ may be used uninitialized

The warnings exposed unused code.

Change-Id: I75cf1f9db75e811cb42e2f143be1ad76f3e4dee9

2 years agoMerge "vp9_rd.c quiet -Wstringop-overflow" into main
Scott LaVarnway [Mon, 26 Sep 2022 23:18:04 +0000 (23:18 +0000)]
Merge "vp9_rd.c quiet -Wstringop-overflow" into main

2 years agoquantize: standardize vp9_quantize_fp_sse2
Johann [Sat, 24 Sep 2022 01:53:05 +0000 (10:53 +0900)]
quantize: standardize vp9_quantize_fp_sse2

Match style for vpx_quantize_b_sse2 and prepare to rewrite
ssse3 version in intrinsics.

Need to evaluate the value of threshold breakout before
going further.

Change-Id: I9cfceb1bb0dc237cd6b73fc8d41d78bba444a15b

2 years agovp9_rd.c quiet -Wstringop-overflow
Scott LaVarnway [Fri, 23 Sep 2022 16:17:18 +0000 (09:17 -0700)]
vp9_rd.c quiet -Wstringop-overflow

../libvpx/vp9/encoder/vp9_rd.c:594:20: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
  594 |         t_above[i] = !!*(const uint32_t *)&above[i];
      |         ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../libvpx/vp9/encoder/vp9_rd.c:572:47: note: at offset [64, 254] into destination object ‘t_above’ of size [0, 16]
  572 |                               ENTROPY_CONTEXT t_above[16],
      |                               ~~~~~~~~~~~~~~~~^~~~~~~~~~~

Change-Id: Ie9ef24e685af417cdd35f6aa7284805e422b6ae2

2 years agoquantize: add untested function
Johann [Sat, 24 Sep 2022 01:55:52 +0000 (10:55 +0900)]
quantize: add untested function

vp9_quantize_fp_sse2 was only tested in non-hbd
configuration. Missed when fixing this for
vpx_quantize_b_sse2.

Change-Id: Ide346e5727d74281c774f605c90d280050e0bf62

2 years agoquantize: increase iscan by 1
Johann [Fri, 16 Sep 2022 23:47:28 +0000 (08:47 +0900)]
quantize: increase iscan by 1

All of the assembly adds 1 to iscan to convert from
a 0 based array to the EOB value.

Add 1 to all iscan values and remove the extra
instructions from the assembly.

Change-Id: I219dd7f2bd10533ab24b206289565703176dc5e9
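
The effect of storing iscan pre-incremented can be sketched in scalar
C. The function names and the tiny 4-entry scan tables used in the
example are hypothetical; the real tables and the assembly are not
reproduced here.

```c
#include <stdint.h>

/* 0-based scan positions: every candidate needs a +1 to turn the
 * position of the last nonzero coefficient into an EOB count. */
int eob_zero_based(const int16_t *qcoeff, const int16_t *iscan, int n) {
  int eob = 0;
  int i;
  for (i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan[i] + 1 > eob) eob = iscan[i] + 1;
  }
  return eob;
}

/* Scan table stored already incremented: the +1 disappears from the
 * inner loop and the maximum is the EOB directly. */
int eob_one_based(const int16_t *qcoeff, const int16_t *iscan_plus1, int n) {
  int eob = 0;
  int i;
  for (i = 0; i < n; ++i) {
    if (qcoeff[i] != 0 && iscan_plus1[i] > eob) eob = iscan_plus1[i];
  }
  return eob;
}
```

With qcoeff = {5, 0, -1, 0}, iscan = {0, 2, 1, 3} and the incremented
table {1, 3, 2, 4}, both functions return an EOB of 2.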

2 years agoMerge "resize_test.cc: quiet -Wmaybe-uninitialized" into main
Scott LaVarnway [Wed, 21 Sep 2022 23:41:42 +0000 (23:41 +0000)]
Merge "resize_test.cc: quiet -Wmaybe-uninitialized" into main

2 years agoresize_test.cc: quiet -Wmaybe-uninitialized
Scott LaVarnway [Wed, 21 Sep 2022 19:15:16 +0000 (12:15 -0700)]
resize_test.cc: quiet -Wmaybe-uninitialized

warning: ‘expected_w’ may be used uninitialized

Change-Id: I915efd82d3263250cea90391345f7683c1330fc8