James Zern [Wed, 22 Feb 2023 19:34:30 +0000 (11:34 -0800)]
vp9_block.h: rename diff struct to Diff
This matches the style guide and fixes some -Wshadow warnings related to
variables with the same name. Something similar was done in libaom in:
863b04994b Fix warnings reported by -Wshadow: Part2: av1 directory
Deepa K G [Thu, 16 Feb 2023 16:17:24 +0000 (21:47 +0530)]
Skip redundant iterations in joint motion search
In joint_motion_search, there are four iterations.
Even iterations search in the first reference frame
and odd iterations search in the second. The last two
iterations use the search result of the first two
iterations as the start point. If the search result does
not change, the last two iterations are not necessary and can
be skipped.
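The early exit described above can be sketched in plain C. This is only an illustration under assumptions, not the code from the patch: the Mv type, the RefineFn callback and both demo callbacks are hypothetical, and the sketch skips the last two iterations only when neither of the first two moved its vector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical motion vector type used only for this sketch. */
typedef struct {
  int row, col;
} Mv;

/* Hypothetical refinement callback: searches around `start` for reference
 * `ref` (0 or 1) while the other reference's vector is held at `other`. */
typedef Mv (*RefineFn)(int ref, Mv start, Mv other);

static int mv_equal(Mv a, Mv b) { return a.row == b.row && a.col == b.col; }

/* Runs up to four alternating iterations and returns how many were run. */
static int joint_search_sketch(Mv mv[2], RefineFn refine) {
  int changed[2] = { 1, 1 };
  int ite;
  assert(refine != NULL);
  for (ite = 0; ite < 4; ++ite) {
    const int id = ite % 2; /* even: first reference, odd: second */
    Mv best;
    /* If neither of the last two searches moved its vector, the next
     * search would see the same inputs again, so stop early. */
    if (ite >= 2 && !changed[0] && !changed[1]) break;
    best = refine(id, mv[id], mv[1 - id]);
    changed[id] = !mv_equal(best, mv[id]);
    mv[id] = best;
  }
  return ite;
}

/* Demo callback: the search never moves away from its start point. */
static Mv refine_stay(int ref, Mv start, Mv other) {
  (void)ref;
  (void)other;
  return start;
}

/* Demo callback: the search moves one row towards zero per call. */
static Mv refine_step_to_zero(int ref, Mv start, Mv other) {
  (void)ref;
  (void)other;
  if (start.row > 0) --start.row;
  return start;
}
```

With refine_stay only the first two iterations run. With refine_step_to_zero and a start row of 1, a third iteration still runs because the first one moved its vector.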
James Zern [Tue, 14 Feb 2023 02:46:51 +0000 (02:46 +0000)]
Merge changes Id74a6d9c,I5c31e0e9,Id5a2b2d9,I73182c97,I2f5916d5, ... into main
* changes:
Optimize vpx_highbd_comp_avg_pred_neon
Add Neon AvgPredTestHBD test suite
Specialize Neon high bitdepth avg subpel variance by filter value
Specialize Neon high bitdepth subpel variance by filter value
Refactor Neon high bitdepth avg subpel variance functions
Optimize Neon high bitdepth subpel variance functions
Salome Thirot [Thu, 9 Feb 2023 16:45:01 +0000 (16:45 +0000)]
Specialize Neon high bitdepth avg subpel variance by filter value
Use the same specialization as for standard bitdepth. The rationale for
the specialization is as follows:
The optimal implementation of the bilinear interpolation depends on the
filter values being used. For both horizontal and vertical interpolation
this can simplify to just taking the source values, or averaging the
source and reference values - which can be computed more easily than a
bilinear interpolation with arbitrary filter values.
This patch introduces tests to find the most optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes.
Salome Thirot [Thu, 9 Feb 2023 14:16:30 +0000 (14:16 +0000)]
Specialize Neon high bitdepth subpel variance by filter value
Use the same specialization as for standard bitdepth. The rationale for
the specialization is as follows:
The optimal implementation of the bilinear interpolation depends on the
filter values being used. For both horizontal and vertical interpolation
this can simplify to just taking the source values, or averaging the
source and reference values - which can be computed more easily than a
bilinear interpolation with arbitrary filter values.
This patch introduces tests to find the most optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes.
Salome Thirot [Wed, 8 Feb 2023 16:50:59 +0000 (16:50 +0000)]
Refactor Neon high bitdepth avg subpel variance functions
Use the same general code style as in the standard bitdepth Neon
implementation - merging the computation of vpx_highbd_comp_avg_pred
with the second pass of the bilinear filter to avoid storing and loading
the block again.
Also move vpx_highbd_comp_avg_pred_neon to its own file (like the
standard bitdepth implementation) since we're no longer using it for
averaging sub-pixel variance.
Salome Thirot [Tue, 7 Feb 2023 14:08:33 +0000 (14:08 +0000)]
Optimize Neon high bitdepth subpel variance functions
Use the same general code style as in the standard bitdepth Neon
implementation. Additionally, do not unnecessarily widen to 32-bit data
types when doing bilinear filtering - allowing us to process twice as
many elements per instruction.
chiyotsai [Wed, 8 Feb 2023 21:54:46 +0000 (13:54 -0800)]
Remove CONFIG_CONSISTENT_RECODE flag
Currently, libvpx does not properly clear and re-initialize the memories
when it re-encodes a frame. As a result, out-of-date values are used in
the encoding process, and re-encoding a frame with the same parameter
will give different outputs.
This commit enables the code under CONFIG_CONSISTENT_RECODE to correct
this behavior. This change has a minor effect on the coding performance,
but it ensures valid values are used in the encoding process.
Furthermore, the flag is removed as it is now always turned on.
Jerome Jiang [Thu, 9 Feb 2023 19:37:33 +0000 (14:37 -0500)]
Merge tag 'v1.13.0'
Release v1.13.0 Ugly Duckling
2023-01-31 v1.13.0 "Ugly Duckling"
This release includes more Neon and AVX2 optimizations, adds a new codec
control to set per frame QP, upgrades GoogleTest to v1.12.1, and includes
numerous bug fixes.
- Upgrading:
This release is ABI incompatible with the previous release.
New codec control VP9E_SET_QUANTIZER_ONE_PASS to set per frame QP.
GoogleTest is upgraded to v1.12.1.
.clang-format is upgraded to clang-format-11.
VPX_EXT_RATECTRL_ABI_VERSION was bumped due to incompatible changes to the
feature of using external rate control models for vp9.
- Enhancement:
Numerous improvements on Neon optimizations.
Numerous improvements on AVX2 optimizations.
Additional ARM targets added for Visual Studio.
- Bug fixes:
Fix to calculating internal stats when frame dropped.
Fix to segfault for external resize test in vp9.
Fix to build system with replacing egrep with grep -E.
Fix to a few bugs with external RTC rate control library.
Fix to make SVC work with VBR.
Fix to key frame setting in VP9 external RC.
Fix to -Wimplicit-int (Clang 16).
Fix to VP8 external RC for buffer levels.
Fix to VP8 external RC for dynamic update of layers.
Fix to VP9 auto level.
Fix to off-by-one error of max w/h in validate_config.
Fix to make SVC work for Profile 1.
Jonathan Wright [Thu, 9 Feb 2023 11:57:10 +0000 (11:57 +0000)]
Optimize Neon high bitdepth convolve copy
Use standard loads and stores instead of the significantly slower
interleaving/de-interleaving variants. Also move all loads in loop
bodies above all stores as a mitigation against the compiler thinking
that the src and dst pointers alias (since we can't use restrict in
C89.)
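The "loads above stores" ordering can be illustrated with a scalar sketch. This is not the Neon code from the patch: copy_row_sketch is a hypothetical helper, and it assumes the row width is a multiple of 4.

```c
#include <assert.h>
#include <stdint.h>

/* Copies one row of high bitdepth samples, four at a time. */
static void copy_row_sketch(const uint16_t *src, uint16_t *dst, int w) {
  int x;
  assert(w % 4 == 0);
  for (x = 0; x < w; x += 4) {
    /* Issue every load of this iteration first ... */
    const uint16_t s0 = src[x + 0];
    const uint16_t s1 = src[x + 1];
    const uint16_t s2 = src[x + 2];
    const uint16_t s3 = src[x + 3];
    /* ... then every store, so no load has to wait on an earlier store
     * that the compiler must assume might alias src. */
    dst[x + 0] = s0;
    dst[x + 1] = s1;
    dst[x + 2] = s2;
    dst[x + 3] = s3;
  }
}
```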
Salome Thirot [Tue, 7 Feb 2023 11:28:15 +0000 (11:28 +0000)]
Use 4D reduction Neon helper for standard bitdepth SAD4D
Move the 4D reduction helper function to sum_neon.h and use this for
both standard and high bitdepth SAD4D paths. This also removes the
AArch64 requirement for using the UDOT Neon SAD4D paths.
James Zern [Tue, 7 Feb 2023 01:32:03 +0000 (01:32 +0000)]
Merge changes Ica45c44f,I75c5f099,I9e626d7f into main
* changes:
Optimize Neon implementation of high bitdepth SAD4D functions
Optimize Neon implementation of high bitdepth avg SAD functions
Optimize Neon implementation of high bitdepth SAD functions
Salome Thirot [Thu, 2 Feb 2023 16:06:38 +0000 (16:06 +0000)]
Optimize Neon implementation of high bitdepth SAD4D functions
Optimizations take a similar form to those implemented for Armv8.0
standard bitdepth SAD4D:
- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
resources on Arm CPUs that have four Neon pipes.
- Compute the four SAD sums in parallel so that we only load the source
block once - instead of four times.
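The last point, loading the source block once for all four references, can be shown with a scalar sketch. It is an illustration only: sad4d_sketch and its signature are hypothetical, and it does not model the ABD/UADALP instruction selection or the extra accumulator registers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Computes four sums of absolute differences against one source block.
 * Each source sample is read once and reused for all four references. */
static void sad4d_sketch(const uint16_t *src, int src_stride,
                         const uint16_t *const ref[4], int ref_stride,
                         int w, int h, uint32_t sad[4]) {
  int x, y, k;
  assert(w > 0 && h > 0);
  for (k = 0; k < 4; ++k) sad[k] = 0;
  for (y = 0; y < h; ++y) {
    for (x = 0; x < w; ++x) {
      const int s = src[y * src_stride + x]; /* single source load */
      for (k = 0; k < 4; ++k) {
        sad[k] += (uint32_t)abs(s - ref[k][y * ref_stride + x]);
      }
    }
  }
}
```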
Salome Thirot [Fri, 3 Feb 2023 11:00:19 +0000 (11:00 +0000)]
Optimize Neon implementation of high bitdepth avg SAD functions
Optimizations take a similar form to those implemented for standard
bitdepth averaging SAD:
- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
resources on Arm CPUs that have four Neon pipes.
Salome Thirot [Wed, 1 Feb 2023 16:37:24 +0000 (16:37 +0000)]
Optimize Neon implementation of high bitdepth SAD functions
Optimizations take a similar form to those implemented for standard
bitdepth SAD:
- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
resources on Arm CPUs that have four Neon pipes.
Yunqing Wang [Fri, 3 Feb 2023 00:30:09 +0000 (16:30 -0800)]
Fix uninitialized mesh feature for BEST mode
At BEST encoding mode, the mesh search range wasn't initialized for
the non FC_GRAPHICS_ANIMATION content type, so the encoder mistakenly
used speed 0's setting. Fixed it by adding the initialization.
There were 2 ways to fix this. Patchset 1 set to use speed 0's setting
for non FC_GRAPHICS_ANIMATION type. This didn't change BEST mode's
encoding results much, and only a couple of clips' results were changed.
Borg result for BEST mode:
          avg_psnr:  ovr_psnr:  ssim:   encoding_spdup:
lowres2:  -0.004     -0.003     -0.000  0.030
midres2:  -0.006     -0.009     -0.012  0.033
hdres2:    0.002      0.002      0.004  0.015
Patchset 2 set to use BEST's setting for non FC_GRAPHICS_ANIMATION type.
However, the majority of test clips' BDrate got changed up to
~0.5% (gain or loss), and overall it didn't give better performance
than patchset 1. So, we chose to use patchset 1.
Jonathan Wright [Tue, 31 Jan 2023 13:32:33 +0000 (13:32 +0000)]
Use load_unaligned mem_neon.h helpers in SAD and SAD4D
Use the load_unaligned helper functions in mem_neon.h to load strided
sequences of 4 bytes where alignment is not guaranteed in the Neon
SAD and SAD4D paths.
Refactor the Neon implementation of transpose_s16_8x8(q) and
transpose_u16_8x8 so that the final step compiles to 8 ZIP1/ZIP2
instructions as opposed to 8 EXT, MOV pairs. This change removes 8
instructions per call to transpose_s16_8x8(q), transpose_u16_8x8
where the result stays in registers for further processing - rather
than being stored to memory - like in vpx_hadamard_8x8_neon, for
example.
This is a backport of this libaom patch[1].
[1] https://aomedia-review.googlesource.com/c/aom/+/169426
Salome Thirot [Fri, 20 Jan 2023 11:42:06 +0000 (11:42 +0000)]
Specialize Neon averaging subpel variance by filter value
Use the same specialization for averaging subpel variance functions
as used for the non-averaging variants. The rationale for the
specialization is as follows:
The optimal implementation of the bilinear interpolation depends on
the filter values being used. For both horizontal and vertical
interpolation this can simplify to just taking the source values, or
averaging the source and reference values - which can be computed
more easily than a bilinear interpolation with arbitrary filter
values.
This patch introduces tests to find the most optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes
This is a backport of this libaom change[1].
After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].
Salome Thirot [Fri, 20 Jan 2023 11:21:02 +0000 (11:21 +0000)]
Refactor Neon averaging subpel variance functions
Merge the computation of vpx_comp_avg_pred into the second pass of the
bilinear filter - avoiding the overhead of loading and storing the
entire block again.
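A scalar sketch of the merged second pass follows. It is not the Neon implementation: second_pass_avg_sketch is a hypothetical helper, the two-tap filter { 128 - 16 * offset, 16 * offset } with 7 fractional bits is an assumed convention, and the rounded average (a + b + 1) >> 1 is assumed to match what vpx_comp_avg_pred computes.

```c
#include <assert.h>
#include <stdint.h>

/* Vertical (second) bilinear pass over a first-pass buffer of h + 1 rows
 * of width w. The filtered value is averaged with second_pred immediately,
 * so the filtered block is never stored and reloaded. */
static void second_pass_avg_sketch(const uint8_t *first_pass, int w, int h,
                                   int offset, const uint8_t *second_pred,
                                   uint8_t *dst) {
  const int f1 = 16 * offset; /* assumed tap convention */
  const int f0 = 128 - f1;
  int x, y;
  assert(offset >= 0 && offset < 8);
  for (y = 0; y < h; ++y) {
    for (x = 0; x < w; ++x) {
      const int top = first_pass[y * w + x];
      const int bot = first_pass[(y + 1) * w + x];
      const int filtered = (top * f0 + bot * f1 + 64) >> 7;
      /* Averaging step folded into the same loop iteration. */
      dst[y * w + x] = (uint8_t)((filtered + second_pred[y * w + x] + 1) >> 1);
    }
  }
}
```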
Salome Thirot [Fri, 20 Jan 2023 10:35:34 +0000 (10:35 +0000)]
Specialize Neon subpel variance by filter value for large blocks
The optimal implementation of the bilinear interpolation depends on
the filter values being used. For both horizontal and vertical
interpolation this can simplify to just taking the source values, or
averaging the source and reference values - which can be computed
more easily than a bilinear interpolation with arbitrary filter
values.
This patch introduces tests to find the most optimal bilinear
interpolation implementation based on the filter values being used.
This new specialization is only used for larger block sizes
(>= 16x16) as we need to be doing enough work to make the cost of
finding the optimal implementation worth it.
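The specialization can be sketched for one horizontal row in scalar C. The helper names are hypothetical, and the tap convention { 128 - 16 * offset, 16 * offset } with 7 fractional bits is an assumption made for this sketch. Under that convention, offset 0 reduces to a copy and offset 4 reduces to a rounded average of the two input samples.

```c
#include <assert.h>
#include <stdint.h>

/* General two-tap bilinear filter. Reads w + 1 samples from src. */
static void bilinear_row_general(const uint8_t *src, uint8_t *dst, int w,
                                 int offset) {
  const int f1 = 16 * offset; /* assumed tap convention */
  const int f0 = 128 - f1;
  int i;
  assert(offset >= 0 && offset < 8);
  for (i = 0; i < w; ++i) {
    dst[i] = (uint8_t)((src[i] * f0 + src[i + 1] * f1 + 64) >> 7);
  }
}

/* Tests the filter value once per row, then takes the cheapest path. */
static void bilinear_row_sketch(const uint8_t *src, uint8_t *dst, int w,
                                int offset) {
  int i;
  assert(offset >= 0 && offset < 8);
  if (offset == 0) {
    /* Taps { 128, 0 }: the output is the input. */
    for (i = 0; i < w; ++i) dst[i] = src[i];
  } else if (offset == 4) {
    /* Taps { 64, 64 }: a rounded average, no multiplies needed. */
    for (i = 0; i < w; ++i) dst[i] = (uint8_t)((src[i] + src[i + 1] + 1) >> 1);
  } else {
    bilinear_row_general(src, dst, w, offset);
  }
}
```

The test on offset is paid once per call, which is consistent with the commit's remark that the block must be large enough for the check to be worth its cost.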
This is a backport of this libaom change[1].
After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].
Jerome Jiang [Thu, 12 Jan 2023 20:58:00 +0000 (15:58 -0500)]
Add codec control to set per frame QP
Use case is for 1 pass encoding.
Forces max_quantizer = min_quantizer and aq-mode = 0.
Applicable to spatial layers, where user may set
the QP per spatial layer.
James Zern [Thu, 19 Jan 2023 03:19:01 +0000 (19:19 -0800)]
*/Android.mk: add a check for NDK_ROOT
This simplifies integration with the Android platform and prevents the
files from being used when a non-NDK build is performed. In that case
Android.bp is preferred.
Salome Thirot [Mon, 16 Jan 2023 16:44:04 +0000 (16:44 +0000)]
Refactor Neon implementation of variance functions
Refactor and optimize the Neon implementation of variance functions -
effectively backporting these libaom changes[1,2].
After this change, the only differences between the code in libvpx and
libaom are due to libvpx being compiled with ISO C90, which forbids
mixing declarations and code [-Wdeclaration-after-statement].
Marco Paniconi [Sat, 14 Jan 2023 03:46:10 +0000 (19:46 -0800)]
Fix to segfault for external resize test in vp9
Failure occurs for 1 pass non-realtime mode at speed 0.
This is due to the speed feature rd_ml_partition.var_pruning, which
doesn't check for scaled reference in simple_motion_search().
Jonathan Wright [Wed, 18 May 2022 15:58:50 +0000 (16:58 +0100)]
Implement vertical convolutions using Neon USDOT instruction
Add additional AArch64 paths for vpx_convolve8_vert_neon and
vpx_convolve8_avg_vert_neon that use the Armv8.6-A USDOT (mixed-sign
dot-product) instruction. The USDOT instruction takes an 8-bit
unsigned operand vector and a signed 8-bit operand vector to produce
a signed 32-bit result. This is helpful because convolution filters
often have both positive and negative values, while the 8-bit pixel
channel data being filtered is all unsigned. As a result, the USDOT
convolution paths added here do not have to do the "transform the
pixel channel data to [-128, 128) and correct for it later" dance
that we have to do with the SDOT paths.
The USDOT instruction is optional from Armv8.2 to Armv8.5 but
mandatory from Armv8.6 onwards. The availability of the USDOT
instruction is indicated by the feature macro
__ARM_FEATURE_MATMUL_INT8. The SDOT paths are retained for use on
target CPUs that do not implement the USDOT instructions.
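The difference between the two approaches can be checked with scalar arithmetic. This sketch does not use Neon intrinsics. The function names are hypothetical and the taps in the usage note are illustrative values chosen to fit in 8 bits. It only shows why a signed-only dot product needs the pixels shifted into [-128, 128) plus a correction of 128 times the sum of the taps, while a mixed-sign dot product needs neither.

```c
#include <assert.h>
#include <stdint.h>

/* Mixed-sign dot product: unsigned pixels times signed taps, as a
 * USDOT-style operation would accumulate them. */
static int32_t dot_mixed_sign(const uint8_t *px, const int8_t *taps, int n) {
  int32_t acc = 0;
  int i;
  assert(n > 0);
  for (i = 0; i < n; ++i) acc += (int32_t)px[i] * taps[i];
  return acc;
}

/* Signed-only dot product: pixels are first moved into [-128, 128), and
 * the bias this introduces is added back afterwards. */
static int32_t dot_signed_with_correction(const uint8_t *px,
                                          const int8_t *taps, int n) {
  int32_t acc = 0;
  int32_t tap_sum = 0;
  int i;
  assert(n > 0);
  for (i = 0; i < n; ++i) {
    const int8_t shifted = (int8_t)(px[i] - 128);
    acc += (int32_t)shifted * taps[i];
    tap_sum += taps[i];
  }
  /* sum(t * (p - 128)) + 128 * sum(t) == sum(t * p) */
  return acc + 128 * tap_sum;
}
```

For pixels { 0, 255, 128, 17, 200, 3, 99, 254 } and taps { -1, 6, -19, 78, 78, -19, 6, -1 }, both functions return the same value.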
Jonathan Wright [Wed, 18 May 2022 13:14:56 +0000 (14:14 +0100)]
Implement horizontal convolutions using Neon USDOT instruction
Add additional AArch64 paths for vpx_convolve8_horiz_neon and
vpx_convolve8_avg_horiz_neon that use the Armv8.6-A USDOT (mixed-sign
dot-product) instruction. The USDOT instruction takes an 8-bit
unsigned operand vector and a signed 8-bit operand vector to produce
a signed 32-bit result. This is helpful because convolution filters
often have both positive and negative values, while the 8-bit pixel
channel data being filtered is all unsigned. As a result, the USDOT
convolution paths added here do not have to do the "transform the
pixel channel data to [-128, 128) and correct for it later" dance
that we have to do with the SDOT paths.
The USDOT instruction is optional from Armv8.2 to Armv8.5 but
mandatory from Armv8.6 onwards. The availability of the USDOT
instruction is indicated by the feature macro
__ARM_FEATURE_MATMUL_INT8. The SDOT paths are retained for use on
target CPUs that do not implement the USDOT instructions.
Jonathan Wright [Thu, 5 Jan 2023 12:20:03 +0000 (12:20 +0000)]
Use lane-referencing intrinsics in Neon convolution kernels
The Neon convolution helper functions take a pointer to a filter and
load the 8 values into a single Neon register. For some reason,
filter values 3 and 4 are then duplicated into their own separate
registers.
This patch modifies these helper functions so that they access filter
values 3 and 4 via the lane-referencing versions of the various Neon
multiply instructions. This reduces register pressure and tidies up
the source code quite a bit.
Jerome Jiang [Wed, 21 Dec 2022 16:13:40 +0000 (11:13 -0500)]
Remove references to deprecated NumPy type aliases
This change replaces references to a number of deprecated NumPy type
aliases (np.bool, np.int, np.float, np.complex, np.object, np.str)
with their recommended replacement
(bool, int, float, complex, object, str).
NumPy 1.24 drops the deprecated aliases
so we must remove uses before updating NumPy.
Anton Venema [Tue, 13 Dec 2022 18:27:37 +0000 (10:27 -0800)]
Add additional ARM targets for Visual Studio.
configure: Add an armv7-win32-vs16 target
configure: Add an armv7-win32-vs17 target
configure: Add an arm64-win64-vs16 target
configure: Add an arm64-win64-vs17 target
Hirokazu Honda [Thu, 17 Nov 2022 07:05:28 +0000 (16:05 +0900)]
vp9/rate_ctrl_rtc: Improve get cyclic refresh data
A client of the vp9 rate controller needs to know whether the
segmentation is enabled and the size of delta_q. It is also nicer to
know the size of map. This CL changes the interface to achieve these.
Marco Paniconi [Tue, 15 Nov 2022 06:11:19 +0000 (22:11 -0800)]
vp9-svc: Fixes to make SVC work with VBR
Prior to this CL SVC with VBR mode was broken.
Fixes made here to make VBR rate control work for SVC.
Rename is_one_pass_cbr_svc() --> is_one_pass_svc(),
as it can be used now for both CBR and VBR.