libvpx
James Zern [Mon, 6 Mar 2023 21:56:17 +0000 (13:56 -0800)]
vpx_convolve_copy_neon: fix unaligned loads w/w==4

Fixes a -fsanitize=undefined warning:

vpx_dsp/arm/vpx_convolve_copy_neon.c:29:26: runtime error: load of
misaligned address 0xffffa8242bea for type 'const uint32_t' (aka 'const
unsigned int'), which requires 4 byte alignment
0xffffa8242bea: note: pointer points here
 88 81  7d 7d 7d 7d 7d 81 81 7d  81 80 87 97 a8 ab a0 91 ...
              ^
    #0 0xb0447c in vpx_convolve_copy_neon
       vpx_dsp/arm/vpx_convolve_copy_neon.c:29:26
    #1 0x12285c8 in inter_predictor vp9/common/vp9_reconinter.h:29:3
    #2 0x1228430 in dec_build_inter_predictors
       vp9/decoder/vp9_decodeframe.c
    ...

Change-Id: Iaec4ac2a400b6e6db72d12e5a7acb316262b12a7
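
For illustration only, a minimal sketch of one common way to avoid this class of warning: read the four bytes with memcpy instead of dereferencing a uint32_t pointer cast from a byte pointer. The helper names load_u32_unaligned and copy_w4 are hypothetical, and this is not necessarily how the commit itself fixes the load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the libvpx code: read 4 bytes from a pointer
 * that may not be 4-byte aligned. memcpy has no alignment requirement, and
 * compilers typically lower a fixed-size 4-byte copy to a single load. */
static uint32_t load_u32_unaligned(const uint8_t *src) {
  uint32_t v;
  assert(src != NULL);
  memcpy(&v, src, sizeof(v));
  return v;
}

/* Copy a 4-pixel-wide block of h rows without casting src to uint32_t *. */
static void copy_w4(const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst,
                    ptrdiff_t dst_stride, int h) {
  int y;
  for (y = 0; y < h; ++y) {
    const uint32_t row = load_u32_unaligned(src);
    memcpy(dst, &row, sizeof(row));
    src += src_stride;
    dst += dst_stride;
  }
}
```

With a source pointer such as 0xffffa8242bea from the report above, which is 2 bytes past a 4-byte boundary, the memcpy form is well defined where the pointer cast is not.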

James Zern [Fri, 3 Mar 2023 23:33:16 +0000 (15:33 -0800)]
disable vp8_sixtap_predict16x16_neon

This causes various buffer overflows in the tests:

[ RUN      ] NEON/SixtapPredictTest.TestWithPresetData/0
=================================================================
==22346==ERROR: AddressSanitizer: global-buffer-overflow on address
0x0000012b4a5b at pc 0x000000df0f60 bp 0xffffcf6e64b0 sp 0xffffcf6e64a8
READ of size 8 at 0x0000012b4a5b thread T0
    #0 0xdf0f5c in vp8_sixtap_predict16x16_neon
       vp8/common/arm/neon/sixtappredict_neon.c:1507:13
    #1 0x8819e4 in (anonymous
        namespace)::SixtapPredictTest_TestWithPresetData_Test::TestBody()
       test/predict_test.cc:293:3
    ...

0x0000012b4a5b is located 2 bytes to the right of global variable
'kTestData' defined in '../test/predict_test.cc:237:24' (0x12b48a0) of
size 441

[ RUN      ] NEON/SixtapPredictTest.TestWithRandomData/0
=================================================================
==22338==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xffff8b5321fb at pc 0x000000df0f60 bp 0xfffff7e0cf30 sp 0xfffff7e0cf28
READ of size 8 at 0xffff8b5321fb thread T0
    #0 0xdf0f5c in vp8_sixtap_predict16x16_neon
       vp8/common/arm/neon/sixtappredict_neon.c:1507:13
    #1 0x87d4c0 in (anonymous
       namespace)::PredictTestBase::TestWithRandomData(void (*)(unsigned
       char*, int, int, int, unsigned char*, int))
       test/predict_test.cc:170:9
    ...

0xffff8b5321fb is located 2 bytes to the right of 441-byte region
[0xffff8b532040,0xffff8b5321f9)
allocated by thread T0 here:
    #0 0x5fd4f0 in operator new[](unsigned long) (test_libvpx+0x5fd4f0)
    #1 0x87c2e0 in (anonymous namespace)::PredictTestBase::SetUp()
       test/predict_test.cc:47:12
    #2 0x87d074 in non-virtual thunk to (anonymous
       namespace)::PredictTestBase::SetUp() test/predict_test.cc
    ...

Bug: webm:1795
Change-Id: I32213a381eef91547d00f88acf90f1cf2ec2ea75

James Zern [Fri, 3 Mar 2023 20:56:29 +0000 (20:56 +0000)]
disable vpx_get4x4sse_cs_neon

This function causes a heap overflow in the tests:
[ RUN      ] NEON/VpxSseTest.RefSse/0
=================================================================
==876922==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xffff8949d903 at pc 0x000000dd95d4 bp 0xfffffdd7f260 sp 0xfffffdd7f258
READ of size 8 at 0xffff8949d903 thread T0
    #0 0xdd95d0 in vpx_get4x4sse_cs_neon
       vpx_dsp/arm/variance_neon.c:556:10
    #1 0x9d4894 in (anonymous namespace)::MainTestClass<unsigned int
       (*)(unsigned char const*, int, unsigned char const*,
           int)>::RefTestSse() test/variance_test.cc:531:5
    #2 0x9d4894 in (anonymous
       namespace)::VpxSseTest_RefSse_Test::TestBody()
           test/variance_test.cc:772:30
    ...

0xffff8949d903 is located 3 bytes to the right of 16-byte region
[0xffff8949d8f0,0xffff8949d900)
allocated by thread T0 here:
    #0 0x5fd050 in operator new[](unsigned long) (test_libvpx+0x5fd050)
    #1 0x9d3e04 in (anonymous namespace)::MainTestClass<unsigned int
       (*)(unsigned char const*, int, unsigned char const*,
           int)>::SetUp() test/variance_test.cc:299:12

Bug: webm:1794
Change-Id: I4bc681eb9a436743ef8bfe2a2abae59ce754309c

James Zern [Fri, 3 Mar 2023 20:34:36 +0000 (12:34 -0800)]
Revert "Implement d117_predictor using Neon"

This reverts commit 360e9069b6cc1dd3a004728b876fb923413f4b11.

This causes ASan errors:
[ RUN      ] VP9/TestVectorTest.MD5Match/1
=================================================================
==837858==ERROR: AddressSanitizer: stack-buffer-overflow on address
0xffff82ecad40 at pc 0x000000c494d4 bp 0xffffe1695800 sp 0xffffe16957f8
READ of size 16 at 0xffff82ecad40 thread T0
    #0 0xc494d0 in vpx_d117_predictor_32x32_neon (test_libvpx+0xc494d0)
    #1 0x1040b34 in vp9_predict_intra_block (test_libvpx+0x1040b34)
    #2 0xf8feec in decode_block (test_libvpx+0xf8feec)
    #3 0xf8f588 in decode_partition (test_libvpx+0xf8f588)
    #4 0xf7be5c in vp9_decode_frame (test_libvpx+0xf7be5c)
    ...
Address 0xffff82ecad40 is located in stack of thread T0 at offset 64 in
frame
    #0 0x103fd3c in vp9_predict_intra_block (test_libvpx+0x103fd3c)

  This frame has 2 object(s):
    [32, 64) 'left_col.i' <== Memory access at offset 64 overflows this
                              variable
    [96, 176) 'above_data.i'

Change-Id: I058213364617dfe1036126c33a3307f8288d9ae0

Johann [Fri, 3 Mar 2023 03:46:01 +0000 (12:46 +0900)]
Revert "Allow macroblock_plane to have its own rounding buffer"

This reverts commit 5359ae810cdbb974060297ecf935183baf7b009b.

Reason for revert: Blocks quantize cleanups

Original change's description:
> Allow macroblock_plane to have its own rounding buffer
>
> Add 8 bytes buffer to macroblock_plane to support rounding factor.
>
> Change-Id: I3751689e4449c0caea28d3acf6cd17d7f39508ed

Change-Id: Ia2424d2114207370f0b45350313a5ff8521d25a8

James Zern [Wed, 1 Mar 2023 23:53:18 +0000 (15:53 -0800)]
Revert "quantize: simplify 32x32_b args"

This reverts commit 848f6e733789c627b6606baf1c85e32be997e36f.

This has alignment issues, causing crashes in the tests:
SSSE3/VP9QuantizeTest.EOBCheck/*

Change-Id: Ic12014ab0a78ed3cde02d642509061552cdc8fc9

James Zern [Wed, 1 Mar 2023 23:53:14 +0000 (15:53 -0800)]
Revert "quantize: simplifly highbd 32x32_b args"

This reverts commit 573f5e662b544dbc553d73fa2b61055c30dfe8cc.

This has alignment issues, causing crashes in the tests:
SSSE3/VP9QuantizeTest.EOBCheck/*

Change-Id: Ibf05e6b116c46f6e2c11187b3e3578bbd2d2c227

James Zern [Wed, 1 Mar 2023 23:52:20 +0000 (15:52 -0800)]
Revert "quantize: use scan_order instead of passing scan/iscan"

This reverts commit 14fc40040ff30486c45111056db44ee18590a24a.

This has alignment issues, causing crashes in the tests:
SSSE3/VP9QuantizeTest.EOBCheck/*

Change-Id: I934f9a4c3ce3db33058a65180fa645c8649c3670

James Zern [Wed, 1 Mar 2023 23:13:34 +0000 (23:13 +0000)]
Merge "Optimize Neon implementation of high bitdepth MSE functions" into main

James Zern [Wed, 1 Mar 2023 20:14:51 +0000 (12:14 -0800)]
Revert "Implement highbd_d63_predictor using Neon"

This reverts commit 7cdf139e3d6237386e0f93bdb0bdc1b459c663bf.

This causes failures in the VP9/ExternalFrameBufferMD5Test and
VP9/TestVectorTest.MD5Match tests in both armv7 and aarch64 builds.

Change-Id: I7ac4ba0ddc70e7e7860df9f962e6658defe1cdd5

Salome Thirot [Mon, 27 Feb 2023 17:58:18 +0000 (17:58 +0000)]
Optimize Neon implementation of high bitdepth MSE functions

Currently MSE functions just call the variance helpers but don't
actually use the computed sum. This patch adds dedicated helpers to
perform the computation of sse.

Add the corresponding tests as well.

Change-Id: I96a8590e3410e84d77f7187344688e02efe03902
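
As a scalar illustration of why a dedicated helper saves work, the sketch below contrasts a variance-style loop, which tracks both the sum and the sum of squared differences, with an SSE-only loop. The function names and signatures are hypothetical and do not mirror the Neon helpers in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Variance-style helper (hypothetical): tracks the sum of differences and
 * the sum of squared differences, because variance needs both. */
static void sum_and_sse(const uint16_t *src, int src_stride,
                        const uint16_t *ref, int ref_stride, int w, int h,
                        int64_t *sum, uint64_t *sse) {
  int x, y;
  assert(sum != NULL && sse != NULL);
  *sum = 0;
  *sse = 0;
  for (y = 0; y < h; ++y) {
    for (x = 0; x < w; ++x) {
      const int32_t diff = (int32_t)src[x] - (int32_t)ref[x];
      *sum += diff;
      *sse += (uint64_t)((int64_t)diff * diff);
    }
    src += src_stride;
    ref += ref_stride;
  }
}

/* MSE-style helper (hypothetical): the same loop without the sum, so one
 * accumulator and one add per sample disappear. */
static uint64_t sse_only(const uint16_t *src, int src_stride,
                         const uint16_t *ref, int ref_stride, int w, int h) {
  uint64_t sse = 0;
  int x, y;
  for (y = 0; y < h; ++y) {
    for (x = 0; x < w; ++x) {
      const int32_t diff = (int32_t)src[x] - (int32_t)ref[x];
      sse += (uint64_t)((int64_t)diff * diff);
    }
    src += src_stride;
    ref += ref_stride;
  }
  return sse;
}
```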

Johann [Mon, 14 Nov 2022 07:47:33 +0000 (16:47 +0900)]
quantize: use scan_order instead of passing scan/iscan

Further reduces the number of arguments for the 32x32 quantize functions.
This will be applied to the base version as well.

Change-Id: I25a162b5248b14af53d9e20c6a7fa2a77028a6d1

Johann [Fri, 11 Nov 2022 23:23:17 +0000 (08:23 +0900)]
quantize: simplifly highbd 32x32_b args

Change-Id: I431a41279c4c4193bc70cfe819da6ea7e1d2fba1

James Zern [Tue, 28 Feb 2023 21:50:11 +0000 (21:50 +0000)]
Merge changes I892fbd2c,Ic59df16c,I7228327b,Ib4a1a2cb into main

* changes:
  Implement highbd_d117_predictor using Neon
  Implement highbd_d63_predictor using Neon
  Implement d117_predictor using Neon
  Implement d63_predictor using Neon

James Zern [Tue, 28 Feb 2023 21:40:26 +0000 (21:40 +0000)]
Merge "quantize: simplify 32x32_b args" into main

George Steed [Tue, 21 Feb 2023 11:17:10 +0000 (11:17 +0000)]
Implement highbd_d117_predictor using Neon

Add Neon implementations of the highbd d117 predictor for 4x4, 8x8,
16x16 and 32x32 block sizes. Also update tests to add new corresponding
cases.

An explanation of the general implementation strategy is given in the
8x8 implementation body, and is mostly identical to the non-highbd
version.

Speedups over the C code (higher is better):

Microarch.  | Compiler | Block | Speedup
Neoverse N1 |  LLVM 15 |   4x4 |    1.99
Neoverse N1 |  LLVM 15 |   8x8 |    4.37
Neoverse N1 |  LLVM 15 | 16x16 |    6.81
Neoverse N1 |  LLVM 15 | 32x32 |    6.49
Neoverse N1 |   GCC 12 |   4x4 |    2.49
Neoverse N1 |   GCC 12 |   8x8 |    4.10
Neoverse N1 |   GCC 12 | 16x16 |    5.58
Neoverse N1 |   GCC 12 | 32x32 |    2.16
Neoverse V1 |  LLVM 15 |   4x4 |    1.99
Neoverse V1 |  LLVM 15 |   8x8 |    5.03
Neoverse V1 |  LLVM 15 | 16x16 |    6.61
Neoverse V1 |  LLVM 15 | 32x32 |    6.01
Neoverse V1 |   GCC 12 |   4x4 |    2.09
Neoverse V1 |   GCC 12 |   8x8 |    4.52
Neoverse V1 |   GCC 12 | 16x16 |    4.23
Neoverse V1 |   GCC 12 | 32x32 |    2.70

Change-Id: I892fbd2c17ac527ddc22b91acca907ffc84c5cd2

George Steed [Mon, 20 Feb 2023 11:41:40 +0000 (11:41 +0000)]
Implement highbd_d63_predictor using Neon

Add Neon implementations of the highbd d63 predictor for 4x4, 8x8, 16x16
and 32x32 block sizes. Also update tests to add new corresponding cases.

Speedups over the C code (higher is better):

Microarch.  | Compiler | Block | Speedup
Neoverse N1 |  LLVM 15 |   4x4 |    2.43
Neoverse N1 |  LLVM 15 |   8x8 |    4.03
Neoverse N1 |  LLVM 15 | 16x16 |    3.07
Neoverse N1 |  LLVM 15 | 32x32 |    4.11
Neoverse N1 |   GCC 12 |   4x4 |    2.92
Neoverse N1 |   GCC 12 |   8x8 |    7.20
Neoverse N1 |   GCC 12 | 16x16 |    4.43
Neoverse N1 |   GCC 12 | 32x32 |    3.18
Neoverse V1 |  LLVM 15 |   4x4 |    1.99
Neoverse V1 |  LLVM 15 |   8x8 |    3.66
Neoverse V1 |  LLVM 15 | 16x16 |    3.60
Neoverse V1 |  LLVM 15 | 32x32 |    3.29
Neoverse V1 |   GCC 12 |   4x4 |    2.39
Neoverse V1 |   GCC 12 |   8x8 |    4.76
Neoverse V1 |   GCC 12 | 16x16 |    3.29
Neoverse V1 |   GCC 12 | 32x32 |    2.43

Change-Id: Ic59df16ceeb468003754b4374be2f4d9af6589e4

George Steed [Tue, 7 Feb 2023 12:16:00 +0000 (12:16 +0000)]
Implement d117_predictor using Neon

Add Neon implementations of the d117 predictor for 4x4, 8x8, 16x16 and
32x32 block sizes. Also update tests to add new corresponding cases.

An explanation of the general implementation strategy is given in the
8x8 implementation body.

Speedups over the C code (higher is better):

Microarch.  | Compiler | Block | Speedup
Neoverse N1 |  LLVM 15 |   4x4 |    1.73
Neoverse N1 |  LLVM 15 |   8x8 |    5.24
Neoverse N1 |  LLVM 15 | 16x16 |    9.77
Neoverse N1 |  LLVM 15 | 32x32 |   14.13
Neoverse N1 |   GCC 12 |   4x4 |    2.04
Neoverse N1 |   GCC 12 |   8x8 |    4.70
Neoverse N1 |   GCC 12 | 16x16 |    8.64
Neoverse N1 |   GCC 12 | 32x32 |    4.57
Neoverse V1 |  LLVM 15 |   4x4 |    1.75
Neoverse V1 |  LLVM 15 |   8x8 |    6.79
Neoverse V1 |  LLVM 15 | 16x16 |    9.16
Neoverse V1 |  LLVM 15 | 32x32 |   14.47
Neoverse V1 |   GCC 12 |   4x4 |    1.75
Neoverse V1 |   GCC 12 |   8x8 |    6.00
Neoverse V1 |   GCC 12 | 16x16 |    7.63
Neoverse V1 |   GCC 12 | 32x32 |    4.32

Change-Id: I7228327b5be27ee7a68deecafa05be0bd2a40ff4

George Steed [Fri, 3 Feb 2023 17:12:46 +0000 (17:12 +0000)]
Implement d63_predictor using Neon

Add Neon implementations of the d63 predictor for 4x4, 8x8, 16x16 and
32x32 block sizes. Also update tests to add new corresponding cases.

Speedups over the C code (higher is better):

Microarch.  | Compiler | Block | Speedup
Neoverse N1 |  LLVM 15 |   4x4 |    2.10
Neoverse N1 |  LLVM 15 |   8x8 |    4.45
Neoverse N1 |  LLVM 15 | 16x16 |    4.74
Neoverse N1 |  LLVM 15 | 32x32 |    2.27
Neoverse N1 |   GCC 12 |   4x4 |    2.46
Neoverse N1 |   GCC 12 |   8x8 |   10.37
Neoverse N1 |   GCC 12 | 16x16 |   11.46
Neoverse N1 |   GCC 12 | 32x32 |    6.57
Neoverse V1 |  LLVM 15 |   4x4 |    2.24
Neoverse V1 |  LLVM 15 |   8x8 |    3.53
Neoverse V1 |  LLVM 15 | 16x16 |    4.44
Neoverse V1 |  LLVM 15 | 32x32 |    2.17
Neoverse V1 |   GCC 12 |   4x4 |    2.25
Neoverse V1 |   GCC 12 |   8x8 |    7.67
Neoverse V1 |   GCC 12 | 16x16 |    8.97
Neoverse V1 |   GCC 12 | 32x32 |    4.77

Change-Id: Ib4a1a2cb5a5c4495ae329529f8847664cbd0dfe0

Johann [Sat, 5 Nov 2022 00:53:07 +0000 (09:53 +0900)]
quantize: simplify 32x32_b args

Now that all the implementations of the 32x32 quantize are in
intrinsics we can reference struct members directly. Saves
pushing them to the stack.

n_coeffs is not used at all for this function.

Change-Id: I2104fea3fa20c455087e21b347d6abd7ea1f3e1e

James Zern [Tue, 28 Feb 2023 02:44:28 +0000 (02:44 +0000)]
Merge "Add Neon implementations of standard bitdepth MSE functions" into main

James Zern [Tue, 28 Feb 2023 02:36:41 +0000 (02:36 +0000)]
Merge "Optimize transpose_neon.h helper functions" into main

James Zern [Mon, 27 Feb 2023 21:48:47 +0000 (13:48 -0800)]
tools_common,VpxInterface: remove unneeded const

Change-Id: Ic309aab2ff1750bdbcc36e8aafe05d52930ba694

James Zern [Mon, 27 Feb 2023 19:52:18 +0000 (19:52 +0000)]
Merge "tools_common,VpxInterface: fix interface fn ptr proto" into main

Salome Thirot [Fri, 24 Feb 2023 18:05:43 +0000 (18:05 +0000)]
Add Neon implementations of standard bitdepth MSE functions

Currently only vpx_mse16x16 has a Neon implementation. This patch adds
optimized Armv8.0 and Armv8.4 dot-product paths for all block sizes:
8x8, 8x16, 16x8 and 16x16.

Add the corresponding tests as well.

Change-Id: Ib0357fdcdeb05860385fec89633386e34395e260

Jonathan Wright [Sat, 25 Feb 2023 00:43:46 +0000 (00:43 +0000)]
Optimize transpose_neon.h helper functions

1) Use vtrn[12]q_[su]64 in vpx_vtrnq_[su]64* helpers on AArch64
   targets. This produces half as many TRN1/2 instructions compared to
   the number of MOVs that result from vcombine.

2) Use vpx_vtrnq_[su]64* helpers wherever applicable.

3) Refactor transpose_4x8_s16 to operate on 128-bit vectors.

Change-Id: I9a8b1c1fe2a98a429e0c5f39def5eb2f65759127

James Zern [Sat, 25 Feb 2023 03:25:39 +0000 (19:25 -0800)]
tools_common,VpxInterface: fix interface fn ptr proto

Use (void) to indicate an empty parameter list and match the declaration
of vpx_codec_vp[89]_[cd]x. This fixes a cfi sanitizer error.

Change-Id: I190f432eea4d1765afffd84c7458ec44d863f90c
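
A small sketch of the pattern, using hypothetical names (codec_iface_t, interface_entry_t, get_codec_iface) rather than the real tools_common declarations. Before C23, an empty parameter list in a declaration leaves the parameters unspecified, so a pointer declared with () does not have the same type as a function declared with (void). An indirect-call checker that compares types can flag that mismatch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a codec interface descriptor. */
typedef struct codec_iface {
  const char *name;
} codec_iface_t;

/* Declared with (void): this function takes no arguments. */
static const codec_iface_t *get_codec_iface(void) {
  static const codec_iface_t iface = { "example" };
  return &iface;
}

typedef struct interface_entry {
  const char *name;
  /* (void) rather than (): the pointer type now matches the declaration of
   * the function it points to exactly. */
  const codec_iface_t *(*codec_interface)(void);
} interface_entry_t;

/* Calls through the pointer; the call site and the callee agree on type. */
static const char *lookup_name(const interface_entry_t *entry) {
  assert(entry != NULL && entry->codec_interface != NULL);
  return entry->codec_interface()->name;
}
```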

James Zern [Fri, 24 Feb 2023 17:58:15 +0000 (17:58 +0000)]
Merge changes I65d86038,If3299fe5,I3ef1ff19 into main

* changes:
  Add Neon implementation of high bitdepth 32x32 hadamard transform
  Add Neon implementation of high bitdepth 16x16 hadamard transform
  Add Neon implementation of high bitdepth 8x8 hadamard transform

James Zern [Fri, 24 Feb 2023 17:49:25 +0000 (17:49 +0000)]
Merge changes Ia64d175a,Ie4ea8f0a into main

* changes:
  vp9_loop_filter_alloc: clear -Wshadow warnings
  vp9_adapt_mode_probs: clear -Wshadow warning

Salome Thirot [Thu, 23 Feb 2023 12:05:30 +0000 (12:05 +0000)]
Add Neon implementation of high bitdepth 32x32 hadamard transform

Add Neon implementation of vpx_highbd_hadamard_32x32 as well as the
corresponding tests.

Change-Id: I65d8603896649de1996b353aa79eee54824b4708

Salome Thirot [Wed, 22 Feb 2023 17:27:56 +0000 (17:27 +0000)]
Add Neon implementation of high bitdepth 16x16 hadamard transform

Add Neon implementation of vpx_highbd_hadamard_16x16 as well as the
corresponding tests.

Change-Id: If3299fe556351dfe3db994ac171d83a95ea1504b

Jerome Jiang [Fri, 24 Feb 2023 01:45:54 +0000 (01:45 +0000)]
Merge "vp9 rc test: change param type to bool" into main

Jerome Jiang [Thu, 23 Feb 2023 19:28:30 +0000 (14:28 -0500)]
vp9 rc test: change param type to bool

Change-Id: Ib45522e32d9137678da9062830044e9dd87537e5

Chi Yo Tsai [Thu, 23 Feb 2023 18:01:05 +0000 (18:01 +0000)]
Merge "Disable some intra modes for TX_32X32" into main

Salome Thirot [Tue, 21 Feb 2023 17:40:20 +0000 (17:40 +0000)]
Add Neon implementation of high bitdepth 8x8 hadamard transform

Add Neon implementation of vpx_highbd_hadamard_8x8 as well as the
corresponding tests.

Change-Id: I3ef1ff199d76b6b010591ef15a81b0f36c9ded03

James Zern [Wed, 22 Feb 2023 21:25:29 +0000 (13:25 -0800)]
vp9_loop_filter_alloc: clear -Wshadow warnings

Bug: webm:1793
Change-Id: Ia64d175aa69dc2ecde2babf64bde04f02b32795b

James Zern [Wed, 22 Feb 2023 21:21:27 +0000 (13:21 -0800)]
vp9_adapt_mode_probs: clear -Wshadow warning

Bug: webm:1793
Change-Id: Ie4ea8f0a3295e6f58dc6f7d5c61d46700c539d40

James Zern [Thu, 23 Feb 2023 06:07:25 +0000 (06:07 +0000)]
Merge "vp9_block.h: rename diff struct to Diff" into main

chiyotsai [Wed, 22 Feb 2023 20:44:47 +0000 (12:44 -0800)]
Disable some intra modes for TX_32X32

Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |  SSIM   | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    | hdres2  | +0.036%  | +0.032%  | +0.014% | -3.9% |
|    0    | lowres2 | -0.002%  | -0.011%  | +0.020% | -3.6% |
|    0    | midres2 | +0.045%  | +0.025%  | -0.007% | -4.0% |

STATS_CHANGED

Change-Id: I75a927333d26f2a37f0dda57a641b455b845f5b9

James Zern [Wed, 22 Feb 2023 20:54:21 +0000 (12:54 -0800)]
vpx_subpixel_8t_intrin_avx2: clear -Wshadow warnings

no changes to assembly

Bug: webm:1793
Change-Id: I6a82290cafee7f4a7909d497ccfdefd5a78fb8ed

James Zern [Wed, 22 Feb 2023 19:34:30 +0000 (11:34 -0800)]
vp9_block.h: rename diff struct to Diff

This matches the style guide and fixes some -Wshadow warnings related to
variables with the same name. Something similar was done in libaom in:
863b04994b Fix warnings reported by -Wshadow: Part2: av1 directory

Bug: webm:1793
Change-Id: I4df1bbc8d079a3174d75f0d35d54c200ffdbb677

Yunqing Wang [Wed, 22 Feb 2023 19:28:17 +0000 (19:28 +0000)]
Merge "Skip redundant iterations in joint motion search " into main

Jerome Jiang [Wed, 22 Feb 2023 14:59:49 +0000 (14:59 +0000)]
Merge "vp9 rc: Make it work for SVC parallel encoding" into main

Salome Thirot [Mon, 13 Feb 2023 16:11:31 +0000 (16:11 +0000)]
Optimize Neon implementation of high bitpdeth variance functions

Specialize implementation of high bitdepth variance functions such that
we only widen data processing element types when absolutely necessary.

Change-Id: If4cc3fea7b5ab0821e3129ebd79ff63706a512bf

Deepa K G [Thu, 16 Feb 2023 16:17:24 +0000 (21:47 +0530)]
Skip redundant iterations in joint motion search 

In joint_motion_search, there are four iterations.
Even iterations search in the first reference frame
and odd iterations search in the second. The last two
iterations use the search result of the first two
iterations as the start point. If the search result does
not change, the last two iterations are unnecessary and can
be skipped.

| cpu-used | Instruction Count Reduction (%) |
|----------|---------------------------------|
|    0     |              1.411              |

Change-Id: Ie583c9f75dd0a22bbdfb432ccdd62eea6ec4fce8
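
The control flow described above can be sketched roughly as follows. This is one reading of the description, not the encoder's code: mv_t, search_fn and joint_search_sketch are hypothetical names, and "does not change" is taken to mean that both motion vectors still equal their starting values after the first two iterations.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int row;
  int col;
} mv_t;

/* Hypothetical search callback: refine the vector for one reference frame. */
typedef mv_t (*search_fn)(int ref_idx, mv_t start);

static int mv_equal(mv_t a, mv_t b) {
  return a.row == b.row && a.col == b.col;
}

/* Runs up to four iterations, alternating between the two reference frames.
 * Returns the number of iterations actually executed. */
static int joint_search_sketch(mv_t mv[2], search_fn search) {
  mv_t initial[2];
  int ite;
  assert(search != NULL);
  initial[0] = mv[0];
  initial[1] = mv[1];
  for (ite = 0; ite < 4; ++ite) {
    const int id = ite & 1; /* even: first reference, odd: second */
    /* If the first two iterations left both vectors unchanged, skip the
     * last two (this sketch assumes they would start from identical state). */
    if (ite == 2 && mv_equal(mv[0], initial[0]) &&
        mv_equal(mv[1], initial[1])) {
      break;
    }
    mv[id] = search(id, mv[id]);
  }
  return ite;
}

/* Two toy callbacks used to exercise both paths. */
static mv_t search_no_change(int ref_idx, mv_t start) {
  (void)ref_idx;
  return start;
}

static mv_t search_step_right(int ref_idx, mv_t start) {
  (void)ref_idx;
  start.col += 1;
  return start;
}
```

With search_no_change the loop stops after two iterations; with search_step_right all four run.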

Jerome Jiang [Fri, 11 Nov 2022 19:21:27 +0000 (14:21 -0500)]
vp9 rc: Make it work for SVC parallel encoding

Added unit test.

Keep track of the spatial layer id and frame type in the case where
spatial layers are encoded in parallel by the hardware encoder.

ComputeQP() / PostEncodeUpdate() don't need to be called sequentially
when there is no inter-layer prediction.

Bug: b/257368998
Change-Id: I50beaefcfc205d3f9a9d3dbe11fead5bfdc71489

Jerome Jiang [Fri, 17 Feb 2023 02:11:31 +0000 (02:11 +0000)]
Merge "vp9 rc: Verify QP for all spatial layers" into main

Jerome Jiang [Thu, 16 Feb 2023 22:48:49 +0000 (17:48 -0500)]
vp9 rc: Verify QP for all spatial layers

Change-Id: Ic669c96d25d7c039d370e9acd00dc45e09054552

chiyotsai [Tue, 14 Feb 2023 22:29:29 +0000 (14:29 -0800)]
Relax frame recode tolerance on speed 0 to 1 above 480p

Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |  SSIM   | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    | hdres2  | -0.028%  | +0.030%  | -0.408% | -2.0% |
|    0    | lowres2 | +0.000%  | +0.000%  | +0.000% | +0.0% |
|    0    | midres2 | -0.138%  | +0.042%  | -0.427% | -2.5% |
|---------|---------|----------|----------|---------|-------|
|    1    | hdres2  | -0.032%  | +0.018%  | -0.342% | -1.1% |
|    1    | lowres2 | +0.000%  | +0.000%  | +0.000% | +0.0% |
|    1    | midres2 | +0.050%  | +0.060%  | -0.257% | -1.6% |

Rate Error:
|         |         |     AVG_RC_ERROR    |     MAX_RC_ERROR    |
|         |         |---------------------|---------------------|
| SPD_SET | TESTSET |   BASE   |   TEST   |   BASE   |   TEST   |
|---------|---------|----------|----------|----------|----------|
|    0    | hdres2  |  33.044% |  33.065% | 149.903% | 149.903% |
|    0    | midres2 |  59.632% |  59.566% |  79.091% |  79.249% |
|---------|---------|----------|----------|----------|----------|
|    1    | hdres2  |  33.050% |  33.057% | 151.278% | 151.278% |
|    1    | midres2 |  59.640% |  59.614% |  78.707% |  78.842% |

STATS_CHANGED

Change-Id: I5d09601fede3912d5173717ce9dd070df3a97ec8

chiyotsai [Tue, 14 Feb 2023 01:57:26 +0000 (17:57 -0800)]
Enable some more speed features on speed 0 to 2

Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |  SSIM   | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    | hdres2  | +0.034%  | +0.030%  | +0.033% | -3.7% |
|    0    | lowres2 | +0.012%  | +0.017%  | +0.044% | -2.1% |
|    0    | midres2 | +0.030%  | +0.035%  | +0.060% | -1.9% |
|---------|---------|----------|----------|---------|-------|
|    1    | hdres2  | +0.027%  | +0.036%  | +0.030% | -2.7% |
|    1    | lowres2 | -0.006%  | -0.002%  | +0.006% | -1.0% |
|    1    | midres2 | -0.006%  | -0.012%  | -0.010% | -1.0% |
|---------|---------|----------|----------|---------|-------|
|    2    | hdres2  | -0.006%  | -0.001%  | -0.020% | -2.4% |
|    2    | lowres2 | -0.010%  | -0.015%  | -0.001% | -0.9% |
|    2    | midres2 | +0.006%  | -0.005%  | +0.009% | -1.0% |

STATS_CHANGED

Change-Id: I1431ac07215bb844739a410697387b9aead82792

James Zern [Tue, 14 Feb 2023 02:46:51 +0000 (02:46 +0000)]
Merge changes Id74a6d9c,I5c31e0e9,Id5a2b2d9,I73182c97,I2f5916d5, ... into main

* changes:
  Optimize vpx_highbd_comp_avg_pred_neon
  Add Neon AvgPredTestHBD test suite
  Specialize Neon high bitdepth avg subpel variance by filter value
  Specialize Neon high bitdepth subpel variance by filter value
  Refactor Neon high bitdepth avg subpel variance functions
  Optimize Neon high bitdepth subpel variance functions

Salome Thirot [Fri, 10 Feb 2023 10:50:47 +0000 (10:50 +0000)]
Optimize vpx_highbd_comp_avg_pred_neon

Optimize the implementation of vpx_highbd_comp_avg_pred_neon by making
use of the URHADD instruction to compute the average.

Change-Id: Id74a6d9c33e89bc548c3c7ecace59af69051b4a7
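
For reference, URHADD is an unsigned rounding halving add: each lane computes (a + b + 1) >> 1 without overflowing the lane width. On 16-bit lanes the corresponding Neon intrinsic is vrhaddq_u16. The scalar model below is only a sketch of that arithmetic; the function names are hypothetical and it is not the Neon implementation from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an unsigned rounding halving add on one 16-bit lane.
 * The sum is formed in 32 bits so that a + b + 1 cannot wrap. */
static uint16_t rounding_avg_u16(uint16_t a, uint16_t b) {
  return (uint16_t)(((uint32_t)a + (uint32_t)b + 1u) >> 1);
}

/* Hypothetical scalar compound-average predictor over n samples. */
static void comp_avg_pred_sketch(uint16_t *comp_pred, const uint16_t *pred,
                                 const uint16_t *ref, int n) {
  int i;
  assert(comp_pred != NULL && pred != NULL && ref != NULL);
  for (i = 0; i < n; ++i) {
    comp_pred[i] = rounding_avg_u16(pred[i], ref[i]);
  }
}
```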

Salome Thirot [Fri, 10 Feb 2023 10:29:24 +0000 (10:29 +0000)]
Add Neon AvgPredTestHBD test suite

Add test suite for vpx_highbd_comp_avg_pred_neon.

Change-Id: I5c31e0e990661ee3b8030bb517829c088fceae4d

Salome Thirot [Thu, 9 Feb 2023 16:45:01 +0000 (16:45 +0000)]
Specialize Neon high bitdepth avg subpel variance by filter value

Use the same specialization as for standard bitdepth. The rationale for
the specialization is as follows:

The optimal implementation of the bilinear interpolation depends on the
filter values being used. For both horizontal and vertical interpolation
this can simplify to just taking the source values, or averaging the
source and reference values - which can be computed more easily than a
bilinear interpolation with arbitrary filter values.

This patch introduces tests to find the optimal bilinear interpolation
implementation based on the filter values being used.
This new specialization is only used for larger block sizes.

Change-Id: Id5a2b2d9fac6f878795a6ed9de2bc27d9e62d661
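
A scalar sketch of the idea, under stated assumptions: the filter is a two-tap kernel with taps (8 - offset, offset) for a 3-bit offset, and the result is rounded and shifted right by 3. That tap format is an assumption for illustration, as are the function names; the commit does not spell either out. Under that assumption, offset 0 reduces to a copy and offset 4 reduces to a rounding average of the two samples the filter reads.

```c
#include <assert.h>
#include <stdint.h>

/* General two-tap filter. Assumes taps (8 - offset, offset), 0 <= offset < 8.
 * Reads src[i] and src[i + pixel_step] for each of the n outputs. */
static void bilinear_generic(const uint16_t *src, uint16_t *dst, int n,
                             int pixel_step, int offset) {
  int i;
  assert(offset >= 0 && offset < 8);
  for (i = 0; i < n; ++i) {
    const uint32_t a = src[i];
    const uint32_t b = src[i + pixel_step];
    dst[i] =
        (uint16_t)((a * (uint32_t)(8 - offset) + b * (uint32_t)offset + 4) >>
                   3);
  }
}

/* Picks a cheaper path when the filter value allows it. */
static void bilinear_specialized(const uint16_t *src, uint16_t *dst, int n,
                                 int pixel_step, int offset) {
  int i;
  assert(offset >= 0 && offset < 8);
  if (offset == 0) {
    /* Taps (8, 0): the output is the source sample itself. */
    for (i = 0; i < n; ++i) dst[i] = src[i];
  } else if (offset == 4) {
    /* Taps (4, 4): (4a + 4b + 4) >> 3 equals (a + b + 1) >> 1. */
    for (i = 0; i < n; ++i) {
      dst[i] =
          (uint16_t)(((uint32_t)src[i] + src[i + pixel_step] + 1u) >> 1);
    }
  } else {
    bilinear_generic(src, dst, n, pixel_step, offset);
  }
}
```

Both functions produce identical output for every offset; the specialized one only does less arithmetic on the two special cases.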

Salome Thirot [Thu, 9 Feb 2023 14:16:30 +0000 (14:16 +0000)]
Specialize Neon high bitdepth subpel variance by filter value

Use the same specialization as for standard bitdepth. The rationale for
the specialization is as follows:

The optimal implementation of the bilinear interpolation depends on the
filter values being used. For both horizontal and vertical interpolation
this can simplify to just taking the source values, or averaging the
source and reference values - which can be computed more easily than a
bilinear interpolation with arbitrary filter values.

This patch introduces tests to find the optimal bilinear interpolation
implementation based on the filter values being used.
This new specialization is only used for larger block sizes.

Change-Id: I73182c979255f0332a274f2e5907df7f38c9eeb3

Salome Thirot [Wed, 8 Feb 2023 16:50:59 +0000 (16:50 +0000)]
Refactor Neon high bitdepth avg subpel variance functions

Use the same general code style as in the standard bitdepth Neon
implementation - merging the computation of vpx_highbd_comp_avg_pred
with the second pass of the bilinear filter to avoid storing and loading
the block again.

Also move vpx_highbd_comp_avg_pred_neon to its own file (like the
standard bitdepth implementation) since we're no longer using it for
averaging sub-pixel variance.

Change-Id: I2f5916d5b397db44b3247b478ef57046797dae6c

Salome Thirot [Tue, 7 Feb 2023 14:08:33 +0000 (14:08 +0000)]
Optimize Neon high bitdepth subpel variance functions

Use the same general code style as in the standard bitdepth Neon
implementation. Additionally, do not unnecessarily widen to 32-bit data
types when doing bilinear filtering - allowing us to process twice as
many elements per instruction.

Change-Id: I1e178991d2aa71f5f77a376e145d19257481e90f
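
A back-of-the-envelope check of why 16 bits can be enough, assuming (this is not stated in the commit) samples of at most 12 bits and two filter taps that sum to 8: the largest intermediate value is 4095 * 8 + 4 = 32764, which fits in a uint16_t. The function name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two-tap bilinear filter kept entirely in 16 bits. Assumes taps
 * (8 - offset, offset) with 0 <= offset < 8 and samples of at most 12 bits,
 * so the accumulator never exceeds 4095 * 8 + 4 = 32764. */
static uint16_t bilinear_u16(uint16_t a, uint16_t b, int offset) {
  uint16_t acc;
  assert(offset >= 0 && offset < 8);
  assert(a <= 4095 && b <= 4095);
  acc = (uint16_t)(a * (8 - offset) + b * offset + 4);
  return (uint16_t)(acc >> 3);
}
```

Because every value stays within 16 bits, a 128-bit vector can hold eight such lanes instead of four 32-bit ones, which is where the "twice as many elements per instruction" comes from.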

James Zern [Sat, 11 Feb 2023 03:04:41 +0000 (19:04 -0800)]
README: update release version to 1.13.0

This was missed in the v1.13.0 tag.

Bug: webm:1780
Change-Id: I3044534123bf67861174970e6241f6586055358e

Chi Yo Tsai [Fri, 10 Feb 2023 22:13:50 +0000 (22:13 +0000)]
Merge "Remove CONFIG_CONSISTENT_RECODE flag" into main

chiyotsai [Wed, 8 Feb 2023 21:54:46 +0000 (13:54 -0800)]
Remove CONFIG_CONSISTENT_RECODE flag

Currently, libvpx does not properly clear and re-initialize memory when
it re-encodes a frame. As a result, out-of-date values are used in the
encoding process, and re-encoding a frame with the same parameters can
give different outputs.

This commit enables the code under CONFIG_CONSISTENT_RECODE to correct
this behavior. This change has a minor effect on coding performance,
but it ensures valid values are used in the encoding process.

Furthermore, the flag is removed as it is now always turned on.

Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |  SSIM   | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    | hdres2  | -0.012%  | -0.021%  | -0.030% | +0.1% |
|    0    | lowres2 | +0.029%  | +0.019%  | +0.047% | +0.1% |
|    0    | midres2 | -0.004%  | +0.009%  | +0.026% | +0.1% |
|---------|---------|----------|----------|---------|-------|
|    1    | hdres2  | +0.032%  | +0.032%  | -0.000% | -0.0% |
|    1    | lowres2 | -0.005%  | -0.011%  | -0.014% | +0.0% |
|    1    | midres2 | +0.004%  | +0.020%  | +0.027% | +0.2% |
|---------|---------|----------|----------|---------|-------|
|    2    | hdres2  | +0.048%  | +0.056%  | +0.057% | +0.1% |
|    2    | lowres2 | +0.007%  | +0.002%  | -0.016% | -0.0% |
|    2    | midres2 | -0.015%  | -0.008%  | -0.002% | +0.1% |
|---------|---------|----------|----------|---------|-------|
|    3    | hdres2  | +0.010%  | +0.014%  | +0.004% | -0.0% |
|    3    | lowres2 | +0.000%  | -0.021%  | -0.001% | +0.0% |
|    3    | midres2 | +0.007%  | -0.038%  | +0.012% | -0.2% |
|---------|---------|----------|----------|---------|-------|
|    4    | hdres2  | +0.107%  | +0.136%  | +0.124% | -0.0% |
|    4    | lowres2 | -0.012%  | -0.024%  | -0.020% | -0.0% |
|    4    | midres2 | +0.055%  | -0.004%  | +0.048% | -0.1% |
|---------|---------|----------|----------|---------|-------|
|    5    | hdres2  | +0.026%  | +0.027%  | +0.020% | -0.0% |
|    5    | lowres2 | +0.009%  | -0.008%  | +0.028% | +0.1% |
|    5    | midres2 | -0.025%  | +0.021%  | -0.020% | -0.1% |

STATS_CHANGED

Change-Id: I3967aee8c8e4d0608a492e07f99ab8de9744ba57

2 years agoMerge "Optimize Neon high bitdepth convolve copy" into main
James Zern [Fri, 10 Feb 2023 03:35:22 +0000 (03:35 +0000)]
Merge "Optimize Neon high bitdepth convolve copy" into main

2 years agoMerge "Merge tag 'v1.13.0'" into main
Jerome Jiang [Thu, 9 Feb 2023 22:07:28 +0000 (22:07 +0000)]
Merge "Merge tag 'v1.13.0'" into main

2 years agoMerge "Remove onyx_int.h from vp8 rc header" into main
Jerome Jiang [Thu, 9 Feb 2023 21:27:59 +0000 (21:27 +0000)]
Merge "Remove onyx_int.h from vp8 rc header" into main

2 years agoRemove onyx_int.h from vp8 rc header
Jerome Jiang [Tue, 7 Feb 2023 22:22:12 +0000 (17:22 -0500)]
Remove onyx_int.h from vp8 rc header

Also move the FRAME_TYPE declaration to common.h

Bug: webm:1766

Change-Id: Ic3016bd16548a5d2e0ae828a7fd7ad8adda8b8f6

2 years agoMerge tag 'v1.13.0'
Jerome Jiang [Thu, 9 Feb 2023 19:37:33 +0000 (14:37 -0500)]
Merge tag 'v1.13.0'

Release v1.13.0 Ugly Duckling

2023-01-31 v1.13.0 "Ugly Duckling"

  This release includes more Neon and AVX2 optimizations, adds a new codec
  control to set per frame QP, upgrades GoogleTest to v1.12.1, and includes
  numerous bug fixes.

- Upgrading:
    This release is ABI incompatible with the previous release.

    New codec control VP9E_SET_QUANTIZER_ONE_PASS to set per frame QP.

    GoogleTest is upgraded to v1.12.1.

    .clang-format is upgraded to clang-format-11.

    VPX_EXT_RATECTRL_ABI_VERSION was bumped due to incompatible changes to the
    feature of using external rate control models for vp9.

- Enhancement:
    Numerous improvements on Neon optimizations.
    Numerous improvements on AVX2 optimizations.
    Additional ARM targets added for Visual Studio.

- Bug fixes:
    Fix to calculating internal stats when frame dropped.
    Fix to segfault for external resize test in vp9.
    Fix to build system with replacing egrep with grep -E.
    Fix to a few bugs with external RTC rate control library.
    Fix to make SVC work with VBR.
    Fix to key frame setting in VP9 external RC.
    Fix to -Wimplicit-int (Clang 16).
    Fix to VP8 external RC for buffer levels.
    Fix to VP8 external RC for dynamic update of layers.
    Fix to VP9 auto level.
    Fix to off-by-one error of max w/h in validate_config.
    Fix to make SVC work for Profile 1.

Bug: webm:1780

Change-Id: I371fc1444ead56f8d7fc510e05582b6415c3ddb1

2 years agoOptimize Neon high bitdepth convolve copy
Jonathan Wright [Thu, 9 Feb 2023 11:57:10 +0000 (11:57 +0000)]
Optimize Neon high bitdepth convolve copy

Use standard loads and stores instead of the significantly slower
interleaving/de-interleaving variants. Also move all loads in loop
bodies above all stores as a mitigation against the compiler thinking
that the src and dst pointers alias (since we can't use restrict in
C89.)
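
The loads-before-stores ordering can be shown without Neon intrinsics.
The following is a portable sketch, not the libvpx implementation:
copy_block_u16 is a hypothetical name, and memcpy into local row buffers
stands in for the vector loads and stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not libvpx code: copies a w x h block of 16-bit
 * pixels two rows at a time. Both rows are read into temporaries before
 * either row is written, so no load has to wait on an earlier store that
 * the compiler cannot prove is unrelated (src and dst are not declared
 * restrict). */
static void copy_block_u16(const uint16_t *src, ptrdiff_t src_stride,
                           uint16_t *dst, ptrdiff_t dst_stride, int w,
                           int h) {
  uint16_t row0[64];
  uint16_t row1[64];
  int y;
  assert(w > 0 && w <= 64 && h > 0 && h % 2 == 0);
  for (y = 0; y < h; y += 2) {
    /* All loads first... */
    memcpy(row0, src, (size_t)w * sizeof(*src));
    memcpy(row1, src + src_stride, (size_t)w * sizeof(*src));
    /* ...then all stores. */
    memcpy(dst, row0, (size_t)w * sizeof(*dst));
    memcpy(dst + dst_stride, row1, (size_t)w * sizeof(*dst));
    src += 2 * src_stride;
    dst += 2 * dst_stride;
  }
}
```

The 64-pixel row limit and the even-height requirement are simplifications
chosen for this sketch.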

Change-Id: Idd59dca51387f553f8db27144a2b8f2377c937d3

2 years agoMerge "Copy BLOCK_8X8's mi to PICK_MODE_CONTEXT::mic" into main
Chi Yo Tsai [Wed, 8 Feb 2023 23:16:48 +0000 (23:16 +0000)]
Merge "Copy BLOCK_8X8's mi to PICK_MODE_CONTEXT::mic" into main

2 years agoCopy BLOCK_8X8's mi to PICK_MODE_CONTEXT::mic
chiyotsai [Wed, 8 Feb 2023 22:01:19 +0000 (14:01 -0800)]
Copy BLOCK_8X8's mi to PICK_MODE_CONTEXT::mic

STATS_CHANGED

BUG=webm:1789

Change-Id: I74efe28bdf90a179c59fe3d1f5a15d497f57080d

2 years agoAdd missing high bitdepth Neon subpel variance tests
Salome Thirot [Wed, 8 Feb 2023 17:05:25 +0000 (17:05 +0000)]
Add missing high bitdepth Neon subpel variance tests

Add missing 4x4 and 4x8 tests for both high bitdepth sub-pixel variance
and high bitdepth averaging sub-pixel variance.

Change-Id: I042752c5b7ccc14f58075694d0bb1d36f144ad06

2 years agoFix unsigned integer overflow in sse computation [tag: v1.13.0]
Cheng Chen [Mon, 30 Jan 2023 19:51:58 +0000 (11:51 -0800)]
Fix unsigned integer overflow in sse computation

Basically port the fix from libaom:
https://aomedia-review.googlesource.com/c/aom/+/169361
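
The commit message does not say where the overflow occurred, so the
following is only a generic illustration of the problem class: a sum of
squared errors kept in a 32-bit unsigned accumulator can wrap, while a
64-bit accumulator cannot for realistic block sizes. sse_u16 is a
hypothetical name and is not taken from libvpx or libaom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum of squared differences between two arrays of
 * 16-bit samples. Each difference is widened to 64 bits before squaring
 * and the running total is 64-bit, so neither step can wrap. */
static uint64_t sse_u16(const uint16_t *a, const uint16_t *b, size_t n) {
  uint64_t sse = 0;
  size_t i;
  assert(a != NULL && b != NULL);
  for (i = 0; i < n; ++i) {
    const int64_t d = (int64_t)a[i] - (int64_t)b[i];
    sse += (uint64_t)(d * d);
  }
  return sse;
}
```

Two samples that differ by 65535 already give a total of 8589672450,
which does not fit in 32 bits.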

Change-Id: Id06a5db91372037832399200ded75d514e096726
(cherry picked from commit a94cdd57ffd95ee7beb48d2794dae538f25da46c)

2 years agoMerge "Enable some speed features on speed 0" into main
Chi Yo Tsai [Wed, 8 Feb 2023 00:44:46 +0000 (00:44 +0000)]
Merge "Enable some speed features on speed 0" into main

2 years agoEnable some speed features on speed 0
chiyotsai [Tue, 7 Feb 2023 19:11:35 +0000 (11:11 -0800)]
Enable some speed features on speed 0

Performance:
| SPD_SET | TESTSET | AVG_PSNR | OVR_PSNR |  SSIM   | ENC_T |
|---------|---------|----------|----------|---------|-------|
|    0    | hdres2  | +0.069%  | +0.067%  | +0.100% | -8.6% |
|    0    | midres2 | +0.116%  | +0.103%  | +0.062% | -9.6% |
|    0    | lowres2 | +0.276%  | +0.283%  | +0.214% |-11.9% |

STATS_CHANGED

Change-Id: I8b26c0be2312fcd0f8c9e889367682e80ea8de4b

2 years agoUse 4D reduction Neon helper for standard bitdepth SAD4D
Salome Thirot [Tue, 7 Feb 2023 11:28:15 +0000 (11:28 +0000)]
Use 4D reduction Neon helper for standard bitdepth SAD4D

Move the 4D reduction helper function to sum_neon.h and use this for
both standard and high bitdepth SAD4D paths. This also removes the
AArch64 requirement for using the UDOT Neon SAD4D paths.

Change-Id: I207f76b3d42aa541809b0672c3b3d86e54d133ff

2 years agoMerge "Move TPL to a new file" into main
Yunqing Wang [Tue, 7 Feb 2023 04:22:40 +0000 (04:22 +0000)]
Merge "Move TPL to a new file" into main

2 years agoMerge changes Ica45c44f,I75c5f099,I9e626d7f into main
James Zern [Tue, 7 Feb 2023 01:32:03 +0000 (01:32 +0000)]
Merge changes Ica45c44f,I75c5f099,I9e626d7f into main

* changes:
  Optimize Neon implementation of high bitdepth SAD4D functions
  Optimize Neon implementation of high bitdepth avg SAD functions
  Optimize Neon implementation of high bitdepth SAD functions

2 years agoMove TPL to a new file
Yunqing Wang [Mon, 6 Feb 2023 22:48:34 +0000 (14:48 -0800)]
Move TPL to a new file

This is a refactoring CL.

Change-Id: Ic8c1575601d27f14ecd1b1bf0a038e447eaae458

2 years agoMerge "Remove duplicated VPX_SCALING declaration" into main
Jerome Jiang [Mon, 6 Feb 2023 22:16:41 +0000 (22:16 +0000)]
Merge "Remove duplicated VPX_SCALING declaration" into main

2 years agoOptimize Neon implementation of high bitdepth SAD4D functions
Salome Thirot [Thu, 2 Feb 2023 16:06:38 +0000 (16:06 +0000)]
Optimize Neon implementation of high bitdepth SAD4D functions

Optimizations take a similar form to those implemented for Armv8.0
standard bitdepth SAD4D:

- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
  modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
  resources on Arm CPUs that have four Neon pipes.
- Compute the four SAD sums in parallel so that we only load the source
  block once - instead of four times.
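
The last point, loading the source block once for all four references,
can be sketched in scalar C. sad4d_u16 is a hypothetical name and the code
below is not the Neon implementation; it only shows the loop structure,
with one accumulator per reference.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch: computes the sum of absolute differences
 * between one source block and four reference blocks. Each source pixel
 * is read once and compared against all four references, instead of
 * running four separate single-reference SAD passes. */
static void sad4d_u16(const uint16_t *src, int src_stride,
                      const uint16_t *const ref[4], int ref_stride, int w,
                      int h, uint32_t sad[4]) {
  int x, y, i;
  assert(w > 0 && h > 0);
  sad[0] = sad[1] = sad[2] = sad[3] = 0;
  for (y = 0; y < h; ++y) {
    for (x = 0; x < w; ++x) {
      const int s = src[y * src_stride + x]; /* loaded once */
      for (i = 0; i < 4; ++i) {
        sad[i] += (uint32_t)abs(s - ref[i][y * ref_stride + x]);
      }
    }
  }
}
```

In a vectorized version the four accumulators would be separate vector
registers, which is also what lets the sums proceed in parallel.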

Change-Id: Ica45c44fd167e5fcc83871d8c138fc72ed3a9723

2 years agoRemove duplicated VPX_SCALING declaration
Jerome Jiang [Mon, 6 Feb 2023 18:29:58 +0000 (13:29 -0500)]
Remove duplicated VPX_SCALING declaration

Use VPX_SCALING_MODE instead

Change-Id: Iab9d29f20838703e00bd9f7641035d8ebd69af53

2 years agoOptimize Neon implementation of high bitdepth avg SAD functions
Salome Thirot [Fri, 3 Feb 2023 11:00:19 +0000 (11:00 +0000)]
Optimize Neon implementation of high bitdepth avg SAD functions

Optimizations take a similar form to those implemented for standard
bitdepth averaging SAD:

- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
  modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
  resources on Arm CPUs that have four Neon pipes.

Change-Id: I75c5f09948f6bf17200f82e00e7a827a80451108

2 years agoOptimize Neon implementation of high bitdepth SAD functions
Salome Thirot [Wed, 1 Feb 2023 16:37:24 +0000 (16:37 +0000)]
Optimize Neon implementation of high bitdepth SAD functions

Optimizations take a similar form to those implemented for standard
bitdepth SAD:

- Use ABD, UADALP instead of ABAL, ABAL2 (double the throughput on
  modern out-of-order Arm-designed cores.)
- Use more accumulator registers to make better use of Neon pipeline
  resources on Arm CPUs that have four Neon pipes.

Change-Id: I9e626d7fa0e271908dc43448405a7985b80e6230

2 years agoMerge "Fix uninitialized mesh feature for BEST mode" into main
Yunqing Wang [Fri, 3 Feb 2023 23:22:58 +0000 (23:22 +0000)]
Merge "Fix uninitialized mesh feature for BEST mode" into main

2 years agoSet _img->bit_depth in y4m_input_fetch_frame()
Wan-Teh Chang [Fri, 3 Feb 2023 22:07:09 +0000 (14:07 -0800)]
Set _img->bit_depth in y4m_input_fetch_frame()

This is a port of
https://aomedia-review.googlesource.com/c/aom/+/169961.

Change-Id: I2aa0d12cafde0c73448bf8c57eab0cd92e846468

2 years agoFix uninitialized mesh feature for BEST mode
Yunqing Wang [Fri, 3 Feb 2023 00:30:09 +0000 (16:30 -0800)]
Fix uninitialized mesh feature for BEST mode

In BEST encoding mode, the mesh search range wasn't initialized for
content types other than FC_GRAPHICS_ANIMATION, so those types mistakenly
used speed 0's setting. This commit adds the missing initialization.

There were 2 ways to fix this. Patchset 1 uses speed 0's setting for
non-FC_GRAPHICS_ANIMATION types. This didn't change BEST mode's encoding
results much; only a couple of clips' results changed.

Borg result for BEST mode:
         avg_psnr:  ovr_psnr:  ssim:  encoding_spdup:
lowres2:  -0.004     -0.003   -0.000    0.030
midres2:  -0.006     -0.009   -0.012    0.033
hdres2:    0.002      0.002    0.004    0.015

Patchset 2 uses BEST's setting for non-FC_GRAPHICS_ANIMATION types.
However, the BD-rate of most test clips changed by up to ~0.5% (gain or
loss), and overall it didn't perform better than patchset 1. So, we
chose patchset 1.

Change-Id: Ibbf578dad04420e6ba22cb9a3ddec137a7e4deef

2 years agovp9_diamond_search_sad_neon: use DECLARE_ALIGNED
James Zern [Wed, 1 Feb 2023 21:27:06 +0000 (13:27 -0800)]
vp9_diamond_search_sad_neon: use DECLARE_ALIGNED

rather than the gcc-specific __attribute__((aligned())); fixes the build
when targeting ARM64 Windows.
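
A portable alignment macro of this kind typically selects the compiler's
own syntax behind one name. The sketch below uses a hypothetical
SKETCH_DECLARE_ALIGNED macro to show the pattern; it is modeled on the
idea in this commit and is not copied from the libvpx headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: one spelling for aligned declarations across
 * compilers. MSVC-style compilers use __declspec(align(n)), GCC-style
 * compilers use __attribute__((aligned(n))). The last branch is a
 * fallback that gives no alignment guarantee. */
#if defined(_MSC_VER)
#define SKETCH_DECLARE_ALIGNED(n, typ, val) __declspec(align(n)) typ val
#elif defined(__GNUC__)
#define SKETCH_DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
#else
#define SKETCH_DECLARE_ALIGNED(n, typ, val) typ val
#endif

/* Example table declared through the macro (contents are arbitrary). */
SKETCH_DECLARE_ALIGNED(16, static const int16_t, kSketchOffsets[8]) = {
  0, 1, 2, 3, 4, 5, 6, 7
};

/* Returns nonzero if p is a multiple of n, where n is a power of two. */
static int sketch_is_aligned(const void *p, uintptr_t n) {
  assert(n != 0 && (n & (n - 1)) == 0);
  return ((uintptr_t)p & (n - 1)) == 0;
}
```

Code that writes __attribute__((aligned(16))) directly compiles only with
GCC-compatible compilers, which is the kind of breakage the commit
describes for ARM64 Windows targets.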

Bug: webm:1788
Change-Id: I2210fc215f44d90c1ce9dee9b54888eb1b78c99e

2 years agoUpdate AUTHORS .mailmap and version [tag: v1.13.0-rc1]
Jerome Jiang [Wed, 1 Feb 2023 16:38:42 +0000 (11:38 -0500)]
Update AUTHORS .mailmap and version

Bug: webm:1780
Change-Id: I75a24bdd076dc1746b23bababfaafccbce3b4214

2 years agoFix per frame qp for temporal layers
Jerome Jiang [Thu, 26 Jan 2023 00:25:12 +0000 (19:25 -0500)]
Fix per frame qp for temporal layers

Also add tests with fixed temporal layering mode.

Change-Id: If516fe94e3fb7f5a745821d1788bfe6cf90edaac
(cherry picked from commit db69ce6aea278bee88668fd9cc2af2e544516fdb)

2 years agoUpdate CHANGELOG
Jerome Jiang [Tue, 31 Jan 2023 17:16:38 +0000 (12:16 -0500)]
Update CHANGELOG

Bug: webm:1780
Change-Id: I3ab4729bff1d27ef7127ef26e780a469e9278c21

2 years agoMerge "Use load_unaligned mem_neon.h helpers in SAD and SAD4D" into main
James Zern [Tue, 31 Jan 2023 21:20:16 +0000 (21:20 +0000)]
Merge "Use load_unaligned mem_neon.h helpers in SAD and SAD4D" into main

2 years agoUse load_unaligned mem_neon.h helpers in SAD and SAD4D
Jonathan Wright [Tue, 31 Jan 2023 13:32:33 +0000 (13:32 +0000)]
Use load_unaligned mem_neon.h helpers in SAD and SAD4D

Use the load_unaligned helper functions in mem_neon.h to load strided
sequences of 4 bytes where alignment is not guaranteed in the Neon
SAD and SAD4D paths.
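
The usual way to read 4 bytes from an address with no alignment guarantee
is to go through memcpy rather than dereferencing a uint32_t pointer. The
sketch below is a portable stand-in with hypothetical names
(load_u32_unaligned, load_4x4_strided); it is not the mem_neon.h code,
which operates on Neon vector types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads 4 bytes from any address. memcpy has no
 * alignment requirement, and compilers typically lower it to a single
 * load where the target allows unaligned access. */
static uint32_t load_u32_unaligned(const uint8_t *p) {
  uint32_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}

/* Hypothetical helper: gathers four rows of 4 bytes each, `stride` bytes
 * apart, into one contiguous 16-byte buffer. */
static void load_4x4_strided(const uint8_t *src, int stride,
                             uint8_t out[16]) {
  int r;
  assert(src != NULL && out != NULL);
  for (r = 0; r < 4; ++r) {
    const uint32_t v = load_u32_unaligned(src + r * stride);
    memcpy(out + 4 * r, &v, sizeof(v));
  }
}
```

Casting src to const uint32_t * and dereferencing it instead would be
undefined behavior whenever the address is not 4-byte aligned, which is
what -fsanitize=undefined reports as a misaligned load.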

Change-Id: I941d226ef94fd7a633b09fc92165a00ba68a1501

2 years agoFix unsigned integer overflow in sse computation
Cheng Chen [Mon, 30 Jan 2023 19:51:58 +0000 (11:51 -0800)]
Fix unsigned integer overflow in sse computation

Basically port the fix from libaom:
https://aomedia-review.googlesource.com/c/aom/+/169361

Change-Id: Id06a5db91372037832399200ded75d514e096726

2 years agoMerge "Refactor 8x8 16-bit Neon transpose functions" into main
James Zern [Mon, 30 Jan 2023 19:30:45 +0000 (19:30 +0000)]
Merge "Refactor 8x8 16-bit Neon transpose functions" into main

2 years agoRefactor Neon implementation of SAD4D functions
Salome Thirot [Fri, 27 Jan 2023 16:16:16 +0000 (16:16 +0000)]
Refactor Neon implementation of SAD4D functions

Refactor and optimize the Neon implementation of SAD4D functions -
effectively backporting these libaom changes[1,2].

[1] https://aomedia-review.googlesource.com/c/aom/+/162181
[2] https://aomedia-review.googlesource.com/c/aom/+/162183

Change-Id: Icb04bd841d86f2d0e2596aa7ba86b74f8d2d360b

2 years agoMerge "Add encoder component timing information" into main
Yunqing Wang [Sat, 28 Jan 2023 00:27:57 +0000 (00:27 +0000)]
Merge "Add encoder component timing information" into main

2 years agoAdd encoder component timing information
Yunqing Wang [Fri, 27 Jan 2023 01:20:54 +0000 (17:20 -0800)]
Add encoder component timing information

Change-Id: Iaa5b73a9593ecfd74b6426ed47d2b529ec7ae2b5

2 years agoRefactor 8x8 16-bit Neon transpose functions
Gerda Zsejke More [Thu, 26 Jan 2023 15:12:55 +0000 (16:12 +0100)]
Refactor 8x8 16-bit Neon transpose functions

Refactor the Neon implementations of transpose_s16_8x8(q) and
transpose_u16_8x8 so that the final step compiles to 8 ZIP1/ZIP2
instructions instead of 8 EXT, MOV pairs. This removes 8 instructions
per call to transpose_s16_8x8(q) or transpose_u16_8x8 when the result
stays in registers for further processing rather than being stored to
memory, as in vpx_hadamard_8x8_neon, for example.

This is a backport of this libaom patch[1].
[1] https://aomedia-review.googlesource.com/c/aom/+/169426

Change-Id: Icef3e51d40efeca7008e1c4fc701bf39bd319c88

2 years agoMerge "Fix per frame qp for temporal layers" into main
Jerome Jiang [Thu, 26 Jan 2023 21:31:14 +0000 (21:31 +0000)]
Merge "Fix per frame qp for temporal layers" into main

2 years agoFix per frame qp for temporal layers
Jerome Jiang [Thu, 26 Jan 2023 00:25:12 +0000 (19:25 -0500)]
Fix per frame qp for temporal layers

Also add tests with fixed temporal layering mode.

Change-Id: If516fe94e3fb7f5a745821d1788bfe6cf90edaac

2 years agoMerge "Refactor Neon implementation of SAD functions" into main
James Zern [Thu, 26 Jan 2023 03:26:38 +0000 (03:26 +0000)]
Merge "Refactor Neon implementation of SAD functions" into main

2 years agoMerge "[NEON] Add Highbd FHT 8x8/16x16 functions" into main
James Zern [Thu, 26 Jan 2023 03:23:31 +0000 (03:23 +0000)]
Merge "[NEON] Add Highbd FHT 8x8/16x16 functions" into main