Johann [Wed, 30 Mar 2022 05:57:46 +0000 (14:57 +0900)]
quantize: replace highbd versions
The optimized quantize functions were already built to handle
highbd values. The only difference is the clamping. All highbd
functions expand to 32bits when running in highbd mode.
Removes vpx_highbd_quantize_32x32_sse2 as it is slower than the
C version in the worst case.
Johann [Tue, 29 Mar 2022 03:40:12 +0000 (12:40 +0900)]
remove skip_block from quantize
Whether a block is skipped is handled by mi->skip. x->skip_block
is kept exclusively to verify that the quantize functions are not
called for skip blocks.
Johann [Wed, 23 Mar 2022 04:58:46 +0000 (13:58 +0900)]
ads2gas: fix .size measurement
The distance between PROC and END is used to generate .size
information for debugging. When the leading underscore was
removed the pattern used to match the function name broke.
[NEON]
Added vpx_fdct4x4_pass1_neon(),
Added vpx_fdct8x8_pass1_notranspose_neon(),
Added vpx_fdct8x8_pass1_neon() to avoid code duplication
Refactored vpx_fdct4x4_neon() and vpx_dct8x8_neon() to use the above
Rename dct_body to vpx_fdct16x16_body to reuse later
Add transpose_s16_16x16()
I have run make test and all tests/configurations seem to pass.
Profiled using this command on an Ampere Altra VM:
sudo perf record -g ./vpxenc --codec=vp9 --height=1080 --width=1920 \
--fps=25/1 --limit=20 -o output.mkv \
../original_videos_Sports_1080P_Sports_1080P-0063.mkv --debug –rt
Johann [Sat, 12 Mar 2022 22:02:03 +0000 (07:02 +0900)]
ads2gas_apple.pl: remove gcc-isms
The gcc assembler was incompatible for a long
time. It is now based on clang and accepts
more modern syntax, although not enough to
remove the script entirely.
Marco Paniconi [Tue, 8 Feb 2022 20:36:39 +0000 (12:36 -0800)]
rtc-vp9: Fix intra-only for bypass mode
Allow intra-only frame in svc to also work
in bypass (flexible-svc) mode.
Added unittest for the flexible svc case.
And fix the gld_fb_idx for (SL0, TL1) in bypass/flexible
mode pattern in the sample encoder: force it to be 0
(same as lst_fb_idx), since the slot is unused on SL0.
Marco Paniconi [Thu, 3 Feb 2022 00:17:58 +0000 (16:17 -0800)]
rtc-vp9: Fix to tests for intra-only frame.
Fix some issues with the test, and add new
test that verifies that we can decode base stream
startinig at middle of sequence where intra-only
frame is inserted.
Jianhui Dai [Thu, 9 Dec 2021 05:38:22 +0000 (13:38 +0800)]
Set unused reference frames to first ref
If a reference frame is not referenced, then set the index for that
reference to the first one used/referenced instead of unused slot.
Unused slot means key frame, as key frame resets all slots with itself.
This CL extracts `get_first_ref_frame()` from `reset_fb_idx_unused()`
with a typo fixing, and sets all unused reference frames to first ref in
vp9 uncompressed header.
James Zern [Fri, 10 Dec 2021 02:34:18 +0000 (18:34 -0800)]
vp[89]_initalize_enc(): protect against multiple invocations
this removes the burden from callers; the rtcd functions are left with a
mostly redundant (outside of tests) once() as top-level functions should
ensure their constraints are met
v_these_mv_w is always initialized in this block with _mm_add_epi16();
converting this to a _mm_storeu_si32(tmp) call also works, but
introduces more stack usage
|| ../vp9/encoder/x86/vp9_diamond_search_sad_avx.c: In function
‘vp9_diamond_search_sad_avx’:
vp9/encoder/x86/vp9_diamond_search_sad_avx.c|285 col 19| warning:
‘v_these_mv_w’ may be used uninitialized [-Wmaybe-uninitialized]
|| 285 | new_bmv = ((const int_mv *)&v_these_mv_w)[local_best_idx];
|| | ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vp9/encoder/x86/vp9_diamond_search_sad_avx.c|149 col 21| note:
‘v_these_mv_w’ declared here
|| 149 | const __m128i v_these_mv_w = _mm_add_epi16(v_bmv_w, v_ss_mv_w);
|| | ^~~~~~~~~~~~
as noted in
the size of interp_filter_selected[][]'s first dimension varies between
VP9_COMP and VP9BitstreamWorkerData as noted in the latter's definition:
// The size of interp_filter_selected in VP9_COMP is actually
// MAX_REFERENCE_FRAMES x SWITCHABLE. But when encoding tiles, all we ever do
// is increment the very first index (index 0) for the first dimension. Hence
// this is sufficient.
int interp_filter_selected[1][SWITCHABLE];
normalize the function signatures of write_modes*(), etc. to take this
into account.
vp9/encoder/vp9_bitstream.c|948 col 3| warning: ‘write_modes’ accessing
64 bytes in a region of size 16 [-Wstringop-overflow=]
|| 948 | write_modes(cpi, xd, &cpi->tile_data[data->tile_idx].tile_info,
|| | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|| 949 | &data->bit_writer, tile_row, data->tile_idx,
|| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|| 950 | &data->max_mv_magnitude, data->interp_filter_selected);
|| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vp9/encoder/vp9_bitstream.c|948 col 3| note: referencing argument 8 of
type ‘int (*)[4]’
vp9/encoder/vp9_bitstream.c|488 col 13| note: in a call to function
‘write_modes’
Ilya Kurdyukov [Sat, 13 Nov 2021 11:22:14 +0000 (18:22 +0700)]
faster vp8_regular_quantize_b_sse4_1
Gives 10% faster VP8 encoding in simple tests.
This patch requires testing on wider datasets and encoder
settings to see if this speedup is achieved on most data.
Johann [Tue, 16 Nov 2021 21:02:15 +0000 (06:02 +0900)]
MacOS 12 is darwin21
Remove -mmacosx-version-min. The library does not use
any calls which are affected by the platform version.
There is also no version 10.16 as it went from 10.15
to 11 and now to 12.
At some point it may be good to clarify that the bare
-darwin- target is for iOS and the -darwinN- targets
are for macOS.
Mikko Koivisto [Mon, 15 Nov 2021 18:47:05 +0000 (18:47 +0000)]
vp9: Fix multiplication overflow
Fix UBSan error reported from aosp Cuttlefish device:
/vp9/encoder/vp9_ratectrl.c:238:33: unsigned integer overflow: 2500000 * 1800 cannot be represented in type 'unsigned int'
...by casting the operand and the result of multiplication
to 64bit integer.
Test: vp9 webrtc streaming with Cuttlefish
Change-Id: Id5bb3d4071a96179caffae0829d3cc4e48c7614b
James Zern [Sat, 6 Nov 2021 23:48:13 +0000 (16:48 -0700)]
mem_sse2.h: loadu_uint32 -> loadu_int32
this changes the return to int32_t which matches the type with usage
of this call as input to _mm_cvtsi32_si128(), _mm_set_epi32(), etc.
fixes implicit conversion warning with clang-11 -fsanitize=undefined
James Zern [Sat, 6 Nov 2021 23:43:11 +0000 (16:43 -0700)]
mem_sse2.h: storeu_uint32 -> storeu_int32
this changes the parameter to int32_t which matches the type with usage
of this call using _mm_cvtsi128_si32() as a parameter. quiets an
implicit conversion warning with clang-11 -fsanitize=undefined
test/video_source.h:194:33: runtime error: implicit conversion from type
'int' of value -32 (32-bit, signed) to type 'unsigned int' changed the
value to 4294967264 (32-bit, unsigned)