chiyotsai [Wed, 16 Jan 2019 18:53:56 +0000 (10:53 -0800)]
Add unit test for temporal filter on VP9
The current unit tests for temporal filtering only test the single
channel version of the temporal filter. VP9 currently uses both luma
and chroma channel information for temporal filtering at low bitdepth,
so this scenario has no unit test coverage.
This commit adds some basic unit tests to facilitate further development
on temporal filtering.
Marco Paniconi [Tue, 15 Jan 2019 20:12:47 +0000 (12:12 -0800)]
vp9-svc: Fix to buffer update under frame_drops
For svc with frame dropping in full_superframe_drop or
constrained drop mode, the buffer level for a given layer
may be capped to keep it from increasing too much. This is because that layer
may be dropped even though its buffer is stable (the drop is forced
by underflow in other layers in full/constrained svc-drop mode).
This capping is needed to prevent a decrease in qp over consecutive
frame drops.
The capping already exists and has been used, but this change
introduced an error that prevented its usage:
https://chromium-review.googlesource.com/c/webm/libvpx/+/1330875
The fix here is to also cap bits_off_target, since after
the change mentioned above, it is bits_off_target that is used to
update the buffer on the next frame (which in turn affects qp for the next frame/layer).
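A minimal sketch of the capping described above. The struct and function names are hypothetical, and using the optimal buffer level as the cap is an assumption made for illustration; the commit message does not state the exact cap value.

```c
#include <stdint.h>

/* Hypothetical, simplified per-layer rate control state. The field names
 * follow the commit message, not the full libvpx rate control struct. */
typedef struct {
  int64_t buffer_level;         /* current buffer fullness, in bits */
  int64_t bits_off_target;      /* used to update the buffer on the next frame */
  int64_t optimal_buffer_level; /* assumed cap for this sketch */
} LayerRateControl;

/* After a forced drop, keep both values from rising above the cap.
 * Capping only buffer_level is not enough if bits_off_target drives the
 * next buffer update. */
static void cap_after_forced_drop(LayerRateControl *rc) {
  if (rc->buffer_level > rc->optimal_buffer_level)
    rc->buffer_level = rc->optimal_buffer_level;
  if (rc->bits_off_target > rc->optimal_buffer_level)
    rc->bits_off_target = rc->optimal_buffer_level;
}
```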
chiyotsai [Sat, 3 Nov 2018 00:08:05 +0000 (17:08 -0700)]
Remove unnecessary calculation in 4-tap interpolation filter
Reduces the number of rows calculated for 2D 4-tap interpolation filter
from h+7 rows to h+3 rows.
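The row counts follow from the tap count of a separable 2D filter: the vertical pass needs one intermediate row per output row plus (taps - 1) extra rows. A small sketch, with a hypothetical helper name:

```c
/* Rows of horizontally filtered output needed before the vertical pass of
 * a separable 2D filter with the given number of taps. */
static int intermediate_rows(int h, int taps) { return h + taps - 1; }
```

With h = 16 this gives 23 rows for an 8-tap filter (h+7) and 19 rows for a 4-tap filter (h+3).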
Also fixes a bug in the avx2 function for 4-tap filters where the last
row is computed incorrectly.
Marco Paniconi [Tue, 15 Jan 2019 01:02:59 +0000 (17:02 -0800)]
vp9-svc: Rate control fix for key base layer
After encoding key frame on base spatial layer,
if the overshoot is significant, reset the
avg_frame_qindex[INTER] on base spatial layer for
all temporal layers.
This forces the active_worst_quality to increase
on subsequent frames/layers and reduces frame dropping.
Wan-Teh Chang [Mon, 14 Jan 2019 19:54:59 +0000 (11:54 -0800)]
Reset buffer_alloc_sz after freeing buffer_alloc.
ybf->buffer_alloc and ybf->buffer_alloc_sz should ideally be kept in
sync. If ybf->buffer_alloc is reset to NULL after being freed, then
ybf->buffer_alloc_sz should be reset to 0.
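A minimal sketch of keeping the two fields in sync. FrameBuffer is a hypothetical stand-in with only the two fields named above, and free() stands in for whatever deallocator the real code uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in holding only the two fields discussed above. */
typedef struct {
  unsigned char *buffer_alloc;
  size_t buffer_alloc_sz;
} FrameBuffer;

/* Free the allocation and reset pointer and size together, so a later
 * size check cannot mistake a freed buffer for a usable one. */
static void free_frame_buffer(FrameBuffer *ybf) {
  if (ybf == NULL) return;
  free(ybf->buffer_alloc);
  ybf->buffer_alloc = NULL;
  ybf->buffer_alloc_sz = 0;
}
```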
kyslov [Sat, 5 Jan 2019 01:04:09 +0000 (17:04 -0800)]
Fix OOB memory access on fuzzed data
The vp8_norm table has 256 elements, while the index into it can be higher on
fuzzed data. Casting the index to unsigned char ensures a valid range and
triggers a proper error later. Also declare "shift" as unsigned char to
avoid a UB sanitizer warning.
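A sketch of the cast, using a hypothetical 256-entry table and helper in place of vp8_norm and the real decoder code:

```c
/* Hypothetical 256-entry table standing in for vp8_norm. */
static unsigned char norm_table[256];

/* Look up a shift value for a range that may be out of bounds on fuzzed
 * input. The cast wraps any value into 0..255, so the read stays inside
 * the table; a corrupt stream is expected to fail later checks instead. */
static unsigned char lookup_shift(unsigned int range) {
  const unsigned char shift = norm_table[(unsigned char)range];
  return shift;
}
```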
Urvang Joshi [Thu, 3 Jan 2019 22:49:18 +0000 (14:49 -0800)]
VP9 firstpass: Bugfix when mi_col_start/end is odd
Before this patch, if mi_col_end was odd, then the for loop for 'mb_col'
was looping once LESS than it should have been.
For example, if mi_col_end = 47, then the loop was terminating when
mb_col == 23. However, the correct behavior would be to terminate when
mb_col == 24.
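The off-by-one can be illustrated with two hypothetical helpers that convert an end position in mi units to an end position in mb units (two mi columns per mb column). The commit message does not give the exact expression used in the fix, so the rounding form below is an assumption.

```c
/* Truncating conversion: loses the last macroblock column when
 * mi_col_end is odd. */
static int mb_col_end_truncated(int mi_col_end) { return mi_col_end >> 1; }

/* Rounding-up conversion: covers the partial last macroblock column. */
static int mb_col_end_rounded(int mi_col_end) { return (mi_col_end + 1) >> 1; }
```

For mi_col_end = 47 the truncating form stops at 23 and the rounding form at 24; for even values the two agree.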
The issue was introduced in:
https://chromium-review.googlesource.com/c/webm/libvpx/+/423279
This can lead to many of the stats being inaccurate, for such videos
(with mi_col_start/end having an odd value).
As an example:
Even for very static content, fp_acc_data->intercount can never reach the
same value as num_mbs. And in turn, pcnt_inter can never reach the value 1
(that is, 100%). This would lead to very static videos NOT being marked
static, and encoded like regular videos.
Note: this is just one possible effect based on observation. Other
issues are also possible based on other stats.
Improvement on some test clips:
-------------------------------
- One test clip saw a gain of -2.580% in VBR mode (and -3.153% in Q
mode). The reason for improvement: a wrongly detected scene cut was
avoided due to corrected value in 'this_frame->pcnt_inter'.
- Some very static clips were correctly marked as having 100% zero motion.
This avoided the addition of unnecessary alt-refs, thereby reducing the
bitrate.
BDRate (PSNR) on regular sets (VBR mode):
-----------------------------------------
lowres: 0.0
midres: -0.027 (some clips were better/worse, but I double checked that
changes were as expected, given correction in stats calculation).
hdres: 0.0
STATS_CHANGED for the types of videos described above.
Angie Chiang [Fri, 4 Jan 2019 04:48:12 +0000 (20:48 -0800)]
Add full_pixel_exhaustive_new
Add full_pixel_exhaustive_new() and exhuastive_mesh_search_new().
The two functions are variants of full_pixel_exhaustive() and
exhuastive_mesh_search().
In the new versions, we use mv inconsistency in place of
mv entropy cost.
Yunqing Wang [Fri, 21 Dec 2018 22:46:52 +0000 (14:46 -0800)]
Adaptively choose block sizes in temporal filtering
Use variable block sizes in temporal filtering. Based on prediction
errors of 32x32 or 16x16 blocks, choose the block size adaptively.
This improves the coding performance, especially for HD resolutions.
Speed 1 borg test result:
        avg_psnr:  ovr_psnr:  ssim:
lowres: -0.090     -0.075     -0.112
midres: -0.120     -0.107     -0.168
hdres:  -0.506     -0.512     -0.547
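One way an error-based block-size choice could look is sketched below. The decision rule (keep 32x32 when its error does not exceed the summed error of its four 16x16 blocks) and all names are assumptions for illustration; the commit message does not state the actual criterion.

```c
#include <stdint.h>

typedef enum { FILTER_BLOCK_32X32, FILTER_BLOCK_16X16 } FilterBlockChoice;

/* Hypothetical rule: prefer the 32x32 block unless splitting into four
 * 16x16 blocks gives a lower total prediction error. */
static FilterBlockChoice choose_filter_block(uint32_t err_32x32,
                                             const uint32_t err_16x16[4]) {
  uint64_t split_sum = 0;
  int i;
  for (i = 0; i < 4; ++i) split_sum += err_16x16[i];
  return (uint64_t)err_32x32 <= split_sum ? FILTER_BLOCK_32X32
                                          : FILTER_BLOCK_16X16;
}
```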
Reason for revert: fails to build under Visual Studio
Original change's description:
> Add Tile-SB-Row based Multi-threading in Decoder
>
> Add the multi-thread function that decodes a video row by row instead
> of a tile at a time. Create a job queue for queueing all parse and recon jobs.
> Each SB row of a tile is a job.
>
> Performance Improvement:
>
> Platform   Resolution   3 Threads   4 Threads
> ARM        720p         36.81%      18.37%
>            1080p        32.27%      14.76%
>
> ARM Improvement measured on Nexus 6 Snapdragon 805 Quad-core @ 2.65 GHz
>
> Change-Id: I3d4dd7a932fc2904c90d9546b2de99c809afd29e
kyslov [Fri, 21 Dec 2018 20:04:04 +0000 (12:04 -0800)]
Bound the total allocated memory of frame buffer
This CL makes it possible to limit the memory consumption of the frame
buffer pool. As a result, if compiled with VPX_MAX_ALLOCABLE_MEMORY set, the
codec will fail if the frame resolution requires more memory than the limit.
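A minimal sketch of such a bound check. The helper name and return convention are hypothetical; the default value is the one from the test configuration in this commit message (512 MiB), and the macro would normally be supplied on the compiler command line.

```c
#include <stdint.h>

/* Normally passed as -DVPX_MAX_ALLOCABLE_MEMORY=...; the fallback here is
 * the value used in the commit's test configuration. */
#ifndef VPX_MAX_ALLOCABLE_MEMORY
#define VPX_MAX_ALLOCABLE_MEMORY 536870912
#endif

/* Returns 0 if a frame of frame_size bytes fits under the limit, -1 if it
 * does not (hypothetical return convention). */
static int check_frame_size(uint64_t frame_size) {
  return frame_size > (uint64_t)VPX_MAX_ALLOCABLE_MEMORY ? -1 : 0;
}
```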
This is a backport of CL aae2183cb58b60d01b8e4e15269ee9f48dd72908 from
aomedia
Tested:
configure --extra-cflags="-DVPX_MAX_ALLOCABLE_MEMORY=536870912"
make
./test_libvpx
elliottk [Wed, 19 Dec 2018 21:35:30 +0000 (13:35 -0800)]
Improve accuracy of benchmarking
For small code regions, readtsc can give inaccurate results because it does
not account for out-of-order execution. Add x86_tsc_start and x86_tsc_end
that account for this, according to the white paper at
Johann [Thu, 20 Dec 2018 02:09:11 +0000 (18:09 -0800)]
subpixel_8t sse2: resolve missing declarations
vpx_asm_stubs.c only references these sse2 functions. Combine the files
similar to the way the ssse3/avx2 files are set up.
Mark the intrinsics as static because they are only used within the
macros here. It is unfortunate that the assembly functions cannot be
marked static as well.