Urvang Joshi [Thu, 3 Jan 2019 22:49:18 +0000 (14:49 -0800)]
VP9 firstpass: Bugfix when mi_col_start/end is odd
Before this patch, if mi_col_end was odd, then the for loop for 'mb_col'
was looping once LESS than it should have been.
For example, if mi_col_end = 47, then the loop was terminating when
mb_col == 23. However, the correct behavior would be to terminate when
mb_col == 24.
The issue was introduced in:
https://chromium-review.googlesource.com/c/webm/libvpx/+/423279
This can lead to many of the stats being inaccurate, for such videos
(with mi_col_start/end having an odd value).
As an example:
Even for very static content, fp_acc_data->intercount can never reach the
same value as num_mbs. And in turn, pcnt_inter can never reach the value 1
(that is, 100%). This would lead to very static videos NOT being marked
static, and encoded like regular videos.
Note: this is just one possible effect based on observation. Other
issues are also possible based on other stats.
Improvement on some test clips:
-------------------------------
- One test clip saw a gain of -2.580% in VBR mode (and -3.153% in Q
mode). The reason for improvement: a wrongly detected scene cut was
avoided due to corrected value in 'this_frame->pcnt_inter'.
- Some very static clips correctly marked as having 100% zero motion.
This avoided addition of unncecessary alt-refs, thereby reducing the
bitrate.
BDRate (PSNR) on regular sets (VBR mode):
-----------------------------------------
lowres: 0.0
midres: -0.027 (some clips were better/worse, but I double checked that
changes were as expected, given correction in stats calculation).
hdres: 0.0
STATS_CHANGED for the types of videos described above.
Yunqing Wang [Fri, 21 Dec 2018 22:46:52 +0000 (14:46 -0800)]
Adaptively choose block sizes in temporal filtering
Use variable block sizes in temporal filtering. Based on prediction
errors of 32x32 or 16x16 blocks, choose the block size adaptively.
This improves the coding performance, especially for HD resolutions.
Speed 1 borg test result:
avg_psnr: ovr_psnr: ssim:
lowres: -0.090 -0.075 -0.112
midres: -0.120 -0.107 -0.168
hdres: -0.506 -0.512 -0.547
Reason for revert: fails to build under visual studio
Original change's description:
> Add Tile-SB-Row based Multi-threading in Decoder
>
> Add the multi-thread function that decodes a video row by row instead
> of a tile at a time. Create a job queue for queueing all parse and recon jobs.
> Each SB row of a tile is a job.
>
> Performance Improvement:
>
> Platform Resolution 3 Threads 4 Threads
> ARM 720p 36.81% 18.37%
> 1080p 32.27% 14.76%
>
> ARM Improvement measured on Nexus 6 Snapdragon 805 Quad-core @ 2.65 GHz
>
> Change-Id: I3d4dd7a932fc2904c90d9546b2de99c809afd29e
kyslov [Fri, 21 Dec 2018 20:04:04 +0000 (12:04 -0800)]
Bound the total allocated memory of frame buffer
This CL allows to limit memory consumption of the frame buffer pool. As
the result if compiled with VPX_MAX_ALLOCABLE_MEMORY set codec will fail
if frame resolution requires more memory
This is backported CL aae2183cb58b60d01b8e4e15269ee9f48dd72908 from
aomedia
Tested:
configure --extra-cflags="-DVPX_MAX_ALLOCABLE_MEMORY=536870912"
make
./test_libvpx
elliottk [Wed, 19 Dec 2018 21:35:30 +0000 (13:35 -0800)]
Improve accuracy of benchmarking
For small code regions, readtsc can give inaccurate results because it does
not account for out-of-order execution. Add x86_tsc_start and x86_tsc_end
that account for this, according to the white paper at
Johann [Thu, 20 Dec 2018 02:09:11 +0000 (18:09 -0800)]
subpixel_8t sse2: resolve missing declarations
vpx_asm_stubs.c only references these sse2 functions. Combine the files
similar to the way the ssse3/avx2 files are set up.
Mark the intrinsics as static because they are only used within the
macros here. It is unfortunate that the assembly functions can not be
marked static as well.
Ritu Baldwa [Tue, 18 Dec 2018 12:09:38 +0000 (17:39 +0530)]
Add Tile-SB-Row based Multi-threading in Decoder
Add the multi-thread function that decodes a video row by row instead
of a tile at a time. Create a job queue for queueing all parse and recon jobs.
Each SB row of a tile is a job.
Jingning Han [Tue, 18 Dec 2018 00:09:06 +0000 (16:09 -0800)]
Relocate tpl buffer allocation
Move it to deeper stages where all the encoder configurations have
been set. This avoids the encoding failure when the buffer is
allocated before the encoder is fully configured.
Marco Paniconi [Mon, 17 Dec 2018 23:12:04 +0000 (15:12 -0800)]
vp9-svc: Adjust search step param for spatial layers
For non-base spatial layer in screen-content mode:
use nstep but with larger step_param value than sl0,
to avoid increase in encode_time.
Some improvement on scrolling slides content.
Marco Paniconi [Mon, 17 Dec 2018 21:23:01 +0000 (13:23 -0800)]
vp9-svc: Define rc scene change flag per superframe
Define the rc->high_num_blocks_with_motion, set in the
scene change analysis, to be defined per superframe.
This is used for increasing motion search area on
some (super)frames, e.g., for scrolling.
Jerome Jiang [Fri, 14 Dec 2018 22:39:58 +0000 (14:39 -0800)]
vp8: Fix potential use-after-free in mfqe.
Similar issue to 842265.
The pointer in vp8 postproc refers to show_frame_mi which is only
updated on show frame. However, when there is a no-show frame which also
changes the size (thus new frame buffers allocated), show_frame_mi is
not updated with new frame buffer memory.
Change the pointer in postproc to mi which is always updated.