Hui Su [Tue, 16 Oct 2018 03:45:07 +0000 (20:45 -0700)]
ML_VAR_PARTITION: enable at speed 5
When the ML_VAR_PARTITION experiment is turned on, replace
REFERENCE_PARTITION with ML_BASED_PARTITION at speed 5.
Coding gains (avg_psnr) compared to baseline:
ytlivehr 1.63%
ytlivelr 0.07%
Tested encoding speed with several clips from ytlivehr and ytlivelr
on a Linux desktop (rt, vbr, 4 threads). On average, the encoder is
faster than baseline:
360p: 14% faster
720p: 7% faster
1080p: 1.5% faster
Yunqing Wang [Tue, 16 Oct 2018 16:24:18 +0000 (09:24 -0700)]
Fix the filter tap calculation in mips optimizations
The interp filter tap calculation could not reliably distinguish
2-tap filters from 4-tap filters. This patch fixes the bug and
resolves the Jenkins test failures in the mips sub-pel filter optimizations.
Yunqing Wang [Mon, 15 Oct 2018 22:27:49 +0000 (15:27 -0700)]
A temporary fix to mips sub-pel filters
There are Jenkins test failures in mips sub-pel filter optimizations.
[ RUN ] MSA/ConvolveTest.MatchesReferenceSubpixelFilter/5
../libvpx/test/convolve_test.cc:889: Failure
Expected equality of these values:
lookup(ref, y * kOutputStride + x)
Which is: 255
lookup(out, y * kOutputStride + x)
Which is: 11
mismatch at (1,0), filters (4,0,1)
This relates to the 4-tap kernel added recently. This CL is a temporary
fix, while we investigate the issue.
Jingning Han [Mon, 15 Oct 2018 18:48:39 +0000 (11:48 -0700)]
Add frame_gop_index to GF_GROUP
Add frame_gop_index to track the frame offset within a group of
pictures. This reworks the GOP frame offset calculation and use
case. The coding stats remain identical.
Jingning Han [Mon, 15 Oct 2018 17:11:57 +0000 (10:11 -0700)]
Add encoder side frame buffer for tpl model
Add an encoder side reference frame buffer pool to store the
reference frames for the tpl model. This serves as an intermediate
step to support the multi-layer ARF system. The buffer memory size will
be optimized afterwards.
Jingning Han [Thu, 11 Oct 2018 19:16:01 +0000 (12:16 -0700)]
Refactor tpl model setup to support multi-layer ARF setup
Generalize the tpl model framework to support the newly designed
GOP structure system. The existing tpl model assumes single layer
ARF.
This design separates the tpl model operation for GOPs with
and without ARF. When a GOP has an ARF, the maximum lookahead
offset sets an upper limit on the frame buffers needed to build the
tpl model for the entire GOP. When a GOP does not have an ARF, we
would use the temporal model in a different way.
The first step will focus on GOPs with ARF. All tpl model related
operations will be triggered only by ARF frame generation.
Yunqing Wang [Fri, 12 Oct 2018 19:25:36 +0000 (12:25 -0700)]
Optimize apply_temporal_filter function
This patch optimizes the apply_temporal_filter function. The diff^2 for each
pixel in the 16x16 block is calculated once beforehand, so that we don't
recalculate it while evaluating each pixel's neighbors. This
speeds up the function.
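The idea can be sketched as below. This is only an illustration under stated assumptions: the 3x3 neighborhood, the clipping at block edges, and the names compute_sq_diff and neighborhood_sum are hypothetical and are not taken from apply_temporal_filter itself.

```c
#include <assert.h>
#include <stdint.h>

#define BLK 16 /* block width and height, as in the 16x16 case above */

/* Step 1: compute every pixel's squared difference exactly once. */
static void compute_sq_diff(const uint8_t *src, const uint8_t *pred,
                            uint16_t *sq) {
  for (int i = 0; i < BLK * BLK; ++i) {
    const int d = src[i] - pred[i];
    sq[i] = (uint16_t)(d * d); /* at most 255 * 255 = 65025, fits uint16_t */
  }
}

/*
 * Step 2: sum the precomputed values over a pixel's 3x3 neighborhood
 * (an assumed window size), clipped to the block. Only table lookups
 * happen here; no difference is squared a second time.
 */
static uint32_t neighborhood_sum(const uint16_t *sq, int row, int col,
                                 int *count) {
  uint32_t sum = 0;
  int n = 0;
  assert(row >= 0 && row < BLK && col >= 0 && col < BLK);
  for (int dr = -1; dr <= 1; ++dr) {
    for (int dc = -1; dc <= 1; ++dc) {
      const int r = row + dr;
      const int c = col + dc;
      if (r < 0 || r >= BLK || c < 0 || c >= BLK) continue;
      sum += sq[r * BLK + c];
      ++n;
    }
  }
  *count = n; /* number of neighbors that fell inside the block */
  return sum;
}
```

Without the table, an interior pixel's difference would be squared once for each of the up to nine windows that contain it; with the table it is squared once and then only read.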
Yunqing Wang [Thu, 11 Oct 2018 22:13:47 +0000 (15:13 -0700)]
Make 4-tap interp filter coefficients even numbers
This CL modified 4-tap interp filter coefficients to be even numbers,
which would help in writing 4-tap filter SIMD optimizations. The coding
performance change was negligible. Speed 1 borg test showed:
          avg_psnr:  ovr_psnr:  ssim:
lowres:   -0.003     -0.012     -0.017
midres:    0.029      0.018      0.043
hdres:     0.024      0.044      0.033
Reason for revert: Regression in webrtc perf test
Original change's description:
> vp8: Increase rate threshold for overshoot-drop
>
> Increase the rate threshold for the dropping when
> overshoot is detected during encoding. This helps
> to prevent some unnecessary drops for hard content.
>
> Change-Id: I258bf33883d46347efd44e1e192cb25c444d05fe
Jingning Han [Wed, 10 Oct 2018 21:52:30 +0000 (14:52 -0700)]
Call tpl model build at the beginning of a GOP
By default, GOP index 0 is the kf / gf. The effective first coding
frame controlled by the current GOP rate allocation has index 1.
Call the tpl model build for the current GOP once at the index 1
position. This unifies the calling system for single/multi-layer
ARF GOP structures.
Yunqing Wang [Mon, 8 Oct 2018 23:21:54 +0000 (16:21 -0700)]
Use 4-tap interp filter in speed 1 sub-pel motion search
Added the 4-tap interp filter, and used it for speed 1 sub-pel motion
search. Speed 2 motion search still uses the bilinear filter as before.
Speed 1 borg test showed good bit savings.
          avg_psnr:  ovr_psnr:  ssim:
lowres:   -1.125     -1.179     -1.021
midres:   -0.717     -0.710     -0.543
hdres:    -0.357     -0.370     -0.342
Speed test at speed 1 showed ~10% encoder time increase, which was
partly because there is no SIMD version of the 4-tap filter.
Wan-Teh Chang [Mon, 8 Oct 2018 17:03:06 +0000 (10:03 -0700)]
Correct a for loop in init_ref_frame_bufs.
The cm->ref_frame_map and pool->frame_bufs arrays are of different sizes
(REF_FRAMES and FRAME_BUFFERS, respectively), so init_ref_frame_bufs()
cannot iterate over these two arrays using the same for loop.
This "misinformation" may make scan-build warn about the ref_cnt_fb()
function's use of its 'bufs' argument (Dereference of null pointer) when
we pass pool->frame_bufs to ref_cnt_fb().
Rewriting the above code as:
    if (buf_idx != INVALID_IDX) {
      buf = &pool->frame_bufs[buf_idx];
not only is clearer but also avoids confusing scan-build.
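A minimal sketch of iterating over the two arrays separately is given below. The struct layouts, the array sizes, the ref_count field, and the function name init_ref_frame_bufs_sketch are simplified stand-ins chosen for illustration; only the identifiers ref_frame_map, frame_bufs, REF_FRAMES, FRAME_BUFFERS, and INVALID_IDX come from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes; the point is only that the two differ. */
#define REF_FRAMES 8
#define FRAME_BUFFERS (REF_FRAMES + 4)
#define INVALID_IDX (-1)

/* Simplified stand-ins for the real structures. */
typedef struct {
  int ref_count;
} FrameBuf;

typedef struct {
  FrameBuf frame_bufs[FRAME_BUFFERS];
} BufferPool;

typedef struct {
  int ref_frame_map[REF_FRAMES];
  BufferPool *pool;
} Common;

/* Each array gets its own loop with its own bound. */
static void init_ref_frame_bufs_sketch(Common *cm) {
  BufferPool *const pool = cm->pool;
  int i;
  assert(pool != NULL);
  for (i = 0; i < REF_FRAMES; ++i) {
    cm->ref_frame_map[i] = INVALID_IDX;
  }
  for (i = 0; i < FRAME_BUFFERS; ++i) {
    pool->frame_bufs[i].ref_count = 0;
  }
}
```

A single loop bounded by REF_FRAMES would leave the tail of frame_bufs untouched, and one bounded by FRAME_BUFFERS would write past the end of ref_frame_map, so separate loops keep each access within its own array.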
Angie Chiang [Thu, 4 Oct 2018 22:17:02 +0000 (15:17 -0700)]
Fix bug in prepare_nb_full_mvs
Previously, prepare_nb_full_mvs could construct nb_full_mv with the
wrong mvs (from another ref frame).
The following changes fix the bug.
1) Make ready in TplDepStats an int array
2) Add parameter rf_idx
3) Use mv_arr instead of mv to build the nb_full_mv
Marco Paniconi [Wed, 3 Oct 2018 22:25:32 +0000 (15:25 -0700)]
vp8: Increase rate threshold for overshoot-drop
Increase the rate threshold for the dropping when
overshoot is detected during encoding. This helps
to prevent some unnecessary drops for hard content.
Paul Wilkins [Tue, 2 Oct 2018 15:11:14 +0000 (16:11 +0100)]
Force even arf group length where possible.
This patch tweaks the calculation of the active maximum GF interval
and also the break out clause for the GF interval loop. The changes
force the maximum and where possible the break out value to be odd
which in turn will result in an even length ARF group if ARF coding is
selected (vs GF only coding).
The primary aim was to improve coding with multi layer arf groups.
For the single layer case there are small net gains in 3 out of 4 sets
(low, md, hd) and a small net drop for the NF2K set.
For multi-layer the gains (opsnr, ssim, psnr-hvs : -ve = better) were:-
Hui Su [Sat, 29 Sep 2018 21:48:56 +0000 (14:48 -0700)]
Introduce the ml_var_partition_pruning feature
Add the ml_var_partition_pruning encoder speed feature that
uses a neural net model to prune partition-none and partition-split
search. The model uses prediction residue variance and quantization
step size as input features.
Encoding speed gain for speed 0 (tested over 20 hdres clips):
           QP=30    QP=40
average    17.7%    18.3%
max        24.46%   26.6%
Paul Wilkins [Thu, 27 Sep 2018 09:55:05 +0000 (10:55 +0100)]
Fix minor bug in calculation of max arf group length.
There is no valid last boosted Q available when estimating the maximum
group length for the first ARF group in a clip, so use a value based on
the current max q.
Paul Wilkins [Fri, 28 Sep 2018 15:54:03 +0000 (16:54 +0100)]
Adjustment of GOP intra factor for multi-layer.
This provides an alternative (still to be tuned for edge cases)
approach to adjusting the gop intra factor when multi-layer coding
is in effect that does not alter single layer coding.