Jingning Han [Mon, 22 Oct 2018 16:28:04 +0000 (09:28 -0700)]
Use the proper gfu_boost factor to compute rd_mult
Update the Lagrangian multiplier according to the gfu_boost factor
assigned per frame. It improves the multi-layer ARF compression
performance (results below shown for speed 0):
* changes:
Add do_motion_search
Preserve code of doing mv search in raster order
Variant implementation of changing mv search order
Add feature_score_loc_sort
Init mv_[dist/cost]_sum in init_tpl_stats
Change mv search order according to feature_score
Angie Chiang [Wed, 17 Oct 2018 20:56:42 +0000 (13:56 -0700)]
Preserve code of doing mv search in raster order
With this change, there will be three versions of the mv search scheme
in the codebase simultaneously.
We will run further experiments to evaluate which version is better
in terms of visual quality and coding performance.
Hui Su [Wed, 17 Oct 2018 15:40:26 +0000 (08:40 -0700)]
Enable rect partition search for HBD at speed 1
This patch enables rectangular partition search on speed 1 for high
bit depth encoding. The encoding speed loss is reduced thanks to
recently added speed features.
This only affects speed 1 high bit-depth encoding.
Urvang Joshi [Wed, 17 Oct 2018 18:48:10 +0000 (11:48 -0700)]
For keyframe-only coding do not boost in q mode
If we are using keyframe only coding - either coding a
single frame, or a sequence of keyframes - in the end-usage=q
mode, use the cq_level directly as the quality of each
coded frame, rather than boost them.
chiyotsai [Tue, 16 Oct 2018 19:26:34 +0000 (12:26 -0700)]
Refactor SSE2 Code for 4-tap interpolation filter on width 16.
Some repeated code is refactored into inline functions. No performance
degradation is observed. These inline functions can also be used for width 8
and width 4.
Yunqing Wang [Sat, 13 Oct 2018 00:21:23 +0000 (17:21 -0700)]
Optimize vp9_highbd_temporal_filter_apply_c
Following the previous patch:
(https://chromium-review.googlesource.com/c/webm/libvpx/+/1277913),
this patch modifies the highbd version of the temporal filter application
in a similar way.
chiyotsai [Wed, 17 Oct 2018 00:50:37 +0000 (17:50 -0700)]
Add SSE2 support for 4-tap interpolation filter for width 16.
Horizontal filter on 64x64 block: 1.59 times as fast as baseline.
Vertical filter on 64x64 block: 2.5 times as fast as baseline.
2D filter on 64x64 block: 1.96 times as fast as baseline.
Angie Chiang [Tue, 16 Oct 2018 19:31:13 +0000 (12:31 -0700)]
Variant implementation of changing mv search order
We start the mv search from the block with the highest feature score, then
move on to the block's neighbors in an order determined by
their feature scores.
We use a max heap to implement this ordering.
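The ordering described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from this change: ScoredBlock, MaxHeap, feature_score_order, MAX_BLOCKS and the 4-connected neighborhood are all hypothetical names and choices. The block with the highest score is visited first, its unvisited neighbors are pushed onto a max heap, and the next block visited is always the highest-scoring one on the heap.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 64 /* hypothetical upper bound on blocks per frame */

/* Hypothetical heap entry: a block position and its feature score. */
typedef struct {
  int score;
  int row;
  int col;
} ScoredBlock;

typedef struct {
  ScoredBlock *data; /* caller-provided storage */
  size_t size;
  size_t capacity;
} MaxHeap;

static void swap_blocks(ScoredBlock *a, ScoredBlock *b) {
  ScoredBlock t = *a;
  *a = *b;
  *b = t;
}

/* Insert an entry and sift it up until its parent has a score >= its own. */
static void heap_push(MaxHeap *h, ScoredBlock b) {
  size_t i;
  assert(h->size < h->capacity);
  i = h->size++;
  h->data[i] = b;
  while (i > 0) {
    const size_t parent = (i - 1) / 2;
    if (h->data[parent].score >= h->data[i].score) break;
    swap_blocks(&h->data[parent], &h->data[i]);
    i = parent;
  }
}

/* Remove and return the entry with the highest score. */
static ScoredBlock heap_pop(MaxHeap *h) {
  ScoredBlock top;
  size_t i = 0;
  assert(h->size > 0);
  top = h->data[0];
  h->data[0] = h->data[--h->size];
  for (;;) {
    const size_t l = 2 * i + 1, r = l + 1;
    size_t largest = i;
    if (l < h->size && h->data[l].score > h->data[largest].score) largest = l;
    if (r < h->size && h->data[r].score > h->data[largest].score) largest = r;
    if (largest == i) break;
    swap_blocks(&h->data[i], &h->data[largest]);
    i = largest;
  }
  return top;
}

/* Fills order[] with block indices (row * cols + col) in visiting order.
 * scores[] is row-major; rows * cols must not exceed MAX_BLOCKS. */
static void feature_score_order(const int *scores, int rows, int cols,
                                int *order) {
  static const int dr[4] = { -1, 1, 0, 0 };
  static const int dc[4] = { 0, 0, -1, 1 };
  ScoredBlock storage[MAX_BLOCKS];
  unsigned char queued[MAX_BLOCKS] = { 0 };
  MaxHeap heap = { storage, 0, MAX_BLOCKS };
  const int n = rows * cols;
  int best = 0, count = 0, i;
  assert(n > 0 && n <= MAX_BLOCKS);
  for (i = 1; i < n; ++i) {
    if (scores[i] > scores[best]) best = i;
  }
  heap_push(&heap, (ScoredBlock){ scores[best], best / cols, best % cols });
  queued[best] = 1;
  while (heap.size > 0) {
    const ScoredBlock cur = heap_pop(&heap);
    order[count++] = cur.row * cols + cur.col;
    for (i = 0; i < 4; ++i) {
      const int r = cur.row + dr[i], c = cur.col + dc[i];
      if (r < 0 || r >= rows || c < 0 || c >= cols) continue;
      if (queued[r * cols + c]) continue; /* each block is pushed once */
      queued[r * cols + c] = 1;
      heap_push(&heap, (ScoredBlock){ scores[r * cols + c], r, c });
    }
  }
}
```

Because each block is pushed at most once, the heap never holds more than rows * cols entries, and the whole traversal costs O(n log n) for n blocks.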
Yunqing Wang [Tue, 16 Oct 2018 16:24:18 +0000 (09:24 -0700)]
Fix the filter tap calculation in mips optimizations
The interp filter tap calculation could not reliably tell the
difference between 2 taps and 4 taps. This patch fixes the bug and
resolves the Jenkins test failures in mips sub-pel filter optimizations.
Yunqing Wang [Mon, 15 Oct 2018 22:27:49 +0000 (15:27 -0700)]
A temporary fix to mips sub-pel filters
There are Jenkins test failures in mips sub-pel filter optimizations.
[ RUN ] MSA/ConvolveTest.MatchesReferenceSubpixelFilter/5
../libvpx/test/convolve_test.cc:889: Failure
Expected equality of these values:
lookup(ref, y * kOutputStride + x)
Which is: 255
lookup(out, y * kOutputStride + x)
Which is: 11
mismatch at (1,0), filters (4,0,1)
This relates to the 4-tap kernel added recently. This CL is a temporary
fix, while we investigate the issue.
Jingning Han [Mon, 15 Oct 2018 18:48:39 +0000 (11:48 -0700)]
Add frame_gop_index to GF_GROUP
Add frame_gop_index to track the frame offset within a group of
pictures. This reworks the GOP frame offset calculation and its
uses. The coding stats remain identical.
Jingning Han [Mon, 15 Oct 2018 17:11:57 +0000 (10:11 -0700)]
Add encoder side frame buffer for tpl model
Add an encoder side reference frame buffer pool to store the
reference frames for the tpl model. This serves as an intermediate
step to support the multi-layer ARF system. The buffer memory size will
be optimized afterwards.
Jingning Han [Thu, 11 Oct 2018 19:16:01 +0000 (12:16 -0700)]
Refactor tpl model setup to support multi-layer ARF setup
Generalize the tpl model framework to support the newly designed
GOP structure system. The existing tpl model assumes single layer
ARF.
This design separates the tpl model operation for GOPs with
and without ARF. When a GOP has an ARF, the maximum lookahead
offset bounds the frame buffer needed to build the
tpl model for the entire GOP. When a GOP does not have an ARF, we
would use the temporal model in a different way.
The first step focuses on GOPs with ARF. All tpl model related
operations will only be triggered by ARF frame generation.
Yunqing Wang [Fri, 12 Oct 2018 19:25:36 +0000 (12:25 -0700)]
Optimize apply_temporal_filter function
This patch optimizes the apply_temporal_filter function. The diff^2 for each
pixel in the 16x16 block is calculated once beforehand, so that we don't
calculate it multiple times while evaluating a pixel's neighbors. This
speeds up the function.
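The precomputation described above can be sketched as follows. This is an illustration only: compute_sq_diff, neighborhood_sum, the 3x3 window and the buffer layout are assumptions, and the weighting that the real filter applies afterwards is not reproduced. The point is that each squared difference is computed once for the block and then only read back when summing over a pixel's neighbors.

```c
#include <stdint.h>

#define BLK 16 /* block width and height in pixels */

/* Step 1: compute the squared difference for every pixel exactly once.
 * For 8-bit input the largest value is 255 * 255, which fits in uint16_t. */
static void compute_sq_diff(const uint8_t *src, int src_stride,
                            const uint8_t *pred, int pred_stride,
                            uint16_t *sq_diff) {
  int r, c;
  for (r = 0; r < BLK; ++r) {
    for (c = 0; c < BLK; ++c) {
      const int d = src[r * src_stride + c] - pred[r * pred_stride + c];
      sq_diff[r * BLK + c] = (uint16_t)(d * d);
    }
  }
}

/* Step 2: sum the precomputed values over the 3x3 window centered on
 * (r, c), skipping positions outside the block. *count receives the
 * number of positions that contributed. */
static uint32_t neighborhood_sum(const uint16_t *sq_diff, int r, int c,
                                 int *count) {
  uint32_t sum = 0;
  int n = 0, dr, dc;
  for (dr = -1; dr <= 1; ++dr) {
    for (dc = -1; dc <= 1; ++dc) {
      const int rr = r + dr, cc = c + dc;
      if (rr < 0 || rr >= BLK || cc < 0 || cc >= BLK) continue;
      sum += sq_diff[rr * BLK + cc];
      ++n;
    }
  }
  *count = n;
  return sum;
}
```

Without step 1, an interior pixel's difference would be recomputed up to nine times, once for each window that covers it.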
Yunqing Wang [Thu, 11 Oct 2018 22:13:47 +0000 (15:13 -0700)]
Make 4-tap interp filter coefficients even numbers
This CL modified 4-tap interp filter coefficients to be even numbers,
which would help in writing 4-tap filter SIMD optimizations. The coding
performance change was negligible. Speed 1 borg test showed:
          avg_psnr:  ovr_psnr:  ssim:
lowres:   -0.003     -0.012     -0.017
midres:    0.029      0.018      0.043
hdres:     0.024      0.044      0.033
Reason for revert: Regression in webrtc perf test
Original change's description:
> vp8: Increase rate threshold for overshoot-drop
>
> Increase the rate threshold for the dropping when
> overshoot is detected during encoding. This helps
> to prevent some unnecessary drops for hard content.
>
> Change-Id: I258bf33883d46347efd44e1e192cb25c444d05fe
Jingning Han [Wed, 10 Oct 2018 21:52:30 +0000 (14:52 -0700)]
Call tpl model build at the beginning of a GOP
GOP index 0 defaults to kf / gf. The effective first coding
frame controlled by the current GOP rate allocation is indexed 1.
Call the tpl model build for the current GOP once at index 1
position. This would unify the calling system for single/multi-layer
ARF GOP structure.
Yunqing Wang [Mon, 8 Oct 2018 23:21:54 +0000 (16:21 -0700)]
Use 4-tap interp filter in speed 1 sub-pel motion search
Added the 4-tap interp filter, and used it for speed 1 sub-pel motion
search. Speed 2 motion search still used bilinear filter as before.
Speed 1 borg test showed good bit savings.
          avg_psnr:  ovr_psnr:  ssim:
lowres:   -1.125     -1.179     -1.021
midres:   -0.717     -0.710     -0.543
hdres:    -0.357     -0.370     -0.342
Speed test at speed 1 showed ~10% encoder time increase, which was
partly because there is no SIMD version of the 4-tap filter.
Wan-Teh Chang [Mon, 8 Oct 2018 17:03:06 +0000 (10:03 -0700)]
Correct a for loop in init_ref_frame_bufs.
The cm->ref_frame_map and pool->frame_bufs arrays are of different sizes
(REF_FRAMES and FRAME_BUFFERS, respectively), so init_ref_frame_bufs()
cannot iterate over these two arrays using the same for loop.
This "misinformation" may make scan-build warn about the ref_cnt_fb()
function's use of its 'bufs' argument (Dereference of null pointer) when
we pass pool->frame_bufs to ref_cnt_fb().
Rewriting the above code as:
if (buf_idx != INVALID_IDX) {
buf = &pool->frame_bufs[buf_idx];
not only is clearer but also avoids confusing scan-build.
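The two points above, separate loops for arrays of different sizes and an explicit INVALID_IDX check before indexing, can be sketched as follows. This is a minimal illustration, not the libvpx code: the struct layouts, the numeric values of REF_FRAMES, FRAME_BUFFERS and INVALID_IDX, and the names init_ref_frame_bufs_sketch and lookup_ref_buf are placeholders chosen for the example.

```c
#include <stddef.h>

/* Placeholder sizes; the only property that matters is that they differ. */
enum { REF_FRAMES = 8, FRAME_BUFFERS = 15 };
#define INVALID_IDX (-1) /* assumed sentinel for "no buffer assigned" */

typedef struct {
  int ref_count;
} FrameBuf;

typedef struct {
  FrameBuf frame_bufs[FRAME_BUFFERS];
} BufferPool;

typedef struct {
  int ref_frame_map[REF_FRAMES];
} Common;

/* Each array gets its own loop with its own bound, so neither loop can
 * index past the end of the shorter array. */
static void init_ref_frame_bufs_sketch(Common *cm, BufferPool *pool) {
  int i;
  for (i = 0; i < REF_FRAMES; ++i) cm->ref_frame_map[i] = INVALID_IDX;
  for (i = 0; i < FRAME_BUFFERS; ++i) pool->frame_bufs[i].ref_count = 0;
}

/* Returns the buffer mapped to ref_idx, or NULL when the slot is unused.
 * The explicit check makes the "no buffer" path visible to readers and
 * to static analyzers. */
static FrameBuf *lookup_ref_buf(const Common *cm, BufferPool *pool,
                                int ref_idx) {
  const int buf_idx = cm->ref_frame_map[ref_idx];
  if (buf_idx != INVALID_IDX) {
    return &pool->frame_bufs[buf_idx];
  }
  return NULL;
}
```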
Angie Chiang [Thu, 4 Oct 2018 22:17:02 +0000 (15:17 -0700)]
Fix bug in prepare_nb_full_mvs
Previously, prepare_nb_full_mvs might construct nb_full_mv with
wrong mvs (from another ref frame).
The following changes fix the bug.
1) Make ready in TplDepStats an int array
2) Add parameter rf_idx
3) Use mv_arr instead of mv to build the nb_full_mv
Marco Paniconi [Wed, 3 Oct 2018 22:25:32 +0000 (15:25 -0700)]
vp8: Increase rate threshold for overshoot-drop
Increase the rate threshold for the dropping when
overshoot is detected during encoding. This helps
to prevent some unnecessary drops for hard content.