Jingning Han [Wed, 24 Oct 2018 03:30:35 +0000 (20:30 -0700)]
Enable tpl model to support multi-layer ARF
Enable temporal dependency model for the base layer ARF. It
improves the multi-layer ARF compression performance (results
are tested in speed 0 vbr mode):
Jingning Han [Tue, 23 Oct 2018 23:24:28 +0000 (16:24 -0700)]
Reset frame udpate flags after qp estimate in tpl
After the frame quantizer estimate run in tpl model, reset the
actual value assigned to the current coding frame. This would
avoid certain frame update flags being overwritten by different
frame types' update.
Yunqing Wang [Tue, 23 Oct 2018 19:30:13 +0000 (12:30 -0700)]
Use 8-tap interp filter in temporal filtering
Used 8-tap interp filter in temporal filtering to achieve more accurate
motion search result. Using 8-tap sharp gave slight better result than
using 8-tap regular.
Speed 0 borg test showed that
avg_psnr: ovr_psnr: ssim:
hdres: -0.160 -0.157 -0.173
midres: -0.083 -0.061 -0.183
lowres: -0.077 -0.099 -0.204
Speed test didn't see noticeable encoder time changes.
Jingning Han [Mon, 22 Oct 2018 16:28:04 +0000 (09:28 -0700)]
Use the proper gfu_boost factor to compute rd_mult
Update the Lagrangian multiplier according to the gfu_boost factor
assigned per frame. It improves the multi-layer ARF compression
performance (results below shown for speed 0):
Hui Su [Tue, 16 Oct 2018 03:45:07 +0000 (20:45 -0700)]
ML_VAR_PARTITION: enable at speed 5
When the ML_VAR_PARTITION experiment is turned on, replace
REFERENCE_PARTITION with ML_BASED_PARTITION at speed 5.
Coding gains(avg_psnr) compared to baseline:
ytlivehr 1.63%
ytlivelr 0.07%
Tested encoding speed with several clips from ytlivehr and ytlivelr
on linux desktop(rt, vbr, 4 threads). Encoder speed is on average
faster than baseline:
360p: 14% faster
720p: 7% faster
1080p: 1.5% faster
* changes:
Add do_motion_search
Preserve code of doing mv search in raster order
Variant implementation of changing mv search order
Add feature_score_loc_sort
Init mv_[dist/cost]_sum in init_tpl_stats
Change mv search order according to feature_score
Angie Chiang [Wed, 17 Oct 2018 20:56:42 +0000 (13:56 -0700)]
Preserve code of doing mv search in raster order
With this change, there will be three version of mv search scheme
on the codebase simultaneously.
We will do further experiment to evaluate which version is better
in terms of visual quality and coding performance.
Hui Su [Wed, 17 Oct 2018 15:40:26 +0000 (08:40 -0700)]
Enable rect partition search for HBD at speed 1
This patch enables rectangular partition search on speed 1 for high
bit depth encoding. The encoding speed loss is reduced thanks to
recently added speed features.
This only affects speed 1 high bit-depth encoding.
Urvang Joshi [Wed, 17 Oct 2018 18:48:10 +0000 (11:48 -0700)]
For keyframe-only coding do not boost in q mode
If we are using keyframe only coding - either coding a
single frame, or a sequence of keyframes - in the end-usage=q
mode, use the cq_level directly as the quality of each
coded frame, rather than boost them.
chiyotsai [Tue, 16 Oct 2018 19:26:34 +0000 (12:26 -0700)]
Refactor SSE2 Code for 4-tap interpolation filter on width 16.
Some repeated codes are refactored as inline functions. No performance
degradation is observed. These inline functions can be used for width 8
and width 4.
Yunqing Wang [Sat, 13 Oct 2018 00:21:23 +0000 (17:21 -0700)]
Optimize vp9_highbd_temporal_filter_apply_c
Following the previous patch:
(https://chromium-review.googlesource.com/c/webm/libvpx/+/1277913),
this patch modified the highbd version of applying temporal filter
in the similar way.
chiyotsai [Wed, 17 Oct 2018 00:50:37 +0000 (17:50 -0700)]
Add SSE2 support for 4-tap interpolation filter for width 16.
Horizontal filter on 64x64 block: 1.59 times as fast as baseline.
Vertical filter on 64x64 block: 2.5 times as fast as baseline.
2D filter on 64x64 block: 1.96 times as fast as baseline.
Angie Chiang [Tue, 16 Oct 2018 19:31:13 +0000 (12:31 -0700)]
Variant implementation of changing mv search order
We start mv search from the block with highest feature score, then
move on to the block's neighbors with with an searching order using
their feature scores.
We use max heap to help us achieve the functionality.
Yunqing Wang [Tue, 16 Oct 2018 16:24:18 +0000 (09:24 -0700)]
Fix the filter tap calculation in mips optimizations
The interp filter tap calculation was not accurate to tell the
difference between 2 taps and 4 taps. This patch fixed the bug, and
resolved Jenkins test failures in mips sub-pel filter optimizations.
Yunqing Wang [Mon, 15 Oct 2018 22:27:49 +0000 (15:27 -0700)]
A temporary fix to mips sub-pel filters
There are Jenkins test failures in mips sub-pel filter optimizations.
[ RUN ] MSA/ConvolveTest.MatchesReferenceSubpixelFilter/5
../libvpx/test/convolve_test.cc:889: Failure
Expected equality of these values:
lookup(ref, y * kOutputStride + x)
Which is: 255
lookup(out, y * kOutputStride + x)
Which is: 11
mismatch at (1,0), filters (4,0,1)
This relates to the 4-tap kernel added recently. This CL is a temporary
fix, while we investigate the issue.
Jingning Han [Mon, 15 Oct 2018 18:48:39 +0000 (11:48 -0700)]
Add frame_gop_index to GF_GROUP
Add frame_gop_index to track the frame offset within a group of
picture. This reworks the GOP frame offset calculation and use
case. The coding stats remain identical.
Jingning Han [Mon, 15 Oct 2018 17:11:57 +0000 (10:11 -0700)]
Add encoder side frame buffer for tpl model
Add an encoder side reference frame buffer pool to store the
reference frames for tpl model. This servces as an intermediate
step to support multi-layer ARF system. The buffer memory size will
be optimized afterwards.
Jingning Han [Thu, 11 Oct 2018 19:16:01 +0000 (12:16 -0700)]
Refactor tpl model setup to support multi-layer ARF setup
Generalize the tpl model framework to support the newly designed
GOP structure system. The existing tpl model assumes single layer
ARF.
This design will separate the tpl model operation for GOP with
and without ARF cases. When a GOP has ARF, the maximum lookahead
offset would upper limit the needed frame buffer to build the
tpl model for the entire GOP. When a GOP does not have ARF, we
would use the temporal model in a different approach.
The first step will focus on GOP with ARF. All the tpl model related
operation will only be triggered by ARF frame generation.
Yunqing Wang [Fri, 12 Oct 2018 19:25:36 +0000 (12:25 -0700)]
Optimize apply_temporal_filter function
This patch optimized apply_temporal_filter function. The diff^2 for each
pixel in the 16x16 block is calculated once beforehand, so that we don't
calculate it multiple times while evaluating a pixel's neighbors. This
would speed up the function.
Yunqing Wang [Thu, 11 Oct 2018 22:13:47 +0000 (15:13 -0700)]
Make 4-tap interp filter coefficients even numbers
This CL modified 4-tap interp filter coefficients to be even numbers,
which would help in writing 4-tap filter SIMD optimizations. The coding
performance change was negligible. Speed 1 borg test showed:
avg_psnr: ovr_psnr: ssim:
lowres: -0.003 -0.012 -0.017
midres: 0.029 0.018 0.043
hdres: 0.024 0.044 0.033
Reason for revert: <INSERT REASONING HERE>
Regression in webrtc perf test
Original change's description:
> vp8: Increase rate threshold for overshoot-drop
>
> Increase the rate threshold for the dropping when
> overshoot is detected during encoding. This helps
> to prevent some unneccessary drops for hard content.
>
> Change-Id: I258bf33883d46347efd44e1e192cb25c444d05fe
Jingning Han [Wed, 10 Oct 2018 21:52:30 +0000 (14:52 -0700)]
Call tpl model build at the beginning of a GOP
The gop index 0 is default as kf / gf. The effective first coding
frame controlled by the current GOP rate allocation is indexed 1.
Call the tpl model build for the current GOP once at index 1
position. This would unify the calling system for single/multi-layer
ARF GOP structure.
Yunqing Wang [Mon, 8 Oct 2018 23:21:54 +0000 (16:21 -0700)]
Use 4-tap interp filter in speed 1 sub-pel motion search
Added the 4-tap interp filter, and used it for speed 1 sub-pel motion
search. Speed 2 motion search still used bilinear filter as before.
Speed 1 borg test showed good bit savings.
avg_psnr: ovr_psnr: ssim:
lowres: -1.125 -1.179 -1.021
midres: -0.717 -0.710 -0.543
hdres: -0.357 -0.370 -0.342
Speed test at speed 1 showed ~10% encoder time increase, which was
partially because of no SIMD version of 4-tap filter.