Zoe Liu [Thu, 21 Jun 2018 23:28:15 +0000 (16:28 -0700)]
Add extra altref option for hierarchical structure
This CL hooks up the implemented hierarchical structure
construction, together with its corresponding bitrate allocation,
to the definition of a GF group.
Currently the hierarchical structure is off by default, so this CL
has no impact on coding performance.
Zoe Liu [Wed, 20 Jun 2018 01:11:08 +0000 (18:11 -0700)]
Add bit allocation for hierarchical layer
This CL migrates the bit allocation scheme for hierarchical layers from
libaom and combines it with the updated libvpx scheme, which uses a
modified method to calculate the target bitrate per frame.
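The commit message does not spell out the allocation formula. As a minimal sketch, assuming each hierarchical layer has a fixed weight (the weights and the helper `allocate_layer_bits` are hypothetical, not libvpx's values or API), a GF group budget can be split so that lower layers, which are referenced more often, receive more bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-layer weights: layer 0 (e.g. the ARF) gets the largest
 * share, deeper layers progressively less. Not libvpx's actual values. */
static const int kLayerWeight[4] = { 8, 4, 2, 1 };

/* Splits total_bits across num_frames frames in proportion to each frame's
 * layer weight. The integer-division remainder goes to the first frame so
 * the allocation sums exactly to total_bits. */
static void allocate_layer_bits(int64_t total_bits, const int *layer,
                                int num_frames, int64_t *frame_bits) {
  int64_t weight_sum = 0;
  int64_t assigned = 0;
  int i;
  assert(num_frames > 0);
  for (i = 0; i < num_frames; ++i) {
    assert(layer[i] >= 0 && layer[i] < 4);
    weight_sum += kLayerWeight[layer[i]];
  }
  for (i = 0; i < num_frames; ++i) {
    frame_bits[i] = total_bits * kLayerWeight[layer[i]] / weight_sum;
    assigned += frame_bits[i];
  }
  frame_bits[0] += total_bits - assigned;
}
```

For layers {0, 1, 2, 2} and a 1600-bit budget this yields 800, 400, 200 and 200 bits.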
Johann [Wed, 20 Jun 2018 20:10:54 +0000 (13:10 -0700)]
libyuv: remove problematic functions
These fail to build with clang on 32-bit with
--disable-optimizations.
Upstream libyuv has addressed these, and we will get updated
versions on the next roll. At the moment we don't use
libyuv for copying alpha data, so this is a quick fix.
Marco Paniconi [Tue, 12 Jun 2018 18:50:29 +0000 (11:50 -0700)]
vp9-svc: Add support for spatial layer sync frames.
Add an encoder control to allow the application to insert
a spatial layer sync frame. The sync frame disables
temporal prediction for that spatial layer.
This is useful for RTC applications, letting a receiver
start decoding a higher spatial layer without inserting
a key frame on the base spatial layer.
If the layer sync is requested on the base spatial layer,
this forces a key frame; otherwise it only disables
the temporal reference for that spatial layer, allowing
temporal prediction to continue for the other layers.
Although the temporal prediction is disabled and reset
on a layer sync frame, the inter-layer prediction for the
sync frame is enabled on INTER frames. So the meaning of
INTER_LAYER_PRED_OFF_NONKEY is modified to mean disable
inter-layer prediction on non-key and non-sync frames.
Added a unit test for inserting layer sync frames.
Bump up ABI version.
Change-Id: Id458acc400a77c853551f125c4e7b6d001991f03
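The per-layer rules described above can be sketched as a single decision function. This is an illustration only: `LayerDecision` and `decide_layer` are hypothetical, and the inter-layer prediction enum is redeclared here (values assumed) rather than taken from libvpx headers.

```c
#include <assert.h>

/* Redeclared for this sketch; mirrors the mode names in the commit message. */
typedef enum {
  INTER_LAYER_PRED_ON,
  INTER_LAYER_PRED_OFF,
  INTER_LAYER_PRED_OFF_NONKEY
} InterLayerPredMode;

/* Hypothetical per-layer outcome. */
typedef struct {
  int force_key_frame;      /* 1: code this superframe as a key frame */
  int use_temporal_ref;     /* 1: temporal prediction allowed */
  int use_inter_layer_pred; /* 1: prediction from the lower spatial layer */
} LayerDecision;

static LayerDecision decide_layer(int spatial_layer, int sync_requested,
                                  int is_key_frame, InterLayerPredMode mode) {
  LayerDecision d = { 0, 1, 1 };
  assert(spatial_layer >= 0);
  if (sync_requested && spatial_layer == 0) {
    /* A sync on the base layer can only be satisfied by a key frame. */
    is_key_frame = 1;
    d.force_key_frame = 1;
  }
  /* Key and sync frames drop (reset) temporal prediction for this layer. */
  if (is_key_frame || sync_requested) d.use_temporal_ref = 0;

  if (spatial_layer == 0) {
    d.use_inter_layer_pred = 0; /* nothing below the base layer */
  } else if (mode == INTER_LAYER_PRED_OFF) {
    d.use_inter_layer_pred = 0;
  } else if (mode == INTER_LAYER_PRED_OFF_NONKEY) {
    /* Modified meaning: off only on frames that are neither key nor sync. */
    d.use_inter_layer_pred = is_key_frame || sync_requested;
  }
  return d;
}
```

With INTER_LAYER_PRED_OFF_NONKEY, a sync request on spatial layer 1 drops the temporal reference but keeps inter-layer prediction, so the layer can still be predicted from the base.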
Jingning Han [Wed, 30 May 2018 20:31:08 +0000 (13:31 -0700)]
Refactor partition mode cost calculation
Compute the coding block partition mode cost as an additional rdcost
term added to the cumulative rate-distortion cost of each coding block.
This changes the coding results slightly due to rounding error; the
overall compression performance change is neutral.
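Why a rounding difference appears can be shown with a toy RD cost. Assuming the rate term is scaled by `rdmult` and rounded after a right shift (the shift values and all function names below are hypothetical, not libvpx's macro), folding the partition rate into the block rate can round differently from adding it as its own rdcost term:

```c
#include <assert.h>
#include <stdint.h>

#define RATE_SHIFT 9 /* hypothetical rate scaling shift */
#define DIST_SHIFT 4 /* hypothetical distortion scaling shift */

/* Toy RD cost: rate * rdmult rounded to nearest after >> RATE_SHIFT,
 * plus distortion scaled by << DIST_SHIFT. */
static int64_t rdcost(int64_t rdmult, int64_t rate, int64_t dist) {
  assert(rdmult >= 0 && rate >= 0 && dist >= 0);
  return ((rate * rdmult + (1 << (RATE_SHIFT - 1))) >> RATE_SHIFT) +
         (dist << DIST_SHIFT);
}

/* Partition rate folded into the block rate before conversion. */
static int64_t cost_folded(int64_t rdmult, int64_t block_rate,
                           int64_t block_dist, int64_t partition_rate) {
  return rdcost(rdmult, block_rate + partition_rate, block_dist);
}

/* Partition cost added as a separate rdcost term. */
static int64_t cost_separate(int64_t rdmult, int64_t block_rate,
                             int64_t block_dist, int64_t partition_rate) {
  return rdcost(rdmult, block_rate, block_dist) +
         rdcost(rdmult, partition_rate, 0);
}
```

With `rdmult = 100`, block rate 3, distortion 10 and partition rate 3, the folded form gives 161 and the separate form 162: each separate term rounds on its own.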
Hui Su [Tue, 12 Jun 2018 18:56:09 +0000 (11:56 -0700)]
Improve the partition search breakout speed feature
Use a linear model to make partition search breakout decisions.
Currently the model is tuned for large quantizers and small resolutions,
so it is only used when q-index is larger than 200 and frame
width/height is smaller than 720. It is also not yet supported for high
bit depth.
Tested speed 1 and 2 on lowres and midres. Compression performance is
neutral. At low bitrates, the encoding speedup is up to 50% for speed 1
and up to 30% for speed 2.
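A linear breakout decision with the gating conditions above might look like the sketch below. The features, weights, bias and threshold are hypothetical placeholders (the trained model is not given here), and "width/height smaller than 720" is assumed to mean both dimensions.

```c
#include <assert.h>

#define NUM_FEATURES 3
/* Hypothetical model coefficients, for illustration only. */
static const float kWeights[NUM_FEATURES] = { 0.8f, -0.5f, 0.3f };
static const float kBias = -0.2f;
static const float kThreshold = 0.0f;

/* Gating from the commit message: large q-index, small frames, 8-bit only. */
static int model_enabled(int q_index, int width, int height,
                         int high_bitdepth) {
  return q_index > 200 && width < 720 && height < 720 && !high_bitdepth;
}

/* Returns 1 if the partition search should break out early. */
static int partition_breakout(const float *features, int q_index, int width,
                              int height, int high_bitdepth) {
  float score = kBias;
  int i;
  assert(features != 0);
  if (!model_enabled(q_index, width, height, high_bitdepth)) return 0;
  for (i = 0; i < NUM_FEATURES; ++i) score += kWeights[i] * features[i];
  return score > kThreshold;
}
```

Outside the gated range the function never breaks out, so the full partition search runs as before.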
Some sample numbers:
Luc Trudeau [Wed, 13 Jun 2018 19:24:54 +0000 (15:24 -0400)]
[VSX] Optimize PROCESS16 macro
The PROCESS16 macro now uses 8-bit lanes instead of 16-bit lanes.
SADTest Speed Test (POWER8 Model 2.1)
16x8 Old VSX time = 16.7 ms, new VSX time = 9.1 ms [1.8x]
16x16 Old VSX time = 15.7 ms, new VSX time = 7.9 ms [2.0x]
16x32 Old VSX time = 14.4 ms, new VSX time = 7.2 ms [2.0x]
32x16 Old VSX time = 14.0 ms, new VSX time = 7.4 ms [1.9x]
32x32 Old VSX time = 13.4 ms, new VSX time = 6.5 ms [2.0x]
32x64 Old VSX time = 12.7 ms, new VSX time = 6.3 ms [2.0x]
64x32 Old VSX time = 12.6 ms, new VSX time = 6.3 ms [2.0x]
64x64 Old VSX time = 12.7 ms, new VSX time = 6.2 ms [2.0x]
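The macro's VSX intrinsics are not shown here. As a scalar illustration of why 8-bit lanes are enough: the absolute difference of two unsigned bytes computed as max minus min never underflows and always fits in 8 bits, so no widening is needed before the subtraction, and a 128-bit vector then holds sixteen lanes instead of eight. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* |a - b| as max(a, b) - min(a, b): the result always fits in 8 bits. */
static uint8_t absdiff_u8(uint8_t a, uint8_t b) {
  return (uint8_t)(a > b ? a - b : b - a);
}

/* SAD of one 16-pixel row using 8-bit absolute differences; only the
 * running sum is kept in a wider type. */
static unsigned sad16_row(const uint8_t *src, const uint8_t *ref) {
  unsigned sum = 0;
  int i;
  assert(src && ref);
  for (i = 0; i < 16; ++i) sum += absdiff_u8(src[i], ref[i]);
  return sum;
}

/* Same row SAD, widening each pixel to 16 bits before subtracting. */
static unsigned sad16_row_widened(const uint8_t *src, const uint8_t *ref) {
  unsigned sum = 0;
  int i;
  assert(src && ref);
  for (i = 0; i < 16; ++i) {
    int16_t d = (int16_t)((int16_t)src[i] - (int16_t)ref[i]);
    sum += (unsigned)(d < 0 ? -d : d);
  }
  return sum;
}
```

Both functions return the same SAD; the 8-bit form simply avoids the widening step.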
Zoe Liu [Thu, 14 Jun 2018 00:33:57 +0000 (17:33 -0700)]
Unify frame_index in defining GF group structure
The following changes are made to the GF group structure definition in firstpass:
1. Remove the redundant alt_frame_index;
2. Replace hard-coded index values with the frame_index variable.
Luc Trudeau [Wed, 13 Jun 2018 17:39:04 +0000 (13:39 -0400)]
VSX Version of SAD8xN
VSX versions of the SAD functions of width 8.
SADTest Speed Test (POWER8 Model 2.1)
8x4 C time = 68.7 ms (±0.3 ms), VSX time = 31.8 ms (±0.1 ms) [2.2x]
8x8 C time = 55.6 ms (±0.3 ms), VSX time = 18.3 ms (±0.1 ms) [3.0x]
8x16 C time = 46.5 ms (±0.1 ms), VSX time = 15.6 ms (±0.1 ms) [3.0x]
Luc Trudeau [Wed, 13 Jun 2018 17:36:17 +0000 (13:36 -0400)]
Add Speed Tests for the SADTest test suite.
Speed tests are added for the SADTest test suite. These tests use
AbstractBench and print the median run time of SAD operations. Speed
tests are disabled by default.
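AbstractBench's internals are not shown here. A standalone median over a set of timed runs could look like this sketch (`median_run_time` is hypothetical). The median is a common choice for benchmarks because a few runs slowed by scheduling noise barely move it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b) {
  const double x = *(const double *)a, y = *(const double *)b;
  return (x > y) - (x < y);
}

/* Median of n run times (ms). Sorts a copy so the caller's array keeps
 * measurement order. n must be in 1..64 for this sketch. */
static double median_run_time(const double *times, int n) {
  double sorted[64];
  assert(n > 0 && n <= 64);
  memcpy(sorted, times, (size_t)n * sizeof(*times));
  qsort(sorted, (size_t)n, sizeof(*sorted), cmp_double);
  return (n % 2) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
}
```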
Jerome Jiang [Mon, 11 Jun 2018 18:05:36 +0000 (11:05 -0700)]
vp9 svc: Denoise golden when it's a temporal ref.
When golden was the inter-layer reference, a block that selected the golden ref
would not be denoised.
But when golden is used as a second temporal reference, blocks that select
the golden reference should be denoised.
This change allows for that.
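The rule can be sketched as a predicate on the block's selected reference. The reference enum is redeclared for the sketch, `should_denoise_block` is hypothetical, and treating intra and ALTREF blocks as not denoised is a simplifying assumption outside the scope of this change.

```c
#include <assert.h>

/* Redeclared for this sketch. */
typedef enum { INTRA_FRAME, LAST_FRAME, GOLDEN_FRAME, ALTREF_FRAME } RefFrame;

/* golden_is_inter_layer: 1 when GOLDEN holds the lower spatial layer,
 * 0 when it serves as a second temporal reference. */
static int should_denoise_block(RefFrame ref, int golden_is_inter_layer) {
  assert(ref >= INTRA_FRAME && ref <= ALTREF_FRAME);
  if (ref == LAST_FRAME) return 1;
  if (ref == GOLDEN_FRAME) return !golden_is_inter_layer;
  return 0; /* simplification for intra/altref in this sketch */
}
```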
Luc Trudeau [Thu, 7 Jun 2018 19:30:23 +0000 (15:30 -0400)]
VSX Version of vp9_quantize_fp
Low bit depth version only. Passes the VP9QuantizeTest test suite.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
4x4 C time = 86.3 ms (±0.7 ms), VSX time = 18.2 ms (±0.0 ms) [ 4.7x]
8x8 C time = 57.7 ms (±0.3 ms), VSX time = 7.6 ms (±0.0 ms) [ 7.6x]
16x16 C time = 50.7 ms (±0.1 ms), VSX time = 4.9 ms (±0.0 ms) [10.3x]
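For readers unfamiliar with this quantizer style, here is a simplified scalar sketch: add a rounding offset to |coeff|, clamp to 16 bits, multiply by a Q16 quantizer and shift right by 16, then restore the sign and dequantize. It is not the libvpx function signature; coefficients are visited in raster order with no scan table, index 0 of each table is DC and index 1 is AC, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the end-of-block position (last nonzero index + 1). */
static int quantize_fp_sketch(const int32_t *coeff, int n,
                              const int16_t rnd[2], const int16_t quant[2],
                              const int16_t dequant[2], int32_t *qcoeff,
                              int32_t *dqcoeff) {
  int i, eob = -1;
  assert(n > 0);
  for (i = 0; i < n; ++i) {
    const int is_ac = i != 0;
    const int32_t c = coeff[i];
    const int32_t sign = c < 0 ? -1 : 0;
    int32_t tmp = (c ^ sign) - sign;        /* |c| */
    tmp += rnd[is_ac];
    if (tmp > INT16_MAX) tmp = INT16_MAX;    /* clamp to 16-bit range */
    tmp = (tmp * quant[is_ac]) >> 16;        /* multiply by Q16 quantizer */
    qcoeff[i] = (tmp ^ sign) - sign;         /* restore sign */
    dqcoeff[i] = qcoeff[i] * dequant[is_ac];
    if (tmp) eob = i;
  }
  return eob + 1;
}
```

Each coefficient is processed independently, which is what makes the loop a good fit for a vector implementation.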
Marco Paniconi [Thu, 7 Jun 2018 23:35:44 +0000 (16:35 -0700)]
vp9-svc: Fix to frames_since_golden update for SVC.
When the second (gf) temporal reference is used in SVC,
the reference is refreshed on base TL superframes, so
the rc->frames_since_golden counter was also only updated on
base TL frames. This prevented the golden reference
from being used as a temporal reference for TL > 0 frames
(since frames_since_golden was 0/not updated on TL > 0 frames).
The fix is to copy the update of rc->frames_since_golden to all
upper temporal layers. This allows TL > 0 frames to test the
golden inter mode.
Gain on RTC set: ~2%, ~8% on desktop_vga clip.
Encode time increase ~5-8% on linux, 3SL-3TL run with 1 thread.
For now, keep this off for TL > 0 frames in speed features, so
this change does not alter current behavior for speed >= 7.
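A minimal sketch of the fix, assuming (as the message implies) that a counter of 0 keeps golden from being tested. `LayerRateControl`, `propagate_frames_since_golden` and `golden_usable` are hypothetical names, not libvpx's.

```c
#include <assert.h>

#define MAX_TEMPORAL_LAYERS 3 /* hypothetical limit for this sketch */

typedef struct {
  int frames_since_golden;
} LayerRateControl;

/* After a base temporal layer (TL0) frame, copy its counter to every
 * upper temporal layer so TL > 0 frames see a valid value, not a stale 0. */
static void propagate_frames_since_golden(LayerRateControl *tl_rc,
                                          int num_temporal_layers) {
  int tl;
  assert(num_temporal_layers >= 1 &&
         num_temporal_layers <= MAX_TEMPORAL_LAYERS);
  for (tl = 1; tl < num_temporal_layers; ++tl)
    tl_rc[tl].frames_since_golden = tl_rc[0].frames_since_golden;
}

/* Assumed gate: golden is only tested when it differs from the last frame. */
static int golden_usable(const LayerRateControl *rc) {
  return rc->frames_since_golden > 0;
}
```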
Hui Su [Fri, 8 Jun 2018 20:41:05 +0000 (13:41 -0700)]
Small speedup of ml_pruning_partition()
Terminate early and skip the neural net model when the linear score is
already high enough, which indicates that split and rectangular
partitions should not be skipped.
No change in compression; encoding speed improves slightly.
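The early exit can be sketched as follows; the stand-in network, thresholds and names are hypothetical. The cheap linear score is checked first, and the costlier model runs only when that check is inconclusive, which is why compression is unchanged while speed improves.

```c
#include <assert.h>

static int nn_calls = 0; /* counts network evaluations, for illustration */

/* Stand-in for the neural net (hypothetical). */
static float nn_score(const float *features) {
  ++nn_calls;
  return features[0] - features[1];
}

/* Returns 1 if split/rectangular partitions may be pruned. */
static int ml_prune_sketch(const float *features, float linear_score,
                           float no_prune_thresh, float nn_prune_thresh) {
  assert(features != 0);
  /* Early exit: a high linear score means keep all partitions. */
  if (linear_score > no_prune_thresh) return 0;
  return nn_score(features) > nn_prune_thresh;
}
```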
Marco Paniconi [Thu, 7 Jun 2018 22:07:57 +0000 (15:07 -0700)]
vp9-svc: Adjust some logic on gf temporal reference.
For the feature of using a second temporal reference (when
inter-layer prediction is off): move the buffer_idx assignment and
refresh flag settings further down into vp9_rc_get_svc_params(),
since is_key_frame is set there for every frame/layer.
Otherwise the setting from the previous frame/layer was used.
This makes the refresh more consistent for both layers in the
2 spatial layers case.
Luca Barbato [Wed, 6 Jun 2018 21:10:18 +0000 (21:10 +0000)]
Implement subtract_block for VSX
~2x speedup or better for most block sizes (about 1.7x for 32x64 and 64x64, up to about 6.9x for 32x16).
[ RUN ] C/VP9SubtractBlockTest.Speed/0
[ BENCH ] 4x4 365.1 ms ( ±2.2 ms )
[ BENCH ] 8x4 258.5 ms ( ±0.3 ms )
[ BENCH ] 4x8 202.7 ms ( ±0.2 ms )
[ BENCH ] 8x8 162.2 ms ( ±0.5 ms )
[ BENCH ] 16x8 138.8 ms ( ±0.3 ms )
[ BENCH ] 8x16 121.5 ms ( ±0.4 ms )
[ BENCH ] 16x16 110.2 ms ( ±0.5 ms )
[ BENCH ] 32x16 104.8 ms ( ±0.1 ms )
[ BENCH ] 16x32 32.7 ms ( ±0.1 ms )
[ BENCH ] 32x32 30.0 ms ( ±0.0 ms )
[ BENCH ] 64x32 28.7 ms ( ±0.0 ms )
[ BENCH ] 32x64 20.1 ms ( ±0.0 ms )
[ BENCH ] 64x64 19.3 ms ( ±0.0 ms )
[ RUN ] VSX/VP9SubtractBlockTest.Speed/0
[ BENCH ] 4x4 155.3 ms ( ±0.9 ms )
[ BENCH ] 8x4 99.3 ms ( ±0.4 ms )
[ BENCH ] 4x8 77.2 ms ( ±0.1 ms )
[ BENCH ] 8x8 45.7 ms ( ±0.0 ms )
[ BENCH ] 16x8 34.1 ms ( ±0.0 ms )
[ BENCH ] 8x16 29.5 ms ( ±0.0 ms )
[ BENCH ] 16x16 19.9 ms ( ±0.0 ms )
[ BENCH ] 32x16 15.1 ms ( ±0.0 ms )
[ BENCH ] 16x32 16.7 ms ( ±0.0 ms )
[ BENCH ] 32x32 14.1 ms ( ±0.0 ms )
[ BENCH ] 64x32 12.6 ms ( ±0.0 ms )
[ BENCH ] 32x64 12.0 ms ( ±0.0 ms )
[ BENCH ] 64x64 11.2 ms ( ±0.0 ms )
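For reference, the operation being vectorized is a per-pixel `diff = src - pred`, widened to 16 bits because the difference of two 8-bit pixels lies in [-255, 255]. This scalar sketch's name and parameter list are assumptions; strides are in elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void subtract_block_ref(int rows, int cols, int16_t *diff,
                               ptrdiff_t diff_stride, const uint8_t *src,
                               ptrdiff_t src_stride, const uint8_t *pred,
                               ptrdiff_t pred_stride) {
  int r, c;
  assert(rows > 0 && cols > 0);
  for (r = 0; r < rows; ++r) {
    for (c = 0; c < cols; ++c) diff[c] = (int16_t)(src[c] - pred[c]);
    diff += diff_stride;
    src += src_stride;
    pred += pred_stride;
  }
}
```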
Tom Finegan [Thu, 7 Jun 2018 19:35:05 +0000 (12:35 -0700)]
Add avx512 compile test.
Some compiler releases accept the -mavx512f flag without actually
implementing support. Test for this situation, and have configure
disable avx512 when it is detected.
Marco Paniconi [Thu, 7 Jun 2018 17:52:09 +0000 (10:52 -0700)]
vp9-svc: Allow second temporal reference for next highest layer.
When inter-layer prediction is disabled on INTER frames, allow
the next highest spatial layer to also have a second temporal reference.
Previously only the top (highest) spatial layer was allowed.
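The change in eligibility can be sketched as two predicates over the spatial layer index `sl` and layer count `num_sl`. Both functions are hypothetical; the real condition in the encoder may include further checks.

```c
#include <assert.h>

/* Before: only the top spatial layer may use a second temporal reference. */
static int allow_second_temporal_ref_old(int sl, int num_sl) {
  assert(sl >= 0 && sl < num_sl);
  return sl == num_sl - 1;
}

/* After: the top two spatial layers qualify. */
static int allow_second_temporal_ref_new(int sl, int num_sl) {
  assert(sl >= 0 && sl < num_sl);
  return sl >= num_sl - 2;
}
```

With 3 spatial layers, layers 1 and 2 qualify under the new rule; with 2 spatial layers, both layers do.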
Marco Paniconi [Thu, 7 Jun 2018 05:42:38 +0000 (22:42 -0700)]
vp9-svc: Modify choose_partitioning for second temporal ref
For the mode where a second temporal reference is used in SVC, allow
this reference (golden ref) to be used and tested in the variance-based
partition scheme (choose_partitioning).
Small positive gain (~0.25%) on metrics for 3-layer SVC,
negligible change in speed.
Marco Paniconi [Wed, 6 Jun 2018 19:14:59 +0000 (12:14 -0700)]
vp9-svc: Add a buffer_idx is_used parameter for SVC.
For the case where a second (long-term) temporal reference is
used in SVC, this additional parameter makes sure the
buffer slot selected for this reference is available,
i.e., it is never used by any of the 3 references set for the
fixed SVC patterns.
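Slot selection can be sketched by marking every slot the fixed patterns use and returning the first free one. VP9 has 8 reference frame buffer slots; `pick_unused_slot` and its input format are hypothetical.

```c
#include <assert.h>

#define NUM_REF_SLOTS 8 /* VP9 reference frame buffer slots */

/* pattern_idx lists every slot index the fixed SVC patterns use (across
 * all layers). Returns the first unused slot, or -1 if none is free. */
static int pick_unused_slot(const int *pattern_idx, int n) {
  int is_used[NUM_REF_SLOTS] = { 0 };
  int i;
  for (i = 0; i < n; ++i) {
    assert(pattern_idx[i] >= 0 && pattern_idx[i] < NUM_REF_SLOTS);
    is_used[pattern_idx[i]] = 1;
  }
  for (i = 0; i < NUM_REF_SLOTS; ++i)
    if (!is_used[i]) return i;
  return -1;
}
```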
Jerome Jiang [Tue, 5 Jun 2018 22:21:29 +0000 (15:21 -0700)]
vp9: Move up reset of cyclic refresh under dynamic resize.
When a resize happens and cyclic refresh is not applied on the
current (resized) frame, sb_index is not reset and may then be
out of bounds on future frames when cyclic refresh is
applied.
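The ordering fix can be sketched as follows: the reset happens on resize before the "is refresh applied?" check, so a stale index from a larger frame never survives. `CyclicRefreshSketch` and `cr_on_frame` are hypothetical, and advancing by a fixed count modulo the superblock total is a simplification.

```c
#include <assert.h>

typedef struct {
  int sb_index; /* superblock where the next refresh pass starts */
} CyclicRefreshSketch;

static void cr_on_frame(CyclicRefreshSketch *cr, int resized, int apply_cr,
                        int num_sbs, int sbs_per_frame) {
  assert(num_sbs > 0 && sbs_per_frame > 0);
  if (resized) cr->sb_index = 0; /* reset first, even if refresh is skipped */
  if (!apply_cr) return;
  assert(cr->sb_index < num_sbs); /* would fail without the early reset */
  cr->sb_index = (cr->sb_index + sbs_per_frame) % num_sbs;
}
```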