Luc Trudeau [Wed, 13 Jun 2018 19:24:54 +0000 (15:24 -0400)]
[VSX] Optimize PROCESS16 macro
The PROCESS16 macro now uses 8-bit lanes instead of 16-bit lanes.
SADTest Speed Test (POWER8 Model 2.1)
16x8 Old VSX time = 16.7 ms, new VSX time = 9.1 ms [1.8x]
16x16 Old VSX time = 15.7 ms, new VSX time = 7.9 ms [2.0x]
16x32 Old VSX time = 14.4 ms, new VSX time = 7.2 ms [2.0x]
32x16 Old VSX time = 14.0 ms, new VSX time = 7.4 ms [1.9x]
32x32 Old VSX time = 13.4 ms, new VSX time = 6.5 ms [2.0x]
32x64 Old VSX time = 12.7 ms, new VSX time = 6.3 ms [2.0x]
64x32 Old VSX time = 12.6 ms, new VSX time = 6.3 ms [2.0x]
64x64 Old VSX time = 12.7 ms, new VSX time = 6.2 ms [2.0x]
Zoe Liu [Thu, 14 Jun 2018 00:33:57 +0000 (17:33 -0700)]
Unify frame_index in defining GF group structure
Following are completed in defining GF group structure in firstpass:
1. Remove redundant alt_frame_index;
2. Remove hard coded index value with the variable of frame_index.
Luc Trudeau [Wed, 13 Jun 2018 17:39:04 +0000 (13:39 -0400)]
VSX Version of SAD8xN
VSX versions of the SAD functions of width 8.
SADTest Speed Test (POWER8 Model 2.1)
8x4 C time = 68.7 ms (±0.3 ms), VSX time = 31.8 ms (±0.1 ms) [2.2x]
8x8 C time = 55.6 ms (±0.3 ms), VSX time = 18.3 ms (±0.1 ms) [3.0x]
8x16 C time = 46.5 ms (±0.1 ms), VSX time = 15.6 ms (±0.1 ms) [3.0x]
Luc Trudeau [Wed, 13 Jun 2018 17:36:17 +0000 (13:36 -0400)]
Add Speed Tests for the SADTest test suite.
Speed tests are added for the SADTest test suite. These test use the
AbstractBench and print the median run time of SAD operations. Speed
tests are disabled by default.
Jerome Jiang [Mon, 11 Jun 2018 18:05:36 +0000 (11:05 -0700)]
vp9 svc: Denoise golden when it's a temporal ref.
When golden was the inter-layer reference, a block that selected the golden ref
would not be denoised.
But when golden is used as a second temporal reference then we should denoise
blocks that select the golden reference.
This changes allows for that.
Luc Trudeau [Thu, 7 Jun 2018 19:30:23 +0000 (15:30 -0400)]
VSX Version of vp9_quantize_fp
Low bit depth version only. Passes the VP9QuantizeTest test suite.
VP9QuantizeTest Speed Test (POWER8 Model 2.1)
4x4 C time = 86.3 ms (±0.7 ms), VSX time = 18.2 ms (±0.0 ms) [ 4.7x]
8x8 C time = 57.7 ms (±0.3 ms), VSX time = 7.6 ms (±0.0 ms) [ 7.6x]
16x16 C time = 50.7 ms (±0.1 ms), VSX time = 4.9 ms (±0.0 ms) [10.3x]
Marco Paniconi [Thu, 7 Jun 2018 23:35:44 +0000 (16:35 -0700)]
vp9-svc: Fix to frames_since_golden update for SVC.
When the second (gf) temporal reference is used in SVC:
the reference is refreshed on base TL superframes, and so
the rc->frames_since_golden counter was also only updated on
base TL frames. But this was disabling the golden reference
from being used as a temporal reference for TL > 0 frames
(since frames_since_golden was 0/not updated on TL > 0 frames).
Fix is to copy the update of rc->frames_since_golden to all
upper temporal layers. This allows TL > 0 frames to test the
golden inter mode.
Gain on RTC set: ~2%, ~8% on desktop_vga clip.
Encode time increase ~5-8% on linux, 3SL-3TL run with 1 thread.
For now keep this off for TL > 0 frames in speed features, so
this change does not change current behavior for speed >= 7.
Hui Su [Fri, 8 Jun 2018 20:41:05 +0000 (13:41 -0700)]
Small speedup of ml_pruning_partition()
Terminate early and skip neural net model when linear score is already
high enough, which indicates that we should not skip split and
rectangular partitions.
No changes on compression; encoding speed improves slightly.
Marco Paniconi [Thu, 7 Jun 2018 22:07:57 +0000 (15:07 -0700)]
vp9-svc: Adjust some logic on gf temporal reference.
For the feature of using second temporal reference (when
inter-layer is off): move the buffer_idx assignement and
refresh flag settings further down to vp9_rc_get_svc_params(),
since is_key_frame is set there for every frame/layer.
Otherwise it was using the setting from the previous frame/layer.
This makes the refresh more consistent for both layers for
2 spatial layers case.
Luca Barbato [Wed, 6 Jun 2018 21:10:18 +0000 (21:10 +0000)]
Implement subtract_block for VSX
~2x speedup or better.
[ RUN ] C/VP9SubtractBlockTest.Speed/0
[ BENCH ] 4x4 365.1 ms ( ±2.2 ms )
[ BENCH ] 8x4 258.5 ms ( ±0.3 ms )
[ BENCH ] 4x8 202.7 ms ( ±0.2 ms )
[ BENCH ] 8x8 162.2 ms ( ±0.5 ms )
[ BENCH ] 16x8 138.8 ms ( ±0.3 ms )
[ BENCH ] 8x16 121.5 ms ( ±0.4 ms )
[ BENCH ] 16x16 110.2 ms ( ±0.5 ms )
[ BENCH ] 32x16 104.8 ms ( ±0.1 ms )
[ BENCH ] 16x32 32.7 ms ( ±0.1 ms )
[ BENCH ] 32x32 30.0 ms ( ±0.0 ms )
[ BENCH ] 64x32 28.7 ms ( ±0.0 ms )
[ BENCH ] 32x64 20.1 ms ( ±0.0 ms )
[ BENCH ] 64x64 19.3 ms ( ±0.0 ms )
[ RUN ] VSX/VP9SubtractBlockTest.Speed/0
[ BENCH ] 4x4 155.3 ms ( ±0.9 ms )
[ BENCH ] 8x4 99.3 ms ( ±0.4 ms )
[ BENCH ] 4x8 77.2 ms ( ±0.1 ms )
[ BENCH ] 8x8 45.7 ms ( ±0.0 ms )
[ BENCH ] 16x8 34.1 ms ( ±0.0 ms )
[ BENCH ] 8x16 29.5 ms ( ±0.0 ms )
[ BENCH ] 16x16 19.9 ms ( ±0.0 ms )
[ BENCH ] 32x16 15.1 ms ( ±0.0 ms )
[ BENCH ] 16x32 16.7 ms ( ±0.0 ms )
[ BENCH ] 32x32 14.1 ms ( ±0.0 ms )
[ BENCH ] 64x32 12.6 ms ( ±0.0 ms )
[ BENCH ] 32x64 12.0 ms ( ±0.0 ms )
[ BENCH ] 64x64 11.2 ms ( ±0.0 ms )
Tom Finegan [Thu, 7 Jun 2018 19:35:05 +0000 (12:35 -0700)]
Add avx512 compile test.
Some compiler releases allow the -mavx512f arg without actually
implementing support. Test for this situation, and disable avx512
when it is detected by configure.
Marco Paniconi [Thu, 7 Jun 2018 17:52:09 +0000 (10:52 -0700)]
vp9-svc: Allow second temporal reference for next highest layer.
When inter-layer prediction is disabled on INTER frames, allow
for next highest resolution to have second temporal reference.
Current code allowed for only top/highest spatial layer.
Marco Paniconi [Thu, 7 Jun 2018 05:42:38 +0000 (22:42 -0700)]
vp9-svc: Modify choose_partitioning for second temporal ref
For mode where second temporal reference is used in SVC: allow
for using/testing this reference (golden ref) in the variance
partition scheme (choose_partitioning).
Small positive gain (~0.25%) on metrics for 3 layer SVC,
negligible change in speed.
Marco Paniconi [Wed, 6 Jun 2018 19:14:59 +0000 (12:14 -0700)]
vp9-svc: Add a buffer_idx is_used parameter for SVC.
For the case where a second (long term) temoral reference is
used in the SVC: this additional parameter is to make sure the
buffer slot selected for this reference is available for usage,
i.e., it is never used for any of the 3 references set for the
fixed SVC patterns.
Jerome Jiang [Tue, 5 Jun 2018 22:21:29 +0000 (15:21 -0700)]
vp9: Move up reset of cyclic refresh under dynamic resize.
When resize happens and cyclic refresh is not applied on the
current (resized) frame, the sb_index is not reset and then
might be out of boundary on future frames when the
cyclic refresh is applied.
James Zern [Sat, 2 Jun 2018 06:22:48 +0000 (23:22 -0700)]
test,cosmetics: fix func/member naming, decl order
functions: upper camelcase
members: lowercase with trailing '_'
decl order: functions (overrides marked virtual), members
after: 656e8ac61 VSX version of vpx_post_proc_down_and_across_mb_row 766d875b9 VSX version of vpx_mbpost_proc_ip 35e98a70b VSX version of vpx_mbpost_proc_down b2898a9ad Bench Class For More Robust Speed Tests
Jerome Jiang [Fri, 1 Jun 2018 21:27:34 +0000 (14:27 -0700)]
vp9-svc: Allow usage of second (long term) temporal reference.
Allow for second temporal reference for top spatial layer in SVC,
when inter-layer prediction is disabled on INTER frames.
The second temporal reference is labelled as the golden reference
and the update/refresh of this reference buffer is only on base
temporal layer superframes. For now the period of refresh is
fixed at every 20 TL0 superframes.
Average gain is ~4% on RTC set, several clips up
by ~8-12%. Speed loss is about ~2% on mac.
Jerome Jiang [Wed, 23 May 2018 18:52:37 +0000 (11:52 -0700)]
VP9: Allow for bilinear subpel interp at speed 9 for high motion.
Fixed some settings in nonrd pick mode to allow for frame-level bilinear
to be set.
On Galaxy S8+ it has 4% speed up on high motion clips. Almost the same
for low motion.
Marco Paniconi [Thu, 31 May 2018 16:59:15 +0000 (09:59 -0700)]
vp9-svc: Fix to some frame metrics for real-time mode.
Add condition of LAST frame to the consec_zeromv and
avg_frame_low_motion metrics. This is needed for SVC as
the golden reference is a spatial reference and should
not be included in the metric computation.
Hui Su [Mon, 14 May 2018 21:35:25 +0000 (14:35 -0700)]
Improve the ML based partition pruning
Add a neural net model that uses the same features as the existing
linear model. Make the pruning decision based on both the linear
and the neural net model. It provides more accurate predictions,
and may improve compression and/or encoding speed.
This only affects speed 0.
Coding gain:
0.37% on midres
0.34% on hdres
0.50% on jvet8b720p
Encoding speed impact(average over locally tested 20 clips from midres
and hdres):
QP=20: down by 2.5%.
QP=30: down by 3.9%.
QP=40: donw by 4.5%.
QP=50: up by 5.2%.
Marco Paniconi [Wed, 30 May 2018 23:01:53 +0000 (16:01 -0700)]
vp9-svc: Fix to compute some metrics on top spatiail layer.
The avg_frame_low_motion and consec_zeromv are frame-level
metrics that are updated on every frame. For SVC these should be
updated on top spatial layer (full resolution).
James Zern [Tue, 29 May 2018 23:38:46 +0000 (16:38 -0700)]
Revert 3 slide show coding changes
This is a combination of the following 3 reverts. The changes cause
issues on certain hardware devices. We'll pull them for now to allow for
further investigation.
Revert "Experiment regarding playback problems on Bravia TVs."
Luc Trudeau [Tue, 22 May 2018 18:16:15 +0000 (14:16 -0400)]
Bench Class For More Robust Speed Tests
To make speed testing more robust, the AbstractBench runs the
desired code multiple times and report the median run time with
mean absolute deviation around the median.
To use the AbstractBench, simply add it as a parent to your test
class, and implement the run() method (with the code you want to
benchmark).
Sample output for VP9QuantizeTest
[ BENCH ] Bypass calculations 4x4 165.8 ms ( ±1.0 ms )
[ BENCH ] Full calculations 4x4 165.8 ms ( ±0.9 ms )
[ BENCH ] Bypass calculations 8x8 129.7 ms ( ±0.9 ms )
[ BENCH ] Full calculations 8x8 130.3 ms ( ±1.4 ms )
[ BENCH ] Bypass calculations 16x16 110.3 ms ( ±1.4 ms )
[ BENCH ] Full calculations 16x16 110.1 ms ( ±0.9 ms )
Marco Paniconi [Sat, 26 May 2018 04:51:33 +0000 (21:51 -0700)]
vp9-realtime: Move frame dropper to after scene detection.
Move frame dropper to after scene detection and noise estimation.
Scene detection and noise estimation operate on source data and
update metrics along sequence, so they should be moved before
the frame dropper.
Also we don't want to drop on scene change, as the scene detection
and (possible) re-encode step will be missed.
Marco Paniconi [Tue, 29 May 2018 02:32:01 +0000 (19:32 -0700)]
vp9-svc: Fix to allowed value of max_consec_drop.
For the max_consec_drop parameter in svc frame drop:
since passing value 0 in the control would completely
disable the dropper, only allow for values >= 1 to be set.
Jerome Jiang [Wed, 23 May 2018 22:43:00 +0000 (15:43 -0700)]
VP8: Fix use-after-free in postproc.
The pointer in vp8 postproc refers to show_frame_mi which is only
updated on show frame. However, when there is a no-show frame which also
changes the size (thus new frame buffers allocated), show_frame_mi is
not updated with new frame buffer memory.
Change the pointer in postproc to mi which is always updated.
Marco Paniconi [Wed, 23 May 2018 21:51:00 +0000 (14:51 -0700)]
vp9: Rate control adjustments for screen content.
For screen content mode: changes to reduce occurence of
significant QP decrease (from one frame to next),
which can cause large frames (overshoot/delay).
-cap the buffer increase to optimal level for frame drop
mode where full superframe can drop
-reduce the max_adjustment_down due to buffer overflow
-reduce qp threshold to trigger re-encode on large frame
Marco Paniconi [Fri, 18 May 2018 22:09:55 +0000 (15:09 -0700)]
vp9-svc: Fix on disabling inter_layer prediction.
In vp9_svc_constrain_inter_layer_pred() we disable the
inter_layer prediction if anything but only the previous
spatial layer (from same supeframe) is used for inter_layer
prediction. This check and disabling was only allowed when
the control VP9E_SET_SVC_INTER_LAYER_PRED is set to
INTER_LAYER_PRED_ON_CONSTRAINED.
But the control VP9E_SET_SVC_INTER_LAYER_PRED is needed for setting:
INTER_LAYER_PRED_ON/INTER_LAYER_PRED_OFF/INTER_LAYER_PRED_OFF_NONKEY.
So there is a conflict with setting INTER_LAYER_PRED_ON_CONSTRAINED.
Fix for now is to always allow for this disabling check
(disable inter_layer reference if its not previous spatial layer) as
long as inter_layer prediction is used (i.e., not set to _OFF).
A separate fix if needed may be to invoke another control for setting
INTER_LAYER_PRED_ON_CONSTRAINED.
This was causing an issue with enabling spatial layers on the fly
(say spatial layer 2), where since INTER_LAYER_PRED_ON_CONSTRAINED was
not set (default), the inter_layer prediction was then using a reference
from 2 spatial layers below (spatial layer 0).
Marco Paniconi [Fri, 18 May 2018 21:50:46 +0000 (14:50 -0700)]
vp9-svc: Fix issue with reseting lst_fb_idx.
When encoding a given spatial layer and the same spatial layer
on previous superframe was dropped (or disabled due to 0 bitrate),
the lst_fb_idx for current layer is set to the buffer index that
was last updated on TL0 frame (for the same spatial layer).
This condition was to maintain proper temporal prediction pattern
under frame drops, and it should only apply to INTER frames.
But the condition was causing an assert to be triggered on spatial
layers whose base are key frames. Fix is to condition this reset of
lst_fb_idx on the "is_key_frame" flag. Also initialize the
fb_idx_upd_tl0 to -1 and only use it for a given spatial layer
if its been set.
These issues can happen when superframe drop happens just before
a key frame, or when stream starts with lower layers and dynamically
enabled higher spatial layers.
Added datarate unittest the inserts key frame after superframe drop,
and verified that this fix is needed for test to pass.
Also modified the existing DisableEnable spatial layer test to trigger
the issue of using fb_idx_upd_tl0 when it hasn't been set for a
spatial layer.
Paul Wilkins [Fri, 18 May 2018 16:36:30 +0000 (17:36 +0100)]
Experiment regarding playback problems on Bravia TVs.
This patch experimentally reduces the maximum GF interval for
static content such as slide shows.
It does not fully revert the previous slide show patches as this
still allows the codec to code static sections only using GFs
groups rather than ARF groups or a mix of ARF and GF groups.
However, the maximum group length is reduced.