Supradeep T R [Tue, 12 Jun 2018 08:27:39 +0000 (13:57 +0530)]
Loopfilter MultiThread Optimization
Adding LPF within the tileworker hook. This means that LPF will be done
immediately after decode, without waiting for all threads to sync.
Performance Improvement -
Platform  Resolution  2 Threads  4 Threads
x86       720p        7.24%      22.04%
          1080p       5.29%      17.02%
ARM       720p        4.61%      8.75%
          1080p       5.55%      12.03%
x86 improvement measured on Intel Core i7-6700 CPU @ 2.10GHz set
in performance with turbo mode off.
ARM improvement measured on Nexus 6 Snapdragon 805 Quad-core @ 2.65 GHz.
Jingning Han [Sat, 11 Aug 2018 00:01:08 +0000 (17:01 -0700)]
Use YUV components to build the temporal filter
Use both luma and chroma components simultaneously to estimate the
non-local mean kernel and build the temporal filter. It improves
the compression performance primarily for chroma components. Tested
in speed 0 and vbr mode, the coding gains are:
Marco Paniconi [Mon, 30 Jul 2018 16:25:14 +0000 (09:25 -0700)]
vp9: Add flatness metric to cyclic refresh setup.
For screen-content with aq-mode = 3: identify spatially
flat superblocks in the setup stage and don't mark them as
candidates for refresh. Spatially flat blocks are already
removed from refresh at a later stage in the encoding (in pick_mode),
but doing this at the setup stage of cyclic refresh (before encoding)
allows refresh to hit the text areas more quickly. The only drawback is
an extra source variance calculation for a set of superblocks on
each frame.
Adjust the refresh rate: lower it to reduce overshoot since
more texture areas are hit faster with this change.
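A minimal sketch of a setup-stage flatness check of this kind, not the encoder's actual code. The function names (sb_source_variance, is_refresh_candidate) and the threshold parameter are hypothetical; only the 64x64 superblock size comes from VP9 itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SB_SIZE 64 /* VP9 superblock dimension in pixels */

/* Per-pixel variance of one SB_SIZE x SB_SIZE luma block,
 * computed in integers as E[x^2] - E[x]^2 (rounded down). */
static uint32_t sb_source_variance(const uint8_t *src, int stride) {
  uint64_t sum = 0, sse = 0;
  const uint64_t n = (uint64_t)SB_SIZE * SB_SIZE;
  assert(src != NULL && stride >= SB_SIZE);
  for (int r = 0; r < SB_SIZE; ++r) {
    for (int c = 0; c < SB_SIZE; ++c) {
      const uint32_t p = src[r * stride + c];
      sum += p;
      sse += p * p;
    }
  }
  return (uint32_t)((sse - (sum * sum) / n) / n);
}

/* Hypothetical setup-stage test: a superblock whose source variance is
 * at or below flat_thresh is treated as flat and is not marked as a
 * refresh candidate. */
static int is_refresh_candidate(const uint8_t *src, int stride,
                                uint32_t flat_thresh) {
  return sb_source_variance(src, stride) > flat_thresh;
}
```

The cost noted in the commit message corresponds to the one pass over the superblock pixels in sb_source_variance.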
Jerome Jiang [Mon, 13 Aug 2018 18:01:31 +0000 (11:01 -0700)]
vp9: fix memory alloc for adaptive_rd_thresh_row_mt.
When the feature is enabled and the memory is not available, allocate
it. There was a case where the speed feature changed in the middle of
the stream but the number of tiles stayed the same, so the memory was
not re-allocated. Another case is where the speed for the base layer is
different from that of the higher quality layers (same resolution).
Removed the speed constraints forcing the base layer to use the same
speed setting.
Thus the memory for adaptive_rd_thresh_row_mt stayed NULL but the
feature was enabled.
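A minimal sketch of the "allocate whenever enabled and missing" rule, assuming a simplified stand-in for the per-tile data. The struct TileRdThresh, its fields and ensure_rd_thresh_alloc are hypothetical names, not the encoder's real ones.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-tile threshold storage. */
typedef struct {
  int *row_thresh; /* NULL until allocated */
  int num_entries;
} TileRdThresh;

/* Allocate the buffer whenever the feature is on and the buffer is
 * missing, independent of whether the tile count changed.
 * Returns 0 on success, -1 if the allocation failed. */
static int ensure_rd_thresh_alloc(TileRdThresh *t, int feature_enabled,
                                  int num_entries) {
  assert(t != NULL && num_entries > 0);
  if (!feature_enabled || t->row_thresh != NULL) return 0;
  t->row_thresh = (int *)calloc((size_t)num_entries, sizeof(int));
  if (t->row_thresh == NULL) return -1;
  t->num_entries = num_entries;
  return 0;
}
```

Keying the check on the pointer rather than on a change in tile count covers both cases described above, since neither of them changes the number of tiles.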
Marco Paniconi [Mon, 13 Aug 2018 04:35:15 +0000 (21:35 -0700)]
vp9-svc: Fixes for cyclic refresh for SVC.
Add metrics that are being updated per-frame to
the layer struct, so each layer using the cyclic
refresh has the correct update. This is more consistent
for the rate control and refresh rate.
Some improvement in screen content clips.
Neutral for SVC on rtc set.
Marco Paniconi [Sat, 11 Aug 2018 19:59:40 +0000 (12:59 -0700)]
vp9-svc: Fix to updated SET_SVC_REF_FRAME_CONFIG control
Add a flag to separate two cases of bypass (flexible) SVC mode:
using the SET_SVC_REF_FRAME_CONFIG control vs passing in the
frame_flags in vpx_encode (only used for temporal layers).
This fixes failures in Datarate Temporal layer test,
introduced in commit: a66da31
Marco Paniconi [Wed, 8 Aug 2018 21:01:26 +0000 (14:01 -0700)]
vp9: Allow for overshoot detection for non-screen CBR mode.
For CBR real-time mode: refactor usage of speed feature to
handle overshoot on slide/scene change. Add 2 modes to indicate
how slide/scene change is processed for re-setting Q/rate control.
Keep the speed setting to 1 for speed >= 5, otherwise set to 0.
Video content and screen content are now handled in a similar way,
though with different thresholds.
Some fixes to thresholds and reset: correct the reset of the buffer
level to the optimal level for each temporal layer, if the scene change
frame will be encoded at max_q.
Also increase the min_thresh for video mode (non-screen content):
this is to avoid scene change detection in cases like large
lighting changes or camera focus changes. The increase in min_thresh
also makes it more robust to a sudden increase in noise level.
Marco Paniconi [Thu, 9 Aug 2018 16:34:05 +0000 (09:34 -0700)]
vp9-svc: Fix for scene detection for SVC
For spatial layers: use the correct mi_cols/rows in the
scene detection. The scene detection for spatial layers
is only called once per superframe, but we were using the wrong
mi_cols/rows (those for base spatial were being used).
Also increase the frame_since_key threshold to account for spatial
layers.
James Zern [Wed, 8 Aug 2018 03:07:09 +0000 (20:07 -0700)]
loop_filter_rows_mt: use sb_rows to limit workers
Previously, if the number of tiles decreased within a clip and there were
fewer super block rows than workers, the mi_row calculation would cause
rows to be skipped. The num_workers stored is the max allocated amount;
use sb_rows to limit the active ones if the row count is smaller, as
additional threads will provide no benefit.
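A minimal sketch of the clamp, assuming rows are handed out to workers in an interleaved fashion (worker id takes rows id, id + n, id + 2n, ...). The helper names active_lf_workers and rows_for_worker are hypothetical, and the interleaving is an illustrative assumption rather than a description of the actual mi_row calculation.

```c
#include <assert.h>

/* Number of workers that should actually run: never more than the
 * number of superblock rows. */
static int active_lf_workers(int num_workers, int sb_rows) {
  assert(num_workers > 0 && sb_rows > 0);
  return sb_rows < num_workers ? sb_rows : num_workers;
}

/* Count the superblock rows that worker `id` filters when rows are
 * interleaved across `active` workers (assumed scheme). */
static int rows_for_worker(int id, int active, int sb_rows) {
  int count = 0;
  assert(id >= 0 && id < active);
  for (int row = id; row < sb_rows; row += active) ++count;
  return count;
}
```

With the stride taken from the clamped worker count, every row is assigned to exactly one worker even when sb_rows is smaller than the allocated maximum.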
Marco Paniconi [Fri, 3 Aug 2018 17:45:41 +0000 (10:45 -0700)]
vp9: Add screen-content mode to overshoot detection.
For real-time 1 pass mode: overshoot detection and max_Q
reset should only be enabled for screen-content mode.
This fixes some failures in the 1 pass VBR tests, from
the commit: 2fae9991
Marco Paniconi [Fri, 3 Aug 2018 16:20:55 +0000 (09:20 -0700)]
vp9: Adjust qp_thresh on slide change overshoot detection
For real-time screen-content mode: increase the
qp_thresh for max_Q setting on slide changes.
This will make bitrate spikes less likely on slide changes.
Marco Paniconi [Thu, 2 Aug 2018 16:22:58 +0000 (09:22 -0700)]
vp9: Disable re_encode_overshoot feature for speed >= 6.
For real-time screen content mode: for speed >= 6 disable
the re_encode_overshoot feature. This means that for speed >= 6
the Q and rate control are reset on slide changes based on
the scene/slide detection and the current Q (and not on a
first pass encoded frame at current Q).
This reduces encode time on slide changes, but may be less
accurate in deciding when to reset/max-out the Q.
Hui Su [Wed, 1 Aug 2018 22:43:05 +0000 (15:43 -0700)]
Handle partition cost better in RD search
Take partition cost into consideration during rectangular partition
mode search.
Compression change is neutral. Encoding speed can be a little faster
at low quality settings. With QP=55 at speed 0, average speed up over
15 midres sequences is about 2.7%.
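One way the partition cost can matter during the search, as a sketch only: if the rate needed to signal the partition type is counted up front, a rectangular partition can be abandoned earlier once its running cost reaches the best cost so far. The cost model J = D + lambda * R and the names rd_cost and can_skip_rect_partition are simplified assumptions, not the encoder's fixed-point cost macro or its actual search logic.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified Lagrangian cost J = D + lambda * R (illustrative only). */
static int64_t rd_cost(int64_t dist, int64_t rate, int64_t lambda) {
  assert(dist >= 0 && rate >= 0 && lambda >= 0);
  return dist + lambda * rate;
}

/* Hypothetical early-exit test after the first sub-block of a
 * rectangular partition: the partition signaling rate is included in
 * the running cost before comparing against the best cost so far. */
static int can_skip_rect_partition(int64_t first_dist, int64_t first_rate,
                                   int64_t partition_rate, int64_t lambda,
                                   int64_t best_cost) {
  return rd_cost(first_dist, first_rate + partition_rate, lambda) >=
         best_cost;
}
```

In this model the final decision is the same whether the partition rate is added early or late; only the point at which the search gives up moves, which is consistent with a neutral compression change and a small speedup.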
Jingning Han [Tue, 31 Jul 2018 16:43:17 +0000 (09:43 -0700)]
Use mesh full pixel motion search to build the source ARF
Append mesh search to the diamond shape search to refine
the full pixel motion estimation for source ARF generation.
It improves the average compression performance.
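A minimal sketch of a mesh refinement step appended to an earlier search, assuming the diamond search result is passed in as `start`. The 8x8 block size, the SAD metric and the names Mv, sad_blk and mesh_refine are illustrative assumptions, not the encoder's actual functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BLK 8 /* illustrative block size */

typedef struct { int row, col; } Mv;

/* Sum of absolute differences between two BLK x BLK blocks. */
static unsigned sad_blk(const uint8_t *a, int a_stride,
                        const uint8_t *b, int b_stride) {
  unsigned sad = 0;
  for (int r = 0; r < BLK; ++r)
    for (int c = 0; c < BLK; ++c)
      sad += (unsigned)abs(a[r * a_stride + c] - b[r * b_stride + c]);
  return sad;
}

/* Mesh refinement: test every grid point within +/-range of `start`
 * (spaced `step` pixels apart) and keep the lowest-SAD full-pixel
 * position. Candidates that fall outside the reference are skipped.
 * `src` points at the top-left pixel of the source block, which sits
 * at (blk_row, blk_col) in frame coordinates. */
static Mv mesh_refine(const uint8_t *src, int src_stride,
                      const uint8_t *ref, int ref_stride,
                      int ref_w, int ref_h, int blk_row, int blk_col,
                      Mv start, int range, int step) {
  Mv best = start;
  unsigned best_sad = UINT_MAX;
  assert(range >= 0 && step > 0);
  for (int dr = -range; dr <= range; dr += step) {
    for (int dc = -range; dc <= range; dc += step) {
      const int r = blk_row + start.row + dr;
      const int c = blk_col + start.col + dc;
      unsigned sad;
      if (r < 0 || c < 0 || r + BLK > ref_h || c + BLK > ref_w) continue;
      sad = sad_blk(src, src_stride, ref + r * ref_stride + c, ref_stride);
      if (sad < best_sad) {
        best_sad = sad;
        best.row = start.row + dr;
        best.col = start.col + dc;
      }
    }
  }
  return best;
}
```

Because the mesh visits every grid point in the window, it can recover a better match that a coarse diamond pattern stepped past, at the cost of (2 * range / step + 1)^2 SAD evaluations.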