Paul Wilkins [Tue, 2 Oct 2018 15:11:14 +0000 (16:11 +0100)]
Force even arf group length where possible.
This patch tweaks the calculation of the active maximum GF interval
and also the break out clause for the GF interval loop. The changes
force the maximum and where possible the break out value to be odd
which in turn will result in an even length ARF group if ARF coding is
selected (vs GF only coding).
The primary aim was to improve coding with multi layer arf groups.
For the single layer case there are small net gains in 3 out of 4 sets
(low,md, hd) and a small net drop for the NF2K set.
For multi-layer the gains (opsnr, ssim, psnr-hvs : -ve = better) were:-
Hui Su [Sat, 29 Sep 2018 21:48:56 +0000 (14:48 -0700)]
Introduce the ml_var_partition_pruning feature
Add the ml_var_partition_pruning encoder speed feature that
uses a neural net model to prune partition-none and partition-split
search. The model uses prediction residue variance and quantization
step size as input features.
Encoding speed gain for speed 0(tested over 20 hdres clips):
QP=30 QP=40
average 17.7% 18.3%
max 24.46% 26.6%
Paul Wilkins [Thu, 27 Sep 2018 09:55:05 +0000 (10:55 +0100)]
Fix minor bug in calculation of max arf group length.
Their is no valid last boosted Q availably when estimating the maximum
group length for the first ARF group in a clip, so use a value based on
the current max q.
Paul Wilkins [Fri, 28 Sep 2018 15:54:03 +0000 (16:54 +0100)]
Adjustment of GOP intra factor for multi-layer.
This provides and alternative (still to be tuned for edge cases)
approach to adjusting the gop intra factor when multi-layer coding
is in effect that does not alter single layer coding.
Hui Su [Tue, 25 Sep 2018 19:19:53 +0000 (12:19 -0700)]
Add ml_var_partition experiment
Make partition decisions using machine learning models. The goal is to
achieve better coding quality than the variance-based parititioning
without much encoding speed loss.
To enable this experiment, use --enable-ml-var-partition for config.
When eanbled, the variance-based partitioning is replaced by this ML
based partitioing for speed 6 and above in real time mode(except low
resolution or high bit-depth).
This is badly broken and may help somewhat for multi-layer but is hurting
massively in single layer encodes.
I ran this through this morning and while it often helps in SSIM it is badly down
for global PSNR and PSNR-HVS with some clips down by 35-40%. This is in line
with previous experiments where I have found that a bigger boost helps SSIM
but hurts PSNR and PSNR HVS.
I was also working on changes to the I factor that gave some improvements
in single layer though these were based upon the active Q mostly. I also have
looked at a bug for the first group where int_lbq is not properly defined and
will submit an interim patch for this while I look for a better solution.
In the meantime I think we should revert this.
The (Global PSNR, SSIM, PSNR-HVS) for the patch as is in my runs for
single layer vs a couple of days ago seem to be (-ve is better).
Low res 0.346, -1.475, 0.239
mid res 1.581, -1.300, 1.731 (worst result down by 30-40% in psnr)
hdres 0.665, -0.712, 1.043 (worst result down by 17-19% in psnr)
NF2k 0.927, 0.111, 1.3220 (Worst result down by 5-7% in psnr)
Jingning Han [Fri, 28 Sep 2018 04:01:08 +0000 (21:01 -0700)]
Add MID_OVERLAY_UPDATE frame type
Add a MID_OVERLAY_UPDATE abstract to support multi-layer
ARF-Overlay frame based approach. When setting the frame update
type to be USE_BUF_FRAME, the encoder will use show_existing_frame
to process the intermediate ARF frames. When setting the frame
update type to be MID_OVERLAY_UPDATE, the intermediate ARF frames
will go through an overlay frame for display.
Hui Su [Thu, 27 Sep 2018 17:12:55 +0000 (10:12 -0700)]
Fix a loophole in nonrd_pick_partition()
In some rare cases, all possible paritions may be skipped during RD
search. The patch makes the encoder do rectangular partition search if
both partition-none and partition-split are not allowed.
Tested on the rtc and ytlivehr testsets with speed 5 and 7, no coding
stats changes were observed.
Jingning Han [Wed, 26 Sep 2018 23:34:16 +0000 (16:34 -0700)]
Adapt GOP size threshold to the allowed layer depth
Increase the total prediction error budget linearly with the
allowed ARF layer depth. This in general improves the compression
performance, but does hit corner cases on a few clips at very
low bit-rate range (corresponding to 26 - 28 dB range). To mitigate
such problem, we temporarily work around this problem by limiting
the first GOP size to be ~8 so as to not drain up the bit resource.
The overall compression performance improvements over the current
multi-layer ARF system in speed 0 are:
Jingning Han [Mon, 24 Sep 2018 20:53:53 +0000 (13:53 -0700)]
Use layer dependent gfu_boost factor
When multi-layer ARF is enabled, use the corresponding gfu_boost
factor assigned to each ARF to compute the best_quality_index
adjustment. This on average improves the coding performance by
0.2% for lowres and hdres, 0.4% for ntflx2k. It seems this change
will only affect a small group of clips, e.g., pamphlet, bowing,
mobcal_720p, etc., which tend to gain 4-5%, whereas the rest
clips remain largely identical coding statistics.
With --enable-better-hw-compatibility an access to array element -1
can be observed for VP9/ActiveMapTest.Test/0
../vp9/encoder/vp9_rdopt.c:3938:53: runtime error:
index -1 out of bounds for type 'RefBuffer [3]'
There doesn't seem anything that would prevent ref_frame from being 0.
If there is no reference frame it can probably be assumed that it
isn't scaled.
When built with -fsanitizer=address,undefined a number of tests,
such as ByteAlignmentTest.SwitchByteAlignment or
ByteAlignmentTest.SwitchByteAlignment produce runtime errors about
unaligned 4-byte loads/stores. While normally not really a problem,
this does technically violate the language and it is eays to fix in
a standard conforming way using memcpy which does not produce
inferior code.
Jingning Han [Thu, 20 Sep 2018 17:03:27 +0000 (10:03 -0700)]
Generalize encoder comp_var_ref setting
Generalize the encoder comp_fixed_ref and comp_var_ref assignments.
Make it fully support 2 fwd + 1 bwd and 1 fwd + 2 bwd settings
that VP9 decoder allows.
When running tests built with
-fsanitize=undefined and--disable-optimizations
the sanitizer will emit errors of the following general form:
runtime error: member call on address 0xxxxxxxxx which does not
point to an object of type 'WithParamInterface'
0xxxxxxxxx: note: object has invalid vptr
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
^~~~~~~~~~~~~~~~~~~~~~~
invalid vptr
This can be traced to calls to WithParamInterface<T>::GetParam before
the object argument has been initialized. Although GetParam only
accesses static data it is a non-static member function. This causes
that call to have undefined behaviour.
The patch makes GetParam a static member function.
The alternative - if the pull request is denied - would be to
modify all parameterized tests to have them derive from
::libvpx_test::CodecTestWith*Params as the first base class.