Yue Chen [Tue, 16 Feb 2016 20:22:11 +0000 (12:22 -0800)]
Fixing a bug in obmc prediction in the rd loop
This bug made the rd loop use one-side obmc (compound of the current
predictor and the predictors of the left mi's, while the above ones
are ignored by mistake) to determine whether to use obmc. This fix
improved the compression performance by ~0.6% on different test sets.
Coding gain (%) of obmc experiment on derflr/derfhd/hevcmr/hevchd:
1.568/TBD/1.628/TBD
Geza Lore [Fri, 12 Feb 2016 16:04:35 +0000 (16:04 +0000)]
Add optimized vpx_sum_squares_2d_i16 for vp10.
Using this we can eliminate large numbers of calls to predict intra,
and is also faster than most of the variance functions it replaces.
This is an equivalence transform so coding performance is unaffected.
Encoder speedup is approx 7% when var_tx, super_tx and ext_tx are all
enabled.
Yue Chen [Wed, 27 Jan 2016 22:18:53 +0000 (14:18 -0800)]
Overlapped block motion compensation experiment
In this experiment, an obmc inter prediction mode is enabled for
>= 8X8 inter blocks. When the obmc flag is on, the regular block-
based motion compensation will be refined by using predictors of
the above and left blocks.
Fixed some compatibility issues with vp9_highbitdepth, supertx,
ref_mv, and ext_interp.
Coding gain (%) on derflr/hevcmr/hevchd
OBMC:
1.047/1.022/0.708
OBMC + SUPERTX:
1.652/1.616/1.137
SUPERTX:
0.862/0.779/0.630
Adds a wiener filter based restoration scheme in loop which can
be optionally selected instead of the bilateral filter.
The LMMSE filter generated per frame is a separable symmetric 7
tap filter. Three parameters for each of horizontal and vertical
filters are transmitted in the bitstream. The fourth parameter
is obtained assuming the sum is normalized to 1.
Also integerizes the bilateral filters, along with other
refactoring necessary in order to support the new switchable
restoration type framework.
derflr: -0.75% BDRATE
[A lot of videos still prefer bilateral, however since many frames
now use the simpler separable filter, the decoding speed is
much better].
Further experiments to follow, related to replacing the bilateral.
Yaowu Xu [Fri, 5 Feb 2016 21:03:47 +0000 (13:03 -0800)]
Enable computing PSNRHVS for hbd build
This commit adds computation of PSNRHVS for highbitdepth build, it
also adds tests to make sure the calculation of psnrhvs metric for
10 and 12 bit correct.
Jingning Han [Thu, 11 Feb 2016 17:41:06 +0000 (09:41 -0800)]
Align rate-distortion cost metric for chroma compoments
This commit aligns the rate-distortion metrics for both luma and
chroma components in super transform rate-distortion optimization.
It improves the coding gains due to var-tx and supertx experiments
by 0.2% for high resolution test sets.
Alex Converse [Wed, 10 Feb 2016 22:12:26 +0000 (14:12 -0800)]
Port switch to 9-bit rate cost to vp10.
Brings the following commits to vp10: 269428e Tie the bit cost scale to a define. d13385c Switch to 9-bit rate cost constants built on a 256 probability denominator. ad43a73 Fix a signed overflow in vp9 motion cost. 1c9b091 Fix some interger overflow errors fac947d Restore previous motion search bit-error scale.
Marco [Thu, 11 Feb 2016 00:59:09 +0000 (16:59 -0800)]
vp9-resize: Force reference masking off for external dynamic-resizing.
An issue exists with reference_masking in non-rd pickmode for spatial
scaling. It was kept off for internal dynamic resizing and svc, this
change is to keep it off also for external dynamic resizing.
Update to external resize test, and update TODO to re-enable this
at frame level when references have same scale as source.
Yaowu Xu [Fri, 5 Feb 2016 19:30:12 +0000 (11:30 -0800)]
Enable computing of FastSSIM for HBD build
This commit adds the computation of fastSSIM for highbitdepth build,
it also modifies the hbdmetric test to be more generic and applicable
for fastSSIM.
The 255 used for calculating ssim constants c1 and c2 is not exactly
scaled by 4x and 16x to 1023 and 4095, therefore requries the metric
test to have a thresold more tolerant than 0, currently at 0.03dB.
Yue Chen [Wed, 10 Feb 2016 23:53:08 +0000 (15:53 -0800)]
Adding the config tag for the OBMC experiment
obmc: We add an obmc prediction mode at superblock level.
When it is enabled, predictors of the above and left blocks
are used to refine the regular block-based motion compensation.
Marco [Wed, 10 Feb 2016 23:18:26 +0000 (15:18 -0800)]
vp9 resize_test: Enable resize_allowed in real-time ExternalResize test.
For dynamic resizing (whether the new codec size is determined internally
or externally set by user), we should for now keep rc.resize_allowed enabled.
This prevent the use of referene_masking for real-time mode
(in set_rt_speed_feature()).
Scott LaVarnway [Wed, 10 Feb 2016 19:43:23 +0000 (11:43 -0800)]
VP9: Pass NULL scale_factors ptr when not scaling
to vp9_setup_pre_planes(), preventing the function
unscaled_value() from being called. unscaled_value()
returns the same value that was passed in. See
scaled_buffer_offset() in vp9_reconinter.h.
Jingning Han [Wed, 10 Feb 2016 01:51:49 +0000 (17:51 -0800)]
Resolve conflict between var-tx and super-tx
This commit aligns the rate-distortion metric for the recursive
transform block partitioning and the super transform. It resolves
the conflicts between these two experiments. The coding performance
gains of the combined experiments (var-tx + super-tx) has been
improved:
Marco [Mon, 8 Feb 2016 18:41:13 +0000 (10:41 -0800)]
vp9-dynamic resize: Fix bug on releasing scaled reference.
When the codec frame size is the same as the reference frame size,
release the scaled reference before assigning it a new buf_idx.
Only affects 1 pass non-svc mode, where the scaled references are
release only under certain conditions (to prevent un-needed scaling
of the references every frame).
Modified a unittest that can trigger this bug without this change.
Geza Lore [Tue, 9 Feb 2016 10:17:22 +0000 (10:17 +0000)]
Fix partition type costing.
This patch makes rd optimization use the same context for computing
the rate cost of coding the partitioning as the packer actually uses
when emitting it in write_modes_sb.
Yaowu Xu [Tue, 9 Feb 2016 02:31:30 +0000 (18:31 -0800)]
Fix a bug in HBD buffer size computation
The value of use_highbitdepth flag is used for compute the size for
high bit depth buffer allocation, which should take value 0 or 1
depending on if the buffer is used for high bit depth or not.
Previously, the values is set to 8 or 0, this commit fixes the issue
and properly set the value for this flag to 1 or 0.
This cuts the size of highbitdepth buffer memory allocation to 2/9 of
the size prior to the fix.
Jingning Han [Fri, 22 Jan 2016 02:07:31 +0000 (18:07 -0800)]
Entropy coding for dynamic ref mv modes
This commit enables entropy coding for dynamic reference motion
vector modes. The probability model is contexted on the ranking
categories of the reference motion vector candidates.
Fixes an issue where the tx_type was not set correctly for
sub8x8 inter and intra blocks. In the current syntax, for
sub8x8 blocks, there is still a single tx_type that is
transmitted. Ideally, this should be searched for the best
rd performance, albeit at the expense of encode speed.
For now, we just set it to DCT_DCT. Previously it was left
incorrectly as what was used for the previous non sub8x8
block.
Fixes an issue where the tx_type was not set correctly for
sub8x8 inter and intra blocks. In the current syntax, for
sub8x8 blocks, there is still a single tx_type that is
transmitted. Ideally, this should be searched for the best
rd performance, albeit at the expense of encode speed.
For now, we just set it to DCT_DCT. Previously it was left
incorrectly as what was used for the previous non sub8x8
block.
Yaowu Xu [Mon, 8 Feb 2016 17:41:43 +0000 (09:41 -0800)]
Fix msvc compiler warnings
There were a number of compiler warnings:
1. int16_t to uint8_t in recon_intra.c;
2. double to float conversions in psnrhvs.c
3. intptr_t to int in quantize.c
4. size_t to int32_t in decoder.c
James Zern [Fri, 5 Feb 2016 20:31:48 +0000 (12:31 -0800)]
intrapred/d135: flatten border results before storing
the results along the top and left border are then stored with a moving
window into the vector.
~40-67% faster on ARM, ~40-77+% on x86 depending on the block size.