Jingning Han [Sat, 5 Mar 2016 05:23:55 +0000 (21:23 -0800)]
Hybrid 1-D/2-D transform coding
This commit enables a hybrid 1-D/2-D transform coding scheme and
the accompany entropy coding system. It currently uses hybrid
1-D/2-D DCT transform coding. It provides coding performance gains:
Yi Luo [Mon, 29 Feb 2016 17:53:42 +0000 (09:53 -0800)]
Added vp10_fht4x4_sse2() unit test
Inherited class TransformTestBase to derived class VP10Trans4x4HT.
Employed RunCoeffCheck() to test vp10_fht4x4_sse2() against
C reference vp10_fht4x4_c().
fdst4_sse2() related seven hybrid transform cases are covered
in this test.
Wrote a header file for test base class. Some modification to
make sure the base class can be used for 8x8, 16x16, 32x32 cases.
All related tests passed.
Sarah Parker [Tue, 1 Mar 2016 18:12:13 +0000 (10:12 -0800)]
Adding speed feature interface for ext tx search
This sets up the interface for 3 speed features that progressively
eliminate a greater number of transforms in ext tx using
pre-trained support vector machines.
Each speed feature still needs to be implemented.
Yi Luo [Wed, 2 Mar 2016 21:45:52 +0000 (13:45 -0800)]
Fixed a computation bug in fdct16_sse2()
fdct16_sse2() was not bit-exact with C reference, fdct16().
The inconsistency was found by writing a unit test for
vp10_fht16x16_sse2(). Since the unit test needs a pending
change on the inherited base class. I will commit this unit
test after making a header file for this base class.
Passed the uncommitted unit test: vp10_fht16x16_test.cc.
Yunqing Wang [Tue, 16 Feb 2016 22:33:18 +0000 (14:33 -0800)]
Do sub-pixel motion search in up-sampled reference frames
Up-sampled the reference frames to 8 times in each dimension using
the 8-tap interpolation filter. In sub-pixel motion search, use the
up-sampled reference frames to find the best matching blocks. This
largely improved the motion search precision, and thus, improved
the compression quality. There was no change in decoder side.
Borg test and speed test results:
1. On derflr set,
Overall PSNR gain: 1.306%, and SSIM gain: 1.512%.
Average speed loss on derf set was 6.0%.
2. On stdhd set,
Overall PSNR gain: 0.754%, and SSIM gain: 0.814%.
On hevchd set,
Overall PSNR gain: 0.465%, and SSIM gain: 0.527%.
Speed loss on HD clips was 3.5%.
Fixes some issues introduced by a merge of two patches.
Also decouples the temporal interpolation filter from the switchable
filters for now for ease of experimentation with both separately.
Jingning Han [Fri, 26 Feb 2016 17:23:43 +0000 (09:23 -0800)]
Unify frame border extension operation
This commit unifies the encoder and decoder border extension and
motion compensated prediction process. Remove the decoder specific
flow to simplify the development flow.
Geza Lore [Mon, 22 Feb 2016 10:55:32 +0000 (10:55 +0000)]
Port interintra experiment from nextgen.
The interintra experiment, which combines an inter prediction and an
inter prediction have been ported from the nextgen branch. The
experiment is merged into ext_inter, so there is no separate configure
option to enable it.
Yi Luo [Wed, 24 Feb 2016 00:59:38 +0000 (16:59 -0800)]
Implemented DST 8x8 with SSE2 intrinsics.
Implemented fdst8_sse2() function against C version: fdst8().
Added seven DST related hybrid transform types in vp10_fht8x8_sse2().
Replaced vp10_fht8x8_c() with vp10_fht8x8_sse2() in fwd_txfm_8x8().
Speedup: 18.1%, 11.5%, 22.0% based on speed test from
city_cif.y4m, garden_sif.y4m, mobile_cif.y4m.
Adds hooks to use 32x32 ext-tx. Also adds scan orders for the masked
transforms for 32x32.
Make macro USE_MSKTX_FOR_32X32 1 in blockd.h to support 32x32 masked
transforms for ext-tx.
Jingning Han [Tue, 23 Feb 2016 17:25:24 +0000 (09:25 -0800)]
Enable context based motion vector entropy coding
This commit enables a context based motion vector entropy coding
conditioned on dynamic reference motion vector list. This (along with
the previous CL) imporves the coding gains due to dynamic motion
vector referencing based entropy coding:
derf 0.1%
hevcmr 0.2%
stdhd 0.7%
hevchr 0.4%
Yue Chen [Thu, 18 Feb 2016 20:35:14 +0000 (12:35 -0800)]
Optimizing obmc rd decision by checking the real rd cost
Instead of using model_rd_for_sb() to estimate the cost and make the
decision on bmc/obmc, we use super_block_yrd/uvrd() to calculate and
compare the real rd costs of bmc and obmc.
Average bit-rate reduction(%) of obmc experiment:
derflr/derfhd/hevcmr/hevchd
2.353/TBD/TBD/TBD
Before the optimization, the coding gain was:
1.582/1.109/1.600/1.164
Note: there is still some mysterious bug because that compared to
the previous version, the performance at low bit rate drops a lot.
Yaowu Xu [Mon, 22 Feb 2016 20:58:38 +0000 (12:58 -0800)]
Add shift stage in FASTSSIM computation
This commits adds a shift stage for FASTSSIM computaton when source
bit depth is different from working bit depth, to make sure metric
results are calculated in bit_depth consistent with source.
Yaowu Xu [Mon, 22 Feb 2016 18:22:42 +0000 (10:22 -0800)]
Add shift stage for PSNRHVS computation
This commit adds the ability to shift down the working buffer when
source bit_depth is different than working bit_depth. It does so
by shift down to be consistent with source bit_depth.
Jingning Han [Fri, 19 Feb 2016 19:53:24 +0000 (11:53 -0800)]
Unify motion vector cost system
This commit unifies the motion vector cost buffers for full pixel
and sub-pixel motion search. The new motion vector coding system
provides 0.5% coding gains for 720p and above sequences and 0.2%
for lower resolution sets.
Angie Chiang [Sat, 20 Feb 2016 03:31:38 +0000 (19:31 -0800)]
Fix 12 TAP convolution bug
Priviously, we do 12-tap interpolation even there is no sub pixel,
This could cause a bug becuase decoder doesn't extend border when there
is no sub pixel. In this situation, if we still do interpolation, we
will access the border extension which doesn't exist and cause a
memory error