Alex Converse [Wed, 17 Feb 2016 19:07:20 +0000 (11:07 -0800)]
Add a placeholder forward buffered ANS coder.
This buffered ANS coder supports coding the symbols in forward (decode)
order. Rather than windowing or growing the buffer, right now this
coder merely asserts that the buffer will never overflow.
This approach should allow ANS to be used as a drop-in replacement for
other entropy coders rather than requiring complicated reversal logic
throughout the codebase.
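The buffering idea can be sketched roughly as follows. ANS decodes symbols in the reverse of the order in which they were encoded, so a writer can queue symbols in forward order and replay them backwards when flushing. This is only a minimal sketch, not the code from this commit: BufferedSymWriter, buf_init, buf_write, buf_flush, and the emit_fn callback are hypothetical names, BUF_CAPACITY is an arbitrary size, and the callback stands in for the real ANS encode step.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_CAPACITY 1024 /* hypothetical fixed size; no windowing or growth */

typedef struct {
  int symbols[BUF_CAPACITY];
  size_t count;
} BufferedSymWriter;

/* Hypothetical callback standing in for the real ANS encode step. */
typedef void (*emit_fn)(int symbol, void *ctx);

static void buf_init(BufferedSymWriter *w) { w->count = 0; }

/* Queue a symbol in forward (decode) order; overflow is only asserted. */
static void buf_write(BufferedSymWriter *w, int symbol) {
  assert(w->count < BUF_CAPACITY);
  w->symbols[w->count++] = symbol;
}

/* Replay the queued symbols last-to-first, the order an ANS encoder needs. */
static void buf_flush(BufferedSymWriter *w, emit_fn emit, void *ctx) {
  while (w->count > 0) emit(w->symbols[--w->count], ctx);
}

/* Example sink for illustration: records the order symbols are emitted. */
typedef struct {
  int out[BUF_CAPACITY];
  size_t n;
} RecordSink;

static void record_emit(int symbol, void *ctx) {
  RecordSink *s = (RecordSink *)ctx;
  s->out[s->n++] = symbol;
}
```

Callers write symbols in the order the decoder will read them and call buf_flush() once at the end; the reversal stays inside the writer instead of spreading through the codebase.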
Jingning Han [Fri, 11 Mar 2016 20:05:18 +0000 (12:05 -0800)]
Turn off 32x32 transform type selection
Temporarily disable transform type selection for 32x32 transform
block size. This speeds up the encoding process. For bus at CIF
150 frames, the encoding time goes from 896s -> 762s (11% faster).
Compression performance improves by 0.15% on the lowres set and
changes by -0.029% on the hdres set.
Jingning Han [Thu, 10 Mar 2016 00:40:08 +0000 (16:40 -0800)]
Enable hybrid 1-D/2-D transform coding for highbd setting
This commit enables the hybrid 1-D/2-D transform coding scheme for
high bit-depth setting. It improves the compression performance of
ext-tx experiment by 0.98% for lowres_all set.
Jingning Han [Wed, 9 Mar 2016 16:58:07 +0000 (08:58 -0800)]
Add horizontal and vertical scan order for 1-D transform
This commit enables the 1-D transform to use Manhattan grid vertical
and horizontal scan order for transform coefficient entropy coding.
Enabled in inter prediction mode, the hybrid 1D/2D transform coding
scheme outperforms the 2D-DCT based coding system used in VP9 by:
lowres_all 1.7%
hdres_all 1.4%
As one coding option, in addition to the existing 17 other transform
types in ext-tx experiment, the 1D/2D hybrid transform improves
the coding gains:
lowres_all 2.2% -> 3.0%
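As an illustration of the two scan orders named above, a horizontal scan can be read as visiting coefficients row by row and a vertical scan as visiting them column by column. The sketch below assumes a row-major n x n coefficient block; build_horizontal_scan and build_vertical_scan are hypothetical helpers, not functions from the codebase, and the actual scan tables in the commit may differ.

```c
#include <assert.h>
#include <stdint.h>

/* scan[i] is the raster index (row * n + col) of the i-th coefficient
 * visited. */

/* Horizontal scan: walk each row left to right, rows top to bottom. */
static void build_horizontal_scan(int16_t *scan, int n) {
  assert(n > 0);
  for (int r = 0; r < n; ++r)
    for (int c = 0; c < n; ++c) scan[r * n + c] = (int16_t)(r * n + c);
}

/* Vertical scan: walk each column top to bottom, columns left to right. */
static void build_vertical_scan(int16_t *scan, int n) {
  assert(n > 0);
  for (int c = 0; c < n; ++c)
    for (int r = 0; r < n; ++r) scan[c * n + r] = (int16_t)(r * n + c);
}
```

The intuition is that a transform applied along only one dimension tends to leave its significant coefficients aligned with rows or columns, so a scan that follows that direction groups them together for entropy coding.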
- Implemented fdst16_sse2(), fdst16_8col() against C version: fdst16().
- Turned on 7 DST related hybrid txfm types in vp10_fht16x16_sse2().
- Replaced vp10_fht16x16_c() with vp10_fht16x16_sse2() in
fwd_txfm_16x16().
- Added vp10_fht16x16_sse2() unit test against C version:
vp10_fht16x16_c() (--gtest_filter=*VP10Trans16x16*).
- Unit test passed.
- Speed improvement: 2.4%, 3.2%, 3.2%, for city_cif.y4m, garden_sif.y4m,
and mobile_cif.y4m.
Yi Luo [Mon, 7 Mar 2016 22:25:07 +0000 (14:25 -0800)]
Added vp10_fht8x8_sse2() unit test
- Derived class VP10Trans8x8HT from base class TransformTestBase.
- Employed RunCoeffCheck() to test vp10_fht8x8_sse2() against C reference
function vp10_fht8x8_c().
- The seven hybrid transform cases related to fdst8_sse2() are covered in
this test.
- Test passed (4 test cases w/o EXT_TX; 16 test cases with EXT_TX).
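The kind of check described above, an optimized transform compared coefficient by coefficient against a C reference, can be sketched as below. This is not the RunCoeffCheck() from the test suite: coeff_check, ref_txfm4, and opt_txfm4 are hypothetical names, and the two toy 4-point transforms are simple sum/difference butterflies standing in for the real C and SSE2 functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define N 4

typedef void (*txfm_fn)(const int16_t *in, int32_t *out);

/* Toy reference: 4-point sum/difference transform (not a real DST or DCT). */
static void ref_txfm4(const int16_t *in, int32_t *out) {
  out[0] = in[0] + in[1] + in[2] + in[3];
  out[1] = in[0] - in[1] + in[2] - in[3];
  out[2] = in[0] + in[1] - in[2] - in[3];
  out[3] = in[0] - in[1] - in[2] + in[3];
}

/* Toy "optimized" version: same result computed in two butterfly stages. */
static void opt_txfm4(const int16_t *in, int32_t *out) {
  const int32_t a = in[0] + in[2], b = in[0] - in[2];
  const int32_t c = in[1] + in[3], d = in[1] - in[3];
  out[0] = a + c;
  out[1] = a - c;
  out[2] = b + d;
  out[3] = b - d;
}

/* Returns the number of mismatching coefficients over `trials` random
 * inputs; zero means the two implementations were bit-exact on them. */
static int coeff_check(txfm_fn ref, txfm_fn opt, int trials, unsigned seed) {
  int mismatches = 0;
  srand(seed);
  for (int t = 0; t < trials; ++t) {
    int16_t in[N];
    int32_t out_ref[N], out_opt[N];
    for (int i = 0; i < N; ++i) in[i] = (int16_t)(rand() % 511 - 255);
    ref(in, out_ref);
    opt(in, out_opt);
    for (int i = 0; i < N; ++i) mismatches += (out_ref[i] != out_opt[i]);
  }
  return mismatches;
}
```

A test would then require coeff_check(ref_txfm4, opt_txfm4, 1000, 1) to return 0, once per transform type under test.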
Jingning Han [Sat, 5 Mar 2016 05:23:55 +0000 (21:23 -0800)]
Hybrid 1-D/2-D transform coding
This commit enables a hybrid 1-D/2-D transform coding scheme and
the accompanying entropy coding system. It currently uses hybrid
1-D/2-D DCT transform coding. It provides coding performance gains:
Yi Luo [Mon, 29 Feb 2016 17:53:42 +0000 (09:53 -0800)]
Added vp10_fht4x4_sse2() unit test
Derived class VP10Trans4x4HT from base class TransformTestBase.
Employed RunCoeffCheck() to test vp10_fht4x4_sse2() against
C reference vp10_fht4x4_c().
The seven hybrid transform cases related to fdst4_sse2() are covered
in this test.
Wrote a header file for the test base class, with some modifications
so that the base class can be used for the 8x8, 16x16, and 32x32 cases.
All related tests passed.
Sarah Parker [Tue, 1 Mar 2016 18:12:13 +0000 (10:12 -0800)]
Adding speed feature interface for ext tx search
This sets up the interface for 3 speed features that progressively
eliminate a greater number of transforms in ext tx using
pre-trained support vector machines.
Each speed feature still needs to be implemented.
Alex Converse [Tue, 16 Feb 2016 21:41:01 +0000 (13:41 -0800)]
ANS: Switch from PDFs to CDFs.
Make the RANS implementation operate on cumulative distribution
functions rather than individual probability distribution functions.
CDFs have proven more flexible to work with.
Reduces the scaling of decoding memory usage from O(num_distributions *
symbol_resolution) to O(num_distributions).
No bitstream change. This is purely an implementation change.
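A rough sketch of what operating on a CDF looks like: each distribution is stored as one array of cumulative counts, and the decoder finds the symbol whose interval contains the decoded value by searching that array rather than indexing a per-distribution table with one entry per possible value. The names cdf_freq and cdf_lookup and the array layout (cdf[0] == 0, cdf[nsyms] == total) are assumptions for illustration, not the commit's actual interface.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: cdf has nsyms + 1 entries, cdf[0] == 0 and
 * cdf[nsyms] == total, so symbol s owns the range [cdf[s], cdf[s + 1]). */

/* Frequency of symbol s, recovered from adjacent CDF entries. */
static uint32_t cdf_freq(const uint16_t *cdf, int s) {
  return (uint32_t)(cdf[s + 1] - cdf[s]);
}

/* Find the symbol s with cdf[s] <= value < cdf[s + 1]. */
static int cdf_lookup(const uint16_t *cdf, int nsyms, uint32_t value) {
  assert(nsyms > 0 && value < cdf[nsyms]);
  int s = 0;
  while (value >= cdf[s + 1]) ++s;
  return s;
}
```

With a small alphabet the linear search is short; a binary search over the same array would be a natural variation for larger alphabets.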
Yi Luo [Wed, 2 Mar 2016 21:45:52 +0000 (13:45 -0800)]
Fixed a computation bug in fdct16_sse2()
fdct16_sse2() was not bit-exact with C reference, fdct16().
The inconsistency was found by writing a unit test for
vp10_fht16x16_sse2(). Since the unit test needs a pending
change on the inherited base class, I will commit this unit
test after making a header file for this base class.
Passed the uncommitted unit test: vp10_fht16x16_test.cc.
Yunqing Wang [Tue, 16 Feb 2016 22:33:18 +0000 (14:33 -0800)]
Do sub-pixel motion search in up-sampled reference frames
Up-sampled the reference frames by a factor of 8 in each dimension using
the 8-tap interpolation filter. In sub-pixel motion search, use the
up-sampled reference frames to find the best matching blocks. This
substantially improved the motion search precision and thus improved
the compression quality. There was no change on the decoder side.
Borg test and speed test results:
1. On derflr set,
Overall PSNR gain: 1.306%, and SSIM gain: 1.512%.
Average speed loss on derf set was 6.0%.
2. On stdhd set,
Overall PSNR gain: 0.754%, and SSIM gain: 0.814%.
On hevchd set,
Overall PSNR gain: 0.465%, and SSIM gain: 0.527%.
Speed loss on HD clips was 3.5%.
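To illustrate the search against an up-sampled reference: once the reference is stored at 8 times the resolution, a candidate motion vector in 1/8-pel units selects every 8th sample starting at the sub-pel offset, so the matching cost can be computed directly with no per-candidate interpolation. The sketch below is a simplified assumption, not the encoder's code: upsampled_sad is a hypothetical name, pixels are 8-bit, motion vectors are non-negative, and the up-sampling filter itself is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define UP 8 /* up-sampling factor in each dimension */

/* SAD between a w x h source block and the up-sampled reference at a
 * motion vector given in 1/8-pel units (mv_x8, mv_y8). up_stride is the
 * row stride of the up-sampled frame. */
static unsigned upsampled_sad(const uint8_t *src, int src_stride,
                              const uint8_t *up_ref, int up_stride,
                              int mv_x8, int mv_y8, int w, int h) {
  assert(mv_x8 >= 0 && mv_y8 >= 0);
  unsigned sad = 0;
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      /* Full-pel position scaled by UP, plus the sub-pel offset. */
      const int ux = x * UP + mv_x8;
      const int uy = y * UP + mv_y8;
      sad += (unsigned)abs(src[y * src_stride + x] -
                           up_ref[uy * up_stride + ux]);
    }
  }
  return sad;
}
```

A sub-pixel search would evaluate this cost for the candidates around the best full-pel position and keep the minimum.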
Fixes some issues introduced by a merge of two patches.
Also decouples the temporal interpolation filter from the switchable
filters for now for ease of experimentation with both separately.