Jingning Han [Tue, 22 Mar 2016 23:38:27 +0000 (16:38 -0700)]
Rework the predicted motion vector for sub8x8 block
This commit makes sub8x8 blocks use their nearest neighbor's
motion vector as the predicted motion vector for NEWMV mode. It
improves coding performance by 0.12%.
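A minimal sketch of what "nearest neighbor" could mean for the four sub-blocks of a 2x2-split 8x8 block (raster order 0..3). The neighbor choice below mirrors how VP9 picks its "nearest" candidate for sub8x8 blocks; it is an assumption that this commit uses the same choice. The `MV` type and function name are hypothetical.

```c
#include <assert.h>

/* Hypothetical motion vector type. */
typedef struct { int row, col; } MV;

/* Predicted MV for NEWMV in sub-block `idx` (0..3, raster order).
 * `bmi` holds the MVs of sub-blocks already coded in this 8x8 block;
 * `list0` is the block-level nearest candidate (assumed input). */
static MV sub8x8_newmv_pred(int idx, const MV bmi[4], MV list0) {
  switch (idx) {
    case 1: /* top-right: its left neighbor is block 0 */
    case 2: /* bottom-left: its above neighbor is block 0 */
      return bmi[0];
    case 3: /* bottom-right: take the left neighbor, block 2 */
      return bmi[2];
    default: /* block 0 has no coded neighbor inside the 8x8 block */
      return list0;
  }
}
```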
Yunqing Wang [Fri, 25 Mar 2016 18:57:20 +0000 (11:57 -0700)]
Make set_reference control API work in VP9 and VP10
Moved the API patch from NextGen to NextGenv2 and also added the
API to VP10. An example is included; to try it, run a command
such as:
$ examples/vpx_cx_set_ref vp10 352 288 in.yuv out.ivf 4 30
hui su [Sat, 26 Mar 2016 00:28:15 +0000 (17:28 -0700)]
Fixes for Palette mode
This patch fixes two issues in Palette mode:
1. PALETTE_BUFFER needs more memory for the 444 video format.
2. A merge issue caused by
https://chromium-review.googlesource.com/#/c/333940/7
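Why 444 needs more room: with 4:2:0 each chroma plane holds a quarter of the luma samples, while with 4:4:4 it holds as many. The sketch below counts per-block samples across Y, U and V under that view; the helper name, the 64x64 block size in the usage, and the assumption that the buffer scales with per-plane sample count are illustrative, not the actual PALETTE_BUFFER layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: samples across the Y, U and V planes of a
 * bw x bh block, given chroma subsampling shifts ss_x/ss_y
 * (1,1 for 4:2:0; 0,0 for 4:4:4). */
static size_t palette_samples(int bw, int bh, int ss_x, int ss_y) {
  const size_t luma = (size_t)bw * (size_t)bh;
  const size_t chroma = (size_t)(bw >> ss_x) * (size_t)(bh >> ss_y);
  return luma + 2 * chroma; /* U and V planes */
}
```

For a 64x64 block this gives 6144 samples for 4:2:0 and 12288 for 4:4:4, so a buffer sized for 4:2:0 is too small for 4:4:4.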
Alex Converse [Fri, 25 Mar 2016 23:11:17 +0000 (16:11 -0700)]
Use speed 2 on superframe test.
There is no need to avoid shortcuts when all we are testing is the
superframe syntax. This decreases the run time of the VP10 version of
the test from 22 seconds to 3 seconds on my machine.
Yi Luo [Fri, 25 Mar 2016 23:48:19 +0000 (16:48 -0700)]
8x8/16x16 HT types V_DCT to H_FLIPADST SSE2 optimization
- Wrote functions fidtx8_sse2() and fidtx16_sse2().
- Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for the new types.
- Updated 8x8/16x16 unit tests for accuracy/speed.
- Running 20K iterations with random input over tx types V_DCT
to H_FLIPADST, the SSE2 speed improvement is:
8x8: ~131%
16x16: ~66%
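The new types pair a 1D identity transform (IDTX) in one direction with a DCT or ADST in the other. A scalar reference for the identity part, which fidtx8_sse2()/fidtx16_sse2() presumably vectorize, might look like this. The Q14 scale factors approximating √(N/2) (√2, 2, 2√2 for N = 4, 8, 16) are an assumption, chosen so the identity carries roughly the same gain as libvpx-style integer DCTs relative to an orthonormal DCT; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE_BITS 14

/* Round to nearest after a Q14 multiply; assumes arithmetic right
 * shift for negative values, as libvpx-style code does. */
static int32_t round_shift_q14(int64_t x) {
  return (int32_t)((x + ((int64_t)1 << (SCALE_BITS - 1))) >> SCALE_BITS);
}

/* Scalar 1D identity transform of size n (4, 8 or 16). */
static void fidtx_sketch(const int32_t *in, int32_t *out, int n) {
  int64_t scale_q14;
  switch (n) {
    case 4: scale_q14 = 23170; break; /* ~sqrt(2) * 2^14 (assumed) */
    case 8: scale_q14 = 32768; break; /* 2 * 2^14 (assumed) */
    default:
      assert(n == 16);
      scale_q14 = 46341; /* ~2*sqrt(2) * 2^14 (assumed) */
      break;
  }
  for (int i = 0; i < n; ++i) out[i] = round_shift_q14((int64_t)in[i] * scale_q14);
}
```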
Yunqing Wang [Fri, 25 Mar 2016 16:05:25 +0000 (09:05 -0700)]
Recover tile coding performance
After porting tile coding from VP9 to VP10, some performance
degradation was seen because of differences between the VP9 and
VP10 baselines. This patch disables some VP10 features when tile
coding is turned on. An encoder control API was also added back
for this use case.
Yi Luo [Wed, 23 Mar 2016 23:22:43 +0000 (16:22 -0700)]
4x4 hybrid transform type V_DCT to H_FLIPADST SSE2 optimization
- Added function fidtx4_sse2().
- Turned on vp10_fht4x4_sse2() for these tx types.
- Updated 4x4 unit test for speed/accuracy.
- 4x4 unit test passed.
- Running 20K iterations with random input for tx types V_DCT to
H_FLIPADST, the SSE2 speed improvement over C is ~46%.
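Every type in the V_DCT..H_FLIPADST range uses the identity transform in one direction, which is why fidtx4_sse2() is the new building block. The table below sketches that decomposition into a (vertical, horizontal) pair of 1D transforms. The enum and struct names are hypothetical, and we assume "V_x" means x runs down the columns with identity along the rows, while "H_x" is the transposed choice.

```c
#include <assert.h>

/* Hypothetical enums; only the six types covered by the commit. */
typedef enum { TX1D_DCT, TX1D_ADST, TX1D_FLIPADST, TX1D_IDTX } Tx1DKind;
typedef enum { V_DCT, H_DCT, V_ADST, H_ADST, V_FLIPADST, H_FLIPADST } HtType;

/* col = vertical (column) pass, row = horizontal (row) pass. */
typedef struct { Tx1DKind col, row; } Tx2DPair;

static Tx2DPair ht_pair(HtType t) {
  static const Tx2DPair kPairs[] = {
    [V_DCT] = {TX1D_DCT, TX1D_IDTX},
    [H_DCT] = {TX1D_IDTX, TX1D_DCT},
    [V_ADST] = {TX1D_ADST, TX1D_IDTX},
    [H_ADST] = {TX1D_IDTX, TX1D_ADST},
    [V_FLIPADST] = {TX1D_FLIPADST, TX1D_IDTX},
    [H_FLIPADST] = {TX1D_IDTX, TX1D_FLIPADST},
  };
  return kPairs[t];
}
```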
Geza Lore [Fri, 11 Mar 2016 17:42:49 +0000 (17:42 +0000)]
Port large scale tile coding features from nextgen.
If configured with --enable-ext-tile, the codec uses an alternative
tile coding syntax in the bitstream. Changes include:
- The maximum number of tile rows and columns is extended to 1024
each.
- The minimum tile width/height is 64 pixels (1 superblock).
- A tile copy mode is added, where a tile directly reuses the coded
data of a previous tile.
- The meaning of the tile-columns and tile-rows codec parameters is
overloaded to mean tile-width and tile-height in units of 64
pixels.
- All tiles should now be independent, including rows within the
same column, so large-scale parallel or independent decoding is
possible.
- vpxdec also gained options to decode only a particular tile,
tile row, or tile column.
Changes without --enable-ext-tile:
- All tiles should now be independent, including rows within the
same column, so large-scale parallel or independent decoding is
possible.
- The vpxenc default tile configuration changed to use 1 tile column.
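With --enable-ext-tile, the tile-columns/tile-rows values become tile sizes in 64-pixel units, so the number of tiles along a dimension follows from a ceiling division, capped at 1024. A sketch, assuming a value of n means exactly n × 64 pixels (n ≥ 1); the helper name is hypothetical.

```c
#include <assert.h>

#define SB_SIZE 64          /* superblock size in pixels */
#define MAX_TILE_COUNT 1024 /* max tile rows/columns with ext-tile */

/* Tiles along one frame dimension for a tile size given in 64-pixel
 * units. The last tile may be narrower than the others. Returns -1
 * if the count would exceed the limit. */
static int ext_tile_count(int frame_dim, int tile_size_sb) {
  assert(tile_size_sb >= 1 && frame_dim >= 1);
  const int tile_px = tile_size_sb * SB_SIZE;
  const int count = (frame_dim + tile_px - 1) / tile_px; /* ceiling division */
  return count <= MAX_TILE_COUNT ? count : -1;
}
```

For example, a 1920x1080 frame with tile width 4 (256 pixels) and tile height 2 (128 pixels) gives 8 tile columns and 9 tile rows.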
Yi Luo [Wed, 23 Mar 2016 19:10:52 +0000 (12:10 -0700)]
Misc. updates for highbd changes
- Use Makefile to control the build for highbd_fwd_txfm_sse4.c.
- Fixed hybrid transform (HT) types after a recent update.
- Added new unit test cases for highbd HT.
Yi Luo [Wed, 23 Mar 2016 18:30:39 +0000 (18:30 +0000)]
Merge "Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode" into nextgenv2
Yi Luo [Wed, 16 Mar 2016 00:09:38 +0000 (17:09 -0700)]
Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode
- Set up function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
intrinsics optimization.
- Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
and fdct4x4_sse4_1().
- Used logical right shift to avoid coeff memory write/read.
- Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
- Improved overall encoding performance by >2.3% on the 50-frame
sequence park_joy_1080p_12.y4m (--input-bit-depth=12,
--bit-depth=12).
- Unit test passed.
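For reference, the arithmetic that fdct4x4_sse4_1() vectorizes is, we assume, the usual libvpx 4-point DCT butterfly with 14-bit cosine constants. A scalar sketch of one 1D pass follows; a 4x4 transform applies it to every column and then every row. The 2D wrapper, input pre-scaling and any highbd-specific shifts are omitted, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define DCT_CONST_BITS 14
static const int64_t cospi_8_64 = 15137;  /* round(cos(pi/8) * 2^14) */
static const int64_t cospi_16_64 = 11585; /* round(cos(pi/4) * 2^14) */
static const int64_t cospi_24_64 = 6270;  /* round(cos(3pi/8) * 2^14) */

/* Round to nearest; assumes arithmetic right shift for negatives. */
static int32_t fdct_round_shift(int64_t x) {
  return (int32_t)((x + ((int64_t)1 << (DCT_CONST_BITS - 1))) >> DCT_CONST_BITS);
}

/* One 1D pass of the 4-point forward DCT butterfly. */
static void fdct4_sketch(const int32_t in[4], int32_t out[4]) {
  const int64_t s0 = (int64_t)in[0] + in[3];
  const int64_t s1 = (int64_t)in[1] + in[2];
  const int64_t s2 = (int64_t)in[1] - in[2];
  const int64_t s3 = (int64_t)in[0] - in[3];
  out[0] = fdct_round_shift((s0 + s1) * cospi_16_64);
  out[2] = fdct_round_shift((s0 - s1) * cospi_16_64);
  out[1] = fdct_round_shift(s2 * cospi_24_64 + s3 * cospi_8_64);
  out[3] = fdct_round_shift(s3 * cospi_24_64 - s2 * cospi_8_64);
}
```

A flat input of 64 in every position produces only a DC term (181), with the three AC outputs at 0.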
Julia Robson [Thu, 17 Mar 2016 16:50:28 +0000 (16:50 +0000)]
Porting ext_partition experiment from nextgen
This has been ported under ext_partition_types because it is due
to be combined with the coding_unit_size experiment, which is
already being ported under ext_partition.
Yue Chen [Mon, 21 Mar 2016 18:53:57 +0000 (11:53 -0700)]
Refactor transform type-size search function
Decompose choose_tx_size_from_rd into three functions that determine
the transform coding rd cost at different stages. Besides the original
function, txfm_yrd() calculates the rd cost for a fixed size and type,
and choose_tx_size_fix_type() fixes the type and searches over sizes.
This enables other experiments to do restricted tx searches to
reduce the impact on speed.
Similar refactoring is done for select_tx_type_yrd() in VAR_TX.
The performance change in the baseline is negligible:
0.014/0.001/-0.020 for lowres/midres/hdres.
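The three-stage split can be sketched as nested searches. This is a structural illustration only: the `_sketch` functions stand in for txfm_yrd(), choose_tx_size_fix_type() and choose_tx_size_from_rd(), the size/type counts are hypothetical, and `toy_rd` is a made-up cost function used in place of a real RD computation.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

enum { NUM_TX_SIZES = 4, NUM_TX_TYPES = 4 }; /* hypothetical counts */

typedef double (*RdFn)(int tx_size, int tx_type, void *ctx);

/* Stage 1 (cf. txfm_yrd): rd cost for one fixed size and type. */
static double txfm_yrd_sketch(RdFn rd, void *ctx, int size, int type) {
  return rd(size, type, ctx);
}

/* Stage 2 (cf. choose_tx_size_fix_type): fix the type, search sizes.
 * A restricted search can call this directly for a single type. */
static double choose_size_fix_type_sketch(RdFn rd, void *ctx, int type, int *best_size) {
  double best = DBL_MAX;
  for (int s = 0; s < NUM_TX_SIZES; ++s) {
    const double c = txfm_yrd_sketch(rd, ctx, s, type);
    if (c < best) { best = c; *best_size = s; }
  }
  return best;
}

/* Stage 3 (cf. choose_tx_size_from_rd): full search over types. */
static double choose_size_and_type_sketch(RdFn rd, void *ctx, int *best_size, int *best_type) {
  double best = DBL_MAX;
  for (int t = 0; t < NUM_TX_TYPES; ++t) {
    int s = 0;
    const double c = choose_size_fix_type_sketch(rd, ctx, t, &s);
    if (c < best) { best = c; *best_size = s; *best_type = t; }
  }
  return best;
}

/* Toy cost with its minimum at size 2, type 1 (illustration only). */
static double toy_rd(int s, int t, void *ctx) {
  (void)ctx;
  return (s - 2) * (s - 2) + (t - 1) * (t - 1) + 1.0;
}
```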
Alex Converse [Wed, 17 Feb 2016 19:07:20 +0000 (11:07 -0800)]
Add a placeholder forward buffered ANS coder.
This buffered ANS coder supports coding the symbols in forward (decode)
order. Rather than windowing or growing the buffer, right now this
coder merely asserts that the buffer will never overflow.
This approach should allow ANS to be used as a drop-in replacement for
other entropy coders rather than requiring complicated reversal logic
throughout the codebase.
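A minimal sketch of the buffering idea, assuming a binary rANS with 8-bit probabilities: symbols are recorded in forward order, and a flush encodes them last-to-first so the resulting stream decodes front to back. All names, the state bounds and the probability model are illustrative, not the libvpx implementation. Like the commit, it asserts rather than windowing or growing when the buffer is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SYMS 1024    /* hypothetical fixed capacity; never grown */
#define ANS_L (1u << 23) /* lower bound of the normalized rANS state */
#define PROB_BITS 8      /* p0 = P(bit == 0) in 1/256 units, 1..255 */
#define PROB_M (1u << PROB_BITS)

typedef struct { int bit; unsigned p0; } BufSym;
typedef struct { BufSym syms[MAX_SYMS]; size_t n; } BufAnsCoder;
typedef struct { const uint8_t *buf; size_t pos, len; uint32_t x; } AnsDecoder;

/* Record a symbol in forward (decode) order; nothing is coded yet. */
static void buf_ans_write(BufAnsCoder *c, int bit, unsigned p0) {
  assert(c->n < MAX_SYMS); /* assert instead of growing the buffer */
  assert(p0 > 0 && p0 < PROB_M);
  c->syms[c->n].bit = bit;
  c->syms[c->n].p0 = p0;
  c->n++;
}

/* Encode buffered symbols last-to-first, then reverse the bytes so a
 * decoder reads symbols first-to-last. Returns bytes written. */
static size_t buf_ans_flush(const BufAnsCoder *c, uint8_t *out, size_t cap) {
  uint8_t tmp[MAX_SYMS + 4]; /* <= 1 renorm byte per symbol + state */
  size_t len = 0;
  uint32_t x = ANS_L;
  for (size_t i = c->n; i-- > 0;) {
    const int bit = c->syms[i].bit;
    const uint32_t p0 = c->syms[i].p0;
    const uint32_t f = bit ? PROB_M - p0 : p0; /* symbol frequency */
    const uint32_t start = bit ? p0 : 0;       /* cumulative frequency */
    const uint32_t x_max = (ANS_L >> PROB_BITS) * 256 * f;
    while (x >= x_max) { tmp[len++] = (uint8_t)(x & 0xff); x >>= 8; }
    x = (x / f) * PROB_M + (x % f) + start;
  }
  for (int k = 0; k < 4; ++k) { tmp[len++] = (uint8_t)(x & 0xff); x >>= 8; }
  assert(len <= cap);
  for (size_t k = 0; k < len; ++k) out[k] = tmp[len - 1 - k];
  return len;
}

static void ans_read_init(AnsDecoder *d, const uint8_t *buf, size_t len) {
  assert(len >= 4);
  d->buf = buf;
  d->len = len;
  d->pos = 4;
  d->x = (uint32_t)buf[0] << 24 | (uint32_t)buf[1] << 16 |
         (uint32_t)buf[2] << 8 | buf[3];
}

/* Decode one bit, in forward order, with the p0 the encoder used. */
static int ans_read(AnsDecoder *d, unsigned p0) {
  const uint32_t xm = d->x & (PROB_M - 1);
  const int bit = xm >= p0;
  const uint32_t f = bit ? PROB_M - p0 : p0;
  const uint32_t start = bit ? p0 : 0;
  d->x = f * (d->x >> PROB_BITS) + xm - start;
  while (d->x < ANS_L && d->pos < d->len) d->x = (d->x << 8) | d->buf[d->pos++];
  return bit;
}
```

Because the reversal happens once inside the flush, callers write and read symbols in the same order, which is the drop-in property the commit describes.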
Jingning Han [Tue, 15 Mar 2016 22:58:03 +0000 (15:58 -0700)]
Enable dynamic motion vector referencing for newmv mode
This commit enables the dynamic motion vector predictor for NEWMV
mode. It allows the codec to select the best motion vector predictor
in a rate-distortion optimization framework for motion vector
residual coding. The compression performance is improved:
lowres 0.14%
midres 0.27%
hdres 0.24%
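A toy version of the predictor-selection step. For a fixed new motion vector the distortion does not depend on which predictor is used, so the rate-distortion choice reduces to the cheapest (index, residual) pair. The Exp-Golomb-plus-sign bit model and the index cost of idx + 1 bits are hypothetical stand-ins for the codec's real rate estimates, and all names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int row, col; } MotionVector;

/* Hypothetical bit cost: Exp-Golomb length of |d| plus a sign bit. */
static int component_bits(int d) {
  const unsigned v = (unsigned)abs(d);
  int lg = 0;
  while ((v + 1) >> (lg + 1)) ++lg; /* lg = floor(log2(v + 1)) */
  return 2 * lg + 1 + (d != 0);
}

/* Rate of signalling predictor `idx` plus the residual new_mv - pred. */
static int mv_rate(MotionVector new_mv, MotionVector pred, int idx) {
  return component_bits(new_mv.row - pred.row) +
         component_bits(new_mv.col - pred.col) + idx + 1;
}

/* Pick the candidate predictor with the lowest rate. */
static int pick_mv_pred(MotionVector new_mv, const MotionVector *cands, int n) {
  int best = 0, best_rate = mv_rate(new_mv, cands[0], 0);
  for (int i = 1; i < n; ++i) {
    const int r = mv_rate(new_mv, cands[i], i);
    if (r < best_rate) { best_rate = r; best = i; }
  }
  return best;
}
```

With candidates (0,0), (10,4) and (12,-3) and a new MV of (11,4), the second candidate wins (7 bits versus 15 for the other two), even though it is not the first in the list.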