Decouples interintra modes and probability models from regular
intra modes, to enable creating/optimizing new interintra modes.
Also, fixes interpolation values for 128x128 interintra and obmc.
Yi Luo [Wed, 30 Mar 2016 22:40:37 +0000 (15:40 -0700)]
Fixed HBD variance unit test on incorrect bit mask operation
- Change logical AND (&&) to arithmetic AND (&).
- Disable failed unit tests for next-step fixing.
https://bugs.chromium.org/p/webm/issues/detail?id=1166
Geza Lore [Thu, 24 Mar 2016 11:20:25 +0000 (11:20 +0000)]
Rename MI_BLOCK_SIZE and MI_MASK macros.
Rename MI_BLOCK_SIZE.* -> MAX_MIB_SIZE.* (MIB is for MI Block).
Rename MI_MASK.* -> MAX_MIB_MASK.*
There are no functional changes.
This is in preparation for coding the superblock size at the frame
level, which will require some of these constants to become variables.
The new names better reflect future semantics, and hence make the code
clearer.
Some fixes/speed-ups on inter-intra part of ext-inter
Fixes an issue with rectangular inter-intra blocks.
Includes various other refactoring and cleanups to enable fast mixing
of inter and intra predictors.
Uses only the best single inter reference so far for the inter-intra
search.
About 30% speed-up with a 0.1% hit in performance.
This is part one of overhauling on the ext-inter experiment. To be
continued in subsequent patches.
Geza Lore [Mon, 7 Mar 2016 13:46:39 +0000 (13:46 +0000)]
Extend superblock size fo 128x128 pixels.
If --enable-ext-partition is used at build time, the superblock size
(sometimes also referred to as coding unit (CU) size) is extended to
128x128 pixels.
Jingning Han [Tue, 22 Mar 2016 23:38:27 +0000 (16:38 -0700)]
Rework the predicted motion vector for sub8x8 block
This commit makes the sub8x8 block to use its nearest neighbor's
motion vector as predicted motion vector for NEWMV mode. It improves
the coding performance by 0.12%.
Yunqing Wang [Fri, 25 Mar 2016 18:57:20 +0000 (11:57 -0700)]
Make set_reference control API work in VP9 and VP10
Moved the API patch from NextGen to NextGenv2 and also added this
API to VP10. An example was included. To try it, for example, run
the following command:
$ examples/vpx_cx_set_ref vp10 352 288 in.yuv out.ivf 4 30
hui su [Sat, 26 Mar 2016 00:28:15 +0000 (17:28 -0700)]
Fixes for Palette mode
This patch fixes 2 issues in Palette mode:
1. More memory is needed in PALETTE_BUFFER for 444 video format.
2. A merge issue caused by
https://chromium-review.googlesource.com/#/c/333940/7
Alex Converse [Fri, 25 Mar 2016 23:11:17 +0000 (16:11 -0700)]
Use speed 2 on superframe test.
No need to do avoid shortcuts when all we are testing is the superframe
syntax. Decreases the run time up the VP10 version of the test from 22
seconds to 3 seconds on my machine.
Yi Luo [Fri, 25 Mar 2016 23:48:19 +0000 (16:48 -0700)]
8x8/16x16 HT types V_DCT to H_FLIPADST SSE2 optimization
- Wrote function: fidtx8_sse2() and fidtx16_sse2().
- Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for new types.
- Updated 8x8/16x16 unit tests for accuracy/speed.
- Running 20K times with random numbers and getting through
tx type from V_DCT to H_FLIPADST, SSE2 speed improvement:
8x8: ~131%
16x16: ~66%
Yunqing Wang [Fri, 25 Mar 2016 16:05:25 +0000 (09:05 -0700)]
Recover tile coding performance
After porting tile coding from VP9 to VP10, some performance
degradation was seen because of the difference between VP9 and
Vp10 baseline. This patch disabled some features in VP10 while
tile coding is turned on. Also, an encoder control API was added
back for this use case.
Yi Luo [Wed, 23 Mar 2016 23:22:43 +0000 (16:22 -0700)]
4x4 hybrid transform type V_DCT to H_FLIPADST SSE2 optimization
- Added function fidtx4_sse2().
- Turned on vp10_fht4x4_sse2() for these tx types.
- Updated 4x4 unit test for speed/accuracy.
- 4x4 Unit test passed.
- Running 20K times with random numbers for tx type from
V_DCT to H_FLIPADST, SSE2 against C, speed improves ~46%.
Geza Lore [Fri, 11 Mar 2016 17:42:49 +0000 (17:42 +0000)]
Port large scale tile coding features from nextgen.
If configured with --enable-ext-tile, the codec uses an alternative
tile coding syntax in the bitstream. Changes include::
- The maximum number of tile rows and columns is extended to 1024
each.
- The minimum tile width/height is 64 pixels (1 superblock).
- A tile copy mode is added where a tile directly reuse the coded
data of a previous tile
- The meaning of the tile-columns and tile-rows codec parameters are
overloaded to mean tile-width and tile-height in units of 64
pixels.
- All tiles should now be independent, including rows within the
same columns, so large scale parallel, or independent decoding is
possible.
- vpxdec also gained the options to decode only a particular tile,
tile row, or tile column.
Changes without --enable-ext-tile:
- All tiles should now be independent, including rows within the
same columns, so large scale parallel, or independent decoding is
possible.
- vpxenc default tile configuration changed to use 1 tile column.
Yi Luo [Wed, 23 Mar 2016 19:10:52 +0000 (12:10 -0700)]
Misc. updates for highbd changes
- Use Makefile to control the build for highbd_fwd_txfm_sse4.c.
- Fixed hybrid transform (HT) types due to recent update.
- Added new unit test cases for highbd HT.
Yi Luo [Wed, 23 Mar 2016 18:30:39 +0000 (18:30 +0000)]
Merge "Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode - Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1 intrinsics optimization. - Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(), and fdct4x4_sse4_1(). - Used logic right shift to avoid coeff memory write/read. - Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only. - Improved overall encoding performance >2.3% for 50 frames sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12, --bit-depth=12, 50 frames. - Unit test passed." into nextgenv2
Yi Luo [Wed, 16 Mar 2016 00:09:38 +0000 (17:09 -0700)]
Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode
- Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
intrinsics optimization.
- Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
and fdct4x4_sse4_1().
- Used logic right shift to avoid coeff memory write/read.
- Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
- Improved overall encoding performance >2.3% for 50 frames
sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12,
--bit-depth=12, 50 frames.
- Unit test passed.
Yunqing Wang [Tue, 22 Mar 2016 17:54:43 +0000 (10:54 -0700)]
Prevent encoder crash caused by row tile dependencies
In multi-thread case, the encoder may crash if using encoder option
tile-rows > 0. To prevent that, force tile-rows=0 in this situation.
This is a workaround for WebM issue 1095:
https://bugs.chromium.org/p/webm/issues/detail?id=1095
The further fix can be done by adding synchronizations after a tile
row is encoded. But this will hurt multi-threaded encoder performance.
So, it is recommended to use tile-rows=0 while encoding with threads
> 1.
Yunqing Wang [Tue, 22 Mar 2016 21:13:18 +0000 (14:13 -0700)]
Simplify the loopfilter synchronization logic in VP8 encoder
This patch was to fix a reported Hangouts deadlock/freezing issue
in VP8 encoder(issue 27232610). The original encoder loopfilter
synchronization happened in the following frame, which was prone
to causing problems in some complex use cases. This patch simplified
the synchronization logic.