Geza Lore [Thu, 24 Mar 2016 13:56:05 +0000 (13:56 +0000)]
Make superblock size variable at the frame level.
The uncompressed frame header contains a bit to signal whether the
frame is encoded using 64x64 or 128x128 superblocks. This can vary
between any 2 frames.
vpxenc gained the --sb-size={64,128,dynamic} option, which allows the
configuration of the superblock size used (default is dynamic). 64/128
will force the encoder to always use the specified superblock size.
Dynamic would enable the encoder to choose the sb size for each
frame, but this is not implemented yet (dynamic does the same as 128
for now).
Constraints on tile sizes depend on the superblock size, the following
is a summary of the current bitstream syntax and semantics:
If both --enable-ext-tile is OFF and --enable-ext-partition is OFF:
The tile coding in this case is the same as VP9. In particular,
tiles have a minimum width of 256 pixels and a maximum width of
4096 pixels. The tile width must be multiples of 64 pixels
(except for the rightmost tile column). There can be a maximum
of 64 tile columns and 4 tile rows.
If --enable-ext-tile is OFF and --enable-ext-partition is ON:
Same constraints as above, except that tile width must be
multiples of 128 pixels (except for the rightmost tile column).
There is no change in the bitstream syntax used for coding the tile
configuration if --enable-ext-tile is OFF.
If --enable-ext-tile is ON and --enable-ext-partition is ON:
This is the new large scale tile coding configuration. The
minimum/maximum tile width and height are 64/4096 pixels. Tile
width and height must be multiples of 64 pixels. The uncompressed
header contains two 6 bit fields that hold the tile width/heigh
in units of 64 pixels. The maximum number of tile rows/columns
is only limited by the maximum frame size of 65536x65536 pixels
that can be coded in the bitstream. This yields a maximum of
1024x1024 tile rows and columns (of 64x64 tiles in a 65536x65536
frame).
If both --enable-ext-tile is ON and --enable-ext-partition is ON:
Same applies as above, except that in the bitstream the 2 fields
containing the tile width/height are in units of the superblock
size, and the superblock size itself is also coded in the bitstream.
If the uncompressed header signals the use of 64x64 superblocks,
then the tile width/height fields are 6 bits wide and are in units
of 64 pixels. If the uncompressed header signals the use of 128x128
superblocks, then the tile width/height fields are 5 bits wide and
are in units of 128 pixels.
The above is a summary of the bitstream. The user interface to vpxenc
(and the equivalent encoder API) behaves a follows:
If --enable-ext-tile is OFF:
No change in the user interface. --tile-columns and --tile-rows
specify the base 2 logarithm of the desired number of tile columns
and tile rows. The actual number of tile rows and tile columns,
and the particular tile width and tile height are computed by the
codec ensuring all of the above constraints are respected.
If --enable-ext-tile is ON, but --enable-ext-partition is OFF:
No change in the user interface. --tile-columns and --tile-rows
specify the WIDTH and HEIGHT of the tiles in unit of 64 pixels.
The valid values are in the range [1, 64] (which corresponds to
[64, 4096] pixels in increments of 64.
If both --enable-ext-tile is ON and --enable-ext-partition is ON:
If --sb-size=64 (default):
The user interface is the same as in the previous point.
--tile-columns and --tile-rows specify tile WIDTH and HEIGHT,
in units of 64 pixels, in the range [1, 64] (which corresponds
to [64, 4096] pixels in increments of 64).
If --sb-size=128 or --sb-size=dynamic:
--tile-columns and --tile-rows specify tile WIDTH and HEIGHT,
in units of 128 pixels in the range [1, 32] (which corresponds
to [128, 4096] pixels in increments of 128).
Julia Robson [Wed, 6 Apr 2016 10:16:57 +0000 (11:16 +0100)]
Fixing assertion in *Large unit tests
In certain cases the code was subtracting the obmc cost
despite it not having been added previously.
For example with ref_mv, supertx, ext_inter, obmc & ext_refs
enabled the following test was failing but now passes:
"VP10/ArfFreqTestLarge.MinArfFreqTest/33"
Yi Luo [Mon, 4 Apr 2016 16:44:56 +0000 (09:44 -0700)]
Fix high bit depth mask and variance reference function
- Use arithmetic AND (&) instead of logical AND (&&) to
generate correct testing input.
- Fix variance reference function to be consistent with
our codebase implementation.
- Refer to the following issue:
https://bugs.chromium.org/p/webm/issues/detail?id=1166
Alex Converse [Tue, 22 Mar 2016 17:33:34 +0000 (10:33 -0700)]
ANS experiment: Use ANS everywhere.
Use ANS for all entropy coded data in VP10 including the compressed header and
modes and motion vectors. ANS tokens continue to be used for DCT tokens.
Decouples interintra modes and probability models from regular
intra modes, to enable creating/optimizing new interintra modes.
Also, fixes interpolation values for 128x128 interintra and obmc.
Yi Luo [Wed, 30 Mar 2016 22:40:37 +0000 (15:40 -0700)]
Fixed HBD variance unit test on incorrect bit mask operation
- Change logical AND (&&) to arithmetic AND (&).
- Disable failed unit tests for next-step fixing.
https://bugs.chromium.org/p/webm/issues/detail?id=1166
Geza Lore [Thu, 24 Mar 2016 11:20:25 +0000 (11:20 +0000)]
Rename MI_BLOCK_SIZE and MI_MASK macros.
Rename MI_BLOCK_SIZE.* -> MAX_MIB_SIZE.* (MIB is for MI Block).
Rename MI_MASK.* -> MAX_MIB_MASK.*
There are no functional changes.
This is in preparation for coding the superblock size at the frame
level, which will require some of these constants to become variables.
The new names better reflect future semantics, and hence make the code
clearer.
Some fixes/speed-ups on inter-intra part of ext-inter
Fixes an issue with rectangular inter-intra blocks.
Includes various other refactoring and cleanups to enable fast mixing
of inter and intra predictors.
Uses only the best single inter reference so far for the inter-intra
search.
About 30% speed-up with a 0.1% hit in performance.
This is part one of overhauling on the ext-inter experiment. To be
continued in subsequent patches.
Geza Lore [Mon, 7 Mar 2016 13:46:39 +0000 (13:46 +0000)]
Extend superblock size fo 128x128 pixels.
If --enable-ext-partition is used at build time, the superblock size
(sometimes also referred to as coding unit (CU) size) is extended to
128x128 pixels.
Jingning Han [Tue, 22 Mar 2016 23:38:27 +0000 (16:38 -0700)]
Rework the predicted motion vector for sub8x8 block
This commit makes the sub8x8 block to use its nearest neighbor's
motion vector as predicted motion vector for NEWMV mode. It improves
the coding performance by 0.12%.
Yunqing Wang [Fri, 25 Mar 2016 18:57:20 +0000 (11:57 -0700)]
Make set_reference control API work in VP9 and VP10
Moved the API patch from NextGen to NextGenv2 and also added this
API to VP10. An example was included. To try it, for example, run
the following command:
$ examples/vpx_cx_set_ref vp10 352 288 in.yuv out.ivf 4 30
hui su [Sat, 26 Mar 2016 00:28:15 +0000 (17:28 -0700)]
Fixes for Palette mode
This patch fixes 2 issues in Palette mode:
1. More memory is needed in PALETTE_BUFFER for 444 video format.
2. A merge issue caused by
https://chromium-review.googlesource.com/#/c/333940/7
Alex Converse [Fri, 25 Mar 2016 23:11:17 +0000 (16:11 -0700)]
Use speed 2 on superframe test.
No need to do avoid shortcuts when all we are testing is the superframe
syntax. Decreases the run time up the VP10 version of the test from 22
seconds to 3 seconds on my machine.
Yi Luo [Fri, 25 Mar 2016 23:48:19 +0000 (16:48 -0700)]
8x8/16x16 HT types V_DCT to H_FLIPADST SSE2 optimization
- Wrote function: fidtx8_sse2() and fidtx16_sse2().
- Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for new types.
- Updated 8x8/16x16 unit tests for accuracy/speed.
- Running 20K times with random numbers and getting through
tx type from V_DCT to H_FLIPADST, SSE2 speed improvement:
8x8: ~131%
16x16: ~66%
Yunqing Wang [Fri, 25 Mar 2016 16:05:25 +0000 (09:05 -0700)]
Recover tile coding performance
After porting tile coding from VP9 to VP10, some performance
degradation was seen because of the difference between VP9 and
Vp10 baseline. This patch disabled some features in VP10 while
tile coding is turned on. Also, an encoder control API was added
back for this use case.
Yi Luo [Wed, 23 Mar 2016 23:22:43 +0000 (16:22 -0700)]
4x4 hybrid transform type V_DCT to H_FLIPADST SSE2 optimization
- Added function fidtx4_sse2().
- Turned on vp10_fht4x4_sse2() for these tx types.
- Updated 4x4 unit test for speed/accuracy.
- 4x4 Unit test passed.
- Running 20K times with random numbers for tx type from
V_DCT to H_FLIPADST, SSE2 against C, speed improves ~46%.