Geza Lore [Mon, 30 May 2016 00:35:35 +0000 (01:35 +0100)]
Pre-compute and use contiguous wedge masks.
This is purely a refactoring patch and has no functional effect.
Uses of these masks can be arranged such that all input blocks are
contiguous in memory (stride == block width). In this case 1D versions
of operations can be used. 1D vector operations have superior performance
over 2D block equivalents as they are more processor cache friendly and
they can do away with a second loop overhead.
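The sketch below illustrates the idea with a masked sum of squared differences. It is a stand-in for whatever masked operation uses the wedge masks. All function names are hypothetical and not libvpx's. When every buffer is contiguous (stride == width), the 2D walk collapses into a single flat loop over w * h elements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2D form: walks a w x h block using separate strides. */
static uint64_t masked_sse_2d(const int16_t *a, int a_stride,
                              const int16_t *b, int b_stride,
                              const uint8_t *mask, int m_stride, int w, int h) {
  uint64_t sum = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int64_t d = (int64_t)a[r * a_stride + c] - b[r * b_stride + c];
      sum += (uint64_t)mask[r * m_stride + c] * (uint64_t)(d * d);
    }
  }
  return sum;
}

/* Hypothetical 1D form: valid only when all buffers are contiguous,
 * so the block is just n = w * h consecutive elements (one loop). */
static uint64_t masked_sse_1d(const int16_t *a, const int16_t *b,
                              const uint8_t *mask, int n) {
  uint64_t sum = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t d = (int64_t)a[i] - b[i];
    sum += (uint64_t)mask[i] * (uint64_t)(d * d);
  }
  return sum;
}

/* Dispatch: take the 1D path whenever stride == block width everywhere. */
static uint64_t masked_sse(const int16_t *a, int a_stride, const int16_t *b,
                           int b_stride, const uint8_t *mask, int m_stride,
                           int w, int h) {
  assert(w > 0 && h > 0);
  if (a_stride == w && b_stride == w && m_stride == w)
    return masked_sse_1d(a, b, mask, w * h);
  return masked_sse_2d(a, a_stride, b, b_stride, mask, m_stride, w, h);
}
```

Both paths compute the same value. The 1D path simply drops the inner/outer loop split and the per-row stride arithmetic, which is also what makes it easier to vectorize.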
hui su [Sat, 21 May 2016 00:40:00 +0000 (17:40 -0700)]
Add a speed feature for intra tx type search
Add a speed feature to separate the prediction mode search from the
tx type search for intra modes: first search for the best intra
prediction mode with a fixed default tx type, then choose the best
tx type for the selected mode.
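A minimal sketch of the two-stage search, assuming a hypothetical RD cost callback and plain integer mode/tx type indices. None of these names come from the codebase. The point is the search order: stage 1 fixes the tx type and scans modes, and stage 2 fixes the winning mode and scans tx types. This needs num_modes + num_tx_types - 1 cost evaluations instead of num_modes * num_tx_types.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical RD cost of coding the block with (mode, tx_type). */
typedef long (*rd_cost_fn)(int mode, int tx_type, const void *ctx);

typedef struct { int mode; int tx_type; long cost; } intra_choice;

static intra_choice two_stage_intra_search(int num_modes, int num_tx_types,
                                           int default_tx_type,
                                           rd_cost_fn cost, const void *ctx) {
  intra_choice best = { -1, default_tx_type, LONG_MAX };
  assert(num_modes > 0 && num_tx_types > 0);
  /* Stage 1: best prediction mode with the default tx type fixed. */
  for (int m = 0; m < num_modes; ++m) {
    const long c = cost(m, default_tx_type, ctx);
    if (c < best.cost) { best.mode = m; best.cost = c; }
  }
  /* Stage 2: best tx type for the selected mode only. */
  for (int t = 0; t < num_tx_types; ++t) {
    if (t == default_tx_type) continue; /* already evaluated in stage 1 */
    const long c = cost(best.mode, t, ctx);
    if (c < best.cost) { best.tx_type = t; best.cost = c; }
  }
  return best;
}

/* Toy cost for illustration only: mode 2 is cheapest, tx type 1 saves 5. */
static long toy_cost(int mode, int tx_type, const void *ctx) {
  (void)ctx;
  long c = 100 + 10 * (mode > 2 ? mode - 2 : 2 - mode);
  if (tx_type == 1) c -= 5;
  return c;
}
```

The trade-off is that a (mode, tx type) pair whose mode loses under the default tx type is never examined, which is why this is a speed feature rather than the default behavior.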
Zoe Liu [Wed, 25 May 2016 18:57:15 +0000 (11:57 -0700)]
Make the bi-predictive frame group interval adjustable
This is for the bidir-pred experiment. Previously, the length of the
bi-predictive frame group interval was fixed at 2, i.e. one
bi-predictive frame could be inserted every other frame. This patch
makes the length adjustable, i.e. any positive number may be
specified, but the backward reference is turned off if the
bi-predictive frame group interval is larger than the golden frame
group.
Further, an additional rate factor level, INTER_LOW, has been added;
it applies to LAST_BIPRED_UPDATE frames that are not used as
references.
Zoe Liu [Thu, 4 Feb 2016 17:47:46 +0000 (09:47 -0800)]
Added an experiment "bidir_pred" for backward prediction
Major parts have been implemented as follows:
(1) Added BRF_UPDATE, LASTNRF_UPDATE, and NRF_UPDATE in firstpass.c;
(2) Added the handling for the scenario of
"cpi->common.show_existing_frame == 1" at the encoder;
(3) Added a new reference frame of BWDREF_FRAME;
(4) Have bwd-ref work with upsampled references.
Note that, currently, this experiment is turned off automatically
when the "ext_refs" experiment is turned on.
RD performance in Overall PSNR has been improved, compared against the
VP10 baseline:
Yi Luo [Thu, 19 May 2016 21:13:07 +0000 (14:13 -0700)]
HBD inverse HT 8x8 and 16x16 sse4.1 optimization
- Covers tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoding speed improves ~27% on crowd_run_1080p_12.
- Merge 4x4, 8x8, 16x16 unit tests into one test file.
Geza Lore [Tue, 24 May 2016 10:26:18 +0000 (11:26 +0100)]
Remove redundant memcpy from wedge predictor.
Removing redundant calls to memcpy from
build_wedge_inter_predictor_from_buf yields a net 4% encoder speedup
with ext-inter only. The output is identical.
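A minimal sketch of the kind of change described, assuming a hypothetical mask blend (weights in 0..64) and hypothetical function names; it does not reproduce build_wedge_inter_predictor_from_buf itself. The "before" path blends into a scratch buffer and then copies each row to the destination. The "after" path blends straight into the destination with its stride, so the scratch buffer and the memcpy calls disappear while the output bytes stay the same.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SB 64 /* hypothetical maximum block dimension */

/* Hypothetical wedge blend: out = (m * p0 + (64 - m) * p1 + 32) >> 6.
 * p0, p1 and mask are contiguous (stride == w); dst has its own stride. */
static void blend_into(uint8_t *dst, int dst_stride, const uint8_t *p0,
                       const uint8_t *p1, const uint8_t *mask, int w, int h) {
  for (int r = 0; r < h; ++r)
    for (int c = 0; c < w; ++c) {
      const int i = r * w + c;
      dst[r * dst_stride + c] =
          (uint8_t)((mask[i] * p0[i] + (64 - mask[i]) * p1[i] + 32) >> 6);
    }
}

/* Before: blend into a temporary buffer, then memcpy row by row. */
static void predict_with_copy(uint8_t *dst, int dst_stride, const uint8_t *p0,
                              const uint8_t *p1, const uint8_t *mask, int w,
                              int h) {
  uint8_t tmp[MAX_SB * MAX_SB];
  assert(w <= MAX_SB && h <= MAX_SB);
  blend_into(tmp, w, p0, p1, mask, w, h);
  for (int r = 0; r < h; ++r)
    memcpy(dst + r * dst_stride, tmp + r * w, (size_t)w);
}

/* After: blend directly into the destination; no copy needed. */
static void predict_direct(uint8_t *dst, int dst_stride, const uint8_t *p0,
                           const uint8_t *p1, const uint8_t *mask, int w,
                           int h) {
  blend_into(dst, dst_stride, p0, p1, mask, w, h);
}
```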
Jingning Han [Thu, 19 May 2016 16:57:23 +0000 (09:57 -0700)]
Properly handle the filter extension in highbd setting
This commit makes the filter extension in highbd aware of the
dual filter and ext-interp experiments to prevent enc/dec mismatch
when both experiments are turned on.
Jingning Han [Tue, 17 May 2016 01:27:20 +0000 (18:27 -0700)]
Rework sub8x8 chroma component inter predictor
This commit makes the sub8x8 chroma component inter predictor
operate at the 2x2 block level. This allows one to use the actual
motion vector associated with each individual 2x2 pixel block. It
improves compression performance.
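A minimal sketch of the 2x2 chroma prediction, under several assumptions not stated in the commit. It assumes 4:2:0 subsampling, so an 8x8 luma block maps to a 4x4 chroma block. It assumes the four 4x4 luma sub-blocks are in raster order, with MVs already scaled to integer chroma pixels (sub-pixel filtering omitted). The MV type and function name are hypothetical. Chroma 2x2 block (i, j) corresponds to luma sub-block i * 2 + j and uses that sub-block's own MV.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int row, col; } MV2; /* hypothetical; integer chroma pixels */

/* Predict the 4x4 chroma block at (x, y) as four 2x2 blocks, each using the
 * MV of its co-located 4x4 luma sub-block. The caller must guarantee that
 * every referenced pixel lies inside ref. */
static void build_sub8x8_chroma_pred(uint8_t *dst, int dst_stride,
                                     const uint8_t *ref, int ref_stride,
                                     int x, int y, const MV2 mvs[4]) {
  assert(dst != 0 && ref != 0);
  for (int i = 0; i < 2; ++i) {
    for (int j = 0; j < 2; ++j) {
      const MV2 mv = mvs[i * 2 + j]; /* raster-order luma sub-block */
      for (int r = 0; r < 2; ++r) {
        for (int c = 0; c < 2; ++c) {
          const int py = y + i * 2 + r + mv.row;
          const int px = x + j * 2 + c + mv.col;
          dst[(i * 2 + r) * dst_stride + j * 2 + c] = ref[py * ref_stride + px];
        }
      }
    }
  }
}
```

With a single shared MV, all 16 chroma pixels would be displaced identically. Here each 2x2 quadrant follows the motion of its own luma sub-block.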
Geza Lore [Wed, 18 May 2016 14:29:30 +0000 (15:29 +0100)]
Fix obmc + ext-interp interference
With ext-interp, a switchable interpolation filter is coded iff the
motion vector uses fractional pixel movement (i.e., true subpixel
movement). With ext-interp and obmc enabled at the same time, the RD
search proceeds as:
1. Do motion search
2. Do interpolation filter search iff subpixel motion, otherwise use
EIGHTTAP_REGULAR
3. Evaluate obmc=0
4. Evaluate obmc=1 - This involves another motion search
If the motion search in step 4 yields an integer motion vector, while
the search in step 1 did not, then an interp_filter value other than
EIGHTTAP_REGULAR is invalid, and will cause an assertion failure
at output time, or an encoder/decoder mismatch if not built with
--enable-debug.
The fix sets the interp_filter to EIGHTTAP_REGULAR if obmc=1 is picked
with an integer motion vector.
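A minimal sketch of the fix and of the invariant it protects. The structs, enum values and function names are hypothetical placeholders. MVs are assumed to be stored in 1/8-pel units, so the low 3 bits carry the fractional part. After obmc=1 wins with an integer MV, no filter will be coded, so the stored filter is forced to EIGHTTAP_REGULAR. This keeps the write-time check from firing.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder enum values; not the real libvpx numbering. */
enum { EIGHTTAP_REGULAR = 0, EIGHTTAP_SMOOTH = 1, EIGHTTAP_SHARP = 2 };

typedef struct { int row, col; } mv_t;  /* assumed 1/8-pel units */
typedef struct { mv_t mv; int obmc; int interp_filter; } mode_info_t;

static bool mv_has_subpel(mv_t mv) {
  return (mv.row & 7) != 0 || (mv.col & 7) != 0;
}

/* The fix: obmc=1 may come with a different (integer) MV from its own motion
 * search, so reset a filter that could never be signalled for that MV. */
static void fix_interp_filter_after_obmc(mode_info_t *mi) {
  if (mi->obmc && !mv_has_subpel(mi->mv)) mi->interp_filter = EIGHTTAP_REGULAR;
}

/* The invariant checked at output time: for integer MVs no filter is coded,
 * so only EIGHTTAP_REGULAR is valid. */
static void check_before_write(const mode_info_t *mi) {
  assert(mv_has_subpel(mi->mv) || mi->interp_filter == EIGHTTAP_REGULAR);
}
```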
Yi Luo [Tue, 17 May 2016 16:54:56 +0000 (09:54 -0700)]
Fix to conform to Google's coding convention
- Confirm the input coeff buffer is 16-byte aligned.
- Prefer a variable name over a type name in sizeof().
- Fix function names (Pascal case, i.e. capital first letter).
- Put a long base class name on a new line (colon with a 4-space indent).
- Remove an unnecessary reference function variable.
- Method declarations precede variable declarations in class definitions.
Jingning Han [Sat, 7 May 2016 00:12:52 +0000 (17:12 -0700)]
Account for sub8x8 block reference filter type in prob context
If a reference block is coded with sub8x8 block size, and if it
has sub-pixel level motion vectors, its prediction filter type
should be used as context information.
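A minimal sketch of a neighbor-based filter context that follows this rule. The combination logic (equal types give that type, and a single available neighbor gives its type) is modeled on VP9's switchable-interp context. The structs and constants are hypothetical. The sketch assumes that, for a sub8x8 neighbor, "has sub-pixel motion" means any of its four sub-block MVs is fractional (MVs assumed in 1/8-pel units).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SWITCHABLE_FILTERS 3 /* placeholder: filter count, also "no info" */

typedef struct { int row, col; } mv_t; /* assumed 1/8-pel units */
typedef struct {
  bool is_inter;
  bool is_sub8x8;
  mv_t mv[4];          /* only mv[0] is used when !is_sub8x8 */
  int interp_filter;
} nb_info_t;

static bool has_subpel(mv_t mv) { return (mv.row & 7) != 0 || (mv.col & 7) != 0; }

/* Filter type a neighbor contributes, or SWITCHABLE_FILTERS if none. */
static int neighbor_filter(const nb_info_t *nb) {
  if (nb == NULL || !nb->is_inter) return SWITCHABLE_FILTERS;
  const int n = nb->is_sub8x8 ? 4 : 1;
  for (int i = 0; i < n; ++i)
    if (has_subpel(nb->mv[i])) return nb->interp_filter;
  return SWITCHABLE_FILTERS; /* integer motion only: filter carries no info */
}

static int interp_filter_ctx(const nb_info_t *left, const nb_info_t *above) {
  const int l = neighbor_filter(left);
  const int a = neighbor_filter(above);
  if (l == a) return l;
  if (l == SWITCHABLE_FILTERS) return a;
  if (a == SWITCHABLE_FILTERS) return l;
  return SWITCHABLE_FILTERS; /* neighbors disagree */
}
```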
The coding performance gains of the dual filter type coding scheme are:
lowres 0.57%
hdres 0.88%
- Integrate 5 flip transform types for each 4x4, 8x8, and 16x16
  block, for the EXT_TX experiment.
- Encoder speed improves about 12%-15%.
- Update the unit tests for bit-exact result against C.
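A flipped ADST (FLIPADST) uses the ADST basis functions in reversed sample order. It therefore equals the regular ADST applied to the reversed input. For a 2D block, that amounts to flipping the residual up-down and/or left-right before an ordinary transform. The sketch below shows only that flipping step. The function names are hypothetical, and whether the SSE code realizes the flip types exactly this way is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the row order (up-down flip) of an n x n block in place. */
static void flip_ud(int16_t *blk, int stride, int n) {
  assert(n > 0);
  for (int r = 0; r < n / 2; ++r)
    for (int c = 0; c < n; ++c) {
      const int16_t t = blk[r * stride + c];
      blk[r * stride + c] = blk[(n - 1 - r) * stride + c];
      blk[(n - 1 - r) * stride + c] = t;
    }
}

/* Reverse the column order (left-right flip) of an n x n block in place. */
static void flip_lr(int16_t *blk, int stride, int n) {
  assert(n > 0);
  for (int r = 0; r < n; ++r)
    for (int c = 0; c < n / 2; ++c) {
      const int16_t t = blk[r * stride + c];
      blk[r * stride + c] = blk[r * stride + n - 1 - c];
      blk[r * stride + n - 1 - c] = t;
    }
}
```

Flipping vertically pairs with a flipped column transform and flipping horizontally pairs with a flipped row transform. Applying both covers the case where both directions are flipped.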
This commit fixes issue 1141. The issue was triggered in multi-tile
encoding. The change properly saves and restores the block context
information in the real-time mode selection process. It removes
several redundant memcpy operations in sub8x8 intra block mode
search.
Increases the number of wedges for smaller blocks and removes the
wedge coding mode for blocks larger than 32x32.
Also adds various other enhancements for subsequent experimentation,
including adding provision for multiple smoothing functions
(though only one is currently used), adds a speed feature that decides
the sign for interinter wedges using a fast mechanism, and refactors
wedge representations.
lowres: -2.651% BDRATE
Most of the gain is due to increase in codebook size for 8x8 - 16x16.
Angie Chiang [Mon, 16 May 2016 18:12:18 +0000 (18:12 +0000)]
Merge changes I6aa75c66,Id5f0fade,I368d365e,Ibaf7b00b into nextgenv2
* changes:
Refactor and add flip unit test to vp10_inv_txfm2d_test.cc
Add flip feature to vp10_inv_txfm2d.c
add unit test for highbd flip transform
Refactor vp10_fwd_txfm2d_test.cc
Yi Luo [Fri, 13 May 2016 17:08:13 +0000 (10:08 -0700)]
HBD inverse HT 4x4 SSE4.1 optimization
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoder overall instruction count drops 2.91%.
- Decoder overall instruction count drops 1.01%.
- Add unit test to test bit-exact result against C.