Refactoring in preparation for OBMC optimizations.
- Use int32_t instead of int in vpx_obmc{variance,sad} functions
- Remove weigthed_src and obmc mask strides and assume contiguous
buffers. These inputs can always be packed as contiguous arrays.
Wei-ting Lin [Thu, 30 Jun 2016 20:33:55 +0000 (13:33 -0700)]
Remove reference frame buffer update for show_exsiting_frame
Originally we need to send the refresh flag and the virtual indices
mapping for the reference frame buffer update for show_existing_frame to
have the BWDREF_FRAME replace the LAST_FRAME.
To remove sending this information, we update the the virtual indices
of the reference frame buffer after the last_bipred_frame is encoded,
and therefore the decoder will receive the updated reference mapping
at the next non-show-existing frame.
As a result, we can save 4 bytes per show-existing frame, and get 0.12,
0.2, and 0.07 BDRATE improvement in lowres, derf, and midref test set
respectively.
Yi Luo [Sat, 25 Jun 2016 00:29:21 +0000 (17:29 -0700)]
Fix bugs in convolution filter optimization
- Fix the over-writing bug in horizontal filtering as width = 2.
- Fix 10-tap vertical filtering which no longer reads one row of
pixel above the block.
- Fix 10-tap filter zero padding.
- Encoder speed slow down ~4.0%, compared to, 81ad953 Convolution vertical filter SSSE3 optimization
Yi Luo [Thu, 23 Jun 2016 21:31:26 +0000 (14:31 -0700)]
Change register loading to fix stack overflow issue
- Use _mm_loadl_epi64 instead of _mm_loadu_si128 for
uint16_t temp2[4 * 4] buffer.
- Refer to: d0de89a remove vpx_highbd_1[02]_sub_pixel_variance4x4_sse4_1
BUG=webm:1242
Jingning Han [Thu, 23 Jun 2016 19:15:17 +0000 (12:15 -0700)]
Make recursive txfm partitioning use uniform quantizer
Replace the expanded zero-bin quantizer with uniform quantizer in
the recursive transform block partitioning scheme. This improves
the compression performance by 0.4% for lowres.
Yi Luo [Tue, 21 Jun 2016 19:17:39 +0000 (12:17 -0700)]
Convolution vertical filter SSSE3 optimization
- Apply 8-pixel vertical filtering direction parallelism.
- Add unit tests to verify bit exact.
- Encoder speed improves ~29% (enable EXT_INTERP) on Xeon E5-2680.
- Combinational cycle count of vp10_convolve() drops from 26.06%
to 6.73%.
Jingning Han [Tue, 21 Jun 2016 22:06:40 +0000 (15:06 -0700)]
Make drl support bi-directional reference frames
This commit refactors the reference frame structure used in the
dynamic motion vector referencing system, and makes it support
the bi-directional reference frames. This resolves unit test
failure (enc/dec mismatch) when both are turned on.
The compression performance (ref-mv + ext-refs) is improved by
0.2% for lowres.
Yi Luo [Tue, 14 Jun 2016 00:01:17 +0000 (17:01 -0700)]
Convolution horizontal filter SSSE3 optimization
- Apply signal direction/4-pixel vertical/8-pixel vertical
parallelism.
- Add unit test to verify the bit exact result.
- Overall encoding time improves ~24% on Xeon E5-2680 CPU.
Jingning Han [Fri, 17 Jun 2016 23:23:32 +0000 (16:23 -0700)]
Fix unit test failure in obmc exp
Properly restore the rate cost in the inner search loop of obmc
prediction. This avoids unexpected encoding behavior. It fixes
the unit test failure in obmc experiment:
Zoe Liu [Wed, 8 Jun 2016 21:27:56 +0000 (14:27 -0700)]
Merge bi-predictive frames to EXT_REFS
This patch removed the experiment of BIDIR_PRED and merged the feature
into the experiment of EXT_REFS:
(1) Each frame now has up to 6 reference frames, namely
LAST_FRAME, LAST2_FRAME, LAST3_FRAME, GOLDEN_FRAME, (forward) and
BWDREF_FRAME, ALTREF_FRAME (backward);
LAST4_FRAME has been removed;
(2) First pass still keeps the 8 updates:
KF_UPDATE, LF_UPDATE, GF_UPDATE, ARF_UPDATE, OVERLAY_UPDATE, and
BRF_UPDATE, LAST_BIPRED_UPDATE, BI_PRED_UPDATE;
(3) show_existing_frame==1 is supported in the experiment of EXT_REFS;
(4) New encoding modes are added for both single-ref and compound cases,
through the use of the 2 extra forward references (LAST2 & LAST3)
and the 1 extra backward reference (BWDREF).
RD performance wise, using Overall PSNR: Avg/BDRate
Bipred only Prev EXT_REFS Current EXT_REFS with bipred
lowres: -3.474/-3.324 -1.748/-1.586 -4.613/-4.387
derflr: -2.097/-1.353 -1.439/-1.215 -3.120/-2.252
midres: -2.129/-1.901 -1.345/-1.185 -2.898/-2.636
If in vp10/encoder/firstpass.h, change BFG_INTERVAL from 2 to 3, i.e. to
use 2 bi-predictive frames than 1, a further improvement may be
obtained:
Current EXT_REFS with bipred
1 bi-predictive frame 2 bi-predictive frames
lowres: -4.613/-4.387 -4.675/-4.465
derflr: -3.120/-2.252 -3.333/-2.516
midres: -2.898/-2.636 -3.406/-3.095
Geza Lore [Fri, 17 Jun 2016 10:28:02 +0000 (11:28 +0100)]
Make variance based partitioning compatible with SEG_LVL_SKIP
Inter blocks that have SEG_LVL_SKIP active must be at least 8x8 in
size for bitstream conformance (see read_inter_block_mode_info in
decodemv.c).
This patch makes the variance based partitioning scheme stop at 8x8
blocks in inter frames. This satisfies the SEG_LVL_SKIP constraint
and is more in line with the original implementation of this function
(before it got extended for 128x128 superblocks).
Jingning Han [Thu, 16 Jun 2016 22:18:46 +0000 (15:18 -0700)]
Skip restore token_cache value
The trellis optimization is going backward. Hence there is no need
to restore the token_cache values that is behind the current node
in the scan order.