Yi Luo [Sat, 25 Jun 2016 00:29:21 +0000 (17:29 -0700)]
Fix bugs in convolution filter optimization
- Fix the over-writing bug in horizontal filtering when width = 2.
- Fix 10-tap vertical filtering so that it no longer reads one row of
pixels above the block.
- Fix 10-tap filter zero padding.
- Encoder speed slows down by ~4.0% compared to 81ad953 (Convolution
vertical filter SSSE3 optimization).
Yi Luo [Thu, 23 Jun 2016 21:31:26 +0000 (14:31 -0700)]
Change register loading to fix stack overflow issue
- Use _mm_loadl_epi64 instead of _mm_loadu_si128 for the
uint16_t temp2[4 * 4] buffer.
- Refer to: d0de89a remove vpx_highbd_1[02]_sub_pixel_variance4x4_sse4_1
BUG=webm:1242
Jingning Han [Thu, 23 Jun 2016 19:15:17 +0000 (12:15 -0700)]
Make recursive txfm partitioning use uniform quantizer
Replace the expanded zero-bin quantizer with uniform quantizer in
the recursive transform block partitioning scheme. This improves
the compression performance by 0.4% for lowres.
Yi Luo [Tue, 21 Jun 2016 19:17:39 +0000 (12:17 -0700)]
Convolution vertical filter SSSE3 optimization
- Apply 8-pixel vertical filtering direction parallelism.
- Add unit tests to verify bit-exact results.
- Encoder speed improves by ~29% (with EXT_INTERP enabled) on a Xeon E5-2680.
- The combined cycle-count share of vp10_convolve() drops from 26.06%
to 6.73%.
Jingning Han [Tue, 21 Jun 2016 22:06:40 +0000 (15:06 -0700)]
Make drl support bi-directional reference frames
This commit refactors the reference frame structure used in the
dynamic motion vector referencing system, and makes it support
the bi-directional reference frames. This resolves the unit test
failure (encoder/decoder mismatch) when both experiments are turned on.
The compression performance (ref-mv + ext-refs) is improved by
0.2% for lowres.
Yi Luo [Tue, 14 Jun 2016 00:01:17 +0000 (17:01 -0700)]
Convolution horizontal filter SSSE3 optimization
- Apply signal direction/4-pixel vertical/8-pixel vertical
parallelism.
- Add a unit test to verify bit-exact results.
- Overall encoding time improves ~24% on Xeon E5-2680 CPU.
Jingning Han [Fri, 17 Jun 2016 23:23:32 +0000 (16:23 -0700)]
Fix unit test failure in obmc exp
Properly restore the rate cost in the inner search loop of obmc
prediction. This avoids unexpected encoding behavior. It fixes
the unit test failure in the obmc experiment.
Zoe Liu [Wed, 8 Jun 2016 21:27:56 +0000 (14:27 -0700)]
Merge bi-predictive frames to EXT_REFS
This patch removed the BIDIR_PRED experiment and merged the feature
into the EXT_REFS experiment:
(1) Each frame now has up to 6 reference frames, namely
LAST_FRAME, LAST2_FRAME, LAST3_FRAME and GOLDEN_FRAME (forward), and
BWDREF_FRAME and ALTREF_FRAME (backward);
LAST4_FRAME has been removed;
(2) The first pass still keeps the 8 update types:
KF_UPDATE, LF_UPDATE, GF_UPDATE, ARF_UPDATE, OVERLAY_UPDATE, and
BRF_UPDATE, LAST_BIPRED_UPDATE, BI_PRED_UPDATE;
(3) show_existing_frame==1 is supported in the EXT_REFS experiment;
(4) New encoding modes are added for both single-ref and compound cases,
through the use of the 2 extra forward references (LAST2 & LAST3)
and the 1 extra backward reference (BWDREF).
RD performance, using Overall PSNR (Avg/BDRate):
          Bipred only      Prev EXT_REFS    Current EXT_REFS with bipred
lowres:   -3.474/-3.324    -1.748/-1.586    -4.613/-4.387
derflr:   -2.097/-1.353    -1.439/-1.215    -3.120/-2.252
midres:   -2.129/-1.901    -1.345/-1.185    -2.898/-2.636
If BFG_INTERVAL in vp10/encoder/firstpass.h is changed from 2 to 3, i.e.
2 bi-predictive frames are used instead of 1, a further improvement may be
obtained:
          Current EXT_REFS with bipred
          1 bi-predictive frame    2 bi-predictive frames
lowres:   -4.613/-4.387            -4.675/-4.465
derflr:   -3.120/-2.252            -3.333/-2.516
midres:   -2.898/-2.636            -3.406/-3.095
Geza Lore [Fri, 17 Jun 2016 10:28:02 +0000 (11:28 +0100)]
Make variance based partitioning compatible with SEG_LVL_SKIP
Inter blocks that have SEG_LVL_SKIP active must be at least 8x8 in
size for bitstream conformance (see read_inter_block_mode_info in
decodemv.c).
This patch makes the variance based partitioning scheme stop at 8x8
blocks in inter frames. This satisfies the SEG_LVL_SKIP constraint
and is more in line with the original implementation of this function
(before it got extended for 128x128 superblocks).
Jingning Han [Thu, 16 Jun 2016 22:18:46 +0000 (15:18 -0700)]
Skip restore token_cache value
The trellis optimization proceeds backward, so there is no need
to restore the token_cache values that are behind the current node
in the scan order.
Zoe Liu [Thu, 16 Jun 2016 16:41:30 +0000 (09:41 -0700)]
Disable the unit test of ArfFreq for BIDIR_PRED
The test in arf_freq assumes that any no-show frame is an ALTREF_FRAME
and then calculates the minimum run between two consecutive ALTREF_FRAMEs
based on this assumption. As BWDREF_FRAME is also a no-show frame, and
the minimum run between two consecutive BWDREF_FRAMEs may be anything
from 1 up to the golden frame group interval, this test does not apply
to the BIDIR_PRED experiment.
Geza Lore [Thu, 16 Jun 2016 16:13:55 +0000 (17:13 +0100)]
Change supertx syntax order.
Move the supertx skip bit and transform type past the recursive
prediction blocks. This is in preparation for using the segment level
skip feature for supertx blocks.
Geza Lore [Thu, 16 Jun 2016 10:08:14 +0000 (11:08 +0100)]
Use correct size load in vpx_avg_4x4_sse2.
The old version used 64-bit loads and then ignored the top half
of the result. This can cause ASan failures when a load reads past
the end of a buffer. Switched to using 32-bit loads instead.