Yi Luo [Tue, 14 Jun 2016 00:01:17 +0000 (17:01 -0700)]
Convolution horizontal filter SSSE3 optimization
- Apply signal direction/4-pixel vertical/8-pixel vertical
parallelism.
- Add a unit test to verify bit-exact results.
- Overall encoding time improves by ~24% on a Xeon E5-2680 CPU.
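A bit-exactness test compares a SIMD kernel against a plain C reference on the same input. The sketch below is hypothetical and is not the libvpx code. It assumes 8-tap filters whose taps sum to 128 (FILTER_BITS = 7 rounding) and a source pointer offset 3 pixels to the left of each output. It pairs a scalar reference with a variant that produces 4 rows per outer iteration, loosely mirroring the "4-pixel vertical" parallelism named above.

```c
#include <stdint.h>

/* Hypothetical sketch: taps sum to 1 << FILTER_BITS, and the source is read
 * starting TAPS / 2 - 1 = 3 pixels left of each output pixel. */
#define FILTER_BITS 7
#define TAPS 8

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* One output pixel: 8-tap dot product, rounded, then clipped to 8 bits. */
static uint8_t filter_px(const uint8_t *s, const int16_t *f) {
  int sum = 0;
  for (int k = 0; k < TAPS; ++k) sum += s[k] * f[k];
  return clip_pixel((sum + (1 << (FILTER_BITS - 1))) >> FILTER_BITS);
}

/* Scalar reference: one row, one pixel at a time. */
void convolve8_horiz_ref(const uint8_t *src, int src_stride, uint8_t *dst,
                         int dst_stride, const int16_t *filter, int w,
                         int h) {
  src -= TAPS / 2 - 1;
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x)
      dst[y * dst_stride + x] = filter_px(src + y * src_stride + x, filter);
}

/* Same arithmetic, but 4 rows are produced per outer iteration. Rows left
 * over when h % 4 != 0 are handled one at a time. */
void convolve8_horiz_4rows(const uint8_t *src, int src_stride, uint8_t *dst,
                           int dst_stride, const int16_t *filter, int w,
                           int h) {
  src -= TAPS / 2 - 1;
  int y = 0;
  for (; y + 4 <= h; y += 4)
    for (int x = 0; x < w; ++x)
      for (int r = 0; r < 4; ++r)
        dst[(y + r) * dst_stride + x] =
            filter_px(src + (y + r) * src_stride + x, filter);
  for (; y < h; ++y)
    for (int x = 0; x < w; ++x)
      dst[y * dst_stride + x] = filter_px(src + y * src_stride + x, filter);
}
```

In a real test the optimized kernel would take the place of convolve8_horiz_4rows, and every output byte would be compared against the reference over randomized inputs and filters.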
Jingning Han [Fri, 17 Jun 2016 23:23:32 +0000 (16:23 -0700)]
Fix unit test failure in obmc exp
Properly restore the rate cost in the inner search loop of obmc
prediction. This avoids unexpected encoding behavior and fixes
the unit test failure in the obmc experiment:
Zoe Liu [Wed, 8 Jun 2016 21:27:56 +0000 (14:27 -0700)]
Merge bi-predictive frames to EXT_REFS
This patch removes the BIDIR_PRED experiment and merges the feature
into the EXT_REFS experiment:
(1) Each frame now has up to 6 reference frames, namely
LAST_FRAME, LAST2_FRAME, LAST3_FRAME, GOLDEN_FRAME (forward) and
BWDREF_FRAME, ALTREF_FRAME (backward);
LAST4_FRAME has been removed;
(2) The first pass still keeps the 8 update types:
KF_UPDATE, LF_UPDATE, GF_UPDATE, ARF_UPDATE, OVERLAY_UPDATE,
BRF_UPDATE, LAST_BIPRED_UPDATE, and BI_PRED_UPDATE;
(3) show_existing_frame==1 is supported in the EXT_REFS experiment;
(4) New encoding modes are added for both single-reference and compound
cases, through the use of the 2 extra forward references (LAST2 & LAST3)
and the 1 extra backward reference (BWDREF).
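The reference set and update types listed above can be summarized as C enums. This is a hypothetical sketch: the names come from the commit message, but the numeric values, the INTRA_FRAME slot, the ordering of the update types and the helper functions are assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical sketch of the EXT_REFS reference set. The numeric values
 * and the INTRA_FRAME slot are assumptions. */
typedef enum {
  INTRA_FRAME = 0,
  LAST_FRAME,   /* forward */
  LAST2_FRAME,  /* forward, extra */
  LAST3_FRAME,  /* forward, extra */
  GOLDEN_FRAME, /* forward */
  BWDREF_FRAME, /* backward, extra */
  ALTREF_FRAME  /* backward */
} ref_frame_t;

/* Number of inter references: 6, now that LAST4_FRAME is gone. */
#define NUM_INTER_REFS (ALTREF_FRAME - LAST_FRAME + 1)

/* The 8 first-pass update types kept by the patch; order is assumed. */
typedef enum {
  KF_UPDATE, LF_UPDATE, GF_UPDATE, ARF_UPDATE, OVERLAY_UPDATE,
  BRF_UPDATE, LAST_BIPRED_UPDATE, BI_PRED_UPDATE,
  FRAME_UPDATE_TYPES /* count of the values above */
} frame_update_t;

/* References that follow the current frame in display order. */
bool is_backward_ref(ref_frame_t ref) {
  return ref == BWDREF_FRAME || ref == ALTREF_FRAME;
}

/* References added on top of LAST/GOLDEN/ALTREF; these are what enable
 * the new single-reference and compound modes in item (4). */
bool is_extra_ref(ref_frame_t ref) {
  return ref == LAST2_FRAME || ref == LAST3_FRAME || ref == BWDREF_FRAME;
}
```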
RD performance, using Overall PSNR (Avg/BDRate):
          Bipred only      Prev EXT_REFS    Current EXT_REFS with bipred
lowres:   -3.474/-3.324    -1.748/-1.586    -4.613/-4.387
derflr:   -2.097/-1.353    -1.439/-1.215    -3.120/-2.252
midres:   -2.129/-1.901    -1.345/-1.185    -2.898/-2.636
If BFG_INTERVAL in vp10/encoder/firstpass.h is changed from 2 to 3,
i.e. 2 bi-predictive frames are used instead of 1, a further
improvement may be obtained:
          Current EXT_REFS with bipred
          1 bi-predictive frame    2 bi-predictive frames
lowres:   -4.613/-4.387            -4.675/-4.465
derflr:   -3.120/-2.252            -3.333/-2.516
midres:   -2.898/-2.636            -3.406/-3.095
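The tuning knob above is a single macro. In the sketch below, only the BFG_INTERVAL name, its default of 2 and the file path come from the commit message. The NUM_BIPRED_FRAMES helper is hypothetical; it encodes "interval minus one", which is consistent with the two configurations reported (2 gives 1 bi-predictive frame, 3 gives 2).

```c
/* vp10/encoder/firstpass.h (sketch): default interval of 2 gives 1
 * bi-predictive frame; raising it to 3 gives 2. */
#define BFG_INTERVAL 2

/* Hypothetical helper, not a libvpx macro: inferred from the two
 * configurations reported above. */
#define NUM_BIPRED_FRAMES(interval) ((interval) - 1)
```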
Geza Lore [Fri, 17 Jun 2016 10:28:02 +0000 (11:28 +0100)]
Make variance based partitioning compatible with SEG_LVL_SKIP
Inter blocks that have SEG_LVL_SKIP active must be at least 8x8 in
size for bitstream conformance (see read_inter_block_mode_info in
decodemv.c).
This patch makes the variance based partitioning scheme stop at 8x8
blocks in inter frames. This satisfies the SEG_LVL_SKIP constraint
and is more in line with the original implementation of this function
(before it got extended for 128x128 superblocks).
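The constraint can be pictured as a floor on the quadtree split. This is a hypothetical sketch, not the vp10 choose_partitioning code. The context struct, the single-threshold split test and the 4x4 floor for intra frames are assumptions. The only rule taken from the commit is that inter frames never split below 8x8.

```c
#include <stdbool.h>

/* Hypothetical context: one variance threshold for all block sizes. */
typedef struct {
  bool is_inter_frame;
  unsigned int var_threshold;
} part_ctx_t;

/* Smallest block size (in pixels) allowed. 8x8 for inter frames per the
 * SEG_LVL_SKIP constraint; 4x4 for intra frames is an assumption. */
int min_partition_size(const part_ctx_t *ctx) {
  return ctx->is_inter_frame ? 8 : 4;
}

/* Split a bsize x bsize block into quadrants only if the children would
 * still be at or above the floor and the variance is high enough. */
bool should_split(const part_ctx_t *ctx, int bsize, unsigned int variance) {
  if (bsize / 2 < min_partition_size(ctx)) return false;
  return variance > ctx->var_threshold;
}

/* Follow splits down one branch (same variance at every level) and return
 * the block size at which splitting stops. */
int smallest_leaf(const part_ctx_t *ctx, int bsize, unsigned int variance) {
  if (!should_split(ctx, bsize, variance)) return bsize;
  return smallest_leaf(ctx, bsize / 2, variance);
}
```

Even with very high variance, an inter frame starting from a 128x128 superblock bottoms out at 8x8 blocks, while an intra frame can continue to 4x4.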
Jingning Han [Thu, 16 Jun 2016 22:18:46 +0000 (15:18 -0700)]
Skip restore token_cache value
The trellis optimization proceeds backward. Hence there is no need
to restore the token_cache values that are behind the current node
in the scan order.
Zoe Liu [Thu, 16 Jun 2016 16:41:30 +0000 (09:41 -0700)]
Disable the unit test of ArfFreq for BIDIR_PRED
The test in arf_freq assumes that any no-show frame is an ALTREF_FRAME
and then calculates the minimum run between two consecutive
ALTREF_FRAMEs based on this assumption. Since BWDREF_FRAME is also a
no-show frame, and the minimum run between two consecutive
BWDREF_FRAMEs may be anywhere from 1 up to the golden frame group
interval, this test does not apply to the BIDIR_PRED experiment.
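The broken assumption is easy to reproduce. The hypothetical sketch below is not the actual test code. It treats every no-show frame as an ALTREF_FRAME and returns the minimum distance between consecutive ones. Once hidden BWDREF_FRAMEs are mixed in, the result reflects BWDREF spacing rather than ALTREF frequency.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical sketch of the ArfFreq measurement: every no-show frame is
 * assumed to be an ALTREF_FRAME. Returns the minimum distance in frames
 * between consecutive no-show frames, or INT_MAX if there are fewer than
 * two. */
int min_run_between_noshow(const bool *shown, int n) {
  int last = -1, min_run = INT_MAX;
  for (int i = 0; i < n; ++i) {
    if (shown[i]) continue;
    if (last >= 0 && i - last < min_run) min_run = i - last;
    last = i;
  }
  return min_run;
}
```

With hidden ALTREFs every 4 frames the function reports 4. Adding hidden BWDREFs at every other frame drops it to 2 even though the ALTREF spacing is unchanged.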
Geza Lore [Thu, 16 Jun 2016 16:13:55 +0000 (17:13 +0100)]
Change supertx syntax order.
Move the supertx skip bit and transform type past the recursive
prediction blocks. This is in preparation for using the segment level
skip feature for supertx blocks.
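The new ordering can be illustrated by recording symbols instead of entropy coding them. Everything in this sketch is hypothetical: the writer struct, the symbol names and the function names are illustrative and are not vp10 bitstream functions. It only shows that the supertx skip bit and transform type now follow the recursively written prediction blocks.

```c
#include <string.h>

/* Hypothetical symbol recorder standing in for the bitstream writer. */
typedef struct {
  char log[16][24];
  int n;
} sym_log_t;

static void emit(sym_log_t *w, const char *sym) {
  if (w->n < 16) {
    strncpy(w->log[w->n], sym, sizeof(w->log[0]) - 1);
    w->log[w->n][sizeof(w->log[0]) - 1] = '\0';
    w->n++;
  }
}

/* Recursively write prediction info for a block split `depth` more times. */
static void write_pred_blocks(sym_log_t *w, int depth) {
  if (depth == 0) {
    emit(w, "pred");
    return;
  }
  emit(w, "partition");
  for (int i = 0; i < 4; ++i) write_pred_blocks(w, depth - 1);
}

/* New order: prediction blocks first, then the supertx skip bit and the
 * transform type. */
void write_supertx_block(sym_log_t *w, int depth) {
  write_pred_blocks(w, depth);
  emit(w, "supertx_skip");
  emit(w, "supertx_tx_type");
}
```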
Geza Lore [Thu, 16 Jun 2016 10:08:14 +0000 (11:08 +0100)]
Use correct size load in vpx_avg_4x4_sse2.
The old version used 64-bit loads and then ignored the top half
of the result. This can cause ASan failures if we read past the end
of a buffer. Switched to 32-bit loads instead.
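The fix can be sketched with SSE2 intrinsics. This is a hypothetical reimplementation, not the libvpx source. It computes the rounded 4x4 average (sum + 8) >> 4 while fetching exactly 4 bytes per row through _mm_cvtsi32_si128 instead of the 8 bytes an _mm_loadl_epi64 would read. A scalar fallback is used when __SSE2__ is not defined.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>

/* 4-byte (32-bit) load; memcpy avoids alignment and aliasing issues. */
static int load_u32(const uint8_t *p) {
  int32_t v;
  memcpy(&v, p, sizeof(v));
  return (int)v;
}
#endif

/* Hypothetical sketch: rounded average of a 4x4 block of 8-bit pixels,
 * reading only 4 bytes per row. */
unsigned int avg_4x4_32bit_loads(const uint8_t *src, int stride) {
  unsigned int sum;
#if defined(__SSE2__)
  const __m128i zero = _mm_setzero_si128();
  /* Each row goes into the low 32 bits of its own register. */
  const __m128i r0 = _mm_cvtsi32_si128(load_u32(src + 0 * stride));
  const __m128i r1 = _mm_cvtsi32_si128(load_u32(src + 1 * stride));
  const __m128i r2 = _mm_cvtsi32_si128(load_u32(src + 2 * stride));
  const __m128i r3 = _mm_cvtsi32_si128(load_u32(src + 3 * stride));
  /* Rows 0-1 fill the low 8 bytes, rows 2-3 the high 8 bytes. */
  const __m128i rows = _mm_unpacklo_epi64(_mm_unpacklo_epi32(r0, r1),
                                          _mm_unpacklo_epi32(r2, r3));
  /* SAD against zero sums each 8-byte half into its 64-bit lane. */
  const __m128i sad = _mm_sad_epu8(rows, zero);
  sum = (unsigned int)(_mm_cvtsi128_si32(sad) +
                       _mm_cvtsi128_si32(_mm_srli_si128(sad, 8)));
#else
  sum = 0;
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j) sum += src[i * stride + j];
#endif
  return (sum + 8) >> 4;
}
```

With a stride of 8, a block whose last row starts 4 bytes before the end of the buffer is read safely. A 64-bit load of that row would touch 4 bytes past the end, which is the kind of read ASan reports.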
Jingning Han [Fri, 10 Jun 2016 03:44:00 +0000 (20:44 -0700)]
Refactor trellis optimization process
This commit refactors the trellis coefficient optimization process.
It saves multiplications used to generate the final dequantized
coefficients. It also removes two memset operations on the quantized
and dequantized coefficient sets.
On average, the trellis coefficient optimization now runs more than
10% faster.
Jingning Han [Thu, 9 Jun 2016 18:20:26 +0000 (11:20 -0700)]
Rework transform quantization pipeline
This commit reworks the transform and quantization unit. It enables
the use of adaptive quantization for intra modes. This further
improves the compression performance:
lowres 0.36%
midres 0.79%
hdres 0.73%
The key frame coding performance is improved:
lowres 1.7%
midres 1.9%
hdres 3.3%
Geza Lore [Tue, 14 Jun 2016 12:41:20 +0000 (13:41 +0100)]
Disable loop restoration when LPF_PICK_MINIMAL_LPF.
The speed feature sf->lpf_pick == LPF_PICK_MINIMAL_LPF is used
to disable loop filtering. Previously this setting was not respected
by the loop-restoration experiment; it now is.
Note that this speed feature is only used in real-time cpu-used >= 8
settings.
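The intended behavior can be sketched as a single decision point. The sf->lpf_pick field and LPF_PICK_MINIMAL_LPF value come from the commit message. The enum's other members, the decision struct and the pick_filter function are hypothetical, as is the assumption that restoration stays enabled in the non-minimal modes.

```c
#include <stdbool.h>

/* Hypothetical subset of the loop-filter pick methods. */
typedef enum {
  LPF_PICK_FROM_FULL_IMAGE,
  LPF_PICK_FROM_Q,
  LPF_PICK_MINIMAL_LPF
} lpf_pick_method_t;

typedef struct {
  lpf_pick_method_t lpf_pick;
} speed_features_t;

typedef struct {
  int filter_level;     /* 0 disables the loop filter */
  bool use_restoration; /* loop-restoration experiment on/off */
} lf_decision_t;

/* Hypothetical decision: in minimal mode, both the loop filter and loop
 * restoration are turned off; otherwise the searched level is used. */
lf_decision_t pick_filter(const speed_features_t *sf, int searched_level) {
  lf_decision_t d;
  if (sf->lpf_pick == LPF_PICK_MINIMAL_LPF) {
    d.filter_level = 0;
    d.use_restoration = false;
  } else {
    d.filter_level = searched_level;
    d.use_restoration = true;
  }
  return d;
}
```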
Geza Lore [Tue, 14 Jun 2016 12:12:45 +0000 (13:12 +0100)]
Remove magic number from traversal (CYCLIC_REFRESH_AQ).
mi->stride now depends on the maximum superblock size, so the
constant padding of 8 is no longer appropriate. Traverse the array
using mi->stride instead.
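Indexing rows through the stored stride keeps the traversal correct however much padding the stride includes. In this hypothetical sketch, the struct, field names and padding amount are assumptions. Only the idea of using the stride instead of a fixed "cols + 8" comes from the commit.

```c
#include <stdint.h>

/* Hypothetical mode-info grid: each row holds mi_cols real entries
 * followed by padding, for mi_stride entries in total. */
typedef struct {
  int mi_rows, mi_cols;
  int mi_stride;
  const int8_t *seg_map; /* mi_rows * mi_stride entries */
} mi_grid_t;

/* Count blocks whose segment id equals `seg`, visiting only real columns.
 * Row starts come from mi_stride, never from mi_cols + 8. */
int count_segment(const mi_grid_t *g, int seg) {
  int count = 0;
  for (int r = 0; r < g->mi_rows; ++r) {
    const int8_t *row = g->seg_map + r * g->mi_stride;
    for (int c = 0; c < g->mi_cols; ++c) count += (row[c] == seg);
  }
  return count;
}
```

With 3 real columns and a stride of 19, a traversal using 3 + 8 = 11 would start row 1 inside row 0's padding and miscount.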
Geza Lore [Fri, 10 Jun 2016 08:32:21 +0000 (09:32 +0100)]
Select segment based loopfilter strength for supertx blocks.
Segment based loopfilter strength for supertx coded blocks is now
selected based on the minimum of all segment IDs within a supertx
coded block (same as the quantiser settings).
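The selection rule reduces to a minimum over the covered segment map entries. This is a hypothetical sketch. The row-major segment map with a stride and the per-segment level table are assumptions standing in for the real segment feature lookup.

```c
#include <stdint.h>

/* Hypothetical: minimum segment id over the bh x bw mode-info units
 * covered by a supertx block at (mi_row, mi_col). */
int supertx_lf_segment(const uint8_t *seg_map, int stride, int mi_row,
                       int mi_col, int bh, int bw) {
  int min_id = seg_map[mi_row * stride + mi_col];
  for (int r = 0; r < bh; ++r)
    for (int c = 0; c < bw; ++c) {
      const int id = seg_map[(mi_row + r) * stride + mi_col + c];
      if (id < min_id) min_id = id;
    }
  return min_id;
}

/* Loop-filter level for the whole supertx block, looked up from a
 * hypothetical per-segment table. */
int supertx_lf_level(const uint8_t *seg_map, int stride, int mi_row,
                     int mi_col, int bh, int bw, const int *lf_per_segment) {
  return lf_per_segment[supertx_lf_segment(seg_map, stride, mi_row, mi_col,
                                           bh, bw)];
}
```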
Geza Lore [Thu, 9 Jun 2016 14:12:27 +0000 (15:12 +0100)]
Refactor variance aq.
Explicitly signal when the segment map is being refreshed when
using VARIANCE_AQ. This simplifies decisions about whether the segment
id needs to be set from the previous segment map or based on the
current variance.
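With an explicit refresh flag, the per-block decision becomes a simple branch. This is a hypothetical sketch. The frame struct, the function names and the variance thresholds are illustrative assumptions and are not the values VARIANCE_AQ uses.

```c
#include <stdint.h>

/* Hypothetical per-frame state: the refresh flag is signalled explicitly. */
typedef struct {
  int refresh_segment_map;
  const uint8_t *prev_map;
} var_aq_frame_t;

/* Illustrative mapping from block variance to one of 4 segments. */
static int segment_from_variance(unsigned int variance) {
  static const unsigned int thresholds[] = {64, 256, 1024};
  int seg = 0;
  while (seg < 3 && variance >= thresholds[seg]) ++seg;
  return seg;
}

/* Refresh: derive the id from the current variance. Otherwise: copy the
 * id from the previous segment map. */
int choose_segment_id(const var_aq_frame_t *f, int idx,
                      unsigned int variance) {
  if (f->refresh_segment_map) return segment_from_variance(variance);
  return f->prev_map[idx];
}
```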
Jingning Han [Mon, 13 Jun 2016 19:08:14 +0000 (12:08 -0700)]
Make tx_type speed feature default
The compression performance and complexity trade-off will be
revisited once SIMD versions of the trellis optimization are
available. Until then, temporarily reduce the number of
transform-quantization function calls. This causes about a 0.3%
performance drop on the lowres set.