Zoe Liu [Thu, 16 Jun 2016 16:41:30 +0000 (09:41 -0700)]
Disable the unit test of ArfFreq for BIDIR_PRED
The test in arf_freq assumes that any no-show frame is an ALTREF_FRAME and
then calculates the minimum run between two consecutive ALTREF_FRAMEs
based on this assumption. However, a BWDREF_FRAME is also a no-show frame,
and the minimum run between two consecutive BWDREF_FRAMEs may be any
positive number up to the golden frame group interval. The test therefore
does not apply to the BIDIR_PRED experiment.
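To illustrate the kind of check described above, here is a minimal sketch of
computing the minimum run between consecutive no-show frames. It is not the
actual arf_freq test code: the function name min_no_show_run and the
show_frame array representation are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: show_frame[i] is nonzero when frame i is displayed.
 * Returns the smallest distance between two consecutive no-show frames,
 * or INT_MAX when fewer than two no-show frames exist. */
static int min_no_show_run(const int *show_frame, size_t n) {
  int min_run = INT_MAX;
  long prev = -1; /* index of the previous no-show frame, -1 if none yet */
  for (size_t i = 0; i < n; ++i) {
    if (show_frame[i]) continue;
    if (prev >= 0) {
      const int run = (int)((long)i - prev);
      if (run < min_run) min_run = run;
    }
    prev = (long)i;
  }
  return min_run;
}
```

With only ALTREF_FRAMEs hidden, the result is bounded below by the configured
minimum ARF interval. Once BWDREF_FRAMEs are hidden as well, the run can drop
to 1, so a lower-bound check of this form no longer holds.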
Geza Lore [Thu, 16 Jun 2016 16:13:55 +0000 (17:13 +0100)]
Change supertx syntax order.
Move the supertx skip bit and transform type past the recursive
prediction blocks. This is in preparation for using the segment level
skip feature for supertx blocks.
Geza Lore [Thu, 16 Jun 2016 10:08:14 +0000 (11:08 +0100)]
Use correct size load in vpx_avg_4x4_sse2.
The old version used 64-bit loads and then ignored the top half
of the result. This can cause asan failures if we read past the end
of a buffer. Switched to using 32-bit loads instead.
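The idea can be shown with a scalar sketch that reads exactly 4 bytes per
row. This is not the SSE2 implementation: the function name avg_4x4_sketch is
hypothetical, and the round-to-nearest step is an assumption about the
reference behaviour.

```c
#include <stdint.h>
#include <string.h>

/* Scalar sketch of a 4x4 average that touches only 4 bytes per row, so it
 * never reads past the end of a tightly sized buffer. */
static unsigned int avg_4x4_sketch(const uint8_t *s, int stride) {
  unsigned int sum = 0;
  for (int r = 0; r < 4; ++r) {
    uint8_t row[4];
    memcpy(row, s + r * stride, sizeof(row)); /* 32-bit load, no overread */
    sum += row[0] + row[1] + row[2] + row[3];
  }
  return (sum + 8) >> 4; /* assumed rounding: nearest integer of sum / 16 */
}
```

A 64-bit load on the last row of a 4-wide buffer would read 4 bytes beyond
the final pixel, which is the overread asan reports.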
Jingning Han [Fri, 10 Jun 2016 03:44:00 +0000 (20:44 -0700)]
Refactor trellis optimization process
This commit refactors the trellis coefficient optimization process.
It saves multiplications used to generate the final dequantized
coefficients. It also removes two memset operations on quantized
and dequantized coefficient sets.
The trellis coefficient optimization now runs over 10% faster on
average.
Jingning Han [Thu, 9 Jun 2016 18:20:26 +0000 (11:20 -0700)]
Rework transform quantization pipeline
This commit reworks the transform and quantization unit. It enables
the use of adaptive quantization for intra modes. This further
improves the compression performance:
lowres 0.36%
midres 0.79%
hdres 0.73%
The key frame coding performance is improved:
lowres 1.7%
midres 1.9%
hdres 3.3%
Geza Lore [Tue, 14 Jun 2016 12:41:20 +0000 (13:41 +0100)]
Disable loop restoration when LPF_PICK_MINIMAL_LPF.
The speed feature sf->lpf_pick == LPF_PICK_MINIMAL_LPF is used
to disable loop filtering. This did not work with the loop-restoration
experiment, but now it is respected.
Note that this speed feature is only used in real-time cpu-used >= 8
settings.
Geza Lore [Tue, 14 Jun 2016 12:12:45 +0000 (13:12 +0100)]
Remove magic number from traversal (CYCLIC_REFRESH_AQ).
mi->stride now depends on the maximum superblock size, and hence
the constant 8 padding is no longer appropriate. Traverse the array
using mi->stride instead.
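A minimal sketch of stride-based traversal follows. The function sum_map and
its parameters are hypothetical and stand in for the real cyclic refresh
code; the point is only that the row step comes from the stride rather than
from cols plus a fixed padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: sum a rows x cols region of a map laid out with `stride`
 * entries per row. Entries beyond `cols` in each row are padding. */
static int sum_map(const unsigned char *map, int rows, int cols, int stride) {
  assert(stride >= cols);
  int total = 0;
  for (int r = 0; r < rows; ++r) {
    const unsigned char *row = map + (size_t)r * (size_t)stride;
    for (int c = 0; c < cols; ++c) total += row[c];
  }
  return total;
}
```

Because the padding is whatever the stride implies, the loop stays correct
when the maximum superblock size (and hence the stride) changes.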
Geza Lore [Fri, 10 Jun 2016 08:32:21 +0000 (09:32 +0100)]
Select segment based loopfilter strength for supertx blocks.
Segment based loopfilter strength for supertx coded blocks is now
selected based on the minimum of all segment IDs within a supertx
coded block (same as the quantiser settings).
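The selection rule can be sketched as a minimum over the segment map entries
covered by the block. The function min_segment_id and its parameters are
hypothetical, not the encoder's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: return the smallest segment ID inside a rows x cols block
 * of a segment map with `stride` entries per row. */
static int min_segment_id(const uint8_t *seg_map, int stride, int rows,
                          int cols) {
  assert(rows > 0 && cols > 0 && stride >= cols);
  int min_id = seg_map[0];
  for (int r = 0; r < rows; ++r) {
    const uint8_t *row = seg_map + (size_t)r * (size_t)stride;
    for (int c = 0; c < cols; ++c) {
      if (row[c] < min_id) min_id = row[c];
    }
  }
  return min_id;
}
```

The resulting ID would then index the per-segment loopfilter level, in the
same way it indexes the per-segment quantiser.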
Geza Lore [Thu, 9 Jun 2016 14:12:27 +0000 (15:12 +0100)]
Refactor variance aq.
Explicitly signal when the segment map is being refreshed when
using VARIANCE_AQ. This simplifies decisions about when the segment id
needs to be set from the previous segment map vs based on the current
variance.
Jingning Han [Mon, 13 Jun 2016 19:08:14 +0000 (12:08 -0700)]
Make tx_type speed feature default
Revisit the compression performance and complexity trade-off after
making the SIMD version of trellis optimizations. Before that,
reduce the transform-quantization function calls temporarily. This
causes a performance drop of about 0.3% for the lowres set.
Jingning Han [Wed, 8 Jun 2016 00:54:20 +0000 (17:54 -0700)]
Trellis based adaptive quantization
This commit combines uniform quantizer with trellis based coefficient
level optimization. It improves the codebase compression performance:
lowres 0.8%
midres 1.0%
hdres 1.6%
Note that the current trellis optimization unit is using C code. This
will make the overall quantization process slower. A number of
optimizations will follow.
Sarah Parker [Tue, 10 May 2016 22:32:42 +0000 (15:32 -0700)]
Move new quant experiment from nextgen
This experiment implements non-uniform quantization where
the width of the bins increases gradually to more closely
match a Laplacian distribution of the coefficients.
Jingning Han [Wed, 8 Jun 2016 16:49:03 +0000 (09:49 -0700)]
Remove swap buffer speed feature
The inter prediction residual can undergo different transform types
during the rate-distortion optimization search. The assumption used
in this speed feature no longer holds true. This commit removes the
related code to clean up the codebase and resolve unit test
failures at higher speed settings.
Jingning Han [Mon, 6 Jun 2016 21:58:50 +0000 (14:58 -0700)]
Rework the tx type speed feature
This commit re-works the transform type speed feature. It moves
the transform type selection outside of the coding mode loop. This
avoids repeated motion search if the best prediction mode is
chosen as NEWMV. It improves encoding speed for clips that
contain more motion.
For mobile_cif at 1000 kbps, this makes the baseline encoding 7%
faster and makes the encoding with dynamic motion vector referencing
scheme enabled 10% faster.
Zoe Liu [Fri, 3 Jun 2016 23:03:00 +0000 (16:03 -0700)]
Fix a RD performance bug in bipredictive frames
This patch ensures that BWDREF_FRAME is used when encoding both
types of bipredictive frames, namely LAST_BIPRED_UPDATE and
BIPRED_UPDATE. To achieve this, the updates to cpi->ref_frame_flags
now happen before a frame is encoded instead of after it.
RD performance has improved slightly, by approximately 0.17%,
compared to before applying this patch:
Geza Lore [Tue, 7 Jun 2016 16:02:03 +0000 (17:02 +0100)]
Zero segment counter before accumulating.
The segment counts are computed as part of packing the bitstream,
so they might have been computed already in the recode loop. Zero
the accumulator to avoid double counting.
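A minimal sketch of the fix pattern: clear the counters at the start of every
counting pass so that a repeated pass (for example, in a recode loop) does
not add to stale totals. The SegCounts type, the MAX_SEGMENTS_SKETCH value
and the count_segments function are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_SEGMENTS_SKETCH 8 /* assumed segment limit for this sketch */

typedef struct {
  unsigned int counts[MAX_SEGMENTS_SKETCH];
} SegCounts;

/* Hypothetical: count how many of the n entries use each segment ID. */
static void count_segments(SegCounts *sc, const unsigned char *seg_ids,
                           int n) {
  /* Zero first: an earlier pass may already have filled the counters. */
  memset(sc->counts, 0, sizeof(sc->counts));
  for (int i = 0; i < n; ++i) {
    assert(seg_ids[i] < MAX_SEGMENTS_SKETCH);
    ++sc->counts[seg_ids[i]];
  }
}
```

Calling count_segments twice on the same input yields the same counts as
calling it once; without the memset the second call would double them.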