Jingning Han [Mon, 13 Jun 2016 19:08:14 +0000 (12:08 -0700)]
Make tx_type speed feature default
Revisit the compression performance and complexity trade-off after
making the SIMD version of trellis optimizations. Before that,
reduce the transform-quantization function calls temporarily. This
would cause about 0.3% performance drop for lowres set.
Jingning Han [Wed, 8 Jun 2016 00:54:20 +0000 (17:54 -0700)]
Trellis based adaptive quantization
This commit combines uniform quantizer with trellis based coefficient
level optimization. It improves the codebase compression performance:
lowres 0.8%
midres 1.0%
hdres 1.6%
Note that the current trellis optimization unit is using C code. This
will make the cost of the overall quantization process slower. A number
of optimizations will come up next.
Sarah Parker [Tue, 10 May 2016 22:32:42 +0000 (15:32 -0700)]
Move new quant experiment from nextgen
This experiment implements non-uniform quantization where
the width of the bins increases gradually to more closely
match a laplacian distribution of the coeficcients.
Jingning Han [Wed, 8 Jun 2016 16:49:03 +0000 (09:49 -0700)]
Remove swap buffer speed feature
The inter prediction residual can undergo different transform types
during the rate-distortion optimization search. The assumption used
in this speed feature no longer holds true. This commit removes the
related code to clean up the codebase and clear out unit test
failure in higher speed setting.
Jingning Han [Mon, 6 Jun 2016 21:58:50 +0000 (14:58 -0700)]
Rework the tx type speed feature
This commit re-works the transform type speed feature. It moves
the transform type selection outside of the coding mode loop. This
avoids repeated motion search if the best prediction mode is
chosen as NEWMV. It improves the speed performance for clips that
contain more motion activities.
For mobile_cif at 1000 kbps, this makes the baseline encoding 7%
faster and makes the encoding with dynamic motion vector referencing
scheme enabled 10% faster.
Zoe Liu [Fri, 3 Jun 2016 23:03:00 +0000 (16:03 -0700)]
Fix a RD performance bug in bipredictive frames
This patch will make sure the use of the BWDREF_FRAME for the
encoding of both the two types of bipredictive frames, namely
LAST_BIPRED_UPDATE and BIPRED_UPDATE. To realize it, the
updates on the cpi->ref_frame_flags have been moved to before
the encoding of one frame, instread of originally handled after
the encoding of one frame.
RD performance has been improved slightly, approximately by 0.17%
compared to before the applying of this patch:
Geza Lore [Tue, 7 Jun 2016 16:02:03 +0000 (17:02 +0100)]
Zero segment counter before accumulating.
The segment counts are computed as part of packing the bitstream,
so they might have been computed already in the recode loop. Zero
the accumulator to avoid double counting.
Aamir Anis [Wed, 1 Jun 2016 05:50:24 +0000 (22:50 -0700)]
Updated loop restoration
1. Wiener restoration filter now has normalization and evaluation of
quantization procedure.
2. Corrected scaling of bits in RD cost computation.
3. Changed dynamic range and number of bits for Wiener filter.
Observed gains: Overall 0.58% for low_res, 0.7% for mid_res sequences.
Geza Lore [Wed, 1 Jun 2016 13:19:01 +0000 (14:19 +0100)]
Optimize wedge partition selection.
We can optimize wedge partition selection by pre-computing the
residuals of the 2 underlying predictors, and then blend these
to compute the sse of the compound predictor, without actually
having to compute and subtract the compound predictor.
Similarly we can pre-compute a proxy array which we can use to
cheaply check which mask sign would have lower sse.
Details are in wedge_utils.c.
Mathematically these are equivalence transformations, but due to the
finite precision the encoder output will be perturbed, though on
average this should make 0% difference.
Geza Lore [Fri, 3 Jun 2016 09:54:49 +0000 (10:54 +0100)]
Fix decoder crash with supertx
xd->plane[0].n4_h and xd->plane[0].n4_w are not set at that point
when using supertx.
While this fixes the immediate crash described in the referenced
bug report, there are still issues in the ref-mv experiment that
causes these tests to fail, so they are kept disabled.
Geza Lore [Thu, 12 May 2016 13:03:28 +0000 (14:03 +0100)]
Always include the cost of tx size in rate for Y.
The transform can only be skipped if both Y and U/V can be skipped, so
we always include the cost of tx size in the rate for Y. This will
get later subtracted if the transform is actually skipped.
Geza Lore [Thu, 12 May 2016 13:55:56 +0000 (14:55 +0100)]
Compute cost of UV mode accurately for intra blocks.
We used to cache the cost of the UV mode from the search with a
different previously tried Y mode, but the UV mode is contexted
on the Y mode, so caching the cost is inaccurate.
Geza Lore [Mon, 30 May 2016 00:35:35 +0000 (01:35 +0100)]
Pre-compute and use contiguous wedge masks.
This is purely a refactoring patch and has no functional effect.
Uses of these masks can be arranged such that all input blocks are
contiguous in memory (stride == block width). In this case 1D versions
of operations can be used. 1D vector operations have superior performance
over 2D block equivalents as they are more processor cache friendly and
they can do away with a second loop overhead.
hui su [Sat, 21 May 2016 00:40:00 +0000 (17:40 -0700)]
Add a speed feature for intra tx type search
Add a speed feature to seperate prediction mode and tx type search
for intra modes: search for best intra prediction mode with fixed
default tx type first, then choose the best tx type for the
selected mode.
Zoe Liu [Wed, 25 May 2016 18:57:15 +0000 (11:57 -0700)]
Make the bi-predictive frame group interval adjustable
This is for the bidir-pred experiment. Previously the length of the
bi-predictive frame group interval is fixed at 2, i.e. one
bi-predictive frame may be inserted every other frame. This patch
makes the length adjustable, i.e. any positive number may be
specified, but the use of the backward ref will be turned off if the
bi-predictive frame group interval is larger than the golden frame
group.
Further, an additional rate factor level has been added:
INTER_LOW
, which applies to LAST_BIPRED_UPDATE frames that are not used as
references.