Jingning Han [Wed, 1 Apr 2015 00:46:41 +0000 (17:46 -0700)]
Refactor block_yrd function for RTC coding mode
This commit separates Hadamard transform/quantization operations
from rate and distortion computation in block_yrd. This allows one
to skip SATD computation when all transform blocks are quantized
to zero. It also uses a new block error function that skips
repeated computation of sum of squared residuals. It reduces the
CPU cycles spent on block error calculation in block_yrd by 40%.
Jingning Han [Wed, 1 Apr 2015 18:39:36 +0000 (11:39 -0700)]
Optimize quantization simd implementation
This commit allows the quantizer to compare the AC coefficients to
the quantization step size to determine if further multiplication
operations are needed. It makes the quantization process 20% faster
without coding statistics change.
Yunqing Wang [Wed, 25 Mar 2015 21:19:29 +0000 (14:19 -0700)]
Enhance the transform skipping decision-making in non-rd mode
For large partition blocks(block_size > 32x32), the variance
calculation is modified so that every 8x8 block's variance
is stored during the calculation, which is used in the
following transform skipping test. Also, the variance for
every tx block is calculated. The skipping test checks all tx
blocks in the partition, and sets the skip flag only if all tx
blocks are skippable. If the skip flag of Y plane is 1, a
quick evaluation is done on UV planes. If the current partition
block is skippable in YUV planes, the mode search checks fewer
inter modes and doesn't check intra modes.
The rtc set borg test(at speed 6) showed that:
Overall psnr: -0.527%; Avg psnr: -0.510%; ssim: -0.573%.
Average single-thread speedup on rtc set was 3.5%.
For 720p clips, more speedups were seen.
gipsrecmotion: 13%
gipsrestat: 12%
vidyo: 5 - 9%
dark: 15%
niklas: 6%
Jingning Han [Tue, 31 Mar 2015 17:57:41 +0000 (10:57 -0700)]
Tuning SATD rate calculation for speed
This commit allows the encoder to check the eob per transform
block to decide how to compute the SATD rate cost. If the entire
block is quantized to zero, there is no need to add anything; if
only the DC coefficient is non-zero, add its absolute value;
otherwise, sum over the block. This reduces the CPU cycles spent
on vp9_satd_sse2 to one third.
Jingning Han [Tue, 31 Mar 2015 16:25:21 +0000 (09:25 -0700)]
Allow block skip coding option in RTC mode
When the estimated rate-distortion cost of skip coding mode is
lower than that of sending quantized coefficients, allow the
encoder to drop these coefficients. This improves the compression
performance of speed -6 by 0.268% and makes the encoding speed
slightly faster.
Jingning Han [Mon, 30 Mar 2015 19:31:46 +0000 (12:31 -0700)]
Enable 16x16 Hadamard transform in SATD based mode decision
This commit replaces the 16x16 2D-DCT transform with Hadamard
transform for RTC coding mode. It reduces the CPU cycles cost
on 16x16 transform by 5X. Overall it makes the speed -6 encoding
speed 1.5% faster without compromise on compression performance.
Jingning Han [Mon, 30 Mar 2015 18:09:29 +0000 (11:09 -0700)]
Use SATD based mode decision for block sizes below 16x16
This commit makes the encoder to select between SATD/variance as
metric for mode decision. It also allows to account chroma
component costs for mode decision as well. The overall encoding
time increase as compared to variance based mode selection is about
15% for speed -6. The compression performance is on average 2.2%
better than variance based approach, with about 5% compression
performance gains for hard clips (e.g., jimredvga, nikas720p, and
mmmoving) at lower bit-rate range.
Jingning Han [Mon, 23 Mar 2015 17:02:42 +0000 (10:02 -0700)]
Hadamard transform based coding mode decision process
This commit uses Hadamard transform based rate-distortion cost
estimate for rtc coding mode decision. It improves the compression
performance of speed -6 for many hard clips at lower bit-rates.
For example, 5.5% for jimredvga, 6.7% for mmmoving, 6.1% for
niklas720p. This will introduce extra encoding cycle costs at
this point.
webmdec: Fix read_frame return value for calls after EOS
webm_read_frame assumes that it won't be called once end of file
is reached. But for frame parallel mode that turns out to be not
true. this patch fixes that behavior by checking for EOS and
returning the appropriate value for subsequent calls.
Adrian Grange [Tue, 24 Mar 2015 15:55:35 +0000 (08:55 -0700)]
Fix use of scaling in joint motion search
To enable us to the scale-invariant motion estimation
code during mode selection, each of the reference
buffers is scaled to match the size of the frame
being encoded.
This fix ensures that a unit scaling factor is used in
this case rather than the one calculated assuming that
the reference frame is not scaled.
paulwilkins [Mon, 23 Mar 2015 15:57:09 +0000 (15:57 +0000)]
Enable group adaptive max q by default.
Set the GF group adaptive max Q compile flag to 1 by default.
This change has a quite big visual impact in some clips and also
contributes to tighter rate control.
For short test clips that have consistent content the impact is
quite small on metrics but for more varied long form clips there is
a drop in overal psnr but a sharp rise in average psnr caused by
greater expenditure on some easier sections and tighter rate clipping
in hard sections.
In chunck'ed encodes some of the effect will already be present due
to the independent rate control in each chunk but this change takes
the control down to a smaller scale.
paulwilkins [Fri, 6 Mar 2015 17:21:36 +0000 (17:21 +0000)]
Experimental rd bias based on source vs recon variance.
This experiment biases the rd decision based on the impact
a mode decision has on the relative spatial complexity of the
reconstruction vs the source.
The aim is to better retain a semblance of texture even if it
is slightly misaligned / wrong, rather than use a simple rd
measure that tends to favor use of a flat predictor if a perfect
match can't be found.
This improves the appearance of texture and visual quality
on specific test clips but is hidden under a flag and currently
off by default pending visual quality testing on a wider Yt set.
Adrian Grange [Thu, 19 Mar 2015 21:41:43 +0000 (14:41 -0700)]
Restore first ref frame pointer to the correct value
The joint_motion_search function alternates prediction
between two reference frames. In order to reuse existing
code, a pointer to the appropriate reference frame is
written into xd->plane[0].pre[0], that the motion
estimation code assumes points to the reference frame.
If this first reference frame was scaled then the
pointer was incorrectly being reset to point to the
unscaled reference frame rather than the scaled
version.
Yunqing Wang [Thu, 19 Mar 2015 19:28:24 +0000 (12:28 -0700)]
vp8: fix a bug in the internal PSNR calculation
While CONFIG_INTERNAL_STATS=1, PSNR is calculated while encoding.
The aligned width/height were used mistakenly in the calculation.
This patch fixed it, and used the orignal image width/height.
James Zern [Thu, 19 Mar 2015 19:20:14 +0000 (12:20 -0700)]
put spatial svc behind an ABI check
this removes the CONFIG_* checks from public headers, but means
'--enable-experimental --enable-spatial-svc' builds will fail without a
local change to the ABI in vpx_encoder.h. this should be all right for
testing this experiment.
Jingning Han [Wed, 18 Mar 2015 01:40:40 +0000 (18:40 -0700)]
Speed up non-rd mode decision search
This commit makes the encoder to explicitly calculate the SAD
associated with the LAST_FRAME motion vector and compare it to
that of the GOLDEN_FRAME given by integral projection motion
estimation. It skips the expensive sub-pixel motion search over
GOLDEN_FRAME when the LAST_FRAME can provide fairly good motion
compensated prediction quality.
For dark720p speed -6 single thread goes from
33304 b/f, 40.070 dB, 18156 ms ->
33319 b/f, 40.061 dB, 17611 ms