Jingning Han [Sat, 20 Jun 2015 00:31:34 +0000 (17:31 -0700)]
Use balanced model for intra prediction mode coding
This commit replaces the previous table based intra mode model
coding with a more balanced entropy coding system. It reduces the
decoder lookup table size by 1K bytes. The key frame compression
performance is about even on average. There are a few points where
the compression performance is improved by over 5%. Most test
points are fairly close to the lookup table approach.
Jingning Han [Thu, 11 Jun 2015 23:05:04 +0000 (16:05 -0700)]
Skip redundant flag reset
If the skip flag is already on, there is no need to further check
the all zero block case. This improves encoding speed at no coding
statistics change.
Jingning Han [Wed, 10 Jun 2015 19:55:04 +0000 (12:55 -0700)]
Refactor transform block partition entropy coding
This commit refactors the transform block partition entropy
coding process to improve the encoding speed. There is no change
in the compression statistics.
Jingning Han [Tue, 28 Apr 2015 19:16:51 +0000 (12:16 -0700)]
Account for context information for partition rate estimate
This commit allows the encoder to account for the boundary block
information to estimate the transform block partitiion rate cost
in the rate-distortion optimization scheme.
Jingning Han [Wed, 3 Jun 2015 01:45:29 +0000 (18:45 -0700)]
Fix rate estimate issue in transform block partition coding
This commit fixes the over count issue in the recursive transform
block partition rate cost estimation. It improves the compression
performance by about 0.45%.
Jingning Han [Wed, 22 Apr 2015 05:22:08 +0000 (22:22 -0700)]
Enable rate-distortion optimization for transform partition
This commit enables the rate-distortion optimization for recursive
transform block partition for inter mode blocks based on luma
component. The chroma component infers the transform block size
decision from those of luma component.
Jingning Han [Tue, 21 Apr 2015 22:34:39 +0000 (15:34 -0700)]
Make chroma component RD estimate support transform partition
This commit makes the rate-distortion estimation of the chroma
components support the recursive transform block partition
inferred from the luma component mode decisions.
Jingning Han [Tue, 7 Apr 2015 19:39:54 +0000 (12:39 -0700)]
Compute prediction filter type cost only when needed
Skip redundant prediction filter type cost in filter search loop,
if the rate value will be reset in Hadamard transform based rate
distortion estimate.
Jingning Han [Fri, 3 Apr 2015 18:33:24 +0000 (11:33 -0700)]
Enable Hadamard transform based cost estimate for all block sizes
This commit turns on the Hadamard transform based rate distortion
estimate for all block sizes in RTC coding mode. It conditionally
skips the rate distortion estimation if all zero block flag is set
on. No significant encoding speed change is observed. The
compression performance of speed -6 is improved by 1.7% over using
it only for block sizes of 32x32 and below.
Jingning Han [Fri, 3 Apr 2015 16:20:25 +0000 (09:20 -0700)]
Account for eob cost in the RTC mode decision process
This commit accounts for the transform block end of coefficient flag
cost in the RTC mode decision process. This allows a more precise
rate estimate. It also turns on the model to block sizes up to 32x32.
The test sequences shows about 3% - 5% speed penalty for speed -6.
The average compression performance improvement for speed -6 is
1.58% in PSNR. The compression gains for hard clips like jimredvga,
mmmoving, and tacomascmv at low bit-rate range are 1.8%, 2.1%, and
3.2%, respectively.
Jingning Han [Wed, 1 Apr 2015 00:46:41 +0000 (17:46 -0700)]
Refactor block_yrd function for RTC coding mode
This commit separates Hadamard transform/quantization operations
from rate and distortion computation in block_yrd. This allows one
to skip SATD computation when all transform blocks are quantized
to zero. It also uses a new block error function that skips
repeated computation of sum of squared residuals. It reduces the
CPU cycles spent on block error calculation in block_yrd by 40%.
Jingning Han [Wed, 1 Apr 2015 18:39:36 +0000 (11:39 -0700)]
Optimize quantization simd implementation
This commit allows the quantizer to compare the AC coefficients to
the quantization step size to determine if further multiplication
operations are needed. It makes the quantization process 20% faster
without coding statistics change.
Yunqing Wang [Wed, 25 Mar 2015 21:19:29 +0000 (14:19 -0700)]
Enhance the transform skipping decision-making in non-rd mode
For large partition blocks(block_size > 32x32), the variance
calculation is modified so that every 8x8 block's variance
is stored during the calculation, which is used in the
following transform skipping test. Also, the variance for
every tx block is calculated. The skipping test checks all tx
blocks in the partition, and sets the skip flag only if all tx
blocks are skippable. If the skip flag of Y plane is 1, a
quick evaluation is done on UV planes. If the current partition
block is skippable in YUV planes, the mode search checks fewer
inter modes and doesn't check intra modes.
The rtc set borg test(at speed 6) showed that:
Overall psnr: -0.527%; Avg psnr: -0.510%; ssim: -0.573%.
Average single-thread speedup on rtc set was 3.5%.
For 720p clips, more speedups were seen.
gipsrecmotion: 13%
gipsrestat: 12%
vidyo: 5 - 9%
dark: 15%
niklas: 6%
Jingning Han [Tue, 31 Mar 2015 17:57:41 +0000 (10:57 -0700)]
Tuning SATD rate calculation for speed
This commit allows the encoder to check the eob per transform
block to decide how to compute the SATD rate cost. If the entire
block is quantized to zero, there is no need to add anything; if
only the DC coefficient is non-zero, add its absolute value;
otherwise, sum over the block. This reduces the CPU cycles spent
on vp9_satd_sse2 to one third.
Jingning Han [Tue, 31 Mar 2015 16:25:21 +0000 (09:25 -0700)]
Allow block skip coding option in RTC mode
When the estimated rate-distortion cost of skip coding mode is
lower than that of sending quantized coefficients, allow the
encoder to drop these coefficients. This improves the compression
performance of speed -6 by 0.268% and makes the encoding speed
slightly faster.
Jingning Han [Mon, 30 Mar 2015 19:31:46 +0000 (12:31 -0700)]
Enable 16x16 Hadamard transform in SATD based mode decision
This commit replaces the 16x16 2D-DCT transform with Hadamard
transform for RTC coding mode. It reduces the CPU cycles cost
on 16x16 transform by 5X. Overall it makes the speed -6 encoding
speed 1.5% faster without compromise on compression performance.
Jingning Han [Mon, 30 Mar 2015 18:09:29 +0000 (11:09 -0700)]
Use SATD based mode decision for block sizes below 16x16
This commit makes the encoder to select between SATD/variance as
metric for mode decision. It also allows to account chroma
component costs for mode decision as well. The overall encoding
time increase as compared to variance based mode selection is about
15% for speed -6. The compression performance is on average 2.2%
better than variance based approach, with about 5% compression
performance gains for hard clips (e.g., jimredvga, nikas720p, and
mmmoving) at lower bit-rate range.
Jingning Han [Mon, 23 Mar 2015 17:02:42 +0000 (10:02 -0700)]
Hadamard transform based coding mode decision process
This commit uses Hadamard transform based rate-distortion cost
estimate for rtc coding mode decision. It improves the compression
performance of speed -6 for many hard clips at lower bit-rates.
For example, 5.5% for jimredvga, 6.7% for mmmoving, 6.1% for
niklas720p. This will introduce extra encoding cycle costs at
this point.
webmdec: Fix read_frame return value for calls after EOS
webm_read_frame assumes that it won't be called once end of file
is reached. But for frame parallel mode that turns out to be not
true. this patch fixes that behavior by checking for EOS and
returning the appropriate value for subsequent calls.
Adrian Grange [Tue, 24 Mar 2015 15:55:35 +0000 (08:55 -0700)]
Fix use of scaling in joint motion search
To enable us to the scale-invariant motion estimation
code during mode selection, each of the reference
buffers is scaled to match the size of the frame
being encoded.
This fix ensures that a unit scaling factor is used in
this case rather than the one calculated assuming that
the reference frame is not scaled.