Improves entropy coding to send the filter type only if
the motion vectors have subpel components.
Adds one new 8-tap smooth filter, and tweaks the others.
derflr: +0.695%
hevcmr: +0.305%
About 5% encode slowdown. No visible impact for decoding.
Also makes the interpolation framework flexible to support both
strictly interpolating filters as well as non-interpolating
filters that filter integer offsets. This is mainly for
further experimentation and if not found useful the code will
be removed.
Marco [Fri, 6 Nov 2015 00:00:15 +0000 (16:00 -0800)]
vp9: Updates to noise estimation.
Add threshold/condition on spatial_variance and brightness level.
Modification to normalization of block variance.
Change resolution limit below which we disable noise estimation.
Geza Lore [Wed, 28 Oct 2015 14:35:04 +0000 (14:35 +0000)]
Add AVX vectorized vp9_diamond_search_sad
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Jingning Han [Tue, 3 Nov 2015 20:59:24 +0000 (12:59 -0800)]
Simplify txfm rate-distortion optimization
This commit refactors the rate-distortion optimization scheme for
transform block coding. When both ext-tx and var-tx experiments
are turned on, the encoding time for bus_cif at 1000 kbps goes down
from 706377 ms to 666503 ms (5.6% speed-up). The coding statics
remain unchanged.
Geza Lore [Wed, 4 Nov 2015 14:56:34 +0000 (14:56 +0000)]
Flip the result of the inverse transform for FLIPADST.
When using FLIPADST, the vp10_inv_txfm_add functions used to flip
the destination array, add the result of the inverse transform, to it
and then flip the destination back. This has been replaced by
flipping the result of the inverse transform before adding it to the
destination. Up-Down flipping is done by negating the destination
stride, and staring from the bottom, so it should now be free.
Left-right flipping is done with the usual SSE2 instructions in the
optimized code.
The C functions match the SSE2 functions as expected, so the C functions
now do the flipping as well when required. Adding this cleanly required
some refactoring of the C functions, but there is no measurable
performance impact when ext-tx is not enabled.
hui su [Wed, 7 Oct 2015 16:29:02 +0000 (09:29 -0700)]
ext-intra experiment
Currently there are two parts in this experiment: extra directional intra
prediction modes and the filter intra modes migrated from the nextgen branch.
Several macros are defined in "blockd.h" to provide controls of the experiment
settings. Setting "DR_ONLY" as 1 (default is 0) means we only use directional
modes, and skip the filter-intra modes; "EXT_INTRA_ANGLES" (default is 128)
defines the number of different angles we want to support; setting
"ANGLE_FAST_SEARCH" as 1 (default is 1) means we use fast sub-optimal search
for the best prediction angle, instead of exhaustive search. The fast search
is about 6 times faster than the exhaustive search, while preserving about
60% of the coding gains.
With extra directional prediction modes (fast search), we observe the following
code gains (number in parentheses is for all-key-frame setting):
derflr +0.42% (+1.79%)
hevclr +0.78% (+2.19%)
hevcmr +1.20% (+3.49%)
stdhd +0.56%
Speed-wise, about 110% slower for key frames, and 30% slower overall.
The gains of filter intra modes mostly add up with the gains of directional
modes. The overall coding gain of this experiment:
derflr +0.94%
hevclr +1.46%
hevcmr +1.94%
stdhd +1.58%
Marco [Fri, 30 Oct 2015 18:51:06 +0000 (11:51 -0700)]
Bias against non-zero mv for large blocks.
Change is only for real-time mode, speed > 5, and non-screen content mode.
Bias is based on block size and motion vector level (motion above some threshold).
Helps to improves stability in background from lightning changes.
PSNR/SSIM metrics on RTC set almost no change/neutral (within +/- 0.1).
Geza Lore [Tue, 3 Nov 2015 11:10:20 +0000 (11:10 +0000)]
Eliminate copying for FLIPADST in fwd transforms.
This patch eliminates the copying of data when using FLIPADST forward
transforms, by incorporating the necessary data flipping into the
load_buffer_* functions of the SSE2 optimized forward transforms. The
load_buffer_* functions are normally inlined, so the overhead of copying
the data is removed and the overhead of flipping is minimized. Left to
right flipping is still not free, as the columns need to be shuffled in
registers.
To preserve identity between the C and SSE2 implementations, the
appropriate C implementations now also do the data flipping as part of
the transform, rather than relying on the caller for flipping the input.
Overall speedup is about 1.5-2% in encode on my tests. Note that these
are only the forward transforms. Inverse transforms to come in a later
patch.
There are also a few code hygiene changes:
- Fixed some indents of switch statements.
- DCT_DCT transform now always use vp10_fht* functions, which dispatch
to vpx_fdct* for DCT_DCT (some of them used to call vpx_fdct*
directly, some of them used to call vp10_fht*).
Geza Lore [Tue, 3 Nov 2015 13:53:32 +0000 (13:53 +0000)]
Fix transform tables in C implementations.
These tables were out of sync with the indexing enum since the
refactoring in commit 4f16f119 (change 303389), due to the removal
of the ext_tx_to_txtype lookup table. This patch just puts them
back in order.
Jingning Han [Tue, 3 Nov 2015 16:56:47 +0000 (08:56 -0800)]
Re-work rate-distortion optimization scheme for transform coding
This commit re-works the rate-distortion optimization scheme for
transform coding. It improves the overall compression performance.
For derf set, the ext-tx experiment provides 2.27% coding gains,
and the new scheme that integrates multiple transform type selection
and recursive transform block partitioning provides a total of 3.24%
coding gains.
Jingning Han [Mon, 2 Nov 2015 20:05:47 +0000 (12:05 -0800)]
Incorporate flexible tx type and tx partition in RD scheme
This commit hooks up the rate-distortion optimization system to
fully exploit recursive transform block partition and multiple
transform type. The compression performance of the two experiments
largely adds up. For derf set, ext-tx provides additional 2.1%
coding gains on top of the gains due to recursive transform block
partition (0.69%).
Marco [Sun, 1 Nov 2015 22:40:05 +0000 (14:40 -0800)]
Move noise level estimate outside denoiser.
Source noise level estimate is also useful for
setting variance encoder parameters (variance thresholds,
qp-delta, mode selection, etc), so allow it to be used also
if denoising is not on.
Change-Id: I38952cd55b91f35e5db45bc8e6a20ef25069c464
--ext-refs: extended references - for multi-ref in nextgen
--ext-inter: extended inter - for new_inter/copy_mode in nextgen
--ext-interp: for new interpolation
Jingning Han [Fri, 16 Oct 2015 22:54:58 +0000 (15:54 -0700)]
Refactor loop filter mask
This commit refactors the loop filter selection process to support
variable transform block sizes based filter mask. It disables the
multi-thread loop filter implementation to simplify the experiments.
The speed impact on speed 0 encoding is negligible.