Jingning Han [Wed, 11 Nov 2015 00:02:33 +0000 (16:02 -0800)]
Fix an encoding failure case when speed features are on
This commit fixes an encoding failure case triggered when early
termination feature is turned on for transform block size search.
It resolves the corresponding enc/dec mismatch issue.
Marco [Mon, 9 Nov 2015 21:36:56 +0000 (13:36 -0800)]
Add bias to zero/small motion for noisy source.
Change is only for real-time mode, speed >= 5, and non-screen content mode.
Add bias to zero/low motion for big blocks, if noise estimation
is enabled and noise level is above threshold.
jackychen [Mon, 9 Nov 2015 19:47:26 +0000 (11:47 -0800)]
VP9 dynamic resize: increase waiting time after key frame.
For 1 pass CBR mode: increase waiting time after key frame
before we start sampling rate control behavior for determining
resize. This change need to disable one internal resize(DownUp)
temporally since it requires a longer clip to do so.
Alex Converse [Tue, 3 Nov 2015 00:28:10 +0000 (16:28 -0800)]
Expand unconstrained nodes in pack_mb_tokens and loop on zeros.
Reduces Linux perf estimated cycle count for pack_mb_tokens on a
lossless encode on my desktop from 61858501855 to 48154040219 or from
26% of the overall profile to 21%.
This breaks 32-bit builds:
runtime error: load of misaligned address 0xf72fdd48 for type 'const
__m128i' (vector of 2 'long long' values), which requires 16 byte
alignment
+ _mm_set1_epi64x is incompatible with some versions of visual studio
Improves entropy coding to send the filter type only if
the motion vectors have subpel components.
Adds one new 8-tap smooth filter, and tweaks the others.
derflr: +0.695%
hevcmr: +0.305%
About 5% encode slowdown. No visible impact for decoding.
Also makes the interpolation framework flexible to support both
strictly interpolating filters as well as non-interpolating
filters that filter integer offsets. This is mainly for
further experimentation and if not found useful the code will
be removed.
Marco [Fri, 6 Nov 2015 00:00:15 +0000 (16:00 -0800)]
vp9: Updates to noise estimation.
Add threshold/condition on spatial_variance and brightness level.
Modification to normalization of block variance.
Change resolution limit below which we disable noise estimation.
Geza Lore [Wed, 28 Oct 2015 14:35:04 +0000 (14:35 +0000)]
Add AVX vectorized vp9_diamond_search_sad
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
- mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
- For all i: mvsadcost[0][i] == mvsadcost[1][i]
(equal per component cost)
- For all i: mvsadcost[0][i] == mvsadcost[0][-i]
(Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.
Jingning Han [Tue, 3 Nov 2015 20:59:24 +0000 (12:59 -0800)]
Simplify txfm rate-distortion optimization
This commit refactors the rate-distortion optimization scheme for
transform block coding. When both ext-tx and var-tx experiments
are turned on, the encoding time for bus_cif at 1000 kbps goes down
from 706377 ms to 666503 ms (5.6% speed-up). The coding statics
remain unchanged.
Geza Lore [Wed, 4 Nov 2015 14:56:34 +0000 (14:56 +0000)]
Flip the result of the inverse transform for FLIPADST.
When using FLIPADST, the vp10_inv_txfm_add functions used to flip
the destination array, add the result of the inverse transform, to it
and then flip the destination back. This has been replaced by
flipping the result of the inverse transform before adding it to the
destination. Up-Down flipping is done by negating the destination
stride, and staring from the bottom, so it should now be free.
Left-right flipping is done with the usual SSE2 instructions in the
optimized code.
The C functions match the SSE2 functions as expected, so the C functions
now do the flipping as well when required. Adding this cleanly required
some refactoring of the C functions, but there is no measurable
performance impact when ext-tx is not enabled.
hui su [Wed, 7 Oct 2015 16:29:02 +0000 (09:29 -0700)]
ext-intra experiment
Currently there are two parts in this experiment: extra directional intra
prediction modes and the filter intra modes migrated from the nextgen branch.
Several macros are defined in "blockd.h" to provide controls of the experiment
settings. Setting "DR_ONLY" as 1 (default is 0) means we only use directional
modes, and skip the filter-intra modes; "EXT_INTRA_ANGLES" (default is 128)
defines the number of different angles we want to support; setting
"ANGLE_FAST_SEARCH" as 1 (default is 1) means we use fast sub-optimal search
for the best prediction angle, instead of exhaustive search. The fast search
is about 6 times faster than the exhaustive search, while preserving about
60% of the coding gains.
With extra directional prediction modes (fast search), we observe the following
code gains (number in parentheses is for all-key-frame setting):
derflr +0.42% (+1.79%)
hevclr +0.78% (+2.19%)
hevcmr +1.20% (+3.49%)
stdhd +0.56%
Speed-wise, about 110% slower for key frames, and 30% slower overall.
The gains of filter intra modes mostly add up with the gains of directional
modes. The overall coding gain of this experiment:
derflr +0.94%
hevclr +1.46%
hevcmr +1.94%
stdhd +1.58%