Yaowu Xu [Fri, 23 Aug 2013 20:29:32 +0000 (13:29 -0700)]
Limit mv range to be based on partition size
Previous change c4048dbd limited the mv search range assuming a max block
size of 64x64. This commit changes the search range to use the actual
block size instead.
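As a rough illustration of why the actual block size matters, the sketch below derives a one-dimensional mv range from the block position and size. The function name, the `border` parameter, and the clamping rule (the referenced block must stay inside the frame plus a padded border) are assumptions made for this example, not the encoder's actual code.

```python
def mv_range(pos, block_size, frame_size, border):
    """Return (min_mv, max_mv) in pixels for one dimension.

    Hypothetical rule: the referenced block [pos + mv, pos + mv + block_size)
    must lie within [-border, frame_size + border).
    """
    min_mv = -border - pos
    max_mv = frame_size + border - block_size - pos
    return min_mv, max_mv


# Assuming the largest block (64) for every partition narrows the range
# compared with using the actual block size (16 here).
assumed_max = mv_range(pos=32, block_size=64, frame_size=128, border=80)
actual = mv_range(pos=32, block_size=16, frame_size=128, border=80)
```

Under these assumptions the lower bound is unchanged, while the upper bound grows as the block gets smaller.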
Dmitry Kovalev [Fri, 23 Aug 2013 20:12:46 +0000 (13:12 -0700)]
Fixing display size setting problem.
Fix of https://code.google.com/p/webm/issues/detail?id=608. We could have
used an invalid display size equal to the previous frame size (not the
current frame size).
Paul Wilkins [Thu, 22 Aug 2013 16:23:02 +0000 (17:23 +0100)]
Limit Key frame Intra modes checks.
Most of the focus so far has been on inter frames.
At high speed settings the key frame is now taking a high %
of the cycles.
This patch puts in some masking to reduce the number
of INTRA modes searched during key frame coding (as already
happens for inter frames) at higher speed settings.
TODO: Develop this further with either adaptive rd thresholds
when choosing which intra modes to consider or some other
heuristic.
Impact.
At high speed settings on some clips the key frame was starting
to dominate. In a coding of the first 50 frames of AKIYO at speed
2 limiting the key frame intra modes to DC or TM_PRED resulted in
~30% overall speedup. For Bus the number was lower at ~4-5%.
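The masking idea can be sketched as follows. This is a minimal illustration, not the patch itself: the mode list is a shortened subset, the `speed >= 2` cutoff is an assumption taken from the AKIYO experiment described above, and the function names and costs are hypothetical.

```python
# Illustrative subset of intra prediction modes; the real list is longer.
INTRA_MODES = ["DC_PRED", "V_PRED", "H_PRED", "D45_PRED", "TM_PRED"]


def key_frame_mode_mask(speed):
    """Hypothetical mask: at speed >= 2 only DC_PRED and TM_PRED are searched."""
    if speed >= 2:
        return {"DC_PRED", "TM_PRED"}
    return set(INTRA_MODES)


def pick_intra_mode(speed, rd_cost):
    """Return the cheapest allowed mode; rd_cost maps mode name -> cost."""
    allowed = key_frame_mode_mask(speed)
    candidates = [m for m in INTRA_MODES if m in allowed]
    return min(candidates, key=lambda m: rd_cost[m])


# Made-up costs, only to show that masked-out modes are never considered.
costs = {"DC_PRED": 120, "V_PRED": 90, "H_PRED": 150,
         "D45_PRED": 200, "TM_PRED": 110}
```

With these made-up costs the unmasked search picks V_PRED, while the masked search settles for TM_PRED, which is the speed-for-quality trade the patch describes.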
hkuang [Wed, 21 Aug 2013 21:19:08 +0000 (14:19 -0700)]
Add neon optimized vp9_short_idct10_16x16_add.
vp9_short_idct10_16x16_add handles blocks that only have valid data in the
top-left 4x4 block; all other coefficients are 0. So we can cut many
unnecessary calculations in order to save instructions.
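The saving comes from the fact that zero coefficients contribute nothing to a separable inverse transform. The sketch below checks this with a floating-point orthonormal inverse DCT; it is only an illustration of the principle and does not reproduce the fixed-point arithmetic or the NEON code of vp9_short_idct10_16x16_add. All helper names are hypothetical.

```python
import math


def idct_basis(n):
    """basis[i][k]: weight of frequency k at output sample i (orthonormal DCT-III)."""
    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [[scale(k) * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for k in range(n)] for i in range(n)]


def idct2d(coeffs, n, active):
    """2-D inverse DCT that only visits the top-left active x active coefficients."""
    b = idct_basis(n)
    return [[sum(b[i][k] * coeffs[k][l] * b[j][l]
                 for k in range(active) for l in range(active))
             for j in range(n)] for i in range(n)]


N = 16
coeffs = [[0.0] * N for _ in range(N)]
for k in range(4):
    for l in range(4):
        coeffs[k][l] = float(k * 4 + l + 1)  # arbitrary non-zero test values

full = idct2d(coeffs, N, N)     # visits all 256 coefficients
reduced = idct2d(coeffs, N, 4)  # visits only the 16 that can be non-zero
max_diff = max(abs(full[i][j] - reduced[i][j])
               for i in range(N) for j in range(N))
```

Both calls produce the same 16x16 output up to rounding, while the reduced version touches 16 coefficients per output sample instead of 256.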
Dmitry Kovalev [Thu, 22 Aug 2013 21:39:05 +0000 (14:39 -0700)]
Removing useless calls to setup_{pre, dst}_planes.
Comment is wrong, we don't initialize any xd pointers. We only initialize
xd->planes[i]->dst and xd->planes[i]->pre[], which are actually initialized
for every block during the decoding.
Jingning Han [Tue, 20 Aug 2013 21:34:17 +0000 (14:34 -0700)]
Refactor rd_pick_partition for parameter control
This commit changes the partition search order of superblocks from
{SPLIT, NONE, HORZ, VERT} to {NONE, SPLIT, HORZ, VERT} for
consistency with that of sub8x8 partition search. It enables the use
of early termination in partition search for all block sizes.
For ped_area_1080p 50 frames coded at 4000 kbps, the runtime goes down
from 844305ms to 818003ms (3% speed-up) at speed 0.
This will further move towards making the in-search partition types
configurable, hence unifying various speed-up approaches.
Some speed 1 and 2 features are turned off during the refactoring
process, including:
disable_split_var_thresh
using_small_partition_info
Stricter constraints are applied to use_square_partition_only for
right/bottom boundary blocks. Will bring back/refine these features
subsequently. At this point, it makes derf set at speed 1 about
0.45% higher in compression performance, and 9% down in run-time.
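A simplified view of the reordered search: evaluating NONE first means a good unsplit result can end the search before the more expensive partition types are tried. The function below is a sketch under that reading; the cost table and the threshold-based exit rule are hypothetical and do not come from rd_pick_partition.

```python
def search_partitions(costs, order=("NONE", "SPLIT", "HORZ", "VERT"),
                      early_exit_thresh=None):
    """Evaluate partition types in order and return (best_type, evaluated).

    costs maps partition type -> rd cost. The early exit (stop as soon as
    the best cost so far drops below early_exit_thresh) is a hypothetical rule.
    """
    best_type, best_cost, evaluated = None, float("inf"), []
    for ptype in order:
        evaluated.append(ptype)
        if costs[ptype] < best_cost:
            best_type, best_cost = ptype, costs[ptype]
        if early_exit_thresh is not None and best_cost < early_exit_thresh:
            break
    return best_type, evaluated


# Made-up rd costs for one block.
costs = {"NONE": 40, "SPLIT": 95, "HORZ": 70, "VERT": 75}
```

Without a threshold all four types are evaluated; with `early_exit_thresh=50` the search stops after NONE.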
Deb Mukherjee [Wed, 21 Aug 2013 23:19:35 +0000 (16:19 -0700)]
Fixes on feature disabling split based on variance
Adds a couple of minor fixes, which may be absorbed in Jingning's
patch. Thanks to Guillaume for pointing these out.
Also adjusts the thresholds for speed 1 and 2 to 16 and 32
respectively, to keep quality drops small.
Scott LaVarnway [Thu, 22 Aug 2013 12:51:04 +0000 (08:51 -0400)]
Initialize mb_skip_coeff before picking modes
It appears that the above/left mb_skip_coeff used during
mode picking is left over from the previously
encoded frame. This patch initializes the flag to the default
value of zero.
the final macroblock rows are scheduled in the main thread. prior to
this change one additional macroblock row would be scheduled in the
worker forcing the main thread to wait before finishing.
Deb Mukherjee [Mon, 19 Aug 2013 21:16:26 +0000 (14:16 -0700)]
Make "good" quality 2-pass vpxenc encoding default
Currently, the best quality mode in VP9 is not very well developed,
and unnecessarily makes the encode too slow. Hence the command line
default is changed to "good" quality. Also, the number of passes
default is changed to 2 passes as well, since 1-pass encoding is
not very efficient in VP9.
Besides, a number of VP9 defaults are set to the currently
recommended settings. With these changes, vpxenc
run with --codec=vp9 --kf-max-dist=9999 --cpu-used=0 should
work about the same as our borg results.
Note when the --cpu-used=0 option is dropped there will be a slight
difference in the output, because of a difference in the cpu-used
value for the first pass. Specifically, the default when unspecified
is to use cpu_used=1 for the first pass and cpu_used=0 for the
second pass. But when specified, both passes will use the cpu-used
value specified.
Note that this also changes the default for VP8 to "good",
but other options stay unchanged.
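The cpu-used behaviour described in this message can be restated as a small lookup. The function name and signature are hypothetical; only the values (1 for the first pass and 0 for the second when unspecified, the user's value for both passes otherwise) come from the text above.

```python
def cpu_used_for_pass(pass_index, user_value=None):
    """Return the cpu-used value for a given pass (1 or 2) of a 2-pass encode.

    user_value is None when --cpu-used was not given on the command line.
    """
    if user_value is not None:
        return user_value  # an explicit setting applies to both passes
    return 1 if pass_index == 1 else 0


unspecified = [cpu_used_for_pass(p) for p in (1, 2)]
explicit_zero = [cpu_used_for_pass(p, user_value=0) for p in (1, 2)]
```

This is why dropping --cpu-used=0 changes the output slightly: the first pass then runs with 1 instead of 0.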
hkuang [Fri, 16 Aug 2013 23:36:07 +0000 (16:36 -0700)]
Add neon optimized vp9_short_idct10_8x8_add.
vp9_short_idct10_8x8_add handles blocks that only have valid data in the
top-left 4x4 block; all other coefficients are 0. So we can cut several
unnecessary calculations in order to save instructions.
Jingning Han [Tue, 20 Aug 2013 17:33:42 +0000 (10:33 -0700)]
Enable zero coeff check in sub8x8 UV rd loop
Check the minimum rate-distortion cost of regular quantization and
all zero coeffs cases in the sub8x8 inter prediction rd loop for
luma components. Use this as the cumulative rdcost sent to UV rd
estimation.
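The check amounts to comparing two candidate costs and carrying the smaller one forward. The sketch below uses a simplified cost of the form distortion + lambda * rate; this form, the function names, and the numbers are assumptions for illustration rather than the encoder's actual cost macro.

```python
def rd_cost(rate, distortion, lam):
    """Simplified rate-distortion cost (assumed form: D + lambda * R)."""
    return distortion + lam * rate


def best_luma_rd(rate_coeffs, dist_quantized, rate_skip, dist_all_zero, lam):
    """Return the smaller of the regular-quantization and all-zero costs."""
    regular = rd_cost(rate_coeffs, dist_quantized, lam)
    all_zero = rd_cost(rate_skip, dist_all_zero, lam)
    return min(regular, all_zero)


# Coding the coefficients costs many bits, so zeroing them wins here.
cheap_to_skip = best_luma_rd(100, 50, 2, 300, lam=4.0)
# With a large all-zero distortion, regular quantization wins instead.
worth_coding = best_luma_rd(100, 50, 2, 900, lam=4.0)
```

The returned minimum is the value that would be passed on as the cumulative cost for the UV estimation.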
Deb Mukherjee [Fri, 16 Aug 2013 20:51:00 +0000 (13:51 -0700)]
Cleanup/enhancements of switchable filter search
Cleans up the switchable filter search logic. Also adds a
speed feature - a variance threshold - to disable filter search
if source variance is lower than this value.
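A minimal sketch of such a variance gate follows. The choice of EIGHTTAP as the fallback filter, the function name, and the cost table are assumptions; the message only states that the search is disabled when source variance is below the threshold.

```python
SWITCHABLE_FILTERS = ["EIGHTTAP", "EIGHTTAP_SMOOTH", "EIGHTTAP_SHARP"]


def choose_filter(source_variance, var_thresh, filter_cost, default="EIGHTTAP"):
    """Skip the filter search for low-variance blocks, otherwise pick the cheapest.

    filter_cost maps filter name -> rd cost; default is an assumed fallback.
    """
    if source_variance < var_thresh:
        return default  # search disabled: no per-filter evaluation
    return min(SWITCHABLE_FILTERS, key=lambda f: filter_cost[f])


# Made-up costs where the sharp filter would win a full search.
filter_cost = {"EIGHTTAP": 100, "EIGHTTAP_SMOOTH": 105, "EIGHTTAP_SHARP": 92}
flat_block = choose_filter(8, var_thresh=16, filter_cost=filter_cost)
textured_block = choose_filter(400, var_thresh=16, filter_cost=filter_cost)
```

Flat blocks tend to look the same under any of the filters, so skipping the search there gives up little quality.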
Jim Bankoski [Tue, 20 Aug 2013 15:14:52 +0000 (08:14 -0700)]
fix the mv_ref_idx issue
The following issue was reported:
https://code.google.com/p/webm/issues/detail?id=601&q=jimbankoski&sort=-id&colspec=ID%20Pri%20mstone%20ReleaseBlock%20Type%20Component%20Status%20Owner%20Summary
This code makes the choice and code cleaner and removes any question
about whether the border needs to be checked.