Paul Wilkins [Thu, 22 Aug 2013 16:23:02 +0000 (17:23 +0100)]
Limit Key frame Intra modes checks.
Most of the focus so far has been on inter frames.
At high speed settings the key frame is now taking a high %
of the cycles.
This patch puts in some masking to reduce the number
of INTRA modes searched during key frame coding (as already
happens for inter frames) at higher speed settings
TODO: Develop this further with either adaptive rd thresholds
when choosing which intra modes to consider or some other
heuristic.
Impact.
At high speed settings on some clips the key frame was starting
to dominate. In a coding of the first 50 frames of AKIYO at speed
2 limiting the key frame intra modes to DC or TM_PRED resulted in
~30% overall speedup. For Bus the number was lower at ~4-5%.
hkuang [Wed, 21 Aug 2013 21:19:08 +0000 (14:19 -0700)]
Add neon optimize vp9_short_idct10_16x16_add.
vp9_short_idct10_16x16_add is used to handle the block that only have valid data
at top left 4x4 block. All the other datas are 0. So we could cut many
unnecessary calculations in order to save instructions.
Jingning Han [Tue, 20 Aug 2013 21:34:17 +0000 (14:34 -0700)]
Refactor rd_pick_partition for parameter control
This commit changes the partition search order of superblocks from
{SPLIT, NONE, HORZ, VERT} to {NONE, SPLIT, HORZ, VERT} for
consistency with that of sub8x8 partition search. It enable the use
of early termination in partition search for all block sizes.
For ped_area_1080p 50 frames coded at 4000 kbps, it makes the runtime
goes down from 844305ms -> 818003ms (3% speed-up) at speed 0.
This will further move towards making the in-search partition types
configurable, hence unifying various speed-up approaches.
Some speed 1 and 2 features are turned off during the refactoring
process, including:
disable_split_var_thresh
using_small_partition_info
Stricter constraints are applied to use_square_partition_only for
right/bottom boundary blocks. Will bring back/refine these features
subsequently. At this point, it makes derf set at speed 1 about
0.45% higher in compression performance, and 9% down in run-time.
Deb Mukherjee [Wed, 21 Aug 2013 23:19:35 +0000 (16:19 -0700)]
Fixes on feature disabling split based on variance
Adds a couple of minor fixes, which may be absorbed in Jingning's
patch. Thanks to Guillaume for pointing these out.
Also adjusts the thresholds for speed 1 and 2 to 16 and 32
respectively, to keep quality drops small.
Scott LaVarnway [Thu, 22 Aug 2013 12:51:04 +0000 (08:51 -0400)]
Initialize mb_skip_coeff before picking modes
It appears that the above/left mb_skip_coeff used during
the pick modes, is left over from the previously
encode frame. This patch initializes the flag to the default
value of zero.
Deb Mukherjee [Mon, 19 Aug 2013 21:16:26 +0000 (14:16 -0700)]
Make "good" quality 2-pass vpxenc encoding default
Currently, the best quality mode in VP9 is not very well developed,
and unnecessarily makes the encode too slow. Hence the command line
default is changed to "good" quality. Also, the number of passes
default is changed to 2 passes as well, since 1-pass encoding is
not very efficient in VP9.
Besides, a number of VP9 defaults are set to the currently
recommended settings. With these changes, vpxenc
run with --codec=vp9 --kf-max-dist=9999 --cpu-used=0 should
work about the same as our borg results.
Note when the --cpu-used=0 option is dropped there will be a slight
difference in the output, because of a difference in the cpu-used
value for the first pass. Specifically, the default when unspecified
is to use cpu_used=1 for the first pass and cpu_used=0 for the
second pass. But when specified, both passes will use the cpu-used
value specified.
Note that this also changes the default for VP8 as being "good"
but other options stay unchanged.
hkuang [Fri, 16 Aug 2013 23:36:07 +0000 (16:36 -0700)]
Add neon optimize vp9_short_idct10_8x8_add.
vp9_short_idct10_8x8_add is used to handle the block that only have valid data
at top left 4x4 block. All the other datas are 0. So we could cut several
unnecessary calculations in order to save instructions.
Jingning Han [Tue, 20 Aug 2013 17:33:42 +0000 (10:33 -0700)]
Enable zero coeff check in sub8x8 UV rd loop
Check the minimum rate-distortion cost of regular quantization and
all zero coeffs cases in the sub8x8 inter prediction rd loop for
luma components. Use this as the cumulative rdcost sent to UV rd
estimation.
Deb Mukherjee [Fri, 16 Aug 2013 20:51:00 +0000 (13:51 -0700)]
Cleanup/enhancements of switchable filter search
Cleans up the switchable filter search logic. Also adds a
speed feature - a variance threshold - to disable filter search
if source variance is lower than this value.
Jim Bankoski [Tue, 20 Aug 2013 15:14:52 +0000 (08:14 -0700)]
fix the mv_ref_idx issue
The following issue was reported :
https://code.google.com/p/webm/issues/detail?id=601&q=jimbankoski&sort=-id&colspec=ID%20Pri%20mstone%20ReleaseBlock%20Type%20Component%20Status%20Owner%20Summary
This code makes the choice and code cleaner and removes any question
about whether the border needs to be checked.
Yaowu Xu [Sat, 10 Aug 2013 22:04:02 +0000 (15:04 -0700)]
Change to limit the mv search range
As the pixel values beyond image border are duplicates of pixels
on edge, the change limits the mv search range, any mv beyond
the limits no longer produce new/different prediction values
as entire block with pixels used for subpel interpolation are
outside image border.
Jingning Han [Wed, 14 Aug 2013 23:50:45 +0000 (16:50 -0700)]
Enable early termination in uv rd loop
This commit enables early termination in the rate-distortion
optimization search loop for chroma components. When the cumulative
rd cost is above the current best value, skip the rest per-block
transform/quantization/coeff_cost and continue to the next
prediction mode.
For bus_cif at 2000 kbps, the average run-time goes down from
168546ms -> 164678ms, (2% speed-up) at speed 0
36197ms -> 34465ms, (4% speed-up) at speed 1
Dmitry Kovalev [Mon, 19 Aug 2013 20:20:21 +0000 (13:20 -0700)]
Using plane_bsize instead of bsize.
This change set is intermediate. The next one will remove all repetitive
plane_bsize calculations, because it will be passed as argument to
foreach_transformed_block_visitor.
Adrian Grange [Mon, 19 Aug 2013 18:58:52 +0000 (11:58 -0700)]
Further correct bug in loopfilter initialization
The intent was to initialize the deltas for the
segment to the computed value, irrespective of mode
and reference frame if (mode_ref_delta_enabled == 0).
(In response to bug posted by Manjit Hota to codec-devel
and webm-discuss lists)
Jingning Han [Fri, 16 Aug 2013 00:03:14 +0000 (17:03 -0700)]
Fix the returned distortion value in rd_pick_intra
Return the distortion value in vp9_rd_pick_intra_mode_sb as sum of
dist_y and dist_uv. Remove the right shift operation on dist_uv,
and make it consistent with that of vp9_rd_pick_inter_mode_sb.