Paul Wilkins [Wed, 21 Aug 2013 11:34:14 +0000 (12:34 +0100)]
Added per pixel inter rd hit count stats
Added some code to output normalized rd hit count stats.
In effect this approximates to the average number of rd
operations/tests per pixel for the sequence.
The results are not quite accurate and I have not bothered
to account for partial SB64s at frame edges and for key frames
However they do give some idea of the number of modes /
prediction methods being tested for each pixel across the
different partition sizes. This indicates how much scope their
is for further gains either by reducing the number of partitions
examined or the modes per partition through heuristics.
Patch 3 moved place where count incremented so partial rd
tests that are aborted with INT_MAX return are also counted.
Example numbers for first 50 frames of Akiyo.
Speed 0 ~84.4 rd operations / pixel
Speed 1 ~28.8
Speed 2 ~11.9
Yaowu Xu [Thu, 29 Aug 2013 17:26:52 +0000 (10:26 -0700)]
Fixed potential overflows
The two arrays are typically initialized to INT64_MAX, if they are not
filled with valid values before the addition, the values can overflow
and lead to wrong results.
Dmitry Kovalev [Wed, 28 Aug 2013 19:22:37 +0000 (12:22 -0700)]
General code cleanup.
Switching from mi_{width, height}_log2 and b_{width, height}_log2 to
num_8x8_blocks_{wide, high} and num_4x4_blocks_{wide, high}. Removing
redundant code, adding const.
Deb Mukherjee [Tue, 27 Aug 2013 22:07:50 +0000 (15:07 -0700)]
Adds a speed feature for fast 1-loop forw updates
Incorporates a speed feature for fast forward updates of
coefficients. This feature takes 3 values:
0 - use standard 2-loop version
1 - use a 1-loop version
2 - use a 1-loop version with reduced updates
Results: derfraw300 +0.007% (on speed 0) at feature value = 1
-0.160% (on speed 0) at feature value = 2
There is substantial speed up at speeds 2 and above for low
resolution sequences where the entropy updates are a big part
of the overall computations.
Yaowu Xu [Tue, 27 Aug 2013 15:39:20 +0000 (08:39 -0700)]
fixed the reading too many bytes
In subpel_avg_variance functions, code similar to the following
punpkldq m2, [addr]
actually reads 8 bytes. For functions that are supposed to work on
buffers only have less 8 bytes a line, this caused valgrind error
of reading uninitialized memory.
Yaowu Xu [Fri, 23 Aug 2013 20:29:32 +0000 (13:29 -0700)]
Limit mv range to be based on partition size
Previous change c4048dbd limits the mv search range assuming max block
size of 64x64, this commit change the search range using actual block
size instead.
Dmitry Kovalev [Fri, 23 Aug 2013 20:12:46 +0000 (13:12 -0700)]
Fixing display size setting problem.
Fix of https://code.google.com/p/webm/issues/detail?id=608. We could have
used invalid display size equal to the previous frame size (not to the
current frame size).
Paul Wilkins [Thu, 22 Aug 2013 16:23:02 +0000 (17:23 +0100)]
Limit Key frame Intra modes checks.
Most of the focus so far has been on inter frames.
At high speed settings the key frame is now taking a high %
of the cycles.
This patch puts in some masking to reduce the number
of INTRA modes searched during key frame coding (as already
happens for inter frames) at higher speed settings
TODO: Develop this further with either adaptive rd thresholds
when choosing which intra modes to consider or some other
heuristic.
Impact.
At high speed settings on some clips the key frame was starting
to dominate. In a coding of the first 50 frames of AKIYO at speed
2 limiting the key frame intra modes to DC or TM_PRED resulted in
~30% overall speedup. For Bus the number was lower at ~4-5%.
hkuang [Wed, 21 Aug 2013 21:19:08 +0000 (14:19 -0700)]
Add neon optimize vp9_short_idct10_16x16_add.
vp9_short_idct10_16x16_add is used to handle the block that only have valid data
at top left 4x4 block. All the other datas are 0. So we could cut many
unnecessary calculations in order to save instructions.
Dmitry Kovalev [Thu, 22 Aug 2013 21:39:05 +0000 (14:39 -0700)]
Removing useless calls to setup_{pre, dst}_planes.
Comment is wrong, we don't initialize any xd pointers. We only initialize
xd->planes[i]->dst and xd->planes[i]->pre[], which are actually initialized
for every block during the decoding.
Jingning Han [Tue, 20 Aug 2013 21:34:17 +0000 (14:34 -0700)]
Refactor rd_pick_partition for parameter control
This commit changes the partition search order of superblocks from
{SPLIT, NONE, HORZ, VERT} to {NONE, SPLIT, HORZ, VERT} for
consistency with that of sub8x8 partition search. It enable the use
of early termination in partition search for all block sizes.
For ped_area_1080p 50 frames coded at 4000 kbps, it makes the runtime
goes down from 844305ms -> 818003ms (3% speed-up) at speed 0.
This will further move towards making the in-search partition types
configurable, hence unifying various speed-up approaches.
Some speed 1 and 2 features are turned off during the refactoring
process, including:
disable_split_var_thresh
using_small_partition_info
Stricter constraints are applied to use_square_partition_only for
right/bottom boundary blocks. Will bring back/refine these features
subsequently. At this point, it makes derf set at speed 1 about
0.45% higher in compression performance, and 9% down in run-time.
Deb Mukherjee [Wed, 21 Aug 2013 23:19:35 +0000 (16:19 -0700)]
Fixes on feature disabling split based on variance
Adds a couple of minor fixes, which may be absorbed in Jingning's
patch. Thanks to Guillaume for pointing these out.
Also adjusts the thresholds for speed 1 and 2 to 16 and 32
respectively, to keep quality drops small.
Scott LaVarnway [Thu, 22 Aug 2013 12:51:04 +0000 (08:51 -0400)]
Initialize mb_skip_coeff before picking modes
It appears that the above/left mb_skip_coeff used during
the pick modes, is left over from the previously
encode frame. This patch initializes the flag to the default
value of zero.