Jingning Han [Fri, 26 Jul 2013 21:11:37 +0000 (14:11 -0700)]
Special handle on DC only inverse 8x8 2D-DCT
This commit enables a special handle for the 8x8 inverse 2D-DCT,
where only DC coefficient is quantized to be non-zero. For bus_cif
at 2000 kbps, it provides about 1% speed-up at speed 0.
Paul Wilkins [Wed, 24 Jul 2013 13:07:37 +0000 (14:07 +0100)]
Auto min and max partition size experiment.
Speed feature experiment to set an upper and lower
partition size limit based on what has been seen
in spatial neighbors.
This seems to gives quite reasonable speed gains in local
(10-15%) and when used with speed 0 the losses are small
(0.25% derf, 0.35% stdhd). However, for now I am only
enabling it on speed 1 as there may be clashes with the existing
temporal partition selection in speed 2.
Using a tighter min / max around the range derived from the
neighbors increases speed further but at the cost of a
bigger quality loss. However, I think this spatial method could
be combined with data from either the last frame or a variance
method (or both) to refine the range of minimum and maximum
partition size. I.e. consider the min and max from spatial and
temporal neighbors and the variance recommendation.
Yunqing Wang [Fri, 26 Jul 2013 02:17:46 +0000 (19:17 -0700)]
Modify static threshold calculation
Used 3 * standard_deviation in internal threshold calculation
instead of fit curve. This actually approached the algorithm
better.
For comparison, similar tests were done:
The overall psnr loss is less than before.
1. derf set:
when static-thresh = 1, psnr loss is 0.329%;
when static-thresh = 500, psnr loss is 0.970%;
2. stdhd set:
when static-thresh = 1, psnr loss is 0.922%;
when static-thresh = 500, psnr loss is 1.307%;
Similar speedup is achieved. For example,
clip bitrate static-thresh psnr time
akiyo(cif) 500 0 48.952 5.077s(50f)
akiyo 500 500 48.866 4.169s(50f)
Yunqing Wang [Thu, 11 Jul 2013 18:15:00 +0000 (11:15 -0700)]
Add encoding option --static-thresh
This option exists in VP8, and it was rewritten in VP9 to support
skipping on different partition levels. After prediction is done,
we can check if the residuals in the partition block will be all
quantized to 0. If this is true, the skip flag is set, and only
prediction data are needed in reconstruction. Based on DCT's energy
conservation property, the skipping check can be estimated in
spatial domain.
The prediction error is calculated and compared to a threshold.
The threshold is determined by the dequant values, and also
adjusted by partition sizes. To be precise, the DC and AC parts
for Y, U, and V planes are checked to decide skipping or not.
Test showed that
1. derf set:
when static-thresh = 1, psnr loss is 0.666%;
when static-thresh = 500, psnr loss is 1.162%;
2. stdhd set:
when static-thresh = 1, psnr loss is 1.249%;
when static-thresh = 500, psnr loss is 1.668%;
For different clips, encoding speedup range is between several
percentage and 20+% when static-thresh <= 500. For example,
clip bitrate static-thresh psnr time
akiyo(cif) 500 0 48.923 5.635s(50f)
akiyo 500 500 48.863 4.402s(50f)
Removing unused constants, macros, and function declarations. Using
ROUND_POWER_OF_TWO macro, vp9_zero, vp9_copy where possible. Moving
#include from *.h to *.c. Merging for loops for motion vectors.
Jingning Han [Tue, 23 Jul 2013 22:53:09 +0000 (15:53 -0700)]
Make coeff_optimize initialized per-plane
This commit makes the initialization of trellis coeff optimization
a per-plane operation, thereby eliminating the redundant steps in
encode_sby and encode_sbuv. It makes the encoder at speed 0 slightly
faster.
The feature that uses small partition results as a measure to skip
mode evaluation at larger partition requires the flags to be reset.
The reset was missing in the code path that calls rd_use_partition().
Martin Storsjo [Wed, 24 Jul 2013 13:06:49 +0000 (16:06 +0300)]
msvs: Generate proper configurations for mixed platforms
Prior to 73c4e284, the generated .sln files didn't contain any
information about the different configurations when using .vcxproj
project files. The MSVS IDE was able to fill this in just fine when
loaded though.
When building for ARM, the obj_int_extract project still is built
for x86, in order for the build process to be able to use
obj_int_extract.exe.
Now that configuration info is generated, it breaks current ARM
setups, since the configurations generated by gen_msvs_sln.sh only
included configurations from the last parsed project file (as
mentioned in the comment).
In these setups, the MSVS IDE generated a third meta-platform, called
"Mixed Platforms". This meta-platform points to either ARM or
Win32 as platform in each of the individual projects.
When the MSVS IDE generated this automatically, it also included
the original ARM and Win32 platforms as separate choices, but these
can be omitted since they don't make sense.
Adrian Grange [Wed, 24 Jul 2013 16:59:36 +0000 (09:59 -0700)]
Use local variables rather than structure members
Although local copies of the mode member variables
(mode, ref_frame) were made, they were not used in
all places. Also, made a local copy of the
second_ref_frame member.
Jingning Han [Tue, 23 Jul 2013 17:02:43 +0000 (10:02 -0700)]
Skip inverse transform when eob is zero
When all the transform coefficients were quantized to zero, skip
the inverse transform operation. For bus_cif at 1000 kbps, the
runtime goes from 154967ms -> 149842ms, i.e., about 3% speed-up,
at speed 0.
Jim Bankoski [Tue, 23 Jul 2013 13:51:44 +0000 (06:51 -0700)]
clean up bw, bh
many structures use bw and bh and they have different meanings. This cl attempts
to start this clean up and remove unneccessary 2 step look up log and then
shift operations...
also removed partition type multiple operation code in bitstream.c.
Moving common encoder/decoder code to update_tx_counts. Also renaming
vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
(twice before).
Using update_ct and update_ct2 functions for probability update.
Update logic for both mode and mvref was the same, so using MODE_COUNT_SAT,
MODE_MAX_UPDATE_FACTOR, update_ct, update_ct2 for both cases. Removing
function update_tx_ct because it was identical to update_mode_ct2.
Deb Mukherjee [Wed, 17 Jul 2013 22:44:40 +0000 (15:44 -0700)]
Diamond search change to accelerate movement
Optional change in diamond search to continue in the best move
direction until that move turns worse.
This is still WIP since the exact way the new method is to be used is
under investigation. One option is to make it an option in diamond
search and use it only when motion is large.
Overall slightly positive on derfraw300 +0.02%, stdhdraw +0.13%,
but works a lot better for high motion sequences (ex. football : +1%).