Yunqing Wang [Fri, 26 Jul 2013 02:17:46 +0000 (19:17 -0700)]
Modify static threshold calculation
Used 3 * standard_deviation in internal threshold calculation
instead of fit curve. This actually approached the algorithm
better.
For comparison, similar tests were done:
The overall psnr loss is less than before.
1. derf set:
when static-thresh = 1, psnr loss is 0.329%;
when static-thresh = 500, psnr loss is 0.970%;
2. stdhd set:
when static-thresh = 1, psnr loss is 0.922%;
when static-thresh = 500, psnr loss is 1.307%;
Similar speedup is achieved. For example,
clip bitrate static-thresh psnr time
akiyo(cif) 500 0 48.952 5.077s(50f)
akiyo 500 500 48.866 4.169s(50f)
Yunqing Wang [Thu, 11 Jul 2013 18:15:00 +0000 (11:15 -0700)]
Add encoding option --static-thresh
This option exists in VP8, and it was rewritten in VP9 to support
skipping on different partition levels. After prediction is done,
we can check if the residuals in the partition block will be all
quantized to 0. If this is true, the skip flag is set, and only
prediction data are needed in reconstruction. Based on DCT's energy
conservation property, the skipping check can be estimated in
spatial domain.
The prediction error is calculated and compared to a threshold.
The threshold is determined by the dequant values, and also
adjusted by partition sizes. To be precise, the DC and AC parts
for Y, U, and V planes are checked to decide skipping or not.
Test showed that
1. derf set:
when static-thresh = 1, psnr loss is 0.666%;
when static-thresh = 500, psnr loss is 1.162%;
2. stdhd set:
when static-thresh = 1, psnr loss is 1.249%;
when static-thresh = 500, psnr loss is 1.668%;
For different clips, encoding speedup range is between several
percentage and 20+% when static-thresh <= 500. For example,
clip bitrate static-thresh psnr time
akiyo(cif) 500 0 48.923 5.635s(50f)
akiyo 500 500 48.863 4.402s(50f)
Jingning Han [Tue, 23 Jul 2013 22:53:09 +0000 (15:53 -0700)]
Make coeff_optimize initialized per-plane
This commit makes the initialization of trellis coeff optimization
a per-plane operation, thereby eliminating the redundant steps in
encode_sby and encode_sbuv. It makes the encoder at speed 0 slightly
faster.
The feature that uses small partition results as a measure to skip
mode evaluation at larger partition requires the flags to be reset.
The reset was missing in the code path that calls rd_use_partition().
Martin Storsjo [Wed, 24 Jul 2013 13:06:49 +0000 (16:06 +0300)]
msvs: Generate proper configurations for mixed platforms
Prior to 73c4e284, the generated .sln files didn't contain any
information about the different configurations when using .vcxproj
project files. The MSVS IDE was able to fill this in just fine when
loaded though.
When building for ARM, the obj_int_extract project still is built
for x86, in order for the build process to be able to use
obj_int_extract.exe.
Now that configuration info is generated, it breaks current ARM
setups, since the configurations generated by gen_msvs_sln.sh only
included configurations from the last parsed project file (as
mentioned in the comment).
In these setups, the MSVS IDE generated a third meta-platform, called
"Mixed Platforms". This meta-platform points to either ARM or
Win32 as platform in each of the individual projects.
When the MSVS IDE generated this automatically, it also included
the original ARM and Win32 platforms as separate choices, but these
can be omitted since they don't make sense.
Adrian Grange [Wed, 24 Jul 2013 16:59:36 +0000 (09:59 -0700)]
Use local variables rather than structure members
Although local copies of the mode member variables
(mode, ref_frame) were made, they were not used in
all places. Also, made a local copy of the
second_ref_frame member.
Jingning Han [Tue, 23 Jul 2013 17:02:43 +0000 (10:02 -0700)]
Skip inverse transform when eob is zero
When all the transform coefficients were quantized to zero, skip
the inverse transform operation. For bus_cif at 1000 kbps, the
runtime goes from 154967ms -> 149842ms, i.e., about 3% speed-up,
at speed 0.
Jim Bankoski [Tue, 23 Jul 2013 13:51:44 +0000 (06:51 -0700)]
clean up bw, bh
many structures use bw and bh and they have different meanings. This cl attempts
to start this clean up and remove unneccessary 2 step look up log and then
shift operations...
also removed partition type multiple operation code in bitstream.c.
Moving common encoder/decoder code to update_tx_counts. Also renaming
vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
(twice before).
Using update_ct and update_ct2 functions for probability update.
Update logic for both mode and mvref was the same, so using MODE_COUNT_SAT,
MODE_MAX_UPDATE_FACTOR, update_ct, update_ct2 for both cases. Removing
function update_tx_ct because it was identical to update_mode_ct2.
Deb Mukherjee [Wed, 17 Jul 2013 22:44:40 +0000 (15:44 -0700)]
Diamond search change to accelerate movement
Optional change in diamond search to continue in the best move
direction until that move turns worse.
This is still WIP since the exact way the new method is to be used is
under investigation. One option is to make it an option in diamond
search and use it only when motion is large.
Overall slightly positive on derfraw300 +0.02%, stdhdraw +0.13%,
but works a lot better for high motion sequences (ex. football : +1%).
Jingning Han [Thu, 18 Jul 2013 00:07:32 +0000 (17:07 -0700)]
Optimize operation flow in sub8x8 rd loop
Stack the rate-distortion statistics in the sub8x8 rd loop. This allows
the encoder to skip the forward transform, quantization, and coeff cost
estimation, in the sub8x8 rd optimization search, if the motion
vector(s) are of integer pixel value, and have been tested in the
previous prediction filter type rd loops of the same block.
This gives about 2% speed-up for bus_cif at 2000 kpbs, for speed 0.
Its efficacy depends how frequently the motion search will select an
integer motion vector.