Combined non-rd motion searchs into a single function
This commit combined the full pel and sub pel motion search into a
single function to avoid code duplication. The commit does not change
encoder outputs.
Jingning Han [Mon, 7 Jul 2014 19:08:40 +0000 (12:08 -0700)]
Re-design quantization process for 32x32 transform block
This commit enables a new quantization process for 32x32 2D-DCT
transform coefficient blocks. It improves the compression
performance of speed 5 by 1.4%. The overall compression gains of
speed 5 due to the new quantization scheme is 4.7%. It also includes
the SSSE3 implementation of the 32x32 quantization process.
Adrian Grange [Mon, 9 Jun 2014 22:22:17 +0000 (15:22 -0700)]
Fix decoder handling of intra-only frames
This patch fixes bug 633:
https://code.google.com/p/webm/issues/detail?id=633
The first decoded frame does not have to be a keyframe,
it could be an inter-frame that is coded intra-only.
This patch fixes the handling of intra-only frames.
A test vector has also been added that encodes 3
intra-only frames at the start of the clip. The
test vector was generated using the code in the
following patch:
https://gerrit.chromium.org/gerrit/#/c/70680/
Tim Kopp [Mon, 7 Jul 2014 18:35:27 +0000 (11:35 -0700)]
Vp9 denoiser MC bugfix
In the previous version, only certain buffers in the macroblockd were saved and
the restored. In this version, all of the buffers are saved and restored. The
code was then rolled into a loop for readability.
Also contains a tiny fix for when the -DOUTPUT_YUV_DENOISED flag is used.
Prepare for frame parallel decoding, the reference count buffers
need to be protected by mutex. Move vp9_thread.* to common
folder so that those buffers could use cross-platform mutex
from vp9_thread.*.
Alex Converse [Tue, 1 Jul 2014 20:02:05 +0000 (13:02 -0700)]
Cleanup motion search speed features.
* Replace max_step_search_steps with constant MAX_MVSEARCH_STEPS
* Fold (reduce_first_step_size + speed > 5) into reduce_first_step_size
replacing uses of reduce_first_step_size that don't add the speed
check with zero.
Alex Converse [Wed, 2 Jul 2014 19:36:48 +0000 (12:36 -0700)]
Split vp9_rdopt into vp9_rdopt and vp9_rd.
vp9_rdopt is for making rd optimal mode decisions. vp9_rd is for all
other rd related routines. Anything used outside of making an rd optimal
decision belongs in rd.
Yunqing Wang [Wed, 2 Jul 2014 19:16:27 +0000 (12:16 -0700)]
Fix rd threshold overflow issue
Moved the threshold adjustment before reference flag checking,
which could set the threshold to INT_MAX for disabled reference
frame, and cause overflow if the adjustment is done after that.
Tim Kopp [Tue, 17 Jun 2014 20:04:39 +0000 (13:04 -0700)]
VP9 denoiser implemented FILTER_BLOCK case
Renamed updating_running_avg() to filter(). Extended function with the rest of
the filter procedure. Made all of the empirically-determined constants used in
VP8 into functions so they can be tweaked more easily.
Tim Kopp [Tue, 1 Jul 2014 18:05:16 +0000 (11:05 -0700)]
VP9 denoising enabled by noise_sensitivity param
As in VP8.
Currently, this parameter is set with the VP8E_SET_NOISE_SENSITIVITY flag.
The flag was not renamed so that we don't break the interface for webrtc. This
should probably be changed at some point in the future.
Added a speed feature controlling a motion search parameter
This commit added a speed feature to control the step_param used in
full pixel motion search. The intention is to reduced the search
steps for high speed real time coding.
James Zern [Wed, 2 Jul 2014 02:02:15 +0000 (19:02 -0700)]
vpxdec: add --keep-going option
for debugging purposes.
continues decoding after receiving a decode error. will still exit with
an error after the current loop, ignoring remaining --loops
Jingning Han [Tue, 1 Jul 2014 23:10:44 +0000 (16:10 -0700)]
Re-design quantization process
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.
The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.
Pengchong Jin [Mon, 30 Jun 2014 16:52:27 +0000 (09:52 -0700)]
Store/read 16x16 block statistics obtained from the first pass
Add a conditional compile flag for this feature. Also add a
switch to enable the encoder to use these statistics in the
second pass. Currently, the switch is turned off.
Yunqing Wang [Tue, 1 Jul 2014 19:18:27 +0000 (12:18 -0700)]
Elevate NEWMV mode checking threshold in real time
The current threshold is knid of low, and in many cases NEWMV
mode is checked but not picked as the best mode. This patch
added a speed feature to increase NEWMV threshold, so that
less partition mode checking goes to check NEWMV. This feature
is enabled for speed 6 and 7.
Rtc set borg tests showed:
1. Speed 6, overall psnr: -0.088%, ssim: -1.339%;
Average speedup on rtc set is 11.1%.
2. Speed 7, overall psnr: -0.505%, ssim: -2.320%
Average speedup on rtc set is 12.9%.
Jingning Han [Mon, 30 Jun 2014 16:49:48 +0000 (09:49 -0700)]
Remove unused set_mode_info function
When the frame is intra coded only, the encoder takes the RD
coding flow. Hence the function set_mode_info is not practically
in use. This commit removes it and the associated conditional
branches.
Yunqing Wang [Sat, 28 Jun 2014 00:44:32 +0000 (17:44 -0700)]
Enable encode breakout in real time
For real time speed 7, once encode breakout is on(i.e. encoding
setting --static-thresh=1), a proper encode breakout threshold
is set to speed up the encoder.
Set --static-thresh=1, RTC set borg test showed a slight overall
psnr loss of 0.162%, but ssim gain of 0.287%. The average speedup
on RTC set is 6%, and for some clips, the speedup can be 10+%.
Yunqing Wang [Thu, 1 May 2014 22:14:39 +0000 (15:14 -0700)]
Decide the partitioning threshold from the variance histogram
Before encoding a frame, calculate and store each 16x16 block's
variance of source difference between last and current frame.
Find partitioning threshold T for the frame from its variance
histogram, and then use T to make partition decisions.
Comparing with fixed 16x16 partitioning, rtc set test showed an
overall psnr gain of 3.242%, and ssim gain of 3.751%. The best
psnr gain is 8.653%.
The overall encoding speed didn't change much. It got faster for
some clips(for example, 12% speedup for vidyo1), and a little
slower for others.
Also, a minor modification was made in datarate unit test.