Yunqing Wang [Fri, 6 Mar 2015 23:17:59 +0000 (15:17 -0800)]
vp9_ethread: fix me consts initialization to support aq_mode=3 encoding
While turning on "--aq_mode=3", the quantizers are updated by each
thread. Fixed the me consts initialization function to make sure
that the correct thread data are updated.
Yunqing Wang [Thu, 5 Mar 2015 17:44:18 +0000 (09:44 -0800)]
Modify the setting of transform skip flags in non-rd mode
While searching for the best mode in non-rd case, SSE of
a partition block is calculated and the transform size is set.
This patch rewrites the skip checking conditions based on
transform size instead of partition size to be more precise.
Small gains were seen in rtc set borg test (speed 6).
AVG PSNR: 0.087%, overall PSNR: 0.073%, SSIM: 0.146%.
No noticeable speed change.
Johann [Wed, 4 Mar 2015 23:34:55 +0000 (15:34 -0800)]
Declare function used by 'once' with 'void' parameters
Visual Studio is exceptionally picky about this:
vp9_reconintra.c(900): warning C4113: 'void (__cdecl *)()' differs in
parameter lists from 'void (__cdecl *)(void)'
[.build-x86_64-win64-vs10\vpx.vcxproj]
Jingning Han [Wed, 4 Mar 2015 17:40:01 +0000 (09:40 -0800)]
Use SAD value to set chroma cost flag
This saves an extra 64x64 variance calculation and replaces two
32x32 variance functions with sad functions. The compression
performance change is unnoticeable.
Adrian Grange [Wed, 18 Feb 2015 17:40:34 +0000 (09:40 -0800)]
Make encoder buffer allocation dynamic
Frame buffers are now allocated dynamically on-demand.
Entries in the reference frame map, cm->ref_frame_map,
may now be set to -1 (INVALID_IDX) to indicate that
there is not a valid reference buffer in that "slot".
All slots in the reference frame map are now initialized
to the empty state (-1) and each buffer is initialized
to have a reference count of 0.
Jingning Han [Mon, 2 Mar 2015 21:51:12 +0000 (13:51 -0800)]
Properly handle the boundary blocks for integral projection search
Use rectangular block size for integral projection motion estimation
if the the 64x64 block has over half block outside the frame. This
avoids the issue that the motion information of these blocks is
dominated by the extended pixels, instead of the pixels of interest.
Deb Mukherjee [Tue, 3 Mar 2015 20:26:41 +0000 (12:26 -0800)]
dc quantizer fix for 32x32 transforms
The rounding factor needs to be scaled down by a factor of 2.
Also, the quantized and dequantized coefficients are memset to 0
when dc quantizer is used.
hkuang [Tue, 3 Mar 2015 20:57:26 +0000 (12:57 -0800)]
Fix a tsan error bug in frame parallel decode.
A frame may be waiting for an out of border pixel from another
frame. A frame's row progress variable is set to -1 when start being decoded
and another frame may be waiting for -2 row pixel from this frame.
In this case, vp9_frameworker_wait will return directly and skip the waiting
which leads to tsan error between threads.
Frank Galligan [Tue, 3 Mar 2015 19:20:11 +0000 (11:20 -0800)]
VP9: turn on tile-columns and frame-parallel-mode by default
Most of the current decoders use tile-based multithreading. Also
most of the current decoders need frame_parallel_decoding_mode
turned on to enable multithreaded decoding. tile-columns is
limited by resolution, so setting to max (6) is fine.
Yaowu Xu [Mon, 2 Mar 2015 23:35:58 +0000 (15:35 -0800)]
Adapt color sensitiviy threshold to luma signal energy
Instead using only a fixed threshold, this commit adapts the threshold
for color sensitivity decision to luma signal energy: chroma channel's
sse is at least 1/6 of that in luma for color sensitivity flag to be
set to active.
This recoups a large portion of the speed loss due to accounting for
chroma component costs in RTC mode decision.
Yunqing Wang [Tue, 3 Mar 2015 00:19:23 +0000 (16:19 -0800)]
fix a race condition caused by intra function pointer initialization
This patch fixed webm issue 962.
(https://code.google.com/p/webm/issues/detail?id=962)
The data races occurred when an encoder and a decoder were created
at the same time, and the function pointers were initialized twice.
Jingning Han [Fri, 27 Feb 2015 21:35:22 +0000 (13:35 -0800)]
Use variance metric for integral projection vector match
This commit replaces the SAD with variance as metric for the
integral projection vector match. It improves the search accuracy
in the presence of slight light change. The average speed -6
compression performance for rtc set is improved by 1.7%. No speed
changes are observed for the test clips.
Jingning Han [Fri, 27 Feb 2015 23:37:18 +0000 (15:37 -0800)]
Fix source frame border extension
This commit fixes an issue in source frame border extension. It
causes certain frame resolution such as 640x480 to have a portion
of the right/bottom extension filled by zeros, which misleads
motion search and degrades transform coding performance when large
block size is used.
This fix improves the speed 2 compression performance of a few
yt sequence, typically ranging from 1% - 2%, up to 5% at median
to low bit-rate.
James Zern [Fri, 27 Feb 2015 07:14:54 +0000 (23:14 -0800)]
test-data.mk: fix perf test data dependency
both the encode and decode perf tests require niklas_1280_720_30.yuv
broken since: 28eebf3 Merge "tests: add a shorter 720p test clip" 7839d03 tests: add a shorter 720p test clip
James Zern [Thu, 26 Feb 2015 03:09:59 +0000 (19:09 -0800)]
tests: add a shorter 720p test clip
niklas_1280_720_30.y4m 60 frames @ 30fps
only a small number of frames are being used; this reduces the test data
download size in non-perf-test cases by >500M.
retain niklas_1280_720_30.yuv for encode+decode perf tests
Jingning Han [Mon, 23 Feb 2015 20:33:24 +0000 (12:33 -0800)]
Motion compensated reference refinement
This commit applies one-step refinement search to the resulting
motion vector of the integral projectiion based motion estimation,
per 64x64 block. It improves the coding performance of speed -6.
pedestrian 1080p 500 kbps
51735 b/f, 36.794 dB, 16044 ms ->
51382 b/f, 36.793 dB, 16282 ms
cloud 1080p 500 kbps
24081 b/f, 37.988 dB, 14016 ms ->
23597 b/f, 38.076 dB, 12774 ms
vidyo1 720p 1000 kbps
16552 b/f, 40.514 dB, 8279 ms ->
16553 b/f, 40.543 dB, 8510 ms
The rtc set compression performance is improved by 0.5%.
Jingning Han [Tue, 24 Feb 2015 20:04:09 +0000 (12:04 -0800)]
Fix high bit-depth loop-filter sse2 compiling issue - part 1
The intrinsic statement _mm_subs_epi16() should take immediate.
Feeding variable as its input argument will cause compile failure
in older version gcc.