Frank Galligan [Tue, 3 Mar 2015 19:20:11 +0000 (11:20 -0800)]
VP9: turn on tile-columns and frame-parallel-mode by default
Most of the current decoders use tile-based multithreading. Also
most of the current decoders need frame_parallel_decoding_mode
turned on to enable multithreaded decoding. tile-columns is
limited by resolution, so setting to max (6) is fine.
Jingning Han [Fri, 27 Feb 2015 21:35:22 +0000 (13:35 -0800)]
Use variance metric for integral projection vector match
This commit replaces the SAD with variance as metric for the
integral projection vector match. It improves the search accuracy
in the presence of slight light change. The average speed -6
compression performance for rtc set is improved by 1.7%. No speed
changes are observed for the test clips.
Jingning Han [Fri, 27 Feb 2015 23:37:18 +0000 (15:37 -0800)]
Fix source frame border extension
This commit fixes an issue in source frame border extension. It
causes certain frame resolution such as 640x480 to have a portion
of the right/bottom extension filled by zeros, which misleads
motion search and degrades transform coding performance when large
block size is used.
This fix improves the speed 2 compression performance of a few
yt sequence, typically ranging from 1% - 2%, up to 5% at median
to low bit-rate.
James Zern [Fri, 27 Feb 2015 07:14:54 +0000 (23:14 -0800)]
test-data.mk: fix perf test data dependency
both the encode and decode perf tests require niklas_1280_720_30.yuv
broken since: 28eebf3 Merge "tests: add a shorter 720p test clip" 7839d03 tests: add a shorter 720p test clip
James Zern [Thu, 26 Feb 2015 03:09:59 +0000 (19:09 -0800)]
tests: add a shorter 720p test clip
niklas_1280_720_30.y4m 60 frames @ 30fps
only a small number of frames are being used; this reduces the test data
download size in non-perf-test cases by >500M.
retain niklas_1280_720_30.yuv for encode+decode perf tests
Jingning Han [Mon, 23 Feb 2015 20:33:24 +0000 (12:33 -0800)]
Motion compensated reference refinement
This commit applies one-step refinement search to the resulting
motion vector of the integral projectiion based motion estimation,
per 64x64 block. It improves the coding performance of speed -6.
pedestrian 1080p 500 kbps
51735 b/f, 36.794 dB, 16044 ms ->
51382 b/f, 36.793 dB, 16282 ms
cloud 1080p 500 kbps
24081 b/f, 37.988 dB, 14016 ms ->
23597 b/f, 38.076 dB, 12774 ms
vidyo1 720p 1000 kbps
16552 b/f, 40.514 dB, 8279 ms ->
16553 b/f, 40.543 dB, 8510 ms
The rtc set compression performance is improved by 0.5%.
Jingning Han [Tue, 24 Feb 2015 20:04:09 +0000 (12:04 -0800)]
Fix high bit-depth loop-filter sse2 compiling issue - part 1
The intrinsic statement _mm_subs_epi16() should take immediate.
Feeding variable as its input argument will cause compile failure
in older version gcc.
Jingning Han [Mon, 23 Feb 2015 20:55:50 +0000 (12:55 -0800)]
Re-distribute hierarchical vector match pattern
This commit modifies the hierarchical vector match patter. It
avoids repeated SAD computation at same points. The function
vp9_vector_sad_sse2 is called 12 times per 64x64 block, instead
of 15 times as before. The effective coverage remains the same.
Yunqing Wang [Tue, 24 Feb 2015 18:37:05 +0000 (10:37 -0800)]
Fix ssse3 quantize_fp functions while skip=1
In ssse3 functions, DEFINE_ARGS macro hard codes qcoeff and dqcoeff
to r3 and r4. If skip is 1, qcoeff and dqcoeff need to be loaded
from the stack, which doesn't work because of the above definitions.
Currently, skip=1 case is not used in the encoder. This patch fixed
the issue, so it can be turned on later.
paulwilkins [Fri, 20 Feb 2015 13:41:25 +0000 (13:41 +0000)]
Account for rate error in GF group Q calculation.
When GF group adaptive maxQ is enabled this patch accounts
somewhat for accumulated error in the rate control.
This improves accuracy quite a bit on many clips especially
when there is overshoot.
Examples when the overshoot and undershoot command line
parameters are set to 100:
Hall @ 1200 overshoot is reduced from 67-24%.
Akiyo @ 400 undershoot is reduced from 28%-15%.
Setting a lower value for undershoot or overshoot still
reduces the error further.
Impact on metrics is mixed with some gains in average psnr
but generally a little lower (e.g. 0.5%) on overall and ssim.
The GF group adaptation is still off by default in this patch.
Compared to with the head, enabling this mode now gives
big average psnr gains on the YT sets (e.g. YT_HD >11.2%),
a drop in overall PSNR (YT-HD 3.9%) and a smaller drop or
neutral for SSIM.
Yunqing Wang [Thu, 19 Feb 2015 00:38:08 +0000 (16:38 -0800)]
Improve skip_txfm thresholds in the non-rd mode selection
Modified the thresholds of deciding whether or not to skip
the transforms in model_rd_for_sb_y(). Used zbin[] instead
of dequant[] to be more precise. Also, modified the checking
coditions.
Rtc set borg test results (at speed 6) showed:
average PSNR gain: 0.138%, overall PSNR gain: 0.158%,
and SSIM gain: 0.177%.
The data rate test was modified slightly as suggested by
Marco.
Jingning Han [Fri, 13 Feb 2015 19:23:45 +0000 (11:23 -0800)]
Integral projection based motion estimation
This commit introduces a new block match motion estimation
using integral projection measurement. The 2-D block and the nearby
region is projected onto the horizontal and vertical 1-D vectors,
respectively. It then runs vector match, instead of block match,
over the two separate 1-D vectors to locate the motion compensated
reference block.
This process is run per 64x64 block to align the reference before
choosing partitioning in speed 6. The overall CPU cycle cost due
to this additional 64x64 block match (SSE2 version) takes around 2%
at low bit-rate rtc speed 6. When strong motion activities exist in
the video sequence, it substantially improves the partition
selection accuracy, thereby achieving better compression performance
and lower CPU cycles.
The experiments were tested in RTC speed -6 setting:
cloud 1080p 500 kbps
17006 b/f, 37.086 dB, 5386 ms ->
16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster)
pedestrian_area 1080p 500 kbps
53537 b/f, 36.771 dB, 18706 ms ->
51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings)
hkuang [Mon, 9 Feb 2015 20:14:00 +0000 (12:14 -0800)]
Fix the frame parallel invalid file test failure on ARM.
There is a corner case that when a frame is corrupted, the following
inter frame decode worker will miss the previous failure. To solve
this problem, a need_resync flag needs to be added to master thread
to keep control of that.
James Zern [Sat, 14 Feb 2015 02:03:45 +0000 (18:03 -0800)]
loop_filter_rows_mt: remove dependency on 'last_height'
using this to control reallocation would miss a change if the function
were not called for every frame.
fixes potential memory corruption by the subsequent memset
James Zern [Sat, 14 Feb 2015 02:48:45 +0000 (18:48 -0800)]
test_vector_test: fix build with --disable-(vp8|vp9)
use VP[89]_INSTANTIATE_TEST_CASE case when possible to disable the tests if
the codec is unavailable.
broken since: be6aead Try again to merge branch 'frame-parallel' into master branch.
Yaowu Xu [Fri, 13 Feb 2015 22:53:11 +0000 (14:53 -0800)]
Fix an encoder/decode mismatch bug
This commit prevent the encoder to update last_frame_type when a frame
is dropped in the encoder. Prior to this fix, if there is a dropped
frame immediatedly after a key frame, decoder would have the value of
last_frame_type as key frame, different from encoder as the dropped
frame in encoder would have updated the value to an inter frame. This
leads to different probability update in encoder and decoder, thereby
encoder/decoder mismatch.