hkuang [Tue, 17 Jun 2014 01:23:06 +0000 (18:23 -0700)]
Introduce FrameWorker for decoding.
When decoding in serial mode, there will be only
one FrameWorker doing decoding. When decoding in
parallel mode, there will be several FrameWorkers
doing decoding in parallel.
Jim Bankoski [Fri, 20 Jun 2014 20:52:06 +0000 (13:52 -0700)]
Validate error checking code in decoder.
This patch adds a mechanism for insuring error checking on invalid files
by creating a unit test that runs the decoder and tests that the error
code matches what's expected on each frame in the decoder.
Disabled for now as this unit test will segfault with existing code.
Jingning Han [Thu, 19 Jun 2014 16:24:09 +0000 (09:24 -0700)]
Allow key frame more flexibility in mode search
This commit allows the key frame to search through more prediction
modes and more flexible block sizes. No speed change observed. The
coding performance for rtc set is improved by 1.7% for speed -5 and
3.0% for speed -6.
hkuang [Fri, 13 Jun 2014 21:14:02 +0000 (14:14 -0700)]
Add superframe support for frame parallel decoding.
A superframe is a bunch of frames that bundled as one frame. It is mostly
used to combine one or more non-displayable frames and one displayable frame.
For frame parallel decoding, libvpx decoder will only support decoding one
normal frame or a super frame with superframe index.
If an application pass a superframe without superframe index or a chunk
of displayable frames without superframe index to libvpx decoder, libvpx
will not decode it in frame parallel mode. But libvpx decoder still could
decode it in serial mode.
Yunqing Wang [Tue, 17 Jun 2014 23:59:28 +0000 (16:59 -0700)]
Modify non-rd intra mode checking
Speed 6 uses small tx size, namely 8x8. max_intra_bsize needs to
be modified accordingly to ensure valid intra mode checking.
Borg test on RTC set showed an overall PSNR gain of 0.335% in speed
-6.
This also changes speed -5 encoding by allowing DC_PRED checking
for block32x32. Borg test on RTC set showed a slight PSNR gain of
0.145%, and no noticeable speed change.
Jingning Han [Wed, 18 Jun 2014 17:50:38 +0000 (10:50 -0700)]
Separate rate-distortion modeling for DC and AC coefficients
This is the first step to rework the rate-distortion modeling used
in rtc coding mode. The overall goal is to make the modeling
customized for the statistics encountered in the rtc coding.
This commit makes encoder to perform rate-distortion modeling for
DC and AC coefficients separately. No speed changes observed.
The coding performance for pedestrian_area_1080p is largely
improved:
speed -5, from 79558 b/f, 37.871 dB -> 79598 b/f, 38.600 dB
speed -6, from 79515 b/f, 37.822 dB -> 79544 b/f, 38.130 dB
Overall performance for rtc set at speed -6 is improved by 0.67%.
James Zern [Fri, 13 Jun 2014 03:51:39 +0000 (20:51 -0700)]
gen_msvs_proj: fix in tree configure under cygwin
strip trailing '/' from paths, this is later converted to '\' which
causes execution errors for obj_int_extract/yasm. vs10+ wasn't affected
by this issue, but make the same change for consistency.
gen_msvs_proj:
+ add missing '"' to obj_int_extract call
unlike gen_msvs_vcproj, the block is duplicated
missed in: 1e3d9b9 build/msvs: fix builds in source dirs with spaces
Pengchong Jin [Mon, 16 Jun 2014 16:17:45 +0000 (09:17 -0700)]
skip the un-necessary motion search in the first pass
This patch allows the VP9 encoder to skip the un-necessary
motion search in the first pass. It computes the motion error
of 0,0 motion using the last source frame as the reference,
and skips the further motion search if this error is small.
Borg test shows overall the patch gives PSNR gain (derf -0.001%,
yt 0.341%, hd 0.282%). Individual clips may have PSNR gain or
loss. The best PSNR performance is 7.347% and the worst is -0.662%.
The first pass encoding speedup for slideshow clips is over 30%.
Marco Paniconi [Fri, 13 Jun 2014 17:08:09 +0000 (10:08 -0700)]
Allow for deblocking temporal-denoised signal.
Allow for an option to selectively apply the deblocking loop filter to the denoised
raw block, based on the denoised state (no-filter, filter with zero motion, or filter with non-zero motion)
of the current block and its upper and left denoised block.
This helps to reduce some blocking artifacts from the motion-compensated denoising.
Jingning Han [Thu, 12 Jun 2014 19:23:06 +0000 (12:23 -0700)]
Fix out of boundary memory read in fuzz test on vpxdec
This commit fixes frame header decoding for superframe index, to
prevent out of boundary memory read triggered by fuzz test
vector. It resolves a chromium security violation issue
crbug.com/376802.
The issue was introduced in the change:
Add VPXD_SET_DECRYPTOR support to the VP9 decoder.
cl-id I88f86c8ff9af34e0b6531028b691921b54c2fc48
where the buffer was read before validation check on index offset
applied.
hkuang [Tue, 10 Jun 2014 21:48:16 +0000 (14:48 -0700)]
Delay decreasing reference count in frame-parallel decoding.
The current decoding scheme will decrease the reference count
of the output frame when finish decoding. Then the application
could copy the frame from the decoder buffer to application buffer.
In frame-parallel decoding, a decoded frame will not be outputted
until several frames later which depends on thread numbers. So
the decoded frame's reference count should be decreased only
after application finish copying the frame out. But due to the
limitation of vpx_codec_get_frame, decoder could not know when
application finish decoding. So use a index last_show_frame to
release the last output frame's reference count.
Johann [Fri, 13 Jun 2014 02:16:59 +0000 (19:16 -0700)]
Use lrand48 on Android
When building x86 assembly use lrand48 instead of the
undocumented inlined _rand function.
Android now supports rand()
https://android-review.googlesource.com/97731
but only for new versions. Original workaround:
https://gerrit.chromium.org/gerrit/15744
Jingning Han [Fri, 30 May 2014 01:14:17 +0000 (18:14 -0700)]
Fast computation path for forward transform and quantization
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.
It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.
In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.
Pengchong Jin [Wed, 4 Jun 2014 00:16:00 +0000 (17:16 -0700)]
skip un-neccessary motion search in the first pass
This patch allows the encoder to skip the
un-neccessary motion search in the first pass. It
calculates the error of the zero motion vector using
the last source frame as reference and skips the
further motion search in the first pass if the error
is small.
The encoding speedup of the first pass for slideshow
videos is over 30%. Borg test shows the overall PSNR
performance remain approximately the same (derf -0.009,
hd 0.387, yt 0.021, stdhd 0.065). Individual clips may
have either PSNR gain or loss. The worst PSNR perfomance
is from yt set, with a PSNR loss of -1.1.