Jingning Han [Fri, 30 May 2014 01:14:17 +0000 (18:14 -0700)]
Fast computation path for forward transform and quantization
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.
It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.
In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.
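The decision described in this commit can be illustrated with a minimal sketch, written in Python rather than the encoder's C. The function name, the thresholds, and the exact rule are assumptions made for illustration; the commit message does not state them. The sketch also assumes that variance is the unnormalized sum of squared deviations, so that sse - variance is the energy of the block mean.

```python
from enum import Enum


class TxPath(Enum):
    SKIP = "all coefficients zero"
    DC_ONLY = "dc coefficient only"
    FULL = "full transform and quantization"


def select_tx_path(sse, variance, ac_thresh, dc_thresh):
    """Pick a coding path from residual statistics (hypothetical rule).

    variance stands in for the AC energy of the residual block and
    sse - variance for its DC energy. Both thresholds are placeholders.
    """
    dc_energy = sse - variance
    if variance >= ac_thresh:
        # AC energy is large enough that some AC coefficients may survive.
        return TxPath.FULL
    if dc_energy >= dc_thresh:
        # Only the dc coefficient is expected to quantize to non-zero.
        return TxPath.DC_ONLY
    # Everything is expected to quantize to zero.
    return TxPath.SKIP
```

In a real encoder the thresholds would presumably depend on the quantizer step size, which is not modeled here.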
hkuang [Mon, 9 Jun 2014 23:01:53 +0000 (16:01 -0700)]
Add mode info arrays and mode info index.
In non frame-parallel decoding, this works the same way as
the current decoding scheme. Each time the decoder finishes
decoding a frame, it swaps the current mode info pointer
and the previous mode info pointer if the decoded frame needs
to be shown. Both the mode info pointer and the previous mode
info pointer point into the mode info arrays.
In frame-parallel decoding, this becomes more complicated
because the current frame's mode info pointer is shared with the
next frame as its previous mode info pointer. When one decoder
thread finishes decoding one frame and starts to work on the next
available frame, it needs to retain the decoded frame's mode
info pointers until the next frame finishes decoding. The mode info
index serves this purpose. The decoder uses one buffer in the
mode info arrays for the current frame and the other buffer to
save the previously decoded frame's mode info.
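The non frame-parallel swap described above can be sketched as follows. This is a Python illustration only; the class and attribute names are hypothetical and do not come from the decoder source, and the frame-parallel sharing between threads is not modeled.

```python
class ModeInfoBuffers:
    """Two mode info buffers plus an index selecting the current one."""

    def __init__(self, size):
        self.arrays = [[None] * size, [None] * size]
        self.idx = 0  # index of the buffer used for the current frame

    @property
    def current(self):
        return self.arrays[self.idx]

    @property
    def previous(self):
        return self.arrays[1 - self.idx]

    def finish_frame(self, show_frame):
        # Swap the roles of the two buffers only when the frame is shown.
        if show_frame:
            self.idx = 1 - self.idx
```

Flipping an index instead of exchanging two raw pointers keeps both buffers reachable from the arrays at all times, which is the property the frame-parallel case relies on.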
Yunqing Wang [Thu, 29 May 2014 23:53:23 +0000 (16:53 -0700)]
Use small transform size in non-rd real-time mode
In non-rd real-time mode, choosing a smaller transform size in
encoding gives better video quality and a good speed gain compared
with choosing a larger transform size. This patch sets the tx size
search method to ALLOW_8X8, which is better than using 4x4 or the
larger sizes.
Borg tests on rtc set at speed 6 showed significant gain on quality.
PSNR gain: 11.034% and SSIM gain: 15.466%.
The speed gain is 5% - 12% for <720p clips, and 2% - 7% for
720p clips.
Adrian Grange [Fri, 6 Jun 2014 17:37:22 +0000 (10:37 -0700)]
Revert "Removing this_frame_stats member from TWO_PASS struct."
Use of stack frame variable "fps" beyond the lifetime of the function.
fps is passed as a parameter to output_stats and stored in the
packet holding this encoded frame. The packet's lifetime extends
beyond that of the calling function.
James Zern [Fri, 6 Jun 2014 03:52:26 +0000 (20:52 -0700)]
Merge changes I0e4d807f,Ia5ff575c,Ie4a1f313
* changes:
gen_msvs_*proj.sh: strip SRC_PATH_BARE from obj names
*.mk: pass SRC_PATH_BARE to all GEN_VCPROJ invocations
build/msvs: fix builds in source dirs with spaces
Jingning Han [Wed, 28 May 2014 18:18:33 +0000 (11:18 -0700)]
Enable unit test for partial 16x16 inverse 2D-DCT
This commit enables unit test for SSSE3 16x16 inverse 2D-DCT with
10 non-zero coefficients. It includes a new test condition to
cover the potential overflow issue due to extremely coarse quantization.
Jingning Han [Tue, 3 Jun 2014 01:48:33 +0000 (18:48 -0700)]
Fix potential overflow issue in SSSE3 forward 8x8 2D-DCT
The SSSE3 implementation could overflow in its second 1-D
transform if all input residual pixels are close to 255. This
commit fixes the issue and re-enables the unit test on the SSSE3
version.
Jingning Han [Mon, 2 Jun 2014 23:40:01 +0000 (16:40 -0700)]
Rework unit test for 8x8 transformation
This commit reworks the unit test for 8x8 forward/inverse
transformation. It adds extreme input value test to detect overflow
issues in the intermediate steps.
It temporarily disables unit test for the SSSE3 version, which
showed overflow failure in the new test conditions.
Paul Wilkins [Tue, 3 Jun 2014 12:03:49 +0000 (13:03 +0100)]
Fix AQ mode 2 bug where delta causes Q 0.
In AQ mode 2 for kf/arf/gf the segment q delta
is calculated and then applied by re-quantization without
going through the rd loop again. If the base Q != 0
but the segment Q == 0 (lossless) this could give rise
to a situation where we have an illegal combination of
transform size and Q. (Q == 0 requires that all blocks
are coded 4x4 WHT).
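One way to avoid the illegal combination is to adjust the delta so that the segment Q index stays at least 1 whenever the base Q index is non-zero. The sketch below shows that idea in Python. The commit message does not say how the fix is implemented, so the function name and the clamping rule are assumptions.

```python
def clamp_segment_qindex_delta(base_qindex, qindex_delta):
    """Keep a lossy frame's segment out of lossless mode (Q == 0).

    Hypothetical helper: returns a delta such that
    base_qindex + delta != 0 whenever base_qindex != 0.
    """
    if base_qindex != 0 and base_qindex + qindex_delta == 0:
        # Smallest adjustment that leaves the segment at Q index 1.
        qindex_delta = 1 - base_qindex
    return qindex_delta
```

For example, with a base Q index of 10 a delta of -10 is adjusted to -9, while other deltas pass through unchanged.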
James Zern [Mon, 2 Jun 2014 22:58:32 +0000 (15:58 -0700)]
build/msvs: fix builds in source dirs with spaces
...when configured below the path containing spaces. configuring outside
the path containing spaces still won't work due to issues with the
makefiles, e.g.,
/path with spaces/git
/path with spaces/build1
/build2
configure/make in build1 will work, build2 will not
Jingning Han [Thu, 29 May 2014 19:50:54 +0000 (12:50 -0700)]
Add overflow check unit test for 16x16 inverse DCT/ADST transform
This commit applies quantization process with coarse quantization
step size to the forward transform coefficients and tests all the
inverse 16x16 DCT and ADST implementation versions with the
dequantized coefficients as input, to verify that the outcomes
match the prototype.
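The reason coarse quantization stresses the inverse transform can be seen in a small Python sketch of a quantize/dequantize round trip. The rounding rule and the function name are assumptions for illustration and are not taken from the test code.

```python
def quantize_dequantize(coeffs, step):
    """Round each coefficient to the nearest multiple of step.

    Hypothetical helper using round-to-nearest on the magnitude.
    """
    out = []
    for c in coeffs:
        q = (abs(c) + step // 2) // step  # quantized level (magnitude)
        out.append(q * step if c >= 0 else -q * step)
    return out
```

With a step of 64, a coefficient of 100 comes back as 128, so the dequantized input to the inverse transform can have a larger magnitude than anything the forward transform produced.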
hkuang [Fri, 23 May 2014 22:18:41 +0000 (15:18 -0700)]
Refactor the vp9_get_frame code for frame parallel.
In frame parallel decoding mode, there will still be several frames inside
the decoder when the application stops calling vpx_codec_decode to decode
frames. The application needs to keep calling vpx_codec_get_frame to get all
the remaining decoded frames in the decoder.
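The draining pattern described above can be sketched in Python with a toy decoder. ToyDecoder and drain are hypothetical stand-ins and are not the libvpx API; the only behavior borrowed from the description is that the caller keeps asking for frames until none remain.

```python
from collections import deque


class ToyDecoder:
    """Stand-in for a decoder that holds finished frames internally."""

    def __init__(self):
        self._done = deque()

    def decode(self, frame):
        # Pretend decoding completes and the frame is queued for output.
        self._done.append(frame)

    def get_frame(self):
        # Return the next finished frame, or None when nothing is left.
        return self._done.popleft() if self._done else None


def drain(decoder):
    """Collect every frame still held by the decoder."""
    frames = []
    while True:
        frame = decoder.get_frame()
        if frame is None:
            break
        frames.append(frame)
    return frames
```

After the last decode call, a single drain pass returns the buffered frames in order, and a second pass returns an empty list.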
Yaowu Xu [Fri, 30 May 2014 17:15:30 +0000 (10:15 -0700)]
Fix a problem of using an uninitialized parameter
This commit adds a call to set the speed features before initializing
motion search, fixing a problem where an uninitialized search method
was used before its value was set.
Marco Paniconi [Tue, 27 May 2014 23:44:17 +0000 (16:44 -0700)]
vp8 denoiser: fix to zero_mv mode selection.
In the current logic, if the sse for zero motion is smaller
than the sse for new_mv (i.e., best_sse), we may still end up
using the non-zero mv for denoising (if the magnitude of new_mv is above threshold).
This can happen for very noisy content, and can lead to artifacts.
This change ensures that we always use zero_mv (over new_mv) for
denoising if sse_zero_mv <= best_sse.
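The rule stated above can be written as a short Python sketch. The function name and the tuple representation of a motion vector are assumptions, and the rest of the denoiser's mode selection (including the magnitude threshold) is deliberately left out.

```python
def choose_denoise_mv(sse_zero_mv, best_sse, new_mv):
    """Pick the motion vector used for denoising (hypothetical helper).

    Zero motion wins whenever its sse is no worse than the sse of new_mv.
    """
    zero_mv = (0, 0)
    if sse_zero_mv <= best_sse:
        return zero_mv
    # Remaining selection logic is omitted; new_mv is used as a placeholder.
    return new_mv
```

Checking the sse comparison first means the magnitude of new_mv can no longer override a zero_mv that matches the source at least as well.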
Jingning Han [Wed, 28 May 2014 17:51:09 +0000 (10:51 -0700)]
Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs
This commit enables the SSSE3 implementation of the inverse 2D-DCT
with only the first 10 coefficients non-zero. It reduces the runtime
from 745 cycles for the SSE2 version to 538 cycles, i.e., a 27% speed-up.