Marco Paniconi [Fri, 13 Jun 2014 17:08:09 +0000 (10:08 -0700)]
Allow for deblocking temporal-denoised signal.
Allow for an option to selectively apply the deblocking loop filter to the denoised
raw block, based on the denoised state (no-filter, filter with zero motion, or filter with non-zero motion)
of the current block and its upper and left denoised block.
This helps to reduce some blocking artifacts from the motion-compensated denoising.
hkuang [Tue, 10 Jun 2014 21:48:16 +0000 (14:48 -0700)]
Delay decreasing reference count in frame-parallel decoding.
The current decoding scheme will decrease the reference count
of the output frame when finish decoding. Then the application
could copy the frame from the decoder buffer to application buffer.
In frame-parallel decoding, a decoded frame will not be outputted
until several frames later which depends on thread numbers. So
the decoded frame's reference count should be decreased only
after application finish copying the frame out. But due to the
limitation of vpx_codec_get_frame, decoder could not know when
application finish decoding. So use a index last_show_frame to
release the last output frame's reference count.
Johann [Fri, 13 Jun 2014 02:16:59 +0000 (19:16 -0700)]
Use lrand48 on Android
When building x86 assembly use lrand48 instead of the
undocumented inlined _rand function.
Android now supports rand()
https://android-review.googlesource.com/97731
but only for new versions. Original workaround:
https://gerrit.chromium.org/gerrit/15744
Jingning Han [Fri, 30 May 2014 01:14:17 +0000 (18:14 -0700)]
Fast computation path for forward transform and quantization
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.
It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.
In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.
Pengchong Jin [Wed, 4 Jun 2014 00:16:00 +0000 (17:16 -0700)]
skip un-neccessary motion search in the first pass
This patch allows the encoder to skip the
un-neccessary motion search in the first pass. It
calculates the error of the zero motion vector using
the last source frame as reference and skips the
further motion search in the first pass if the error
is small.
The encoding speedup of the first pass for slideshow
videos is over 30%. Borg test shows the overall PSNR
performance remain approximately the same (derf -0.009,
hd 0.387, yt 0.021, stdhd 0.065). Individual clips may
have either PSNR gain or loss. The worst PSNR perfomance
is from yt set, with a PSNR loss of -1.1.
Tom Finegan [Wed, 11 Jun 2014 01:52:58 +0000 (18:52 -0700)]
Add target armv7s-darwin-gcc.
Really just armv7. This is a convenience target intended to make iOS
development with libvpx easier. Xcode projects with default settings
will fail to build when a framework lacks armv7s support when targetting
iOS7.
hkuang [Mon, 9 Jun 2014 23:01:53 +0000 (16:01 -0700)]
Add mode info arrays and mode info index.
In non frame-parallel decoding, this works the same way as
current decoding scheme. Every time after decoder finish
decoding a frame, it will swap the current mode info pointer
and previous mode info pointer if the decoded frame needs
to be shown. Both mode info pointer and previous mode info
pointer are from mode info arrays.
In frame-parallel decoding, this will become more complicated
as current frame's mode info pointer will be shared with next
frame as previous mode info pointer. But when one decoder
thread finishes decoding one frame and starts to work on next
available frame, it needs to retain the decoded frame's mode
info pointers until next frame finishes decoding. The mode info
index will serve this purpose. The decoder will use different
buffer in the mode info arrays and use the other buffer to save
previous decoded frame’s mode info.
Yunqing Wang [Thu, 29 May 2014 23:53:23 +0000 (16:53 -0700)]
Use small transform size in non-rd real-time mode
In non-rd real-time mode, choosing smaller transform size in
encoding gives better video quality and good speed gain than
choosing larger transform size. This patch set tx size search
method to ALLOW_8X8, which is better than using 4x4 or other
larger sizes.
Borg tests on rtc set at speed 6 showed significant gain on quality.
PSNR gain: 11.034% and SSIM gain: 15.466%.
The speed gain is 5% - 12% for <720p clips, and 2% - 7% for
720p clips.
Adrian Grange [Fri, 6 Jun 2014 17:37:22 +0000 (10:37 -0700)]
Revert "Removing this_frame_stats member from TWO_PASS struct."
Use of stack frame variable "fps" beyond the lifetime of the function.
fps is sent as a paremeter to output_stats and stored in the
packet holding this encoded frame. This has scope beyond the
lifetime of the calling function.
James Zern [Fri, 6 Jun 2014 03:52:26 +0000 (20:52 -0700)]
Merge changes I0e4d807f,Ia5ff575c,Ie4a1f313
* changes:
gen_msvs_*proj.sh: strip SRC_PATH_BARE from obj names
*.mk: pass SRC_PATH_BARE to all GEN_VCPROJ invocations
build/msvs: fix builds in source dirs with spaces
Jingning Han [Wed, 28 May 2014 18:18:33 +0000 (11:18 -0700)]
Enable unit test for partial 16x16 inverse 2D-DCT
This commit enables unit test for SSSE3 16x16 inverse 2D-DCT with
10 non-zero coefficients. It includes a new test condition to
cover the potential overflow issue due to extremely coarse quantization.
Jingning Han [Tue, 3 Jun 2014 01:48:33 +0000 (18:48 -0700)]
Fix potential overflow issue in SSSE3 forward 8x8 2D-DCT
The SSSE3 implementation might find a potential overflow issue in
its second 1-D transform, if all input residual pixels are close to
255. This commit fixes the issue and re-enables the unit test on
the SSSE3 version.
Jingning Han [Mon, 2 Jun 2014 23:40:01 +0000 (16:40 -0700)]
Rework unit test for 8x8 transformation
This commit reworks the unit test for 8x8 forward/inverse
transformation. It adds extreme input value test to detect overflow
issues in the intermediate steps.
It temporarily disables unit test for the SSSE3 version, which
showed overflow failure in the new test conditions.
Paul Wilkins [Tue, 3 Jun 2014 12:03:49 +0000 (13:03 +0100)]
Fix AQ mode 2 bug where delta causes Q 0.
In Aq mode 2 for kf/arf/gf the segment q delta
is calculated and then applied by re-quantization without
going through the rd loop again. If the base Q != 0
but the segment Q == 0 (lossless) this can could give rise
to a situation where we have an illegal combination of
transform size and Q. (Q == 0 requires that all blocks
are coded 4x4 WHT).