Yunqing Wang [Fri, 10 Aug 2012 19:35:55 +0000 (12:35 -0700)]
Encoder denoiser performance improvement
The denoiser function was modified to reduce the computational
complexity.
1. The denoiser c function modification:
The original implementation calculated pixel's filter_coefficient
based on the pixel value difference between current raw frame and last
denoised raw frame, and stored them in lookup tables. For each pixel c,
find its coefficient using
filter_coefficient[c] = LUT[abs_diff[c]];
and then apply filtering operation for the pixel.
The denoising filter costed about 12% of encoding time when it was
turned on, and half of the time was spent on finding coefficients in
lookup tables. In order to simplify the process, a short cut was taken.
The pixel adjustments vs. pixel diff value were calculated ahead of time.
adjustment = filtered_value - current_raw
= (filter_coefficient * diff + 128) >> 8
The adjustment vs. diff curve becomes flat very quick when diff increases.
This allowed us to use only several levels to get a close approximation
of the curve. Following the denoiser algorithm, the adjustments are
further modified according to how big the motion magnitude is.
2. The sse2 function was rewritten.
This change made denoiser filter function 3x faster, and improved the
encoder performance by 7% ~ 10% with the denoiser on.
Yunqing Wang [Tue, 21 Aug 2012 17:52:35 +0000 (10:52 -0700)]
Add biasing to ZEROMV for videos with static background
For videos with big static background(such as video conferencing
clips), the mode decision was biased to ZEROMV in order to
obtain a stable background. The percentage of ZEROMV on last
frame was used to predict if there is static area in current frame,
and checking already-encoded neighboring macroblocks' motion
vectors to make sure the local area has low motion.
Yunqing Wang [Tue, 21 Aug 2012 00:31:44 +0000 (17:31 -0700)]
Fix inter_zz_count calculation bug
The current way of counting inter_zz_count doesn't work correctly
in multi-threaded encoding. Calculating it after the frame is
encoded fixed the problem.
James Zern [Wed, 15 Aug 2012 18:45:12 +0000 (11:45 -0700)]
fix msvc configure
visual studio targets do not depend on executables, only the projects
produced.
tested with --target=x86-win32-vs9
fixes:
...
make[1]: *** No rule to make target `test_libvpx', needed by `.bins'.
Stop.
Makefile:17: recipe for target `.DEFAULT' failed
Mike Frysinger [Wed, 15 Aug 2012 15:55:31 +0000 (11:55 -0400)]
Parse out arm isa targets from dumpmachine
The current parsing logic of the dumpmachine tuple lacks any arm
cases which means tgt_isa never gets set, so for all arm targets,
we get detected as generic-gnu. Add some basic arm checks here
so the automatic detection logic works.
Mike Frysinger [Tue, 14 Aug 2012 18:24:28 +0000 (14:24 -0400)]
do not error out on generic-gnu + --enable-shared
If you build with --enabled-shared on a Linux arch not explicitly
listed, the configure script will abort because it didn't detect
"linux" in the fallback generic-gnu tuple.
Since this is the fallback tuple and people are passing
--enable-shared, assume the user knows what they're in for.
Scott LaVarnway [Thu, 2 Aug 2012 18:58:09 +0000 (11:58 -0700)]
Added row based loopfilter
Interleaved loopfiltering with decode. For 1080p clips, up to 1%
performance gain. For 4k clips, up to 10% seen. This patch is required
for better "frame-based" multithreading.
Attila Nagy [Tue, 31 Jul 2012 11:04:45 +0000 (14:04 +0300)]
Fix potential encoder dead-lock after picture resize
The sync interval for the multithreaded encoder was considered as not changing
during the encoding. This is not true if picture size is changed.
The encoder could dead-lock because the main thread and the other threads were
using different sync interval.
Attila Nagy [Tue, 31 Jul 2012 08:52:10 +0000 (11:52 +0300)]
Fix encoder mem allocation when picture size is changed
After the picture size was changed to a bigger one, the internal memory was
corrupted and multithreaded encoder was deadlocking.
Memory for last frame's MVs, segmentation map and active map were allocated when
the compressor was created (vp8_create_compressor). Buffers need to be
reallocated when picture size is changed, so, the allocation was moved to
vp8_alloc_compressor_data, which is called every time the picture is resized.
Attila Nagy [Tue, 24 Apr 2012 12:17:28 +0000 (15:17 +0300)]
Optimizes updates of encoder block ptrs
Precalculated block ptrs do not need updates during encoding.
Set these at init stage.
Moved the allocation of 'mt_current_mb_col' (last encoded MB on each
row) to vp8_alloc_compressor_data(), so that it is correctly
reallocated when frame size is changing.
Yunqing Wang [Fri, 8 Jun 2012 15:17:50 +0000 (11:17 -0400)]
multi-res: add drop_frame support
Added drop_frame support in multi-resolution encoder.
If one frame is dropped at a lower-resolution level, the next
upper-resolution level encoder needs to encode that frame
independently without any lower-resolution level motion
information.
Another issue is that if one frame is dropped at some but not all
resolution levels, a frame after that one may use different set
of reference frames at different resolution levels. This reference
frame asynchronization could degrade motion search precision in
upper-resolution level encoding, which uses lower-resolution level
motion result. This change compares the lower-resolution and upper-
resolution level's reference frames. If they are not the same, the
upper-resolution level encoder can not use lower-resolution level
motion result.
Yunqing Wang [Wed, 11 Jul 2012 18:43:51 +0000 (11:43 -0700)]
multi-res: add parameter validity checking
Added validity checking in multi-res encoder. Disable spatial
resampling and frame dropping before we have those supports.
Also, deallocate the memory for all resolution levels once error
occurs.
John Koleszar [Tue, 10 Jul 2012 22:43:44 +0000 (15:43 -0700)]
keyframe_test: use a fixed speed step for realtime
The lower complexity modes may not generate a keyframe automatically.
This behavior was found when running under Valgrind, as the slow
performance caused the speed selection to pick lower complexities than
when running natively. Instead, use a fixed complexity for the
realtime auto keyframe test.
Yunqing Wang [Thu, 28 Jun 2012 17:37:53 +0000 (10:37 -0700)]
Add unit test for vp8_sixtap_predict functions
This unit test tests vp8_sixtap_predict function against preset
data and random generated data. The test against preset data
checks the correctness of the functions, and the test against
random data checks if the optimized six-tap predictor functions
generate matching result as the c functions. It tests the
following functions:
vp8_sixtap_predict16x16_c
vp8_sixtap_predict16x16_mmx
vp8_sixtap_predict16x16_sse2
vp8_sixtap_predict16x16_ssse3
Yunqing Wang [Mon, 2 Jul 2012 23:50:48 +0000 (16:50 -0700)]
Add 0 offsets handling in SSSE3 sixtap_predict functions
This patch fixed issue 458 by calling copy function when both
offsets are 0, which guarantees the SSSE3 functions output
same result as the c function for all possible offsets.
John Koleszar [Fri, 29 Jun 2012 19:15:00 +0000 (12:15 -0700)]
Build unit test driver from the default target
We need an easy way to build the unit test driver without running the
tests. This enables passing options like --gtest_filter to the
executable, which can't be done very cleanly when running under
`make test`.
Fixed a number of compiler errors/warnings when building the tests
in various configurations by Jenkins.