Yunqing Wang [Fri, 4 Mar 2011 00:02:45 +0000 (19:02 -0500)]
Write SSSE3 sub-pixel filter function
1. Process 16 pixels at one time instead of 8.
2. Add check for both xoffset =0 and yoffset=0, which happens
during motion search.
This change gave encoder 1%~3% performance gain.
Ralph Giles [Tue, 8 Mar 2011 15:14:12 +0000 (07:14 -0800)]
Fix a multi-line format-string warning.
GCC 4.5 and 4.6 both issue a warning about the multi-line format
string introduced in bc9c30a0, which also changed the whitespace
in the associated stt file by line-wrapping the long format string.
Instead, use multiple string constants, which the compiler will
concatenate. This maintains the original formatting, but remains
legible within the standard line length.
John Koleszar [Tue, 8 Mar 2011 01:58:37 +0000 (20:58 -0500)]
correct zbin boost for splitmv mode
Disable zbin boost in SPLITMV mode as intended. Was incorrectly looking
at vp8_ref_frame_order instead of vp8_mode_order when comparing against
SPLITMV. This condition should have always been false, as SPLITMV is
not in the range of valid reference frames.
Paul Wilkins [Mon, 7 Mar 2011 15:58:07 +0000 (15:58 +0000)]
Improved key frame detection.
In some cases where clips have been encoded with
borders (eg. some wide-screen content where there is a
border top and bottom and slide shows containing portrait
format photographs (border left and right)) key frames were
not being correctly detected.
The new code looks to measure cases where a portion of
the image can be coded equally easily using intra or inter
modes and where the resulting error score is also very low.
These "neutral" areas are then discounted in the key frame
detection code.
Paul Wilkins [Mon, 7 Mar 2011 15:11:09 +0000 (15:11 +0000)]
Improved KF insertion after fades to still.
This code extends what was previously done for GFs, to pick
cases where insertion of a key frame after a fade (or other
transition or complex motion) followed by a still section, will
be beneficial and will reduce the number of forced key frames.
John Koleszar [Wed, 2 Mar 2011 22:02:44 +0000 (17:02 -0500)]
Fix drastic undershoot in long form content
When the modified_error_left accumulator exceeds INT_MAX, an incorrect
cast to int resulted in a negative value, causing the rate control to
allocate no bits to that keyframe group, leading to severe undershoot
and subsequent poor quality.
This error was exposed by the recent change to the rolling target and
actual spend accumulators in commit 305be4e4 which fixed them to
actually calculate the average value rather than be re-initialized
on every frame to the average per-frame bitrate. When this bug was
triggered, the target bitrate could be 0, so the rolling target
becomes small, which causes the undershoot. The code prior to 305be4e4
did not exhibit this behavior because the rolling target was always
set to a reasonable value and was independent of the actual target
bitrate. With this patch, the actual target bitrate is calculated
correctly, and the rate control tracks as expected.
This cast was likely added to silence a compiler warning on a comparison
between a double (modified_error_left) and an int (0). Instead, this
patch removes the cast and changes the comparison to be against 0.0,
which should prevent the warning from reoccuring.
This fixes issue #289. Special thanks to gnafu for his efforts in
reporting and debugging this fix.
John Koleszar [Tue, 1 Mar 2011 01:16:14 +0000 (20:16 -0500)]
change CFLAGS for 64 bit icc builds
AMD64 only implies SSE2, not SSE3. There aren't any known cases where
icc was generating SSE3 instructions since all the vectorizable code
is already in handwritten asm, so this fix is included mostly for
correctness. Fixes issue #259.
John Koleszar [Tue, 1 Mar 2011 01:06:56 +0000 (20:06 -0500)]
examples: use function to get iface pointers
MSVC can't pass the address of global variables in a DLL correctly
across DLL boundaries. This patch allows linking the examples to
a libvpx dll build. Fixes issue #268.
Tero Rintaluoma [Wed, 23 Feb 2011 11:27:27 +0000 (13:27 +0200)]
ARMv6 optimized half pixel variance calculations
Adds following ARMv6 optimized functions to the encoder:
- vp8_variance_halfpixvar16x16_h_armv6
- vp8_variance_halfpixvar16x16_v_armv6
- vp8_variance_halfpixvar16x16_hv_armv6
Yunqing Wang [Wed, 16 Feb 2011 17:00:25 +0000 (12:00 -0500)]
Allocate source buffers to be multiples of 16
Currently, when the video frame width is not multiples of 16, the
source buffer has a stride of non-multiples of 16, which forces
an unaligned load in SAD function and hurts the performance. To
avoid that, this change allocates source buffers to be multiples
of 16.
Yunqing Wang [Mon, 14 Feb 2011 21:23:49 +0000 (16:23 -0500)]
Improve vp8_sad16x16_sse3 function
In real-time mode, vp8_sad16x16 function is called heavily in
motion search part. Improvement of this function gives 1.2%
encoding performance gain (real-time mode, tulip clip).
Yunqing Wang [Fri, 11 Feb 2011 14:43:37 +0000 (09:43 -0500)]
Add improved_mv_pred flag in real-time mode
As mentioned in check-in "Improve motion search in real-time mode",
MV prediction calculation causes speed loss for speed 7 and above.
This change added a flag to turn off this calculation for speed>6
in real-time mode.
Yunqing Wang [Tue, 8 Feb 2011 00:16:15 +0000 (19:16 -0500)]
Improve motion search in real-time mode
Applied better MV prediction in real-time mode, which improves
the encoding quality.
Used quarter-pixel search instead of iterative sub-pixel search
for speed >=5 to improve encoding performance.
Tests on the test set showed:
1. For speed=-5, quality improvement: 1.7% on AvgPSNR and 2.1%
on SSIM, performance improvement: 3.6% (This counts in the
performance lose caused by MV prediction calculation in "Improve
MV prediction in vp8_pick_inter_mode() for speed>3").
2. For speed=-8, quality improvement: 2.1% on AvgPSNR and 2.5%
on SSIM. but, 6.9% performance decrease because of MV prediction
calculation. This should be improved later.