John Koleszar [Wed, 2 Mar 2011 22:02:44 +0000 (17:02 -0500)]
Fix drastic undershoot in long form content
When the modified_error_left accumulator exceeds INT_MAX, an incorrect
cast to int resulted in a negative value, causing the rate control to
allocate no bits to that keyframe group, leading to severe undershoot
and subsequent poor quality.
This error was exposed by the recent change to the rolling target and
actual spend accumulators in commit 305be4e4 which fixed them to
actually calculate the average value rather than be re-initialized
on every frame to the average per-frame bitrate. When this bug was
triggered, the target bitrate could be 0, so the rolling target
becomes small, which causes the undershoot. The code prior to 305be4e4
did not exhibit this behavior because the rolling target was always
set to a reasonable value and was independent of the actual target
bitrate. With this patch, the actual target bitrate is calculated
correctly, and the rate control tracks as expected.
This cast was likely added to silence a compiler warning on a comparison
between a double (modified_error_left) and an int (0). Instead, this
patch removes the cast and changes the comparison to be against 0.0,
which should prevent the warning from reoccuring.
This fixes issue #289. Special thanks to gnafu for his efforts in
reporting and debugging this fix.
Tero Rintaluoma [Wed, 23 Feb 2011 11:27:27 +0000 (13:27 +0200)]
ARMv6 optimized half pixel variance calculations
Adds following ARMv6 optimized functions to the encoder:
- vp8_variance_halfpixvar16x16_h_armv6
- vp8_variance_halfpixvar16x16_v_armv6
- vp8_variance_halfpixvar16x16_hv_armv6
Yunqing Wang [Wed, 16 Feb 2011 17:00:25 +0000 (12:00 -0500)]
Allocate source buffers to be multiples of 16
Currently, when the video frame width is not multiples of 16, the
source buffer has a stride of non-multiples of 16, which forces
an unaligned load in SAD function and hurts the performance. To
avoid that, this change allocates source buffers to be multiples
of 16.
Yunqing Wang [Mon, 14 Feb 2011 21:23:49 +0000 (16:23 -0500)]
Improve vp8_sad16x16_sse3 function
In real-time mode, vp8_sad16x16 function is called heavily in
motion search part. Improvement of this function gives 1.2%
encoding performance gain (real-time mode, tulip clip).
Yunqing Wang [Fri, 11 Feb 2011 14:43:37 +0000 (09:43 -0500)]
Add improved_mv_pred flag in real-time mode
As mentioned in check-in "Improve motion search in real-time mode",
MV prediction calculation causes speed loss for speed 7 and above.
This change added a flag to turn off this calculation for speed>6
in real-time mode.
Yunqing Wang [Tue, 8 Feb 2011 00:16:15 +0000 (19:16 -0500)]
Improve motion search in real-time mode
Applied better MV prediction in real-time mode, which improves
the encoding quality.
Used quarter-pixel search instead of iterative sub-pixel search
for speed >=5 to improve encoding performance.
Tests on the test set showed:
1. For speed=-5, quality improvement: 1.7% on AvgPSNR and 2.1%
on SSIM, performance improvement: 3.6% (This counts in the
performance lose caused by MV prediction calculation in "Improve
MV prediction in vp8_pick_inter_mode() for speed>3").
2. For speed=-8, quality improvement: 2.1% on AvgPSNR and 2.5%
on SSIM. but, 6.9% performance decrease because of MV prediction
calculation. This should be improved later.
John Koleszar [Wed, 9 Feb 2011 17:50:17 +0000 (12:50 -0500)]
correct cost for implicit bit in mvs
Use 0xFFF0 vice 240 (0xF0) for determining whether the sometimes
implicit bit 3 will be transmitted. This is consistent with the decoder
and encode_mvcomponent().
Johann [Fri, 4 Feb 2011 22:44:31 +0000 (17:44 -0500)]
clarify *_offsets.asm differences
it's difficult to mux the *_offsets.c files because of header conflicts.
make three instead, name them consistently and partititon the contents
to allow building them as required.
John Koleszar [Fri, 4 Feb 2011 16:36:04 +0000 (11:36 -0500)]
correct quantizer initialization
The encoder was not correctly catching transitions in the quantizer
deltas. If a delta_q was set, then the quantizer would be reinitialized
on every frame, but if they transitioned to 0, the quantizer would
not be reinitialized, leading to a encode-decode mismatch.
This bug was triggered by commit 999e155, which sets a Y2 delta Q
for very low base Q levels.
Attila Nagy [Wed, 2 Feb 2011 11:10:27 +0000 (13:10 +0200)]
Delay auto key frame insertion in realtime configuration
Whe auto keyframe insertion is enabled and conditions are right (scene change)
the encoder can decide to insert a key frame and does a re-encoding. This can
introduce extra latency. In RT mode we do not do the re-encoding of the current
frame but force the next frame to key frame.
Attila Nagy [Wed, 26 Jan 2011 08:29:46 +0000 (10:29 +0200)]
Improved encoder threading
Reduce the number of sync points by letting each thread
continue imediatly with a new MB row.
Better multicore scaling, improves performance by 5-20% on ARM multicore.
Scott LaVarnway [Tue, 1 Feb 2011 00:53:02 +0000 (19:53 -0500)]
Removed prediction_error accumulation
from vp8cx_encode_intra_macro_block. prediction_error is used when
deciding if a frame should be a keyframe. After reviewing this with
Yaowu, it was pointed out that vp8cx_encode_intra_macro_block
is only called for keyframes, so the accumulation is unnecessary.
Scott LaVarnway [Mon, 31 Jan 2011 22:43:18 +0000 (17:43 -0500)]
Possible bug in vp8cx_encode_intra_macro_block
vp8_pick_intra4x4mby_modes uses the passed in distortion
for an early breakout. The best distortion was never saved
and the distortion for TM_PRED was always used.