Yunqing Wang [Wed, 27 Apr 2011 17:40:39 +0000 (13:40 -0400)]
Use insertion sort instead of quick sort
Insertion sort performs better for sorting small arrays. In real-
time encoding (speed=-5), test on test set showed 1.7% performance
gain with 0% PSNR change in average.
John Koleszar [Wed, 27 Apr 2011 16:04:48 +0000 (12:04 -0400)]
vpxdec: test for frame corruption
This change simply exercises the VP8D_GET_FRAME_CORRUPTED control,
outputting a warning message at the end if the bit was set for any
frames. Should never produce any output for good input.
John Koleszar [Tue, 26 Apr 2011 16:36:03 +0000 (12:36 -0400)]
Update VP8DX_BOOL_DECODER_FILL to better detect EOS
Allow more reliable detection of truncated bitstreams by being more
precise with the count of "virtual" bits in the value buffer.
Specifically, the VP8_LOTS_OF_BITS value is accumulated into count,
rather than being assigned, which was losing the prior value,
increasing the required tolerance when testing for the error condition.
John Koleszar [Mon, 25 Apr 2011 19:02:54 +0000 (15:02 -0400)]
Move pick_frame_size() to ratectrl.c
This is a first step in cleaning up the redundancies between
vp8_calc_{auto_,}iframe_target_size. The pick_frame_size() function is
moved to ratectrl.c, and made to be the primary interface. This means
that the various calc_*_target_size functions can be made private.
John Koleszar [Tue, 19 Apr 2011 18:05:27 +0000 (14:05 -0400)]
Limit size of initial keyframe in one-pass.
Rather than using a default size of 1/2 or 3/2 seconds for the first
frame, use a fraction of the initial buffer level to give the
application some control.
This will likely undergo further refinement as size limits on key
frames are currently under discussion on codec-devel@, but this gives
much better behavior for small buffer sizes as a starting point.
Create a new input parameter to allow specifying
the packed frame stereo 3d format. A default value
of mono will be written in the absence of user
specified input
Ronald S. Bultje [Thu, 21 Apr 2011 20:35:02 +0000 (16:35 -0400)]
Fix overflow in temporal_filter_apply_sse2().
The accumulator array is an integer array, so use paddd instead of paddw
to add values to it. Fixes overflows when using large --arnr-maxframes
(>8) values.
Adrian Grange [Thu, 21 Apr 2011 22:45:57 +0000 (15:45 -0700)]
Corrected format specifiers in debug print statements
The arguments to these fprintfs are int not long int so
the format specifier should be "%d" and not "%ld". This
was writing garbage in the linux build.
make two compiler options explicit for Visual Studio projects
This patch changes the release configuration of MS VS projects to
explicitly use two compiler options "Maximize Speed (/O2)" and
"Favor fast code(/Ot)".
Johann [Wed, 13 Apr 2011 20:38:02 +0000 (16:38 -0400)]
keep values in registers during quantization
add an sse4 quantizer so we can use pinsrw/pextrw and keep values in xmm
registers instead of proxying through the stack. and as long as we're
bumping up, use some ssse3 instructions in the EOB detection (see ssse3
fast quantizer)
pick up about a percent on 32bit and about two on 64bit.
Scott LaVarnway [Thu, 21 Apr 2011 18:38:36 +0000 (14:38 -0400)]
Removed dc_diff from MB_MODE_INFO
The dc_diff flag is used to skip loopfiltering. Instead
of setting this flag in the decoder/encoder, we now check
for this condition in the loopfilter.
John Koleszar [Tue, 19 Apr 2011 20:08:45 +0000 (16:08 -0400)]
Remove unused kf rate variables
Remove tot_key_frame_bits and prior_key_frame_size[] as they were
tracked but never used. Remove intra_frame_target, as it was only
used to initialize prior_key_frame_size.
Refactor vp8_adjust_key_frame_context() some to remove unnecessary
calculations.
Johann [Fri, 15 Apr 2011 14:05:20 +0000 (10:05 -0400)]
modify SAVE_XMM for potential 64bit use
the win64 abi requires saving and restoring xmm6:xmm15. currently
SAVE_XMM and RESTORE XMM only allow for saving xmm6:xmm7. allow
specifying the highest register used and if the stack is unaligned.
Johann [Thu, 7 Apr 2011 17:17:22 +0000 (13:17 -0400)]
Add save/restore xmm registers in x86 assembly code
Went through the code and fixed it. Verified on Windows.
Where possible, remove dependencies on xmm[67]
Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.
Yunqing Wang [Mon, 18 Apr 2011 19:48:34 +0000 (15:48 -0400)]
Use sub-pixel search's SSE in mode selection
Passed SSE from sub-pixel search back to pick_inter_mode
function, which is compared with the encode_breakout to
see if we could skip evaluating the remaining modes.
Yunqing Wang [Fri, 15 Apr 2011 16:57:15 +0000 (12:57 -0400)]
Handle long delay between video frames in multi-thread decoder(issue 312)
This is reported by m...@hesotech.de (see issue 312):
"The decoder causes an access violation
when you decode the first frame, then make a pause of about
60 seconds and then decode further frames. But only if
vpx_codec_dec_cfg_t.threads> 1.
This is caused by a timeout of WaitForSingleObject.
When I change the definition of VPXINFINITE to INFINITE(0xFFFFFFFF),
the problem is solved."
Reproduced the crash and verified the changes on Windows platform.
This brings the behavior inline with the other platforms using sem_wait().
Johann [Fri, 15 Apr 2011 14:11:53 +0000 (10:11 -0400)]
remove dead code, add missing RESTORE_XMM
vp8_filter_block1d16_h4_ssse3 was never called
because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was
masked. however, if/when win64 used those registers for persistant data,
issues could/will arise.
Adrian Grange [Wed, 13 Apr 2011 19:56:46 +0000 (12:56 -0700)]
Fixed use of early breakout in vp8_pick_intra4x4mby_modes
Index i is used to detect early breakout from the first loop, but
its value is lost due to reuse in the second for loop. I moved
the position of the second loop and did some format cleanup.
John Koleszar [Wed, 13 Apr 2011 18:00:18 +0000 (14:00 -0400)]
Refactor lookahead ring buffer
This patch cleans up the source buffer storage and copy mechanism to
allow access through a standard push/pop/peek interface. This approach
also avoids an extra copy in the case where the source is not a
multiple of 16, fixing issue #102.
John Koleszar [Mon, 11 Apr 2011 15:29:23 +0000 (11:29 -0400)]
Change rc undershoot/overshoot semantics
This patch changes the rc_undershoot_pct and rc_overshoot_pct controls
to set the "aggressiveness" of rate adaptation, by limiting the
amount of difference between the target buffer level and the actual
buffer level which is applied to the target frame rate for this frame.
This patch was initially provided by arosenberg at logitech.com as
an attachment to issue #270. It was modified to separate these controls
from the other unrelated modifications in that patch, as well as to
use the pre-existing variables rather than introducing new ones.
John Koleszar [Mon, 11 Apr 2011 17:05:08 +0000 (13:05 -0400)]
Bugfix for error accumulator stats
Previous to commit de4e9e3, there was an early return in the alt-ref
case that was inadvertantly removed when the function was refactored
to return void. This patch restores the prior behavior.
Jim Bankoski [Mon, 28 Mar 2011 23:39:05 +0000 (16:39 -0700)]
fixed an overflow in ssim calculation
This commit fixed an overflow in ssim calculation, added register
save and restore to make sure assembly code working for x64 platform.
It also changed the sampling points to every 4x4 instead of 8x8 and
adjusted the constants in SSIM calculation to match the scale of
previous VPXSSIM.