Paul Wilkins [Thu, 12 May 2011 16:01:55 +0000 (17:01 +0100)]
Restructure of activity masking code.
This commit restructures the mb activity masking code
to better facilitate experimentation using different metrics
etc. and also allows for adjustment of the zero bin either
for encode only or both the encode and mode selection
stages
It also uses information from the current frame rather than
the previous frame and the default strength has been
reduced.
Yaowu Xu [Wed, 11 May 2011 02:57:51 +0000 (19:57 -0700)]
remove a variable no longer in use
The variable is introduced in commit 2e53e9e53 to make more use of
trellis quantization, but this is no longer necessary after RDMULT
was made adaptive in a number of later commits.
Johann [Tue, 10 May 2011 19:58:56 +0000 (15:58 -0400)]
set up Global Offset Table in recon
global values were being referenced, but the GOT was not being set up.
as the GOT is only required for PIC, this issue wasn't caught in the
default configuration.
Yunqing Wang [Fri, 6 May 2011 16:51:31 +0000 (12:51 -0400)]
Use diamond search to replace full search in full-pixel refining search
In NEWMV mode, currently, full search is used as the refining search
after n-step search. By replacing it with an iterative diamond search
of radius 1 largely reduced the computation complexity, but still
maintained the same encoding quality since the refining search is
done for every macroblock instead of only a small precentage of
macroblocks while using full search.
Tests on the test set showed a 3.4% encoding speed increase with none
psnr & ssim loss.
Yaowu Xu [Fri, 6 May 2011 16:00:44 +0000 (09:00 -0700)]
fix a bug related to gf_active_flags in multi-threaded encoder
Paul pointed out that the pointer to the gf_active_flags is not being
properly incremented in multithreaded encoder. This commit fixes the
issue by making sure the gf_active_ptr points to the starting of next
group of mb rows.
John Koleszar [Fri, 6 May 2011 15:48:50 +0000 (11:48 -0400)]
Don't override active_worst_quality in 2 pass
Commit db5057c introduced a bug in that the active_worst_quality
selected by the 2 pass rate controller was being overridden for key
frames, causing a severe quality loss.
vp8_fast_quantize_b_neon function updated and further optimized.
- match current C implementation of fast quantizer
- updated to use asm_enc_offsets for structure members
- updated ads2gas scripts to handle alignment issues
Aron Rosenberg [Fri, 6 May 2011 04:10:37 +0000 (00:10 -0400)]
Fix semaphore emulation on Windows
The existing emulation of posix semaphores on Windows uses SetEvent()
and WaitForSingleObject(), which implements a binary semaphore, not a
counting semaphore as implemented by posix. This causes deadlock when
used with the expected posix semantics. Instead, this patch uses the
CreateSemaphore() and ReleaseSemaphore() calls (introduced in Windows
2000) which have the expected behavior.
This patch also reverts commit eb16f00, which split a semaphore that
was being used with counting semantics into two binary semaphores.
That commit is unnecessary with corrected emulation.
Yunqing Wang [Thu, 5 May 2011 14:42:29 +0000 (10:42 -0400)]
Fix rare hang in multi-thread encoder on Windows
This patch is to fix a rare hang in multi-thread encoder that was
only seen on Windows. Thanks for John's help in debugging the
problem. More test is needed.
Yaowu Xu [Mon, 2 May 2011 22:27:14 +0000 (15:27 -0700)]
change to use fast ssim code for internal ssim calculations
The commit also removed the slow ssim calculation that uses a 7x7
kernel, and revised the comments to better describe how sample ssim
values are computed and averaged
Always use CFLAGS/LDFLAGS that point to headers and libvpx.a inside our
build tree before ones from the environment, which could reference
headers or libs outside the build tree.
Yunqing Wang [Wed, 27 Apr 2011 17:40:39 +0000 (13:40 -0400)]
Use insertion sort instead of quick sort
Insertion sort performs better for sorting small arrays. In real-
time encoding (speed=-5), test on test set showed 1.7% performance
gain with 0% PSNR change in average.
John Koleszar [Wed, 27 Apr 2011 16:04:48 +0000 (12:04 -0400)]
vpxdec: test for frame corruption
This change simply exercises the VP8D_GET_FRAME_CORRUPTED control,
outputting a warning message at the end if the bit was set for any
frames. Should never produce any output for good input.
John Koleszar [Tue, 26 Apr 2011 16:36:03 +0000 (12:36 -0400)]
Update VP8DX_BOOL_DECODER_FILL to better detect EOS
Allow more reliable detection of truncated bitstreams by being more
precise with the count of "virtual" bits in the value buffer.
Specifically, the VP8_LOTS_OF_BITS value is accumulated into count,
rather than being assigned, which was losing the prior value,
increasing the required tolerance when testing for the error condition.
John Koleszar [Mon, 25 Apr 2011 19:02:54 +0000 (15:02 -0400)]
Move pick_frame_size() to ratectrl.c
This is a first step in cleaning up the redundancies between
vp8_calc_{auto_,}iframe_target_size. The pick_frame_size() function is
moved to ratectrl.c, and made to be the primary interface. This means
that the various calc_*_target_size functions can be made private.
John Koleszar [Tue, 19 Apr 2011 18:05:27 +0000 (14:05 -0400)]
Limit size of initial keyframe in one-pass.
Rather than using a default size of 1/2 or 3/2 seconds for the first
frame, use a fraction of the initial buffer level to give the
application some control.
This will likely undergo further refinement as size limits on key
frames are currently under discussion on codec-devel@, but this gives
much better behavior for small buffer sizes as a starting point.
Create a new input parameter to allow specifying
the packed frame stereo 3d format. A default value
of mono will be written in the absence of user
specified input
Ronald S. Bultje [Thu, 21 Apr 2011 20:35:02 +0000 (16:35 -0400)]
Fix overflow in temporal_filter_apply_sse2().
The accumulator array is an integer array, so use paddd instead of paddw
to add values to it. Fixes overflows when using large --arnr-maxframes
(>8) values.
Adrian Grange [Thu, 21 Apr 2011 22:45:57 +0000 (15:45 -0700)]
Corrected format specifiers in debug print statements
The arguments to these fprintfs are int not long int so
the format specifier should be "%d" and not "%ld". This
was writing garbage in the linux build.
make two compiler options explicit for Visual Studio projects
This patch changes the release configuration of MS VS projects to
explicitly use two compiler options "Maximize Speed (/O2)" and
"Favor fast code(/Ot)".
Johann [Wed, 13 Apr 2011 20:38:02 +0000 (16:38 -0400)]
keep values in registers during quantization
add an sse4 quantizer so we can use pinsrw/pextrw and keep values in xmm
registers instead of proxying through the stack. and as long as we're
bumping up, use some ssse3 instructions in the EOB detection (see ssse3
fast quantizer)
pick up about a percent on 32bit and about two on 64bit.
Scott LaVarnway [Thu, 21 Apr 2011 18:38:36 +0000 (14:38 -0400)]
Removed dc_diff from MB_MODE_INFO
The dc_diff flag is used to skip loopfiltering. Instead
of setting this flag in the decoder/encoder, we now check
for this condition in the loopfilter.
John Koleszar [Tue, 19 Apr 2011 20:08:45 +0000 (16:08 -0400)]
Remove unused kf rate variables
Remove tot_key_frame_bits and prior_key_frame_size[] as they were
tracked but never used. Remove intra_frame_target, as it was only
used to initialize prior_key_frame_size.
Refactor vp8_adjust_key_frame_context() some to remove unnecessary
calculations.
Johann [Fri, 15 Apr 2011 14:05:20 +0000 (10:05 -0400)]
modify SAVE_XMM for potential 64bit use
the win64 abi requires saving and restoring xmm6:xmm15. currently
SAVE_XMM and RESTORE XMM only allow for saving xmm6:xmm7. allow
specifying the highest register used and if the stack is unaligned.
Johann [Thu, 7 Apr 2011 17:17:22 +0000 (13:17 -0400)]
Add save/restore xmm registers in x86 assembly code
Went through the code and fixed it. Verified on Windows.
Where possible, remove dependencies on xmm[67]
Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.
Yunqing Wang [Mon, 18 Apr 2011 19:48:34 +0000 (15:48 -0400)]
Use sub-pixel search's SSE in mode selection
Passed SSE from sub-pixel search back to pick_inter_mode
function, which is compared with the encode_breakout to
see if we could skip evaluating the remaining modes.
Yunqing Wang [Fri, 15 Apr 2011 16:57:15 +0000 (12:57 -0400)]
Handle long delay between video frames in multi-thread decoder(issue 312)
This is reported by m...@hesotech.de (see issue 312):
"The decoder causes an access violation
when you decode the first frame, then make a pause of about
60 seconds and then decode further frames. But only if
vpx_codec_dec_cfg_t.threads> 1.
This is caused by a timeout of WaitForSingleObject.
When I change the definition of VPXINFINITE to INFINITE(0xFFFFFFFF),
the problem is solved."
Reproduced the crash and verified the changes on Windows platform.
This brings the behavior inline with the other platforms using sem_wait().