Ronald S. Bultje [Thu, 21 Apr 2011 20:35:02 +0000 (16:35 -0400)]
Fix overflow in temporal_filter_apply_sse2().
The accumulator array is an integer array, so use paddd instead of paddw
to add values to it. Fixes overflows when using large --arnr-maxframes
(>8) values.
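A minimal scalar sketch in C of the failure mode, not the SSE2 code itself: paddw adds packed 16-bit lanes and wraps modulo 65536, while paddd adds 32-bit lanes. The function names and the per-frame contribution of 8000 used in the note below are hypothetical, chosen only so that the wraparound appears at the ninth frame.

```c
#include <stdint.h>

/* Hypothetical scalar model of one paddw lane: a 16-bit add that wraps. */
static uint16_t add_wrap16(uint16_t acc, uint16_t v)
{
    return (uint16_t)(acc + v);
}

/* Accumulate per_frame over `frames` frames in a 16-bit lane (paddw-like). */
uint32_t accumulate16(unsigned frames, uint16_t per_frame)
{
    uint16_t acc = 0;
    unsigned i;
    for (i = 0; i < frames; i++)
        acc = add_wrap16(acc, per_frame);
    return acc;
}

/* Accumulate the same values in a 32-bit lane (paddd-like). */
uint32_t accumulate32(unsigned frames, uint16_t per_frame)
{
    uint32_t acc = 0;
    unsigned i;
    for (i = 0; i < frames; i++)
        acc += per_frame;
    return acc;
}
```

With a contribution of 8000 per frame, both versions agree at 8 frames (64000), but at 9 frames the 16-bit lane wraps to 6464 while the 32-bit lane holds 72000.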
Adrian Grange [Thu, 21 Apr 2011 22:45:57 +0000 (15:45 -0700)]
Corrected format specifiers in debug print statements
The arguments to these fprintfs are int not long int so
the format specifier should be "%d" and not "%ld". This
was writing garbage in the linux build.
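As an illustration of the rule behind this fix, the sketch below shows the two well-defined ways to print an int in C: use "%d" directly, or cast to long when "%ld" is wanted. The helper names are hypothetical and snprintf stands in for the fprintf calls touched by the commit.

```c
#include <stdio.h>

/* Correct: the argument is an int, so the specifier is "%d". */
int format_int(char *buf, size_t n, int value)
{
    return snprintf(buf, n, "%d", value);
}

/* Also correct: "%ld" is fine once the argument really is a long. */
int format_as_long(char *buf, size_t n, int value)
{
    return snprintf(buf, n, "%ld", (long)value);
}
```

Passing an int where "%ld" expects a long is undefined behavior. On targets where long is wider than int, such as 64-bit Linux, it typically prints garbage.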
make two compiler options explicit for Visual Studio projects
This patch changes the release configuration of MS VS projects to
explicitly use two compiler options "Maximize Speed (/O2)" and
"Favor fast code (/Ot)".
Johann [Wed, 13 Apr 2011 20:38:02 +0000 (16:38 -0400)]
keep values in registers during quantization
add an sse4 quantizer so we can use pinsrw/pextrw and keep values in xmm
registers instead of proxying through the stack. and as long as we're
bumping up, use some ssse3 instructions in the EOB detection (see ssse3
fast quantizer)
pick up about a percent on 32bit and about two on 64bit.
Scott LaVarnway [Thu, 21 Apr 2011 18:38:36 +0000 (14:38 -0400)]
Removed dc_diff from MB_MODE_INFO
The dc_diff flag is used to skip loopfiltering. Instead
of setting this flag in the decoder/encoder, we now check
for this condition in the loopfilter.
John Koleszar [Tue, 19 Apr 2011 20:08:45 +0000 (16:08 -0400)]
Remove unused kf rate variables
Remove tot_key_frame_bits and prior_key_frame_size[] as they were
tracked but never used. Remove intra_frame_target, as it was only
used to initialize prior_key_frame_size.
Refactor vp8_adjust_key_frame_context() some to remove unnecessary
calculations.
Johann [Fri, 15 Apr 2011 14:05:20 +0000 (10:05 -0400)]
modify SAVE_XMM for potential 64bit use
the win64 abi requires saving and restoring xmm6:xmm15. currently
SAVE_XMM and RESTORE_XMM only allow for saving xmm6:xmm7. allow
specifying the highest register used and if the stack is unaligned.
Johann [Thu, 7 Apr 2011 17:17:22 +0000 (13:17 -0400)]
Add save/restore xmm registers in x86 assembly code
Went through the code and fixed it. Verified on Windows.
Where possible, remove dependencies on xmm[67]
Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.
Yunqing Wang [Mon, 18 Apr 2011 19:48:34 +0000 (15:48 -0400)]
Use sub-pixel search's SSE in mode selection
Passed SSE from sub-pixel search back to pick_inter_mode
function, which is compared with the encode_breakout to
see if we could skip evaluating the remaining modes.
Yunqing Wang [Fri, 15 Apr 2011 16:57:15 +0000 (12:57 -0400)]
Handle long delay between video frames in multi-thread decoder (issue 312)
This is reported by m...@hesotech.de (see issue 312):
"The decoder causes an access violation
when you decode the first frame, then make a pause of about
60 seconds and then decode further frames. But only if
vpx_codec_dec_cfg_t.threads> 1.
This is caused by a timeout of WaitForSingleObject.
When I change the definition of VPXINFINITE to INFINITE(0xFFFFFFFF),
the problem is solved."
Reproduced the crash and verified the changes on Windows platform.
This brings the behavior in line with the other platforms using sem_wait().
Johann [Fri, 15 Apr 2011 14:11:53 +0000 (10:11 -0400)]
remove dead code, add missing RESTORE_XMM
vp8_filter_block1d16_h4_ssse3 was never called
because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was
masked. however, if/when win64 used those registers for persistent data,
issues could/will arise.
Adrian Grange [Wed, 13 Apr 2011 19:56:46 +0000 (12:56 -0700)]
Fixed use of early breakout in vp8_pick_intra4x4mby_modes
Index i is used to detect early breakout from the first loop, but
its value is lost due to reuse in the second for loop. I moved
the position of the second loop and did some format cleanup.
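The bug pattern can be sketched in C as follows. This is a hypothetical stand-in, not the code of vp8_pick_intra4x4mby_modes: the commit moved the second loop, whereas this sketch gives the second loop its own index, which is another way to keep the breakout position intact. All names and the cost model are assumptions.

```c
/* Hypothetical stand-in for a two-loop mode picker. Returns 1 if the first
   loop visited all n blocks, 0 if it broke out early. */
int first_loop_completed(const int *cost, int n, int budget, int *modes)
{
    int i, j;
    int total = 0;

    for (i = 0; i < n; i++) {
        total += cost[i];
        if (total > budget)
            break;          /* early breakout: i < n from here on */
    }

    /* Separate index j, so i still records where the first loop stopped.
       Reusing i here would always leave i == n and hide the breakout. */
    for (j = 0; j < n; j++)
        modes[j] = 0;

    return i == n;
}
```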
John Koleszar [Wed, 13 Apr 2011 18:00:18 +0000 (14:00 -0400)]
Refactor lookahead ring buffer
This patch cleans up the source buffer storage and copy mechanism to
allow access through a standard push/pop/peek interface. This approach
also avoids an extra copy in the case where the source is not a
multiple of 16, fixing issue #102.
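A push/pop/peek interface over a ring buffer might look like the sketch below. It is a simplified illustration under stated assumptions: the names, the fixed depth of 4, and the use of int in place of real frame buffers are all hypothetical and do not reflect the actual libvpx lookahead API.

```c
#include <stddef.h>

#define LA_CAPACITY 4            /* hypothetical lookahead depth */

struct lookahead {
    int frames[LA_CAPACITY];     /* stand-in for source frame buffers */
    size_t read_idx;             /* index of the oldest entry */
    size_t count;                /* number of queued entries */
};

void la_init(struct lookahead *q)
{
    q->read_idx = 0;
    q->count = 0;
}

/* Queue a frame at the tail. Returns 0 on success, -1 if the queue is full. */
int la_push(struct lookahead *q, int frame)
{
    if (q->count == LA_CAPACITY)
        return -1;
    q->frames[(q->read_idx + q->count) % LA_CAPACITY] = frame;
    q->count++;
    return 0;
}

/* Remove the oldest frame. Returns 0 on success, -1 if the queue is empty. */
int la_pop(struct lookahead *q, int *out)
{
    if (q->count == 0)
        return -1;
    *out = q->frames[q->read_idx];
    q->read_idx = (q->read_idx + 1) % LA_CAPACITY;
    q->count--;
    return 0;
}

/* Look at the entry `index` places behind the oldest one without removing
   it. Returns NULL if fewer than index + 1 entries are queued. */
const int *la_peek(const struct lookahead *q, size_t index)
{
    if (index >= q->count)
        return NULL;
    return &q->frames[(q->read_idx + index) % LA_CAPACITY];
}
```

Because callers only see push, pop, and peek, the storage behind them can change (for example, to copy a frame once directly into an aligned slot) without touching the callers.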
John Koleszar [Mon, 11 Apr 2011 17:05:08 +0000 (13:05 -0400)]
Bugfix for error accumulator stats
Previous to commit de4e9e3, there was an early return in the alt-ref
case that was inadvertently removed when the function was refactored
to return void. This patch restores the prior behavior.
Jim Bankoski [Mon, 28 Mar 2011 23:39:05 +0000 (16:39 -0700)]
fixed an overflow in ssim calculation
This commit fixed an overflow in ssim calculation and added register
save and restore to make sure the assembly code works on x64 platforms.
It also changed the sampling points to every 4x4 instead of 8x8 and
adjusted the constants in SSIM calculation to match the scale of
previous VPXSSIM.
Yunqing Wang [Fri, 1 Apr 2011 20:41:58 +0000 (16:41 -0400)]
Use full-pixel MV in mvsadcost calculation
MV sad cost error is only used in full-pixel motion search,
which only needs full-pixel resolution instead of quarter-pixel
resolution. This change reduced mvsadcost table size, and
removed unnecessary parameter passing since this table is
constant once it is generated.
Johann [Thu, 31 Mar 2011 20:35:22 +0000 (16:35 -0400)]
support obj_int_extract on cygwin
cygwin doesn't support _sopen. drop down to the lowest common
denominator and merge main for all platforms. this also opens the door
for supporting multiple object formats with a single binary.
Tero Rintaluoma [Wed, 30 Mar 2011 10:45:59 +0000 (13:45 +0300)]
Wrapper function removed from vp8_subtract_b_neon function call
Address calculations moved from encodemb_arm.c file to neon
optimized assembly function to save cycles in function calls.
- vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon
that contains all needed address calculations
- unnecessary file encodemb_arm.c removed
- consistent with ARMv6 optimized version
Ralph Giles [Mon, 28 Mar 2011 19:04:51 +0000 (12:04 -0700)]
Generate a vpx.pc file for pkg-config.
Rules are added to libs.mk to generate a vpx.pc, which is
installed as pkgconfig/vpx.pc under the target library directory.
This also requires the install path prefix be exported directly
in config.mk.
Some systems use a tool called pkg-config to query information
about installed libraries or other resources, based on database
files provided by the packages themselves at install time.
Providing such a file for libvpx simplifies integration with
other build systems, and provides an easy avenue for developers
to test against their own builds of the library.
Ralph Giles [Mon, 28 Mar 2011 18:36:53 +0000 (11:36 -0700)]
Export the version string as a makefile variable.
The configure script exports the major/minor/patch version
numbers, but didn't make the full version string available
to Makefile recipes and rules, the way it is available to
C code from vpx_version.h.
John Koleszar [Wed, 30 Mar 2011 01:44:19 +0000 (21:44 -0400)]
vpxenc: die on realloc failures
Identified as a possible cause of issue #308, the code was silently
ignoring realloc failures, which would lead to corruption, memory
leaks, and likely a crash. The best we can do in this case is die
gracefully.
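The "die on failure" policy is commonly written as a small wrapper around realloc. The sketch below is an assumption about the general shape, not the code added to vpxenc; the name xrealloc and the error message are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: resize a buffer or exit with an error message.
   Callers must pass size > 0; realloc with size 0 is not portable. */
void *xrealloc(void *ptr, size_t size)
{
    void *p = realloc(ptr, size);
    if (!p) {
        /* The old block is still valid here, but there is nothing useful
           left to do with it, so report the failure and stop. */
        fprintf(stderr, "Failed to allocate %lu bytes\n", (unsigned long)size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

The wrapper also avoids the classic leak from `buf = realloc(buf, n)`, where a NULL return overwrites the only pointer to the original block.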
Tero Rintaluoma [Mon, 28 Mar 2011 06:51:51 +0000 (09:51 +0300)]
Half pixel variance further optimized for ARMv6
Half pixel interpolations optimized in variance calculations. Separate
function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided. On
average, performance improvement is 6-7% for VGA@30fps sequences.
Johann [Thu, 10 Feb 2011 19:57:43 +0000 (14:57 -0500)]
use asm_offsets with vp8_regular_quantize_b_sse2
remove helper function and avoid shadowing all the arguments to the
stack on 64bit systems
when running with --good --cpu-used=0:
~2% on linux x86 and x86_64
~2% on win32 x86 msys and visual studio
more on darwin10 x86_64
significantly more on x86_64-win64-vs9
John Koleszar [Mon, 21 Mar 2011 11:50:42 +0000 (07:50 -0400)]
Allow specifying --end-usage by enum name
Map an enum to the --end-usage values, so you can specify
--end-usage=cq instead of --end-usage=2. The numerical values still
work for historical scripts, etc, but this is more user friendly.
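One way to accept either a name or a number is a small lookup table with a numeric fallback, sketched below. Only the cq = 2 pairing comes from the message above; the vbr and cbr entries, the struct, and the function name are assumptions made for illustration and are not the vpxenc argument parser.

```c
#include <stdlib.h>
#include <string.h>

struct named_enum {
    const char *name;
    int value;
};

/* Hypothetical table; only {"cq", 2} is stated in the commit message. */
static const struct named_enum end_usage_names[] = {
    {"vbr", 0},
    {"cbr", 1},
    {"cq", 2},
};

#define N_END_USAGE (sizeof end_usage_names / sizeof end_usage_names[0])

/* Parse an --end-usage argument given by name or by number.
   Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_end_usage(const char *arg, int *out)
{
    size_t i;
    char *end;
    long n;

    /* First try the symbolic names. */
    for (i = 0; i < N_END_USAGE; i++) {
        if (strcmp(arg, end_usage_names[i].name) == 0) {
            *out = end_usage_names[i].value;
            return 0;
        }
    }

    /* Fall back to a number, accepted only if it matches a known value. */
    n = strtol(arg, &end, 10);
    if (end == arg || *end != '\0')
        return -1;
    for (i = 0; i < N_END_USAGE; i++) {
        if (n == end_usage_names[i].value) {
            *out = end_usage_names[i].value;
            return 0;
        }
    }
    return -1;
}
```

With this table, both `cq` and `2` resolve to the same value, so older scripts that pass numbers keep working.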
Tero Rintaluoma [Mon, 21 Mar 2011 11:33:45 +0000 (13:33 +0200)]
ARMv6 optimized fdct4x4
Optimized fdct4x4 (8x4) for ARMv6 instruction set.
- No interlocks in Cortex-A8 pipeline
- One interlock cycle in ARM11 pipeline
- About 2.16 times faster than current C-code compiled with -O3