Adrian Grange [Wed, 13 Apr 2011 19:56:46 +0000 (12:56 -0700)]
Fixed use of early breakout in vp8_pick_intra4x4mby_modes
Index i is used to detect early breakout from the first loop, but
its value is lost due to reuse in the second for loop. I moved
the position of the second loop and did some format cleanup.
John Koleszar [Wed, 13 Apr 2011 18:00:18 +0000 (14:00 -0400)]
Refactor lookahead ring buffer
This patch cleans up the source buffer storage and copy mechanism to
allow access through a standard push/pop/peek interface. This approach
also avoids an extra copy in the case where the source is not a
multiple of 16, fixing issue #102.
John Koleszar [Mon, 11 Apr 2011 17:05:08 +0000 (13:05 -0400)]
Bugfix for error accumulator stats
Previous to commit de4e9e3, there was an early return in the alt-ref
case that was inadvertantly removed when the function was refactored
to return void. This patch restores the prior behavior.
Yunqing Wang [Fri, 1 Apr 2011 20:41:58 +0000 (16:41 -0400)]
Use full-pixel MV in mvsadcost calculation
MV sad cost error is only used in full-pixel motion search,
which only need full-pixel resolution instead of quarter-pixel
resolution. This change reduced mvsadcost table size, and
removed unneccessary pamameter passing since this table is
constant once it is generated.
Johann [Thu, 31 Mar 2011 20:35:22 +0000 (16:35 -0400)]
support obj_int_extract on cygwin
cygwin doesn't support _sopen. drop down to the lowest common
denominator and merge main for all platforms. this also opens the door
for supporting multiple object formats with a single binary.
Tero Rintaluoma [Wed, 30 Mar 2011 10:45:59 +0000 (13:45 +0300)]
Wrapper function removed from vp8_subtract_b_neon function call
Address calculations moved from encodemb_arm.c file to neon
optimized assembly function to save cycles in function calls.
- vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon
that contains all needed address calculations
- unnecessary file encodemb_arm.c removed
- consistent with ARMv6 optimized version
Ralph Giles [Mon, 28 Mar 2011 19:04:51 +0000 (12:04 -0700)]
Generate a vpx.pc file for pkg-config.
Rules are added to libs.mk to generate a vpx.pc, which is
installed as pkgconfig/vpx.pc under the target library directory.
This also requires the install path prefix be exported directly
in config.mk.
Some systems use a tool called pkg-config to query information
about intalled libraries or other resources, based on database
files provided by the packages themselves at install time.
Providing such a file for libvpx simplifies integration with
other build systems, and provides an easy avenue for developers
to test against their own builds of the library.
Ralph Giles [Mon, 28 Mar 2011 18:36:53 +0000 (11:36 -0700)]
Export the version string as a makefile variable.
The configure script exports the major/minor/patch version
numbers, but didn't make the full version string available
to Makefile recipes and rules, the way it is available to
C code from vpx_version.h.
John Koleszar [Wed, 30 Mar 2011 01:44:19 +0000 (21:44 -0400)]
vpxenc: die on realloc failures
Identified as a possible cause of issue #308, the code was silently
ignoring realloc failures, which would lead to corruption, memory
leaks, and likely a crash. The best we can do in this case is die
gracefully.
Tero Rintaluoma [Mon, 28 Mar 2011 06:51:51 +0000 (09:51 +0300)]
Half pixel variance further optimized for ARMv6
Half pixel interpolations optimized in variance calculations. Separate
function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided.On
average, performance improvement is 6-7% for VGA@30fps sequences.
Johann [Thu, 10 Feb 2011 19:57:43 +0000 (14:57 -0500)]
use asm_offsets with vp8_regular_quantize_b_sse2
remove helper function and avoid shadowing all the arguments to the
stack on 64bit systems
when running with --good --cpu-used=0:
~2% on linux x86 and x86_64
~2% on win32 x86 msys and visual studio
more on darwin10 x86_64
significantly more on
x86_64-win64-vs9
John Koleszar [Mon, 21 Mar 2011 11:50:42 +0000 (07:50 -0400)]
Allow specifying --end-usage by enum name
Map an enum to the --end-usage values, so you can specify
--end-usage=cq instead of --end-usage=2. The numerical values still
work for historical scripts, etc, but this is more user friendly.
Tero Rintaluoma [Mon, 21 Mar 2011 11:33:45 +0000 (13:33 +0200)]
ARMv6 optimized fdct4x4
Optimized fdct4x4 (8x4) for ARMv6 instruction set.
- No interlocks in Cortex-A8 pipeline
- One interlock cycle in ARM11 pipeline
- About 2.16 times faster than current C-code compiled with -O3
Attila Nagy [Fri, 18 Mar 2011 08:44:08 +0000 (10:44 +0200)]
Fix multithreaded encoding for 1 MB wide frame
Thread synchronization was not correct when frame width was 1 MB.
Number of allocated encoding threads is limited by the sync_range.
There is no point having more because each thread lags sync_range MBs
behind the thread processing the row above.
John Koleszar [Thu, 17 Mar 2011 21:07:59 +0000 (17:07 -0400)]
Increase static linkage, remove unused functions
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.
Ralph Giles [Thu, 17 Mar 2011 07:23:36 +0000 (00:23 -0700)]
Set bounds from the array when iterating mmaps.
The mmap allocation code in vp8_dx_iface.c was inconsistent.
The static array vp8_mem_req_segs defines two descriptors,
but only the first is real. The second is a sentinel and
isn't actually allocated, so vpx_codec_alg_priv is declared
with mmaps[NELEMENTS(vp8_mem_req_segs)-1]. Some functions
use this reduced upper bound when iterating though the mmap
array, but these two functions did not.
Instead, this commit calls NELEMENTS(...->mmaps) to directly
query the bounds of the dereferenced array.
This fixes an array-bounds warning from gcc 4.6 on
vp8_xma_set_mmap.
John Koleszar [Thu, 29 Jul 2010 21:04:39 +0000 (17:04 -0400)]
apple: include proper mach primatives
Fixes implicit declaration warning for 'mach_task_self'. This change
is an update to Change I9991dedd1ccfddc092eca86705ecbc3b764b799d,
which fixed this issue for the decoder but not the encoder.
Without this change I get link errors in firefox's libxul. It looks
like the linker expect a particular pattern for getting the GOT. This
patch changes webm to use the same pattern used by the compiler.
John Koleszar [Fri, 11 Mar 2011 16:35:38 +0000 (11:35 -0500)]
Move build_intra_predictors_mby to RTCD framework
The vp8_build_intra_predictors_mby and vp8_build_intra_predictors_mby_s
functions had global function pointers rather than using the RTCD
framework. This can show up as a potential data race with tools such as
helgrind. See https://bugzilla.mozilla.org/show_bug.cgi?id=640935
for an example.
Paul Wilkins [Fri, 11 Mar 2011 14:51:40 +0000 (14:51 +0000)]
Clean up of vp8_init_config()
Clean up vp8_init_config() a bit and remove null pointer case,
as this code can't be called any more and is not an adequate
trap anyway, as a null pointer would cause exceptions before
hitting the test.
Paul Wilkins [Thu, 10 Mar 2011 16:11:39 +0000 (16:11 +0000)]
1 Pass CQ and VBR bug fixes
Issue 291 highlighted the fact that CQ mode was not working
as expected in 1 pass mode,
This commit fixes that specific problem but in so doing I also
uncovered an overflow issue in the VBR code for 1 pass and
some data values not being correctly initialized.
For some clips (particularly short clips), the resulting
improvement is dramatic.
Attila Nagy [Fri, 25 Feb 2011 11:42:05 +0000 (13:42 +0200)]
Encoder loopfilter running in its own thread
In multithreaded mode the loopfilter is running in its own thread (filter level
calculation and frame filtering). Filtering is mostly done in parallel with the
bitstream packing. Before starting the packing the loopfilter level has
to be calculated. Also any needed reference frame copying is done in the
filter thread.
Currently the encoder will create n+1 threads, where n > 1 is the number of
threads specified by application and 1 is the extra filter thread. With n = 1
the encoder runs in single thread mode. There will never be more than n threads
running concurrently.
Johann [Wed, 2 Mar 2011 14:44:39 +0000 (09:44 -0500)]
obj_int_extract for Visual Studio
Enable extraction of assembly offsets from compiled examples in MSVS.
This will allow us to remove some stub functions from x86 assembly since
we will be able to reliably determine structure offsets at compile time.
see ARM code for examples:
vp8/encoder/arm/armv5te/
vpx_scale/arm/neon/
Adrian Grange [Thu, 10 Mar 2011 19:32:48 +0000 (11:32 -0800)]
Removed firstpass motion map
The firstpass motion map consists of an 8-bit flag for
each MB indicating how strongly the firstpass code
believes it should be filtered during the second pass
ARNR filtering.
For long or large format material the motion map can
become extremely large and hamper the operation of
the encoding process.
This change removes the motion map altogether, leaving
the second pass to rely on the magnitude of the motion
compensated error to determine the filter weight to
use for the MB during ARNR filtering.
Tests on the derf set indicate that the effect of this
change is neutral, with some small wins and losses. The
motion map has therefore been removed based on
a cost/benefit evaluation.
James Berry [Thu, 10 Mar 2011 16:13:44 +0000 (11:13 -0500)]
Fix incorrect macroblock counts in twopass rate control
The previous calculation of macroblock count (w*h)/256
is not correct when the width/height are not multiples of
16. Use the precalculated macroblock count from
cpi->common instead. This manifested itself as a divide
by zero when the number of pixels was less than 256.
num_mbs updated in estimate_max_q, estimate_q,
estimate_kf_group_q, and estimate_cq