Yunqing Wang [Fri, 25 Jun 2010 13:18:11 +0000 (09:18 -0400)]
Improve SSE2 loopfilter functions
Restructured and rewrote SSE2 loopfilter functions. Combined u and
v into one function to take advantage of SSE2 128-bit registers.
Tests on test clips showed a 4% decoder performance improvement on
Linux desktop.
Adrian Grange [Mon, 28 Jun 2010 11:00:11 +0000 (12:00 +0100)]
Fixed buffer selection for UV in AltRef filtering
Corrected setting of "which_buffer" for U & V cases to match that
used for Y, i.e. to refer to the temporally most recent frame of
those to be filtered.
Yaowu Xu [Wed, 16 Jun 2010 19:52:18 +0000 (12:52 -0700)]
Redo the forward 4x4 dct
The new fdct lowers the round trip sum squared error for a
4x4 block ~0.12. or ~0.008/pixel. For reference, the old
matrix multiply version has average round trip error 1.46
for a 4x4 block.
Thanks to "derf" for his suggestions and references.
Fritz Koenig [Thu, 24 Jun 2010 18:30:48 +0000 (14:30 -0400)]
vp8cx : bestsad declared and initialized incorrectly.
bestsad needs to be a int and set to INT_MAX because at the end
of the function it is compared to INT_MAX to determine if there
was a match in the function.
Fritz Koenig [Thu, 24 Jun 2010 16:18:23 +0000 (12:18 -0400)]
vp8cx : bestsad declared and initialized incorrectly.
bestsad should be an int initialized to INT_MAX. The optimized
SAD function expects a signed value for bestsad to use for comparison
and early loop termination. When no match is made, which is
determined by a comparison of bestsad to INT_MAX, INT_MAX is returned.
agrange [Mon, 21 Jun 2010 12:44:42 +0000 (13:44 +0100)]
Fix breakout thresh computation for golden & AltRef frames
1. Unavailability of each reference frame type should be tested
independently,
2. Also, only the VP8_GOLD_FLAG needs to be tested before setting
golden frame specific thresholds, and only VP8_ALT_FLAG needs
testing before setting thresholds relevant to the AltRef frame.
(Raised by gbvalor, in response to Issue 47)
agrange [Fri, 18 Jun 2010 14:58:18 +0000 (15:58 +0100)]
Changed unary operator from ! to ~
Since the intent is
to reset the appropriate bit in ref_frame_flags not to
test a logic condition. Prior result would always have
been ref_frame_flags being set to 0.
(Issue reported by dgohman, issue 47)
Fix a linker error on x86-64 Linux when not using a version script.
If the version script produced by the libvpx build system is not
used when linking a shared library on x86-64 Linux, the constant
data in the subpel filters produces R_X86_64_32 relocation errors
due to the use of wrt rip addressing instead of
wrt rip wrt ..gotpcrel.
Instead of adding a new macro for this addressing mode, this patch
sets the ELF visibility of these symbols to "hidden", which
allows wrt rip addressing to work without a text relocation.
This allows building a shared library without using the provided
build system or a separate version script.
Fixes http://code.google.com/p/webm/issues/detail?id=46
John Koleszar [Wed, 16 Jun 2010 19:43:09 +0000 (15:43 -0400)]
Generate AUTHORS file with a script
This information is in git, so it's better to use that as a source than
updating this file manually. This script can be run manually at release
time for now, or we can set up a cron job sometime in the future.
Tom Finegan [Wed, 16 Jun 2010 17:24:55 +0000 (13:24 -0400)]
Avoid encoding garbage when ivfenc encounters an unsupported Y4M file.
This change stops ivfenc from treating unsupported Y4M files as raw
input.
For example, if given an interlaced Y4M file, ivfenc treated the input
as if it were raw data because the unsupported Y4M file case previously
fell through without being handled.
John Koleszar [Wed, 16 Jun 2010 16:27:52 +0000 (12:27 -0400)]
gen_scalers: fix 64-bit integer promotion bug
i needs to be treated as signed to get the proper indexing on 64-bit
platforms. This behavior was accidentally reverted when fixing an
unsigned/signed comparison warning.
Change bitreading functions to use a larger window which is refilled less
often.
This makes it cheap enough to do bounds checking each time the window is
refilled, which avoids the need to copy the input into a large circular
buffer.
This uses less memory and speeds up the total decode time by 1.6% on an ARM11,
2.8% on a Cortex A8, and 2.2% on x86-32, but less than 1% on x86-64.
Inlining vp8dx_bool_decoder_fill() has a big penalty on x86-32, as does moving
the refill loop to the front of vp8dx_decode_bool().
However, having the refill loop between computation of the split values and
the branch in vp8_decode_mb_tokens() is a big win on ARM (presumably due to
memory latency and code size: refilling after normalization duplicates the
code in the DECODE_AND_BRANCH_IF_ZERO and DECODE_AND_LOOP_IF_ZERO cases.
Unfortunately, refilling at the end of vp8dx_bool_decoder_fill() and at the
beginning of each decode step in vp8_decode_mb_tokens() means the latter
requires an extra refill at the end.
Platform-specific versions could avoid the problem, but would require most of
detokenize.c to be duplicated.
James Zern [Fri, 4 Jun 2010 22:05:12 +0000 (18:05 -0400)]
VisualStudio projects: asm tool updates
vs8
- pull yasm.rules [1] into the source tree to avoid need to install
file into VC/VCProjectDefaults
- reference same w/ToolFile & RelativePath
- update arm branch to match
John Koleszar [Mon, 14 Jun 2010 12:33:54 +0000 (08:33 -0400)]
ivfenc: fix two-pass support of raw files
Commit 3245d46 "ivfenc: support reading/writing from a pipe" broke
support for two pass encodes of raw files when done in two
invocations of ivfenc. The raw image was only set up on pass 0,
which was never hit when running with --pass=2 --passes=2.
John Koleszar [Sat, 12 Jun 2010 18:11:51 +0000 (14:11 -0400)]
Make this/next iiratio unsigned.
This patch addresses issue #79, which is a regression since commit 28de670 "Fix RD bug." If the coded error value is zero, the iiratio
calculation effectively multiplies by 1000000 by the
DOUBLE_DIVIDE_CHECK macro. This can result in a value larger than
INT_MAX, giving a negative ratio. Since the error values are
conceptually unsigned (though they're stored in a double) this patch
makes the iiratio values unsigned, which allows the clamping to work
as expected.
John Koleszar [Fri, 11 Jun 2010 17:05:08 +0000 (13:05 -0400)]
require --enable-psnr to build ssim
ssim.c comiles in a huge (512M) amount of global scratch space. Allocating
this data on the heap would be a better solution, but this file doesn't
need to be built at all in most cases, so as a first pass, disable it
except when doing opsnr.stt output (--enable-psnr).
John Koleszar [Thu, 10 Jun 2010 12:56:31 +0000 (08:56 -0400)]
replace while(0) construct with if/else
No good reason to be tricky here. I don't know why 'break' occurred to me
as the natrual replacement for the 'return', but an if/else block is
definitely clearer.
The new scheme introduced in I68d35a2f did not clamp chroma MVs in the SPLITMV
case, and clamped them incorrectly (to the luma plane bounds) in every other
case.
Because chroma MVs are computed from the luma MVs before clamping occurs, they
could still point outside of the frame buffer and cause crashes.
This clamping happens outside of the MV prediction loop, and so should not
affect bitstream decoding.
Yunqing Wang [Thu, 10 Jun 2010 15:48:48 +0000 (11:48 -0400)]
Improve vp8_sixtap_predict functions
Restructure vp8_sixtap_predict functions to eliminate extra 5-line
calculation while doing first-pass only. Also, combline functions
to eliminate usage of intermediate buffer. This gives decoder a 3%
performance gain on my test clips.
Using uname fails e.g. on a 64-bit machine with a 32-bit toolchain.
The following gcc -dumpmachine strings have been verified:
* 32-bit Linux gives i486-linux-gnu
* 64-bit Linux gives x86_64-linux-gnu
* Mac OS X 10.5 gives i686-apple-darwin9
* MinGW gives mingw32
*darwin8* and *bsd* can safely be assumed to be correct, but *cygwin*
is a guess.
John Koleszar [Wed, 9 Jun 2010 15:29:20 +0000 (11:29 -0400)]
Remove secondary mv clamping from decode stage
This patch removes the secondary MV clamping from the MV decoder. This
behavior was consistent with limits placed on non-split MVs by the
reference encoder, but was inconsistent with the MVs generated in the
split case.
The purpose of this secondary clamping was only to prevent crashes on
invalid data. It was not intended to be a behaviour an encoder could or
should rely on. Instead of doing additional clamping in a way that
changes the entropy context, the secondary clamp is removed and the
border handling is made implmentation specific. With respect to the
spec, the border is treated as essentially infinite, limited only by
the clamping performed on the near/nearest reference and the maximum
encodable magnitude of the residual MV.
This does not affect any currently produced streams.
John Koleszar [Thu, 3 Jun 2010 14:29:04 +0000 (10:29 -0400)]
shared library support (.so)
This patch adds support for building shared libraries when configured
with the --enable-shared switch.
Building DLLs would require more invasive changes to the sample
utilities than I want to make in this patch, since on Windows you can't
use the address of an imported symbol in a static initializer. The best
way to work around this is proably to build the codec interface mapping
table with an init() function, but dll support is of questionable value
anyway, since most windows users will probably use a media framework
lib like webmdshow, which links this library in staticly.
Add support for reading YUV4MPEG2 files to ivfenc.
A large collection of example files may be found at
http://media.xiph.org/video/derf/
This also fixes a bug in ivfenc for uncompressed IVF input, which previously
appeared not to skip past the file header the second time it opened the file.
I don't actually have an IVF file with which to test this fix, however.
Yunqing Wang [Fri, 28 May 2010 18:34:39 +0000 (14:34 -0400)]
Remove costly memory reads/writes in vp8_reset_mb_tokens_context()
Tests on x86 showed this function costed 2.7% of total decoding time
because of all the memory reads/writes. After modification, it only
costs about 0.7% of decoding time, which gives a 2% gain.
John Koleszar [Fri, 28 May 2010 14:11:58 +0000 (10:11 -0400)]
configure: update script headers
The libvpx build system was influenced by the clever design of the
FFmpeg configure script. Say so in the script header, and provide a
little introduction.
Yaowu Xu [Thu, 27 May 2010 17:04:36 +0000 (10:04 -0700)]
Increase the size of output packet list
This is to accommodate output packets for both compressed
data and psnr stats. For each frame, there are at least
one packet for compressed data and one for psnr stats. For
a max lag of 25, 64 is large enough to cover all lagged
frames at the end of encoding.
John Koleszar [Wed, 26 May 2010 19:57:42 +0000 (15:57 -0400)]
configure: support --prefix, --libdir
Support --prefix, --libdir as a conventional way of specifying the default
installation directories. libdir is required to be a subdirectory of prefix
at this time.