Tom Finegan [Thu, 22 Jul 2010 17:34:25 +0000 (13:34 -0400)]
Add vs9 targets.
Add targets x86-win32-vs9 and x86_64-win64-vs9 for support of Visual
Studio 2008-- this removes the need to convert the vs8 projects before
using them within the IDE.
Paul Wilkins [Mon, 19 Jul 2010 13:10:07 +0000 (14:10 +0100)]
Rate control fix for ARNR filtered frames.
Previously we had assumed that it was necessary to give a full frame's
bit allocation to the alt ref frame if it has been created through temporal
filtering. This is not the case. The active max quantizer control
insures that sufficient bits are allocated if needed and allocating a
full frame's worth of bits creates an excessive overhead for the ARF.
Adrian Grange [Thu, 1 Jul 2010 13:17:04 +0000 (14:17 +0100)]
Fix bug in 1st pass motion compensation
In the case where the best reference mv is not (0,0) a secondary
search is carried out centered on (0,0). However, rather than
sending tmp_err into the search function, motion_error was
inadvertently passed.
As a result tmp_err remains set at INT_MAX and the (0,0)-based
search result will never be selected, even if it is better.
John Koleszar [Wed, 30 Jun 2010 14:22:40 +0000 (10:22 -0400)]
Update loopfilter frame/filter/sharp info for multithread
Change I9fd1a5a4 updated the multithreaded loopfilter to avoid
reinitializing several parameteres if they haven't changed from the
last frame, but the code to update the last frame's parameters wasn't
invoked in the multithreaded case.
Yunqing Wang [Fri, 25 Jun 2010 13:18:11 +0000 (09:18 -0400)]
Improve SSE2 loopfilter functions
Restructured and rewrote SSE2 loopfilter functions. Combined u and
v into one function to take advantage of SSE2 128-bit registers.
Tests on test clips showed a 4% decoder performance improvement on
Linux desktop.
Paul Wilkins [Tue, 29 Jun 2010 11:15:54 +0000 (12:15 +0100)]
Further adjustment of RD behaviour with Q and Zbin.
Following conversations with Tim T (Derf) I ran a large number of
tests comparing the existing polynomial expression with a simpler
^2 variant. Though the polynomial was sometimes a little better at
the extremes of Q it was possible to get close for most clips and
even a little better on some.
This code also changes the way the RD multiplier is calculated
when the ZBIN is extended to use a variant of the same ^2
expression.
I hope that this simpler expression will be easier to tune further
as we expand our test set and consider adjustments based on content.
Yaowu Xu [Tue, 29 Jun 2010 05:03:43 +0000 (22:03 -0700)]
Improve the accuracy of forward walsh-hadamard transform
Besides the slight improvement in round trip error. This
also fixes a sign bias in the forward transform, so the
round trip errors are evenly distributed between +1s and
-1s. The old bias seemed to work well with the dc sign bias
in old fdct, which no longer exist in the improved fdct.
Adrian Grange [Mon, 28 Jun 2010 11:00:11 +0000 (12:00 +0100)]
Fixed buffer selection for UV in AltRef filtering
Corrected setting of "which_buffer" for U & V cases to match that
used for Y, i.e. to refer to the temporally most recent frame of
those to be filtered.
Yaowu Xu [Wed, 16 Jun 2010 19:52:18 +0000 (12:52 -0700)]
Redo the forward 4x4 dct
The new fdct lowers the round trip sum squared error for a
4x4 block ~0.12. or ~0.008/pixel. For reference, the old
matrix multiply version has average round trip error 1.46
for a 4x4 block.
Thanks to "derf" for his suggestions and references.
Fritz Koenig [Thu, 24 Jun 2010 18:30:48 +0000 (14:30 -0400)]
vp8cx : bestsad declared and initialized incorrectly.
bestsad needs to be a int and set to INT_MAX because at the end
of the function it is compared to INT_MAX to determine if there
was a match in the function.
Fritz Koenig [Thu, 24 Jun 2010 16:18:23 +0000 (12:18 -0400)]
vp8cx : bestsad declared and initialized incorrectly.
bestsad should be an int initialized to INT_MAX. The optimized
SAD function expects a signed value for bestsad to use for comparison
and early loop termination. When no match is made, which is
determined by a comparison of bestsad to INT_MAX, INT_MAX is returned.
John Koleszar [Thu, 24 Jun 2010 12:41:51 +0000 (08:41 -0400)]
ivfenc: correct fixed kf interval, --disable-kf
ivfenc was setting the VPX_KF_FIXED mode when the kf_min_dist and
kf_max_dist parameters were set to each other. This flag actually means
that keyframes are disabled, and that name was deprecated to avoid
confusion such as this. Instead, a new option is exposed for setting the
VPX_KF_DISABLED mode, and the intervals are passed through to the codec,
which will do automatic placement at a fixed interval as expected.
agrange [Mon, 21 Jun 2010 12:44:42 +0000 (13:44 +0100)]
Fix breakout thresh computation for golden & AltRef frames
1. Unavailability of each reference frame type should be tested
independently,
2. Also, only the VP8_GOLD_FLAG needs to be tested before setting
golden frame specific thresholds, and only VP8_ALT_FLAG needs
testing before setting thresholds relevant to the AltRef frame.
(Raised by gbvalor, in response to Issue 47)
agrange [Fri, 18 Jun 2010 14:58:18 +0000 (15:58 +0100)]
Changed unary operator from ! to ~
Since the intent is
to reset the appropriate bit in ref_frame_flags not to
test a logic condition. Prior result would always have
been ref_frame_flags being set to 0.
(Issue reported by dgohman, issue 47)
Fix a linker error on x86-64 Linux when not using a version script.
If the version script produced by the libvpx build system is not
used when linking a shared library on x86-64 Linux, the constant
data in the subpel filters produces R_X86_64_32 relocation errors
due to the use of wrt rip addressing instead of
wrt rip wrt ..gotpcrel.
Instead of adding a new macro for this addressing mode, this patch
sets the ELF visibility of these symbols to "hidden", which
allows wrt rip addressing to work without a text relocation.
This allows building a shared library without using the provided
build system or a separate version script.
Fixes http://code.google.com/p/webm/issues/detail?id=46
John Koleszar [Wed, 16 Jun 2010 19:43:09 +0000 (15:43 -0400)]
Generate AUTHORS file with a script
This information is in git, so it's better to use that as a source than
updating this file manually. This script can be run manually at release
time for now, or we can set up a cron job sometime in the future.
Tom Finegan [Wed, 16 Jun 2010 17:24:55 +0000 (13:24 -0400)]
Avoid encoding garbage when ivfenc encounters an unsupported Y4M file.
This change stops ivfenc from treating unsupported Y4M files as raw
input.
For example, if given an interlaced Y4M file, ivfenc treated the input
as if it were raw data because the unsupported Y4M file case previously
fell through without being handled.
John Koleszar [Wed, 16 Jun 2010 16:27:52 +0000 (12:27 -0400)]
gen_scalers: fix 64-bit integer promotion bug
i needs to be treated as signed to get the proper indexing on 64-bit
platforms. This behavior was accidentally reverted when fixing an
unsigned/signed comparison warning.
Change bitreading functions to use a larger window which is refilled less
often.
This makes it cheap enough to do bounds checking each time the window is
refilled, which avoids the need to copy the input into a large circular
buffer.
This uses less memory and speeds up the total decode time by 1.6% on an ARM11,
2.8% on a Cortex A8, and 2.2% on x86-32, but less than 1% on x86-64.
Inlining vp8dx_bool_decoder_fill() has a big penalty on x86-32, as does moving
the refill loop to the front of vp8dx_decode_bool().
However, having the refill loop between computation of the split values and
the branch in vp8_decode_mb_tokens() is a big win on ARM (presumably due to
memory latency and code size: refilling after normalization duplicates the
code in the DECODE_AND_BRANCH_IF_ZERO and DECODE_AND_LOOP_IF_ZERO cases.
Unfortunately, refilling at the end of vp8dx_bool_decoder_fill() and at the
beginning of each decode step in vp8_decode_mb_tokens() means the latter
requires an extra refill at the end.
Platform-specific versions could avoid the problem, but would require most of
detokenize.c to be duplicated.
James Zern [Fri, 4 Jun 2010 22:05:12 +0000 (18:05 -0400)]
VisualStudio projects: asm tool updates
vs8
- pull yasm.rules [1] into the source tree to avoid need to install
file into VC/VCProjectDefaults
- reference same w/ToolFile & RelativePath
- update arm branch to match
John Koleszar [Mon, 14 Jun 2010 12:33:54 +0000 (08:33 -0400)]
ivfenc: fix two-pass support of raw files
Commit 3245d46 "ivfenc: support reading/writing from a pipe" broke
support for two pass encodes of raw files when done in two
invocations of ivfenc. The raw image was only set up on pass 0,
which was never hit when running with --pass=2 --passes=2.
John Koleszar [Sat, 12 Jun 2010 18:11:51 +0000 (14:11 -0400)]
Make this/next iiratio unsigned.
This patch addresses issue #79, which is a regression since commit 28de670 "Fix RD bug." If the coded error value is zero, the iiratio
calculation effectively multiplies by 1000000 by the
DOUBLE_DIVIDE_CHECK macro. This can result in a value larger than
INT_MAX, giving a negative ratio. Since the error values are
conceptually unsigned (though they're stored in a double) this patch
makes the iiratio values unsigned, which allows the clamping to work
as expected.
John Koleszar [Fri, 11 Jun 2010 17:05:08 +0000 (13:05 -0400)]
require --enable-psnr to build ssim
ssim.c comiles in a huge (512M) amount of global scratch space. Allocating
this data on the heap would be a better solution, but this file doesn't
need to be built at all in most cases, so as a first pass, disable it
except when doing opsnr.stt output (--enable-psnr).
John Koleszar [Thu, 10 Jun 2010 12:56:31 +0000 (08:56 -0400)]
replace while(0) construct with if/else
No good reason to be tricky here. I don't know why 'break' occurred to me
as the natrual replacement for the 'return', but an if/else block is
definitely clearer.
The new scheme introduced in I68d35a2f did not clamp chroma MVs in the SPLITMV
case, and clamped them incorrectly (to the luma plane bounds) in every other
case.
Because chroma MVs are computed from the luma MVs before clamping occurs, they
could still point outside of the frame buffer and cause crashes.
This clamping happens outside of the MV prediction loop, and so should not
affect bitstream decoding.
Yunqing Wang [Thu, 10 Jun 2010 15:48:48 +0000 (11:48 -0400)]
Improve vp8_sixtap_predict functions
Restructure vp8_sixtap_predict functions to eliminate extra 5-line
calculation while doing first-pass only. Also, combline functions
to eliminate usage of intermediate buffer. This gives decoder a 3%
performance gain on my test clips.
Using uname fails e.g. on a 64-bit machine with a 32-bit toolchain.
The following gcc -dumpmachine strings have been verified:
* 32-bit Linux gives i486-linux-gnu
* 64-bit Linux gives x86_64-linux-gnu
* Mac OS X 10.5 gives i686-apple-darwin9
* MinGW gives mingw32
*darwin8* and *bsd* can safely be assumed to be correct, but *cygwin*
is a guess.
John Koleszar [Wed, 9 Jun 2010 15:29:20 +0000 (11:29 -0400)]
Remove secondary mv clamping from decode stage
This patch removes the secondary MV clamping from the MV decoder. This
behavior was consistent with limits placed on non-split MVs by the
reference encoder, but was inconsistent with the MVs generated in the
split case.
The purpose of this secondary clamping was only to prevent crashes on
invalid data. It was not intended to be a behaviour an encoder could or
should rely on. Instead of doing additional clamping in a way that
changes the entropy context, the secondary clamp is removed and the
border handling is made implmentation specific. With respect to the
spec, the border is treated as essentially infinite, limited only by
the clamping performed on the near/nearest reference and the maximum
encodable magnitude of the residual MV.
This does not affect any currently produced streams.