The memory being zeroed in vp8_update_mode_info_border() was just
allocated with calloc, and so the entire function is actually
redundant, but it should be made correct in case someone expects
it to actually work in the future.
The timebase was being set to the value in the Y4M file on each
pass, but only doubled to account for the altref placement on
the first past.
This avoids reseting it on the second pass.
Fritz Koenig [Tue, 24 Aug 2010 23:27:49 +0000 (16:27 -0700)]
Allow --cpu= to work for x86.
--cpu was already implemented for most of our embedded
platforms, this just extends it to x86. Corner case for
Atom processor as it doesn't respond to the --march=
option under icc.
Johann [Tue, 24 Aug 2010 22:23:16 +0000 (18:23 -0400)]
clean up compiler warnings
did a test compile with clang and got rid of some warnings that have
been annoying me for a while:
vp8/decoder/detokenize.c: In function 'vp8_init_detokenizer':
vp8/decoder/detokenize.c:121: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:122: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:123: warning: assignment from incompatible pointer type
vp8/decoder/detokenize.c:124: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:125: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:128: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:129: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:130: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:131: warning: assignment discards qualifiers from pointer target type
Fritz Koenig [Fri, 20 Aug 2010 17:58:19 +0000 (10:58 -0700)]
Rework idct calling structure.
Moving the eob structure allows for a non-struct based
function to handle decoding an entire mb of
idct/dequant/recon data. This allows for SIMD functions
to idct/dequant/recon multiple blocks at once.
John Koleszar [Fri, 20 Aug 2010 15:04:10 +0000 (11:04 -0400)]
increase rate control buffer level precision
The external API exposes the RC initial/optimal/full buffer level in
milliseconds, but this value was truncated internally to seconds. This
patch allows the use of the full precision during the conversion from
time to bits.
John Koleszar [Mon, 16 Aug 2010 13:34:30 +0000 (09:34 -0400)]
arm: fix missing dependency with --enable-shared
The C version of the dequant/idct/add function depends on the C
version of the IDCT, but this isn't compiled in on ARM. Since this
code has asm version, we can just remove this file to eliminate the
link error.
Johann [Thu, 12 Aug 2010 13:05:37 +0000 (09:05 -0400)]
framework for assembly version of the detokenizer
adds a compile time option: --enable-arm-asm-detok which pulls in
vp8/decoder/arm/detokenize.asm
currently about break even speed wise, but changes are pending to
the fill code (branch and load 3 bytes versus conditionally always
load one) and the error handling. Currently it doesn't handle zero
runs or overrunning the buffer.
this is really just so i don't have to rebase my changes all the
time to run benchmarks - now just need to replace one file!
Scott LaVarnway [Thu, 12 Aug 2010 20:25:43 +0000 (16:25 -0400)]
Removed unnecessary MB_MODE_INFO copies
These copies occurred for each macroblock in the encoder and decoder.
Thetemp MB_MODE_INFO mbmi was removed from MACROBLOCKD. As a result,
a large number compile errors had to be fixed.
John Koleszar [Mon, 9 Aug 2010 17:27:26 +0000 (13:27 -0400)]
avoid negative array subscript warnings
The mv_ref and sub_mv_ref token encodings are indexed from NEARESTMV
and LEFT4X4, respectively, rather than being zero-based like the
other token encodings.
Yaowu Xu [Wed, 11 Aug 2010 04:12:04 +0000 (21:12 -0700)]
Normalize quantizer's zero bin and rounding factors
This patch changes a few numbers in the two constant arrays
for quantizer's zerobin and rounding factors, in general to
make the sum of the two factors for any Q to be 128. While
it might be beneficial to calibrate the two arrays for best
quantizer performance, it is not the purpose of this patch.
Normalizing the two arrays will enable quick optimization
of the current faster quantizer, i.e .zerobin check can be
removed.
Replace the exponential search for optimal rounding during
quantization with a linear Viterbi trellis and enable it
by default when using --best.
Right now this operates on top of the output of the adaptive
zero-bin quantizer in vp8_regular_quantize_b() and gives a small
gain.
It can be tested as a replacement for that quantizer by
enabling the call to vp8_strict_quantize_b(), which uses
normal rounding and no zero bin offset.
Ultimately, the quantizer will have to become a function of lambda
in order to take advantage of activity masking, since there is
limited ability to change the quantization factor itself.
However, currently vp8_strict_quantize_b() plus the trellis
quantizer (which is lambda-dependent) loses to
vp8_regular_quantize_b() alone (which is not) on my test clip.
Patch Set 3:
Fix an issue related to the cost evaluation of successor
states when a coefficient is reduced to zero. With this
issue fixed, now the trellis search almost exactly matches
the exponential search.
Patch Set 2:
Overall, the goal of this patch set is to make "trellis"
search to produce encodings that match the exponential
search version. There are three main differences between
Patch Set 2 and 1:
a. Patch set 1 did not properly account for the scale of
2nd order error, so patch set 2 disable it all together
for 2nd blocks.
b. Patch set 1 was not consistent on when to enable the
the quantization optimization. Patch set 2 restore the
condition to be consistent.
c. Patch set 1 checks quantized level L-1, and L for any
input coefficient was quantized to L. Patch set 2 limits
the candidate coefficient to those that were rounded up
to L. It is worth noting here that a strategy to check
L and L+1 for coefficients that were truncated down to L
might work.
(a and b get trellis quant to basically match the exponential
search on all mid/low rate encodings on cif set, without
a, b, trellis quant can hurt the psnr by 0.2 to .3db at
200kbps for some cif clips)
(c gets trellis quant to match the exponential search
to match at Q0 encoding, without c, trellis quant can be
1.5 to 2db lower for encodings with fixed Q at 0 on most
derf cif clips)
Yunqing Wang [Thu, 29 Jul 2010 20:24:26 +0000 (16:24 -0400)]
First modification of multi-thread decoder
This is the first modification of VP8 multi-thread decoder, which uses
same threads to decode macroblocks and then do loopfiltering for each
frame.
Inspired by Rob Clark, synchronization was done on every 8 macroblocks
instead of every macroblock to reduce lock contention.
Comparing with the original code, this implementation gave about 15%-
20% performance gain while decoding my test clips on a Core2 Quad
platform (Linux).
John Koleszar [Mon, 9 Aug 2010 13:33:00 +0000 (09:33 -0400)]
Mark loopfilter C functions as static
Clang defaults to C99 mode, and inline works differently in C99.
(gcc, on the other hand, defaults to a special gnu-style inlining,
which uses different syntax.) Making the functions static makes sure
clang doesn't decide to discard a function because it's too large to
inline.
The regex which postprocesses the gcc make-deps (-M) output was too
greedy and matching in the dependencies part of the rule rather than
the target only. The patch provided with the issue was not correct, as
it tried to match the .o at the end of the line, which isn't correct
at least for my GCC version. This patch matches word characters
instead of .*
Thanks to raimue and the MacPorts community for isolating this issue.
Jan Kratochvil [Sat, 31 Jul 2010 15:12:31 +0000 (17:12 +0200)]
nasm: avoid space before the :data symbol type.
global label:data
^^
Provide nasm compatibility. No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.
Jan Kratochvil [Sat, 31 Jul 2010 15:12:31 +0000 (17:12 +0200)]
nasm: end labels with colon (':')
Labels should end by colon (':'), nasm requires it.
Provide nasm compatibility. No binary change by this patch with yasm
on {x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.
Jan Kratochvil [Sat, 31 Jul 2010 15:12:31 +0000 (17:12 +0200)]
nasm: use OWORD vs DQWORD
nasm knows only OWORD. yasm knows both OWORD and DQWORD.
Provide nasm compatibility. No binary change by this patch with yasm on
{x86_64,i686}-fedora13-linux-gnu. Few longer opcodes with nasm on
{x86_64,i686}-fedora13-linux-gnu have been checked as safe.
Enable the switch between two versions of quantizer
To facilitate more testing related to quantizer and rate
control, the old version quantizer is added back. old and
new quantizer can be switched back and forth by define or
un-define the macro "EXACT_QUANT".
John Koleszar [Tue, 27 Jul 2010 15:12:21 +0000 (11:12 -0400)]
neon: disable asm quantizer
The assembly version of the quantizer has not been updated to match
the new exact quantizer introduced in commit e04e2935. That commit tried
to disable this code but missed the non-RTCD case.
Thanks to David Baker <david.baker at openmarket.com> for isolating the
issue and testing this fix.
Johann [Fri, 23 Jul 2010 17:42:30 +0000 (13:42 -0400)]
update arm idct functions
Jeff Muizelaar posted some changes to the idct/reconstruction c code.
This is the equivalent update for the arm assembly.
This shows a good boost on v6, and a minor boost on neon.
Here are some numbers for highway in qcif, 2641 frames:
HEAD neon: ~161 fps
new neon: ~162 fps
HEAD v6: ~102 fps
new v6: ~106 fps
The following functions have been updated for armv6 and neon:
vp8_dc_only_idct_add
vp8_dequant_idct_add
vp8_dequant_dc_idct_add
Jeff Muizelaar [Fri, 28 May 2010 18:28:12 +0000 (14:28 -0400)]
Combine idct and reconstruction steps
This moves the prediction step before the idct and combines the idct and
reconstruction steps into a single step. Combining them seems to give an
overall decoder performance improvement of about 1%.
Fritz Koenig [Thu, 22 Jul 2010 12:07:32 +0000 (08:07 -0400)]
Swap alt/gold/new/last frame buffer ptrs instead of copying.
At the end of the decode, frame buffers were being copied.
The frames are not updated after the copy, they are just
for reference on later frames. This change allows multiple
references to the same frame buffer instead of copying it.
Changes needed to be made to the encoder to handle this. The
encoder is still doing frame buffer copies in similar places
where pointer reference could be done.
Paul Wilkins [Fri, 23 Jul 2010 16:01:12 +0000 (17:01 +0100)]
Rate control bug with long key frame interval.
In two pass encodes, the calculation of the number of bits
allocated to a KF group had the potential to overflow for high data
rates if the interval is very long.
We observed the problem in one test clip where there was one
section where there was an 8000 frame gap between key frames.
This replaces the approximate division-by-multiplication in the
quantizer with an exact one that costs just one add and one
shift extra.
The asm versions have not been updated in this patch, and thus
have been disabled, since the new method requires different
multipliers which are not compatible with the old method.
Fritz Koenig [Thu, 22 Jul 2010 13:46:54 +0000 (09:46 -0400)]
Remove CONFIG_NEW_TOKENS files.
These files were out of date and no longer maintained.
Token decoding has implemented the no-crash code which
is incompatible with this arm assembly code.
Tom Finegan [Thu, 22 Jul 2010 17:34:25 +0000 (13:34 -0400)]
Add vs9 targets.
Add targets x86-win32-vs9 and x86_64-win64-vs9 for support of Visual
Studio 2008-- this removes the need to convert the vs8 projects before
using them within the IDE.
Paul Wilkins [Mon, 19 Jul 2010 13:10:07 +0000 (14:10 +0100)]
Rate control fix for ARNR filtered frames.
Previously we had assumed that it was necessary to give a full frame's
bit allocation to the alt ref frame if it has been created through temporal
filtering. This is not the case. The active max quantizer control
insures that sufficient bits are allocated if needed and allocating a
full frame's worth of bits creates an excessive overhead for the ARF.
Adrian Grange [Thu, 1 Jul 2010 13:17:04 +0000 (14:17 +0100)]
Fix bug in 1st pass motion compensation
In the case where the best reference mv is not (0,0) a secondary
search is carried out centered on (0,0). However, rather than
sending tmp_err into the search function, motion_error was
inadvertently passed.
As a result tmp_err remains set at INT_MAX and the (0,0)-based
search result will never be selected, even if it is better.
John Koleszar [Wed, 30 Jun 2010 14:22:40 +0000 (10:22 -0400)]
Update loopfilter frame/filter/sharp info for multithread
Change I9fd1a5a4 updated the multithreaded loopfilter to avoid
reinitializing several parameteres if they haven't changed from the
last frame, but the code to update the last frame's parameters wasn't
invoked in the multithreaded case.
Yunqing Wang [Fri, 25 Jun 2010 13:18:11 +0000 (09:18 -0400)]
Improve SSE2 loopfilter functions
Restructured and rewrote SSE2 loopfilter functions. Combined u and
v into one function to take advantage of SSE2 128-bit registers.
Tests on test clips showed a 4% decoder performance improvement on
Linux desktop.
Paul Wilkins [Tue, 29 Jun 2010 11:15:54 +0000 (12:15 +0100)]
Further adjustment of RD behaviour with Q and Zbin.
Following conversations with Tim T (Derf) I ran a large number of
tests comparing the existing polynomial expression with a simpler
^2 variant. Though the polynomial was sometimes a little better at
the extremes of Q it was possible to get close for most clips and
even a little better on some.
This code also changes the way the RD multiplier is calculated
when the ZBIN is extended to use a variant of the same ^2
expression.
I hope that this simpler expression will be easier to tune further
as we expand our test set and consider adjustments based on content.
Yaowu Xu [Tue, 29 Jun 2010 05:03:43 +0000 (22:03 -0700)]
Improve the accuracy of forward walsh-hadamard transform
Besides the slight improvement in round trip error. This
also fixes a sign bias in the forward transform, so the
round trip errors are evenly distributed between +1s and
-1s. The old bias seemed to work well with the dc sign bias
in old fdct, which no longer exist in the improved fdct.
Adrian Grange [Mon, 28 Jun 2010 11:00:11 +0000 (12:00 +0100)]
Fixed buffer selection for UV in AltRef filtering
Corrected setting of "which_buffer" for U & V cases to match that
used for Y, i.e. to refer to the temporally most recent frame of
those to be filtered.