Adrian Grange [Thu, 30 Sep 2010 09:06:09 +0000 (10:06 +0100)]
Changed defaults & range checking for AltRef params
Modified the range checking of parameters used in the
AltRef temporal filter (arnr-max-frames, arnr-strength,
arnr-type) and default values for each of them.
John Koleszar [Wed, 29 Sep 2010 17:04:04 +0000 (13:04 -0400)]
Fix loopfilter delta zero transitions
Loopfilter deltas are initialized to zero on keyframes in the decoder.
The values then persist from the previous frame unless an update bit
is set in the bitstream. This data is not included in the entropy
data saved by the 'refresh entropy' bit in the bitstream, so it is
effectively an additional contextual element beyond the 3 ref-frames
and the entropy data.
The encoder was treating this delta update bit as update-if-nonzero,
meaning that the value would be refreshed even if it hadn't changed,
and more significantly, if the correct value for the delta changed
to zero, the update wouldn't be sent, and the decoder would preserve
the last (presumably non-zero) value.
This patch updates the encoder to send an update only if the value
has changed from the previously transmitted value. It also forces the
value to be transmitted in error resilient mode, to account for lost
context in the event of lost frames.
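A minimal sketch of the update rule described above, assuming a hypothetical delta_state struct and an illustrative NUM_DELTAS of 4. None of these names come from the libvpx source; the point is only the comparison against the last transmitted value rather than against zero.

```c
#include <string.h>

#define NUM_DELTAS 4  /* hypothetical count, for illustration only */

/* Hypothetical encoder-side mirror of the deltas the decoder holds. */
typedef struct {
    signed char last_sent[NUM_DELTAS];
} delta_state;

/* The decoder zeroes its deltas on keyframes, so mirror that here. */
static void on_keyframe(delta_state *s) {
    memset(s->last_sent, 0, sizeof(s->last_sent));
}

/* Sets update[i] to 1 where a delta must be written to the bitstream
   and returns the number of deltas flagged. */
static int choose_delta_updates(delta_state *s, const signed char *wanted,
                                int error_resilient, unsigned char *update) {
    int i, count = 0;
    for (i = 0; i < NUM_DELTAS; i++) {
        /* The old rule was "wanted[i] != 0", which never signals a
           change back to zero. Compare with what was last sent. */
        update[i] = error_resilient || wanted[i] != s->last_sent[i];
        if (update[i]) {
            s->last_sent[i] = wanted[i];
            count++;
        }
    }
    return count;
}
```

With this rule a delta that goes from 2 back to 0 is flagged for transmission, and in error resilient mode every delta is flagged on every frame.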
Paul Wilkins [Wed, 29 Sep 2010 11:03:19 +0000 (12:03 +0100)]
Control of active min quantizer for two pass.
Create look up tables for controlling the active quantizer range.
Some initial tuning improves quality by circa 0.5% on the test set.
Clean up of some stats output code
Fritz Koenig [Tue, 28 Sep 2010 19:01:34 +0000 (12:01 -0700)]
Optimizations on the loopfilters.
- Scheduling for Atom processors
- Combining of macros to allow for better interleaving
- Change from multiplies to adds for main filter
- Use of movhps/movlps to fill xmm registers without
shifting and orring
Johann [Tue, 28 Sep 2010 13:31:11 +0000 (09:31 -0400)]
update gitignore
this was excluding all .asm files when it should have just been .asm
files in the top level directory and .asm.s files lower down. also be
more restrictive on some other items, and run the whole thing through
sort to keep it organized
The existing code applied a 6-tap filter with 0's on either end.
We're already paying the branch penalty to avoid computing the two
extra columns needed as input to this filter.
We might as well save time computing the filter as well.
This reduces the inner loop from 21 instructions to 16, the number
of loads per iteration from 4 to 1, and the number of multiplies
from 7 to 4.
The gain in overall decoding performance, however, is small (less
than 1%).
This change also means we are now valgrind clean on ARMv6, which is
its real purpose.
The errors reported here were valgrind's fault (it does not detect
that 0 times an uninitialized value is initialized), but Julian
Seward says it would slow down valgrind considerably to make such
checks.
Speeding up libvpx instead, even by a small amount, seems a much
better idea, if only to enable proper valgrind checking of the
rest of the codec.
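The equivalence this relies on can be sketched in plain C (the actual change is in ARMv6 assembly). The function names and the tap values used with them are illustrative assumptions, not code from the patch.

```c
#include <assert.h>

/* 6-tap FIR at position x: reads src[x-2] .. src[x+3]. */
static int filter6(const unsigned char *src, int x, const short taps[6]) {
    int k, sum = 0;
    for (k = 0; k < 6; k++)
        sum += src[x - 2 + k] * taps[k];
    return sum;
}

/* Same kernel when the outer taps are zero: reads only
   src[x-1] .. src[x+2], so the two outer pixels are never loaded. */
static int filter4(const unsigned char *src, int x, const short taps[6]) {
    int k, sum = 0;
    assert(taps[0] == 0 && taps[5] == 0);
    for (k = 1; k < 5; k++)
        sum += src[x - 2 + k] * taps[k];
    return sum;
}
```

Both functions return the same sum whenever the outer taps are zero, but only filter6 touches the two extra pixels. If those pixels are uninitialized, a tool that tracks definedness can still flag the result even though it is multiplied by zero.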
John Koleszar [Fri, 24 Sep 2010 15:10:25 +0000 (11:10 -0400)]
disable compilation of debugging code
This patch avoids compiling some debugging code in onyx_if.c. The most
significant fix is to avoid generating code for vp8_write_yuv_frame,
which is never called. Some other code was removed by the dead code
elimination performed by the compiler, and this patch does it with the
preprocessor instead. There are advantages both ways.
John Koleszar [Fri, 24 Sep 2010 15:21:35 +0000 (11:21 -0400)]
move reconintra_mt to decoder (for now)
reconintra_mt.c is only required for building the decoder right now.
It could definitely be used for the encoder in the future, but it
currently depends on decoder only data structures. (onyxd_int.h,
VP8D_COMP, etc). Move it from common/ to decoder/ until the
necessary changes to the common multithread code are complete.
This patch is needed to build with --disable-vp8-decoder.
John Koleszar [Tue, 21 Sep 2010 14:35:52 +0000 (10:35 -0400)]
Add getter functions for the interface data symbols
Having these symbols be available as functions rather than data is
occasionally more convenient. Implemented this way rather than a
get-codec-by-id style to avoid creating a link-time dependency
between the encoder and the decoder.
John Koleszar [Tue, 21 Sep 2010 15:54:36 +0000 (11:54 -0400)]
Don't reset mb clamping state during splitmv decoding
The MV decoding changes in c5fb0eb introduced a bug where the
macroblock clamping state was reset for each partition, so if an
earlier partition needed clamping but a subsequent one didn't,
the MB wouldn't receive clamping. With this patch, the state is only
set during splitmv decoding, never cleared.
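A minimal sketch of the set-only behaviour, assuming a hypothetical mv type and a single symmetric bound for both axes. The real decoder's structures and limits differ; only the accumulation pattern is the point.

```c
typedef struct { int row, col; } mv;   /* hypothetical motion vector type */

/* Returns 1 if the vector lies outside [lo, hi] on either axis. */
static int mv_needs_clamp(mv v, int lo, int hi) {
    return v.row < lo || v.row > hi || v.col < lo || v.col > hi;
}

/* Returns the macroblock-level flag after visiting every partition. */
static int mb_needs_clamp(const mv *parts, int n, int lo, int hi) {
    int i, need = 0;
    for (i = 0; i < n; i++) {
        /* Buggy form: need = mv_needs_clamp(...), which lets a later
           in-range partition clear the flag. OR-ing keeps it set. */
        need |= mv_needs_clamp(parts[i], lo, hi);
    }
    return need;
}
```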
The patch related to issue #55 (5a72620) fixed some warnings, but the
fix was not optimal. It was actually a trick to confuse the compiler
rather than a fix.
This patch fixes it by creating a new macro for the cases where only an
upper-limit check on an unsigned value is needed.
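A sketch of the idea, with hypothetical macro names and limits chosen only for illustration. Comparing an unsigned value against a lower bound of zero can never be true, which is what compilers warn about; a separate upper-limit macro avoids the comparison entirely.

```c
/* Hypothetical macros; each evaluates to 1 when the value is out of range. */
#define OUT_OF_RANGE(v, lo, hi) ((v) < (lo) || (v) > (hi))
#define ABOVE_LIMIT(v, hi)      ((v) > (hi))

/* Returns 0 when both parameters are acceptable, -1 otherwise. */
static int check_params(int signed_param, unsigned int unsigned_param) {
    /* Signed value: both bounds are meaningful. */
    if (OUT_OF_RANGE(signed_param, 0, 15))
        return -1;
    /* Unsigned value: "< 0" could never be true, so test the top only. */
    if (ABOVE_LIMIT(unsigned_param, 6u))
        return -1;
    return 0;
}
```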
Johann [Thu, 9 Sep 2010 19:55:19 +0000 (15:55 -0400)]
reorder data to use wider instructions
the previous commit laid the groundwork by doing two sets of idcts
together. this moved that further by grouping the interesting data
(q[0], q+16[0]) together to allow using wider instructions. also
managed to drop a few instructions by recognizing that the constant
for sinpi8sqrt2 could be downshifted all the time which avoided a
downshift as well as workarounds for a function which only accepted
signed data
looks like a modest gain for performance: at qcif, went from ~180
fps to ~183
Change-Id: I842673f3080b8239e026cc9b50346dbccbab4adf
Yunqing Wang [Thu, 16 Sep 2010 18:08:52 +0000 (14:08 -0400)]
Restructure multi-threaded decoder
On each MB, loopfiltering is done right after MB decoding. This
combines two loops in the multi-threaded code into one, which halves
the number of synchronizations.
The above-row/left-col data are saved in temp buffers for
next-row/next MB decoding.
Tests on 4-core gLucid machine showed 10% decoder performance
gain with threads=4 (tulip clip). Testing on other platforms
isn't done yet.
Fritz Koenig [Tue, 14 Sep 2010 22:46:37 +0000 (15:46 -0700)]
Modify GET_GOT macro for performance.
GET_GOT was producing a zero length call. This resulted in
pipeline flushes occurring when returning from the assembly
functions. This is masked on out-of-order cores, but evident on
Atom cores.
Scott LaVarnway [Thu, 9 Sep 2010 18:42:48 +0000 (14:42 -0400)]
Improved subset block search
Improved the subset block search and fill. (about 3% improvement for
32 bit) Modified/merged the code in order to create
vp8_read_mb_modes_mv which can decode the modes/mvs on a macroblock
level. This will allow the decode loop (in the future) to decode
modes/mvs on a frame, row, or mb level.
Johann [Tue, 7 Sep 2010 18:21:27 +0000 (14:21 -0400)]
Update NEON wide idcts
Expand 93c32a55 which used SSE2 instructions to do two
idct/dequant/recons at a time to NEON. Initial working
commit. More work needs to be put into rearranging and
interlacing the data to take advantage of quadword
operations, which is when we'll hopefully see a much
better boost
John Koleszar [Thu, 9 Sep 2010 16:57:23 +0000 (12:57 -0400)]
Fix GF interval for non-lagged ARFs
When ARFs are enabled in non-lagged compress modes, the GF interval
was being reset to zero. Non-lagged ARF updates were enabled in commit
63ccfbd, but this incorrect GF interval caused a quality regression.
Jim Bankoski [Thu, 22 Jul 2010 20:07:13 +0000 (16:07 -0400)]
Skip unnecessary search of identical frames
vp8_get_compressed_data() was defeating logic in
encode_frame_to_datarate() that determined the reference buffers to
search and forcing all frames to be eligible to search. In cases
where buffers have identical contents, this is unnecessary extra
work.
Jim Bankoski [Thu, 22 Jul 2010 20:07:13 +0000 (16:07 -0400)]
Enable ARFs for non-lagged compress
ARFs were explicitly disabled except in lagged compress mode. New
ARF logic allows for the ARF buffer to hold an older golden frame,
which does not require lagged compress.
Scott LaVarnway [Thu, 2 Sep 2010 20:17:52 +0000 (16:17 -0400)]
Reduced the size of MB_MODE_INFO
Moved partition_bmi and partition_count out of MB_MODE_INFO and
placed into MACROBLOCK. Also reduced the size of other members
of the MB_MODE_INFO struct. For 1080p, the memory was reduced
by 1,209,516 bytes. The decoder performance appeared to improve
by 3% for the clip used.
Note: The main goal for this change is to improve the decoder
performance. The encoder will be revisited at a later date for
further structure cleanup.
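The effect of moving per-partition data out of a per-macroblock struct can be sketched as below. The struct layouts are hypothetical stand-ins, not the real MB_MODE_INFO or MACROBLOCK definitions, and the sizes they yield do not reproduce the figure quoted above.

```c
#include <stddef.h>

typedef struct { int mode; short mv_row, mv_col; } block_info;

/* Before: every macroblock in the frame carries the partition data. */
typedef struct {
    int mode, ref_frame;
    short mv_row, mv_col;
    block_info partition_bmi[16];
    int partition_count;
} fat_mb_info;

/* After: narrower members, partition data moved elsewhere. */
typedef struct {
    unsigned char mode, ref_frame;
    short mv_row, mv_col;
} slim_mb_info;

/* One working copy of the partition data, shared across macroblocks. */
typedef struct {
    block_info partition_bmi[16];
    int partition_count;
} mb_scratch;

/* Bytes needed for one per-macroblock array covering a whole frame. */
static size_t per_frame_bytes(size_t per_mb, size_t mb_cols, size_t mb_rows) {
    return per_mb * mb_cols * mb_rows;
}
```

A 1080p frame is 120 by 68 macroblocks of 16x16 pixels, so every byte removed from the per-macroblock struct saves 8160 bytes per array of that struct.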
James Zern [Fri, 20 Aug 2010 20:06:56 +0000 (16:06 -0400)]
encoder: remove postproc dependency
Remove the dependency on postproc.c for the encoder in general. The only
unchecked need for it is when CONFIG_PSNR is enabled. All other cases
are already wrapped in CONFIG_POSTPROC. In the CONFIG_PSNR case the file
will still be included.
Additionally, if VP8_SET_POSTPROC is used with the encoder while post
processing is disabled, an error will be returned.
Frank Galligan [Wed, 1 Sep 2010 20:40:18 +0000 (16:40 -0400)]
Fix rare deadlock before loop filter
There was an extremely rare deadlock that happened when one thread
was waiting to start the loop filter on frame n while the other
threads were starting to work on frame n+1.
The memory being zeroed in vp8_update_mode_info_border() was just
allocated with calloc, and so the entire function is actually
redundant, but it should be made correct in case someone expects
it to actually work in the future.
The timebase was being set to the value in the Y4M file on each
pass, but only doubled to account for the altref placement on
the first pass.
This avoids resetting it on the second pass.
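A small model of the bug, under stated assumptions: the rational type, the function name, and the reset_every_pass switch are hypothetical, and the doubling is modelled as doubling the denominator, which the message above does not specify.

```c
typedef struct { int num, den; } rational;

/* Returns the timebase in effect during the final pass.
   reset_every_pass = 1 models the old behaviour, 0 the fixed one. */
static rational timebase_in_last_pass(rational from_file, int passes,
                                      int reset_every_pass) {
    rational tb = from_file;       /* taken from the file once */
    int pass;
    for (pass = 0; pass < passes; pass++) {
        if (reset_every_pass)
            tb = from_file;        /* old behaviour: discards the doubling */
        if (pass == 0)
            tb.den *= 2;           /* assumed form of the doubling */
    }
    return tb;
}
```

In this model a single pass behaves the same either way; only a second pass sees the undoubled value when the reset is left in.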
Fritz Koenig [Tue, 24 Aug 2010 23:27:49 +0000 (16:27 -0700)]
Allow --cpu= to work for x86.
--cpu was already implemented for most of our embedded
platforms; this just extends it to x86. The Atom processor
is a corner case, as it doesn't respond to the --march=
option under icc.
Johann [Tue, 24 Aug 2010 22:23:16 +0000 (18:23 -0400)]
clean up compiler warnings
did a test compile with clang and got rid of some warnings that have
been annoying me for a while:
vp8/decoder/detokenize.c: In function 'vp8_init_detokenizer':
vp8/decoder/detokenize.c:121: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:122: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:123: warning: assignment from incompatible pointer type
vp8/decoder/detokenize.c:124: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:125: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:128: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:129: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:130: warning: assignment discards qualifiers from pointer target type
vp8/decoder/detokenize.c:131: warning: assignment discards qualifiers from pointer target type