Yaowu Xu [Tue, 4 Dec 2012 16:35:37 +0000 (08:35 -0800)]
Fix the build with MSVC
1. remove the dependency on the nonexistent "vp9_temporal_filter_x86.h"
2. prefix filenames with vp9_ in obj_int_extract.bat to reflect the
change of the actual filenames.
Yaowu Xu [Thu, 29 Nov 2012 01:34:02 +0000 (17:34 -0800)]
minor fix to eob check for setting CONTEXT
Previously, the "!=" check was logically incorrect when eob is at 0 and
the effective coefficient starting position is 1. This commit should have
no effect on the bitstream.
Deb Mukherjee [Thu, 15 Nov 2012 23:14:38 +0000 (15:14 -0800)]
Fixing 8x8/4x4 ADST for intra modes with tx select
This patch allows use of 8x8 and 4x4 ADST correctly for Intra
16x16 modes and Intra 8x8 modes when the block size selected
is smaller than the prediction mode. Also includes some cleanups
and refactoring.
Yunqing Wang [Wed, 28 Nov 2012 03:16:32 +0000 (19:16 -0800)]
Further improve macroblock loop filters
This change included:
1. Aligned reads in vp9_mbloop_filter_vertical_edge function.
Since we actually read 16 bytes, we can align the reads to read
starting at (s - 8) instead of (s - 5).
2. Combined u, v loop filters.
3. Added 8x16 transpose.
This gave 2% decoder performance gain (tulip clip).
Yaowu Xu [Tue, 27 Nov 2012 20:41:59 +0000 (12:41 -0800)]
removed redundant mode_context data structures
This commit removed a couple of redundant data structures in the frame
coding contexts, mode_context and mode_context_a, and changed the code to
use vp9_mode_contexts only. The switch of the context for different
frame types now relies on the switch of frame coding context between
lfc and lfc_a. This commit also removed a number of memcpy calls among
these redundant data structures.
John Koleszar [Tue, 27 Nov 2012 19:16:15 +0000 (11:16 -0800)]
Clamp decoded feature data
Not all segment feature data elements are full-range powers of two, so
there are values that can be encoded that are invalid. Add a new function
to clamp values to the maximum allowed.
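A minimal sketch of such a clamp in C. The feature ids, the maxima table, and
the function name are hypothetical and chosen only for illustration; they are
not taken from the codec source. The second feature shows the case described
above: a 2-bit field can carry 3, but only 0..2 is valid.

```c
/* Hypothetical feature ids, for illustration only. */
enum { EX_FEATURE_A, EX_FEATURE_B, EX_FEATURE_COUNT };

/* Hypothetical per-feature maxima. EX_FEATURE_B is not a full-range
 * power of two: it would be coded in 2 bits, yet 3 is invalid. */
static const unsigned int ex_feature_max[EX_FEATURE_COUNT] = { 63, 2 };

/* Clamp a decoded (unsigned) feature value to the maximum allowed. */
static unsigned int ex_clamp_feature_data(int feature_id, unsigned int value) {
  const unsigned int max = ex_feature_max[feature_id];
  return value > max ? max : value;
}
```

In-range values pass through unchanged; only out-of-range values are reduced
to the per-feature maximum.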
Yunqing Wang [Wed, 21 Nov 2012 00:28:08 +0000 (16:28 -0800)]
Improve sad3x16 SSE2 function
vp9_sad3x16_sse2() is called heavily in the decoder, where its
unaligned reads consume many CPU cycles. When CONFIG_SUBPELREFMV
is off, the unaligned offset is 1. In this situation,
we can adjust the src_ptr to be 4-byte aligned, and then do the
aligned reads. This reduced the reading time significantly. Tests
on 1080p clip showed over 2% decoder performance gain with
CONFIG_SUBPELREFMV off.
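The pointer adjustment can be sketched in scalar C. This is an illustration
under stated assumptions, not the SSE2 code from the commit: the function name
is hypothetical, memcpy stands in for the aligned 4-byte load, and the byte
just before the 3 pixels is assumed to be readable.

```c
#include <stdint.h>
#include <string.h>

/* Fetch 3 pixels that start 1 byte past a 4-byte boundary with a single
 * aligned 4-byte read. Returns 0 on success, or -1 if the pointer does
 * not have the expected misalignment of 1. */
static int ex_load3_via_aligned_word(const uint8_t *p, uint8_t out[3]) {
  uint8_t word[4];
  if (((uintptr_t)p & 3) != 1) return -1;

  /* Step back 1 byte so the read starts on the 4-byte boundary. */
  memcpy(word, p - 1, sizeof(word)); /* stands in for the aligned load */

  /* Discard the leading byte; the remaining 3 are the wanted pixels. */
  out[0] = word[1];
  out[1] = word[2];
  out[2] = word[3];
  return 0;
}
```

The trick only works because the misalignment is known in advance; with an
offset of 2 or 3 the 3 pixels would straddle two aligned words.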
Ronald S. Bultje [Fri, 23 Nov 2012 19:23:50 +0000 (11:23 -0800)]
Move switch(tx_size) around txsize to detokenize.c.
Add a new function vp9_decode_mb_tokens() that handles the switch
between different per-tx-size detokenize functions. Make actual
implementations (vp9_decode_mb_tokens_NxN()) static.
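The resulting shape, one public dispatcher over file-local per-size workers,
can be sketched as follows. All names, the enum, and the placeholder return
values are hypothetical; the real functions take decoder state that is not
shown in this log.

```c
/* Hypothetical transform-size enum, for illustration only. */
typedef enum { EX_TX_4X4, EX_TX_8X8, EX_TX_16X16 } EX_TX_SIZE;

/* Per-size workers are static, so they are visible only in this file.
 * The return values are placeholders standing in for real decoding. */
static int ex_decode_mb_tokens_4x4(void) { return 4; }
static int ex_decode_mb_tokens_8x8(void) { return 8; }
static int ex_decode_mb_tokens_16x16(void) { return 16; }

/* The only externally visible entry point: it owns the switch. */
int ex_decode_mb_tokens(EX_TX_SIZE tx_size) {
  switch (tx_size) {
    case EX_TX_16X16: return ex_decode_mb_tokens_16x16();
    case EX_TX_8X8:   return ex_decode_mb_tokens_8x8();
    case EX_TX_4X4:   return ex_decode_mb_tokens_4x4();
    default:          return -1; /* unknown transform size */
  }
}
```

Callers no longer need to know which per-size function exists; they pass the
transform size and the dispatcher selects the worker.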
Ronald S. Bultje [Fri, 23 Nov 2012 17:43:13 +0000 (09:43 -0800)]
Restructure vp9_decode_mb_tokens_8x8() a bit.
Don't declare variables if they only ever have a single value and are
used only as an argument to another function call; instead, just hardcode
the value in the function call directly. Split out UV and Y coefficient
loops for clarity. Use xd->block[].qcoeff instead of xd->qcoeff + magic
to remove use of magic offset variables.
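The difference between the two addressing styles can be sketched as below.
The struct layout is an assumption made for illustration (24 blocks of 16
coefficients, with chroma starting at block 16); the type and function names
are hypothetical and are not the decoder's actual definitions.

```c
#include <stdint.h>

enum { EX_NUM_BLOCKS = 24, EX_COEFFS_PER_BLOCK = 16 };

/* Each block holds a pointer into the shared coefficient buffer. */
typedef struct {
  int16_t *qcoeff;
} ExBlock;

typedef struct {
  int16_t qcoeff[EX_NUM_BLOCKS * EX_COEFFS_PER_BLOCK];
  ExBlock block[EX_NUM_BLOCKS];
} ExMacroblock;

/* Point every block at its slice once, so later code can write
 * xd->block[i].qcoeff instead of xd->qcoeff + some_magic_offset. */
static void ex_setup_block_pointers(ExMacroblock *xd) {
  int i;
  for (i = 0; i < EX_NUM_BLOCKS; ++i)
    xd->block[i].qcoeff = xd->qcoeff + i * EX_COEFFS_PER_BLOCK;
}
```

Under this assumed layout, xd->block[16].qcoeff addresses the same memory as
xd->qcoeff + 256, but the block index states the intent without a bare
offset.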
Ronald S. Bultje [Fri, 23 Nov 2012 17:11:12 +0000 (09:11 -0800)]
Restructure vp9_decode_mb_tokens_16x16() a bit.
Don't declare variables if they only ever have a single value and are
used only as an argument to another function call; instead, just hardcode
the value in the function call directly. Also remove unneeded brackets
around a code block, and remove the magic offsets 64 and 256 for chroma
values in the coefficient memory block.
John Koleszar [Mon, 19 Nov 2012 18:45:20 +0000 (10:45 -0800)]
make: fix dependency generation for flat build tree
Update the fmt_deps function to use a new sed expression to convert the
object file name generated by the compiler into the path-transformed
name of the .o and .d files.
Prior to this patch, changing a header file would not trigger an
incremental build.
Ronald S. Bultje [Sat, 17 Nov 2012 06:26:12 +0000 (22:26 -0800)]
Remove special-case inline detokenization in b_pred reconstruction.
Just like for all other block modes, b_pred tokens can be read together
before starting macroblock reconstruction. This removes special cases
for b_pred in decode_macroblock() and allows decode_coefs_4x4() to be
made static in detokenize.c.
While at it, remove the redundant handling and checking of plane_type
and block_index (i) in decode_coefs_4x4(). Since the function is static,
and is called only from decode_mb_tokens_4x4(), we don't need to worry
that the arguments ever go out of sync.
Paul Wilkins [Fri, 16 Nov 2012 16:31:32 +0000 (16:31 +0000)]
Further experimentation with the mode context
Experiments with a larger set of contexts and some
clean up to replace magic numbers regarding the
number of contexts.
The starting values and rate of backward adaptation
are still suspect and are based on a small set of tests.
Added forwards adjustment of probabilities.
The net result of adding the new context and forward
update is small compared to the old context from the
legacy find_near function. (down a little on derf but
up by a similar amount for HD)
However, with the new context and forward update
the impact of disabling the reverse update (which may be
necessary in some use cases to facilitate parallel decoding)
is hugely reduced.
For the old context without forward update, the impact of
turning off reverse update (the experiment was run with SB off) was
derf -0.9, yt -1.89, ythd -2.75 and sthd -8.35. The impact was
mainly at low data rates.
With the new context and forward update enabled, the impact
for all the test sets was no more than 0.5-1% (again mostly at
the low end).