Yaowu Xu [Wed, 28 Nov 2012 23:15:51 +0000 (15:15 -0800)]
experiment with CONTEXT conversion
This commit changed the ENTROPY_CONTEXT conversion between MBs that
have different transform sizes.
In additioin, this commit also did a number of cleanup/bug fix:
1. removed duplicate function vp9_fix_contexts() and changed to use
vp8_reset_mb_token_contexts() for both encoder and decoder
2. fixed a bug in stuff_mb_16x16 where wrong context was used for
the UV.
3. changed reset all context to 0 if a MB is skipped to simplify the
logic.
Don't use vp9_decode_coefs_4x4() for 2nd order DC or luma blocks. The
code introduces some overhead which is unnecessary for these cases.
Also, remove variable declarations that are only used once, remove
magic offsets into the coefficient buffer (use xd->block[i].qcoeff
instead of xd->qcoeff + magic_offset), and fix a few Google Style
Guide violations.
Use these, instead of the 4/5-dimensional arrays, to hold statistics,
counts, accumulations and probabilities for coefficient tokens. This
commit also re-allows ENTROPY_STATS to compile.
This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds
code all over the place to wrap that in the bitstream/encoder/decoder/RD.
Some implementation notes (these probably need careful review):
- token range is extended by 1 bit, since the value range out of this
transform is [-16384,16383].
- the coefficients coming out of the FDCT are manually scaled back by
1 bit, or else they won't fit in int16_t (they are 17 bits). Because
of this, the RD error scoring does not right-shift the MSE score by
two (unlike for 4x4/8x8/16x16).
- to compensate for this loss in precision, the quantizer is halved
also. This is currently a little hacky.
- FDCT and IDCT is double-only right now. Needs a fixed-point impl.
- There are no default probabilities for the 32x32 transform yet; I'm
simply using the 16x16 luma ones. A future commit will add newly
generated probabilities for all transforms.
- No ADST version. I don't think we'll add one for this level; if an
ADST is desired, transform-size selection can scale back to 16x16
or lower, and use an ADST at that level.
Additional notes specific to Debargha's DWT/DCT hybrid:
- coefficient scale is different for the top/left 16x16 (DCT-over-DWT)
block than for the rest (DWT pixel differences) of the block. Therefore,
RD error scoring isn't easily scalable between coefficient and pixel
domain. Thus, unfortunately, we need to compute the RD distortion in
the pixel domain until we figure out how to scale these appropriately.
Johann [Mon, 3 Dec 2012 20:26:51 +0000 (12:26 -0800)]
Begin to refactor vpx_scale usage in VP9
Only declare the functions in vpx_scale RTCD and include the relevant
header.
Remove unused files and functions in vpx_scale to avoid wasting time
renaming. vpx_scale/win32/scaleopt.c contains functions which have not
been called in a long time but are potentially optimized.
The 'vp8' functions have not been renamed yet. That is for after the
cleanup.
Paul Wilkins [Tue, 4 Dec 2012 17:21:05 +0000 (17:21 +0000)]
Change to MV reference search.
This patch reduces the cpu cost of the MV ref
search by only allowing insert for candidates
that would be in the current top 4.
This could alter the outcome and slightly favors
near candidates which are tested first but also
limits the worst case loop count to 4 and means in
many cases it will drop out and not happen.
Yaowu Xu [Tue, 4 Dec 2012 16:35:37 +0000 (08:35 -0800)]
Fix the build with MSVC
1. remove the dependency on non existing "vp9_temporal_filter_x86.h"
2. prefix filenames with vp9_ in obj_int_extract.bat to reflect the
change of the actual filenames.
Yaowu Xu [Mon, 3 Dec 2012 22:53:45 +0000 (14:53 -0800)]
merged optimiz_b_16x16() into optmize_b()
The commit changed the trellis quantization function optimize_b() to
work for MBs using all transform sizes, and eliminated the function
for MB using 16x16 transform only, optimize_b_16x16.
Johann [Mon, 3 Dec 2012 20:26:51 +0000 (12:26 -0800)]
Begin to refactor vpx_scale usage in VP9
Only declare the functions in vpx_scale RTCD and include the relevant
header.
Remove unused files and functions in vpx_scale to avoid wasting time
renaming. vpx_scale/win32/scaleopt.c contains functions which have not
been called in a long time but are potentially optimized.
The 'vp8' functions have not been renamed yet. That is for after the
cleanup.
Yaowu Xu [Thu, 29 Nov 2012 01:34:02 +0000 (17:34 -0800)]
minor fix to eob check for setting CONTEXT
Previously, the "!=" check is logically incorrect when eob is at 0 and
effective coefficient starting position is 1. This commit should have
no effect on bitstream.
Deb Mukherjee [Thu, 15 Nov 2012 23:14:38 +0000 (15:14 -0800)]
Fixing 8x8/4x4 ADST for intra modes with tx select
This patch allows use of 8x8 and 4x4 ADST correctly for Intra
16x16 modes and Intra 8x8 modes when the block size selected
is smaller than the prediction mode. Also includes some cleanups
and refactoring.
Yunqing Wang [Wed, 28 Nov 2012 03:16:32 +0000 (19:16 -0800)]
Further improve macroblock loop filters
This change included:
1. Aligned reads in vp9_mbloop_filter_vertical_edge function.
Since we actually read 16 bytes, we can align the reads to read
starting at (s - 8) instead of (s - 5).
2. Combined u, v loop filters.
3. Added 8x16 transpose.
This gave 2% decoder performance gain (tulip clip).
Yaowu Xu [Tue, 27 Nov 2012 20:41:59 +0000 (12:41 -0800)]
removed redundant mode_context data structures
This commit removed a couple of redundant data structures in frame
coding contextsm, mode_context and mode_context_a, and changed to
use vp9_mode_contexts only. The switch of the context for different
frame type now relies on the switch of frame coding context between
lfc and lfc_a. This commit also removed a number of memcpy among
these redundant data structure.
John Koleszar [Tue, 27 Nov 2012 19:16:15 +0000 (11:16 -0800)]
Clamp decoded feature data
Not all segment feature data elements are full-range powers of two, so
there are values that can be encoded that are invalid. Add a new function
to clamp values to the maximum allowed.
Yunqing Wang [Wed, 21 Nov 2012 00:28:08 +0000 (16:28 -0800)]
Improve sad3x16 SSE2 function
Vp9_sad3x16_sse2() is heavily called in decoder, in which the
unaligned reads consume lots of cpu cycles. When CONFIG_SUBPELREFMV
is off, the unaligned offset is 1. In this situation,
we can adjust the src_ptr to be 4-byte aligned, and then do the
aligned reads. This reduced the reading time significantly. Tests
on 1080p clip showed over 2% decoder performance gain with
CONFIG_SUBPELREFM off.
Ronald S. Bultje [Fri, 23 Nov 2012 19:23:50 +0000 (11:23 -0800)]
Move switch(tx_size) around txsize to detokenize.c.
Add a new function vp9_decode_mb_tokens() that handles the switch
between different per-tx-size detokenize functions. Make actual
implementations (vp9_decode_mb_tokens_NxN()) static.
Ronald S. Bultje [Fri, 23 Nov 2012 17:43:13 +0000 (09:43 -0800)]
Restructure vp9_decode_mb_tokens_8x8() a bit.
Don't declare variables if they only ever have a single value and are
used only as argument to another function call; instead, just hardcode
the value in the function call directly. Split out UV and Y coefficient
loops for clarity. Use xd->block[].qcoeff instead of xd->qcoeff + magic
to remove use of magic offset variables.
Ronald S. Bultje [Fri, 23 Nov 2012 17:11:12 +0000 (09:11 -0800)]
Restructure vp9_decode_mb_tokens_16x16() a bit.
Don't declare variables if they only ever have a single value and are
used only as argument to another function call; instead, just hardcode
the value in the function call directly. Also remove unneeded brackets
around a code block, and remove the magic offsets 64 and 256 for chroma
values in the coefficient memory block.
John Koleszar [Mon, 19 Nov 2012 18:45:20 +0000 (10:45 -0800)]
make: fix dependency generation for flat build tree
Update the fmt_deps function to use a new sed expression to convert the
object file name generated by the compiler into the path-transformed
name of the .o and .d files.
Prior to this patch, changing a header file would not trigger an
incremental build.