Johann [Mon, 3 Dec 2012 20:26:51 +0000 (12:26 -0800)]
Begin to refactor vpx_scale usage in VP9
Only declare the functions in vpx_scale RTCD and include the relevant
header.
Remove unused files and functions in vpx_scale to avoid wasting time
renaming. vpx_scale/win32/scaleopt.c contains functions which have not
been called in a long time but are potentially optimized.
The 'vp8' functions have not been renamed yet. That is for after the
cleanup.
Paul Wilkins [Tue, 4 Dec 2012 17:21:05 +0000 (17:21 +0000)]
Change to MV reference search.
This patch reduces the CPU cost of the MV ref
search by only allowing an insert for candidates
that would be in the current top 4.
This could alter the outcome and slightly favors
near candidates, which are tested first, but it also
limits the worst-case loop count to 4 and means that in
many cases the insert will drop out and not happen.
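A minimal sketch of the bounded insert described above, not the actual VP9 code. The `ref_candidate` type, the `score` field, and `insert_if_top()` are hypothetical names used only for illustration. It assumes a higher score is better and that ties keep the earlier entry ahead, which is one way the "tested first" bias could arise.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 4 /* the "top 4" from the commit message */

/* Hypothetical candidate record: an opaque MV id plus a score
 * (higher is better). */
typedef struct {
  int mv_id;
  int score;
} ref_candidate;

/* Insert cand into list[0..*count), which is kept sorted by descending
 * score. Returns 1 if the candidate was inserted, 0 if it was rejected.
 * Ties keep the earlier entry ahead of the new one. */
static int insert_if_top(ref_candidate *list, int *count, ref_candidate cand) {
  int i, pos;
  assert(list != NULL && count != NULL);
  assert(*count >= 0 && *count <= MAX_REFS);

  /* Early out: the list is full and the candidate does not beat the
   * current worst entry, so no search or shift is needed. */
  if (*count == MAX_REFS && cand.score <= list[MAX_REFS - 1].score) return 0;

  /* Find the insert position; at most MAX_REFS comparisons. */
  pos = 0;
  while (pos < *count && list[pos].score >= cand.score) ++pos;

  /* Grow the list if there is room, otherwise the last entry is dropped. */
  if (*count < MAX_REFS) ++*count;
  for (i = *count - 1; i > pos; --i) list[i] = list[i - 1];
  list[pos] = cand;
  return 1;
}
```

Once the list holds four entries, a weak candidate costs a single comparison, and a strong one costs at most four comparisons plus a short shift.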
Yaowu Xu [Tue, 4 Dec 2012 16:35:37 +0000 (08:35 -0800)]
Fix the build with MSVC
1. remove the dependency on the nonexistent "vp9_temporal_filter_x86.h"
2. prefix filenames with vp9_ in obj_int_extract.bat to reflect the
change in the actual filenames.
Yaowu Xu [Mon, 3 Dec 2012 22:53:45 +0000 (14:53 -0800)]
merged optimize_b_16x16() into optimize_b()
The commit changed the trellis quantization function optimize_b() to
work for MBs using all transform sizes, and eliminated the function
for MB using 16x16 transform only, optimize_b_16x16.
Yaowu Xu [Thu, 29 Nov 2012 01:34:02 +0000 (17:34 -0800)]
minor fix to eob check for setting CONTEXT
Previously, the "!=" check was logically incorrect when eob is at 0 and
the effective coefficient starting position is 1. This commit should
have no effect on the bitstream.
Deb Mukherjee [Thu, 15 Nov 2012 23:14:38 +0000 (15:14 -0800)]
Fixing 8x8/4x4 ADST for intra modes with tx select
This patch allows use of 8x8 and 4x4 ADST correctly for Intra
16x16 modes and Intra 8x8 modes when the block size selected
is smaller than the prediction mode. Also includes some cleanups
and refactoring.
Yunqing Wang [Wed, 28 Nov 2012 03:16:32 +0000 (19:16 -0800)]
Further improve macroblock loop filters
This change included:
1. Aligned reads in vp9_mbloop_filter_vertical_edge function.
Since we actually read 16 bytes, we can align the reads to read
starting at (s - 8) instead of (s - 5).
2. Combined u, v loop filters.
3. Added 8x16 transpose.
This gave a 2% decoder performance gain (tulip clip).
Yaowu Xu [Tue, 27 Nov 2012 20:41:59 +0000 (12:41 -0800)]
removed redundant mode_context data structures
This commit removed a couple of redundant data structures in frame
coding contexts, mode_context and mode_context_a, and changed to
use vp9_mode_contexts only. The switch of the context for different
frame types now relies on the switch of frame coding context between
lfc and lfc_a. This commit also removed a number of memcpy calls among
these redundant data structures.
John Koleszar [Tue, 27 Nov 2012 19:16:15 +0000 (11:16 -0800)]
Clamp decoded feature data
Not all segment feature data elements are full-range powers of two, so
there are values that can be encoded that are invalid. Add a new function
to clamp values to the maximum allowed.
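A minimal sketch of such a clamp, not the function added by the commit. The feature names, the maxima table, and `clamp_feature_data()` are hypothetical. It assumes the decoded value is a non-negative magnitude read from a fixed number of bits.

```c
#include <assert.h>

/* Hypothetical feature ids and per-feature maxima. DEMO_FEATURE_C has a
 * maximum of 2, so a 2-bit field can carry the invalid value 3. */
enum { DEMO_FEATURE_A, DEMO_FEATURE_B, DEMO_FEATURE_C, DEMO_FEATURE_COUNT };

static const int demo_feature_max[DEMO_FEATURE_COUNT] = { 63, 3, 2 };

/* Clamp a decoded magnitude to the maximum allowed for its feature. */
static int clamp_feature_data(int feature_id, int value) {
  assert(feature_id >= 0 && feature_id < DEMO_FEATURE_COUNT);
  assert(value >= 0);
  return value > demo_feature_max[feature_id] ? demo_feature_max[feature_id]
                                              : value;
}
```

With this table, `clamp_feature_data(DEMO_FEATURE_C, 3)` yields 2, while in-range values pass through unchanged.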
Yunqing Wang [Wed, 21 Nov 2012 00:28:08 +0000 (16:28 -0800)]
Improve sad3x16 SSE2 function
vp9_sad3x16_sse2() is called heavily in the decoder, where the
unaligned reads consume a lot of CPU cycles. When CONFIG_SUBPELREFMV
is off, the unaligned offset is 1. In this situation,
we can adjust the src_ptr to be 4-byte aligned, and then do
aligned reads. This reduced the reading time significantly. Tests
on a 1080p clip showed over 2% decoder performance gain with
CONFIG_SUBPELREFMV off.
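The alignment idea can be sketched in portable C. This is an illustration only: the real function uses SSE2, and `sad3x16_aligned()` and `sad3x16_plain()` are hypothetical names. The sketch assumes the source stride is a multiple of 4 and that the source pointer is at most 1 byte past a 4-byte boundary, so the three pixels of every row fit in one aligned 4-byte word.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Straightforward reference: byte reads straight from src. */
static unsigned sad3x16_plain(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride) {
  unsigned sad = 0;
  int r, c;
  for (r = 0; r < 16; ++r)
    for (c = 0; c < 3; ++c)
      sad += (unsigned)abs((int)src[r * src_stride + c] -
                           (int)ref[r * ref_stride + c]);
  return sad;
}

/* Same result, but each row is fetched as one 4-byte word from an
 * address rounded down to a 4-byte boundary. */
static unsigned sad3x16_aligned(const uint8_t *src, int src_stride,
                                const uint8_t *ref, int ref_stride) {
  const unsigned offset = (unsigned)((uintptr_t)src & 3u);
  const uint8_t *aligned = src - offset; /* 4-byte aligned row start */
  unsigned sad = 0;
  int r, c;

  assert(offset <= 1);         /* 3 pixels must fit in the 4-byte word */
  assert(src_stride % 4 == 0); /* keeps every row start aligned */

  for (r = 0; r < 16; ++r) {
    uint8_t word[4];
    /* A fixed-size copy from an aligned address, which a compiler can
     * lower to a single aligned load. */
    memcpy(word, aligned + r * src_stride, sizeof(word));
    for (c = 0; c < 3; ++c)
      sad += (unsigned)abs((int)word[offset + c] -
                           (int)ref[r * ref_stride + c]);
  }
  return sad;
}
```

Note that the aligned variant reads up to one byte before `src` on each row, so the caller must guarantee that byte is inside the buffer.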
Ronald S. Bultje [Fri, 23 Nov 2012 19:23:50 +0000 (11:23 -0800)]
Move switch(tx_size) around txsize to detokenize.c.
Add a new function vp9_decode_mb_tokens() that handles the switch
between different per-tx-size detokenize functions. Make actual
implementations (vp9_decode_mb_tokens_NxN()) static.
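The shape of that refactor can be sketched as one public dispatcher over file-local workers. All names below (`demo_tx_size`, `demo_decode_mb_tokens()`, and the stubs) are hypothetical stand-ins, and the stubs only return the coefficient count of one transform block instead of decoding anything.

```c
#include <assert.h>

/* Hypothetical transform-size tags; not the codec's real enum. */
typedef enum { DEMO_TX_4X4, DEMO_TX_8X8, DEMO_TX_16X16 } demo_tx_size;

/* Per-size workers are static, so only the dispatcher is visible to
 * other files. Each stub returns the number of coefficients in one
 * transform block of that size. */
static int demo_decode_tokens_4x4(void) { return 4 * 4; }
static int demo_decode_tokens_8x8(void) { return 8 * 8; }
static int demo_decode_tokens_16x16(void) { return 16 * 16; }

/* Single entry point: callers no longer switch on the transform size. */
int demo_decode_mb_tokens(demo_tx_size tx_size) {
  switch (tx_size) {
    case DEMO_TX_4X4: return demo_decode_tokens_4x4();
    case DEMO_TX_8X8: return demo_decode_tokens_8x8();
    case DEMO_TX_16X16: return demo_decode_tokens_16x16();
  }
  assert(0 && "unknown transform size");
  return -1;
}
```

Keeping the workers static lets their signatures change without touching any caller outside the file.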
Ronald S. Bultje [Fri, 23 Nov 2012 17:43:13 +0000 (09:43 -0800)]
Restructure vp9_decode_mb_tokens_8x8() a bit.
Don't declare variables if they only ever have a single value and are
used only as argument to another function call; instead, just hardcode
the value in the function call directly. Split out UV and Y coefficient
loops for clarity. Use xd->block[].qcoeff instead of xd->qcoeff + magic
to remove use of magic offset variables.
Ronald S. Bultje [Fri, 23 Nov 2012 17:11:12 +0000 (09:11 -0800)]
Restructure vp9_decode_mb_tokens_16x16() a bit.
Don't declare variables if they only ever have a single value and are
used only as argument to another function call; instead, just hardcode
the value in the function call directly. Also remove unneeded brackets
around a code block, and remove the magic offsets 64 and 256 for chroma
values in the coefficient memory block.
John Koleszar [Mon, 19 Nov 2012 18:45:20 +0000 (10:45 -0800)]
make: fix dependency generation for flat build tree
Update the fmt_deps function to use a new sed expression to convert the
object file name generated by the compiler into the path-transformed
name of the .o and .d files.
Prior to this patch, changing a header file would not trigger an
incremental build.