Scott LaVarnway [Wed, 12 Dec 2012 23:49:39 +0000 (15:49 -0800)]
Improved vp9_ihtllm_c
As suggested by Yaowu, we can use eob to reduce the complexity
of the vp9_ihtllm_c function. For the 1080p test clip used, the decoder
performance improved by 17%.
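A minimal sketch of the idea, not the actual vp9_ihtllm_c code: if eob
says only the first few rows hold nonzero coefficients, the row pass can
skip the remaining rows, because transforming an all-zero row yields
zeros. The 4x4 size, the kBasis matrix, the raster scan order, and the
function names below are all assumptions made for illustration.

```c
#include <string.h>

#define N 4

/* Hypothetical 4x4 integer basis; stands in for the real transform kernel. */
static const int kBasis[N][N] = {
  { 1,  1,  1,  1 },
  { 2,  1, -1, -2 },
  { 1, -1, -1,  1 },
  { 1, -2,  2, -1 },
};

/* One 1-D pass: out[i] = sum over k of kBasis[k][i] * in[k]. */
static void transform_1d(const int *in, int in_stride,
                         int *out, int out_stride) {
  for (int i = 0; i < N; ++i) {
    int sum = 0;
    for (int k = 0; k < N; ++k) sum += kBasis[k][i] * in[k * in_stride];
    out[i * out_stride] = sum;
  }
}

/* Reference version: transform every row, then every column. */
void inverse_full(const int *coeffs, int *out) {
  int tmp[N * N];
  for (int r = 0; r < N; ++r) transform_1d(coeffs + r * N, 1, tmp + r * N, 1);
  for (int c = 0; c < N; ++c) transform_1d(tmp + c, N, out + c, N);
}

/* eob-aware version. Assumes raster scan order, so every coefficient at
 * position >= eob is zero and rows at or beyond ceil(eob / N) are empty. */
void inverse_with_eob(const int *coeffs, int eob, int *out) {
  int tmp[N * N];
  int rows = (eob + N - 1) / N;
  if (rows > N) rows = N;
  memset(tmp, 0, sizeof(tmp));  /* skipped rows transform to zero */
  for (int r = 0; r < rows; ++r)
    transform_1d(coeffs + r * N, 1, tmp + r * N, 1);
  for (int c = 0; c < N; ++c) transform_1d(tmp + c, N, out + c, N);
}
```

Both functions produce the same output whenever the coefficients past
eob really are zero; the saving comes from the skipped row passes.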
John Koleszar [Mon, 10 Dec 2012 20:07:59 +0000 (12:07 -0800)]
configure: add --enable-external-build support
First attempt at avoiding all the compile-time environment detection for
cases where you can generate the environments statically, as when the
real build is being performed by another build system.
John Koleszar [Thu, 6 Dec 2012 21:56:25 +0000 (13:56 -0800)]
libvpx_test: ensure rtcd init functions are called
In addition to allowing tests to use the RTCD-enabled functions (perhaps transitively)
without having run a full encode/decode test yet, this fixes a linking issue with
Apple's G++ whereby the Common symbols (the function pointers themselves) wouldn't
be resolved. Fixing this linking issue is the primary impetus for this patch, as none
of the tests exercise the RTCD functionality except through the main API.
Johann [Mon, 3 Dec 2012 20:26:51 +0000 (12:26 -0800)]
Begin to refactor vpx_scale usage in VP9
Only declare the functions in vpx_scale RTCD and include the relevant
header.
Remove unused files and functions in vpx_scale to avoid wasting time
renaming. vpx_scale/win32/scaleopt.c contains functions which have not
been called in a long time but are potentially optimized.
The 'vp8' functions have not been renamed yet. That is for after the
cleanup.
Yaowu Xu [Tue, 4 Dec 2012 16:35:37 +0000 (08:35 -0800)]
Fix the build with MSVC
1. remove the dependency on the nonexistent "vp9_temporal_filter_x86.h"
2. prefix filenames with vp9_ in obj_int_extract.bat to reflect the
change of the actual filenames.
Yaowu Xu [Thu, 29 Nov 2012 01:34:02 +0000 (17:34 -0800)]
minor fix to eob check for setting CONTEXT
Previously, the "!=" check was logically incorrect when eob is at 0 and
the effective coefficient starting position is 1. This commit should have
no effect on the bitstream.
Deb Mukherjee [Thu, 15 Nov 2012 23:14:38 +0000 (15:14 -0800)]
Fixing 8x8/4x4 ADST for intra modes with tx select
This patch allows use of 8x8 and 4x4 ADST correctly for Intra
16x16 modes and Intra 8x8 modes when the block size selected
is smaller than the prediction mode. Also includes some cleanups
and refactoring.
Yunqing Wang [Wed, 28 Nov 2012 03:16:32 +0000 (19:16 -0800)]
Further improve macroblock loop filters
This change included:
1. Aligned reads in vp9_mbloop_filter_vertical_edge function.
Since we actually read 16 bytes, we can align the reads to read
starting at (s - 8) instead of (s - 5).
2. Combined u, v loop filters.
3. Added 8x16 transpose.
This gave a 2% decoder performance gain (tulip clip).
Yaowu Xu [Tue, 27 Nov 2012 20:41:59 +0000 (12:41 -0800)]
removed redundant mode_context data structures
This commit removed a couple of redundant data structures in frame
coding contexts, mode_context and mode_context_a, and changed to
use vp9_mode_contexts only. The switch of the context for different
frame types now relies on the switch of frame coding context between
lfc and lfc_a. This commit also removed a number of memcpy calls among
these redundant data structures.
John Koleszar [Tue, 27 Nov 2012 19:16:15 +0000 (11:16 -0800)]
Clamp decoded feature data
Not all segment feature data elements are full-range powers of two, so
there are values that can be encoded that are invalid. Add a new function
to clamp values to the maximum allowed.
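A rough sketch of such a clamp, under stated assumptions: the feature
ids, the limits in kFeatureMax, and the function name are hypothetical
and are not taken from the patch. A feature with a maximum of 5 needs a
3-bit field, which can also carry the invalid values 6 and 7.

```c
/* Hypothetical feature ids and per-feature maxima, for illustration only. */
enum { FEAT_FULL_RANGE, FEAT_PARTIAL_RANGE, FEAT_COUNT };

static const int kFeatureMax[FEAT_COUNT] = {
  63,  /* fills a 6-bit field exactly, so every decoded value is valid */
  5,   /* needs 3 bits, but 6 and 7 are decodable and invalid */
};

/* Clamp a decoded feature value to the maximum allowed for that feature. */
int clamp_feature_data(int feature_id, int value) {
  const int max = kFeatureMax[feature_id];
  return value > max ? max : value;
}
```

With these assumed limits, a decoded value of 7 for FEAT_PARTIAL_RANGE
comes back as 5, while in-range values pass through unchanged.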
Yunqing Wang [Wed, 21 Nov 2012 00:28:08 +0000 (16:28 -0800)]
Improve sad3x16 SSE2 function
vp9_sad3x16_sse2() is called heavily in the decoder, where the
unaligned reads consume lots of cpu cycles. When CONFIG_SUBPELREFMV
is off, the unaligned offset is 1. In this situation,
we can adjust the src_ptr to be 4-byte aligned, and then do the
aligned reads. This reduced the reading time significantly. Tests
on a 1080p clip showed over 2% decoder performance gain with
CONFIG_SUBPELREFMV off.
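The read adjustment can be sketched in portable C rather than SSE2; this
is an illustration, not the code from the patch. It assumes src sits one
byte past a 4-byte boundary, that the byte at src[-1] is readable, and
that the stride is a multiple of 4 so every row keeps the same offset.
The function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Straightforward 3-wide, 16-tall SAD that reads the bytes at src directly. */
unsigned sad3x16_plain(const uint8_t *src, int src_stride,
                       const uint8_t *ref, int ref_stride) {
  unsigned sad = 0;
  for (int r = 0; r < 16; ++r)
    for (int c = 0; c < 3; ++c)
      sad += (unsigned)abs(src[r * src_stride + c] - ref[r * ref_stride + c]);
  return sad;
}

/* Same result, but each row is fetched as one 4-byte group starting at
 * src - 1. The first byte of the group is discarded; bytes 1..3 are the
 * three pixels that the plain version reads. */
unsigned sad3x16_aligned(const uint8_t *src, int src_stride,
                         const uint8_t *ref, int ref_stride) {
  unsigned sad = 0;
  for (int r = 0; r < 16; ++r) {
    uint8_t word[4];
    memcpy(word, src + r * src_stride - 1, sizeof(word));
    for (int c = 0; c < 3; ++c)
      sad += (unsigned)abs(word[c + 1] - ref[r * ref_stride + c]);
  }
  return sad;
}
```

In an SSE2 version the memcpy would correspond to an aligned load; the
point here is only that backing up by the known offset still covers the
same three source bytes.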
Ronald S. Bultje [Fri, 23 Nov 2012 19:23:50 +0000 (11:23 -0800)]
Move switch(tx_size) around txsize to detokenize.c.
Add a new function vp9_decode_mb_tokens() that handles the switch
between different per-tx-size detokenize functions. Make actual
implementations (vp9_decode_mb_tokens_NxN()) static.
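The resulting shape is roughly the following, as a sketch only: one
public entry point switches on the transform size, and the per-size
workers are static so they are not visible outside the file. The TxSize
enum, the function names, and the placeholder return values (the
coefficient count per block) are assumptions, not the real signatures.

```c
/* Hypothetical transform-size enum, for illustration only. */
typedef enum { TX_SIZE_4X4, TX_SIZE_8X8, TX_SIZE_16X16 } TxSize;

/* Per-size workers are static: only the dispatcher below is exported.
 * Each returns its coefficient count per block as a stand-in result. */
static int decode_tokens_4x4(void)   { return 4 * 4; }
static int decode_tokens_8x8(void)   { return 8 * 8; }
static int decode_tokens_16x16(void) { return 16 * 16; }

/* Single entry point that callers use instead of switching themselves. */
int decode_mb_tokens(TxSize tx_size) {
  switch (tx_size) {
    case TX_SIZE_16X16: return decode_tokens_16x16();
    case TX_SIZE_8X8:   return decode_tokens_8x8();
    case TX_SIZE_4X4:
    default:            return decode_tokens_4x4();
  }
}
```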
Ronald S. Bultje [Fri, 23 Nov 2012 17:43:13 +0000 (09:43 -0800)]
Restructure vp9_decode_mb_tokens_8x8() a bit.
Don't declare variables if they only ever have a single value and are
used only as argument to another function call; instead, just hardcode
the value in the function call directly. Split out UV and Y coefficient
loops for clarity. Use xd->block[].qcoeff instead of xd->qcoeff + magic
to remove use of magic offset variables.
Ronald S. Bultje [Fri, 23 Nov 2012 17:11:12 +0000 (09:11 -0800)]
Restructure vp9_decode_mb_tokens_16x16() a bit.
Don't declare variables if they only ever have a single value and are
used only as argument to another function call; instead, just hardcode
the value in the function call directly. Also remove unneeded brackets
around a code block, and remove the magic offsets 64 and 256 for chroma
values in the coefficient memory block.
John Koleszar [Mon, 19 Nov 2012 18:45:20 +0000 (10:45 -0800)]
make: fix dependency generation for flat build tree
Update the fmt_deps function to use a new sed expression to convert the
object file name generated by the compiler into the path-transformed
name of the .o and .d files.
Prior to this patch, changing a header file would not trigger an
incremental build.