Dmitry Kovalev [Fri, 9 Aug 2013 21:41:51 +0000 (14:41 -0700)]
Moving loopfilter struct to VP9_COMMON.
Loop filter configuration doesn't belong to macroblock, so moving it from
MACROBLOCKD to VP9_COMMON. Also moving the declaration of loopfilter struct
from vp9_blockd.h to vp9_loopfilter.h.
Dmitry Kovalev [Thu, 8 Aug 2013 21:52:39 +0000 (14:52 -0700)]
General code cleanup.
Removing redundant parenthesis and curly braces. Combining declarations
with initializations. Adding useful intermediate variables instead of
recalculating expressions every time.
Yaowu Xu [Fri, 9 Aug 2013 01:25:03 +0000 (18:25 -0700)]
fix unit test failure on win32 vs2008 build
The mix use of double type and simd code caused invalid values stored
in double variables, further caused unit tests to fail. The failures
were only observed on x86-win32-vs9 build with vs2008.
Deb Mukherjee [Thu, 8 Aug 2013 00:01:43 +0000 (17:01 -0700)]
Adds a new subpel motion function
Adds a new subpel motion estimation function that uses a 2-level
tree-structured decision tree to eliminate redundant computations.
It searches fewer points than iterative search (which can search
the same point multiple times) but has the same quality roughly.
This is made the default setting at speeds 0 and 1, while at
speed 2 and above only a 1-level search is used.
Also includes various cleanups for consistency and redundancy removal.
Results:
derf: +0.012% psnr
stdhd: +0.09% psnr
Speedup of about 2-3%
Adrian Grange [Thu, 1 Aug 2013 21:57:46 +0000 (14:57 -0700)]
Simplify & fix potential bug in rd_pick_partition
Different partitionings were not being evaluated against
best_rd and there were unnecessary calls to RDCOST. This
could have resulted in a non-optimal partioning being
selected.
I simplified the variables used to track the rate,
distortion and RD values throughout the function.
Jingning Han [Wed, 7 Aug 2013 22:22:51 +0000 (15:22 -0700)]
Use low precision 32x32fdct for encodemb in speed1
The low precision 32x32 fdct has all the intermediate steps within
16-bit depth, hence allowing faster SSE2 implementation, at the
expense of larger round-trip error. It was used in the rate-distortion
optimization search loop only.
Using the low precision version, in replace of the high precision one,
affects the compression performance by about 0.7% (derf, stdhd) at
speed 0. For speed 1, it makes derf set down by only 0.017%.
Dmitry Kovalev [Wed, 7 Aug 2013 22:33:17 +0000 (15:33 -0700)]
Adding ss_size_lookup table.
Removing the old one bsize_from_dim_lookup. Now we have a way to determine
block size for plane using its subsampling values (ss_size_lookup). And
then we can find the number of pixels in the block (num_pels_log2_lookup).
Dmitry Kovalev [Tue, 6 Aug 2013 22:43:56 +0000 (15:43 -0700)]
Using only one scale function in scale_factors struct.
Functions scale_mv_q4 and scale_mv_q3_to_q4 were almost identical except
q3->q4 conversion in scale_mv_q3_to_q4. Now q3->q4 conversion happens
directly in vp9_build_inter_predictor.
Also adding useful constants: SUBPEL_BITS and SUBPEL_MASK.
Deb Mukherjee [Mon, 22 Jul 2013 21:47:57 +0000 (14:47 -0700)]
Flexible support for various pattern searches
Adds a few pattern searches to achieve various tradeoffs
between motion estimation complexity and performance.
The search framework is unified across these searches so that a
common pattern search function is used for all. Besides it will
be easier to experiment with various patterns or combinations
thereof at different scales in the future.
The new pattern search is multi-scale and is capable of using
different patterns at different scales.
The new hex search uses 8 points at the smallest scale
and 6 points at other scales.
Two other pattern searches - big-diamond and square are
also added. Big diamond uses 4 points at the smallest scale and
8 points in diamond shape at the larger scales.
Square is very similar conceptually to the default n-step search
but is somewhat faster since it keeps only one survivor across
all scales.
Move fdct32x32 SSE2 implementation in separate file.
This is in preparation for the SSE2 version of the high-precision
32x32 forward DCT which will share a lot of code with the existing
low precision version used for rate-distortion search.
Deb Mukherjee [Wed, 31 Jul 2013 16:33:58 +0000 (09:33 -0700)]
Add variance based mode/skipping
Adds a speed feature to skip all intra modes other than
DC_PRED if the source variance is small. This feature is
made part of speed 1 and up.
Results on derf300: psnr -0.07%, speedup about 1-2%
Also uses the source variance to fine-tune the early
termination criteria when FLAG_EARLY_TERMINATE is on.
This feature is made part of speed 2 and up.
Results on derf300: psnr -0.52%, speedup about 5-7%
Dmitry Kovalev [Mon, 5 Aug 2013 19:26:15 +0000 (12:26 -0700)]
Changing the order switchable filter enum constants.
This changeset allows to remove vp9_switchable_interp and
vp9_switchable_interp_map arrays and make code much clear. Actually we
still have to use these mapping but only inside read_interp_filter_type and
write_interp_filter_type functions.
Jim Bankoski [Mon, 5 Aug 2013 19:07:30 +0000 (12:07 -0700)]
Begin to restrict x86inc.asm usage
Chromium does not support 32bit builds for Mac which use x86inc.asm.
Make the files which include it work if 64bit or not PIC enabled
starting with vp9_copy_sse2.asm
Mans Rullgard [Tue, 30 Jul 2013 17:11:06 +0000 (18:11 +0100)]
vp9: neon: optimise loads in horiz convolve functions
Loading to single lanes in multiple registers is expensive since
it requires a read and write of each register which saturates
the register file access. Loading to single registers followed
by a separate transpose reduces this pressure.