Deb Mukherjee [Sat, 3 Aug 2013 00:15:38 +0000 (17:15 -0700)]
Speed feature to skip split partition based on var
Adds a speed feature to disable split partition search based on a
given threshold on the source variance. A tighter threshold derived
from the threshold provided is used to also disable horizontal and
vertical partitions.
Results on stdhdraw250:
threshold = 32, psnr = -0.18%, speedup is somewhat more than derf
because of a larger number of smoother blocks at higher resolution.
Based on these results, a threshold of 32 is chosen for speed 1,
and a threshold of 64 is chosen for speeds 2 and above.
Jingning Han [Wed, 14 Aug 2013 01:58:21 +0000 (18:58 -0700)]
Unify luma and chroma rd-cost estimation
This commit unifies the rate-distortion cost calculation process of
luma and chroma components. It allows early termination to be enabled
later in the rd search loop of chroma components, in consistent with
luma pixels.
James Zern [Thu, 15 Aug 2013 01:28:42 +0000 (18:28 -0700)]
vpxenc: open output file after setting pass #
write_ivf_file_header would incorrectly skip writing the file header in
the 2nd pass, causing the initial frame header to be overwritten on
close potential causing an overly large frame header to be read and a
crash.
most likely broken since: 9e50ed7 vpxenc: initial implementation of multistream support
Dmitry Kovalev [Wed, 14 Aug 2013 18:39:31 +0000 (11:39 -0700)]
foreach_transformed_block_in_plane cleanup, explicit tx_size var.
Making foreach_transformed_block_in_plane more clear (it's not finished
yet). Using explicit tx_size variable consistently instead of
(ss_txfrm_size / 2) or (ss_txfrm_size >> 1) expression.
Paul Wilkins [Tue, 13 Aug 2013 16:47:40 +0000 (17:47 +0100)]
Renaming in MB_MODE_INFO
The macro block mode info context originally contained an
entry for each 16x16 macroblock. In VP9 each entry refers
to an 8x8 region not a macro block, so the naming is misleading.
This first stage clean up changes the names of 3 entries in the
structure to remove the mb_ prefix.
TODO clean up the nomenclature more widely in respect of
mbmi and bmi.
Jingning Han [Wed, 7 Aug 2013 21:45:37 +0000 (14:45 -0700)]
SSE2 high precision 32x32 forward DCT
Enable SSE2 implementation of high precision 32x32 forward DCT. The
intermediate stacks are of 32-bits. The run-time goes down from
32126 cycles to 13442 cycles.
Dmitry Kovalev [Mon, 12 Aug 2013 20:54:13 +0000 (13:54 -0700)]
Removing foreach_predicted_block_uv function.
Adding function build_inter_predictors_for_planes to build inter
predictors for specified planes. This function allows to remove
condition "#if CONFIG_ALPHA" and use MAX_MB_PLANE for general case.
Renaming 'which_mv' local var to 'ref', and 'weight' argument to 'ref'.
Dmitry Kovalev [Fri, 9 Aug 2013 21:41:51 +0000 (14:41 -0700)]
Moving loopfilter struct to VP9_COMMON.
Loop filter configuration doesn't belong to macroblock, so moving it from
MACROBLOCKD to VP9_COMMON. Also moving the declaration of loopfilter struct
from vp9_blockd.h to vp9_loopfilter.h.
Dmitry Kovalev [Thu, 8 Aug 2013 21:52:39 +0000 (14:52 -0700)]
General code cleanup.
Removing redundant parenthesis and curly braces. Combining declarations
with initializations. Adding useful intermediate variables instead of
recalculating expressions every time.
Yaowu Xu [Fri, 9 Aug 2013 01:25:03 +0000 (18:25 -0700)]
fix unit test failure on win32 vs2008 build
The mix use of double type and simd code caused invalid values stored
in double variables, further caused unit tests to fail. The failures
were only observed on x86-win32-vs9 build with vs2008.
Deb Mukherjee [Thu, 8 Aug 2013 00:01:43 +0000 (17:01 -0700)]
Adds a new subpel motion function
Adds a new subpel motion estimation function that uses a 2-level
tree-structured decision tree to eliminate redundant computations.
It searches fewer points than iterative search (which can search
the same point multiple times) but has the same quality roughly.
This is made the default setting at speeds 0 and 1, while at
speed 2 and above only a 1-level search is used.
Also includes various cleanups for consistency and redundancy removal.
Results:
derf: +0.012% psnr
stdhd: +0.09% psnr
Speedup of about 2-3%
Adrian Grange [Thu, 1 Aug 2013 21:57:46 +0000 (14:57 -0700)]
Simplify & fix potential bug in rd_pick_partition
Different partitionings were not being evaluated against
best_rd and there were unnecessary calls to RDCOST. This
could have resulted in a non-optimal partioning being
selected.
I simplified the variables used to track the rate,
distortion and RD values throughout the function.
Jingning Han [Wed, 7 Aug 2013 22:22:51 +0000 (15:22 -0700)]
Use low precision 32x32fdct for encodemb in speed1
The low precision 32x32 fdct has all the intermediate steps within
16-bit depth, hence allowing faster SSE2 implementation, at the
expense of larger round-trip error. It was used in the rate-distortion
optimization search loop only.
Using the low precision version, in replace of the high precision one,
affects the compression performance by about 0.7% (derf, stdhd) at
speed 0. For speed 1, it makes derf set down by only 0.017%.
Dmitry Kovalev [Wed, 7 Aug 2013 22:33:17 +0000 (15:33 -0700)]
Adding ss_size_lookup table.
Removing the old one bsize_from_dim_lookup. Now we have a way to determine
block size for plane using its subsampling values (ss_size_lookup). And
then we can find the number of pixels in the block (num_pels_log2_lookup).