Geza Lore [Tue, 3 Nov 2015 11:10:20 +0000 (11:10 +0000)]
Eliminate copying for FLIPADST in fwd transforms.
This patch eliminates the copying of data when using FLIPADST forward
transforms, by incorporating the necessary data flipping into the
load_buffer_* functions of the SSE2 optimized forward transforms. The
load_buffer_* functions are normally inlined, so the overhead of copying
the data is removed and the overhead of flipping is minimized. Left to
right flipping is still not free, as the columns need to be shuffled in
registers.
To preserve identity between the C and SSE2 implementations, the
appropriate C implementations now also do the data flipping as part of
the transform, rather than relying on the caller for flipping the input.
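As a rough illustration of folding the flip into the load step, here is a
minimal scalar sketch. The helper name load_block_flipped and its
parameters are hypothetical and are not taken from the patch; the real
load_buffer_* functions operate on SSE2 registers.

```c
#include <stdint.h>

/* Hypothetical helper: copy a size x size block from `in` (with row
 * stride `stride`) into the packed buffer `out`, flipping it up-down
 * and/or left-right during the load so that no separate flipped copy of
 * the input has to be made beforehand. */
static void load_block_flipped(const int16_t *in, int stride, int size,
                               int flipud, int fliplr, int16_t *out) {
  for (int r = 0; r < size; ++r) {
    /* Up-down flip: read source rows in reverse order. */
    const int16_t *src = in + (flipud ? size - 1 - r : r) * stride;
    for (int c = 0; c < size; ++c) {
      /* Left-right flip: read source columns in reverse order. */
      out[r * size + c] = src[fliplr ? size - 1 - c : c];
    }
  }
}
```

In this sketch the up-down flip only changes which row is addressed,
while the left-right flip reorders elements within a row, which is the
part that maps to register shuffles in a SIMD version.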
Overall encode speedup is about 1.5-2% in my tests. Note that this
covers only the forward transforms; inverse transforms will come in a
later patch.
There are also a few code hygiene changes:
- Fixed some indents of switch statements.
- DCT_DCT transforms now always use vp10_fht* functions, which dispatch
to vpx_fdct* for DCT_DCT (some of them used to call vpx_fdct*
directly, some of them used to call vp10_fht*).
Geza Lore [Tue, 3 Nov 2015 13:53:32 +0000 (13:53 +0000)]
Fix transform tables in C implementations.
These tables were out of sync with the indexing enum since the
refactoring in commit 4f16f119 (change 303389), due to the removal
of the ext_tx_to_txtype lookup table. This patch just puts them
back in order.
Marco [Sun, 1 Nov 2015 22:40:05 +0000 (14:40 -0800)]
Move noise level estimate outside denoiser.
The source noise level estimate is also useful for
setting variance encoder parameters (variance thresholds,
qp-delta, mode selection, etc.), so allow it to be used even
when denoising is off.
Change-Id: I38952cd55b91f35e5db45bc8e6a20ef25069c464
--ext-refs: extended references - for multi-ref in nextgen
--ext-inter: extended inter - for new_inter/copy_mode in nextgen
--ext-interp: for new interpolation
Jingning Han [Fri, 16 Oct 2015 22:54:58 +0000 (15:54 -0700)]
Refactor loop filter mask
This commit refactors the loop filter selection process to support
filter masks based on variable transform block sizes. It disables the
multi-threaded loop filter implementation to simplify the experiments.
The speed impact on speed 0 encoding is negligible.
Jingning Han [Wed, 28 Oct 2015 18:34:45 +0000 (11:34 -0700)]
Account for variable txfm sizes in coeff token packing
This commit makes the coefficient token packing process account
for the variable transform block sizes supported in a single processing
block. It fixes an enc/dec mismatch issue when var-tx, ext-tx, and
misc-fixes experiments are all turned on.
Jingning Han [Tue, 27 Oct 2015 23:50:27 +0000 (16:50 -0700)]
Add tx_type counts in key frame
Properly update the transform type counts in key frame coding at the
decoder. This fixes an enc/dec mismatch issue when both ext-tx and
misc-fixes are turned on.
Peter de Rivaz [Tue, 27 Oct 2015 10:50:00 +0000 (10:50 +0000)]
Accumulate EXT_TX counts for multithread
EXT_TX introduces some new symbols to be decoded.
The encoder counts how many times these are used.
In multithreaded mode, the counts from the worker threads
need to be accumulated into the main thread.
This change means that VP10/VPxEncoderThreadTest now works
with more choices of cpu-used and number of passes.
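A minimal sketch of this kind of accumulation, under assumed names: the
SymbolCounts struct, the NUM_SYMBOLS size, and the accumulate_counts
function are all hypothetical and do not come from the codebase.

```c
#include <stddef.h>

#define NUM_SYMBOLS 4 /* hypothetical number of distinct symbols */

/* Hypothetical per-thread usage counts, one slot per symbol. */
typedef struct {
  unsigned int ext_tx[NUM_SYMBOLS];
} SymbolCounts;

/* Add every worker's counts into the main thread's totals. Intended to
 * run after the workers have finished, so no locking is shown here. */
static void accumulate_counts(SymbolCounts *total,
                              const SymbolCounts *workers,
                              size_t num_workers) {
  for (size_t w = 0; w < num_workers; ++w) {
    for (size_t s = 0; s < NUM_SYMBOLS; ++s) {
      total->ext_tx[s] += workers[w].ext_tx[s];
    }
  }
}
```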
Jingning Han [Mon, 26 Oct 2015 19:32:30 +0000 (12:32 -0700)]
Fix lossless coding
Use inter_block_yrd for rate-distortion optimization in lossless
coding. This fixes the transform coefficient buffer swap use case and
resolves the unit test failure related to lossless coding.
Jingning Han [Mon, 26 Oct 2015 18:09:55 +0000 (11:09 -0700)]
Make transform block partition scheme support use largest txfm setting
This commit properly resets the recursive transform block partition
array when the frame header selects the largest transform block size.
It fixes one of the unit test failures related to the use of a
frame-level fixed transform block size with the 440 color format.
Alex Converse [Wed, 14 Oct 2015 18:03:14 +0000 (11:03 -0700)]
palette: Replace rand() call with custom LCG.
The custom LCG is based on the POSIX recommended constants for a 16-bit
rand(). This implementation uses less computation than typical standard
library procedures which have been extended for 32-bit support, is
guaranteed to be reentrant, and identical everywhere.
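For reference, a sketch of such a generator using the multiplier and
increment from the sample rand() in the POSIX specification. The function
name lcg_rand16 and the caller-owned state pointer are illustrative
choices here and may not match the patch exactly.

```c
#include <stdint.h>

/* Linear congruential generator with the constants from the POSIX
 * sample rand(). The state is owned by the caller, so the function is
 * reentrant, and the fixed-width arithmetic gives the same sequence on
 * every platform. */
static unsigned int lcg_rand16(uint32_t *state) {
  /* Unsigned overflow wraps modulo 2^32, which is well defined in C. */
  *state = *state * 1103515245u + 12345u;
  /* Drop the low 16 bits and keep the next 15, as the POSIX sample does. */
  return (unsigned int)((*state / 65536u) % 32768u);
}
```

Two callers that start from the same seed get the same sequence, which
is the property that matters for identical results everywhere.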
base_frame_target is supposed to track the idealized bit
allocation based on error score and not the actual bits
allocated to each frame.
The clamping of this value based on the VBR min and max pct values
was causing a bug where in some cases the loop that adjusts the
active max quantizer for each GF group was running out of bits at
the end of a KF group. This caused a spike in Q and some ugly artifacts.
A second change makes sure that the calculation of the active
Q range for a group DOES, however, take account of clamping.
Jingning Han [Fri, 23 Oct 2015 21:27:21 +0000 (14:27 -0700)]
Properly handle non-420 color format in recursive transform scheme
This commit makes the recursive transform block partitioning properly
handle non-420 color formats. It resolves an enc/dec mismatch
issue in that setting when the var-tx experiment is turned on.
Jingning Han [Fri, 23 Oct 2015 00:25:00 +0000 (17:25 -0700)]
Use explicit block position in foreach_transformed_block
Add the row and column index to the argument list of unit functions
called by the foreach_transformed_block wrapper. This avoids the
repeated internal parsing according to the block index.
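The idea can be sketched as a visitor that receives its position
directly. All names below (block_visitor, foreach_block, VisitLog,
record_visit) are hypothetical and simplified; they are not the actual
foreach_transformed_block signature.

```c
/* Hypothetical callback type: the wrapper passes the row and column of
 * each block explicitly, so the callback does not have to derive them
 * from a linear block index. */
typedef void (*block_visitor)(int blk_row, int blk_col, void *arg);

/* Walk a rows x cols area in steps of `step` and call `visit` with the
 * position of each block. */
static void foreach_block(int rows, int cols, int step,
                          block_visitor visit, void *arg) {
  for (int r = 0; r < rows; r += step) {
    for (int c = 0; c < cols; c += step) {
      visit(r, c, arg);
    }
  }
}

/* Example callback state: number of visits and the last position seen. */
typedef struct {
  int count;
  int last_row;
  int last_col;
} VisitLog;

static void record_visit(int blk_row, int blk_col, void *arg) {
  VisitLog *log = (VisitLog *)arg;
  log->count++;
  log->last_row = blk_row;
  log->last_col = blk_col;
}
```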