Adrian Grange [Tue, 8 Jan 2013 22:14:01 +0000 (14:14 -0800)]
New prediction filter
This patch removes the old pred-filter experiment and replaces it
with one that is implemented using the switchable filter framework.
If the pred-filter experiment is enabled, three interopolation
filters are tested during mode selection; the standard 8-tap
interpolation filter, a sharp 8-tap filter and a (new) 8-tap
smoothing filter.
The 6-tap filter code has been preserved for now and if the
enable-6tap experiment is enabled (in addition to the pred-filter
experiment) the original 6-tap filter replaces the new 8-tap smooth
filter in the switchable mode.
The new experiment applies the prediction filter in cases of a
fractional-pel motion vector. Future patches will apply the filter
where the mv is pel-aligned and also to intra predicted blocks.
Deb Mukherjee [Tue, 8 Jan 2013 20:18:16 +0000 (12:18 -0800)]
Adds 64x64 hybrid dct/dwt transform
This is to add to the 64x64 transform experiment as an alternative to
a 64x64 DCT.
Two levels of wavelet decomposition is used on a 64x64 block, followed
by 16x16 DCT on the four lowest subbands. The highest three subbands
are left untransformed after the first level DWT.
Yaowu Xu [Wed, 19 Dec 2012 19:34:49 +0000 (11:34 -0800)]
minor loop filter refactoring and cleanup
This commit did a couple of minor cleanup/refactoring to prepare for
futher loop filter experiments. It merged y_only version of loop filter
function into the regular one, which makes sure that same logic is used
for functions for picking level and for actual loop filtering.
Paul Wilkins [Thu, 3 Jan 2013 15:14:36 +0000 (15:14 +0000)]
Further change to mv reference search.
This experimental change reorders the search so
that all possible references that match the target
reference frame are tested first and these in order
of distance from the current block. These will usually
be the highest scoring candidates.
If we do not find enough good candidates this way
we try non matching cases. These will usually be lower
scoring candidates.
The change in order together with breakouts when
we have found enough candidates should reduce
the computational cost and especially reduce the number
of sort operations.
Quality Results:
Std Hd +0.228%, Hd +0.074%, YT +0.046%, derf +0.137%
This effect is probably due to the fact that more distant
weak candidates are now less likely to get "promoted" over
near candidates even if they are repeated.
Adrian Grange [Thu, 20 Dec 2012 22:56:19 +0000 (14:56 -0800)]
New interpolation filter selection algorithm
Old Scheme:
When SWITCHABLE filter selection is enabled the encoder
evaluates the use of each interpolation filter type and
selects the best one to use at the MB level. A frame-
level flag can be set to force the use of a particular
filter type for all MBs in a frame if it is more efficient
to encode that way. The logic here involved a Q dependent
threshold that assumed that the second 8-tap filter was
a high-pass filter. However, this requires a trip around
the recode loop. If the frame-level flag indicates use
of a particular filter, the other filters are not
evaluated in the pick_mode loop.
New Scheme:
Each filter type is evaluated at the MB level and a record
of the best filter is kept, irrespective of what filter
is signaled at the frame-level. Once all MBs have been
encoded, a decision is made as to what frame-level mode
to set for the *next* frame. If one filter is used by 80%
or more of the MBs, then this filter is forced since it
is assumed that this will be more efficient if the
next frame has similar characteristics. i.e. there is a
one-frame lag between measuring the filter selection and
setting the frame-level mode to use.
Yaowu Xu [Thu, 3 Jan 2013 16:00:00 +0000 (08:00 -0800)]
Merge cost_coeffs_2x2() into cost_coeffs()
Remove special case function cost_coeffs_2x2() and change function
cost_coeffs() to handle 2nd order haar block as it is handle all
other block types already.
Yunqing Wang [Fri, 28 Dec 2012 00:04:44 +0000 (16:04 -0800)]
Skip finding best ref_mvs when the mode is ZEROMV
Read mode before calling vp9_find_best_ref_mvs(). If the mode is
ZEROMV, the best ref_mvs are not needed. Then, we can skip calling
vp9_find_best_ref_mvs().
Yunqing Wang [Thu, 27 Dec 2012 21:48:17 +0000 (13:48 -0800)]
Switch the order of calculating 2-D inverse transform
The 2-D inverse transform X = M1*Z*Transposed_M2 was calculated
in 2 steps from left to right:
1. Vertical transform: Y = M1*Z
2. Horizontal transform: X= Y*Transposed_M2
In SIMD, a transpose is needed in vertical transform.
Here, switched the calculation order to do it from right to left.
In this way, we could eliminate that transpose by writing the
intermediate results out to their transposed positions.
Deb Mukherjee [Tue, 27 Nov 2012 23:51:06 +0000 (15:51 -0800)]
New previous coef context experiment
Adds an experiment to derive the previous context of a coefficient
not just from the previous coefficient in the scan order but from a
combination of several neighboring coefficients previously encountered
in scan order. A precomputed table of neighbors for each location
for each scan type and block size is used. Currently 5 neighbors are
used.
Results are about 0.2% positive using a strategy where the max coef
magnitude from the 5 neigbors is used to derive the context.
John Koleszar [Wed, 19 Dec 2012 21:44:32 +0000 (13:44 -0800)]
make: fix dependency generation
Remove an extra level of escaping around the $@ variable to get valid output.
Prior to this change, modifying header files did not trigger a rebuild of
sources dependent on them.
John Koleszar [Fri, 16 Nov 2012 18:48:23 +0000 (10:48 -0800)]
Use boolcoder API instead of inlining
This patch changes the token packing to call the bool encoder API rather
than inlining it into the token packing function, and similarly removes
a special get_signed case from the detokenizer. This allows easier
experimentation with changing the bool coder as a whole.
Ronald S. Bultje [Tue, 18 Dec 2012 23:31:19 +0000 (15:31 -0800)]
Use standard integer types for pixel values and coefficients.
For coefficients, use int16_t (instead of short); for pixel values in
16-bit intermediates, use uint16_t (instead of unsigned short); for all
others, use uint8_t (instead of unsigned char).
Yaowu Xu [Tue, 18 Dec 2012 22:36:20 +0000 (14:36 -0800)]
Changed MAX_PSNR to 100
The MAX_PSNR was used to assign a "psnr" number when the mse is close
to zero. The direct assignment is used to prevent divide by zero in
computation. Changing it from 60 to 100 to be consistent against what
is being done in VP9
Paul Wilkins [Fri, 14 Dec 2012 17:49:46 +0000 (17:49 +0000)]
Problem of over smoothing with intra modes.
In some cases intra modes in inter frames give
an over smoothed appearance. Especially with
noisy but flat content.
Also in some cases there were problems with key
frame sizing again with very flat but noisy content.
These are temporary changes to help alleviate the
visual problems but will almost certainly hurt metric
results especially at the very low data rate end.
Deb Mukherjee [Wed, 12 Dec 2012 01:06:35 +0000 (17:06 -0800)]
Further improvements on the hybrid dwt/dct expt
Modifies the scanning pattern and uses a floating point 16x16
dct implementation for now to handle scaling better.
Also experiments are in progress with 2/6 and 9/7 wavelets.
Results have improved to within ~0.25% of 32x32 dct for std-hd
and about 0.03% for derf. This difference can probably be bridged by
re-optimizing the entropy stats for these transforms. Currently
the stats used are common between 32x32 dct and dwt/dct.
Experiments are in progress with various scan pattern - wavelet
combinations.
Ideally the subbands should be tokenized separately, and an
experiment will be condcuted next on that.
Ronald S. Bultje [Wed, 12 Dec 2012 18:25:58 +0000 (10:25 -0800)]
New default coefficient/band probabilities.
Gives 0.5-0.6% improvement on derf and stdhd, and 1.1% on hd. The
old tables basically derive from times that we had only 4x4 or
only 4x4 and 8x8 DCTs.
Note that some values are filled with 128, because e.g. ADST ever
only occurs as Y-with-DC, as does 32x32; 16x16 ever only occurs
as Y-with-DC or as UV (as complement of 32x32 Y); and 8x8 Y2 ever
only has 4 coefficients max. If preferred, I can add values of
other tables in their place (e.g. use 4x4 2nd order high-frequency
probabilities for 8x8 2nd order), so that they make at least some
sense if we ever implement a larger 2nd order transform for the
8x8 DCT (etc.), please let me know
Scott LaVarnway [Wed, 12 Dec 2012 23:49:39 +0000 (15:49 -0800)]
Improved vp9_ihtllm_c
As suggested by Yaowu, we can use eob to reduce the complexity
of the vp9_ihtllm_c function. For the 1080p test clip used, the decoder
performance improved by 17%.
Ronald S. Bultje [Mon, 10 Dec 2012 20:09:07 +0000 (12:09 -0800)]
Consistently use get_prob(), clip_prob() and newly added clip_pixel().
Add a function clip_pixel() to clip a pixel value to the [0,255] range
of allowed values, and use this where-ever appropriate (e.g. prediction,
reconstruction). Likewise, consistently use the recently added function
clip_prob(), which calculates a binary probability in the [1,255] range.
If possible, try to use get_prob() or its sister get_binary_prob() to
calculate binary probabilities, for consistency.
Since in some places, this means that binary probability calculations
are changed (we use {255,256}*count0/(total) in a range of places,
and all of these are now changed to use 256*count0+(total>>1)/total),
this changes the encoding result, so this patch warrants some extensive
testing.
John Koleszar [Mon, 10 Dec 2012 20:07:59 +0000 (12:07 -0800)]
configure: add --enable-external-build support
First attempt at avoiding all the compile-time environment detection for
cases where you can generate the environments statically, as when the
real build is being performed by another build system.