Yunqing Wang [Mon, 25 Jan 2016 20:14:18 +0000 (12:14 -0800)]
Implement a tile copying method in large-scale tile coding
A tile copy mode is introduced, which allows a tile to use
another tile's coded data directly at bitstream level. This
largely reduces the bit rate in this use case. Our tests
showed that 10% - 20% bit rate reduction was achieved.
Yunqing Wang [Fri, 22 Jan 2016 01:04:59 +0000 (17:04 -0800)]
Make set_reference control API work in VP9
1. Made VP8_SET_REFERENCE vpx_codec_control API work in VP9 based
on Ryan Overbeck's patch. (Thanks.)
2. Added vp9cx_set_ref example, which demonstrates how to set
VP8_SET_REFERENCE vpx_codec_control in both the encoder and the
decoder. If the reference is set only in the encoder, an
encoder/decoder mismatch is observed.
3. Also updated test/cx_set_ref.sh.
Brandon Young [Thu, 14 Jan 2016 00:32:34 +0000 (16:32 -0800)]
Changes to CONFIG_NEW_QUANT experiment.
Added dq_off_index attribute to mbmi to allow for switching between
dequantization modes.
Reduced number of different dequantization modes from 5 to 3.
Changed dequant_val_nuq to allow for 3 dequant levels instead of 1.
Fixed lint errors
Yunqing Wang [Mon, 11 Jan 2016 19:27:35 +0000 (11:27 -0800)]
Adaptively determine the number of bytes used for tile-data size transmission
In large-scale tile coding, when the number of tiles is large and the tile
size is small, using a fixed number of bytes in the tile header to store
tile-data size information as done in current VP9 codec would bring high
overhead for each tile. This patch implemented 2 ways to lower that overhead
and adaptively determine the number of bytes needed for tile-data size
transmission.
The test on a test clip having the tile size of 64x64 showed that the number
of bytes used for storing tile-data size was reduced from 4 to 1, which
substantially improved the compression ratio in large-scale tile coding.
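One way to picture the adaptive sizing described above is the sketch below. It is not the patch itself: the commit does not spell out its two methods, so this assumes the simplest variant, where the byte count is derived from the largest tile in the frame and signalled once. The function names, the one-byte width field, and the big-endian byte order are all assumptions made for illustration.

```python
def tile_size_bytes(tile_sizes):
    """Return the smallest byte count (1..4) that can hold the largest tile-data size."""
    largest = max(tile_sizes, default=0)
    for n in range(1, 5):
        if largest < (1 << (8 * n)):
            return n
    raise ValueError("tile-data size does not fit in 4 bytes")


def pack_tile_sizes(tile_sizes):
    """Write a hypothetical width field, then each size using that many bytes."""
    n = tile_size_bytes(tile_sizes)
    out = bytearray([n - 1])  # assumed frame-level field; 2 bits would be enough
    for size in tile_sizes:
        out += size.to_bytes(n, "big")  # byte order is an assumption
    return bytes(out)


def unpack_tile_sizes(data, count):
    """Inverse of pack_tile_sizes for `count` tiles."""
    n = data[0] + 1
    return [int.from_bytes(data[1 + i * n:1 + (i + 1) * n], "big")
            for i in range(count)]
```

With small tiles whose coded size stays below 256 bytes, each size field shrinks from 4 bytes to 1, which matches the reduction reported for the 64x64 test clip.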
A small gain (0.1 - 0.2%) with this experiment on derflr/hevcmr.
The DST2 can be implemented very efficiently using sign flipping
of odd indexed inputs, followed by DCT, followed by reversal of
the output. This is how it is implemented in this patch.
SIMD optimization is pending.
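The identity behind this can be checked numerically. The sketch below uses unnormalized, floating-point DCT-II and DST-II definitions, which are assumptions for illustration; the codec itself uses scaled fixed-point transforms. The function names are hypothetical.

```python
import math


def dct2(x):
    """Naive unnormalized DCT-II."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]


def dst2_direct(x):
    """Naive unnormalized DST-II, used only as a reference."""
    n = len(x)
    return [sum(x[i] * math.sin(math.pi / n * (i + 0.5) * (k + 1)) for i in range(n))
            for k in range(n)]


def dst2_via_dct(x):
    """DST-II computed as: flip sign of odd-indexed inputs, DCT-II, reverse output."""
    flipped = [-v if i % 2 else v for i, v in enumerate(x)]
    return dct2(flipped)[::-1]
```

The two DST functions agree up to rounding because cos(pi*(i + 1/2) - t) = (-1)^i * sin(t), so DCT output N-1-k of the sign-flipped input equals DST output k of the original input.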
Yunqing Wang [Wed, 25 Nov 2015 23:28:03 +0000 (15:28 -0800)]
Reduce the memset call in tile decoding
When error resilient mode is on, the decoder resets the mode info structure
to zero once per frame. This makes the decoder about 10x slower if we decode
a single tile at a time. This patch resolves the issue by calling memset only
on the mode info of the tiles being decoded. Currently, to decode a whole
frame, tile decoding is less than 2x slower than frame decoding.
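The idea of clearing only the decoded tile can be sketched as follows. This is a simplified model, not the decoder's code: the mode info array is represented as a flat bytearray with one byte per entry, and the function name and tile tuple layout are assumptions.

```python
def reset_tile_mode_info(mi, mi_cols, tile):
    """Zero only the mode-info entries covered by one tile.

    mi      -- flat bytearray, one byte per mode-info entry (a simplification)
    mi_cols -- number of mode-info columns in the frame
    tile    -- (row_start, row_end, col_start, col_end), end-exclusive
    """
    row_start, row_end, col_start, col_end = tile
    width = col_end - col_start
    for row in range(row_start, row_end):
        start = row * mi_cols + col_start
        mi[start:start + width] = bytes(width)  # per-row equivalent of memset(..., 0, width)
```

The cost of the reset then scales with the tile area rather than the frame area, which is what removes the per-tile overhead when tiles are decoded one at a time.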
This commit also fixes a bug where FLIPADST transforms, when combined
with a DST (that is, FLIPADST_DST and DST_FLIPADST), did not actually do
a flipped transform but a straight ADST instead. This was because the C
implementation that they fell back on did not implement flipping. This is
now fixed, and FLIPADST_DST and DST_FLIPADST do what they are
supposed to do.
There are 3 functions in the SR_MODE experiment that should be updated,
but given that the build of SR_MODE is broken at the upstream tip of
nextgen, I could not test these, so I have put in assertions and FIXME
notes at the problematic places.
Zoe Liu [Wed, 16 Sep 2015 01:58:36 +0000 (18:58 -0700)]
Added a 3rd reference frame LAST3_FRAME
Under experiment CONFIG_LAST3_REF, which can only be turned on when
the experiment of CONFIG_MULTI_REF is on, i.e. LAST3_FRAME can only
be used when LAST2_FRAME is used. CONFIG_LAST3_REF would most likely
be combined with CONFIG_MULTI_REF once the performance improvement
is further confirmed.
On the testset of derflr, using Average PSNR metrics, with HighBitDepth
(HBD) on:
(1) LAST2 HBD obtained +0.579% against base HBD;
(2) LAST2 + LAST3 HBD obtained +0.591% against LAST2 HBD;
(3) LAST2 + LAST3 HBD obtained +1.173% against base HBD.
Zoe Liu [Tue, 15 Sep 2015 21:12:12 +0000 (14:12 -0700)]
Fixed a bug on the number of MAX_MODES in baseline
All the values of MAX_MODES had been changed assuming
CONFIG_MULTI_REF. The correct values are now used both with and
without the MULTI_REF experiment enabled.
Zoe Liu [Fri, 11 Sep 2015 21:57:31 +0000 (14:57 -0700)]
Added MACRO for reference frame encoding
This CL introduces a few macros plus code cleaning on the encoding of
the reference frames. Coding performance remains unchanged.
For the encoding of either the compound reference or the single reference
case, since each bit has different contexts, the tree structure may not
be applied to treat the combined bits as one symbol. It is possible we may
explore the sharing of the same context for all the bits to introduce
the use of tree structure for the next step.
More tuning of the reference frame context design and default
probs is being conducted. This version is not guaranteed to
work with other experiments in nextgen. A separate CL will address
compatibility with all other experiments.
Shunyao Li [Wed, 12 Aug 2015 01:29:41 +0000 (18:29 -0700)]
Super resolution mode (+CONFIG_SR_MODE)
CONFIG_SR_MODE=1, enable SR mode
USE_POST_F=1, enable SR post filter
SR_USE_MULTI_F=1, enable SR post filter family
Not compatible with other experiments yet
Disables some test vector tests when Vp8/Vp9 decoders are disabled
in configuration. Also moves some macros to the vpx level in
line with recent refactoring on the master branch.
Framework for alternate transforms for inter 32x32 and larger based
on dwt-dct hybrid is implemented.
Further experiments are to be conducted with different
variations of hybrid dct/dwt or plain dwt, as well as super-resolution
mode.
Shunyao Li [Mon, 29 Jun 2015 18:54:17 +0000 (11:54 -0700)]
Optimize bilateral filter to improve speed
Optimization of bilateral filter:
1) Pre-calculate the bilateral filters at all the
levels at the initialization.
2) Convert 1D matrix to 2D matrix to avoid too many
multiplications in the bilateral filter loop.
3) Fix a bug in "loop_bilateral_filter_highbd".
The right-shifted range index can be larger than 255.
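Points 1) and 3) can be illustrated with a small lookup-table sketch. It assumes a Gaussian range kernel and a 256-entry table, and it clamps the shifted index as a defensive measure; the actual kernel, table layout, and fix in the patch are not described in this log, so every name and parameter below is hypothetical.

```python
import math


def build_range_lut(sigma_r, size=256):
    """Precompute range weights once, instead of calling exp() per pixel."""
    return [math.exp(-(d * d) / (2.0 * sigma_r * sigma_r)) for d in range(size)]


def range_weight(lut, a, b, shift):
    """Look up the weight for pixel values a and b.

    The absolute difference is right-shifted to map high-bit-depth values
    onto the table, then clamped so the index can never run past the end.
    """
    index = abs(a - b) >> shift
    return lut[min(index, len(lut) - 1)]
```

Without the clamp, a shift that is too small for the bit depth would produce an index above 255 and read outside the table.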