Paul Wilkins [Thu, 11 Dec 2014 16:22:03 +0000 (16:22 +0000)]
Improve motion detection for low complexity regions.
Where there is very subtle motion, especially when combined
with low spatial complexity, the codec sometimes fails to quickly
pick up the ambient motion field.
Once it has been established though the field propagates well using
Nearest and Near MV.
This patch looks specifically at the case where the Nearest and Near
have not been established as non zero vectors and in this case
discounts the cost of searching for a new vector in the rd code.
This will almost certainly have some implications in terms of encode
speed but it should be possible to mitigate the impact in a subsequent
using first pass stats and the local spatial complexity.
Average results for test sets approximately neutral.
Frank Galligan [Thu, 13 Nov 2014 20:28:34 +0000 (12:28 -0800)]
Add support for setting byte alignment.
Add support for setting byte alignment on the Y, U, and V plane of the
reference buffers. The byte alignment must be a power of 2, from 32 to
1024. A value of 0 sets legacy alignment.
Jingning Han [Fri, 12 Dec 2014 01:17:53 +0000 (17:17 -0800)]
Fix PICK_MODE_CONTEXT index in non-RD coding mode
This commit fixes a bug in the PICK_MODE_CONTEXT index for
horizontal partition case. The compression performance change
is less than 0.01% level, since most blocks are selected to
use square block size in RTC coding mode.
Marco [Thu, 11 Dec 2014 23:21:17 +0000 (15:21 -0800)]
Allow for 4x4 prediction blocks for key frame, speed 6.
For key frame under variance source partition: 4x4 prediction blocks
may be selected when variance of 8x8 block is very high (threshold is set fairly high for now).
Testing on some RTC clips shows this helps to reduce some ringing artifacts on key frame.
Encoded key frame size increases about ~10%. Key frame PSNR increases about ~0.1-0.2dB.
Peter de Rivaz [Thu, 11 Dec 2014 15:54:23 +0000 (15:54 +0000)]
Corrected optimization of 8x8 DCT code
The 8x8 DCT uses a fast version whenever possible.
There was a mistake in the checking code which
meant sometimes the fast version was used when it
was not safe to do so.
Jingning Han [Thu, 11 Dec 2014 17:29:36 +0000 (09:29 -0800)]
Refactor choose_partitioning computing scheme
This commit refactors the choose_partitioning function. It removes
redundant memset calls and makes the encoder to calculate
variance value per block only when it is needed. It reduces the
average runtime cost of choose_partitioning by 60%. Overall it
reduces speed -6 runtime by 2-5%.
JackyChen [Tue, 2 Dec 2014 20:14:47 +0000 (12:14 -0800)]
Multiframe Quality Enhancement(MFQE) in VP9.
It is the first version of MFQE in VP9. There are a few TODOs included
in this version.
Usage: Add flag --enable-vp9-postproc to config the project.
In decoder, use flag --mfqe in the command line to enable
MFQE in postproc.
Note: Need to have key frame with low quality to see the effect of this
new patch. In my experiment, I fixed the qindex to 200 in key frame.
Jingning Han [Wed, 10 Dec 2014 02:01:17 +0000 (18:01 -0800)]
Use use_prev_frame_mvs flag for ref mv search branch
Replace error_resilient flag with use_prev_frame_mvs in
vp9_pick_inter_mode reference motion vector search selection.
This effectively turns off the simplified ref mv search in the
settings of frame resizing, even if error-resilient mode is off.
hkuang [Tue, 9 Dec 2014 22:32:48 +0000 (14:32 -0800)]
Fix clang ioc warning due to NULL src_mi pointer.
The warning only happens in VP9 encoder's first pass due to src_mi
is not set up yet. But it will not fail the encoder as left_mi and
above_mi are not used in the first_pass and they will be set up again
in the second pass.
Jingning Han [Tue, 9 Dec 2014 19:31:45 +0000 (11:31 -0800)]
Make RTC coding flow support sub8x8 in key frame coding
This commit enables the use of sub8x8 blocks in RTC key frame
encoding. It requires the block size to be preset and will decide
the coding mode and encode the bit-stream.
Paul Wilkins [Thu, 27 Nov 2014 10:50:56 +0000 (10:50 +0000)]
Substantial restructuring of AQ mode 2.
The restructure moves the decision into the rd pick
modes loop and makes a decision based at the 16x16
block level instead of only the 64x64 level.
This gives finer granularity and better visual results
on the clips I have tested. Metrics results are worse
than the old AQ2 especially for PSNR and this mode
now falls between AQ0 and AQ1 in terms of visual
impact and metrics results.
Further tuning of this to follow.
It should be noted that if there are multiple iterations
of the recode loop the segment for a MB could change
in each loop if the previous loop causes a change in the
complexity / variance bin of the block. Also where a block
gets a delta Q this will alter the rd multiplier for this block
in subsequent recode iterations and frames where the
segmentation is applied.
levytamar82 [Fri, 5 Dec 2014 18:14:33 +0000 (11:14 -0700)]
SSSE3 Optimization for Atom processors using new instruction selection and ordering
The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction which has a 3 cycle latency and slows execution when done in blocks of 5 or more on Atom processors.
By replacing the PSHUFB instructions with other more efficient single cycle instructions (PUNPCKLBW + PUNPCHBW + PALIGNR) performance can be improved.
In the original code, the PSHUBF uses every byte and is consecutively copied.
This is done more efficiently by PUNPCKLBW and PUNPCHBW, using PALIGNR to concatenate the intermediate result and then shift right the next consecutive 16 bytes for the final result.
This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors.
There was no observed performance impact on Core processors (expected).
Jim Bankoski [Sun, 7 Dec 2014 19:28:51 +0000 (11:28 -0800)]
Make the decoder Cfg available to encoder tests..
Adds decoder config as a changeable parameter to unit tests, and
changes end to end test to use commonly used parameters to enable
base test of tiles encoding and frame parallel decoding.