Changes to the breakout behavior for partition selection.
The biggest impact is on speed 0 where encode speed in
some cases more than doubles with typically less than 1%
impact on quality.
Speed 0 encode speed impact examples
Animation test clip: +128%
Park Joy: +59%
Old town Cross: + 109%
This change (in a new config experiment: universal_hp) removes the
bitstream parsing dependency of the HP MV bit on the ref MV to be
coded. It also cleans up clearing of the HP bit in near/nearestMV,
since HP is always on if it's set in the frame header.
This admittedly doesn't clean up the crap that could be cleaned up,
but that's mostly because I think this needs some careful review;
not so much for coding style, but more from hardware people and from
the codec team on what we/you want. It would also be nice to get some
actual numbers on the real quality impact of this change. If, for
example, hardware people come up and tell us they don't actually care
anymore, we should probably just this code as-is and do nothing (i.e.
discard this patch).
vp10: remove clamp_mv2() call from vp10_find_best_ref_mvs().
This actually has no effect whatsoever, since the input MVs themselves
are clamped by clamp_mv_ref() already, which is significantly more
restrictive in its bounds.
Geza Lore [Thu, 8 Oct 2015 14:44:49 +0000 (15:44 +0100)]
Optimization of 8bit block error for high bitdepth
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.
jackychen [Wed, 7 Oct 2015 17:44:04 +0000 (10:44 -0700)]
VP9 denoiser bug-fix: artifact caused by false buffer swap.
The artifact occurs periodically when VP9 denoiser is on and
refresh_golden_frame happen. When refresh_golden_frame happen,
we should copy the frame buffer instead of swapping the pointers.
James Zern [Sat, 26 Sep 2015 03:43:04 +0000 (20:43 -0700)]
vp9/tile_worker_hook: add multiple tile decoding
this reduces the number of synchronizations in decode_tiles_mt() and
improves overall performance when the number of threads is less than the
number of tiles
The serial decode check is too strict for tile-threaded decoding as
there is no guarantee on the decode order nor which specific error
will take precedence. Currently a tile-level error is not forwarded so
the frame will simply be marked corrupt.
Marco [Thu, 1 Oct 2015 22:46:06 +0000 (15:46 -0700)]
Add first_spatial_layer_to_encode to SVC.
Use the existing VP9_SET_SVC control to set the
first spatial layer to encode.
Since we loop over all spatial layers inside the encoder, the
setting of spatial_layer_id via VP9_SET_SVC has no relevance.
Use it instead to set the first_spatial_layer_to_encode,
which allows an application to skip encoding lower layer(s).
Julia Robson [Tue, 6 Oct 2015 11:11:14 +0000 (12:11 +0100)]
SSSE3 optimisation for quantize in high bit depth
When configured with high bit detpth enabled, the 8bit quantize
function stopped using optimised code. This made 8bit content
decode slowly. This commit re-enables the SSSE3 optimisations.
We have historically added new bits to cat6 whenever we added a new
transform size (or bitdepth, for that matter). However, we have
always coded these new bits regardless of the actual transform size,
which means that for smaller transforms, we code bits that cannot
possibly be set. The coding (quality) impact of this is negligible,
but the bigger issue is that this allows creating bitstreams with
coefficient values that are nonsensible and can cause int overflows,
which then de facto become part of the bitstream spec. By not coding
these bits, we remove this possibility.
See issue 1051. 6 bits is fairly arbitrary but at least allows writing
delta Q values that are fairly normal in other codecs. I can extend to
8 if people want full range, although I personally don't have any need
for that.
Julia Robson [Fri, 2 Oct 2015 09:20:06 +0000 (10:20 +0100)]
SSE2 optimisation for quantize in high bit depth
When configured with high bit detpth enabled, the 8bit quantize
function stopped using optimised code. This made 8bit content
decode slowly. This commit re-enables the SSE2 optimisation
(but not the SSSE3 optimisation).
Marco [Fri, 2 Oct 2015 00:31:40 +0000 (17:31 -0700)]
Fix to denoiser with dynamic resize.
Temporary fix to denoiser when dynamic resizing is on.
-Reallocate denoiser buffers on resized frame.
-Force golden update on resized frame.
-Don't denoise resized frame, and copy source into denoised buffers.
Marco [Thu, 1 Oct 2015 01:27:49 +0000 (18:27 -0700)]
Stabilize the encoder buffer from going too negative.
For screen-content mode, with frame dropper off, put a limit
on how low encoder buffer can go.
Under hard slide changes, the buffer level can go too low and then
take long time to come back up (in particular when frame-dropping
is not used), which will affect the active_worst and target frame size.
jackychen [Thu, 1 Oct 2015 21:03:22 +0000 (14:03 -0700)]
Two-steps scaling in VP9 encoder dynamic resizing.
Dynamic resizing now support two-steps scaling: first go down to
3/4 and then 1/2. This feature is under a flag which controls the
switch between two-steps scaling and one-step scaling (1/2 only).
Ronald S. Bultje [Wed, 30 Sep 2015 22:45:08 +0000 (18:45 -0400)]
vp10: reimplement d45/4x4 to match vp8 instead of vp9.
This is more a proof of concept than anything else. The problem here
isn't so much how to code it, but rather where to place the resulting
code. All intrapred DSP code lives in vpx_dsp, so do we want the vp10
specific intra pred functions to live there, or in vp10/?
Ronald S. Bultje [Wed, 30 Sep 2015 22:44:37 +0000 (18:44 -0400)]
vp8: change build_intra4x4_predictors() to use vpx_dsp.
I've added a few new functions (d45e, d63e, he, ve) to cover the
filtered h/v 4x4 predictors that are vp8-specific, the "correct"
d45 with the correctly filtered bottom-right pixel (as opposed to
the unfiltered version in vp9), and the "broken" d63 with weirdly
filtered bottom-right pixels (which is correctly filtered in vp9).
There may be a minor performance impact on all systems because we
have to do an extra copy of the Above pixel array to incorporate
the topleft pixel in the same array (thus fitting the vpx_dsp API).
In addition, armv6 will have a more serious performance impact b/c
I removed the armv6/vp8-specific assembly. I'm not sure anyone
cares...