paulwilkins [Fri, 26 Aug 2016 10:43:47 +0000 (11:43 +0100)]
Add ALLOW_RECODE_FIRST speed mode.
This patch is to address concerns that changes to allow
recodes on the first frame in each ARF group do not give a
good enough speed quality trade off for speed 2. Though the
average impact on encode speed is 1-2%, for some hard clips
it is > 5% rise. For speed 1 this is less an issue and for Speed 0
the previous patch actually improves speed.
Current commit is just an API template for the rest of the code, and
I will add inner logic later.
Altref frames generate a lot of bitrate and at the same time
other frames refer to them a lot, so it makes sense to apply
special compensation-based adaptive quantization scheme for altref
frames. E.g., for blocks that are good predictors for the future
apply rate-control chosen quantizer while for bad predictors apply
worse one.
James Zern [Thu, 25 Aug 2016 04:46:26 +0000 (21:46 -0700)]
add_noise,vpx_setup_noise: correct 'char_dist' type
fixes SSE2/AddNoiseTest.CheckCvsAssembly/0 with -funsigned-char.
visibly broken since: 0dc69c7 postproc : fix function parameters for noise functions.
where the types diverged (char vs. int8)
but likely the return changed in: 2ca24b0 postproc - move filling of noise buffer to vpx_dsp.
when multiple implementations were merged.
In the future this option will activate adaptive quantization special
for altref frames. Encoder will create the adaptive quantization map
on the basis of lookahead buffers similarity which is the estimate of
the future motion compensation performance.
James Zern [Tue, 23 Aug 2016 22:51:32 +0000 (15:51 -0700)]
vp8: fix decoder crash with invalid leading keyframes
decoding the same invalid keyframe twice would result in a crash as the
second time through the decoder would be assumed to have been
initialized as there was no resolution change. in this case the
resolution was itself invalid (0x6), but vp8_peek_si() was only failing
in the case of 0x0.
invalid-vp80-00-comprehensive-018.ivf.2kf_0x6.ivf tests this case by
duplicating the first keyframe and additionally adds a valid one to
ensure decoding can resume without error.
JackyChen [Fri, 5 Aug 2016 18:10:29 +0000 (11:10 -0700)]
vp9 svc: SVC encoder speed up.
Bias towards base_mv and skip 1/4 pixel motion search when using base mv.
2~3% speed up for 2 spatial layers, 3~5% speed up for 3 spatial layers.
PSNR loss:
(2 layers) 0.07dB for gips_stationary, 0.04dB for gips_motion;
(3 layers) 0.07dB for gips_stationary, 0.06dB for gips_motion.
Marco [Mon, 15 Aug 2016 17:22:40 +0000 (10:22 -0700)]
vp9 non-rd pickmode: Add limit on newmv-last and golden bias.
Add option, for newmv-last, to limit the rd-threshold update for early exit,
under a source varianace condition.
This can improve visual quality in low texture moving areas,
like forehead/faces.
Also add bias against golden to improve the speed/fps,
will little/negligible loss in quality.
Only affects CBR mode, non-svc, non-screen-content.
paulwilkins [Wed, 10 Aug 2016 13:00:52 +0000 (14:00 +0100)]
Change default recode rule for good speed 0 and best.
Changes the default recode rule for Speed 0 and best quality
from ALLOW_RECODE to ALLOW_RECODE_KFARFGF.
Tested on the NF, hdres, midres and lowres test sets, this setting
when combined with patch I40cb559... now performs "as well" in
metrics terms (in fact it came out a tiny amount better overall)
but encode time is 9.6% faster (measured as the average
from 27 mid rate local encodes on clips in the derf/lowres set.
Yunqing Wang [Fri, 12 Aug 2016 00:34:20 +0000 (17:34 -0700)]
Fix another motion vector out of range bug
This patch fixed a motion vector out of range bug:
vpxenc: ../libvpx/vp9/encoder/vp9_mcomp.c:69:
mv_cost: Assertion `mv->col >= -((1 << (11 + 1 + 2)) - 1) &&
mv->col < ((1 << (11 + 1 + 2)) - 1)' failed.
For blocks that returned without having full-pixel search, the original
MV limits were not restored, which caused the failure. Moved the set
MV limit function down to fix the bug.
_beginthreadex does not align the stack on 16-byte boundary as expected
by gcc.
On x86 targets, the force_align_arg_pointer attribute may be applied to
individual function definitions, generating an alternate prologue and
epilogue that realigns the run-time stack if necessary. This supports
mixing legacy codes that run with a 4-byte aligned stack with modern
codes that keep a 16-byte stack for SSE compatibility.
https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html
Yunqing Wang [Fri, 5 Aug 2016 22:09:13 +0000 (15:09 -0700)]
Fix a motion vector out of range bug
This patch fixed a motion vector(MV) out of range bug, which was caused
by not restoring the original values of the MV min/max thresholds after
the sub8x8 full pixel motion search. It occurred rarely and only was seen
while encoding a 4k clip for 200 frames.
* changes:
Use common transpose for vpx_idct32x32_1024_add_neon
Use common transpose for vpx_idct8x8_[12|64]_add_neon
Use common transpose for vp9_iht8x8_add_neon
Use common transpose for vpx_idct16x16_[10|256]_add_neon
Johann [Wed, 27 Jul 2016 21:24:14 +0000 (14:24 -0700)]
Don't expand to Q register for 4x4 intrapred
The code was expanding to Q registers so that vqrshn could be used, for
vector quad round shift and narrow. If 4 values are added together,
there is a shift by 2. If 8 values, a shift by 3. Since this accounts
for any possibility of overflow, we can skip the narrowing shift.
This allows keeping the values in D registers and casting the 16 bit
value to 8 bits.
Johann [Wed, 27 Jul 2016 21:19:20 +0000 (14:19 -0700)]
Pad 'Left' when building under ASan
The neon intrinsics are not able to load just the 4 values that are
used. In vpx_dsp/arm/intrapred_neon.c:dc_4x4 it loads 8 values for both
the 'above' and 'left' computations, but only uses the sum of the first
4 values.