Yunqing Wang [Tue, 8 Feb 2011 00:16:15 +0000 (19:16 -0500)]
Improve motion search in real-time mode
Applied better MV prediction in real-time mode, which improves
the encoding quality.
Used quarter-pixel search instead of iterative sub-pixel search
for speed >=5 to improve encoding performance.
Tests on the test set showed:
1. For speed=-5, quality improvement: 1.7% on AvgPSNR and 2.1%
on SSIM, performance improvement: 3.6% (This counts in the
performance lose caused by MV prediction calculation in "Improve
MV prediction in vp8_pick_inter_mode() for speed>3").
2. For speed=-8, quality improvement: 2.1% on AvgPSNR and 2.5%
on SSIM. but, 6.9% performance decrease because of MV prediction
calculation. This should be improved later.
John Koleszar [Fri, 4 Feb 2011 16:36:04 +0000 (11:36 -0500)]
correct quantizer initialization
The encoder was not correctly catching transitions in the quantizer
deltas. If a delta_q was set, then the quantizer would be reinitialized
on every frame, but if they transitioned to 0, the quantizer would
not be reinitialized, leading to a encode-decode mismatch.
This bug was triggered by commit 999e155, which sets a Y2 delta Q
for very low base Q levels.
Attila Nagy [Wed, 2 Feb 2011 11:10:27 +0000 (13:10 +0200)]
Delay auto key frame insertion in realtime configuration
Whe auto keyframe insertion is enabled and conditions are right (scene change)
the encoder can decide to insert a key frame and does a re-encoding. This can
introduce extra latency. In RT mode we do not do the re-encoding of the current
frame but force the next frame to key frame.
Attila Nagy [Wed, 26 Jan 2011 08:29:46 +0000 (10:29 +0200)]
Improved encoder threading
Reduce the number of sync points by letting each thread
continue imediatly with a new MB row.
Better multicore scaling, improves performance by 5-20% on ARM multicore.
Scott LaVarnway [Tue, 1 Feb 2011 00:53:02 +0000 (19:53 -0500)]
Removed prediction_error accumulation
from vp8cx_encode_intra_macro_block. prediction_error is used when
deciding if a frame should be a keyframe. After reviewing this with
Yaowu, it was pointed out that vp8cx_encode_intra_macro_block
is only called for keyframes, so the accumulation is unnecessary.
Scott LaVarnway [Mon, 31 Jan 2011 22:43:18 +0000 (17:43 -0500)]
Possible bug in vp8cx_encode_intra_macro_block
vp8_pick_intra4x4mby_modes uses the passed in distortion
for an early breakout. The best distortion was never saved
and the distortion for TM_PRED was always used.
Yaowu Xu [Thu, 20 Jan 2011 00:21:01 +0000 (16:21 -0800)]
change the threshold of DC check for encode breakout
Previously, the DC check is to make sure there is no code-able
DC shift for quantizer Q0, which has been verified rather
conservative. This commit changes the criteria to have two
components, DC and AC, to address the conservativeness. First,
it checks if all AC energy is enough to contribute a single
non-zero quantized AC coefficient. Second, for DC, the decision
to skip further considers two possible scenarios: 1. There is
no code-able 2nd order DC coefficient at all; 2 The residue is
relatively flat, but the uniform DC change is very small, i.e.
less than 1/2 gray level per pixel.
Comparing to previous criteria, the new criteria is about 10%
to 15% faster in encoding time with a very small quality loss.
(threshold ~1000 and quality range 33db-45db)
It should be noted that this commit enables "automatic" static
threshold for encodebreakout if a non-zero small value is passed
in to encoder.
Tero Rintaluoma [Mon, 24 Jan 2011 09:21:40 +0000 (11:21 +0200)]
Adds "armvX-none-rvct" targets
Adds following targets to configure script to support RVCT compilation
without operating system support (for Profiler or bare metal images).
- armv5te-none-rvct
- armv6-none-rvct
- armv7-none-rvct
To strip OS specific parts from the code "os_support"-config was added
to script and CONFIG_OS_SUPPORT flag is used in the code to exclude OS
specific parts such as OS specific includes and function calls for
timers and threads etc. This was done to enable RVCT compilation for
profiling purposes or running the image on bare metal target with
Lauterbach.
Removed separate AREA directives for READONLY data in armv6 and neon
assembly files to fix the RVCT compilation. Otherwise
"ldr <reg>, =label" syntax would have been needed to prevent linker
errors. This syntax is not supported by older gnu assemblers.
Johann [Thu, 27 Jan 2011 16:50:29 +0000 (11:50 -0500)]
warning: pointer targets differ in signedness
vp8/encoder/rdopt.c:728: warning: pointer targets in passing argument 3
of 'macro_block_yrd' differ in signedness
vp8/encoder/rdopt.c:541: note: expected 'int *' but argument is of type
'unsigned int *'
distortion is signed when calling macro_block_yrd is both other cases,
as well as for RDCOST
Johann [Thu, 27 Jan 2011 16:31:59 +0000 (11:31 -0500)]
clean up implicit declaration warnings for neon
Change-Id: I6ca2d89f355839c4c770773c09fc69dcea7c1406
warning: implicit declaration of function
'vp8_variance_halfpixvar16x16_[h|v|hv]_neon'
'vp8_sub_pixel_variance16x16_neon_func'
Scott LaVarnway [Wed, 26 Jan 2011 21:42:56 +0000 (16:42 -0500)]
Performance improvement of first pass
Improved the performance of the first pass only
(~6% on 720p test clip) by making use of LUT instead of the
float calculations. Might try a SIMD version later.
Also started to make use of int_mv instead of
MV.
Yaowu Xu [Wed, 26 Jan 2011 06:24:22 +0000 (22:24 -0800)]
cap the best quantizer for 2nd order DC
This commit also removes artificial RDMULT cap for low quantizers.
The intention is to address some abnormal behavior of mode selections
at the low quantizer end, where many macroblocks were coded with
SPLITMV with all partitions using same motion vector including (0,0).
This change improves the compression quality substantially for high
quality encodings in both PSNR and SSIM terms. Overall effect on
mid/low rate range is also positive for all metrics, but smaller
in magnitude.
Yunqing Wang [Tue, 25 Jan 2011 20:54:34 +0000 (15:54 -0500)]
Refine motion vector prediction for NEWMV mode
Adjust checking points in motion vector prediction to better cover
possible movements, and get a better prediction. Tests on test
clips showed a 0.1% improvement in SSIM, and no change in PSNR
and performance.
Paul Wilkins [Tue, 25 Jan 2011 12:29:06 +0000 (12:29 +0000)]
Incorrect bit allocation in forced KF groups.
The old 2 pass code estimated error distribution when coding a
forced (by interval) key frame. The result of this was that in some
cases, when allocating bits at the GF group level within a KF
group there was either a glut of bits or starvation of bits at the end
of the KF group.
Added code to rescan and get the correct data once the position of
a forced key frame has been determined.
James Berry [Mon, 24 Jan 2011 21:48:21 +0000 (16:48 -0500)]
configure.sh fix for visual studio
-For targets with external build systems like visual
studio CC is not set so check_add_cflags will fail.
Only call this function if extra_cflags is set.
Scott LaVarnway [Wed, 29 Dec 2010 19:30:57 +0000 (14:30 -0500)]
Added vp8_update_zbin_extra
vp8cx_mb_init_quantizer was being called for every mode checked
in vp8_rd_pick_inter_mode. zbin_extra is the only value that
really needs to be recalculated. This calculation is disabled
when using the fast quantizer for mode selection.
This gave a small performance boost (~.5% to 1%).
Note: This needs to be verified with segmentation_enabled.
Yunqing Wang [Thu, 20 Jan 2011 18:01:30 +0000 (13:01 -0500)]
Modify sub-pixel filters to eliminate unnecessary calculations
In sub-pixel calculation, xoffset and yoffset mostly take some
specific values. Modified sub-pixel filter functions according to
these possible values to improve performance.
Paul Wilkins [Thu, 20 Jan 2011 18:01:20 +0000 (18:01 +0000)]
Further work to reduce pulsing.
These changes are specifically targeted at fade transitions to
static scenes. Here we want to place a GF/ARF immediately
after the fade and prevent an ARF just before the fade.
Also some code lines and comment lines shortened to 80 chars
while I was there.
Henrik Lundin [Thu, 16 Dec 2010 15:46:31 +0000 (16:46 +0100)]
Implement error tracking in the decoder
A new vpx_codec_control called VP8D_GET_FRAME_CORRUPTED. The output
from the function is non-zero if the last decoded frame contains
corruption due to packet losses.
The decoder is also modified to accept encoded frames of zero length.
A zero length frame indicates to the decoder that one or more frames
have been completely lost. This will mark the last decoded reference
buffer as corrupted. The data pointer can be NULL if the length is
zero.
Yunqing Wang [Tue, 18 Jan 2011 19:19:52 +0000 (14:19 -0500)]
Modify calling of NEON code in sub-pixel search
In vp8_find_best_sub_pixel_step_iteratively(), many times xoffset
and yoffset are specific values - (4,0) (0,4) and (4,4). Modified
code to call simplified NEON version at these specific offsets to
help with the performance.
Attila Nagy [Mon, 10 Jan 2011 09:14:10 +0000 (11:14 +0200)]
Fix encoder real-time only configuration.
Remove allocation/deallocation of stats storage.
Remove full search functions in machine specific encoder inits.
Remove last pass validation in validate_config.
Paul Wilkins [Mon, 17 Jan 2011 17:23:11 +0000 (17:23 +0000)]
Fix CQ range and experimental KF sizing changes.
The CQ level was not using the q_trans[] array to convert
to a 0-127 range as per min and maxq
Experimental change to try and match the reconstruction
error for forced key frames approximately to that of the
previous frame by means of the recode loop. Though this
may cause extra recodes and the recode behavior has not
been optimized, it can only happen on forced key frames.
Paul Wilkins [Fri, 14 Jan 2011 14:52:15 +0000 (14:52 +0000)]
Testing of modes with Alt Ref frame
Previously when a frame was being overlaid on a previously coded
alt ref frame we only checked the alt ref 0,0 mode. Where there is
a possibility that the alt ref buffer is a filtered frame we should allow
the other prediction modes as normal or at the least allow use of
the last frame buffer.
Adrian Grange [Fri, 14 Jan 2011 15:04:39 +0000 (15:04 +0000)]
ARNR filter pointer update bug fix
In cases where the frame width is not a multiple of 16 the
ARNR filter would go wrong.
In vp8_temporal_filter_iterate_c when updating pointers
at the end of a row of MBs, the image size was
incorrectly used rather than using Num_MBs_In_Row
times 16 (Y) or 8 (U,V).
This worked when width is multiple of 16 but failed
otherwise.
Paul Wilkins [Fri, 14 Jan 2011 11:34:53 +0000 (11:34 +0000)]
KF/GF Pulsing
This change is designed to try and reduce pulsing effects when moving
with a complex transition like a fade, into an easy or static section in
an otherwise difficult clip in CQ mode.
The active CQ level is relaxed down to the user entered level for frames that
are generating less than the passed in minimum bandwidth.
Paul Wilkins [Wed, 12 Jan 2011 17:08:42 +0000 (17:08 +0000)]
Limit key frame quantizer for forced key frames.
Where a key frame occurs because of a minimum interval
selected by the user, then these forced key frames ideally need
to be more closely matched in quality to the surrounding frame.
Yunqing Wang [Mon, 10 Jan 2011 21:16:59 +0000 (16:16 -0500)]
Fix bug in motion search
The maximum possible MV in 1/8 pel units is (1<<11), which could
cause mvcost out of its range that is 1023. Change maximum
possible MV in 1/8 pel units to (1<<11)-8 will fix this problem.
Paul Wilkins [Mon, 10 Jan 2011 16:41:53 +0000 (16:41 +0000)]
Two Pass VBR change
Further experiment with restriction of the Q range.
This uses the average non KF/GF/ARF quantizer, instead
of just relying on the initial value. It is not such a strong constraint
but there may be a reduced risk of rate misses.