John Koleszar [Fri, 18 Nov 2011 20:47:16 +0000 (12:47 -0800)]
Speed selection support for disabled reference frames
There was an implicit reference frame test order (typically LAST,
GOLD, ARF) in the mode selection logic, but this doesn't provide the
expected results when some reference frames are disabled. For
instance, in real-time mode, the speed selection logic often disables
the ARF modes. So if the user disables the LAST and GOLD frames, the
encoder was always choosing INTRA, when in reality searching the ARF
in this case has the same speed penalty as searching LAST would have
had.
Instead, introduce the notion of a reference frame search order. This
patch preserves the former priorities, so if a frame is disabled, the
other frames bump up a slot to take its place. This patch lays the
groundwork for doing something smarter in the frame test order, for
example considering temporal distance or looking at the frames used by
nearby blocks.
John Koleszar [Fri, 11 Nov 2011 18:47:20 +0000 (10:47 -0800)]
avoid resetting framerate during vpx_codec_enc_config_set()
The calculated frame_rate is a state variable in the codec, and
shouldn't be maintained in the configuration struct. Move it to the
main part of cpi so that it isn't clobbered when the configuration
struct is updated. The initial framerate estimate is moved from the
vp8_cx_iface.c wrapper into the body of init_config() in onyx_if.c, so
that it is only called once and not reset on every call to
vp8_change_config().
Scott LaVarnway [Wed, 9 Nov 2011 15:41:05 +0000 (10:41 -0500)]
Relocated idct/add calls for encoder
Call the idct/add after the tokenize. This is WIP with
the goal of creating a common idct/add for the encoder and
decoder. This move is necessary because the decoder's version
of the idct clobbers qcoeff, which is used by the tokenize.
Tero Rintaluoma [Mon, 7 Nov 2011 11:40:01 +0000 (13:40 +0200)]
ARMv6 optimized Intra4x4 prediction
Added ARM optimized intra 4x4 prediction
- 2x faster on Profiler compared to C-code compiled with -O3
- Function interface changed a little to improve BLOCKD structure
access
Yunqing Wang [Tue, 8 Nov 2011 17:11:48 +0000 (12:11 -0500)]
Fix checks in MB quantizer initialization
vp8cx_mb_init_quantizer() needs to be called at least once to get
all values calculated. This change added one check to decide if
we could skip initialization or not.
Adrian Grange [Tue, 8 Nov 2011 00:28:13 +0000 (16:28 -0800)]
Added check to make sure maximum buffer size not exceeded
Added code to clip the buffer level to the maximum buffer
size. Without this the buffer level would increase
unchecked.
This bug was found when encoding an essentially static
scene at 2Mb/s. The encoder is unable to generate frames
consistent with the high data-rate because Q bottoms out
at Qmin.
As frames generated are consistently undersized the buffer
level increases and does not get checked against the
maximum size specified by the user (or default).
Tero Rintaluoma [Tue, 25 Oct 2011 11:25:11 +0000 (14:25 +0300)]
Change use of eob in the encoder
Changed 'int eob' to 'char *eob' in BLOCKD so that both encoder and
decoder will use eobs[25] array from MACROBLOCKD structure. In future,
this will enable use of the decoder side IDCT in the encoder.
Stefan Holmer [Thu, 29 Sep 2011 07:17:09 +0000 (09:17 +0200)]
Changing decoder input partition API to input fragments.
Adding support for several partitions within one input fragment.
This is necessary to fully support all possible packetization
combinations in the VP8 RTP profile. Several partitions can
be transmitted in the same packet, and they can only be split
by reading the partition lengths from the bitstream.
Yunqing Wang [Tue, 1 Nov 2011 20:20:00 +0000 (16:20 -0400)]
Add checks in MB quantizer initialization
In some situations (f.g. error-resilient is turned on), vp8cx_mb
_init_quantizer() was called once per macroblock. Added checks
to avoid calculations when there is no change.
John Koleszar [Mon, 31 Oct 2011 21:42:51 +0000 (14:42 -0700)]
Correct SPLITMV clamping
Prior to this fix, the clamping state of the last subblock partition
dominated, whereas the correct behavior is to clamp if any partition
needs clamping. This bug was introduced by v0.9.6-232-g6b25501
See also:
[1]: http://code.google.com/p/webm/issues/detail?id=371
[2]: https://bugzilla.mozilla.org/show_bug.cgi?id=696390
Yaowu Xu [Thu, 20 Oct 2011 19:32:34 +0000 (15:32 -0400)]
added code to clear 2nd order block when appropriate
It is discovered that in rare situations the 2nd order block may
produce a few small magnitude coefficients that has no effect on
reconstruction. The situations are a combination of low quantizer
values (high quality) and low energy in residual signals (content
dependent). This commit added code to detect such cases and reset
the 2nd order block to all 0.
Patch 1 to 4 used code to do all-zero-check on idct result buffer,
and tests on derf set showed a consistent gain of .12%-.14% on all
metrics.But due to a recent change Ie31d90b, the idct result buffer
is not longer populated. So patch 5&6 use an alternative method to
detect the situations. Tests on derf set now shows a consistent
quality gain of .16%-.20%.
As suggested by Jim, Patch 7&8 removed the condition of all first
order block not having any coefficient, instead we reset 2nd order
coefficients to all 0 if sum of absolute value of the coefficients
is small. So it does slightly more than just detecting the oddity
as discussed above, but tests on derf set now show a consistent
gain of .20%-.23% on all metrics.
It is worth noting here that this change does not have any effect
on mid/high quantizer range, it only affects the quantizer value
18 or blow. Within this range, the change helps compression by up
to 2.5% on clips in the derf set.
Attila Nagy [Tue, 18 Oct 2011 06:48:50 +0000 (09:48 +0300)]
Reduce partial frame copy in encoder's pick_filter_level_fast
The partial frame copy function used to copy an extra 8 lines above
and below. The partial frame filtering can only modify 3 pixel rows
above the partial frame. Reduce copy to bare minimum needed, which is
4 lines, so that partial filtering on copied frame is possible.
Define the "magic" fraction number for partial filtering in
loopfilter.h .
Scott LaVarnway [Mon, 24 Oct 2011 20:16:08 +0000 (16:16 -0400)]
Removed read_mv_ref
Decode the mv mode with if-then-elses instead of traversing
the vp8_mv_ref_tree data structure. This will make it
easier to interleave vp8_find_near_mvs and vp8_mv_ref_probs.
Scott LaVarnway [Tue, 18 Oct 2011 16:06:50 +0000 (12:06 -0400)]
Remove usage of predict buffer for decode
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer. For blocks with eob=0,
unnecessary idcts can be eliminated. This gave a performance
boost of ~1.8% for the HD clips used.
Tero: Added needed changes to ARM side and scheduled some
assembly code to prevent interlocks.
Patch Set 6: Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.
It was crashing when number of partitions was bigger than the number
of MB rows (ex. 128x96 with 8 partitions).
Start point was not checked against mb_rows, plus extra
"empty" partitions were not written out.
Adrian Grange [Thu, 6 Oct 2011 22:49:11 +0000 (15:49 -0700)]
Added rate-targeted temporal scalability
Added the ability to create rate-targeted, temporally
scalable, VP8 compatible bitstreams.
The application vp8_scalable_patterns.c demonstrates how
to use this capability. Users can create output bitstreams
containing upto 5 temporally separable streams encoded
as a single VP8 bitstream.
(previously abandoned as:
I92d1483e887adb274d07ce9e567e4d0314881b0a)
James Berry [Fri, 7 Oct 2011 19:42:23 +0000 (15:42 -0400)]
bug fix - starting/optimal/max and buffer_level changed from int to int64_t
buffer_level in VP8_COMP and starting_buffer_level, optimal_buffer_level
and maximum_buffer_size in VP8_CONFIG changed from int to int64_t
to avoid potential crash issues for larger target bit rates.
Johann [Tue, 27 Sep 2011 00:17:20 +0000 (17:17 -0700)]
combine loopfilter data access
The data processed by the loopfilter overlaps. At the block level, this
results in some redundant transforms. Grouping the filtering allows for
a single 16x16 transpose (and inversion) instead of three 16x8 transposes
(and three more inversions).
This implementation is x86_64 only. We retain the previous
implementation for x86.
Improvements are obviously material dependant, but it seems to be ~%1 in
tests here.