Yaowu Xu [Thu, 20 Oct 2011 19:32:34 +0000 (15:32 -0400)]
added code to clear 2nd order block when appropriate
It was discovered that in rare situations the 2nd order block may
produce a few small-magnitude coefficients that have no effect on
reconstruction. These situations arise from a combination of low quantizer
values (high quality) and low energy in the residual signal (content
dependent). This commit added code to detect such cases and reset
the 2nd order block to all 0.
Patches 1 to 4 performed an all-zero check on the idct result buffer,
and tests on the derf set showed a consistent gain of .12%-.14% on all
metrics. But due to a recent change, Ie31d90b, the idct result buffer
is no longer populated, so patches 5 and 6 use an alternative method to
detect these situations. Tests on the derf set now show a consistent
quality gain of .16%-.20%.
As suggested by Jim, patches 7 and 8 removed the condition that all first
order blocks have no coefficients; instead, the 2nd order coefficients
are reset to all 0 if the sum of their absolute values is small. So this
does slightly more than just detect the oddity discussed above, but
tests on the derf set now show a consistent gain of .20%-.23% on all
metrics.
It is worth noting that this change has no effect in the mid/high
quantizer range; it only affects quantizer values of 18 or below.
Within this range, the change improves compression by up to 2.5% on
clips in the derf set.
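A minimal sketch of the patch 7&8 rule, assuming 16 second-order coefficients per macroblock. Y2_RESET_THRESHOLD and maybe_reset_y2() are hypothetical: the message only says "small", and does not state the actual threshold or whether quantized or dequantized values are summed. A real encoder would also have to update the block's eob and entropy contexts, which this sketch omits.

```c
#include <stdlib.h>

/* Hypothetical threshold; the real value is not given in the message. */
#define Y2_RESET_THRESHOLD 35

/* Zero the 16 second-order coefficients when the sum of their absolute
   values is below the threshold. Returns 1 if the block was cleared,
   0 if it was left untouched. */
static int maybe_reset_y2(short coeffs[16])
{
    int i, sum = 0;

    for (i = 0; i < 16; i++)
        sum += abs(coeffs[i]);

    if (sum >= Y2_RESET_THRESHOLD)
        return 0;

    for (i = 0; i < 16; i++)
        coeffs[i] = 0;
    return 1;
}
```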
Scott LaVarnway [Mon, 24 Oct 2011 20:16:08 +0000 (16:16 -0400)]
Removed read_mv_ref
Decode the mv mode with if-then-elses instead of traversing
the vp8_mv_ref_tree data structure. This will make it
easier to interleave vp8_find_near_mvs and vp8_mv_ref_probs.
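The two decoding styles can be compared on a toy bit source. This is a sketch, not the libvpx decoder: bit_reader and read_bit() are hypothetical stand-ins for the boolean decoder (per-node probabilities are ignored), the mode values are arbitrary, and the tree layout mirrors the shape of vp8_mv_ref_tree as we understand it (ZEROMV first, then NEARESTMV, NEARMV, and NEWMV/SPLITMV at the deepest node). Both functions consume the same bits and return the same mode; the if-then-else form unrolls the walk, which presumably is what lets each node's probability be computed just before that bit is read.

```c
/* Hypothetical bit source standing in for the boolean decoder. */
typedef struct {
    const int *bits; /* pre-recorded decisions */
    int pos;         /* next bit to consume */
} bit_reader;

static int read_bit(bit_reader *br) { return br->bits[br->pos++]; }

/* Arbitrary nonzero mode values so leaves can be stored negated. */
enum { MV_ZERO = 1, MV_NEAREST, MV_NEAR, MV_NEW, MV_SPLIT };

/* Leaves are negated modes; positive entries index the next node pair. */
static const int mv_ref_tree[8] = {
    -MV_ZERO, 2, -MV_NEAREST, 4, -MV_NEAR, 6, -MV_NEW, -MV_SPLIT
};

/* Generic table-driven tree walk. */
static int read_mode_tree(bit_reader *br)
{
    int i = 0;
    while ((i = mv_ref_tree[i + read_bit(br)]) > 0) {
        /* keep descending until a leaf (non-positive entry) is reached */
    }
    return -i;
}

/* Same decisions written as nested if-then-elses. */
static int read_mode_ifelse(bit_reader *br)
{
    if (!read_bit(br)) return MV_ZERO;
    if (!read_bit(br)) return MV_NEAREST;
    if (!read_bit(br)) return MV_NEAR;
    return read_bit(br) ? MV_SPLIT : MV_NEW;
}
```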
Scott LaVarnway [Tue, 18 Oct 2011 16:06:50 +0000 (12:06 -0400)]
Remove usage of predict buffer for decode
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer. For blocks with eob=0,
unnecessary idcts can be eliminated. This gave a performance
boost of ~1.8% for the HD clips used.
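A minimal sketch of the idea, assuming a 4x4 block whose predictor has already been written into the recon buffer at dst. recon_block() and its residual argument are hypothetical; the residual stands in for the dequantized inverse-transform output. When eob is 0 the block has no coefficients, so the predictor already is the final reconstruction and the idct and add are skipped.

```c
/* Clamp to the 8-bit pixel range. */
static unsigned char clamp255(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* dst points at a 4x4 block in the recon buffer that already holds the
   predictor. Returns 1 if the residual was added, 0 if skipped. */
static int recon_block(unsigned char *dst, int stride,
                       const short residual[16], int eob)
{
    int r, c;

    if (eob == 0)
        return 0; /* no coefficients: nothing to add */

    for (r = 0; r < 4; r++)
        for (c = 0; c < 4; c++)
            dst[r * stride + c] =
                clamp255(dst[r * stride + c] + residual[r * 4 + c]);
    return 1;
}
```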
Tero: Added needed changes to ARM side and scheduled some
assembly code to prevent interlocks.
Patch Set 6: Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.
It was crashing when the number of partitions was greater than the
number of MB rows (e.g. 128x96 with 8 partitions).
The start point was not checked against mb_rows, and the extra
"empty" partitions were not written out.
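A worked example of why 128x96 with 8 partitions hits the empty case, assuming VP8's round-robin mapping of macroblock row r to token partition r % num_partitions. 96 pixels is 6 macroblock rows, so partitions 6 and 7 receive no rows but still have to be written out. rows_per_partition() is a hypothetical helper.

```c
/* Count the macroblock rows assigned to each token partition under a
   round-robin mapping. Returns how many partitions end up empty. */
static int rows_per_partition(int mb_rows, int num_partitions, int counts[])
{
    int p, r, empty = 0;

    for (p = 0; p < num_partitions; p++)
        counts[p] = 0;
    for (r = 0; r < mb_rows; r++)
        counts[r % num_partitions]++;
    for (p = 0; p < num_partitions; p++)
        if (counts[p] == 0)
            empty++;
    return empty;
}
```

With 36 MB rows and 8 partitions, by contrast, every partition gets 4 or 5 rows and none is empty.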
Adrian Grange [Thu, 6 Oct 2011 22:49:11 +0000 (15:49 -0700)]
Added rate-targeted temporal scalability
Added the ability to create rate-targeted, temporally
scalable, VP8 compatible bitstreams.
The application vp8_scalable_patterns.c demonstrates how
to use this capability. Users can create output bitstreams
containing up to 5 temporally separable streams encoded
as a single VP8 bitstream.
(previously abandoned as:
I92d1483e887adb274d07ce9e567e4d0314881b0a)
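To illustrate temporal layering in general (this is not necessarily one of the patterns in vp8_scalable_patterns.c), here is a sketch of a hypothetical 3-layer dyadic pattern with period 4. Frames in layer k are assumed to reference only frames in layers 0..k, so a decoder can drop the higher layers and still decode; in this pattern each added layer doubles the frame rate.

```c
/* Hypothetical 3-layer dyadic pattern: frame n goes to pattern[n % 4]. */
#define NUM_LAYERS 3

static int layer_for_frame(unsigned frame)
{
    static const int pattern[4] = {0, 2, 1, 2};
    return pattern[frame % 4];
}

/* Frame rate seen by a decoder that keeps layers 0..max_layer and drops
   the rest. */
static double layer_fps(double full_fps, int max_layer)
{
    return full_fps / (double)(1 << (NUM_LAYERS - 1 - max_layer));
}
```

At 30 fps, keeping only layer 0 yields 7.5 fps and layers 0 and 1 yield 15 fps.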
James Berry [Fri, 7 Oct 2011 19:42:23 +0000 (15:42 -0400)]
bug fix - starting/optimal/max and buffer_level changed from int to int64_t
buffer_level in VP8_COMP and starting_buffer_level, optimal_buffer_level
and maximum_buffer_size in VP8_CONFIG changed from int to int64_t
to avoid potential crash issues for larger target bit rates.
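Some quick arithmetic shows the failure mode, assuming the buffer levels in bits are derived from a millisecond setting and the target bit rate. buffer_ms_to_bits() is a hypothetical helper, not the libvpx code. With a 6000 ms buffer, the intermediate product 6000 * 8,000,000 (8 Mbit/s) is already 48,000,000,000, far above INT32_MAX (2,147,483,647). At 400 Mbit/s even the final result, 2,400,000,000 bits, no longer fits in a 32-bit int.

```c
#include <stdint.h>

/* Convert a buffer size in milliseconds to bits at target_bps bits per
   second. Doing the multiply in 64 bits avoids the int overflow. */
static int64_t buffer_ms_to_bits(int64_t ms, int64_t target_bps)
{
    return ms * target_bps / 1000;
}
```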
Johann [Tue, 27 Sep 2011 00:17:20 +0000 (17:17 -0700)]
combine loopfilter data access
The data processed by the loopfilter overlaps. At the block level, this
results in some redundant transforms. Grouping the filtering allows for
a single 16x16 transpose (and inversion) instead of three 16x8 transposes
(and three more inversions).
This implementation is x86_64 only. We retain the previous
implementation for x86.
Improvements are obviously material dependent, but they seem to be ~1% in
tests here.
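The data movement behind the single transpose can be sketched in scalar C. Filtering across a vertical edge with SIMD code typically means transposing the pixels so columns become rows, running the row-oriented filter, and transposing back (the "inversion"). transpose16x16() is a hypothetical illustration; the commit's x86_64 code presumably does this with vector shuffles.

```c
/* Transpose a 16x16 block of pixels from src into dst. */
static void transpose16x16(const unsigned char *src, int src_stride,
                           unsigned char *dst, int dst_stride)
{
    int r, c;

    for (r = 0; r < 16; r++)
        for (c = 0; c < 16; c++)
            dst[c * dst_stride + r] = src[r * src_stride + c];
}
```

Applying it twice restores the original block, which is the transpose/inversion pair the message refers to.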
vp8_find_near_mvs() is being called on all possible reference frames,
but the computed data may go unused if the loop exits early, which can
happen when x->skip is set to 1.
Optimize this by calling vp8_find_near_mvs() lazily, only when its result
is going to be used and has not been computed yet.
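The lazy evaluation can be sketched as a per-macroblock cache with a "valid" flag per reference frame. Everything here is hypothetical: near_mv_cache, compute_near_mvs() (a stand-in for the expensive vp8_find_near_mvs() call) and the assumed four reference slots. The point is that a reference frame the mode loop never reaches, for example because it exits early on x->skip, never pays for the computation.

```c
/* Assumed reference frame slots: intra, last, golden, altref. */
#define NUM_REF_FRAMES 4

typedef struct {
    int valid[NUM_REF_FRAMES];       /* 1 once computed for this MB */
    int near_mvs[NUM_REF_FRAMES][3]; /* stand-in for nearest/near/best */
    int compute_calls;               /* counts the expensive calls */
} near_mv_cache;

/* Called at the start of each macroblock: nothing computed yet. */
static void near_mv_cache_reset(near_mv_cache *c)
{
    int r;
    for (r = 0; r < NUM_REF_FRAMES; r++)
        c->valid[r] = 0;
}

/* Hypothetical stand-in for the expensive computation. */
static void compute_near_mvs(near_mv_cache *c, int ref)
{
    int k;
    c->compute_calls++;
    for (k = 0; k < 3; k++)
        c->near_mvs[ref][k] = ref * 10 + k; /* dummy data */
}

/* Compute on first use for this reference frame, reuse afterwards. */
static const int *get_near_mvs(near_mv_cache *c, int ref)
{
    if (!c->valid[ref]) {
        compute_near_mvs(c, ref);
        c->valid[ref] = 1;
    }
    return c->near_mvs[ref];
}
```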
John Koleszar [Thu, 29 Sep 2011 13:14:37 +0000 (09:14 -0400)]
makefile: fix target 'all'
'all' is the conventional target for building everything in the
makefile, but the child make was expecting all-$(target), for debugging
reasons that I don't recall exactly. Restore the expected behavior.
Attila Nagy [Fri, 16 Sep 2011 10:54:06 +0000 (13:54 +0300)]
Multithreaded encoder, late sync loopfilter
Sync with the loopfilter thread only at the beginning of the next frame's encoding.
This returns control to the application faster and allows better multicore scaling.
When PSNR packets are generated, the final filtered frame is needed immediately,
so the sync cannot be delayed.
Rd and Rm registers should be different in 'mul'. Using the same register
for both results in unpredictable behaviour. GCC gives a warning
and RVCT an error in this case.
The restriction applies only to armv5 targets, not to armv6 and above.
Stefan Holmer [Tue, 6 Sep 2011 12:34:36 +0000 (14:34 +0200)]
Fix necessary for input partitions iface to match the RTP profile
These changes fix a glitch between the RTP profile and the input
partitions interface. Since there is no way for the user to know the
actual number of partitions, the decoder has to read the
multi_token_partition bits even when input partitions mode is
enabled.
Also included are a couple of fixes for issues with independent
partitions and uninitialized memory reads.
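For reference, a sketch of the two header fields involved, based on our reading of the VP8 bitstream format (RFC 6386): a 2-bit field gives log2 of the number of DCT token partitions, and the sizes of all partitions except the last follow the first partition as 3-byte little-endian values. Without the first field the decoder cannot know how many size entries to expect. Both helper names are hypothetical.

```c
/* Number of DCT token partitions from the 2-bit header field:
   0 -> 1, 1 -> 2, 2 -> 4, 3 -> 8. */
static int num_token_partitions(unsigned log2_field)
{
    return 1 << (log2_field & 3);
}

/* One partition size entry: 3 bytes, little-endian. With N partitions
   there are N - 1 such entries. */
static unsigned long read_partition_size(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16);
}
```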
Scott LaVarnway [Fri, 16 Sep 2011 15:03:53 +0000 (11:03 -0400)]
clamp_mvs() using the wrong motion vector information
In the "Removed bmi copy to/from BLOCKD" commit, the copy
to the bmi in BLOCKD was eliminated. clamp_mvs() used
the bmi in BLOCKD, which now contains incorrect values. This
patch fixes this problem.
Scott LaVarnway [Wed, 24 Aug 2011 18:42:26 +0000 (14:42 -0400)]
Removed bmi copy to/from BLOCKD
for SPLITMV and B_PRED modes. Modified code to use the bmi
found in mode_info_context instead of BLOCKD. On the decode
side, the uvmvs are calculated only when required, instead of
for every macroblock. This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains were noticed for RT encodes and decodes (VGA).
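For SPLITMV macroblocks, each 4x4 chroma block's MV is derived from the four luma subblock MVs covering the same 8x8 luma area, which is the work that can now be skipped when it is not needed. A sketch of one component, following the rounding we recall from libvpx's build_uvmvs(): sum the four values, bias the sum away from zero by 4, and divide by 8 (average of four, then halve for 4:2:0 chroma). Treat the rounding as an assumption; derive_uv_mv_component() is hypothetical, and any further adjustment (such as full-pixel rounding for certain bitstream versions) is omitted.

```c
/* Derive one chroma MV component (row or col) from four luma subblock
   MV components. Division truncates toward zero, so the +/-4 bias makes
   the result round half away from zero. */
static int derive_uv_mv_component(int a, int b, int c, int d)
{
    int sum = a + b + c + d;

    sum += (sum < 0) ? -4 : 4;
    return sum / 8;
}
```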
Fritz Koenig [Mon, 22 Aug 2011 22:29:41 +0000 (15:29 -0700)]
Use local labels for jumps/loops in x86 assembly.
Prepend . to local labels in assembly code. This
allows non-unique labels within a file. It also
makes profiling information more informative
by keeping the function name with the loop name.
Fritz Koenig [Mon, 22 Aug 2011 19:36:28 +0000 (12:36 -0700)]
Reclassify optimized ssim calculations as SSE2.
Calculations were incorrectly classified as either
SSE3 or SSSE3, but only SSE2 instructions are used.
Clean up function names and make non-RTCD code work
as well.
Fritz Koenig [Fri, 19 Aug 2011 15:51:27 +0000 (08:51 -0700)]
Reclassify optimized ssim calculations as SSE2.
Calculations were incorrectly classified as either
SSE3 or SSSE3, but only SSE2 instructions are used.
Clean up function names and make non-RTCD code work
as well.
Alpha Lam [Tue, 9 Aug 2011 19:59:45 +0000 (20:59 +0100)]
Copy less when active map is in use
When an active map is specified and the current frame is neither a key frame,
a golden frame nor an altref frame, copy only the active regions.
This significantly reduces encoding time, by as much as 19% on the test
system when realtime encoding is used. This is particularly useful
when the frame size is large (e.g. 2560x1600) and there are only a few
active macroblocks.
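A minimal sketch of copying only the active regions, assuming an active map with one byte per 16x16 macroblock in row-major order and a single shared stride for source and destination. copy_active_mbs() is hypothetical, and the key/golden/altref check described above is left to the caller.

```c
#include <string.h>

/* Copy only the macroblocks flagged in active_map. Returns the number
   of macroblocks copied. */
static int copy_active_mbs(const unsigned char *src, unsigned char *dst,
                           int stride, int mb_rows, int mb_cols,
                           const unsigned char *active_map)
{
    int mr, mc, r, copied = 0;

    for (mr = 0; mr < mb_rows; mr++) {
        for (mc = 0; mc < mb_cols; mc++) {
            if (!active_map[mr * mb_cols + mc])
                continue; /* inactive: leave destination as-is */
            for (r = 0; r < 16; r++)
                memcpy(dst + (mr * 16 + r) * stride + mc * 16,
                       src + (mr * 16 + r) * stride + mc * 16, 16);
            copied++;
        }
    }
    return copied;
}
```

The saving grows with the fraction of inactive macroblocks, which is why large frames with few active macroblocks benefit most.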
Paul Wilkins [Wed, 17 Aug 2011 13:14:23 +0000 (14:14 +0100)]
Small boost to every other frame.
Instead of a single mid-GF boost, apply a few extra bits to
every other frame. This gives a very small average metrics
improvement on both derf and YT sets.
John Koleszar [Fri, 12 Aug 2011 18:51:36 +0000 (14:51 -0400)]
Revert "Improved 1-pass CBR rate control"
This reverts commit b5ea2fbc2c1554769848774c836aad262af95072. Further
testing showed noticeable keyframe popping in some cases; reverting this
for now to give time for a proper fix.
John Koleszar [Fri, 12 Aug 2011 15:30:54 +0000 (11:30 -0400)]
Propagate macroblock MV to subblocks for error concealment
EC expects the subblock MVs to be populated, but
f1d6cc79e43f0066632f19c1854ca365086b712b removed this code. This
commit restores it, protected by CONFIG_ERROR_CONCEALMENT. This may be
moved more directly into the EC code in the future.
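A minimal sketch of the restored behaviour, using a hypothetical mv_t type in place of libvpx's motion vector type: for a macroblock coded without SPLITMV, each of the 16 subblock MVs is simply the macroblock MV, so error concealment can read per-subblock vectors uniformly. The CONFIG_ERROR_CONCEALMENT guard is left out here.

```c
/* Hypothetical motion vector type. */
typedef struct {
    short row, col;
} mv_t;

/* Replicate the macroblock MV into all 16 subblock MVs. */
static void propagate_mb_mv(mv_t sub[16], mv_t mb_mv)
{
    int i;
    for (i = 0; i < 16; i++)
        sub[i] = mb_mv;
}
```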