Stefan Holmer [Tue, 6 Sep 2011 12:34:36 +0000 (14:34 +0200)]
Fix necessary for input partitions iface to match the RTP profile
These changes fixes a glitch between the RTP profile and the input
partitions interface. Since there's no way for the user to know the
actual number of partitions, the decoder have to read the
multi_token_paritition bits also when input partitions mode is
enabled.
Included are also a couple of fixes for issues with independent
partitions and uninitialized memory reads.
Scott LaVarnway [Wed, 24 Aug 2011 18:42:26 +0000 (14:42 -0400)]
Removed bmi copy to/from BLOCKD
for SPLITMV and B_PRED modes. Modified code to use the bmi
found in mode_info_context instead of BLOCKD. On the decode
side, the uvmvs are calculated only when required, instead of
every macroblock. This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains noticed for RT encodes and decodes.(VGA)
Fritz Koenig [Mon, 22 Aug 2011 22:29:41 +0000 (15:29 -0700)]
Use local labels for jumps/loops in x86 assembly.
Prepend . to local labels in assembly code. This
allows non unique labels within a file. Also
makes profiling information more informative
by keeping the function name with the loop name.
Fritz Koenig [Mon, 22 Aug 2011 19:36:28 +0000 (12:36 -0700)]
Reclassify optimized ssim calculations as SSE2.
Calculations were incorrectly classified as either
SSE3 or SSSE3. Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.
Fritz Koenig [Fri, 19 Aug 2011 15:51:27 +0000 (08:51 -0700)]
Reclasify optimized ssim calculations as SSE2.
Calculations were incorrectly classified as either
SSE3 or SSSE3. Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.
Alpha Lam [Tue, 9 Aug 2011 19:59:45 +0000 (20:59 +0100)]
Copy less when active map is in use
When active map is specified and the current frame is not a key frame,
golden frame nor a altref frame then copy only those active regions.
This significantly reduces encoding time by as much as 19% on the test
system where realtime encoding is used. This is particularly useful
when the frame size is large (e.g. 2560x1600) and there's only a few
action macroblocks.
Paul Wilkins [Wed, 17 Aug 2011 13:14:23 +0000 (14:14 +0100)]
Small boost to every other frame.
Instead of a single mid GF boost apply a few extra bits to
every other frame. This gives a very small average metrics
improvement on both derf and YT sets.
John Koleszar [Fri, 12 Aug 2011 18:51:36 +0000 (14:51 -0400)]
Revert "Improved 1-pass CBR rate control"
This reverts commit b5ea2fbc2c1554769848774c836aad262af95072. Further
testing showed noticable keyframe popping in some cases, reverting this
for now to give time for a proper fix.
John Koleszar [Fri, 12 Aug 2011 15:30:54 +0000 (11:30 -0400)]
Propagate macroblock MV to subblocks for error concealment
EC expects the subblock MVs to be populated, but f1d6cc79e43f0066632f19c1854ca365086b712b removed this code. This
commit restores it, protected by CONFIG_ERROR_CONCEALMENT. May move this
to the EC code more directly in the future.
Stefan Holmer [Mon, 8 Aug 2011 08:56:20 +0000 (10:56 +0200)]
Disable error concealment until first key frame is decoded
When error concealment is enabled the first key frame must
be successfully received before error concealment is activated.
Error concealment will be activated when the delta following
delta frame is received.
Also fixed a couple of bugs related to error tracking in
multi-threading. And avoiding decoding corrupt residual
when we have multiple non-resilient partitions.
John Koleszar [Fri, 5 Aug 2011 16:27:25 +0000 (12:27 -0400)]
Fix potential OOB read with Error Concealment
This patch fixes an OOB read when error concealment is enabled and the
partition sizes are corrupt. The partition size read from the bitstream
was not being validated in EC mode.
John Koleszar [Fri, 12 Aug 2011 15:30:54 +0000 (11:30 -0400)]
Propagate macroblock MV to subblocks for error concealment
EC expects the subblock MVs to be populated, but f1d6cc79e43f0066632f19c1854ca365086b712b removed this code. This
commit restores it, protected by CONFIG_ERROR_CONCEALMENT. May move this
to the EC code more directly in the future.
Stefan Holmer [Mon, 8 Aug 2011 08:56:20 +0000 (10:56 +0200)]
Disable error concealment until first key frame is decoded
When error concealment is enabled the first key frame must
be successfully received before error concealment is activated.
Error concealment will be activated when the delta following
delta frame is received.
Also fixed a couple of bugs related to error tracking in
multi-threading. And avoiding decoding corrupt residual
when we have multiple non-resilient partitions.
John Koleszar [Fri, 5 Aug 2011 16:27:25 +0000 (12:27 -0400)]
Fix potential OOB read with Error Concealment
This patch fixes an OOB read when error concealment is enabled and the
partition sizes are corrupt. The partition size read from the bitstream
was not being validated in EC mode.
John Koleszar [Wed, 3 Aug 2011 20:12:12 +0000 (16:12 -0400)]
Fix source buffer selection
This patch fixes a bug in the interaction between the recode loop and
spatial resampling. If the codec was in a spatial resampling state,
and a subsequent iteration of the recode loop disables resampling,
then the source buffer must be reset to the unscaled source.
Yunqing Wang [Wed, 3 Aug 2011 15:51:07 +0000 (11:51 -0400)]
Adjust half-pixel only search
Changed motion search in vp8_find_best_half_pixel_step() to be the
same as in vp8_find_best_sub_pixel_step(), which checks 5 points
instead of 8 points. This only affects real-time mode with
cpu-used >=9. Tests showed it gives 2% encoding speedup with
a quality loss(psnr) of up to 0.5%.
John Koleszar [Wed, 3 Aug 2011 13:20:37 +0000 (09:20 -0400)]
Fix building of static libs on universal-darwin
The static libs should not be built from sources during the top level
of a universal build. This regression was introduced in commit 495b241fa6b03345baf2b2f39aa8c06c735fccc2, which made the static
libs selectable under CONFIG_STATIC.
John Koleszar [Thu, 28 Jul 2011 13:17:32 +0000 (09:17 -0400)]
Convert rc_max_intra_bitrate_pct to control
Since this is the only ABI incompatible change since the last release,
convert it to use the control interface instead. The member of the
configuration struct is replaced with the VP8E_SET_MAX_INTRA_BITRATE_PCT
control.
More significant API changes were expected to be forthcoming when this
control was first introduced, and while they continue to be expected,
it's not worth breaking compatibility for only this change.
Yunqing Wang [Fri, 22 Jul 2011 20:01:11 +0000 (16:01 -0400)]
Preload reference area in sub-pixel motion search (real-time mode)
This change implemented same idea in change "Preload reference area
to an intermediate buffer in sub-pixel motion search." The changes
were made to vp8_find_best_sub_pixel_step() and vp8_find_best_half
_pixel_step() functions which are called when speed >= 5. Test
result (using tulip clip):
Yunqing Wang [Tue, 28 Jun 2011 13:14:13 +0000 (09:14 -0400)]
Preload reference area to an intermediate buffer in sub-pixel motion search
In sub-pixel motion search, the search range is small(+/- 3 pixels).
Preload whole search area from reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the crossing cache-
line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux)
showed encoder speed improvement:
3.4% at --rt --cpu-used =-4
2.8% at --rt --cpu-used =-3
2.3% at --rt --cpu-used =-2
2.2% at --rt --cpu-used =-1
Test on Atom notebook showed only 1.1% speed improvement(speed=-4).
Test on Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.
Next, I will apply similar idea to other 2 sub-pixel search functions
for encoding speed > 4.
Mark ARM asm objects as allowing a non-executable stack.
This adds the magic .note.GNU-stack section at the end of each ARM
asm file (when built with gas), indicating that a non-executable
stack is allowed.
Without this section, the linker will assume the object requires an
executable stack by default, forcing an executable stack for the
entire program.
This is done by expanding luma row to 32-byte alignment, since
there is currently a bunch of code that assumes that
uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c,
common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm,
encoder/temporal_filter.c, and possibly others; I haven't done a
full audit).
It also uses replaces the hardcoded border of 16 in a number of
encoder buffers with VP8BORDERINPIXELS (currently 32), as the
chroma rows start at an offset of border/2.
Together, these two changes have the nice advantage that simply
dumping the frame memory as a contiguous blob produces a valid,
if padded, image.
This version of the check doesn't work with generic-gnu, and figuring
out the correct symbol version at configure time is probably more work
than this is worth. May revisit in the future.