Paul Wilkins [Tue, 28 Jun 2011 16:29:47 +0000 (17:29 +0100)]
Change to arf boost calculation.
In this commit I have added an experimental function
that tests prediction quality either side of a central position
to calculate a suggested boost number for an ARF frame.
The function is passed an offset from the current position and
a number of frames to search forwards and backwards.
It returns a forward, backward and compound boost number.
The new code can be deactivated using #define NEW_BOOST 0
In its current default state the code searches forwards and backwards
from the proposed position of the next alt ref.
The the old code used a boost number calculated by scanning forward
from the previous GF up to the proposed alt ref frame position.
I have also added some code to try and prevent placement of a gf/arf
where there is a brief flash.
John Koleszar [Wed, 29 Jun 2011 13:53:12 +0000 (09:53 -0400)]
vpxenc: prevent wraparound in the --rate-hist ringbuffer
For clips that are near 60fps and have a lot of alt refs, it's possible
that the ring buffer holding the previous frames sizes/pts could wrap
around, leading to a division by zero.
In addition to checking for this condition in the ring buffer loop,
the buffer size is made dependent on the actual frame rate in use,
rather than defaulting to 60, which should improve accuracy at frame
rates >= ~60.
John Koleszar [Tue, 28 Jun 2011 21:03:47 +0000 (17:03 -0400)]
Use MAX_ENTROPY_TOKENS and ENTROPY_NODES more consistently
There were many instances in the code of vp8_coef_tokens and
vp8_coef_tokens-1, which was a preprocessor macro despite the naming
convention. Replace these with MAX_ENTROPY_TOKENS and ENTROPY_NODES,
respectively.
Stefan Holmer [Mon, 13 Jun 2011 14:42:27 +0000 (16:42 +0200)]
New ways of passing encoded data between encoder and decoder.
With this commit frames can be received partition-by-partition
from the encoder and passed partition-by-partition to the
decoder.
At the encoder-side this makes it easier to split encoded
frames at partition boundaries, useful when packetizing
frames. When VPX_CODEC_USE_OUTPUT_PARTITION is enabled,
several VPX_CODEC_CX_FRAME_PKT packets will be returned
from vpx_codec_get_cx_data(), containing one partition
each. The partition_id (starting at 0) specifies the decoding
order of the partitions. All partitions but the last has
the VPX_FRAME_IS_FRAGMENT flag set.
At the decoder this opens up the possibility of decoding partition
N even though partition N-1 was lost (given that independent
partitioning has been enabled in the encoder) if more info
about the missing parts of the stream is available through
external signaling.
Each partition is passed to the decoder through the
vpx_codec_decode() function, with the data pointer pointing
to the start of the partition, and with data_sz equal to the
size of the partition. Missing partitions can be signaled to
the decoder by setting data != NULL and data_sz = 0. When
all partitions have been given to the decoder "end of data"
should be signaled by calling vpx_codec_decode() with
data = NULL and data_sz = 0.
The first partition is the first partition according to the
VP8 bitstream + the uncompressed data chunk + DCT address
offsets if multiple residual partitions are used.
Mike Hommey [Fri, 17 Jun 2011 00:33:52 +0000 (02:33 +0200)]
Avoid text relocations in ARM vp8 decoder
The current code stores pointers to coefficient tables and loads them to
access the tables contents. As these pointers are stored in the code
sections, it means we end up with text relocations. eu-findtextrel will
thus complain about code not compiled with -fpic/-fPIC.
Since the pointers are stored in the code sections, we can actually cheat
and let the assembler generate relative addressing when accessing the
coefficient tables, and just load their location with adr.
Stefan Holmer [Thu, 16 Jun 2011 09:44:50 +0000 (11:44 +0200)]
Adding support for error concealment in multi-threaded decoding
Also includes a couple of error concealment bug fixes:
- the segment_id wasn't properly initialized when missing
- when interpolating and no neighbors are found, set to zero
- clear the qcoef buffer when concealing an MB
Further testing shows that the overshoot accumulation/damping is too
aggressive on some clips. Allowing the accumulated overshoot to
decay and limiting to damping to golden frames shows some promise.
But some clips show significant overshoot in the buffer window, so
I think this still needs work.
James Berry [Wed, 22 Jun 2011 16:41:17 +0000 (12:41 -0400)]
get/set reference buffer dimension check added
vp8_yv12_copy_frame_ptr() expects same size
buffers which was not previously gaurenteed.
Using an improperly allocated buffer would
cause a crash before.
Yaowu Xu [Mon, 20 Jun 2011 23:30:26 +0000 (16:30 -0700)]
adjusting the calculation of errorperbit
RDMULT/RDDIV defines a bit worth of distortion in term of sum squared
difference. This has also been used as errorperbit in subpixel motion
search, where the distortions computed as variance of the difference.
The variance of differences is different from sum squared differences
by amount of DC squared. Typically, for inter predicted MBs, this
difference averages around 10% between the two distortion, so this patch
introduces a 110% constant in deriving errorperbit from RDMULT/RDDIV.
Test on CIF set shows small but positive gain on overall PSNR (.03%)
and SSIM (.07%), overall impact on average PSNR is 0.
Scott LaVarnway [Mon, 20 Jun 2011 18:44:16 +0000 (14:44 -0400)]
Improved vp8dx_decode_bool
Relocated the vp8dx_bool_decoder_fill() call, allowing
the compiler to produce better assembly code. Tests
showed a 1 - 2 % performance boost (x86 using gcc)
for the 720p clip used.
Yunqing Wang [Fri, 17 Jun 2011 18:19:51 +0000 (14:19 -0400)]
Remove unnecessary bounds checking in motion search
The starting points are always within the limits, and bounds
checking on these points is not needed. For speed < 5, the
encoded result changes a little because different treatment
is taken while starting point equals the bounds.
Ronald S. Bultje [Thu, 16 Jun 2011 17:01:27 +0000 (13:01 -0400)]
Assign boost to GF bit allocation if past frame had no ARF.
Modify the second-pass code to provide a full golden-frame (GF) bit
allocation boost if the past GF group (GFG) had no alt-ref frame (ARF),
even if the current GFG does contain and ARF.
This mostly has no effect on clips, since switching ARFs on/off between
GFGs is not very common. Has a positive effect on e.g. cheer (+0.45 SSIM
at 600kbps) and football (+0.25 SSIM at 600kbps), particularly at high
bitrates. Has a negative effect (-0.04 SSIM at 300kbps) at pamphlet,
which appears only marginally related to this patch, and crew (-0.1 SSIM
at 700kbps).
James Zern [Wed, 15 Jun 2011 21:42:06 +0000 (14:42 -0700)]
gen_msvs_proj: write boolean for Debug attribute
Replace =1 with =true for yasm tool element. This aids in upgrading
e.g., vs9 project files to vs10.
build/x86-msvs/yasm.xml generated during conversion will require the
Separator attribute to be removed for the build to complete
successfully.
Ronald S. Bultje [Wed, 15 Jun 2011 13:47:00 +0000 (09:47 -0400)]
Disable specialcase for last frames if the sequence contains ARFs.
firstpass.c contains some rate adjustment code that assures that the
last few frames in a sequence abide by rate limits. If the second-to-
last group of frames contains an alt-ref frame (ARF), the last golden
frame (GF) is zero bytes, and we will thus spend a ridiculously high
number of bits on regular P-frames trying to hit the target rate. This
does slightly enhance the quality of these last few frames, but has
no perceptual value (other than hitting the target rate).
Disabling this code means we consistently (slightly) undershoot the
target rate and consequently do worse on the last few frames of a
clip, which is particularly noticeable for small clips. The quality-
per-bitrate is generally better, ~0.2% better overall on derf-set,
especially on clips such as garden, tennis, foreman at low bitrates.
Has a negative effect on hallmonitor at high bitrates.
Tero Rintaluoma [Tue, 14 Jun 2011 10:39:06 +0000 (13:39 +0300)]
Fix RT only build
Moved encode_intra function from firstpass.c to encodeintra.c to
prevent linking problem in real-time only build. Also changed name
of the function to vp8_encode_intra because it is not a static.
Tero Rintaluoma [Tue, 14 Jun 2011 08:29:35 +0000 (11:29 +0300)]
Update -linux-rvct targets
- Updated -linux-rvct targets to support RVDS 4.0 and later.
- Changed optimization flag to -Otime because -O3 ruined performance
for RVCT linux targets.
- Added support for --enable-small for RVCT
- RVCT created library should be able to link with GCC
- Supports building shared linux libraries
Scott LaVarnway [Mon, 13 Jun 2011 21:14:11 +0000 (17:14 -0400)]
Populate bmi for B_PRED only
Small decode performance gain (~1%) on keyframes. No
noticeable gains on encode. Also changed pick_intra4x4mby_modes()
to read the above and left block modes for keyframes only.
Tero Rintaluoma [Fri, 10 Jun 2011 11:05:53 +0000 (14:05 +0300)]
Fix make clean for asm offset files
Automatically created assembly offset files added to CLEAN-OBJS list
for proper cleanup. This will fix following build error:
1) Build for the workstation
./conigure
make
make clean
2) Build for ARM platform
./configure --target=armv7-linux-gcc
make ==> this will fail because it uses old asm_*_offset.asm files
John Koleszar [Wed, 8 Jun 2011 17:50:50 +0000 (13:50 -0400)]
vp8_pick_inter_mode: remove best_bmodes
Since BPRED will be tested at most once, and SPLITMV is not enabled,
there's nothing to clobber the subblock modes, so there's no need to
save and restore them.
Yaowu Xu [Tue, 7 Jun 2011 20:59:46 +0000 (13:59 -0700)]
Adjust errorperbit according to RDMULT in activity masking
In activity masking, RDO constant RDMULT is adjusted on a per MB basis
adaptive to activity with the MB. errorperbit, which is defined as
RDMULT/RDDIV, is a constant used in motion estimation. Previously, in
activity masking, errorperbit is not changed even when RDMULT is changed.
This commit changed to adjust errorperbit according to the change in
RDMULT.
Test in cif set showed a very small but consistent gain by all quality
metrics (average, overall psnr and ssim) when activity masking is on.
John Koleszar [Wed, 8 Jun 2011 15:24:52 +0000 (11:24 -0400)]
Move intra block mode selection to pickinter.c
This commit moves the intra block mode selection from encodeframe.c
to pickinter.c (in the non-RD case). This allowed pick_intra_mbuv_mode
and pick_intra4x4mby_modes to be made static, and is a step towards
refactoring intra mode selection in the main pickinter loop. Gave a
small perf increase (~0.5%).
Paul Wilkins [Wed, 8 Jun 2011 15:00:59 +0000 (16:00 +0100)]
Further activity masking changes:
Some further re-structuring of activity masking code.
Still has various experimental switches.
Supports a metric based on intra encode.
Experimental comparison against a fixed activity target rather
than a frame average, for altering rd and zbin.
Overall the SSIM performance is similar to TT's original
code but there is a much smaller PSNR hit of circa
0.5% instead of 3.2%
Yaowu Xu [Fri, 20 May 2011 06:12:40 +0000 (23:12 -0700)]
adjust sad per bit constants
While investigating the effect of DC values on SAD and SSE in motion
estimation, a side finding indicates the two table of constants need
be adjusted. The adjustment was done by multiplying old constants by
90% with rounding. Also absorb the 1/2 scaling constant into the two
tables. Refer to change Ifa285c3e for background of the 1/2 factor.
Cif set test showed a very small gain on all metric.
Yaowu Xu [Mon, 6 Jun 2011 23:42:58 +0000 (16:42 -0700)]
remove redundant functions
The encoder defined about 4 set of similar functions to calculate sum,
variance or sse or a combination of them. This commit removed one set
of these functions, get8x8var and get16x16var, where calls to the later
function are replaced with var16x16 by using the fact on a 16x16 MB:
variance == sse - sum*sum/256
Two pass encodes were tested at best quality. One pass encodes were
tested only at realtime speed 4:
--rt --cpu-used=-4
The peak datarate (over the specified 500ms window) was measured
for each encode, and averaged together to get metric for
"average peak," computed as SUM(peak)/SUM(target). This patch
reduces the average peak datarate as follows:
One pass:
baseline: 1.29715
this patch: 1.23664
Two pass:
baseline: 1.32702
this patch: 1.37824
This change had a positive effect on our quality metrics as well:
One pass CBR:
Min / Mean / Max (pct)
Average PSNR -0.42 / 2.86 / 27.32
Overall PSNR -0.90 / 2.00 / 17.27
SSIM -0.05 / 3.95 / 37.46
Two pass CBR:
Min / Mean / Max (pct)
Average PSNR -4.47 / 4.35 / 35.99
Overall PSNR -3.40 / 4.18 / 36.46
SSIM -4.56 / 6.98 / 53.67
One pass VBR:
Min / Mean / Max (pct)
Average PSNR -5.21 / 0.01 / 3.30
Overall PSNR -8.10 / -0.38 / 1.21
SSIM -7.38 / -0.11 / 3.17
(note: most values here were close to the mean, there were a few
outliers on files that were very sensitive to golden frame size)
Two pass VBR:
Min / Mean / Max (pct)
Average PSNR 0.00 / 0.00 / 0.00
Overall PSNR 0.00 / 0.00 / 0.00
SSIM 0.00 / 0.00 / 0.00
Neither one pass or two pass CBR mode adheres particularly strictly
to the short term buffer constraints, and two pass is less
consistent, even in the baseline commit. This should be addressed
in a later commit. This likely will hurt the quality numbers, as it
will have to reduce the burstiness of golden frames.
Aside: My work on this commit makes it clear that we need to make
rate control modes "pluggable", where you can easily write a new
one or work on one in isolation.
Yunqing Wang [Thu, 2 Jun 2011 21:33:17 +0000 (17:33 -0400)]
Adjust bounds checking for hex search in real-time mode
Currently, hex search couldn't guarantee the motion vector(MV)
found is within the limit of maximum MV. Therefore, very large
motion vectors resulted from big motion in the video could cause
encoding artifacts. This change adjusted hex search bounds
checking to make sure the resulted motion vector won't go out
of the range. James Berry, thank you for finding the bug.
Don't allow very short GF groups even when the GF is predicted from an ARF.
This is basically a slightly modified version of the previous patch,
and it has a moderately positive effect (SSIM/PSNR both +0.08% avg
on derf-set). Most clips show no change, except waterfall/coastguard,
each ~ +0.8% SSIM/PSNR. You can see similar effects in other clips
by shortening their length to terminate at a very short last group
of frames.