Peter de Rivaz [Fri, 24 Oct 2014 07:37:39 +0000 (08:37 +0100)]
Refactored idct routines and headers
This change is made in preparation for a
subsequent patch which adds acceleration
for the highbitdepth transform functions.
The highbitdepth transform functions attempt
to use 16/32bit sse instructions where possible,
but fallback to using the C implementations if
potential overflow is detected. For this reason
the dct routines are made global so they can be
called from the acceleration functions in the
subsequent patch.
Paul Wilkins [Fri, 21 Nov 2014 02:32:44 +0000 (18:32 -0800)]
Remove rate component adjustment for AQ1
In AQ1 a rate adjustment was applied for blocks coded with a
deltaq. This tends to skew the partition selection and cause
rate overshoot.
For example, consider a 64x64 super block where some but not all
sub blocks are in a low q segment and some are in a high q segment.
The choice of Q when considering large partition and transform sizes
is defined by the lowest sub block segment id (currently this implies the
lowest Q). If some parts of the larger partition are very hard this will
cause a high rate component.
The correct behavior here is for the rd code to discard the large partition
choice and break down to sub blocks where some have low and some
have high Q. However the rate correction factor above mask the high
cost of coding at a larger partition size.
Johann [Thu, 20 Nov 2014 21:24:55 +0000 (13:24 -0800)]
Correctly initialize "ones" value in neon quantize
By using 0xff for a short it was not setting the high bits. When
comparing the output with vtst to find non-zero elements it was skipping
vaules which had no low bits set such as -512 / 0xFE00.
Using -8191 as the first element of coeff will generate this condition.
Paul Wilkins [Fri, 7 Nov 2014 16:32:50 +0000 (16:32 +0000)]
Add variance restriction to AQ2.
Add an additional restriction to bit/complexity based
segmentation based on spatial variance.
Only lower Q when both the number of bits spent
in the initial encoding pass and the spatial complexity are
below a threshold. This will prevent the low Q segments
being used just because there is a surfeit of bits.
Small metrics gains especially opsnr.
derf ~0.2% std-hd ~0.3%
Yunqing Wang [Thu, 20 Nov 2014 20:42:36 +0000 (12:42 -0800)]
vp9_ethread: move filter_cache out of RD_OPT struct
Similar to mask_filter, the filter_cache in RD_OPT struct can be
moved out, and declared as a local variable since it is only
used in pick_inter_mode functions.
Paul Wilkins [Thu, 20 Nov 2014 20:22:53 +0000 (12:22 -0800)]
Fix bug in calculating number of mbs with scaling.
Correct calculation of number of mbs in two pass code when
frame resizing is enabled. Always use initial number of mbs if
scaling is enabled, as this is what was used in the first pass.
Yunqing Wang [Thu, 20 Nov 2014 17:41:49 +0000 (09:41 -0800)]
vp9_ethread: change mask_filter to a local variable
The mask_filter in RD_OPT struct is used to record rd result in
filter decision. It is only used in pick_inter_mode functions,
and is removed from the struct and declared as a local variable.
Yunqing Wang [Thu, 20 Nov 2014 17:24:50 +0000 (09:24 -0800)]
vp9_ethread: move max/min partition size to mb struct
The max_partition_size and max_partition_size are set at the
beginning while setting speed features, and then adjusted at
SB level. Moving them to mb struct ensures there is a local
copy for each thread.
Yaowu Xu [Wed, 19 Nov 2014 23:32:11 +0000 (15:32 -0800)]
Add a reset to rc tracking for dropped frames
VP9/DatarateTestVP9Large.ChangingDropFrameThresh/[34] fails post the
merge of commit#ffa06b37. This commit adds reset of rc tracking info
when frame is dropped, and fixes the causes of the bad interaction
between the tests and the previous commit.
Jingning Han [Tue, 18 Nov 2014 19:53:14 +0000 (11:53 -0800)]
Combine fdct8x8 and quantization process
This commit reworks the forward transform and quantization process
for 8x8 block coding. It combines the two operations in a single
function to save a store/load stage of the original transform
coefficients. Overall the speed -6 is slightly faster (around 1%
range). The compression performance of speed -6 is improved by
3.4%.
Yaowu Xu [Tue, 18 Nov 2014 16:52:21 +0000 (08:52 -0800)]
Prevent severe rate control errors in CBR mode
In rare cases, the interaction between rate correction factor and Q
choices may cause severe oscillating frame sizes that are way off
target bandwidth. This commit adds tracking of rate control results
for last two frames, and use the information to prevent oscillating
Q choices.
Jingning Han [Tue, 18 Nov 2014 16:58:09 +0000 (08:58 -0800)]
Add sse2 version for vp9_quantize_fp
vp9_quantize_fp is the quantization process used by rtc coding
mode. This commit adds a sse2 implementation of it. The
implementation is modified based on vp9_quantize_b_sse2. No speed
difference from ssse3 version.
Marco [Tue, 11 Nov 2014 19:01:55 +0000 (11:01 -0800)]
Modify active_worst_quality setting for one pass CBR.
Current setting had active_worst_quality set too high (close to worst_quality)
for first frame(s) following first key frame. This changes that to be somewhat
more aggressive in allowing active_worst_quality to be lower following key frame.
Also remove the 4/5 reduction in active_worst for key frame as
this should be set by the user qp_max setting.
Jingning Han [Mon, 17 Nov 2014 19:21:42 +0000 (11:21 -0800)]
Add empty pointer check to pred buffering in rtc coding mode
This commit adds a check condition to the prediction buffering
operation used in the rtc coding mode. This resolves a unit test
warning in example/vpx_tsvc_encoder_vp9_mode_7.
Yunqing Wang [Sat, 15 Nov 2014 00:04:15 +0000 (16:04 -0800)]
vp9_ethread: combine encoder counts in separate struct
Several frame counters in encoder are updated at SB level. Combine
those counters and put them in a separate struct, which allows us
to allocate one copy for each thread.
Yunqing Wang [Thu, 13 Nov 2014 21:38:23 +0000 (13:38 -0800)]
vp9_ethread: modify the cyclic refresh struct
Two members in struct CYCLIC_REFRESH
int64_t projected_rate_sb;
int64_t projected_dist_sb;
are updated at the superblock level, which makes them shared data
in the multi-thread situation, and requires extra work to handle
them. However, those values are updated and used immediately, and
therefore can be removed. This patch cleaned up the code and
removed the two members.
Yaowu Xu [Thu, 13 Nov 2014 19:20:04 +0000 (11:20 -0800)]
adapt the adjustment limit for rate correction factor in RTC mode
Rate correction factor is used to correct the estimated rate for any
given quantizer, and feeds into rate control for quantizer selection.
We make use of the actual bits used to calculate this rate correction
factor with an adjustment limit to prevent over-adjustment.
This commit adapts the adjustment limit to the difference between the
estimated bits and the actual bits, allows the adjustment limit to vary
between 0.125 (when estimate is close to actual) and 0.625 (when there
is >10X factor off between estimated and actual bits). By doing this,
the commit appears to have largely corrected two observed issues:
1. Adjustment is too slow when the actual bits used is way off from
estimate due to the small adjustment limit.
2. Extreme oscillating quantizer choices due to the feedback loop.
Jingning Han [Sat, 8 Nov 2014 01:50:55 +0000 (17:50 -0800)]
Use reconstructed pixels for intra prediction
This commit makes the speed -6 and above use the reconstructed
boundary pixels for precise intra prediction. This allows more
intra prediction modes to be tested in the non-RD coding process.
Enabling horizontal and vertical intra prediction modes can
improve the speed -6 compression performance for rtc set
by 0.331%.
Yaowu Xu [Mon, 10 Nov 2014 19:46:58 +0000 (11:46 -0800)]
Use normal rate_correction_factor for gf in CBR mode
I0c5f010 changed to allow update golden reference buffer in CBR mode,
this commit changes the use of rate_correction_factor for those frames
to be aligned with the new usage. This commit attempts to solve two
issues:
a. Initialization of rate correction factor for Golden Frame
Prior to this patch, even the regular inter frame has been update
the rate correction factor based on content and encoding results,
the first golden frame would still use the ininitialized value
that can be way off.
b. Allowing rate correction factor update to be slightly faster
Prior to this patch, when the rate correction factor is off, the
update to the factor is too slow, the factor could not get close
to a semi-correct value even after many frames.
The commit helps all clips in psnr/ssim metric, but especially to
a few clip in RTC set that rate correction was way off. For example
thaloundeskmtgvga gained about .5dB for both overall/average psnr.