Marco [Wed, 15 Jun 2016 21:20:32 +0000 (14:20 -0700)]
vp9: Adjustments to nonrd-pickmode for vbr
For VBR: (1) allow newmv mode for golden ref to
select interpolation filter (as in last ref case), and
(2) don't use the more aggressive tx-skip testing logic for large blocks.
Only affects 1 pass real-time vbr mode (speed >= 5).
PSNR/SSIM metrics on ytlive set are all positive, ~0.5-2% gain.
JackyChen [Wed, 15 Jun 2016 17:14:14 +0000 (10:14 -0700)]
vp9: Add bias to last frame in choose_partitioning.
This change is only for real-time mode if short_circuit_low_temp_var
is on. Add bias to last frame in choosing ref frame for partitioning,
when y_sad and y_sad_g are close. It speeds up real-time encoding by
0.5% on some clips with less than 0.1% overall PSNR drop on rtc test set.
Johann [Sat, 11 Jun 2016 01:00:34 +0000 (18:00 -0700)]
Match prev_ to number_of_layers
Defined as unsigned in VP8_CONFIG
Cleans warning in Android build:
comparison of integers of different signs: 'unsigned int' and 'int'
if (cpi->oxcf.number_of_layers != prev_number_of_layers)
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
Change-Id: I969e64cd2bfda6e61c564476dbd35b892b177646
Johann [Sat, 11 Jun 2016 01:17:18 +0000 (18:17 -0700)]
Active map and ROI map use unsigned rows/cols
The vpx_roi_map_t and vpx_active_map_t structures use unsigned rows
and cols but VP8_COMMON uses signed values for mb_rows and mb_cols.
Cleans warning in Android build:
comparison of integers of different signs: 'int' and 'unsigned int'
if (cpi->common.mb_rows != rows || cpi->common.mb_cols != cols)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~
comparison of integers of different signs: 'int' and 'unsigned int'
if (cpi->common.mb_rows != rows || cpi->common.mb_cols != cols)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~
comparison of integers of different signs: 'unsigned int' and 'int'
if (rows == cpi->common.mb_rows && cols == cpi->common.mb_cols)
~~~~ ^ ~~~~~~~~~~~~~~~~~~~
comparison of integers of different signs: 'unsigned int' and 'int'
if (rows == cpi->common.mb_rows && cols == cpi->common.mb_cols)
Johann [Mon, 13 Jun 2016 18:56:42 +0000 (11:56 -0700)]
Make new_qc signed
Mode is signed
Cleans warning in Android build:
comparison of integers of different signs: 'int' and 'unsigned int'
if (ctx->oxcf.Mode != new_qc)
~~~~~~~~~~~~~~ ^ ~~~~~~
JackyChen [Mon, 6 Jun 2016 23:30:14 +0000 (16:30 -0700)]
vp9: Encoding cycle reduction for speed 8.
1. Skip golden non-zeromv and newmv-last for bsize >= 16x16 if the
temporal variance obtained from choose_partitioning is very low.
2. Skip horz and vert INTRA mode for speed 8.
This change works best on the clips with little noise and with some
motion (e.g. gips_motion which has > 5% speed up). PSNR drop is 1.78%
on rtc test set, no obvious visual quality regression found.
James Zern [Thu, 9 Jun 2016 00:33:34 +0000 (17:33 -0700)]
fdct8x8_test: fix unsigned overflow
the difference between src and dst will be signed, the error will be
unsigned.
quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295
James Zern [Thu, 9 Jun 2016 00:29:02 +0000 (17:29 -0700)]
fdct4x4_test: fix unsigned overflow
the difference between src and dst will be signed, the error will be
unsigned.
quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295
Marco [Wed, 8 Jun 2016 20:03:29 +0000 (13:03 -0700)]
vp9: Use nonrd_pick_partition on scene-cut, for speed 5 vbr mode.
On scene-cut detected frames (i.e., high_source_sad = 1), use
nonrd_pick_partition (over choose_part + select_part), as
the nonrd_pick partitioning is generally better.
Small positive increase in metrics on ytlive set (~0.5 - 1%).
Negligle overall speed decrease, as its only used on scene-cut frames.
Marco [Mon, 6 Jun 2016 18:42:09 +0000 (11:42 -0700)]
vp9: Small ajustment to settings gf_interval, 1 pass vbr.
Add a max condition and lower the min value.
No change in behavior (metrics for yt live set) for the
default min/max_gf_interval=4/16 settings.
Small positive change when min/max_gf_interval=7/16
(for 60fps clips on ytlive set).
Provides more comprehensive coverage for --enable-coefficient-checking.
The intent is to make the --enable-coefficient-checking option
consistent with the VP9 spec.
Linfeng Zhang [Fri, 3 Jun 2016 16:28:11 +0000 (09:28 -0700)]
Fix Visual Studio build failure in filter_selectively_vert_row2() calls
Error messages:
..\vp9\common\vp9_loopfilter.c(1312): warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
[.build-x86_64-win64-vs10\vpx.vcxproj]
..\vp9\common\vp9_loopfilter.c(1313): warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
[.build-x86_64-win64-vs10\vpx.vcxproj]
..\vp9\common\vp9_loopfilter.c(1312): error C2220: warning treated as
error - no 'object' file generated
[.build-x86_64-win64-vs10\vpx.vcxproj]
paulwilkins [Wed, 1 Jun 2016 16:13:31 +0000 (17:13 +0100)]
Slightly more damped VBR adjustment.
Increase in the damping used in adjusting the active Q range.
This does hurt rate accuracy a little in a few extreme cases
especially if the clip is very short*, but helps metrics.
* Note that the adjustment is applied at the GF/ARF group level based
on what happened in the last group. Hence for very short clips where
the length of a single group may be a significant % of the clip length
there is still scope for some drift that cannot be accommodated.
In practice most data points in our test sets are now much closer to target
than was previously the case with default settings and in some cases are
better even than they were with the command line undershoot and overshoot
parameter was set very low (e.g. 2%). For example in bridge_close at high rates
the old mechanism was unable to adapt enough to prevent extreme overshoot.
paulwilkins [Fri, 27 May 2016 15:04:43 +0000 (16:04 +0100)]
Change to get_twopass_worst_quality()
Change to the calculation of the error divisor used in
get_twopass_worst_quality(). This follows on from other
changes to the rate control that impact the output of this
function.
paulwilkins [Thu, 2 Jun 2016 16:34:03 +0000 (17:34 +0100)]
Remove gf_zeromotion_pct.
The use of this value is preventing rate adjustment on clips
or sections that have very little motion but high noise and
this can give rise to some sections with massive overshoot.
Linfeng Zhang [Tue, 24 May 2016 21:32:49 +0000 (14:32 -0700)]
Slow pshufb removal in 3 intra prediction functions.
Replaced vpx_d45_predictor_4x4_ssse3(), vpx_d45_predictor_8x8_ssse3()
and vpx_d207_predictor_4x4_ssse3() with
created vpx_d45_predictor_4x4_sse2(), vpx_d45_predictor_8x8_sse2()
and vpx_d207_predictor_4x4_sse2() respectively.
It's mostly neutral or slightly worse than ssse3 in good cases and
better than ssse3 in the bad cases (but still worse than using the mmx
regs).
jackychen [Fri, 20 May 2016 20:45:46 +0000 (13:45 -0700)]
vp9: Skip some modes when variance is low for big blocks, for 1 pass real-time.
Skip intra-mode and some inter-modes (newmv, nearmv, nearestmv) for
golden frame if the variance got from choose_partitioning is very low.
Only for 1 pass real-time CBR mode and bsize >= 32x32, it has ~2.5%
speed up with less than 0.1% PSNR drop for rtc test set. Don't see
visual regression.
Linfeng Zhang [Tue, 31 May 2016 17:38:01 +0000 (10:38 -0700)]
Update filter_selectively_vert_row2()
Reduce operations and jumps. perf shows CPU time reduced from 1.9% to
1.6% when decoding fdJc1_IBKJA.248.webm on Xeon E5.
Will apply the changes to vp10 after code review.
Marco [Tue, 31 May 2016 16:37:26 +0000 (09:37 -0700)]
vp9: Skip computation of best_sad for newmv, unless needed.
For non-rd pickmode:
best_pred_sad, computed for NEWMV-last, is only used for
skipping golden non-zero modes. Add condition to avoid this
computation if not used (i.e, if golden nonzero modes are not used).
And remove code for computing best_pred_sad for NEWMV-golden,
since that sad is not used.
No change in behavior; small speed gain (~1%) for svc encodes.
Linfeng Zhang [Tue, 17 May 2016 19:42:55 +0000 (12:42 -0700)]
Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10.
Function level timing test shows about 27% time saving on
a Xeon E5-2680 v2 desktop.
Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.
Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.