Marco [Wed, 15 Jun 2016 21:20:32 +0000 (14:20 -0700)]
vp9: Adjustments to nonrd-pickmode for vbr
For VBR: (1) allow newmv mode for golden ref to
select interpolation filter (as in last ref case), and
(2) don't use the more aggressive tx-skip testing logic for large blocks.
Only affects 1 pass real-time vbr mode (speed >= 5).
PSNR/SSIM metrics on ytlive set are all positive, ~0.5-2% gain.
Johann [Sat, 11 Jun 2016 01:17:18 +0000 (18:17 -0700)]
Active map and ROI map use unsigned rows/cols
The vpx_roi_map_t and vpx_active_map_t structures use unsigned rows
and cols but VP8_COMMON uses signed values for mb_rows and mb_cols.
Cleans warning in Android build:
comparison of integers of different signs: 'int' and 'unsigned int'
if (cpi->common.mb_rows != rows || cpi->common.mb_cols != cols)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~
comparison of integers of different signs: 'int' and 'unsigned int'
if (cpi->common.mb_rows != rows || cpi->common.mb_cols != cols)
~~~~~~~~~~~~~~~~~~~ ^ ~~~~
comparison of integers of different signs: 'unsigned int' and 'int'
if (rows == cpi->common.mb_rows && cols == cpi->common.mb_cols)
~~~~ ^ ~~~~~~~~~~~~~~~~~~~
comparison of integers of different signs: 'unsigned int' and 'int'
if (rows == cpi->common.mb_rows && cols == cpi->common.mb_cols)
Johann [Mon, 13 Jun 2016 18:56:42 +0000 (11:56 -0700)]
Make new_qc signed
Mode is signed
Cleans warning in Android build:
comparison of integers of different signs: 'int' and 'unsigned int'
if (ctx->oxcf.Mode != new_qc)
~~~~~~~~~~~~~~ ^ ~~~~~~
JackyChen [Mon, 6 Jun 2016 23:30:14 +0000 (16:30 -0700)]
vp9: Encoding cycle reduction for speed 8.
1. Skip golden non-zeromv and newmv-last for bsize >= 16x16 if the
temporal variance obtained from choose_partitioning is very low.
2. Skip horz and vert INTRA mode for speed 8.
This change works best on the clips with little noise and with some
motion (e.g. gips_motion which has > 5% speed up). PSNR drop is 1.78%
on rtc test set, no obvious visual quality regression found.
James Zern [Thu, 9 Jun 2016 00:33:34 +0000 (17:33 -0700)]
fdct8x8_test: fix unsigned overflow
the difference between src and dst will be signed, the error will be
unsigned.
quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295
James Zern [Thu, 9 Jun 2016 00:29:02 +0000 (17:29 -0700)]
fdct4x4_test: fix unsigned overflow
the difference between src and dst will be signed, the error will be
unsigned.
quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295
Marco [Wed, 8 Jun 2016 20:03:29 +0000 (13:03 -0700)]
vp9: Use nonrd_pick_partition on scene-cut, for speed 5 vbr mode.
On scene-cut detected frames (i.e., high_source_sad = 1), use
nonrd_pick_partition (over choose_part + select_part), as
the nonrd_pick partitioning is generally better.
Small positive increase in metrics on ytlive set (~0.5 - 1%).
Negligle overall speed decrease, as its only used on scene-cut frames.
Marco [Mon, 6 Jun 2016 18:42:09 +0000 (11:42 -0700)]
vp9: Small ajustment to settings gf_interval, 1 pass vbr.
Add a max condition and lower the min value.
No change in behavior (metrics for yt live set) for the
default min/max_gf_interval=4/16 settings.
Small positive change when min/max_gf_interval=7/16
(for 60fps clips on ytlive set).
Provides more comprehensive coverage for --enable-coefficient-checking.
The intent is to make the --enable-coefficient-checking option
consistent with the VP9 spec.
Linfeng Zhang [Fri, 3 Jun 2016 16:28:11 +0000 (09:28 -0700)]
Fix Visual Studio build failure in filter_selectively_vert_row2() calls
Error messages:
..\vp9\common\vp9_loopfilter.c(1312): warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
[.build-x86_64-win64-vs10\vpx.vcxproj]
..\vp9\common\vp9_loopfilter.c(1313): warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
[.build-x86_64-win64-vs10\vpx.vcxproj]
..\vp9\common\vp9_loopfilter.c(1312): error C2220: warning treated as
error - no 'object' file generated
[.build-x86_64-win64-vs10\vpx.vcxproj]
paulwilkins [Wed, 1 Jun 2016 16:13:31 +0000 (17:13 +0100)]
Slightly more damped VBR adjustment.
Increase in the damping used in adjusting the active Q range.
This does hurt rate accuracy a little in a few extreme cases
especially if the clip is very short*, but helps metrics.
* Note that the adjustment is applied at the GF/ARF group level based
on what happened in the last group. Hence for very short clips where
the length of a single group may be a significant % of the clip length
there is still scope for some drift that cannot be accommodated.
In practice most data points in our test sets are now much closer to target
than was previously the case with default settings and in some cases are
better even than they were with the command line undershoot and overshoot
parameter was set very low (e.g. 2%). For example in bridge_close at high rates
the old mechanism was unable to adapt enough to prevent extreme overshoot.
paulwilkins [Fri, 27 May 2016 15:04:43 +0000 (16:04 +0100)]
Change to get_twopass_worst_quality()
Change to the calculation of the error divisor used in
get_twopass_worst_quality(). This follows on from other
changes to the rate control that impact the output of this
function.
paulwilkins [Thu, 2 Jun 2016 16:34:03 +0000 (17:34 +0100)]
Remove gf_zeromotion_pct.
The use of this value is preventing rate adjustment on clips
or sections that have very little motion but high noise and
this can give rise to some sections with massive overshoot.
Linfeng Zhang [Tue, 24 May 2016 21:32:49 +0000 (14:32 -0700)]
Slow pshufb removal in 3 intra prediction functions.
Replaced vpx_d45_predictor_4x4_ssse3(), vpx_d45_predictor_8x8_ssse3()
and vpx_d207_predictor_4x4_ssse3() with
created vpx_d45_predictor_4x4_sse2(), vpx_d45_predictor_8x8_sse2()
and vpx_d207_predictor_4x4_sse2() respectively.
It's mostly neutral or slightly worse than ssse3 in good cases and
better than ssse3 in the bad cases (but still worse than using the mmx
regs).
jackychen [Fri, 20 May 2016 20:45:46 +0000 (13:45 -0700)]
vp9: Skip some modes when variance is low for big blocks, for 1 pass real-time.
Skip intra-mode and some inter-modes (newmv, nearmv, nearestmv) for
golden frame if the variance got from choose_partitioning is very low.
Only for 1 pass real-time CBR mode and bsize >= 32x32, it has ~2.5%
speed up with less than 0.1% PSNR drop for rtc test set. Don't see
visual regression.
Linfeng Zhang [Tue, 31 May 2016 17:38:01 +0000 (10:38 -0700)]
Update filter_selectively_vert_row2()
Reduce operations and jumps. perf shows CPU time reduced from 1.9% to
1.6% when decoding fdJc1_IBKJA.248.webm on Xeon E5.
Will apply the changes to vp10 after code review.
Marco [Tue, 31 May 2016 16:37:26 +0000 (09:37 -0700)]
vp9: Skip computation of best_sad for newmv, unless needed.
For non-rd pickmode:
best_pred_sad, computed for NEWMV-last, is only used for
skipping golden non-zero modes. Add condition to avoid this
computation if not used (i.e, if golden nonzero modes are not used).
And remove code for computing best_pred_sad for NEWMV-golden,
since that sad is not used.
No change in behavior; small speed gain (~1%) for svc encodes.
Linfeng Zhang [Tue, 17 May 2016 19:42:55 +0000 (12:42 -0700)]
Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10.
Function level timing test shows about 27% time saving on
a Xeon E5-2680 v2 desktop.
Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.
Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.
Tom Finegan [Fri, 27 May 2016 16:16:35 +0000 (09:16 -0700)]
vpx_ports/mem_ops.h: cast the lhs of bitwise shifts of 24.
C does not allow for shifting into the sign bit of a signed
integer, and the two instances here become signed ints via
promotion. Explcitly cast them to unsigned MEM_VALUE_T to
avoid the problem.
Linfeng Zhang [Thu, 19 May 2016 23:52:14 +0000 (16:52 -0700)]
Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2
Followed the code style of other lpf fuctions.
These 2 functions put 2 rows of data in a single xmm register,
so they have similar but not identical filter operations,
and cannot share the same macros.
Brion Vibber [Tue, 24 May 2016 18:42:51 +0000 (11:42 -0700)]
Move git version extras out of iOS shared framework bundle version
Apple's version format specification is strictly checked on app
store submission, even for embedded frameworks:
http://apple.co/1WgelY1
The build version number should be a string comprised of
three non-negative, period-separated integers with the
first integer being greater than zero. The string should
only contain numeric (0-9) and period (.) characters.
The full version returned from 'version.sh --bare' is now
embedded under a 'VPXFullVersion' custom key in the Info.plist,
so it can still be extracted from the resulting framework.
James Zern [Tue, 24 May 2016 19:02:22 +0000 (12:02 -0700)]
remove vp9_diamond_search_sad_avx.c
vp9_diamond_search_sad_avx was disabled in: 057c1c4 disable vp9_diamond_search_sad_avx
this removes a missing prototype warning as the prototype is no longer
included in vp9_rtcd.h. the file can be restored if someone gets around
to fixing the issue.
KO Myung-Hun [Sun, 22 May 2016 08:44:49 +0000 (17:44 +0900)]
configure: Add -mstackrealign flags to CFLAGS on OS/2
Many codes require -mstackrealign flags. Although -mstackrealign has
been already added to CFLAGS of some modules, SIGSEGV occurs in other
modules than those modules.
The best way may be to find causes and to fix them. However, we
cannot know those causes until SIGSEGV occur really. In addition, if
SIGSEGV occurs in other programs, it will be fatal.
So adding -mstackrealign flags to CFLAGS unconditionally is
reasonable.