Angie Chiang [Fri, 30 Nov 2018 01:36:57 +0000 (17:36 -0800)]
Consider mv inconsistency in single_motion_search
This is still a work in progress.
nb_full_mvs and lambda are set to zero for now, which means the
mv inconsistency penalty is zero while doing the mv search.
Marco Paniconi [Thu, 29 Nov 2018 06:08:08 +0000 (22:08 -0800)]
vp9: Fix condition for disabling noise estimation
Fix the condition for turning off the denoiser due to high
motion: use the proper superframe counter and
frames_since_key counter so this condition won't
take effect on a key (super)frame.
Jerome Jiang [Tue, 27 Nov 2018 01:10:37 +0000 (17:10 -0800)]
VP9 SVC: fix crash on scaling partition.
When scaling up partition from lower resolution layer L, mi_row and
mi_col from L must be smaller than mi_rows and mi_cols from L.
Before this change, the condition was based on mi_rows from top layer
divided by 2, which is not necessarily equal to the mi_rows from lower
resolution layer.
Added variables to the SVC structure to keep track of mi_rows and
mi_cols for each spatial layer.
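
The bound described above can be sketched as follows. This is a minimal illustration, not the libvpx code: the LayerDims struct and all function names are hypothetical, and the sketch assumes that one mode-info unit spans 8 pixels with the count rounded up.

```c
#include <assert.h>

/* Hypothetical per-layer record; the names are illustrative only. */
typedef struct {
  int mi_rows; /* mode-info rows of this spatial layer */
  int mi_cols; /* mode-info columns of this spatial layer */
} LayerDims;

/* Assumption: one mode-info unit spans 8 pixels and the count rounds up. */
static int mi_units(int pixels) {
  assert(pixels > 0);
  return (pixels + 7) >> 3;
}

/* Build the record for a layer of the given pixel size. */
static LayerDims layer_dims(int width, int height) {
  LayerDims d;
  d.mi_rows = mi_units(height);
  d.mi_cols = mi_units(width);
  return d;
}

/* A position taken from layer L is usable only inside layer L's own grid. */
static int mi_pos_in_layer(const LayerDims *layer, int mi_row, int mi_col) {
  return mi_row >= 0 && mi_col >= 0 && mi_row < layer->mi_rows &&
         mi_col < layer->mi_cols;
}
```

Under these assumptions, a 640x360 top layer over a 320x180 lower layer gives 45 rows on top and 23 rows below, so halving the top-layer count (22) does not reproduce the lower layer's own value. Storing the dimensions per layer avoids relying on that equality.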
Marco Paniconi [Mon, 26 Nov 2018 21:58:28 +0000 (13:58 -0800)]
vp9-svc: Fix to skip enhancement layer setting
If in constrained layer drop mode, avoid setting
skip flag if base layer is dropped, as whole superframe
will be dropped in this case. This avoids an assert trigger
in the svc superframe packing.
Jingning Han [Mon, 26 Nov 2018 17:43:07 +0000 (09:43 -0800)]
Fix ARF rate allocation for cq mode
On a limited test set, this improves cq mode compression
performance by 1.9% in PSNR and 6% in SSIM compared to using the
same quantization parameter for all ARFs.
Jon Kunkee [Thu, 15 Nov 2018 09:27:42 +0000 (01:27 -0800)]
Work around ARM64 Windows SDK arm_neon.h quirk
Since the Windows SDK has an ARM32-only arm_neon.h, files including it
during ARM64 Windows builds need to be redirected to arm64_neon.h.
Instead of editing many files to include ARM64-Windows-specific ifdef
logic, this commit introduces an ARM64-Windows-specific version of
arm_neon.h that performs the needed redirection and lands earlier in
the header search path than the ARM32-only arm_neon.h.
Marco Paniconi [Mon, 12 Nov 2018 06:09:31 +0000 (22:09 -0800)]
vp9: Reorganize the buffer level for cbr mode
Refactor the code with some changes.
Split update into two parts: move the fillup
(with per-frame-bandwidth) before the encoding, and
keep the leaking part (with encoded_frame_size) after
the encoding (postencode).
For SVC with ref_frame_config usage: allow usage of timestamp
delta for the fillup part of buffer, instead of the (average)
framerate passed in via the duration.
Moving the buffer fillup (+per-frame-bandwidth) part to the
pre-encode causes some difference in performance
(since buffer level affects active_worst/QP and frame-dropping),
but the change is observed to be small.
Made small adjustment to active_worst_quality to compensate.
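
The two-part update described above can be sketched as a leaky bucket split around the encode call. This is only an illustration under stated assumptions: the CbrBuffer struct, the function names, the clamp to a maximum size, and the way a timestamp delta is turned into bits are all hypothetical and are not taken from the libvpx sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rate-control state; field names are illustrative only. */
typedef struct {
  int64_t buffer_level; /* bits currently in the buffer */
  int64_t maximum_size; /* assumed upper clamp for the level */
} CbrBuffer;

/* Pre-encode step: fill with the bits that arrive for one frame. */
static void buffer_fill_preencode(CbrBuffer *b, int64_t per_frame_bandwidth) {
  assert(per_frame_bandwidth >= 0);
  b->buffer_level += per_frame_bandwidth;
  if (b->buffer_level > b->maximum_size) b->buffer_level = b->maximum_size;
}

/* Post-encode step: leak the bits the encoded frame actually used. */
static void buffer_leak_postencode(CbrBuffer *b, int64_t encoded_frame_size) {
  assert(encoded_frame_size >= 0);
  b->buffer_level -= encoded_frame_size;
}

/* Assumed variant: derive the fill from a timestamp delta (in timebase
 * ticks) instead of an average framerate. */
static int64_t bandwidth_from_ts_delta(int64_t target_bits_per_second,
                                       int64_t ts_delta,
                                       int64_t ticks_per_second) {
  assert(ticks_per_second > 0);
  return target_bits_per_second * ts_delta / ticks_per_second;
}
```

Because the fill now happens before the frame is encoded, any decision that reads the buffer level during encoding sees a level that is one frame's bandwidth higher than with a single post-encode update, which is consistent with the small performance difference noted above.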
Jon Kunkee [Thu, 15 Nov 2018 21:01:04 +0000 (13:01 -0800)]
Add ARM64 support to VS project generation
Windows builds can use msbuild.exe to build libvpx through a set of
generated Visual Studio project files. This commit adds awareness of
ARM64 Windows to this process by adding ARM64 configurations and
setting msbuild properties to consume the right SDK version.
Jon Kunkee [Mon, 12 Nov 2018 21:40:56 +0000 (13:40 -0800)]
Add ARM64 Windows to configure scripts
In order to correctly configure for Windows 10 on ARM, this change adds
a --target value arm64-win64-vs15 to ./configure and adds feature
enable/disable logic for the new platform.
This is merely sufficient for Chromium targeting ARM64 Windows.
Jingning Han [Wed, 14 Nov 2018 22:58:56 +0000 (14:58 -0800)]
Disable tpl model in GF-only GOP structure
The tpl model assumes a relatively short stats buffer length. Hence
it is not ready to support a GF-only GOP structure, where the max
length can go up to 250. Disable the tpl model in such settings to
avoid a rare encode failure in the GF-only setting.
vpx_dec_fuzzer: Unify single and multi-thread tests
As thread count is now randomized, serial and threaded modes can be
combined to a single binary.
With this change, the thread count takes values from 1 to 64, so both
the single-thread and multi-thread variants of the decoders are tested.
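
One way to derive a thread count in the stated range from fuzzer input is sketched below. The function name, the choice of input byte, and the mask-and-add mapping are assumptions made for illustration; the commit message only states the resulting range of 1 to 64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map one fuzz-input byte to a thread count in [1, 64].
 * Assumed mapping: keep the low 6 bits (0..63) and add 1. */
static int threads_from_fuzz_byte(const uint8_t *data, size_t size,
                                  size_t offset) {
  if (data == NULL || offset >= size) {
    return 1; /* no byte available: fall back to a single thread */
  }
  return (data[offset] & 0x3f) + 1;
}
```

A value of 1 exercises the serial path and any larger value exercises the threaded path, which is why a single binary can cover both.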
Jingning Han [Wed, 14 Nov 2018 07:20:03 +0000 (23:20 -0800)]
Fix GF-only frame type allocation
Rework the recursive ARF allocation so that no frame misses its
type assignment in a GF-only GOP structure. This fixes a rare
encoder failure in the GF-only setting.
Jingning Han [Tue, 13 Nov 2018 00:22:46 +0000 (16:22 -0800)]
Rescale arf bit budget calculation
To compute the total budget for a depth layer, exclude the count of
frames that have already been allocated a bit budget. This improves the
avg PSNR by 0.15% and overall PSNR by 0.25% for lowres and midres
test sets.
Johann [Mon, 12 Nov 2018 19:30:03 +0000 (11:30 -0800)]
quantize: use aarch64 vmaxv
Simplify max value calculation on aarch64 by using vmaxv. Much
faster for 4x4 but diminishing returns as the block size grows.
Only the vp9 quantize has a speed test hooked up. Anticipate
similar results for the other quantize versions.
Before:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.2 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.2 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1906 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )
After:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.1 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1803 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )
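
The reduction that vmaxv provides can be sketched as below. This is not the quantize code from the commit: the function name is hypothetical, and the scalar branch exists only so the sketch builds on targets other than aarch64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Return the largest of eight int16_t values. */
static int16_t max_of_8_s16(const int16_t *v) {
  assert(v != NULL);
#if defined(__aarch64__)
  /* aarch64 only: vmaxvq_s16 reduces all eight lanes with one intrinsic. */
  return vmaxvq_s16(vld1q_s16(v));
#else
  /* Portable fallback so the sketch also builds elsewhere. */
  int16_t m = v[0];
  int i;
  for (i = 1; i < 8; ++i) {
    if (v[i] > m) m = v[i];
  }
  return m;
#endif
}
```

On 32-bit ARM the same result needs a sequence of pairwise max steps, so the across-vector form removes a fixed number of instructions per block. That fixed saving is a plausible reason why the gain is most visible at 4x4 and fades as the block size grows.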
Yaowu Xu [Wed, 7 Nov 2018 19:20:32 +0000 (11:20 -0800)]
Simplify rdmult computation
Recognizing that max dc_quant used in rdmult computation is 21387 and
21387 * 21387 * 88 / 24 is still within the range of int32_t, this
commit simplifies the computation with minor cleanups.
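
The range claim can be checked with a short sketch. The function names and the split into 3 + 2/3 are illustrative assumptions, not necessarily the form used in the commit. Note that the quotient 21387 * 21387 * 88 / 24 = 1677147153 fits in int32_t, while the intermediate product 21387 * 21387 * 88 does not, so a 32-bit version has to avoid forming that product.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DC_QUANT 21387 /* maximum dc_quant stated in the commit message */

/* Reference: evaluate 88 * q * q / 24 in 64 bits, where nothing overflows. */
static int64_t rdmult_wide(int q) {
  return (int64_t)88 * q * q / 24;
}

/* 32-bit form: 88 / 24 = 3 + 2 / 3, so no intermediate exceeds 3 * q * q. */
static int32_t rdmult_narrow(int32_t q) {
  int32_t qq;
  assert(q >= 0 && q <= MAX_DC_QUANT);
  qq = q * q;
  return qq * 3 + qq * 2 / 3;
}
```

Both forms agree for every q in range, because 3 * q * q is an integer and the truncation therefore applies only to the 2 * q * q / 3 term in either case.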