Jon Kunkee [Thu, 15 Nov 2018 09:27:42 +0000 (01:27 -0800)]
Work around ARM64 Windows SDK arm_neon.h quirk
Since the Windows SDK has an ARM32-only arm_neon.h, files including it
during ARM64 Windows builds need to be redirected to arm64_neon.h.
Instead of editing many files to include ARM64-Windows-specific ifdef
logic, this commit introduces an ARM64-Windows-specific version of
arm_neon.h that performs the needed redirection and lands earlier in
the header search path than the ARM32-only arm_neon.h.
Marco Paniconi [Mon, 12 Nov 2018 06:09:31 +0000 (22:09 -0800)]
vp9: Reorganize the buffer level for cbr mode
Refactor the code with some changes.
Split update into two parts: move the fillup
(with per-frame-bandwidth) before the encoding, and
keep the leaking part (with encoded_frame_size) after
the encoding (postencode).
For SVC with ref_frame_config usage: allow usage of timestamp
delta for the fillup part of buffer, instead of the (average)
framerate passed in via the duration.
Moving the buffer fillup (+per-frame-bandwidth) part to the
pre-encode causes some difference in performance
(since buffer level affects active_worst/QPand frame-dropping),
but the change is observed to be small.
Made small adjustment to active_worst_quality to compensate.
Jon Kunkee [Thu, 15 Nov 2018 21:01:04 +0000 (13:01 -0800)]
Add ARM64 support to VS project generation
Windows builds can use msbuild.exe to build libvpx through a set of
generated Visual Studio project files. This commit adds awareness of
ARM64 Windows to this process by adding ARM64 configurations and
setting msbuild properties to consume the right SDK version.
Jon Kunkee [Mon, 12 Nov 2018 21:40:56 +0000 (13:40 -0800)]
Add ARM64 Windows to configure scripts
In order to correctly configure for Windows 10 on ARM, this change adds
a --target value arm64-win64-vs15 to ./configure and adds feature
enable/disable logic for the new platform.
This is merely sufficient for Chromium targeting ARM64 Windows.
Jingning Han [Wed, 14 Nov 2018 22:58:56 +0000 (14:58 -0800)]
Disable tpl model in GF-only GOP structure
The tpl model assumes a relative short stats buffer length. Hence
it is not ready to support GF-only GOP structure where the max
length can go up to 250. Disable tpl model in such setting to avoid
a rare encode failure in GF-only setting.
vpx_dec_fuzzer: Unify single and multi-thread tests
As thread count is now randomized, serial and threaded modes can be
combined to a single binary.
With this change, threads takes values between 1 to 64 and tests both
single thread and multi-thread variants of the decoders
Jingning Han [Wed, 14 Nov 2018 07:20:03 +0000 (23:20 -0800)]
Fix GF-only frame type allocation
Rework the recursive ARF allocation to avoid missing one frame's
type assignment issue in GF only GOP structure. This fixes a rare
encoder failure issue in GF only setting.
Jingning Han [Tue, 13 Nov 2018 00:22:46 +0000 (16:22 -0800)]
Rescale arf bit budget calculation
To compute the total budget for a depth layer, exclude the count of
frames that have been allocated the bit budget. This improves the
avg PSNR by 0.15% and overall PSNR by 0.25% for lowres and midres
test sets.
Johann [Mon, 12 Nov 2018 19:30:03 +0000 (11:30 -0800)]
quantize: use aarch64 vmaxv
Simplify max value calculation on aarch64 by using vmaxv. Much
faster for 4x4 but diminishing returns as the block size grows.
Only the vp9 quantize has a speed test hooked up. Anticipate
similar results for the other quantize versions.
Before:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 31.6 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 17.7 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.2 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.2 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1906 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )
After:
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/2
[ BENCH ] Bypass calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 4x4 29.1 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Full calculations 8x8 16.9 ms ( ±0.0 ms )
[ BENCH ] Bypass calculations 16x16 14.1 ms ( ±0.0 ms )
[ BENCH ] Full calculations 16x16 14.1 ms ( ±0.0 ms )
[ OK ] NEON/VP9QuantizeTest.DISABLED_Speed/2 (1803 ms)
[ RUN ] NEON/VP9QuantizeTest.DISABLED_Speed/3
[ BENCH ] Bypass calculations 32x32 18.6 ms ( ±0.0 ms )
[ BENCH ] Full calculations 32x32 18.6 ms ( ±0.0 ms )
Yaowu Xu [Wed, 7 Nov 2018 19:20:32 +0000 (11:20 -0800)]
Simplify rdmult computation
Recognizing that max dc_quant used in rdmult computation is 21387 and
21387 * 21387 * 88 / 24 is still within the range of int32_t, this
commit simplifies the computation with minor cleanups.
Marco Paniconi [Tue, 6 Nov 2018 01:49:39 +0000 (17:49 -0800)]
vp9 screen-content: Adjustments for screen content.
Increase search area, use NSTEP, and in some cases avoid
bsize below 16x16. This for base spatial layer when many blocks
in the frame have motion (from scene detection analysis).
Paul Wilkins [Fri, 26 Oct 2018 11:12:57 +0000 (12:12 +0100)]
Modified key frame detection.
Address poor key frame detection in some content.
This patch improves on poor key frame / scene cut detection observed
with some test content. The content in question was letter boxed film
style material and also had quite low contrast. For both 1080P and 4K
multiple genuine scene cuts were being missed.
The changes alter the conditions for marking a transition as a "flash" rather
than a scene change. The new code still deals well with genuine flashes as
observed in the "crew" test clip, without falsely flagging some of the
the scene cuts in the "film" test clip.
The new film test clip also had some "flash" frames caused by a lightning
effect and in one case a flash occurred right before a scene change. This
caused a misplacement of the key frame but has been addressed by a new
clause that requires the coded error for the next frame after a candidate
key frame to be lower than the current frame.
The patch also changes the way in which neutral blocks (similar inter and
inter error) are handled in the candidate key frame decision in a way which
hopefully handles the letter boxed format better.
During wider testing some film clips still had missed key frames but this
patch does improve things. In the case of the initial test clip the encoder
correctly marks all 3 scene cuts vs 0 before the patch.
Testing with our standard (mainly short single kf) derf and NF test clips
is neutral.
Johann [Thu, 1 Nov 2018 20:02:45 +0000 (13:02 -0700)]
vpx_codec_enc_config_default: disable 'usage'
Found with clang-tidy. This value is unused in libvpx.
There is an existing test which ensures this is not used:
test/encode_api_test.cc:
EXPECT_EQ(VPX_CODEC_INVALID_PARAM,
vpx_codec_enc_config_default(kCodecs[i], &cfg, 1));
Marco Paniconi [Mon, 5 Nov 2018 17:35:47 +0000 (09:35 -0800)]
vp8: Increase rate correction threshold for drop-overshoot
For 1 pass cbr encoding mode, with frame-dropping on:
increase the rate correction threshold for drop-overshoot detection,
to better capture cases of large overshoot.
Jerome Jiang [Tue, 7 Aug 2018 18:10:26 +0000 (11:10 -0700)]
vp8: fix to address overflow in decoder.
Can't call internal error from the decoder thread.
Add vpx_internal_error_info to MACROBLOCKD. When corrupted frame
detected, the decoder thread returns to its own context and signal
completion of decoding for current frame.
The main decoding thread will detect error too and return error code to
decoding API call.
Each thread will signal end of decoding of the frame. Main thread waits
for the signal of all other threads to start decoding next frame.