James Zern [Tue, 8 Nov 2016 04:22:22 +0000 (20:22 -0800)]
enable vpx_idct32x32_34_add_neon in hbd builds
replace load_and_transpose_s16_8x8() in idct32_6_neon() with a separate
load_tran_low_to_s16() and transpose_s16_8x8(). the combined function is
used in idct32_8_neon() where the input is the correctly sized output
from the earlier stage.
Marco [Thu, 3 Nov 2016 17:21:12 +0000 (10:21 -0700)]
vp9: Non-rd pickmode: fix logic in reference masking.
Add condition that usable_ref_frame > LAST.
This is to avoid potentially skipping all last-nonzero mv modes,
if golden is used as a reference but skipped completely for the
current block.
This has no effect currently, as we always consider testing golden
mode for each block.
James Zern [Wed, 2 Nov 2016 01:45:50 +0000 (18:45 -0700)]
vp9,tile_worker_hook: correctly set jmp target
vp9_init_macroblockd() resets the error_info to cm's global copy; this
needs to be set to the thread-level target to avoid jumping to the
incorrect stack, resulting in hang or crash.
broken since: 1f4a6c8 vp9/tile_worker_hook: add multiple tile decoding
includes v1.5.0, v1.6.0
James Zern [Tue, 18 Oct 2016 19:30:43 +0000 (12:30 -0700)]
idct,NEON: add a tran_low_t->s16 load adapter
enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in
high-bitdepth mode. the adapter narrows 32-bit input to 16 bits; whether
the expansion can be avoided at all in this case remains a TODO. roughly
matches sse2.
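A minimal sketch of the narrowing step described above, in Python for illustration only. The name narrow_to_s16 is hypothetical, and the sketch assumes a truncating narrow (keep the low 16 bits); the commit message does not say whether the real adapter truncates or saturates. For 8-bit decodes the coefficients are expected to fit in 16 bits, in which case the narrowing loses nothing.

```python
def narrow_to_s16(coeffs):
    """Narrow 32-bit signed coefficients to signed 16-bit values.

    Assumption: truncating narrow, i.e. only the low 16 bits are kept.
    """
    out = []
    for c in coeffs:
        low = c & 0xFFFF  # keep the low 16 bits
        # Reinterpret the 16-bit pattern as a signed value.
        out.append(low - 0x10000 if low >= 0x8000 else low)
    return out


def fits_in_s16(coeffs):
    """True when every coefficient survives the narrowing unchanged."""
    return all(-32768 <= c <= 32767 for c in coeffs)
```

When fits_in_s16() holds, narrow_to_s16() returns the input unchanged; values outside that range wrap around under the truncation assumption.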
Jim Bankoski [Fri, 28 Oct 2016 12:53:26 +0000 (05:53 -0700)]
vpxdec.c : don't double count corrupted frames
A past patch made it so that every frame that had a decode error
caused a corrupted frame to be counted. Unfortunately it was possible
to get both a decode error and a corrupt frame for the same frame
and thus double count an error. This code makes that impossible.
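The counting rule can be sketched as follows. This is an illustration in Python, not the vpxdec.c code; the function name and the per-frame (decode_failed, flagged_corrupt) pairs are assumptions made for the example.

```python
def count_corrupt_frames(frames):
    """Count corrupted frames, at most once per frame.

    frames: iterable of (decode_failed, flagged_corrupt) booleans,
    one pair per frame (hypothetical representation).
    """
    corrupted = 0
    for decode_failed, flagged_corrupt in frames:
        # A single combined check: a frame with both conditions set
        # still contributes exactly one to the total.
        if decode_failed or flagged_corrupt:
            corrupted += 1
    return corrupted
```

Incrementing once for each condition separately would report two errors for a frame that has both, which is the double count the patch removes.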
Peter Boström [Fri, 28 Oct 2016 18:50:20 +0000 (14:50 -0400)]
Add temporal-layer support to tiny_ssim.
Permits skipping 0, 1/2 or 3/4 of the frames, corresponding to
temporal layers 2, 1 and 0 of a 3-temporal-layer encoding. 1/2
corresponds to TL0 in a 2-layer encoding.
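A rough sketch of the frame selection this implies, in Python for illustration. The function name is hypothetical, and the sketch assumes the kept frames start at frame 0 and are evenly spaced; tiny_ssim's actual option handling is not shown here.

```python
def frames_to_compare(num_frames, temporal_layer):
    """Indices of frames kept when comparing up to a temporal layer.

    For a 3-temporal-layer encoding (assumed layout):
      layer 2 keeps every frame          (skips 0)
      layer 1 keeps every second frame   (skips 1/2)
      layer 0 keeps every fourth frame   (skips 3/4)
    """
    step = {2: 1, 1: 2, 0: 4}[temporal_layer]
    return list(range(0, num_frames, step))
```

For example, frames_to_compare(8, 1) keeps frames 0, 2, 4 and 6, which is also the TL0 selection for a 2-layer encoding.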
Paul Wilkins [Wed, 12 Oct 2016 19:50:08 +0000 (20:50 +0100)]
Change to KF boost calculation.
This change is a step in a larger change to the way boost and interval are
determined for ARF and Key frames.
This patch contains some plumbing for the general case but focuses on the
key frame boost calculation. This now relies more heavily on the rate at
which the error score increases between the primary and secondary reference
frame. This seems to be less fragile when dealing with different frame sizes.
For example, in the first pass larger image formats tend to see a higher
percentage of intra coded blocks, and using this number to calculate the frame
decay factor led to much lower boost numbers for 4K than for the same clip
coded at 2K.
This change does give overall gains but they are MUCH larger for the 4K Netflix
set. For the 4K Netflix set the average gain is around 3% with some clips > 20%
whereas for the same set at 2K the average gain is 0.5-1%.
In general, for small image formats the boost is most often reduced a little, whereas
for 4K clips the boost is increased. There are some negative cases, such as Akiyo at 352x288,
where the reduced boost hurts the metrics, especially for SSIM, even while
the set as a whole improves. This is most notable at very low Q and may be the
subject of a future patch.
Some common code for KF and ARF was separated in this patch for the purposes of
tuning but may later be re-merged if appropriate.
Johann [Thu, 27 Oct 2016 04:24:46 +0000 (21:24 -0700)]
partial_idct_test: add _add_ test
The result of the transform is added to the destination buffers. In the
existing tests the destination buffer is always empty so that portion of
the code was never exercised.
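Why an empty destination hides bugs can be shown with a small Python sketch. Both functions are hypothetical stand-ins, not libvpx code, and 8-bit pixels clipped to [0, 255] are assumed.

```python
def add_residual(dest, residual):
    """Add the transform output to the destination, clipping to 8 bits."""
    return [min(255, max(0, d + r)) for d, r in zip(dest, residual)]


def overwrite_residual(dest, residual):
    """Deliberately wrong variant: ignores the destination contents."""
    return [min(255, max(0, r)) for r in residual]


residual = [5, -3, 300]

# With an all-zero destination the wrong variant is indistinguishable.
empty = [0, 0, 0]
same_on_empty = add_residual(empty, residual) == overwrite_residual(empty, residual)

# With a non-zero destination the difference shows up.
filled = [100, 2, 250]
same_on_filled = add_residual(filled, residual) == overwrite_residual(filled, residual)
```

Here same_on_empty is True and same_on_filled is False, so only a test with a non-empty destination buffer exercises the add.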
Yunqing Wang [Tue, 25 Oct 2016 17:47:21 +0000 (10:47 -0700)]
Modify the encoder multi-thread unit test
Modified the encoder multi-thread test so that it included cpu-used=0 and
frame-parallel=0.
frame_parallel_decoding_mode is 1 by default, which disables probability
updating and gives lower encoding quality. The current VP9 multi-threaded
encoder and decoder support probability updating. To test this part, we
should turn it on in the unit test by setting frame-parallel to 0.
Yunqing Wang [Tue, 25 Oct 2016 16:00:58 +0000 (09:00 -0700)]
Change 2 motion search counts to be tile data
This patch modified the motion search counts used in:
https://chromium-review.googlesource.com/#/c/305640/
These 2 counts were originally added as thread data, and used to
make decisions in motion search. The tile encoding order can be
inconsistent when using different numbers of threads, which can
cause a bitstream mismatch. The counts are moved to tile data here to solve
the issue.
Re-use the tile worker threads to pack the bitstream in parallel
on a per-tile basis. Restricting this to real-time only for now
(further testing is needed to ensure this does not make 2-pass
worse in any case).
Marco [Fri, 21 Oct 2016 18:11:34 +0000 (11:11 -0700)]
vp9: Nonrd variance partition: increase threshold for using 4x4 avg.
In variance partition, low resolutions may use variance based on
4x4 average for better partitioning.
Increase the threshold for doing this at speed = 8.
Improves speed by ~5%, with little loss, < 1%, on RTC_derf set.
James Zern [Thu, 20 Oct 2016 04:04:12 +0000 (21:04 -0700)]
remove idct32x32*_add_neon.asm
the intrinsics are neutral to ~20% faster on cros/android
devices when using gcc-4.9/clang-3.8.1 and gcc-4.9/clang-3.8.x from the
r13 ndk. neutral results typically came with gcc-4.9 while larger
positive gains were achieved with clang 3.8.x.