Marco [Fri, 16 Dec 2016 19:15:57 +0000 (11:15 -0800)]
vp9: Change condition to enable recheck_zeromv_after_denoising.
For when denoising enabled: change condition to enable
the recheck_zeromv_after_denoising for only very high noise level.
This is causing an issue, so enabling it for very high noise
to effectively shut it off.
Johann [Fri, 16 Dec 2016 20:19:00 +0000 (12:19 -0800)]
postproc test: disable new down and across test
The new test is causing valgrind failures:
[ RUN ] SSE2/VpxPostProcDownAndAcrossMbRowTest.CheckCvsAssembly/0
==28923== Invalid read of size 16
28923== at 0x724016: ??? (deblock_sse2.asm:146)
Disable during investigation. The test is new but the code is not.
Jim Bankoski [Fri, 16 Dec 2016 16:50:55 +0000 (08:50 -0800)]
vp8 : use threading mutex's for tsan only.
To avoid decode performance hit of 2% when running on hyperthreaded
cores.
This patch only uses the mutex's when we are running tsan.
This is safe because 32 bit operations like read and store are atomic
on all the platforms we care about. Tsan warns about race situations,
but in this case either situation ( read occurs before write or write
before read) the worst case is that we go around one extra time in the
loop. So the ordering doesn't really matter.
That said a few other things have been tried :
for instance as per here:
webrtc/base/atomicops.h#52
In this patch they use:
__atomic_load_n(i, __ATOMIC_ACQUIRE);
__atomic_store_n(i, value, __ATOMIC_RELEASE);
This code works on gcc, clang ( replacing protected write and read), and
avoids tsan errors. Incurring no penalty in performance. In C11 its
replaced by straight atomic operands.
However there is no equivalent in the visual studio's we support as
int32 on all windows platforms is already atomic. To avoid tsan like
warnings on windows we'd need to use interlocked exchange and the
end result doesn't gain us any thing.
Marco [Wed, 14 Dec 2016 22:08:09 +0000 (14:08 -0800)]
vp9: Fix to usage of flag USE_ALTREF_FOR_ONE_PASS
The flag USE_ALTREF_FOR_ONE_PASS allows for alt-ref lookahead
in 1 pass vbr (from https://chromium-review.googlesource.com/#/c/365498).
This change is to make sure this macro flag only has effect if
the config flag cpi->oxcf.enable_auto_altef is also on.
No change in ytlive encoding, as USE_ALTREF_FOR_ONE_PASS is not
yet enabled.
James Zern [Thu, 8 Dec 2016 21:02:30 +0000 (13:02 -0800)]
idct16x16_add_neon: fix arm visual studio builds
after: 2d3d95f enable vpx_idct16x16_256_add_neon in hbd builds
reorder INCLUDEs and fix indent of IF/ENDIFs
remove vpx_config.asm to avoid multiple symbol definitions in windows
builds and shift idct_neon.asm.S to the top to allow use of
CONFIG_VP9_HIGHBITDEPTH in the export list.
Yunqing Wang [Wed, 7 Dec 2016 18:00:36 +0000 (10:00 -0800)]
Remove an unused first pass statistic
One of the first pass stats "new_mv_count" is no longer used in VP9,
and is removed. This also makes it easy to implement a multi-threaded
first pass. This change doesn't affect the coding performance, which
has been verified by borg tests.
Linfeng Zhang [Wed, 7 Dec 2016 19:34:00 +0000 (11:34 -0800)]
Update TEST_P(PartialIDctTest, RunQuantCheck)
1. Use correct projections when copying real dct/quant outputs.
2. Remove local random number generator and combine loops.
3. Quantization with minimum allowed step sizes instead of maximum.
This may generate larger inputs.
James Zern [Wed, 30 Nov 2016 03:47:50 +0000 (19:47 -0800)]
idct16x16,NEON: rm output_stride from pass1 fns
vpx_idct16x16_256_add_neon_pass1, vpx_idct16x16_10_add_neon:
this was a constant 8 in all cases meaning the results are stored
contiguously, this allows the number of stores to be reduced.
Marco [Mon, 5 Dec 2016 20:05:35 +0000 (12:05 -0800)]
vp9: Adjust the weight factor for segment rate cost for aq-mode=3.
Use the segment weight factor based on the target (cr->percent_refresh)
if it less than the current estimate (avergae of past usage and target).
Small improvement at low bitrates.
James Zern [Fri, 25 Nov 2016 01:51:10 +0000 (17:51 -0800)]
build/make/Android.mk: correct rtcd template var refs
the expansion of findstring and rtcd_dep_template_CONFIG_ASM_ABIS needs
to be deferred until the block is parsed as makefile syntax rather than
eval time where rtcd_dep_template_CONFIG_ASM_ABIS will be unset. this
ensures vpx_config.asm is properly created.
* changes:
Update vpx_idct4x4_16_add_neon() to pass SingleExtremeCoeff test
Refine 8-bit 4x4 idct NEON intrinsics
Add idct speed test.
Update partial_idct_test.cc to support high bitdepth
James Zern [Thu, 24 Nov 2016 00:49:19 +0000 (16:49 -0800)]
Android.mk,armv7: fix idct_neon.asm.S creation
force this to be created before any other .S files. this change
additionally removes the file from the source list as it doesn't need to
be compiled on its own.
Marco [Tue, 22 Nov 2016 00:37:32 +0000 (16:37 -0800)]
vp9: Adjust cyclic refresh parameters for low bitrates.
Increase the motion threshold and qp-delta for segment#2 boost.
This can increase the frame-drop at low bitrates, but generally
better spatial quality.
Only affects real-time mode with aq-mode=3, at very low bitrates.
Jerome Jiang [Sat, 19 Nov 2016 01:07:20 +0000 (17:07 -0800)]
Cover more filter levels in unit tests for post proc.
For some filter level, the C/MSA doesn't match SSE2. Part of unit tests
are disabled. They will be re-enabled when C/MSA funcs are fixed.
BUG=webm:1321
James Zern [Wed, 23 Nov 2016 07:03:12 +0000 (23:03 -0800)]
use storage.googleapis for testdata download
replace downloads.webmproject.org with the canonical
storage.googleapis.com/... form. this appears less likely to fail when
dealing with multiple concurrent connections.
Marco [Tue, 22 Nov 2016 18:10:06 +0000 (10:10 -0800)]
vp9: Use more aggressive skip when short_circuit_low_temp_var = 1.
Use the same feature as https://chromium-review.googlesource.com/#/c/411327/,
but allow it to be used for speed = 6 and 7, where
short_circuit_low_temp_var = 1.
Speed up of ~2-3% for speed 7, with little/no loss in compression.
Jim Bankoski [Tue, 22 Nov 2016 15:31:04 +0000 (07:31 -0800)]
vp9-tests : split VpxEncoderThreadTest into two tests.
VpxEncoderThreadTest was taking a very long time for some runs and
timing out a lot. This is an attempt to split the test into runs
that can be run nightly ( speeds 2 through 9) and runs that can
be run weekly ( speeds 0-1 ).
Yaowu Xu [Mon, 21 Nov 2016 18:49:56 +0000 (10:49 -0800)]
Add validation of frame_parallel_decoding_mode
This is a boolean value that is written into bitstream, any value other
than 0 or 1 could have led to unexpected behavior. This commit fix the
issue by adding validation of the value to make sure it is boolean.