granicus.if.org Git - libvpx/log
libvpx
7 years agovp9: Adjust some speed settings for speed 8.
Marco [Wed, 22 Mar 2017 19:15:06 +0000 (12:15 -0700)]
vp9: Adjust some speed settings for speed 8.

Allow simple_block_rd for VGA resolution, and reduce
adaptive_rd_thresh to 1.

On average no loss on RTC set, ~4% speedup on mac.

Change-Id: Ib549c4061c853776062b5e34040f839d470fbebc

7 years agoMerge "vp9: Modify datarate tests to cover denoising with multi-threading."
Marco Paniconi [Tue, 21 Mar 2017 23:44:05 +0000 (23:44 +0000)]
Merge "vp9: Modify datarate tests to cover denoising with multi-threading."

7 years agoMerge "Fix the data race caused by vp9 denoiser."
Jerome Jiang [Tue, 21 Mar 2017 23:27:48 +0000 (23:27 +0000)]
Merge "Fix the data race caused by vp9 denoiser."

7 years agovp9: Modify datarate tests to cover denoising with multi-threading.
Marco [Tue, 21 Mar 2017 05:15:13 +0000 (22:15 -0700)]
vp9: Modify datarate tests to cover denoising with multi-threading.

Change-Id: I6ed48a630edf9923c25a05deaca50e0afec43918

7 years agoFix the data race caused by vp9 denoiser.
Jerome Jiang [Tue, 21 Mar 2017 22:33:42 +0000 (15:33 -0700)]
Fix the data race caused by vp9 denoiser.

BUG=webm:1391

Change-Id: I9669ae62fe9c695d4c6f9973094cb0f39bed51c7

7 years agoMerge "Make butterfly_self() signature consistent with butterfly()"
Yi Luo [Tue, 21 Mar 2017 22:32:20 +0000 (22:32 +0000)]
Merge "Make butterfly_self() signature consistent with butterfly()"

7 years agoCode refactoring in the partition search
Yunqing Wang [Tue, 21 Mar 2017 16:47:55 +0000 (09:47 -0700)]
Code refactoring in the partition search

Computed the partition search early termination score in a separate
function.

Change-Id: I1894b517ff179a38b1c05e054d373ac4b7f4cbb4

7 years agoMake butterfly_self() signature consistent with butterfly()
Yi Luo [Tue, 21 Mar 2017 00:18:10 +0000 (17:18 -0700)]
Make butterfly_self() signature consistent with butterfly()

- Refer to patch: 48fca113d inv_txfm_ssse3,butterfly: fix win32 abi
  compatibility.
- Change four butterfly() calls to butterfly_self(), to simplify the
  operations.

Change-Id: Ib2a8cfe6cddcaf0a59e6e6270d8380055ea42ef3

7 years agoMerge "Add vpx_highbd_idct32x32_1024_add_neon()"
James Zern [Tue, 21 Mar 2017 03:27:35 +0000 (03:27 +0000)]
Merge "Add vpx_highbd_idct32x32_1024_add_neon()"

7 years agoMerge "Add vpx_highbd_idct32x32_34_add_neon()"
James Zern [Tue, 21 Mar 2017 03:02:50 +0000 (03:02 +0000)]
Merge "Add vpx_highbd_idct32x32_34_add_neon()"

7 years agoMerge "vp9: Nonrd variance partition: improve split to 16x16."
Marco Paniconi [Tue, 21 Mar 2017 00:17:35 +0000 (00:17 +0000)]
Merge "vp9: Nonrd variance partition: improve split to 16x16."

7 years agoMerge "Record the sum of tx block eobs in the partition block"
Yunqing Wang [Mon, 20 Mar 2017 23:20:12 +0000 (23:20 +0000)]
Merge "Record the sum of tx block eobs in the partition block"

7 years agovp9: Nonrd variance partition: improve split to 16x16.
Marco [Mon, 20 Mar 2017 16:16:23 +0000 (09:16 -0700)]
vp9: Nonrd variance partition: improve split to 16x16.

Add an additional condition to split to 16x16 for resolutions <= 360p;
this reduces the dragging artifact near moving boundaries.

Small/no change on RTC metrics.

Change-Id: I314694f2166435d918f74e7ab42f002b07f40dae

7 years agoMerge "vp9: Use sb content measure to bias against golden."
Marco Paniconi [Mon, 20 Mar 2017 21:35:11 +0000 (21:35 +0000)]
Merge "vp9: Use sb content measure to bias against golden."

7 years agovp9: Use sb content measure to bias against golden.
Marco [Thu, 16 Mar 2017 22:55:33 +0000 (15:55 -0700)]
vp9: Use sb content measure to bias against golden.

For each superblock, keep track of how far from current frame
was the last significant content change, and use that (along
with GF distance) to turn off GF search in non-rd pickmode.

Only enabled for speed >= 8.

avgPSNR on RTC/RTC_derf down by ~0.9/1.2.
Speedup on mac: ~3-5%.
Speedup on arm: 3.6% for VGA and 4.4% for HD.

Change-Id: Ic3f3d6a2af650aca6ba0064d2b1db8d48c035ac7

7 years agoMerge "temporal filter test: update types"
Johann Koenig [Mon, 20 Mar 2017 19:05:54 +0000 (19:05 +0000)]
Merge "temporal filter test: update types"

7 years agoRecord the sum of tx block eobs in the partition block
Yunqing Wang [Thu, 16 Mar 2017 22:45:07 +0000 (15:45 -0700)]
Record the sum of tx block eobs in the partition block

The sum of tx block eobs is needed in the machine learning based partition
early termination. The eobs are first accumulated during tx search, and
then the value associated with the best tx_size is copied to ctx for later
use.

After the sum of eobs is calculated correctly, re-enabled the
ml_partition_search_early_termination speed feature.

Re-did the quality/speed test to check the impact of the fix.

1. Borg test BDRATE result:
4k set:     PSNR: +0.183%; SSIM: +0.100%;
hdres set:  PSNR: +0.168%; SSIM: +0.256%;
midres set: PSNR: +0.186%; SSIM: +0.326%;

2. Average speed gain result:
4k clips: 21%;
hd clips: 26%;
midres clips: 15%.

The result is in line with the original result.

Change-Id: I4209a95c89be03b4cbfb6a95b16885f89feddbda

7 years agoMerge "inv_txfm_sse2: clear conversion warning in hbd build"
James Zern [Fri, 17 Mar 2017 21:48:20 +0000 (21:48 +0000)]
Merge "inv_txfm_sse2: clear conversion warning in hbd build"

7 years agotemporal filter test: update types
Johann [Fri, 17 Mar 2017 20:19:45 +0000 (13:19 -0700)]
temporal filter test: update types

Use 'int' for w/h since it is that way everywhere else.

Pass Buffer pointers

Change-Id: I9eef6890af657baba171c6bcfcc85fc976173399

7 years agoMerge "test: add vp9_temporal_filter_apply test"
Johann Koenig [Fri, 17 Mar 2017 18:18:05 +0000 (18:18 +0000)]
Merge "test: add vp9_temporal_filter_apply test"

7 years agoMerge "vp9_optimize_b: Combine extrabits cost with token lookup"
Alex Converse [Fri, 17 Mar 2017 16:18:20 +0000 (16:18 +0000)]
Merge "vp9_optimize_b: Combine extrabits cost with token lookup"

7 years agoinv_txfm_sse2: clear conversion warning in hbd build
James Zern [Fri, 17 Mar 2017 08:16:38 +0000 (01:16 -0700)]
inv_txfm_sse2: clear conversion warning in hbd build

tran_high -> tran_low in return from dct_const_round_shift()

Change-Id: I2fe06c4b604823b1d1fe40a487017c3c2819a440

7 years agoAdd vpx_highbd_idct32x32_1024_add_neon()
Linfeng Zhang [Wed, 15 Mar 2017 18:31:35 +0000 (11:31 -0700)]
Add vpx_highbd_idct32x32_1024_add_neon()

BUG=webm:1301

Change-Id: Ib90af0c1712e56b301d0e981dbe9a641e15e36ca

7 years agoAdd vpx_highbd_idct32x32_34_add_neon()
Linfeng Zhang [Tue, 14 Mar 2017 21:07:25 +0000 (14:07 -0700)]
Add vpx_highbd_idct32x32_34_add_neon()

BUG=webm:1301

Change-Id: I74dd16c6c64e7bb71aa991cedccddf0663ef5e06

7 years agoMerge "Add vpx_highbd_idct32x32_135_add_neon()"
James Zern [Fri, 17 Mar 2017 07:26:52 +0000 (07:26 +0000)]
Merge "Add vpx_highbd_idct32x32_135_add_neon()"

7 years agoAdd vpx_highbd_idct32x32_135_add_neon()
Linfeng Zhang [Tue, 14 Mar 2017 17:16:35 +0000 (10:16 -0700)]
Add vpx_highbd_idct32x32_135_add_neon()

BUG=webm:1301

Change-Id: I58c2d65d385080711c3666d6d8f9d241dac7b21a

7 years agoMerge "Clean vpx_idct32x32_1024_add_neon()"
James Zern [Fri, 17 Mar 2017 05:24:57 +0000 (05:24 +0000)]
Merge "Clean vpx_idct32x32_1024_add_neon()"

7 years agovp9: Fix speed 8 condition for enabling copy_partition.
Marco [Fri, 17 Mar 2017 00:05:42 +0000 (17:05 -0700)]
vp9: Fix speed 8 condition for enabling copy_partition.

Change-Id: I2c090e6ba853a30fef1957b620853315f9471753

7 years agovp9_optimize_b: Combine extrabits cost with token lookup
Alex Converse [Thu, 16 Mar 2017 23:34:26 +0000 (16:34 -0700)]
vp9_optimize_b: Combine extrabits cost with token lookup

About 0.6% fewer cycles spent in vp9_optimize_b.

Change-Id: I2ae62a78374c594ed81d4e3100a5848e2f6f2c4e

7 years agoAdd a vector form of routine vp9_model_rd_from_var_lapndz
Gabriel Marin [Wed, 14 Dec 2016 20:07:34 +0000 (12:07 -0800)]
Add a vector form of routine vp9_model_rd_from_var_lapndz

Add routine vp9_model_rd_from_var_lapndz_vec and call it from model_rd_for_sb
to model the rate and distortion for MAX_MB_PLANE Laplacian sources in
parallel. The caller ensures that all sources have non-zero variance.

Measured an 18% to 25% reduction in retired instructions, and 17% to 24%
reduction in instruction execution cost with different compilers for the
Laplacian modeling.

No change in behavior.

TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225

Change-Id: I6b76947f21c659a349adb896e13e99f6e3f951e6

7 years agoMerge "vp9: Fixes in non-rd pickmode for denoising with SVC."
Marco Paniconi [Thu, 16 Mar 2017 21:53:38 +0000 (21:53 +0000)]
Merge "vp9: Fixes in non-rd pickmode for denoising with SVC."

7 years agoMerge "Remove ppc-linux-gcc target"
Johann Koenig [Thu, 16 Mar 2017 21:53:17 +0000 (21:53 +0000)]
Merge "Remove ppc-linux-gcc target"

7 years agoMerge "Add Hadamard for Power8"
Johann Koenig [Thu, 16 Mar 2017 21:52:15 +0000 (21:52 +0000)]
Merge "Add Hadamard for Power8"

7 years agovp9: Fixes in non-rd pickmode for denoising with SVC.
Marco [Thu, 16 Mar 2017 19:47:44 +0000 (12:47 -0700)]
vp9: Fixes in non-rd pickmode for denoising with SVC.

Don't denoise spatial layer frames whose base layer is a key frame.

Disallow golden reference for SVC with denoising on frames
that will be denoised (highest layer), as this removes a bad artifact.
Will re-enable when issue is resolved.

Change-Id: I87a6597812330500966458172acfce54af65f70f

7 years agovpx_codec.h: include vpx/*.h -> ./*.h
Marco [Tue, 14 Mar 2017 17:38:50 +0000 (10:38 -0700)]
vpx_codec.h: include vpx/*.h -> ./*.h

This matches the other includes and also fixes a compile issue in
chromium.

Change-Id: I45e00a1454f7ed948aa3b96b04cc5946b1d02985

7 years agoMerge "Refactor: Change cpi->resize_state to enum values."
Jerome Jiang [Thu, 16 Mar 2017 16:43:41 +0000 (16:43 +0000)]
Merge "Refactor: Change cpi->resize_state to enum values."

7 years agoMerge "vp8: Fix compiler warning in vp8 pickinter.c"
Marco Paniconi [Thu, 16 Mar 2017 05:13:38 +0000 (05:13 +0000)]
Merge "vp8: Fix compiler warning in vp8 pickinter.c"

7 years agoAdd Hadamard for Power8
Rafael de Lucena Valle [Thu, 20 Oct 2016 00:21:09 +0000 (22:21 -0200)]
Add Hadamard for Power8

Change-Id: I3b4b043c1402b4100653ace4869847e030861b18
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>

7 years agoMerge "vp9: Fix some issues with denoiser and SVC."
Marco Paniconi [Thu, 16 Mar 2017 02:42:55 +0000 (02:42 +0000)]
Merge "vp9: Fix some issues with denoiser and SVC."

7 years agovp9: Fix some issues with denoiser and SVC.
Marco [Wed, 15 Mar 2017 23:51:34 +0000 (16:51 -0700)]
vp9: Fix some issues with denoiser and SVC.

Fix the update of the denoiser buffer when the base
spatial layer is a key frame. And allow for better/lower
QP on high spatial layers when their base layer is key frame.

Change-Id: I96b2426f1eaa43b8b8d4c31a68b0c6d68c3024a2

7 years agoRefactor: Change cpi->resize_state to enum values.
Jerome Jiang [Mon, 13 Mar 2017 21:08:32 +0000 (14:08 -0700)]
Refactor: Change cpi->resize_state to enum values.

Change-Id: Iab1409b0fc1175bc5a14afc4749a08c536c98c41

7 years agovp9: Turn off ml_partition_search_early_termination.
Marco [Wed, 15 Mar 2017 20:44:26 +0000 (13:44 -0700)]
vp9: Turn off ml_partition_search_early_termination.

Fails on nightly ubsan, valgrind tests.
Enabled on commit:6701014

Change-Id: Ied3f5cb38e39cba54ac134f4514107cdfdfce159

7 years agovp8: Fix compiler warning in vp8 pickinter.c
Marco [Wed, 15 Mar 2017 18:44:07 +0000 (11:44 -0700)]
vp8: Fix compiler warning in vp8 pickinter.c

Change-Id: I0e5714538fe53d885a2201d808846901ae8fc288

7 years agoClean vpx_idct32x32_1024_add_neon()
Linfeng Zhang [Tue, 14 Mar 2017 22:14:34 +0000 (15:14 -0700)]
Clean vpx_idct32x32_1024_add_neon()

Change-Id: I05921e16d6a3e4e7e5b00a90624735050a186636

7 years agoMerge "Improve idct32x32_1024_add SSSE3 intrinsics performance"
Yi Luo [Wed, 15 Mar 2017 02:32:52 +0000 (02:32 +0000)]
Merge "Improve idct32x32_1024_add SSSE3 intrinsics performance"

7 years agoMerge "Fix overflow issue in 32x32 idct NEON intrinsics"
Linfeng Zhang [Wed, 15 Mar 2017 00:38:17 +0000 (00:38 +0000)]
Merge "Fix overflow issue in 32x32 idct NEON intrinsics"

7 years agoMerge "vp9: Using source sad for speedup for dynamic resizing."
Jerome Jiang [Wed, 15 Mar 2017 00:03:52 +0000 (00:03 +0000)]
Merge "vp9: Using source sad for speedup for dynamic resizing."

7 years agoFix overflow issue in 32x32 idct NEON intrinsics
Linfeng Zhang [Tue, 14 Mar 2017 16:31:52 +0000 (09:31 -0700)]
Fix overflow issue in 32x32 idct NEON intrinsics

Similar issue to change bc1c18e.

The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon()
in high bit-depth mode exposes 16-bit overflow in the final stage of
pass 2 when increasing the test count from 1,000 to 1,000,000.

Change to use saturating add/sub for vpx_idct32x32_34_add_neon(),
vpx_idct32x32_135_add_neon() and vpx_idct32x32_1024_add_neon() in high
bit-depth mode.

Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f
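
As an illustration of the saturating add/sub mentioned above, here is a
minimal scalar sketch in C of the difference between wrapping and
saturating 16-bit arithmetic. The function names are hypothetical and
this is not the code from the change; NEON provides saturating
intrinsics such as vqaddq_s16() and vqsubq_s16(), but which ones the
change uses is not stated in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Wrapping 16-bit add: the sum is truncated to its low 16 bits, so a
 * large positive result shows up as a negative value on the usual
 * two's-complement targets. */
int16_t add_wrap_s16(int16_t a, int16_t b) {
  return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Saturating 16-bit add: compute in 32 bits, then clamp the result to
 * the range [INT16_MIN, INT16_MAX]. */
int16_t add_sat_s16(int16_t a, int16_t b) {
  int32_t sum = (int32_t)a + (int32_t)b;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}

/* Saturating 16-bit subtract, same clamping as above. */
int16_t sub_sat_s16(int16_t a, int16_t b) {
  int32_t diff = (int32_t)a - (int32_t)b;
  if (diff > INT16_MAX) return INT16_MAX;
  if (diff < INT16_MIN) return INT16_MIN;
  return (int16_t)diff;
}
```

With saturation, an out-of-range intermediate value is pinned to the
nearest representable value instead of flipping sign, which keeps the
error bounded.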

7 years agoMerge "vp9: Enable row multithreading for SVC in real-time mode."
Jerome Jiang [Tue, 14 Mar 2017 23:29:46 +0000 (23:29 +0000)]
Merge "vp9: Enable row multithreading for SVC in real-time mode."

7 years agovp9: Using source sad for speedup for dynamic resizing.
Jerome Jiang [Mon, 13 Mar 2017 22:27:02 +0000 (15:27 -0700)]
vp9: Using source sad for speedup for dynamic resizing.

Only for speed >= 7.

Change-Id: I3ac85fbb4023cf7e6f8333806b345b0174382a09

7 years agoImprove idct32x32_1024_add SSSE3 intrinsics performance
Yi Luo [Mon, 6 Mar 2017 23:11:49 +0000 (15:11 -0800)]
Improve idct32x32_1024_add SSSE3 intrinsics performance

- Function level speed improves ~12%.

Change-Id: I9b7dbddabf08c7d0f6b25264e6074d5ccbe39290

7 years agoMerge "vp9/encoder: fix segfault on win32 using vs < 2015"
James Zern [Tue, 14 Mar 2017 19:21:42 +0000 (19:21 +0000)]
Merge "vp9/encoder: fix segfault on win32 using vs < 2015"

7 years agoMerge "Apply machine learning-based early termination in VP9 partition search"
Yunqing Wang [Tue, 14 Mar 2017 18:07:05 +0000 (18:07 +0000)]
Merge "Apply machine learning-based early termination in VP9 partition search"

7 years agoMerge "vp9: Speed >= 8: Enable simple_block_yrd speed feature."
Marco Paniconi [Tue, 14 Mar 2017 17:50:17 +0000 (17:50 +0000)]
Merge "vp9: Speed >= 8: Enable simple_block_yrd speed feature."

7 years agovp9: Adjust copy partition threshold, for speed 8.
Marco [Tue, 14 Mar 2017 16:17:06 +0000 (09:17 -0700)]
vp9: Adjust copy partition threshold, for speed 8.

Reduce it from 5 to 4, small/no change in metrics or speed.
Small reduction in dragging artifact near moving head.

Change-Id: Ic3bc5ca67c70bf0c89fc2ed14454840a28ae5b6a

7 years agovp9: Speed >= 8: Enable simple_block_yrd speed feature.
Marco [Mon, 13 Mar 2017 05:38:52 +0000 (22:38 -0700)]
vp9: Speed >= 8: Enable simple_block_yrd speed feature.

Enable speed feature for resolutions > VGA.
avgPSNR on RTC down by ~1.7%.
Speedup on ARM: ~5%.

Change-Id: I7a3fe5f7425aa8df3f4a2eced1afa355bc0d4c95

7 years agotest: add vp9_temporal_filter_apply test
Johann [Mon, 13 Mar 2017 21:54:38 +0000 (14:54 -0700)]
test: add vp9_temporal_filter_apply test

Add an independent implementation of the filter.

BUG=webm:1379

Change-Id: I309c459b493c3011273b78b127a786bb23c59f9c

7 years agoMerge "vp9: Fix to source_sad feature for SVC."
Marco Paniconi [Mon, 13 Mar 2017 19:18:30 +0000 (19:18 +0000)]
Merge "vp9: Fix to source_sad feature for SVC."

7 years agoMerge "Add vpx_highbd_idct32x32_135_add_c()"
Linfeng Zhang [Mon, 13 Mar 2017 18:49:01 +0000 (18:49 +0000)]
Merge "Add vpx_highbd_idct32x32_135_add_c()"

7 years agovp9: Fix to source_sad feature for SVC.
Marco [Wed, 8 Mar 2017 18:57:48 +0000 (10:57 -0800)]
vp9: Fix to source_sad feature for SVC.

Allow speed feature sf->use_source_sad to be used
on highest spatial layer for SVC.

Change-Id: I260eb0478902764f49f83e43b17024fe86ff3b22

7 years agoApply machine learning-based early termination in VP9 partition search
Yunqing Wang [Mon, 27 Feb 2017 22:26:15 +0000 (14:26 -0800)]
Apply machine learning-based early termination in VP9 partition search

This patch was based on Yang Xian's intern project code. Further modifications
were done.
1. Moved machine-learning related parameters into the context structure.
2. Corrected the calculation of sum_eobs.
3. Removed unused parameters and calculations.
4. Made it work with multiple tiles.
5. Added a speed feature for the machine-learning based partition search
early termination.
6. Re-organized the code.

The patch was rebased to the top-of-tree.

Borg test BDRATE result:
4k set:     PSNR: +0.144%; SSIM: +0.043%;
hdres set:  PSNR: +0.149%; SSIM: +0.269%;
midres set: PSNR: +0.127%; SSIM: +0.257%;

Average speed gain result:
4k clips: 22%;
hd clips: 23%;
midres clips: 15%.

Change-Id: I0220e93a8277e6a7ea4b2c34b605966e3b1584ac

7 years agoMerge "vp9: Fix condition for intra search in non-rd pickmode."
Marco Paniconi [Mon, 13 Mar 2017 06:11:12 +0000 (06:11 +0000)]
Merge "vp9: Fix condition for intra search in non-rd pickmode."

7 years agovp9: Fix condition for intra search in non-rd pickmode.
Marco [Sat, 11 Mar 2017 06:50:43 +0000 (22:50 -0800)]
vp9: Fix condition for intra search in non-rd pickmode.

Fixes an issue when neither LAST nor golden is used as a reference,
in which case it's possible no encoding mode is set (since intra may be
skipped under certain conditions). Fix is to make sure intra is searched
if no inter mode is checked.

Issue can happen for temporal layer pattern#7 in vpx_temporal_svc_encoder.c

Change-Id: I5ab4999b2f9dbd739044888e0916b5ec491d966b
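
The rule described above can be sketched in C as follows. This is a
simplified illustration only: the function and both parameters are
hypothetical names, not identifiers from the encoder.

```c
#include <assert.h>

/* Returns 1 when intra modes should be searched for the current block.
 * inter_modes_checked: number of inter modes actually evaluated
 *                      (hypothetical counter).
 * skip_intra_hint:     non-zero when a speed heuristic would normally
 *                      skip the intra search (hypothetical flag). */
int should_search_intra(int inter_modes_checked, int skip_intra_hint) {
  assert(inter_modes_checked >= 0);
  /* With no inter candidate evaluated, skipping intra would leave the
   * block without any mode, so the heuristic is overridden. */
  if (inter_modes_checked == 0) return 1;
  return !skip_intra_hint;
}
```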

7 years agoinv_txfm_ssse3,butterfly: fix win32 abi compatibility
James Zern [Fri, 10 Mar 2017 07:29:54 +0000 (23:29 -0800)]
inv_txfm_ssse3,butterfly: fix win32 abi compatibility

only the first 3 parameters can be aligned to 16 as required by __m128i,
make them all pointers for consistency.

since:
07c48ccfe Improve idct32x32_34_add SSSE3 intrinsics performance

BUG=webm:1384

Change-Id: I0324f701e723a27cb470036a180693ba8829d01d

7 years agovp9/encoder: fix segfault on win32 using vs < 2015
James Zern [Fri, 10 Mar 2017 07:36:11 +0000 (23:36 -0800)]
vp9/encoder: fix segfault on win32 using vs < 2015

shift the bsse[] member of the macroblock struct to the front to avoid
an incorrect offset (0) to the upper half of bsse[0] which leads to a
negative resulting in a crash. restrict this to visual studio versions
before 2015 (the bug was observed with 2013, fixed in 2015) to avoid any
potential cache impact on other platforms.

https://connect.microsoft.com/VisualStudio/feedback/details/2396360/bad-structure-offset-in-32-bit-code

BUG=webm:1054

Change-Id: I40f68a1d421ccc503cc712192263bab4f7dde076

7 years agoMerge "vp9: Sample encoder vpx_temporal_svc_encoder: enable row-mt"
Marco Paniconi [Fri, 10 Mar 2017 18:26:06 +0000 (18:26 +0000)]
Merge "vp9: Sample encoder vpx_temporal_svc_encoder: enable row-mt"

7 years agovp9: Sample encoder vpx_temporal_svc_encoder: enable row-mt
Marco [Fri, 10 Mar 2017 16:46:23 +0000 (08:46 -0800)]
vp9: Sample encoder vpx_temporal_svc_encoder: enable row-mt

Enable row-mt in the sample encoder vpx_temporal_svc_encoder.c,
under certain conditions.

Change-Id: Ic103ee81a9d80be5bf6e5778cc21fc3199db909d

7 years agoMerge "Improve idct32x32_135_add SSSE3 intrinsics performance"
Yi Luo [Fri, 10 Mar 2017 17:14:30 +0000 (17:14 +0000)]
Merge "Improve idct32x32_135_add SSSE3 intrinsics performance"

7 years agovp9: Enable row multithreading for SVC in real-time mode.
Marco [Tue, 7 Mar 2017 22:32:30 +0000 (14:32 -0800)]
vp9: Enable row multithreading for SVC in real-time mode.

Enable row-mt for SVC for real-time mode, speed >=5.

Add the controls to the sample encoders, but keep it off for now.
Add the control and enable it for the 1 pass CBR unittests.

For speed 7, 3 layer SVC, 2 threads, row-mt enabled gives about 5% speedup.

Change-Id: Ie8e77323c17263e3e7a7b9858aec12a3a93ec0c1

7 years agoImprove idct32x32_135_add SSSE3 intrinsics performance
Yi Luo [Fri, 3 Mar 2017 00:52:41 +0000 (16:52 -0800)]
Improve idct32x32_135_add SSSE3 intrinsics performance

- Split the inv txfm into three parts to avoid stack spillover.
- Function level speed improves ~12%.
- Use function and macro to remove some repeated code.

Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee

7 years agoMerge "ppc: include ppc.h for ppc_simd_caps()"
Johann Koenig [Thu, 9 Mar 2017 23:12:36 +0000 (23:12 +0000)]
Merge "ppc: include ppc.h for ppc_simd_caps()"

7 years agoMerge "move vp9_scale_and_extend_frame_c to vp9_frame_scale.c"
James Zern [Thu, 9 Mar 2017 22:51:08 +0000 (22:51 +0000)]
Merge "move vp9_scale_and_extend_frame_c to vp9_frame_scale.c"

7 years agoRemove ppc-linux-gcc target
Johann [Thu, 9 Mar 2017 19:33:33 +0000 (11:33 -0800)]
Remove ppc-linux-gcc target

Change-Id: Iec2430966f54e2e5ba79f6bb703f47adde46479f

7 years agoppc: include ppc.h for ppc_simd_caps()
Johann [Thu, 9 Mar 2017 17:26:45 +0000 (09:26 -0800)]
ppc: include ppc.h for ppc_simd_caps()

Change-Id: Idc829eb066cf4e905d062cb9c08424e0f1b7e1a7

7 years agomove vp9_scale_and_extend_frame_c to vp9_frame_scale.c
James Zern [Thu, 9 Mar 2017 04:42:35 +0000 (20:42 -0800)]
move vp9_scale_and_extend_frame_c to vp9_frame_scale.c

this is similar to the x86 configuration and helps mitigate an issue
with a circular dependency between this function and the ssse3 variant
causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes

BUG=chromium:697956

Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
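
The byte count quoted above can be checked with a short C sketch,
assuming the section figures are hexadecimal sizes in bytes. The helper
names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Size reduction of one section, given its size before and after. */
uint32_t section_delta(uint32_t before, uint32_t after) {
  assert(before >= after);
  return before - after;
}

/* Total reduction over the two sections quoted in the commit message. */
uint32_t total_reduction(void) {
  return section_delta(0x255B000u, 0x252B000u) /* .text: 0x30000 bytes */
         + section_delta(0x7B000u, 0x75000u);  /* .data: 0x6000 bytes */
}
```

The two deltas are 196608 and 24576 bytes, which sum to the 221184
bytes reported in the message.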

7 years agoMerge "vp9: Enable two speed features for SVC real-time mode."
Marco Paniconi [Thu, 9 Mar 2017 03:58:14 +0000 (03:58 +0000)]
Merge "vp9: Enable two speed features for SVC real-time mode."

7 years agovp9: Enable two speed features for SVC real-time mode.
Marco [Thu, 9 Mar 2017 00:10:45 +0000 (16:10 -0800)]
vp9: Enable two speed features for SVC real-time mode.

Enable short_circuit_low_temp_var and limit_newmv_early_exit
for SVC, 1 pass CBR mode.

Change-Id: I77df2b2c6cc40657bb8ea76e19dfc2fdaad6389e

7 years agovp9: Add control to vpx_temporal_svc_encoder for row-mt.
Marco [Thu, 9 Mar 2017 00:01:58 +0000 (16:01 -0800)]
vp9: Add control to vpx_temporal_svc_encoder for row-mt.

Keep it off as default for now.

Change-Id: Ia2518a8ce96c9735c3fe67215dde25a35e8620af

7 years agoMerge "Shift speed 2 from non-large VP9 tests to large ones."
Jerome Jiang [Wed, 8 Mar 2017 23:14:27 +0000 (23:14 +0000)]
Merge "Shift speed 2 from non-large VP9 tests to large ones."

7 years agoMerge "Add support for POWER8/VSX"
Johann Koenig [Wed, 8 Mar 2017 22:38:21 +0000 (22:38 +0000)]
Merge "Add support for POWER8/VSX"

7 years agoMerge "Make the partition search early termination feature to be frame size dependent"
Yunqing Wang [Wed, 8 Mar 2017 22:31:30 +0000 (22:31 +0000)]
Merge "Make the partition search early termination feature to be frame size dependent"

7 years agoMake the partition search early termination feature to be frame size dependent
Yunqing Wang [Wed, 8 Mar 2017 20:24:15 +0000 (12:24 -0800)]
Make the partition search early termination feature to be frame size dependent

The 2 thresholds (i.e. partition_search_breakout_dist_thr and
partition_search_breakout_rate_thr) are used by the partition search
early termination speed feature. This refactoring patch made this
feature frame size dependent consistently throughout the code.

Change-Id: Idaa0bd8400badaa0f8e2091e3f41ed2544e71be9

7 years agoUpdate vpx_idct32x32_1024_add_neon()
Linfeng Zhang [Tue, 7 Mar 2017 21:06:06 +0000 (13:06 -0800)]
Update vpx_idct32x32_1024_add_neon()

Most are cosmetic changes.
Speed is unchanged with clang 3.8, and about 5% faster with gcc 4.8.4.

Tried the strategy used in 8x8 and 16x16 (whose order of operations is
similar to the C code); though speed gets better with gcc, it's worse
with clang.

Tried to remove store_in_output(), but speed gets worse.

Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e

7 years agoAdd support for POWER8/VSX
Rafael de Lucena Valle [Thu, 20 Oct 2016 00:21:09 +0000 (22:21 -0200)]
Add support for POWER8/VSX

Add ppc, ppc64 and ppc64le on all_platforms and ARCH_LIST

Add VSX flags and check for -mvsx

Define empty setup_rtcd_internal

Add Altivec detection based on:
http://freevec.org/function/altivec_runtime_detection_linux

Detect VSX at runtime when enabled

Change-Id: I304f4d8c5fee0ff19b6483cd2e9cc50d6ddec472
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>

7 years agoAdd vpx_highbd_idct32x32_135_add_c()
Linfeng Zhang [Wed, 8 Mar 2017 18:46:33 +0000 (10:46 -0800)]
Add vpx_highbd_idct32x32_135_add_c()

When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.

BUG=webm:1301

Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
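
A rough C sketch of dispatching on eob is shown below. The thresholds
34, 135 and 1024 come from the function names in this log; the enum,
the selector function and the separate eob == 1 case are assumptions
made for illustration, not the library's actual code.

```c
#include <assert.h>

/* Hypothetical labels for the 32x32 inverse transform variants. */
typedef enum {
  IDCT32_1,   /* assumed DC-only case */
  IDCT32_34,  /* at most 34 coefficients */
  IDCT32_135, /* at most 135 coefficients */
  IDCT32_1024 /* full 32x32 block */
} Idct32Variant;

/* Pick the cheapest variant that still covers every non-zero
 * coefficient. eob is the end-of-block position, assumed to be
 * between 1 and 1024 for a 32x32 block. */
Idct32Variant select_idct32_variant(int eob) {
  assert(eob >= 1 && eob <= 1024);
  if (eob == 1) return IDCT32_1;
  if (eob <= 34) return IDCT32_34;
  if (eob <= 135) return IDCT32_135;
  return IDCT32_1024;
}
```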

7 years agoMerge "vp9: Fix for denoising with SVC."
Marco Paniconi [Wed, 8 Mar 2017 18:26:11 +0000 (18:26 +0000)]
Merge "vp9: Fix for denoising with SVC."

7 years agovp9: Fix for denoising with SVC.
Marco [Wed, 8 Mar 2017 01:35:45 +0000 (17:35 -0800)]
vp9: Fix for denoising with SVC.

Fix the condition for getting last_source when denoising is on.
This avoids unneeded scaling in the case of SVC.

No change in quality.

Change-Id: I32c1c2c9085104da51af8535716bcc4d55fb0f42

7 years agocosmetics,dsp/arm/: vpx_idct32x32_{34,135}_add_neon()
Linfeng Zhang [Tue, 7 Mar 2017 23:29:15 +0000 (15:29 -0800)]
cosmetics,dsp/arm/: vpx_idct32x32_{34,135}_add_neon()

No speed changes and disassembly is almost identical.

Change-Id: Id07996237d2607ca6004da5906b7d288b8307e1f

7 years agocosmetics,dsp/arm/: rename a variable
Linfeng Zhang [Wed, 1 Mar 2017 23:11:46 +0000 (15:11 -0800)]
cosmetics,dsp/arm/: rename a variable

Rename cospi_6_26_14_18N to cospi_6_26N_14_18N for consistency.

Change-Id: I00498b43bb612b368219a489b3adaa41729bf31a

7 years agoShift speed 2 from non-large VP9 tests to large ones.
Jerome Jiang [Tue, 7 Mar 2017 21:58:11 +0000 (13:58 -0800)]
Shift speed 2 from non-large VP9 tests to large ones.

This may fix the timeout failure of the nightly valgrind tests,
since more row-mt coverage was added.

Change-Id: Id9414e66d1a266602c7495243d9f5cb69e17ccdc

7 years agoMerge "tiny_ssim.c : adds y4m support to tiny_ssim."
James Bankoski [Tue, 7 Mar 2017 18:49:13 +0000 (18:49 +0000)]
Merge "tiny_ssim.c : adds y4m support to tiny_ssim."

7 years agotiny_ssim.c : adds y4m support to tiny_ssim.
Jim Bankoski [Thu, 9 Feb 2017 22:12:55 +0000 (14:12 -0800)]
tiny_ssim.c : adds y4m support to tiny_ssim.

Change-Id: I7a13b7e3a1e11ddbe4be3009edf03528e1bc7647

7 years agoMerge "vp8_create_decoder_instances: correct pbi[] memset"
James Zern [Sat, 4 Mar 2017 00:47:17 +0000 (00:47 +0000)]
Merge "vp8_create_decoder_instances: correct pbi[] memset"

7 years agoMerge "Narrow cat6_high_cost tables to uint16_t"
Alex Converse [Fri, 3 Mar 2017 23:45:39 +0000 (23:45 +0000)]
Merge "Narrow cat6_high_cost tables to uint16_t"

7 years agovp8_create_decoder_instances: correct pbi[] memset
James Zern [Fri, 3 Mar 2017 23:23:32 +0000 (15:23 -0800)]
vp8_create_decoder_instances: correct pbi[] memset

clear the entire array on error. the size used previously was equal to
the number of elements.

BUG=webm:1364

Change-Id: I2f2e16ed6e867f41d4774a5a8ac9cedaee11ce46
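
The kind of bug described above can be sketched in C as follows. The
struct, the array length and the function names are hypothetical
stand-ins, not the decoder's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INSTANCES 4 /* hypothetical array length */

typedef struct Decoder Decoder; /* opaque, hypothetical decoder type */

typedef struct {
  Decoder *pbi[MAX_INSTANCES];
} DecoderSet;

/* Buggy form: the length argument is the element count, so only
 * MAX_INSTANCES bytes are cleared rather than the whole array. */
void clear_instances_buggy(DecoderSet *set) {
  assert(set != NULL);
  memset(set->pbi, 0, MAX_INSTANCES);
}

/* Fixed form: sizeof on the array yields its full size in bytes. */
void clear_instances_fixed(DecoderSet *set) {
  assert(set != NULL);
  memset(set->pbi, 0, sizeof(set->pbi));
}
```

Because memset() counts bytes, passing the element count clears only
part of an array of pointers and leaves the later entries untouched.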

7 years agoNarrow cat6_high_cost tables to uint16_t
Alex Converse [Fri, 3 Mar 2017 23:02:56 +0000 (15:02 -0800)]
Narrow cat6_high_cost tables to uint16_t

Saves 2688 bytes of rodata.

Change-Id: I46633b6e50c2845181c70fff6273a8e58fdd1e56
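
The saving above can be reproduced with a small C sketch, assuming the
tables previously used 4-byte entries. Under that assumption each entry
saves 2 bytes, so 2688 bytes corresponds to 1344 entries in total; the
helper name is made up for this example and the table sizes are not
given in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes saved when n_entries table entries shrink from old_size to
 * new_size bytes each. */
size_t bytes_saved(size_t n_entries, size_t old_size, size_t new_size) {
  assert(old_size >= new_size);
  return n_entries * (old_size - new_size);
}
```

For example, bytes_saved(1344, sizeof(int32_t), sizeof(uint16_t))
yields 2688.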

7 years agoMerge "vp9,realtime: Enable row multithreading for non-rd"
Vignesh Venkatasubramanian [Fri, 3 Mar 2017 19:05:52 +0000 (19:05 +0000)]
Merge "vp9,realtime: Enable row multithreading for non-rd"

7 years agoMerge "vp9: Speed 8: reduce the adaptive_rd_thresh level."
Marco Paniconi [Thu, 2 Mar 2017 22:25:03 +0000 (22:25 +0000)]
Merge "vp9: Speed 8: reduce the adaptive_rd_thresh level."

7 years agovp9: Speed 8: reduce the adaptive_rd_thresh level.
Marco [Thu, 2 Mar 2017 21:01:53 +0000 (13:01 -0800)]
vp9: Speed 8: reduce the adaptive_rd_thresh level.

Reduce the level from 4 to 2.
This gives ~1-2% quality gain on RTC set, with small decrease in speed (~1-2% on mac).

Change-Id: I7d959731badcee3d45b2f4a08efe378765016a13

7 years agovp9,realtime: Enable row multithreading for non-rd
Vignesh Venkatasubramanian [Mon, 13 Feb 2017 19:36:02 +0000 (11:36 -0800)]
vp9,realtime: Enable row multithreading for non-rd

Enable row level multithreading for realtime encodes where non-rd
path is used (speed >= 5).

Change-Id: I5439cb49a02171166d8e1de06c7d5e6f8e819a41