granicus.if.org Git - libvpx/log
Kyle Siefring [Mon, 1 May 2017 16:19:11 +0000 (09:19 -0700)]
block error avx2: sum in 32 bits when possible
Add 31bit pairs before unpacking in x86 block error code
AVX2 code provides a very minor performance improvement.
BUG=webm:1210
Change-Id: I4c82308eaf65741dca2f5c6db9be9c85f905073a
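The idea of summing in 32 bits before widening can be illustrated in scalar C. This is only a sketch under stated assumptions, not the AVX2 code from the commit: the function name `block_error_pairwise` is hypothetical, and it assumes every coefficient difference has magnitude at most 32767, so each square stays below 2^30 and the sum of two squares fits in 31 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar sketch; not the libvpx AVX2 routine. */
int64_t block_error_pairwise(const int16_t *coeff, const int16_t *dqcoeff,
                             int n) {
  int64_t total = 0;
  int i;
  assert(n % 2 == 0); /* the sketch processes coefficients two at a time */
  for (i = 0; i < n; i += 2) {
    const int32_t d0 = coeff[i] - dqcoeff[i];
    const int32_t d1 = coeff[i + 1] - dqcoeff[i + 1];
    /* Assumption: |d0|, |d1| <= 32767, so each square is below 2^30 and
     * the pair sum fits in 31 bits. Only then is it widened to 64 bits. */
    const uint32_t pair = (uint32_t)(d0 * d0) + (uint32_t)(d1 * d1);
    total += pair;
  }
  return total;
}
```

Adding the pair first halves the number of widening operations, which is the kind of saving the commit message describes for the vector code.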
Johann [Mon, 1 May 2017 16:12:25 +0000 (09:12 -0700)]
move vp9_error_intrin_avx2.c
There is only one avx2 implementation. Drop '_intrin'
Change-Id: I887a0d27d58567eaad49f749f127eca61313f312
Luca Barbato [Wed, 26 Apr 2017 16:31:11 +0000 (16:31 +0000)]
ppc: Add convolve_avg
Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014
Luca Barbato [Wed, 26 Apr 2017 15:43:19 +0000 (15:43 +0000)]
ppc: Add convolve_copy
Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405
Johann Koenig [Fri, 28 Apr 2017 18:32:08 +0000 (18:32 +0000)]
Merge "Use uint32_t for accumulator"
Jerome Jiang [Fri, 28 Apr 2017 18:10:35 +0000 (18:10 +0000)]
Merge "vp9: Fix condition for disabling adaptive_rd_thresh."
Jerome Jiang [Thu, 27 Apr 2017 19:56:52 +0000 (12:56 -0700)]
vp9: Fix condition for disabling adaptive_rd_thresh.
Add speed constraints for disabling adaptive_rd_thresh when
row_mt_bit_exact is set.
Change-Id: I2445115c2f9a2e46b8a0966031a0fea488d4964e
Jerome Jiang [Fri, 28 Apr 2017 15:45:52 +0000 (15:45 +0000)]
Merge "Generalize vp9 sse2 denoiser test for other platforms."
Johann [Fri, 28 Apr 2017 13:34:21 +0000 (06:34 -0700)]
Use uint32_t for accumulator
Be specific about the data type size.
Use convenience macro vp9_zero_array.
Change-Id: I5fadf7dbd408befb73820d85db0be4832e8cfcbd
Johann Koenig [Fri, 28 Apr 2017 13:22:40 +0000 (13:22 +0000)]
Merge "vp9 temporal filter: sse4 implementation"
Jerome Jiang [Thu, 27 Apr 2017 22:56:39 +0000 (15:56 -0700)]
Generalize vp9 sse2 denoiser test for other platforms.
Renamed to vp9_denoiser_test.
Change-Id: I0d8f4c94bcb81a60949a13d9fe839cee95d03f77
Yaowu Xu [Fri, 28 Apr 2017 00:16:55 +0000 (00:16 +0000)]
Merge "VP9: enable trellis for high bitdepth intra"
James Zern [Thu, 27 Apr 2017 21:47:09 +0000 (21:47 +0000)]
Merge "webm_read_frame: avoid NULL dereference"
Johann [Wed, 15 Mar 2017 17:40:58 +0000 (10:40 -0700)]
vp9 temporal filter: sse4 implementation
Approximates division using multiply and shift.
Speeds up both sizes (8x8 and 16x16) by 30 times.
Fix the call sites to use the RTCD function.
Delete sse2 and mips implementation. They were based on a previous
implementation of the filter. It was changed in Dec 2015:
ece4fd5d2247c9512b31a93dd593de567beaf928
BUG=webm:1378
Change-Id: I0818e767a802966520b5c6e7999584ad13159276
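Approximating a division with a multiply and a shift can be sketched as below. This is a generic illustration, not the SSE4 code from the commit: the helper names `reciprocal_q16` and `approx_div` and the 16-bit shift are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_SHIFT 16 /* assumed fixed-point precision for this sketch */

/* Hypothetical helper: reciprocal of divisor scaled by 2^16, rounded up. */
uint32_t reciprocal_q16(uint32_t divisor) {
  assert(divisor > 0);
  return ((1u << DIV_SHIFT) + divisor - 1) / divisor;
}

/* Hypothetical helper: approximate sum / divisor using the precomputed
 * reciprocal, replacing the division with a multiply and a right shift. */
uint32_t approx_div(uint32_t sum, uint32_t recip) {
  return (uint32_t)(((uint64_t)sum * recip) >> DIV_SHIFT);
}
```

The reciprocal is computed once per divisor (or read from a table), so the per-pixel work is a multiply and a shift, both of which map well onto SIMD instructions. The result is an approximation and can differ from true division for large inputs.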
Jerome Jiang [Wed, 26 Apr 2017 18:12:21 +0000 (11:12 -0700)]
vp9: Don't force disabling of adaptive_rd_thresh for realtime.
Don't force disabling of adaptive_rd_thresh for realtime when
row_mt_bit_exact is set.
Row based adaptive rd is made usable in CL 454882
(https://chromium-review.googlesource.com/c/454882) for REALTIME.
Change-Id: Ief023414f0fd6eb86f299dd46ae58f4436875af5
Yunqing Wang [Wed, 26 Apr 2017 16:12:14 +0000 (16:12 +0000)]
Merge "Make the row based multi-threaded encoder deterministic"
Linfeng Zhang [Wed, 26 Apr 2017 15:50:45 +0000 (15:50 +0000)]
Merge "Update highbd convolve functions arguments to use uint16_t src/dst"
Marco Paniconi [Wed, 26 Apr 2017 15:45:05 +0000 (15:45 +0000)]
Merge "vp9: SVC: Adjust some speed settings for temporal layers."
Peter de Rivaz [Wed, 26 Apr 2017 10:40:58 +0000 (11:40 +0100)]
VP9: enable trellis for high bitdepth intra
BUG=webm:1409
Change-Id: I5236595aac1c09386c60ffe8ad621e01422ed5a7
Jerome Jiang [Wed, 26 Apr 2017 00:09:29 +0000 (00:09 +0000)]
Merge "Fix the decoder seg fault when frame is corrupted."
Jerome Jiang [Wed, 26 Apr 2017 00:09:21 +0000 (00:09 +0000)]
Merge "vp9: speed >= 8: Skip uv variance in model_rd_sb_y_large"
Marco [Tue, 25 Apr 2017 23:11:19 +0000 (16:11 -0700)]
vp9: SVC: Adjust some speed settings for temporal layers.
Make some speed setting changes for temporal enhancement layers,
and remove the switch in subpel_force_stop for the aggressive_base_mv
in non-rd pickmode.
Gain some 2-3% speed with little/negligible quality loss.
Change-Id: I3e2a7f80ff45f38c0a6ceb01b34dbca2f53edbf0
Jerome Jiang [Fri, 21 Apr 2017 17:10:05 +0000 (10:10 -0700)]
vp9: speed >= 8: Skip uv variance in model_rd_sb_y_large
For speed >= 8 and color_sensitivity not set, skip the transform
skipping test in UV planes.
Add a new condition to check noise level to skip chroma check
for speed >= 8 if y_sad is high.
1~2% speedup on ARM for speed 8.
Borg tests show neutral results in both rtc and rtc_derf.
Change-Id: Idecd3ff6e28c97757a43bb6f3a7082c85f72109c
Linfeng Zhang [Wed, 19 Apr 2017 20:08:25 +0000 (13:08 -0700)]
Update highbd convolve functions arguments to use uint16_t src/dst
BUG=webm:1388
Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42
James Zern [Sat, 22 Apr 2017 20:11:16 +0000 (13:11 -0700)]
webm_read_frame: avoid NULL dereference
block may be NULL with block_entry_eos or from return of GetBlock()
Change-Id: Ia0dd3ffa46305ee70efcdc55c05c2ad24efc993b
Marco [Fri, 14 Apr 2017 18:32:19 +0000 (11:32 -0700)]
vp9: Reduce artifact in non-rd pickmode for lighting changes.
Add a low-variance high-sumdiff to the superblock content state
and use it to limit the mv and bias some decisions in non-rd pickmode.
Only affects speed >= 6.
Reduces artifact for lighting changes.
Small/no difference in metrics on RTC set.
Change-Id: Ic84b2379fe0ae3fa71ae826ee6bae3eaf551a25b
Yunqing Wang [Mon, 24 Apr 2017 19:06:49 +0000 (12:06 -0700)]
Make the row based multi-threaded encoder deterministic
This patch followed allow_exhaustive_searches feature modification and
continued to modify the encoder to achieve the determinism in the row
based multi-threaded encoding. While row-mt = 1 and using multiple
threads, the adaptive feature in encoder was disabled, which gave
BDRate gain (at speed 1, -0.6% ~ -0.7%; at speed 2, -0.46% ~ -0.59%),
but some encoder speed losses (7% ~ 10% at speed 1 and 3% ~ 6% at
speed 2). These speed losses were acceptable considering the speed
gains obtained from row-mt.
Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb
Yunqing Wang [Mon, 24 Apr 2017 17:41:10 +0000 (17:41 +0000)]
Merge "Make allow_exhaustive_searches feature no longer adaptive"
Marco Paniconi [Fri, 21 Apr 2017 21:28:16 +0000 (21:28 +0000)]
Merge "vp9: SVC: fix condition for partition/skip threshold when denoising."
Yunqing Wang [Thu, 20 Apr 2017 00:00:08 +0000 (17:00 -0700)]
Make allow_exhaustive_searches feature no longer adaptive
A previous patch turned on allow_exhaustive_searches feature only for
FC_GRAPHICS_ANIMATION content. This patch further modified the feature
by removing the exhaustive search limit, and made it no longer adaptive.
As a result, the 2 counts that recorded the number of motion searches
were removed, which helped achieve the determinism in the row based
multi-threading encoding. Tests showed that this patch didn't make
the encoder much slower.
Used exhaustive_searches_thresh for this speed feature, and removed
allow_exhaustive_searches. Also, refactored the speed feature code
to follow the general speed feature setting style.
Change-Id: Ib96b182c4c8dfff4c1ab91d2497cc42bb9e5a4aa
Jerome Jiang [Fri, 21 Apr 2017 00:51:46 +0000 (00:51 +0000)]
Merge "vp9: Non-rd pickmode: Avoid computation duplication."
Marco [Thu, 20 Apr 2017 23:32:46 +0000 (16:32 -0700)]
vp9: SVC: fix condition for partition/skip threshold when denoising.
The more aggressive settings should only be used when denoise_svc
condition is satisfied (which means top spatial layer).
Change-Id: Ia8e3515b27f31bf21b1976ca80a2fa826daece3a
Jerome Jiang [Thu, 20 Apr 2017 17:57:02 +0000 (10:57 -0700)]
vp9: Non-rd pickmode: Avoid computation duplication.
In non-rd pickmode (speed >= 5), avoid duplication of computations in
model_rd_for_sb_y when the speed feature use_simple_block_yrd is
enabled (or for high bitdepth build under certain conditions).
QVGA, VGA and HD have 1.23%, 2.68% and 1.7% speedup on ARM for speed 8,
respectively.
Encoding results are bitexact for speed >= 5.
Change-Id: I3f9130810c21439f5ad7e159e21cb2243dcd05f1
Jerome Jiang [Thu, 20 Apr 2017 21:48:22 +0000 (14:48 -0700)]
Fix the decoder seg fault when frame is corrupted.
BUG=webm:1399
Change-Id: I1e006e0260d9b56a4d2273659ca19b86c69c474b
Marco [Thu, 20 Apr 2017 21:13:57 +0000 (14:13 -0700)]
vp9: 1 pass SVC: Fix comment and condition for up-sampling reference.
No change in behavior.
Change-Id: I218fb30289091da623acb23324027435b8510d0e
Yunqing Wang [Thu, 20 Apr 2017 19:57:46 +0000 (19:57 +0000)]
Merge "Only allow allow_exhaustive_searches for FC_GRAPHICS_ANIMATION content"
Marco Paniconi [Thu, 20 Apr 2017 19:53:19 +0000 (19:53 +0000)]
Merge "vp9: Re-enable SVC datarate tests."
Marco [Wed, 19 Apr 2017 18:12:42 +0000 (11:12 -0700)]
vp9: Re-enable SVC datarate tests.
Re-enable the SVC tests, wrap the non-zero expectation
in GetMismatchFrames around #if CONFIG_VP9_DECODER.
Change-Id: I0e8a2d78b868c32f18fe597540f397d3a1b303b5
Marco [Thu, 20 Apr 2017 16:50:16 +0000 (09:50 -0700)]
vp9: SVC: Redefine the source downsample filter choice.
Rename the source downsampling filter, and define it
per spatial layer. Used for 1 pass CBR SVC.
Change-Id: I8135f2ab89c535c53429b9c58b586f746bb668c7
Luca Barbato [Tue, 18 Apr 2017 23:37:57 +0000 (23:37 +0000)]
ppc: Add the intra predictor tests
Change-Id: Idea15b916044ab3d8e74519337880a484ecfd87e
Luca Barbato [Tue, 18 Apr 2017 22:55:53 +0000 (22:55 +0000)]
ppc: h predictor 8x8
Slightly faster with the current compiler.
Change-Id: Iae225fac08395eb430c97a2abec69c60f5cf5c47
Luca Barbato [Tue, 11 Apr 2017 23:18:35 +0000 (01:18 +0200)]
ppc: d63 predictor 8x8
10x faster.
Change-Id: I7cedbf4df2ce7df5b6f1108b11815d088fdb9ba8
Luca Barbato [Sun, 9 Apr 2017 15:07:03 +0000 (15:07 +0000)]
ppc: tm predictor 4x4
Slightly faster.
Change-Id: I0ca43f309b3d9b50435d69bd5be64b53a99bd191
Luca Barbato [Sun, 9 Apr 2017 13:44:41 +0000 (13:44 +0000)]
ppc: h predictor 4x4
2x faster.
Change-Id: I0583dec353299c6797401b646099f18db4e0420d
Luca Barbato [Sun, 9 Apr 2017 13:05:09 +0000 (13:05 +0000)]
ppc: dc predictor 8x8
Slightly faster, the other dc predictors cannot be faster since
the computation speedup is overwhelmed by the time spent reading
dst to write just the 8x8 part.
Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66
Luca Barbato [Sun, 9 Apr 2017 11:07:22 +0000 (11:07 +0000)]
ppc: d45 predictor 8x8
11x faster.
Change-Id: I5b8f39213ee1f5260724fc254e3fb5c462435798
Luca Barbato [Sun, 9 Apr 2017 00:09:56 +0000 (00:09 +0000)]
ppc: d63 predictor 32x32
About 10x faster.
Change-Id: If7d0645f75c5d7deb9751edd0bf47e2f9068e9e7
Luca Barbato [Sun, 9 Apr 2017 00:09:56 +0000 (00:09 +0000)]
ppc: d63 predictor 16x16
About 18x faster.
Change-Id: Id043bf76c011e03e992085bb5e20f330d3e98cd4
Luca Barbato [Sat, 8 Apr 2017 22:41:41 +0000 (22:41 +0000)]
ppc: d45 predictor 32x32
About 12x faster.
Change-Id: I22c150256aefb4941861ab1f6c17d554fb694bed
Luca Barbato [Sat, 8 Apr 2017 22:41:41 +0000 (22:41 +0000)]
ppc: d45 predictor 16x16
About 16x faster.
Change-Id: Ie5469fb32d5fd11bb6cb06318cea475d8a5b00b9
Luca Barbato [Sat, 8 Apr 2017 02:55:33 +0000 (02:55 +0000)]
ppc: dc predictor 32x32
10x and 5x faster.
Change-Id: I7913c58c768334d818f541a5e219f1035791eeaf
Luca Barbato [Sat, 8 Apr 2017 02:55:33 +0000 (02:55 +0000)]
ppc: dc top and left predictor 32x32
6x faster.
Change-Id: I717995b4056e5579c68191d11b495372971fe1ae
Luca Barbato [Sat, 8 Apr 2017 02:55:33 +0000 (02:55 +0000)]
ppc: dc top and left predictor 16x16
13x faster.
Change-Id: I1771ac39fda599153f933cb3f0506c9f97a6cbe6
Luca Barbato [Sat, 8 Apr 2017 00:39:24 +0000 (00:39 +0000)]
ppc: dc_128 predictor 32x32
6x faster.
Change-Id: I1da8f51b4262871cb98f0aa03ccda41b0ac2b08b
Luca Barbato [Sat, 8 Apr 2017 00:26:54 +0000 (00:26 +0000)]
ppc: dc_128 predictor 16x16
20x faster.
Change-Id: I05f0deb2d38ae7966eae6b71fbc0aa51880e5709
Luca Barbato [Fri, 7 Apr 2017 14:49:00 +0000 (14:49 +0000)]
ppc: tm predictor 32x32
About 8x faster.
Change-Id: I9bad827ccbdf47ec95406e961c74ac2ff45f80cf
James Zern [Thu, 20 Apr 2017 02:45:44 +0000 (02:45 +0000)]
Merge changes I1f5a3752,I95123051,I3bb724e0,Ie81077fa,Ic80f3c05, ...
* changes:
ppc: tm predictor 16x16
ppc: tm predictor 8x8
ppc: horizontal predictor 32x32
ppc: horizontal predictor 16x16
ppc: vertical intrapred 16x16 and 32x32
configure: Workaround clang not enabling altivec on -mvsx
configure: Match power*64* as ppc64
Yunqing Wang [Wed, 19 Apr 2017 23:32:59 +0000 (16:32 -0700)]
Only allow allow_exhaustive_searches for FC_GRAPHICS_ANIMATION content
The allow_exhaustive_searches feature improves the encoding quality
of FC_GRAPHICS_ANIMATION content a lot. For non-FC_GRAPHICS_ANIMATION
content, the quality test result is almost neutral. This patch restricts
this feature to FC_GRAPHICS_ANIMATION content only.
The motivation of doing that is to make this feature no longer adaptive,
which will be implemented in the following patch.
Change-Id: Ic911df6dd757402b6480789cc247801e99840369
Linfeng Zhang [Wed, 19 Apr 2017 23:55:57 +0000 (23:55 +0000)]
Merge changes I9e18a73b,Ie47c8cd4
* changes:
Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve
Create CAST_TO_BYTEPTR/SHORTPTR
Linfeng Zhang [Thu, 6 Apr 2017 00:54:42 +0000 (17:54 -0700)]
Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve
Replace by CAST_TO_BYTEPTR/SHORTPTR.
The rule is: if a short ptr is cast to a byte ptr, any offset
operation on the byte ptr must be doubled. We do this by casting to
short ptr first, adding offset, then casting back to byte ptr.
BUG=webm:1388
Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248
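The offset rule described above can be sketched as follows. The `SKETCH_` macros and the function `advance_highbd` are hypothetical stand-ins written for this example; the real CAST_TO_BYTEPTR/SHORTPTR definitions live in libvpx and may differ.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the macros named in the commit. */
#define SKETCH_CAST_TO_SHORTPTR(x) ((uint16_t *)(void *)(x))
#define SKETCH_CAST_TO_BYTEPTR(x) ((uint8_t *)(void *)(x))

/* Advance a byte pointer that really refers to 16-bit samples by `offset`
 * samples: cast to short ptr, add the offset, then cast back to byte ptr.
 * The byte address moves by 2 * offset, which is the "doubling" in the rule. */
uint8_t *advance_highbd(uint8_t *src8, int offset) {
  uint16_t *src16 = SKETCH_CAST_TO_SHORTPTR(src8);
  return SKETCH_CAST_TO_BYTEPTR(src16 + offset);
}
```

Doing the arithmetic on the short pointer keeps call sites free of explicit `* 2` factors, which are easy to forget when a function is shared between 8-bit and high bitdepth paths.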
Marco Paniconi [Wed, 19 Apr 2017 15:27:55 +0000 (15:27 +0000)]
Merge "vp9: Add phase to get averaging filter for 1:2 downsampling."
Marco [Wed, 19 Apr 2017 14:59:59 +0000 (07:59 -0700)]
vp9: Fix the disabling of a SVC 3TL datarate test.
Change-Id: Ib42d23ab5ee39ab3c85e1d9a84e36249e59fe74e
Marco [Fri, 14 Apr 2017 00:19:06 +0000 (17:19 -0700)]
vp9: Add phase to get averaging filter for 1:2 downsampling.
The scaling filter with zero shift will give sub-sampling for
2x downsampling. Allow for a phase shift to get an averaging filter.
Usage is for source scaling in 1 pass SVC mode for 1:2 downscale.
Reduces aliasing in downsampled image.
Keep the phase to 0/off for now.
Change-Id: Ic547ea0748d151b675f877527e656407fcf4d51e
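The difference between zero phase and a half-sample phase shift can be shown with a deliberately simplified two-tap filter. This is a sketch only: `downsample_by_2` and its `phase_half` flag are hypothetical, and the actual libvpx scaler uses longer filter kernels.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 1-D 2:1 downsampler.
 * phase_half == 0: keep every even sample (plain sub-sampling).
 * phase_half != 0: average each pair of samples, with rounding. */
void downsample_by_2(const uint8_t *src, int src_len, uint8_t *dst,
                     int phase_half) {
  int i;
  assert(src_len % 2 == 0);
  for (i = 0; i < src_len / 2; ++i) {
    if (phase_half) {
      dst[i] = (uint8_t)((src[2 * i] + src[2 * i + 1] + 1) >> 1);
    } else {
      dst[i] = src[2 * i];
    }
  }
}
```

With an alternating input such as 0, 255, 0, 255, the zero-phase path returns only the zeros, while the averaging path returns mid-gray, which is why the phase shift reduces aliasing in the downsampled image.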
Luca Barbato [Fri, 7 Apr 2017 14:49:00 +0000 (14:49 +0000)]
ppc: tm predictor 16x16
About 10x faster.
Change-Id: I1f5a3752d346459df3b45f92963208bf3e520f06
Luca Barbato [Fri, 7 Apr 2017 14:49:00 +0000 (14:49 +0000)]
ppc: tm predictor 8x8
About 5x faster.
Change-Id: I951230517f49c0dca9ac9eac2efa8916a303b85a
Luca Barbato [Fri, 7 Apr 2017 14:49:00 +0000 (14:49 +0000)]
ppc: horizontal predictor 32x32
About 5x faster.
Change-Id: I3bb724e07baffd901aa2d0f65060ba48882cc9b8
Luca Barbato [Fri, 7 Apr 2017 14:49:00 +0000 (14:49 +0000)]
ppc: horizontal predictor 16x16
About 10x faster.
Change-Id: Ie81077fa32ad214cdb46bdcb0be4e9e2c7df47c2
Luca Barbato [Fri, 7 Apr 2017 13:50:12 +0000 (13:50 +0000)]
ppc: vertical intrapred 16x16 and 32x32
Change-Id: Ic80f3c050cfbe7697e81a311b4edaaa597b85cab
Luca Barbato [Tue, 18 Apr 2017 18:31:18 +0000 (18:31 +0000)]
configure: Workaround clang not enabling altivec on -mvsx
The flag `-mvsx` implies `-maltivec`.
Change-Id: I7544553eba131a533467b387f8bf329d57f5af5c
Luca Barbato [Fri, 7 Apr 2017 13:14:35 +0000 (13:14 +0000)]
configure: Match power*64* as ppc64
Change-Id: Ie640dff50a5db935bb57c5a2570b423ce8946f2c
Linfeng Zhang [Thu, 6 Apr 2017 00:40:12 +0000 (17:40 -0700)]
Create CAST_TO_BYTEPTR/SHORTPTR
They will replace CONVERT_TO_BYTEPTR/SHORTPTR module by module.
BUG=webm:1388
Change-Id: Ie47c8cd4897696481b9cbbf9e2d439dc22dc85ec
Marco [Tue, 18 Apr 2017 16:43:32 +0000 (09:43 -0700)]
vp9: Disable some SVC tests for now.
Disable the 1 pass CBR SVC tests with temporal_layers > 1.
Issue with the commit 863f860, which will cause encoder/decoder
mismatch due to skipping encoder loopfilter for non-reference frames.
Will re-enable the tests when fixed.
Change-Id: I74918a0045a17976b069c4be63fbeb921974df0d
Marco [Mon, 17 Apr 2017 21:36:11 +0000 (14:36 -0700)]
vp9: Add key_frame condition to is_reference check for loopfilter.
This condition is not needed as key_frame should set the refresh
of the reference frames, but good to have for clarity in condition.
Change-Id: Icf9838e7e4f0ff5cf0a9562ae3b5d6c7e6f78702
Johann Koenig [Mon, 17 Apr 2017 22:07:34 +0000 (22:07 +0000)]
Merge "re-enable vpx_comp_avg_pred_sse2"
Marco Paniconi [Mon, 17 Apr 2017 18:00:09 +0000 (18:00 +0000)]
Revert "Revert "vp9: Avoid encoder loopfilter for non-reference frames.""
This reverts commit e9b7f98c56b3b9c99a60eb41b83bf8346b3ad25f.
Reason for revert:
Commit d578bdad fixes the issue (encoder/decoder mismatch
in 3TL datarate test) that caused the original revert.
Original change's description:
> Revert "vp9: Avoid encoder loopfilter for non-reference frames."
>
> This reverts commit 863f860bfcf3bdc26eeecb299aa481d0f63d11ac.
>
> This causes encoder / decoder mismatches in various
> VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers tests
>
> BUG=webm:1408
>
> Change-Id: Ic200c39d7ed9c0b0247ef562f5d6f7b2625f7e14
>
TBR=jzern@google.com,marpan@google.com,builds@webmproject.org,jianj@google.com
BUG=webm:1408
Change-Id: Ifeb81460856d1d56482d4e0477a70ee98f8bfaa6
Marco [Mon, 17 Apr 2017 16:19:03 +0000 (09:19 -0700)]
vp9: Datarate test: modify frame flags for 3 TL.
Modify the frame flags to update the ARF on top layer,
for the tests:
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayersFrameDropping
This is needed to fix the encoder/decoder mismatches caused by 863f860,
and removed in the revert e9b7f98.
Change-Id: I6b9fecfdd17315fc0179e29949338c77636026c0
Johann [Mon, 17 Apr 2017 15:38:02 +0000 (08:38 -0700)]
re-enable vpx_comp_avg_pred_sse2
Buffers on 32 bit x86 builds only guaranteed 8 byte alignment. Fixed
with "AvgPred test: use aligned buffers" and "sad avg: align
intermediate buffer"
Also re-enable asserts on the C version.
BUG=webm:1390
Change-Id: I93081f1b0002a352bb0a3371ac35452417fa8514
Johann Koenig [Mon, 17 Apr 2017 15:36:41 +0000 (15:36 +0000)]
Merge "AvgPred test: use aligned buffers"
Johann [Fri, 14 Apr 2017 21:37:58 +0000 (14:37 -0700)]
sad avg: align intermediate buffer
comp_avg_pred has started declaring a requirement for aligned buffers.
BUG=webm:1390
Change-Id: Idaf6667498ea343e8d49b32bc9d8b9d0aa43ef5c
James Zern [Sat, 15 Apr 2017 00:26:08 +0000 (00:26 +0000)]
Merge "Add AVX2 optimization to copy/avg functions"
Yi Luo [Tue, 28 Mar 2017 22:30:07 +0000 (15:30 -0700)]
Add AVX2 optimization to copy/avg functions
Change-Id: Ibcef70e4fead74e2c2909330a7044a29381a8074
Johann Koenig [Fri, 14 Apr 2017 22:01:38 +0000 (22:01 +0000)]
Merge "Disable vpx_comp_avg_pred_sse2"
Johann [Fri, 14 Apr 2017 19:44:06 +0000 (12:44 -0700)]
AvgPred test: use aligned buffers
BUG=webm:1390
Change-Id: Idb6d1ce119a09c5e7c9f3c58bbbae3de63463d1d
James Zern [Fri, 14 Apr 2017 18:28:25 +0000 (11:28 -0700)]
Revert "vp9: Avoid encoder loopfilter for non-reference frames."
This reverts commit 863f860bfcf3bdc26eeecb299aa481d0f63d11ac.
This causes encoder / decoder mismatches in various
VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers tests
BUG=webm:1408
Change-Id: Ic200c39d7ed9c0b0247ef562f5d6f7b2625f7e14
Marco Paniconi [Fri, 14 Apr 2017 17:12:58 +0000 (17:12 +0000)]
Merge "vp9: SVC: fix to allow use_base_mv to be used for 3 layers."
Johann [Fri, 14 Apr 2017 15:24:59 +0000 (08:24 -0700)]
Disable vpx_comp_avg_pred_sse2
Failures on windows:
unknown file: error: SEH exception with code 0xc0000005 thrown in the
test body.
Alignment check errors on linux:
test_libvpx: ../libvpx/vpx_dsp/variance.c:230: void
vpx_comp_avg_pred_c(uint8_t *, const uint8_t *, int, int, const uint8_t
*, int): Assertion `((intptr_t)comp_pred & 0xf) == 0' failed.
BUG=webm:1390
Change-Id: I5eed5381c0f1a8fe594a128eb415e77232f544ea
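The alignment assertion quoted above checks that the low four address bits are zero. A minimal sketch of that check, plus one portable way to obtain a 16-byte aligned pointer inside an oversized buffer, is given below; `is_aligned_16` and `align_up_16` are hypothetical helpers, not libvpx functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: same test as the quoted assertion,
 * true when the address is a multiple of 16. */
int is_aligned_16(const void *p) { return ((uintptr_t)p & 0xf) == 0; }

/* Hypothetical helper: round a pointer up to the next 16-byte boundary.
 * The caller must have allocated at least 15 spare bytes. */
uint8_t *align_up_16(uint8_t *p) {
  uint8_t *aligned = (uint8_t *)(((uintptr_t)p + 15) & ~(uintptr_t)15);
  assert(is_aligned_16(aligned));
  return aligned;
}
```

A caller would allocate `size + 15` bytes and pass the raw pointer through `align_up_16` before handing it to code that uses aligned SSE2 loads and stores. Compiler alignment attributes are an alternative when the buffer is declared statically.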
Johann Koenig [Fri, 14 Apr 2017 04:10:55 +0000 (04:10 +0000)]
Merge "vpx_comp_avg_pred: sse2 optimization"
Marco [Fri, 14 Apr 2017 00:45:55 +0000 (17:45 -0700)]
vp9: SVC: fix to allow use_base_mv to be used for 3 layers.
Allow use_base_mv to be used for 3 spatial layers where
base is 4x4 scale from the top layer.
Change-Id: If6641baf8b8e4d0fd5dc67619d873c6d75065f43
Marco Paniconi [Fri, 14 Apr 2017 00:45:41 +0000 (00:45 +0000)]
Merge "vp9: Avoid encoder loopfilter for non-reference frames."
Marco [Thu, 13 Apr 2017 00:06:03 +0000 (17:06 -0700)]
vp9: 1 pass VBR: Fix to rate control at low min-q.
Fix to avoid getting stuck at very low Q even
though content is changing, which can happen for --min-q=0.
Fix is to more aggressively increase active_worst_quality
when detecting significant rate_deviation at very low Q.
Change will only affect 1 pass VBR for --min-q < 4, so no
change in ytlive metrics for --min-q >= 4.
Change-Id: I4dd77dd7c08a30a4390da0ff2c8bda6fccfa76d7
Marco [Tue, 11 Apr 2017 23:17:18 +0000 (16:17 -0700)]
vp9: Avoid encoder loopfilter for non-reference frames.
Useful for SVC, where the top layer enhancement frames may
not update any reference buffers, as is the case for the
patterns in the 1 pass CBR SVC when #temporal_layers > 1.
~3% encoder speedup for SVC patterns with temporal layers
in 1 pass CBR mode.
Updated the SVC datarate tests for the mismatch frames.
Set the frame-dropper off in some tests with #temporal_layers > 1
so we can correctly set #mismatch frames. Adjusted rate target
threshold for tests where frame-dropper was turned off.
Change-Id: Ia0c142f02100be0fed61cd2049691be9c59d6793
Johann [Thu, 23 Mar 2017 21:54:48 +0000 (14:54 -0700)]
vpx_comp_avg_pred: sse2 optimization
Provides over 15x speedup for width > 8.
Due to smaller loads and shifting for width == 8 it gets about 8x
speedup.
For width == 4 it's only about 4x speedup because there is a lot of
shuffling and shifting to get the data properly situated.
BUG=webm:1390
Change-Id: Ice0b3dbbf007be3d9509786a61e7f35e94bdffa8
Yunqing Wang [Mon, 10 Apr 2017 17:57:41 +0000 (10:57 -0700)]
Fix an integer overflow in vp9_mcomp.c
The MV unit test revealed an integer overflow issue in vp9_mcomp.c.
This was caused if the MV was very large. In mv_err_cost(), when
mv->row = 8184, mv->col = 8184 and ref_mv is 0, mv_cost = 34363
and error_per_bit = 132412, causing the overflow.
BUG=webm:1406
Change-Id: I35f8299f22f9bee39cd9153d7b00d0993838845e
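The overflow is easy to reproduce with the numbers quoted in the message: 34363 * 132412 = 4550073556, which exceeds the largest 32-bit signed value, 2147483647. One common remedy is to widen an operand before multiplying. The sketch below shows only that general technique; `mv_err_cost_product` is a hypothetical name, and the actual fix in vp9_mcomp.c may be written differently.

```c
#include <stdint.h>

/* Hypothetical helper: multiply in 64 bits so the product of two
 * int-sized values cannot overflow. */
int64_t mv_err_cost_product(int mv_cost, int error_per_bit) {
  return (int64_t)mv_cost * error_per_bit;
}
```

The cast must be applied to an operand, not to the finished product; `(int64_t)(mv_cost * error_per_bit)` would still perform the multiplication in `int` and overflow first.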
Jerome Jiang [Tue, 11 Apr 2017 00:45:20 +0000 (00:45 +0000)]
Merge "vp9: speed >= 8: Adjust speed settings on ARM."
Jerome Jiang [Mon, 10 Apr 2017 20:55:35 +0000 (13:55 -0700)]
vp9: speed >= 8: Adjust speed settings on ARM.
Set adaptive_rd_thresh to 2 when simple block yrd is not used.
Fix regression caused by computing y sad without
int_pro_motion_estimation on low res motion clips.
Overall 0.07% quality loss on rtc_derf.
Change only affects low res on speed 8.
Change-Id: Ic6a188a56529f1034d6431005fb4b0e24e8a7e27
Marco [Mon, 10 Apr 2017 21:56:46 +0000 (14:56 -0700)]
vp9: 1 pass CBR: avoid nonrd_pick_partition on segment.
For speed 5, 1 pass CBR: Don't use the nonrd_pick_partition
on the segment, rather use choose_partitioning followed by
nonrd_select_partition (as is done on base segment).
Little/no quality loss on RTC and RTC_derf (< 0.3%),
speedup of at least 5%.
Change-Id: I5273d5f950e60adf5e437b4ca8c4f63964641e83
Marco Paniconi [Fri, 7 Apr 2017 17:13:21 +0000 (17:13 +0000)]
Merge "vp9: Fix to noise estimation for temporal denoising."
Yunqing Wang [Fri, 7 Apr 2017 16:46:22 +0000 (16:46 +0000)]
Merge "VP9 motion vector unit test"
Marco [Thu, 6 Apr 2017 23:35:01 +0000 (16:35 -0700)]
vp9: Fix to noise estimation for temporal denoising.
If the noise estimation is avoided due to large motion,
the last_source for denoising should still be updated.
Change-Id: I67155ea7dbe9ac2785978e64a27bdafd7d57aac0
Marco [Fri, 7 Apr 2017 15:52:54 +0000 (08:52 -0700)]
vp9: Adjust consec_zeromv threshold for aq-mode=3.
To reduce refresh on partial super-blocks on boundary,
for noisy input. Reduces some artifacts on noisy input.
Change-Id: I10b5808a296874e08c7f378b3df58466591d8dbe