granicus.if.org Git - libvpx/log

Shiyou Yin [Wed, 6 Sep 2017 00:51:21 +0000 (08:51 +0800)]

vp8: [loongson] optimize idctllm with mmi

1. vp8_short_idct4x4llm_mmi
2. vp8_short_inv_walsh4x4_mmi
3. vp8_dc_only_idct_add_mmi

Change-Id: I616923681e79d78607a4988608fc39df77b093f4

commit | commitdiff | tree

Linfeng Zhang [Wed, 13 Sep 2017 17:21:45 +0000 (17:21 +0000)]

Merge "Specialize 4 to 3 scaling in vp9_scale_and_extend_frame_c()"

commit | commitdiff | tree

Johann Koenig [Wed, 13 Sep 2017 14:44:53 +0000 (14:44 +0000)]

Merge "Revert "Revert "quantize avx: copy 32x32 implementation"""

commit | commitdiff | tree

Kaustubh Raste [Wed, 13 Sep 2017 06:02:49 +0000 (06:02 +0000)]

Merge "Optimize mips msa vp9 average mc functions"

commit | commitdiff | tree

Shiyou Yin [Wed, 13 Sep 2017 01:05:46 +0000 (01:05 +0000)]

Merge "vp8: [loongson] optimize loopfilter with mmi"

commit | commitdiff | tree

Johann [Tue, 12 Sep 2017 21:09:42 +0000 (14:09 -0700)]

Revert "Revert "quantize avx: copy 32x32 implementation""

This reverts commit 8c42237bb200253931c49e2c530838f3a877dd65.

Because ssse3 code is used for the reference, the qcoeff and dqcoeff
reference buffers must be aligned.

Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c

Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06

commit | commitdiff | tree

Linfeng Zhang [Tue, 12 Sep 2017 18:37:04 +0000 (11:37 -0700)]

Specialize 4 to 3 scaling in vp9_scale_and_extend_frame_c()

Scale 3x3 block instead of 16x16 block in each loop.

Benefits:
1. Reduced number of different phase_scaler from 16 to 3. Optimization code
will be smaller and faster.
2. The maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
(The drifting is 1/(3*16) in each step.)

BUG=webm:1419

Change-Id: Ibb9242a629ddb03e1ff93b859bece738255e698c
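
The drift figures in the message above can be checked with exact fractions. This is only an arithmetic sketch, not libvpx code: it assumes positions are tracked in 1/16-pel units, that the exact 4:3 step of 64/3 sixteenths is truncated to 21, and that the phase is reset at the start of each block. The name max_drift is hypothetical.

```python
from fractions import Fraction

SUBPEL = 16  # assumed: positions are tracked in 1/16-pel units


def max_drift(block_size, src=4, dst=3):
    """Worst-case position error, in pixels, across one block of outputs."""
    exact_step = Fraction(src * SUBPEL, dst)  # 64/3 sixteenths per output pixel
    used_step = exact_step.numerator // exact_step.denominator  # truncated to 21
    per_step = (exact_step - used_step) / SUBPEL  # 1/48 pixel lost per step
    # The error accumulates over the block_size - 1 steps taken inside a block.
    return per_step * (block_size - 1)


print(max_drift(16))  # 5/16
print(max_drift(3))   # 1/24
```

Under these assumptions a 16-wide block takes 15 steps (15/48 = 5/16) and a 3-wide block takes 2 steps (2/48 = 1/24), which matches the numbers quoted in the commit message.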

commit | commitdiff | tree

Kaustubh Raste [Tue, 12 Sep 2017 10:05:07 +0000 (15:35 +0530)]

Optimize mips msa vp9 average mc functions

Use specific destination loads instead of full vector loads.

Change-Id: I65ca13ae8f608fad07121fef848e2a18f54171fe

commit | commitdiff | tree

Scott LaVarnway [Mon, 11 Sep 2017 22:32:23 +0000 (22:32 +0000)]

Merge "vpxdsp: [x86] add highbd_d207_predictor functions"

commit | commitdiff | tree

Linfeng Zhang [Thu, 7 Sep 2017 19:50:36 +0000 (12:50 -0700)]

Add 4 to 1 scaling NEON optimization

BUG=webm:1419

Change-Id: If82a93935d2453e61b7647aae70983db1740bec7

commit | commitdiff | tree

Scott LaVarnway [Wed, 6 Sep 2017 17:08:03 +0000 (10:08 -0700)]

vpxdsp: [x86] add highbd_d207_predictor functions

C vs SSE2 speed gains:
_4x4 : ~2.31x

C vs SSSE3 speed gains:
_8x8 : ~4.73x
_16x16 : ~10.88x
_32x32 : ~4.80x

BUG=webm:1411

Change-Id: I0bac29db261079181ddabc6814bd62c463109caf

commit | commitdiff | tree

Shiyou Yin [Fri, 8 Sep 2017 01:42:51 +0000 (09:42 +0800)]

vp8: [loongson] optimize loopfilter with mmi

1. vp8_loop_filter_horizontal_edge_mmi
2. vp8_loop_filter_vertical_edge_mmi
3. vp8_mbloop_filter_horizontal_edge_mmi
4. vp8_mbloop_filter_vertical_edge_mmi
5. vp8_loop_filter_simple_horizontal_edge_mmi
6. vp8_loop_filter_simple_vertical_edge_mmi

Change-Id: Ie34bbff3a16cff64e39a50798afd2b7dac9bcdc3

commit | commitdiff | tree

James Zern [Sat, 9 Sep 2017 02:20:07 +0000 (19:20 -0700)]

intrapred: sync highbd_d63_predictor w/d63_

8/16/32: ~6%/~18%/~33% faster

previously:
7012ba639 vp9_reconintra: simplify d63_predictor

BUG=webm:1411

Change-Id: Ie775f3a4f7fd74df44754e65686d826a51c2cdc2

commit | commitdiff | tree

James Zern [Sat, 9 Sep 2017 01:57:08 +0000 (18:57 -0700)]

vpx_mem: make vpx_memset16 inline

Change-Id: Ibb2cab930c95836e6d6e66300c33e7d08e4474d4

commit | commitdiff | tree

James Zern [Sat, 9 Sep 2017 01:52:01 +0000 (18:52 -0700)]

intrapred: sync highbd_d45_predictor w/d45_

8/16/32: ~19%/~54%/~75.5% faster

previously:
acc481eaa vp9_reconintra: simplify d45_predictor

BUG=webm:1411

Change-Id: Ie8340b0c5070ae640f124733f025e4e749b660d8

commit | commitdiff | tree

James Zern [Fri, 8 Sep 2017 19:23:40 +0000 (19:23 +0000)]

Merge changes I9ec438aa,I99c954ff

* changes:
Update convolve functions' assertions
Add 2 to 1 scaling NEON optimization

commit | commitdiff | tree

James Zern [Fri, 8 Sep 2017 07:06:25 +0000 (00:06 -0700)]

vpx_scale_test.h: remove #if from inside macro

fixes visual studio error

Change-Id: I86206f17ca951b15e247c1b92561847d8c21ec7a

commit | commitdiff | tree

Shiyou Yin [Fri, 8 Sep 2017 00:59:31 +0000 (00:59 +0000)]

Merge "vp8: [loongson] optimize sixtap predict with mmi"

commit | commitdiff | tree

Shiyou Yin [Fri, 8 Sep 2017 00:55:14 +0000 (00:55 +0000)]

Merge "vpxdsp: [loongson] optimize sad functions with mmi"

commit | commitdiff | tree

Linfeng Zhang [Wed, 6 Sep 2017 19:01:07 +0000 (12:01 -0700)]

Update convolve functions' assertions

So that 4 to 1 frame scaling can call them.

Change-Id: I9ec438aa63b923ba164ad3c59d7ecfa12789eab5

commit | commitdiff | tree

Linfeng Zhang [Tue, 5 Sep 2017 22:07:00 +0000 (15:07 -0700)]

Add 2 to 1 scaling NEON optimization

BUG=webm:1419

Change-Id: I99c954ffa50a62ccff2c4ab54162916141826d9b

commit | commitdiff | tree

Linfeng Zhang [Tue, 5 Sep 2017 21:48:17 +0000 (14:48 -0700)]

Refactor convolve8 NEON functions

Change-Id: I4ac576875c91fee7cb150d298fae4a2c156d374c

commit | commitdiff | tree

Linfeng Zhang [Tue, 5 Sep 2017 21:38:45 +0000 (14:38 -0700)]

Add ScaleFrameTest

Move class VpxScaleBase to new file test/vpx_scale_test.h.
Add new file test/vp9_scale_test.cc with ScaleFrameTest.

BUG=webm:1419

Change-Id: Iec2098eafcef99b94047de525e5da47bcab519c1

commit | commitdiff | tree

Linfeng Zhang [Wed, 6 Sep 2017 22:39:15 +0000 (22:39 +0000)]

Merge "Remove get_filter_base() and get_filter_offset() in convolve"

commit | commitdiff | tree

Scott LaVarnway [Wed, 6 Sep 2017 21:53:32 +0000 (21:53 +0000)]

Merge "vpxdsp: [x86] add highbd_dc_128_predictor functions"

commit | commitdiff | tree

Peter Boström [Wed, 6 Sep 2017 15:48:42 +0000 (11:48 -0400)]

Remove support for stdatomic.h.

This header doesn't build on g++ v6 as it's a C and not C++ header
(_Atomic is not a keyword in C++11). Since the C and C++ invocations
cannot be guaranteed to point to the same underlying atomic_int
implementation, remove support for them and use compiler intrinsics
instead.

BUG=webm:1461

Change-Id: Ie1cd6759c258042efc87f51f036b9aa53e4ea9d5

commit | commitdiff | tree

Linfeng Zhang [Mon, 28 Aug 2017 17:35:43 +0000 (10:35 -0700)]

Remove get_filter_base() and get_filter_offset() in convolve

so that the convolve functions are independent of table alignment.

Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee

commit | commitdiff | tree

Scott LaVarnway [Tue, 5 Sep 2017 14:52:36 +0000 (07:52 -0700)]

vpxdsp: [x86] add highbd_dc_128_predictor functions

C vs SSE2 speed gains:
_4x4 : ~7.64x
_8x8 : ~16.60x
_16x16 : ~8.15x
_32x32 : ~5.05x

BUG=webm:1411

Change-Id: If165d419711cfda901bd428a05ca1560a009e62e

commit | commitdiff | tree

Shiyou Yin [Sat, 2 Sep 2017 16:40:37 +0000 (00:40 +0800)]

vp8: [loongson] optimize sixtap predict with mmi

1. vp8_sixtap_predict16x16_mmi
2. vp8_sixtap_predict8x8_mmi
3. vp8_sixtap_predict8x4_mmi
4. vp8_sixtap_predict4x4_mmi

Change-Id: I186669d1a1d998a0f3ba3a548e25eee8b52c251b

commit | commitdiff | tree

Shiyou Yin [Sat, 2 Sep 2017 07:46:38 +0000 (15:46 +0800)]

vpxdsp: [loongson] optimize sad functions with mmi

1. vpx_sadWxH_c
2. vpx_sadWxH_avg_c
3. vpx_sadWxHx3_c
4. vpx_sadWxHx8_c
5. vpx_sadWxHx4d_c

Change-Id: Ie13161e3d73a052ea6ea7bac9cfadf55598fea7a

commit | commitdiff | tree

James Zern [Fri, 1 Sep 2017 03:07:01 +0000 (20:07 -0700)]

test,Android.mk: export gtest include path

fixes test file builds

Change-Id: Iaa725ad95d56cf77d9fef8994981a80102e9a966

commit | commitdiff | tree

clang-format [Mon, 28 Aug 2017 01:26:24 +0000 (18:26 -0700)]

apply clang-format

Change-Id: If4c3e8a396d0fcb304f407b44e28cac3219f038c

commit | commitdiff | tree

James Zern [Mon, 28 Aug 2017 01:22:04 +0000 (18:22 -0700)]

.clang-format: update to 4.0.1

based on Google style with the following differences:

3a4
> # Generated with clang-format 4.0.1
13c14
< AllowShortCaseLabelsOnASingleLine: false
---
> AllowShortCaseLabelsOnASingleLine: true
23c24
< BraceWrapping:
---
> BraceWrapping:
43c44
< ConstructorInitializerAllOnOneLineOrOnePerLine: true
---
> ConstructorInitializerAllOnOneLineOrOnePerLine: false
46,47c47,48
< Cpp11BracedListStyle: true
< DerivePointerAlignment: true
---
> Cpp11BracedListStyle: false
> DerivePointerAlignment: false
51c52
< IncludeCategories:
---
> IncludeCategories:
78c79
< PointerAlignment: Left
---
> PointerAlignment: Right
80c81
< SortIncludes: true
---
> SortIncludes: false

Change-Id: Ibc0ef87a516b8eae88d426dfdd7624be57e7b87c

commit | commitdiff | tree

Peter Boström [Fri, 1 Sep 2017 05:37:51 +0000 (05:37 +0000)]

Merge "Prevent data race from low-pass filter."

commit | commitdiff | tree

James Zern [Fri, 1 Sep 2017 03:09:49 +0000 (03:09 +0000)]

Merge "inv_txfm_vsx: fix loads in high-bitdepth"

commit | commitdiff | tree

Peter Boström [Thu, 31 Aug 2017 21:33:59 +0000 (14:33 -0700)]

Prevent data race from low-pass filter.

Makes main thread wait for the filter level to be picked to avoid a race
between the LPF thread and update_reference_frames(). This also
re-enables the failing tests under thread_sanitizer where this data race
was detected.

BUG=webm:1460

Change-Id: I7f5797142ea0200394309842ce3e91a480be4fbc
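
The ordering constraint described above can be sketched as follows. This is an illustration only: libvpx uses its own C synchronization primitives, and Python's threading.Event is a stand-in here. The names encode_frame_sketch and loop_filter_worker and the filter level value are hypothetical.

```python
import threading


def encode_frame_sketch():
    """Return the order of events when the main thread waits for the filter level."""
    events = []
    state = {"filter_level": None}
    level_picked = threading.Event()

    def loop_filter_worker():
        state["filter_level"] = 12  # hypothetical value chosen by the LPF thread
        events.append("filter level picked")
        level_picked.set()  # signal the main thread that the level is known

    worker = threading.Thread(target=loop_filter_worker)
    worker.start()
    level_picked.wait()  # main thread blocks until the level has been picked
    events.append("update_reference_frames")
    worker.join()
    return events, state["filter_level"]


print(encode_frame_sketch())
```

Because the main thread waits on the event before touching the reference frames, the two steps can no longer interleave, whichever thread the scheduler runs first.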

commit | commitdiff | tree

Peter Boström [Fri, 1 Sep 2017 01:36:22 +0000 (01:36 +0000)]

Merge "Add atomics to vp8 synchronization primitives."

commit | commitdiff | tree

Peter Boström [Fri, 25 Aug 2017 22:48:11 +0000 (15:48 -0700)]

Add atomics to vp8 synchronization primitives.

Fixes issue on iPad Pro 10.5 (and probably other places) where threads
are not properly synchronized. On x86 this data race was benign: load
and store instructions are atomic there, and the program has not been
observed to be miscompiled, so the accesses were atomic in practice.

Such guarantees are not made outside x86, and real problems manifested
where libvpx reliably reproduced a broken bitstream for even just the
initial keyframe. This was detected in WebRTC where this device started
using multithreading (as its CPU count is higher than earlier devices,
where the problem did not manifest as single-threading was used in
practice).

This issue was not detected under thread-sanitizer bots as mutexes were
conditionally used under this platform to simulate the protected read
and write semantics that were in practice provided on x86 platforms.

This change also removes several mutexes, so encoder/decoder state is
lighter-weight after this change and we do not need to initialize so
many mutexes (this was done even on non-thread-sanitizer platforms where
they were unused).

Change-Id: If41fcb0d99944f7bbc8ec40877cdc34d672ae72a

commit | commitdiff | tree

Scott LaVarnway [Thu, 31 Aug 2017 21:34:27 +0000 (21:34 +0000)]

Merge "vpxdsp: [x86] add highbd_dc_left_predictor functions"

commit | commitdiff | tree

Jerome Jiang [Thu, 31 Aug 2017 17:16:19 +0000 (17:16 +0000)]

Merge "vp9: Skip testing duplicate zero mv in nonrd-pickmode."

commit | commitdiff | tree

Jerome Jiang [Tue, 29 Aug 2017 20:36:34 +0000 (13:36 -0700)]

vp9: Skip testing duplicate zero mv in nonrd-pickmode.

Neutral on rtc set for speed 8. Neutral on ytlive for speed 5.

Saves some computation cycles but no speed gain observed on Pixel.

Change-Id: I34c4642cd543aa89c5b9c4bff6b7113577c64c91

commit | commitdiff | tree

James Zern [Thu, 31 Aug 2017 06:47:56 +0000 (23:47 -0700)]

inv_txfm_vsx: fix loads in high-bitdepth

vec_vsx_ld -> load_tran_low

Change-Id: Id3144cdd528d2d406a515e5812e2ea9e4db64bf1

commit | commitdiff | tree

Jerome Jiang [Thu, 31 Aug 2017 01:52:42 +0000 (01:52 +0000)]

Merge "Revert "Re-enable disabled tests under TSan.""

commit | commitdiff | tree

Jerome Jiang [Wed, 30 Aug 2017 23:44:21 +0000 (23:44 +0000)]

Revert "Re-enable disabled tests under TSan."

This reverts commit df9ce12259a4e866feeb580d2e0cf9648f60d3b5.

Reason for revert:

Re-enabled tests still fail tsan in high bitdepth.

Original change's description:
> Re-enable disabled tests under TSan.
>
> These tests point to an already-fixed bug, this should no longer have a
> data race.
>
> BUG=webm:1049
>
> Change-Id: Iaedc5db8df99362bdc501b70ff7fdebf8756fdb8

TBR=jzern@google.com,pbos@chromium.org,builds@webmproject.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: webm:1049
Change-Id: I232f1f7726bf795b301abfb2e07cad6756642e53

commit | commitdiff | tree

Scott LaVarnway [Wed, 30 Aug 2017 16:13:14 +0000 (09:13 -0700)]

vpxdsp: [x86] add highbd_dc_left_predictor functions

C vs SSE2 speed gains:
_4x4 : ~6.49x
_8x8 : ~10.82x
_16x16 : ~7.61x
_32x32 : ~5.29x

BUG=webm:1411

Change-Id: Ibc30c50cb7139049bf05298010803499e6ef949b

commit | commitdiff | tree

Scott LaVarnway [Wed, 30 Aug 2017 11:25:07 +0000 (11:25 +0000)]

Merge "vpxdsp: [x86] add highbd_dc_top_predictor functions"

commit | commitdiff | tree

Scott LaVarnway [Tue, 29 Aug 2017 18:25:32 +0000 (11:25 -0700)]

vpxdsp: [x86] add highbd_dc_top_predictor functions

C vs SSE2 speed gains:
_4x4 : ~7.39x
_8x8 : ~11.36x
_16x16 : ~8.68x
_32x32 : ~4.33x

BUG=webm:1411

Change-Id: I7f1487cd1531d4e7f0fbb4596fed3bfb72a59d58

commit | commitdiff | tree

Jerome Jiang [Tue, 29 Aug 2017 16:45:09 +0000 (16:45 +0000)]

Merge "vp9: Speed 8: Enable skip_encode_sb"

commit | commitdiff | tree

Peter Boström [Tue, 29 Aug 2017 15:42:39 +0000 (15:42 +0000)]

Merge "Re-enable disabled tests under TSan."

commit | commitdiff | tree

Scott LaVarnway [Tue, 29 Aug 2017 14:05:13 +0000 (14:05 +0000)]

Merge "vpxdsp: [x86] add highbd_h_predictor functions"

commit | commitdiff | tree

Scott LaVarnway [Mon, 28 Aug 2017 14:26:08 +0000 (07:26 -0700)]

vpxdsp: [x86] add highbd_h_predictor functions

C vs SSE2 speed gains:
_4x4 : ~8.12x
_8x8 : ~9.71x
_16x16 : ~8.21x
_32x32 : ~5.0x

BUG=webm:1422

Change-Id: I5e8a1ed4db7b8dc539b3e2a728b0b34d8b4b1993

commit | commitdiff | tree

Jerome Jiang [Tue, 29 Aug 2017 00:05:48 +0000 (17:05 -0700)]

vp9: Speed 8: Enable skip_encode_sb

Neutral in borg tests.

Some clips show 3-4% speed gain on 2 threads on Pixel.

Change-Id: Ic959f34e44892a854551de6e9a3d9ec819ffed00

commit | commitdiff | tree

Peter Boström [Mon, 28 Aug 2017 23:23:16 +0000 (16:23 -0700)]

Re-enable disabled tests under TSan.

These tests point to an already-fixed bug, this should no longer have a
data race.

BUG=webm:1049

Change-Id: Iaedc5db8df99362bdc501b70ff7fdebf8756fdb8

commit | commitdiff | tree

Jerome Jiang [Mon, 28 Aug 2017 19:48:19 +0000 (12:48 -0700)]

vp9: Remove resolution condition for using source_sad in speed 6.

Rev d147771 fixed the test failure. So remove the resolution condition
for using source_sad in speed 6.

BUG=webm:1452

Change-Id: I1efba97e1ef5bd4de5f886299f6fcb907187abcd

commit | commitdiff | tree

Marco Paniconi [Fri, 25 Aug 2017 22:00:08 +0000 (22:00 +0000)]

Merge "vp9: Speed 6 adapt_partition for live/vbr usage."

commit | commitdiff | tree

Marco Paniconi [Fri, 25 Aug 2017 21:46:35 +0000 (21:46 +0000)]

Merge "vp9: SVC: Modify mv search condition in speed features."

commit | commitdiff | tree

Marco [Mon, 21 Aug 2017 23:39:56 +0000 (16:39 -0700)]

vp9: Speed 6 adapt_partition for live/vbr usage.

Enable adapt_partition for vbr mode for speed 6.
This allows the usage of the pickmode-based partition
(used in speed 5), but only selectively for superblocks
with high source sad, otherwise the faster variance based
partition scheme is used.

For speed 6 on ytlive set: avgPSNR/SSIM metrics up by ~0.6%,
several clips up by ~1.5%. Small/negligible decrease in speed.

Change-Id: I12f3efef6b3e059391de330fdbe5a44c2587f1f8

commit | commitdiff | tree

Marco Paniconi [Fri, 25 Aug 2017 18:20:31 +0000 (18:20 +0000)]

Merge "Revert "quantize avx: copy 32x32 implementation""

commit | commitdiff | tree

Marco [Fri, 25 Aug 2017 16:59:57 +0000 (09:59 -0700)]

vp9: SVC: Modify mv search condition in speed features.

For SVC at speed >= 7: only use the improved mv search
on base spatial layer, if top layer resolution is above 640x360.

~2.3% speedup
Small/negligible loss in avgPSNR metrics on rtc set.

Change-Id: Iaef75a57ebf1c248931bc1aa28d20b7fecac1851

commit | commitdiff | tree

Marco Paniconi [Fri, 25 Aug 2017 16:56:08 +0000 (16:56 +0000)]

Revert "quantize avx: copy 32x32 implementation"

This reverts commit f60d1dcd3de46f72bafc5eeef481bd1a4e203301.

Reason for revert: Failures in AVX/VP9QuantizeTest in nightly tests.

Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c

TBR=slavarnway@google.com,johannkoenig@google.com,builds@webmproject.org

Change-Id: Ibd38636212269328317dd0721be9d25452113d1c
No-Presubmit: true
No-Tree-Checks: true
No-Try: true

commit | commitdiff | tree

Shiyou Yin [Fri, 25 Aug 2017 06:44:02 +0000 (06:44 +0000)]

Merge "vpx_dsp:loongson optimize vpx_varianceWxH_c,vpx_sub_pixel_varianceWxH_c and vpx_sub_pixel_avg_varianceWxH_c with mmi."

commit | commitdiff | tree

Marco Paniconi [Thu, 24 Aug 2017 22:26:43 +0000 (22:26 +0000)]

Merge "vp9: Adjust 16x16 split threshold for variance partition"

commit | commitdiff | tree

Tom Finegan [Thu, 24 Aug 2017 19:11:48 +0000 (12:11 -0700)]

Make sure diff is present at configure time.

This avoids an endless build loop at vpx_version.h
creation time when diff is not present.

Change-Id: I16ae386dbdaf14f9a2b85e4c5d1aaa6c08f52a45

commit | commitdiff | tree

Johann Koenig [Thu, 24 Aug 2017 18:55:03 +0000 (18:55 +0000)]

Merge "quantize avx: copy 32x32 implementation"

commit | commitdiff | tree

Shiyou Yin [Thu, 24 Aug 2017 15:11:58 +0000 (23:11 +0800)]

vpx_dsp:loongson optimize vpx_varianceWxH_c,vpx_sub_pixel_varianceWxH_c and vpx_sub_pixel_avg_varianceWxH_c with mmi.

Change-Id: Ia576a721df6312329b599c31cfe1fb1267a9f174

commit | commitdiff | tree

Marco [Thu, 24 Aug 2017 17:36:27 +0000 (10:36 -0700)]

vp9: Adjust 16x16 split threshold for variance partition

For speeds < 7, increase threshold that controls the split
of 16x16->8x8 blocks, for resolutions 720p and higher.

Minor change for speed 5 (since it uses reference partition scheme
which only uses variance partition as first step).
For speed 6: ~0.5% increase in avgPSNR/SSIM metrics on ytlive set.
No change in speed.

Change-Id: I5126580973201538d8ca26a9256b93c4d11d685b

commit | commitdiff | tree

Johann Koenig [Thu, 24 Aug 2017 17:43:10 +0000 (17:43 +0000)]

Merge "quantize test: skip block was removed"

commit | commitdiff | tree

Johann [Wed, 23 Aug 2017 20:59:33 +0000 (13:59 -0700)]

quantize avx: copy 32x32 implementation

Ensure avx and ssse3 stay in sync by testing them against each other.

Change-Id: I699f3b48785c83260825402d7826231f475f697c

commit | commitdiff | tree

Johann [Wed, 16 Aug 2017 20:10:59 +0000 (13:10 -0700)]

quantize ssse3: copy implementation to intrinsics

Still does not pass tests. Does match the previous assembly, although
saving the sign before multiplying is dubious.

Change-Id: Ia163f18c755aba542d6e93f7bf7343184660df5a

commit | commitdiff | tree

Johann [Thu, 24 Aug 2017 14:21:42 +0000 (07:21 -0700)]

quantize test: skip block was removed

Change-Id: I1d93698bc27529b0544d79dd7b9fe37afa51ef87

commit | commitdiff | tree

Johann Koenig [Thu, 24 Aug 2017 14:04:29 +0000 (14:04 +0000)]

Merge "quantize test: set threshold for 32x32"

commit | commitdiff | tree

Shiyou Yin [Thu, 24 Aug 2017 00:55:11 +0000 (00:55 +0000)]

Merge "vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi."

commit | commitdiff | tree

Marco Paniconi [Wed, 23 Aug 2017 23:09:33 +0000 (23:09 +0000)]

Merge "vp9: SVC: Skip NEWMV for small blocks for (0, 0) base_mv."

commit | commitdiff | tree

Johann [Wed, 23 Aug 2017 22:59:11 +0000 (15:59 -0700)]

quantize test: set threshold for 32x32

Change-Id: I77be617c7d7c64929dd51c6077322f4f8ad23897

commit | commitdiff | tree

Johann Koenig [Wed, 23 Aug 2017 21:14:13 +0000 (21:14 +0000)]

Merge "quantize avx: copy implementation to intrinsics"

commit | commitdiff | tree

Marco [Wed, 23 Aug 2017 20:01:57 +0000 (13:01 -0700)]

vp9: SVC: Skip NEWMV for small blocks for (0, 0) base_mv.

For SVC encoding:
average speedup ~1.5%, with small ~0.57 loss in avgPSNR metrics.

Change-Id: Icebce6f6ef4e819d7dfcf8db898c583167351de4

commit | commitdiff | tree

Scott LaVarnway [Wed, 23 Aug 2017 19:59:25 +0000 (19:59 +0000)]

Merge "vpx_dsp: get32x32var_avx2() cleanup"

commit | commitdiff | tree

Johann Koenig [Wed, 23 Aug 2017 19:20:53 +0000 (19:20 +0000)]

Merge "quantize neon: round dqcoeff towards zero"

commit | commitdiff | tree

Johann [Tue, 22 Aug 2017 22:43:35 +0000 (15:43 -0700)]

quantize avx: copy implementation to intrinsics

Adds an early exit based on ptest. Slightly slower than ssse3 in the
full case because of the extra check, but potentially faster if lots of
rows can be skipped.

Very close in speed to the assembly.

Can run in 32 bit, unlike the assembly. Allows reworking the function
prototype to use structs.

Change-Id: If80e2b9ba059370a4cad3c973196e82a97b4330e
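
The trade-off described above (an extra check per group, repaid when whole groups can be skipped) can be sketched with a scalar model. This is a simplified illustration, not the intrinsics code: the group size, the zbin threshold test and the plain integer-division quantizer are assumptions, and the all(...) test only stands in for the ptest check.

```python
def quantize_with_early_exit(coeffs, zbin, step, group=8):
    """Quantize coeffs, skipping any group whose values all fall below zbin."""
    out = [0] * len(coeffs)
    skipped_groups = 0
    for start in range(0, len(coeffs), group):
        chunk = coeffs[start:start + group]
        # Stand-in for the ptest check: nothing in this group survives the threshold.
        if all(abs(c) < zbin for c in chunk):
            skipped_groups += 1
            continue
        for offset, c in enumerate(chunk):
            if abs(c) >= zbin:
                magnitude = abs(c) // step
                out[start + offset] = magnitude if c >= 0 else -magnitude
    return out, skipped_groups


coeffs = [100, -50, 3, 0, 0, 0, 0, 0] + [1] * 8
print(quantize_with_early_exit(coeffs, zbin=10, step=20))
```

In this model the second group of eight small coefficients is skipped entirely, while the first group pays for both the check and the full computation.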

commit | commitdiff | tree

Johann [Mon, 21 Aug 2017 18:23:49 +0000 (11:23 -0700)]

quantize neon: round dqcoeff towards zero

Add 1 if negative to get dqcoeff to round towards zero.

10-15% faster than converting to positive before shifting.

Change-Id: I01a62fd0c9bca786b6885b318bd447bb9229903d
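
The rounding trick can be illustrated in scalar form. The commit message does not state the shift amount, so this sketch assumes the value is halved with a one-bit arithmetic right shift, which on its own rounds toward negative infinity. Both function names are hypothetical.

```python
def halve_toward_zero(x):
    """Divide by two with truncation, using only an add and an arithmetic shift."""
    return (x + (1 if x < 0 else 0)) >> 1


def halve_via_abs(x):
    """Reference approach: shift the magnitude, then restore the sign."""
    magnitude = abs(x) >> 1
    return -magnitude if x < 0 else magnitude


for value in (-5, -4, -1, 0, 1, 5):
    # Columns: input, plain shift (rounds down), corrected shift (rounds toward zero).
    print(value, value >> 1, halve_toward_zero(value))
```

Adding 1 to negative inputs before the shift gives the same result as taking the absolute value, shifting and negating, but with fewer operations per element.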

commit | commitdiff | tree

Johann [Thu, 10 Aug 2017 22:02:22 +0000 (15:02 -0700)]

quantize fp: neon implementation

About 4x faster when values are below the dequant threshold and 10x
faster if everything needs to be calculated.

Both numbers would improve if the division for dqcoeff could be
simplified.

BUG=webm:1426

Change-Id: I8da67c1f3fcb4abed8751990c1afe00bc841f4b2

commit | commitdiff | tree

Shiyou Yin [Tue, 22 Aug 2017 00:44:36 +0000 (08:44 +0800)]

vpx_dsp:loongson optimize vpx_mseWxH_c(case 16x16,16X8,8X16,8X8) with mmi.

Change-Id: I2c782d18d9004414ba61b77238e0caf3e022d8f2

commit | commitdiff | tree

Marco Paniconi [Tue, 22 Aug 2017 22:52:05 +0000 (22:52 +0000)]

Merge "vp9: Condition lighting change detection on CBR mode."

commit | commitdiff | tree

Johann Koenig [Tue, 22 Aug 2017 22:27:56 +0000 (22:27 +0000)]

Merge changes I53f8a160,I48f282bf

* changes:
quantize ssse3: copy style from sse2
quantize sse2: copy opts from ssse3

commit | commitdiff | tree

Marco [Tue, 22 Aug 2017 21:46:39 +0000 (14:46 -0700)]

vp9: Condition lighting change detection on CBR mode.

This feature is used for the CBR RTC encoding mode
at speed >= 6. This change will exclude it for VBR mode.

For speed 6 live encoding (VBR):
avgPSNR/SSIM metrics on ytlive set up by ~1% (few clips up by 2/3%).
No change in speed.

Change-Id: I1a0dd94c334f7df309ab5a48d477d7e25355b798

commit | commitdiff | tree

Johann [Tue, 22 Aug 2017 21:25:27 +0000 (14:25 -0700)]

quantize ssse3: copy style from sse2

Change-Id: I53f8a160e640c674ea035fc112e207b6dca42598

commit | commitdiff | tree

Johann Koenig [Tue, 22 Aug 2017 20:03:02 +0000 (20:03 +0000)]

Merge "quantize: capture skip block early"

commit | commitdiff | tree

Johann [Tue, 22 Aug 2017 20:01:44 +0000 (13:01 -0700)]

quantize sse2: copy opts from ssse3

Simplify eob calculations based on ssse3 implementation.

General clean up and re-scoping.

Change-Id: I48f282bf9bd28ee9bc2c7a6779be9d45b5a3a3ee

commit | commitdiff | tree

Johann Koenig [Tue, 22 Aug 2017 19:19:14 +0000 (19:19 +0000)]

Merge changes Icfb70687,I9a963e99,Ie8ac00ef,I1272917c

* changes:
  quantize: ignore skip_block in arm
  quantize: ignore skip_block in x86
  quantize fp: ignore skip_block in arm
  quantize fp: ignore skip_block in x86

commit | commitdiff | tree

Johann [Tue, 22 Aug 2017 18:24:33 +0000 (11:24 -0700)]

quantize: capture skip block early

This should probably be handled before vp9_regular_quantize_b_4x4 even
gets called.

Fixes an assert resulting from removing skip_block from the quantize
functions.

BUG=webm:1459

Change-Id: I7f52b53f959b4654b3d4517ebda31a678f4d0fde

commit | commitdiff | tree

James Zern [Tue, 22 Aug 2017 00:48:39 +0000 (00:48 +0000)]

Merge "ppc: Add vpx_idct16x16_256_add_vsx"

commit | commitdiff | tree

Shiyou Yin [Tue, 22 Aug 2017 00:37:23 +0000 (00:37 +0000)]

Merge "vpx_dsp:loongson optimize vpx_subtract_block_c (case 4x4,8x8,16x16) with mmi."

commit | commitdiff | tree

Johann [Mon, 21 Aug 2017 18:15:39 +0000 (11:15 -0700)]

quantize: ignore skip_block in arm

Change-Id: Icfb70687476b2edb25d255793ba325b261d40584

commit | commitdiff | tree

Johann [Mon, 21 Aug 2017 18:15:23 +0000 (11:15 -0700)]

quantize: ignore skip_block in x86

Change-Id: I9a963e99f08761f0c8d6a305619270b2f1c4edf8

commit | commitdiff | tree

Johann [Mon, 21 Aug 2017 18:14:51 +0000 (11:14 -0700)]

quantize fp: ignore skip_block in arm

Change-Id: Ie8ac00efa826eead2a227726a1add816e04ff147

commit | commitdiff | tree

Johann [Mon, 21 Aug 2017 18:14:39 +0000 (11:14 -0700)]

quantize fp: ignore skip_block in x86

Change-Id: I1272917c49cf6e6710e52c36535b2fc8c8dced78

commit | commitdiff | tree

Johann [Wed, 2 Aug 2017 21:28:05 +0000 (14:28 -0700)]

quantize test: test _fp_ version of quantize

None of the x86 optimizations pass the tests.

Change-Id: Ic67f2ba1977b657e68f2a13b0711fc5fcbafd909

commit | commitdiff | tree

Johann [Wed, 16 Aug 2017 20:34:14 +0000 (13:34 -0700)]

Remove skip_block from quantize

This condition is handled before this code is reached. The ssse3 version
of the function has always crashed when attempting to handle the
skip_block condition.

Add assert() and comments regarding the usage of skip_block.

Removing the parameter is a fairly involved process so leave it be for
the moment.

Change-Id: Ib299f6fc6589d7ee102262cc74a7aeb60110bc5a
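
The division of responsibility described above can be sketched as follows: the caller handles the skip case, and the quantizer only asserts that it never sees it. This is a hypothetical scalar model; encode_block, quantize_block and the integer-division quantizer are illustrative names and simplifications, not the libvpx functions.

```python
def quantize_block(coeffs, step, skip_block=False):
    """Quantize toward zero; the skip case must already have been handled."""
    # The parameter is kept for now, but callers are expected never to set it.
    assert not skip_block, "skip_block must be handled by the caller"
    return [(abs(c) // step) * (1 if c >= 0 else -1) for c in coeffs]


def encode_block(coeffs, step, skip_block):
    """Caller-side handling: a skipped block produces all-zero output."""
    if skip_block:
        return [0] * len(coeffs)
    return quantize_block(coeffs, step)


print(encode_block([40, -45, 3], 20, skip_block=False))
print(encode_block([40, -45, 3], 20, skip_block=True))
```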

commit | commitdiff | tree

Scott LaVarnway [Fri, 18 Aug 2017 20:44:09 +0000 (13:44 -0700)]

vpx_dsp: get32x32var_avx2() cleanup

renamed to get32x16var_avx2()

BUG=webm:1404

Change-Id: Icb8f3986c9c9c646e13a69430db7235fc7e1a036

commit | commitdiff | tree

Scott LaVarnway [Fri, 18 Aug 2017 20:30:59 +0000 (20:30 +0000)]

Merge "vpx_dsp: vpx_get16x16var_avx2() cleanup"
