From: Johann
Date: Wed, 22 Jun 2016 23:08:10 +0000 (-0700)
Subject: Merge changes from libvpx/master by cherry-pick
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=2967bf355e1b8e3e89ab0e3cd5196ec930b97182;p=libvpx

Merge changes from libvpx/master by cherry-pick

This commit brings all up-to-date changes from master that are applicable to nextgenv2. Because the VP10 code was removed in master, we had to cherry-pick the following commits to get those changes:

Add default flags for arm64/armv8 builds
Allows building simple targets with sane default flags.
For example, using the Android arm64 toolchain from the NDK:
https://developer.android.com/ndk/guides/standalone_toolchain.html
./build/tools/make-standalone-toolchain.sh --arch=arm64 \
  --platform=android-24 --install-dir=/tmp/arm64
CROSS=/tmp/arm64/bin/aarch64-linux-android- \
  ~/libvpx/configure --target=arm64-linux-gcc --disable-multithread
BUG=webm:1143

vpx_lpf_horizontal_4_sse2: Remove dead load.
Change-Id: I51026c52baa1f0881fcd5b68e1fdf08a2dc0916e

Fail early when android target does not include --sdk-path
Change-Id: I07e7e63476a2e32e3aae123abdee8b7bbbdc6a8c

configure: clean up var style and set_all usage
Use quotes whenever possible and {} always for variables. Replace multiple set_all calls with *able_feature().
Conflicts:
  build/make/configure.sh

vp9-svc: Remove some unneeded code/comment.

datarate_test,DatarateTestLarge: normalize bits type
Quiets an MSVC warning: conversion from 'const int64_t' to 'size_t', possible loss of data.

mips: added p6600 cpu support
Removed -funroll-loops.

psnr.c: use int64_t for sum of differences
Since the values can be negative.

*.asm: normalize label format
Add a trailing ':'; although it's optional with the tools we support, it's more common to use it to mark a label. This also quiets the orphan-labels warning with nasm/yasm.
BUG=b/29583530

Prevent negative variance
Due to rounding, high bit depth (hbd) variance may become negative. This commit adds a check and clamps negative values to 0. (A C sketch of the clamp is included below.)

configure: remove old visual studio support (<2010)
BUG=b/29583530
Conflicts:
  configure

configure: restore vs_version variable
Inadvertently lost in the final patchset of:
078dff7 configure: remove old visual studio support (<2010)
This prevents an empty CONFIG_VS_VERSION and avoids a make failure.

Require x86inc.asm
Force-enable x86inc.asm when building for x86. Previously there were compatibility issues, so a flag was added to simplify disabling this code. The known issues have been resolved and x86inc.asm is the preferred abstraction layer (over x86_abi_support.asm).
BUG=b:29583530

convolve_test: fix byte offsets in hbd build
CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. Offsets should occur after converting the pointer to the desired type. Also factored out some common expressions.
Conflicts:
  test/convolve_test.cc

vpx_dsp: remove x86inc.asm distinction
BUG=b:29583530
Conflicts:
  vpx_dsp/vpx_dsp.mk
  vpx_dsp/vpx_dsp_rtcd_defs.pl
  vpx_dsp/x86/highbd_variance_sse2.c
  vpx_dsp/x86/variance_sse2.c

test: remove x86inc.asm distinction
BUG=b:29583530
Conflicts:
  test/vp9_subtract_test.cc

configure: remove x86inc.asm distinction
BUG=b:29583530
Change-Id: I59a1192142e89a6a36b906f65a491a734e603617

Update vpx subpixel 1d filter ssse3 asm
Speed tests show the new vertical filters degrade performance on a Celeron Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control which vertical filter code is activated. For now, simply activate the code that shows no degradation on Celeron. Later there should be two sets of ssse3 vertical filter functions, with a jump table choosing between them based on CPU type.
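For the variance clamp mentioned above, here is a minimal C sketch of the idea, assuming a caller that has already produced the rounded sum of squared differences (sse) and sum of differences (sum) for a w x h block; the function name and signature are illustrative, not the library's actual internals:

  #include <stdint.h>

  /* Illustrative only: the rounded-down sse/sum values from the high bit
   * depth path can make sse - sum*sum/(w*h) dip slightly below zero, so
   * the result is clamped to 0 before being returned as unsigned. */
  static uint32_t clamped_variance(uint32_t sse, int sum, int w, int h) {
    const int64_t var = (int64_t)sse - ((int64_t)sum * sum) / (w * h);
    return var >= 0 ? (uint32_t)var : 0;
  }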
improve vpx_filter_block1d* by replacing paddsw+psrlw with pmulhrsw
(A C sketch of the rounding equivalence behind this change follows this list of changes.)

Make set_reference control API work in VP9
Moved the API patch from NextGenv2. An example was included. To try it, run, for example:
$ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30
Conflicts:
  examples.mk
  examples/vpx_cx_set_ref.c
  test/cx_set_ref.sh
  vp9/decoder/vp9_decoder.c

deblock filter: moved from the vp8 code branch
The deblocking filters used in vp8 have been moved to vpx_dsp for use by both vp8 and vp9.

vpx_thread.[hc]: update webp source reference
+ drop the blob hash; the updated reference is noted in the commit message
BUG=b/29583578

vpx_thread: use native windows cond var if available
BUG=b/29583578
original webp change:
commit 110ad5835ecd66995d0e7f66dca1b90dea595f5a
Author: James Zern
Date: Mon Nov 23 19:49:58 2015 -0800
thread: use native windows cond var if available
Vista / Server 2008 and up. No speed difference observed.
100644 blob 4fc372b7bc6980a9ed3618c8cce5b67ed7b0f412 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vpx_thread: use InitializeCriticalSectionEx if available
BUG=b/29583578
original webp change:
commit 63fadc9ffacc77d4617526a50c696d21d558a70b
Author: James Zern
Date: Mon Nov 23 20:38:46 2015 -0800
thread: use InitializeCriticalSectionEx if available
Windows Vista / Server 2008 and up
100644 blob f84207d89b3a6bb98bfe8f3fa55cad72dfd061ff src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vpx_thread: use WaitForSingleObjectEx if available
BUG=b/29583578
original webp change:
commit 0fd0e12bfe83f16ce4f1c038b251ccbc13c62ac2
Author: James Zern
Date: Mon Nov 23 20:40:26 2015 -0800
thread: use WaitForSingleObjectEx if available
Windows XP and up
100644 blob d58f74e5523dbc985fc531cf5f0833f1e9157cf0 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vpx_thread: use CreateThread for windows phone
BUG=b/29583578
original webp change:
commit d2afe974f9d751de144ef09d31255aea13b442c0
Author: James Zern
Date: Mon Nov 23 20:41:26 2015 -0800
thread: use CreateThread for windows phone
_beginthreadex is unavailable for winrt/uwp
Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d
100644 blob 93f7622797f05f6acc1126e8296c481d276e4047 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h

vp9_postproc.c: missing extern.
BUG=webm:1256

deblock: missing const on extern const.

postproc: move filling of the noise buffer to vpx_dsp.

Fix encoder crashes for odd-size input.

clean-up vp9_intrapred_test: remove tuple and the overkill VP9IntraPredBase class.

postproc: noise style fixes.
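The pmulhrsw change above relies on the filter's final rounding shift (add 64, then shift right by 7) being expressible as a single rounding multiply, which is what pmulhrsw computes when the multiplier is 1 << (15 - 7) = 256. A hypothetical C sketch of that equivalence (a scalar model only, not the actual assembly; the function names are made up):

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Rounding as done by paddsw with 64 followed by a 7-bit right shift. */
  static int16_t round_shift_add(int32_t x) {
    return (int16_t)((x + 64) >> 7);
  }

  /* What pmulhrsw computes for x * 256: ((x * 256 + (1 << 14)) >> 15). */
  static int16_t round_shift_mulhrs(int32_t x) {
    return (int16_t)((x * 256 + (1 << 14)) >> 15);
  }

  int main(void) {
    /* Sweep the int16 range a filter sum can take; assumes arithmetic >>
     * on negative values, matching the SIMD behaviour being modelled. */
    for (int32_t x = -32768; x <= 32767; ++x)
      assert(round_shift_add(x) == round_shift_mulhrs(x));
    printf("add+shift rounding matches pmulhrsw with a 256 multiplier\n");
    return 0;
  }

Folding the add and the shift into one multiply removes an instruction from the inner loop, which is the intent of the change.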
gtest-all.cc: quiet an unused variable warning under windows / mingw builds vp9_intrapred_test: follow-up cleanup address few comments from ce050afaf3e288895c3bee4160336e2d2133b6ea Change-Id: I3eece7efa9335f4210303993ef6c1857ad5c29c8 --- diff --git a/build/make/configure.sh b/build/make/configure.sh index ee887ab3f..86afb8857 100644 --- a/build/make/configure.sh +++ b/build/make/configure.sh @@ -186,24 +186,6 @@ add_extralibs() { # Boolean Manipulation Functions # -enable_codec(){ - enabled $1 || echo " enabling $1" - set_all yes $1 - - is_in $1 vp8 vp9 vp10 && \ - set_all yes $1_encoder && \ - set_all yes $1_decoder -} - -disable_codec(){ - disabled $1 || echo " disabling $1" - set_all no $1 - - is_in $1 vp8 vp9 vp10 && \ - set_all no $1_encoder && \ - set_all no $1_decoder -} - enable_feature(){ set_all yes $* } @@ -220,6 +202,20 @@ disabled(){ eval test "x\$$1" = "xno" } +enable_codec(){ + enabled "${1}" || echo " enabling ${1}" + enable_feature "${1}" + + is_in "${1}" vp8 vp9 vp10 && enable_feature "${1}_encoder" "${1}_decoder" +} + +disable_codec(){ + disabled "${1}" || echo " disabling ${1}" + disable_feature "${1}" + + is_in "${1}" vp8 vp9 vp10 && disable_feature "${1}_encoder" "${1}_decoder" +} + # Iterates through positional parameters, checks to confirm the parameter has # not been explicitly (force) disabled, and enables the setting controlled by # the parameter when the setting is not disabled. @@ -945,6 +941,9 @@ EOF check_add_cflags -mfpu=neon #-ftree-vectorize check_add_asflags -mfpu=neon fi + elif [ ${tgt_isa} = "arm64" ] || [ ${tgt_isa} = "armv8" ]; then + check_add_cflags -march=armv8-a + check_add_asflags -march=armv8-a else check_add_cflags -march=${tgt_isa} check_add_asflags -march=${tgt_isa} @@ -1012,6 +1011,10 @@ EOF ;; android*) + if [ -z "${sdk_path}" ]; then + die "Must specify --sdk-path for Android builds." + fi + SDK_PATH=${sdk_path} COMPILER_LOCATION=`find "${SDK_PATH}" \ -name "arm-linux-androideabi-gcc*" -print -quit` @@ -1150,13 +1153,13 @@ EOF if [ -n "${tune_cpu}" ]; then case ${tune_cpu} in p5600) - check_add_cflags -mips32r5 -funroll-loops -mload-store-pairs + check_add_cflags -mips32r5 -mload-store-pairs check_add_cflags -msched-weight -mhard-float -mfp64 check_add_asflags -mips32r5 -mhard-float -mfp64 check_add_ldflags -mfp64 ;; - i6400) - check_add_cflags -mips64r6 -mabi=64 -funroll-loops -msched-weight + i6400|p6600) + check_add_cflags -mips64r6 -mabi=64 -msched-weight check_add_cflags -mload-store-pairs -mhard-float -mfp64 check_add_asflags -mips64r6 -mabi=64 -mhard-float -mfp64 check_add_ldflags -mips64r6 -mabi=64 -mfp64 @@ -1393,10 +1396,6 @@ EOF fi fi - if [ "${tgt_isa}" = "x86_64" ] || [ "${tgt_isa}" = "x86" ]; then - soft_enable use_x86inc - fi - # Position Independent Code (PIC) support, for building relocatable # shared objects enabled gcc && enabled pic && check_add_cflags -fPIC diff --git a/configure b/configure index ae9bb5d0c..cf6a7c377 100755 --- a/configure +++ b/configure @@ -98,11 +98,11 @@ EOF # all_platforms is a list of all supported target platforms. Maintain # alphabetically by architecture, generic-gnu last. 
+all_platforms="${all_platforms} arm64-darwin-gcc" +all_platforms="${all_platforms} arm64-linux-gcc" all_platforms="${all_platforms} armv6-linux-rvct" all_platforms="${all_platforms} armv6-linux-gcc" all_platforms="${all_platforms} armv6-none-rvct" -all_platforms="${all_platforms} arm64-darwin-gcc" -all_platforms="${all_platforms} arm64-linux-gcc" all_platforms="${all_platforms} armv7-android-gcc" #neon Cortex-A8 all_platforms="${all_platforms} armv7-darwin-gcc" #neon Cortex-A8 all_platforms="${all_platforms} armv7-linux-rvct" #neon Cortex-A8 @@ -112,6 +112,7 @@ all_platforms="${all_platforms} armv7-win32-vs11" all_platforms="${all_platforms} armv7-win32-vs12" all_platforms="${all_platforms} armv7-win32-vs14" all_platforms="${all_platforms} armv7s-darwin-gcc" +all_platforms="${all_platforms} armv8-linux-gcc" all_platforms="${all_platforms} mips32-linux-gcc" all_platforms="${all_platforms} mips64-linux-gcc" all_platforms="${all_platforms} sparc-solaris-gcc" @@ -293,7 +294,6 @@ CONFIG_LIST=" install_bins install_libs install_srcs - use_x86inc debug gprof gcov @@ -355,7 +355,6 @@ CMDLINE_SELECT=" gprof gcov pic - use_x86inc optimizations ccache runtime_cpu_detect diff --git a/test/add_noise_test.cc b/test/add_noise_test.cc index e9945c409..35aaadfa7 100644 --- a/test/add_noise_test.cc +++ b/test/add_noise_test.cc @@ -13,6 +13,7 @@ #include "third_party/googletest/src/include/gtest/gtest.h" #include "./vpx_dsp_rtcd.h" #include "vpx/vpx_integer.h" +#include "vpx_dsp/postproc.h" #include "vpx_mem/vpx_mem.h" namespace { @@ -40,50 +41,6 @@ double stddev6(char a, char b, char c, char d, char e, char f) { return sqrt(v); } -// TODO(jimbankoski): The following 2 functions are duplicated in each codec. -// For now the vp9 one has been copied into the test as is. We should normalize -// these in vpx_dsp and not have 3 copies of these unless there is different -// noise we add for each codec. - -double gaussian(double sigma, double mu, double x) { - return 1 / (sigma * sqrt(2.0 * 3.14159265)) * - (exp(-(x - mu) * (x - mu) / (2 * sigma * sigma))); -} - -int setup_noise(int size_noise, char *noise) { - char char_dist[300]; - const int ai = 4; - const int qi = 24; - const double sigma = ai + .5 + .6 * (63 - qi) / 63.0; - - /* set up a lookup table of 256 entries that matches - * a gaussian distribution with sigma determined by q. - */ - int next = 0; - - for (int i = -32; i < 32; i++) { - int a_i = (int) (0.5 + 256 * gaussian(sigma, 0, i)); - - if (a_i) { - for (int j = 0; j < a_i; j++) { - char_dist[next + j] = (char)(i); - } - - next = next + a_i; - } - } - - for (; next < 256; next++) - char_dist[next] = 0; - - for (int i = 0; i < size_noise; i++) { - noise[i] = char_dist[rand() & 0xff]; // NOLINT - } - - // Returns the most negative value in distribution. - return char_dist[0]; -} - TEST_P(AddNoiseTest, CheckNoiseAdded) { DECLARE_ALIGNED(16, char, blackclamp[16]); DECLARE_ALIGNED(16, char, whiteclamp[16]); @@ -92,12 +49,12 @@ TEST_P(AddNoiseTest, CheckNoiseAdded) { const int height = 64; const int image_size = width * height; char noise[3072]; + const int clamp = vpx_setup_noise(4.4, sizeof(noise), noise); - const int clamp = setup_noise(3072, noise); for (int i = 0; i < 16; i++) { - blackclamp[i] = -clamp; - whiteclamp[i] = -clamp; - bothclamp[i] = -2 * clamp; + blackclamp[i] = clamp; + whiteclamp[i] = clamp; + bothclamp[i] = 2 * clamp; } uint8_t *const s = reinterpret_cast(vpx_calloc(image_size, 1)); @@ -127,7 +84,7 @@ TEST_P(AddNoiseTest, CheckNoiseAdded) { // Check to make sure don't roll over. 
for (int i = 0; i < image_size; ++i) { - EXPECT_GT((int)s[i], 10) << "i = " << i; + EXPECT_GT(static_cast(s[i]), clamp) << "i = " << i; } // Initialize pixels in the image to 0 and check for roll under. @@ -138,7 +95,7 @@ TEST_P(AddNoiseTest, CheckNoiseAdded) { // Check to make sure don't roll under. for (int i = 0; i < image_size; ++i) { - EXPECT_LT((int)s[i], 245) << "i = " << i; + EXPECT_LT(static_cast(s[i]), 255 - clamp) << "i = " << i; } vpx_free(s); @@ -153,11 +110,12 @@ TEST_P(AddNoiseTest, CheckCvsAssembly) { const int image_size = width * height; char noise[3072]; - const int clamp = setup_noise(3072, noise); + const int clamp = vpx_setup_noise(4.4, sizeof(noise), noise); + for (int i = 0; i < 16; i++) { - blackclamp[i] = -clamp; - whiteclamp[i] = -clamp; - bothclamp[i] = -2 * clamp; + blackclamp[i] = clamp; + whiteclamp[i] = clamp; + bothclamp[i] = 2 * clamp; } uint8_t *const s = reinterpret_cast(vpx_calloc(image_size, 1)); @@ -175,7 +133,7 @@ TEST_P(AddNoiseTest, CheckCvsAssembly) { width, height, width)); for (int i = 0; i < image_size; ++i) { - EXPECT_EQ((int)s[i], (int)d[i]) << "i = " << i; + EXPECT_EQ(static_cast(s[i]), static_cast(d[i])) << "i = " << i; } vpx_free(d); diff --git a/test/convolve_test.cc b/test/convolve_test.cc index 21f185a93..70802ecc5 100644 --- a/test/convolve_test.cc +++ b/test/convolve_test.cc @@ -453,7 +453,7 @@ class ConvolveTest : public ::testing::TestWithParam { memcpy(output_ref_, output_, kOutputBufferSize); #if CONFIG_VP9_HIGHBITDEPTH memcpy(output16_ref_, output16_, - kOutputBufferSize * sizeof(*output16_ref_)); + kOutputBufferSize * sizeof(output16_ref_[0])); #endif } @@ -465,41 +465,41 @@ class ConvolveTest : public ::testing::TestWithParam { } uint8_t *input() const { - const int index = BorderTop() * kOuterBlockSize + BorderLeft(); + const int offset = BorderTop() * kOuterBlockSize + BorderLeft(); #if CONFIG_VP9_HIGHBITDEPTH if (UUT_->use_highbd_ == 0) { - return input_ + index; + return input_ + offset; } else { - return CONVERT_TO_BYTEPTR(input16_) + index; + return CONVERT_TO_BYTEPTR(input16_) + offset; } #else - return input_ + index; + return input_ + offset; #endif } uint8_t *output() const { - const int index = BorderTop() * kOuterBlockSize + BorderLeft(); + const int offset = BorderTop() * kOuterBlockSize + BorderLeft(); #if CONFIG_VP9_HIGHBITDEPTH if (UUT_->use_highbd_ == 0) { - return output_ + index; + return output_ + offset; } else { - return CONVERT_TO_BYTEPTR(output16_ + index); + return CONVERT_TO_BYTEPTR(output16_) + offset; } #else - return output_ + index; + return output_ + offset; #endif } uint8_t *output_ref() const { - const int index = BorderTop() * kOuterBlockSize + BorderLeft(); + const int offset = BorderTop() * kOuterBlockSize + BorderLeft(); #if CONFIG_VP9_HIGHBITDEPTH if (UUT_->use_highbd_ == 0) { - return output_ref_ + index; + return output_ref_ + offset; } else { - return CONVERT_TO_BYTEPTR(output16_ref_ + index); + return CONVERT_TO_BYTEPTR(output16_ref_) + offset; } #else - return output_ref_ + index; + return output_ref_ + offset; #endif } @@ -1011,14 +1011,12 @@ void wrap_ ## func ## _ ## bd(const uint8_t *src, ptrdiff_t src_stride, \ w, h, bd); \ } #if HAVE_SSE2 && ARCH_X86_64 -#if CONFIG_USE_X86INC WRAP(convolve_copy_sse2, 8) WRAP(convolve_avg_sse2, 8) WRAP(convolve_copy_sse2, 10) WRAP(convolve_avg_sse2, 10) WRAP(convolve_copy_sse2, 12) WRAP(convolve_avg_sse2, 12) -#endif // CONFIG_USE_X86INC WRAP(convolve8_horiz_sse2, 8) WRAP(convolve8_avg_horiz_sse2, 8) WRAP(convolve8_vert_sse2, 8) @@ -1112,11 
+1110,7 @@ INSTANTIATE_TEST_CASE_P(C, ConvolveTest, #if HAVE_SSE2 && ARCH_X86_64 #if CONFIG_VP9_HIGHBITDEPTH const ConvolveFunctions convolve8_sse2( -#if CONFIG_USE_X86INC wrap_convolve_copy_sse2_8, wrap_convolve_avg_sse2_8, -#else - wrap_convolve_copy_c_8, wrap_convolve_avg_c_8, -#endif // CONFIG_USE_X86INC wrap_convolve8_horiz_sse2_8, wrap_convolve8_avg_horiz_sse2_8, wrap_convolve8_vert_sse2_8, wrap_convolve8_avg_vert_sse2_8, wrap_convolve8_sse2_8, wrap_convolve8_avg_sse2_8, @@ -1124,11 +1118,7 @@ const ConvolveFunctions convolve8_sse2( wrap_convolve8_vert_sse2_8, wrap_convolve8_avg_vert_sse2_8, wrap_convolve8_sse2_8, wrap_convolve8_avg_sse2_8, 8); const ConvolveFunctions convolve10_sse2( -#if CONFIG_USE_X86INC wrap_convolve_copy_sse2_10, wrap_convolve_avg_sse2_10, -#else - wrap_convolve_copy_c_10, wrap_convolve_avg_c_10, -#endif // CONFIG_USE_X86INC wrap_convolve8_horiz_sse2_10, wrap_convolve8_avg_horiz_sse2_10, wrap_convolve8_vert_sse2_10, wrap_convolve8_avg_vert_sse2_10, wrap_convolve8_sse2_10, wrap_convolve8_avg_sse2_10, @@ -1136,11 +1126,7 @@ const ConvolveFunctions convolve10_sse2( wrap_convolve8_vert_sse2_10, wrap_convolve8_avg_vert_sse2_10, wrap_convolve8_sse2_10, wrap_convolve8_avg_sse2_10, 10); const ConvolveFunctions convolve12_sse2( -#if CONFIG_USE_X86INC wrap_convolve_copy_sse2_12, wrap_convolve_avg_sse2_12, -#else - wrap_convolve_copy_c_12, wrap_convolve_avg_c_12, -#endif // CONFIG_USE_X86INC wrap_convolve8_horiz_sse2_12, wrap_convolve8_avg_horiz_sse2_12, wrap_convolve8_vert_sse2_12, wrap_convolve8_avg_vert_sse2_12, wrap_convolve8_sse2_12, wrap_convolve8_avg_sse2_12, @@ -1154,11 +1140,7 @@ const ConvolveParam kArrayConvolve_sse2[] = { }; #else const ConvolveFunctions convolve8_sse2( -#if CONFIG_USE_X86INC vpx_convolve_copy_sse2, vpx_convolve_avg_sse2, -#else - vpx_convolve_copy_c, vpx_convolve_avg_c, -#endif // CONFIG_USE_X86INC vpx_convolve8_horiz_sse2, vpx_convolve8_avg_horiz_sse2, vpx_convolve8_vert_sse2, vpx_convolve8_avg_vert_sse2, vpx_convolve8_sse2, vpx_convolve8_avg_sse2, diff --git a/test/cx_set_ref.sh b/test/cx_set_ref.sh index c21894eaf..b319bbffb 100755 --- a/test/cx_set_ref.sh +++ b/test/cx_set_ref.sh @@ -1,6 +1,6 @@ #!/bin/sh ## -## Copyright (c) 2014 The WebM project authors. All Rights Reserved. +## Copyright (c) 2016 The WebM project authors. All Rights Reserved. 
## ## Use of this source code is governed by a BSD-style license ## that can be found in the LICENSE file in the root of the source diff --git a/test/datarate_test.cc b/test/datarate_test.cc index 2f1db9c64..220cbf3a3 100644 --- a/test/datarate_test.cc +++ b/test/datarate_test.cc @@ -135,7 +135,7 @@ class DatarateTestLarge : public ::libvpx_test::EncoderTest, double duration_; double file_datarate_; double effective_datarate_; - size_t bits_in_last_frame_; + int64_t bits_in_last_frame_; int denoiser_on_; int denoiser_offon_test_; int denoiser_offon_period_; diff --git a/test/fdct4x4_test.cc b/test/fdct4x4_test.cc index f6b65676e..236f75e3b 100644 --- a/test/fdct4x4_test.cc +++ b/test/fdct4x4_test.cc @@ -302,7 +302,7 @@ INSTANTIATE_TEST_CASE_P( make_tuple(&vp9_fht4x4_c, &vp9_iht4x4_16_add_neon, 3, VPX_BITS_8, 16))); #endif // HAVE_NEON && !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE -#if CONFIG_USE_X86INC && HAVE_SSE2 && !CONFIG_EMULATE_HARDWARE +#if HAVE_SSE2 && !CONFIG_EMULATE_HARDWARE INSTANTIATE_TEST_CASE_P( SSE2, Trans4x4WHT, ::testing::Values( diff --git a/test/fdct8x8_test.cc b/test/fdct8x8_test.cc index 29f215817..083ee6628 100644 --- a/test/fdct8x8_test.cc +++ b/test/fdct8x8_test.cc @@ -766,7 +766,7 @@ INSTANTIATE_TEST_CASE_P( &idct8x8_64_add_12_sse2, 6225, VPX_BITS_12))); #endif // HAVE_SSE2 && CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE -#if HAVE_SSSE3 && CONFIG_USE_X86INC && ARCH_X86_64 && \ +#if HAVE_SSSE3 && ARCH_X86_64 && \ !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE INSTANTIATE_TEST_CASE_P( SSSE3, FwdTrans8x8DCT, diff --git a/test/hadamard_test.cc b/test/hadamard_test.cc index 7a5bd5b4c..b8eec523f 100644 --- a/test/hadamard_test.cc +++ b/test/hadamard_test.cc @@ -152,10 +152,10 @@ INSTANTIATE_TEST_CASE_P(SSE2, Hadamard8x8Test, ::testing::Values(&vpx_hadamard_8x8_sse2)); #endif // HAVE_SSE2 -#if HAVE_SSSE3 && CONFIG_USE_X86INC && ARCH_X86_64 +#if HAVE_SSSE3 && ARCH_X86_64 INSTANTIATE_TEST_CASE_P(SSSE3, Hadamard8x8Test, ::testing::Values(&vpx_hadamard_8x8_ssse3)); -#endif // HAVE_SSSE3 && CONFIG_USE_X86INC && ARCH_X86_64 +#endif // HAVE_SSSE3 && ARCH_X86_64 #if HAVE_NEON INSTANTIATE_TEST_CASE_P(NEON, Hadamard8x8Test, diff --git a/test/partial_idct_test.cc b/test/partial_idct_test.cc index 6c824128b..1efb1a4eb 100644 --- a/test/partial_idct_test.cc +++ b/test/partial_idct_test.cc @@ -295,7 +295,7 @@ INSTANTIATE_TEST_CASE_P( TX_4X4, 1))); #endif -#if HAVE_SSSE3 && CONFIG_USE_X86INC && ARCH_X86_64 && \ +#if HAVE_SSSE3 && ARCH_X86_64 && \ !CONFIG_VP9_HIGHBITDEPTH && !CONFIG_EMULATE_HARDWARE INSTANTIATE_TEST_CASE_P( SSSE3_64, PartialIDctTest, diff --git a/test/pp_filter_test.cc b/test/pp_filter_test.cc index e4688dd8c..89349e48b 100644 --- a/test/pp_filter_test.cc +++ b/test/pp_filter_test.cc @@ -11,7 +11,7 @@ #include "test/register_state_check.h" #include "third_party/googletest/src/include/gtest/gtest.h" #include "./vpx_config.h" -#include "./vp8_rtcd.h" +#include "./vpx_dsp_rtcd.h" #include "vpx/vpx_integer.h" #include "vpx_mem/vpx_mem.h" @@ -25,7 +25,7 @@ typedef void (*PostProcFunc)(unsigned char *src_ptr, namespace { -class VP8PostProcessingFilterTest +class VPxPostProcessingFilterTest : public ::testing::TestWithParam { public: virtual void TearDown() { @@ -33,10 +33,10 @@ class VP8PostProcessingFilterTest } }; -// Test routine for the VP8 post-processing function -// vp8_post_proc_down_and_across_mb_row_c. +// Test routine for the VPx post-processing function +// vpx_post_proc_down_and_across_mb_row_c. 
-TEST_P(VP8PostProcessingFilterTest, FilterOutputCheck) { +TEST_P(VPxPostProcessingFilterTest, FilterOutputCheck) { // Size of the underlying data block that will be filtered. const int block_width = 16; const int block_height = 16; @@ -92,7 +92,7 @@ TEST_P(VP8PostProcessingFilterTest, FilterOutputCheck) { for (int i = 0; i < block_height; ++i) { for (int j = 0; j < block_width; ++j) { EXPECT_EQ(expected_data[i], pixel_ptr[j]) - << "VP8PostProcessingFilterTest failed with invalid filter output"; + << "VPxPostProcessingFilterTest failed with invalid filter output"; } pixel_ptr += output_stride; } @@ -102,17 +102,17 @@ TEST_P(VP8PostProcessingFilterTest, FilterOutputCheck) { vpx_free(flimits); }; -INSTANTIATE_TEST_CASE_P(C, VP8PostProcessingFilterTest, - ::testing::Values(vp8_post_proc_down_and_across_mb_row_c)); +INSTANTIATE_TEST_CASE_P(C, VPxPostProcessingFilterTest, + ::testing::Values(vpx_post_proc_down_and_across_mb_row_c)); #if HAVE_SSE2 -INSTANTIATE_TEST_CASE_P(SSE2, VP8PostProcessingFilterTest, - ::testing::Values(vp8_post_proc_down_and_across_mb_row_sse2)); +INSTANTIATE_TEST_CASE_P(SSE2, VPxPostProcessingFilterTest, + ::testing::Values(vpx_post_proc_down_and_across_mb_row_sse2)); #endif #if HAVE_MSA -INSTANTIATE_TEST_CASE_P(MSA, VP8PostProcessingFilterTest, - ::testing::Values(vp8_post_proc_down_and_across_mb_row_msa)); +INSTANTIATE_TEST_CASE_P(MSA, VPxPostProcessingFilterTest, + ::testing::Values(vpx_post_proc_down_and_across_mb_row_msa)); #endif } // namespace diff --git a/test/sad_test.cc b/test/sad_test.cc index f27729452..36f777d9e 100644 --- a/test/sad_test.cc +++ b/test/sad_test.cc @@ -750,7 +750,6 @@ INSTANTIATE_TEST_CASE_P(NEON, SADx4Test, ::testing::ValuesIn(x4d_neon_tests)); //------------------------------------------------------------------------------ // x86 functions #if HAVE_SSE2 -#if CONFIG_USE_X86INC const SadMxNParam sse2_tests[] = { #if CONFIG_VP10 && CONFIG_EXT_PARTITION make_tuple(128, 128, &vpx_sad128x128_sse2, -1), @@ -927,7 +926,6 @@ const SadMxNx4Param x4d_sse2_tests[] = { #endif // CONFIG_VP9_HIGHBITDEPTH }; INSTANTIATE_TEST_CASE_P(SSE2, SADx4Test, ::testing::ValuesIn(x4d_sse2_tests)); -#endif // CONFIG_USE_X86INC #endif // HAVE_SSE2 #if HAVE_SSE3 diff --git a/test/test_intra_pred_speed.cc b/test/test_intra_pred_speed.cc index 2acf744d5..8928bf87c 100644 --- a/test/test_intra_pred_speed.cc +++ b/test/test_intra_pred_speed.cc @@ -187,21 +187,21 @@ INTRA_PRED_TEST(C, TestIntraPred4, vpx_dc_predictor_4x4_c, vpx_d153_predictor_4x4_c, vpx_d207_predictor_4x4_c, vpx_d63_predictor_4x4_c, vpx_tm_predictor_4x4_c) -#if HAVE_SSE2 && CONFIG_USE_X86INC +#if HAVE_SSE2 INTRA_PRED_TEST(SSE2, TestIntraPred4, vpx_dc_predictor_4x4_sse2, vpx_dc_left_predictor_4x4_sse2, vpx_dc_top_predictor_4x4_sse2, vpx_dc_128_predictor_4x4_sse2, vpx_v_predictor_4x4_sse2, vpx_h_predictor_4x4_sse2, vpx_d45_predictor_4x4_sse2, NULL, NULL, NULL, vpx_d207_predictor_4x4_sse2, NULL, vpx_tm_predictor_4x4_sse2) -#endif // HAVE_SSE2 && CONFIG_USE_X86INC +#endif // HAVE_SSE2 -#if HAVE_SSSE3 && CONFIG_USE_X86INC +#if HAVE_SSSE3 INTRA_PRED_TEST(SSSE3, TestIntraPred4, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, vpx_d153_predictor_4x4_ssse3, NULL, vpx_d63_predictor_4x4_ssse3, NULL) -#endif // HAVE_SSSE3 && CONFIG_USE_X86INC +#endif // HAVE_SSSE3 #if HAVE_DSPR2 INTRA_PRED_TEST(DSPR2, TestIntraPred4, vpx_dc_predictor_4x4_dspr2, NULL, NULL, @@ -237,20 +237,20 @@ INTRA_PRED_TEST(C, TestIntraPred8, vpx_dc_predictor_8x8_c, vpx_d153_predictor_8x8_c, vpx_d207_predictor_8x8_c, vpx_d63_predictor_8x8_c, 
vpx_tm_predictor_8x8_c) -#if HAVE_SSE2 && CONFIG_USE_X86INC +#if HAVE_SSE2 INTRA_PRED_TEST(SSE2, TestIntraPred8, vpx_dc_predictor_8x8_sse2, vpx_dc_left_predictor_8x8_sse2, vpx_dc_top_predictor_8x8_sse2, vpx_dc_128_predictor_8x8_sse2, vpx_v_predictor_8x8_sse2, vpx_h_predictor_8x8_sse2, vpx_d45_predictor_8x8_sse2, NULL, NULL, NULL, NULL, NULL, vpx_tm_predictor_8x8_sse2) -#endif // HAVE_SSE2 && CONFIG_USE_X86INC +#endif // HAVE_SSE2 -#if HAVE_SSSE3 && CONFIG_USE_X86INC +#if HAVE_SSSE3 INTRA_PRED_TEST(SSSE3, TestIntraPred8, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, vpx_d153_predictor_8x8_ssse3, vpx_d207_predictor_8x8_ssse3, vpx_d63_predictor_8x8_ssse3, NULL) -#endif // HAVE_SSSE3 && CONFIG_USE_X86INC +#endif // HAVE_SSSE3 #if HAVE_DSPR2 INTRA_PRED_TEST(DSPR2, TestIntraPred8, vpx_dc_predictor_8x8_dspr2, NULL, NULL, @@ -286,22 +286,22 @@ INTRA_PRED_TEST(C, TestIntraPred16, vpx_dc_predictor_16x16_c, vpx_d153_predictor_16x16_c, vpx_d207_predictor_16x16_c, vpx_d63_predictor_16x16_c, vpx_tm_predictor_16x16_c) -#if HAVE_SSE2 && CONFIG_USE_X86INC +#if HAVE_SSE2 INTRA_PRED_TEST(SSE2, TestIntraPred16, vpx_dc_predictor_16x16_sse2, vpx_dc_left_predictor_16x16_sse2, vpx_dc_top_predictor_16x16_sse2, vpx_dc_128_predictor_16x16_sse2, vpx_v_predictor_16x16_sse2, vpx_h_predictor_16x16_sse2, NULL, NULL, NULL, NULL, NULL, NULL, vpx_tm_predictor_16x16_sse2) -#endif // HAVE_SSE2 && CONFIG_USE_X86INC +#endif // HAVE_SSE2 -#if HAVE_SSSE3 && CONFIG_USE_X86INC +#if HAVE_SSSE3 INTRA_PRED_TEST(SSSE3, TestIntraPred16, NULL, NULL, NULL, NULL, NULL, NULL, vpx_d45_predictor_16x16_ssse3, NULL, NULL, vpx_d153_predictor_16x16_ssse3, vpx_d207_predictor_16x16_ssse3, vpx_d63_predictor_16x16_ssse3, NULL) -#endif // HAVE_SSSE3 && CONFIG_USE_X86INC +#endif // HAVE_SSSE3 #if HAVE_DSPR2 INTRA_PRED_TEST(DSPR2, TestIntraPred16, vpx_dc_predictor_16x16_dspr2, NULL, @@ -337,21 +337,21 @@ INTRA_PRED_TEST(C, TestIntraPred32, vpx_dc_predictor_32x32_c, vpx_d153_predictor_32x32_c, vpx_d207_predictor_32x32_c, vpx_d63_predictor_32x32_c, vpx_tm_predictor_32x32_c) -#if HAVE_SSE2 && CONFIG_USE_X86INC +#if HAVE_SSE2 INTRA_PRED_TEST(SSE2, TestIntraPred32, vpx_dc_predictor_32x32_sse2, vpx_dc_left_predictor_32x32_sse2, vpx_dc_top_predictor_32x32_sse2, vpx_dc_128_predictor_32x32_sse2, vpx_v_predictor_32x32_sse2, vpx_h_predictor_32x32_sse2, NULL, NULL, NULL, NULL, NULL, NULL, vpx_tm_predictor_32x32_sse2) -#endif // HAVE_SSE2 && CONFIG_USE_X86INC +#endif // HAVE_SSE2 -#if HAVE_SSSE3 && CONFIG_USE_X86INC +#if HAVE_SSSE3 INTRA_PRED_TEST(SSSE3, TestIntraPred32, NULL, NULL, NULL, NULL, NULL, NULL, vpx_d45_predictor_32x32_ssse3, NULL, NULL, vpx_d153_predictor_32x32_ssse3, vpx_d207_predictor_32x32_ssse3, vpx_d63_predictor_32x32_ssse3, NULL) -#endif // HAVE_SSSE3 && CONFIG_USE_X86INC +#endif // HAVE_SSSE3 #if HAVE_NEON INTRA_PRED_TEST(NEON, TestIntraPred32, vpx_dc_predictor_32x32_neon, diff --git a/test/variance_test.cc b/test/variance_test.cc index 7eaed271e..657474945 100644 --- a/test/variance_test.cc +++ b/test/variance_test.cc @@ -217,7 +217,6 @@ class VarianceTest : public ::testing::TestWithParam > { public: - typedef tuple ParamType; virtual void SetUp() { const tuple& params = this->GetParam(); log2width_ = get<0>(params); @@ -766,77 +765,53 @@ INSTANTIATE_TEST_CASE_P(C, VpxMseTest, make_tuple(3, 4, &vpx_mse8x16_c), make_tuple(3, 3, &vpx_mse8x8_c))); -const VpxVarianceTest::ParamType kArrayVariance_c[] = { -#if CONFIG_VP10 && CONFIG_EXT_PARTITION - make_tuple(7, 7, &vpx_variance128x128_c, 0), - make_tuple(7, 6, &vpx_variance128x64_c, 0), - 
make_tuple(6, 7, &vpx_variance64x128_c, 0), -#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION - make_tuple(6, 6, &vpx_variance64x64_c, 0), - make_tuple(6, 5, &vpx_variance64x32_c, 0), - make_tuple(5, 6, &vpx_variance32x64_c, 0), - make_tuple(5, 5, &vpx_variance32x32_c, 0), - make_tuple(5, 4, &vpx_variance32x16_c, 0), - make_tuple(4, 5, &vpx_variance16x32_c, 0), - make_tuple(4, 4, &vpx_variance16x16_c, 0), - make_tuple(4, 3, &vpx_variance16x8_c, 0), - make_tuple(3, 4, &vpx_variance8x16_c, 0), - make_tuple(3, 3, &vpx_variance8x8_c, 0), - make_tuple(3, 2, &vpx_variance8x4_c, 0), - make_tuple(2, 3, &vpx_variance4x8_c, 0), - make_tuple(2, 2, &vpx_variance4x4_c, 0) -}; INSTANTIATE_TEST_CASE_P( C, VpxVarianceTest, - ::testing::ValuesIn(kArrayVariance_c)); + ::testing::Values(make_tuple(6, 6, &vpx_variance64x64_c, 0), + make_tuple(6, 5, &vpx_variance64x32_c, 0), + make_tuple(5, 6, &vpx_variance32x64_c, 0), + make_tuple(5, 5, &vpx_variance32x32_c, 0), + make_tuple(5, 4, &vpx_variance32x16_c, 0), + make_tuple(4, 5, &vpx_variance16x32_c, 0), + make_tuple(4, 4, &vpx_variance16x16_c, 0), + make_tuple(4, 3, &vpx_variance16x8_c, 0), + make_tuple(3, 4, &vpx_variance8x16_c, 0), + make_tuple(3, 3, &vpx_variance8x8_c, 0), + make_tuple(3, 2, &vpx_variance8x4_c, 0), + make_tuple(2, 3, &vpx_variance4x8_c, 0), + make_tuple(2, 2, &vpx_variance4x4_c, 0))); -const VpxSubpelVarianceTest::ParamType kArraySubpelVariance_c[] = { -#if CONFIG_VP10 && CONFIG_EXT_PARTITION - make_tuple(7, 7, &vpx_sub_pixel_variance128x128_c, 0), - make_tuple(7, 6, &vpx_sub_pixel_variance128x64_c, 0), - make_tuple(6, 7, &vpx_sub_pixel_variance64x128_c, 0), -#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION - make_tuple(6, 6, &vpx_sub_pixel_variance64x64_c, 0), - make_tuple(6, 5, &vpx_sub_pixel_variance64x32_c, 0), - make_tuple(5, 6, &vpx_sub_pixel_variance32x64_c, 0), - make_tuple(5, 5, &vpx_sub_pixel_variance32x32_c, 0), - make_tuple(5, 4, &vpx_sub_pixel_variance32x16_c, 0), - make_tuple(4, 5, &vpx_sub_pixel_variance16x32_c, 0), - make_tuple(4, 4, &vpx_sub_pixel_variance16x16_c, 0), - make_tuple(4, 3, &vpx_sub_pixel_variance16x8_c, 0), - make_tuple(3, 4, &vpx_sub_pixel_variance8x16_c, 0), - make_tuple(3, 3, &vpx_sub_pixel_variance8x8_c, 0), - make_tuple(3, 2, &vpx_sub_pixel_variance8x4_c, 0), - make_tuple(2, 3, &vpx_sub_pixel_variance4x8_c, 0), - make_tuple(2, 2, &vpx_sub_pixel_variance4x4_c, 0) -}; INSTANTIATE_TEST_CASE_P( C, VpxSubpelVarianceTest, - ::testing::ValuesIn(kArraySubpelVariance_c)); + ::testing::Values(make_tuple(6, 6, &vpx_sub_pixel_variance64x64_c, 0), + make_tuple(6, 5, &vpx_sub_pixel_variance64x32_c, 0), + make_tuple(5, 6, &vpx_sub_pixel_variance32x64_c, 0), + make_tuple(5, 5, &vpx_sub_pixel_variance32x32_c, 0), + make_tuple(5, 4, &vpx_sub_pixel_variance32x16_c, 0), + make_tuple(4, 5, &vpx_sub_pixel_variance16x32_c, 0), + make_tuple(4, 4, &vpx_sub_pixel_variance16x16_c, 0), + make_tuple(4, 3, &vpx_sub_pixel_variance16x8_c, 0), + make_tuple(3, 4, &vpx_sub_pixel_variance8x16_c, 0), + make_tuple(3, 3, &vpx_sub_pixel_variance8x8_c, 0), + make_tuple(3, 2, &vpx_sub_pixel_variance8x4_c, 0), + make_tuple(2, 3, &vpx_sub_pixel_variance4x8_c, 0), + make_tuple(2, 2, &vpx_sub_pixel_variance4x4_c, 0))); -const VpxSubpelAvgVarianceTest::ParamType kArraySubpelAvgVariance_c[] = { -#if CONFIG_VP10 && CONFIG_EXT_PARTITION - make_tuple(7, 7, &vpx_sub_pixel_avg_variance128x128_c, 0), - make_tuple(7, 6, &vpx_sub_pixel_avg_variance128x64_c, 0), - make_tuple(6, 7, &vpx_sub_pixel_avg_variance64x128_c, 0), -#endif // CONFIG_VP10 && CONFIG_EXT_PARTITION - 
make_tuple(6, 6, &vpx_sub_pixel_avg_variance64x64_c, 0), - make_tuple(6, 5, &vpx_sub_pixel_avg_variance64x32_c, 0), - make_tuple(5, 6, &vpx_sub_pixel_avg_variance32x64_c, 0), - make_tuple(5, 5, &vpx_sub_pixel_avg_variance32x32_c, 0), - make_tuple(5, 4, &vpx_sub_pixel_avg_variance32x16_c, 0), - make_tuple(4, 5, &vpx_sub_pixel_avg_variance16x32_c, 0), - make_tuple(4, 4, &vpx_sub_pixel_avg_variance16x16_c, 0), - make_tuple(4, 3, &vpx_sub_pixel_avg_variance16x8_c, 0), - make_tuple(3, 4, &vpx_sub_pixel_avg_variance8x16_c, 0), - make_tuple(3, 3, &vpx_sub_pixel_avg_variance8x8_c, 0), - make_tuple(3, 2, &vpx_sub_pixel_avg_variance8x4_c, 0), - make_tuple(2, 3, &vpx_sub_pixel_avg_variance4x8_c, 0), - make_tuple(2, 2, &vpx_sub_pixel_avg_variance4x4_c, 0) -}; INSTANTIATE_TEST_CASE_P( C, VpxSubpelAvgVarianceTest, - ::testing::ValuesIn(kArraySubpelAvgVariance_c)); + ::testing::Values(make_tuple(6, 6, &vpx_sub_pixel_avg_variance64x64_c, 0), + make_tuple(6, 5, &vpx_sub_pixel_avg_variance64x32_c, 0), + make_tuple(5, 6, &vpx_sub_pixel_avg_variance32x64_c, 0), + make_tuple(5, 5, &vpx_sub_pixel_avg_variance32x32_c, 0), + make_tuple(5, 4, &vpx_sub_pixel_avg_variance32x16_c, 0), + make_tuple(4, 5, &vpx_sub_pixel_avg_variance16x32_c, 0), + make_tuple(4, 4, &vpx_sub_pixel_avg_variance16x16_c, 0), + make_tuple(4, 3, &vpx_sub_pixel_avg_variance16x8_c, 0), + make_tuple(3, 4, &vpx_sub_pixel_avg_variance8x16_c, 0), + make_tuple(3, 3, &vpx_sub_pixel_avg_variance8x8_c, 0), + make_tuple(3, 2, &vpx_sub_pixel_avg_variance8x4_c, 0), + make_tuple(2, 3, &vpx_sub_pixel_avg_variance4x8_c, 0), + make_tuple(2, 2, &vpx_sub_pixel_avg_variance4x4_c, 0))); #if CONFIG_VP9_HIGHBITDEPTH typedef MseTest VpxHBDMseTest; @@ -872,73 +847,70 @@ INSTANTIATE_TEST_CASE_P( make_tuple(4, 4, &vpx_highbd_8_mse8x8_c))); */ -const VpxHBDVarianceTest::ParamType kArrayHBDVariance_c[] = { +INSTANTIATE_TEST_CASE_P( + C, VpxHBDVarianceTest, + ::testing::Values( #if CONFIG_VP10 && CONFIG_EXT_PARTITION - make_tuple(7, 7, &vpx_highbd_12_variance128x128_c, 12), - make_tuple(7, 6, &vpx_highbd_12_variance128x64_c, 12), - make_tuple(6, 7, &vpx_highbd_12_variance64x128_c, 12), + make_tuple(7, 7, &vpx_highbd_12_variance128x128_c, 12), + make_tuple(7, 6, &vpx_highbd_12_variance128x64_c, 12), + make_tuple(6, 7, &vpx_highbd_12_variance64x128_c, 12), #endif // CONFIG_VP10 && CONFIG_EXT_PARTITION - make_tuple(6, 6, &vpx_highbd_12_variance64x64_c, 12), - make_tuple(6, 5, &vpx_highbd_12_variance64x32_c, 12), - make_tuple(5, 6, &vpx_highbd_12_variance32x64_c, 12), - make_tuple(5, 5, &vpx_highbd_12_variance32x32_c, 12), - make_tuple(5, 4, &vpx_highbd_12_variance32x16_c, 12), - make_tuple(4, 5, &vpx_highbd_12_variance16x32_c, 12), - make_tuple(4, 4, &vpx_highbd_12_variance16x16_c, 12), - make_tuple(4, 3, &vpx_highbd_12_variance16x8_c, 12), - make_tuple(3, 4, &vpx_highbd_12_variance8x16_c, 12), - make_tuple(3, 3, &vpx_highbd_12_variance8x8_c, 12), - make_tuple(3, 2, &vpx_highbd_12_variance8x4_c, 12), - make_tuple(2, 3, &vpx_highbd_12_variance4x8_c, 12), - make_tuple(2, 2, &vpx_highbd_12_variance4x4_c, 12), + make_tuple(6, 6, &vpx_highbd_12_variance64x64_c, 12), + make_tuple(6, 5, &vpx_highbd_12_variance64x32_c, 12), + make_tuple(5, 6, &vpx_highbd_12_variance32x64_c, 12), + make_tuple(5, 5, &vpx_highbd_12_variance32x32_c, 12), + make_tuple(5, 4, &vpx_highbd_12_variance32x16_c, 12), + make_tuple(4, 5, &vpx_highbd_12_variance16x32_c, 12), + make_tuple(4, 4, &vpx_highbd_12_variance16x16_c, 12), + make_tuple(4, 3, &vpx_highbd_12_variance16x8_c, 12), + make_tuple(3, 4, 
&vpx_highbd_12_variance8x16_c, 12), + make_tuple(3, 3, &vpx_highbd_12_variance8x8_c, 12), + make_tuple(3, 2, &vpx_highbd_12_variance8x4_c, 12), + make_tuple(2, 3, &vpx_highbd_12_variance4x8_c, 12), + make_tuple(2, 2, &vpx_highbd_12_variance4x4_c, 12), #if CONFIG_VP10 && CONFIG_EXT_PARTITION - make_tuple(7, 7, &vpx_highbd_10_variance128x128_c, 10), - make_tuple(7, 6, &vpx_highbd_10_variance128x64_c, 10), - make_tuple(6, 7, &vpx_highbd_10_variance64x128_c, 10), + make_tuple(7, 7, &vpx_highbd_10_variance128x128_c, 10), + make_tuple(7, 6, &vpx_highbd_10_variance128x64_c, 10), + make_tuple(6, 7, &vpx_highbd_10_variance64x128_c, 10), #endif // CONFIG_VP10 && CONFIG_EXT_PARTITION - make_tuple(6, 6, &vpx_highbd_10_variance64x64_c, 10), - make_tuple(6, 5, &vpx_highbd_10_variance64x32_c, 10), - make_tuple(5, 6, &vpx_highbd_10_variance32x64_c, 10), - make_tuple(5, 5, &vpx_highbd_10_variance32x32_c, 10), - make_tuple(5, 4, &vpx_highbd_10_variance32x16_c, 10), - make_tuple(4, 5, &vpx_highbd_10_variance16x32_c, 10), - make_tuple(4, 4, &vpx_highbd_10_variance16x16_c, 10), - make_tuple(4, 3, &vpx_highbd_10_variance16x8_c, 10), - make_tuple(3, 4, &vpx_highbd_10_variance8x16_c, 10), - make_tuple(3, 3, &vpx_highbd_10_variance8x8_c, 10), - make_tuple(3, 2, &vpx_highbd_10_variance8x4_c, 10), - make_tuple(2, 3, &vpx_highbd_10_variance4x8_c, 10), - make_tuple(2, 2, &vpx_highbd_10_variance4x4_c, 10), + make_tuple(6, 6, &vpx_highbd_10_variance64x64_c, 10), + make_tuple(6, 5, &vpx_highbd_10_variance64x32_c, 10), + make_tuple(5, 6, &vpx_highbd_10_variance32x64_c, 10), + make_tuple(5, 5, &vpx_highbd_10_variance32x32_c, 10), + make_tuple(5, 4, &vpx_highbd_10_variance32x16_c, 10), + make_tuple(4, 5, &vpx_highbd_10_variance16x32_c, 10), + make_tuple(4, 4, &vpx_highbd_10_variance16x16_c, 10), + make_tuple(4, 3, &vpx_highbd_10_variance16x8_c, 10), + make_tuple(3, 4, &vpx_highbd_10_variance8x16_c, 10), + make_tuple(3, 3, &vpx_highbd_10_variance8x8_c, 10), + make_tuple(3, 2, &vpx_highbd_10_variance8x4_c, 10), + make_tuple(2, 3, &vpx_highbd_10_variance4x8_c, 10), + make_tuple(2, 2, &vpx_highbd_10_variance4x4_c, 10), #if CONFIG_VP10 && CONFIG_EXT_PARTITION - make_tuple(7, 7, &vpx_highbd_8_variance128x128_c, 8), - make_tuple(7, 6, &vpx_highbd_8_variance128x64_c, 8), - make_tuple(6, 7, &vpx_highbd_8_variance64x128_c, 8), + make_tuple(7, 7, &vpx_highbd_8_variance128x128_c, 8), + make_tuple(7, 6, &vpx_highbd_8_variance128x64_c, 8), + make_tuple(6, 7, &vpx_highbd_8_variance64x128_c, 8), #endif // CONFIG_VP10 && CONFIG_EXT_PARTITION - make_tuple(6, 6, &vpx_highbd_8_variance64x64_c, 8), - make_tuple(6, 5, &vpx_highbd_8_variance64x32_c, 8), - make_tuple(5, 6, &vpx_highbd_8_variance32x64_c, 8), - make_tuple(5, 5, &vpx_highbd_8_variance32x32_c, 8), - make_tuple(5, 4, &vpx_highbd_8_variance32x16_c, 8), - make_tuple(4, 5, &vpx_highbd_8_variance16x32_c, 8), - make_tuple(4, 4, &vpx_highbd_8_variance16x16_c, 8), - make_tuple(4, 3, &vpx_highbd_8_variance16x8_c, 8), - make_tuple(3, 4, &vpx_highbd_8_variance8x16_c, 8), - make_tuple(3, 3, &vpx_highbd_8_variance8x8_c, 8), - make_tuple(3, 2, &vpx_highbd_8_variance8x4_c, 8), - make_tuple(2, 3, &vpx_highbd_8_variance4x8_c, 8), - make_tuple(2, 2, &vpx_highbd_8_variance4x4_c, 8) -}; -INSTANTIATE_TEST_CASE_P( - C, VpxHBDVarianceTest, - ::testing::ValuesIn(kArrayHBDVariance_c)); + make_tuple(6, 6, &vpx_highbd_8_variance64x64_c, 8), + make_tuple(6, 5, &vpx_highbd_8_variance64x32_c, 8), + make_tuple(5, 6, &vpx_highbd_8_variance32x64_c, 8), + make_tuple(5, 5, &vpx_highbd_8_variance32x32_c, 8), + 
make_tuple(5, 4, &vpx_highbd_8_variance32x16_c, 8), + make_tuple(4, 5, &vpx_highbd_8_variance16x32_c, 8), + make_tuple(4, 4, &vpx_highbd_8_variance16x16_c, 8), + make_tuple(4, 3, &vpx_highbd_8_variance16x8_c, 8), + make_tuple(3, 4, &vpx_highbd_8_variance8x16_c, 8), + make_tuple(3, 3, &vpx_highbd_8_variance8x8_c, 8), + make_tuple(3, 2, &vpx_highbd_8_variance8x4_c, 8), + make_tuple(2, 3, &vpx_highbd_8_variance4x8_c, 8), + make_tuple(2, 2, &vpx_highbd_8_variance4x4_c, 8))); #if HAVE_SSE4_1 && CONFIG_VP9_HIGHBITDEPTH INSTANTIATE_TEST_CASE_P( SSE4_1, VpxHBDVarianceTest, - ::testing::Values( - make_tuple(2, 2, &vpx_highbd_8_variance4x4_sse4_1, 8), - make_tuple(2, 2, &vpx_highbd_10_variance4x4_sse4_1, 10), - make_tuple(2, 2, &vpx_highbd_12_variance4x4_sse4_1, 12))); + ::testing::Values(make_tuple(2, 2, &vpx_highbd_8_variance4x4_sse4_1, 8), + make_tuple(2, 2, &vpx_highbd_10_variance4x4_sse4_1, 10), + make_tuple(2, 2, &vpx_highbd_12_variance4x4_sse4_1, 12))); #endif // HAVE_SSE4_1 && CONFIG_VP9_HIGHBITDEPTH const VpxHBDSubpelVarianceTest::ParamType kArrayHBDSubpelVariance_c[] = { @@ -995,7 +967,7 @@ const VpxHBDSubpelVarianceTest::ParamType kArrayHBDSubpelVariance_c[] = { make_tuple(3, 3, &vpx_highbd_12_sub_pixel_variance8x8_c, 12), make_tuple(3, 2, &vpx_highbd_12_sub_pixel_variance8x4_c, 12), make_tuple(2, 3, &vpx_highbd_12_sub_pixel_variance4x8_c, 12), - make_tuple(2, 2, &vpx_highbd_12_sub_pixel_variance4x4_c, 12) + make_tuple(2, 2, &vpx_highbd_12_sub_pixel_variance4x4_c, 12), }; INSTANTIATE_TEST_CASE_P( C, VpxHBDSubpelVarianceTest, @@ -1088,7 +1060,6 @@ INSTANTIATE_TEST_CASE_P( make_tuple(2, 3, &vpx_variance4x8_sse2, 0), make_tuple(2, 2, &vpx_variance4x4_sse2, 0))); -#if CONFIG_USE_X86INC INSTANTIATE_TEST_CASE_P( SSE2, VpxSubpelVarianceTest, ::testing::Values(make_tuple(6, 6, &vpx_sub_pixel_variance64x64_sse2, 0), @@ -1121,7 +1092,6 @@ INSTANTIATE_TEST_CASE_P( make_tuple(3, 2, &vpx_sub_pixel_avg_variance8x4_sse2, 0), make_tuple(2, 3, &vpx_sub_pixel_avg_variance4x8_sse2, 0), make_tuple(2, 2, &vpx_sub_pixel_avg_variance4x4_sse2, 0))); -#endif // CONFIG_USE_X86INC #if HAVE_SSE4_1 && CONFIG_VP9_HIGHBITDEPTH INSTANTIATE_TEST_CASE_P( @@ -1190,7 +1160,6 @@ INSTANTIATE_TEST_CASE_P( make_tuple(3, 4, &vpx_highbd_8_variance8x16_sse2, 8), make_tuple(3, 3, &vpx_highbd_8_variance8x8_sse2, 8))); -#if CONFIG_USE_X86INC INSTANTIATE_TEST_CASE_P( SSE2, VpxHBDSubpelVarianceTest, ::testing::Values( @@ -1264,12 +1233,10 @@ INSTANTIATE_TEST_CASE_P( make_tuple(3, 4, &vpx_highbd_8_sub_pixel_avg_variance8x16_sse2, 8), make_tuple(3, 3, &vpx_highbd_8_sub_pixel_avg_variance8x8_sse2, 8), make_tuple(3, 2, &vpx_highbd_8_sub_pixel_avg_variance8x4_sse2, 8))); -#endif // CONFIG_USE_X86INC #endif // CONFIG_VP9_HIGHBITDEPTH #endif // HAVE_SSE2 #if HAVE_SSSE3 -#if CONFIG_USE_X86INC INSTANTIATE_TEST_CASE_P( SSSE3, VpxSubpelVarianceTest, ::testing::Values(make_tuple(6, 6, &vpx_sub_pixel_variance64x64_ssse3, 0), @@ -1302,7 +1269,6 @@ INSTANTIATE_TEST_CASE_P( make_tuple(3, 2, &vpx_sub_pixel_avg_variance8x4_ssse3, 0), make_tuple(2, 3, &vpx_sub_pixel_avg_variance4x8_ssse3, 0), make_tuple(2, 2, &vpx_sub_pixel_avg_variance4x4_ssse3, 0))); -#endif // CONFIG_USE_X86INC #endif // HAVE_SSSE3 #if HAVE_AVX2 diff --git a/test/vp9_error_block_test.cc b/test/vp9_error_block_test.cc index 23a249e2b..341cc19cb 100644 --- a/test/vp9_error_block_test.cc +++ b/test/vp9_error_block_test.cc @@ -157,9 +157,9 @@ TEST_P(ErrorBlockTest, ExtremeValues) { << "First failed at test case " << first_failure; } +#if HAVE_SSE2 || HAVE_AVX using std::tr1::make_tuple; 
-#if CONFIG_USE_X86INC int64_t wrap_vp9_highbd_block_error_8bit_c(const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, @@ -167,6 +167,7 @@ int64_t wrap_vp9_highbd_block_error_8bit_c(const tran_low_t *coeff, EXPECT_EQ(8, bps); return vp9_highbd_block_error_8bit_c(coeff, dqcoeff, block_size, ssz); } +#endif // HAVE_SSE2 || HAVE_AVX #if HAVE_SSE2 int64_t wrap_vp9_highbd_block_error_8bit_sse2(const tran_low_t *coeff, @@ -206,6 +207,5 @@ INSTANTIATE_TEST_CASE_P( &wrap_vp9_highbd_block_error_8bit_c, VPX_BITS_8))); #endif // HAVE_AVX -#endif // CONFIG_USE_X86INC #endif // CONFIG_VP9_HIGHBITDEPTH } // namespace diff --git a/test/vp9_intrapred_test.cc b/test/vp9_intrapred_test.cc index 416f3c322..cd87b77a8 100644 --- a/test/vp9_intrapred_test.cc +++ b/test/vp9_intrapred_test.cc @@ -28,45 +28,43 @@ using libvpx_test::ACMRandom; const int count_test_block = 100000; -// Base class for VP9 intra prediction tests. -class VP9IntraPredBase { - public: - virtual ~VP9IntraPredBase() { libvpx_test::ClearSystemState(); } - - protected: - virtual void Predict() = 0; - - void CheckPrediction(int test_case_number, int *error_count) const { - // For each pixel ensure that the calculated value is the same as reference. - for (int y = 0; y < block_size_; y++) { - for (int x = 0; x < block_size_; x++) { - *error_count += ref_dst_[x + y * stride_] != dst_[x + y * stride_]; - if (*error_count == 1) { - ASSERT_EQ(ref_dst_[x + y * stride_], dst_[x + y * stride_]) - << " Failed on Test Case Number "<< test_case_number; - } - } - } - } +typedef void (*IntraPred)(uint16_t* dst, ptrdiff_t stride, + const uint16_t* above, const uint16_t* left, + int bps); + +struct IntraPredFunc { + IntraPredFunc(IntraPred pred = NULL, IntraPred ref = NULL, + int block_size_value = 0, int bit_depth_value = 0) + : pred_fn(pred), ref_fn(ref), + block_size(block_size_value), bit_depth(bit_depth_value) {} + + IntraPred pred_fn; + IntraPred ref_fn; + int block_size; + int bit_depth; +}; +class VP9IntraPredTest : public ::testing::TestWithParam { + public: void RunTest(uint16_t* left_col, uint16_t* above_data, uint16_t* dst, uint16_t* ref_dst) { ACMRandom rnd(ACMRandom::DeterministicSeed()); + const int block_size = params_.block_size; + above_row_ = above_data + 16; left_col_ = left_col; dst_ = dst; ref_dst_ = ref_dst; - above_row_ = above_data + 16; int error_count = 0; for (int i = 0; i < count_test_block; ++i) { // Fill edges with random data, try first with saturated values. - for (int x = -1; x <= block_size_*2; x++) { + for (int x = -1; x <= block_size * 2; x++) { if (i == 0) { above_row_[x] = mask_; } else { above_row_[x] = rnd.Rand16() & mask_; } } - for (int y = 0; y < block_size_; y++) { + for (int y = 0; y < block_size; y++) { if (i == 0) { left_col_[y] = mask_; } else { @@ -79,43 +77,42 @@ class VP9IntraPredBase { ASSERT_EQ(0, error_count); } - int block_size_; + protected: + virtual void SetUp() { + params_ = GetParam(); + stride_ = params_.block_size * 3; + mask_ = (1 << params_.bit_depth) - 1; + } + + void Predict() { + const int bit_depth = params_.bit_depth; + params_.ref_fn(ref_dst_, stride_, above_row_, left_col_, bit_depth); + ASM_REGISTER_STATE_CHECK(params_.pred_fn(dst_, stride_, + above_row_, left_col_, bit_depth)); + } + + void CheckPrediction(int test_case_number, int *error_count) const { + // For each pixel ensure that the calculated value is the same as reference. 
+ const int block_size = params_.block_size; + for (int y = 0; y < block_size; y++) { + for (int x = 0; x < block_size; x++) { + *error_count += ref_dst_[x + y * stride_] != dst_[x + y * stride_]; + if (*error_count == 1) { + ASSERT_EQ(ref_dst_[x + y * stride_], dst_[x + y * stride_]) + << " Failed on Test Case Number "<< test_case_number; + } + } + } + } + uint16_t *above_row_; uint16_t *left_col_; uint16_t *dst_; uint16_t *ref_dst_; ptrdiff_t stride_; int mask_; -}; - -typedef void (*intra_pred_fn_t)( - uint16_t *dst, ptrdiff_t stride, const uint16_t *above, - const uint16_t *left, int bps); -typedef std::tr1::tuple intra_pred_params_t; -class VP9IntraPredTest - : public VP9IntraPredBase, - public ::testing::TestWithParam { - - virtual void SetUp() { - pred_fn_ = GET_PARAM(0); - ref_fn_ = GET_PARAM(1); - block_size_ = GET_PARAM(2); - bit_depth_ = GET_PARAM(3); - stride_ = block_size_ * 3; - mask_ = (1 << bit_depth_) - 1; - } - virtual void Predict() { - const uint16_t *const_above_row = above_row_; - const uint16_t *const_left_col = left_col_; - ref_fn_(ref_dst_, stride_, const_above_row, const_left_col, bit_depth_); - ASM_REGISTER_STATE_CHECK(pred_fn_(dst_, stride_, const_above_row, - const_left_col, bit_depth_)); - } - intra_pred_fn_t pred_fn_; - intra_pred_fn_t ref_fn_; - int bit_depth_; + IntraPredFunc params_; }; TEST_P(VP9IntraPredTest, IntraPredTests) { @@ -127,105 +124,89 @@ TEST_P(VP9IntraPredTest, IntraPredTests) { RunTest(left_col, above_data, dst, ref_dst); } -using std::tr1::make_tuple; - #if HAVE_SSE2 #if CONFIG_VP9_HIGHBITDEPTH -#if CONFIG_USE_X86INC INSTANTIATE_TEST_CASE_P(SSE2_TO_C_8, VP9IntraPredTest, - ::testing::Values( - make_tuple(&vpx_highbd_dc_predictor_32x32_sse2, - &vpx_highbd_dc_predictor_32x32_c, 32, 8), - make_tuple(&vpx_highbd_tm_predictor_16x16_sse2, - &vpx_highbd_tm_predictor_16x16_c, 16, 8), - make_tuple(&vpx_highbd_tm_predictor_32x32_sse2, - &vpx_highbd_tm_predictor_32x32_c, 32, 8), - make_tuple(&vpx_highbd_dc_predictor_4x4_sse2, - &vpx_highbd_dc_predictor_4x4_c, 4, 8), - make_tuple(&vpx_highbd_dc_predictor_8x8_sse2, - &vpx_highbd_dc_predictor_8x8_c, 8, 8), - make_tuple(&vpx_highbd_dc_predictor_16x16_sse2, - &vpx_highbd_dc_predictor_16x16_c, 16, 8), - make_tuple(&vpx_highbd_v_predictor_4x4_sse2, - &vpx_highbd_v_predictor_4x4_c, 4, 8), - make_tuple(&vpx_highbd_v_predictor_8x8_sse2, - &vpx_highbd_v_predictor_8x8_c, 8, 8), - make_tuple(&vpx_highbd_v_predictor_16x16_sse2, - &vpx_highbd_v_predictor_16x16_c, 16, 8), - make_tuple(&vpx_highbd_v_predictor_32x32_sse2, - &vpx_highbd_v_predictor_32x32_c, 32, 8), - make_tuple(&vpx_highbd_tm_predictor_4x4_sse2, - &vpx_highbd_tm_predictor_4x4_c, 4, 8), - make_tuple(&vpx_highbd_tm_predictor_8x8_sse2, - &vpx_highbd_tm_predictor_8x8_c, 8, 8))); + ::testing::Values( + IntraPredFunc(&vpx_highbd_dc_predictor_32x32_sse2, + &vpx_highbd_dc_predictor_32x32_c, 32, 8), + IntraPredFunc(&vpx_highbd_tm_predictor_16x16_sse2, + &vpx_highbd_tm_predictor_16x16_c, 16, 8), + IntraPredFunc(&vpx_highbd_tm_predictor_32x32_sse2, + &vpx_highbd_tm_predictor_32x32_c, 32, 8), + IntraPredFunc(&vpx_highbd_dc_predictor_4x4_sse2, + &vpx_highbd_dc_predictor_4x4_c, 4, 8), + IntraPredFunc(&vpx_highbd_dc_predictor_8x8_sse2, + &vpx_highbd_dc_predictor_8x8_c, 8, 8), + IntraPredFunc(&vpx_highbd_dc_predictor_16x16_sse2, + &vpx_highbd_dc_predictor_16x16_c, 16, 8), + IntraPredFunc(&vpx_highbd_v_predictor_4x4_sse2, + &vpx_highbd_v_predictor_4x4_c, 4, 8), + IntraPredFunc(&vpx_highbd_v_predictor_8x8_sse2, + &vpx_highbd_v_predictor_8x8_c, 8, 8), + 
IntraPredFunc(&vpx_highbd_v_predictor_16x16_sse2, + &vpx_highbd_v_predictor_16x16_c, 16, 8), + IntraPredFunc(&vpx_highbd_v_predictor_32x32_sse2, + &vpx_highbd_v_predictor_32x32_c, 32, 8), + IntraPredFunc(&vpx_highbd_tm_predictor_4x4_sse2, + &vpx_highbd_tm_predictor_4x4_c, 4, 8), + IntraPredFunc(&vpx_highbd_tm_predictor_8x8_sse2, + &vpx_highbd_tm_predictor_8x8_c, 8, 8))); INSTANTIATE_TEST_CASE_P(SSE2_TO_C_10, VP9IntraPredTest, - ::testing::Values( - make_tuple(&vpx_highbd_dc_predictor_32x32_sse2, - &vpx_highbd_dc_predictor_32x32_c, 32, - 10), - make_tuple(&vpx_highbd_tm_predictor_16x16_sse2, - &vpx_highbd_tm_predictor_16x16_c, 16, - 10), - make_tuple(&vpx_highbd_tm_predictor_32x32_sse2, - &vpx_highbd_tm_predictor_32x32_c, 32, - 10), - make_tuple(&vpx_highbd_dc_predictor_4x4_sse2, - &vpx_highbd_dc_predictor_4x4_c, 4, 10), - make_tuple(&vpx_highbd_dc_predictor_8x8_sse2, - &vpx_highbd_dc_predictor_8x8_c, 8, 10), - make_tuple(&vpx_highbd_dc_predictor_16x16_sse2, - &vpx_highbd_dc_predictor_16x16_c, 16, - 10), - make_tuple(&vpx_highbd_v_predictor_4x4_sse2, - &vpx_highbd_v_predictor_4x4_c, 4, 10), - make_tuple(&vpx_highbd_v_predictor_8x8_sse2, - &vpx_highbd_v_predictor_8x8_c, 8, 10), - make_tuple(&vpx_highbd_v_predictor_16x16_sse2, - &vpx_highbd_v_predictor_16x16_c, 16, - 10), - make_tuple(&vpx_highbd_v_predictor_32x32_sse2, - &vpx_highbd_v_predictor_32x32_c, 32, - 10), - make_tuple(&vpx_highbd_tm_predictor_4x4_sse2, - &vpx_highbd_tm_predictor_4x4_c, 4, 10), - make_tuple(&vpx_highbd_tm_predictor_8x8_sse2, - &vpx_highbd_tm_predictor_8x8_c, 8, 10))); + ::testing::Values( + IntraPredFunc(&vpx_highbd_dc_predictor_32x32_sse2, + &vpx_highbd_dc_predictor_32x32_c, 32, 10), + IntraPredFunc(&vpx_highbd_tm_predictor_16x16_sse2, + &vpx_highbd_tm_predictor_16x16_c, 16, 10), + IntraPredFunc(&vpx_highbd_tm_predictor_32x32_sse2, + &vpx_highbd_tm_predictor_32x32_c, 32, 10), + IntraPredFunc(&vpx_highbd_dc_predictor_4x4_sse2, + &vpx_highbd_dc_predictor_4x4_c, 4, 10), + IntraPredFunc(&vpx_highbd_dc_predictor_8x8_sse2, + &vpx_highbd_dc_predictor_8x8_c, 8, 10), + IntraPredFunc(&vpx_highbd_dc_predictor_16x16_sse2, + &vpx_highbd_dc_predictor_16x16_c, 16, 10), + IntraPredFunc(&vpx_highbd_v_predictor_4x4_sse2, + &vpx_highbd_v_predictor_4x4_c, 4, 10), + IntraPredFunc(&vpx_highbd_v_predictor_8x8_sse2, + &vpx_highbd_v_predictor_8x8_c, 8, 10), + IntraPredFunc(&vpx_highbd_v_predictor_16x16_sse2, + &vpx_highbd_v_predictor_16x16_c, 16, 10), + IntraPredFunc(&vpx_highbd_v_predictor_32x32_sse2, + &vpx_highbd_v_predictor_32x32_c, 32, 10), + IntraPredFunc(&vpx_highbd_tm_predictor_4x4_sse2, + &vpx_highbd_tm_predictor_4x4_c, 4, 10), + IntraPredFunc(&vpx_highbd_tm_predictor_8x8_sse2, + &vpx_highbd_tm_predictor_8x8_c, 8, 10))); INSTANTIATE_TEST_CASE_P(SSE2_TO_C_12, VP9IntraPredTest, - ::testing::Values( - make_tuple(&vpx_highbd_dc_predictor_32x32_sse2, - &vpx_highbd_dc_predictor_32x32_c, 32, - 12), - make_tuple(&vpx_highbd_tm_predictor_16x16_sse2, - &vpx_highbd_tm_predictor_16x16_c, 16, - 12), - make_tuple(&vpx_highbd_tm_predictor_32x32_sse2, - &vpx_highbd_tm_predictor_32x32_c, 32, - 12), - make_tuple(&vpx_highbd_dc_predictor_4x4_sse2, - &vpx_highbd_dc_predictor_4x4_c, 4, 12), - make_tuple(&vpx_highbd_dc_predictor_8x8_sse2, - &vpx_highbd_dc_predictor_8x8_c, 8, 12), - make_tuple(&vpx_highbd_dc_predictor_16x16_sse2, - &vpx_highbd_dc_predictor_16x16_c, 16, - 12), - make_tuple(&vpx_highbd_v_predictor_4x4_sse2, - &vpx_highbd_v_predictor_4x4_c, 4, 12), - make_tuple(&vpx_highbd_v_predictor_8x8_sse2, - &vpx_highbd_v_predictor_8x8_c, 8, 12), - 
make_tuple(&vpx_highbd_v_predictor_16x16_sse2, - &vpx_highbd_v_predictor_16x16_c, 16, - 12), - make_tuple(&vpx_highbd_v_predictor_32x32_sse2, - &vpx_highbd_v_predictor_32x32_c, 32, - 12), - make_tuple(&vpx_highbd_tm_predictor_4x4_sse2, - &vpx_highbd_tm_predictor_4x4_c, 4, 12), - make_tuple(&vpx_highbd_tm_predictor_8x8_sse2, - &vpx_highbd_tm_predictor_8x8_c, 8, 12))); - -#endif // CONFIG_USE_X86INC + ::testing::Values( + IntraPredFunc(&vpx_highbd_dc_predictor_32x32_sse2, + &vpx_highbd_dc_predictor_32x32_c, 32, 12), + IntraPredFunc(&vpx_highbd_tm_predictor_16x16_sse2, + &vpx_highbd_tm_predictor_16x16_c, 16, 12), + IntraPredFunc(&vpx_highbd_tm_predictor_32x32_sse2, + &vpx_highbd_tm_predictor_32x32_c, 32, 12), + IntraPredFunc(&vpx_highbd_dc_predictor_4x4_sse2, + &vpx_highbd_dc_predictor_4x4_c, 4, 12), + IntraPredFunc(&vpx_highbd_dc_predictor_8x8_sse2, + &vpx_highbd_dc_predictor_8x8_c, 8, 12), + IntraPredFunc(&vpx_highbd_dc_predictor_16x16_sse2, + &vpx_highbd_dc_predictor_16x16_c, 16, 12), + IntraPredFunc(&vpx_highbd_v_predictor_4x4_sse2, + &vpx_highbd_v_predictor_4x4_c, 4, 12), + IntraPredFunc(&vpx_highbd_v_predictor_8x8_sse2, + &vpx_highbd_v_predictor_8x8_c, 8, 12), + IntraPredFunc(&vpx_highbd_v_predictor_16x16_sse2, + &vpx_highbd_v_predictor_16x16_c, 16, 12), + IntraPredFunc(&vpx_highbd_v_predictor_32x32_sse2, + &vpx_highbd_v_predictor_32x32_c, 32, 12), + IntraPredFunc(&vpx_highbd_tm_predictor_4x4_sse2, + &vpx_highbd_tm_predictor_4x4_c, 4, 12), + IntraPredFunc(&vpx_highbd_tm_predictor_8x8_sse2, + &vpx_highbd_tm_predictor_8x8_c, 8, 12))); + #endif // CONFIG_VP9_HIGHBITDEPTH #endif // HAVE_SSE2 } // namespace diff --git a/third_party/googletest/README.libvpx b/third_party/googletest/README.libvpx index 1eca78dd9..0e3b8b937 100644 --- a/third_party/googletest/README.libvpx +++ b/third_party/googletest/README.libvpx @@ -16,4 +16,6 @@ Local Modifications: free build. - Added GTEST_ATTRIBUTE_UNUSED_ to test registering dummies in TEST_P and INSTANTIATE_TEST_CASE_P to remove warnings about unused variables - under GCC 5. \ No newline at end of file + under GCC 5. +- Only define g_in_fast_death_test_child for non-Windows builds; quiets an + unused variable warning. diff --git a/third_party/googletest/src/src/gtest-all.cc b/third_party/googletest/src/src/gtest-all.cc index 8d906279a..912868148 100644 --- a/third_party/googletest/src/src/gtest-all.cc +++ b/third_party/googletest/src/src/gtest-all.cc @@ -6612,9 +6612,11 @@ GTEST_DEFINE_string_( namespace internal { +# if !GTEST_OS_WINDOWS // Valid only for fast death tests. Indicates the code is running in the // child process of a fast style death test. static bool g_in_fast_death_test_child = false; +# endif // !GTEST_OS_WINDOWS // Returns a Boolean value indicating whether the caller is currently // executing in the context of the death test child process. 
Tools such as diff --git a/vp10/common/idct.c b/vp10/common/idct.c index 179b903b2..1a573bd19 100644 --- a/vp10/common/idct.c +++ b/vp10/common/idct.c @@ -77,7 +77,8 @@ static void highbd_iidtx4_c(const tran_low_t *input, tran_low_t *output, int bd) { int i; for (i = 0; i < 4; ++i) - output[i] = (tran_low_t)highbd_dct_const_round_shift(input[i] * Sqrt2, bd); + output[i] = HIGHBD_WRAPLOW( + highbd_dct_const_round_shift(input[i] * Sqrt2), bd); } static void highbd_iidtx8_c(const tran_low_t *input, tran_low_t *output, @@ -92,8 +93,8 @@ static void highbd_iidtx16_c(const tran_low_t *input, tran_low_t *output, int bd) { int i; for (i = 0; i < 16; ++i) - output[i] = (tran_low_t)highbd_dct_const_round_shift( - input[i] * 2 * Sqrt2, bd); + output[i] = HIGHBD_WRAPLOW( + highbd_dct_const_round_shift(input[i] * 2 * Sqrt2), bd); } static void highbd_iidtx32_c(const tran_low_t *input, tran_low_t *output, @@ -113,8 +114,8 @@ static void highbd_ihalfright32_c(const tran_low_t *input, tran_low_t *output, } // Multiply input by sqrt(2) for (i = 0; i < 16; ++i) { - inputhalf[i] = (tran_low_t)highbd_dct_const_round_shift( - input[i] * Sqrt2, bd); + inputhalf[i] = HIGHBD_WRAPLOW( + highbd_dct_const_round_shift(input[i] * Sqrt2), bd); } vpx_highbd_idct16_c(inputhalf, output + 16, bd); // Note overall scaling factor is 4 times orthogonal @@ -190,18 +191,18 @@ void highbd_idst4_c(const tran_low_t *input, tran_low_t *output, int bd) { // stage 1 temp1 = (input[3] + input[1]) * cospi_16_64; temp2 = (input[3] - input[1]) * cospi_16_64; - step[0] = WRAPLOW(dct_const_round_shift(temp1), bd); - step[1] = WRAPLOW(dct_const_round_shift(temp2), bd); + step[0] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step[1] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); temp1 = input[2] * cospi_24_64 - input[0] * cospi_8_64; temp2 = input[2] * cospi_8_64 + input[0] * cospi_24_64; - step[2] = WRAPLOW(dct_const_round_shift(temp1), bd); - step[3] = WRAPLOW(dct_const_round_shift(temp2), bd); + step[2] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step[3] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); // stage 2 - output[0] = WRAPLOW(step[0] + step[3], bd); - output[1] = WRAPLOW(-step[1] - step[2], bd); - output[2] = WRAPLOW(step[1] - step[2], bd); - output[3] = WRAPLOW(step[3] - step[0], bd); + output[0] = HIGHBD_WRAPLOW(step[0] + step[3], bd); + output[1] = HIGHBD_WRAPLOW(-step[1] - step[2], bd); + output[2] = HIGHBD_WRAPLOW(step[1] - step[2], bd); + output[3] = HIGHBD_WRAPLOW(step[3] - step[0], bd); } void highbd_idst8_c(const tran_low_t *input, tran_low_t *output, int bd) { @@ -215,48 +216,48 @@ void highbd_idst8_c(const tran_low_t *input, tran_low_t *output, int bd) { step1[3] = input[1]; temp1 = input[6] * cospi_28_64 - input[0] * cospi_4_64; temp2 = input[6] * cospi_4_64 + input[0] * cospi_28_64; - step1[4] = WRAPLOW(dct_const_round_shift(temp1), bd); - step1[7] = WRAPLOW(dct_const_round_shift(temp2), bd); + step1[4] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step1[7] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); temp1 = input[2] * cospi_12_64 - input[4] * cospi_20_64; temp2 = input[2] * cospi_20_64 + input[4] * cospi_12_64; - step1[5] = WRAPLOW(dct_const_round_shift(temp1), bd); - step1[6] = WRAPLOW(dct_const_round_shift(temp2), bd); + step1[5] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step1[6] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); // stage 2 temp1 = (step1[0] + step1[2]) * cospi_16_64; temp2 = (step1[0] - step1[2]) * cospi_16_64; - step2[0] = 
WRAPLOW(dct_const_round_shift(temp1), bd); - step2[1] = WRAPLOW(dct_const_round_shift(temp2), bd); + step2[0] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step2[1] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); temp1 = step1[1] * cospi_24_64 - step1[3] * cospi_8_64; temp2 = step1[1] * cospi_8_64 + step1[3] * cospi_24_64; - step2[2] = WRAPLOW(dct_const_round_shift(temp1), bd); - step2[3] = WRAPLOW(dct_const_round_shift(temp2), bd); - step2[4] = WRAPLOW(step1[4] + step1[5], bd); - step2[5] = WRAPLOW(step1[4] - step1[5], bd); - step2[6] = WRAPLOW(-step1[6] + step1[7], bd); - step2[7] = WRAPLOW(step1[6] + step1[7], bd); + step2[2] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step2[3] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); + step2[4] = HIGHBD_WRAPLOW(step1[4] + step1[5], bd); + step2[5] = HIGHBD_WRAPLOW(step1[4] - step1[5], bd); + step2[6] = HIGHBD_WRAPLOW(-step1[6] + step1[7], bd); + step2[7] = HIGHBD_WRAPLOW(step1[6] + step1[7], bd); // stage 3 - step1[0] = WRAPLOW(step2[0] + step2[3], bd); - step1[1] = WRAPLOW(step2[1] + step2[2], bd); - step1[2] = WRAPLOW(step2[1] - step2[2], bd); - step1[3] = WRAPLOW(step2[0] - step2[3], bd); + step1[0] = HIGHBD_WRAPLOW(step2[0] + step2[3], bd); + step1[1] = HIGHBD_WRAPLOW(step2[1] + step2[2], bd); + step1[2] = HIGHBD_WRAPLOW(step2[1] - step2[2], bd); + step1[3] = HIGHBD_WRAPLOW(step2[0] - step2[3], bd); step1[4] = step2[4]; temp1 = (step2[6] - step2[5]) * cospi_16_64; temp2 = (step2[5] + step2[6]) * cospi_16_64; - step1[5] = WRAPLOW(dct_const_round_shift(temp1), bd); - step1[6] = WRAPLOW(dct_const_round_shift(temp2), bd); + step1[5] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step1[6] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); step1[7] = step2[7]; // stage 4 - output[0] = WRAPLOW(step1[0] + step1[7], bd); - output[1] = WRAPLOW(-step1[1] - step1[6], bd); - output[2] = WRAPLOW(step1[2] + step1[5], bd); - output[3] = WRAPLOW(-step1[3] - step1[4], bd); - output[4] = WRAPLOW(step1[3] - step1[4], bd); - output[5] = WRAPLOW(-step1[2] + step1[5], bd); - output[6] = WRAPLOW(step1[1] - step1[6], bd); - output[7] = WRAPLOW(-step1[0] + step1[7], bd); + output[0] = HIGHBD_WRAPLOW(step1[0] + step1[7], bd); + output[1] = HIGHBD_WRAPLOW(-step1[1] - step1[6], bd); + output[2] = HIGHBD_WRAPLOW(step1[2] + step1[5], bd); + output[3] = HIGHBD_WRAPLOW(-step1[3] - step1[4], bd); + output[4] = HIGHBD_WRAPLOW(step1[3] - step1[4], bd); + output[5] = HIGHBD_WRAPLOW(-step1[2] + step1[5], bd); + output[6] = HIGHBD_WRAPLOW(step1[1] - step1[6], bd); + output[7] = HIGHBD_WRAPLOW(-step1[0] + step1[7], bd); } void highbd_idst16_c(const tran_low_t *input, tran_low_t *output, int bd) { @@ -295,23 +296,23 @@ void highbd_idst16_c(const tran_low_t *input, tran_low_t *output, int bd) { temp1 = step1[8] * cospi_30_64 - step1[15] * cospi_2_64; temp2 = step1[8] * cospi_2_64 + step1[15] * cospi_30_64; - step2[8] = WRAPLOW(dct_const_round_shift(temp1), bd); - step2[15] = WRAPLOW(dct_const_round_shift(temp2), bd); + step2[8] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step2[15] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); temp1 = step1[9] * cospi_14_64 - step1[14] * cospi_18_64; temp2 = step1[9] * cospi_18_64 + step1[14] * cospi_14_64; - step2[9] = WRAPLOW(dct_const_round_shift(temp1), bd); - step2[14] = WRAPLOW(dct_const_round_shift(temp2), bd); + step2[9] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step2[14] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); temp1 = step1[10] * cospi_22_64 - step1[13] * cospi_10_64; 
temp2 = step1[10] * cospi_10_64 + step1[13] * cospi_22_64; - step2[10] = WRAPLOW(dct_const_round_shift(temp1), bd); - step2[13] = WRAPLOW(dct_const_round_shift(temp2), bd); + step2[10] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step2[13] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); temp1 = step1[11] * cospi_6_64 - step1[12] * cospi_26_64; temp2 = step1[11] * cospi_26_64 + step1[12] * cospi_6_64; - step2[11] = WRAPLOW(dct_const_round_shift(temp1), bd); - step2[12] = WRAPLOW(dct_const_round_shift(temp2), bd); + step2[11] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step2[12] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); // stage 3 step1[0] = step2[0]; @@ -321,109 +322,109 @@ void highbd_idst16_c(const tran_low_t *input, tran_low_t *output, int bd) { temp1 = step2[4] * cospi_28_64 - step2[7] * cospi_4_64; temp2 = step2[4] * cospi_4_64 + step2[7] * cospi_28_64; - step1[4] = WRAPLOW(dct_const_round_shift(temp1), bd); - step1[7] = WRAPLOW(dct_const_round_shift(temp2), bd); + step1[4] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step1[7] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); temp1 = step2[5] * cospi_12_64 - step2[6] * cospi_20_64; temp2 = step2[5] * cospi_20_64 + step2[6] * cospi_12_64; - step1[5] = WRAPLOW(dct_const_round_shift(temp1), bd); - step1[6] = WRAPLOW(dct_const_round_shift(temp2), bd); - - step1[8] = WRAPLOW(step2[8] + step2[9], bd); - step1[9] = WRAPLOW(step2[8] - step2[9], bd); - step1[10] = WRAPLOW(-step2[10] + step2[11], bd); - step1[11] = WRAPLOW(step2[10] + step2[11], bd); - step1[12] = WRAPLOW(step2[12] + step2[13], bd); - step1[13] = WRAPLOW(step2[12] - step2[13], bd); - step1[14] = WRAPLOW(-step2[14] + step2[15], bd); - step1[15] = WRAPLOW(step2[14] + step2[15], bd); + step1[5] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step1[6] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); + + step1[8] = HIGHBD_WRAPLOW(step2[8] + step2[9], bd); + step1[9] = HIGHBD_WRAPLOW(step2[8] - step2[9], bd); + step1[10] = HIGHBD_WRAPLOW(-step2[10] + step2[11], bd); + step1[11] = HIGHBD_WRAPLOW(step2[10] + step2[11], bd); + step1[12] = HIGHBD_WRAPLOW(step2[12] + step2[13], bd); + step1[13] = HIGHBD_WRAPLOW(step2[12] - step2[13], bd); + step1[14] = HIGHBD_WRAPLOW(-step2[14] + step2[15], bd); + step1[15] = HIGHBD_WRAPLOW(step2[14] + step2[15], bd); // stage 4 temp1 = (step1[0] + step1[1]) * cospi_16_64; temp2 = (step1[0] - step1[1]) * cospi_16_64; - step2[0] = WRAPLOW(dct_const_round_shift(temp1), bd); - step2[1] = WRAPLOW(dct_const_round_shift(temp2), bd); + step2[0] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step2[1] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); temp1 = step1[2] * cospi_24_64 - step1[3] * cospi_8_64; temp2 = step1[2] * cospi_8_64 + step1[3] * cospi_24_64; - step2[2] = WRAPLOW(dct_const_round_shift(temp1), bd); - step2[3] = WRAPLOW(dct_const_round_shift(temp2), bd); - step2[4] = WRAPLOW(step1[4] + step1[5], bd); - step2[5] = WRAPLOW(step1[4] - step1[5], bd); - step2[6] = WRAPLOW(-step1[6] + step1[7], bd); - step2[7] = WRAPLOW(step1[6] + step1[7], bd); + step2[2] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step2[3] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); + step2[4] = HIGHBD_WRAPLOW(step1[4] + step1[5], bd); + step2[5] = HIGHBD_WRAPLOW(step1[4] - step1[5], bd); + step2[6] = HIGHBD_WRAPLOW(-step1[6] + step1[7], bd); + step2[7] = HIGHBD_WRAPLOW(step1[6] + step1[7], bd); step2[8] = step1[8]; step2[15] = step1[15]; temp1 = -step1[9] * cospi_8_64 + step1[14] * cospi_24_64; temp2 = 
step1[9] * cospi_24_64 + step1[14] * cospi_8_64; - step2[9] = WRAPLOW(dct_const_round_shift(temp1), bd); - step2[14] = WRAPLOW(dct_const_round_shift(temp2), bd); + step2[9] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step2[14] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); temp1 = -step1[10] * cospi_24_64 - step1[13] * cospi_8_64; temp2 = -step1[10] * cospi_8_64 + step1[13] * cospi_24_64; - step2[10] = WRAPLOW(dct_const_round_shift(temp1), bd); - step2[13] = WRAPLOW(dct_const_round_shift(temp2), bd); + step2[10] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step2[13] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); step2[11] = step1[11]; step2[12] = step1[12]; // stage 5 - step1[0] = WRAPLOW(step2[0] + step2[3], bd); - step1[1] = WRAPLOW(step2[1] + step2[2], bd); - step1[2] = WRAPLOW(step2[1] - step2[2], bd); - step1[3] = WRAPLOW(step2[0] - step2[3], bd); + step1[0] = HIGHBD_WRAPLOW(step2[0] + step2[3], bd); + step1[1] = HIGHBD_WRAPLOW(step2[1] + step2[2], bd); + step1[2] = HIGHBD_WRAPLOW(step2[1] - step2[2], bd); + step1[3] = HIGHBD_WRAPLOW(step2[0] - step2[3], bd); step1[4] = step2[4]; temp1 = (step2[6] - step2[5]) * cospi_16_64; temp2 = (step2[5] + step2[6]) * cospi_16_64; - step1[5] = WRAPLOW(dct_const_round_shift(temp1), bd); - step1[6] = WRAPLOW(dct_const_round_shift(temp2), bd); + step1[5] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step1[6] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); step1[7] = step2[7]; - step1[8] = WRAPLOW(step2[8] + step2[11], bd); - step1[9] = WRAPLOW(step2[9] + step2[10], bd); - step1[10] = WRAPLOW(step2[9] - step2[10], bd); - step1[11] = WRAPLOW(step2[8] - step2[11], bd); - step1[12] = WRAPLOW(-step2[12] + step2[15], bd); - step1[13] = WRAPLOW(-step2[13] + step2[14], bd); - step1[14] = WRAPLOW(step2[13] + step2[14], bd); - step1[15] = WRAPLOW(step2[12] + step2[15], bd); + step1[8] = HIGHBD_WRAPLOW(step2[8] + step2[11], bd); + step1[9] = HIGHBD_WRAPLOW(step2[9] + step2[10], bd); + step1[10] = HIGHBD_WRAPLOW(step2[9] - step2[10], bd); + step1[11] = HIGHBD_WRAPLOW(step2[8] - step2[11], bd); + step1[12] = HIGHBD_WRAPLOW(-step2[12] + step2[15], bd); + step1[13] = HIGHBD_WRAPLOW(-step2[13] + step2[14], bd); + step1[14] = HIGHBD_WRAPLOW(step2[13] + step2[14], bd); + step1[15] = HIGHBD_WRAPLOW(step2[12] + step2[15], bd); // stage 6 - step2[0] = WRAPLOW(step1[0] + step1[7], bd); - step2[1] = WRAPLOW(step1[1] + step1[6], bd); - step2[2] = WRAPLOW(step1[2] + step1[5], bd); - step2[3] = WRAPLOW(step1[3] + step1[4], bd); - step2[4] = WRAPLOW(step1[3] - step1[4], bd); - step2[5] = WRAPLOW(step1[2] - step1[5], bd); - step2[6] = WRAPLOW(step1[1] - step1[6], bd); - step2[7] = WRAPLOW(step1[0] - step1[7], bd); + step2[0] = HIGHBD_WRAPLOW(step1[0] + step1[7], bd); + step2[1] = HIGHBD_WRAPLOW(step1[1] + step1[6], bd); + step2[2] = HIGHBD_WRAPLOW(step1[2] + step1[5], bd); + step2[3] = HIGHBD_WRAPLOW(step1[3] + step1[4], bd); + step2[4] = HIGHBD_WRAPLOW(step1[3] - step1[4], bd); + step2[5] = HIGHBD_WRAPLOW(step1[2] - step1[5], bd); + step2[6] = HIGHBD_WRAPLOW(step1[1] - step1[6], bd); + step2[7] = HIGHBD_WRAPLOW(step1[0] - step1[7], bd); step2[8] = step1[8]; step2[9] = step1[9]; temp1 = (-step1[10] + step1[13]) * cospi_16_64; temp2 = (step1[10] + step1[13]) * cospi_16_64; - step2[10] = WRAPLOW(dct_const_round_shift(temp1), bd); - step2[13] = WRAPLOW(dct_const_round_shift(temp2), bd); + step2[10] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step2[13] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); temp1 = (-step1[11] + 
step1[12]) * cospi_16_64; temp2 = (step1[11] + step1[12]) * cospi_16_64; - step2[11] = WRAPLOW(dct_const_round_shift(temp1), bd); - step2[12] = WRAPLOW(dct_const_round_shift(temp2), bd); + step2[11] = HIGHBD_WRAPLOW(dct_const_round_shift(temp1), bd); + step2[12] = HIGHBD_WRAPLOW(dct_const_round_shift(temp2), bd); step2[14] = step1[14]; step2[15] = step1[15]; // stage 7 - output[0] = WRAPLOW(step2[0] + step2[15], bd); - output[1] = WRAPLOW(-step2[1] - step2[14], bd); - output[2] = WRAPLOW(step2[2] + step2[13], bd); - output[3] = WRAPLOW(-step2[3] - step2[12], bd); - output[4] = WRAPLOW(step2[4] + step2[11], bd); - output[5] = WRAPLOW(-step2[5] - step2[10], bd); - output[6] = WRAPLOW(step2[6] + step2[9], bd); - output[7] = WRAPLOW(-step2[7] - step2[8], bd); - output[8] = WRAPLOW(step2[7] - step2[8], bd); - output[9] = WRAPLOW(-step2[6] + step2[9], bd); - output[10] = WRAPLOW(step2[5] - step2[10], bd); - output[11] = WRAPLOW(-step2[4] + step2[11], bd); - output[12] = WRAPLOW(step2[3] - step2[12], bd); - output[13] = WRAPLOW(-step2[2] + step2[13], bd); - output[14] = WRAPLOW(step2[1] - step2[14], bd); - output[15] = WRAPLOW(-step2[0] + step2[15], bd); + output[0] = HIGHBD_WRAPLOW(step2[0] + step2[15], bd); + output[1] = HIGHBD_WRAPLOW(-step2[1] - step2[14], bd); + output[2] = HIGHBD_WRAPLOW(step2[2] + step2[13], bd); + output[3] = HIGHBD_WRAPLOW(-step2[3] - step2[12], bd); + output[4] = HIGHBD_WRAPLOW(step2[4] + step2[11], bd); + output[5] = HIGHBD_WRAPLOW(-step2[5] - step2[10], bd); + output[6] = HIGHBD_WRAPLOW(step2[6] + step2[9], bd); + output[7] = HIGHBD_WRAPLOW(-step2[7] - step2[8], bd); + output[8] = HIGHBD_WRAPLOW(step2[7] - step2[8], bd); + output[9] = HIGHBD_WRAPLOW(-step2[6] + step2[9], bd); + output[10] = HIGHBD_WRAPLOW(step2[5] - step2[10], bd); + output[11] = HIGHBD_WRAPLOW(-step2[4] + step2[11], bd); + output[12] = HIGHBD_WRAPLOW(step2[3] - step2[12], bd); + output[13] = HIGHBD_WRAPLOW(-step2[2] + step2[13], bd); + output[14] = HIGHBD_WRAPLOW(step2[1] - step2[14], bd); + output[15] = HIGHBD_WRAPLOW(-step2[0] + step2[15], bd); } static void highbd_inv_idtx_add_c(const tran_low_t *input, uint8_t *dest8, diff --git a/vp10/common/vp10_rtcd_defs.pl b/vp10/common/vp10_rtcd_defs.pl index 51b674b8d..8f87b0222 100644 --- a/vp10/common/vp10_rtcd_defs.pl +++ b/vp10/common/vp10_rtcd_defs.pl @@ -24,29 +24,6 @@ EOF } forward_decls qw/vp10_common_forward_decls/; -# x86inc.asm had specific constraints. break it out so it's easy to disable. -# zero all the variables to avoid tricky else conditions. -$mmx_x86inc = $sse_x86inc = $sse2_x86inc = $ssse3_x86inc = $avx_x86inc = - $avx2_x86inc = ''; -$mmx_x86_64_x86inc = $sse_x86_64_x86inc = $sse2_x86_64_x86inc = - $ssse3_x86_64_x86inc = $avx_x86_64_x86inc = $avx2_x86_64_x86inc = ''; -if (vpx_config("CONFIG_USE_X86INC") eq "yes") { - $mmx_x86inc = 'mmx'; - $sse_x86inc = 'sse'; - $sse2_x86inc = 'sse2'; - $ssse3_x86inc = 'ssse3'; - $avx_x86inc = 'avx'; - $avx2_x86inc = 'avx2'; - if ($opts{arch} eq "x86_64") { - $mmx_x86_64_x86inc = 'mmx'; - $sse_x86_64_x86inc = 'sse'; - $sse2_x86_64_x86inc = 'sse2'; - $ssse3_x86_64_x86inc = 'ssse3'; - $avx_x86_64_x86inc = 'avx'; - $avx2_x86_64_x86inc = 'avx2'; - } -} - # functions that are 64 bit only. 
$mmx_x86_64 = $sse2_x86_64 = $ssse3_x86_64 = $avx_x86_64 = $avx2_x86_64 = ''; if ($opts{arch} eq "x86_64") { @@ -409,16 +386,16 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vp10_fdct8x8_quant/; } else { add_proto qw/int64_t vp10_block_error/, "const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz"; - specialize qw/vp10_block_error avx2 msa/, "$sse2_x86inc"; + specialize qw/vp10_block_error avx2 msa/; add_proto qw/int64_t vp10_block_error_fp/, "const int16_t *coeff, const int16_t *dqcoeff, int block_size"; - specialize qw/vp10_block_error_fp neon/, "$sse2_x86inc"; + specialize qw/vp10_block_error_fp neon sse2/; add_proto qw/void vp10_quantize_fp/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan"; - specialize qw/vp10_quantize_fp neon sse2/, "$ssse3_x86_64_x86inc"; + specialize qw/vp10_quantize_fp neon sse2/; add_proto qw/void vp10_quantize_fp_32x32/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan"; - specialize qw/vp10_quantize_fp_32x32/, "$ssse3_x86_64_x86inc"; + specialize qw/vp10_quantize_fp_32x32/; add_proto qw/void vp10_fdct8x8_quant/, "const int16_t *input, int stride, tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan"; specialize qw/vp10_fdct8x8_quant sse2 ssse3 neon/; @@ -440,7 +417,7 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vp10_fht32x32/; add_proto qw/void vp10_fwht4x4/, "const int16_t *input, tran_low_t *output, int stride"; - specialize qw/vp10_fwht4x4/, "$sse2_x86inc"; + specialize qw/vp10_fwht4x4/; } else { add_proto qw/void vp10_fht4x4/, "const int16_t *input, tran_low_t *output, int stride, int tx_type"; specialize qw/vp10_fht4x4 sse2/; @@ -461,7 +438,7 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vp10_fht32x32/; add_proto qw/void vp10_fwht4x4/, "const int16_t *input, tran_low_t *output, int stride"; - specialize qw/vp10_fwht4x4 msa/, "$sse2_x86inc"; + specialize qw/vp10_fwht4x4/; } add_proto qw/void vp10_fwd_idtx/, "const int16_t *src_diff, tran_low_t *coeff, int stride, int bs, int tx_type"; diff --git a/vp10/vp10cx.mk b/vp10/vp10cx.mk index 5d5c88ab0..552432246 100644 --- a/vp10/vp10cx.mk +++ b/vp10/vp10cx.mk @@ -103,10 +103,8 @@ ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) VP10_CX_SRCS-$(HAVE_SSE2) += encoder/x86/highbd_block_error_intrin_sse2.c endif -ifeq ($(CONFIG_USE_X86INC),yes) VP10_CX_SRCS-$(HAVE_SSE2) += encoder/x86/dct_sse2.asm VP10_CX_SRCS-$(HAVE_SSE2) += encoder/x86/error_sse2.asm -endif ifeq ($(ARCH_X86_64),yes) ifeq ($(CONFIG_USE_X86INC),yes) diff --git a/vp8/common/mips/msa/postproc_msa.c b/vp8/common/mips/msa/postproc_msa.c deleted file mode 100644 index 23dcde2eb..000000000 --- a/vp8/common/mips/msa/postproc_msa.c +++ /dev/null @@ -1,801 +0,0 @@ -/* - * Copyright (c) 2015 The WebM 
project authors. All Rights Reserved. - * - * Use of this source code is governed by a BSD-style license - * that can be found in the LICENSE file in the root of the source - * tree. An additional intellectual property rights grant can be found - * in the file PATENTS. All contributing project authors may - * be found in the AUTHORS file in the root of the source tree. - */ - -#include -#include "./vp8_rtcd.h" -#include "./vpx_dsp_rtcd.h" -#include "vp8/common/mips/msa/vp8_macros_msa.h" - -static const int16_t vp8_rv_msa[] = -{ - 8, 5, 2, 2, 8, 12, 4, 9, 8, 3, - 0, 3, 9, 0, 0, 0, 8, 3, 14, 4, - 10, 1, 11, 14, 1, 14, 9, 6, 12, 11, - 8, 6, 10, 0, 0, 8, 9, 0, 3, 14, - 8, 11, 13, 4, 2, 9, 0, 3, 9, 6, - 1, 2, 3, 14, 13, 1, 8, 2, 9, 7, - 3, 3, 1, 13, 13, 6, 6, 5, 2, 7, - 11, 9, 11, 8, 7, 3, 2, 0, 13, 13, - 14, 4, 12, 5, 12, 10, 8, 10, 13, 10, - 4, 14, 4, 10, 0, 8, 11, 1, 13, 7, - 7, 14, 6, 14, 13, 2, 13, 5, 4, 4, - 0, 10, 0, 5, 13, 2, 12, 7, 11, 13, - 8, 0, 4, 10, 7, 2, 7, 2, 2, 5, - 3, 4, 7, 3, 3, 14, 14, 5, 9, 13, - 3, 14, 3, 6, 3, 0, 11, 8, 13, 1, - 13, 1, 12, 0, 10, 9, 7, 6, 2, 8, - 5, 2, 13, 7, 1, 13, 14, 7, 6, 7, - 9, 6, 10, 11, 7, 8, 7, 5, 14, 8, - 4, 4, 0, 8, 7, 10, 0, 8, 14, 11, - 3, 12, 5, 7, 14, 3, 14, 5, 2, 6, - 11, 12, 12, 8, 0, 11, 13, 1, 2, 0, - 5, 10, 14, 7, 8, 0, 4, 11, 0, 8, - 0, 3, 10, 5, 8, 0, 11, 6, 7, 8, - 10, 7, 13, 9, 2, 5, 1, 5, 10, 2, - 4, 3, 5, 6, 10, 8, 9, 4, 11, 14, - 0, 10, 0, 5, 13, 2, 12, 7, 11, 13, - 8, 0, 4, 10, 7, 2, 7, 2, 2, 5, - 3, 4, 7, 3, 3, 14, 14, 5, 9, 13, - 3, 14, 3, 6, 3, 0, 11, 8, 13, 1, - 13, 1, 12, 0, 10, 9, 7, 6, 2, 8, - 5, 2, 13, 7, 1, 13, 14, 7, 6, 7, - 9, 6, 10, 11, 7, 8, 7, 5, 14, 8, - 4, 4, 0, 8, 7, 10, 0, 8, 14, 11, - 3, 12, 5, 7, 14, 3, 14, 5, 2, 6, - 11, 12, 12, 8, 0, 11, 13, 1, 2, 0, - 5, 10, 14, 7, 8, 0, 4, 11, 0, 8, - 0, 3, 10, 5, 8, 0, 11, 6, 7, 8, - 10, 7, 13, 9, 2, 5, 1, 5, 10, 2, - 4, 3, 5, 6, 10, 8, 9, 4, 11, 14, - 3, 8, 3, 7, 8, 5, 11, 4, 12, 3, - 11, 9, 14, 8, 14, 13, 4, 3, 1, 2, - 14, 6, 5, 4, 4, 11, 4, 6, 2, 1, - 5, 8, 8, 12, 13, 5, 14, 10, 12, 13, - 0, 9, 5, 5, 11, 10, 13, 9, 10, 13, -}; - -#define VP8_TRANSPOSE8x16_UB_UB(in0, in1, in2, in3, in4, in5, in6, in7, \ - out0, out1, out2, out3, \ - out4, out5, out6, out7, \ - out8, out9, out10, out11, \ - out12, out13, out14, out15) \ -{ \ - v8i16 temp0, temp1, temp2, temp3, temp4; \ - v8i16 temp5, temp6, temp7, temp8, temp9; \ - \ - ILVR_B4_SH(in1, in0, in3, in2, in5, in4, in7, in6, \ - temp0, temp1, temp2, temp3); \ - ILVR_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \ - ILVRL_W2_SH(temp5, temp4, temp6, temp7); \ - ILVL_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \ - ILVRL_W2_SH(temp5, temp4, temp8, temp9); \ - ILVL_B4_SH(in1, in0, in3, in2, in5, in4, in7, in6, \ - temp0, temp1, temp2, temp3); \ - ILVR_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \ - ILVRL_W2_UB(temp5, temp4, out8, out10); \ - ILVL_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \ - ILVRL_W2_UB(temp5, temp4, out12, out14); \ - out0 = (v16u8)temp6; \ - out2 = (v16u8)temp7; \ - out4 = (v16u8)temp8; \ - out6 = (v16u8)temp9; \ - out9 = (v16u8)__msa_ilvl_d((v2i64)out8, (v2i64)out8); \ - out11 = (v16u8)__msa_ilvl_d((v2i64)out10, (v2i64)out10); \ - out13 = (v16u8)__msa_ilvl_d((v2i64)out12, (v2i64)out12); \ - out15 = (v16u8)__msa_ilvl_d((v2i64)out14, (v2i64)out14); \ - out1 = (v16u8)__msa_ilvl_d((v2i64)out0, (v2i64)out0); \ - out3 = (v16u8)__msa_ilvl_d((v2i64)out2, (v2i64)out2); \ - out5 = (v16u8)__msa_ilvl_d((v2i64)out4, (v2i64)out4); \ - out7 = (v16u8)__msa_ilvl_d((v2i64)out6, (v2i64)out6); \ -} - -#define 
VP8_AVER_IF_RETAIN(above2_in, above1_in, src_in, \ - below1_in, below2_in, ref, out) \ -{ \ - v16u8 temp0, temp1; \ - \ - temp1 = __msa_aver_u_b(above2_in, above1_in); \ - temp0 = __msa_aver_u_b(below2_in, below1_in); \ - temp1 = __msa_aver_u_b(temp1, temp0); \ - out = __msa_aver_u_b(src_in, temp1); \ - temp0 = __msa_asub_u_b(src_in, above2_in); \ - temp1 = __msa_asub_u_b(src_in, above1_in); \ - temp0 = (temp0 < ref); \ - temp1 = (temp1 < ref); \ - temp0 = temp0 & temp1; \ - temp1 = __msa_asub_u_b(src_in, below1_in); \ - temp1 = (temp1 < ref); \ - temp0 = temp0 & temp1; \ - temp1 = __msa_asub_u_b(src_in, below2_in); \ - temp1 = (temp1 < ref); \ - temp0 = temp0 & temp1; \ - out = __msa_bmz_v(out, src_in, temp0); \ -} - -#define TRANSPOSE12x16_B(in0, in1, in2, in3, in4, in5, in6, in7, \ - in8, in9, in10, in11, in12, in13, in14, in15) \ -{ \ - v8i16 temp0, temp1, temp2, temp3, temp4; \ - v8i16 temp5, temp6, temp7, temp8, temp9; \ - \ - ILVR_B2_SH(in1, in0, in3, in2, temp0, temp1); \ - ILVRL_H2_SH(temp1, temp0, temp2, temp3); \ - ILVR_B2_SH(in5, in4, in7, in6, temp0, temp1); \ - ILVRL_H2_SH(temp1, temp0, temp4, temp5); \ - ILVRL_W2_SH(temp4, temp2, temp0, temp1); \ - ILVRL_W2_SH(temp5, temp3, temp2, temp3); \ - ILVR_B2_SH(in9, in8, in11, in10, temp4, temp5); \ - ILVR_B2_SH(in9, in8, in11, in10, temp4, temp5); \ - ILVRL_H2_SH(temp5, temp4, temp6, temp7); \ - ILVR_B2_SH(in13, in12, in15, in14, temp4, temp5); \ - ILVRL_H2_SH(temp5, temp4, temp8, temp9); \ - ILVRL_W2_SH(temp8, temp6, temp4, temp5); \ - ILVRL_W2_SH(temp9, temp7, temp6, temp7); \ - ILVL_B2_SH(in1, in0, in3, in2, temp8, temp9); \ - ILVR_D2_UB(temp4, temp0, temp5, temp1, in0, in2); \ - in1 = (v16u8)__msa_ilvl_d((v2i64)temp4, (v2i64)temp0); \ - in3 = (v16u8)__msa_ilvl_d((v2i64)temp5, (v2i64)temp1); \ - ILVL_B2_SH(in5, in4, in7, in6, temp0, temp1); \ - ILVR_D2_UB(temp6, temp2, temp7, temp3, in4, in6); \ - in5 = (v16u8)__msa_ilvl_d((v2i64)temp6, (v2i64)temp2); \ - in7 = (v16u8)__msa_ilvl_d((v2i64)temp7, (v2i64)temp3); \ - ILVL_B4_SH(in9, in8, in11, in10, in13, in12, in15, in14, \ - temp2, temp3, temp4, temp5); \ - ILVR_H4_SH(temp9, temp8, temp1, temp0, temp3, temp2, temp5, temp4, \ - temp6, temp7, temp8, temp9); \ - ILVR_W2_SH(temp7, temp6, temp9, temp8, temp0, temp1); \ - in8 = (v16u8)__msa_ilvr_d((v2i64)temp1, (v2i64)temp0); \ - in9 = (v16u8)__msa_ilvl_d((v2i64)temp1, (v2i64)temp0); \ - ILVL_W2_SH(temp7, temp6, temp9, temp8, temp2, temp3); \ - in10 = (v16u8)__msa_ilvr_d((v2i64)temp3, (v2i64)temp2); \ - in11 = (v16u8)__msa_ilvl_d((v2i64)temp3, (v2i64)temp2); \ -} - -#define VP8_TRANSPOSE12x8_UB_UB(in0, in1, in2, in3, in4, in5, \ - in6, in7, in8, in9, in10, in11) \ -{ \ - v8i16 temp0, temp1, temp2, temp3; \ - v8i16 temp4, temp5, temp6, temp7; \ - \ - ILVR_B2_SH(in1, in0, in3, in2, temp0, temp1); \ - ILVRL_H2_SH(temp1, temp0, temp2, temp3); \ - ILVR_B2_SH(in5, in4, in7, in6, temp0, temp1); \ - ILVRL_H2_SH(temp1, temp0, temp4, temp5); \ - ILVRL_W2_SH(temp4, temp2, temp0, temp1); \ - ILVRL_W2_SH(temp5, temp3, temp2, temp3); \ - ILVL_B2_SH(in1, in0, in3, in2, temp4, temp5); \ - temp4 = __msa_ilvr_h(temp5, temp4); \ - ILVL_B2_SH(in5, in4, in7, in6, temp6, temp7); \ - temp5 = __msa_ilvr_h(temp7, temp6); \ - ILVRL_W2_SH(temp5, temp4, temp6, temp7); \ - in0 = (v16u8)temp0; \ - in2 = (v16u8)temp1; \ - in4 = (v16u8)temp2; \ - in6 = (v16u8)temp3; \ - in8 = (v16u8)temp6; \ - in10 = (v16u8)temp7; \ - in1 = (v16u8)__msa_ilvl_d((v2i64)temp0, (v2i64)temp0); \ - in3 = (v16u8)__msa_ilvl_d((v2i64)temp1, (v2i64)temp1); \ - in5 = 
(v16u8)__msa_ilvl_d((v2i64)temp2, (v2i64)temp2); \ - in7 = (v16u8)__msa_ilvl_d((v2i64)temp3, (v2i64)temp3); \ - in9 = (v16u8)__msa_ilvl_d((v2i64)temp6, (v2i64)temp6); \ - in11 = (v16u8)__msa_ilvl_d((v2i64)temp7, (v2i64)temp7); \ -} - -static void postproc_down_across_chroma_msa(uint8_t *src_ptr, uint8_t *dst_ptr, - int32_t src_stride, - int32_t dst_stride, - int32_t cols, uint8_t *f) -{ - uint8_t *p_src = src_ptr; - uint8_t *p_dst = dst_ptr; - uint8_t *f_orig = f; - uint8_t *p_dst_st = dst_ptr; - uint16_t col; - uint64_t out0, out1, out2, out3; - v16u8 above2, above1, below2, below1, src, ref, ref_temp; - v16u8 inter0, inter1, inter2, inter3, inter4, inter5; - v16u8 inter6, inter7, inter8, inter9, inter10, inter11; - - for (col = (cols / 16); col--;) - { - ref = LD_UB(f); - LD_UB2(p_src - 2 * src_stride, src_stride, above2, above1); - src = LD_UB(p_src); - LD_UB2(p_src + 1 * src_stride, src_stride, below1, below2); - VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter0); - above2 = LD_UB(p_src + 3 * src_stride); - VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter1); - above1 = LD_UB(p_src + 4 * src_stride); - VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter2); - src = LD_UB(p_src + 5 * src_stride); - VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter3); - below1 = LD_UB(p_src + 6 * src_stride); - VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter4); - below2 = LD_UB(p_src + 7 * src_stride); - VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter5); - above2 = LD_UB(p_src + 8 * src_stride); - VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter6); - above1 = LD_UB(p_src + 9 * src_stride); - VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter7); - ST_UB8(inter0, inter1, inter2, inter3, inter4, inter5, inter6, inter7, - p_dst, dst_stride); - - p_dst += 16; - p_src += 16; - f += 16; - } - - if (0 != (cols / 16)) - { - ref = LD_UB(f); - LD_UB2(p_src - 2 * src_stride, src_stride, above2, above1); - src = LD_UB(p_src); - LD_UB2(p_src + 1 * src_stride, src_stride, below1, below2); - VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter0); - above2 = LD_UB(p_src + 3 * src_stride); - VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter1); - above1 = LD_UB(p_src + 4 * src_stride); - VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter2); - src = LD_UB(p_src + 5 * src_stride); - VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter3); - below1 = LD_UB(p_src + 6 * src_stride); - VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter4); - below2 = LD_UB(p_src + 7 * src_stride); - VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter5); - above2 = LD_UB(p_src + 8 * src_stride); - VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter6); - above1 = LD_UB(p_src + 9 * src_stride); - VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter7); - out0 = __msa_copy_u_d((v2i64)inter0, 0); - out1 = __msa_copy_u_d((v2i64)inter1, 0); - out2 = __msa_copy_u_d((v2i64)inter2, 0); - out3 = __msa_copy_u_d((v2i64)inter3, 0); - SD4(out0, out1, out2, out3, p_dst, dst_stride); - - out0 = __msa_copy_u_d((v2i64)inter4, 0); - out1 = __msa_copy_u_d((v2i64)inter5, 0); - out2 = __msa_copy_u_d((v2i64)inter6, 0); - out3 = __msa_copy_u_d((v2i64)inter7, 0); - SD4(out0, out1, out2, out3, p_dst + 4 * dst_stride, dst_stride); - } - - f = f_orig; - p_dst = dst_ptr - 2; - LD_UB8(p_dst, dst_stride, - inter0, inter1, 
inter2, inter3, inter4, inter5, inter6, inter7); - - for (col = 0; col < (cols / 8); ++col) - { - ref = LD_UB(f); - f += 8; - VP8_TRANSPOSE12x8_UB_UB(inter0, inter1, inter2, inter3, - inter4, inter5, inter6, inter7, - inter8, inter9, inter10, inter11); - if (0 == col) - { - above2 = inter2; - above1 = inter2; - } - else - { - above2 = inter0; - above1 = inter1; - } - src = inter2; - below1 = inter3; - below2 = inter4; - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 0); - VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, - ref_temp, inter2); - above2 = inter5; - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 1); - VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, - ref_temp, inter3); - above1 = inter6; - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 2); - VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, - ref_temp, inter4); - src = inter7; - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 3); - VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src, - ref_temp, inter5); - below1 = inter8; - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 4); - VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1, - ref_temp, inter6); - below2 = inter9; - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 5); - VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, - ref_temp, inter7); - if (col == (cols / 8 - 1)) - { - above2 = inter9; - } - else - { - above2 = inter10; - } - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 6); - VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, - ref_temp, inter8); - if (col == (cols / 8 - 1)) - { - above1 = inter9; - } - else - { - above1 = inter11; - } - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 7); - VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, - ref_temp, inter9); - TRANSPOSE8x8_UB_UB(inter2, inter3, inter4, inter5, inter6, inter7, - inter8, inter9, inter2, inter3, inter4, inter5, - inter6, inter7, inter8, inter9); - p_dst += 8; - LD_UB2(p_dst, dst_stride, inter0, inter1); - ST8x1_UB(inter2, p_dst_st); - ST8x1_UB(inter3, (p_dst_st + 1 * dst_stride)); - LD_UB2(p_dst + 2 * dst_stride, dst_stride, inter2, inter3); - ST8x1_UB(inter4, (p_dst_st + 2 * dst_stride)); - ST8x1_UB(inter5, (p_dst_st + 3 * dst_stride)); - LD_UB2(p_dst + 4 * dst_stride, dst_stride, inter4, inter5); - ST8x1_UB(inter6, (p_dst_st + 4 * dst_stride)); - ST8x1_UB(inter7, (p_dst_st + 5 * dst_stride)); - LD_UB2(p_dst + 6 * dst_stride, dst_stride, inter6, inter7); - ST8x1_UB(inter8, (p_dst_st + 6 * dst_stride)); - ST8x1_UB(inter9, (p_dst_st + 7 * dst_stride)); - p_dst_st += 8; - } -} - -static void postproc_down_across_luma_msa(uint8_t *src_ptr, uint8_t *dst_ptr, - int32_t src_stride, - int32_t dst_stride, - int32_t cols, uint8_t *f) -{ - uint8_t *p_src = src_ptr; - uint8_t *p_dst = dst_ptr; - uint8_t *p_dst_st = dst_ptr; - uint8_t *f_orig = f; - uint16_t col; - v16u8 above2, above1, below2, below1; - v16u8 src, ref, ref_temp; - v16u8 inter0, inter1, inter2, inter3, inter4, inter5, inter6; - v16u8 inter7, inter8, inter9, inter10, inter11; - v16u8 inter12, inter13, inter14, inter15; - - for (col = (cols / 16); col--;) - { - ref = LD_UB(f); - LD_UB2(p_src - 2 * src_stride, src_stride, above2, above1); - src = LD_UB(p_src); - LD_UB2(p_src + 1 * src_stride, src_stride, below1, below2); - VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter0); - above2 = LD_UB(p_src + 3 * src_stride); - VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter1); - above1 = LD_UB(p_src + 4 * src_stride); - VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter2); - src = 
LD_UB(p_src + 5 * src_stride); - VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter3); - below1 = LD_UB(p_src + 6 * src_stride); - VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter4); - below2 = LD_UB(p_src + 7 * src_stride); - VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter5); - above2 = LD_UB(p_src + 8 * src_stride); - VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter6); - above1 = LD_UB(p_src + 9 * src_stride); - VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter7); - src = LD_UB(p_src + 10 * src_stride); - VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter8); - below1 = LD_UB(p_src + 11 * src_stride); - VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter9); - below2 = LD_UB(p_src + 12 * src_stride); - VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter10); - above2 = LD_UB(p_src + 13 * src_stride); - VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter11); - above1 = LD_UB(p_src + 14 * src_stride); - VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter12); - src = LD_UB(p_src + 15 * src_stride); - VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter13); - below1 = LD_UB(p_src + 16 * src_stride); - VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter14); - below2 = LD_UB(p_src + 17 * src_stride); - VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter15); - ST_UB8(inter0, inter1, inter2, inter3, inter4, inter5, inter6, inter7, - p_dst, dst_stride); - ST_UB8(inter8, inter9, inter10, inter11, inter12, inter13, - inter14, inter15, p_dst + 8 * dst_stride, dst_stride); - p_src += 16; - p_dst += 16; - f += 16; - } - - f = f_orig; - p_dst = dst_ptr - 2; - LD_UB8(p_dst, dst_stride, - inter0, inter1, inter2, inter3, inter4, inter5, inter6, inter7); - LD_UB8(p_dst + 8 * dst_stride, dst_stride, - inter8, inter9, inter10, inter11, inter12, inter13, - inter14, inter15); - - for (col = 0; col < cols / 8; ++col) - { - ref = LD_UB(f); - f += 8; - TRANSPOSE12x16_B(inter0, inter1, inter2, inter3, inter4, inter5, - inter6, inter7, inter8, inter9, inter10, inter11, - inter12, inter13, inter14, inter15); - if (0 == col) - { - above2 = inter2; - above1 = inter2; - } - else - { - above2 = inter0; - above1 = inter1; - } - - src = inter2; - below1 = inter3; - below2 = inter4; - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 0); - VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, - ref_temp, inter2); - above2 = inter5; - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 1); - VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, - ref_temp, inter3); - above1 = inter6; - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 2); - VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, - ref_temp, inter4); - src = inter7; - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 3); - VP8_AVER_IF_RETAIN(below1, below2, above2, above1, src, - ref_temp, inter5); - below1 = inter8; - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 4); - VP8_AVER_IF_RETAIN(below2, above2, above1, src, below1, - ref_temp, inter6); - below2 = inter9; - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 5); - VP8_AVER_IF_RETAIN(above2, above1, src, below1, below2, - ref_temp, inter7); - if (col == (cols / 8 - 1)) - { - above2 = inter9; - } - else - { - above2 = inter10; - } - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 6); - VP8_AVER_IF_RETAIN(above1, src, below1, below2, above2, - ref_temp, inter8); - if (col == (cols / 8 - 1)) - { - above1 = inter9; - } - else - { - 
above1 = inter11; - } - ref_temp = (v16u8)__msa_splati_b((v16i8)ref, 7); - VP8_AVER_IF_RETAIN(src, below1, below2, above2, above1, - ref_temp, inter9); - VP8_TRANSPOSE8x16_UB_UB(inter2, inter3, inter4, inter5, - inter6, inter7, inter8, inter9, - inter2, inter3, inter4, inter5, - inter6, inter7, inter8, inter9, - inter10, inter11, inter12, inter13, - inter14, inter15, above2, above1); - - p_dst += 8; - LD_UB2(p_dst, dst_stride, inter0, inter1); - ST8x1_UB(inter2, p_dst_st); - ST8x1_UB(inter3, (p_dst_st + 1 * dst_stride)); - LD_UB2(p_dst + 2 * dst_stride, dst_stride, inter2, inter3); - ST8x1_UB(inter4, (p_dst_st + 2 * dst_stride)); - ST8x1_UB(inter5, (p_dst_st + 3 * dst_stride)); - LD_UB2(p_dst + 4 * dst_stride, dst_stride, inter4, inter5); - ST8x1_UB(inter6, (p_dst_st + 4 * dst_stride)); - ST8x1_UB(inter7, (p_dst_st + 5 * dst_stride)); - LD_UB2(p_dst + 6 * dst_stride, dst_stride, inter6, inter7); - ST8x1_UB(inter8, (p_dst_st + 6 * dst_stride)); - ST8x1_UB(inter9, (p_dst_st + 7 * dst_stride)); - LD_UB2(p_dst + 8 * dst_stride, dst_stride, inter8, inter9); - ST8x1_UB(inter10, (p_dst_st + 8 * dst_stride)); - ST8x1_UB(inter11, (p_dst_st + 9 * dst_stride)); - LD_UB2(p_dst + 10 * dst_stride, dst_stride, inter10, inter11); - ST8x1_UB(inter12, (p_dst_st + 10 * dst_stride)); - ST8x1_UB(inter13, (p_dst_st + 11 * dst_stride)); - LD_UB2(p_dst + 12 * dst_stride, dst_stride, inter12, inter13); - ST8x1_UB(inter14, (p_dst_st + 12 * dst_stride)); - ST8x1_UB(inter15, (p_dst_st + 13 * dst_stride)); - LD_UB2(p_dst + 14 * dst_stride, dst_stride, inter14, inter15); - ST8x1_UB(above2, (p_dst_st + 14 * dst_stride)); - ST8x1_UB(above1, (p_dst_st + 15 * dst_stride)); - p_dst_st += 8; - } -} - -void vp8_post_proc_down_and_across_mb_row_msa(uint8_t *src, uint8_t *dst, - int32_t src_stride, - int32_t dst_stride, - int32_t cols, uint8_t *f, - int32_t size) -{ - if (8 == size) - { - postproc_down_across_chroma_msa(src, dst, src_stride, dst_stride, - cols, f); - } - else if (16 == size) - { - postproc_down_across_luma_msa(src, dst, src_stride, dst_stride, - cols, f); - } -} - -void vp8_mbpost_proc_across_ip_msa(uint8_t *src_ptr, int32_t pitch, - int32_t rows, int32_t cols, int32_t flimit) -{ - int32_t row, col, cnt; - uint8_t *src_dup = src_ptr; - v16u8 src0, src, tmp_orig; - v16u8 tmp = { 0 }; - v16i8 zero = { 0 }; - v8u16 sum_h, src_r_h, src_l_h; - v4u32 src_r_w, src_l_w; - v4i32 flimit_vec; - - flimit_vec = __msa_fill_w(flimit); - for (row = rows; row--;) - { - int32_t sum_sq = 0; - int32_t sum = 0; - src0 = (v16u8)__msa_fill_b(src_dup[0]); - ST8x1_UB(src0, (src_dup - 8)); - - src0 = (v16u8)__msa_fill_b(src_dup[cols - 1]); - ST_UB(src0, src_dup + cols); - src_dup[cols + 16] = src_dup[cols - 1]; - tmp_orig = (v16u8)__msa_ldi_b(0); - tmp_orig[15] = tmp[15]; - src = LD_UB(src_dup - 8); - src[15] = 0; - ILVRL_B2_UH(zero, src, src_r_h, src_l_h); - src_r_w = __msa_dotp_u_w(src_r_h, src_r_h); - src_l_w = __msa_dotp_u_w(src_l_h, src_l_h); - sum_sq = HADD_SW_S32(src_r_w); - sum_sq += HADD_SW_S32(src_l_w); - sum_h = __msa_hadd_u_h(src, src); - sum = HADD_UH_U32(sum_h); - { - v16u8 src7, src8, src_r, src_l; - v16i8 mask; - v8u16 add_r, add_l; - v8i16 sub_r, sub_l, sum_r, sum_l, mask0, mask1; - v4i32 sum_sq0, sum_sq1, sum_sq2, sum_sq3; - v4i32 sub0, sub1, sub2, sub3; - v4i32 sum0_w, sum1_w, sum2_w, sum3_w; - v4i32 mul0, mul1, mul2, mul3; - v4i32 total0, total1, total2, total3; - v8i16 const8 = __msa_fill_h(8); - - src7 = LD_UB(src_dup + 7); - src8 = LD_UB(src_dup - 8); - for (col = 0; col < (cols >> 4); ++col) - { - 
ILVRL_B2_UB(src7, src8, src_r, src_l); - HSUB_UB2_SH(src_r, src_l, sub_r, sub_l); - - sum_r[0] = sum + sub_r[0]; - for (cnt = 0; cnt < 7; ++cnt) - { - sum_r[cnt + 1] = sum_r[cnt] + sub_r[cnt + 1]; - } - sum_l[0] = sum_r[7] + sub_l[0]; - for (cnt = 0; cnt < 7; ++cnt) - { - sum_l[cnt + 1] = sum_l[cnt] + sub_l[cnt + 1]; - } - sum = sum_l[7]; - src = LD_UB(src_dup + 16 * col); - ILVRL_B2_UH(zero, src, src_r_h, src_l_h); - src7 = (v16u8)((const8 + sum_r + (v8i16)src_r_h) >> 4); - src8 = (v16u8)((const8 + sum_l + (v8i16)src_l_h) >> 4); - tmp = (v16u8)__msa_pckev_b((v16i8)src8, (v16i8)src7); - - HADD_UB2_UH(src_r, src_l, add_r, add_l); - UNPCK_SH_SW(sub_r, sub0, sub1); - UNPCK_SH_SW(sub_l, sub2, sub3); - ILVR_H2_SW(zero, add_r, zero, add_l, sum0_w, sum2_w); - ILVL_H2_SW(zero, add_r, zero, add_l, sum1_w, sum3_w); - MUL4(sum0_w, sub0, sum1_w, sub1, sum2_w, sub2, sum3_w, sub3, - mul0, mul1, mul2, mul3); - sum_sq0[0] = sum_sq + mul0[0]; - for (cnt = 0; cnt < 3; ++cnt) - { - sum_sq0[cnt + 1] = sum_sq0[cnt] + mul0[cnt + 1]; - } - sum_sq1[0] = sum_sq0[3] + mul1[0]; - for (cnt = 0; cnt < 3; ++cnt) - { - sum_sq1[cnt + 1] = sum_sq1[cnt] + mul1[cnt + 1]; - } - sum_sq2[0] = sum_sq1[3] + mul2[0]; - for (cnt = 0; cnt < 3; ++cnt) - { - sum_sq2[cnt + 1] = sum_sq2[cnt] + mul2[cnt + 1]; - } - sum_sq3[0] = sum_sq2[3] + mul3[0]; - for (cnt = 0; cnt < 3; ++cnt) - { - sum_sq3[cnt + 1] = sum_sq3[cnt] + mul3[cnt + 1]; - } - sum_sq = sum_sq3[3]; - - UNPCK_SH_SW(sum_r, sum0_w, sum1_w); - UNPCK_SH_SW(sum_l, sum2_w, sum3_w); - total0 = sum_sq0 * __msa_ldi_w(15); - total0 -= sum0_w * sum0_w; - total1 = sum_sq1 * __msa_ldi_w(15); - total1 -= sum1_w * sum1_w; - total2 = sum_sq2 * __msa_ldi_w(15); - total2 -= sum2_w * sum2_w; - total3 = sum_sq3 * __msa_ldi_w(15); - total3 -= sum3_w * sum3_w; - total0 = (total0 < flimit_vec); - total1 = (total1 < flimit_vec); - total2 = (total2 < flimit_vec); - total3 = (total3 < flimit_vec); - PCKEV_H2_SH(total1, total0, total3, total2, mask0, mask1); - mask = __msa_pckev_b((v16i8)mask1, (v16i8)mask0); - tmp = __msa_bmz_v(tmp, src, (v16u8)mask); - - if (col == 0) - { - uint64_t src_d; - - src_d = __msa_copy_u_d((v2i64)tmp_orig, 1); - SD(src_d, (src_dup - 8)); - } - - src7 = LD_UB(src_dup + 16 * (col + 1) + 7); - src8 = LD_UB(src_dup + 16 * (col + 1) - 8); - ST_UB(tmp, (src_dup + (16 * col))); - } - - src_dup += pitch; - } - } -} - -void vp8_mbpost_proc_down_msa(uint8_t *dst_ptr, int32_t pitch, int32_t rows, - int32_t cols, int32_t flimit) -{ - int32_t row, col, cnt, i; - const int16_t *rv3 = &vp8_rv_msa[63 & rand()]; - v4i32 flimit_vec; - v16u8 dst7, dst8, dst_r_b, dst_l_b; - v16i8 mask; - v8u16 add_r, add_l; - v8i16 dst_r_h, dst_l_h, sub_r, sub_l, mask0, mask1; - v4i32 sub0, sub1, sub2, sub3, total0, total1, total2, total3; - - flimit_vec = __msa_fill_w(flimit); - - for (col = 0; col < (cols >> 4); ++col) - { - uint8_t *dst_tmp = &dst_ptr[col << 4]; - v16u8 dst; - v16i8 zero = { 0 }; - v16u8 tmp[16]; - v8i16 mult0, mult1, rv2_0, rv2_1; - v8i16 sum0_h = { 0 }; - v8i16 sum1_h = { 0 }; - v4i32 mul0 = { 0 }; - v4i32 mul1 = { 0 }; - v4i32 mul2 = { 0 }; - v4i32 mul3 = { 0 }; - v4i32 sum0_w, sum1_w, sum2_w, sum3_w; - v4i32 add0, add1, add2, add3; - const int16_t *rv2[16]; - - dst = LD_UB(dst_tmp); - for (cnt = (col << 4), i = 0; i < 16; ++cnt) - { - rv2[i] = rv3 + ((cnt * 17) & 127); - ++i; - } - for (cnt = -8; cnt < 0; ++cnt) - { - ST_UB(dst, dst_tmp + cnt * pitch); - } - - dst = LD_UB((dst_tmp + (rows - 1) * pitch)); - for (cnt = rows; cnt < rows + 17; ++cnt) - { - ST_UB(dst, dst_tmp + cnt * 
pitch); - } - for (cnt = -8; cnt <= 6; ++cnt) - { - dst = LD_UB(dst_tmp + (cnt * pitch)); - UNPCK_UB_SH(dst, dst_r_h, dst_l_h); - MUL2(dst_r_h, dst_r_h, dst_l_h, dst_l_h, mult0, mult1); - mul0 += (v4i32)__msa_ilvr_h((v8i16)zero, (v8i16)mult0); - mul1 += (v4i32)__msa_ilvl_h((v8i16)zero, (v8i16)mult0); - mul2 += (v4i32)__msa_ilvr_h((v8i16)zero, (v8i16)mult1); - mul3 += (v4i32)__msa_ilvl_h((v8i16)zero, (v8i16)mult1); - ADD2(sum0_h, dst_r_h, sum1_h, dst_l_h, sum0_h, sum1_h); - } - - for (row = 0; row < (rows + 8); ++row) - { - for (i = 0; i < 8; ++i) - { - rv2_0[i] = *(rv2[i] + (row & 127)); - rv2_1[i] = *(rv2[i + 8] + (row & 127)); - } - dst7 = LD_UB(dst_tmp + (7 * pitch)); - dst8 = LD_UB(dst_tmp - (8 * pitch)); - ILVRL_B2_UB(dst7, dst8, dst_r_b, dst_l_b); - - HSUB_UB2_SH(dst_r_b, dst_l_b, sub_r, sub_l); - UNPCK_SH_SW(sub_r, sub0, sub1); - UNPCK_SH_SW(sub_l, sub2, sub3); - sum0_h += sub_r; - sum1_h += sub_l; - - HADD_UB2_UH(dst_r_b, dst_l_b, add_r, add_l); - - ILVRL_H2_SW(zero, add_r, add0, add1); - ILVRL_H2_SW(zero, add_l, add2, add3); - mul0 += add0 * sub0; - mul1 += add1 * sub1; - mul2 += add2 * sub2; - mul3 += add3 * sub3; - dst = LD_UB(dst_tmp); - ILVRL_B2_SH(zero, dst, dst_r_h, dst_l_h); - dst7 = (v16u8)((rv2_0 + sum0_h + dst_r_h) >> 4); - dst8 = (v16u8)((rv2_1 + sum1_h + dst_l_h) >> 4); - tmp[row & 15] = (v16u8)__msa_pckev_b((v16i8)dst8, (v16i8)dst7); - - UNPCK_SH_SW(sum0_h, sum0_w, sum1_w); - UNPCK_SH_SW(sum1_h, sum2_w, sum3_w); - total0 = mul0 * __msa_ldi_w(15); - total0 -= sum0_w * sum0_w; - total1 = mul1 * __msa_ldi_w(15); - total1 -= sum1_w * sum1_w; - total2 = mul2 * __msa_ldi_w(15); - total2 -= sum2_w * sum2_w; - total3 = mul3 * __msa_ldi_w(15); - total3 -= sum3_w * sum3_w; - total0 = (total0 < flimit_vec); - total1 = (total1 < flimit_vec); - total2 = (total2 < flimit_vec); - total3 = (total3 < flimit_vec); - PCKEV_H2_SH(total1, total0, total3, total2, mask0, mask1); - mask = __msa_pckev_b((v16i8)mask1, (v16i8)mask0); - tmp[row & 15] = __msa_bmz_v(tmp[row & 15], dst, (v16u8)mask); - - if (row >= 8) - { - ST_UB(tmp[(row - 8) & 15], (dst_tmp - 8 * pitch)); - } - - dst_tmp += pitch; - } - } -} diff --git a/vp8/common/postproc.c b/vp8/common/postproc.c index 6baf00f1e..05e2bfc60 100644 --- a/vp8/common/postproc.c +++ b/vp8/common/postproc.c @@ -12,6 +12,7 @@ #include "vpx_config.h" #include "vpx_dsp_rtcd.h" #include "vp8_rtcd.h" +#include "vpx_dsp/postproc.h" #include "vpx_scale_rtcd.h" #include "vpx_scale/yv12config.h" #include "postproc.h" @@ -72,142 +73,11 @@ static const unsigned char MV_REFERENCE_FRAME_colors[MAX_REF_FRAMES][3] = }; #endif -const short vp8_rv[] = -{ - 8, 5, 2, 2, 8, 12, 4, 9, 8, 3, - 0, 3, 9, 0, 0, 0, 8, 3, 14, 4, - 10, 1, 11, 14, 1, 14, 9, 6, 12, 11, - 8, 6, 10, 0, 0, 8, 9, 0, 3, 14, - 8, 11, 13, 4, 2, 9, 0, 3, 9, 6, - 1, 2, 3, 14, 13, 1, 8, 2, 9, 7, - 3, 3, 1, 13, 13, 6, 6, 5, 2, 7, - 11, 9, 11, 8, 7, 3, 2, 0, 13, 13, - 14, 4, 12, 5, 12, 10, 8, 10, 13, 10, - 4, 14, 4, 10, 0, 8, 11, 1, 13, 7, - 7, 14, 6, 14, 13, 2, 13, 5, 4, 4, - 0, 10, 0, 5, 13, 2, 12, 7, 11, 13, - 8, 0, 4, 10, 7, 2, 7, 2, 2, 5, - 3, 4, 7, 3, 3, 14, 14, 5, 9, 13, - 3, 14, 3, 6, 3, 0, 11, 8, 13, 1, - 13, 1, 12, 0, 10, 9, 7, 6, 2, 8, - 5, 2, 13, 7, 1, 13, 14, 7, 6, 7, - 9, 6, 10, 11, 7, 8, 7, 5, 14, 8, - 4, 4, 0, 8, 7, 10, 0, 8, 14, 11, - 3, 12, 5, 7, 14, 3, 14, 5, 2, 6, - 11, 12, 12, 8, 0, 11, 13, 1, 2, 0, - 5, 10, 14, 7, 8, 0, 4, 11, 0, 8, - 0, 3, 10, 5, 8, 0, 11, 6, 7, 8, - 10, 7, 13, 9, 2, 5, 1, 5, 10, 2, - 4, 3, 5, 6, 10, 8, 9, 4, 11, 14, - 0, 10, 0, 5, 13, 2, 12, 7, 11, 13, - 8, 0, 4, 10, 
7, 2, 7, 2, 2, 5, - 3, 4, 7, 3, 3, 14, 14, 5, 9, 13, - 3, 14, 3, 6, 3, 0, 11, 8, 13, 1, - 13, 1, 12, 0, 10, 9, 7, 6, 2, 8, - 5, 2, 13, 7, 1, 13, 14, 7, 6, 7, - 9, 6, 10, 11, 7, 8, 7, 5, 14, 8, - 4, 4, 0, 8, 7, 10, 0, 8, 14, 11, - 3, 12, 5, 7, 14, 3, 14, 5, 2, 6, - 11, 12, 12, 8, 0, 11, 13, 1, 2, 0, - 5, 10, 14, 7, 8, 0, 4, 11, 0, 8, - 0, 3, 10, 5, 8, 0, 11, 6, 7, 8, - 10, 7, 13, 9, 2, 5, 1, 5, 10, 2, - 4, 3, 5, 6, 10, 8, 9, 4, 11, 14, - 3, 8, 3, 7, 8, 5, 11, 4, 12, 3, - 11, 9, 14, 8, 14, 13, 4, 3, 1, 2, - 14, 6, 5, 4, 4, 11, 4, 6, 2, 1, - 5, 8, 8, 12, 13, 5, 14, 10, 12, 13, - 0, 9, 5, 5, 11, 10, 13, 9, 10, 13, -}; extern void vp8_blit_text(const char *msg, unsigned char *address, const int pitch); extern void vp8_blit_line(int x0, int x1, int y0, int y1, unsigned char *image, const int pitch); /*********************************************************************************************************** */ -void vp8_post_proc_down_and_across_mb_row_c -( - unsigned char *src_ptr, - unsigned char *dst_ptr, - int src_pixels_per_line, - int dst_pixels_per_line, - int cols, - unsigned char *f, - int size -) -{ - unsigned char *p_src, *p_dst; - int row; - int col; - unsigned char v; - unsigned char d[4]; - - for (row = 0; row < size; row++) - { - /* post_proc_down for one row */ - p_src = src_ptr; - p_dst = dst_ptr; - - for (col = 0; col < cols; col++) - { - unsigned char p_above2 = p_src[col - 2 * src_pixels_per_line]; - unsigned char p_above1 = p_src[col - src_pixels_per_line]; - unsigned char p_below1 = p_src[col + src_pixels_per_line]; - unsigned char p_below2 = p_src[col + 2 * src_pixels_per_line]; - - v = p_src[col]; - - if ((abs(v - p_above2) < f[col]) && (abs(v - p_above1) < f[col]) - && (abs(v - p_below1) < f[col]) && (abs(v - p_below2) < f[col])) - { - unsigned char k1, k2, k3; - k1 = (p_above2 + p_above1 + 1) >> 1; - k2 = (p_below2 + p_below1 + 1) >> 1; - k3 = (k1 + k2 + 1) >> 1; - v = (k3 + v + 1) >> 1; - } - - p_dst[col] = v; - } - - /* now post_proc_across */ - p_src = dst_ptr; - p_dst = dst_ptr; - - p_src[-2] = p_src[-1] = p_src[0]; - p_src[cols] = p_src[cols + 1] = p_src[cols - 1]; - - for (col = 0; col < cols; col++) - { - v = p_src[col]; - - if ((abs(v - p_src[col - 2]) < f[col]) - && (abs(v - p_src[col - 1]) < f[col]) - && (abs(v - p_src[col + 1]) < f[col]) - && (abs(v - p_src[col + 2]) < f[col])) - { - unsigned char k1, k2, k3; - k1 = (p_src[col - 2] + p_src[col - 1] + 1) >> 1; - k2 = (p_src[col + 2] + p_src[col + 1] + 1) >> 1; - k3 = (k1 + k2 + 1) >> 1; - v = (k3 + v + 1) >> 1; - } - - d[col & 3] = v; - - if (col >= 2) - p_dst[col - 2] = d[(col - 2) & 3]; - } - - /* handle the last two pixels */ - p_dst[col - 2] = d[(col - 2) & 3]; - p_dst[col - 1] = d[(col - 1) & 3]; - - /* next row */ - src_ptr += src_pixels_per_line; - dst_ptr += dst_pixels_per_line; - } -} - static int q2mbl(int x) { if (x < 20) x = 20; @@ -216,108 +86,13 @@ static int q2mbl(int x) return x * x / 3; } -void vp8_mbpost_proc_across_ip_c(unsigned char *src, int pitch, int rows, int cols, int flimit) -{ - int r, c, i; - - unsigned char *s = src; - unsigned char d[16]; - - for (r = 0; r < rows; r++) - { - int sumsq = 0; - int sum = 0; - - for (i = -8; i < 0; i++) - s[i]=s[0]; - - /* 17 avoids valgrind warning - we buffer values in c in d - * and only write them when we've read 8 ahead... 
- */ - for (i = 0; i < 17; i++) - s[i+cols]=s[cols-1]; - - for (i = -8; i <= 6; i++) - { - sumsq += s[i] * s[i]; - sum += s[i]; - d[i+8] = 0; - } - - for (c = 0; c < cols + 8; c++) - { - int x = s[c+7] - s[c-8]; - int y = s[c+7] + s[c-8]; - - sum += x; - sumsq += x * y; - - d[c&15] = s[c]; - - if (sumsq * 15 - sum * sum < flimit) - { - d[c&15] = (8 + sum + s[c]) >> 4; - } - - s[c-8] = d[(c-8)&15]; - } - - s += pitch; - } -} - -void vp8_mbpost_proc_down_c(unsigned char *dst, int pitch, int rows, int cols, int flimit) -{ - int r, c, i; - const short *rv3 = &vp8_rv[63&rand()]; - - for (c = 0; c < cols; c++ ) - { - unsigned char *s = &dst[c]; - int sumsq = 0; - int sum = 0; - unsigned char d[16]; - const short *rv2 = rv3 + ((c * 17) & 127); - - for (i = -8; i < 0; i++) - s[i*pitch]=s[0]; - - /* 17 avoids valgrind warning - we buffer values in c in d - * and only write them when we've read 8 ahead... - */ - for (i = 0; i < 17; i++) - s[(i+rows)*pitch]=s[(rows-1)*pitch]; - - for (i = -8; i <= 6; i++) - { - sumsq += s[i*pitch] * s[i*pitch]; - sum += s[i*pitch]; - } - - for (r = 0; r < rows + 8; r++) - { - sumsq += s[7*pitch] * s[ 7*pitch] - s[-8*pitch] * s[-8*pitch]; - sum += s[7*pitch] - s[-8*pitch]; - d[r&15] = s[0]; - - if (sumsq * 15 - sum * sum < flimit) - { - d[r&15] = (rv2[r&127] + sum + s[0]) >> 4; - } - if (r >= 8) - s[-8*pitch] = d[(r-8)&15]; - s += pitch; - } - } -} - #if CONFIG_POSTPROC static void vp8_de_mblock(YV12_BUFFER_CONFIG *post, int q) { - vp8_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height, + vpx_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height, post->y_width, q2mbl(q)); - vp8_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height, + vpx_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height, post->y_width, q2mbl(q)); } @@ -365,16 +140,16 @@ void vp8_deblock(VP8_COMMON *cm, } mode_info_context++; - vp8_post_proc_down_and_across_mb_row( + vpx_post_proc_down_and_across_mb_row( source->y_buffer + 16 * mbr * source->y_stride, post->y_buffer + 16 * mbr * post->y_stride, source->y_stride, post->y_stride, source->y_width, ylimits, 16); - vp8_post_proc_down_and_across_mb_row( + vpx_post_proc_down_and_across_mb_row( source->u_buffer + 8 * mbr * source->uv_stride, post->u_buffer + 8 * mbr * post->uv_stride, source->uv_stride, post->uv_stride, source->uv_width, uvlimits, 8); - vp8_post_proc_down_and_across_mb_row( + vpx_post_proc_down_and_across_mb_row( source->v_buffer + 8 * mbr * source->uv_stride, post->v_buffer + 8 * mbr * post->uv_stride, source->uv_stride, post->uv_stride, source->uv_width, uvlimits, 8); @@ -409,17 +184,17 @@ void vp8_de_noise(VP8_COMMON *cm, /* TODO: The original code don't filter the 2 outer rows and columns. 
*/ for (mbr = 0; mbr < mb_rows; mbr++) { - vp8_post_proc_down_and_across_mb_row( + vpx_post_proc_down_and_across_mb_row( source->y_buffer + 16 * mbr * source->y_stride, source->y_buffer + 16 * mbr * source->y_stride, source->y_stride, source->y_stride, source->y_width, limits, 16); if (uvfilter == 1) { - vp8_post_proc_down_and_across_mb_row( + vpx_post_proc_down_and_across_mb_row( source->u_buffer + 8 * mbr * source->uv_stride, source->u_buffer + 8 * mbr * source->uv_stride, source->uv_stride, source->uv_stride, source->uv_width, limits, 8); - vp8_post_proc_down_and_across_mb_row( + vpx_post_proc_down_and_across_mb_row( source->v_buffer + 8 * mbr * source->uv_stride, source->v_buffer + 8 * mbr * source->uv_stride, source->uv_stride, source->uv_stride, source->uv_width, limits, @@ -428,69 +203,6 @@ void vp8_de_noise(VP8_COMMON *cm, } } -static double gaussian(double sigma, double mu, double x) -{ - return 1 / (sigma * sqrt(2.0 * 3.14159265)) * - (exp(-(x - mu) * (x - mu) / (2 * sigma * sigma))); -} - -static void fillrd(struct postproc_state *state, int q, int a) -{ - char char_dist[300]; - - double sigma; - int i; - - vp8_clear_system_state(); - - - sigma = a + .5 + .6 * (63 - q) / 63.0; - - /* set up a lookup table of 256 entries that matches - * a gaussian distribution with sigma determined by q. - */ - { - int next, j; - - next = 0; - - for (i = -32; i < 32; i++) - { - const int v = (int)(.5 + 256 * gaussian(sigma, 0, i)); - - if (v) - { - for (j = 0; j < v; j++) - { - char_dist[next+j] = (char) i; - } - - next = next + j; - } - - } - - for (; next < 256; next++) - char_dist[next] = 0; - - } - - for (i = 0; i < 3072; i++) - { - state->noise[i] = char_dist[rand() & 0xff]; - } - - for (i = 0; i < 16; i++) - { - state->blackclamp[i] = -char_dist[0]; - state->whiteclamp[i] = -char_dist[0]; - state->bothclamp[i] = -2 * char_dist[0]; - } - - state->last_q = q; - state->last_noise = a; -} - /* Blend the macro block with a solid colored square. Leave the * edges unblended to give distinction to macro blocks in areas * filled with the same color block. 
@@ -778,7 +490,22 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t if (oci->postproc_state.last_q != q || oci->postproc_state.last_noise != noise_level) { - fillrd(&oci->postproc_state, 63 - q, noise_level); + double sigma; + int clamp, i; + struct postproc_state *ppstate = &oci->postproc_state; + vp8_clear_system_state(); + sigma = noise_level + .5 + .6 * q / 63.0; + clamp = vpx_setup_noise(sigma, sizeof(ppstate->noise), + ppstate->noise); + for (i = 0; i < 16; i++) + { + ppstate->blackclamp[i] = clamp; + ppstate->whiteclamp[i] = clamp; + ppstate->bothclamp[i] = 2 * clamp; + } + + ppstate->last_q = q; + ppstate->last_noise = noise_level; } vpx_plane_add_noise diff --git a/vp8/common/rtcd_defs.pl b/vp8/common/rtcd_defs.pl index 856ede189..a440352f4 100644 --- a/vp8/common/rtcd_defs.pl +++ b/vp8/common/rtcd_defs.pl @@ -156,16 +156,6 @@ $vp8_copy_mem8x4_dspr2=vp8_copy_mem8x4_dspr2; # Postproc # if (vpx_config("CONFIG_POSTPROC") eq "yes") { - add_proto qw/void vp8_mbpost_proc_down/, "unsigned char *dst, int pitch, int rows, int cols,int flimit"; - specialize qw/vp8_mbpost_proc_down mmx sse2 msa/; - $vp8_mbpost_proc_down_sse2=vp8_mbpost_proc_down_xmm; - - add_proto qw/void vp8_mbpost_proc_across_ip/, "unsigned char *dst, int pitch, int rows, int cols,int flimit"; - specialize qw/vp8_mbpost_proc_across_ip sse2 msa/; - $vp8_mbpost_proc_across_ip_sse2=vp8_mbpost_proc_across_ip_xmm; - - add_proto qw/void vp8_post_proc_down_and_across_mb_row/, "unsigned char *src, unsigned char *dst, int src_pitch, int dst_pitch, int cols, unsigned char *flimits, int size"; - specialize qw/vp8_post_proc_down_and_across_mb_row sse2 msa/; add_proto qw/void vp8_blend_mb_inner/, "unsigned char *y, unsigned char *u, unsigned char *v, int y1, int u1, int v1, int alpha, int stride"; # no asm yet diff --git a/vp8/common/x86/mfqe_sse2.asm b/vp8/common/x86/mfqe_sse2.asm index a8a7f568d..8177b7922 100644 --- a/vp8/common/x86/mfqe_sse2.asm +++ b/vp8/common/x86/mfqe_sse2.asm @@ -45,7 +45,7 @@ sym(vp8_filter_by_weight16x16_sse2): mov rcx, 16 ; loop count pxor xmm6, xmm6 -.combine +.combine: movdqa xmm2, [rax] movdqa xmm4, [rdx] add rax, rsi @@ -122,7 +122,7 @@ sym(vp8_filter_by_weight8x8_sse2): mov rcx, 8 ; loop count pxor xmm4, xmm4 -.combine +.combine: movq xmm2, [rax] movq xmm3, [rdx] add rax, rsi @@ -189,7 +189,7 @@ sym(vp8_variance_and_sad_16x16_sse2): ; Because we're working with the actual output frames ; we can't depend on any kind of data alignment. -.accumulate +.accumulate: movdqa xmm0, [rax] ; src1 movdqa xmm1, [rdx] ; src2 add rax, rcx ; src1 + stride1 diff --git a/vp8/common/x86/postproc_mmx.asm b/vp8/common/x86/postproc_mmx.asm deleted file mode 100644 index 1a89e7ead..000000000 --- a/vp8/common/x86/postproc_mmx.asm +++ /dev/null @@ -1,253 +0,0 @@ -; -; Copyright (c) 2010 The WebM project authors. All Rights Reserved. -; -; Use of this source code is governed by a BSD-style license -; that can be found in the LICENSE file in the root of the source -; tree. An additional intellectual property rights grant can be found -; in the file PATENTS. All contributing project authors may -; be found in the AUTHORS file in the root of the source tree. 
-; - - -%include "vpx_ports/x86_abi_support.asm" - -%define VP8_FILTER_WEIGHT 128 -%define VP8_FILTER_SHIFT 7 - -;void vp8_mbpost_proc_down_mmx(unsigned char *dst, -; int pitch, int rows, int cols,int flimit) -extern sym(vp8_rv) -global sym(vp8_mbpost_proc_down_mmx) PRIVATE -sym(vp8_mbpost_proc_down_mmx): - push rbp - mov rbp, rsp - SHADOW_ARGS_TO_STACK 5 - GET_GOT rbx - push rsi - push rdi - ; end prolog - - ALIGN_STACK 16, rax - sub rsp, 136 - - ; unsigned char d[16][8] at [rsp] - ; create flimit2 at [rsp+128] - mov eax, dword ptr arg(4) ;flimit - mov [rsp+128], eax - mov [rsp+128+4], eax -%define flimit2 [rsp+128] - -%if ABI_IS_32BIT=0 - lea r8, [GLOBAL(sym(vp8_rv))] -%endif - - ;rows +=8; - add dword ptr arg(2), 8 - - ;for(c=0; cpost_proc_buffer); vpx_free_frame_buffer(&cm->post_proc_buffer_int); + vpx_free(cm->postproc_state.limits); + cm->postproc_state.limits = 0; #else (void)cm; #endif diff --git a/vp9/common/vp9_postproc.c b/vp9/common/vp9_postproc.c index c04cc8f05..4651f67a0 100644 --- a/vp9/common/vp9_postproc.c +++ b/vp9/common/vp9_postproc.c @@ -18,6 +18,7 @@ #include "./vp9_rtcd.h" #include "vpx_dsp/vpx_dsp_common.h" +#include "vpx_dsp/postproc.h" #include "vpx_ports/mem.h" #include "vpx_ports/system_state.h" #include "vpx_scale/vpx_scale.h" @@ -32,128 +33,9 @@ static const int16_t kernel5[] = { 1, 1, 4, 1, 1 }; -const int16_t vp9_rv[] = { - 8, 5, 2, 2, 8, 12, 4, 9, 8, 3, - 0, 3, 9, 0, 0, 0, 8, 3, 14, 4, - 10, 1, 11, 14, 1, 14, 9, 6, 12, 11, - 8, 6, 10, 0, 0, 8, 9, 0, 3, 14, - 8, 11, 13, 4, 2, 9, 0, 3, 9, 6, - 1, 2, 3, 14, 13, 1, 8, 2, 9, 7, - 3, 3, 1, 13, 13, 6, 6, 5, 2, 7, - 11, 9, 11, 8, 7, 3, 2, 0, 13, 13, - 14, 4, 12, 5, 12, 10, 8, 10, 13, 10, - 4, 14, 4, 10, 0, 8, 11, 1, 13, 7, - 7, 14, 6, 14, 13, 2, 13, 5, 4, 4, - 0, 10, 0, 5, 13, 2, 12, 7, 11, 13, - 8, 0, 4, 10, 7, 2, 7, 2, 2, 5, - 3, 4, 7, 3, 3, 14, 14, 5, 9, 13, - 3, 14, 3, 6, 3, 0, 11, 8, 13, 1, - 13, 1, 12, 0, 10, 9, 7, 6, 2, 8, - 5, 2, 13, 7, 1, 13, 14, 7, 6, 7, - 9, 6, 10, 11, 7, 8, 7, 5, 14, 8, - 4, 4, 0, 8, 7, 10, 0, 8, 14, 11, - 3, 12, 5, 7, 14, 3, 14, 5, 2, 6, - 11, 12, 12, 8, 0, 11, 13, 1, 2, 0, - 5, 10, 14, 7, 8, 0, 4, 11, 0, 8, - 0, 3, 10, 5, 8, 0, 11, 6, 7, 8, - 10, 7, 13, 9, 2, 5, 1, 5, 10, 2, - 4, 3, 5, 6, 10, 8, 9, 4, 11, 14, - 0, 10, 0, 5, 13, 2, 12, 7, 11, 13, - 8, 0, 4, 10, 7, 2, 7, 2, 2, 5, - 3, 4, 7, 3, 3, 14, 14, 5, 9, 13, - 3, 14, 3, 6, 3, 0, 11, 8, 13, 1, - 13, 1, 12, 0, 10, 9, 7, 6, 2, 8, - 5, 2, 13, 7, 1, 13, 14, 7, 6, 7, - 9, 6, 10, 11, 7, 8, 7, 5, 14, 8, - 4, 4, 0, 8, 7, 10, 0, 8, 14, 11, - 3, 12, 5, 7, 14, 3, 14, 5, 2, 6, - 11, 12, 12, 8, 0, 11, 13, 1, 2, 0, - 5, 10, 14, 7, 8, 0, 4, 11, 0, 8, - 0, 3, 10, 5, 8, 0, 11, 6, 7, 8, - 10, 7, 13, 9, 2, 5, 1, 5, 10, 2, - 4, 3, 5, 6, 10, 8, 9, 4, 11, 14, - 3, 8, 3, 7, 8, 5, 11, 4, 12, 3, - 11, 9, 14, 8, 14, 13, 4, 3, 1, 2, - 14, 6, 5, 4, 4, 11, 4, 6, 2, 1, - 5, 8, 8, 12, 13, 5, 14, 10, 12, 13, - 0, 9, 5, 5, 11, 10, 13, 9, 10, 13, -}; - static const uint8_t q_diff_thresh = 20; static const uint8_t last_q_thresh = 170; - -void vp9_post_proc_down_and_across_c(const uint8_t *src_ptr, - uint8_t *dst_ptr, - int src_pixels_per_line, - int dst_pixels_per_line, - int rows, - int cols, - int flimit) { - uint8_t const *p_src; - uint8_t *p_dst; - int row, col, i, v, kernel; - int pitch = src_pixels_per_line; - uint8_t d[8]; - (void)dst_pixels_per_line; - - for (row = 0; row < rows; row++) { - /* post_proc_down for one row */ - p_src = src_ptr; - p_dst = dst_ptr; - - for (col = 0; col < cols; col++) { - kernel = 4; - v = p_src[col]; - - for (i = -2; i <= 2; 
i++) { - if (abs(v - p_src[col + i * pitch]) > flimit) - goto down_skip_convolve; - - kernel += kernel5[2 + i] * p_src[col + i * pitch]; - } - - v = (kernel >> 3); - down_skip_convolve: - p_dst[col] = v; - } - - /* now post_proc_across */ - p_src = dst_ptr; - p_dst = dst_ptr; - - for (i = 0; i < 8; i++) - d[i] = p_src[i]; - - for (col = 0; col < cols; col++) { - kernel = 4; - v = p_src[col]; - - d[col & 7] = v; - - for (i = -2; i <= 2; i++) { - if (abs(v - p_src[col + i]) > flimit) - goto across_skip_convolve; - - kernel += kernel5[2 + i] * p_src[col + i]; - } - - d[col & 7] = (kernel >> 3); - across_skip_convolve: - - if (col >= 2) - p_dst[col - 2] = d[(col - 2) & 7]; - } - - /* handle the last two pixels */ - p_dst[col - 2] = d[(col - 2) & 7]; - p_dst[col - 1] = d[(col - 1) & 7]; - - - /* next row */ - src_ptr += pitch; - dst_ptr += pitch; - } -} +extern const int16_t vpx_rv[]; #if CONFIG_VP9_HIGHBITDEPTH void vp9_highbd_post_proc_down_and_across_c(const uint16_t *src_ptr, @@ -237,41 +119,6 @@ static int q2mbl(int x) { return x * x / 3; } -void vp9_mbpost_proc_across_ip_c(uint8_t *src, int pitch, - int rows, int cols, int flimit) { - int r, c, i; - uint8_t *s = src; - uint8_t d[16]; - - for (r = 0; r < rows; r++) { - int sumsq = 0; - int sum = 0; - - for (i = -8; i <= 6; i++) { - sumsq += s[i] * s[i]; - sum += s[i]; - d[i + 8] = 0; - } - - for (c = 0; c < cols + 8; c++) { - int x = s[c + 7] - s[c - 8]; - int y = s[c + 7] + s[c - 8]; - - sum += x; - sumsq += x * y; - - d[c & 15] = s[c]; - - if (sumsq * 15 - sum * sum < flimit) { - d[c & 15] = (8 + sum + s[c]) >> 4; - } - - s[c - 8] = d[(c - 8) & 15]; - } - s += pitch; - } -} - #if CONFIG_VP9_HIGHBITDEPTH void vp9_highbd_mbpost_proc_across_ip_c(uint16_t *src, int pitch, int rows, int cols, int flimit) { @@ -312,43 +159,12 @@ void vp9_highbd_mbpost_proc_across_ip_c(uint16_t *src, int pitch, } #endif // CONFIG_VP9_HIGHBITDEPTH -void vp9_mbpost_proc_down_c(uint8_t *dst, int pitch, - int rows, int cols, int flimit) { - int r, c, i; - const short *rv3 = &vp9_rv[63 & rand()]; // NOLINT - - for (c = 0; c < cols; c++) { - uint8_t *s = &dst[c]; - int sumsq = 0; - int sum = 0; - uint8_t d[16]; - const int16_t *rv2 = rv3 + ((c * 17) & 127); - - for (i = -8; i <= 6; i++) { - sumsq += s[i * pitch] * s[i * pitch]; - sum += s[i * pitch]; - } - - for (r = 0; r < rows + 8; r++) { - sumsq += s[7 * pitch] * s[ 7 * pitch] - s[-8 * pitch] * s[-8 * pitch]; - sum += s[7 * pitch] - s[-8 * pitch]; - d[r & 15] = s[0]; - - if (sumsq * 15 - sum * sum < flimit) { - d[r & 15] = (rv2[r & 127] + sum + s[0]) >> 4; - } - - s[-8 * pitch] = d[(r - 8) & 15]; - s += pitch; - } - } -} #if CONFIG_VP9_HIGHBITDEPTH void vp9_highbd_mbpost_proc_down_c(uint16_t *dst, int pitch, int rows, int cols, int flimit) { int r, c, i; - const int16_t *rv3 = &vp9_rv[63 & rand()]; // NOLINT + const int16_t *rv3 = &vpx_rv[63 & rand()]; // NOLINT for (c = 0; c < cols; c++) { uint16_t *s = &dst[c]; @@ -382,14 +198,14 @@ static void deblock_and_de_macro_block(YV12_BUFFER_CONFIG *source, YV12_BUFFER_CONFIG *post, int q, int low_var_thresh, - int flag) { - double level = 6.0e-05 * q * q * q - .0067 * q * q + .306 * q + .0065; - int ppl = (int)(level + .5); + int flag, + uint8_t *limits) { (void) low_var_thresh; (void) flag; - #if CONFIG_VP9_HIGHBITDEPTH if (source->flags & YV12_FLAG_HIGHBITDEPTH) { + double level = 6.0e-05 * q * q * q - .0067 * q * q + .306 * q + .0065; + int ppl = (int)(level + .5); vp9_highbd_post_proc_down_and_across(CONVERT_TO_SHORTPTR(source->y_buffer), 
CONVERT_TO_SHORTPTR(post->y_buffer), source->y_stride, post->y_stride, @@ -415,177 +231,68 @@ static void deblock_and_de_macro_block(YV12_BUFFER_CONFIG *source, source->uv_height, source->uv_width, ppl); } else { - vp9_post_proc_down_and_across(source->y_buffer, post->y_buffer, - source->y_stride, post->y_stride, - source->y_height, source->y_width, ppl); - - vp9_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height, +#endif // CONFIG_VP9_HIGHBITDEPTH + vp9_deblock(source, post, q, limits); + vpx_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height, post->y_width, q2mbl(q)); - - vp9_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height, + vpx_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height, post->y_width, q2mbl(q)); - - vp9_post_proc_down_and_across(source->u_buffer, post->u_buffer, - source->uv_stride, post->uv_stride, - source->uv_height, source->uv_width, ppl); - vp9_post_proc_down_and_across(source->v_buffer, post->v_buffer, - source->uv_stride, post->uv_stride, - source->uv_height, source->uv_width, ppl); +#if CONFIG_VP9_HIGHBITDEPTH } -#else - vp9_post_proc_down_and_across(source->y_buffer, post->y_buffer, - source->y_stride, post->y_stride, - source->y_height, source->y_width, ppl); - - vp9_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height, - post->y_width, q2mbl(q)); - - vp9_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height, - post->y_width, q2mbl(q)); - - vp9_post_proc_down_and_across(source->u_buffer, post->u_buffer, - source->uv_stride, post->uv_stride, - source->uv_height, source->uv_width, ppl); - vp9_post_proc_down_and_across(source->v_buffer, post->v_buffer, - source->uv_stride, post->uv_stride, - source->uv_height, source->uv_width, ppl); #endif // CONFIG_VP9_HIGHBITDEPTH } void vp9_deblock(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst, - int q) { + int q, uint8_t *limits) { const int ppl = (int)(6.0e-05 * q * q * q - 0.0067 * q * q + 0.306 * q + 0.0065 + 0.5); - int i; - - const uint8_t *const srcs[3] = {src->y_buffer, src->u_buffer, src->v_buffer}; - const int src_strides[3] = {src->y_stride, src->uv_stride, src->uv_stride}; - const int src_widths[3] = {src->y_width, src->uv_width, src->uv_width}; - const int src_heights[3] = {src->y_height, src->uv_height, src->uv_height}; - - uint8_t *const dsts[3] = {dst->y_buffer, dst->u_buffer, dst->v_buffer}; - const int dst_strides[3] = {dst->y_stride, dst->uv_stride, dst->uv_stride}; - - for (i = 0; i < MAX_MB_PLANE; ++i) { #if CONFIG_VP9_HIGHBITDEPTH - assert((src->flags & YV12_FLAG_HIGHBITDEPTH) == - (dst->flags & YV12_FLAG_HIGHBITDEPTH)); - if (src->flags & YV12_FLAG_HIGHBITDEPTH) { + if (src->flags & YV12_FLAG_HIGHBITDEPTH) { + int i; + const uint8_t * const srcs[3] = + {src->y_buffer, src->u_buffer, src->v_buffer}; + const int src_strides[3] = {src->y_stride, src->uv_stride, src->uv_stride}; + const int src_widths[3] = {src->y_width, src->uv_width, src->uv_width}; + const int src_heights[3] = {src->y_height, src->uv_height, src->uv_height}; + + uint8_t * const dsts[3] = {dst->y_buffer, dst->u_buffer, dst->v_buffer}; + const int dst_strides[3] = {dst->y_stride, dst->uv_stride, dst->uv_stride}; + for (i = 0; i < MAX_MB_PLANE; ++i) { vp9_highbd_post_proc_down_and_across(CONVERT_TO_SHORTPTR(srcs[i]), CONVERT_TO_SHORTPTR(dsts[i]), src_strides[i], dst_strides[i], src_heights[i], src_widths[i], ppl); - } else { - vp9_post_proc_down_and_across(srcs[i], dsts[i], - src_strides[i], dst_strides[i], - src_heights[i], src_widths[i], ppl); } 
-#else - vp9_post_proc_down_and_across(srcs[i], dsts[i], - src_strides[i], dst_strides[i], - src_heights[i], src_widths[i], ppl); + } else { #endif // CONFIG_VP9_HIGHBITDEPTH - } -} - -void vp9_denoise(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst, - int q) { - const int ppl = (int)(6.0e-05 * q * q * q - 0.0067 * q * q + 0.306 * q - + 0.0065 + 0.5); - int i; - - const uint8_t *const srcs[3] = {src->y_buffer, src->u_buffer, src->v_buffer}; - const int src_strides[3] = {src->y_stride, src->uv_stride, src->uv_stride}; - const int src_widths[3] = {src->y_width, src->uv_width, src->uv_width}; - const int src_heights[3] = {src->y_height, src->uv_height, src->uv_height}; - - uint8_t *const dsts[3] = {dst->y_buffer, dst->u_buffer, dst->v_buffer}; - const int dst_strides[3] = {dst->y_stride, dst->uv_stride, dst->uv_stride}; - - for (i = 0; i < MAX_MB_PLANE; ++i) { - const int src_stride = src_strides[i]; - const int src_width = src_widths[i] - 4; - const int src_height = src_heights[i] - 4; - const int dst_stride = dst_strides[i]; - -#if CONFIG_VP9_HIGHBITDEPTH - assert((src->flags & YV12_FLAG_HIGHBITDEPTH) == - (dst->flags & YV12_FLAG_HIGHBITDEPTH)); - if (src->flags & YV12_FLAG_HIGHBITDEPTH) { - const uint16_t *const src_plane = CONVERT_TO_SHORTPTR( - srcs[i] + 2 * src_stride + 2); - uint16_t *const dst_plane = CONVERT_TO_SHORTPTR( - dsts[i] + 2 * dst_stride + 2); - vp9_highbd_post_proc_down_and_across(src_plane, dst_plane, src_stride, - dst_stride, src_height, src_width, - ppl); - } else { - const uint8_t *const src_plane = srcs[i] + 2 * src_stride + 2; - uint8_t *const dst_plane = dsts[i] + 2 * dst_stride + 2; - - vp9_post_proc_down_and_across(src_plane, dst_plane, src_stride, - dst_stride, src_height, src_width, ppl); + int mbr; + const int mb_rows = src->y_height / 16; + const int mb_cols = src->y_width / 16; + + memset(limits, (unsigned char) ppl, 16 * mb_cols); + + for (mbr = 0; mbr < mb_rows; mbr++) { + vpx_post_proc_down_and_across_mb_row( + src->y_buffer + 16 * mbr * src->y_stride, + dst->y_buffer + 16 * mbr * dst->y_stride, src->y_stride, + dst->y_stride, src->y_width, limits, 16); + vpx_post_proc_down_and_across_mb_row( + src->u_buffer + 8 * mbr * src->uv_stride, + dst->u_buffer + 8 * mbr * dst->uv_stride, src->uv_stride, + dst->uv_stride, src->uv_width, limits, 8); + vpx_post_proc_down_and_across_mb_row( + src->v_buffer + 8 * mbr * src->uv_stride, + dst->v_buffer + 8 * mbr * dst->uv_stride, src->uv_stride, + dst->uv_stride, src->uv_width, limits, 8); } -#else - const uint8_t *const src_plane = srcs[i] + 2 * src_stride + 2; - uint8_t *const dst_plane = dsts[i] + 2 * dst_stride + 2; - vp9_post_proc_down_and_across(src_plane, dst_plane, src_stride, dst_stride, - src_height, src_width, ppl); -#endif +#if CONFIG_VP9_HIGHBITDEPTH } +#endif // CONFIG_VP9_HIGHBITDEPTH } -static double gaussian(double sigma, double mu, double x) { - return 1 / (sigma * sqrt(2.0 * 3.14159265)) * - (exp(-(x - mu) * (x - mu) / (2 * sigma * sigma))); -} - -static void fillrd(struct postproc_state *state, int q, int a) { - char char_dist[300]; - - double sigma; - int ai = a, qi = q, i; - - vpx_clear_system_state(); - - sigma = ai + .5 + .6 * (63 - qi) / 63.0; - - /* set up a lookup table of 256 entries that matches - * a gaussian distribution with sigma determined by q. 
- */ - { - int next, j; - - next = 0; - - for (i = -32; i < 32; i++) { - int a_i = (int)(0.5 + 256 * gaussian(sigma, 0, i)); - - if (a_i) { - for (j = 0; j < a_i; j++) { - char_dist[next + j] = (char) i; - } - - next = next + j; - } - } - - for (; next < 256; next++) - char_dist[next] = 0; - } - - for (i = 0; i < 3072; i++) { - state->noise[i] = char_dist[rand() & 0xff]; // NOLINT - } - - for (i = 0; i < 16; i++) { - state->blackclamp[i] = -char_dist[0]; - state->whiteclamp[i] = -char_dist[0]; - state->bothclamp[i] = -2 * char_dist[0]; - } - - state->last_q = q; - state->last_noise = a; +void vp9_denoise(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst, + int q, uint8_t *limits) { + vp9_deblock(src, dst, q, limits); } static void swap_mi_and_prev_mi(VP9_COMMON *cm) { @@ -663,6 +370,14 @@ int vp9_post_proc_frame(struct VP9Common *cm, vpx_internal_error(&cm->error, VPX_CODEC_MEM_ERROR, "Failed to allocate post-processing buffer"); + if (flags & (VP9D_DEMACROBLOCK | VP9D_DEBLOCK)) { + if (!cm->postproc_state.limits) { + cm->postproc_state.limits = vpx_calloc( + cm->width, sizeof(*cm->postproc_state.limits)); + } + } + + if ((flags & VP9D_MFQE) && cm->current_video_frame >= 2 && cm->postproc_state.last_frame_valid && cm->bit_depth == 8 && cm->postproc_state.last_base_qindex <= last_q_thresh && @@ -677,17 +392,19 @@ int vp9_post_proc_frame(struct VP9Common *cm, if ((flags & VP9D_DEMACROBLOCK) && cm->post_proc_buffer_int.buffer_alloc) { deblock_and_de_macro_block(&cm->post_proc_buffer_int, ppbuf, q + (ppflags->deblocking_level - 5) * 10, - 1, 0); + 1, 0, cm->postproc_state.limits); } else if (flags & VP9D_DEBLOCK) { - vp9_deblock(&cm->post_proc_buffer_int, ppbuf, q); + vp9_deblock(&cm->post_proc_buffer_int, ppbuf, q, + cm->postproc_state.limits); } else { vp8_yv12_copy_frame(&cm->post_proc_buffer_int, ppbuf); } } else if (flags & VP9D_DEMACROBLOCK) { deblock_and_de_macro_block(cm->frame_to_show, ppbuf, - q + (ppflags->deblocking_level - 5) * 10, 1, 0); + q + (ppflags->deblocking_level - 5) * 10, 1, 0, + cm->postproc_state.limits); } else if (flags & VP9D_DEBLOCK) { - vp9_deblock(cm->frame_to_show, ppbuf, q); + vp9_deblock(cm->frame_to_show, ppbuf, q, cm->postproc_state.limits); } else { vp8_yv12_copy_frame(cm->frame_to_show, ppbuf); } @@ -699,7 +416,20 @@ int vp9_post_proc_frame(struct VP9Common *cm, const int noise_level = ppflags->noise_level; if (ppstate->last_q != q || ppstate->last_noise != noise_level) { - fillrd(ppstate, 63 - q, noise_level); + double sigma; + int clamp, i; + vpx_clear_system_state(); + sigma = noise_level + .5 + .6 * q / 63.0; + clamp = vpx_setup_noise(sigma, sizeof(ppstate->noise), + ppstate->noise); + + for (i = 0; i < 16; i++) { + ppstate->blackclamp[i] = clamp; + ppstate->whiteclamp[i] = clamp; + ppstate->bothclamp[i] = 2 * clamp; + } + ppstate->last_q = q; + ppstate->last_noise = noise_level; } vpx_plane_add_noise(ppbuf->y_buffer, ppstate->noise, ppstate->blackclamp, ppstate->whiteclamp, ppstate->bothclamp, diff --git a/vp9/common/vp9_postproc.h b/vp9/common/vp9_postproc.h index 035c9cdf8..60e6f5232 100644 --- a/vp9/common/vp9_postproc.h +++ b/vp9/common/vp9_postproc.h @@ -33,6 +33,7 @@ struct postproc_state { DECLARE_ALIGNED(16, char, blackclamp[16]); DECLARE_ALIGNED(16, char, whiteclamp[16]); DECLARE_ALIGNED(16, char, bothclamp[16]); + uint8_t *limits; }; struct VP9Common; @@ -42,9 +43,11 @@ struct VP9Common; int vp9_post_proc_frame(struct VP9Common *cm, YV12_BUFFER_CONFIG *dest, vp9_ppflags_t *flags); -void vp9_denoise(const YV12_BUFFER_CONFIG *src, 
YV12_BUFFER_CONFIG *dst, int q); +void vp9_denoise(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst, int q, + uint8_t *limits); -void vp9_deblock(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst, int q); +void vp9_deblock(const YV12_BUFFER_CONFIG *src, YV12_BUFFER_CONFIG *dst, int q, + uint8_t *limits); #ifdef __cplusplus } // extern "C" diff --git a/vp9/common/vp9_rtcd_defs.pl b/vp9/common/vp9_rtcd_defs.pl index 846133674..f315a3b85 100644 --- a/vp9/common/vp9_rtcd_defs.pl +++ b/vp9/common/vp9_rtcd_defs.pl @@ -21,29 +21,6 @@ EOF } forward_decls qw/vp9_common_forward_decls/; -# x86inc.asm had specific constraints. break it out so it's easy to disable. -# zero all the variables to avoid tricky else conditions. -$mmx_x86inc = $sse_x86inc = $sse2_x86inc = $ssse3_x86inc = $avx_x86inc = - $avx2_x86inc = ''; -$mmx_x86_64_x86inc = $sse_x86_64_x86inc = $sse2_x86_64_x86inc = - $ssse3_x86_64_x86inc = $avx_x86_64_x86inc = $avx2_x86_64_x86inc = ''; -if (vpx_config("CONFIG_USE_X86INC") eq "yes") { - $mmx_x86inc = 'mmx'; - $sse_x86inc = 'sse'; - $sse2_x86inc = 'sse2'; - $ssse3_x86inc = 'ssse3'; - $avx_x86inc = 'avx'; - $avx2_x86inc = 'avx2'; - if ($opts{arch} eq "x86_64") { - $mmx_x86_64_x86inc = 'mmx'; - $sse_x86_64_x86inc = 'sse'; - $sse2_x86_64_x86inc = 'sse2'; - $ssse3_x86_64_x86inc = 'ssse3'; - $avx_x86_64_x86inc = 'avx'; - $avx2_x86_64_x86inc = 'avx2'; - } -} - # functions that are 64 bit only. $mmx_x86_64 = $sse2_x86_64 = $ssse3_x86_64 = $avx_x86_64 = $avx2_x86_64 = ''; if ($opts{arch} eq "x86_64") { @@ -58,18 +35,6 @@ if ($opts{arch} eq "x86_64") { # post proc # if (vpx_config("CONFIG_VP9_POSTPROC") eq "yes") { -add_proto qw/void vp9_mbpost_proc_down/, "uint8_t *dst, int pitch, int rows, int cols, int flimit"; -specialize qw/vp9_mbpost_proc_down sse2/; -$vp9_mbpost_proc_down_sse2=vp9_mbpost_proc_down_xmm; - -add_proto qw/void vp9_mbpost_proc_across_ip/, "uint8_t *src, int pitch, int rows, int cols, int flimit"; -specialize qw/vp9_mbpost_proc_across_ip sse2/; -$vp9_mbpost_proc_across_ip_sse2=vp9_mbpost_proc_across_ip_xmm; - -add_proto qw/void vp9_post_proc_down_and_across/, "const uint8_t *src_ptr, uint8_t *dst_ptr, int src_pixels_per_line, int dst_pixels_per_line, int rows, int cols, int flimit"; -specialize qw/vp9_post_proc_down_and_across sse2/; -$vp9_post_proc_down_and_across_sse2=vp9_post_proc_down_and_across_xmm; - add_proto qw/void vp9_filter_by_weight16x16/, "const uint8_t *src, int src_stride, uint8_t *dst, int dst_stride, int src_weight"; specialize qw/vp9_filter_by_weight16x16 sse2 msa/; @@ -202,10 +167,10 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vp9_block_error/; add_proto qw/int64_t vp9_highbd_block_error/, "const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz, int bd"; - specialize qw/vp9_highbd_block_error/, "$sse2_x86inc"; + specialize qw/vp9_highbd_block_error sse2/; add_proto qw/int64_t vp9_highbd_block_error_8bit/, "const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz"; - specialize qw/vp9_highbd_block_error_8bit/, "$sse2_x86inc", "$avx_x86inc"; + specialize qw/vp9_highbd_block_error_8bit sse2 avx/; add_proto qw/void vp9_quantize_fp/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan"; specialize 
qw/vp9_quantize_fp/; @@ -217,16 +182,16 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vp9_fdct8x8_quant/; } else { add_proto qw/int64_t vp9_block_error/, "const tran_low_t *coeff, const tran_low_t *dqcoeff, intptr_t block_size, int64_t *ssz"; - specialize qw/vp9_block_error avx2 msa/, "$sse2_x86inc"; + specialize qw/vp9_block_error avx2 msa sse2/; add_proto qw/int64_t vp9_block_error_fp/, "const int16_t *coeff, const int16_t *dqcoeff, int block_size"; - specialize qw/vp9_block_error_fp neon/, "$sse2_x86inc"; + specialize qw/vp9_block_error_fp neon sse2/; add_proto qw/void vp9_quantize_fp/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan"; - specialize qw/vp9_quantize_fp neon sse2/, "$ssse3_x86_64_x86inc"; + specialize qw/vp9_quantize_fp neon sse2/, "$ssse3_x86_64"; add_proto qw/void vp9_quantize_fp_32x32/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan"; - specialize qw/vp9_quantize_fp_32x32/, "$ssse3_x86_64_x86inc"; + specialize qw/vp9_quantize_fp_32x32/, "$ssse3_x86_64"; add_proto qw/void vp9_fdct8x8_quant/, "const int16_t *input, int stride, tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan"; specialize qw/vp9_fdct8x8_quant sse2 ssse3 neon/; @@ -245,7 +210,7 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vp9_fht16x16 sse2/; add_proto qw/void vp9_fwht4x4/, "const int16_t *input, tran_low_t *output, int stride"; - specialize qw/vp9_fwht4x4/, "$sse2_x86inc"; + specialize qw/vp9_fwht4x4 sse2/; } else { add_proto qw/void vp9_fht4x4/, "const int16_t *input, tran_low_t *output, int stride, int tx_type"; specialize qw/vp9_fht4x4 sse2 msa/; @@ -257,7 +222,7 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vp9_fht16x16 sse2 msa/; add_proto qw/void vp9_fwht4x4/, "const int16_t *input, tran_low_t *output, int stride"; - specialize qw/vp9_fwht4x4 msa/, "$sse2_x86inc"; + specialize qw/vp9_fwht4x4 msa sse2/; } # diff --git a/vp9/common/x86/vp9_mfqe_sse2.asm b/vp9/common/x86/vp9_mfqe_sse2.asm index 6029420d1..30852049b 100644 --- a/vp9/common/x86/vp9_mfqe_sse2.asm +++ b/vp9/common/x86/vp9_mfqe_sse2.asm @@ -46,7 +46,7 @@ sym(vp9_filter_by_weight16x16_sse2): mov rcx, 16 ; loop count pxor xmm6, xmm6 -.combine +.combine: movdqa xmm2, [rax] movdqa xmm4, [rdx] add rax, rsi @@ -123,7 +123,7 @@ sym(vp9_filter_by_weight8x8_sse2): mov rcx, 8 ; loop count pxor xmm4, xmm4 -.combine +.combine: movq xmm2, [rax] movq xmm3, [rdx] add rax, rsi @@ -190,7 +190,7 @@ sym(vp9_variance_and_sad_16x16_sse2): ; Because we're working with the actual output frames ; we can't depend on any kind of data alignment. 
-.accumulate +.accumulate: movdqa xmm0, [rax] ; src1 movdqa xmm1, [rdx] ; src2 add rax, rcx ; src1 + stride1 diff --git a/vp9/common/x86/vp9_postproc_sse2.asm b/vp9/common/x86/vp9_postproc_sse2.asm deleted file mode 100644 index 430762815..000000000 --- a/vp9/common/x86/vp9_postproc_sse2.asm +++ /dev/null @@ -1,632 +0,0 @@ -; -; Copyright (c) 2010 The WebM project authors. All Rights Reserved. -; -; Use of this source code is governed by a BSD-style license -; that can be found in the LICENSE file in the root of the source -; tree. An additional intellectual property rights grant can be found -; in the file PATENTS. All contributing project authors may -; be found in the AUTHORS file in the root of the source tree. -; - - -%include "vpx_ports/x86_abi_support.asm" - -;void vp9_post_proc_down_and_across_xmm -;( -; unsigned char *src_ptr, -; unsigned char *dst_ptr, -; int src_pixels_per_line, -; int dst_pixels_per_line, -; int rows, -; int cols, -; int flimit -;) -global sym(vp9_post_proc_down_and_across_xmm) PRIVATE -sym(vp9_post_proc_down_and_across_xmm): - push rbp - mov rbp, rsp - SHADOW_ARGS_TO_STACK 7 - SAVE_XMM 7 - GET_GOT rbx - push rsi - push rdi - ; end prolog - -%if ABI_IS_32BIT=1 && CONFIG_PIC=1 - ALIGN_STACK 16, rax - ; move the global rd onto the stack, since we don't have enough registers - ; to do PIC addressing - movdqa xmm0, [GLOBAL(rd42)] - sub rsp, 16 - movdqa [rsp], xmm0 -%define RD42 [rsp] -%else -%define RD42 [GLOBAL(rd42)] -%endif - - - movd xmm2, dword ptr arg(6) ;flimit - punpcklwd xmm2, xmm2 - punpckldq xmm2, xmm2 - punpcklqdq xmm2, xmm2 - - mov rsi, arg(0) ;src_ptr - mov rdi, arg(1) ;dst_ptr - - movsxd rcx, DWORD PTR arg(4) ;rows - movsxd rax, DWORD PTR arg(2) ;src_pixels_per_line ; destination pitch? - pxor xmm0, xmm0 ; mm0 = 00000000 - -.nextrow: - - xor rdx, rdx ; clear out rdx for use as loop counter -.nextcol: - movq xmm3, QWORD PTR [rsi] ; mm4 = r0 p0..p7 - punpcklbw xmm3, xmm0 ; mm3 = p0..p3 - movdqa xmm1, xmm3 ; mm1 = p0..p3 - psllw xmm3, 2 ; - - movq xmm5, QWORD PTR [rsi + rax] ; mm4 = r1 p0..p7 - punpcklbw xmm5, xmm0 ; mm5 = r1 p0..p3 - paddusw xmm3, xmm5 ; mm3 += mm6 - - ; thresholding - movdqa xmm7, xmm1 ; mm7 = r0 p0..p3 - psubusw xmm7, xmm5 ; mm7 = r0 p0..p3 - r1 p0..p3 - psubusw xmm5, xmm1 ; mm5 = r1 p0..p3 - r0 p0..p3 - paddusw xmm7, xmm5 ; mm7 = abs(r0 p0..p3 - r1 p0..p3) - pcmpgtw xmm7, xmm2 - - movq xmm5, QWORD PTR [rsi + 2*rax] ; mm4 = r2 p0..p7 - punpcklbw xmm5, xmm0 ; mm5 = r2 p0..p3 - paddusw xmm3, xmm5 ; mm3 += mm5 - - ; thresholding - movdqa xmm6, xmm1 ; mm6 = r0 p0..p3 - psubusw xmm6, xmm5 ; mm6 = r0 p0..p3 - r2 p0..p3 - psubusw xmm5, xmm1 ; mm5 = r2 p0..p3 - r2 p0..p3 - paddusw xmm6, xmm5 ; mm6 = abs(r0 p0..p3 - r2 p0..p3) - pcmpgtw xmm6, xmm2 - por xmm7, xmm6 ; accumulate thresholds - - - neg rax - movq xmm5, QWORD PTR [rsi+2*rax] ; mm4 = r-2 p0..p7 - punpcklbw xmm5, xmm0 ; mm5 = r-2 p0..p3 - paddusw xmm3, xmm5 ; mm3 += mm5 - - ; thresholding - movdqa xmm6, xmm1 ; mm6 = r0 p0..p3 - psubusw xmm6, xmm5 ; mm6 = p0..p3 - r-2 p0..p3 - psubusw xmm5, xmm1 ; mm5 = r-2 p0..p3 - p0..p3 - paddusw xmm6, xmm5 ; mm6 = abs(r0 p0..p3 - r-2 p0..p3) - pcmpgtw xmm6, xmm2 - por xmm7, xmm6 ; accumulate thresholds - - movq xmm4, QWORD PTR [rsi+rax] ; mm4 = r-1 p0..p7 - punpcklbw xmm4, xmm0 ; mm4 = r-1 p0..p3 - paddusw xmm3, xmm4 ; mm3 += mm5 - - ; thresholding - movdqa xmm6, xmm1 ; mm6 = r0 p0..p3 - psubusw xmm6, xmm4 ; mm6 = p0..p3 - r-2 p0..p3 - psubusw xmm4, xmm1 ; mm5 = r-1 p0..p3 - p0..p3 - paddusw xmm6, xmm4 ; mm6 = abs(r0 p0..p3 - r-1 p0..p3) - pcmpgtw 
xmm6, xmm2 - por xmm7, xmm6 ; accumulate thresholds - - - paddusw xmm3, RD42 ; mm3 += round value - psraw xmm3, 3 ; mm3 /= 8 - - pand xmm1, xmm7 ; mm1 select vals > thresh from source - pandn xmm7, xmm3 ; mm7 select vals < thresh from blurred result - paddusw xmm1, xmm7 ; combination - - packuswb xmm1, xmm0 ; pack to bytes - movq QWORD PTR [rdi], xmm1 ; - - neg rax ; pitch is positive - add rsi, 8 - add rdi, 8 - - add rdx, 8 - cmp edx, dword arg(5) ;cols - - jl .nextcol - - ; done with the all cols, start the across filtering in place - sub rsi, rdx - sub rdi, rdx - - xor rdx, rdx - movq mm0, QWORD PTR [rdi-8]; - -.acrossnextcol: - movq xmm7, QWORD PTR [rdi +rdx -2] - movd xmm4, DWORD PTR [rdi +rdx +6] - - pslldq xmm4, 8 - por xmm4, xmm7 - - movdqa xmm3, xmm4 - psrldq xmm3, 2 - punpcklbw xmm3, xmm0 ; mm3 = p0..p3 - movdqa xmm1, xmm3 ; mm1 = p0..p3 - psllw xmm3, 2 - - - movdqa xmm5, xmm4 - psrldq xmm5, 3 - punpcklbw xmm5, xmm0 ; mm5 = p1..p4 - paddusw xmm3, xmm5 ; mm3 += mm6 - - ; thresholding - movdqa xmm7, xmm1 ; mm7 = p0..p3 - psubusw xmm7, xmm5 ; mm7 = p0..p3 - p1..p4 - psubusw xmm5, xmm1 ; mm5 = p1..p4 - p0..p3 - paddusw xmm7, xmm5 ; mm7 = abs(p0..p3 - p1..p4) - pcmpgtw xmm7, xmm2 - - movdqa xmm5, xmm4 - psrldq xmm5, 4 - punpcklbw xmm5, xmm0 ; mm5 = p2..p5 - paddusw xmm3, xmm5 ; mm3 += mm5 - - ; thresholding - movdqa xmm6, xmm1 ; mm6 = p0..p3 - psubusw xmm6, xmm5 ; mm6 = p0..p3 - p1..p4 - psubusw xmm5, xmm1 ; mm5 = p1..p4 - p0..p3 - paddusw xmm6, xmm5 ; mm6 = abs(p0..p3 - p1..p4) - pcmpgtw xmm6, xmm2 - por xmm7, xmm6 ; accumulate thresholds - - - movdqa xmm5, xmm4 ; mm5 = p-2..p5 - punpcklbw xmm5, xmm0 ; mm5 = p-2..p1 - paddusw xmm3, xmm5 ; mm3 += mm5 - - ; thresholding - movdqa xmm6, xmm1 ; mm6 = p0..p3 - psubusw xmm6, xmm5 ; mm6 = p0..p3 - p1..p4 - psubusw xmm5, xmm1 ; mm5 = p1..p4 - p0..p3 - paddusw xmm6, xmm5 ; mm6 = abs(p0..p3 - p1..p4) - pcmpgtw xmm6, xmm2 - por xmm7, xmm6 ; accumulate thresholds - - psrldq xmm4, 1 ; mm4 = p-1..p5 - punpcklbw xmm4, xmm0 ; mm4 = p-1..p2 - paddusw xmm3, xmm4 ; mm3 += mm5 - - ; thresholding - movdqa xmm6, xmm1 ; mm6 = p0..p3 - psubusw xmm6, xmm4 ; mm6 = p0..p3 - p1..p4 - psubusw xmm4, xmm1 ; mm5 = p1..p4 - p0..p3 - paddusw xmm6, xmm4 ; mm6 = abs(p0..p3 - p1..p4) - pcmpgtw xmm6, xmm2 - por xmm7, xmm6 ; accumulate thresholds - - paddusw xmm3, RD42 ; mm3 += round value - psraw xmm3, 3 ; mm3 /= 8 - - pand xmm1, xmm7 ; mm1 select vals > thresh from source - pandn xmm7, xmm3 ; mm7 select vals < thresh from blurred result - paddusw xmm1, xmm7 ; combination - - packuswb xmm1, xmm0 ; pack to bytes - movq QWORD PTR [rdi+rdx-8], mm0 ; store previous four bytes - movdq2q mm0, xmm1 - - add rdx, 8 - cmp edx, dword arg(5) ;cols - jl .acrossnextcol; - - ; last 8 pixels - movq QWORD PTR [rdi+rdx-8], mm0 - - ; done with this rwo - add rsi,rax ; next line - mov eax, dword arg(3) ;dst_pixels_per_line ; destination pitch? - add rdi,rax ; next destination - mov eax, dword arg(2) ;src_pixels_per_line ; destination pitch? 
- - dec rcx ; decrement count - jnz .nextrow ; next row - -%if ABI_IS_32BIT=1 && CONFIG_PIC=1 - add rsp,16 - pop rsp -%endif - ; begin epilog - pop rdi - pop rsi - RESTORE_GOT - RESTORE_XMM - UNSHADOW_ARGS - pop rbp - ret -%undef RD42 - - -;void vp9_mbpost_proc_down_xmm(unsigned char *dst, -; int pitch, int rows, int cols,int flimit) -extern sym(vp9_rv) -global sym(vp9_mbpost_proc_down_xmm) PRIVATE -sym(vp9_mbpost_proc_down_xmm): - push rbp - mov rbp, rsp - SHADOW_ARGS_TO_STACK 5 - SAVE_XMM 7 - GET_GOT rbx - push rsi - push rdi - ; end prolog - - ALIGN_STACK 16, rax - sub rsp, 128+16 - - ; unsigned char d[16][8] at [rsp] - ; create flimit2 at [rsp+128] - mov eax, dword ptr arg(4) ;flimit - mov [rsp+128], eax - mov [rsp+128+4], eax - mov [rsp+128+8], eax - mov [rsp+128+12], eax -%define flimit4 [rsp+128] - -%if ABI_IS_32BIT=0 - lea r8, [GLOBAL(sym(vp9_rv))] -%endif - - ;rows +=8; - add dword arg(2), 8 - - ;for(c=0; cSource, cpi->Source, l); + if (!cpi->common.postproc_state.limits) { + cpi->common.postproc_state.limits = vpx_calloc( + cpi->common.width, sizeof(*cpi->common.postproc_state.limits)); + } + vp9_denoise(cpi->Source, cpi->Source, l, cpi->common.postproc_state.limits); } #endif // CONFIG_VP9_POSTPROC } @@ -4649,7 +4653,7 @@ int vp9_get_compressed_data(VP9_COMP *cpi, unsigned int *frame_flags, } vp9_deblock(cm->frame_to_show, pp, - cm->lf.filter_level * 10 / 6); + cm->lf.filter_level * 10 / 6, cm->postproc_state.limits); #endif vpx_clear_system_state(); diff --git a/vp9/vp9_common.mk b/vp9/vp9_common.mk index d0135c6f8..2dbf0f69f 100644 --- a/vp9/vp9_common.mk +++ b/vp9/vp9_common.mk @@ -67,7 +67,6 @@ VP9_COMMON_SRCS-$(CONFIG_VP9_POSTPROC) += common/vp9_mfqe.h VP9_COMMON_SRCS-$(CONFIG_VP9_POSTPROC) += common/vp9_mfqe.c ifeq ($(CONFIG_VP9_POSTPROC),yes) VP9_COMMON_SRCS-$(HAVE_SSE2) += common/x86/vp9_mfqe_sse2.asm -VP9_COMMON_SRCS-$(HAVE_SSE2) += common/x86/vp9_postproc_sse2.asm endif ifneq ($(CONFIG_VP9_HIGHBITDEPTH),yes) diff --git a/vp9/vp9_cx_iface.c b/vp9/vp9_cx_iface.c index 51c6fbb02..f4e989fb5 100644 --- a/vp9/vp9_cx_iface.c +++ b/vp9/vp9_cx_iface.c @@ -992,6 +992,7 @@ static vpx_codec_frame_flags_t get_frame_pkt_flags(const VP9_COMP *cpi, return flags; } +const size_t kMinCompressedSize = 8192; static vpx_codec_err_t encoder_encode(vpx_codec_alg_priv_t *ctx, const vpx_image_t *img, vpx_codec_pts_t pts, @@ -1013,8 +1014,8 @@ static vpx_codec_err_t encoder_encode(vpx_codec_alg_priv_t *ctx, // instance for its status to determine the compressed data size. data_sz = ctx->cfg.g_w * ctx->cfg.g_h * get_image_bps(img) / 8 * (cpi->multi_arf_allowed ? 
8 : 2); - if (data_sz < 4096) - data_sz = 4096; + if (data_sz < kMinCompressedSize) + data_sz = kMinCompressedSize; if (ctx->cx_data == NULL || ctx->cx_data_sz < data_sz) { ctx->cx_data_sz = data_sz; free(ctx->cx_data); diff --git a/vp9/vp9cx.mk b/vp9/vp9cx.mk index 5f3de8f8a..b8342b9e1 100644 --- a/vp9/vp9cx.mk +++ b/vp9/vp9cx.mk @@ -101,7 +101,6 @@ ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_highbd_block_error_intrin_sse2.c endif -ifeq ($(CONFIG_USE_X86INC),yes) VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_dct_sse2.asm ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_highbd_error_sse2.asm @@ -109,13 +108,10 @@ VP9_CX_SRCS-$(HAVE_AVX) += encoder/x86/vp9_highbd_error_avx.asm else VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_error_sse2.asm endif -endif ifeq ($(ARCH_X86_64),yes) -ifeq ($(CONFIG_USE_X86INC),yes) VP9_CX_SRCS-$(HAVE_SSSE3) += encoder/x86/vp9_quantize_ssse3_x86_64.asm endif -endif VP9_CX_SRCS-$(HAVE_SSE2) += encoder/x86/vp9_dct_intrin_sse2.c VP9_CX_SRCS-$(HAVE_SSSE3) += encoder/x86/vp9_dct_ssse3.c diff --git a/vpx/src/svc_encodeframe.c b/vpx/src/svc_encodeframe.c index 802860860..ef9b3528a 100644 --- a/vpx/src/svc_encodeframe.c +++ b/vpx/src/svc_encodeframe.c @@ -397,13 +397,6 @@ vpx_codec_err_t vpx_svc_init(SvcContext *svc_ctx, vpx_codec_ctx_t *codec_ctx, si->width = enc_cfg->g_w; si->height = enc_cfg->g_h; -// wonkap: why is this necessary? - /*if (enc_cfg->kf_max_dist < 2) { - svc_log(svc_ctx, SVC_LOG_ERROR, "key frame distance too small: %d\n", - enc_cfg->kf_max_dist); - return VPX_CODEC_INVALID_PARAM; - }*/ - si->kf_dist = enc_cfg->kf_max_dist; if (svc_ctx->spatial_layers == 0) diff --git a/vpx_dsp/add_noise.c b/vpx_dsp/add_noise.c index 682b44419..4ae67a813 100644 --- a/vpx_dsp/add_noise.c +++ b/vpx_dsp/add_noise.c @@ -8,6 +8,7 @@ * be found in the AUTHORS file in the root of the source tree. */ +#include #include #include "./vpx_config.h" @@ -23,11 +24,11 @@ void vpx_plane_add_noise_c(uint8_t *start, char *noise, unsigned int width, unsigned int height, int pitch) { unsigned int i, j; - for (i = 0; i < height; i++) { + for (i = 0; i < height; ++i) { uint8_t *pos = start + i * pitch; char *ref = (char *)(noise + (rand() & 0xff)); // NOLINT - for (j = 0; j < width; j++) { + for (j = 0; j < width; ++j) { int v = pos[j]; v = clamp(v - blackclamp[0], 0, 255); @@ -38,3 +39,36 @@ void vpx_plane_add_noise_c(uint8_t *start, char *noise, } } } + +static double gaussian(double sigma, double mu, double x) { + return 1 / (sigma * sqrt(2.0 * 3.14159265)) * + (exp(-(x - mu) * (x - mu) / (2 * sigma * sigma))); +} + +int vpx_setup_noise(double sigma, int size, char *noise) { + char char_dist[256]; + int next = 0, i, j; + + // set up a 256 entry lookup that matches gaussian distribution + for (i = -32; i < 32; ++i) { + const int a_i = (int) (0.5 + 256 * gaussian(sigma, 0, i)); + if (a_i) { + for (j = 0; j < a_i; ++j) { + char_dist[next + j] = (char)i; + } + next = next + j; + } + } + + // Rounding error - might mean we have less than 256. + for (; next < 256; ++next) { + char_dist[next] = 0; + } + + for (i = 0; i < size; ++i) { + noise[i] = char_dist[rand() & 0xff]; // NOLINT + } + + // Returns the highest non 0 value used in distribution. + return -char_dist[0]; +} diff --git a/vpx_dsp/deblock.c b/vpx_dsp/deblock.c new file mode 100644 index 000000000..2b1b7e30e --- /dev/null +++ b/vpx_dsp/deblock.c @@ -0,0 +1,203 @@ +/* + * Copyright (c) 2016 The WebM project authors. All Rights Reserved. 
+ * + * Use of this source code is governed by a BSD-style license + * that can be found in the LICENSE file in the root of the source + * tree. An additional intellectual property rights grant can be found + * in the file PATENTS. All contributing project authors may + * be found in the AUTHORS file in the root of the source tree. + */ +#include +#include "vpx/vpx_integer.h" + +const int16_t vpx_rv[] = {8, 5, 2, 2, 8, 12, 4, 9, 8, 3, 0, 3, 9, 0, 0, 0, 8, 3, + 14, 4, 10, 1, 11, 14, 1, 14, 9, 6, 12, 11, 8, 6, 10, 0, 0, 8, 9, 0, 3, 14, + 8, 11, 13, 4, 2, 9, 0, 3, 9, 6, 1, 2, 3, 14, 13, 1, 8, 2, 9, 7, 3, 3, 1, 13, + 13, 6, 6, 5, 2, 7, 11, 9, 11, 8, 7, 3, 2, 0, 13, 13, 14, 4, 12, 5, 12, 10, + 8, 10, 13, 10, 4, 14, 4, 10, 0, 8, 11, 1, 13, 7, 7, 14, 6, 14, 13, 2, 13, 5, + 4, 4, 0, 10, 0, 5, 13, 2, 12, 7, 11, 13, 8, 0, 4, 10, 7, 2, 7, 2, 2, 5, 3, + 4, 7, 3, 3, 14, 14, 5, 9, 13, 3, 14, 3, 6, 3, 0, 11, 8, 13, 1, 13, 1, 12, 0, + 10, 9, 7, 6, 2, 8, 5, 2, 13, 7, 1, 13, 14, 7, 6, 7, 9, 6, 10, 11, 7, 8, 7, + 5, 14, 8, 4, 4, 0, 8, 7, 10, 0, 8, 14, 11, 3, 12, 5, 7, 14, 3, 14, 5, 2, 6, + 11, 12, 12, 8, 0, 11, 13, 1, 2, 0, 5, 10, 14, 7, 8, 0, 4, 11, 0, 8, 0, 3, + 10, 5, 8, 0, 11, 6, 7, 8, 10, 7, 13, 9, 2, 5, 1, 5, 10, 2, 4, 3, 5, 6, 10, + 8, 9, 4, 11, 14, 0, 10, 0, 5, 13, 2, 12, 7, 11, 13, 8, 0, 4, 10, 7, 2, 7, 2, + 2, 5, 3, 4, 7, 3, 3, 14, 14, 5, 9, 13, 3, 14, 3, 6, 3, 0, 11, 8, 13, 1, 13, + 1, 12, 0, 10, 9, 7, 6, 2, 8, 5, 2, 13, 7, 1, 13, 14, 7, 6, 7, 9, 6, 10, 11, + 7, 8, 7, 5, 14, 8, 4, 4, 0, 8, 7, 10, 0, 8, 14, 11, 3, 12, 5, 7, 14, 3, 14, + 5, 2, 6, 11, 12, 12, 8, 0, 11, 13, 1, 2, 0, 5, 10, 14, 7, 8, 0, 4, 11, 0, 8, + 0, 3, 10, 5, 8, 0, 11, 6, 7, 8, 10, 7, 13, 9, 2, 5, 1, 5, 10, 2, 4, 3, 5, 6, + 10, 8, 9, 4, 11, 14, 3, 8, 3, 7, 8, 5, 11, 4, 12, 3, 11, 9, 14, 8, 14, 13, + 4, 3, 1, 2, 14, 6, 5, 4, 4, 11, 4, 6, 2, 1, 5, 8, 8, 12, 13, 5, 14, 10, 12, + 13, 0, 9, 5, 5, 11, 10, 13, 9, 10, 13, }; + +void vpx_post_proc_down_and_across_mb_row_c(unsigned char *src_ptr, + unsigned char *dst_ptr, + int src_pixels_per_line, + int dst_pixels_per_line, int cols, + unsigned char *f, int size) { + unsigned char *p_src, *p_dst; + int row; + int col; + unsigned char v; + unsigned char d[4]; + + for (row = 0; row < size; row++) { + /* post_proc_down for one row */ + p_src = src_ptr; + p_dst = dst_ptr; + + for (col = 0; col < cols; col++) { + unsigned char p_above2 = p_src[col - 2 * src_pixels_per_line]; + unsigned char p_above1 = p_src[col - src_pixels_per_line]; + unsigned char p_below1 = p_src[col + src_pixels_per_line]; + unsigned char p_below2 = p_src[col + 2 * src_pixels_per_line]; + + v = p_src[col]; + + if ((abs(v - p_above2) < f[col]) && (abs(v - p_above1) < f[col]) + && (abs(v - p_below1) < f[col]) && (abs(v - p_below2) < f[col])) { + unsigned char k1, k2, k3; + k1 = (p_above2 + p_above1 + 1) >> 1; + k2 = (p_below2 + p_below1 + 1) >> 1; + k3 = (k1 + k2 + 1) >> 1; + v = (k3 + v + 1) >> 1; + } + + p_dst[col] = v; + } + + /* now post_proc_across */ + p_src = dst_ptr; + p_dst = dst_ptr; + + p_src[-2] = p_src[-1] = p_src[0]; + p_src[cols] = p_src[cols + 1] = p_src[cols - 1]; + + for (col = 0; col < cols; col++) { + v = p_src[col]; + + if ((abs(v - p_src[col - 2]) < f[col]) + && (abs(v - p_src[col - 1]) < f[col]) + && (abs(v - p_src[col + 1]) < f[col]) + && (abs(v - p_src[col + 2]) < f[col])) { + unsigned char k1, k2, k3; + k1 = (p_src[col - 2] + p_src[col - 1] + 1) >> 1; + k2 = (p_src[col + 2] + p_src[col + 1] + 1) >> 1; + k3 = (k1 + k2 + 1) >> 1; + v = (k3 + v + 1) >> 1; + } + + d[col & 3] = v; + + if (col >= 2) + 
p_dst[col - 2] = d[(col - 2) & 3]; + } + + /* handle the last two pixels */ + p_dst[col - 2] = d[(col - 2) & 3]; + p_dst[col - 1] = d[(col - 1) & 3]; + + /* next row */ + src_ptr += src_pixels_per_line; + dst_ptr += dst_pixels_per_line; + } +} + +void vpx_mbpost_proc_across_ip_c(unsigned char *src, int pitch, int rows, + int cols, int flimit) { + int r, c, i; + + unsigned char *s = src; + unsigned char d[16]; + + for (r = 0; r < rows; r++) { + int sumsq = 0; + int sum = 0; + + for (i = -8; i < 0; i++) + s[i] = s[0]; + + /* 17 avoids valgrind warning - we buffer values in c in d + * and only write them when we've read 8 ahead... + */ + for (i = 0; i < 17; i++) + s[i + cols] = s[cols - 1]; + + for (i = -8; i <= 6; i++) { + sumsq += s[i] * s[i]; + sum += s[i]; + d[i + 8] = 0; + } + + for (c = 0; c < cols + 8; c++) { + int x = s[c + 7] - s[c - 8]; + int y = s[c + 7] + s[c - 8]; + + sum += x; + sumsq += x * y; + + d[c & 15] = s[c]; + + if (sumsq * 15 - sum * sum < flimit) { + d[c & 15] = (8 + sum + s[c]) >> 4; + } + + s[c - 8] = d[(c - 8) & 15]; + } + + s += pitch; + } +} + +void vpx_mbpost_proc_down_c(unsigned char *dst, int pitch, int rows, int cols, + int flimit) { + int r, c, i; + const int16_t *rv3 = &vpx_rv[63 & rand()]; + + for (c = 0; c < cols; c++) { + unsigned char *s = &dst[c]; + int sumsq = 0; + int sum = 0; + unsigned char d[16]; + const int16_t *rv2 = rv3 + ((c * 17) & 127); + + for (i = -8; i < 0; i++) + s[i * pitch] = s[0]; + + /* 17 avoids valgrind warning - we buffer values in c in d + * and only write them when we've read 8 ahead... + */ + for (i = 0; i < 17; i++) + s[(i + rows) * pitch] = s[(rows - 1) * pitch]; + + for (i = -8; i <= 6; i++) { + sumsq += s[i * pitch] * s[i * pitch]; + sum += s[i * pitch]; + } + + for (r = 0; r < rows + 8; r++) { + sumsq += s[7 * pitch] * s[7 * pitch] - s[-8 * pitch] * s[-8 * pitch]; + sum += s[7 * pitch] - s[-8 * pitch]; + d[r & 15] = s[0]; + + if (sumsq * 15 - sum * sum < flimit) { + d[r & 15] = (rv2[r & 127] + sum + s[0]) >> 4; + } + if (r >= 8) + s[-8 * pitch] = d[(r - 8) & 15]; + s += pitch; + } + } +} + +#if CONFIG_POSTPROC +static void vpx_de_mblock(YV12_BUFFER_CONFIG *post, + int q) { + vpx_mbpost_proc_across_ip(post->y_buffer, post->y_stride, post->y_height, + post->y_width, q2mbl(q)); + vpx_mbpost_proc_down(post->y_buffer, post->y_stride, post->y_height, + post->y_width, q2mbl(q)); +} + +#endif diff --git a/vpx_dsp/mips/deblock_msa.c b/vpx_dsp/mips/deblock_msa.c new file mode 100644 index 000000000..e98a0399b --- /dev/null +++ b/vpx_dsp/mips/deblock_msa.c @@ -0,0 +1,682 @@ +/* + * Copyright (c) 2016 The WebM project authors. All Rights Reserved. + * + * Use of this source code is governed by a BSD-style license + * that can be found in the LICENSE file in the root of the source + * tree. An additional intellectual property rights grant can be found + * in the file PATENTS. All contributing project authors may + * be found in the AUTHORS file in the root of the source tree. 
+ */ + +#include +#include "./macros_msa.h" + +extern const int16_t vpx_rv[]; + +#define VPX_TRANSPOSE8x16_UB_UB(in0, in1, in2, in3, in4, in5, in6, in7, \ + out0, out1, out2, out3, \ + out4, out5, out6, out7, \ + out8, out9, out10, out11, \ + out12, out13, out14, out15) \ +{ \ + v8i16 temp0, temp1, temp2, temp3, temp4; \ + v8i16 temp5, temp6, temp7, temp8, temp9; \ + \ + ILVR_B4_SH(in1, in0, in3, in2, in5, in4, in7, in6, \ + temp0, temp1, temp2, temp3); \ + ILVR_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \ + ILVRL_W2_SH(temp5, temp4, temp6, temp7); \ + ILVL_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \ + ILVRL_W2_SH(temp5, temp4, temp8, temp9); \ + ILVL_B4_SH(in1, in0, in3, in2, in5, in4, in7, in6, \ + temp0, temp1, temp2, temp3); \ + ILVR_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \ + ILVRL_W2_UB(temp5, temp4, out8, out10); \ + ILVL_H2_SH(temp1, temp0, temp3, temp2, temp4, temp5); \ + ILVRL_W2_UB(temp5, temp4, out12, out14); \ + out0 = (v16u8)temp6; \ + out2 = (v16u8)temp7; \ + out4 = (v16u8)temp8; \ + out6 = (v16u8)temp9; \ + out9 = (v16u8)__msa_ilvl_d((v2i64)out8, (v2i64)out8); \ + out11 = (v16u8)__msa_ilvl_d((v2i64)out10, (v2i64)out10); \ + out13 = (v16u8)__msa_ilvl_d((v2i64)out12, (v2i64)out12); \ + out15 = (v16u8)__msa_ilvl_d((v2i64)out14, (v2i64)out14); \ + out1 = (v16u8)__msa_ilvl_d((v2i64)out0, (v2i64)out0); \ + out3 = (v16u8)__msa_ilvl_d((v2i64)out2, (v2i64)out2); \ + out5 = (v16u8)__msa_ilvl_d((v2i64)out4, (v2i64)out4); \ + out7 = (v16u8)__msa_ilvl_d((v2i64)out6, (v2i64)out6); \ +} + +#define VPX_AVER_IF_RETAIN(above2_in, above1_in, src_in, \ + below1_in, below2_in, ref, out) \ +{ \ + v16u8 temp0, temp1; \ + \ + temp1 = __msa_aver_u_b(above2_in, above1_in); \ + temp0 = __msa_aver_u_b(below2_in, below1_in); \ + temp1 = __msa_aver_u_b(temp1, temp0); \ + out = __msa_aver_u_b(src_in, temp1); \ + temp0 = __msa_asub_u_b(src_in, above2_in); \ + temp1 = __msa_asub_u_b(src_in, above1_in); \ + temp0 = (temp0 < ref); \ + temp1 = (temp1 < ref); \ + temp0 = temp0 & temp1; \ + temp1 = __msa_asub_u_b(src_in, below1_in); \ + temp1 = (temp1 < ref); \ + temp0 = temp0 & temp1; \ + temp1 = __msa_asub_u_b(src_in, below2_in); \ + temp1 = (temp1 < ref); \ + temp0 = temp0 & temp1; \ + out = __msa_bmz_v(out, src_in, temp0); \ +} + +#define TRANSPOSE12x16_B(in0, in1, in2, in3, in4, in5, in6, in7, \ + in8, in9, in10, in11, in12, in13, in14, in15) \ +{ \ + v8i16 temp0, temp1, temp2, temp3, temp4; \ + v8i16 temp5, temp6, temp7, temp8, temp9; \ + \ + ILVR_B2_SH(in1, in0, in3, in2, temp0, temp1); \ + ILVRL_H2_SH(temp1, temp0, temp2, temp3); \ + ILVR_B2_SH(in5, in4, in7, in6, temp0, temp1); \ + ILVRL_H2_SH(temp1, temp0, temp4, temp5); \ + ILVRL_W2_SH(temp4, temp2, temp0, temp1); \ + ILVRL_W2_SH(temp5, temp3, temp2, temp3); \ + ILVR_B2_SH(in9, in8, in11, in10, temp4, temp5); \ + ILVR_B2_SH(in9, in8, in11, in10, temp4, temp5); \ + ILVRL_H2_SH(temp5, temp4, temp6, temp7); \ + ILVR_B2_SH(in13, in12, in15, in14, temp4, temp5); \ + ILVRL_H2_SH(temp5, temp4, temp8, temp9); \ + ILVRL_W2_SH(temp8, temp6, temp4, temp5); \ + ILVRL_W2_SH(temp9, temp7, temp6, temp7); \ + ILVL_B2_SH(in1, in0, in3, in2, temp8, temp9); \ + ILVR_D2_UB(temp4, temp0, temp5, temp1, in0, in2); \ + in1 = (v16u8)__msa_ilvl_d((v2i64)temp4, (v2i64)temp0); \ + in3 = (v16u8)__msa_ilvl_d((v2i64)temp5, (v2i64)temp1); \ + ILVL_B2_SH(in5, in4, in7, in6, temp0, temp1); \ + ILVR_D2_UB(temp6, temp2, temp7, temp3, in4, in6); \ + in5 = (v16u8)__msa_ilvl_d((v2i64)temp6, (v2i64)temp2); \ + in7 = (v16u8)__msa_ilvl_d((v2i64)temp7, (v2i64)temp3); \ + 
ILVL_B4_SH(in9, in8, in11, in10, in13, in12, in15, in14, \ + temp2, temp3, temp4, temp5); \ + ILVR_H4_SH(temp9, temp8, temp1, temp0, temp3, temp2, temp5, temp4, \ + temp6, temp7, temp8, temp9); \ + ILVR_W2_SH(temp7, temp6, temp9, temp8, temp0, temp1); \ + in8 = (v16u8)__msa_ilvr_d((v2i64)temp1, (v2i64)temp0); \ + in9 = (v16u8)__msa_ilvl_d((v2i64)temp1, (v2i64)temp0); \ + ILVL_W2_SH(temp7, temp6, temp9, temp8, temp2, temp3); \ + in10 = (v16u8)__msa_ilvr_d((v2i64)temp3, (v2i64)temp2); \ + in11 = (v16u8)__msa_ilvl_d((v2i64)temp3, (v2i64)temp2); \ +} + +#define VPX_TRANSPOSE12x8_UB_UB(in0, in1, in2, in3, in4, in5, \ + in6, in7, in8, in9, in10, in11) \ +{ \ + v8i16 temp0, temp1, temp2, temp3; \ + v8i16 temp4, temp5, temp6, temp7; \ + \ + ILVR_B2_SH(in1, in0, in3, in2, temp0, temp1); \ + ILVRL_H2_SH(temp1, temp0, temp2, temp3); \ + ILVR_B2_SH(in5, in4, in7, in6, temp0, temp1); \ + ILVRL_H2_SH(temp1, temp0, temp4, temp5); \ + ILVRL_W2_SH(temp4, temp2, temp0, temp1); \ + ILVRL_W2_SH(temp5, temp3, temp2, temp3); \ + ILVL_B2_SH(in1, in0, in3, in2, temp4, temp5); \ + temp4 = __msa_ilvr_h(temp5, temp4); \ + ILVL_B2_SH(in5, in4, in7, in6, temp6, temp7); \ + temp5 = __msa_ilvr_h(temp7, temp6); \ + ILVRL_W2_SH(temp5, temp4, temp6, temp7); \ + in0 = (v16u8)temp0; \ + in2 = (v16u8)temp1; \ + in4 = (v16u8)temp2; \ + in6 = (v16u8)temp3; \ + in8 = (v16u8)temp6; \ + in10 = (v16u8)temp7; \ + in1 = (v16u8)__msa_ilvl_d((v2i64)temp0, (v2i64)temp0); \ + in3 = (v16u8)__msa_ilvl_d((v2i64)temp1, (v2i64)temp1); \ + in5 = (v16u8)__msa_ilvl_d((v2i64)temp2, (v2i64)temp2); \ + in7 = (v16u8)__msa_ilvl_d((v2i64)temp3, (v2i64)temp3); \ + in9 = (v16u8)__msa_ilvl_d((v2i64)temp6, (v2i64)temp6); \ + in11 = (v16u8)__msa_ilvl_d((v2i64)temp7, (v2i64)temp7); \ +} + +static void postproc_down_across_chroma_msa(uint8_t *src_ptr, uint8_t *dst_ptr, + int32_t src_stride, + int32_t dst_stride, int32_t cols, + uint8_t *f) { + uint8_t *p_src = src_ptr; + uint8_t *p_dst = dst_ptr; + uint8_t *f_orig = f; + uint8_t *p_dst_st = dst_ptr; + uint16_t col; + uint64_t out0, out1, out2, out3; + v16u8 above2, above1, below2, below1, src, ref, ref_temp; + v16u8 inter0, inter1, inter2, inter3, inter4, inter5; + v16u8 inter6, inter7, inter8, inter9, inter10, inter11; + + for (col = (cols / 16); col--;) { + ref = LD_UB(f); + LD_UB2(p_src - 2 * src_stride, src_stride, above2, above1); + src = LD_UB(p_src); + LD_UB2(p_src + 1 * src_stride, src_stride, below1, below2); + VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter0); + above2 = LD_UB(p_src + 3 * src_stride); + VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter1); + above1 = LD_UB(p_src + 4 * src_stride); + VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter2); + src = LD_UB(p_src + 5 * src_stride); + VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter3); + below1 = LD_UB(p_src + 6 * src_stride); + VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter4); + below2 = LD_UB(p_src + 7 * src_stride); + VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter5); + above2 = LD_UB(p_src + 8 * src_stride); + VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter6); + above1 = LD_UB(p_src + 9 * src_stride); + VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter7); + ST_UB8(inter0, inter1, inter2, inter3, inter4, inter5, inter6, inter7, + p_dst, dst_stride); + + p_dst += 16; + p_src += 16; + f += 16; + } + + if (0 != (cols / 16)) { + ref = LD_UB(f); + LD_UB2(p_src - 2 * src_stride, src_stride, above2, 
above1); + src = LD_UB(p_src); + LD_UB2(p_src + 1 * src_stride, src_stride, below1, below2); + VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter0); + above2 = LD_UB(p_src + 3 * src_stride); + VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter1); + above1 = LD_UB(p_src + 4 * src_stride); + VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter2); + src = LD_UB(p_src + 5 * src_stride); + VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter3); + below1 = LD_UB(p_src + 6 * src_stride); + VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter4); + below2 = LD_UB(p_src + 7 * src_stride); + VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter5); + above2 = LD_UB(p_src + 8 * src_stride); + VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter6); + above1 = LD_UB(p_src + 9 * src_stride); + VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter7); + out0 = __msa_copy_u_d((v2i64) inter0, 0); + out1 = __msa_copy_u_d((v2i64) inter1, 0); + out2 = __msa_copy_u_d((v2i64) inter2, 0); + out3 = __msa_copy_u_d((v2i64) inter3, 0); + SD4(out0, out1, out2, out3, p_dst, dst_stride); + + out0 = __msa_copy_u_d((v2i64) inter4, 0); + out1 = __msa_copy_u_d((v2i64) inter5, 0); + out2 = __msa_copy_u_d((v2i64) inter6, 0); + out3 = __msa_copy_u_d((v2i64) inter7, 0); + SD4(out0, out1, out2, out3, p_dst + 4 * dst_stride, dst_stride); + } + + f = f_orig; + p_dst = dst_ptr - 2; + LD_UB8(p_dst, dst_stride, inter0, inter1, inter2, inter3, inter4, inter5, + inter6, inter7); + + for (col = 0; col < (cols / 8); ++col) { + ref = LD_UB(f); + f += 8; + VPX_TRANSPOSE12x8_UB_UB(inter0, inter1, inter2, inter3, inter4, inter5, + inter6, inter7, inter8, inter9, inter10, inter11); + if (0 == col) { + above2 = inter2; + above1 = inter2; + } else { + above2 = inter0; + above1 = inter1; + } + src = inter2; + below1 = inter3; + below2 = inter4; + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 0); + VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref_temp, inter2); + above2 = inter5; + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 1); + VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref_temp, inter3); + above1 = inter6; + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 2); + VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref_temp, inter4); + src = inter7; + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 3); + VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref_temp, inter5); + below1 = inter8; + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 4); + VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref_temp, inter6); + below2 = inter9; + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 5); + VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref_temp, inter7); + if (col == (cols / 8 - 1)) { + above2 = inter9; + } else { + above2 = inter10; + } + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 6); + VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref_temp, inter8); + if (col == (cols / 8 - 1)) { + above1 = inter9; + } else { + above1 = inter11; + } + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 7); + VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref_temp, inter9); + TRANSPOSE8x8_UB_UB(inter2, inter3, inter4, inter5, inter6, inter7, inter8, + inter9, inter2, inter3, inter4, inter5, inter6, inter7, + inter8, inter9); + p_dst += 8; + LD_UB2(p_dst, dst_stride, inter0, inter1); + ST8x1_UB(inter2, p_dst_st); + ST8x1_UB(inter3, (p_dst_st + 1 * dst_stride)); + LD_UB2(p_dst + 2 * 
dst_stride, dst_stride, inter2, inter3); + ST8x1_UB(inter4, (p_dst_st + 2 * dst_stride)); + ST8x1_UB(inter5, (p_dst_st + 3 * dst_stride)); + LD_UB2(p_dst + 4 * dst_stride, dst_stride, inter4, inter5); + ST8x1_UB(inter6, (p_dst_st + 4 * dst_stride)); + ST8x1_UB(inter7, (p_dst_st + 5 * dst_stride)); + LD_UB2(p_dst + 6 * dst_stride, dst_stride, inter6, inter7); + ST8x1_UB(inter8, (p_dst_st + 6 * dst_stride)); + ST8x1_UB(inter9, (p_dst_st + 7 * dst_stride)); + p_dst_st += 8; + } +} + +static void postproc_down_across_luma_msa(uint8_t *src_ptr, uint8_t *dst_ptr, + int32_t src_stride, + int32_t dst_stride, int32_t cols, + uint8_t *f) { + uint8_t *p_src = src_ptr; + uint8_t *p_dst = dst_ptr; + uint8_t *p_dst_st = dst_ptr; + uint8_t *f_orig = f; + uint16_t col; + v16u8 above2, above1, below2, below1; + v16u8 src, ref, ref_temp; + v16u8 inter0, inter1, inter2, inter3, inter4, inter5, inter6; + v16u8 inter7, inter8, inter9, inter10, inter11; + v16u8 inter12, inter13, inter14, inter15; + + for (col = (cols / 16); col--;) { + ref = LD_UB(f); + LD_UB2(p_src - 2 * src_stride, src_stride, above2, above1); + src = LD_UB(p_src); + LD_UB2(p_src + 1 * src_stride, src_stride, below1, below2); + VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter0); + above2 = LD_UB(p_src + 3 * src_stride); + VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter1); + above1 = LD_UB(p_src + 4 * src_stride); + VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter2); + src = LD_UB(p_src + 5 * src_stride); + VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter3); + below1 = LD_UB(p_src + 6 * src_stride); + VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter4); + below2 = LD_UB(p_src + 7 * src_stride); + VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter5); + above2 = LD_UB(p_src + 8 * src_stride); + VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter6); + above1 = LD_UB(p_src + 9 * src_stride); + VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter7); + src = LD_UB(p_src + 10 * src_stride); + VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter8); + below1 = LD_UB(p_src + 11 * src_stride); + VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter9); + below2 = LD_UB(p_src + 12 * src_stride); + VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter10); + above2 = LD_UB(p_src + 13 * src_stride); + VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref, inter11); + above1 = LD_UB(p_src + 14 * src_stride); + VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref, inter12); + src = LD_UB(p_src + 15 * src_stride); + VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref, inter13); + below1 = LD_UB(p_src + 16 * src_stride); + VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref, inter14); + below2 = LD_UB(p_src + 17 * src_stride); + VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref, inter15); + ST_UB8(inter0, inter1, inter2, inter3, inter4, inter5, inter6, inter7, + p_dst, dst_stride); + ST_UB8(inter8, inter9, inter10, inter11, inter12, inter13, inter14, inter15, + p_dst + 8 * dst_stride, dst_stride); + p_src += 16; + p_dst += 16; + f += 16; + } + + f = f_orig; + p_dst = dst_ptr - 2; + LD_UB8(p_dst, dst_stride, inter0, inter1, inter2, inter3, inter4, inter5, + inter6, inter7); + LD_UB8(p_dst + 8 * dst_stride, dst_stride, inter8, inter9, inter10, inter11, + inter12, inter13, inter14, inter15); + + for (col = 0; col < cols / 8; ++col) { + ref = LD_UB(f); + f 
+= 8; + TRANSPOSE12x16_B(inter0, inter1, inter2, inter3, inter4, inter5, inter6, + inter7, inter8, inter9, inter10, inter11, inter12, inter13, + inter14, inter15); + if (0 == col) { + above2 = inter2; + above1 = inter2; + } else { + above2 = inter0; + above1 = inter1; + } + + src = inter2; + below1 = inter3; + below2 = inter4; + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 0); + VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref_temp, inter2); + above2 = inter5; + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 1); + VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref_temp, inter3); + above1 = inter6; + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 2); + VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref_temp, inter4); + src = inter7; + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 3); + VPX_AVER_IF_RETAIN(below1, below2, above2, above1, src, ref_temp, inter5); + below1 = inter8; + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 4); + VPX_AVER_IF_RETAIN(below2, above2, above1, src, below1, ref_temp, inter6); + below2 = inter9; + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 5); + VPX_AVER_IF_RETAIN(above2, above1, src, below1, below2, ref_temp, inter7); + if (col == (cols / 8 - 1)) { + above2 = inter9; + } else { + above2 = inter10; + } + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 6); + VPX_AVER_IF_RETAIN(above1, src, below1, below2, above2, ref_temp, inter8); + if (col == (cols / 8 - 1)) { + above1 = inter9; + } else { + above1 = inter11; + } + ref_temp = (v16u8) __msa_splati_b((v16i8) ref, 7); + VPX_AVER_IF_RETAIN(src, below1, below2, above2, above1, ref_temp, inter9); + VPX_TRANSPOSE8x16_UB_UB(inter2, inter3, inter4, inter5, inter6, inter7, + inter8, inter9, inter2, inter3, inter4, inter5, + inter6, inter7, inter8, inter9, inter10, inter11, + inter12, inter13, inter14, inter15, above2, above1); + + p_dst += 8; + LD_UB2(p_dst, dst_stride, inter0, inter1); + ST8x1_UB(inter2, p_dst_st); + ST8x1_UB(inter3, (p_dst_st + 1 * dst_stride)); + LD_UB2(p_dst + 2 * dst_stride, dst_stride, inter2, inter3); + ST8x1_UB(inter4, (p_dst_st + 2 * dst_stride)); + ST8x1_UB(inter5, (p_dst_st + 3 * dst_stride)); + LD_UB2(p_dst + 4 * dst_stride, dst_stride, inter4, inter5); + ST8x1_UB(inter6, (p_dst_st + 4 * dst_stride)); + ST8x1_UB(inter7, (p_dst_st + 5 * dst_stride)); + LD_UB2(p_dst + 6 * dst_stride, dst_stride, inter6, inter7); + ST8x1_UB(inter8, (p_dst_st + 6 * dst_stride)); + ST8x1_UB(inter9, (p_dst_st + 7 * dst_stride)); + LD_UB2(p_dst + 8 * dst_stride, dst_stride, inter8, inter9); + ST8x1_UB(inter10, (p_dst_st + 8 * dst_stride)); + ST8x1_UB(inter11, (p_dst_st + 9 * dst_stride)); + LD_UB2(p_dst + 10 * dst_stride, dst_stride, inter10, inter11); + ST8x1_UB(inter12, (p_dst_st + 10 * dst_stride)); + ST8x1_UB(inter13, (p_dst_st + 11 * dst_stride)); + LD_UB2(p_dst + 12 * dst_stride, dst_stride, inter12, inter13); + ST8x1_UB(inter14, (p_dst_st + 12 * dst_stride)); + ST8x1_UB(inter15, (p_dst_st + 13 * dst_stride)); + LD_UB2(p_dst + 14 * dst_stride, dst_stride, inter14, inter15); + ST8x1_UB(above2, (p_dst_st + 14 * dst_stride)); + ST8x1_UB(above1, (p_dst_st + 15 * dst_stride)); + p_dst_st += 8; + } +} + +void vpx_post_proc_down_and_across_mb_row_msa(uint8_t *src, uint8_t *dst, + int32_t src_stride, + int32_t dst_stride, int32_t cols, + uint8_t *f, int32_t size) { + if (8 == size) { + postproc_down_across_chroma_msa(src, dst, src_stride, dst_stride, cols, f); + } else if (16 == size) { + postproc_down_across_luma_msa(src, dst, src_stride, dst_stride, cols, f); + } +} + +void 
vpx_mbpost_proc_across_ip_msa(uint8_t *src_ptr, int32_t pitch, + int32_t rows, int32_t cols, int32_t flimit) { + int32_t row, col, cnt; + uint8_t *src_dup = src_ptr; + v16u8 src0, src, tmp_orig; + v16u8 tmp = {0}; + v16i8 zero = {0}; + v8u16 sum_h, src_r_h, src_l_h; + v4u32 src_r_w, src_l_w; + v4i32 flimit_vec; + + flimit_vec = __msa_fill_w(flimit); + for (row = rows; row--;) { + int32_t sum_sq = 0; + int32_t sum = 0; + src0 = (v16u8) __msa_fill_b(src_dup[0]); + ST8x1_UB(src0, (src_dup - 8)); + + src0 = (v16u8) __msa_fill_b(src_dup[cols - 1]); + ST_UB(src0, src_dup + cols); + src_dup[cols + 16] = src_dup[cols - 1]; + tmp_orig = (v16u8) __msa_ldi_b(0); + tmp_orig[15] = tmp[15]; + src = LD_UB(src_dup - 8); + src[15] = 0; + ILVRL_B2_UH(zero, src, src_r_h, src_l_h); + src_r_w = __msa_dotp_u_w(src_r_h, src_r_h); + src_l_w = __msa_dotp_u_w(src_l_h, src_l_h); + sum_sq = HADD_SW_S32(src_r_w); + sum_sq += HADD_SW_S32(src_l_w); + sum_h = __msa_hadd_u_h(src, src); + sum = HADD_UH_U32(sum_h); + { + v16u8 src7, src8, src_r, src_l; + v16i8 mask; + v8u16 add_r, add_l; + v8i16 sub_r, sub_l, sum_r, sum_l, mask0, mask1; + v4i32 sum_sq0, sum_sq1, sum_sq2, sum_sq3; + v4i32 sub0, sub1, sub2, sub3; + v4i32 sum0_w, sum1_w, sum2_w, sum3_w; + v4i32 mul0, mul1, mul2, mul3; + v4i32 total0, total1, total2, total3; + v8i16 const8 = __msa_fill_h(8); + + src7 = LD_UB(src_dup + 7); + src8 = LD_UB(src_dup - 8); + for (col = 0; col < (cols >> 4); ++col) { + ILVRL_B2_UB(src7, src8, src_r, src_l); + HSUB_UB2_SH(src_r, src_l, sub_r, sub_l); + + sum_r[0] = sum + sub_r[0]; + for (cnt = 0; cnt < 7; ++cnt) { + sum_r[cnt + 1] = sum_r[cnt] + sub_r[cnt + 1]; + } + sum_l[0] = sum_r[7] + sub_l[0]; + for (cnt = 0; cnt < 7; ++cnt) { + sum_l[cnt + 1] = sum_l[cnt] + sub_l[cnt + 1]; + } + sum = sum_l[7]; + src = LD_UB(src_dup + 16 * col); + ILVRL_B2_UH(zero, src, src_r_h, src_l_h); + src7 = (v16u8)((const8 + sum_r + (v8i16) src_r_h) >> 4); + src8 = (v16u8)((const8 + sum_l + (v8i16) src_l_h) >> 4); + tmp = (v16u8) __msa_pckev_b((v16i8) src8, (v16i8) src7); + + HADD_UB2_UH(src_r, src_l, add_r, add_l); + UNPCK_SH_SW(sub_r, sub0, sub1); + UNPCK_SH_SW(sub_l, sub2, sub3); + ILVR_H2_SW(zero, add_r, zero, add_l, sum0_w, sum2_w); + ILVL_H2_SW(zero, add_r, zero, add_l, sum1_w, sum3_w); + MUL4(sum0_w, sub0, sum1_w, sub1, sum2_w, sub2, sum3_w, sub3, mul0, mul1, + mul2, mul3); + sum_sq0[0] = sum_sq + mul0[0]; + for (cnt = 0; cnt < 3; ++cnt) { + sum_sq0[cnt + 1] = sum_sq0[cnt] + mul0[cnt + 1]; + } + sum_sq1[0] = sum_sq0[3] + mul1[0]; + for (cnt = 0; cnt < 3; ++cnt) { + sum_sq1[cnt + 1] = sum_sq1[cnt] + mul1[cnt + 1]; + } + sum_sq2[0] = sum_sq1[3] + mul2[0]; + for (cnt = 0; cnt < 3; ++cnt) { + sum_sq2[cnt + 1] = sum_sq2[cnt] + mul2[cnt + 1]; + } + sum_sq3[0] = sum_sq2[3] + mul3[0]; + for (cnt = 0; cnt < 3; ++cnt) { + sum_sq3[cnt + 1] = sum_sq3[cnt] + mul3[cnt + 1]; + } + sum_sq = sum_sq3[3]; + + UNPCK_SH_SW(sum_r, sum0_w, sum1_w); + UNPCK_SH_SW(sum_l, sum2_w, sum3_w); + total0 = sum_sq0 * __msa_ldi_w(15); + total0 -= sum0_w * sum0_w; + total1 = sum_sq1 * __msa_ldi_w(15); + total1 -= sum1_w * sum1_w; + total2 = sum_sq2 * __msa_ldi_w(15); + total2 -= sum2_w * sum2_w; + total3 = sum_sq3 * __msa_ldi_w(15); + total3 -= sum3_w * sum3_w; + total0 = (total0 < flimit_vec); + total1 = (total1 < flimit_vec); + total2 = (total2 < flimit_vec); + total3 = (total3 < flimit_vec); + PCKEV_H2_SH(total1, total0, total3, total2, mask0, mask1); + mask = __msa_pckev_b((v16i8) mask1, (v16i8) mask0); + tmp = __msa_bmz_v(tmp, src, (v16u8) mask); + + if (col == 0) { + uint64_t 
src_d; + + src_d = __msa_copy_u_d((v2i64) tmp_orig, 1); + SD(src_d, (src_dup - 8)); + } + + src7 = LD_UB(src_dup + 16 * (col + 1) + 7); + src8 = LD_UB(src_dup + 16 * (col + 1) - 8); + ST_UB(tmp, (src_dup + (16 * col))); + } + + src_dup += pitch; + } + } +} + +void vpx_mbpost_proc_down_msa(uint8_t *dst_ptr, int32_t pitch, int32_t rows, + int32_t cols, int32_t flimit) { + int32_t row, col, cnt, i; + const int16_t *rv3 = &vpx_rv[63 & rand()]; + v4i32 flimit_vec; + v16u8 dst7, dst8, dst_r_b, dst_l_b; + v16i8 mask; + v8u16 add_r, add_l; + v8i16 dst_r_h, dst_l_h, sub_r, sub_l, mask0, mask1; + v4i32 sub0, sub1, sub2, sub3, total0, total1, total2, total3; + + flimit_vec = __msa_fill_w(flimit); + + for (col = 0; col < (cols >> 4); ++col) { + uint8_t *dst_tmp = &dst_ptr[col << 4]; + v16u8 dst; + v16i8 zero = {0}; + v16u8 tmp[16]; + v8i16 mult0, mult1, rv2_0, rv2_1; + v8i16 sum0_h = {0}; + v8i16 sum1_h = {0}; + v4i32 mul0 = {0}; + v4i32 mul1 = {0}; + v4i32 mul2 = {0}; + v4i32 mul3 = {0}; + v4i32 sum0_w, sum1_w, sum2_w, sum3_w; + v4i32 add0, add1, add2, add3; + const int16_t *rv2[16]; + + dst = LD_UB(dst_tmp); + for (cnt = (col << 4), i = 0; i < 16; ++cnt) { + rv2[i] = rv3 + ((cnt * 17) & 127); + ++i; + } + for (cnt = -8; cnt < 0; ++cnt) { + ST_UB(dst, dst_tmp + cnt * pitch); + } + + dst = LD_UB((dst_tmp + (rows - 1) * pitch)); + for (cnt = rows; cnt < rows + 17; ++cnt) { + ST_UB(dst, dst_tmp + cnt * pitch); + } + for (cnt = -8; cnt <= 6; ++cnt) { + dst = LD_UB(dst_tmp + (cnt * pitch)); + UNPCK_UB_SH(dst, dst_r_h, dst_l_h); + MUL2(dst_r_h, dst_r_h, dst_l_h, dst_l_h, mult0, mult1); + mul0 += (v4i32) __msa_ilvr_h((v8i16) zero, (v8i16) mult0); + mul1 += (v4i32) __msa_ilvl_h((v8i16) zero, (v8i16) mult0); + mul2 += (v4i32) __msa_ilvr_h((v8i16) zero, (v8i16) mult1); + mul3 += (v4i32) __msa_ilvl_h((v8i16) zero, (v8i16) mult1); + ADD2(sum0_h, dst_r_h, sum1_h, dst_l_h, sum0_h, sum1_h); + } + + for (row = 0; row < (rows + 8); ++row) { + for (i = 0; i < 8; ++i) { + rv2_0[i] = *(rv2[i] + (row & 127)); + rv2_1[i] = *(rv2[i + 8] + (row & 127)); + } + dst7 = LD_UB(dst_tmp + (7 * pitch)); + dst8 = LD_UB(dst_tmp - (8 * pitch)); + ILVRL_B2_UB(dst7, dst8, dst_r_b, dst_l_b); + + HSUB_UB2_SH(dst_r_b, dst_l_b, sub_r, sub_l); + UNPCK_SH_SW(sub_r, sub0, sub1); + UNPCK_SH_SW(sub_l, sub2, sub3); + sum0_h += sub_r; + sum1_h += sub_l; + + HADD_UB2_UH(dst_r_b, dst_l_b, add_r, add_l); + + ILVRL_H2_SW(zero, add_r, add0, add1); + ILVRL_H2_SW(zero, add_l, add2, add3); + mul0 += add0 * sub0; + mul1 += add1 * sub1; + mul2 += add2 * sub2; + mul3 += add3 * sub3; + dst = LD_UB(dst_tmp); + ILVRL_B2_SH(zero, dst, dst_r_h, dst_l_h); + dst7 = (v16u8)((rv2_0 + sum0_h + dst_r_h) >> 4); + dst8 = (v16u8)((rv2_1 + sum1_h + dst_l_h) >> 4); + tmp[row & 15] = (v16u8) __msa_pckev_b((v16i8) dst8, (v16i8) dst7); + + UNPCK_SH_SW(sum0_h, sum0_w, sum1_w); + UNPCK_SH_SW(sum1_h, sum2_w, sum3_w); + total0 = mul0 * __msa_ldi_w(15); + total0 -= sum0_w * sum0_w; + total1 = mul1 * __msa_ldi_w(15); + total1 -= sum1_w * sum1_w; + total2 = mul2 * __msa_ldi_w(15); + total2 -= sum2_w * sum2_w; + total3 = mul3 * __msa_ldi_w(15); + total3 -= sum3_w * sum3_w; + total0 = (total0 < flimit_vec); + total1 = (total1 < flimit_vec); + total2 = (total2 < flimit_vec); + total3 = (total3 < flimit_vec); + PCKEV_H2_SH(total1, total0, total3, total2, mask0, mask1); + mask = __msa_pckev_b((v16i8) mask1, (v16i8) mask0); + tmp[row & 15] = __msa_bmz_v(tmp[row & 15], dst, (v16u8) mask); + + if (row >= 8) { + ST_UB(tmp[(row - 8) & 15], (dst_tmp - 8 * pitch)); + } + + dst_tmp += pitch; + } 
+ } +} diff --git a/vpx_dsp/mips/macros_msa.h b/vpx_dsp/mips/macros_msa.h index 91e3615cf..ea59eafe9 100644 --- a/vpx_dsp/mips/macros_msa.h +++ b/vpx_dsp/mips/macros_msa.h @@ -1060,6 +1060,7 @@ ILVL_B2(RTYPE, in4, in5, in6, in7, out2, out3); \ } #define ILVL_B4_SB(...) ILVL_B4(v16i8, __VA_ARGS__) +#define ILVL_B4_SH(...) ILVL_B4(v8i16, __VA_ARGS__) #define ILVL_B4_UH(...) ILVL_B4(v8u16, __VA_ARGS__) /* Description : Interleave left half of halfword elements from vectors @@ -1074,6 +1075,7 @@ out1 = (RTYPE)__msa_ilvl_h((v8i16)in2, (v8i16)in3); \ } #define ILVL_H2_SH(...) ILVL_H2(v8i16, __VA_ARGS__) +#define ILVL_H2_SW(...) ILVL_H2(v4i32, __VA_ARGS__) /* Description : Interleave left half of word elements from vectors Arguments : Inputs - in0, in1, in2, in3 @@ -1137,6 +1139,7 @@ out1 = (RTYPE)__msa_ilvr_h((v8i16)in2, (v8i16)in3); \ } #define ILVR_H2_SH(...) ILVR_H2(v8i16, __VA_ARGS__) +#define ILVR_H2_SW(...) ILVR_H2(v4i32, __VA_ARGS__) #define ILVR_H4(RTYPE, in0, in1, in2, in3, in4, in5, in6, in7, \ out0, out1, out2, out3) { \ @@ -1215,6 +1218,7 @@ out0 = (RTYPE)__msa_ilvr_w((v4i32)in0, (v4i32)in1); \ out1 = (RTYPE)__msa_ilvl_w((v4i32)in0, (v4i32)in1); \ } +#define ILVRL_W2_UB(...) ILVRL_W2(v16u8, __VA_ARGS__) #define ILVRL_W2_SH(...) ILVRL_W2(v8i16, __VA_ARGS__) #define ILVRL_W2_SW(...) ILVRL_W2(v4i32, __VA_ARGS__) diff --git a/vpx_dsp/postproc.h b/vpx_dsp/postproc.h new file mode 100644 index 000000000..78d11b186 --- /dev/null +++ b/vpx_dsp/postproc.h @@ -0,0 +1,25 @@ +/* + * Copyright (c) 2016 The WebM project authors. All Rights Reserved. + * + * Use of this source code is governed by a BSD-style license + * that can be found in the LICENSE file in the root of the source + * tree. An additional intellectual property rights grant can be found + * in the file PATENTS. All contributing project authors may + * be found in the AUTHORS file in the root of the source tree. + */ + +#ifndef VPX_DSP_POSTPROC_H_ +#define VPX_DSP_POSTPROC_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +// Fills a noise buffer with gaussian noise strength determined by sigma. 
+int vpx_setup_noise(double sigma, int size, char *noise); + +#ifdef __cplusplus +} +#endif + +#endif // VPX_DSP_POSTPROC_H_ diff --git a/vpx_dsp/psnr.c b/vpx_dsp/psnr.c index 1655f116c..5bf786271 100644 --- a/vpx_dsp/psnr.c +++ b/vpx_dsp/psnr.c @@ -51,7 +51,7 @@ static void encoder_variance(const uint8_t *a, int a_stride, static void encoder_highbd_variance64(const uint8_t *a8, int a_stride, const uint8_t *b8, int b_stride, int w, int h, uint64_t *sse, - uint64_t *sum) { + int64_t *sum) { int i, j; uint16_t *a = CONVERT_TO_SHORTPTR(a8); @@ -75,7 +75,7 @@ static void encoder_highbd_8_variance(const uint8_t *a8, int a_stride, int w, int h, unsigned int *sse, int *sum) { uint64_t sse_long = 0; - uint64_t sum_long = 0; + int64_t sum_long = 0; encoder_highbd_variance64(a8, a_stride, b8, b_stride, w, h, &sse_long, &sum_long); *sse = (unsigned int)sse_long; diff --git a/vpx_dsp/psnrhvs.c b/vpx_dsp/psnrhvs.c index 095ba5d13..3708cc3c8 100644 --- a/vpx_dsp/psnrhvs.c +++ b/vpx_dsp/psnrhvs.c @@ -245,6 +245,8 @@ static double calc_psnrhvs(const unsigned char *src, int _systride, } } } + if (pixels <=0) + return 0; ret /= pixels; return ret; } diff --git a/vpx_dsp/vpx_dsp.mk b/vpx_dsp/vpx_dsp.mk index 06b46d321..43a78a878 100644 --- a/vpx_dsp/vpx_dsp.mk +++ b/vpx_dsp/vpx_dsp.mk @@ -42,24 +42,24 @@ endif # intra predictions DSP_SRCS-yes += intrapred.c -ifeq ($(CONFIG_USE_X86INC),yes) DSP_SRCS-$(HAVE_SSE) += x86/intrapred_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/intrapred_sse2.asm DSP_SRCS-$(HAVE_SSSE3) += x86/intrapred_ssse3.asm DSP_SRCS-$(HAVE_SSSE3) += x86/vpx_subpixel_8t_ssse3.asm -endif # CONFIG_USE_X86INC ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) -ifeq ($(CONFIG_USE_X86INC),yes) DSP_SRCS-$(HAVE_SSE) += x86/highbd_intrapred_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/highbd_intrapred_sse2.asm -endif # CONFIG_USE_X86INC endif # CONFIG_VP9_HIGHBITDEPTH ifneq ($(filter yes,$(CONFIG_POSTPROC) $(CONFIG_VP9_POSTPROC)),) DSP_SRCS-yes += add_noise.c +DSP_SRCS-yes += deblock.c +DSP_SRCS-yes += postproc.h DSP_SRCS-$(HAVE_MSA) += mips/add_noise_msa.c +DSP_SRCS-$(HAVE_MSA) += mips/deblock_msa.c DSP_SRCS-$(HAVE_SSE2) += x86/add_noise_sse2.asm +DSP_SRCS-$(HAVE_SSE2) += x86/deblock_sse2.asm endif # CONFIG_POSTPROC DSP_SRCS-$(HAVE_NEON_ASM) += arm/intrapred_neon_asm$(ASM) @@ -102,9 +102,8 @@ ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) DSP_SRCS-$(HAVE_SSE2) += x86/vpx_high_subpixel_8t_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/vpx_high_subpixel_bilinear_sse2.asm endif -ifeq ($(CONFIG_USE_X86INC),yes) + DSP_SRCS-$(HAVE_SSE2) += x86/vpx_convolve_copy_sse2.asm -endif ifeq ($(HAVE_NEON_ASM),yes) DSP_SRCS-yes += arm/vpx_convolve_copy_neon_asm$(ASM) @@ -194,10 +193,8 @@ DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_sse2.c DSP_SRCS-$(HAVE_SSE2) += x86/fwd_txfm_impl_sse2.h DSP_SRCS-$(HAVE_SSE2) += x86/fwd_dct32x32_impl_sse2.h ifeq ($(ARCH_X86_64),yes) -ifeq ($(CONFIG_USE_X86INC),yes) DSP_SRCS-$(HAVE_SSSE3) += x86/fwd_txfm_ssse3_x86_64.asm endif -endif DSP_SRCS-$(HAVE_AVX2) += x86/fwd_txfm_avx2.c DSP_SRCS-$(HAVE_AVX2) += x86/fwd_dct32x32_impl_avx2.h DSP_SRCS-$(HAVE_NEON) += arm/fwd_txfm_neon.c @@ -212,12 +209,10 @@ DSP_SRCS-yes += inv_txfm.h DSP_SRCS-yes += inv_txfm.c DSP_SRCS-$(HAVE_SSE2) += x86/inv_txfm_sse2.h DSP_SRCS-$(HAVE_SSE2) += x86/inv_txfm_sse2.c -ifeq ($(CONFIG_USE_X86INC),yes) DSP_SRCS-$(HAVE_SSE2) += x86/inv_wht_sse2.asm ifeq ($(ARCH_X86_64),yes) DSP_SRCS-$(HAVE_SSSE3) += x86/inv_txfm_ssse3_x86_64.asm endif # ARCH_X86_64 -endif # CONFIG_USE_X86INC ifeq ($(HAVE_NEON_ASM),yes) DSP_SRCS-yes += arm/save_reg_neon$(ASM) @@ -269,11 +264,9 @@ ifeq 
($(CONFIG_VP9_HIGHBITDEPTH),yes) DSP_SRCS-$(HAVE_SSE2) += x86/highbd_quantize_intrin_sse2.c endif ifeq ($(ARCH_X86_64),yes) -ifeq ($(CONFIG_USE_X86INC),yes) DSP_SRCS-$(HAVE_SSSE3) += x86/quantize_ssse3_x86_64.asm DSP_SRCS-$(HAVE_AVX) += x86/quantize_avx_x86_64.asm endif -endif # avg DSP_SRCS-yes += avg.c @@ -282,10 +275,8 @@ DSP_SRCS-$(HAVE_NEON) += arm/avg_neon.c DSP_SRCS-$(HAVE_MSA) += mips/avg_msa.c DSP_SRCS-$(HAVE_NEON) += arm/hadamard_neon.c ifeq ($(ARCH_X86_64),yes) -ifeq ($(CONFIG_USE_X86INC),yes) DSP_SRCS-$(HAVE_SSSE3) += x86/avg_ssse3_x86_64.asm endif -endif # high bit depth subtract ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) @@ -329,7 +320,6 @@ DSP_SRCS-$(HAVE_SSE4_1) += x86/obmc_variance_sse4.c endif #CONFIG_OBMC endif #CONFIG_VP10_ENCODER -ifeq ($(CONFIG_USE_X86INC),yes) DSP_SRCS-$(HAVE_SSE) += x86/sad4d_sse2.asm DSP_SRCS-$(HAVE_SSE) += x86/sad_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/sad4d_sse2.asm @@ -340,7 +330,7 @@ ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad4d_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/highbd_sad_sse2.asm endif # CONFIG_VP9_HIGHBITDEPTH -endif # CONFIG_USE_X86INC + endif # CONFIG_ENCODERS ifneq ($(filter yes,$(CONFIG_ENCODERS) $(CONFIG_POSTPROC) $(CONFIG_VP9_POSTPROC)),) @@ -370,18 +360,14 @@ ifeq ($(ARCH_X86_64),yes) DSP_SRCS-$(HAVE_SSE2) += x86/ssim_opt_x86_64.asm endif # ARCH_X86_64 -ifeq ($(CONFIG_USE_X86INC),yes) DSP_SRCS-$(HAVE_SSE) += x86/subpel_variance_sse2.asm DSP_SRCS-$(HAVE_SSE2) += x86/subpel_variance_sse2.asm # Contains SSE2 and SSSE3 -endif # CONFIG_USE_X86INC ifeq ($(CONFIG_VP9_HIGHBITDEPTH),yes) DSP_SRCS-$(HAVE_SSE2) += x86/highbd_variance_sse2.c DSP_SRCS-$(HAVE_SSE4_1) += x86/highbd_variance_sse4.c DSP_SRCS-$(HAVE_SSE2) += x86/highbd_variance_impl_sse2.asm -ifeq ($(CONFIG_USE_X86INC),yes) DSP_SRCS-$(HAVE_SSE2) += x86/highbd_subpel_variance_impl_sse2.asm -endif # CONFIG_USE_X86INC endif # CONFIG_VP9_HIGHBITDEPTH endif # CONFIG_ENCODERS || CONFIG_POSTPROC || CONFIG_VP9_POSTPROC diff --git a/vpx_dsp/vpx_dsp_rtcd_defs.pl b/vpx_dsp/vpx_dsp_rtcd_defs.pl index a04a6849d..a210b793b 100644 --- a/vpx_dsp/vpx_dsp_rtcd_defs.pl +++ b/vpx_dsp/vpx_dsp_rtcd_defs.pl @@ -11,29 +11,6 @@ EOF } forward_decls qw/vpx_dsp_forward_decls/; -# x86inc.asm had specific constraints. break it out so it's easy to disable. -# zero all the variables to avoid tricky else conditions. 
-$mmx_x86inc = $sse_x86inc = $sse2_x86inc = $ssse3_x86inc = $avx_x86inc = - $avx2_x86inc = ''; -$mmx_x86_64_x86inc = $sse_x86_64_x86inc = $sse2_x86_64_x86inc = - $ssse3_x86_64_x86inc = $avx_x86_64_x86inc = $avx2_x86_64_x86inc = ''; -if (vpx_config("CONFIG_USE_X86INC") eq "yes") { - $mmx_x86inc = 'mmx'; - $sse_x86inc = 'sse'; - $sse2_x86inc = 'sse2'; - $ssse3_x86inc = 'ssse3'; - $avx_x86inc = 'avx'; - $avx2_x86inc = 'avx2'; - if ($opts{arch} eq "x86_64") { - $mmx_x86_64_x86inc = 'mmx'; - $sse_x86_64_x86inc = 'sse'; - $sse2_x86_64_x86inc = 'sse2'; - $ssse3_x86_64_x86inc = 'ssse3'; - $avx_x86_64_x86inc = 'avx'; - $avx2_x86_64_x86inc = 'avx2'; - } -} - # optimizations which depend on multiple features $avx2_ssse3 = ''; if ((vpx_config("HAVE_AVX2") eq "yes") && (vpx_config("HAVE_SSSE3") eq "yes")) { @@ -68,19 +45,19 @@ foreach $w (@block_widths) { # add_proto qw/void vpx_d207_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d207_predictor_4x4/, "$sse2_x86inc"; +specialize qw/vpx_d207_predictor_4x4 sse2/; add_proto qw/void vpx_d207e_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d207e_predictor_4x4/; add_proto qw/void vpx_d45_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d45_predictor_4x4 neon/, "$sse2_x86inc"; +specialize qw/vpx_d45_predictor_4x4 neon sse2/; add_proto qw/void vpx_d45e_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d45e_predictor_4x4/; add_proto qw/void vpx_d63_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d63_predictor_4x4/, "$ssse3_x86inc"; +specialize qw/vpx_d63_predictor_4x4 ssse3/; add_proto qw/void vpx_d63e_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d63e_predictor_4x4/; @@ -89,7 +66,7 @@ add_proto qw/void vpx_d63f_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, co specialize qw/vpx_d63f_predictor_4x4/; add_proto qw/void vpx_h_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_h_predictor_4x4 neon dspr2 msa/, "$sse2_x86inc"; +specialize qw/vpx_h_predictor_4x4 neon dspr2 msa sse2/; add_proto qw/void vpx_he_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_he_predictor_4x4/; @@ -101,49 +78,49 @@ add_proto qw/void vpx_d135_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, co specialize qw/vpx_d135_predictor_4x4 neon/; add_proto qw/void vpx_d153_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d153_predictor_4x4/, "$ssse3_x86inc"; +specialize qw/vpx_d153_predictor_4x4 ssse3/; add_proto qw/void vpx_v_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_v_predictor_4x4 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_v_predictor_4x4 neon msa sse2/; add_proto qw/void vpx_ve_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_ve_predictor_4x4/; add_proto qw/void vpx_tm_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_tm_predictor_4x4 neon dspr2 msa/, "$sse2_x86inc"; +specialize qw/vpx_tm_predictor_4x4 neon dspr2 
msa sse2/; add_proto qw/void vpx_dc_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_predictor_4x4 dspr2 msa neon/, "$sse2_x86inc"; +specialize qw/vpx_dc_predictor_4x4 dspr2 msa neon sse2/; add_proto qw/void vpx_dc_top_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_top_predictor_4x4 msa neon/, "$sse2_x86inc"; +specialize qw/vpx_dc_top_predictor_4x4 msa neon sse2/; add_proto qw/void vpx_dc_left_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_left_predictor_4x4 msa neon/, "$sse2_x86inc"; +specialize qw/vpx_dc_left_predictor_4x4 msa neon sse2/; add_proto qw/void vpx_dc_128_predictor_4x4/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_128_predictor_4x4 msa neon/, "$sse2_x86inc"; +specialize qw/vpx_dc_128_predictor_4x4 msa neon sse2/; add_proto qw/void vpx_d207_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d207_predictor_8x8/, "$ssse3_x86inc"; +specialize qw/vpx_d207_predictor_8x8 ssse3/; add_proto qw/void vpx_d207e_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d207e_predictor_8x8/; add_proto qw/void vpx_d45_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d45_predictor_8x8 neon/, "$sse2_x86inc"; +specialize qw/vpx_d45_predictor_8x8 neon sse2/; add_proto qw/void vpx_d45e_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d45e_predictor_8x8/; add_proto qw/void vpx_d63_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d63_predictor_8x8/, "$ssse3_x86inc"; +specialize qw/vpx_d63_predictor_8x8 ssse3/; add_proto qw/void vpx_d63e_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d63e_predictor_8x8/; add_proto qw/void vpx_h_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_h_predictor_8x8 neon dspr2 msa/, "$sse2_x86inc"; +specialize qw/vpx_h_predictor_8x8 neon dspr2 msa sse2/; add_proto qw/void vpx_d117_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d117_predictor_8x8/; @@ -152,46 +129,46 @@ add_proto qw/void vpx_d135_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, co specialize qw/vpx_d135_predictor_8x8/; add_proto qw/void vpx_d153_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d153_predictor_8x8/, "$ssse3_x86inc"; +specialize qw/vpx_d153_predictor_8x8 ssse3/; add_proto qw/void vpx_v_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_v_predictor_8x8 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_v_predictor_8x8 neon msa sse2/; add_proto qw/void vpx_tm_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_tm_predictor_8x8 neon dspr2 msa/, "$sse2_x86inc"; +specialize qw/vpx_tm_predictor_8x8 neon dspr2 msa sse2/; add_proto qw/void vpx_dc_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize 
qw/vpx_dc_predictor_8x8 dspr2 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_dc_predictor_8x8 dspr2 neon msa sse2/; add_proto qw/void vpx_dc_top_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_top_predictor_8x8 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_dc_top_predictor_8x8 neon msa sse2/; add_proto qw/void vpx_dc_left_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_left_predictor_8x8 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_dc_left_predictor_8x8 neon msa sse2/; add_proto qw/void vpx_dc_128_predictor_8x8/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_128_predictor_8x8 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_dc_128_predictor_8x8 neon msa sse2/; add_proto qw/void vpx_d207_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d207_predictor_16x16/, "$ssse3_x86inc"; +specialize qw/vpx_d207_predictor_16x16 ssse3/; add_proto qw/void vpx_d207e_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d207e_predictor_16x16/; add_proto qw/void vpx_d45_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d45_predictor_16x16 neon/, "$ssse3_x86inc"; +specialize qw/vpx_d45_predictor_16x16 neon ssse3/; add_proto qw/void vpx_d45e_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d45e_predictor_16x16/; add_proto qw/void vpx_d63_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d63_predictor_16x16/, "$ssse3_x86inc"; +specialize qw/vpx_d63_predictor_16x16 ssse3/; add_proto qw/void vpx_d63e_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d63e_predictor_16x16/; add_proto qw/void vpx_h_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_h_predictor_16x16 neon dspr2 msa/, "$sse2_x86inc"; +specialize qw/vpx_h_predictor_16x16 neon dspr2 msa sse2/; add_proto qw/void vpx_d117_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d117_predictor_16x16/; @@ -200,46 +177,46 @@ add_proto qw/void vpx_d135_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, specialize qw/vpx_d135_predictor_16x16/; add_proto qw/void vpx_d153_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d153_predictor_16x16/, "$ssse3_x86inc"; +specialize qw/vpx_d153_predictor_16x16 ssse3/; add_proto qw/void vpx_v_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_v_predictor_16x16 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_v_predictor_16x16 neon msa sse2/; add_proto qw/void vpx_tm_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_tm_predictor_16x16 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_tm_predictor_16x16 neon msa sse2/; add_proto qw/void vpx_dc_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_predictor_16x16 dspr2 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_dc_predictor_16x16 
dspr2 neon msa sse2/; add_proto qw/void vpx_dc_top_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_top_predictor_16x16 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_dc_top_predictor_16x16 neon msa sse2/; add_proto qw/void vpx_dc_left_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_left_predictor_16x16 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_dc_left_predictor_16x16 neon msa sse2/; add_proto qw/void vpx_dc_128_predictor_16x16/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_128_predictor_16x16 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_dc_128_predictor_16x16 neon msa sse2/; add_proto qw/void vpx_d207_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d207_predictor_32x32/, "$ssse3_x86inc"; +specialize qw/vpx_d207_predictor_32x32 ssse3/; add_proto qw/void vpx_d207e_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d207e_predictor_32x32/; add_proto qw/void vpx_d45_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d45_predictor_32x32/, "$ssse3_x86inc"; +specialize qw/vpx_d45_predictor_32x32 ssse3/; add_proto qw/void vpx_d45e_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d45e_predictor_32x32/; add_proto qw/void vpx_d63_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d63_predictor_32x32/, "$ssse3_x86inc"; +specialize qw/vpx_d63_predictor_32x32 ssse3/; add_proto qw/void vpx_d63e_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d63e_predictor_32x32/; add_proto qw/void vpx_h_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_h_predictor_32x32 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_h_predictor_32x32 neon msa sse2/; add_proto qw/void vpx_d117_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; specialize qw/vpx_d117_predictor_32x32/; @@ -248,25 +225,25 @@ add_proto qw/void vpx_d135_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, specialize qw/vpx_d135_predictor_32x32/; add_proto qw/void vpx_d153_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_d153_predictor_32x32/, "$ssse3_x86inc"; +specialize qw/vpx_d153_predictor_32x32 ssse3/; add_proto qw/void vpx_v_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_v_predictor_32x32 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_v_predictor_32x32 neon msa sse2/; add_proto qw/void vpx_tm_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_tm_predictor_32x32 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_tm_predictor_32x32 neon msa sse2/; add_proto qw/void vpx_dc_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_predictor_32x32 msa neon/, "$sse2_x86inc"; +specialize qw/vpx_dc_predictor_32x32 msa neon sse2/; add_proto qw/void vpx_dc_top_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const 
uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_top_predictor_32x32 msa neon/, "$sse2_x86inc"; +specialize qw/vpx_dc_top_predictor_32x32 msa neon sse2/; add_proto qw/void vpx_dc_left_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_left_predictor_32x32 msa neon/, "$sse2_x86inc"; +specialize qw/vpx_dc_left_predictor_32x32 msa neon sse2/; add_proto qw/void vpx_dc_128_predictor_32x32/, "uint8_t *dst, ptrdiff_t y_stride, const uint8_t *above, const uint8_t *left"; -specialize qw/vpx_dc_128_predictor_32x32 msa neon/, "$sse2_x86inc"; +specialize qw/vpx_dc_128_predictor_32x32 msa neon sse2/; # High bitdepth functions if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { @@ -301,13 +278,13 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vpx_highbd_d153_predictor_4x4/; add_proto qw/void vpx_highbd_v_predictor_4x4/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; - specialize qw/vpx_highbd_v_predictor_4x4/, "$sse2_x86inc"; + specialize qw/vpx_highbd_v_predictor_4x4 sse2/; add_proto qw/void vpx_highbd_tm_predictor_4x4/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; - specialize qw/vpx_highbd_tm_predictor_4x4/, "$sse2_x86inc"; + specialize qw/vpx_highbd_tm_predictor_4x4 sse2/; add_proto qw/void vpx_highbd_dc_predictor_4x4/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; - specialize qw/vpx_highbd_dc_predictor_4x4/, "$sse2_x86inc"; + specialize qw/vpx_highbd_dc_predictor_4x4 sse2/; add_proto qw/void vpx_highbd_dc_top_predictor_4x4/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; specialize qw/vpx_highbd_dc_top_predictor_4x4/; @@ -349,13 +326,13 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vpx_highbd_d153_predictor_8x8/; add_proto qw/void vpx_highbd_v_predictor_8x8/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; - specialize qw/vpx_highbd_v_predictor_8x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_v_predictor_8x8 sse2/; add_proto qw/void vpx_highbd_tm_predictor_8x8/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; - specialize qw/vpx_highbd_tm_predictor_8x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_tm_predictor_8x8 sse2/; add_proto qw/void vpx_highbd_dc_predictor_8x8/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; - specialize qw/vpx_highbd_dc_predictor_8x8/, "$sse2_x86inc";; + specialize qw/vpx_highbd_dc_predictor_8x8 sse2/;; add_proto qw/void vpx_highbd_dc_top_predictor_8x8/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; specialize qw/vpx_highbd_dc_top_predictor_8x8/; @@ -397,13 +374,13 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vpx_highbd_d153_predictor_16x16/; add_proto qw/void vpx_highbd_v_predictor_16x16/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; - specialize qw/vpx_highbd_v_predictor_16x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_v_predictor_16x16 sse2/; add_proto qw/void vpx_highbd_tm_predictor_16x16/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; - specialize qw/vpx_highbd_tm_predictor_16x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_tm_predictor_16x16 sse2/; add_proto qw/void 
vpx_highbd_dc_predictor_16x16/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; - specialize qw/vpx_highbd_dc_predictor_16x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_dc_predictor_16x16 sse2/; add_proto qw/void vpx_highbd_dc_top_predictor_16x16/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; specialize qw/vpx_highbd_dc_top_predictor_16x16/; @@ -445,13 +422,13 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vpx_highbd_d153_predictor_32x32/; add_proto qw/void vpx_highbd_v_predictor_32x32/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; - specialize qw/vpx_highbd_v_predictor_32x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_v_predictor_32x32 sse2/; add_proto qw/void vpx_highbd_tm_predictor_32x32/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; - specialize qw/vpx_highbd_tm_predictor_32x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_tm_predictor_32x32 sse2/; add_proto qw/void vpx_highbd_dc_predictor_32x32/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; - specialize qw/vpx_highbd_dc_predictor_32x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_dc_predictor_32x32 sse2/; add_proto qw/void vpx_highbd_dc_top_predictor_32x32/, "uint16_t *dst, ptrdiff_t y_stride, const uint16_t *above, const uint16_t *left, int bd"; specialize qw/vpx_highbd_dc_top_predictor_32x32/; @@ -481,8 +458,8 @@ add_proto qw/void vpx_scaled_avg_2d/, "const uint8_t *src, ptrdiff_t src_s add_proto qw/void vpx_scaled_avg_horiz/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h"; add_proto qw/void vpx_scaled_avg_vert/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h"; -specialize qw/vpx_convolve_copy /, "$sse2_x86inc"; -specialize qw/vpx_convolve_avg /, "$sse2_x86inc"; +specialize qw/vpx_convolve_copy sse2 /; +specialize qw/vpx_convolve_avg sse2 /; specialize qw/vpx_convolve8 sse2 ssse3/, "$avx2_ssse3"; specialize qw/vpx_convolve8_horiz sse2 ssse3/, "$avx2_ssse3"; specialize qw/vpx_convolve8_vert sse2 ssse3/, "$avx2_ssse3"; @@ -505,10 +482,10 @@ if (!(vpx_config("CONFIG_VP10") eq "yes" && vpx_config("CONFIG_EXT_PARTITION") e if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { add_proto qw/void vpx_highbd_convolve_copy/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps"; - specialize qw/vpx_highbd_convolve_copy/, "$sse2_x86inc"; + specialize qw/vpx_highbd_convolve_copy sse2/; add_proto qw/void vpx_highbd_convolve_avg/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps"; - specialize qw/vpx_highbd_convolve_avg/, "$sse2_x86inc"; + specialize qw/vpx_highbd_convolve_avg sse2/; add_proto qw/void vpx_highbd_convolve8/, "const uint8_t *src, ptrdiff_t src_stride, uint8_t *dst, ptrdiff_t dst_stride, const int16_t *filter_x, int x_step_q4, const int16_t *filter_y, int y_step_q4, int w, int h, int bps"; specialize qw/vpx_highbd_convolve8/, "$sse2_x86_64"; @@ -679,7 +656,7 @@ if 
(vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vpx_fdct4x4_1 sse2/; add_proto qw/void vpx_fdct8x8/, "const int16_t *input, tran_low_t *output, int stride"; - specialize qw/vpx_fdct8x8 sse2 neon msa/, "$ssse3_x86_64_x86inc"; + specialize qw/vpx_fdct8x8 sse2 neon msa/, "$ssse3_x86_64"; add_proto qw/void vpx_fdct8x8_1/, "const int16_t *input, tran_low_t *output, int stride"; specialize qw/vpx_fdct8x8_1 sse2 neon msa/; @@ -711,7 +688,7 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vpx_iwht4x4_1_add/; add_proto qw/void vpx_iwht4x4_16_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; - specialize qw/vpx_iwht4x4_16_add/, "$sse2_x86inc"; + specialize qw/vpx_iwht4x4_16_add sse2/; add_proto qw/void vpx_highbd_idct4x4_1_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride, int bd"; specialize qw/vpx_highbd_idct4x4_1_add/; @@ -797,10 +774,10 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vpx_idct4x4_1_add sse2/; add_proto qw/void vpx_idct8x8_64_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; - specialize qw/vpx_idct8x8_64_add sse2/, "$ssse3_x86_64_x86inc"; + specialize qw/vpx_idct8x8_64_add sse2/, "$ssse3_x86_64"; add_proto qw/void vpx_idct8x8_12_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; - specialize qw/vpx_idct8x8_12_add sse2/, "$ssse3_x86_64_x86inc"; + specialize qw/vpx_idct8x8_12_add sse2/, "$ssse3_x86_64"; add_proto qw/void vpx_idct8x8_1_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; specialize qw/vpx_idct8x8_1_add sse2/; @@ -815,15 +792,15 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vpx_idct16x16_1_add sse2/; add_proto qw/void vpx_idct32x32_1024_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; - specialize qw/vpx_idct32x32_1024_add sse2/, "$ssse3_x86_64_x86inc"; + specialize qw/vpx_idct32x32_1024_add sse2/, "$ssse3_x86_64"; add_proto qw/void vpx_idct32x32_135_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; - specialize qw/vpx_idct32x32_135_add sse2/, "$ssse3_x86_64_x86inc"; + specialize qw/vpx_idct32x32_135_add sse2/, "$ssse3_x86_64"; # Need to add 135 eob idct32x32 implementations. 
$vpx_idct32x32_135_add_sse2=vpx_idct32x32_1024_add_sse2; add_proto qw/void vpx_idct32x32_34_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; - specialize qw/vpx_idct32x32_34_add sse2/, "$ssse3_x86_64_x86inc"; + specialize qw/vpx_idct32x32_34_add sse2/, "$ssse3_x86_64"; add_proto qw/void vpx_idct32x32_1_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; specialize qw/vpx_idct32x32_1_add sse2/; @@ -898,10 +875,10 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vpx_idct8x8_1_add sse2 neon dspr2 msa/; add_proto qw/void vpx_idct8x8_64_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; - specialize qw/vpx_idct8x8_64_add sse2 neon dspr2 msa/, "$ssse3_x86_64_x86inc"; + specialize qw/vpx_idct8x8_64_add sse2 neon dspr2 msa/, "$ssse3_x86_64"; add_proto qw/void vpx_idct8x8_12_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; - specialize qw/vpx_idct8x8_12_add sse2 neon dspr2 msa/, "$ssse3_x86_64_x86inc"; + specialize qw/vpx_idct8x8_12_add sse2 neon dspr2 msa/, "$ssse3_x86_64"; add_proto qw/void vpx_idct16x16_1_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; specialize qw/vpx_idct16x16_1_add sse2 neon dspr2 msa/; @@ -913,10 +890,10 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vpx_idct16x16_10_add sse2 neon dspr2 msa/; add_proto qw/void vpx_idct32x32_1024_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; - specialize qw/vpx_idct32x32_1024_add sse2 neon dspr2 msa/, "$ssse3_x86_64_x86inc"; + specialize qw/vpx_idct32x32_1024_add sse2 neon dspr2 msa/, "$ssse3_x86_64"; add_proto qw/void vpx_idct32x32_135_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; - specialize qw/vpx_idct32x32_135_add sse2 neon dspr2 msa/, "$ssse3_x86_64_x86inc"; + specialize qw/vpx_idct32x32_135_add sse2 neon dspr2 msa/, "$ssse3_x86_64"; # Need to add 135 eob idct32x32 implementations. $vpx_idct32x32_135_add_sse2=vpx_idct32x32_1024_add_sse2; $vpx_idct32x32_135_add_neon=vpx_idct32x32_1024_add_neon; @@ -924,7 +901,7 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { $vpx_idct32x32_135_add_msa=vpx_idct32x32_1024_add_msa; add_proto qw/void vpx_idct32x32_34_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; - specialize qw/vpx_idct32x32_34_add sse2 neon dspr2 msa/, "$ssse3_x86_64_x86inc"; + specialize qw/vpx_idct32x32_34_add sse2 neon dspr2 msa/, "$ssse3_x86_64"; # Need to add 34 eob idct32x32 neon implementation. 
$vpx_idct32x32_34_add_neon=vpx_idct32x32_1024_add_neon; @@ -935,7 +912,7 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize qw/vpx_iwht4x4_1_add msa/; add_proto qw/void vpx_iwht4x4_16_add/, "const tran_low_t *input, uint8_t *dest, int dest_stride"; - specialize qw/vpx_iwht4x4_16_add msa/, "$sse2_x86inc"; + specialize qw/vpx_iwht4x4_16_add msa sse2/; } # CONFIG_EMULATE_HARDWARE } # CONFIG_VP9_HIGHBITDEPTH } # CONFIG_VP9 || CONFIG_VP10 @@ -945,10 +922,10 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { # if ((vpx_config("CONFIG_VP9_ENCODER") eq "yes") || (vpx_config("CONFIG_VP10_ENCODER") eq "yes")) { add_proto qw/void vpx_quantize_b/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan"; - specialize qw/vpx_quantize_b sse2/, "$ssse3_x86_64_x86inc", "$avx_x86_64_x86inc"; + specialize qw/vpx_quantize_b sse2/, "$ssse3_x86_64", "$avx_x86_64"; add_proto qw/void vpx_quantize_b_32x32/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan"; - specialize qw/vpx_quantize_b_32x32/, "$ssse3_x86_64_x86inc", "$avx_x86_64_x86inc"; + specialize qw/vpx_quantize_b_32x32/, "$ssse3_x86_64", "$avx_x86_64"; if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { add_proto qw/void vpx_highbd_quantize_b/, "const tran_low_t *coeff_ptr, intptr_t n_coeffs, int skip_block, const int16_t *zbin_ptr, const int16_t *round_ptr, const int16_t *quant_ptr, const int16_t *quant_shift_ptr, tran_low_t *qcoeff_ptr, tran_low_t *dqcoeff_ptr, const int16_t *dequant_ptr, uint16_t *eob_ptr, const int16_t *scan, const int16_t *iscan"; @@ -985,51 +962,60 @@ if (vpx_config("CONFIG_ENCODERS") eq "yes") { # Block subtraction # add_proto qw/void vpx_subtract_block/, "int rows, int cols, int16_t *diff_ptr, ptrdiff_t diff_stride, const uint8_t *src_ptr, ptrdiff_t src_stride, const uint8_t *pred_ptr, ptrdiff_t pred_stride"; -specialize qw/vpx_subtract_block neon msa/, "$sse2_x86inc"; +specialize qw/vpx_subtract_block neon msa sse2/; if (vpx_config("CONFIG_VP10_ENCODER") eq "yes") { - # - # Sum of Squares - # - add_proto qw/uint64_t vpx_sum_squares_2d_i16/, "const int16_t *src, int stride, int size"; - specialize qw/vpx_sum_squares_2d_i16 sse2/; +# +# Sum of Squares +# +add_proto qw/uint64_t vpx_sum_squares_2d_i16/, "const int16_t *src, int stride, int size"; +specialize qw/vpx_sum_squares_2d_i16 sse2/; - add_proto qw/uint64_t vpx_sum_squares_i16/, "const int16_t *src, uint32_t N"; - specialize qw/vpx_sum_squares_i16 sse2/; +add_proto qw/uint64_t vpx_sum_squares_i16/, "const int16_t *src, uint32_t N"; +specialize qw/vpx_sum_squares_i16 sse2/; } + +# Single block SAD +# +add_proto qw/unsigned int vpx_sad64x64/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; +specialize qw/vpx_sad64x64 avx2 neon msa sse2/; + +add_proto qw/unsigned int vpx_sad64x32/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; +specialize qw/vpx_sad64x32 avx2 msa sse2/; + add_proto qw/unsigned int vpx_sad32x64/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, 
int ref_stride"; -specialize qw/vpx_sad32x64 avx2 msa/, "$sse2_x86inc"; +specialize qw/vpx_sad32x64 avx2 msa sse2/; add_proto qw/unsigned int vpx_sad32x32/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; -specialize qw/vpx_sad32x32 avx2 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_sad32x32 avx2 neon msa sse2/; add_proto qw/unsigned int vpx_sad32x16/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; -specialize qw/vpx_sad32x16 avx2 msa/, "$sse2_x86inc"; +specialize qw/vpx_sad32x16 avx2 msa sse2/; add_proto qw/unsigned int vpx_sad16x32/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; -specialize qw/vpx_sad16x32 msa/, "$sse2_x86inc"; +specialize qw/vpx_sad16x32 msa sse2/; add_proto qw/unsigned int vpx_sad16x16/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; -specialize qw/vpx_sad16x16 media neon msa/, "$sse2_x86inc"; +specialize qw/vpx_sad16x16 media neon msa sse2/; add_proto qw/unsigned int vpx_sad16x8/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; -specialize qw/vpx_sad16x8 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_sad16x8 neon msa sse2/; add_proto qw/unsigned int vpx_sad8x16/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; -specialize qw/vpx_sad8x16 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_sad8x16 neon msa sse2/; add_proto qw/unsigned int vpx_sad8x8/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; -specialize qw/vpx_sad8x8 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_sad8x8 neon msa sse2/; add_proto qw/unsigned int vpx_sad8x4/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; -specialize qw/vpx_sad8x4 msa/, "$sse2_x86inc"; +specialize qw/vpx_sad8x4 msa sse2/; add_proto qw/unsigned int vpx_sad4x8/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; -specialize qw/vpx_sad4x8 msa/, "$sse2_x86inc"; +specialize qw/vpx_sad4x8 msa sse2/; add_proto qw/unsigned int vpx_sad4x4/, "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; -specialize qw/vpx_sad4x4 neon msa/, "$sse2_x86inc"; +specialize qw/vpx_sad4x4 neon msa sse2/; # # Avg @@ -1062,7 +1048,7 @@ if ((vpx_config("CONFIG_VP9_ENCODER") eq "yes") || (vpx_config("CONFIG_VP10_ENCO } add_proto qw/void vpx_hadamard_8x8/, "const int16_t *src_diff, int src_stride, int16_t *coeff"; - specialize qw/vpx_hadamard_8x8 sse2 neon/, "$ssse3_x86_64_x86inc"; + specialize qw/vpx_hadamard_8x8 sse2 neon/, "$ssse3_x86_64"; add_proto qw/void vpx_hadamard_16x16/, "const int16_t *src_diff, int src_stride, int16_t *coeff"; specialize qw/vpx_hadamard_16x16 sse2 neon/; @@ -1089,39 +1075,39 @@ foreach (@block_sizes) { add_proto qw/unsigned int/, "vpx_sad${w}x${h}_avg", "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred"; } -specialize qw/vpx_sad128x128 /, "$sse2_x86inc"; -specialize qw/vpx_sad128x64 /, "$sse2_x86inc"; -specialize qw/vpx_sad64x128 /, "$sse2_x86inc"; -specialize qw/vpx_sad64x64 avx2 neon msa/, "$sse2_x86inc"; -specialize qw/vpx_sad64x32 avx2 msa/, "$sse2_x86inc"; -specialize qw/vpx_sad32x64 avx2 msa/, "$sse2_x86inc"; -specialize qw/vpx_sad32x32 avx2 neon msa/, "$sse2_x86inc"; -specialize qw/vpx_sad32x16 avx2 msa/, "$sse2_x86inc"; -specialize qw/vpx_sad16x32 msa/, "$sse2_x86inc"; -specialize qw/vpx_sad16x16 media neon msa/, "$sse2_x86inc"; -specialize 
qw/vpx_sad16x8 neon msa/, "$sse2_x86inc"; -specialize qw/vpx_sad8x16 neon msa/, "$sse2_x86inc"; -specialize qw/vpx_sad8x8 neon msa/, "$sse2_x86inc"; -specialize qw/vpx_sad8x4 msa/, "$sse2_x86inc"; -specialize qw/vpx_sad4x8 msa/, "$sse2_x86inc"; -specialize qw/vpx_sad4x4 neon msa/, "$sse2_x86inc"; - -specialize qw/vpx_sad128x128_avg /, "$sse2_x86inc"; -specialize qw/vpx_sad128x64_avg /, "$sse2_x86inc"; -specialize qw/vpx_sad64x128_avg /, "$sse2_x86inc"; -specialize qw/vpx_sad64x64_avg avx2 msa/, "$sse2_x86inc"; -specialize qw/vpx_sad64x32_avg avx2 msa/, "$sse2_x86inc"; -specialize qw/vpx_sad32x64_avg avx2 msa/, "$sse2_x86inc"; -specialize qw/vpx_sad32x32_avg avx2 msa/, "$sse2_x86inc"; -specialize qw/vpx_sad32x16_avg avx2 msa/, "$sse2_x86inc"; -specialize qw/vpx_sad16x32_avg msa/, "$sse2_x86inc"; -specialize qw/vpx_sad16x16_avg msa/, "$sse2_x86inc"; -specialize qw/vpx_sad16x8_avg msa/, "$sse2_x86inc"; -specialize qw/vpx_sad8x16_avg msa/, "$sse2_x86inc"; -specialize qw/vpx_sad8x8_avg msa/, "$sse2_x86inc"; -specialize qw/vpx_sad8x4_avg msa/, "$sse2_x86inc"; -specialize qw/vpx_sad4x8_avg msa/, "$sse2_x86inc"; -specialize qw/vpx_sad4x4_avg msa/, "$sse2_x86inc"; +specialize qw/vpx_sad128x128 sse2/; +specialize qw/vpx_sad128x64 sse2/; +specialize qw/vpx_sad64x128 sse2/; +specialize qw/vpx_sad64x64 avx2 msa sse2/; +specialize qw/vpx_sad64x32 avx2 msa sse2/; +specialize qw/vpx_sad32x64 avx2 msa sse2/; +specialize qw/vpx_sad32x32 avx2 neon msa sse2/; +specialize qw/vpx_sad32x16 avx2 msa sse2/; +specialize qw/vpx_sad16x32 msa sse2/; +specialize qw/vpx_sad16x16 media neon msa sse2/; +specialize qw/vpx_sad16x8 neon msa sse2/; +specialize qw/vpx_sad8x16 neon msa sse2/; +specialize qw/vpx_sad8x8 neon msa sse2/; +specialize qw/vpx_sad8x4 msa sse2/; +specialize qw/vpx_sad4x8 msa sse2/; +specialize qw/vpx_sad4x4 neon msa sse2/; + +specialize qw/vpx_sad128x128_avg sse2/; +specialize qw/vpx_sad128x64_avg sse2/; +specialize qw/vpx_sad64x128_avg sse2/; +specialize qw/vpx_sad64x64_avg avx2 msa sse2/; +specialize qw/vpx_sad64x32_avg avx2 msa sse2/; +specialize qw/vpx_sad32x64_avg avx2 msa sse2/; +specialize qw/vpx_sad32x32_avg avx2 msa sse2/; +specialize qw/vpx_sad32x16_avg avx2 msa sse2/; +specialize qw/vpx_sad16x32_avg msa sse2/; +specialize qw/vpx_sad16x16_avg msa sse2/; +specialize qw/vpx_sad16x8_avg msa sse2/; +specialize qw/vpx_sad8x16_avg msa sse2/; +specialize qw/vpx_sad8x8_avg msa sse2/; +specialize qw/vpx_sad8x4_avg msa sse2/; +specialize qw/vpx_sad4x8_avg msa sse2/; +specialize qw/vpx_sad4x4_avg msa sse2/; if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { foreach (@block_sizes) { @@ -1129,8 +1115,8 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { add_proto qw/unsigned int/, "vpx_highbd_sad${w}x${h}", "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride"; add_proto qw/unsigned int/, "vpx_highbd_sad${w}x${h}_avg", "const uint8_t *src_ptr, int src_stride, const uint8_t *ref_ptr, int ref_stride, const uint8_t *second_pred"; if ($w != 128 && $h != 128 && $w != 4) { - specialize "vpx_highbd_sad${w}x${h}", "$sse2_x86inc"; - specialize "vpx_highbd_sad${w}x${h}_avg", "$sse2_x86inc"; + specialize "vpx_highbd_sad${w}x${h}", qw/sse2/; + specialize "vpx_highbd_sad${w}x${h}_avg", qw/sse2/; } } } @@ -1235,22 +1221,22 @@ foreach (@block_sizes) { add_proto qw/void/, "vpx_sad${w}x${h}x4d", "const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array"; } -specialize qw/vpx_sad128x128x4d /, "$sse2_x86inc"; -specialize 
qw/vpx_sad128x64x4d /, "$sse2_x86inc"; -specialize qw/vpx_sad64x128x4d /, "$sse2_x86inc"; -specialize qw/vpx_sad64x64x4d avx2 neon msa/, "$sse2_x86inc"; -specialize qw/vpx_sad64x32x4d msa/, "$sse2_x86inc"; -specialize qw/vpx_sad32x64x4d msa/, "$sse2_x86inc"; -specialize qw/vpx_sad32x32x4d avx2 neon msa/, "$sse2_x86inc"; -specialize qw/vpx_sad32x16x4d msa/, "$sse2_x86inc"; -specialize qw/vpx_sad16x32x4d msa/, "$sse2_x86inc"; -specialize qw/vpx_sad16x16x4d neon msa/, "$sse2_x86inc"; -specialize qw/vpx_sad16x8x4d msa/, "$sse2_x86inc"; -specialize qw/vpx_sad8x16x4d msa/, "$sse2_x86inc"; -specialize qw/vpx_sad8x8x4d msa/, "$sse2_x86inc"; -specialize qw/vpx_sad8x4x4d msa/, "$sse2_x86inc"; -specialize qw/vpx_sad4x8x4d msa/, "$sse2_x86inc"; -specialize qw/vpx_sad4x4x4d msa/, "$sse2_x86inc"; +specialize qw/vpx_sad128x128x4d sse2/; +specialize qw/vpx_sad128x64x4d sse2/; +specialize qw/vpx_sad64x128x4d sse2/; +specialize qw/vpx_sad64x64x4d avx2 neon msa sse2/; +specialize qw/vpx_sad64x32x4d msa sse2/; +specialize qw/vpx_sad32x64x4d msa sse2/; +specialize qw/vpx_sad32x32x4d avx2 neon msa sse2/; +specialize qw/vpx_sad32x16x4d msa sse2/; +specialize qw/vpx_sad16x32x4d msa sse2/; +specialize qw/vpx_sad16x16x4d neon msa sse2/; +specialize qw/vpx_sad16x8x4d msa sse2/; +specialize qw/vpx_sad8x16x4d msa sse2/; +specialize qw/vpx_sad8x8x4d msa sse2/; +specialize qw/vpx_sad8x4x4d msa sse2/; +specialize qw/vpx_sad4x8x4d msa sse2/; +specialize qw/vpx_sad4x4x4d msa sse2/; if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { # @@ -1260,7 +1246,7 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { ($w, $h) = @$_; add_proto qw/void/, "vpx_highbd_sad${w}x${h}x4d", "const uint8_t *src_ptr, int src_stride, const uint8_t * const ref_ptr[], int ref_stride, uint32_t *sad_array"; if ($w != 128 && $h != 128) { - specialize "vpx_highbd_sad${w}x${h}x4d", "$sse2_x86inc"; + specialize "vpx_highbd_sad${w}x${h}x4d", qw/sse2/; } } } @@ -1409,34 +1395,33 @@ specialize qw/vpx_variance8x4 sse2 msa/; specialize qw/vpx_variance4x8 sse2 msa/; specialize qw/vpx_variance4x4 sse2 msa/; -specialize qw/vpx_sub_pixel_variance64x64 avx2 neon msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_variance64x32 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_variance32x64 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_variance32x32 avx2 neon msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_variance32x16 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_variance16x32 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_variance16x16 media neon msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_variance16x8 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_variance8x16 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_variance8x8 media neon msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_variance8x4 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_variance4x8 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_variance4x4 msa/, "$sse2_x86inc", "$ssse3_x86inc"; - -specialize qw/vpx_sub_pixel_avg_variance64x64 avx2 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_avg_variance64x32 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_avg_variance32x64 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_avg_variance32x32 avx2 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize 
qw/vpx_sub_pixel_avg_variance32x16 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_avg_variance16x32 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_avg_variance16x16 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_avg_variance16x8 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_avg_variance8x16 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_avg_variance8x8 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_avg_variance8x4 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_avg_variance4x8 msa/, "$sse2_x86inc", "$ssse3_x86inc"; -specialize qw/vpx_sub_pixel_avg_variance4x4 msa/, "$sse2_x86inc", "$ssse3_x86inc"; - +specialize qw/vpx_sub_pixel_variance64x64 avx2 neon msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_variance64x32 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_variance32x64 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_variance32x32 avx2 neon msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_variance32x16 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_variance16x32 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_variance16x16 media neon msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_variance16x8 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_variance8x16 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_variance8x8 media neon msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_variance8x4 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_variance4x8 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_variance4x4 msa sse2 ssse3/; + +specialize qw/vpx_sub_pixel_avg_variance64x64 avx2 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_avg_variance64x32 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_avg_variance32x64 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_avg_variance32x32 avx2 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_avg_variance32x16 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_avg_variance16x32 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_avg_variance16x16 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_avg_variance16x8 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_avg_variance8x16 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_avg_variance8x8 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_avg_variance8x4 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_avg_variance4x8 msa sse2 ssse3/; +specialize qw/vpx_sub_pixel_avg_variance4x4 msa sse2 ssse3/; if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { foreach $bd (8, 10, 12) { foreach (@block_sizes) { @@ -1451,8 +1436,8 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { specialize "vpx_highbd_${bd}_variance${w}x${h}", "sse4_1"; } if ($w != 128 && $h != 128 && $w != 4) { - specialize "vpx_highbd_${bd}_sub_pixel_variance${w}x${h}", $sse2_x86inc; - specialize "vpx_highbd_${bd}_sub_pixel_avg_variance${w}x${h}", $sse2_x86inc; + specialize "vpx_highbd_${bd}_sub_pixel_variance${w}x${h}", qw/sse2/; + specialize "vpx_highbd_${bd}_sub_pixel_avg_variance${w}x${h}", qw/sse2/; } if ($w == 4 && $h == 4) { specialize "vpx_highbd_${bd}_sub_pixel_variance${w}x${h}", "sse4_1"; @@ -1513,43 +1498,43 @@ if (vpx_config("CONFIG_OBMC") eq "yes") { } add_proto qw/uint32_t vpx_sub_pixel_avg_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance64x64 avx2 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize qw/vpx_sub_pixel_avg_variance64x64 avx2 msa sse2 ssse3/; add_proto qw/uint32_t 
vpx_sub_pixel_avg_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance64x32 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize qw/vpx_sub_pixel_avg_variance64x32 msa sse2 ssse3/; add_proto qw/uint32_t vpx_sub_pixel_avg_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance32x64 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize qw/vpx_sub_pixel_avg_variance32x64 msa sse2 ssse3/; add_proto qw/uint32_t vpx_sub_pixel_avg_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance32x32 avx2 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize qw/vpx_sub_pixel_avg_variance32x32 avx2 msa sse2 ssse3/; add_proto qw/uint32_t vpx_sub_pixel_avg_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance32x16 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize qw/vpx_sub_pixel_avg_variance32x16 msa sse2 ssse3/; add_proto qw/uint32_t vpx_sub_pixel_avg_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance16x32 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize qw/vpx_sub_pixel_avg_variance16x32 msa sse2 ssse3/; add_proto qw/uint32_t vpx_sub_pixel_avg_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance16x16 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize qw/vpx_sub_pixel_avg_variance16x16 msa sse2 ssse3/; add_proto qw/uint32_t vpx_sub_pixel_avg_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance16x8 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize qw/vpx_sub_pixel_avg_variance16x8 msa sse2 ssse3/; add_proto qw/uint32_t vpx_sub_pixel_avg_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance8x16 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize qw/vpx_sub_pixel_avg_variance8x16 msa sse2 ssse3/; add_proto qw/uint32_t vpx_sub_pixel_avg_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance8x8 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize qw/vpx_sub_pixel_avg_variance8x8 msa sse2 ssse3/; add_proto qw/uint32_t vpx_sub_pixel_avg_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance8x4 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize 
qw/vpx_sub_pixel_avg_variance8x4 msa sse2 ssse3/; add_proto qw/uint32_t vpx_sub_pixel_avg_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance4x8 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize qw/vpx_sub_pixel_avg_variance4x8 msa sse2 ssse3/; add_proto qw/uint32_t vpx_sub_pixel_avg_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_sub_pixel_avg_variance4x4 msa/, "$sse2_x86inc", "$ssse3_x86inc"; + specialize qw/vpx_sub_pixel_avg_variance4x4 msa sse2 ssse3/; # # Specialty Subpixel @@ -1709,217 +1694,217 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { # Subpixel Variance # add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_12_sub_pixel_variance64x64/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_variance64x64 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_12_sub_pixel_variance64x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_variance64x32 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_12_sub_pixel_variance32x64/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_variance32x64 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_12_sub_pixel_variance32x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_variance32x32 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_12_sub_pixel_variance32x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_variance32x16 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_12_sub_pixel_variance16x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_variance16x32 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_12_sub_pixel_variance16x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_variance16x16 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_12_sub_pixel_variance16x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_variance16x8 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance8x16/, "const uint8_t *src_ptr, int 
source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_12_sub_pixel_variance8x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_variance8x16 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_12_sub_pixel_variance8x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_variance8x8 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_12_sub_pixel_variance8x4/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_variance8x4 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_10_sub_pixel_variance64x64/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_variance64x64 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_10_sub_pixel_variance64x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_variance64x32 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_10_sub_pixel_variance32x64/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_variance32x64 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_10_sub_pixel_variance32x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_variance32x32 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_10_sub_pixel_variance32x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_variance32x16 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_10_sub_pixel_variance16x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_variance16x32 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_10_sub_pixel_variance16x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_variance16x16 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance16x8/, "const uint8_t 
*src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_10_sub_pixel_variance16x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_variance16x8 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_10_sub_pixel_variance8x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_variance8x16 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_10_sub_pixel_variance8x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_variance8x8 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_10_sub_pixel_variance8x4/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_variance8x4 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_8_sub_pixel_variance64x64/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_variance64x64 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_8_sub_pixel_variance64x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_variance64x32 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_8_sub_pixel_variance32x64/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_variance32x64 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_8_sub_pixel_variance32x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_variance32x32 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_8_sub_pixel_variance32x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_variance32x16 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_8_sub_pixel_variance16x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_variance16x32 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance16x16/, "const uint8_t 
*src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_8_sub_pixel_variance16x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_variance16x16 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_8_sub_pixel_variance16x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_variance16x8 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_8_sub_pixel_variance8x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_variance8x16 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_8_sub_pixel_variance8x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_variance8x8 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; - specialize qw/vpx_highbd_8_sub_pixel_variance8x4/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_variance8x4 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse"; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_12_sub_pixel_avg_variance64x64/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_avg_variance64x64 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_12_sub_pixel_avg_variance64x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_avg_variance64x32 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_12_sub_pixel_avg_variance32x64/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_avg_variance32x64 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_12_sub_pixel_avg_variance32x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_avg_variance32x32 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize 
qw/vpx_highbd_12_sub_pixel_avg_variance32x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_avg_variance32x16 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_12_sub_pixel_avg_variance16x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_avg_variance16x32 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_12_sub_pixel_avg_variance16x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_avg_variance16x16 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_12_sub_pixel_avg_variance16x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_avg_variance16x8 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_12_sub_pixel_avg_variance8x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_avg_variance8x16 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_12_sub_pixel_avg_variance8x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_avg_variance8x8 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_12_sub_pixel_avg_variance8x4/, "$sse2_x86inc"; + specialize qw/vpx_highbd_12_sub_pixel_avg_variance8x4 sse2/; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; add_proto qw/uint32_t vpx_highbd_12_sub_pixel_avg_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_10_sub_pixel_avg_variance64x64/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_avg_variance64x64 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_10_sub_pixel_avg_variance64x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_avg_variance64x32 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance32x64/, "const uint8_t *src_ptr, int source_stride, int 
xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_10_sub_pixel_avg_variance32x64/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_avg_variance32x64 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_10_sub_pixel_avg_variance32x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_avg_variance32x32 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_10_sub_pixel_avg_variance32x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_avg_variance32x16 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_10_sub_pixel_avg_variance16x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_avg_variance16x32 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_10_sub_pixel_avg_variance16x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_avg_variance16x16 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_10_sub_pixel_avg_variance16x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_avg_variance16x8 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_10_sub_pixel_avg_variance8x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_avg_variance8x16 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_10_sub_pixel_avg_variance8x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_avg_variance8x8 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_10_sub_pixel_avg_variance8x4/, "$sse2_x86inc"; + specialize qw/vpx_highbd_10_sub_pixel_avg_variance8x4 sse2/; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; add_proto qw/uint32_t vpx_highbd_10_sub_pixel_avg_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; 
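(For orientation only: every add_proto/specialize pair in this file drives the
run-time CPU dispatch that the build generates, so listing "sse2" directly, as
this patch now does, marks the SSE2 kernel as always eligible rather than
gated on an x86inc flag. A minimal C sketch of that dispatch pattern follows;
the function names, the stub kernel, and the cpu_has_sse2 flag are
illustrative and are not taken from the patch.)

#include <stdint.h>
#include <stdlib.h>

/* Portable reference implementation, standing in for a _c kernel. */
static unsigned int sad64x64_c(const uint8_t *src, int src_stride,
                               const uint8_t *ref, int ref_stride) {
  unsigned int sad = 0;
  for (int r = 0; r < 64; ++r)
    for (int c = 0; c < 64; ++c)
      sad += (unsigned int)abs(src[r * src_stride + c] -
                               ref[r * ref_stride + c]);
  return sad;
}

/* Stand-in for the SSE2 kernel that the "sse2" token would select. */
static unsigned int sad64x64_sse2(const uint8_t *src, int src_stride,
                                  const uint8_t *ref, int ref_stride) {
  return sad64x64_c(src, src_stride, ref, ref_stride);
}

/* Function pointer of the shape declared by add_proto; the generated rtcd
 * setup assigns it once CPU features are known. */
static unsigned int (*sad64x64)(const uint8_t *, int, const uint8_t *, int);

static void setup_rtcd_sketch(int cpu_has_sse2) {
  sad64x64 = sad64x64_c;                      /* portable default */
  if (cpu_has_sse2) sad64x64 = sad64x64_sse2; /* specialized kernel */
}

int main(void) {
  static uint8_t src[64 * 64], ref[64 * 64];
  setup_rtcd_sketch(1);
  return (int)(sad64x64(src, 64, ref, 64) & 0x7f);
}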
add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance64x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_8_sub_pixel_avg_variance64x64/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_avg_variance64x64 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance64x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_8_sub_pixel_avg_variance64x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_avg_variance64x32 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance32x64/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_8_sub_pixel_avg_variance32x64/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_avg_variance32x64 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance32x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_8_sub_pixel_avg_variance32x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_avg_variance32x32 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance32x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_8_sub_pixel_avg_variance32x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_avg_variance32x16 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance16x32/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_8_sub_pixel_avg_variance16x32/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_avg_variance16x32 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance16x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_8_sub_pixel_avg_variance16x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_avg_variance16x16 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance16x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_8_sub_pixel_avg_variance16x8/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_avg_variance16x8 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance8x16/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_8_sub_pixel_avg_variance8x16/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_avg_variance8x16 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance8x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_8_sub_pixel_avg_variance8x8/, "$sse2_x86inc"; + specialize 
qw/vpx_highbd_8_sub_pixel_avg_variance8x8 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance8x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; - specialize qw/vpx_highbd_8_sub_pixel_avg_variance8x4/, "$sse2_x86inc"; + specialize qw/vpx_highbd_8_sub_pixel_avg_variance8x4 sse2/; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance4x8/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; add_proto qw/uint32_t vpx_highbd_8_sub_pixel_avg_variance4x4/, "const uint8_t *src_ptr, int source_stride, int xoffset, int yoffset, const uint8_t *ref_ptr, int ref_stride, uint32_t *sse, const uint8_t *second_pred"; @@ -1932,6 +1917,18 @@ if (vpx_config("CONFIG_VP9_HIGHBITDEPTH") eq "yes") { if (vpx_config("CONFIG_POSTPROC") eq "yes" || vpx_config("CONFIG_VP9_POSTPROC") eq "yes") { add_proto qw/void vpx_plane_add_noise/, "uint8_t *Start, char *noise, char blackclamp[16], char whiteclamp[16], char bothclamp[16], unsigned int Width, unsigned int Height, int Pitch"; specialize qw/vpx_plane_add_noise sse2 msa/; + + add_proto qw/void vpx_mbpost_proc_down/, "unsigned char *dst, int pitch, int rows, int cols,int flimit"; + specialize qw/vpx_mbpost_proc_down sse2 msa/; + $vpx_mbpost_proc_down_sse2=vpx_mbpost_proc_down_xmm; + + add_proto qw/void vpx_mbpost_proc_across_ip/, "unsigned char *dst, int pitch, int rows, int cols,int flimit"; + specialize qw/vpx_mbpost_proc_across_ip sse2 msa/; + $vpx_mbpost_proc_across_ip_sse2=vpx_mbpost_proc_across_ip_xmm; + + add_proto qw/void vpx_post_proc_down_and_across_mb_row/, "unsigned char *src, unsigned char *dst, int src_pitch, int dst_pitch, int cols, unsigned char *flimits, int size"; + specialize qw/vpx_post_proc_down_and_across_mb_row sse2 msa/; + } } # CONFIG_ENCODERS || CONFIG_POSTPROC || CONFIG_VP9_POSTPROC diff --git a/vp8/common/x86/postproc_sse2.asm b/vpx_dsp/x86/deblock_sse2.asm similarity index 95% rename from vp8/common/x86/postproc_sse2.asm rename to vpx_dsp/x86/deblock_sse2.asm index de17afa5c..6df360df4 100644 --- a/vp8/common/x86/postproc_sse2.asm +++ b/vpx_dsp/x86/deblock_sse2.asm @@ -83,7 +83,7 @@ add rbx, 16 %endmacro -;void vp8_post_proc_down_and_across_mb_row_sse2 +;void vpx_post_proc_down_and_across_mb_row_sse2 ;( ; unsigned char *src_ptr, ; unsigned char *dst_ptr, @@ -93,8 +93,8 @@ ; int *flimits, ; int size ;) -global sym(vp8_post_proc_down_and_across_mb_row_sse2) PRIVATE -sym(vp8_post_proc_down_and_across_mb_row_sse2): +global sym(vpx_post_proc_down_and_across_mb_row_sse2) PRIVATE +sym(vpx_post_proc_down_and_across_mb_row_sse2): push rbp mov rbp, rsp SHADOW_ARGS_TO_STACK 7 @@ -198,7 +198,7 @@ sym(vp8_post_proc_down_and_across_mb_row_sse2): UPDATE_FLIMIT jmp .acrossnextcol -.acrossdone +.acrossdone: ; last 16 pixels movq QWORD PTR [rdi+rdx-16], mm0 @@ -230,11 +230,11 @@ sym(vp8_post_proc_down_and_across_mb_row_sse2): ret %undef flimit -;void vp8_mbpost_proc_down_xmm(unsigned char *dst, +;void vpx_mbpost_proc_down_xmm(unsigned char *dst, ; int pitch, int rows, int cols,int flimit) -extern sym(vp8_rv) -global sym(vp8_mbpost_proc_down_xmm) PRIVATE -sym(vp8_mbpost_proc_down_xmm): +extern sym(vpx_rv) +global sym(vpx_mbpost_proc_down_xmm) PRIVATE +sym(vpx_mbpost_proc_down_xmm): push rbp mov rbp, rsp SHADOW_ARGS_TO_STACK 5 @@ -257,7 +257,7 @@ sym(vp8_mbpost_proc_down_xmm): %define flimit4 [rsp+128] %if ABI_IS_32BIT=0 - lea r8, 
[GLOBAL(sym(vp8_rv))] + lea r8, [GLOBAL(sym(vpx_rv))] %endif ;rows +=8; @@ -278,7 +278,7 @@ sym(vp8_mbpost_proc_down_xmm): lea rdi, [rdi+rdx] movq xmm1, QWORD ptr[rdi] ; first row mov rcx, 8 -.init_borderd ; initialize borders +.init_borderd: ; initialize borders lea rdi, [rdi + rax] movq [rdi], xmm1 @@ -291,7 +291,7 @@ sym(vp8_mbpost_proc_down_xmm): mov rdi, rsi movq xmm1, QWORD ptr[rdi] ; first row mov rcx, 8 -.init_border ; initialize borders +.init_border: ; initialize borders lea rdi, [rdi + rax] movq [rdi], xmm1 @@ -403,13 +403,13 @@ sym(vp8_mbpost_proc_down_xmm): and rcx, 127 %if ABI_IS_32BIT=1 && CONFIG_PIC=1 push rax - lea rax, [GLOBAL(sym(vp8_rv))] - movdqu xmm4, [rax + rcx*2] ;vp8_rv[rcx*2] + lea rax, [GLOBAL(sym(vpx_rv))] + movdqu xmm4, [rax + rcx*2] ;vpx_rv[rcx*2] pop rax %elif ABI_IS_32BIT=0 - movdqu xmm4, [r8 + rcx*2] ;vp8_rv[rcx*2] + movdqu xmm4, [r8 + rcx*2] ;vpx_rv[rcx*2] %else - movdqu xmm4, [sym(vp8_rv) + rcx*2] + movdqu xmm4, [sym(vpx_rv) + rcx*2] %endif paddw xmm1, xmm4 @@ -434,7 +434,7 @@ sym(vp8_mbpost_proc_down_xmm): movq mm0, [rsp + rcx*8] ;d[rcx*8] movq [rsi], mm0 -.skip_assignment +.skip_assignment: lea rsi, [rsi+rax] lea rdi, [rdi+rax] @@ -462,10 +462,10 @@ sym(vp8_mbpost_proc_down_xmm): %undef flimit4 -;void vp8_mbpost_proc_across_ip_xmm(unsigned char *src, +;void vpx_mbpost_proc_across_ip_xmm(unsigned char *src, ; int pitch, int rows, int cols,int flimit) -global sym(vp8_mbpost_proc_across_ip_xmm) PRIVATE -sym(vp8_mbpost_proc_across_ip_xmm): +global sym(vpx_mbpost_proc_across_ip_xmm) PRIVATE +sym(vpx_mbpost_proc_across_ip_xmm): push rbp mov rbp, rsp SHADOW_ARGS_TO_STACK 5 diff --git a/vpx_dsp/x86/highbd_variance_sse2.c b/vpx_dsp/x86/highbd_variance_sse2.c index 7bfa38340..364391578 100644 --- a/vpx_dsp/x86/highbd_variance_sse2.c +++ b/vpx_dsp/x86/highbd_variance_sse2.c @@ -147,24 +147,28 @@ uint32_t vpx_highbd_10_variance##w##x##h##_sse2( \ const uint8_t *src8, int src_stride, \ const uint8_t *ref8, int ref_stride, uint32_t *sse) { \ int sum; \ + int64_t var; \ uint16_t *src = CONVERT_TO_SHORTPTR(src8); \ uint16_t *ref = CONVERT_TO_SHORTPTR(ref8); \ highbd_10_variance_sse2( \ src, src_stride, ref, ref_stride, w, h, sse, &sum, \ vpx_highbd_calc##block_size##x##block_size##var_sse2, block_size); \ - return *sse - (((int64_t)sum * sum) >> shift); \ + var = (int64_t)(*sse) - (((int64_t)sum * sum) >> shift); \ + return (var >= 0) ? (uint32_t)var : 0; \ } \ \ uint32_t vpx_highbd_12_variance##w##x##h##_sse2( \ const uint8_t *src8, int src_stride, \ const uint8_t *ref8, int ref_stride, uint32_t *sse) { \ int sum; \ + int64_t var; \ uint16_t *src = CONVERT_TO_SHORTPTR(src8); \ uint16_t *ref = CONVERT_TO_SHORTPTR(ref8); \ highbd_12_variance_sse2( \ src, src_stride, ref, ref_stride, w, h, sse, &sum, \ vpx_highbd_calc##block_size##x##block_size##var_sse2, block_size); \ - return *sse - (((int64_t)sum * sum) >> shift); \ + var = (int64_t)(*sse) - (((int64_t)sum * sum) >> shift); \ + return (var >= 0) ? (uint32_t)var : 0; \ } VAR_FN(64, 64, 16, 12); @@ -246,7 +250,6 @@ unsigned int vpx_highbd_12_mse8x8_sse2(const uint8_t *src8, int src_stride, return *sse; } -#if CONFIG_USE_X86INC // The 2 unused parameters are place holders for PIC enabled build. 
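(The VAR_FN hunk above widens the subtraction to 64 bits and clamps at zero,
so rounding in the squared-sum correction cannot wrap the 10/12-bit variance
to a huge unsigned value. A standalone sketch of the same arithmetic, using a
hypothetical helper name:)

#include <stdint.h>

/* Mirrors the clamp added in VAR_FN above: sse and sum are the block
 * accumulations, shift depends on bit depth and block size. */
uint32_t clamped_highbd_variance(uint32_t sse, int sum, int shift) {
  const int64_t var = (int64_t)sse - (((int64_t)sum * sum) >> shift);
  return (var >= 0) ? (uint32_t)var : 0;
}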
// These definitions are for functions defined in // highbd_subpel_variance_impl_sse2.asm @@ -593,7 +596,6 @@ FNS(sse2); #undef FNS #undef FN -#endif // CONFIG_USE_X86INC void vpx_highbd_upsampled_pred_sse2(uint16_t *comp_pred, int width, int height, diff --git a/vpx_dsp/x86/intrapred_sse2.asm b/vpx_dsp/x86/intrapred_sse2.asm index cd6a6ae98..c18095c28 100644 --- a/vpx_dsp/x86/intrapred_sse2.asm +++ b/vpx_dsp/x86/intrapred_sse2.asm @@ -756,7 +756,7 @@ cglobal tm_predictor_8x8, 4, 4, 5, dst, stride, above, left psubw m0, m2 ; t1-tl t2-tl ... t8-tl [word] movq m2, [leftq] punpcklbw m2, m1 ; l1 l2 l3 l4 l5 l6 l7 l8 [word] -.loop +.loop: pshuflw m4, m2, 0x0 ; [63:0] l1 l1 l1 l1 [word] pshuflw m3, m2, 0x55 ; [63:0] l2 l2 l2 l2 [word] punpcklqdq m4, m4 ; l1 l1 l1 l1 l1 l1 l1 l1 [word] diff --git a/vpx_dsp/x86/loopfilter_sse2.c b/vpx_dsp/x86/loopfilter_sse2.c index 39a6ae3a8..739adf31d 100644 --- a/vpx_dsp/x86/loopfilter_sse2.c +++ b/vpx_dsp/x86/loopfilter_sse2.c @@ -111,7 +111,6 @@ void vpx_lpf_horizontal_4_sse2(uint8_t *s, int p /* pitch */, __m128i q1p1, q0p0, p3p2, p2p1, p1p0, q3q2, q2q1, q1q0, ps1ps0, qs1qs0; __m128i mask, hev; - p3p2 = _mm_loadl_epi64((__m128i *)(s - 3 * p)); p3p2 = _mm_unpacklo_epi64(_mm_loadl_epi64((__m128i *)(s - 3 * p)), _mm_loadl_epi64((__m128i *)(s - 4 * p))); q1p1 = _mm_unpacklo_epi64(_mm_loadl_epi64((__m128i *)(s - 2 * p)), diff --git a/vpx_dsp/x86/variance_sse2.c b/vpx_dsp/x86/variance_sse2.c index c2b55a3e1..e76c1a287 100644 --- a/vpx_dsp/x86/variance_sse2.c +++ b/vpx_dsp/x86/variance_sse2.c @@ -308,7 +308,6 @@ unsigned int vpx_mse16x16_sse2(const uint8_t *src, int src_stride, return *sse; } -#if CONFIG_USE_X86INC // The 2 unused parameters are place holders for PIC enabled build. // These definitions are for functions defined in subpel_variance.asm #define DECL(w, opt) \ @@ -474,7 +473,6 @@ FNS(ssse3, ssse3); #undef FNS #undef FN -#endif // CONFIG_USE_X86INC void vpx_upsampled_pred_sse2(uint8_t *comp_pred, int width, int height, diff --git a/vpx_dsp/x86/vpx_convolve_copy_sse2.asm b/vpx_dsp/x86/vpx_convolve_copy_sse2.asm index 6d43fc18e..964ee1429 100644 --- a/vpx_dsp/x86/vpx_convolve_copy_sse2.asm +++ b/vpx_dsp/x86/vpx_convolve_copy_sse2.asm @@ -201,7 +201,7 @@ cglobal convolve_%1, 4, 7, 4+AUX_XMM_REGS, src, src_stride, \ %endif %endif ; CONFIG_VP10 && CONFIG_EXT_PARTITION -.w64 +.w64: mov r4d, dword hm .loop64: movu m0, [srcq] diff --git a/vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm b/vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm index d2cb8ea29..c1a6f23ab 100644 --- a/vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm +++ b/vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm @@ -23,11 +23,7 @@ pw_64: times 8 dw 64 ; z = signed SAT(x + y) SECTION .text -%if ARCH_X86_64 - %define LOCAL_VARS_SIZE 16*4 -%else - %define LOCAL_VARS_SIZE 16*6 -%endif +%define LOCAL_VARS_SIZE 16*6 %macro SETUP_LOCAL_VARS 0 ; TODO(slavarnway): using xmm registers for these on ARCH_X86_64 + @@ -54,11 +50,11 @@ SECTION .text mova k6k7, m3 %if ARCH_X86_64 %define krd m12 - %define tmp m13 + %define tmp0 [rsp + 16*4] + %define tmp1 [rsp + 16*5] mova krd, [GLOBAL(pw_64)] %else - %define tmp [rsp + 16*4] - %define krd [rsp + 16*5] + %define krd [rsp + 16*4] %if CONFIG_PIC=0 mova m6, [GLOBAL(pw_64)] %else @@ -71,50 +67,31 @@ SECTION .text %endif %endm -%macro HORIZx4_ROW 2 - mova %2, %1 - punpcklbw %1, %1 - punpckhbw %2, %2 - - mova m3, %2 - palignr %2, %1, 1 - palignr m3, %1, 5 - - pmaddubsw %2, k0k1k4k5 - pmaddubsw m3, k2k3k6k7 - mova m4, %2 ;k0k1 - mova m5, m3 ;k2k3 - psrldq %2, 8 ;k4k5 - psrldq m3, 8 ;k6k7 - paddsw %2, m4 - 
paddsw m5, m3 - paddsw %2, m5 - paddsw %2, krd - psraw %2, 7 - packuswb %2, %2 -%endm - ;------------------------------------------------------------------------------- +%if ARCH_X86_64 + %define LOCAL_VARS_SIZE_H4 0 +%else + %define LOCAL_VARS_SIZE_H4 16*4 +%endif + %macro SUBPIX_HFILTER4 1 -cglobal filter_block1d4_%1, 6, 6+(ARCH_X86_64*2), 11, LOCAL_VARS_SIZE, \ +cglobal filter_block1d4_%1, 6, 6, 11, LOCAL_VARS_SIZE_H4, \ src, sstride, dst, dstride, height, filter mova m4, [filterq] packsswb m4, m4 %if ARCH_X86_64 - %define k0k1k4k5 m8 - %define k2k3k6k7 m9 - %define krd m10 - %define orig_height r7d + %define k0k1k4k5 m8 + %define k2k3k6k7 m9 + %define krd m10 mova krd, [GLOBAL(pw_64)] pshuflw k0k1k4k5, m4, 0b ;k0_k1 pshufhw k0k1k4k5, k0k1k4k5, 10101010b ;k0_k1_k4_k5 pshuflw k2k3k6k7, m4, 01010101b ;k2_k3 pshufhw k2k3k6k7, k2k3k6k7, 11111111b ;k2_k3_k6_k7 %else - %define k0k1k4k5 [rsp + 16*0] - %define k2k3k6k7 [rsp + 16*1] - %define krd [rsp + 16*2] - %define orig_height [rsp + 16*3] + %define k0k1k4k5 [rsp + 16*0] + %define k2k3k6k7 [rsp + 16*1] + %define krd [rsp + 16*2] pshuflw m6, m4, 0b ;k0_k1 pshufhw m6, m6, 10101010b ;k0_k1_k4_k5 pshuflw m7, m4, 01010101b ;k2_k3 @@ -131,61 +108,46 @@ cglobal filter_block1d4_%1, 6, 6+(ARCH_X86_64*2), 11, LOCAL_VARS_SIZE, \ mova k2k3k6k7, m7 mova krd, m1 %endif - mov orig_height, heightd - shr heightd, 1 + dec heightd + .loop: ;Do two rows at once - movh m0, [srcq - 3] - movh m1, [srcq + 5] - punpcklqdq m0, m1 - mova m1, m0 - movh m2, [srcq + sstrideq - 3] - movh m3, [srcq + sstrideq + 5] - punpcklqdq m2, m3 - mova m3, m2 - punpcklbw m0, m0 - punpckhbw m1, m1 - punpcklbw m2, m2 - punpckhbw m3, m3 - mova m4, m1 - palignr m4, m0, 1 - pmaddubsw m4, k0k1k4k5 - palignr m1, m0, 5 + movu m4, [srcq - 3] + movu m5, [srcq + sstrideq - 3] + punpckhbw m1, m4, m4 + punpcklbw m4, m4 + punpckhbw m3, m5, m5 + punpcklbw m5, m5 + palignr m0, m1, m4, 1 + pmaddubsw m0, k0k1k4k5 + palignr m1, m4, 5 pmaddubsw m1, k2k3k6k7 - mova m7, m3 - palignr m7, m2, 1 - pmaddubsw m7, k0k1k4k5 - palignr m3, m2, 5 + palignr m2, m3, m5, 1 + pmaddubsw m2, k0k1k4k5 + palignr m3, m5, 5 pmaddubsw m3, k2k3k6k7 - mova m0, m4 ;k0k1 - mova m5, m1 ;k2k3 - mova m2, m7 ;k0k1 upper - psrldq m4, 8 ;k4k5 - psrldq m1, 8 ;k6k7 - paddsw m4, m0 - paddsw m5, m1 - mova m1, m3 ;k2k3 upper - psrldq m7, 8 ;k4k5 upper - psrldq m3, 8 ;k6k7 upper - paddsw m7, m2 - paddsw m4, m5 - paddsw m1, m3 - paddsw m7, m1 - paddsw m4, krd - psraw m4, 7 - packuswb m4, m4 - paddsw m7, krd - psraw m7, 7 - packuswb m7, m7 + punpckhqdq m4, m0, m2 + punpcklqdq m0, m2 + punpckhqdq m5, m1, m3 + punpcklqdq m1, m3 + paddsw m0, m4 + paddsw m1, m5 +%ifidn %1, h8_avg + movd m4, [dstq] + movd m5, [dstq + dstrideq] +%endif + paddsw m0, m1 + paddsw m0, krd + psraw m0, 7 + packuswb m0, m0 + psrldq m1, m0, 4 %ifidn %1, h8_avg - movd m0, [dstq] - pavgb m4, m0 - movd m2, [dstq + dstrideq] - pavgb m7, m2 + pavgb m0, m4 + pavgb m1, m5 %endif - movd [dstq], m4 - movd [dstq + dstrideq], m7 + movd [dstq], m0 + movd [dstq + dstrideq], m1 lea srcq, [srcq + sstrideq ] prefetcht0 [srcq + 4 * sstrideq - 3] @@ -193,205 +155,156 @@ cglobal filter_block1d4_%1, 6, 6+(ARCH_X86_64*2), 11, LOCAL_VARS_SIZE, \ lea dstq, [dstq + 2 * dstrideq ] prefetcht0 [srcq + 2 * sstrideq - 3] - dec heightd - jnz .loop + sub heightd, 2 + jg .loop ; Do last row if output_height is odd - mov heightd, orig_height - and heightd, 1 - je .done - - movh m0, [srcq - 3] ; load src - movh m1, [srcq + 5] - punpcklqdq m0, m1 - - HORIZx4_ROW m0, m1 + jne .done + + movu m4, [srcq - 3] + 
punpckhbw m1, m4, m4 + punpcklbw m4, m4 + palignr m0, m1, m4, 1 + palignr m1, m4, 5 + pmaddubsw m0, k0k1k4k5 + pmaddubsw m1, k2k3k6k7 + psrldq m2, m0, 8 + psrldq m3, m1, 8 + paddsw m0, m2 + paddsw m1, m3 + paddsw m0, m1 + paddsw m0, krd + psraw m0, 7 + packuswb m0, m0 %ifidn %1, h8_avg - movd m0, [dstq] - pavgb m1, m0 + movd m4, [dstq] + pavgb m0, m4 %endif - movd [dstq], m1 -.done - RET -%endm - -%macro HORIZx8_ROW 5 - mova %2, %1 - punpcklbw %1, %1 - punpckhbw %2, %2 - - mova %3, %2 - mova %4, %2 - mova %5, %2 - - palignr %2, %1, 1 - palignr %3, %1, 5 - palignr %4, %1, 9 - palignr %5, %1, 13 - - pmaddubsw %2, k0k1 - pmaddubsw %3, k2k3 - pmaddubsw %4, k4k5 - pmaddubsw %5, k6k7 - paddsw %2, %4 - paddsw %5, %3 - paddsw %2, %5 - paddsw %2, krd - psraw %2, 7 - packuswb %2, %2 - SWAP %1, %2 + movd [dstq], m0 +.done: + REP_RET %endm ;------------------------------------------------------------------------------- %macro SUBPIX_HFILTER8 1 -cglobal filter_block1d8_%1, 6, 6+(ARCH_X86_64*1), 14, LOCAL_VARS_SIZE, \ +cglobal filter_block1d8_%1, 6, 6, 14, LOCAL_VARS_SIZE, \ src, sstride, dst, dstride, height, filter mova m4, [filterq] SETUP_LOCAL_VARS -%if ARCH_X86_64 - %define orig_height r7d -%else - %define orig_height heightmp -%endif - mov orig_height, heightd - shr heightd, 1 + dec heightd .loop: - movh m0, [srcq - 3] - movh m3, [srcq + 5] - movh m4, [srcq + sstrideq - 3] - movh m7, [srcq + sstrideq + 5] - punpcklqdq m0, m3 - mova m1, m0 + ;Do two rows at once + movu m0, [srcq - 3] + movu m4, [srcq + sstrideq - 3] + punpckhbw m1, m0, m0 punpcklbw m0, m0 - punpckhbw m1, m1 - mova m5, m1 - palignr m5, m0, 13 + palignr m5, m1, m0, 13 pmaddubsw m5, k6k7 - mova m2, m1 - mova m3, m1 + palignr m2, m1, m0, 5 + palignr m3, m1, m0, 9 palignr m1, m0, 1 pmaddubsw m1, k0k1 - punpcklqdq m4, m7 - mova m6, m4 + punpckhbw m6, m4, m4 punpcklbw m4, m4 - palignr m2, m0, 5 - punpckhbw m6, m6 - palignr m3, m0, 9 - mova m7, m6 pmaddubsw m2, k2k3 pmaddubsw m3, k4k5 - palignr m7, m4, 13 - mova m0, m6 - palignr m0, m4, 5 + palignr m7, m6, m4, 13 + palignr m0, m6, m4, 5 pmaddubsw m7, k6k7 paddsw m1, m3 paddsw m2, m5 paddsw m1, m2 - mova m5, m6 +%ifidn %1, h8_avg + movh m2, [dstq] + movhps m2, [dstq + dstrideq] +%endif + palignr m5, m6, m4, 9 palignr m6, m4, 1 pmaddubsw m0, k2k3 pmaddubsw m6, k0k1 - palignr m5, m4, 9 paddsw m1, krd pmaddubsw m5, k4k5 psraw m1, 7 paddsw m0, m7 -%ifidn %1, h8_avg - movh m7, [dstq] - movh m2, [dstq + dstrideq] -%endif - packuswb m1, m1 paddsw m6, m5 paddsw m6, m0 paddsw m6, krd psraw m6, 7 - packuswb m6, m6 + packuswb m1, m6 %ifidn %1, h8_avg - pavgb m1, m7 - pavgb m6, m2 + pavgb m1, m2 %endif - movh [dstq], m1 - movh [dstq + dstrideq], m6 + movh [dstq], m1 + movhps [dstq + dstrideq], m1 lea srcq, [srcq + sstrideq ] prefetcht0 [srcq + 4 * sstrideq - 3] lea srcq, [srcq + sstrideq ] lea dstq, [dstq + 2 * dstrideq ] prefetcht0 [srcq + 2 * sstrideq - 3] - dec heightd - jnz .loop + sub heightd, 2 + jg .loop - ;Do last row if output_height is odd - mov heightd, orig_height - and heightd, 1 - je .done - - movh m0, [srcq - 3] - movh m3, [srcq + 5] - punpcklqdq m0, m3 - - HORIZx8_ROW m0, m1, m2, m3, m4 + ; Do last row if output_height is odd + jne .done + movu m0, [srcq - 3] + punpckhbw m3, m0, m0 + punpcklbw m0, m0 + palignr m1, m3, m0, 1 + palignr m2, m3, m0, 5 + palignr m4, m3, m0, 13 + palignr m3, m0, 9 + pmaddubsw m1, k0k1 + pmaddubsw m2, k2k3 + pmaddubsw m3, k4k5 + pmaddubsw m4, k6k7 + paddsw m1, m3 + paddsw m4, m2 + paddsw m1, m4 + paddsw m1, krd + psraw m1, 7 + packuswb m1, m1 %ifidn %1, h8_avg 
- movh m1, [dstq] - pavgb m0, m1 + movh m0, [dstq] + pavgb m1, m0 %endif - movh [dstq], m0 + movh [dstq], m1 .done: - RET + REP_RET %endm ;------------------------------------------------------------------------------- %macro SUBPIX_HFILTER16 1 -cglobal filter_block1d16_%1, 6, 6+(ARCH_X86_64*0), 14, LOCAL_VARS_SIZE, \ +cglobal filter_block1d16_%1, 6, 6, 14, LOCAL_VARS_SIZE, \ src, sstride, dst, dstride, height, filter mova m4, [filterq] SETUP_LOCAL_VARS + .loop: prefetcht0 [srcq + 2 * sstrideq -3] - movh m0, [srcq - 3] - movh m4, [srcq + 5] - movh m6, [srcq + 13] - punpcklqdq m0, m4 - mova m7, m0 - punpckhbw m0, m0 - mova m1, m0 - punpcklqdq m4, m6 - mova m3, m0 - punpcklbw m7, m7 - - palignr m3, m7, 13 - mova m2, m0 - pmaddubsw m3, k6k7 - palignr m0, m7, 1 + movu m0, [srcq - 3] + movu m4, [srcq - 2] pmaddubsw m0, k0k1 - palignr m1, m7, 5 - pmaddubsw m1, k2k3 - palignr m2, m7, 9 - pmaddubsw m2, k4k5 - paddsw m1, m3 - mova m3, m4 - punpckhbw m4, m4 - mova m5, m4 - punpcklbw m3, m3 - mova m7, m4 - palignr m5, m3, 5 - mova m6, m4 - palignr m4, m3, 1 pmaddubsw m4, k0k1 + movu m1, [srcq - 1] + movu m5, [srcq + 0] + pmaddubsw m1, k2k3 pmaddubsw m5, k2k3 - palignr m6, m3, 9 + movu m2, [srcq + 1] + movu m6, [srcq + 2] + pmaddubsw m2, k4k5 pmaddubsw m6, k4k5 - palignr m7, m3, 13 + movu m3, [srcq + 3] + movu m7, [srcq + 4] + pmaddubsw m3, k6k7 pmaddubsw m7, k6k7 paddsw m0, m2 + paddsw m1, m3 paddsw m0, m1 -%ifidn %1, h8_avg - mova m1, [dstq] -%endif paddsw m4, m6 paddsw m5, m7 paddsw m4, m5 @@ -399,16 +312,18 @@ cglobal filter_block1d16_%1, 6, 6+(ARCH_X86_64*0), 14, LOCAL_VARS_SIZE, \ paddsw m4, krd psraw m0, 7 psraw m4, 7 - packuswb m0, m4 + packuswb m0, m0 + packuswb m4, m4 + punpcklbw m0, m4 %ifidn %1, h8_avg - pavgb m0, m1 + pavgb m0, [dstq] %endif lea srcq, [srcq + sstrideq] mova [dstq], m0 lea dstq, [dstq + dstrideq] dec heightd jnz .loop - RET + REP_RET %endm INIT_XMM ssse3 @@ -420,204 +335,463 @@ SUBPIX_HFILTER4 h8 SUBPIX_HFILTER4 h8_avg ;------------------------------------------------------------------------------- + +; TODO(Linfeng): Detect cpu type and choose the code with better performance. 
+%define X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON 1 + +%if ARCH_X86_64 && X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON + %define NUM_GENERAL_REG_USED 9 +%else + %define NUM_GENERAL_REG_USED 6 +%endif + %macro SUBPIX_VFILTER 2 -cglobal filter_block1d%2_%1, 6, 6+(ARCH_X86_64*3), 14, LOCAL_VARS_SIZE, \ +cglobal filter_block1d%2_%1, 6, NUM_GENERAL_REG_USED, 15, LOCAL_VARS_SIZE, \ src, sstride, dst, dstride, height, filter mova m4, [filterq] SETUP_LOCAL_VARS -%if ARCH_X86_64 - %define src1q r7 - %define sstride6q r8 - %define dst_stride dstrideq + +%ifidn %2, 8 + %define movx movh %else - %define src1q filterq - %define sstride6q dstrideq - %define dst_stride dstridemp + %define movx movd %endif - mov src1q, srcq - add src1q, sstrideq - lea sstride6q, [sstrideq + sstrideq * 4] - add sstride6q, sstrideq ;pitch * 6 -%ifidn %2, 8 - %define movx movh + dec heightd + +%if ARCH_X86 || X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON + +%if ARCH_X86_64 + %define src1q r7 + %define sstride6q r8 + %define dst_stride dstrideq %else - %define movx movd + %define src1q filterq + %define sstride6q dstrideq + %define dst_stride dstridemp %endif + mov src1q, srcq + add src1q, sstrideq + lea sstride6q, [sstrideq + sstrideq * 4] + add sstride6q, sstrideq ;pitch * 6 + .loop: - movx m0, [srcq ] ;A - movx m1, [srcq + sstrideq ] ;B - punpcklbw m0, m1 ;A B - movx m2, [srcq + sstrideq * 2 ] ;C - pmaddubsw m0, k0k1 - mova m6, m2 - movx m3, [src1q + sstrideq * 2] ;D - punpcklbw m2, m3 ;C D - pmaddubsw m2, k2k3 - movx m4, [srcq + sstrideq * 4 ] ;E - mova m7, m4 - movx m5, [src1q + sstrideq * 4] ;F - punpcklbw m4, m5 ;E F - pmaddubsw m4, k4k5 - punpcklbw m1, m6 ;A B next iter - movx m6, [srcq + sstride6q ] ;G - punpcklbw m5, m6 ;E F next iter - punpcklbw m3, m7 ;C D next iter - pmaddubsw m5, k4k5 - movx m7, [src1q + sstride6q ] ;H - punpcklbw m6, m7 ;G H - pmaddubsw m6, k6k7 - pmaddubsw m3, k2k3 - pmaddubsw m1, k0k1 - paddsw m0, m4 - paddsw m2, m6 - movx m6, [srcq + sstrideq * 8 ] ;H next iter - punpcklbw m7, m6 - pmaddubsw m7, k6k7 - paddsw m0, m2 - paddsw m0, krd - psraw m0, 7 - paddsw m1, m5 - packuswb m0, m0 - - paddsw m3, m7 - paddsw m1, m3 - paddsw m1, krd - psraw m1, 7 - lea srcq, [srcq + sstrideq * 2 ] - lea src1q, [src1q + sstrideq * 2] - packuswb m1, m1 + ;Do two rows at once + movx m0, [srcq ] ;A + movx m1, [src1q ] ;B + punpcklbw m0, m1 ;A B + movx m2, [srcq + sstrideq * 2 ] ;C + pmaddubsw m0, k0k1 + mova m6, m2 + movx m3, [src1q + sstrideq * 2] ;D + punpcklbw m2, m3 ;C D + pmaddubsw m2, k2k3 + movx m4, [srcq + sstrideq * 4 ] ;E + mova m7, m4 + movx m5, [src1q + sstrideq * 4] ;F + punpcklbw m4, m5 ;E F + pmaddubsw m4, k4k5 + punpcklbw m1, m6 ;A B next iter + movx m6, [srcq + sstride6q ] ;G + punpcklbw m5, m6 ;E F next iter + punpcklbw m3, m7 ;C D next iter + pmaddubsw m5, k4k5 + movx m7, [src1q + sstride6q ] ;H + punpcklbw m6, m7 ;G H + pmaddubsw m6, k6k7 + pmaddubsw m3, k2k3 + pmaddubsw m1, k0k1 + paddsw m0, m4 + paddsw m2, m6 + movx m6, [srcq + sstrideq * 8 ] ;H next iter + punpcklbw m7, m6 + pmaddubsw m7, k6k7 + paddsw m0, m2 + paddsw m0, krd + psraw m0, 7 + paddsw m1, m5 + packuswb m0, m0 + + paddsw m3, m7 + paddsw m1, m3 + paddsw m1, krd + psraw m1, 7 + lea srcq, [srcq + sstrideq * 2 ] + lea src1q, [src1q + sstrideq * 2] + packuswb m1, m1 %ifidn %1, v8_avg - movx m2, [dstq] - pavgb m0, m2 + movx m2, [dstq] + pavgb m0, m2 %endif - movx [dstq], m0 - add dstq, dst_stride + movx [dstq], m0 + add dstq, dst_stride %ifidn %1, v8_avg - movx m3, [dstq] - pavgb m1, m3 + movx m3, [dstq] + pavgb m1, m3 %endif - movx [dstq], m1 - add 
dstq, dst_stride - sub heightd, 2 - cmp heightd, 1 - jg .loop - - cmp heightd, 0 - je .done - - movx m0, [srcq ] ;A - movx m1, [srcq + sstrideq ] ;B - movx m6, [srcq + sstride6q ] ;G - punpcklbw m0, m1 ;A B - movx m7, [src1q + sstride6q ] ;H - pmaddubsw m0, k0k1 - movx m2, [srcq + sstrideq * 2 ] ;C - punpcklbw m6, m7 ;G H - movx m3, [src1q + sstrideq * 2] ;D - pmaddubsw m6, k6k7 - movx m4, [srcq + sstrideq * 4 ] ;E - punpcklbw m2, m3 ;C D - movx m5, [src1q + sstrideq * 4] ;F - punpcklbw m4, m5 ;E F - pmaddubsw m2, k2k3 - pmaddubsw m4, k4k5 - paddsw m2, m6 - paddsw m0, m4 - paddsw m0, m2 - paddsw m0, krd - psraw m0, 7 - packuswb m0, m0 + movx [dstq], m1 + add dstq, dst_stride + sub heightd, 2 + jg .loop + + ; Do last row if output_height is odd + jne .done + + movx m0, [srcq ] ;A + movx m1, [srcq + sstrideq ] ;B + movx m6, [srcq + sstride6q ] ;G + punpcklbw m0, m1 ;A B + movx m7, [src1q + sstride6q ] ;H + pmaddubsw m0, k0k1 + movx m2, [srcq + sstrideq * 2 ] ;C + punpcklbw m6, m7 ;G H + movx m3, [src1q + sstrideq * 2] ;D + pmaddubsw m6, k6k7 + movx m4, [srcq + sstrideq * 4 ] ;E + punpcklbw m2, m3 ;C D + movx m5, [src1q + sstrideq * 4] ;F + punpcklbw m4, m5 ;E F + pmaddubsw m2, k2k3 + pmaddubsw m4, k4k5 + paddsw m2, m6 + paddsw m0, m4 + paddsw m0, m2 + paddsw m0, krd + psraw m0, 7 + packuswb m0, m0 %ifidn %1, v8_avg - movx m1, [dstq] - pavgb m0, m1 + movx m1, [dstq] + pavgb m0, m1 %endif - movx [dstq], m0 + movx [dstq], m0 + +%else + ; ARCH_X86_64 + + movx m0, [srcq ] ;A + movx m1, [srcq + sstrideq ] ;B + lea srcq, [srcq + sstrideq * 2 ] + movx m2, [srcq] ;C + movx m3, [srcq + sstrideq] ;D + lea srcq, [srcq + sstrideq * 2 ] + movx m4, [srcq] ;E + movx m5, [srcq + sstrideq] ;F + lea srcq, [srcq + sstrideq * 2 ] + movx m6, [srcq] ;G + punpcklbw m0, m1 ;A B + punpcklbw m1, m2 ;A B next iter + punpcklbw m2, m3 ;C D + punpcklbw m3, m4 ;C D next iter + punpcklbw m4, m5 ;E F + punpcklbw m5, m6 ;E F next iter + +.loop: + ;Do two rows at once + movx m7, [srcq + sstrideq] ;H + lea srcq, [srcq + sstrideq * 2 ] + movx m14, [srcq] ;H next iter + punpcklbw m6, m7 ;G H + punpcklbw m7, m14 ;G H next iter + pmaddubsw m8, m0, k0k1 + pmaddubsw m9, m1, k0k1 + mova m0, m2 + mova m1, m3 + pmaddubsw m10, m2, k2k3 + pmaddubsw m11, m3, k2k3 + mova m2, m4 + mova m3, m5 + pmaddubsw m4, k4k5 + pmaddubsw m5, k4k5 + paddsw m8, m4 + paddsw m9, m5 + mova m4, m6 + mova m5, m7 + pmaddubsw m6, k6k7 + pmaddubsw m7, k6k7 + paddsw m10, m6 + paddsw m11, m7 + paddsw m8, m10 + paddsw m9, m11 + mova m6, m14 + paddsw m8, krd + paddsw m9, krd + psraw m8, 7 + psraw m9, 7 +%ifidn %2, 4 + packuswb m8, m8 + packuswb m9, m9 +%else + packuswb m8, m9 +%endif + +%ifidn %1, v8_avg + movx m7, [dstq] +%ifidn %2, 4 + movx m10, [dstq + dstrideq] + pavgb m9, m10 +%else + movhpd m7, [dstq + dstrideq] +%endif + pavgb m8, m7 +%endif + movx [dstq], m8 +%ifidn %2, 4 + movx [dstq + dstrideq], m9 +%else + movhpd [dstq + dstrideq], m8 +%endif + + lea dstq, [dstq + dstrideq * 2 ] + sub heightd, 2 + jg .loop + + ; Do last row if output_height is odd + jne .done + + movx m7, [srcq + sstrideq] ;H + punpcklbw m6, m7 ;G H + pmaddubsw m0, k0k1 + pmaddubsw m2, k2k3 + pmaddubsw m4, k4k5 + pmaddubsw m6, k6k7 + paddsw m0, m4 + paddsw m2, m6 + paddsw m0, m2 + paddsw m0, krd + psraw m0, 7 + packuswb m0, m0 +%ifidn %1, v8_avg + movx m1, [dstq] + pavgb m0, m1 +%endif + movx [dstq], m0 + +%endif ; ARCH_X86_64 + .done: - RET + REP_RET + %endm ;------------------------------------------------------------------------------- %macro SUBPIX_VFILTER16 1 -cglobal 
filter_block1d16_%1, 6, 6+(ARCH_X86_64*3), 14, LOCAL_VARS_SIZE, \ +cglobal filter_block1d16_%1, 6, NUM_GENERAL_REG_USED, 16, LOCAL_VARS_SIZE, \ src, sstride, dst, dstride, height, filter - mova m4, [filterq] + mova m4, [filterq] SETUP_LOCAL_VARS + +%if ARCH_X86 || X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON + %if ARCH_X86_64 - %define src1q r7 - %define sstride6q r8 - %define dst_stride dstrideq + %define src1q r7 + %define sstride6q r8 + %define dst_stride dstrideq %else - %define src1q filterq - %define sstride6q dstrideq - %define dst_stride dstridemp + %define src1q filterq + %define sstride6q dstrideq + %define dst_stride dstridemp %endif - mov src1q, srcq - add src1q, sstrideq - lea sstride6q, [sstrideq + sstrideq * 4] - add sstride6q, sstrideq ;pitch * 6 + lea src1q, [srcq + sstrideq] + lea sstride6q, [sstrideq + sstrideq * 4] + add sstride6q, sstrideq ;pitch * 6 .loop: - movh m0, [srcq ] ;A - movh m1, [srcq + sstrideq ] ;B - movh m2, [srcq + sstrideq * 2 ] ;C - movh m3, [src1q + sstrideq * 2] ;D - movh m4, [srcq + sstrideq * 4 ] ;E - movh m5, [src1q + sstrideq * 4] ;F - - punpcklbw m0, m1 ;A B - movh m6, [srcq + sstride6q] ;G - punpcklbw m2, m3 ;C D - movh m7, [src1q + sstride6q] ;H - punpcklbw m4, m5 ;E F - pmaddubsw m0, k0k1 - movh m3, [srcq + 8] ;A - pmaddubsw m2, k2k3 - punpcklbw m6, m7 ;G H - movh m5, [srcq + sstrideq + 8] ;B - pmaddubsw m4, k4k5 - punpcklbw m3, m5 ;A B - movh m7, [srcq + sstrideq * 2 + 8] ;C - pmaddubsw m6, k6k7 - movh m5, [src1q + sstrideq * 2 + 8] ;D - punpcklbw m7, m5 ;C D - paddsw m2, m6 - pmaddubsw m3, k0k1 - movh m1, [srcq + sstrideq * 4 + 8] ;E - paddsw m0, m4 - pmaddubsw m7, k2k3 - movh m6, [src1q + sstrideq * 4 + 8] ;F - punpcklbw m1, m6 ;E F - paddsw m0, m2 - paddsw m0, krd - movh m2, [srcq + sstride6q + 8] ;G - pmaddubsw m1, k4k5 - movh m5, [src1q + sstride6q + 8] ;H - psraw m0, 7 - punpcklbw m2, m5 ;G H - pmaddubsw m2, k6k7 + movh m0, [srcq ] ;A + movh m1, [src1q ] ;B + movh m2, [srcq + sstrideq * 2 ] ;C + movh m3, [src1q + sstrideq * 2] ;D + movh m4, [srcq + sstrideq * 4 ] ;E + movh m5, [src1q + sstrideq * 4] ;F + + punpcklbw m0, m1 ;A B + movh m6, [srcq + sstride6q] ;G + punpcklbw m2, m3 ;C D + movh m7, [src1q + sstride6q] ;H + punpcklbw m4, m5 ;E F + pmaddubsw m0, k0k1 + movh m3, [srcq + 8] ;A + pmaddubsw m2, k2k3 + punpcklbw m6, m7 ;G H + movh m5, [srcq + sstrideq + 8] ;B + pmaddubsw m4, k4k5 + punpcklbw m3, m5 ;A B + movh m7, [srcq + sstrideq * 2 + 8] ;C + pmaddubsw m6, k6k7 + movh m5, [src1q + sstrideq * 2 + 8] ;D + punpcklbw m7, m5 ;C D + paddsw m2, m6 + pmaddubsw m3, k0k1 + movh m1, [srcq + sstrideq * 4 + 8] ;E + paddsw m0, m4 + pmaddubsw m7, k2k3 + movh m6, [src1q + sstrideq * 4 + 8] ;F + punpcklbw m1, m6 ;E F + paddsw m0, m2 + paddsw m0, krd + movh m2, [srcq + sstride6q + 8] ;G + pmaddubsw m1, k4k5 + movh m5, [src1q + sstride6q + 8] ;H + psraw m0, 7 + punpcklbw m2, m5 ;G H + pmaddubsw m2, k6k7 + paddsw m7, m2 + paddsw m3, m1 + paddsw m3, m7 + paddsw m3, krd + psraw m3, 7 + packuswb m0, m3 + + add srcq, sstrideq + add src1q, sstrideq +%ifidn %1, v8_avg + pavgb m0, [dstq] +%endif + mova [dstq], m0 + add dstq, dst_stride + dec heightd + jnz .loop + REP_RET + +%else + ; ARCH_X86_64 + dec heightd + + movu m1, [srcq ] ;A + movu m3, [srcq + sstrideq ] ;B + lea srcq, [srcq + sstrideq * 2] + punpcklbw m0, m1, m3 ;A B + punpckhbw m1, m3 ;A B + movu m5, [srcq] ;C + punpcklbw m2, m3, m5 ;A B next iter + punpckhbw m3, m5 ;A B next iter + mova tmp0, m2 ;store to stack + mova tmp1, m3 ;store to stack + movu m7, [srcq + sstrideq] ;D + lea srcq, [srcq + 
sstrideq * 2] + punpcklbw m4, m5, m7 ;C D + punpckhbw m5, m7 ;C D + movu m9, [srcq] ;E + punpcklbw m6, m7, m9 ;C D next iter + punpckhbw m7, m9 ;C D next iter + movu m11, [srcq + sstrideq] ;F + lea srcq, [srcq + sstrideq * 2] + punpcklbw m8, m9, m11 ;E F + punpckhbw m9, m11 ;E F + movu m2, [srcq] ;G + punpcklbw m10, m11, m2 ;E F next iter + punpckhbw m11, m2 ;E F next iter + +.loop: + ;Do two rows at once + pmaddubsw m13, m0, k0k1 + mova m0, m4 + pmaddubsw m14, m8, k4k5 + pmaddubsw m15, m4, k2k3 + mova m4, m8 + paddsw m13, m14 + movu m3, [srcq + sstrideq] ;H + lea srcq, [srcq + sstrideq * 2] + punpcklbw m14, m2, m3 ;G H + mova m8, m14 + pmaddubsw m14, k6k7 + paddsw m15, m14 + paddsw m13, m15 + paddsw m13, krd + psraw m13, 7 + + pmaddubsw m14, m1, k0k1 + pmaddubsw m1, m9, k4k5 + pmaddubsw m15, m5, k2k3 + paddsw m14, m1 + mova m1, m5 + mova m5, m9 + punpckhbw m2, m3 ;G H + mova m9, m2 + pmaddubsw m2, k6k7 + paddsw m15, m2 + paddsw m14, m15 + paddsw m14, krd + psraw m14, 7 + packuswb m13, m14 %ifidn %1, v8_avg - mova m4, [dstq] + pavgb m13, [dstq] %endif - movh [dstq], m0 - paddsw m7, m2 - paddsw m3, m1 - paddsw m3, m7 - paddsw m3, krd - psraw m3, 7 - packuswb m0, m3 - - add srcq, sstrideq - add src1q, sstrideq + mova [dstq], m13 + + ; next iter + pmaddubsw m15, tmp0, k0k1 + pmaddubsw m14, m10, k4k5 + pmaddubsw m13, m6, k2k3 + paddsw m15, m14 + mova tmp0, m6 + mova m6, m10 + movu m2, [srcq] ;G next iter + punpcklbw m14, m3, m2 ;G H next iter + mova m10, m14 + pmaddubsw m14, k6k7 + paddsw m13, m14 + paddsw m15, m13 + paddsw m15, krd + psraw m15, 7 + + pmaddubsw m14, tmp1, k0k1 + mova tmp1, m7 + pmaddubsw m13, m7, k2k3 + mova m7, m11 + pmaddubsw m11, k4k5 + paddsw m14, m11 + punpckhbw m3, m2 ;G H next iter + mova m11, m3 + pmaddubsw m3, k6k7 + paddsw m13, m3 + paddsw m14, m13 + paddsw m14, krd + psraw m14, 7 + packuswb m15, m14 %ifidn %1, v8_avg - pavgb m0, m4 + pavgb m15, [dstq + dstrideq] %endif - mova [dstq], m0 - add dstq, dst_stride - dec heightd - jnz .loop - RET + mova [dstq + dstrideq], m15 + lea dstq, [dstq + dstrideq * 2] + sub heightd, 2 + jg .loop + + ; Do last row if output_height is odd + jne .done + + movu m3, [srcq + sstrideq] ;H + punpcklbw m6, m2, m3 ;G H + punpckhbw m2, m3 ;G H + pmaddubsw m0, k0k1 + pmaddubsw m1, k0k1 + pmaddubsw m4, k2k3 + pmaddubsw m5, k2k3 + pmaddubsw m8, k4k5 + pmaddubsw m9, k4k5 + pmaddubsw m6, k6k7 + pmaddubsw m2, k6k7 + paddsw m0, m8 + paddsw m1, m9 + paddsw m4, m6 + paddsw m5, m2 + paddsw m0, m4 + paddsw m1, m5 + paddsw m0, krd + paddsw m1, krd + psraw m0, 7 + psraw m1, 7 + packuswb m0, m1 +%ifidn %1, v8_avg + pavgb m0, [dstq] +%endif + mova [dstq], m0 + +.done: + REP_RET + +%endif ; ARCH_X86_64 + %endm INIT_XMM ssse3 diff --git a/vpx_dsp/x86/vpx_subpixel_bilinear_ssse3.asm b/vpx_dsp/x86/vpx_subpixel_bilinear_ssse3.asm index 3c8cfd225..538b2129d 100644 --- a/vpx_dsp/x86/vpx_subpixel_bilinear_ssse3.asm +++ b/vpx_dsp/x86/vpx_subpixel_bilinear_ssse3.asm @@ -14,14 +14,14 @@ mov rdx, arg(5) ;filter ptr mov rsi, arg(0) ;src_ptr mov rdi, arg(2) ;output_ptr - mov rcx, 0x0400040 + mov ecx, 0x01000100 movdqa xmm3, [rdx] ;load filters psrldq xmm3, 6 packsswb xmm3, xmm3 pshuflw xmm3, xmm3, 0b ;k3_k4 - movq xmm2, rcx ;rounding + movd xmm2, ecx ;rounding_shift pshufd xmm2, xmm2, 0 movsxd rax, DWORD PTR arg(1) ;pixels_per_line @@ -33,8 +33,7 @@ punpcklbw xmm0, xmm1 pmaddubsw xmm0, xmm3 - paddsw xmm0, xmm2 ;rounding - psraw xmm0, 7 ;shift + pmulhrsw xmm0, xmm2 ;rounding(+64)+shift(>>7) packuswb xmm0, xmm0 ;pack to byte %if %1 @@ -51,7 +50,7 @@ mov rdx, arg(5) 
;filter ptr mov rsi, arg(0) ;src_ptr mov rdi, arg(2) ;output_ptr - mov rcx, 0x0400040 + mov ecx, 0x01000100 movdqa xmm7, [rdx] ;load filters psrldq xmm7, 6 @@ -59,7 +58,7 @@ pshuflw xmm7, xmm7, 0b ;k3_k4 punpcklwd xmm7, xmm7 - movq xmm6, rcx ;rounding + movd xmm6, ecx ;rounding_shift pshufd xmm6, xmm6, 0 movsxd rax, DWORD PTR arg(1) ;pixels_per_line @@ -71,8 +70,7 @@ punpcklbw xmm0, xmm1 pmaddubsw xmm0, xmm7 - paddsw xmm0, xmm6 ;rounding - psraw xmm0, 7 ;shift + pmulhrsw xmm0, xmm6 ;rounding(+64)+shift(>>7) packuswb xmm0, xmm0 ;pack back to byte %if %1 @@ -92,10 +90,8 @@ pmaddubsw xmm0, xmm7 pmaddubsw xmm2, xmm7 - paddsw xmm0, xmm6 ;rounding - paddsw xmm2, xmm6 - psraw xmm0, 7 ;shift - psraw xmm2, 7 + pmulhrsw xmm0, xmm6 ;rounding(+64)+shift(>>7) + pmulhrsw xmm2, xmm6 packuswb xmm0, xmm2 ;pack back to byte %if %1 diff --git a/vpx_util/vpx_thread.c b/vpx_util/vpx_thread.c index 0bb0125bd..0132ce6f2 100644 --- a/vpx_util/vpx_thread.c +++ b/vpx_util/vpx_thread.c @@ -10,8 +10,7 @@ // Multi-threaded worker // // Original source: -// http://git.chromium.org/webm/libwebp.git -// 100644 blob 264210ba2807e4da47eb5d18c04cf869d89b9784 src/utils/thread.c +// https://chromium.googlesource.com/webm/libwebp #include #include // for memset() diff --git a/vpx_util/vpx_thread.h b/vpx_util/vpx_thread.h index 2062abd75..f554bbca1 100644 --- a/vpx_util/vpx_thread.h +++ b/vpx_util/vpx_thread.h @@ -10,8 +10,7 @@ // Multi-threaded worker // // Original source: -// http://git.chromium.org/webm/libwebp.git -// 100644 blob 7bd451b124ae3b81596abfbcc823e3cb129d3a38 src/utils/thread.h +// https://chromium.googlesource.com/webm/libwebp #ifndef VPX_THREAD_H_ #define VPX_THREAD_H_ @@ -34,11 +33,26 @@ extern "C" { #include // NOLINT typedef HANDLE pthread_t; typedef CRITICAL_SECTION pthread_mutex_t; + +#if _WIN32_WINNT >= 0x0600 // Windows Vista / Server 2008 or greater +#define USE_WINDOWS_CONDITION_VARIABLE +typedef CONDITION_VARIABLE pthread_cond_t; +#else typedef struct { HANDLE waiting_sem_; HANDLE received_sem_; HANDLE signal_event_; } pthread_cond_t; +#endif // _WIN32_WINNT >= 0x600 + +#ifndef WINAPI_FAMILY_PARTITION +#define WINAPI_PARTITION_DESKTOP 1 +#define WINAPI_FAMILY_PARTITION(x) x +#endif + +#if !WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP) +#define USE_CREATE_THREAD +#endif //------------------------------------------------------------------------------ // simplistic pthread emulation layer @@ -47,16 +61,30 @@ typedef struct { #define THREADFN unsigned int __stdcall #define THREAD_RETURN(val) (unsigned int)((DWORD_PTR)val) +#if _WIN32_WINNT >= 0x0501 // Windows XP or greater +#define WaitForSingleObject(obj, timeout) \ + WaitForSingleObjectEx(obj, timeout, FALSE /*bAlertable*/) +#endif + static INLINE int pthread_create(pthread_t* const thread, const void* attr, unsigned int (__stdcall *start)(void*), void* arg) { (void)attr; +#ifdef USE_CREATE_THREAD + *thread = CreateThread(NULL, /* lpThreadAttributes */ + 0, /* dwStackSize */ + start, + arg, + 0, /* dwStackSize */ + NULL); /* lpThreadId */ +#else *thread = (pthread_t)_beginthreadex(NULL, /* void *security */ 0, /* unsigned stack_size */ start, arg, 0, /* unsigned initflag */ NULL); /* unsigned *thrdaddr */ +#endif if (*thread == NULL) return 1; SetThreadPriority(*thread, THREAD_PRIORITY_ABOVE_NORMAL); return 0; @@ -72,7 +100,11 @@ static INLINE int pthread_join(pthread_t thread, void** value_ptr) { static INLINE int pthread_mutex_init(pthread_mutex_t *const mutex, void* mutexattr) { (void)mutexattr; +#if _WIN32_WINNT >= 0x0600 // Windows Vista / 
Server 2008 or greater + InitializeCriticalSectionEx(mutex, 0 /*dwSpinCount*/, 0 /*Flags*/); +#else InitializeCriticalSection(mutex); +#endif return 0; } @@ -98,15 +130,22 @@ static INLINE int pthread_mutex_destroy(pthread_mutex_t *const mutex) { // Condition static INLINE int pthread_cond_destroy(pthread_cond_t *const condition) { int ok = 1; +#ifdef USE_WINDOWS_CONDITION_VARIABLE + (void)condition; +#else ok &= (CloseHandle(condition->waiting_sem_) != 0); ok &= (CloseHandle(condition->received_sem_) != 0); ok &= (CloseHandle(condition->signal_event_) != 0); +#endif return !ok; } static INLINE int pthread_cond_init(pthread_cond_t *const condition, void* cond_attr) { (void)cond_attr; +#ifdef USE_WINDOWS_CONDITION_VARIABLE + InitializeConditionVariable(condition); +#else condition->waiting_sem_ = CreateSemaphore(NULL, 0, MAX_DECODE_THREADS, NULL); condition->received_sem_ = CreateSemaphore(NULL, 0, MAX_DECODE_THREADS, NULL); condition->signal_event_ = CreateEvent(NULL, FALSE, FALSE, NULL); @@ -116,11 +155,15 @@ static INLINE int pthread_cond_init(pthread_cond_t *const condition, pthread_cond_destroy(condition); return 1; } +#endif return 0; } static INLINE int pthread_cond_signal(pthread_cond_t *const condition) { int ok = 1; +#ifdef USE_WINDOWS_CONDITION_VARIABLE + WakeConditionVariable(condition); +#else if (WaitForSingleObject(condition->waiting_sem_, 0) == WAIT_OBJECT_0) { // a thread is waiting in pthread_cond_wait: allow it to be notified ok = SetEvent(condition->signal_event_); @@ -129,12 +172,16 @@ static INLINE int pthread_cond_signal(pthread_cond_t *const condition) { ok &= (WaitForSingleObject(condition->received_sem_, INFINITE) != WAIT_OBJECT_0); } +#endif return !ok; } static INLINE int pthread_cond_wait(pthread_cond_t *const condition, pthread_mutex_t *const mutex) { int ok; +#ifdef USE_WINDOWS_CONDITION_VARIABLE + ok = SleepConditionVariableCS(condition, mutex, INFINITE); +#else // note that there is a consumer available so the signal isn't dropped in // pthread_cond_signal if (!ReleaseSemaphore(condition->waiting_sem_, 1, NULL)) @@ -145,6 +192,7 @@ static INLINE int pthread_cond_wait(pthread_cond_t *const condition, WAIT_OBJECT_0); ok &= ReleaseSemaphore(condition->received_sem_, 1, NULL); pthread_mutex_lock(mutex); +#endif return !ok; } #elif defined(__OS2__)
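
Note on the restructured SUBPIX_VFILTER / SUBPIX_VFILTER16 macros above: they now carry two code paths. The original narrow-register loop is kept on 32-bit x86 or when X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON is defined; otherwise an x86-64 path keeps all eight source rows live in the extra xmm registers (hence NUM_GENERAL_REG_USED 9 and the larger xmm counts passed to cglobal) and writes two output rows per iteration, with a trailing single-row tail when the height is odd. Both paths compute the same 8-tap vertical convolution. A minimal scalar sketch of one output pixel, assuming the usual krd = 64 rounding and packuswb clamp (the helper name is illustrative, not a libvpx function):

  /* Scalar sketch of one output pixel of filter_block1dW_v8.
   * Rounding and clamping mirror the asm: paddsw krd (64), psraw 7, packuswb. */
  #include <stddef.h>
  #include <stdint.h>

  static uint8_t vfilter8_pixel(const uint8_t *src, ptrdiff_t stride,
                                const int8_t filter[8]) {
    int sum = 0, k;
    for (k = 0; k < 8; ++k)
      sum += src[k * stride] * filter[k];  /* taps applied in pairs by pmaddubsw */
    sum = (sum + 64) >> 7;                 /* paddsw krd ; psraw 7 */
    if (sum < 0) sum = 0;                  /* packuswb saturation */
    if (sum > 255) sum = 255;
    return (uint8_t)sum;
  }

The asm accumulates in saturating 16-bit pairs, so a pathological tap set could clip earlier than this 32-bit model; for the sub-pel filters actually used the results should match. The v8_avg variants additionally pavgb the result with the existing destination pixel before storing.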
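In vpx_subpixel_bilinear_ssse3.asm the paddsw + psraw rounding pair is replaced by a single pmulhrsw, and the broadcast constant changes from 0x0400040 (two 16-bit lanes of 64) to 0x01000100 (two lanes of 256). pmulhrsw computes ((x * m) + 0x4000) >> 15 per lane, so with m = 256 it reduces to (x + 64) >> 7, the same add-64-and-shift the old sequence performed. A small self-check, assuming arithmetic right shift of signed values (true for the compilers and targets in question); the byte clamp stands in for packuswb:

  /* Self-check that pmulhrsw by 256 matches the old (x + 64) >> 7 rounding
   * once both are clamped to a byte as packuswb does.  Sketch only. */
  #include <assert.h>
  #include <stdint.h>

  static uint8_t clamp_u8(int32_t v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

  static uint8_t old_round(int16_t x) {          /* paddsw 64 ; psraw 7 */
    int32_t s = (int32_t)x + 64;
    if (s > 32767) s = 32767;                    /* paddsw saturates */
    return clamp_u8(s >> 7);
  }

  static uint8_t new_round(int16_t x) {          /* pmulhrsw against 0x0100 */
    return clamp_u8(((int32_t)x * 256 + (1 << 14)) >> 15);
  }

  int main(void) {
    int x;
    for (x = -32768; x <= 32767; ++x)
      assert(old_round((int16_t)x) == new_round((int16_t)x));
    return 0;
  }

This is why the rounding constant register also shrinks from a 64-bit movq to a 32-bit movd load: only one dword needs to be broadcast.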
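The vpx_thread.h changes keep the pthread-style interface unchanged and only swap the backend: native CONDITION_VARIABLE and InitializeCriticalSectionEx when _WIN32_WINNT is 0x0600 (Vista / Server 2008) or newer, WaitForSingleObjectEx from 0x0501 (XP) up, and CreateThread instead of _beginthreadex on non-desktop (WinRT/UWP) partitions. Callers are unaffected by which branch is compiled in. A hedged usage sketch of the emulated API; worker(), job_ready and the locals below are illustrative, not libvpx code:

  #include "vpx_util/vpx_thread.h"

  static pthread_mutex_t lock;
  static pthread_cond_t cond;
  static int job_ready = 0;

  static THREADFN worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!job_ready)                    /* same call on every backend */
      pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    return THREAD_RETURN(NULL);
  }

  int main(void) {
    pthread_t tid;
    pthread_mutex_init(&lock, NULL);
    pthread_cond_init(&cond, NULL);
    pthread_create(&tid, NULL, worker, NULL);

    pthread_mutex_lock(&lock);
    job_ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&lock);
    return 0;
  }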