granicus.if.org Git - libvpx/commit
change vp10_fwd_txfm2d_#x#_sse2 to vp10_fwd_txfm2d_#x#_sse4_1
author Angie Chiang <angiebird@google.com>
Thu, 24 Mar 2016 22:34:27 +0000 (15:34 -0700)
committer Angie Chiang <angiebird@google.com>
Wed, 30 Mar 2016 22:25:26 +0000 (15:25 -0700)
commit 25520d8dc3a0c295115a4f22a0965b7561dcbefa
tree a4e93c20dfdc008b46207dbb73318bd2f395deb8
parent c75f64780b714ff5a7946d14a89983e4ab59c78f

Timings for running each function 20k times are as follows:

Note that the vp10_highbd_fdct#x#_sse2 versions are
16-bit implementations plus a range check.

The rest are 32-bit implementations.
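
The difference between a 16-bit path with a range check and a 32-bit path can be sketched as below. This is an illustration only, not the libvpx code: the commit does not describe how the range check is done, so the sketch assumes it means detecting intermediate values that no longer fit in a signed 16-bit lane. The helper names fits_in_int16, butterfly16_checked and butterfly32 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: 1 if a 32-bit intermediate still fits in a signed
 * 16-bit lane, 0 otherwise. */
static int fits_in_int16(int32_t v) {
  return v >= INT16_MIN && v <= INT16_MAX;
}

/* Hypothetical 16-bit path: one butterfly stage (sum and difference).
 * The arithmetic is done in 32 bits so the overflow can be detected; the
 * function reports failure instead of storing a wrapped 16-bit result. */
static int butterfly16_checked(int16_t a, int16_t b,
                               int16_t *sum, int16_t *diff) {
  const int32_t s = (int32_t)a + b;
  const int32_t d = (int32_t)a - b;
  if (!fits_in_int16(s) || !fits_in_int16(d)) return 0; /* range check failed */
  *sum = (int16_t)s;
  *diff = (int16_t)d;
  return 1;
}

/* Hypothetical 32-bit path: the same stage with 32-bit storage. For inputs
 * in the 16-bit range the results always fit, so no check is needed here. */
static void butterfly32(int32_t a, int32_t b, int32_t *sum, int32_t *diff) {
  *sum = a + b;
  *diff = a - b;
}
```

With inputs 30000 and 30000, butterfly16_checked reports failure because the sum 60000 exceeds INT16_MAX, while butterfly32 stores it directly. The trade-off in this sketch is that 16-bit lanes hold twice as many values per SIMD register but need the extra check.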

vp10_fwd_txfm2d_4x4_c (2 ms)
vp10_fwd_txfm2d_8x8_c (9 ms)
vp10_fwd_txfm2d_16x16_c (45 ms)
vp10_fwd_txfm2d_32x32_c (233 ms)

vp10_fwd_txfm2d_4x4_sse4_1 (2 ms)
vp10_fwd_txfm2d_8x8_sse4_1 (3 ms)
vp10_fwd_txfm2d_16x16_sse4_1 (16 ms)
vp10_fwd_txfm2d_32x32_sse4_1 (80 ms)

vp10_highbd_fdct4x4_c (1 ms)
vp10_highbd_fdct8x8_c (3 ms)
vp10_highbd_fdct16x16_c (17 ms)
highbd_fdct32x32_c (160 ms)

vp10_highbd_fdct4x4_sse2 (0 ms)
vp10_highbd_fdct8x8_sse2 (2 ms)
vp10_highbd_fdct16x16_sse2 (8 ms)
highbd_fdct32x32_sse2 (105 ms)
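
A minimal sketch of how numbers like the ones above could be collected and compared. The commit does not say how the timings were measured, so everything here is an assumption: the harness uses the standard C clock() call, and bench_fn, dummy_transform, time_runs_ms and speedup are hypothetical names, with dummy_transform standing in for a real transform call.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for the function under test. */
typedef void (*bench_fn)(void);

/* Stand-in workload; a real benchmark would call a transform here. */
static volatile int sink;
static void dummy_transform(void) { sink += 1; }

/* Runs fn `runs` times and returns the elapsed CPU time in milliseconds. */
static double time_runs_ms(bench_fn fn, int runs) {
  assert(fn != NULL);
  const clock_t start = clock();
  for (int i = 0; i < runs; ++i) fn();
  const clock_t end = clock();
  return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}

/* Speedup of an optimized version relative to a baseline. */
static double speedup(double baseline_ms, double optimized_ms) {
  return baseline_ms / optimized_ms;
}
```

For example, time_runs_ms(dummy_transform, 20000) times 20k calls, and speedup(233.0, 80.0) applied to the 32x32 figures above gives about 2.9 for vp10_fwd_txfm2d_32x32_sse4_1 over vp10_fwd_txfm2d_32x32_c. The reported values are whole milliseconds, so ratios for the smallest sizes (for example 2 ms against 2 ms at 4x4) carry little information.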

Change-Id: I24daf1e0d4d66e91e4ce61ef71cefa7b70ee90ce
test/test.mk
test/vp10_fwd_txfm2d_sse4_test.cc [moved from test/vp10_fwd_txfm2d_sse2_test.cc with 76% similarity]
vp10/common/vp10_rtcd_defs.pl
vp10/common/x86/vp10_fwd_txfm1d_sse4.c [moved from vp10/common/x86/vp10_fwd_txfm1d_sse2.c with 74% similarity]
vp10/common/x86/vp10_fwd_txfm2d_sse4.c [moved from vp10/common/x86/vp10_fwd_txfm2d_sse2.c with 68% similarity]
vp10/common/x86/vp10_txfm1d_sse2.h [deleted file]
vp10/common/x86/vp10_txfm1d_sse4.h [new file with mode: 0644]
vp10/vp10_common.mk