From: Yi Luo
Date: Wed, 23 Mar 2016 18:30:39 +0000 (+0000)
Subject: Merge "Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode - Setup function vp10_highb...
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=deb33056d127f0c178f3415caa7032497ba14f61;p=libvpx

Merge "Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode

- Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1 intrinsics optimization.
- Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(), and fdct4x4_sse4_1().
- Used logic right shift to avoid coeff memory write/read.
- Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
- Improved overall encoding performance >2.3% for 50 frames sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12, --bit-depth=12, 50 frames.
- Unit test passed." into nextgenv2
---

deb33056d127f0c178f3415caa7032497ba14f61
diff --cc vp10/encoder/hybrid_fwd_txfm.c
index 785fef088,58eb7483a..e9dd70be6
--- a/vp10/encoder/hybrid_fwd_txfm.c
+++ b/vp10/encoder/hybrid_fwd_txfm.c
@@@ -213,14 -227,17 +215,14 @@@ void vp10_highbd_fwd_txfm_4x4(const int
      case FLIPADST_FLIPADST:
      case ADST_FLIPADST:
      case FLIPADST_ADST:
 -      vp10_highbd_fht4x4(src_diff, coeff, diff_stride, tx_type);
 +      vp10_highbd_fht4x4_c(src_diff, coeff, diff_stride, tx_type);
        break;
 -    case DST_DST:
 -    case DCT_DST:
 -    case DST_DCT:
 -    case DST_ADST:
 -    case ADST_DST:
 -    case DST_FLIPADST:
 -    case FLIPADST_DST:
 -    case H_DCT:
      case V_DCT:
 +    case H_DCT:
 +    case V_ADST:
 +    case H_ADST:
 +    case V_FLIPADST:
 +    case H_FLIPADST:
        vp10_highbd_fht4x4_c(src_diff, coeff, diff_stride, tx_type);
        break;
      case IDTX: