Rework idct8x8_10 SSE2 implementation
author Jingning Han <jingning@google.com>
Thu, 2 Jan 2014 23:33:38 +0000 (15:33 -0800)
committer Jingning Han <jingning@google.com>
Fri, 3 Jan 2014 20:04:09 +0000 (12:04 -0800)
commit 1bb11781e2be1a047e5f38a5cd3d6f5c8d0a107b
tree 18e9782816ca8d66db9201c65d53012ae05c75d5
parent 3bcece9578a9d5a536cdb3d11cb897f25fdaeea4

This commit optimizes the SSE2 implementation of idct8x8_10. It exploits
the fact that only the top-left 4x4 block contains non-zero coefficients,
and hence reduces the number of instructions needed.
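
The idea can be illustrated with a scalar sketch. This is not the SSE2
code from the commit: basis() is a hypothetical integer stand-in for the
real transform constants, and the function names and the N/NZ macros are
invented for the example. It only shows why a separable 8x8 transform can
skip work when everything outside the top-left 4x4 corner is zero: the row
pass needs only 4 rows and 4 inputs per row, and the column pass needs only
4 inputs per column.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 8  /* transform size */
#define NZ 4 /* extent of the non-zero top-left corner (assumed) */

/* Hypothetical integer basis, standing in for the real 1-D transform. */
static int32_t basis(int k, int j) {
  return (((k + 1) * (2 * j + 1)) % 7) - 3;
}

/* Reference: full separable transform, row pass then column pass. */
static void transform_full(int32_t in[N][N], int32_t out[N][N]) {
  int32_t tmp[N][N];
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j) {
      int32_t sum = 0;
      for (int k = 0; k < N; ++k) sum += in[i][k] * basis(k, j);
      tmp[i][j] = sum;
    }
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j) {
      int32_t sum = 0;
      for (int k = 0; k < N; ++k) sum += basis(k, i) * tmp[k][j];
      out[i][j] = sum;
    }
}

/* Reduced: valid only when in[i][j] == 0 for i >= NZ or j >= NZ. */
static void transform_top_left_only(int32_t in[N][N], int32_t out[N][N]) {
  int32_t tmp[NZ][N];
  /* Check the precondition in debug builds. */
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      assert((i < NZ && j < NZ) || in[i][j] == 0);
  /* Row pass: rows NZ..N-1 are all zero, so only NZ rows are computed,
   * and each row has only NZ non-zero inputs. */
  for (int i = 0; i < NZ; ++i)
    for (int j = 0; j < N; ++j) {
      int32_t sum = 0;
      for (int k = 0; k < NZ; ++k) sum += in[i][k] * basis(k, j);
      tmp[i][j] = sum;
    }
  /* Column pass: the intermediate has only NZ non-zero rows. */
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j) {
      int32_t sum = 0;
      for (int k = 0; k < NZ; ++k) sum += basis(k, i) * tmp[k][j];
      out[i][j] = sum;
    }
}

/* Returns 1 if both versions agree on a sparse test block. */
static int reduced_matches_full(void) {
  int32_t in[N][N], a[N][N], b[N][N];
  memset(in, 0, sizeof(in));
  for (int i = 0; i < NZ; ++i)
    for (int j = 0; j < NZ; ++j) in[i][j] = 10 * i - 3 * j + 1;
  transform_full(in, a);
  transform_top_left_only(in, b);
  return memcmp(a, b, sizeof(a)) == 0;
}
```

In this sketch the full version performs 2 * 8 * 8 * 8 = 1024
multiplications, while the reduced version performs 4 * 8 * 4 + 8 * 8 * 4
= 384. The actual saving in the SSE2 code depends on how the butterflies
and transposes are vectorized, which this scalar example does not model.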

The runtime of idct8x8_10_sse2 goes down from 216 to 198 CPU cycles,
estimated by averaging over 100000 runs. For 300 frames of
pedestrian_area_1080p coded at 4000 kbps, the average decoding speed
goes up from 79.3 fps to 79.7 fps.

Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180
vp9/common/x86/vp9_idct_intrin_sse2.c