From: DRC
Date: Wed, 13 Jun 2012 01:23:09 +0000 (+0000)
Subject: Eliminate the use of the MASKMOVDQU instruction, to speed up decompression performanc...
X-Git-Tag: 1.2.90~58
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=66fe68b0b28ec18000f6a863bb9ad59c56ce81d7;p=libjpeg-turbo

Eliminate the use of the MASKMOVDQU instruction, to speed up decompression performance by 10x on AMD Bobcat embedded processors (and ~5% on AMD desktop processors.)

git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@836 632fc199-4ca6-4c93-a231-07263d6284db
---
66fe68b0b28ec18000f6a863bb9ad59c56ce81d7
diff --cc ChangeLog.txt
index d4808e8,e80ac6c..5453869
--- a/ChangeLog.txt
+++ b/ChangeLog.txt
@@@ -34,9 -19,13 +34,16 @@@ calling conventions
  images (specifically, images in which the component count was erroneously set
  to a large value) would cause libjpeg-turbo to segfault.
  
 -[5] Worked around a severe performance issue with "Bobcat" (AMD Embedded APU)
 +[8] Extended the TurboJPEG Java API so that it can be used to decompress a
 +JPEG image into an arbitrary position in a large output buffer.
 +
++[9] Worked around a severe performance issue with "Bobcat" (AMD Embedded APU)
+ processors. The MASKMOVDQU instruction, which was used by the libjpeg-turbo
+ SSE2 SIMD code, is apparently implemented in microcode on AMD processors, and
+ it is painfully slow on Bobcat processors in particular. Eliminating the use
+ of this instruction improved performance by an order of magnitude on Bobcat
+ processors and by a small amount (typically 5%) on AMD desktop processors.
+ 
  1.2.0
  =====
  
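The ChangeLog entry does not say what replaced MASKMOVDQU, and this commit only touches ChangeLog.txt. As a hedged illustration, the C sketch below (assuming an x86-64 target with SSE2) contrasts two ways of writing only the low `n` bytes of a 16-byte vector, for example a partial final chunk of an output row. The first is a byte-masked store through the `_mm_maskmoveu_si128` intrinsic, which compiles to MASKMOVDQU. The second is a cascade of ordinary 8-, 4-, 2- and 1-byte stores that never issues that instruction. The helper names `store_tail_maskmov` and `store_tail_cascade` are hypothetical, and the cascade is one plausible workaround, not necessarily the code libjpeg-turbo adopted.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics (assumes an x86/x86-64 target) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write the low n bytes of v to dst with one
 * byte-masked store. _mm_maskmoveu_si128 compiles to MASKMOVDQU, which
 * writes only the bytes whose mask byte has its high bit set. */
static void store_tail_maskmov(uint8_t *dst, __m128i v, size_t n)
{
    assert(n <= 16);
    const __m128i lane = _mm_setr_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                       8, 9, 10, 11, 12, 13, 14, 15);
    /* 0xFF in lanes 0..n-1, 0x00 elsewhere (signed compare is safe: all
     * values are in 0..16). */
    __m128i mask = _mm_cmplt_epi8(lane, _mm_set1_epi8((char)n));
    _mm_maskmoveu_si128(v, mask, (char *)dst);
}

/* Hypothetical helper: same result using only plain stores. Each step
 * writes the lowest bytes of v, then shifts the remaining bytes down. */
static void store_tail_cascade(uint8_t *dst, __m128i v, size_t n)
{
    assert(n <= 16);
    if (n == 16) {
        _mm_storeu_si128((__m128i *)dst, v);   /* full 16-byte store */
        return;
    }
    if (n >= 8) {
        _mm_storel_epi64((__m128i *)dst, v);   /* MOVQ: low 8 bytes */
        dst += 8;
        n -= 8;
        v = _mm_srli_si128(v, 8);
    }
    if (n >= 4) {
        uint32_t d = (uint32_t)_mm_cvtsi128_si32(v); /* MOVD: low 4 bytes */
        memcpy(dst, &d, 4);                    /* x86 is little-endian */
        dst += 4;
        n -= 4;
        v = _mm_srli_si128(v, 4);
    }
    /* At most 3 bytes remain; finish with scalar 2- and 1-byte stores. */
    uint32_t rest = (uint32_t)_mm_cvtsi128_si32(v);
    if (n >= 2) {
        uint16_t w = (uint16_t)rest;
        memcpy(dst, &w, 2);
        dst += 2;
        n -= 2;
        rest >>= 16;
    }
    if (n >= 1)
        *dst = (uint8_t)rest;
}
```

Both helpers leave bytes at and beyond `dst[n]` untouched. The cascade trades one masked instruction for a few branches and plain stores; on processors where MASKMOVDQU is microcoded, as the entry reports for AMD parts, that trade can pay off, though the actual gain depends on the CPU.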