granicus.if.org Git - libjpeg-turbo/commit

author	DRC <information@libjpeg-turbo.org>
	Mon, 19 Feb 2018 20:16:46 +0000 (14:16 -0600)
committer	DRC <information@libjpeg-turbo.org>
	Tue, 27 Feb 2018 16:40:32 +0000 (10:40 -0600)
commit	dc89ee09f12c5541f391fc89db8458db6cae0950
tree	a328159773ffc0427aa4238c1d2b56213f123408
parent	9cd4a15c8a4b716bfa31f889455197e5bffbe7ab

64-bit AVX2 implementation of fast integer FDCT

This is still not faster than the SSE2 implementation.  Improving upon
SSE2 performance will probably require restructuring the algorithm to
combine the various multiply/add operations, but I'm not sure how to do
that without introducing further roundoff error.  Left as an exercise
for the reader.
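
For context, the multiply/add structure in question can be sketched in
scalar C.  This is only an illustration modeled on the AAN-style 8-point
pass used by the C version of the fast integer FDCT, not the AVX2 code
from this commit.  The names fix_mul and fdct_ifast_row are hypothetical,
and the 8-bit constant scaling is an assumption; the SIMD code may scale
its constants differently.

```c
/* Scalar sketch of one 8-point pass of an AAN-style fast integer FDCT.
 * Assumption: constants carry 8 fractional bits. */
#define CONST_BITS       8
#define FIX_0_382683433  98   /* round(0.382683433 * 256) */
#define FIX_0_541196100  139  /* round(0.541196100 * 256) */
#define FIX_0_707106781  181  /* round(0.707106781 * 256) */
#define FIX_1_306562965  334  /* round(1.306562965 * 256) */

/* Fixed-point multiply that truncates rather than rounds.  Each call
 * discards up to one unit of precision, which is the roundoff at issue.
 * Note: >> on a negative int is implementation-defined in C; this
 * sketch assumes an arithmetic shift. */
static int fix_mul(int var, int c)
{
  return (var * c) >> CONST_BITS;
}

/* Transform one row of 8 samples in place (hypothetical helper). */
static void fdct_ifast_row(int d[8])
{
  int tmp0 = d[0] + d[7], tmp7 = d[0] - d[7];
  int tmp1 = d[1] + d[6], tmp6 = d[1] - d[6];
  int tmp2 = d[2] + d[5], tmp5 = d[2] - d[5];
  int tmp3 = d[3] + d[4], tmp4 = d[3] - d[4];
  int tmp10, tmp11, tmp12, tmp13, z1, z2, z3, z4, z5, z11, z13;

  /* Even part: one multiply */
  tmp10 = tmp0 + tmp3;
  tmp13 = tmp0 - tmp3;
  tmp11 = tmp1 + tmp2;
  tmp12 = tmp1 - tmp2;

  d[0] = tmp10 + tmp11;
  d[4] = tmp10 - tmp11;

  z1 = fix_mul(tmp12 + tmp13, FIX_0_707106781);
  d[2] = tmp13 + z1;
  d[6] = tmp13 - z1;

  /* Odd part: four multiplies, each followed by dependent adds */
  tmp10 = tmp4 + tmp5;
  tmp11 = tmp5 + tmp6;
  tmp12 = tmp6 + tmp7;

  z5 = fix_mul(tmp10 - tmp12, FIX_0_382683433);
  z2 = fix_mul(tmp10, FIX_0_541196100) + z5;
  z4 = fix_mul(tmp12, FIX_1_306562965) + z5;
  z3 = fix_mul(tmp11, FIX_0_707106781);

  z11 = tmp7 + z3;
  z13 = tmp7 - z3;

  d[5] = z13 + z2;
  d[3] = z13 - z2;
  d[1] = z11 + z4;
  d[7] = z11 - z4;
}
```

Each pass performs only five multiplies, and every product is truncated
before it feeds the adds that follow.  Fusing a multiply with its
neighboring adds would change where that truncation happens, which is
why such a restructuring risks altering the roundoff behavior.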

The IFAST FDCT algorithm is sort of a legacy feature anyhow.  Even with
SSE2 instructions, the ISLOW FDCT was almost as fast as the IFAST FDCT.
Since the ISLOW FDCT has been accelerated with AVX2 instructions, it is
now about the same speed as the IFAST FDCT on AVX2-equipped CPUs.