DRC [Tue, 27 Jan 2015 20:59:16 +0000 (20:59 +0000)]
Oops. Need to set the alpha channel when using TYPE_4BYTE_ABGR*. This has no bearing on the actual tests, but it prevents the PNG pre-encode reference images for those tests from being blank.
DRC [Wed, 21 Jan 2015 17:42:28 +0000 (17:42 +0000)]
Oops. The MIPS SIMD implementations of h2v1 and h2v2 upsampling were not checking for DSPr2 support, so running 'djpeg -nosmooth' on a non-DSPr2-enabled platform caused an "illegal instruction" error.
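The underlying rule is the usual one for run-time SIMD dispatch: an optimized routine may only be selected after the CPU feature check has succeeded. A minimal C sketch of that pattern follows. `select_h2v1_upsample()`, the `cpu_has_dspr2` flag, and both routine names are hypothetical, and the "DSPr2" routine is a portable stand-in rather than real MIPS code.

```c
/* Signature shared by all h2v1 "plain" (-nosmooth) upsampling variants. */
typedef void (*upsample_fn)(const unsigned char *in, unsigned char *out,
                            int in_width);

/* Portable C version: each input sample is duplicated horizontally. */
static void h2v1_upsample_c(const unsigned char *in, unsigned char *out,
                            int in_width)
{
  for (int i = 0; i < in_width; i++) {
    out[2 * i] = in[i];
    out[2 * i + 1] = in[i];
  }
}

/* Hypothetical stand-in for a DSPr2-accelerated version.  Real code would
   use DSPr2 instructions here, which raise "illegal instruction" on CPUs
   that lack the extension. */
static void h2v1_upsample_dspr2(const unsigned char *in, unsigned char *out,
                                int in_width)
{
  h2v1_upsample_c(in, out, in_width);
}

/* Pick the implementation from a CPU feature flag obtained at run time
   (how the flag is detected is outside the scope of this sketch). */
static upsample_fn select_h2v1_upsample(int cpu_has_dspr2)
{
  return cpu_has_dspr2 ? h2v1_upsample_dspr2 : h2v1_upsample_c;
}
```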
DRC [Tue, 20 Jan 2015 10:33:32 +0000 (10:33 +0000)]
Introduce fast paths to speed up NULL color conversion somewhat, particularly when using 64-bit code. On the decompression side, the "slow path" now also uses an approach similar to that of the compression side (with the component loop outside of the column loop rather than inside), which is faster when using 32-bit code.
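On the decompression side, NULL color conversion just interleaves the separate component planes into one output row. A minimal sketch of both loop structures described above, assuming 8-bit samples. The function name and the choice of a 3-component fast path are illustrative, not the library's actual code.

```c
/* Interleave nc planar component rows into one packed output row. */
static void null_convert_interleave(const unsigned char *const *planes,
                                    int nc, unsigned char *out,
                                    int num_cols)
{
  if (nc == 3) {
    /* Fast path: component loop fully unrolled for the 3-component case. */
    const unsigned char *p0 = planes[0], *p1 = planes[1], *p2 = planes[2];
    for (int col = 0; col < num_cols; col++) {
      out[0] = p0[col];
      out[1] = p1[col];
      out[2] = p2[col];
      out += 3;
    }
    return;
  }
  /* General ("slow") path: component loop outside, column loop inside. */
  for (int ci = 0; ci < nc; ci++) {
    const unsigned char *in = planes[ci];
    unsigned char *o = out + ci;
    for (int col = 0; col < num_cols; col++) {
      *o = in[col];
      o += nc;
    }
  }
}
```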
DRC [Fri, 16 Jan 2015 06:53:36 +0000 (06:53 +0000)]
Make the floating point regression tests optional. It has been known for quite some time that these tests do not always generate the same results unless there is full SIMD coverage of the floating point algorithms in libjpeg-turbo. Further research reveals that there are basically three expected results: the results from our SSE SIMD extensions (which are slightly more accurate than the C code), results from the C code when running on a 32-bit FPU (or when using SSE instructions on an x86-64 CPU, which is the default with GCC), and results from the C code when running on a 64-bit FPU (which presumably uses double-precision arithmetic by default.) There is basically no way to determine which type of math will be used prior to run time, so it's best to just let the developers specify which result they expect on their particular system.
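Precision alone is enough to change the result. In the toy accumulation below (not the library's floating point DCT), adding `0.1f` ten times with every partial sum rounded to single precision ends somewhere different from accumulating in double precision and rounding once at the end. The `volatile` qualifiers only force each partial sum to be stored at its declared precision.

```c
/* Accumulate 0.1f ten times, rounding every partial sum to float. */
static float sum_tenths_single(void)
{
  volatile float acc = 0.0f;
  for (int i = 0; i < 10; i++)
    acc = acc + 0.1f;
  return acc;
}

/* Same inputs, but the partial sums are kept in double precision. */
static double sum_tenths_double(void)
{
  volatile double acc = 0.0;
  for (int i = 0; i < 10; i++)
    acc = acc + 0.1f;
  return acc;
}
```

Under IEEE 754 rounding, the single-precision loop ends at 1.0000001f, while the double-precision total rounds to exactly 1.0f. A scaled and rounded DCT coefficient can end up off by one in the same way.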
DRC [Fri, 16 Jan 2015 06:45:54 +0000 (06:45 +0000)]
In the process of developing the AltiVec extensions, it was discovered that the normal regression tests aren't sufficient to test the behavior of the library with very small image sizes and when compressing from/decompressing to a subregion of a larger image buffer. Thus, an additional regression test was added that takes advantage of the tiled compression/decompression feature in tjbench. This is being back-ported to the 1.4.x branch primarily to verify that there are no lingering issues in the existing SIMD extensions.
DRC [Fri, 16 Jan 2015 06:37:03 +0000 (06:37 +0000)]
Add separate pseudo-targets for the TurboJPEG and libjpeg regression tests, for those times when you just don't want to sit through 11 iterations of TJUnitTest to find out that your algorithm is broken.
DRC [Fri, 16 Jan 2015 06:29:52 +0000 (06:29 +0000)]
Add the ability to benchmark YCCK JPEG compression/decompression. This is particularly useful since that is the only way to test the performance of the "plain" upsampling routines, which are accelerated on some platforms.
DRC [Fri, 16 Jan 2015 03:13:16 +0000 (03:13 +0000)]
Revert r1506 (we actually are generating columns with the IDCT, so the naming makes sense in retrospect), and further de-confusify the variable names in the forward DCT.
DRC [Thu, 15 Jan 2015 08:51:31 +0000 (08:51 +0000)]
De-confusify the variable names a bit -- "out" represents the output of the IDCT kernel, so use "final" to represent the packed data that will be stored to memory.
DRC [Wed, 14 Jan 2015 19:39:01 +0000 (19:39 +0000)]
Add the ability to benchmark YCCK JPEG compression/decompression. This is particularly useful since that is the only way to test the performance of the "plain" upsampling routines, which are accelerated on some platforms.
DRC [Wed, 14 Jan 2015 10:45:31 +0000 (10:45 +0000)]
Fix bugs in the AltiVec fancy upsampling routines uncovered during additional testing with small image sizes. Since the input width is half the output width, the upsampler should only write a second 16-byte chunk if there are more than 8 input columns left. Additionally, if the width is < 16, then we need to insert a dummy sample (the SSE2 code does this as well, but I neglected to port that portion of the code for some reason.)
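The store-bound fix can be checked with a scalar model of the vector loop. The sketch below makes two simplifying assumptions: each iteration consumes up to 16 input columns and emits the corresponding 32 output bytes as two 16-byte stores, and the output row is padded only to the next multiple of 16 bytes. The dummy-sample handling for widths below 16 is not modeled, and both function names are hypothetical.

```c
/* Output bytes written by a 16-column-per-iteration h2v1 upsampling loop
   for an input row of in_width columns.  When guard_second_store is
   nonzero, the second 16-byte store (output for input columns 8-15 of the
   current iteration) is skipped unless more than 8 input columns remain. */
static int upsample_bytes_written(int in_width, int guard_second_store)
{
  int written = 0;
  for (int remaining = in_width; remaining > 0; remaining -= 16) {
    written += 16;                          /* output for columns 0-7 */
    if (!guard_second_store || remaining > 8)
      written += 16;                        /* output for columns 8-15 */
  }
  return written;
}

/* Output row size rounded up to a multiple of 16 bytes (assumed padding). */
static int padded_output_size(int in_width)
{
  return (2 * in_width + 15) / 16 * 16;
}
```

With the guard, the bytes written always equal the padded row size. Without it, an 8-column input row writes 32 bytes into a 16-byte output row.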
DRC [Wed, 14 Jan 2015 08:42:29 +0000 (08:42 +0000)]
In the process of developing the AltiVec extensions, it was discovered that the normal regression tests aren't sufficient to test the behavior of the library with very small image sizes and when compressing from/decompressing to a subregion of a larger image buffer. Thus, an additional regression test was added that takes advantage of the tiled compression/decompression feature in tjbench.
DRC [Wed, 14 Jan 2015 08:31:54 +0000 (08:31 +0000)]
Fix a bug in the AltiVec downsampling routines uncovered during additional testing with small image sizes. Since the output width is half the input width, the downsampler should only read a second 16-byte chunk if there are more than 8 output columns left.
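For comparison, the sketch below pairs a scalar reference for h2v1 downsampling with the same kind of load-count model applied to the input side. The reference averages each horizontal pair of input samples with a rounding bias that alternates between 0 and 1, as libjpeg's C code does. The 16-columns-per-iteration loop shape is an illustrative assumption, not taken from the AltiVec source, and the function names are hypothetical.

```c
/* Scalar reference: out[i] = average of in[2i] and in[2i+1], with the
   rounding bias alternating 0, 1, 0, 1, ... so that rounding does not
   consistently go in one direction. */
static void h2v1_downsample_ref(const unsigned char *in, unsigned char *out,
                                int out_width)
{
  int bias = 0;
  for (int i = 0; i < out_width; i++) {
    out[i] = (unsigned char)((in[2 * i] + in[2 * i + 1] + bias) >> 1);
    bias ^= 1;
  }
}

/* Input bytes read by a loop that produces up to 16 output columns per
   iteration from two 16-byte loads, skipping the second load unless more
   than 8 output columns remain. */
static int downsample_bytes_read(int out_width)
{
  int read = 0;
  for (int remaining = out_width; remaining > 0; remaining -= 16) {
    read += 16;                 /* input for output columns 0-7 */
    if (remaining > 8)
      read += 16;               /* input for output columns 8-15 */
  }
  return read;
}
```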
DRC [Sun, 11 Jan 2015 06:34:47 +0000 (06:34 +0000)]
Use intrinsics for loading aligned data in the IDCT functions. This has no effect on performance, but it makes it more obvious what that code is doing.
DRC [Sat, 10 Jan 2015 11:32:36 +0000 (11:32 +0000)]
Overhaul the AltiVec vector loading code in the compression-side colorspace conversion routines. The existing code was sometimes overreading the source buffer (at least according to valgrind), and it was necessary to increase the complexity of the code in order to prevent this without significantly compromising performance.
DRC [Mon, 22 Dec 2014 16:04:17 +0000 (16:04 +0000)]
Use intrinsics for loading/storing data in the DCT/IDCT functions. This has no effect on the performance of the aligned loads/stores, but it makes it more obvious what that code is doing. Using intrinsics for the unaligned stores in the inverse DCT functions increases overall decompression performance by 1-2%.
DRC [Sat, 20 Dec 2014 01:16:26 +0000 (01:16 +0000)]
Use macros to allocate constants statically, rather than reading them from a table using vec_splat*(). This improves code readability and probably improves performance a bit as well.
DRC [Fri, 19 Dec 2014 15:36:39 +0000 (15:36 +0000)]
Modify the ARM64 assembly file so that it uses only syntax that the clang assembler in Xcode 5.x can understand. These changes should all be cosmetic in nature; they do not change the meaning or readability of the code, nor the ability to build it for Linux. In fact, the code now complies more closely with the ARM64 programming manual. In addition to these changes, there were a couple of instructions that clang simply doesn't support, so gas-preprocessor.pl was modified so that it now converts those into equivalent instructions that clang can handle.
DRC [Fri, 19 Dec 2014 10:44:09 +0000 (10:44 +0000)]
Add iOS architectures to the shared libraries generated by the Mac/iOS packaging system. I have no idea how useful this is for "standard" iOS application development, but it is useful in a jailbreak environment, and iOS 8 supposedly allows shared libs in "official" apps as well.
DRC [Thu, 18 Dec 2014 09:49:39 +0000 (09:49 +0000)]
Further cleanup of the AltiVec forward DCT code:
-- Use macros to represent the fast FDCT constants, to facilitate comparing the AltiVec implementation of the algorithm with the SSE2 implementation.
-- Rename slow FDCT constants for consistency.
-- Use vec_sra() in all cases in the slow FDCT code. The SSE2 implementation uses psraw, which is an arithmetic shift, so we need to do likewise with AltiVec. Using vec_sr() hasn't caused any problems yet, but it is conceivable that it might cause different behavior in certain corner cases.
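The two shifts differ only for negative values, which the slow FDCT's signed 16-bit intermediates can take. The portable C sketch below shows both behaviors on a single 16-bit lane. `sra16()` and `srl16()` are illustrative helpers, not AltiVec intrinsics: `vec_sra()` and `psraw` behave like `sra16()`, while `vec_sr()` behaves like `srl16()`.

```c
#include <stdint.h>

/* Arithmetic right shift of a 16-bit lane (sign bit replicated), written
   without relying on implementation-defined behavior for negative values.
   Valid for shift counts 1..15. */
static int16_t sra16(int16_t x, int n)
{
  return (int16_t)(x < 0 ? ~(~x >> n) : x >> n);
}

/* Logical right shift of a 16-bit lane (zeros shifted in).
   Valid for shift counts 1..15. */
static int16_t srl16(int16_t x, int n)
{
  return (int16_t)((uint16_t)x >> n);
}
```

For example, shifting -8 right by one gives -4 arithmetically but 32764 logically, so a logical shift silently turns a small negative coefficient into a large positive one.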
DRC [Wed, 17 Dec 2014 08:04:39 +0000 (08:04 +0000)]
AltiVec SIMD implementation of slow integer forward DCT; clean up fast integer forward DCT code so that it is easier to see how it derives from the SSE2 code and so that it plays more nicely with the slow FDCT code.
DRC [Tue, 25 Nov 2014 09:48:15 +0000 (09:48 +0000)]
Restore the JPP() and JMETHOD() macros. Even though libjpeg-turbo doesn't use them anymore, other software apparently does:
https://bugzilla.redhat.com/show_bug.cgi?id=1164815
https://bugs.kde.org/show_bug.cgi?id=340944
https://bugzilla.mozilla.org/show_bug.cgi?id=1093615
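Code that still relies on these macros typically declares prototypes and method pointers like the sketch below, and it fails to compile when the macros are missing. The macro bodies shown are the conventional prototype-enabled forms from libjpeg's headers. The structure and functions are hypothetical stand-ins for such third-party code.

```c
/* Conventional prototype-enabled forms of the two libjpeg macros. */
#define JPP(arglist) arglist
#define JMETHOD(type, methodname, arglist) type (*methodname) arglist

/* Hypothetical third-party code written against the old macros. */
struct my_source_mgr {
  JMETHOD(void, init_source, (int width, int height));
  JMETHOD(int, fill_buffer, (unsigned char *buf, int len));
};

static int last_width, last_height;

/* Old-style forward declarations: JPP((...)) expands to the argument list. */
static void my_init_source JPP((int width, int height));
static int my_fill_buffer JPP((unsigned char *buf, int len));

static void my_init_source(int width, int height)
{
  last_width = width;
  last_height = height;
}

static int my_fill_buffer(unsigned char *buf, int len)
{
  for (int i = 0; i < len; i++)
    buf[i] = 0xFF;                /* fill the buffer with 0xFF bytes */
  return len;
}
```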
DRC [Sat, 22 Nov 2014 22:09:30 +0000 (22:09 +0000)]
Fix Huffman local buffer overrun discovered by Debian developers when attempting to transform a junk image using ImageMagick:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768369
DRC [Wed, 19 Nov 2014 00:55:28 +0000 (00:55 +0000)]
Sometimes the sampling factors in grayscale images can be > 1 (for instance, if compressing using 'cjpeg -sample 2x2 -grayscale'.) Technically, sampling factors have no meaning with grayscale JPEGs, and the libjpeg decompressor ignores them in that case. Thus, the TurboJPEG decompressor should ignore them as well.
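The fix amounts to checking the component count before looking at sampling factors at all. Below is a minimal sketch of that ordering. The enum, structure, and `classify_subsamp()` are hypothetical, and only a few common chroma subsampling cases are handled.

```c
/* Hypothetical subsampling classification; names and handled cases are
   illustrative only. */
enum subsamp { SUBSAMP_444, SUBSAMP_422, SUBSAMP_420, SUBSAMP_GRAY,
               SUBSAMP_UNKNOWN };

struct comp_info {
  int h_samp_factor, v_samp_factor;
};

static enum subsamp classify_subsamp(const struct comp_info *comp,
                                     int num_components)
{
  /* A single-component (grayscale) JPEG has no chroma to subsample, so its
     sampling factors are ignored, as the libjpeg decompressor does. */
  if (num_components == 1)
    return SUBSAMP_GRAY;
  if (num_components != 3 ||
      comp[1].h_samp_factor != 1 || comp[1].v_samp_factor != 1 ||
      comp[2].h_samp_factor != 1 || comp[2].v_samp_factor != 1)
    return SUBSAMP_UNKNOWN;
  if (comp[0].h_samp_factor == 1 && comp[0].v_samp_factor == 1)
    return SUBSAMP_444;
  if (comp[0].h_samp_factor == 2 && comp[0].v_samp_factor == 1)
    return SUBSAMP_422;
  if (comp[0].h_samp_factor == 2 && comp[0].v_samp_factor == 2)
    return SUBSAMP_420;
  return SUBSAMP_UNKNOWN;
}
```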
The AltiVec code actually works on 32-bit PowerPC platforms as well, so change the "powerpc64" token to "powerpc". Also clean up the shift code, which wasn't building properly on OS X.
When building libjpeg-turbo on Un*x systems, INT32 is usually typedef'ed to long, not int, so we need to specify an int pointer when doing a 4-byte write to the RGB565 output buffer. On little endian systems, this doesn't matter, but when you write a 32-bit value through a pointer to a 64-bit long on a big endian system, the value lands in the upper 4 bytes (higher addresses) of the 8-byte location, not the lower 4 bytes. NOTE: this will probably break on big endian systems that use 16-bit ints (are there any of those still around?)
This patch also removes an unneeded macro from jdmerge.c.
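Because the problem only appears on big endian hardware, it is easy to miss when testing on x86. The sketch below models, byte by byte, what a big endian CPU does when it stores a value through a pointer of a given width, so the effect can be checked on any host. `store_be()` is a hypothetical helper for illustration.

```c
#include <stdint.h>

/* Model a big endian store of `value` through a pointer to a `width`-byte
   integer: the most significant byte goes to the lowest address. */
static void store_be(unsigned char *buf, uint64_t value, int width)
{
  for (int i = 0; i < width; i++)
    buf[i] = (unsigned char)(value >> (8 * (width - 1 - i)));
}
```

A 4-byte store puts the packed pixels in the first four bytes, as intended. An 8-byte store of the same value leaves zeros there, puts the pixels four bytes later, and clobbers four extra bytes.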
DRC [Sat, 30 Aug 2014 20:37:50 +0000 (20:37 +0000)]
Fix issues with RGB565 color conversion on big endian machines. The RGB565 routines are now abstracted in a separate file, with separate little-endian and big-endian versions defined at compile time through the use of macros (this is similar to how the colorspace extension routines work.) This allows big-endian machines to take advantage of the same performance optimizations as little-endian machines, and it retains the performance on little-endian machines, since the conditional branch for endianness is at a very coarse-grained level.
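The core of the endianness issue is the order in which two packed 16-bit pixels are combined into one 32-bit word before it is stored. The sketch below defines little- and big-endian packing macros and selects between them once per row, outside the pixel loop. All macro and function names are illustrative assumptions, not the identifiers used in the library.

```c
#include <stdint.h>
#include <string.h>

/* Pack 8-bit R, G, B into one RGB565 value: 5 bits red, 6 green, 5 blue. */
#define PACK_565(r, g, b) \
  ((uint16_t)((((r) & 0xF8) << 8) | (((g) & 0xFC) << 3) | ((b) >> 3)))

/* Combine two pixels into one 32-bit word so that, once stored, the left
   pixel occupies the lower address. */
#define PACK_TWO_PIXELS_LE(l, r)  (((uint32_t)(r) << 16) | (uint32_t)(l))
#define PACK_TWO_PIXELS_BE(l, r)  (((uint32_t)(l) << 16) | (uint32_t)(r))

static int host_is_big_endian(void)
{
  const uint16_t probe = 0x0102;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 0x01;
}

/* Convert num_pairs pairs of RGB pixels to RGB565.  The endianness test is
   made once per row, not once per pixel. */
static void rgb_row_to_565(const unsigned char *rgb, unsigned char *dst,
                           int num_pairs)
{
  if (host_is_big_endian()) {
    for (int i = 0; i < num_pairs; i++, rgb += 6, dst += 4) {
      uint32_t w = PACK_TWO_PIXELS_BE(PACK_565(rgb[0], rgb[1], rgb[2]),
                                      PACK_565(rgb[3], rgb[4], rgb[5]));
      memcpy(dst, &w, sizeof w);
    }
  } else {
    for (int i = 0; i < num_pairs; i++, rgb += 6, dst += 4) {
      uint32_t w = PACK_TWO_PIXELS_LE(PACK_565(rgb[0], rgb[1], rgb[2]),
                                      PACK_565(rgb[3], rgb[4], rgb[5]));
      memcpy(dst, &w, sizeof w);
    }
  }
}
```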
DRC [Fri, 29 Aug 2014 01:49:59 +0000 (01:49 +0000)]
Fix several mathematical issues discovered in the ARM64 NEON code while running the extended regression tests introduced in r1267. Specific comments can be found in the original patches:
https://sourceforge.net/p/libjpeg-turbo/patches/64/
DRC [Mon, 25 Aug 2014 15:26:09 +0000 (15:26 +0000)]
Reformat code per Siarhei's original patch (to clearly indicate that the offset instructions are completely independent) and add Siarhei as an individual author (he no longer works for Nokia.)