DRC [Sat, 20 Jun 2015 16:27:51 +0000 (16:27 +0000)]
The Linux build machine has been upgraded to autoconf 2.69, automake 1.15, m4 1.4.17, and libtool 2.4.6, so it is no longer necessary to recommend running autoreconf prior to building the source.
DRC [Mon, 8 Jun 2015 18:31:34 +0000 (18:31 +0000)]
Now that the TurboJPEG API is reporting libjpeg warnings as errors, an "Invalid SOS parameters for sequential JPEG" warning surfaced in tjDecodeYUV*(). This was caused by the Se member of jpeg_decompress_struct being set to 0 (it is normally set to a non-zero value when the start-of-scan markers are read, but there are no SOS markers in this case, because we're not actually decompressing a JPEG file.)
DRC [Mon, 8 Jun 2015 17:41:34 +0000 (17:41 +0000)]
Fix a segfault that occured in the MIPS DSPr2 fancy upsampling routine when downsampled_width==3. Because the DSPr2 code unrolls the loop for the middle columns (refer to jdsample.c), it has the effect of performing two column iterations, and that only works properly if the number of columns (minus the first and last) is >= 2. For the specific case of downsampled_width==3, this patch skips to the second iteration of the unrolled column loop.
DRC [Mon, 1 Jun 2015 19:22:41 +0000 (19:22 +0000)]
If a warning (such as "Premature end of JPEG file") is triggered in the underlying libjpeg API, make sure that the TurboJPEG API function returns -1. Unlike errors, however, libjpeg warnings do not make the TurboJPEG functions abort.
DRC [Sun, 17 May 2015 15:56:18 +0000 (15:56 +0000)]
Back out r1555 and r1548. Using setenv() didn't fix the iOS simulator issue. It just replaced an undefined _putenv$UNIX2003 symbol with an undefined _setenv$UNIX2003 symbol. The correct solution seems to be to use -D_NONSTD_SOURCE when generating our official builds.
DRC [Sat, 16 May 2015 04:18:21 +0000 (04:18 +0000)]
Fix the Windows build. I remember now why I used putenv() originally-- because Windows doesn't have setenv(). We could use _putenv_s(), but older versions of MinGW don't have that either. Fortunately, since all of the environment values we're setting in turbojpeg.c are static, we can just map setenv() to putenv() using a macro. NOTE: we still have to use _putenv_s() in turbojpeg-jni.c, but at least people who may need to build with an older version of MinGW can still do so by disabling the Java build.
DRC [Fri, 15 May 2015 19:09:44 +0000 (19:09 +0000)]
__WORDSIZE doesn't seem to be available on platforms other than Mac or Linux, and best practices are for user-level code not to rely on it anyhow, since it's meant to be an internal macro. Fortunately, autoconf already has a way of determining the word size at configure time, so it can be passed into the compiler. This should work on any platform and has been tested on all of the Un*x platforms we support (Linux, Mac, FreeBSD, Solaris.)
DRC [Fri, 15 May 2015 18:23:59 +0000 (18:23 +0000)]
Unless you define _ANSI_SOURCE, then putenv() on Mac is renamed to putenv$UNIX2003(), and this causes problems when trying to link an i386 iOS application (for the simulator) against the TurboJPEG static library. It's easiest to just use setenv() instead.
DRC [Wed, 6 May 2015 22:41:12 +0000 (22:41 +0000)]
Fix a bug in the 64-bit Huffman encoder that Google discovered when encoding some very specific (and proprietary) aerial images using quality=98, an optimized Huffman table, and the ISLOW DCT. These images were causing the Huffman bit buffer to overflow, because the code for encoding the DC coefficient was using the equivalent of the 32-bit version of EMIT_BITS(). Thus, when 64-bit code was used, the DC coefficient code was not properly checking how many bits were in the buffer before attempting to add more bits to it. This issue appears to have existed in all versions of libjpeg-turbo.
DRC [Thu, 19 Mar 2015 19:27:40 +0000 (19:27 +0000)]
Allow the executables and libraries outside of the sharedlib/ directory to be linked against msvcr*.dll instead of libcmt*.lib. This is reported to be necessary when building libjpeg-turbo for use with C#.
DRC [Mon, 23 Feb 2015 19:19:40 +0000 (19:19 +0000)]
Surround the usage of getenv() in the TurboJPEG API with #ifndef NO_GETENV so that developers can add -DNO_GETENV to the C flags when building for platforms that don't have getenv(). Currently this is known to be necessary when building for Windows Phone.
DRC [Mon, 23 Feb 2015 19:06:44 +0000 (19:06 +0000)]
If libjpeg-turbo is configured with a non-default prefix, such as /usr, then use the docdir variable defined by autoconf 2.60 and later, if available. This will, for instance, install the documentation under /usr/share/doc/libjpeg-turbo by default if prefix=/usr, unless docdir is overridden. When using earlier versions of autoconf, docdir is set to ${datadir}/doc, as it always has been.
DRC [Mon, 23 Feb 2015 19:03:29 +0000 (19:03 +0000)]
Enable silent build rules for the NASM objects, if the source is configured with automake 1.11 or later. NOTE: the build still spits out "error: ignoring unknown tag NASM" for each object, but unfortunately, if we remove "--tag NASM" from the command line, the build breaks under older versions of automake (it aborts with "unable to infer tagged configuration.")
DRC [Tue, 27 Jan 2015 20:59:16 +0000 (20:59 +0000)]
Oops. Need to set the alpha channel when using TYPE_4BYTE_ABGR*. This has no bearing on the actual tests, but it prevents the PNG pre-encode reference images for those tests from being blank.
DRC [Wed, 21 Jan 2015 17:42:28 +0000 (17:42 +0000)]
Oops. The MIPS SIMD implementations of h2v1 and h2v2 upsampling were not checking for DSPr2 support, so running 'djpeg -nosmooth' on a non-DSPr2-enabled platform caused an "illegal instruction" error.
DRC [Tue, 20 Jan 2015 10:33:32 +0000 (10:33 +0000)]
Introduce fast paths to speed up NULL color conversion somewhat, particularly when using 64-bit code; on the decompression side, the "slow path" also now use an approach similar to that of the compression side (with the component loop outside of the column loop rather than inside.) This is faster when using 32-bit code.
DRC [Fri, 16 Jan 2015 06:53:36 +0000 (06:53 +0000)]
Make the floating point regression tests optional. It has been known for quite some time that these tests do not always generate the same results unless there is full SIMD coverage of the floating point algorithms in libjpeg-turbo. Further research reveals that there are basically three expected results: the results from our SSE SIMD extensions (which are slightly more accurate than the C code), results from the C code when running on a 32-bit FPU (or when using SSE instructions on an x86-64 CPU, which is the default with GCC), and results from the C code when running on a 64-bit FPU (which presumably uses double-precision arithmetic by default.) There is basically no way to determine which type of math will be used prior to run time, so it's best to just let the developers specify which result they expect on their particular system.
DRC [Fri, 16 Jan 2015 06:45:54 +0000 (06:45 +0000)]
In the process of developing the AltiVec extensions, it was discovered that the normal regression tests aren't sufficient to test the behavior of the library with very small image sizes and when compressing from/decompressing to a subregion of a larger image buffer. Thus, an additional regression test was added that takes advantage of the tiled compression/decompression feature in tjbench. This is being back-ported to the 1.4.x branch primarily to verify that there are no lingering issues in the existing SIMD extensions.
DRC [Fri, 16 Jan 2015 06:37:03 +0000 (06:37 +0000)]
Add separate pseudo-targets for the TurboJPEG and libjpeg regression tests, for those times when you just don't want to sit through 11 iterations of TJUnitTest to find out that your algorithm is broken.
DRC [Fri, 16 Jan 2015 06:29:52 +0000 (06:29 +0000)]
Add the ability to benchmark YCCK JPEG compression/decompression. This is particularly useful since that is the only way to test the performance of the "plain" upsampling routines, which are accelerated on some platforms.
DRC [Fri, 16 Jan 2015 03:13:16 +0000 (03:13 +0000)]
Revert r1506 (we actually are generating columns with the IDCT, so the naming makes sense in retrospect); further de-confusification in the forward DCT
DRC [Thu, 15 Jan 2015 08:51:31 +0000 (08:51 +0000)]
De-confusify the variable names a bit -- "out" represents the output of the IDCT kernel, so use "final" to represent the packed data that will be stored to memory.
DRC [Wed, 14 Jan 2015 19:39:01 +0000 (19:39 +0000)]
Add the ability to benchmark YCCK JPEG compression/decompression. This is particularly useful since that is the only way to test the performance of the "plain" upsampling routines, which are accelerated on some platforms.
DRC [Wed, 14 Jan 2015 10:45:31 +0000 (10:45 +0000)]
Fix bugs in the AltiVec fancy upsampling routines uncovered during additional testing with small image sizes. Since the input width is half the output width, the upsampler should only write a second 16-byte chuck if there are more than 8 input columns left. Additionally, if the width is < 16, then we need to insert a dummy sample (the SSE2 code does this as well, but I neglected to port that portion of the code for some reason.)
DRC [Wed, 14 Jan 2015 08:42:29 +0000 (08:42 +0000)]
In the process of developing the AltiVec extensions, it was discovered that the normal regression tests aren't sufficient to test the behavior of the library with very small image sizes and when compressing from/decompressing to a subregion of a larger image buffer. Thus, an additional regression test was added that takes advantage of the tiled compression/decompression feature in tjbench.
DRC [Wed, 14 Jan 2015 08:31:54 +0000 (08:31 +0000)]
Fix a bug in the AltiVec downsampling routines uncovered during additional testing with small image sizes. Since the output width is half the input width, the downsampler should only read a second 16-byte chunk if there are more than 8 output columns left.
DRC [Sun, 11 Jan 2015 06:34:47 +0000 (06:34 +0000)]
Use intrinsics for loading aligned data in the IDCT functions. This has no effect on performance, but it makes it more obvious what that code is doing.
DRC [Sat, 10 Jan 2015 11:32:36 +0000 (11:32 +0000)]
Overhaul the AltiVec vector loading code in the compression-side colorspace conversion routines. The existing code was sometimes overreading the source buffer (at least according to valgrind), and it was necessary to increase the complexity of the code in order to prevent this without significantly compromising performance.
DRC [Mon, 22 Dec 2014 16:04:17 +0000 (16:04 +0000)]
Use intrinsics for loading/storing data in the DCT/IDCT functions. This has no effect on the performance of the aligned loads/stores, but it makes it more obvious what that code is doing. Using intrinsics for the unaligned stores in the inverse DCT functions increases overall decompression performance by 1-2%.
DRC [Sat, 20 Dec 2014 01:16:26 +0000 (01:16 +0000)]
Use macros to allocate constants statically, rather than reading them from a table using vec_splat*(). This improves code readability and probably improves performance a bit as well.
DRC [Fri, 19 Dec 2014 15:36:39 +0000 (15:36 +0000)]
Modify the ARM64 assembly file so that it uses only syntax that the clang assembler in XCode 5.x can understand. These changes should all be cosmetic in nature-- they do not change the meaning or readability of the code nor the ability to build it for Linux. Actually, the code is now more in compliance with the ARM64 programming manual. In addition to these changes, there were a couple of instructions that clang simply doesn't support, so gas-preprocessor.pl was modified so that it now converts those into equivalent instructions that clang can handle.
DRC [Fri, 19 Dec 2014 10:44:09 +0000 (10:44 +0000)]
Add iOS architectures to the shared libraries generated by the Mac/iOS packaging system. I have no idea how useful this is for "standard" iOS application development, but it is useful in a jailbreak environment, and iOS 8 supposedly allows shared libs in "official" apps as well.
DRC [Thu, 18 Dec 2014 09:49:39 +0000 (09:49 +0000)]
Further cleanup of the AltiVec forward DCT code:
-- Use macros to represent the fast FDCT constants, to facilitate comparing the AltiVec implementation of the algorithm with the SSE2 implementation.
-- Rename slow FDCT constants for consistency.
-- Use vec_sra() in all cases in the slow FDCT code. The SSE2 implementation uses psraw, which is an arithmetic shift, so we need to do likewise with AltiVec. Using vec_sr() hasn't caused any problems yet, but it is conceivable that it might cause different behavior in certain corner cases.
DRC [Wed, 17 Dec 2014 08:04:39 +0000 (08:04 +0000)]
AltiVec SIMD implementation of slow integer forward DCT; Clean up fast integer forward DCT code so that it is easier to see how it derives from the SSE2 code and to make it play more nicely with the slow FDCT code.