DRC [Thu, 19 Mar 2015 19:27:40 +0000 (19:27 +0000)]
Allow the executables and libraries outside of the sharedlib/ directory to be linked against msvcr*.dll instead of libcmt*.lib. This is reported to be necessary when building libjpeg-turbo for use with C#.
DRC [Mon, 23 Feb 2015 19:19:40 +0000 (19:19 +0000)]
Surround the usage of getenv() in the TurboJPEG API with #ifndef NO_GETENV so that developers can add -DNO_GETENV to the C flags when building for platforms that don't have getenv(). Currently this is known to be necessary when building for Windows Phone.
DRC [Mon, 23 Feb 2015 19:06:44 +0000 (19:06 +0000)]
If libjpeg-turbo is configured with a non-default prefix, such as /usr, then use the docdir variable defined by autoconf 2.60 and later, if available. This will, for instance, install the documentation under /usr/share/doc/libjpeg-turbo by default if prefix=/usr, unless docdir is overridden. When using earlier versions of autoconf, docdir is set to ${datadir}/doc, as it always has been.
DRC [Mon, 23 Feb 2015 19:03:29 +0000 (19:03 +0000)]
Enable silent build rules for the NASM objects, if the source is configured with automake 1.11 or later. NOTE: the build still spits out "error: ignoring unknown tag NASM" for each object, but unfortunately, if we remove "--tag NASM" from the command line, the build breaks under older versions of automake (it aborts with "unable to infer tagged configuration.")
DRC [Tue, 27 Jan 2015 20:59:16 +0000 (20:59 +0000)]
Oops. Need to set the alpha channel when using TYPE_4BYTE_ABGR*. This has no bearing on the actual tests, but it prevents the PNG pre-encode reference images for those tests from being blank.
DRC [Wed, 21 Jan 2015 17:42:28 +0000 (17:42 +0000)]
Oops. The MIPS SIMD implementations of h2v1 and h2v2 upsampling were not checking for DSPr2 support, so running 'djpeg -nosmooth' on a non-DSPr2-enabled platform caused an "illegal instruction" error.
DRC [Tue, 20 Jan 2015 10:33:32 +0000 (10:33 +0000)]
Introduce fast paths to speed up NULL color conversion somewhat, particularly when using 64-bit code; on the decompression side, the "slow path" also now use an approach similar to that of the compression side (with the component loop outside of the column loop rather than inside.) This is faster when using 32-bit code.
DRC [Fri, 16 Jan 2015 06:53:36 +0000 (06:53 +0000)]
Make the floating point regression tests optional. It has been known for quite some time that these tests do not always generate the same results unless there is full SIMD coverage of the floating point algorithms in libjpeg-turbo. Further research reveals that there are basically three expected results: the results from our SSE SIMD extensions (which are slightly more accurate than the C code), results from the C code when running on a 32-bit FPU (or when using SSE instructions on an x86-64 CPU, which is the default with GCC), and results from the C code when running on a 64-bit FPU (which presumably uses double-precision arithmetic by default.) There is basically no way to determine which type of math will be used prior to run time, so it's best to just let the developers specify which result they expect on their particular system.
DRC [Fri, 16 Jan 2015 06:45:54 +0000 (06:45 +0000)]
In the process of developing the AltiVec extensions, it was discovered that the normal regression tests aren't sufficient to test the behavior of the library with very small image sizes and when compressing from/decompressing to a subregion of a larger image buffer. Thus, an additional regression test was added that takes advantage of the tiled compression/decompression feature in tjbench. This is being back-ported to the 1.4.x branch primarily to verify that there are no lingering issues in the existing SIMD extensions.
DRC [Fri, 16 Jan 2015 06:37:03 +0000 (06:37 +0000)]
Add separate pseudo-targets for the TurboJPEG and libjpeg regression tests, for those times when you just don't want to sit through 11 iterations of TJUnitTest to find out that your algorithm is broken.
DRC [Fri, 16 Jan 2015 06:29:52 +0000 (06:29 +0000)]
Add the ability to benchmark YCCK JPEG compression/decompression. This is particularly useful since that is the only way to test the performance of the "plain" upsampling routines, which are accelerated on some platforms.
DRC [Fri, 19 Dec 2014 15:36:39 +0000 (15:36 +0000)]
Modify the ARM64 assembly file so that it uses only syntax that the clang assembler in XCode 5.x can understand. These changes should all be cosmetic in nature-- they do not change the meaning or readability of the code nor the ability to build it for Linux. Actually, the code is now more in compliance with the ARM64 programming manual. In addition to these changes, there were a couple of instructions that clang simply doesn't support, so gas-preprocessor.pl was modified so that it now converts those into equivalent instructions that clang can handle.
DRC [Fri, 19 Dec 2014 10:44:09 +0000 (10:44 +0000)]
Add iOS architectures to the shared libraries generated by the Mac/iOS packaging system. I have no idea how useful this is for "standard" iOS application development, but it is useful in a jailbreak environment, and iOS 8 supposedly allows shared libs in "official" apps as well.
DRC [Tue, 25 Nov 2014 09:48:15 +0000 (09:48 +0000)]
Restore the JPP() and JMETHOD() macros. Even though libjpeg-turbo doesn't use them anymore, other software apparently does:
https://bugzilla.redhat.com/show_bug.cgi?id=1164815
https://bugs.kde.org/show_bug.cgi?id=340944
https://bugzilla.mozilla.org/show_bug.cgi?id=1093615
DRC [Sat, 22 Nov 2014 22:09:30 +0000 (22:09 +0000)]
Fix Huffman local buffer overrun discovered by Debian developers when attempting to transform a junk image using ImageMagick:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=768369
DRC [Wed, 19 Nov 2014 00:55:28 +0000 (00:55 +0000)]
Sometimes the sampling factors in grayscale images can be > 1 (for instance, if compressing using 'cjpeg -sample 2x2 -grayscale'.) Technically, sampling factors have no meaning with grayscale JPEGs, and the libjpeg decompressor ignores them in that case. Thus, the TurboJPEG decompressor should ignore them as well.
When building libjpeg-turbo on Un*x systems, INT32 is usually typedef'ed to long, not int, so we need to specify an int pointer when doing a 4-byte write to the RGB565 output buffer. On little endian systems, this doesn't matter, but when you write a 32-bit int to a 64-bit long pointer address on a big endian system, you are writing to the upper 4 bytes, not the lower 4 bytes. NOTE: this will probably break on big endian systems that use 16-bit ints (are there any of those still around?)
This patch also removes an unneeded macro from jdmerge.c.
DRC [Sat, 30 Aug 2014 20:37:50 +0000 (20:37 +0000)]
Fix issues with RGB565 color conversion on big endian machines. The RGB565 routines are now abstracted in a separate file, with separate little-endian and big-endian versions defined at compile time through the use of macros (this is similar to how the colorspace extension routines work.) This allows big-endian machines to take advantage of the same performance optimizations as little-endian machines, and it retains the performance on little-endian machines, since the conditional branch for endianness is at a very coarse-grained level.
DRC [Fri, 29 Aug 2014 01:49:59 +0000 (01:49 +0000)]
Fix several mathematical issues discovered in the ARM64 NEON code while running the extended regression tests introduced in r1267. Specific comments can be found in the original patches:
https://sourceforge.net/p/libjpeg-turbo/patches/64/
DRC [Mon, 25 Aug 2014 15:26:09 +0000 (15:26 +0000)]
Reformat code per Siarhei's original patch (to clearly indicate that the offset instructions are completely independent) and add Siarhei as an individual author (he no longer works for Nokia.)
From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 03:50:22 +0300
Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15
The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9.
Tuning it for newer ARM processors can introduce some speed-up (up to 20%).
The performance of the inner loop (conversion of 8 pixels) improves from
~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles
down to ~18 cycles on ARM Cortex-A15.
The performance remains exactly the same on ARM Cortex-A7 (~58 cycles),
ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors.
Also use larger indentation in the source code for separating two independent
instruction streams.
The performance of the inner loop (conversion of 8 pixels):
* ARM Cortex-A7: ~55 cycles
* ARM Cortex-A8: ~28 cycles
* ARM Cortex-A9: ~32 cycles
* ARM Cortex-A15: ~20 cycles
* Qualcomm Krait: ~24 cycles
Based on the Linaro rgb565 patch from
https://sourceforge.net/p/libjpeg-turbo/patches/24/
but implements better instructions scheduling.
DRC [Fri, 22 Aug 2014 19:59:51 +0000 (19:59 +0000)]
Ensure that tjFree() is used for any JPEG buffers that might have been dynamically allocated by the compress/transform functions. To keep things simple, we use tjAlloc() for the statically-allocated buffer as well, so that tjFree() can always be used to free the buffer, regardless of whether it was allocated by tjbench or by the TurboJPEG library. This fixes crashes that occurred on Windows when running tjunittest or tjbench with the -alloc flag.
DRC [Fri, 22 Aug 2014 19:27:28 +0000 (19:27 +0000)]
Ensure that tjFree() is used for any JPEG buffers that might have been dynamically allocated by the compress/transform functions. To keep things simple, we use tjAlloc() for the statically-allocated buffer as well, so that tjFree() can always be used to free the buffer, regardless of whether it was allocated by tjbench or by the TurboJPEG library. This fixes crashes that occurred on Windows when running tjunittest or tjbench with the -alloc flag.
DRC [Fri, 22 Aug 2014 18:30:44 +0000 (18:30 +0000)]
Revert r1335 and r1336. It was a valiant effort, but on Windows, xmm8-xmm15 are non-volatile, and the overhead of pushing them onto the stack at the beginning of each function and popping them at the end was causing worse performance (in the neighborhood of 3-5%) than just using the work areas and limiting the register usage to xmm0-xmm7. Best to leave the SSE2 code alone. We can optimize the register usage for AVX2, once that port takes place.
DRC [Fri, 22 Aug 2014 13:43:33 +0000 (13:43 +0000)]
Add a set of undocumented environment variables and Java system properties that allow compression features of libjpeg that are not normally exposed in the TurboJPEG API to be enabled. These features are not normally exposed because, for the most part, they aren't "turbo" features, but it is still useful to be able to benchmark them without modifying the code.
DRC [Fri, 22 Aug 2014 11:31:46 +0000 (11:31 +0000)]
.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424).
DRC [Fri, 22 Aug 2014 03:04:06 +0000 (03:04 +0000)]
Extend tjbenchtest so that it tests the dynamic JPEG buffer allocation feature in TurboJPEG. Disable the tiling feature in TJBench whenever dynamic buffer allocation is enabled (because the tiling feature requires a separate buffer for each tile, using it successfully with dynamic buffer allocation would require a separate TurboJPEG compressor instance for each tile, and it's not worth going to that trouble right now.)
DRC [Fri, 22 Aug 2014 03:00:37 +0000 (03:00 +0000)]
Run the TurboJPEG conformance tests out of a directory in /tmp (for improved performance, if the source directory is on a remote file share.) Fix an issue in TJBench.java that prevented it from working properly if the source image resided in a directory with a dot in the name.
DRC [Fri, 22 Aug 2014 02:57:34 +0000 (02:57 +0000)]
Extend tjbenchtest so that it tests the dynamic JPEG buffer allocation feature in TurboJPEG. Disable the tiling feature in TJBench whenever dynamic buffer allocation is enabled (because the tiling feature requires a separate buffer for each tile, using it successfully with dynamic buffer allocation would require a separate TurboJPEG compressor instance for each tile, and it's not worth going to that trouble right now.)
DRC [Fri, 22 Aug 2014 02:51:16 +0000 (02:51 +0000)]
Run the TurboJPEG conformance tests out of a directory in /tmp (for improved performance, if the source directory is on a remote file share.) Fix an issue in TJBench.java that prevented it from working properly if the source image resided in a directory with a dot in the name.
DRC [Thu, 21 Aug 2014 22:15:19 +0000 (22:15 +0000)]
Subtle point, but dest->outbuffer is a pointer to the address of the JPEG buffer, which is stored in the calling program. Thus, *(dest->outbuffer) will always equal *outbuffer. We need to compare *outbuffer with dest->buffer instead to determine if the pointer is being reused.
DRC [Thu, 21 Aug 2014 15:51:47 +0000 (15:51 +0000)]
If the output buffer in the TurboJPEG destination manager was allocated by the destination manager and is being reused from a previous compression operation, then we need to get the buffer size from the previous operation, since the calling program doesn't know the actual buffer size.
DRC [Thu, 21 Aug 2014 03:40:37 +0000 (03:40 +0000)]
Actually, we need to increase the size of BUFSIZE, not just the size of _buffer. The previous patch might have cause problems if, for instance, state->free_in_buffer was 127 but 129 bytes were compressed. In that case, only 127 of the 129 bytes would have been written to the file. Also document the fix.
DRC [Thu, 21 Aug 2014 03:38:14 +0000 (03:38 +0000)]
Actually, we need to increase the size of BUFSIZE, not just the size of _buffer. The previous patch might have cause problems if, for instance, state->free_in_buffer was 127 but 129 bytes were compressed. In that case, only 127 of the 129 bytes would have been written to the file. Also document the fix.
DRC [Thu, 21 Aug 2014 01:55:22 +0000 (01:55 +0000)]
Fix an extremely rare crash that can occur when compressing a very high-frequency MCU using quality 100 and no subsampling, and when dynamically allocating the JPEG buffer in the destination manager. Even with a test program designed specifically to reproduce the crash, it only occurred once in about 25 million iterations. More details here: https://sourceforge.net/p/libjpeg-turbo/bugs/64
DRC [Thu, 21 Aug 2014 01:53:47 +0000 (01:53 +0000)]
Fix an extremely rare crash that can occur when compressing a very high-frequency MCU using quality 100 and no subsampling, and when dynamically allocating the JPEG buffer in the destination manager. Even with a test program designed specifically to reproduce the crash, it only occurred once in about 25 million iterations. More details here: https://sourceforge.net/p/libjpeg-turbo/bugs/64
DRC [Sun, 17 Aug 2014 12:23:49 +0000 (12:23 +0000)]
Refactored YUVImage Java class so that it supports both unified YUV image buffers as well as separate YUV image planes; modified the JNI functions accordingly and added new helper functions to the TurboJPEG C API (tjPlaneWidth(), tjPlaneHeight(), tjPlaneSizeYUV()) to facilitate those modifications; changed potentially confusing "component width" and "component height" terms to "plane width" and "plane height" and modified variable names in turbojpeg.c to reflect this; numerous other documentation tweaks
DRC [Thu, 14 Aug 2014 17:24:01 +0000 (17:24 +0000)]
Clean up exception handling in the JNI code. The exception is actually not thrown until the function exits, so we can let the code fall through to bailout: if the TurboJPEG C function fails. Also, per the JNI spec, no other JNI functions can be called between GetPrimitiveArrayCritical() and ReleasePrimitiveArrayCritical(). This hasn't caused any problems thus far, but better safe than sorry.
DRC [Thu, 14 Aug 2014 16:54:04 +0000 (16:54 +0000)]
Clean up exception handling in the JNI code. The exception is actually not thrown until the function exits, so we can let the code fall through to bailout: if the TurboJPEG C function fails. Also, per the JNI spec, no other JNI functions can be called between GetPrimitiveArrayCritical() and ReleasePrimitiveArrayCritical(). This hasn't caused any problems thus far, but better safe than sorry.
DRC [Tue, 12 Aug 2014 15:06:30 +0000 (15:06 +0000)]
Reformat TurboJPEG C API documentation to improve ease of maintenance and to make it more consistent with the javadoc formatting; fix minor error in tjCompressFromYUV() prototype.
DRC [Sun, 10 Aug 2014 20:12:17 +0000 (20:12 +0000)]
Clean up and consolidate notes regarding the YUV image format. This also corrects a factual error regarding the padding of the luminance plane-- because we now support 4:1:1, the component width is not necessarily padded to the nearest multiple of 2 if horizontal subsampling is used.