DRC [Mon, 25 Aug 2014 15:26:09 +0000 (15:26 +0000)]
Reformat code per Siarhei's original patch (to clearly indicate that the offset instructions are completely independent) and add Siarhei as an individual author (he no longer works for Nokia.)
From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 03:50:22 +0300
Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15
The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9.
Tuning it for newer ARM processors can introduce some speed-up (up to 20%).
The performance of the inner loop (conversion of 8 pixels) improves from
~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles
down to ~18 cycles on ARM Cortex-A15.
The performance remains exactly the same on ARM Cortex-A7 (~58 cycles),
ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors.
Also use larger indentation in the source code for separating two independent
instruction streams.
The performance of the inner loop (conversion of 8 pixels):
* ARM Cortex-A7: ~55 cycles
* ARM Cortex-A8: ~28 cycles
* ARM Cortex-A9: ~32 cycles
* ARM Cortex-A15: ~20 cycles
* Qualcomm Krait: ~24 cycles
Based on the Linaro rgb565 patch from
https://sourceforge.net/p/libjpeg-turbo/patches/24/
but implements better instructions scheduling.
DRC [Fri, 22 Aug 2014 19:27:28 +0000 (19:27 +0000)]
Ensure that tjFree() is used for any JPEG buffers that might have been dynamically allocated by the compress/transform functions. To keep things simple, we use tjAlloc() for the statically-allocated buffer as well, so that tjFree() can always be used to free the buffer, regardless of whether it was allocated by tjbench or by the TurboJPEG library. This fixes crashes that occurred on Windows when running tjunittest or tjbench with the -alloc flag.
DRC [Fri, 22 Aug 2014 19:59:51 +0000 (19:59 +0000)]
Ensure that tjFree() is used for any JPEG buffers that might have been dynamically allocated by the compress/transform functions. To keep things simple, we use tjAlloc() for the statically-allocated buffer as well, so that tjFree() can always be used to free the buffer, regardless of whether it was allocated by tjbench or by the TurboJPEG library. This fixes crashes that occurred on Windows when running tjunittest or tjbench with the -alloc flag.
DRC [Fri, 22 Aug 2014 18:30:44 +0000 (18:30 +0000)]
Revert r1335 and r1336. It was a valiant effort, but on Windows, xmm8-xmm15 are non-volatile, and the overhead of pushing them onto the stack at the beginning of each function and popping them at the end was causing worse performance (in the neighborhood of 3-5%) than just using the work areas and limiting the register usage to xmm0-xmm7. Best to leave the SSE2 code alone. We can optimize the register usage for AVX2, once that port takes place.
DRC [Fri, 22 Aug 2014 13:43:33 +0000 (13:43 +0000)]
Add a set of undocumented environment variables and Java system properties that allow compression features of libjpeg that are not normally exposed in the TurboJPEG API to be enabled. These features are not normally exposed because, for the most part, they aren't "turbo" features, but it is still useful to be able to benchmark them without modifying the code.
DRC [Fri, 22 Aug 2014 11:31:46 +0000 (11:31 +0000)]
.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424).
DRC [Fri, 22 Aug 2014 03:04:06 +0000 (03:04 +0000)]
Extend tjbenchtest so that it tests the dynamic JPEG buffer allocation feature in TurboJPEG. Disable the tiling feature in TJBench whenever dynamic buffer allocation is enabled (because the tiling feature requires a separate buffer for each tile, using it successfully with dynamic buffer allocation would require a separate TurboJPEG compressor instance for each tile, and it's not worth going to that trouble right now.)
DRC [Fri, 22 Aug 2014 03:00:37 +0000 (03:00 +0000)]
Run the TurboJPEG conformance tests out of a directory in /tmp (for improved performance, if the source directory is on a remote file share.) Fix an issue in TJBench.java that prevented it from working properly if the source image resided in a directory with a dot in the name.
DRC [Fri, 22 Aug 2014 02:57:34 +0000 (02:57 +0000)]
Extend tjbenchtest so that it tests the dynamic JPEG buffer allocation feature in TurboJPEG. Disable the tiling feature in TJBench whenever dynamic buffer allocation is enabled (because the tiling feature requires a separate buffer for each tile, using it successfully with dynamic buffer allocation would require a separate TurboJPEG compressor instance for each tile, and it's not worth going to that trouble right now.)
DRC [Fri, 22 Aug 2014 02:51:16 +0000 (02:51 +0000)]
Run the TurboJPEG conformance tests out of a directory in /tmp (for improved performance, if the source directory is on a remote file share.) Fix an issue in TJBench.java that prevented it from working properly if the source image resided in a directory with a dot in the name.
DRC [Thu, 21 Aug 2014 22:15:19 +0000 (22:15 +0000)]
Subtle point, but dest->outbuffer is a pointer to the address of the JPEG buffer, which is stored in the calling program. Thus, *(dest->outbuffer) will always equal *outbuffer. We need to compare *outbuffer with dest->buffer instead to determine if the pointer is being reused.
DRC [Thu, 21 Aug 2014 15:51:47 +0000 (15:51 +0000)]
If the output buffer in the TurboJPEG destination manager was allocated by the destination manager and is being reused from a previous compression operation, then we need to get the buffer size from the previous operation, since the calling program doesn't know the actual buffer size.
DRC [Thu, 21 Aug 2014 03:40:37 +0000 (03:40 +0000)]
Actually, we need to increase the size of BUFSIZE, not just the size of _buffer. The previous patch might have cause problems if, for instance, state->free_in_buffer was 127 but 129 bytes were compressed. In that case, only 127 of the 129 bytes would have been written to the file. Also document the fix.
DRC [Thu, 21 Aug 2014 03:38:14 +0000 (03:38 +0000)]
Actually, we need to increase the size of BUFSIZE, not just the size of _buffer. The previous patch might have cause problems if, for instance, state->free_in_buffer was 127 but 129 bytes were compressed. In that case, only 127 of the 129 bytes would have been written to the file. Also document the fix.
DRC [Thu, 21 Aug 2014 01:55:22 +0000 (01:55 +0000)]
Fix an extremely rare crash that can occur when compressing a very high-frequency MCU using quality 100 and no subsampling, and when dynamically allocating the JPEG buffer in the destination manager. Even with a test program designed specifically to reproduce the crash, it only occurred once in about 25 million iterations. More details here: https://sourceforge.net/p/libjpeg-turbo/bugs/64
DRC [Thu, 21 Aug 2014 01:53:47 +0000 (01:53 +0000)]
Fix an extremely rare crash that can occur when compressing a very high-frequency MCU using quality 100 and no subsampling, and when dynamically allocating the JPEG buffer in the destination manager. Even with a test program designed specifically to reproduce the crash, it only occurred once in about 25 million iterations. More details here: https://sourceforge.net/p/libjpeg-turbo/bugs/64
DRC [Sun, 17 Aug 2014 12:23:49 +0000 (12:23 +0000)]
Refactored YUVImage Java class so that it supports both unified YUV image buffers as well as separate YUV image planes; modified the JNI functions accordingly and added new helper functions to the TurboJPEG C API (tjPlaneWidth(), tjPlaneHeight(), tjPlaneSizeYUV()) to facilitate those modifications; changed potentially confusing "component width" and "component height" terms to "plane width" and "plane height" and modified variable names in turbojpeg.c to reflect this; numerous other documentation tweaks
DRC [Thu, 14 Aug 2014 17:24:01 +0000 (17:24 +0000)]
Clean up exception handling in the JNI code. The exception is actually not thrown until the function exits, so we can let the code fall through to bailout: if the TurboJPEG C function fails. Also, per the JNI spec, no other JNI functions can be called between GetPrimitiveArrayCritical() and ReleasePrimitiveArrayCritical(). This hasn't caused any problems thus far, but better safe than sorry.
DRC [Thu, 14 Aug 2014 16:54:04 +0000 (16:54 +0000)]
Clean up exception handling in the JNI code. The exception is actually not thrown until the function exits, so we can let the code fall through to bailout: if the TurboJPEG C function fails. Also, per the JNI spec, no other JNI functions can be called between GetPrimitiveArrayCritical() and ReleasePrimitiveArrayCritical(). This hasn't caused any problems thus far, but better safe than sorry.
DRC [Tue, 12 Aug 2014 15:06:30 +0000 (15:06 +0000)]
Reformat TurboJPEG C API documentation to improve ease of maintenance and to make it more consistent with the javadoc formatting; fix minor error in tjCompressFromYUV() prototype.
DRC [Sun, 10 Aug 2014 20:12:17 +0000 (20:12 +0000)]
Clean up and consolidate notes regarding the YUV image format. This also corrects a factual error regarding the padding of the luminance plane-- because we now support 4:1:1, the component width is not necessarily padded to the nearest multiple of 2 if horizontal subsampling is used.
DRC [Sun, 10 Aug 2014 16:39:32 +0000 (16:39 +0000)]
Fix a display issue in the documentation for tjDecompress2() (doxygen treats a star at the beginning of the line as a list bullet); make the documentation more readable by displaying fixed-width text (which is used to refer to variables and functions) in a different color.
DRC [Sat, 9 Aug 2014 22:58:18 +0000 (22:58 +0000)]
Oops. The Windows version of collect_args/uncollect_args uses rsp, so we still need the rsp prologue/epilogue, despite the fact that we aren't using the stack as a work area. This fixes a segfault on Windows caused by r1335.
DRC [Sat, 9 Aug 2014 14:30:28 +0000 (14:30 +0000)]
Attempt to improve performance by refactoring the compression-side color conversion and DCT algorithms so that they take full advantage of the additional registers available with 64-bit SSE2. This produces a somewhat yawn-worthy speedup of 2-3%, but at least the code is a lot more readable now.
Fix performance and other issues uncovered in testing with actual ARM64 hardware; formatting tweaks; remove NEON platform check (NEON is always available with ARMv8)
DRC [Sun, 22 Jun 2014 20:38:54 +0000 (20:38 +0000)]
Prevent a buffer overrun if the comment begins with a literal quote character and the string exceeds 65k characters. Also prevent comments longer than 65k characters from being written, since this will produce an incorrect JPEG file.
DRC [Sun, 22 Jun 2014 20:36:50 +0000 (20:36 +0000)]
Prevent a buffer overrun if the comment begins with a literal quote character and the string exceeds 65k characters. Also prevent comments longer than 65k characters from being written, since this will produce an incorrect JPEG file.
DRC [Wed, 28 May 2014 20:28:30 +0000 (20:28 +0000)]
Our copyright string is longer than JMSG_LENGTH_MAX, and this was causing a buffer overrun if output_message() was called with msg_code set to JMSG_COPYRIGHT, or if format_message() was called with msg_code set to JMSG_COPYRIGHT and with a buffer of length JMSG_LENGTH_MAX.
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
DRC [Wed, 28 May 2014 20:27:42 +0000 (20:27 +0000)]
Our copyright string is longer than JMSG_LENGTH_MAX, and this was causing a buffer overrun if output_message() was called with msg_code set to JMSG_COPYRIGHT, or if format_message() was called with msg_code set to JMSG_COPYRIGHT and with a buffer of length JMSG_LENGTH_MAX.
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
DRC [Wed, 28 May 2014 20:19:54 +0000 (20:19 +0000)]
Our copyright string is longer than JMSG_LENGTH_MAX, and this was causing a buffer overrun if output_message() was called with msg_code set to JMSG_COPYRIGHT, or if format_message() was called with msg_code set to JMSG_COPYRIGHT and with a buffer of length JMSG_LENGTH_MAX.
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
DRC [Mon, 19 May 2014 19:13:22 +0000 (19:13 +0000)]
Allow for building the MIPS DSPr2 extensions if the host is mips-* as well as mipsel-*. The DSPr2 extensions are little endian, so we still have to check that the compiler defines __MIPSEL__ before enabling them. This paves the way for supporting big-endian MIPS, and in the near term, it allows the SIMD extensions to be built with Sourcery CodeBench.
DRC [Sun, 18 May 2014 19:04:03 +0000 (19:04 +0000)]
libjpeg-turbo has never supported non-ANSI compilers, so get rid of the crufty SIZEOF() macro. It was not being used consistently anyhow, so it would not have been possible to build prior releases of libjpeg-turbo using the broken compilers for which that macro was designed.
DRC [Sun, 18 May 2014 18:33:44 +0000 (18:33 +0000)]
Remove MS-DOS code and information, and adjust copyright headers to reflect the removal of features in r1307 and r1308. libjpeg-turbo has never supported MS-DOS, nor is it even possible for us to do so.
DRC [Fri, 16 May 2014 10:43:44 +0000 (10:43 +0000)]
Get rid of the HAVE_PROTOTYPES configuration option, as well as the related JMETHOD and JPP macros. libjpeg-turbo has never supported compilers that don't handle prototypes. Doing so requires ansi2knr, which isn't even supported in the IJG code anymore.
DRC [Thu, 15 May 2014 20:30:16 +0000 (20:30 +0000)]
Remove all of the NEED_SHORT_EXTERNAL_NAMES stuff. There is scant information available as to which linkers ever had a 15-character global symbol name limit. AFAICT, it might have been a VMS and/or a.out BSD thing, but none of those platforms have ever been supported by libjpeg-turbo (nor are such systems supported by other open source libraries of this nature.)
DRC [Thu, 15 May 2014 18:22:24 +0000 (18:22 +0000)]
Fix build, which was broken by the checkin of the MIPS DSPr2 accelerated smooth downsampling routine. Until/unless other platforms include SIMD support for that function, it's just easier to #ifdef around it rather than adding stubs for the other platforms.
DRC [Tue, 13 May 2014 18:41:33 +0000 (18:41 +0000)]
The x86/x86-64 SIMD extensions were originally designed to accommodate changing the value of RGB_*, but this apparently broke when RGB-to-gray colorspace conversion was accelerated. Further, the ARM NEON extensions have always assumed that JCS_RGB behaves identically to JCS_EXT_RGB. Rather than fix these issues, it makes more sense to just stop claiming that we support changing the values of RGB_*, since doing so is no longer necessary.
DRC [Tue, 13 May 2014 18:38:36 +0000 (18:38 +0000)]
The x86/x86-64 SIMD extensions were originally designed to accommodate changing the value of RGB_*, but this apparently broke when RGB-to-gray colorspace conversion was accelerated. Further, the ARM NEON extensions have always assumed that JCS_RGB behaves identically to JCS_EXT_RGB. Rather than fix these issues, it makes more sense to just stop claiming that we support changing the values of RGB_*, since doing so is no longer necessary.