DRC [Sat, 9 Aug 2014 22:58:18 +0000 (22:58 +0000)]
Oops. The Windows version of collect_args/uncollect_args uses rsp, so we still need the rsp prologue/epilogue, despite the fact that we aren't using the stack as a work area. This fixes a segfault on Windows caused by r1335.
DRC [Sat, 9 Aug 2014 14:30:28 +0000 (14:30 +0000)]
Attempt to improve performance by refactoring the compression-side color conversion and DCT algorithms so that they take full advantage of the additional registers available with 64-bit SSE2. This produces a somewhat yawn-worthy speedup of 2-3%, but at least the code is a lot more readable now.
Fix performance and other issues uncovered in testing with actual ARM64 hardware; formatting tweaks; remove NEON platform check (NEON is always available with ARMv8)
DRC [Sun, 22 Jun 2014 20:38:54 +0000 (20:38 +0000)]
Prevent a buffer overrun if the comment begins with a literal quote character and the string exceeds 65k characters. Also prevent comments longer than 65k characters from being written, since this will produce an incorrect JPEG file.
DRC [Sun, 22 Jun 2014 20:36:50 +0000 (20:36 +0000)]
Prevent a buffer overrun if the comment begins with a literal quote character and the string exceeds 65k characters. Also prevent comments longer than 65k characters from being written, since this will produce an incorrect JPEG file.
DRC [Wed, 28 May 2014 20:28:30 +0000 (20:28 +0000)]
Our copyright string is longer than JMSG_LENGTH_MAX, and this was causing a buffer overrun if output_message() was called with msg_code set to JMSG_COPYRIGHT, or if format_message() was called with msg_code set to JMSG_COPYRIGHT and with a buffer of length JMSG_LENGTH_MAX.
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
DRC [Wed, 28 May 2014 20:27:42 +0000 (20:27 +0000)]
Our copyright string is longer than JMSG_LENGTH_MAX, and this was causing a buffer overrun if output_message() was called with msg_code set to JMSG_COPYRIGHT, or if format_message() was called with msg_code set to JMSG_COPYRIGHT and with a buffer of length JMSG_LENGTH_MAX.
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
DRC [Wed, 28 May 2014 20:19:54 +0000 (20:19 +0000)]
Our copyright string is longer than JMSG_LENGTH_MAX, and this was causing a buffer overrun if output_message() was called with msg_code set to JMSG_COPYRIGHT, or if format_message() was called with msg_code set to JMSG_COPYRIGHT and with a buffer of length JMSG_LENGTH_MAX.
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
DRC [Mon, 19 May 2014 19:13:22 +0000 (19:13 +0000)]
Allow for building the MIPS DSPr2 extensions if the host is mips-* as well as mipsel-*. The DSPr2 extensions are little endian, so we still have to check that the compiler defines __MIPSEL__ before enabling them. This paves the way for supporting big-endian MIPS, and in the near term, it allows the SIMD extensions to be built with Sourcery CodeBench.
DRC [Sun, 18 May 2014 19:04:03 +0000 (19:04 +0000)]
libjpeg-turbo has never supported non-ANSI compilers, so get rid of the crufty SIZEOF() macro. It was not being used consistently anyhow, so it would not have been possible to build prior releases of libjpeg-turbo using the broken compilers for which that macro was designed.
DRC [Sun, 18 May 2014 18:33:44 +0000 (18:33 +0000)]
Remove MS-DOS code and information, and adjust copyright headers to reflect the removal of features in r1307 and r1308. libjpeg-turbo has never supported MS-DOS, nor is it even possible for us to do so.
DRC [Fri, 16 May 2014 10:43:44 +0000 (10:43 +0000)]
Get rid of the HAVE_PROTOTYPES configuration option, as well as the related JMETHOD and JPP macros. libjpeg-turbo has never supported compilers that don't handle prototypes. Doing so requires ansi2knr, which isn't even supported in the IJG code anymore.
DRC [Thu, 15 May 2014 20:30:16 +0000 (20:30 +0000)]
Remove all of the NEED_SHORT_EXTERNAL_NAMES stuff. There is scant information available as to which linkers ever had a 15-character global symbol name limit. AFAICT, it might have been a VMS and/or a.out BSD thing, but none of those platforms have ever been supported by libjpeg-turbo (nor are such systems supported by other open source libraries of this nature.)
DRC [Thu, 15 May 2014 18:22:24 +0000 (18:22 +0000)]
Fix build, which was broken by the checkin of the MIPS DSPr2 accelerated smooth downsampling routine. Until/unless other platforms include SIMD support for that function, it's just easier to #ifdef around it rather than adding stubs for the other platforms.
DRC [Tue, 13 May 2014 18:41:33 +0000 (18:41 +0000)]
The x86/x86-64 SIMD extensions were originally designed to accommodate changing the value of RGB_*, but this apparently broke when RGB-to-gray colorspace conversion was accelerated. Further, the ARM NEON extensions have always assumed that JCS_RGB behaves identically to JCS_EXT_RGB. Rather than fix these issues, it makes more sense to just stop claiming that we support changing the values of RGB_*, since doing so is no longer necessary.
DRC [Tue, 13 May 2014 18:38:36 +0000 (18:38 +0000)]
The x86/x86-64 SIMD extensions were originally designed to accommodate changing the value of RGB_*, but this apparently broke when RGB-to-gray colorspace conversion was accelerated. Further, the ARM NEON extensions have always assumed that JCS_RGB behaves identically to JCS_EXT_RGB. Rather than fix these issues, it makes more sense to just stop claiming that we support changing the values of RGB_*, since doing so is no longer necessary.
DRC [Sun, 11 May 2014 10:09:07 +0000 (10:09 +0000)]
Port the more accurate (and slightly faster) floating point IDCT implementation from jpeg-8a and later. New research revealed that the SSE/SSE2 floating point IDCT implementation was actually more accurate than the jpeg-6b implementation, not less, which is why its mathematical results have always differed from those of the jpeg-6b implementation. This patch brings the accuracy of the C code in line with that of the SSE/SSE2 code.
DRC [Sun, 11 May 2014 09:36:25 +0000 (09:36 +0000)]
Convert tabs to spaces in the libjpeg code and the SIMD code (TurboJPEG retains the use of tabs for historical reasons. They were annoying in the libjpeg code primarily because they were not consistently used and because they were used to format as well as indent the code. In the case of TurboJPEG, tabs are used just to indent the code, so even if the editor assumes a different tab width, the code will still be readable.)
DRC [Sat, 10 May 2014 09:53:34 +0000 (09:53 +0000)]
Using subdirectories unfortunately opened up a can of worms. In order to prevent object name conflicts, it is necessary to use the subdir-objects automake directive, but it simply doesn't work right on some of the versions of automake we still have to support. Another option would be to add a separate Makefile.am file to each subdirectory, but that requires maintaining a completely different set of build rules for each one. Fortunately, however, we're in the 21st century now, so we can use filenames longer than 8.3.
DRC [Fri, 9 May 2014 20:14:26 +0000 (20:14 +0000)]
Re-organize the x86/x86-64 SIMD routines into separate folders by instruction set so we can name each routine similarly to its corresponding C file. This also makes it easier to add support for new instruction sets.
DRC [Fri, 9 May 2014 18:00:32 +0000 (18:00 +0000)]
Convert tabs to spaces in the libjpeg code and the SIMD code (TurboJPEG retains the use of tabs for historical reasons. They were annoying in the libjpeg code primarily because they were not consistently used and because they were used to format as well as indent the code. In the case of TurboJPEG, tabs are used just to indent the code, so even if the editor assumes a different tab width, the code will still be readable.)
DRC [Wed, 7 May 2014 06:02:57 +0000 (06:02 +0000)]
Fix regression that caused 'make test' to fail with non-x86 SIMD code. The round-off error in the SIMD float DCT/IDCT routines only exists in the SSE and SSE2 implementations.
DRC [Tue, 6 May 2014 21:03:35 +0000 (21:03 +0000)]
Replace our custom version of Android.mk with instructions on how to build a libjpeg-turbo SDK for Android using autotools. Upon consulting with AOSP, it appears that Android.mk isn't really necessary except when building libjpeg-turbo for use by the Android platform itself, and it makes more sense for them to maintain the makefile for that purpose rather than for it to be upstreamed. ndk-build has serious limitations that prevent it from being used to generate static libjpeg-turbo libraries (mainly, it isn't possible to combine pre-built objects from one module into a static library for another module, which is necessary because the SIMD extensions sometimes have to be built with different CFLAGS than the rest of the code.) In general, it's just better not to introduce a new build system.
We use __CHAR_UNSIGNED__ (automatically defined by the AC_C_CHAR_UNSIGNED macro) rather than CHAR_IS_UNSIGNED (defined by custom autoconf code in libjpeg that we didn't port over), although I doubt it matters on any of the platforms we support.
We use __CHAR_UNSIGNED__ (automatically defined by the AC_C_CHAR_UNSIGNED macro) rather than CHAR_IS_UNSIGNED (defined by custom autoconf code in libjpeg that we didn't port over), although I doubt it matters on any of the platforms we support.
This patch accomplishes the following:
-- Auto-generates HAVE_LOCALE_H macro and adds it to jconfig.h (this is used by rdjpgcom.c.)
-- Reconciles the description and ordering of macros between config.h.in and jconfig.h.in, so the two files can be easily diffed.
-- Eliminates the use of the autoheader-generated config.h in the project and moves relevant internal-only macros into a new file, jconfigint.h. This is to avoid "already defined" warnings in files that were including both config.h (to get the internal autotools package information or the INLINE definition) and jconfig.h.
This patch accomplishes the following:
-- Auto-generates HAVE_LOCALE_H macro and adds it to jconfig.h (this is used by rdjpgcom.c.)
-- Reconciles the description and ordering of macros between config.h.in and jconfig.h.in, so the two files can be easily diffed.
-- Eliminates the use of the autoheader-generated config.h in the project and moves relevant internal-only macros into a new file, jconfigint.h. This is to avoid "already defined" warnings in files that were including both config.h (to get the internal autotools package information or the INLINE definition) and jconfig.h.
Work around an issue in Visual C++ 2010 and 2013 that was causing the 256-bit bitmap test in the regression tests to fail. More specifically, when optimization is enabled in these versions of Visual C++, the optimizer seems to get confused by the following code block:
Each time cur0 is incremented by delta, the compiled code doubles the value of delta (WTF?!) Thus, by the time the end of the block is reached, cur0 is equal to 15 times its former self, not 7 times its former self as it should be. At any rate, it was a lot simpler to just refactor the code so that it uses multiplication.
Work around an issue in Visual C++ 2010 and 2013 that was causing the 256-bit bitmap test in the regression tests to fail. More specifically, when optimization is enabled in these versions of Visual C++, the optimizer seems to get confused by the following code block:
Each time cur0 is incremented by delta, the compiled code doubles the value of delta (WTF?!) Thus, by the time the end of the block is reached, cur0 is equal to 15 times its former self, not 7 times its former self as it should be. At any rate, it was a lot simpler to just refactor the code so that it uses multiplication.
Work around an issue in Visual C++ 2010 and 2013 that was causing the 256-bit bitmap test in the regression tests to fail. More specifically, when optimization is enabled in these versions of Visual C++, the optimizer seems to get confused by the following code block:
Each time cur0 is incremented by delta, the compiled code doubles the value of delta (WTF?!) Thus, by the time the end of the block is reached, cur0 is equal to 15 times its former self, not 7 times its former self as it should be. At any rate, it was a lot simpler to just refactor the code so that it uses multiplication.