From 8940e6ca8628e10b45486fad4572a15b63546eb1 Mon Sep 17 00:00:00 2001
From: DRC <dcommander@users.sourceforge.net>
Date: Sun, 11 May 2014 09:46:28 +0000
Subject: [PATCH] Provide a more thorough description of the trade-offs between
 the various DCT/IDCT algorithms, based on new research

git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1286 632fc199-4ca6-4c93-a231-07263d6284db
---
 README-turbo.txt | 22 +++++++++++-----
 cjpeg.1          | 23 +++++++++++-----
 djpeg.1          | 26 +++++++++++++-----
 libjpeg.txt      | 55 +++++++++++++++++++++++++++++++--------
 usage.txt        | 68 ++++++++++++++++++++++++++++++++++--------------
 5 files changed, 144 insertions(+), 50 deletions(-)

diff --git a/README-turbo.txt b/README-turbo.txt
index b81299f..a94ff97 100755
--- a/README-turbo.txt
+++ b/README-turbo.txt
@@ -419,10 +419,16 @@ details.
 
 For the most part, libjpeg-turbo should produce identical output to libjpeg
 v6b.  The one exception to this is when using the floating point DCT/IDCT, in
-which case the outputs of libjpeg v6b and libjpeg-turbo are not guaranteed to
-be identical (the accuracy of the floating point DCT/IDCT is constant when
-using libjpeg-turbo's SIMD extensions, but otherwise, it can depend heavily on
-the compiler and compiler settings.)
+which case the outputs of libjpeg v6b and libjpeg-turbo can differ for the
+following reasons:
+
+-- The SSE/SSE2 floating point DCT implementation in libjpeg-turbo is ever so
+   slightly more accurate than the implementation in libjpeg v6b, but not by
+   any amount perceptible to human vision (generally in the range of 0.01 to
+   0.08 dB gain in PSNR.)
+-- When not using the SIMD extensions, the accuracy of the floating point
+   DCT/IDCT can depend on the compiler and compiler settings.
+
 
 While libjpeg-turbo does emulate the libjpeg v8 API/ABI, under the hood, it is
 still using the same algorithms as libjpeg v6b, so there are several specific
@@ -430,12 +436,14 @@ cases in which libjpeg-turbo cannot be expected to produce the same output as
 libjpeg v8:
 
 -- When decompressing using scaling factors of 1/2 and 1/4, because libjpeg v8
-   implements those scaling algorithms a bit differently than libjpeg v6b does,
-   and libjpeg-turbo's SIMD extensions are based on the libjpeg v6b behavior.
+   implements those scaling algorithms differently than libjpeg v6b does, and
+   libjpeg-turbo's SIMD extensions are based on the libjpeg v6b behavior.
 
 -- When using chrominance subsampling, because libjpeg v8 implements this
    with its DCT/IDCT scaling algorithms rather than with a separate
-   downsampling/upsampling algorithm.
+   downsampling/upsampling algorithm.  In our testing, the subsampled/upsampled
+   output of libjpeg v8 is less accurate than that of libjpeg v6b for this
+   reason.
 
 -- When using the floating point IDCT, for the reasons stated above and also
    because the floating point IDCT algorithm was modified in libjpeg v8a to
diff --git a/cjpeg.1 b/cjpeg.1
index 113efd5..b4edf62 100644
--- a/cjpeg.1
+++ b/cjpeg.1
@@ -1,4 +1,4 @@
-.TH CJPEG 1 "18 January 2013"
+.TH CJPEG 1 "11 May 2014"
 .SH NAME
 cjpeg \- compress an image file to a JPEG file
 .SH SYNOPSIS
@@ -166,14 +166,25 @@ Use integer DCT method (default).
 .TP
 .B \-dct fast
 Use fast integer DCT (less accurate).
+In libjpeg-turbo, the fast method is generally about 5-15% faster than the int
+method when using the x86/x86-64 SIMD extensions (results may vary with other
+SIMD implementations, or when using libjpeg-turbo without SIMD extensions.)
+For quality levels of 90 and below, there should be little or no perceptible
+difference between the two algorithms.  For quality levels above 90, however,
+the difference between the fast and the int methods becomes more pronounced.
+With quality=97, for instance, the fast method incurs generally about a 1-3 dB
+loss (in PSNR) relative to the int method, but this can be larger for some
+images.  Do not use the fast method with quality levels above 97.  The
+algorithm often degenerates at quality=98 and above and can actually produce a
+more lossy image than if lower quality levels had been used.
 .TP
 .B \-dct float
 Use floating-point DCT method.
-The float method is very slightly more accurate than the int method, but is
-much slower unless your machine has very fast floating-point hardware.  Also
-note that results of the floating-point method may vary slightly across
-machines, while the integer methods should give the same results everywhere.
-The fast integer method is much less accurate than the other two.
+The float method is mostly a legacy feature.  It does not produce significantly
+more accurate results than the int method, and it is much slower.  The float
+method may also give different results on different machines due to varying
+roundoff behavior, whereas the integer methods should give the same results on
+all machines.
 .TP
 .BI \-restart " N"
 Emit a JPEG restart marker every N MCU rows, or every N MCU blocks if "B" is
diff --git a/djpeg.1 b/djpeg.1
index 8bb7d27..d77e7ed 100644
--- a/djpeg.1
+++ b/djpeg.1
@@ -1,4 +1,4 @@
-.TH DJPEG 1 "18 January 2013"
+.TH DJPEG 1 "11 May 2014"
 .SH NAME
 djpeg \- decompress a JPEG file to an image file
 .SH SYNOPSIS
@@ -115,14 +115,28 @@ Use integer DCT method (default).
 .TP
 .B \-dct fast
 Use fast integer DCT (less accurate).
+In libjpeg-turbo, the fast method is generally about 5-15% faster than the int
+method when using the x86/x86-64 SIMD extensions (results may vary with other
+SIMD implementations, or when using libjpeg-turbo without SIMD extensions.)  If
+the JPEG image was compressed using a quality level of 85 or below, then there
+should be little or no perceptible difference between the two algorithms.  When
+decompressing images that were compressed using quality levels above 85,
+however, the difference between the fast and int methods becomes more
+pronounced.  With images compressed using quality=97, for instance, the fast
+method incurs generally about a 4-6 dB loss (in PSNR) relative to the int
+method, but this can be larger for some images.  If you can avoid it, do not
+use the fast method when decompressing images that were compressed using
+quality levels above 97.  The algorithm often degenerates for such images and
+can actually produce a more lossy output image than if the JPEG image had been
+compressed using lower quality levels.
 .TP
 .B \-dct float
 Use floating-point DCT method.
-The float method is very slightly more accurate than the int method, but is
-much slower unless your machine has very fast floating-point hardware.  Also
-note that results of the floating-point method may vary slightly across
-machines, while the integer methods should give the same results everywhere.
-The fast integer method is much less accurate than the other two.
+The float method is mostly a legacy feature.  It does not produce significantly
+more accurate results than the int method, and it is much slower.  The float
+method may also give different results on different machines due to varying
+roundoff behavior, whereas the integer methods should give the same results on
+all machines.
 .TP
 .B \-dither fs
 Use Floyd-Steinberg dithering in color quantization.
diff --git a/libjpeg.txt b/libjpeg.txt
index d110738..afc002b 100644
--- a/libjpeg.txt
+++ b/libjpeg.txt
@@ -3,7 +3,7 @@ USING THE IJG JPEG LIBRARY
 This file was part of the Independent JPEG Group's software:
 Copyright (C) 1994-2011, Thomas G. Lane, Guido Vollbeding.
 Modifications:
-Copyright (C) 2010, D. R. Commander.
+Copyright (C) 2010, 2014, D. R. Commander.
 For conditions of distribution and use, see the accompanying README file.
 
 
@@ -886,14 +886,23 @@ J_DCT_METHOD dct_method
                 JDCT_FLOAT: floating-point method
                 JDCT_DEFAULT: default method (normally JDCT_ISLOW)
                 JDCT_FASTEST: fastest method (normally JDCT_IFAST)
-        The FLOAT method is very slightly more accurate than the ISLOW method,
-        but may give different results on different machines due to varying
-        roundoff behavior.  The integer methods should give the same results
-        on all machines.  On machines with sufficiently fast FP hardware, the
-        floating-point method may also be the fastest.  The IFAST method is
-        considerably less accurate than the other two; its use is not
-        recommended if high quality is a concern.  JDCT_DEFAULT and
-        JDCT_FASTEST are macros configurable by each installation.
+        In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than
+        JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary
+        with other SIMD implementations, or when using libjpeg-turbo without
+        SIMD extensions.)  For quality levels of 90 and below, there should be
+        little or no perceptible difference between the two algorithms.  For
+        quality levels above 90, however, the difference between JDCT_IFAST and
+        JDCT_ISLOW becomes more pronounced.  With quality=97, for instance,
+        JDCT_IFAST incurs generally about a 1-3 dB loss (in PSNR) relative to
+        JDCT_ISLOW, but this can be larger for some images.  Do not use
+        JDCT_IFAST with quality levels above 97.  The algorithm often
+        degenerates at quality=98 and above and can actually produce a more
+        lossy image than if lower quality levels had been used.  JDCT_FLOAT is
+        mostly a legacy feature.  It does not produce significantly more
+        accurate results than the ISLOW method, and it is much slower.  The
+        FLOAT method may also give different results on different machines due
+        to varying roundoff behavior, whereas the integer methods should give
+        the same results on all machines.
 
 J_COLOR_SPACE jpeg_color_space
 int num_components
@@ -1170,8 +1179,32 @@ int actual_number_of_colors
 Additional decompression parameters that the application may set include:
 
 J_DCT_METHOD dct_method
-        Selects the algorithm used for the DCT step.  Choices are the same
-        as described above for compression.
+        Selects the algorithm used for the DCT step.  Choices are:
+                JDCT_ISLOW: slow but accurate integer algorithm
+                JDCT_IFAST: faster, less accurate integer method
+                JDCT_FLOAT: floating-point method
+                JDCT_DEFAULT: default method (normally JDCT_ISLOW)
+                JDCT_FASTEST: fastest method (normally JDCT_IFAST)
+        In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than
+        JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary
+        with other SIMD implementations, or when using libjpeg-turbo without
+        SIMD extensions.)  If the JPEG image was compressed using a quality
+        level of 85 or below, then there should be little or no perceptible
+        difference between the two algorithms.  When decompressing images that
+        were compressed using quality levels above 85, however, the difference
+        between JDCT_IFAST and JDCT_ISLOW becomes more pronounced.  With images
+        compressed using quality=97, for instance, JDCT_IFAST incurs generally
+        about a 4-6 dB loss (in PSNR) relative to JDCT_ISLOW, but this can be
+        larger for some images.  If you can avoid it, do not use JDCT_IFAST
+        when decompressing images that were compressed using quality levels
+        above 97.  The algorithm often degenerates for such images and can
+        actually produce a more lossy output image than if the JPEG image had
+        been compressed using lower quality levels.  JDCT_FLOAT is mostly a
+        legacy feature.  It does not produce significantly more accurate
+        results than the ISLOW method, and it is much slower.  The FLOAT method
+        may also give different results on different machines due to varying
+        roundoff behavior, whereas the integer methods should give the same
+        results on all machines.
 
 boolean do_fancy_upsampling
         If TRUE, do careful upsampling of chroma components.  If FALSE,
diff --git a/usage.txt b/usage.txt
index 14ab77b..b328a21 100644
--- a/usage.txt
+++ b/usage.txt
@@ -172,13 +172,28 @@ Switches for advanced users:
         -dct int        Use integer DCT method (default).
         -dct fast       Use fast integer DCT (less accurate).
         -dct float      Use floating-point DCT method.
-                        The float method is very slightly more accurate than
-                        the int method, but is much slower unless your machine
-                        has very fast floating-point hardware.  Also note that
-                        results of the floating-point method may vary slightly
-                        across machines, while the integer methods should give
-                        the same results everywhere.  The fast integer method
-                        is much less accurate than the other two.
+                        In libjpeg-turbo, the fast method is generally about
+                        5-15% faster than the int method when using the
+                        x86/x86-64 SIMD extensions (results may vary with other
+                        SIMD implementations, or when using libjpeg-turbo
+                        without SIMD extensions.)  For quality levels of 90 and
+                        below, there should be little or no perceptible
+                        difference between the two algorithms.  For quality
+                        levels above 90, however, the difference between
+                        the fast and the int methods becomes more pronounced.
+                        With quality=97, for instance, the fast method incurs
+                        generally about a 1-3 dB loss (in PSNR) relative to
+                        the int method, but this can be larger for some images.
+                        Do not use the fast method with quality levels above
+                        97.  The algorithm often degenerates at quality=98 and
+                        above and can actually produce a more lossy image than
+                        if lower quality levels had been used.  The float
+                        method is mostly a legacy feature.  It does not produce
+                        significantly more accurate results than the int
+                        method, and it is much slower.  The float method may
+                        also give different results on different machines due
+                        to varying roundoff behavior, whereas the integer
+                        methods should give the same results on all machines.
 
         -restart N      Emit a JPEG restart marker every N MCU rows, or every
                         N MCU blocks if "B" is attached to the number.
@@ -296,13 +311,32 @@ Switches for advanced users:
         -dct int        Use integer DCT method (default).
         -dct fast       Use fast integer DCT (less accurate).
         -dct float      Use floating-point DCT method.
-                        The float method is very slightly more accurate than
-                        the int method, but is much slower unless your machine
-                        has very fast floating-point hardware.  Also note that
-                        results of the floating-point method may vary slightly
-                        across machines, while the integer methods should give
-                        the same results everywhere.  The fast integer method
-                        is much less accurate than the other two.
+                        In libjpeg-turbo, the fast method is generally about
+                        5-15% faster than the int method when using the
+                        x86/x86-64 SIMD extensions (results may vary with other
+                        SIMD implementations, or when using libjpeg-turbo
+                        without SIMD extensions.)  If the JPEG image was
+                        compressed using a quality level of 85 or below, then
+                        there should be little or no perceptible difference
+                        between the two algorithms.  When decompressing images
+                        that were compressed using quality levels above 85,
+                        however, the difference between the fast and int
+                        methods becomes more pronounced.  With images
+                        compressed using quality=97, for instance, the fast
+                        method incurs generally about a 4-6 dB loss (in PSNR)
+                        relative to the int method, but this can be larger for
+                        some images.  If you can avoid it, do not use the fast
+                        method when decompressing images that were compressed
+                        using quality levels above 97.  The algorithm often
+                        degenerates for such images and can actually produce
+                        a more lossy output image than if the JPEG image had
+                        been compressed using lower quality levels.  The float
+                        method is mostly a legacy feature.  It does not produce
+                        significantly more accurate results than the int
+                        method, and it is much slower.  The float method may
+                        also give different results on different machines due
+                        to varying roundoff behavior, whereas the integer
+                        methods should give the same results on all machines.
 
         -dither fs      Use Floyd-Steinberg dithering in color quantization.
         -dither ordered Use ordered dithering in color quantization.
@@ -381,12 +415,6 @@ When producing a color-quantized image, "-onepass -dither ordered" is fast but
 much lower quality than the default behavior.  "-dither none" may give
 acceptable results in two-pass mode, but is seldom tolerable in one-pass mode.
 
-If you are fortunate enough to have very fast floating point hardware,
-"-dct float" may be even faster than "-dct fast".  But on most machines
-"-dct float" is slower than "-dct int"; in this case it is not worth using,
-because its theoretical accuracy advantage is too small to be significant
-in practice.
-
 Two-pass color quantization requires a good deal of memory; on MS-DOS machines
 it may run out of memory even with -maxmemory 0.  In that case you can still
 decompress, with some loss of image quality, by specifying -onepass for
-- 
2.40.0