author    Tim Peters <tim.peters@gmail.com>
          Mon, 12 Aug 2002 18:25:43 +0000 (18:25 +0000)
committer Tim Peters <tim.peters@gmail.com>
          Mon, 12 Aug 2002 18:25:43 +0000 (18:25 +0000)

x_mul(): Made life easier for C optimizers in the "grade school"
algorithm.  MSVC 6 wasn't impressed <wink>.

Something odd:  the x_mul algorithm appears to get substantially worse
than quadratic time as the inputs grow larger:

bits in each input   x_mul time   k_mul time
------------------   ----------   ----------
             15360         0.01         0.00
             30720         0.04         0.01
             61440         0.16         0.04
            122880         0.64         0.14
            245760         2.56         0.40
            491520        10.76         1.23
            983040        71.28         3.69
           1966080       459.31        11.07

That is, x_mul is perfectly quadratic-time until a little burp at
2.56->10.76, and after that goes to hell in a hurry.  Under Karatsuba,
doubling the input size "should take" 3 times longer instead of 4, and
that remains the case throughout this range.  I conclude that my "be nice
to the cache" reworkings of k_mul() are paying.
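
The doubling argument above can be checked against the table with a small sketch. Each row doubles the input size, so the ratio of consecutive timings should sit near 4 for a quadratic algorithm and near 3 for Karatsuba. The helper name growth_ratio is hypothetical, and the arrays simply restate the table's columns.

```c
#include <assert.h>
#include <stddef.h>

/* Timings in seconds, copied from the table above; each row doubles the
   input size of the row before it. */
const double x_mul_time[] = {0.01, 0.04, 0.16, 0.64, 2.56, 10.76, 71.28, 459.31};
const double k_mul_time[] = {0.00, 0.01, 0.04, 0.14, 0.40, 1.23, 3.69, 11.07};

/* Hypothetical helper: time of row i divided by time of row i-1.
   Returns -1.0 when the earlier timing is zero (below timer resolution),
   since no meaningful ratio exists in that case. */
double growth_ratio(const double *t, size_t i)
{
    assert(i > 0);
    return t[i - 1] > 0.0 ? t[i] / t[i - 1] : -1.0;
}
```

For instance, growth_ratio(x_mul_time, 6) is 71.28/10.76, roughly 6.6 and well above the quadratic 4, while growth_ratio(k_mul_time, 7) is 11.07/3.69, roughly 3.0.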

Objects/longobject.c
index 0eefc90b1ea3df21a4c84b2d9200eec6d18a8a04..0801e6498183c7c63355cbda05517710282e4573 100644
@@ -1539,20 +1539,21 @@ x_mul(PyLongObject *a, PyLongObject *b)
                twodigits carry = 0;
                twodigits f = a->ob_digit[i];
                int j;
+               digit *pz = z->ob_digit + i;
 
                SIGCHECK({
                        Py_DECREF(z);
                        return NULL;
                })
                for (j = 0; j < size_b; ++j) {
-                       carry += z->ob_digit[i+j] + b->ob_digit[j] * f;
-                       z->ob_digit[i+j] = (digit) (carry & MASK);
+                       carry += *pz + b->ob_digit[j] * f;
+                       *pz++ = (digit) (carry & MASK);
                        carry >>= SHIFT;
                }
                for (; carry != 0; ++j) {
                        assert(i+j < z->ob_size);
-                       carry += z->ob_digit[i+j];
-                       z->ob_digit[i+j] = (digit) (carry & MASK);
+                       carry += *pz;
+                       *pz++ = (digit) (carry & MASK);
                        carry >>= SHIFT;
                }
        }
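
To try the pointer-walking loop from the hunk above outside CPython, something like the following standalone sketch may help. It is not the code from longobject.c: the typedefs, the 15-bit SHIFT, and the function name schoolbook_mul are assumptions made for illustration, and plain arrays stand in for PyLongObject.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins (assumed, not CPython's actual definitions). */
typedef uint16_t digit;      /* holds one 15-bit digit */
typedef uint32_t twodigits;  /* wide enough for digit*digit plus carries */
#define SHIFT 15
#define MASK ((twodigits)((1u << SHIFT) - 1))

/* Grade-school multiply: z[0 .. size_a+size_b-1] = a * b, with the
   least-significant digit first.  z must not overlap a or b. */
void schoolbook_mul(const digit *a, size_t size_a,
                    const digit *b, size_t size_b,
                    digit *z)
{
    size_t i, j;

    for (i = 0; i < size_a + size_b; ++i)
        z[i] = 0;

    for (i = 0; i < size_a; ++i) {
        twodigits carry = 0;
        twodigits f = a[i];
        digit *pz = z + i;   /* walk the output instead of indexing z[i+j] */

        /* Add f * b into z, starting at digit position i. */
        for (j = 0; j < size_b; ++j) {
            carry += *pz + b[j] * f;
            *pz++ = (digit)(carry & MASK);
            carry >>= SHIFT;
        }
        /* Propagate whatever carry is left into the higher digits. */
        for (; carry != 0; ++j) {
            assert(i + j < size_a + size_b);
            carry += *pz;
            *pz++ = (digit)(carry & MASK);
            carry >>= SHIFT;
        }
    }
}
```

The only difference from an indexed version is that pz is advanced once per digit, so the compiler does not have to recompute the address of z[i+j] on every iteration. Whether that helps depends on the optimizer, as the commit message notes for MSVC 6.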