Tim Peters [Mon, 12 Aug 2002 22:01:34 +0000 (22:01 +0000)]
Added new function k_lopsided_mul(), which is much more efficient than
k_mul() when inputs have vastly different sizes, and a little more
efficient when they're close to a factor of 2 out of whack.
I consider this done now, although I'll set up some more correctness
tests to run overnight.
Tim Peters [Mon, 12 Aug 2002 19:30:26 +0000 (19:30 +0000)]
k_mul(): White-box testing turned up that (ah+al)*(bh+bl) can, in rare
cases, overflow the allocated result object by 1 bit. In such cases,
it would have been brought back into range if we subtracted al*bl and
ah*bh from it first, but I don't want to do that because it hurts cache
behavior. Instead we just ignore the excess bit when it appears -- in
effect, this is forcing unsigned mod BASE**(asize + bsize) arithmetic
in a case where that doesn't happen all by itself.
Guido van Rossum [Mon, 12 Aug 2002 19:05:44 +0000 (19:05 +0000)]
Refactor how __dict__ and __weakref__ interact with __slots__.
1. You can now have __dict__ and/or __weakref__ in your __slots__
(before only __weakref__ was supported). This is treated
differently than before: it merely sets a flag that the object
should support the corresponding magic.
2. Dynamic types now always have descriptors __dict__ and __weakref__
thrust upon them. If the type in fact does not support one or the
other, that descriptor's __get__ method will raise AttributeError.
3. (This is the reason for all this; it fixes SF bug 575229, reported
by Cesar Douady.) Given this code:
class A(object): __slots__ = []
class B(object): pass
class C(A, B): __slots__ = []
the class object for C was broken; its size was less than that of
B, and some descriptors on B could cause a segfault. C now
correctly inherits __weakrefs__ and __dict__ from B, even though A
is the "primary" base (C.__base__ is A).
Tim Peters [Mon, 12 Aug 2002 18:25:43 +0000 (18:25 +0000)]
x_mul(): Made life easier for C optimizers in the "grade school"
algorithm. MSVC 6 wasn't impressed <wink>.
Something odd: the x_mul algorithm appears to get substantially worse
than quadratic time as the inputs grow larger:
bits in each input x_mul time k_mul time
------------------ ---------- ----------
15360 0.01 0.00
30720 0.04 0.01
61440 0.16 0.04
122880 0.64 0.14
245760 2.56 0.40
491520 10.76 1.23
983040 71.28 3.69 1966080 459.31 11.07
That is, x_mul is perfectly quadratic-time until a little burp at
2.56->10.76, and after that goes to hell in a hurry. Under Karatsuba,
doubling the input size "should take" 3 times longer instead of 4, and
that remains the case throughout this range. I conclude that my "be nice
to the cache" reworkings of k_mul() are paying.
Tim Peters [Mon, 12 Aug 2002 17:36:03 +0000 (17:36 +0000)]
k_mul() and long_mul(): I'm confident that the Karatsuba algorithm is
correct now, so added some final comments, did some cleanup, and enabled
it for all long-int multiplies. The KARAT envar no longer matters,
although I left some #if 0'ed code in there for my own use (temporary).
k_mul() is still much slower than x_mul() if the inputs have very
differenent sizes, and that still needs to be addressed.
Tim Peters [Mon, 12 Aug 2002 15:08:20 +0000 (15:08 +0000)]
k_mul: Rearranged computation for better cache use. Ignored overflow
(it's possible, but should be harmless -- this requires more thought,
and allocating enough space in advance to prevent it requires exactly
as much thought, to know exactly how much that is -- the end result
certainly fits in the allocated space -- hmm, but that's really all
the thought it needs! borrows/carries out of the high digits really
are harmless).
Tim Peters [Mon, 12 Aug 2002 06:17:58 +0000 (06:17 +0000)]
x_mul(): This failed to normalize its result.
k_mul(): This didn't allocate enough result space when one input had
more than twice as many bits as the other. This was partly hidden by
that x_mul() didn't normalize its result.
The Karatsuba recurrence is pretty much hosed if the inputs aren't
roughly the same size. If one has at least twice as many bits as the
other, we get a degenerate case where the "high half" of the smaller
input is 0. Added a special case for that, for speed, but despite that
it helped, this can still be much slower than the "grade school" method.
It seems to take a really wild imbalance to trigger that; e.g., a
2**22-bit input times a 1000-bit input on my box runs about twice as slow
under k_mul than under x_mul. This still needs to be addressed.
I'm also not sure that allocating a->ob_size + b->ob_size digits is
enough, given that this is computing k = (ah+al)*(bh+bl) instead of
k = (ah-al)*(bl-bh); i.e., it's certainly enough for the final result,
but it's vaguely possible that adding in the "artificially" large k may
overflow that temporarily. If so, an assert will trigger in the debug
build, but we'll probably compute the right result anyway(!).
Tim Peters [Mon, 12 Aug 2002 05:09:36 +0000 (05:09 +0000)]
Introduced helper functions v_iadd and v_isub, for in-place digit-vector
addition and subtraction. Reworked the tail end of k_mul() to use them.
This saves oodles of one-shot longobject allocations (this is a triply-
recursive routine, so saving one allocation in the body saves 3**n
allocations at depth n; we actually save 2 allocations in the body).
Tim Peters [Mon, 12 Aug 2002 02:31:19 +0000 (02:31 +0000)]
Cautious introduction of a patch that started from
SF 560379: Karatsuba multiplication.
Lots of things were changed from that. This needs a lot more testing,
for correctness and speed, the latter especially when bit lengths are
unbalanced. For now, the Karatsuba code gets invoked if and only if
envar KARAT exists.
Ka-Ping Yee [Sun, 11 Aug 2002 15:11:33 +0000 (15:11 +0000)]
Extend stripid() to handle strings ending in more than one '>'.
Add resolve() to handle looking up objects and names (fix SF bug 586931).
Add a nicer error message when given a filename that doesn't exist.
Guido van Rossum [Sun, 11 Aug 2002 14:06:15 +0000 (14:06 +0000)]
Reset errno to zero after calling PyErr_Warn(). It can potentially do
a lot of work, including I/O (e.g. to import warnings.py), which might
affect errno.
Guido van Rossum [Sun, 11 Aug 2002 04:24:12 +0000 (04:24 +0000)]
Implement stage B0 of PEP 237: add warnings for operations that
currently return inconsistent results for ints and longs; in
particular: hex/oct/%u/%o/%x/%X of negative short ints, and x<<n that
either loses bits or changes sign. (No warnings for repr() of a long,
though that will also change to lose the trailing 'L' eventually.)
This introduces some warnings in the test suite; I'll take care of
those later.
Tim Peters [Sat, 10 Aug 2002 21:20:54 +0000 (21:20 +0000)]
If any trash happened to be sitting around waiting to get collected at
the time it's called, test_saveall() made it look a leak, triggering
bogus warnings from regrtest's -l (findleaks) mode.
Tim Peters [Sat, 10 Aug 2002 05:21:15 +0000 (05:21 +0000)]
1. Combined the base and length arrays into a single array of structs.
This is friendlier for caches.
2. Cut MIN_GALLOP to 7, but added a per-sort min_gallop vrbl that adapts
the "get into galloping mode" threshold higher when galloping isn't
paying, and lower when it is. There's no known case where this hurts.
It's (of course) neutral for /sort, \sort and =sort. It also happens
to be neutral for !sort. It cuts a tiny # of compares in 3sort and +sort.
For *sort, it reduces the # of compares to better than what this used to
do when MIN_GALLOP was hardcoded to 10 (it did about 0.1% more *sort
compares before, but given how close we are to the limit, this is "a
lot"!). %sort used to do about 1.5% more compares, and ~sort about
3.6% more. Here are exact counts:
i *sort 3sort +sort %sort ~sort !sort
15 449235 33019 33016 51328 188720 65534 before
448885 33016 33007 50426 182083 65534 after
0.08% 0.01% 0.03% 1.79% 3.65% 0.00% %ch from after
Tim Peters [Sat, 10 Aug 2002 03:04:33 +0000 (03:04 +0000)]
The samplesort-vs-mergesort #-of-comparisons comparisons were captured
before %sort was introduced. Redid them (the numbers change, but the
conclusions don't). Also did the samplesort counts with the released
2.2.1, as they're slightly different under the last CVS 2.3 samplesort
(some higher, some lower -- CVS had been changed to stop doing the
special-case business on recursive samplesort calls).
Fred Drake [Fri, 9 Aug 2002 20:20:50 +0000 (20:20 +0000)]
Lots of changes to the packaging of the documentation, all to keep
directories clean where the packages are unpacked. Each package now
contains a single directory, Python-Docs-<version>/, which contains the
files for that version of the documentation.
Tim Peters [Fri, 9 Aug 2002 18:13:51 +0000 (18:13 +0000)]
There's no distinction among 'user', 'group' and 'world' permissions
on Win32, so tests that assume there are such distinctions can't
pass. Fiddled them to work.
Massive changes from SF 589982 (tempfile.py rewrite, by Zack
Weinberg). This changes all uses of deprecated tempfile functions to
the recommended ones.
Check-in of the most essential parts of SF 589982 (tempfile.py
rewrite, by Zack Weinberg). This replaces most code in tempfile.py
(please review!!!) and adds extensive unit tests for it.
This will cause some warnings in the test suite; I'll check those in
soon, and also the docs.
Only call sq_repeat if the object does not have a nb_multiply slot. One
example of where this changes behavior is when a new-style instance
defines '__mul__' and '__rmul__' and is multiplied by an int. Before the
change the '__rmul__' method is never called, even if the int is the
left operand.
Steve Purcell [Fri, 9 Aug 2002 09:46:23 +0000 (09:46 +0000)]
Fix to ensure consistent 'repr' and 'str' results between Python
versions, since 'repr(new_style_class) != repr(classic_class)'.
Suggested by Jeremy Hylton.
Major speedup for new-style class creation. Turns out there was some
trampolining going on with the tp_new descriptor, where the inherited
PyType_GenericNew was overwritten with the much slower slot_tp_new
which would end up calling tp_new_wrapper which would eventually call
PyType_GenericNew. Add a special case for this to update_one_slot().
XXX Hope there isn't a loophole in this. I'll buy the first person to
point out a bug in the reasoning a beer.
Moved inplace add and multiply methods from UserString to MutableString.
Closes SF Bug #592573 where inplace add mutated a UserString.
Added unittests to verify the bug is cleared.
Moved special case for tuples from iterobject.c to
tupleobject.c. Makes the code in iterobject.c cleaner
and speeds-up the general case by not checking for
tuples everytime. SF Patch #592065.
Revised the test suite for 'contains' to use the test() function argument
rather than vereq(). While it was effectively testing regular strings, it
ignored the test() function argument when called by test_userstring.py.
Jack Jansen [Fri, 9 Aug 2002 00:18:21 +0000 (00:18 +0000)]
By popular demand the frameworkinstall target now installs everything:
the framework, the MacOSX apps and the unix tools.
Most of the hard work is done by Mac/OSX/Makefile.
Also, it should now be possible to install in a different directory,
such as /tmp/dist/Library/Frameworks, for building binary installers.
The fink crowd wanted this.
Significant speedup in new-style object creation: in slot_tp_new(),
intern the string "__new__" so we can call PyObject_GetAttr() rather
than PyObject_GetAttrString(). (Though it's a mystery why slot_tp_new
is being called when a class doesn't define __new__. I'll look into
that tomorrow.)
Jack Jansen [Thu, 8 Aug 2002 21:16:56 +0000 (21:16 +0000)]
Use hex escape for non-ascii chars, now that the parser wants that.
Good thing, too: some of the characters had been mangled by OS9->CVS->OSX
roundtrips.
A modest speedup of object deallocation. call_finalizer() did rather
a lot of work: it had to save and restore the current exception around
a call to lookup_maybe(), because that could fail in rare cases, and
most objects don't have a __del__ method, so the whole exercise was
usually a waste of time. Changed this to cache the __del__ method in
the type object just like all other special methods, in a new slot
tp_del. So now subtype_dealloc() can test whether tp_del is NULL and
skip the whole exercise if it is. The new slot doesn't need a new
flag bit: subtype_dealloc() is only called if the type was dynamically
allocated by type_new(), so it's guaranteed to have all current slots.
Types defined in C cannot fill in tp_del with a function of their own,
so there's no corresponding "wrapper". (That functionality is already
available through tp_dealloc.)
The other half of the patches added to SF patch 555085 by A I
MacIntyre. At least on OS/2, a subsequent connect() on a nonblocking
socket returns errno==EISCONN to indicate success. This seems
harmless on Unix.
Clean up some docstrings. Some docstrings didn't show their return
value; others were inconsistent in what to name the argument or return
value; a few module-global functions had "socket." in front of their
name, against convention.
testSendAll(): loop until all data is read; this was necessary at
least on OS/2 (see note on SF patch 555085 by A I MacIntyre) but
looks like the test *could* fail on any other platform too -- there's
no guarantee that recv() reads all data.
The _socketobject class has no need for a __del__ method: all it did was
to delete the reference to self._sock, and the regular destructor will
do that just fine. This made some hacks in close() unnecessary.
The _fileobject class still has a __del__ method, because it must flush.