Fred Drake [Sun, 3 Jun 2001 03:12:57 +0000 (03:12 +0000)]
Explained more differences between PyList_SetItem() and PyList_SET_ITEM().
In particular, the effect on existing list content was not sufficiently
explained.
Tim Peters [Sat, 2 Jun 2001 08:02:56 +0000 (08:02 +0000)]
Coredumpers from Michael Hudson, mutating dicts while printing or
converting to string.
Critical bugfix candidate -- if you take this seriously <wink>.
Tim Peters [Sat, 2 Jun 2001 05:27:19 +0000 (05:27 +0000)]
New collision resolution scheme: no polynomials, simpler, faster, less
code, less memory. Tests have uncovered no drawbacks. Christian and
Vladimir are the other two people who have burned many brain cells on the
dict code in recent years, and they like the approach too, so I'm checking
it in without further ado.
Tim Peters [Tue, 29 May 2001 22:18:09 +0000 (22:18 +0000)]
This division test was too stringent in its accuracy expectations for
random inputs: if you ran the test 100 times, you could expect it to
report a bogus failure. So loosened its expectations.
Also changed the way failing tests are printed, so that when run under
regrtest.py we get enough info to reproduce the failure.
Tim Peters [Tue, 29 May 2001 21:14:32 +0000 (21:14 +0000)]
BadDictKey test: The output file expected "raising error" to be printed
exactly once. But the test code can't know that, as the number of times
__cmp__ is called depends on internal details of the dict implementation.
This is especially nasty because the __hash__ method returns the address
of the class object, so the hash codes seen by the dict can vary across
runs, causing the dict to use a different probe order across runs. I
just happened to see this test fail about 1 run in 7 today, but only
under a release build and when passing -O to Python. So, changed the test
to be predictable across runs.
Fred Drake [Tue, 29 May 2001 19:53:46 +0000 (19:53 +0000)]
New solution to the "Someone stuck a colon in that filename!" problem:
Allow colons in the labels used for internal references, but do not
expose them when generating filenames.
Fred Drake [Tue, 29 May 2001 18:51:41 +0000 (18:51 +0000)]
Users of PySequence_GET_FAST() should get the length of the sequence using
PySequence_Size(), not PyObject_Size(): the latter considers the mapping
methods as well as the sequence methods, which is not needed here. Either
should be equally fast in this case, but PySequence_Size() offers a better
conceptual match.
Fred Drake [Tue, 29 May 2001 18:13:06 +0000 (18:13 +0000)]
readlink() description: Added note that the return value may be either
absolute or relative.
remove(), rename() descriptions: Give more information about the cross-
platform behavior of these functions, so single-platform developers
can be aware of the potential issues when writing portable code.
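A minimal sketch of what the readlink() note implies for callers: since the result may be relative, it has to be interpreted against the directory that holds the link. The helper name resolve_link_target() is hypothetical, and its `target` argument stands in for whatever os.readlink() returned.

```python
import os.path

def resolve_link_target(link_path, target):
    """Return a normalized path for `target`, read from the link at `link_path`.

    `target` stands in for the value returned by os.readlink(link_path).
    """
    # os.path.join() discards the directory part when `target` is absolute,
    # so absolute and relative results are handled by the same call.
    return os.path.normpath(os.path.join(os.path.dirname(link_path), target))
```

A caller would typically use it as resolve_link_target(p, os.readlink(p)).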
Jeremy Hylton [Tue, 29 May 2001 17:46:19 +0000 (17:46 +0000)]
Change cascaded if stmts to switch stmt in vgetargs1().
In the default branch, keep three ifs that are used if level == 0, the
most common case. Note that the first if here is a slight optimization
for the 'O' format.
Jeremy Hylton [Tue, 29 May 2001 17:37:05 +0000 (17:37 +0000)]
Internal refactoring of convertsimple() and friends.
Note that lots of code was re-indented.
Replace two-step of convertsimple() and convertsimple1() with
convertsimple() and helper converterr(), which is called to format
error messages when convertsimple() fails. The old code did all the
real work in convertsimple1(), but deferred error message formatting
to convertsimple(). The result was paying the price of a second
function call on every call just to format error messages in the
failure cases.
Factor out the buffer-handling code in convertsimple() and package
it as convertbuffer().
Add two macros to ease readability of Unicode conversions,
UNICODE_DEFAULT_ENCODING() and CONV_UNICODE, an error string.
The convertsimple() routine had awful indentation problems, primarily
because there were two tabs between the case line and the body of the
case statements. This patch reformats the entire function to have a
single tab between case line and case body, which makes the code
easier to read (and consistent with ceval). The introduction of
converterr() exacerbated the problem and prompted this fix.
Also, eliminate non-standard whitespace after opening paren and before
closing paren in a few if statements.
Fred Drake [Tue, 29 May 2001 17:10:51 +0000 (17:10 +0000)]
runtest(): When generating output, if the result is a single line with the
name of the test, only write the output file if it already exists (and
tell the user to consider removing it). This avoids the generation of
unnecessary turds.
Fred Drake [Tue, 29 May 2001 16:10:07 +0000 (16:10 +0000)]
Hack to make this play nicer with *old* versions of Python: os.path.abspath()
was not available in Python 1.5.1. (Yes, a user actually tried to use this
with that version of Python!)
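The log does not show the hack itself. One plausible shape for such a fallback, with the hypothetical name abspath_compat(), builds the result from pieces that older releases already had:

```python
import os

def abspath_compat(path):
    """Hypothetical stand-in for os.path.abspath() on interpreters that lack it."""
    if not os.path.isabs(path):
        # Anchor relative paths at the current working directory.
        path = os.path.join(os.getcwd(), path)
    # Collapse "." and ".." components and redundant separators.
    return os.path.normpath(path)
```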
Fred Drake [Tue, 29 May 2001 16:02:35 +0000 (16:02 +0000)]
Bring the notes on the relationship between __cmp__(), __eq__(), and
__hash__() up to date (re: use of objects which define these methods
as dictionary keys).
Fred Drake [Tue, 29 May 2001 15:34:06 +0000 (15:34 +0000)]
Removed information on the old third parameter to _PyTuple_Resize().
Added information on PyIter_Check(), PyIter_Next(),
PyObject_Unicode(), PyString_AsDecodedObject(),
PyString_AsEncodedObject(), and PyThreadState_GetDict().
Fred Drake [Tue, 29 May 2001 15:13:00 +0000 (15:13 +0000)]
Do not start API descriptions with "Does the same, but ..." -- actually
state *which* other function the current one is like, even if the
descriptions are adjacent.
Revise the _PyTuple_Resize() description to reflect the removal of the
third parameter.
Tim Peters [Tue, 29 May 2001 04:27:01 +0000 (04:27 +0000)]
Patch from Gordon McMillan.
updatecache(): When using imputil, sys.path may contain things other than
strings. Ignore such things instead of blowing up.
Hard to say whether this is a bugfix or a feature ...
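A rough sketch of the idea in updatecache(), not the actual patch: skip any search-path entry that is not a string rather than letting the path join fail. The function name candidate_filenames() and its arguments are illustrative only.

```python
import os

def candidate_filenames(basename, path_entries):
    """Yield basename joined onto each string entry, ignoring non-strings."""
    for entry in path_entries:
        if not isinstance(entry, str):
            # For example an importer object placed on sys.path; ignore it.
            continue
        yield os.path.join(entry, basename)
```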
Thomas Wouters [Mon, 28 May 2001 13:11:02 +0000 (13:11 +0000)]
_PyTuple_Resize: take into account the empty tuple. There can be only one.
Instead of raising a SystemError, just create a new tuple of the desired
size.
Tim Peters [Sun, 27 May 2001 07:39:22 +0000 (07:39 +0000)]
Implement an old idea of Christian Tismer's: use polynomial division
instead of multiplication to generate the probe sequence. The idea is
recorded in Python-Dev for Dec 2000, but that version is prone to rare
infinite loops.
The value is in getting *all* the bits of the hash code to participate;
and, e.g., this speeds up querying every key in a dict with keys
[i << 16 for i in range(20000)] by a factor of 500. Should be equally
valuable in any bad case where the high-order hash bits were getting
ignored.
Also wrote up some of the motivations behind Python's ever-more-subtle
hash table strategy.
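To see why keys such as [i << 16 for i in range(20000)] are a bad case when only the low-order hash bits pick the first slot, consider this small illustration. The table size of 2**15 slots is an assumption made for the example; the point is only that the low 16 bits of every key are zero.

```python
keys = [i << 16 for i in range(20000)]
table_size = 1 << 15          # assumed size, large enough for 20000 keys
mask = table_size - 1

# For small non-negative ints, hash(k) == k in CPython.
initial_slots = {hash(k) & mask for k in keys}
print(len(initial_slots))     # 1: every key starts probing at slot 0
```

Unless the probe sequence then draws on the high-order bits, all 20000 keys keep colliding with one another.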
Jack Jansen [Sat, 26 May 2001 20:01:41 +0000 (20:01 +0000)]
When reading from stdin (with the dialog box) use any partial line on
stdout as the prompt. This makes raw_input() and print "xxx", ; sys.stdin.readline() a bit more palatable.
Tim Peters [Sat, 26 May 2001 05:28:40 +0000 (05:28 +0000)]
roundupsize() and friends: fiddle over-allocation strategy for list
resizing.
Accurate timings are impossible on my Win98SE box, but this is obviously
faster even on this box for reasonable list.append() cases. I give
credit for this not to the resizing strategy but to getting rid of integer
multiplication and division (in favor of shifting) when computing the
rounded-up size.
For unreasonable list.append() cases, Win98SE now displays linear behavior
for one-at-a-time appends up to a list with about 35 million elements. Then
it dies with a MemoryError, due to fatally fragmented *address space*
(there's plenty of VM available, but by this point Win9X has broken user
space into many distinct heaps none of which has enough contiguous space
left to resize the list, and for whatever reason Win9x isn't coalescing
the dead heaps). Before the patch it got a MemoryError for the same
reason, but once the list reached about 2 million elements.
Haven't yet tried on Win2K but have high hopes extreme list.append()
will be much better behaved now (NT & Win2K didn't fragment address space,
but suffered obvious quadratic-time behavior before as lists got large).
For other systems I'm relying on common sense: replacing integer * and /
by << and >> can't plausibly hurt, the number of function calls hasn't
changed, and the total operation count for reasonably small lists is about
the same (while the operations are cheaper now).
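The swap of * and / for << and >> can be illustrated in general terms. The two helpers below are not roundupsize() itself, only a sketch of rounding up to a multiple of a power of two both ways; their names and the parameter k are made up for the example.

```python
def round_up_mul_div(n, k):
    """Round n up to a multiple of 2**k using multiplication and division."""
    step = 2 ** k
    return ((n + step - 1) // step) * step

def round_up_shift(n, k):
    """Same result using only addition and shifts."""
    return ((n + (1 << k) - 1) >> k) << k
```

Both return 16 for n=10, k=3; the shift version avoids the integer multiply and divide.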
Fred Drake [Fri, 25 May 2001 04:24:37 +0000 (04:24 +0000)]
Add descriptions of {}.iteritems(), {}.iterkeys(), and {}.itervalues()
in the table of mapping object operations. Re-numbered the list of
notes to reflect the move of the "Added in version 2.2." note to the list
of notes instead of being inserted into the last column of the table.
Tim Peters [Thu, 24 May 2001 16:26:40 +0000 (16:26 +0000)]
dictresize(): Rebuild small tables if there are any dummies, not just if
they're entirely full. Not a question of correctness, but of temporarily
misplaced common sense.
Tim Peters [Wed, 23 May 2001 23:33:57 +0000 (23:33 +0000)]
Jack Jansen hit a bug in the new dict code, reported on python-dev.
dictresize() was too aggressive about never ever resizing small dicts.
If a small dict is entirely full, it needs to be rebuilt even though
it won't actually be resized, in order to purge old dummy entries, thus
creating at least one virgin slot (lookdict assumes at least one such
exists).
Also took the opportunity to add some high-level comments to dictresize.
Barry Warsaw [Wed, 23 May 2001 16:59:45 +0000 (16:59 +0000)]
write(): Do two levels of sorting: first sort the individual location
tuples by filename/lineno, then sort the catalog entries by their
location tuples.
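The two-level sort can be sketched as follows, assuming a catalog that maps each message id to a list of (filename, lineno) tuples. The data layout and the name sorted_catalog() are assumptions for illustration, not the tool's actual structures.

```python
def sorted_catalog(catalog):
    """Return (locations, msgid) pairs, sorted at both levels."""
    entries = []
    for msgid, locations in catalog.items():
        # Level 1: order each entry's locations by filename, then lineno.
        entries.append((sorted(locations), msgid))
    # Level 2: order the entries by their (now sorted) location lists.
    entries.sort()
    return entries
```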
Guido van Rossum [Wed, 23 May 2001 13:24:30 +0000 (13:24 +0000)]
When Tim untabified this file, his editor accidentally assumed 4-space
tabs. The title was centered using 8-byte tabs, however, and the
result looked strange. Fixed this.
Tim Peters [Wed, 23 May 2001 07:46:36 +0000 (07:46 +0000)]
Remove test_doctest's expected-output file.
Change test_doctest and test_difflib to pass regrtest's notion of
verbosity on to doctest.
Add explanation for a dozen "new" things to test/README.
Tim Peters [Wed, 23 May 2001 01:45:19 +0000 (01:45 +0000)]
Remove test_difflib's output file and change test_difflib to stop
generating it. Since this is purely a doctest, the output file never
served a good purpose.
Jack Jansen [Tue, 22 May 2001 21:56:42 +0000 (21:56 +0000)]
Lots more Carbon/Carbon.h includes, new UPP routine names, function prototypes. Most toolbox modules now compile, link and import in MacOSX-MachO python.
Tim Peters [Tue, 22 May 2001 20:40:22 +0000 (20:40 +0000)]
SF patch #425242: Patch which "inlines" small dictionaries.
The idea is Marc-Andre Lemburg's, the implementation is Tim's.
Add a new ma_smalltable member to dictobjects, an embedded vector of
MINSIZE (8) dictentry structs. Short course is that this lets us avoid
additional malloc(s) for dicts with no more than 5 entries.
The changes are widespread but mostly small.
Long course: WRT speed, all scalar operations (getitem, setitem, delitem)
on non-empty dicts benefit from no longer needing NULL-pointer checks
(ma_table is never NULL anymore). Bulk operations (copy, update, resize,
clearing slots during dealloc) benefit in some cases from now looping
on the ma_fill count rather than on ma_size, but that was an unexpected
benefit: the original reason to loop on ma_fill was to let bulk
operations on empty dicts end quickly (since the NULL-pointer checks
went away, empty dicts aren't special-cased any more).
Special considerations:
For dicts that remain empty, this change is a lose on two counts:
the dict object contains 8 new dictentry slots now that weren't
needed before, and dict object creation also spends time memset'ing
these doomed-to-be-unused slots to NULLs.
For dicts with one or two entries that never get larger than 2, it's
a mix: a malloc()/free() pair is no longer needed, and the 2-entry case
gets to use 8 slots (instead of 4) thus decreasing the chance of
collision. Against that, dict object creation spends time memset'ing
4 slots that aren't strictly needed in this case.
For dicts with 3 through 5 entries that never get larger than 5, it's a
pure win: the dict is created with all the space they need, and they
never need to resize. Before they suffered two malloc()/free() calls,
plus 1 dict resize, to get enough space. In addition, the 8-slot
table they ended with consumed more memory overall, because of the
hidden overhead due to the additional malloc.
For dicts with 6 or more entries, the ma_smalltable member is wasted
space, but then these are large(r) dicts so 8 slots more or less doesn't
make much difference. They still benefit all the time from removing
ubiquitous dynamic null-pointer checks, and get a small benefit (but
relatively smaller the larger the dict) from not having to do two
mallocs, two frees, and a resize on the way *to* getting their sixth
entry.
All in all it appears a small but definite general win, with larger
benefits in specific cases. It's especially nice that it allowed us to
get rid of several branches, gotos and labels, and overall made the
code smaller.
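The figure of "no more than 5 entries" for an 8-slot table is consistent with a resize rule that fires once the table is at least two-thirds full. That rule is an assumption here, chosen because it reproduces the numbers in this log (5 entries in 8 slots, 2 entries in 4 slots); the sketch below only checks the arithmetic.

```python
MINSIZE = 8  # size of the embedded small table, per the log

def needs_resize(fill, size):
    # Assumed rule: resize once the table is at least 2/3 full.
    return fill * 3 >= size * 2

def max_entries_without_resize(size):
    return max(n for n in range(size + 1) if not needs_resize(n, size))

print(max_entries_without_resize(MINSIZE))  # 5
print(max_entries_without_resize(4))        # 2
```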
Fred Drake [Tue, 22 May 2001 19:36:50 +0000 (19:36 +0000)]
Per discussion with Barry, make the default value for both get() and
setdefault() the empty string. In setdefault(), use + to join the value
to create the entry for the headers attribute so that TypeError is raised
if the value is of the wrong type.
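A sketch of the behaviour described, using a hypothetical HeaderStore class rather than the real module: both methods default to the empty string, and setdefault() builds the header line with + so that a non-string value fails before anything is stored.

```python
class HeaderStore:
    """Hypothetical mapping of header names to values (illustration only)."""

    def __init__(self):
        self.dict = {}      # lowercased name -> value
        self.headers = []   # raw header lines

    def get(self, name, default=""):
        return self.dict.get(name.lower(), default)

    def setdefault(self, name, default=""):
        lowername = name.lower()
        if lowername in self.dict:
            return self.dict[lowername]
        # Using + (not string formatting) raises TypeError for non-strings,
        # and it does so before either attribute has been modified.
        text = name + ": " + default
        self.headers.append(text + "\n")
        self.dict[lowername] = default
        return default
```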
Tim Peters [Tue, 22 May 2001 18:28:25 +0000 (18:28 +0000)]
Implementing an idea from Guido on the checkins list:
When regrtest.py finds an attribute "test_main" in a test it imports,
regrtest runs the test's test_main after the import. test_threaded_import
needs this else the cross-thread import lock prevents it from making
progress. Other tests can use this hack too, but I doubt it will ever be
popular.
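In outline, the hook works like this. The sketch uses a stand-in module built with types.ModuleType, and run_test_module() is a made-up name, not regrtest's own function.

```python
import types

def run_test_module(module):
    """Call module.test_main() if present, after the import has completed."""
    test_main = getattr(module, "test_main", None)
    if test_main is not None:
        test_main()

# Hypothetical stand-in for an imported test module.
fake = types.ModuleType("test_example")
calls = []
fake.test_main = lambda: calls.append("ran")
run_test_module(fake)
```

Because test_main() runs only after the import returns, the test body no longer executes while the import lock is held.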
Guido van Rossum [Tue, 22 May 2001 16:48:37 +0000 (16:48 +0000)]
file_getiter(): make iter(file) be equivalent to file.xreadlines().
This should be faster.
This means:
(1) "for line in file:" won't work if the xreadlines module can't be
imported.
(2) The body of "for line in file:" shouldn't use the file directly;
the effects (e.g. of file.readline(), file.seek() or even
file.tell()) would be undefined because of the buffering that goes
on in the xreadlines module.
Fred Drake [Tue, 22 May 2001 15:12:46 +0000 (15:12 +0000)]
Update to add get() and setdefault() as supported mapping operations, and
add a list of the mapping methods which are not supported (per Barry's
comments).
Jack Jansen [Tue, 22 May 2001 14:13:02 +0000 (14:13 +0000)]
Fixed a nasty slowdown in imports in frozen applications: the shortcut
for loading modules from the application resource fork stopped working
when sys.path component normalization was implemented. Comparison
of sys.path components is now done by FSSpec instead of by pathname.
Tim Peters [Tue, 22 May 2001 09:34:27 +0000 (09:34 +0000)]
New test adapted from the ancient Demo/threads/bug.py.
ICK ALERT: read the long comment block before run_the_test(). It was
almost impossible to get this to run without instant deadlock, and the
solution here sucks on several counts. If you can dream up a better way,
let me know!
Tim Peters [Tue, 22 May 2001 06:54:14 +0000 (06:54 +0000)]
Changed all the examples with ugly platform-dependent float output to use
numbers that display nicely after repr(). From much doctest experience
with the same trick, I believe people find examples with simple fractions
easier to understand too: they can usually check the results in their
head, and so feel confident about what they're seeing. Not even I get a
warm feeling from a result that looks like 70330.345024097141 ...
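The trick is easy to demonstrate. Current Python versions print shorter float reprs than the interpreters of this era did, but a sum of binary fractions still displays cleanly while a sum of decimal fractions may not:

```python
print(repr(0.1 + 0.2))    # 0.30000000000000004
print(repr(0.5 + 0.25))   # 0.75
```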
Fred Drake [Mon, 21 May 2001 21:23:01 +0000 (21:23 +0000)]
Add a "See also" section with useful links. More should be added giving
pointers to information about the other mailbox formats; if anyone can
provide the information needed, please let me know!
Fred Drake [Mon, 21 May 2001 21:12:10 +0000 (21:12 +0000)]
Remove all files of expected output that contain only the name of the
test; there is no need to store this in a file if the actual test code
does not produce any output.
Fred Drake [Mon, 21 May 2001 21:08:12 +0000 (21:08 +0000)]
If the file containing expected output does not exist, assume that it
contains a single line of text giving the name of the output file. This
covers all tests that do not actually produce any output in the test code.
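The fallback amounts to something like the sketch below. The function name expected_output() and the assumption that the output file is named after the test are illustrative; this is not regrtest's actual code.

```python
import os

def expected_output(test_name, output_dir):
    """Return the expected output for a test, defaulting to just its name."""
    path = os.path.join(output_dir, test_name)
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    # No file: behave as if it held a single line naming the test.
    return test_name + "\n"
```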
This patch changes the behaviour of the UTF-16 codec family. Only the
UTF-16 codec will now interpret and remove a *leading* BOM mark.
Subsequent BOM characters are no longer interpreted and removed.
UTF-16-LE and -BE pass through all BOM mark characters.
These changes should get the UTF-16 codec more in line with what
the Unicode FAQ recommends with regard to BOM marks.
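The behaviour described can be checked with the codecs as they exist in current Python, which is assumed here to match this change: the generic codec strips only the leading BOM, and the byte-order-specific codec passes every BOM through.

```python
import codecs

bom = codecs.BOM_UTF16_LE
data = bom + "a".encode("utf-16-le") + bom + "b".encode("utf-16-le")

print(ascii(data.decode("utf-16")))      # 'a\ufeffb'
print(ascii(data.decode("utf-16-le")))   # '\ufeffa\ufeffb'
```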