Fred Drake [Tue, 22 May 2001 19:36:50 +0000 (19:36 +0000)]
Per discussion with Barry, make the default value for both get() and
setdefault() the empty string. In setdefault(), use + to join the value
to create the entry for the headers attribute so that TypeError is raised
if the value is of the wrong type.
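A minimal sketch of the behaviour described above, assuming a hypothetical HeaderMap class with a headers list and a dict of lowercased names (the real class is not shown in this log):

```python
class HeaderMap:
    """Hypothetical stand-in for a message object holding raw header lines."""

    def __init__(self):
        self.headers = []   # raw "Name: value\n" lines
        self.dict = {}      # lowercased header name -> value

    def get(self, name, default=""):
        # The default value is the empty string, not None.
        return self.dict.get(name.lower(), default)

    def setdefault(self, name, default=""):
        lowername = name.lower()
        if lowername in self.dict:
            return self.dict[lowername]
        # Joining with + raises TypeError if default is not a string,
        # before anything has been stored.
        text = name + ": " + default
        self.headers.append(text + "\n")
        self.dict[lowername] = default
        return default


m = HeaderMap()
missing = m.get("Subject")
stored = m.setdefault("Subject", "hello")
```

Because the join happens before any state is changed, a value of the wrong type leaves both headers and dict untouched.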
Tim Peters [Tue, 22 May 2001 18:28:25 +0000 (18:28 +0000)]
Implementing an idea from Guido on the checkins list:
When regrtest.py finds an attribute "test_main" in a test it imports,
regrtest runs the test's test_main after the import. test_threaded_import
needs this else the cross-thread import lock prevents it from making
progress. Other tests can use this hack too, but I doubt it will ever be
popular.
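Roughly, the hook amounts to the following. This is a sketch only; run_test_module is a hypothetical name and the real regrtest code is not reproduced here:

```python
import types


def run_test_module(module):
    """Call module.test_main() if the module defines it.

    The call happens after the import has finished, so the test body
    no longer runs while the import lock is held.
    """
    test_main = getattr(module, "test_main", None)
    if test_main is not None:
        test_main()
        return True
    return False


# A throwaway module object stands in for an imported test module.
calls = []
fake_test = types.ModuleType("test_fake")
fake_test.test_main = lambda: calls.append("ran")
ran = run_test_module(fake_test)
```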
Guido van Rossum [Tue, 22 May 2001 16:48:37 +0000 (16:48 +0000)]
file_getiter(): make iter(file) be equivalent to file.xreadlines().
This should be faster.
This means:
(1) "for line in file:" won't work if the xreadlines module can't be
imported.
(2) The body of "for line in file:" shouldn't use the file directly;
the effects (e.g. of file.readline(), file.seek() or even
file.tell()) would be undefined because of the buffering that goes
on in the xreadlines module.
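To illustrate point (2), here is a sketch of a batch-reading line iterator. It assumes the buffering is done with readlines(sizehint); buffered_lines is a hypothetical helper, not the xreadlines implementation:

```python
import io


def buffered_lines(f, sizehint=8192):
    """Yield lines from f, fetching them in batches via readlines(sizehint)."""
    while True:
        batch = f.readlines(sizehint)
        if not batch:
            return
        for line in batch:
            yield line


f = io.BytesIO(b"one\ntwo\nthree\n")
it = buffered_lines(f, sizehint=8)
first = next(it)
# The file position is already past the line just returned, so
# tell(), seek() or readline() on f would not mean what the loop
# body expects.
position_after_first = f.tell()
rest = list(it)
```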
Fred Drake [Tue, 22 May 2001 15:12:46 +0000 (15:12 +0000)]
Update to add get() and setdefault() as supported mapping operations, and
add a list of the mapping methods which are not supported (per Barry's
comments).
Jack Jansen [Tue, 22 May 2001 14:13:02 +0000 (14:13 +0000)]
Fixed a nasty slowdown in imports in frozen applications: the shortcut
for loading modules from the application resource fork stopped working
when sys.path component normalization was implemented. Comparison
of sys.path components is now done by FSSpec instead of by pathname.
Tim Peters [Tue, 22 May 2001 09:34:27 +0000 (09:34 +0000)]
New test adapted from the ancient Demo/threads/bug.py.
ICK ALERT: read the long comment block before run_the_test(). It was
almost impossible to get this to run without instant deadlock, and the
solution here sucks on several counts. If you can dream up a better way,
let me know!
Tim Peters [Tue, 22 May 2001 06:54:14 +0000 (06:54 +0000)]
Changed all the examples with ugly platform-dependent float output to use
numbers that display nicely after repr(). From much doctest experience
with the same trick, I believe people find examples with simple fractions
easier to understand too: they can usually check the results in their
head, and so feel confident about what they're seeing. Not even I get a
warm feeling from a result that looks like 70330.345024097141 ...
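A small illustration of the difference, assuming a 17-significant-digit format like the one repr() applied to floats at the time (current Python prints the shortest round-tripping string instead); repr17 is a hypothetical helper:

```python
def repr17(x):
    """Format x with 17 significant digits, enough to round-trip a double."""
    return "%.17g" % x


awkward = repr17(0.1)    # 0.1 has no exact binary representation
tidy = repr17(0.25)      # 1/4 is exactly representable
```

Fractions whose denominator is a power of two come out clean on every platform, which is why they make good doctest material.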
Fred Drake [Mon, 21 May 2001 21:23:01 +0000 (21:23 +0000)]
Add a "See also" section with useful links. More should be added giving
pointers to information about the other mailbox formats; if anyone can
provide the information needed, please let me know!
Fred Drake [Mon, 21 May 2001 21:12:10 +0000 (21:12 +0000)]
Remove all files of expected output that contain only the name of the
test; there is no need to store this in a file if the actual test code
does not produce any output.
Fred Drake [Mon, 21 May 2001 21:08:12 +0000 (21:08 +0000)]
If the file containing expected output does not exist, assume that it
contains a single line of text giving the name of the output file. This
covers all tests that do not actually produce any output in the test code.
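The fallback could look roughly like this; expected_output and the directory layout are assumptions made for the example:

```python
import os
import tempfile


def expected_output(outputdir, test_name):
    """Return the expected output for a test.

    If no file exists, assume the output is just the test name.
    """
    path = os.path.join(outputdir, test_name)
    if os.path.exists(path):
        with open(path) as fp:
            return fp.read()
    return test_name + "\n"


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "test_noisy"), "w") as fp:
        fp.write("test_noisy\nhello\n")
    noisy = expected_output(tmp, "test_noisy")
    quiet = expected_output(tmp, "test_quiet")   # no file on disk
```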
This patch changes the behaviour of the UTF-16 codec family. Only the
UTF-16 codec will now interpret and remove a *leading* BOM mark. Sub-
sequent BOM characters are no longer interpreted and removed.
UTF-16-LE and -BE pass through all BOM mark characters.
These changes should get the UTF-16 codec more in line with what
the Unicode FAQ recommends w/r to BOM marks.
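The behaviour described can be checked with a short snippet. It is written against the bytes/str API of current Python rather than the 2001 string types:

```python
BOM_LE = b"\xff\xfe"    # U+FEFF encoded as little-endian UTF-16
data = BOM_LE + BOM_LE + "a".encode("utf-16-le")

# UTF-16: only the leading BOM is interpreted and removed.
via_utf16 = data.decode("utf-16")

# UTF-16-LE: every BOM character is passed through.
via_utf16_le = data.decode("utf-16-le")
```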
Fred Drake [Mon, 21 May 2001 20:23:21 +0000 (20:23 +0000)]
Re-write the mailbox test suite to use PyUnit. Cover a lot more ground
for the Maildir mailbox format. This still does not address other mailbox
formats.
Guido van Rossum [Mon, 21 May 2001 20:17:17 +0000 (20:17 +0000)]
parse_declaration(): be more lenient in what we accept. We now
basically accept <!...> where the dots can be single- or double-quoted
strings or any other character except >.
Background: I found a real-life example that failed to parse with
the old assumption: http://www.opensource.org/licenses/jabberpl.html
contains a few constructs of the form <![if !supportLists]>...<![endif]>.
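One way to express the more lenient rule is a regular expression. This pattern is an illustrative assumption, not the code in parse_declaration():

```python
import re

# "<!" followed by any mix of double-quoted strings, single-quoted
# strings, or characters other than ">" and quotes, then ">".
declaration = re.compile(r"""<!(?:"[^"]*"|'[^']*'|[^>"'])*>""")

sample = "<![if !supportLists]>text<![endif]>"
found = declaration.findall(sample)

# A ">" inside a quoted string does not end the declaration.
doctype = '<!DOCTYPE html PUBLIC "a > b">'
quoted = declaration.match(doctype)
```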
Barry Warsaw [Mon, 21 May 2001 19:58:23 +0000 (19:58 +0000)]
main(): default-domain argument to getopt.getopt() was missing a = to
indicate it took an argument. This closes SF patch #402223 by Bastian
Kleineidam.
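For reference, a sketch of how the trailing = changes getopt's handling of a long option. The short option string "d:" and the sample arguments are assumptions for the example:

```python
import getopt

argv = ["--default-domain=messages", "input.py"]

# The trailing "=" marks the long option as taking an argument.
opts, args = getopt.getopt(argv, "d:", ["default-domain="])
```

Without the =, getopt rejects --default-domain=messages with a GetoptError.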
Fred Drake [Sun, 20 May 2001 12:26:04 +0000 (12:26 +0000)]
Fix bug in smtplib example: the prompt said to end the message with ^D,
but doing so raised EOFError. This makes it work as advertised and
converts to string methods where reasonable.
Jack Jansen [Sat, 19 May 2001 12:34:59 +0000 (12:34 +0000)]
Include Carbon/Carbon.h instead of universal headers, if appropriate.
Test for TARGET_API_MAC_OS8 instead of !TARGET_API_MAC_CARBON where
appropriate.
Tim Peters [Sat, 19 May 2001 07:04:38 +0000 (07:04 +0000)]
Bugfix candidate.
Two exceedingly unlikely errors in dictresize():
1. The loop for finding the new size had an off-by-one error at the
end (could over-index the polys[] vector).
2. The polys[] vector ended with a 0, apparently intended as a sentinel
value but never used as such; i.e., it was never checked, so 0 could
have been used *as* a polynomial.
Neither bug could trigger unless a dict grew to 2**30 slots; since that
would consume at least 12GB of memory just to hold the dict pointers,
I'm betting it's not the cause of the bug Fred's tracking down <wink>.
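The 12GB figure follows from simple arithmetic, assuming 12 bytes per slot (for example three 4-byte fields on a 32-bit build; the per-slot size is an assumption, not stated above):

```python
SLOTS = 2 ** 30
BYTES_PER_SLOT = 12     # assumed: three 4-byte fields per entry

total_bytes = SLOTS * BYTES_PER_SLOT
total_gib = total_bytes / 2 ** 30
```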
Jeremy Hylton [Fri, 18 May 2001 20:57:38 +0000 (20:57 +0000)]
vgetargs1() and vgetargskeywords(): Replace uses of PyTuple_Size() and
PyTuple_GetItem() with PyTuple_GET_SIZE() and PyTuple_GET_ITEM().
The code has already done a PyTuple_Check().
Jeremy Hylton [Fri, 18 May 2001 20:53:14 +0000 (20:53 +0000)]
Add a second special case to the inline function call code in eval_code2().
If we have a PyCFunction (builtin) and it is METH_VARARGS only, load
the args and dispatch to call_cfunction() directly. This provides a
small speedup for perhaps the most common function calls -- builtins.
Fred Drake [Fri, 18 May 2001 15:32:59 +0000 (15:32 +0000)]
Added test suite for the new HTMLParser module, originally from the
TAL/PageTemplate package for Zope. This only needed a little boilerplate
change; the tests themselves are unchanged.
Guido van Rossum [Fri, 18 May 2001 14:50:52 +0000 (14:50 +0000)]
A much improved HTML parser -- a replacement for sgmllib. The API is
derived from but not quite compatible with that of sgmllib, so it's a
new file. I suppose it needs documentation, and htmllib needs to be
changed to use this instead of sgmllib, and sgmllib needs to be
declared obsolete. But that can all be done later.
This code was first published as part of TAL (part of Zope Page
Templates), but that was strongly based on sgmllib anyway. Authors
are Fred Drake and Guido van Rossum.
Tim Peters [Thu, 17 May 2001 22:25:34 +0000 (22:25 +0000)]
Speed dictresize by collapsing its two passes into one; the reason given
in the comments for using two passes was bogus, as the only object that
can get decref'ed due to the copy is the dummy key, and decref'ing dummy
can't have side effects (for one thing, dummy is immortal! for another,
it's a string object, not a potentially dangerous user-defined object).
Jack Jansen [Thu, 17 May 2001 21:58:34 +0000 (21:58 +0000)]
First step in porting MacPython modules to OSX/unix: break all references between modules except for the obj_New() and obj_Convert() routines, the PyArg_Parse and Py_BuildValue helpers.
And these can now be vectored through glue routines (by defining USE_TOOLBOX_OBJECT_GLUE) which will do the necessary imports, whereupon the module's init routine will tell the glue routine about the real conversion routine address and everything is fine again.
Guido van Rossum [Thu, 17 May 2001 15:03:14 +0000 (15:03 +0000)]
Fixed botched indent in _init_mac() code. (It may never be executed,
but it still can't have any syntax errors. Went a little too fast
there, Jack? :-)
Moved the encoding map building logic from the individual mapping
codec files to codecs.py and added logic so that multi mappings
in the decoding maps now result in mappings to None (undefined mapping)
in the encoding maps.
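A sketch of the inversion logic described, using a hypothetical build_encoding_map helper and a made-up decoding map:

```python
def build_encoding_map(decoding_map):
    """Invert decoding_map; targets reached from several codes map to None."""
    encoding_map = {}
    for code, char in decoding_map.items():
        if char in encoding_map:
            # More than one code decodes to char: the inverse is undefined.
            encoding_map[char] = None
        else:
            encoding_map[char] = code
    return encoding_map


# Two codes decode to "?", so "?" has no well-defined encoding.
decoding_map = {0x41: "A", 0x3F: "?", 0xE1: "?"}
encoding_map = build_encoding_map(decoding_map)
```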
Jack Jansen [Tue, 15 May 2001 20:22:08 +0000 (20:22 +0000)]
Bah, somehow the macroman<->iso-latin-1 translation got lost during the merge. Checking in one fixed file to make sure MacCVS Pro isn't the problem. If it isn't, a flurry of checkins will follow tomorrow. If it is... well...
Tim Peters [Tue, 15 May 2001 20:12:59 +0000 (20:12 +0000)]
Speed tuple comparisons in two ways:
1. Omit the early-out EQ/NE "lengths different?" test. Was unable to find
any real code where it triggered, but it always costs. The same is not
true of list richcmps, where different-size lists appeared to get
compared about half the time.
2. Because tuples are immutable, there's no need to refetch the lengths of
both tuples from memory again on each loop trip.
BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong,
because it won't believe there's any difference unless Py_EQ returns false
for some corresponding elements:
This patch changes the way the string .encode() method works slightly
and introduces a new method .decode().
The major change is that strg.encode() will no longer try to convert
Unicode returns from the codec into a string, but instead pass along
the Unicode object as-is. The same is now true for all other codec
return types. The underlying C APIs were changed accordingly.
Note that even though this does have the potential of breaking
existing code, the chances are low since conversion from Unicode
previously took place using the default encoding which is normally
set to ASCII rendering this auto-conversion mechanism useless for
most Unicode encodings.
The good news is that you can now use .encode() and .decode() with
much greater ease and that the door was opened for better accessibility
of the builtin codecs.
As demonstration of the new feature, the patch includes a few new
codecs which allow string to string encoding and decoding (rot13,
hex, zip, uu, base64).
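For comparison, the same kind of string-to-string codecs can be exercised in current Python through codecs.encode() and codecs.decode(). The patch itself added them as string methods; this sketch assumes Python 3.4 or later, where the short codec aliases are available:

```python
import codecs

# Text-to-text transform.
rot = codecs.encode("Hello", "rot13")

# Bytes-to-bytes transforms.
hexed = codecs.encode(b"hi", "hex")
b64 = codecs.encode(b"hi", "base64")
round_trip = codecs.decode(b64, "base64")
```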
Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
Guido van Rossum [Tue, 15 May 2001 02:14:44 +0000 (02:14 +0000)]
Add warnings to the strop module, for those functions that really
*are* obsolete; three variables and the maketrans() function are not
(yet) obsolete.
Add a compensating warnings.filterwarnings() call to test_strop.py.
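The pairing of a warning with a compensating filter might look like this. obsolete_upper and its message text are hypothetical, not the strop code:

```python
import warnings


def obsolete_upper(s):
    """Hypothetical obsolete function that warns before delegating."""
    warnings.warn("obsolete; use s.upper()", DeprecationWarning, stacklevel=2)
    return s.upper()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The compensating filter a test module would install so that
    # exercising the obsolete function stays quiet.
    warnings.filterwarnings("ignore", r"obsolete;", DeprecationWarning)
    result = obsolete_upper("abc")
```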
Tim Peters [Mon, 14 May 2001 23:19:12 +0000 (23:19 +0000)]
Fix new compiler warnings. Also boost "start" from (C) int to long and
return a (C) long: PyArg_ParseTuple and Py_BuildValue may not let us get
at the size_t we really want, but C int is clearly too small for a 64-bit
box, and both the start parameter and the return value should work for
large mapped files even on 32-bit boxes. The code really needs to be
rethought from scratch (not by me, though ...).
Tim Peters [Mon, 14 May 2001 18:39:41 +0000 (18:39 +0000)]
pprint's workhorse _safe_repr() function took time quadratic in the # of
elements when crunching a list, dict or tuple. Now takes linear time
instead -- huge speedup for even moderately large containers, and the
code is notably simpler too.
Added some basic "is the output correct?" tests to test_pprint.
Guido van Rossum [Mon, 14 May 2001 13:53:38 +0000 (13:53 +0000)]
Fix a typo, consistently spell ASCII in all caps, and insert blank
lines between paragraphs in Mark Hammond's news item about the default
encoding in posixmodule. Resist the temptation to reflow paragraphs.
Greg Stein [Mon, 14 May 2001 09:32:26 +0000 (09:32 +0000)]
Fix the .find() method for memory maps.
1) it didn't obey the "start" parameter (and when it does, we must validate
the value)
2) the return value needs to be an absolute index, rather than relative to
some arbitrary point in the file
(checking CVS, it appears this method never worked; these changes bring it
into line with typical .find() behavior)
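The fixed behaviour corresponds to the following, shown here with the current mmap API on an anonymous map (the 2001 code operated on strings rather than bytes):

```python
import mmap

m = mmap.mmap(-1, 16)             # anonymous 16-byte map
m.write(b"abcabc")

from_zero = m.find(b"abc", 0)
from_one = m.find(b"abc", 1)      # result is an absolute index
missing = m.find(b"xyz", 1)
m.close()
```

The start argument is passed explicitly because find() otherwise begins at the current file position, which write() has already advanced.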
Mark Hammond [Sun, 13 May 2001 08:04:26 +0000 (08:04 +0000)]
Add support for Windows using "mbcs" as the default Unicode encoding when dealing with the file system. As discussed on python-dev and in patch 410465.
Tim Peters [Sun, 13 May 2001 06:43:53 +0000 (06:43 +0000)]
Aggressive reordering of dict comparisons. In case of collision, it stands
to reason that me_key is much more likely to match the key we're looking
for than to match dummy, and if the key is absent me_key is much more
likely to be NULL than dummy: most dicts don't even have a dummy entry.
Running instrumented dict code over the test suite and some apps confirmed
that matching dummy was 200-300x less frequent than matching key in
practice. So this reorders the tests to try the common case first.
It can lose if a large dict with many collisions is mostly deleted, not
resized, and then frequently searched, but that's hardly a case we
should be favoring.
Tim Peters [Sun, 13 May 2001 00:19:31 +0000 (00:19 +0000)]
Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask".
The comment following used to say:
/* We use ~hash instead of hash, as degenerate hash functions, such
as for ints <sigh>, can have lots of leading zeros. It's not
really a performance risk, but better safe than sorry.
12-Dec-00 tim: so ~hash produces lots of leading ones instead --
what's the gain? */
That is, there was never a good reason for doing it. And to the contrary,
as explained on Python-Dev last December, it tended to make the *sum*
(i + incr) & mask (which is the first table index examined in case of
collision) the same "too often" across distinct hashes.
Changing to the simpler "i = hash & mask" reduced the number of string-dict
collisions (== number of times we go around the lookup for-loop) from about
6 million to 5 million during a full run of the test suite (these are
approximate because the test suite does some random stuff from run to run).
The number of collisions in non-string dicts also decreased, but not as
dramatically.
Note that this may, for a given dict, change the order (wrt previous
releases) of entries exposed by .keys(), .values() and .items(). A number
of std tests suffered bogus failures as a result. For dicts keyed by
small ints, or (less so) by characters, the order is much more likely to be
in increasing order of key now; e.g.,
>>> d = {}
>>> for i in range(10):
... d[i] = i
...
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>>
Unfortunately, people may latch on to that in small examples and draw a
bogus conclusion.
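The ordering effect for small ints can be seen from the first slot alone. This sketch assumes a 16-slot table and a hash equal to the int itself; first_slot is a hypothetical helper:

```python
def first_slot(h, mask, invert=False):
    """Return the first table index probed for hash value h."""
    if invert:
        h = ~h              # the old "i = (~hash) & mask" scheme
    return h & mask


mask = 15                   # assumed table of 16 slots
new_order = [first_slot(i, mask) for i in range(10)]
old_order = [first_slot(i, mask, invert=True) for i in range(10)]
```

With the plain mask, key i lands in slot i, so a walk over the table yields the keys in increasing order. With the inverted hash the same keys fill the table from the top down.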
test_support.py
Moved test_extcall's sortdict() into test_support, made it stronger,
and imported sortdict into other std tests that needed it.
test_unicode.py
Excluded cp875 from the "roundtrip over range(128)" test, because
cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
See Python-Dev for excruciating details.
Cookie.py
Changed various output functions to sort dicts before building
strings from them.
test_extcall
Fiddled the expected-result file. This remains sensitive to native
dict ordering, because, e.g., if there are multiple errors in a
keyword-arg dict (and test_extcall sets up many cases like that), the
specific error Python complains about first depends on native dict
ordering.
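A helper in the spirit of the sortdict() mentioned under test_support.py could be written as follows. This is a sketch, not necessarily the checked-in version:

```python
def sortdict(d):
    """Return a repr-like string for d with its items sorted by key."""
    items = sorted(d.items())
    pairs = ["%r: %r" % pair for pair in items]
    return "{%s}" % ", ".join(pairs)


text = sortdict({"b": 2, "a": 1, "c": 3})
```

Tests that compare against this string no longer depend on the native dict ordering.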
Jack Jansen [Sat, 12 May 2001 22:46:35 +0000 (22:46 +0000)]
Got the first MacPython module working under MacOSX/MachO (gestalt). Main changes
are including Carbon/Carbon.h instead of the old headers (unless WITHOUT_FRAMEWORKS
is defined, as it will be for classic MacPython) and selectively disabling all the
stuff that is unneeded in a unix-Python (event handling, etc).
Jack Jansen [Sat, 12 May 2001 21:31:34 +0000 (21:31 +0000)]
Be more sensible about when to use TARGET_API_MAC_OS8 instead of !TARGET_API_MAC_CARBON. This should greatly facilitate porting stuff to OSX in its MachO/BSD incarnation.
Guido van Rossum [Sat, 12 May 2001 12:18:10 +0000 (12:18 +0000)]
Move the action of loading the configuration to the IdleConf module
rather than the idle.py script. This has advantages and
disadvantages; the biggest advantage being that we can more easily
have an alternative main program.