Antoine Pitrou [Fri, 11 Jun 2010 21:42:26 +0000 (21:42 +0000)]
Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
the interpreter with characters outside the Basic Multilingual Plane
(code points 0x10000 and above).
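As a rough illustration of the kind of input involved (not the C-level fix itself), the sketch below decodes a big-endian UTF-32 byte string holding a single code point outside the BMP. It assumes a current Python 3 interpreter, and the sample bytes are chosen only for this example.

```python
# Four bytes encoding U+10000, the first code point above the BMP,
# in big-endian UTF-32. The value is an illustrative assumption.
data = b"\x00\x01\x00\x00"

text = data.decode("utf-32-be")

# A UCS-2 (narrow) build has to store this as a surrogate pair, which is
# the code path the commit touches; Python 3.3+ stores one code point.
print(ascii(text))
```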
Michael Foord [Sat, 5 Jun 2010 21:57:03 +0000 (21:57 +0000)]
Documentation updates for issues 8302 and 8351 (truncating excessive diffs in unittest failure messages and reporting SkipTest exceptions in setUpClass and setUpModule as skips rather than errors).
Ezio Melotti [Sat, 5 Jun 2010 17:51:07 +0000 (17:51 +0000)]
Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-byte-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in
RFC 3629, but leave it commented out since it's not backward compatible;
4) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
5) Add an extensive set of tests in test_unicode;
6) Fix test_codeccallbacks because it was failing after this change.
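The behaviour described in points 1, 2 and 4 can be observed from Python code. The following is a minimal sketch, assuming a current Python 3 interpreter (whose UTF-8 decoder follows the same rules); the helper name decode_reason and the sample byte strings are made up for this example.

```python
def decode_reason(data):
    """Return the error reason from a strict UTF-8 decode, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None

# 0xFF can never begin a UTF-8 sequence.
bad_start = decode_reason(b"\xff")

# 0xC3 announces a 2-byte sequence, but "(" is not a continuation byte.
bad_continuation = decode_reason(b"\xc3(")

# 0xF8 began a 5-byte sequence under RFC 2279; RFC 3629 forbids it.
five_byte = decode_reason(b"\xf8\x88\x80\x80\x80")

# 0xF0 announces four bytes, but only one valid continuation byte (0x90)
# follows. Only those two bytes are replaced by U+FFFD; "AB" survives.
replaced = b"\xf0\x90AB".decode("utf-8", "replace")

print(bad_start, bad_continuation, five_byte, ascii(replaced), sep="\n")
```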
Mark Dickinson [Sat, 5 Jun 2010 12:14:43 +0000 (12:14 +0000)]
Issue #8627: Fix "XXX undetected error" from unchecked PyErr_WarnPy3k return.
This is just a quick fix: if the warning is turned into an exception, the
exception simply gets ignored.
Michael Foord [Sat, 5 Jun 2010 12:10:52 +0000 (12:10 +0000)]
Removed the new max_diff argument to assertSequenceEqual. All unittest.TestCase assert methods that use difflib to produce failure messages now truncate overly long messages. The new class attribute unittest.TestCase.maxDiff configures this if necessary. Issue 8351.
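A small sketch of how maxDiff affects a failure message, assuming a current Python 3 unittest. The helper failure_message and the nested Case class are hypothetical names used only here; setting maxDiff to None disables truncation.

```python
import unittest

def failure_message(max_diff):
    """Fail a list comparison and return the resulting message text."""

    class Case(unittest.TestCase):
        # Class attribute consulted when the diff is appended to the message.
        maxDiff = max_diff

        def runTest(self):
            first = ["item-%03d" % i for i in range(60)]
            second = ["entry-%03d" % i for i in range(60)]
            self.assertEqual(first, second)

    try:
        Case().runTest()
    except AssertionError as exc:
        return str(exc)
    return ""

# With the default limit the long diff is replaced by a short note.
truncated = failure_message(unittest.TestCase.maxDiff)
# With maxDiff = None the complete diff is included.
full = failure_message(None)

print(len(truncated), len(full))
```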
Michael Foord [Sat, 5 Jun 2010 10:39:42 +0000 (10:39 +0000)]
unittest TestLoader test discovery filename matching is now done in a method. This makes it easier to override the matching strategy in subclasses. No behaviour change in actual implementation.
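One way a subclass might use this hook is sketched below. It assumes the method is the private TestLoader._match_path(path, full_path, pattern) found in current CPython, which is not a documented API and may change; RegexTestLoader is a hypothetical name.

```python
import re
import unittest

class RegexTestLoader(unittest.TestLoader):
    """Hypothetical loader that treats the discovery pattern as a regex."""

    def _match_path(self, path, full_path, pattern):
        # The stock implementation matches with fnmatch(path, pattern);
        # here the pattern is interpreted as a regular expression instead.
        return re.match(pattern, path) is not None

loader = RegexTestLoader()
print(loader._match_path("check_io.py", "/tmp/check_io.py",
                         r"(test|check)_\w+\.py$"))
```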
R. David Murray [Fri, 4 Jun 2010 19:51:06 +0000 (19:51 +0000)]
#4487: have Charset check with codecs for possible aliases.
Previously, unexpected results occurred when email was passed, for example,
'utf8' as a charset name, since email would accept it but would *not* use
the 'utf-8' codec for it, even though Python itself recognises that as
an alias for utf-8. Now Charset checks with codecs for aliases as well
as its own internal table. Issue 8898 has been opened to change this
further in py3k so that all aliasing is routed through the codecs module.
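The lookup order described above can be sketched roughly as follows. This is not the email package's code: OWN_ALIASES and canonical_charset are hypothetical stand-ins, and only codecs.lookup() is a real library call.

```python
import codecs

# Hypothetical stand-in for a module's own alias table.
OWN_ALIASES = {"latin_1": "iso-8859-1", "ascii": "us-ascii"}

def canonical_charset(name):
    """Resolve a charset name via the local table, then via codecs."""
    name = name.lower()
    if name in OWN_ALIASES:
        return OWN_ALIASES[name]
    try:
        # codecs knows aliases such as 'utf8' for 'utf-8'.
        return codecs.lookup(name).name
    except LookupError:
        # Unknown to codecs as well: keep the name as given.
        return name

print(canonical_charset("utf8"))
```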
R. David Murray [Thu, 3 Jun 2010 20:19:25 +0000 (20:19 +0000)]
#8889: rewrite transient_internet so we don't use EAI_NODATA on FreeBSD.
FreeBSD doesn't have socket.EAI_NODATA. I rewrote the routine because
there's no easy way to conditionally include a context manager in a
with statement. As a side benefit, instead of a stack of context
managers there's now only one.
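The general idea, a single context manager that only considers the error codes the platform actually defines, might look like the sketch below. It is not the test-support routine itself: TransientNetworkError, transient_dns_errors and the list of names are assumptions made for this example.

```python
import contextlib
import socket

class TransientNetworkError(Exception):
    """Hypothetical marker for failures a test may treat as a skip."""

# getaddrinfo() error codes to tolerate. EAI_NODATA is missing on some
# platforms (FreeBSD, per the commit), so each name is looked up defensively.
_WANTED = ("EAI_NONAME", "EAI_NODATA")
_CODES = {getattr(socket, name) for name in _WANTED if hasattr(socket, name)}

@contextlib.contextmanager
def transient_dns_errors():
    try:
        yield
    except socket.gaierror as exc:
        if exc.errno in _CODES:
            raise TransientNetworkError(str(exc)) from exc
        raise
```

Because the platform check happens when the set of codes is built, callers need only one with statement regardless of which constants exist.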
R. David Murray [Thu, 3 Jun 2010 15:43:20 +0000 (15:43 +0000)]
#5610: use \Z not $ so we don't eat extra chars when body part ends with \r\n.
If a body part ended with \r\n, feedparser, using '$' to terminate its
search for the newline, would match on the \r\n, and think that it needed
to strip two characters in order to account for the line end before the
boundary. That made it chop one too many characters off the end of
the body part. Using \Z makes the match correct.
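The difference between the two anchors can be reproduced in isolation. The sketch below is not the feedparser code; the pattern names, the strip_eol helper and the sample payload (a body ending in \r\n followed by a bare \n line end) are assumptions chosen to show the effect.

```python
import re

# '$' also matches just before a newline at the end of the string;
# '\Z' matches only at the very end.
EOL_DOLLAR = re.compile(r"(\r\n|\r|\n)$")
EOL_END = re.compile(r"(\r\n|\r|\n)\Z")

def strip_eol(pattern, payload):
    """Remove as many trailing characters as the pattern matched."""
    mo = pattern.search(payload)
    if mo:
        return payload[:-len(mo.group(0))]
    return payload

# Body part "text\r\n" followed by the "\n" that precedes the boundary.
payload = "text\r\n\n"

with_dollar = strip_eol(EOL_DOLLAR, payload)  # matches "\r\n", strips two
with_end = strip_eol(EOL_END, payload)        # matches final "\n", strips one

print(repr(with_dollar), repr(with_end))
```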
Stefan Krah [Thu, 3 Jun 2010 12:39:50 +0000 (12:39 +0000)]
Issue #7384: If the system readline library is linked against ncurses,
the curses module must be linked against ncurses as well. Otherwise it
is not safe to load both the readline and curses modules in an application.
Thanks Thomas Dickey for answering questions about ncurses/ncursesw
and readline!
Lars Gustäbel [Thu, 3 Jun 2010 12:34:14 +0000 (12:34 +0000)]
Issue #8741: Fixed the TarFile.makelink() method, which works around
missing filesystem link support on some platforms by extracting symbolic
and hard link entries as regular files.
This stopped working reliably after a change in r74571. I also added
a few tests for this functionality.
Lars Gustäbel [Thu, 3 Jun 2010 09:56:22 +0000 (09:56 +0000)]
Issue #8833: tarfile created hard link entries with a size
field != 0 by mistake. The associated testcase did not
expose this bug because it was broken too.
R. David Murray [Wed, 2 Jun 2010 22:03:15 +0000 (22:03 +0000)]
#1368247: make set_charset/MIMEText automatically encode unicode _payload.
Fixes (mysterious, to the end user) UnicodeErrors when using utf-8 as
the charset and unicode as the _text argument. Also makes the way in
which unicode gets encoded to quoted printable for other charsets more
sane (it only worked by accident previously). The _payload now is encoded
to the charset.output_charset if it is unicode.
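For illustration, the sketch below builds a MIMEText part from non-ASCII text with utf-8 as the charset. It assumes a current Python 3 interpreter, where str is already Unicode, so it shows the intended outcome rather than the 2.x code path this commit changes; the sample text is made up.

```python
from email.mime.text import MIMEText

body = "caf\u00e9 au lait"

# The payload is encoded to the charset's output encoding on construction.
msg = MIMEText(body, "plain", "utf-8")

# decode=True undoes the Content-Transfer-Encoding and returns bytes.
raw = msg.get_payload(decode=True)

print(msg.get_content_charset())
print(raw.decode("utf-8"))
```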
Ronald Oussoren [Wed, 2 Jun 2010 03:47:14 +0000 (03:47 +0000)]
Fix for issue #8868: without this patch 'MacOS.WMAvailable()' will return
False on MacOSX 10.5 or earlier and scripts won't be able to access GUI
functionality.
Mark Dickinson [Sun, 30 May 2010 12:12:25 +0000 (12:12 +0000)]
Issue #5211: Complete removal of implicit coercions for the complex
type. Coercion for arithmetic operations was already removed in
r78280, but that commit didn't remove coercion for rich comparisons.