From: Senthil Kumaran
Date: Thu, 22 Apr 2010 10:58:56 +0000 (+0000)
Subject: Merged revisions 80346 via svnmerge from
X-Git-Tag: v3.1.3rc1~895
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=d0ab48f1c4bb5cbe96a799f9271f36c51c3debd3;p=python

Merged revisions 80346 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r80346 | senthil.kumaran | 2010-04-22 16:23:30 +0530 (Thu, 22 Apr 2010) | 4 lines

  Fixing a note on encoding declaration, its usage in urlopen based on review
  comments from RDM and Ezio.
........
---

diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
index 9496083696..8928882c32 100644
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -1072,30 +1072,37 @@ HTTPErrorProcessor Objects
 Examples
 --------
 
-This example gets the python.org main page and displays the first 100 bytes of
+This example gets the python.org main page and displays the first 300 bytes of
 it.::
 
    >>> import urllib.request
    >>> f = urllib.request.urlopen('http://www.python.org/')
-   >>> print(f.read(100))
-   b'
+   >>> print(f.read(300))
+   b'\n\n\n\n\n\n
+   \n
+   Python Programming '
+
+Note that urlopen returns a bytes object. This is because there is no way
+for urlopen to automatically determine the encoding of the byte stream
+it receives from the http server. In general, a program will decode
+the returned bytes object to string once it determines or guesses
+the appropriate encoding.
+
+The following W3C document, http://www.w3.org/International/O-charset , lists
+the various ways in which a (X)HTML or a XML document could have specified its
+encoding information.
+
+As the python.org website uses *utf-8* encoding as specified in its meta tag, we
+will use the same for decoding the bytes object. ::
 
    >>> import urllib.request
    >>> f = urllib.request.urlopen('http://www.python.org/')
-   >>> print(f.read(100).decode('utf-8')
-   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
-   <?xml-stylesheet href="./css/ht2html
+   >>> print(f.read(100).decode('utf-8'))
+   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+   "http://www.w3.org/TR/xhtml1/DTD/xhtm
+
 
 In the following example, we are sending a data-stream to the stdin of a CGI and
 reading the data it returns to us. Note that this example will only work