Update the first two parts of the reference manual for Py3k,

author Georg Brandl <georg@python.org>

Fri, 31 Aug 2007 08:07:45 +0000 (08:07 +0000)

committer Georg Brandl <georg@python.org>

Fri, 31 Aug 2007 08:07:45 +0000 (08:07 +0000)
diff --git a/Doc/documenting/index.rst b/Doc/documenting/index.rst
index 1a3778b092508bf7a47cc69c0943768f46bad0db..5adbd46b066a491f0c0c567f503ea4bd0290face 100644
--- a/Doc/documenting/index.rst
+++ b/Doc/documenting/index.rst
@@ -27,6 +27,7 @@ are more than welcome as well.
     style.rst
     rest.rst
     markup.rst
+   fromlatex.rst
     sphinx.rst
  
  .. XXX add credits, thanks etc.
diff --git a/Doc/reference/introduction.rst b/Doc/reference/introduction.rst
index 0d53719f27f5797c12c40365ba925ebe0678bfd3..4da1606ea18c4b6120e9c5bc85f6eb4a71b897ab 100644
--- a/Doc/reference/introduction.rst
+++ b/Doc/reference/introduction.rst
@@ -22,11 +22,12 @@ language, maybe you could volunteer your time --- or invent a cloning machine
  
  It is dangerous to add too many implementation details to a language reference
  document --- the implementation may change, and other implementations of the
-same language may work differently.  On the other hand, there is currently only
-one Python implementation in widespread use (although alternate implementations
-exist), and its particular quirks are sometimes worth being mentioned,
-especially where the implementation imposes additional limitations.  Therefore,
-you'll find short "implementation notes" sprinkled throughout the text.
+same language may work differently.  On the other hand, CPython is the one
+Python implementation in widespread use (although alternate implementations
+continue to gain support), and its particular quirks are sometimes worth being
+mentioned, especially where the implementation imposes additional limitations.
+Therefore, you'll find short "implementation notes" sprinkled throughout the
+text.
  
  Every Python implementation comes with a number of built-in and standard
  modules.  These are documented in :ref:`library-index`.  A few built-in modules
@@ -88,11 +89,7 @@ implementation you're using.
  Notation
  ========
  
-.. index::
-   single: BNF
-   single: grammar
-   single: syntax
-   single: notation
+.. index:: BNF, grammar, syntax, notation
  
  The descriptions of lexical analysis and syntax use a modified BNF grammar
  notation.  This uses the following style of definition:
@@ -118,9 +115,7 @@ meaningful to separate tokens. Rules are normally contained on a single line;
  rules with many alternatives may be formatted alternatively with each line after
  the first beginning with a vertical bar.
  
-.. index::
-   single: lexical definitions
-   single: ASCII@ASCII
+.. index:: lexical definitions, ASCII
  
  In lexical definitions (as the example above), two more conventions are used:
  Two literal characters separated by three dots mean a choice of any single
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 35e92cf9cb9f39eb2210e578a9f6ed0e846b6df1..856137d8ea67abcefcfba245ac2db9e019c0dfed 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -5,38 +5,16 @@
  Lexical analysis
  ****************
  
-.. index::
-   single: lexical analysis
-   single: parser
-   single: token
+.. index:: lexical analysis, parser, token
  
  A Python program is read by a *parser*.  Input to the parser is a stream of
  *tokens*, generated by the *lexical analyzer*.  This chapter describes how the
  lexical analyzer breaks a file into tokens.
  
-Python uses the 7-bit ASCII character set for program text.
-
-.. versionadded:: 2.3
-   An encoding declaration can be used to indicate that  string literals and
-   comments use an encoding different from ASCII.
-
-For compatibility with older versions, Python only warns if it finds 8-bit
-characters; those warnings should be corrected by either declaring an explicit
-encoding, or using escape sequences if those bytes are binary data, instead of
-characters.
-
-The run-time character set depends on the I/O devices connected to the program
-but is generally a superset of ASCII.
-
-**Future compatibility note:** It may be tempting to assume that the character
-set for 8-bit characters is ISO Latin-1 (an ASCII superset that covers most
-western languages that use the Latin alphabet), but it is possible that in the
-future Unicode text editors will become common.  These generally use the UTF-8
-encoding, which is also an ASCII superset, but with very different use for the
-characters with ordinals 128-255.  While there is no consensus on this subject
-yet, it is unwise to assume either Latin-1 or UTF-8, even though the current
-implementation appears to favor Latin-1.  This applies both to the source
-character set and the run-time character set.
+Python reads program text as Unicode code points; the encoding of a source file
+can be given by an encoding declaration and defaults to UTF-8, see :pep:`3120`
+for details.  If the source file cannot be decoded, a :exc:`SyntaxError` is
+raised.
  
  
  .. _line-structure:
@@ -44,21 +22,17 @@ character set and the run-time character set.
  Line structure
  ==============
  
-.. index:: single: line structure
+.. index:: line structure
  
  A Python program is divided into a number of *logical lines*.
  
  
-.. _logical:
+.. _logical-lines:
  
  Logical lines
  -------------
  
-.. index::
-   single: logical line
-   single: physical line
-   single: line joining
-   single: NEWLINE token
+.. index:: logical line, physical line, line joining, NEWLINE token
  
  The end of a logical line is represented by the token NEWLINE.  Statements
  cannot cross logical line boundaries except where NEWLINE is allowed by the
@@ -67,7 +41,7 @@ constructed from one or more *physical lines* by following the explicit or
  implicit *line joining* rules.
  
  
-.. _physical:
+.. _physical-lines:
  
  Physical lines
  --------------
@@ -89,9 +63,7 @@ representing ASCII LF, is the line terminator).
  Comments
  --------
  
-.. index::
-   single: comment
-   single: hash character
+.. index:: comment, hash character
  
  A comment starts with a hash character (``#``) that is not part of a string
  literal, and ends at the end of the physical line.  A comment signifies the end
@@ -104,9 +76,7 @@ are ignored by the syntax; they are not tokens.
  Encoding declarations
  ---------------------
  
-.. index::
-   single: source character set
-   single: encodings
+.. index:: source character set, encodings
  
  If a comment in the first or second line of the Python script matches the
  regular expression ``coding[=:]\s*([-\w.]+)``, this comment is processed as an
@@ -119,19 +89,19 @@ which is recognized also by GNU Emacs, and ::
  
     # vim:fileencoding=<encoding-name>
  
-which is recognized by Bram Moolenaar's VIM. In addition, if the first bytes of
-the file are the UTF-8 byte-order mark (``'\xef\xbb\xbf'``), the declared file
-encoding is UTF-8 (this is supported, among others, by Microsoft's
-:program:`notepad`).
+which is recognized by Bram Moolenaar's VIM.
+
+If no encoding declaration is found, the default encoding is UTF-8.  In
+addition, if the first bytes of the file are the UTF-8 byte-order mark
+(``b'\xef\xbb\xbf'``), the declared file encoding is UTF-8 (this is supported,
+among others, by Microsoft's :program:`notepad`).
  
  If an encoding is declared, the encoding name must be recognized by Python. The
-encoding is used for all lexical analysis, in particular to find the end of a
-string, and to interpret the contents of Unicode literals. String literals are
-converted to Unicode for syntactical analysis, then converted back to their
-original encoding before interpretation starts. The encoding declaration must
-appear on a line of its own.
+encoding is used for all lexical analysis, including string literals, comments
+and identifiers. The encoding declaration must appear on a line of its own.
  
-.. % XXX there should be a list of supported encodings.
+A list of standard encodings can be found in the section
+:ref:`standard-encodings`.
  
  
  .. _explicit-joining:
@@ -139,21 +109,13 @@ appear on a line of its own.
  Explicit line joining
  ---------------------
  
-.. index::
-   single: physical line
-   single: line joining
-   single: line continuation
-   single: backslash character
+.. index:: physical line, line joining, line continuation, backslash character
  
  Two or more physical lines may be joined into logical lines using backslash
  characters (``\``), as follows: when a physical line ends in a backslash that is
  not part of a string literal or comment, it is joined with the following forming
  a single logical line, deleting the backslash and the following end-of-line
-character.  For example:
-
-.. % 
-
-::
+character.  For example::
  
     if 1900 < year < 2100 and 1 <= month <= 12 \
        and 1 <= day <= 31 and 0 <= hour < 24 \
@@ -197,9 +159,9 @@ Blank lines
  A logical line that contains only spaces, tabs, formfeeds and possibly a
  comment, is ignored (i.e., no NEWLINE token is generated).  During interactive
  input of statements, handling of a blank line may differ depending on the
-implementation of the read-eval-print loop.  In the standard implementation, an
-entirely blank logical line (i.e. one containing not even whitespace or a
-comment) terminates a multi-line statement.
+implementation of the read-eval-print loop.  In the standard interactive
+interpreter, an entirely blank logical line (i.e. one containing not even
+whitespace or a comment) terminates a multi-line statement.
  
  
  .. _indentation:
@@ -207,14 +169,7 @@ comment) terminates a multi-line statement.
  Indentation
  -----------
  
-.. index::
-   single: indentation
-   single: whitespace
-   single: leading whitespace
-   single: space
-   single: tab
-   single: grouping
-   single: statement grouping
+.. index:: indentation, leading whitespace, space, tab, grouping, statement grouping
  
  Leading whitespace (spaces and tabs) at the beginning of a logical line is used
  to compute the indentation level of the line, which in turn is used to determine
@@ -238,9 +193,7 @@ for the indentation calculations above.  Formfeed characters occurring elsewhere
  in the leading whitespace have an undefined effect (for instance, they may reset
  the space count to zero).
  
-.. index::
-   single: INDENT token
-   single: DEDENT token
+.. index:: INDENT token, DEDENT token
  
  The indentation levels of consecutive lines are used to generate INDENT and
  DEDENT tokens, using a stack, as follows.
@@ -315,22 +268,48 @@ possible string that forms a legal token, when read from left to right.
  Identifiers and keywords
  ========================
  
-.. index::
-   single: identifier
-   single: name
+.. index:: identifier, name
  
  Identifiers (also referred to as *names*) are described by the following lexical
  definitions:
  
-.. productionlist::
-   identifier: (`letter`|"_") (`letter` | `digit` | "_")*
-   letter: `lowercase` | `uppercase`
-   lowercase: "a"..."z"
-   uppercase: "A"..."Z"
-   digit: "0"..."9"
+The syntax of identifiers in Python is based on the Unicode standard annex
+UAX-31, with elaboration and changes as defined below.
+
+Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
+are the same as in Python 2.5; Python 3.0 introduces additional
+characters from outside the ASCII range (see :pep:`3131`).  For other
+characters, the classification uses the version of the Unicode Character
+Database as included in the :mod:`unicodedata` module.
  
  Identifiers are unlimited in length.  Case is significant.
  
+.. productionlist::
+   identifier: `id_start` `id_continue`*
+   id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl,
+              the underscore, and characters with the Other_ID_Start property>
+   id_continue: <all characters in `id_start`, plus characters in the categories
+                 Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
+
+The Unicode category codes mentioned above stand for:
+
+* *Lu* - uppercase letters
+* *Ll* - lowercase letters
+* *Lt* - titlecase letters
+* *Lm* - modifier letters
+* *Lo* - other letters
+* *Nl* - letter numbers
+* *Mn* - nonspacing marks
+* *Mc* - spacing combining marks
+* *Nd* - decimal numbers
+* *Pc* - connector punctuations
+
+All identifiers are converted into the normal form NFC while parsing; comparison
+of identifiers is based on NFC.
+
+A non-normative HTML file listing all valid identifier characters for Unicode
+4.1 can be found at
+http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html.
  
  .. _keywords:
  
@@ -345,25 +324,13 @@ The following identifiers are used as reserved words, or *keywords* of the
  language, and cannot be used as ordinary identifiers.  They must be spelled
  exactly as written here::
  
-   and       def       for       is        raise
-   as        del       from      lambda    return
-   assert    elif      global    not       try
-   break     else      if        or        while
-   class     except    import    pass      with
-   continue  finally   in        print     yield
-
-.. versionchanged:: 2.4
-   :const:`None` became a constant and is now recognized by the compiler as a name
-   for the built-in object :const:`None`.  Although it is not a keyword, you cannot
-   assign a different object to it.
-
-.. versionchanged:: 2.5
-   Both :keyword:`as` and :keyword:`with` are only recognized when the
-   ``with_statement`` future feature has been enabled. It will always be enabled in
-   Python 2.6.  See section :ref:`with` for details.  Note that using :keyword:`as`
-   and :keyword:`with` as identifiers will always issue a warning, even when the
-   ``with_statement`` future directive is not in effect.
-
+   False      class      finally    is         return
+   None       continue   for        lambda     try
+   True       def        from       nonlocal   while
+   and        del        global     not        with
+   as         elif       if         or         yield
+   assert     else       import     pass
+   break      except     in         raise
  
  .. _id-classes:
  
@@ -405,71 +372,71 @@ characters:
  Literals
  ========
  
-.. index::
-   single: literal
-   single: constant
+.. index:: literal, constant
  
  Literals are notations for constant values of some built-in types.
  
  
  .. _strings:
  
-String literals
----------------
+String and Bytes literals
+-------------------------
  
-.. index:: single: string literal
+.. index:: string literal, bytes literal, ASCII
  
  String literals are described by the following lexical definitions:
  
-.. index:: single: ASCII@ASCII
-
  .. productionlist::
     stringliteral: [`stringprefix`](`shortstring` | `longstring`)
-   stringprefix: "r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"
+   stringprefix: "r" | "R"
     shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"'
-   longstring: ""'" `longstringitem`* ""'"
-             : | '"""' `longstringitem`* '"""'
-   shortstringitem: `shortstringchar` | `escapeseq`
-   longstringitem: `longstringchar` | `escapeseq`
+   longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""'
+   shortstringitem: `shortstringchar` | `stringescapeseq`
+   longstringitem: `longstringchar` | `stringescapeseq`
     shortstringchar: <any source character except "\" or newline or the quote>
     longstringchar: <any source character except "\">
-   escapeseq: "\" <any ASCII character>
+   stringescapeseq: "\" <any source character>
+
+.. productionlist::
+   bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`)
+   bytesprefix: "b" | "B"
+   shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
+   longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
+   shortbytesitem: `shortbyteschar` | `bytesescapeseq`
+   longbytesitem: `longbyteschar` | `bytesescapeseq`
+   shortbyteschar: <any ASCII character except "\" or newline or the quote>
+   longbyteschar: <any ASCII character except "\">
+   bytesescapeseq: "\" <any ASCII character>
  
  One syntactic restriction not indicated by these productions is that whitespace
-is not allowed between the :token:`stringprefix` and the rest of the string
-literal. The source character set is defined by the encoding declaration; it is
-ASCII if no encoding declaration is given in the source file; see section
-:ref:`encodings`.
+is not allowed between the :token:`stringprefix` or :token:`bytesprefix` and the
+rest of the literal. The source character set is defined by the encoding
+declaration; it is UTF-8 if no encoding declaration is given in the source file;
+see section :ref:`encodings`.
  
-.. index::
-   single: triple-quoted string
-   single: Unicode Consortium
-   single: string; Unicode
-   single: raw string
+.. index:: triple-quoted string, Unicode Consortium, raw string
  
-In plain English: String literals can be enclosed in matching single quotes
+In plain English: Both types of literals can be enclosed in matching single quotes
  (``'``) or double quotes (``"``).  They can also be enclosed in matching groups
  of three single or double quotes (these are generally referred to as
  *triple-quoted strings*).  The backslash (``\``) character is used to escape
  characters that otherwise have a special meaning, such as newline, backslash
-itself, or the quote character.  String literals may optionally be prefixed with
-a letter ``'r'`` or ``'R'``; such strings are called :dfn:`raw strings` and use
-different rules for interpreting backslash escape sequences.  A prefix of
-``'u'`` or ``'U'`` makes the string a Unicode string.  Unicode strings use the
-Unicode character set as defined by the Unicode Consortium and ISO 10646.  Some
-additional escape sequences, described below, are available in Unicode strings.
-The two prefix characters may be combined; in this case, ``'u'`` must appear
-before ``'r'``.
+itself, or the quote character.
+
+String literals may optionally be prefixed with a letter ``'r'`` or ``'R'``;
+such strings are called :dfn:`raw strings` and use different rules for
+interpreting backslash escape sequences.
+
+Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
+instance of the :class:`bytes` type instead of the :class:`str` type.  They
+may only contain ASCII characters; bytes with a numeric value of 128 or greater
+must be expressed with escapes.
  
  In triple-quoted strings, unescaped newlines and quotes are allowed (and are
  retained), except that three unescaped quotes in a row terminate the string.  (A
  "quote" is the character used to open the string, i.e. either ``'`` or ``"``.)
  
-.. index::
-   single: physical line
-   single: escape sequence
-   single: Standard C
-   single: C
+.. index:: physical line, escape sequence, Standard C, C
  
  Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in strings are
  interpreted according to rules similar to those used by Standard C.  The
@@ -478,7 +445,7 @@ recognized escape sequences are:
  +-----------------+---------------------------------+-------+
  | Escape Sequence | Meaning                         | Notes |
  +=================+=================================+=======+
-| ``\newline``    | Ignored                         |       |
+| ``\newline``    | Backslash and newline ignored   |       |
  +-----------------+---------------------------------+-------+
  | ``\\``          | Backslash (``\``)               |       |
  +-----------------+---------------------------------+-------+
@@ -494,83 +461,83 @@ recognized escape sequences are:
  +-----------------+---------------------------------+-------+
  | ``\n``          | ASCII Linefeed (LF)             |       |
  +-----------------+---------------------------------+-------+
-| ``\N{name}``    | Character named *name* in the   |       |
-|                 | Unicode database (Unicode only) |       |
-+-----------------+---------------------------------+-------+
  | ``\r``          | ASCII Carriage Return (CR)      |       |
  +-----------------+---------------------------------+-------+
  | ``\t``          | ASCII Horizontal Tab (TAB)      |       |
  +-----------------+---------------------------------+-------+
-| ``\uxxxx``      | Character with 16-bit hex value | \(1)  |
-|                 | *xxxx* (Unicode only)           |       |
-+-----------------+---------------------------------+-------+
-| ``\Uxxxxxxxx``  | Character with 32-bit hex value | \(2)  |
-|                 | *xxxxxxxx* (Unicode only)       |       |
-+-----------------+---------------------------------+-------+
  | ``\v``          | ASCII Vertical Tab (VT)         |       |
  +-----------------+---------------------------------+-------+
-| ``\ooo``        | Character with octal value      | (3,5) |
+| ``\ooo``        | Character with octal value      | (1,3) |
  |                 | *ooo*                           |       |
  +-----------------+---------------------------------+-------+
-| ``\xhh``        | Character with hex value *hh*   | (4,5) |
+| ``\xhh``        | Character with hex value *hh*   | (2,3) |
  +-----------------+---------------------------------+-------+
  
-.. index:: single: ASCII@ASCII
+Escape sequences only recognized in string literals are:
+
++-----------------+---------------------------------+-------+
+| Escape Sequence | Meaning                         | Notes |
++=================+=================================+=======+
+| ``\N{name}``    | Character named *name* in the   |       |
+|                 | Unicode database                |       |
++-----------------+---------------------------------+-------+
+| ``\uxxxx``      | Character with 16-bit hex value | \(4)  |
+|                 | *xxxx*                          |       |
++-----------------+---------------------------------+-------+
+| ``\Uxxxxxxxx``  | Character with 32-bit hex value | \(5)  |
+|                 | *xxxxxxxx*                      |       |
++-----------------+---------------------------------+-------+
  
  Notes:
  
  (1)
-   Individual code units which form parts of a surrogate pair can be encoded using
-   this escape sequence.
+   As in Standard C, up to three octal digits are accepted.
  
  (2)
-   Any Unicode character can be encoded this way, but characters outside the Basic
-   Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is
-   compiled to use 16-bit code units (the default).  Individual code units which
-   form parts of a surrogate pair can be encoded using this escape sequence.
+   Unlike in Standard C, at most two hex digits are accepted.
  
  (3)
-   As in Standard C, up to three octal digits are accepted.
+   In a bytes literal, hexadecimal and octal escapes denote the byte with the
+   given value. In a string literal, these escapes denote a Unicode character
+   with the given value.
  
  (4)
-   Unlike in Standard C, at most two hex digits are accepted.
+   Individual code units which form parts of a surrogate pair can be encoded using
+   this escape sequence.
  
  (5)
-   In a string literal, hexadecimal and octal escapes denote the byte with the
-   given value; it is not necessary that the byte encodes a character in the source
-   character set. In a Unicode literal, these escapes denote a Unicode character
-   with the given value.
+   Any Unicode character can be encoded this way, but characters outside the Basic
+   Multilingual Plane (BMP) will be encoded using a surrogate pair if Python is
+   compiled to use 16-bit code units (the default).  Individual code units which
+   form parts of a surrogate pair can be encoded using this escape sequence.
+
  
-.. index:: single: unrecognized escape sequence
+.. index:: unrecognized escape sequence
  
  Unlike Standard C, all unrecognized escape sequences are left in the string
  unchanged, i.e., *the backslash is left in the string*.  (This behavior is
  useful when debugging: if an escape sequence is mistyped, the resulting output
  is more easily recognized as broken.)  It is also important to note that the
-escape sequences marked as "(Unicode only)" in the table above fall into the
-category of unrecognized escapes for non-Unicode string literals.
-
-When an ``'r'`` or ``'R'`` prefix is present, a character following a backslash
-is included in the string without change, and *all backslashes are left in the
-string*.  For example, the string literal ``r"\n"`` consists of two characters:
-a backslash and a lowercase ``'n'``.  String quotes can be escaped with a
-backslash, but the backslash remains in the string; for example, ``r"\""`` is a
-valid string literal consisting of two characters: a backslash and a double
-quote; ``r"\"`` is not a valid string literal (even a raw string cannot end in
-an odd number of backslashes).  Specifically, *a raw string cannot end in a
-single backslash* (since the backslash would escape the following quote
-character).  Note also that a single backslash followed by a newline is
-interpreted as those two characters as part of the string, *not* as a line
-continuation.
-
-When an ``'r'`` or ``'R'`` prefix is used in conjunction with a ``'u'`` or
-``'U'`` prefix, then the ``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are
-processed while  *all other backslashes are left in the string*. For example,
-the string literal ``ur"\u0062\n"`` consists of three Unicode characters: 'LATIN
-SMALL LETTER B', 'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can
-be escaped with a preceding backslash; however, both remain in the string.  As a
-result, ``\uXXXX`` escape sequences are only recognized when there are an odd
-number of backslashes.
+escape sequences only recognized in string literals fall into the category of
+unrecognized escapes for bytes literals.
+
+When an ``'r'`` or ``'R'`` prefix is used in a string literal, then the
+``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are processed while *all other
+backslashes are left in the string*. For example, the string literal
+``r"\u0062\n"`` consists of three Unicode characters: 'LATIN SMALL LETTER B',
+'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can be escaped with a
+preceding backslash; however, both remain in the string.  As a result,
+``\uXXXX`` escape sequences are only recognized when there is an odd number of
+backslashes.
+
+Even in a raw string, string quotes can be escaped with a backslash, but the
+backslash remains in the string; for example, ``r"\""`` is a valid string
+literal consisting of two characters: a backslash and a double quote; ``r"\"``
+is not a valid string literal (even a raw string cannot end in an odd number of
+backslashes).  Specifically, *a raw string cannot end in a single backslash*
+(since the backslash would escape the following quote character).  Note also
+that a single backslash followed by a newline is interpreted as those two
+characters as part of the string, *not* as a line continuation.
  
  
  .. _string-catenation:
@@ -600,19 +567,9 @@ styles for each component (even mixing raw strings and triple quoted strings).
  Numeric literals
  ----------------
  
-.. index::
-   single: number
-   single: numeric literal
-   single: integer literal
-   single: plain integer literal
-   single: long integer literal
-   single: floating point literal
-   single: hexadecimal literal
-   single: octal literal
-   single: binary literal
-   single: decimal literal
-   single: imaginary literal
-   single: complex; literal
+.. index:: number, numeric literal, integer literal, plain integer literal
+   long integer literal, floating point literal, hexadecimal literal
+   octal literal, binary literal, decimal literal, imaginary literal, complex literal
  
  There are four types of numeric literals: plain integers, long integers,
  floating point numbers, and imaginary numbers.  There are no complex literals
@@ -633,18 +590,17 @@ Integer literals are described by the following lexical definitions:
  .. productionlist::
     integer: `decimalinteger` | `octinteger` | `hexinteger`
     decimalinteger: `nonzerodigit` `digit`* | "0"+
+   nonzerodigit: "1"..."9"
+   digit: "0"..."9"
     octinteger: "0" ("o" | "O") `octdigit`+
     hexinteger: "0" ("x" | "X") `hexdigit`+
     bininteger: "0" ("b" | "B") `bindigit`+
-   nonzerodigit: "1"..."9"
     octdigit: "0"..."7"
     hexdigit: `digit` | "a"..."f" | "A"..."F"
-   bindigit: "0"..."1"
+   bindigit: "0" | "1"
  
-Plain integer literals that are above the largest representable plain integer
-(e.g., 2147483647 when using 32-bit arithmetic) are accepted as if they were
-long integers instead. [#]_  There is no limit for long integer literals apart
-from what can be stored in available memory.
+There is no limit for the length of integer literals apart from what can be
+stored in available memory.
  
  Note that leading zeros in a non-zero decimal number are not allowed. This is
  for disambiguation with C-style octal literals, which Python used before version
@@ -732,7 +688,7 @@ The following tokens serve as delimiters in the grammar::
     &=      |=      ^=      >>=     <<=     **=
  
  The period can also occur in floating-point and imaginary literals.  A sequence
-of three periods has a special meaning as an ellipsis in slices. The second half
+of three periods has a special meaning as an ellipsis literal. The second half
  of the list, the augmented assignment operators, serve lexically as delimiters,
  but also perform an operation.
  
@@ -741,18 +697,7 @@ tokens or are otherwise significant to the lexical analyzer::
  
     '       "       #       \
  
-.. index:: single: ASCII@ASCII
-
  The following printing ASCII characters are not used in Python.  Their
  occurrence outside string literals and comments is an unconditional error::
  
     $       ?
-
-.. rubric:: Footnotes
-
-.. [#] In versions of Python prior to 2.4, octal and hexadecimal literals in the range
-   just above the largest representable plain integer but below the largest
-   unsigned 32-bit number (on a machine using 32-bit arithmetic), 4294967296, were
-   taken as the negative plain integer obtained by subtracting 4294967296 from
-   their unsigned value.
-
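
Note on the patched text: the rules this change documents (encoding declarations defaulting to UTF-8, NFC-normalized identifiers, bytes versus string literals, and the new integer literal prefixes) can be illustrated with a small sketch. It is not part of the commit. It assumes a current Python 3 interpreter, and the names `declared_encoding`, `same_identifier` and `LITERAL_FACTS` are hypothetical, chosen only for this illustration. The encoding helper is a simplification that only looks at lines starting with a comment; it is not how CPython itself detects the encoding.

```python
import keyword
import re
import unicodedata

# Regular expression quoted in the patched "Encoding declarations" section.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")
UTF8_BOM = b"\xef\xbb\xbf"


def declared_encoding(source: bytes) -> str:
    """Hypothetical helper: choose a source encoding as the text describes."""
    # A UTF-8 byte-order mark means the file is UTF-8.
    if source.startswith(UTF8_BOM):
        return "utf-8"
    # Only a comment on the first or second line can declare an encoding.
    for line in source.splitlines()[:2]:
        text = line.decode("ascii", errors="replace")
        if text.lstrip().startswith("#"):
            match = CODING_RE.search(text)
            if match:
                return match.group(1)
    # No declaration found: the default is UTF-8.
    return "utf-8"


def same_identifier(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A few facts about literals, evaluated by the running interpreter.
LITERAL_FACTS = {
    "bytes_escape_value": b"\x80"[0],      # a byte with numeric value 128
    "str_escape_codepoint": ord("\x80"),   # the code point U+0080
    "raw_string_length": len(r"\n"),       # backslash plus the letter n
    "octal": 0o17,                         # octal literal with the 0o prefix
    "binary": 0b101,                       # binary literal with the 0b prefix
    "nonlocal_is_keyword": keyword.iskeyword("nonlocal"),
    "print_is_keyword": keyword.iskeyword("print"),
}
```

For real tooling, the standard library's `tokenize.detect_encoding` is the better choice for reading a file's declared encoding; the helper above only mirrors the prose of the patch.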