\section{\module{htmllib} ---
- A parser for HTML documents.}
-\declaremodule{standard}{htmllib}
+ A parser for HTML documents}
+\declaremodule{standard}{htmllib}
\modulesynopsis{A parser for HTML documents.}
\index{HTML}
other classes in order to add functionality, and allows most of its
methods to be extended or overridden. In turn, this class is derived
from and extends the \class{SGMLParser} class defined in module
-\module{sgmllib}\refstmodindex{sgmllib}. The \class{HTMLParser}
+\refmodule{sgmllib}\refstmodindex{sgmllib}. The \class{HTMLParser}
implementation supports the HTML 2.0 language as described in
\rfc{1866}. Two implementations of formatter objects are provided in
-the \module{formatter}\refstmodindex{formatter} module; refer to the
+the \refmodule{formatter}\refstmodindex{formatter} module; refer to the
documentation for that module for information on the formatter
interface.
-\index{SGML}
\withsubitem{(in module sgmllib)}{\ttindex{SGMLParser}}
-\index{formatter}
The following is a summary of the interface defined by
\class{sgmllib.SGMLParser}:
\item
The interface to define semantics for HTML tags is very simple: derive
-a class and define methods called \code{start_\var{tag}()},
-\code{end_\var{tag}()}, or \code{do_\var{tag}()}. The parser will
-call these at appropriate moments: \code{start_\var{tag}} or
-\code{do_\var{tag}()} is called when an opening tag of the form
-\code{<\var{tag} ...>} is encountered; \code{end_\var{tag}()} is called
+a class and define methods called \method{start_\var{tag}()},
+\method{end_\var{tag}()}, or \method{do_\var{tag}()}. The parser will
+call these at appropriate moments: \method{start_\var{tag}()} or
+\method{do_\var{tag}()} is called when an opening tag of the form
+\code{<\var{tag} ...>} is encountered; \method{end_\var{tag}()} is called
when a closing tag of the form \code{</\var{tag}>} is encountered. If
an opening tag requires a corresponding closing tag, like \code{<H1>}
-... \code{</H1>}, the class should define the \code{start_\var{tag}()}
+... \code{</H1>}, the class should define the \method{start_\var{tag}()}
method; if a tag requires no closing tag, like \code{<P>}, the class
-should define the \code{do_\var{tag}()} method.
+should define the \method{do_\var{tag}()} method.
\end{itemize}
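The naming convention summarized above can be illustrated with a small
stand-alone sketch. It does not import \module{sgmllib}; the class
\code{TinyDispatcher} and its \code{feed_tag()} method are hypothetical
names invented for this example, and only the lookup order (a
\code{start_} method first, then a \code{do_} method) is taken from the
text.

```python
class TinyDispatcher:
    """Hypothetical stand-in for the parser's tag dispatch; not sgmllib itself."""

    def __init__(self):
        self.events = []

    def feed_tag(self, tag, attributes=()):
        # Tag names are converted to lower case before the handler lookup.
        tag = tag.lower()
        # A start_<tag> method is preferred over do_<tag> when both exist.
        for prefix in ("start_", "do_"):
            method = getattr(self, prefix + tag, None)
            if method is not None:
                method(list(attributes))
                return prefix + tag
        # No handler is defined for this tag.
        self.events.append(("unknown", tag))
        return None


class HeadingParser(TinyDispatcher):
    def start_h1(self, attributes):
        # <H1> requires a closing tag, so it gets a start_ method.
        self.events.append(("start", "h1"))

    def do_p(self, attributes):
        # <P> requires no closing tag, so it gets a do_ method.
        self.events.append(("do", "p"))


parser = HeadingParser()
handled = [parser.feed_tag(t) for t in ("H1", "P", "BLINK")]
```

Here \code{handled} records which handler name, if any, was selected
for each tag; a tag with neither method falls through to the
``unknown'' branch.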
This method is called at the start of an anchor region. The arguments
correspond to the attributes of the \code{<A>} tag with the same
names. The default implementation maintains a list of hyperlinks
-(defined by the \code{href} attribute) within the document. The list
-of hyperlinks is available as the data attribute \code{anchorlist}.
+(defined by the \code{HREF} attribute for \code{<A>} tags) within the
+document. The list of hyperlinks is available as the data attribute
+\member{anchorlist}.
\end{methoddesc}
\begin{methoddesc}{anchor_end}{}
\begin{methoddesc}{save_end}{}
Ends buffering character data and returns all data saved since the
-preceeding call to \method{save_bgn()}. If the \code{nofill} flag is
+preceding call to \method{save_bgn()}. If the \member{nofill} flag is
false, whitespace is collapsed to single spaces. A call to this
method without a preceding call to \method{save_bgn()} will raise a
\exception{TypeError} exception.
\section{\module{sgmllib} ---
- Simple SGML parser.}
-\declaremodule{standard}{sgmllib}
+ Simple SGML parser}
+\declaremodule{standard}{sgmllib}
\modulesynopsis{Only as much of an SGML parser as needed to parse HTML.}
\index{SGML}
basis for parsing text files formatted in SGML (Standard Generalized
Mark-up Language). In fact, it does not provide a full SGML parser
--- it only parses SGML insofar as it is used by HTML, and the module
-only exists as a base for the \module{htmllib}\refstmodindex{htmllib}
+only exists as a base for the \refmodule{htmllib}\refstmodindex{htmllib}
module.
\begin{methoddesc}{setnomoretags}{}
Stop processing tags. Treat all following input as literal input
-(CDATA). (This is only provided so the HTML tag \code{<PLAINTEXT>}
-can be implemented.)
+(CDATA). (This is only provided so the HTML tag
+\code{<PLAINTEXT>} can be implemented.)
\end{methoddesc}
\begin{methoddesc}{setliteral}{}
\begin{methoddesc}{handle_starttag}{tag, method, attributes}
This method is called to handle start tags for which either a
-\code{start_\var{tag}()} or \code{do_\var{tag}()} method has been
+\method{start_\var{tag}()} or \method{do_\var{tag}()} method has been
defined. The \var{tag} argument is the name of the tag converted to
lower case, and the \var{method} argument is the bound method which
should be used to support semantic interpretation of the start tag.
-The \var{attributes} argument is a list of \code{(\var{name}, \var{value})}
-pairs containing the attributes found inside the tag's \code{<>}
-brackets. The \var{name} has been translated to lower case and double
-quotes and backslashes in the \var{value} have been interpreted. For
-instance, for the tag \code{<A HREF="http://www.cwi.nl/">}, this
+The \var{attributes} argument is a list of \code{(\var{name},
+\var{value})} pairs containing the attributes found inside the tag's
+\code{<>} brackets. The \var{name} has been translated to lower case
+and double quotes and backslashes in the \var{value} have been interpreted.
+For instance, for the tag \code{<A HREF="http://www.cwi.nl/">}, this
method would be called as \samp{handle_starttag('a', \var{method},
[('href', 'http://www.cwi.nl/')])}. The base implementation simply calls
\var{method} with \var{attributes} as the only argument.
\begin{methoddesc}{handle_endtag}{tag, method}
This method is called to handle end tags for which an
-\code{end_\var{tag}()} method has been defined. The \var{tag}
-argument is the name of the tag converted to lower case, and the
-\var{method} argument is the bound method which should be used to
+\method{end_\var{tag}()} method has been defined. The
+\var{tag} argument is the name of the tag converted to lower case, and
+the \var{method} argument is the bound method which should be used to
support semantic interpretation of the end tag. If no
-\code{end_\var{tag}()} method is defined for the closing element,
+\method{end_\var{tag}()} method is defined for the closing element,
this handler is not called. The base implementation simply calls
\var{method}.
\end{methoddesc}
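As a rough illustration of the two handlers described above, the
following stand-alone sketch mimics their documented base behaviour: the
start handler passes the attribute list to the bound method, and the
end handler calls the bound method with no arguments. The free
functions and the class \code{AnchorCollector} are hypothetical and do
not import \module{sgmllib}.

```python
def handle_starttag(tag, method, attributes):
    # Base behaviour per the text: call method with attributes as the only argument.
    method(attributes)


def handle_endtag(tag, method):
    # Base behaviour per the text: simply call method.
    method()


class AnchorCollector:
    """Hypothetical handler class; the name is an assumption for this sketch."""

    def __init__(self):
        self.hrefs = []
        self.closed = 0

    def start_a(self, attributes):
        # attributes is a list of (name, value) pairs with lower-case names.
        for name, value in attributes:
            if name == "href":
                self.hrefs.append(value)

    def end_a(self):
        self.closed += 1


collector = AnchorCollector()
handle_starttag("a", collector.start_a, [("href", "http://www.cwi.nl/")])
handle_endtag("a", collector.end_a)
```

A subclass that wants logging or tracing around every tag could override
these two handlers and still call the bound \var{method} itself.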
form \samp{\&\var{ref};} where \var{ref} is a general entity
reference. It looks for \var{ref} in the instance (or class)
variable \member{entitydefs} which should be a mapping from entity
-names to corresponding translations.
-If a translation is found, it calls the method \method{handle_data()}
-with the translation; otherwise, it calls the method
-\code{unknown_entityref(\var{ref})}. The default \member{entitydefs}
-defines translations for \code{\&amp;}, \code{\&apos;}, \code{\&gt;},
-\code{\&lt;}, and \code{\&quot;}.
+names to corresponding translations. If a translation is found, it
+calls the method \method{handle_data()} with the translation;
+otherwise, it calls the method \code{unknown_entityref(\var{ref})}.
+The default \member{entitydefs} defines translations for
+\code{\&amp;}, \code{\&apos;}, \code{\&gt;}, \code{\&lt;}, and
+\code{\&quot;}.
\end{methoddesc}
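The lookup described above can be sketched in a few lines. This is
not the module's implementation: \code{resolve_entity()} is a
hypothetical helper, and the mapping below is an assumption that simply
mirrors the five entity names listed in the text.

```python
def resolve_entity(ref, entitydefs, handle_data, unknown_entityref):
    """Hypothetical helper mirroring the lookup described in the text."""
    if ref in entitydefs:
        # A translation exists: pass it on as ordinary character data.
        handle_data(entitydefs[ref])
    else:
        # No translation: report the bare entity name.
        unknown_entityref(ref)


# Assumed mapping from entity names to their translations.
entitydefs = {"amp": "&", "apos": "'", "gt": ">", "lt": "<", "quot": '"'}

data, unknown = [], []
for ref in ("lt", "amp", "nbsp"):
    resolve_entity(ref, entitydefs, data.append, unknown.append)
```

A subclass could extend the mapping with further names, in which case
references such as \code{nbsp} would no longer reach the unknown-entity
handler.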
\begin{methoddesc}{handle_comment}{comment}
\begin{methoddescni}{start_\var{tag}}{attributes}
This method is called to process an opening tag \var{tag}. It has
-preference over \code{do_\var{tag}()}. The \var{attributes}
-argument has the same meaning as described for
+preference over \method{do_\var{tag}()}. The
+\var{attributes} argument has the same meaning as described for
\method{handle_starttag()} above.
\end{methoddescni}
Note that the parser maintains a stack of open elements for which no
end tag has been found yet. Only tags processed by
-\code{start_\var{tag}()} are pushed on this stack. Definition of an
-\code{end_\var{tag}()} method is optional for these tags. For tags
-processed by \code{do_\var{tag}()} or by \method{unknown_tag()}, no
-\code{end_\var{tag}()} method must be defined; if defined, it will not
-be used. If both \code{start_\var{tag}()} and \code{do_\var{tag}()}
-methods exist for a tag, the \code{start_\var{tag}()} method takes
-precedence.
+\method{start_\var{tag}()} are pushed on this stack. Definition of an
+\method{end_\var{tag}()} method is optional for these tags. For tags
+processed by \method{do_\var{tag}()} or by \method{unknown_tag()}, no
+\method{end_\var{tag}()} method needs to be defined; if one is defined,
+it will not be used. If both \method{start_\var{tag}()} and
+\method{do_\var{tag}()} methods exist for a tag, the
+\method{start_\var{tag}()} method takes precedence.