method, which takes a string argument. This can be called with as
little or as much text at a time as desired; \samp{p.feed(a);
p.feed(b)} has the same effect as \samp{p.feed(a+b)}. When the data
-contains complete HTML tags, these are processed immediately;
-incomplete elements are saved in a buffer. To force processing of all
+contains complete HTML markup constructs, these are processed immediately;
+incomplete constructs are saved in a buffer. To force processing of all
unprocessed data, call the \method{close()} method.
For example, to parse the entire contents of a file, use:
\end{itemize}
-The module defines a single class:
+The module defines a parser class and an exception:
\begin{classdesc}{HTMLParser}{formatter}
This is the basic HTML parser class. It supports all entity names
It also defines handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.
\end{classdesc}
+\begin{excdesc}{HTMLParseError}
+Exception raised by the \class{HTMLParser} class when it encounters an
+error while parsing.
+\versionadded{2.4}
+\end{excdesc}
+
\begin{seealso}
\seemodule{formatter}{Interface definition for transforming an
list of hyperlinks created by \method{anchor_bgn()}.
\end{methoddesc}
-\begin{methoddesc}{handle_image}{source, alt\optional{, ismap\optional{, align\optional{, width\optional{, height}}}}}
+\begin{methoddesc}{handle_image}{source, alt\optional{, ismap\optional{,
+ align\optional{, width\optional{, height}}}}}
This method is called to handle images. The default implementation
simply passes the \var{alt} value to the \method{handle_data()}
method.
\declaremodule{standard}{HTMLParser}
\modulesynopsis{A simple parser that can handle HTML and XHTML.}
+\versionadded{2.2}
+
This module defines a class \class{HTMLParser} which serves as the
basis for parsing text files formatted in HTML\index{HTML} (HyperText
Mark-up Language) and XHTML.\index{XHTML} Unlike the parser in
elements which are closed implicitly by closing an outer element.
\end{classdesc}
+An exception is defined as well:
+
+\begin{excdesc}{HTMLParseError}
+Exception raised by the \class{HTMLParser} class when it encounters an
+error while parsing. This exception provides three attributes:
+\member{msg} is a brief message explaining the error, \member{lineno}
+is the number of the line on which the broken construct was detected,
+and \member{offset} is the number of characters into the line at which
+the construct starts.
+\end{excdesc}
+
\class{HTMLParser} instances have the following methods:
HTML parser which supports XHTML and offers a somewhat different
interface is available in the \refmodule{HTMLParser} module.
-
\begin{classdesc}{SGMLParser}{}
The \class{SGMLParser} class is instantiated without arguments.
The parser is hardcoded to recognize the following
\end{itemize}
\end{classdesc}
-\class{SGMLParser} instances have the following interface methods:
+A single exception is defined as well:
+
+\begin{excdesc}{SGMLParseError}
+Exception raised by the \class{SGMLParser} class when it encounters an
+error while parsing.
+\versionadded{2.1}
+\end{excdesc}
+
+
+\class{SGMLParser} instances have the following methods:
\begin{methoddesc}{reset}{}