From: Clark Cooper
@@ -60,16 +60,29 @@ copyright and to distribute it with expat.
+
XML_Char
. This type is defined in expat.h as char *
and contains bytes encoding UTF-8.
Note that you'll receive them in this form independent of the original
-encoding of the document.
+encoding of the document.
-XML_SetElementHandler(XML_Parser p, - XML_StartElementHandler start, - XML_EndElementHandler end); + ---++typedef void (*XML_StartElementHandler)(void *userData, const XML_Char *name, const XML_Char **atts); -+Set handler for start (and empty) tags. Attributes are passed to the start +handler as a pointer to a vector of char pointers. Each attribute seen in +a start (or empty) tag occupies 2 consecutive places in this vector: the +attribute name followed by the attribute value. These pairs are terminated +by a null pointer.
+Note that an empty tag generates a call to both start and end handlers +(in that order).
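The name/value layout of the attribute vector described above can be walked two slots at a time. The following is a minimal sketch, not code from expat itself: `find_attr` and `count_attrs` are hypothetical helper names, and the local `XML_Char` typedef is an assumed stand-in for the one in expat.h (plain `char`, as in a default non-Unicode build).

```c
#include <stddef.h>
#include <string.h>

/* Assumption: stand-in for the XML_Char typedef provided by expat.h. */
typedef char XML_Char;

/* Hypothetical helper: count the attributes (pairs) in the vector that a
   start handler receives. A null pointer in a name slot ends the list. */
size_t
count_attrs(const XML_Char **atts)
{
    size_t n = 0;

    if (atts == NULL)
        return 0;
    while (atts[2 * n] != NULL)
        n++;
    return n;
}

/* Hypothetical helper: return the value of the named attribute, or NULL
   when the attribute is not present. */
const XML_Char *
find_attr(const XML_Char **atts, const XML_Char *name)
{
    size_t i;

    if (atts == NULL)
        return NULL;
    /* Slot i holds a name and slot i + 1 holds its value. */
    for (i = 0; atts[i] != NULL; i += 2) {
        if (strcmp(atts[i], name) == 0)
            return atts[i + 1];
    }
    return NULL;
}
```

Inside a start handler these helpers would be called with the `atts` argument as received; the pointers are only meant to be used for the duration of that call.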
-+ ++ +-+typedef void (*XML_EndElementHandler)(void *userData, const XML_Char *name); -+Set handler for end (and empty) tags. As noted above, an empty tag +generates a call to both start and end handlers.
Set handlers for start and end tags. Attributes are passed to the start -handler as a pointer to a vector of char pointers. Each attribute seen in -a start (or empty) tag occupies 2 consecutive places in this vector: the -attribute name followed by the attribute value. These pairs are terminated -by a null pointer. + +
+ +Set handlers for start and end tags with one call.
@@ -695,108 +727,95 @@ by a null pointer. XML_SetCharacterDataHandler(XML_Parser p, XML_CharacterDataHandler charhndl)--++typedef void (*XML_CharacterDataHandler)(void *userData, const XML_Char *s, int len); --Set a text handler. The string your handler receives is NOT zero terminated. You have to use the length argument to deal with the end of the string. A single block of contiguous text free of markup may still result in a sequence of calls to this handler. In other words, if you're searching for a pattern in the text, it may -be split across calls to this handler. +be split across calls to this handler.
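Because the text is not zero terminated and may arrive in several pieces, a pattern search is safer on an accumulated copy than on each chunk. The sketch below shows one way to do that under stated assumptions: `struct text_buf` and `collect_text` are hypothetical names, the buffer is assumed to be passed as the user data pointer, and the `XML_Char` typedef is a local stand-in for expat.h.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: stand-in for the XML_Char typedef provided by expat.h. */
typedef char XML_Char;

/* Hypothetical growable buffer, handed to the parser as userData. */
struct text_buf {
    char  *data;   /* NUL-terminated once non-NULL */
    size_t len;    /* bytes stored, excluding the terminator */
};

/* Has the shape of XML_CharacterDataHandler: appends exactly len bytes,
   never reading past them, and keeps the copy terminated. */
void
collect_text(void *userData, const XML_Char *s, int len)
{
    struct text_buf *buf = userData;
    char *grown;

    if (len <= 0)
        return;
    grown = realloc(buf->data, buf->len + (size_t)len + 1);
    if (grown == NULL)
        return;                         /* out of memory: drop this chunk */
    memcpy(grown + buf->len, s, (size_t)len);
    buf->len += (size_t)len;
    grown[buf->len] = '\0';
    buf->data = grown;
}
```

A real application would probably reset the buffer in its start and end element handlers, so that each search covers one run of text, and free it when parsing ends.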
-Set a handler for processing instructions. The target is the first word in the processing instruction. The data is the rest of the characters in -it after skipping all whitespace after the initial word. +it after skipping all whitespace after the initial word.
-Set a handler for comments. The data is all text inside the comment -delimiters. +delimiters.
- ---+ ++typedef void (*XML_StartCdataSectionHandler)(void *userData); -+Set a handler that gets called at the beginning of a CDATA section.
-+ ++ +-+typedef void (*XML_EndCdataSectionHandler)(void *userData); -+Set a handler that gets called at the end of a CDATA section.
Sets handlers that get called at the beginning and end of a -CDATA section. + +
+ +Sets both CDATA section handlers with one call.
-Sets a handler for any characters in the document which wouldn't otherwise be handled. This includes both data for which no handlers can be set (like some kinds of DTD declarations) and data which could be reported @@ -805,100 +824,82 @@ data that is destined to be reported to the default handler may actually be reported over several calls to the handler. Setting the handler with this call has the side effect of turning off expansion of references to internally defined general entities. Instead these references are -passed to the default handler. +passed to the default handler.
-This sets a default handler, but doesn't affect expansion of internal -entity references. +entity references.
-Set an external entity reference handler. This handler is also called for processing an external DTD subset if parameter entity parsing is in effect. (See -
XML_SetParamEntityParsing
.) +XML_SetParamEntityParsing
.)The base parameter is the base to use for relative system identifiers. It is set by XML_SetBase and may be null. The public id parameter is the public id given in the entity declaration and may be null. The system id is the system identifier specified in the entity -declaration and is never null. +declaration and is never null.
There are a couple of ways in which this handler differs from others. First, this handler returns an integer. A non-zero value should be returned for successful handling of the external entity reference. Returning a zero indicates failure, and causes the calling parser to return -an XML_ERROR_EXTERNAL_ENTITY_HANDLING error. +an XML_ERROR_EXTERNAL_ENTITY_HANDLING error.
Second, instead of having userData as its first argument, it receives the parser that encountered the entity reference. This, along with the context parameter, may be used as arguments to a call to -XML_ExternalEntityParserCreate. Using the -returned parser, the body of the external entity can be recursively -parsed. +XML_ExternalEntityParserCreate. +Using the returned parser, the body of the external entity can be recursively +parsed.
Since this handler may be called recursively, it should not be saving -information into global or static variables. +information into global or static variables.
-Set a handler to deal with encodings other than the built in set. If the handler knows how to deal with an encoding with the given name, it should fill in the info -data structure and return 1. Otherwise it should return 0. +data structure and return 1. Otherwise it should return 0.
typedef struct { @@ -910,64 +911,264 @@ data structure and return 1. Otherwise it should return 0.The map array contains information for every possible leading -byte in a byte sequence. If the corresponding value is >= 0, then it's +byte in a byte sequence. If the corresponding value is >= 0, then it's a single byte sequence and the byte encodes that Unicode value. If the value is -1, then that byte is invalid as the initial byte in a sequence. -If the value is -n, where n is an integer > 1, then n is the number of +If the value is -n, where n is an integer > 1, then n is the number of bytes in the sequence and the actual conversion is accomplished by a call to the function pointed at by convert. This function may return -1 if the sequence itself is invalid. The convert pointer may be null if there are only single byte codes. The data parameter passed to the convert function is the data pointer from XML_Encoding. The string s is NOT -null terminated and points at the sequence of bytes to be converted.
The function pointed at by release is called by the parser when it is -finished with the encoding. It may be null. +finished with the encoding. It may be null.
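For an encoding made up only of single byte codes, filling in the map is all that is needed. The sketch below illustrates this for ISO-8859-15, which matches Latin-1 except for eight positions. It rests on assumptions: the struct is a local stand-in built from the four fields named in the text (map, data, convert, release), the handler's parameter list is assumed because it is not shown in this patch, and `latin9_encoding_handler` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: stand-ins for the declarations provided by expat.h. */
typedef char XML_Char;

typedef struct {
    int   map[256];
    void *data;
    int (*convert)(void *data, const char *s);
    void (*release)(void *data);
} XML_Encoding;

/* Hypothetical handler: returns 1 after filling in info when the name is
   recognized, and 0 otherwise. */
int
latin9_encoding_handler(void *encodingHandlerData, const XML_Char *name,
                        XML_Encoding *info)
{
    int i;

    (void)encodingHandlerData;          /* unused in this sketch */
    if (strcmp(name, "ISO-8859-15") != 0)
        return 0;

    /* Every byte is a complete sequence; start from the Latin-1 values. */
    for (i = 0; i < 256; i++)
        info->map[i] = i;

    /* The eight positions where ISO-8859-15 differs from Latin-1. */
    info->map[0xA4] = 0x20AC;           /* euro sign */
    info->map[0xA6] = 0x0160;
    info->map[0xA8] = 0x0161;
    info->map[0xB4] = 0x017D;
    info->map[0xB8] = 0x017E;
    info->map[0xBC] = 0x0152;
    info->map[0xBD] = 0x0153;
    info->map[0xBE] = 0x0178;

    info->data = NULL;
    info->convert = NULL;               /* only single byte codes */
    info->release = NULL;               /* nothing to free */
    return 1;
}
```

A multi-byte encoding would instead store -n in the map for each lead byte of an n-byte sequence and supply a convert function. The exact string comparison is a simplification; encoding names are usually matched without regard to case.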
- ---+ ++typedef void (*XML_StartNamespaceDeclHandler)(void *userData, const XML_Char *prefix, const XML_Char *uri); -+Set a handler to be called when a namespace is declared. Namespace +declarations occur inside start tags. But the namespace declaration start +handler is called before the start tag handler for each namespace declared +in that start tag.
-+ + ++ ++ ++ ++typedef void (*XML_EndNamespaceDeclHandler)(void *userData, const XML_Char *prefix); -+Set a handler to be called when leaving the scope of a namespace +declaration. This will be called, for each namespace declaration, after +the handler for the end tag of the element in which the namespace was declared. +
++ +-Sets both namespace declaration handlers with a single call
Set handlers for namespace declarations. Namespace declarations occur -inside start tags. But the namespace declaration start handler is called before -the start tag handler for each namespace declared in that start tag. The -corresponding namespace end handler is called after the end tag for the -element the namespace is associated with. + +
+ +++typedef void +(*XML_XmlDeclHandler) (void *userData, + const XML_Char *version, + const XML_Char *encoding, + int standalone); +Sets a handler that is called for XML declarations and also for +text declarations discovered in external entities. The way to distinguish +is that the
version parameter will be NULL for text
+declarations. The encoding parameter may be NULL for
+an XML declaration. The standalone
argument will contain +-1, 0, or 1 indicating respectively that there was no standalone parameter in +the declaration, that it was given as no, or that it was given as yes.- + ++ ++ +++typedef void +(*XML_EndDoctypeDeclHandler)(void *userData); +Set a handler that is called at the end of a DOCTYPE declaration, +after parsing any external subset.
++ ++ +Set both doctype handlers with one call.
++ ++ +++typedef void +(*XML_ElementDeclHandler)(void *userData, + const XML_Char *name, + XML_Content *model); +++enum XML_Content_Type { + XML_CTYPE_EMPTY = 1, + XML_CTYPE_ANY, + XML_CTYPE_MIXED, + XML_CTYPE_NAME, + XML_CTYPE_CHOICE, + XML_CTYPE_SEQ +}; + +enum XML_Content_Quant { + XML_CQUANT_NONE, + XML_CQUANT_OPT, + XML_CQUANT_REP, + XML_CQUANT_PLUS +}; + +typedef struct XML_cp XML_Content; + +struct XML_cp { + enum XML_Content_Type type; + enum XML_Content_Quant quant; + const XML_Char * name; + unsigned int numchildren; + XML_Content * children; +}; +Sets a handler for element declarations in a DTD. The handler gets called +with the name of the element in the declaration and a pointer to a structure +that contains the element model. It is the application's responsibility to +free this data structure by calling +XML_ContentFree.
+
+The model argument is the root of a tree of
+XML_Content nodes. If type equals
+XML_CTYPE_EMPTY or XML_CTYPE_ANY, then
+quant will be XML_CQUANT_NONE, and the other fields
+will be zero or NULL.
+If type is XML_CTYPE_MIXED, then quant
+will be XML_CQUANT_NONE or XML_CQUANT_REP and
+numchildren will contain the number of elements that are allowed
+to be mixed in and children points to an array of
+XML_Content structures that will all have type XML_CTYPE_NAME
+with no quantification.
+Only the root node can be type XML_CTYPE_EMPTY, XML_CTYPE_ANY, or XML_CTYPE_MIXED.
+
+For type XML_CTYPE_NAME, the name field points
+to the name and the numchildren and children fields
+will be zero and NULL. The quant field will indicate any
+quantifiers placed on the name.
+
+Types XML_CTYPE_CHOICE and XML_CTYPE_SEQ
+indicate a choice or sequence respectively. The numchildren
+field indicates how many nodes in the choice or sequence and
+children
points to the nodes.+ ++ +++typedef void +(*XML_AttlistDeclHandler) (void *userData, + const XML_Char *elname, + const XML_Char *attname, + const XML_Char *att_type, + const XML_Char *dflt, + int isrequired); +Set a handler for attlist declarations in the DTD. This handler is called +for each attribute. So a single attlist declaration with multiple +attributes declared will generate multiple calls to this handler. The +
+elname parameter returns the name of the element for which the
+attribute is being declared. The attribute name is in the attname
+parameter. The attribute type is in the att_type parameter.
+It is the string representing the type in the declaration with whitespace
+removed.
+
+The dflt parameter holds the default value. It will
+be NULL in the case of "#IMPLIED" or "#REQUIRED" attributes. You can
+distinguish these two cases by checking the isrequired
+parameter, which will be true in the case of "#REQUIRED" attributes.
+Attributes which are "#FIXED" will also have a true
+isrequired, but they will have the non-NULL fixed value in the
+dflt
parameter. ++ ++ +++typedef void +(*XML_EntityDeclHandler) (void *userData, + const XML_Char *entityName, + int is_parameter_entity, + const XML_Char *value, + int value_length, + const XML_Char *base, + const XML_Char *systemId, + const XML_Char *publicId, + const XML_Char *notationName); +Sets a handler that will be called for all entity declarations. +The
+is_parameter_entity argument will be non-zero in the case
+of parameter entities and zero otherwise.
+For internal entities (<!ENTITY foo "bar">),
+value will be non-NULL and systemId,
+publicId, and notationName will all be NULL.
+The value string is not NULL terminated; the length is provided in
+the value_length parameter. Do not use value_length
+to test for internal entities, since it is legal to have zero-length
+values. Instead check whether or not value is NULL.
+
+The notationName
argument will have a non-NULL value only +for unparsed entity declarations.+ --+++typedef void (*XML_UnparsedEntityDeclHandler)(void *userData, const XML_Char *entityName, @@ -975,74 +1176,61 @@ typedef void const XML_Char *systemId, const XML_Char *publicId, const XML_Char *notationName); --Set a handler that receives declarations of unparsed entities. These -are entity declarations that have a notation (NDATA) field: -
--+are entity declarations that have a notation (NDATA) field: + ++<!ENTITY logo SYSTEM "images/logo.gif" NDATA gif> --So for this example, the entityName would be "logo", the systemId -would be "images/logo.gif" and notationName would be "gif". For this -example the publicId parameter is null. The base parameter would be -whatever has been set with
XML_SetBase
. -If not set, it would be null. +This handler is obsolete and is provided for backwards compatibility. +Use XML_SetEntityDeclHandler instead. +
- +Set a handler that receives notation declarations.
-Set a handler that is called if the document is not "standalone". This happens when there is an external subset or a reference to a parameter entity, but does not have standalone set to "yes" in an XML declaration. If this handler returns 0, then the parser will throw an -XML_ERROR_NOT_STANDALONE error. +XML_ERROR_NOT_STANDALONE error.
Parse position and error reporting functions
These are the functions you'll want to call when the parse functions -return 0, although the position reporting functions are useful outside -of errors. The position reported is the byte position (in the original -document or entity encoding) of the first of the sequence -of characters that generated the current event (or the error that caused -the parse functions to return 0.) +return 0 (i.e. a parse error has occurred), although the position reporting +functions are useful outside of errors. The position reported is the byte +position (in the original document or entity encoding) of the first of the +sequence of characters that generated the current event (or the error that +caused the parse functions to return 0.)
+The position reporting functions are accurate only outside of the DTD. +In other words, they usually return bogus information when called from within +a DTD declaration handler.
+ ++Return the number of bytes in the current event. Returns 0 if the event is +inside a reference to an internal entity. ++ + +++Returns the parser's input buffer, sets the integer pointed at by +
+offset to the offset within this buffer of the current
+parse position, and sets the integer pointed at by size
+to the size of the returned buffer.This should only be called from within a handler during an active +parse and the returned buffer should only be referred to from within +the handler that made the call. This input buffer contains the untranslated +bytes of the input.
+Only a limited amount of context is kept, so if the event triggering +a call spans over a very large amount of input, the actual parse position +may be before the beginning of the buffer.
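Since the returned buffer holds raw input bytes and the parse position may lie outside it, any code that prints surrounding context needs to clamp its indexes. The sketch below shows that clamping only. `context_snippet` is a hypothetical helper, not an expat function; its `buf`, `offset` and `size` arguments are assumed to be the values obtained from the input context call described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy up to radius bytes on each side of the parse
   position into out, NUL-terminate the copy, and return its length.
   The source bytes are not assumed to be terminated. */
size_t
context_snippet(const char *buf, int offset, int size, int radius,
                char *out, size_t outsz)
{
    int start, end;
    size_t n;

    if (out == NULL || outsz == 0)
        return 0;
    out[0] = '\0';
    if (buf == NULL || size <= 0 || radius < 0)
        return 0;

    /* Clamp a position that falls outside the retained context. */
    if (offset < 0)
        offset = 0;
    if (offset > size)
        offset = size;

    start = (radius > offset) ? 0 : offset - radius;
    end = (radius > size - offset) ? size : offset + radius;

    n = (size_t)(end - start);
    if (n > outsz - 1)
        n = outsz - 1;                  /* leave room for the terminator */
    memcpy(out, buf + start, n);
    out[n] = '\0';
    return n;
}
```

Copying the bytes out matters because the buffer should only be referred to from within the handler that asked for it.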
+Miscellaneous functions
The functions in this section either obtain state information from the parser or can be used to dynamically set parser options. @@ -1153,9 +1370,20 @@ When attributes are reported to the start handler in the atts vector, attributes that were explicitly set in the element occur before any attributes that receive their value from default information in an ATTLIST declaration. This function returns the number of attributes that -were explicitly set, thus giving the offset of the first attribute set +were explicitly set times two, thus giving the offset in the
atts
+array passed to the start tag handler of the first attribute set due to defaults. It supplies information for the last call to a start -handler. If you're in a start handler, then that means the current call. +handler. If called inside a start handler, then that means the current call. ++Returns the index of the ID attribute passed in the atts array +in the last call to XML_StartElementHandler, or -1 if there is no ID +attribute. If called inside a start handler, then that means the current call.+ ++Free the model data structure passed to an element declaration handler. ++
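Before the model passed to an element declaration handler is freed, the handler would typically walk the XML_Content tree. The sketch below renders such a tree back into DTD-like notation, following the type and quant rules given for the element declaration handler. The enum and struct declarations are copied from this patch; `render_model` and `append_str` are hypothetical names, and the `XML_Char` typedef is an assumed stand-in for expat.h.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: stand-in for the XML_Char typedef provided by expat.h. */
typedef char XML_Char;

enum XML_Content_Type {
    XML_CTYPE_EMPTY = 1, XML_CTYPE_ANY, XML_CTYPE_MIXED,
    XML_CTYPE_NAME, XML_CTYPE_CHOICE, XML_CTYPE_SEQ
};

enum XML_Content_Quant {
    XML_CQUANT_NONE, XML_CQUANT_OPT, XML_CQUANT_REP, XML_CQUANT_PLUS
};

typedef struct XML_cp XML_Content;

struct XML_cp {
    enum XML_Content_Type  type;
    enum XML_Content_Quant quant;
    const XML_Char        *name;
    unsigned int           numchildren;
    XML_Content           *children;
};

/* Append src to the NUL-terminated string in dst, truncating if needed. */
void
append_str(char *dst, size_t dstsz, const char *src)
{
    size_t used = strlen(dst);

    if (used + 1 < dstsz)
        strncat(dst, src, dstsz - used - 1);
}

/* Hypothetical helper: append a textual form of the model to out, which
   must already hold a NUL-terminated string (possibly empty). */
void
render_model(const XML_Content *m, char *out, size_t outsz)
{
    /* Indexed by XML_Content_Quant: none, optional, repeat, plus. */
    static const char *const quant[] = { "", "?", "*", "+" };
    unsigned int i;

    switch (m->type) {
    case XML_CTYPE_EMPTY:
        append_str(out, outsz, "EMPTY");
        return;
    case XML_CTYPE_ANY:
        append_str(out, outsz, "ANY");
        return;
    case XML_CTYPE_NAME:
        append_str(out, outsz, m->name);
        break;
    case XML_CTYPE_MIXED:
        /* Children are all unquantified names. */
        append_str(out, outsz, "(#PCDATA");
        for (i = 0; i < m->numchildren; i++) {
            append_str(out, outsz, "|");
            append_str(out, outsz, m->children[i].name);
        }
        append_str(out, outsz, ")");
        break;
    case XML_CTYPE_CHOICE:
    case XML_CTYPE_SEQ:
        append_str(out, outsz, "(");
        for (i = 0; i < m->numchildren; i++) {
            if (i > 0)
                append_str(out, outsz,
                           m->type == XML_CTYPE_SEQ ? "," : "|");
            render_model(&m->children[i], out, outsz);
        }
        append_str(out, outsz, ")");
        break;
    }
    append_str(out, outsz, quant[m->quant]);
}
```

In an actual element declaration handler the model would be released with XML_ContentFree once rendering is done, since freeing it is the application's responsibility.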