Klacks parser

The Klacks parser provides an alternative parsing interface, similar in concept to Java's Streaming API for XML (StAX).

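It implements a streaming, "pull-based" API: the application asks the parser for each event when it is ready to process it. This differs from SAX, which is a "push-based" model in which the parser calls back into the application.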

Klacks is implemented using the same code base as the SAX parser and has the same parsing characteristics (validation, namespace support, entity resolution) while offering a more flexible interface than SAX.

See below for examples.

Parsing incrementally using sources

To parse using Klacks, create an XML source first.

Function CXML:MAKE-SOURCE (input &key validate dtd root entity-resolver disallow-external-subset pathname)
Create and return a source for input.

Exact behaviour depends on input, which can be one of the following types:

Closing streams: Sources can refer to Lisp streams that need to be closed after parsing. This includes a stream passed explicitly as input, a stream created implicitly for the pathname case, as well as any streams created automatically for external parsed entities referred to by the document.

All these streams are closed automatically if end of file is reached normally. Use klacks:close-source or klacks:with-open-source to ensure that the streams are closed in all other cases.

Buffering: By default, the Klacks parser performs buffering of octets being read from the stream as an optimization. This can result in unwanted blocking if the stream is a socket and the parser tries to read more data than required to parse the current event. Use :buffering nil to disable this optimization.
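As a minimal sketch of the buffering option, the helper below creates a source with read-ahead buffering turned off. The function name make-unbuffered-source and its octet-stream argument are hypothetical; the argument stands for any binary input stream, such as one obtained from a socket library.

```lisp
;; Assumes cxml is already loaded (for example via Quicklisp).
;; MAKE-UNBUFFERED-SOURCE and OCTET-STREAM are illustrative names only.
(defun make-unbuffered-source (octet-stream)
  "Create a Klacks source on OCTET-STREAM with read-ahead buffering disabled."
  (cxml:make-source octet-stream :buffering nil))
```

Disabling buffering is likely to make parsing slower, so it is probably only worth doing for interactive streams.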

The following keyword arguments have the same meaning as with the SAX parser. Please refer to the documentation of parse-file for more information:

In addition, the following argument is for types of input other than pathname:

Events are read from the stream using the following functions:

Function KLACKS:PEEK (source)

=> :start-document
or => :start-document, version, encoding, standalonep
or => :dtd, name, public-id, system-id
or => :start-element, uri, lname, qname
or => :end-element, uri, lname, qname
or => :characters, data
or => :processing-instruction, target, data
or => :comment, data
or => :end-document, data
or => nil

peek returns the current event's key and main values.

Function KLACKS:PEEK-NEXT (source) => key, value*

Advance the source forward to the next event and return it like peek would.

Function KLACKS:PEEK-VALUE (source) => value*

Like peek, but return only the values, not the key.

Function KLACKS:CONSUME (source) => key, value*

Return the same values peek would, and in addition advance the source forward to the next event.
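The functions above are enough to write a complete event loop. The following sketch collects the key of every event by calling consume until it returns NIL. The function name event-keys is hypothetical.

```lisp
;; Assumes cxml is already loaded (for example via Quicklisp).
;; EVENT-KEYS is an illustrative helper, not part of the library.
(defun event-keys (xml-string)
  "Return the list of event keys produced while parsing XML-STRING."
  (let ((source (cxml:make-source xml-string)))
    ;; CONSUME returns the current event and then advances the source.
    ;; Only the first return value (the key) is kept here.
    (loop for key = (klacks:consume source)
          while key
          collect key)))
```

For a small document without a DTD, such as "<example>text</example>", the result should be the same sequence of keys that repeated calls to peek-next produce.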

Function KLACKS:CURRENT-URI (source) => uri
Function KLACKS:CURRENT-LNAME (source) => string
Function KLACKS:CURRENT-QNAME (source) => string

If the current event is :start-element or :end-element, return the corresponding value. Else, signal an error.

Function KLACKS:CURRENT-CHARACTERS (source) => string

If the current event is :characters, return the character data value. Else, signal an error.

Function KLACKS:CURRENT-CDATA-SECTION-P (source) => boolean

If the current event is :characters, determine whether the data was specified using a CDATA section in the source document. Else, signal an error.
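Because the accessors signal an error for other event types, callers usually check the key returned by peek first. The sketch below does this to concatenate all character data in a document. The function name collect-text is hypothetical, and the code assumes a build of the library in which character data is returned as an ordinary Lisp string.

```lisp
;; Assumes cxml is already loaded (for example via Quicklisp).
;; COLLECT-TEXT is an illustrative helper, not part of the library.
(defun collect-text (xml-string)
  "Concatenate the data of all :characters events in XML-STRING."
  (let ((source (cxml:make-source xml-string)))
    (with-output-to-string (out)
      (loop for key = (klacks:peek source)
            while key
            do (when (eq key :characters)
                 ;; Safe here: the current event is known to be :characters.
                 (write-string (klacks:current-characters source) out))
               (klacks:consume source)))))
```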

Function KLACKS:MAP-ATTRIBUTES (fn source)

Call fn for each attribute of the current start tag in turn, and pass the following values as arguments to the function:

Only valid for :start-element.

Function KLACKS:LIST-ATTRIBUTES (source)

Return a list of SAX attribute structures for the current start tag. Only valid for :start-element.
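As a rough illustration of attribute access, the sketch below positions the source on an element and turns its attributes into an association list. It uses klacks:list-attributes together with the accessors sax:attribute-qname and sax:attribute-value from the SAX package. The function name element-attributes is hypothetical.

```lisp
;; Assumes cxml is already loaded (for example via Quicklisp).
;; ELEMENT-ATTRIBUTES is an illustrative helper, not part of the library.
(defun element-attributes (xml-string lname)
  "Return an alist of (qname . value) for the first element named LNAME.
Return NIL if there is no such element or it has no attributes."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    ;; FIND-ELEMENT leaves the source positioned on the :start-element event,
    ;; which is the only event for which attributes may be queried.
    (when (klacks:find-element source lname)
      (loop for attribute in (klacks:list-attributes source)
            collect (cons (sax:attribute-qname attribute)
                          (sax:attribute-value attribute))))))
```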

Function KLACKS:CLOSE-SOURCE (source)
Close all streams referred to by source.

Macro KLACKS:WITH-OPEN-SOURCE ((var source) &body body)
Evaluate source to create a source object, bind it to symbol var and evaluate body as an implicit progn. Call klacks:close-source to close the source after exiting body, whether normally or abnormally.
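The macro matters most when parsing stops before end of file, because the underlying streams are then not closed automatically. A minimal sketch, assuming the input is a file on disk; the function name root-element-name is hypothetical.

```lisp
;; Assumes cxml is already loaded (for example via Quicklisp).
;; ROOT-ELEMENT-NAME is an illustrative helper, not part of the library.
(defun root-element-name (file)
  "Return the qualified name of the document element of FILE, or NIL."
  ;; Convert to a pathname explicitly, so that a namestring is not
  ;; mistaken for XML text.
  (klacks:with-open-source (source (cxml:make-source (pathname file)))
    ;; Parsing stops at the first start tag, long before end of file.
    ;; WITH-OPEN-SOURCE closes the file stream on the way out.
    (when (klacks:find-event source :start-element)
      (klacks:current-qname source))))
```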

Convenience functions

Function KLACKS:FIND-EVENT (source key)
Read events from source and discard them until an event of type key is found. Return values like peek, or NIL if no such event was found.

Function KLACKS:FIND-ELEMENT (source &optional lname uri)
Read events from source and discard them until an event of type :start-element with matching local name and namespace uri is found. If lname is nil, any tag name matches. If uri is nil, any namespace matches. Return values like peek, or NIL if no such event was found.
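Note that find-element stops at the matching :start-element without consuming it, so a loop has to advance past the match before searching again. The sketch below counts all elements with a given local name; the function name count-elements is hypothetical.

```lisp
;; Assumes cxml is already loaded (for example via Quicklisp).
;; COUNT-ELEMENTS is an illustrative helper, not part of the library.
(defun count-elements (xml-string lname)
  "Count the elements with local name LNAME in XML-STRING."
  (let ((source (cxml:make-source xml-string)))
    (loop while (klacks:find-element source lname)
          count t
          ;; Step past the start tag just found. Without this, the next
          ;; call to FIND-ELEMENT would return the same event again.
          do (klacks:consume source))))
```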

Bridging Klacks and SAX

Function KLACKS:SERIALIZE-EVENT (source handler)
Send the current klacks event from source as SAX events to the SAX handler and consume it.

Function KLACKS:SERIALIZE-ELEMENT (source handler &key document-events)
Read all klacks events from the following :start-element to its :end-element and send them as SAX events to handler. When this function is called, the current event must be :start-element, else an error is signalled. With document-events (the default), sax:start-document and sax:end-document events are sent around the element.

Function KLACKS:SERIALIZE-SOURCE (source handler)
Read all klacks events from source and send them as SAX events to the SAX handler.
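Any SAX handler can be used as the target. As a sketch, the function below feeds a whole document into the string sink created by cxml:make-string-sink, which re-serializes the document. The function name reserialize is hypothetical, and the code assumes that serialize-source returns the value the handler produces at the end of the document.

```lisp
;; Assumes cxml is already loaded (for example via Quicklisp).
;; RESERIALIZE is an illustrative helper, not part of the library.
(defun reserialize (xml-string)
  "Parse XML-STRING with Klacks and write it back out through a SAX sink."
  (klacks:serialize-source (cxml:make-source xml-string)
                           (cxml:make-string-sink)))
```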

Examples

The following example illustrates creation of a klacks source, use of the peek-next function to read individual events, and shows some of the most common event types.

* (defparameter *source* (cxml:make-source "<example>text</example>"))
*SOURCE*

* (klacks:peek-next *source*)
:START-DOCUMENT

* (klacks:peek-next *source*)
:START-ELEMENT
NIL                      ;namespace URI
"example"                ;local name
"example"                ;qualified name

* (klacks:peek-next *source*)
:CHARACTERS
"text"

* (klacks:peek-next *source*)
:END-ELEMENT
NIL
"example"
"example"

* (klacks:peek-next *source*)
:END-DOCUMENT

* (klacks:peek-next *source*)
NIL

In this example, find-element is used to skip over the uninteresting events until the opening child1 tag is found. Then serialize-element is used to generate SAX events for the following element, including its children, and an xmls-compatible list structure is built from those events. The second call to find-element skips over whitespace, and find-event is used to parse up to :end-document, ensuring that the source has been closed.

* (defparameter *source*
      (cxml:make-source "<example>
                           <child1><p>foo</p></child1>
                           <child2 bar='baz'/>
                         </example>"))
*SOURCE*

* (klacks:find-element *source* "child1")
:START-ELEMENT
NIL
"child1"
"child1"

* (klacks:serialize-element *source* (cxml-xmls:make-xmls-builder))
("child1" NIL ("p" NIL "foo"))

* (klacks:find-element *source*)
:START-ELEMENT
NIL
"child2"
"child2"

* (klacks:serialize-element *source* (cxml-xmls:make-xmls-builder))
("child2" (("bar" "baz")))

* (klacks:find-event *source* :end-document)
:END-DOCUMENT
NIL
NIL
NIL