- David Lichteblau for David Lichteblau for knowledgeTools
(conversion into an independent package; DOM bug fixing; validation)
and headcraft
- (most september 2004 changes) and privately (changes since then).
+ (most September/October 2004 changes) and privately (changes
+ since then).
+
+ CXML currently implements a namespace-aware, validating SAX-like
+ XML 1.0
+ parser as well as the DOM Level 1 Core
+ interfaces.
+
- CXML should be portable to all Common Lisp implementations
- supporting gray streams. Currently assumed to work are:
-
-
-
- ACL (with support for rune-is-character in the
- unicode-enabled images)
-
-
- SBCL. The rune-is-character mode needs SBCL's Unicode
- branch ("character_branch"). Note that cxml still uses
- surrogate characters instead of utilizing full 21bit characters.
- This will probably addressed in a future release.)
-
-
- CMUCL (no support for rune-is-character)
-
-
- CLISP (reported to work with and without rune-is-character).
- CLISP needs to be run with an option like -E iso-8869-1
- teaching it to accept cxml's non-ASCII source files.
-
-
- LispWorks
-
-
-
- Incomplete port:
-
-
-
- OpenMCL basically works (in rune mode), but fails some tests.
- This needs to be investigated.
-
-
-
-
- Optional configuration (skip this unless you know better): CXML
- has full Unicode code support -- even on Lisps without Unicode
- strings. On non-unicode aware Lisps, DOMString is
- implemented as an array of character codes. CXML will auto-detect
- at compile-time which string representation to use. To override
- the auto-detection, you can set one of the features
- :rune-is-character and :rune-is-octet before
- loading cxml.asd. (fixme: feature
- :rune-is-octet is of course misnamed, since it uses 16bit
- runes, not 8bit runes. It will probably be renamed
- to :rune-is-integer at some point.)
-
-
-
- ASDF is used for
- compilation. The following instructions assume that ASDF has
- already been loaded.
-
$ export CVSROOT=:pserver:anonymous@dev.w3.org:/sources/public
-$ cvs login # password is "anonymous"
-$ cvs co 2001/XML-Test-Suite/xmlconf
-$ cvs co -D '2005-05-06 23:00' 2001/DOM-Test-Suite
-$ cd 2001/DOM-Test-Suite && ant dom1-dtd
-
- Omit -D to get the latest version, which may not work
- with cxml yet. The ant step is necessary to run the DOM
- tests.
-
-
Usage and expected output:
-
* (xmlconf:run-all-tests "/path/to/2001/XML-Test-Suite/xmlconf/")
-0/556 tests failed; 1606 tests were skipped
-* (domtest:run-all-tests "/path/to/2001/DOM-Test-Suite/")
-0/450 tests failed; 71 tests were skipped
-
-
- fixme: Add an explanation of xml/sax-tests here.
-
-
-
- fixme My parser does not understand the current testsuite
- anymore. To fix this problem, revert the affected files
- manually after check-out:
-
-
-
$ cd 2001/XML-Test-Suite/xmlconf/
-xmltest$ patch -p0 -R </path/to/cxml/test/xmlconf-base.diff
-
-
- The log message for the changes reads "Removed unnecessary
- xml:base attribute". If I understand correctly, only
- DOM 3 parsers provide the baseURI attribute necessary for
- understanding xmlconf.xml now. We don't have that
- yet.
-
Function CXML:PARSE-FILE (pathname handler &key ...)
-
Function CXML:PARSE-STREAM (stream handler &key ...)
-
Function CXML:PARSE-OCTETS (octets handler &key ...)
- Parse an XML document.
- Return values from this function depend on the SAX handler used.
- Arguments:
-
-
-
pathname -- a Common Lisp pathname
-
stream -- a Common Lisp stream with element-type
- (unsigned-byte 8)
-
octets -- an (unsigned-byte 8) array
-
handler -- a SAX handler
-
-
- Common keyword arguments:
-
-
-
- validate -- A boolean. Defaults to
- nil. If true, parse in validating mode, i.e. assert that
- the document contains a DOCTYPE declaration and conforms to the
- DTD declared.
-
-
- dtd -- unless nil, an extid instance
- specifying the external subset to load. This options overrides
- the extid specified in the document type declaration, if any.
- See below for make-extid. This option is useful
- for verification purposes together with the root
- and disallow-internal-subset arguments.
-
-
root -- the expected root element
- name, or nil (the default).
-
-
- entity-resolver -- nil or a function of two
- arguments which is invoked for every entity referenced by the
- document with the entity's Public ID (a rod) and System ID (an
- URI object) as arguments. The function may either return
- nil, CXML will then try to resolve the entity as usual.
- Alternatively it may return a Common Lisp stream specialized on
- (unsigned-byte 8) which will be used instead. (It may
- also signal an error, of course, which can be useful to prohibit
- parsed XML documents from including arbitrary files readable by
- the parser.)
-
-
- disallow-internal-subset -- a boolean. If true, signal
- an error if the document contains an internal subset.
-
-
-
-
-
Function CXML:PARSE-DTD-FILE (pathname)
-
Function CXML:PARSE-DTD-STREAM (stream)
- Parse declarations
- from a stand-alone file and return an object representing the DTD,
- suitable as an argument to validate.
-
-
-
pathname -- a Common Lisp pathname
-
stream -- a Common Lisp stream with element-type
- (unsigned-byte 8)
-
-
-
-
Function CXML:MAKE-EXTID (publicid systemid)
- Create an object representing the External ID composed
- of the specified Public ID, a rod or nil, and System ID
- (an URI object).
-
-
-
-
Function DOM:MAKE-DOM-BUILDER ()
- Create a SAX handler which builds a DOM document. Example:
-
-
- NIL: Use a more readable non-canonical representation.
-
-
-
- With an indentation level, pretty-print the XML by
- inserting additional whitespace. Note that indentation
- changes the document model and should only be used if whitespace
- does not matter to the application.
-
-
- unparse-document-to-octets returns an (unsigned-byte
- 8) array, whereas unparse-document writes
- characters. unparse-document is useful together
- with with-output-to-string. However, note that the
- resulting document in both cases is UTF-8 encoded, so the
- characters written by unparse-document are really UTF-8
- bytes encoded as characters.
-
-
-
-
Function CXML:MAKE-CHARACTER-STREAM-SINK (stream &rest keys) => sink
-
Function CXML:MAKE-OCTET-VECTOR-SINK (&rest keys) => sink
- Return a handle suitable for event-based XML serialization.
-
-
- These function provide the low-level mechanism used by the DOM
- serialization functions. To serialize a document without building
- its DOM tree first, create a sink handle and call SAX functions on that
- handle. sax:end-document returns the serialized form of
- the document described by the SAX events.
-
- Macro with-xhtml is a modified version of
- Franz' htmlgen works as a SAX driver for XHTML.
- It aims to be a plug-in replacement for the html macro.
-
-
- xhtmlgen is included as contrib/xhtmlgen.lisp in
- the cxml distribution. Example:
-
- Create a SAX handler which validates against a DTD instance.
- The document's root element must be named root.
- Used with dom:map-document, this validates a document
- object as if by re-reading it with a validating parser, except
- that declarations recorded in the document instance are completely
- ignored.
- Example:
-
-
Function DOM:MAP-DOCUMENT (handler document &key include-xmlns-attributes include-default-values)
- Traverse a DOM document and call SAX functions as if an XML
- representation of the document were processed by a SAX parser.
-
-
-
-
Class CXML:SAX-PROXY ()
-
Accessor CXML:PROXY-CHAINED-HANDLER
- sax-proxy is a SAX handler which passes all events it
- receives on to a user-defined second handler, which defaults
- to nil. Use sax-proxy to modify the events a
- SAX handler receives by defining your own subclass
- of sax-proxy. Setting the chained handler to the target
- handler, and define methods on your handler class for the events
- to be modified. All other events will pass through to the chained
- handler unmodified.
-
-
-
-
XMLS Compatibility
-
- Like other XML parsers written in Lisp, CXML can work with
- documents represented as list structures. The specific model
- implemented by cxml is compatible with the xmls parser. Xmls
- list structures are a simpler and faster alternative to full DOM
- document trees. They also serve as an example showing how to
- implement user-defined document models as an independent layer
- over the the base parser (c.f. xml/xmls-compat.lisp in
- the cxml distribution). However, note that the list structures do
- not include all information available in DOM documents and are
- sometimes more difficult to work wth since many DOM functions
- cannot be implemented on them.
-
-
-
Function CXML-XMLS:MAKE-XMLS-BUILDER (&key include-default-values)
- Create a SAX handler which builds XMLS list structures.
- If include-default-values is true, default values for
- attributes declared in a DTD are included as attributes in the
- xmls output. include-default-values is true by default
- and can be set to nil to suppress inclusion of default
- values.
-
-
Function CXML-XMLS:MAKE-NODE (&key name ns attrs
- children) => xmls node
- Build a list node of the form
- (name ((namevalue)*) child*).
-
-
- The node list's car can also be a cons of local name
- and namespace prefix ns.
- fixme: It is unclear to me how namespaces are meant to
- work in xmls, since xmls documentation differs from how xmls
- actually works in current releases. Usually applications need to
- know both the namespace prefix and the namespace URI. We
- currently follow the xmls implementation and use the
- namespace prefix instead of following its documentation which
- shows the URI. We do not follow xmls in munging xmlns attribute
- values. Attributes themselves have namespaces and it is not clear
- to me how that works in xmls.
-
-
-
Accessor CXML-XMLS:NODE-NAME (node)
-
Accessor CXML-XMLS:NODE-NS (node)
-
Accessor CXML-XMLS:NODE-ATTRS (node)
-
Accessor CXML-XMLS:NODE-CHILDREN (node)
- Accessors for xmls node data.
-
-
-
-
-
-
Dealing with Rods
-
- As explained above, the XML parser handles character encoding and
- uses 16bit strings internally. Instead of using characters and strings
- it uses runes and rods. This is seen as a
- feature, but can be inconvenient.
-
-
-
- If your Lisp supports 16 bit unicode strings, use feature
- :rune-is-character and forget about runes and rods.
- CXML will use ordinary Lisp characters and strings both
- internally and externally.
-
-
- If your Lisp does not support such strings and your application
- needs Unicode support, use functions defined in the
- runes package instead of ordinary string operators.
-
-
- If your Lisp does not support such strings and your application
- does not need Unicode support anyway, it will probably be more
- convenient to let CXML convert rods into strings automatically.
- To do that, use cxml:make-recoder to chain a special
- sax handler between the parser and your application handler.
- The recoder translates all rods using an application defined
- function, which defaults to runes:rod-string. Although
- the actual XML parser still uses rods internally, you SAX
- handler will only see ordinary Lisp strings.
-
-
-
- Note that the recoder approach does not work with the DOM
- builder, since DOM is specified to use UTF-16.
-
-
-
Function CXML:MAKE-RECODER (chained-handler &optional recoder-fn)
- Return a SAX handler which passes all events on to
- chained-handler after converting all strings and rods
- using recoder-fn, a function of one argument which
- defaults to runes:rod-string.
-
-
- Example. In a Lisp which ordinarily would use octet vector rods:
-
- To avoid spending time parsing the same DTD over and over again,
- CXML can cache DTD objects. The parser consults
- cxml:*dtd-cache* whenever it is looking for an external
- subset in a document which does not have an internal subset and
- uses the cached DTD instance if one is present in the cache for
- the System ID in question.
-
-
- Note that DTDs do not expire from the cache automatically.
- (Future versions of CXML might introduce automatic checks for
- outdated DTDs.)
-
-
-
Variable CXML:*DTD-CACHE*
- The DTD cache object consulted by the parser when it needs a DTD.
-
-
-
Function CXML:MAKE-DTD-CACHE ()
- Return a new, empty DTD cache object.
-
-
-
Variable CXML:*CACHE-ALL-DTDS*
- If true, instructs the parser to enter all DTDs that could have
- been cached into *dtd-cache* if they were not cached
- already. Defaults to nil.
-
-
-
Reader CXML:GETDTD (uri dtd-cache)
- Return a cached instance of the DTD at uri, if present in
- the cache, or nil.
-
-
-
Writer CXML:GETDTD (uri dtd-cache)
- Enter a new value for uri into dtd-cache.
-
-
-
Function CXML:REMDTD (uri dtd-cache)
- Ensure that no DTD is recorded for uri in the cache and
- return true if such a DTD was present.
-
-
-
Function CXML:CLEAR-DTD-CACHE (dtd-cache)
- Remove all entries from dtd-cache.
-
-
- fixme: thread-safety
-
-
-
-
XML Catalogs
-
- External entities (for example, DTDs) are referred to using their
- Public and System IDs. Usually the System ID, a URI, is used to
- locate the entity. CXML itself handles only file://-URIs, but
- many System IDs in practical use are http://-URIs. There are two
- different mechanims applications can use to allow CXML to locate
- entities using arbitrary Public ID or System ID:
-
-
-
- User-defined entity resolvers can be used to open entities using
- arbitrary protocols. For example, an entity resolver could
- handle all System-IDs with the http scheme using some
- HTTP library. Refer to the description of the
- entity-resolver keyword argument to parser functions (see cxml:parse-file) to more
- information on entity resolvers.
-
-
- XML Catalogs are (local) tables in XML syntax which map External
- IDs to alternative System IDs. If, say, the xhtml DTD is
- present in the local file system and the local copy has been
- registered with the XML catalog, CXML will use the local copy of
- the DTD instead of trying to open the version available using HTTP.
-
-
-
- This section describes XML Catalogs, the second solution. CXML
- implements Oasis
- XML Catalogs.
-
-
-
Variable CXML:*CATALOG*
- The XML Catalog object consulted by the parser before trying to
- open an entity. Initially nil.
-
-
-
Variable CXML:*PREFER*
- The default "prefer" mode from the Catalog specification, one
- of :public or :system. Defaults
- to :public.
-
-
-
Function CXML:MAKE-CATALOG (&optional uris)
- Return a catalog object for the catalog files specified.
-
-
-
Function CXML:RESOLVE-URI (uri catalog)
- Look up uri in catalog and return the
- resulting URI, or nil if no match was found.
-
-
-
Function CXML:RESOLVE-EXTID (publicid systemid catalog)
- Look up the External ID (publicid, systemid)
- in catalog and return the resulting URI, or nil
- if no match was found.
-
-
- Example:
-
-
* (setf cxml:*catalog* nil)
-* (cxml:parse-file "test.xhtml" nil)
-=> Error: URI scheme :HTTP not supported
-
-* (setf cxml:*catalog* (cxml:make-catalog))
-* (cxml:parse-file "test.xhtml" nil)
-;; no error!
-NIL
-
- Note that parsed catalog files are cached in the catalog object.
- Catalog files cached do not expire automatically. To ensure that
- all catalog files are parsed again, create a new catalog object.
-
-
-
-
SAX Interface
-
- A SAX handler is an arbitrary objects that implements some of the
- generic functions in the SAX package. Note that no default
- handler class is necessary, because all generic functions have default
- methods which do nothing. SAX functions are:
-
Function SAX:START-DOCUMENT (handler)
-
Function SAX:END-DOCUMENT (handler)
-
-
Function SAX:START-ELEMENT (handler namespace-uri local-name qname attributes)
-
Function SAX:END-ELEMENT (handler namespace-uri local-name qname)
-
Function SAX:START-PREFIX-MAPPING (handler prefix uri)
-
Function SAX:END-PREFIX-MAPPING (handler prefix)
-
Function SAX:PROCESSING-INSTRUCTION (handler target data)
-
Function SAX:COMMENT (handler data)
-
Function SAX:START-CDATA (handler)
-
Function SAX:END-CDATA (handler)
-
Function SAX:CHARACTERS (handler data)
-
-
Function SAX:START-DTD (handler name public-id system-id)
-
Function SAX:END-DTD (handler)
-
Function SAX:UNPARSED-ENTITY-DECLARATION (handler name public-id system-id notation-name)
-
Function SAX:EXTERNAL-ENTITY-DECLARATION (handler kind name public-id system-id)
-
Function SAX:INTERNAL-ENTITY-DECLARATION (handler kind name value)
-
Function SAX:NOTATION-DECLARATION (handler name public-id system-id)
-
Function SAX:ELEMENT-DECLARATION (handler name model)
-
Function SAX:ATTRIBUTE-DECLARATION (handler ename aname type default)
-
-
Accessor SAX:ATTRIBUTE-PREFIX (attribute)
-
Accessor SAX:ATTRIBUTE-NAMESPACE-URI (attribute)
-
Accessor SAX:ATTRIBUTE-LOCAL-NAME (attribute)
-
Accessor SAX:ATTRIBUTE-VALUE (attribute)
-
Accessor SAX:ATTRIBUTE-QNAME (attribute)
-
Accessor SAX:ATTRIBUTE-SPECIFIED-P (attribute)
-
-
- The entity declaration methods are similar to Java SAX
- definitions, but parameter entities are distinguished from
- general entities not by a % prefix to the name, but by
- the kind argument, either :parameter or
- :general.
-
-
- The arguments to sax:element-declaration and
- sax:attribute-declaration differ significantly from their
- Java counterparts.
-
-
- fixme: For more information on these functions refer to the docstrings.
-
-
-
-
-
DOM Notes
-
- CXML implements the DOM Level 1 Core interfaces. Explaining
- DOM is better left to the specification,
- so please refer to the official W3C documents for DOM.
-
-
- However, there is no "standard" DOM mapping for Lisp. DOM
- is specified
- in CORBA IDL, but it refrains from using object-oriented IDL
- features, allowing for a much more natural Lisp implemenation than
- the the ordinary IDL/Lisp mapping would.
-
-
- Differences between CXML's DOM and the direct IDL/Lisp mapping:
-
-
-
- DOM function names are symbols in the DOM package (not
- the OP package).
-
-
- DOM functions have proper required arguments, not a huge
- &rest lambda list.
-
-
- Although most IDL interfaces are implemented as CLOS classes by
- CXML, the Lisp types of DOM objects is not documented and cannot
- be relied upon. A node's type can be determined using
- dom:node-type instead.
-
-
- DOMString is mapped to rod, which is either
- an (unsigned-byte 16) array type or a string type.
-
-
- The IDL/Lisp mapping maps CORBA enums to Lisp keywords.
- Unfortunately, the DOM IDL does not use enums. Instead,
- both exception types and node types are defined integer
- constants. CXML chooses to ignore this definition and uses
- keywords instead.
-
-
- DOM uses StudlyCaps. Lisp programmers don't. We
- insert #\- before every upper case letter preceded by a
- lower case letter and before every upper case letter which is
- followed by a lower case letter, but preceded by a capital
- letter. This algorithms leads to the natural Lisp spelling
- of DOM function names.
-
-
- Implementation note: DOM's NodeList does not
- necessarily map to a native "sequence" type. (For example,
- node lists are objects in Java, not arrays.)
- NodeList is specified to reflect changes done after a
- node list was created, so node lists cannot be Lisp lists.
- (A node list could be implemented as a CLOS object pointing to
- said list though.) Instead, CXML currently implements node
- lists as adjustable vectors. Note that code which relies on
- this implementation and uses Lisp sequence functions
- instead of sticking to dom:item and dom:length
- is not portable. As a compromise, you can use our
- extensions dom:map-node-list or
- dom:do-node-list, which can be implemented portably.
-
$ export CVSROOT=:pserver:anonymous@common-lisp.net:/project/cxml/cvsroot
+$ cvs login
+Logging in to :pserver:anonymous@common-lisp.net:2401/project/cxml/cvsroot
+CVS password: anonymous
+$ cvs co cxml
+
+
+
+
+
Implementation-specific notes
+
+ CXML should be portable to all Common Lisp implementations
+ supporting gray streams. Currently supported are ACL, CLISP,
+ CMUCL, LispWorks, OpenMCL, and SBCL.
+
+
+
+ Note that CMUCL and OpenMCL do not support Unicode
+ natively. (You might want to use the recoding SAX handler to work with
+ native strings anyway.)
+
+
+ SBCL and CLISP will trip over cxml's non-ASCII source files
+ unless compiled using a suitable locale configuration
+ (LC_CTYPE=en_US.ISO-8859-1 should help).
+
+
+ The SBCL port uses 16 bit surrogate characters instead of taking
+ advantage of SBCL's full 21 bit character support.
+
+
+
+
+
+
+
Compilation
+
+ ASDF is used for
+ compilation. The following instructions assume that ASDF has
+ already been loaded.
+
$ export CVSROOT=:pserver:anonymous@dev.w3.org:/sources/public
+$ cvs login # password is "anonymous"
+$ cvs co 2001/XML-Test-Suite/xmlconf
+$ cvs co -D '2005-05-06 23:00' 2001/DOM-Test-Suite
+$ cd 2001/DOM-Test-Suite && ant dom1-dtd
+
+ Omit -D to get the latest version, which may not work
+ with cxml yet. The ant step is necessary to run the DOM
+ tests.
+
+
Usage and expected output:
+
* (xmlconf:run-all-tests "/path/to/2001/XML-Test-Suite/xmlconf/")
+0/556 tests failed; 1606 tests were skipped
+* (domtest:run-all-tests "/path/to/2001/DOM-Test-Suite/")
+0/449 tests failed; 71 tests were skipped
+
+
+ fixme: Add an explanation of xml/sax-tests here.
+
+
+
+ fixme My parser does not understand the current testsuite
+ anymore. To fix this problem, revert the affected files
+ manually after check-out:
+
+
+
$ cd 2001/XML-Test-Suite/xmlconf/
+xmltest$ patch -p0 -R </path/to/cxml/test/xmlconf-base.diff
+
+
+ The log message for the changes reads "Removed unnecessary
+ xml:base attribute". If I understand correctly, only
+ DOM 3 parsers provide the baseURI attribute necessary for
+ understanding xmlconf.xml now. We don't have that
+ yet.
+
Function CXML:PARSE-FILE (pathname handler &key ...)
+
Function CXML:PARSE-STREAM (stream handler &key ...)
+
Function CXML:PARSE-OCTETS (octets handler &key ...)
+ Parse an XML document.
+ Return values from this function depend on the SAX handler used.
+ Arguments:
+
+
+
pathname -- a Common Lisp pathname
+
stream -- a Common Lisp stream with element-type
+ (unsigned-byte 8)
+
octets -- an (unsigned-byte 8) array
+
handler -- a SAX handler
+
+
+ Common keyword arguments:
+
+
+
+ validate -- A boolean. Defaults to
+ nil. If true, parse in validating mode, i.e. assert that
+ the document contains a DOCTYPE declaration and conforms to the
+ DTD declared.
+
+
+ dtd -- unless nil, an extid instance
+ specifying the external subset to load. This option overrides
+ the extid specified in the document type declaration, if any.
+ See below for make-extid. This option is useful
+ for verification purposes together with the root
+ and disallow-internal-subset arguments.
+
+
root -- the expected root element
+ name, or nil (the default).
+
+
+ entity-resolver -- nil or a function of two
+ arguments which is invoked for every entity referenced by the
+ document with the entity's Public ID (a rod) and System ID (a
+ URI object) as arguments. The function may return nil, in
+ which case CXML will try to resolve the entity as usual.
+ Alternatively it may return a Common Lisp stream specialized on
+ (unsigned-byte 8) which will be used instead. (It may
+ also signal an error, of course, which can be useful to prohibit
+ parsed XML documents from including arbitrary files readable by
+ the parser.)
+
+
+ disallow-internal-subset -- a boolean. If true, signal
+ an error if the document contains an internal subset.
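A minimal sketch of how the keyword arguments above might be combined, assuming cxml has already been loaded through ASDF. The file name "example.xml" is a placeholder, and the error message text is an illustration only.

```lisp
;; Validating parse: the document must contain a DOCTYPE declaration
;; and conform to the DTD it declares.
(cxml:parse-file "example.xml"
                 (dom:make-dom-builder)
                 :validate t)

;; Non-validating parse with an entity resolver that signals an error,
;; so the document cannot pull in arbitrary files readable by the parser.
(cxml:parse-file "example.xml"
                 (dom:make-dom-builder)
                 :entity-resolver
                 (lambda (public-id system-id)
                   (declare (ignore public-id))
                   (error "Refusing to load external entity ~A" system-id)))
```

A resolver that returns nil instead of signaling leaves resolution to CXML, as described for entity-resolver.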
+
+
+
+
+
Function CXML:PARSE-DTD-FILE (pathname)
+
Function CXML:PARSE-DTD-STREAM (stream)
+ Parse declarations
+ from a stand-alone file and return an object representing the DTD,
+ suitable as an argument to validate.
+
+
+
pathname -- a Common Lisp pathname
+
stream -- a Common Lisp stream with element-type
+ (unsigned-byte 8)
+
+
+
+
Function CXML:MAKE-EXTID (publicid systemid)
+ Create an object representing the External ID composed
+ of the specified Public ID, a rod or nil, and System ID
+ (a URI object).
+
+
+
+
Function DOM:MAKE-DOM-BUILDER ()
+ Create a SAX handler which builds a DOM document. Example:
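A short sketch of the DOM builder in use, assuming cxml is loaded and that a file named "example.xml" (a placeholder) exists. It parses the file into a DOM document and reads the name of its root element.

```lisp
;; parse-file returns whatever the handler returns at end of document;
;; for the DOM builder that is the DOM document object.
(let ((document (cxml:parse-file "example.xml" (dom:make-dom-builder))))
  ;; The tag name comes back as a rod (a string in rune-is-character Lisps).
  (dom:tag-name (dom:document-element document)))
```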
+
+
+ NIL: Use a more readable non-canonical representation.
+
+
+
+ With an indentation level, pretty-print the XML by
+ inserting additional whitespace. Note that indentation
+ changes the document model and should only be used if whitespace
+ does not matter to the application.
+
+
+ unparse-document-to-octets returns an (unsigned-byte
+ 8) array, whereas unparse-document writes
+ characters. unparse-document is useful together
+ with with-output-to-string. However, note that the
+ resulting document in both cases is UTF-8 encoded, so the
+ characters written by unparse-document are really UTF-8
+ bytes encoded as characters.
+
+
+
+
Function CXML:MAKE-CHARACTER-STREAM-SINK (stream &rest keys) => sink
+
Function CXML:MAKE-OCTET-VECTOR-SINK (&rest keys) => sink
+ Return a handle suitable for event-based XML serialization.
+
+
+ These functions provide the low-level mechanism used by the DOM
+ serialization functions. To serialize a document without building
+ its DOM tree first, create a sink handle and call SAX functions on that
+ handle. sax:end-document returns the serialized form of
+ the document described by the SAX events.
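The event-based serialization described above can be sketched as follows. This is an illustration, not a definitive recipe: the element name and text are made up, and names are converted with runes:string-rod so that the same code should work whether rods are strings or integer vectors.

```lisp
;; Serialize <greeting>hello</greeting> without building a DOM tree.
(let ((sink (cxml:make-octet-vector-sink))
      (name (runes:string-rod "greeting")))
  (sax:start-document sink)
  ;; No namespace URI or local name is supplied here; only the qname.
  (sax:start-element sink nil nil name '())
  (sax:characters sink (runes:string-rod "hello"))
  (sax:end-element sink nil nil name)
  ;; end-document returns the serialized form, an (unsigned-byte 8) array
  ;; for an octet vector sink.
  (sax:end-document sink))
```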
+
+ Macro with-xhtml is a modified version of
+ Franz' htmlgen that works as a SAX driver for XHTML.
+ It aims to be a plug-in replacement for the html macro.
+
+
+ xhtmlgen is included as contrib/xhtmlgen.lisp in
+ the cxml distribution. Example:
+
+ Create a SAX handler which validates against a DTD instance.
+ The document's root element must be named root.
+ Used with dom:map-document, this validates a document
+ object as if by re-reading it with a validating parser, except
+ that declarations recorded in the document instance are completely
+ ignored.
+ Example:
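A sketch of validating an existing DOM document against a separately parsed DTD. The function heading is not visible in this excerpt; the sketch assumes the validating handler is created with cxml:make-validator taking the DTD and the root element name. The file names and the root name "foo" are placeholders.

```lisp
(let ((document (cxml:parse-file "example.xml" (dom:make-dom-builder)))
      (dtd (cxml:parse-dtd-file "example.dtd")))
  ;; Replay the document as SAX events into the validating handler.
  ;; Declarations recorded in the document itself are ignored.
  (dom:map-document (cxml:make-validator dtd (runes:string-rod "foo"))
                    document))
```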
+
+
Function DOM:MAP-DOCUMENT (handler document &key include-xmlns-attributes include-default-values)
+ Traverse a DOM document and call SAX functions as if an XML
+ representation of the document were processed by a SAX parser.
+
+
+
+
Class CXML:SAX-PROXY ()
+
Accessor CXML:PROXY-CHAINED-HANDLER
+ sax-proxy is a SAX handler which passes all events it
+ receives on to a user-defined second handler, which defaults
+ to nil. Use sax-proxy to modify the events a
+ SAX handler receives by defining your own subclass
+ of sax-proxy. Set the chained handler to the target
+ handler, and define methods on your handler class for the events
+ to be modified. All other events will pass through to the chained
+ handler unmodified.
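As an illustration of the subclassing pattern just described, the following sketch defines a hypothetical comment-stripper class that swallows comment events and lets everything else through to a DOM builder. The class name and the file name are assumptions made for the example.

```lisp
;; A proxy that drops comments; all other events pass through unchanged.
(defclass comment-stripper (cxml:sax-proxy) ())

(defmethod sax:comment ((handler comment-stripper) data)
  (declare (ignore data))
  ;; Do not forward the event to the chained handler.
  nil)

(let ((proxy (make-instance 'comment-stripper)))
  ;; Point the proxy at the handler that should receive the filtered events.
  (setf (cxml:proxy-chained-handler proxy) (dom:make-dom-builder))
  (cxml:parse-file "example.xml" proxy))
```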
+
+
+
+
XMLS Compatibility
+
+ Like other XML parsers written in Lisp, CXML can work with
+ documents represented as list structures. The specific model
+ implemented by cxml is compatible with the xmls parser. Xmls
+ list structures are a simpler and faster alternative to full DOM
+ document trees. They also serve as an example showing how to
+ implement user-defined document models as an independent layer
+ over the base parser (cf. xml/xmls-compat.lisp in
+ the cxml distribution). However, note that the list structures do
+ not include all information available in DOM documents and are
+ sometimes more difficult to work with since many DOM functions
+ cannot be implemented on them.
+
+
+
Function CXML-XMLS:MAKE-XMLS-BUILDER (&key include-default-values)
+ Create a SAX handler which builds XMLS list structures.
+ If include-default-values is true, default values for
+ attributes declared in a DTD are included as attributes in the
+ xmls output. include-default-values is true by default
+ and can be set to nil to suppress inclusion of default
+ values.
+
+
Function CXML-XMLS:MAKE-NODE (&key name ns attrs
+ children) => xmls node
+ Build a list node of the form
+ (name ((name value)*) child*).
+
+
+ The node list's car can also be a cons of local name
+ and namespace prefix ns.
+ fixme: It is unclear to me how namespaces are meant to
+ work in xmls, since xmls documentation differs from how xmls
+ actually works in current releases. Usually applications need to
+ know both the namespace prefix and the namespace URI. We
+ currently follow the xmls implementation and use the
+ namespace prefix instead of following its documentation which
+ shows the URI. We do not follow xmls in munging xmlns attribute
+ values. Attributes themselves have namespaces and it is not clear
+ to me how that works in xmls.
+
+
+
Accessor CXML-XMLS:NODE-NAME (node)
+
Accessor CXML-XMLS:NODE-NS (node)
+
Accessor CXML-XMLS:NODE-ATTRS (node)
+
Accessor CXML-XMLS:NODE-CHILDREN (node)
+ Accessors for xmls node data.
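A brief sketch of the xmls interface, assuming cxml is loaded. The file name, element name, and attribute values are placeholders chosen for the example.

```lisp
;; Parse a file straight into xmls list structures.
(cxml:parse-file "example.xml" (cxml-xmls:make-xmls-builder))

;; Build a node by hand and take it apart again with the accessors.
(let ((node (cxml-xmls:make-node :name "p"
                                 :attrs '(("class" "note"))
                                 :children '("Hello"))))
  (list (cxml-xmls:node-name node)
        (cxml-xmls:node-attrs node)
        (cxml-xmls:node-children node)))
```

Using the accessors rather than car and cdr keeps application code independent of the exact list layout.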
+
+
+
+
+
+
Dealing with Rods
+
+ As explained above, the XML parser handles character encoding and
+ uses 16bit strings internally. Instead of using characters and strings
+ it uses runes and rods. This is seen as a
+ feature, but can be inconvenient.
+
+
+
+ If your Lisp supports 16 bit unicode strings, use feature
+ :rune-is-character and forget about runes and rods.
+ CXML will use ordinary Lisp characters and strings both
+ internally and externally.
+
+
+ If your Lisp does not support such strings and your application
+ needs Unicode support, use functions defined in the
+ runes package instead of ordinary string operators.
+
+
+ If your Lisp does not support such strings and your application
+ does not need Unicode support anyway, it will probably be more
+ convenient to let CXML convert rods into strings automatically.
+ To do that, use cxml:make-recoder to chain a special
+ sax handler between the parser and your application handler.
+ The recoder translates all rods using an application defined
+ function, which defaults to runes:rod-string. Although
+ the actual XML parser still uses rods internally, your SAX
+ handler will only see ordinary Lisp strings.
+
+
+
+ Note that the recoder approach does not work with the DOM
+ builder, since DOM is specified to use UTF-16.
+
+
+
Function CXML:MAKE-RECODER (chained-handler &optional recoder-fn)
+ Return a SAX handler which passes all events on to
+ chained-handler after converting all strings and rods
+ using recoder-fn, a function of one argument which
+ defaults to runes:rod-string.
+
+
+ Example. In a Lisp which ordinarily would use octet vector rods:
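A sketch of the recoder in front of the xmls builder, assuming cxml is loaded and "example.xml" is a placeholder file. The recoding function is passed explicitly here for clarity, even though it is described as the default.

```lisp
;; Without the recoder, names and text arrive as rods.
(cxml:parse-file "example.xml" (cxml-xmls:make-xmls-builder))

;; With the recoder, the xmls builder only sees ordinary Lisp strings.
(cxml:parse-file "example.xml"
                 (cxml:make-recoder (cxml-xmls:make-xmls-builder)
                                    #'runes:rod-string))
```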
+
+ To avoid spending time parsing the same DTD over and over again,
+ CXML can cache DTD objects. The parser consults
+ cxml:*dtd-cache* whenever it is looking for an external
+ subset in a document which does not have an internal subset and
+ uses the cached DTD instance if one is present in the cache for
+ the System ID in question.
+
+
+ Note that DTDs do not expire from the cache automatically.
+ (Future versions of CXML might introduce automatic checks for
+ outdated DTDs.)
+
+
+
Variable CXML:*DTD-CACHE*
+ The DTD cache object consulted by the parser when it needs a DTD.
+
+
+
Function CXML:MAKE-DTD-CACHE ()
+ Return a new, empty DTD cache object.
+
+
+
Variable CXML:*CACHE-ALL-DTDS*
+ If true, instructs the parser to enter all DTDs that could have
+ been cached into *dtd-cache* if they were not cached
+ already. Defaults to nil.
+
+
+
Reader CXML:GETDTD (uri dtd-cache)
+ Return a cached instance of the DTD at uri, if present in
+ the cache, or nil.
+
+
+
Writer CXML:GETDTD (uri dtd-cache)
+ Enter a new value for uri into dtd-cache.
+
+
+
Function CXML:REMDTD (uri dtd-cache)
+ Ensure that no DTD is recorded for uri in the cache and
+ return true if such a DTD was present.
+
+
+
Function CXML:CLEAR-DTD-CACHE (dtd-cache)
+ Remove all entries from dtd-cache.
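A sketch of how the cache might be used when several documents share one DTD, assuming cxml is loaded. The file names are placeholders, and both variables are rebound locally so that the global settings stay untouched.

```lisp
(let ((cxml:*dtd-cache* (cxml:make-dtd-cache))
      (cxml:*cache-all-dtds* t))
  ;; The first parse enters the DTD into the cache; later parses of
  ;; documents without an internal subset can reuse it.
  (dolist (file '("a.xhtml" "b.xhtml"))
    (cxml:parse-file file (dom:make-dom-builder)))
  ;; Entries never expire on their own, so empty the cache explicitly
  ;; if the DTD files may have changed.
  (cxml:clear-dtd-cache cxml:*dtd-cache*))
```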
+
+
+ fixme: thread-safety
+
+
+
+
XML Catalogs
+
+ External entities (for example, DTDs) are referred to using their
+ Public and System IDs. Usually the System ID, a URI, is used to
+ locate the entity. CXML itself handles only file://-URIs, but
+ many System IDs in practical use are http://-URIs. There are two
+ different mechanisms applications can use to allow CXML to locate
+ entities using arbitrary Public ID or System ID:
+
+
+
+ User-defined entity resolvers can be used to open entities using
+ arbitrary protocols. For example, an entity resolver could
+ handle all System IDs with the http scheme using some
+ HTTP library. Refer to the description of the
+ entity-resolver keyword argument to parser functions (see cxml:parse-file) for more
+ information on entity resolvers.
+
+
+ XML Catalogs are (local) tables in XML syntax which map External
+ IDs to alternative System IDs. If, say, the xhtml DTD is
+ present in the local file system and the local copy has been
+ registered with the XML catalog, CXML will use the local copy of
+ the DTD instead of trying to open the version available using HTTP.
+
+
+
+ This section describes XML Catalogs, the second solution. CXML
+ implements OASIS
+ XML Catalogs.
+
+
+
Variable CXML:*CATALOG*
+ The XML Catalog object consulted by the parser before trying to
+ open an entity. Initially nil.
+
+
+
Variable CXML:*PREFER*
+ The default "prefer" mode from the Catalog specification, one
+ of :public or :system. Defaults
+ to :public.
+
+
+
Function CXML:MAKE-CATALOG (&optional uris)
+ Return a catalog object for the catalog files specified.
+
+
+
Function CXML:RESOLVE-URI (uri catalog)
+ Look up uri in catalog and return the
+ resulting URI, or nil if no match was found.
+
+
+
Function CXML:RESOLVE-EXTID (publicid systemid catalog)
+ Look up the External ID (publicid, systemid)
+ in catalog and return the resulting URI, or nil
+ if no match was found.
+
+
+ Example:
+
+
* (setf cxml:*catalog* nil)
+* (cxml:parse-file "test.xhtml" nil)
+=> Error: URI scheme :HTTP not supported
+
+* (setf cxml:*catalog* (cxml:make-catalog))
+* (cxml:parse-file "test.xhtml" nil)
+;; no error!
+NIL
+
+ Note that parsed catalog files are cached in the catalog object.
+ Catalog files cached do not expire automatically. To ensure that
+ all catalog files are parsed again, create a new catalog object.
+
+
+
+
SAX Interface
+
+ A SAX handler is an arbitrary object that implements some of the
+ generic functions in the SAX package. Note that no default
+ handler class is necessary, because all generic functions have default
+ methods which do nothing. SAX functions are:
+
Function SAX:START-DOCUMENT (handler)
+
Function SAX:END-DOCUMENT (handler)
+
+
Function SAX:START-ELEMENT (handler namespace-uri local-name qname attributes)
+
Function SAX:END-ELEMENT (handler namespace-uri local-name qname)
+
Function SAX:START-PREFIX-MAPPING (handler prefix uri)
+
Function SAX:END-PREFIX-MAPPING (handler prefix)
+
Function SAX:PROCESSING-INSTRUCTION (handler target data)
+
Function SAX:COMMENT (handler data)
+
Function SAX:START-CDATA (handler)
+
Function SAX:END-CDATA (handler)
+
Function SAX:CHARACTERS (handler data)
+
+
Function SAX:START-DTD (handler name public-id system-id)
+
Function SAX:END-DTD (handler)
+
Function SAX:UNPARSED-ENTITY-DECLARATION (handler name public-id system-id notation-name)
+
Function SAX:EXTERNAL-ENTITY-DECLARATION (handler kind name public-id system-id)
+
Function SAX:INTERNAL-ENTITY-DECLARATION (handler kind name value)
+
Function SAX:NOTATION-DECLARATION (handler name public-id system-id)
+
Function SAX:ELEMENT-DECLARATION (handler name model)
+
Function SAX:ATTRIBUTE-DECLARATION (handler ename aname type default)
+
+
Accessor SAX:ATTRIBUTE-PREFIX (attribute)
+
Accessor SAX:ATTRIBUTE-NAMESPACE-URI (attribute)
+
Accessor SAX:ATTRIBUTE-LOCAL-NAME (attribute)
+
Accessor SAX:ATTRIBUTE-VALUE (attribute)
+
Accessor SAX:ATTRIBUTE-QNAME (attribute)
+
Accessor SAX:ATTRIBUTE-SPECIFIED-P (attribute)
+
+
+ The entity declaration methods are similar to Java SAX
+ definitions, but parameter entities are distinguished from
+ general entities not by a % prefix to the name, but by
+ the kind argument, either :parameter or
+ :general.
+
+
+ The arguments to sax:element-declaration and
+ sax:attribute-declaration differ significantly from their
+ Java counterparts.
+
+
+ fixme: For more information on these functions refer to the docstrings.
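+
+ To illustrate the handler protocol, here is a minimal sketch of a
+ handler that counts elements. It assumes cxml is loaded; the class
+ element-counter, its accessor, and the file name are hypothetical.
+ Only sax:start-element is specialized, the remaining events fall
+ through to the default methods:

```lisp
;; Sketch only: assumes cxml (and with it the SAX package) is loaded.
;; ELEMENT-COUNTER and ELEMENT-COUNT are hypothetical names.
(defclass element-counter ()
  ((count :initform 0 :accessor element-count))
  (:documentation "SAX handler that counts start-element events."))

(defmethod sax:start-element ((handler element-counter)
                              namespace-uri local-name qname attributes)
  (declare (ignore namespace-uri local-name qname attributes))
  (incf (element-count handler)))

(defun count-elements (file)
  "Parse FILE and return the number of elements seen."
  (let ((handler (make-instance 'element-counter)))
    (cxml:parse-file file handler)
    ;; Read the result from the handler rather than relying on the
    ;; return value of the parser.
    (element-count handler)))

;; (count-elements "test.xml")   ; hypothetical file name
```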
+
+
+
+
+
DOM Notes
+
+ CXML implements the DOM Level 1 Core interfaces. Explaining
+ DOM is better left to the specification,
+ so please refer to the official W3C documents for DOM.
+
+
+ However, there is no "standard" DOM mapping for Lisp. DOM
+ is specified
+ in CORBA IDL, but it refrains from using object-oriented IDL
+ features, allowing for a much more natural Lisp implementation than
+ the ordinary IDL/Lisp mapping would.
+
+
+ Differences between CXML's DOM and the direct IDL/Lisp mapping:
+
+
+
+ DOM function names are symbols in the DOM package (not
+ the OP package).
+
+
+ DOM functions have proper required arguments, not a huge
+ &rest lambda list.
+
+
+ Although most IDL interfaces are implemented as CLOS classes by
+ CXML, the Lisp types of DOM objects are not documented and cannot
+ be relied upon. A node's type can be determined using
+ dom:node-type instead.
+
+
+ DOMString is mapped to rod, which is either
+ an (unsigned-byte 16) array type or a string type.
+
+
+ The IDL/Lisp mapping maps CORBA enums to Lisp keywords.
+ Unfortunately, the DOM IDL does not use enums. Instead,
+ both exception types and node types are defined as integer
+ constants. CXML chooses to ignore this definition and uses
+ keywords instead.
+
+
+ DOM uses StudlyCaps. Lisp programmers don't. We
+ insert #\- before every upper case letter preceded by a
+ lower case letter and before every upper case letter which is
+ followed by a lower case letter, but preceded by a capital
+ letter. This algorithm leads to the natural Lisp spelling
+ of DOM function names.
+
+
+ Implementation note: DOM's NodeList does not
+ necessarily map to a native "sequence" type. (For example,
+ node lists are objects in Java, not arrays.)
+ NodeList is specified to reflect changes done after a
+ node list was created, so node lists cannot be Lisp lists.
+ (A node list could be implemented as a CLOS object pointing to
+ said list though.) Instead, CXML currently implements node
+ lists as adjustable vectors. Note that code which relies on
+ this implementation and uses Lisp sequence functions
+ instead of sticking to dom:item and dom:length
+ is not portable. As a compromise, you can use our
+ extensions dom:map-node-list or
+ dom:do-node-list, which can be implemented portably.
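+
+ The portable and the convenient ways of walking a node list might
+ look roughly as follows. This is a sketch which assumes cxml is
+ loaded and that dom:do-node-list takes a (variable node-list
+ result-form) binding list in the style of dolist; the function
+ names child-node-names and child-node-names* are hypothetical:

```lisp
;; Sketch only: assumes cxml is loaded and NODE is a DOM node.

;; Strictly DOM Level 1: only dom:length and dom:item are used.
(defun child-node-names (node)
  "Return the node names of NODE's children, in document order."
  (let ((children (dom:child-nodes node)))
    (loop for i from 0 below (dom:length children)
          collect (dom:node-name (dom:item children i)))))

;; Same result using the CXML extension dom:do-node-list.
;; The DOLIST-like lambda list is an assumption, see above.
(defun child-node-names* (node)
  "Like CHILD-NODE-NAMES, but written with dom:do-node-list."
  (let ((names '()))
    (dom:do-node-list (child (dom:child-nodes node) (nreverse names))
      (push (dom:node-name child) names))))
```

+ Neither version calls Lisp sequence functions on the node list
+ itself, so both should keep working if the vector representation
+ changes.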
+
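+ The naming rule for DOM functions described above can be sketched
+ in plain Common Lisp. The function studly-caps-to-lisp-name is a
+ hypothetical helper written for this illustration, not part of
+ CXML:

```lisp
;; Illustration of the naming rule; not part of CXML.
(defun studly-caps-to-lisp-name (name)
  "Convert the StudlyCaps string NAME to hyphenated upper case."
  (with-output-to-string (out)
    (loop for c across name
          for i from 0
          for prev = (and (plusp i) (char name (1- i)))
          for next = (and (< (1+ i) (length name)) (char name (1+ i)))
          do (when (and prev
                        (upper-case-p c)
                        (or
                         ;; upper case letter preceded by lower case
                         (lower-case-p prev)
                         ;; upper case letter preceded by a capital
                         ;; and followed by lower case
                         (and (upper-case-p prev)
                              next
                              (lower-case-p next))))
               (write-char #\- out))
             (write-char (char-upcase c) out))))

;; (studly-caps-to-lisp-name "getElementsByTagName")
;; (studly-caps-to-lisp-name "createCDATASection")
```

+ The second clause is what keeps runs of capitals such as CDATA
+ together while still separating them from the following word.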