<?xml version="1.0" encoding="iso-8859-1"?>
|
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
|
|
<head>
|
|
<title>Closure XML</title>
|
|
<link rel="stylesheet" type="text/css" href="cxml.css"/>
|
|
</head>
|
|
<body>
|
|
<div class="sidebar">
|
|
</div>
|
|
|
|
<h1>The DOM implementation</h1>
|
|
<p>
|
|
CXML implements the DOM Level 2 Core interfaces. For details
|
|
on DOM, please refer to the <a
|
|
href="http://www.w3.org/TR/DOM-Level-2-Core/core.html">specification</a>.
|
|
</p>
|
|
|
|
<a name="parser"/>
|
|
<h3>Parsing into DOM</h3>
|
|
<p>
|
|
To parse an XML document into a DOM tree, use the SAX parser with a
|
|
DOM builder as the SAX handler. Example:
|
|
</p>
|
|
<pre>(cxml:parse-file "test.xml" (cxml-dom:make-dom-builder))</pre>
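<p>
  As a minimal sketch of working with the result (assuming a well-formed
  file named <tt>test.xml</tt> exists in the current directory), the
  document returned by the parser can be inspected with ordinary DOM
  functions such as <tt>dom:document-element</tt> and
  <tt>dom:tag-name</tt>:
</p>
<pre>(let* ((document (cxml:parse-file "test.xml" (cxml-dom:make-dom-builder)))
       (root (dom:document-element document)))
  ;; TAG-NAME returns a rod, which is a string on Lisps with Unicode support.
  (dom:tag-name root))</pre>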
|
|
<p>
|
|
<div class="def">Function CXML-DOM:MAKE-DOM-BUILDER ()</div>
|
|
Create a SAX handler which builds a DOM document.
|
|
<p>
|
|
</p>
|
|
This function returns a DOM builder that will work with the default
|
|
configuration of the SAX parser and is guaranteed to use
|
|
characters/strings instead of runes/rods, if that makes a
|
|
difference on the Lisp in question.
|
|
<p>
|
|
</p>
|
|
This is the same as <tt>rune-dom:make-dom-builder</tt> on Lisps
|
|
with Unicode support, and the same as
|
|
<tt>utf8-dom:make-dom-builder</tt> otherwise.
|
|
</p>
|
|
|
|
<p>
|
|
<div class="def">Function RUNE-DOM:MAKE-DOM-BUILDER ()</div>
|
|
Create a SAX handler which builds a DOM document using runes and rods.
|
|
</p>
|
|
|
|
<p>
|
|
<div class="def">Function UTF8-DOM:MAKE-DOM-BUILDER ()</div>
|
|
(Only on Lisps without Unicode support:)
|
|
Create a SAX handler which builds a DOM document using
|
|
UTF-8-encoded strings.
|
|
</p>
|
|
|
|
<a name="serialization"/>
|
|
<h3>Serializing DOM</h3>
|
|
<p>
|
|
To serialize a DOM document, pass a SAX serialization sink as the
handler argument to <tt>dom:map-document</tt>, which generates SAX
events for the DOM tree.
|
|
</p>
|
|
<p>
|
|
In addition, there are convenience functions like
|
|
<tt>unparse-document</tt> as a thin wrapper around
|
|
<tt>map-document</tt>.
|
|
</p>
|
|
<p>
|
|
<div class="def">Function DOM:MAP-DOCUMENT (handler document &amp;key include-xmlns-attributes include-default-values include-doctype)</div>
|
|
Traverse a DOM document and call SAX functions as if an XML
|
|
representation of the document were being processed by a SAX parser.
|
|
</p>
|
|
<p>Keyword arguments:</p>
|
|
<ul>
|
|
<li>
|
|
<tt>include-xmlns-attributes</tt> -- defaults to
|
|
<tt>sax:*include-xmlns-attributes*</tt>
|
|
</li>
|
|
<li>
|
|
<tt>include-doctype</tt> -- One of <tt>nil</tt> (no doctype
|
|
declaration), <tt>:full-internal-subset</tt> (include a doctype
|
|
declaration and the full internal subset), or
|
|
<tt>:canonical-notations</tt> (write a doctype declaration
|
|
with an internal subset including only notations, as required
|
|
for canonical serialization).
|
|
</li>
|
|
<li>
|
|
<tt>include-default-values</tt> -- if true, also include attribute
nodes for which <tt>dom:specified</tt> is <tt>nil</tt>, i.e.
attributes that were not given explicitly in the document.
|
|
</li>
|
|
<li>
|
|
<tt>recode</tt> -- (ignored on Lisps with Unicode support.) If
|
|
true, recode UTF-8 strings to rods. Defaults to true if used
|
|
with a UTF-8 DOM document. It can be set to false manually to
|
|
suppress recoding in this case.
|
|
</li>
|
|
</ul>
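<p>
  For illustration, a sketch of calling <tt>dom:map-document</tt>
  directly. It assumes that <tt>document</tt> was built with a DOM
  builder as described under "Parsing into DOM", and that the sink
  created by <tt>cxml:make-octet-vector-sink</tt> hands back the
  serialized octets once the document ends. The helper name is
  hypothetical:
</p>
<pre>(defun serialize-with-doctype (document)
  ;; Hypothetical helper, not part of CXML.
  ;; The sink receives the SAX events generated for the DOM tree.
  (dom:map-document (cxml:make-octet-vector-sink)
                    document
                    :include-doctype :full-internal-subset))</pre>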
|
|
|
|
<p>
|
|
<div class="def">Function CXML:UNPARSE-DOCUMENT (document stream &amp;rest keys)</div>
|
|
<div class="def">Function CXML:UNPARSE-DOCUMENT-TO-OCTETS (document &amp;rest keys) =&gt; vector</div>
|
|
</p>
|
|
<p>
|
|
Serialize a DOM document object. These convenience functions are
|
|
wrappers around <tt>dom:map-document</tt>.
|
|
</p>
|
|
<p>Keyword arguments are passed on to the sink. Cf. <a
|
|
href="using.html#serialization">cxml:make-octet-vector-sink</a>.</p>
|
|
<p>Notes:</p>
|
|
<ul>
|
|
<li>
|
|
If keyword argument <tt>canonical</tt> is specified as 2, a
|
|
doctype declaration will be written that includes notations
|
|
declared in the document.
|
|
</li>
|
|
<li>
|
|
If namespace processing is enabled
|
|
(<tt>sax:*namespace-processing*</tt>), a <a
|
|
href="using.html#misc">namespace normalizer</a> is used.
|
|
</li>
|
|
</ul>
|
|
<p>
|
|
<tt>unparse-document-to-octets</tt> returns an <tt>(unsigned-byte
|
|
8)</tt> array, whereas <tt>unparse-document</tt> writes
|
|
characters. <tt>unparse-document</tt> is useful together
|
|
with <tt>with-output-to-string</tt>. However, note that the
|
|
resulting document in both cases is UTF-8 encoded, so the
|
|
characters written by <tt>unparse-document</tt> are really UTF-8
|
|
bytes encoded as characters.
|
|
</p>
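<p>
  The difference between the two functions can be sketched as follows.
  The helper names are hypothetical, and <tt>document</tt> is assumed
  to be a DOM document built by CXML:
</p>
<pre>(defun document-to-string (document)
  ;; Hypothetical helper, not part of CXML.
  ;; The resulting string holds UTF-8 bytes encoded as characters.
  (with-output-to-string (stream)
    (cxml:unparse-document document stream)))

(defun document-to-octets (document)
  ;; Hypothetical helper, not part of CXML.
  ;; Returns an (unsigned-byte 8) array holding the UTF-8 encoded document.
  (cxml:unparse-document-to-octets document))</pre>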
|
|
|
|
<a name="mapping"/>
|
|
<h3>DOM/Lisp mapping</h3>
|
|
<p>
|
|
Note that there is no "standard" DOM mapping for Lisp.
|
|
</p>
|
|
<p>
|
|
DOM is <a
|
|
href="http://www.w3.org/TR/DOM-Level-2-Core/idl-definitions.html">specified
|
|
in CORBA IDL</a>, but it refrains from using object-oriented IDL
|
|
features, allowing for a much more natural Lisp implementation than
|
|
the ordinary IDL/Lisp mapping would.
|
|
Differences between CXML's DOM and the direct IDL/Lisp mapping:
|
|
</p>
|
|
<ul>
|
|
<li>
|
|
DOM function names are symbols in the <tt>DOM</tt> package (not
|
|
the <tt>OP</tt> package).
|
|
</li>
|
|
<li>
|
|
DOM functions have proper required arguments, not a huge
|
|
<tt>&amp;rest</tt> lambda list.
|
|
</li>
|
|
<li>
|
|
Although most IDL interfaces are implemented as CLOS classes by
|
|
CXML, the Lisp types of DOM objects are not documented and cannot
|
|
be relied upon. A node's type can be determined using
|
|
<tt>dom:node-type</tt> instead.
|
|
</li>
|
|
<li>
|
|
<tt>DOMString</tt> is mapped to <tt>rod</tt>, which is either
|
|
an <tt>(unsigned-byte 16)</tt> array type or a string type.
|
|
</li>
|
|
<li>
|
|
The IDL/Lisp mapping maps CORBA enums to Lisp keywords.
|
|
Unfortunately, the DOM IDL does not use enums. Instead,
|
|
both exception types and node types are defined as integer
|
|
constants. CXML chooses to ignore this definition and uses
|
|
keywords instead.
|
|
</li>
|
|
<li>
|
|
DOM uses StudlyCaps. Lisp programmers don't. We
|
|
insert <tt>#\-</tt> before every upper case letter preceded by a
|
|
lower case letter and before every upper case letter which is
|
|
followed by a lower case letter, but preceded by a capital
|
|
letter. This algorithm leads to the natural Lisp spelling
|
|
of DOM function names.
|
|
</li>
|
|
<li>
|
|
Implementation note: DOM's <tt>NodeList</tt> does not
|
|
necessarily map to a native "sequence" type. (For example,
|
|
node lists are objects in Java, not arrays.)
|
|
<tt>NodeList</tt> is specified to reflect changes made after a
|
|
node list was created, so node lists cannot be Lisp lists.
|
|
(A node list could be implemented as a CLOS object pointing to
|
|
said list though.) Instead, CXML currently implements node
|
|
lists as adjustable vectors. Note that code which relies on
|
|
this implementation and uses Lisp sequence functions
|
|
instead of sticking to <tt>dom:item</tt> and <tt>dom:length</tt>
|
|
is not portable. As a compromise, you can use our
|
|
extensions <tt>dom:map-node-list</tt> or
|
|
<tt>dom:do-node-list</tt>, which can be implemented portably.
|
|
</li>
|
|
</ul>
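<p>
  The naming rule described in the list above can be sketched in plain
  Common Lisp. The function <tt>lispify-dom-name</tt> is a hypothetical
  helper written for this illustration, not part of CXML, and the final
  downcasing is an assumption made so that the result is easy to read:
</p>
<pre>(defun lispify-dom-name (name)
  "Hypothetical helper: hyphenate a StudlyCaps DOM name and downcase it."
  (with-output-to-string (out)
    (loop for i from 0 below (length name)
          for c = (char name i)
          for prev = (and (plusp i) (char name (1- i)))
          for next = (and (array-in-bounds-p name (1+ i))
                          (char name (1+ i)))
          do ;; Hyphenate before an upper case letter that follows a lower
             ;; case letter, or that follows an upper case letter and is
             ;; itself followed by a lower case letter.
             (when (and (upper-case-p c)
                        prev
                        (or (lower-case-p prev)
                            (and (upper-case-p prev)
                                 next
                                 (lower-case-p next))))
               (write-char #\- out))
             (write-char (char-downcase c) out))))</pre>
<p>
  Under this sketch, <tt>getElementsByTagName</tt> becomes
  <tt>get-elements-by-tag-name</tt> and <tt>createCDATASection</tt>
  becomes <tt>create-cdata-section</tt>.
</p>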
|
|
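<p>
  A sketch of walking a node list without relying on its representation
  as a vector, using only <tt>dom:item</tt> and <tt>dom:length</tt>.
  The helper name is hypothetical, and <tt>node</tt> is assumed to be a
  DOM node built by CXML:
</p>
<pre>(defun child-element-names (node)
  ;; Hypothetical helper, not part of CXML.
  (let ((children (dom:child-nodes node))
        (names '()))
    (dotimes (i (dom:length children))
      (let ((child (dom:item children i)))
        ;; Node types are keywords in CXML, not integer constants.
        (when (eq (dom:node-type child) :element)
          (push (dom:tag-name child) names))))
    (nreverse names)))</pre>
<p>
  The same loop could also be written with the extension
  <tt>dom:do-node-list</tt>.
</p>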
</body>
|
|
</html>
|