utf8-dom fixes.
Recoding to UTF-8 is now the default.
doc/dom.html
@@ -22,10 +22,33 @@
 To parse an XML document into a DOM tree, use the SAX parser with a
 DOM builder as the SAX handler. Example:
 </p>
-<pre>(cxml:parse-file "test.xml" (dom:make-dom-builder))</pre>
+<pre>(cxml:parse-file "test.xml" (cxml-dom:make-dom-builder))</pre>
 <p>
-<div class="def">Function DOM:MAKE-DOM-BUILDER ()</div>
+<div class="def">Function CXML-DOM:MAKE-DOM-BUILDER ()</div>
 Create a SAX handler which builds a DOM document.
+<p>
+</p>
+This function returns a DOM builder that will work with the default
+configuration of the SAX parser and is guaranteed to use
+characters/strings instead of runes/rods, if that makes a
+difference on the Lisp in question.
+<p>
+</p>
+This is the same as <tt>rune-dom:make-dom-builder</tt> on Lisps
+with Unicode support, and the same as
+<tt>utf8-dom:make-dom-builder</tt> otherwise.
+</p>
+
+<p>
+<div class="def">Function RUNE-DOM:MAKE-DOM-BUILDER ()</div>
+Create a SAX handler which builds a DOM document using runes and rods.
+</p>
+
+<p>
+<div class="def">Function UTF8-DOM:MAKE-DOM-BUILDER ()</div>
+(Only on Lisps without Unicode support:)
+Create a SAX handler which builds a DOM document using
+UTF-8-encoded strings.
 </p>
 
 <a name="serialization"/>
@@ -63,6 +86,12 @@
 <tt>include-default-values</tt> -- include attribute nodes with nil
 <tt>dom:specified</tt>.
 </li>
+<li>
+<tt>recode</tt> -- (ignored on Lisps with Unicode support.) If
+true, recode UTF-8 strings to rods. Defaults to true if used
+with a UTF-8 DOM document. It can be set to false manually to
+suppress recoding in this case.
+</li>
 </ul>
 
 <p>
|
@@ -33,7 +33,7 @@
 <li><a href="using.html#parser">Parsing and Validating</a></li>
 <li><a href="using.html#serialization">Serialization</a></li>
 <li><a href="using.html#misc">Miscellaneous SAX handlers</a></li>
-<li><a href="using.html#rods">Dealing with Rods</a></li>
+<li><a href="using.html#rods">Recoders</a></li>
 <li><a href="using.html#dtdcache">Caching of DTD Objects</a></li>
 <li><a href="using.html#catalogs">XML Catalogs</a></li>
 <li><a href="using.html#sax">SAX Interface</a></li>
@@ -67,7 +67,7 @@
 
 <p>Parse <tt>example.xml</tt> into a DOM tree (<a href="using.html#parser">read
 more</a>):</p>
-<pre>* <b>(cxml:parse-file "example.xml" (dom:make-dom-builder))</b>
+<pre>* <b>(cxml:parse-file "example.xml" (cxml-dom:make-dom-builder))</b>
 #<DOM-IMPL::DOCUMENT @ #x72206172>
 ;; save result for later:
 * <b>(defparameter *example* *)</b>
|
@@ -69,6 +69,13 @@
 <tt>disallow-internal-subset</tt> -- a boolean. If true, signal
 an error if the document contains an internal subset.
 </li>
+<li>
+<tt>recode</tt> -- a boolean. (Ignored on Lisps with Unicode
+support.) Recode rods to UTF-8 strings. Defaults to true.
+Make sure to use <tt>utf8-dom:make-dom-builder</tt> if this
+option is enabled and <tt>rune-dom:make-dom-builder</tt>
+otherwise.
+</li>
 </ul>
 
 <p>
@@ -258,7 +265,7 @@
 ignored.<br/>
 Example:
 </p>
-<pre>(let ((d (parse-file "~/test.xml" (dom:make-dom-builder)))
+<pre>(let ((d (parse-file "~/test.xml" (cxml-dom:make-dom-builder)))
 (x (parse-dtd-file "~/test.dtd")))
 (dom:map-document (cxml:make-validator x #"foo") d))</pre>
 
@@ -287,40 +294,15 @@
 </p>
 
 <a name="rods"/>
-<h3>Dealing with Rods</h3>
+<h3>Recoders</h3>
 <p>
-As explained above, the XML parser handles character encoding and
-uses 16bit strings internally. Instead of using characters and strings
-it uses <em>runes</em> and <em>rods</em>. This is seen as a
-feature, but can be inconvenient.
+Recoders are a mechanism used by CXML internally on Lisp implementations
+without Unicode support to recode UTF-16 vectors (rods) of
+integers (runes) into UTF-8 strings.
 </p>
-<ul>
-<li>
-If your Lisp supports 16 bit unicode strings, use feature
-<tt>:rune-is-character</tt> and forget about runes and rods.
-CXML will use ordinary Lisp characters and strings both
-internally and externally.
-</li>
-<li>
-If your Lisp does not support such strings and your application
-needs Unicode support, use functions defined in the
-<tt>runes</tt> package instead of ordinary string operators.
-</li>
-<li>
-If your Lisp does not support such strings and your application
-does not need Unicode support anyway, it will probably be more
-convenient to let CXML convert rods into strings automatically.
-To do that, use <tt>cxml:make-recoder</tt> to chain a special
-sax handler between the parser and your application handler.
-The recoder translates all rods using an application defined
-function, which defaults to <tt>runes:rod-string</tt>. Although
-the actual XML parser still uses rods internally, you SAX
-handler will only see ordinary Lisp strings.
-</li>
-</ul>
 <p>
-Note that the recoder approach does <em>not</em> work with the DOM
-builder, since DOM is specified to use UTF-16.
+User code does not usually need to deal with recoders in current
+versions of CXML.
 </p>
 <p>
 <div class="def">Function CXML:MAKE-RECODER (chained-handler recoder-fn)</div>
@@ -328,16 +310,6 @@
 <tt>chained-handler</tt> after converting all strings and rods
 using <tt>recoder-fn</tt>, a function of one argument.
 </p>
-<p>
-<b>Example.</b> In a Lisp which ordinarily would use octet vector rods:
-</p>
-<pre>CL-USER(14): (cxml:parse-string "<test/>" (cxml-xmls:make-xmls-builder))
-(#(116 101 115 116) NIL)</pre>
-<p>
-Use a SAX recoder to get strings instead::
-</p>
-<pre>CL-USER(17): (parse-string "<test/>" (cxml:make-recoder (cxml-xmls:make-xmls-builder) 'runes:rod-string))
-("test" NIL)</pre>
 
 <a name="dtdcache"/>
 <h3>Caching of DTD Objects</h3>