utf8-dom fixes.
recoding nach utf-8 jetzt der default.
This commit is contained in:
33
doc/dom.html
33
doc/dom.html
@ -22,10 +22,33 @@
|
||||
To parse an XML document into a DOM tree, use the SAX parser with a
|
||||
DOM builder as the SAX handler. Example:
|
||||
</p>
|
||||
<pre>(cxml:parse-file "test.xml" (dom:make-dom-builder))</pre>
|
||||
<pre>(cxml:parse-file "test.xml" (cxml-dom:make-dom-builder))</pre>
|
||||
<p>
|
||||
<div class="def">Function DOM:MAKE-DOM-BUILDER ()</div>
|
||||
<div class="def">Function CXML-DOM:MAKE-DOM-BUILDER ()</div>
|
||||
Create a SAX handler which builds a DOM document.
|
||||
<p>
|
||||
</p>
|
||||
This functions returns a DOM builder that will work with the default
|
||||
configuration of the SAX parser and is guaranteed to use
|
||||
characters/strings instead of runes/rods, if that makes a
|
||||
difference on the Lisp in question.
|
||||
<p>
|
||||
</p>
|
||||
This is the same as <tt>rune-dom:make-dom-builder</tt> on Lisps
|
||||
with Unicode support, and the same as
|
||||
<tt>utf8-dom:make-dom-builder</tt> otherwise.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
<div class="def">Function RUNE-DOM:MAKE-DOM-BUILDER ()</div>
|
||||
Create a SAX handler which builds a DOM document using runes and rods.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
<div class="def">Function UTF8-DOM:MAKE-DOM-BUILDER ()</div>
|
||||
(Only on Lisps without Unicode support:)
|
||||
Create a SAX handler which builds a DOM document using
|
||||
UTF-8-encoded strings.
|
||||
</p>
|
||||
|
||||
<a name="serialization"/>
|
||||
@ -63,6 +86,12 @@
|
||||
<tt>include-default-values</tt> -- include attribute nodes with nil
|
||||
<tt>dom:specified</tt>.
|
||||
</li>
|
||||
<li>
|
||||
<tt>recode</tt> -- (ignored on Lisps with Unicode support.) If
|
||||
true, recode UTF-8 strings to rods. Defaults to true if used
|
||||
with a UTF-8 DOM document. It can be set to false manually to
|
||||
suppress recoding in this case.
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
|
||||
@ -33,7 +33,7 @@
|
||||
<li><a href="using.html#parser">Parsing and Validating</a></li>
|
||||
<li><a href="using.html#serialization">Serialization</a></li>
|
||||
<li><a href="using.html#misc">Miscellaneous SAX handlers</a></li>
|
||||
<li><a href="using.html#rods">Dealing with Rods</a></li>
|
||||
<li><a href="using.html#rods">Recoders</a></li>
|
||||
<li><a href="using.html#dtdcache">Caching of DTD Objects</a></li>
|
||||
<li><a href="using.html#catalogs">XML Catalogs</a></li>
|
||||
<li><a href="using.html#sax">SAX Interface</a></li>
|
||||
@ -67,7 +67,7 @@
|
||||
|
||||
<p>Parse <tt>example.xml</tt> into a DOM tree (<a href="using.html#parser">read
|
||||
more</a>):</p>
|
||||
<pre>* <b>(cxml:parse-file "example.xml" (dom:make-dom-builder))</b>
|
||||
<pre>* <b>(cxml:parse-file "example.xml" (cxml-dom:make-dom-builder))</b>
|
||||
#<DOM-IMPL::DOCUMENT @ #x72206172>
|
||||
;; save result for later:
|
||||
* <b>(defparameter *example* *)</b>
|
||||
|
||||
@ -69,6 +69,13 @@
|
||||
<tt>disallow-internal-subset</tt> -- a boolean. If true, signal
|
||||
an error if the document contains an internal subset.
|
||||
</li>
|
||||
<li>
|
||||
<tt>recode</tt> -- a boolean. (Ignored on Lisps with Unicode
|
||||
support.) Recode rods to UTF-8 strings. Defaults to true.
|
||||
Make sure to use <tt>utf8-dom:make-dom-builder</tt> if this
|
||||
option is enabled and <tt>rune-dom:make-dom-builder</tt>
|
||||
otherwise.
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
@ -258,7 +265,7 @@
|
||||
ignored.<br/>
|
||||
Example:
|
||||
</p>
|
||||
<pre>(let ((d (parse-file "~/test.xml" (dom:make-dom-builder)))
|
||||
<pre>(let ((d (parse-file "~/test.xml" (cxml-dom:make-dom-builder)))
|
||||
(x (parse-dtd-file "~/test.dtd")))
|
||||
(dom:map-document (cxml:make-validator x #"foo") d))</pre>
|
||||
|
||||
@ -287,40 +294,15 @@
|
||||
</p>
|
||||
|
||||
<a name="rods"/>
|
||||
<h3>Dealing with Rods</h3>
|
||||
<h3>Recoders</h3>
|
||||
<p>
|
||||
As explained above, the XML parser handles character encoding and
|
||||
uses 16bit strings internally. Instead of using characters and strings
|
||||
it uses <em>runes</em> and <em>rods</em>. This is seen as a
|
||||
feature, but can be inconvenient.
|
||||
Recoders are a mechanism used by CXML internally on Lisp implementations
|
||||
without Unicode support to recode UTF-16 vectors (rods) of
|
||||
integers (runes) into UTF-8 strings.
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
If your Lisp supports 16 bit unicode strings, use feature
|
||||
<tt>:rune-is-character</tt> and forget about runes and rods.
|
||||
CXML will use ordinary Lisp characters and strings both
|
||||
internally and externally.
|
||||
</li>
|
||||
<li>
|
||||
If your Lisp does not support such strings and your application
|
||||
needs Unicode support, use functions defined in the
|
||||
<tt>runes</tt> package instead of ordinary string operators.
|
||||
</li>
|
||||
<li>
|
||||
If your Lisp does not support such strings and your application
|
||||
does not need Unicode support anyway, it will probably be more
|
||||
convenient to let CXML convert rods into strings automatically.
|
||||
To do that, use <tt>cxml:make-recoder</tt> to chain a special
|
||||
sax handler between the parser and your application handler.
|
||||
The recoder translates all rods using an application defined
|
||||
function, which defaults to <tt>runes:rod-string</tt>. Although
|
||||
the actual XML parser still uses rods internally, you SAX
|
||||
handler will only see ordinary Lisp strings.
|
||||
</li>
|
||||
</ul>
|
||||
<p>
|
||||
Note that the recoder approach does <em>not</em> work with the DOM
|
||||
builder, since DOM is specified to use UTF-16.
|
||||
User code does not usually need to deal with recoders in current
|
||||
versions of CXML.
|
||||
</p>
|
||||
<p>
|
||||
<div class="def">Function CXML:MAKE-RECODER (chained-handler recoder-fn)</div>
|
||||
@ -328,16 +310,6 @@
|
||||
<tt>chained-handler</tt> after converting all strings and rods
|
||||
using <tt>recoder-fn</tt>, a function of one argument.
|
||||
</p>
|
||||
<p>
|
||||
<b>Example.</b> In a Lisp which ordinarily would use octet vector rods:
|
||||
</p>
|
||||
<pre>CL-USER(14): (cxml:parse-string "<test/>" (cxml-xmls:make-xmls-builder))
|
||||
(#(116 101 115 116) NIL)</pre>
|
||||
<p>
|
||||
Use a SAX recoder to get strings instead::
|
||||
</p>
|
||||
<pre>CL-USER(17): (parse-string "<test/>" (cxml:make-recoder (cxml-xmls:make-xmls-builder) 'runes:rod-string))
|
||||
("test" NIL)</pre>
|
||||
|
||||
<a name="dtdcache"/>
|
||||
<h3>Caching of DTD Objects</h3>
|
||||
|
||||
@ -83,7 +83,7 @@
|
||||
|
||||
(defmethod sax:start-element
|
||||
((handler dom-builder) namespace-uri local-name qname attributes)
|
||||
(check-type qname rod)
|
||||
(check-type qname rod) ;catch recoder/builder mismatch
|
||||
(with-slots (document element-stack) handler
|
||||
(let* ((nsp sax:*namespace-processing*)
|
||||
(element (make-instance 'element
|
||||
|
||||
@ -1228,7 +1228,9 @@
|
||||
(when resolver
|
||||
(setf (document handler) owner)
|
||||
(push instance (element-stack handler))
|
||||
(funcall resolver (dom:name instance) handler)))
|
||||
#+cxml-system::utf8dom-file
|
||||
(setf handler (cxml:make-recoder handler #'cxml:rod-to-utf8-string))
|
||||
(funcall resolver (real-rod (dom:name instance)) handler)))
|
||||
(labels ((walk (n)
|
||||
(setf (slot-value n 'read-only-p) t)
|
||||
(when (dom:element-p n)
|
||||
|
||||
@ -12,7 +12,11 @@
|
||||
(handler document
|
||||
&key (include-xmlns-attributes sax:*include-xmlns-attributes*)
|
||||
include-doctype
|
||||
include-default-values)
|
||||
include-default-values
|
||||
(recode (typep document 'utf8-dom::node)))
|
||||
#+rune-is-integer
|
||||
(when recode
|
||||
(setf handler (make-recoder handler #'utf8-string-to-rod)))
|
||||
(sax:start-document handler)
|
||||
(when include-doctype
|
||||
(let ((doctype (dom:doctype document)))
|
||||
|
||||
@ -175,7 +175,8 @@
|
||||
(defun read-members (&optional (directory *directory*))
|
||||
(let* ((pathname (merge-pathnames "build/dom2-interfaces.xml" directory))
|
||||
(builder (rune-dom:make-dom-builder))
|
||||
(library (dom:document-element (cxml:parse-file pathname builder)))
|
||||
(library (dom:document-element
|
||||
(cxml:parse-file pathname builder :recode nil)))
|
||||
(methods '())
|
||||
(fields '()))
|
||||
(do-child-elements (interface library :name "interface")
|
||||
@ -584,7 +585,8 @@
|
||||
(catch 'give-up
|
||||
(let* ((builder (rune-dom:make-dom-builder))
|
||||
(cxml::*validate* nil) ;dom1.dtd is buggy
|
||||
(test (dom:document-element (cxml:parse-file pathname builder)))
|
||||
(test (dom:document-element
|
||||
(cxml:parse-file pathname builder :recode nil)))
|
||||
title
|
||||
(bindings '())
|
||||
(code '()))
|
||||
@ -631,7 +633,8 @@
|
||||
(setf name (runes:rod-string name))
|
||||
(cxml:parse-file
|
||||
(make-pathname :name name :type "xml" :defaults *files-directory*)
|
||||
(rune-dom:make-dom-builder)))
|
||||
(rune-dom:make-dom-builder)
|
||||
:recode nil))
|
||||
|
||||
(defparameter *bad-tests*
|
||||
'("hc_elementnormalize2.xml"
|
||||
@ -656,7 +659,7 @@
|
||||
(let* ((all-tests (merge-pathnames "alltests.xml" test-directory))
|
||||
(builder (rune-dom:make-dom-builder))
|
||||
(suite (dom:document-element
|
||||
(cxml:parse-file all-tests builder)))
|
||||
(cxml:parse-file all-tests builder :recode nil)))
|
||||
(*files-directory*
|
||||
(merge-pathnames "files/" test-directory)))
|
||||
(do-child-elements (member suite)
|
||||
|
||||
102
test/utf8domtest.diff
Normal file
102
test/utf8domtest.diff
Normal file
@ -0,0 +1,102 @@
|
||||
Index: test/domtest.lisp
|
||||
===================================================================
|
||||
RCS file: /project/cxml/cvsroot/cxml/test/domtest.lisp,v
|
||||
retrieving revision 1.13
|
||||
diff -u -r1.13 domtest.lisp
|
||||
--- test/domtest.lisp 27 Dec 2005 00:21:37 -0000 1.13
|
||||
+++ test/domtest.lisp 27 Dec 2005 00:46:00 -0000
|
||||
@@ -137,21 +137,22 @@
|
||||
((digit-char-p (runes:rune-char (elt str 0)))
|
||||
(parse-integer (runes:rod-string str)))
|
||||
((runes:rune= (elt str 0) #.(runes:char-rune #\"))
|
||||
- (let ((v (make-array 1 :fill-pointer 0 :adjustable t)))
|
||||
- (for* ((i = 1 :then (1+ i))
|
||||
- (c = (elt str i))
|
||||
- :until (runes:rune= c #.(runes:char-rune #\")))
|
||||
- (if (runes:rune= c #.(runes:char-rune #\\))
|
||||
- (let ((frob
|
||||
- (progn
|
||||
- (incf i)
|
||||
- (elt str i))))
|
||||
- (ecase frob
|
||||
- ;; ...
|
||||
- (#/n (vector-push-extend #/newline v (length v)))
|
||||
- ((#/\\ #/\") (vector-push-extend #/\\ v (length v)))))
|
||||
- (vector-push-extend c v (length v))))
|
||||
- (coerce v 'runes::simple-rod)))
|
||||
+ (utf8-dom::%rod
|
||||
+ (let ((v (make-array 1 :fill-pointer 0 :adjustable t)))
|
||||
+ (for* ((i = 1 :then (1+ i))
|
||||
+ (c = (elt str i))
|
||||
+ :until (runes:rune= c #.(runes:char-rune #\")))
|
||||
+ (if (runes:rune= c #.(runes:char-rune #\\))
|
||||
+ (let ((frob
|
||||
+ (progn
|
||||
+ (incf i)
|
||||
+ (elt str i))))
|
||||
+ (ecase frob
|
||||
+ ;; ...
|
||||
+ (#/n (vector-push-extend #/newline v (length v)))
|
||||
+ ((#/\\ #/\") (vector-push-extend #/\\ v (length v)))))
|
||||
+ (vector-push-extend c v (length v))))
|
||||
+ (coerce v 'runes::simple-rod))))
|
||||
(t
|
||||
(%intern str))))
|
||||
|
||||
@@ -368,7 +369,7 @@
|
||||
|
||||
(defun translate-implementation (elt)
|
||||
(with-attributes (|var|) elt
|
||||
- (maybe-setf (%intern |var|) `'rune-dom:implementation)))
|
||||
+ (maybe-setf (%intern |var|) `'utf8-dom:implementation)))
|
||||
|
||||
(defun translate-length (load)
|
||||
;; XXX Soweit ich sehe unterscheiden die Tests nicht zwischen
|
||||
@@ -406,7 +407,7 @@
|
||||
(if (nullify |obj|)
|
||||
(translate-member element)
|
||||
(maybe-setf (%intern |var|)
|
||||
- `(dom:has-feature 'rune-dom:implementation
|
||||
+ `(dom:has-feature 'utf8-dom:implementation
|
||||
,(parse-java-literal |feature|)
|
||||
,(parse-java-literal |version|))))))
|
||||
|
||||
@@ -493,9 +494,9 @@
|
||||
(return
|
||||
`(block assert-domexception
|
||||
(handler-bind
|
||||
- ((rune-dom::dom-exception
|
||||
+ ((utf8-dom::dom-exception
|
||||
(lambda (c)
|
||||
- (when (eq (rune-dom::dom-exception-key c)
|
||||
+ (when (eq (utf8-dom::dom-exception-key c)
|
||||
,(intern (tag-name c) :keyword))
|
||||
(return-from assert-domexception)))))
|
||||
,@(translate-body c)
|
||||
@@ -506,7 +507,7 @@
|
||||
,@(map-child-elements
|
||||
'list
|
||||
(lambda (exception)
|
||||
- `(when (eq (rune-dom::dom-exception-key c)
|
||||
+ `(when (eq (utf8-dom::dom-exception-key c)
|
||||
,(intern (runes:rod-string (dom:get-attribute exception "code"))
|
||||
:keyword))
|
||||
,@(translate-body exception)
|
||||
@@ -516,7 +517,7 @@
|
||||
(defun translate-try (element)
|
||||
`(block try
|
||||
(handler-bind
|
||||
- ((rune-dom::dom-exception
|
||||
+ ((utf8-dom::dom-exception
|
||||
,(translate-catch
|
||||
(do-child-elements (c element :name "catch") (return c))
|
||||
'(return-from try))))
|
||||
@@ -631,7 +632,7 @@
|
||||
(setf name (runes:rod-string name))
|
||||
(cxml:parse-file
|
||||
(make-pathname :name name :type "xml" :defaults *files-directory*)
|
||||
- (rune-dom:make-dom-builder)))
|
||||
+ (cxml:make-recoder (utf8-dom:make-dom-builder) 'cxml:rod-to-utf8-string)))
|
||||
|
||||
(defparameter *bad-tests*
|
||||
'("hc_elementnormalize2.xml"
|
||||
@ -77,7 +77,7 @@
|
||||
(defun run-all-tests (directory)
|
||||
(let* ((pathname (merge-pathnames "xmlconf.xml" directory))
|
||||
(builder (rune-dom:make-dom-builder))
|
||||
(xmlconf (cxml:parse-file pathname builder))
|
||||
(xmlconf (cxml:parse-file pathname builder :recode nil))
|
||||
(ntried 0)
|
||||
(nfailed 0)
|
||||
(nskipped 0)
|
||||
@ -125,6 +125,7 @@
|
||||
(let ((document (apply #'cxml:parse-file
|
||||
pathname
|
||||
(rune-dom:make-dom-builder)
|
||||
:recode nil
|
||||
args)))
|
||||
(cond
|
||||
((null output)
|
||||
@ -161,7 +162,10 @@
|
||||
(handler-case
|
||||
(progn
|
||||
(format t " [validating:]")
|
||||
(cxml:parse-file pathname (rune-dom:make-dom-builder) :validate t)
|
||||
(cxml:parse-file pathname
|
||||
(rune-dom:make-dom-builder)
|
||||
:recode nil
|
||||
:validate t)
|
||||
(error "validity error not detected")
|
||||
nil)
|
||||
(cxml:validity-error ()
|
||||
@ -174,7 +178,10 @@
|
||||
(handler-case
|
||||
(progn
|
||||
(format t " [not validating:]")
|
||||
(cxml:parse-file pathname (rune-dom:make-dom-builder) :validate nil)
|
||||
(cxml:parse-file pathname
|
||||
(rune-dom:make-dom-builder)
|
||||
:recode nil
|
||||
:validate nil)
|
||||
(error "well-formedness violation not detected")
|
||||
nil)
|
||||
(cxml:well-formedness-violation ()
|
||||
@ -183,7 +190,10 @@
|
||||
(handler-case
|
||||
(progn
|
||||
(format t " [validating:]")
|
||||
(cxml:parse-file pathname (rune-dom:make-dom-builder) :validate t)
|
||||
(cxml:parse-file pathname
|
||||
(rune-dom:make-dom-builder)
|
||||
:recode nil
|
||||
:validate t)
|
||||
(error "well-formedness violation not detected")
|
||||
nil)
|
||||
(cxml:well-formedness-violation ()
|
||||
|
||||
@ -75,4 +75,6 @@
|
||||
#:make-recoder
|
||||
#:sax-proxy
|
||||
#:proxy-chained-handler
|
||||
#:make-namespace-normalizer))
|
||||
#:make-namespace-normalizer
|
||||
#:rod-to-utf8-string
|
||||
#:utf8-string-to-rod))
|
||||
|
||||
@ -74,6 +74,9 @@
|
||||
(%string public-id)
|
||||
(%string system-id))
|
||||
|
||||
(defwrapper sax:start-internal-subset ())
|
||||
(defwrapper sax:end-internal-subset ())
|
||||
|
||||
(defwrapper sax:end-dtd ())
|
||||
|
||||
(defwrapper sax:unparsed-entity-declaration
|
||||
|
||||
@ -2552,13 +2552,18 @@
|
||||
|
||||
(defun p/document
|
||||
(input handler
|
||||
&key validate dtd root entity-resolver disallow-internal-subset)
|
||||
&key validate dtd root entity-resolver disallow-internal-subset
|
||||
(recode t))
|
||||
;; check types of user-supplied arguments for better error messages:
|
||||
(check-type validate boolean)
|
||||
(check-type recode boolean)
|
||||
(check-type dtd (or null extid))
|
||||
(check-type root (or null rod))
|
||||
(check-type entity-resolver (or null function symbol))
|
||||
(check-type disallow-internal-subset boolean)
|
||||
#+rune-is-integer
|
||||
(when recode
|
||||
(setf handler (make-recoder handler #'rod-to-utf8-string)))
|
||||
(let ((*ctx*
|
||||
(make-context :handler handler
|
||||
:main-zstream input
|
||||
|
||||
Reference in New Issue
Block a user