namespace normalization

This commit is contained in:
dlichteblau
2005-12-11 23:56:45 +00:00
parent 2b5b61baf1
commit 5363dadbb8
8 changed files with 205 additions and 39 deletions

View File

@ -90,7 +90,12 @@
<h2>Recent Changes</h2> <h2>Recent Changes</h2>
<p class="nomargin"><tt>rel-2005-xx-yy</tt></p> <p class="nomargin"><tt>rel-2005-xx-yy</tt></p>
<ul class="nomargin"> <ul class="nomargin">
<li>Implemented DOM 2 Core.</li> <li>
Implemented DOM 2 Core.
(A handler for DOM 3-style namespace normalization is provided and
used by default for serialization of DOM documents if namespace
support is enabled.)
</li>
<li> <li>
Error handling overhaul: All syntax errors should now be Error handling overhaul: All syntax errors should now be
reported as instances of <tt>well-formedness-violation</tt>. We reported as instances of <tt>well-formedness-violation</tt>. We

View File

@ -68,6 +68,7 @@
(:file "unparse" :depends-on ("xml-parse")) (:file "unparse" :depends-on ("xml-parse"))
(:file "xmls-compat" :depends-on ("xml-parse")) (:file "xmls-compat" :depends-on ("xml-parse"))
(:file "recoder" :depends-on ("xml-parse")) (:file "recoder" :depends-on ("xml-parse"))
(:file "xmlns-normalizer" :depends-on ("xml-parse"))
(:file "catalog" :depends-on ("xml-parse")) (:file "catalog" :depends-on ("xml-parse"))
(:file "sax-proxy" :depends-on ("xml-parse"))) (:file "sax-proxy" :depends-on ("xml-parse")))
:depends-on (:cxml-runes :puri :trivial-gray-streams)) :depends-on (:cxml-runes :puri :trivial-gray-streams))

View File

@ -155,12 +155,6 @@
(an URI object). (an URI object).
</p> </p>
<p>
<div class="def">Function DOM:MAKE-DOM-BUILDER ()</div>
Create a SAX handler which builds a DOM document.&nbsp; Example:
</p>
<pre>(cxml:parse-file "test.xml" (dom:make-dom-builder))</pre>
<p> <p>
<div class="def">Condition class CXML:XML-PARSE-ERROR ()</div> <div class="def">Condition class CXML:XML-PARSE-ERROR ()</div>
Superclass of all conditions signalled by the CXML parser. Superclass of all conditions signalled by the CXML parser.
@ -183,7 +177,8 @@
<p> <p>
<div class="def">Function CXML:UNPARSE-DOCUMENT (document stream &rest keys)</div> <div class="def">Function CXML:UNPARSE-DOCUMENT (document stream &rest keys)</div>
<div class="def">Function CXML:UNPARSE-DOCUMENT-TO-OCTETS (document &rest keys) => vector</div> <div class="def">Function CXML:UNPARSE-DOCUMENT-TO-OCTETS (document &rest keys) => vector</div>
Serialize a DOM document object. Serialize a DOM document object. These convenience functions are
wrappers around <tt>dom:map-document</tt>.
</p> </p>
<ul> <ul>
<li><tt>document</tt> -- a DOM document object</li> <li><tt>document</tt> -- a DOM document object</li>
@ -223,6 +218,10 @@
changes the document model and should only be used if whitespace changes the document model and should only be used if whitespace
does not matter to the application. does not matter to the application.
</p> </p>
<p>
If namespace support is enabled (the default), these functions use
a namespace normalizer (<tt>cxml:make-namespace-normalizer</tt>).
</p>
<p> <p>
<tt>unparse-document-to-octets</tt> returns an <tt>(unsigned-byte <tt>unparse-document-to-octets</tt> returns an <tt>(unsigned-byte
8)</tt> array, whereas <tt>unparse-document</tt> writes 8)</tt> array, whereas <tt>unparse-document</tt> writes
@ -247,7 +246,7 @@
</p> </p>
<p> <p>
<div class="def">Macro CXML:WITH-XML-OUTPUT (sink &body body) => vector</div> <div class="def">Macro CXML:WITH-XML-OUTPUT (sink &body body) => sink-specific result</div>
<div class="def">Macro CXML:WITH-ELEMENT (qname &body body) => result</div> <div class="def">Macro CXML:WITH-ELEMENT (qname &body body) => result</div>
<div class="def">Function CXML:ATTRIBUTE (name value) => value</div> <div class="def">Function CXML:ATTRIBUTE (name value) => value</div>
<div class="def">Function CXML:TEXT (data) => data</div> <div class="def">Function CXML:TEXT (data) => data</div>
@ -319,12 +318,6 @@
(x (parse-dtd-file "~/test.dtd"))) (x (parse-dtd-file "~/test.dtd")))
(dom:map-document (cxml:make-validator x #"foo") d))</pre> (dom:map-document (cxml:make-validator x #"foo") d))</pre>
<p>
<div class="def">Function DOM:MAP-DOCUMENT (handler document &key include-xmlns-attributes include-default-values)</div>
Traverse a DOM document and call SAX functions as if an XML
representation of the document were processed by a SAX parser.
</p>
<p> <p>
<div class="def">Class CXML:SAX-PROXY ()</div> <div class="def">Class CXML:SAX-PROXY ()</div>
<div class="def">Accessor CXML:PROXY-CHAINED-HANDLER</div> <div class="def">Accessor CXML:PROXY-CHAINED-HANDLER</div>
@ -338,6 +331,17 @@
handler unmodified. handler unmodified.
</p> </p>
<p>
<div class="def">Accessor CXML:MAKE-NAMESPACE-NORMALIZER (next-handler)</div>
</p>
<p>
Return a SAX handler that performs <a
href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/namespaces-algorithms.html#normalizeDocumentAlgo">DOM
3-style namespace normalization</a> on Attribute lists in
<tt>start-element</tt> events before passing them on the next
handler.
</p>
<a name="xmls"/> <a name="xmls"/>
<h3>XMLS Compatibility</h3> <h3>XMLS Compatibility</h3>
<p> <p>
@ -649,19 +653,45 @@ NIL</pre>
<a name="dom"/> <a name="dom"/>
<h2>DOM Notes</h2> <h2>DOM Notes</h2>
<p> <p>
CXML implements the DOM Level 2 Core interfaces.&nbsp; Explaining CXML implements the DOM Level 2 Core interfaces.&nbsp; For details
DOM is better left to the <a on DOM, please refer to the <a
href="http://www.w3.org/TR/DOM-Level-2-Core/core.html">specification</a>, href="http://www.w3.org/TR/DOM-Level-2-Core/core.html">specification</a>.
so please refer to the official W3C documents for DOM.
</p> </p>
<p> <p>
However, there is no "standard" DOM mapping for Lisp.&nbsp; DOM However, note that there is no "standard" DOM mapping for Lisp.&nbsp; DOM
is <a is <a
href="http://www.w3.org/TR/DOM-Level-2-Core/idl-definitions.html">specified href="http://www.w3.org/TR/DOM-Level-2-Core/idl-definitions.html">specified
in CORBA IDL</a>, but it refrains from using object-oriented IDL in CORBA IDL</a>, but it refrains from using object-oriented IDL
features, allowing for a much more natural Lisp implemenation than features, allowing for a much more natural Lisp implemenation than
the the ordinary IDL/Lisp mapping would. the the ordinary IDL/Lisp mapping would. The mapping chosen for
cxml is explained below.
</p> </p>
<h3>Example</h3>
<pre>XML(97): (dom:node-type
(dom:document-element
(cxml:parse-file "~/test.xml" (dom:make-dom-builder))))
:ELEMENT</pre>
<h3>CXML-specific functions</h3>
<p>
<div class="def">Function DOM:MAKE-DOM-BUILDER ()</div>
Create a SAX handler which builds a DOM document.&nbsp; Example:
</p>
<pre>(cxml:parse-file "test.xml" (dom:make-dom-builder))</pre>
<p>
<div class="def">Function DOM:MAP-DOCUMENT (handler document &key include-xmlns-attributes include-default-values)</div>
Traverse a DOM document and call SAX functions as if an XML
representation of the document were processed by a SAX parser.
</p>
<p>
<tt>dom:map-document</tt> is the low-level building-block used to
implement the <a href="#serialization">serialization functions</a>
like <tt>unparse-document</tt>, but can also be used directly.
</p>
<h3>DOM/Lisp mapping</h3>
<p> <p>
Differences between CXML's DOM and the direct IDL/Lisp mapping: Differences between CXML's DOM and the direct IDL/Lisp mapping:
</p> </p>
@ -715,10 +745,5 @@ NIL</pre>
<tt>dom:do-node-list</tt>, which can be implemented portably. <tt>dom:do-node-list</tt>, which can be implemented portably.
</li> </li>
</ul> </ul>
<p>Example:</p>
<pre>XML(97): (dom:node-type
(dom:document-element
(cxml:parse-file "~/test.xml" (dom:make-dom-builder))))
:ELEMENT</pre>
</body> </body>
</html> </html>

View File

@ -50,18 +50,6 @@
(dom:namespace-uri node))) (dom:namespace-uri node)))
(setf (slot-value node 'prefix) newval)) (setf (slot-value node 'prefix) newval))
(defmethod (setf dom:prefix) :before (newval (node attribute))
(when (rod= (dom:node-name node) #"xmlns")
(dom-error :NAMESPACE_ERR "must not change xmlns attribute prefix")))
(defmethod (setf dom:prefix) :after (newval (node attribute))
(setf (slot-value node 'name)
(concatenate 'rod newval #":" (dom:local-name node))))
(defmethod (setf dom:prefix) :after (newval (node element))
(setf (slot-value node 'tag-name)
(concatenate 'rod newval #":" (dom:local-name node))))
(defclass document (node) (defclass document (node)
((doc-type :initarg :doc-type :reader dom:doctype) ((doc-type :initarg :doc-type :reader dom:doctype)
(dtd :initform nil :reader dtd) (dtd :initform nil :reader dtd)
@ -78,6 +66,14 @@
(owner-element :initarg :owner-element :reader dom:owner-element) (owner-element :initarg :owner-element :reader dom:owner-element)
(specified-p :initarg :specified-p :reader dom:specified))) (specified-p :initarg :specified-p :reader dom:specified)))
(defmethod (setf dom:prefix) :before (newval (node attribute))
(when (rod= (dom:node-name node) #"xmlns")
(dom-error :NAMESPACE_ERR "must not change xmlns attribute prefix")))
(defmethod (setf dom:prefix) :after (newval (node attribute))
(setf (slot-value node 'name)
(concatenate 'rod newval #":" (dom:local-name node))))
(defmethod print-object ((object attribute) stream) (defmethod print-object ((object attribute) stream)
(print-unreadable-object (object stream :type t :identity t) (print-unreadable-object (object stream :type t :identity t)
(format stream "~A=~S" (format stream "~A=~S"
@ -88,6 +84,10 @@
((tag-name :initarg :tag-name :reader dom:tag-name) ((tag-name :initarg :tag-name :reader dom:tag-name)
(attributes :initarg :attributes :reader dom:attributes))) (attributes :initarg :attributes :reader dom:attributes)))
(defmethod (setf dom:prefix) :after (newval (node element))
(setf (slot-value node 'tag-name)
(concatenate 'rod newval #":" (dom:local-name node))))
(defmethod print-object ((object element) stream) (defmethod print-object ((object element) stream)
(print-unreadable-object (object stream :type t :identity t) (print-unreadable-object (object stream :type t :identity t)
(princ (rod-string (dom:tag-name object)) stream))) (princ (rod-string (dom:tag-name object)) stream)))

View File

@ -1,6 +1,8 @@
(in-package :cxml) (in-package :cxml)
(defun %unparse-document (sink doc canonical) (defun %unparse-document (sink doc canonical)
(when sax:*namespace-processing*
(setf sink (cxml:make-namespace-normalizer sink)))
(dom:map-document sink (dom:map-document sink
doc doc
:include-doctype (if (and canonical (>= canonical 2)) :include-doctype (if (and canonical (>= canonical 2))

View File

@ -74,4 +74,5 @@
#:make-recoder #:make-recoder
#:sax-proxy #:sax-proxy
#:proxy-chained-handler)) #:proxy-chained-handler
#:make-namespace-normalizer))

View File

@ -29,6 +29,8 @@
(define-proxy-method sax:end-cdata ()) (define-proxy-method sax:end-cdata ())
(define-proxy-method sax:start-dtd (name public-id system-id)) (define-proxy-method sax:start-dtd (name public-id system-id))
(define-proxy-method sax:end-dtd ()) (define-proxy-method sax:end-dtd ())
(define-proxy-method sax:start-internal-subset ())
(define-proxy-method sax:end-internal-subset ())
(define-proxy-method sax:unparsed-entity-declaration (name pub sys not)) (define-proxy-method sax:unparsed-entity-declaration (name pub sys not))
(define-proxy-method sax:external-entity-declaration (kind name pub sys)) (define-proxy-method sax:external-entity-declaration (kind name pub sys))
(define-proxy-method sax:internal-entity-declaration (kind name value)) (define-proxy-method sax:internal-entity-declaration (kind name value))

130
xml/xmlns-normalizer.lisp Normal file
View File

@ -0,0 +1,130 @@
;;;; xmlns-normalizer.lisp -- DOM 3-style namespace normalization
;;;;
;;;; This file is part of the CXML parser, released under Lisp-LGPL.
;;;; See file COPYING for details.
;;;;
;;;; Copyright (c) 2005 David Lichteblau
;;;; Hier eine Variante des reichlich furchtbaren Algorithmus zur
;;;; Namespace-Normalisierung aus DOM 3 Core.[1]
;;;;
;;;; Gebraucht wir die Sache, weil Element- und Attributknoten in DOM
;;;; zwar ein Prefix-Attribut speichern, massgeblich fuer ihren Namespace
;;;; aber nur die URI sein soll. Und eine Anpassung der zugehoerigen
;;;; xmlns-Attribute findet bei Veraenderungen im DOM-Baum nicht statt,
;;;; bzw. wird dem Nutzer ueberlassen.
;;;;
;;;; Daher muss letztlich spaetestens beim Serialisieren eine
;;;; Namespace-Deklaration fuer die angegebene URI nachgetragen und das
;;;; Praefix ggf. umbenannt werden, damit am Ende doch etwas
;;;; Namespace-konformes heraus kommt.
;;;;
;;;; Und das nennen sie dann Namespace-Support.
;;;;
;;;; [1] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/namespaces-algorithms.html#normalizeDocumentAlgo
(in-package :cxml)
(defclass namespace-normalizer (sax-proxy)
((xmlns-stack :initarg :xmlns-stack :accessor xmlns-stack)))
(defvar *xmlns-namespace* #"http://www.w3.org/2000/xmlns/")
(defun make-namespace-normalizer (chained-handler)
(make-instance 'namespace-normalizer
:xmlns-stack (list (mapcar (lambda (cons)
(make-xmlns-attribute (car cons) (cdr cons)))
*namespace-bindings*))
:chained-handler chained-handler))
(defun normalizer-find-prefix (handler prefix)
(block t
(dolist (bindings (xmlns-stack handler))
(dolist (attribute bindings)
(when (rod= (sax:attribute-local-name attribute) prefix)
(return-from t attribute))))))
(defun normalizer-find-uri (handler uri)
(block t
(dolist (bindings (xmlns-stack handler))
(dolist (attribute bindings)
(when (and (rod= (sax:attribute-value attribute) uri)
;; default-namespace interessiert uns nicht
(not (rod= (sax:attribute-qname attribute) #"xmlns")))
(return-from t attribute))))))
(defun make-xmlns-attribute (prefix uri)
(if prefix
(sax:make-attribute
:qname (concatenate 'rod #"xmlns:" prefix)
:namespace-uri *xmlns-namespace*
:local-name prefix
:value uri)
(sax:make-attribute
:qname #"xmlns"
:namespace-uri *xmlns-namespace*
:local-name #"xmlns"
:value uri)))
(defun rename-attribute (a new-prefix)
(setf (sax:attribute-qname a)
(concatenate 'rod new-prefix #":" (sax:attribute-local-name a))))
(defmethod sax:start-element
((handler namespace-normalizer) uri lname qname attrs)
(declare (ignore qname))
(when (null uri)
(setf uri #""))
(let ((normal-attrs '()))
(push nil (xmlns-stack handler))
(dolist (a attrs)
(if (rod= *xmlns-namespace* (sax:attribute-namespace-uri a))
(push a (car (xmlns-stack handler)))
(push a normal-attrs)))
(flet ((push-namespace (prefix uri)
(let ((new (make-xmlns-attribute prefix uri)))
(push new (car (xmlns-stack handler)))
(push new attrs))))
(multiple-value-bind (prefix local-name) (split-qname qname)
(setf lname local-name)
(let ((binding (normalizer-find-prefix handler prefix)))
(cond
((null binding)
(push-namespace prefix uri))
((rod= (sax:attribute-value binding) uri))
((member binding (car (xmlns-stack handler)))
(setf (sax:attribute-value binding) uri))
(t
(push-namespace prefix uri)))))
(dolist (a normal-attrs)
(let ((u (sax:attribute-namespace-uri a)))
(when u
(let* ((prefix (split-qname (sax:attribute-qname a)))
(prefix-binding
(when prefix
(normalizer-find-prefix handler prefix))))
(when (or (null prefix-binding)
(not (rod= (sax:attribute-value prefix-binding) u)))
(let ((uri-binding (normalizer-find-uri handler u)))
(cond
(uri-binding
(rename-attribute
a
(sax:attribute-local-name uri-binding)))
((null prefix-binding)
(push-namespace prefix u))
(t
(loop
for i from 1
for prefix = (rod (format nil "NS~D" i))
unless (normalizer-find-prefix handler prefix)
do
(push-namespace prefix u)
(rename-attribute a prefix)
(return))))))))))))
(sax:start-element (proxy-chained-handler handler) uri lname qname attrs))
(defmethod sax:end-element ((handler namespace-normalizer) uri lname qname)
(declare (ignore qname))
(pop (xmlns-stack handler))
(sax:end-element (proxy-chained-handler handler) (or uri #"") lname qname))