namespace normalization

This commit is contained in:
dlichteblau
2005-12-11 23:56:45 +00:00
parent 2b5b61baf1
commit 5363dadbb8
8 changed files with 205 additions and 39 deletions

View File

@ -90,7 +90,12 @@
<h2>Recent Changes</h2>
<p class="nomargin"><tt>rel-2005-xx-yy</tt></p>
<ul class="nomargin">
<li>Implemented DOM 2 Core.</li>
<li>
Implemented DOM 2 Core.
(A handler for DOM 3-style namespace normalization is provided and
used by default for serialization of DOM documents if namespace
support is enabled.)
</li>
<li>
Error handling overhaul: All syntax errors should now be
reported as instances of <tt>well-formedness-violation</tt>. We

View File

@ -68,6 +68,7 @@
(:file "unparse" :depends-on ("xml-parse"))
(:file "xmls-compat" :depends-on ("xml-parse"))
(:file "recoder" :depends-on ("xml-parse"))
(:file "xmlns-normalizer" :depends-on ("xml-parse"))
(:file "catalog" :depends-on ("xml-parse"))
(:file "sax-proxy" :depends-on ("xml-parse")))
:depends-on (:cxml-runes :puri :trivial-gray-streams))

View File

@ -155,12 +155,6 @@
(an URI object).
</p>
<p>
<div class="def">Function DOM:MAKE-DOM-BUILDER ()</div>
Create a SAX handler which builds a DOM document.&nbsp; Example:
</p>
<pre>(cxml:parse-file "test.xml" (dom:make-dom-builder))</pre>
<p>
<div class="def">Condition class CXML:XML-PARSE-ERROR ()</div>
Superclass of all conditions signalled by the CXML parser.
@ -183,7 +177,8 @@
<p>
<div class="def">Function CXML:UNPARSE-DOCUMENT (document stream &rest keys)</div>
<div class="def">Function CXML:UNPARSE-DOCUMENT-TO-OCTETS (document &rest keys) => vector</div>
Serialize a DOM document object.
Serialize a DOM document object. These convenience functions are
wrappers around <tt>dom:map-document</tt>.
</p>
<ul>
<li><tt>document</tt> -- a DOM document object</li>
@ -223,6 +218,10 @@
changes the document model and should only be used if whitespace
does not matter to the application.
</p>
<p>
If namespace support is enabled (the default), these functions use
a namespace normalizer (<tt>cxml:make-namespace-normalizer</tt>).
</p>
<p>
<tt>unparse-document-to-octets</tt> returns an <tt>(unsigned-byte
8)</tt> array, whereas <tt>unparse-document</tt> writes
@ -247,7 +246,7 @@
</p>
<p>
<div class="def">Macro CXML:WITH-XML-OUTPUT (sink &body body) => vector</div>
<div class="def">Macro CXML:WITH-XML-OUTPUT (sink &body body) => sink-specific result</div>
<div class="def">Macro CXML:WITH-ELEMENT (qname &body body) => result</div>
<div class="def">Function CXML:ATTRIBUTE (name value) => value</div>
<div class="def">Function CXML:TEXT (data) => data</div>
@ -319,12 +318,6 @@
(x (parse-dtd-file "~/test.dtd")))
(dom:map-document (cxml:make-validator x #"foo") d))</pre>
<p>
<div class="def">Function DOM:MAP-DOCUMENT (handler document &key include-xmlns-attributes include-default-values)</div>
Traverse a DOM document and call SAX functions as if an XML
representation of the document were processed by a SAX parser.
</p>
<p>
<div class="def">Class CXML:SAX-PROXY ()</div>
<div class="def">Accessor CXML:PROXY-CHAINED-HANDLER</div>
@ -338,6 +331,17 @@
handler unmodified.
</p>
<p>
<div class="def">Accessor CXML:MAKE-NAMESPACE-NORMALIZER (next-handler)</div>
</p>
<p>
Return a SAX handler that performs <a
href="http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/namespaces-algorithms.html#normalizeDocumentAlgo">DOM
3-style namespace normalization</a> on Attribute lists in
<tt>start-element</tt> events before passing them on the next
handler.
</p>
<a name="xmls"/>
<h3>XMLS Compatibility</h3>
<p>
@ -649,19 +653,45 @@ NIL</pre>
<a name="dom"/>
<h2>DOM Notes</h2>
<p>
CXML implements the DOM Level 2 Core interfaces.&nbsp; Explaining
DOM is better left to the <a
href="http://www.w3.org/TR/DOM-Level-2-Core/core.html">specification</a>,
so please refer to the official W3C documents for DOM.
CXML implements the DOM Level 2 Core interfaces.&nbsp; For details
on DOM, please refer to the <a
href="http://www.w3.org/TR/DOM-Level-2-Core/core.html">specification</a>.
</p>
<p>
However, there is no "standard" DOM mapping for Lisp.&nbsp; DOM
However, note that there is no "standard" DOM mapping for Lisp.&nbsp; DOM
is <a
href="http://www.w3.org/TR/DOM-Level-2-Core/idl-definitions.html">specified
in CORBA IDL</a>, but it refrains from using object-oriented IDL
features, allowing for a much more natural Lisp implemenation than
the the ordinary IDL/Lisp mapping would.
the the ordinary IDL/Lisp mapping would. The mapping chosen for
cxml is explained below.
</p>
<h3>Example</h3>
<pre>XML(97): (dom:node-type
(dom:document-element
(cxml:parse-file "~/test.xml" (dom:make-dom-builder))))
:ELEMENT</pre>
<h3>CXML-specific functions</h3>
<p>
<div class="def">Function DOM:MAKE-DOM-BUILDER ()</div>
Create a SAX handler which builds a DOM document.&nbsp; Example:
</p>
<pre>(cxml:parse-file "test.xml" (dom:make-dom-builder))</pre>
<p>
<div class="def">Function DOM:MAP-DOCUMENT (handler document &key include-xmlns-attributes include-default-values)</div>
Traverse a DOM document and call SAX functions as if an XML
representation of the document were processed by a SAX parser.
</p>
<p>
<tt>dom:map-document</tt> is the low-level building-block used to
implement the <a href="#serialization">serialization functions</a>
like <tt>unparse-document</tt>, but can also be used directly.
</p>
<h3>DOM/Lisp mapping</h3>
<p>
Differences between CXML's DOM and the direct IDL/Lisp mapping:
</p>
@ -715,10 +745,5 @@ NIL</pre>
<tt>dom:do-node-list</tt>, which can be implemented portably.
</li>
</ul>
<p>Example:</p>
<pre>XML(97): (dom:node-type
(dom:document-element
(cxml:parse-file "~/test.xml" (dom:make-dom-builder))))
:ELEMENT</pre>
</body>
</html>

View File

@ -50,18 +50,6 @@
(dom:namespace-uri node)))
(setf (slot-value node 'prefix) newval))
(defmethod (setf dom:prefix) :before (newval (node attribute))
(when (rod= (dom:node-name node) #"xmlns")
(dom-error :NAMESPACE_ERR "must not change xmlns attribute prefix")))
(defmethod (setf dom:prefix) :after (newval (node attribute))
(setf (slot-value node 'name)
(concatenate 'rod newval #":" (dom:local-name node))))
(defmethod (setf dom:prefix) :after (newval (node element))
(setf (slot-value node 'tag-name)
(concatenate 'rod newval #":" (dom:local-name node))))
(defclass document (node)
((doc-type :initarg :doc-type :reader dom:doctype)
(dtd :initform nil :reader dtd)
@ -78,6 +66,14 @@
(owner-element :initarg :owner-element :reader dom:owner-element)
(specified-p :initarg :specified-p :reader dom:specified)))
(defmethod (setf dom:prefix) :before (newval (node attribute))
(when (rod= (dom:node-name node) #"xmlns")
(dom-error :NAMESPACE_ERR "must not change xmlns attribute prefix")))
(defmethod (setf dom:prefix) :after (newval (node attribute))
(setf (slot-value node 'name)
(concatenate 'rod newval #":" (dom:local-name node))))
(defmethod print-object ((object attribute) stream)
(print-unreadable-object (object stream :type t :identity t)
(format stream "~A=~S"
@ -88,6 +84,10 @@
((tag-name :initarg :tag-name :reader dom:tag-name)
(attributes :initarg :attributes :reader dom:attributes)))
(defmethod (setf dom:prefix) :after (newval (node element))
(setf (slot-value node 'tag-name)
(concatenate 'rod newval #":" (dom:local-name node))))
(defmethod print-object ((object element) stream)
(print-unreadable-object (object stream :type t :identity t)
(princ (rod-string (dom:tag-name object)) stream)))

View File

@ -1,6 +1,8 @@
(in-package :cxml)
(defun %unparse-document (sink doc canonical)
(when sax:*namespace-processing*
(setf sink (cxml:make-namespace-normalizer sink)))
(dom:map-document sink
doc
:include-doctype (if (and canonical (>= canonical 2))

View File

@ -74,4 +74,5 @@
#:make-recoder
#:sax-proxy
#:proxy-chained-handler))
#:proxy-chained-handler
#:make-namespace-normalizer))

View File

@ -29,6 +29,8 @@
(define-proxy-method sax:end-cdata ())
(define-proxy-method sax:start-dtd (name public-id system-id))
(define-proxy-method sax:end-dtd ())
(define-proxy-method sax:start-internal-subset ())
(define-proxy-method sax:end-internal-subset ())
(define-proxy-method sax:unparsed-entity-declaration (name pub sys not))
(define-proxy-method sax:external-entity-declaration (kind name pub sys))
(define-proxy-method sax:internal-entity-declaration (kind name value))

130
xml/xmlns-normalizer.lisp Normal file
View File

@ -0,0 +1,130 @@
;;;; xmlns-normalizer.lisp -- DOM 3-style namespace normalization
;;;;
;;;; This file is part of the CXML parser, released under Lisp-LGPL.
;;;; See file COPYING for details.
;;;;
;;;; Copyright (c) 2005 David Lichteblau
;;;; Hier eine Variante des reichlich furchtbaren Algorithmus zur
;;;; Namespace-Normalisierung aus DOM 3 Core.[1]
;;;;
;;;; Gebraucht wir die Sache, weil Element- und Attributknoten in DOM
;;;; zwar ein Prefix-Attribut speichern, massgeblich fuer ihren Namespace
;;;; aber nur die URI sein soll. Und eine Anpassung der zugehoerigen
;;;; xmlns-Attribute findet bei Veraenderungen im DOM-Baum nicht statt,
;;;; bzw. wird dem Nutzer ueberlassen.
;;;;
;;;; Daher muss letztlich spaetestens beim Serialisieren eine
;;;; Namespace-Deklaration fuer die angegebene URI nachgetragen und das
;;;; Praefix ggf. umbenannt werden, damit am Ende doch etwas
;;;; Namespace-konformes heraus kommt.
;;;;
;;;; Und das nennen sie dann Namespace-Support.
;;;;
;;;; [1] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/namespaces-algorithms.html#normalizeDocumentAlgo
(in-package :cxml)
(defclass namespace-normalizer (sax-proxy)
((xmlns-stack :initarg :xmlns-stack :accessor xmlns-stack)))
(defvar *xmlns-namespace* #"http://www.w3.org/2000/xmlns/")
(defun make-namespace-normalizer (chained-handler)
(make-instance 'namespace-normalizer
:xmlns-stack (list (mapcar (lambda (cons)
(make-xmlns-attribute (car cons) (cdr cons)))
*namespace-bindings*))
:chained-handler chained-handler))
(defun normalizer-find-prefix (handler prefix)
(block t
(dolist (bindings (xmlns-stack handler))
(dolist (attribute bindings)
(when (rod= (sax:attribute-local-name attribute) prefix)
(return-from t attribute))))))
(defun normalizer-find-uri (handler uri)
(block t
(dolist (bindings (xmlns-stack handler))
(dolist (attribute bindings)
(when (and (rod= (sax:attribute-value attribute) uri)
;; default-namespace interessiert uns nicht
(not (rod= (sax:attribute-qname attribute) #"xmlns")))
(return-from t attribute))))))
(defun make-xmlns-attribute (prefix uri)
(if prefix
(sax:make-attribute
:qname (concatenate 'rod #"xmlns:" prefix)
:namespace-uri *xmlns-namespace*
:local-name prefix
:value uri)
(sax:make-attribute
:qname #"xmlns"
:namespace-uri *xmlns-namespace*
:local-name #"xmlns"
:value uri)))
(defun rename-attribute (a new-prefix)
(setf (sax:attribute-qname a)
(concatenate 'rod new-prefix #":" (sax:attribute-local-name a))))
(defmethod sax:start-element
((handler namespace-normalizer) uri lname qname attrs)
(declare (ignore qname))
(when (null uri)
(setf uri #""))
(let ((normal-attrs '()))
(push nil (xmlns-stack handler))
(dolist (a attrs)
(if (rod= *xmlns-namespace* (sax:attribute-namespace-uri a))
(push a (car (xmlns-stack handler)))
(push a normal-attrs)))
(flet ((push-namespace (prefix uri)
(let ((new (make-xmlns-attribute prefix uri)))
(push new (car (xmlns-stack handler)))
(push new attrs))))
(multiple-value-bind (prefix local-name) (split-qname qname)
(setf lname local-name)
(let ((binding (normalizer-find-prefix handler prefix)))
(cond
((null binding)
(push-namespace prefix uri))
((rod= (sax:attribute-value binding) uri))
((member binding (car (xmlns-stack handler)))
(setf (sax:attribute-value binding) uri))
(t
(push-namespace prefix uri)))))
(dolist (a normal-attrs)
(let ((u (sax:attribute-namespace-uri a)))
(when u
(let* ((prefix (split-qname (sax:attribute-qname a)))
(prefix-binding
(when prefix
(normalizer-find-prefix handler prefix))))
(when (or (null prefix-binding)
(not (rod= (sax:attribute-value prefix-binding) u)))
(let ((uri-binding (normalizer-find-uri handler u)))
(cond
(uri-binding
(rename-attribute
a
(sax:attribute-local-name uri-binding)))
((null prefix-binding)
(push-namespace prefix u))
(t
(loop
for i from 1
for prefix = (rod (format nil "NS~D" i))
unless (normalizer-find-prefix handler prefix)
do
(push-namespace prefix u)
(rename-attribute a prefix)
(return))))))))))))
(sax:start-element (proxy-chained-handler handler) uri lname qname attrs))
(defmethod sax:end-element ((handler namespace-normalizer) uri lname qname)
(declare (ignore qname))
(pop (xmlns-stack handler))
(sax:end-element (proxy-chained-handler handler) (or uri #"") lname qname))