Output encoding support, using Babel

This commit is contained in:
David Lichteblau
2008-04-13 16:48:24 +02:00
committed by David Lichteblau
parent 6a4a3be00f
commit 4c11d5b68a
5 changed files with 324 additions and 217 deletions

View File

@ -61,21 +61,33 @@
</li> </li>
</ul> </ul>
<a name="changes"/> <a name="changes"/>
<h3>Recent Changes</h3> <h3>Recent Changes</h3>
<p class="nomargin"><tt>rel-2007-10-21</tt></p> <div style="background-color: #f7f7f7;
width: 60%;
border: solid #9c0000;
margin: 0em 2pt 1em 2em;
padding: 1em">
cxml and closure-common are now available from <b>git</b> instead of CVS.
Please refer to the <a href="installation.html#download">
installation instructions</a> for details.
</div>
<p class="nomargin"><tt>rel-2008-xx-yy</tt></p>
<ul class="nomargin"> <ul class="nomargin">
<li>
Support for user-specified output encodings
using <a href="http://common-lisp.net/project/babel/">Babel</a>.
</li>
<li> <li>
Lisps using full 21 bit code points as characters are now fully Lisps using full 21 bit code points as characters are now fully
supported (including SBCL and Clozure CL) addition to the supported (including SBCL and Clozure CL) addition to the
existing support for 16 bit characters using UTF-16 (including existing support for 16 bit characters using UTF-16 (including
Allegro and LispWorks). The feature <tt>rune-is-utf-16</tt> is Allegro and LispWorks). The feature <tt>rune-is-utf-16</tt> is
now set on 16 bit Lisps.<br/><br/> Make sure to update and now set on 16 bit Lisps. Make sure to update and
recompile both cxml and closure-common when upgrading. recompile both cxml and closure-common when upgrading.
</li> </li>
<li> <li>
New sink slot omit-xml-declaration-p; functions cxml:unescaped New sink slot sink-omit-xml-declaration-p; functions cxml:unescaped
and sax:unescaped. and sax:unescaped.
</li> </li>
<li> <li>
@ -83,22 +95,23 @@
</li> </li>
<li> <li>
Fixed various DTD serialization bugs. Fixed xmls compatibility Fixed various DTD serialization bugs. Fixed xmls compatibility
bugs. Fixed variable names in with-source. Fixed bugs. Fixed variable names in with-source. Fixed klacks-error
klacks-error export. Consistently use strings as base URIs. export. Consistently use strings as base URIs. Fixed PARSE for
Fixed PARSE for non-file-streams. non-file-streams. Added support for the UTF-8 "BOM", thanks to
Ivan Shvedunov.
</li> </li>
</ul> </ul>
<br/><br/>
<div style="background-color: #f7f7f7; <div style="background-color: #f7f7f7;
width: 60%; width: 60%;
border: solid #9c0000; border: solid #9c0000;
margin: 0em 2pt 1em 2em; margin: 0em 2pt 1em 2em;
padding: 1em"> padding: 1em">
Runes have now been moved into Runes have been moved into a <b>separate project</b>,
a <b>separate CVS module</b> unter the named <b>closure-common</b>. Releases will be available
name <b>closure-common</b>. Releases will be available
as <b>separate tarballs</b> in the download directory. Please as <b>separate tarballs</b> in the download directory. Please
refer to the <a href="installation.html#download"> refer to the <a href="installation.html#download"> installation
installation instructions</a> for details. instructions</a> for details.
</div> </div>
<p class="nomargin"><tt>rel-2007-10-21</tt></p> <p class="nomargin"><tt>rel-2007-10-21</tt></p>
<ul class="nomargin"> <ul class="nomargin">

View File

@ -3,19 +3,21 @@
<a name="download"/> <a name="download"/>
<h2>Download</h2> <h2>Download</h2>
<ul> <p>
<li> Download <a href="http://common-lisp.net/project/cxml/download/">tarballs</a> for both cxml itself and closure-common.
<div>Download <a href="http://common-lisp.net/project/cxml/download/">tarballs</a> for both cxml itself and closure-common.</div> </p>
</li> <p>
<li> Or get it from git:
<div> </p>
Or use anonymous CVS (<a href="http://common-lisp.net/cgi-bin/viewcvs.cgi/cxml/?cvsroot=cxml">browse</a>): <p>
<pre>export CVSROOT=:pserver:anonymous:anonymous@common-lisp.net:/project/cxml/cvsroot <tt>git clone git://repo.or.cz/cxml.git</tt>
cvs co cxml (<a href="http://repo.or.cz/w/cxml.git">gitweb</a>)
cvs co closure-common</pre> <br/>
</div>
</li> <tt>git clone git://repo.or.cz/closure-common.git</tt>
</ul> (<a href="http://repo.or.cz/w/closure-common.git">gitweb</a>)
<br/>
</p>
<a name="implementations"/> <a name="implementations"/>
<h2>Implementation-specific notes</h2> <h2>Implementation-specific notes</h2>
@ -42,12 +44,20 @@ cvs co closure-common</pre>
<p> <p>
<b>Prerequisites.</b> <b>Prerequisites.</b>
CXML needs <a href="http://www.cliki.net/Puri">puri</a> and CXML needs:
<a href="http://www.common-lisp.net/project/cl-plus-ssl/#trivial-gray-streams">trivial-gray-streams</a>. <ul>
In addition, <li><a href="http://www.cliki.net/Puri">puri</a></li>
<a href="http://www.cliki.net/closure-common">closure-common</a> <li><a href="http://www.common-lisp.net/project/cl-plus-ssl/#trivial-gray-streams">trivial-gray-streams</a></li>
is required, which is a separate module in cxml CVS (see above for <li>
check-out instructions). <a href="http://common-lisp.net/project/babel/">Babel</a>
(on Unicode-capable Lisps only)
</li>
<li>
<a href="http://www.cliki.net/closure-common">closure-common</a>
is maintained together with cxml but available as a separate
download (see above for check-out instructions).
</li>
</ul>
</p> </p>
<p> <p>

View File

@ -297,6 +297,14 @@
<li> <li>
<tt>indentation</tt> -- indentation level. An integer or <tt>nil</tt>. <tt>indentation</tt> -- indentation level. An integer or <tt>nil</tt>.
</li> </li>
<li>
<tt>encoding</tt> -- the character encoding to use. A string or
keyword. <tt>nil</tt> is also allowed and means UTF-8.
</li>
<li>
<tt>omit-xml-declaration-p</tt> -- Boolean. Don't write an XML
declaration.
</li>
</ul> </ul>
<p> <p>
The following <tt>canonical</tt> values are allowed: The following <tt>canonical</tt> values are allowed:

View File

@ -50,7 +50,8 @@
;; #-rune-is-character ;; #-rune-is-character
#:make-character-stream-sink/utf8 #:make-character-stream-sink/utf8
#:omit-xml-declaration-p #:sink-encoding
#:sink-omit-xml-declaration-p
#:with-xml-output #:with-xml-output
#:with-output-sink #:with-output-sink

View File

@ -10,6 +10,7 @@
;;; (c) copyright 1999 by Gilbert Baumann ;;; (c) copyright 1999 by Gilbert Baumann
;;; (c) copyright 2004 by knowledgeTools Int. GmbH ;;; (c) copyright 2004 by knowledgeTools Int. GmbH
;;; (c) copyright 2004 by David Lichteblau (for headcraft.de) ;;; (c) copyright 2004 by David Lichteblau (for headcraft.de)
;;; (c) copyright 2005-2008 by David Lichteblau
;;; This library is free software; you can redistribute it and/or ;;; This library is free software; you can redistribute it and/or
;;; modify it under the terms of the GNU Library General Public ;;; modify it under the terms of the GNU Library General Public
@ -81,9 +82,10 @@
(have-doctype :initform nil :accessor have-doctype) (have-doctype :initform nil :accessor have-doctype)
(have-internal-subset :initform nil :accessor have-internal-subset) (have-internal-subset :initform nil :accessor have-internal-subset)
(stack :initform nil :accessor stack) (stack :initform nil :accessor stack)
(omit-xml-declaration-p :initform nil (sink-omit-xml-declaration-p :initform nil
:initarg :omit-xml-declaration-p :initarg :omit-xml-declaration-p
:accessor omit-xml-declaration-p))) :accessor sink-omit-xml-declaration-p)
(encoding :initarg :encoding :reader sink-encoding)))
#-rune-is-character #-rune-is-character
(defmethod hax:%want-strings-p ((handler sink)) (defmethod hax:%want-strings-p ((handler sink))
@ -95,7 +97,14 @@
(unless (member (canonical instance) '(nil 1 2)) (unless (member (canonical instance) '(nil 1 2))
(error "Invalid canonical form: ~A" (canonical instance))) (error "Invalid canonical form: ~A" (canonical instance)))
(when (and (canonical instance) (indentation instance)) (when (and (canonical instance) (indentation instance))
(error "Cannot indent XML in canonical mode"))) (error "Cannot indent XML in canonical mode"))
(when (and (canonical instance)
(not (eq (ystream-encoding (sink-ystream instance)) :utf-8)))
(error "Cannot use non-UTF-8 encoding in canonical mode"))
(when (let ((encoding (ystream-encoding (sink-ystream instance))))
(and (not (symbolp encoding))
(eq (babel-encodings:enc-name encoding) :utf-16)))
(sink-write-rune #/U+FEFF instance)))
(defun make-buffer (&key (element-type '(unsigned-byte 8))) (defun make-buffer (&key (element-type '(unsigned-byte 8)))
(make-array 1 (make-array 1
@ -103,14 +112,41 @@
:adjustable t :adjustable t
:fill-pointer 0)) :fill-pointer 0))
(defun find-output-encoding (name)
(when (stringp name)
(setf name (find-symbol (string-upcase name) :keyword)))
(cond
((null name)
(warn "Unknown encoding ~A, falling back to UTF-8" name)
:utf-8)
((find name '(:utf-8 :utf_8 :utf8))
:utf-8)
#-rune-is-character
(t
(warn "Unknown encoding ~A, falling back to UTF-8" name)
:utf-8)
#+rune-is-character
(t
(handler-case
(babel-encodings:get-character-encoding name)
(error ()
(warn "Unknown encoding ~A, falling back to UTF-8" name)
:utf-8)))))
;; bisschen unschoen hier die ganze api zu duplizieren, aber die ;; bisschen unschoen hier die ganze api zu duplizieren, aber die
;; ystreams sind noch undokumentiert ;; ystreams sind noch undokumentiert
(macrolet ((define-maker (make-sink make-ystream &rest args) (macrolet ((define-maker (make-sink make-ystream &rest args)
`(defun ,make-sink (,@args &rest initargs) `(defun ,make-sink (,@args &rest initargs
(apply #'make-instance &key encoding &allow-other-keys)
'sink (let* ((encoding (or encoding "UTF-8"))
:ystream (,make-ystream ,@args) (ystream (,make-ystream ,@args)))
initargs)))) (setf (ystream-encoding ystream)
(find-output-encoding encoding))
(apply #'make-instance
'sink
:ystream ystream
:encoding encoding
initargs)))))
(define-maker make-octet-vector-sink make-octet-vector-ystream) (define-maker make-octet-vector-sink make-octet-vector-ystream)
(define-maker make-octet-stream-sink make-octet-stream-ystream stream) (define-maker make-octet-stream-sink make-octet-stream-ystream stream)
(define-maker make-rod-sink make-rod-ystream) (define-maker make-rod-sink make-rod-ystream)
@ -138,9 +174,11 @@
(defmethod sax:start-document ((sink sink)) (defmethod sax:start-document ((sink sink))
(unless (or (canonical sink) (unless (or (canonical sink)
(omit-xml-declaration-p sink)) (sink-omit-xml-declaration-p sink))
(%write-rod #"<?xml version=\"1.0\" encoding=\"UTF-8\"?>" sink) (sink-write-rod #"<?xml version=\"1.0\" encoding=\"" sink)
(%write-rune #/U+000A sink))) (sink-write-rod (rod (sink-encoding sink)) sink)
(sink-write-rod #"\"?>" sink)
(sink-write-rune #/U+000A sink)))
(defmethod sax:start-dtd ((sink sink) name public-id system-id) (defmethod sax:start-dtd ((sink sink) name public-id system-id)
(setf (name-for-dtd sink) name) (setf (name-for-dtd sink) name)
@ -150,50 +188,50 @@
(defun ensure-doctype (sink &optional public-id system-id) (defun ensure-doctype (sink &optional public-id system-id)
(unless (have-doctype sink) (unless (have-doctype sink)
(setf (have-doctype sink) t) (setf (have-doctype sink) t)
(%write-rod #"<!DOCTYPE " sink) (sink-write-rod #"<!DOCTYPE " sink)
(%write-rod (name-for-dtd sink) sink) (sink-write-rod (name-for-dtd sink) sink)
(cond (cond
((not (zerop (length public-id))) ((not (zerop (length public-id)))
(%write-rod #" PUBLIC \"" sink) (sink-write-rod #" PUBLIC \"" sink)
(unparse-string public-id sink) (sink-write-escapable-rod public-id sink)
(%write-rod #"\" \"" sink) (sink-write-rod #"\" \"" sink)
(unparse-string system-id sink) (sink-write-escapable-rod system-id sink)
(%write-rod #"\"" sink)) (sink-write-rod #"\"" sink))
((not (zerop (length system-id))) ((not (zerop (length system-id)))
(%write-rod #" SYSTEM \"" sink) (sink-write-rod #" SYSTEM \"" sink)
(unparse-string system-id sink) (sink-write-escapable-rod system-id sink)
(%write-rod #"\"" sink))))) (sink-write-rod #"\"" sink)))))
(defmethod sax:start-internal-subset ((sink sink)) (defmethod sax:start-internal-subset ((sink sink))
(when (have-internal-subset sink) (when (have-internal-subset sink)
(error "duplicate internal subset")) (error "duplicate internal subset"))
(setf (have-internal-subset sink) t) (setf (have-internal-subset sink) t)
(ensure-doctype sink) (ensure-doctype sink)
(%write-rod #" [" sink) (sink-write-rod #" [" sink)
(%write-rune #/U+000A sink)) (sink-write-rune #/U+000A sink))
(defmethod sax:end-internal-subset ((sink sink)) (defmethod sax:end-internal-subset ((sink sink))
(ensure-doctype sink) (ensure-doctype sink)
(%write-rod #"]" sink)) (sink-write-rod #"]" sink))
(defmethod sax:unparsed-internal-subset ((sink sink) str) (defmethod sax:unparsed-internal-subset ((sink sink) str)
(when (have-internal-subset sink) (when (have-internal-subset sink)
(error "duplicate internal subset")) (error "duplicate internal subset"))
(setf (have-internal-subset sink) t) (setf (have-internal-subset sink) t)
(ensure-doctype sink) (ensure-doctype sink)
(%write-rod #" [" sink) (sink-write-rod #" [" sink)
(%write-rune #/U+000A sink) (sink-write-rune #/U+000A sink)
(%write-rod str sink) (sink-write-rod str sink)
(%write-rod #"]" sink)) (sink-write-rod #"]" sink))
;; for the benefit of the XML test suite, prefer ' over " ;; for the benefit of the XML test suite, prefer ' over "
(defun write-quoted-rod (x sink) (defun write-quoted-rod (x sink)
(let ((q (if (find #/' x) #/" #/' (let ((q (if (find #/' x) #/" #/'
;; '" (thanks you Emacs indentation, the if ends here) ;; '" (thanks you Emacs indentation, the if ends here)
))) )))
(%write-rune q sink) (sink-write-rune q sink)
(%write-rod x sink) (sink-write-rod x sink)
(%write-rune q sink))) (sink-write-rune q sink)))
(defmethod sax:notation-declaration ((sink sink) name public-id system-id) (defmethod sax:notation-declaration ((sink sink) name public-id system-id)
(let ((prev (previous-notation sink))) (let ((prev (previous-notation sink)))
@ -202,165 +240,165 @@
(not (rod< prev name))) (not (rod< prev name)))
(error "misordered notations; cannot unparse canonically")) (error "misordered notations; cannot unparse canonically"))
(setf (previous-notation sink) name)) (setf (previous-notation sink) name))
(%write-rod #"<!NOTATION " sink) (sink-write-rod #"<!NOTATION " sink)
(%write-rod name sink) (sink-write-rod name sink)
(cond (cond
((zerop (length public-id)) ((zerop (length public-id))
(%write-rod #" SYSTEM " sink) (sink-write-rod #" SYSTEM " sink)
(write-quoted-rod system-id sink)) (write-quoted-rod system-id sink))
((zerop (length system-id)) ((zerop (length system-id))
(%write-rod #" PUBLIC " sink) (sink-write-rod #" PUBLIC " sink)
(write-quoted-rod public-id sink)) (write-quoted-rod public-id sink))
(t (t
(%write-rod #" PUBLIC " sink) (sink-write-rod #" PUBLIC " sink)
(write-quoted-rod public-id sink) (write-quoted-rod public-id sink)
(%write-rod #" " sink) (sink-write-rod #" " sink)
(write-quoted-rod system-id sink))) (write-quoted-rod system-id sink)))
(%write-rune #/> sink) (sink-write-rune #/> sink)
(%write-rune #/U+000A sink)) (sink-write-rune #/U+000A sink))
(defmethod sax:unparsed-entity-declaration (defmethod sax:unparsed-entity-declaration
((sink sink) name public-id system-id notation-name) ((sink sink) name public-id system-id notation-name)
(unless (and (canonical sink) (< (canonical sink) 3)) (unless (and (canonical sink) (< (canonical sink) 3))
(%write-rod #"<!ENTITY " sink) (sink-write-rod #"<!ENTITY " sink)
(%write-rod name sink) (sink-write-rod name sink)
(cond (cond
((zerop (length public-id)) ((zerop (length public-id))
(%write-rod #" SYSTEM " sink) (sink-write-rod #" SYSTEM " sink)
(write-quoted-rod system-id sink)) (write-quoted-rod system-id sink))
((zerop (length system-id)) ((zerop (length system-id))
(%write-rod #" PUBLIC " sink) (sink-write-rod #" PUBLIC " sink)
(write-quoted-rod public-id sink)) (write-quoted-rod public-id sink))
(t (t
(%write-rod #" PUBLIC " sink) (sink-write-rod #" PUBLIC " sink)
(write-quoted-rod public-id sink) (write-quoted-rod public-id sink)
(%write-rod #" " sink) (sink-write-rod #" " sink)
(write-quoted-rod system-id sink))) (write-quoted-rod system-id sink)))
(%write-rod #" NDATA " sink) (sink-write-rod #" NDATA " sink)
(%write-rod notation-name sink) (sink-write-rod notation-name sink)
(%write-rune #/> sink) (sink-write-rune #/> sink)
(%write-rune #/U+000A sink))) (sink-write-rune #/U+000A sink)))
(defmethod sax:external-entity-declaration (defmethod sax:external-entity-declaration
((sink sink) kind name public-id system-id) ((sink sink) kind name public-id system-id)
(when (canonical sink) (when (canonical sink)
(error "cannot serialize parsed entities in canonical mode")) (error "cannot serialize parsed entities in canonical mode"))
(%write-rod #"<!ENTITY " sink) (sink-write-rod #"<!ENTITY " sink)
(when (eq kind :parameter) (when (eq kind :parameter)
(%write-rod #" % " sink)) (sink-write-rod #" % " sink))
(%write-rod name sink) (sink-write-rod name sink)
(cond (cond
((zerop (length public-id)) ((zerop (length public-id))
(%write-rod #" SYSTEM " sink) (sink-write-rod #" SYSTEM " sink)
(write-quoted-rod system-id sink)) (write-quoted-rod system-id sink))
((zerop (length system-id)) ((zerop (length system-id))
(%write-rod #" PUBLIC " sink) (sink-write-rod #" PUBLIC " sink)
(write-quoted-rod public-id sink)) (write-quoted-rod public-id sink))
(t (t
(%write-rod #" PUBLIC " sink) (sink-write-rod #" PUBLIC " sink)
(write-quoted-rod public-id sink) (write-quoted-rod public-id sink)
(%write-rod #" " sink) (sink-write-rod #" " sink)
(write-quoted-rod system-id sink))) (write-quoted-rod system-id sink)))
(%write-rune #/> sink) (sink-write-rune #/> sink)
(%write-rune #/U+000A sink)) (sink-write-rune #/U+000A sink))
(defmethod sax:internal-entity-declaration ((sink sink) kind name value) (defmethod sax:internal-entity-declaration ((sink sink) kind name value)
(when (canonical sink) (when (canonical sink)
(error "cannot serialize parsed entities in canonical mode")) (error "cannot serialize parsed entities in canonical mode"))
(%write-rod #"<!ENTITY " sink) (sink-write-rod #"<!ENTITY " sink)
(when (eq kind :parameter) (when (eq kind :parameter)
(%write-rod #" % " sink)) (sink-write-rod #" % " sink))
(%write-rod name sink) (sink-write-rod name sink)
(%write-rune #/U+0020 sink) (sink-write-rune #/U+0020 sink)
(%write-rune #/\" sink) (sink-write-rune #/\" sink)
(unparse-dtd-string value sink) (sink-write-escapable-rod/dtd value sink)
(%write-rune #/\" sink) (sink-write-rune #/\" sink)
(%write-rune #/> sink) (sink-write-rune #/> sink)
(%write-rune #/U+000A sink)) (sink-write-rune #/U+000A sink))
(defmethod sax:element-declaration ((sink sink) name model) (defmethod sax:element-declaration ((sink sink) name model)
(when (canonical sink) (when (canonical sink)
(error "cannot serialize element type declarations in canonical mode")) (error "cannot serialize element type declarations in canonical mode"))
(%write-rod #"<!ELEMENT " sink) (sink-write-rod #"<!ELEMENT " sink)
(%write-rod name sink) (sink-write-rod name sink)
(%write-rune #/U+0020 sink) (sink-write-rune #/U+0020 sink)
(labels ((walk (m) (labels ((walk (m)
(cond (cond
((eq m :EMPTY) ((eq m :EMPTY)
(%write-rod "EMPTY" sink)) (sink-write-rod "EMPTY" sink))
((eq m :PCDATA) ((eq m :PCDATA)
(%write-rod "#PCDATA" sink)) (sink-write-rod "#PCDATA" sink))
((eq m :ANY) ((eq m :ANY)
(%write-rod "ANY" sink)) (sink-write-rod "ANY" sink))
((atom m) ((atom m)
(unparse-string m sink)) (sink-write-escapable-rod m sink))
(t (t
(ecase (car m) (ecase (car m)
(and (and
(%write-rune #/\( sink) (sink-write-rune #/\( sink)
(loop for (n . rest) on (cdr m) do (loop for (n . rest) on (cdr m) do
(walk n) (walk n)
(when rest (when rest
(%write-rune #\, sink))) (sink-write-rune #\, sink)))
(%write-rune #/\) sink)) (sink-write-rune #/\) sink))
(or (or
(%write-rune #/\( sink) (sink-write-rune #/\( sink)
(loop for (n . rest) on (cdr m) do (loop for (n . rest) on (cdr m) do
(walk n) (walk n)
(when rest (when rest
(%write-rune #\| sink))) (sink-write-rune #\| sink)))
(%write-rune #/\) sink)) (sink-write-rune #/\) sink))
(* (*
(walk (second m)) (walk (second m))
(%write-rune #/* sink)) (sink-write-rune #/* sink))
(+ (+
(walk (second m)) (walk (second m))
(%write-rune #/+ sink)) (sink-write-rune #/+ sink))
(? (?
(walk (second m)) (walk (second m))
(%write-rune #/? sink))))))) (sink-write-rune #/? sink)))))))
(walk model)) (walk model))
(%write-rune #/> sink) (sink-write-rune #/> sink)
(%write-rune #/U+000A sink)) (sink-write-rune #/U+000A sink))
(defmethod sax:attribute-declaration ((sink sink) ename aname type default) (defmethod sax:attribute-declaration ((sink sink) ename aname type default)
(when (canonical sink) (when (canonical sink)
(error "cannot serialize attribute type declarations in canonical mode")) (error "cannot serialize attribute type declarations in canonical mode"))
(%write-rod #"<!ATTLIST " sink) (sink-write-rod #"<!ATTLIST " sink)
(%write-rod ename sink) (sink-write-rod ename sink)
(%write-rune #/U+0020 sink) (sink-write-rune #/U+0020 sink)
(%write-rod aname sink) (sink-write-rod aname sink)
(%write-rune #/U+0020 sink) (sink-write-rune #/U+0020 sink)
(cond (cond
((atom type) ((atom type)
(%write-rod (rod (string-upcase (symbol-name type))) sink)) (sink-write-rod (rod (string-upcase (symbol-name type))) sink))
(t (t
(when (eq :NOTATION (car type)) (when (eq :NOTATION (car type))
(%write-rod #"NOTATION " sink)) (sink-write-rod #"NOTATION " sink))
(%write-rune #/\( sink) (sink-write-rune #/\( sink)
(loop for (n . rest) on (cdr type) do (loop for (n . rest) on (cdr type) do
(%write-rod n sink) (sink-write-rod n sink)
(when rest (when rest
(%write-rune #\| sink))) (sink-write-rune #\| sink)))
(%write-rune #/\) sink))) (sink-write-rune #/\) sink)))
(%write-rune #/U+0020 sink) (sink-write-rune #/U+0020 sink)
(cond (cond
((atom default) ((atom default)
(%write-rune #/# sink) (sink-write-rune #/# sink)
(%write-rod (rod (string-upcase (symbol-name default))) sink)) (sink-write-rod (rod (string-upcase (symbol-name default))) sink))
(t (t
(when (eq :FIXED (car default)) (when (eq :FIXED (car default))
(%write-rod #"#FIXED " sink)) (sink-write-rod #"#FIXED " sink))
(%write-rune #/\" sink) (sink-write-rune #/\" sink)
(unparse-string (second default) sink) (sink-write-escapable-rod (second default) sink)
(%write-rune #/\" sink))) (sink-write-rune #/\" sink)))
(%write-rune #/> sink) (sink-write-rune #/> sink)
(%write-rune #/U+000A sink)) (sink-write-rune #/U+000A sink))
(defmethod sax:end-dtd ((sink sink)) (defmethod sax:end-dtd ((sink sink))
(when (have-doctype sink) (when (have-doctype sink)
(%write-rod #">" sink) (sink-write-rod #">" sink)
(%write-rune #/U+000A sink))) (sink-write-rune #/U+000A sink)))
;;;; elements ;;;; elements
@ -372,14 +410,14 @@
(defun sink-fresh-line (sink) (defun sink-fresh-line (sink)
(unless (zerop (ystream-column (sink-ystream sink))) (unless (zerop (ystream-column (sink-ystream sink)))
(%write-rune #/U+000A sink) ;newline (sink-write-rune #/U+000A sink) ;newline
(indent sink))) (indent sink)))
(defun maybe-close-tag (sink) (defun maybe-close-tag (sink)
(let ((tag (car (stack sink)))) (let ((tag (car (stack sink))))
(when (and (tag-p tag) (not (tag-have-gt tag))) (when (and (tag-p tag) (not (tag-have-gt tag)))
(setf (tag-have-gt tag) t) (setf (tag-have-gt tag) t)
(%write-rune #/> sink)))) (sink-write-rune #/> sink))))
(defmethod sax:start-element (defmethod sax:start-element
((sink sink) namespace-uri local-name qname attributes) ((sink sink) namespace-uri local-name qname attributes)
@ -391,19 +429,21 @@
(when (indentation sink) (when (indentation sink)
(sink-fresh-line sink) (sink-fresh-line sink)
(start-indentation-block sink)) (start-indentation-block sink))
(%write-rune #/< sink) (sink-write-rune #/< sink)
(%write-rod qname sink) (sink-write-rod qname sink)
(dolist (a (if (canonical sink) (dolist (a (if (canonical sink)
(sort (copy-list attributes) (sort (copy-list attributes)
#'rod< #'rod<
:key #'sax:attribute-qname) :key #'sax:attribute-qname)
attributes)) attributes))
(%write-rune #/space sink) (sink-write-rune #/space sink)
(%write-rod (sax:attribute-qname a) sink) (sink-write-rod (sax:attribute-qname a) sink)
(%write-rune #/= sink) (sink-write-rune #/= sink)
(%write-rune #/\" sink) (sink-write-rune #/\" sink)
(unparse-string (sax:attribute-value a) sink) (if (canonical sink)
(%write-rune #/\" sink)) (sink-write-escapable-rod/canonical (sax:attribute-value a) sink)
(sink-write-escapable-rod/attribute (sax:attribute-value a) sink))
(sink-write-rune #/\" sink))
(when (canonical sink) (when (canonical sink)
(maybe-close-tag sink))) (maybe-close-tag sink)))
@ -422,24 +462,24 @@
(sink-fresh-line sink))) (sink-fresh-line sink)))
(cond (cond
((tag-have-gt tag) ((tag-have-gt tag)
(%write-rod '#.(string-rod "</") sink) (sink-write-rod '#.(string-rod "</") sink)
(%write-rod qname sink) (sink-write-rod qname sink)
(%write-rod '#.(string-rod ">") sink)) (sink-write-rod '#.(string-rod ">") sink))
(t (t
(%write-rod #"/>" sink))))) (sink-write-rod #"/>" sink)))))
(defmethod sax:processing-instruction ((sink sink) target data) (defmethod sax:processing-instruction ((sink sink) target data)
(maybe-close-tag sink) (maybe-close-tag sink)
(unless (rod-equal target '#.(string-rod "xml")) (unless (rod-equal target '#.(string-rod "xml"))
(%write-rod '#.(string-rod "<?") sink) (sink-write-rod '#.(string-rod "<?") sink)
(%write-rod target sink) (sink-write-rod target sink)
(cond (cond
((plusp (length data)) ((plusp (length data))
(%write-rune #/space sink) (sink-write-rune #/space sink)
(%write-rod data sink)) (sink-write-rod data sink))
((canonical sink) ((canonical sink)
(%write-rune #/space sink))) (sink-write-rune #/space sink)))
(%write-rod '#.(string-rod "?>") sink))) (sink-write-rod '#.(string-rod "?>") sink)))
(defmethod sax:start-cdata ((sink sink)) (defmethod sax:start-cdata ((sink sink))
(maybe-close-tag sink) (maybe-close-tag sink)
@ -453,29 +493,29 @@
(not (search #"]]" data))) (not (search #"]]" data)))
(when (indentation sink) (when (indentation sink)
(sink-fresh-line sink)) (sink-fresh-line sink))
(%write-rod #"<![CDATA[" sink) (sink-write-rod #"<![CDATA[" sink)
;; XXX signal error if body is unprintable? ;; XXX signal error if body is unprintable?
(map nil (lambda (c) (%write-rune c sink)) data) ;; zzz no, in that case, split into multiple CDATA sections
(%write-rod #"]]>" sink)) (map nil (lambda (c) (sink-write-rune c sink)) data)
(sink-write-rod #"]]>" sink))
(t (t
(if (indentation sink) (if (indentation sink)
(unparse-indented-text data sink) (unparse-indented-text data sink)
(let ((y (sink-ystream sink))) (if (canonical sink)
(if (canonical sink) (sink-write-escapable-rod/canonical data sink)
(loop for c across data do (unparse-datachar c y)) (sink-write-escapable-rod data sink))))))
(loop for c across data do (unparse-datachar-readable c y))))))))
(defmethod sax:unescaped ((sink sink) data) (defmethod sax:unescaped ((sink sink) data)
(maybe-close-tag sink) (maybe-close-tag sink)
(%write-rod data sink)) (sink-write-rod data sink))
(defmethod sax:comment ((sink sink) data) (defmethod sax:comment ((sink sink) data)
(maybe-close-tag sink) (maybe-close-tag sink)
(unless (canonical sink) (unless (canonical sink)
;; XXX signal error if body is unprintable? ;; XXX signal error if body is unprintable?
(%write-rod #"<!--" sink) (sink-write-rod #"<!--" sink)
(map nil (lambda (c) (%write-rune c sink)) data) (map nil (lambda (c) (sink-write-rune c sink)) data)
(%write-rod #"-->" sink))) (sink-write-rod #"-->" sink)))
(defmethod sax:end-cdata ((sink sink)) (defmethod sax:end-cdata ((sink sink))
(unless (eq (pop (stack sink)) :cdata) (unless (eq (pop (stack sink)) :cdata)
@ -483,7 +523,7 @@
(defun indent (sink) (defun indent (sink)
(dotimes (x (current-indentation sink)) (dotimes (x (current-indentation sink))
(%write-rune #/U+0020 sink))) ; space (sink-write-rune #/U+0020 sink)))
(defun start-indentation-block (sink) (defun start-indentation-block (sink)
(incf (current-indentation sink) (indentation sink))) (incf (current-indentation sink) (indentation sink)))
@ -507,62 +547,97 @@
(when need-whitespace-p (when need-whitespace-p
(if (< (+ (ystream-column (sink-ystream sink)) w (- pos)) (if (< (+ (ystream-column (sink-ystream sink)) w (- pos))
(width sink)) (width sink))
(%write-rune #/U+0020 sink) (sink-write-rune #/U+0020 sink)
(sink-fresh-line sink))) (sink-fresh-line sink)))
(loop (sink-write-escapable-rod data sink :start pos :end w)
with y = (sink-ystream sink)
for i from pos below w do
(unparse-datachar-readable (elt data i) y))
(setf need-whitespace-p (< w n)) (setf need-whitespace-p (< w n))
(setf pos next)))) (setf pos next))))
(t (t
(%write-rune #/U+0020 sink)))))) (sink-write-rune #/U+0020 sink))))))
(defun unparse-string (str sink) (defun sink-write-escapable-rod (rod sink &key (start 0) (end (length rod)))
;;
;; OPTIMIZE ME
;;
(let ((y (sink-ystream sink))) (let ((y (sink-ystream sink)))
(loop for rune across str do (unparse-datachar rune y)))) (loop
for i from start below end
for c = (rune rod i)
do
(case c
(#/& (ystream-write-escapable-rod #.(string-rod "&amp;") y))
(#/< (ystream-write-escapable-rod #.(string-rod "&lt;") y))
;; there's no need to escape > per se, but we're supposed to
;; escape -->, which is harder to check for
(#/> (ystream-write-escapable-rod #.(string-rod "&gt;") y))
(#/U+000D (ystream-write-escapable-rod #.(string-rod "&#13;") y))
(t (ystream-write-escapable-rune c y))))))
(defun unparse-datachar (c ystream) (defun sink-write-escapable-rod/attribute
(cond ((rune= c #/&) (write-rod '#.(string-rod "&amp;") ystream)) (rod sink &key (start 0) (end (length rod)))
((rune= c #/<) (write-rod '#.(string-rod "&lt;") ystream)) ;;
((rune= c #/>) (write-rod '#.(string-rod "&gt;") ystream)) ;; OPTIMIZE ME
((rune= c #/\") (write-rod '#.(string-rod "&quot;") ystream)) ;;
((rune= c #/U+0009) (write-rod '#.(string-rod "&#9;") ystream))
((rune= c #/U+000A) (write-rod '#.(string-rod "&#10;") ystream))
((rune= c #/U+000D) (write-rod '#.(string-rod "&#13;") ystream))
(t
(write-rune c ystream))))
(defun unparse-datachar-readable (c ystream)
(cond ((rune= c #/&) (write-rod '#.(string-rod "&amp;") ystream))
((rune= c #/<) (write-rod '#.(string-rod "&lt;") ystream))
((rune= c #/>) (write-rod '#.(string-rod "&gt;") ystream))
((rune= c #/\") (write-rod '#.(string-rod "&quot;") ystream))
((rune= c #/U+000D) (write-rod '#.(string-rod "&#13;") ystream))
(t
(write-rune c ystream))))
(defun unparse-dtd-string (str sink)
(let ((y (sink-ystream sink))) (let ((y (sink-ystream sink)))
(loop for rune across str do (unparse-dtd-char rune y)))) (loop
for i from start below end
for c = (rune rod i)
do
(case c
(#/& (ystream-write-escapable-rod #.(string-rod "&amp;") y))
(#/< (ystream-write-escapable-rod #.(string-rod "&lt;") y))
;; there's no need to escape > per se, but we're supposed to
;; escape -->, which is harder to check for
(#/> (ystream-write-escapable-rod #.(string-rod "&gt;") y))
(#/\" (ystream-write-escapable-rod #.(string-rod "&quot;") y))
(#/U+0009 (ystream-write-escapable-rod #.(string-rod "&#9;") y))
(#/U+000A (ystream-write-escapable-rod #.(string-rod "&#10;") y))
(#/U+000D (ystream-write-escapable-rod #.(string-rod "&#13;") y))
(t (ystream-write-escapable-rune c y))))))
(defun unparse-dtd-char (c ystream) (defun sink-write-escapable-rod/canonical
(cond ((rune= c #/%) (write-rod '#.(string-rod "&#37;") ystream)) (rod sink &key (start 0) (end (length rod)))
((rune= c #/&) (write-rod '#.(string-rod "&amp;") ystream)) ;;
((rune= c #/<) (write-rod '#.(string-rod "&lt;") ystream)) ;; OPTIMIZE ME
((rune= c #/>) (write-rod '#.(string-rod "&gt;") ystream)) ;;
((rune= c #/\") (write-rod '#.(string-rod "&quot;") ystream)) (let ((y (sink-ystream sink)))
((rune= c #/U+0009) (write-rod '#.(string-rod "&#9;") ystream)) (loop
((rune= c #/U+000A) (write-rod '#.(string-rod "&#10;") ystream)) for i from start below end
((rune= c #/U+000D) (write-rod '#.(string-rod "&#13;") ystream)) for c = (rune rod i)
(t do
(write-rune c ystream)))) (case c
(#/& (ystream-write-escapable-rod #.(string-rod "&amp;") y))
(#/< (ystream-write-escapable-rod #.(string-rod "&lt;") y))
(#/> (ystream-write-escapable-rod #.(string-rod "&gt;") y))
(#/\" (ystream-write-escapable-rod #.(string-rod "&quot;") y))
(#/U+0009 (ystream-write-escapable-rod #.(string-rod "&#9;") y))
(#/U+000A (ystream-write-escapable-rod #.(string-rod "&#10;") y))
(#/U+000D (ystream-write-escapable-rod #.(string-rod "&#13;") y))
(t (ystream-write-escapable-rune c y))))))
(defun %write-rune (c sink) (defun sink-write-escapable-rod/dtd
(write-rune c (sink-ystream sink))) (rod sink &key (start 0) (end (length rod)))
(let ((y (sink-ystream sink)))
(loop
for i from start below end
for c = (rune rod i)
do
(case c
(#/% (ystream-write-escapable-rod #.(string-rod "&#37;") y))
(#/& (ystream-write-escapable-rod #.(string-rod "&amp;") y))
(#/< (ystream-write-escapable-rod #.(string-rod "&lt;") y))
(#/> (ystream-write-escapable-rod #.(string-rod "&gt;") y))
(#/\" (ystream-write-escapable-rod #.(string-rod "&quot;") y))
(#/U+0009 (ystream-write-escapable-rod #.(string-rod "&#9;") y))
(#/U+000A (ystream-write-escapable-rod #.(string-rod "&#10;") y))
(#/U+000D (ystream-write-escapable-rod #.(string-rod "&#13;") y))
(t (ystream-write-escapable-rune c y))))))
(defun %write-rod (r sink) (defun sink-write-rune (c sink)
(write-rod r (sink-ystream sink))) (ystream-write-rune c (sink-ystream sink)))
(defun sink-write-rod (r sink)
(ystream-write-rod r (sink-ystream sink)))
;;;; convenience functions for DOMless XML serialization ;;;; convenience functions for DOMless XML serialization