diff --git a/README.html b/README.html index ab7a1d7..184555e 100644 --- a/README.html +++ b/README.html @@ -10,6 +10,49 @@

CXML Homepage

+

Closure XML Parser

@@ -67,6 +110,7 @@
  • Implemented DOM 2 Core.
  • Error handling overhaul.
  • UTF-8 string support in DOM on Lisps without Unicode characters.
  • +
  • Sink API has been changed.
  • Support internal subset serialization.
  • Gilbert Baumann has clarified the license as Lisp-LGPL.
  • Use trivial-gray-streams.
  • diff --git a/cxml.asd b/cxml.asd index 7cc05eb..43a6647 100644 --- a/cxml.asd +++ b/cxml.asd @@ -51,7 +51,8 @@ (:file "encodings" :depends-on ("package")) (:file "encodings-data" :depends-on ("package" "encodings")) (:file "xstream" - :depends-on ("package" "definline" "syntax" "encodings-data")))) + :depends-on ("package" "definline" "syntax" "encodings-data")) + (:file "ystream" :depends-on (runes)))) (asdf:defsystem :cxml-xml :default-component-class closure-source-file @@ -106,7 +107,6 @@ (utf8dom-file utf8-impl :pathname "dom-impl" :depends-on ("package")) #+rune-is-integer (utf8dom-file utf8-builder :pathname "dom-builder" :depends-on (utf8-impl)) - (:file "unparse" :depends-on ("package")) (:file "dom-sax" :depends-on ("package"))) :depends-on (:cxml-xml)) diff --git a/doc/dom.html b/doc/dom.html index 95d93b9..2f06f66 100644 --- a/doc/dom.html +++ b/doc/dom.html @@ -7,6 +7,52 @@

    The DOM implementation

    @@ -54,14 +100,9 @@

    Serializing DOM

    - The technique used to serialize a DOM document is to use a SAX - serialization sink as the argument to dom:map-document, - which generates SAX events for the DOM tree. -

    -

    - In addition, there are convenience functions like - unparse-document as a thin wrapper around - map-document. + To serialize a DOM document, use a SAX serialization sink as the + argument to dom:map-document, which generates SAX events + for the DOM tree.

    Applications dealing with namespaces might want to inject a @@ -99,34 +140,6 @@ -

    -

    Function CXML:UNPARSE-DOCUMENT (document stream &rest keys)
    -
    Function CXML:UNPARSE-DOCUMENT-TO-OCTETS (document &rest keys) => vector
    -

    -

    - Serialize a DOM document object. These convenience functions are - wrappers around dom:map-document. -

    -

    Keyword arguments are passed on to the sink. C.f. cxml:make-octet-vector-sink.

    -

    Notes:

    - -

    - unparse-document-to-octets returns an (unsigned-byte - 8) array, whereas unparse-document writes - characters.  unparse-document is useful together - with with-output-to-string.  However, note that the - resulting document in both cases is UTF-8 encoded, so the - characters written by unparse-document are really UTF-8 - bytes encoded as characters. -

    -

    DOM/Lisp mapping

    diff --git a/doc/installation.html b/doc/installation.html index cc62863..c10ee97 100644 --- a/doc/installation.html +++ b/doc/installation.html @@ -7,6 +7,52 @@

    Installation of Closure XML

    diff --git a/doc/quickstart.html b/doc/quickstart.html index 69f038f..2afacd1 100644 --- a/doc/quickstart.html +++ b/doc/quickstart.html @@ -83,10 +83,10 @@ * (dom:get-attribute (dom:document-element *example*) "a") "b" -

    Serialize the DOM document back into a stream (Serialize the DOM document back into a file (read more):

    -
    (cxml:unparse-document *example* *standard-output*)
    -<test a="b"><child></child></test>
    +
    (with-open-file (out "example.out" :direction :output :element-type '(unsigned-byte 8))
    +  (dom:map-document (cxml:make-octet-stream-sink out) *example*))

    As an alternative to DOM, parse into xmls-compatible list structure (read more):

    diff --git a/doc/using.html b/doc/using.html index 2f53068..efc7360 100644 --- a/doc/using.html +++ b/doc/using.html @@ -7,6 +7,52 @@

    Using the SAX parser

    @@ -137,12 +183,70 @@ with-xml-output, with-element, etc).

    +
    + Portable sinks:
    + Function CXML:MAKE-OCTET-VECTOR-SINK (&rest keys) => sink
    + Function CXML:MAKE-OCTET-STREAM-SINK (stream &rest keys) => sink
    + Function CXML:MAKE-ROD-SINK (&rest keys) => sink
    +
    + Only on Lisps with Unicode support:
    + Function CXML:MAKE-STRING-SINK -- alias for cxml:make-rod-sink
    + Function CXML:MAKE-CHARACTER-STREAM-SINK (stream &rest keys) => sink
    +
    + Only on Lisps without Unicode support:
    + Function CXML:MAKE-STRING-SINK/UTF8 (&rest keys) => sink
    + Function CXML:MAKE-CHARACTER-STREAM-SINK/UTF8 (stream &rest keys) => sink
    +

    -

    Function CXML:MAKE-OCTET-STREAM-SINK (stream &rest keys) => sink
    -
    Function CXML:MAKE-OCTET-VECTOR-SINK (&rest keys) => sink
    -
    Function CXML:MAKE-CHARACTER-STREAM-SINK (stream &rest keys) => sink
    Return a SAX serialization handle.

    +

    Keyword arguments:

    +

    + An internal subset will be included in the result regardless of + the canonical setting. It is the responsibility of the + caller to not report an internal subset for + canonical <= 1, or only notations as required for + canonical = 2. For example, the + include-doctype argument to dom:map-document + should be set to nil for the former behaviour and + :canonical-notations for the latter. +

    With an indentation level, pretty-print the XML by inserting additional whitespace.  Note that indentation diff --git a/doc/xmls-compat.html b/doc/xmls-compat.html index 7ab3429..4474b39 100644 --- a/doc/xmls-compat.html +++ b/doc/xmls-compat.html @@ -7,6 +7,52 @@

    XMLS Builder

    diff --git a/dom/dom-impl.lisp b/dom/dom-impl.lisp index 0e0a512..8de0a1d 100644 --- a/dom/dom-impl.lisp +++ b/dom/dom-impl.lisp @@ -973,7 +973,7 @@ (rod-stream-buf stream))) (defmethod write-attribute-child ((node node) stream) - (write-rod (dom:node-value node) stream)) + (put-rod (dom:node-value node) stream)) (defmethod write-attribute-child ((node entity-reference) stream) (dovector (child (dom:child-nodes node)) @@ -988,7 +988,7 @@ (buf nil) (position 0)) -(defun write-rod (rod rod-stream) +(defun put-rod (rod rod-stream) (let ((buf (rod-stream-buf rod-stream))) (when buf (move rod buf 0 (rod-stream-position rod-stream) (length rod))) @@ -1210,10 +1210,12 @@ ;; dass ein leeres internal subset nicht vorhanden ist und ;; wir daher nil liefern sollen. bittesehr! (dom::%internal-subset node)) - (with-output-to-string (stream) - (let ((sink (cxml:make-character-stream-sink stream))) - (dolist (def (dom::%internal-subset node)) - (apply (car def) sink (cdr def))))) + (let ((sink + #+rune-is-character (cxml:make-string-sink) + #-rune-is-character (cxml:make-string-sink/utf8))) + (dolist (def (dom::%internal-subset node)) + (apply (car def) sink (cdr def))) + (sax:end-document sink)) nil)) ;;; NOTATION -- nix diff --git a/dom/unparse.lisp b/dom/unparse.lisp deleted file mode 100644 index 120835f..0000000 --- a/dom/unparse.lisp +++ /dev/null @@ -1,19 +0,0 @@ -(in-package :cxml) - -(defun %unparse-document (sink doc canonical) - (dom:map-document sink - doc - :include-doctype (if (and canonical (>= canonical 2)) - :canonical-notations - nil) - :include-default-values t)) - -(defun unparse-document-to-octets (doc &rest initargs &key canonical) - (%unparse-document (apply #'make-octet-vector-sink initargs) - doc - canonical)) - -(defun unparse-document (doc stream &rest initargs &key canonical) - (%unparse-document (apply #'make-character-stream-sink stream initargs) - doc - canonical)) diff --git a/runes/package.lisp b/runes/package.lisp index e2c70e2..f801eda 100644 --- a/runes/package.lisp +++ b/runes/package.lisp @@ -59,7 +59,21 @@ #:xstream-plist #:xstream-encoding #:set-to-full-speed - #:xstream-name)) + #:xstream-name + + ;; ystream.lisp + #:ystream + #:close-ystream + #:write-rune + #:write-rod + #:ystream-column + #:make-octet-vector-ystream + #:make-octet-stream-ystream + #:make-rod-ystream + #+rune-is-character #:make-character-stream-ystream + #+rune-is-integer #:make-string-ystream/utf8 + #+rune-is-integer #:make-character-stream-ystream/utf8 + #:runes-to-utf8/adjustable-string)) (defpackage :utf8-runes (:use :cl) diff --git a/runes/ystream.lisp b/runes/ystream.lisp new file mode 100644 index 0000000..e574f3a --- /dev/null +++ b/runes/ystream.lisp @@ -0,0 +1,247 @@ +;;; (c) 2005 David Lichteblau +;;; License: Lisp-LGPL (See file COPYING for details). +;;; +;;; ystream (for lack of a better name): a rune output "stream" + +(in-package :runes) + +(defconstant +ystream-bufsize+ 1024) + +(defun make-ub8-array (n) + (make-array n :element-type '(unsigned-byte 8))) + +(defun make-ub16-array (n) + (make-array n :element-type '(unsigned-byte 16))) + +(defun make-buffer (&key (element-type '(unsigned-byte 8))) + (make-array 1 + :element-type element-type + :adjustable t + :fill-pointer 0)) + +(defmacro while (test &body body) + `(until (not ,test) ,@body)) + +(defmacro until (test &body body) + `(do () (,test) ,@body)) + +;;; ystream +;;; +- utf8-ystream +;;; | +- octet-vector-ystream +;;; | \- %stream-ystream +;;; | +- octet-stream-ystream +;;; | \- character-stream-ystream/utf8 +;;; | \- string-ystream/utf8 +;;; +- rod-ystream +;;; \-- character-stream-ystream + +(defstruct ystream + (column 0 :type integer) + (in-ptr 0 :type fixnum) + (in-buffer (make-rod +ystream-bufsize+) :type simple-rod)) + +(defstruct (utf8-ystream + (:include ystream) + (:conc-name "YSTREAM-")) + (out-buffer (make-ub8-array (* 6 +ystream-bufsize+)) + :type (simple-array (unsigned-byte 8) (*)))) + +(defstruct (%stream-ystream (:include utf8-ystream) (:conc-name "YSTREAM-")) + (os-stream nil)) + +(definline write-rune (rune ystream) + (let ((in (ystream-in-buffer ystream))) + (when (eql (ystream-in-ptr ystream) (length in)) + (flush-ystream ystream) + (setf in (ystream-in-buffer ystream))) + (setf (elt in (ystream-in-ptr ystream)) rune) + (incf (ystream-in-ptr ystream)) + (setf (ystream-column ystream) + (if (eql rune #/U+0010) 0 (1+ (ystream-column ystream)))) + rune)) + +(defmethod close-ystream :before ((ystream ystream)) + (flush-ystream ystream)) + + +;;;; UTF8-YSTREAM (abstract) + +(defmethod close-ystream ((ystream %stream-ystream)) + (ystream-os-stream ystream)) + +(defgeneric ystream-device-write (ystream buf nbytes)) + +(defmethod flush-ystream ((ystream utf8-ystream)) + (let ((ptr (ystream-in-ptr ystream))) + (when (plusp ptr) + (let* ((in (ystream-in-buffer ystream)) + (out (ystream-out-buffer ystream)) + (surrogatep (<= #xD800 (elt in (1- ptr)) #xDBFF)) + n) + (when surrogatep + (decf ptr)) + (when (plusp ptr) + (setf n (runes-to-utf8 out in ptr)) + (ystream-device-write ystream out n) + (cond + (surrogatep + (setf (elt in 0) (elt in (1- ptr))) + (setf (ystream-in-ptr ystream) 1)) + (t + (setf (ystream-in-ptr ystream) 0)))))))) + +(defun write-rod (rod sink) + (loop for rune across rod do (write-rune rune sink))) + +(defun fast-push (new-element vector) + (vector-push-extend new-element vector (max 1 (array-dimension vector 0)))) + +(macrolet ((define-utf8-writer (name (byte &rest aux) result &body body) + `(defun ,name (out in n) + (let ((high-surrogate nil) + ,@aux) + (labels + ((write0 (,byte) + ,@body) + (write1 (r) + (cond + ((<= #x00000000 r #x0000007F) + (write0 r)) + ((<= #x00000080 r #x000007FF) + (write0 (logior #b11000000 (ldb (byte 5 6) r))) + (write0 (logior #b10000000 (ldb (byte 6 0) r)))) + ((<= #x00000800 r #x0000FFFF) + (write0 (logior #b11100000 (ldb (byte 4 12) r))) + (write0 (logior #b10000000 (ldb (byte 6 6) r))) + (write0 (logior #b10000000 (ldb (byte 6 0) r)))) + ((<= #x00010000 r #x001FFFFF) + (write0 (logior #b11110000 (ldb (byte 3 18) r))) + (write0 (logior #b10000000 (ldb (byte 6 12) r))) + (write0 (logior #b10000000 (ldb (byte 6 6) r))) + (write0 (logior #b10000000 (ldb (byte 6 0) r)))) + ((<= #x00200000 r #x03FFFFFF) + (write0 (logior #b11111000 (ldb (byte 2 24) r))) + (write0 (logior #b10000000 (ldb (byte 6 18) r))) + (write0 (logior #b10000000 (ldb (byte 6 12) r))) + (write0 (logior #b10000000 (ldb (byte 6 6) r))) + (write0 (logior #b10000000 (ldb (byte 6 0) r)))) + ((<= #x04000000 r #x7FFFFFFF) + (write0 (logior #b11111100 (ldb (byte 1 30) r))) + (write0 (logior #b10000000 (ldb (byte 6 24) r))) + (write0 (logior #b10000000 (ldb (byte 6 18) r))) + (write0 (logior #b10000000 (ldb (byte 6 12) r))) + (write0 (logior #b10000000 (ldb (byte 6 6) r))) + (write0 (logior #b10000000 (ldb (byte 6 0) r)))))) + (write2 (r) + (cond + ((<= #xD800 r #xDBFF) + (setf high-surrogate r)) + ((<= #xDC00 r #xDFFF) + (let ((q (logior (ash (- high-surrogate #xD7C0) 10) + (- r #xDC00)))) + (write1 q)) + (setf high-surrogate nil)) + (t + (write1 r))))) + (dotimes (j n) + (write2 (rune-code (elt in j))))) + ,result)))) + (define-utf8-writer runes-to-utf8 (x (i 0)) + i + (setf (elt out i) x) + (incf i)) + (define-utf8-writer runes-to-utf8/adjustable-string (x) + nil + (fast-push (code-char x) out))) + + +;;;; ROD-YSTREAM + +(defstruct (rod-ystream (:include ystream))) + +(defmethod flush-ystream ((ystream rod-ystream)) + (let* ((old (ystream-in-buffer ystream)) + (new (make-rod (* 2 (length old))))) + (replace new old) + (setf (ystream-in-buffer ystream) new))) + +(defmethod close-ystream ((ystream rod-ystream)) + (subseq (ystream-in-buffer ystream) 0 (ystream-in-ptr ystream))) + + +;;;; CHARACTER-STREAM-YSTREAM + +#+rune-is-character +(progn + (defstruct (character-stream-ystream + (:constructor make-character-stream-ystream (target-stream)) + (:include ystream) + (:conc-name "YSTREAM-")) + (target-stream nil)) + + (defmethod flush-ystream ((ystream rod-ystream)) + (write-string (ystream-in-buffer ystream) (ystream-target-stream ystream)) + (setf (ystream-in-ptr ystream) 0)) + + (defmethod close-ystream ((ystream rod-ystream)) + (ystream-target-stream ystream))) + + +;;;; OCTET-VECTOR-YSTREAM + +(defstruct (octet-vector-ystream + (:include utf8-ystream) + (:conc-name "YSTREAM-")) + (result (make-buffer))) + +(defmethod ystream-device-write ((ystream octet-vector-ystream) buf nbytes) + (let* ((result (ystream-result ystream)) + (start (length result)) + (size (array-dimension result 0))) + (while (> (+ start nbytes) size) + (setf size (* 2 size))) + (adjust-array result size :fill-pointer (+ start nbytes)) + (replace result buf :start1 start :end2 nbytes))) + +(defmethod close-ystream ((ystream octet-vector-ystream)) + (ystream-result ystream)) + + +;;;; OCTET-STREAM-YSTREAM + +(defstruct (octet-stream-ystream + (:include %stream-ystream) + (:constructor make-octet-stream-ystream (os-stream)) + (:conc-name "YSTREAM-"))) + +(defmethod ystream-device-write ((ystream octet-stream-ystream) buf nbytes) + (write-sequence buf (ystream-os-stream ystream) :end nbytes)) + + +;;;; CHARACTER-STREAM-YSTREAM/UTF8 + +#+rune-is-integer +(progn + (defstruct (character-stream-ystream/utf8 + (:include %stream-ystream) + (:conc-name "YSTREAM-"))) + + (defmethod ystream-device-write + ((ystream character-stream-ystream/utf8) buf nbytes) + (declare (type (simple-array (unsigned-byte 8) (*)) buf)) + (let ((out (ystream-os-stream ystream))) + (dotimes (x nbytes) + (write-char (code-char (elt buf x)) out))))) + + +;;;; STRING-YSTREAM/UTF8 + +#+rune-is-integer +(progn + (defstruct (string-ystream/utf8 + (:include character-stream-ystream/utf8 + (os-stream (make-string-output-stream))) + (:conc-name "YSTREAM-"))) + + (defmethod close-ystream ((ystream string-ystream/utf8)) + (get-output-stream-string (ystream-os-stream ystream)))) diff --git a/test/domtest.lisp b/test/domtest.lisp index 6d40b62..0cba632 100644 --- a/test/domtest.lisp +++ b/test/domtest.lisp @@ -680,7 +680,7 @@ (dom:get-attribute member "href")))) (unless (or (runes:rod= (dom:tag-name member) #"metadata") (member href *bad-tests* :test 'equal)) - (format t "~&~D/~D ~A~%" i #+nil n 808 href) + (format t "~&~D/~D ~A~%" i n href) (let ((lisp (slurp-test (merge-pathnames href test-directory)))) (when verbose diff --git a/test/xmlconf.lisp b/test/xmlconf.lisp index ed1bcf4..8dd482b 100644 --- a/test/xmlconf.lisp +++ b/test/xmlconf.lisp @@ -54,9 +54,10 @@ (merge-pathnames output sub-directory))))) (defun serialize-document (document) - (map 'vector #'char-code - (with-output-to-string (s) - (cxml:unparse-document document s :canonical 2)))) + (dom:map-document (cxml:make-octet-vector-sink :canonical 2) + document + :include-doctype :canonical-notations + :include-default-values t)) (defun file-contents (pathname) (with-open-file (s pathname :element-type '(unsigned-byte 8)) diff --git a/xml/package.lisp b/xml/package.lisp index 8797981..39bd4e6 100644 --- a/xml/package.lisp +++ b/xml/package.lisp @@ -38,11 +38,13 @@ ;; #:parse-string #:parse-octets - #:make-character-stream-sink #:make-octet-vector-sink #:make-octet-stream-sink - #:unparse-document - #:unparse-document-to-octets + #:make-rod-sink + #+rune-is-character #:make-string-sink + #+rune-is-character #:make-character-stream-sink + #-rune-is-character #:make-string-sink/utf8 + #-rune-is-character #:make-character-stream-sink/utf8 #:with-xml-output #:with-element diff --git a/xml/unparse.lisp b/xml/unparse.lisp index 1dbfbaa..fad3f01 100644 --- a/xml/unparse.lisp +++ b/xml/unparse.lisp @@ -67,11 +67,10 @@ ;; -- James Clark (jjc@jclark.com) -;;;; SINK: a rune output "stream" +;;;; SINK: an xml output sink (defclass sink () - ((high-surrogate :initform nil) - (column :initform 0 :accessor column) + ((ystream :initarg :ystream :accessor sink-ystream) (width :initform 79 :initarg :width :accessor width) (canonical :initform t :initarg :canonical :accessor canonical) (indentation :initform nil :initarg :indentation :accessor indentation) @@ -90,77 +89,49 @@ (when (and (canonical instance) (indentation instance)) (error "Cannot indent XML in canonical mode"))) -;; WRITE-OCTET als generisch zu machen ist vielleicht nicht die schnellste -;; Loesung, aber die einfachste. -(defgeneric write-octet (octet sink)) - (defun make-buffer (&key (element-type '(unsigned-byte 8))) (make-array 1 :element-type element-type :adjustable t :fill-pointer 0)) -(defmethod write-octet :after (octet sink) - (with-slots (column) sink - (setf column (if (eql octet 10) 0 (1+ column))))) +;; total haesslich, aber die ystreams will ich im moment eigentlich nicht +;; dokumentieren +(macrolet ((define-maker (make-sink make-ystream &rest args) + `(defun ,make-sink (,@args &rest initargs) + (apply #'make-instance + 'sink + :ystream (,make-ystream ,@args) + initargs)))) + (define-maker make-octet-vector-sink make-octet-vector-ystream) + (define-maker make-octet-stream-sink make-octet-stream-ystream stream) + (define-maker make-rod-sink make-rod-ystream) + + #+rune-is-character + (define-maker make-character-stream-sink make-character-ystream stream) + + #-rune-is-character + (define-maker make-string-sink/utf8 make-string-ystream/utf8) + + #-rune-is-character + (define-maker make-character-stream-sink/utf8 + make-character-stream-ystream/utf8 + stream)) + +#+rune-is-character +(defun make-string-sink (&rest args) (apply #'make-rod-sink args)) -;; vector (octet) sinks - -(defclass vector-sink (sink) - ((target-vector :initform (make-buffer)))) - -(defun make-octet-vector-sink (&rest initargs) - (apply #'make-instance 'vector-sink initargs)) - -(defmethod write-octet (octet (sink vector-sink)) - (let ((target-vector (slot-value sink 'target-vector))) - (vector-push-extend octet target-vector (length target-vector)))) - -(defmethod sax:end-document ((sink vector-sink)) - (slot-value sink 'target-vector)) - - -;; character stream sinks - -(defclass character-stream-sink (sink) - ((target-stream :initarg :target-stream))) - -(defun make-character-stream-sink (character-stream &rest initargs) - (apply #'make-instance 'character-stream-sink - :target-stream character-stream - initargs)) - -(defmethod write-octet (octet (sink character-stream-sink)) - (write-char (code-char octet) (slot-value sink 'target-stream))) - -(defmethod sax:end-document ((sink character-stream-sink)) - (slot-value sink 'target-stream)) - - -;; octet stream sinks - -(defclass octet-stream-sink (sink) - ((target-stream :initarg :target-stream))) - -(defun make-octet-stream-sink (octet-stream &rest initargs) - (apply #'make-instance 'octet-stream-sink - :target-stream octet-stream - initargs)) - -(defmethod write-octet (octet (sink octet-stream-sink)) - (write-byte octet (slot-value sink 'target-stream))) - -(defmethod sax:end-document ((sink octet-stream-sink)) - (slot-value sink 'target-stream)) +(defmethod sax:end-document ((sink sink)) + (close-ystream (sink-ystream sink))) ;;;; doctype and notations (defmethod sax:start-document ((sink sink)) (unless (canonical sink) - (write-rod #"" sink) - (write-rune #/U+000A sink))) + (%write-rod #"" sink) + (%write-rune #/U+000A sink))) (defmethod sax:start-dtd ((sink sink) name public-id system-id) (setf (name-for-dtd sink) name) @@ -170,28 +141,28 @@ (defun ensure-doctype (sink &optional public-id system-id) (unless (have-doctype sink) (setf (have-doctype sink) t) - (write-rod #" sink) - (write-rune #/U+000A sink)) + (%write-rod #" PUBLIC '" sink) + (%write-rod public-id sink) + (%write-rod #"' '" sink) + (%write-rod system-id sink) + (%write-rune #/' sink))) + (%write-rune #/> sink) + (%write-rune #/U+000A sink)) (defmethod sax:unparsed-entity-declaration ((sink sink) name public-id system-id notation-name) (unless (and (canonical sink) (< (canonical sink) 3)) - (write-rod #" sink) - (write-rune #/U+000A sink))) + (%write-rod #" PUBLIC '" sink) + (%write-rod public-id sink) + (%write-rod #"' '" sink) + (%write-rod system-id sink) + (%write-rune #/' sink))) + (%write-rod #" NDATA " sink) + (%write-rod notation-name sink) + (%write-rune #/> sink) + (%write-rune #/U+000A sink))) (defmethod sax:external-entity-declaration ((sink sink) kind name public-id system-id) (when (canonical sink) (error "cannot serialize parsed entities in canonical mode")) - (write-rod #" sink) - (write-rune #/U+000A sink)) + (%write-rod #" PUBLIC '" sink) + (%write-rod public-id sink) + (%write-rod #"' '" sink) + (%write-rod system-id sink) + (%write-rune #/' sink))) + (%write-rune #/> sink) + (%write-rune #/U+000A sink)) (defmethod sax:internal-entity-declaration ((sink sink) kind name value) (when (canonical sink) (error "cannot serialize parsed entities in canonical mode")) - (write-rod #" sink) - (write-rune #/U+000A sink)) + (%write-rune #/\" sink) + (%write-rune #/> sink) + (%write-rune #/U+000A sink)) (defmethod sax:element-declaration ((sink sink) name model) (when (canonical sink) (error "cannot serialize element type declarations in canonical mode")) - (write-rod #" sink) - (write-rune #/U+000A sink)) + (%write-rune #/> sink) + (%write-rune #/U+000A sink)) (defmethod sax:attribute-declaration ((sink sink) ename aname type default) (when (canonical sink) (error "cannot serialize attribute type declarations in canonical mode")) - (write-rod #" sink) - (write-rune #/U+000A sink)) + (%write-rune #/\" sink))) + (%write-rune #/> sink) + (%write-rune #/U+000A sink)) (defmethod sax:end-dtd ((sink sink)) (when (have-doctype sink) - (write-rod #">" sink) - (write-rune #/U+000A sink))) + (%write-rod #">" sink) + (%write-rune #/U+000A sink))) ;;;; elements @@ -375,15 +346,15 @@ (have-gt nil)) (defun sink-fresh-line (sink) - (unless (zerop (column sink)) - (write-rune-0 10 sink) + (unless (zerop (ystream-column (sink-ystream sink))) + (%write-rune 10 sink) (indent sink))) (defun maybe-close-tag (sink) (let ((tag (car (stack sink)))) (when (and (tag-p tag) (not (tag-have-gt tag))) (setf (tag-have-gt tag) t) - (write-rune #/> sink)))) + (%write-rune #/> sink)))) (defmethod sax:start-element ((sink sink) namespace-uri local-name qname attributes) @@ -395,16 +366,16 @@ (when (indentation sink) (sink-fresh-line sink) (start-indentation-block sink)) - (write-rune #/< sink) - (write-rod qname sink) + (%write-rune #/< sink) + (%write-rod qname sink) (let ((atts (sort (copy-list attributes) #'rod< :key #'sax:attribute-qname))) (dolist (a atts) - (write-rune #/space sink) - (write-rod (sax:attribute-qname a) sink) - (write-rune #/= sink) - (write-rune #/\" sink) - (map nil (lambda (c) (unparse-datachar c sink)) (sax:attribute-value a)) - (write-rune #/\" sink))) + (%write-rune #/space sink) + (%write-rod (sax:attribute-qname a) sink) + (%write-rune #/= sink) + (%write-rune #/\" sink) + (unparse-string (sax:attribute-value a) sink) + (%write-rune #/\" sink))) (when (canonical sink) (maybe-close-tag sink))) @@ -423,21 +394,21 @@ (sink-fresh-line sink))) (cond ((tag-have-gt tag) - (write-rod '#.(string-rod "") sink)) + (%write-rod '#.(string-rod "") sink)) (t - (write-rod #"/>" sink))))) + (%write-rod #"/>" sink))))) (defmethod sax:processing-instruction ((sink sink) target data) (maybe-close-tag sink) (unless (rod-equal target '#.(string-rod "xml")) - (write-rod '#.(string-rod "") sink))) + (%write-rune #/space sink) + (%write-rod data sink)) + (%write-rod '#.(string-rod "?>") sink))) (defmethod sax:start-cdata ((sink sink)) (maybe-close-tag sink) @@ -451,17 +422,17 @@ (not (search #"]]" data))) (when (indentation sink) (sink-fresh-line sink)) - (write-rod #"" sink)) + (map nil (lambda (c) (%write-rune c sink)) data) + (%write-rod #"]]>" sink)) (t (if (indentation sink) (unparse-indented-text data sink) - (map nil (if (canonical sink) - (lambda (c) (unparse-datachar c sink)) - (lambda (c) (unparse-datachar-readable c sink))) - data))))) + (let ((y (sink-ystream sink))) + (if (canonical sink) + (loop for c across data do (unparse-datachar c y)) + (loop for c across data do (unparse-datachar-readable c y)))))))) (defmethod sax:end-cdata ((sink sink)) (unless (eq (pop (stack sink)) :cdata) @@ -469,7 +440,7 @@ (defun indent (sink) (dotimes (x (current-indentation sink)) - (write-rune-0 32 sink))) + (%write-rune 32 sink))) (defun start-indentation-block (sink) (incf (current-indentation sink) (indentation sink))) @@ -491,89 +462,47 @@ (let* ((w (or (position-if #'whitespacep data :start (1+ pos)) n)) (next (or (position-if-not #'whitespacep data :start w) n))) (when need-whitespace-p - (if (< (+ (column sink) w (- pos)) (width sink)) - (write-rune-0 32 sink) + (if (< (+ (ystream-column (sink-ystream sink)) w (- pos)) + (width sink)) + (%write-rune 32 sink) (sink-fresh-line sink))) (loop + with y = (sink-ystream sink) for i from pos below w do - (unparse-datachar-readable (elt data i) sink)) + (unparse-datachar-readable (elt data i) y)) (setf need-whitespace-p (< w n)) (setf pos next)))) (t - (write-rune-0 32 sink)))))) + (%write-rune 32 sink)))))) (defun unparse-string (str sink) - (map nil (lambda (c) (unparse-datachar c sink)) str)) + (let ((y (sink-ystream sink))) + (loop for rune across str do (unparse-datachar rune y)))) -(defun unparse-datachar (c sink) - (cond ((rune= c #/&) (write-rod '#.(string-rod "&") sink)) - ((rune= c #/<) (write-rod '#.(string-rod "<") sink)) - ((rune= c #/>) (write-rod '#.(string-rod ">") sink)) - ((rune= c #/\") (write-rod '#.(string-rod """) sink)) - ((rune= c #/U+0009) (write-rod '#.(string-rod " ") sink)) - ((rune= c #/U+000A) (write-rod '#.(string-rod " ") sink)) - ((rune= c #/U+000D) (write-rod '#.(string-rod " ") sink)) +(defun unparse-datachar (c ystream) + (cond ((rune= c #/&) (write-rod '#.(string-rod "&") ystream)) + ((rune= c #/<) (write-rod '#.(string-rod "<") ystream)) + ((rune= c #/>) (write-rod '#.(string-rod ">") ystream)) + ((rune= c #/\") (write-rod '#.(string-rod """) ystream)) + ((rune= c #/U+0009) (write-rod '#.(string-rod " ") ystream)) + ((rune= c #/U+000A) (write-rod '#.(string-rod " ") ystream)) + ((rune= c #/U+000D) (write-rod '#.(string-rod " ") ystream)) (t - (write-rune c sink)))) + (write-rune c ystream)))) -(defun unparse-datachar-readable (c sink) - (cond ((rune= c #/&) (write-rod '#.(string-rod "&") sink)) - ((rune= c #/<) (write-rod '#.(string-rod "<") sink)) - ((rune= c #/>) (write-rod '#.(string-rod ">") sink)) - ((rune= c #/\") (write-rod '#.(string-rod """) sink)) +(defun unparse-datachar-readable (c ystream) + (cond ((rune= c #/&) (write-rod '#.(string-rod "&") ystream)) + ((rune= c #/<) (write-rod '#.(string-rod "<") ystream)) + ((rune= c #/>) (write-rod '#.(string-rod ">") ystream)) + ((rune= c #/\") (write-rod '#.(string-rod """) ystream)) (t - (write-rune c sink)))) + (write-rune c ystream)))) +(defun %write-rune (c sink) + (write-rune c (sink-ystream sink))) -;;;; UTF-8 output for SINKs - -(defun write-rod (rod sink) - (map nil (lambda (c) (write-rune c sink)) rod)) - -(defun write-rune (rune sink) - (let ((code (rune-code rune))) - (with-slots (high-surrogate) sink - (cond - ((<= #xD800 code #xDBFF) - (setf high-surrogate code)) - ((<= #xDC00 code #xDFFF) - (let ((q (logior (ash (- high-surrogate #xD7C0) 10) - (- code #xDC00)))) - (write-rune-0 q sink)) - (setf high-surrogate nil)) - (t - (write-rune-0 code sink)))))) - -(defun write-rune-0 (code sink) - (labels ((wr (x) - (write-octet x sink))) - (cond ((<= #x00000000 code #x0000007F) - (wr code)) - ((<= #x00000080 code #x000007FF) - (wr (logior #b11000000 (ldb (byte 5 6) code))) - (wr (logior #b10000000 (ldb (byte 6 0) code)))) - ((<= #x00000800 code #x0000FFFF) - (wr (logior #b11100000 (ldb (byte 4 12) code))) - (wr (logior #b10000000 (ldb (byte 6 6) code))) - (wr (logior #b10000000 (ldb (byte 6 0) code)))) - ((<= #x00010000 code #x001FFFFF) - (wr (logior #b11110000 (ldb (byte 3 18) code))) - (wr (logior #b10000000 (ldb (byte 6 12) code))) - (wr (logior #b10000000 (ldb (byte 6 6) code))) - (wr (logior #b10000000 (ldb (byte 6 0) code)))) - ((<= #x00200000 code #x03FFFFFF) - (wr (logior #b11111000 (ldb (byte 2 24) code))) - (wr (logior #b10000000 (ldb (byte 6 18) code))) - (wr (logior #b10000000 (ldb (byte 6 12) code))) - (wr (logior #b10000000 (ldb (byte 6 6) code))) - (wr (logior #b10000000 (ldb (byte 6 0) code)))) - ((<= #x04000000 code #x7FFFFFFF) - (wr (logior #b11111100 (ldb (byte 1 30) code))) - (wr (logior #b10000000 (ldb (byte 6 24) code))) - (wr (logior #b10000000 (ldb (byte 6 18) code))) - (wr (logior #b10000000 (ldb (byte 6 12) code))) - (wr (logior #b10000000 (ldb (byte 6 6) code))) - (wr (logior #b10000000 (ldb (byte 6 0) code))))))) +(defun %write-rod (r sink) + (write-rod r (sink-ystream sink))) ;;;; convenience functions for DOMless XML serialization @@ -632,8 +561,9 @@ data) (defun rod-to-utf8-string (rod) - (with-output-to-string (s) - (write-rod rod (cxml:make-character-stream-sink s)))) + (let ((out (make-buffer :element-type 'character))) + (runes-to-utf8/adjustable-string out rod (length rod)) + out)) (defun utf8-string-to-rod (str) (let* ((bytes (map '(vector (unsigned-byte 8)) #'char-code str))