From 007b129d05a08121236e704e5ae7ba0ee8a8a81d Mon Sep 17 00:00:00 2001
From: dlichteblau Closure XML Parser
@@ -8,93 +38,953 @@
Closure XML was written by Gilbert Baumann
(unk6 at rz.uni-karlsruhe.de) as part of the Closure web
- browser.
+ browser.
Contributions to the parser by
-
-
+
- (SAX layer; namespace support)
-
- (conversion into an independent package; DOM bug fixing)
-
+ Send bug reports to cxml-devel@common-lisp.net + (list + information). +
+ +$ export CVSROOT=:pserver:anonymous@common-lisp.net:/project/cxml/cvsroot +$ cvs login +Logging in to :pserver:anonymous@common-lisp.net:2401/project/cxml/cvsroot +CVS password: anonymous +$ cvs co cxml+ +
+ (David's tla archive is out of date.) +
+ +patch-xyz (200-mm-dd)
+patch-357 (2004-10-10)
+patch-306 (2004-09-03)
+patch-279 (2004-05-11)
+patch-204
+patch-191 (2004-03-18)
+CXML provides three packages:
++ CXML should be portable to all Common Lisp implementations + supporting gray streams. Currently assumed to work are: +
++ Incomplete port: +
++ Optional configuration (skip this unless you know better): CXML + has full Unicode support -- even on Lisps without Unicode + strings. On Lisps that are not Unicode-aware, DOMString is + implemented as an array of character codes. CXML will auto-detect + at compile-time which string representation to use. To override + the auto-detection, you can set one of the features + :rune-is-character and :rune-is-octet before + loading cxml.asd. (fixme: feature + :rune-is-octet is of course misnamed, since it uses 16-bit + runes, not 8-bit runes. It will probably be renamed + to :rune-is-integer at some point.)
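As a minimal sketch of the override described above, assuming a fresh Lisp session in which cxml.asd has not been loaded yet, the feature can be pushed onto *features* before the system is compiled. The load form is the one given in the installation instructions; it is left as a comment so that the snippet also runs where CXML is not installed.

```lisp
;; Sketch: select the character-based rune representation explicitly.
;; The representation is chosen at compile-time, so this has to happen
;; before cxml.asd is loaded.
(pushnew :rune-is-character *features*)

;; Then load CXML as usual:
;; (asdf:operate 'asdf:load-op :cxml)
```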
-
-
- Configuration (optional). - CXML has full Unicode code support -- even on Lisps without - Unicode strings. On non-unicode aware Lisps, DOMString - is implemented as an array of character codes. If your Lisp - supports 16 bit characters natively, you can enable feature - RUNE-IS-CHARACTER to select an alternative - DOMString implementatation, which uses real characters - instead of characters codes. -
* (pushnew :rune-is-character *features*)+ Prerequisites. + CXML needs the puri library.
Compiling and loading CXML. Register the .asd file, e.g. by symlinking it: -
$ ln -sf `pwd`/cxms.asd /path/to/your/registry- Compile CXML using: -
* (asdf:operate 'asdf:load-op :cxml)+
$ ln -sf `pwd`/cxml.asd /path/to/your/registry/+
Then compile CXML using:
+* (asdf:operate 'asdf:load-op :cxml)
-
$ export CVSROOT=:pserver:anonymous@dev.w3.org:/sources/public - $ cvs login # password is "anonymous" - $ cvs co 2001/XML-Test-Suite/xmlconf - $ cvs co 2001/DOM-Test-Suite- Run all applicable tests using: -
* (xmlconf:run-all-tests "/path/to/2001/XML-Test-Suite/xmlconf/") - * (domtest:run-all-tests "/path/to/2001/2001/DOM-Test-Suite/")- (As always in Lisp, the trailing slash is significant.) -
- + You can then try the quick-start example.
+ + +Check out the XML and DOM testsuites:
+$ export CVSROOT=:pserver:anonymous@dev.w3.org:/sources/public +$ cvs login # password is "anonymous" +$ cvs co 2001/XML-Test-Suite/xmlconf +$ cvs co -D '2005-05-06 23:00' 2001/DOM-Test-Suite+
(Omit -D to get the latest version, which may not work + with cxml yet.)
+Usage and expected output:
+* (xmlconf:run-all-tests "/path/to/2001/XML-Test-Suite/xmlconf/") +0/556 tests failed; 1606 tests were skipped +* (domtest:run-all-tests "/path/to/2001/DOM-Test-Suite/") +0/450 tests failed; 71 tests were skipped+ +
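The trailing slash in these paths matters, because test/domtest.lisp merges relative directories such as tests/level1/core/ against the argument using merge-pathnames. The following sketch in plain Common Lisp illustrates the difference. The path is the placeholder used above, and the behaviour described in the comments assumes an implementation with Unix-style namestrings.

```lisp
;; With a trailing slash the argument names a directory, so the relative
;; directory is appended to it.
(defparameter *with-slash*
  (merge-pathnames "tests/level1/core/" "/path/to/2001/DOM-Test-Suite/"))

;; Without it, "DOM-Test-Suite" is parsed as a file name, and the relative
;; directory is appended to its parent directory instead.
(defparameter *without-slash*
  (merge-pathnames "tests/level1/core/" "/path/to/2001/DOM-Test-Suite"))

(print (pathname-directory *with-slash*))
(print (pathname-directory *without-slash*))
```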
fixme: Add an explanation of xml/sax-tests here.
+ ++ fixme My parser does not understand the current testsuite + anymore. To fix this problem, revert the affected files + manually after check-out: +
+ +$ cd 2001/XML-Test-Suite/xmlconf/ +xmltest$ patch -p0 -R </path/to/cxml/test/xmlconf-base.diff+ +
+ The log message for the changes reads "Removed unnecessary + xml:base attribute". If I understand correctly, only + DOM 3 parsers provide the baseURI attribute necessary for + understanding xmlconf.xml now. We don't have that + yet. +
+ + ++ (Compare also with Gilbert Baumann's older TODO list in + xml-parse.lisp.) +
+ + ++ Make sure to install and load cxml first. +
+ +Create a test file called example.xml:
+* (with-open-file (s "example.xml" :direction :output) + (write-string "<test a='b'><child/></test>" s))+ +
Parse example.xml into a DOM tree (read + more):
+* (cxml:parse-file "example.xml" (dom:make-dom-builder)) +#<DOM-IMPL::DOCUMENT @ #x72206172> +;; save result for later: +* (defparameter *example* *) +*EXAMPLE*+ +
Inspect the DOM tree (read more):
+* (dom:document-element *example*) +#<DOM-IMPL::ELEMENT test @ #x722b6ba2> +* (dom:tag-name (dom:document-element *example*)) +"test" +* (dom:child-nodes (dom:document-element *example*)) +#(#<DOM-IMPL::ELEMENT child @ #x722b6d8a>) +* (dom:get-attribute (dom:document-element *example*) "a") +"b"+ +
Serialize the DOM document back into a stream (read more):
+(cxml:unparse-document *example* *standard-output*) +<test a="b"><child></child></test>+ +
As an alternative to DOM, parse into xmls-compatible list + structure (read more):
+* (cxml:parse-file "example.xml" (cxml-xmls:make-xmls-builder))
+("test" (("a" "b")) ("child" NIL))
+
+
+ +
+ Common keyword arguments: +
++
+
+
(cxml:parse-file "test.xml" (dom:make-dom-builder))+ + +
+
Keyword arguments:
++ The following canonical values are allowed: +
++ With an indentation level, pretty-print the XML by + inserting additional whitespace. Note that indentation + changes the document model and should only be used if whitespace + does not matter to the application. +
++ unparse-document-to-octets returns an (unsigned-byte + 8) array, whereas unparse-document writes + characters. unparse-document is useful together + with with-output-to-string. However, note that the + resulting document in both cases is UTF-8 encoded, so the + characters written by unparse-document are really UTF-8 + bytes encoded as characters. +
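Since the characters written by unparse-document stand for UTF-8 bytes, a string collected with with-output-to-string can be turned into an octet vector by mapping each character to its code. The following is a minimal sketch in plain Common Lisp. The helper name chars-to-octets is hypothetical and not part of CXML, and the sketch assumes that every character code in the string is below 256.

```lisp
;; Hypothetical helper, not part of CXML.
(defun chars-to-octets (string)
  "Return an (unsigned-byte 8) vector holding the char-code of each character."
  (map '(vector (unsigned-byte 8)) #'char-code string))

;; Example with a short ASCII-only document:
(print (chars-to-octets "<a/>"))   ; prints #(60 97 47 62)
```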
+ ++
+ These functions provide the low-level mechanism used by the DOM + serialization functions. To serialize a document without building + its DOM tree first, create a sink handle and call SAX functions on that + handle. sax:end-document returns the serialized form of + the document described by the SAX events. +
+ ++
+ Example: +
+(with-xml-output (make-octet-stream-sink stream :indentation 2 :canonical nil) + (with-element "foo" + (attribute "xyz" "abc") + (with-element "bar" + (attribute "blub" "bla")) + (text "Hi there.")))+
+ Prints this to stream, which must be an + (unsigned-byte 8) stream: +
+<foo xyz="abc"> + <bar blub="bla"></bar> + Hi there. +</foo>+
+ (Note that these functions accept both strings and rods, so we + could write "foo" instead of #"foo" above.) +
+ ++
+ xhtmlgen is included as contrib/xhtmlgen.lisp in + the cxml distribution. Example: +
+(let ((sink (cxml:make-character-stream-sink *standard-output*))) + (sax:start-document sink) + (xhtml-generator:write-doctype sink) + (xhtml-generator:with-html sink + (:html + (:head + (:title "Titel")) + (:body + ((:p "style" "font-weight: bold") + "Inhalt") + (:ul + (:li "Eins") + (:li "Zwei") + (:li "Drei"))))) + (sax:end-document sink))+ + +
+
(let ((d (parse-file "~/test.xml" (dom:make-dom-builder))) + (x (parse-dtd-file "~/test.dtd"))) + (dom:map-document (cxml:make-validator x #"foo") d))+ +
+
+
+ Like other XML parsers written in Lisp, CXML can work with + documents represented as list structures. The specific model + implemented by cxml is compatible with the xmls parser. Xmls + list structures are a simpler and faster alternative to full DOM + document trees. They also serve as an example showing how to + implement user-defined document models as an independent layer + over the base parser (cf. xml/xmls-compat.lisp in + the cxml distribution). However, note that the list structures do + not include all information available in DOM documents and are + sometimes more difficult to work with since many DOM functions + cannot be implemented on them. +
++
+ Example: +
+(cxml:parse-file "test.xml" (cxml-xmls:make-xmls-builder))+
+
+ Use this function to serialize XMLS data. For example, we could + define a replacement for xmls:write-xml like this: +
+(defun write-xml (stream node &key indent) + (let ((sink (cxml:make-character-stream-sink + stream :canonical nil :indentation indent))) + (cxml-xmls:map-node sink node)))+
+
+ The node list's car can also be a cons of local name + and namespace prefix ns. + fixme: It is unclear to me how namespaces are meant to + work in xmls, since xmls documentation differs from how xmls + actually works in current releases. Usually applications need to + know both the namespace prefix and the namespace URI. We + currently follow the xmls implementation and use the + namespace prefix instead of following its documentation which + shows the URI. We do not follow xmls in munging xmlns attribute + values. Attributes themselves have namespaces and it is not clear + to me how that works in xmls. +
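To make the list layout concrete, here is a sketch of accessors in plain Common Lisp, applied to the node from the quick-start example. The helper names are hypothetical and not part of cxml-xmls. The sketch assumes that names are Lisp strings and that a namespaced name is a cons whose car is the local name.

```lisp
;; Hypothetical helpers for xmls-style nodes of the form
;;   (name ((attr-name attr-value) ...) child ...)

(defun node-local-name (node)
  "Return the local name, whether the car is a string or a (name . ns) cons."
  (let ((name (car node)))
    (if (consp name) (car name) name)))

(defun node-attributes (node)
  (cadr node))

(defun node-children (node)
  (cddr node))

(defun node-attribute (node name)
  "Return the value of the attribute called NAME, or NIL if there is none."
  (cadr (assoc name (node-attributes node) :test #'equal)))

(defparameter *node* '("test" (("a" "b")) ("child" NIL)))

(print (node-local-name *node*))      ; prints "test"
(print (node-attribute *node* "a"))   ; prints "b"
(print (node-children *node*))        ; prints (("child" NIL))
```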
++
+
+ + ++ As explained above, the XML parser handles character encoding and + uses 16bit strings internally. Instead of using characters and strings + it uses runes and rods. This is seen as a + feature, but can be inconvenient. +
++ Note that the recoder approach does not work with the DOM + builder, since DOM is specified to use UTF-16. +
++
+ Example. In a Lisp which ordinarily would use octet vector rods: +
+CL-USER(14): (cxml:parse-string "<test/>" (cxml-xmls:make-xmls-builder)) +(#(116 101 115 116) NIL)+
+ Use a SAX recoder to get strings instead: +
+CL-USER(17): (parse-string "<test/>" (cxml:make-recoder (cxml-xmls:make-xmls-builder)))
+("test" NIL)
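For a one-off conversion outside the parser, a rod made of character codes can also be turned into a string by hand. The sketch below is plain Common Lisp with a hypothetical helper name; test/domtest.lisp uses runes:rod-string for this purpose. It assumes that every code in the rod is a valid argument to code-char on the Lisp in question.

```lisp
;; Hypothetical helper, not part of CXML.
(defun codes-to-string (rod)
  "Convert a vector of character codes into a Lisp string."
  (map 'string #'code-char rod))

;; The rod from the example above:
(print (codes-to-string #(116 101 115 116)))   ; prints "test"
```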
+
+
+ + To avoid spending time parsing the same DTD over and over again, + CXML can cache DTD objects. The parser consults + cxml:*dtd-cache* whenever it is looking for an external + subset in a document which does not have an internal subset and + uses the cached DTD instance if one is present in the cache for + the System ID in question. +
++ Note that DTDs do not expire from the cache automatically. + (Future versions of CXML might introduce automatic checks for + outdated DTDs.) +
++
+
+
+
+
+
+
+ fixme: thread-safety +
+ + ++ External entities (for example, DTDs) are referred to using their + Public and System IDs. Usually the System ID, a URI, is used to + locate the entity. CXML itself handles only file://-URIs, but + many System IDs in practical use are http://-URIs. There are two + different mechanisms applications can use to allow CXML to locate + entities using arbitrary Public IDs or System IDs: +
++ This section describes XML Catalogs, the second solution. CXML + implements Oasis + XML Catalogs. +
++
+
+
+
+
+ Example: +
+* (setf cxml:*catalog* nil) +* (cxml:parse-file "test.xhtml" nil) +=> Error: URI scheme :HTTP not supported + +* (setf cxml:*catalog* (cxml:make-catalog)) +* (cxml:parse-file "test.xhtml" nil) +;; no error! +NIL+
+ Note that parsed catalog files are cached in the catalog object. + Cached catalog files do not expire automatically. To ensure that + all catalog files are parsed again, create a new catalog object. +
+ + ++ A SAX handler is an arbitrary object that implements some of the + generic functions in the SAX package. Note that no default + handler class is necessary, because all generic functions have default + methods which do nothing. SAX functions are: +
+ The entity declaration methods are similar to Java SAX + definitions, but parameter entities are distinguished from + general entities not by a % prefix to the name, but by + the kind argument, either :parameter or + :general. +
++ The arguments to sax:element-declaration and + sax:attribute-declaration differ significantly from their + Java counterparts. +
++ fixme: For more information on these functions refer to the docstrings. +
+ + + ++ CXML implements the DOM Level 1 Core interfaces. Explaining + DOM is better left to the specification, + so please refer to the official W3C documents for DOM. +
++ However, there is no "standard" DOM mapping for Lisp. DOM + is specified + in CORBA IDL, but it refrains from using object-oriented IDL + features, allowing for a much more natural Lisp implementation than + the ordinary IDL/Lisp mapping would. +
++ Differences between CXML's DOM and the direct IDL/Lisp mapping: +
+Example:
+XML(97): (dom:node-type
+           (dom:document-element
+             (cxml:parse-file "~/test.xml" (dom:make-dom-builder))))
+:ELEMENT
diff --git a/test/domtest.lisp b/test/domtest.lisp
index 596fd3a..478048a 100644
--- a/test/domtest.lisp
+++ b/test/domtest.lisp
@@ -1,5 +1,5 @@
 (defpackage :domtest
-  (:use :cl :xml)
+  (:use :cl :cxml)
   (:export #:run-all-tests))
 (defpackage :domtest-tests
   (:use))
@@ -117,7 +117,10 @@
         (write-char #\- out))))))
 
 (defun intern-dom (name)
-  (intern (replace-studly-caps name) :dom))
+  (setf name (replace-studly-caps name))
+  (when (eq :foo :FOO)
+    (setf name (string-upcase name)))
+  (intern name :dom))
 
 (defun child-elements (element)
   (map-child-elements 'list #'identity element))
@@ -167,7 +170,7 @@
 (defun read-members ()
   (let* ((pathname (merge-pathnames "patches/dom1-interfaces.xml" *directory*))
          (builder (dom:make-dom-builder))
-         (library (dom:document-element (xml:parse-file pathname builder)))
+         (library (dom:document-element (cxml:parse-file pathname builder)))
          (methods '())
          (fields '()))
     (do-child-elements (interface library :name "interface")
@@ -189,6 +192,7 @@
 (defun translate-condition (element)
   (string-case (tag-name element)
     ("equals" (translate-equals element))
+    ("notEquals" (translate-not-equals element))
     ("contentType" (translate-content-type element))
     ("hasFeature" (translate-has-feature element))
     ("implementationAttribute" (assert-have-implementation-attribute element))
@@ -197,6 +201,7 @@
     ("notNull" (translate-not-null element))
     ("or" (translate-or element))
     ("same" (translate-same element))
+    ("less" (translate-less element))
     (t (error "unknown condition: ~A" element))))
 
 (defun equalsp (a b test)
@@ -216,13 +221,20 @@
 
 (defun translate-equals (element)
   (with-attributes (|actual| |expected| |ignoreCase|) element
-    `(equalsp ,(%intern actual)
-              ,(parse-java-literal expected)
+    `(equalsp ,(%intern |actual|)
+              ,(parse-java-literal |expected|)
               ',(if (parse-java-literal |ignoreCase|) '%equal '%equal))))
 
+(defun translate-not-equals (element)
+  `(not ,(translate-equals element)))
+
 (defun translate-same (element)
   (with-attributes (|actual| |expected|) element
-    `(eql ,(%intern actual) ,(parse-java-literal expected))))
+    `(eql ,(%intern |actual|) ,(parse-java-literal |expected|))))
+
+(defun translate-less (element)
+  (with-attributes (|actual| |expected|) element
+    `(< ,(%intern |actual|) ,(parse-java-literal |expected|))))
 
 (defun translate-or (element)
   `(or ,@(map-child-elements 'list #'translate-condition element)))
@@ -257,6 +269,13 @@
   (with-attributes (|type|) element
     `(equal ,|type| "text/xml")))
 
+#-allegro
+(defun translate-uri-equals (element)
+  (declare (ignore element))
+  (warn "oops, assert-uri-equals needs Franz' URI package")
+  (throw 'give-up nil))
+
+#+allegro
 (defun translate-uri-equals (element)
   (with-attributes
       (|actual|
@@ -307,6 +326,7 @@
     ("assertTrue" (translate-assert-true element))
     ("assertFalse" (translate-assert-false element))
     ("assertURIEquals" (translate-assert-uri-equals element))
+    ("assign" (translate-assign element))
     ("for-each" (translate-for-each element))
     ("fail" (translate-fail element))
     ("hasFeature" (translate-has-feature element))
@@ -327,6 +347,10 @@
                 `(,fn ,(parse-java-literal |op1|)
                       ,(parse-java-literal |op2|)))))
 
+(defun translate-assign (element)
+  (with-attributes (|var| |value|) element
+    (maybe-setf (%intern |var|) (parse-java-literal |value|))))
+
 (defun translate-unary-assignment (fn element)
   (with-attributes (|var| |value|) element
     (maybe-setf (%intern |var|)
@@ -529,6 +553,8 @@
 (defun assert-have-implementation-attribute (element)
   (let ((attribute (runes:rod-string (dom:get-attribute element "name"))))
     (string-case attribute
+      ("validating"
+        (setf cxml::*validate* t))
       (t
         (format t "~&implementationAttribute ~A not supported, skipping test~%"
                 attribute)
@@ -536,14 +562,15 @@
 
 (defun slurp-test (pathname)
   (unless *fields*
-    (multiple-value-setq (*methods* *fields*) (read-members *directory*)))
+    (multiple-value-setq (*methods* *fields*) (read-members)))
   (catch 'give-up
     (let* ((builder (dom:make-dom-builder))
-           (test (dom:document-element (xml:parse-file pathname builder)))
+           (cxml::*validate* nil)         ;dom1.dtd is buggy
+           (test (dom:document-element (cxml:parse-file pathname builder)))
            title
            (bindings '())
            (code '()))
-      (declare (ignore title))
+      (declare (ignorable title))
       (do-child-elements (e test)
         (string-case (tag-name e)
           ("metadata"
@@ -580,33 +607,36 @@
   (setf name (runes:rod-string name))
   (let* ((directory (merge-pathnames "tests/level1/core/files/" *directory*))
          (document
-          (xml:parse-file
+          (cxml:parse-file
            (make-pathname :name name :type "xml" :defaults directory)
            (dom:make-dom-builder))))
     document))
 
 (defparameter *bad-tests*
-    '("hc_elementnormalize2.xml" "hc_nodereplacechildnewchildexists.xml"))
+    '("hc_nodereplacechildnewchildexists.xml"
+      "characterdatadeletedatanomodificationallowederr.xml"))
 
 (defun run-all-tests (*directory* &optional verbose)
-  (let* ((xml::*redefinition-warning* nil)
+  (let* ((cxml::*redefinition-warning* nil)
          (test-directory (merge-pathnames "tests/level1/core/" *directory*))
          (all-tests (merge-pathnames "alltests.xml" test-directory))
          (builder (dom:make-dom-builder))
-         (suite (dom:document-element (xml:parse-file all-tests builder)))
+         (suite (dom:document-element (cxml:parse-file all-tests builder)))
          (n 0)
          (i 0)
          (ntried 0)
          (nfailed 0))
     (do-child-elements (member suite)
       (unless
-          (member (runes:rod-string (dom:get-attribute member "href"))
-                  *bad-tests*
-                  :test 'equal)
+          (or (equal (dom:tag-name member) "metadata")
+              (member (runes:rod-string (dom:get-attribute member "href"))
+                      *bad-tests*
+                      :test 'equal))
         (incf n)))
     (do-child-elements (member suite)
       (let ((href (runes:rod-string (dom:get-attribute member "href"))))
-        (unless (member href *bad-tests* :test 'equal)
+        (unless (or (equal (dom:tag-name member) "metadata")
+                    (member href *bad-tests* :test 'equal))
           (format t "~&~D/~D ~A~%" i n href)
           (let ((lisp (slurp-test (merge-pathnames href test-directory))))
             (when verbose
@@ -615,7 +645,8 @@
               (incf ntried)
               (with-simple-restart (skip-test "Skip this test")
                 (handler-case
-                    (funcall (compile nil lisp))
+                    (let ((cxml::*validate* nil))
+                      (funcall (compile nil lisp)))
                   (serious-condition (c)
                     (incf nfailed)
                     (warn "test failed: ~A" c))))))
@@ -625,7 +656,8 @@
 
 (defun run-test (*directory* href)
   (let* ((test-directory (merge-pathnames "tests/level1/core/" *directory*))
-         (lisp (slurp-test (merge-pathnames href test-directory))))
+         (lisp (slurp-test (merge-pathnames href test-directory)))
+         (cxml::*validate* nil))
     (print lisp)
     (when lisp
       (funcall (compile nil lisp)))))