Update to version 1.2.12 from weitz.de

git-svn-id: svn://bknr.net/svn/trunk/thirdparty/cl-ppcre@1779 4281704c-cde7-0310-8518-8e2dc76b1ff0
Hans Huebner
2005-12-04 14:02:55 +00:00
parent 4122284075
commit bf6913769f
23 changed files with 1602 additions and 1121 deletions


@ -6,14 +6,12 @@
<title>CL-PPCRE - portable Perl-compatible regular expressions for Common Lisp</title>
<style type="text/css">
pre { padding:5px; background-color:#e0e0e0 }
a.none { text-decoration: none; color:black }
a.none:visited { text-decoration: none; color:black }
a.none:active { text-decoration: none; color:black }
a.none:hover { text-decoration: none; color:black }
a { text-decoration: none; }
a:visited { text-decoration: none; }
a:active { text-decoration: underline; }
a:hover { text-decoration: underline; }
a.none:hover { border:1px solid white; }
a { border:1px solid white; }
a:hover { border: 1px solid black; }
a.noborder { border:0px }
a.noborder:hover { border:0px }
</style>
</head>
@ -47,7 +45,7 @@ to CLISP's own regex implementation which is also written in
C.
<li>It is <b>portable</b>, i.e. the code aims to be strictly <a
href="http://www.lispworks.com/reference/HyperSpec/Front/index.htm">ANSI-compliant</a>. If
href="http://www.lispworks.com/documentation/HyperSpec/Front/index.htm">ANSI-compliant</a>. If
you encounter any deviations this is an error and should be
reported to <a
href="#mail">the mailing list</a>. CL-PPCRE has been
@ -55,16 +53,18 @@ successfully tested with the following Common Lisp implementations:
<ul>
<li><a href="http://www.franz.com/products/allegrocl/">Allegro Common Lisp</a> (6.2 trial on Gentoo Linux 1.1a)
<li><a href="http://clisp.sourceforge.net/">CLISP</a> (2.30 on Gentoo Linux 1.1a and 2.29 on Windows XP pro)
<li><a href="http://www.cons.org/cmucl/">CMUCL</a> (18e on Gentoo Linux 1.1a)
<li><a href="http://www.cormanlisp.com/">Corman Lisp</a> (2.5 on Windows XP pro)
<li><a href="http://ecls.sourceforge.net/">ECL</a> (0.9c on Gentoo Linux 1.1a)
<li><a href="http://www.digitool.com/">Macintosh Common Lisp</a> (4.3 demo on MacOS 9.1 - only tested with CL-PPCRE 0.1.x)
<li><a href="http://openmcl.clozure.com/">OpenMCL</a> (0.13.4 on MacOS X 10.2.2 - only tested with CL-PPCRE 0.1.x)
<li><a href="http://sbcl.sourceforge.net/">SBCL</a> (0.8.4 on Gentoo Linux 1.1a)
<li><a href="http://www.scieneer.com/scl/">Scieneer Common Lisp</a> (1.1.1 evaluation on Gentoo Linux 1.1a - only tested with CL-PPCRE 0.1.x)
<li><a href="http://www.lispworks.com/">Xanalys LispWorks</a> (4.2.7 professional on Gentoo Linux 1.1a and 4.3.6 professional on Windows XP pro)
<li><a href="http://www.franz.com/products/allegrocl/">Allegro Common Lisp</a>
<li><a href="http://armedbear.org/abcl.html">Armed Bear Common Lisp</a>
<li><a href="http://clisp.sourceforge.net/">CLISP</a>
<li><a href="http://www.cons.org/cmucl/">CMUCL</a>
<li><a href="http://www.cormanlisp.com/">Corman Lisp</a>
<li><a href="http://ecls.sourceforge.net/">ECL</a>
<li><a href="http://www.symbolics.com/">Genera</a>
<li><a href="http://www.digitool.com/">Macintosh Common Lisp</a>
<li><a href="http://openmcl.clozure.com/">OpenMCL</a>
<li><a href="http://sbcl.sourceforge.net/">SBCL</a>
<li><a href="http://www.scieneer.com/scl/">Scieneer Common Lisp</a>
<li><a href="http://www.lispworks.com/">LispWorks</a>
</ul>
@ -116,14 +116,26 @@ license</b></a> so you can basically do with it whatever you want.
</ul>
CL-PPCRE has been used successfully in various applications like <a
href="http://nostoc.stanford.edu/Docs/">BioLingua</a>, <a
href="http://www.hpc.unm.edu/~download/LoGS/">LoGS</a>, <a href="http://cafespot.net/">CafeSpot</a>, <a href="http://www.eboy.com/">Eboy</a>, or <a
href="http://weitz.de/regex-coach/">The Regex Coach</a>.
<p>
<font color=red>Download shortcut:</font> <a href="http://weitz.de/files/cl-ppcre.tar.gz">http://weitz.de/files/cl-ppcre.tar.gz</a>.
</blockquote>
<br>&nbsp;<br><h3><a class=none name="contents">Contents</a></h3>
<ol>
<li><a href="#howto">How to use CL-PPCRE</a>
<li><a href="#install">Download and installation</a>
<li><a href="#mail">Support and mailing lists</a>
<li><a href="#dict">The CL-PPCRE dictionary</a>
<ol>
<li><a href="#create-scanner1"><code>create-scanner</code></a> (for Perl regex strings)
<li><a href="#create-scanner"><code>create-scanner</code></a> (for Perl regex strings)
<li><a href="#create-scanner2"><code>create-scanner</code></a> (for parse trees)
<li><a href="#parse-tree-synonym"><code>parse-tree-synonym</code></a>
<li><a href="#define-parse-tree-synonym"><code>define-parse-tree-synonym</code></a>
<li><a href="#scan"><code>scan</code></a>
<li><a href="#scan-to-strings"><code>scan-to-strings</code></a>
<li><a href="#register-groups-bind"><code>register-groups-bind</code></a>
@ -148,8 +160,7 @@ license</b></a> so you can basically do with it whatever you want.
<li><a href="#ppcre-syntax-error-string"><code>ppcre-syntax-error-string</code></a>
<li><a href="#ppcre-syntax-error-pos"><code>ppcre-syntax-error-pos</code></a>
</ol>
<li><a href="#install">Download and installation</a>
<li><a href="#mail">Support and mailing lists</a>
<li><a href="#filters">Filters</a>
<li><a href="#test">Testing CL-PPCRE</a>
<li><a href="#perl">Compatibility with Perl</a>
<ol>
@ -173,19 +184,84 @@ license</b></a> so you can basically do with it whatever you want.
<li><a href="#backslash">Backslashes may confuse you...</a>
</ol>
<li><a href="#remarks">Remarks</a>
<li><a href="#allegro">AllegroCL compatibility mode</a>
<li><a href="#ack">Acknowledgements</a>
</ol>
<br>&nbsp;<br><h3><a class=none name="howto">How to use CL-PPCRE</a></h3>
<br>&nbsp;<br><h3><a name="install" class=none>Download and installation</a></h3>
CL-PPCRE together with this documentation can be downloaded from <a
href="http://weitz.de/files/cl-ppcre.tar.gz">http://weitz.de/files/cl-ppcre.tar.gz</a>. The
current version is 1.2.12. A <a
href="CHANGELOG">CHANGELOG</a> is available.
<p>
If you're on <a href="http://www.debian.org/">Debian</a> you should
probably use the <a
href="http://packages.debian.org/cgi-bin/search_packages.pl?keywords=cl-ppcre&searchon=names&version=all&release=all">cl-ppcre
Debian package</a> which is available thanks to <a href="http://pvaneynd.mailworks.org/">Peter van Eynde</a> and <a href="http://b9.com/">Kevin
Rosenberg</a>. There's also a port
for <a href="http://www.cliki.net/gentoo">Gentoo Linux</a> thanks to Matthew Kennedy and a <a href="http://www.freebsd.org/cgi/url.cgi?ports/textproc/cl-ppcre/pkg-descr">FreeBSD port</a> thanks to Henrik Motakef.
Installation via <a
href="http://www.cliki.net/asdf-install">asdf-install</a> should also
be possible.
<p>
CL-PPCRE comes with simple system definitions for <a
href="http://www.cliki.net/mk-defsystem">MK:DEFSYSTEM</a> and <a
href="http://www.cliki.net/asdf">asdf</a> so you can either adapt it
to your needs or just unpack the archive and from within the CL-PPCRE
directory start your Lisp image and evaluate the form
<code>(mk:compile-system &quot;cl-ppcre&quot;)</code> (or the
equivalent one for asdf) which should compile and load the whole
system.
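<p>
For asdf, the equivalent form might look like the following sketch. It assumes that the directory containing <code>cl-ppcre.asd</code> is already known to asdf, for example through <code>asdf:*central-registry*</code>:
<pre>
<font color=orange>;; assumption: asdf is loaded and can find cl-ppcre.asd</font>
(asdf:operate 'asdf:load-op :cl-ppcre)
</pre>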
<p>
If for some reason you don't want to use MK:DEFSYSTEM or asdf you
can just <code>LOAD</code> the file <code>load.lisp</code> or you
can also get away with something like this:
<pre>
(loop for name in '("packages" "specials" "util" "errors" "lexer"
                    "parser" "regex-class" "convert" "optimize"
                    "closures" "repetition-closures" "scanner" "api")
      do (compile-file (make-pathname :name name
                                      :type "lisp"))
         (load name))
</pre>
Note that on CL implementations which use the Python compiler
(i.e. CMUCL, SBCL, SCL) you can concatenate the compiled object files
to create one single object file which you can load afterwards:
<pre>
cat {packages,specials,util,errors,lexer,parser,regex-class,convert,optimize,closures,repetition-closures,scanner,api}.x86f > cl-ppcre.x86f
</pre>
(Replace &quot;.<code>x86f</code>&quot; with the correct suffix for
your platform.)
<p>
Note that there is <em>no</em> public CVS repository for CL-PPCRE - the repository at <a href="http://common-lisp.net/">common-lisp.net</a> is out of date and not in sync with the (current) version distributed from <a href="http://weitz.de/">weitz.de</a>.
<br>&nbsp;<br><h3><a name="mail" class=none>Support and mailing lists</a></h3>
For questions, bug reports, feature requests, improvements, or patches
please use the <a
href="http://common-lisp.net/mailman/listinfo/cl-ppcre-devel">cl-ppcre-devel
mailing list</a>. If you want to be notified about future releases
subscribe to the <a
href="http://common-lisp.net/mailman/listinfo/cl-ppcre-announce">cl-ppcre-announce
mailing list</a>. These mailing lists were made available thanks to
the services of <a href="http://common-lisp.net/">common-lisp.net</a>.
<br>&nbsp;<br><h3><a class=none name="dict">The CL-PPCRE dictionary</a></h3>
CL-PPCRE exports the following symbols:
<p><br>[Function]
<br><a class=none name="create-scanner1"><b>create-scanner</b> <i>string <tt>&amp;key</tt> case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive</i> =&gt; <i>scanner</i></a>
<p><br>[Method]
<br><a class=none name="create-scanner"><b>create-scanner</b> <i>(string string)<tt>&amp;key</tt> case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive</i> =&gt; <i>scanner</i></a>
<blockquote><br> Accepts a string which is a regular expression in
Perl syntax and returns a closure which will scan strings for this
regular expression. The mode keyboard arguments are equivalent to the
regular expression. The mode keyword arguments are equivalent to the
<code>&quot;imsx&quot;</code> modifiers in Perl. The
<code>destructive</code> keyword will be ignored.
<p>
@ -236,12 +312,17 @@ The keyword arguments are just for your
convenience. You can always use embedded modifiers like
<code>&quot;(?i-s)&quot;</code> instead.</blockquote>
<p><br>[Method]
<br><a class=none name="create-scanner"><b>create-scanner</b> <i>(function function)<tt>&amp;key</tt> case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive</i> =&gt; <i>scanner</i></a>
<blockquote><br>
In this case <code><i>function</i></code> should be a scanner returned by another invocation of <code>CREATE-SCANNER</code>. It will be returned as is.
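<p>
A minimal sketch of this behaviour, assuming CL-PPCRE is loaded (the variable name <code>scanner</code> is only illustrative):
<pre>
* (let ((scanner (cl-ppcre:create-scanner "a+b")))
    <font color=orange>;; a scanner given to CREATE-SCANNER should come back unchanged</font>
    (eq scanner (cl-ppcre:create-scanner scanner)))
T
</pre>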
</blockquote>
<p><br>[Function]
<br><a class=none name="create-scanner2"><b>create-scanner</b> <i>parse-tree <tt>&amp;key</tt> case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive</i> =&gt; <i>scanner</i></a>
<p><br>[Method]
<br><a class=none name="create-scanner2"><b>create-scanner</b> <i>(parse-tree t)<tt>&amp;key</tt> case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive</i> =&gt; <i>scanner</i></a>
<blockquote><br>
This is similar to <a
href="#create-scanner1"><code>CREATE-SCANNER</code></a> above but
href="#create-scanner"><code>CREATE-SCANNER</code></a> for regex strings above but
accepts a <em>parse tree</em> as its first argument. A parse tree is an S-expression
conforming to the following syntax:
@ -290,6 +371,11 @@ and <code>:NOT-SINGLE-LINE-MODE-P</code> are equivalent to Perl's
kept local to the innermost enclosing grouping or clustering
construct.
</li><li>All other symbols will signal an error of type <a
href="#ppcre-syntax-error"><code>PPCRE-SYNTAX-ERROR</code></a>
<em>unless</em> they are defined to be <a
href="#parse-tree-synonym"><em>parse tree synonyms</em></a>.
<li><code>(:FLAGS {&lt;modifier&gt;}*)</code> where
<code>&lt;modifier&gt;</code> is one of the modifier symbols from
above is used to group modifier symbols. The modifiers are applied
@ -357,6 +443,14 @@ beginning with 1.
<code>&lt;<i>number</i>&gt;</code> is a positive integer is a back-reference to a
register group.
<li><a class=none name="filterdef"><code>(:FILTER &lt;<i>function</i>&gt; <tt>&amp;optional</tt>
&lt;<i>length</i>&gt;)</code></a> where
<code>&lt;<i>function</i>&gt;</code> is a <a
href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#function_designator">function
designator</a> and <code>&lt;<i>length</i>&gt;</code> is a
non-negative integer or <code>NIL</code> is a user-defined <a
href="#filters">filter</a>.
<li><code>(:CHAR-CLASS|:INVERTED-CHAR-CLASS
{&lt;<i>item</i>&gt;}*)</code> where <code>&lt;<i>item</i>&gt;</code>
is either a character, a <em>character range</em>, or a symbol for a
@ -379,10 +473,10 @@ Perl regex strings when given to <code>CREATE-SCANNER</code>. To
circumvent this you can always use the equivalent parse tree <code>(:GROUP
&lt;<i>string</i>&gt;)</code> instead.
<p>
Note that currently <code>CREATE-SCANNER</code> doesn't always check
Note that <code>CREATE-SCANNER</code> doesn't always check
for the well-formedness of its first argument, i.e. you are expected
to provide <em>correct</em> parse trees. This will most likely change in
future releases.
to provide <em>correct</em> parse trees.
<p>
The usage of the keyword argument <code>extended-mode</code> obviously
doesn't make sense if <code>CREATE-SCANNER</code> is applied to parse
@ -418,6 +512,72 @@ regex strings to parse trees. Here are some examples:
(:SEQUENCE (:POSITIVE-LOOKAHEAD #\a) #\b)
</pre></blockquote>
<p><br>[Accessor]
<br><a class="none" name="parse-tree-synonym"><b>parse-tree-synonym</b> <i>symbol</i> =&gt; <i>parse-tree</i>
<br><tt>(setf (</tt><b>parse-tree-synonym</b> <i>symbol</i>) <i>new-parse-tree</i><tt>)</tt></a>
</p><blockquote><br>
Any symbol (unless it's a keyword with a special meaning in parse
trees) can be made a "synonym", i.e. an abbreviation, for another parse
tree by this accessor. <code>PARSE-TREE-SYNONYM</code> returns <code>NIL</code> if <code><i>symbol</i></code> isn't a synonym yet.
<p>
Here's an example:
</p><pre>* (cl-ppcre::parse-string "a*b+")
(:SEQUENCE (:GREEDY-REPETITION 0 NIL #\a) (:GREEDY-REPETITION 1 NIL #\b))
* (defun my-repetition (char min)
    `(:greedy-repetition ,min nil ,char))
MY-REPETITION
* (setf (parse-tree-synonym 'a*) (my-repetition #\a 0))
(:GREEDY-REPETITION 0 NIL #\a)
* (setf (parse-tree-synonym 'b+) (my-repetition #\b 1))
(:GREEDY-REPETITION 1 NIL #\b)
* (let ((scanner (create-scanner '(:sequence a* b+))))
    (dolist (string '("ab" "b" "aab" "a" "x"))
      (print (scan scanner string)))
    (values))
0
0
0
NIL
NIL
* (parse-tree-synonym 'a*)
(:GREEDY-REPETITION 0 NIL #\a)
* (parse-tree-synonym 'a+)
NIL
</pre></blockquote>
<p><br>[Macro]
<br><a class="none" name="define-parse-tree-synonym"><b>define-parse-tree-synonym</b> <i>name parse-tree</i> =&gt; <i>parse-tree</i></a>
</p><blockquote><br>
This is a convenience macro for parse tree synonyms defined as
<pre>(defmacro define-parse-tree-synonym (name parse-tree)
  `(eval-when (:compile-toplevel :load-toplevel :execute)
     (setf (parse-tree-synonym ',name) ',parse-tree)))
</pre>
so you can write code like this:
<pre>
(define-parse-tree-synonym a-z
  (:char-class (:range #\a #\z) (:range #\A #\Z)))
(define-parse-tree-synonym a-z*
  (:greedy-repetition 0 nil a-z))
(defun ascii-char-tester (string)
  (scan '(:sequence :start-anchor a-z* :end-anchor)
        string))
</pre></blockquote>
<p><br>
<b>For the rest of this section </b><code><i>regex</i></code><b> can
always be a string (which is interpreted as a Perl regular
@ -430,7 +590,7 @@ href="#scan"><code>SCAN</code></a><b>.</b>
<p><br>[Function]
<p><br>[Standard Generic Function]
<br><a class=none name="scan"><b>scan</b> <i>regex target-string <tt>&amp;key</tt> start end</i> =&gt; <i>match-start, match-end, reg-starts, reg-ends</i></a>
<blockquote><br>
@ -525,7 +685,15 @@ Examples:
Evaluates <code><i>statement*</i></code> with the variables in <code><i>var-list</i></code> bound to the
corresponding register groups after <code><i>target-string</i></code> has been matched
against <code><i>regex</i></code>, i.e. each variable is either
bound to a string or to <code>NIL</code>. If there is no match, the <code><i>statement*</i></code> forms are <em>not</em>
bound to a string or to <code>NIL</code>.
As a shortcut, the elements of <code><i>var-list</i></code> can also be lists of the form <code>(FN&nbsp;VAR)</code> where <code>VAR</code> is the variable symbol
and <code>FN</code> is a <a
href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#function_designator">function
designator</a> (which is evaluated) denoting a function which is to be applied to the string before the result is bound to <code>VAR</code>.
To make this even more convenient the form <code>(FN&nbsp;VAR1&nbsp;...VARn)</code> can be used as an abbreviation for
<code>(FN&nbsp;VAR1)&nbsp;...&nbsp;(FN&nbsp;VARn)</code>.
<p>
If there is no match, the <code><i>statement*</i></code> forms are <em>not</em>
executed. For each element of
<code><i>var-list</i></code> which is <code>NIL</code> there's no binding to the corresponding register
group. The number of variables in <code><i>var-list</i></code> must not be greater than
@ -537,15 +705,22 @@ share structure with <code><i>target-string</i></code>.
      (&quot;((a)|(b)|(c))+&quot; &quot;abababc&quot; :sharedp t)
    (list first second third fourth))
(&quot;c&quot; &quot;a&quot; &quot;b&quot; &quot;c&quot;)
* (register-groups-bind (nil second third fourth)
      <font color=orange>;; note that we don't bind the first and fifth register group</font>
      (&quot;((a)|(b)|(c))()+&quot; &quot;abababc&quot; :start 6)
    (list second third fourth))
(NIL NIL &quot;c&quot;)
* (register-groups-bind (first)
      (&quot;(a|b)+&quot; &quot;accc&quot; :start 1)
    (format t &quot;This will not be printed: ~A&quot; first))
NIL
* (register-groups-bind (fname lname (#'parse-integer date month year))
      (&quot;(\\w+)\\s+(\\w+)\\s+(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})&quot; &quot;Frank Zappa 21.12.1940&quot;)
    (list fname lname (encode-universal-time 0 0 0 date month year)))
("Frank" "Zappa" 1292882400)
</pre>
</blockquote>
@ -639,7 +814,7 @@ CROSSFOOT
6
</pre>
Of course, in real life you would do this with <a href="#do-matches"><code>DO-MATCHES</code></a> and use the <code><i>start</i></code> and <code><i>end</i></code> keyword parameters of <a href="http://www.lispworks.com/reference/HyperSpec/Body/f_parse_.htm"><code>PARSE-INTEGER</code></a>.</blockquote>
Of course, in real life you would do this with <a href="#do-matches"><code>DO-MATCHES</code></a> and use the <code><i>start</i></code> and <code><i>end</i></code> keyword parameters of <a href="http://www.lispworks.com/documentation/HyperSpec/Body/f_parse_.htm"><code>PARSE-INTEGER</code></a>.</blockquote>
<p><br>[Macro]
<br><a class=none name="do-register-groups"><b>do-register-groups</b> <i>var-list (regex target-string <tt>&amp;optional</tt> result-form <tt>&amp;key</tt> start end sharedp) declaration* statement*</i> =&gt; <i>result*</i></a>
@ -648,7 +823,7 @@ Of course, in real life you would do this with <a href="#do-matches"><code>DO-MA
Iterates over <code><i>target-string</i></code> and tries to match <code><i>regex</i></code> as often as
possible evaluating <code><i>statement*</i></code> with the variables in <code><i>var-list</i></code> bound to the
corresponding register groups for each match in turn, i.e. each
variable is either bound to a string or to <code>NIL</code>. The number of
variable is either bound to a string or to <code>NIL</code>. You can use the same shortcuts and abbreviations as in <a href="#register-groups-bind"><code>REGISTER-GROUPS-BIND</code></a>. The number of
variables in <code><i>var-list</i></code> must not be greater than the number of register
groups. For each element of
<code><i>var-list</i></code> which is <code>NIL</code> there's no binding to the corresponding register
@ -669,6 +844,14 @@ match. If <code><i>sharedp</i></code> is true, the substrings may share structur
(&quot;b&quot; NIL &quot;b&quot; NIL)
(&quot;c&quot; NIL NIL &quot;c&quot;)
NIL
* (let (result)
    (do-register-groups ((#'parse-integer n) (#'intern sign) whitespace)
        (&quot;(\\d+)|(\\+|-|\\*|/)|(\\s+)&quot; &quot;12*15 - 42/3&quot;)
      (unless whitespace
        (push (or n sign) result)))
    (nreverse result))
(12 * 15 - 42 / 3)
</pre>
</blockquote>
@ -787,7 +970,7 @@ frob")
<p><br>[Function]
<br><a class=none name="regex-replace"><b>regex-replace</b> <i>regex target-string replacement <tt>&amp;key</tt> start end preserve-case</i> =&gt; <i>list</i></a>
<br><a class=none name="regex-replace"><b>regex-replace</b> <i>regex target-string replacement <tt>&amp;key</tt> start end preserve-case simple-calls</i> =&gt; <i>list</i></a>
<blockquote><br> Try to match <code><i>target-string</i></code>
between <code><i>start</i></code> and <code><i>end</i></code> against
@ -804,7 +987,7 @@ match, <code>&quot;\`&quot;</code> for the part of
<code>N</code>th register where <code>N</code> is a positive integer.
<p>
<code><i>replacement</i></code> can also be a <a
href="http://www.lispworks.com/reference/HyperSpec/Body/26_glo_f.htm#function_designator">function
href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#function_designator">function
designator</a> in which case the match will be replaced with the
result of calling the function designated by
<code><i>replacement</i></code> with the arguments
@ -816,6 +999,15 @@ result of calling the function designated by
positions of matched registers (or <code>NIL</code>) - the meaning of
the other arguments should be obvious.)
<p>
If <code><i>simple-calls</i></code> is true, a function designated by
<code><i>replacement</i></code> will instead be called with the
arguments <code><i>match</i></code>, <code><i>register-1</i></code>,
..., <code><i>register-n</i></code> where <code><i>match</i></code> is
the whole match as a string and <code><i>register-1</i></code> to
<code><i>register-n</i></code> are the matched registers, also as
strings (or <code>NIL</code>). Note that these strings share structure with
<code><i>target-string</i></code> so you must not modify them.
<p>
Finally, <code><i>replacement</i></code> can be a list where each
element is a string (which will be inserted verbatim), one of the
symbols <code>:match</code>, <code>:before-match</code>, or
@ -829,7 +1021,7 @@ If <code><i>preserve-case</i></code> is true (default is
<code>NIL</code>), the replacement will try to preserve the case (all
upper case, all lower case, or capitalized) of the match. The result
will always be a <a
href="http://www.lispworks.com/reference/HyperSpec/Body/26_glo_f.htm#fresh">fresh</a>
href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#fresh">fresh</a>
string, even if <code><i>regex</i></code> doesn't match.
<p>
Examples:
@ -860,7 +1052,7 @@ Examples:
<p><br>[Function]
<br><a class=none name="regex-replace-all"><b>regex-replace-all</b> <i>regex target-string replacement <tt>&amp;key</tt> start end preserve-case</i> =&gt; <i>list</i></a>
<br><a class=none name="regex-replace-all"><b>regex-replace-all</b> <i>regex target-string replacement <tt>&amp;key</tt> start end preserve-case simple-calls</i> =&gt; <i>list</i></a>
<blockquote><br>
Like <a href="#regex-replace"><code>REGEX-REPLACE</code></a> but replaces all matches.
@ -912,6 +1104,34 @@ HOW-MANY
"foo{...}bar{.....}{..}baz{....}frob"
(list "[" 'how-many " dots]"))
"foo[3 dots]bar[5 dots][2 dots]baz[4 dots]frob"
* (let ((qp-regex (cl-ppcre:create-scanner "[\\x80-\\xff]")))
    (defun encode-quoted-printable (string)
      "Convert 8-bit string to quoted-printable representation.
Version using SIMPLE-CALLS keyword argument."
      <font color=orange>;; won't work for Corman Lisp because non-ASCII characters aren't 8-bit there</font>
      (flet ((convert (match)
               (format nil "=~2,'0x" (char-code (char match 0)))))
        (cl-ppcre:regex-replace-all qp-regex string #'convert
                                    :simple-calls t))))
Converted ENCODE-QUOTED-PRINTABLE.
ENCODE-QUOTED-PRINTABLE
* (encode-quoted-printable "F&ecirc;te S&oslash;rensen na&iuml;ve H&uuml;hner Stra&szlig;e")
"F=EAte S=F8rensen na=EFve H=FChner Stra=DFe"
* (defun how-many (match first-register)
    (declare (ignore match))
    (format nil "~A" (length first-register)))
HOW-MANY
* (cl-ppcre:regex-replace-all "{(.+?)}"
                              "foo{...}bar{.....}{..}baz{....}frob"
                              (list "[" 'how-many " dots]")
                              :simple-calls t)
"foo[3 dots]bar[5 dots][2 dots]baz[4 dots]frob"
</pre></blockquote>
<p><br>[Function]
@ -919,7 +1139,7 @@ HOW-MANY
<blockquote><br>
Like <a
href="http://www.lispworks.com/reference/HyperSpec/Body/f_apropo.htm"><code>APROPOS</code></a>
href="http://www.lispworks.com/documentation/HyperSpec/Body/f_apropo.htm"><code>APROPOS</code></a>
but searches for interned symbols which match the regular expression
<code><i>regex</i></code>. The output is implementation-dependent. If
<code><i>case-insensitive</i></code> is true (which is the default)
@ -983,7 +1203,7 @@ FOOBOO [variable] value: 43
<blockquote><br>
Like <a
href="http://www.lispworks.com/reference/HyperSpec/Body/f_apropo.htm"><code>APROPOS-LIST</code></a>
href="http://www.lispworks.com/documentation/HyperSpec/Body/f_apropo.htm"><code>APROPOS-LIST</code></a>
but searches for interned symbols which match the regular expression
<code><i>regex</i></code>. If <code><i>case-insensitive</i></code> is
true (which is the default) and <code><i>regex</i></code> isn't
@ -1001,18 +1221,18 @@ Example (continued from above):
<blockquote><br>This variable controls whether scanners take into
account all characters of your CL implementation or only those the <a
href="http://www.lispworks.com/reference/HyperSpec/Body/f_char_c.htm#char-code"><code>CHAR-CODE</code></a>
href="http://www.lispworks.com/documentation/HyperSpec/Body/f_char_c.htm#char-code"><code>CHAR-CODE</code></a>
of which is not larger than its value. It is only relevant if the
regular expression contains certain character classes. The default is
<a
href="http://www.lispworks.com/reference/HyperSpec/Body/v_char_c.htm"><code>CHAR-CODE-LIMIT</code></a>,
href="http://www.lispworks.com/documentation/HyperSpec/Body/v_char_c.htm"><code>CHAR-CODE-LIMIT</code></a>,
and you might see significant speed and space improvements during
scanner <em>creation</em> if, say, your target strings only contain <a
href="http://wwwwbs.cs.tu-berlin.de/user/czyborra/charsets/">ISO-8859-1</a>
characters and you're using an implementation like AllegroCL,
LispWorks, or CLISP where <code>CHAR-CODE-LIMIT</code> has a value
much higher than 255. The <a href="#test">test suite</a> will
automatically set <code>*REGEX-CHAR-CODE-LIMIT*</code> to 255 while
CLISP, LispWorks, or SBCL where <code>CHAR-CODE-LIMIT</code> has a value
much higher than 256. The <a href="#test">test suite</a> will
automatically set <code>*REGEX-CHAR-CODE-LIMIT*</code> to 256 while
you're running the default test.
<p>
Here's an example with LispWorks:
@ -1028,8 +1248,8 @@ Allocation = 546600 bytes standard / 2162611 bytes fixlen
0 Page faults
#&lt;closure 20654AF2&gt;
CL-USER 24 > (time (let ((cl-ppcre:*regex-char-code-limit* 255)) (cl-ppcre:create-scanner "[3\\D]")))
Timing the evaluation of (LET ((CL-PPCRE:*REGEX-CHAR-CODE-LIMIT* 255)) (CL-PPCRE:CREATE-SCANNER "[3\\D]"))
CL-USER 24 > (time (let ((cl-ppcre:*regex-char-code-limit* 256)) (cl-ppcre:create-scanner "[3\\D]")))
Timing the evaluation of (LET ((CL-PPCRE:*REGEX-CHAR-CODE-LIMIT* 256)) (CL-PPCRE:CREATE-SCANNER "[3\\D]"))
user time = 0.000
system time = 0.000
@ -1042,7 +1262,7 @@ Allocation = 3336 bytes standard / 8338 bytes fixlen
Note: Due to the nature of <code>LOAD-TIME-VALUE</code> and the <a
href="#compiler-macro">compiler macro for <code>SCAN</code></a> some
scanners might be created in a <a
href="http://www.lispworks.com/reference/HyperSpec/Body/26_glo_n.htm#null_lexical_environment">null
href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_n.htm#null_lexical_environment">null
lexical environment</a> at load time or at compile time so be careful
to which value <code>*REGEX-CHAR-CODE-LIMIT*</code> is bound at that
time. The default value should always yield correct results unless you
@ -1052,14 +1272,14 @@ play dirty tricks with implementation-dependent behaviour, though.</blockquote>
<br><a class=none name="use-bmh-matchers"><b>*use-bmh-matchers*</b></a>
<blockquote><br>Usually, the scanners created by <a
href="#create-scanner1"><code>CREATE-SCANNER</code></a> (or
href="#create-scanner"><code>CREATE-SCANNER</code></a> (or
implicitly by other functions and macros) will use fast <a
href="http://www-igm.univ-mlv.fr/~lecroq/string/node18.html">Boyer-Moore-Horspool
matchers</a> to check for constant strings at the start or end of the
regular expression. If <code>*USE-BMH-MATCHERS*</code> is
<code>NIL</code> (the default is <code>T</code>), the standard
function <a
href="http://www.lispworks.com/reference/HyperSpec/Body/f_search.htm"><code>SEARCH</code></a>
href="http://www.lispworks.com/documentation/HyperSpec/Body/f_search.htm"><code>SEARCH</code></a>
will be used instead. This will usually be a bit slower but can save
lots of space if you're storing many scanners. The <a
href="#test">test suite</a> will automatically set
@ -1069,7 +1289,7 @@ the default test.
Note: Due to the nature of <code>LOAD-TIME-VALUE</code> and the <a
href="#compiler-macro">compiler macro for <code>SCAN</code></a> some
scanners might be created in a <a
href="http://www.lispworks.com/reference/HyperSpec/Body/26_glo_n.htm#null_lexical_environment">null
href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_n.htm#null_lexical_environment">null
lexical environment</a> at load time or at compile time so be careful
to which value <code>*USE-BMH-MATCHERS*</code> is bound at that
time.</blockquote>
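<blockquote><br>
As a rough sketch, a scanner that uses <code>SEARCH</code> instead of a BMH matcher can be created by binding the variable around the call at run time. The name <code>*small-scanner*</code> is made up for this example:
<pre>
* (defparameter *small-scanner*
    <font color=orange>;; the binding is only in effect while the scanner is created</font>
    (let ((cl-ppcre:*use-bmh-matchers* nil))
      (cl-ppcre:create-scanner "foobar")))
*SMALL-SCANNER*
* (cl-ppcre:scan *small-scanner* "xxfoobar")
2
8
#()
#()
</pre>
The scanner is created explicitly here, so the caveat about <code>LOAD-TIME-VALUE</code> and the compiler macro does not apply to this form.
</blockquote>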
@ -1134,7 +1354,7 @@ href="#*allow-quoting*"><code>*ALLOW-QUOTING*</code></a> is
non-word characters (everything except ASCII characters, digits and
underline) of <code>STRING</code> are quoted by prepending a
backslash similar to Perl's <code>quotemeta</code> function. It always returns a <a
href="http://www.lispworks.com/reference/HyperSpec/Body/26_glo_f.htm#fresh">fresh</a>
href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#fresh">fresh</a>
string.
<pre>
* (cl-ppcre:quote-meta-chars &quot;[a-z]*&quot;)
@ -1147,7 +1367,7 @@ string.
<blockquote><br>
Every error signaled by CL-PPCRE is of type
<code>PPCRE-ERROR</code>. This is a direct subtype of <a
href="http://www.lispworks.com/reference/HyperSpec/Body/e_smp_er.htm"><code>SIMPLE-ERROR</code></a>
href="http://www.lispworks.com/documentation/HyperSpec/Body/e_smp_er.htm"><code>SIMPLE-ERROR</code></a>
without any additional slots or options.
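<p>
This means that one handler for <code>PPCRE-ERROR</code> is enough to catch everything CL-PPCRE signals. A small sketch, using a regex string with an unbalanced parenthesis as the (assumed) trigger:
<pre>
* (handler-case
      (cl-ppcre:create-scanner "(a")
    <font color=orange>;; a syntax error is expected to be a PPCRE-ERROR as well</font>
    (cl-ppcre:ppcre-error (condition)
      (typep condition 'cl-ppcre:ppcre-syntax-error)))
T
</pre>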
</blockquote>
@ -1210,7 +1430,7 @@ encountered (or <code>NIL</code> if the error happened while trying to
convert a parse tree). This might be particularly useful when <a
href="#*allow-quoting*"><code>*ALLOW-QUOTING*</code></a> is
<em>true</em> because in this case the offending string might not be the one you gave to the <a
href="#create-scanner1"><code>CREATE-SCANNER</code></a> function.
href="#create-scanner"><code>CREATE-SCANNER</code></a> function.
</blockquote>
<p><br>[Function]
@ -1225,69 +1445,185 @@ convert a parse tree).
</blockquote>
<br>&nbsp;<br><h3><a name="install" class=none>Download and installation</a></h3>
<br>&nbsp;<br><h3><a name="filters" class=none>Filters</a></h3>
CL-PPCRE together with this documentation can be downloaded from <a
href="http://weitz.de/files/cl-ppcre.tgz">http://weitz.de/files/cl-ppcre.tgz</a>. The
current version is 0.7.4 - older versions are
available for download through URLs like
<code>http://weitz.de/files/cl-ppcre-&lt;version&gt;.tgz</code>. A <a
href="CHANGELOG">CHANGELOG</a> is available.
Because several users have asked for it, CL-PPCRE now offers
&quot;filters&quot; (see <a href="#filterdef">above</a> for syntax)
which are basically arbitrary, user-defined functions that can act as
regex building blocks. Filters can only be used within <a
href="#create-scanner2">parse trees</a>, not within Perl regex
strings.
<p>
If you're on <a href="http://www.debian.org/">Debian</a> you should
probably use the <a
href="http://packages.debian.org/cgi-bin/search_packages.pl?keywords=cl-ppcre&searchon=names&version=all&release=all">cl-ppcre
Debian package</a> which is available thanks to <a href="http://b9.com/">Kevin
Rosenberg</a>. There's also a port
for <a href="http://www.cliki.net/gentoo">Gentoo Linux</a> thanks to Matthew Kennedy and a <a href="http://www.freebsd.org/cgi/url.cgi?ports/textproc/cl-ppcre/pkg-descr">FreeBSD port</a> thanks to Henrik Motakef.
Installation via <a
href="http://www.cliki.net/asdf-install">asdf-install</a> should be
possible as well.
Note that filters are currently considered an experimental feature and
their API might change in the future.
<p>
CL-PPCRE comes with simple system definitions for <a
href="http://www.cliki.net/mk-defsystem">MK:DEFSYSTEM</a> and <a
href="http://www.cliki.net/asdf">asdf</a> so you can either adapt it
to your needs or just unpack the archive and from within the CL-PPCRE
directory start your Lisp image and evaluate the form
<code>(mk:compile-system &quot;cl-ppcre&quot;)</code> (or the
equivalent one for asdf) which should compile and load the whole
system.
A filter is defined by its <em>filter function</em> which must be a
function of one argument. During the matching process this function
might be called once or several times or it might not be called at
all. If it's called its argument is an integer <code><i>pos</i></code>
which is the current position within the target string. The filter can
either return <code>NIL</code> (which means that the subexpression
represented by this filter didn't match) or an integer not smaller
than <code><i>pos</i></code> for success. A zero-length assertion
should return <code><i>pos</i></code> itself while a filter which
wants to consume <code>N</code> characters should return
<code>(+&nbsp;POS&nbsp;N)</code>.
<p>
If for some reason you don't want to use MK:DEFSYSTEM or asdf you
can just <code>LOAD</code> the file <code>load.lisp</code> or you
can also get away with something like this:
If you supply the optional value <code><i>length</i></code> and it is
not <code>NIL</code> then this is a promise to the regex engine that
your filter will <em>always</em> consume <em>exactly</em>
<code><i>length</i></code> characters. The regex engine might use this
information for optimization purposes but it is otherwise irrelevant
to the outcome of the matching process.
<p>
The filter function can access the following special variables from
its code body:
<ul>
<li><code>CL-PPCRE::*STRING*</code>: The target (a string) of the
current matching process.
<li><code>CL-PPCRE::*START-POS*</code> and
<code>CL-PPCRE::*END-POS*</code>: The start and end indices (integers)
of the current matching process. These correspond to the
<code>START</code> and <code>END</code> keyword parameters of <a
href="#scan"><code>SCAN</code></a>.
<li><code>CL-PPCRE::*REAL-START-POS*</code>: The initial starting
position. This is only relevant for repeated scans (as in <a
href="#do-scans"><code>DO-SCANS</code></a>) where
<code>CL-PPCRE::*START-POS*</code> will be moved forward while
<code>CL-PPCRE::*REAL-START-POS*</code> won't. For normal scans the
value of this variable is <code>NIL</code>.
<li><CODE>CL-PPCRE::*REG-STARTS*</CODE> and
<CODE>CL-PPCRE::*REG-ENDS*</CODE>: Two simple vectors which denote the
start and end indices of registers within the regular expression. The
first register is indexed by&nbsp;0. If a register hasn't matched yet
then its corresponding entry in <CODE>CL-PPCRE::*REG-STARTS*</CODE> is
<code>NIL</code>.
</ul>
These variables should be considered read-only. Do <em>not</em> change
these values unless you really know what you're doing!
<p>
Note that the names of the variables are not exported from the
<code>CL-PPCRE</code> package because there's currently no guarantee
that they will be available in future releases.
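<p>
As a rough sketch of the <code><i>length</i></code> promise, consider
the hypothetical filter <code>TWO-DIGITS</code> (a name made up for
this illustration). Whenever it succeeds it consumes exactly two
characters, so it should be safe to declare a length of&nbsp;2. It reads
the unexported variables described above:
<pre>
(defun two-digits (pos)
  &quot;Succeed only if the next two characters are digits and consume
them.&quot;
  (let ((end (+ pos 2)))
    (and (&lt;= end cl-ppcre::*end-pos*)
         (digit-char-p (char cl-ppcre::*string* pos))
         (digit-char-p (char cl-ppcre::*string* (1+ pos)))
         ;; success: return the position after the consumed characters
         end)))

(cl-ppcre:scan '(:sequence &quot;v&quot; (:filter two-digits 2)) &quot;rev42&quot;)
</pre>
The call should match <code>&quot;v42&quot;</code>, i.e. the part of the
target string from index&nbsp;2 to index&nbsp;5.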
<p>
Here are some filter examples:
<pre>
(loop for name in '("packages" "specials" "util" "errors" "lexer"
"parser" "regex-class" "convert" "optimize"
"closures" "repetition-closures" "scanner" "api")
do (compile-file (make-pathname :name name
:type "lisp"))
(load name))
* (defun my-info-filter (pos)
&quot;Show some info about the matching process.&quot;
(format t &quot;Called at position ~A~%&quot; pos)
(loop with dim = (array-dimension cl-ppcre::*reg-starts* 0)
for i below dim
for reg-start = (aref cl-ppcre::*reg-starts* i)
for reg-end = (aref cl-ppcre::*reg-ends* i)
do (format t &quot;Register ~A is currently &quot; (1+ i))
when reg-start
do (write-char #\')
(write-string cl-ppcre::*string* nil
:start reg-start :end reg-end)
(write-char #\')
else
do (write-string &quot;unbound&quot;)
do (terpri))
(terpri)
pos)
MY-INFO-FILTER
* (scan '(:sequence
(:register
(:greedy-repetition 0 nil
(:char-class (:range #\a #\z))))
(:filter my-info-filter 0) &quot;X&quot;)
&quot;bYcdeX&quot;)
Called at position 1
Register 1 is currently 'b'
Called at position 0
Register 1 is currently ''
Called at position 1
Register 1 is currently ''
Called at position 5
Register 1 is currently 'cde'
2
6
#(2)
#(5)
* (scan '(:sequence
(:register
(:greedy-repetition 0 nil
(:char-class (:range #\a #\z))))
(:filter my-info-filter 0) &quot;X&quot;)
&quot;bYcdeZ&quot;)
NIL
* (defun my-weird-filter (pos)
&quot;Only match at this point if either pos is odd and the character
we're looking at is lowercase or if pos is even and the next two
characters we're looking at are uppercase. Consume these characters if
there's a match.&quot;
(format t &quot;Trying at position ~A~%&quot; pos)
(cond ((and (oddp pos)
(&lt; pos cl-ppcre::*end-pos*)
(lower-case-p (char cl-ppcre::*string* pos)))
(1+ pos))
((and (evenp pos)
(&lt; (1+ pos) cl-ppcre::*end-pos*)
(upper-case-p (char cl-ppcre::*string* pos))
(upper-case-p (char cl-ppcre::*string* (1+ pos))))
(+ pos 2))
(t nil)))
MY-WEIRD-FILTER
* (defparameter *weird-regex*
`(:sequence &quot;+&quot; (:filter ,#'my-weird-filter) &quot;+&quot;))
*WEIRD-REGEX*
* (scan *weird-regex* &quot;+A++a+AA+&quot;)
Trying at position 1
Trying at position 3
Trying at position 4
Trying at position 6
5
9
#()
#()
* (fmakunbound 'my-weird-filter)
MY-WEIRD-FILTER
* (scan *weird-regex* &quot;+A++a+AA+&quot;)
Trying at position 1
Trying at position 3
Trying at position 4
Trying at position 6
5
9
#()
#()
</pre>
Note that on CL implementations which use the Python compiler
(i.e. CMUCL, SBCL, SCL) you can concatenate the compiled object files
to create one single object file which you can load afterwards:
Note that in the second call to <code>SCAN</code> our filter wasn't
invoked at all - it was optimized away by the regex engine because it
knew that it couldn't match. Also note that <code>*WEIRD-REGEX*</code>
still worked after we removed the global function definition of
<code>MY-WEIRD-FILTER</code> because the regular expression had
captured the original definition.
<pre>
cat {packages,specials,util,errors,lexer,parser,regex-class,convert,optimize,closures,repetition-closures,scanner,api}.x86f > cl-ppcre.x86f
</pre>
<p>
(Replace &quot;.<code>x86f</code>&quot; with the correct suffix for
your platform.)
<br>&nbsp;<br><h3><a name="mail" class=none>Support and mailing lists</a></h3>
For questions, bug reports, feature requests, improvements, or patches
please use the <a
href="http://common-lisp.net/mailman/listinfo/cl-ppcre-devel">cl-ppcre-devel
mailing list</a>. If you want to be notified about future releases
subscribe to the <a
href="http://common-lisp.net/mailman/listinfo/cl-ppcre-announce">cl-ppcre-announce
mailing list</a>. These mailing lists were made available thanks to
the services of <a href="http://common-lisp.net/">common-lisp.net</a>.
For more ideas about what you can do with filters see <a
href="http://common-lisp.net/pipermail/cl-ppcre-devel/2004-October/000069.html">this
thread</a> on the <a href="#mail">mailing list</a>.
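<p>
The simplest kind of filter is a zero-length assertion. The following
is only a sketch with a made-up function name,
<code>AT-EVEN-POSITION</code>, which returns <code><i>pos</i></code>
itself on success and therefore consumes nothing:
<pre>
(defun at-even-position (pos)
  &quot;Zero-length assertion: succeed only if POS is even.&quot;
  (and (evenp pos) pos))

;; both occurrences of b are at odd indices
(cl-ppcre:scan '(:sequence (:filter at-even-position 0) &quot;b&quot;) &quot;abab&quot;)

;; here the first b is at index 2
(cl-ppcre:scan '(:sequence (:filter at-even-position 0) &quot;b&quot;) &quot;aabab&quot;)
</pre>
The first call should return <code>NIL</code> while the second one
should match the <code>b</code> at index&nbsp;2.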
<br>&nbsp;<br><h3><a name="test" class=none>Testing CL-PPCRE</a></h3>
@ -1317,7 +1653,7 @@ NIL
* (cl-ppcre-test:test)
<font color=orange>;; ....
;; (a list of <a href="#perl">incompatibilities with Perl</a>)</font>
;; (a list of <a class=noborder href="#perl">incompatibilities with Perl</a>)</font>
</pre>
(If you're not using MK:DEFSYSTEM or asdf it suffices to build
@ -1398,7 +1734,7 @@ translates <code>&quot;\r&quot;</code> to <code>(CODE-CHAR
<h4><a name="alpha" class=none>What about <code>&quot;\w&quot;</code>?</a></h4>
CL-PPCRE uses <a
href="http://www.lispworks.com/reference/HyperSpec/Body/f_alphan.htm"><code>ALPHANUMERICP</code></a>
href="http://www.lispworks.com/documentation/HyperSpec/Body/f_alphan.htm"><code>ALPHANUMERICP</code></a>
to decide whether a character matches Perl's
<code>&quot;\w&quot;</code>, so depending on your CL implementation
you might encounter differences between Perl and CL-PPCRE when
@ -1410,7 +1746,7 @@ matching non-ASCII characters.
The <a href="#test">CL-PPCRE test suite</a> can also be used for
benchmarking purposes: If you call <code>perltest.pl</code> with a
command line argument it will be interpreted as the number of seconds
command line argument it will be interpreted as the minimum number of seconds
each test should run. Perl will time its tests accordingly and create
output which, when fed to <code>CL-PPCRE-TEST:TEST</code>, will result
in a benchmark. Here's an example:
@ -1554,13 +1890,13 @@ for you automatically.
<p>
However, beginning with version&nbsp;0.5.2, CL-PPCRE uses a <a
name="compiler-macro"
href="http://www.lispworks.com/reference/HyperSpec/Body/26_glo_c.htm#compiler_macro">compiler
href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_c.htm#compiler_macro">compiler
macro</a> and <a
href="http://www.lispworks.com/reference/HyperSpec/Body/s_ld_tim.htm"><code>LOAD-TIME-VALUE</code></a>
href="http://www.lispworks.com/documentation/HyperSpec/Body/s_ld_tim.htm"><code>LOAD-TIME-VALUE</code></a>
to make sure that the scanner is only built once if the first argument
to <a href="#scan"><code>SCAN</code></a>, <a href="#scan-to-strings"><code>SCAN-TO-STRINGS</code></a>, <a href="#split"><code>SPLIT</code></a>, or
<a href="#regex-replace"><code>REGEX-REPLACE</code></a> is a <a
href="http://www.lispworks.com/reference/HyperSpec/Body/26_glo_c.htm#constant_form">constant
to <a href="#scan"><code>SCAN</code></a>, <a href="#scan-to-strings"><code>SCAN-TO-STRINGS</code></a>, <a href="#split"><code>SPLIT</code></a>,
<a href="#regex-replace"><code>REGEX-REPLACE</code></a>, or <a href="#regex-replace-all"><code>REGEX-REPLACE-ALL</code></a> is a <a
href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_c.htm#constant_form">constant
form</a>. (But see the notes for <a
href="#regex-char-code-limit"><code>*REGEX-CHAR-CODE-LIMIT*</code></a> and
<a href="#use-bmh-matchers"><code>*USE-BMH-MATCHERS*</code></a>.)
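<p>
To illustrate the difference between constant and non-constant regex
arguments, here is a small sketch. The function and variable names are
hypothetical and chosen only for this example:
<pre>
(defun has-number-p (string)
  ;; the regex is a constant form, so the compiler macro can arrange
  ;; for the scanner to be built only once
  (cl-ppcre:scan &quot;\\d+&quot; string))

(defun matches-p (regex string)
  ;; REGEX is only known at run time - if it is a string a new scanner
  ;; has to be built on each call
  (cl-ppcre:scan regex string))

;; build the scanner once by hand and reuse it
(defparameter *number-scanner* (cl-ppcre:create-scanner &quot;\\d+&quot;))

(matches-p *number-scanner* &quot;abc 123&quot;)
</pre>
Passing an already created scanner, as in the last form, avoids the
repeated construction in the non-constant case.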
@ -1674,7 +2010,7 @@ target strings.
<p>
Another thing to consider is that, for performance reasons, CL-PPCRE
assumes that most of the target strings you're trying to match are <a
href="http://www.lispworks.com/reference/HyperSpec/Body/t_smp_st.htm">simple
href="http://www.lispworks.com/documentation/HyperSpec/Body/t_smp_st.htm">simple
strings</a> and coerces non-simple strings to simple strings before
scanning them. If you plan on working with non-simple strings mostly
you might consider modifying the CL-PPCRE source code. This is easy:
@ -1746,6 +2082,8 @@ TARGET
With CMUCL the situation is better and worse at the same time. It will
take a lot longer until CMUCL gives up but if it gives up the whole
Lisp image will silently die (at least on my machine):
<p>
[Note: This was true for CMUCL&nbsp;18e - CMUCL&nbsp;19a behaves in a much nicer way and gives you a chance to recover.]
<pre>
* (defun target (n) (concatenate 'string (make-string n :initial-element #\a) "b"))
@ -1900,6 +2238,50 @@ IBM Thinkpad T23 laptop (Pentium&nbsp;III 1.2&nbsp;GHz,
768&nbsp;MB&nbsp;RAM) running <a href="http://www.gentoo.org/">Gentoo
Linux</a> 1.1a.
<br>&nbsp;<br><h3><a class=none name="allegro">AllegroCL compatibility mode</a></h3>
Since autumn 2004 <a
href="http://www.franz.com/products/allegrocl/">AllegroCL</a> offers
<a
href="http://www.franz.com/support/documentation/7.0/doc/regexp.htm">a
new regular expression API</a> with a syntax very similar to
CL-PPCRE. Although CL-PPCRE is quite fast already, AllegroCL's engine will
most likely be even faster (but only on AllegroCL, of course). However, you might want to
stick to CL-PPCRE because you have a "legacy" application or because
you want your code to be portable to other Lisp implementations.
Therefore, beginning with version 1.2.0, CL-PPCRE offers a
"compatibility mode" where you can continue using the CL-PPCRE API as
described <a href="#dict">above</a> but deploy the AllegroCL regex
engine under the hood. (The details are: Calls to <a
href="#create-scanner"><code>CREATE-SCANNER</code></a> and <a
href="#scan"><code>SCAN</code></a> are dispatched to their AllegroCL
counterparts <a
href="http://www.franz.com/support/documentation/7.0/doc/operators/excl/compile-re.htm"><code>EXCL:COMPILE-RE</code></a>
and <a
href="http://www.franz.com/support/documentation/7.0/doc/operators/excl/match-re.htm"><code>EXCL:MATCH-RE</code></a>
while everything else is left as is.)
<p>
The advantage of this mode is that you'll get a much smaller image and
most likely faster code. (But note that CL-PPCRE needs to do a small amount of work to massage AllegroCL's output into the format expected by CL-PPCRE.) The downside is that your code won't be
fully compatible with CL-PPCRE anymore. Here are some of the
differences (most of which probably don't matter very often):
<ul>
<li>The AllegroCL engine doesn't offer <a
href="#parse-tree-synonym">parse tree synonyms</a> and <a href="#filters">filters</a>.
<li>The AllegroCL engine <a href="http://www.franz.com/support/documentation/7.0/doc/regexp.htm#regexp-new-compatibility-2">will choke on some regular expressions involving curly braces</a> that are accepted by Perl and CL-PPCRE's native engine.
<li>The AllegroCL engine's case-folding mode switch (which is used instead of CL-PPCRE's <a href="#create-scanner"><code>:CASE-INSENSITIVE</code> keyword parameter</a>) <a href="http://www.franz.com/support/documentation/7.0/doc/regexp.htm#regexp-new-matching-2">is currently only effective for ASCII characters</a>.
<li>CL-PPCRE's engine doesn't understand the <a href="http://www.franz.com/support/documentation/7.0/doc/regexp.htm#regexp-new-capturing-2">named register groups</a> provided by AllegroCL.
<li>The AllegroCL engine <a href="http://www.franz.com/support/documentation/7.0/doc/regexp.htm#regexp-new-compatibility-2">doesn't support</a> <a href="#*allow-quoting*">quoting of metacharacters</a>.
<li>In AllegroCL compatibility mode compiled regular expressions (as returned by <a href="#create-scanner"><code>CREATE-SCANNER</code></a>) aren't functions but structures.
</ul>
For more details about the AllegroCL engine and possible deviations from CL-PPCRE see the <a href="http://www.franz.com/support/documentation/7.0/doc/regexp.htm">documentation</a> at the <a href="http://www.franz.com/">Franz Inc. website</a>.
<p>
To use the AllegroCL compatibility mode you have to
<pre>
(push :use-acl-regexp2-engine *features*)
</pre>
<em>before</em> you compile CL-PPCRE.
<br>&nbsp;<br><h3><a class=none name="ack">Acknowledgements</a></h3>
Although I didn't use their code I was heavily inspired by looking at
@ -1927,7 +2309,7 @@ where I wrote most of the code and thanks to my wife for lending me
her PowerBook to test CL-PPCRE with MCL and OpenMCL.
<p>
$Header: /home/manuel/bknr-cvs/cvs/thirdparty/cl-ppcre/doc/index.html,v 1.1 2004/06/23 08:27:10 hans Exp $
$Header: /usr/local/cvsrep/cl-ppcre/doc/index.html,v 1.131 2005/11/01 09:51:02 edi Exp $
<p><a href="http://weitz.de/index.html">BACK TO MY HOMEPAGE</a>
</body>