SCAN function
-- yields the expected results. Other functions like SPLIT, ALL-MATCHES, REGEX-REPLACE, REGEX-APROPOS, or the DO-macros have only been tested
-on CMUCL and LispWorks which were my main development platforms.
-
-SPLIT function, a couple of DO-like loop constructs, and a regex-based APROPOS feature
-similar to the one found in Emacs.
+create-scanner (for Perl regex strings)
- create-scanner (for parse trees)
- parse-tree-synonym
- define-parse-tree-synonym
- scan
- scan-to-strings
- register-groups-bind
- do-scans
- do-matches
- do-matches-as-strings
- do-register-groups
- all-matches
- all-matches-as-strings
- split
- regex-replace
- regex-replace-all
- regex-apropos
- regex-apropos-list
- *regex-char-code-limit*
- *use-bmh-matchers*
- *allow-quoting*
- *allow-named-registers*
- quote-meta-chars
- ppcre-error
- ppcre-invocation-error
- ppcre-syntax-error
- ppcre-syntax-error-string
- ppcre-syntax-error-pos
+ create-scanner (for Perl regex strings)
+ create-scanner (for parse trees)
+ scan
+ scan-to-strings
+ register-groups-bind
+ do-scans
+ do-matches
+ do-matches-as-strings
+ do-register-groups
+ all-matches
+ all-matches-as-strings
+ split
+ regex-replace
+ regex-replace-all
+ undef in $1, $2, etc.
@@ -177,21 +133,15 @@ href="http://weitz.de/regex-coach/">The Regex Coach.
$1, $2, etc.
"\r" doesn't work with MCL
"\w"?
-If you're on Debian, you should
-probably use the cl-ppcre
-Debian package which is available thanks to Peter van Eynde and Kevin
-Rosenberg. There's also a port
-for Gentoo Linux thanks to Matthew Kennedy and a FreeBSD port thanks to Henrik Motakef.
-Installation via asdf-install should as well
-be possible.
+CL-PPCRE comes with a system definition
+for ASDF and you compile and
+load it in the usual way. There are no dependencies (except that the
+test suite which is not needed for normal operation depends
+on FLEXI-STREAMS).
-CL-PPCRE comes with simple system definitions for MK:DEFSYSTEM and asdf so you can either adapt it
-to your needs or just unpack the archive and from within the CL-PPCRE
-directory start your Lisp image and evaluate the form
-(mk:compile-system "cl-ppcre") (or the
-equivalent one for asdf) which should compile and load the whole
-system.
+CL-PPCRE is integrated into the package/port systems
+of Debian, Gentoo,
+and FreeBSD, but before you
+install it from there, you should check if they actually offer the
+latest release. Installation
+via ASDF-Install
+should be possible as well.
-If for some reason you don't want to use MK:DEFSYSTEM or asdf, you
-can just LOAD the file load.lisp or you
-can also get away with something like this:
-
+You can run a test suite which tests most aspects of the library with
-(loop for name in '("packages" "specials" "util" "errors" "lexer"
- "parser" "regex-class" "convert" "optimize"
- "closures" "repetition-closures" "scanner" "api")
- do (compile-file (make-pathname :name name
- :type "lisp"))
- (load name))
+(asdf:oos 'asdf:test-op :cl-ppcre)
-
-Note that on CL implementations which use the Python compiler
-(i.e. CMUCL, SBCL, SCL) you can concatenate the compiled object files
-to create one single object file which you can load afterwards:
-
-
-cat {packages,specials,util,errors,lexer,parser,regex-class,convert,optimize,closures,repetition-closures,scanner,api}.x86f > cl-ppcre.x86f
-
-
-(Replace ".x86f" with the correct suffix for
-your platform.)
--Note that there is no public CVS repository for CL-PPCRE - the repository at common-lisp.net is out of date and not in sync with the (current) version distributed from weitz.de.
Luís Oliveira maintains a darcs
repository of CL-PPCRE
@@ -270,7 +194,7 @@ If you want to send patches, please read
-The function accepts most of the regex syntax of Perl 5 as described
-in
-Since version 0.6.0 CL-PPCRE also supports Perl's
-Since version 1.3.0 CL-PPCRE offers support for AllegroCL's
+Since version 2.0.0, CL-PPCRE
+supports named properties
+(
The keyword arguments are just for your
convenience. You can always use embedded modifiers like
@@ -334,8 +262,11 @@ convenience. You can always use embedded modifiers like
If you want to find out how parse trees are related to Perl regex
strings, you should play around with
-
-Here's an example:
-
-
-Examples:
-Examples:
Examples:
Example:
-Example:
Example:
-Examples:
-Examples:
-Beginning with CL-PPCRE 0.2.0, this function also tries hard to be
-Perl-compatible - thus the somewhat peculiar behaviour. But note that
-it hasn't been as extensively tested as
-Examples:
+This function also tries hard to be
+Perl-compatible - thus the somewhat peculiar behaviour.
-Examples:
-
-Examples:
-
-Here are examples for CMUCL:
+
-Example (continued from above):
+
-Here's an example with LispWorks:
-
-
Note: Due to the nature of
The CL-PPCRE dictionary
-CL-PPCRE exports the following symbols:
+Scanning
[Method]
create-scanner (string string)&key case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive => scanner, register-names
@@ -283,9 +207,8 @@ The mode keyword arguments are equivalent to the
"imsx" modifiers in Perl. The
destructive keyword will be ignored.
man
+The function accepts most of the regex syntax of Perl 5.8 as described
+in and man
perlre including extended features like non-greedy
repetitions, positive and negative look-ahead and look-behind
assertions, "standalone" subexpressions, and conditional
@@ -303,13 +226,12 @@ they obviously don't make sense in Lisp.
because they're actually not part of Perl's regex syntax and
(honestly) because I was too lazy - but see CL-INTERPOL.
-\pP and \PP (named properties),
-\X (extended Unicode), and \C (single
+\X (extended Unicode), and \C (single
character). But you can of course use all characters
supported by your CL implementation.
-[[:alpha]]. I
-might add this in the future.
+[[:alpha]].
+Use Unicode properties instead.
\G for Perl's pos() because we don't have it.
@@ -323,10 +245,16 @@ codes), \c[ (control characters), \w,
\D, \b, \B, \A,
\Z, and \z are supported.
\Q and \E - see *ALLOW-QUOTING* below. Make sure you also read the relevant section in "Bugs and problems."
(?<name>"<regex>") named registers and \k<name> back-references syntax, have a look at *ALLOW-NAMED-REGISTERS* for details.
+Since version 1.3.0, CL-PPCRE offers support for AllegroCL's (?<name>"<regex>") named registers and \k<name> back-reference syntax; have a look at *ALLOW-NAMED-REGISTERS* for details.
+\p and \P), but only the long form with
+braces is supported, i.e. \p{Letter}
+and \p{L} will work while \pL won't.
-
-In this case function should be a scanner returned by another invocation of CREATE-SCANNER. It will be returned as is.
+
In this case function should be a
+scanner returned by another invocation
+of CREATE-SCANNER. It will be returned as is. You can't
+use any of the keyword arguments because the scanner has already been
+created and is immutable.
[Method]
@@ -473,6 +404,12 @@ is found. See *ALLOW-NAMED-REGISTERS*
for more information.
+(:PROPERTY|:INVERTED-PROPERTY <property>) is
+a named property (or its inverse) with
+<property> being a function designator or a
+string which must be resolved
+by *PROPERTY-RESOLVER*.
+
(:FILTER <function> &optional
<length>) where
<function> is a .
(:CHAR-CLASS|:INVERTED-CHAR-CLASS
{<item>}*) where <item>
-is either a character, a character range, or a symbol for a
-special character class (see above) will be translated into a (one
-character wide) character class. A character range looks like
+is either a character, a character range, a named property
+(see above), or a symbol for a special character class (see above)
+will be translated into a (one character wide) character
+class. A character range looks like
(:RANGE <char1> <char2>) where
<char1> and
<char2> are characters such that
@@ -523,104 +461,37 @@ If destructive is not NIL (the default is
CL-PPCRE::PARSE-STRING - a function which converts Perl
-regex strings to parse trees. Here are some examples:
+PARSE-STRING:
-* (cl-ppcre::parse-string "(ab)*")
+* (parse-string "(ab)*")
(:GREEDY-REPETITION 0 NIL (:REGISTER "ab"))
-* (cl-ppcre::parse-string "(a(b))")
+* (parse-string "(a(b))")
(:REGISTER (:SEQUENCE #\a (:REGISTER #\b)))
-* (cl-ppcre::parse-string "(?:abc){3,5}")
+* (parse-string "(?:abc){3,5}")
(:GREEDY-REPETITION 3 5 (:GROUP "abc"))
;; (:GREEDY-REPETITION 3 5 "abc") would also be OK
-* (cl-ppcre::parse-string "a(?i)b(?-i)c")
+* (parse-string "a(?i)b(?-i)c")
(:SEQUENCE #\a
(:SEQUENCE (:FLAGS :CASE-INSENSITIVE-P)
(:SEQUENCE #\b (:SEQUENCE (:FLAGS :CASE-SENSITIVE-P) #\c))))
;; same as (:SEQUENCE #\a :CASE-INSENSITIVE-P #\b :CASE-SENSITIVE-P #\c)
-* (cl-ppcre::parse-string "(?=a)b")
+* (parse-string "(?=a)b")
(:SEQUENCE (:POSITIVE-LOOKAHEAD #\a) #\b)
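A parse tree is itself an accepted regex, so the result of PARSE-STRING can be handed straight to SCAN. The following is only a minimal sketch, assuming a CL-PPCRE release that exports PARSE-STRING and that ASDF can find the system; MATCH-BOUNDS is a hypothetical helper, not part of the library.

```lisp
(require :asdf)
(asdf:load-system :cl-ppcre) ; assumes CL-PPCRE is installed where ASDF can find it

;; Hypothetical helper: returns the start and end of the first match as a list.
(defun match-bounds (regex target)
  (multiple-value-bind (start end)
      (cl-ppcre:scan regex target)
    (list start end)))

;; The regex string and the parse tree derived from it give the same bounds.
(print (match-bounds "(?:abc){3,5}" "xabcabcabcx"))
(print (match-bounds (cl-ppcre:parse-string "(?:abc){3,5}") "xabcabcabcx"))
```

Both calls should print (1 10) because the three repetitions of "abc" start right after the leading "x".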
[Accessor]
-
parse-tree-synonym symbol => parse-tree
-
(setf (parse-tree-synonym symbol) new-parse-tree)
-
-
-
-
-Any symbol (unless it's a keyword with a special meaning in parse
-trees) can be made a "synonym", i.e. an abbreviation, for another parse
-tree by this accessor. PARSE-TREE-SYNONYM returns NIL if symbol isn't a synonym yet.
-* (cl-ppcre::parse-string "a*b+")
-(:SEQUENCE (:GREEDY-REPETITION 0 NIL #\a) (:GREEDY-REPETITION 1 NIL #\b))
-
-* (defun my-repetition (char min)
- `(:greedy-repetition ,min nil ,char))
-MY-REPETITION
-
-* (setf (parse-tree-synonym 'a*) (my-repetition #\a 0))
-(:GREEDY-REPETITION 0 NIL #\a)
-
-* (setf (parse-tree-synonym 'b+) (my-repetition #\b 1))
-(:GREEDY-REPETITION 1 NIL #\b)
-
-* (let ((scanner (create-scanner '(:sequence a* b+))))
- (dolist (string '("ab" "b" "aab" "a" "x"))
- (print (scan scanner string)))
- (values))
-0
-0
-0
-NIL
-NIL
-
-* (parse-tree-synonym 'a*)
-(:GREEDY-REPETITION 0 NIL #\a)
-
-* (parse-tree-synonym 'a+)
-NIL
-
[Macro]
-
define-parse-tree-synonym name parse-tree => parse-tree
-
-
-
-This is a convenience macro for parse tree synonyms defined as
-
-(defmacro define-parse-tree-synonym (name parse-tree)
- `(eval-when (:compile-toplevel :load-toplevel :execute)
- (setf (parse-tree-synonym ',name) ',parse-tree)))
-
-
-so you can write code like this:
-
-
-(define-parse-tree-synonym a-z
- (:char-class (:range #\a #\z) (:range #\A #\Z)))
-
-(define-parse-tree-synonym a-z*
- (:greedy-repetition 0 nil a-z))
-
-(defun ascii-char-tester (string)
- (scan '(:sequence :start-anchor a-z* :end-anchor)
- string))
-
-For the rest of this section regex can
+For the rest of the dictionary, regex can
always be a string (which is interpreted as a Perl regular
expression), a parse tree, or a scanner created by
-CREATE-SCANNER. The
+CREATE-SCANNER. The
start and end
-keyword parameters are always used as in SCAN.
+keyword parameters are always used as in SCAN.
@@ -645,41 +516,39 @@ parameter real-start-pos. This one should
target-string between start
and end were a standalone string, i.e. look-aheads
and look-behinds can't look beyond these boundaries.
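The effect of these boundaries on look-behinds can be sketched as follows. This is an illustration only, assuming ASDF can find CL-PPCRE; the regex and target string are made up for the purpose.

```lisp
(require :asdf)
(asdf:load-system :cl-ppcre) ; assumes CL-PPCRE is installed where ASDF can find it

;; Without START the look-behind can see the "x" in front of the "a".
(print (cl-ppcre:scan "(?<=x)a" "xa"))          ; first value: 1

;; With :START 1 the scanner behaves as if the target were just "a",
;; so the look-behind has nothing to look at and the match fails.
(print (cl-ppcre:scan "(?<=x)a" "xa" :start 1)) ; NIL
```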
-
-* (cl-ppcre:scan "(a)*b" "xaaabd")
+* (scan "(a)*b" "xaaabd")
1
5
#(3)
#(4)
-* (cl-ppcre:scan "(a)*b" "xaaabd" :start 1)
+* (scan "(a)*b" "xaaabd" :start 1)
1
5
#(3)
#(4)
-* (cl-ppcre:scan "(a)*b" "xaaabd" :start 2)
+* (scan "(a)*b" "xaaabd" :start 2)
2
5
#(3)
#(4)
-* (cl-ppcre:scan "(a)*b" "xaaabd" :end 4)
+* (scan "(a)*b" "xaaabd" :end 4)
NIL
-* (cl-ppcre:scan '(:GREEDY-REPETITION 0 NIL #\b) "bbbc")
+* (scan '(:greedy-repetition 0 nil #\b) "bbbc")
0
3
#()
#()
-* (cl-ppcre:scan '(:GREEDY-REPETITION 4 6 #\b) "bbbc")
+* (scan '(:greedy-repetition 4 6 #\b) "bbbc")
NIL
-* (let ((s (cl-ppcre:create-scanner "(([a-c])+)x")))
- (cl-ppcre:scan s "abcxy"))
+* (let ((s (create-scanner "(([a-c])+)x")))
+ (scan s "abcxy"))
0
4
#(0 2)
@@ -698,18 +567,16 @@ function returns two values on success: the whole match as a string
plus an array of substrings (or
NILs) corresponding to
the matched registers. If sharedp is true, the substrings may share structure with
target-string.
-
-* (cl-ppcre:scan-to-strings "[^b]*b" "aaabd")
+* (scan-to-strings "[^b]*b" "aaabd")
"aaab"
#()
-* (cl-ppcre:scan-to-strings "([^b])*b" "aaabd")
+* (scan-to-strings "([^b])*b" "aaabd")
"aaab"
#("a")
-* (cl-ppcre:scan-to-strings "(([^b])*)b" "aaabd")
+* (scan-to-strings "(([^b])*)b" "aaabd")
"aaab"
#("aaa" "a")
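In programs the two return values are usually picked up with MULTIPLE-VALUE-BIND. A minimal sketch, assuming ASDF can find CL-PPCRE; PARSE-RANGE is a hypothetical helper invented for this illustration.

```lisp
(require :asdf)
(asdf:load-system :cl-ppcre) ; assumes CL-PPCRE is installed where ASDF can find it

;; Hypothetical helper: returns the two integers of the first "N-M"
;; range found in STRING as a list, or NIL if there is no such range.
(defun parse-range (string)
  (multiple-value-bind (match registers)
      (cl-ppcre:scan-to-strings "(\\d+)-(\\d+)" string)
    (when match
      ;; REGISTERS is an array of substrings, one per register
      (map 'list #'parse-integer registers))))

(print (parse-range "pages 10-20")) ; (10 20)
(print (parse-range "no range"))    ; NIL
```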
@@ -736,7 +603,6 @@ executed. For each element of
group. The number of variables in var-list must not be greater than
the number of register groups. If sharedp is true, the substrings may
share structure with target-string.
-
* (register-groups-bind (first second third fourth)
("((a)|(b)|(c))+" "abababc" :sharedp t)
@@ -756,8 +622,8 @@ NIL
* (register-groups-bind (fname lname (#'parse-integer date month year))
("(\\w+)\\s+(\\w+)\\s+(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})" "Frank Zappa 21.12.1940")
- (list fname lname (encode-universal-time 0 0 0 date month year)))
-("Frank" "Zappa" 1292882400)
+ (list fname lname (encode-universal-time 0 0 0 date month year 0)))
+("Frank" "Zappa" 1292889600)
@@ -795,11 +661,10 @@ usage.
@@ -923,19 +783,18 @@ Examples:
Like DO-SCANS but doesn't bind
variables to the register arrays.
-
* (defun foo (regex target-string &key (start 0) (end (length target-string)))
(let ((sum 0))
- (cl-ppcre:do-matches (s e regex target-string nil :start start :end end)
+ (do-matches (s e regex target-string nil :start start :end end)
(incf sum (- e s)))
(format t "~,2F% of the string was inside of a match~%"
;; note: doesn't check for division by zero
@@ -826,12 +691,10 @@ Like DO-MATCHES but binds
match-var to the substring of
target-string corresponding to each match in turn. If sharedp is true, the substrings may share structure with
target-string.
-
* (defun crossfoot (target-string &key (start 0) (end (length target-string)))
(let ((sum 0))
- (cl-ppcre:do-matches-as-strings (m :digit-class
+ (do-matches-as-strings (m :digit-class
target-string nil
:start start :end end)
(incf sum (parse-integer m)))
@@ -870,7 +733,6 @@ otherwise. An implicit block named NIL surrounds DO-REGISTER-
an empty string, the scan is continued one position behind this
match. If sharedp is true, the substrings may share structure with
target-string.
-
* (do-register-groups (first second third fourth)
("((a)|(b)|(c))" "abababc" nil :start 2 :sharedp t)
@@ -903,13 +765,11 @@ of regex against
matches the list contains (* 2 N) elements. If
regex matches an empty string the scan is
continued one position behind this match.
-
-* (cl-ppcre:all-matches "a" "foo bar baz")
+* (all-matches "a" "foo bar baz")
(5 6 9 10)
-* (cl-ppcre:all-matches "\\w*" "foo bar baz")
+* (all-matches "\\w*" "foo bar baz")
(0 3 3 3 4 7 7 7 8 11 11 11)
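Since the result is a flat list of start and end positions, it can be walked two elements at a time. A minimal sketch, assuming ASDF can find CL-PPCRE; MATCH-SUBSTRINGS is a hypothetical helper that merely mirrors what ALL-MATCHES-AS-STRINGS already provides.

```lisp
(require :asdf)
(asdf:load-system :cl-ppcre) ; assumes CL-PPCRE is installed where ASDF can find it

;; Hypothetical helper: turns the (start end start end ...) list
;; returned by ALL-MATCHES into a list of fresh substrings.
(defun match-substrings (regex target)
  (loop for (start end) on (cl-ppcre:all-matches regex target) by #'cddr
        collect (subseq target start end)))

(print (match-substrings "a" "foo bar baz")) ; ("a" "a")
```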
Like ALL-MATCHES but
returns a list of substrings instead. If sharedp is true, the substrings may share structure with
target-string.
-
-* (cl-ppcre:all-matches-as-strings "a" "foo bar baz")
+* (all-matches-as-strings "a" "foo bar baz")
("a" "a")
-* (cl-ppcre:all-matches-as-strings "\\w*" "foo bar baz")
+* (all-matches-as-strings "\\w*" "foo bar baz")
("foo" "" "bar" "" "baz" "")
+Splitting and replacing
[Function]
@@ -957,48 +816,44 @@ If regex matches an empty string, the scan is
continued one position behind this match. If sharedp is true, the substrings may share structure with
target-string.
SCAN.
-
-* (cl-ppcre:split "\\s+" "foo bar baz
+* (split "\\s+" "foo bar baz
frob")
("foo" "bar" "baz" "frob")
-* (cl-ppcre:split "\\s*" "foo bar baz")
+* (split "\\s*" "foo bar baz")
("f" "o" "o" "b" "a" "r" "b" "a" "z")
-* (cl-ppcre:split "(\\s+)" "foo bar baz")
+* (split "(\\s+)" "foo bar baz")
("foo" "bar" "baz")
-* (cl-ppcre:split "(\\s+)" "foo bar baz" :with-registers-p t)
+* (split "(\\s+)" "foo bar baz" :with-registers-p t)
("foo" " " "bar" " " "baz")
-* (cl-ppcre:split "(\\s)(\\s*)" "foo bar baz" :with-registers-p t)
+* (split "(\\s)(\\s*)" "foo bar baz" :with-registers-p t)
("foo" " " "" "bar" " " " " "baz")
-* (cl-ppcre:split "(,)|(;)" "foo,bar;baz" :with-registers-p t)
+* (split "(,)|(;)" "foo,bar;baz" :with-registers-p t)
("foo" "," NIL "bar" NIL ";" "baz")
-* (cl-ppcre:split "(,)|(;)" "foo,bar;baz" :with-registers-p t :omit-unmatched-p t)
+* (split "(,)|(;)" "foo,bar;baz" :with-registers-p t :omit-unmatched-p t)
("foo" "," "bar" ";" "baz")
-* (cl-ppcre:split ":" "a:b:c:d:e:f:g::")
+* (split ":" "a:b:c:d:e:f:g::")
("a" "b" "c" "d" "e" "f" "g")
-* (cl-ppcre:split ":" "a:b:c:d:e:f:g::" :limit 1)
+* (split ":" "a:b:c:d:e:f:g::" :limit 1)
("a:b:c:d:e:f:g::")
-* (cl-ppcre:split ":" "a:b:c:d:e:f:g::" :limit 2)
+* (split ":" "a:b:c:d:e:f:g::" :limit 2)
("a" "b:c:d:e:f:g::")
-* (cl-ppcre:split ":" "a:b:c:d:e:f:g::" :limit 3)
+* (split ":" "a:b:c:d:e:f:g::" :limit 3)
("a" "b" "c:d:e:f:g::")
-* (cl-ppcre:split ":" "a:b:c:d:e:f:g::" :limit 1000)
+* (split ":" "a:b:c:d:e:f:g::" :limit 1000)
("a" "b" "c" "d" "e" "f" "g" "" "")
@@ -1073,40 +928,37 @@ for LispWorks
and CHARACTER
for other Lisps.
-
-* (cl-ppcre:regex-replace "fo+" "foo bar" "frob")
+* (regex-replace "fo+" "foo bar" "frob")
"frob bar"
T
-* (cl-ppcre:regex-replace "fo+" "FOO bar" "frob")
+* (regex-replace "fo+" "FOO bar" "frob")
"FOO bar"
NIL
-* (cl-ppcre:regex-replace "(?i)fo+" "FOO bar" "frob")
+* (regex-replace "(?i)fo+" "FOO bar" "frob")
"frob bar"
T
-* (cl-ppcre:regex-replace "(?i)fo+" "FOO bar" "frob" :preserve-case t)
+* (regex-replace "(?i)fo+" "FOO bar" "frob" :preserve-case t)
"FROB bar"
T
-* (cl-ppcre:regex-replace "(?i)fo+" "Foo bar" "frob" :preserve-case t)
+* (regex-replace "(?i)fo+" "Foo bar" "frob" :preserve-case t)
"Frob bar"
T
-* (cl-ppcre:regex-replace "bar" "foo bar baz" "[frob (was '\\&' between '\\`' and '\\'')]")
+* (regex-replace "bar" "foo bar baz" "[frob (was '\\&' between '\\`' and '\\'')]")
"foo [frob (was 'bar' between 'foo ' and ' baz')] baz"
T
-* (cl-ppcre:regex-replace "bar" "foo bar baz"
+* (regex-replace "bar" "foo bar baz"
'("[frob (was '" :match "' between '" :before-match "' and '" :after-match "')]"))
"foo [frob (was 'bar' between 'foo ' and ' baz')] baz"
T
-* (cl-ppcre:regex-replace "(be)(nev)(o)(lent)"
+* (regex-replace "(be)(nev)(o)(lent)"
"benevolent: adj. generous, kind"
#'(lambda (match &rest registers)
(format nil "~A [~{~A~^.~}]" match registers))
@@ -1121,26 +973,23 @@ T
-
Like REGEX-REPLACE but replaces all matches.
-
-* (cl-ppcre:regex-replace-all "(?i)fo+" "foo Fooo FOOOO bar" "frob" :preserve-case t)
+* (regex-replace-all "(?i)fo+" "foo Fooo FOOOO bar" "frob" :preserve-case t)
"frob Frob FROB bar"
T
-* (cl-ppcre:regex-replace-all "(?i)f(o+)" "foo Fooo FOOOO bar" "fr\\1b" :preserve-case t)
+* (regex-replace-all "(?i)f(o+)" "foo Fooo FOOOO bar" "fr\\1b" :preserve-case t)
"froob Frooob FROOOOB bar"
T
-* (let ((qp-regex (cl-ppcre:create-scanner "[\\x80-\\xff]")))
+* (let ((qp-regex (create-scanner "[\\x80-\\xff]")))
(defun encode-quoted-printable (string)
- "Convert 8-bit string to quoted-printable representation."
+ "Converts 8-bit string to quoted-printable representation."
;; won't work for Corman Lisp because non-ASCII characters aren't 8-bit there
(flet ((convert (target-string start end match-start match-end reg-starts reg-ends)
(declare (ignore start end match-end reg-starts reg-ends))
(format nil "=~2,'0x" (char-code (char target-string match-start)))))
- (cl-ppcre:regex-replace-all qp-regex string #'convert))))
+ (regex-replace-all qp-regex string #'convert))))
Converted ENCODE-QUOTED-PRINTABLE.
ENCODE-QUOTED-PRINTABLE
@@ -1148,14 +997,14 @@ ENCODE-QUOTED-PRINTABLE
"F=EAte S=F8rensen na=EFve H=FChner Stra=DFe"
T
-* (let ((url-regex (cl-ppcre:create-scanner "[^a-zA-Z0-9_\\-.]")))
+* (let ((url-regex (create-scanner "[^a-zA-Z0-9_\\-.]")))
(defun url-encode (string)
- "URL-encode a string."
+ "URL-encodes a string."
;; won't work for Corman Lisp because non-ASCII characters aren't 8-bit there
(flet ((convert (target-string start end match-start match-end reg-starts reg-ends)
(declare (ignore start end match-end reg-starts reg-ends))
(format nil "%~2,'0x" (char-code (char target-string match-start)))))
- (cl-ppcre:regex-replace-all url-regex string #'convert))))
+ (regex-replace-all url-regex string #'convert))))
Converted URL-ENCODE.
URL-ENCODE
@@ -1169,20 +1018,20 @@ T
(svref reg-starts 0))))
HOW-MANY
-* (cl-ppcre:regex-replace-all "{(.+?)}"
+* (regex-replace-all "{(.+?)}"
"foo{...}bar{.....}{..}baz{....}frob"
(list "[" 'how-many " dots]"))
"foo[3 dots]bar[5 dots][2 dots]baz[4 dots]frob"
T
-* (let ((qp-regex (cl-ppcre:create-scanner "[\\x80-\\xff]")))
+* (let ((qp-regex (create-scanner "[\\x80-\\xff]")))
(defun encode-quoted-printable (string)
- "Convert 8-bit string to quoted-printable representation.
+ "Converts 8-bit string to quoted-printable representation.
Version using SIMPLE-CALLS keyword argument."
;; won't work for Corman Lisp because non-ASCII characters aren't 8-bit there
(flet ((convert (match)
(format nil "=~2,'0x" (char-code (char match 0)))))
- (cl-ppcre:regex-replace-all qp-regex string #'convert
+ (regex-replace-all qp-regex string #'convert
:simple-calls t))))
Converted ENCODE-QUOTED-PRINTABLE.
@@ -1197,7 +1046,7 @@ T
(format nil "~A" (length first-register)))
HOW-MANY
-* (cl-ppcre:regex-replace-all "{(.+?)}"
+* (regex-replace-all "{(.+?)}"
"foo{...}bar{.....}{..}baz{....}frob"
(list "[" 'how-many " dots]")
:simple-calls t)
@@ -1206,86 +1055,111 @@ HOW-MANY
T
[Function]
-
regex-apropos regex &optional packages &key case-insensitive => list
+Modifying scanner behaviour
-
-Like APROPOS
-but searches for interned symbols which match the regular expression
-regex. The output is implementation-dependent. If
-case-insensitive is true (which is the default)
-and regex isn't already a scanner, a
-case-insensitive scanner is used.
-
[Special variable]
+
*property-resolver*
+
-* (defun foo (n &optional (k 0)) (+ 3 n k))
-FOO
-* (defparameter foo "bar")
-FOO
+
This is the designator for a function responsible
+for resolving named properties like \p{Number}. If
+CL-PPCRE encounters a \p or a \P it expects
+to see an opening curly brace immediately afterwards and will then
+read everything following that brace until it sees a closing curly
+brace. The resolver function will be called with this string and must
+return a corresponding unary test function which accepts a character
+as its argument and returns a true value if and only if the character
+has the named property. If the resolver returns NIL
+instead, it signals that a property of that name is unknown.
-* *package*
-#<The COMMON-LISP-USER package, 16/21 internal, 0/9 external>
+* (labels ((char-code-odd-p (char)
+ (oddp (char-code char)))
+ (char-code-even-p (char)
+ (evenp (char-code char)))
+ (resolver (name)
+ (cond ((string= name "odd") #'char-code-odd-p)
+ ((string= name "even") #'char-code-even-p)
+ ((string= name "true") (constantly t))
+ (t (error "Can't resolve ~S." name)))))
+ (let ((*property-resolver* #'resolver))
+ ;; quiz question - why do we need CREATE-SCANNER here?
+ (list (regex-replace-all (create-scanner "\\p{odd}") "abcd" "+")
+ (regex-replace-all (create-scanner "\\p{even}") "abcd" "+")
+ (regex-replace-all (create-scanner "\\p{true}") "abcd" "+"))))
+("+b+d" "a+c+" "++++")
+
+If the value
+of *PROPERTY-RESOLVER*
+is NIL (which is the default), \p and \P in regex
+strings will simply be treated like p or P
+as in CL-PPCRE 1.4.1 and earlier. Note that this does not affect
+the validity of (:PROPERTY <name>)
+parts in S-expression syntax.
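A resolver does not have to define its own predicates; it can simply map names to standard Common Lisp character predicates. The following is only a sketch, assuming CL-PPCRE 2.0.0 or later and that ASDF can find it. The property names "Upper", "Lower", and "Digit" as well as the functions SIMPLE-RESOLVER and UPPER-RUNS are made up for this illustration.

```lisp
(require :asdf)
(asdf:load-system :cl-ppcre) ; assumes CL-PPCRE is installed where ASDF can find it

;; Hypothetical resolver: maps a few made-up property names to
;; standard predicates and returns NIL for unknown names.
(defun simple-resolver (name)
  (cond ((string= name "Upper") #'upper-case-p)
        ((string= name "Lower") #'lower-case-p)
        ((string= name "Digit") #'digit-char-p)
        (t nil)))

;; Hypothetical helper: collects all runs of upper-case characters.
(defun upper-runs (string)
  (let ((cl-ppcre:*property-resolver* #'simple-resolver))
    ;; CREATE-SCANNER is called explicitly, as in the example above
    (cl-ppcre:all-matches-as-strings
     (cl-ppcre:create-scanner "\\p{Upper}+")
     string)))

(print (upper-runs "abcDEFghIJ")) ; ("DEF" "IJ")
```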
+
[Accessor]
+
parse-tree-synonym symbol => parse-tree
+
(setf (parse-tree-synonym symbol) new-parse-tree)
-* (defparameter |foobar| 42)
-|foobar|
+
+
+Any symbol (unless it's a keyword with a special meaning in parse
+trees) can be made a "synonym", i.e. an abbreviation, for another parse
+tree by this accessor. PARSE-TREE-SYNONYM returns NIL if symbol isn't a synonym yet.
+
+* (parse-string "a*b+")
+(:SEQUENCE (:GREEDY-REPETITION 0 NIL #\a) (:GREEDY-REPETITION 1 NIL #\b))
-* (defparameter fooboo 43)
-FOOBOO
+* (defun my-repetition (char min)
+ `(:greedy-repetition ,min nil ,char))
+MY-REPETITION
-* (defclass frobar () ())
-#<STANDARD-CLASS FROBAR {4874E625}>
+* (setf (parse-tree-synonym 'a*) (my-repetition #\a 0))
+(:GREEDY-REPETITION 0 NIL #\a)
-* (cl-ppcre:regex-apropos "foo(?:bar)?")
-FOO [variable] value: "bar"
- [compiled function] (N &OPTIONAL (K 0))
-FOOBOO [variable] value: 43
-|foobar| [variable] value: 42
+* (setf (parse-tree-synonym 'b+) (my-repetition #\b 1))
+(:GREEDY-REPETITION 1 NIL #\b)
-* (cl-ppcre:regex-apropos "(?:foo|fro)bar")
-PCL::|COMMON-LISP-USER::FROBAR class predicate| [compiled closure]
-FROBAR [class] #<STANDARD-CLASS FROBAR {4874E625}>
-|foobar| [variable] value: 42
+* (let ((scanner (create-scanner '(:sequence a* b+))))
+ (dolist (string '("ab" "b" "aab" "a" "x"))
+ (print (scan scanner string)))
+ (values))
+0
+0
+0
+NIL
+NIL
-* (cl-ppcre:regex-apropos "(?:foo|fro)bar" 'cl-user)
-FROBAR [class] #<STANDARD-CLASS FROBAR {4874E625}>
-|foobar| [variable] value: 42
+* (parse-tree-synonym 'a*)
+(:GREEDY-REPETITION 0 NIL #\a)
-* (cl-ppcre:regex-apropos "(?:foo|fro)bar" '(pcl ext))
-PCL::|COMMON-LISP-USER::FROBAR class predicate| [compiled closure]
-
-* (cl-ppcre:regex-apropos "foo")
-FOO [variable] value: "bar"
- [compiled function] (N &OPTIONAL (K 0))
-FOOBOO [variable] value: 43
-|foobar| [variable] value: 42
-
-* (cl-ppcre:regex-apropos "foo" nil :case-insensitive nil)
-|foobar| [variable] value: 42
+* (parse-tree-synonym 'a+)
+NIL
[Macro]
+
define-parse-tree-synonym name parse-tree => parse-tree
-
-
-
[Function]
-
regex-apropos-list regex &optional packages &key case-insensitive => list
-
-
-Like APROPOS-LIST
-but searches for interned symbols which match the regular expression
-regex. If case-insensitive is
-true (which is the default) and regex isn't
-already a scanner, a case-insensitive scanner is used.
-
+This is a convenience macro for parse tree synonyms defined as
-* (cl-ppcre:regex-apropos-list "foo(?:bar)?")
-(|foobar| FOOBOO FOO)
+(defmacro define-parse-tree-synonym (name parse-tree)
+ `(eval-when (:compile-toplevel :load-toplevel :execute)
+ (setf (parse-tree-synonym ',name) ',parse-tree)))
+
+
+so you can write code like this:
+
+
+(define-parse-tree-synonym a-z
+ (:char-class (:range #\a #\z) (:range #\A #\Z)))
+
+(define-parse-tree-synonym a-z*
+ (:greedy-repetition 0 nil a-z))
+
+(defun ascii-char-tester (string)
+ (scan '(:sequence :start-anchor a-z* :end-anchor)
+ string))
[Special variable]
@@ -1296,40 +1170,15 @@ account all characters of your CL implementation or only those
the CHAR-CODE
of which is not larger than its value. The default is
-CHAR-CODE-LIMIT,
+CHAR-CODE-LIMIT,
and you might see significant speed and space improvements during
-scanner creation if, say, your target strings only contain ISO-8859-1
-characters and you're using an implementation like AllegroCL,
-CLISP, LispWorks, or SBCL where CHAR-CODE-LIMIT has a value
-much higher than 256. The test suite will
-automatically set *REGEX-CHAR-CODE-LIMIT* to 256 while
-you're running the default test.
-
-CL-USER 23 > (time (cl-ppcre:create-scanner "[3\\D]"))
-Timing the evaluation of (CL-PPCRE:CREATE-SCANNER "[3\\D]")
-
-user time = 0.443
-system time = 0.001
-Elapsed time = 0:00:01
-Allocation = 546600 bytes standard / 2162611 bytes fixlen
-0 Page faults
-#<closure 20654AF2>
-
-CL-USER 24 > (time (let ((cl-ppcre:*regex-char-code-limit* 256)) (cl-ppcre:create-scanner "[3\\D]")))
-Timing the evaluation of (LET ((CL-PPCRE:*REGEX-CHAR-CODE-LIMIT* 256)) (CL-PPCRE:CREATE-SCANNER "[3\\D]"))
-
-user time = 0.000
-system time = 0.000
-Elapsed time = 0:00:00
-Allocation = 3336 bytes standard / 8338 bytes fixlen
-0 Page faults
-#<closure 206569DA>
-
+scanner creation if, say, your target strings only
+contain ISO-8859-1
+characters and you're using a Lisp implementation
+where CHAR-CODE-LIMIT has a value much higher
+than 256. The test suite will automatically
+set *REGEX-CHAR-CODE-LIMIT* to 256 while you're running
+the default test.
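The variable is typically bound around scanner creation only. A minimal sketch, assuming ASDF can find CL-PPCRE and that all target strings really contain only characters with codes below 256; MAKE-LATIN-1-SCANNER is a hypothetical helper, and whether the binding saves a noticeable amount of time or space depends on the regex and on the Lisp implementation.

```lisp
(require :asdf)
(asdf:load-system :cl-ppcre) ; assumes CL-PPCRE is installed where ASDF can find it

;; Hypothetical helper: creates a scanner while the limit is lowered,
;; i.e. only characters with codes below 256 are taken into account.
(defun make-latin-1-scanner (regex)
  (let ((cl-ppcre:*regex-char-code-limit* 256))
    (cl-ppcre:create-scanner regex)))

(defparameter *three-or-non-digit* (make-latin-1-scanner "[3\\D]"))

;; "7" is a digit other than 3, so the first match is the "x" at position 1.
(print (multiple-value-list (cl-ppcre:scan *three-or-non-digit* "7x")))
```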
LOAD-TIME-VALUE and the compiler macro for SCAN and other functions, some
@@ -1366,6 +1215,29 @@ lexical environment*USE-BMH-MATCHERS* is bound at that
time.
+
[Special variable]
*optimize-char-classes*
+
+
Whether character classes should be compiled into look-ups into O(1)
+data structures. This is usually fast but will be costly in terms of
+scanner creation time and might be costly in terms of size if
+*REGEX-CHAR-CODE-LIMIT*
+is high. This value will be used as the kind
+keyword argument
+to CREATE-OPTIMIZED-TEST-FUNCTION
+- see there for the possible non-NIL values. The default
+value (NIL) should usually be fine unless you're sure
+that you absolutely have to optimize some character classes for speed.
+
+Note: Due to the nature
+of
LOAD-TIME-VALUE
+and the compiler macro for SCAN
+and other functions, some scanners might be created in
+a null
+lexical environment at load time or at compile time so be careful
+to which value *OPTIMIZE-CHAR-CLASSES* is bound at that
+time.
+
[Special variable]
*allow-quoting*
@@ -1380,19 +1252,19 @@ href="#ppcre-syntax-error">syntax error messages will complain
about the converted string and not about the original regex string.
-* (cl-ppcre:scan "^a+$" "a+")
+* (scan "^a+$" "a+")
NIL
-* (let ((cl-ppcre:*allow-quoting* t))
+* (let ((*allow-quoting* t))
;;we use CREATE-SCANNER because of Lisps like SBCL that don't have an interpreter
- (cl-ppcre:scan (cl-ppcre:create-scanner "^\\Qa+\\E$") "a+"))
+ (scan (create-scanner "^\\Qa+\\E$") "a+"))
0
2
#()
#()
-* (let ((cl-ppcre:*allow-quoting* t))
- (cl-ppcre:scan (cl-ppcre:create-scanner "\\Qa()\\E(?#comment\\Q)a**b") "()ab"))
+* (let ((*allow-quoting* t))
+ (scan (create-scanner "\\Qa()\\E(?#comment\\Q)a**b") "()ab"))
Quantifier '*' not allowed at position 19 in string "a\\(\\)(?#commentQ)a**b"
@@ -1403,10 +1275,10 @@ function. Also note that the second example might be easier to
understand (and Lisp-ier) if you write it like this:
-* (cl-ppcre:scan '(:sequence :start-anchor
-                   "a+" ;; no quoting necessary
-                   :end-anchor)
-                 "a+")
+* (scan '(:sequence :start-anchor
+          "a+" ;; no quoting necessary
+          :end-anchor)
+        "a+")
0
2
#()
@@ -1434,39 +1306,32 @@ CL-PPCRE will support (?<name>"<regex>") and
AllegroCL. name has to start with a letter and can contain only alphanumeric characters or minus sign. Names of registers are matched case-sensitively.
The parse tree syntax is not affected by the *ALLOW-NAMED-REGISTERS* switch; :NAMED-REGISTER and :BACK-REFERENCE forms are always resolved as expected. There are also no restrictions on register names in this syntax except that they have to be strings.
--Examples:
-
;; Perl compatible mode (*ALLOW-NAMED-REGISTERS* is NIL)
-* (cl-ppcre:create-scanner "(?<reg>.*)")
+* (create-scanner "(?<reg>.*)")
Character 'r' may not follow '(?<' at position 3 in string "(?<reg>)"
;; just unescapes "\\k"
-* (cl-ppcre::parse-string "\\k<reg>")
+* (parse-string "\\k<reg>")
"k<reg>"
---* (setq cl-ppcre:*allow-named-registers* t)
+* (setq *allow-named-registers* t)
T
-* (cl-ppcre:create-scanner "((?<small>[a-z]*)(?<big>[A-Z]*))")
+* (create-scanner "((?<small>[a-z]*)(?<big>[A-Z]*))")
#<CLOSURE (LAMBDA (STRING CL-PPCRE::START CL-PPCRE::END)) {AD75BFD}>
(NIL "small" "big")
;; the scanner doesn't capture any information about named groups -
;; you have to store the second value returned from CREATE-SCANNER yourself
-* (cl-ppcre:scan * "aaaBBB")
+* (scan * "aaaBBB")
0
6
#(0 0 3)
#(6 3 6)
--;; parse tree syntax
-* (cl-ppcre::parse-string "((?<small>[a-z]*)(?<big>[A-Z]*))")
+* (parse-string "((?<small>[a-z]*)(?<big>[A-Z]*))")
(:REGISTER
 (:SEQUENCE
  (:NAMED-REGISTER "small"
@@ -1474,61 +1339,54 @@ T
  (:NAMED-REGISTER "big"
   (:GREEDY-REPETITION 0 NIL (:CHAR-CLASS (:RANGE #\A #\Z))))))
-* (cl-ppcre:create-scanner *)
+* (create-scanner *)
#<CLOSURE (LAMBDA (STRING CL-PPCRE::START CL-PPCRE::END)) {B158E3D}>
(NIL "small" "big")
--;; multiple-choice back-reference
-* (cl-ppcre:scan "^(?<reg>[ab])(?<reg>[12])\\k<reg>\\k<reg>$" "a1aa")
+* (scan "^(?<reg>[ab])(?<reg>[12])\\k<reg>\\k<reg>$" "a1aa")
0
4
#(0 1)
#(1 2)
-* (cl-ppcre:scan "^(?<reg>[ab])(?<reg>[12])\\k<reg>\\k<reg>$" "a22a")
+* (scan "^(?<reg>[ab])(?<reg>[12])\\k<reg>\\k<reg>$" "a22a")
0
4
#(0 1)
#(1 2)
--
-;; demonstrating most-recently-seen-register-first property of back-reference;
;; "greedy" regex (analogous to "aa?")
-* (cl-ppcre:scan "^(?<reg>)(?<reg>a)(\\k<reg>)" "a")
+* (scan "^(?<reg>)(?<reg>a)(\\k<reg>)" "a")
0
1
#(0 0 1)
#(0 1 1)
-* (cl-ppcre:scan "^(?<reg>)(?<reg>a)(\\k<reg>)" "aa")
+* (scan "^(?<reg>)(?<reg>a)(\\k<reg>)" "aa")
0
2
#(0 0 1)
#(0 1 2)
--;; switched groups
;; "lazy" regex (analogous to "aa??")
-* (cl-ppcre:scan "^(?<reg>a)(?<reg>)(\\k<reg>)" "a")
+* (scan "^(?<reg>a)(?<reg>)(\\k<reg>)" "a")
0
1
#(0 1 1)
#(1 1 1)
;; scanner ignores the second "a"
-* (cl-ppcre:scan "^(?<reg>a)(?<reg>)(\\k<reg>)" "aa")
+* (scan "^(?<reg>a)(?<reg>)(\\k<reg>)" "aa")
0
1
#(0 1 1)
#(1 1 1)
;; "aa" will be matched only when forced by adding "$" at the end
-* (cl-ppcre:scan "^(?<reg>a)(?<reg>)(\\k<reg>)$" "aa")
+* (scan "^(?<reg>a)(?<reg>)(\\k<reg>)$" "aa")
0
2
#(0 1 1)
@@ -1543,6 +1401,56 @@ to which value *ALLOW-NAMED-REGISTERS* is bound at that
time.
+Miscellaneous
+ +
[Function] +
parse-string string => parse-tree + ++ +
Converts the regex +string string into a parse tree. +Note that the result is usually one possible way of creating an +equivalent parse tree and not necessarily the "canonical" one. +Specifically, the parse tree might contain redundant parts which are +supposed to be excised when a scanner is created. +
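As a rough illustration of the description above, the following sketch parses a regex string into a parse tree and then builds a scanner from that tree rather than from the string. The helper name `scan-via-parse-tree` is made up for this example, and the `asdf:load-system` call assumes CL-PPCRE is installed where ASDF can find it.

```lisp
;; Assumption: CL-PPCRE is installed where ASDF can find it.
(asdf:load-system :cl-ppcre)

;; SCAN-VIA-PARSE-TREE is a hypothetical helper, not part of CL-PPCRE.
(defun scan-via-parse-tree (regex-string target)
  "Parse REGEX-STRING into a parse tree, create a scanner from the
tree, and scan TARGET with it.  Returns the values of SCAN."
  (let* ((tree (cl-ppcre:parse-string regex-string))
         (scanner (cl-ppcre:create-scanner tree)))
    (cl-ppcre:scan scanner target)))

;; The match "aab" starts at index 1 and ends before index 4.
(scan-via-parse-tree "a+b" "xaab")
```

Going through the parse tree should give the same match as scanning with the string directly; it is mainly useful when you want to inspect or modify the tree before the scanner is created.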
[Function]
create-optimized-test-function test-function &key start end kind => function ++
+ +Given a unary test function test-function which is +applicable to characters, returns a function which yields the same +boolean results for all characters with character codes +from start to (excluding) end. +If kind +is NIL, test-function will simply be +returned. Otherwise, kind should be one of: ++
+:HASH-TABLE - The function builds a hash table representing all characters which satisfy the test and returns a closure which checks if a character is in that hash table.
+:CHARSET - Instead of a hash table the function uses a "charset" which is a data structure using non-linear hashing and optimized to represent (sparse) sets of characters in a fast and space-efficient way (contributed by Nikodemus Siivola).
+:CHARMAP - Instead of a hash table the function uses a bit vector to represent the set of characters.
+
+You can also use :HASH-TABLE* or :CHARSET* which are like :HASH-TABLE and :CHARSET but use the complement of the set if the set contains more than half of all characters between start and end. This saves space but needs an additional pass across all characters to create the data structure. There is no corresponding :CHARMAP* kind as the bit vectors are already created to cover the smallest possible interval which contains either the set or its complement.
+
+See also
*OPTIMIZE-CHAR-CLASSES*. +
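To make the description of CREATE-OPTIMIZED-TEST-FUNCTION more concrete, here is a minimal sketch that wraps a slow predicate in a bit-vector based test for ASCII characters. The predicate `vowel-p` and the variable `*fast-vowel-p*` are made up for this example, and the `asdf:load-system` call assumes CL-PPCRE is installed where ASDF can find it.

```lisp
;; Assumption: CL-PPCRE is installed where ASDF can find it.
(asdf:load-system :cl-ppcre)

;; VOWEL-P is a hypothetical unary test function on characters.
(defun vowel-p (char)
  "True if CHAR is an ASCII vowel."
  (find char "aeiouAEIOU"))

;; *FAST-VOWEL-P* is a hypothetical name.  The returned function is
;; meant to agree with VOWEL-P for character codes 0 to 127.
(defparameter *fast-vowel-p*
  (cl-ppcre:create-optimized-test-function #'vowel-p
                                           :start 0
                                           :end 128
                                           :kind :charmap))

(funcall *fast-vowel-p* #\e)   ; true
(funcall *fast-vowel-p* #\z)   ; false
```

Which kind pays off probably depends on how sparse the set is and how large the code range is, so it seems worth measuring with your own data before settling on one.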
[Function]
quote-meta-chars string => string' @@ -1556,10 +1464,94 @@ backslash similar to Perl'squotemetafunction. It always returns a href="http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_f.htm#fresh">fresh string.-* (cl-ppcre:quote-meta-chars "[a-z]*") +* (quote-meta-chars "[a-z]*") "\\[a\\-z\\]\\*"+
[Function] +
regex-apropos regex &optional packages &key case-insensitive => list + ++ + + + +
+Like APROPOS +but searches for interned symbols which match the regular expression +regex. The output is implementation-dependent. If +case-insensitive is true (which is the default) +and regex isn't already a scanner, a +case-insensitive scanner is used. ++Here are examples for CMUCL: + +
+* *package* +#<The COMMON-LISP-USER package, 16/21 internal, 0/9 external> + +* (defun foo (n &optional (k 0)) (+ 3 n k)) +FOO + +* (defparameter foo "bar") +FOO + +* (defparameter |foobar| 42) +|foobar| + +* (defparameter fooboo 43) +FOOBOO + +* (defclass frobar () ()) +#<STANDARD-CLASS FROBAR {4874E625}> + +* (regex-apropos "foo(?:bar)?") +FOO [variable] value: "bar" + [compiled function] (N &OPTIONAL (K 0)) +FOOBOO [variable] value: 43 +|foobar| [variable] value: 42 + +* (regex-apropos "(?:foo|fro)bar") +PCL::|COMMON-LISP-USER::FROBAR class predicate| [compiled closure] +FROBAR [class] #<STANDARD-CLASS FROBAR {4874E625}> +|foobar| [variable] value: 42 + +* (regex-apropos "(?:foo|fro)bar" 'cl-user) +FROBAR [class] #<STANDARD-CLASS FROBAR {4874E625}> +|foobar| [variable] value: 42 + +* (regex-apropos "(?:foo|fro)bar" '(pcl ext)) +PCL::|COMMON-LISP-USER::FROBAR class predicate| [compiled closure] + +* (regex-apropos "foo") +FOO [variable] value: "bar" + [compiled function] (N &OPTIONAL (K 0)) +FOOBOO [variable] value: 43 +|foobar| [variable] value: 42 + +* (regex-apropos "foo" nil :case-insensitive nil) +|foobar| [variable] value: 42 +
[Function] +
regex-apropos-list regex &optional packages &key case-insensitive => list + ++ +
+Like APROPOS-LIST +but searches for interned symbols which match the regular expression +regex. If case-insensitive is +true (which is the default) and regex isn't +already a scanner, a case-insensitive scanner is used. ++Example (continued from above): + +
+* (regex-apropos-list "foo(?:bar)?") +(|foobar| FOOBOO FOO) +Conditions
+
[Condition type]
ppcre-error @@ -1602,19 +1594,19 @@ where the parser was happy and not the position where it gave up.* (handler-case - (cl-ppcre:scan "foo**x" "fooox") - (cl-ppcre:ppcre-syntax-error (condition) + (scan "foo**x" "fooox") + (ppcre-syntax-error (condition) (format t "Houston, we've got a problem with the string ~S:~%~ Looks like something went wrong at position ~A.~%~ The last message we received was \"~?\"." - (cl-ppcre:ppcre-syntax-error-string condition) - (cl-ppcre:ppcre-syntax-error-pos condition) + (ppcre-syntax-error-string condition) + (ppcre-syntax-error-pos condition) (simple-condition-format-control condition) (simple-condition-format-arguments condition)) (values))) Houston, we've got a problem with the string "foo**x": Looks like something went wrong at position 4. -The last message we received was "Quantifier '*' not allowed". +The last message we received was "Quantifier '*' not allowed.".@@ -1643,6 +1635,39 @@ occurred (orNILif the error happened while trying to convert a parse tree). +
Unicode properties
+ +You can add support for Unicode properties to CL-PPCRE by loading +the CL-PPCRE-UNICODE system: ++(asdf:oos 'asdf:load-op :cl-ppcre-unicode) ++This will automatically +install UNICODE-PROPERTY-RESOLVER +as your property resolver. ++See the CL-UNICODE +documentation for information about the supported Unicode properties +and how they are named. + +
[Function]
unicode-property-resolver property-name => function-or-nil ++
+A property +resolver which understands Unicode properties using +CL-UNICODE's PROPERTY-TEST +function. This resolver is automatically installed +in *PROPERTY-RESOLVER* +when the CL-PPCRE-UNICODE system is loaded. ++* (scan-to-strings "\\p{Script:Latin}+" "0+AB_*") +"AB" +#() ++Note that this symbol is exported from +the CL-PPCRE-UNICODE package and not from +the CL-PPCRE package. +
Filters
@@ -1674,41 +1699,43 @@ to the outcome of the matching process.The filter function can access the following special variables from its code body: -
+
+ These variables should be considered read-only. Do not change these values unless you really know what you're doing!-
CL-PPCRE::*STRING*: The target (a string) of the -current matching process. +- +
CL-PPCRE::*STRING*- The target (a string) of the current matching process.
-CL-PPCRE::*START-POS*and -CL-PPCRE::*END-POS*: The start and end (integers) indices +- +
CL-PPCRE::*START-POS*and +CL-PPCRE::*END-POS*- The start and end (integers) indices of the current matching process. These correspond to the -
-STARTandENDkeyword parameters ofSCAN. +STARTandENDkeyword parameters +ofSCAN.CL-PPCRE::*REAL-START-POS*: The initial starting +- +
CL-PPCRE::*REAL-START-POS*- The initial starting position. This is only relevant for repeated scans (as in
-DO-SCANS) whereCL-PPCRE::*START-POS*will be moved forward whileCL-PPCRE::*REAL-START-POS*won't. For normal scans the -value of this variable isNIL. +value of this variable isNIL.CL-PPCRE::*REG-STARTS*and -CL-PPCRE::*REG-ENDS*: Two simple vectors which denote the +- +
CL-PPCRE::*REG-STARTS*and +CL-PPCRE::*REG-ENDS*- Two simple vectors which denote the start and end indices of registers within the regular expression. The first register is indexed by 0. If a register hasn't matched yet, then its corresponding entry in
-CL-PPCRE::*REG-STARTS*is -NIL. +NIL.Note that the names of the variables are not exported from the -
CL-PPCREpackage because there's currently no guarantee -that they will be available in future releases. --Here are some filter examples: +
CL-PPCREpackage because there's no explicit guarantee +that they will be available in future releases. (Although after so +many years it is very unlikely that they'll go away...)* (defun my-info-filter (pos) "Show some info about the matching process." @@ -1821,64 +1848,6 @@ For more ideas about what you can do with filters see this thread on the mailing list. -
Testing CL-PPCRE
- -CL-PPCRE comes with a comprehensive test suite most of which is stolen -from the PCRE library. You can use -it like this: - --* (mk:compile-system "cl-ppcre-test") -; Loading #p"/home/edi/cl-ppcre/cl-ppcre.system". -; Loading #p"/home/edi/cl-ppcre/packages.x86f". -; Loading #p"/home/edi/cl-ppcre/specials.x86f". -; Loading #p"/home/edi/cl-ppcre/util.x86f". -; Loading #p"/home/edi/cl-ppcre/errors.x86f". -; Loading #p"/home/edi/cl-ppcre/lexer.x86f". -; Loading #p"/home/edi/cl-ppcre/parser.x86f". -; Loading #p"/home/edi/cl-ppcre/regex-class.x86f". -; Loading #p"/home/edi/cl-ppcre/convert.x86f". -; Loading #p"/home/edi/cl-ppcre/optimize.x86f". -; Loading #p"/home/edi/cl-ppcre/closures.x86f". -; Loading #p"/home/edi/cl-ppcre/repetition-closures.x86f". -; Loading #p"/home/edi/cl-ppcre/scanner.x86f". -; Loading #p"/home/edi/cl-ppcre/api.x86f". -; Loading #p"/home/edi/cl-ppcre/ppcre-tests.x86f". -NIL - -* (cl-ppcre-test:test) - -;; .... -;; (a list of incompatibilities with Perl) -- -(If you're not using MK:DEFSYSTEM or asdf, it suffices to build -CL-PPCRE and then compile and load the file -ppcre-tests.lisp.) --With LispWorks, SCL, and SBCL (starting from version 0.8.4.8) you can also call -
CL-PPCRE-TEST:TEST with a keyword argument -THREADED which - in addition to the usual tests - will -also check whether the scanners created by CL-PPCRE are thread-safe. --Note that the file
testdataprovided with CL-PPCRE -was created on a Linux system with Perl 5.8.0. You can (and you -should if you're on Mac OS or Windows) create your own -testdatawith the Perl script -perltest.pl: - --edi@bird:~/cl-ppcre > perl perltest.pl < testinput > testdata -- -Of course you can also create your own tests - the format accepted by -perltest.plshould be rather clear from looking at the -filetestinput. Note that the target strings are wrapped -in double quotes and then fed to Perl'sevalso you can -use ugly Perl constructs like, say,a@{['b' x 10]}cwhich -will result in the target string -"abbbbbbbbbbc". -
Compatibility with Perl
Depending on your Perl version you might encounter a couple of small
@@ -1886,28 +1855,28 @@ incompatibilities with Perl most of which aren't due to CL-PPCRE:

Empty strings instead of undef in $1, $2, etc.

-(Cf. case #629 of testdata.)
+(Cf. case #629 of perltestdata.)
This is a bug in Perl 5.6.1 and earlier which has been fixed in 5.8.0.

Strange scoping of embedded modifiers

-(Cf. case #430 of testdata.)
+(Cf. case #430 of perltestdata.)
This is a bug in Perl 5.6.1 and earlier which has been fixed in 5.8.0.

Inconsistent capturing of $1, $2, etc.

-(Cf. case #662 of testdata.)
+(Cf. case #662 of perltestdata.)
This is a bug in Perl which hasn't been fixed yet.

Captured groups not available outside of look-aheads and look-behinds

-(Cf. case #1439 of testdata.)
+(Cf. case #1439 of perltestdata.)
Well, OK, this ain't a Perl bug. I just can't quite understand why captured groups should only be seen within the scope of a look-ahead or look-behind. For the moment, CL-PPCRE and Perl agree to
@@ -1915,13 +1884,22 @@ disagree... :)

Alternations don't always work from left to right

-(Cf. case #790 of testdata.) I
+(Cf. case #790 of perltestdata.) I
also think this is a Perl bug but I have currently lost the drive to report it.

+Different names for Unicode properties
+
+The names of Unicode properties are derived
+from CL-UNICODE and might
+differ slightly from the names in Perl. Most of them should be
+identical, though.
+Also, CL-UNICODE is based on
+Unicode 5.1 while your installed Perl version might not be.
+

"\r" doesn't work with MCL

-(Cf. case #9 of testdata.) For
+(Cf. case #9 of perltestdata.) For
some strange reason that I don't understand MCL translates #\Return to (CODE-CHAR 10) while MacPerl translates "\r" to (CODE-CHAR
@@ -1936,412 +1914,8 @@ to decide whether a character matches Perl's
you might encounter differences between Perl and CL-PPCRE when matching non-ASCII characters.
-
Performance
- -Benchmarking
- -The CL-PPCRE test suite can also be used for -benchmarking purposes: If you callperltest.plwith a -command line argument, it will be interpreted as the minimum number of seconds -each test should run. Perl will time its tests accordingly and create -output which, when fed toCL-PPCRE-TEST:TEST, will result -in a benchmark. Here's an example: - --edi@bird:~/cl-ppcre > echo "/((a{0,5}){0,5})*[c]/ -aaaaaaaaaaaac - -/((a{0,5})*)*[c]/ -aaaaaaaaaaaac" | perl perltest.pl .5 > timedata -1 -2 - -edi@bird:~/cl-ppcre > cmucl -quiet -; Loading #p"/home/edi/.cmucl-init". - -* (mk:compile-system "cl-ppcre-test") -; Loading #p"/home/edi/cl-ppcre/cl-ppcre.system". -; Loading #p"/home/edi/cl-ppcre/packages.x86f". -; Loading #p"/home/edi/cl-ppcre/specials.x86f". -; Loading #p"/home/edi/cl-ppcre/util.x86f". -; Loading #p"/home/edi/cl-ppcre/errors.x86f". -; Loading #p"/home/edi/cl-ppcre/lexer.x86f". -; Loading #p"/home/edi/cl-ppcre/parser.x86f". -; Loading #p"/home/edi/cl-ppcre/regex-class.x86f". -; Loading #p"/home/edi/cl-ppcre/convert.x86f". -; Loading #p"/home/edi/cl-ppcre/optimize.x86f". -; Loading #p"/home/edi/cl-ppcre/closures.x86f". -; Loading #p"/home/edi/cl-ppcre/repetition-closures.x86f". -; Loading #p"/home/edi/cl-ppcre/scanner.x86f". -; Loading #p"/home/edi/cl-ppcre/api.x86f". -; Loading #p"/home/edi/cl-ppcre/ppcre-tests.x86f". -NIL - -* (cl-ppcre-test:test :file-name "/home/edi/cl-ppcre/timedata") - 1: 0.5559 (1000000 repetitions, Perl: 4.5330 seconds, CL-PPCRE: 2.5200 seconds) - 2: 0.4573 (1000000 repetitions, Perl: 4.5922 seconds, CL-PPCRE: 2.1000 seconds) -NIL -- -We gave two test cases toperltest.pland asked it to repeat those tests often enough so that it takes at least 0.5 seconds to run each of them. In both cases, CMUCL was about twice as fast as Perl. --Here are some more benchmarks (done with Perl 5.6.1 and CMUCL 18d+ in 2002): -
- -
-
- -- Test case Repetitions Perl (sec) CL-PPCRE (sec) Ratio CL-PPCRE/Perl - "@{['x' x 100]}" =~ /(.)*/s100000 0.1394 0.0700 0.5022 - "@{['x' x 1000]}" =~ /(.)*/s100000 0.1628 0.0600 0.3685 - "@{['x' x 10000]}" =~ /(.)*/s100000 0.5071 0.0600 0.1183 - "@{['x' x 100000]}" =~ /(.)*/s10000 0.3902 0.0000 0.0000 - "@{['x' x 100]}" =~ /.*/100000 0.1520 0.0800 0.5262 - "@{['x' x 1000]}" =~ /.*/100000 0.3786 0.5400 1.4263 - "@{['x' x 10000]}" =~ /.*/10000 0.2709 0.5100 1.8826 - "@{['x' x 100000]}" =~ /.*/1000 0.2734 0.5100 1.8656 - "@{['x' x 100]}" =~ /.*/s100000 0.1320 0.0300 0.2274 - "@{['x' x 1000]}" =~ /.*/s100000 0.1634 0.0300 0.1836 - "@{['x' x 10000]}" =~ /.*/s100000 0.5304 0.0300 0.0566 - "@{['x' x 100000]}" =~ /.*/s10000 0.3966 0.0000 0.0000 - "@{['x' x 100]}" =~ /x*/100000 0.1507 0.0900 0.5970 - "@{['x' x 1000]}" =~ /x*/100000 0.3782 0.6300 1.6658 - "@{['x' x 10000]}" =~ /x*/10000 0.2730 0.6000 2.1981 - "@{['x' x 100000]}" =~ /x*/1000 0.2708 0.5900 2.1790 - "@{['x' x 100]}" =~ /[xy]*/100000 0.2637 0.1500 0.5688 - "@{['x' x 1000]}" =~ /[xy]*/10000 0.1449 0.1200 0.8282 - "@{['x' x 10000]}" =~ /[xy]*/1000 0.1344 0.1100 0.8185 - "@{['x' x 100000]}" =~ /[xy]*/100 0.1355 0.1200 0.8857 - "@{['x' x 100]}" =~ /(.)*/100000 0.1523 0.1100 0.7220 - "@{['x' x 1000]}" =~ /(.)*/100000 0.3735 0.5700 1.5262 - "@{['x' x 10000]}" =~ /(.)*/10000 0.2735 0.5100 1.8647 - "@{['x' x 100000]}" =~ /(.)*/1000 0.2598 0.5000 1.9242 - "@{['x' x 100]}" =~ /(x)*/100000 0.1565 0.1300 0.8307 - "@{['x' x 1000]}" =~ /(x)*/100000 0.3783 0.6600 1.7446 - "@{['x' x 10000]}" =~ /(x)*/10000 0.2720 0.6000 2.2055 - "@{['x' x 100000]}" =~ /(x)*/1000 0.2725 0.6000 2.2020 - "@{['x' x 100]}" =~ /(y|x)*/10000 0.2411 0.1000 0.4147 - "@{['x' x 1000]}" =~ /(y|x)*/1000 0.2313 0.0900 0.3891 - "@{['x' x 10000]}" =~ /(y|x)*/100 0.2336 0.0900 0.3852 - "@{['x' x 100000]}" =~ /(y|x)*/10 0.4165 0.0900 0.2161 - "@{['x' x 100]}" =~ /([xy])*/100000 0.2678 0.1800 0.6721 - "@{['x' x 1000]}" =~ /([xy])*/10000 0.1459 0.1200 0.8227 - 
"@{['x' x 10000]}" =~ /([xy])*/1000 0.1372 0.1100 0.8017 - "@{['x' x 100000]}" =~ /([xy])*/100 0.1358 0.1100 0.8098 - "@{['x' x 100]}" =~ /((x){2})*/10000 0.1073 0.0400 0.3727 - "@{['x' x 1000]}" =~ /((x){2})*/10000 0.9146 0.2400 0.2624 - "@{['x' x 10000]}" =~ /((x){2})*/1000 0.9020 0.2300 0.2550 - "@{['x' x 100000]}" =~ /((x){2})*/100 0.8983 0.2300 0.2560 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..100)]}FOOBARBAZ" =~ /[a-z]*FOOBARBAZ/100000 0.2829 0.2300 0.8129 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..1000)]}FOOBARBAZ" =~ /[a-z]*FOOBARBAZ/10000 0.1859 0.1700 0.9143 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..10000)]}FOOBARBAZ" =~ /[a-z]*FOOBARBAZ/1000 0.1420 0.1700 1.1968 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..100)]}NOPE" =~ /[a-z]*FOOBARBAZ/1000000 0.9196 0.4600 0.5002 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..1000)]}NOPE" =~ /[a-z]*FOOBARBAZ/100000 0.2166 0.2500 1.1542 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..10000)]}NOPE" =~ /[a-z]*FOOBARBAZ/10000 0.1465 0.2300 1.5696 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..100)]}FOOBARBAZ" =~ /([a-z])*FOOBARBAZ/100000 0.2917 0.2600 0.8915 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..1000)]}FOOBARBAZ" =~ /([a-z])*FOOBARBAZ/10000 0.1811 0.1800 0.9942 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..10000)]}FOOBARBAZ" =~ /([a-z])*FOOBARBAZ/1000 0.1424 0.1600 1.1233 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..100)]}NOPE" =~ /([a-z])*FOOBARBAZ/1000000 0.9154 0.7400 0.8083 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..1000)]}NOPE" =~ /([a-z])*FOOBARBAZ/100000 0.2170 0.2800 1.2901 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..10000)]}NOPE" =~ /([a-z])*FOOBARBAZ/10000 0.1497 0.2300 1.5360 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..100)]}FOOBARBAZ" =~ /([a-z]|ab)*FOOBARBAZ/10000 0.4359 0.1500 0.3441 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..1000)]}FOOBARBAZ" =~ /([a-z]|ab)*FOOBARBAZ/1000 
0.5456 0.1500 0.2749 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..10000)]}FOOBARBAZ" =~ /([a-z]|ab)*FOOBARBAZ/10 0.2039 0.0600 0.2943 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..100)]}NOPE" =~ /([a-z]|ab)*FOOBARBAZ/1000000 0.9311 0.7400 0.7947 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..1000)]}NOPE" =~ /([a-z]|ab)*FOOBARBAZ/100000 0.2162 0.2700 1.2489 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..10000)]}NOPE" =~ /([a-z]|ab)*FOOBARBAZ/10000 0.1488 0.2300 1.5455 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..100)]}NOPE" =~ /[a-z]*FOOBARBAZ/i1000 0.1555 0.0000 0.0000 - "@{[join undef, map { chr(ord('a') + rand 26) } (1..1000)]}NOPE" =~ /[a-z]*FOOBARBAZ/i10 0.1441 0.0000 0.0000 - - "@{[join undef, map { chr(ord('a') + rand 26) } (1..10000)]}NOPE" =~ /[a-z]*FOOBARBAZ/i10 13.7150 0.0100 0.0007 -As you might have noticed, Perl shines if it can reduce significant -parts of the matching process to cases where it can advance through -the target string one character at a time. This leads to C code where -you can very efficiently test and increment a pointer into a string in -a tight loop and can hardly be beaten with CL. In almost all other -cases, the CMUCL/CL-PPCRE combination is usually faster than Perl - -sometimes a lot faster. -
-As most of the examples above were chosen to make Perl look good -here's another benchmark - the -result of running
perltest.plagainst the -fulltestdatafile with a time -limit of 0.1 seconds, CL-PPCRE 0.1.2 on CMUCL 18e-pre -vs. Perl 5.6.1. CL-PPCRE is faster than Perl in 1511 of 1545 -cases - in 1045 cases it's more than twice as fast. --Note that Perl as well as CL-PPCRE keep the rightmost matches in -registers - keep that in mind if you benchmark against other regex -implementations. Also note that
CL-PPCRE-TEST:TEST-automatically skips test cases where Perl and CL-PPCRE don't agree. - -Other performance issues
- -While the scanners created by CL-PPCRE are pretty fast, the process -which creates scanners from Perl regex strings and parse trees isn't -that speedy and conses a lot. It is recommended that you store and -re-use scanners if possible. TheDO-macros will do this -for you automatically. --However, beginning with version 0.5.2, CL-PPCRE uses a compiler -macro and
LOAD-TIME-VALUE-to make sure that the scanner is only built once if the first argument -toSCAN,SCAN-TO-STRINGS,SPLIT, -REGEX-REPLACE, orREGEX-REPLACE-ALLis a constant -form. (But see the notes for*REGEX-CHAR-CODE-LIMIT*and -*USE-BMH-MATCHERS*.) --Here's an example of its effect - -
-* (trace cl-ppcre::convert) -(CL-PPCRE::CONVERT) -* (defun foo (string) (cl-ppcre:scan "(?s).*" string)) -FOO -* (time (foo "The quick brown fox")) -Compiling LAMBDA NIL: -Compiling Top-Level Form: - - 0: (CL-PPCRE::CONVERT #<lambda-list-unavailable>) - 0: CL-PPCRE::CONVERT returned - #<CL-PPCRE::SEQ {48B033C5}> - 0 - #<CL-PPCRE::EVERYTHING {48B031D5}> -Evaluation took: - 0.0 seconds of real time - 0.00293 seconds of user run time - 9.77e-4 seconds of system run time - 0 page faults and - 11,408 bytes consed. -0 -19 -#() -#() -* (time (foo "The quick brown fox")) -Compiling LAMBDA NIL: -Compiling Top-Level Form: - - 0: (CL-PPCRE::CONVERT #<lambda-list-unavailable>) - 0: CL-PPCRE::CONVERT returned - #<CL-PPCRE::SEQ {48B14C4D}> - 0 - #<CL-PPCRE::EVERYTHING {48B14B65}> -Evaluation took: - 0.0 seconds of real time - 0.00293 seconds of user run time - 0.0 seconds of system run time - 0 page faults and - 10,960 bytes consed. -0 -19 -#() -#() -* (compile 'foo) - 0: (CL-PPCRE::CONVERT #<lambda-list-unavailable>) - 0: CL-PPCRE::CONVERT returned - #<CL-PPCRE::SEQ {48B1FEC5}> - 0 - #<CL-PPCRE::EVERYTHING {48B1FDDD}> -Compiling LAMBDA (STRING): -Compiling Top-Level Form: -FOO -NIL -NIL -* (time (foo "The quick brown fox")) -Compiling LAMBDA NIL: -Compiling Top-Level Form: - -Evaluation took: - 0.0 seconds of real time - 0.0 seconds of user run time - 0.0 seconds of system run time - 0 page faults and - 0 bytes consed. -0 -19 -#() -#() -* (time (foo "The quick brown fox")) -Compiling LAMBDA NIL: -Compiling Top-Level Form: - -Evaluation took: - 0.0 seconds of real time - 0.0 seconds of user run time - 0.0 seconds of system run time - 0 page faults and - 0 bytes consed. -0 -19 -#() -#() -* -- --Of course, the usual rules for creating efficient regular expressions -apply to CL-PPCRE as well although it can optimize a couple of cases -itself. The most important rule is probably that you shouldn't use -capturing groups if you don't need the captured information, i.e. use -
"(?:a|b)*"instead of -"(a|b)*"if you don't need to refer to the -register. (In fact, in this particular case CL-PPCRE will be able to -optimize away the register group, but it won't if you replace -"a|b"with, say, -"a|bc".) --Another point worth mentioning is that you definitely should use -single-line mode if you have long strings without -
#\Newline(or where you don't care about the line breaks) -and plan to use regular expressions like -".*". See the benchmarks -for comparisons between single-line mode and normal mode with such -target strings. --Another thing to consider is that, for performance reasons, CL-PPCRE -assumes that most of the target strings you're trying to match are simple -strings and coerces non-simple strings to simple strings before -scanning them. If you plan on working with non-simple strings mostly, -you might consider modifying the CL-PPCRE source code. This is easy: -Change all occurrences of
SCHARtoCHARand -redefine the macro inutil.lispwhere the coercion takes -place - that's all. -
Bugs and problems
-Stack overflow
- -CL-PPCRE can optimize away a lot of unnecessary backtracking but -sometimes this simply isn't possible. With complicated regular -expressions and long strings this might lead to stack overflows -depending on your machine and your CL implementation. --Here's one example with CLISP: - -
-[1]> (defun target (n) (concatenate 'string (make-string n :initial-element #\a) "b")) -TARGET - -[2]> (cl-ppcre:scan "a*" (target 1000)) -0 ; -1000 ; -#() ; -#() - -[3]> (cl-ppcre:scan "(?:a|b)*" (target 1000)) -0 ; -1001 ; -#() ; -#() - -[4]> (cl-ppcre:scan "(a|b)*" (target 1000)) -0 ; -1001 ; -#(1000) ; -#(1001) - -[5]> (cl-ppcre:scan "(a|b)*" (target 10000)) -0 ; -10001 ; -#(10000) ; -#(10001) - -[6]> (cl-ppcre:scan "(a|b)*" (target 100000)) -0 ; -100001 ; -#(100000) ; -#(100001) - -[7]> (cl-ppcre:scan "(a|b)*" (target 1000000)) -0 ; -1000001 ; -#(1000000) ; -#(1000001) - -;; No problem until now - but... - -[8]> (cl-ppcre:scan "(a|)*" (target 100000)) -*** - Lisp stack overflow. RESET - -[9]> (cl-ppcre:scan "(a|)*" (target 3200)) -*** - Lisp stack overflow. RESET -- --With CMUCL the situation is better and worse at the same time. It will -take a lot longer until CMUCL gives up but if it gives up the whole -Lisp image will silently die (at least on my machine): -
-[Note: This was true for CMUCL 18e - CMUCL 19a behaves in a much nicer way and gives you a chance to recover.] - -
-* (defun target (n) (concatenate 'string (make-string n :initial-element #\a) "b")) -TARGET - -* (cl-ppcre:scan "(a|)*" (target 3200)) -0 -3200 -#(3200) -#(3200) - -* (cl-ppcre:scan "(a|)*" (target 10000)) -0 -10000 -#(10000) -#(10000) - -* (cl-ppcre:scan "(a|)*" (target 100000)) -0 -100000 -#(100000) -#(100000) - -* (cl-ppcre:scan "(a|)*" (target 1000000)) -0 -1000000 -#(1000000) -#(1000000) - -;; No problem until now - but... - -* (cl-ppcre:scan "(a|)*" (target 10000000)) -edi@bird:~ > -- -This behaviour can be changed with very conservative optimization settings but that'll make CL-PPCRE crawl compared to Perl. - --You might want to compare this to the way Perl handles the same situation. It might lie to you: - -
-edi@bird:~ > perl -le '$_="a" x 32766 . "b"; /(a|)*/; print $1' - -edi@bird:~ > perl -le '$_="a" x 32767 . "b"; /(a|)*/; print $1' -a -- -Or it might warn you before it's lying to you: --edi@bird:~ > perl -lwe '$_="a" x 32767 . "b"; /(a|)*/; print $1' -Complex regular subexpression recursion limit (32766) exceeded at -e line 1. -a -- -Or it might simply die: --edi@bird:~ > /opt/perl-5.8/bin/perl -lwe '$_="a" x 32767 . "b"; /(a|)*/; print $1' -Segmentation fault -- -Your mileage may vary, of course... -In Perl the following code works as expected, i.e. it prints
"\Q"doesn't work, or does it?1. @@ -2356,9 +1930,9 @@ print 1 If you try to do something similar in CL-PPCRE, you get an error:-* (let ((cl-ppcre:*allow-quoting* t) +* (let ((*allow-quoting* t) (a "\\E*")) - (cl-ppcre:scan (concatenate 'string "(?:\\Q" a "\\E){2}") "\\E*\\E*")) + (scan (concatenate 'string "(?:\\Q" a "\\E){2}") "\\E*\\E*")) Quantifier '*' not allowed at position 3 in string "(?:*\\E){2}"@@ -2380,9 +1954,7 @@ try CL-INTERPOL or useQUOTE-META-CHARS:* (let ((a "\\E*")) - (cl-ppcre:scan (concatenate 'string - "(?:" (cl-ppcre:quote-meta-chars a) "){2}") - "\\E*\\E*")) + (scan (concatenate 'string "(?:" (quote-meta-chars a) "){2}") "\\E*\\E*")) 0 6 #() @@ -2391,8 +1963,7 @@ href="#quote-meta-chars">QUOTE-META-CHARS: Or, even better and Lisp-ier, use the S-expression syntax instead - no need for quoting in this case:* (let ((a "\\E*")) - (cl-ppcre:scan `(:greedy-repetition 2 2 ,a) - "\\E*\\E*")) + (scan `(:greedy-repetition 2 2 ,a) "\\E*\\E*")) 0 6 #() @@ -2403,11 +1974,11 @@ Or, even better and Lisp-ier, use the S-expression sy* (let ((a "y\\y")) - (cl-ppcre:scan a a)) + (scan a a)) NIL-You didn't expect this to yieldNIL, did you? Shouldn't something like(CL-PPCRE:SCAN A A)always return a true value? No, because the first and the second argument toSCANare handled differently: The first argument is fed to CL-PPCRE's parser and is treated like a Perl regular expression. In particular, the parser "sees"\yand converts it toybecause\yhas no special meaning in regular expressions. So, the regular expression is the constant string"yy". But the second argument isn't converted - it is left as is, i.e. it's equivalent to Perl's'y\y'. In other words, this example would be equivalent to the Perl code +You didn't expect this to yieldNIL, did you? Shouldn't something like(SCAN A A)always return a true value? 
No, because the first and the second argument toSCANare handled differently: The first argument is fed to CL-PPCRE's parser and is treated like a Perl regular expression. In particular, the parser "sees"\yand converts it toybecause\yhas no special meaning in regular expressions. So, the regular expression is the constant string"yy". But the second argument isn't converted - it is left as is, i.e. it's equivalent to Perl's'y\y'. In other words, this example would be equivalent to the Perl code'y\y' =~ /y\y/; @@ -2424,16 +1995,6 @@ which should explain why it doesn't match.Still confused? You might want to try CL-INTERPOL. -
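If what you actually want is for a string to match itself literally, one option (a minimal sketch, not the only way) is to quote it with QUOTE-META-CHARS before it reaches the parser. The helper name `scan-literally` is made up for this example, and the `asdf:load-system` call assumes CL-PPCRE is installed where ASDF can find it.

```lisp
;; Assumption: CL-PPCRE is installed where ASDF can find it.
(asdf:load-system :cl-ppcre)

;; SCAN-LITERALLY is a hypothetical helper, not part of CL-PPCRE.
(defun scan-literally (literal target)
  "Scan TARGET for LITERAL taken verbatim, i.e. with all regex
metacharacters in LITERAL quoted first."
  (cl-ppcre:scan (cl-ppcre:quote-meta-chars literal) target))

;; The three-character string y\y now matches itself: start 0, end 3.
(let ((a "y\\y"))
  (scan-literally a a))
```

The quoting turns the backslash in the first argument into an escaped backslash, so the parser no longer collapses `\y` into `y`.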
Remarks
- -The sample output from CMUCL and CLISP has been slightly edited to -increase readability. --All test cases and benchmarks in this document were performed on an -IBM Thinkpad T23 laptop (Pentium III 1.2 GHz, -768 MB RAM) running Gentoo -Linux 1.1a. -
AllegroCL compatibility mode
Since autumn 2004- The AllegroCL engine doesn't offer parse tree synonyms and filters. -
- The AllegroCL engine will choke on some regular expressions involving curly braces that are accepted by Perl and CL-PPCRE's native engine. -
- The AllegroCL engine's case-folding mode switch (which is used instead of CL-PPCRE's
:CASE-INSENSITIVEkeyword parameter) is currently only effective for ASCII characters. -- The AllegroCL engine doesn't support quoting of metacharacters. +
- The AllegroCL engine will choke on some regular expressions involving curly braces that are accepted by Perl and CL-PPCRE's native engine. +
- The AllegroCL engine's case-folding mode switch (which is used instead of CL-PPCRE's
:CASE-INSENSITIVEkeyword parameter) is currently only effective for ASCII characters. +- The AllegroCL engine doesn't support quoting of metacharacters.
- In AllegroCL compatibility mode compiled regular expressions (as returned by
CREATE-SCANNER) aren't functions but structures. +- The AllegroCL engine doesn't support named properties.
To use the AllegroCL compatibility mode you have to
@@ -2479,7 +2041,7 @@ To use the AllegroCL compatibility mode you have to
Acknowledgements
-Although I didn't use their code I was heavily inspired by looking at +Although I didn't use their code, I was heavily inspired by looking at the Scheme/CL regex implementations of Dorai Sitaram and mailing list as well as the output of Perl'suse re "debug"pragma have been very helpful in optimizing the scanners created by CL-PPCRE. -The asdf system definitions were kindly provided by Marco -Baringer. Hannu Koivisto provided patches to make the -
.systemfiles more usable. Thanks to Kevin Rosenberg and -Douglas Crosher for pointing out how to be friendly to case-sensitive -ACL images. Thanks to Karsten Poeck and JP Massar for their help in -making CL-PPCRE work with Corman Lisp. JP Massar and Kent M. Pitman -also helped to improve/fix the test suite and the compiler macro. Nikodemus Siivola provided the -fast charset implementation incharset.lisp. See the ChangeLog for several -other people who helped with bug reports or patches. +The list of people who participated in this project in one way or +the other has grown too long to maintain it here. See +the ChangeLog for all +the people who helped with patches, bug reports, or in other ways. +Thanks to all of them!
-Thanks to the guys at "Café Olé" in Hamburg -where I wrote most of the code and thanks to my wife for lending me -her PowerBook to test CL-PPCRE with MCL and OpenMCL. +Thanks to the guys at +"Café +Olé" +in Hamburg where I +wrote most of the 0.1.0 release and thanks to my wife for lending +me her PowerBook to test early versions of CL-PPCRE with MCL and +OpenMCL.
-$Header: /usr/local/cvsrep/cl-ppcre/doc/index.html,v 1.171 2008/07/03 10:06:17 edi Exp $ +$Header: /usr/local/cvsrep/cl-ppcre/doc/index.html,v 1.191 2008/07/23 02:14:09 edi Exp $