Ample regular expressions — Compose and reuse Emacs regular expressions with ease.
If you have ever tried to write more than a few related regexps and felt that there should be a way to pick out their common parts and just plug them in without worrying about grouping and precedence, this package is for you.
ample-regexps is tested to work on Emacs 24. It should also work on Emacs 23, but that is not guaranteed.
ample-regexps is available on MELPA, from where it can be installed via:
M-x package-install ample-regexps
If you haven't yet added MELPA repositories to your config, feel free to follow these instructions to do so.
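For reference, here is a minimal sketch of the usual MELPA setup for an init file. The archive URL and the call to `package-initialize` reflect common practice rather than anything stated in this README, so treat them as assumptions and adjust them to your configuration.

```elisp
(require 'package)

;; Assumption: the standard MELPA archive URL.
(add-to-list 'package-archives
             '("melpa" . "https://melpa.org/packages/") t)

;; Make installed packages available in this session.
(package-initialize)
```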
The library is also available via el-get:
M-x el-get-install ample-regexps
Also, since this package has no dependencies, you can just drop the ample-regexps.el file somewhere on your load-path and enable it with:
(require 'ample-regexps)
There are plenty of ways to help: use this package, spread the word, fix bugs, post bug reports or fresh ideas to the issue tracker, add tests, etc.
To participate in development, you'll probably need cask. The only dependency as of now is ert-runner, so it's possible to run tests manually, but it's rather inconvenient. It's a lot easier to just do:
$ cask install
$ make test
The main item of the API is the define-arx macro. Let's start with a simple example:
(define-arx hello-world-rx '()) ;; -> hello-world-rx
(hello-world-rx "Hello, world") ;; -> "Hello, world"
(hello-world-rx (* "Hello, world")) ;; -> "\\(?:Hello, world\\)*"
define-arx defines a macro that converts s-expressions into regular expressions.
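Since the generated macro expands to an ordinary regexp string, it can be handed straight to the standard matching functions. The sketch below assumes ample-regexps is installed; `my-greeting-p` is a hypothetical helper name used only for illustration.

```elisp
(require 'ample-regexps)

;; Same definition as in the example above: no custom forms yet,
;; only plain rx syntax.
(define-arx hello-world-rx '())

;; `my-greeting-p' is a hypothetical helper, not part of the package.
(defun my-greeting-p (string)
  "Return non-nil if STRING is one or more repetitions of \"Hello, world\"."
  (string-match-p (hello-world-rx bos (+ "Hello, world") eos) string))

(my-greeting-p "Hello, worldHello, world") ; non-nil
(my-greeting-p "Goodbye, world")           ; nil
```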
If you're familiar with the rx package (and if not, I encourage you to get acquainted with it), you're probably starting to experience déjà vu. You're right: rx is used underneath, and ample-regexps is just the cherry on top, adding customization with a hint of syntactic sugar.
Let's start with something simple and see how you can alias components to save some keystrokes:
(define-arx h-w-rx
  '((h "Hello, ")
    (w "world"))) ;; -> h-w-rx
(h-w-rx h w) ;; -> "Hello, world"
(h-w-rx (* h w)) ;; -> "\\(?:Hello, world\\)*"
Aliased literals are regexp-quoted, but you can also alias a raw regular expression if you want:
(define-arx alnum-rx
  '((alpha_ (regexp "[[:alpha:]_]"))
    (alnum_ (regexp "[[:alnum:]_]")))) ;; -> alnum-rx
(alnum-rx (+ alpha_) (* alnum_)) ;; -> "[[:alpha:]_]+[[:alnum:]_]*"
In fact, (regexp ...) is just an rx S-expression, which you can compose and nest arbitrarily to define even more forms:
(define-arx assignment-rx
  '((alpha_ (regexp "[[:alpha:]_]"))
    (alnum_ (regexp "[[:alnum:]_]"))
    (ws (* blank))
    (id (seq symbol-start (+ alpha_) (* alnum_) symbol-end)))) ;; -> assignment-rx
(assignment-rx id ws "=" ws id) ;; -> "\\_<[[:alpha:]_]+[[:alnum:]_]*\\_>[[:blank:]]*=[[:blank:]]*\\_<[[:alpha:]_]+[[:alnum:]_]*\\_>"
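Aliases can be mixed freely with built-in rx forms such as `group`, which makes it easy to pull the matched parts back out. The following is a sketch under the assumption that ample-regexps is installed; `my-parse-assignment` is a hypothetical helper, and symbol boundaries depend on the syntax table of the current buffer.

```elisp
(require 'ample-regexps)

;; Same aliases as in the example above.
(define-arx assignment-rx
  '((alpha_ (regexp "[[:alpha:]_]"))
    (alnum_ (regexp "[[:alnum:]_]"))
    (ws (* blank))
    (id (seq symbol-start (+ alpha_) (* alnum_) symbol-end))))

;; `my-parse-assignment' is a hypothetical helper for illustration.
(defun my-parse-assignment (line)
  "Return (TARGET . VALUE) if LINE contains \"target = value\", else nil."
  ;; Wrap each `id' in a numbered capture group using the stock rx `group'.
  (when (string-match (assignment-rx (group id) ws "=" ws (group id)) line)
    (cons (match-string 1 line) (match-string 2 line))))

(my-parse-assignment "x_1 = y2")           ; → ("x_1" . "y2")
(my-parse-assignment "no assignment here") ; → nil
```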
Ok, this was all simple aliasing, but what if you want to add some custom S-expressions, too? Fear thou not, we've got you covered:
(define-arx cond-assignment-rx
  '((alpha_ (regexp "[[:alpha:]_]"))
    (alnum_ (regexp "[[:alnum:]_]"))
    (ws (* blank))
    (sym (:func (lambda (_form &rest args)
                  `(seq symbol-start (or ,@args) symbol-end))))
    (cond-keyword (sym "if" "elif" "while"))
    (id (sym (+ alpha_) (* alnum_))))) ;; -> cond-assignment-rx
(cond-assignment-rx cond-keyword ws id ":" id ws "=" ws id) ;; -> "\\_<\\(?:elif\\|if\\|while\\)\\_>[[:blank:]]*\\_<\\(?:[[:alpha:]_]+\\|[[:alnum:]_]*\\)\\_>:\\_<\\(?:[[:alpha:]_]+\\|[[:alnum:]_]*\\)\\_>[[:blank:]]*=[[:blank:]]*\\_<\\(?:[[:alpha:]_]+\\|[[:alnum:]_]*\\)\\_>"
The (:func ...) plist lets you supply a function that is passed all the s-expressions from the form as arguments, with the first argument being the form symbol itself. You can treat them as a list like above or destructure and name them to your liking (destructuring-bind, anyone?). Let's see how one could write a matcher for a list of comma-separated values:
(define-arx csv-rx
  '((csv (:func (lambda (_form n arg)
                  `(seq ,@(nbutlast (cl-loop for i from 1 to n
                                             collect `(group-n ,i ,arg)
                                             collect ", ")))))))) ;; -> csv-rx
(csv-rx (csv 3 (seq "foobar"))) ;; -> "\\(?1:foobar\\), \\(?2:foobar\\), \\(?3:foobar\\)"
There's a drawback to this: if you pass an incorrect number of arguments, you'll get an unreadable error message:
(csv-rx (csv 3 "foo" "bar")) ;; -> Wrong number of arguments: (lambda (_form n arg) (\` (seq (\,@ (nbutlast (cl-loop for i from 1 to n collect (\` (group-n (\, i) (\, arg))) collect ", ")))))), 4
To make this more readable, the form-function plist supports the :min-args and :max-args keywords:
(define-arx csv-rx
  '((csv (:func (lambda (_form n arg)
                  `(seq ,@(nbutlast (cl-loop for i from 1 to n
                                             collect `(group-n ,i ,arg)
                                             collect ", "))))
          :min-args 2
          :max-args 2)))) ;; -> csv-rx
(csv-rx (csv 3 "foo" "bar")) ;; -> (error "rx form `csv' accepts at most 2 args")
(csv-rx (csv 3)) ;; -> (error "rx form `csv' requires at least 2 args")
Form functions can also be recursive. You may have noticed that csv-rx only matches lists of exactly N elements. Let's fix it to match any length up to N (you can achieve the same effect with a simple loop, but I really wanted to avoid using factorial to show recursion):
(defun csv-opt (_form n elt &optional accum)
  (cond
   ((<= n 0) accum)
   ((null accum) (list _form (1- n) elt (list 'group-n n elt)))
   (t (list _form (1- n) elt (list 'group-n n elt `(opt ", " ,accum)))))) ;; -> csv-opt
(define-arx csv-opt-rx
  '((csv-opt (:func csv-opt)))) ;; -> csv-opt-rx
(csv-opt-rx (csv-opt 3 "foo")) ;; -> "\\(?1:foo\\(?:, \\(?2:foo\\(?:, \\(?3:foo\\)\\)?\\)\\)?\\)"
Written out as plain text, such expressions are hardly readable, let alone maintainable, but wrapped in a function call they don't seem scary at all.
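To check that the recursive form really accepts one to N elements, it can be anchored with `bos` and `eos` and tried against a few strings. This is a sketch that assumes ample-regexps is installed; it repeats the definitions above so that it can be evaluated on its own, and `my-short-foo-list-p` is a hypothetical name.

```elisp
(require 'ample-regexps)

;; Same recursive form function as above, with the first argument
;; named FORM because it is actually used in the body.
(defun csv-opt (form n elt &optional accum)
  (cond
   ((<= n 0) accum)
   ((null accum) (list form (1- n) elt (list 'group-n n elt)))
   (t (list form (1- n) elt (list 'group-n n elt `(opt ", " ,accum))))))

(define-arx csv-opt-rx
  '((csv-opt (:func csv-opt))))

;; `my-short-foo-list-p' is a hypothetical helper for illustration.
(defun my-short-foo-list-p (string)
  "Return non-nil if STRING is a list of one to three \"foo\" items."
  (string-match-p (csv-opt-rx bos (csv-opt 3 "foo") eos) string))

(my-short-foo-list-p "foo")                ; non-nil
(my-short-foo-list-p "foo, foo, foo")      ; non-nil
(my-short-foo-list-p "foo, foo, foo, foo") ; nil, one element too many
```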
Form functions can return raw regular expressions, too. This is, for example, how you could backport the group-n form to Emacs 23, where it's not available (if you had to):
(define-arx backport-rx
  '((group-n (:func (lambda (_form index &rest args)
                      (concat (format "\\(?%d:" index)
                              (mapconcat (lambda (f) (rx-form f ':)) args "")
                              "\\)")))))) ;; -> backport-rx
(backport-rx (group-n 1 (seq "foo" (* "bar")))) ;; -> "\\(?1:foo\\(?:bar\\)*\\)"
The snippet above uses mapconcat and a bit of underdocumented rx functionality. You can avoid that with two convenience functions, arx-and and arx-or:
(define-arx backport-rx
  '((group-n (:func (lambda (_form index &rest args)
                      (concat (format "\\(?%d:" index)
                              (arx-and args)
                              "\\)")))))) ;; -> backport-rx
(backport-rx (group-n 1 (seq "foo" (* "bar")))) ;; -> "\\(?1:foo\\(?:bar\\)*\\)"
Be warned, though: this is a power-user feature, and no extra grouping is performed on the returned string, which may cause unexpected results:
(define-arx ungrouped-rx
  '((foo (:func (lambda (_form) "foo"))))) ;; -> ungrouped-rx
(ungrouped-rx (foo) (foo)) ;; -> "foofoo"
(ungrouped-rx (* (foo))) ;; -> "foo*"
To avoid surprises, make sure the resulting expressions are grouped.
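One way to follow this advice is to have the form function wrap its raw regexp in a shy group itself. The sketch below assumes ample-regexps is installed; `grouped-rx` and `my-foos-p` are hypothetical names, and the check is done by matching behavior rather than by comparing the generated string.

```elisp
(require 'ample-regexps)

(define-arx grouped-rx
  ;; The raw regexp carries its own shy group, so a surrounding
  ;; repetition applies to all of "foo" and not only to the last "o".
  '((foo (:func (lambda (_form) "\\(?:foo\\)")))))

;; `my-foos-p' is a hypothetical helper for illustration.
(defun my-foos-p (string)
  "Return non-nil if STRING consists only of repetitions of \"foo\"."
  (string-match-p (grouped-rx bos (* (foo)) eos) string))

(my-foos-p "foofoo") ; non-nil
(my-foos-p "fooo")   ; nil, although the ungrouped "foo*" would match it
```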
(define-arx foobar-rx ...) is a macro that defines three things:
- a macro (foobar-rx ...) to be replaced by a constant during compilation
- a function (foobar-rx-to-string ...) that can be used at runtime
- a variable foobar-rx-constituents with form definitions to use
When either the function or the macro is called, the constituents variable is used to override rx-constituents via dynamic scoping, and the rest is performed by the rx-to-string function.
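The function variant is the one to reach for when part of the expression is only known at runtime. The sketch below assumes ample-regexps is installed and that the generated function, like `rx-to-string`, takes the form as a quoted s-expression in its first argument; `kw-rx` and `my-keyword-regexp` are hypothetical names.

```elisp
(require 'ample-regexps)

(define-arx kw-rx
  '((ws (* blank))))

;; `my-keyword-regexp' is a hypothetical helper for illustration.
(defun my-keyword-regexp (keywords)
  "Return a regexp matching any of KEYWORDS, a list of strings, as a symbol."
  ;; Assumption: the form is passed as data, as with `rx-to-string'.
  (kw-rx-to-string `(seq symbol-start (or ,@keywords) symbol-end ws)))

(string-match-p (my-keyword-regexp '("if" "while")) "while x") ; non-nil
(string-match-p (my-keyword-regexp '("if" "while")) "whilex")  ; nil
```

The macro variant cannot do this, because its arguments have to be fully known when the call is expanded.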
This package is provided under the terms and conditions of the GPLv3 license.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/ .