ftlRegex

Robert Rüger edited this page Feb 19, 2020 · 6 revisions

ftlRegex is a convenient Fortran wrapper around either the PCRE library or the POSIX regular expression functionality in the C standard library (aka regex.h).

Here is a small example that shows what ftlRegex can do for you:

type(ftlString) :: line
type(ftlRegex)  :: regex

line = 'Element: mass=12 Z=6 symbol=C name=Carbon'
call regex%New('(\w+)\s*=\s*(\w+)')

line = regex%Replace(line, '\2<-\1', doGroupSub=.true.)

The ftlString line now holds:

Element: 12<-mass 6<-Z C<-symbol Carbon<-name

Quite a lot of work done in just one line of Fortran, isn't it?

Note that since ftlRegex internally uses the regular expression engine of either the PCRE library or the C standard library, the supported regular expression syntax depends on the implementation of these libraries. It's probably best to use the PCRE library, which is available on all platforms (Windows, I'm looking at you here ...). It is also more powerful than the POSIX regular expressions: for example, it has non-capturing groups and many more features on top of what POSIX offers. Everything from the POSIX standard should also work with PCRE, so if you want to keep the option of linking against either one, stick to the POSIX standard regular expressions, specifically the POSIX Extended Regular Expression (ERE) syntax.

Building & linking

Check how ftlRegex is built in the makefile that comes with the FTL. In summary you have two options:

  1. Compile with "-DUSE_PCRE" and link with "-lpcreposix -lpcre". This will use the PCRE library as the regex engine.
  2. Compile without "-DUSE_PCRE" and link with nothing. This will use the standard POSIX regular expressions as the regex engine.

Note that compilation of the ftlRegex.F90 file requires the "configure_ftlRegex.inc" file, which contains the numeric values of some of the enums in the C headers. This file can be generated with the small C program in the configure directory. Again, just check what the makefile of the FTL does ...

Unfortunately the numeric values tend to differ between the PCRE POSIX header file and regex.h, so you need to make sure that this is consistent during compilation and linking, i.e. do not compile with "-DUSE_PCRE" but then link against the standard POSIX regex engine. The linking will succeed, but the resulting ftlRegex library will do strange things at runtime ...

Derived types in ftlRegexModule

In addition to the ftlRegex type itself, the ftlRegexModule defines some other types that are used as return types of the matching methods of the ftlRegex type.

type, public :: ftlRegexMatch
   logical                          :: matches = .false.
   type(ftlString)                  :: text
   integer                          :: begin = 0
   integer                          :: end   = 0
   type(ftlRegexGroup), allocatable :: group(:)
end type

Here the matches member is .true. if a match was found. In that case the text that matches the regular expression is stored as an ftlString in the text member. The position of the match in the original string is given by the range [begin, end). Note that this (like all ranges used in the FTL) is a half-open interval, meaning that begin is included and end is the first excluded character. So the text member compares equal to string(begin:end-1), if string is a raw Fortran string. The group member holds the contents of the regular expression's capture groups, if the expression uses any. The ftlRegexGroup type is defined as:

type, public :: ftlRegexGroup
   type(ftlString) :: text
   integer         :: begin = 0
   integer         :: end   = 0
end type

Here text is the text captured by the group, and begin and end delimit where the captured group is found in the original string, again as a half-open interval.
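As a small sketch of the half-open convention, the following program checks that slicing the raw string with begin and end reproduces the matched text. It only uses methods documented on this page. The module name ftlStringModule and the program name are assumptions made for illustration.

```fortran
program half_open_demo
   use ftlStringModule   ! assumed name of the module that provides ftlString
   use ftlRegexModule
   implicit none

   character(len=*), parameter :: string = 'u=12 F=32 a=b x=7'
   type(ftlRegex)      :: regex
   type(ftlRegexMatch) :: m

   call regex%New('[a-zA-Z]\s*=\s*[0-9]+')
   m = regex%MatchFirst(string)

   if (m%matches) then
      ! [begin, end) is half open: end is the first character NOT in the match,
      ! so the match length is end - begin and the slice stops at end-1.
      write (*,*) 'length of match: ', m%end - m%begin
      write (*,*) 'slice equals text: ', (m%text == string(m%begin:m%end-1))
   end if
end program half_open_demo
```

The same slicing should apply to the begin and end members of each ftlRegexGroup.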

ftlRegex methods

Construction, destruction, assignment & comparison

ftlRegex%New()

Constructs a new ftlRegex:

  • Pattern constructor. Constructs an ftlRegex from an ftlString (or alternatively a raw Fortran string) containing the regular expression pattern, and a number of optional logical arguments.

    subroutine New(self, pattern, basic, icase, nosub, newline)
       type(ftlRegex) , intent(inout)           :: self
       type(ftlString), intent(in)              :: pattern
       logical        , intent(in)   , optional :: basic, icase, nosub, newline

    The optional logicals have the following meaning:

    basic

    This flag is only relevant when linking against the regular expression engine in the C standard library, instead of the (recommended) PCRE library. If the flag is set in that case, basic POSIX regular expressions are used instead of the POSIX Extended Regular Expression (ERE) syntax that ftlRegex uses by default.

    icase

    Do not differentiate case. Subsequent searches using the ftlRegex will be case insensitive.

    nosub

    Do not report position of matches or capturing groups. The resulting ftlRegex can pretty much only be used to test if something matches, but not where exactly. However, testing for matches will be faster. (Hopefully, this depends on your libc implementation ...)

    newline

    Match-any-character operators don't match a newline. A nonmatching list ([^...]) not containing a newline does not match a newline.

    Example usage:

    type(ftlRegex)  :: regex
    type(ftlString) :: pattern
    
    call regex%New('\s*=\s*') ! construction from raw Fortran string ...
    
    pattern = 'TeSt'
    call regex%New(pattern, icase=.true.) ! ... or from an ftlString pattern
  • Copy constructor. Constructs one regular expression as a copy of another.

    subroutine New(self, other)
       type(ftlRegex), intent(inout) :: self
       type(ftlRegex), intent(in)    :: other
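To illustrate the effect of the optional flags of the pattern constructor, here is a minimal sketch that compiles the same pattern with and without icase and tests one string against both. The program and variable names are made up for this example.

```fortran
program flags_demo
   use ftlRegexModule
   implicit none

   type(ftlRegex) :: strict, relaxed

   call strict%New('TeSt')                 ! default: case sensitive
   call relaxed%New('TeSt', icase=.true.)  ! ignore case when matching

   ! 'TEST' differs from the pattern only in case, so only the
   ! regex constructed with icase=.true. is expected to match it.
   write (*,*) 'case sensitive:   ', ('TEST' .matches. strict)
   write (*,*) 'case insensitive: ', ('TEST' .matches. relaxed)
end program flags_demo
```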

Note that the constructors are also available as free functions named ftlRegex() that take the same parameters as the type-bound subroutines above and return an ftlRegex instance. This is sometimes useful if one wants to use a regular expression only once:

write (*,*) ('T12T' .matches. ftlRegex('T[0-9]+T')) ! prints True

ftlRegex%Delete()

Destructs the regular expression. All used memory is deallocated.

subroutine Delete(self)
   type(ftlRegex), intent(inout) :: self

It's not necessary to call Delete manually. It is used as the finalizer of the ftlRegex type and will be called automatically when an ftlRegex goes out of scope.

ftlRegex assignment(=)

Copy assignment. Replaces the contents with a copy of the contents of other.

subroutine assignment(=)(self, other)
   type(ftlRegex), intent(inout) :: self
   type(ftlRegex), intent(in)    :: other

This is exactly the same as using the copy constructor. (The assignment has only been implemented because intrinsic assignment would do the wrong thing and crash the program when the assigned regexes go out of scope.)

ftlRegex operator(==) ftlRegex

ftlRegex operator(/=) ftlRegex

Compares two regular expressions for (in)equality.

logical function operator(==)(lhs, rhs)
    type(ftlRegex), intent(in) :: lhs, rhs

logical function operator(/=)(lhs, rhs)
    type(ftlRegex), intent(in) :: lhs, rhs

Two regular expressions are considered equal if both the pattern and the (optional) flags passed to their constructors are equal.
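A minimal sketch of how assignment and comparison fit together, assuming (as the descriptions above suggest) that a copy keeps both the pattern and the flags of its source. The program and variable names are hypothetical.

```fortran
program compare_demo
   use ftlRegexModule
   implicit none

   type(ftlRegex) :: a, b, c

   call a%New('[0-9]+')
   call b%New('[0-9]+', icase=.true.)  ! same pattern as a, different flags
   c = a                               ! copy assignment

   ! A copy should compare equal to its source ...
   write (*,*) 'a == c: ', (a == c)
   ! ... while the same pattern with different flags should not.
   write (*,*) 'a /= b: ', (a /= b)
end program compare_demo
```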

Matching & match replacement

string operator(.matches.) ftlRegex

Checks whether a string (either ftlString or raw Fortran string) matches a regular expression.

logical function operator(.matches.)(lhs, rhs)
   type(ftlString), intent(in) :: lhs
   type(ftlRegex) , intent(in) :: rhs

Example usage:

type(ftlRegex)  :: newsec
type(ftlString) :: line
integer :: unit, iostat, numSections

! open some file as unit

call newsec%New('^\s*SECTION\s*$', icase=.true., nosub=.true.)

numSections = 0
do while (.true.)
   call line%ReadLine(unit, iostat)
   if (is_iostat_end(iostat)) exit
   if (line .matches. newsec) numSections = numSections + 1
enddo
write (*,*) 'Found ', numSections, ' sections in file'

ftlRegex%NumMatches()

Returns the number of non-overlapping matches of regex in string (which can either be an ftlString or a raw Fortran string).

integer function NumMatches(self, string)
   type(ftlRegex) , intent(in) :: self
   type(ftlString), intent(in) :: string

Example usage:

type(ftlRegex) :: regex
call regex%New('[a-zA-Z]\s*=\s*[0-9]+')
write (*,*) regex%NumMatches('u=12 F=32 a=b x=7') ! prints 3

ftlRegex%Match()

Returns an array of all non-overlapping matches of the regular expression in string (which can either be an ftlString or a raw Fortran string).

function Match(self, string) result(matches)
   type(ftlRegex)     , intent(in)  :: self
   type(ftlString)    , intent(in)  :: string
   type(ftlRegexMatch), allocatable :: matches(:)

If no matches are found, the returned array has a size of 0.

Example usage:

type(ftlString) :: line
type(ftlRegex) :: r
type(ftlRegexMatch), allocatable :: m(:)

line = 'keyword option1=value option2=othervalue'
call r%New('(\w+)\s*=\s*(\w+)')
m = r%Match(line)

! m(1)%text now holds 'option1=value'
! m(2)%text now holds 'option2=othervalue'
! m(:)%group is also populated with the contents of the capture groups.
! e.g. m(1)%group(2)%text holds 'value'
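Since Match() returns a zero-sized array when nothing matches, calling code usually checks size() before looping. The following sketch shows one way to do that, using the same pattern as above. The module name ftlStringModule and the program name are assumptions.

```fortran
program match_loop_demo
   use ftlStringModule   ! assumed name of the module that provides ftlString
   use ftlRegexModule
   implicit none

   type(ftlString) :: line
   type(ftlRegex)  :: r
   type(ftlRegexMatch), allocatable :: m(:)
   integer :: i

   line = 'keyword option1=value option2=othervalue'
   call r%New('(\w+)\s*=\s*(\w+)')
   m = r%Match(line)

   if (size(m) == 0) then
      write (*,*) 'no key=value pairs found'
   else
      do i = 1, size(m)
         ! [begin, end) is half open, so the last matched character is end-1
         write (*,*) 'match ', i, ' covers characters ', m(i)%begin, &
                     ' to ', m(i)%end - 1, ' and has ', size(m(i)%group), ' groups'
      end do
   end if
end program match_loop_demo
```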

ftlRegex%MatchFirst()

Returns an ftlRegexMatch for the first match of the regular expression in a string (which can either be an ftlString or a raw Fortran string).

type(ftlRegexMatch) function MatchFirst(self, string)
   type(ftlRegex) , intent(in) :: self
   type(ftlString), intent(in) :: string

If no match is found, the matches member of the returned ftlRegexMatch is set to .false. and the other members should not be used.

Example usage:

type(ftlRegex) :: regex
type(ftlRegexMatch) :: match
call regex%New('[a-zA-Z]\s*=\s*[0-9]+')
match = regex%MatchFirst('u=12 F=32 a=b x=7')
! match%text now holds 'u=12'
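Because MatchFirst() always returns an ftlRegexMatch, the matches member is what tells a hit from a miss. A short sketch of the no-match case, with a made-up input string and program name:

```fortran
program match_first_demo
   use ftlRegexModule
   implicit none

   type(ftlRegex)      :: regex
   type(ftlRegexMatch) :: match

   call regex%New('[0-9]+')
   match = regex%MatchFirst('no digits here')

   ! Always test the matches member before using text, begin, end or group.
   if (match%matches) then
      write (*,*) 'first match starts at character ', match%begin
   else
      write (*,*) 'no match found'
   end if
end program match_first_demo
```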

ftlRegex%Replace()

Returns an ftlString where all matches of the regular expression in string have been replaced with sub. Note that both string and sub can be either ftlString or raw Fortran strings.

type(ftlString) function Replace(self, string, sub, doGroupSub)
   class(ftlRegex), intent(in)           :: self
   type(ftlString), intent(in)           :: string
   type(ftlString), intent(in)           :: sub
   logical        , intent(in), optional :: doGroupSub

If the optional argument doGroupSub is present and .true., the contents of the regular expression's capture groups can be used in the substitution string: \n will be replaced by the contents of the n-th capture group.

Example usage:

type(ftlString) :: line
type(ftlRegex)  :: regex

line = 'Element: mass=12 Z=6 symbol=C name=Carbon'
call regex%New('(\w+)\s*=\s*(\w+)')

line = regex%Replace(line, '\2<-\1', doGroupSub=.true.)

! line now holds: 'Element: 12<-mass 6<-Z C<-symbol Carbon<-name'
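For comparison, here is a sketch of a plain replacement without doGroupSub, where the substitution string is inserted as-is for every match. The pattern, the input string, the program name and the module name ftlStringModule are assumptions chosen for illustration.

```fortran
program replace_demo
   use ftlStringModule   ! assumed name of the module that provides ftlString
   use ftlRegexModule
   implicit none

   type(ftlRegex)  :: spaces
   type(ftlString) :: cleaned

   call spaces%New(' +')                      ! one or more spaces
   cleaned = spaces%Replace('a   b  c', '_')  ! doGroupSub omitted: plain substitution

   ! Each run of spaces is one match, so every run should become a single '_'.
   write (*,*) 'runs collapsed: ', (cleaned == 'a_b_c')
end program replace_demo
```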