Merge pull request #5 from tom-lord/random_example
Regexp#random_example added
tom-lord committed Mar 8, 2015
2 parents 5e2850e + 9196766 commit e20ad25
Showing 9 changed files with 155 additions and 76 deletions.
33 changes: 24 additions & 9 deletions README.md
@@ -3,9 +3,11 @@
[![Build Status](https://travis-ci.org/tom-lord/regexp-examples.svg?branch=master)](https://travis-ci.org/tom-lord/regexp-examples/builds)
[![Coverage Status](https://coveralls.io/repos/tom-lord/regexp-examples/badge.svg?branch=master)](https://coveralls.io/r/tom-lord/regexp-examples?branch=master)

Extends the Regexp class with the method: Regexp#examples
Extends the Regexp class with the methods: `Regexp#examples` and `Regexp#random_example`

This method generates a list of (some\*) strings that will match the given regular expression.
`Regexp#examples` generates a list of all\* strings that will match the given regular expression.

`Regexp#random_example` returns one random string (from all possible strings!) that matches the regex.

\* If the regex has an infinite number of possible strings that match it, such as `/a*b+c{2,}/`,
or a huge number of possible matches, such as `/.\w/`, then only a subset of these will be listed.
@@ -31,6 +33,14 @@ For more detail on this, see [configuration options](#configuration-options).
|
\u{28}\u2310\u25a0\u{5f}\u25a0\u{29}
/x.examples #=> ["(•_•)", "( •_•)>⌐■-■ ", "(⌐■_■)"]

###################################################################################

# Obviously, you will get different results if you try these yourself!
/\w{10}@(hotmail|gmail)\.com/.random_example #=> "TTsJsiwzKS@gmail.com"
/\p{Greek}{80}/.random_example
#=> "ΖΆΧͷᵦμͷηϒϰΟᵝΔ΄θϔζΌψΨεκᴪΓΕπι϶ονϵΓϹᵦΟπᵡήϴϜΦϚϴϑ͵ϴΉϺ͵ϹϰϡᵠϝΤΏΨϹϊϻαώΞΰϰΑͼΈΘͽϙͽξΆΆΡΡΉΓς"
/written by tom lord/i.random_example #=> "WrITtEN bY tOM LORD"
```

## Installation
@@ -51,7 +61,7 @@ Or install it yourself as:

## Supported syntax

Short answer: **Everything** is supported, apart from "irregular" aspects of the regexp language -- see [impossible features](#impossible-features-illegal-syntax)
Short answer: **Everything** is supported, apart from "irregular" aspects of the regexp language -- see [impossible features](#impossible-features-illegal-syntax).

Long answer:

Expand Down Expand Up @@ -89,7 +99,7 @@ Long answer:
## Bugs and Not-Yet-Supported syntax

* There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy! However, patterns like this are highly unusual...)
* Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This will be fixed in version 1.1.0 (see the pending pull request)!
* Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This will be fixed in version 1.1.1 (see the pending pull request)!

Since the Regexp language is so vast, it's quite likely I've missed something (please raise an issue if you find something)! The only missing feature that I'm currently aware of is:
* Conditional capture groups, e.g. `/(group1)? (?(1)yes|no)/.examples` (which *should* return: `["group1 yes", " no"]`)
Expand Down Expand Up @@ -127,33 +137,38 @@ When generating examples, the gem uses 2 configurable values to limit how many e
* `[h-s]` is equivalent to `[hijkl]`
* `(1|2|3|4|5|6|7|8)` is equivalent to `[12345]`

`Regexp#examples` makes use of *both* these options; `Regexp#random_example` only uses `max_repeater_variance`, since the other option is redundant!

To use an alternative value, simply pass the configuration option as follows:

```ruby
/a*/.examples(max_repeater_variance: 5)
#=> ['', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa']
/[F-X]/.examples(max_group_results: 10)
#=> ['F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
/.*/.random_example(max_repeater_variance: 50)
#=> "A very unlikely result!"
```

_**WARNING**: Choosing huge numbers, along with a "complex" regex, could easily cause your system to freeze!_
_**WARNING**: Choosing huge numbers for `Regexp#examples`, along with a "complex" regex, could easily cause your system to freeze!_

For example, if you try to generate a list of _all_ 5-letter words: `/\w{5}/.examples(max_group_results: 999)`, then since there are actually `63` "word" characters (upper/lower case letters, numbers and "\_"), this will try to generate `63**5 #=> 992436543` (almost 1 _billion_) examples!

In other words, think twice before playing around with this config!
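
The count quoted above can be checked in plain Ruby, without the gem. This is only a sketch of the arithmetic; `word_chars` is a name chosen here for illustration and assumes the ASCII definition of `\w`:

```ruby
# The 63 "word" characters: 26 lower-case letters, 26 upper-case letters,
# 10 digits and the underscore.
word_chars = [*'a'..'z', *'A'..'Z', *'0'..'9', '_']

word_chars.size      #=> 63
word_chars.size**5   #=> 992436543 (number of distinct 5-character strings)
```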

A more sensible use case might be, for example, to generate one random 1-4 digit string:
A more sensible use case might be, for example, to generate all 1-4 digit strings:

`/\d{1,4}/.examples(max_repeater_variance: 3, max_group_results: 10)`

`/\d{1,4}/.examples(max_repeater_variance: 3, max_group_results: 10).sample(1)`
Due to code optimisation, this is not something you need to worry about (much) for `Regexp#random_example`. For instance, the following takes no more than ~ 1 second on my machine:

(Note: I may develop a much more efficient way to "generate one example" in a later release of this gem.)
`/.*\w+\d{100}/.random_example(max_repeater_variance: 1000)`

## TODO

* Performance improvements:
* Use of lambdas/something (in [constants.rb](lib/regexp-examples/constants.rb)) to improve the library load time. See the pending pull request.
* (Maybe?) add a `max_examples` configuration option and use lazy evaluation, to ensure the method never "freezes".
* Potential future feature: `Regexp#random_example` - but implementing this properly is non-trivial, due to performance issues that need addressing first!
* Write a blog post about how this amazing gem works! :)

## Contributing
5 changes: 3 additions & 2 deletions lib/regexp-examples/constants.rb
@@ -17,7 +17,7 @@ class ResultCountLimiters

class << self
attr_reader :max_repeater_variance, :max_group_results
def configure!(max_repeater_variance, max_group_results)
def configure!(max_repeater_variance, max_group_results = nil)
@max_repeater_variance = (max_repeater_variance || MaxRepeaterVarianceDefault)
@max_group_results = (max_group_results || MaxGroupResultsDefault)
end
@@ -44,7 +44,8 @@ module CharSets
Whitespace = [' ', "\t", "\n", "\r", "\v", "\f"]
Control = (0..31).map(&:chr) | ["\x7f"]
# Ensure that the "common" characters appear first in the array
Any = Lower | Upper | Digit | Punct | (0..127).map(&:chr)
# Also, ensure "\n" comes first, to make it obvious when included
Any = ["\n"] | Lower | Upper | Digit | Punct | (0..127).map(&:chr)
AnyNoNewLine = Any - ["\n"]
end.freeze
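
Note on the `configure!` change above: making the second parameter optional lets `random_example` pass only `max_repeater_variance`, with the `||` fallback supplying the other value. A minimal sketch of that pattern follows. `LimitsSketch` and its default values are hypothetical stand-ins, not the gem's actual class or constants.

```ruby
# Hypothetical stand-in for ResultCountLimiters; the default values below
# are placeholders chosen for this sketch, not taken from the gem.
class LimitsSketch
  MAX_REPEATER_VARIANCE_DEFAULT = 2
  MAX_GROUP_RESULTS_DEFAULT = 5

  class << self
    attr_reader :max_repeater_variance, :max_group_results

    # A nil (or omitted) argument falls back to the default via `||`.
    def configure!(max_repeater_variance, max_group_results = nil)
      @max_repeater_variance = max_repeater_variance || MAX_REPEATER_VARIANCE_DEFAULT
      @max_group_results = max_group_results || MAX_GROUP_RESULTS_DEFAULT
    end
  end
end

LimitsSketch.configure!(50)          # one-argument form
LimitsSketch.max_repeater_variance   #=> 50
LimitsSketch.max_group_results       #=> 5 (placeholder default)

LimitsSketch.configure!(nil, 10)     # nil also triggers the fallback
LimitsSketch.max_repeater_variance   #=> 2 (placeholder default)
LimitsSketch.max_group_results       #=> 10
```

Because the values live in class-level instance variables, each call overwrites whatever the previous call set.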

25 changes: 21 additions & 4 deletions lib/regexp-examples/core_extensions/regexp/examples.rb
Original file line number Diff line number Diff line change
@@ -1,12 +1,29 @@
module CoreExtensions
module Regexp
module Examples
def examples(config_options={})
full_examples = RegexpExamples.map_results(
RegexpExamples::Parser.new(source, options, config_options).parse
def examples(**config_options)
RegexpExamples::ResultCountLimiters.configure!(
config_options[:max_repeater_variance],
config_options[:max_group_results]
)
RegexpExamples::BackReferenceReplacer.new.substitute_backreferences(full_examples)
examples_by_method(:map_results)
end

def random_example(**config_options)
RegexpExamples::ResultCountLimiters.configure!(
config_options[:max_repeater_variance]
)
examples_by_method(:map_random_result).first
end

private
def examples_by_method(method)
full_examples = RegexpExamples.public_send(
method,
RegexpExamples::Parser.new(source, options).parse
)
RegexpExamples::BackReferenceReplacer.new.substitute_backreferences(full_examples)
end
end
end
end
51 changes: 40 additions & 11 deletions lib/regexp-examples/groups.rb
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,14 @@ def result
end
end

module RandomResultBySample
def random_result
result.sample(1)
end
end

class SingleCharGroup
include RandomResultBySample
prepend GroupWithIgnoreCase
def initialize(char, ignorecase)
@char = char
@@ -48,17 +55,19 @@ def result
end
end

# Used as a workaround for when a grep is expected to be returned,
# Used as a workaround for when a group is expected to be returned,
# but there are no results for the group.
# i.e. PlaceHolderGroup.new.result == '' == SingleCharGroup.new('').result
# (But using PlaceHolderGroup makes it clearer what the intention is!)
class PlaceHolderGroup
include RandomResultBySample
def result
[GroupResult.new('')]
end
end

class CharGroup
include RandomResultBySample
prepend GroupWithIgnoreCase
def initialize(chars, ignorecase)
@chars = chars
@@ -74,6 +83,7 @@ def result
end

class DotGroup
include RandomResultBySample
attr_reader :multiline
def initialize(multiline)
@multiline = multiline
@@ -94,37 +104,56 @@ def initialize(groups, group_id)
@group_id = group_id
end

# Generates the result of each contained group
# and adds the filled group of each result to
# itself
def result
strings = @groups.map {|repeater| repeater.result}
result_by_method(:result)
end

def random_result
result_by_method(:random_result)
end

private
# Generates the result of each contained group
# and adds the filled group of each result to itself
def result_by_method(method)
strings = @groups.map {|repeater| repeater.public_send(method)}
RegexpExamples.permutations_of_strings(strings).map do |result|
GroupResult.new(result, group_id)
end
end
end

class MultiGroupEnd
end

class OrGroup
def initialize(left_repeaters, right_repeaters)
@left_repeaters = left_repeaters
@right_repeaters = right_repeaters
end


def result
left_result = RegexpExamples.map_results(@left_repeaters)
right_result = RegexpExamples.map_results(@right_repeaters)
result_by_method(:map_results)
end

def random_result
# TODO: This logic is flawed in terms of choosing a truly "random" example!
# E.g. /a|b|c|d/.random_example will choose a letter with the following probabilities:
# a = 50%, b = 25%, c = 12.5%, d = 12.5%
# In order to fix this, I must either apply some weighted selection logic,
# or change how the OrGroup examples are generated - i.e. make this class work with >2 repeaters
result_by_method(:map_random_result).sample(1)
end

private
def result_by_method(method)
left_result = RegexpExamples.public_send(method, @left_repeaters)
right_result = RegexpExamples.public_send(method, @right_repeaters)
left_result.concat(right_result).flatten.uniq.map do |result|
GroupResult.new(result)
end
end
end

class BackReferenceGroup
include RandomResultBySample
attr_reader :id
def initialize(id)
@id = id
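
Note on the TODO in `OrGroup#random_result` above: the quoted probabilities (50%, 25%, 12.5%, 12.5%) are what you get if `/a|b|c|d/` is treated as nested two-way choices, `a | (b | (c | d))`, with each level picking a side at random. The sketch below computes those weights exactly. It is a hypothetical model of that nesting, not the gem's code, and `branch_probabilities` is a name invented for this illustration.

```ruby
# Hypothetical model: a two-element array is a binary "or" node whose
# left and right branches are each chosen with probability 1/2.
def branch_probabilities(node, weight = Rational(1), acc = Hash.new(0))
  if node.is_a?(Array)
    left, right = node
    branch_probabilities(left, weight / 2, acc)
    branch_probabilities(right, weight / 2, acc)
  else
    acc[node] += weight # leaf: accumulate the probability of reaching it
  end
  acc
end

nested = ['a', ['b', ['c', 'd']]]
probabilities = branch_probabilities(nested)
# a => 1/2, b => 1/4, c => 1/8, d => 1/8

# One possible fix mentioned in the TODO, sketched here: flatten the
# alternatives first, so a single sample picks each with probability 1/4.
uniform_choice = nested.flatten.sample
```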
17 changes: 12 additions & 5 deletions lib/regexp-examples/helpers.rb
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
module RegexpExamples
# Given an array of arrays of strings,
# returns all possible permutations,
# for strings created by joining one
# element from each array
# Given an array of arrays of strings, returns all possible permutations
# for strings, created by joining one element from each array
#
# For example:
# permutations_of_strings [ ['a'], ['b'], ['c', 'd', 'e'] ] #=> ['abc', 'abd', 'abe']
@@ -29,8 +27,17 @@ def self.join_preserving_capture_groups(result)
end

def self.map_results(repeaters)
generic_map_result(repeaters, :result)
end

def self.map_random_result(repeaters)
generic_map_result(repeaters, :random_result)
end

private
def self.generic_map_result(repeaters, method)
repeaters
.map {|repeater| repeater.result}
.map {|repeater| repeater.public_send(method)}
.instance_eval do |partial_results|
RegexpExamples.permutations_of_strings(partial_results)
end
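
Note on `permutations_of_strings`: the behaviour described in its comment (join one element from each array, in every combination) is a Cartesian product followed by a join. The sketch below reproduces the documented example using `Array#product`. `join_one_from_each` is a hypothetical name, and this simplified version assumes a non-empty input and ignores the capture-group bookkeeping that the gem's helper handles.

```ruby
# Illustrative stand-in only (not the gem's implementation).
# Assumes `arrays` is a non-empty array of arrays of strings.
def join_one_from_each(arrays)
  first, *rest = arrays
  # Array#product yields every combination that takes one element from
  # each array; joining each combination gives the final strings.
  first.product(*rest).map(&:join)
end

join_one_from_each([['a'], ['b'], ['c', 'd', 'e']])
#=> ["abc", "abd", "abe"]
```

With `random_example`, each inner array holds a single sampled element, so the product collapses to one string.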
19 changes: 6 additions & 13 deletions lib/regexp-examples/parser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -2,24 +2,19 @@ module RegexpExamples
IllegalSyntaxError = Class.new(StandardError)
class Parser
attr_reader :regexp_string
def initialize(regexp_string, regexp_options, config_options={})
def initialize(regexp_string, regexp_options)
@regexp_string = regexp_string
@ignorecase = !(regexp_options & Regexp::IGNORECASE).zero?
@multiline = !(regexp_options & Regexp::MULTILINE).zero?
@extended = !(regexp_options & Regexp::EXTENDED).zero?
@num_groups = 0
@current_position = 0
ResultCountLimiters.configure!(
config_options[:max_repeater_variance],
config_options[:max_group_results]
)
end

def parse
repeaters = []
while @current_position < regexp_string.length
until end_of_regexp
group = parse_group(repeaters)
break if group.is_a? MultiGroupEnd
if group.is_a? OrGroup
return [OneTimeRepeater.new(group)]
end
@@ -35,8 +30,6 @@ def parse_group(repeaters)
case next_char
when '('
group = parse_multi_group
when ')'
group = parse_multi_end_group
when '['
group = parse_char_group
when '.'
@@ -241,10 +234,6 @@ def regexp_options_toggle(on, off)
@extended = false if (off.include? "x")
end

def parse_multi_end_group
MultiGroupEnd.new
end

def parse_char_group
@current_position += 1 # Skip past opening "["
chargroup_parser = ChargroupParser.new(rest_of_string)
Expand Down Expand Up @@ -345,6 +334,10 @@ def rest_of_string
def next_char
regexp_string[@current_position]
end

def end_of_regexp
next_char == ")" || @current_position >= regexp_string.length
end
end
end
