Merge pull request #12 from tom-lord/globally_configurable_results_limiters

Globally configurable results limiters
tom-lord authored Oct 15, 2017
2 parents 7ae5278 + 67b1747 commit 36db48a
Showing 8 changed files with 227 additions and 110 deletions.
27 changes: 25 additions & 2 deletions README.md
@@ -12,7 +12,6 @@ Extends the `Regexp` class with the methods: `Regexp#examples` and `Regexp#rando

\* If the regex has an infinite number of possible strings that match it, such as `/a*b+c{2,}/`,
or a huge number of possible matches, such as `/.\w/`, then only a subset of these will be listed.

For more detail on this, see [configuration options](#configuration-options).

If you'd like to understand how/why this gem works, please check out my [blog post](https://tom-lord.github.io/Reverse-Engineering-Regular-Expressions/) about it.
@@ -149,7 +148,9 @@ When generating examples, the gem uses 3 configurable values to limit how many e

`Regexp#examples` makes use of *all* these options; `Regexp#random_example` only uses `max_repeater_variance`, since the other options are redundant.

To use an alternative value, simply pass the configuration option as follows:
### Defining custom configuration values

To use an alternative value, you can either pass the configuration option as a parameter:

```ruby
/a*/.examples(max_repeater_variance: 5)
@@ -162,19 +163,41 @@ To use an alternative value, simply pass the configuration option as follows:
#=> "A very unlikely result!"
```

Or, set an alternative value *within a block*:

```ruby
RegexpExamples::Config.with_configuration(max_repeater_variance: 5) do
# ...
end
```

Or, globally set a different default value:

```ruby
# e.g. in a Rails project, you may wish to place this in
# config/initializers/regexp_examples.rb
RegexpExamples::Config.max_repeater_variance = 5
RegexpExamples::Config.max_group_results = 10
RegexpExamples::Config.max_results_limit = 20000
```
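
To illustrate how a block-scoped override such as `with_configuration` can put the previous values back afterwards, here is a minimal standalone sketch. `DemoConfig`, `settings` and `with_overrides` are hypothetical names used for illustration only and are not part of the gem; the sketch assumes settings are kept in `Thread.current` and restored in an `ensure` clause, so they are reset even if the block raises.

```ruby
# Hypothetical stand-in for a config class; not the gem's actual code.
module DemoConfig
  DEFAULTS = { max_repeater_variance: 2, max_group_results: 5 }.freeze

  # Per-thread settings hash, created lazily from the defaults.
  def self.settings
    Thread.current[:demo_config] ||= DEFAULTS.dup
  end

  # Apply the overrides for the duration of the block, then restore.
  def self.with_overrides(**overrides)
    original = settings.dup
    settings.merge!(overrides)
    yield
  ensure
    settings.replace(original)
  end
end

inside = DemoConfig.with_overrides(max_repeater_variance: 5) do
  DemoConfig.settings[:max_repeater_variance]
end
after = DemoConfig.settings[:max_repeater_variance]
```

Here `inside` holds the overridden value and `after` holds the value that was in place before the block ran.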

A sensible use case might be, for example, to generate all 1-5 digit strings:

```ruby
/\d{1,5}/.examples(max_repeater_variance: 4, max_group_results: 10, max_results_limit: 100000)
#=> ['0', '1', '2', ..., '99998', '99999']
```
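
As a sanity check on the limits used in this example, the number of distinct strings of 1 to 5 digits can be computed directly. This is plain arithmetic and does not call the gem; `counts` and `total` are illustrative variable names.

```ruby
# There are 10**n strings of exactly n digits; sum them for n = 1..5.
counts = (1..5).map { |n| 10**n }
total = counts.sum

puts counts.inspect # [10, 100, 1000, 10000, 100000]
puts total          # 111110
```

Since 111110 is larger than the `max_results_limit: 100000` passed above, a higher limit may be needed if every such string is required.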

### Configuration Notes

Due to code optimisation, `Regexp#random_example` runs pretty fast even on very complex patterns.
(I.e. it's typically a _lot_ faster than using `/pattern/.examples.sample(1)`.)
For instance, the following takes no more than about 1 second on my machine:

`/.*\w+\d{100}/.random_example(max_repeater_variance: 1000)`

All forms of configuration mentioned above **are thread safe**.
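
What "thread safe" means here can be sketched without the gem. Assuming settings live in `Thread.current` (as the private `Config.config` method in this commit does), each thread lazily gets its own copy of the defaults, so a change made in one thread is not visible in another. The `settings` lambda and the `:demo_settings` key below are hypothetical.

```ruby
defaults = { max_group_results: 5 }.freeze

# Returns the calling thread's own settings hash, created on first use.
settings = -> { Thread.current[:demo_settings] ||= defaults.dup }

settings.call[:max_group_results] = 10 # change made in the main thread

# A new thread starts from the defaults, not from the main thread's value.
seen_in_thread = Thread.new { settings.call[:max_group_results] }.value
seen_in_main = settings.call[:max_group_results]
```

Under this assumption, a value assigned in one thread (for example, during application start-up) would not carry over to threads created later.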

## Bugs and TODOs

There are no known major bugs with this library. However, there are a few obscure issues that you *may* encounter:
16 changes: 6 additions & 10 deletions lib/core_extensions/regexp/examples.rb
@@ -5,19 +5,15 @@ module Regexp
# No core classes are extended in any way, other than the above two methods.
module Examples
def examples(**config_options)
RegexpExamples::ResultCountLimiters.configure!(
max_repeater_variance: config_options[:max_repeater_variance],
max_group_results: config_options[:max_group_results],
max_results_limit: config_options[:max_results_limit]
)
examples_by_method(:result)
RegexpExamples::Config.with_configuration(config_options) do
examples_by_method(:result)
end
end

def random_example(**config_options)
RegexpExamples::ResultCountLimiters.configure!(
max_repeater_variance: config_options[:max_repeater_variance]
)
examples_by_method(:random_result).sample(1).first
RegexpExamples::Config.with_configuration(config_options) do
examples_by_method(:random_result).sample
end
end

private
66 changes: 41 additions & 25 deletions lib/regexp-examples/constants.rb
@@ -1,9 +1,47 @@
# :nodoc:
module RegexpExamples
# Configuration settings to limit the number/length of Regexp examples generated
class ResultCountLimiters
class Config
class << self
def with_configuration(**new_config)
original_config = config.dup

begin
self.config = new_config
result = yield
ensure
self.config = original_config
end

result
end

# Thread-safe getters and setters
%i[max_repeater_variance max_group_results max_results_limit].each do |m|
define_method(m) do
config[m]
end
define_method("#{m}=") do |value|
config[m] = value
end
end

private

def config=(**args)
Thread.current[:regexp_examples_config].merge!(args)
end

def config
Thread.current[:regexp_examples_config] ||= {
max_repeater_variance: MAX_REPEATER_VARIANCE_DEFAULT,
max_group_results: MAX_GROUP_RESULTS_DEFAULT,
max_results_limit: MAX_RESULTS_LIMIT_DEFAULT
}
end
end
# The maximum variance for any given repeater, to prevent a huge/infinite number of
# examples from being listed. For example, if @@max_repeater_variance = 2 then:
# examples from being listed. For example, if self.max_repeater_variance = 2 then:
# .* is equivalent to .{0,2}
# .+ is equivalent to .{1,3}
# .{2,} is equivalent to .{2,4}
@@ -12,7 +50,7 @@ class ResultCountLimiters
MAX_REPEATER_VARIANCE_DEFAULT = 2

# Maximum number of characters returned from a char set, to reduce output spam
# For example, if @@max_group_results = 5 then:
# For example, if self.max_group_results = 5 then:
# \d is equivalent to [01234]
# \w is equivalent to [abcde]
MAX_GROUP_RESULTS_DEFAULT = 5
@@ -22,28 +60,6 @@ class ResultCountLimiters
# /[ab]{30}/.examples
# (Which would attempt to generate 2**30 == 1073741824 examples!!!)
MAX_RESULTS_LIMIT_DEFAULT = 10_000
class << self
attr_reader :max_repeater_variance, :max_group_results, :max_results_limit
def configure!(max_repeater_variance: nil,
max_group_results: nil,
max_results_limit: nil)
@max_repeater_variance = (max_repeater_variance || MAX_REPEATER_VARIANCE_DEFAULT)
@max_group_results = (max_group_results || MAX_GROUP_RESULTS_DEFAULT)
@max_results_limit = (max_results_limit || MAX_RESULTS_LIMIT_DEFAULT)
end
end
end

def self.max_repeater_variance
ResultCountLimiters.max_repeater_variance
end

def self.max_group_results
ResultCountLimiters.max_group_results
end

def self.max_results_limit
ResultCountLimiters.max_results_limit
end

# Definitions of various special characters, used in regular expressions.
7 changes: 5 additions & 2 deletions lib/regexp-examples/helpers.rb
@@ -10,10 +10,13 @@ module RegexpExamples
# Edge case:
# permutations_of_strings [ [] ] #=> nil
# (For example, this occurs during /[^\d\D]/.examples #=> [])
def self.permutations_of_strings(arrays_of_strings, max_results_limiter = MaxResultsLimiterByProduct.new)
def self.permutations_of_strings(arrays_of_strings,
max_results_limiter = MaxResultsLimiterByProduct.new)
partial_result = max_results_limiter.limit_results(arrays_of_strings.shift)
return partial_result if arrays_of_strings.empty?
partial_result.product(permutations_of_strings(arrays_of_strings, max_results_limiter)).map do |result|
partial_result.product(
permutations_of_strings(arrays_of_strings, max_results_limiter)
).map do |result|
join_preserving_capture_groups(result)
end
end
6 changes: 4 additions & 2 deletions lib/regexp-examples/max_results_limiter.rb
@@ -1,5 +1,6 @@
module RegexpExamples
class MaxResultsLimiter # Base class
# Abstract (base) class to assist limiting Regexp#examples max results
class MaxResultsLimiter
def initialize(initial_results_count)
@results_count = initial_results_count
end
@@ -25,7 +26,8 @@ def cumulate_total(new_results_count, cumulator_method)

def results_allowed_from(partial_results, limiter_method)
partial_results.first(
RegexpExamples.max_results_limit.public_send(limiter_method, @results_count)
RegexpExamples::Config.max_results_limit
.public_send(limiter_method, @results_count)
)
end
end
27 changes: 11 additions & 16 deletions lib/regexp-examples/repeaters.rb
@@ -10,7 +10,7 @@ def initialize(group)
end

def result
group_results = group.result.first(RegexpExamples.max_group_results)
group_results = group.result.first(RegexpExamples::Config.max_group_results)
results = []
max_results_limiter = MaxResultsLimiterBySum.new
min_repeats.upto(max_repeats) do |repeats|
@@ -51,7 +51,7 @@ class StarRepeater < BaseRepeater
def initialize(group)
super
@min_repeats = 0
@max_repeats = RegexpExamples.max_repeater_variance
@max_repeats = RegexpExamples::Config.max_repeater_variance
end
end

@@ -61,7 +61,7 @@ class PlusRepeater < BaseRepeater
def initialize(group)
super
@min_repeats = 1
@max_repeats = RegexpExamples.max_repeater_variance + 1
@max_repeats = RegexpExamples::Config.max_repeater_variance + 1
end
end

@@ -80,19 +80,14 @@ class RangeRepeater < BaseRepeater
def initialize(group, min, has_comma, max)
super(group)
@min_repeats = min || 0
if max # e.g. {1,100} --> Treat as {1,3} (by default max_repeater_variance)
@max_repeats = smallest(max, @min_repeats + RegexpExamples.max_repeater_variance)
elsif has_comma # e.g. {2,} --> Treat as {2,4} (by default max_repeater_variance)
@max_repeats = @min_repeats + RegexpExamples.max_repeater_variance
else # e.g. {3} --> Treat as {3,3}
@max_repeats = @min_repeats
end
end

private

def smallest(x, y)
x < y ? x : y
@max_repeats = if !has_comma
@min_repeats
else
[
max,
@min_repeats + RegexpExamples::Config.max_repeater_variance
].compact.min
end
end
end
end
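
The rewritten `RangeRepeater#initialize` can be read as a small pure function. The sketch below restates it outside the class so that the three cases are easy to check; `max_repeats_for` and its `variance` parameter are hypothetical names standing in for the instance variables and for `RegexpExamples::Config.max_repeater_variance`.

```ruby
# min:       lower bound parsed from the repeater (nil is treated as 0)
# has_comma: truthy for {n,} and {n,m}; falsy for {n}
# max:       upper bound, or nil when absent
# variance:  stand-in for the configured max_repeater_variance
def max_repeats_for(min, has_comma, max, variance)
  min_repeats = min || 0
  return min_repeats unless has_comma

  # compact drops a nil max, so {n,} falls back to min + variance.
  [max, min_repeats + variance].compact.min
end

exact     = max_repeats_for(3, false, nil, 2) # {3}
open      = max_repeats_for(2, true, nil, 2)  # {2,}
wide      = max_repeats_for(1, true, 100, 2)  # {1,100}
narrow    = max_repeats_for(1, true, 2, 2)    # {1,2}
```

With a variance of 2, these evaluate to 3, 4, 3 and 2 respectively, matching the comments in the removed `if`/`elsif` version.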
135 changes: 135 additions & 0 deletions spec/config_spec.rb
@@ -0,0 +1,135 @@
RSpec.describe RegexpExamples::Config do

describe 'max_results_limit' do
context 'as a passed parameter' do
it 'with low limit' do
expect(/[A-Z]/.examples(max_results_limit: 5))
.to match_array %w(A B C D E)
end
it 'with (default) high limit' do
expect(/[ab]{14}/.examples.length)
.to be <= 10000 # NOT 2**14 == 16384, because it's been limited
end
it 'with (custom) high limit' do
expect(/[ab]{14}/.examples(max_results_limit: 20000).length)
.to eq 16384 # NOT 10000, because it's below the limit
end
it 'for boolean or groups' do
expect(/[ab]{3}|[cd]{3}/.examples(max_results_limit: 10).length)
.to eq 10
end
it 'for case insensitive examples' do
expect(/[ab]{3}/i.examples(max_results_limit: 10).length)
.to be <= 10
end
it 'for range repeaters' do
expect(/[ab]{2,3}/.examples(max_results_limit: 10).length)
.to be <= 10 # NOT 4 + 8 = 12
end
it 'for backreferences' do
expect(/([ab]{3})\1?/.examples(max_results_limit: 10).length)
.to be <= 10 # NOT 8 * 2 = 16
end
it 'for a complex pattern' do
expect(/(a|[bc]{2})\1{1,3}/.examples(max_results_limit: 14).length)
.to be <= 14 # NOT (1 + 4) * 3 = 15
end
end

context 'as a global setting' do
before do
@original = RegexpExamples::Config.max_results_limit
RegexpExamples::Config.max_results_limit = 5
end
after do
RegexpExamples::Config.max_results_limit = @original
end

it 'sets limit without passing explicitly' do
expect(/[A-Z]/.examples)
.to match_array %w(A B C D E)
end
end
end # describe 'max_results_limit'

describe 'max_repeater_variance' do
context 'as a passed parameter' do
it 'with a larger value' do
expect(/a+/.examples(max_repeater_variance: 5))
.to match_array %w(a aa aaa aaaa aaaaa aaaaaa)
end
it 'with a lower value' do
expect(/a{4,8}/.examples(max_repeater_variance: 0))
.to eq %w(aaaa)
end
end

context 'as a global setting' do
before do
@original = RegexpExamples::Config.max_repeater_variance
RegexpExamples::Config.max_repeater_variance = 5
end
after do
RegexpExamples::Config.max_repeater_variance = @original
end

it 'sets limit without passing explicitly' do
expect(/a+/.examples)
.to match_array %w(a aa aaa aaaa aaaaa aaaaaa)
end
end
end # describe 'max_repeater_variance'

describe 'max_group_results' do
context 'as a passed parameter' do
it 'with a larger value' do
expect(/\d/.examples(max_group_results: 10))
.to match_array %w(0 1 2 3 4 5 6 7 8 9)
end
it 'with a lower value' do
expect(/\d/.examples(max_group_results: 3))
.to match_array %w(0 1 2)
end
end

context 'as a global setting' do
before do
@original = RegexpExamples::Config.max_group_results
RegexpExamples::Config.max_group_results = 10
end
after do
RegexpExamples::Config.max_group_results = @original
end

it 'sets limit without passing explicitly' do
expect(/\d/.examples)
.to match_array %w(0 1 2 3 4 5 6 7 8 9)
end
end
end # describe 'max_group_results'

describe 'thread safety' do
it 'uses thread-local global config values' do
thread = Thread.new do
RegexpExamples::Config.max_group_results = 1
expect(/\d/.examples).to eq %w(0)
end
sleep 0.1 # Give the above thread time to run
expect(/\d/.examples).to eq %w(0 1 2 3 4)
thread.join
end

it 'uses thread-local block config values' do
thread = Thread.new do
RegexpExamples::Config.with_configuration(max_group_results: 1) do
expect(/\d/.examples).to eq %w(0)
sleep 0.2 # Give the below thread time to run while this block is open
end
end
sleep 0.1 # Give the above thread time to run
expect(/\d/.examples).to eq %w(0 1 2 3 4)
thread.join
end
end # describe 'thread safety'

end