# Copyright (C) 2018-2019 Stefano Zacchiroli <zack@upsilon.cc>
# License: GNU General Public License (GPL), version 2 or above
"""Beancount plugin that enforces data-validation rules on ledgers.

Rules are specified via an external YAML file and interpreted according to
`Cerberus <http://docs.python-cerberus.org/>`_ semantics. The rules file must
be passed to the plugin as a configuration string, e.g.::

    plugin "mybeancount.public.validate" "validate.yaml"

Rules
=====

A rule is conceptually a pair <match, constraint>, where the match defines
which Beancount elements the rule applies to, and the constraint defines what
is enforced on matching elements. Additionally, a rule description is required
so that rules can be referred to easily. The rules file is hence a list of
rules, expressed in YAML syntax, like this::

    - description: rule 1's description
      match:
        # rule 1's match goes here
      constraint:
        # rule 1's constraint goes here
    - description: rule 2's description
      match:
        # rule 2's match
      constraint:
        # rule 2's constraint
    - description: rule 3's description
      match:
        # rule 3's match
      constraint:
        # rule 3's constraint
    - # etc.

Constraints
===========

Constraints are applied to matching Beancount elements. An error is returned
if the selected Beancount element does not satisfy the associated constraint.

The constraint language is that of `Cerberus
<http://docs.python-cerberus.org/>`_, a popular data-validation framework for
Python. Specifically, a constraint for this validate plugin is a Cerberus
`schema <http://docs.python-cerberus.org/en/stable/schemas.html>`_. Since
Cerberus schemas are Python dictionaries, you simply express the desired schema
in YAML syntax. Schemas are enforced on Beancount data structures, transformed
to nested Python dictionaries (see "Validation data model" below).

For instance, you can require that a Beancount element has an "author"
metadata field like this::

    constraint:
      meta:
        schema:
          author:
            required: true

See Cerberus `validation rules
<http://docs.python-cerberus.org/en/stable/validation-rules.html>`_ for details
about the full constraint language.
Matches
=======
Target
------
By default rules are applied to Beancount transactions. You can override the
default using the "target" property of match sections; its value can be the
name of a top-level Beancount entry ("transaction", "open", "document", etc.),
"posting" (meaning individual transaction postings), or the special value "all"
(meaning all top-level entries). You can also use a list of those values to
match multiple Beancount elements at once. Examples::

    - match:
        target: transaction  # the default: apply rule to transactions
      ...
    - match:
        target: open  # apply rule to open entries instead
      ...
    - match:
        target:  # apply rule to transaction, open, and document entries
          - document
          - open
          - transaction
      ...
    - match:
        target: all  # apply rule to all top-level entries
      ...
    - match:
        target: posting  # apply rule to individual transaction postings
      ...

Account
-------

The "account" property of matches restricts rule application to Beancount
elements affecting a given account. A posting satisfies an account condition
if its account name is identical to the given one or, when the "account" value
is surrounded by "/", if its account name matches the given regular
expression.

As a shorthand, you can use "account" properties to filter transactions. A
transaction satisfies an "account" property if at least one of its postings
matches its value.

Examples::

    - description: checking transactions must have a bank-label
      match:
        target: transaction
        account: /^Assets:.*:Checking/
      constraint:
        meta:
          schema:
            bank-label:
              required: true
    - description: cheque postings must have a cheque number
      match:
        target: posting
        account: /:Cheque$/
      constraint:
        meta:
          schema:
            cheque:
              required: true

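
The two matching modes described above can be sketched in plain Python. This
is only an illustration that roughly follows what load_rule does with the
"account" value; the helper names compile_account, posting_matches, and
txn_matches are hypothetical and not part of this plugin, and the sketch
escapes plain account names with re.escape, which the plugin itself does not
do.

```python
import re


def compile_account(value):
    # "/.../" means: treat the inner text as a regular expression
    if value.startswith('/') and value.endswith('/'):
        return re.compile(value.strip('/'))
    # otherwise require the whole account name to be identical
    # (assumption: escaping is added here for safety)
    return re.compile('^{}$'.format(re.escape(value)))


def posting_matches(account_name, value):
    return compile_account(value).search(account_name) is not None


def txn_matches(posting_accounts, value):
    # a transaction matches if at least one of its postings does
    return any(posting_matches(a, value) for a in posting_accounts)


print(posting_matches('Assets:Bank:Checking', '/^Assets:.*:Checking/'))
print(posting_matches('Assets:Bank:Checking', 'Assets:Bank'))
print(txn_matches(['Expenses:Grocery', 'Assets:Bank:Cheque'], '/:Cheque$/'))
```

Note that a plain value only matches the exact account name, so "Assets:Bank"
does not select "Assets:Bank:Checking"; use the "/.../" form for prefixes.
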
Schema
------
You can use the "schema" property of matches to define conditional validation
rules of the form "if this Beancount element matches this schema (specified in
the match section), then it must *also* match this other schema (specified in
the constraint section)". For instance::

    - description: transactions made using my card must have me as author
      match:
        target: transaction
        schema:
          meta:
            schema:
              card:
                required: true
                allowed:
                  - "12345678"
      constraint:
        meta:
          schema:
            author:
              required: true
              allowed:
                - Zack

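
The conditional reading of such a rule ("if the element matches, then it must
also satisfy the constraint") amounts to a logical implication. The sketch
below illustrates the control flow only: the plain Python predicates
paid_with_my_card and authored_by_zack are hypothetical stand-ins for the two
Cerberus validators, and violates is not a function of this plugin.

```python
def violates(element, matches, satisfies):
    # a rule is violated only when the match holds and the constraint fails
    return matches(element) and not satisfies(element)


def paid_with_my_card(txn):
    # stand-in for the match schema on the "card" metadata field
    return txn['meta'].get('card') == '12345678'


def authored_by_zack(txn):
    # stand-in for the constraint schema on the "author" metadata field
    return txn['meta'].get('author') == 'Zack'


ok = {'meta': {'card': '12345678', 'author': 'Zack'}}
bad = {'meta': {'card': '12345678', 'author': 'Alice'}}
other = {'meta': {'card': '99999999'}}

for txn in (ok, bad, other):
    print(violates(txn, paid_with_my_card, authored_by_zack))
```

Elements that do not match (the third one here) are never reported, whatever
their other fields contain.
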
Validation data model
=====================

Constraints are enforced on Beancount elements which conform to the
definitions found in the :mod:`beancount.core.data` module, after some
"massaging" to ease data validation. In particular, the following
transformations are applied before validation:

* conversion from nested namedtuples to nested dictionaries: the tuple
  structure of beancount.core.data is transformed to nested dictionaries, using
  the _asdict() method of namedtuples recursively. This makes it possible to
  validate Beancount abstract syntax trees (ASTs) using Cerberus schemas
  (Cerberus does not support validating attributes). For instance, you can
  pretend 'meta' is a key of transaction directives, even if in
  beancount.core.data it is a namedtuple attribute.

* propagation of metadata from transactions down to postings. For instance,
  given the following input transaction::

      1970-01-01 * "grocery"
        author: "zack"
        Expenses:Grocery      10.00 EUR
          foo: "bar"
        Assets:Checking

  what validation rules will actually consider is::

      1970-01-01 * "grocery"
        author: "zack"
        Expenses:Grocery      10.00 EUR
          foo: "bar"
          author: "zack"
        Assets:Checking
          author: "zack"

Note that these transformations are in effect only during validation and are
discarded afterwards. The set of directives returned by this plugin is
unchanged w.r.t. its input.

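
The two transformations can be sketched with the standard library alone. This
is a minimal illustration, not the plugin's code: the Posting and Transaction
namedtuples below are hypothetical, simplified stand-ins for the
beancount.core.data types (which have more fields), and to_dict and
with_txn_meta are illustrative helper names.

```python
import collections
import copy

# Hypothetical, simplified stand-ins for the beancount.core.data namedtuples
Posting = collections.namedtuple('Posting', 'account meta')
Transaction = collections.namedtuple('Transaction', 'meta narration postings')


def to_dict(txn):
    # _asdict() is not recursive, so postings are converted one by one
    d = dict(txn._asdict())
    d['postings'] = [dict(p._asdict()) for p in d['postings']]
    return d


def with_txn_meta(txn_dict, posting_dict):
    # work on a copy so that the original posting is left untouched
    posting = copy.deepcopy(posting_dict)
    # same update order as propagate_meta: transaction values win on clashes
    posting['meta'].update(txn_dict['meta'])
    return posting


txn = Transaction(
    meta={'author': 'zack'},
    narration='grocery',
    postings=[Posting('Expenses:Grocery', {'foo': 'bar'}),
              Posting('Assets:Checking', {})])

txn_dict = to_dict(txn)
seen = [with_txn_meta(txn_dict, p) for p in txn_dict['postings']]
print(seen[0]['meta'])
print(seen[1]['meta'])
```

Because each posting is copied before its metadata is updated, the dictionaries
in txn_dict keep their original metadata, which matches the note above about
transformations being discarded after validation.
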
"""
import collections
import copy
import re
import yaml
from beancount.core import data
from cerberus import Validator
__plugins__ = ('validate',)
RuleError = collections.namedtuple(
'RuleError',
'source message entry')
ValidationError = collections.namedtuple(
'ValidationError',
'source message entry')
ALL_TARGETS = data.ALL_DIRECTIVES
DEFAULT_TARGETS = [data.Transaction]


def parse_target(target_str):
    """parse a Beancount directive type from its name in string form"""
    # named targets_by_name to avoid shadowing the map() builtin
    targets_by_name = {
        'all': ALL_TARGETS,
        'close': [data.Close],
        'commodity': [data.Commodity],
        'custom': [data.Custom],
        'document': [data.Document],
        'event': [data.Event],
        'note': [data.Note],
        'open': [data.Open],
        'pad': [data.Pad],
        'posting': [data.Posting],
        'price': [data.Price],
        'query': [data.Query],
        'transaction': [data.Transaction],
    }
    # TODO return RuleError if parsing fails
    return targets_by_name[target_str.lower()]


def element_to_dict(entry):
    """lift a Beancount entry to a (nested) dict structure, so that rules can be
    checked by uniformly traversing nested dictionaries

    """
    def lift_posting(posting):
        posting = posting._asdict()
        posting['_type'] = data.Posting

        units = posting['units']._asdict()
        units['_type'] = data.Amount
        posting['units'] = units

        # TODO handle cost and price

        return posting

    entry_type = type(entry)
    d = entry._asdict()
    d['_type'] = entry_type

    if entry_type == data.Transaction:
        d['postings'] = list(map(lift_posting, d['postings']))

    # TODO handle other types of entries

    return d


def txn_has_account(txn_dict, account_RE):
    """return True iff transaction txn_dict (as a dict) has at least one posting
    whose account matches the account_RE regex

    """
    return any(account_RE.search(p['account'])
               for p in txn_dict['postings'])


def propagate_meta(from_elt, to_elt):
    """update metadata of to_elt Beancount element using from_elt's ones

    both Beancount elements are expected to be in dict format

    WARNING: to_elt is both returned and modified in place

    """
    # XXX we should probably blacklist 'filename' and 'lineno' here
    to_elt['meta'].update(from_elt['meta'])
    return to_elt


def load_rule(rule):
    """prepare a rule loaded from YAML for matching and validation

    WARNING: rule is both returned and modified in place

    """
    def new_validator(schema):
        v = Validator(schema)
        v.allow_unknown = True
        return v

    try:
        # a match without "target" falls back to the documented default
        target = rule['match'].get('target')
        rule['match']['target'] = DEFAULT_TARGETS
        if target is not None:
            if isinstance(target, str):
                target = [target]
            rule['match']['target'] = [t for ts in map(parse_target, target)
                                       for t in ts]
    except KeyError:
        pass

    try:  # Cerberus validators used for matching
        rule['match']['schema'] = new_validator(rule['match']['schema'])
    except KeyError:
        pass

    try:  # Cerberus validators used for enforcement
        rule['constraint'] = new_validator(rule['constraint'])
    except KeyError:
        pass

    try:  # account regexes
        account = rule['match']['account']
        if account.startswith('/') and account.endswith('/'):
            rule['match']['account'] = re.compile(account.strip('/'))
        else:  # not a regex, enforce strict string matching
            rule['match']['account'] = re.compile('^{}$'.format(account))
    except KeyError:
        pass

    return rule


def rule_applies(rule, element_d):
    """return True iff a rule should be applied to a Beancount element (as a dict)

    """
    match = rule['match']
    if match is None:  # catch-all match
        return True

    if element_d['_type'] not in rule['match']['target']:
        return False  # current entry is not an instance of any target

    if 'account' in match:
        account_RE = match['account']
        if element_d['_type'] == data.Transaction \
           and not txn_has_account(element_d, account_RE):
            return False
        if element_d['_type'] == data.Posting \
           and not account_RE.search(element_d['account']):
            return False

    if 'schema' in match and not match['schema'].validate(element_d):
        return False

    return True


def rule_validates(rule, element_d):
    """return True iff a rule validates a Beancount element (as a dict)

    precondition: it has already been established (e.g., using rule_applies)
    that the rule should be applied to this element

    """
    return rule['constraint'].validate(element_d)


def validate_entry(entry, rules):
    """Validate a single Beancount entry against a set of rules

    Returns:
      a list of errors, if any

    """
    def apply_rule(rule, element, context):
        if rule_applies(rule, element) and not rule_validates(rule, element):
            return [ValidationError(
                context.meta,
                'Constraint violation: {description}'.format(**rule),
                context)]
        return []

    entry_dict = element_to_dict(entry)
    errors = []

    for rule in rules:  # validate top-level entries
        errors.extend(apply_rule(rule, element=entry_dict, context=entry))

        if entry_dict['_type'] == data.Transaction:  # validate txn postings
            for posting in entry_dict['postings']:
                posting = propagate_meta(entry_dict, copy.deepcopy(posting))
                errors.extend(apply_rule(rule, element=posting, context=entry))

    return errors


# from profilehooks import profile
# @profile
def validate(entries, options_map, rules_file):
    """Enforce data-validation rules

    Args:
      entries: a list of directives
      options_map: an options map (unused)
      rules_file: the name of a YAML file containing validation rules

    Returns:
      a pair formed by the input entries (unchanged) and a list of
      ValidationError errors (if any)

    """
    # safe_load only builds plain Python data (lists, dicts, strings, ...);
    # the with block closes the rules file once it has been read
    with open(rules_file) as f:
        rules = list(map(load_rule, yaml.safe_load(f)))

    errors = []
    for entry in entries:
        errors.extend(validate_entry(entry, rules))

    return entries, errors