fsh/betterdicts


betterdicts

A collection of dictionary and dictionary-like utility classes.

The base type is betterdict, which only seeks to improve on the built-in dict type by providing some extra functionality.

On top of this several other useful dictionaries have been built:

  • stack_dict: a multi-levelled dict whose state can be pushed and popped like a stack.
  • attr_dict and jsdict: ergonomic dictionaries where attribute access (xs.key) can be used in place of the usual square bracket syntax (xs["key"]).
  • persistent_dict: a dict which automatically saves itself to disk with every change, so it retains its state between script execution.
  • dynamic_dict and cache_dict: dictionaries that represent or act like a memoized function.
  • number_dict: a dict on which arithmetic can be performed. Think of it as a simple dict version of a NumPy array.

betterdict

from betterdicts import betterdict

Intended to work just like a dict, but with extra functionality and convenience.

Most of these things can be achieved with simple generator expressions or similar, but I needed them so constantly that it reached a boiling point.

Increased Functional Friendliness

obj.update() and obj.clear() now return obj instead of None, so they are no longer as useless.

>>> d = betterdict()
>>> d.update(a=1).update(zip('abc', [7,4,2]))
{'a': 7, 'b': 4, 'c': 2}

There's also betterdict.insert(key, value, [missing]) as an alternative to obj[key] = value. It returns the previous value (if present).

>>> d.insert('b', 69)
4
>>> d.insert('d', 0)
>>> d.insert('e', 0, 'default return value goes here')
'default return value goes here'
>>> d
{'a': 7, 'b': 69, 'c': 2, 'd': 0, 'e': 0}
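To make the behavior above concrete, here is a minimal sketch of a dict subclass with a chainable update()/clear() and an insert() that returns the previous value. The class name ChainDict is made up for illustration; this is not betterdict's actual implementation, just one way the described semantics could be written.

```python
class ChainDict(dict):
    """Hypothetical stand-in for betterdict; covers only the three methods above."""

    def update(self, *args, **kwargs):
        super().update(*args, **kwargs)
        return self  # return the dict itself so calls can be chained

    def clear(self):
        super().clear()
        return self

    def insert(self, key, value, missing=None):
        # Look up the previous value (or `missing`), then store the new one.
        previous = self.get(key, missing)
        self[key] = value
        return previous


d = ChainDict().update(a=1).update(zip("abc", [7, 4, 2]))
old_b = d.insert("b", 69)  # 'b' was present, so its old value comes back
old_d = d.insert("d", 0)   # 'd' was absent, so the default (None) comes back
```

Because insert() is an expression, it can sit inside a comprehension or a condition where an assignment statement could not.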

Python being a statement-oriented language rather than expression-oriented can sometimes get in the way when you want to do simple operations inside of a list comprehension or test. These methods are intended to temper this problem.

>>> d = betterdict()
>>> [d.insert(len(w), w) for w in 'fare thee well great heart'.split()]
[None, 'fare', 'thee', None, 'great']
>>> d
{4: 'well', 5: 'heart'}

Note that del obj[x] (a good example of Python's statement fanaticism) already has an expression alternative in dict.pop(x, [default]).

The annoyance of statements led to the infamous walrus wart^Woperator being added after several bikeshed bonfires, finally admitting that expression-oriented language design is objectively superior. [TODO: add tongue-in-cheek emoji here.]

Combining and Collecting

Often you're collecting things into dictionaries where the key you're interested in is not unique. Or similarly, if given a stream of keyed data, you usually need some extra logic to handle identical keys.

One way to handle this is to write out some loop and add some tedious-but-mandatory if branch in the loop. These loops start to feel like Chinese water torture when they number in the thousands.

The default behavior of dict if you just give it an iterator is to discard (overwrite). That default behavior has been kept, but alternatives have been provided.

  • betterdict.combine(func, it, [initial=...])

This works like the built-in reduce(func, it, [initial]). It works well when you're dealing with simple data like numbers and you just want a single answer.

>>> betterdict.combine(int.__add__, [('a',1), ('a',5), ('b',2), ('a',7)])
{'a': 13, 'b': 2}

It's short-hand for two kinds of loops:

# with a specified initial value
for k, v in source:
  d[k] = func(d.get(k, initial), v)

# without
for k, v in source:
  if k in d:
    d[k] = func(d[k], v)
  else:
    d[k] = v

This can be used to put values into collections like in a functional language, as in the following example:

>>> betterdict.combine(lambda x, y: x + (y,), [('a',1), ('a',5), ('b',2), ('a',7)], initial=())
{'a': (1, 5, 7), 'b': (2,)}
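For reference, the two loops above can be folded into one plain function. This is a sketch of the described semantics using only the standard library; the free function combine and the _NO_INITIAL sentinel are illustrative names, not part of betterdicts.

```python
import operator

_NO_INITIAL = object()  # sentinel: tells "no initial value" apart from initial=None


def combine(func, pairs, initial=_NO_INITIAL):
    """Fold (key, value) pairs into a plain dict, reducing values per key."""
    result = {}
    for key, value in pairs:
        if key in result:
            result[key] = func(result[key], value)
        elif initial is _NO_INITIAL:
            result[key] = value  # first value for this key is kept as-is
        else:
            result[key] = func(initial, value)
    return result


pairs = [("a", 1), ("a", 5), ("b", 2), ("a", 7)]
sums = combine(operator.add, pairs)
tuples = combine(lambda acc, v: acc + (v,), pairs, initial=())
```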

But it's very inefficient and painful to use with mutable containers like lists. This is where collect() comes in:

  • betterdict.collect(type, it, [add_action=...])

This will return a dictionary-of-collections, where the collection is created with type and added with add_action (which defaults to being auto-detected as something like list.append, set.add, etc.).

>>> betterdict.collect(list, [('a',1), ('a',5), ('b',2), ('a',7)])
{'a': [1, 5, 7], 'b': [2]}
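The same grouping can be approximated with collections.defaultdict. The sketch below is an assumption about the semantics, not the library's code; in particular the auto-detection of add_action here is deliberately crude (it only knows about .append and .add).

```python
from collections import defaultdict


def collect(container_type, pairs, add_action=None):
    """Group (key, value) pairs into a dict of mutable containers."""
    if add_action is None:
        # Crude auto-detection: prefer .append (lists, deques), else .add (sets).
        add_action = getattr(container_type, "append", None) or getattr(container_type, "add")
    groups = defaultdict(container_type)
    for key, value in pairs:
        add_action(groups[key], value)  # mutate the container in place
    return dict(groups)


pairs = [("a", 1), ("a", 5), ("b", 2), ("a", 7)]
as_lists = collect(list, pairs)
as_sets = collect(set, pairs)
```

Unlike the tuple-building combine() example, each container is mutated in place, so no intermediate collections are copied.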

A more realistic example:

>>> some_stream_of_words = open('README.md', 'r', encoding='utf-8').read().split()
>>> l2w = betterdict.collect(set, ((len(w), w) for w in some_stream_of_words))
>>> l2w[2]
{'{}', 'in', 'of', 'be', 'up', '(a', '-1', 'is', 'no', '4,', '6}', 'to', 'if', 'as', 'v)', '1,', '38', '"\'', '6:', 'As', '-c', '2,', 'w)', '8:', 'or', 'v:', '2}', '2:', 'on', '10', '==', 'x,', '7,', '0}', '-l', '4}', "',", 'k:', 'do', 'So', 'it', '4:', 'd:', '9,', '3)', 'at', '7:', 'Or', '3:', '40', '5:', '##', 'ad', '^D', 'an', '20', '9:', 'ls', "T'", 'p2', 'y:', '1}', 'It', 'To', 'so', '1)', '17', '0)', 'by', 'k,', '3,', '//', 'x:', '0,', '26', '5,', '+=', 'p1', 'me', '1:'}

These functions are magic in that they can work both as class methods and as instance methods. The instance methods obj.collect(...) and obj.combine(...) work kind of like a combining or collecting obj.update(). The only exception is that obj.combine() works immutably, returning a new dictionary as if it were a binary operator.

See help(betterdict.combine) and help(betterdict.collect) for more information.

Filtering and Mapping

.map_keys(f), .map_values(f), .map_pairs(f), .filter_keys(p), .filter_values(p), .filter_pairs(p) all do the somewhat obvious thing.

>>> q = betterdict(enumerate("I'm nothing but heart"))
>>> q.filter_keys(lambda x: 4 < x < 10)
{5: 'o', 6: 't', 7: 'h', 8: 'i', 9: 'n'}
>>> q.filter_values(str.isupper)
{0: 'I'}

Though it should be noted that unlike .update they do not modify the dictionary in-place as they're targeting more functional programming.

map_pairs(f) and filter_pairs(f) take a function of two arguments.

>>> q.map_pairs(lambda x, y: (x, (x % 5) * y))
{0: '', 1: "'", 2: 'mm', 3: '   ', 4: 'nnnn', 5: '', 6: 't', 7: 'hh', 8: 'iii', 9: 'nnnn', 10: '', 11: ' ', 12: 'bb', 13: 'uuu', 14: 'tttt', 15: '', 16: 'h', 17: 'ee', 18: 'aaa', 19: 'rrrr', 20: ''}

Since map_keys() and map_pairs() might map two values to the same key, they also take a function and initial value as optional arguments to do a reduction, similar to how the built-in reduce() works. (Refer to .combine().)

>>> q.map_keys(lambda x: x%2, str.__add__)
{0: 'Imntigbthat', 1: "' ohn u er"} 

There's also .filter([keys=f], [values=g]) and .map([keys=f], [values=g]) which can be used to filter/map both keys and values with two different functions in one step:

>>> q.filter(lambda k: 4 < k < 10, lambda v: v in 'aeiouy')
{5: 'o', 8: 'i'}
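For comparison, these are the plain-Python spellings the methods above are shorthand for. The sketch uses only built-in dicts; the variable names are illustrative.

```python
q = dict(enumerate("I'm nothing but heart"))

# filter_keys(p) and filter_values(p) as dict comprehensions
by_key = {k: v for k, v in q.items() if 4 < k < 10}
by_value = {k: v for k, v in q.items() if v.isupper()}

# filter(keys, values): both predicates must hold
both = {k: v for k, v in q.items() if 4 < k < 10 and v in "aeiouy"}

# map_keys(f, reducer): keys that collide after mapping are merged with the reducer
merged = {}
for k, v in q.items():
    new_key = k % 2
    merged[new_key] = merged[new_key] + v if new_key in merged else v
```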

Inversion

Flipping those arrows.

>>> q = betterdict(enumerate('divebar'))
>>> q
{0: 'd', 1: 'i', 2: 'v', 3: 'e', 4: 'b', 5: 'a', 6: 'r'}
>>> q.invert()
{'d': 0, 'i': 1, 'v': 2, 'e': 3, 'b': 4, 'a': 5, 'r': 6}
>>> q == q.invert().invert()
True

Note that q.invert() does not modify q, but returns a new dictionary. (Functional friendliness where it makes sense.)

Most of the time the map is not expected to be injective (one-to-one) though, and there are two ways of handling that:

  • you can either collect values into a container like a list, or
>>> q = betterdict(enumerate('syzygy'))
>>> q.invert()
{'s': 0, 'y': 5, 'z': 2, 'g': 4}
>>> q.invert_and_collect(list)
{'s': [0], 'y': [1, 3, 5], 'z': [2], 'g': [4]}
>>> q.invert_and_collect(set)
{'s': {0}, 'y': {1, 3, 5}, 'z': {2}, 'g': {4}}
  • you can combine values repeatedly until you get a single answer.
>>> q.invert_and_combine(int.__mul__)
{'s': 0, 'y': 15, 'z': 2, 'g': 4}
>>> q.invert_and_combine(lambda x, y: x+[y], [])
{'s': [0], 'y': [1, 3, 5], 'z': [2], 'g': [4]}

As seen above, collecting and combining are sometimes just special cases of a more general operation (folding, repeated application of a monoid, etc.). More functional languages usually have only one operation for this, but I chose to separate them because they have a very different feel in Python, with very different performance characteristics.
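Written out with built-in dicts, plain inversion and collecting inversion look roughly like this. It is a standard-library sketch of the semantics shown in the examples, not the betterdict code.

```python
from collections import defaultdict

q = dict(enumerate("syzygy"))

# Plain inversion: when a value repeats, the last key seen wins.
inverted = {v: k for k, v in q.items()}

# Collecting inversion: every original key is kept, grouped by value.
grouped = defaultdict(list)
for k, v in q.items():
    grouped[v].append(k)
collected = dict(grouped)
```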

jsdict (ad hoc use only)

from betterdicts import jsdict, njsdict, rjsdict

jsdict is a betterdict which works like a JavaScript object, where keys and attributes are the same. This is accomplished with zero overhead.

>>> d = jsdict()
>>> d['hello'] = 1
>>> d.filepath = '/'
>>> d
{'hello': 1, 'filepath': '/'}
>>> d.hello, d['filepath']
(1, '/')

For obvious reasons, this is a touch insane. JavaScript was never accused of good design, and bringing it to Python where conventions are different will lead to awful things:

>>> d = jsdict(clear=0)
>>> d.clear()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable

Thus these are for ad hoc use only, i.e. in one-off scripts or when doing work directly in the REPL.

Refer to attr_dict below for a slightly safer alternative.

But sometimes this kind of convenience is just too good to pass up. You'll know this in your heart too, if you've ever dealt with loading configuration from some source that gives you a dict, and then having to type out ['...'] so many times you actually get angry.

>>> import json
>>> config = json.loads('{"lazy": "sarah", "active": false}', object_pairs_hook=jsdict)
>>> config.active
False
>>> config
{'lazy': 'sarah', 'active': False}

I get equally angry whenever a source gives me some object of a godforsaken abstract ConfigurationProxyHelper class which is just a glorified dict without any of the functionality or compatibility.

  • rjsdict is a jsdict where obj.key for a missing key automatically returns and inserts a new rjsdict instance. Think of it as a recursive jsdict. It's useful for when you're building some hierarchical structure:
>>> config = rjsdict()
>>> config.lights = 'off'
>>> config.player.health = 40
>>> config.player.name = 'Mr. T'
>>> config
{'lights': 'off', 'player': {'health': 40, 'name': 'Mr. T'}}
  • njsdict is a jsdict where obj.key evaluates to None instead of raising AttributeError. (obj is not modified.)

attr_dict

from betterdicts import attr_dict

Inspired by jsdict above, it works in much the same way (though by a different mechanism). It is provided as a slightly safer alternative, because it doesn't overwrite extant attributes:

>>> q = attr_dict(a=10)
>>> q.a
10
>>> q.b = 20
>>> q
{'a': 10, 'b': 20}
>>> q.pop = -1
>>> q
{'a': 10, 'b': 20, 'pop': -1}
>>> q.pop
<built-in method pop of attr_dict object at 0x7fc85a580c40>

But note it still has the glow of insanity, which will inspire nothing but anger, fear, and frustration.
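One way to get the behavior shown above is to route attribute access through __getattr__ and __setattr__. This is a sketch under that assumption (the actual attr_dict mechanism may differ), with AttrDictSketch as a made-up name.

```python
class AttrDictSketch(dict):
    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails, so real
        # methods such as .pop or .clear always win over stored keys.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Attribute assignment always writes a dictionary item.
        self[name] = value


q = AttrDictSketch(a=10)
q.b = 20
q.pop = -1  # stored as a key; the pop method itself is untouched
```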

dynamic_dict

from betterdicts import dynamic_dict, cache_dict

How many times have you been frustrated by the fact that the standard collections.defaultdict() calls its factory function without providing the key context?

Yeah, me too.

dynamic_dict(f) is equivalent to a betterdict, but if a missing key is requested, it is first created with f(key) and inserted.

>>> d = dynamic_dict(hex)
>>> d[16]
'0x10'
>>> d[100]
'0x64'
>>> d
{16: '0x10', 100: '0x64'}
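The standard hook for this kind of behavior is dict.__missing__, which unlike the defaultdict factory receives the key. A minimal sketch, assuming that is roughly how it works (DynamicDictSketch is an illustrative name):

```python
class DynamicDictSketch(dict):
    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory  # called with the missing key

    def __missing__(self, key):
        # Invoked by d[key] when key is absent; compute, store, return.
        value = self[key] = self.factory(key)
        return value


d = DynamicDictSketch(hex)
first = d[16]
second = d[100]
```

Note that __missing__ is only triggered by square-bracket lookup; d.get(key) on a plain dict subclass like this one does not call it.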

cache_dict

How many times have you wanted concrete access to the function cache when using something like functools.cache?

Yeah, me too.

Intended to be used as a decorator on a function, this will turn the function into a callable dictionary. The dictionary is its own cache and can freely be inspected, modified, etc.

@cache_dict
def heavy_bite(n):
  "Calculates a heavy bite."
  if n < 1: return 1
  print(f'calculating heavy bite {n}...')
  return heavy_bite(n // 3) * heavy_bite(n - 3 + (-n % 3)) + heavy_bite(n - 1)
>>> heavy_bite(10)
calculating heavy bite 10...
calculating heavy bite 3...
calculating heavy bite 1...
calculating heavy bite 2...
calculating heavy bite 9...
calculating heavy bite 6...
calculating heavy bite 5...
calculating heavy bite 4...
calculating heavy bite 8...
calculating heavy bite 7...
2880
>>> heavy_bite(10)
2880
>>> del heavy_bite[10]
>>> heavy_bite(10)
calculating heavy bite 10...
2880
>>> heavy_bite.__doc__
'Calculates a heavy bite.'

NOTE: for now it only caches on its first argument. The rest are just passed through (in case of a cache miss). Ideally I want a way to specify an argument signature with an indication of which arguments constitute the key.
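The idea of a "callable dictionary" keyed on the first argument can be sketched as below. This is an illustration of the concept with made-up names (CacheDictSketch, square, calls), not the cache_dict source.

```python
import functools


class CacheDictSketch(dict):
    def __init__(self, func):
        super().__init__()
        self.func = func
        functools.update_wrapper(self, func)  # copy __doc__, __name__, etc.

    def __call__(self, key, *args, **kwargs):
        # Only the first argument is the cache key; the rest pass through.
        if key not in self:
            self[key] = self.func(key, *args, **kwargs)
        return self[key]


calls = []  # records every real (uncached) invocation


@CacheDictSketch
def square(n):
    "Square a number."
    calls.append(n)
    return n * n


square(4)
square(4)      # served from the dict, no new call
del square[4]  # evict by hand, like any dict entry
square(4)      # recomputed
```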

number_dict

Acts like a collections.Counter() with arithmetic support like a number.

This is sort of a poor man's numpy-dict.

>>> q = number_dict(range(5))
>>> q
{0: 1, 1: 1, 2: 1, 3: 1, 4: 1}
>>> q[1] += 5
>>> q[4] += 1
>>> q[9]
0
>>> q[9] = 9
>>> (q+1)**2
{0: 4, 1: 49, 2: 4, 3: 4, 4: 9, 9: 100}
>>> 1 / q
{0: 1.0, 1: 0.16666666666666666, 2: 1.0, 3: 1.0, 4: 0.5, 9: 0.1111111111111111}
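A rough idea of how such arithmetic could be layered on collections.Counter is sketched below. It assumes scalar operands only and covers just the three operators used above; NumberDictSketch is a hypothetical name, and the real number_dict may behave differently in other cases.

```python
from collections import Counter


class NumberDictSketch(Counter):
    def _map(self, op):
        # Apply op to every value and return a new instance.
        return type(self)({k: op(v) for k, v in self.items()})

    def __add__(self, other):        # q + scalar
        return self._map(lambda v: v + other)

    def __pow__(self, exponent):     # q ** scalar
        return self._map(lambda v: v ** exponent)

    def __rtruediv__(self, numerator):  # scalar / q
        return self._map(lambda v: numerator / v)


q = NumberDictSketch(range(5))  # counts each of 0..4 once
q[1] += 5
q[4] += 1
q[9] = 9
squared = (q + 1) ** 2
reciprocal = 1 / q
```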

persistent_dict

The simplest possible persistent state exposed as a betterdict. Also intended for ad hoc use.

This is for when you need something really simple and magic to store some flat data between script invocations, without the extra management of a database or file formats.

[franksh@moso ~]$ python -iq -c 'from betterdicts import persistent_dict'
>>> p = persistent_dict()
>>> p
{}
>>> p['foo'] = 17
>>> p
{'foo': 17}
>>> persistent_dict()
{'foo': 17}
>>> p['bar'] = [1,2]
>>> persistent_dict()
{'foo': 17, 'bar': [1, 2]}
>>> ^D
[franksh@moso ~]$ ls -l cache.pickle 
-rw-r--r-- 1 franksh franksh 38 Aug 26 17:18 cache.pickle
[franksh@moso ~]$ python -iq -c 'from betterdicts import persistent_dict'
>>> persistent_dict()
{'foo': 17, 'bar': [1, 2]}
>>> 

It defaults to loading and saving from ./cache.pickle in whatever the current working directory happens to be, as seen above.

Any change made directly (see footnote 1) to the dictionary causes it to save itself to disk as a pickle file. So obviously if you're building up some initial data quickly, you want to use another dictionary first, and then convert it later with persistent_dict([filename], data). This will automatically both load and save.

To use a custom filename use persistent_dict([filename]):

[franksh@moso ~]$ python -iq -c 'from betterdicts import persistent_dict'
>>> p1 = persistent_dict('foo.p', dict(a=1,b=22,c=333))
>>> p2 = persistent_dict('bar.p', p1.invert())
not pers ({1: 'a', 22: 'b', 333: 'c'},) {}
>>> persistent_dict('foo.p')
{'a': 1, 'b': 22, 'c': 333}
>>> persistent_dict('bar.p')
{1: 'a', 22: 'b', 333: 'c'}
>>> del p2[22]
>>> p2 == persistent_dict('bar.p')
True

Warning: Two separate persistent_dict() objects bound to the same file will not automatically stay in sync, instead they will keep overwriting each other's data!
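The save-on-every-change idea can be sketched with pickle as follows. This is an illustration only: it hooks just item assignment and deletion, whereas a complete version would also need to cover update(), pop(), and the other mutators. PersistentDictSketch and the temporary file path are made up for the example.

```python
import os
import pickle
import tempfile


class PersistentDictSketch(dict):
    def __init__(self, filename="cache.pickle", data=None):
        super().__init__()
        self.filename = filename
        if os.path.exists(filename):  # load earlier state, if any
            with open(filename, "rb") as fh:
                super().update(pickle.load(fh))
        if data is not None:          # merge initial data and save once
            super().update(data)
            self._save()

    def _save(self):
        with open(self.filename, "wb") as fh:
            pickle.dump(dict(self), fh)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._save()

    def __delitem__(self, key):
        super().__delitem__(key)
        self._save()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pickle")
    p = PersistentDictSketch(path)
    p["foo"] = 17
    p["bar"] = [1, 2]
    reloaded = dict(PersistentDictSketch(path))
    p["bar"].append(3)  # "deep" change: nothing is written to disk
    after_deep_change = dict(PersistentDictSketch(path))
```

The last two lines show why deep changes go unnoticed: appending to the stored list never passes through the dictionary's own methods.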

stack_dict

Stack dicts emulate how scopes or namespaces work. They allow you to repeatedly save the state of the dictionary (push_stack()) and later restore it (pop_stack()).

>>> from betterdicts import stack_dict
>>> q=stack_dict(a=1,b=2)
>>> q
{'a': 1, 'b': 2}
>>> q.push_stack()
>>> q
{'a': 1, 'b': 2}
>>> q['c'] = 7
>>> del q['a']
>>> q
{'b': 2, 'c': 7}
>>> q.pop_stack()
>>> q
{'a': 1, 'b': 2}
>>> q.push_stack(hello=0, world=-1)  # push_stack works like update()
>>> q.push_stack(hello=1000, world=1000)
>>> q
{'a': 1, 'b': 2, 'hello': 1000, 'world': 1000}
>>> q.pop_stack(); q
{'a': 1, 'b': 2, 'hello': 0, 'world': -1}
>>> q.pop_stack(); q
{'a': 1, 'b': 2}

Stack dicts can also be used with with-blocks:

>>> from betterdicts import stack_dict
>>> q = stack_dict({'a': 1, 'b': 2})
>>> with q:
...   q['c'] = 10
...   print(q)
... 
{'a': 1, 'b': 2, 'c': 10}
>>> q  # q is reset to its previous state after the `with`.
{'a': 1, 'b': 2}
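A simple way to picture push_stack() and pop_stack() is a list of snapshots. The sketch below takes a shallow copy on every push, which is an assumption made for brevity; the real stack_dict may well use layered lookups instead. StackDictSketch is an illustrative name.

```python
class StackDictSketch(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._saved = []  # snapshots, most recent last

    def push_stack(self, *args, **kwargs):
        self._saved.append(dict(self))  # shallow snapshot of current state
        self.update(*args, **kwargs)    # optional changes for the new level

    def pop_stack(self):
        snapshot = self._saved.pop()
        self.clear()
        self.update(snapshot)           # restore the saved state

    def __enter__(self):
        self.push_stack()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.pop_stack()
        return False  # never swallow exceptions


q = StackDictSketch(a=1, b=2)
with q:
    q["c"] = 10
    del q["a"]
    inside = dict(q)
outside = dict(q)
```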

Footnotes

  1. "Deep" changes, like modifying a mutable object stored in the dictionary, are not detected.
