Add Support For MongoDB Client Side Field Level Encryption (CSFLE) #67

Open: wants to merge 6 commits into base: master
7 changes: 7 additions & 0 deletions docs/api/ming.encryption.rst
@@ -0,0 +1,7 @@
:mod:`ming.encryption` module
================================


.. automodule:: ming.encryption
:members:
:private-members:
2 changes: 2 additions & 0 deletions docs/api/ming.odm.rst
@@ -4,6 +4,8 @@

.. automodule:: ming.odm.declarative
:members:
:show-inheritance:
:inherited-members:

.. automodule:: ming.odm.base
:members:
2 changes: 1 addition & 1 deletion docs/baselevel.rst
@@ -17,7 +17,7 @@ you want.
While this dynamic behavior is handy in a rapid development environment where you
might delete and re-create the database many times a day, it starts to be a
problem when you *need* to make guarantees of the type of data in a collection
(because you code depends on it). The goal of Ming is to allow you to specify
(because your code depends on it). The goal of Ming is to allow you to specify
the schema for your data in Python code and then develop in confidence, knowing
the format of data you get from a query.

89 changes: 89 additions & 0 deletions docs/encryption.rst
@@ -0,0 +1,89 @@
:tocdepth: 3

.. _odm-encryption:

============================
Encrypting Sensitive Data
============================

This section describes how Ming can be used to automatically encrypt and decrypt your document's fields. This is accomplished by leveraging MongoDB's `Client-Side Field Level Encryption (CSFLE)`_ feature.



.. _Client-Side Field Level Encryption (CSFLE): https://pymongo.readthedocs.io/en/stable/examples/encryption.html#client-side-field-level-encryption


Encryption at the Foundation Level
==================================

When declaratively working with models by subclassing :class:`~ming.declarative.Document` in the :ref:`ming_baselevel`, you can add field level encryption by pairing a :class:`~ming.encryption.DecryptedField` with a :class:`~ming.metadata.Field`.


A simple example might look like the following.

.. code-block:: python

    class UserEmail(Document):
        class __mongometa__:
            session = session
            name = 'user_emails'
        _id = Field(schema.ObjectId)

        email_encrypted = Field(S.Binary, if_missing=None)
        email = DecryptedField(str, 'email_encrypted')


Breaking down DecryptedField
----------------------------------

This approach requires that you follow a few conventions:

#. The field storing the encrypted data should be configured in the following way:

* It should be a :class:`~ming.metadata.Field`.
* The Field should be of type :class:`~ming.schema.Binary`.
* The Field's name should end with `_encrypted`.

#. Next to this should be a corresponding :class:`~ming.encryption.DecryptedField` that will decrypt the data.

* Its first argument should be the type that you expect the decrypted data to be (`str`, `int`, etc.).
* The second argument should be the name of the encrypted field (e.g. `email_encrypted`).
* The DecryptedField's name should be the same as the encrypted :class:`~ming.metadata.Field`, but without the `_encrypted` suffix (e.g. `email`).
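
The pairing convention above can be illustrated with a plain Python descriptor. This is only a minimal sketch of the idea, not Ming's implementation: ``PairedPlaintext``, ``toy_encrypt``, ``toy_decrypt`` and ``UserEmailSketch`` are hypothetical names, and base64 stands in for real encryption purely so the example runs without a database.

```python
import base64


def toy_encrypt(value: str) -> bytes:
    # Placeholder transform for illustration only; base64 is NOT encryption.
    return base64.b64encode(value.encode("utf-8"))


def toy_decrypt(blob: bytes) -> str:
    return base64.b64decode(blob).decode("utf-8")


class PairedPlaintext:
    """Hypothetical descriptor that reads and writes through a `<name>_encrypted` attribute."""

    def __init__(self, encrypted_name: str):
        self.encrypted_name = encrypted_name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        blob = getattr(instance, self.encrypted_name, None)
        return None if blob is None else toy_decrypt(blob)

    def __set__(self, instance, value):
        # Setting the plaintext attribute only ever stores the encrypted counterpart.
        blob = None if value is None else toy_encrypt(value)
        setattr(instance, self.encrypted_name, blob)


class UserEmailSketch:
    # Plaintext name is the encrypted name without the `_encrypted` suffix.
    email = PairedPlaintext("email_encrypted")

    def __init__(self):
        self.email_encrypted = None


record = UserEmailSketch()
record.email = "rick@example.com"
```

Only ``email_encrypted`` holds state here; ``email`` is computed on every read, which is one way to keep plaintext out of the stored document.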


Encryption at the Declarative Level
========================================

Similarly, when working with the higher level of abstraction offered by :class:`~ming.odm.declarative.MappedClass` subclasses, you can add field level encryption by pairing a :class:`~ming.odm.declarative.DecryptedProperty` with a :class:`~ming.odm.property.FieldProperty`.


A simple example might look like the following.

.. code-block:: python

    class UserEmail(MappedClass):
        class __mongometa__:
            session = session
            name = 'user_emails'
        _id = FieldProperty(schema.ObjectId)

        email_encrypted = FieldProperty(S.Binary, if_missing=None)
        email = DecryptedProperty(str, 'email_encrypted')


Breaking down DecryptedProperty
----------------------------------

Similarly to the foundation level, this approach requires that you follow a few conventions:

#. The field storing the encrypted data should be configured in the following way:

* It should be a :class:`~ming.odm.property.FieldProperty`.
* The FieldProperty should be of type :class:`~ming.schema.Binary`.
* The FieldProperty's name should end with `_encrypted`.

#. Next to this should be a :class:`~ming.odm.declarative.DecryptedProperty` that will decrypt the data.

* Its first argument should be the type that you expect the decrypted data to be (`str`, `int`, etc.).
* The second argument should be the name of the encrypted field (e.g. `email_encrypted`).
* The DecryptedProperty's name should be the same as the encrypted :class:`~ming.odm.property.FieldProperty`, but without the `_encrypted` suffix (e.g. `email`).
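
The naming rules listed above are easy to get wrong by a typo, so a small check can help when reviewing a model. The helper below is a hypothetical sketch (``check_pair`` and ``ENCRYPTED_SUFFIX`` are not part of Ming) that tests only the two naming conventions, not the field types.

```python
from __future__ import annotations

ENCRYPTED_SUFFIX = "_encrypted"


def check_pair(decrypted_name: str, encrypted_name: str) -> list[str]:
    """Return a list of naming problems for one decrypted/encrypted pair."""
    problems = []
    if not encrypted_name.endswith(ENCRYPTED_SUFFIX):
        problems.append(
            f"{encrypted_name!r} should end with {ENCRYPTED_SUFFIX!r}")
    elif encrypted_name[: -len(ENCRYPTED_SUFFIX)] != decrypted_name:
        problems.append(
            f"{decrypted_name!r} should be {encrypted_name!r} without the suffix")
    return problems


good = check_pair("email", "email_encrypted")
missing_suffix = check_pair("email", "email_blob")
mismatched = check_pair("mail", "email_encrypted")
```

An empty list means the pair follows both naming conventions.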
1 change: 1 addition & 0 deletions docs/index.rst
@@ -101,6 +101,7 @@ Documentation Content
polymorphism
custom_properties
baselevel
encryption
reference
news

84 changes: 84 additions & 0 deletions docs/presentations/demo_encryption.py
@@ -0,0 +1,84 @@
import bson
from ming import Session
from ming.datastore import create_engine, create_datastore, DataStore
from ming.encryption import EncryptionConfig
import ming.schema as S
from ming.tests import make_encryption_key

bind: DataStore = create_datastore(
    'mongodb://localhost:27017/test_database',
    encryption=EncryptionConfig({
        'kms_providers': {
            'local': {
                # Don't use this for production! This is just for demo purposes
                'key': make_encryption_key('demo_encryption'),
            },
        },
        'key_vault_namespace': 'demo_encryption_db.__keyVault',
        'provider_options': {
            'local': {
                'key_alt_names': ['datakeyName'],
            },
        },
    }))

# clean up for our demo purposes
bind.conn.drop_database('test_database')
bind.conn.drop_database('demo_encryption_db')

session = Session(bind)

from ming import Field, Document, schema
from ming.encryption import DecryptedField
import datetime

class UserEmail(Document):
    class __mongometa__:
        session = session
        name = 'user_emails'
    _id = Field(schema.ObjectId)

    # Encrypted fields should:
    # - Have '_encrypted' suffix
    # - Have type Binary
    email_encrypted = Field(S.Binary, if_missing=None)

    # Decrypted fields should:
    # - Have no suffix
    # - Have the actual type
    # - Provide the encrypted field's full name
    email = DecryptedField(str, 'email_encrypted')


user_email = UserEmail.make({})
assert not user_email.email
assert not user_email.email_encrypted

# Can directly set DecryptedField and it will auto-populate and encrypt its counterpart
user_email.email = 'rick@example.com'
assert user_email.email_encrypted is not None
assert user_email.email_encrypted != 'rick@example.com'
assert isinstance(user_email.email_encrypted, bson.Binary)
user_email.m.save()


# Use .make_encr to properly create new instance with unencrypted data
user_email2 = UserEmail.make_encr(dict(
email='stacy@example.com'))


assert user_email2.email_encrypted is not None
assert user_email2.email_encrypted != 'stacy@example.com'
assert isinstance(user_email2.email_encrypted, bson.Binary)
blob1 = user_email2.email_encrypted

user_email2.m.save()

# updating the email updates the corresponding encrypted field
user_email2.email = 'stacy+1@example.com'
assert user_email2.email_encrypted != blob1

user_email2.m.save()

bind.conn.drop_database('test_database')
bind.conn.drop_database('demo_encryption_db')
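
The demo shows that ``make_encr`` accepts plaintext keys (``email``) and yields a document whose ``email_encrypted`` field is populated. How ``make_encr`` does this is not visible in this part of the diff, so the following is only a guess at the shape of that transformation. ``to_encrypted_keys`` and ``fake_encrypt`` are hypothetical names, and the "encryption" is a byte reversal used purely to keep the sketch runnable.

```python
def to_encrypted_keys(data, decrypted_to_encrypted, encrypt):
    """Move plaintext values under their `_encrypted` key, encrypting on the way.

    `decrypted_to_encrypted` maps plaintext field names to encrypted field
    names, e.g. {'email': 'email_encrypted'}. Other keys pass through as-is.
    """
    out = {}
    for key, value in data.items():
        target = decrypted_to_encrypted.get(key)
        if target is None:
            out[key] = value
        else:
            out[target] = None if value is None else encrypt(value)
    return out


def fake_encrypt(value: str) -> bytes:
    # Stand-in only: reverses the bytes so the result differs from the input.
    return value.encode("utf-8")[::-1]


raw = {"email": "stacy@example.com", "note": "unchanged"}
converted = to_encrypted_keys(raw, {"email": "email_encrypted"}, fake_encrypt)
```

After conversion the plaintext key is gone from the mapping, which matches the demo's assertions that only the encrypted field carries the stored value.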
2 changes: 2 additions & 0 deletions ming/config.py
@@ -24,6 +24,7 @@ def configure(**kwargs):
def configure_from_nested_dict(config):
try:
from formencode import schema, validators
import ming.validators as ming_validators
except ImportError:
raise MingConfigError("Need to install FormEncode to use ``ming.configure``")

@@ -36,6 +37,7 @@ class DatastoreSchema(schema.Schema):
auto_ensure_indexes = validators.StringBool(if_missing=True)
# pymongo
tz_aware = validators.Bool(if_missing=False)
encryption = ming_validators.EncryptionConfigValidator(if_missing=None)

datastores = {}
for name, datastore in config.items():
80 changes: 71 additions & 9 deletions ming/datastore.py
@@ -1,20 +1,29 @@
from __future__ import annotations

import time
import logging
from threading import Lock
from typing import Union
from typing import Union, TYPE_CHECKING
import urllib
from pymongo import MongoClient
from pymongo.database import Database
from pymongo.errors import ConnectionFailure, InvalidURI
from pymongo.encryption import ClientEncryption, Algorithm
from pymongo.errors import ConnectionFailure, InvalidURI, EncryptionError
from pymongo.uri_parser import parse_uri
from pymongocrypt.errors import MongoCryptError

from ming.utils import LazyProperty

from . import mim
from . import exc

if TYPE_CHECKING:
from . import encryption

Conn = Union[mim.Connection, MongoClient]


def create_engine(*args, **kwargs):
def create_engine(*args, **kwargs) -> Engine:
"""Creates a new :class:`.Engine` instance.

According to the provided url schema ``mongodb://`` or ``mim://``
@@ -34,7 +43,7 @@
return Engine(use_class, args, kwargs, connect_retry, auto_ensure_indexes)


def create_datastore(uri, **kwargs):
def create_datastore(uri, **kwargs) -> DataStore:
"""Creates a new :class:`.DataStore` for the database identified by ``uri``.

``uri`` is a mongodb url in the form ``mongodb://username:password@address:port/dbname``,
@@ -74,6 +83,8 @@
if database.startswith("/"):
database = database[1:]

encryption_config: encryption.EncryptionConfig = kwargs.pop('encryption', None)

if uri:
# User provided a valid connection URL.
if bind:
@@ -85,14 +96,14 @@
# Create engine without connection.
bind = create_engine(**kwargs)

return DataStore(bind, database)
return DataStore(bind, database, encryption_config)


class Engine:
"""Engine represents the connection to a MongoDB (or in-memory database).

The ``Engine`` class lazily creates the connection the firs time it's
actually accessed.
The ``Engine`` class lazily creates the connection the first time it's
accessed.
"""

def __init__(self, Connection,
@@ -135,6 +146,7 @@
try:
with self._lock:
if self._conn is None:
# NOTE: Runs MongoClient/EncryptionClient
self._conn = self._Connection(
*self._conn_args, **self._conn_kwargs)
else:
@@ -159,10 +171,10 @@
:func:`.create_datastore` function.
"""

def __init__(self, bind, name, authenticate=None):
def __init__(self, bind: Engine, name: str, encryption_config: encryption.EncryptionConfig = None):
self.bind = bind
self.name = name
self._authenticate = authenticate
self._encryption_config = encryption_config
self._db = None

def __repr__(self): # pragma no cover
@@ -191,3 +203,53 @@

self._db = self.bind[self.name]
return self._db

@property
def encryption(self) -> encryption.EncryptionConfig | None:
return self._encryption_config

@LazyProperty
def encryptor(self) -> ClientEncryption:
"""Creates and returns a :class:`pymongo.encryption.ClientEncryption` instance for the given ming datastore. It uses this to handle encryption/decryption using pymongo's native routines.

:param ming_ds: the :class:`ming.datastore.Datastore` for which this encryptor should be configured with.
Review comment (Collaborator): can delete this line, since self is the datastore now!

"""
encryption = ClientEncryption(self.encryption.kms_providers, self.encryption.key_vault_namespace,
self.conn, self.conn.codec_options)
return encryption

def make_data_key(self):
"""Mongodb's Client Side Field Level Encryption (CSFLE) requires a data key to be present in the key vault collection. This ensures that the key vault collection is properly indexed and that a data key is present for each provider.
"""
# index recommended by mongodb docs:
key_vault_db_name, key_vault_coll_name = self.encryption.key_vault_namespace.split('.')
key_vault_coll = self.conn[key_vault_db_name][key_vault_coll_name]
key_vault_coll.create_index("keyAltNames", unique=True,
partialFilterExpression={"keyAltNames": {"$exists": True}})

for provider, options in self.encryption.provider_options.items():
self.encryptor.create_data_key(provider, **options)

def encr(self, s: str | None, _first_attempt=True, provider='local') -> bytes | None:
"""Encrypts a string using the encryption configuration of the ming datastore that this class is bound to.
Most of the time, you won't need to call this directly, as it is used by the :meth:`ming.encryption.EncryptedDocumentMixin.encrypt_some_fields` method.
"""
if s is None:
return None

try:
key_alt_name = self.encryption._get_key_alt_name(provider)
return self.encryptor.encrypt(s, Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
key_alt_name=key_alt_name)
except (EncryptionError, MongoCryptError) as e:
if _first_attempt and 'not all keys requested were satisfied' in str(e):
self.make_data_key()
return self.encr(s, _first_attempt=False, provider=provider)
else:
raise


def decr(self, b: bytes | None) -> str | None:
"""Decrypts a string using the encryption configuration of the ming datastore that this class is bound to.
"""
if b is None:
return None

return self.encryptor.decrypt(b)
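
The ``encr`` method above follows a retry-once pattern: try to encrypt, and if the data key is missing, create it and try exactly one more time. The sketch below isolates that control flow with stand-ins so it can run without MongoDB. ``MissingKeyError``, ``encrypt_with_retry``, ``fake_encrypt`` and ``fake_create_key`` are all hypothetical; the real code instead matches on the text of an ``EncryptionError`` or ``MongoCryptError``.

```python
class MissingKeyError(Exception):
    """Stand-in for the 'not all keys requested were satisfied' failure."""


def encrypt_with_retry(value, provider, encrypt, create_key, _first_attempt=True):
    if value is None:
        return None
    try:
        return encrypt(value, provider)
    except MissingKeyError:
        if not _first_attempt:
            # The key was just created and is still missing: give up.
            raise
        create_key(provider)
        # Pass the same provider through so the retry targets the same key.
        return encrypt_with_retry(value, provider, encrypt, create_key,
                                  _first_attempt=False)


keys = set()  # in-memory stand-in for the key vault collection


def fake_encrypt(value, provider):
    if provider not in keys:
        raise MissingKeyError(provider)
    return f"{provider}:{value}".encode("utf-8")


def fake_create_key(provider):
    keys.add(provider)


result = encrypt_with_retry("secret", "local", fake_encrypt, fake_create_key)
```

Bounding the retry with a flag rather than a loop keeps a persistent failure from recursing indefinitely.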
8 changes: 7 additions & 1 deletion ming/declarative.pyi
@@ -2,6 +2,7 @@ from typing import TypeVar, Mapping, Any

from ming.base import Object
from ming.metadata import Manager
from ming.encryption import EncryptedMixin

M = TypeVar('M')

@@ -10,12 +11,17 @@ M = TypeVar('M')
# from ming.metadata import _Document
# class _Document(Object): ...methods...
# class Document(_Document):
class Document(Object):
class Document(Object, EncryptedMixin):
def __init__(self, data:Mapping=None, skip_from_bson=False) -> None: ...

@classmethod
def make(cls, data, allow_extra=False, strip_extra=True) -> Document: ...

# Encryption-Related fields:

@classmethod
def make_encr(cls, data: dict) -> Document: ...

# ...
# class __mongometa__:
# name: Any = ...