commit 675ab2e
Showing 11 changed files with 508 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
.idea/ | ||
dist/ | ||
storagebox.egg-info/ | ||
__pycache__/ | ||
*.pyc |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
# StorageBox | ||
|
||
StorageBox is a Python module that you can use to de-duplicate data | ||
among distributed components. | ||
|
||
For example, let's assume you run a movie store. You have | ||
voucher codes you'd like to distribute to the first 30 users who press | ||
a button. You are concerned that some users might try to get more | ||
than 1 voucher code by exploiting race conditions (maybe clicking the | ||
button from multiple machines at the same time). | ||
|
||
|
||
|
||
Here is what StorageBox allows you to do: | ||
``` | ||
# Setup Code | ||
import storagebox | ||
item_repo = storagebox.ItemBankDynamoDbRepository(table_name="voucher_codes") | ||
deduplication_repo = storagebox.DeduplicationDynamoDbRepository(table_name="storage_box_deduplication_table") | ||
# You can add items to the item repo (for example add list of voucher codes) | ||
item_repo.batch_add_items(voucher_codes) | ||
# You can then assign voucher codes to User IDs | ||
deduplicator = storagebox.Deduplicator(item_repo=item_repo, deduplication_repo=deduplication_repo) | ||
voucher_code = deduplicator.fetch_item_for_deduplication_id( | ||
    deduplication_id=user_id | ||
) | ||
``` | ||
And that's it! | ||
|
||
As long as you use a suitable `deduplication_id`, all race conditions | ||
and data hazards will be taken care of for you. Examples of suitable | ||
candidates for `deduplication_id` can be User ID, IP Address, | ||
Email Address or anything that works best with your application. | ||
|
||
|
||
## Prerequisites | ||
To use StorageBox, you need the following already set up. | ||
|
||
- An ItemBank DynamoDB table. The current implementation requires the table to have 1 column | ||
called `item`. This is where you will store items (in the case of the example: | ||
voucher codes). | ||
- A Deduplication DynamoDB table. `StorageBox` uses it to achieve idempotency, | ||
that is, to make sure that if you call `fetch_item_for_deduplication_id` multiple times with | ||
the same `deduplication_id`, you will always get the same result. | ||
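
A minimal sketch of how the two tables could be created with boto3. It assumes that `item` is the string partition key of the ItemBank table (the README only says the table has a column with that name) and that `deduplication_id` is the string partition key of the Deduplication table, which matches the `Key` used by the repository code in this commit. The helper names `table_definition` and `create_tables` are hypothetical and not part of StorageBox.

```python
def table_definition(table_name: str, key_name: str) -> dict:
    """Build create_table arguments for a table with one string partition key."""
    return {
        "TableName": table_name,
        "KeySchema": [{"AttributeName": key_name, "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": key_name, "AttributeType": "S"}],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity, an assumption
    }


def create_tables(item_table: str, deduplication_table: str) -> None:
    """Create both tables. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so table_definition works without boto3

    client = boto3.client("dynamodb")
    # Assumption: `item` is the partition key of the ItemBank table.
    client.create_table(**table_definition(item_table, "item"))
    client.create_table(**table_definition(deduplication_table, "deduplication_id"))
```

Calling `create_tables("voucher_codes", "storage_box_deduplication_table")` would prepare the tables used in the first example. Table creation is asynchronous, so the tables may not be usable immediately after the call returns.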
|
||
If you prefer a backend other than DynamoDB, you can implement your own `ItemBankRepository` | ||
and/or `DeduplicationRepository`. Your implementation has to subclass the corresponding | ||
abstract base class. If you do that, contributions are welcome! | ||
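
As an illustration, here is a sketch of what a deduplication backend on SQLite might look like. The class name `SqliteDeduplicationRepository` and the table layout are hypothetical. The sketch is standalone: it mirrors the two method names of `DeduplicationRepository` but does not subclass it. Note that the `Deduplicator` in this commit catches botocore's `ClientError`, so a backend that signals a duplicate with `sqlite3.IntegrityError`, as this one does, would not work with it unchanged.

```python
import sqlite3
from typing import Optional


class SqliteDeduplicationRepository:
    """Hypothetical backend: one row per deduplication_id, enforced by a primary key."""

    def __init__(self, path: str = ":memory:"):
        self.connection = sqlite3.connect(path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS deduplication ("
            "deduplication_id TEXT PRIMARY KEY, item_string TEXT NOT NULL)"
        )

    def get_value_for_deduplication_id(self, deduplication_id: str) -> Optional[str]:
        row = self.connection.execute(
            "SELECT item_string FROM deduplication WHERE deduplication_id = ?",
            (str(deduplication_id),),
        ).fetchone()
        return row[0] if row else None  # None if the ID has no item yet

    def put_deduplication_id(self, deduplication_id: str, item_string: str) -> None:
        # The primary key makes this a conditional write:
        # sqlite3.IntegrityError is raised if the ID already has an item.
        with self.connection:
            self.connection.execute(
                "INSERT INTO deduplication (deduplication_id, item_string) VALUES (?, ?)",
                (str(deduplication_id), item_string),
            )
```

The primary-key constraint plays the role that the conditional `put_item` plays in the DynamoDB repository: the first writer for an ID wins and every later writer gets an error.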
|
||
|
||
## Installation | ||
``` | ||
pip install storagebox | ||
``` | ||
|
||
|
||
## Other Example Use Cases | ||
Hosting a big event with only 10,300 seats that will be booked within the first few minutes? | ||
``` | ||
# Before the event, add 10,300 numbers to the bank | ||
item_repo.batch_add_items([str(i) for i in range(10300)]) | ||
# From your webserver | ||
assignment_number = deduplicator.fetch_item_for_deduplication_id( | ||
    deduplication_id=email | ||
) | ||
``` | ||
|
||
Are you an influencer and only have 5000 people to give special referral links to? (First 5000 | ||
people who click the link in the description get a free something!) | ||
``` | ||
# Before you post your content | ||
item_repo.batch_add_items(referral_links_list) | ||
# From your webserver | ||
referral_link = deduplicator.fetch_item_for_deduplication_id( | ||
    deduplication_id=ip_address | ||
) | ||
``` | ||
|
||
Organizing online classes for 150 students? Say you're willing to host 3 classes of 50 students each, | ||
but you'd like to be sure that no student attends more than 1 class. | ||
``` | ||
# Before you host your classes | ||
class_1_codes = storagebox.ItemBankDynamoDbRepository(table_name="class_1_codes") | ||
class_2_codes = storagebox.ItemBankDynamoDbRepository(table_name="class_2_codes") | ||
class_3_codes = storagebox.ItemBankDynamoDbRepository(table_name="class_3_codes") | ||
deduplication_repo = storagebox.DeduplicationDynamoDbRepository(table_name="myonline_classes_deduplication_table") | ||
class_1_codes.batch_add_items([str(i) for i in range(0, 50)]) | ||
class_2_codes.batch_add_items([str(i) for i in range(50, 100)]) | ||
class_3_codes.batch_add_items([str(i) for i in range(100, 150)]) | ||
# From your webserver | ||
deduplicators = { | ||
    'class_1': storagebox.Deduplicator(item_repo=class_1_codes, deduplication_repo=deduplication_repo), | ||
    'class_2': storagebox.Deduplicator(item_repo=class_2_codes, deduplication_repo=deduplication_repo), | ||
    'class_3': storagebox.Deduplicator(item_repo=class_3_codes, deduplication_repo=deduplication_repo), | ||
} | ||
deduplicators[requested_class].fetch_item_for_deduplication_id( | ||
    deduplication_id=student_id | ||
) | ||
``` | ||
|
||
## How It Works | ||
A blog post explaining how `storagebox` works is available [here](https://blog.peteremil.com/2021/02/realtime-distributed-deduplication-how.html) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
[tool.poetry] | ||
name = "storagebox" | ||
version = "1.0.5" | ||
description = "A reusable, idempotent, and exactly once deduplication API" | ||
authors = ["Peter Emil Halim <peter@peteremil.com>"] | ||
readme = "README.md" | ||
|
||
[tool.poetry.dependencies] | ||
python = "^3.8" | ||
boto3 = "^1.16.63" | ||
|
||
[tool.poetry.dev-dependencies] | ||
|
||
[build-system] | ||
requires = ["poetry>=0.12"] | ||
build-backend = "poetry.masonry.api" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
from storagebox.repository.deduplication import DeduplicationDynamoDbRepository | ||
from storagebox.repository.item_bank import ItemBankDynamoDbRepository | ||
from storagebox.api import Deduplicator |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
import logging | ||
import typing | ||
from botocore.exceptions import ClientError | ||
from storagebox import settings | ||
from storagebox import repository | ||
|
||
|
||
log = logging.getLogger('storageBox') | ||
log.setLevel(settings.DEFAULT_LOGGING_LEVEL) | ||
|
||
|
||
class Deduplicator: | ||
    def __init__(self, item_repo, deduplication_repo): | ||
        self.item_repo = item_repo | ||
        self.deduplication_repo = deduplication_repo | ||
|
||
    def fetch_item_for_deduplication_id(self, deduplication_id): | ||
        item_string = self.item_repo.get_item_from_bank() | ||
        if item_string is None: | ||
            # Bank is empty: an ID that was already served still gets its item. | ||
            return self.deduplication_repo.get_value_for_deduplication_id( | ||
                deduplication_id=deduplication_id | ||
            ) | ||
        try: | ||
            self.deduplication_repo.put_deduplication_id( | ||
                deduplication_id=deduplication_id, | ||
                item_string=item_string | ||
            ) | ||
            return item_string | ||
        except ClientError as error: | ||
            if error.response['Error']['Code'] != 'ConditionalCheckFailedException': | ||
                # Unrelated failure (throttling, permissions, ...): hand the | ||
                # item back to the bank and surface the error. | ||
                self.item_repo.add_item_to_bank(item_string=item_string) | ||
                raise | ||
            log.debug("deduplication_id is already assigned, will check if I" | ||
                      " should return item_string %s to the bank", item_string) | ||
            existing_item_string = self.deduplication_repo.get_value_for_deduplication_id( | ||
                deduplication_id=deduplication_id | ||
            ) | ||
            if existing_item_string != item_string: | ||
                self.item_repo.add_item_to_bank( | ||
                    item_string=item_string | ||
                ) | ||
                log.debug("Item %s was returned", item_string) | ||
                return existing_item_string | ||
            return item_string | ||
|
||
    def add_items_to_bank(self, items: typing.List[str]): | ||
        self.item_repo.batch_add_items( | ||
            items=items | ||
        ) |
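
The take, claim, and hand-back flow of `fetch_item_for_deduplication_id` can be tried without AWS. The following is a sketch only: `InMemoryItemBank`, `InMemoryDeduplicationRepository`, `AlreadyAssigned`, and `fetch_item` are hypothetical stand-ins written for this illustration, with locks playing the role of DynamoDB's atomic operations. When the bank is empty, this sketch looks up any existing assignment for the ID.

```python
import threading
from typing import Dict, List, Optional


class AlreadyAssigned(Exception):
    """Stand-in for the failed conditional write (a ClientError in the real code)."""


class InMemoryItemBank:
    def __init__(self, items: List[str]):
        self._items = list(items)
        self._lock = threading.Lock()

    def get_item_from_bank(self) -> Optional[str]:
        with self._lock:
            return self._items.pop() if self._items else None

    def add_item_to_bank(self, item_string: str) -> None:
        with self._lock:
            self._items.append(item_string)

    def remaining(self) -> int:
        with self._lock:
            return len(self._items)


class InMemoryDeduplicationRepository:
    def __init__(self) -> None:
        self._assigned: Dict[str, str] = {}
        self._lock = threading.Lock()

    def get_value_for_deduplication_id(self, deduplication_id: str) -> Optional[str]:
        with self._lock:
            return self._assigned.get(deduplication_id)

    def put_deduplication_id(self, deduplication_id: str, item_string: str) -> None:
        with self._lock:
            if deduplication_id in self._assigned:
                raise AlreadyAssigned(deduplication_id)
            self._assigned[deduplication_id] = item_string


def fetch_item(bank, dedup, deduplication_id: str) -> Optional[str]:
    """Take an item, try to claim it for the ID, hand it back if the ID is taken."""
    item = bank.get_item_from_bank()
    if item is None:
        # Empty bank: fall back to whatever the ID was assigned earlier, if anything.
        return dedup.get_value_for_deduplication_id(deduplication_id)
    try:
        dedup.put_deduplication_id(deduplication_id, item)
        return item
    except AlreadyAssigned:
        existing = dedup.get_value_for_deduplication_id(deduplication_id)
        if existing != item:
            bank.add_item_to_bank(item)  # hand the spare item back
        return existing


bank = InMemoryItemBank([f"CODE-{i}" for i in range(30)])
dedup = InMemoryDeduplicationRepository()
results: List[Optional[str]] = []
results_lock = threading.Lock()


def click() -> None:
    code = fetch_item(bank, dedup, "user-42")
    with results_lock:
        results.append(code)


# The same user "presses the button" from 20 places at once.
threads = [threading.Thread(target=click) for _ in range(20)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

After the threads finish, every entry in `results` is the same code and the bank has lost exactly one item, because each losing thread hands its spare item back.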
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
from storagebox.repository.deduplication import DeduplicationDynamoDbRepository | ||
from storagebox.repository.item_bank import ItemBankDynamoDbRepository |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
import abc | ||
import logging | ||
from storagebox.repository.dynamodb import DynamoDBBasedRepository | ||
|
||
|
||
log = logging.getLogger('storageBox') | ||
|
||
|
||
class DeduplicationRepository(abc.ABC): | ||
    @abc.abstractmethod | ||
    def get_value_for_deduplication_id(self, deduplication_id: str): | ||
        raise NotImplementedError | ||
|
||
    @abc.abstractmethod | ||
    def put_deduplication_id(self, deduplication_id: str, item_string: str): | ||
        raise NotImplementedError | ||
|
||
|
||
class DeduplicationDynamoDbRepository(DeduplicationRepository, DynamoDBBasedRepository): | ||
    def get_value_for_deduplication_id(self, deduplication_id: str): | ||
        response = self.table.get_item( | ||
            Key={ | ||
                'deduplication_id': str(deduplication_id) | ||
            } | ||
        ) | ||
        return response.get('Item', {}).get('item_string')  # Returns None if not found | ||
|
||
    def put_deduplication_id(self, deduplication_id: str, item_string: str): | ||
        obj = { | ||
            'deduplication_id': str(deduplication_id),  # same key type as in get_item | ||
            'item_string': item_string | ||
        } | ||
        self.table.put_item(  # should only be put if there is no existing entry | ||
            Item=obj, | ||
            Expected={ | ||
                'deduplication_id': { | ||
                    'Exists': False | ||
                } | ||
            } | ||
        ) |
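
The AWS documentation describes `Expected` as a legacy parameter and recommends `ConditionExpression` instead. A sketch of the equivalent conditional write follows; the helper name `conditional_put_arguments` is hypothetical and only builds the keyword arguments, so it can be inspected without a DynamoDB table.

```python
def conditional_put_arguments(deduplication_id: str, item_string: str) -> dict:
    """Arguments for Table.put_item that only succeed for a new deduplication_id."""
    return {
        "Item": {
            "deduplication_id": str(deduplication_id),
            "item_string": item_string,
        },
        # The write fails with ConditionalCheckFailedException if the key exists.
        "ConditionExpression": "attribute_not_exists(deduplication_id)",
    }
```

Inside the repository this would be used as `self.table.put_item(**conditional_put_arguments(deduplication_id, item_string))`. A rejected write surfaces as a botocore `ClientError` whose error code is `ConditionalCheckFailedException`.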
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
import boto3 | ||
|
||
|
||
class DynamoDBBasedRepository: | ||
    def __init__(self, table_name): | ||
        self.table_name = table_name | ||
        self.client = boto3.client('dynamodb') | ||
        if not self.table_already_exists(table_name=self.table_name): | ||
            raise RuntimeError(f"DynamoDB table {self.table_name} does not exist") | ||
        dynamodb = boto3.resource('dynamodb') | ||
        self.table = dynamodb.Table(self.table_name) | ||
|
||
    def table_already_exists(self, table_name) -> bool: | ||
        try: | ||
            self.client.describe_table(TableName=table_name) | ||
            return True | ||
        except self.client.exceptions.ResourceNotFoundException: | ||
            return False |