- Create a sizable amount of fake data in Snowflake in the absolute cheapest way possible.
- Show the cost difference between `write_pandas` (the `COPY` command path) and Snowpipe.
- Snowpipe does not use a warehouse, so it ends up being significantly cheaper.
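For context, the warehouse-backed load path being compared looks roughly like the sketch below. This is a minimal illustration with placeholder credentials and a placeholder table name, not the code in this repo:

```python
# Hedged sketch of the write_pandas / COPY path this project compares against
# Snowpipe. Connection parameters and table name are placeholders.
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

df = pd.DataFrame({"ID": [1, 2, 3], "NAME": ["a", "b", "c"]})

conn = snowflake.connector.connect(
    user="YOUR_USER",
    password="YOUR_PASSWORD",
    account="your-account",
    warehouse="YOUR_WAREHOUSE",  # this warehouse accrues credits while the COPY runs
    database="YOUR_DATABASE",
    schema="YOUR_SCHEMA",
)

# write_pandas stages the DataFrame and loads it with COPY INTO on the warehouse.
# Snowpipe instead ingests files from a stage serverlessly, with no warehouse.
success, nchunks, nrows, _ = write_pandas(conn, df, table_name="FAKE_DATA")
```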
- Install `pdm` globally, if you don't already have it: `pip install pdm`
- Run `pdm install`, which will install all of the dependencies for this project.
- Prerequisite: the AWS CLI is configured and working.
- Add a `.env` file to the project root with these variables:

  ```
  SNOWFLAKE_USER=YOUR_USER
  SNOWFLAKE_PASSWORD=YOUR_PASSWORD
  SNOWFLAKE_ACCOUNT=your-account
  SNOWFLAKE_WAREHOUSE=YOUR_WAREHOUSE
  SNOWFLAKE_DATABASE=YOUR_DATABASE
  SNOWFLAKE_SCHEMA=YOUR_SCHEMA
  SNOWFLAKE_ROLE=YOUR_ROLE
  SECRET_NAME=your-snowflake-secret-name-in-secrets-manager
  REGION=aws-region-of-your-secret
  ```
- Review `config.py` to see how the variables are set from `.env`.
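As a rough illustration (the actual `config.py` may differ), loading `.env` typically looks like this, assuming `python-dotenv` is among the dependencies:

```python
# Illustrative sketch of how config.py might read .env -- the real module may differ.
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads .env from the project root into os.environ

SNOWFLAKE_USER = os.environ["SNOWFLAKE_USER"]
SNOWFLAKE_ACCOUNT = os.environ["SNOWFLAKE_ACCOUNT"]
SNOWFLAKE_WAREHOUSE = os.environ["SNOWFLAKE_WAREHOUSE"]
SECRET_NAME = os.environ["SECRET_NAME"]
REGION = os.environ["REGION"]
```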
- Directly run `SecretsManager.py` once. This will create a new secret in AWS Secrets Manager according to the environment variables set in `.env`.
- This assumes you have `rsa_key.p8` in the root directory of the project. This is the key file for your Snowflake user.
- This project does not cover setting up key pair authentication.
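For orientation, the secret creation likely boils down to a single `boto3` call along these lines. This is a hedged sketch, not the actual contents of `SecretsManager.py`; the secret's field names here are illustrative:

```python
# Hedged sketch of the kind of call SecretsManager.py makes -- field names
# inside the secret are illustrative, not taken from this repo.
import json
import os

import boto3


def create_snowflake_secret() -> None:
    """Store Snowflake credentials and the private key as one Secrets Manager secret."""
    client = boto3.client("secretsmanager", region_name=os.environ["REGION"])
    with open("rsa_key.p8") as f:
        private_key = f.read()
    client.create_secret(
        Name=os.environ["SECRET_NAME"],
        SecretString=json.dumps(
            {
                "user": os.environ["SNOWFLAKE_USER"],
                "account": os.environ["SNOWFLAKE_ACCOUNT"],
                "private_key": private_key,
            }
        ),
    )
```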
- You can run this locally by running `fake_data_to_snowflake.py` directly, adjusting the `number_of_rows` input.
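For a sense of what the script does, a fake-data generator might look like the sketch below. The real script's columns, table, and load mechanism will differ, and the use of `Faker` here is an assumption:

```python
# Illustrative sketch only -- the real fake_data_to_snowflake.py may generate
# different columns and load via Snowpipe rather than write_pandas.
import pandas as pd
from faker import Faker


def make_fake_rows(number_of_rows: int) -> pd.DataFrame:
    """Build a DataFrame of fake customer-like records."""
    fake = Faker()
    return pd.DataFrame(
        {
            "NAME": [fake.name() for _ in range(number_of_rows)],
            "EMAIL": [fake.email() for _ in range(number_of_rows)],
            "CREATED_AT": [fake.date_time_this_year() for _ in range(number_of_rows)],
        }
    )


if __name__ == "__main__":
    df = make_fake_rows(number_of_rows=100_000)
    print(df.head())
```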
- I'm hoping to add automated steps using the AWS CDK in the near future. For now, this is not a "one-click build" project; follow these manual steps:
  - Build the Docker container.
  - Push it to ECR.
  - Create a Lambda function from the container image, selecting the `arm64` architecture.
  - The JSON event body should look like this (see the handler sketch after these steps):

    ```json
    { "rows": 100000 }
    ```
  - Schedule it using EventBridge.
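Inside the container, the Lambda entry point presumably reads `rows` from the event. A hedged sketch follows; the real handler name, default, and internals in this project may differ:

```python
# Hypothetical Lambda handler -- names and defaults are illustrative,
# not taken from this repo.
import json


def handler(event, context):
    """Generate `rows` fake records per invocation, as scheduled by EventBridge."""
    rows = int(event.get("rows", 100_000))  # default is illustrative
    # The real work would live in fake_data_to_snowflake.py, e.g.:
    # generate_and_load(number_of_rows=rows)  # hypothetical helper
    return {"statusCode": 200, "body": json.dumps({"rows": rows})}
```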
- Enjoy your fake streaming table.