Modified notebook with feedback
chetan thote authored and chetan thote committed Nov 21, 2024
1 parent b011d93 commit d42a383
Showing 1 changed file with 68 additions and 57 deletions.
125 changes: 68 additions & 57 deletions notebooks/load-csv-data-s3-placeholder/notebook.ipynb
@@ -1,7 +1,7 @@
{
"cells": [
{
"id": "e82b93c8",
"id": "c55dedcc",
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -17,7 +17,7 @@
]
},
{
"id": "d657c3f4",
"id": "e8bf6bf1",
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -35,18 +35,45 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrates how to create a sample table in SingleStore, set up a pipeline to import data from an Amazon S3 bucket, and run queries on the imported data. It is designed for users who want to integrate S3 data with SingleStore and explore the capabilities of pipelines for efficient data ingestion."
"<div class=\"alert alert-block alert-warning\">\n",
" <b class=\"fa fa-solid fa-exclamation-circle\"></b>\n",
" <div>\n",
" <p><b>Input Credentials</b></p>\n",
" <p>Define the <b>URL</b>, <b>REGION</b>, <b>ACCESS_KEY</b>, and <b>SECRET_ACCESS_KEY</b> variables below for integration, replacing the placeholder values with your own.</p>\n",
" </div>\n",
"</div>"
],
"id": "4b3c156d"
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"URL = 's3://your-bucket-name/your-data-file.csv'\n",
"REGION = 'your-region'\n",
"ACCESS_KEY = 'access_key_id'\n",
"SECRET_ACCESS_KEY = 'access_secret_key'"
],
"id": "e9e6acea"
"id": "80939f42"
},
{
"cell_type": "markdown",
"id": "64fdd646",
"metadata": {},
"source": [
"This notebook demonstrates how to create a sample table in SingleStore, set up a pipeline to import data from an Amazon S3 bucket, and run queries on the imported data. It is designed for users who want to integrate S3 data with SingleStore and explore the capabilities of pipelines for efficient data ingestion."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"<h3>Demo Flow</h3>"
"<h3>Pipeline Flow Illustration</h3>"
],
"id": "4933b61c"
"id": "85c97bbb"
},
{
"attachments": {},
@@ -55,35 +82,38 @@
"source": [
"<img src=https://singlestoreloaddata.s3.ap-south-1.amazonaws.com/images/LoadDataCSV.png width=\"100%\" hight=\"50%\"/>"
],
"id": "ef90c6c9"
"id": "7b97f983"
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sample Table in SingleStore\n",
"## Creating Table in SingleStore\n",
"\n",
"Start by creating a table that will hold the data imported from S3."
],
"id": "8a85c544"
"id": "d1ea9458"
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"CREATE TABLE IF NOT EXISTS sample_table (\n",
"%%sql\n",
"/* Feel free to change table name and schema */\n",
"\n",
"CREATE TABLE IF NOT EXISTS my_table (\n",
" id INT,\n",
" name VARCHAR(255),\n",
" age INT,\n",
" address TEXT,\n",
" created_at TIMESTAMP\n",
");"
],
"id": "2ca9281c"
"id": "66eb1b49"
},
{
"attachments": {},
@@ -99,31 +129,7 @@
"Proper IAM roles or access keys are configured in SingleStore.\n",
"The CSV file has a structure that matches the table schema.</i>"
],
"id": "a88192c9"
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set Up Variables\n",
"\n",
"Define the <b>URL</b>, <b>REGION</b>, <b>ACCESS_KEY</b>, and <b>SECRET_ACCESS_KEY</b> variables for integration, replacing the placeholder values with your own."
],
"id": "87d2c776"
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"URL = 's3://your-bucket-name/your-data-file.csv'\n",
"REGION = 'your-region'\n",
"ACCESS_KEY = 'access_key_id'\n",
"SECRET_ACCESS_KEY = 'access_secret_key'"
],
"id": "78c44e19"
"id": "3704a19a"
},
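
For reference, the pipeline below assumes the CSV columns line up one-for-one with the table schema. A hypothetical `your-data-file.csv` (header row plus invented sample values) might look like this:

```csv
id,name,age,address,created_at
1,Alice,30,"123 Main St, Springfield",2024-01-15 10:30:00
2,Bob,27,"456 Oak Ave, Portland",2024-01-16 11:45:00
```

The `IGNORE 1 LINES` clause in the pipeline definition skips that header row.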
{
"attachments": {},
Expand All @@ -132,7 +138,7 @@
"source": [
"Using these identifiers and keys, execute the following statement."
],
"id": "eb0c3643"
"id": "83a75641"
},
{
"cell_type": "code",
@@ -141,18 +147,18 @@
"outputs": [],
"source": [
"%%sql\n",
"\n",
"%%sql\n",
"CREATE PIPELINE s3_import_pipeline\n",
"AS LOAD DATA S3 '{{URL}}'\n",
"CONFIG '{\\\"REGION\\\":\\\"{{REGION}}\\\"}'\n",
"CREDENTIALS '{\\\"AWS_ACCESS_KEY_ID\\\": \\\"{{ACCESS_KEY}}\\\",\n",
" \\\"AWS_SECRET_ACCESS_KEY\\\": \\\"{{SECRET_ACCESS_KEY}}\\\"}'\n",
"INTO TABLE sample_table\n",
"INTO TABLE my_table\n",
"FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\\\"'\n",
"LINES TERMINATED BY '\\n'\n",
"IGNORE 1 lines;"
],
"id": "6efe3112"
"id": "e88495e2"
},
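
Before starting the pipeline for good, it can be worth a dry run. The statements below are a sketch using standard SingleStoreDB pipeline commands (`TEST PIPELINE` and `START ... FOREGROUND`); exact syntax support may vary by version:

```sql
-- Preview rows the pipeline would extract, without writing them to the table.
TEST PIPELINE s3_import_pipeline LIMIT 5;

-- Alternatively, run a single batch synchronously so any errors
-- surface immediately in the client instead of in the background.
START PIPELINE s3_import_pipeline FOREGROUND LIMIT 1 BATCHES;
```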
{
"attachments": {},
@@ -163,18 +169,19 @@
"\n",
"To start the pipeline and begin importing the data from the S3 bucket:"
],
"id": "5902a86a"
"id": "08781117"
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"%%sql\n",
"START PIPELINE s3_import_pipeline;"
],
"id": "bb436fc2"
"id": "adb89dc1"
},
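
A started pipeline runs in the background, so load failures are not reported back to the client. Assuming the standard SingleStoreDB monitoring views, its state and any ingest errors can be checked like this:

```sql
-- List pipelines in the current database with their state
-- (e.g. Running, Stopped, Error).
SHOW PIPELINES;

-- Inspect any errors the pipeline has recorded while loading.
SELECT ERROR_TYPE, ERROR_MESSAGE
FROM information_schema.PIPELINES_ERRORS
WHERE PIPELINE_NAME = 's3_import_pipeline';
```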
{
"attachments": {},
@@ -185,7 +192,7 @@
"\n",
"Once the data has been imported, you can run a query to select it:"
],
"id": "c82c8439"
"id": "15275805"
},
{
"cell_type": "code",
@@ -194,9 +201,10 @@
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM sample_table LIMIT 10;"
"%%sql\n",
"SELECT * FROM my_table LIMIT 10;"
],
"id": "bb740975"
"id": "f2855248"
},
{
"attachments": {},
Expand All @@ -205,7 +213,7 @@
"source": [
"### Check if all data of the data is loaded"
],
"id": "df1cdb14"
"id": "ea34b089"
},
{
"cell_type": "code",
@@ -214,9 +222,10 @@
"outputs": [],
"source": [
"%%sql\n",
"SELECT count(*) FROM sample_table"
"%%sql\n",
"SELECT count(*) FROM my_table"
],
"id": "dd98c9a3"
"id": "4e9023e8"
},
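
Beyond a raw row count, the pipeline's own bookkeeping records which source files have been processed. This sketch assumes the `information_schema.PIPELINES_FILES` view shipped with SingleStoreDB:

```sql
-- One row per source file known to the pipeline; FILE_STATE shows
-- whether each file has been loaded, is pending, or was skipped.
SELECT FILE_NAME, FILE_SIZE, FILE_STATE
FROM information_schema.PIPELINES_FILES
WHERE PIPELINE_NAME = 's3_import_pipeline';
```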
{
"attachments": {},
@@ -228,7 +237,7 @@
"We have shown how to insert data from a Amazon S3 using `Pipelines` to SingleStoreDB. These techniques should enable you to\n",
"integrate your Amazon S3 with SingleStoreDB."
],
"id": "892e7f8d"
"id": "236cb111"
},
{
"attachments": {},
@@ -239,7 +248,7 @@
"\n",
"Remove the '#' to uncomment and execute the queries below to clean up the pipeline and table created."
],
"id": "3c053a57"
"id": "6d195418"
},
{
"attachments": {},
@@ -248,20 +257,21 @@
"source": [
"#### Drop Pipeline"
],
"id": "8874a110"
"id": "8eafa43e"
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"%%sql\n",
"#STOP PIPELINE s3_import_pipeline;\n",
"\n",
"#DROP PIPELINE s3_import_pipeline;"
],
"id": "043861f7"
"id": "451e80c9"
},
{
"attachments": {},
@@ -270,7 +280,7 @@
"source": [
"#### Drop Data"
],
"id": "445c6369"
"id": "ae825668"
},
{
"cell_type": "code",
@@ -279,12 +289,13 @@
"outputs": [],
"source": [
"%%sql\n",
"#DROP TABLE sample_table;"
"%%sql\n",
"#DROP TABLE my_table;"
],
"id": "f8b697e5"
"id": "3c4b631d"
},
{
"id": "39231766",
"id": "89365517",
"cell_type": "markdown",
"metadata": {},
"source": [
