The Snowpark library provides intuitive APIs for querying and processing data in a data pipeline. Using this library, you can build applications that process data in Snowflake without having to move data to the system where your application code runs.
Source code | Snowpark Python developer guide | Snowpark Python API reference | Snowpark pandas developer guide | Snowpark pandas API reference | Product documentation | Samples
If you don't have a Snowflake account yet, you can sign up for a 30-day free trial account.
You can use miniconda, anaconda, or virtualenv to create a Python 3.8, 3.9, 3.10 or 3.11 virtual environment.
For Snowpark pandas, only Python 3.9, 3.10, or 3.11 is supported.
To have the best experience when using it with UDFs, creating a local conda environment with the Snowflake channel is recommended.
pip install snowflake-snowpark-python
To use the Snowpark pandas API, you can optionally install the following, which installs modin in the same environment. The Snowpark pandas API provides a familiar interface for pandas users to query and process data directly in Snowflake.
pip install "snowflake-snowpark-python[modin]"
from snowflake.snowpark import Session
connection_parameters = {
"account": "<your snowflake account>",
"user": "<your snowflake user>",
"password": "<your snowflake password>",
"role": "<snowflake user role>",
"warehouse": "<snowflake warehouse>",
"database": "<snowflake database>",
"schema": "<snowflake schema>"
}
session = Session.builder.configs(connection_parameters).create()
# Create a Snowpark dataframe from input data
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
df = df.filter(df.a > 1)
result = df.collect()
df.show()
# -------------
# |"A" |"B" |
# -------------
# |3 |4 |
# -------------
import modin.pandas as pd
import snowflake.snowpark.modin.plugin
from snowflake.snowpark import Session
CONNECTION_PARAMETERS = {
'account': '<myaccount>',
'user': '<myuser>',
'password': '<mypassword>',
'role': '<myrole>',
'database': '<mydatabase>',
'schema': '<myschema>',
'warehouse': '<mywarehouse>',
}
session = Session.builder.configs(CONNECTION_PARAMETERS).create()
# Create a Snowpark pandas dataframe from input data
df = pd.DataFrame([['a', 2.0, 1],['b', 4.0, 2],['c', 6.0, None]], columns=["COL_STR", "COL_FLOAT", "COL_INT"])
df
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1.0
# 1 b 4.0 2.0
# 2 c 6.0 NaN
df.shape
# (3, 3)
df.head(2)
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1
# 1 b 4.0 2
df.dropna(subset=["COL_INT"], inplace=True)
df
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1
# 1 b 4.0 2
df.shape
# (2, 3)
df.head(2)
# COL_STR COL_FLOAT COL_INT
# 0 a 2.0 1
# 1 b 4.0 2
# Save the result back to Snowflake with a row_pos column.
df.reset_index(drop=True).to_snowflake('pandas_test2', index=True, index_label=['row_pos'])
The Snowpark Python developer guide, Snowpark Python API references, Snowpark pandas developer guide, and Snowpark pandas api references have basic sample code. Snowflake-Labs has more curated demos.
Configure logging level for snowflake.snowpark
for Snowpark Python API logs.
Snowpark uses the Snowflake Python Connector.
So you may also want to configure the logging level for snowflake.connector
when the error is in the Python Connector.
For instance,
import logging
for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
logger = logging.getLogger(logger_name)
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
logger.addHandler(ch)
Snowpark Python API supports reading from and writing to a pandas DataFrame via the to_pandas and write_pandas commands.
To use these operations, ensure that pandas is installed in the same environment. You can install pandas alongside Snowpark Python by executing the following command:
pip install "snowflake-snowpark-python[pandas]"
Once pandas is installed, you can convert between a Snowpark DataFrame and pandas DataFrame as follows:
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
# Convert Snowpark DataFrame to pandas DataFrame
pandas_df = df.to_pandas()
# Write pandas DataFrame to a Snowflake table and return Snowpark DataFrame
snowpark_df = session.write_pandas(pandas_df, "new_table", auto_create_table=True)
Snowpark pandas API also supports writing to pandas:
import modin.pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
# Convert Snowpark pandas DataFrame to pandas DataFrame
pandas_df = df.to_pandas()
Note that the above Snowpark pandas commands will work if Snowpark is installed with the [modin]
option, the additional [pandas]
installation is not required.
Please refer to CONTRIBUTING.md.