This project demonstrates how to integrate Apache Spark with SlashML's Text-to-SQL model to enable natural language queries on your data. The demo includes a sample retail database with sales, product, and customer data that you can query using plain English.
For example:

```
Enter your question > What are the total sales by category?

Generated SQL Query:

SELECT
    p.category,
    SUM(s.total_amount) AS total_sales
FROM sales s
JOIN products p ON s.product_id = p.product_id
GROUP BY p.category
ORDER BY total_sales DESC;

Query Result:

+-----------+-----------+
|   category|total_sales|
+-----------+-----------+
|Electronics|  567890.45|
|  Furniture|  234567.89|
| Appliances|  123456.78|
+-----------+-----------+
```
## Features

- Natural language to SQL query conversion using SlashML's Text-to-SQL model
- Interactive command-line interface for querying data
- Built-in sample retail database with:
  - Sales transactions
  - Product catalog
  - Customer information
- Real-time query generation and execution with Spark
- Data preview capabilities
- Schema visualization
- Comprehensive error handling
## Getting a SlashML API Endpoint

- Visit dashboard.slashml.com
- Create an account or log in
- Navigate to Models section
- Deploy a Text-to-SQL model
- Copy the API endpoint URL
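Once deployed, the endpoint can be called over HTTP. The sketch below shows one way to do that with `requests`; the payload field names (`question`, `schema`) and the response key (`sql`) are assumptions, so check your deployment's API reference for the exact contract.

```python
import requests


def build_payload(question: str, schema_str: str) -> dict:
    # Field names here are illustrative assumptions, not the
    # documented SlashML contract; adjust to your deployment.
    return {"question": question, "schema": schema_str}


def generate_sql(api_endpoint: str, question: str, schema_str: str,
                 timeout: int = 30) -> str:
    # POST the question plus schema context, then pull the generated
    # SQL out of the JSON response (the "sql" key is an assumption).
    resp = requests.post(api_endpoint,
                         json=build_payload(question, schema_str),
                         timeout=timeout)
    resp.raise_for_status()
    return resp.json()["sql"]
```

Sending the schema along with the question gives the model the table and column names it needs to produce valid SQL.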
## Prerequisites

- Python 3.7+
- Apache Spark

Install the Python dependencies:

```shell
pip install pyspark requests
```
## Sample Database

The demo includes a sample retail database with three tables:
- Sales Table

  ```sql
  CREATE TABLE sales (
      transaction_id INTEGER,
      product_id INTEGER,
      customer_id INTEGER,
      sale_date DATE,
      quantity INTEGER,
      unit_price DOUBLE,
      total_amount DOUBLE
  );
  ```

- Products Table

  ```sql
  CREATE TABLE products (
      product_id INTEGER,
      product_name VARCHAR,
      category VARCHAR,
      supplier_id INTEGER
  );
  ```

- Customers Table

  ```sql
  CREATE TABLE customers (
      customer_id INTEGER,
      customer_name VARCHAR,
      country VARCHAR,
      join_date DATE
  );
  ```
## Quick Start

- Clone this repository:

  ```shell
  git clone https://github.com/yourusername/spark-text-to-sql-demo.git
  cd spark-text-to-sql-demo
  ```

- Run the demo with your SlashML API endpoint:

  ```shell
  python spark_sql_demo.py --api-endpoint "YOUR_SLASHML_API_ENDPOINT"
  ```
## Interactive Commands

- `help` : Display help information and example questions
- `schema` : Show the database schema
- `preview` : Show sample data preview
- `exit` : Exit the program
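Internally, a loop along these lines can route the commands above; this is a hypothetical sketch, not the demo's actual implementation:

```python
def dispatch(line: str) -> str:
    # Map a REPL line to an action name; anything that is not a
    # known command is treated as a natural-language question.
    cmd = line.strip().lower()
    if cmd in {"help", "schema", "preview", "exit"}:
        return cmd
    return "question"


def repl(handle_question):
    # Prompt until the user types "exit"; handle_question is a
    # callback that sends the question to the Text-to-SQL model.
    while True:
        line = input("Enter your question > ")
        action = dispatch(line)
        if action == "exit":
            break
        elif action == "help":
            print("Commands: help, schema, preview, exit")
        elif action == "schema":
            print("(schema output here)")
        elif action == "preview":
            print("(data preview here)")
        else:
            handle_question(line)
```

Treating anything that is not a reserved command as a question keeps the prompt friendly: users never need a special prefix to ask about their data.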
## Example Questions

The demo understands complex analytical questions such as:
- What are the total sales for each product category?
- Who are the top 5 customers by total purchase amount?
- What is the average order value by country?
- How many products were sold in each month of 2023?
- Which products have never been sold?
## Example Session

```
$ python spark_sql_demo.py --api-endpoint "YOUR_SLASHML_API_ENDPOINT"

Initializing Spark session...
Creating sample data...

Available tables: sales, products, customers

Sample data preview:

Products:
+----------+------------+-----------+-----------+
|product_id|product_name|   category|supplier_id|
+----------+------------+-----------+-----------+
|         1|      Laptop|Electronics|        101|
|         2|  Smartphone|Electronics|        102|
|         3|  Desk Chair|  Furniture|        103|
+----------+------------+-----------+-----------+

Enter your question > What are the total sales by category?

Generated SQL Query:

SELECT
    p.category,
    SUM(s.total_amount) AS total_sales
FROM sales s
JOIN products p ON s.product_id = p.product_id
GROUP BY p.category
ORDER BY total_sales DESC;

Query Result:

+-----------+-----------+
|   category|total_sales|
+-----------+-----------+
|Electronics|  567890.45|
|  Furniture|  234567.89|
| Appliances|  123456.78|
+-----------+-----------+
```
## Using Your Own Data

To use your own data, modify the `create_sample_data` method in the `SparkSQLDemo` class:

- Define your schemas:

  ```python
  your_schema = StructType([
      StructField("column_name", DataType(), nullable=False),
      # Add more fields...
  ])
  ```

- Create your data:

  ```python
  your_data = [
      (value1, value2, ...),
      # Add more records...
  ]
  ```

- Create a DataFrame and register a temp view:

  ```python
  your_df = self.spark.createDataFrame(your_data, your_schema)
  your_df.createOrReplaceTempView("your_table")
  ```

- Update the schema string in `__init__` so the model sees your new table:

  ```python
  self.schema_str = """
  CREATE TABLE your_table (
      column_name DATA_TYPE
      -- Add more columns...
  );
  """
  ```
## Troubleshooting

- API Connection Issues
  - Error: Cannot connect to the SQL generation API
  - Solution: Verify your API endpoint and internet connection
- Spark Installation
  - Error: Spark not found
  - Solution: Ensure Spark is properly installed and `SPARK_HOME` is set
- Query Generation
  - Error: Failed to generate SQL query
  - Solution: Verify that the question is related to the available schema and tables
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- SlashML for the Text-to-SQL model
- Apache Spark for the data processing framework
## Support

- For code issues: Open an issue in this repository
- For SlashML questions: Visit dashboard.slashml.com
- For Spark questions: Visit spark.apache.org/docs