
# Chat with Your Data using Spark and SlashML Text-to-SQL

This project demonstrates how to integrate Apache Spark with SlashML's Text-to-SQL model to enable natural language queries on your data. The demo includes a sample retail database with sales, products, and customers data that you can query using plain English.

```
Enter your question > What are the total sales by category?

Generated SQL Query:
SELECT
    p.category,
    SUM(s.total_amount) AS total_sales
FROM sales s
JOIN products p ON s.product_id = p.product_id
GROUP BY p.category
ORDER BY total_sales DESC;

Query Result:
+-----------+-----------+
|   category|total_sales|
+-----------+-----------+
|Electronics|  567890.45|
|  Furniture|  234567.89|
| Appliances|  123456.78|
+-----------+-----------+
```

## 🌟 Features

- Natural language to SQL query conversion using SlashML's Text-to-SQL model
- Interactive command-line interface for querying data
- Built-in sample retail database with:
  - Sales transactions
  - Product catalog
  - Customer information
- Real-time query generation and execution with Spark
- Data preview capabilities
- Schema visualization
- Comprehensive error handling
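At a high level, each question makes one round trip: the question and the schema are sent to the deployed Text-to-SQL endpoint, the returned SQL is run with `spark.sql`. A minimal sketch of the request/response plumbing is below; the payload shape and the `"sql"` response field are assumptions about the API, not its documented contract:

```python
import json


def build_payload(question: str, schema_str: str) -> dict:
    """Bundle the natural-language question with the schema so the
    Text-to-SQL model knows which tables and columns exist."""
    return {"question": question, "schema": schema_str}


def extract_sql(response_body: str) -> str:
    """Pull the generated SQL out of a JSON response body.
    The "sql" field name is an assumption about the response format."""
    return json.loads(response_body)["sql"].strip()
```

With `requests`, the round trip would then look roughly like `resp = requests.post(endpoint, json=build_payload(question, schema_str))` followed by `spark.sql(extract_sql(resp.text)).show()`.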

## 🚀 Getting Started

### Setting Up SlashML Text-to-SQL Model

1. Visit dashboard.slashml.com
2. Create an account or log in
3. Navigate to the Models section
4. Deploy a Text-to-SQL model
5. Copy the API endpoint URL

### Prerequisites

- Python 3.7+
- Apache Spark

Install the Python dependencies:

```shell
pip install pyspark requests
```

### Dataset Structure

The demo includes a sample retail database with three tables:

1. Sales Table

   ```sql
   CREATE TABLE sales (
       transaction_id INTEGER,
       product_id INTEGER,
       customer_id INTEGER,
       sale_date DATE,
       quantity INTEGER,
       unit_price DOUBLE,
       total_amount DOUBLE
   );
   ```

2. Products Table

   ```sql
   CREATE TABLE products (
       product_id INTEGER,
       product_name VARCHAR,
       category VARCHAR,
       supplier_id INTEGER
   );
   ```

3. Customers Table

   ```sql
   CREATE TABLE customers (
       customer_id INTEGER,
       customer_name VARCHAR,
       country VARCHAR,
       join_date DATE
   );
   ```

### Running the Demo

1. Clone this repository:

   ```shell
   git clone https://github.com/yourusername/spark-text-to-sql-demo.git
   cd spark-text-to-sql-demo
   ```

2. Run the demo with your SlashML API endpoint:

   ```shell
   python spark_sql_demo.py --api-endpoint "YOUR_SLASHML_API_ENDPOINT"
   ```

## 💡 Usage

### Available Commands

```
help    : Display help information and example questions
schema  : Show the database schema
preview : Show a sample data preview
exit    : Exit the program
```
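The command loop presumably treats anything that is not one of these keywords as a question for the model. A hypothetical sketch of that dispatch logic (the action names are illustrative; the real `spark_sql_demo.py` may differ):

```python
def dispatch(line: str) -> str:
    """Map a line of user input to an action name.
    Built-in commands are matched case-insensitively; anything else
    is treated as a natural-language question."""
    cmd = line.strip().lower()
    if cmd == "help":
        return "show_help"
    if cmd == "schema":
        return "show_schema"
    if cmd == "preview":
        return "show_preview"
    if cmd == "exit":
        return "quit"
    return "run_query"
```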

### Example Questions

The demo understands complex analytical questions such as:

- What are the total sales for each product category?
- Who are the top 5 customers by total purchase amount?
- What is the average order value by country?
- How many products were sold in each month of 2023?
- Which products have never been sold?

### Interactive Session Example

```
$ python spark_sql_demo.py
Initializing Spark session...
Creating sample data...

Available tables: sales, products, customers

Sample data preview:
Products:
+----------+------------+-----------+-----------+
|product_id|product_name|   category|supplier_id|
+----------+------------+-----------+-----------+
|         1|      Laptop|Electronics|        101|
|         2|  Smartphone|Electronics|        102|
|         3|  Desk Chair|  Furniture|        103|
+----------+------------+-----------+-----------+

Enter your question > What are the total sales by category?

Generated SQL Query:
SELECT
    p.category,
    SUM(s.total_amount) AS total_sales
FROM sales s
JOIN products p ON s.product_id = p.product_id
GROUP BY p.category
ORDER BY total_sales DESC;

Query Result:
+-----------+-----------+
|   category|total_sales|
+-----------+-----------+
|Electronics|  567890.45|
|  Furniture|  234567.89|
| Appliances|  123456.78|
+-----------+-----------+
```

## 🔧 Customizing the Demo

### Using Your Own Data

To use your own data, modify the `create_sample_data` method in the `SparkSQLDemo` class:

1. Define your schemas:

   ```python
   your_schema = StructType([
       StructField("column_name", DataType(), nullable=False),
       # Add more fields...
   ])
   ```

2. Create your data:

   ```python
   your_data = [
       (value1, value2, ...),
       # Add more records...
   ]
   ```

3. Create the DataFrame and register it as a temporary view:

   ```python
   your_df = self.spark.createDataFrame(your_data, your_schema)
   your_df.createOrReplaceTempView("your_table")
   ```

4. Update the schema string in `__init__` so the Text-to-SQL model sees your table:

   ```python
   self.schema_str = """
   CREATE TABLE your_table (
       column_name DATA_TYPE  -- add more columns...
   );
   """
   ```

## 🔍 Troubleshooting

1. **API Connection Issues**
   - Error: Cannot connect to the SQL generation API
   - Solution: Verify your API endpoint URL and internet connection

2. **Spark Installation**
   - Error: Spark not found
   - Solution: Ensure Spark is properly installed and `SPARK_HOME` is set

3. **Query Generation**
   - Error: Failed to generate SQL query
   - Solution: Make sure the question refers to tables and columns that exist in the schema
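For the API-connection case, it helps to fail fast with a clear message rather than let an exception bubble up. A hedged sketch using only the standard library (the endpoint path and payload shape are assumptions about the API):

```python
import json
import urllib.error
import urllib.request


def post_question(endpoint: str, question: str, timeout: float = 5.0):
    """POST the question to the Text-to-SQL endpoint. Returns the parsed
    JSON body, or None on any connection or decoding problem so the
    caller can print a friendly "cannot connect" message instead of a
    traceback."""
    body = json.dumps({"question": question}).encode("utf-8")
    req = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, json.JSONDecodeError):
        return None
```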

## 📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

## ❓ Support