jorgeviz/flaspark

Asynchronous Pyspark Framework in Flask

Web application framework with asynchronous PySpark job execution.
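
The core pattern here is: submit a long-running PySpark job over HTTP, return immediately with a job id, and poll for the result later. Below is a minimal stdlib-only sketch of that submit/poll shape; in flaspark the worker role is played by a Celery process and the job store by Redis, and the names below are illustrative, not flaspark's actual API:

```python
# Stdlib sketch of the async-job pattern (illustrative names, not flaspark's API).
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)  # stands in for the Celery worker
jobs = {}                                     # stands in for the Redis backend

def submit(fn, *args):
    """Queue a job and return its id immediately; the caller is not blocked."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(fn, *args)
    return job_id

def status(job_id):
    """Poll a job: PENDING while running, SUCCESS with the result when done."""
    fut = jobs[job_id]
    if fut.done():
        return {"state": "SUCCESS", "result": fut.result()}
    return {"state": "PENDING"}

# Example: a stand-in "Spark job" that just sums numbers
job = submit(sum, range(10))
```

Swapping the thread pool for Celery and the dict for Redis (as flaspark does) lets jobs outlive a single web-server process and be shared across workers.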

Prerequisites

Flaspark v0.0.1 has been tested with:

  • Ubuntu 16.04 LTS
  • Python 3.4
  • Spark 2.0.0
  • Redis 2.8.4

(Compatibility with other versions is not guaranteed.)

Create a virtualenv and install the Python requirements.

    #!/usr/bin/env bash
    sudo apt-get -y update

    # Virtualenv installation
    sudo apt-get -y install python3-pip
    sudo pip3 install virtualenv

    # Create the virtualenv with Python 3 and install dependencies
    virtualenv -p python3 env
    . env/bin/activate
    pip install -r requirements.txt
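
The repository's `requirements.txt` pins the exact dependencies. As a rough guide only, the stack this README describes (a Flask app, Celery workers, a Redis broker) implies packages along these lines; this is an assumption, not the repo's actual pin list:

```text
flask
celery
redis
```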

Install Redis

    sudo apt-get install redis-server

Install Spark

    #!/usr/bin/env bash

    sudo apt-get -y update

    # Install OpenJDK if needed (Ubuntu 16.04 ships OpenJDK 8)
    # sudo apt-get purge openjdk*
    # sudo apt-get install -y openjdk-8-jdk

    # Spark installation
    # Download Spark 2.0.0 (the tested version) from the Apache archive;
    # the original CloudFront mirror has been retired
    wget https://archive.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz -O spark.tgz
    tar -xf spark.tgz
    rm spark.tgz
    sudo mv spark-* ~/spark

Spark installation reference: https://github.com/sloanahrens/qbox-blog-code

Env Vars

  • SPARK_MASTER
  • SPARK_MASTER_HOME
  • SPARK_CLIENT_HOME
  • PYTHON_PATH
  • FLASK_APP
  • APP_HOST
  • APP_PORT
  • DEBUG
  • LOGGING_LEVEL
  • CELERY_BROKER
  • CELERY_HOST
  • CELERY_PORT
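
A minimal `.envvars` sketch covering the variables above. The names come from the list; every value here is an illustrative assumption to adjust to your own setup:

```sh
# Example .envvars -- illustrative values, adjust to your environment
export SPARK_MASTER="local[*]"
export SPARK_MASTER_HOME="$HOME/spark"
export SPARK_CLIENT_HOME="$HOME/spark"
export PYTHON_PATH="$(command -v python3)"
export FLASK_APP="app"
export APP_HOST="0.0.0.0"
export APP_PORT="5000"
export DEBUG="True"
export LOGGING_LEVEL="INFO"
export CELERY_BROKER="redis"
export CELERY_HOST="localhost"
export CELERY_PORT="6379"
```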

Deploy

    # Run the Redis server in the background
    redis-server &
    # Load environment variables and activate the virtualenv
    source .envvars
    source env/bin/activate
    # Start the Celery worker (run in a separate terminal, or background it with &)
    celery worker -A app.celery --loglevel=INFO --concurrency=1
    # Local development only
    python wsgi.py

Credits
