BloomTech-Labs/PT15-cityspire-c-ds

CitySpire - Data Science

Docs

Mission

Be a one-stop resource for users to receive the most accurate city information.

Description

An app that analyzes city data such as population, cost of living, rental rates, crime rates, walkability (Walk Score), and many other social and economic factors that matter when deciding where to live. The app presents this data in an intuitive, easy-to-understand interface.

Use data to find a place right for you to live.

Data Engineering

FastAPI app, deployed to AWS, provides 3 primary routes:

  • /cityspire is a GET route that provides all of the data in the database in a table format.
  • /locations is a GET route that provides a list of all cities in the database.
  • /location/data is a POST route that takes a request of location in the form of "City, State" and returns all of the data about that location.

| Type | Endpoint | Required Parameters | Returns |
|------|----------|---------------------|---------|
| GET | /cityspire | none | "[[0, 0, "Akron, Ohio", 197597.0, 678.0, 1782.0, 27.0, 181.0, 328.0, 1246.0, 6568.0, 1686.0, 4305.0, 577.0, 65.0, 8484.440553247267, 46, 46, 90.8, 7972.779227752733], ...]" |
| GET | /locations | none | { "locations": ["Akron, Ohio", "Albany, New York", ...] } |
| POST | /location/data/ | "location": "City, State" | { "city_name": "El Paso, Texas", "population": 681728, "rent_per_month": 990, "walk_score": 41, "livability_score": 12687 } |

[TODO] More details about the API endpoints can be found at the ReDoc interface or by exploring the interactive SwaggerUI.
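
As a rough illustration of calling the POST route described above, here is a minimal client sketch that uses only the Python standard library. It assumes the route accepts a JSON body of the form {"location": "City, State"} and that the app is running locally on uvicorn's default port 8000. The names BASE_URL, build_location_request, and fetch_location_data are hypothetical helpers, not part of this repo.

```python
import json
import urllib.request

# Assumption: placeholder base URL for a locally running app; replace with your deployment.
BASE_URL = "http://localhost:8000"


def build_location_request(location, base_url=BASE_URL):
    """Build a POST request for /location/data/ (assumes a JSON request body)."""
    body = json.dumps({"location": location}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/location/data/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_location_data(location, base_url=BASE_URL):
    """Send the request and decode the JSON response into a dict."""
    request = build_location_request(location, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the app running, fetch_location_data("El Paso, Texas") should return a dict shaped like the example in the Returns column, if the JSON-body assumption holds for your deployment.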

Machine Learning

Nearest Neighbors ML Model for CitySpire City Living Recommendations

The data wrangling and merging can be found in the wrangling.ipynb notebook. The tokenization and TFIDF vectorization of text, the creation of the Nearest Neighbors model, its training on the tokenized and vectorized text, and the pickling of the Nearest Neighbors model and TFIDF Vectorizer can all be found in the rec_modeling.ipynb notebook in the notebooks directory.

The Nearest Neighbors and TFIDF Vectorizer pickles can be found in the pickles directory.

The pickled Nearest Neighbors model and TFIDF Vectorizer are loaded in recommend.py in the app directory, where a recommend function in the Data Engineering API uses them to recommend cities based on a user's desired population, rental rate, crime rate, walkability score, cost of living index, and livability score.
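
To make the idea concrete, the sketch below shows TF-IDF vectorization followed by a cosine-similarity nearest-neighbor lookup in pure Python. It is a stand-in written for illustration only, not the pickled model or the code in recommend.py. The city names and text descriptions are hypothetical toy data, and every function name here is an assumption.

```python
import math
from collections import Counter

# Hypothetical toy data: placeholder cities with made-up text descriptions.
CITIES = {
    "City A": "large population high rent high walkability low crime",
    "City B": "small population low rent low walkability low crime",
    "City C": "large population high rent low walkability high crime",
}


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def vectorize(tokens, idf):
    """Turn tokens into an L2-normalized TF-IDF vector stored as a dict."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def fit_tfidf(documents):
    """Learn smoothed IDF weights and return them with the document vectors."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF keeps a positive weight for terms found in every document.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return idf, [vectorize(tokens, idf) for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def recommend(query, names, idf, vectors, k=2):
    """Return the k names whose vectors are most similar to the query."""
    q = vectorize(tokenize(query), idf)
    ranked = sorted(zip(names, vectors), key=lambda p: cosine(q, p[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


names = list(CITIES)
idf, vectors = fit_tfidf(list(CITIES.values()))
print(recommend("small population low rent", names, idf, vectors, k=1))
```

A dict-based sparse vector keeps the example dependency-free. A real pipeline would more likely rely on a library vectorizer and neighbor index, as the pickles in this repo suggest.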

Deployment

The CitySpire API is backed by a Postgres DB in AWS RDS. The data was uploaded to the DB using the df_to_sql.py script in the notebooks directory.

After you create your own Postgres database on AWS RDS, add the database URL to a .env file:

DATABASE_URL=postgresql://DBusername:DBpassword@blah.blah.blah.us-east-1.rds.amazonaws.com/dbname
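
For reference, a URL in this format can be split into its parts with the Python standard library. The sketch below assumes DATABASE_URL has already been loaded into the process environment (a .env file is not read automatically). The helper names parse_database_url and load_database_settings are hypothetical, not part of this repo.

```python
import os
from urllib.parse import urlsplit


def parse_database_url(url):
    """Split a postgresql:// URL into its connection components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL omits the port
        "dbname": parts.path.lstrip("/"),
    }


def load_database_settings(env=os.environ):
    """Read DATABASE_URL from the given mapping and parse it."""
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set; add it to your .env file")
    return parse_database_url(url)
```

Failing early with a clear error when the variable is missing tends to be easier to debug than a connection failure later on.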

Commands to run the app locally:

Create and activate a virtual environment in the root directory of the project:

    pipenv shell

Install project dependencies in the virtual environment:

    pipenv install --dev

Launch the app locally:

    uvicorn app.main:app --reload

Launch the app locally on a different port:

    uvicorn app.main:app --reload --port 8080

The API app is deployed to AWS Elastic Beanstalk using a Dockerfile. Keep all application code inside the app directory, because the Dockerfile copies the app structure from the app directory, not from the root directory of this repo.

Documentation on how to set up AWS and EB CLI

Commands to deploy to Elastic Beanstalk:

Commit your work:

    git add --all
    git commit -m "Your commit message"

Then use these EB CLI (Elastic Beanstalk command line interface) commands to deploy. Replace CHOOSE-YOUR-NAME with your own name.

    eb init --platform docker --region us-east-1 CHOOSE-YOUR-NAME
    eb create --region us-east-1 CHOOSE-YOUR-NAME

If your app uses environment variables (such as DATABASE_URL), configure them in the Elastic Beanstalk console.

Now you can open your deployed app! 🎉

    eb open

Commands to redeploy to Elastic Beanstalk:

Commit your work:

    git add --all
    git commit -m "Your commit message"

Then use these EB CLI (Elastic Beanstalk command line interface) commands to redeploy:

    eb deploy
    eb open

It is also possible to redeploy without committing your work by staging your changes and deploying the staged files:

    git add .
    eb deploy --staged

Data Sources

Population Data - https://www2.census.gov/programs-surveys/popest/tables/2010-2019/cities/totals/SUB-IP-EST2019-ANNRES.xlsx

Rental Rates - https://files.zillowstatic.com/research/public_v2/zori/Zip_ZORI_AllHomesPlusMultifamily_SSA.csv

Crime Rates - https://ucr.fbi.gov/crime-in-the-u.s/2019/crime-in-the-u.s.-2019/tables/table-8/table-8.xls/view

Walk Scores - https://www.walkscore.com/cities-and-neighborhoods/

Cost of Living Index - https://advisorsmith.com/data/coli/

Contributors

  • John Dailey, Data Scientist
  • Neha Kumari, Data Scientist
  • Theda, Data Scientist
  • Mickey Wells, Data Scientist