Overview and Software Setup

This repository contains the teaching resources that we will use over the fellowship. It supplements, and is heavily sourced from, the DSSG Hitchhiker's Guide, which is an invaluable resource for any DSSG project. This repository is tailored specifically to the tutorials and classes we will give during the 2018 summer fellowship in Lisbon.

Technical support:

Nuno Brás - lead technical mentor
Qiwei Han - technical mentor
William Grimes - junior technical mentor
Iñigo Martínez de Rituerto de Troya - infrastructure and technical support
João Fonseca - infrastructure support

Technical mentor’s role:

Project mentor/consultant on technical side
Core infrastructure maintenance (data, computing resources)
Technical training/support

Ask us anything technical. We will do our best to help you resolve difficulties or direct you to the right person when necessary.

Local software setup for tutorials and projects

SSH (PuTTY for Windows)
Git (for version control)
psql (PostgreSQL command-line interface)
IDE / text editor (Atom, Sublime, Vim, VS studio, PyCharm, Spyder, ...)
DBeaver
miniconda or pip/virtualenv with Python 3.6
Python Packages
- pandas/numpy/scipy
- matplotlib
- scikit-learn
- psycopg2
- IPython
- jupyter

For more detail on the software setup have a look here.

Try it out! Give all installed software a quick spin to check that it installed correctly. For your Python packages, try importing them: type python in your shell and, once you are in the Python session, try for example import numpy, import matplotlib, and so on. (You can quit with exit().) Also try running ipython and jupyter notebook in your terminal, and see if you get any errors.
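The import check described above can also be scripted. The following is a minimal sketch, not an official setup script: the helper name check_packages is made up for illustration, and the module names in PACKAGES are assumptions about how each package from the list is imported (for example, scikit-learn is imported as sklearn).

```python
import importlib

# Assumed import names for the packages listed above; scikit-learn is
# imported as "sklearn" and IPython is capitalised.
PACKAGES = ["pandas", "numpy", "scipy", "matplotlib",
            "sklearn", "psycopg2", "IPython", "jupyter"]


def check_packages(names):
    """Try to import each module name; return a dict {name: True/False}."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:  # ImportError, or another error from a broken install
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_packages(PACKAGES).items():
        print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```

Save it under any file name you like and run it with python; any package reported as MISSING probably needs to be installed again in the environment you are using.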

Working in the cloud

Project work over the summer will be done in a cloud computing environment, where each project has its own server (an AWS EC2 instance) for large-scale data processing tasks, and a database to securely store the data. This is advantageous because the data is maintained in one place, teams can collaborate easily, and you have access to scalable computing resources.

Good news: DSSG is supported by the Amazon Web Services Cloud Credits for Research program and Microsoft Azure for Research awards! Amazon Web Services (AWS): https://aws.amazon.com/

Each fellow will be assigned a user account that gives them access to AWS services.

Checklist

Have all software installed, running, and tested locally
Have a GitHub account created
Join the two DSSG GitHub organisations:
- DSSG Chicago - input your GitHub username at this link.
- DSSG Europe - you will have received an invite by e-mail.
Try to SSH into the training instance using your saved private key
ssh -i ~/path/to/pemfile.pem username@35.176.252.186
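
If the SSH step fails, one common cause on macOS and Linux is that the private key file is readable by other users, in which case ssh refuses to use it. The sketch below checks for that and assembles the same command as in the checklist. The function names are hypothetical, and the key path, username, and host are placeholders you must replace with your own values.

```python
import os
import stat


def key_permissions_ok(key_path):
    """Return True if the key file grants no access to group or others."""
    mode = stat.S_IMODE(os.stat(os.path.expanduser(key_path)).st_mode)
    return mode & 0o077 == 0


def build_ssh_command(key_path, username, host):
    """Return the ssh command from the checklist as an argument list."""
    return ["ssh", "-i", os.path.expanduser(key_path), f"{username}@{host}"]
```

If key_permissions_ok returns False, running chmod 600 on the key file usually resolves it. The list returned by build_ssh_command can be passed to subprocess.run, or simply printed and typed into your terminal.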

Technical Syllabus

This summer we will go through a set of modules that will help you get started and/or grow as data scientists for social good. Each session is briefly identified in square brackets in the calendar, for example:

[All] [TERMINAL] 1 command line basics

which means

All to attend, Module TERMINAL, First lesson, about command line basics
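
To make the label format concrete, here is a small sketch that splits a calendar entry into its parts. It assumes every entry follows exactly the pattern of the example above (audience in brackets, module in brackets, lesson number, title); the function name and the dictionary keys are made up for illustration.

```python
import re

# Pattern assumed from the single example above:
# [audience] [MODULE] lesson-number title
LABEL_RE = re.compile(
    r"^\[(?P<audience>[^\]]+)\]\s*"
    r"\[(?P<module>[^\]]+)\]\s*"
    r"(?P<lesson>\d+)\s+"
    r"(?P<title>.+)$"
)


def parse_session_label(label):
    """Split a calendar label into audience, module, lesson number, and title."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"Unrecognised session label: {label!r}")
    parts = match.groupdict()
    parts["lesson"] = int(parts["lesson"])
    return parts


print(parse_session_label("[All] [TERMINAL] 1 command line basics"))
```

For the example entry this yields the audience All, the module TERMINAL, lesson 1, and the title command line basics.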

1. Terminal Module

A set of lessons that introduces you to the terminal and helps you be productive while working in it. This is especially important when working on virtual machines. The sessions are:

Sessions	Week
Command line basics	w1
Software versioning with git	w1
SSH and the cloud	w1
git advanced	w2

2. SQL Module

A set of lessons to help you master basic and advanced SQL (PostgreSQL), along with some relevant database concepts.

Sessions	Week
SQL basics	w1
SQL advanced	w3
Databases theory	w3

3. Python Module

A module to bring everyone up to the same level. We cover topics such as:

Dictionaries and other structures
Functions, Classes and Objects, numpy, matplotlib
Python code best practices

Sessions	Week
Python beyond scripting	w2

4. Data Module

Data handling with pandas: feature extraction, transformation, and selection.

Sessions	Week
Feature engineering	w4

5. Machine Learning Module

From a general introduction to machine learning concepts through to a set of algorithms suited to your problems.

Sessions	Week
ML intro	w3
Quantitative Social Science	w4
Causal inference	w4
ML models 1	w4
ML models 2	w6

6. ETL Module

How to bring data science solutions to production architectures. Workflows and data streaming; data warehouses and data lakes.

Sessions	Week
csvtodb and other simple data handling	w2
DAGs and other workflow systems	w5

Final Syllabus

Modules	Sessions	Week
TERMINAL	Command line basics	w1
TERMINAL	Software versioning with git	w1
TERMINAL	SSH and the cloud	w1
SQL	SQL basics	w1
PYTHON	Python beyond scripting	w2
ETL	csvtodb and other simple data handling	w2
TERMINAL	git advanced	w2
SQL	SQL advanced	w3
SQL	Databases theory	w3
ML	ML intro	w3
DATA	Feature engineering	w4
ML	Quantitative social science	w4
ML	Causal inference	w4
ML	ML models 1	w4
ETL	DAGs and other workflow systems	w5
ML	ML models 2	w6

Extra: Special Sessions

These sessions are meant to fill knowledge gaps in specific areas that could make a big difference to some groups.
They are not compulsory.
They can even be given by fellows!

Here are some examples:

Special Sessions
Web Scraping
GIS analysis
Network Analysis
Text Analysis
Record Linkage
Optimization
