BFOR 416/516 Forecasting Competition

Overview

Weather prediction is difficult.

Most data analytics courses only predict past data, using train/test splits of events that have already happened. Predicting real events that have yet to occur is far more difficult and exciting. What you learn from genuine prediction will put you far ahead of your peers.

We are going to attempt to predict the weather. Weather prediction is a great way to learn data analytics because it involves continuous predictions (temperature) and discrete predictions (rain/no rain), and it offers many opportunities to use unsupervised techniques (like clustering and visualization). Weather predictions require careful thought about how to structure data. The data is public and arrives daily, which makes evaluation easier.

Data

The National Oceanic and Atmospheric Administration (NOAA) publishes hourly and daily weather observations from weather stations around the world.

The GSODR R package makes it easy to download the latest weather data. We will use the Albany International Airport as the location for the weather data and forecasts. The station ID is 725180-14735.

Since the course is using Python, and I have not found a suitable Python replacement for GSODR, I will post the historical forecast data to Slack using a bot. You can either use the data from Slack, or run the R script download_latest_data.R.

Forecasts

You should submit forecasts for the daily high temperature (MAX), the daily low temperature (MIN), and the probability of precipitation (P_PRCP). Precipitation is considered to have occurred when the value of the PRCP column is greater than 0.00.
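
The precipitation rule can be sketched in a few lines. This is only an illustration: the function name `precipitation_occurred` and the sample readings are made up, and treating a missing reading as unknown is an assumption rather than something the evaluation script is known to do.

```python
def precipitation_occurred(prcp):
    """Return 1 if PRCP is greater than 0.00, 0 if not, None if the reading is missing."""
    if prcp is None:
        # Assumption: a missing reading is left unlabeled instead of counted as dry.
        return None
    return int(prcp > 0.0)


# Made-up daily PRCP readings, not real station data.
prcp_values = [0.0, 0.25, 0.0, 1.3]
labels = [precipitation_occurred(v) for v in prcp_values]
```

Labels like these are the observed outcomes that a P_PRCP forecast is compared against.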

The baseline prediction is named mean_prev_seven. It is simply the average temperature from the last seven days.
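
A minimal sketch of a seven-day mean, assuming the observations are held in a plain list ordered oldest to newest. The data below is made up, and the code in baseline_predictions.py may structure this differently (for example, by applying it to each forecast variable from a data frame).

```python
from statistics import mean


def mean_prev_seven(values):
    """Average of the seven most recent observations in `values` (oldest first)."""
    if len(values) < 7:
        raise ValueError("need at least seven observations")
    return mean(values[-7:])


# Made-up daily highs in degrees C; only the last seven are used.
daily_max = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
baseline_max = mean_prev_seven(daily_max)
```

Any forecasting technique you build should aim to beat this number, since it requires no modeling at all.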

Submission Format

Submissions must be in a CSV file formatted exactly like the tables below. The easiest way to achieve this format is to build your forecasts with the baseline_predictions.py file. Modify the code in that file to include your forecasting technique.

fc_date	fc_name	fc_var	fc_value
YYYY-MM-DD	your_name	VAR_NAME	Degrees (C) for MAX and MIN, probability for P_PRCP

Example:

fc_date	fc_name	fc_var	fc_value
2019-02-04	lspitzley	MAX	5.5776
2019-02-04	lspitzley	MIN	-0.5468
2019-02-04	lspitzley	P_PRCP	0.2556

A one-week forecast should have 21 rows (three rows for each of seven days).
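
As a rough illustration of the layout, the sketch below builds one week of rows with the standard library and writes them as comma-separated text. The helper `build_rows` is hypothetical, the same values are repeated for every day purely as placeholders, and a comma delimiter is assumed; baseline_predictions.py remains the recommended starting point.

```python
import csv
import io
from datetime import date, timedelta

FIELDS = ["fc_date", "fc_name", "fc_var", "fc_value"]


def build_rows(start, name, daily_forecasts):
    """Return one row per (day, variable) pair, starting at `start`."""
    rows = []
    for offset, forecast in enumerate(daily_forecasts):
        fc_date = (start + timedelta(days=offset)).isoformat()
        for var in ("MAX", "MIN", "P_PRCP"):
            rows.append(
                {"fc_date": fc_date, "fc_name": name, "fc_var": var, "fc_value": forecast[var]}
            )
    return rows


# Placeholder values repeated for seven days; a real forecast would vary by day.
week = [{"MAX": 5.5776, "MIN": -0.5468, "P_PRCP": 0.2556}] * 7
rows = build_rows(date(2019, 2, 4), "your_name", week)

# Write to an in-memory buffer here; open a file instead to save a submission.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Seven days with three variables each gives the 21 data rows, plus one header line.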

You can test your format by running your forecasts through the forecast_evaluation.py script.
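
Before running the evaluation script, a quick self-check can catch obvious formatting mistakes. The sketch below is not part of the repository: `check_submission` is a hypothetical helper, and the specific checks (a comma-delimited header, the three variable names, a 0 to 1 range for P_PRCP) are assumptions drawn from the example table.

```python
import csv
import io
from datetime import date

EXPECTED_HEADER = ["fc_date", "fc_name", "fc_var", "fc_value"]
ALLOWED_VARS = {"MAX", "MIN", "P_PRCP"}


def check_submission(csv_text):
    """Return a list of problems found; an empty list means none were detected."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return [f"unexpected header: {header}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != 4:
            problems.append(f"line {line_no}: expected 4 fields, got {len(row)}")
            continue
        fc_date, _fc_name, fc_var, fc_value = row
        try:
            date.fromisoformat(fc_date)
        except ValueError:
            problems.append(f"line {line_no}: bad date {fc_date!r}")
        if fc_var not in ALLOWED_VARS:
            problems.append(f"line {line_no}: unknown variable {fc_var!r}")
        try:
            value = float(fc_value)
        except ValueError:
            problems.append(f"line {line_no}: non-numeric value {fc_value!r}")
            continue
        if fc_var == "P_PRCP" and not 0.0 <= value <= 1.0:
            problems.append(f"line {line_no}: probability {value} outside [0, 1]")
    return problems


good = "fc_date,fc_name,fc_var,fc_value\n2019-02-04,lspitzley,MAX,5.5776\n"
bad = "fc_date,fc_name,fc_var,fc_value\n2019-02-04,lspitzley,P_PRCP,1.5\n"
good_problems = check_submission(good)
bad_problems = check_submission(bad)
```

A clean result here does not guarantee that forecast_evaluation.py will accept the file, so still run the script.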

Evaluation

Continuous forecasts are evaluated with sklearn regression metrics. Binary predictions are scored with standard methods that we covered in class.

The forecast_evaluation.py script reports scores to the console and to a .csv file in the output/ directory.

The RMSE will be the primary scoring metric for the temperature predictions. Log loss and AUC are the primary metrics for precipitation predictions.
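
To make the three metrics concrete, here is a plain-Python sketch of what each one computes. The function names and the sample numbers are made up for illustration. In practice, `sklearn.metrics` provides `mean_squared_error`, `log_loss`, and `roc_auc_score`, and forecast_evaluation.py may compute its scores differently from this sketch.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def binary_log_loss(outcomes, probabilities, eps=1e-15):
    """Average negative log-likelihood of the 0/1 outcomes under the forecasts."""
    total = 0.0
    for y, p in zip(outcomes, probabilities):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(outcomes)


def auc_score(outcomes, probabilities):
    """Share of (wet day, dry day) pairs where the wet day got the higher probability."""
    wet = [p for y, p in zip(outcomes, probabilities) if y == 1]
    dry = [p for y, p in zip(outcomes, probabilities) if y == 0]
    wins = sum((w > d) + 0.5 * (w == d) for w in wet for d in dry)
    return wins / (len(wet) * len(dry))


# Made-up observations and forecasts.
actual_max = [5.0, 7.0, 6.0]
predicted_max = [4.0, 7.0, 8.0]
rained = [1, 0, 1, 0]
p_prcp = [0.9, 0.2, 0.4, 0.6]

temperature_rmse = rmse(actual_max, predicted_max)
precip_log_loss = binary_log_loss(rained, p_prcp)
precip_auc = auc_score(rained, p_prcp)
```

Lower is better for RMSE and log loss, while higher is better for AUC. Log loss punishes confident wrong probabilities heavily, so forecasts of exactly 0 or 1 are risky.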

