A website to visualize train delays (under development)
Using Deutsche Bahn's Timetables API for timetables and timetable changes (https://developer.deutschebahn.com/store/apis/info?name=Timetables&version=v1&provider=DBOpenData&#/), this app collects and displays delay data for trains departing from a specific train station.
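As a rough illustration of what "delay data" means here, the sketch below computes a delay in minutes from a planned and a changed departure time. It assumes the timestamps arrive in the compact YYMMddHHmm form; the function name delay_minutes is hypothetical and not taken from this repository.

```python
from datetime import datetime
from typing import Optional

# Assumption: timestamps use the compact "YYMMddHHmm" form, e.g. "2401151432".
TIME_FORMAT = "%y%m%d%H%M"


def delay_minutes(planned: str, changed: Optional[str]) -> int:
    """Return the delay in minutes between planned and changed time.

    A missing changed time is treated as "no delay reported".
    """
    if not changed:
        return 0
    planned_dt = datetime.strptime(planned, TIME_FORMAT)
    changed_dt = datetime.strptime(changed, TIME_FORMAT)
    return int((changed_dt - planned_dt).total_seconds() // 60)


# Planned 14:32, changed to 14:40 on the same day.
print(delay_minutes("2401151432", "2401151440"))
```

Parsing into datetime objects before subtracting keeps the result correct when a delayed departure crosses midnight.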
- Docker-compose to orchestrate the Docker containers needed
- Airflow to manage DAGs (tasks) that are executed at fixed intervals
- Apache Kafka to handle streaming messages from the Deutsche Bahn API
- Apache Spark to work with streaming data
- Streamlit to display train delay data
- Clone this repository and install docker-compose
- Get a token from Deutsche Bahn to use its API (see https://developer.deutschebahn.com/store/apis/info?name=Timetables&version=v1&provider=DBOpenData&#/)
- Create a file .env in the main project folder with the following content:
AIRFLOW_CONN_SPARK_DEFAULT=spark://airflow:airflow@spark%3A%2F%2Fspark:8080
BEARER=<your_deutsche_bahn_token>
- Use docker-compose up -d to start the pipeline
- Go to localhost:8501 to see collected data over time
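Before starting the containers, it can help to confirm that the .env file from the steps above contains both entries. The following is a minimal sketch and not part of this repository; the helper names parse_env and missing_keys are hypothetical, and treating a value that starts with "<" as an unfilled placeholder is an assumption.

```python
from pathlib import Path

# The two keys listed in the setup steps above.
REQUIRED_KEYS = {"AIRFLOW_CONN_SPARK_DEFAULT", "BEARER"}


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> set:
    """Return required keys that are absent, empty, or still a placeholder."""
    values = parse_env(text)
    return {
        key
        for key in REQUIRED_KEYS
        # Assumption: a value like "<your_deutsche_bahn_token>" was never filled in.
        if not values.get(key) or values[key].startswith("<")
    }


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found in the current folder")
    else:
        missing = missing_keys(env_path.read_text(encoding="utf-8"))
        print("Missing or unset keys:", sorted(missing) if missing else "none")
```

Run it from the main project folder so that the relative .env path resolves.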
Currently, data are collected for the München-Pasing train station. To collect data for another station, change the eva variable in functions.py. The eva (ID) of any other Deutsche Bahn station can be fetched with the get /station/{pattern} API (see https://developer.deutschebahn.com/store/apis/info?name=Timetables&version=v1&provider=DBOpenData&#!/default/get_station_pattern).
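The station lookup described above could be sketched as follows. This is not code from this repository: the base URL, the Bearer authorization header, and the response shape (a stations element containing station elements with name and eva attributes) are assumptions that should be checked against the API page linked above, and the function names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: base URL of the Timetables API; verify it on the API page.
BASE_URL = "https://api.deutschebahn.com/timetables/v1"


def parse_stations(xml_data) -> dict:
    """Map station names to eva IDs.

    Assumes a response like:
    <stations><station name="..." eva="..."/></stations>
    """
    root = ET.fromstring(xml_data)
    return {
        station.get("name"): station.get("eva")
        for station in root.iter("station")
        if station.get("eva")
    }


def fetch_stations(pattern: str, token: str) -> dict:
    """Query get /station/{pattern} and return {name: eva}."""
    url = f"{BASE_URL}/station/{urllib.parse.quote(pattern)}"
    request = urllib.request.Request(
        url,
        headers={
            # Assumption: the token from .env is sent as a Bearer token.
            "Authorization": f"Bearer {token}",
            "Accept": "application/xml",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_stations(response.read())
```

Calling fetch_stations with a station name and the token stored as BEARER in .env should, under these assumptions, return the matching stations with their eva IDs, one of which can then be set as the eva variable in functions.py.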
MIT license