The first fully open-source repository for road traffic timeseries.
This repository is a mix of engineering, data science and knowledge. The underlying topic is road traffic timeseries analysis: vehicles (or, in general, objects) move in a road network, and it's worth spending time to study the data, to ask questions and to uncover patterns.
Everything you will see here is open-source and reproducible:
- Data generation is reproducible via scripting and Docker files.
- The generated data is published and can be downloaded by everybody (no registration required).
- Analysis, studies and techniques are published in the form of articles and/or reproducible notebooks.
The real aim of this project is to involve as many people as possible. Whether you are an experienced engineer, data scientist or a student (isn't everybody?), if you are interested in playing with these datasets then please go ahead and have fun. We hope you'll get in touch and collaborate, because we believe open-source is meant to produce and share knowledge. There are already some amazing people sharing their experience with us, and we'd love for you to be the next one.
If that's what you believe too, open an issue now and explain what ideas you have for your next article with this data.
But if for some reason you'd rather not, then you can simply download the data and use it for your own purpose. You don't need to ask permission. Take note of the licence though: it's MIT.
Here's how you can navigate this repository after you fork it.
- `knowledge/` is where you should start. The directory contains all articles and notebooks with the studies other contributors have made and published. Remember, you can be the next one!
- `std_traffic` is a Python package. You can install it with `pip install -e .`.
- `std_traffic/pipelines/` is where the software pipelines for data generation, processing and storage are (sort of ETL scripts).
- `std_traffic/utils/` contains Python functions that can be useful for a variety of things, mainly interacting with cloud storage and databases.
- `scripts/` contains ... executable scripts!
Whenever we generate or collect data, we publish it for everybody's benefit. Below is a list of all the datasets, or databases, that we have.
Principality of Monaco
These are timeseries of simulated road traffic data. The simulator used is SUMO, and the simulated city is the Principality of Monaco. We built on the previous work of researchers at the Communication Systems Department of Sophia-Antipolis, France. We took their (quite complex!) work and made it 100% reproducible with a Docker file. The story is told in the introduction of this article.
For a description of the data, read the introduction of this other article.
For more information about the ETL process, read this page.
Time horizon | File size | Download |
---|---|---|
4am - 6:30am | 200 MB | link |
4am - 7am | 686 MB | link |
4am - 8am | 1.4 GB | link |
4am - 8:30am | 2 GB | link |
4am - 9am | 2.5 GB | link |
4am - 10am | 3.9 GB | link |
4am - 11am | 5.2 GB | link |
4am - 12pm | 6.2 GB | link |
4am - 1pm | 7 GB | link |
4am - 2pm | 7+ GB | link |
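
The larger files above may not fit comfortably in memory. If you only need a random subset of rows, one option is to stream the CSV and keep a fixed-size sample. The following is a minimal sketch using reservoir sampling with the Python standard library only. The function name `sample_rows` and the file name in the usage note are hypothetical, and the sketch assumes the downloaded file is a CSV with a header row; it makes no assumption about the column names.

```python
import csv
import random
from typing import Dict, List, Optional


def sample_rows(path: str, k: int, seed: Optional[int] = None) -> List[Dict[str, str]]:
    """Return up to k rows chosen uniformly at random from a CSV file.

    The file is read once, row by row, so memory use depends on k
    and not on the size of the file (reservoir sampling).
    """
    rng = random.Random(seed)
    reservoir: List[Dict[str, str]] = []
    with open(path, newline="") as handle:
        # DictReader uses the first line of the file as column names.
        for index, row in enumerate(csv.DictReader(handle)):
            if index < k:
                # Fill the reservoir with the first k rows.
                reservoir.append(row)
            else:
                # Keep the new row with probability k / (index + 1),
                # replacing a uniformly chosen row already in the reservoir.
                slot = rng.randint(0, index)
                if slot < k:
                    reservoir[slot] = row
    return reservoir
```

For example, `sample_rows("monaco_4am_7am.csv", k=1000, seed=42)` would return 1000 rows as dictionaries keyed by column name, where the file name stands for whichever file you downloaded. If the file has fewer than `k` rows, all of them are returned.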
We have also saved the same data in a database that is accessible via the internet. This is the better approach for statistical sampling and large data, instead of downloading a huge CSV. See this article for a usage example.
Maintaining the database is a bit expensive for us, especially because this is a nonprofit, self-funded project. Therefore, we don't disclose the host and password, to avoid bots.
But know this: if you request access and tell us what your idea is, we will definitely share the database credentials with you. No request has been rejected so far. Open an issue to start collaborating!
The list is in alphabetical order (by last names).