The Road Graph Tool is a project for processing data from various sources into a road graph usable as an input for transportation problems. Version 0.1.0 of the project aims to provide a road network with the following features:
- geographical location of vertices and edges, and
- geographical shape of edges
Version 0.1.0 uses the following data source:
- OpenStreetMap (OSM) data for the road network and its geographical properties.
The processing and storage of the data are done in a PostgreSQL/PostGIS database. To manipulate the database, import data to the database, and export data from the database, the project provides a set of Python scripts.
To run the tool, you need access to a local or remote PostgreSQL database with the following extensions installed:
Also, you need the osm2pgsql tool installed for importing OSM data into the database.
To execute the configured pipeline, follow these steps:
- Install the Road Graph Tool Python package:
pip install -e <clone dir>/Python
- Create a YAML configuration file for the project. For details about this file, refer to the Configuration section.
- Configure the database. The remote database can be accessed through an SSH tunnel. The SSH tunneling is handled at the application level.
- In the python/ directory, run scripts/install_sql.py. This script will initialize the needed tables, procedures, functions, etc., in your database.
- Run the main.py script with the path to the configuration file as the first argument. The script will execute the pipeline according to the configuration file.
The Road Graph Tool is configured using the YAML format. The path to the configuration file should be specified as the first argument when running the main script. All relative paths specified in the configuration file are relative to the configuration file itself, unless specified otherwise. The main configuration affecting the whole tool is in the root of the configuration file. Other parameters are in the following sections:
- db: database configuration
- import: configuration for the import component
- export: configuration for the export component
In the root of the project, there is an example configuration file named config-example.yml.
Additionally, it is necessary to store some sensitive information, such as passwords. These are stored in the secrets.yml file, which should be placed in the same directory as the configuration file. The structure of the file is the same as the structure of the main configuration file. An example file named secrets-example.yml is stored in the root of the project.
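Because secrets.yml mirrors the structure of the main configuration file, the two can be combined by a recursive overlay. The following is a minimal sketch of that idea on already-parsed dictionaries; it is not necessarily how the Road Graph Tool loads its configuration, and the function name merge_secrets as well as the keys host, db_name, and password are hypothetical.

```python
from copy import deepcopy


def merge_secrets(config: dict, secrets: dict) -> dict:
    """Return a copy of config with values from secrets overlaid recursively."""
    merged = deepcopy(config)
    for key, value in secrets.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides have a section with this name: descend into it.
            merged[key] = merge_secrets(merged[key], value)
        else:
            # Scalar value or a section missing from config: take it from secrets.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of the two files after YAML parsing.
config = {"db": {"host": "localhost", "db_name": "rgt"}}
secrets = {"db": {"password": "change-me"}}
merged = merge_secrets(config, secrets)
```

The inputs are left unmodified, so the secrets never end up in the object that represents the plain configuration file.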
For testing the PostgreSQL procedures that are the core of the Road Graph Tool, we use the pgTAP testing framework. To learn how to use pgTAP, see the pgTAP manual.
To run the tests, follow these steps:
- Install the pgTAP extension for your PostgreSQL database cluster according to the pgTAP manual.
- If you haven't already, create and initialize the database:
  - create a new database using
CREATE DATABASE <database_name>;
  - copy the config-EXAMPLE.ini file to config.ini and fill in the necessary information
  - initialize the new database using the script <rgt root>/python/scripts/install_db.py.
    - this script will install all necessary extensions and create all necessary tables, procedures, and functions.
    - the configuration for the database is loaded from the config.ini file.
- Execute the tests by running the following query in your PostgreSQL console:
SELECT * FROM run_all_tests();
  - This query will return a result set containing the execution status of each test.
Because the map processing with Road Graph Tool can be time-consuming, it is recommended to filter the input data before processing. Most importantly, the data should be filtered to include
- only the area of interest, and
- only the objects of interest.
The following tools are available for filtering the input data:
The Road Graph Tool consists of a set of components that are responsible for individual processing steps, importing data, or exporting data. Each component is implemented as a PostgreSQL procedure or Python script, possibly calling other procedures or functions. Additionally, each component has its own Python wrapper script that connects to the database and calls the procedure. Currently, the following components are implemented:
- OSM file processing: preprocesses data from an OSM file and imports it into the PostgreSQL database for further use.
- Graph Contraction: simplifies the road graph by contracting nodes and creating edges between the contracted nodes.
This component processes data in the OpenStreetMap (OSM) XML file format and imports it into a PostgreSQL database.
Before processing and loading data (which can be downloaded from Geofabrik) into the database, install the following tools:
- psql for PostgreSQL
- osmium: osmium-tool (macOS: brew install osmium-tool, Ubuntu: apt install osmium-tool) for preprocessing of OSM files
- osm2pgsql (macOS: brew install osm2pgsql, Ubuntu: apt install osm2pgsql for version 1.6.0) for importing
  - the current version of RGT is compatible with versions 2.0.0 and 1.11.0 of osm2pgsql.
The PostgreSQL database needs the PostGIS extension to enable spatial and geographic capabilities within the database, which is essential for working with OSM data. Loading large OSM files into the database is memory-demanding, so the documentation suggests having at least as much RAM as the size of the OSM file.
Preprocessing an OSM file with osmium aims to improve the efficiency and speed of importing with the osm2pgsql tool. The two most common actions are sorting and renumbering. For these, you can use the provided process_osm.py Python script:
python3 process_osm.py [option_flag] [input_file] -o [output_file]
Call python3 process_osm.py -h or python3 process_osm.py --help for more information.
- Sorting: Sorts objects based on IDs in ascending order.
python3 process_osm.py s [input_file] -o [output_file]
- Renumbering: Negative IDs usually represent unofficial non-OSM data (so they do not clash with OSM data), but osm2pgsql can only handle positive sorted IDs (it uses negative IDs internally for geometries). Renumbering starts at index 1 and goes in ascending order.
python3 process_osm.py r [input_file] -o [output_file]
- Sorting and renumbering: Sorts and renumbers IDs in ascending order starting from index 1.
python3 process_osm.py sr [input_file] -o [output_file]
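To make the effect of sorting and renumbering concrete, here is a small sketch on plain Python data. It only illustrates the idea (IDs sorted in ascending order, reassigned from 1, and references remapped consistently); it is not the osmium implementation, and the function name sort_and_renumber and the sample IDs are made up for the example.

```python
def sort_and_renumber(node_ids, ways):
    """Sort IDs in ascending order, renumber them from 1, and remap references.

    node_ids: iterable of original node IDs (may be negative or unsorted).
    ways: dict mapping an original way ID to the list of node IDs it references.
    """
    # New node IDs follow the ascending order of the old ones.
    id_map = {old: new for new, old in enumerate(sorted(node_ids), start=1)}
    # Ways are renumbered the same way; their node references use the new IDs.
    new_ways = {
        new_id: [id_map[ref] for ref in refs]
        for new_id, (_, refs) in enumerate(sorted(ways.items()), start=1)
    }
    return id_map, new_ways


id_map, new_ways = sort_and_renumber([42, -7, 15], {900: [42, -7], 100: [15, 42]})
```

The negative ID -7 becomes 1, and every way that referenced it now points to the new ID, which is the property the import relies on.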
The primary function of the process_osm.py script is to import OSM data into the database using the osm2pgsql tool configured by Flex output. Flex output allows more flexible configuration, such as filtering logic and creating additional types (e.g. areas, boundaries, multipolygons) and tables for various POIs (e.g. restaurants, theme parks) to get the desired output. To use it, we define a Flex style file (a Lua script) that contains all the logic for processing the data in the OSM file.
The u flag triggers the import_osm_to_db() function, which requires the OSM file path as an argument.
Function import_osm_to_db():
- Imports the data into the database (the default schema is public, but a different schema can be specified) with the provided Lua style file. If omitted, the default style file pipeline.lua is used. To customize the style file, set a new path for DEFAULT_STYLE_FILE.
- Postprocesses the data in the database if specified in POSTPROCESS_DICT, which can be configured based on the style file used during importing.
python3 process_osm.py u [input_file] [-l style_file]
WARNING: Running this command will overwrite existing data in the relevant tables (these tables are specified in schema.py). If you wish to proceed, use the --force flag to overwrite, or create a new schema for the new data.
E.g. the commands below process the OSM file of Lithuania using Flex output and upload it into the database (all configuration should be provided in config.ini in the root folder of the project).
# runs with pipeline.lua
python3 process_osm.py u lithuania-latest.osm.pbf
# runs with simple.lua script
python3 process_osm.py u lithuania-latest.osm.pbf -l resources/lua_styles/simple.lua
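For orientation, an osm2pgsql import with Flex output is driven by selecting the flex output and passing a Lua style file. The sketch below builds such a command line in Python. It illustrates the osm2pgsql options involved and is not the actual body of import_osm_to_db(); the function name build_osm2pgsql_command and the database name rgt are hypothetical, and the --schema option is assumed to be available, which holds only for recent osm2pgsql versions.

```python
def build_osm2pgsql_command(input_file, style_file, database, schema="public"):
    """Build an argument list for an osm2pgsql import using the Flex output."""
    return [
        "osm2pgsql",
        f"--database={database}",
        "--output=flex",          # select the Flex output
        f"--style={style_file}",  # Lua style file with the processing logic
        f"--schema={schema}",     # target schema (assumes a recent osm2pgsql)
        input_file,
    ]


command = build_osm2pgsql_command(
    "lithuania-latest.osm.pbf",
    "resources/lua_styles/simple.lua",
    database="rgt",  # hypothetical database name
)
# The list can be passed to subprocess.run(command, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems with file paths.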
Nodes in Lithuania:
The data are often huge, and in many cases we need only certain extracts or objects of interest in our database. It is therefore better practice to filter out only what we need and work with that in the database.
Both osmium and osm2pgsql filter data inside a bounding box given in the following format: bottom-left corner (minlon,minlat), top-right corner (maxlon,maxlat).
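The bounding box format above can be captured in a few lines of Python. This is a minimal sketch for checking coordinates by hand, not part of the Road Graph Tool; the names BoundingBox and parse_bbox are hypothetical, and the test point is an approximate location in central Vilnius.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    min_lon: float  # left
    min_lat: float  # bottom
    max_lon: float  # right
    max_lat: float  # top

    def contains(self, lon: float, lat: float) -> bool:
        """Return True if the point lies inside the box (borders included)."""
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


def parse_bbox(text: str) -> BoundingBox:
    """Parse 'minlon,minlat,maxlon,maxlat' into a BoundingBox."""
    parts = [float(part) for part in text.split(",")]
    if len(parts) != 4:
        raise ValueError("expected four comma-separated numbers")
    bbox = BoundingBox(*parts)
    if bbox.min_lon > bbox.max_lon or bbox.min_lat > bbox.max_lat:
        raise ValueError("bottom-left corner must come before top-right corner")
    return bbox


bbox = parse_bbox("25.12,54.57,25.43,54.75")
inside = bbox.contains(25.28, 54.69)   # approximate centre of Vilnius
outside = bbox.contains(23.90, 54.90)  # a point well to the west of the box
```

Note the longitude-first order: swapping latitude and longitude is the most common mistake when writing such boxes by hand.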
Nodes inside bounding box in Lithuania:
- These commands process an OSM file using bounding box coordinates to filter data within the bounding box. The file resources/extracted-bbox.osm.pbf is created and can be further processed with Flex output.
# bounding box specified directly
python3 filter_osm.py b [input_file] -c [left],[bottom],[right],[top]
# bounding box specified in config file:
python3 filter_osm.py b [input_file] -c [config_file]
- E.g. extract bounding box of Lithuania OSM file:
python3 filter_osm.py b lithuania-latest.osm.pbf -c 25.12,54.57,25.43,54.75
# or:
python3 filter_osm.py b lithuania-latest.osm.pbf -c resources/extract-bbox.geojson
- We can calculate the greatest bounding box coordinates using python3 process_osm.py b based on the ID of the relation (mentioned in 3.1.2) that specifies the area of interest (e.g. Vilnius, the capital of Lithuania). This command processes the OSM file using the calculated bounding box coordinates with Flex output and imports the bounded data into the database.
# find bbox (uses Python script find_bbox.py)
python3 process_osm.py b [input_file] -id [relation_id] -s [style_file]
- E.g. this command extracts greatest bounding box from given relation ID of Lithuania OSM file and uploads it to PostgreSQL database using osm2pgsql:
python3 process_osm.py b lithuania-latest.osm.pbf -id 1529146
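Conceptually, the greatest bounding box of a relation is the smallest axis-aligned box that encloses all coordinates of its geometry. A minimal sketch of that computation follows, assuming the coordinates are already available as (lon, lat) pairs; it does not reproduce find_bbox.py, and the function names and sample points are hypothetical.

```python
def enclosing_bbox(points):
    """Return (min_lon, min_lat, max_lon, max_lat) enclosing all (lon, lat) points."""
    points = list(points)
    if not points:
        raise ValueError("at least one point is required")
    lons, lats = zip(*points)
    return min(lons), min(lats), max(lons), max(lats)


def format_bbox(bbox):
    """Format a bounding box as 'left,bottom,right,top'."""
    return ",".join(f"{value:g}" for value in bbox)


# Hypothetical boundary points of an area of interest.
boundary = [(25.12, 54.60), (25.43, 54.57), (25.30, 54.75)]
bbox = enclosing_bbox(boundary)
bbox_text = format_bbox(bbox)
```

The resulting string has the same left,bottom,right,top layout that filter_osm.py b accepts through its -c option.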
For more precise extraction, we define a multipolygon. Its specification is based on a relation ID: https://www.openstreetmap.org/api/0.6/relation/RELATION-ID/full.
As suggested, it is better to filter out only what we need with osmium before processing with Flex output.
Ways inside multipolygon of Vilnius:
- The ID can be found by specific filtering using resources/expression-example.txt or on OpenStreetMap - more on how to filter
  - use name:en for easiest filtering
  - NOTE: the admin_level=* expression represents the administrative level of a feature (borders of territorial political entities) - each country (even county) can have different numbering
- e.g. to find the relation ID that bounds Vilnius city (ID: 1529146), run double tag filtration:
# expressions-example.txt should contain: r/type=boundary
python3 filter_osm.py f lithuania-latest.osm.pbf -e expressions-example.txt
# expressions-example.txt should contain: r/name:en=Vilnius
python3 filter_osm.py f lithuania-latest.osm.pbf -e expressions-example.txt
- get multipolygon extract that can be further processed with Flex output:
python3 filter_osm.py id [input_file] -rid [relation_id] [-s strategy]
# E.g. extract multipolygon based on relation ID of Vilnius city:
python3 filter_osm.py id lithuania-latest.osm.pbf -rid 1529146 # creates: id_extract.osm
python3 process_osm.py u id_extract.osm
- Strategies (optional for the id and b flags in filter_osm.py) are used to extract a region in a certain way: use [-s strategy] to set the strategy:
  - simple: faster, doesn't include complete ways (ways out of multipolygon)
  - complete ways: ways are reference-complete
  - smart: ways and multipolygon relations (by default) are reference-complete
Filter specific objects based on tags.
- common tags:
- amenity, building, highway, leisure, natural, boundary
- find more tags here
Ways with highway tag in Lithuania:
- use resources/expressions-example.txt to specify tags to be filtered in the format [object_type]/[expression], where:
  - object_type: n (nodes), w (ways), r (relations) - can be combined
  - expression: what it should match against
  - more details
python3 filter_osm.py t [input_file] -e [expression_file] [-R]
- Optional -R flag: nodes referenced in ways and members referenced in relations will not be added to the output if the -R flag is used
- e.g. to filter out highway objects use:
# expression file contains: nwr/highway
python3 filter_osm.py t [input_file] -e [expression_file]
- use filter_osm.py h to filter objects with highway tags (even referenced and untagged)
- Use Lua style files to filter out objects that have the desired tag.
  - e.g. to filter out highway objects use resources/lua_styles/filter-highway.lua, which filters nodes, ways and relations with the highway tag:
python3 process_osm.py u lithuania-latest.osm.pbf -s resources/lua_styles/filter-highway.lua
NOTE: Unfortunately, untagged nodes and members referenced in ways and relations, respectively, can't be included, as osm2pgsql processes objects in a certain order. Use filter_osm.py for filtering referenced objects too.
- More examples of various Flex configurations can be found in the official osm2pgsql GitHub project.
Both filter_osm.py and process_osm.py output some basic logging info. Use -v/--verbose for more debugging output.
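The [object_type]/[expression] format used in the expression files for tag filtering can be checked with a small parser before a long filtering run. The sketch below handles only the simple forms that appear in this document (a key such as nwr/highway, or key=value such as r/name:en=Vilnius); it is an illustration, not the parser used by osmium or filter_osm.py, and the function name parse_expression is hypothetical.

```python
OBJECT_TYPES = {"n": "node", "w": "way", "r": "relation"}


def parse_expression(line):
    """Split '[object_type]/[expression]' into (object types, key, value).

    The value is None when the expression consists of a key only.
    """
    type_part, separator, expression = line.strip().partition("/")
    if not separator or not type_part or not expression:
        raise ValueError(f"not in [object_type]/[expression] format: {line!r}")
    unknown = set(type_part) - set(OBJECT_TYPES)
    if unknown:
        raise ValueError(f"unknown object type(s): {sorted(unknown)}")
    types = {OBJECT_TYPES[letter] for letter in type_part}
    key, _, value = expression.partition("=")
    return types, key, value or None


highway = parse_expression("nwr/highway")
vilnius = parse_expression("r/name:en=Vilnius")
```

Validating the object type letters early catches typos such as x/highway before the whole OSM file is read.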
This script contracts the road graph within a specified area.
- function: contract_graph_in_area
- SQL procedure: contract_graph_in_area
- location: python/main.py
- required tables:
  - nodes
  - edges
  - road_segments
The SQL procedure contract_graph_in_area processes the graph in the following steps, visualized in the diagram below:
- Road Segments Table Creation: Generates a temporary table containing road segments within a target area. A road segment is a line between two subsequent nodes from the OSM data.
- Graph Contraction: Contracts the graph by creating a temporary table that holds the contraction information for each node.
- Node Updates: Updates the nodes in the database to mark some of them as contracted.
- Edge Creation: Generates edges for both contracted and non-contracted road segments.
- Contraction Segments Generation: Creates contraction segments to facilitate the creation of edges for contracted road segments.
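The general idea of graph contraction can be sketched in Python on an in-memory graph. This is not the SQL procedure and makes simplifying assumptions that the text above does not state: the graph is treated as undirected, a node is contracted when it has exactly two neighbours, parallel segments between the same pair of nodes are collapsed, and closed loops without any kept node are ignored. The function name contract_graph and the sample segments are hypothetical.

```python
from collections import defaultdict


def contract_graph(segments):
    """Contract chains of degree-two nodes into single edges.

    segments: list of (u, v, length) undirected road segments.
    Returns (contracted_nodes, edges), where edges connect the kept nodes
    and carry the summed length of the segments they replace.
    """
    adjacency = defaultdict(dict)
    for u, v, length in segments:
        adjacency[u][v] = length
        adjacency[v][u] = length

    contracted = {node for node, nbrs in adjacency.items() if len(nbrs) == 2}
    kept = set(adjacency) - contracted

    edges = []
    walked = set()  # directed steps (a, b) that must not start a new walk
    for start in sorted(kept):
        for first in sorted(adjacency[start]):
            if (start, first) in walked:
                continue
            walked.add((start, first))
            prev, node, total = start, first, adjacency[start][first]
            # Follow the chain through contracted nodes until a kept node.
            while node in contracted:
                nxt = next(n for n in adjacency[node] if n != prev)
                total += adjacency[node][nxt]
                prev, node = node, nxt
            walked.add((node, prev))  # block the same chain from its far end
            edges.append((start, node, total))
    return contracted, edges


segments = [(1, 2, 10.0), (2, 3, 5.0), (3, 4, 7.0), (4, 5, 3.0), (4, 6, 2.0)]
contracted, edges = contract_graph(segments)
```

In the sample, nodes 2 and 3 lie on a chain between nodes 1 and 4, so the three segments between them are replaced by one edge whose length is their sum.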
The exporter component is responsible for exporting the processed data from the database. Currently, the following formats are supported:
- CSV: exports the data to two CSV files: one for nodes and one for edges. The columns are separated by a tab character.
- Shapefile: exports the data to two shapefiles: one for nodes and one for edges.
The output files contain the following fields:
The nodes file contains:
- id: the unique identifier of the node. The id goes from 0 to the number of exported nodes - 1, so it can be used as an index.
- db_id: the unique identifier of the node in the database.
- x: the x-coordinate of the node.
- y: the y-coordinate of the node.
The edges file contains:
- u: the id of the starting node of the edge.
- v: the id of the ending node of the edge.
- db_id_from: the unique identifier of the starting node in the database.
- db_id_to: the unique identifier of the ending node in the database.
- length: the length of the edge in meters.
- speed: the speed on the edge in km/h.
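As an illustration of consuming the CSV export, the following sketch reads tab-separated node and edge data with the Python standard library and derives a travel time for an edge. It assumes that the exported files start with a header row holding the field names listed above, which this document does not confirm; the sample rows are invented, and in practice the streams would be opened files rather than in-memory strings.

```python
import csv
import io


def read_tsv(stream):
    """Read a tab-separated stream with a header row into a list of dicts."""
    return list(csv.DictReader(stream, delimiter="\t"))


def travel_time_s(edge):
    """Travel time in seconds: length [m] divided by speed converted to m/s."""
    return float(edge["length"]) / (float(edge["speed"]) / 3.6)


# Invented sample data in the documented column layout.
nodes_tsv = "id\tdb_id\tx\ty\n0\t101\t25.28\t54.69\n1\t102\t25.29\t54.70\n"
edges_tsv = "u\tv\tdb_id_from\tdb_id_to\tlength\tspeed\n0\t1\t101\t102\t1250.0\t50\n"

nodes = read_tsv(io.StringIO(nodes_tsv))
edges = read_tsv(io.StringIO(edges_tsv))

# Because id runs from 0 to n - 1, u and v index the node list directly.
first_edge = edges[0]
start_node = nodes[int(first_edge["u"])]
end_node = nodes[int(first_edge["v"])]
seconds = travel_time_s(first_edge)
```

Using id as a list index avoids building a lookup table from db_id, which is the practical benefit of the zero-based numbering.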