Understanding the complete UK transit network relies on knowledge of, and software that can parse, the various transit feeds, such as bus data, provided in TransXChange format, and train data, provided in CIF format.
The initial task was to convert these formats to GTFS. The team identified two viable converters: (i) the C#-based TransXChange2GTFS to convert TransXChange data; and (ii) the SQL-based dtd2mysql to convert CIF data. The TransXChange2GTFS code was modified by the Campus and the changes were successfully pushed back to the original repository. The team behind dtd2mysql, planar network, have since created their own TransXChange to GTFS converter, which does not require a C# compiler.
Below is a more detailed step-by-step guide on how these converters are used.
A GTFS folder typically comprises the following files:
Filename | Description | Required? |
---|---|---|
agency.txt | Contains information about the service operator | Yes |
stops.txt | Contains details of each stop in the timetables provided | Yes |
routes.txt | Contains information about the route | Yes |
trips.txt | Contains information about each trip on a route and service | Yes |
stop_times.txt | Contains the arrival and departure times at each stop on a trip | Yes |
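calendar.txt | The start and end dates of journeys, and the days of the week on which they run | Yes (unless all service dates are defined in calendar_dates.txt) |
calendar_dates.txt | Shows exceptions for journeys for holidays etc | Optional (required if calendar.txt is omitted) |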
fare_attributes.txt | Contains information about journey fares | Optional |
fare_rules.txt | Assigns fares to certain journeys | Optional |
transfers.txt | Transfer type and time between stops | Optional |
Transport network models such as OpenTripPlanner (OTP) require a ZIP folder of these files.
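
As a quick sanity check before handing a feed to OTP, the sketch below lists which of the files the table above marks as required are missing from the top level of a GTFS ZIP. The function name `missing_gtfs_files` is hypothetical and used only for illustration; it is not part of OTP or of either converter.

```python
import zipfile

# Files the table above lists as required; adjust to suit your feed.
REQUIRED = {
    "agency.txt", "stops.txt", "routes.txt",
    "trips.txt", "stop_times.txt", "calendar.txt",
}

def missing_gtfs_files(zip_path):
    """Return required GTFS files that are absent from the top level of the ZIP."""
    with zipfile.ZipFile(zip_path) as zf:
        # Names containing "/" sit inside a subfolder and are ignored here.
        top_level = {name for name in zf.namelist() if "/" not in name}
    return sorted(REQUIRED - top_level)
```

An empty list means every required file was found at the top level; any name returned is either missing or sitting inside a subfolder.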
UK bus data in TransXChange format can be downloaded following the creation of an account at the Traveline website. The data is categorised by region. For our work, we downloaded the Wales (W) data. The data is contained in a series of XML files for each bus journey. For example, here is a snippet of `CardiffBus28-CityCentre-CityCentre6_TXC_2018803-1215_CBAO028A.xml`:
<?xml version="1.0" encoding="utf-8"?>
<TransXChange xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="http://www.transxchange.org.uk/ http://www.transxchange.org.uk/schema/2.1/TransXChange_general.xsd" CreationDateTime="2018-08-03T12:15:26" ModificationDateTime="2018-08-03T12:15:26" Modification="revise" RevisionNumber="1" FileName="CardiffBus28-CityCentre-CityCentre6_TXC_2018803-1215_CBAO028A.xml" SchemaVersion="2.1" RegistrationDocument="false" xmlns="http://www.transxchange.org.uk/">
<ServicedOrganisations>
<ServicedOrganisation>
<OrganisationCode>CDS</OrganisationCode>
<Name>Cardiff</Name>
<WorkingDays>
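
To inspect a TransXChange file before converting it, something like the following may help. It is a minimal sketch that reads only the elements and attributes visible in the snippet above (`FileName`, `SchemaVersion`, `ServicedOrganisation`, `OrganisationCode`, `Name`) and assumes a complete, well-formed file. The function name `summarise_txc` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <TransXChange> root element.
TXC_NS = {"txc": "http://www.transxchange.org.uk/"}

def summarise_txc(xml_source):
    """Return header attributes and serviced-organisation (code, name) pairs."""
    root = ET.parse(xml_source).getroot()
    organisations = [
        (org.findtext("txc:OrganisationCode", namespaces=TXC_NS),
         org.findtext("txc:Name", namespaces=TXC_NS))
        for org in root.iterfind(".//txc:ServicedOrganisation", TXC_NS)
    ]
    return {
        "file_name": root.get("FileName"),
        "schema_version": root.get("SchemaVersion"),
        "organisations": organisations,
    }
```

`xml_source` can be a path to an XML file or an open binary file object.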
Initially, we used TransXChange2GTFS to convert the TransXChange files into GTFS format. TransXChange2GTFS is a C# tool. Our method to convert the data was:
- Place the XML files in the 'dir/input' folder.
- Run the Program.cs file (i.e., `dotnet run Program.cs`).
- The GTFS txt files will be created in the 'dir/output' folder.
- Compress the txt files to a ZIP folder with an appropriate name (e.g., 'bus_GTFS.zip').
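
The final compression step can also be scripted. The sketch below is one possible way to do it with Python's standard library; the function name `zip_gtfs_txt` is hypothetical, and the paths in the usage note simply reuse the folder and file names from the list above.

```python
import zipfile
from pathlib import Path

def zip_gtfs_txt(output_dir, zip_path):
    """Write every .txt file in output_dir to the top level of a new ZIP archive."""
    txt_files = sorted(Path(output_dir).glob("*.txt"))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for txt in txt_files:
            # arcname drops the directory part so the file sits at the ZIP root.
            zf.write(txt, arcname=txt.name)
    return [txt.name for txt in txt_files]
```

For example, `zip_gtfs_txt("dir/output", "bus_GTFS.zip")` would bundle the converter's output into a single archive with no subfolders.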
The planar network team, whose tool we initially used to convert the UK train data to GTFS, have created a TypeScript TransXChange to GTFS converter, transxchange2gtfs. Their GitHub page provides detailed instructions for installing the converter and converting the files. The method we used was:
- Install the converter as per the GitHub instructions.
- Run `transxchange2gtfs path/to/GTFS/file.zip gtfs-output.zip` in terminal/command line.
As mentioned above, UK train data in CIF format can be downloaded from here following the creation of an account. The timetable data will download as a zipped folder named 'ttis***.zip'.
Inside the zipped folder will be the following files: ttfis***.alf, ttfis***.dat, ttfis***.flf, ttfis***.mca, ttfis***.msn, ttfis***.set, ttfis***.tsi, and ttfis***.ztr. Most of these files are difficult to read, hence the need for GTFS.
We used the SQL tool dtd2mysql, created by planar network, to convert the files into a SQL database, then into the GTFS format. The dtd2mysql GitHub page gives a guide on how to convert the data. The method used here was:
- Create a SQL database with an appropriate name (e.g., 'train_database'). Note: this is most easily done under the root username with no password.
- Run the following in a new terminal/command line window within an appropriate directory:
DATABASE_USERNAME=root DATABASE_NAME=train_database dtd2mysql --timetable /path/to/ttisxxx.ZIP
- Run the following to download the GTFS files into the root directory:
DATABASE_USERNAME=root DATABASE_NAME=train_database dtd2mysql --gtfs-zip train_GTFS.zip
- As OpenTripPlanner (OTP) requires GTFS files to not be stored in subfolders in the GTFS zip file, extract the downloaded 'train_GTFS.zip' and navigate to the subfolder level where the txt files are kept, then zip these files to a folder with an appropriate name (e.g., 'train_GTFS.zip').
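
The extract-and-rezip step above can be approximated in code. This is a rough sketch rather than anything provided by dtd2mysql or OTP: it copies every txt file out of whatever subfolder it sits in and writes it to the top level of a new archive. The function name `flatten_gtfs_zip` and the output name in the usage note are hypothetical, and the source and destination must be different files.

```python
import zipfile
from pathlib import PurePosixPath

def flatten_gtfs_zip(src_zip, dst_zip):
    """Copy every .txt member of src_zip to the top level of dst_zip."""
    written = []
    with zipfile.ZipFile(src_zip) as src, \
         zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Keep only the final path component, discarding any subfolders.
            name = PurePosixPath(info.filename).name
            if info.is_dir() or not name.endswith(".txt"):
                continue
            if name in written:
                raise ValueError(f"duplicate file name after flattening: {name}")
            dst.writestr(name, src.read(info))
            written.append(name)
    return written
```

For example, `flatten_gtfs_zip("train_GTFS.zip", "train_GTFS_flat.zip")` writes a flattened copy that can then be renamed as needed.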
Note: if you receive a 'group_by' error, you will need to temporarily or permanently disable `ONLY_FULL_GROUP_BY` in MySQL.
The converted GTFS ZIP files may not work directly with OpenTripPlanner (OTP). Often this is caused by stops within the stops.txt file that are not handled by other parts of the GTFS feed, but there are other issues too, such as the latitude and longitude of stops being set to 0. In propeR we have created a function called `cleanGTFS()` to clean and preprocess the GTFS files. To run:
#R
library(propeR)
cleanGTFS(gtfs.dir, gtfs.filename)
Where `gtfs.dir` is the directory where the GTFS ZIP folder is located, and `gtfs.filename` is the filename of the GTFS feed. This will create a new, cleaned GTFS ZIP folder in the same location as the old ZIP folder, but with the suffix '_new'. Run this function for each GTFS feed.
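
To see whether a feed is affected by the two issues mentioned above before cleaning it, a check along the following lines could be used. This is not how `cleanGTFS()` is implemented; it is an independent sketch that assumes "not handled by other parts of the feed" means a stop never referenced in stop_times.txt, and that treats a blank coordinate the same as 0. The function names are hypothetical.

```python
import csv
import io
import zipfile

def read_gtfs_table(zf, name):
    """Read one GTFS text file from an open ZipFile into a list of dict rows."""
    with zf.open(name) as handle:
        # utf-8-sig strips a byte-order mark if the feed has one.
        text = io.TextIOWrapper(handle, encoding="utf-8-sig", newline="")
        return list(csv.DictReader(text))

def find_problem_stops(zip_path):
    """Return (zero_coordinate_ids, unreferenced_ids) for the stops in a GTFS ZIP."""
    with zipfile.ZipFile(zip_path) as zf:
        stops = read_gtfs_table(zf, "stops.txt")
        stop_times = read_gtfs_table(zf, "stop_times.txt")
    # Assumption: a stop is "used" if any stop_times row refers to it.
    used = {row["stop_id"] for row in stop_times}
    zero = [s["stop_id"] for s in stops
            if float(s["stop_lat"] or 0) == 0 or float(s["stop_lon"] or 0) == 0]
    unused = [s["stop_id"] for s in stops if s["stop_id"] not in used]
    return zero, unused
```

Note that some feeds legitimately contain stops with no stop_times rows (for example, parent stations), so the second list is a prompt for inspection rather than a list of errors.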