Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

document background & rationale #4

Merged
merged 1 commit into from
Mar 15, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
102 changes: 76 additions & 26 deletions readme.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,29 @@
# stable-public-transport-ids

**Get stable IDs for public transport stations, etc.** Currently very biased towards German [GTFS](https://developers.google.com/transit/gtfs/) & [`hafas-client`](https://github.com/public-transport/hafas-client) data.
**Get stable IDs for public transport data.**

[![npm version](https://img.shields.io/npm/v/@derhuerst/stable-public-transport-ids.svg)](https://www.npmjs.com/package/@derhuerst/stable-public-transport-ids)
[![build status](https://api.travis-ci.org/derhuerst/stable-public-transport-ids.svg?branch=master)](https://travis-ci.org/derhuerst/stable-public-transport-ids)
![ISC-licensed](https://img.shields.io/github/license/derhuerst/stable-public-transport-ids.svg)
[![chat with me on Gitter](https://img.shields.io/badge/chat%20with%20me-on%20gitter-512e92.svg)](https://gitter.im/derhuerst)
[![support me on Patreon](https://img.shields.io/badge/support%20me-on%20patreon-fa7664.svg)](https://patreon.com/derhuerst)

[*Why linked open transit data?*](https://github.com/public-transport/why-linked-open-transit-data) explains the background of this project:

> Because public transportation *data* reflects strongly interconnected public transportation *systems*, it has many links. **When data by an author/source "A" refers to data from *another* author/source "B", it needs a reliable and precise way to identify items in "B" data.** In federated systems, especially in [linked data](https://en.wikipedia.org/wiki/Linked_data) systems, the need for stable & globally unique IDs is even more significant than in traditional, centralized systems.

**This project explores how to derive such IDs *from the data itself* in a deterministic way**. There is an inherent trade-off: In order to prevent collisions, the input data (which the ID will be computed from) must be quite detailed; On the other hand, for these IDs to be easily computable (e.g. offline), only little data should have to be transferred & stored.

It is an ongoing process of

- generalising them enough to support all relevant kinds of public transportation infrastructure,
- designing them to identify local infrastructure precisely (enough),
- finding and "covering" enough edge cases for the standard to be practical in real-world szenarios,
- formally describing, testing and implementing the standard in multiple evironments & programming languages.

**In addition, we use indeterministic but well-known (and thus rather stable) identifiers**, such as [Wikidata](https://wikidata.org/) IDs, to work as a "stepping-stone" until the deterministic IDs have widespread adoption.

**This project computes *multiple* IDs per item, with a varying degree of precision (and thus uniqueness), stability and reusability.** Refer to the [*Usage* section](#usage) for more.

## Installation

@@ -18,13 +34,39 @@ npm install @derhuerst/stable-public-transport-ids

## Usage

```js
const getStopIds = require('@derhuerst/stable-public-transport-ids/stop')
const getLineIds = require('@derhuerst/stable-public-transport-ids/line')
const getArrDepIds = require('@derhuerst/stable-public-transport-ids/arrival-departure')
*Note:* This project is currently strongly biased towards *German* [GTFS](https://developers.google.com/transit/gtfs/) & [`hafas-client`](https://github.com/public-transport/hafas-client) data.

For each supported "type", this package exposes a function that generates a list of IDs. If any of these match any ID of another item (of the same type), they can be considered the same.

As an example, the function `areStopsTheSame` checks if two stops are the same:

```js
// This string will be used for all non-globally-unique pieces
// of identifying information (e.g. IDs from the provider).
// You could use the canonical abbreviation of the transit operator.
const dataSource = 'some-data-source'
const normalizeName = name => name.toLowerCase().trim()

// The following implementation is simplified for demonstration purposes.
// In practice, it should handle as many cases as possible:
// - normalize various Unicode chars to ASCII
// - remove inconsistent spaces
// - remove vendor-/API-specific prefixes & suffixes
const normalizeName = name => name.toLowerCase().trim().replace(/\s+/, '-')

const createGetStopIds = require('@derhuerst/stable-public-transport-ids/stop')
const getStopIds = createGetStopIds(dataSource, normalizeName)

const areStopsTheSame = (stopA, stopB) => {
const idsForA = getStopIds(stopA)
return getStopIds(stopB).some(idForB => idsForA.includes(idForB))
}
```

We can generate IDs for stops, lines & departures/arrivals as follows:

```js
const createGetLineIds = require('@derhuerst/stable-public-transport-ids/line')
const createGetArrDepIds = require('@derhuerst/stable-public-transport-ids/arrival-departure')

const stop = {
type: 'station',
@@ -36,8 +78,13 @@ const stop = {
longitude: 13.303846
}
}
const stopIds = getStopIds(dataSource, normalizeName)(stop)
const stopIds = getStopIds(stop)
console.log(stopIds)
// [
// '1:some data source:900000024101',
// '1:s charlottenburg:52.50:13.30'
// …
// ]

const line = {
type: 'line',
@@ -46,8 +93,14 @@ const line = {
public: true,
name: 'S9'
}
const lineIds = getLineIds(dataSource, normalizeName)(line)
const getLineIds = createGetLineIds(dataSource, normalizeName)
const lineIds = getLineIds(line)

console.log(lineIds)
// [
// '1:some data source:18299',
// '1:suburban:s9'
// ]

const dep = {
tripId: 'trip-12345',
@@ -62,26 +115,23 @@ const dep = {
}
const routeIds = []
const tripIds = [dep.tripId]
const getIds = getArrDepIds(stopIds, tripIds, routeIds, lineIds, normalizeName)
console.log(getIds('departure', dep))
const getArrDepIds = createGetArrDepIds(stopIds, tripIds, routeIds, lineIds, normalizeName)

console.log(getArrDepIds('departure', dep))
// [
// '1:departure:1:some data source:900000024101:trip-12345',
// '1:departure:1:s charlottenburg:52.50:13.30:trip-12345'
// …
// ]
```

```js
[
'1:some data source:900000024101',
'1:s charlottenburg:52.50:13.30'
// …
]
[
'1:some data source:18299',
'1:suburban:s9'
]
[
'1:departure:1:some data source:900000024101:trip-12345',
'1:departure:1:s charlottenburg:52.50:13.30:trip-12345'
// …
]
```

## Related

- [*Why linked open transit data?*](https://github.com/public-transport/why-linked-open-transit-data)
- [Linked Connections](https://linkedconnections.org)
- [TransitLand Onestop ID scheme](https://transit.land/documentation/onestop-id-scheme/)
- [*Identification of Fixed Objects in Public Transport (IFOPT)* ](https://en.wikipedia.org/wiki/Identification_of_Fixed_Objects_in_Public_Transport) [CEN](https://en.wikipedia.org/wiki/European_Committee_for_Standardization) spec


## Contributing