document background & rationale #4

derhuerst · 2020-01-21T13:29:57Z

@juliuste @pietercolpaert Is anything crucial missing? Keep in mind that this project is a prototype, so only the rationale is important to understand.

pietercolpaert

I waited a bit to comment because I have more questions than answers. Here we go...

I would have expected an algorithm description of how to calculate such an ID and on the basis of what input data. Did I miss it somewhere?

I would also expect a comparison to the way onestop id works. However, I have problems with that approach as:

It sets an arbitrary distance the stop may be temporarily moved (e.g., for maintenance)
False positives are likely to occur (cfr. Still more "false positive" issues with dist calc transitland/transitland-datastore#945)
The identifiers themselves are descriptive: shouldn’t we then just use blank nodes with a description that can be used by a reconciliation algorithm at a later time?

Reconciling a stop based on its description is something we should try to standardize (and maybe not call that an identifier?), but at least open up the possibility to have different kinds of descriptions based on the specific transport network as well as the specific back-end system’s scope.

Another project I find inspirational in that regard is this one: https://sharedstreets.io/. This type of location referencing could maybe also be a good idea for this project?


pietercolpaert · 2020-01-24T13:55:18Z

Just came across this Twitter post by @danbri: https://twitter.com/danbri/status/1220685660677443584 -- The discussion is very similar to this one. In my opinion we need two things: strive towards global identifiers to be used everywhere and, where that is not possible, have standardized methods to reconcile.

derhuerst · 2020-01-25T17:06:16Z

I would have expected an algorithm description of how to calculate such an ID and on the basis of what input data. Did I miss it somewhere?

Maybe this isn't clear from reading the readme (yet). This project computes multiple IDs for an item: rather unstable & imprecise "stepping-stone" IDs, as well as those supposed to be stable and widespread in the future.

What phrasing would you like to see to make this more clear?

I would also expect a comparison to the way onestop id works.

I didn't document this yet because, as mentioned in #2, there's no standalone reference implementation (or thorough documentation & set of test fixtures). Therefore, there's no reliable way to make sure the Transitland and stable-public-transport-ids implementations are compatible.

However, I have problems with that approach as it sets an arbitrary distance the stop may be temporarily moved (e.g., for maintenance) [...].

I agree! I think we could make it more stable by picking the precision based on the type/size of the structure, but it still doesn't solve the problem.
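To make the "precision based on the type/size of the structure" idea concrete, here is a minimal sketch. It is not how this project or Transitland computes IDs: the grid sizes, the `GRID_METRES` table and the `stable_id` format are all assumptions made up for illustration.

```python
import math

# Hypothetical grid sizes in metres per structure type. These values are
# illustrative assumptions, not taken from any real implementation.
GRID_METRES = {"platform": 50, "stop": 200, "station": 1000}

METRES_PER_DEGREE = 111_320  # rough length of one degree of latitude


def grid_cell(lat, lon, kind):
    """Snap a coordinate to a grid whose cell size depends on the structure type."""
    size = GRID_METRES[kind]
    lat_step = size / METRES_PER_DEGREE
    row = math.floor(lat / lat_step)
    # Use the latitude of the row, not the exact latitude, so that the
    # longitude step is identical for every point inside the same row.
    lon_step = size / (METRES_PER_DEGREE * math.cos(math.radians(row * lat_step)))
    return row, math.floor(lon / lon_step)


def stable_id(name, lat, lon, kind):
    """Build an ID from a normalised name plus the coarse grid cell."""
    slug = "-".join(name.lower().split())
    row, col = grid_cell(lat, lon, kind)
    return f"{kind}:{slug}:{row}:{col}"
```

With these assumed sizes, moving a point by a few dozen metres usually keeps a "station" ID but can change a "platform" ID. A structure that sits close to a cell border still flips with a tiny move, which is the part this approach does not solve.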

There is a more fundamental underlying question though: For how long do you consider a moved stop to be the same as the "former" one?

[...] as false positives are likely to occur (cfr. transitland/transitland-datastore#945).

Definitely! I still think the goal of deterministic IDs is worth pursuing though. Even a stable-in-99%-of-the-cases ID scheme is an improvement over arbitrary vendor-specific IDs.

The identifiers themselves are descriptive: shouldn’t we then just use blank nodes with a description that can be used by a reconciliation algorithm at a later time?

I thought about that too!

By accident, this may end up becoming a general model/ontology for describing public transportation infrastructure though (and would then heavily overlap in scope with Transmodel, GTFS, OSM, Wikidata), so I'd like to tackle this later.

Reconciling a stop based on its description is something we should try to standardize [...].

I agree, but would like to work on this once I have gotten an idea of how well this works across Europe.

[Reconciling a stop based on its description is something we should] maybe not call that an identifier? [...]

What is the difference between an ID for an item made up of its properties, and a (minimal) list of properties describing it as uniquely & precisely as possible? Deterministic IDs blend the two, right?

[We should] at least open up the possibility to have different kinds of descriptions based on the specific transport network as well as the specific back-end system’s scope.

That is what I wanted to achieve with the list of IDs for a single item. I should emphasise this more if it didn't come across from the text.

Another project I find inspirational in that regards is this one: https://sharedstreets.io/. This type of location referencing could maybe also be a good idea for this project?

Very interesting project! Will play with it.

Do you want to use it to reference e.g. a road next to the station, or just reuse its concepts?

I stumbled upon this conceptually similar question in sharedstreets/sharedstreets-ref-system#23

To rephrase the question then – is providing stable IDs across separate or evolving basemaps within the purview of SharedStreets? If not, what is the recommended process for reconciling datasets that were mapmatched to different basemaps or to sharedstreets geometry tiles generated from planet builds many months/years apart?

Doesn't mean you can't compare them, but our view is that we should create a map of how IDs evolved and then generate application specific translations for old data.
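A "map of how IDs evolved" could be as simple as a lookup from each retired ID to its successor. The sketch below only illustrates that idea; it is not the SharedStreets API, and the `translate` helper and the IDs in the usage example are hypothetical.

```python
def translate(old_id, successors):
    """Follow recorded ID changes until reaching an ID with no known successor."""
    seen = set()
    current = old_id
    # The `seen` set guards against accidental cycles in the mapping.
    while current in successors and current not in seen:
        seen.add(current)
        current = successors[current]
    return current


# Hypothetical history: the same item was re-identified twice.
history = {"ref-2018": "ref-2019", "ref-2019": "ref-2020"}
print(translate("ref-2018", history))
```

An ID that never changed is returned as-is, so old and new data can be passed through the same translation step.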

pietercolpaert · 2020-01-26T19:34:48Z

By accident, this may end up becoming a general model/ontology for describing public transportation infrastructure though (and would then heavily overlap in scope with Transmodel, GTFS, OSM, Wikidata), so I'd like to tackle this later.

What is the difference between an ID for an item made up of its properties, and a (minimal) list of properties describing it as uniquely & precisely as possible? Deterministic IDs blend the two, right?

For exactly this reason I find the term “deterministic IDs” confusing. When an ID must be deterministic, the fact that all systems need to adhere to the same structure will break certain use cases. We want a decentralized approach because different organizations have different political, organizational and cultural aspects to take into account.

I like the term “deterministic ID” when used in combination with a global identification system. E.g., a certain URL may follow a certain deterministic URI scheme (e.g., http://example.org/{agency}/{gtfs_version}/{gtfs_id}), but the fact that across servers you may not assume a certain meaning from the URI alone makes sure it works globally. A different server may for example use http://example.org/{agency}/{gtfs_stop_code}.

On shared streets: Do you want to use it to reference e.g. a road next to the station, or just reuse its concepts?

As stops are always attached to one or multiple roads, I find this an incredibly interesting idea. It would bring together geospatial referencing as well as a global ID approach.

derhuerst · 2020-01-27T13:47:38Z

As stops are always attached to one or multiple roads

Are they? This may not work all the time for watercraft stops & airports. I quickly looked up a minor airport which is only connected to the road network by an aeroway=taxiway.

derhuerst · 2020-01-27T14:06:33Z

What is the difference between an ID for an item made up of its properties, and a (minimal) list of properties describing it as uniquely & precisely as possible? Deterministic IDs blend the two, right?

For exactly this reason I find the term “deterministic IDs” confusing. When an ID must be deterministic, the fact that all systems need to adhere to the same structure will break certain use cases. We want a decentralized approach because different organizations have different political, organizational and cultural aspects to take into account.

I assume we only have different interpretations of the terms we used, not of the concepts.

If two systems shared a "deterministic ID", they would have to use the same structure. If they needed different structures, they would yield different IDs.

I like the term “deterministic ID” when used in combination with a global identification system. E.g., a certain URL may follow a certain deterministic URI scheme (e.g., http://example.org/{agency}/{gtfs_version}/{gtfs_id}), but the fact that across servers you may not assume a certain meaning from the URI alone makes sure it works globally. A different server may for example use http://example.org/{agency}/{gtfs_stop_code}.

In my understanding, the two URLs that you mentioned contain one ID each. {agency}/{gtfs_version}/{gtfs_id} would be one "ID scheme"/structure, {agency}/{gtfs_stop_code} would be another one.
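As a rough illustration of "one ID per scheme", the two templates can be filled from the same stop record. The scheme names in `SCHEMES`, the `build_id` helper and the field values are made up for this sketch; only the two templates come from the discussion.

```python
# The two templates from the discussion, keyed by hypothetical scheme names.
SCHEMES = {
    "by-gtfs-id": "{agency}/{gtfs_version}/{gtfs_id}",
    "by-stop-code": "{agency}/{gtfs_stop_code}",
}


def build_id(scheme, stop):
    """Fill one ID scheme from a stop record; fields the scheme ignores are skipped."""
    return SCHEMES[scheme].format(**stop)


# Hypothetical stop record.
stop = {
    "agency": "vbb",
    "gtfs_version": "2020-01",
    "gtfs_id": "900000003201",
    "gtfs_stop_code": "BHF",
}

for scheme in SCHEMES:
    print(scheme, "->", build_id(scheme, stop))
```

Two systems only arrive at the same ID if they agree on both the scheme and the input fields. With different schemes, the same stop gets different IDs.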

pietercolpaert · 2020-01-27T14:09:13Z

Are they? This may not work all the time for watercraft stops & airports. I quickly looked up a minor airport which is only connected to the road network by an aeroway=taxiway.

Still has a location where you should be dropped off, right? I guess the location I’m mostly interested in is where one must enter?

derhuerst · 2020-01-31T16:25:00Z

Still has a location where you should be dropped off, right? I guess the location I’m mostly interested in is where one must enter?

I think we have a misunderstanding here. 😅

While I agree from an end-user UX point of view, this is about getting the IDs as widely adopted & stable as possible, so IMO the priorities are slightly different:

As an example, there might be three data sets with 1) general station info and connections between them, 2) where & how to enter each station, with accessibility info, and 3) if the elevators at each station are working. Each of these 3 APIs considers different "aspects" of the train station to be important for identifying it, but probably all know roughly about its location, name and size.

When combining all three data sets using some kind of shared ID, we enable the UX that end users want & need: just getting to know how to get from the street nearby into the train to their destination.
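A minimal sketch of that combination step, assuming each record carries a list of IDs and two records describe the same station as soon as their lists overlap. The record layout (`ids`, `data`) and the `merge_by_shared_ids` helper are assumptions for illustration, not this project's API.

```python
def merge_by_shared_ids(*datasets):
    """Merge records from several data sets whenever their ID lists overlap."""
    merged = []  # list of (set of IDs, combined data) pairs
    for dataset in datasets:
        for record in dataset:
            ids = set(record["ids"])
            data = dict(record["data"])
            remaining = []
            # Absorb every existing group that shares at least one ID.
            for group_ids, group_data in merged:
                if group_ids & ids:
                    ids |= group_ids
                    data = {**group_data, **data}
                else:
                    remaining.append((group_ids, group_data))
            remaining.append((ids, data))
            merged = remaining
    return merged


# Hypothetical records from the three kinds of data sets described above.
stations = [{"ids": ["a:1", "b:x"], "data": {"name": "Hbf"}}]
entrances = [{"ids": ["b:x"], "data": {"entrance": "north"}}]
elevators = [{"ids": ["a:1"], "data": {"elevator_ok": True}}]

for ids, data in merge_by_shared_ids(stations, entrances, elevators):
    print(sorted(ids), data)
```

The entrance and elevator records never share an ID with each other here. They end up in the same group only because the station record carries both IDs, which is why publishing several IDs per item helps.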

taken from derhuerst/stable-public-transport-ids#4

derhuerst · 2020-03-04T12:08:23Z

Let's continue discussing this over at public-transport/why-linked-open-transit-data#1.

pietercolpaert suggested changes Jan 22, 2020


derhuerst added a commit to public-transport/why-linked-open-transit-data that referenced this pull request Mar 4, 2020

move text from stable-public-transport-ids rationale here

49390ec

taken from derhuerst/stable-public-transport-ids#4

derhuerst mentioned this pull request Mar 4, 2020

identification using fixed-structure stable IDs vs descriptive properties public-transport/why-linked-open-transit-data#1

Open

derhuerst force-pushed the master branch from 4d895dc to a76a5ca Compare March 6, 2020 17:15

readme: add rationale, better Usage section 📝

9692139

derhuerst force-pushed the rationale branch from 4618cad to 9692139 Compare March 15, 2020 13:48

derhuerst merged commit 536a91c into master Mar 15, 2020

derhuerst deleted the rationale branch March 15, 2020 13:49
