document background & rationale #4

Merged
merged 1 commit into from
Mar 15, 2020

derhuerst
Owner

@juliuste @pietercolpaert Is anything crucial missing? Keep in mind that this project is a prototype, so only the rationale is important to understand.

@pietercolpaert pietercolpaert left a comment

I waited a bit to comment because I have more questions than answers. Here we go...

I would have expected an algorithm description of how to calculate such an ID and on the basis of what input data. Did I miss it somewhere?

I would also expect a comparison to the way onestop id works. However, I have problems with that approach as:

Reconciling a stop based on its description is something we should try to standardize (and maybe not call that an identifier?), but at least open up the possibility to have different kinds of descriptions based on the specific transport network as well as the specific back-end system’s scope.

Another project I find inspirational in that regard is this one: https://sharedstreets.io/. This type of location referencing could maybe also be a good idea for this project?

@pietercolpaert

Just came across this Twitter post by @danbri: https://twitter.com/danbri/status/1220685660677443584. The discussion is very similar to this one. In my opinion we need two things: strive towards global identifiers that are used everywhere and, where that is not possible, have standardized methods to reconcile.

@derhuerst
Owner Author

I would have expected an algorithm description of how to calculate such an ID and on the basis of what input data. Did I miss it somewhere?

Maybe this isn't clear from reading the readme (yet). This project computes multiple IDs for an item: rather unstable & imprecise "stepping-stone" IDs, as well as those supposed to be stable and widespread in the future.

What phrasing would you like to see to make this more clear?
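
To illustrate the idea of several IDs per item, here is a minimal Python sketch. It is not the project's actual algorithm: the function names, the ID layout and the rounding precisions are assumptions made purely for illustration.

```python
import re


def normalize_name(name: str) -> str:
    """Lowercase the name and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def candidate_ids(name: str, lat: float, lon: float) -> list[str]:
    """Return several IDs for one stop, from coarse to fine (hypothetical layout)."""
    slug = normalize_name(name)
    return [
        # coarse "stepping-stone" ID: tolerant of small coordinate differences,
        # but more likely to collide with a neighbouring stop
        f"{slug}:{lat:.2f}:{lon:.2f}",
        # finer ID: fewer collisions, but breaks when the coordinates differ slightly
        f"{slug}:{lat:.3f}:{lon:.3f}",
    ]


# Two data sets describing the same station with slightly different coordinates
ids_a = candidate_ids("Berlin Hbf", 52.5251, 13.3694)
ids_b = candidate_ids("Berlin Hbf", 52.5256, 13.3690)

# The items can be matched if at least one of their IDs is equal
shared = set(ids_a) & set(ids_b)
```

In this sketch the two data sets only agree on the coarse ID, which is the trade-off between the "stepping-stone" IDs and the more precise ones.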

I would also expect a comparison to the way onestop id works.

I didn't document this yet because, as mentioned in #2, there's no standalone reference implementation (or thorough documentation & set of test fixtures). Therefore, there's no reliable way to make sure the Transitland and stable-public-transport-ids implementations are compatible.

However, I have problems with that approach as it sets an arbitrary distance the stop may be temporarily moved (e.g., for maintenance) [...].

I agree! I think we could make it more stable by picking the precision based on the type/size of the structure, but it still doesn't solve the problem.
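
A rough sketch of what "precision based on the type/size of the structure" could look like, assuming a simple latitude/longitude grid. The cell sizes and the `location_key` helper are hypothetical and not taken from this project or from Transitland.

```python
import math

# Hypothetical grid cell sizes in degrees, chosen per type of structure
CELL_SIZE = {"airport": 0.05, "station": 0.01, "stop": 0.001}


def location_key(kind: str, lat: float, lon: float) -> str:
    """Snap a coordinate to the grid cell used for this kind of structure."""
    cell = CELL_SIZE[kind]
    # integer cell indices avoid float formatting issues in the key
    return f"{kind}:{math.floor(lat / cell)}:{math.floor(lon / cell)}"


# The same displacement of roughly 250 m keeps the key of a large station
# but changes the key of a small stop.
station_before = location_key("station", 52.5251, 13.3694)
station_after = location_key("station", 52.5273, 13.3694)
stop_before = location_key("stop", 52.5251, 13.3694)
stop_after = location_key("stop", 52.5273, 13.3694)
```

Any fixed grid still has cell boundaries, so a structure sitting close to one can change its key after a very small move. That is the part this approach does not solve.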

There is a more fundamental underlying question though: For how long do you consider a moved stop to be the same as the "former" one?

[...] as false positives are likely to occur (cfr. transitland/transitland-datastore#945).

Definitely! I still think the goal of deterministic IDs is worth pursuing though. Even a stable-in-99%-of-the-cases ID scheme is an improvement over arbitrary vendor-specific IDs.

The identifiers themselves are descriptive: shouldn’t we then just use blank nodes with a description that can be used by a reconciliation algorithm at a later time?

I thought about that too!

By accident, this may end up becoming a general model/ontology for describing public transportation infrastructure though (and would then heavily overlap in scope with Transmodel, GTFS, OSM, Wikidata), so I'd like to tackle this later.

Reconciling a stop based on its description is something we should try to standardize [...].

I agree, but would like to work on this once I have gotten an idea of how well this works across Europe.

[Reconciling a stop based on its description is something we should] maybe not call that an identifier? [...]

What is the difference between an ID for an item made up of its properties, and a (minimal) list of properties describing it as uniquely & precisely as possible? Deterministic IDs blend the two, right?

[We should] at least open up the possibility to have different kinds of descriptions based on the specific transport network as well as the specific back-end system’s scope.

That is what I wanted to achieve with the list of IDs for a single item. I should emphasise this more if it didn't come across from the text.


Another project I find inspirational in that regard is this one: https://sharedstreets.io/. This type of location referencing could maybe also be a good idea for this project?

Very interesting project! Will play with it.

Do you want to use it to reference e.g. a road next to the station, or just reuse its concepts?

I stumbled upon this conceptually similar question in sharedstreets/sharedstreets-ref-system#23

To rephrase the question then – is providing stable IDs across separate or evolving basemaps within the purview of SharedStreets? If not, what is the recommended process for reconciling datasets that were mapmatched to different basemaps or to sharedstreets geometry tiles generated from planet builds many months/years apart?

Doesn't mean you can't compare them, but our view is that we should create a map of how IDs evolved and then generate application specific translations for old data.
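
The "map of how IDs evolved" mentioned in that answer could, in its simplest form, be a lookup table from each replaced ID to its successor. The following Python sketch assumes exactly that; the `resolve` function and the example IDs are made up and do not reflect how SharedStreets actually stores or translates references.

```python
def resolve(old_id: str, successors: dict[str, str]) -> str:
    """Follow the recorded ID changes until the most recent ID is reached."""
    seen = set()
    current = old_id
    while current in successors:
        if current in seen:
            # guard against a corrupt history that loops back on itself
            raise ValueError(f"cycle in ID history at {current!r}")
        seen.add(current)
        current = successors[current]
    return current


# Hypothetical history: each key was replaced by its value in a later basemap
history = {"ref-2018-a": "ref-2019-a", "ref-2019-a": "ref-2020-a"}

latest = resolve("ref-2018-a", history)
```

Old data keyed by `ref-2018-a` can then be translated to the current ID, while IDs that never changed are returned as they are.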

@pietercolpaert

By accident, this may end up becoming a general model/ontology for describing public transportation infrastructure though (and would then heavily overlap in scope with Transmodel, GTFS, OSM, Wikidata), so I'd like to tackle this later.

What is the difference between an ID for an item made up of its properties, and a (minimal) list of properties describing it as uniquely & precisely as possible? Deterministic IDs blend the two, right?

For exactly this reason I find the term “deterministic IDs” confusing. When an ID must be deterministic, the fact that all systems need to adhere to the same structure will break certain use cases. The reason we want a decentralized approach is that different organizations have different political, organizational and cultural aspects to take into account.

I like the term “deterministic ID” when used in combination with a global identification system. E.g., a certain URL may follow a certain deterministic URI scheme (e.g., http://example.org/{agency}/{gtfs_version}/{gtfs_id}), but the fact that across servers you may not assume a certain meaning from the URI alone makes sure it works globally. A different server may for example use http://example.org/{agency}/{gtfs_stop_code}.

On shared streets: Do you want to use it to reference e.g. a road next to the station, or just reuse its concepts?

As stops are always attached to one or multiple roads, I find this an incredibly interesting idea. It would bring together geospatial referencing as well as a global ID approach.

@derhuerst
Owner Author

As stops are always attached to one or multiple roads

Are they? This may not work all the time for watercraft stops & airports. I quickly looked up a minor airport which is only connected to the road network by an aeroway=taxiway.

@derhuerst
Owner Author

What is the difference between an ID for an item made up of its properties, and a (minimal) list of properties describing it as uniquely & precisely as possible? Deterministic IDs blend the two, right?

For exactly this reason I find the term “deterministic IDs” confusing. When an ID must be deterministic, the fact that all systems need to adhere to the same structure will break certain use cases. The reason we want a decentralized approach is that different organizations have different political, organizational and cultural aspects to take into account.

I assume we only have different interpretations of the terms we used, not of the concepts.

If two systems shared a "deterministic ID", they would have to use the same structure. If they needed different structures, they would yield different IDs.

I like the term “deterministic ID” when used in combination with a global identification system. E.g., a certain URL may follow a certain deterministic URI scheme (e.g., http://example.org/{agency}/{gtfs_version}/{gtfs_id}), but the fact that across servers you may not assume a certain meaning from the URI alone makes sure it works globally. A different server may for example use http://example.org/{agency}/{gtfs_stop_code}.

In my understanding, the two URLs that you mentioned contain one ID each. {agency}/{gtfs_version}/{gtfs_id} would be one "ID scheme"/structure, {agency}/{gtfs_stop_code} would be another one.
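
As a small Python sketch of this reading, the two URL templates from the comment above can be treated as two separate ID schemes applied to the same stop. The `build_id` helper and all field values are invented for illustration.

```python
# The two URI templates mentioned in the discussion, treated as two ID schemes
SCHEME_A = "http://example.org/{agency}/{gtfs_version}/{gtfs_id}"
SCHEME_B = "http://example.org/{agency}/{gtfs_stop_code}"


def build_id(scheme: str, **fields: str) -> str:
    """Fill a scheme template; fields the scheme does not use are ignored."""
    return scheme.format(**fields)


# Made-up field values for a single stop
stop = {
    "agency": "some-agency",
    "gtfs_version": "2020-03",
    "gtfs_id": "12345",
    "gtfs_stop_code": "ABC",
}

id_a = build_id(SCHEME_A, **stop)
id_b = build_id(SCHEME_B, **stop)
```

Each scheme is deterministic on its own: the same input always yields the same ID. The two results differ because the structures differ, so systems can only share an ID if they share the scheme.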

@pietercolpaert

Are they? This may not work all the time for watercraft stops & airports. I quickly looked up a minor airport which is only connected to the road network by an aeroway=taxiway.

Still has a location where you should be dropped off, right? I guess the location I’m mostly interested in is where one must enter?

@derhuerst
Owner Author

Still has a location where you should be dropped off, right? I guess the location I’m mostly interested in is where one must enter?

I think we have a misunderstanding here. 😅

While I agree from an end-user UX point of view, this is about getting the IDs as widely adopted & stable as possible, so IMO the priorities are slightly different:

As an example, there might be three data sets with 1) general station info and connections between them, 2) where & how to enter each station, with accessibility info, and 3) if the elevators at each station are working. Each of these 3 APIs considers different "aspects" of the train station to be important for identifying it, but probably all know roughly about its location, name and size.

When combining all three data sets using some kind of shared ID, we enable the UX that end users want & need: just getting to know how to get from the street nearby into the train to their destination.
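
A minimal Python sketch of that combination step, assuming all three data sets already expose the same shared ID for a station. The `merge_by_shared_id` helper, the ID and the attribute names are hypothetical.

```python
def merge_by_shared_id(*datasets: dict[str, dict]) -> dict[str, dict]:
    """Combine the attributes that several data sets hold for the same ID."""
    merged: dict[str, dict] = {}
    for dataset in datasets:
        for shared_id, attributes in dataset.items():
            merged.setdefault(shared_id, {}).update(attributes)
    return merged


# Made-up records from the three kinds of data sets described above
stations = {"station-1": {"name": "Example Station", "connections": ["station-2"]}}
entrances = {"station-1": {"entrances": ["north", "south"], "step_free": True}}
elevators = {"station-1": {"elevators_working": True}}

combined = merge_by_shared_id(stations, entrances, elevators)
```

The merge itself is trivial once the IDs match, which is why the effort goes into making the shared ID stable and widely adopted.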

@derhuerst
Owner Author

Let's continue discussing this over at public-transport/why-linked-open-transit-data#1.

@derhuerst derhuerst merged commit 536a91c into master Mar 15, 2020
@derhuerst derhuerst deleted the rationale branch March 15, 2020 13:49

2 participants