Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating examples to use names instead of integer ids whenever possible #1070

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

ArlindKadra
Copy link
Member

@ArlindKadra ArlindKadra commented May 3, 2021

Reference Issue

#1066, #1063

What does this PR implement/fix? Explain your changes.

  • Uses names instead of integer ids in the examples whenever possible.
  • Update documentation for get_dataset
  • Check implementation of examples for datasets and splits, update them if necessary in case they do not show that they can be accessed with names too.

How should this PR be tested?

Existing unit tests.

@ArlindKadra ArlindKadra requested a review from mfeurer May 3, 2021 18:02
@ArlindKadra ArlindKadra removed the request for review from mfeurer May 3, 2021 18:53
Comment on lines +371 to +372
Dataset ID of the dataset to download. It can be an integer or it can be a string
of the dataset name.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Dataset ID of the dataset to download. It can be an integer or it can be a string
of the dataset name.
The ID or name of the dataset to download.

Since ID's are parsed to integer anyway if they're given as strings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PGijsbers I can also do that, however, I tried to keep it consistent with

dataset_ids : iterable
Integers or strings representing dataset ids or dataset names.
If dataset names are specified, the least recent still active dataset version is returned.
at get_datasets

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the difference is fine, or should favor the simpler wording.

@ArlindKadra
Copy link
Member Author

@PGijsbers @mfeurer While making the modifications I noticed a study that actually had no alias, can that be a case?
And also can we add an alias to it? It was at the 30_extended/study_tutorial, study https://www.openml.org/s/123

@PGijsbers
Copy link
Collaborator

An alias is not required, but I'll ask people with access if they're prepared to add one (I don't believe there's an API call for it).

@ArlindKadra
Copy link
Member Author

An alias is not required, but I'll ask people with access if they're prepared to add one (I don't believe there's an API call for it).

Ah thanks. Yeah, then maybe it is not worth the effort. I can just keep it like that and write a comment at the tutorial like we discussed for the datasets.

@mfeurer
Copy link
Collaborator

mfeurer commented May 6, 2021

Hey, what do you think about adding the fact that a user should always use names and version instead of the IDs when creating examples to the pull request template? (Also maybe that a warning should be emitted when using the test server, although that's not really related to this PR)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants