Update language list #158

Merged
merged 4 commits into from
Sep 24, 2024

Conversation

@kunfang98927 kunfang98927 commented Sep 17, 2024

By calling the Wikidata API, we import 601 languages in total. This is the complete list of languages that Wikidata supports for "add a new name". There are two steps:

Step 1: Get the language code list by sending a request with the following parameters:

{
	"action": "query",
	"format": "json",
	"prop": "",
	"list": "",
	"meta": "siteinfo",
	"formatversion": "2",
	"siprop": "languages"
}

See https://www.wikidata.org/w/api.php?action=help&modules=query%2Bsiteinfo and find "languages" under "siprop":

[Screenshot: the "languages" option under "siprop" on the API help page]

Try this request in the API sandbox: https://www.wikidata.org/wiki/Special:ApiSandbox#action=query&format=json&prop=&list=&meta=siteinfo&formatversion=2&siprop=languages

This returns a language code list like this:

[Screenshot: the returned language code list]

Each entry in the returned list has only "code" and "autonym", but we also want the English label, so we need step 2.
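Step 1 could be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: the function names, the `User-Agent` string, and the timeout are hypothetical, and the parser assumes the `formatversion=2` response keeps the entries in a list under `query.languages`, each with a `code` key.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_siteinfo_params():
    """Return the query parameters from step 1 (empty prop/list omitted)."""
    return {
        "action": "query",
        "format": "json",
        "meta": "siteinfo",
        "formatversion": "2",
        "siprop": "languages",
    }


def extract_codes(response):
    """Pull the language codes out of a decoded siteinfo response.

    Assumes entries live in a list under query.languages.
    """
    languages = response.get("query", {}).get("languages", [])
    return [entry["code"] for entry in languages]


def fetch_language_codes():
    """Call the API once and return the list of language codes."""
    url = WIKIDATA_API + "?" + urllib.parse.urlencode(build_siteinfo_params())
    # Hypothetical User-Agent; replace with one that identifies your project.
    request = urllib.request.Request(
        url, headers={"User-Agent": "language-import-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return extract_codes(json.load(resp))
```

Keeping the parameter builder and the parser separate from the network call makes both easy to check without hitting the API.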

Step 2: Get language info for each language

For example, to get the info for English ("en"), French ("fr"), Chinese ("zh"), and Japanese ("ja"), we call the Wikidata API with these parameters:

{
	"action": "query",
	"format": "json",
	"prop": "",
	"list": "",
	"meta": "languageinfo",
	"formatversion": "2",
	"liprop": "autonym|code|name",
	"licode": "en|fr|zh|ja"
}

The response looks like this:

[Screenshot: the languageinfo response]

See https://www.wikidata.org/w/api.php?action=help&modules=query%2Blanguageinfo for more details on the parameters.
[Screenshot: the languageinfo parameter documentation]

You can also try it in the API sandbox: https://www.wikidata.org/wiki/Special:ApiSandbox#action=query&format=json&prop=&list=&meta=languageinfo&formatversion=2&liprop=autonym%7Ccode%7Cname&licode=en%7Cfr%7Czh%7Cja
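Step 2 could be sketched like this for a long code list. It is only an illustration, not the code in this PR: the helper names are hypothetical, the batch size of 50 is an assumption about the per-request cap on `licode` values, the parser assumes `query.languageinfo` is an object keyed by language code, and API continuation is not handled.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
BATCH_SIZE = 50  # assumed cap on "licode" values per request


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_languageinfo_params(codes):
    """Return the step 2 parameters for one batch of language codes."""
    return {
        "action": "query",
        "format": "json",
        "meta": "languageinfo",
        "formatversion": "2",
        "liprop": "autonym|code|name",
        "licode": "|".join(codes),
    }


def parse_languageinfo(response):
    """Flatten a decoded response into a list of plain dicts.

    Assumes query.languageinfo maps each code to its details.
    """
    info = response.get("query", {}).get("languageinfo", {})
    return [
        {
            "code": code,
            "autonym": details.get("autonym", ""),
            "en_label": details.get("name", ""),
        }
        for code, details in info.items()
    ]


def fetch_language_details(codes):
    """Request details batch by batch and collect them in one list."""
    results = []
    for batch in chunked(codes, BATCH_SIZE):
        query = urllib.parse.urlencode(build_languageinfo_params(batch))
        # Hypothetical User-Agent; replace with one that identifies your project.
        request = urllib.request.Request(
            WIKIDATA_API + "?" + query,
            headers={"User-Agent": "language-import-sketch/0.1"},
        )
        with urllib.request.urlopen(request, timeout=30) as resp:
            results.extend(parse_languageinfo(json.load(resp)))
    return results
```

Batching keeps the number of requests low while staying under whatever multi-value limit applies to the account making the calls.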

Please note that these language codes are not bound to a QID, so I think we can remove the "wikidata_id" field from the "Language" model. Another reason for removing the "wikidata_id" field is that when we add a new name for an instrument, the Wikidata "wbsetlabel" API only requires the language code, not the QID of the language.
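To illustrate the second point, a "wbsetlabel" request could be assembled as below. This is a sketch only: the function name, the entity ID `Q12345`, and the token value are hypothetical placeholders, and a real call would need an authenticated session and a CSRF token. The point is that the language is identified by its code alone.

```python
def build_wbsetlabel_params(entity_id, language_code, label, csrf_token):
    """Return POST parameters for setting one label on one entity.

    Only the language code is passed; no QID for the language appears.
    """
    return {
        "action": "wbsetlabel",
        "format": "json",
        "id": entity_id,            # QID of the instrument, not of the language
        "language": language_code,  # e.g. "fr"
        "value": label,
        "token": csrf_token,
    }


# Hypothetical values, for illustration only.
params = build_wbsetlabel_params("Q12345", "fr", "piano", "PLACEHOLDER_TOKEN")
```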

Resolves: #157

- Remove the "wikidata_id" field from the Language model, because the language codes supported by Wikidata are not bound to a QID. See: https://www.wikidata.org/w/api.php?action=help&modules=query%2Blanguageinfo
- In the Django command "import_languages", get the Wikidata language code list by calling the Wikidata API, then get the details for each code with a second Wikidata API call

Refs: #157

@dchiller dchiller left a comment

Looks great! And wow, thank you for finding the endpoint that gives supported languages :)

I made a small suggestion to a comment that you could accept before merging, but it's not really important.

I agree with getting rid of the QID for the languages. Do you think it is worth adding ISO codes as well? (Maybe in a different issue) Or the wikidata code ("en", "fr", etc.) is good enough?

…_languages.py


fix: replace "VIM" with "UMIL"

Co-authored-by: Dylan Hillerbrand <dhillerbrand@gmail.com>
@kunfang98927

> Looks great! And wow, thank you for finding the endpoint that gives supported languages :)
>
> I made a small suggestion to a comment that you could accept before merging, but it's not really important.
>
> I agree with getting rid of the QID for the languages. Do you think it is worth adding ISO codes as well? (Maybe in a different issue) Or the wikidata code ("en", "fr", etc.) is good enough?

I think the Wikidata code is enough, at least for "add new name". And since these Wikidata codes are not bound to QIDs, it is difficult for us to determine their corresponding ISO codes.

@kunfang98927 kunfang98927 merged commit 47c3464 into develop Sep 24, 2024
3 checks passed
Development

Successfully merging this pull request may close these issues.

Update Languages for "add new instrument name" feature
2 participants