Replace raw "language code" with a popup #790

Closed
wants to merge 1 commit into from

Conversation

fatih-erikli
Collaborator

Fixes #746.

@justvanrossum
Collaborator

Where did you get the list from? It is probably not complete, right?

@fatih-erikli
Collaborator Author

fatih-erikli commented Sep 11, 2023

I googled BCP-47 language tags. Where should we get the complete list? @justvanrossum.

@justvanrossum
Collaborator

> Where should we get the complete list?

I was hoping you could help finding out. The spec is apparently here: https://www.rfc-editor.org/info/bcp47 but I don't see a "list of codes + names". Wakamaifondue has this list:

https://github.com/Wakamai-Fondue/wakamai-fondue-engine/blob/master/src/tools/ot-to-html-lang.js

It also contains the OpenType tag for each language, which we don't need per se.

Please check in that repository how that file was made: it would be nice if we could have a script that generates the needed data.

@justvanrossum
Collaborator

Actually, the OpenType tag would be useful if we ever parse the language information from the actual fonts. Maybe we should just use ot-to-html-lang.js as is.

(While parsing the font would be best, I'm worried about performance for big fonts.)

@fatih-erikli
Collaborator Author

This is too much. I am not sure if Anglo-Saxon or Berber should be included in this list.

@fatih-erikli
Collaborator Author

This list looks more realistic: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

@fatih-erikli
Collaborator Author

There is a language code list on the Google Fonts page:
https://fonts.google.com/knowledge/using_type/language_support_in_fonts

I notice that Belarusian is missing.

@fatih-erikli
Collaborator Author

Maybe parsing the font could be done on the Python side if we are worried about the performance.

@justvanrossum
Collaborator

  1. We need to set the lang attribute, so it has to be BCP-47.
  2. The list is not "too much", but we may need a search-like function. Perhaps <select> is the wrong UI for this. Unless we parse the fonts.
  3. Please try to find out how wakamaifondue parses the language tags from fonts and converts them to human friendly names and BCP-47 codes. It likely uses lib-font, and I'd be curious to see how it can be used.
  4. Reference-font is a client-side feature by design. Uploading to the server is not an option.

(I do worry about the imbalance between the effort of implementing this feature and its (low) priority.)
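
As a rough illustration of point 1, the sketch below checks that a string is a well-formed BCP-47 tag before it is assigned to a `lang` attribute. It relies on the standard `Intl.getCanonicalLocales` function; the helper names `canonicalLanguageTag` and `applyLanguage` are made up for this example and are not part of the project.

```javascript
// Returns the canonical form of a BCP-47 tag (e.g. "EN-us" becomes "en-US"),
// or null when the input is not a structurally valid tag.
function canonicalLanguageTag(tag) {
  try {
    const [canonical] = Intl.getCanonicalLocales(tag);
    return canonical ?? null;
  } catch (error) {
    // Intl.getCanonicalLocales throws a RangeError for malformed tags.
    if (error instanceof RangeError) return null;
    throw error;
  }
}

// `element` is any object with a `lang` property, such as an HTMLElement.
// Returns true if the tag was valid and has been applied.
function applyLanguage(element, tag) {
  const canonical = canonicalLanguageTag(tag);
  if (canonical === null) return false;
  element.lang = canonical;
  return true;
}
```

Note that this only checks the structure of the tag; it does not tell whether the language is one the font actually supports.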

@fatih-erikli
Collaborator Author

I tried Wakamai Fondue; it isn't working well. Try it with Google Sans.

@fatih-erikli
Collaborator Author

fatih-erikli commented Sep 12, 2023

Wakamai Fondue worked well with the IBM Plex Sans font.

It returns these:
Afrikaans, Albanian, Azerbaijani, Basque, Belarusian, Bosnian, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Filipino, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Indonesian, Irish, Italian, Kazakh, Latvian, Lithuanian, Macedonian, Malay, Mongolian, Norwegian Bokmål, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tongan, Turkish, Ukrainian, Uzbek, Vietnamese, Welsh, Zulu

If we are going to parse the font, as you suggested, we can look up these language names in ot-to-html-lang.js and get their language codes. We can show the result in the UI as one of:

  • Select
  • Text input with a datalist element
  • Select with a search input
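
The name-to-code lookup described above could be sketched roughly as follows. The table here is a small hand-written stand-in: the `{ name, code }` entry shape and the function names are assumptions for illustration, and the real ot-to-html-lang.js may be organized differently.

```javascript
// Hypothetical excerpt of a language table; not copied from ot-to-html-lang.js.
const LANGUAGE_TABLE = [
  { name: "Afrikaans", code: "af" },
  { name: "Belarusian", code: "be" },
  { name: "Dutch", code: "nl" },
  { name: "Turkish", code: "tr" },
];

// Builds a case-insensitive index from language name to language code.
function buildNameIndex(table) {
  const index = new Map();
  for (const { name, code } of table) {
    index.set(name.toLowerCase(), code);
  }
  return index;
}

// Splits the given names into those with a known code and those without.
function namesToCodes(names, index) {
  const found = [];
  const missing = [];
  for (const name of names) {
    const code = index.get(name.trim().toLowerCase());
    if (code === undefined) {
      missing.push(name);
    } else {
      found.push({ name, code });
    }
  }
  return { found, missing };
}
```

Reporting the unmatched names separately makes it easy to spot languages that a font reports but the table does not cover.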

@fatih-erikli
Collaborator Author

The simplest solution seems to be offering all the language codes in ot-to-html-lang.js as a datalist element for the text input.
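
A minimal sketch of that idea, assuming the language data is available as a list of `{ code, name }` objects (an assumed shape, not necessarily that of ot-to-html-lang.js). The function names and the list id are placeholders. It builds the markup as a string so it can be tried outside a browser.

```javascript
const HTML_ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };

// Escapes the characters that are unsafe in HTML text and attribute values.
function escapeHtml(text) {
  return String(text).replace(/[&<>"]/g, (ch) => HTML_ESCAPES[ch]);
}

// Renders a text input wired to a <datalist> with one option per language.
// The option value is the language code; the option text is the readable name.
function renderLanguageInput(listId, languages) {
  const id = escapeHtml(listId);
  const options = languages.map(
    ({ code, name }) =>
      `  <option value="${escapeHtml(code)}">${escapeHtml(name)}</option>`
  );
  return [
    `<input type="text" list="${id}">`,
    `<datalist id="${id}">`,
    ...options,
    `</datalist>`,
  ].join("\n");
}
```

With a datalist the browser filters the suggestions as the user types, which gives a search-like experience without a custom widget, and the user can still type a code that is not in the list.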

@justvanrossum
Collaborator

> I tried Wakamai Fondue; it isn't working well.

Can you be more specific about what didn't work well?

@fatih-erikli
Collaborator Author

It didn't work with the font I tried (Google Sans). Supported languages were empty in the result.

@justvanrossum
Collaborator

> It didn't work with the font I tried

You tried it on the wakamaifondue beta site? It worked for me with the subset, but froze on the full font (it could perhaps have completed eventually, but I didn't have the patience).

I'm pretty sure Wakamaifondue does a lot more font parsing than only getting the languages out, and I'd like to know how long it would take to do just that. Can you find the code in Wakamaifondue that is responsible for parsing the languages, and adapt it for a test we can do ourselves?

@fatih-erikli
Collaborator Author

> You tried it on the wakamaifondue beta site?

Yes

> Can you find the code in Wakamaifondue that is responsible for parsing the languages

It reads the "GSUB" table of the binary font. A full file read is required.

https://github.com/Wakamai-Fondue/wakamai-fondue-engine/blob/master/src/fondue/Fondue.js#L964

In my opinion, the full binary read should happen on one side, either in Python or JavaScript.
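
To get a feel for how little work reading only the language tags involves, here is a small sketch that walks the GSUB ScriptList directly with a `DataView`. It is not Wakamai Fondue's code and does not use lib-font. It assumes an uncompressed single-font sfnt file (TTF/OTF, not WOFF, WOFF2, or a collection), ignores GPOS, and does no bounds checking. All function names are made up for this example.

```javascript
// Reads a 4-byte ASCII tag (e.g. "GSUB", "NLD ") at the given byte offset.
function readTag(view, offset) {
  return String.fromCharCode(
    view.getUint8(offset),
    view.getUint8(offset + 1),
    view.getUint8(offset + 2),
    view.getUint8(offset + 3)
  );
}

// Looks up a table in the sfnt table directory: a 12-byte header followed by
// 16-byte records of (tag, checksum, offset, length).
function findTable(view, wantedTag) {
  const numTables = view.getUint16(4);
  for (let i = 0; i < numTables; i++) {
    const record = 12 + i * 16;
    if (readTag(view, record) === wantedTag) {
      return { offset: view.getUint32(record + 8), length: view.getUint32(record + 12) };
    }
  }
  return null;
}

// Collects the OpenType LangSys tags of every script in the GSUB ScriptList.
function listGsubLanguageTags(buffer) {
  const view = new DataView(buffer); // DataView reads big-endian by default
  const gsub = findTable(view, "GSUB");
  if (gsub === null) return [];
  // GSUB header: majorVersion, minorVersion, scriptListOffset, ...
  const scriptList = gsub.offset + view.getUint16(gsub.offset + 4);
  const scriptCount = view.getUint16(scriptList);
  const tags = new Set();
  for (let s = 0; s < scriptCount; s++) {
    // ScriptRecord: 4-byte script tag + 2-byte offset from the ScriptList start.
    const scriptRecord = scriptList + 2 + s * 6;
    const script = scriptList + view.getUint16(scriptRecord + 4);
    // Script table: defaultLangSysOffset, langSysCount, then LangSysRecords.
    const langSysCount = view.getUint16(script + 2);
    for (let l = 0; l < langSysCount; l++) {
      tags.add(readTag(view, script + 4 + l * 6));
    }
  }
  return [...tags].sort();
}
```

The result is a list of OpenType language system tags (padded to four characters with spaces), which would still have to be mapped to BCP-47 codes, for example through a table like ot-to-html-lang.js.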

@justvanrossum
Collaborator

> > Can you find the code in Wakamaifondue that is responsible for parsing the languages
>
> It reads the "GSUB" table of the binary font. A full file read is required.
>
> https://github.com/Wakamai-Fondue/wakamai-fondue-engine/blob/master/src/fondue/Fondue.js#L964

Ah great. Check also:

https://github.com/Wakamai-Fondue/wakamai-fondue-engine/blob/326729ffdbabefa3001b596391e16a906732481c/src/fondue/Fondue.js#L136-L138

> In my opinion, the full binary read should happen on one side, either in Python or JavaScript.

Like I said before, reference-font is a client-side feature, so JS it is.

I would love to know how long it takes for a big (GS) font to extract the languages in a browser.
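
One way to measure this would be a tiny timing wrapper around whatever extraction function ends up being tested. The sketch below uses the standard `performance.now()` clock; `timeExtraction` and the `extract` callback are placeholder names, and the extraction function itself is deliberately left open.

```javascript
// Runs `extract(buffer)` once and reports the result together with the
// elapsed wall-clock time in milliseconds.
function timeExtraction(extract, buffer) {
  const start = performance.now();
  const result = extract(buffer);
  const elapsedMs = performance.now() - start;
  return { result, elapsedMs };
}
```

In a browser, the buffer could come from a file picker via `await file.arrayBuffer()`, so the whole measurement stays on the client side. A single run is noisy, so repeating it a few times on the large font would give a more reliable number.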

@fatih-erikli
Collaborator Author

I'll delete this PR and start a new one.

@fatih-erikli fatih-erikli deleted the issue-746 branch September 13, 2023 14:58
Successfully merging this pull request may close these issues.

[reference font] replace raw "language code" with a popup
2 participants