fix: ICU plural-case handling during Machine-Translation #2445
Thanks a lot for the PR! 🎉 Looks like it breaks the tests. Also, this feature will require some unit tests as well.
Hey! Sorry, it actually doesn't break the tests. It's only the report task, which always fails for PRs. But ktlint fails. You have to run
Anyway, unit tests covering this new functionality need to be added before merging. Are you willing to do this, or should we do it ourselves?
It seems to me that my PR merely patches a side effect of the bug.
Issue A
Issue B
The case-values would be transformed to:
Note the two 'few' case-values. I realize this is a contrived example, but I hope it illustrates that the transformation is not as innocent as it seems: it has the side effect that different cases in the original ICU statement are mapped to the same case value.
Issue C
So the cases =0, =2, =3 and =4 vanish completely during machine translation. This means that 'Issue B' is currently only a theoretical issue, because the code never reaches the state of producing identical transformed case-values for different inputs.
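The collision described under Issue B can be sketched in Kotlin as follows. This is a minimal illustration, not Tolgee's code: `czechCategory`, `rewriteSelector` and `findCollisions` are hypothetical names, and the rule is a simplified, integer-only version of the CLDR cardinal rule for Czech, chosen only because it has a 'few' category.

```kotlin
// Simplified, integer-only CLDR cardinal rule for Czech (hypothetical helper).
fun czechCategory(n: Int): String = when (n) {
    1 -> "one"
    in 2..4 -> "few"
    else -> "other"
}

// Rewrites explicit selectors such as "=2" to a plural category and
// leaves keyword selectors such as "other" unchanged.
fun rewriteSelector(selector: String): String {
    val exact = selector.removePrefix("=").toIntOrNull()
    return if (selector.startsWith("=") && exact != null) czechCategory(exact) else selector
}

// Groups the original selectors by their rewritten form and keeps only
// the groups where more than one original selector ended up on the same key.
fun findCollisions(selectors: List<String>): Map<String, List<String>> =
    selectors.groupBy { rewriteSelector(it) }.filterValues { it.size > 1 }
```

Calling `findCollisions(listOf("=2", "=3", "one", "other"))` reports that '=2' and '=3' both end up as 'few', so one of the two original texts would have to be discarded.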
The root-cause seems to be that the following code:
... for target-locale "en-US" returns:
This leads to it producing translations for these keys only. Because only the keys that intersect with the plural-cases are kept, all other plural-cases (like =0, =4, etc.) are dropped. If it weren't for my fallback in this PR, it would drop '=1' too, as only 'other' is in both targetExamples.keys and plural.forms.keys.
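A minimal sketch of the intersection effect, assuming the example keys for "en-US" are 'one' and 'other' (an assumption based on the forms English requires; the actual return value is not reproduced here). `formsSurvivingIntersection` is a hypothetical helper, not a function from the Tolgee code base, and `targetExampleKeys` stands in for `targetExamples.keys`.

```kotlin
// Keeps only the plural forms whose selector also appears among the
// target example keys; every other form is silently dropped.
fun formsSurvivingIntersection(
    forms: Map<String, String>,
    targetExampleKeys: Set<String>
): Map<String, String> = forms.filterKeys { it in targetExampleKeys }
```

With forms keyed by '=0', '=1' and 'other' and example keys 'one' and 'other', only the 'other' entry survives, which matches the behaviour described above.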
Would it be a good idea to simply leave explicit-values as they are?
Passing the following (contrived) ICU-string to the translation-service now produces exactly the same output:
Yeah, this looks fine!
Hey! Thinking about it now, maybe no transformation would indeed be a better solution. The reason I added the complex transformation was that I wanted to give the machine translators a number example. But I forgot that the form itself is also a pretty good example value for the machine translators. So now I believe that maybe we can just use the variant name (
Oh! Sorry. Checking the code, I can see that the transformation actually is required. I will try to fix the issue and add some tests.
In many cases the source forms don't match the target forms. For example, English requires only 'one' and 'other', but the same string requires four forms in Czech. That's why this transformation is necessary. I probably did the same thing as you by adding the exact forms to the data separately, so they get provided as well. This should handle all the situations. #2454 I just need to test how the UI will handle this situation.
Thanks, I actually made a change already in the PR code, as I noticed the same requirement based on your unit tests. You can see how I merge the examples/form cases. Having said that, your implementation is obviously cleaner, as this is my first attempt at Kotlin.
Oki doke, so I am closing this. Thanks for the cooperation. :)
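The idea of requesting every category the target language needs, plus the exact forms from the source, could look roughly like this. It is a sketch under assumptions: `requiredCategories` and `targetSelectors` are hypothetical names, the category table lists only English and Czech (following the CLDR cardinal rules), and it does not reproduce the implementation in #2454.

```kotlin
// Plural categories required per language (small subset for illustration).
val requiredCategories: Map<String, List<String>> = mapOf(
    "en" to listOf("one", "other"),
    "cs" to listOf("one", "few", "many", "other")
)

// Selectors to request for the target language: the exact-value selectors
// ("=0", "=1", ...) found in the source, followed by every category the
// target language requires. Unknown languages fall back to "other" only.
fun targetSelectors(sourceSelectors: Collection<String>, targetLanguage: String): List<String> {
    val categories = requiredCategories[targetLanguage] ?: listOf("other")
    val exact = sourceSelectors.filter { it.startsWith("=") }
    return (exact + categories).distinct()
}
```

For an English source with '=0', 'one' and 'other' and Czech as the target, this yields '=0', 'one', 'few', 'many', 'other': the exact form is preserved and the categories Czech needs are added.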
ICU plural element mishandling explicit case values during translation
When an ICU message is submitted to machine translation, the plural-cases 'one' and 'other' are handled correctly, but explicit-value cases like '=1' and '=0' are not.
Output B outputs case 'one' for input case '=1'.
Output B uses the value of case 'other' instead of the value of case '=1', and therefore yields '1 products', which is grammatically incorrect.
Machine translation
As the ICU text is correctly stored in the database, and correctly shown in the web-interface, but is corrupted during the machine-translation, there must be a text-transformation just prior to making the call to the translation-provider.
During machine-translation, the following queries are sent to the provider.
Bug
This happens because Tolgee rewrites plural-case '=1' to 'one' (similarly, it maps '=0' to 'zero') and then tries to find 'one' in the original plural-element. That lookup fails, because the plural-element contains the '=1' case rather than 'one', so it falls back to the 'other' case.
Solution
Besides looking for the 'one' case, we should also look for the '=1' plural-case when doing the lookup.
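A lookup along these lines might be sketched as follows. `explicitEquivalents` and `findForm` are hypothetical names used for illustration, not the code in this PR, and only the two pairs mentioned above ('one'/'=1' and 'zero'/'=0') are covered.

```kotlin
// Maps a plural keyword to the explicit selector that may stand in for it.
val explicitEquivalents: Map<String, String> = mapOf("one" to "=1", "zero" to "=0")

// Looks up the text for a plural category: first by the keyword itself,
// then by its explicit-value equivalent, and only then falls back to "other".
fun findForm(forms: Map<String, String>, category: String): String? =
    forms[category]
        ?: explicitEquivalents[category]?.let { forms[it] }
        ?: forms["other"]
```

With forms keyed by '=1' and 'other', `findForm(forms, "one")` now returns the '=1' text instead of the 'other' text. Note that the keyword and the explicit value are not equivalent in every language (in French, for example, 'one' also covers 0), so this fallback is a pragmatic choice rather than a general rule.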