Improve Unicode support #2794
These are intended to replace non-UTF-16 uses of mbstowcs() / wcstombs()
These are intended to replace UTF-16 uses of mbstowcs() / wcstombs()
Because of the way UTF-8 encoding works, there is no need to use mbstowcs/wcstombs in the implementation of this function.
These calls are replaced with the newer UTF-16 parsing code within the parse module.
These calls are now replaced with explicit UTF conversion routines in the common/string_calls.[hc] and common/parse.[hc] modules. Also removed:
- The support code in common/os_calls.c to set the locale to use these routines.
- The twchar type in arch.h
Overall, LGTM. Let's fix it later if something is not working well.
Thanks @metalefty. @firewave is working on #2829 at the moment, which is getting there but will be quite a disruptive merge. It involves small mods to lots of files. Let's get that one in first so I don't mess up his workflow.
As this is ready to merge, put it in. I have no issues with rebasing/conflicts and the other PR still needs work. No need to postpone other (more substantial) changes because of it.
Thanks @firewave - I'll do that then. It might simplify the work I'm doing with smartcards.
Updated 2023-10-20: Ready for review
Fixes #2603 #942
I've been looking into the way we handle Unicode in general.
We've been using g_mbstowcs() and g_wcstombs() to handle conversions between UTF-8 and both UTF-16 and UTF-32. These calls suffer from a number of problems when used for this purpose:-
- wchar_t is not portable between platforms.
UTF-16 is used for Windows communication, and UTF-32 is used for other reasons, mostly related to font handling in the login window code.
This PR moves UTF-16 support into the Windows marshalling and unmarshalling code (common/parse.[hc]), and changes everything else to use UTF-8 mostly, with UTF-32 conversions applied where needed.
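As a rough illustration of what the marshalling side has to do, here is a minimal sketch of writing one Unicode code point into a byte buffer as little-endian UTF-16. The function name sketch_put_utf16le and its signature are hypothetical and are not taken from common/parse.[hc]; replacing invalid input with U+FFFD is also an assumption about policy, not a statement of what the PR does.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Writes code point c to out as UTF-16LE. Returns the number of bytes
 * written (2 or 4), or 0 if fewer than that many bytes of room remain.
 * Surrogate values and values above U+10FFFF are not valid code points,
 * so they are replaced with U+FFFD (an assumed policy).
 */
size_t
sketch_put_utf16le(uint32_t c, unsigned char *out, size_t room)
{
    uint32_t hi;
    uint32_t lo;

    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    {
        c = 0xFFFD;
    }

    if (c < 0x10000)
    {
        /* Basic Multilingual Plane: a single 16-bit unit */
        if (room < 2)
        {
            return 0;
        }
        out[0] = (unsigned char)(c & 0xFF);
        out[1] = (unsigned char)(c >> 8);
        return 2;
    }

    /* Supplementary plane: split the 20-bit offset into a surrogate pair */
    if (room < 4)
    {
        return 0;
    }
    c -= 0x10000;
    hi = 0xD800 | (c >> 10);
    lo = 0xDC00 | (c & 0x3FF);
    out[0] = (unsigned char)(hi & 0xFF);
    out[1] = (unsigned char)(hi >> 8);
    out[2] = (unsigned char)(lo & 0xFF);
    out[3] = (unsigned char)(lo >> 8);
    return 4;
}
```

Because the byte order and the surrogate arithmetic are written out explicitly, the result does not depend on the platform's wchar_t width or on the current locale.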
New routines are provided to handle the conversions which are robust when presented with incorrect data. These routines are locale-independent. Extensive unit tests have been added for these.
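To show what "robust when presented with incorrect data" can mean in practice, here is a minimal sketch of a locale-independent UTF-8 decoder. It is not the code added in common/string_calls.[hc]; the name sketch_utf8_next, its signature, and the choice to substitute U+FFFD for bad input are all assumptions made for illustration. uint32_t is used in place of char32_t so the sketch builds without <uchar.h>.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_REPLACEMENT 0xFFFD /* U+FFFD, substituted for bad input */

/*
 * Hypothetical helper, for illustration only.
 *
 * Decodes one code point from the UTF-8 data at *pp, of which *remaining
 * bytes are available, and advances both. Returns 0 if no data is left.
 * Invalid lead bytes, stray or missing continuation bytes, overlong
 * forms, encoded surrogates and values above U+10FFFF all yield U+FFFD.
 * At least one byte is consumed whenever data is available, so a caller
 * looping over a buffer always makes progress.
 */
uint32_t
sketch_utf8_next(const char **pp, size_t *remaining)
{
    const unsigned char *p = (const unsigned char *)*pp;
    size_t avail = *remaining;
    uint32_t c;
    uint32_t min = 0;  /* smallest value allowed for this sequence length */
    size_t need = 0;   /* continuation bytes still expected */
    size_t i = 1;      /* bytes consumed so far */

    if (avail == 0)
    {
        return 0;
    }

    c = p[0];
    if (c < 0x80)
    {
        /* ASCII: nothing more to read */
    }
    else if ((c & 0xE0) == 0xC0)
    {
        c &= 0x1F; need = 1; min = 0x80;
    }
    else if ((c & 0xF0) == 0xE0)
    {
        c &= 0x0F; need = 2; min = 0x800;
    }
    else if ((c & 0xF8) == 0xF0)
    {
        c &= 0x07; need = 3; min = 0x10000;
    }
    else
    {
        /* Stray continuation byte or invalid lead byte */
        c = SKETCH_REPLACEMENT;
    }

    while (need > 0)
    {
        if (i >= avail || (p[i] & 0xC0) != 0x80)
        {
            /* Truncated sequence: leave the offending byte unconsumed */
            c = SKETCH_REPLACEMENT;
            min = 0;
            break;
        }
        c = (c << 6) | (p[i] & 0x3F);
        ++i;
        --need;
    }

    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    {
        c = SKETCH_REPLACEMENT;
    }

    *pp += i;
    *remaining -= i;
    return c;
}
```

The decoder works purely on byte values, so its behaviour is the same under any locale setting, and the length is passed explicitly so it never reads past the end of a truncated buffer.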
The types char32_t and char16_t are used for UTF-32 and UTF-16 characters respectively. These are back-ported from C11.
The calls g_mbstowcs() and g_wcstombs() and the type twchar are no longer required and have been removed. There is one remaining use of a bare mbstowcs() call in the genkeymap tool which I haven't touched as it seems to be the best way to achieve what is required.