This repository has been archived by the owner on Jul 5, 2022. It is now read-only.

"freerdp_uniconv_out" error dealing with multi-byte characters under Windows #11

Open
cuzz opened this issue Apr 24, 2011 · 2 comments

cuzz commented Apr 24, 2011

FreeRDP version: v0.8.2-515-gc1f0fe0
OS: Windows XP SP3 (Chinese)
Symptoms: When connecting to an RDP server from a Chinese-language Windows system, the wfreerdp process crashes.
Analysis: When the parameter str contains multi-byte characters (for example Chinese, Korean, or Japanese), the function freerdp_uniconv_out (libfreerdputils, unicode.c) does not convert them correctly.

The easiest way to solve the problem is to use the Windows API in the Windows environment:

#if defined(WINDOWS) || defined(_WIN32) || defined(_WINDOWS)
    #include <windows.h>
#endif

/* Convert str from DEFAULT_CODEPAGE to WINDOWS_CODEPAGE and return buffer like xstrdup.
 * Buffer is 0-terminated but that is not included in the returned length. */

char* freerdp_uniconv_out(UNICONV *uniconv, char *str, size_t *pout_len)
{
    size_t ibl = strlen(str), obl ; /* FIXME: worst case */
    char *pin = str,  *pout0 ;

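#if defined(WINDOWS) || defined(_WIN32) || defined(_WINDOWS)

    /* First call only queries the required size, in wchar_t units, including the terminator. */
    obl = MultiByteToWideChar(CP_ACP, 0, pin, -1, NULL, 0);
    if (obl == 0)
        return NULL;    /* conversion failed; avoids indexing pout0[-1] below */
    pout0 = xmalloc(obl * 2);
    MultiByteToWideChar(CP_ACP, 0, pin, -1, (wchar_t*) pout0, (int) obl);
    pout0[obl * 2 - 1] = 0;
    pout0[obl * 2 - 2] = 0;
    *pout_len = obl * 2 - 2;    /* length in bytes, terminator not included */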
#else

    char *pout;

    obl = 2 * ibl;
    pout0 = xmalloc(obl + 2);
    pout = pout0;

#ifdef HAVE_ICONV
    if (iconv(uniconv->out_iconv_h, (ICONV_CONST char **) &pin, &ibl, &pout, &obl) == (size_t) - 1)
    {
        printf("freerdp_uniconv_out: iconv() error\n");
        xfree(pout0);    /* do not leak the output buffer on failure */
        return NULL;
    }
#else
    while ((ibl > 0) && (obl > 0))
    {
        if ((signed char)(*pin) < 0)
        {
            /* Non-ASCII byte: this fallback cannot convert it. */
            xfree(pout0);
            return NULL;
        }
        *pout++ = *pin++;
        *pout++ = 0;
        ibl--;
        obl -= 2;
    }
#endif

    if (ibl > 0)
    {
        printf("freerdp_uniconv_out: string not fully converted - %d chars left\n", (int) ibl);
    }

    *pout_len = pout - pout0;
    *pout++ = 0;    /* Add extra double zero termination */
    *pout = 0;

#endif

    return pout0;
}
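
For reference, the behaviour of the non-iconv fallback can be reproduced in isolation. The sketch below is a hypothetical stand-alone copy of that loop (the name ascii_to_utf16le is mine, and it uses plain malloc/free instead of xmalloc); it widens 7-bit ASCII to UTF-16LE and returns NULL as soon as it sees a byte of 0x80 or above, which is what every lead byte of a GBK or Shift-JIS string looks like.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-alone copy of the non-iconv fallback loop:
 * widens 7-bit ASCII to UTF-16LE and gives up on any byte >= 0x80.
 * The returned length is in bytes and excludes the double-zero terminator. */
static char *ascii_to_utf16le(const char *str, size_t *pout_len)
{
    size_t len = strlen(str);
    char *out = malloc(2 * len + 2);
    size_t i;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++)
    {
        if ((unsigned char) str[i] >= 0x80)
        {
            free(out);      /* multi-byte input: nothing sensible to emit */
            return NULL;
        }
        out[2 * i] = str[i];    /* low byte */
        out[2 * i + 1] = 0;     /* high byte */
    }

    out[2 * len] = 0;
    out[2 * len + 1] = 0;
    *pout_len = 2 * len;
    return out;
}
```

Any caller therefore has to check the result for NULL. I have not verified that an unchecked NULL is the cause of the crash, but it would be consistent with the symptoms.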


otavio commented May 13, 2011

Have you been able to reproduce this issue using current master branch?


cuzz commented May 14, 2011

Yes, I have tested the current master branch and the issue still exists.
Moreover, the same issue exists in the function "freerdp_uniconv_in".

Chinese characters do not necessarily occupy exactly 2 bytes each.
In fact, many characters occupy four bytes.
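
To make the variable width concrete, here is a small sketch that classifies one GB 18030 sequence by its length. The function name gb18030_seq_len is hypothetical; the byte ranges are the ones GB 18030 defines (one byte for 0x00-0x7F, two bytes with a lead of 0x81-0xFE and a trail of 0x40-0x7E or 0x80-0xFE, four bytes when the second and fourth bytes are 0x30-0x39). It only checks the shape of the sequence, not whether the code is actually assigned.

```c
#include <stddef.h>

/* Hypothetical helper: returns the byte length (1, 2 or 4) of the
 * GB 18030 sequence starting at s, or 0 if it is malformed or truncated.
 * avail is the number of readable bytes at s. */
static size_t gb18030_seq_len(const unsigned char *s, size_t avail)
{
    if (avail == 0)
        return 0;
    if (s[0] <= 0x7F)
        return 1;                           /* ASCII */
    if (s[0] < 0x81 || s[0] > 0xFE)
        return 0;                           /* 0x80 and 0xFF are not lead bytes */
    if (avail < 2)
        return 0;

    if (s[1] >= 0x30 && s[1] <= 0x39)
    {
        /* A digit in second position announces a four-byte sequence. */
        if (avail >= 4 &&
            s[2] >= 0x81 && s[2] <= 0xFE &&
            s[3] >= 0x30 && s[3] <= 0x39)
            return 4;
        return 0;
    }

    if ((s[1] >= 0x40 && s[1] <= 0x7E) || (s[1] >= 0x80 && s[1] <= 0xFE))
        return 2;

    return 0;
}
```

A converter that assumes "one character = two bytes" will split the four-byte form in the middle.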

GB 18030-2000 is the new “compulsory” Chinese national standard.

http://en.wikipedia.org/wiki/GB_18030

"The mandatory part of GB 18030-2005 consists of 1 byte and 2 byte encoding, together with 4 byte encoding for CJK Unified Ideographs Extension A. The corresponding Unicode code points of this subset lie entirely in the BMP.

In a move of historic significance for software supporting Unicode, the PRC decided to mandate support of certain code points outside the BMP. This means that software can no longer get away with treating characters as 16 bit fixed width entities (UCS-2). Therefore they must either process the data in a variable width format (such as UTF-8 or UTF-16), which are the most common choices, or move to a larger fixed width format (such as UCS-4 or UTF-32). Microsoft made the change from UCS-2 to UTF-16 with Windows 2000."
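
The practical consequence for the output buffer is that one character may need two 16-bit units. A minimal sketch of the standard UTF-16 encoding step follows; the function name utf16_encode is hypothetical and this is not code from FreeRDP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes one Unicode code point as UTF-16.
 * Writes 1 or 2 code units to out and returns how many were written,
 * or 0 if cp is a surrogate value or lies above U+10FFFF. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                               /* surrogates are not characters */
    if (cp > 0x10FFFF)
        return 0;                               /* outside the Unicode range */

    if (cp < 0x10000)
    {
        out[0] = (uint16_t) cp;                 /* BMP: a single unit, same as UCS-2 */
        return 1;
    }

    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t) (0xD800 + (cp >> 10));  /* high surrogate: top 10 bits */
    out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));/* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+4E2D stays a single unit (0x4E2D), while U+20000, the first code point of CJK Extension B, becomes the pair 0xD840 0xDC00.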
