fix(js/string_util): u2b(): convert non-Big5 chars to A1BC (□) instead of FFFD (non-Big5)

FFFD is a Unicode code point (U+FFFD), not a valid Big5-UAO character.
Also, in the Telnet protocol the bytes FF FD mean `IAC DO <lacking option id>`
unless the FF is escaped as `IAC IAC` (0xFF 0xFF).
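
To make the Telnet hazard concrete, here is a minimal sketch of how a receiver might label the bytes. `describeTelnet` is a hypothetical helper written for illustration and is not part of this repository. It knows only the four negotiation commands from RFC 854 and treats any other byte after IAC as unknown.

```javascript
// Hypothetical helper: labels Telnet command bytes in an array of byte values.
// Command codes are taken from RFC 854; only the negotiation commands are listed.
const IAC = 0xff;
const COMMANDS = { 0xfb: 'WILL', 0xfc: "WON'T", 0xfd: 'DO', 0xfe: "DON'T" };

function hex(b) {
  return b.toString(16).padStart(2, '0');
}

function describeTelnet(bytes) {
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b !== IAC) {
      out.push('DATA ' + hex(b));
    } else if (bytes[i + 1] === IAC) {
      // IAC IAC is the escaped form of a literal 0xFF data byte.
      out.push('DATA ff (escaped)');
      i += 1;
    } else if (bytes[i + 1] in COMMANDS) {
      // A negotiation command consumes one more byte as its option id.
      const opt = bytes[i + 2];
      out.push('IAC ' + COMMANDS[bytes[i + 1]] + ' ' +
        (opt === undefined ? '<missing option>' : 'option ' + opt));
      i += 2;
    } else {
      out.push('IAC ?');
      i += 1;
    }
  }
  return out;
}

// Old replacement bytes for one unmapped char: read as a truncated command.
console.log(describeTelnet([0xff, 0xfd]));
// Old replacement bytes for a surrogate pair: the second FF becomes option 255
// (Extended-Options-List, RFC 861) and the trailing FD is left over as data.
console.log(describeTelnet([0xff, 0xfd, 0xff, 0xfd]));
// New replacement bytes: plain data, no IAC involved.
console.log(describeTelnet([0xa1, 0xbc]));
```

Under these assumptions the first call yields `IAC DO <missing option>`, the second yields `IAC DO option 255` followed by `DATA fd`, and the third yields two plain data bytes.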

To fix the issue, non-Big5-UAO chars are now converted into A1BC (Big5 '□').
Also, UTF-16 high surrogates are now skipped, so that a surrogate pair
yields one '□' and the char count stays consistent.
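
The surrogate handling matters because JavaScript strings are sequences of UTF-16 code units, so a character outside the BMP occupies two indices for `charCodeAt`. The following is a minimal sketch of that effect. `isHighSurrogate` is an illustrative name, not a function in this repository, although its range test matches the one in the diff.

```javascript
// Illustrative helper: true for the first code unit of a surrogate pair.
function isHighSurrogate(codeUnit) {
  return codeUnit >= 0xd800 && codeUnit <= 0xdbff;
}

const s = '\u{2193C}'; // '𡤼', a character outside the BMP

// Collect the UTF-16 code units exactly as a charCodeAt() loop sees them.
const units = [];
for (let i = 0; i < s.length; i++) {
  units.push(s.charCodeAt(i));
}

// Replacing every unmapped code unit would emit two replacements for one
// character; skipping the high surrogate leaves exactly one.
const emitted = units.filter((u) => !isHighSurrogate(u)).length;

console.log(units.map((u) => u.toString(16))); // [ 'd846', 'dd3c' ]
console.log(emitted); // 1
```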

A1BC (Big5 '□') has been chosen for the following reasons:

* It gives visual feedback that the input has been received and processed.
* It is a non-ASCII code, which prevents unwanted operations.

With this change, each of the following cases converts to a single A1BC (□).

String        | UTF-16    | Previous u2b() output | Telnet meaning
------------- | --------- | --------------------- | --------------
'孒' (U+5B52)  | 5B52      | FF FD                 | IAC DO (option id missing)
'𡤼' (U+2193C) | D846 DD3C | FF FD FF FD           | IAC DO Extended-Options-List, then 'FD'
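
The two cases above can be reproduced with a self-contained sketch of the new branch logic. Everything here is a stand-in: `u2bArray` is a toy replacement for `lib.u2bArray`, `u2bSketch` and `hexOf` are hypothetical names, the ASCII pass-through is assumed rather than taken from the source, and the toy table assumes that unmapped entries hold FF FD (the real table may differ). The only real mapping filled in is '中' (U+4E2D), which is A4 A4 in Big5.

```javascript
// Toy stand-in for lib.u2bArray: two bytes (hi, lo) per UTF-16 code unit.
// ASSUMPTION: unmapped entries hold FF FD; the real table may differ.
const u2bArray = new Uint8Array(2 * 0x10000);
for (let i = 0; i < u2bArray.length; i += 2) {
  u2bArray[i] = 0xff;
  u2bArray[i + 1] = 0xfd;
}
// One real mapping for contrast: '中' (U+4E2D) is A4 A4 in Big5.
u2bArray[2 * 0x4e2d] = 0xa4;
u2bArray[2 * 0x4e2d + 1] = 0xa4;

// Sketch of the patched loop body; returns a binary string like u2b().
function u2bSketch(it) {
  let data = '';
  for (let i = 0; i < it.length; i++) {
    const pos = it.charCodeAt(i);
    if (pos < 0x80) { // ASSUMPTION: ASCII passes through unchanged
      data += it[i];
      continue;
    }
    const hi = u2bArray[2 * pos], lo = u2bArray[2 * pos + 1];
    if ((hi || lo) && hi < 0xff) {
      data += String.fromCharCode(hi) + String.fromCharCode(lo);
    } else if (!(pos >= 0xd800 && pos <= 0xdbff)) {
      data += '\xA1\xBC'; // '□' (Big5); high surrogates emit nothing
    }
  }
  return data;
}

// Formats a binary string as space-separated hex bytes for display.
function hexOf(s) {
  return Array.from(s, (c) => c.charCodeAt(0).toString(16).padStart(2, '0')).join(' ');
}

console.log(hexOf(u2bSketch('\u5b52')));    // '孒' -> a1 bc
console.log(hexOf(u2bSketch('\u{2193C}'))); // '𡤼' -> a1 bc
console.log(hexOf(u2bSketch('\u4e2d')));    // '中' -> a4 a4
```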
IepIweidieng committed Dec 10, 2023
1 parent 33ef59d commit 5fe32e6
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions src/js/string_util.js
@@ -79,10 +79,10 @@ export function u2b(it) {
     }
     var pos = it.charCodeAt(i);
     var hi = lib.u2bArray[2*pos], lo = lib.u2bArray[2*pos+1];
-    if (hi || lo)
+    if ((hi || lo) && hi < 0xff)
       data += String.fromCharCode(hi) + String.fromCharCode(lo);
-    else // Not a big5 char
-      data += '\xFF\xFD';
+    else if (!(pos >= 0xd800 && pos <= 0xdbff)) // Not a big5 char nor a UTF-16 high surrogate
+      data += '\xA1\xBC'; // '□' (Big5)
   }
   return data;
 };