GoStrings
Andrew Gerrand edited this page Dec 10, 2014 · 1 revision
Strings are not required to be UTF-8. Go source code is required to be UTF-8. There is a complex path between the two.
In short, there are three kinds of strings. They are:
- the substring of the source that lexes into a string literal.
- a string literal.
- a value of type string.
Only the first is required to be UTF-8. The second must be written in UTF-8, but its contents are interpreted in various ways (escape sequences such as \xFF, for example) and may encode arbitrary bytes. The third can contain any bytes at all.
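As a rough illustration of how a literal's contents are interpreted, the sketch below spells the same value three ways and contrasts it with a raw string literal, where escapes are left alone. The helper names `sameValue` and `rawLen` are made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sameValue reports whether three differently spelled literals
// produce the same string value. (Hypothetical helper for this sketch.)
func sameValue() bool {
	a := "語"            // the character written directly in the UTF-8 source
	b := "\u8a9e"       // a Unicode escape for the same code point
	c := "\xe8\xaa\x9e" // the same three UTF-8 bytes written as byte escapes
	return a == b && b == c
}

// rawLen returns the length of a raw string literal, in which
// backslash escapes are not interpreted. (Hypothetical helper.)
func rawLen() int {
	return len(`\xFF`)
}

func main() {
	fmt.Println(sameValue())              // true
	fmt.Println(rawLen())                 // 4
	fmt.Println(utf8.ValidString("\xff")) // false: the value is not valid UTF-8
}
```

The source text of every literal here is valid UTF-8; only the resulting value of `"\xff"` is not.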
Try this on:
var s string = "\xFF語"
Source substring: "\xFF語", UTF-8 encoded. The data:
22 5c 78 46 46 e8 aa 9e 22
String literal: \xFF語 (between the quotes). The data:
5c 78 46 46 e8 aa 9e
The string value (not printable here: this page is a UTF-8 stream, and the value is not valid UTF-8). The data:
ff e8 aa 9e
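One way to check the bytes of the string value yourself is to print them in hex. This is a minimal sketch; `hexBytes` is a hypothetical helper name, and it relies on the `% x` verb of the `fmt` package, which prints each byte of a string separated by spaces.

```go
package main

import "fmt"

// hexBytes formats the bytes of s as space-separated lowercase hex.
// (Hypothetical helper for this sketch.)
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	var s string = "\xFF語"
	fmt.Println(len(s))      // 4: one byte from \xFF plus three bytes for 語
	fmt.Println(hexBytes(s)) // ff e8 aa 9e
}
```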
And for the record, the characters (code points):
<erroneous byte FF, will appear as U+FFFD if you range over the string value>
語 U+8A9E
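The decoding can be observed by ranging over the string value, which yields a byte offset and a code point on each iteration. The sketch below collects them; `decode` is a hypothetical helper name introduced for illustration.

```go
package main

import "fmt"

// decode ranges over s and returns one "offset:U+XXXX" entry
// per decoded code point. (Hypothetical helper for this sketch.)
func decode(s string) []string {
	var out []string
	for i, r := range s {
		out = append(out, fmt.Sprintf("%d:%U", i, r))
	}
	return out
}

func main() {
	s := "\xFF語"
	// The invalid byte FF decodes as U+FFFD and consumes one byte,
	// so 語 starts at byte offset 1.
	fmt.Println(decode(s)) // [0:U+FFFD 1:U+8A9E]
}
```

Note that the string value itself still holds the byte ff; the replacement character appears only in the decoded runes.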