Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle array access syntax on strings with emoji #34

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jbender
Copy link
Collaborator

@jbender jbender commented Apr 11, 2016

Emojis pose a problem for array-like access of a string. If you try to
grab one register you'll get am error: "You can't cut a surrogate in two in
an encoding that is not UTF-16 (IndexError)"
Calling split(''), which splits the string into array of characters, works
correctly even with emojis. So to make this method work as expected for
strings, first split it then join.

@colinta
Copy link
Contributor

colinta commented Apr 12, 2016

I'm nervous about overriding String#[], but not opposed. Maybe we could add some more specs in there, to make sure the default behavior is "as expected?" (I didn't look in the existing specs to see if those are already tested, I just looked at the diff)

@tkadauke
Copy link
Contributor

Few questions:

  1. How is the performance compared to vanilla String#[]?
  2. Have you considered adding a new method (e.g. utf8_char_at, or maybe just at)?
  3. Maybe it's a good idea to subclass String instead (UTF8String?) and then override the [] operator?
  4. Shouldn't this be fixed in RubyMotion instead?

@@ -101,4 +101,33 @@ def last(limit = 1)
from(-limit)
end
end

# Emojis pose a problem for array-like access of a string. If you try to
# grab one register you'll get am error: "You can't cut a surrogate in two in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "you'll get am error"

@jbender
Copy link
Collaborator Author

jbender commented Apr 12, 2016

@tkadauke can't speak to the performance of it except to say that I've been using it in production for a few months now with no complaints. Do you have any specific tests you'd like to perform on it?

I'm a proponent of the "it should just work" principle, so I'd be opposed to making this its own method. In doing so you're forcing people to know if a string may contain an emoji at any point rather than just making sure they're always safe.

It'd be perfectly reasonable for this to be handled by RubyMotion itself (indeed probably preferred), but I happen to have an inside track to fix it here so thought I'd propose. 😁

@tkadauke
Copy link
Contributor

tkadauke commented Aug 18, 2016

Sorry to not get back to you in a long time. It just occurred to me that as a compromise, we can make this opt-in. E.g. we can have a class method on String like:

class String
  def self.enable_emoji_support
    @@emoji_support = true
  end

  def [](*args)
    return bracket_access_original(*args) unless @@emoji_support
    # ...
  end
end

The reason for that is a behavior change in getting the n-th character from a string can lead to catastrophic results. Ruby was plagued with this ever since Unicode encodings became popular.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants