Foreign characters may cause misalignment in UI #10

Open
danielpclark opened this issue Feb 13, 2017 · 1 comment

Comments

danielpclark (Owner) commented Feb 13, 2017

When other locales are added, it's important to note that not all characters have the same display width. Most Japanese characters (kanji, hiragana, and fullwidth katakana) occupy two terminal columns, although halfwidth katakana occupy one. Padding by character count therefore pushes right-aligned text off by one extra column for every wide character. Figure out a way to calculate each character's actual display width for alignment.
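A minimal sketch of the width calculation, assuming a hand-picked table of code point ranges that terminals usually render two columns wide. The table is a small, incomplete subset of the East Asian Wide and Fullwidth classes from Unicode UAX #11; combining marks, emoji, and ambiguous-width characters are not handled. `WIDE_RANGES`, `char_width`, and `display_width` are hypothetical names, not part of this project.

```ruby
# Hypothetical helper: approximate terminal column width of a string.
# WIDE_RANGES is an assumed, incomplete subset of Unicode's East Asian
# Wide (W) and Fullwidth (F) code points; everything else counts as 1.
WIDE_RANGES = [
  0x1100..0x115F, # Hangul Jamo initial consonants
  0x2E80..0x303E, # CJK radicals, CJK symbols and punctuation (e.g. "、")
  0x3041..0x33FF, # Hiragana, Katakana, Bopomofo, CJK compatibility
  0x3400..0x4DBF, # CJK Unified Ideographs Extension A
  0x4E00..0x9FFF, # CJK Unified Ideographs (kanji)
  0xAC00..0xD7A3, # Hangul Syllables
  0xF900..0xFAFF, # CJK Compatibility Ideographs
  0xFF01..0xFF60, # Fullwidth ASCII variants
  0xFFE0..0xFFE6  # Fullwidth signs
].freeze

# Width of a single character: 2 if its code point is in a wide range.
def char_width(char)
  cp = char.ord
  WIDE_RANGES.any? { |r| r.cover?(cp) } ? 2 : 1
end

# Width of a string: sum of its characters' widths.
def display_width(str)
  str.each_char.sum { |c| char_width(c) }
end

puts display_width("Hello")                    # 5
puts display_width("こんにちは")               # 10
puts display_width("\uFF76\uFF80\uFF76\uFF85") # 4 (halfwidth katakana stay 1 column)
```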

Create tests for the new alignment logic using a string that contains a word made of characters wider than the standard width, and assert that the alignment is based on the width method's output for that word rather than on its character count. For a given output display width, the logic should balance left, center, and right alignment without the wide characters pushing the output past that width.
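The tests described above could target helpers along these lines. This is a sketch under stated assumptions: `char_width`, `display_width`, `truncate_display`, `ljust_display`, `rjust_display`, and `center_display` are hypothetical names, and the width rule only treats Hiragana, Katakana, and CJK ideographs as wide. Strings wider than the target are cut at a character boundary so the output never exceeds the requested display width.

```ruby
# Hypothetical helpers that align by display width (terminal columns)
# rather than by String#length. The width rule is a deliberately tiny
# assumption: only Hiragana/Katakana (U+3041..U+30FF) and CJK Unified
# Ideographs (U+4E00..U+9FFF) count as 2 columns; everything else is 1.
def char_width(c)
  cp = c.ord
  (0x3041..0x30FF).cover?(cp) || (0x4E00..0x9FFF).cover?(cp) ? 2 : 1
end

def display_width(str)
  str.each_char.sum { |c| char_width(c) }
end

# Keep whole characters only while they fit in `width` columns, so a wide
# character is never split and the result never exceeds `width`.
def truncate_display(str, width)
  used = 0
  str.each_char.take_while { |c| (used += char_width(c)) <= width }.join
end

def ljust_display(str, width)
  s = truncate_display(str, width)
  s + " " * (width - display_width(s))
end

def rjust_display(str, width)
  s = truncate_display(str, width)
  " " * (width - display_width(s)) + s
end

def center_display(str, width)
  s = truncate_display(str, width)
  pad = width - display_width(s)
  left = pad / 2 # an odd leftover column of padding goes on the right
  " " * left + s + " " * (pad - left)
end

word = "日本語" # 3 characters, 6 columns
p word.rjust(10)          # String#rjust counts characters: 7 spaces of padding
p rjust_display(word, 10) # 4 spaces of padding, 10 columns total
p center_display(word, 5) # truncated to "日本" plus 1 space of padding
```

Padding `日本語` with `String#rjust(10)` adds 7 spaces and produces a 13-column line; padding by display width adds 4 and keeps it at exactly 10.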

danielpclark (Owner, Author) commented:

The unicode-display_width gem may be a solution, but its Marshal-serialized data may be incompatible across Ruby versions.
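If Marshal-serialized data does prove to be a problem, one option (an assumption, not something decided in this issue) is to keep the width table as a plain Ruby literal in source and look characters up with `Array#bsearch`, so there is no serialized file to load at all. `WIDE_TABLE`, `wide?`, and `display_width` are hypothetical names, and the ranges are again an incomplete subset of East Asian Wide/Fullwidth code points.

```ruby
# Hypothetical lookup table kept as plain Ruby source (no Marshal file).
# Each entry is [first_code_point, last_code_point]. Entries must be sorted
# and non-overlapping for the binary search below to be valid.
WIDE_TABLE = [
  [0x1100, 0x115F],
  [0x2E80, 0x303E],
  [0x3041, 0x33FF],
  [0x3400, 0x4DBF],
  [0x4E00, 0x9FFF],
  [0xAC00, 0xD7A3],
  [0xF900, 0xFAFF],
  [0xFF01, 0xFF60],
  [0xFFE0, 0xFFE6]
].freeze

# bsearch in find-minimum mode returns the first range whose last code
# point is >= cp; cp is wide only if it is also >= that range's start.
def wide?(cp)
  range = WIDE_TABLE.bsearch { |_first, last| last >= cp }
  !range.nil? && range[0] <= cp
end

def display_width(str)
  str.each_char.sum { |c| wide?(c.ord) ? 2 : 1 }
end

puts display_width("Aあ") # 3
```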

The Rust crate unicode_width would work well wrapped in a Ruby gem.

This Stack Overflow answer lists the code point ranges for Japanese scripts: http://stackoverflow.com/a/15651264/1500195

Excerpt of that:

# -*- coding: utf-8 -*-

# Each predicate tests whether a single character's code point falls in a
# given Unicode range.
def is_halfwidth_katakana(c)
  (0xff61..0xff9f).cover?(c.ord)
end

def is_fullwidth_katakana(c)
  (0x30a0..0x30ff).cover?(c.ord)
end

def is_halfwidth_roman(c)
  (0x21..0x7e).cover?(c.ord) # printable ASCII, excluding space
end

def is_fullwidth_roman(c)
  (0xff01..0xff60).cover?(c.ord)
end

def is_hiragana(c)
  (0x3041..0x309f).cover?(c.ord)
end

def is_kanji(c)
  (0x4e00..0x9fff).cover?(c.ord) # CJK Unified Ideographs block
end

text = "Hello World、こんにちは、半角カタカナ、全角カタカナ、fullwidth 0-9\n"

text.each_char do |c|
  type =
    if is_halfwidth_katakana(c)
      "halfwidth katakana"
    elsif is_fullwidth_katakana(c)
      "fullwidth katakana"
    elsif is_halfwidth_roman(c)
      "halfwidth roman"
    elsif is_fullwidth_roman(c)
      "fullwidth roman"
    elsif is_hiragana(c)
      "hiragana"
    elsif is_kanji(c)
      "kanji"
    else
      "other" # e.g. spaces, "、", and the trailing newline
    end

  printf("%c (%x) %s\n", c, c.ord, type)
end

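From a quick check, characters of width 1 are often 1 byte in UTF-8, while characters of width 2 are typically 3 bytes. Byte length is not a reliable proxy for width, though: halfwidth katakana such as ｶ are also 3 bytes but occupy one column, and accented Latin letters such as é take 2 bytes but occupy one column.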
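A quick way to reproduce that check is to compare `String#bytesize` with the column count for a few characters. The column counts below are hard-coded assumptions based on typical terminal rendering, not computed values.

```ruby
# Compare UTF-8 byte length with typical terminal column width.
# The column widths are hard-coded assumptions for illustration.
samples = {
  "A"      => 1, # ASCII letter
  "\u00E9" => 1, # é, Latin small letter e with acute
  "\u3042" => 2, # あ, hiragana
  "\u65E5" => 2, # 日, CJK ideograph
  "\uFF76" => 1  # ｶ, halfwidth katakana
}

samples.each do |char, columns|
  printf("%s U+%04X  bytes=%d  columns=%d\n", char, char.ord, char.bytesize, columns)
end
```

The halfwidth katakana row has the same byte length as the hiragana and kanji rows but half their width, which is why alignment has to be computed from code points rather than bytes.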
