Skip to content

A pure Python implementation of ICU's rule-based number format engine

License

Notifications You must be signed in to change notification settings

rhasspy/unicode-rbnf

Repository files navigation

Unicode RBNF

A pure Python implementation of rule based number formatting (RBNF) using the Unicode Common Locale Data Repository (CLDR).

This lets you spell out numbers for a large number of locales:

from unicode_rbnf import RbnfEngine

engine = RbnfEngine.for_language("en")
assert engine.format_number(1234).text == "one thousand two hundred thirty-four"

Different formatting purposes are supported as well, depending on the locale:

from unicode_rbnf import RbnfEngine, FormatPurpose

engine = RbnfEngine.for_language("en")
assert engine.format_number(1999, FormatPurpose.CARDINAL).text == "one thousand nine hundred ninety-nine"
assert engine.format_number(1999, FormatPurpose.YEAR).text == "nineteen ninety-nine"
assert engine.format_number(11, FormatPurpose.ORDINAL).text == "eleventh"

For locales with multiple genders, cases, etc., the different texts are accessible in the result of format_number:

from unicode_rbnf import RbnfEngine

engine = RbnfEngine.for_language("de")
print(engine.format_number(1))

Result:

FormatResult(
  text='eins',
  text_by_ruleset={
    'spellout-numbering': 'eins',
    'spellout-cardinal-neuter': 'ein',
    'spellout-cardinal-masculine': 'ein',
    'spellout-cardinal-feminine': 'eine',
    'spellout-cardinal-n': 'einen',
    'spellout-cardinal-r': 'einer',
    'spellout-cardinal-s': 'eines',
    'spellout-cardinal-m': 'einem'
  }
)

The text property of the result holds the text of the ruleset with the shortest name (least specific).

Supported locales

See: https://github.com/unicode-org/cldr/tree/release-44/common/rbnf

Engine implementation

Not all features of the RBNF engine are implemented. The following features are available:

  • Literal text (hundred)
  • Quotient substitution (<< or ←←)
  • Reminder substitution (>> or →→)
  • Optional substitution ([...])
  • Rule substituton (←%ruleset_name←)
  • Rule replacement (=%ruleset_name=)
  • Special rules:
    • Negative numbers (-x)
    • Improper fractions (x.x)
    • Not a number (NaN)
    • Infinity (Inf)

Some features that will need to be added eventually:

  • Proper fraction rules (0.x)
  • Preceding reminder substitution (>>> or →→→)
  • Number format strings (==)
  • Decimal format patterns (#,##0.00)
  • Plural replacements ($(ordinal,one{st}...))

About

A pure Python implementation of ICU's rule-based number format engine

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages