A module that allows you to extract a bunch of locale-specific information from the Unicode CLDR (Common Locale Data Repository), including:
- Date, time, and date-time formats
- Date interval formats
- Number formats, symbols, and digits for all number systems
- Exemplar and ellipsis characters
- Day names, month names, quarter names, era names, and cyclic names
- Patterns for rendering lists of items (see the this-and-that module for an easy-to-consume version)
- Display names for languages, time zones, territories, scripts and currencies
- Plural rule functions (converted to JavaScript functions)
- Rule-based number formatting functions (converted to JavaScript functions)
The extraction code was originally written for the inter i18n library, but can be used on its own.
To understand the data itself, you might need to dive into the LDML specification, which describes the schema of the CLDR XML files.
Comes bundled with the CLDR 45 release and isn't attempting to be backwards compatible with earlier versions.
See the changelog.
Make sure you have node.js and npm installed, then run:
$ npm install cldr
Now you're ready to create a node-cldr instance and take it for a spin:
var cldr = require('cldr');
console.log(cldr.extractTerritoryDisplayNames('fr'));
Output:
{ '142': 'Asie',
'143': 'Asie centrale',
'145': 'Asie occidentale',
'150': 'Europe',
[...]
YT: 'Mayotte',
ZA: 'Afrique du Sud',
ZM: 'Zambie',
ZW: 'Zimbabwe',
ZZ: 'région indéterminée' }
Advanced users can also provide the path to another CLDR installation like this:
var cldr = require('cldr').load('/path/to/cldr');
An array of locale ids for which data is available (656 in CLDR
release 22.1). The locale ids are "normalized" to be all lower case
with underscores separating the fragments. However, all methods that
take a locale id as a parameter will accept any casing and both - and _
as separators.
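The normalization described above can be pictured with a small helper. This is only a sketch of the documented behavior (lower case, underscores between fragments); normalizeLocaleId is a hypothetical name and not part of the node-cldr API.

```javascript
// Hypothetical helper (not a node-cldr function): mirrors the documented
// normalization of locale ids, i.e. all lower case with underscores
// separating the fragments.
function normalizeLocaleId(localeId) {
  return String(localeId).replace(/-/g, '_').toLowerCase();
}

console.log(normalizeLocaleId('en-GB')); // en_gb
console.log(normalizeLocaleId('zh_Hant_TW')); // zh_hant_tw
```

Since the extraction methods accept any casing and either separator, a helper like this is only needed when comparing ids against the normalized array yourself.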
An array of calendar ids for which data is available. In CLDR release 22.1:
[
'buddhist',
'chinese',
'coptic',
'dangi',
'ethioaa',
'ethiopic',
'gregorian',
'hebrew',
'indian',
'islamic',
'islamicc',
'iso8601',
'japanese',
'persian',
'roc',
];
An array of number system ids for which data is available. In CLDR release 22.1:
[
'arab',
'arabext',
'armn',
'armnlow',
'bali',
'beng',
'brah',
'cakm',
'cham',
'deva',
'ethi',
'finance',
'fullwide',
'geor',
'grek',
'greklow',
'gujr',
'guru',
'hanidec',
'hans',
'hansfin',
'hant',
'hantfin',
'hebr',
'java',
'jpan',
'jpanfin',
'kali',
'khmr',
'knda',
'lana',
'lanatham',
'laoo',
'latn',
'lepc',
'limb',
'mlym',
'mong',
'mtei',
'mymr',
'mymrshan',
'native',
'nkoo',
'olck',
'orya',
'osma',
'roman',
'romanlow',
'saur',
'shrd',
'sora',
'sund',
'takr',
'talu',
'taml',
'tamldec',
'telu',
'thai',
'tibt',
'traditio',
'vaii',
];
All the data extraction methods are synchronous, which means that XML
documents that haven't already been loaded will be loaded using
fs.readFileSync. The reasoning behind this is that the API would be
awkward if all the extraction methods had to take callbacks. Also,
node-cldr is unlikely to be used in a setting where performance is
critical. However, if for some reason you want to avoid the
synchronous loads, you can use cldr.load(<arrayOfLocaleIds>, cb) to
load all the needed data in parallel before starting the extraction
itself. Then all the needed documents will be loaded and ready.
Extract a locale display pattern hash for a locale:
cldr.extractLocaleDisplayPattern('en_GB');
{ localePattern: '{0} ({1})',
localeSeparator: '{0}, {1}',
localeKeyTypePattern: '{0}: {1}' }
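As an illustration of how such a pattern might be consumed, the sketch below fills the placeholders of the localePattern shown above. The fillPattern helper is hypothetical (it is not provided by node-cldr), and the display names passed in are hard-coded sample values.

```javascript
// Pattern object copied from the extractLocaleDisplayPattern('en_GB') output above.
const localeDisplayPattern = {
  localePattern: '{0} ({1})',
  localeSeparator: '{0}, {1}',
  localeKeyTypePattern: '{0}: {1}',
};

// Hypothetical helper: replaces {0}, {1}, ... in a single pass so that
// substituted values are never re-scanned for placeholders.
function fillPattern(pattern, ...values) {
  return pattern.replace(/\{(\d+)\}/g, (match, index) => {
    const i = Number(index);
    return i < values.length ? String(values[i]) : match;
  });
}

console.log(fillPattern(localeDisplayPattern.localePattern, 'English', 'United Kingdom'));
// English (United Kingdom)
```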
Extract a locale ID => display name hash for a locale:
cldr.extractLanguageDisplayNames('it').en;
'inglese'
Subdivision aliases contain deprecated or alternative subdivision codes. Note that the returned code may be either a territory code (such as 'cn71' => 'TW') or a subdivision code.
cldr.extractSubdivisionAliases().czol;
{ replacement: 'cz71', reason: 'deprecated' }
Extract a subnational territory ID => display name hash for a locale.
Codes follow the BCP47 standard, e.g. usca for California, USA.
Note that these codes are similar but not identical to ISO 3166-2 codes. Unlike ISO 3166-2, CLDR never reuses a code.
For global regions and countries, see extractTerritoryDisplayNames.
cldr.extractSubdivisionDisplayNames('en').dk85;
'Zealand'
Extract a time zone ID (Olson) => display name hash for a locale:
cldr.extractTimeZoneDisplayNames('it')['Europe/Gibraltar'];
'Gibilterra'
Extract a hash with ICU formats for displaying information about a time zone in a locale:
cldr.extractTimeZoneFormats('da');
{ hour: [ '+HH.mm', '-HH.mm' ],
gmt: 'GMT{0}',
gmtZero: 'GMT',
region: 'Tidszone for {0}',
fallback: '{1} ({0})',
regions: { daylight: '{0} (+1)', standard: '{0} (+0)' } }
Territory aliases contain deprecated or alternative territory codes.
cldr.extractTerritoryAliases().BU;
{ replacement: 'MM', reason: 'deprecated' }
Extract a territory ID => display name hash for a locale. This method will return global regions and countries. For subnational divisions, see extractSubdivisionDisplayNames.
cldr.extractTerritoryDisplayNames('fr').US;
'États-Unis'
Get a flattened tree structure with information about which territories are
contained in other territories. The territories are given by their id, so the
result is locale-independent. Consult extractTerritoryDisplayNames for the
translated display names.
cldr.extractTerritoryContainmentGroups()
{ '142':
{ type: '142',
contains: [ '145', '143', '030', '034', '035' ],
parent: '001' },
'143':
{ type: '143',
contains: [ 'TM', 'TJ', 'KG', 'KZ', 'UZ' ],
parent: '142' },
[...]
'009':
{ type: '009',
contains: [ '053', '054', '057', '061', 'QO' ],
parent: '001' },
QO:
{ type: 'QO',
contains: [ 'AQ', 'AC', 'CP', 'DG', 'TA' ],
parent: '009' } }
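Because the structure is flattened, walking up the tree means following the parent links. The sketch below does that over two entries copied from the output above; findContainingGroup and ancestorsOf are hypothetical helpers, not node-cldr functions.

```javascript
// Two entries copied from the extractTerritoryContainmentGroups() output above.
const groups = {
  '142': { type: '142', contains: ['145', '143', '030', '034', '035'], parent: '001' },
  '143': { type: '143', contains: ['TM', 'TJ', 'KG', 'KZ', 'UZ'], parent: '142' },
};

// Hypothetical helper: id of the group that directly contains a territory.
function findContainingGroup(territoryId, groups) {
  return Object.keys(groups).find((id) => groups[id].contains.includes(territoryId));
}

// Hypothetical helper: list of enclosing group ids, innermost first.
// Stops when a parent is unknown or a cycle would be introduced.
function ancestorsOf(territoryId, groups) {
  const chain = [];
  let current = findContainingGroup(territoryId, groups);
  while (current !== undefined && !chain.includes(current)) {
    chain.push(current);
    current = groups[current] ? groups[current].parent : undefined;
  }
  return chain;
}

console.log(ancestorsOf('KZ', groups)); // [ '143', '142', '001' ]
```

The resulting ids can then be looked up in the extractTerritoryDisplayNames hash to get translated names.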
Extract hash with currency ID keys mapping to currency info objects for a locale:
cldr.extractCurrencyInfoById('es').YUN;
{ displayName: 'dinar convertible yugoslavo',
symbol: undefined,
one: 'dinar convertible yugoslavo',
other: 'dinares convertibles yugoslavos' }
Extract a script ID => display name hash for a locale:
cldr.extractScriptDisplayNames('en_US').Arab;
'Arabic'
Extract a variant ID => display name hash for a locale:
cldr.extractVariantDisplayNames('fr').VALENCIA;
'valencien'
Extract keys and their associated types for a locale.
cldr.extractKeyTypes('en').calendar;
{ displayName: 'Calendar',
types:
{ buddhist: 'Buddhist Calendar',
chinese: 'Chinese Calendar',
coptic: 'Coptic Calendar',
dangi: 'Dangi Calendar',
ethiopic: 'Ethiopic Calendar',
ethiopicAmeteAlem: 'Ethiopic Amete Alem Calendar',
gregorian: 'Gregorian Calendar',
hebrew: 'Hebrew Calendar',
indian: 'Indian National Calendar',
islamic: 'Islamic Calendar',
islamicCivil: 'Islamic Calendar (tabular, civil epoch)',
islamicRgsa: 'Islamic Calendar (Saudi Arabia, sighting)',
islamicTbla: 'Islamic Calendar (tabular, astronomical epoch)',
islamicUmalqura: 'Islamic Calendar (Umm al-Qura)',
iso8601: 'ISO-8601 Calendar',
japanese: 'Japanese Calendar',
persian: 'Persian Calendar',
roc: 'Minguo Calendar' } }
cldr.extractKeyTypes('en').x;
{ displayName: 'Private-Use' }
Extract a hash of transform names for a locale.
cldr.extractTransformNames('en');
{ BGN: 'BGN',
Numeric: 'Numeric',
Tone: 'Tone',
UNGEGN: 'UNGEGN',
'x-Accents': 'Accents',
'x-Fullwidth': 'Fullwidth',
'x-Halfwidth': 'Halfwidth',
'x-Jamo': 'Jamo',
'x-Pinyin': 'Pinyin',
'x-Publishing': 'Publishing' }
Extract a hash of measurement system names for a locale.
cldr.extractMeasurementSystemNames('en');
{ metric: 'Metric', UK: 'UK', US: 'US' }
Extract a hash of code patterns for a locale.
cldr.extractCodePatterns('en');
{ language: 'Language: {0}',
script: 'Script: {0}',
territory: 'Region: {0}' }
Extract a nested hash with era names in wide and abbreviated formats for a calendar and locale:
cldr.extractEraNames('es', 'gregorian');
{ wide:
{ '0': 'antes de Cristo',
'1': 'anno Dómini' },
abbreviated:
{ '0': 'a.C.',
'1': 'd.C.' } }
Extract a nested hash with quarter names in various formats for a calendar and locale:
cldr.extractQuarterNames('es', 'gregorian');
{ format:
{ abbreviated: { '0': 'T1', '1': 'T2', '2': 'T3', '3': 'T4' },
narrow: { '0': '1T', '1': '2T', '2': '3T', '3': '4T' },
wide: { '0': '1er trimestre', '1': '2º trimestre', '2': '3er trimestre', '3': '4º trimestre' } },
standAlone:
{ abbreviated: { '0': 'Q1', '1': 'Q2', '2': 'Q3', '3': 'Q4' },
narrow: { '0': '1T', '1': '2T', '2': '3T', '3': '4T' },
wide: { '0': '1.er trimestre', '1': '2.º trimestre', '2': '3.er trimestre', '3': '4.º trimestre' } } }
Extract a nested hash with day periods in various formats for a calendar and locale:
cldr.extractDayPeriods('en_GB', 'gregorian');
{ format:
{ abbreviated: { am: 'AM', pm: 'PM' },
narrow: { am: 'a', noon: 'n', pm: 'p' },
wide: { am: 'am', pm: 'pm', noon: 'noon' } },
standAlone:
{ abbreviated: { am: 'AM', pm: 'PM' },
narrow: { am: 'AM', pm: 'PM' },
wide: { am: 'AM', pm: 'PM' } } }
Extract a nested hash with cyclic names for a calendar and locale (only the chinese calendar contains these):
cldr.extractCyclicNames('en_US', 'chinese').zodiacs.format.abbreviated;
{ '1': 'Rat', '2': 'Ox', '3': 'Tiger', '4': 'Rabbit', '5': 'Dragon', '6': 'Snake', '7': 'Horse', '8': 'Goat', '9': 'Monkey', '10': 'Rooster', '11': 'Dog', '12': 'Pig' }
Extract a nested hash with month names (in various contexts) for a calendar and locale:
cldr.extractMonthNames('nl', 'gregorian').format.wide;
{ '0': 'januari', '1': 'februari', '2': 'maart', '3': 'april', '4': 'mei', '5': 'juni', '6': 'juli',
'7': 'augustus', '8': 'september', '9': 'oktober', '10': 'november', '11': 'december' }
Extract a nested hash with month patterns (in various contexts) for a calendar and locale:
cldr.extractMonthPatterns('nl', 'chinese');
{ format:
{ abbreviated: { leap: '{0}bis' },
narrow: { leap: '{0}b' },
wide: { leap: '{0}bis' } },
numeric: { all: { leap: '{0}bis' } },
standAlone:
{ abbreviated: { leap: '{0}bis' },
narrow: { leap: '{0}b' },
wide: { leap: '{0}bis' } } }
Extract a nested hash with day names (in various contexts) for a calendar and locale:
cldr.extractDayNames('en', 'gregorian').format.abbreviated;
{ '0': 'Sun',
'1': 'Mon',
'2': 'Tue',
'3': 'Wed',
'4': 'Thu',
'5': 'Fri',
'6': 'Sat' }
Extract a nested hash with display names (including relative) for various fields for a locale:
cldr.extractFields('en').month;
{ displayName: 'Month',
relative:
{ '0': 'this month',
'1': 'next month',
'-1': 'last month' },
relativeTime:
{ future:
{ one: 'in {0} month',
other: 'in {0} months' },
past:
{ one: '{0} month ago',
other: '{0} months ago' } } }
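One way such a field hash might be used is sketched below: fixed offsets come from relative, everything else from relativeTime. The formatRelativeMonths helper is hypothetical, and the inline English plural check (1 is "one", everything else "other") is a simplification standing in for a real plural rule function.

```javascript
// Data copied from the extractFields('en').month output above.
const monthField = {
  displayName: 'Month',
  relative: { '0': 'this month', '1': 'next month', '-1': 'last month' },
  relativeTime: {
    future: { one: 'in {0} month', other: 'in {0} months' },
    past: { one: '{0} month ago', other: '{0} months ago' },
  },
};

// Hypothetical helper: offset is a whole number of months relative to now.
function formatRelativeMonths(offset) {
  // Prefer the fixed phrases ("this month", "next month", ...) when available.
  if (monthField.relative[offset] !== undefined) return monthField.relative[offset];
  const direction = offset > 0 ? 'future' : 'past';
  const count = Math.abs(offset);
  const category = count === 1 ? 'one' : 'other'; // simplified English cardinal rule
  return monthField.relativeTime[direction][category].replace('{0}', String(count));
}

console.log(formatRelativeMonths(1)); // next month
console.log(formatRelativeMonths(3)); // in 3 months
console.log(formatRelativeMonths(-2)); // 2 months ago
```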
Extract a hash with ICU patterns that show how to build a date-time pattern out of a date pattern and a time pattern in various contexts for a calendar and locale:
cldr.extractDateTimePatterns('en', 'gregorian');
{ full: '{1} \'at\' {0}',
long: '{1} \'at\' {0}',
medium: '{1}, {0}',
short: '{1}, {0}' }
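In these patterns {1} stands for the formatted date and {0} for the formatted time, and text in single quotes is literal. A minimal sketch of combining the two follows; combineDateTime is a hypothetical helper, and the date and time strings are assumed to have been formatted elsewhere.

```javascript
// Patterns copied from the extractDateTimePatterns('en', 'gregorian') output above.
const dateTimePatterns = {
  full: "{1} 'at' {0}",
  long: "{1} 'at' {0}",
  medium: '{1}, {0}',
  short: '{1}, {0}',
};

// Hypothetical helper: handles quoted literals and placeholders in one pass,
// so apostrophes inside the substituted strings are left untouched.
function combineDateTime(pattern, formattedDate, formattedTime) {
  return pattern.replace(/'([^']*)'|\{([01])\}/g, (match, literal, index) => {
    if (index === '0') return formattedTime;
    if (index === '1') return formattedDate;
    return literal === '' ? "'" : literal; // '' is an escaped apostrophe
  });
}

console.log(combineDateTime(dateTimePatterns.full, '1 January 2024', '10:30'));
// 1 January 2024 at 10:30
console.log(combineDateTime(dateTimePatterns.short, '01/01/2024', '10:30'));
// 01/01/2024, 10:30
```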
Extract a hash of basic date formats (ICU) for a calendar and locale:
cldr.extractDateFormats('en_GB', 'gregorian');
{ full: 'EEEE, d MMMM y',
long: 'd MMMM y',
medium: 'd MMM y',
short: 'dd/MM/yyyy' }
Extract a hash of basic time formats (ICU) for a given calendar and locale:
cldr.extractTimeFormats('en_GB', 'gregorian');
{ full: 'HH:mm:ss zzzz',
long: 'HH:mm:ss z',
medium: 'HH:mm:ss',
short: 'HH:mm' }
Extract a hash of ICU date formats for displaying dates and times at various detail levels for a calendar and locale:
cldr.extractDateFormatItems('en_GB', 'gregorian');
{ d: 'd',
Ed: 'E d',
Ehm: 'E h:mm a',
EHm: 'E HH:mm',
[...]
yQQQ: 'QQQ y',
yyMMM: 'MMM yy',
yyyyMM: 'MM/yyyy',
yyyyMMMM: 'MMMM y' }
Extract a nested hash with date interval display formats (ICU), keyed by the detail level and the 'greatest difference' field for a calendar and a locale (tip: Look for "greatest difference" in the LDML spec):
cldr.extractDateIntervalFormats('en_GB', 'gregorian');
{ d: { d: 'd–d' },
h: { a: 'h a – h a', h: 'h–h a' },
H: { H: 'HH–HH' },
hm: { a: 'h:mm a – h:mm a', h: 'h:mm–h:mm a', m: 'h:mm–h:mm a' },
[...]
yMMMEd:
{ d: 'E, d – E, d MMM y',
M: 'E, d MMM – E, d MMM y',
y: 'E, d MMM y – E, d MMM y' },
yMMMM: { M: 'MMMM–MMMM y', y: 'MMMM y – MMMM y' } }
Extract the date interval fallback format (ICU) for a given calendar and locale (to be used when the date interval formats don't offer a specific format):
cldr.extractDateIntervalFallbackFormat('en_GB', 'gregorian');
'{0} – {1}'
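A lookup that falls back to this format when no specific interval format exists might look like the sketch below. It uses a few entries copied from the extractDateIntervalFormats output earlier in this document; pickIntervalFormat is a hypothetical helper, not a node-cldr function.

```javascript
// A few entries copied from the extractDateIntervalFormats('en_GB', 'gregorian') output.
const intervalFormats = {
  d: { d: 'd–d' },
  yMMMM: { M: 'MMMM–MMMM y', y: 'MMMM y – MMMM y' },
};
// Copied from the extractDateIntervalFallbackFormat('en_GB', 'gregorian') output.
const fallbackFormat = '{0} – {1}';

// Hypothetical helper: skeleton is the detail level (e.g. 'yMMMM') and
// greatestDifference is the largest calendar field in which the two dates differ.
function pickIntervalFormat(skeleton, greatestDifference) {
  const bySkeleton = intervalFormats[skeleton];
  if (bySkeleton && bySkeleton[greatestDifference]) {
    return { pattern: bySkeleton[greatestDifference], isFallback: false };
  }
  // No specific format: the caller formats both dates and joins them with the fallback.
  return { pattern: fallbackFormat, isFallback: true };
}

console.log(pickIntervalFormat('yMMMM', 'M'));
console.log(pickIntervalFormat('yMMMM', 'd'));
```

The isFallback flag matters because the two kinds of pattern are consumed differently: a specific format is a single ICU date pattern covering both dates, while the fallback has {0} and {1} placeholders for two separately formatted dates.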
Extract the number symbols for a given number system and locale:
cldr.extractNumberSymbols('en_GB', 'latn');
{ decimal: '.',
group: ',',
list: ';',
percentSign: '%',
plusSign: '+',
minusSign: '-',
exponential: 'E',
perMille: '‰',
infinity: '∞',
nan: 'NaN' }
Extract the number formats (ICU DecimalFormat) for a given number system and locale:
cldr.extractNumberFormats('en_GB', 'latn');
{ scientific: { default: '#E0' },
decimal:
{ long:
{ '1000': { one: '0 thousand', other: '0 thousand' },
'10000': { one: '00 thousand', other: '00 thousand' },
[...]
'100000000000000': { one: '000 trillion', other: '000 trillion' } },
short:
{ '1000': { one: '0k', other: '0K' },
'10000': { one: '00k', other: '00K' },
[...]
'100000000000000': { one: '000tn', other: '000T' } },
default: '#,##0.###' },
currency: { default: '¤#,##0.00', one: '{0} {1}', other: '{0} {1}' },
percent: { default: '#,##0%' } }
Extract the id of the default number system for a locale:
cldr.extractDefaultNumberSystemId('en_GB');
'latn'
cldr.extractDefaultNumberSystemId('ar');
'arab'
Extract the unit patterns (ICU) for a locale (to be used with a plural rule function):
cldr.extractUnitPatterns('en_GB').long.unit.massKilogram
{ one: '{0} kilogram',
other: '{0} kilograms' }
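To show how a unit pattern and a plural rule function fit together, here is a small sketch. The patterns are copied from the output above, and enCardinal reproduces the English cardinal rule as printed in the extractPluralRuleFunction example in this document; formatUnit is a hypothetical helper.

```javascript
// Copied from the extractUnitPatterns('en_GB').long.unit.massKilogram output above.
const kilogramPatterns = { one: '{0} kilogram', other: '{0} kilograms' };

// English cardinal rule, as printed for extractPluralRuleFunction('en_GB', 'cardinal').
function enCardinal(n) {
  if (typeof n === 'string') n = parseInt(n, 10);
  if (n === 1) return 'one';
  return 'other';
}

// Hypothetical helper: picks the pattern for the plural class of count and
// falls back to 'other' when the locale has no pattern for that class.
function formatUnit(count, patterns, pluralRule) {
  const pattern = patterns[pluralRule(count)] || patterns.other;
  return pattern.replace('{0}', String(count));
}

console.log(formatUnit(1, kilogramPatterns, enCardinal)); // 1 kilogram
console.log(formatUnit(5, kilogramPatterns, enCardinal)); // 5 kilograms
```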
Extract the delimiters for a locale:
cldr.extractDelimiters('en_GB');
{ quotationStart: '“',
quotationEnd: '”',
alternateQuotationStart: '‘',
alternateQuotationEnd: '’' }
Extract the list patterns (ICU) for a locale:
Object.keys(cldr.extractListPatterns('en_GB'));
[ 'default',
'unit',
'unitNarrow',
'unitShort' ]
cldr.extractListPatterns('en_GB').default;
{ '2': '{0} and {1}',
end: '{0} and {1}',
middle: '{0}, {1}',
start: '{0}, {1}' }
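The four patterns are meant to be composed: '2' for exactly two items, otherwise end for the last pair, middle for the items in between, and start for the first item. A minimal sketch of that composition follows; formatList is a hypothetical helper, not part of node-cldr.

```javascript
// Copied from the extractListPatterns('en_GB').default output above.
const listPatterns = {
  '2': '{0} and {1}',
  end: '{0} and {1}',
  middle: '{0}, {1}',
  start: '{0}, {1}',
};

// Hypothetical helper: builds the list from the end towards the start.
function formatList(items, patterns) {
  // Single-pass substitution so item text is never re-scanned for placeholders.
  const fill = (pattern, a, b) =>
    pattern.replace(/\{([01])\}/g, (match, i) => (i === '0' ? a : b));
  if (items.length === 0) return '';
  if (items.length === 1) return String(items[0]);
  if (items.length === 2) return fill(patterns['2'], items[0], items[1]);
  let result = fill(patterns.end, items[items.length - 2], items[items.length - 1]);
  for (let i = items.length - 3; i >= 1; i--) {
    result = fill(patterns.middle, items[i], result);
  }
  return fill(patterns.start, items[0], result);
}

console.log(formatList(['a', 'b'], listPatterns)); // a and b
console.log(formatList(['a', 'b', 'c', 'd'], listPatterns)); // a, b, c and d
```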
Extract information about the writing direction for a locale:
cldr.extractLayout('ar');
{ orientation:
{ characterOrder: 'right-to-left',
lineOrder: 'top-to-bottom' } }
Extract information about various character classes, ellipsis patterns etc. for a locale:
cldr.extractCharacters('en_GB');
{ exemplar:
{ default: [ 'a', 'b', 'c', 'd', 'e', [...], 'x', 'y', 'z' ],
auxiliary: [ 'á', 'à', 'ă', 'â', 'å', [...], 'ü', 'ū', 'ÿ' ],
index: [ 'A', 'B', 'C', 'D', 'E', [...], 'X', 'Y', 'Z' ],
punctuation: [ '\\-', '‐', '–', '—', ',', [...], '‡', '′', '″' ] },
ellipsis: { final: '{0}…', initial: '…{0}', medial: '{0}… {1}' },
moreInformation: '?' }
Extract a list of available plural classes for a locale (See the LDML spec for an explanation):
cldr.extractPluralClasses('en_GB', 'cardinal');
[ 'one', 'other' ]
cldr.extractPluralClasses('ar', 'cardinal');
[ 'zero', 'one', 'two', 'few', 'many', 'other' ]
cldr.extractPluralClasses('ar', 'ordinal');
[ 'other' ]
Extract a plural rule function for a locale (See the LDML spec for an explanation):
cldr.extractPluralRuleFunction('en_GB', 'cardinal').toString();
function (n) {
if (typeof n === "string") n = parseInt(n, 10);
if (n === 1) return "one";
return "other";
}
cldr.extractPluralRuleFunction('en_GB', 'ordinal').toString();
function (n) {
if (typeof n === "string") n = parseInt(n, 10);
if (n % 10 === 1 && !(n % 100 === 11)) return "one";
if (n % 10 === 2 && !(n % 100 === 12)) return "two";
if (n % 10 === 3 && !(n % 100 === 13)) return "few";
return "other";
}
cldr.extractPluralRuleFunction('ar').toString();
function (n) {
if (typeof n === "string") n = parseInt(n, 10);
if (n === 0) return "zero";
if (n === 1) return "one";
if (n === 2) return "two";
if (n % 100 >= 3 && n % 100 <= 10) return "few";
if (n % 100 >= 11 && n % 100 <= 99) return "many";
return "other";
}
Extracts RBNF (rule-based number formatting) functions for a locale. The 'types' parameter specifies the names of the functions you want (defaults to all available), and the returned hash will contain the ones that were found plus their dependencies.
The original function names have been converted to camelCase and
prefixed with render, and you need to use that naming convention
when specifying the types array as well.
cldr.extractRbnfFunctionByType('en_GB').renderRomanUpper(2012);
'MMXII'
cldr.extractRbnfFunctionByType('de').renderSpelloutOrdinal(2323);
'zwei tausend drei hundert drei und zwanzigste'
Note that some of the generated functions expect to be able to call
this.renderNumber(<number>, <icuNumberFormat>). If there's demand
for it, that can be made customizable. Just file an issue.
Extract information about a numbering system. The supported numbering systems
can be retrieved as cldr.numberingSystemIds.
Most of the numbering systems will have a type of numeric and provide an array
of digits that correspond to 0 through 9:
cldr.extractNumberingSystem('bali');
{ type: 'numeric',
digits: [ '᭐', '᭑', '᭒', '᭓', '᭔', '᭕', '᭖', '᭗', '᭘', '᭙' ] }
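For a numeric system, rendering a number comes down to swapping each ASCII digit for the entry at that index. The sketch below does this with the bali digits copied from the output above; renderWithDigits is a hypothetical helper and leaves signs and separators untouched.

```javascript
// Digits copied from the extractNumberingSystem('bali') output above.
const baliDigits = ['᭐', '᭑', '᭒', '᭓', '᭔', '᭕', '᭖', '᭗', '᭘', '᭙'];

// Hypothetical helper: maps each ASCII digit 0-9 to digits[0..9].
// Anything that is not an ASCII digit (minus sign, decimal point) is kept as-is.
function renderWithDigits(number, digits) {
  return String(number).replace(/[0-9]/g, (d) => digits[Number(d)]);
}

console.log(renderWithDigits(1234, baliDigits));
```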
For other more complicated numbering systems, the type will be algorithmic
and there will be a rules property that is a string starting with render. In
that case, use the RBNF function (see above) of that name for producing a
number. These RBNF functions are the same in every locale, so you can just pass
root:
cldr.extractNumberingSystem('cyrl')
{ type: 'algorithmic', rules: 'renderCyrillicLower' }
const rbnfs = cldr.extractRbnfFunctionByType('root', 'renderCyrillicLower')
rbnfs.renderCyrillicLower(1234)
'҂асл҃д'
Some of the algorithmic numbering systems will additionally refer to a
specific locale:
cldr.extractNumberingSystem('jpan');
{ type: 'algorithmic',
rules: 'renderSpelloutCardinal',
locale: 'ja' }
In that case you should extract the RBNF rule from that specific locale to render the numbers:
const rbnfs = cldr.extractRbnfFunctionByType('ja', 'renderSpelloutCardinal');
rbnfs.renderSpelloutCardinal(1234);
'千二百三十四'
Deprecated: Please use extractNumberingSystem() instead.
Extract a hash of number system id => digits array. For some exotic
number systems, 'digits' is a string starting with render. In that
case, use the RBNF function (see above) of that name for producing a
number.
cldr.extractDigitsByNumberSystemId();
{ arab: [ '٠', '١', '٢', '٣', '٤', '٥', '٦', '٧', '٨', '٩' ],
arabext: [ '۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹' ],
armn: 'renderArmenianUpper',
armnlow: 'renderArmenianLower',
[...]
latn: [ '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ],
orya: [ '୦', '୧', '୨', '୩', '୪', '୫', '୬', '୭', '୮', '୯' ],
[...] }
Extract supplemental data for languages. These data contain a list of territories where the language is spoken, and scripts that are used with the language. Both territories and scripts can be either primary or secondary for the language.
{ ar:
{ scripts: [ 'Arab' ], territories: [ 'AE', 'BH', [...] ],
secondary: {
scripts: [ 'Syrc' ], territories: [ 'IR', 'SS' ] }
},
[...] }
Extract supplemental week data, including what day should be considered the first day of the week. The data is grouped by territories and/or locales.
{ firstDay:
[
{ day: 'mon', territories: [ '001', 'AD', [...] ] },
{ day: 'fri', territories: [ 'MV' ] },
{ day: 'sat', territories: [ 'AE', 'AF', [...] ] },
{ day: 'sun', territories: [ 'AG', 'AR', [...] ] },
{ day: 'sun', variant: true, territories: [ 'GB' ] },
],
[...] }
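A lookup over the firstDay entries could be sketched as follows. The sample data is truncated to the territories visible in the output above, firstDayOfWeek is a hypothetical helper, and two assumptions are made: entries flagged as variant are skipped, and '001' (the world) is treated as the default when a territory is not listed.

```javascript
// Truncated sample in the shape of the firstDay output above
// (only the territories visible in the README are included).
const weekData = {
  firstDay: [
    { day: 'mon', territories: ['001', 'AD'] },
    { day: 'fri', territories: ['MV'] },
    { day: 'sat', territories: ['AE', 'AF'] },
    { day: 'sun', territories: ['AG', 'AR'] },
    { day: 'sun', variant: true, territories: ['GB'] },
  ],
};

// Hypothetical helper. Assumptions: variant entries are ignored, and the
// '001' entry is used as the default for territories that are not listed.
function firstDayOfWeek(territoryId, data) {
  const regular = data.firstDay.filter((entry) => !entry.variant);
  const match =
    regular.find((entry) => entry.territories.includes(territoryId)) ||
    regular.find((entry) => entry.territories.includes('001'));
  return match ? match.day : undefined;
}

console.log(firstDayOfWeek('MV', weekData)); // fri
console.log(firstDayOfWeek('AE', weekData)); // sat
console.log(firstDayOfWeek('GB', weekData)); // mon (variant entry skipped, world default used)
```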
node-cldr is licensed under a standard 3-clause BSD license -- see the
LICENSE file for details.