Commit
Updated 'i18n_icu', 'i18n_lexer' and 'i18n_pattern' crates to support any data provider implementing the ICU 'DataProvider'.

Originally only 'BufferProvider' implementations were supported. The beginnings of 'Message' have been implemented, using 'IcuDataProvider' and 'LStringProvider'.
rizzen-yazston committed Mar 23, 2023
1 parent 86b2f11 commit b27f13b
Showing 22 changed files with 588 additions and 73 deletions.
50 changes: 49 additions & 1 deletion CHANGELOG.asciidoc
@@ -1,6 +1,46 @@
= Changelog
Rizzen Yazston

+== i18n 0.6.0 (2023-03-??)
+
+WARNING: This update has many API breaking changes for many `i18n` crates.
+
+Breaking change is the result of changing how ICU data providers are used and passed to various components, thus many examples are affected even if the module is not affected by the ICU data provider change.
+
+* Added the `i18n_icu` crate:
+
+** Added `IcuDataProvider`, `DataProviderWrapper`, and `IcuError`.
+
+** Added the `Cargo.toml`, license, and documentation.
+
+* Updated the `i18n_provider` crate:
+
+** Added `LStringProviderWrapper`.
+
+** Updated documentation.
+
+* Updated the `i18n_provider_sqlite3` crate:
+
+** Added `LStringProviderSqlite3`, `AsLStringProviderSqlite3`, and its blanket implementation.
+
+** Updated tests, examples and documentation.
+
+* Updated the `i18n_lexer` crate:
+
+
+
+* Updated the `i18n_pattern` crate:
+
+
+
+* Added the `i18n_message` crate:
+
+** Added `Message`, `MessageError`.
+
+** Added tests.
+
+** Added the `Cargo.toml`, license, and documentation.
+
== i18n 0.5.0 (2023-03-16)

WARNING: This update has many API breaking changes for all existing `i18n` crates.
@@ -9,12 +49,20 @@ Breaking change is the result of changing the implementation of handling error a

* Added the `i18n_provider` crate:

-** Added `LStringProvider`
+** Added `LStringProvider`, `ProviderError`.

** Added the `Cargo.toml`, license, and documentation.

+* Added the `i18n_provider_sqlite3` crate:
+
+** Added implementation of `LStringProvider` using Sqlite3 backend.
+
+** Added `tests` directory.
+
+** Added Sqlite3 file for supported error language strings.
+
+** Added the `Cargo.toml`, license, and documentation.

* Updated the `i18n_utility` crate:

** Renamed crate `i18n_utility` to `i18n_registry`.
25 changes: 25 additions & 0 deletions crates/icu/Cargo.toml
@@ -43,5 +43,30 @@ version = "0.8.0"
# Needed for BufferProvider
features = [ "serde" ]

+[dependencies.icu_plurals]
+version = "1.1.0"
+# Needed for BufferProvider
+features = [ "serde" ]
+
+[dependencies.fixed_decimal]
+version = "0.5.2"
+# Needed for BufferProvider
+features = [ "ryu" ]
+
+[dependencies.icu_decimal]
+version = "1.1.0"
+# Needed for BufferProvider
+features = [ "serde" ]
+
+[dependencies.icu_calendar]
+version = "1.1.0"
+# Needed for BufferProvider
+features = [ "serde" ]
+
+[dependencies.icu_datetime]
+version = "1.1.0"
+# Needed for BufferProvider
+features = [ "serde" ]
+
[lib]
name = "i18n_icu"
12 changes: 7 additions & 5 deletions crates/icu/README.asciidoc
@@ -8,11 +8,13 @@ Rizzen Yazston
:LanguageIdentifier: https://docs.rs/icu/latest/icu/locid/struct.LanguageIdentifier.html
:BCP_47_Language_Tag: https://www.rfc-editor.org/rfc/bcp/bcp47.txt

-ICU4X data provider helper.
-
-The `IcuDataProvider` type contains a member `data_provider` holding the `DataProvider`, which is a deserialised `BufferProvider`.
-
-The `IcuDataProvider` type also contains non-locale based data used within the `i18n_lexer` crate.
+ICU4X data provider helper.
+
+The `IcuDataProvider` type contains a member `data_provider` holding the `&DataProvider` as a `DataProviderWrapper` type.
+
+The `IcuDataProvider` type also contains non-locale based data used within the `i18n_lexer` crate.
+
+`IcuDataProvider` type is used within the `Rc` type as `Rc<IcuDataProvider>` to prevent unnecessary duplication.

== Cargo.toml

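The README text above says `IcuDataProvider` is kept inside an `Rc` (as `Rc<IcuDataProvider>`) to avoid unnecessary duplication, and the `i18n_lexer` example passes `&Rc::new( icu_data_provider )` to `Lexer::try_new`. The following is a minimal std-only sketch of that sharing pattern, not code from this commit: `SharedData` and `Consumer` are hypothetical stand-ins for `IcuDataProvider` and `Lexer`, and a `Vec<char>` stands in for the precomputed character sets.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for `IcuDataProvider`: data built once and shared.
struct SharedData {
    pattern_syntax: Vec<char>,
}

/// Hypothetical stand-in for `Lexer`: keeps a cheap `Rc` handle, not a copy.
struct Consumer {
    data: Rc<SharedData>,
}

impl Consumer {
    /// Takes `&Rc<..>`, the same shape as `Lexer::try_new( &Rc::new( .. ) )`.
    fn new(data: &Rc<SharedData>) -> Self {
        // `Rc::clone` only increments the reference count; the `Vec` is not copied.
        Consumer { data: Rc::clone(data) }
    }

    /// Reads the shared data through the `Rc`.
    fn is_syntax(&self, c: char) -> bool {
        self.data.pattern_syntax.contains(&c)
    }
}
```

Two consumers built from the same `Rc` point at one allocation, so the strong count becomes 3 (the original plus two clones). `Rc` is single-threaded; sharing across threads would need `Arc` instead.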
96 changes: 73 additions & 23 deletions crates/icu/src/icu.rs
@@ -2,51 +2,97 @@
// called `LICENSE-BSD-3-Clause` at the top level of the `i18n_icu-rizzen-yazston` crate.

use crate::IcuError;
-use icu_provider::serde::DeserializingBufferProvider;
-use icu_properties::sets::{ load_pattern_white_space, load_pattern_syntax, CodePointSetData };
-use icu_segmenter::GraphemeClusterSegmenter;
+use icu_provider::DataProvider;
+use icu_properties::{
+    provider::{ PatternSyntaxV1Marker, PatternWhiteSpaceV1Marker },
+    sets::{ load_pattern_white_space, load_pattern_syntax, CodePointSetData }
+};
+use icu_segmenter::{ GraphemeClusterSegmenter, provider::GraphemeClusterBreakDataV1Marker };
+use icu_plurals::provider::{ CardinalV1Marker, OrdinalV1Marker };
+use icu_decimal::provider::DecimalSymbolsV1Marker;
+use icu_datetime::provider::calendar::{
+    TimeSymbolsV1Marker,
+    TimeLengthsV1Marker,
+    GregorianDateLengthsV1Marker,
+    BuddhistDateLengthsV1Marker,
+    JapaneseDateLengthsV1Marker,
+    JapaneseExtendedDateLengthsV1Marker,
+    CopticDateLengthsV1Marker,
+    IndianDateLengthsV1Marker,
+    EthiopianDateLengthsV1Marker,
+    GregorianDateSymbolsV1Marker,
+    BuddhistDateSymbolsV1Marker,
+    JapaneseDateSymbolsV1Marker,
+    JapaneseExtendedDateSymbolsV1Marker,
+    CopticDateSymbolsV1Marker,
+    IndianDateSymbolsV1Marker,
+    EthiopianDateSymbolsV1Marker,
+};
+use icu_calendar::provider::{ WeekDataV1Marker, JapaneseErasV1Marker, JapaneseExtendedErasV1Marker };

-/// The `IcuDataProvider` type contains a member `data_provider` holding the `DataProvider`, which is a deserialised
-/// `BufferProvider`.
+/// The `IcuDataProvider` type contains a member `data_provider` holding the `&DataProvider` as a `DataProviderWrapper`
+/// type.
///
/// The `IcuDataProvider` type also contains non-locale based data used within the `i18n_lexer` crate.
///
-/// `IcuDataProvider` type is used within the `Rc` type to prevent unnecessary duplication.
+/// `IcuDataProvider` type is used within the `Rc` type as `Rc<IcuDataProvider>` to prevent unnecessary duplication.
pub struct IcuDataProvider<'a, P>
where
-    P: ?Sized
+    P: ?Sized + DataProvider<PatternSyntaxV1Marker> + DataProvider<PatternWhiteSpaceV1Marker>
+        + DataProvider<GraphemeClusterBreakDataV1Marker> + DataProvider<CardinalV1Marker>
+        + DataProvider<OrdinalV1Marker> + DataProvider<DecimalSymbolsV1Marker> + DataProvider<TimeSymbolsV1Marker>
+        + DataProvider<TimeLengthsV1Marker> + DataProvider<WeekDataV1Marker>
+        + DataProvider<GregorianDateLengthsV1Marker> + DataProvider<BuddhistDateLengthsV1Marker>
+        + DataProvider<JapaneseDateLengthsV1Marker> + DataProvider<JapaneseExtendedDateLengthsV1Marker>
+        + DataProvider<CopticDateLengthsV1Marker> + DataProvider<IndianDateLengthsV1Marker>
+        + DataProvider<EthiopianDateLengthsV1Marker> + DataProvider<GregorianDateSymbolsV1Marker>
+        + DataProvider<BuddhistDateSymbolsV1Marker> + DataProvider<JapaneseDateSymbolsV1Marker>
+        + DataProvider<JapaneseExtendedDateSymbolsV1Marker> + DataProvider<CopticDateSymbolsV1Marker>
+        + DataProvider<IndianDateSymbolsV1Marker> + DataProvider<EthiopianDateSymbolsV1Marker>
+        + DataProvider<JapaneseErasV1Marker> + DataProvider<JapaneseExtendedErasV1Marker>,
{
-    data_provider: DeserializingBufferProvider<'a, P>,
+    data_provider: DataProviderWrapper<'a, P>,
    pattern_syntax: CodePointSetData,
    pattern_white_space: CodePointSetData,
    grapheme_segmenter: GraphemeClusterSegmenter,
}

-impl<'a, P> IcuDataProvider<'a, P>
+impl<'a, P> IcuDataProvider<'a, P>
where
-    P: ?Sized + icu_provider::BufferProvider
+    P: ?Sized + DataProvider<PatternSyntaxV1Marker> + DataProvider<PatternWhiteSpaceV1Marker>
+        + DataProvider<GraphemeClusterBreakDataV1Marker> + DataProvider<CardinalV1Marker>
+        + DataProvider<OrdinalV1Marker> + DataProvider<DecimalSymbolsV1Marker> + DataProvider<TimeSymbolsV1Marker>
+        + DataProvider<TimeLengthsV1Marker> + DataProvider<WeekDataV1Marker>
+        + DataProvider<GregorianDateLengthsV1Marker> + DataProvider<BuddhistDateLengthsV1Marker>
+        + DataProvider<JapaneseDateLengthsV1Marker> + DataProvider<JapaneseExtendedDateLengthsV1Marker>
+        + DataProvider<CopticDateLengthsV1Marker> + DataProvider<IndianDateLengthsV1Marker>
+        + DataProvider<EthiopianDateLengthsV1Marker> + DataProvider<GregorianDateSymbolsV1Marker>
+        + DataProvider<BuddhistDateSymbolsV1Marker> + DataProvider<JapaneseDateSymbolsV1Marker>
+        + DataProvider<JapaneseExtendedDateSymbolsV1Marker> + DataProvider<CopticDateSymbolsV1Marker>
+        + DataProvider<IndianDateSymbolsV1Marker> + DataProvider<EthiopianDateSymbolsV1Marker>
+        + DataProvider<JapaneseErasV1Marker> + DataProvider<JapaneseExtendedErasV1Marker>,
{

-    /// Create a `IcuDataProvider` object using the ICU's `DeserializingBufferProvider` as the `DataProvider`. Besides
-    /// storing the `DataProvider`, it also obtains and stores the Pattern_Syntax character set, the
-    /// Pattern_White_Space character set, and the Grapheme Cluster Segmenter required for the `Lexer` types to
-    /// function.
-    pub fn try_new( data_provider: DeserializingBufferProvider<'a, P> ) -> Result<Self, IcuError> {
-        let syntax = load_pattern_syntax( &data_provider )?;
-        let white_space = load_pattern_white_space( &data_provider )?;
+    /// Create an `IcuDataProvider` object using the ICU's `DataProvider` as a reference within the
+    /// `DataProviderWrapper` type, which is provided by this crate. Besides storing the `DataProvider`, it also
+    /// obtains and stores the Pattern_Syntax character set, the Pattern_White_Space character set, and the Grapheme
+    /// Cluster Segmenter required for the `Lexer` types to function.
+    pub fn try_new( data_provider: &'a P ) -> Result<Self, IcuError> {
+        let syntax = load_pattern_syntax( data_provider )?;
+        let white_space = load_pattern_white_space( data_provider )?;
        let grapheme_segmenter =
-            GraphemeClusterSegmenter::try_new_unstable( &data_provider )?;
+            GraphemeClusterSegmenter::try_new_unstable( data_provider )?;
        Ok( IcuDataProvider {
-            data_provider,
+            data_provider: DataProviderWrapper( data_provider ),
            pattern_syntax: syntax,
            pattern_white_space: white_space,
            grapheme_segmenter,
        } )
    }

-    /// Get the `DataProvider` object that can be used in any ICU function that accepts a `DataProvider` as a
-    /// parameter.
-    pub fn data_provider( &self ) -> &DeserializingBufferProvider<P> {
+    /// Get the `DataProviderWrapper` object; its field `data_provider().0` can be used in any ICU function that
+    /// accepts a `DataProvider` as a parameter.
+    pub fn data_provider( &self ) -> &DataProviderWrapper<P> {
        &self.data_provider
    }

@@ -64,4 +110,8 @@ where
    pub fn grapheme_segmenter( &self ) -> &GraphemeClusterSegmenter {
        &self.grapheme_segmenter
    }
-}
+}
+
+/// A simple tuple struct that holds a reference to an ICU4X `DataProvider` implementation. This tuple struct allows
+/// a `DataProvider` reference to be stored within other structs.
+pub struct DataProviderWrapper<'a, P: ?Sized>( pub &'a P );
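The new `where` clauses above repeat the same long list of `DataProvider<…>` bounds on both the struct and its `impl`, and `DataProviderWrapper` exists so that a borrowed provider can be stored in a struct. The std-only sketch below illustrates both ideas under loudly stated assumptions: `SimpleProvider<M>` is a hypothetical, much simplified stand-in for ICU4X's `DataProvider<M>` (the real trait's `load` takes a `DataRequest` and returns a `DataResponse`), the marker, helper, and provider types are invented, and the supertrait-plus-blanket-impl "trait alias" (`LexerProvider`) is one possible way to write a long bound list once. It is not something this commit does.

```rust
/// Hypothetical, much simplified stand-in for ICU4X's `DataProvider<M>`:
/// a provider implements it once per marker type `M`.
trait SimpleProvider<M> {
    fn load(&self) -> Vec<char>;
}

/// Hypothetical marker types (compare `PatternSyntaxV1Marker`, etc.).
struct SyntaxMarker;
struct WhiteSpaceMarker;

/// Supertrait plus blanket impl: the bound list is written once here, and
/// every `P: LexerProvider` bound implies all of it.
trait LexerProvider: SimpleProvider<SyntaxMarker> + SimpleProvider<WhiteSpaceMarker> {}

impl<T: ?Sized> LexerProvider for T where
    T: SimpleProvider<SyntaxMarker> + SimpleProvider<WhiteSpaceMarker>
{
}

/// Same shape as `DataProviderWrapper`: a tuple struct holding a borrowed provider.
struct Wrapper<'a, P: ?Sized>(pub &'a P);

/// Hypothetical helper in the spirit of `IcuDataProvider`.
struct Helper<'a, P: ?Sized + LexerProvider> {
    data_provider: Wrapper<'a, P>,
    pattern_syntax: Vec<char>,
}

impl<'a, P: ?Sized + LexerProvider> Helper<'a, P> {
    /// Borrows the provider, like `try_new( data_provider: &'a P )`.
    fn new(provider: &'a P) -> Self {
        // Fully qualified syntax selects the `SyntaxMarker` impl.
        let pattern_syntax = <P as SimpleProvider<SyntaxMarker>>::load(provider);
        Helper { data_provider: Wrapper(provider), pattern_syntax }
    }

    fn data_provider(&self) -> &Wrapper<'a, P> {
        &self.data_provider
    }
}

/// Hypothetical in-memory provider implementing both markers.
struct InMemory {
    syntax: Vec<char>,
    white_space: Vec<char>,
}

impl SimpleProvider<SyntaxMarker> for InMemory {
    fn load(&self) -> Vec<char> {
        self.syntax.clone()
    }
}

impl SimpleProvider<WhiteSpaceMarker> for InMemory {
    fn load(&self) -> Vec<char> {
        self.white_space.clone()
    }
}
```

Because `InMemory` implements `SimpleProvider` for two markers, a plain `provider.load()` would be ambiguous, hence the fully qualified calls. The wrapper's `.0` field hands back the borrowed provider, mirroring `data_provider().0`, and since the constructor only borrows, the caller can build several helpers from the same `&provider` without moving it.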
6 changes: 4 additions & 2 deletions crates/icu/src/lib.rs
@@ -3,11 +3,13 @@

//! ICU4X data provider helper.
//!
-//! The `IcuDataProvider` type contains a member `data_provider` holding the `DataProvider`, which is a deserialised
-//! `BufferProvider`.
+//! The `IcuDataProvider` type contains a member `data_provider` holding the `&DataProvider` as a `DataProviderWrapper`
+//! type.
//!
//! The `IcuDataProvider` type also contains non-locale based data used within the `i18n_lexer` crate.
//!
+//! `IcuDataProvider` type is used within the `Rc` type as `Rc<IcuDataProvider>` to prevent unnecessary duplication.
+//!
//! # Examples
//!
//! See various examples of `i18n_lexer`, `i18n_pattern`, and `i18n_message` crates.
25 changes: 25 additions & 0 deletions crates/lexer/Cargo.toml
@@ -44,6 +44,31 @@ version = "0.8.0"
# Needed for BufferProvider
features = [ "serde" ]

+[dependencies.icu_plurals]
+version = "1.1.0"
+# Needed for BufferProvider
+features = [ "serde" ]
+
+[dependencies.fixed_decimal]
+version = "0.5.2"
+# Needed for BufferProvider
+features = [ "ryu" ]
+
+[dependencies.icu_decimal]
+version = "1.1.0"
+# Needed for BufferProvider
+features = [ "serde" ]
+
+[dependencies.icu_calendar]
+version = "1.1.0"
+# Needed for BufferProvider
+features = [ "serde" ]
+
+[dependencies.icu_datetime]
+version = "1.1.0"
+# Needed for BufferProvider
+features = [ "serde" ]
+
[dev-dependencies]

[dev-dependencies.icu_testdata]
32 changes: 30 additions & 2 deletions crates/lexer/README.asciidoc
@@ -7,7 +7,9 @@ Rizzen Yazston

String lexer and resultant tokens.

-The `Lexer` is initialised using a data provider {BufferProvider}[`BufferProvider`] to an {Unicode_Consortium}[Unicode Consortium] {CLDR}[CLDR] data repository, usually it is just a local copy of the CLDR in the application's data directory. Once the `Lexer` has been initialised it may be used to tokenise strings, without needing to re-initialising the `Lexer` before use. Consult the {ICU4X}[ICU4X] website for instructions on generating a suitable data repository for the application, by leaving out data that is not used by the application.
+The `Lexer` is initialised using any data provider implementing the [`DataProvider`] trait to a [Unicode Consortium] [CLDR] data repository (even a custom database). Usually the repository is just a local copy of the CLDR in the application's data directory. Once the `Lexer` has been initialised it may be used to tokenise strings, without needing to re-initialise the `Lexer` before use.
+
+Consult the [ICU4X] website for instructions on generating a suitable data repository for the application, by leaving out data that is not used by the application.

Strings are tokenised using the method `tokenise()` taking string slice and a vector containing grammar syntax characters.

@@ -23,6 +25,7 @@ version = "1.1.0"
# Needed for BufferProvider
features = [ "serde", "deserialize_bincode_1" ]

+
[dependencies.icu_properties]
version = "1.1.0"
# Needed for BufferProvider
@@ -32,6 +35,31 @@ features = [ "serde" ]
version = "0.8.0"
# Needed for BufferProvider
features = [ "serde" ]
+
+[dependencies.icu_plurals]
+version = "1.1.0"
+# Needed for BufferProvider
+features = [ "serde" ]
+
+[dependencies.fixed_decimal]
+version = "0.5.2"
+# Needed for BufferProvider
+features = [ "ryu" ]
+
+[dependencies.icu_decimal]
+version = "1.1.0"
+# Needed for BufferProvider
+features = [ "serde" ]
+
+[dependencies.icu_calendar]
+version = "1.1.0"
+# Needed for BufferProvider
+features = [ "serde" ]
+
+[dependencies.icu_datetime]
+version = "1.1.0"
+# Needed for BufferProvider
+features = [ "serde" ]
```

== Examples
@@ -47,7 +75,7 @@ use std::error::Error;
fn tokenise() -> Result<(), Box<dyn Error>> {
let buffer_provider = buffer();
let data_provider = buffer_provider.as_deserializing();
-let icu_data_provider = IcuDataProvider::try_new( data_provider )?;
+let icu_data_provider = IcuDataProvider::try_new( &data_provider )?;
let mut lexer = Lexer::try_new( &Rc::new( icu_data_provider ) )?;
let tokens = lexer.tokenise(
"String contains a {placeholder}.", &vec![ '{', '}' ]
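The lexer README describes `tokenise()` as taking a string slice and a vector of grammar syntax characters, as in the example call with `&vec![ '{', '}' ]`. As a rough illustration of that input and output shape only, the sketch below is a hypothetical free function, not the `i18n_lexer` implementation: it works on `char`s and uses `char::is_whitespace`, whereas the crate obtains the Pattern_White_Space and Pattern_Syntax sets and a grapheme cluster segmenter from `IcuDataProvider`. The `Token` enum is likewise invented for this sketch.

```rust
/// Hypothetical token kinds for this sketch only.
#[derive(Debug, PartialEq)]
enum Token {
    Grammar(char),      // one of the caller-supplied grammar characters
    WhiteSpace(String), // a run of whitespace (per `char::is_whitespace`)
    Text(String),       // a run of anything else
}

/// Split `text` into runs of text and whitespace, emitting each grammar
/// character as its own token. Works on `char`s, not grapheme clusters.
fn tokenise(text: &str, grammar: &[char]) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut in_space = false; // kind of the run held in `current`

    for c in text.chars() {
        if grammar.contains(&c) {
            // A grammar character ends the current run and stands alone.
            flush(&mut tokens, &mut current, in_space);
            tokens.push(Token::Grammar(c));
        } else {
            let is_space = c.is_whitespace();
            if is_space != in_space {
                // Switching between text and whitespace ends the run.
                flush(&mut tokens, &mut current, in_space);
                in_space = is_space;
            }
            current.push(c);
        }
    }
    flush(&mut tokens, &mut current, in_space);
    tokens
}

/// Move a non-empty run out of `current` into `tokens`.
fn flush(tokens: &mut Vec<Token>, current: &mut String, in_space: bool) {
    if current.is_empty() {
        return;
    }
    let run = std::mem::take(current);
    tokens.push(if in_space { Token::WhiteSpace(run) } else { Token::Text(run) });
}
```

With `[ '{', '}' ]` as the grammar, `"String contains a {placeholder}."` yields ten tokens, ending with `Grammar('{')`, `Text("placeholder")`, `Grammar('}')` and `Text(".")`. The crate's lexer, which uses the Pattern_Syntax set, may classify a character such as `.` differently.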