There are eight key steps to localize Pehla Class. We work with local linguistic experts at each stage:
- Analyse target language and culture. Examine the target language and build a clear description of it, including its linguistic structure, alphabet, phonetic structure, order and frequency of graphemes, a list of high-frequency words, and a set of common first and last names. Carry out a comprehensive analysis of the target area so that the course can be culturally adapted to that specific geographical area.
- Cross check target language. Check the description of the new language against our existing learning activities. Develop new components or modes, if beneficial to the child.
- Story selection. Select appropriate stories from our extensive book library, and write or curate further stories as required.
- Translation. Translate the full database of words, instruction scripts for each activity, chosen stories, and numeracy learning units.
- Writing. Write any additional material the target language requires, such as simple phrases and sentences.
- Images. Create any new images required for learning units and stories.
- Recording. Record all required audio material, using several voice artists.
- Final build. With all assets prepared, build Pehla Class in the new target language.
Each localized version of Pehla Class requires a language pack for a locale. A locale is a pairing of a language and an optional region (e.g. Tanzanian Swahili, British English). A locale is represented by an abbreviated code, so Swahili is sw and British English is en_GB. A language pack consists of eight elements:
- The alphabet
- Phonemes, syllables and words
- Audio and images
- Stories
- Component audio
- Learning journey
- Fonts
- Video subtitles
The file assets/oc-literacy-gen/local/LOCALE/letters.xml defines the individual letters that constitute the language’s alphabet.
Each letter has a unique id and an optional set of tags. Possible values for the tags attribute vary across language families; vowel is used in both English and Swahili. The letters a, b and d are represented like this:
<letters>
<letter id="a" tags="vowel"></letter>
<letter id="b"></letter>
<letter id="d"></letter>
</letters>
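To show how this file might be consumed, here is a minimal Python sketch that reads the letters with the standard library's xml.etree.ElementTree and picks out the vowels. The function name load_letters is hypothetical. The comma separator for multiple tags is also an assumption, because the excerpt only shows a single tag value.

```python
import xml.etree.ElementTree as ET

# The letters.xml excerpt from above; in practice you would read
# assets/oc-literacy-gen/local/LOCALE/letters.xml instead.
LETTERS_XML = """<letters>
<letter id="a" tags="vowel"></letter>
<letter id="b"></letter>
<letter id="d"></letter>
</letters>"""


def load_letters(xml_text):
    """Return {letter_id: set_of_tags} for every <letter> element.

    Assumption: several tags would be comma-separated. The excerpt only
    shows a single tag, so adjust the split if real files differ.
    """
    root = ET.fromstring(xml_text)
    letters = {}
    for letter in root.iter("letter"):
        raw_tags = letter.get("tags", "")
        letters[letter.get("id")] = {t.strip() for t in raw_tags.split(",") if t.strip()}
    return letters


letters = load_letters(LETTERS_XML)
vowels = sorted(lid for lid, tags in letters.items() if "vowel" in tags)
print(vowels)  # ['a']
```

For a real file, ET.parse(path).getroot() gives the same root element as ET.fromstring.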
The file assets/oc-literacy-gen/local/LOCALE/wordcomponents.xml defines the key phonemes, consonant clusters and syllables present in the language. It also contains a curated set of high-frequency and culturally specific words for learning the language.
Each phoneme has a unique id with the prefix is_. In Swahili, the phonemes a, th and ng' are represented like this:
<phonemes>
<phoneme id="is_a">a</phoneme>
<phoneme id="is_th">th</phoneme>
<phoneme id="is_ng_apost_">ng'</phoneme>
</phonemes>
Next come syllables. Each syllable is made from one or more phonemes and has a unique id with the prefix isyl_. The individual phonemes in each syllable are delimited by /. In Swahili, the syllables ju and zwe are represented like this:
<syllables>
<syllable id="isyl_ju">j/u</syllable>
<syllable id="isyl_zwe">zw/e</syllable>
</syllables>
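Syllables are built from phonemes, so a useful consistency check is to confirm that every /-delimited part of a syllable is itself a defined phoneme. The sketch below does this with Python's xml.etree.ElementTree. The <wordcomponents> root element and the function name undefined_syllable_parts are assumptions, because the excerpts above show only the individual sections. When run against just these excerpts, the check flags j, u, zw and e, since only three phonemes are listed here. A complete file would normally define them.

```python
import xml.etree.ElementTree as ET

# Excerpts from above, wrapped in a hypothetical <wordcomponents> root so
# they parse as one document; the real file's root element is not shown.
SAMPLE = """<wordcomponents>
<phonemes>
<phoneme id="is_a">a</phoneme>
<phoneme id="is_th">th</phoneme>
<phoneme id="is_ng_apost_">ng'</phoneme>
</phonemes>
<syllables>
<syllable id="isyl_ju">j/u</syllable>
<syllable id="isyl_zwe">zw/e</syllable>
</syllables>
</wordcomponents>"""


def undefined_syllable_parts(xml_text):
    """Map each syllable id to its '/'-delimited parts that are not defined phonemes."""
    root = ET.fromstring(xml_text)
    phonemes = {p.text.strip() for p in root.iter("phoneme")}
    problems = {}
    for syllable in root.iter("syllable"):
        parts = [part.strip() for part in syllable.text.split("/")]
        missing = [part for part in parts if part not in phonemes]
        if missing:
            problems[syllable.get("id")] = missing
    return problems


print(undefined_syllable_parts(SAMPLE))
```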
Finally, the set of words. Each word is made from one or more phonemes or syllables and has a unique id with the prefix fc_. The phonemes or syllables in each word are delimited by /. Word ids use the English meaning of the word, so paka (cat) has the id fc_cat. In Swahili, the words paka and rafiki are represented like this:
<words>
<word id="fc_cat">pa/ka</word>
<word id="fc_friend">ra/fi/ki</word>
</words>
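Reassembling a word from its parts is a simple way to confirm that the / markers have not altered its spelling. The sketch below uses xml.etree.ElementTree to return each word's spelling and parts, and it rejects ids that lack the fc_ prefix. The name load_words is hypothetical.

```python
import xml.etree.ElementTree as ET

WORDS_XML = """<words>
<word id="fc_cat">pa/ka</word>
<word id="fc_friend">ra/fi/ki</word>
</words>"""


def load_words(xml_text, prefix="fc_"):
    """Return {word_id: (spelling, parts)}, where spelling joins the '/'-delimited parts."""
    words = {}
    for word in ET.fromstring(xml_text).iter("word"):
        word_id = word.get("id")
        if not word_id.startswith(prefix):
            raise ValueError(f"word id {word_id!r} lacks the {prefix!r} prefix")
        parts = [part.strip() for part in word.text.split("/")]
        words[word_id] = ("".join(parts), parts)
    return words


for word_id, (spelling, parts) in load_words(WORDS_XML).items():
    print(word_id, spelling, parts)
# fc_cat paka ['pa', 'ka']
# fc_friend rafiki ['ra', 'fi', 'ki']
```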
Each letter, phoneme, syllable and word is recorded by a native speaker, and an AAC-compressed version is stored in an .m4a file named by its English id. For example, in Swahili, the words nywele, jani and kima exist as the recorded audio files fc_hair.m4a, fc_leaf.m4a and fc_monkey.m4a. These reside in the assets/oc-literacy-gen/local/LOCALE/ directory.
Where a word is broken down into syllables or phonemes, the recording has an accompanying .etpa file that specifies the start and end time of each phoneme or syllable in the audio file. For example, the English word kick, recorded in the file fc_let_kick.m4a, has the following phoneme breakdown in fc_let_kick.etpa:
<timings text="k - i - ck">
<timing id="0" start="0.000" end="0.207" startframe="0" framelength="9108" text="k"/>
<timing id="1" start="0.609" end="0.924" startframe="26844" framelength="13915" text="i"/>
<timing id="2" start="1.324" end="1.486" startframe="58401" framelength="7121" text="ck"/>
</timings>
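Here is a short sketch of reading an .etpa file. It relies only on the attributes shown above. It lists each segment's start and end time in seconds and computes the pauses between consecutive segments, which may help when checking that a slow, segmented recording has been cut sensibly. The function name load_timings is hypothetical, and the startframe and framelength attributes are ignored.

```python
import xml.etree.ElementTree as ET

ETPA = """<timings text="k - i - ck">
<timing id="0" start="0.000" end="0.207" startframe="0" framelength="9108" text="k"/>
<timing id="1" start="0.609" end="0.924" startframe="26844" framelength="13915" text="i"/>
<timing id="2" start="1.324" end="1.486" startframe="58401" framelength="7121" text="ck"/>
</timings>"""


def load_timings(xml_text):
    """Return (text, start_s, end_s) for each <timing>, ordered by id."""
    root = ET.fromstring(xml_text)
    rows = sorted(root.iter("timing"), key=lambda t: int(t.get("id")))
    return [(t.get("text"), float(t.get("start")), float(t.get("end"))) for t in rows]


timings = load_timings(ETPA)
for text, start, end in timings:
    print(f"{text}: {start:.3f}s to {end:.3f}s ({end - start:.3f}s)")

# Silence between the end of one segment and the start of the next.
gaps = [nxt[1] - cur[2] for cur, nxt in zip(timings, timings[1:])]
print([f"{g:.3f}" for g in gaps])
```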
Each word may also have an optional .png image. In Swahili, the words mbu, kiazi and shule have the image files fc_mosquito.png, fc_potato.png and fc_school.png.
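Recordings are required but images are optional, so a pre-build audit can treat the two differently. The sketch below uses pathlib to list word ids that have no .m4a file and ids that have no .png. The name audit_assets is hypothetical. It also assumes images sit in the same directory as the audio unless told otherwise, because the image location is not stated here. The demo runs against a temporary directory containing a made-up set of files.

```python
import tempfile
from pathlib import Path


def audit_assets(audio_dir, word_ids, image_dir=None):
    """Return (ids missing an .m4a recording, ids without a .png image).

    Recordings are required; images are optional, so the second list is
    informational. image_dir defaults to audio_dir, which is an assumption:
    the image location is not stated in the text.
    """
    audio_dir = Path(audio_dir)
    image_dir = Path(image_dir) if image_dir is not None else audio_dir
    missing_audio = [w for w in word_ids if not (audio_dir / f"{w}.m4a").is_file()]
    without_image = [w for w in word_ids if not (image_dir / f"{w}.png").is_file()]
    return missing_audio, without_image


# Demo against a throwaway directory standing in for
# assets/oc-literacy-gen/local/LOCALE/ (the file set is made up).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("fc_hair.m4a", "fc_leaf.m4a", "fc_leaf.png"):
        (Path(tmp) / name).touch()
    missing_audio, without_image = audit_assets(tmp, ["fc_hair", "fc_leaf", "fc_monkey"])

print(missing_audio)  # ['fc_monkey']
print(without_image)  # ['fc_hair', 'fc_monkey']
```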
Localised stories each have an id and reside in assets/oc-reading/books/xr-[id]/. Each story has a configuration file, book.xml.
Presentational aspects of the story are defined on the book element. Each page contains one or more localised para elements. A page can have an optional picjustify attribute to specify the page layout.
In lower-level stories, each word is broken down into syllables delimited by /. The title and first page of the story A very tall man in Swahili are represented like this:
<book id="xr-averytallmanSW" indent="N" lineheight="1.5" paraheight="1.33" letterspacing="1" fontsize="50" noparas="true">
<page pageno="0">
<para>M/tu m/re/fu sa/na</para>
</page>
<page pageno="1">
<para>Mwa/nga/li/e m/tu hu/yu. Je/mbe la/ke ni fu/pi m/no.</para>
</page>
…
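The syllable markers live inside the para text, so you get the plain reading text by removing the / delimiters. The sketch below uses Python's xml.etree.ElementTree to extract both forms for each page. The sample is the excerpt above, trimmed to two pages and closed so that it parses. read_pages is a hypothetical helper name. Punctuation stays attached to the last syllable of a word (hu, yu.).

```python
import xml.etree.ElementTree as ET

# The excerpt above, trimmed to its first two pages and closed so it parses.
BOOK_XML = """<book id="xr-averytallmanSW" indent="N" lineheight="1.5" paraheight="1.33" letterspacing="1" fontsize="50" noparas="true">
<page pageno="0"><para>M/tu m/re/fu sa/na</para></page>
<page pageno="1"><para>Mwa/nga/li/e m/tu hu/yu. Je/mbe la/ke ni fu/pi m/no.</para></page>
</book>"""


def read_pages(xml_text):
    """Return {pageno: [(plain_text, syllables_per_word), ...]} for each page."""
    pages = {}
    for page in ET.fromstring(xml_text).iter("page"):
        paras = []
        for para in page.iter("para"):
            marked = para.text.strip()
            plain = marked.replace("/", "")  # drop syllable markers
            syllables = [word.split("/") for word in marked.split()]
            paras.append((plain, syllables))
        pages[int(page.get("pageno"))] = paras
    return pages


for pageno, paras in read_pages(BOOK_XML).items():
    for plain, _ in paras:
        print(pageno, plain)
```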
Every para has a corresponding .m4a recorded audio file and an accompanying .etpa file, which specifies the start and end time of each word in the audio file. For example, the breakdown of the first page of the story above is in p1_1.etpa:
<timings text="Mwangalie mtu huyu. Jembe lake ni fupi mno.">
<timing id="0" start="0.000" end="0.513" startframe="0" framelength="22630" text="Mwangalie"/>
<timing id="1" start="0.513" end="0.872" startframe="22630" framelength="15820" text="mtu"/>
<timing id="2" start="0.872" end="1.235" startframe="38450" framelength="16018" text="huyu"/>
…
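The word timings should line up with the words in the text attribute. The sketch below shows a minimal check. It assumes that words are separated by whitespace and that punctuation is not timed, as with huyu above. check_word_timings is a hypothetical name. The sample keeps only the three <timing> rows shown above, so the check reports the remaining words as untimed.

```python
import re
import xml.etree.ElementTree as ET

# Only the first three <timing> rows from the excerpt above; the rest are elided there.
PARA_ETPA = """<timings text="Mwangalie mtu huyu. Jembe lake ni fupi mno.">
<timing id="0" start="0.000" end="0.513" startframe="0" framelength="22630" text="Mwangalie"/>
<timing id="1" start="0.513" end="0.872" startframe="22630" framelength="15820" text="mtu"/>
<timing id="2" start="0.872" end="1.235" startframe="38450" framelength="16018" text="huyu"/>
</timings>"""


def check_word_timings(xml_text):
    """Compare the timed words with the words of the text attribute.

    Punctuation is dropped before comparing; apostrophes are kept so that
    words containing ng' stay whole. Returns (mismatches, untimed_words).
    """
    root = ET.fromstring(xml_text)
    expected = re.findall(r"[\w']+", root.get("text"))
    rows = sorted(root.iter("timing"), key=lambda t: int(t.get("id")))
    timed = [t.get("text") for t in rows]
    mismatches = [(i, e, t) for i, (e, t) in enumerate(zip(expected, timed)) if e != t]
    return mismatches, expected[len(timed):]


print(check_word_timings(PARA_ETPA))
```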
Each word within a para is also recorded individually, along with a version split into syllables for lower-level stories.
Use PhraseAnal to assist with generating .etpa files from audio files.
Component localizations consist of a set of recorded .m4a audio files. The file names generally correspond to the English localization. For example, in the numeracy component Add and subtract, assets/oc-addsubtract/local/sw/q_sevenbees.m4a is the Swahili translation of the phrase "Seven bees". In the reading component Making plurals, assets/oc-makingplurals/local/en_GB/mp2_goodtheyreinorder.m4a is the English recording of "Good, they are in order".
We have provided mappings of all English audio to .m4a filenames. These are XML files inside the assets/localization directory.
The child's learning journey is an ordered set of learning units to be worked through. A learning unit is a component with parameters assigned from the underlying language pack, plus any required visual, audio or configuration assets. It is specific to a particular localization. There are three variants (community, play zone and library), each residing in its own assets/masterlists/[community|playzone|library]_LOCALE directory.
The community variant, which holds the learning journey, is defined in the file community_LOCALE/units.xml. An example of the first part of the Pehla Class Swahili learning journey, from community_sw/units.xml, is shown below. The first unit is an introduction to using the tablet; the second is a flashcard reading activity:
<level id="1">
<unit id="0002.OC_SectionIT"
target="OC_SectionIT"
params="eventit"
config="oc-introduction"
lang="sw"
targetDuration="120"
passThreshold="0.5"
icon="icon_0002"/>
<unit id="0003.OC_Sm2"
target="OC_Sm2"
params="sm2/
demo=true/
demotype=a1/
noscenes=8/
words=fc_prize,fc_animals,fc_sugar,fc_donkey,fc_pineapple,fc_glasses,fc_shorts,fc_minibus,fc_address,fc_basin,fc_cooking_pot,fc_sister,fc_mango,fc_hoe,fc_button,fc_drum,fc_pump_for_well,fc_friends,fc_rabbit,fc_soil,fc_children,fc_heron"
config="oc-lettersandsounds"
lang="sw"
targetDuration="120"
passThreshold="0.5"
icon="icon_0003"/>
…
A learning journey consists of a number of levels. Each level is an ordered group of 105 learning units and represents one week of the course: 15 units per day over 7 days.
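As a sanity check on a masterlist, you can count the units in each level against the 105-unit figure above (105 / 15 = 7 days per week). The sketch assumes <level> elements with <unit> children, as in the excerpt. The <masterlist> root in the sample and the helper name levels_with_wrong_size are invented for illustration, and level 2 is deliberately short so that the check has something to report.

```python
import xml.etree.ElementTree as ET

UNITS_PER_LEVEL = 105  # one level = one week
UNITS_PER_DAY = 15


def levels_with_wrong_size(xml_text):
    """Return {level_id: unit_count} for every <level> not holding 105 units."""
    root = ET.fromstring(xml_text)
    counts = {lvl.get("id"): len(lvl.findall("unit")) for lvl in root.iter("level")}
    return {lid: n for lid, n in counts.items() if n != UNITS_PER_LEVEL}


# Hypothetical file: the <masterlist> root is invented, and level 2 is short.
SAMPLE = ('<masterlist><level id="1">' + "<unit/>" * 105 + "</level>"
          '<level id="2"><unit/></level></masterlist>')

print(UNITS_PER_LEVEL // UNITS_PER_DAY)  # 7
print(levels_with_wrong_size(SAMPLE))    # {'2': 1}
```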
A learning unit has the following parameters:
- id: unique identifier.
- target: the component the unit is using.
- params: list of component-specific parameters in key=value form, each delimited by /. The first segment may be a mode name without a value (for example lt or sm2).
- config: configuration directory for audio and video assets used by the component.
- lang: language pack to be used.
- targetDuration: upper bound on the time it should take an average child to complete the unit.
- passThreshold: the ratio of correct:incorrect answers that constitutes the child passing the unit successfully.
- icon: the image shown to the child before beginning the unit. These are png files stored in assets/masterlists/[community|library|playzone]_LOCALE/icons/.
In the following example, the letter tracing component OC_LetterTrace is used. The params indicate:
- lt: single letter mode.
- intro=false: no introductory audio.
- letter=i: use the letter i from the Swahili language pack.
- notraces=4: the letter is to be traced 4 times.
<unit id="0052.OC_LetterTrace"
target="OC_LetterTrace"
params="lt/intro=false/letter=i/notraces=4"
config="oc-lettersandsounds"
lang="sw"
targetDuration="120"
passThreshold="0.5"
icon="icon_0052"
/>
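A few lines of Python can split the params attribute into its mode and key=value options. This is a sketch, not the app's own parser. It assumes the mode (such as lt or sm2) is the first segment without an =. It strips whitespace around segments because an XML parser normalizes line breaks inside attribute values, as in the OC_Sm2 unit above, to spaces. parse_params is a hypothetical name, and values are left as strings.

```python
def parse_params(params):
    """Split a unit's params attribute into (mode, {key: value}).

    Assumes the mode is the first segment without '=' (e.g. lt, sm2).
    Whitespace around segments is stripped because XML attribute-value
    normalization turns line breaks into spaces.
    """
    mode = None
    options = {}
    for segment in params.split("/"):
        segment = segment.strip()
        if not segment:
            continue
        if "=" in segment:
            key, _, value = segment.partition("=")
            options[key.strip()] = value.strip()
        elif mode is None:
            mode = segment
        else:
            raise ValueError(f"unexpected segment without '=': {segment!r}")
    return mode, options


mode, options = parse_params("lt/intro=false/letter=i/notraces=4")
print(mode, options)  # lt {'intro': 'false', 'letter': 'i', 'notraces': '4'}
```

For list-valued options such as words, split the value on commas: options["words"].split(",").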
The play zone variant is defined in the file playzone_LOCALE/units.xml. It specifies which units appear in the play zone for each level (week).
The library variant is defined in the file library_LOCALE/units.xml. It specifies all of the stories in the story library for the locale. Here, level represents the relative complexity of a set of stories.
Pehla Class uses two fonts by default: onebillionreader-Regular.otf and onebillionwriter-Regular.otf. These can be replaced by identically named alternative fonts in app/src/main/fonts/. Note that Pehla Class does not currently support right-to-left scripts.
For video clips in the Pehla Class play zone, optional subtitles can be added. These are standard .srt text files placed in the assets/oc-video/local/LOCALE/ directory. Each subtitle entry within a file consists of four parts:
- A numeric counter identifying each sequential subtitle.
- The time at which the subtitle should appear on the screen, followed by --> and the time it should disappear.
- The subtitle itself, on one or more lines.
- A blank line indicating the end of this subtitle.
For example, here is the Swahili subtitle for the video Origami Elephant, in assets/oc-video/local/sw/origami_elephant.srt:
1
00:00:00,000 --> 00:09:08,990
Tutengeneze tembo wa karatasi!
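The following minimal SRT reader in Python may be useful for checking subtitle timings before adding a file. parse_srt and srt_seconds are hypothetical helper names. The parser assumes the four-part entry layout described above and HH:MM:SS,mmm timestamps, and it tolerates a leading byte-order mark.

```python
import re

SRT_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def srt_seconds(stamp):
    """Convert an SRT timestamp such as 00:09:08,990 to seconds."""
    match = SRT_TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Return (counter, start_s, end_s, subtitle) tuples, one per entry.

    Entries are separated by blank lines; the subtitle may span several lines.
    """
    entries = []
    for block in re.split(r"\n\s*\n", text.lstrip("\ufeff").strip()):
        lines = block.splitlines()
        start, end = (srt_seconds(t) for t in lines[1].split("-->"))
        entries.append((int(lines[0]), start, end, "\n".join(lines[2:])))
    return entries


SAMPLE = """1
00:00:00,000 --> 00:09:08,990
Tutengeneze tembo wa karatasi!
"""

for counter, start, end, subtitle in parse_srt(SAMPLE):
    print(counter, start, end, subtitle)
```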
Apply the following configuration changes to files in the Pehla Class source code directory:
- Create a settings .plist file for the new locale by copying the English one:
cp app/src/main/config/settings_community_enIN.plist app/src/main/config/settings_community_LOCALE.plist
- Specify the fallback language for assets shared between locales, giving the LOCALE code of the assets to be used:
<key>localisation_fallback_language_id</key> <string>LOCALE</string>
- Edit the new settings file, replacing the values of the following keys:
<key>app_masterlist</key> <string>community_LOCALE</string>
<key>app_masterlist_playzone</key> <string>playzone_LOCALE</string>
<key>app_masterlist_library</key> <string>library_LOCALE</string>
- Append the following configuration to build.gradle:
LOCALE {
    applicationId 'com.maq.pehlaclass.LOCALE'
    versionCode 1
    versionName '1.0'
    resValue "string", "app_name", "Pehla Class"
    resValue "string", "test_only", "false"
    buildConfigField 'String', 'SETTINGS_FILE', '"settings_community_LOCALE.plist"'
    manifestPlaceholders = [ appIcon: "@mipmap/icon_child" ]
}
- Update the TTS locale (ttsLang) in the OBTextToSpeech.java file.
- Add locale-specific app icons in the src folder.
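The settings-file steps can also be scripted. The sketch below uses Python's standard plistlib module and assumes the keys sit in the plist's top-level dictionary, as the snippets suggest. make_locale_settings and write_locale_settings are hypothetical helpers, and write_locale_settings expects to be run from the source root. Because plistlib rewrites the file, comments and formatting in the original plist are not preserved; sort_keys=False keeps the original key order.

```python
import plistlib
from pathlib import Path

CONFIG_DIR = Path("app/src/main/config")
TEMPLATE = CONFIG_DIR / "settings_community_enIN.plist"


def make_locale_settings(template_bytes, locale, fallback_locale):
    """Return plist bytes for a new locale, starting from the English settings."""
    settings = plistlib.loads(template_bytes)  # assumes a top-level <dict>
    settings["app_masterlist"] = f"community_{locale}"
    settings["app_masterlist_playzone"] = f"playzone_{locale}"
    settings["app_masterlist_library"] = f"library_{locale}"
    settings["localisation_fallback_language_id"] = fallback_locale
    return plistlib.dumps(settings, sort_keys=False)


def write_locale_settings(locale, fallback_locale):
    """Write settings_community_<locale>.plist next to the English template."""
    target = CONFIG_DIR / f"settings_community_{locale}.plist"
    target.write_bytes(make_locale_settings(TEMPLATE.read_bytes(), locale, fallback_locale))
    return target
```

For example, write_locale_settings("sw", "en_GB") would create settings_community_sw.plist with en_GB as the fallback. The build.gradle, TTS and icon steps still need to be done by hand.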