Texts with OpenITI mARkdown tags (https://maximromanov.github.io/mARkdown/) can now be converted without affecting the tags. All conversion functions now have an optional `mARkdown` argument (default: `False`); if it is set to `True`, the tags are temporarily replaced by placeholders and put back into the text after conversion.
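The placeholder mechanism can be illustrated with a minimal sketch. This is not the project's code: the helper name `convert_preserving_tags` is hypothetical, and `TAG_PATTERN` covers only two illustrative kinds of mARkdown tags (page markers such as `PageV01P005` and milestones such as `ms012`), whereas the real list of tags is longer. Characters from the Unicode Private Use Area serve as placeholders, on the assumption that the converter leaves them untouched.

```python
import re
from typing import Callable

# Illustrative tag pattern (an assumption, not the project's own list):
# page markers such as "PageV01P005" and milestones such as "ms012".
TAG_PATTERN = re.compile(r"PageV\d+P\d+|ms\d+")

PUA_START = 0xE000  # Private Use Area: characters a converter should not touch


def convert_preserving_tags(text: str, convert: Callable[[str], str],
                            pattern: re.Pattern = TAG_PATTERN) -> str:
    """Hide tags behind placeholders, convert the text, then restore the tags."""
    tags: list[str] = []

    def to_placeholder(match: re.Match) -> str:
        tags.append(match.group(0))
        return chr(PUA_START + len(tags) - 1)  # one placeholder per tag

    protected = pattern.sub(to_placeholder, text)
    converted = convert(protected)
    # Put each original tag back where its placeholder ended up.
    for i, tag in enumerate(tags):
        converted = converted.replace(chr(PUA_START + i), tag)
    return converted


print(convert_preserving_tags("PageV01P005 qad", str.upper))  # PageV01P005 QAD
```

Any function that takes and returns a string can be passed as `convert`; `str.upper` is used here only because its effect is easy to see.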
A paleography mode was added, in which sukūns, vowels and other diacritics are not automatically added in transcription and Arabic script, and users can add sukūns manually. The `betacodeToArabic` and `arabicToBetaCode` functions now have an optional argument `paleo` that, if set to `True`, disables the automatic creation of sukūns and vowels, apart from those that are explicitly transcribed. In `paleo` mode, use a simple `a` or `i` at the beginning of a word to transcribe a bare alif without vowels, hamza, madda or waṣla. The default setting for `paleo` is `False`, so by default the code is executed exactly as before.
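The difference between normal and paleo mode can be sketched with a toy converter. This is not how the project implements it: `toy_to_arabic` is a hypothetical name, and it covers only ten consonants and the three short vowels, ignoring long vowels, shadda, hamza and everything else. In normal mode every consonant not followed by a vowel receives a sukūn; in paleo mode a sukūn appears only where `!o` is written explicitly.

```python
# Toy mapping for a handful of consonants (illustrative subset only).
CONSONANTS = {
    "b": "\u0628", "t": "\u062a", "d": "\u062f", "r": "\u0631", "s": "\u0633",
    "q": "\u0642", "k": "\u0643", "l": "\u0644", "m": "\u0645", "n": "\u0646",
}
VOWELS = {"a": "\u064e", "u": "\u064f", "i": "\u0650"}  # fatḥa, ḍamma, kasra
SUKUN = "\u0652"


def toy_to_arabic(word: str, paleo: bool = False) -> str:
    """Convert a toy betaCode word (consonants + short vowels) into Arabic script.

    Normal mode: a consonant without a following vowel gets a sukūn.
    Paleo mode: a sukūn is written only where "!o" appears explicitly.
    """
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch not in CONSONANTS:
            raise ValueError(f"not covered by this toy mapping: {ch!r}")
        out.append(CONSONANTS[ch])
        i += 1
        if i < len(word) and word[i] in VOWELS:
            out.append(VOWELS[word[i]])
            i += 1
        elif word.startswith("!o", i):
            out.append(SUKUN)  # explicit sukūn is kept in both modes
            i += 2
        elif not paleo:
            out.append(SUKUN)  # automatic sukūn, normal mode only
    return "".join(out)


print(toy_to_arabic("qad") == "\u0642\u064e\u062f\u0652")                # True
print(toy_to_arabic("qad", paleo=True) == "\u0642\u064e\u062f")          # True
print(toy_to_arabic("qad!o", paleo=True) == "\u0642\u064e\u062f\u0652")  # True
```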
Support for Persian was also added. Persian letters can be transcribed in normal mode. In addition, the `betacodeToArabic` and `betaCodeToArSimple` functions now have an optional argument `persian` (default: `False`); if it is set to `True`, the Persian variants of kāf and yāʾ (which look slightly different and have different Unicode code points) are used in the output instead of the Arabic ones. More functionality may be added later.
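The two pairs of code points involved are Arabic kāf U+0643 vs. Persian kāf (keheh) U+06A9, and Arabic yāʾ U+064A vs. Farsi yeh U+06CC. The sketch below swaps them as a post-processing step on Arabic-script text; it is only an illustration (the helper `to_persian_variants` is hypothetical), not how the `persian` argument is implemented.

```python
import unicodedata

# Arabic code point -> Persian variant
PERSIAN_VARIANTS = str.maketrans({
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
})


def to_persian_variants(arabic_text: str) -> str:
    """Replace Arabic kāf and yāʾ with their Persian counterparts."""
    return arabic_text.translate(PERSIAN_VARIANTS)


for ch in to_persian_variants("\u0643\u064a"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# prints:
# U+06A9 ARABIC LETTER KEHEH
# U+06CC ARABIC LETTER FARSI YEH
```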
The following new characters were also added:
betacode | translit | Arabic letter |
---|---|---|
?b | ɓ * | undotted bāʾ/tāʾ/thāʾ and non-final yāʾ/nūn |
?n | ɲ * | undotted final nūn |
?f | ƒ * | undotted fāʾ |
?q | ɋ * | undotted qāf |
!o | ° | explicit sukūn (paleo mode) |
p | p | Persian letter pe |
^c | č | Persian letter che |
^z | ž | Persian letter zhe |
k | k | Persian letter kāf ** |
g | g | Persian letter gāf |
* No standard symbols appear to exist for transcribing undotted letters.
** Persian and Arabic kāf are transcribed in the same way; the `persian` argument of the `betacodeToArabic` function determines whether the Persian or the Arabic variant is used in the output.
Some betaCode combinations were changed to avoid issues with Alpheios translation alignment, which automatically splits supplied texts into words. Essentially, combinations with "." and ":" are replaced with "*" and "=" respectively:
- =t is tāʾ marbūṭaŧ (formerly :t)
- *s is ṣād (formerly .s; the same applies to the other letters transliterated with dots)
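For betaCode written in the older convention, the replacement can be sketched with two regular expressions. This is a hypothetical helper (`update_old_betacode`), not part of the project; it assumes that the old forms were a dot before one of the dotted letters (h, s, d, t, z, g, k, in either case) and `:t` for tāʾ marbūṭaŧ. It should only be applied to betaCode spans, since English abbreviations such as "e.g." would also be rewritten.

```python
import re

# Assumed old forms: "." before a dotted letter, and ":t" for tāʾ marbūṭaŧ.
OLD_DOT = re.compile(r"\.([hsdtzgkHSDTZGK])")
OLD_COLON = re.compile(r":t")


def update_old_betacode(text: str) -> str:
    """Rewrite ".s"-style and ":t" combinations into "*s"-style and "=t"."""
    text = OLD_DOT.sub(r"*\1", text)  # ".k" -> "*k", ".S" -> "*S", ...
    return OLD_COLON.sub("=t", text)  # ":t" -> "=t"


print(update_old_betacode("Dima^s.k"))           # Dima^s*k
print(update_old_betacode("A.hmad b. `Iy_as"))   # A*hmad b. `Iy_as
```

A dot followed by a space (as in "b." or "Lat.") is left alone, because the pattern requires one of the dotted letters immediately after the dot.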
Although both Windows and Mac OS now support Arabic, it is still quite difficult to type and edit Arabic texts. It is particularly frustrating to edit and manipulate fully vocalized texts, since most fonts either render “short vowels” (ḥarakāt) invisible, or do not render them properly. Because of the “stacking,” i.e. “short vowels” being placed on top of letters and on top of each other, it becomes impossible to edit texts and one is often forced to go into delete-and-retype mode (and there is still no guarantee, because of visual issues, that all the letters and “short vowels” will actually be in the right order). betaCode can make it easy to type fully-vocalized Arabic texts on any machine through the use of simple character combinations and automatic rendering into various transliteration schemes and the Arabic script (scroll below for examples).
betaCode is first converted into a one-to-one transliteration scheme, which combines conventions from various academic transliteration schemes. Such a scheme is necessary, since none of the existing academic schemes (American/Library of Congress, British, French, German, etc.) allows Arabic text to be represented unambiguously for computational purposes. Arabic betaCode transliteration can then be converted into any transliteration convention. At the moment the following schemes are implemented:
- Library of Congress Romanization of Arabic
- Simplified transliteration (LOC without diacritics)
- Arabic script (the rules of ḥamzaŧ orthography are implemented, but may require some additional testing)
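One way to derive the simplified form from the Library of Congress one is to decompose each character with Unicode NFD, drop the combining marks (macrons, dots below), and remove the ʿayn and hamza signs. The sketch below works under that assumption (`simplify` is a hypothetical name); the project may derive its simplified output differently.

```python
import unicodedata

# ʿ (MODIFIER LETTER LEFT HALF RING) and ʾ (MODIFIER LETTER RIGHT HALF RING)
AYN_AND_HAMZA = "\u02bf\u02be"


def simplify(translit: str) -> str:
    """Strip diacritics and the ʿayn/hamza signs from an LOC-style string."""
    decomposed = unicodedata.normalize("NFD", translit)  # "ā" -> "a" + macron
    kept = "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch) and ch not in AYN_AND_HAMZA)
    return unicodedata.normalize("NFC", kept)


print(simplify("al-Sh\u0101fi\u02bf\u012b"))  # al-Shafii
```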
NB: The idea of betaCode is borrowed from Classicists, who developed "a method of representing, using only ASCII characters, characters and formatting found in ancient Greek texts". The current betaCode is inspired by, and therefore quite similar to, the ArabTeX scheme. Linguists working with Arabic commonly use Buckwalter transliteration, which is very similar to the current betaCode, but less readable.
betacode | translit | Arabic letter |
---|---|---|
_a | ā | alif |
b | b | bāʾ |
p | p | Persian letter pe |
t | t | tāʾ |
_t | ṯ | thāʾ |
^g, j | ǧ | jīm |
^c | č | Persian letter che |
*h | ḥ | ḥāʾ |
_h | ḫ | khāʾ |
d | d | dāl |
_d | ḏ | dhāl |
r | r | rāʾ |
z | z | zayn |
^z | ž | Persian letter zhe |
s | s | sīn |
^s | š | shīn |
*s | ṣ | ṣād |
*d | ḍ | ḍād |
*t | ṭ | ṭāʾ |
*z | ẓ | ẓāʾ |
` | ʿ | ʿayn |
*g | ġ | ghayn |
f | f | fāʾ |
*k, q | ḳ | qāf |
k | k | kāf (Arabic and Persian) |
g | g | Persian letter gāf |
l | l | lām |
m | m | mīm |
n | n | nūn |
h | h | hāʾ |
w | w | wāw |
_u | ū | wāw |
y | y | yāʾ |
_i | ī | yāʾ |
betacode | translit | Arabic |
---|---|---|
' | ʾ | ḥamzaŧ |
/a | á | alif maqṣūraŧ |
=t | ŧ | tāʾ marbūṭaŧ |
betacode | translit | Arabic |
---|---|---|
~a | ã | dagger alif |
u | u | ḍammaŧ |
i | i | kasraŧ |
a | a | fatḥaŧ |
!o | ° | explicit sukūn (paleo mode) |
*n | ȵ | n of tanwīn |
*a | å | silent alif |
*w | ů | silent wāw |
?u | ủ | final ḍammaŧ * |
?i | ỉ | final kasraŧ * |
?a | ả | final fatḥaŧ * |
* “Finals” are those final vowels that are usually dropped in transliteration and pronunciation (e.g., al-kitāb instead of al-kitābủ, al-kitābỉ, al-kitābả), as opposed to those that are not (huwa, hiyya, ḏãlika, tilka).
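The tables above can be turned into a lookup dictionary, and a betaCode string can then be transliterated by greedy longest-match tokenization: at each position a two-character combination such as `_a` or `*h` is tried before a single letter. This is a simplified illustration (`to_translit` and `BETACODE_TO_TRANSLIT` are hypothetical names): it omits the undotted letters, ignores the `+` marker and the rules of the other output schemes, and passes unknown characters through unchanged.

```python
# Lookup table built from the tables above (undotted letters omitted).
BETACODE_TO_TRANSLIT = {
    # consonants and long vowels
    "_a": "ā", "b": "b", "p": "p", "t": "t", "_t": "ṯ", "^g": "ǧ", "j": "ǧ",
    "^c": "č", "*h": "ḥ", "_h": "ḫ", "d": "d", "_d": "ḏ", "r": "r", "z": "z",
    "^z": "ž", "s": "s", "^s": "š", "*s": "ṣ", "*d": "ḍ", "*t": "ṭ", "*z": "ẓ",
    "`": "ʿ", "*g": "ġ", "f": "f", "*k": "ḳ", "q": "ḳ", "k": "k", "g": "g",
    "l": "l", "m": "m", "n": "n", "h": "h", "w": "w", "_u": "ū", "y": "y",
    "_i": "ī",
    # special letters
    "'": "ʾ", "/a": "á", "=t": "ŧ",
    # vowels and other signs
    "~a": "ã", "u": "u", "i": "i", "a": "a", "!o": "°", "*n": "ȵ",
    "*a": "å", "*w": "ů", "?u": "ủ", "?i": "ỉ", "?a": "ả",
}


def to_translit(betacode: str) -> str:
    """Transliterate betaCode by longest match (two characters, then one)."""
    out = []
    i = 0
    while i < len(betacode):
        for size in (2, 1):  # try "_a" before "a", "*h" before "h", etc.
            chunk = betacode[i:i + size]
            key = chunk.lower()
            if len(chunk) == size and key in BETACODE_TO_TRANSLIT:
                value = BETACODE_TO_TRANSLIT[key]
                # Capitalised input (e.g. "^S", "Q") gives capitalised output.
                out.append(value.upper() if chunk != key else value)
                i += size
                break
        else:
            out.append(betacode[i])  # spaces, hyphens, digits pass through
            i += 1
    return "".join(out)


print(to_translit("Mas`_ud?i*n"))  # Masʿūdỉȵ
```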
Every Arabic letter is betaCoded with its one-letter equivalent, preceded (if necessary) by a technical character that resembles the diacritical mark of the transliterated version. The most common technical characters are:
- _ (underscore), if a letter can be transliterated with a macron or breve below or above (ā, ṯ, ḫ, ḏ, ū, ī)
- * (asterisk), if a letter can be transliterated with a dot below or above (ḥ, ṣ, ḍ, ṭ, ẓ, ġ, ḳ)
- ^ (caret), if a letter can be transliterated with a caron (ǧ, š, č, ž)
- ! (exclamation mark), for diacritics that are explicitly written in a manuscript (in paleo mode only)
- attached prepositions/conjunctions and pronominal suffixes must be separated with “-” (mostly relevant for text alignment, treebanking, and general readability): fa-_dahaba
- add “?” before “optional” final vowels that are usually dropped in transliteration and pronunciation (mostly relevant for transliteration): bi-Llah?i, but not: fa-_dahaba
- tāʾ marbūṭaŧ: add “+” after tāʾ marbūṭaŧ if its word is the first term of an iḏāfaŧ (mostly relevant for transliteration): `_amma=t+u Ba*gd_ada, but: al-`_amma=tu f_i Ba*gd_ada
- transliterating tanwīn: *n (?u*n, ?i*n, ?a*n)
- silent wāw and alif: *w (`Amr?u*n*w, for عَمْرٌو) and *a (wa-fa`al_u*a, for وَفَعَلُوا)
- bare initial alif (without vowels, hamza, waṣla or madda), in paleo mode only: a or i
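The role of the “?” marker can be illustrated by removing the optional final vowels (together with a following tanwīn “*n”) from a betaCode string, which yields the forms without final vowels that most transliteration conventions use. This is a hypothetical helper (`drop_optional_finals`), not the project's code; the real conversion handles more cases.

```python
import re

# "?" + short vowel, optionally followed by the tanwīn sign "*n"
OPTIONAL_FINAL = re.compile(r"\?[aiu](?:\*n)?")


def drop_optional_finals(betacode: str) -> str:
    """Remove vowels marked as optional finals; unmarked vowels are kept."""
    return OPTIONAL_FINAL.sub("", betacode)


print(drop_optional_finals("bi-Llah?i"))    # bi-Llah
print(drop_optional_finals("Mas`_ud?i*n"))  # Mas`_ud
print(drop_optional_finals("fa-_dahaba"))   # fa-_dahaba (no "?", nothing dropped)
```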
- Python 3.x must be installed on the machine
- clone the git repository
- save texts that must be transliterated (i.e., the text is in English, but has some Arabic terms that must be transliterated) into `./to_translit/` (follow the format given in the example file)
- save texts that must be fully transliterated and/or converted into Arabic script (i.e., the entire text is in Arabic) into `./to_arabic/` (follow the format given in the example file)
- run the script `_generateBetaCode.py` (in the Mac terminal: `python3 _generateBetaCode.py`; on Windows, double-clicking the script should work)
- converted texts (in all available modes of conversion) will be appended to the file
- if you need to make any changes, edit your initial betaCode text and run the script again; the converted results will be replaced with updated versions
NB: These are examples of converting betaCode to full transliteration and Arabic script. The very last paragraph showcases conversion of hamzaŧ in different positions.
q_ala 'ab_u Mas`_ud?i*n :: 'an_a qad sami`tu ha_d_a min ras_ul?i Allah?i ( *sl`m )
*hadda_ta-n_a `Amr?u*w bn?u R_afi`?i*n , *hadda_ta-n_a `Abd?u All~ah?i bn?u al-Mub_arak?i , `an Mu*hammad?i bn?i 'Is*h_aq?a , `an Mu*hammad?i bn?i ^Ga`far?i*n , `an `Ubayd?i Allah?i bn?i `Abd?i Allah?i bn?i `Umar?a , `an 'Ab_i-hi , `an?i al-Nabiyy?i ( *sl`m ) na*hwa-hu
'a_hbara-n_a Qutayba=t?u q_ala , *hadda_ta-n_a Sufy_an?u , `an Ya*hy/a bn?i Sa`_id?i*n , `an 'Ab_i Bakr?i bn?i Mu*hammad?i*n , `an `Umar?a bn?i `Abd?i al-`Az_iz?i , `an 'Ab_i Bakr?i bn?i `Abd?i al-Ra*hm~an?i bn?i al-*H_ari_t?i bn?i Hi^s_am?i*n , `an 'Ab_i Hurayra=t?a mi_tla-hu
Ta*hw_il?u al-hamza=t?i ( kalim_at?u*n mufrada=t?u*n )
'amr?u*n 'uns?u*n 'ins?u*n '_im_an?u*n '_aya=t?u*n '_amana mas'ala=t?u*n sa'ala ra's?u*n qur'_an?u*n ta'_amara _di'b?u*n as'ila=t?u*n q_ari'i-hi su'l?u*n mas'_ul?u*n tak_afu'u-hu su'ila q_ari'i-hi _di'_ab?u*n ra'_is?u*n bu'isa ru'_uf?u*n ra'_uf?u*n su'_al?u*n mu'arri_h?u*n abn_a'a-hu abn_a'u-hu abn_a'i-hi ^say'?a*n _ha*t_i'a=t?u*n *daw'u-hu *d_u'u-hu *daw'a-hu *daw'i-hi mur_u'a=t?u*n 'abn_a'i-hi bar_i'u-hu s_u'ila f_il?u*n f_ann?u*n f_unn?u*n s_a'ala fu'_ad?u*n ^surak_a'u-hu ri'_asa=t?u*n tahni'a=t?u*n daf_a'a=t?u*n *taff_a'a=t?u*n ta'r_i_h?u*n fa'r?u*n ^say'?u*n ^say'?i*n ^say'?a*n *daw'?u*n *daw'?i*n *daw'?a*n juz'?u*n juz'?i*n juz'?a*n mabda'?u*n mabda'?i*n mabda'?a*n naba'a q_ari'?u*n tak_afu'?u*n tak_afu'?i*n tak_afu'?a*n abn_a'u abn_a'i abn_a'a jar_i'?u*n maqr_u'?u*n *daw'?u*n ^say'?u*n juz'?u*n `ulam_a'u al-`ulam_a'i al-`ulam_a'a `Amr?u*n*w wa-fa`al_u*a
ḳāla ʾabū Masʿūdỉȵ :: ʾanā ḳad samiʿtu hãḏā min rasūlỉ Allãhỉ ( ṣlʿm )
ḥaddaṯa-nā ʿAmrủů bnủ Rāfiʿỉȵ , ḥaddaṯa-nā ʿAbdủ Allãhỉ bnủ al-Mubārakỉ , ʿan Muḥammadỉ bnỉ ʾIsḥāḳả , ʿan Muḥammadỉ bnỉ Ǧaʿfarỉȵ , ʿan ʿUbaydỉ Allãhỉ bnỉ ʿAbdỉ Allãhỉ bnỉ ʿUmarả , ʿan ʾAbī-hi , ʿanỉ al-Nabiyyỉ ( ṣlʿm ) naḥwa-hu
ʾaḫbara-nā Ḳutaybaŧủ ḳāla , ḥaddaṯa-nā Sufyānủ , ʿan Yaḥyá bnỉ Saʿīdỉȵ , ʿan ʾAbī Bakrỉ bnỉ Muḥammadỉȵ , ʿan ʿUmarả bnỉ ʿAbdỉ al-ʿAzīzỉ , ʿan ʾAbī Bakrỉ bnỉ ʿAbdỉ al-Raḥmãnỉ bnỉ al-Ḥāriṯỉ bnỉ Hišāmỉȵ , ʿan ʾAbī Hurayraŧả miṯla-hu
Taḥwīlủ al-hamzaŧỉ ( kalimātủȵ mufradaŧủȵ )
ʾamrủȵ ʾunsủȵ ʾinsủȵ ʾīmānủȵ ʾāyaŧủȵ ʾāmana masʾalaŧủȵ saʾala raʾsủȵ ḳurʾānủȵ taʾāmara ḏiʾbủȵ asʾilaŧủȵ ḳāriʾi-hi suʾlủȵ masʾūlủȵ takāfuʾu-hu suʾila ḳāriʾi-hi ḏiʾābủȵ raʾīsủȵ buʾisa ruʾūfủȵ raʾūfủȵ suʾālủȵ muʾarriḫủȵ abnāʾa-hu abnāʾu-hu abnāʾi-hi šayʾảȵ ḫaṭīʾaŧủȵ ḍawʾu-hu ḍūʾu-hu ḍawʾa-hu ḍawʾi-hi murūʾaŧủȵ ʾabnāʾi-hi barīʾu-hu sūʾila fīlủȵ fānnủȵ fūnnủȵ sāʾala fuʾādủȵ šurakāʾu-hu riʾāsaŧủȵ tahniʾaŧủȵ dafāʾaŧủȵ ṭaffāʾaŧủȵ taʾrīḫủȵ faʾrủȵ šayʾủȵ šayʾỉȵ šayʾảȵ ḍawʾủȵ ḍawʾỉȵ ḍawʾảȵ ǧuzʾủȵ ǧuzʾỉȵ ǧuzʾảȵ mabdaʾủȵ mabdaʾỉȵ mabdaʾảȵ nabaʾa ḳāriʾủȵ takāfuʾủȵ takāfuʾỉȵ takāfuʾảȵ abnāʾu abnāʾi abnāʾa ǧarīʾủȵ maḳrūʾủȵ ḍawʾủȵ šayʾủȵ ǧuzʾủȵ ʿulamāʾu al-ʿulamāʾi al-ʿulamāʾa ʿAmrủȵů wa-faʿalūå
قَالَ أَبُو مَسْعُودٍ :: أَنَا قَدْ سَمِعْتُ هٰذَا مِنْ رَسُولِ الـلّٰـهِ ( صْلْعْمْ )
حَدَّثَنَا عَمْرُو بْنُ رَافِعٍ ، حَدَّثَنَا عَبْدُ الـلّٰـهِ بْنُ الْمُبَارَكِ ، عَنْ مُحَمَّدِ بْنِ إِسْحَاقَ ، عَنْ مُحَمَّدِ بْنِ جَعْفَرٍ ، عَنْ عُبَيْدِ الـلّٰـهِ بْنِ عَبْدِ الـلّٰـهِ بْنِ عُمَرَ ، عَنْ أَبِيهِ ، عَنِ النَّبِيِّ ( صْلْعْمْ ) نَحْوَهُ
أَخْبَرَنَا قُتَيْبَةُ قَالَ ، حَدَّثَنَا سُفْيَانُ ، عَنْ يَحْيٰى بْنِ سَعِيدٍ ، عَنْ أَبِي بَكْرِ بْنِ مُحَمَّدٍ ، عَنْ عُمَرَ بْنِ عَبْدِ الْعَزِيزِ ، عَنْ أَبِي بَكْرِ بْنِ عَبْدِ الرَّحْمٰنِ بْنِ الْحَارِثِ بْنِ هِشَامٍ ، عَنْ أَبِي هُرَيْرَةَ مِثْلَهُ
تَحْوِيلُ الْهَمْزَةِ ( كَلِمَاتٌ مُفْرَدَةٌ )
أَمْرٌ أُنْسٌ إِنْسٌ إِيمَانٌ آيَةٌ آمَنَ مَسْأَلَةٌ سَأَلَ رَأْسٌ قُرْآنٌ تَآمَرَ ذِئْبٌ أَسْئِلَةٌ قَارِئِهِ سُؤْلٌ مَسْؤُولٌ تَكَافُؤُهُ سُئِلَ قَارِئِهِ ذِئَابٌ رَئِيسٌ بُئِسَ رُؤُوفٌ رَؤُوفٌ سُؤَالٌ مُؤَرِّخٌ أَبْنَاءَهُ أَبْناؤُهُ أَبْنائِهِ شَيْئًا خَطِيئَةٌ ضَوْءُهُ ضُوؤُهُ ضَوْءَهُ ضَوْئِهِ مُرُوءَةٌ أَبْنائِهِ بَرِيؤُهُ سُوئِلَ فِيلٌ فَانٌّ فُونٌّ سَاءَلَ فُؤَادٌ شُرَكاؤُهُ رِئَاسَةٌ تَهْنِئَةٌ دَفَاءَةٌ طَفّاءَةٌ تَأْرِيخٌ فَأْرٌ شَيْءٌ شَيْءٍ شَيْئًا ضَوْءٌ ضَوْءٍ ضَوْءًا جُزْءٌ جُزْءٍ جُزْءًا مَبْدَأٌ مَبْدَأٍ مَبْدَأً نَبَأَ قَارِئٌ تَكَافُؤٌ تَكَافُؤٍ تَكَافُؤًا أَبْناءُ أَبْناءِ أَبْناءَ جَريءٌ مَقْروءٌ ضَوْءٌ شَيْءٌ جُزْءٌ عُلَماءُ الْعُلَماءِ الْعُلَماءَ عَمْرٌو وَفَعَلُوا
NB: This is an example of an English text with terms, names and toponyms given in betaCode and automatically converted into different transliteration flavors (excerpts are from Brill’s Encyclopaedia of Islam).
Dima^s*k, Dima^s*k al-^S_am or simply al-^S_am , (Lat. Damascus, Fr. Damas) is the largest city of Syria. It is situated ... very much at the same latitude as Ba*gd_ad and F_as, at an altitude of nearly 700 metres, on the edge of the desert at the foot of ^Gabal *K_asiy_un.
al-_Dahab_i, ^Sams al-D_in Ab_u `Abd Allah Mu*hammad b. `U_tm_an b. *K_aym_a*z b. `Abd Allah al-Turkum_an_i al-F_ari*k_i al-Dima^s*k_i al-^S_afi`_i, an Arab historian and theologian, was born at Damascus or at Mayy_afari*k_in on 1 or 3 Rab_i` II (according to al-Kutub_i, in Rab_i` I) 673/5 or 7 October 1274, and died at Damascus, according to al-Subk_i and al-Suy_u*t_i, in the night of Sunday-Monday on 3 _D_u al-*Ka`da=t 748/4 February 1348, or, according to A*hmad b. `Iy_as, in 753/1352-3. He was buried at the B_ab al-*Sa*g_ir.
Dimašḳ, Dimašḳ al-Šām or simply al-Šām , (Lat. Damascus, Fr. Damas) is the largest city of Syria. It is situated ... very much at the same latitude as Baġdād and Fās, at an altitude of nearly 700 metres, on the edge of the desert at the foot of Ǧabal Ḳāsiyūn.
al-Ḏahabī, Šams al-Dīn Abū ʿAbd Allãh Muḥammad b. ʿUṯmān b. Ḳāymāẓ b. ʿAbd Allãh al-Turkumānī al-Fāriḳī al-Dimašḳī al-Šāfiʿī, an Arab historian and theologian, was born at Damascus or at Mayyāfariḳīn on 1 or 3 Rabīʿ II (according to al-Kutubī, in Rabīʿ I) 673/5 or 7 October 1274, and died at Damascus, according to al-Subkī and al-Suyūṭī, in the night of Sunday-Monday on 3 Ḏū al-Ḳaʿdaŧ 748/4 February 1348, or, according to Aḥmad b. ʿIyās, in 753/1352-3. He was buried at the Bāb al-Ṣaġīr.
Dimashq, Dimashq al-Shām or simply al-Shām , (Lat. Damascus, Fr. Damas) is the largest city of Syria. It is situated ... very much at the same latitude as Baghdād and Fās, at an altitude of nearly 700 metres, on the edge of the desert at the foot of Jabal Qāsiyūn.
al-Dhahabī, Shams al-Dīn Abū ʿAbd Allāh Muḥammad b. ʿUthmān b. Qāymāẓ b. ʿAbd Allāh al-Turkumānī al-Fāriqī al-Dimashqī al-Shāfiʿī, an Arab historian and theologian, was born at Damascus or at Mayyāfariqīn on 1 or 3 Rabīʿ II (according to al-Kutubī, in Rabīʿ I) 673/5 or 7 October 1274, and died at Damascus, according to al-Subkī and al-Suyūṭī, in the night of Sunday-Monday on 3 Dhū al-Qaʿda 748/4 February 1348, or, according to Aḥmad b. ʿIyās, in 753/1352-3. He was buried at the Bāb al-Ṣaghīr.
Dimashq, Dimashq al-Sham or simply al-Sham , (Lat. Damascus, Fr. Damas) is the largest city of Syria. It is situated ... very much at the same latitude as Baghdad and Fas, at an altitude of nearly 700 metres, on the edge of the desert at the foot of Jabal Qasiyun.
al-Dhahabi, Shams al-Din Abu Abd Allah Muhammad b. Uthman b. Qaymaz b. Abd Allah al-Turkumani al-Fariqi al-Dimashqi al-Shafii, an Arab historian and theologian, was born at Damascus or at Mayyafariqin on 1 or 3 Rabi II (according to al-Kutubi, in Rabi I) 673/5 or 7 October 1274, and died at Damascus, according to al-Subki and al-Suyuti, in the night of Sunday-Monday on 3 Dhu al-Qada 748/4 February 1348, or, according to Ahmad b. Iyas, in 753/1352-3. He was buried at the Bab al-Saghir.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.