Fix data format for Andhrabharati and other willing individuals #604

drdhaval2785 · 2021-09-02T16:48:02Z

Reference - sanskrit-lexicon/CORRECTIONS#414

There have been suggestions to make corrections to the Cologne data more user-friendly.
SLP1 is well suited to programmatic manipulation, but reading it has a steep learning curve for humans.
This issue is for reaching a consensus on the format in which Cologne files may be given (with reversibility) to someone who wants to improve the files substantially, like @Andhrabharati .

drdhaval2785 · 2021-09-02T16:53:44Z

Example

rAma in PWG
https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/servepdf.php?dict=PWG&page=6-0342

The corresponding entry in pwg.txt is as follows

<L>84468<pc>6-0342<k1>rAma<k2>rAma/<h>1
1. {#rAma/#}¦ (wohl desselben Ursprungs wie {#rAtrI#}) 
<div n="1"> 1) <lex>adj.</lex> (f. {#A#}) {%dunkelfarbig, schwarz%} 
<ls>NIR. 12, 13.</ls> 
<ls>AK. 3, 4, 23, 143.</ls> 
<ls>H. 1397.</ls> 
<ls>an. 2, 334.</ls> 
<ls>MED. m. 26. fg.</ls> <ls>HALĀY. 4, 49.</ls> {#rAme\ kfzRe\ asi^kni ca#} 
<ls>AV. 1, 23, 1.</ls> Schaf <ls>12, 2, 19.</ls> {#nAsya rAma (= ramaRIyaH putraH#} Comm.) {#ucCizwaM pibet#} 
<ls>TAITT. ĀR. 5, 8, 13.</ls> {#rAmA#} {%eine Dunkle%} d. i. {%ein Weib gemeiner Herkunft%}: {#nAgniM ci\tvA rA\mAmupe^yAt#} 
<ls>TS. 5, 6, 8, 3.</ls> 
<ls>TAITT. ĀR. 5, 8, 13.</ls> <ls>Schol.</ls> zu <ls>KĀTY. ŚR. 18, 6, 27.</ls> Auch die Bedeutung 2. {#rAma#} 2) {%c)%}
<lang n="greek">(α)</lang> wäre indessen hier möglich. Nach 
<ls>AK.</ls> <ls>H. an.</ls> und <ls>MED.</ls> auch {%weiss.%} 
<div n="1">— 2) <lex>m.</lex> 
<div n="2"> a) {%eine Hirschart%} 
<ls>AK. 2, 5, 11.</ls> 
<ls>TRIK. 3, 3, 302.</ls> 
<ls>H. an.</ls> 
<ls>MED.</ls> 
<div n="2">— b) {%Pferd%} 
<ls>MED.</ls> 
<div n="2">— c) N. pr. eines Mannes 
<ls>ṚV. 10, 93, 14.</ls> mit dem patron. <is>Mārgaveya</is> 
<ls>AIT. BR. 7, 27. 34.</ls> <is>Aupatasvini</is> 
<ls>ŚAT. BR. 4, 6, 1, 7.</ls> <is>Jāmadagnya</is>, Verfassers von 
<ls>ṚV. 10, 110.</ls> Im Epos und später erscheinen {%drei%} <is>Rāma</is> (daher {#rAma#} als Bez. {%der Zahl drei%} 
<ls>VARĀH. BṚH. S. 8, 20</ls>), von denen die beiden ersten für Incarnationen <is>Viṣṇu's</is> gelten: 
<div n="3"> α) mit dem patron. <is>Jāmadagnya</is> oder <is>Bhārgava</is>, ein Sohn der <is>Reṇukā</is>, auch {#paraSurAma#} genannt, 
<ls>TRIK. 3, 3, 302.</ls> 
<ls>H. 848.</ls> 
<ls>H. an.</ls> 
<ls>MED.</ls> (wo {#rERukeye#} st. {#vERukeye#} zu lesen ist). {#rAmaH SastraBftAmaham#} (vgl. 
<ls>HARIV. 5869</ls>) sagt <is>Kṛṣṇa</is> 
<ls>BHAG. 10, 31.</ls> 
<ls>MBH. 1, 272. 2612. 3, 8658. 8, 1584. 12, 1715. fgg. 12948.</ls> 
<ls>HARIV. 2313. fgg. 5869. fg.</ls> {#rAmarAmavivAda#} 
<ls>R. 1, 3, 11 (5 GORR.). 74, 22. fg. 76, 1.</ls> <ls>R. GORR. 1, 77, 23. 37.</ls> <ls>RAGH. 11, 68.</ls> 
<div n="3">— β) mit dem patron. <is>Rāghava</is> oder <is>Dāśarathi</is> 
<ls>TRIK. 2, 8, 3. 3, 3, 302.</ls> 
<ls>H. 703.</ls> 
<ls>H. an.</ls> 
<ls>MED.</ls> 
<ls>MBH. 3, 11197. 15933. 12, 12949.</ls> 
<ls>HARIV. 822. 2324. fgg. 3065. fgg. 5871. 7373.</ls> 
<ls>R. 1, 1, 10. 17. 20.</ls> {#rAmarAmavivAda#} 
<ls>3, 11 (5</ls> <ls>GORR.).</ls> {#ramayatyeva sa guRErudArEstErimAH prajAH . yasmAdato rAma iti nAmEtattasya viSrutam ..#} 
<ls>R. GORR. 1, 1, 22. 6, 102.</ls> <ls>RAGH. 11, 68.</ls> <ls>VARĀH. BṚH. S. 58, 30.</ls> <ls>VP. 384.</ls> <ls>BHĀG. P. 9, 10. fgg.</ls> <ls>Spr. 2630.</ls> {#rAmo hemamfgaM ma rvetti#} 
<ls>2631.</ls> {#ramante yogino 'nante satyAnande cidAtmani . #}
[Page6-0343]
{# iti rAmapadenAsO paraM brahmABiDIyate ..#} 
<ls>WEBER, RĀMAT. UP. 286.</ls> 
<div n="3">— γ) = <is>Balarāma</is>, <is>Halāyudha</is>, ein älterer Bruder <is>Kṛṣṇa's</is> 
<ls>AK. 1, 1, 1, 18. 3, 4, 23, 143.</ls> 
<ls>H. 224.</ls> 
<ls>H. an.</ls> 
<ls>MED.</ls> 
<ls>HALĀY. 1, 29.</ls> 
<ls>BHĀG. P. 1, 11, 17. 10, 1, 8.</ls> <ls>WEBER, KṚṢṆAJ. 268. 281. 284. 289.</ls> erscheint bei den <is>Jaina</is> unter den 
<ls>9</ls> {%weissen%} (s. oben u. 
<div n="1"> 1) <is>Bala's</is> 
<ls>H. 698.</ls> — <is>Rāma</is> unter den sieben Weisen eines <is>Manu</is> 
<ls>HARIV. 453.</ls> 
<ls>MĀRK. P. 80, 4.</ls> <is>Rāma</is> ist ein auch später häufig vorkommender Name: so heisst z. B. ein Sohn <is>Tārāvaloka's</is> und einer <is>Mādrī</is> und Zwillingsbruder <is>Lakṣmaṇa's</is> 
<ls>KATHĀS. 113, 32.</ls> verschiedene Lehrer, Autoren 
u.s.w. <ls>BURN. Intr. 567</ls> (neben {#Badanta°#}). 
<ls>COLEBR. Misc. Ess.</ls> <ls>?II,49. Verz. d. B. H. No. 109. 833. Ind. St.8,389. HALL 84. 119. Verz. d. Oxf. H. 126,b, No. 220. 129,b, No. 234. 148,a,9. 151,b, No. 321. fgg. 335,b, No. 788. 341,b, N. 358,a, No. 853. 386,a, No. 505.</ls> ein Fürst von <is>Mallapura</is> 
<ls>148,b,15. 18.</ls> von <is>Śṛṅgavera</is> 
<ls>165</ls>, {%a%}, 
<ls>7. 178</ls>, {%a%}, 
<ls>?16. - RĀJA-TAR. 8, 785. KṢITĪŚ. 10, 7. fgg.</ls> 
<div n="2">— d) Bein. <is>Varuṇa's</is> 
<ls>MED.</ls> 
<div n="2">— e) pl. N. pr. eines Volkes 
<ls>VP. 177.</ls> 
<div n="1">— 3) <lex>f.</lex> {#A#} 
<div n="2"> a) {%ein Weib niedriger Herkunft%}; s. u. 1). 
<div n="2">— b) = {#hiNgu#} {%Asa foetida%} 
<ls>H. an.</ls> <ls>MED.</ls> = {#hiNgula#} {%Mennig%} 
<ls>ŚABDAR.</ls> im <ls>ŚKDR.</ls> 
<div n="1">— 4) <lex>f.</lex> {#I#} {%Dunkel, Nacht%}: {#u\zA na rA\mIra^ru\RErapo^rRute#} 
<ls>ṚV. 2, 34, 11.</ls> 
<div n="1">— 5) <lex>n.</lex> 
<div n="2"> a) {%Dunkel%}: {#a\gnI ruSa^dBi\rvarRE^ra\Bi rA\mama^sTAt#} 
<ls>ṚV. 10, 3, 3.</ls> 
<div n="2">— b) = {#vAstuka#} ({%Chenopodium album%}) und {#kuzWa#} (in welcher Bed.?) 
<ls>H. an.</ls> <ls>MED.</ls> = {#tamAlapattra#} 
<ls>RĀJAN.</ls> im <ls>ŚKDR.</ls> 
<div n="v">— Vgl. {#aDo°, paraSu°, bala°, Badanta°, maRi°, manasA°#} .
<LEND>

drdhaval2785 · 2021-09-02T16:57:21Z

{#rAma/#} - Sanskrit text is enclosed between {# and #} markers, in SLP1. I think the requirement would be to convert it to Devanagari. This can be handled.
<ls>HALĀY. 4, 49.</ls> - References in the printed book use an older Anglicized transliteration of Sanskrit, which has been converted to IAST here. I guess this would not require any change.
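
A minimal sketch of how the {# ... #} conversion could look, assuming plain SLP1 inside the markers. The function names `slp1_to_devanagari` and `convert_line` are hypothetical and not part of any Cologne tool. Accent marks (/, \, ^), punctuation and any other unmapped characters are passed through unchanged, and the reverse direction needed for reversibility is not shown.

```python
import re

# SLP1 consonants and their Devanagari letters, in matching order.
CONSONANTS = dict(zip(
    "kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzshL",
    "कखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसहळ",
))

# Independent vowel letters (used when no consonant precedes).
VOWELS = {
    "a": "अ", "A": "आ", "i": "इ", "I": "ई", "u": "उ", "U": "ऊ",
    "f": "ऋ", "F": "ॠ", "x": "ऌ", "X": "ॡ",
    "e": "ए", "E": "ऐ", "o": "ओ", "O": "औ",
}

# Dependent vowel signs (used after a consonant); "a" is inherent, so absent.
SIGNS = {
    "A": "ा", "i": "ि", "I": "ी", "u": "ु", "U": "ू",
    "f": "ृ", "F": "ॄ", "x": "ॢ", "X": "ॣ",
    "e": "े", "E": "ै", "o": "ो", "O": "ौ",
}

# Anusvara, visarga, candrabindu, avagraha.
OTHERS = {"M": "ं", "H": "ः", "~": "ँ", "'": "ऽ"}

VIRAMA = "्"


def slp1_to_devanagari(text):
    """Convert an SLP1 string to Devanagari; unmapped characters pass through."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt == "a":
                i += 1                     # inherent vowel: nothing to emit
            elif nxt in SIGNS:
                out.append(SIGNS[nxt])     # consonant + dependent vowel sign
                i += 1
            else:
                out.append(VIRAMA)         # no vowel follows: suppress inherent a
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in OTHERS:
            out.append(OTHERS[ch])
        else:
            out.append(ch)                 # accents, spaces, punctuation, etc.
        i += 1
    return "".join(out)


def convert_line(line):
    """Convert only the text between {# and #}, leaving all other markup alone."""
    return re.sub(
        r"\{#(.*?)#\}",
        lambda m: "{#" + slp1_to_devanagari(m.group(1)) + "#}",
        line,
    )


print(convert_line("1. {#rAma#} (wohl desselben Ursprungs wie {#rAtrI#})"))
```

Only the Sanskrit spans change; tags such as <ls> and <lex> and the German text are left as they are, which matches the request to keep the tagging/marking untouched.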

Is this what you have in mind @Andhrabharati ?

Andhrabharati · 2021-09-02T17:09:29Z

I have already finished whatever was needed in PWG, so I am no longer interested in this particular lexicon.
sanskrit-lexicon/PWG#39 (comment)

Yes: no encodings of any kind, for any language; the tagging/marking can remain as is.

I would try "masking" my eyes to them, but this has the side effect that I would skip over cases of wrong tagging/marking, which I had pointed out during my earlier MW work!!

drdhaval2785 · 2021-09-03T06:53:21Z

The repository https://github.com/sanskrit-lexicon/csl-devanagari/ has 36 Cologne dictionaries in a Devanagari-friendly format. If any corrections are needed, they may be tracked separately in that repository.

drdhaval2785 assigned drdhaval2785 and Andhrabharati Sep 2, 2021

drdhaval2785 added the doc (Improvements or additions to documentation) label Sep 2, 2021
