[INDOLOGY] Upgrade to online Koln Bohtlink-Roth dictionary? #72
Comments
I don't think that's worth spending our time on at CDSL; getting the links to scan pages is itself a big task, and Jim has taken it up with some support from my side. The OCRing is best left to interested people (if any!!). I really doubt anyone would venture the task and complete even a single book; people are just "making use" of the texts provided by open sources like GRETIL, Sanskritdocuments etc. (with whatever quality/drawbacks they possess). No further improvement, nor any independent work!! |
And I recall that not even a single step has been taken (at your end, @gasyoun) towards "getting" the text out of the front pages of the CDSL works [a very practical & achievable task], which was talked about a few years back!! |
Disagree. Would want to discuss it with @martingluckman at a later stage. |
Encouraging to see that this feature of links to references is noticed by Gluckman. @gasyoun I agree with AB that OCRing (getting the text out of) the Documentation Frontmatter scans would be an upgrade to that section at Cologne. |
What Marcis said is that Harry Spier had noticed the linking feature, not Gluckman (whom Marcis wants to approach for helping in OCRing the full-works!) |
Yes, at least for the 'major' PWG references. I think I've put all of the 'link targets' here: https://github.com/orgs/sanskrit-lexicon-scans/repositories This repo also contains copies of the scanned images for the dictionaries. So someone interested in OCRing any of the link targets could clone one of these repos to get images of the individual pages. |
In fact, these GitHub repos are also used by the CDSL displays (e.g. of PWG) to 'serve' the images. |
OCRing alone can be done in practically no time these days (courtesy Google); it is the next phase, i.e. proofing the OCRed text to match the print, that is the REAL task. |
Is it not worth doing this for all the works that exceed a count of 10k references, in this spree? And @funderburkjim should update the lsextract_pwg file (which seems to have been last updated on 13th Jan. 2023) again; that will bring further members (extending the list that I mentioned at the KSS issue) into the 10k+ club! |
And also link the Indische Sprüche (1st ed.) scans, though the 2nd ed. has already been linked as a digital text. |
I am sure Jim cannot spend any time on this, and I WILL NOT (though I can do the proofing also, iff I take up the work); so you are welcome to get it done by any interested party, @gasyoun !! |
@Andhrabharati I'm speaking of a dirty, unproofed OCR. |
A simple script will do it, @gasyoun! [And quite a few of them are floating around the net.] |
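For anyone who has not seen such a script, here is a rough sketch of a "dirty OCR" batch run. It is only an illustration under assumptions: it swaps in the open-source `tesseract` command-line tool (not the Google route mentioned above), assumes `tesseract` and its Sanskrit data (`san`) are installed, and assumes the page images sit in one local folder, e.g. a clone of one of the scan repos. The helper names and the folder and file names are hypothetical.

```python
from pathlib import Path
import subprocess

# Image types to pick up; adjust to whatever the cloned scan folder contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_page_images(scan_dir):
    """Return the page images directly inside scan_dir, sorted by file name."""
    return sorted(
        p for p in Path(scan_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_command(image, out_dir, lang="san"):
    """Build the tesseract call for one page: tesseract IMAGE OUTPUTBASE -l LANG.

    tesseract appends ".txt" to OUTPUTBASE by itself.
    The language code is an assumption; pick the one matching the script of the work.
    """
    out_base = Path(out_dir) / Path(image).stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_directory(scan_dir, out_dir, lang="san"):
    """Run an unproofed OCR pass over every page image in scan_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for image in find_page_images(scan_dir):
        subprocess.run(ocr_command(image, out_dir, lang), check=True)
```

Calling `ocr_directory("scans", "ocr_text")` would then write one text file per page image. The output is raw and unproofed; matching it against the print remains a separate job.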
Looks like Suśruta, 1835-6 is the only other candidate coming into the 10k+ club! Once this 'bound book' is split into its two constituent volumes [Vol.1 (1835): 378pp and Vol.2 (1836): 562pp, leaving the front 4 "title" pages in each volume], there is no need for any indexing for this work, as the references are just in the (volume, page, line) manner. Very easy for Jim, just like in the case of the Verz. d. Oxf. H.!! |
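To show why no index is needed, here is a small sketch of how a (volume, page) reference could be turned into a position in the bound scan by arithmetic alone. It assumes the scan runs as 4 title pages, then the 378 pages of Vol.1, then 4 title pages, then the 562 pages of Vol.2, with no unnumbered leaves in between; the actual scan may differ, and the function name is hypothetical.

```python
# Assumed layout of the bound scan (not verified against the actual files):
# [4 title pages][Vol.1: 378 pp][4 title pages][Vol.2: 562 pp]
FRONT_PAGES = 4
VOLUME_PAGES = {1: 378, 2: 562}


def scan_index(volume, page):
    """Return the 1-based position of (volume, page) in the bound scan."""
    if volume not in VOLUME_PAGES:
        raise ValueError(f"unknown volume: {volume}")
    if not 1 <= page <= VOLUME_PAGES[volume]:
        raise ValueError(f"page {page} is outside volume {volume}")
    offset = 0
    for vol in sorted(VOLUME_PAGES):
        offset += FRONT_PAGES  # skip this volume's title pages
        if vol == volume:
            return offset + page
        offset += VOLUME_PAGES[vol]  # skip the whole earlier volume


print(scan_index(1, 1))  # first numbered page of Vol.1
print(scan_index(2, 1))  # first numbered page of Vol.2
```

The line number of a reference plays no part in picking the scan; it only tells the reader where to look on that page.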
Never seen one @Andhrabharati
Where are the others? |
Well, not everyone needs to know everything!
You mean the list of names? Look at my post above! |
@funderburkjim @Andhrabharati the work is starting to get noticed! And so I have a question: can we batch-OCR the scans on our end with https://ocr.sanskritdictionary.com and a little help from @martingluckman?
"Does anyone know if this is an ongoing project to make all the references in the B-R Grosse Worterbuch live (i.e. point to the actual page of the work referenced). and if this project also extends to other of the Koln on-line dictionaries." What is the plan, and at what URL, as of now? What is already covered? Even I miss part of the changelog.