Request of IDs for MW meanings (additional to L for entries) #19
Comments
Your links are aesthetically pleasing. Is this your work? Your suggestion about tagging meanings is a new idea to me. It is definitely a research idea, and should be worked on in a 'dev' environment, and not the main trunk of current MW. A way for you to begin materializing your thoughts might be to mark up by hand some MW records. That way others can see something specific that they might be able to constructively criticize. You can also promote interest in your idea by expanding your explanations - I sampled several of the links, and they looked interesting and like someone has spent a lot of time and effort developing them. But, following the train of thought is hard. Just as my readme files are often obscure, these links are also obscure. Unless you know otherwise, assume your reader has absolutely no idea of what you're talking about - don't assume your reader is interested in 'reading your mind'. Be absurdly simple and clear. That will help you get useful feedback. |
Yes, it's mine. Under my guidance we developed an .xls macro that takes the .html input and outputs a .doc with styles, which we then also print to .pdf. It's not really research: it has already been available for ten years at http://yukta.org/download.php?lang=rus (esp. the MySQL dump at http://yukta.org/download/base_yukta.zip), a partial translation of MW into Russian.
To expand the explanations I will need your help, because I need to understand how to relink every entry to what I see at http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/index.php?contents=corpus Do you have an idea how we can know if, for example, http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/index.php?contents=einzelwort&IDWord=325 is |
Under the 'Meanings' section at http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/index.php?contents=einzelwort&IDWord=325, there are 8 text definitions. By looking at akza in
Also, the definitions above are sometimes 'expansions' of MW definitions. If you had (a) the headword (= akza) and (b) the list of definitions, If you had (b), it might be possible to devise a scheme to deduce (a). |
By 'expansions' you mean the |
re: By 'expansions' you mean the religious knowledge case? That's one; another is 'name' in 523-5. 'Any idea where to start?' A download of Oliver's data. General principle: computer programs have input and output. The input has to be completely specified (simplest case: the input is all the data in a particular computer file). Application of general principle: You must specify the 'input' for the computer program you have in mind. That seems to me the place to start. |
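One way to make the "input and output" principle concrete for the headword question: the input is (b) a list of DCS meaning strings plus a table of MW headwords with their definition text, and the output is a guess at (a) the headword. The sketch below is only an illustration under those assumptions. The function names, the word-overlap (Jaccard) scoring, and the toy data are hypothetical and are not taken from the DCS or MW files.

```python
import re


def tokenize(text):
    """Lowercase a definition and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def guess_headword(dcs_meanings, mw_entries):
    """Return (headword, score) for the MW entry that best overlaps the meanings.

    dcs_meanings: list of definition strings (input b).
    mw_entries:   dict mapping headword -> definition text (assumed layout).
    The score is the Jaccard overlap of the two word sets; (None, 0.0)
    is returned when nothing overlaps.
    """
    target = set()
    for meaning in dcs_meanings:
        target |= tokenize(meaning)

    best = (None, 0.0)
    for headword, definition in mw_entries.items():
        words = tokenize(definition)
        if not target or not words:
            continue
        score = len(target & words) / len(target | words)
        if score > best[1]:
            best = (headword, score)
    return best


if __name__ == "__main__":
    # Toy data invented for illustration only, not real MW or DCS records.
    toy_mw = {
        "akza": "a die for gambling; a cube",
        "artha": "aim, purpose",
    }
    print(guess_headword(["a die", "a cube"], toy_mw)[0])  # akza
```

Plain word overlap will struggle with the 'expansion' cases mentioned above, so a real scheme would need a better similarity measure; the point here is only that both inputs have to be fully specified first.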
@funderburkjim https://github.com/sanskrit-lexicon/DCS/blob/master/DCS-72034-gramm-tag-stats.csv is one of the downloads I've made. Should it be enough, as |
So that is a list of headwords. Where are the definitions? Also, the file of headwords shows many '?' characters: aka??akin, aka??aka ;adj, , aka??hya ;adj . |
@gasyoun, I am not sure whether I understand you here. Can you rephrase what you want done? |
Is this issue closable now? |
Since this issue is mentioned in @drdhaval2785's list at sanskrit-lexicon/COLOGNE#325, @drdhaval2785, perhaps your list should be reviewed and the status updated? |
Let me explain #7 "24) Can we get a unique ID for every meaning in MW as well?" in more detail. Otherwise it could get lost, even for the sake of discussion.
I've been working with MW for ages and have tried hard to understand what we should do with it in the future (just today I stumbled upon http://www.dialog-21.ru/digests/dialog2006/materials/html/Gasuns.htm).
Let me tell you the story of why I need an ID for each "translation" as well.
1) I've heard that the 2nd German translation took 6.5 years to make. And it was not even a word-by-word translation, so I'm not sure if it's a good idea, but my ayurvedic doctor wants to translate AHS. That means I have to translate it myself before he does, so that I can help him in some way.
2) But before it becomes Russian, I need to make an English word-by-word translation. The good news is that Oliver already partly solved the task in 2012 in his DCS. He has assigned an ID to every word (not to every meaning), and I'm not aware whether the AHS has any misassignments; for that I'll need additional help from an Indian ayurvedic researcher. The bad news: at some point, and he does not remember why, Oliver sorted the meanings inside every dictionary entry alphabetically (see http://samskrtam.ru/dsc-bugs/). That means that the frequent, wanted meanings starting with "z" will never be there, because the logical order inside the entries is broken. Only because of that, a huge amount of work has to be redone now.
A single example: अर्थ has "aim" as its first (= main) meaning. But in Oliver's "new order" it is fifth, and as I take only the first 3 meanings of every entry, it is left out of my automatic word-by-word translation file.
3) If every translation has its own ID, similar to the L numbers, then I will be able to start developing a system for improving the quality of the word-by-word translations. First of all I will include:
After that I would make a voting system. An ayurvedic doctor reads the text and clicks on the most appropriate word from the list of 3-5 words. We gather statistics on which meaning they prefer for the words at http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/index.php?contents=textdictionary&IDText=10. After that maybe even an auto-correction mode can be made. We "vote" for words in one chapter, and the quality score for other chapters might improve as well. We would need to map which "L" numbers match which "einzelwort&IDWord" values at Oliver's site (sample: http://kjc-fs-cluster.kjc.uni-heidelberg.de/dcs/index.php?contents=einzelwort&IDWord=104 = L=224, and "without a why or a wherefore, accidentally, suddenly." is split by ","). There are 72 000 words used in Oliver's version of MW.
4) After that I will need to try to match the Russian translations of MW at http://www.yukta.org/download.php?lang=rus and verify them against my Parallel Sanskrit Corpus, like http://samskrtam.ru/parallel-mahabharata/
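The voting system from item 3) could be tallied roughly as follows. This is a minimal sketch, not an existing tool: the class name, the method names, and the meaning-ID format ("L224-m3") are all hypothetical. Only the pairing of IDWord=104 with L=224 comes from the sample above.

```python
from collections import Counter, defaultdict


class MeaningVotes:
    """Tally reader votes for meanings, keyed by a word ID.

    Word IDs stand in for DCS IDWord values; meaning IDs such as
    "L224-m3" are a hypothetical format, not something MW provides today.
    """

    def __init__(self):
        self._votes = defaultdict(Counter)

    def vote(self, word_id, meaning_id):
        """Record one click on a meaning for the given word."""
        self._votes[word_id][meaning_id] += 1

    def ranked(self, word_id):
        """Return the voted meaning IDs, most popular first."""
        counts = self._votes.get(word_id, Counter())
        return [meaning_id for meaning_id, _ in counts.most_common()]

    def preferred(self, word_id, default_order):
        """Voted meanings first, then the rest in dictionary order."""
        ranked = self.ranked(word_id)
        return ranked + [m for m in default_order if m not in ranked]


if __name__ == "__main__":
    votes = MeaningVotes()
    votes.vote(104, "L224-m3")
    votes.vote(104, "L224-m3")
    votes.vote(104, "L224-m1")
    print(votes.preferred(104, ["L224-m1", "L224-m2", "L224-m3"]))
```

Falling back to the dictionary order for meanings that nobody has voted on keeps the main meaning visible until enough votes have been collected.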
I do not know if I should try to make a macro that works inside Word, Adobe Acrobat or some web UI. That is not that important now. What matters is that every meaning should have its ID. Does it seem reasonable and possible, @funderburkjim? @drdhaval2785, is it possible to understand what I am talking about?
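To make the request concrete, here is a rough sketch of what per-meaning IDs could look like, using the L=224 sample quoted above. The ID format ("L224-m1"), the function names, and the naive split on "," are assumptions for illustration only. Real MW records would need more careful splitting, since commas also occur inside a single meaning.

```python
def assign_meaning_ids(l_number, definition, separator=","):
    """Split a definition into meanings and give each one an ID.

    The ID combines the entry's L number with the meaning's position in
    the original dictionary order, e.g. "L224-m1" (hypothetical format).
    """
    parts = [part.strip(" .") for part in definition.split(separator)]
    parts = [part for part in parts if part]
    return [(f"L{l_number}-m{i}", part) for i, part in enumerate(parts, start=1)]


def restore_order(meanings):
    """Put (meaning_id, text) pairs back into the original order."""
    return sorted(meanings, key=lambda pair: int(pair[0].rsplit("-m", 1)[1]))


if __name__ == "__main__":
    text = "without a why or a wherefore, accidentally, suddenly."
    meanings = assign_meaning_ids(224, text)
    # Simulate the alphabetical re-sorting described in item 2).
    alphabetical = sorted(meanings, key=lambda pair: pair[1])
    # The position stored in each ID lets us undo that sorting.
    for meaning_id, meaning in restore_order(alphabetical):
        print(meaning_id, meaning)
```

Because the position is stored in the ID itself, taking "the first 3 meanings" stays well defined even if some other tool reorders the list.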