Exception: error in opening STYs file #1

Open
gourango01 opened this issue Aug 29, 2016 · 5 comments

@gourango01

./code/bin/umlsLceIndriRunQuery configs/wsuirdaa.cfg
reading wikiMedicalEntities...
Exception: error in opening STYs file:

@balaneshin
Contributor

balaneshin commented Aug 29, 2016

You can update ... to the location of the downloaded STY file. You can obtain it from: https://metamap.nlm.nih.gov/Docs/SemanticTypes_2013AA.txt

Instead, I recommend running IndriRunQuery configs/wsuirdaa.cfg
IndriRunQuery is described in detail in the following link: https://sourceforge.net/p/lemur/wiki/IndriRunQuery/

@gourango01
Author

gourango01 commented Sep 9, 2016

I tried what you recommended, i.e. IndriRunQuery configs/wsuirsaa.cfg, but I got P@10=0.28, which does not match the P@10=0.46 reported in the paper for the wsuirsaa.cfg configuration. Could you tell me what I am missing, or what I need to do to get the results reported in the paper for "wsuirsaa.cfg"? During indexing, did you treat each document as a whole, or did you divide it into segments (Title, Abstract, Body, etc.) before indexing?

@balaneshin
Contributor

As you have correctly figured out, you need to modify the collection preprocessing step. Today I will update the readme file to describe this step in more detail. Basically, I extracted the values of all fields from all nxml files in the collection and concatenated them into a single text file (in trectext format). I also replaced all non-English and non-alphabetic characters with spaces, and removed all very long words (those with more than 25 characters).
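
A minimal sketch of the cleaning step described above, in Python. This is not the script used for the project. The function names, the choice to keep only ASCII letters, and the `<DOC>/<DOCNO>/<TEXT>` layout of the trectext record are assumptions made for illustration. Real nxml files may also declare a DTD or entities that this simple parser call does not handle.

```python
import re
import xml.etree.ElementTree as ET

MAX_WORD_LEN = 25  # words with more than 25 characters are dropped


def clean_text(text: str) -> str:
    """Replace non-alphabetic characters with spaces and drop very long words."""
    # Assumption: "non-English and non-alphabetic" is read as
    # "anything that is not an ASCII letter" (digits are removed too).
    text = re.sub(r"[^A-Za-z]", " ", text)
    words = [w for w in text.split() if len(w) <= MAX_WORD_LEN]
    return " ".join(words)


def nxml_to_trectext(doc_id: str, nxml: str) -> str:
    """Concatenate the text of all fields of one nxml document into a trectext record."""
    root = ET.fromstring(nxml)
    # itertext() walks every element, so all fields are concatenated.
    raw = " ".join(root.itertext())
    return (
        "<DOC>\n"
        f"<DOCNO>{doc_id}</DOCNO>\n"
        "<TEXT>\n"
        f"{clean_text(raw)}\n"
        "</TEXT>\n"
        "</DOC>\n"
    )
```

Appending the record returned for each nxml file to one output file would give a single trectext file for indexing.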

@gourango01
Author

gourango01 commented Sep 12, 2016

Is it possible to learn the weights for the different concept types ("unigrams in topic summary", "ordered bigrams in UMLS concepts in topic summary", "unordered bigrams in UMLS concepts in topic summary", "unigrams in PRF documents") using the code you have provided? Basically, I just want to replicate the results for the wsuirsaa configuration end to end (from indexing to retrieval of related articles for each query). Please suggest how I can achieve this.

@balaneshin
Contributor

Sorry for the late response. The modified PubMed collection used in this project is now publicly available at:
http://academictorrents.com/details/371a9244d2e9344a196a449f898e0a4385b6b43a
By indexing this collection with the configuration file in this link, you can replicate the wsuirsaa results from indexing to retrieval.
I will also provide a simplified version of the code, so that you can compute the weights for the different concept types without running the whole code.
