Feature for training CRF #83
Beyond gazetteer features, adding Brown or Clark cluster features also improves performance. I experimented a lot with Brown cluster features and got consistent improvements across the various models I built. A nice property of Brown clusters is their hierarchical nature: you can include the whole path as features and let the algorithm figure out which prefixes are important (e.g. set the "-p c1=0.1" option so L1 regularization prunes the rest).
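[Editor's sketch] To make the "whole path as features" idea concrete, here is a minimal sketch of hierarchical Brown-cluster prefix features in the feature-dict style used by python-crfsuite/sklearn-crfsuite. The `brown_clusters` lookup table and the prefix lengths are illustrative assumptions, not part of CRFSuite:

```python
# Toy stand-in for a real Brown clustering (token -> cluster bit-string).
brown_clusters = {"london": "110100101", "paris": "110100111"}

def brown_features(token):
    """Emit every prefix of the Brown cluster path as a feature."""
    features = {}
    bits = brown_clusters.get(token.lower())
    if bits is not None:
        # Shorter prefixes = coarser clusters, longer = finer. The CRF
        # sees all granularities; L1 regularization (crfsuite's
        # "-p c1=0.1") zeroes out the unhelpful ones during training.
        for length in (2, 4, 6, len(bits)):
            features["brown[:%d]=%s" % (length, bits[:length])] = 1.0
    return features

print(brown_features("London"))
# {'brown[:2]=11': 1.0, 'brown[:4]=1101': 1.0, ...}
```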
I understand. I also tried including various features specific to my application. My question is slightly different: why do baseline features + word embeddings give good accuracy, while word embeddings alone do not, for a CRF?
My guess is that the word embeddings are high-variance and require many training examples, while the other features do not. However, the other features can be ambiguous: if a word starts with a capital letter, is it the first word of the sentence, a name, or a location? This is where the word embedding helps to increase accuracy, because it has a certain 'shape' or value. For example, if the word is the first word of the sentence, the algorithm can see from the embedding that it is an ordinary word, even though the capitalization feature disagrees. Update: word embeddings are also likely to place synonyms, or words with a similar meaning, close together, so they can make the learned rules more general than the hand-picked features alone.
I would say that baseline features work as advertised: you know what information they carry, because they are hand-crafted. The word embedding features encode information about a specific word appearing in some context. They might capture some of the information the baseline features do, but you don't know that for sure (the beauty of deep learning, eh?). It is safe to say that the two are complementary.
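[Editor's sketch] One way to read "complementary" concretely: in the feature-dict format, discrete baseline features and real-valued embedding dimensions can live in the same dict, with each embedding dimension contributing a weighted feature. The `embed` lookup below is a deterministic stand-in for a real model (word2vec, GloVe, ...), used only so the snippet runs:

```python
import numpy as np

def embed(token, dim=4):
    """Stand-in embedding: deterministic pseudo-random vector per token."""
    seed = sum(ord(c) for c in token.lower())
    rng = np.random.default_rng(seed)
    return rng.normal(size=dim)

def combined_features(token):
    # Discrete, hand-crafted baseline features ...
    features = {
        "word.lower=" + token.lower(): 1.0,
        "word.istitle": float(token.istitle()),
    }
    # ... plus each embedding dimension as a real-valued feature, so the
    # CRF can weight the word's position in embedding space directly.
    for i, value in enumerate(embed(token)):
        features["emb_%d" % i] = float(value)
    return features

print(combined_features("London"))
```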
CRFSuite provides a good pipeline for NER training and recognition using CRF. I wanted to confirm the training procedure. From what I observed, word embeddings alone do not give good accuracy. However, adding them to baseline features like contextual tokens, POS, isupper, isdigit, istitle, etc. gives good accuracy. Is there anything I am missing?
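[Editor's sketch] For reference, a condensed version of the kind of baseline feature function described above (contextual tokens, POS, isupper/isdigit/istitle), in the style of the sklearn-crfsuite tutorial. It assumes each sentence is a list of (token, pos_tag) pairs supplied by an upstream tagger:

```python
def word2features(sent, i):
    """Baseline features for sent[i], where sent is [(token, pos), ...]."""
    word, pos = sent[i]
    features = {
        "bias": 1.0,
        "word.lower=" + word.lower(): 1.0,
        "word.isupper": float(word.isupper()),
        "word.isdigit": float(word.isdigit()),
        "word.istitle": float(word.istitle()),
        "pos=" + pos: 1.0,
    }
    # Contextual tokens: copy key features from the neighbouring words.
    if i > 0:
        prev_word, prev_pos = sent[i - 1]
        features["-1:word.lower=" + prev_word.lower()] = 1.0
        features["-1:pos=" + prev_pos] = 1.0
    else:
        features["BOS"] = 1.0  # beginning of sentence
    if i < len(sent) - 1:
        next_word, next_pos = sent[i + 1]
        features["+1:word.lower=" + next_word.lower()] = 1.0
        features["+1:pos=" + next_pos] = 1.0
    else:
        features["EOS"] = 1.0  # end of sentence

    return features

sent = [("London", "NNP"), ("is", "VBZ"), ("big", "JJ")]
print(word2features(sent, 0))
```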