
Feature for training CRF #83

Open
niharikagupta92 opened this issue Jun 19, 2017 · 4 comments

@niharikagupta92

CRFSuite provides a good pipeline for NER training and recognition using a CRF. I wanted to confirm the training procedure. From what I observed, word embeddings alone do not give good accuracy. However, adding them on top of baseline features like contextual tokens, POS tags, isupper, isdigit, istitle, etc. gives good accuracy. Is there anything I am missing?
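
For illustration, a minimal sketch of such a baseline feature function for the python-crfsuite binding; the helper name `word2features` and the exact feature keys are illustrative, not fixed by CRFsuite:

```python
# Minimal baseline feature function for python-crfsuite.
# Feature keys and the helper name are illustrative, not part of CRFsuite.
def word2features(sent, i):
    """sent is a list of (token, pos_tag) pairs; returns features for token i."""
    word, pos = sent[i]
    features = {
        'bias': 1.0,
        'word.lower': word.lower(),
        'word.isupper': word.isupper(),
        'word.istitle': word.istitle(),
        'word.isdigit': word.isdigit(),
        'pos': pos,
    }
    if i > 0:  # contextual features from the previous token
        pword, ppos = sent[i - 1]
        features['-1:word.lower'] = pword.lower()
        features['-1:pos'] = ppos
    else:
        features['BOS'] = True  # beginning of sentence
    if i < len(sent) - 1:  # contextual features from the next token
        nword, npos = sent[i + 1]
        features['+1:word.lower'] = nword.lower()
        features['+1:pos'] = npos
    else:
        features['EOS'] = True  # end of sentence
    return features
```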

@usptact commented Jun 19, 2017

Beyond gazetteer features, adding Brown or Clark cluster features also improves performance. I experimented a lot with Brown cluster features and got consistent improvements across the various models I built. A nice property of Brown clusters is their hierarchical nature: you can include every prefix of a word's cluster path as features and let the algorithm figure out which ones are important (e.g., set the "-p c1=0.1" option, so the L1 penalty zeroes out the rest).
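
For illustration, a sketch of cluster-path prefix features, assuming a `brown_clusters` lookup from lowercased words to their cluster bitstrings (the lookup name and the chosen prefix lengths are illustrative):

```python
import pycrfsuite

# Assumed lookup: word -> Brown cluster bitstring, e.g. {'london': '101100'}.
brown_clusters = {}

def cluster_features(word):
    """Emit several prefixes of the word's Brown cluster path as features."""
    feats = {}
    path = brown_clusters.get(word.lower())
    if path:
        for k in (2, 4, 6, len(path)):  # coarse-to-fine prefixes of the path
            feats['brown[:%d]' % k] = path[:k]
    return feats

trainer = pycrfsuite.Trainer(verbose=False)
# Equivalent of the CLI's "-p c1=0.1": an L1 penalty that drives the weights
# of uninformative cluster-prefix features to exactly zero.
trainer.set_params({'c1': 0.1})
```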

@niharikagupta92 (Author)

I understand. I also tried including various features specific to my application. My question is slightly different: why do baseline features plus word embeddings give good accuracy for a CRF, while word embeddings alone do not?

@borissmidt commented Jun 28, 2017

My guess is that word embeddings are high-variance and need many training examples, while the other features do not. However, the other features can be ambiguous: if a token starts with a capital letter, is it the first word of the sentence, a name, or a location?

This is where the word embedding helps to increase accuracy, because the word has a certain 'shape' or value in embedding space. For example, if the token is the first word of the sentence, the algorithm can see from its embedding that it is an ordinary word, even when the other features suggest otherwise.

Update: word embeddings are also likely to place synonyms, and words with a similar meaning, close together. This can make the learned rules more general than with the hand-picked features alone.
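
One hedged sketch of how the two feature families can be combined, assuming the embeddings are fed to the CRF as real-valued features (python-crfsuite accepts numeric feature values; the `embeddings` lookup and the `emb%d` feature names below are illustrative):

```python
# Sketch: add word-embedding dimensions as real-valued features on top of the
# hand-crafted ones. `embeddings` is an assumed dict of word -> list of floats.
def add_embedding_features(features, word, embeddings):
    vec = embeddings.get(word.lower())
    if vec is not None:
        for j, v in enumerate(vec):
            # pycrfsuite treats {'name': float} as a weighted feature
            features['emb%d' % j] = float(v)
    return features
```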

@usptact commented Jun 28, 2017

I would say that baseline features work as advertised: you know what information they carry, because they are hand-crafted. The word-embedding features encode information about a specific word appearing in certain contexts. They might capture some of the information the baseline features do, but you don't know that for sure (the beauty of deep learning, eh?). It is safe to say that the two are complementary.
