
Feature for training CRF #83

Open
niharikagupta92 opened this issue Jun 19, 2017 · 4 comments

@niharikagupta92

CRFSuite provides a good pipeline for NER training and recognition using a CRF. I wanted to confirm the training procedure. From what I observed, word embeddings alone do not give good accuracy. However, adding them on top of baseline features like contextual tokens, POS tags, isupper, isdigit, istitle, etc. gives good accuracy. Is there anything I am missing?
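
For illustration, a minimal sketch of such a baseline feature function for the python-crfsuite binding; the helper name `word2features` and the exact feature keys are illustrative, not fixed by CRFsuite:

```python
# Minimal baseline feature function for python-crfsuite.
# Feature keys and the helper name are illustrative, not part of CRFsuite.
def word2features(sent, i):
    """sent is a list of (token, pos_tag) pairs; returns features for token i."""
    word, pos = sent[i]
    features = {
        'bias': 1.0,
        'word.lower': word.lower(),
        'word.isupper': word.isupper(),
        'word.istitle': word.istitle(),
        'word.isdigit': word.isdigit(),
        'pos': pos,
    }
    if i > 0:  # contextual features from the previous token
        pword, ppos = sent[i - 1]
        features['-1:word.lower'] = pword.lower()
        features['-1:pos'] = ppos
    else:
        features['BOS'] = True  # beginning of sentence
    if i < len(sent) - 1:  # contextual features from the next token
        nword, npos = sent[i + 1]
        features['+1:word.lower'] = nword.lower()
        features['+1:pos'] = npos
    else:
        features['EOS'] = True  # end of sentence
    return features
```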

@usptact commented Jun 19, 2017

Beyond gazetteer features, adding Brown or Clark cluster features also improves performance. I experimented a lot with Brown cluster features and got consistent improvements across the various models I built. A nice property of Brown clusters is their hierarchical nature: you can include every prefix of a word's cluster path as features and let the algorithm figure out which ones are important (e.g., set the "-p c1=0.1" option, so the L1 penalty zeroes out the rest).
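
For illustration, a sketch of cluster-path prefix features, assuming a `brown_clusters` lookup from lowercased words to their cluster bitstrings (the lookup name and the chosen prefix lengths are illustrative):

```python
import pycrfsuite

# Assumed lookup: word -> Brown cluster bitstring, e.g. {'london': '101100'}.
brown_clusters = {}

def cluster_features(word):
    """Emit several prefixes of the word's Brown cluster path as features."""
    feats = {}
    path = brown_clusters.get(word.lower())
    if path:
        for k in (2, 4, 6, len(path)):  # coarse-to-fine prefixes of the path
            feats['brown[:%d]' % k] = path[:k]
    return feats

trainer = pycrfsuite.Trainer(verbose=False)
# Equivalent of the CLI's "-p c1=0.1": an L1 penalty that drives the weights
# of uninformative cluster-prefix features to exactly zero.
trainer.set_params({'c1': 0.1})
```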

@niharikagupta92 (Author)

I understand. I also tried including various features specific to my application. My question is slightly different: why do baseline features plus word embeddings give good accuracy for a CRF, while word embeddings alone do not?

@borissmidt commented Jun 28, 2017

My guess is that word embeddings are high-variance and need many training examples, while the other features do not. However, the other features can be ambiguous: if a token starts with a capital letter, is it the first word of the sentence, a name, or a location?

This is where the word embedding helps to increase accuracy, because the word has a certain 'shape' or value in embedding space. For example, if the token is the first word of the sentence, the algorithm can see from its embedding that it is an ordinary word, even when the other features suggest otherwise.

Update: word embeddings are also likely to place synonyms, and words with a similar meaning, close together. This can make the learned rules more general than with the hand-picked features alone.
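
One hedged sketch of how the two feature families can be combined, assuming the embeddings are fed to the CRF as real-valued features (python-crfsuite accepts numeric feature values; the `embeddings` lookup and the `emb%d` feature names below are illustrative):

```python
# Sketch: add word-embedding dimensions as real-valued features on top of the
# hand-crafted ones. `embeddings` is an assumed dict of word -> list of floats.
def add_embedding_features(features, word, embeddings):
    vec = embeddings.get(word.lower())
    if vec is not None:
        for j, v in enumerate(vec):
            # pycrfsuite treats {'name': float} as a weighted feature
            features['emb%d' % j] = float(v)
    return features
```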

@usptact commented Jun 28, 2017

I would say that baseline features work as advertised: you know what information they carry, because they are hand-crafted. The word-embedding features encode information about a specific word appearing in certain contexts. They might capture some of the information the baseline features do, but you don't know that for sure (the beauty of deep learning, eh?). It is safe to say that the two are complementary.
