Recognizing abbreviations - custom entities #84
The sentence parser is not handling certain strings well for me... such as
I tried (roughly):
The output breaks up the abbreviations as though each were the last word in the sentence... like this.
Because it splits the sentence at the abbreviation's punctuation, I cannot define another pattern to recognize the full line as a phrase.
Hello @jonzim
The current model is not trained on recipe or cooking related data. As a result, it is breaking the sentences incorrectly. With your help, we can possibly train the model to handle such text and also add QUANTITY entity detection.
We will need a set of representative texts that include such terms and quantities to begin with.
Best,
Hello @jonzim
We have made some improvements in wink-eng-lite-web-model: it is now able to handle standard abbreviations such as tsp. and tbsp. during sentence boundary detection (SBD). This is version 1.4.3.
Custom entities perform a greedy match and, in case of multiple matches, the longest one is given preference.
The construct given in your example is not supported. However, we are planning to add detection of more entities, including QUANTITY.
We may be able to help you further if you share more details of your use case. The revised code is given below.

const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
// A cardinal number followed by one of the listed unit abbreviations.
const patterns = [
{ name: 'quantity', patterns: ['[CARDINAL] [c|c.|tbsp|tbsp.|tsp|tsp.]'] }
];
nlp.learnCustomEntities(patterns);
const text = `1 tbsp. milk.
3 tbsp unsweetened bakers chocolate.
2 tbsp. of sugar.
1/4 tsp. almond extract.
1 c. milk.
2 tbsp. whipped cream (for garnish).`;
const doc = nlp.readDoc( text );
console.log( doc.customEntities().out() );
// -> ["1 tbsp.", "3 tbsp", "2 tbsp.", "1/4 tsp.", "1 c.", "2 tbsp."]
// Print each detected sentence on its own line.
doc.sentences().each( ( s ) => console.log( s.out() ) );
// -> 1 tbsp. milk.
// -> 3 tbsp unsweetened bakers chocolate.
// -> 2 tbsp. of sugar.
// -> 1/4 tsp. almond extract.
// -> 1 c. milk.
// -> 2 tbsp. whipped cream (for garnish).

Best,
Sanjaya
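
To make the "longest match is given preference" remark concrete, here is a toy sketch in plain JavaScript. It is not winkNLP's implementation and uses no winkNLP API: the tokens are split by hand, and the names `isNumber`, `oneOf`, `toyPatterns`, `matchesAt` and `longestMatches` are hypothetical helpers introduced only for illustration. It assumes each pattern is a fixed sequence of single-token slots.

```javascript
// Toy illustration only: NOT winkNLP's implementation.
// A slot is a predicate that accepts or rejects one token.
const isNumber = ( t ) => /^\d+(\/\d+)?$/.test( t );
const oneOf = ( ...words ) => ( t ) => words.includes( t );

// Two hypothetical patterns that can both match at the same position.
const toyPatterns = [
  { name: 'number', slots: [ isNumber ] },
  { name: 'quantity', slots: [ isNumber, oneOf( 'c', 'c.', 'tbsp', 'tbsp.', 'tsp', 'tsp.' ) ] }
];

// True when every slot accepts the token at the corresponding offset.
function matchesAt( tokens, start, slots ) {
  if ( start + slots.length > tokens.length ) return false;
  return slots.every( ( accept, k ) => accept( tokens[ start + k ] ) );
}

// Scan left to right; where several patterns match, keep the longest one
// and continue scanning after it.
function longestMatches( tokens ) {
  const found = [];
  let i = 0;
  while ( i < tokens.length ) {
    let best = null;
    for ( const p of toyPatterns ) {
      const longer = best === null || p.slots.length > best.slots.length;
      if ( matchesAt( tokens, i, p.slots ) && longer ) best = p;
    }
    if ( best === null ) {
      i += 1;
      continue;
    }
    const text = tokens.slice( i, i + best.slots.length ).join( ' ' );
    found.push( { name: best.name, text } );
    i += best.slots.length;
  }
  return found;
}

// Hand-tokenized input (an assumption; a real tokenizer may split differently).
const sample = [ '1/4', 'tsp.', 'almond', 'extract', 'for', '2', 'cakes' ];
console.log( longestMatches( sample ) );
```

At the first token both toy patterns match, and the two-token `quantity` wins over the one-token `number`. The later `2` is followed by `cakes`, so only `number` matches there.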