Custom entities with regex pattern #86
-
I'm ingesting some data from Slack, which formats its messages with some custom templates. For example:

- Custom emojis are surrounded by colons, e.g. `:green_checkmark:`
- At-mentions are re-encoded similar to this: `<@U024BE7LH>`

More examples and docs here: https://api.slack.com/reference/surfaces/formatting#retrieving-messages

How can I have wink tag these known, regular elements? It seems like `learnCustomEntities` is the correct path, but it would need to accept a regex pattern.
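If `learnCustomEntities` cannot take a regex directly, a rough alternative is to tag the Slack markup with plain JavaScript regular expressions before (or instead of) handing the text to wink. This is a minimal sketch, not a wink-nlp feature: `SLACK_PATTERNS` and `tagSlackMarkup` are hypothetical names, the ID character class `[A-Z0-9]+` is an assumption based on the sample IDs in this thread, and the optional `|label` suffix follows the Slack formatting docs linked in the question.

```javascript
// Hypothetical regex-based tagger for Slack markup (not a wink-nlp API).
// ID character classes are assumptions based on the sample IDs in this thread.
const SLACK_PATTERNS = [
  // User mention, e.g. <@U024BE7LH> (optionally <@U024BE7LH|name>).
  { type: 'mention', regex: /<@([A-Z0-9]+)(?:\|[^>]*)?>/g },
  // Channel reference, e.g. <#C024BE7LR> (optionally <#C024BE7LR|general>).
  { type: 'channel', regex: /<#([A-Z0-9]+)(?:\|[^>]*)?>/g },
  // Custom emoji: a name between colons, e.g. :green_checkmark:.
  { type: 'customEmoji', regex: /:([a-z0-9_+-]+):/g }
];

// Return every match with its type, raw text, captured id/name, and
// character offsets, sorted left to right.
function tagSlackMarkup(text) {
  const tags = [];
  for (const { type, regex } of SLACK_PATTERNS) {
    // matchAll works on a copy of the global regex, so lastIndex is not shared.
    for (const m of text.matchAll(regex)) {
      tags.push({
        type,
        value: m[0],
        id: m[1],
        start: m.index,
        end: m.index + m[0].length
      });
    }
  }
  return tags.sort((a, b) => a.start - b.start);
}

const sample = "<@U024BE7LH> had shared a :green_checkmark: on <#C024BE7LR> slack's channel after completing the task.";
console.log(tagSlackMarkup(sample));
```

Because each pattern runs independently, overlapping matches are not resolved, and the colon regex will also match text such as the `:30:` in `10:30:45`; restricting it to known emoji names avoids that.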
Hello @bennyty

Based on the problem description, here is a possible solution:

```javascript
const winkNLP = require( 'wink-nlp' );
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

// Define patterns.
// Custom emoji: one literal token per piece of ':green_checkmark:'.
// Mentions and channels: a literal '<', the built-in MENTION or HASHTAG
// entity, and a literal '>'.
const patterns = [
  { name: 'customEmoji', patterns: [ ': green _ checkmark :' ] },
  { name: 'mention', patterns: [ '< MENTION >' ] },
  { name: 'channel', patterns: [ '< HASHTAG >' ] }
];

// Train using patterns.
nlp.learnCustomEntities( patterns );

// Sample text.
const text = "<@U024BE7LH> had shared a :green_checkmark: on <#C024BE7LR> slack's channel after completing the task.";

// Read document.
const doc = nlp.readDoc( text );

// Print tokens.
console.log( doc.tokens().out() );

// Print details of each entity.
console.log( doc.customEntities().out( its.detail ) );

// Markup entities along with their type for highlighting them in the text.
doc.customEntities().each( ( e ) => {
  e.markup( '<mark>', `<sub style="font-weight:900"> ${e.out( its.type )}</sub></mark>` );
} );

// Render them as HTML! RunKit displays the value of the last expression;
// in plain Node.js, wrap this call in console.log().
doc.out( its.markedUpText );
```

[Screenshot: the corresponding HTML output on RunKit]

However, you will have to keep in mind that …

Best,
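The `customEmoji` pattern in the reply covers a single emoji. If the set of custom emoji names is known ahead of time, the pattern strings could be generated from the names instead of written by hand. This is a minimal sketch: `emojiPattern` and `buildPatterns` are hypothetical helpers, the emoji names are example inputs, and it assumes, as the reply's `': green _ checkmark :'` pattern implies, that wink's tokenizer splits an emoji name on underscores and keeps `:` and `_` as separate tokens. Names containing other punctuation (such as `+` or `-`) may tokenize differently.

```javascript
// Hypothetical helpers (not part of wink-nlp) that build pattern strings in
// the same space-separated token style as ': green _ checkmark :'.

// Assumption: the tokenizer splits 'green_checkmark' into 'green', '_', 'checkmark'.
function emojiPattern(name) {
  // Splitting on a capturing group keeps each '_' as its own token;
  // empty strings (from leading or doubled underscores) are dropped.
  const parts = name.split(/(_)/).filter((part) => part !== '');
  return [':', ...parts, ':'].join(' ');
}

// Build the same entity definitions as the reply, with one
// customEmoji pattern per known emoji name.
function buildPatterns(emojiNames) {
  return [
    { name: 'customEmoji', patterns: emojiNames.map(emojiPattern) },
    { name: 'mention', patterns: [ '< MENTION >' ] },
    { name: 'channel', patterns: [ '< HASHTAG >' ] }
  ];
}

// Example emoji names, for illustration only.
const patterns = buildPatterns([ 'green_checkmark', 'party_parrot', 'shipit' ]);
console.log(JSON.stringify(patterns, null, 2));
// With wink-nlp loaded as in the reply, these would be passed to:
// nlp.learnCustomEntities( patterns );
```

Each generated pattern is still a literal match, so this only helps when the emoji names are known in advance; for arbitrary `:name:` tokens, a plain regular expression such as `/:([a-z0-9_+-]+):/g` is the simpler route.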