v7 Upgrade, welcome
Version 7 is our biggest and most-needed change to this library. It's a many-fold rewrite of the existing API, begun in November 2016 and consisting of 700 commits.
It softens many edges in the original workflow, and offers a pretty fresh way of working and playing with English text in all forms.
basically:
nlp(myText, myLexicon).mySubset().aSubsetFn().out(myOutput)
the idea is to make it simple to 1) reach in, 2) make a change, 3) output it any way you like.
// give it your arbitrary text
var r = nlp(`Finally, the api is stable.`, {})
//grab a subset and make a transformation..
r.nouns().toUpperCase()
//call a subset-specific method
r.sentences().toExclamation()
//output the new thing as whatever
r.out('text')
//"Finally, the API is stable!"
- it's now simply called compromise (Thanks Joshua!)
- all functions are now sentence/terms level, instead of single-term-level - no more looping!
- includes a clever regex-like matching scheme for grammatical patterns and templates
- easy-access to common text normalizations (contractions, punctuation, etc)
- one universal input, which is now consistently tagged/parsed
- lucid consistent/dependent/conflicting POS-tag logic
demands less working knowledge of internals + grammar 💥
no longer fusses with lumping/splitting of neighbouring terms 💥
more playful and 'bottom up' api 💥
easier matching of ad-hoc templates 💥
cuter debugging and traceable decision-making 💥
npm install compromise
Instead of single Term objects having the methods & tooling, the library now hoists all this functionality to the main API, so you can filter-down, act-upon, and inspect any list of terms, just as easily as acting on a single term.
( ie. one word is now just a list of words, of length 1. )
This way, you can work on arbitrary text without arbitrary compromise choices getting in the way:
r= nlp('singing').verbs().toPastTense()
// sang
r= nlp('would have been singing').verbs().toPastTense()
// would have sang
r= nlp('john is singing. Sara was singing.').verbs().toPastTense().out('array')
//[is, was]
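The "one word is just a list of length 1" idea can be sketched without the library at all. The `TermList` class below is a hypothetical toy, not compromise's actual internals: its name, its whitespace tokenizer, and its methods are all assumptions made for illustration. It only shows how hoisting methods onto the list lets the same call work on one term or many.

```javascript
// Hypothetical sketch: NOT compromise's real internals.
// A "term" is a plain object; every result is a list of terms,
// so a single word is simply a list of length 1.
class TermList {
  constructor(terms) {
    this.terms = terms;
  }

  // Split raw text into whitespace-separated terms (toy tokenizer).
  static fromText(text) {
    const words = text.split(/\s+/).filter(Boolean);
    return new TermList(words.map((w) => ({ text: w })));
  }

  // Filter down to a subset; the result is still a TermList.
  filter(fn) {
    return new TermList(this.terms.filter(fn));
  }

  // Act upon every term in the list, with no loop at the call site.
  toUpperCase() {
    this.terms.forEach((t) => {
      t.text = t.text.toUpperCase();
    });
    return this;
  }

  // Output the list as an array of words or as joined text.
  out(format) {
    const words = this.terms.map((t) => t.text);
    return format === 'array' ? words : words.join(' ');
  }
}

const one = TermList.fromText('singing');
const many = TermList.fromText('john is singing');

console.log(one.toUpperCase().out('text'));
console.log(many.filter((t) => t.text.length > 2).toUpperCase().out('array'));
```

In this toy, a subset shares its term objects with the list it was filtered from, so a change made through the subset also shows up when the parent is output. That is similar in spirit to the `r.nouns().toUpperCase()` example earlier, though the real library may achieve it differently.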
## no more nlp.person(), nlp.value()...
every input is now POS-tagged, and supplied with the appropriate methods for each sequence.
let r= nlp('five years old')
r.values().toNumber()
r.out('text')
// '5 years old'
if you don't trust this, you can coerce the POS:
nlp('john is cool').tagAs('Noun').nouns().toPlural().out('text')
//john is cools
## Match/subset-lookup .match()
see match syntax
nlp('john is cool and jane is nice').match('#Person is').out('array')
//[ 'john is', 'jane is']
more functionality:
nlp('john is cool and jane is nice').not('#Person is').out('array')
//[ 'cool', 'nice']
nlp('john is cool and jane is nice').matchOne('#Person is').out('array')
//[ 'john is']
nlp('John is cool').out('normal');
nlp('John is cool').out('text');
nlp('John is cool').out('html');
//also allows a cleaner, less-crowded result
nlp('John is cool').out('json');
//and adhoc-scripting
nlp('John is cool').out(myFunction);
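One way to picture an output method that accepts either a format name or a function is a small dispatcher. This is a hypothetical sketch, not compromise's real `out()` implementation: the standalone `out` function, the `formats` table, the lowercase-only behaviour of `'normal'`, and the argument handed to the callback are all assumptions for illustration.

```javascript
// Hypothetical sketch: NOT compromise's real out() implementation.
// Named formats live in a lookup table; a function argument is
// treated as an ad-hoc formatter and handed the list of words.
const formats = {
  text: (words) => words.join(' '),
  normal: (words) => words.join(' ').toLowerCase(), // assumed: lowercase only
  array: (words) => words.slice(),
};

function out(words, format) {
  // Ad-hoc scripting: the caller supplies the formatter.
  if (typeof format === 'function') {
    return format(words);
  }
  // Only accept names defined directly on the table.
  if (!Object.prototype.hasOwnProperty.call(formats, format)) {
    throw new Error(`unknown output format: ${format}`);
  }
  return formats[format](words);
}

const sample = ['John', 'is', 'cool'];
console.log(out(sample, 'normal'));
console.log(out(sample, (ws) => ws.length));
```

Keeping the named formats in a table means a new format is a single entry, while the function branch covers one-off needs without touching the table.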
to see all the new features, see compromise.cool/demos
a huge thank you to our 45! contributors to the work.
for low-hanging fruit, check out our todo list