
v7 Upgrade, welcome

spencer kelly edited this page Jan 26, 2017 · 10 revisions

🕺

Version 7 is our biggest and most-needed change to this library. It's a many-fold rewrite of the existing API, begun in November 2016 and consisting of 700 commits.

It softens many edges in the original workflow, and offers a pretty fresh way of working and playing with English text in all forms.

basically:

nlp(myText, myLexicon).mySubset().aSubsetFn().out(myOutput)

the idea is to make it simple to 1) reach-in, 2) make a change, 3) output any way.

// give it your arbitrary text
var r = nlp(`Finally, the api is stable.`, {})

//grab a subset and make a transformation..
r.nouns().toUpperCase()

//call a subset-specific method
r.sentences().toExclamation()

//output the new thing as whatever
r.out('text')
//"Finally, the API is stable!"

major takeaways:

  • it's now simply called compromise (Thanks Joshua!)
  • all functions are now sentence/terms level, instead of single-term-level - no more looping!
  • includes a clever regex-like matching scheme for grammatical patterns and templates
  • easy-access to common text normalizations (contractions, punctuation, etc)
  • one universal input, which is now consistently tagged/parsed
  • lucid consistent/dependent/conflicting POS-tag logic

minor takeaways:

demands less working knowledge of internals + grammar 💥

no longer fusses with lumping/splitting of neighbouring terms 💥

more playful and 'bottom up' api 💥

easier matching of ad-hoc templates 💥

cuter debugging and traceable decision-making 💥

npm install compromise

Words live in groups

Instead of single Term objects having the methods & tooling, the library now hoists all this functionality to the main API, so you can filter down, act upon, and inspect any list of terms just as easily as a single term.

( ie. one word is now just a list of words, of length 1. )

This way, you can work on arbitrary text without arbitrary compromise choices getting in the way:

r = nlp('singing').verbs().toPastTense()
// sang

r = nlp('would have been singing').verbs().toPastTense()
// would have sang

r = nlp('john is singing. Sara was singing.').verbs().toPastTense().out('array')
// [is, was]
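
to make the "one word is a list of length 1" idea concrete, here is a minimal toy sketch. `TermList` and `shout` are made-up names for illustration and are not part of compromise; the point is only that the same chained call works on one word or many, with no loop at the call site.

```javascript
// Toy model only: not compromise's actual internals.
class TermList {
  constructor(terms) {
    this.terms = terms // array of plain strings
  }
  // build a list from raw text; a single word becomes a list of length 1
  static from(text) {
    return new TermList(text.split(/\s+/).filter(Boolean))
  }
  // narrow the list down to the terms that pass a test
  filter(fn) {
    return new TermList(this.terms.filter(fn))
  }
  // transform every term in the list; the caller never writes a loop
  map(fn) {
    return new TermList(this.terms.map(fn))
  }
  // join the terms back into a string
  out() {
    return this.terms.join(' ')
  }
}

// the same helper handles one word or a whole phrase
const shout = (list) => list.map((w) => w.toUpperCase()).out()

const one = shout(TermList.from('singing'))
const many = shout(TermList.from('john is singing'))
console.log(one) // SINGING
console.log(many) // JOHN IS SINGING
```

because every method returns another list, filtering and transforming can be chained in any order.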

no more nlp.person(), nlp.value()...

every input will now be POS-tagged, and supplied the appropriate methods for each sequence.

let r= nlp('five years old')
r.values().toNumber()
r.out('text')
// '5 years old'

if you don't trust this, you can coerce the POS:

nlp('john is cool').tagAs('Noun').nouns().toPlural().out('text')
//john is cools

Match/subset-lookup

.match() (see match syntax)

nlp('john is cool and jane is nice').match('#Person is').out('array')
//[ 'john is', 'jane is']

more functionality:

nlp('john is cool and jane is nice').not('#Person is').out('array')
//[ 'cool', 'nice']
nlp('john is cool and jane is nice').matchOne('#Person is').out('array')
//[ 'john is']
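
for a feel of how tag-based patterns like '#Person is' can work, here is a small self-contained toy. It is not compromise's match engine: the tags are assigned by hand, the helper names (`fits`, `findSpans`, `textOf`) are invented for this sketch, and only `#Tag` and literal-word tokens are supported.

```javascript
// Toy matcher: illustrative only, not compromise's match engine.
// Each term is a word plus a hand-assigned list of tags (no real tagger here).
const terms = [
  { text: 'john', tags: ['Person'] },
  { text: 'is', tags: ['Verb'] },
  { text: 'cool', tags: ['Adjective'] },
  { text: 'and', tags: ['Conjunction'] },
  { text: 'jane', tags: ['Person'] },
  { text: 'is', tags: ['Verb'] },
  { text: 'nice', tags: ['Adjective'] },
]

// '#Person' matches by tag, anything else matches the literal word
const fits = (term, token) =>
  token.startsWith('#') ? term.tags.includes(token.slice(1)) : term.text === token

// return [start, end) index pairs for every non-overlapping hit
function findSpans(list, pattern) {
  const tokens = pattern.split(' ')
  const spans = []
  let i = 0
  while (i + tokens.length <= list.length) {
    if (tokens.every((tok, j) => fits(list[i + j], tok))) {
      spans.push([i, i + tokens.length])
      i += tokens.length
    } else {
      i += 1
    }
  }
  return spans
}

// join the words of one span back into a string
const textOf = (list, [start, end]) =>
  list.slice(start, end).map((t) => t.text).join(' ')

// every hit
const match = (list, pattern) => findSpans(list, pattern).map((s) => textOf(list, s))
// first hit only
const matchOne = (list, pattern) => match(list, pattern).slice(0, 1)
// the runs of terms left over between hits
function not(list, pattern) {
  const rest = []
  let cursor = 0
  for (const [start, end] of findSpans(list, pattern)) {
    if (start > cursor) rest.push(textOf(list, [cursor, start]))
    cursor = end
  }
  if (cursor < list.length) rest.push(textOf(list, [cursor, list.length]))
  return rest
}

console.log(match(terms, '#Person is')) // [ 'john is', 'jane is' ]
console.log(matchOne(terms, '#Person is')) // [ 'john is' ]
console.log(not(terms, '#Person is')) // [ 'cool and', 'nice' ]
```

in this toy, the leftover run simply keeps every term between hits, so 'and' stays attached to 'cool'; the library's own splitting may differ.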

output

nlp('John is cool').out('normal');
nlp('John is cool').out('text');
nlp('John is cool').out('html');
//also allows a cleaner, less-crowded result
nlp('John is cool').out('json');
//and adhoc-scripting
nlp('John is cool').out(myFunction);
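
the string-or-function shape of .out() can be sketched as a tiny dispatcher. This is a hypothetical stand-in, not the library's code: the `formats` table, the meaning given to 'normal' (lowercased, punctuation stripped), and the argument passed to the callback are all assumptions made for the example.

```javascript
// Hypothetical dispatcher: shows the shape of a string-or-function output step,
// not how compromise formats its results.
const formats = new Map([
  ['text', (words) => words.join(' ')],
  // assumption: 'normal' here means lowercased with punctuation stripped
  ['normal', (words) => words.map((w) => w.toLowerCase().replace(/[^a-z0-9']/g, '')).join(' ')],
  ['array', (words) => words.slice()],
])

function out(words, how = 'text') {
  // ad-hoc scripting: hand the words to the caller's own function
  if (typeof how === 'function') return how(words)
  const format = formats.get(how)
  if (!format) throw new Error(`unknown output format: ${how}`)
  return format(words)
}

const words = ['John', 'is', 'cool!']
console.log(out(words, 'text')) // John is cool!
console.log(out(words, 'normal')) // john is cool
console.log(out(words, (ws) => ws.length)) // 3
```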

to see all the new features, see compromise.cool/demos

a huge thank you to our 45! contributors to the work.

for low-hanging fruit, check out our todo list
