This little library uses streams to fetch data from RSS endpoints and saves the transformed data to a LevelDB database for further use (did anyone say ML?).
```sh
cd repository
nvm use
npm i -g yarn
yarn
```
`RssRipper` is an exposed class that receives an optional `transformer` parameter and has a single method called `rip`.
```js
const defaultRipper = new RssRipper();
const myFeedStream = defaultRipper.rip('http://my-feed-url.com');

myFeedStream.subscribe(([url, id, item]) => {
  // At this point the library has successfully retrieved an item and stored it in LevelDB.
  // We receive the ripped url, the id that has been saved to LevelDB and the item itself.
  console.log(`Saved item with id ${id} from ${url}: ${item}`);
});
```
A `transformer` is a simple method that can be plugged in when we initialize the `RssRipper` and that extracts the data as we prefer. The default transformer is called `pass-through`:
```js
export default (item, index) => Rx.Observable.of([index, item])
```
It returns a two-element array that will be stored in LevelDB as `key` and `value` respectively.
You can extract and transform data from `item` as you please, leveraging the power of Rx.js to manipulate the stream (e.g. making an AJAX call for every item, or buffering results and doing a batch call every N items).
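For instance, a custom transformer could keep only a few fields of each feed item. The snippet below is a minimal sketch: the item fields (`title`, `link`) and the constructor shape `new RssRipper({ transformer })` are assumptions, so check them against the source before copying.

```js
// Minimal sketch of a custom transformer (the item fields used here are assumptions).
// Like the default pass-through, it returns an Rx observable of a [key, value] pair.
const titleOnlyTransformer = (item, index) =>
  Rx.Observable.of([index, { title: item.title, link: item.link }]);

// Plugging it in at initialization time.
// NOTE: the exact constructor shape ({ transformer } vs. a positional argument) is an assumption.
const customRipper = new RssRipper({ transformer: titleOnlyTransformer });

customRipper.rip('http://my-feed-url.com').subscribe(([url, id, value]) => {
  console.log(`Saved ${id} from ${url}:`, value);
});
```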
I've bundled this with a small example that reads paged ATOM feeds from the WordPress blog, firing a call every 500 ms to fetch 100 pages in total. You can run it out-of-the-box using `yarn run examples:wordpress`.
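The paging pattern from that example can be reproduced with plain Rx operators. The sketch below is only an illustration of the idea, not the bundled script itself: the WordPress feed URL, the `?paged=` query parameter, and the assumption that `rip()` returns a standard Rx observable are all mine.

```js
// Hypothetical re-creation of the paging pattern: one rip() call every 500 ms, 100 pages in total.
// The feed URL and the `?paged=` parameter are assumptions used for illustration only.
const ripper = new RssRipper();

Rx.Observable.interval(500)
  .take(100)
  .map(i => `https://wordpress.org/news/feed/atom/?paged=${i + 1}`)
  .flatMap(url => ripper.rip(url))
  .subscribe(([url, id, item]) => {
    console.log(`Saved item ${id} from ${url}`);
  });
```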