Stream-based RSS ripper, built with Rx.js and powered by LevelDB

vshjxyz/rss-ripper

RSS ripper

This little library uses streams to fetch data from RSS endpoints and saves the transformed data to a LevelDB database for further use (did anyone say ML?).

Setup

cd repository
nvm use
npm i -g yarn
yarn

Usage

RssRipper is the class the library exposes. Its constructor takes an optional "transformer" parameter, and it has a single method called rip, which returns a stream you can subscribe to.

const defaultRipper = new RssRipper();
const myFeedStream = defaultRipper.rip('http://my-feed-url.com');
myFeedStream.subscribe(([url, id, item]) => {
  // At this point the library has successfully retrieved an item
  // and stored it in LevelDB.
  // We receive the ripped url, the id that has been
  // saved to LevelDB and the item itself.
  // JSON.stringify avoids printing "[object Object]" when the item is an object.
  console.log(`Saved item with id ${id} from ${url}: ${JSON.stringify(item)}`);
});

Transformers

A transformer is a simple function, passed in when the RssRipper is initialized, that decides which data is extracted from each item and how it is stored.

The default transformer is called pass-through:

export default (item, index) => Rx.Observable.of([index, item])

It returns an Observable that emits a two-element array; the two elements are stored in LevelDB as the key and the value respectively.

You can extract and transform data from the item as you please, leveraging the power of Rx.js to manipulate the stream (e.g. making an AJAX call for every item, or buffering results and making one batch call every N items).
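
As an illustration, here is a minimal sketch of a custom transformer that keeps only a few fields of each item. The field names (guid, title, link) and the helper names toKeyValue and makeTitleTransformer are assumptions made for this example; the real item shape depends on the feed being parsed. The Rx namespace is passed in as an argument so the sketch does not assume a particular Rx.js import path.

```javascript
// Pure helper: build the [key, value] pair to store.
// ASSUMPTION: items expose `guid`, `title` and `link`; adjust to your feed.
const toKeyValue = (item, index) => {
  // Fall back to the item's position in the stream when there is no guid.
  const key = item.guid || String(index);
  const value = { title: item.title, link: item.link };
  return [key, value];
};

// Transformer factory (hypothetical name). It has the same shape as the
// default pass-through transformer: (item, index) => Observable of [key, value].
const makeTitleTransformer = (Rx) => (item, index) =>
  Rx.Observable.of(toKeyValue(item, index));
```

Assuming the constructor takes the transformer as its first argument, it could then be plugged in with something like new RssRipper(makeTitleTransformer(Rx)). Whether a plain object can be stored as the value depends on how the underlying LevelDB instance encodes values, so you may need to serialize it first.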

Examples

I've bundled this with a small example that reads paged Atom feeds from the WordPress blog, firing a request every 500 ms to fetch 100 pages in total.

You can run it out-of-the-box using yarn run examples:wordpress.
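
To give a rough idea of the paging and throttling involved, the sketch below builds the page URLs and the delay at which each one would be requested. It is not the bundled example itself: the base URL, the paged query parameter and the helper names buildPageUrls and schedule are assumptions for illustration only.

```javascript
// ASSUMPTION: the feed accepts a `paged` query parameter, as WordPress
// feeds commonly do. The base URL used below is a placeholder.
const buildPageUrls = (baseUrl, pageCount) =>
  Array.from({ length: pageCount }, (_, i) => `${baseUrl}?paged=${i + 1}`);

// Pair every URL with the delay (in ms) after which it should be requested,
// so that requests are spaced `intervalMs` apart.
const schedule = (urls, intervalMs) =>
  urls.map((url, i) => ({ url, delayMs: i * intervalMs }));

// 100 pages, one request every 500 ms.
const plan = schedule(buildPageUrls('https://example.com/feed/atom/', 100), 500);
console.log(plan.length, plan[1]);
```

In a stream-based version, each URL would be passed to rip as its turn comes up, with the spacing handled by a timer-based Rx.js operator instead of precomputed delays.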
