Mozilla's Readability.js as a service
This project packages Mozilla's Readability JS as an HTTP service that can be deployed via Docker anywhere.
There's only one endpoint, which consumes and delivers json. You send in a URL to some page you want content extracted, you get back a json payload echoing the URL and containing the stripped out content.
You'll get back all properties parsed out by Mozilla's Readability.
~ curl -XPOST http://readability-js-server:3000/ \
-H "Content-Type: application/json" \
-d'{"url": "https://en.wikipedia.org/wiki/Firefox"}'
To which you will receive:
HTTP/1.1 200 OK
X-Powered-By: Express
Content-Type: application/json; charset=utf-8
{
"url": "https://en.wikipedia.org/wiki/Firefox",
"title": "",
"byline": null,
"dir": "ltr",
"content": "<div id=\"readability-page-1\" class=\"page\"><div dir=\"ltr\" lang=\"en\" id=\"mw-content-text\">\n\n\n<table><caption>Mozilla Firefox</caption><tbody><tr><td colspan=\"2\"><a href=\"/wiki/File:Firefox_logo,_2019.svg\"><img data-file-height=\"80\" data-file-width=\"77\" srcset=\"//upload.wikimedia. [...],
"length": 101272,
"excerpt": "Firefox 89 on Windows 10 displaying Wikipedia with the default system theme.",
"siteName": null
}
The container image lives at phpdockerio/readability-js-server
. At the moment, it takes no configuration for anything,
although this might change if and when the use case arises.
linux/amd64
linux/arm64
If you require linux/arm/v7
(32 bit), the newest supported version is 1.5.0
.
We tag each image as latest
, x.x.x
, x.x
and x
. Since Semver is in use, you can peg to, say,
phpdockerio/readability-js-server:1
with the confidence that no breaking changes will come to ruin your day. You can
also peg to phpdockerio/readability-js-server:1.x
if there's a specific minor version that introduces a new feature
you need.
You'll probably be using this if you're deploying the service somewhere. Simply run the equivalent to
~ docker run -p3000:3000 phpdockerio/readability-js-server
You'll need node
>= 10 and yarn
.
Once you clone the repo:
~ yarn install
~ yarn start
- No configuration required. This might change if the need arises.
- The docker image runs via
pm2
andnode 20
with 5 processes.