RFC 9309 spec-compliant robots.txt builder and parser. 🦾 No dependencies, fully typed.
Before using this library, I recommend reading the following guide by Google: https://developers.google.com/search/docs/crawling-indexing/robots/intro
Note to self (and contributors): https://www.rfc-editor.org/rfc/rfc9309.html
npm i robotstxt-util
Exports a parser, parseRobotsTxt, and a RobotsTxt class for creating and managing robots.txt data.
import { RobotsTxt } from 'robotstxt-util'
const robotstxt = new RobotsTxt()
const allBots = robotstxt.newGroup('*')
allBots.disallow('/')
const googleBot = robotstxt.newGroup('googlebot')
googleBot.allow('/abc')
googleBot.disallow('/def').disallow('/jkl')
// specify multiple bots
const otherBots = robotstxt.newGroup(['abot', 'bbot', 'cbot'])
otherBots.allow('/qwe')
// specify custom rules
googleBot.addCustomRule('crawl-delay', 10)
// add sitemaps
robotstxt.add('sitemap', 'https://yoursite/sitemap.en.xml')
robotstxt.add('sitemap', 'https://yoursite/sitemap.tr.xml')
// and export
const json = robotstxt.json()
const txt = robotstxt.txt()
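If you generate robots.txt at build time, you can write the result straight to disk. A minimal, self-contained sketch, assuming a Node.js build script; the public/robots.txt output path is only an example:
import { writeFileSync } from 'node:fs'
import { RobotsTxt } from 'robotstxt-util'

const robotstxt = new RobotsTxt()
robotstxt.newGroup('*').disallow('/')
robotstxt.add('sitemap', 'https://yoursite/sitemap.en.xml')

// txt() returns the serialized rules, ready to serve as a static file
writeFileSync('public/robots.txt', robotstxt.txt())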
parseRobotsTxt parses raw robots.txt data and returns a RobotsTxt instance:
import { parseRobotsTxt } from 'robotstxt-util'
const data = `
# hello robots
User-Agent: *
Disallow: *.gif$
Disallow: /example/
Allow: /publications/
User-Agent: foobot
Disallow:/
crawl-delay: 10
Allow:/example/page.html
Allow:/example/allowed.gif
# comments will be stripped out
User-Agent: barbot
User-Agent: bazbot
Disallow: /example/page.html
Sitemap: https://yoursite/sitemap.en.xml
Sitemap: https://yoursite/sitemap.tr.xml
`
const robotstxt = parseRobotsTxt(data)
// update something in some group
robotstxt.findGroup('barbot').allow('/aaa').allow('/bbb')
// store as json or do whatever you want
const json = robotstxt.json()
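The parser works just as well on a robots.txt fetched from somewhere else. A minimal sketch, assuming an ESM module on Node 18+ (global fetch, top-level await) and a hypothetical example.com URL:
import { parseRobotsTxt } from 'robotstxt-util'

// fetch an existing robots.txt (the URL is only an example)
const res = await fetch('https://example.com/robots.txt')
const robotstxt = parseRobotsTxt(await res.text())

// tweak the parsed rules and re-export them as text
robotstxt.add('sitemap', 'https://example.com/sitemap.xml')
console.log(robotstxt.txt())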
If you're interested in contributing, please read CONTRIBUTING.md first.
Thanks for watching 🐬