
Saves words when they are published for the first time on Il Post, inspired by the work of Max Bittker.


sinanatra/ilpost-first-said


A script that tracks when the newspaper Il Post publishes a word for the first time.
Running at @ilPostDice. Largely inspired by the work of Max Bittker.

Scraper:

Basic architecture


Il Post first said is essentially a single script that runs every two hours as a cron job on GitHub.

html_proc.py parses an XML feed, normally https://www.ilpost.it/feed/ (or https://rss.draghetti.it/ilpost.xml, which is sometimes more reliable). It opens the URL of each new article, retrieves the article text, and tokenizes it into words. New words can then be tweeted with utils/tweet.py or posted to Telegram with utils/telegramBot.py.
Each new word, together with its context, date, and link, is saved in a MongoDB instance. For example:
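The feed-parsing and tokenizing steps can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in html_proc.py, which may rely on different libraries. The names `article_links`, `tokenize`, and `SAMPLE_FEED` are hypothetical, and the sample feed is a hand-written stand-in for the real RSS document.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written stand-in for the RSS feed (assumption: standard RSS 2.0 layout).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Il Post</title>
    <link>https://www.ilpost.it/</link>
    <item>
      <title>Esempio</title>
      <link>https://www.ilpost.it/2023/12/13/nuova-bozza-cop28/</link>
    </item>
  </channel>
</rss>"""


def article_links(feed_xml: str) -> list[str]:
    """Return the <link> of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        link.text.strip()
        for link in root.iterfind("./channel/item/link")
        if link.text
    ]


def tokenize(text: str) -> list[str]:
    """Split text into lowercase runs of letters (accented letters included)."""
    # [^\W\d_] matches any word character that is not a digit or underscore.
    return re.findall(r"[^\W\d_]+", text.lower())


if __name__ == "__main__":
    print(article_links(SAMPLE_FEED))
    print(tokenize("come ha detto lo stsso al Jaber."))
```

Splitting on letter runs is a deliberately crude choice: it breaks elided forms such as "l'articolo" into two tokens, which a real tokenizer might handle differently.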

{
  _id: 6579651f1a28f773943e8448
  word:"stsso"
  context:"dei combustibili fossili, come ha detto lo stsso al Jaber. Al tempo stesso, però"
  date_added:"2023-12-13T06:00:04.000000+0000"
  url:"https://www.ilpost.it/2023/12/13/nuova-bozza-cop28/"
}
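As a rough sketch of how a record like the one above might be produced, the snippet below checks each word against a set of already-seen words and builds a dictionary with the same four fields. It is an illustration under stated assumptions, not the project's code: an in-memory `set` stands in for the MongoDB collection, and `find_new_words` and its `window` parameter are hypothetical names.

```python
import re
from datetime import datetime, timezone

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, accented ones included


def find_new_words(text, url, known_words, now=None, window=40):
    """Return one record per word in `text` that is not in `known_words`.

    `known_words` is a plain set standing in for the database (assumption);
    it is updated in place so each word is reported only once.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%S.%f%z")
    records = []
    for match in WORD_RE.finditer(text):
        word = match.group().lower()
        if word in known_words:
            continue
        known_words.add(word)
        # Keep up to `window` characters on each side as context.
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        records.append({
            "word": word,
            "context": text[start:end].strip(),
            "date_added": stamp,
            "url": url,
        })
    return records


if __name__ == "__main__":
    seen = {"lo", "al", "jaber", "tempo", "stesso"}
    sample = "lo stsso al Jaber. Al tempo stesso"
    for record in find_new_words(sample, "https://example.org/article", seen):
        print(record)
```

In a real deployment the membership test would be a database lookup rather than a set lookup, and the `_id` field would be assigned by MongoDB on insert.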

Requirements


Install the Python dependencies with pip install -r requirements.txt.

Start


Run html_proc.py to launch the script.

Visualization:

On the visualization branch, a SvelteKit app displays the data on a timeline.

Setup

Install dependencies with npm install (or pnpm install or yarn), then start a development server:

npm run dev

# or start the server and open the app in a new browser tab
npm run dev -- --open

Building

To create a production version:

npm run build

You can preview the production build with npm run preview.

To deploy your app, you may need to install an adapter for your target environment.
