Next Headlines Archiver

Firebase Functions / Node.js / Puppeteer / FaunaDB / Next.js / Auth0 / Tailwind CSS

Full Stack Data Aggregation app that archives the headlines content from CNN and Fox News on scheduled intervals and allows users to scroll through a news timeline to see how they are reported on these sites.

How is data collected and displayed?

Firebase Cloud Function - Puppeteer - FaunaDB

A Firebase Pub/Sub Cloud Function is setup to run a background job every hour. This function uses Node.js and Puppeteer to scrape the headlines content from both CNN and Fox and then saves it to FaunaDB using the FaunaDB JavaScript Driver.

Data Management with FaunaDB

FaunaDB, the chosen database for this project, is a transactional database built in the cloud with a fast and developer friendly API.

The data collected from the web scraper is saved as documents in the news Collection with the following shape:

{
  "ref": Ref(Collection("news"), "1"), // Fauna specific
  "ts": 1617917180920000, // Fauna specific
  "data": {
    "provider": "CNN or Fox",
    "headLineUrl": "headLineUrl",
    "headLineTitle": "headLineTitle",
    "headLineImg": "headLineImg",
    "headLineTxt": "headLineTxt",
    "headLineTs": 1617917167766,
    "headLineUTCDate": "Thu, 08 Apr 2021 21:26:07 GMT"
    "likes": 0
  }
}

Several indexes were created to enforce uniqueness, perform searches and data sorting as well as custom functions to pull the data for different scenarios and paged results.

Displaying Data with Next.js

The client-side is built with Next.js, a framework built on top of React, which features hybrid static & server rendering. The data is displayed in the form of a timeline with cards shown side by side.

Client-side Features

Hybrid pages featuring both Static & Server Side Rendering and SEO
Static pages also feature Incremental Static Regeneration
Server and Client side rendering for search and filters
Optimized images
Data is displayed in cards over a timeline
Uses Tailwind CSS framework for styling
Responsive design
Progress bar indicator shown on page transitions
Search news
Filters included by provider or by dates range
Cards likes system
Only logged in users may like cards once
Auth0 for user authentication

Clonning this repo

If you'd like to clone this repo, you'd first need to setup a firebase Pub/Sub Cloud Function for the web scrapper background job (code found in the functions folder). Then open a FaunaDb account and setup a database with two collections (news and likes) and get a server key. However, further tweaks are needed to pull the correct data for the various API endpoints, search & sort and likes. Setup necessary environment variables.

git clone https://github.com/luvagu/next-headlines-archiver.git

cd next-headlines-archiver

npm install

npm run dev

Deploy the scheduled crawler after setting up and liking to your firebase project

cd functions

npm run deploy

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Next Headlines Archiver

How is data collected and displayed?

Client-side Features

Clonning this repo

Files

README.md

Latest commit

History

README.md

File metadata and controls

Next Headlines Archiver

How is data collected and displayed?

Client-side Features

Clonning this repo