Firebase Functions / Node.js / Puppeteer / FaunaDB / Next.js / Auth0 / Tailwind CSS
Full Stack Data Aggregation
app that archives the headlines content from CNN and Fox News on scheduled intervals and allows users to scroll through a news timeline to see how they are reported on these sites.
Firebase Cloud Function - Puppeteer - FaunaDB
A Firebase Pub/Sub Cloud Function
is setup to run a background job every hour. This function uses Node.js and Puppeteer to scrape the headlines content from both CNN and Fox and then saves it to FaunaDB using the FaunaDB JavaScript Driver.
Data Management with FaunaDB
FaunaDB, the chosen database for this project, is a transactional database
built in the cloud with a fast and developer friendly API.
The data collected from the web scraper is saved as documents in the news
Collection with the following shape:
{
"ref": Ref(Collection("news"), "1"), // Fauna specific
"ts": 1617917180920000, // Fauna specific
"data": {
"provider": "CNN or Fox",
"headLineUrl": "headLineUrl",
"headLineTitle": "headLineTitle",
"headLineImg": "headLineImg",
"headLineTxt": "headLineTxt",
"headLineTs": 1617917167766,
"headLineUTCDate": "Thu, 08 Apr 2021 21:26:07 GMT"
"likes": 0
}
}
Several indexes
were created to enforce uniqueness
, perform searches
and data sorting
as well as custom functions
to pull the data for different scenarios and paged
results.
Displaying Data with Next.js
The client-side is built with Next.js
, a framework built on top of React
, which features hybrid static & server rendering
. The data is displayed in the form of a timeline with cards shown side by side.
- Hybrid pages featuring both
Static & Server Side Rendering
andSEO
- Static pages also feature
Incremental Static Regeneration
- Server and Client side rendering for search and filters
- Optimized images
- Data is displayed in cards over a timeline
- Uses Tailwind CSS framework for styling
- Responsive design
- Progress bar indicator shown on page transitions
- Search news
- Filters included by provider or by dates range
- Cards likes system
- Only logged in users may like cards once
- Auth0 for user authentication
If you'd like to clone this repo, you'd first need to setup a firebase Pub/Sub Cloud Function
for the web scrapper background job (code found in the functions
folder). Then open a FaunaDb account and setup a database with two collections (news and likes) and get a server key. However, further tweaks are needed to pull the correct data for the various API endpoints, search & sort and likes. Setup necessary environment variables.
git clone https://github.com/luvagu/next-headlines-archiver.git
cd next-headlines-archiver
npm install
npm run dev
Deploy the scheduled crawler after setting up and liking to your firebase project
cd functions
npm run deploy