Created for study purposes.
A Vue app that displays a client's bank data, with an Express back end that exposes the data collected by a web crawler.
WARNING: Using this tool without care may lead to your bank account being blocked. Use at your own risk!
So far the project has taken around 20 hours, and I have tried three approaches to crawling the Nubank account data:
- Using the request package to write a lightweight web crawler. I had already inspected the page in DevTools and seen the requests, parameters, and responses, but I couldn't get past authentication because the app is an SPA that needs JavaScript to render correctly.
- Using Puppeteer, a heavier crawler that can render the SPA correctly. Here I tried to intercept the requests made after the login page. I could intercept them, but the body came back empty; from my research, this feature is not yet well implemented in the Chrome DevTools Protocol or in Puppeteer.
- The heaviest approach, I think: still using Puppeteer, but emulating the user's interaction with the web app and collecting the rendered HTML, then using cheerio to parse the whole HTML and extract the relevant information. This required understanding the structure of the HTML.
Known issues:
- The bank's security blocks the IP or username after too many requests. This happens with all approaches.
- If any part of the HTML structure changes, the whole crawler parser can break.
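The README does not describe any retry or rate-limiting policy. One common way to send fewer, better-spaced requests is exponential backoff. The sketch below is hypothetical (backoffDelay and withBackoff are not part of the project, and the default delays are assumptions). It reduces request bursts but cannot guarantee that the bank will not block the IP or username.

```javascript
// Hypothetical helper, not part of the project: exponential backoff delay in ms.
// attempt 0 -> baseMs, attempt 1 -> 2 * baseMs, ..., capped at maxMs.
// With jitter enabled, a random delay in [0, exp) is used instead ("full jitter").
function backoffDelay(attempt, { baseMs = 1000, maxMs = 60000, jitter = false } = {}) {
  if (!Number.isInteger(attempt) || attempt < 0) {
    throw new RangeError('attempt must be a non-negative integer');
  }
  const exp = Math.min(maxMs, baseMs * 2 ** attempt);
  return jitter ? Math.floor(Math.random() * exp) : exp;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task` up to `retries + 1` times, waiting longer after each failure.
// `task` may return a value or a promise; the last error is rethrown.
async function withBackoff(task, retries = 3, options = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task(attempt);
    } catch (err) {
      if (attempt >= retries) throw err;
      await sleep(backoffDelay(attempt, options));
    }
  }
}
```

In the crawler, withBackoff could wrap a single login attempt, so a failure waits 1 s, 2 s, 4 s, and so on before trying again instead of retrying immediately.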
Current status:
- Nubank: fetches the whole transaction timeline, categories, and tags; only the category information is displayed so far.
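Before the transaction timeline can be grouped or charted, the parser has to turn the amounts it extracts from the rendered HTML into numbers. The README does not show that HTML, so this sketch assumes amounts appear as pt-BR formatted text such as "R$ 1.234,56" (dot for thousands, comma for decimals); parseBrlToCents is a hypothetical helper. Working in integer cents avoids floating-point rounding when many values are summed.

```javascript
// Hypothetical parser helper. Assumes amounts are pt-BR formatted strings
// such as "R$ 1.234,56" or "- R$ 12,90" (the real Nubank markup may differ).
// Returns the value as an integer number of cents, or null if no amount is found.
function parseBrlToCents(text) {
  // Optional leading minus, "R$", digits with "." thousand separators,
  // then "," followed by exactly two decimal digits.
  const match = /(-)?\s*R\$\s*([\d.]+),(\d{2})/.exec(text);
  if (!match) return null;
  const [, minus, integerPart, decimals] = match;
  const cents = Number(integerPart.replace(/\./g, '')) * 100 + Number(decimals);
  return minus ? -cents : cents;
}
```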
Built with:
- Based on Node.js v10.11
- JavaScript Style Guide from Airbnb, to keep the code consistent
- ES6, transpiled with Babel to CommonJS for production
- Vue 2, an open-source front-end framework for building robust SPAs
- axios to handle the API requests
- chart.js, simple JS charts to display the information
Inside ./server:
- express, a minimal web framework for Node, to handle the requests
- Puppeteer, a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. It makes it possible to crawl an SPA like the Nubank web app.
- cheerio, an implementation of core jQuery, to parse the HTML easily
- dotenv to load .env files
- cors, a middleware to handle CORS in express
- morgan, an HTTP request logger middleware
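chart.js takes its input as a data object with a labels array and a datasets array. A minimal sketch of turning crawled transactions into that shape for the category view, assuming a hypothetical transaction object with category and amountCents fields (the real output of the server parser may use different names):

```javascript
// Hypothetical transaction shape: { description, amountCents, category }.
// Sums amounts per category and returns a chart.js-style data object.
function categoryChartData(transactions, label = 'Spending by category') {
  const totals = new Map();
  transactions.forEach(({ category, amountCents }) => {
    const key = category || 'Uncategorized';
    totals.set(key, (totals.get(key) || 0) + amountCents);
  });
  // Largest categories first, so the chart legend reads top-down.
  const entries = [...totals.entries()].sort((a, b) => b[1] - a[1]);
  return {
    labels: entries.map(([name]) => name),
    // Convert cents back to reais for display.
    datasets: [{ label, data: entries.map(([, cents]) => cents / 100) }],
  };
}
```

The result could be passed as the data option, for example new Chart(canvasContext, { type: 'doughnut', data: categoryChartData(transactions) }).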
Installation
Vue App
yarn install
Server
cd server
yarn install
Setting up the .env files in the Vue App and in the Server App
Vue App
Path: .env
VUE_APP_ROOT_API='http://localhost:3005/v1' # URL to Express API
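Vue CLI only embeds environment variables prefixed with VUE_APP_ into the client bundle, where they are read from process.env. Because VUE_APP_ROOT_API contains a path segment (/v1), joining it with a route needs care: resolving a relative path against a base without a trailing slash drops the last segment. A small sketch, where apiUrl and the "categories" route are hypothetical:

```javascript
// Hypothetical helper: resolves an API route relative to VUE_APP_ROOT_API.
// In the Vue app, `root` would come from process.env.VUE_APP_ROOT_API.
function apiUrl(root, path) {
  // Ensure the base ends with "/" so URL keeps the "/v1" segment, and strip
  // leading "/" from the path so it is resolved relative to that base.
  const base = root.endsWith('/') ? root : `${root}/`;
  return new URL(path.replace(/^\/+/, ''), base).toString();
}
```

Without the trailing slash, new URL('categories', 'http://localhost:3005/v1') resolves to http://localhost:3005/categories. If the app creates its axios instance with the baseURL option, axios joins the base and the request path itself.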
Server
Path: server/.env
MAIN_PATH='/home/davi/Dropbox/Projects/bank-analysis/server/src' # Your server/src path
STORAGE_DIR='storage' # Stores the HTML to parse and the extracted data. The application resolves it as 'MAIN_PATH/../STORAGE_DIR'
PORT=3005 # Express Server PORT
PUPPETEER_HEADLESS=true # Run Puppeteer in headless mode
USE_STORED_DATA=true # Skip the crawler and use the previously extracted data for the same username
NUBANK_URL='https://app.nubank.com.br' # Nubank Web App URL
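dotenv loads every value into process.env as a string, so flags such as PUPPETEER_HEADLESS and USE_STORED_DATA have to be converted explicitly. A minimal sketch of a typed config loader, assuming hypothetical names (toBool, loadConfig) and defaults taken from the example values above; the project's actual code may read these variables differently:

```javascript
const path = require('path');

// Converts "true"/"false" strings from process.env to booleans.
// Missing or empty values fall back to the given default.
function toBool(value, fallback) {
  if (value === undefined || value === '') return fallback;
  return String(value).trim().toLowerCase() === 'true';
}

// Hypothetical loader; defaults mirror the example .env above (assumption).
function loadConfig(env = process.env) {
  if (!env.MAIN_PATH) throw new Error('MAIN_PATH is required');
  return {
    port: Number(env.PORT) || 3005,
    // Mirrors the documented layout: MAIN_PATH/../STORAGE_DIR
    storageDir: path.resolve(env.MAIN_PATH, '..', env.STORAGE_DIR || 'storage'),
    headless: toBool(env.PUPPETEER_HEADLESS, true),
    useStoredData: toBool(env.USE_STORED_DATA, false),
    nubankUrl: env.NUBANK_URL || 'https://app.nubank.com.br',
  };
}
```

Calling require('dotenv').config() first and then loadConfig() keeps the string-to-type conversion in one place instead of comparing process.env values across the server.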
Removing the stored files from the server STORAGE_DIR
By default:
rm -rf server/storage/*.json server/storage/*.html
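The rm command above relies on a POSIX shell for the *.json and *.html globs. A Node sketch of the same cleanup that also works where those globs are unavailable; the file name clean-storage.js and the cleanStorage function are hypothetical, and only .json and .html files directly inside the directory are deleted:

```javascript
const fs = require('fs');
const path = require('path');

// Deletes files with the given extensions directly inside `dir` (no recursion)
// and returns the names of the removed files. A missing directory is a no-op.
function cleanStorage(dir, extensions = ['.json', '.html']) {
  if (!fs.existsSync(dir)) return [];
  const removed = [];
  fs.readdirSync(dir).forEach((name) => {
    const file = path.join(dir, name);
    if (extensions.includes(path.extname(name)) && fs.statSync(file).isFile()) {
      fs.unlinkSync(file);
      removed.push(name);
    }
  });
  return removed;
}

module.exports = { cleanStorage };
```

Saved as clean-storage.js, it could be run with node -e "console.log(require('./clean-storage').cleanStorage('server/storage'))".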
Running in development
Vue App
yarn run serve
By default, Vue CLI serves the app at http://localhost:8080
Server
cd server
yarn run serve:dev
Building for production
Vue App
yarn run build
Linting
Vue App
yarn run lint
Server
cd server
yarn run lint
Unit tests
Vue App
yarn run test:unit
End-to-end tests
Vue App
yarn run test:e2e
TODO
- Handle authentication errors on Nubank
- Display the information
- Parse the user's personal information
- Fake data to test the crawler
- Fake data to test the parser
- Tests for the front end
- Integrate with MongoDB
- Better UI design
Read the contributing guide