Build a scalable, high-performance dashboard that processes and visualises data from large datasets (2GB+) in real time. The system must incorporate user-specific roles, provide real-time data streaming, and handle large-scale data with robust access control and performance optimisations.
Candidates must choose one of the following publicly available datasets, each in CSV format, to demonstrate large dataset handling:
- NYC Taxi & Limousine Commission Trip Record Data
  - Dataset size: ~1.4GB per month of data
  - Description: Trips, fares, and other transactional data.
- U.S. Stock Market Historical Data (Kaggle)
  - Dataset size: multiple GBs
  - Description: Daily price and volume data for all US-listed stocks and ETFs.
- OpenWeather Historical Weather Data
  - Dataset size: varies with the selected time range
  - Description: Historical weather data for any location in CSV format.
API Design:
- Set up an Express.js server to manage RESTful API requests.
- Implement user authentication (JWT) and authorisation (RBAC/Entity-based Access).
- Implement a queue/worker system (e.g., Bull, Redis, RabbitMQ) for processing large CSV files in the background.
- Store processed data efficiently in a database like PostgreSQL or MongoDB.
- Create an endpoint to ingest data via CSV file uploads, which are processed in the background and stored in the database (see the sketch after this list).
- Provide APIs for real-time data streaming (WebSockets/Server-Sent Events).
- Ensure endpoints allow fetching, filtering, and pagination of large datasets.
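For reference, a minimal sketch of the ingest endpoint described above, assuming Express with Multer for uploads and a BullMQ queue backed by a local Redis; the route path, queue name, and job payload are illustrative, not prescribed:

```ts
// Hypothetical ingest endpoint: accept a CSV upload, enqueue a background job.
import express from 'express';
import multer from 'multer';
import { Queue } from 'bullmq';

const app = express();
const upload = multer({ dest: '/tmp/uploads' });     // store the raw file on disk
const csvQueue = new Queue('csv-ingest', {
  connection: { host: 'localhost', port: 6379 },     // assumes a local Redis
});

app.post('/api/datasets', upload.single('file'), async (req, res) => {
  if (!req.file) return res.status(400).json({ error: 'CSV file is required' });

  // Enqueue instead of parsing inline, so the request returns immediately
  // and a worker process does the heavy lifting.
  const job = await csvQueue.add('ingest', {
    filePath: req.file.path,
    uploadedBy: (req as any).user?.id,               // set by your auth middleware
  });

  res.status(202).json({ jobId: job.id });           // 202: accepted for processing
});

app.listen(3000);
```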
Scalability:
- Implement horizontal scalability with a microservices architecture for the backend (optional).
- Support real-time streaming with low latency for large-scale data.
- Efficiently handle multiple users uploading large datasets simultaneously.
Security:
- Implement best practices for securing API requests, such as rate limiting and input validation (see the middleware sketch after this list).
- Apply encryption for sensitive data during storage and transit.
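A minimal sketch of both practices, assuming express-rate-limit for throttling and zod for input validation; the route and query schema are illustrative:

```ts
// Rate limiting plus input validation on a paginated list endpoint.
import express from 'express';
import rateLimit from 'express-rate-limit';
import { z } from 'zod';

const app = express();

// Cap each client at 100 requests per 15-minute window.
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));

// Validate and coerce query parameters before they reach any handler.
const listQuery = z.object({
  page: z.coerce.number().int().min(1).default(1),
  limit: z.coerce.number().int().min(1).max(100).default(50),
});

app.get('/api/trips', (req, res) => {
  const parsed = listQuery.safeParse(req.query);
  if (!parsed.success) {
    return res.status(400).json({ errors: parsed.error.flatten() });
  }
  const { page, limit } = parsed.data;
  // ...fetch `limit` rows at offset `(page - 1) * limit`...
  res.json({ page, limit, data: [] });
});
```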
Design & UI/UX:
- Create a responsive, modular dashboard with a clean and modern UI/UX.
- Implement at least 3 role-specific views (admin, manager, user).
- Include data visualisations (charts, graphs) using libraries like D3.js, Chart.js, or Recharts.
- Support real-time updates with WebSockets or Server-Sent Events.
- Allow users to upload new datasets and recalculate metrics without UI disruption.
Performance & Optimisation:
- Implement lazy loading, code splitting, and component-level caching.
- Ensure optimal rendering of large datasets by using virtualisation (e.g., react-window or react-virtualized); see the sketch after this list.
- Add pagination and data chunking to handle large data efficiently on the frontend.
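A sketch of row virtualisation with react-window, so only visible rows mount; the `Trip` shape, row height, and viewport size are placeholder choices:

```tsx
// Virtualised list: react-window renders only the rows in the viewport.
import { FixedSizeList } from 'react-window';

type Trip = { id: string; fare: number };   // hypothetical row shape

export function TripList({ trips }: { trips: Trip[] }) {
  return (
    <FixedSizeList
      height={600}            // viewport height in px
      width="100%"
      itemCount={trips.length}
      itemSize={36}           // fixed row height keeps scrolling cheap
    >
      {({ index, style }) => (
        // `style` positions the row absolutely inside the scroll container.
        <div style={style}>
          {trips[index].id}: ${trips[index].fare.toFixed(2)}
        </div>
      )}
    </FixedSizeList>
  );
}
```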
Real-time Features:
- Integrate WebSockets or Server-Sent Events for real-time data updates (an SSE sketch follows this list).
- Display real-time analytics and graphs that auto-update as new data arrives.
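A minimal Server-Sent Events sketch, assuming Express; the `/api/stream` path and event name are illustrative:

```ts
// SSE: the server pushes metric updates over one long-lived HTTP response.
import express, { Response } from 'express';

const app = express();
const clients = new Set<Response>();

app.get('/api/stream', (req, res) => {
  res.set({
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  res.flushHeaders();
  clients.add(res);
  req.on('close', () => clients.delete(res));   // drop disconnected clients
});

// Call this whenever new data lands (e.g., from the CSV worker).
export function broadcast(event: string, payload: unknown) {
  const frame = `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
  for (const res of clients) res.write(frame);
}

app.listen(3001);
```

On the frontend, `new EventSource('/api/stream')` with a listener for the chosen event name is enough to consume this stream.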
Efficient Processing:
- Design a scalable solution for processing large CSV files in the background using worker processes.
- Implement optimisations such as batch inserts and memory-efficient file reading (e.g., Node.js streams with csv-parser); see the sketch after this list.
- Support re-processing or updates when new datasets are uploaded.
- Ensure data integrity when new data is processed or merged with existing entities.
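A sketch of memory-efficient processing with Node.js streams and csv-parser; `db.insertMany` is a placeholder for whichever PostgreSQL or MongoDB client you choose, and the batch size is arbitrary:

```ts
// Stream the CSV row by row and insert in batches, so a 2GB+ file
// never has to fit in memory.
import fs from 'node:fs';
import csv from 'csv-parser';

const BATCH_SIZE = 1000;

export async function processCsv(
  filePath: string,
  db: { insertMany(rows: Record<string, string>[]): Promise<void> },
) {
  let batch: Record<string, string>[] = [];
  const stream = fs.createReadStream(filePath).pipe(csv());

  for await (const row of stream) {
    batch.push(row);
    if (batch.length >= BATCH_SIZE) {
      await db.insertMany(batch);   // iteration pauses here, giving backpressure
      batch = [];
    }
  }
  if (batch.length > 0) await db.insertMany(batch);   // flush the remainder
}
```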
Analytics:
- Calculate real-time metrics (e.g., sums and averages) as data is uploaded or modified; see the running-metrics sketch after this list.
- Handle edge cases like incomplete data or inconsistent CSV formats gracefully.
- Provide an efficient mechanism to update, delete, or modify entity data based on user role.
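A sketch of incremental aggregation, so metrics update per row without re-scanning the dataset; the class and its fields are illustrative:

```ts
// Running metrics: maintain count, sum, and mean incrementally,
// skipping malformed values rather than corrupting the aggregates.
export class RunningMetrics {
  private count = 0;
  private sum = 0;

  add(value: number) {
    // Gracefully ignore incomplete or non-numeric rows.
    if (!Number.isFinite(value)) return;
    this.count += 1;
    this.sum += value;
  }

  get average() {
    return this.count === 0 ? 0 : this.sum / this.count;
  }

  snapshot() {
    return { count: this.count, sum: this.sum, average: this.average };
  }
}
```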
Infrastructure Setup:
- Use AWS (Lambda/EC2) for backend services.
- Deploy the frontend to Vercel or Netlify with proper environment configuration.
CI/CD Pipeline:
- Set up automated tests and deployments using GitHub Actions.
- Ensure high code quality with linting (ESLint), unit tests (Jest), and integration tests.
- Monitor performance metrics post-deployment (e.g., using AWS CloudWatch).
Database:
- Optimise database queries (e.g., indexes, partitioning) for large dataset performance.
- Implement caching layers (e.g., Redis) for frequently accessed data (see the cache-aside sketch after this list).
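A cache-aside sketch with ioredis; the key scheme, TTL, and `loadFromDb` callback are placeholders:

```ts
// Cache-aside: check Redis first, fall back to the database on a miss,
// then populate the cache with a short TTL.
import Redis from 'ioredis';

const redis = new Redis();   // assumes Redis on localhost:6379

export async function getCached<T>(
  key: string,
  loadFromDb: () => Promise<T>,
  ttlSeconds = 60,
): Promise<T> {
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit) as T;   // cache hit: skip the database

  const fresh = await loadFromDb();       // cache miss: query the database
  await redis.set(key, JSON.stringify(fresh), 'EX', ttlSeconds);
  return fresh;
}
```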
API & Caching:
- Use HTTP/2 and server-side caching (ETags, Redis) for frequently requested data; an ETag sketch follows this list.
- Implement rate limiting and API throttling for expensive queries.
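A manual ETag sketch with Express (which also emits weak ETags by default on `res.send`); the route and payload are illustrative:

```ts
// ETag: hash the response body and answer 304 when the client already has it.
import { createHash } from 'node:crypto';
import express from 'express';

const app = express();

app.get('/api/summary', async (req, res) => {
  const body = JSON.stringify({ totalTrips: 0 });   // replace with real aggregates
  const etag = `"${createHash('sha1').update(body).digest('hex')}"`;

  if (req.headers['if-none-match'] === etag) {
    return res.status(304).end();                   // client copy is still valid
  }
  res.set('ETag', etag).type('application/json').send(body);
});
```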
Frontend:
- Use efficient data-fetching techniques (SWR/React Query) and reduce network payload size with GZIP compression (see the sketch below).
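A sketch of cached, revalidating fetching with SWR; the `/api/trips` route, response shape, and polling interval are assumptions:

```tsx
// SWR hook for a paginated endpoint: deduped requests, cached responses,
// and periodic revalidation as a fallback when SSE/WebSockets are unavailable.
import useSWR from 'swr';

const fetcher = (url: string) => fetch(url).then((r) => r.json());

export function useTrips(page: number) {
  const { data, error, isLoading } = useSWR(
    `/api/trips?page=${page}&limit=50`,
    fetcher,
    { refreshInterval: 5000 },
  );
  return { trips: data?.data ?? [], error, isLoading };
}
```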
Tasks:
- Write a highly efficient Node.js script to process large datasets.
- Implement batch processing to read, process, and store data in a database.
- Create RESTful APIs to:
- Stream real-time data to the frontend.
- Allow filtered and paginated access to the data.
- Update, delete, or modify data when new CSV files are uploaded.
- Ensure user permissions are respected based on roles.
- Design a modular, responsive dashboard with a clean UI using Next.js.
- Implement role-based views:
- Admin: Full data access, control over user roles, data management features.
- Manager: Access to specific datasets and metrics related to their entities.
- User: Basic view with read-only access to a subset of data.
- Add real-time data visualisations using a charting library.
- Handle large datasets with proper pagination and filtering mechanisms.
- Implement real-time updates on the frontend using WebSockets or Server-Sent Events.
- Visualise real-time changes in data and analytics.
- Implement JWT-based authentication.
- Set up RBAC and entity-based access controls to ensure data security (a middleware sketch follows this task list).
- Ensure secure APIs with input validation and role-based access filtering.
- Deploy the frontend to Vercel and backend to AWS.
- Set up a GitHub Actions CI/CD pipeline for automated testing and deployment.
- Implement lazy loading, pagination, and data virtualisation for large datasets.
- Optimise database queries and use caching strategies for commonly requested data.
- Measure and improve API response time and frontend rendering performance.
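A sketch of the JWT and RBAC middleware implied by the security tasks above, using jsonwebtoken; the secret handling, role names, and route in the usage comment are illustrative:

```ts
// JWT verification plus role-based route guards.
import { Request, Response, NextFunction } from 'express';
import jwt from 'jsonwebtoken';

type Role = 'admin' | 'manager' | 'user';
interface AuthedRequest extends Request {
  user?: { id: string; role: Role };
}

const JWT_SECRET = process.env.JWT_SECRET ?? 'dev-only-secret';

export function authenticate(req: AuthedRequest, res: Response, next: NextFunction) {
  const token = req.headers.authorization?.replace('Bearer ', '');
  if (!token) return res.status(401).json({ error: 'Missing token' });
  try {
    req.user = jwt.verify(token, JWT_SECRET) as AuthedRequest['user'];
    next();
  } catch {
    res.status(401).json({ error: 'Invalid or expired token' });
  }
}

// Usage: app.delete('/api/datasets/:id', authenticate, requireRole('admin'), handler)
export function requireRole(...roles: Role[]) {
  return (req: AuthedRequest, res: Response, next: NextFunction) => {
    if (!req.user || !roles.includes(req.user.role)) {
      return res.status(403).json({ error: 'Forbidden' });
    }
    next();
  };
}
```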
Evaluation Criteria:
- Code quality, readability, and maintainability.
- Scalability and efficiency in handling large datasets.
- Robust real-time processing and data streaming implementation.
- Ability to handle different user roles with strict data access control.
- UI/UX quality, responsiveness, and user-friendliness.
- Optimised backend and frontend performance.
- Deployment strategy, CI/CD pipeline setup, and code documentation.
- Handling of edge cases, error handling, and testing coverage.
- Meaningful Git commits with clear, descriptive messages for both frontend and backend development progress.
Submission Requirements:
- Provide a GitHub repository link with all source code.
- Include a README.md with:
- Setup instructions.
- Architecture and design decisions.
- Assumptions, limitations, and areas for improvement.
- Provide live URLs for both frontend and backend deployments.
- Submit a short demo video showcasing the app's core features.
- Provide demo user credentials for each role (admin, manager, user) with explanations of role-based access.
You have 7 days to complete this assessment. Good luck!