
Master's Thesis in Computer Science, University of Bologna, A.Y. 2022-2023.


prushh/master-thesis


Master's thesis

Title: Exploring the Effectiveness of AWS Lambda and Knative in a Serverless Web Crawler: A Comparative Study

Author: Davide Pruscini

Supervisor: Prof. Gianluigi Zavattaro

Co-supervisors: Eng. Emanuele Casadio, Dr. Matteo Trentin

Academic Year: 2022/2023

University: Alma Mater Studiorum - University of Bologna

Degree course: Computer Science

Abstract

The Internet has become a key resource for accessing and sharing information. However, not all content found on it can be considered legitimate, and tools such as web crawlers can help search for violations. In this thesis, carried out in collaboration with Kopjra, we develop a web crawler application that automatically visits a website, extracts its URLs and indexes the HTML documents of its web pages to enable keyword searches. We compare two serverless implementations, based on AWS Lambda and Knative, with a third, microservice-based implementation that uses the resources made available by Kubernetes. The crawler can use either of two search methodologies: HTTP requests or browser automation. To support the application, two microservices were developed, a backend and a frontend, and an Elasticsearch cluster was deployed to ingest the content of web pages. A series of tests makes it possible to compare the different implementations and identify the critical issues of each.
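
As an illustration of the URL extraction step described in the abstract, here is a minimal sketch using only the Python standard library. It is not taken from the thesis code: the names `LinkExtractor` and `extract_same_site_links` are hypothetical, and restricting results to the starting host is an assumption about how a crawl might be kept within one website.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop "#fragment" parts.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_same_site_links(base_url, html):
    """Return de-duplicated links that stay on the same host as base_url.

    Assumption: the crawl is limited to the starting website's host.
    """
    parser = LinkExtractor(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    unique = []
    for link in parser.links:
        if urlparse(link).netloc == host and link not in unique:
            unique.append(link)
    return unique


if __name__ == "__main__":
    page = '<a href="/about">About</a> <a href="https://other.example/x">External</a>'
    print(extract_same_site_links("https://example.com/index.html", page))
```

In a full crawler, each returned URL would be queued for a later visit and the page's HTML sent to the search index. Those steps depend on the chosen platform and are left out here.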

System Architecture

High level architecture

Implementations

AWS Lambda

AWS architecture

Knative

Knative architecture

Kubernetes

Kubernetes architecture

Thesis

You can download or view the .pdf file of the thesis here. Please note that this file will never be updated.

License

This project is licensed under the CC BY-NC-ND 4.0 License - see the AMSLaurea page for details.

Acknowledgement

The template's skeleton was taken from jjocram/master-thesis.