This project is an attempt to create a distributed storage system using Python and the socket library. It involves a server and multiple clients that communicate with each other to store and retrieve data.
The distributed storage system allows clients to store and retrieve data on the server, which acts as a central coordinator. It manages the distribution of data across multiple clients. When a client sends data to the server, the server distributes this data across the connected clients using a token-based algorithm. Clients can also request data from the server, which then retrieves the data from all connected clients and sends it back to the requesting client.
The server is the central component of the system. It:
- Listens for incoming client connections.
- Manages the distribution of data across connected clients.
- Handles requests from clients to retrieve data.
The client component:
- Connects to the server.
- Can send and receive data. When sending data to the server, the server distributes this data across the connected clients.
- Can request data from the server, which retrieves the data from all connected clients and sends it back.
Defined in the msg_classes.py
module, these classes handle the communication between the server and clients. It includes classes for creating and parsing various message types, such as POST
(for sending data), FETCH
(for requesting data), and FETCH_RESP
(for responding to data requests).
The data_classes.py
module defines classes for representing the data objects that are stored and retrieved in the system.
The db_utils.py
module provides a utility class SQLStorage
for interacting with an SQLite database. It enables creating tables, inserting data, updating data, retrieving data, and deleting data from the database.
The settings.py
module contains global constants and utility functions used throughout the project.
- The server starts listening for incoming client connections.
- Clients connect to the server.
- When a client sends data to the server, it distributes the data across the connected clients using a token-based algorithm.
- When a client requests data, the server broadcasts a
FETCH
request to all connected clients. - Each client responds with the data it has stored.
- The server aggregates the data from all clients and sends it back to the requesting client.
- Implement data consistency mechanisms to ensure data integrity across multiple clients.
- Explore more efficient and scalable data distribution strategies, such as consistent hashing or distributed hash tables (DHTs).
- Implement security measures, including authentication, authorization, and encryption.
- Add error handling and exception handling mechanisms.
- Implement logging mechanisms for tracking activities, errors, and debug information.
- Add unit tests and integration tests for system correctness and to facilitate future changes and improvements.
To run the project, follow these steps:
- Start the server by running
python server.py
. - Start one or more clients by running
python client.py
. - In the server terminal, input data to store or type "FETCH" to retrieve data from the clients.
- In the client terminals, observe the data being received and stored.
Note: The project currently uses an in-memory data storage approach. For persistent storage, the code can be modified to integrate with the db_utils.py
module.