Distributed system offering unified memory access across multiple nodes, providing an illusion of a single, large memory chunk. Inspired by models like Hadoop's DFS. Built primarily in C with socket programming.
+----------------------UDP/TCP----------------------+
| |
| |
v |
+--------+ UDP/TCP +--------+ UDP/TCP |
| Node 1 |------------------>| Node 2 |------------------>| Node 3 |
+--------+ GET / PUT +--------+ GET / PUT +--------+
^ |
| |
| |
+---------------------UDP/TCP------------------------+
- Introduction
- Problem Statement
- Hadoop Distributed File System Simulation
- Features
- Code Overview
- Pseudocode
- Technologies
- Getting Started
- Usage
- Output Explanation
- Displaying the Hash Table
- Contributing
- License
- Acknowledgments
Distributed system offering unified memory access across multiple nodes. Users experience the illusion of accessing a large contiguous memory space, even though it's distributed across several machines.
Objective:
To develop a distributed system that offers a unified memory access experience across multiple nodes. Despite the memory being scattered across different physical machines, the end user should perceive it as a single, contiguous chunk of memory.
Background:
In today's data-driven world, there's an increasing need to store and manage vast amounts of data. Traditional single-machine memory models often don't suffice for the voluminous data storage needs. Distributed memory models, such as the one used by the Hadoop Distributed File System (HDFS), allow data to be stored across several machines, taking advantage of combined storage capacities.
Challenges:
-
Distributed Architecture: Memory is fragmented across different physical machines, and accessing it in a unified manner is a challenge.
-
Transparency: The end user should remain unaware of the underlying distribution of data. They should not need to track where particular chunks of their data reside.
-
Network Overhead: Since the memory access occurs over a network, there can be latency and other network-related challenges to address.
-
Consistency: Ensuring data consistency in such distributed environments, especially with concurrent access, can be challenging.
Specifications:
-
Language: The primary language of choice is C due to its efficiency in memory management and low-level operations. However, other languages with socket programming knowledge, such as C++, Python, and Java, are also permissible.
-
Nodes: Consider a scenario with four physical machines, each having a hard drive of 2 GB. The user should perceive this as a single 8 GB memory chunk, even though it's distributed.
-
User Interaction: There will be a network layer between the user and the memory storage. The user interacts with this network layer, which in turn communicates with the distributed nodes.
-
Data Retrieval: The system should ensure swift data retrieval without the user needing to know from which physical machine the data is fetched.
End Goal:
To present the end user with the illusion that they're accessing a large, unified memory chunk, even though the memory is distributed across multiple physical nodes.
This repository contains an implementation of a Distributed Hash Table (DHT), which provides a decentralized way of storing data. This implementation supports GET
and PUT
operations and is designed to work in a network of nodes.
The challenge is to create a distributed system that mimics the behavior of a large-scale key-value store. The system should be able to handle GET
and PUT
requests, route these requests to the appropriate nodes based on keys, and work in a network with an arbitrary number of nodes.
This project serves as a simplified simulation of the Hadoop Distributed File System (HDFS). In HDFS, data is stored across multiple nodes and can be accessed in a distributed manner. Our implementation utilizes a similar architecture, with nodes in a network each taking on the responsibility of storing a subset of keys in the hash table.
- 🔎
GET
- Retrieve a value based on its key. - 📝
PUT
- Insert a key-value pair into the distributed hash table. - 🔄 Automatic key-value forwarding to appropriate nodes.
- 💻 TCP and UDP networking.
The project mainly consists of three parts:
- UDP Socket Programming: Responsible for the communication between nodes in a distributed manner.
- TCP Socket Programming: Handles
GET
andPUT
operations. - Distributed Hash Table (DHT): The core logic for storing and retrieving key-value pairs.
Files:
dht.c
- Main implementation file.
Initialize UDP and TCP sockets
Bind UDP and TCP sockets to ports
Listen for incoming TCP connections
Initialize readfds (file descriptor set)
While True:
Clear readfds
Add UDP socket, TCP master socket, and stdin to readfds
Call select() to watch readfds
// Handle UDP Activity
If UDP socket is ready:
Receive UDP packet into buffer
If packet starts with 'PUT':
Extract key-value from packet
If key falls in local node's range:
Insert key-value pair into Hashtable
Else:
Forward packet to next node via UDP
ElseIf packet starts with 'GET':
Extract key from packet
If key falls in local node's range:
Retrieve value from Hashtable
Send value to requesting node via TCP
Else:
Forward packet to next node via UDP
// Handle TCP Activity
If TCP master socket is ready:
Accept incoming TCP connection
If previously performed GET:
Read value from Hashtable
Send value to connected client
Close TCP connection
ElseIf previously performed PUT:
Read value from connected client
Insert into Hashtable
Close TCP connection
// Handle Standard Input
If stdin is ready:
Read command from stdin
If command is 'put':
Extract key-value
Insert into Hashtable or forward to next node via UDP
ElseIf command is 'get':
Extract key
Retrieve value from Hashtable or forward to next node via UDP
ElseIf command is 'r':
Print Hashtable
Function forwardUDP(destination_node, message):
Create UDP socket
Set destination address and port
Send message to destination
Close UDP socket
- C
- UDP and TCP Sockets
- gcc compiler
- Terminal
- Clone the repository:
git clone
- Navigate to the directory:
cd your-repo-name https://github.com/ANSANJAY/DistMemoryFacade
- Compile the source code:
gcc -o dht dht.c
Certainly, here's a sample README.md
file with emojis to make it more engaging. Just copy and paste the below text into your README file.
To start the program and initialize your 3-node token ring topology, follow these steps:
rum the make command which will make node0
,node1
,node2
make
1️⃣ Open three different terminal windows.
2️⃣ In the first terminal window, compile and run node0.c
:
./node0
3️⃣ In the second terminal window, compile and run node1.c
:
./node1
4️⃣ In the third terminal window, compile and run node2.c
:
./node2
📘 Each node terminal will display instructions for interacting with the distributed hash table.
Each terminal will display similar instruction sets:
- PUT request: To insert a value into the hash table, use the format
PUT(<key>,<value>)
. 📥 - GET request: To retrieve a value from the hash table, use the format
GET(<key>)
. 📤 - Hash Table: To view the current state of the hash table, enter
r
. 📚
📝 Here's a breakdown of the operations:
- Forward the Request 📡: If the node isn't responsible for the key, it will forward the request to the next node.
FORWARD REQUEST : 'xxx(2,2000)0[127.0.0.1,r]' has been forwarded to node ---->1
- Process Locally 🏠: If the node is responsible for the key, it processes the request locally.
PROCESSING THE REQUEST HERE:
RESULT: AT KEY : 3, VALUE INSERTED : 10 IN HASH TABLE SUCCESS
🔍 The GET operation behaves similarly to the PUT operation.
Nodes communicate through UDP packets.
UDP PACKET RECEIVED FROM (IP ADDRESS : 127.0.0.1 , PORT NO : 52071 , NODE NO : 0) : xxx(2,2000)0[127.0.0.1,r]
To view the hash table for any node, simply enter r
. This will display the current state of that node's hash table.
After several PUT
operations, Node 0 has the following in its hash table:
-----Hash Table Contents(0--297)------------
key : 3 ===== value : 5
After several PUT
operations, Node 1 has the following in its hash table:
-----Hash Table Contents(1--298)------------
key : 7 ===== value : 3
-
key : X ===== value : Y: This shows the key-value pair that the node currently maintains.
-
Hash Table Contents(X--Y): This indicates the range of keys for which the node is responsible.
Interested in contributing? We welcome pull requests, bug fixes, and issue reports. Before proposing a change, discuss it via issues.
This project is licensed under the MIT License - see the LICENSE.md file for details.
Made with ❤️ by Anamika