This guide outlines the steps to set up a sharded MongoDB infrastructure using Docker.
This project builds a MongoDB cluster made of three data shards, which improves speed and availability. The goal is to query and manage Montpellier restaurant data through a REST API. The data comes from a CSV file that has been cleaned with a Python script.
Note: the .env file would normally not be pushed to an online repository, but we pushed it so that the teacher can test the project; furthermore, the repository is internal.
Note 2: the README describing the API and its routes is located at api-node/src/routes/README.md
To set up the entire infrastructure, execute the following commands, which start 16 containers:
- 3 mongos routers, which route requests to the shards using the metadata held by the config servers
- 3 config servers, which form a replica set holding the cluster metadata
- 9 shard nodes, 3 for each config server, which are storage nodes that allow horizontal scaling by each storing a subset of the data
- 1 PostgreSQL instance used for authentication to the API
The setup also splits the data "manually" into 3 chunks, since MongoDB does not automatically split very small collections.
chmod +x ./setup/setup.sh
chmod 400 mongodb-keyfile
sudo chown 999:999 mongodb-keyfile
./setup/setup.sh
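The manual split mentioned above amounts to choosing two boundary values of the shard key so that the collection falls into three ranges. The Python sketch below only illustrates that idea; it is not taken from setup.sh, and the function names `pick_split_points` and `assign_chunk` are hypothetical. It assumes that the _id values are strings (as the quoted ids in the moveChunk commands of this guide suggest), so they are ordered lexicographically.

```python
def pick_split_points(ids, n_chunks=3):
    """Return n_chunks - 1 boundary ids that cut the sorted ids into ranges of similar size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    ordered = sorted(set(ids))  # string ids sort lexicographically
    if len(ordered) < n_chunks:
        raise ValueError("need at least as many distinct ids as chunks")
    # Boundary k is the first id of chunk k (chunks are [lower, upper) ranges).
    return [ordered[len(ordered) * k // n_chunks] for k in range(1, n_chunks)]


def assign_chunk(doc_id, split_points):
    """Return the 0-based index of the chunk whose range contains doc_id."""
    return sum(1 for point in split_points if doc_id >= point)
```

With three chunks the function returns two boundaries, which matches the two ids used in the moveChunk commands of this guide.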
You must adapt the chunk moves below to the primary shard that was selected by mongos: replace the placeholder shardxrs with the name of the target shard.
# version without keyfile
sudo docker exec -it mongos1 mongosh
# version with keyfile
sudo docker exec -it mongos1 mongosh --username admin --password admin --authenticationDatabase admin
use restaurantsDB
db.printShardingStatus()
# Visualize the distribution of data among shards to see on which shard chunks are located
db.getSiblingDB("restaurantsDB").restaurants.getShardDistribution()
sh.status()
db.adminCommand({ moveChunk: 'restaurantsDB.restaurants', find: { _id: '1704449678' }, to: 'shardxrs' })
db.adminCommand({ moveChunk: 'restaurantsDB.restaurants', find: { _id: '4734986492' }, to: 'shardxrs' })
db.printShardingStatus()
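The choice of target shard in the commands above can be reasoned about as follows, assuming that all chunks initially sit on the primary shard: each chunk to move goes to one of the other shards. The Python sketch below builds command documents with the same shape as the moveChunk commands of this guide. The function name `plan_chunk_moves` and the shard names used to call it are hypothetical; use the names reported by sh.status() in your own cluster.

```python
def plan_chunk_moves(namespace, split_ids, shards, primary):
    """Build one moveChunk command per split id, targeting the non-primary shards in order."""
    if primary not in shards:
        raise ValueError("primary shard must be one of the shards")
    # Chunks are assumed to start on the primary shard, so only the others are targets.
    targets = [name for name in shards if name != primary]
    if len(split_ids) > len(targets):
        raise ValueError("not enough non-primary shards for the chunks to move")
    return [
        {"moveChunk": namespace, "find": {"_id": split_id}, "to": target}
        for split_id, target in zip(split_ids, targets)
    ]
```

For example, calling it with the two ids of this guide, three shards and the second shard as primary yields one command for the first shard and one for the third.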
To check that the data is present, run this command:
sudo docker exec -it mongos1 mongosh --username admin --password admin --authenticationDatabase admin --eval 'db.getSiblingDB("restaurantsDB").restaurants.find().toArray()'
To shut down and remove all the containers:
./setup/down.sh
You can run a REST API to make requests against the cluster. Enter the following commands to start the API; you can then use a tool like Postman to make requests.
npm init -y
npm install
npm run dev
To test it, send a GET request to localhost:5050/restaurants; it will return the collection.
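As an alternative to Postman, the same request can be sent from a short script. The Python sketch below uses only the standard library; the function names are hypothetical, and it assumes that the API is served over plain HTTP and answers with JSON. Depending on the route, authentication may also be required (see the routes README).

```python
import json
from urllib.request import urlopen


def restaurants_url(host="localhost", port=5050):
    """Build the URL of the restaurants route given in this guide."""
    return f"http://{host}:{port}/restaurants"


def fetch_restaurants(url, timeout=5.0):
    """Send a GET request and decode the body, assuming the API answers with JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Once the API is running, `fetch_restaurants(restaurants_url())` should return the decoded collection.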
To clean and reformat the CSV file:
cd setup
python -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt
python clean.py
deactivate
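The content of clean.py is not reproduced in this guide. As a rough idea of what a cleaning step can look like, the Python sketch below trims whitespace, drops rows without an identifier and removes duplicates. It is a generic illustration under stated assumptions, not the project's script: the function name `clean_rows` and the `id` column name are hypothetical.

```python
import csv
import io


def clean_rows(csv_text, id_column="id"):
    """Trim whitespace, drop rows without an id, and keep the first row for each id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    cleaned = []
    for row in reader:
        # Strip surrounding whitespace from every header and value;
        # fields beyond the header (stored under the None key) are ignored.
        row = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        row_id = row.get(id_column, "")
        if not row_id or row_id in seen:
            continue  # skip rows with a missing or duplicate id
        seen.add(row_id)
        cleaned.append(row)
    return cleaned
```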