diff --git a/charts/README.md b/charts/README.md
index fd950d4..4b33048 100644
--- a/charts/README.md
+++ b/charts/README.md
@@ -99,11 +99,37 @@ The following table lists the configurable parameters of the latest Simplyblock
 | `logicalVolume.qos_rw_mbytes` | the value of lvol parameter qos_rw_mbytes | `0` | |
 | `logicalVolume.qos_r_mbytes` | the value of lvol parameter qos_r_mbytes | `0` | |
 | `logicalVolume.qos_w_mbytes` | the value of lvol parameter qos_w_mbytes | `0` | |
-| `logicalVolume.compression` | set to `True` if compression needs be enabled on lvols | `False` | |
 | `logicalVolume.encryption` | set to `True` if encryption needs be enabled on lvols. | `False` | |
 | `logicalVolume.distr_ndcs` | the value of distr_ndcs | `1` | |
 | `logicalVolume.distr_npcs` | the value of distr_npcs | `1` | |
-| `cachingnode.ifname` | the default interface to be used for binding the caching node to host interface | `eth0` | |
+| `benchmarks` | the number of benchmarks to run | `0` | |
+| `cachingnode.tolerations.create` | whether to create tolerations for the caching node | `false` | |
+| `cachingnode.tolerations.effect` | the effect of the caching node tolerations | `NoSchedule` | |
+| `cachingnode.tolerations.key` | the key of the caching node tolerations | `dedicated` | |
+| `cachingnode.tolerations.operator` | the operator of the caching node tolerations | `Equal` | |
+| `cachingnode.tolerations.value` | the value of the caching node tolerations | `simplyblock-cache` | |
+| `cachingnode.ifname` | the default interface used to bind the caching node to the host network | `eth0` | |
+| `cachingnode.cpuMask` | the CPU mask for the SPDK app on the caching node | `` | |
+| `cachingnode.spdkMem` | the amount of hugepage memory to allocate for the caching node | `` | |
+| `cachingnode.spdkImage` | the SPDK image URI for the caching node | `` | |
+| `cachingnode.multipathing` | enable multipathing for the lvol connection | `true` | |
+| `storagenode.tolerations.create` | whether to create tolerations for the storage node | `false` | |
+| `storagenode.tolerations.effect` | the effect of the storage node tolerations | `NoSchedule` | |
+| `storagenode.tolerations.key` | the key of the storage node tolerations | `dedicated` | |
+| `storagenode.tolerations.operator` | the operator of the storage node tolerations | `Equal` | |
+| `storagenode.tolerations.value` | the value of the storage node tolerations | `simplyblock-cache` | |
+| `storagenode.ifname` | the default interface used to bind the storage node to the host network | `eth0` | |
+| `storagenode.cpuMask` | the CPU mask for the SPDK app on the storage node | `` | |
+| `storagenode.spdkImage` | the SPDK image URI for the storage node | `` | |
+| `storagenode.maxLvol` | the default maximum number of lvols per storage node | `10` | |
+| `storagenode.maxSnap` | the default maximum number of snapshots per storage node | `10` | |
+| `storagenode.maxProv` | the maximum provisioning size across all storage nodes | `150g` | |
+| `storagenode.jmPercent` | the percentage of each device to use for JM | `3` | |
+| `storagenode.numPartitions` | the number of partitions to create per device | `0` | |
+| `storagenode.numDevices` | the number of devices per storage node | `1` | |
+| `storagenode.iobufSmallPoolCount` | bdev_set_options parameter | `` | |
+| `storagenode.iobufLargePoolCount` | bdev_set_options parameter | `` | |
+
 ## troubleshooting
 
 - Add `--wait -v=5 --debug` in `helm install` command to get detailed error
diff --git a/docs/caching-nodes.md b/docs/caching-nodes.md
index 9f8328b..e4f46da 100644
--- a/docs/caching-nodes.md
+++ b/docs/caching-nodes.md
@@ -10,21 +10,22 @@ Caching nodes are a special kind of node that works as a cache with a local NVMe
 
 Make sure that the Kubernetes worker nodes to be used for cache has access to the simplyblock storage cluster. If you are using terraform to deploy the cluster. Please attach `container-instance-sg` security group to all the instances.
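The hugepage sizing rule used in this guide for caching nodes (1 GiB plus 0.5% of the cache SSD) can be worked out up front. A minimal sketch, assuming a hypothetical ~1.9 TiB local SSD; `SSD_GIB` is an example value, not a probed device, and small rounding differences against the prose example are expected:

```shell
# Sketch: derive the hugepage reservation for a caching node from the
# "1 GiB + 0.5% of the cache SSD size" rule described in this guide.
# SSD_GIB is an assumed example value (a ~1.9 TiB local SSD).
SSD_GIB=1946
RESERVE_MIB=$((1024 + SSD_GIB * 1024 / 200))  # 0.5% == 1/200
NR_HUGEPAGES=$(((RESERVE_MIB + 1) / 2))       # one huge page = 2 MiB
echo "reserve ${RESERVE_MIB} MiB of hugepages => vm.nr_hugepages=${NR_HUGEPAGES}"
```

The resulting count would then be applied with the `sudo sysctl -w vm.nr_hugepages=...` step shown in the hugepage section.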
-#### Step1: Install nvme cli tools
+#### Step1: Install nvme-cli tools and nbd
 
 To attach NVMe device to the host machine, the CSI driver uses [nvme-cli]([url](https://github.com/linux-nvme/nvme-cli)). So lets install that
 ```
 sudo yum install -y nvme-cli
 sudo modprobe nvme-tcp
+sudo modprobe nbd
 ```
 
 #### Step1: Setup hugepages
 
-Before you prepare the caching nodes, please decide the amount of huge pages that you would like to allocate for simplyblock and set those hugepages accordingly. We suggest allocating at least 8GB of huge pages.
+Before you prepare the caching nodes, please decide the amount of huge page memory that you would like to allocate for simplyblock and set those hugepages accordingly.
+It is recommended to use a minimum of 1 GiB plus 0.5% of the size of the local SSD that you want to use as a cache. For example, if your local SSD has a size of 1.9 TiB and you want to use it entirely as a write-through cache, you need to assign 10.5 GiB of RAM. If you only want to utilize 1 TiB (52.6% of the SSD), you assign 6 GiB of RAM, and the cache will automatically be resized to fit the available (assigned) memory.
 
 >[!IMPORTANT]
->The caching node requires at least 2.2% of the size of the nvme cache + 50 MiB of RAM. This should be the minimum configured as hugepage
->memory.
+>One huge page holds 2 MiB of memory. A value of 4096, for example, therefore equals 8 GiB of huge page memory.
 
 ```
 sudo sysctl -w vm.nr_hugepages=4096
@@ -58,13 +59,13 @@ lspci
 
 After the nodes are prepared, label the kubernetes nodes
 ```
-kubectl label nodes ip-10-0-4-118.us-east-2.compute.internal ip-10-0-4-176.us-east-2.compute.internal type=cache
+kubectl label nodes ip-10-0-4-118.us-east-2.compute.internal ip-10-0-4-176.us-east-2.compute.internal type=simplyblock-cache
 ```
 
 Now the nodes are ready to deploy caching nodes.
 
 ### StorageClass
 
-If the user wants to create a PVC that uses NVMe cache, a new storage class can be used with additional volume parameter as `type: simplyblock-cache`.
+If the user wants to create a PVC that uses the NVMe cache, a new storage class can be used with the additional volume parameter `type: cache`.
 
 ### Usage and Implementation
diff --git a/docs/storage-nodes.md b/docs/storage-nodes.md
new file mode 100644
index 0000000..e4c01bc
--- /dev/null
+++ b/docs/storage-nodes.md
@@ -0,0 +1,66 @@
+### Storage nodes volume provisioning
+
+Apart from a disaggregated storage cluster deployment, storage-plane pods can now also be deployed onto k8s workers, where they may coexist with any compute workload (storage consumers).
+Depending on the type of the storage node, it must either come with at least one locally attached NVMe drive, or EBS block storage volumes are auto-attached during the deployment (AWS only).
+
+### Preparing nodes
+
+#### Step 0: Networking & tools
+
+Make sure that the Kubernetes worker nodes running storage-plane pods have NVMe-oF access to each other and, if needed, to external storage nodes in the simplyblock cluster. They also need connectivity to and from the simplyblock control plane. If you are using Terraform to deploy the cluster, please attach the `container-instance-sg` security group to all the instances.
+
+#### Step 1: Install nvme-cli tools and nbd
+
+To attach an NVMe device to the host machine, the CSI driver uses [nvme-cli](https://github.com/linux-nvme/nvme-cli). So let's install it:
+```
+sudo yum install -y nvme-cli
+sudo modprobe nvme-tcp
+sudo modprobe nbd
+```
+
+#### Step 2: Set up hugepages
+
+Simplyblock uses huge page memory, and it is necessary to reserve an amount of huge page memory early on.
+The simplyblock storage-plane pod allocates huge page memory from the reserved pool when the pod is added or restarted.
+The amount reserved is based on parameters provided when the storage node is added, such as the maximum number of logical volumes and snapshots and the maximum provisioning size of the node (see helm chart parameters).
+The minimum amount to reserve is 2 GiB, but try to reserve at least 25% of the node's total RAM.
+It is fine to reserve more than needed, as simplyblock will allocate only the amount required from that pool, and the rest remains available to the system.
+
+>[!IMPORTANT]
+>One huge page is 2 MiB. A value of 4096, for example, reserves 8 GiB of huge page memory.
+
+```
+sudo sysctl -w vm.nr_hugepages=4096
+```
+
+Confirm the hugepage change by running:
+```
+cat /proc/meminfo | grep -i hug
+```
+
+Then restart kubelet:
+```
+sudo systemctl restart kubelet
+```
+
+Confirm that the huge pages were added to the cluster:
+```
+kubectl describe node ip-10-0-2-184.us-east-2.compute.internal | grep hugepages-2Mi
+```
+The output should show 8 GiB. This worker node can then allocate 8 GiB of hugepages to pods, which is required for SPDK pods.
+
+#### Step 3: Mount the SSD or EBS volumes to be used by the storage node
+
+If the instance comes with a local NVMe disk, it can be used with a minimum of two partitions or two devices, where one is used for the journal manager and the other for the storage node. Alternatively, attach two additional EBS volumes, one for the journal manager and the other for storage. The disks can be viewed by running:
+
+```
+sudo yum install -y pciutils
+lspci
+```
+
+#### Step 4: Tag the Kubernetes nodes
+
+After the nodes are prepared, label the Kubernetes nodes:
+```
+kubectl label nodes ip-10-0-4-118.us-east-2.compute.internal ip-10-0-4-176.us-east-2.compute.internal type=simplyblock-storage-plane
+```
+Now the nodes are ready to deploy storage nodes.
diff --git a/docs/support-ports.md b/docs/support-ports.md
index 4e80691..696a9e7 100644
--- a/docs/support-ports.md
+++ b/docs/support-ports.md
@@ -1,4 +1,4 @@
-# Supported Port for eks or ks3
+# Supported Ports for EKS or K3s for the caching node
 
 | Port | Protocol | Description
 | -------------- | ------------- | -------------
@@ -8,3 +8,19 @@
 | 2375 | TCP | Docker Engine API. Allows the management node to communicate with Docker engines running on other nodes.
 | - | ICMP | Allows ICMP Echo requests. Used for ping operations to check the availability and responsiveness of management nodes.
 | 5000 | TCP | Caching node. Enables communication with caching services running on the node.
+
+
+# Supported Ports for EKS or K3s for the storage node
+
+| Port | Protocol | Description
+| -------------- | ------------- | -------------
+| 6443 | TCP | Kubernetes API server. Required for communication between the Kubernetes control plane and the nodes in the cluster.
+| 22 | TCP | SSH access to the instances. Necessary for administrative access and management.
+| 8080 | TCP | SPDK proxy for the storage node. Facilitates communication between the storage nodes and the management node.
+| 2375 | TCP | Docker Engine API. Allows the management node to communicate with Docker engines running on other nodes.
+| - | ICMP | Allows ICMP Echo requests. Used for ping operations to check the availability and responsiveness of management nodes.
+| 5000 | TCP | Storage node. Enables communication with storage-node services running on the node.
+| 4420 | TCP | Storage node logical volume (lvol) connection. This port must be open (1) between all of the workers hosting storage-plane pods, and (2) from all workers with pods connecting to storage to any workers hosting storage-plane pods and to any external storage nodes.
+| 53 | UDP | DNS resolution from worker nodes. Necessary for resolving internal DNS queries within the cluster.
+| 10250-10255 | TCP | Kubernetes node communication. Used for kubelet API communication between the nodes.
+| 1025-65535 | UDP | Ephemeral ports for UDP traffic. Required for certain network protocols and services.
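The TCP rows in the storage-node table can be spot-checked from a peer worker before deploying. A minimal sketch, assuming a placeholder `NODE_IP` (it defaults to localhost here; substitute a real worker address in your environment):

```shell
# Sketch: probe the storage-node TCP ports listed above from a peer worker.
# NODE_IP is a placeholder; bash's /dev/tcp device is used so no extra
# tooling (nc, nmap) is required on the host.
NODE_IP=${NODE_IP:-127.0.0.1}   # hypothetical worker address
for port in 6443 22 8080 2375 5000 4420; do
  if timeout 1 bash -c "</dev/tcp/${NODE_IP}/${port}" 2>/dev/null; then
    echo "${port} open"
  else
    echo "${port} unreachable"
  fi
done
```

A port reported as unreachable usually points at a missing security-group or firewall rule rather than a simplyblock issue.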