Replies: 4 comments
-
@julienlau can you provide more details on the failure you see? Typically, the Ratis transaction logs are required for writes and are stored on an SSD on a datanode, while the data is stored on spindle drives. The transaction logs do not grow indefinitely; the space used on the SSD tends to flatline based on the number of transaction logs configured, and the number of transaction logs stored is a function of the number of pipelines configured. Your observation about availability should apply to writes only. Let us know which version of the code you are using and what you have observed. Also, you can co-locate pipelines with the data drives (though an SSD is still recommended); if certain drives fail, new pipelines should be created.
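For example, the Ratis log location is set per datanode in ozone-site.xml. A minimal sketch, using the property name referenced elsewhere in this thread (check ozone-default.xml for the exact key in your Ozone version; the paths are placeholders):

<!-- Inside the <configuration> element of ozone-site.xml on each datanode. -->
<!-- Recommended: keep the Ratis transaction logs on a dedicated SSD. -->
<property>
  <name>hdds.container.ratis.datanode.storage.dir</name>
  <value>/ssd1/ozone/ratis</value>
</property>

<!-- Co-located alternative: point the same property at a directory on one
     of the data drives; write latency then follows that HDD. -->
<!--
<property>
  <name>hdds.container.ratis.datanode.storage.dir</name>
  <value>/data01/ozone/ratis</value>
</property>
-->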
-
Hi, I know SSDs are recommended, but my datanode servers are already in place and they are all HDD. If I shut down Ozone on a single datanode and unmount the drive used for hdds.container.ratis.datanode.storage.dir, it is impossible to restart Ozone on that node. The node is gone. I don't know what a pipeline is, and I can't find any mention in the documentation of how to configure pipelines to be co-located with the data drives.
-
Maybe related to this feature: but it is not clear which option to choose. After reading the doc, it gives the impression that the default V3 version should not be used.
-
Hello,
I have 6 datanodes with 20 HDDs available on each of them.
I configured 19 HDDs for data and 1 HDD for the datanode Ratis storage on each node.
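For reference, here is roughly what that looks like in ozone-site.xml on each datanode (a sketch: the paths are simplified and the property names are the ones I found in the docs):

<!-- Inside the <configuration> element of ozone-site.xml on each datanode. -->
<!-- 19 HDDs for container data. -->
<property>
  <name>hdds.datanode.dir</name>
  <value>/data01/ozone,/data02/ozone,/data03/ozone</value>
  <!-- the real value continues through /data19/ozone -->
</property>
<!-- 1 HDD dedicated to the datanode Ratis storage. -->
<property>
  <name>hdds.container.ratis.datanode.storage.dir</name>
  <value>/data20/ozone-ratis</value>
</property>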
It appears that if the drive hosting the Ratis storage goes down, the whole datanode goes down.
This is not what I would expect. How can I configure the metadata of each drive to be co-located with its data, so that the fault tolerance is the same for all drives?
On Ceph I would have configured 20 OSDs, and Ceph automatically co-locates metadata and data on the same drive, so the failure of any drive is tolerated equally.
This is a big problem because losing an HDD happens very often.
I have had bad experiences with failing nodes on crowded object stores: you lose 1/6 of the infrastructure at the same time as the erasure coding needs more CPU to reconstruct objects and serve requests.
In addition, there are no best practices on hardware recommendations or sizing. In contrast, the Ceph documentation makes very detailed sizing recommendations. For example, the equivalent of this datanode Ratis drive would be the BlueStore drive for Ceph, where it is recommended to have no more than 3 HDDs per SSD BlueStore drive, and there are also recommendations for the size of the BlueStore drive.
With Ozone I don't know the space or IOPS that will be used on this datanode Ratis drive. How will it scale with 20 HDDs? 40 HDDs?
Regards