Troubleshooting

This page lists common errors and ways to address them.

I get this error: hivemind.dht.protocol.ValidationError: local time must be within 3 seconds of others on WSL. What should I do?

Petals needs clocks on all nodes to be synchronized. Please set the date using an NTP server:
```
sudo apt install ntpdate
sudo ntpdate pool.ntp.org
```
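To see whether a sync is needed at all, you can compare the measured clock offset against the 3-second limit quoted in the error message. The sketch below is only an illustration: `offset_ok` is a hypothetical helper (it is not part of Petals or hivemind), and the offsets passed to it are made-up values. It also assumes that the offset from an NTP server is a reasonable stand-in for the offset from other Petals nodes.

```shell
# Hypothetical helper: succeed if the absolute clock offset (in seconds)
# is within the limit. The default of 3 mirrors the "within 3 seconds"
# wording of the error message.
offset_ok() {
    awk -v off="$1" -v lim="${2:-3}" 'BEGIN {
        if (off < 0) off = -off          # take the absolute value
        exit ((off <= lim) ? 0 : 1)      # exit status 0 means "within the limit"
    }'
}

# Example with made-up offsets
offset_ok 0.42 && echo "0.42 s: clock is close enough"
offset_ok -7.9 || echo "-7.9 s: clock needs to be synced"
```

Running `ntpdate -q pool.ntp.org` queries the server without changing the clock and reports the current offset, which can then be passed to the helper.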
The server starts loading blocks and then prints: Killed. What should I do?

This happens because Windows doesn't allocate much RAM to WSL by default, so the Linux out-of-memory (OOM) killer terminates the server.

To increase the memory limit, go to C:/Users/username (replace username with your Windows user name) and create a file named .wslconfig with the following contents:
```
[wsl2]
memory=12GB
```
Then restart WSL (run sudo reboot in the WSL console, or wsl --shutdown from Windows PowerShell if that does not restart it) so that the new memory limit takes effect.
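After the restart, you can confirm that the new limit was picked up by checking how much memory the Linux kernel sees. This is a minimal sketch, assuming a standard `/proc/meminfo`; `show_wsl_memory` is a hypothetical name chosen for illustration.

```shell
# Hypothetical helper: print the total memory visible inside WSL.
# /proc/meminfo reports MemTotal in kB, so divide by 1024*1024 to get GiB.
show_wsl_memory() {
    awk '/^MemTotal:/ { printf "MemTotal: %.1f GiB\n", $2 / 1048576 }' /proc/meminfo
}

show_wsl_memory
```

The reported value is usually slightly below the configured limit, since the kernel reserves some memory for itself. If it still shows the old default, the .wslconfig file was probably not read (check its location and name).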
I get this error: torch.cuda.OutOfMemoryError: CUDA out of memory. What should I do?

If you use an Anaconda env, run this before starting the server:
```
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```
If you use Docker, add this argument after --rm in the Docker command:
```
-e "PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128"
```
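In the Anaconda case, the setting only helps if the server process actually inherits the variable. A minimal check, assuming the server is started from the same shell session:

```shell
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Start a child shell and print the variable as a child process sees it.
# If "unset" is printed, the export did not reach child processes.
sh -c 'echo "PYTORCH_CUDA_ALLOC_CONF=${PYTORCH_CUDA_ALLOC_CONF:-unset}"'
```

The export lasts only for the current shell session, so it has to be repeated in each new terminal before starting the server.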
The WSL clock tends to drift out of sync, which prevents the Petals server from launching and produces the error hivemind.dht.protocol.ValidationError: local time must be within 3 seconds of others.

To sync the WSL clock, run sudo ntpdate pool.ntp.org. More fixes are discussed on Stack Overflow.

If your error is not covered on this page, let us know in Discord and we will help!

This project is a part of the BigScience research workshop.
