Troubleshooting
This page lists common errors and ways to address them.
-
I get this error:
hivemind.dht.protocol.ValidationError: local time must be within 3 seconds of others
on WSL. What should I do?
Petals needs the clocks on all nodes to be synchronized. Set the time from an NTP server:
sudo apt install ntpdate
sudo ntpdate pool.ntp.org
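To see how far the local clock is off before or after running ntpdate, a single SNTP query is enough. The sketch below is only an illustration and is not part of Petals or hivemind: the function names are hypothetical, the 3-second limit is copied from the error message, and hivemind itself compares clocks between peers rather than against an NTP server.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800
# Tolerance quoted in the error message; assumed here, not read from hivemind.
MAX_SKEW_SECONDS = 3.0


def parse_transmit_time(packet: bytes) -> float:
    """Return the server transmit timestamp (bytes 40..47) as Unix time."""
    if len(packet) < 48:
        raise ValueError("NTP reply must be at least 48 bytes")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(server: str = "pool.ntp.org", timeout: float = 5.0) -> float:
    """Estimate (server time - local time) in seconds with one SNTP query."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version=3, mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent_at = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(512)
        received_at = time.time()
    # Compare the server time against the midpoint of the round trip.
    return parse_transmit_time(reply) - (sent_at + received_at) / 2


def is_within_tolerance(offset: float, limit: float = MAX_SKEW_SECONDS) -> bool:
    """True if the absolute offset does not exceed the limit."""
    return abs(offset) <= limit
```

Calling clock_offset() needs network access to the NTP server. If is_within_tolerance(clock_offset()) returns False, the clock is probably far enough off to trigger the error above.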
-
The server starts loading blocks and then prints:
Killed
What should I do?
This happens because Windows does not allocate much RAM to WSL by default, so the server gets OOM-killed.
To increase the memory limit, go to
C:/Users/username
and create a file named .wslconfig with these contents:
[wsl2]
memory=12GB
Then reboot WSL (run
sudo reboot
in the WSL console, or
wsl --shutdown
from a Windows terminal) and it should work fine.
-
I get this error:
torch.cuda.OutOfMemoryError: CUDA out of memory
What should I do?
If you use an Anaconda env, run this before starting the server:
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
If you use Docker, add this argument after
--rm
in the Docker command:
-e "PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128"
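If the server is started from a Python wrapper instead of a shell, the same variable has to be in the environment of the process that loads PyTorch. The following is a minimal sketch of passing it to a child process and confirming that the child sees it. The helper names are hypothetical and the child here is a small probe, not the Petals server.

```python
import os
import subprocess
import sys

ALLOC_CONF = "max_split_size_mb:128"  # value taken from the commands above


def env_with_alloc_conf(value: str = ALLOC_CONF) -> dict:
    """Copy the current environment and add PYTORCH_CUDA_ALLOC_CONF to it."""
    env = dict(os.environ)
    env["PYTORCH_CUDA_ALLOC_CONF"] = value
    return env


def read_in_child(env: dict) -> str:
    """Start a child Python process and return the value it sees."""
    probe = "import os; print(os.environ.get('PYTORCH_CUDA_ALLOC_CONF', ''))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Passing env_with_alloc_conf() as the env argument when launching the real server command should have the same effect as the export line, assuming the variable is set before PyTorch initializes CUDA.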
-
The WSL clock tends to drift out of sync, which prevents the Petals server from launching and produces the error
hivemind.dht.protocol.ValidationError: local time must be within 3 seconds of others
To sync the WSL clock, run
sudo ntpdate pool.ntp.org
See more fixes discussed on Stack Overflow.
If your error is not covered here, let us know in Discord and we will help!