Balance Seems to Ignore Available Drive #329

Open
old-square-eyes opened this issue Jun 11, 2024 · 11 comments

@old-square-eyes

I had 3 of my smaller drives fill up, and a 4th (much bigger) drive has plenty of space. This is obviously due to file copies settings. I added a new drive with the GUI and restarted the service. greyhole -s sees it.

I ran a FSCK for good measure and it found a bunch of files without the correct number of copies and copied them to the new drive. Great.

So I wanted to do a balance. I kicked that off, and reviewing the live logs I immediately started to see my new drive (with free space) being rejected. Here is an example...

DEBUG balance:   Drives with available space: /srv/dev-disk-by-uuid-qwerty-fd90-40fc-8c2c-786ac2754860/gh (1.52TB avail) - /srv/dev-disk-by-uuid-qwerty-d070-4452-ae8e-9e72df22a300/gh (249GB avail) - /srv/dev-disk-by-uuid-qwerty-3398-4b68-9b50-26f3c606e39c/gh (105GB avail) 
Jun 11 22:54:30 DEBUG balance:   Drives with enough free space, but no available space: /srv/dev-disk-by-uuid-qwerty-7b04-4f23-84ba-7aa3e857fb51/gh (-2.16GB avail) - /srv/dev-disk-by-uuid-qwerty-1d2a-4bfb-8f90-450c4fcc7323/gh (-8.87GB avail) - /srv/dev-disk-by-uuid-qwerty-12b5-4877-a232-88063e4ff2dc/gh (-9.39GB avail) 
Jun 11 22:54:30 DEBUG balance: ││││ Target drive: /srv/dev-disk-by-uuid-qwerty-d070-4452-ae8e-9e72df22a300/gh (249GB available)
Jun 11 22:54:30 DEBUG balance: ││├┘ Target drive needs more available space; moving a file there would do the opposite. Skipping.

This is happening over and over.

@gboudreau
Owner

Hi. Restart the daemon, and look at the logs when it starts. There might be a warning there about this new drive.
Also, the log excerpt you pasted doesn't show the source file. Reading just that, I would think the source file is on the fd90 drive, which is why it is NOT picked as the target.

@old-square-eyes
Author

No warning. Here is a bigger log...

Jun 11 23:12:14 DEBUG sleep: Nothing to do... Sleeping.
Jun 11 23:12:24 INFO balance: Now working on task ID 4319268: balance /
Jun 11 23:12:24 INFO balance: Starting available space balancing
Jun 11 23:12:24 DEBUG balance: ┌ Will balance the shares in the following order: Photos, Videos, Music, Software, _Shared
Jun 11 23:12:24 DEBUG balance: ├┐ Balancing share: Photos (100MB or + files)
Jun 11 23:12:24 DEBUG balance: │├ Will work on the storage pool drives in the following order: /srv/dev-disk-by-uuid-qwerty-12b5-4877-a232-88063e4ff2dc/gh, /srv/dev-disk-by-uuid-qwerty-1d2a-4bfb-8f90-450c4fcc7323/gh, /srv/dev-disk-by-uuid-qwerty-7b04-4f23-84ba-7aa3e857fb51/gh, /srv/dev-disk-by-uuid-qwerty-3398-4b68-9b50-26f3c606e39c/gh, /srv/dev-disk-by-uuid-qwerty-d070-4452-ae8e-9e72df22a300/gh, /srv/dev-disk-by-uuid-qwerty-fd90-40fc-8c2c-786ac2754860/gh
Jun 11 23:12:24 DEBUG balance: │├┐ Balancing storage pool drive: /srv/dev-disk-by-uuid-qwerty-12b5-4877-a232-88063e4ff2dc/gh (-9.39GB available, target: 315GB)
Jun 11 23:12:24 DEBUG balance: │││ Found 469 files that can be moved.
Jun 11 23:12:24 DEBUG balance: ││├┐ Working on file: Photos/2016-11-13/100CANON/_MOV/MVI_2087.MOV (469MB)
Jun 11 23:12:24 DEBUG balance:   Drives with available space: /srv/dev-disk-by-uuid-qwerty-fd90-40fc-8c2c-786ac2754860/gh (1.52TB avail) - /srv/dev-disk-by-uuid-qwerty-d070-4452-ae8e-9e72df22a300/gh (249GB avail) - /srv/dev-disk-by-uuid-qwerty-3398-4b68-9b50-26f3c606e39c/gh (105GB avail) 
Jun 11 23:12:24 DEBUG balance:   Drives with enough free space, but no available space: /srv/dev-disk-by-uuid-qwerty-7b04-4f23-84ba-7aa3e857fb51/gh (-2.16GB avail) - /srv/dev-disk-by-uuid-qwerty-1d2a-4bfb-8f90-450c4fcc7323/gh (-8.87GB avail) - /srv/dev-disk-by-uuid-qwerty-12b5-4877-a232-88063e4ff2dc/gh (-9.39GB avail) 
Jun 11 23:12:24 DEBUG balance: ││││ Target drive: /srv/dev-disk-by-uuid-qwerty-d070-4452-ae8e-9e72df22a300/gh (249GB available)
Jun 11 23:12:24 DEBUG balance: ││├┘ Target drive needs more available space; moving a file there would do the opposite. Skipping.

@old-square-eyes
Author

This is interesting. The last two pool drives in this image should have spare space. Yet the image seems to say it needs space. There is nothing else on these drives.

image

@gboudreau
Owner

The only reason for balance to skip a drive that is listed first in the available drives is if this drive already has a copy of this file. I'm guessing you have multiple file copies on your Photos share, and the large drive already contains a copy of the file it is trying to move, and thus moving this file copy would not help.
Let it continue looking for files that could be moved to free space on your other drives, and it should find some (maybe on another share).

balance will try to free space on all drives, not just the ones that are missing available space. At the end of the balance process, all drives should have the same available space, so that when a new file gets added, it can go onto any of them.

@old-square-eyes
Author

old-square-eyes commented Jun 11, 2024

Any thoughts on the image above? It seems to show incorrect values. I added a second drive. This ran overnight (quite slowly) and files were not moved off the full drives to either of the new drives. I just got tens of thousands of the logs like above.

Like I said, a normal FSCK is copying files to the destination, indicating it's not a drive/permissions error.

You said it's likely because a copy already exists in the destination. But that's impossible as they are both new drives. Other than 1-2GiB dropped in from the FSCK, there are no other files. One drive has 100GiB spare, and the other 250GiB spare.

Could it be a bug/result related to the "Min. free space" having been breached on a couple of the full drives?

Another thought is that perhaps the very slow re-balance didn't finish properly. As you said, it trawls through ALL drives, even those not full. It was still running in the morning but seemed stopped about the time the daily FSCK was scheduled.

@gboudreau
Owner

Please detail your setup: list all drives, their size, and which one(s) you added. You mention adding a drive, then a second drive, but since you don't mention which one was added, and how large each drive is, it's very hard to figure out with the partial logs and screenshots, and thus it is very hard to try to help you understand what is happening.

@old-square-eyes
Author

old-square-eyes commented Jun 12, 2024

Last two were added recently.

Storage Pool
                                    Total -   Used =   Free +  Trash = Possible
  /srv/dev-disk-by-uuid-7323/gh:     916G -   915G =     1G +     0G =       1G
  /srv/dev-disk-by-uuid-7fb51/gh:    916G -   908G =     8G +     0G =       8G
  /srv/dev-disk-by-uuid-4860/gh:    3666G -  2099G =  1567G +     0G =    1567G
  /srv/dev-disk-by-uuid-ff2dc/gh:    916G -   915G =     1G +     0G =       1G
  /srv/dev-disk-by-uuid-2a300/gh:    292G -   203G =    90G +     0G =      90G
  /srv/dev-disk-by-uuid-6e39c/gh:    218G -    54G =   152G +     0G =     152G
  ===============================================================================
  Total:                            6924G -  5094G =  1819G +     0G =    1819G


image

@old-square-eyes
Author

The main thing I don't understand is that the logs are contradictory. First it selects a disk with room, then tells you it can't copy the file because there is no room.

Target drive: /srv/dev-disk-by-uuid-a2045ce5-3398-4b68-9b50-26f3c606e39c/gh (52.0GB available)
Jun 12 18:04:30 DEBUG balance: ││├┘ Target drive needs more available space; moving a file there would do the opposite. Skipping.

@gboudreau
Owner

By default, greyhole balances the available space on each drive. Meaning it tries to keep the quantity (in GB) of available space the same on all drives in your pool.

That will cause problems if you have hard drives that are smaller than the target available space... Because those drives will be kept (mostly) empty (unless you force Greyhole to put files there, for example with file_copies = max for specific shares, or sticky folders).

From the images you just posted, you can see that most drives have just a little free space available, except for 4860, that has a lot. That means that if you try to balance your storage pool, greyhole will try to find files on all of the other drives, that are NOT yet on that drive, to move those there. This is what the balance status image you first posted shows: red is the files it will try to remove from all drives, and green is where it wants to move those files to.
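As a rough illustration of the point above, the sketch below applies the "same available space on every drive" idea to the free-space figures from the Storage Pool table posted earlier in this thread. It assumes the target is simply the mean free space across the pool; Greyhole's actual calculation may differ (the log reports a target of 315GB), and `classify_drives` is a made-up name, not a Greyhole function.

```python
# Free space per drive in GB, copied from the "Storage Pool" table in this thread.
free_gb = {
    "7323": 1,
    "7fb51": 8,
    "4860": 1567,
    "ff2dc": 1,
    "2a300": 90,
    "6e39c": 152,
}


def classify_drives(free_by_drive):
    """Split drives into receivers and donors around a common free-space target.

    Assumption: the target is the plain mean of free space across the pool.
    Greyhole's real computation may differ.
    """
    target = sum(free_by_drive.values()) / len(free_by_drive)
    receivers = [d for d, free in free_by_drive.items() if free > target]
    donors = [d for d, free in free_by_drive.items() if free < target]
    return target, receivers, donors


target, receivers, donors = classify_drives(free_gb)
print(f"target free space: {target:.0f}G")
print("can receive files:", receivers)
print("should give up files:", donors)
```

With these figures the mean is about 303G, which is more than the total size of the 292G and 218G drives, so under this assumption neither of them can ever reach the target and only 4860 qualifies as a destination.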

And this:

Target drive: /srv/dev-disk-by-uuid-a2045ce5-3398-4b68-9b50-26f3c606e39c/gh (52.0GB available)
Jun 12 18:04:30 DEBUG balance: ││├┘ Target drive needs more available space; moving a file there would do the opposite. Skipping.

It is saying that the target drive that was selected, e39c, is not a good option, because balance wants to REMOVE files from that drive, not ADD files, and so it skips this file. This happens because the only possible target, 4860, already has that file stored (because you selected multiple copies), which I would guess is the case for most of your files.
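The skip decision described above can be sketched as follows. This is only a guess at the shape of the logic, not Greyhole's code: `pick_target` and its parameters are hypothetical, the numbers come from the earlier log excerpt (where fd90, d070 and 3398 are the redacted names of 4860, 2a300 and 6e39c), and the assumption that the large drive already holds a copy follows the explanation in this comment.

```python
def pick_target(avail_gb, holds_copy, target_avail_gb):
    """Return (drive, reason); drive is None when the file is skipped.

    avail_gb: drive name -> available space in GB
    holds_copy: drives that already store a copy of the file (incl. the source)
    target_avail_gb: available space that balance is aiming for on each drive
    All names here are hypothetical, not Greyhole's own.
    """
    # Drives that already hold a copy cannot receive another one.
    candidates = sorted(
        (d for d in avail_gb if d not in holds_copy),
        key=lambda d: avail_gb[d],
        reverse=True,
    )
    if not candidates:
        return None, "every drive already holds a copy"
    best = candidates[0]
    # A drive below the target should lose files, not gain them.
    if avail_gb[best] < target_avail_gb:
        return None, f"{best} needs more available space itself; skipping"
    return best, "ok"


# Available space (GB) as logged while balancing source drive 12b5, target 315GB.
avail = {"fd90": 1520, "d070": 249, "3398": 105,
         "7b04": -2.16, "1d2a": -8.87, "12b5": -9.39}

# Assumption: fd90 already has a copy of the file, so d070 is the best candidate.
print(pick_target(avail, holds_copy={"fd90", "12b5"}, target_avail_gb=315))
# If fd90 did not have a copy, it would be chosen and the move would go ahead.
print(pick_target(avail, holds_copy={"12b5"}, target_avail_gb=315))
```

The first call reproduces the logged behaviour: d070 is selected as the target and then rejected because it sits below the target itself.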

So to solve your problem, you need to add bigger drives to your pool. (Where did you find 292 and 218 GB drives..?) At least 1 TB drive(s), but 4 TB would be better.

@old-square-eyes
Author

old-square-eyes commented Jun 12, 2024

I see now I am interpreting the logs/visualisations slightly differently (incorrectly) in that I assumed each drive with available space would be available for filling, and would display remaining available capacity as such.

My requirement, that extra copies which need a separate physical drive can be placed there (up to the limit of available space) once new drives are added, doesn't seem to be catered for by the balance procedure.

I'd have thought the main purpose of re-balance would be to make use of the additional space added, up until the soft limit for that drive, all the while attempting to keep other drives below their respective min free space threshold. Or at least have an option to balance based on percentages, rather than some arbitrary average universal amount that doesn't take the relative drive capacity into account.
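To make the percentage idea concrete, here is a minimal sketch using the size and used figures from the Storage Pool table above. It is purely hypothetical: `percent_targets` is an invented name, nothing in this thread says Greyhole offers such a mode, and the sketch ignores the constraint that two copies of a file cannot share a drive.

```python
# (size, used) in GB per drive, from the "Storage Pool" table in this thread.
pool = {
    "7323": (916, 915),
    "7fb51": (916, 908),
    "4860": (3666, 2099),
    "ff2dc": (916, 915),
    "2a300": (292, 203),
    "6e39c": (218, 54),
}


def percent_targets(drives):
    """Hypothetical percentage-based balance: every drive aims for the pool-wide fill ratio.

    Returns the ratio and, per drive, the GB to move in (positive) or out (negative).
    """
    total_size = sum(size for size, _ in drives.values())
    total_used = sum(used for _, used in drives.values())
    ratio = total_used / total_size
    moves = {name: round(size * ratio - used) for name, (size, used) in drives.items()}
    return ratio, moves


ratio, moves = percent_targets(pool)
print(f"pool is {ratio:.0%} full")
for name, gb in moves.items():
    print(f"{name}: {'+' if gb > 0 else ''}{gb}G")
```

Under this scheme the two small drives would be allowed to take on data (roughly 12G and 106G) instead of being treated as drives to empty.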

I thought the whole point of Greyhole was to support building out a big pool full of random sized drives (at least that's what attracted me to it).

As it stands, I have three problems. 1. Usable space unable to be reached by the balance procedure, 2. lower than configured numbers of copies of certain shares, and 3. certain drives at 100% capacity with no easy way to reduce those to the min available free space threshold.

All good. I don't mean to sound ungrateful. Just thinking out loud. I guess I'll have to unravel this some other way.

@old-square-eyes
Author

old-square-eyes commented Jun 14, 2024

A development: it seems the GH queue had been stuck for quite some time (weeks or months), which I can only assume is why the daily fsck didn't help. I cleared it today and ran an fsck, and it found more copies of some files than my copies config allows (4 copies found where 3 were expected, for Photos). I believe this dates from when I had a SATA cable problem and a drive was offline for a while. After the fsck, enough space has been cleared on each drive that they are all back within the min free space threshold.


2 participants