"datastore: key not found" / "failed to lookup index for mh" #499
Comments
Hi @RobQuistNL, I'm seeing the same - currently re-initializing the dagstore to see if that resolves it. By the way, can you share the output of: after that run: see if there is any change in the size of dagstore/* I'm seeing that after I pass around 100GB in size |
I have not seen any deals stuck in Staged yet - but the markets actor takes forever to start. For the sake of clarity, I'm logging the how-to on re-initializing the dagstore here: https://lotus.filecoin.io/storage-providers/configure/dagstore/#first-time-migration |
I am trying to recover 3 shards here. I have removed and terminated 3 sectors and ran --recover afterwards to see if I can get it back on track. lotus-miner dagstore list-shards | grep Recovering |
I have just run the entire re-indexing (which took almost 2 days), but I still cannot retrieve some deals that are 100% active on chain. |
but the piecestore shows it:
|
@RobQuistNL I'm not sure what's going wrong here. As part of the DAG store initialization process the DAG store:
The output of initialize-all that you pasted shows that the indexing completed correctly for the piece. When the client tries to retrieve a file using the payload CID:
Something I noticed that seems odd in the output that you pasted:
The piece should not have a size of zero. I wonder if there's a problem with that particular CAR file. Are you seeing this same error when attempting to retrieve other pieces? |
Yes, this error happens with a lot of my deals. They have mostly been generated like so;
then I check the last line of the .out (which should have the dag root hash)
and then that one is being used to make deals & seal. I know it worked before - I was able to list the contents of that file using the I have ~12k car files which I did that way, here's another;
|
@RobQuistNL I'm going to do some testing today and try to reproduce this issue |
I can send you some example .car files if you want to seal one of them. Be aware that a TON of slingshot deals have been made this way (and the weird thing is that I'm positive I've seen the listing work on these deals before) |
Here's a list of provider IDs and retrievals that failed because of this bug:
This is from running my evergreen bot for about 1 hour (edit: 3 hours) |
I was able to repro on our miner, I'm debugging now |
So in my case the problem was actually that the deal failed, so it wasn't indexed correctly. What happens if you do
and then try to retrieve by payload cid?
Note that in this example you provided it may not be working because you're trying to retrieve by piece cid (instead of payload cid):
(initialize-all outputs piece cids, not payload cids) |
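To make the piece CID vs. payload CID distinction above concrete, here is a minimal sketch that guesses which kind of CID a string is before it is passed to a retrieve command. It relies only on the textual prefix: base32 piece CIDs (CommP) start with `baga6ea4seaq`, while payload CIDs usually start with `bafy`, `bafk`, or `Qm`. The function name `classify_cid` is hypothetical and not part of lotus; this is a heuristic, not a full CID parser.

```python
def classify_cid(cid: str) -> str:
    """Guess whether a CID string is a Filecoin piece CID or a payload CID.

    Hypothetical helper: a prefix heuristic only, not a real CID decoder.
    """
    # Piece CIDs are CIDv1 with the fil-commitment-unsealed codec; in base32
    # they always render with this fixed prefix.
    if cid.startswith("baga6ea4seaq"):
        return "piece"
    # Common payload CID prefixes: base32 CIDv1 (dag-pb / raw) and CIDv0.
    if cid.startswith(("bafy", "bafk", "Qm")):
        return "payload"
    return "unknown"


# The piece CID from the deal shown in this thread is classified as "piece",
# so it would not be the right argument for a retrieval by payload CID.
print(classify_cid("baga6ea4seaqndlukp7vphibmpxn32rjhjqt6yhw7oo3fs2ktdzkys4zmk437iiq"))
```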
This is exactly what I've stated before ( |
This is carfile # 23: deal on f01392893: Deal Status:
Pieces info:
CID info:
|
But yeah, if the index provider only receives "active" deals from the deal subsystem, then that's not going to work because that 2-year-old bug is snowballing through to the indexing subsystem. |
More related bugs: |
|
I'm not so familiar with this part of the code, I'm looking through how it works as we go. So it seems like there are a couple of stages that a deal goes through with the DAG store:
When you run So I think we need a command that will allow you to manually register a deal with the DAG store. I will add this as a ticket to Boost and close out this ticket. |
I initially opened this issue in the lotus project - shouldn't the root cause be fixed there instead of having a patch / workaround here? |
The DAG store is part of the markets subsystem, which is now being replaced by Boost. So the fix should be in Boost. |
Okay. I'm not really sure which ones are really "Errored", but it's safe to say there's a lot of data that actually is available that's now unretrievable because of this (or an underlying) issue:
|
Closing in favour of #509 |
I mean if there's a way that we can patch it with a command like this, maybe we can at least restore our dagstore to a normal state using that. The underlying issue still seems to exist though. Here's a recent deal that's active and probably unretrievable:
master@daemon:~$ lotus state get-deal 5816931
{
"Proposal": {
"PieceCID": {
"/": "baga6ea4seaqndlukp7vphibmpxn32rjhjqt6yhw7oo3fs2ktdzkys4zmk437iiq"
},
"PieceSize": 34359738368,
"VerifiedDeal": true,
"Client": "f01817237",
"Provider": "f01392893",
"Label": "uAXASILL83AGD7C185VjkO5CHUFTA0gKNLpeEgCGdY3fNRbP9",
"StartEpoch": 1813627,
"EndEpoch": 3237918,
"StoragePricePerEpoch": "0",
"ProviderCollateral": "5911184172583188",
"ClientCollateral": "0"
},
"State": {
"SectorStartEpoch": 1773590,
"LastUpdatedEpoch": -1,
"SlashEpoch": -1
}
}
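The `Label` in the output above begins with `u`, which is the multibase prefix for unpadded base64url. As a sketch, assuming the label holds the payload CID (a common convention, but not something the chain enforces), it can be re-encoded into the base32 form that retrieval commands usually take. The helper name `label_to_base32_cid` is hypothetical.

```python
import base64


def label_to_base32_cid(label: str) -> str:
    """Re-encode a multibase base64url ('u') CID string as base32 ('b').

    Hypothetical helper; assumes the deal label is a CID, which is only a
    convention.
    """
    if not label.startswith("u"):
        raise ValueError("expected a multibase base64url ('u') string")
    body = label[1:]
    # Multibase 'u' omits padding; restore it for the stdlib decoder.
    raw = base64.urlsafe_b64decode(body + "=" * (-len(body) % 4))
    # Multibase 'b' is lowercase RFC 4648 base32 without padding.
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


# Label copied from the deal above; the result is the candidate payload CID.
print(label_to_base32_cid("uAXASILL83AGD7C185VjkO5CHUFTA0gKNLpeEgCGdY3fNRbP9"))
```

If the label is not a CID at all (some clients put arbitrary text there), the output will not be meaningful, so the result is worth checking against the client's own record of the payload CID.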
|
I can provide a list of deal CIDs / deal IDs that are active on chain but are marked as failed by doing some simple CLI stuff with the tools available now - is there a way I can allow these to work in my "legacy" lotus markets setup? Or do I need to wait on #509? |
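A rough sketch of the "active on chain" check mentioned above, applied to JSON in the shape printed by `lotus state get-deal`. The working assumptions, which are mine and not taken from the lotus code: a deal counts as active when `SectorStartEpoch` is not -1, `SlashEpoch` is -1, and the current epoch is before `EndEpoch`. The function name `is_active_on_chain` and the `current_epoch` value are hypothetical.

```python
import json


def is_active_on_chain(deal: dict, current_epoch: int) -> bool:
    """Return True if the deal looks active under the assumptions above."""
    proposal, state = deal["Proposal"], deal["State"]
    activated = state["SectorStartEpoch"] != -1   # -1 means never activated
    not_slashed = state["SlashEpoch"] == -1       # -1 means not slashed
    not_expired = current_epoch < proposal["EndEpoch"]
    return activated and not_slashed and not_expired


# Trimmed copy of the get-deal output for deal 5816931 from this thread.
deal = json.loads("""
{
  "Proposal": {"StartEpoch": 1813627, "EndEpoch": 3237918},
  "State": {"SectorStartEpoch": 1773590, "LastUpdatedEpoch": -1, "SlashEpoch": -1}
}
""")

# 2_000_000 is a made-up epoch used only for illustration.
print(is_active_on_chain(deal, current_epoch=2_000_000))
```

Running this over every deal ID and comparing against the deals the markets node reports as failed would give the list described in the comment.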
I can safely say about 70% of my deals are currently unretrievable. Looking at other providers, I feel like they have similar values. |
@kernelogic This is fixed in boost if you want to give it a try. Otherwise, it should be available in upcoming lotus stable release. |
Now that 1.17.0 is finally stable; Before:
Then ran:
Then output was:
So this fixes the issue. Now to automate it :) |
Checklist
Latest release, or the most recent RC (release candidate) for the upcoming release or the dev branch (master), or have an issue updating to any of these.
Lotus component
Lotus Version
Describe the Bug
I have a host that
But when trying to retrieve the data from the host, the retrieve command returns the following;
Is the markets actor broken? Does it not know where to find the data? How can we repair it?
Logging Information
Repo Steps