lotus-miner deal status not in sync with on chain status of deals #185
Comments
Hi @f8-ptrk Thanks for creating the ticket. |
this is great ux feedback - transferring to boost so they will make sure they cover this in their design. In lotus = we should make sector fsm states up to date for |
@Reiers this should actually live in the lotus repo |
I have the same issue. It's been happening a LOT more lately, and since we're only doing deals to increase storage, it's becoming a big problem as I have to check manually whether each errored deal has gone through :-(
Storage Deal status (lotus-miner storage-deals list -v):
error awaiting deal pre-commit: failed to set up called handler: called check error (h: 1570875): failed to look up deal on chain: deal 3964785 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed
Market node logs (with the hand off first):
2022-02-21T19:58:30.126+1100 INFO providerstates providerstates/provider_states.go:329 handing off deal to sealing subsystem {"pieceCid": "baga6ea4seaqpm5ipi346kviurqgeglar3qkg4yn2r32fb5aanuza3z24jghymky", "proposalCid": "bafyreihr743zllr2eckgfiweouiap7pgcjnoyqldqa3mg3t75jjt7sfcpu"}
2022-02-21T20:00:54.519+1100 INFO providerstates providerstates/provider_states.go:376 successfully handed off deal to sealing subsystem {"pieceCid": "baga6ea4seaqpm5ipi346kviurqgeglar3qkg4yn2r32fb5aanuza3z24jghymky", "proposalCid": "bafyreihr743zllr2eckgfiweouiap7pgcjnoyqldqa3mg3t75jjt7sfcpu"}
2022-02-21T20:00:54.522+1100 INFO markets loggers/loggers.go:20 storage provider event {"name": "ProviderEventDealHandedOff", "proposal CID": "bafyreihr743zllr2eckgfiweouiap7pgcjnoyqldqa3mg3t75jjt7sfcpu", "state": "StorageDealAwaitingPreCommit", "message": ""}
2022-02-21T20:06:01.555+1100 INFO markets loggers/loggers.go:20 storage provider event {"name": "ProviderEventDealPrecommitFailed", "proposal CID": "bafyreihr743zllr2eckgfiweouiap7pgcjnoyqldqa3mg3t75jjt7sfcpu", "state": "StorageDealFailing", "message": "error awaiting deal pre-commit: failed to set up called handler: called check error (h: 1570875): failed to look up deal on chain: deal 3964785 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed"}
2022-02-21T20:06:01.559+1100 WARN providerstates providerstates/provider_states.go:561 deal bafyreihr743zllr2eckgfiweouiap7pgcjnoyqldqa3mg3t75jjt7sfcpu failed: error awaiting deal pre-commit: failed to set up called handler: called check error (h: 1570875): failed to look up deal on chain: deal 3964785 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed
2022-02-21T20:06:03.950+1100 INFO markets loggers/loggers.go:20 storage provider event {"name": "ProviderEventFailed", "proposal CID": "bafyreihr743zllr2eckgfiweouiap7pgcjnoyqldqa3mg3t75jjt7sfcpu", "state": "StorageDealError", "message": "error awaiting deal pre-commit: failed to set up called handler: called check error (h: 1570875): failed to look up deal on chain: deal 3964785 not found - deal may not have completed sealing before deal proposal start epoch, or deal may have been slashed"} |
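Since the manual check described in the comment above starts from the deal ID in the error text, a small script could collect those IDs from saved log output. This is only a sketch: the pattern is based solely on the message wording quoted in this thread, the function name `failed_deal_ids` is made up for illustration, and other lotus versions may word the error differently.

```python
import re

# Pattern for the lookup failure quoted in this thread (assumed wording;
# it may differ between lotus/markets versions).
FAILED_LOOKUP = re.compile(
    r"called check error \(h: (?P<height>\d+)\): "
    r"failed to look up deal on chain: deal (?P<deal_id>\d+) not found"
)


def failed_deal_ids(log_text):
    """Return sorted, de-duplicated (deal_id, height) pairs found in log_text."""
    pairs = {
        (int(m.group("deal_id")), int(m.group("height")))
        for m in FAILED_LOOKUP.finditer(log_text)
    }
    return sorted(pairs)


# Sample text copied from the error message above.
sample = (
    "error awaiting deal pre-commit: failed to set up called handler: "
    "called check error (h: 1570875): failed to look up deal on chain: "
    "deal 3964785 not found - deal may not have completed sealing"
)
print(failed_deal_ids(sample))
```

Each ID returned this way still has to be compared against the chain separately, for example with `lotus state get-deal` if your lotus version provides it.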
is this still an issue with boost? |
I haven't seen it happening on boost, so I think it might have been a market node problem. I would close it. We can start a new issue if it appears on boost. |
80% of miners are still using Markets. It's still an issue. |
i think markets are dead and dev is fully on boost by now? |
I'm seeing this issue on more than one monolith miner+markets setups. |
@f8-ptrk @RobQuistNL Yeah, I don't think there is any more dev on markets. Boost v1 just got released and they are working on a boost version that will work with the v16 network upgrade, so I'm going to make a guess that market will be dead after v16 (?). @marshyonline those issues seem to have been fixed in boost |
seem to or have been fixed? will this clean up the mess the market v1 created? |
I've been on boost for a couple of months now, and the error has not appeared. |
hey @dirkmc - are you the DRI for fully resolving this issue now that boost is GA and the default path going forward? I want to make sure we resolve this sync issue more holistically than this new "register shard" one-off command (#517) currently does. Effectively, we should ensure there is a process that audits the dagstore and repairs shards automatically, not wait for an SP to detect and manually fix an issue. Please LMK if you need more support from the lotus team on this - but I hear this is still a major issue for Evergreen SPs effectively serving retrieval issues - and therefore think this should be a P1 with a clear, dedicated DRI. |
Switching from markets to Boost will resolve the "failed to lookup deal on chain" problem. With regards to syncing the deal state from the state of the chain, we have an open ticket to do so: Investigate syncing Boost with the latest state of the miner and/or chain. |
this is not a "failed to look up deal on chain" problem. this is a local database problem. the deal actually never fails - it fails to be recorded in the local database. the only way to make this clean is to sync the local database with what's on chain and then sync with the sector store. everything else will fail again in the future. in the end it's a design problem of using multiple databases that record the "same information" and aren't able to stay in sync! |
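The kind of sync proposed in the comment above can be sketched as a comparison between local deal records and the set of deals that exist on chain. Everything here is a hypothetical illustration, not how lotus, markets, or boost store deals: the dict shape, the `reconcile` name, and the on-chain ID set are assumptions, and only the state strings are borrowed from the markets logs and state names.

```python
def reconcile(local_states, on_chain_deal_ids):
    """Find local deals marked as errored that actually exist on chain.

    local_states: dict of deal ID -> local state string (hypothetical shape).
    on_chain_deal_ids: set of deal IDs known to be on chain (hypothetical input).
    Returns a dict of deal ID -> corrected state for the stale records only.
    """
    corrections = {}
    for deal_id, state in local_states.items():
        # A deal recorded locally as failed but present on chain is stale.
        if state == "StorageDealError" and deal_id in on_chain_deal_ids:
            corrections[deal_id] = "StorageDealActive"
    return corrections


local = {
    3964785: "StorageDealError",   # errored locally, sealed anyway
    1000001: "StorageDealError",   # genuinely failed, not on chain
    1000002: "StorageDealActive",  # already consistent
}
on_chain = {3964785, 1000002}
print(reconcile(local, on_chain))
```

A real fix would also have to reconcile against the sector store, as the comment notes; this sketch only covers the chain comparison.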
The underlying issue is that after the deal has been submitted to the sealing node, markets attempts to look up the deal state on chain, and this lookup fails. |
Let me clarify that last sentence. Boost doesn't do the lookup for deals made with v1.2.0 of the deal proposal protocol. One straight-forward fix would be for us to modify the legacy markets code base such that it will no longer try to lookup the state of the deal on chain after handing it off to the sealing subsystem. My understanding is that people are not really interested in checking the state of the deal on chain by using |
I would like my local miner to know what deals are on there without needing to parse the entire chain or use an external API to do that. It would also be nice if we were able to retrieve all the exabytes of storage that are on the network now from the slingshot events - a lot of that now is not retrievable and if I'm reading this correctly you're saying its not ever going to work because they are old deals. |
I don't mind not having the "storage-deals list" on the miner since this function is now available on the boost node. |
i pretty much care to not have that call removed from the lotus-miner! boost is an option to use and we shouldn't rely on it being present! i agree with rob: without this being fixed for past deals we have a huge problem as we have 100PB dead data on the chain that cannot be retrieved in the end |
Two separate issues here:
|
where does the wrong data reside right now? in the miner or in markets/boost? who is responsible for the underlying database that causes the problems? that's where it needs to be fixed for past deals. |
I agree that needs a fix. Let’s put this into perspective - the proofs and lotus teams have put great effort into implementing snap up, which potentially allows us to activate all the CC out there. Now, on the other hand, we cannot prioritise fixing broken deal data? Then we might just as well count all this as dead data. It’s even worse than CC, as it cannot be snapped up, and it cannot be retrieved. I guess I just don’t buy that we won’t get a fix because it is boring to fix old stuff rather than build new features. |
@dirkmc Afaik @benjaminh83 while I understood your point, I think it’s important to point out that the core devs/maintainers for boost were also the main contributors to market protocol v1 & lotus market. The same team is now not only developing a new deal making software, but also developing a new version of market protocol. It’s common that certain bug fixes are only included in the newer version of protocol/software, without backporting to the older/potentially-to-be-deprecated version.
|
it's critical as a lot of deals are relying on first being able to be retrieved (either evergreen or avoiding re-transfer for renewals). evergreen records a 98% fail rate for retrievals afaik for example. Most of the slingshot deals cannot be retrieved. as a storage provider, with that bug in existence, it is hard to make serious storage deals knowing that the data cannot be retrieved by the client anymore. the problem is that it affects possibly a lot of old deals and boost doesn't seem to fix the issue for old deals and they stay irretrievable forever - are dead weight, as benjamin said. |
Shouldn't this be fixed by manually being able to reimport a missing piece? LexLuthr already backported this function: filecoin-project/lotus#8645 If I'm not mistaken (I haven't tried 1.17.0-rc1 yet, but will soon) - we can use this version of lotus to;
and then we should be able to retroactively fix all the broken ones? Once again I'm not sure but if its a one off thing (that we can potentially simply throw into the dagstore migration code) we should be able to add that in (and maybe even run it once a week, I'm fine with that workaround) |
Sounds like there are two issues here:
We're planning to write a As Jennifer pointed out, we are a small team with limited resources so it's a question of what to prioritize. After working on the existing codebase for about a year we realized that
To fix this issue we'll need to spend a significant amount of time to figure out what's causing the problem, fix and test, and then you'll need to wait for the next stable release of lotus to upgrade. We've now released Boost v1.0.0 and we'd like to help SPs switch over. Please let us know in the #boost-help channel if there are things that are holding you back from upgrading and we'll try to address them. |
there is no boost version for v16 that could be tested on calibration net - that's a show stopper. for storage providers the issue is: retrievals for a lot of deals, most likely the majority, on the network don't work. for storage providers it is not relevant what happens to future deals, but what happens to the ones that are already stored. sure, future deals are important, but the work of the last 12+ months was in the end for nothing if stuff that was stored, 100PB, cannot be retrieved. |
Release v1.1-rc1 targets lotus version 16. We're working on fixing a couple of bugs in it, hopefully we'll have v1.1-rc2 today or tomorrow.
We're planning to write a |
P0! @jennijuju looking at it and someone will try it out |
Just chiming in here - it is absolutely a priority that the 100PiB of data that is currently stored on Filecoin (ex major datasets like below⬇️) are retrievable, and we fix the bugs in retrieving these v1.1.0 deals. Great to fix this in boost (yay), but we also need retrievals on old deals to work. The priority of the CLI command for listing local deals is definitely less important than retrievals working for all old deals, imho - so if that's the fastest way to fix that for now, I would be supportive. But I also agree with @RobQuistNL's comment that ideally there's a fix that does still allow listing out old deals (afaik Boost will only list out deals made with boost, and there are a lot of deals out there that have already been made and deserve support, attention, love, and proper consideration.) |
In the Boost UI you can see deals made with v1.1.0 of the deal proposal protocol (legacy deals) and deals made with v1.2.0 of the deal proposal protocol. See https://boost.filecoin.io for screenshots. Note that in Boost, deals made with either version of the protocol are retrievable. This issue is open against the Boost repo so I've been discussing it in terms of how to fix it in Boost.
This command is available in v1.17.x of lotus |
Just to be clear we are committed to making sure all deals can be retrieved, both new deals and deals made in the past, whether SPs are running lotus or boost. |
+1 for giving those deals plenty of love... |
hey, this is Charles from Filswan team, we suffer from this problem when we do cross chain storage, because the sync status is not accurate on lotus client, we cannot unlock user fund. something like this:
lotus 1.5 |
@flyworker - can you try the command @dirkmc mentioned in #185 (comment) and see if that manually fixes the issue you're seeing? |
1.17.x is not in a stable state yet @momack2 |
Hi Everyone, as this issue is currently referencing several issues (chain status not in sync, and multiple, separate retrieval problems), we've separated out the larger retrieval problem into a tracking issue, #645, and will be aggregating all bugs, fixes, and new features we develop to resolve these issues. We'll be posting updates to there at least weekly, so please subscribe to that issue if you want to follow updates. Our goal is to leverage that issue to better holistically track the problem. Thank you to everyone who's been submitting issues in Boost and Lotus, we'll be cross referencing issues from Lotus and Slack over the next few days, and ensure they're tracked in #645. Improving retrieval is currently the priority for the team, for all deals (old and new), with an emphasis on supporting deal renewal programs. While our primary focus will be landing fixes in Boost, as we can ship these more frequently, we will be ensuring that we have workarounds, at minimum, in Lotus markets for critical issues. |
Checklist
Latest release, or the most recent RC (release candidate) for the upcoming release or the dev branch (master), or have an issue updating to any of these.
Lotus component
Lotus Version
Describe the Bug
when lotus-miner declares deals as erroneous and then seals them anyway, the deal state in the list output for deals and the active deals count in the info output do not represent the chain state of those deals.
short: lotus-miner does not show active deals as active.
might be more than one bug that leads to this phenomenon.
Logging Information
Repo Steps