Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retrieval truncates car file causing - ERROR: given data does not match expected commP - on import #8663

Closed
10 of 18 tasks
scaseye opened this issue May 17, 2022 · 13 comments
Closed
10 of 18 tasks
Assignees

Comments

@scaseye
Copy link

scaseye commented May 17, 2022

Checklist

  • This is not a security-related bug/issue. If it is, please follow please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file a improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the Latest release, or the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

lotus version 1.15.2+mainnet+git.518dc962e

Describe the Bug

lotus client retrieve --provider f09848 --maxPrice 0 --allow-local --car 'QmSEQBHK9n52iZUFYRWs95bUbbd4o8dftUWMGYgE8wW2fc' $(pwd)/baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car

downloads a car file that does not match the original car file, and thus generates the mismatched commP.

This is what lotus retrieved...

$ ls -l baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car
-rw-r--r-- 1 scaseye scaseye 6436657574 May 16 11:47 baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car

$ sha256sum baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car
1a09d2a3703da1dc47920bf293924ac67e0dcb9e48f23ece7799d1f589cae2f2 baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car

$ go/bin/stream-commp < baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car

CommPCid: baga6ea4seaqlxgfjrxftuymz4khfpyrnpp6korwssctvnbnpp62ng4v6adlp4ba
Payload:          6436657574 bytes
Unpadded piece:   8522825728 bytes
Padded piece:     8589934592 bytes

CARv1 detected in stream:
Blocks:     13968
Roots:          1
    1: QmSEQBHK9n52iZUFYRWs95bUbbd4o8dftUWMGYgE8wW2fc

$ lotus-miner storage-deals import-data bafyreihvlxsjvaxlye3ugtaz4q4zuejpddu4u2uuhwr7nmwoqrodbqufbu $(pwd)/baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car

ERROR: given data does not match expected commP (got: baga6ea4seaqlxgfjrxftuymz4khfpyrnpp6korwssctvnbnpp62ng4v6adlp4ba, expected baga6ea4seaqarmfjs4i554ghukgg6dqwp7vrpinxqtdn7oyupbiqjkltqzeo2ji)

$ tail -c 16 baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car | hexdump -C

00000000  65 66 0a 33 30 35 37 30  32 0a 25 25 45 4f 46 0a  |ef.305702.%%EOF.|
00000010

$ head -c 128 baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car | hexdump -C

00000000  38 a2 65 72 6f 6f 74 73  81 d8 2a 58 23 00 12 20  |8.eroots..*X#.. |
00000010  39 d4 d5 03 20 b5 b0 d0  05 d7 45 ae 36 14 8b 18  |9... .....E.6...|
00000020  95 38 48 c4 b9 bb 42 e1  b7 a4 3b 6c 5c d8 5c eb  |.8H...B...;l\.\.|
00000030  67 76 65 72 73 69 6f 6e  01 bc f6 28 12 20 39 d4  |gversion...(. 9.|
00000040  d5 03 20 b5 b0 d0 05 d7  45 ae 36 14 8b 18 95 38  |.. .....E.6....8|
00000050  48 c4 b9 bb 42 e1 b7 a4  3b 6c 5c d8 5c eb 12 41  |H...B...;l\.\..A|
00000060  0a 24 01 55 12 20 a7 60  c9 cb 6d ea c3 78 9c 58  |.$.U. .`..m..x.X|
00000070  ae 69 e8 ee a3 cd 04 97  1c db 9d 4b 45 23 0a 1a  |.i.........KE#..|
00000080

This is what it should have been

$ ls -l baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car
-rw-r--r-- 1 lotus lotus 6436657579 May 16 17:15 baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car
$ sha256sum baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car
bfdacf77179631286a48c8eb8f62b33596404ae42b7dab752ec825b2d92b1025 baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car

$ go/bin/stream-commp < baga6e~tqzeo2ji__QmSEQB~gE8wW2fc.car

CommPCid: baga6ea4seaqarmfjs4i554ghukgg6dqwp7vrpinxqtdn7oyupbiqjkltqzeo2ji
Payload:          **6436657579** bytes
Unpadded piece:   8522825728 bytes
Padded piece:     8589934592 bytes

CARv1 detected in stream:
Blocks:     13969
Roots:          1
    1: QmSEQBHK9n52iZUFYRWs95bUbbd4o8dftUWMGYgE8wW2fc

Logging Information

see above.

Repo Steps

  1. Run '...'
  2. Do '...'
  3. See error '...'
    ...
@RobQuistNL
Copy link
Contributor

can you maybe reformat your message, its really hard to read like this

@TippyFlitsUK
Copy link
Contributor

Hey @scaseye

Thank you for your detailed report.

As @RobQuistNL mentioned, this is now rather difficult to make sense of. Can you please either reformat the original post or create a new ticket.

Many thanks!

@TippyFlitsUK TippyFlitsUK added kind/enhancement Kind: Enhancement need/author-input Hint: Needs Author Input area/markets/retrieval and removed need/triage kind/bug Kind: Bug labels May 17, 2022
@TippyFlitsUK TippyFlitsUK self-assigned this May 17, 2022
@scaseye
Copy link
Author

scaseye commented May 17, 2022

i have no idea how the strike through text formatting is there... i tried to clean up hope that helps

@RobQuistNL
Copy link
Contributor

Wrap it in backticks; `

@scaseye
Copy link
Author

scaseye commented May 17, 2022

@RobQuistNL happy? :)

@github-actions
Copy link

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 24 hours.

@TippyFlitsUK TippyFlitsUK added need/analysis Hint: Needs Analysis and removed need/author-input Hint: Needs Author Input labels May 21, 2022
@stuberman
Copy link

stuberman commented May 23, 2022

I see the same issue with EverGreen generated deals:

boostd storage-deals import-data bafyreianx2fqttknjqho4xq4ekbemfaoxci7nlacryryl7v23a7vbuww2i /market/eg/baga6e~6b6y5mpi__QmWvBb~p7dDbYyc.car

Error: given data does not match expected commP (got: baga6ea4seaqfzfysi5ggaktnmbgmifwv77r2evtfsnz2il6e5qlnto4jo5qcaoa, expected baga6ea4seaqorn5ftgybsjgworvsrzwr5pzmgpyidwdakdrbliasjoe6b6y5mpi

From echo curl -sLH \"Authorization: $( ./fil-spid.bash f01278 )\" https://api.evergreen.filecoin.io/pending_proposals | sh

"deal_proposal_cid": "bafyreianx2fqttknjqho4xq4ekbemfaoxci7nlacryryl7v23a7vbuww2i",
"hours_remaining": 67,
"piece_size": 34359738368,
"piece_cid": "baga6ea4seaqorn5ftgybsjgworvsrzwr5pzmgpyidwdakdrbliasjoe6b6y5mpi",
"root_cid": "QmWvBbwDQia7o8XfzXGMKypY11gZVa2D5T88Qip7dDbYyc",
"deal_start_time": "2022-05-26T16:00:00Z",
"deal_start_epoch": 1842480,
"sample_import_cmd": "lotus-miner storage-deals import-data bafyreianx2fqttknjqho4xq4ekbemfaoxci7nlacryryl7v23a7vbuww2i baga6e ~6b6y5mpi__QmWvBb ~p7dDbYyc.car",
"sources": [

@stuberman
Copy link

@ribasushi also reported this:

I ran the exact command you gave me from f01345523 above, the result matches all
~$ < baga6e~6b6y5mpi__QmWvBb~p7dDbYyc.car ~/go/bin/stream-commp

CommPCid: baga6ea4seaqorn5ftgybsjgworvsrzwr5pzmgpyidwdakdrbliasjoe6b6y5mpi
Payload: 27941024465 bytes
Unpadded piece: 34091302912 bytes
Padded piece: 34359738368 bytes

CARv1 detected in stream:
Blocks: 107189
Roots: 1
1: QmWvBbwDQia7o8XfzXGMKypY11gZVa2D5T88Qip7dDbYyc

@ribasushi
Copy link
Collaborator

What @stuberman and I ran/downloaded are:

lotus client retrieve --provider f01345523 --maxPrice 0 --allow-local --car 'QmWvBbwDQia7o8XfzXGMKypY11gZVa2D5T88Qip7dDbYyc' $(pwd)/baga6e~6b6y5mpi__QmWvBb~p7dDbYyc.car
lotus client retrieve --provider f08403 --maxPrice 0 --allow-local --car 'QmWvBbwDQia7o8XfzXGMKypY11gZVa2D5T88Qip7dDbYyc' $(pwd)/baga6e~6b6y5mpi__QmWvBb~p7dDbYyc.car

I ran only the first one and got the correct car ( md5sum: 47011c8018ed58f1fcaf3037e2ea72ba )

@stuberman ran both and got a corrupted file both times

I suspect (only suspect) a corrupted blockstore with the hash verification somehow not firing, always resulting in the incorrect output, since the local bad block is reused.

@LexLuthr
Copy link
Contributor

@ribasushi I was able to download the file without any problems and md5sum is the same as above. I used only first method. Let me know if you want me to test the second as well.

@LexLuthr
Copy link
Contributor

LexLuthr commented Jul 7, 2022

Relevant thread in slack: https://filecoinproject.slack.com/archives/C0377FJCG1L/p1657019175554179

Observation:

  1. Same file can have multiple commP after retrieval, i.e. changes(corruption) to the file are not aways constant.
  2. The problem seems to be coming from CAR files differ from the original by one CID — bafkqaaa which is definitely an identity cid. (https://cid.ipfs.io/#bafkqaaa)
  3. This will probably not impact the booster-http transfers.

@rvagg
Copy link
Member

rvagg commented Aug 29, 2022

Thanks @jacobheun for connecting the dots to this issue. Over @ filecoin-project/boost#715 (comment) I outlined a scenario that could cause this problem—retrieving an entire piece and not getting a byte-perfect CAR that can generate the same CommP that was used to store it in the first place. The fix should be to add carv2.StoreIdentityCIDs(true) to the optiosn @ https://github.com/filecoin-project/go-fil-markets/blob/10e86a051a7d5c8dc2b35cb8fd6ef290e480cb02/stores/rw_bstores.go#L41. It does require some consideration though, and I think we need to clear up some ambiguity as identity CIDs relate to CommP:

  • The filecoin spec for online deals should probably be clear about whether identity CIDs are to be stored in the CAR that is used to generate CommP (we're kind of stuck with this for historical reasons, it's a waste of space, but does have some benefits for indexing).
  • The filecoin spec should probably clarify how identity CIDs are treated when they are the root of a DAG (what we're dealing with in the linked issue)
  • Lotus itself should be internally consistent. On one hand, for importing, it'll happily create a DAG with identity CIDs (
    b := cidutil.InlineBuilder{
    Builder: prefix,
    Limit: 126,
    }
    ) but on the other hand it'll retrieve those DAGs into a CAR that doesn't contain them.

The big problem here is that it's not actually obvious what the "right" behaviour is. But we do know what the historical precedent is so we should probably just follow that and bake it into well-documented specs.

@rjan90
Copy link
Contributor

rjan90 commented Jun 3, 2024

Closing this ticket as the Legacy Lotus/Lotus-Miner Markets sub-system reached EOL at the end of the 31st January 2023, and is being completely removed from the code here:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

8 participants