
Adopt possibly incomplete NeoFS SEARCH results in NeoFSBlockFetcher and upload-bin CLI command #3645

Open
AnnaShaleva opened this issue Oct 24, 2024 · 8 comments
Labels
blocked Can't be done because of something bug Something isn't working I4 No visible changes S4 Routine U1 Critically important to resolve quickly
Milestone

Comments

@AnnaShaleva
Member

Current Behavior

Some blocks are uploaded to NeoFS, then the script restarts. After the restart, the script starts uploading from block 0 instead of from the latest incomplete batch:

2024-10-23 14:46:48.603	Chain block height: 6231784
2024-10-23 14:47:00.828	Processing batch from 0 to 9999
2024-10-23 14:47:00.828	First block of latest incomplete batch uploaded to NeoFS container: 0
2024-10-23 14:50:56.510	Processing batch from 10000 to 19999
2024-10-23 14:50:56.510	Successfully uploaded batch of blocks: from 0 to 9999
2024-10-23 14:55:19.178	Processing batch from 20000 to 29999
2024-10-23 14:55:19.178	Successfully uploaded batch of blocks: from 10000 to 19999
2024-10-23 15:00:07.874	Processing batch from 30000 to 39999
2024-10-23 15:00:07.874	Successfully uploaded batch of blocks: from 20000 to 29999
2024-10-23 15:03:35.567	Processing batch from 40000 to 49999
2024-10-23 15:03:35.567	Successfully uploaded batch of blocks: from 30000 to 39999
2024-10-23 15:04:49.432	Chain block height: 6231850
2024-10-23 15:07:34.927	Processing batch from 0 to 9999
2024-10-23 15:07:34.927	First block of latest incomplete batch uploaded to NeoFS container: 0
2024-10-23 15:12:00.468	Processing batch from 10000 to 19999
2024-10-23 15:12:00.468	Successfully uploaded batch of blocks: from 0 to 9999
2024-10-23 15:16:50.336	Processing batch from 20000 to 29999
2024-10-23 15:16:50.336	Successfully uploaded batch of blocks: from 10000 to 19999
2024-10-23 15:21:10.256	Processing batch from 30000 to 39999
2024-10-23 15:21:10.256	Successfully uploaded batch of blocks: from 20000 to 29999
2024-10-23 15:25:27.258	Processing batch from 40000 to 49999
2024-10-23 15:25:27.258	Successfully uploaded batch of blocks: from 30000 to 39999
2024-10-23 15:30:14.827	Processing batch from 50000 to 59999

The pattern repeats.

Expected Behavior

Re-upload must start from the latest incomplete batch.
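The logs show fixed batches of 10000 blocks that start at multiples of 10000, so resuming from the latest incomplete batch comes down to rounding the first missing block index down to a batch boundary. A minimal sketch, assuming the batch size of 10000 seen in the logs; `batchBounds` is a hypothetical helper, not neo-go code:

```go
package main

// batchSize matches the 10000-block batches visible in the logs
// (e.g. "Processing batch from 40000 to 49999"); it is an assumption here.
const batchSize = 10000

// batchBounds returns the first and last block index of the batch that
// contains the given block index.
func batchBounds(index uint32) (start, end uint32) {
	start = index / batchSize * batchSize
	return start, start + batchSize - 1
}
```

For example, a first missing block 45123 maps to the batch 40000 to 49999, which is where the upload should resume.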

Possible Solution

Find the problem in fetchLatestMissingBlockIndex and fix it.

@AnnaShaleva AnnaShaleva added bug Something isn't working U1 Critically important to resolve quickly S4 Routine I4 No visible changes labels Oct 24, 2024
@AnnaShaleva AnnaShaleva added this to the v0.106.4 milestone Oct 24, 2024
@AnnaShaleva
Member Author

One more example:

2024-10-24 06:10:41.740	Successfully uploaded batch of blocks: from 1960000 to 1969999
2024-10-24 06:14:17.712	Processing batch from 1980000 to 1989999	
2024-10-24 06:14:17.712	Successfully uploaded batch of blocks: from 1970000 to 1979999	
2024-10-24 06:18:11.192	Processing batch from 1990000 to 1999999	
2024-10-24 06:18:11.192	Successfully uploaded batch of blocks: from 1980000 to 1989999	
2024-10-24 06:21:47.636	Processing batch from 2000000 to 2009999	
2024-10-24 06:21:47.636	Successfully uploaded batch of blocks: from 1990000 to 1999999	
2024-10-24 06:25:07.098	Processing batch from 2010000 to 2019999
2024-10-24 06:25:07.098	Successfully uploaded batch of blocks: from 2000000 to 2009999	
2024-10-24 06:29:13.781	Processing batch from 2020000 to 2029999	
2024-10-24 06:29:13.781	Successfully uploaded batch of blocks: from 2010000 to 2019999	
2024-10-24 06:33:18.434	Processing batch from 2030000 to 2039999	
2024-10-24 06:33:18.434	Successfully uploaded batch of blocks: from 2020000 to 2029999	
2024-10-24 06:36:05.190	upload error: failed to initiate object upload: connection: no healthy client
2024-10-24 06:37:08.145	Chain block height: 6235291	
2024-10-24 06:54:58.627	Processing batch from 0 to 9999	
2024-10-24 06:54:58.627	First block of latest incomplete batch uploaded to NeoFS container: 0	
2024-10-24 07:00:32.594	Processing batch from 10000 to 19999	
2024-10-24 07:00:32.594	Successfully uploaded batch of blocks: from 0 to 9999	
2024-10-24 07:05:19.672	Processing batch from 20000 to 29999	
2024-10-24 07:05:19.672	Successfully uploaded batch of blocks: from 10000 to 19999	
2024-10-24 07:08:32.479	Processing batch from 30000 to 39999	
2024-10-24 07:08:32.479	Successfully uploaded batch of blocks: from 20000 to 29999	
2024-10-24 07:10:25.644	Processing batch from 40000 to 49999	
2024-10-24 07:10:25.644	Successfully uploaded batch of blocks: from 30000 to 39999	
2024-10-24 07:10:48.870	upload error: failed to initiate object upload: connection: no healthy client	
2024-10-24 07:11:50.608	Chain block height: 6235419

@AnnaShaleva
Member Author

AnnaShaleva commented Oct 24, 2024

But sometimes it works differently (logs are from the same mainnet service):

2024-10-24 06:37:08.145	Chain block height: 6235291	
2024-10-24 06:54:58.627	Processing batch from 0 to 9999	
2024-10-24 06:54:58.627	First block of latest incomplete batch uploaded to NeoFS container: 0
2024-10-24 07:00:32.594	Processing batch from 10000 to 19999	
2024-10-24 07:00:32.594	Successfully uploaded batch of blocks: from 0 to 9999	
2024-10-24 07:05:19.672	Processing batch from 20000 to 29999	
2024-10-24 07:05:19.672	Successfully uploaded batch of blocks: from 10000 to 19999	
2024-10-24 07:08:32.479	Processing batch from 30000 to 39999	
2024-10-24 07:08:32.479	Successfully uploaded batch of blocks: from 20000 to 29999
2024-10-24 07:10:25.644	Processing batch from 40000 to 49999	
2024-10-24 07:10:25.644	Successfully uploaded batch of blocks: from 30000 to 39999	
2024-10-24 07:10:48.870	upload error: failed to initiate object upload: connection: no healthy client	
2024-10-24 07:11:50.608	Chain block height: 6235419	
2024-10-24 07:18:58.201	failed to fetch the latest missing block index from container: search of index files failed for batch with indexes from 2480000 to 2489999: failed to initiate object search: session: init session: status: code = 1024 message = connection to the RPC node has been lost	
2024-10-24 07:20:01.123	Chain block height: 6235449	
2024-10-24 07:29:07.555	Processing batch from 40000 to 49999	
2024-10-24 07:29:07.555	First block of latest incomplete batch uploaded to NeoFS container: 40000	
2024-10-24 07:29:45.742	Processing batch from 50000 to 59999	
2024-10-24 07:29:45.742	Successfully uploaded batch of blocks: from 40000 to 49999	
2024-10-24 07:30:14.433	Processing batch from 60000 to 69999	
2024-10-24 07:30:14.433	Successfully uploaded batch of blocks: from 50000 to 59999	
2024-10-24 07:31:04.556	Processing batch from 70000 to 79999	
2024-10-24 07:31:04.556	Successfully uploaded batch of blocks: from 60000 to 69999	
2024-10-24 07:31:42.554	upload error: failed to initiate object upload: connection: no healthy client
2024-10-24 07:32:44.920	Chain block height: 6235496

@AliceInHunterland
Contributor

This may be connected to #3615.

@AnnaShaleva
Member Author

Of course, this is a bug in fetchLatestMissingBlockIndex only if our batches are full and have no gaps.

> can be connected #3615

Check the currently uploaded data for N3 mainnet and see whether there are gaps in the batches.
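A gap inside an otherwise full batch makes a count-based check treat that batch as incomplete, so gaps have to be ruled out before blaming fetchLatestMissingBlockIndex. A minimal sketch of the gap check, assuming the block indexes of one batch have already been collected from the container (how they are fetched is out of scope here); `missingIndexes` is a hypothetical helper:

```go
package main

// missingIndexes returns the block indexes in [start, start+size) that do not
// appear in uploaded. The input may be unsorted and may contain duplicates or
// indexes from other batches; those are simply ignored.
func missingIndexes(start, size uint32, uploaded []uint32) []uint32 {
	seen := make(map[uint32]struct{}, len(uploaded))
	for _, idx := range uploaded {
		seen[idx] = struct{}{}
	}
	var missing []uint32
	// Iterate in uint64 so that a batch ending at the top of the uint32
	// range cannot overflow the loop counter.
	for i := uint64(start); i < uint64(start)+uint64(size); i++ {
		if _, ok := seen[uint32(i)]; !ok {
			missing = append(missing, uint32(i))
		}
	}
	return missing
}
```

An empty result for every batch below the resume point means the batches are gap-free and any wrong resume point comes from the search itself.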

@AnnaShaleva
Member Author

Checked: this depends on the resolution of #3615.

@AnnaShaleva
Member Author

To resolve this issue, we need to adopt the SEARCH completeness marker once nspcc-dev/neofs-node#2721 is implemented.

@AnnaShaleva AnnaShaleva added the blocked Can't be done because of something label Oct 24, 2024
@AnnaShaleva AnnaShaleva changed the title Bug in the search of latest uploaded batch in upload-bin Adopt possible incomplete NeoFS SEARCH results in NeoFSBlockFetcher and upload-bin CLI command Oct 24, 2024
@AnnaShaleva AnnaShaleva changed the title Adopt possible incomplete NeoFS SEARCH results in NeoFSBlockFetcher and upload-bin CLI command Adopt possibly incomplete NeoFS SEARCH results in NeoFSBlockFetcher and upload-bin CLI command Oct 24, 2024
@AnnaShaleva
Member Author

AnnaShaleva commented Oct 24, 2024

See also nspcc-dev/neofs-node#2721 (comment): if all storage nodes (SNs) from the REP policy are dead, we need to shut down BlockFetcher, because it won't be able to receive even block OIDs, let alone continue adding blocks to the chain.

@AnnaShaleva
Member Author

Within the scope of this issue, we need to revert the changes made by #3670 and switch back from per-object SEARCH to SEARCH over a range of objects.
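The cost difference between the two forms is easy to see in a simplified model: per-object SEARCH sends one request per block, while a range SEARCH covers a whole batch in one request but, presumably, is only safe to rely on once its result can be checked for completeness. A sketch assuming a hypothetical `filter` type, an `Index` attribute name and the operator names shown; the real NeoFS SDK filter types and the attribute neo-go uses may differ:

```go
package main

import "fmt"

// filter is a hypothetical, simplified SEARCH filter on an object attribute.
// The attribute name "Index" and the operator names are assumptions.
type filter struct {
	attr, op, value string
}

// perObjectQueries builds one SEARCH (a single EQ filter) per block index in
// [start, end].
func perObjectQueries(start, end uint32) [][]filter {
	if end < start {
		return nil
	}
	qs := make([][]filter, 0, uint64(end)-uint64(start)+1)
	for i := uint64(start); i <= uint64(end); i++ {
		qs = append(qs, []filter{{"Index", "EQ", fmt.Sprint(i)}})
	}
	return qs
}

// rangeQuery builds a single SEARCH whose two numeric filters together match
// every index in [start, end].
func rangeQuery(start, end uint32) [][]filter {
	return [][]filter{{
		{"Index", "GE", fmt.Sprint(start)},
		{"Index", "LE", fmt.Sprint(end)},
	}}
}
```

For one 10000-block batch this is 10000 requests versus a single one.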


3 participants