Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime error: integer divide by zero #209

Closed
streamer45 opened this issue Jan 24, 2024 · 12 comments · Fixed by #263 or #265
Closed

runtime error: integer divide by zero #209

streamer45 opened this issue Jan 24, 2024 · 12 comments · Fixed by #263 or #265

Comments

@streamer45
Copy link

We received a stack trace for a crash on this library:

runtime.errorString: runtime error: integer divide by zero
  File "github.com/blevesearch/zapx/v15@v15.3.11/posting.go", line 747, in (*PostingsIterator).nextDocNumAtOrAfterClean
  File "github.com/blevesearch/zapx/v15@v15.3.11/posting.go", line 632, in (*PostingsIterator).nextDocNumAtOrAfter
  File "github.com/blevesearch/zapx/v15@v15.3.11/posting.go", line 537, in (*PostingsIterator).nextAtOrAfter
  File "github.com/blevesearch/zapx/v15@v15.3.11/posting.go", line 526, in (*PostingsIterator).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/index/scorch/snapshot_index_tfr.go", line 88, in (*IndexSnapshotTermFieldReader).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_term.go", line 89, in (*TermSearcher).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_conjunction.go", line 218, in (*ConjunctionSearcher).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_conjunction.go", line 246, in (*ConjunctionSearcher).Advance
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_disjunction_slice.go", line 241, in (*DisjunctionSliceSearcher).Advance
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_conjunction.go", line 253, in (*ConjunctionSearcher).advanceChild
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_conjunction.go", line 241, in (*ConjunctionSearcher).Advance
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_boolean.go", line 356, in (*BooleanSearcher).Advance
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_conjunction.go", line 253, in (*ConjunctionSearcher).advanceChild
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_conjunction.go", line 200, in (*ConjunctionSearcher).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_boolean.go", line 165, in (*BooleanSearcher).advanceNextMust
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_boolean.go", line 316, in (*BooleanSearcher).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/collector/topn.go", line 226, in (*TopNCollector).Collect
  File "github.com/blevesearch/bleve/v2@v2.3.8/index_impl.go", line 551, in (*indexImpl).SearchInContext
  File "github.com/blevesearch/bleve/v2@v2.3.8/index_impl.go", line 369, in (*indexImpl).Search

It looks like i.postings.chunkSize may need some checking as it can be zero.

@agarciamontoro
Copy link

We just received another similar crash, again caused by i.postings.chunkSize being 0, this time in the nextDocNumAtOrAfter function:

runtime.errorString: runtime error: integer divide by zero
  File "github.com/blevesearch/zapx/v15@v15.3.11/posting.go", line 644, in (*PostingsIterator).nextDocNumAtOrAfter
  File "github.com/blevesearch/zapx/v15@v15.3.11/posting.go", line 537, in (*PostingsIterator).nextAtOrAfter
  File "github.com/blevesearch/zapx/v15@v15.3.11/posting.go", line 526, in (*PostingsIterator).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/index/scorch/snapshot_index_tfr.go", line 88, in (*IndexSnapshotTermFieldReader).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_term.go", line 89, in (*TermSearcher).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_conjunction.go", line 218, in (*ConjunctionSearcher).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_conjunction.go", line 218, in (*ConjunctionSearcher).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_boolean.go", line 165, in (*BooleanSearcher).advanceNextMust
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_boolean.go", line 316, in (*BooleanSearcher).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_boolean.go", line 396, in (*BooleanSearcher).Advance
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_conjunction.go", line 253, in (*ConjunctionSearcher).advanceChild
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_conjunction.go", line 190, in (*ConjunctionSearcher).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_boolean.go", line 165, in (*BooleanSearcher).advanceNextMust
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/searcher/search_boolean.go", line 316, in (*BooleanSearcher).Next
  File "github.com/blevesearch/bleve/v2@v2.3.8/search/collector/topn.go", line 226, in (*TopNCollector).Collect
  File "github.com/blevesearch/bleve/v2@v2.3.8/index_impl.go", line 551, in (*indexImpl).SearchInContext
  File "github.com/blevesearch/bleve/v2@v2.3.8/index_impl.go", line 369, in (*indexImpl).Search

@abhinavdangeti
Copy link
Member

Would you retest this with the latest v2.3.x release of bleve (v2.3.10)?

In situations such as these, it'd greatly help our team if you could give us a test case or some sample data with which we can deterministically reproduce the error.

@cpoile
Copy link

cpoile commented Sep 10, 2024

We received another crash, same as before, on v2.3.10:

runtime error: integer divide by zero runtime.errorString
github.com/blevesearch/zapx/v15@v15.3.13/posting.go in (*PostingsIterator).nextDocNumAtOrAfter at line 644
github.com/blevesearch/zapx/v15@v15.3.13/posting.go in (*PostingsIterator).nextAtOrAfter at line 537
github.com/blevesearch/zapx/v15@v15.3.13/posting.go in (*PostingsIterator).Next at line 526
github.com/blevesearch/bleve/v2@v2.3.10/index/scorch/snapshot_index_tfr.go in (*IndexSnapshotTermFieldReader).Next at line 88
github.com/blevesearch/bleve/v2@v2.3.10/search/searcher/search_term.go in (*TermSearcher).Next at line 92
github.com/blevesearch/bleve/v2@v2.3.10/search/searcher/search_conjunction.go in (*ConjunctionSearcher).Next at line 218
1
github.com/blevesearch/bleve/v2@v2.3.10/search/searcher/search_boolean.go in (*BooleanSearcher).advanceNextMust at line 165
github.com/blevesearch/bleve/v2@v2.3.10/search/searcher/search_boolean.go in (*BooleanSearcher).Next at line 316
github.com/blevesearch/bleve/v2@v2.3.10/search/searcher/search_boolean.go in (*BooleanSearcher).Advance at line 396
github.com/blevesearch/bleve/v2@v2.3.10/search/searcher/search_conjunction.go in (*ConjunctionSearcher).advanceChild at line 253
github.com/blevesearch/bleve/v2@v2.3.10/search/searcher/search_conjunction.go in (*ConjunctionSearcher).Next at line 190
github.com/blevesearch/bleve/v2@v2.3.10/search/searcher/search_boolean.go in (*BooleanSearcher).advanceNextMust at line 165
github.com/blevesearch/bleve/v2@v2.3.10/search/searcher/search_boolean.go in (*BooleanSearcher).Next at line 316
github.com/blevesearch/bleve/v2@v2.3.10/search/collector/topn.go in (*TopNCollector).Collect at line 228
github.com/blevesearch/bleve/v2@v2.3.10/index_impl.go in (*indexImpl).SearchInContext at line 580
github.com/blevesearch/bleve/v2@v2.3.10/index_impl.go in (*indexImpl).Search at line 371

@abhinavdangeti
Copy link
Member

@moshaad7 would you look into this, see if this covered or separate from what you've been looking into.

@moshaad7
Copy link
Contributor

This is unrelated to the GetCardinality panic issue that we are working on at #259

This is essentially PostingsList.chunkSize being 0 and that is leading to divide by 0 error

func (i *PostingsIterator) nextDocNumAtOrAfter(atOrAfter uint64) (uint64, bool, error) {
    ...
    nChunk := n / uint32(i.postings.chunkSize)
   ...
}

I see previous attempts to avoid setting chunkSize to 0 in this PR raised by @abhinavdangeti https://github.com/blevesearch/zapx/pull/151/files
But somehow we landed with 0 chunkSize.

@moshaad7
Copy link
Contributor

Here is my proposal to solve this problem:

So we have a method named getChunkSize which return chunkSize and an error if any.
The problem with this method is that it can return chunkSize=0 and err=nil

Now in the previous PR that I mentioned , we are handling this case of chunkSize being 0 at only one place.
But there are several other places where we are calling this method getChunkSize()

So, we should update the implementation of getChunkSize() method to return error in case chunkSize is 0.
This way, caller will only have to worry about error being non-nil and they will be assured that chunkSize will never be 0

moshaad7 added a commit that referenced this issue Sep 12, 2024
The existing implementation of the method
getChunkSize(...) (uint64, error), in some cases,
can return chunkSize==0 and err==nil.

The onus is on the caller to check for such possibility
and handle it properly.
Callers often use the returned chunkSize as a divisor and
a zero chunkSize lead to panic.
see #209

This PR intends to update the method implementation to
always return an error in case the returned chunkSize
value is 0.
That way caller need to only worry about error being non-nil.

Callers which are ok with 0 chunkSize can check the returned
error against ErrChunkSizeZero
moshaad7 added a commit that referenced this issue Sep 12, 2024
The existing implementation of the method
getChunkSize(...) (uint64, error), in some cases,
can return chunkSize==0 and err==nil.

The onus is on the caller to check for such possibility
and handle it properly.
Callers often use the returned chunkSize as a divisor and
a zero chunkSize lead to panic.
see #209

This PR intends to update the method implementation to
always return an error in case the returned chunkSize
value is 0.
That way caller need to only worry about error being non-nil.

Callers which are ok with 0 chunkSize can check the returned
error against ErrChunkSizeZero
@moshaad7 moshaad7 linked a pull request Sep 12, 2024 that will close this issue
moshaad7 added a commit that referenced this issue Sep 12, 2024
The existing implementation of the method
getChunkSize(...) (uint64, error), in some cases,
can return chunkSize==0 and err==nil.

The onus is on the caller to check for such possibility
and handle it properly.
Callers often use the returned chunkSize as a divisor and
a zero chunkSize lead to panic.
see #209

This PR intends to update the method implementation to
always return an error in case the returned chunkSize
value is 0.
That way caller need to only worry about error being non-nil.

Callers which are ok with 0 chunkSize can check the returned
error against ErrChunkSizeZero
@moshaad7 moshaad7 linked a pull request Sep 12, 2024 that will close this issue
moshaad7 added a commit that referenced this issue Sep 17, 2024
The existing implementation of the method
getChunkSize(...) (uint64, error), in some cases,
can return chunkSize==0 and err==nil.

The onus is on the caller to check for such possibility
and handle it properly.
Callers often use the returned chunkSize as a divisor and
a zero chunkSize lead to panic.
see #209

This PR intends to update the method implementation to
always return an error in case the returned chunkSize
value is 0.
That way caller need to only worry about error being non-nil.

Callers which are ok with 0 chunkSize can check the returned
error against ErrChunkSizeZero
moshaad7 added a commit that referenced this issue Sep 18, 2024
The existing implementation of the method
getChunkSize(...) (uint64, error), in some cases,
can return chunkSize==0 and err==nil.

The onus is on the caller to check for such possibility
and handle it properly.
Callers often use the returned chunkSize as a divisor and
a zero chunkSize lead to panic.
see #209

This PR intends to update the method implementation to
always return an error in case the returned chunkSize
value is 0.
That way caller need to only worry about error being non-nil.

Callers which are ok with 0 chunkSize can check the returned
error against ErrChunkSizeZero
@moshaad7
Copy link
Contributor

mattermost folks (@cpoile , @agarciamontoro , @streamer45 ), while we are adding checks to avoid "divide by zero" errors.
We also want to figure out the source of the issue ( how chunkSize became zero).

There are several possibilities, like

  • usage of a premature value of type PostingsList. ( Say someone just created a postingsList but didn't set the right chunkSize, so it will take the default of 0 )
  • A segment reporting zero number of docs in it's header.
  • Maybe an issue while merging two segments, leading to docNum dropping to zero
  • etc ...

And also we ( couchbase) aren't able to reproduce this issue in-house.

It would be helpful, if you can try to reproduce it on staging/dummy data and share the index files to probe if there is an actual issue with segment headers etc ...

Thanks.

@mjm918
Copy link

mjm918 commented Sep 28, 2024

mattermost folks (@cpoile , @agarciamontoro , @streamer45 ), while we are adding checks to avoid "divide by zero" errors. We also want to figure out the source of the issue ( how chunkSize became zero).

There are several possibilities, like

  • usage of a premature value of type PostingsList. ( Say someone just created a postingsList but didn't set the right chunkSize, so it will take the default of 0 )
  • A segment reporting zero number of docs in it's header.
  • Maybe an issue while merging two segments, leading to docNum dropping to zero
  • etc ...

And also we ( couchbase) aren't able to reproduce this issue in-house.

It would be helpful, if you can try to reproduce it on staging/dummy data and share the index files to probe if there is an actual issue with segment headers etc ...

Thanks.

I am facing the same issue. Here are the files. Download Link

You wont be able to see the issue if you make 1 or 2 requests. I'm using blevesearch in a webapp, i found this issue when i was doing a testing. I tried to make request to blevesearch from 100 concurrent connections. each client was making 10,000 requests. after few thousand requests, i encountered this panic.

@moshaad7
Copy link
Contributor

moshaad7 commented Sep 29, 2024

@mjm918 Thanks for your contribution

❯ ls -R
index_meta.json store

./store:
00000000000e.zap root.bolt

Can you also share index mapping related information and your query json ?
And were you performing any mutations while querying ? (or your data was stable with no ingestion)

And are you sure your bleve index only have one segment ?
If yes, Did you use (*Scorch)ForceMerge API to achieve this ?


For the given zap file, header looks fine

chunkMode: 1026
numDocs: 59996

I was exploring Dictionaries corresponding to each field.
Following fields have empty dictionary ( meaning no document had value corresponding to these )
product_desc, qr_code, product_promo

You can check if this aligns with your data :)

link:
https://gist.github.com/moshaad7/fdc2ee53056e0c5fca2d8657112eb8b5

@moshaad7
Copy link
Contributor

moshaad7 commented Oct 10, 2024

Dear @mjm918,

It would be greatly appreciated by all of us using Bleve and Zap if you could kindly address the questions mentioned in the previous comment. The insights you've shared so far have been incredibly helpful, and any additional information you can provide will be invaluable in tracking down this unusual bug.

Thank you for your time and support!

@mjm918
Copy link

mjm918 commented Oct 11, 2024

Hi Sorry for late reply.

Can you also share index mapping related information and your query json ?

Index Mapping -

{
   "types":{
      "moss":{
         "enabled":true,
         "dynamic":true,
         "properties":{
            "product_code":{
               "enabled":true,
               "dynamic":true,
               "fields":[
                  {
                     "type":"text",
                     "analyzer":"standard",
                     "index":true,
                     "include_term_vectors":true,
                     "include_in_all":true
                  }
               ]
            },
            "product_desc":{
               "enabled":true,
               "dynamic":true,
               "fields":[
                  {
                     "type":"text",
                     "analyzer":"keyword",
                     "index":true,
                     "include_in_all":true
                  }
               ]
            },
            "product_id":{
               "enabled":true,
               "dynamic":true,
               "fields":[
                  {
                     "type":"number",
                     "index":true,
                     "include_term_vectors":true,
                     "include_in_all":true
                  }
               ]
            },
            "product_name":{
               "enabled":true,
               "dynamic":true,
               "fields":[
                  {
                     "type":"text",
                     "analyzer":"keyword",
                     "index":true,
                     "include_in_all":true
                  }
               ]
            },
            "product_status":{
               "enabled":true,
               "dynamic":true,
               "fields":[
                  {
                     "type":"number",
                     "index":true,
                     "include_term_vectors":true,
                     "include_in_all":true
                  }
               ]
            },
            "qr_code":{
               "enabled":true,
               "dynamic":true,
               "fields":[
                  {
                     "type":"text",
                     "analyzer":"en",
                     "index":true,
                     "include_in_all":true
                  }
               ]
            }
         },
         "default_analyzer":"standard"
      }
   },
   "default_mapping":{
      "enabled":true,
      "dynamic":true,
      "default_analyzer":"standard"
   },
   "type_field":"_type",
   "default_type":"_default",
   "default_analyzer":"standard",
   "default_datetime_parser":"dateTimeOptional",
   "default_field":"_all",
   "store_dynamic":true,
   "index_dynamic":true,
   "docvalues_dynamic":true,
   "analysis":{
      
   }
}

Query -

{
   "query":{
      "conjuncts":[
         {
            "conjuncts":[
               {
                  "match":"580",
                  "prefix_length":0,
                  "fuzziness":1
               },
               {
                  "match":"SONY",
                  "prefix_length":0,
                  "fuzziness":1
               },
               {
                  "match":"VOLUME",
                  "prefix_length":0,
                  "fuzziness":1
               }
            ]
         },
         {
            "conjuncts":[
               {
                  "query":"product_status:1"
               }
            ]
         }
      ]
   },
   "size":10,
   "from":0,
   "highlight":{
      "style":"ansi",
      "fields":[
         "*"
      ]
   },
   "fields":[
      "*"
   ],
   "facets":null,
   "explain":false,
   "sort":[
      "-product_name"
   ],
   "includeLocations":false,
   "search_after":null,
   "search_before":null
}

And were you performing any mutations while querying ? (or your data was stable with no ingestion)

There was no ingestion while querying

And are you sure your bleve index only have one segment ?
If yes, Did you use (*Scorch)ForceMerge API to achieve this ?

No ForceMerge was done. I indexed the data, tried to do a stress test and noticed the error.

Following fields have empty dictionary ( meaning no document had value corresponding to these )
product_desc, qr_code, product_promo

Yes it's correct. I was using a sample data database for testing.

abhinavdangeti pushed a commit that referenced this issue Oct 14, 2024
The existing implementation of the method
getChunkSize(...) (uint64, error), in some cases,
can return chunkSize==0 and err==nil.

The onus is on the caller to check for such possibility and handle it
properly.
Callers often use the returned chunkSize as a divisor and a zero
chunkSize lead to panic.
see #209

This PR intends to update the method implementation to always return an
error in case the returned chunkSize value is 0.
That way caller need to only worry about error being non-nil.

Callers which are ok with 0 chunkSize can check the returned error
against ErrChunkSizeZero
abhinavdangeti pushed a commit that referenced this issue Oct 15, 2024
This is a backport of #265.

The existing implementation of the method
getChunkSize(...) (uint64, error), in some cases,
can return chunkSize==0 and err==nil.

The onus is on the caller to check for such possibility and handle it
properly.
Callers often use the returned chunkSize as a divisor and a zero
chunkSize lead to panic.
see #209

This PR intends to update the method implementation to always return an
error in case the returned chunkSize value is 0.
That way caller need to only worry about error being non-nil.

Callers which are ok with 0 chunkSize can check the returned error
against ErrChunkSizeZero
abhinavdangeti pushed a commit that referenced this issue Oct 16, 2024
The existing implementation of the method
getChunkSize(...) (uint64, error), in some cases,
can return chunkSize==0 and err==nil.

The onus is on the caller to check for such possibility and handle it
properly.
Callers often use the returned chunkSize as a divisor and a zero
chunkSize lead to panic.
see #209

This PR intends to update the method implementation to always return an
error in case the returned chunkSize value is 0.
That way caller need to only worry about error being non-nil.

Callers which are ok with 0 chunkSize can check the returned error
against ErrChunkSizeZero
@abhinavdangeti
Copy link
Member

Targeting some fixes here for bleve's v2.4.3 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants