Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional "Connection reset" and 500 Internal Server Error #190

Open
earshinov opened this issue Nov 3, 2020 · 3 comments
Open

Occasional "Connection reset" and 500 Internal Server Error #190

earshinov opened this issue Nov 3, 2020 · 3 comments
Labels
bug Something isn't working
Milestone

Comments

@earshinov
Copy link

earshinov commented Nov 3, 2020

Hi!

We occasionally have "Connection reset" errors in Convergence server logs and 500 Internal Server Error in response to a REST API call.

GET domains/convergence/default/models/ffreport-3605: 500 Server Error
b'{"body":{"error_code":"internal_server_error"},"ok":false}'

Convergence server logs:

INFO: Caught Network I/O errors on odb.pl-grt-orientdb:2424/7048553260737367023, trying an automatic reconnection... (error: Connection reset)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:186) ~[?:?]
    at java.net.SocketInputStream.read(SocketInputStream.java:140) ~[?:?]
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:252) ~[?:?]
    at java.io.BufferedInputStream.read(BufferedInputStream.java:271) ~[?:?]
    at java.io.DataInputStream.readByte(DataInputStream.java:270) ~[?:?]
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinary.readByte(OChannelBinary.java:75) ~[com.orientechnologies.orientdb-core-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:176) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:160) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:1891) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.client.remote.OStorageRemote.lambda$networkOperationRetryTimeout$2(OStorageRemote.java:248) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:307) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
    ... 42 more
16:34:35 ERROR RealtimeModelActor - Unexpected error getting model
com.orientechnologies.common.io.OIOException: Connection reset
    at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:351) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationRetryTimeout(OStorageRemote.java:228) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationNoRetry(OStorageRemote.java:261) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:947) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentRemote.indexQuery(ODatabaseDocumentRemote.java:397) ~[com.orientechnologies.orientdb-client-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.core.index.OIndexRemoteOneValue.get(OIndexRemoteOneValue.java:48) ~[com.orientechnologies.orientdb-core-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.core.index.OIndexRemoteOneValue.get(OIndexRemoteOneValue.java:39) ~[com.orientechnologies.orientdb-core-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.core.index.OIndexAbstractDelegate.get(OIndexAbstractDelegate.java:59) ~[com.orientechnologies.orientdb-core-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.core.index.OIndexTxAwareOneValue.get(OIndexTxAwareOneValue.java:231) ~[com.orientechnologies.orientdb-core-3.0.32.jar:3.0.32]
    at com.orientechnologies.orient.core.index.OIndexTxAwareOneValue.get(OIndexTxAwareOneValue.java:38) ~[com.orientechnologies.orientdb-core-3.0.32.jar:3.0.32]
    at com.convergencelabs.convergence.server.backend.datastore.OrientDBUtil$.$anonfun$findDocumentFromSingleValueIndex$1(OrientDBUtil.scala:189) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at scala.util.Try$.apply(Try.scala:210) ~[org.scala-lang.scala-library-2.13.2.jar:?]
    at com.convergencelabs.convergence.server.backend.datastore.OrientDBUtil$.findDocumentFromSingleValueIndex(OrientDBUtil.scala:185) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at com.convergencelabs.convergence.server.backend.datastore.domain.model.ModelStore$.com$convergencelabs$convergence$server$backend$datastore$domain$model$ModelStore$$findModelDocument(ModelStore.scala:333) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at com.convergencelabs.convergence.server.backend.datastore.domain.model.ModelStore.$anonfun$getModel$1(ModelStore.scala:198) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at com.convergencelabs.convergence.server.backend.db.PooledDatabaseProvider.$anonfun$withDatabase$3(PooledDatabaseProvider.scala:58) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at com.convergencelabs.convergence.server.util.TryWithResource.tryWithResolvedResource(TryWithResource.scala:50) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at com.convergencelabs.convergence.server.util.TryWithResource.com$convergencelabs$convergence$server$util$TryWithResource$$tryWithResource(TryWithResource.scala:36) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at com.convergencelabs.convergence.server.util.TryWithResource$.apply(TryWithResource.scala:78) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at com.convergencelabs.convergence.server.backend.db.PooledDatabaseProvider.$anonfun$withDatabase$1(PooledDatabaseProvider.scala:56) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at scala.util.Success.flatMap(Try.scala:258) ~[org.scala-lang.scala-library-2.13.2.jar:?]
    at com.convergencelabs.convergence.server.backend.db.PooledDatabaseProvider.withDatabase(PooledDatabaseProvider.scala:55) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at com.convergencelabs.convergence.server.backend.datastore.AbstractDatabasePersistence.withDb(AbstractDatabasePersistence.scala:32) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at com.convergencelabs.convergence.server.backend.datastore.domain.model.ModelStore.getModel(ModelStore.scala:196) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at com.convergencelabs.convergence.server.backend.services.domain.model.RealtimeModelActor.$anonfun$retrieveModel$3(RealtimeModelActor.scala:325) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at com.convergencelabs.convergence.server.backend.services.domain.model.RealtimeModelActor.$anonfun$retrieveModel$3$adapted(RealtimeModelActor.scala:323) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at scala.util.Success.flatMap(Try.scala:258) ~[org.scala-lang.scala-library-2.13.2.jar:?]
    at com.convergencelabs.convergence.server.backend.services.domain.model.RealtimeModelActor.$anonfun$retrieveModel$1(RealtimeModelActor.scala:323) ~[com.convergencelabs.convergence-server-1.0.0-rc.7.jar:1.0.0-rc.7]
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) ~[org.scala-lang.scala-library-2.13.2.jar:?]
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:671) ~[org.scala-lang.scala-library-2.13.2.jar:?]
    at scala.concurrent.impl.Promise$Transformation.run(Promise.scala:430) ~[org.scala-lang.scala-library-2.13.2.jar:?]
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:56) ~[com.typesafe.akka.akka-actor_2.13-2.6.6.jar:2.6.6]
    at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:93) ~[com.typesafe.akka.akka-actor_2.13-2.6.6.jar:2.6.6]
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) [org.scala-lang.scala-library-2.13.2.jar:?]
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:94) [org.scala-lang.scala-library-2.13.2.jar:?]
    at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:93) [com.typesafe.akka.akka-actor_2.13-2.6.6.jar:2.6.6]
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:48) [com.typesafe.akka.akka-actor_2.13-2.6.6.jar:2.6.6]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48) [com.typesafe.akka.akka-actor_2.13-2.6.6.jar:2.6.6]
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) [?:?]
    at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) [?:?]
    at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) [?:?]
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) [?:?]
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177) [?:?]

More details:

  • We observe no other connectivity issues with Convergence server or OrientDB.
  • There are no errors or warnings in OrientDB logs.
  • We have everything deployed to a Kubernetes cluster. Our installation has 1 instance of convergence-seed, 3 instances of convergence-server, and 1 instance of OrientDB.

Maybe you could give us some hints for what might go wrong and tips on what Convergence server settings to tweak. Since GET should be a safe method, should we implement retries on the side of our application that makes REST API calls?

Versions

  • Convergence Version: convergencelabs/convergence-server:1.0.0-rc.7
  • OS: Running in Docker

Describe the Bug

REST API client sometimes gets 500 Internal Server Error from Convergence.

Expected Behavior

If Convergence server has some kind of an OrientDB connection pool which might contain stale connections, it should reconnect when hitting connection reset.

Alternatively, if retries are considered a caller's duty, Convergence server should probably return 503 instead of 500 to signify that the client should make a retry.

@earshinov earshinov added the bug Something isn't working label Nov 3, 2020
@mmacfadden
Copy link
Contributor

Thanks for reporting. This does seem like a bug. I do believe we are using the built in OrientDB connection pooling capability. In general, we would expect that the connection pool would be keeping the connections alive or fresh.

@mmacfadden
Copy link
Contributor

Finally getting to look at this error. I did confirm that the error is inside OrientDB client code. Here specifically:

https://github.com/orientechnologies/orientdb/blob/3c1a6e6a9a18ecc0312058ac9f2ba86cd2365594/client/src/main/java/com/orientechnologies/orient/client/remote/OStorageRemote.java#L348

We are using the OrientDB connection pooling, so this seems like there is an error in the connection pool logic on the OrientDB side. We are in the process of upgrading from 3.0.x to 3.2.x of OrientDB so I am hoping that this will resolve the situation.

I do agree on the 503 vs 500 error.

@mmacfadden mmacfadden added this to the 1.0.0-rc.11 milestone Jun 13, 2021
@earshinov
Copy link
Author

As a matter of fact, after we reduced the number of convergence-server instances we run in the Akka cluster from 3 to 1 we basically stopped receiving these "Connection reset" errors. They only happen occasionally (a couple of times in a month) and no longer bother us.

But as you were going to upgrade OrientDB client libraries to 3.2.x (and it seems it hasn't been done yet in 1.0.0-rc.12), I think we should keep this issue open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants