Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Services stop serving on dmsghttp randomly #140

Open
ersonp opened this issue Dec 30, 2021 · 1 comment
Open

Services stop serving on dmsghttp randomly #140

ersonp opened this issue Dec 30, 2021 · 1 comment
Labels
bug Something isn't working

Comments

@ersonp
Copy link
Contributor

ersonp commented Dec 30, 2021

Describe the bug

The services stop serving on dmsghttp randomly and the service needs to be restarted to fix it, but sometimes it fixes on it own without needing to restart. The dmsg-server displays the following

[2021-12-30T11:35:08Z] INFO [dmsg_srv]: Initiating stream. session=0229e6340c9ffce614923f410f2ce25ad1e33536fd316e88f6ff2f901f30e2350a yamux_id=19
[2021-12-30T11:35:08Z] DEBUG [dmsg_srv]: Read stream request from initiating side. dst_addr=038f751df4af75fb3d51f6693602bfe8289145e633ffdd1e67d686bea595f84d55:80 session=0229e6340c9ffce614923f410f2ce25ad1e33536fd316e88f6ff2f901f30e2350a src_addr=0229e6340c9ffce614923f410f2ce25ad1e33536fd316e88f6ff2f901f30e2350a:49166 yamux_id=19
[2021-12-30T11:35:08Z] INFO [dmsg_srv]: Stopped stream. error="dmsg error 307 - request cannot be forwarded because the next session is non-existent" session=0229e6340c9ffce614923f410f2ce25ad1e33536fd316e88f6ff2f901f30e2350a yamux_id=19

Following are the logs of the visor

[2022-01-03T14:55:17+05:30] WARN retrier [public_autoconnect]: Get "dmsg://038f751df4af75fb3d51f6693602bfe8289145e633ffdd1e67d686bea595f84d55:80/api/services?quantity=3&type=visor": i/o deadline reached
[2022-01-03T14:55:37+05:30] WARN [app_discovery]: Failed to register service entry in discovery. Retrying... appName="skysocks" error="Get "dmsg://038f751df4af75fb3d51f6693602bfe8289145e633ffdd1e67d686bea595f84d55:80/security/nonces/02aa6fc7009d31622705cb3f2feaa8fe0da10c2ae081cf52e7ba7672bf097defad": i/o deadline reached"
[2022-01-03T14:55:37+05:30] WARN retrier [app_discovery]: Get "dmsg://038f751df4af75fb3d51f6693602bfe8289145e633ffdd1e67d686bea595f84d55:80/security/nonces/02aa6fc7009d31622705cb3f2feaa8fe0da10c2ae081cf52e7ba7672bf097defad": i/o deadline reached appName="skysocks"
[2022-01-03T14:56:07+05:30] WARN retrier [public_autoconnect]: Get "dmsg://038f751df4af75fb3d51f6693602bfe8289145e633ffdd1e67d686bea595f84d55:80/api/services?quantity=3&type=visor": i/o deadline reached
[2022-01-03T14:56:17+05:30] WARN [app_discovery]: Failed to register service entry in discovery. Retrying... appName="skysocks" error="Get "dmsg://038f751df4af75fb3d51f6693602bfe8289145e633ffdd1e67d686bea595f84d55:80/security/nonces/02aa6fc7009d31622705cb3f2feaa8fe0da10c2ae081cf52e7ba7672bf097defad": i/o deadline reached"
[2022-01-03T14:56:17+05:30] WARN retrier [app_discovery]: Get "dmsg://038f751df4af75fb3d51f6693602bfe8289145e633ffdd1e67d686bea595f84d55:80/security/nonces/02aa6fc7009d31622705cb3f2feaa8fe0da10c2ae081cf52e7ba7672bf097defad": i/o deadline reached appName="skysocks"

There are no visible logs of the failure on the side of the service concerned.

@ersonp ersonp added the bug Something isn't working label Dec 30, 2021
@ersonp
Copy link
Contributor Author

ersonp commented Jan 4, 2022

Logs in service discovery

[2022-01-04T07:07:01Z] ERROR [service-discovery]: failed to get service types count error="Redis command returned unexpected error: read tcp 10.2.4.44:46602->10.128.212.58:6379: i/o timeout"
[2022-01-04T07:07:01Z] ERROR [service-discovery]: Error getting dmsg-servers. error="something unexpected happened"
2022/01/04 07:07:02 "GET http://038f751df4af75fb3d51f6693602bfe8289145e633ffdd1e67d686bea595f84d55:80/api/services?quantity=3&type=visor HTTP/1.1" from 034315e403fc858aa2d986f33e8f610e242aed4e3ad1dcc96024795647dc8c0cbf:59199 - 200 3B in 3.006013132s
2022/01/04 07:07:03 "GET http://sd.skywire.dev/api/services?quantity=3&type=visor HTTP/1.1" from 144.91.125.89 - 200 3B in 3.004070887s
2022/01/04 07:07:04 "GET http://sd.skywire.dev/api/services?quantity=3&type=visor HTTP/1.1" from 195.248.241.246 - 200 3B in 3.004016541s
[2022-01-04T07:07:11Z] ERROR [service-discovery]: failed to get service types count error="Redis command returned unexpected error: i/o timeout"
[2022-01-04T07:07:11Z] ERROR [service-discovery]: Error getting dmsg-servers. error="something unexpected happened"
[2022-01-04T07:07:21Z] ERROR [service-discovery]: failed to get service types count error="Redis command returned unexpected error: i/o timeout"
[2022-01-04T07:07:21Z] ERROR [service-discovery]: Error getting dmsg-servers. error="something unexpected happened"
[2022-01-04T07:07:31Z] ERROR [service-discovery]: failed to get service types count error="Redis command returned unexpected error: i/o timeout"
[2022-01-04T07:07:31Z] ERROR [service-discovery]: Error getting dmsg-servers. error="something unexpected happened"
2022/01/04 07:07:35 "GET http://038f751df4af75fb3d51f6693602bfe8289145e633ffdd1e67d686bea595f84d55:80/api/services?quantity=3&type=visor HTTP/1.1" from 034315e403fc858aa2d986f33e8f610e242aed4e3ad1dcc96024795647dc8c0cbf:59200 - 200 3B in 3.005041891s
2022/01/04 07:07:36 "GET http://sd.skywire.dev/api/services?quantity=3&type=visor HTTP/1.1" from 144.91.125.89 - 200 3B in 3.004672763s
2022/01/04 07:07:37 "GET http://sd.skywire.dev/api/services?quantity=3&type=visor HTTP/1.1" from 195.248.241.246 - 200 3B in 3.004127111s
[2022-01-04T07:07:48Z] ERROR [service-discovery]: failed to get service types count error="Redis command returned unexpected error: EOF"
[2022-01-04T07:07:48Z] ERROR [service-discovery]: Error getting dmsg-servers. error="something unexpected happened"

dmsg-discovery

[2022-01-04T01:02:51Z] WARN [dmsg_disc]: 500: something unexpected happened RequestID="dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240659"
2022/01/04 01:02:51 [dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240659] "GET http://dmsg-discovery.skywire.svc.cluster.local:9090/dmsg-discovery/available_servers HTTP/1.1" from 10.2.4.44:38078 - 500 54B in 3.005538438s
[2022-01-04T01:02:51Z] ERROR [dmsg_disc]: failed to check and remove servers entries error="i/o timeout"
[2022-01-04T01:02:51Z] ERROR [store]: Failed to get servers (SRandMemberN) from redis error="i/o timeout"
[2022-01-04T01:02:51Z] ERROR [dmsg_disc]: Error getting dmsg-servers. error="something unexpected happened"
[2022-01-04T01:02:54Z] ERROR [store]: Failed to get entry from redis error="dial tcp 10.128.212.58:6379: connect: connection refused" pk=02a7d8a02f1cc635d8fd0d785af19e1b371aba8d43cda0ef3b2c0d664f71e3b3e9
[2022-01-04T01:02:54Z] WARN [dmsg_disc]: 500: something unexpected happened RequestID="dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240660"
2022/01/04 01:02:54 [dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240660] "GET http://dmsgd.skywire.dev/dmsg-discovery/entry/02a7d8a02f1cc635d8fd0d785af19e1b371aba8d43cda0ef3b2c0d664f71e3b3e9 HTTP/1.1" from 192.168.132.49 - 500 54B in 8.038718ms
[2022-01-04T01:02:54Z] ERROR [store]: Failed to get entry from redis error="dial tcp 10.128.212.58:6379: connect: connection refused" pk=0253a0317504e0f4240fe850d36f16fcf99ac3ad8a44339b0031c9415f96b9db73
[2022-01-04T01:02:54Z] WARN [dmsg_disc]: 500: something unexpected happened RequestID="dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240661"
2022/01/04 01:02:54 [dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240661] "GET http://dmsgd.skywire.dev/dmsg-discovery/entry/0253a0317504e0f4240fe850d36f16fcf99ac3ad8a44339b0031c9415f96b9db73 HTTP/1.1" from 192.168.132.49 - 500 54B in 6.970413ms
[2022-01-04T01:02:54Z] ERROR [store]: Failed to get servers (SRandMemberN) from redis error="dial tcp 10.128.212.58:6379: connect: connection refused"
[2022-01-04T01:02:54Z] WARN [dmsg_disc]: 500: something unexpected happened RequestID="dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240662"
2022/01/04 01:02:54 [dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240662] "GET http://dmsg-discovery.skywire.svc.cluster.local:9090/dmsg-discovery/available_servers HTTP/1.1" from 10.2.1.51:52596 - 500 54B in 5.672287ms
[2022-01-04T01:02:55Z] ERROR [store]: Failed to get servers (SRandMemberN) from redis error="dial tcp 10.128.212.58:6379: connect: connection refused"
[2022-01-04T01:02:55Z] WARN [dmsg_disc]: 500: something unexpected happened RequestID="dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240663"
2022/01/04 01:02:55 [dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240663] "GET http://dmsg-discovery.skywire.svc.cluster.local:9090/dmsg-discovery/available_servers HTTP/1.1" from 10.2.0.44:52018 - 500 54B in 3.881937ms
[2022-01-04T01:02:56Z] ERROR [store]: Failed to get servers (SRandMemberN) from redis error="dial tcp 10.128.212.58:6379: connect: connection refused"
[2022-01-04T01:02:56Z] WARN [dmsg_disc]: 500: something unexpected happened RequestID="dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240664"
2022/01/04 01:02:56 [dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240664] "GET http://dmsg-discovery.skywire.svc.cluster.local:9090/dmsg-discovery/available_servers HTTP/1.1" from 10.2.0.47:39906 - 500 54B in 7.801717ms
2022/01/04 01:02:58 [dmsg-discovery-7f5dfc8569-qhqbr/zCfdIpLam4-240665] "GET http://dmsg-discovery.skywire.svc.cluster.local:9090/dmsg-discovery/available_servers HTTP/1.1" from 10.2.4.44:38078 - 200 716B in 24.232725ms

transport discovery

[2022-01-04T06:25:47Z] ERROR [transport_discovery]: failed to get transports count error="read tcp 10.2.1.51:32924->10.128.212.58:6379: i/o timeout"
[2022-01-04T06:25:47Z] ERROR [transport_discovery]: Error getting dmsg-servers. error="something unexpected happened"
2022/01/04 06:25:52 [transport-discovery-5fd76d6dc5-kn8c4/Yn7b3agvKI-728365] "GET http://10.2.1.51:9091/health HTTP/1.1" from 172.104.188.39:53026 - 200 178B in 87.232µs
2022/01/04 06:25:57 [transport-discovery-5fd76d6dc5-kn8c4/Yn7b3agvKI-728366] "GET http://10.2.1.51:9091/health HTTP/1.1" from 172.104.188.39:53040 - 200 178B in 130.052µs
[2022-01-04T06:25:57Z] ERROR [transport_discovery]: Error getting dmsg-servers. error="something unexpected happened"
[2022-01-04T06:25:57Z] ERROR [transport_discovery]: failed to get transports count error="i/o timeout"
2022/01/04 06:26:02 [transport-discovery-5fd76d6dc5-kn8c4/Yn7b3agvKI-728367] "GET http://10.2.1.51:9091/health HTTP/1.1" from 172.104.188.39:53048 - 200 178B in 91.682µs
2022/01/04 06:26:07 [transport-discovery-5fd76d6dc5-kn8c4/Yn7b3agvKI-728368] "GET http://10.2.1.51:9091/health HTTP/1.1" from 172.104.188.39:53064 - 200 178B in 95.373µs
[2022-01-04T06:26:07Z] ERROR [transport_discovery]: failed to get transports count error="i/o timeout"
[2022-01-04T06:26:07Z] ERROR [transport_discovery]: Error getting dmsg-servers. error="something unexpected happened"
2022/01/04 06:26:12 [transport-discovery-5fd76d6dc5-kn8c4/Yn7b3agvKI-728369] "GET http://10.2.1.51:9091/health HTTP/1.1" from 172.104.188.39:53072 - 200 178B in 62.241µs
2022/01/04 06:26:17 [transport-discovery-5fd76d6dc5-kn8c4/Yn7b3agvKI-728370] "GET http://10.2.1.51:9091/health HTTP/1.1" from 172.104.188.39:53086 - 200 178B in 70.381µs
[2022-01-04T06:26:17Z] ERROR [transport_discovery]: failed to get transports count error="i/o timeout"
[2022-01-04T06:26:17Z] ERROR [transport_discovery]: Error getting dmsg-servers. error="something unexpected happened"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant