ENH: fix non-local client connection problem when server listen on 0.0.0.0 #92

Merged
merged 1 commit into from
Jul 10, 2024


frostyplanet
Contributor

@frostyplanet frostyplanet commented Jul 4, 2024

Scenario:
The server (supervisor) listens on 0.0.0.0, perhaps because it has multiple IP addresses or sits behind an L4 load balancer.
A client (worker) that connects to the server from another host (by IP or hostname) fails with the following error:

File "/root/inference/xinference/deploy/worker.py", line 65, in _start_worker                                                                                                                              
    await start_worker_components(                                                                                                                                                                                 
  File "/root/inference/xinference/deploy/worker.py", line 43, in start_worker_components                                                                                                                    
    await xo.create_actor(                                                                                                                                                                                         
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 78, in create_actor                                                                                                                           
    return await ctx.create_actor(actor_cls, *args, uid=uid, address=address, **kwargs)                                                                                                                            
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 143, in create_actor                                                                                                             
    return self._process_result_message(result)                                                                                                                                                                    
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message                                                                                                  
    raise message.as_instanceof_cause()                                                                                                                                                                            
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 598, in create_actor                                                                                                                
    await self._run_coro(message.message_id, actor.__post_create__())                                                                                                                                              
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro                                                                                                                   
    return await coro                                                                                                                                                                                              
  File "/root/inference/xinference/core/worker.py", line 192, in __post_create__                                                                                                                             
    await self._supervisor_ref.add_worker(self.address)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 224, in send
    future = await self._call(actor_ref.address, send_message, wait=False)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 77, in _call
    return await self._caller.call(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/core.py", line 180, in call
    client = await self.get_client(router, dest_address)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/core.py", line 68, in get_client

File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/router.py", line 143, in get_client
    client = await self._create_client(client_type, address, **kw)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/router.py", line 157, in _create_client
    return await client_type.connect(address, local_address=local_address, **kw)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/communication/socket.py", line 255, in connect
    (reader, writer) = await asyncio.open_connection(host=host, port=port, **kwargs)
  File "/usr/lib/python3.10/asyncio/streams.py", line 48, in open_connection
    transport, _ = await loop.create_connection(
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1076, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1060, in create_connection
    sock = await self._connect_sock(

File "/usr/lib/python3.10/asyncio/base_events.py", line 969, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 501, in sock_connect
    return await fut
  File "/usr/lib/python3.10/asyncio/selector_events.py", line 541, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [address=172.31.23.86:9020, pid=1126889] [Errno 111] Connect call failed ('0.0.0.0', 9090)

The root cause is that xo.actor_ref() returns the ActorRef(address=0.0.0.0) supplied by the server side, even though the server address the client originally specified is not 0.0.0.0.
The next call through that actor ref raises an exception, because the client side treats 0.0.0.0 as 127.0.0.1.

We can split ActorRef.address into IP and port, detect the zero address, and replace it with the correct IP to fix the problem.
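
The idea can be sketched roughly as follows. This is only an illustration of the approach described above, not the code in this PR: the helper names (`split_address`, `is_zero_host`, `fix_zero_address`) are hypothetical, and it assumes plain `host:port` strings with IPv4 addresses or hostnames and no scheme prefix.

```python
import ipaddress
from typing import Tuple


def split_address(address: str) -> Tuple[str, int]:
    """Split a 'host:port' string into (host, port). Hypothetical helper."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def is_zero_host(host: str) -> bool:
    """Return True if host is the unspecified ("zero") address, e.g. 0.0.0.0."""
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Not an IP literal (e.g. a hostname), so it cannot be the zero address.
        return False


def fix_zero_address(returned: str, connected: str) -> str:
    """Replace a zero host in the server-returned address.

    `returned` is the address handed back by the server side;
    `connected` is the address the client actually used to reach the server.
    The port from `returned` is kept, since the server may redirect to another port.
    """
    host, port = split_address(returned)
    if not is_zero_host(host):
        return returned
    client_host, _ = split_address(connected)
    return f"{client_host}:{port}"
```

With these assumptions, `fix_zero_address("0.0.0.0:9090", "192.0.2.10:9999")` gives `"192.0.2.10:9090"`, while an address that already carries a concrete host is returned unchanged.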

@XprobeBot XprobeBot added this to the v0.3.0 milestone Jul 4, 2024
@frostyplanet frostyplanet force-pushed the fix_0.0.0.0 branch 3 times, most recently from 122d10b to 4b93854 Compare July 6, 2024 09:24

codecov bot commented Jul 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.34%. Comparing base (d6465c9) to head (4b93854).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #92      +/-   ##
==========================================
- Coverage   88.97%   88.34%   -0.64%     
==========================================
  Files          48       48              
  Lines        4038     4040       +2     
  Branches      770      771       +1     
==========================================
- Hits         3593     3569      -24     
- Misses        358      380      +22     
- Partials       87       91       +4     
Flag Coverage Δ
unittests 88.21% <100.00%> (-0.59%) ⬇️


@frostyplanet frostyplanet force-pushed the fix_0.0.0.0 branch 2 times, most recently from 25ba65d to 6b56c4c Compare July 6, 2024 11:52
@frostyplanet frostyplanet force-pushed the fix_0.0.0.0 branch 2 times, most recently from 2a4c035 to 2e4c193 Compare July 10, 2024 04:10
It is an xoscar concept that when a client initializes xoscar.actor_ref, the server side returns an
ActorRef.address (which might redirect to another port).

If the server side listens on the zero IP and the client connects from a different host,
the address 0.0.0.0:port returned by the server is not reachable.
In this case we use the client-side connect address to fix ActorRef.address.
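
As background, a small sketch with plain sockets (not xoscar code) shows why a server bound to the zero IP has no reachable address of its own to report: the listening socket only knows the wildcard address it was bound to. The function name `wildcard_listen_address` is made up for this illustration.

```python
import socket
from typing import Tuple


def wildcard_listen_address() -> Tuple[str, int]:
    """Bind a TCP socket to the wildcard address and report its local address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        # Port 0 asks the OS to pick any free port.
        srv.bind(("0.0.0.0", 0))
        srv.listen()
        # The listening socket reports the wildcard host it was bound to,
        # not any of the machine's concrete interface addresses.
        host, port = srv.getsockname()
        return host, port
```

The returned host is `0.0.0.0`, which is meaningful only on the server itself; a remote client has to substitute the address it actually connected to.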
@qinxuye qinxuye changed the title Fix non-local client connection problem when server listen on 0.0.0.0 ENH: fix non-local client connection problem when server listen on 0.0.0.0 Jul 10, 2024
@XprobeBot XprobeBot added the enhancement New feature or request label Jul 10, 2024
Contributor

@qinxuye qinxuye left a comment

LGTM

@qinxuye qinxuye merged commit 5ffa0ea into xorbitsai:main Jul 10, 2024
11 checks passed
frostyplanet added a commit to frostyplanet/xoscar that referenced this pull request Jul 14, 2024
Introduced from
 commit 5ffa0ea
Author: Adam Ning <frostyplanet@gmail.com>
Date:   Wed Jul 10 15:24:08 2024 +0800

    ENH: fix non-local client connection problem when server listen on 0.0.0.0 (xorbitsai#92)
frostyplanet added a commit to frostyplanet/xoscar that referenced this pull request Jul 30, 2024
frostyplanet added a commit to frostyplanet/xoscar that referenced this pull request Aug 5, 2024
Labels
enhancement New feature or request