Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix route problem when listen on 0.0.0.0 #101

Merged
merged 1 commit into from
Aug 21, 2024

Conversation

frostyplanet
Copy link
Contributor

Fix route problem when both service listening 0.0.0.0:port calling each other

Two service listen on 0.0.0.0:1234, on different hosts:

xoscar.actor_ref(111.111.111.111:1234) will return unexpected LocalActorRef.

log print in context.actor_ref(111.111.111.111:1234)

    actor_ref ActorRef(uid=b'supervisor', address='111.111.111.111:1234')
    _call 111.111.111.111:1234
    get client 111.111.111.111:1234
    got LocalActorRef(uid=None, address='0.0.0.0:1234'), actor_weakref=<weakref at 0x75745bcdecf0; to 'CloudSupervisorActor' at 0x75745d3ed260>
    fix_all_zero_ip()
    got LocalActorRef(uid=None, address='111.111.111.111:1234'), actor_weakref=<weakref at 0x75745bcdecf0; to 'CloudSupervisorActor' at 0x75745d3ed260>

using the returned LocalActorRef, method call intend for remote service actually sent to local service.

The solution is simple, during pool initialization, do not register_local_pool if address is all zero (because it's a widcast)

基于此前的改动 #92 发现了一个场景中的问题:

场景是两个服务,都 listen 0.0.0.0 但是在不同主机上同一个端口, 当写一个函数从 hostA.method 中去调用 hostB.method, 发现实际行为是 hostA.method 调用了 hostA.method,出现死循环。是由于从 hostA 去构建 actor_ref(hostB) 的时候返回了LocalActorRef。调查发现和 pool 初始化的行为有关。
解决办法是: 通配地址 0.0.0.0 不应该认为是本地地址,正常走 socket 通信就可以了.

Copy link

codecov bot commented Aug 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.10%. Comparing base (d6465c9) to head (bb150cd).
Report is 11 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #101      +/-   ##
==========================================
- Coverage   88.97%   85.10%   -3.88%     
==========================================
  Files          48       54       +6     
  Lines        4038     4559     +521     
  Branches      770      834      +64     
==========================================
+ Hits         3593     3880     +287     
- Misses        358      584     +226     
- Partials       87       95       +8     
Flag Coverage Δ
unittests 84.93% <ø> (-3.88%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@qinxuye qinxuye changed the title Fix route problem when listen on 0.0.0.0 BUG: Fix route problem when listen on 0.0.0.0 Aug 18, 2024
@XprobeBot XprobeBot added the bug Something isn't working label Aug 18, 2024
@qinxuye
Copy link
Contributor

qinxuye commented Aug 20, 2024

CI is fixed, could you rebase?

…ch other

Two services on different hosts both listen on 0.0.0.0:port,
xoscar.actor_ref(remote_addr:port) will return unexpected LocalActorRef.

log print in context.actor_ref(address=remote_addr:port)

    actor_ref ActorRef(uid=b'supervisor', address='remote_addr:port')
    _call remote_addr:port
    get client remote_addr:port
    got LocalActorRef(uid=None, address='0.0.0.0:port'), actor_weakref=<weakref at 0x75745bcdecf0; to 'CloudSupervisorActor' at 0x75745d3ed260>
    fix_all_zero_ip()
    got LocalActorRef(uid=None, address='remote_addr:port'), actor_weakref=<weakref at 0x75745bcdecf0; to 'CloudSupervisorActor' at 0x75745d3ed260>

using the returned LocalActorRef, method call intended for remote service actually sent to local service.

The solution is simple, during pool initialization, do not register_local_pool if address is all zero.
@frostyplanet
Copy link
Contributor Author

@qinxuye all ci passed

Copy link
Contributor

@qinxuye qinxuye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@qinxuye qinxuye merged commit 2164333 into xorbitsai:main Aug 21, 2024
12 of 13 checks passed
luweizheng pushed a commit to luweizheng/xoscar that referenced this pull request Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants