Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace sync request with async request in DistroFilter #10148

Closed
yuyijq opened this issue Mar 21, 2023 · 14 comments
Closed

Replace sync request with async request in DistroFilter #10148

yuyijq opened this issue Mar 21, 2023 · 14 comments

Comments

@yuyijq
Copy link
Contributor

yuyijq commented Mar 21, 2023

Using a synchronous approach to forward requests to other servers in DistroFilter can block Tomcat's processing thread when other servers respond very slowly. This can easily fill up Tomcat's thread pool and render the entire cluster unusable in case of failures. Is it possible to change this to an asynchronous request?

在DistroFilter里使用同步的方式转发请求给其他server,如果其他server响应非常缓慢的时候,这将会block tomcat的处理线程,当出现故障的时候,很容易将tomcat的线程池打满,导致整个集群不可用。
这个地方能改成异步请求吗?

@KomachiSion
Copy link
Collaborator

DistroFilter是用于适配1.x客户端的的一个转发器, 未来最终会被移除(在不需要支持1.x客户端时)

改为异步会存在几个问题:

  1. 返回给客户端成功,但实际请求失败,会导致数据不一致的问题(比如提示给客户端心跳成功,但实际异步执行失败,客户端就不进行补偿注册了,实例就丢了)
  2. 可能会导致集群更大的压力,如果服务端已经处于较高压力的情况下,客户端的心跳,注册等操作如果不同步等待处理结果,用虚拟结果代替,会导致用更快的频率向服务端发起,服务端的异步转发请求会进一步变多,可能造成更大的影响。

目前想到这两方面的问题。

@yuyijq
Copy link
Contributor Author

yuyijq commented Mar 21, 2023

@KomachiSion 异步的话不会导致数据不一致啊,我是说结合async servlet,异步请求target server,然后等结果返回后返回给客户端,而不占用当前server的tomcat线程,但是现在的问题是比较严重的,一个节点出现问题影响整个集群可用性。
我们上周出现一件事情是,其中一个节点因为运维某个操作,导致这个节点CPU飙升,处理请求变慢,最后发现其他两个正常的节点tomcat线程池被打满,拒绝服务了。

至于导致更大的压力这个就是另外一件事情了啊,也应该依靠这里的同步请求来来解决的,比如可以限流等。

@guozongkang
Copy link
Contributor

"http-nio-8848-exec-200" #525 daemon prio=5 os_prio=0 tid=0x00007ff578177000 nid=0x266 waiting on condition [0x00007ff4c30f5000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006f7bd2610> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:256)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2120)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:391)
at org.apache.http.pool.AbstractConnPool.access$300(AbstractConnPool.java:70)
at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:253)
- locked <0x00000007dfe2a8a0> (a org.apache.http.pool.AbstractConnPool$2)
at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:198)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:306)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:282)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at com.alibaba.nacos.common.http.client.request.DefaultHttpClientRequest.execute(DefaultHttpClientRequest.java:58)
at com.alibaba.nacos.common.http.client.NacosRestTemplate.execute(NacosRestTemplate.java:482)
at com.alibaba.nacos.common.http.client.NacosRestTemplate.exchange(NacosRestTemplate.java:447)
at com.alibaba.nacos.naming.misc.HttpClient.request(HttpClient.java:119)
at com.alibaba.nacos.naming.web.DistroFilter.doFilter(DistroFilter.java:133)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at com.alibaba.nacos.core.auth.AuthFilter.doFilter(AuthFilter.java:69)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at com.alibaba.nacos.naming.web.ServiceNameFilter.doFilter(ServiceNameFilter.java:77)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at com.alibaba.nacos.naming.web.TrafficReviseFilter.doFilter(TrafficReviseFilter.java:75)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:204)
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:183)
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:354)
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:267)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.doFilterInternal(WebMvcMetricsFilter.java:96)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:197)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:135)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:687)
at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:769)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:360)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:399)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:890)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1743)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
- locked <0x00000007e0087148> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:750)

@yuyijq
Copy link
Contributor Author

yuyijq commented Mar 21, 2023

@KomachiSion 能帮我review一下 这个pr吗?#10158

@KomachiSion
Copy link
Collaborator

PR先不急, 第一个问题异步的话会出现。

@yuyijq
Copy link
Contributor Author

yuyijq commented Mar 24, 2023

@KomachiSion 不会啊,我用的是async servlet,只是不block tomcat的请求线程,我还是会等到“责任”节点返回后再返回给客户端啊,不会出现客户端成功但实际失败的情况啊

@yuyijq
Copy link
Contributor Author

yuyijq commented Mar 25, 2023

@xuechaos
@KomachiSion

我是这么觉得,我想的其实是DistroFilter的转发不要占用当前节点的tomcat线程,因为一旦占用其实就是节点之间相互影响了,那么其实比较好的办法就是将这个转发改成异步,但这个异步改造并不是说DistroFilter里将转发请求发出去就ok了,是会利用async servlet的特性,等着对应负责的节点返回后将结果返回给客户端,所以不会带来 @KomachiSion 所说的,给客户端返回心跳成功,实际失败的情况,我用伪代码描述一下:

这是同步的方式

//in DistroFilter
result = HttpClient.request(...);
writeResponse(result);

异步的方式

AsyncContext context = request.startAsync();
HttpClient.asyncRequest(... ( result)->{
   writeResponse(result);
   context.complete();
});

另外,对于问题2,我觉得也是不存在的,我们不能以集群的可用性为代价来保护一个本来就慢了的节点呀。

关于升级客户端到2.x
升级是必然的趋势,这也是我们的方向,但是Nacos作为一个比较核心的组件,它的客户端几乎存在于公司的所有应用中,所以升级是有一个过程的,我们不可能说明天全公司全给升了,这是比较难达到的。

另外,我其实认为DistroFilter这种转发是有一定的道理的,这样可以在某种程度获得一定的实时性保证。因为在正常的时候,所有的请求都是由负责的节点处理,得到的数据是最一致和新鲜的。

谢谢

@KomachiSion
Copy link
Collaborator

@KomachiSion 不会啊,我用的是async servlet,只是不block tomcat的请求线程,我还是会等到“责任”节点返回后再返回给客户端啊,不会出现客户端成功但实际失败的情况啊

同步转异步,客户端到服务端A的请求需要被直接返回,应该只能返回成功吧?

但是异步请求到服务端B可能失败, 这样客户端认为成功了, 但是失败了, 这样就不会补偿了, 数据就出问题了。

@yuyijq
Copy link
Contributor Author

yuyijq commented Mar 27, 2023

@KomachiSion 不会
Client -> A,A -- async request --> B
Client -> A会等到B返回给A再返回,不会立刻返回

@KomachiSion
Copy link
Collaborator

@xuechaos @KomachiSion

我是这么觉得,我想的其实是DistroFilter的转发不要占用当前节点的tomcat线程,因为一旦占用其实就是节点之间相互影响了,那么其实比较好的办法就是将这个转发改成异步,但这个异步改造并不是说DistroFilter里将转发请求发出去就ok了,是会利用async servlet的特性,等着对应负责的节点返回后将结果返回给客户端,所以不会带来 @KomachiSion 所说的,给客户端返回心跳成功,实际失败的情况,我用伪代码描述一下:

这是同步的方式

//in DistroFilter
result = HttpClient.request(...);
writeResponse(result);

异步的方式

AsyncContext context = request.startAsync();
HttpClient.asyncRequest(... ( result)->{
   writeResponse(result);
   context.complete();
});

另外,对于问题2,我觉得也是不存在的,我们不能以集群的可用性为代价来保护一个本来就慢了的节点呀。

关于升级客户端到2.x 升级是必然的趋势,这也是我们的方向,但是Nacos作为一个比较核心的组件,它的客户端几乎存在于公司的所有应用中,所以升级是有一个过程的,我们不可能说明天全公司全给升了,这是比较难达到的。

另外,我其实认为DistroFilter这种转发是有一定的道理的,这样可以在某种程度获得一定的实时性保证。因为在正常的时候,所有的请求都是由负责的节点处理,得到的数据是最一致和新鲜的。

谢谢

这个方式的话,看起来是可行的,但是要做好response的管理,避免连接和内存泄漏。

可以添加一个开关,在异步有问题的时候能够回滚到同步恢复。默认值维持同步。

@xuechaos
Copy link
Member

@yuyijq @KomachiSion 嗯 我觉得可以的,可以先用默认同步的,增加个动态配置能开启异步,后边逐步默认过渡到异步。长远上,其实也可以考虑从客户端开始异步化,这部分可以一起在Nacos 3.x可以考虑一下。
@yuyijq 帮忙增加个开关,我们合一下,下个版本就可以带出去。

@yuyijq
Copy link
Contributor Author

yuyijq commented Mar 27, 2023

@xuechaos 好的,我加个开关~~
谢谢支持

@yuyijq
Copy link
Contributor Author

yuyijq commented Mar 27, 2023

@xuechaos @KomachiSion
我在EnvUtil里加了一个环境变量的开关,可以通过启动参数 -Dnacos.async.distro.forward=true 打开这个异步的功能,麻烦review一下
谢谢

KomachiSion pushed a commit that referenced this issue Apr 20, 2023
* Replace sync forward request with async request in DistroFilter. issue #10148

* extract method for config default headers

* add env switch for async distro forward.

* Fixed code review problems:
1. Move async forward switch from sys module to naming module.
2. use nacos code style to format code.

* Fixed nacos code checkstyle:
1. one import per Class
2. add javadoc

* In order to avoid additional overhead, move switch from GlobalConfig to ClientConfig and cache the env switch.

* Move switch from ClientConfig to DistroConfig.

* Removed unused import.

* Add test for async forward for DistroFilter

* Add license

* rename test method name

* Should enable async forward

* fixed test

* set async forward request switch to true in test

* fixed test: create and set property with MockEnvironment

* fixed check style

* move MockEnvironment init to BeforeClass

* add setter for asyncForwardRequest switch
@KomachiSion
Copy link
Collaborator

@yuyijq
Sorry, I make a mistake, I found add new dependency

        <dependency>
            <groupId>org.mock-server</groupId>
            <artifactId>mockserver-netty</artifactId>
        </dependency>
        <dependency>
            <groupId>org.mock-server</groupId>
            <artifactId>mockserver-client-java</artifactId>
        </dependency>

I merged the PR but I found these new dependencies and do revert first.

Why add these new dependencies?

Can you use mock RestTemplate for instead?

@yuyijq yuyijq closed this as completed Apr 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants