
secretpad: training flow status is displayed inconsistently #112

Open
john8628 opened this issue Jul 29, 2024 · 16 comments

Comments

@john8628

john8628 commented Jul 29, 2024

Issue Type

Feature

Have you searched for existing issues?

Yes

Link to Relevant Documentation

No response

Question Details

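Two nodes that have established communication show different statuses when viewing the same training flow in SecretPad. What could cause this?
It does not affect configuring and running tasks on either side on its own; for the same task, the task logs can be viewed on both sides after entering the containers.
(We migrated the storage from SQLite to MySQL.)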
@aokaokd

aokaokd commented Jul 29, 2024

Check the kuscia log inside the kuscia container for any errors.
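One way to act on this suggestion is to filter the log for error-level lines. The sketch below is a minimal, generic filter, not a Kuscia tool: it assumes error entries contain the word `error` or `fatal` (case-insensitive), which may not match Kuscia's actual log format, and it takes the log path as a command-line argument instead of assuming where kuscia.log lives inside the container.

```python
import re
import sys
from typing import Iterable, List

# Assumption: error entries contain "error" or "fatal" as a whole word.
# Adjust the pattern if the kuscia log uses a different severity format.
ERROR_PATTERN = re.compile(r"\b(error|fatal)\b", re.IGNORECASE)


def find_error_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like error or fatal entries, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]


def main(path: str) -> None:
    # errors="replace" keeps the scan going if the log contains non-UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in find_error_lines(f):
            print(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python find_errors.py <path-to-kuscia.log>` (the script name is arbitrary). Because it matches whole words only, counters such as `errors=0` are not reported.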

@john8628
Copy link
Author

> Check the kuscia log inside the kuscia container for any errors.

The tasks did fail at some point. But how should the display problem on the page be fixed?

@zimu-yuxi

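> > Check the kuscia log inside the kuscia container for any errors.
>
> The tasks did fail at some point. But how should the display problem on the page be fixed?

Is the component on the page shown as not running, or does it stay in the running state? Could you share a screenshot?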

@john8628
Author

(two screenshots)

@aokaokd

aokaokd commented Jul 29, 2024

Hi, are you using the P2P deployment mode?

@john8628
Author

> Hi, are you using the P2P deployment mode?

Yes.

@aokaokd

aokaokd commented Jul 30, 2024

Start a new task and check whether the SecretPad-side logs contain any errors.

@john8628
Author

> Start a new task and check whether the SecretPad-side logs contain any errors.

There is one error:
14:11:49 [http-nio-8080-exec-7] ERROR o.s.s.w.e.SecretpadExceptionHandler - find SecretpadException error: AUTH_FAILED, message: The request header does not contain header!
org.secretflow.secretpad.common.exception.SecretpadException: The request header does not contain header!
at org.secretflow.secretpad.common.exception.SecretpadException.of(SecretpadException.java:58)
at org.secretflow.secretpad.web.util.AuthUtils.findTokenInHeader(AuthUtils.java:43)
at org.secretflow.secretpad.web.interceptor.LoginInterceptor.processByUserRequest(LoginInterceptor.java:187)
at org.secretflow.secretpad.web.interceptor.LoginInterceptor.preHandle(LoginInterceptor.java:149)
at org.springframework.web.servlet.HandlerExecutionChain.applyPreHandle(HandlerExecutionChain.java:146)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1076)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:974)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:590)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:658)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
at org.secretflow.secretpad.web.filter.AddResponseHeaderFilter.doFilterInternal(AddResponseHeaderFilter.java:61)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:115)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:673)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:391)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:896)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1736)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63)
at java.base/java.lang.Thread.run(Thread.java:840)

@aokaokd

aokaokd commented Jul 30, 2024

That is an authentication-failure error and is not the cause here; it should have stopped occurring afterwards, right?

@aokaokd

aokaokd commented Jul 30, 2024

Look at your kuscia log, not the task log.

@john8628
Author

There are similar errors:
(screenshot)

@aokaokd

aokaokd commented Jul 31, 2024

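This is caused by the RPC connection being closed; that appears to be the reason. Have you modified the source code here? Please check your code logic.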

@john8628
Author

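> This is caused by the RPC connection being closed; that appears to be the reason. Have you modified the source code here? Please check your code logic.

We haven't changed the core RPC code; we only reworked the storage layer to use MySQL.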

@aokaokd

aokaokd commented Jul 31, 2024

Run the task again. The console polls the node/status endpoint; check whether any of those requests return an error.
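To make this check repeatable, the polling can be scripted. The sketch below is hypothetical and does not call SecretPad directly: it takes a `fetch` callable (for example, your own wrapper that requests node/status with a valid session) and a predicate that decides whether a response is an error. The response shape assumed in `looks_like_error`, a `status` object carrying a non-zero `code` on failure, is an assumption and should be checked against what your SecretPad version actually returns.

```python
import time
from typing import Any, Callable, Dict, List

Response = Dict[str, Any]


def looks_like_error(resp: Response) -> bool:
    # Hypothetical response shape: {"status": {"code": 0, "msg": "..."}, ...}.
    # Any non-zero status code is treated as an error; a missing status counts as success.
    status = resp.get("status") or {}
    return status.get("code", 0) != 0


def poll_for_errors(
    fetch: Callable[[], Response],
    is_error: Callable[[Response], bool] = looks_like_error,
    attempts: int = 5,
    interval_s: float = 2.0,
) -> List[Response]:
    """Call fetch() up to `attempts` times and collect the responses flagged as errors."""
    errors: List[Response] = []
    for i in range(attempts):
        resp = fetch()
        if is_error(resp):
            errors.append(resp)
        if i < attempts - 1:
            time.sleep(interval_s)  # wait between polls, but not after the last one
    return errors
```

Keeping the HTTP call outside the loop makes the same helper usable on both nodes, so failing responses from each side can be compared against the time the statuses diverged.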

@john8628
Author

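> Run the task again. The console polls the node/status endpoint; check whether any of those requests return an error.

It's still the kuscia.log, and I can't see any problem in it. Would it be possible to discuss this over DingTalk?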

@john8628
Author

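I've followed the suggestions. Most likely this was a data-synchronization problem caused by network jitter. I've removed the related nginx proxy configuration and am continuing to monitor.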

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants