Smurf S04 Slot 2 high noise on Bands 0-3, 'Unlocked' Jesd status #439

Open
samdayweiss opened this issue Aug 27, 2024 · 2 comments

Comments

@samdayweiss

samdayweiss commented Aug 27, 2024

Hi all, this is from troubleshooting that Yuhan W did on the Srv S04 slot 2 Bay 0 AMC card that previously had power issues at the beginning of SATP3's Run 13.

On AMC 0, only bands 0 and 1 are bad, and possibly band 2.
After hammering, S.check_jesd(0) returns either:

[ 2024-08-27 03:19:46 ]  JESD Tx Okay
[ 2024-08-27 03:19:46 ]  JESD Rx Okay
[ 2024-08-27 03:19:47 ]  JESD health check finished after 1 seconds. The final status was Unlocked.
(True, True, 'Unlocked')

or

[ 2024-08-27 02:03:45 ]  JESD Tx DOWN
[ 2024-08-27 02:03:45 ]  JESD Rx Okay
[ 2024-08-27 02:03:46 ]  JESD health check finished after 1 seconds. The final status was Unlocked.
(False, True, 'Unlocked')
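
For anyone scripting around this, here is a minimal sketch of how the returned tuple could be summarized. It assumes, based only on the two outputs above, that the tuple is ordered (Tx okay, Rx okay, status string). The helper name `summarize_jesd` is hypothetical, and treating `'Locked'` as the healthy status string is an assumption that has not been checked against pysmurf.

```python
def summarize_jesd(result):
    """Turn a check_jesd-style tuple into a short human-readable summary.

    Assumes the order (tx_ok, rx_ok, status), inferred from the logs above.
    """
    tx_ok, rx_ok, status = result
    problems = []
    if not tx_ok:
        problems.append("JESD Tx down")
    if not rx_ok:
        problems.append("JESD Rx down")
    # Assumption: 'Locked' is the healthy status string.
    if status != "Locked":
        problems.append(f"status is {status!r}")
    return "healthy" if not problems else "; ".join(problems)


# The two results reported in this issue:
print(summarize_jesd((True, True, "Unlocked")))
print(summarize_jesd((False, True, "Unlocked")))
```
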
  1. Power cycling the crate doesn't fix this.
  2. Trying S.setup(force_configure=True) errored out with:
ChannelAccessGetFailure: Get failed; status code: 192

CA.Client.Exception...............................................
    Warning: "Virtual circuit disconnect"
    Context: "localhost:5064"
    Source File: ../cac.cpp line 1223
    Current Time: Tue Aug 27 2024 03:24:28.384456697
..................................................................
  3. When hammering, we saw this:
[ 2024-08-27 03:01:50 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:02:00 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:02:10 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:02:20 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:02:30 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:02:40 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:02:50 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:03:00 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:03:10 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:03:20 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:03:30 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:03:40 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:03:50 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:04:00 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:04:10 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:04:20 ]  caget smurf_server_s2:AMCc:SmurfApplication:ConfiguringInProgress
[ 2024-08-27 03:04:26 ]  False
[ 2024-08-27 03:04:26 ]  caget smurf_server_s2:AMCc:SmurfApplication:SystemConfigured
[ 2024-08-27 03:04:26 ]  True
[ 2024-08-27 03:04:26 ]  System configuration finished after 196 seconds. The final state was True.
[ 2024-08-27 03:04:26 ]  caget smurf_server_s2:AMCc:SmurfApplication:SmurfVersion
[ 2024-08-27 03:04:26 ]  7.4.0+0.g3ed5cb3c.dirty
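
The log above looks like a poll of ConfiguringInProgress every 10 seconds, followed by one read of SystemConfigured. A rough sketch of that pattern is below. The PV prefix is copied from the log; `wait_for_configure`, the injected `caget` callable, and the 600 second timeout are assumptions for illustration and are not the pysmurf implementation.

```python
import time

# PV prefix copied from the log above.
PREFIX = "smurf_server_s2:AMCc:SmurfApplication:"


def wait_for_configure(caget, poll_s=10.0, timeout_s=600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll ConfiguringInProgress until it is False, then read SystemConfigured.

    `caget` is a hypothetical callable that takes a PV name and returns its
    value. Returns (configured, elapsed_seconds).
    """
    start = clock()
    while caget(PREFIX + "ConfiguringInProgress"):
        # Give up if the server never reports that configuration has ended.
        if clock() - start > timeout_s:
            raise TimeoutError("configuration still in progress")
        sleep(poll_s)
    configured = bool(caget(PREFIX + "SystemConfigured"))
    return configured, clock() - start
```

Passing `sleep` and `clock` in as arguments is only there so the loop can be exercised without a live server.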

See plot attached for full band response.
[Image: Screenshot 2024-08-26 at 11 30 28 PM]

@jlashner
Collaborator

Hi Sam, thanks for the info. If this problem is still occurring, could you run S.save_state to dump the rogue state so that we could maybe get @swh76 to quickly check it out?

The JESD being unlocked is concerning to me. I'm not sure if I've seen that before, though it's not something I check regularly.

I don't believe that the CAClient exception during S.setup means anything; it is just timing out because the server is busy loading the configuration. Since the logs end with "System configuration finished after 196 seconds. The final state was True.", I believe the setup was successful.

We've previously seen issues on SATp1 where individual bands (not entire AMCs) were saturated while others looked normal. We never really figured out what was wrong, but learned that it was sometimes fixed by hammering. Just to confirm, you did hammer but this didn't resolve the issue? This is a good one to track.

@swh76
Collaborator

swh76 commented Aug 30, 2024

Chatted w/ @samdayweiss on Slack; I think this AMC needs to come back to SLAC for a look. The issue has been confirmed to follow the AMC (i.e., another AMC installed in its place works as expected). I'm not sure what the issue could be, but I think the odds point to a hardware issue of some kind. I believe hammering also did not resolve this issue.
@jlashner re: individual bands being saturated that resolve on hammering, is there an open discussion for that problem? If we don't have one already, it would be good to get a state dump for a system in that state.
Re: JESD reporting unlocked, depending on the details this may not be that worrisome. "Unlocked" is probably a bad status descriptor and usually does not mean what it sounds like; i.e., that status will get reported if a system momentarily loses lock and successfully recovers (which, as far as I know, has no negative impact). The check is most useful when run right after setup(); otherwise, interpretation depends on the exact state of the many things (some innocuous) that the rogue CheckJesd routine looks at.
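
As a minimal sketch of "run it right after setup()", something like the following could be used. It only uses the two calls already quoted in this thread, S.setup(force_configure=True) and S.check_jesd(bay); the wrapper name `setup_and_check_jesd` is hypothetical, and it does not handle the Channel Access exception reported above.

```python
def setup_and_check_jesd(S, bay):
    """Reconfigure, then immediately run the JESD health check on one bay.

    `S` is expected to be a pysmurf control object. Only the two calls that
    appear earlier in this thread are used.
    """
    S.setup(force_configure=True)
    # Checking straight after setup avoids picking up status changes that
    # accumulate later and are harder to interpret.
    return S.check_jesd(bay)
```
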
