ZMQ freezes on first command after session is created #39

Open
CSchoel opened this issue Sep 8, 2020 · 7 comments

Comments

@CSchoel

CSchoel commented Sep 8, 2020

Sometimes (quite rarely), the first call to sendExpression() after an OMCSession is created freezes.

Stack trace of the InterruptException (after pressing Ctrl-C):

 [1] wait(::FileWatching._FDWatcher; readable::Bool, writable::Bool) at /build/julia/src/julia-1.5.0/usr/share/julia/stdlib/v1.5/FileWatching/src/FileWatching.jl:529
 [2] wait at /home/cslz90/.julia/packages/ZMQ/R3wSD/src/socket.jl:52 [inlined]
 [3] _recv!(::ZMQ.Socket, ::ZMQ.Message) at /home/cslz90/.julia/packages/ZMQ/R3wSD/src/comm.jl:75
 [4] recv at /home/cslz90/.julia/packages/ZMQ/R3wSD/src/comm.jl:94 [inlined]
 [5] sendExpression(::OMJulia.OMCSession, ::String) at /home/cslz90/.julia/packages/OMJulia/ZLXEs/src/OMJulia.jl:1014
 [6] setupOMCSession(::String, ::String; quiet::Bool, checkunits::Bool) at /home/cslz90/.julia/packages/ModelicaScriptingTools/G5LLK/src/ModelicaScriptingTools.jl:374

setupOMCSession is my own code. It contains the following relevant lines, the second of which is the one that shows up in the stack trace:

omc = OMCSession()
sendExpression(omc, "cd(\"$(moescape(outdir))\")")

This happens with release version 0.1.0 of OMJulia. I believe I have also encountered it with the version from the master branch in the past, but I cannot confirm that, since I switched back to the official release some time ago.

I will try adding a 100 ms sleep between creating the session and the first sendExpression() call, and report back on whether this workaround is successful.

@CSchoel
Author

CSchoel commented Sep 8, 2020

One additional note: taken together with #32, this might give the impression that any sendExpression() call can freeze. However, across several hundred test runs over the last few months, I never encountered a freeze between individual simulations, only at the very start or the very end of the pipeline.

@CSchoel
Author

CSchoel commented Nov 3, 2020

Update: I gradually increased the sleep from 100 ms to 500 ms, but still got occasional hangs. My next best guess is this suggestion from a related issue in ZMQ.jl: JuliaInterop/ZMQ.jl#87 (comment)

using OMJulia  # OMCSession
using ZMQ      # send, recv

function avoidStartupFreeze(omc::OMCSession)
    status = :started
    timeout = 0.1  # seconds
    while status != :received
        # send a simple command to OMC
        send(omc.socket, "getVersion()")
        # use Julia tasks to allow recv to run into a timeout;
        # a buffer of 2 lets the losing task finish its put! instead of blocking forever
        c = Channel(2)
        @async put!(c, (recv(omc.socket), :received))
        @async (sleep(timeout); put!(c, (nothing, :timedout)))
        data, status = take!(c)
        if status == :timedout
            # note: the recv task is still pending on the socket at this point
            @warn("getVersion() timed out in avoidStartupFreeze")
        end
    end
end

This sends getVersion() to OMC repeatedly until an answer is received within 100 ms. Since the issue is not reliably reproducible, I am not sure whether this (rather crude) timeout mechanism actually works when ZMQ freezes. I will report back when I encounter a case where the warning message is issued.
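The timeout in avoidStartupFreeze works by racing two tasks that write to the same channel and taking whichever result arrives first. Stripped of OMJulia and ZMQ, the pattern can be sketched as below; `call_with_timeout` is a hypothetical helper name, and `sleep`-based closures stand in for the blocking `recv`. One caveat worth keeping in mind: Julia cannot cancel the losing task, so a timed-out call keeps running in the background (for a socket, that means the `recv` stays pending).

```julia
"""
    call_with_timeout(f, timeout)

Run the blocking call `f()` in a task and wait at most `timeout` seconds for it.
Returns `(result, :received)` if `f` finished first, else `(nothing, :timedout)`.
The task running `f` is not cancelled on timeout; it keeps running in the background.
If `f` throws, its error is lost and the call reports a timeout.
"""
function call_with_timeout(f, timeout::Real)
    # buffer of 2: whichever task loses the race can still complete its put!
    c = Channel(2)
    @async put!(c, (f(), :received))
    @async (sleep(timeout); put!(c, (nothing, :timedout)))
    return take!(c)
end

# Usage: a fast call answers in time, a slow one runs into the timeout.
fast = call_with_timeout(() -> (sleep(0.01); 42), 1.0)
slow = call_with_timeout(() -> (sleep(2.0); 42), 0.05)
println(fast)  # (42, :received)
println(slow)  # (nothing, :timedout)
```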

@CSchoel
Author

CSchoel commented Nov 12, 2020

Update can be found here: THM-MoTE/ModelicaScriptingTools.jl#9

The solution avoids freezes, but ZMQ crashes with a ZMQ.StateError.
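One plausible cause of the StateError, assuming OMJulia talks to OMC over a ZMQ REQ socket: a REQ socket enforces strict send/recv alternation, and libzmq rejects a second send while a reply is still outstanding (error EFSM, "Operation cannot be accomplished in current state"), which ZMQ.jl reports as a `ZMQ.StateError`. In the first version of avoidStartupFreeze, a timeout leaves the recv pending and the next loop iteration sends getVersion() again. The toy model below (plain Julia, not ZMQ.jl's implementation; all names are hypothetical) reproduces only that state machine.

```julia
# Toy model of a ZMQ REQ socket's send/recv alternation (not ZMQ.jl code).
struct ToyStateError <: Exception
    msg::String
end

mutable struct ToyReqSocket
    awaiting_reply::Bool  # true between a send and its matching recv
end
ToyReqSocket() = ToyReqSocket(false)

const EFSM_MSG = "Operation cannot be accomplished in current state"

function toy_send!(s::ToyReqSocket, msg::String)
    # a REQ socket may not send again until the previous reply was received
    s.awaiting_reply && throw(ToyStateError(EFSM_MSG))
    s.awaiting_reply = true
    return nothing
end

function toy_recv!(s::ToyReqSocket)
    # receiving is only valid after a send
    s.awaiting_reply || throw(ToyStateError(EFSM_MSG))
    s.awaiting_reply = false
    return nothing
end

# The sequence from the first workaround: send, time out without a recv, send again.
s = ToyReqSocket()
toy_send!(s, "getVersion()")
# ... recv times out here, so the reply is never consumed ...
try
    toy_send!(s, "getVersion()")
catch e
    println(e)  # ToyStateError("Operation cannot be accomplished in current state")
end
```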

@CSchoel
Author

CSchoel commented Nov 19, 2020

Another update: I have now changed avoidStartupFreeze so that it simply discards the whole OMCSession and creates a new one whenever a timeout is detected.

using OMJulia  # OMCSession
using ZMQ      # send, recv

function avoidStartupFreeze(omc::OMCSession)
    function reconnect(omc::OMCSession)
        try
            send(omc.socket, "quit()")
        catch
            # ignore errors: the old session is discarded either way
        end
        return OMCSession()
    end
    status = :started
    timeout = 0.1  # seconds
    while status != :received
        # send a simple command to OMC
        send(omc.socket, "getVersion()")
        # use Julia tasks to allow recv to run into a timeout
        # idea from https://github.com/JuliaInterop/ZMQ.jl/issues/87#issuecomment-131153884
        c = Channel(2)  # room for both results, so the losing task does not block forever
        @async put!(c, (recv(omc.socket), :received))
        @async (sleep(timeout); put!(c, (nothing, :timedout)))
        data, status = take!(c)
        if status == :timedout
            # replace the (possibly stuck) session with a fresh one
            omc = reconnect(omc)
        end
    end
    return omc
end

So far this works great, although it is more a workaround than a solution.

@ghost

ghost commented Mar 31, 2021

@CSchoel thank you for that workaround. I have also observed the startup freeze, and in addition I run into problems when running thousands of simulations in a row: at some point the communication fails.

@CSchoel
Author

CSchoel commented Apr 9, 2021

@DarkVador42 you're welcome. I am happy that it could be of help to someone else. 😄

Is your error by any chance related to a ZMQ.StateError? This is the only additional problem I have encountered with this method, and it only occurs while an OMCSession instance is being created. My (very crude) solution simply recreates the session until no error occurs, and so far it works. 🤷
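The "recreate the session until there is no error" approach maps onto Julia's built-in `Base.retry`, which calls a function again after each delay as long as its `check` predicate accepts the exception. A minimal sketch, not the code used in this thread: `create_with_retry` and `FlakyStartupError` are hypothetical names, and `FlakyStartupError` stands in for `ZMQ.StateError` so the example runs without OMC.

```julia
# Hypothetical stand-in for ZMQ.StateError, so the sketch runs without OMC.
struct FlakyStartupError <: Exception end

"""
    create_with_retry(create; attempts=5, delay=0.1, retry_on=FlakyStartupError)

Call `create()` up to `attempts` times, waiting `delay` seconds between attempts,
as long as it throws an exception of type `retry_on`. Any other exception
propagates immediately; after the last attempt the final error is rethrown.
"""
function create_with_retry(create; attempts::Int = 5, delay::Real = 0.1,
                           retry_on::Type = FlakyStartupError)
    attempts >= 1 || throw(ArgumentError("attempts must be at least 1"))
    # Base.retry: one retry per entry in `delays`, gated by `check(state, exception)`
    f = retry(create; delays = fill(delay, attempts - 1),
              check = (state, e) -> e isa retry_on)
    return f()
end
```

With OMJulia, the call would presumably look like `create_with_retry(OMCSession; retry_on = ZMQ.StateError)`. Restricting retries to one exception type keeps unrelated failures (for example, a missing OMC installation) from being retried silently.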

@ghost

ghost commented Apr 12, 2021

@CSchoel, yes, it also happens regularly when I create the OMCSession. Apart from that, it also froze during a run with thousands of model calls, where it was stuck inside a "wait" function of ZMQ. I cannot be more specific here, since I was not able to reproduce the error...
