
partial socket IO isn't handled #306

Open
jlevon opened this issue Feb 4, 2021 · 5 comments

Comments

@jlevon
Collaborator

jlevon commented Feb 4, 2021

Blocking I/O in API

Note that neither the listen fd nor the conn fd is non-blocking. Instead, the get_request_sock() callback sets non-blocking recvmsg() flags if the LIBVFIO_USER_FLAG_ATTACH_NB flag is set.
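
For context, a minimal sketch of the non-blocking attach path as the public API exposes it: with LIBVFIO_USER_FLAG_ATTACH_NB, vfu_attach_ctx() returns -1 with errno set to EAGAIN/EWOULDBLOCK instead of blocking (the socket path and priv pointer are placeholders):

    #include <err.h>
    #include <errno.h>
    #include <stdlib.h>
    #include "libvfio-user.h"

    vfu_ctx_t *ctx = vfu_create_ctx(VFU_TRANS_SOCK, "/tmp/vfu.sock",
                                    LIBVFIO_USER_FLAG_ATTACH_NB,
                                    priv, VFU_DEV_TYPE_PCI);

    /* Poll until a client connects instead of blocking in recvmsg(). */
    while (vfu_attach_ctx(ctx) < 0) {
        if (errno != EAGAIN && errno != EWOULDBLOCK) {
            err(EXIT_FAILURE, "attach failed");
        }
        /* a real server would wait for the listen fd to become readable */
    }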

If we're supposed to be non-blocking, there are multiple paths where we might still block:

vfu_dma_read/write(), vfu_irq_message()
  -> vfu_msg() -> vfu_msg_fds() -> vfu_msg_iovec() -> vfu_send_iovec() -> sendmsg()
                                                     -> vfu_recv_fds() -> get_msg() -> recvmsg()
                                                                       -> recv_blocking() -> recvmsg()
process_request() -> vfu_send_iovec() -> sendmsg()   # replies

NB: the name recv_blocking() is misleading: we don't have O_NONBLOCK set anyway, so it doesn't change anything about the socket.

In practice, this is usually not a problem: we're either expecting a quick response (for vfu_dma_read/write()), or we expect to be able to write to the socket anyway (for process_request() replies).

But it's certainly not good for users such as SPDK.

Short reads/writes

If we ever get a short read or write (indicating that the message couldn't be handled in one go), then we'll totally break (I think), with no way to recover.

All the socket code is at issue here, namely vfu_send_iovec(), get_msg(), and recv_blocking().
If we're in blocking mode, we could at least loop until we've sent or received a full message, as sketched below.
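
A sketch of what such a loop could look like for the send side: generic POSIX code, not the library's actual implementation, which retries sendmsg() and walks the iovec array past whatever was already written:

    #include <errno.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Keep calling sendmsg() until the whole iovec array is out.
     * Assumes a blocking fd; modifies the caller's iovec array. */
    static int
    send_all(int sock, struct iovec *iov, size_t iovcnt)
    {
        struct msghdr msg = { 0 };

        msg.msg_iov = iov;
        msg.msg_iovlen = iovcnt;

        while (msg.msg_iovlen > 0) {
            ssize_t n = sendmsg(sock, &msg, 0);

            if (n < 0) {
                if (errno == EINTR) {
                    continue;
                }
                return -1;
            }
            /* Drop iovec entries that were sent completely... */
            while (msg.msg_iovlen > 0 &&
                   (size_t)n >= msg.msg_iov[0].iov_len) {
                n -= msg.msg_iov[0].iov_len;
                msg.msg_iov++;
                msg.msg_iovlen--;
            }
            /* ...and advance into a partially sent entry, if any. */
            if (msg.msg_iovlen > 0) {
                msg.msg_iov[0].iov_base =
                    (char *)msg.msg_iov[0].iov_base + n;
                msg.msg_iov[0].iov_len -= (size_t)n;
            }
        }
        return 0;
    }

The receive side would be symmetric with recvmsg(), though any SCM_RIGHTS file descriptors need to be captured on whichever recvmsg() call delivers them and kept until the full message is in.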

In non-blocking mode, we could still loop, but in the worst case we'd spin at 100% CPU and hang the runtime (SPDK). The alternative is to somehow save the state and return from vfu_run_ctx() / vfu_dma_read/write() / vfu_irq_message(), so that on the next call we can pick up where we left off and finish the read/write (see the sketch below). This has a bunch of problems, though. For starters, the library is not re-entrant, so other contexts generally can't come in and call the library again (and another SPDK thread could call e.g. vfu_dma_read()). To do this we'd effectively have to rewrite the entire library to be properly async event-loop based.
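
To illustrate the "save the state" alternative: the library would have to carry something like the following per connection, so that the next call could resume where the last one stopped (an entirely hypothetical layout, just to show the shape of the problem):

    /* Hypothetical resumable-I/O state: everything sendmsg()/recvmsg()
     * didn't finish must survive until the next library entry. */
    struct partial_io {
        struct iovec *iov;      /* remaining iovec entries */
        size_t        iovcnt;
        size_t        offset;   /* bytes already done within iov[0] */
        uint16_t      msg_id;   /* request/reply this belongs to */
        bool          sending;  /* direction of the pending transfer */
    };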

In practice, again, we've not yet seen this issue: the socket buffer is big (65K) in comparison to typical message sizes, so as long as the other side of the socket is active, we're generally OK. For reads, the messages are also likely to be smaller than PIPE_BUF.

@tmakatos
Member

tmakatos commented Feb 5, 2021

If we're supposed to be non-blocking, there are multiple paths whereby we might block still:

I think it's wrong for vfu_dma_read/write()/vfu_irq_message() to wait for a quick response: there's no guarantee that the response will be quick, plus it doesn't work as-is, since the client might send a request which we'll mistake for the response we're waiting for (see #279 for more details).

Assuming we fix the above, I think we should look at fixing this properly: can we really rely on it working the way it's implemented now if the host is under stress (fragmented memory, lots of context switches, lots of open files)?

@jlevon
Collaborator Author

jlevon commented Feb 5, 2021

It's definitely brittle right now.

There are really only two options to fix:

  1. we go to sleep on a CV until we get a reply matching our ID. This is basically the way the qemu client code works.
  2. we return immediately from the API call, queue up the message+reply, then call a user-provided callback when we're done.

Only 2. can match the SPDK use case, I think, and it's probably not worse in terms of implementation than 1.

It's probably not too painful to do, but I think a lot might depend on SPDK's callback expectations.
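
For option 2, the API could grow an asynchronous variant along these lines; the signature below is hypothetical and only meant to show the shape (the synchronous vfu_dma_read() is the real, existing call):

    /* Hypothetical async variant: queues the DMA read and returns
     * immediately; cb fires from vfu_run_ctx() once the reply arrives. */
    typedef void (*vfu_dma_cb_t)(vfu_ctx_t *ctx, void *data,
                                 int err, void *cb_arg);

    int vfu_dma_read_async(vfu_ctx_t *ctx, dma_sg_t *sg, void *data,
                           vfu_dma_cb_t cb, void *cb_arg);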

@swapnili
Collaborator

swapnili commented Feb 5, 2021

At least for now, SPDK is not using vfu_dma_read/write()/vfu_irq_message(), and I'm not sure it ever will.
Until there's a real use case for handling the reply asynchronously, maybe we keep the behaviour simple: vfu_dma_read/write()/vfu_irq_message() calls block until a reply is received from the client.

@jlevon
Collaborator Author

jlevon commented Feb 5, 2021

We could reject those calls if ATTACH_NB is set, maybe.

We still have the reply path of process_request(), though.
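
Such a rejection could be a guard at the top of the blocking round-trip path, assuming the creation flags are kept on the context (hypothetical field name):

    /* Sketch: refuse blocking request/reply round-trips when the
     * context was created with LIBVFIO_USER_FLAG_ATTACH_NB. */
    if (vfu_ctx->flags & LIBVFIO_USER_FLAG_ATTACH_NB) {
        errno = ENOTSUP;
        return -1;
    }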

@tmakatos
Member

I think #296 is related to this, in the sense that we have to store information in order to be able to send a response from a different context. What makes it easy there is that we don't have to support processing more requests while the single request on the migration region is pending; I think that sidesteps the problem that the library is non-reentrant.
