Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass server->client command over a separate socket pair #762

Merged
merged 15 commits into from
Sep 15, 2023
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
83 changes: 59 additions & 24 deletions docs/vfio-user.rst
Original file line number Diff line number Diff line change
Expand Up @@ -204,12 +204,32 @@ A server can serve:
1) one or more clients, and/or
2) one or more virtual devices, belonging to one or more clients.

The current protocol specification requires a dedicated socket per
client/server connection. It is a server-side implementation detail whether a
single server handles multiple virtual devices from the same or multiple
clients. The location of the socket is implementation-specific. Multiplexing
clients, devices, and servers over the same socket is not supported in this
version of the protocol.
The current protocol specification requires dedicated sockets per
client/server connection. Commands in the client-to-server direction are
handled on the main communication socket which the client connects to, and
replies to these commands are passed on the same socket. Commands sent in the
other direction from the server to the client as well as their corresponding
replies, are exchanged on a separate socket, which is set up during negotiation
(AF_UNIX servers just pass the file descriptor).
mnissler-rivos marked this conversation as resolved.
Show resolved Hide resolved

Using separate sockets for each command channel avoids introducing an
artificial point of synchronization between the channels. This simplifies
implementations since it obviates the need to demultiplex incoming messages
into commands and replies and interleave command handling and reply processing.
Note that it is still illegal for implementations to stall command or reply
processing indefinitely while waiting for replies on the other channel, as this
may lead to deadlocks. However, since incoming commands and requests arrive on
different sockets, it's possible to meet this requirement e.g. by running two
independent request processing threads that can internally operate
synchronously. It is expected that this is simpler to implement than fully
asynchronous message handling code. Implementations may still choose a fully
asynchronous, event-based design for other reasons, and the protocol fully
supports it.

It is a server-side implementation detail whether a single server handles
multiple virtual devices from the same or multiple clients. The location of the
socket is implementation-specific. Multiplexing clients, devices, and servers
over the same socket is not supported in this version of the protocol.

Authentication
--------------
Expand Down Expand Up @@ -488,21 +508,32 @@ format:

Capabilities:

+--------------------+--------+------------------------------------------------+
| Name | Type | Description |
+====================+========+================================================+
| max_msg_fds | number | Maximum number of file descriptors that can be |
| | | received by the sender in one message. |
| | | Optional. If not specified then the receiver |
| | | must assume a value of ``1``. |
+--------------------+--------+------------------------------------------------+
| max_data_xfer_size | number | Maximum ``count`` for data transfer messages; |
| | | see `Read and Write Operations`_. Optional, |
| | | with a default value of 1048576 bytes. |
+--------------------+--------+------------------------------------------------+
| migration | object | Migration capability parameters. If missing |
| | | then migration is not supported by the sender. |
+--------------------+--------+------------------------------------------------+
+--------------------+---------+-----------------------------------------------+
| Name | Type | Description |
+====================+=========+===============================================+
| max_msg_fds | number | Maximum number of file descriptors that can |
| | | be received by the sender in one message. |
| | | Optional. If not specified then the receiver |
| | | must assume a value of ``1``. |
+--------------------+---------+-----------------------------------------------+
| max_data_xfer_size | number | Maximum ``count`` for data transfer messages; |
| | | see `Read and Write Operations`_. Optional, |
| | | with a default value of 1048576 bytes. |
+--------------------+---------+-----------------------------------------------+
| migration | object | Migration capability parameters. If missing |
| | | then migration is not supported by the |
| | | sender. |
+--------------------+---------+-----------------------------------------------+
| reverse_cmd_socket | boolean | Indicates whether the client is capable of |
| | | using a separate socket as the channel for |
| | | server-to-client commands. If specified and |
| | | the server supports it, it will pass a file |
| | | descriptor along with its reply. This is |
| | | enabled by request only for the benefit of |
mnissler-rivos marked this conversation as resolved.
Show resolved Hide resolved
| | | clients that implement older drafts of this |
| | | specification which did not include |
| | | independent sockets per channel. |
+--------------------+---------+-----------------------------------------------+

The migration capability contains the following name/value pairs:

Expand All @@ -517,7 +548,9 @@ Reply
^^^^^

The same message format is used in the server's reply with the semantics
described above.
described above. In case the client specified ``reverse_cmd_socket`` in its
capabilities, the server may pass a file descriptor to use for the
server-to-client command channel.

``VFIO_USER_DMA_MAP``
---------------------
Expand Down Expand Up @@ -1399,7 +1432,8 @@ Reply
-----------------------

If the client has not shared mappable memory, the server can use this message to
read from guest memory.
read from guest memory. This message and its reply are passed over the separate
server-to-client socket if negotiated at connection setup.

Request
^^^^^^^
Expand Down Expand Up @@ -1437,7 +1471,8 @@ Reply
-----------------------

If the client has not shared mappable memory, the server can use this message to
write to guest memory.
write to guest memory. This message and its reply are passed over the separate
server-to-client socket if negotiated at connection setup.

Request
^^^^^^^
Expand Down
1 change: 1 addition & 0 deletions lib/private.h
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,7 @@ struct vfu_ctx {

int client_max_fds;
size_t client_max_data_xfer_size;
bool enable_cmd_conn;

struct vfu_ctx_pending_info pending;
bool quiesced;
Expand Down
59 changes: 51 additions & 8 deletions lib/tran.c
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

#include <json.h>

Expand All @@ -55,6 +56,7 @@
* "migration": {
* "pgsize": 4096
* }
* "reverse_cmd_socket": true,
* }
* }
*
Expand All @@ -63,7 +65,8 @@
*/
mnissler-rivos marked this conversation as resolved.
Show resolved Hide resolved
int
tran_parse_version_json(const char *json_str, int *client_max_fdsp,
size_t *client_max_data_xfer_sizep, size_t *pgsizep)
size_t *client_max_data_xfer_sizep, size_t *pgsizep,
bool *reverse_cmd_socket_supportedp)
{
struct json_object *jo_caps = NULL;
struct json_object *jo_top = NULL;
Expand Down Expand Up @@ -129,6 +132,19 @@ tran_parse_version_json(const char *json_str, int *client_max_fdsp,
}
}

if (json_object_object_get_ex(jo_caps, "reverse_cmd_socket", &jo)) {
if (json_object_get_type(jo) != json_type_boolean) {
goto out;
}

errno = 0;
*reverse_cmd_socket_supportedp = json_object_get_boolean(jo);

if (errno != 0) {
goto out;
}
}

ret = 0;

out:
Expand All @@ -142,7 +158,8 @@ tran_parse_version_json(const char *json_str, int *client_max_fdsp,

static int
recv_version(vfu_ctx_t *vfu_ctx, uint16_t *msg_idp,
struct vfio_user_version **versionp)
struct vfio_user_version **versionp,
bool *reverse_cmd_socket_supportedp)
{
struct vfio_user_version *cversion = NULL;
vfu_msg_t msg = { { 0 } };
Expand Down Expand Up @@ -207,7 +224,7 @@ recv_version(vfu_ctx_t *vfu_ctx, uint16_t *msg_idp,

ret = tran_parse_version_json(json_str, &vfu_ctx->client_max_fds,
&vfu_ctx->client_max_data_xfer_size,
&pgsize);
&pgsize, reverse_cmd_socket_supportedp);

if (ret < 0) {
/* No client-supplied strings in the log for release build. */
Expand Down Expand Up @@ -267,13 +284,14 @@ recv_version(vfu_ctx_t *vfu_ctx, uint16_t *msg_idp,

static int
send_version(vfu_ctx_t *vfu_ctx, uint16_t msg_id,
struct vfio_user_version *cversion)
struct vfio_user_version *cversion, int reverse_cmd_socket_fd)
{
struct vfio_user_version sversion = { 0 };
struct iovec iovecs[2] = { { 0 } };
char server_caps[1024];
vfu_msg_t msg = { { 0 } };
int slen;
int ret;

if (vfu_ctx->migration == NULL) {
slen = snprintf(server_caps, sizeof(server_caps),
Expand Down Expand Up @@ -312,30 +330,55 @@ send_version(vfu_ctx_t *vfu_ctx, uint16_t msg_id,
msg.hdr.msg_id = msg_id;
msg.out_iovecs = iovecs;
mnissler-rivos marked this conversation as resolved.
Show resolved Hide resolved
msg.nr_out_iovecs = 2;
if (reverse_cmd_socket_fd != -1) {
msg.out.fds = &reverse_cmd_socket_fd;
msg.out.nr_fds = 1;
}

ret = vfu_ctx->tran->reply(vfu_ctx, &msg, 0);

return vfu_ctx->tran->reply(vfu_ctx, &msg, 0);
return ret;
mnissler-rivos marked this conversation as resolved.
Show resolved Hide resolved
}

int
tran_negotiate(vfu_ctx_t *vfu_ctx)
tran_negotiate(vfu_ctx_t *vfu_ctx, int *reverse_cmd_socket_fdp)
{
struct vfio_user_version *client_version = NULL;
int reverse_cmd_socket_fds[2] = { -1, -1 };
bool reverse_cmd_socket_supported = false;
uint16_t msg_id = 0x0bad;
int ret;

ret = recv_version(vfu_ctx, &msg_id, &client_version);
ret = recv_version(vfu_ctx, &msg_id, &client_version,
&reverse_cmd_socket_supported);

if (ret < 0) {
vfu_log(vfu_ctx, LOG_ERR, "failed to recv version: %m");
return ret;
}

ret = send_version(vfu_ctx, msg_id, client_version);
if (reverse_cmd_socket_supported && reverse_cmd_socket_fdp != NULL &&
vfu_ctx->client_max_fds > 0) {
if (socketpair(AF_UNIX, SOCK_STREAM, 0, reverse_cmd_socket_fds) == -1) {
return -1;
mnissler-rivos marked this conversation as resolved.
Show resolved Hide resolved
}
}

ret = send_version(vfu_ctx, msg_id, client_version,
reverse_cmd_socket_fds[0]);

free(client_version);

/*
* The remote end of the reverse cmd socket pair is no longer needed, the
* local one only if successful.
mnissler-rivos marked this conversation as resolved.
Show resolved Hide resolved
*/
close_safely(&reverse_cmd_socket_fds[0]);
if (ret < 0) {
vfu_log(vfu_ctx, LOG_ERR, "failed to send version: %m");
close_safely(&reverse_cmd_socket_fds[1]);
} else if (reverse_cmd_socket_fdp != NULL) {
*reverse_cmd_socket_fdp = reverse_cmd_socket_fds[1];
}

return ret;
Expand Down
5 changes: 3 additions & 2 deletions lib/tran.h
Original file line number Diff line number Diff line change
Expand Up @@ -72,10 +72,11 @@ struct transport_ops {
*/
int
tran_parse_version_json(const char *json_str, int *client_max_fdsp,
size_t *client_max_data_xfer_sizep, size_t *pgsizep);
size_t *client_max_data_xfer_sizep, size_t *pgsizep,
bool *reverse_cmd_socket_supportedp);

int
tran_negotiate(vfu_ctx_t *vfu_ctx);
tran_negotiate(vfu_ctx_t *vfu_ctx, int *reverse_cmd_socket_fdp);

#endif /* LIB_VFIO_USER_TRAN_H */

Expand Down
2 changes: 1 addition & 1 deletion lib/tran_pipe.c
Original file line number Diff line number Diff line change
Expand Up @@ -285,7 +285,7 @@ tran_pipe_attach(vfu_ctx_t *vfu_ctx)
tp->in_fd = STDIN_FILENO;
tp->out_fd = STDOUT_FILENO;

ret = tran_negotiate(vfu_ctx);
ret = tran_negotiate(vfu_ctx, NULL);
if (ret < 0) {
ret = errno;
tp->in_fd = -1;
Expand Down
13 changes: 10 additions & 3 deletions lib/tran_sock.c
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@
typedef struct {
int listen_fd;
int conn_fd;
int reverse_cmd_socket_fd;
} tran_sock_t;

int
Expand Down Expand Up @@ -380,6 +381,7 @@ tran_sock_init(vfu_ctx_t *vfu_ctx)

ts->listen_fd = -1;
ts->conn_fd = -1;
ts->reverse_cmd_socket_fd = -1;

if ((ts->listen_fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
ret = errno;
Expand Down Expand Up @@ -464,7 +466,7 @@ tran_sock_attach(vfu_ctx_t *vfu_ctx)
return -1;
}

ret = tran_negotiate(vfu_ctx);
ret = tran_negotiate(vfu_ctx, &ts->reverse_cmd_socket_fd);
if (ret < 0) {
close_safely(&ts->conn_fd);
return -1;
Expand Down Expand Up @@ -615,14 +617,18 @@ tran_sock_send_msg(vfu_ctx_t *vfu_ctx, uint16_t msg_id,
void *recv_data, size_t recv_len)
{
tran_sock_t *ts;
int fd;

assert(vfu_ctx != NULL);
assert(vfu_ctx->tran_data != NULL);

ts = vfu_ctx->tran_data;

return tran_sock_msg(ts->conn_fd, msg_id, cmd, send_data, send_len,
hdr, recv_data, recv_len);
/* NB. use the reverse command socket file descriptor if available. */
fd = ts->reverse_cmd_socket_fd != -1 ? ts->reverse_cmd_socket_fd
: ts->conn_fd;
return tran_sock_msg(fd, msg_id, cmd, send_data, send_len, hdr, recv_data,
recv_len);
}

static void
Expand All @@ -636,6 +642,7 @@ tran_sock_detach(vfu_ctx_t *vfu_ctx)

if (ts != NULL) {
close_safely(&ts->conn_fd);
close_safely(&ts->reverse_cmd_socket_fd);
}
}

Expand Down
2 changes: 1 addition & 1 deletion samples/client.c
Original file line number Diff line number Diff line change
Expand Up @@ -197,7 +197,7 @@ recv_version(int sock, int *server_max_fds, size_t *server_max_data_xfer_size,
}

ret = tran_parse_version_json(json_str, server_max_fds,
server_max_data_xfer_size, pgsize);
server_max_data_xfer_size, pgsize, NULL);

if (ret < 0) {
err(EXIT_FAILURE, "failed to parse server JSON \"%s\"", json_str);
Expand Down
Loading