add send_serialized_message() #75

Open
wants to merge 5 commits into master

Conversation

sl1200mk2
Contributor

Hi,
don't know if this can be useful...

++

@radarsat1
Owner

Hi, I think it could be. Already it was intended to be able to deserialize memory into a lo_message and then use lo_send_message(), but in that case it will re-serialize, so I suppose this bypasses such an unnecessary operation in the case that a serialized message is available. Do you have a specific use case for it?
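
For context, the existing round trip with the current API looks roughly like this (a minimal sketch; the destination, the helper name, and the error handling are placeholders):

#include <stdlib.h>
#include <lo/lo.h>

/* Sketch: sending a message that already exists in serialized (OSC wire) form
 * with today's API. lo_message_deserialise() parses the buffer into a
 * lo_message, and lo_send_message() serializes it again before sending --
 * the extra round trip that a send_serialized_message() entry point skips. */
void send_raw(lo_address dest, void *buf, size_t len)
{
    int result = 0;
    lo_message m = lo_message_deserialise(buf, len, &result);
    if (!m)
        return;                            /* result holds an error code */

    const char *path = (const char *)buf;  /* serialized OSC starts with the path string */
    lo_send_message(dest, path, m);        /* re-serializes internally */
    lo_message_free(m);
}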

@sl1200mk2
Contributor Author

sl1200mk2 commented Mar 28, 2019 via email

@radarsat1
Owner

Ok, interesting. I think it could have some use, but exposing internals is something I'd rather only do if someone really needs it.

If you are sending multiple messages to multiple destinations, have you considered using multicast?

@sl1200mk2
Contributor Author

sl1200mk2 commented Mar 28, 2019 via email

@7890
Contributor

7890 commented Mar 28, 2019

I like the idea too .. thinking out loud
One use case is if the exact same message is sent over and over. It will save a few de/serialize steps. It's more of a theoretical thing since messages are usually small and very quickly serialized. For many iterations or larger messages, it makes a difference but it's unclear if significant.
Eg. if messages are static and have no runtime dependent values, it would be possible to store the complete set of messages in already serialized format.
If liblo had a way to send serialized messages, it would also allow another liblo instance or third-party logic to create binary OSC content (shipping it via a fifo or otherwise), with liblo taking care of the network-related work of sending and receiving the pre-"baked" messages.
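
As a rough illustration of the "pre-baked" idea with today's API (a sketch; the path, arguments, and helper name are made up):

#include <stdlib.h>
#include <lo/lo.h>

/* Sketch: build a static message once and keep its serialized form around.
 * With the current API the pre-baked buffer can only be replayed via a
 * deserialise/send round trip (or dispatched locally); a
 * send_serialized_message() would send it as-is. */
static void *prebake_mute(const char *path, size_t *len)
{
    lo_message m = lo_message_new();
    lo_message_add_string(m, "mute");
    lo_message_add_int32(m, 1);

    /* passing NULL lets liblo allocate the buffer; *len receives its size */
    void *buf = lo_message_serialise(m, path, NULL, len);
    lo_message_free(m);
    return buf;   /* caller free()s the buffer when done */
}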

@radarsat1
Owner

The only reason I haven't accepted this is that something about the inelegance of exposing internals and directly sending "raw" memory was bugging me... after a bit of thought I think I can elaborate, because I realized it speaks to some inefficiencies in liblo that could or should be addressed.

I got to thinking that it would be nicer to be able to construct a lo_message as a simple wrapper for serialized memory, since the memory used by lo_message is already partially serialized -- it only keeps the typestring and argument data separate, but otherwise the data is more or less serial. So when sending, lo_message_serialize() is called and a buffer is allocated just to make a temporary copy of this data in serial form to be passed to send_data.

Then I realized that serialization also, of course, prepends the path. So the only thing preventing us from keeping a pre-serialized copy of a lo_message around is the fact that a lo_message is not associated with a path until it is sent -- something I've found a bit odd, to be honest.

It makes me wonder if we should support the following:

  • Optionally keep a path in the lo_message.
  • Allow lo_message to point to "serialized" memory (only re-allocating if the message is modified)
  • In the case of serialized memory, an ownership flag stopping deallocation of the memory.
  • Allow lo_send_message to accept path=NULL; if the path is non-NULL, it overrides the lo_message path.

That way a user could create a lo_message wrapping his own raw OSC representation. It would allow one-time validation of the raw memory. It would also transparently avoid temporary allocations and serializations when re-sending the same message, so explicitly serializing purely for efficiency (which imho is conflating two semantically distinct operations) would not even be necessary.
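
Purely as a sketch of how those points could look from the user's side (everything here is hypothetical -- lo_message_from_serialised() and LO_MSG_NO_OWN are invented names, not liblo API):

#include <stddef.h>
#include <lo/lo.h>

/* HYPOTHETICAL sketch only: a lo_message that merely wraps caller-owned,
 * already-serialized memory (validated once, path kept inside the message,
 * never re-serialized). Neither the constructor nor the flag exists in liblo. */
void send_wrapped(lo_address dest, void *raw_osc, size_t len)
{
    lo_message m = lo_message_from_serialised(raw_osc, len, LO_MSG_NO_OWN);
    if (!m)
        return;                      /* one-time validation failed */
    lo_send_message(dest, NULL, m);  /* NULL path: use the path stored in m */
    lo_message_free(m);              /* would not free raw_osc (no ownership) */
}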

On the points in @7890's message:

One use case is if the exact same message is sent over and over. It will save a few de/serialize steps. It's more of a theoretical thing since messages are usually small and very quickly serialized. For many iterations or larger messages, it makes a difference but it's unclear if significant.

I think the inefficiency is more in the short-lived temporary allocations that are made. There are in general too many allocations made by liblo during "real time" operation, so if we can reduce them without breaking backwards compatibility it's a win.

Eg. if messages are static and have no runtime dependent values, it would be possible to store the complete set of messages in already serialized format.

I do wonder further if there's anything we could do with regards to handling multiple messages, maybe a new object lo_message_set. It would be similar to lo_bundle of course, but with the semantics of multiple subsequent messages instead of an actual OSC bundle. I don't know if the concept is useful..

If liblo had a way to send serialized messages, it would also allow another liblo instance or third-party logic to create binary OSC content (shipping it via a fifo or otherwise), with liblo taking care of the network-related work of sending and receiving the pre-"baked" messages.

liblo already supports using an external transport, you can serialize to memory, transport it however you want (eg. memory copy to a thread, shared memory to another process, etc), and then on a server, you can dispatch directly from memory using lo_server_dispatch_data.
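
For example, dispatching bytes that arrived over some external transport might look roughly like this (a sketch; the port and handler are placeholders):

#include <stdio.h>
#include <stddef.h>
#include <lo/lo.h>

/* Sketch: liblo as the OSC layer over an external transport. Bytes that
 * arrived by any means (pipe, shared memory, ...) are handed straight to
 * lo_server_dispatch_data(); no socket receive is involved. */
static int catch_all(const char *path, const char *types, lo_arg **argv,
                     int argc, lo_message msg, void *user_data)
{
    printf("dispatched %s with %d argument(s)\n", path, argc);
    return 0;
}

void dispatch_external(void *buf, size_t len)
{
    lo_server s = lo_server_new("9001", NULL);             /* placeholder port */
    lo_server_add_method(s, NULL, NULL, catch_all, NULL);  /* match any path/types */
    lo_server_dispatch_data(s, buf, len);
    lo_server_free(s);
}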

However what you are saying appears to be the opposite? Use external libraries for OSC handling, and use liblo only for networking? In that case what is the use of liblo, why not just use a different networking library?

I suppose liblo provides some nice things on the network layer like stream handling and multicast support, but I don't know if it's sufficient to warrant fully supporting liblo-as-a-networking-library.

What do you think?

I hesitate, since the more "features" liblo supports, the more maintenance burden it is. I don't mind adding features if they are specifically useful to people, but that's why I ask for use cases when these things come up.

Even if something is a "nice to have", I don't want to add lots of maintenance surface unnecessarily.

liblo was always intended as an easy to use library, not with a major focus on efficiency -- that is why I am so intent on keeping backward compatibility.

If I were to break things to improve efficiency, may as well write a new API that focuses on avoiding allocations entirely, etc.

@7890
Contributor

7890 commented Mar 29, 2019

Let me read this a few times to catch every bit.
One thing is already clear: backward compatibility is extremely important.

@radarsat1
Owner

One thing is already clear: backward compatibility is extremely important.

Yes, but my point was more to explain why I give such careful consideration to even very simple proposals such as this PR. It's not just about backwards compatibility, but maintenance burden.

I might just merge something simple like this because it seems useful, and then later someone comes along and says, "look, now liblo is vulnerable because you can send unvalidated memory, and everything would be fixed if you'd just add a buffer length argument..." I'm not saying this PR specifically exposes a vulnerability (I think it doesn't), but my point is that, due to the focus on backwards compatibility and the general need to keep supporting things in the future, API decisions have to be made very carefully, and the less "surface" liblo has, the better.

So, feel free to ignore those comments :) but I'm just justifying why I'm turning this simple PR into what may seem to be an overly complicated discussion.

@sl1200mk2
Contributor Author

sl1200mk2 commented Mar 29, 2019 via email

@7890
Contributor

7890 commented Mar 29, 2019

@radarsat1 I absolutely agree, this has to be considered. Adding a new feature is one thing, maintaining it and foreseeing possible side-effects is another. This care is essential for a robust established library! I see absolutely no hurry for changes, it's still in the alpha phase where every thought is allowed.

The example of using liblo "just" for networking was not a good one. But it can be done: the server_thread handler alone is very convenient, there is SLIP, TCP, ... I think the point I wanted to make is that the separation of OSC as a content format from OSC as a network ~ "protocol" would be more complete with an entry point such as send_serialized_message().

As for real-time, the library is by design not rt-safe, from what I understand (everything would have to be pre-allocated etc., which simply doesn't make sense for liblo). There is an implementation, "rtosc", which is only a content format without the networking part and is specifically meant for rt. That said, I have used liblo in rt-sensitive contexts without any issue. It's only in theory that allocating a few bytes would cause a problem. I know this is not 100% correct, but testing is believing.

I have to understand better and think more about the proposals in your second-to-last post; that's why I was "short" and only addressed the compatibility topic (it was the simplest to grasp).

@7890
Contributor

7890 commented Mar 29, 2019

@radarsat1 reading again now, I get a better gist of your ideas.

What you outlined matches my understanding of liblo as it is: adding values happens in sequence (and can thus be serialized), and already-set values, e.g. strings, cannot be changed later because of their variable length. Only when the message is prepared for sending over a transport, or serialized, are the path and typetag string prepended to the value buffer.

It makes me wonder if we should support the following:

Optionally keep a path in the lo_message.
Allow lo_message to point to "serialized" memory (only re-allocating if the message is modified)
In the case of serialized memory, an ownership flag stopping deallocation of the memory.
Allow lo_send_message to accept path=NULL; if the path is non-NULL, it overrides the lo_message path.

That way a user could create a lo_message wrapping his own raw OSC representation. It would allow one-time validation of the raw memory. It would also transparently avoid temporary allocations and serializations when re-sending the same message, so explicitly serializing purely for efficiency (which imho is conflating two semantically distinct operations) would not even be necessary.

Wouldn't this involve relatively deep changes to the lo_message type?
However the outlined impact sounds very good to me.

I think the inefficiency is more in the short-lived temporary allocations that are made. There are in general too many allocations made by liblo during "real time" operation, so if we can reduce them without breaking backwards compatibility it's a win.

+1 for reducing them where it amounts to a logical no-op. I have to underline that the efficiency of liblo has never been a problem (for me at least).

I do wonder further if there's anything we could do with regards to handling multiple messages, maybe a new object lo_message_set. It would be similar to lo_bundle of course, but with the semantics of multiple subsequent messages instead of an actual OSC bundle. I don't know if the concept is useful..

The case I was thinking of is a program that has, say, 10 messages it can send (static, serialized ones). It would not send them all at once, but take the one needed from an array, or even from a lo_message_set. It was a totally made-up example to sell the "-" for oscdump :)
The case you refer to is sending a sequence of messages in one go, possibly with timestamps, without packing them into a bundle -- is that correct? I haven't thought about that; it would fit with the tee / pipe schemes, I suppose.

It started with oscdump, and the process of discussing tangents, sharing ideas and implications, is enjoyable for me. Please be assured I didn't want to hijack this PR, guys :)

@7890
Contributor

7890 commented Mar 29, 2019

Wait. The "-" was in the other PR. The pre-baked message set was a made-up example in this thread; I was trying to make a case for send_serialized_message(). I'm innocent.

@radarsat1
Owner

That's okay, the message set thing just occurred to me as I was writing, because both PRs are discussing serialization of messages, so my mind naturally went towards "what if there are two messages.." However, it's a topic for another issue, for sure, and in fact I don't think it's necessary after some thought.

Wouldn't this involve relatively deep changes to the lo_message type?

Just a bit of memory handling and checking some flags for ownership states. The only major change would be adding an optional path to lo_message, and indeed considering whether it is a weird idea to have lo_message "maybe" associated with a path but "not always".

Unfortunately I can't think of a way to handle keeping serialized memory around without also adding a "path" field to lo_message. But it would be great to avoid multiple allocations and copies every time one does lo_send_message on the same message.

I suppose a work-around could be to keep the path data around after the first lo_message_serialize, but not make it a first-class "field" (no accessor), and always just check internally that it matches whatever is provided to lo_send_message. Then re-allocation would still happen if you call lo_send_message on the same message with different paths, which is still kind of ugly. A user could use lo_message_clone for efficiency in that case.

Paths would have to be checked but that can be quite efficient** -- at least as efficient as a memory copy anyways, with the advantage of no allocation.

** While we are going on tangents: if I were blue-skying this I would create a lo_path structure instead of using raw character strings, and include a checksum for comparisons -- I've long thought that would be useful for message dispatch too.
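
Purely to illustrate that tangent (nothing below exists in liblo; the struct and helper names are invented):

#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL sketch of the blue-sky lo_path idea: keep a precomputed
 * checksum next to the path string so most comparisons during dispatch can
 * be rejected with one integer compare before falling back to strcmp(). */
typedef struct {
    const char *str;
    uint32_t hash;
} lo_path_t;                           /* invented name, not a liblo type */

static uint32_t path_hash(const char *s)
{
    uint32_t h = 5381;                 /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static int path_equal(const lo_path_t *a, const lo_path_t *b)
{
    return a->hash == b->hash && strcmp(a->str, b->str) == 0;
}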

@radarsat1 radarsat1 force-pushed the master branch 2 times, most recently from e60a7d8 to 5668467 on March 29, 2019 16:30
@7890
Contributor

7890 commented Mar 29, 2019

Taking a step back, my feeling is that this could become too complicated for possibly little benefit if it were to be integrated more deeply into the library.

There were good ideas to keep somewhere:

oscsend - <multiple messages> | oscdump -

oscsend - <multiple, timed messages> | oscdump - | oscsend osc.udp://destination -

Such prototypes are good for shaping ideas. The multiple-message / stream thing is something that might be worth developing further. It's not unlike SLIP ~

@radarsat1 radarsat1 force-pushed the master branch 3 times, most recently from c918cfa to 735a872 on March 29, 2019 16:49
@7890
Contributor

7890 commented Mar 29, 2019

@radarsat1 Checksum of path to compare? Interesting idea.
Or using a common dictionary to send "unstyled" messages, like with CSS. The decoration can be done on the receiving side, using a dictionary: e.g. the sender would just send /42 and the receiver knows this is actually mapped to /channel/4/mute. OSC is so flexible. It can be verbose or dense, depending on the use case.

Perhaps a place to put/keep ideas and wishes (it costs almost nothing) would be good; I have tons that would render a next version of liblo incompatible :) Joking aside, some things could go there that are not filed as an issue/bug or PR. What are your thoughts?

@radarsat1
Owner

I just thought of an unintended consequence of changing the message's memory during lo_message_serialize (to keep the serialized version): it would no longer be thread-safe, i.e., @sl1200mk2's solution would require a mutex:

I'll try to register several thread to send to several destination, so this PR might be unnecessary.

That said liblo is generally not so thread safe. I'm not sure that data sending is thread-safe even now, since local buffers are allocated. Something to keep in mind.

@radarsat1
Owner

@radarsat1 Checksum of path to compare? Interesting idea.

Yes, apart from pattern support, it should be possible. As it is all handlers are checked one byte at a time during dispatch. It's something I've had in mind for a long time but never got around to implementing.

Or using a common dictionary to send "unstyled" messages, like with CSS. The decoration can be done on the receiving side, using a dictionary: e.g. the sender would just send /42 and the receiver knows this is actually mapped to /channel/4/mute. OSC is so flexible. It can be verbose or dense, depending on the use case.

There are a few ideas similar to this on the OSC mailing list and even in NIME papers, I think.

A personal idea I had along these lines, again something I never implemented, was to do something similar at the transport level instead of at the application/OSC layer. Basically the game is to replace long and repeated parts of the data with shorter versions, which, if done incrementally, gives you exactly the LZW algorithm, so I always thought piping a TCP stream through an online compression algorithm would be a great idea. Downside: it does not work for packetized data, and it requires transmitting an index. Anyway, the hash thing is more of an implementation detail and transparent to the user.

Perhaps a place to put/keep ideas and wishes (it costs almost nothing) would be good; I have tons that would render a next version of liblo incompatible :) Joking aside, some things could go there that are not filed as an issue/bug or PR. What are your thoughts?

It seems the Open Sound Control mailing list isn't working any more, unfortunately; it's been a long time since I checked in on it. At least the link on opensoundcontrol.org is down, and the last posts on the forum there are 7 years old.

There is the liblo mailing list, but I'm fine with discussing ideas in issues/PRs too. I do think more communication between authors of different OSC implementations would be useful, and I don't know where that should take place right now.

I have tons that would render a next version of liblo incompatible

Indeed, a breaking major version is not out of the question! I also have lots of ideas, but ideas are cheap and implementations are more work ;)

In general, if I were to break liblo I would get rid of all the "with" and "from" functions that have grown over the years and reduce it to a more common interface, backed by user-supplied structs for configuration. I would also design it specifically for easy binding to other languages. It's never really felt worth it, though, as liblo works fine as is, is fairly easy to use and to bind, and I'm actually quite happy with how few issues come up these days. Anyway, I am also afraid of Open Sound Control going out of favour for something newer after having done a bunch of work on it.

@radarsat1 radarsat1 force-pushed the master branch 3 times, most recently from f631bd5 to 5d6f13c on March 29, 2019 17:41
@7890
Contributor

7890 commented Mar 30, 2019

These design decisions made earlier (years ago..) are part of the success of this non-changing format. I can see how compressed OSC could be of use; then again, it would have made things slightly more complicated (if only to handle whether a given stream is compressed or not compressed), and some implementations would support it, others not. Today, from what I see, the main difference between implementations is which argument types are supported, and the minimal set is practically always there. So that's good.
As for the mailing list, indeed there hasn't been a message for a long time!
It could be worth trying to save these mails to a commonly accessible place to preserve some of the information in them. I've done this for the JACK mailing list, creating static HTML pages from raw mail files and putting them into a repo with GitHub Pages.
Related: I created an organization for OSC some time ago and invited a few people I thought could fit in. It's basically inactive. The idea was to collect OSC documents, discuss related topics, etc. I've just added you and made you an owner with full control.
I've collected the specs and put them here https://github.com/7890/osc_spec, also in a transcribed asciidoc format. Maybe we can slowly revive some of the OSC community and make osc_spec a repo of the organization, to maintain the spec and collect ideas there. Nothing well-thought-through yet.

@sl1200mk2
Contributor Author

sl1200mk2 commented Mar 30, 2019 via email

@radarsat1
Owner

Thanks for the response! Just want to clarify a few things..

These design decisions made earlier (years ago..) are part of the success of this non-changing format. I can see how compressed OSC could be of use; then again, it would have made things slightly more complicated (if only to handle whether a given stream is compressed or not compressed)

Just to be clear, I was not suggesting adding compression or otherwise changing the OSC protocol. I was talking only about transport-layer changes, adding some more possibilities for layers between TCP and OSC. Right now liblo supports TCP streams packetized by SLIP or length-prefix methods -- the packetization requires some kind of metadata, even if that's just "end of packet" tokens, so how to do that must be agreed on between implementations.
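
For reference, the SLIP framing itself is tiny; here is a sketch following RFC 1055 (illustrative only, not liblo's internal code):

#include <stddef.h>

/* SLIP framing per RFC 1055: each packet ends with END (0xC0), and END/ESC
 * bytes inside the payload are escaped. This is exactly the kind of
 * "end of packet" metadata that stream transports have to agree on. */
enum { SLIP_END = 0xC0, SLIP_ESC = 0xDB, SLIP_ESC_END = 0xDC, SLIP_ESC_ESC = 0xDD };

size_t slip_encode(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;                      /* out must hold at least 2*len + 1 bytes */
    for (size_t i = 0; i < len; i++) {
        if (in[i] == SLIP_END) {
            out[o++] = SLIP_ESC; out[o++] = SLIP_ESC_END;
        } else if (in[i] == SLIP_ESC) {
            out[o++] = SLIP_ESC; out[o++] = SLIP_ESC_ESC;
        } else {
            out[o++] = in[i];
        }
    }
    out[o++] = SLIP_END;               /* frame delimiter */
    return o;
}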

A problem with supporting even two, of course, is detecting which one is used at the beginning of the stream. Liblo does this in server.c / function detect_slip, which is rather heuristic. Adding more protocols would make this more difficult. So I am just brainstorming here. I think there could be some use case for a compression layer over TCP, similar to how HTTP transparently supports gzip. At least it would be more transparent than changing the OSC protocol itself to support compression (via "path aliases") as people have proposed in the past.

Secondly I thought it could be an opportunity to add websocket support directly in liblo, which actually wouldn't be too hard I think. WebSocket is its own transport layer packet protocol over TCP, so you'd get TCP -> WebSocket packets -> reconstruct stream from WS packets -> re-packetize OSC (delimited how.. SLIP or length-prefix again!) Anyways, no idea if this would be interesting for people but it would make adding web interfaces to OSC programs a breeze.

some implementations would support it, others not.

Yes, of course that's true. UDP is the only universally supported transport for OSC as far as I know. Many implementations do not even support basic TCP streams. Already liblo supports two types of streams (SLIP and length-prefix) which are not universally supported. On the libmapper project we make use of the TCP protocol, it is not without use even if not universally supported.

I've collected the specs and put them here https://github.com/7890/osc_spec

Very nice initiative, thank you, I'll star it.

the path part of message is hardcoded in a buffer at the very early stage of the program and never change.

Yup I think this is very common, which is why I can see it actually being useful to embed this in the lo_message object. Have to give it a try and see if it seems an efficient thing to do..

I could avoid lo_message creation + fill, and if I alloc enough room to the buffer

I see two interpretations here. Either you mean you are serializing to a buffer and never touching it again, which is what I think could be eventually encapsulated by lo_message, OR you mean you are building your serialized buffer yourself without any lo_message_add_* functions? In the latter case I am confused as you are not using liblo much at all in that case..

It only makes sense to "freeze" the backing buffer on lo_message_serialize, after having built the message, because every time you add a new argument with e.g. lo_message_add_int32, it must append a character to the end of the typestring as well as some data to the end of the message. So it cannot leave the message "in place"; it would have to shift the entire buffer if it were trying to operate on a pre-serialized buffer. In other words, some dynamic allocation is simply required while building messages, to avoid shifting everything constantly. Therefore, lo_message maintains separate buffers for the type string and the data, and serializes them in one go in lo_send_message via lo_message_serialise.
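
To make the layout concrete, this is what a small message looks like once serialized (standard OSC 1.0 encoding; the path and argument values are arbitrary):

/* Serialised form of a message to "/my/path" with arguments (int32 7, float 1.5).
 * Each field is null-padded to a multiple of 4 bytes, and the path and typetag
 * sit in front of the argument data -- which is why they can only be prepended
 * at serialise time. */
static const unsigned char example_osc[24] = {
    '/','m','y','/','p','a','t','h', 0, 0, 0, 0,   /* path, padded to 12 bytes    */
    ',','i','f', 0,                                 /* typetag ",if", 4 bytes      */
    0x00, 0x00, 0x00, 0x07,                         /* int32 7   (big-endian)      */
    0x3f, 0xc0, 0x00, 0x00                          /* float 1.5 (big-endian IEEE) */
};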

But I take the point, mainly that you want to do lo_message_serialise only once, instead of every time lo_send_message is called, which totally makes sense and I think could be encapsulated directly in lo_message. We could also provide a lo_message_freeze() function that would optionally allow you to provide the backing buffer without lo_message taking ownership; then you could do your pre-allocation thing.

I would have to work on it to see about feasibility but it seems possible.

Basically your program would be something like,

lo_message m = lo_message_new();
lo_message_add_...(m, ...);
lo_message_add_...(m, ...);
lo_message_add_...(m, ...);
char mymem[2048];
lo_message_freeze(m, "/my/path", mymem, 2048); // keep internal pointer to data, but do not own it
 //.. later..
lo_send_message(s, "/my/path", m); // does not call lo_message_serialise() as long as path matches
// message automatically "unfrozen" if path changes? or error?

or alternatively,

lo_message m = lo_message_new();
lo_message_add_...(m, ...);
lo_message_add_...(m, ...);
lo_message_add_...(m, ...);
// no prior call to lo_message_freeze().

lo_send_message(s, "/my/path", m); // calls lo_message_freeze(m, "/my/path", 0, 0)
// lo_message_freeze calls lo_message_serialise but only first time
// which allocates backing memory for serialisation
// lo_message owns the data and free()s it when destroyed
// automatically "unfreeze" if lo_send_message() called with different path

I believe this covers the same reasons to use send_serialized_data(), but without unnecessarily exposing raw serialized representation. (Of course we could provide functions to retrieve the backing memory, the "frozen" path, etc., if those seem useful.. although it seems safer to leave them opaque)

@7890
Contributor

7890 commented Mar 31, 2019

Right now liblo supports TCP streams packetized by SLIP or length-prefix methods -- the packetization requires some kind of metadata, even if that's just "end of packet" tokens, so how to do that must be agreed on between implementations.

I think I understand better now. A compressed transport would be "on top of" TCP, and how the encoding and decoding are done is an agreement between sender and receiver. Let's say liblo invented a compressed format and offered a reference implementation; it would then be possible for other libraries to adapt. With a compressed format, the number of bytes transferred over the network would be smaller. I wondered how the cost of the additional encoding/decoding would relate to the saved bytes. I guess there is some trade-off that depends entirely on the use case (message sizes, how many compressible strings, processing power of sender/receiver, ...). Overall I like the idea!

Secondly I thought it could be an opportunity to add websocket support directly in liblo, which actually wouldn't be too hard I think. WebSocket is its own transport layer packet protocol over TCP, so you'd get TCP -> WebSocket packets -> reconstruct stream from WS packets -> re-packetize OSC (delimited how.. SLIP or length-prefix again!) Anyways, no idea if this would be interesting for people but it would make adding web interfaces to OSC programs a breeze.

This would be awesome and super hot. It's not yet clear to me whether it can be done without making liblo an HTTP server; I guess there is a cut somewhere and liblo would "just" care about feeding/reading from a socket, basically? Sketching a full-scale scenario would be helpful: the browser setting up the websocket, the HTTP counterpart, and where liblo would jump in. From what I understand now, there is another piece involved in setting up the whole "graph" for sending back and forth from/to the browser. I think the process of setting up a secure WS connection from the browser (this includes a combination of special headers, possibly cookies, and all that stuff) is something that can happen outside of liblo. For authentication alone, the webserver part might do a db lookup and a lot of other things before handing over to the WS protocol. Thinking about it, a small diagram would help to identify the involved parts and the information flow. In my imagination, any other process can take care of things up to the point where we have a bi-directional socket from liblo to the browser; from there on, liblo takes control until it eventually gives it back to an HTTP server that handles whatever happens after the WS is closed. I'm not sure if this makes any sense..

Interesting points about "freeze". I don't have anything meaningful to say yet; I have to read it again and think more. It seems like you have made progress in evaluating how it could be done -- that's fantastic to see!

@7890
Contributor

7890 commented Mar 31, 2019

A short addition: some browsers only allow websocket connections if they are secure (https), and possibly impose other restrictions (such as same-origin or similar). I've seen this in mobile Safari.

Addition 2: The fetch() API also seems like a perfect match for receiving and parsing a binary data stream (without WS).

@radarsat1
Owner

Your points about https and the complications it entails are taken; it's a very good point and probably a show-stopper for me implementing it myself -- quite beyond the scope of liblo. liblo already provides ways to access serialized/deserialized data, so perhaps a standard method for providing transport "glue" (e.g. a callback that allows passing stream data on to another library), along with some examples of usage with an existing HTTP/websockets library, would be a more realistic approach.

@7890
Contributor

7890 commented Apr 10, 2019

I like the callback approach! At some point we should make the scenarios visible with simple diagrams (I'm sure I'm not the only graphically-thinking person).
