How to handle common, non-primitive data types #70

jtc42 · 2020-05-02T18:51:33Z

This is likely going to be an open question for a while, but these are my current thoughts. All input is welcome.

I feel like, by and large, data collected from lab instruments can sensibly be converted to primitive data types. The most common types I have in mind are NumPy arrays and Pandas data frames. Both of these can be represented easily with primitive data types.

There are however cases where data will be collected that cannot be converted to a primitive type.

In the new cbor branch, I've added a section to the JSON encoder that will base64 encode Python bytes objects. I've correspondingly included a Marshmallow Bytes field to handle validating binary data in this format. It populates the documentation with a note that the string values are base64 encoded blocks of binary data. Everything is fine on that front.
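For anyone following along, here is a minimal sketch of the general idea using only the standard library. It is not the code in the cbor branch; the class name `BytesJSONEncoder` and the example payload are made up for illustration.

```python
import base64
import json


class BytesJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialises bytes-like values as base64 strings."""

    def default(self, o):
        # json.JSONEncoder calls default() only for objects it cannot encode itself.
        if isinstance(o, (bytes, bytearray)):
            return base64.b64encode(o).decode("ascii")
        return super().default(o)


# Illustrative payload; the field names are assumptions.
payload = {"name": "spectrum", "data": b"\x00\x01\x02\xff"}

encoded = json.dumps(payload, cls=BytesJSONEncoder)

# A client reverses the process: parse the JSON, then base64 decode the string.
decoded = json.loads(encoded)
restored = base64.b64decode(decoded["data"])
```

The client only ever sees a JSON string, which is why the documentation needs to say that the string holds base64 encoded binary data.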

However, as @rwb27 has mentioned in the past, sometimes the binary data collected will be big enough that the b64 encoding overhead could become problematic. To handle these cases, I've included support for clients to accept application/cbor responses instead of application/json.
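To put a rough number on that overhead: base64 emits 4 output characters for every 3 input bytes, so the payload grows by about a third. A quick check, with an arbitrary 3 MB buffer standing in for instrument data:

```python
import base64


def base64_overhead(n_bytes):
    """Return encoded size divided by raw size for a buffer of n_bytes zeros."""
    raw = bytes(n_bytes)  # the content does not affect the encoded length
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)


# 3,000,000 raw bytes become 4,000,000 base64 characters.
ratio = base64_overhead(3_000_000)
```

This only counts size on the wire. The CPU cost of encoding and decoding comes on top of it and is not measured here.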

CBOR has built in support for binary encoded data, so if a client requests a CBOR response, no encoding overhead is introduced. The data gets passed directly to the CBOR response, otherwise identical to the JSON response, but with the binary section unencoded.
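As a rough illustration of why there is no encoding overhead, below is a hand-rolled sketch of how CBOR frames a byte string (major type 2 in RFC 8949): a short length header followed by the raw bytes, untouched. The function `cbor_byte_string` is purely illustrative; in practice an existing CBOR library (for example `cbor2`) would do this, and this is not the code used in the branch.

```python
import base64
import struct


def cbor_byte_string(data: bytes) -> bytes:
    """Frame data as a definite-length CBOR byte string (major type 2)."""
    n = len(data)
    if n < 24:
        head = bytes([0x40 | n])               # length packed into the initial byte
    elif n < 0x100:
        head = b"\x58" + struct.pack(">B", n)  # 1-byte length follows
    elif n < 0x10000:
        head = b"\x59" + struct.pack(">H", n)  # 2-byte length follows
    elif n < 0x100000000:
        head = b"\x5a" + struct.pack(">I", n)  # 4-byte length follows
    else:
        head = b"\x5b" + struct.pack(">Q", n)  # 8-byte length follows
    return head + data


blob = bytes(range(256)) * 4                     # 1024 bytes of example data
cbor_size = len(cbor_byte_string(blob))          # 3-byte header + 1024 raw bytes
json_size = len(base64.b64encode(blob)) + 2      # base64 text plus the two JSON quotes
```

For this 1024-byte blob the CBOR framing costs 3 bytes, while the base64 JSON string costs 346 extra bytes.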

This solution isn't perfect though. The Thing Description is required to be JSON. This is fine in most cases as it accurately describes the base64 encoded binary blobs. However, it means that the CBOR response will deviate from the Thing Description: the client receives a bytes value where the Description says a string will be returned.
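One way a client could paper over that mismatch is to accept either form and normalise to bytes. This is only a sketch of a possible client-side workaround, and the helper name `as_bytes` is hypothetical:

```python
import base64


def as_bytes(value):
    """Normalise a binary field that may arrive as raw bytes or as base64 text."""
    if isinstance(value, (bytes, bytearray)):
        # CBOR response: the bytes arrive unencoded.
        return bytes(value)
    if isinstance(value, str):
        # JSON response: the Thing Description says this is a base64 string.
        return base64.b64decode(value, validate=True)
    raise TypeError(f"expected bytes or base64 str, got {type(value).__name__}")


from_json = as_bytes("AAEC/w==")
from_cbor = as_bytes(b"\x00\x01\x02\xff")
```

Both calls produce the same bytes, so the rest of the client code does not need to know which content type was negotiated.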

However, I currently feel that cases where large, non-primitive data files are collected at such high frequency that CBOR encoding is required are rare enough that, given proper documentation, this solution could still be fine.

Again, thoughts are welcome.

Note: The CBOR branch is useful even aside from this. CBOR is a much more compact data format than JSON, so in many cases it may be beneficial to communicate over CBOR even without needing to transfer bytes objects. It was easy to add support, and it doesn't affect the JSON functionality at all.

ChasNelson1990 · 2020-05-11T09:58:17Z

I had to look up CBOR but this seems like a good solution.

Are you saying that the only negative (or most significant negative) is the divergence from the W3C Web of Things standard?

If so, have you brought this problem/solution to their forum? Somebody might provide some insight into any thoughts the working group(s?) have had. Also, a quick search says that they're currently rechartering the working group, so now might be a good time to introduce new ideas for their consideration.

jtc42 · 2020-05-11T10:02:23Z

Yeah pretty much, though interestingly the Mozilla implementation actually already specifically describes both CBOR representations and WebSocket protocol bindings, so the newest versions of LabThings are based more heavily on the Mozilla implementation of the W3C standard.

I imagine that if the W3C add new information around these, Mozilla will update their implementation correspondingly. Our spec repo is forked from the Mozilla spec so we can easily make sure we’re synchronised with upstream.

Mozilla have made this much simpler than it would otherwise have been. Very happy!

jtc42 added the discussion label May 2, 2020

jtc42 assigned rwb27, ChasNelson1990 and jtc42 May 2, 2020

jtc42 pinned this issue May 25, 2020
