SegStringSymEntry to encapsulate Strings as single entity

glitch edited this page Jun 1, 2021 · 6 revisions

WARNING:DRAFT

This is one of the implementation approaches outlined in String pdarrays in Symbol Table Design Discussion Notes

Approach details

  • Single Symbol Table (keeps existing in place)
  • Additional SymEntry called SegStringSymEntry which extends GenSymEntry
  • SegStringSymEntry is a composite of the two SymEntry items that previously held the Strings components: offsets and bytes (aka segments and values)

Since SymEntry is really a template that is instantiated for 4 different element types, we already have different types of SymEntries in the Symbol Table. Conceptually, this approach adds to that design by including a fifth, albeit more complex, entry: SegStringSymEntry. It extends GenSymEntry just like SymEntry does and includes a conversion utility for casting a GenSymEntry to this specific type.

Design details

A few common patterns emerged during the design and implementation of this approach.

Construction & Access

You can view SymEntry as a general holder class for a typed array and its domain. The common pattern for creating one is to allocate a domain/array combination, reserve an id/name in the SymTab for use throughout the lifetime of the session, construct the entry, insert it in the table, and return a message to the client containing its id and basic structure dimensions.

The previous Strings approach basically doubled the above pattern: one SymEntry for the offsets array and a separate one for the bytes array (aka segments and values). A SegString object was usually constructed as a temporary object composed of these two SymEntry objects, and it provided the procedures for the different types of operations.

This new implementation follows the same pattern but encapsulates the two separate SymEntry objects in a single entry, so that only one item needs to be tracked in the SymTab. Various factory methods have been added to support the different construction patterns used throughout the server code. These factory methods generally funnel down to a single implementation which creates a SegStringSymEntry, puts it in the SymTab for tracking, and returns a concrete SegString object to be used by the rest of the server code. When a client wishes to interact with the object, the server uses the appropriate factory method to retrieve the entry from the SymTab by name/id and constructs a SegString object to perform the requested operation.

Common ways of generating a SegString object fall under these patterns:

  • from segments[] int, values[] uint(8)
  • from 2 GenSymEntry objects (when the objects already exist in the SymTab as their own entries)
  • from 2 SymEntry objects not tracked in the SymTab
  • retrieved from SymTab by name/id

I funneled all of these access/construction patterns through factory methods which construct or cast a SegStringSymEntry, put it in the table if it's not there already, and return a SegString concrete object.
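The funnel described above can be sketched in a few lines. This is a minimal illustration written in Python rather than the server's own language, and it is not the actual implementation: the class bodies are stand-ins, and the helper names (next_name, _from_entries, seg_string_from_arrays, seg_string_from_name) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


class GenSymEntry:
    """Stand-in for the generic entry type stored in the symbol table."""


@dataclass
class SymEntry(GenSymEntry):
    """Stand-in for a typed array entry: a dtype name plus its data."""
    dtype: str
    data: List[int]


@dataclass
class SegStringSymEntry(GenSymEntry):
    """Composite entry holding both Strings components."""
    offsets: SymEntry  # aka segments
    values: SymEntry   # aka bytes

    @property
    def n_bytes(self) -> int:
        return len(self.values.data)


@dataclass
class SegString:
    """Working object handed to the rest of the server code."""
    name: str
    entry: SegStringSymEntry


class SymTab:
    """Stand-in symbol table: a map from name/id to generic entries."""

    def __init__(self) -> None:
        self.tab: Dict[str, GenSymEntry] = {}
        self._counter = 0

    def next_name(self) -> str:
        # Hypothetical id scheme, used only for this sketch.
        self._counter += 1
        return f"id_{self._counter}"


def _from_entries(offsets: SymEntry, values: SymEntry, st: SymTab) -> SegString:
    """Single funnel: build the composite entry, track it, return a SegString."""
    name = st.next_name()
    entry = SegStringSymEntry(offsets, values)
    st.tab[name] = entry
    return SegString(name, entry)


def seg_string_from_arrays(segments: Iterable[int], values: Iterable[int],
                           st: SymTab) -> SegString:
    """Factory for raw arrays: wrap each one, then use the funnel."""
    return _from_entries(SymEntry("int64", list(segments)),
                         SymEntry("uint8", list(values)), st)


def seg_string_from_name(name: str, st: SymTab) -> SegString:
    """Factory for lookups: fetch the tracked entry and check its type."""
    entry = st.tab[name]
    if not isinstance(entry, SegStringSymEntry):
        raise TypeError(f"{name} is not a SegStringSymEntry")
    return SegString(name, entry)
```

With this shape, two null-terminated strings stored as six bytes occupy one table slot instead of two, and a later lookup by name returns a SegString wrapping the same entry.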

Type Coercion

Right now GenSymEntry is really acting like an interface for things going into the SymTab (i.e. Map<String, GenSymEntry>), and SymEntry is acting as a template for SymEntry:int, SymEntry:real, SymEntry:uint8, etc. As such, there is a casting procedure for going from GenSymEntry -> typed SymEntry for the primitive types. For complex types I had to add another type casting procedure: toSegStringSymEntry(gse: borrowed GenSymEntry). We'll likely need to add one for each new complex object we add to the server. There may be a better way to do this; I wasn't sure. Either way, coercing/casting from general to specific types happens all over the place, so the approach we pick will be used widely.
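One possible shape for a "better way" is a single checked cast that takes the expected type as an argument, with the per-type procedures reduced to thin wrappers. The sketch below is Python, not server code, and only illustrates the idea; the classes are empty stand-ins and to_entry / to_seg_string_sym_entry are hypothetical names. Whether the server's language allows the same pattern as cleanly is left open here.

```python
from typing import Type, TypeVar


class GenSymEntry:
    """Stand-in for the generic symbol table entry."""


class SymEntry(GenSymEntry):
    """Stand-in for a typed array entry."""


class SegStringSymEntry(GenSymEntry):
    """Stand-in for the composite Strings entry."""


T = TypeVar("T", bound=GenSymEntry)


def to_entry(gse: GenSymEntry, expected: Type[T]) -> T:
    """Checked downcast from the generic entry to one specific entry type."""
    if not isinstance(gse, expected):
        raise TypeError(
            f"expected {expected.__name__}, got {type(gse).__name__}")
    return gse


def to_seg_string_sym_entry(gse: GenSymEntry) -> SegStringSymEntry:
    """Thin wrapper, so each new complex type costs one line, not a new cast."""
    return to_entry(gse, SegStringSymEntry)
```

The design choice is that the type check and the error message live in one place, so adding another complex entry type does not require writing another casting procedure from scratch.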

Message passing

Most of the message passing between the client & server falls into one of these patterns:

  • Request an operation be performed on an object identified by an id/name (client -> server), return the result of that operation (client <- server)
  • Create an object with values or using a procedure on the server to generate values, which returns the id/name of the object with basic meta-info.

Two things stood out in the messaging layer:

  1. We don't have a way to pass more than 1 array at a time from the client->server for array construction. This means we need to pass each one separately and then perform some type of assemble operation, i.e. you pass offsets, then pass bytes, then request that the server combine them into a SegStringSymEntry. Expanding this implementation to a Categorical will likely present a similar pattern: we'll have between 2 and 4 parts and need to assemble them in some way. Sending them individually for create isn't all that bad; I'm not sure of the tradeoffs of trying to encode things into a json array.

  2. Passing information back from server->client about the meta-data of the structure we created, the outcome of the operation, etc. For Strings objects we had some sub-structure in our return values, i.e. created id_1 int64 1 (1,) 1+created id_2 uint8 1 (1,) 0 1 etc. (Note: this isn't 100% accurate, I'm just trying to convey the idea.) The client generally splits on + and then looks for positional arguments delimited by whitespace. An issue I ran into here is that I don't need the second component anymore, but I do have an additional meta-data value to include in the first set of attributes. However, SymTab.attrib / GenSymEntry only have a fixed number of values. As things become more complex and higher-dimensional, we'll need a better way to structure/encode this kind of information. For example, I needed to return the number of bytes for my new single entity, and I ended up replacing the second SymEntry position with ...+created bytes.size 1234. An incremental way to improve this would be to use key=value pairs instead of positional args; a more robust approach would be to encode this info as a json string.
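The three reply encodings mentioned in item 2 can be compared side by side. This is a hedged Python sketch, not the actual client or server code: the function names and the field names (name, dtype, size, nbytes) are made up for illustration, and the positional string reuses the approximate example from above.

```python
import json
from typing import Dict, List


def parse_positional(reply: str) -> List[List[str]]:
    """Current style: split on '+', then on whitespace.

    The meaning of each token depends only on its position."""
    return [part.split() for part in reply.split("+")]


def encode_key_value(fields: Dict[str, object]) -> str:
    """Incremental improvement: self-describing key=value tokens."""
    return " ".join(f"{key}={value}" for key, value in fields.items())


def parse_key_value(reply: str) -> Dict[str, str]:
    """Values come back as strings; the caller still has to convert them."""
    return dict(token.split("=", 1) for token in reply.split())


# Hypothetical attribute set for a single Strings entity.
fields = {"name": "id_1", "dtype": "str", "size": 2, "nbytes": 1234}

positional = parse_positional("created id_1 int64 1 (1,) 1+created bytes.size 1234")
key_value = parse_key_value(encode_key_value(fields))
as_json = json.loads(json.dumps(fields))  # keeps ints as ints
```

In this sketch, adding a new attribute to the key=value or json forms does not shift the position of any existing attribute, which is the problem the positional form ran into. The json form additionally preserves value types and can nest, at the cost of a slightly heavier payload.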