ENH: Add Filestore backend #25

Merged (17 commits) on May 28, 2024

Contributor

@tangkong tangkong commented May 10, 2024

Description

Still some work to be done, but I think I should open this up to review before I go too off the rails.

  • Adds Filestore Backend
  • Adds Root to the model definitions for holding top-level entries
  • Adds UUID to potentially lazy fields.

Motivation and Context

Much of this is inspired by lessons learned from happi; forgive any vestigial bits that have made their way in.

What's going on here?

Internally, the FilestoreBackend holds one data structure in addition to the Root:

  • an entry cache, mapping uuid to lazy-ified Entry

The entry cache is used to quickly grab an Entry or search without recursing through the tree (avoiding double hits). With each CRUD operation, the backend uses this cache to reconstruct the Root object, filling in uuid references where they exist.

In the future, I expect us to keep a sort of uuid-link-map, mapping a uuid to other uuids that reference it.
When other backends delete an entry, we will need this information to make sure we dereference the other entries properly. However, since the FilestoreBackend reconstructs the entire root with every modification, it's not strictly needed here.
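
A minimal sketch of the cache-and-rebuild idea described above, not the PR's actual code. The `Entry` and `Root` classes here are trimmed, hypothetical stand-ins, and `build_cache` and `fill` are illustrative names. The sketch assumes "lazy-ified" means that child objects are replaced by their uuids.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union
from uuid import UUID, uuid4


@dataclass
class Entry:
    # Hypothetical, trimmed stand-in for the model's Entry
    description: str = ""
    uuid: UUID = field(default_factory=uuid4)
    children: List[Union["Entry", UUID]] = field(default_factory=list)


@dataclass
class Root:
    # Holds only the top-level entries
    entries: List[Entry] = field(default_factory=list)


def build_cache(root: Root) -> Dict[UUID, Entry]:
    """Flatten the tree into uuid -> lazy Entry (children stored as uuids)."""
    cache: Dict[UUID, Entry] = {}

    def visit(entry: Entry) -> None:
        # Cache nested entries first, then a lazy copy of this entry
        for child in entry.children:
            if isinstance(child, Entry):
                visit(child)
        cache[entry.uuid] = Entry(
            description=entry.description,
            uuid=entry.uuid,
            children=[c.uuid if isinstance(c, Entry) else c for c in entry.children],
        )

    for top in root.entries:
        visit(top)
    return cache


def fill(uuid: UUID, cache: Dict[UUID, Entry]) -> Entry:
    """Rebuild a full Entry by resolving uuid references through the cache."""
    lazy = cache[uuid]
    return Entry(
        description=lazy.description,
        uuid=lazy.uuid,
        children=[fill(child_uuid, cache) for child_uuid in lazy.children],
    )
```

A lookup by uuid is then a single dictionary access, and the Root can be regenerated by calling `fill` on each top-level uuid.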

Why did you add a Root object?

In any backend context, we need some way to know what Entrys are at the top level of the tree structure. While in the Filestore backend case we might store the Entrys in a "filled" state (with full objects as children recursively), if we ever want to flatten this structure we will need to keep track of the top-level items.

A "Root" object (holding only a list of Entrys) seemed like the simplest way to do this.

Why did @as_tagged_union come back?

It may not be strictly necessary (we could specify each of the Entry subclasses in Root), but I think it helps the readability of the serialized output. There are also issues with Value -> Setpoint/Readback: the nested tagged union has been causing problems with serialization round trips.
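
To illustrate what tagging buys in the serialized output, here is a hand-rolled sketch. It does not use apischema; the `TAGS` registry, the two trimmed dataclasses, and the single-key wrapper layout are all assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Type


@dataclass
class Parameter:
    # Hypothetical, trimmed model class
    pv_name: str
    read_only: bool = False


@dataclass
class Setpoint:
    # Hypothetical, trimmed model class
    pv_name: str
    data: Optional[float] = None


# Hypothetical registry mapping the tag (class name) back to the class
TAGS: Dict[str, Type[Any]] = {cls.__name__: cls for cls in (Parameter, Setpoint)}


def serialize(obj: Any) -> Dict[str, Any]:
    """Wrap the fields in a single-key dict named after the class."""
    return {type(obj).__name__: asdict(obj)}


def deserialize(blob: Dict[str, Any]) -> Any:
    """Use the single key as the tag that selects the class to rebuild."""
    (tag, fields), = blob.items()
    return TAGS[tag](**fields)
```

Because the class name travels with the data, the round trip does not need to guess the type from whichever fields happen to be present.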

UUIDs are back

Admittedly we could probably run through the entire filestore database every time we searched for an Entry, but I used this as a potential example of how we might implement entry caching, lazy loading, and tree reconstruction. The rough idea for the filestore backend side is to regenerate and save the json db with every action.

How Has This Been Tested?

Added some unit tests, more to come.

Where Has This Been Documented?

This PR

Pre-merge checklist

  • Code works interactively
  • Code follows the style guide
  • Code contains descriptive docstrings, including context and API
  • New/changed functions and methods are covered in the test suite where possible
  • Test suite passes locally
  • Test suite passes on GitHub Actions
  • Ran docs/pre-release-notes.sh and created a pre-release documentation page
  • Pre-release docs include context, functional descriptions, and contributors as appropriate

@tangkong tangkong linked an issue May 11, 2024 that may be closed by this pull request
@tangkong
Contributor Author

tangkong commented May 13, 2024

I'll attempt to describe my current sticking point so I can come back to it when I have more time. Our schema has everything inheriting from Entry. A subclass Value is then subclassed again into Setpoint and Readback, with the current intent to use all three. Currently this creates issues where apischema cannot distinguish what type is desired.

  • If we hint Value, but supply a Setpoint or Readback, we will lose all subclass information since we cast the object to a Value
  • If we hint Setpoint or Readback, we can no longer use Value, since it is not technically a Setpoint or Readback (a square is a rectangle but a rectangle is not a square)
  • If we hint all three, apischema short-circuits at the first applicable hint.

I think this behavior all makes sense, given the ambiguity of these classes. I see two main options:

  • Stop using Value, and let it act only as an abstract base class of sorts. Then hint only Setpoint/Readback
  • Make Setpoint/Readback not inherit from Value, and duplicate the fields. Then hint all 3. (I favor this solution)

I tried looking around for apischema-voodoo that would let us use a tagged_union of a tagged_union, but didn't find anything super simple.
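
The short-circuit problem can be reproduced with a deliberately simplified resolver. This is not how apischema is implemented; `first_match` is a hypothetical helper, and the classes are trimmed so that a Setpoint payload is indistinguishable from a Value payload.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Sequence, Type


@dataclass
class Value:
    # Hypothetical, trimmed model class
    pv_name: str
    data: Optional[float] = None


@dataclass
class Setpoint(Value):
    pass  # no extra fields, so its payload looks exactly like a Value's


def first_match(blob: Dict[str, Any], hints: Sequence[Type[Any]]) -> Any:
    """Return an instance of the first hinted class whose fields cover the payload."""
    for cls in hints:
        names = {f.name for f in fields(cls)}
        if set(blob) <= names:
            return cls(**blob)
    raise ValueError("no hinted type accepts this payload")
```

With hints ordered `[Value, Setpoint]`, every payload comes back as a plain `Value`; reversing the order makes every payload a `Setpoint`. Either way one of the types can never be recovered, which is the ambiguity described above.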

@tangkong
Contributor Author

This has gone on long enough. I'll request reviews and give it another once-over tomorrow with fresher eyes.

@tangkong tangkong marked this pull request as ready for review May 15, 2024 23:58
@tangkong tangkong changed the title ENH/WIP: Add Filestore backend ENH: Add Filestore backend May 15, 2024
Member

@ZLLentz ZLLentz left a comment

Partial review, I'm out on Friday but I can continue next week if needed

Comment on lines +25 to +28
path = os.path.expanduser(path)
if not os.path.isabs(path):
    return os.path.abspath(os.path.join(basedir, path))
return path
Member

This utils function is a bit weird to me. In a lot of cases it's not necessary to provide a basedir to figure out a full absolute path, and in many cases this input arg is ignored entirely. Maybe it has specific utility in suggesting directories to save in?
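
To make the reviewer's point concrete, here is the quoted logic wrapped in a function. The name `resolve_path` and the argument order are assumptions, since the hunk above does not show the signature.

```python
import os


def resolve_path(basedir: str, path: str) -> str:
    """Expand '~' and anchor relative paths at basedir (hypothetical wrapper)."""
    path = os.path.expanduser(path)
    if not os.path.isabs(path):
        # Only relative paths ever consult basedir
        return os.path.abspath(os.path.join(basedir, path))
    # Absolute paths are returned as-is, so basedir is ignored entirely
    return path
```

Calling it with an already-absolute path returns that path unchanged no matter what `basedir` is, which is the case the review comment describes as the argument being ignored.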

superscore/backends/filestore.py (outdated, resolved)
superscore/errors.py (outdated, resolved)


@pytest.fixture(scope='function')
def sample_database() -> Root:
Member

Is this at odds with the linac_backend() fixture, or do they serve different purposes?

Contributor Author

I think they can coexist. Having multiple test backends can't hurt right?

Member

If they both exist as in-memory databases for testing and if they disagree on how to set up such a database, it's possible that we can get confused later.

Contributor Author

@tangkong tangkong May 21, 2024

This is fair. I can get behind unifying how we set them up, rather than avoiding having multiple test backends.

Collaborator

I support having multiple as long as the interface is consistent.

Contributor Author

captured some thoughts in: #33

@tangkong
Contributor Author

An example serialized file if we remove @as_tagged_union and just hint all the types of Entry. Note that the class name is lost:

{
  "meta_id": "a28cd77d-cc92-46cc-90cb-758f0f36f041",
  "entries": [
    {
      "uuid": "dfce8fb3-0a13-4fb3-b34e-aa7752decf26",
      "description": "parameter 1 in root",
      "creation_time": "2024-05-16T23:52:28.879185+00:00",
      "pv_name": "MY:MOTOR:mtr1.ACCL",
      "abs_tolerance": null,
      "rel_tolerance": null,
      "readback": null,
      "read_only": false
    },
    {
      "uuid": "0c566d5a-5ec8-4e7c-9d65-625a0d44185e",
      "description": "",
      "creation_time": "2024-05-16T23:52:28.879216+00:00",
      "pv_name": "MY:MOTOR:mtr1.ACCL",
      "data": 2,
      "status": 18,
      "severity": 4,
      "readback": null
    },
    {
      "uuid": "24fc4d5d-545f-4b54-9a7d-233866f0cb0d",
      "description": "collection 1 defining some motor fields",
      "creation_time": "2024-05-16T23:52:28.879236+00:00",
      "title": "collection 1",
      "children": [
        {
          "uuid": "067e4eb3-b745-49b9-97ba-669c17b7a1a3",
          "description": "motor field ACCL",
          "creation_time": "2024-05-16T23:52:28.879277+00:00",
          "pv_name": "MY:PREFIX:mtr1.ACCL",
          "abs_tolerance": null,
          "rel_tolerance": null,
          "readback": null,
          "read_only": false
        },
        {
          "uuid": "c1ea420c-735f-4299-811c-4726863df69c",
          "description": "motor field VELO",
          "creation_time": "2024-05-16T23:52:28.879314+00:00",
          "pv_name": "MY:PREFIX:mtr1.VELO",
          "abs_tolerance": null,
          "rel_tolerance": null,
          "readback": null,
          "read_only": false
        },
        {
          "uuid": "68581b2c-a69e-4695-94ef-2e6de2c81605",
          "description": "motor field PREC",
          "creation_time": "2024-05-16T23:52:28.879350+00:00",
          "pv_name": "MY:PREFIX:mtr1.PREC",
          "abs_tolerance": null,
          "rel_tolerance": null,
          "readback": null,
          "read_only": false
        }
      ],
      "tags": []
    },
    {
      "uuid": "32e5d77c-350e-4842-807a-5e281f1a8023",
      "description": "Snapshot 1 created from collection 1",
      "creation_time": "2024-05-16T23:52:28.879255+00:00",
      "title": "snapshot 1",
      "origin_collection": null,
      "children": [
        {
          "uuid": "f71c5363-a7e0-479e-b748-f034eb9f75f4",
          "description": "",
          "creation_time": "2024-05-16T23:52:28.879295+00:00",
          "pv_name": "MY:PREFIX:mtr1.ACCL",
          "data": 2,
          "status": 18,
          "severity": 4,
          "readback": null
        },
        {
          "uuid": "74405890-5fbb-4c44-916a-1ffc728d8e70",
          "description": "",
          "creation_time": "2024-05-16T23:52:28.879331+00:00",
          "pv_name": "MY:PREFIX:mtr1.VELO",
          "data": 2,
          "status": 18,
          "severity": 4,
          "readback": null
        },
        {
          "uuid": "0fec8587-6cc7-41e8-823f-11b43f3ed3d9",
          "description": "",
          "creation_time": "2024-05-16T23:52:28.879367+00:00",
          "pv_name": "MY:PREFIX:mtr1.PREC",
          "data": 6,
          "status": 18,
          "severity": 4,
          "readback": null
        }
      ],
      "tags": [],
      "meta_pvs": []
    }
  ]
}

Member

@ZLLentz ZLLentz left a comment

Some more thoughts from EOD reviewing

superscore/model.py (resolved)
    data: Optional[AnyEpicsType] = None
    status: Status = Status.UDF
    severity: Severity = Severity.INVALID


@dataclass
-class Setpoint(Value):
+class Setpoint(Entry):
Member

What is Value used for after this PR?

Contributor Author

To me there are Values that are not Setpoint-Readback pairs and I wanted to leave that option open. If people are ok with having readback=None for that type of Value, we can remove Value

Collaborator

I prefer having Setpoint and Readback inherit from Value, but I know that that's causing issues with your serialization. If we can't resolve that issue, then I'm fine with removing Value.

Contributor Author

I'll remove Value then. I think it's unfortunate that I can't get the tagged-unions-of-tagged-unions to work, but from a usability standpoint I don't think there's much of a difference. It's not the most DRY code in the world but I'd argue there are more offensive things out there...



from superscore.type_hints import AnyEpicsType
from superscore.utils import utcnow

logger = logging.getLogger(__name__)
_root_uuid = _root_uuid = UUID("a28cd77d-cc92-46cc-90cb-758f0f36f041")
Member

What's the idea behind this root uuid? It looks like it is static and is placed into every file exported by the filestore backend?

Contributor Author

It is static. The concept behind Root is to have a single entry in the database with this UUID. This entry holds the top level Entrys, for constructing a tree-view for example.

Otherwise there would be no way to determine which Entrys are at the root level, and distinguish them from child Entrys
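
A small sketch of why a well-known root record helps once the structure is flattened. The flat layout (`FlatDB`) and the helper names are hypothetical; only the UUID value comes from the hunk quoted above.

```python
from typing import Dict, List
from uuid import UUID, uuid4

# Fixed identifier quoted in the review thread above
ROOT_UUID = UUID("a28cd77d-cc92-46cc-90cb-758f0f36f041")

# Hypothetical flat layout: every record sits at the same level, keyed by
# uuid, and lists only the uuids of its children.
FlatDB = Dict[UUID, List[UUID]]


def top_level_uuids(db: FlatDB) -> List[UUID]:
    """Top-level entries are exactly the children of the well-known root record."""
    return db[ROOT_UUID]


def make_example() -> FlatDB:
    """Build a tiny flat database: root -> collection -> parameter."""
    collection, parameter = uuid4(), uuid4()
    return {ROOT_UUID: [collection], collection: [parameter], parameter: []}
```

Without the root record, the collection and the parameter would look alike in the flat mapping, and finding the top level would require scanning every record for entries that nothing else references.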

Member

Ok, it stuck out to me as weird since I assumed the root uuid was a database-unique identifier or something, given that it gets included in the serialized data file. I guess it's not a problem if every filestore database has an extra a28cd in it.

superscore/backends/filestore.py (resolved)
            os.remove(temp_path)
            raise

    def _temp_path(self) -> str:
Collaborator

I'd recommend looking at tempfile in the standard lib, which provides similar functionality in a context manager.

Contributor Author

The benefit of rolling our own here is that in the case of an interruption, the temp file will continue to exist and can be recovered.

Contributor Author

I played around with tempfile.NamedTemporaryFile, which seemed promising at first. I wasn't able to easily incorporate it into the backend, and there is some OS-dependent behavior I'd like to avoid.

This file auto-deletes itself after exiting the context manager, which seems convenient, except that we attempt to move the temp file to the existing file location. After the move, the temp file no longer exists and the context manager throws. Copy transactions are not guaranteed to be atomic, so I actually think we might have to roll our own here.
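
A minimal sketch of the hand-rolled approach discussed in this thread, under stated assumptions: the `store` function name and the temp-file naming scheme are hypothetical, and `os.replace` is used here for the final move, which may differ from what the backend actually calls.

```python
import json
import os
import uuid
from typing import Any


def store(path: str, payload: Any) -> None:
    """Write payload as JSON to a sibling temp file, then move it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Hypothetical naming scheme; staying in the same directory keeps the
    # final move on a single filesystem.
    temp_path = os.path.join(directory, f".{uuid.uuid4().hex}.tmp")
    try:
        with open(temp_path, "w") as fd:
            json.dump(payload, fd, indent=2)
            fd.flush()
            os.fsync(fd.fileno())
        # os.replace overwrites the destination; on POSIX it is an atomic rename
        os.replace(temp_path, path)
    except Exception:
        # Clean up after an ordinary failure. A hard interruption (for example
        # a killed process) skips this, so the temp file survives for recovery.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

The existing file is only ever swapped for a fully written one, so a reader never sees a half-written database.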

superscore/backends/filestore.py (outdated, resolved)
superscore/backends/filestore.py (outdated, resolved)
superscore/backends/filestore.py (outdated, resolved)


superscore/model.py (resolved)
superscore/tests/conftest.py (resolved)
superscore/model.py (outdated, resolved)
@tangkong tangkong merged commit c3c54bd into pcdshub:master May 28, 2024
11 checks passed
Development

Successfully merging this pull request may close these issues.

ENH: Build Filestore Backend
3 participants