Convert formats to yaml with ids #76
riddle me this - why would the tests fail at 2719511 `miniscope_io.formats.stream.StreamBufferHeader == miniscope_io.models.stream.StreamBufferHeaderFormat.from_id("stream-buffer-header")`
Pull Request Test Coverage Report for Build 12243401376 - Coveralls
d8f56ec to e8aee8c
ok @t-sasatani updated the root comment with a description of this PR :) it looks big but a lot of it is from converting the formats to yaml and also how we don't have linting on for the tests and my IDE reformatted some of the test files, oop.
I generally like being able to call IDs instead of directly specifying file paths and making the YAML configs explicitly coupled with models.
Please correct me if I'm wrong, but other parts of this PR felt like they're leaning towards making the YAML config files primarily code-edited instead of manually edited, which I think raises the bar for people who don't code much. Structure-wise, it makes sense if this is an intermediate step for making `miniscope-io` fully GUI-based, but I'm unsure if we should merge this part at this point. This is mostly about developer interface, and we've looked at this too much, so it could be good to ask opinions from the other folks trying to integrate this `io`.
Overall, my current opinions are:
- `id`: I prefer using the `directory/stemname` as the ID.
- `mio_model`: let's do this.
- `mio_version`: I prefer not having this in the config file itself because it'll make it a mix of code/hand-edited entities.
while we're mostly CLI.
@@ -0,0 +1,41 @@
id: wirefree-sd-layout-battery
I might not fully understand the benefit of having a dedicated `id` field when we could also use the file path directly. (I assume it'll be one configuration per one YAML file here.)
- Can't we use the stem path under `config` as a unique ID to call configs? Like `wirefree/sd-layout-battery` for this example or maybe `user/some-config` if it's a user directory.
- I feel there could be some confusion with duplicate IDs, and ambiguous IDs, especially because they will be hidden from the human perspective. We could add tests to prevent duplicates, but I think using file paths as unique identifiers is more intuitive.
- There will be multiple configs for one device. I think we'll have higher risks of ambiguous config IDs getting pushed (like `ScopeX-1`, `ScopeX-2`, `ScopeX-old`, `ScopeX-latest`, and `ScopeY` for a device called `ScopeX`), compared to when we just use path/filename as IDs.
- Being able to have nice versions in IDs feels nice, but I think it'll be pretty confusing if the filename doesn't sync with it.
Ope responded to this as a global comment before i read the individual comments. Responded to most of these there, but for continuity's sake: currently we get both, and there are reasons to have both, and they don't really trade off.
> I feel there could be some confusion with duplicate IDs, and ambiguous IDs,
this would be an interface question. so we would want something like
$ mio config --show
id path
sd-layout-battery miniscope_io/data/configs/sd-layout-battery.yaml
my-config {user_dir}/configs/my-config.yaml
my-config {user_dir}/configs/my-config-2.yaml
but again you can always specify something as a path if you want to.
> There will be multiple configs for one device. I think we'll have higher risks of ambiguous configs IDs getting pushed (like ScopeX-1, ScopeX-2, ScopeX-old, ScopeX-latest, and ScopeY for a device called ScopeX), compared to when we just use path/filename as IDs.
the goal is that we don't want to have `scope-v1`, `scope-v2`, etc. The idea behind having the `id` is that we want to have a way of identifying unique configurations semantically, so then different versions could each be `id: scope-default` with a different `mio_version` field to identify that they are related configs that are for different versions of the scope. Then the id only contains the minimal semantic information necessary to identify it: like one would usually just use `wireless` to run their wireless miniscope, but if they had some custom configuration they could use `wireless-mycustomconfig` everywhere and not have to worry about keeping the rest of their code up to date with the current location of that file, or the different file names as the version changes.
> Being able to have nice versions in IDs feels nice, but I think it'll be pretty confusing if the filename doesn't sync with it.
Filenames are unique identifiers to the system, but we want to provide an interface that potentially allows people to have multiple versions of the same configuration, and exposing them in the filesystem makes for nicer ux than just having them in a database somewhere. so that's the compromise - have an `id` field and allow it to be decoupled from the filename.
if someone wants to refer to something by the filename, then they just go `path/filename.yaml`. if someone wants to refer to something by the id, they go `myconfig`. one can make the filename match the id if they want to, but they have the ability to make it orthogonal to filename in cases where that's useful.
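A minimal sketch of how a reference could be sorted into "path" or "id", assuming the rule from the PR description that an `id` never contains `.` but may contain `/` and `#`. The pattern and the `classify_source` helper are hypothetical illustrations, not the code in this PR.

```python
import re
from pathlib import Path
from typing import Union

# Hypothetical id pattern: word characters, "-", "/" and "#", but never ".".
ID_PATTERN = re.compile(r"[\w\-/#]+")


def classify_source(source: Union[str, Path]) -> str:
    """Return "path" or "id" for a config reference."""
    if isinstance(source, Path):
        # An explicit Path object is always treated as a path.
        return "path"
    if ID_PATTERN.fullmatch(source):
        return "id"
    # Anything containing "." (or other excluded characters) falls back to a path.
    return "path"


print(classify_source("myconfig"))
print(classify_source("path/filename.yaml"))
```

One consequence of a rule like this: a string path without a file extension would be read as an id, so callers who mean a path would pass the extension or a `Path` object.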
for config_file in chain(*globs):
    try:
        file_id = yaml_peek("id", config_file)
        if file_id == id:
            init_logger("config").debug(
                "Model for %s found at %s", cls._model_name(), config_file
            )
            return cls.from_yaml(config_file)
    except KeyError:
        continue
raise KeyError(f"No config with id {id} found in {Config().config_dir}")
Might be better to return errors when there are duplicates (maybe when generating)? Though I feel it's better to use filepath as IDs as in the above comment.
sure sure, we could check and warn for duplicates here, that would be np
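A rough sketch of what such a duplicate check might look like, using only the standard library. The regex is a simplified stand-in for the project's `yaml_peek` (it only reads an unquoted top-level `id:` line), and `find_duplicate_ids` and `warn_on_duplicates` are hypothetical names.

```python
import re
import warnings
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

# Simplified stand-in for yaml_peek: match an unquoted, top-level `id:` line.
_ID_LINE = re.compile(r"^id:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def find_duplicate_ids(paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Map each id declared by more than one file to the files declaring it."""
    seen: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        match = _ID_LINE.search(path.read_text(encoding="utf-8"))
        if match:
            seen[match.group(1)].append(path)
    return {config_id: files for config_id, files in seen.items() if len(files) > 1}


def warn_on_duplicates(paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Emit one warning per duplicated id and return the duplicates."""
    duplicates = find_duplicate_ids(paths)
    for config_id, files in duplicates.items():
        listing = ", ".join(str(f) for f in files)
        warnings.warn(f"id {config_id!r} is declared by {len(files)} files: {listing}")
    return duplicates
```

Warning instead of raising keeps loading by path usable even when two files share an id.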
- target-version = "py311"
+ target-version = "py39"
Mostly out of curiosity, but do we want to do this, or is it just a leftover from something? Might be better to have some notes if `3.11` isn't good because I believe `3.9`'s end of support is soon
oh sorry ya, this is because ruff uses this as the minimum target, so it was constantly suggesting changes to the code that were incompatible with our lowest version. (specifically `Union[X,Y]` to `X | Y` which 3.9 doesn't support). 3.9 should be EOL next october, so basically this value should always stay pegged to our lowest version we support and bumped then (rather than the max version, which was 3.11 at the time i set this iirc)
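To illustrate the incompatibility: `Union[X, Y]` works on every supported version, while evaluating `X | Y` between plain classes raises `TypeError` before Python 3.10. The helper names below are made up for the example.

```python
import sys
from typing import Union


def describe(value: Union[int, str]) -> str:
    """Annotated with typing.Union, which Python 3.9 accepts."""
    return f"{type(value).__name__}: {value}"


def pep604_supported() -> bool:
    """Check whether `X | Y` between classes can be evaluated at runtime."""
    try:
        int | str  # raises TypeError on Python 3.9 and older
    except TypeError:
        return False
    return True


print(describe(3))
print(sys.version_info[:2], pep604_supported())
```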
* `id` - unique identifier for this config
* `mio_model` - fully-qualified module path to model class
* `mio_version` - version of miniscope-io when this model was created
I like having the model here, and even if this is manually filled, having validation for this could be useful.
However, I'm unsure about the rest because I assume these YAML config files will usually be created by copy-paste and manually filling them out. Having this `mio_version` seems to make these primarily made via a script or CLI command, which I feel will raise the bar without much benefit and cause confusion when hand-edited (which I assume will be the norm).
I feel it'll be more intuitive if we assume the configs will be manually edited. And maybe write a CI flow that documents the `mio_version` that the config YAML was last changed from the commit ID if we need that.
I've written about the `id` in other parts, so I will skip it here.
Maybe generating something like this with CI?
https://miniscope-io--84.org.readthedocs.build/en/84/history/config.html
> copy-paste and manually filling them out.
I think for most deployments we should assume that people won't be looking at the source code - we assume `pip install miniscope-io` is the default way to get the package. So then they won't have anything to copy/paste, and we need to provide a cli or programmatic interface for dealing with configs. that's sorta the whole idea with the design of having an id field and the way it was implemented as being something that gets autofilled when loaded and then put at the top of the yaml file so we could use a cheap `yaml_peek` method to look through many files quickly.
So i think this is a pretty easy way to start using the program
$ pip install miniscope-io
$ mio capture --config wireless
and then when i want to make my own config
$ mio config create wireless --to wireless-custom
# or
$ mio config create wireless --to my_config_dir/wireless_custom.yaml
and then manually edit it.
> CI flow that documents the mio_version that the config YAML was last changed from the commit ID if we need that.
we're looking to support user-custom configs here, so we assume they aren't committing anything to the repository (and don't have to fork it to use it) - versions need to be totally independent from the git flow
Fair point! The idea for the design here is that it can be both: you can totally hand-code your configs if you want to, and then when they are loaded the proper metadata is stamped into the hand-coded config files if it's not already there. So the anticipated workflow would be someone doing something like this
# kwargs version
mio config create --from wireless-base --to my-wireless-base
# positional args version
mio config create wireless-base my-wireless-base
to copy the config file with
Then they could edit it however they want by hand, and use it with the id
this should actually make it easier to hand-maintain custom configs over time, specifically the
the idea for having a single id slug is to make it agnostic to location: we want to be able to refer to something uniquely from anywhere. so say when someone publishes their work, they might repackage their data and their config and etc. into a custom directory structure that fits for that journal's expectation, or some format, etc. As-written though, we get both: you can either provide an
I agree that it will be the same for people used to coding and will be better for maintaining.
Function-wise, it is strictly additive, as you say. However, UI-wise, I think it's pretty negative to do this, requiring the user to read through the CLI reference to run a command they only run once (the alternative is just to copy and paste the YAML file), providing many ways to do the same thing, mixing up manually edited fields and generated fields in the same config file, etc. I see a benefit and agree that limiting the flow's entrance to a CLI is good if the objective is to keep the IDs unique and the fields organized for future updates, but I see a pretty critical trade-off here. I think at this point it's more important to lower the bar and simplify as much as possible.
yep! and what i'm saying is that
if you wanted to just copy and paste the files and pass relative paths, you can - this change is fully backwards compatible
Ok, there's no reason to actively oppose this, as it is true that this is fully backward-compatible. Also, I guess my view of the user is mixed up; I considered tool developers as the users because they will decide whether to use this, and the end users will have no business with the device configs for a while. I thought it'd be nice if it looked a bit easier for developers to join the coding, but if they will likely treat this io as a well-defined black box downloaded via pip, as you say, there's no issue. Related note: I think it is perfect timing to pull in other Miniscope developers (i.e., our lab people to start) in the dev to make our lives easier, but I'm a bit concerned that people seem to be a bit overwhelmed, which I think makes total sense. I don't know a good way to approach this, but it's rather non-ideal that it's been mostly the two of us for nearly 1.5 years, though it should be a bridging project. The nesting, assuring multiple pathways, rich versioning, etc., will benefit future scalability, but they could also somewhat raise the bar for joining (at least, it is raising my mental bar to add stuff a bit, even though I think I've read most of the code). I totally agree these changes are worth it and get this project closer to what it ultimately should look like, but it could make sense to slow down a bit on these future scalability aspects.
totally agree. in trying to get hemal up to speed, the config system being split across the format models and the yaml files was a big barrier, so hopefully this simplifies things down a bit, and there is still a decent bit of work to do to simplify implementation since the sd-card classes and the stream daq are basically two independent implementations.
Fix: #55
OK Here we replace the awkward `format` model system with one where we just use `yaml` files for all static config like that.
This is something that is intended to work alongside #72 to make a more coherent configuration experience.
Now we have the starts of a uniform system where we extended the generic `YAMLMixin` class to a `ConfigYAMLMixin` class which can be used with any pydantic model.
The way it works is
- `id`: an identifier that is expected to be locally unique - i.e. in a given deployment, between the builtin configs and the user provided configs, an id is unique. not globally unique across all deployments.
- `mio_model`: the fully-qualified python module that the config corresponds to, like `package.subpackage.ClassName`
- `mio_version`: the last version of `miniscope-io` that this config was known to work with. more on versioning below.
- `config_dir` (see Global and User config, CLI config #72)
- a `from_any` method that accepts the possible input types, yaml files and ids, to provide a "fuck it find my thing" interface. we get to avoid unstructured or EAFP-style string parsing by defining `id` to exclude `.`, so we can unambiguously decide whether something is a path or an id.
- a `yaml_peek` function to be able to scan for `id`s quickly because parsing yaml is surprisingly expensive, especially if we imagine that someone might eventually use this tool to make a relatively complex configuration like we see in the existing miniscope-qt-daq configs. it should be at least as fast as `pyyaml` always, because the underlying file reading and regex methods in python are well optimized. the reason this is so perf-sensitive is that while it shouldn't happen repeatedly during runtime, we should expect someone to accumulate a billion and a half config files as they use it, and this is easier and likely about as reliable as building a caching system for that.
So basically how this should work in normal usage is like this:
where given some config file in a config directory like
where the user doesn't need to handle paths but can instead refer to their config by ID. this also helps with publishing code with papers, but that's more of a 'later' thing.
Where the actual API of the thing is like this
and so on.
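As one illustration of the `yaml_peek` idea described above, here is a minimal sketch that reads a top-level key with a regex, line by line, without parsing the whole file. It assumes the key sits on its own unindented line with a scalar value; `peek_key` is a hypothetical name and this is not the implementation in the PR.

```python
import re
from pathlib import Path
from typing import Union


def peek_key(key: str, path: Union[str, Path]) -> str:
    """Return the scalar value of a top-level `key:` line, stopping at the first match."""
    # Anchored at the start of the line, so indented (nested) keys are ignored.
    pattern = re.compile(rf"^{re.escape(key)}:[ \t]*(.*?)[ \t]*$")
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            match = pattern.match(line.rstrip("\r\n"))
            if match:
                # Drop surrounding quotes so `id: "x"` and `id: x` agree.
                return match.group(1).strip("\"'")
    raise KeyError(f"Key {key!r} not found in {path}")
```

Because the header fields are written at the top of the file, a scan like this usually stops after the first few lines, which is what keeps it cheap across many files.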
This design sets us up for the future where we will want to have several different kinds of configs for every object, some builtin and fixed, and others provided by the user. For example the current stream_daq config blends together params that are intrinsic to the device with params that the user might want to configure, both dynamically (i.e. in a gui) and for a given set of experiments. we will want to be able to have a bunch of different types of configs that we can refer to with a shorthand and have a standard way of locating them rather than having them splayed out everywhere :). This also makes it possible to have a uniform interface over builtin configs as well as user-provided ones, because the user should not have to know anything about the directory structure of the package (e.g. knowing that wireless miniscope configs are in `data/configs/wireless`) because they usually can't see it, and they also shouldn't have to juggle paths in derivative code.
Additional notes on the design here:
- The `id` pattern has been constrained to one where we might be able to support configs with PIDs in the future: a config is allowed to use `/` and `#` symbols to indicate hierarchy and fragments, respectively.
- The `mio_version` parameter is to be able to identify when old configs might not work with the current version of the model it corresponds to, and to be able to migrate between those version updates. I opted not to give every model its own independent version, which we would then have to write tests to validate that we write a migration whenever the model is updated and so on. Instead we just use the `miniscope-io` version for everything, and then when we change a model we write a migration that corresponds to that version, and on load we check "are there any migrations for this model? if so, are the versions they are tagged to greater than the version in this config, if so migrate." That's all tbd and i'm not sure if there is tooling for this already in pydantic universe, but if not we'll write it as a separate package. Downstream users are welcome to add an additional version field to their model if they'd like to keep track of things that way as well. we're not quite at plugin phase yet and we'll probably need to mildly tweak this in the future when `miniscope-io` is not the package that contains the migrations, but we'll get there later.
- We don't currently validate that the `mio_model` value matches the loading class. that's because it would be annoying while we are pre-alpha and names and locations are shifting to keep having to update these strings. once we get a decent migration system then we can do that. for now it's like "you can name something the wrong name at your own peril lol"
📚 Documentation preview 📚: https://miniscope-io--76.org.readthedocs.build/en/76/
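Since the migration system is described as tbd, the following is only a sketch of the on-load check quoted above: look up migrations for the model, keep those tagged with a version greater than the config's `mio_version`, and apply them in order. The registry, the `register` decorator, the model path `example.models.Device`, and the field names are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Migration = Callable[[dict], dict]

# Hypothetical registry: model path -> list of (version tag, migration function).
MIGRATIONS: Dict[str, List[Tuple[str, Migration]]] = {}


def parse_version(version: str) -> Tuple[int, ...]:
    """Naive parser for plain "X.Y.Z" versions (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


def register(model: str, version: str) -> Callable[[Migration], Migration]:
    """Tag a migration with the model it applies to and the version it belongs to."""
    def decorator(func: Migration) -> Migration:
        MIGRATIONS.setdefault(model, []).append((version, func))
        return func
    return decorator


def migrate(config: dict) -> dict:
    """Apply every migration tagged newer than the config's mio_version, oldest first."""
    loaded = parse_version(config["mio_version"])
    pending = sorted(
        (item for item in MIGRATIONS.get(config["mio_model"], [])
         if parse_version(item[0]) > loaded),
        key=lambda item: parse_version(item[0]),
    )
    for version, func in pending:
        config = func(dict(config))  # work on a copy, leave the input untouched
        config["mio_version"] = version
    return config


@register("example.models.Device", "0.2.0")
def rename_fs(config: dict) -> dict:
    # Made-up example migration: a field was renamed between versions.
    config["frame_rate"] = config.pop("fs")
    return config
```

A real version comparison would likely use `packaging.version.Version` so that pre-release and dev tags sort correctly.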