Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(client): upload dataset items to data service via http using client #642

Merged
merged 6 commits into from
May 31, 2024

Conversation

aaraney
Copy link
Member

@aaraney aaraney commented May 29, 2024

dmod.client client will upload dataset items to data service via http (instead of websocket connection) if the client is configured to bypass the request service. In the future this option will become more explicit when it is possible to upload to the data service without needing to bypass the request service.

closes #599

Changes

  • dmod.client client will upload dataset items to data service via http (instead of websocket connection) if the client is configured to bypass the request service.

Note

I decided to take a very direct approach in implementing this feature (e.g. existing abstractions were not expanded / implemented). In discussing how we might implement this feature with @robertbartel, it became clear that the TransportLayerClient abstraction is not well suited to handle the request-response paradigm. Additionally, bloating that base class with methods that would enable a request-response paradigm is not desirable. It seems clear that careful thought is required to support communication via multiple paradigms. For the time being, a concrete approach will suffice until a more general approach is required.

@aaraney aaraney requested a review from robertbartel May 29, 2024 18:03
Copy link
Contributor

@robertbartel robertbartel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few items that need addressing, and a couple additional, more minor comments/questions.

async def data_service_action(self, action: str, **kwargs) -> ResultIndicator:
async def data_service_action(
self,
action: Literal["create", "delete", "upload", "download", "list", "items"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First and foremost, the relatively new state option is missing.

More generally, I don't particularly like the use of Literal here, for exactly the type of thing that's happened, though I may not have full appreciation for the benefits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. I don't know how I missed state.

More generally, I don't particularly like the use of Literal here, for exactly the type of thing that's happened, though I may not have full appreciation for the benefits.

The goal of using Literal here is to improve the developer experience when using this api. I think this is especially important when developing client libraries. By using Literal here a language server will populate the available options for the action parameter in the developer's IDE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just addressed this.

@@ -225,7 +229,17 @@ def data_service_client(self) -> DataServiceClient:
if self.client_config.data_service is not None and self.client_config.data_service.active:
t_client_type = determine_transport_client_type(self.client_config.data_service.endpoint_protocol)
t_client = t_client_type(**self.client_config.data_service.dict())
self._data_service_client = DataServiceClient(t_client, self._auth_client)
# use http client where applicable when bypassing request service
if self._bypass_request_service:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably going to need some modifications. As is, we won't get the extra HTTP client without the user manually specifying an extra CLI arg, and configuring the data service details will have the side effect of bypassing the request service for all data service communication. I expected that we wouldn't (necessarily) want to bypass the request server except for the actual data transfer.

This _bypass_request_service attribute (and related CLI option) is a legacy thing that just never got taken out. It's False unless the user specifically does something extra at runtime.

What replaced that legacy behavior was the CLI client moving to using a configuration class, and instead bypassing the request service (for data service actions) based on whether a self.client_config.data_service value was available. So if the ClientConfig has a data_service value, use that ConnectionConfig when lazily creating the data_service_client property. Otherwise, create data_service_client using the "regular" transport client, which was created from ClientConfig.request_service.

Depending on how we want to go about things, we'll at least need to change where and how the conditional is used, and possibly more than that if we want to continue to support full bypassing of the request service (though maybe we don't).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably going to need some modifications. As is, we won't get the extra HTTP client without the user manually specifying an extra CLI arg, and configuring the data service details will have the side effect of bypassing the request service for all data service communication. I expected that we wouldn't (necessarily) want to bypass the request server except for the actual data transfer.

I should have left a note in the code here why I chose to go this way. My thinking was this was the most sensible default for the moment given where we are in the development lifecycle. Namely, in our deployment architecture with the request service acting as the main entry point to the deployment, at this point in time, we don't have a way to reach the http data service endpoints through the request service or via proxy. So, the only way to reach the http data service endpoints are by directly reaching out to the data service.

Once this is no longer the case, we will need to modify this section of the client code to remove this condition. Thoughts on that?

#338 is also related.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part that I am having trouble with is the de facto bypassing for any data service communication. A necessary (but not sufficient) condition for HTTP uploads produces side effects for more than just uploads. To work around them, we'd have to go back and edit the config (or manually toggle between configs).

I don't think we want this, and I think it should be fairly easy to avoid. Any reason not to also make the outer condition dependent on self._bypass_request_service? Or, perhaps more generally, why not (at least for now) change the request service bypassing functionality to once again require an explicit CLI arg, and also make use of HTTP uploads dependent on having bypassed the request service?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you are saying. I will just remove the check conditional check on self._bypass_request_service and use the http client if the configuration file is configured to bypass the request service. Sorry, not enough coffee on a Friday, I guess 😅!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just addressed this

transport_client: TransportLayerClient,
auth_client: Optional[AuthClient] = None,
*args,
http_client: Optional[aiohttp.ClientSession] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason this was ordered this way with respect to *args and **kwargs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, great question! This forces http_client to be provided as a keyword argument. If there were to be a subclass of DataServiceClient that added arguments to it's __init__ method, adding the http_client as a keyword or positional argument is a breaking change. Subclasses that add a positional / keyword argument to their __init__ method could be now inadvertently mis-initialized.

@robertbartel robertbartel merged commit 8465473 into NOAA-OWP:master May 31, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable uploading data to dataservice via HTTP using DMOD client
2 participants