Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow remote chrome instance #51

Closed
wants to merge 2 commits into from
Closed

feat: allow remote chrome instance #51

wants to merge 2 commits into from

Conversation

jkroepke
Copy link
Contributor

@jkroepke jkroepke commented Jul 5, 2024

Depends on #53

Fixes #41
Fixes #42

@jkroepke jkroepke changed the title deps: Update all backend dependencies feat: allow remote chrome instance Jul 5, 2024
Signed-off-by: Jan-Otto Kröpke <joe@cloudeteer.de>
@mahendrapaipuri mahendrapaipuri added the enhancement New feature or request label Jul 8, 2024
@mahendrapaipuri
Copy link
Owner

Firstly, thanks a lot for working on this. Really appreciate your time and effort. Globally, I see that we are in a good direction. However, I have few ideas and suggestions to improve it further.

Major:

  • I think the signatures of functions would be greatly simplified if we use remote chrome instance always. The difference will be who manages the life cycle of the chromium instance.

    • In k8s context when chromium is deployed as side car container, it is k8s that will manage the life cycle of chromium. The operator will configure the plugin to use the chromium instance running in sidecar by setting up the chromium URL in the plugin config.
    • In bare metal deployment when there is no external chromium instance running, the plugin will start a chromium instance in a sub process and it sets the chromium URL in the config.

    In both cases chromium will run outside of plugin and the difference will be the manager of chromium instance. This way, we always use the same chromium instance and avoid spinning up a new chromium for each request in bare metal deployments. This will also allows us to do all chromium related config in NewApp function and just use the remote chromium instance to make requests from client and report methods. What do you think?

Minor:

  • I dont see the need of ability to configure chromium URL from Grafana UI. I mean in k8s context, we cannot change URL (either name or port of sidecar container) without restarting the pod. So, it should be like the config parameters appUrl, skipTlsCheck that should be provisioned in the config file during deployment. What I think a better enhancement will be able to configure these parameters from env vars. Currently, plugin reads appUrl from GF_APP_URL env var. We can fix a env var for skipTlsCheck (like IGNORE_HTTPS_ERRORS like grafana-image-renderer does) and one for remoteChromeAddr (like CHROME_URL) so it will simplify deployments in k8s as there is no need to maintain a config file.

These are some initial thoughts. Let me know what do you think. If you dont have time to implement these changes, it is totally fine. I can work on them on the top of your branch. You have already done great job in putting up these PRs.

@mahendrapaipuri
Copy link
Owner

Looking closely into the PR, I see that we are still creating a new browser instance for each request (either remote or local). However, in your gist you were using the same browser instance with different tabs. Is there any reason why you did not implement it that way? I mean using multiple tabs on same instance of browser.

@jkroepke
Copy link
Contributor Author

jkroepke commented Jul 9, 2024

I see that we are still creating a new browser instance for each request (either remote or local).

No. In terms of local. the behavior has been not changed.

In terms of remote, the chrome instance is persistent. The instance is already started, before grafana-dashboard-reporter-app is connected. Browser instance lifecycle is not handled by grafana-dashboard-reporter-app. Each report requests create a new connection to the remote chrome instance via Chrome Remote Debug Protocol.

Is there any reason why you did not implement it that way?

Too much refactoring is required here to implement it. Normally, the local chrome instance could be also something which is starting at plugin start and then, each requests just creates a new tab, but then, failure tolerance needs to be implemented. What happens, if the connection to the remote chrome is lost?

Make a new network request for each report request resolve the issue of maintain a long-living connection. Additionally, it would be possible to load-balance multiple chrome instances behind a Kubernetes service.

in general a fan of small, minimal PRs instead big PRs which refactor everything.

This PR just replicate the current logic.

and one for remoteChromeAddr (like CHROME_URL) so it will simplify deployments in k8s as there is no need to maintain a config file.

I'm not a fan of this. Grafana plugins should avoid using any environment variables, except something like GF_*. Grafana is planning to restrict environment variables for any plugin. Plugins should be configurable via provisioning interfaced. Any plugin settings should used the designed interface.

@mahendrapaipuri
Copy link
Owner

Thanks a lot for your patience.

in general a fan of small, minimal PRs instead big PRs which refactor everything.

I agree this strategy when we are working with well refined code base. However, the current plugin is far from ideal. I had to hack something fast from existing reporter app for our needs and I managed to do so. But as you must have noticed, the code base is bit distorted. So, I prefer not to fragment the code more but to re-organise it.

I managed to clean it up a bit and I hope the backend is more easy to follow now. Consequently there are lot of conflicts now. Sorry about that!

Normally, the local chrome instance could be also something which is starting at plugin start and then, each requests just creates a new tab, but then, failure tolerance needs to be implemented.

This is a big change I managed to do during this code clean up. Now a local chrome instance will be spun up when app instance is created and this instance will be reused everywhere by opening new tabs. Now if we find a remote chrome URL in the config, we instead create a RemoteAllocator here and return the context. Maybe we can rename that function as newBrowserContext which will be more appropriate. Normally that would be the only change and all the downstream code should work out-of-the-box. I think we should do a connectivity test if remote chrome is being used and to check if the remote instance is actually usable. What do you think?

Make a new network request for each report request resolve the issue of maintain a long-living connection.

This is what we are going to do. Create a new tab for each request and there is no need to maintain long living connection. When using local chrome instance, it is chromedp's reponsibility to manage life cycle of browser instance and this is what NewAllocExecutor does.

Grafana plugins should avoid using any environment variables, except something like GF_*

For appUrl we are already using GF_APP_URL and for skipTlsCheck, I added GF_REPORTER_PLUGIN_IGNORE_HTTPS_ERRORS. So, for chrome URL we can use GF_REPORTER_PLUGIN_CHROME_URL.

Grafana is planning to restrict environment variables for any plugin

The PR has been merged and it seems like it is operator that should opt out of passing server env vars to plugin. So, we are fine for backward compatibility. By supporting env vars, we give options to the operators.

If it is too much of a hassle to resolve conflicts, feel free to put up a new PR. I think the changes should be more straight forward now. Please let me know your thoughts!!

Cheers

@jkroepke
Copy link
Contributor Author

jkroepke commented Jul 12, 2024

What do you think?

By opening a network connection at the start of the application, we need to maintain and monitor the connection. What happens if the remote Chrome instance no longer exists? If we choose this method, we need to account for many factors.

On the other hand, creating a new connection for each report is efficient. The remote instance will not spin up a dedicated Chrome instance for each incoming connection. Instead, all connections share the same context. Chrome fully supports multiple remote debugging connections to the same browser, as detailed here: https://developer.chrome.com/blog/new-in-devtools-63#multi-client.

Setting up a connection for each report request has additional benefits. Users can choose different architectures on their own and run multiple Chrome instances as separate pods in Kubernetes. Since each report creates a new connection, the connections can be well load-balanced.

Additionally, instead of running and maintaining Chrome locally, end-users have the option to use Chrome clouds, such as https://testingbot.com/support/puppeteer/chromedp.html.

It is likely that some long-lived connections over the internet will be terminated from time to time.

By supporting env vars, we give options to the operators.

But it's not compliant with the standard interface for configuring a Grafana Plugin.

In context of Kubernetes, system operators could use different technics to provision plugins, for example with a sidecar.

operators has should do app provision anyways. https://grafana.com/docs/grafana/latest/administration/provisioning/#plugins

There are a lot of other options like logo and so on. They need it anyways I highly recommend to use that existing interface to configure that plugin.

Environment variables always requires a restart of Grafana.

Plugin configuration can be reload at anytime. In context of Kubernetes, system operators may use the sidecar for plugins anyways. (Example for datasources: https://github.com/grafana/helm-charts/tree/91d549c8e2b682f56919fa33115da04e507e263c/charts/grafana#sidecar-for-datasources ; but it exists for plugins as well https://github.com/grafana/helm-charts/blob/91d549c8e2b682f56919fa33115da04e507e263c/charts/grafana/values.yaml#L1059)

The benefits of using the plugin configuration interfaces is that settings can be changes without restarting grafana. Without restarting the reporter-app, which would affect running requests. Provision config configurations can be reload via HTTP as well (https://grafana.com/docs/grafana/latest/developers/http_api/admin/#reload-provisioning-configurations)

From that point of view, use environment variable instand the designed plugin interface would be a terrible design choice and has close to zero benefits.

For appUrl we are already using GF_APP_URL

Which is not necessary. The grafana SDK provide a public interface to get this information, without using os.getenv

https://github.com/grafana/grafana-plugin-examples/blob/0fd2458bc19d8925d1267998992a00c4c7b22622/examples/app-with-service-account/pkg/plugin/resources.go#L25-L26

I added GF_REPORTER_PLUGIN_IGNORE_HTTPS_ERRORS

I highly recommend to mark this as deprecation and remove it in the next major instance. Instead, provide a configurable setting via admin portal.

Sorry, but I highly recommend to avoid using environment variables. System operators should use the provision configuration.

Other Grafana tools like https://github.com/grafana/grafana-operator will never ever support this.

@mahendrapaipuri
Copy link
Owner

By opening a network connection at the start of the application, we need to maintain and monitor the connection. What happens if the remote Chrome instance no longer exists? If we choose this method, we need to account for many factors.

On the other hand, creating a new connection for each report is efficient. The remote instance will not spin up a dedicated Chrome instance for each incoming connection. Instead, all connections share the same context. Chrome fully supports multiple remote debugging connections to the same browser, as detailed here: https://developer.chrome.com/blog/new-in-devtools-63#multi-client.

You have a valid point for the deployments that use remote chrome instances. But for baremetal deployments this translates to spinning up a chrome instance for each request (which is how the actual implementation works and I guess we could improve that). I agree that in the case of plugin using remote chrome instances, a drop in ws connection can render the plugin unusable. So, I guess it is better to open a new connection for each request when we use remote chrome instance as you suggested.

In the deployments using local chrome instances, we can still spin up a single chrome instance when the plugin app is loaded. And we can interact with this single instance using remote debugging protocol using URL ws://localhost:9222 within each request. This way we can have same design pattern either for local or remote chrome instances. Both will open a new connection for each request. In the local instance, the plugin app will control life cycle of chrome instance.

There are a lot of other options like logo and so on. They need it anyways I highly recommend to use that existing interface to configure that plugin.

There is some misunderstanding here. I am not and never proposed to deprecate provisioning the plugin app with config file. To sum up, there are two ways to provision the plugin right now:

  • Using a config file installed at $GF_PATHS_DATA/plugins/reporter.yml as instructed in docs
  • From Grafana UI where users can modify the provisioned config.

Currently, all config parameters except, appUrl and skipTlsCheck can be modified from the Grafana UI after plugin has been provisioned. The reason why these two parameters are not configurable via Grafana UI is that they both are strongly tied to Grafana instance and it does not make to modify them without restarting the Grafana instance.

So, what I am proposing here is besides allowing configuring appUrl and skipTlsCheck using provisioning method, we can let operators to configure them using env vars. This is what the actual implementation does: defines JSON tags appUrl and skipTlsCheck in the Config struct, unmarshall the provisioned config into config instance and if env vars are passed, the provisioned values are overridden.

We are by no means obliging the operators to configure the app using env vars. They can use their regular provisioning methods by config file to configure the plugin and that will be always supported.

Imagine a use case where an operator is happy to live with "default" config for their deployment and the only thing that need to configure for the plugin to work is appUrl. So, instead of having to maintain and install a config file just to set this one parameter, they can simply inject that parameter into env vars. This is more relevant in baremetal deployments like ours where setting up a new env var in systemd is far more easier and practical. I hope that makes it clear.

Which is not necessary. The grafana SDK provide a public interface to get this information, without using os.getenv

Not quite sure about it. I remember playing around it (during december/january) and it always gave me empty config. Unless something has changed recently, we wont get App URL from the plugin context. But if SDK does expose this info now, that would make installing plugin far more simpler. It would be plug-and-play situation where operators do not have to worry about provisioning.

@jkroepke
Copy link
Contributor Author

jkroepke commented Jul 15, 2024

The refactoring PR #52 which include this PR also make appURL and skipTLS configurable.

In my case, appURL points to an external URL. In that case, the reporter plugins would go over the internet back to the public LB to the grafana pod. I have to set http://localhost:3000 to skip any unnecessary hops.

The solution with define GF_APP_URL may not work, since Grafana will override the env variable again. Thats the reason, why I plan to make this configurable.

In general, #52 is replacing this PR.

@jkroepke jkroepke closed this Jul 15, 2024
@jkroepke jkroepke deleted the remote-chrome branch July 15, 2024 13:32
@mahendrapaipuri
Copy link
Owner

In my case, appURL points to an external URL. In that case, the reporter plugins would go over the internet back to the public LB to the grafana pod. I have to set http://localhost:3000/ to skip any unnecessary hops.

I think you might have a point here. Could you please provide more details? You have configured server section on Grafana to point the domain to external URL? If so, how are you configuring it? Using .ini file or GF_APP_URL env var. If you are pointing something like GF_APP_URL=https://myserver.grafana.com for your Grafana, I agree with you that the plugin will always take this into account and will never be able to make requests to configured appUrl in the provisioned config. In that case we should use an env var that does not clash with Grafana config env vars. Maybe something like GF_REPORTER_PLUGIN_CALLBACK_URL.

Ok, I will look into the other PR and we continue the discussion over there.

Thanks a lot for your valuable inputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide ab fat zip with plugin and the chromium binary Proposal: Split plugin into to segments
2 participants