Series Opencast Call can fire too frequently #604

Open
andiempettJISC opened this issue Nov 15, 2018 · 5 comments

Comments

@andiempettJISC
Collaborator

If [series] in the Galicaster configuration is left unconfigured (which is the default), Galicaster will attempt to retrieve all series from the Opencast server it is pointed at. This happens at Galicaster init and again on every long heartbeat (default 60 seconds). Normally this would be OK, but if you have 1000+ series in Opencast and many Galicaster capture agents, it can become a problem for two reasons:

  1. Calling the series REST endpoint so frequently impacts Opencast performance.
  2. Firewalls may see the calls from Galicaster as malicious, because many HTTP calls are made in a single second.

I would suggest a few changes to the default behaviour. One idea is to increase the hard-coded results-per-page variable, or to make it configurable in [series] as well: https://github.com/teltek/Galicaster/blob/c066b5abd3b32ed038a633cd2a9069c37bdafb5a/galicaster/opencast/series.py#L23
Another is to make series polling less frequent, say once at initialisation and then just nightly.
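
To put rough numbers on the load described above, here is a back-of-the-envelope sketch. The function name, the page size of 100, and the agent count of 100 are illustrative assumptions, not values read from Galicaster; only the 1000 series and the 60-second heartbeat come from the description.

```python
import math


def series_requests_per_minute(n_series, page_size, n_agents, heartbeat_s=60):
    """Estimate series-endpoint requests per minute across all capture agents.

    Assumes each agent fetches every page of results once per poll.
    """
    pages_per_poll = math.ceil(n_series / page_size)  # one HTTP call per page
    polls_per_minute = 60 / heartbeat_s
    return pages_per_poll * n_agents * polls_per_minute


# Illustrative inputs only: page size and agent count are assumptions.
current = series_requests_per_minute(1000, page_size=100, n_agents=100)
bigger_pages = series_requests_per_minute(1000, page_size=1000, n_agents=100)
nightly = series_requests_per_minute(1000, page_size=100, n_agents=100,
                                     heartbeat_s=24 * 3600)
print(current, bigger_pages, nightly)
```

Under these assumptions, a larger page size cuts the request count in proportion to the number of pages saved, while moving from a 60-second to a nightly poll cuts it by a factor of 1440, so the polling interval is the bigger lever.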

@ppettit
Collaborator

ppettit commented Nov 15, 2018

I also noticed this was causing a huge load on our admin node. We do not use this data on our Galicasters at all, so an option to disable the calls completely would fix it fast for us.

Maybe it would be possible to update only very infrequently (nightly?) in case the machine is offline, but do a live query after the first few letters are typed when entering a series if the machine is online? This would dramatically reduce the load, and having something to filter on would mean fewer results when you do call the series endpoint.
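
A minimal sketch of that idea, assuming two hypothetical callables: `fetch_all()` for the expensive full listing and `search(prefix)` for a filtered live query. Neither is an existing Galicaster or Opencast function, and both are assumed to raise `OSError` (for example `ConnectionError`) when the server is unreachable.

```python
import time


class SeriesLookup:
    """Hypothetical helper: live prefix query when online, stale cache otherwise."""

    def __init__(self, fetch_all, search, max_age_s=24 * 3600, min_chars=3,
                 clock=time.monotonic):
        self.fetch_all = fetch_all    # callable() -> list of series titles (expensive)
        self.search = search          # callable(prefix) -> list of titles (filtered)
        self.max_age_s = max_age_s    # how old the offline cache may get
        self.min_chars = min_chars    # do not query until this many letters are typed
        self.clock = clock
        self._cache = []
        self._fetched_at = None

    def refresh_if_stale(self):
        """Refetch the full list only if the cache is missing or too old."""
        now = self.clock()
        if self._fetched_at is None or now - self._fetched_at >= self.max_age_s:
            self._cache = list(self.fetch_all())
            self._fetched_at = now

    def suggest(self, typed):
        """Return matching titles for the text typed so far."""
        if len(typed) < self.min_chars:
            return []
        try:
            return list(self.search(typed))
        except OSError:
            # Offline: fall back to whatever the last full fetch returned.
            needle = typed.lower()
            return [t for t in self._cache if t.lower().startswith(needle)]
```

Here `refresh_if_stale()` would be called from a periodic timer and `suggest()` from the series entry field, so the full listing is fetched at most once per `max_age_s`.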

@Alfro
Contributor

Alfro commented Nov 15, 2018

Hmmm, I can see how this can be an issue. I guess you don't really need to change the metadata for scheduled recordings? So it may make sense to allow configuring the frequency of these calls (or make them stop altogether).

@smarquard

There is #547 to use the more efficient json endpoint.

The use-case for selecting a series for us is only for ad-hoc recordings, or when you want to ingest a recorded event into a different series than it was scheduled for (not very common, but sometimes helpful).

I think on startup, once an hour or once a day would probably be fine.

@ppettit
Collaborator

ppettit commented Nov 21, 2018

@Alfro yes, we have no need for the series data at all, as everything is scheduled and nobody can even get to the UI to use it.

I just stopped the series stuff completely on our machines. This is the result on our admin node:
[screenshot: Grafana dashboard for Opencast, 2018-11-21]

This is a 6x2GHz server, so a significant reduction in CPU usage! We have ~100 Galicasters.

I guess we really need to have a way of running jobs at arbitrary intervals rather than just on the long/short timers in order to be able to do "once an hour" etc. and/or a way of guaranteeing that a job can get run once at startup.
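
One way this could look, as a rough sketch rather than a proposal for the actual Galicaster code: a small registry whose `tick()` is driven by whichever periodic timer already exists, and which runs each job only when its own interval has elapsed. The class and method names are hypothetical.

```python
import time


class IntervalJobs:
    """Hypothetical registry that runs callbacks at their own intervals."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self._jobs = []  # each entry: [interval_s, next_due, callback]

    def add(self, interval_s, callback, run_at_startup=True):
        """Register a job; with run_at_startup it fires on the first tick."""
        now = self.clock()
        next_due = now if run_at_startup else now + interval_s
        self._jobs.append([interval_s, next_due, callback])

    def tick(self):
        """Call from an existing periodic timer; runs whatever is due."""
        now = self.clock()
        for job in self._jobs:
            interval_s, next_due, callback = job
            if now >= next_due:
                callback()
                job[1] = now + interval_s  # schedule the next run
```

With something like this, "once an hour" would be `jobs.add(3600, update_series)` (where `update_series` stands in for the real polling function), and the granularity is bounded by how often `tick()` is called.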

I didn't try the code in #547, so I'm not sure what difference that makes.

@Alfro
Contributor

Alfro commented Nov 26, 2018

Thanks for the info @ppettit!
That looks like a serious improvement. I'll look into adding #547 to Galicaster and adding a configuration value to the series endpoint to disable it/change the frequency.

4 participants