Near real-time updates to crawled data #486
Comments
So it looks like solving w3c/reffy#850 will get us to ~1min30 as a baseline for a no-update workflow run, and updating one spec is probably on the order of ~10s, so running a full crawl might be a reasonable approach to this, although we should expect the baseline to grow in proportion to the number of specs being crawled. For the more efficient single-spec update approach, we might be able to use https://github.com/softprops/turnstyle as a way to ensure trigger events are processed sequentially - see also https://github.community/t/race-condition-possible-from-rapidly-executed-concurrent-github-actions/137411/3
In a variety of contexts (CI in particular, but likely also in the context of the data re-used by spec authoring tools), it would be ideal if the content in webref reflected changes in the underlying documents in close to real-time.
One way we could enable this (at least partially) is by having spec repos trigger a webref update for a given spec whenever the main source file of that spec is updated - this could typically be achieved with a webhook installed at the repo level or (more likely, for scaling) at the org level.
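To make the webhook idea a bit more concrete, here is a minimal sketch of the check a receiver could run on a GitHub `push` event before triggering an update. It relies on the `ref` and `commits[].added` / `commits[].modified` fields of the push payload; the function names, the `index.bs` file name and the `main` default branch are assumptions for illustration, not anything webref does today.

```python
from typing import Any


def touched_files(push_event: dict[str, Any]) -> set[str]:
    """Collect the paths added or modified by the commits in a push payload."""
    files: set[str] = set()
    for commit in push_event.get("commits", []):
        for key in ("added", "modified"):
            files.update(commit.get(key, []))
    return files


def needs_spec_update(
    push_event: dict[str, Any],
    source_file: str,               # hypothetical: main source file of the spec
    default_branch: str = "main",   # assumption: the spec is published from this branch
) -> bool:
    """Return True if the push changed the spec's main source file on the default branch."""
    if push_event.get("ref") != f"refs/heads/{default_branch}":
        return False
    return source_file in touched_files(push_event)
```

A receiver would call `needs_spec_update(payload, "index.bs")` and only then kick off the single-spec crawl, so that pushes touching only tests or tooling do not trigger anything.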
One issue is that if several updates are processed at the same time, they would likely trigger an error when pushing the results, since each run would be pushing from a checkout that is no longer up to date; this could be avoided either by changing the timing of how checkouts and crawls are organized, or by doing a full crawl (with HTTP caching optimizations to reduce the time / network impact).
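As one possible reading of the "different timing" option, a job could rebase onto the latest remote state right before pushing and retry if another job won the race. The sketch below assumes updates to different specs touch different files, so the rebase normally applies cleanly; `push_with_retry`, the `origin` remote and the `main` branch are illustrative assumptions, and the `run` parameter only exists so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_git(args: Sequence[str]) -> int:
    """Run a git command in the current directory and return its exit code."""
    return subprocess.run(["git", *args]).returncode


def push_with_retry(
    branch: str = "main",   # assumption: results are committed to this branch
    attempts: int = 3,
    run: Runner = run_git,
) -> bool:
    """Rebase local commits onto the remote branch, then push; retry on rejection."""
    for _ in range(attempts):
        # Pick up commits pushed by concurrent jobs since our checkout.
        if run(["pull", "--rebase", "origin", branch]) != 0:
            # A conflict would need attention (e.g. `git rebase --abort`); give up.
            return False
        if run(["push", "origin", f"HEAD:{branch}"]) == 0:
            return True
        # Push rejected: someone else pushed in between, so loop and rebase again.
    return False
```

This does not replace serializing the workflow runs; it only narrows the window in which two concurrent runs can collide.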