Add ability to find events by slide text & captions in search #1189
base: master
Commits on Aug 12, 2024
Add `event_texts` and `event_texts_queue` to DB
These tables will hold texts of events, extracted from subtitles and slide texts, which will be searchable later. The queue is used for fetching all those text assets from Opencast.
Commit b1ba905
Add `other_hosts` config to `[opencast]` section
This is useful for specifying other trusted hosts that Tobira may send the sync login data to.
Commit 68dbaa0
Commit 06d3c66
Add function for fetching text assets from OC (`sync texts fetch`)
This will be part of the worker and is able to deal with a variety of error cases. Figuring all this out took quite some time. I decided now that ignoring assets for which Opencast returns something unexpected is fine most of the time. Admins will be able to easily requeue these failed events. This can also deal with network errors or similar indications that OC is not available at the moment, in which case it retries with an exponential backoff.
Commit e860481
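The error handling described in this commit message can be sketched roughly as follows. This is only an illustration under assumptions: `Backoff`, `FetchOutcome`, and `fetch_with_backoff` are hypothetical names, the doubling factor and the cap are made up, and the actual worker may be structured quite differently.

```rust
use std::time::Duration;

/// Hypothetical backoff policy; names and values are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct Backoff {
    pub initial: Duration,
    pub max: Duration,
}

impl Backoff {
    /// Delay before retry number `attempt` (0-based): `initial * 2^attempt`,
    /// capped at `max`. Overflow is treated as "use the cap".
    pub fn delay(&self, attempt: u32) -> Duration {
        let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
        self.initial
            .checked_mul(factor)
            .map_or(self.max, |d| d.min(self.max))
    }
}

/// Outcome of a single (hypothetical) fetch attempt.
pub enum FetchOutcome<T> {
    /// The asset was fetched successfully.
    Done(T),
    /// Opencast answered with something unexpected: ignore this asset.
    Skip,
    /// Network error or similar: Opencast seems unavailable, retry later.
    Unavailable,
}

/// Calls `attempt_fetch` up to `max_attempts` times, sleeping with an
/// exponentially growing delay between attempts while the server looks
/// unavailable. Returns `None` if the asset is skipped or never fetched.
pub fn fetch_with_backoff<T>(
    backoff: Backoff,
    max_attempts: u32,
    mut attempt_fetch: impl FnMut() -> FetchOutcome<T>,
    mut sleep: impl FnMut(Duration),
) -> Option<T> {
    for attempt in 0..max_attempts {
        match attempt_fetch() {
            FetchOutcome::Done(value) => return Some(value),
            FetchOutcome::Skip => return None,
            // Only wait if there is another attempt left.
            FetchOutcome::Unavailable if attempt + 1 < max_attempts => {
                sleep(backoff.delay(attempt))
            }
            FetchOutcome::Unavailable => {}
        }
    }
    None
}
```

Passing the sleep function as a parameter keeps the sketch testable without real waiting; in a worker one would pass `std::thread::sleep` or an async equivalent.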
Commit 8518d9a
Commit 8cb63db
What... like the explicit color choice is there to override terminal detection. So why...?!
Commit 548b432
Remove outdated texts after fetching new ones
This was forgotten before: maybe some assets don't exist anymore after an event was updated. Those entries shouldn't persist in the event_texts table. By deleting all entries beforehand, we can also easily use a bulk insert now (since we don't require `on conflict`). I extracted some logic into a helper function to deduplicate code. I tested the users upsert function after this change.
Commit a5a138d
Add `queue` and `dequeue` subcommands for `sync texts`
This allows you to queue or dequeue a specific set of events. In particular, `queue --missing` is very relevant, as Tobira sometimes gives up on some events after too many failures.
Commit 6ca156c
Make download concurrency configurable and change default to 8
I just tested this with our 12 core test Opencast (where Java serves the files):
 2: ~125% CPU, ~1.1 MiB/s down => 3m 2s
 4: ~230% CPU, ~2.0 MiB/s down => 1m 42s
 8: ~380% CPU, ~3.5 MiB/s down => 1m
16: ~600% CPU, ~5.5 MiB/s down => 42s
32: CPU and downlink wildly varying => 37s
Commit 84fc10a
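To illustrate what a configurable download concurrency means in practice, here is a small sketch that processes a list of items with at most `concurrency` workers at a time. It uses plain scoped threads from the standard library purely for illustration; this is an assumption and not how Tobira implements it, and `process_concurrently` is a hypothetical name.

```rust
use std::sync::Mutex;
use std::thread;

/// Runs `work` on every item using at most `concurrency` worker threads.
/// Results are returned in the order of the input items.
pub fn process_concurrently<T: Send, R: Send>(
    items: Vec<T>,
    concurrency: usize,
    work: impl Fn(T) -> R + Sync,
) -> Vec<R> {
    // Shared queue of (index, item) pairs that the workers pull from.
    let queue = Mutex::new(items.into_iter().enumerate());
    let results = Mutex::new(Vec::new());

    thread::scope(|s| {
        for _ in 0..concurrency.max(1) {
            s.spawn(|| loop {
                // Take the next item; the lock is released at the end of
                // this statement, before the actual work starts.
                let next = queue.lock().unwrap().next();
                let Some((idx, item)) = next else { break };
                let out = work(item);
                results.lock().unwrap().push((idx, out));
            });
        }
    });

    // Restore input order, since workers finish in arbitrary order.
    let mut results = results.into_inner().unwrap();
    results.sort_by_key(|(idx, _)| *idx);
    results.into_iter().map(|(_, r)| r).collect()
}
```

With such a helper, the numbers above correspond to calling it with `concurrency` set to 2, 4, 8, 16, or 32 and a `work` closure that downloads one asset.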
Commit cbb9fc7
Commits on Sep 2, 2024
Commit 9fd2c4a
Commit 8e2b031
Commit d1a1d69
Commit 378c402
Commit 30a840f
The main change is that texts with the same span are concatenated to only be one entry in the index. This doesn't reduce the size of the `texts` field in Meili, but that of the timespan index. This optimization is mostly there for slide texts, not for captions. But this commit also moves the build process into `FromSql` to avoid a bunch of useless allocations. Ideally one would also avoid all the intermediate `String` allocations, but that's not easily possible right now.
Commit 98f45ea
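The concatenation of texts with the same span could look roughly like the sketch below. `TimedText`, its fields, and `merge_same_span` are hypothetical names, and joining with a newline is an assumption; the real code builds its data inside `FromSql` and is not shown here.

```rust
use std::collections::BTreeMap;

/// A text with its timespan in milliseconds. Field names are hypothetical.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TimedText {
    pub start_ms: u64,
    pub end_ms: u64,
    pub text: String,
}

/// Joins all texts sharing the same `(start_ms, end_ms)` span into a single
/// entry, separated by newlines. The output is sorted by span.
pub fn merge_same_span(texts: Vec<TimedText>) -> Vec<TimedText> {
    let mut by_span: BTreeMap<(u64, u64), String> = BTreeMap::new();
    for t in texts {
        let joined = by_span.entry((t.start_ms, t.end_ms)).or_default();
        if !joined.is_empty() {
            joined.push('\n');
        }
        joined.push_str(&t.text);
    }
    by_span
        .into_iter()
        .map(|((start_ms, end_ms), text)| TimedText { start_ms, end_ms, text })
        .collect()
}
```

Slides typically yield several texts for one segment, so after merging there is one span per segment instead of one per text, which is where the savings in a timespan index would come from.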
Commit 6f2d8e9
Fix timeline width in search events
It was only as wide as the metadata made the container, which is not great.
Commit e7c13c6
Commit 0303e1f
Add more aggressive text cleanup (require words >= 1 bytes long)
This is still not very aggressive... I first wanted to use 2 as threshold, but ... looking at all chars encodable in 2 byte UTF-8... I cannot be sure that it doesn't make sense to search for one individually. Pi came to mind. We can always make this more aggressive later.
Commit 82fbe60
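A byte-length based cleanup of this kind can be sketched as below. The function name `drop_short_words` is hypothetical and the threshold is left as a parameter, since this sketch does not try to reproduce the exact rule used in the commit. Note that `str::len` in Rust counts UTF-8 bytes, not characters, so "π" has length 2.

```rust
/// Keeps only whitespace-separated words that are at least `min_bytes`
/// bytes long in UTF-8 and joins them with single spaces.
pub fn drop_short_words(text: &str, min_bytes: usize) -> String {
    text.split_whitespace()
        // `len()` is the length in bytes, so "π" counts as 2.
        .filter(|word| word.len() >= min_bytes)
        .collect::<Vec<_>>()
        .join(" ")
}
```

With a threshold of 2, a single ASCII letter is dropped while "π" survives; with a threshold of 3, "π" would be dropped too, which is the kind of trade-off the message is weighing.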
Ignore texts with timespans less than 100ms
Mostly ignoring broken ranges
Commit 03d089b
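As a sketch, such a filter only needs to compare the span length against the 100 ms threshold while also rejecting ranges whose end lies before their start. The names `MIN_SPAN_MS` and `is_usable_span` are hypothetical, and representing times as milliseconds in `u64` is an assumption.

```rust
/// Minimum span length in milliseconds; shorter spans are ignored.
pub const MIN_SPAN_MS: u64 = 100;

/// Returns `true` if the span is at least `MIN_SPAN_MS` long. Broken ranges
/// where `end_ms < start_ms` make `checked_sub` return `None` and are
/// therefore rejected as well.
pub fn is_usable_span(start_ms: u64, end_ms: u64) -> bool {
    end_ms
        .checked_sub(start_ms)
        .map_or(false, |len| len >= MIN_SPAN_MS)
}
```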
Improve `textMatches` in search API by joining multiple matches
This rewrites the logic that creates the `textMatches` array for the search API. Before, one Meili match was emitted as one text match, but this had several problems. Most importantly, with two words in the query, if those words would appear in a text right next to one another, Meili would still generate two matches. They would have the same timespan and Tobira would just show two divs on top of each other, only one of which would be visible. Now, for each individual text, we join all matches (with a limit) and return only one `TextMatch`, but potentially with multiple highlight ranges.
Commit ed8f608
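The joining step described above can be illustrated with the following sketch. The shape of `RawMatch`, the fields of this `TextMatch`, and the `max_highlights` parameter are assumptions made for the example; they do not mirror the actual Meili response format or Tobira's API types.

```rust
use std::collections::BTreeMap;

/// One raw match as reported by the search engine: the index of the text it
/// occurred in and the byte range of the matched word. Hypothetical shape.
#[derive(Debug, Clone, Copy)]
pub struct RawMatch {
    pub text_index: usize,
    pub start: usize,
    pub len: usize,
}

/// One joined match per text, with all highlight ranges as `(start, end)`.
#[derive(Debug, PartialEq, Eq)]
pub struct TextMatch {
    pub text_index: usize,
    pub highlights: Vec<(usize, usize)>,
}

/// Groups raw matches by text and emits a single `TextMatch` per text,
/// keeping at most `max_highlights` sorted, deduplicated ranges.
pub fn join_matches(raw: &[RawMatch], max_highlights: usize) -> Vec<TextMatch> {
    let mut by_text: BTreeMap<usize, Vec<(usize, usize)>> = BTreeMap::new();
    for m in raw {
        by_text
            .entry(m.text_index)
            .or_default()
            .push((m.start, m.start + m.len));
    }
    by_text
        .into_iter()
        .map(|(text_index, mut highlights)| {
            highlights.sort_unstable();
            highlights.dedup();
            highlights.truncate(max_highlights);
            TextMatch { text_index, highlights }
        })
        .collect()
}
```

Two adjacent query words in the same text then end up as two highlight ranges inside one `TextMatch`, so only one element per timespan needs to be rendered.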
Commit f2c5b1d
Make matched text in tooltip smaller and limit to 2 lines
Still enough context and makes it a bit less "in your face".
Commit c24d2f5
Commit 9220890
The previous commits changed a lot -> a rebuild is necessary.
Commit 21ef9bf
Commits on Sep 3, 2024
These failed due to the passage of time. We incorrectly used a locator with "2 years ago" in it, which obviously fails at some point, given that the dates stay constant.
Commit 5b699b1
Fix CI script for test deployments
Turns out `actions/upload-artifact` introduced a breaking change where hidden files are not included anymore by default. This also applies to explicitly listed hidden files, which does not make any sense to me. But oh well... actions/upload-artifact#602
Commit 0cdc964
Store text type (caption, slide text) with all searchable texts
We want to distinguish different text matches in the frontend, so we need to store the type at every point.
Commit 0b5ff25
Adjust text matches timeline design
This is now closer to the old video portal and distinguishes between the two different kinds of matches.
Commit b428868
Add segment preview images on text match hover
Also seeing the preview image for the text matches makes it easier to find a relevant spot in the video. The segment preview images are not loaded in the initial GQL request, as that would increase the response size quite a bit and is not necessary in most cases. We simply load them on hover, i.e. when they are actually needed.
Commit b80be67
Include text assets in DB dump
This increases the size of the dump from roughly 2.7MB to 11MB. But we want the dump to be immediately fully useful and reflect real world data, so we accept this increase.
Commit 7ba81bf
Change `db load-dump` to cache downloaded dumps
No point in downloading it again and again. This is simple: no invalidation is done and the dumps are keyed by the version.
Commit e7ba169
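A cache without invalidation that is keyed by version can be sketched as follows. The file naming scheme, the function names `cached_dump_path` and `load_dump`, and the idea of passing the download as a closure are all assumptions made for this example, not the actual `db load-dump` implementation.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Path of the cached dump for a given version (hypothetical naming scheme).
pub fn cached_dump_path(cache_dir: &Path, version: &str) -> PathBuf {
    cache_dir.join(format!("db-dump-{version}.bin"))
}

/// Returns the dump for `version`, calling `download` only if no cached file
/// exists yet. Nothing is ever invalidated: a file that is present is trusted.
pub fn load_dump(
    cache_dir: &Path,
    version: &str,
    download: impl FnOnce() -> io::Result<Vec<u8>>,
) -> io::Result<Vec<u8>> {
    let path = cached_dump_path(cache_dir, version);
    if path.exists() {
        return fs::read(&path);
    }
    let bytes = download()?;
    fs::create_dir_all(cache_dir)?;
    fs::write(&path, &bytes)?;
    Ok(bytes)
}
```

Keying by version means a new release simply misses the cache and downloads its own dump, which is why no explicit invalidation is needed in this scheme.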
Commit e5c1080
Commits on Sep 5, 2024
Adjust text match tooltip sizing
Since the area for showing the text is made a bit smaller, we also use less context in the backend.
Commit 4ae7c7d
Commit cda9393
Commits on Sep 6, 2024
Fix z-index issue by removing z-index values
This fix was easier than anticipated (once I figured this out anyway). The problem is: giving the `WithTooltip` a `z-index` means that the parent element (containing both the trigger element and the tooltip) creates a new stacking context. And a stacking context and its children behave as one unit with regard to other stacking contexts. So even when setting the z-index of the tooltip itself to 100, two timelines are two different stacking contexts, so the 100 of one tooltip is not rendered in front of the 4 of the trigger elements of another timeline. So there were two options:
- Either introduce another div in the floating components that would have the trigger element as its child, but not the tooltip, and that could get a z-index then. But that would have required changing appkit.
- Remove the z-index from `WithTooltip` to avoid creating stacking contexts.
First I thought the only way to do this is to remove the big clickable link area of the search results. And we might still do that, as it often brings annoying problems. But for this particular problem, the solution turned out easier than I thought.
Commit 73e2f5f
Change icon for slide text match tooltip to `letter-text`
Unfortunately, this required adding a new package. `react-icons` has a super old version of lucide icons, even in its newest version. Since almost all of our icons are from Lucide anyway, it makes sense to just use their packages directly.
Commit e351418