Add ability to find events by slide text & captions in search #1189

Open
wants to merge 39 commits into
base: master

Commits on Aug 12, 2024

  1. Add event_texts and event_texts_queue to DB

    These tables will hold texts of events, extracted from subtitles and
    slide texts, which will be searchable later. The queue is used for
    fetching all those text assets from Opencast.
    LukasKalbertodt committed Aug 12, 2024
    b1ba905
  2. Add other_hosts config to [opencast] section

    This is useful for specifying other trusted hosts to which Tobira may
    send the sync login data.
    LukasKalbertodt committed Aug 12, 2024
    68dbaa0
  3. 06d3c66
  4. Add function for fetching text assets from OC (sync texts fetch)

    This will be part of the worker and is able to deal with a variety of
    error cases. Figuring all this out took quite some time. I decided now
    that ignoring assets for which Opencast returns something unexpected is
    fine most of the time. Admins will be able to easily requeue these
    failed events.
    
    This can also deal with network errors or similar indications that OC
    is not available at the moment, in which case it retries with an
    exponential backoff.
    LukasKalbertodt committed Aug 12, 2024
    e860481
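The retry behaviour described in the commit above could look roughly like the following. This is a minimal sketch, not Tobira's actual code: `FetchOutcome`, `backoff_delay`, `retry_after`, the 1 second base delay, and the 5 minute cap are all assumptions made for illustration.

```rust
use std::time::Duration;

/// Hypothetical outcome of trying to fetch one text asset from Opencast.
#[derive(Debug, PartialEq)]
enum FetchOutcome {
    /// The asset was downloaded successfully.
    Text(String),
    /// Opencast answered with something unexpected: ignore this asset.
    Skip,
    /// Network error or similar: Opencast seems unavailable right now.
    Unavailable,
}

/// Delay before the next attempt: `base * 2^attempt`, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    // `checked_mul` returns `None` on overflow, in which case we use the cap.
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}

/// Returns how long to wait before retrying, or `None` if no retry is needed.
/// The base and cap values here are assumed, not taken from the PR.
fn retry_after(outcome: &FetchOutcome, attempt: u32) -> Option<Duration> {
    match outcome {
        FetchOutcome::Unavailable => Some(backoff_delay(
            attempt,
            Duration::from_secs(1),
            Duration::from_secs(300),
        )),
        FetchOutcome::Text(_) | FetchOutcome::Skip => None,
    }
}
```

Only the "Opencast is unavailable" case triggers a wait here; unexpected responses are skipped without a retry, matching the decision described in the commit message.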
  5. 8518d9a
  6. 8cb63db
  7. Fix terminal color logic

    What... like the explicit color choice is there to override terminal
    detection. So why...?!
    LukasKalbertodt committed Aug 12, 2024
    548b432
  8. Remove outdated texts after fetching new ones

    This was forgotten before: maybe some assets don't exist anymore after
    an event was updated. Those entries shouldn't persist in the event_texts
    table. By deleting all entries beforehand, we can also easily use a bulk
    insert now (since we don't require `on conflict`). I extracted some
    logic into a helper function to deduplicate code. I tested the users
    upsert function after this change.
    LukasKalbertodt committed Aug 12, 2024
    a5a138d
  9. Add queue and dequeue subcommands for sync texts

    This allows you to queue or dequeue a specific set of events. In
    particular, `queue --missing` is very relevant, as Tobira sometimes
    gives up on some events after too many failures.
    LukasKalbertodt committed Aug 12, 2024
    6ca156c
  10. Make download concurrency configurable and change default to 8

    I just tested this with our 12-core test Opencast (where Java serves
    the files). The first column is the download concurrency:
    
     2: ~125% CPU, ~1.1 MiB/s down      => 3m 2s
     4: ~230% CPU, ~2.0 MiB/s down      => 1m 42s
     8: ~380% CPU, ~3.5 MiB/s down      => 1m
    16: ~600% CPU, ~5.5 MiB/s down      => 42s
    32: CPU and downlink wildly varying => 37s
    LukasKalbertodt committed Aug 12, 2024
    84fc10a
  11. cbb9fc7

Commits on Sep 2, 2024

  1. 9fd2c4a
  2. 8e2b031
  3. d1a1d69
  4. 378c402
  5. 30a840f
  6. Optimize text search index

    The main change is that texts with the same span are concatenated to
    only be one entry in the index. This doesn't reduce the size of the
    `texts` field in Meili, but that of the timespan index. This
    optimization is mostly there for slide texts, not for captions.
    
    But this commit also moves the build process into `FromSql` to avoid
    a bunch of useless allocations. Ideally one would also avoid all the
    intermediate `String` allocations, but that's not easily possible right
    now.
    LukasKalbertodt committed Sep 2, 2024
    98f45ea
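The span-based concatenation from the commit above can be illustrated as follows. This is a sketch under assumptions: the `Span` representation (start and end in milliseconds), the newline separator, and the function name `merge_by_span` are hypothetical and not taken from the PR.

```rust
use std::collections::BTreeMap;

/// A timespan as (start, end) in milliseconds (assumed representation).
type Span = (u64, u64);

/// Joins all texts that share the same span into a single entry, so that
/// each span appears only once in the resulting list.
fn merge_by_span(texts: &[(Span, &str)]) -> Vec<(Span, String)> {
    let mut merged: BTreeMap<Span, String> = BTreeMap::new();
    for &(span, text) in texts {
        let entry = merged.entry(span).or_default();
        // Separate concatenated texts with a newline (assumed separator).
        if !entry.is_empty() {
            entry.push('\n');
        }
        entry.push_str(text);
    }
    // `BTreeMap` iterates in key order, so the result is sorted by span.
    merged.into_iter().collect()
}
```

For slide texts, where many text fragments share one slide's timespan, this reduces the number of span entries considerably, while the total amount of text stays the same.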
  7. 6f2d8e9
  8. Fix timeline width in search events

    It was only as wide as the metadata made the container, which is not
    great.
    LukasKalbertodt committed Sep 2, 2024
    e7c13c6
  9. 0303e1f
  10. Add more aggressive text cleanup (require words >= 1 bytes long)

    This is still not very aggressive... I first wanted to use 2 as the
    threshold, but ... looking at all chars encodable in 2-byte UTF-8... I
    cannot be sure that it doesn't make sense to search for one
    individually. Pi came to mind. We can always make this more aggressive
    later.
    LukasKalbertodt committed Sep 2, 2024
    82fbe60
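The concern about 2-byte characters can be made concrete with a small sketch. The function `clean_text`, its `min_bytes` parameter, and the exact comparison are assumptions for illustration; the commit's actual filter is not shown here.

```rust
/// Keeps only whitespace-separated words whose UTF-8 encoding is at least
/// `min_bytes` long. Hypothetical helper, not the PR's implementation.
fn clean_text(text: &str, min_bytes: usize) -> String {
    text.split_whitespace()
        // `str::len` counts bytes, not characters.
        .filter(|word| word.len() >= min_bytes)
        .collect::<Vec<_>>()
        .join(" ")
}
```

Since `"π".len()` is 2, any filter that drops words of 2 bytes or fewer would also drop a lone "π", which is the kind of searchable single character the commit message is worried about.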
  11. Ignore texts with timespans less than 100ms

    This mostly ignores broken ranges.
    LukasKalbertodt committed Sep 2, 2024
    03d089b
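A check along the lines of the commit above might look like this. It is a sketch only: the function name `is_usable_span` and the millisecond-based `u64` arguments are assumptions.

```rust
/// Returns `true` if the span is at least 100 ms long. Spans where the end
/// lies before the start (broken ranges) are rejected as well.
fn is_usable_span(start_ms: u64, end_ms: u64) -> bool {
    // `checked_sub` yields `None` if `end_ms < start_ms`.
    end_ms.checked_sub(start_ms).map_or(false, |len| len >= 100)
}
```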
  12. Improve textMatches in search API by joining multiple matches

    This rewrites the logic that creates the `textMatches` array for the
    search API. Before, one Meili match was emitted as one text match, but
    this had several problems. Most importantly, with two words in the
    query, if those words appeared in a text right next to one another,
    Meili would still generate two matches. They would have the same
    timespan and Tobira would just show two divs on top of each other, only
    one of which would be visible.
    
    Now, for each individual text, we join all matches (with a limit) and
    return only one `TextMatch`, but potentially with multiple highlight
    ranges.
    LukasKalbertodt committed Sep 2, 2024
    ed8f608
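The joining step described in the commit above can be sketched as follows. `RawMatch`, the shape of `TextMatch`, and the `max_highlights` limit are hypothetical stand-ins; the real `TextMatch` type in the search API likely carries more fields (such as the timespan and text).

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// One match as reported by the search engine (assumed shape): which text
/// it belongs to and which byte range of that text is highlighted.
struct RawMatch {
    text_index: usize,
    highlight: Range<usize>,
}

/// One joined match per text, with potentially several highlight ranges.
#[derive(Debug, PartialEq)]
struct TextMatch {
    text_index: usize,
    highlights: Vec<Range<usize>>,
}

/// Groups raw matches by text and keeps at most `max_highlights` per text.
fn join_matches(raw: &[RawMatch], max_highlights: usize) -> Vec<TextMatch> {
    let mut by_text: BTreeMap<usize, Vec<Range<usize>>> = BTreeMap::new();
    for m in raw {
        let highlights = by_text.entry(m.text_index).or_default();
        if highlights.len() < max_highlights {
            highlights.push(m.highlight.clone());
        }
    }
    by_text
        .into_iter()
        .map(|(text_index, highlights)| TextMatch { text_index, highlights })
        .collect()
}
```

With this shape, two query words matching the same text produce one entry with two highlight ranges instead of two overlapping entries with the same timespan.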
  13. f2c5b1d
  14. Make matched text in tooltip smaller and limit to 2 lines

    Still enough context and makes it a bit less "in your face".
    LukasKalbertodt committed Sep 2, 2024
    c24d2f5
  15. 9220890
  16. Bump search index version

    The previous commits changed a lot -> a rebuild is necessary.
    LukasKalbertodt committed Sep 2, 2024
    21ef9bf

Commits on Sep 3, 2024

  1. Fix playwright tests

    These failed due to the passage of time. We incorrectly used a locator
    with "2 years ago" in it, which obviously fails at some point, given
    that the dates stay constant.
    LukasKalbertodt committed Sep 3, 2024
    5b699b1
  2. Fix CI script for test deployments

    Turns out `actions/upload-artifact` introduced a breaking change where
    hidden files are not included anymore by default. This also applies to
    explicitly listed hidden files, which does not make any sense to me. But
    oh well...
    
    actions/upload-artifact#602
    LukasKalbertodt committed Sep 3, 2024
    0cdc964
  3. Store text type (caption, slide text) with all searchable texts

    We want to distinguish different text matches in the frontend, so we
    need to store the type at every point.
    LukasKalbertodt committed Sep 3, 2024
    0b5ff25
  4. Adjust text matches timeline design

    This is now closer to the old video portal and distinguishes between the
    two different kinds of matches.
    LukasKalbertodt committed Sep 3, 2024
    b428868
  5. Add segment preview images on text match hover

    Also seeing the preview image for the text matches makes it easier to
    find a relevant spot in the video.
    
    The segment preview images are not loaded in the initial GQL request as
    it would increase response size quite a bit and is not necessary in most
    cases. We simply load them on hover, i.e. when they are actually needed.
    LukasKalbertodt committed Sep 3, 2024
    b80be67
  6. Include text assets in DB dump

    This increases the size of the dump from roughly 2.7MB to 11MB. But we
    want the dump to be immediately fully useful and reflect real world
    data, so we accept this increase.
    LukasKalbertodt committed Sep 3, 2024
    7ba81bf
  7. Change db load-dump to cache downloaded dumps

    No point in downloading it again and again. This is simple, no
    invalidation is done and the dumps are keyed by the version.
    LukasKalbertodt committed Sep 3, 2024
    e7ba169
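A version-keyed cache without invalidation, as described in the commit above, could be sketched like this. The file naming scheme, the function names, and the `download` closure are assumptions for illustration and not taken from the PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Path of the cached dump for a given version (hypothetical naming scheme).
fn cached_dump_path(cache_dir: &Path, version: &str) -> PathBuf {
    cache_dir.join(format!("db-dump-{}.sql", version))
}

/// Returns the dump for `version`, calling `download` only if no cached
/// file exists yet. There is no invalidation: a cached file is always used.
fn load_dump(
    cache_dir: &Path,
    version: &str,
    download: impl FnOnce() -> io::Result<Vec<u8>>,
) -> io::Result<Vec<u8>> {
    let path = cached_dump_path(cache_dir, version);
    if path.exists() {
        return fs::read(&path);
    }

    let data = download()?;
    fs::create_dir_all(cache_dir)?;
    fs::write(&path, &data)?;
    Ok(data)
}
```

Keying by version means a new dump version simply lands in a new file, which is why skipping invalidation stays simple.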
  8. e5c1080

Commits on Sep 5, 2024

  1. Adjust text match tooltip sizing

    Since the area for showing the text is made a bit smaller, we also use
    less context in the backend.
    LukasKalbertodt committed Sep 5, 2024
    4ae7c7d
  2. cda9393

Commits on Sep 6, 2024

  1. Fix z-index issue by removing z-index values

    This fix was easier than anticipated (once I figured this out anyway).
    The problem is: giving the `WithTooltip` a `z-index` means that the
    parent element (containing both the trigger element and the tooltip)
    creates a new stacking context. And a stacking context and its children
    behave as one unit with regard to other stacking contexts. Two timelines
    are two different stacking contexts, so even with the z-index of the
    tooltip itself set to 100, the tooltip of one timeline is not rendered
    in front of the trigger elements (z-index 4) of another timeline. So
    there were two options:
    - Either introduce another div in the floating components that would
      have the trigger element as its child, but not the tooltip, and that
      could get a z-index then. But that would have required changing
      appkit.
    - Remove the z-index from `WithTooltip` to avoid creating stacking
      contexts. At first I thought the only way to do this was to remove
      the big clickable link area of the search results. And we might still
      do that, as it often brings annoying problems. But for this
      particular problem, the solution turned out easier than I thought.
    LukasKalbertodt committed Sep 6, 2024
    73e2f5f
  2. Change icon for slide text match tooltip to letter-text

    Unfortunately, this required adding a new package. `react-icons` has
    a super old version of Lucide icons, even in its newest version. Since
    almost all of our icons are from Lucide anyway, it makes sense to just
    use their packages directly.
    LukasKalbertodt committed Sep 6, 2024
    e351418