(feat) configurable timestamp options for audio-to-text #3207

eliteprox · 2024-10-15T17:11:03Z

What does this pull request do? Explain your changes. (required)

This change adds the return_timestamps parameter to the audio-to-text pipeline, so end users can configure an inference job to return word-level timestamps, sentence-level timestamps, or no timestamps at all.

Supported values for return_timestamps are false and word. When the parameter is omitted, the pipeline keeps its existing behavior of sentence-level timestamps, which avoids breaking existing applications.
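As a rough illustration of how the three modes map onto the request, here is a minimal shell sketch. The helper name timestamp_form_arg and the mode labels sentence, word, and none are hypothetical and exist only for this example. The sketch relies solely on what this description states: word and false are accepted values, and omitting the field keeps the sentence-level default.

```shell
# Hypothetical helper (not part of go-livepeer or ai-worker): prints the curl
# form argument for a given timestamp mode. The mode labels are invented for
# this sketch; only the values "word" and "false" come from the PR description.
timestamp_form_arg() {
    case "$1" in
        sentence) : ;;  # omit the field so the pipeline default applies
        word)     printf '%s\n' '-F return_timestamps=word' ;;
        none)     printf '%s\n' '-F return_timestamps=false' ;;
        *)        printf 'unknown mode: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Usage (left unquoted on purpose so the output splits into two arguments):
#   curl -X POST "https://<GATEWAY_IP>/audio-to-text" \
#       -F model_id=openai/whisper-large-v3 \
#       -F audio=@<PATH_TO_FILE> \
#       $(timestamp_form_arg word)
```

Rejecting unknown modes on the client side is a design choice of this sketch, not a statement about how the gateway validates the field.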

Specific updates (required)

This change only updates the go.mod references for ai-worker. See ai-worker#228: (feat) add return_timestamps as configurable in request.

How did you test each of these updates (required)

sentence-level timestamps

Sent a request without the return_timestamps parameter to verify that the inference job still defaults to sentence-level timestamps.
sentence-timestamps.json

curl -X POST "https://<GATEWAY_IP>/audio-to-text" \
    -F model_id=openai/whisper-large-v3 \
    -F audio=@<PATH_TO_FILE>

word-level timestamps

Sent a request with return_timestamps=word to validate that timestamps are returned at word level.
word-timestamps.json

curl -X POST "https://<GATEWAY_IP>/audio-to-text" \
    -F model_id=openai/whisper-large-v3 \
    -F audio=@<PATH_TO_FILE> \
    -F return_timestamps="word"

no timestamps

Sent a request with return_timestamps=false to validate that timestamps are excluded.
no-timestamps.json

curl -X POST "https://<GATEWAY_IP>/audio-to-text" \
    -F model_id=openai/whisper-large-v3 \
    -F audio=@<PATH_TO_FILE> \
    -F return_timestamps="false"

Does this pull request close any open issues?

AI-630

Checklist:

Read the contribution guide
make runs successfully
All tests in ./test.sh pass
README and other documentation updated
Pending changelog updated

codecov · 2024-10-23T17:06:38Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 35.92244%. Comparing base (c41f3c4) to head (9d5130c).
Report is 8 commits behind head on ai-video.

Additional details and impacted files

@@                 Coverage Diff                 @@
##            ai-video       #3207         +/-   ##
===================================================
- Coverage   36.07820%   35.92244%   -0.15576%     
===================================================
  Files            124         124                 
  Lines          34525       34658        +133     
===================================================
- Hits           12456       12450          -6     
- Misses         21381       21520        +139     
  Partials         688         688

see 1 file with indirect coverage changes

Last update 1bc4a6a...9d5130c.

rickstaa

Wait I think I broke something.

rickstaa

I think I broke something.

eliteprox changed the base branch from master to ai-video October 15, 2024 17:16

eliteprox requested a review from rickstaa as a code owner October 15, 2024 17:16

rickstaa force-pushed the ai-video branch from 4a66b22 to 2c50134 Compare October 21, 2024 09:13

eliteprox mentioned this pull request Oct 23, 2024

a2t: update docs to include return_timestamps option livepeer/docs#672

Closed

update go.mod for a2t optional timestamps

e7e633d

eliteprox force-pushed the a2t-optional-timestamps branch from 02c608b to e7e633d Compare October 23, 2024 16:40

update go mod bindings

e8cf7d3

rickstaa force-pushed the a2t-optional-timestamps branch from da087f9 to 3807cdc Compare October 25, 2024 08:38

rickstaa approved these changes Oct 25, 2024

rickstaa requested changes Oct 25, 2024

chore(ai): update ai-worker to new version

9d5130c

This commit updates the ai-worker to the latest version so that users can start using the new `audio-to-text` `return_timestamps` field.

rickstaa force-pushed the a2t-optional-timestamps branch from 3807cdc to 9d5130c Compare October 25, 2024 08:48

rickstaa merged commit 00bcceb into livepeer:ai-video Oct 25, 2024
14 checks passed

rickstaa deleted the a2t-optional-timestamps branch October 25, 2024 09:01
