Diarized Input Malformed Timestamp #38

Open
Esnapp opened this issue Dec 5, 2024 · 1 comment

Esnapp commented Dec 5, 2024

When auto-diarization is enabled, the transcription loaded into PocketBase has a first timestamp of null instead of 0. This causes displayPane to throw an error in an async promise, which keeps the output from being displayed. The first speaker is also null, but that does not seem to throw an error.

Attempt 1 Undiarized

{
  "model": {
    "audio": {
      "ctx": 1500,
      "head": 6,
      "layer": 4,
      "state": 384
    },
    "ftype": 1,
    "mels": 80,
    "multilingual": false,
    "text": {
      "ctx": 448,
      "head": 6,
      "layer": 4,
      "state": 384
    },
    "type": "tiny",
    "vocab": 51864
  },
  "params": {
    "language": "en",
    "model": "/models/ggml-tiny.en.bin",
    "translate": false
  },
  "result": {
    "language": "en"
  },
  "systeminfo": "AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 | ",
  "transcription": [
    {
      "offsets": {
        "from": 0,
        "to": 70
      },
      "text": "",
      "timestamps": {
        "from": "00:00:00,000",
        "to": "00:00:00,070"
      }
    },
    {
      "offsets": {
        "from": 70,
        "to": 250
      },
      "text": " All",
      "timestamps": {
        "from": "00:00:00,070",
        "to": "00:00:00,250"
      }
    },
    {
      "offsets": {
        "from": 250,
        "to": 680
      },
      "text": " right",
      "timestamps": {
        "from": "00:00:00,250",
        "to": "00:00:00,680"
      }
    },
    {
      "offsets": {
        "from": 680,
        "to": 850
      },
      "text": ",",
      "timestamps": {
        "from": "00:00:00,680",
        "to": "00:00:00,850"
      }
    },
    {
      "offsets": {
        "from": 850,
        "to": 1020
      },
      "text": " so",
      "timestamps": {
        "from": "00:00:00,850",
        "to": "00:00:01,020"
      }
    },
    {
      "offsets": {
        "from": 1020,
        "to": 1270
      },
      "text": " I'm",
      "timestamps": {
        "from": "00:00:01,020",
        "to": "00:00:01,300"
      }
    },
    {
      "offsets": {
        "from": 1270,
        "to": 1300
      },
      "text": "'m",
      "timestamps": {
        "from": "00:00:01,270",
        "to": "00:00:01,300"
      }
    },
    {
      "offsets": {
        "from": 1300,
        "to": 2040
      },
      "text": " recording",
      "timestamps": {
        "from": "00:00:01,300",
        "to": "00:00:02,040"
      }
    },
    {
      "offsets": {
        "from": 2040,
        "to": 2380
      },
      "text": " this.",
      "timestamps": {
        "from": "00:00:02,040",
        "to": "00:00:02,680"
      }
    },
    {
      "offsets": {
        "from": 2380,
        "to": 2680
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:02,380",
        "to": "00:00:02,680"
      }
    },
    {
      "offsets": {
        "from": 2680,
        "to": 2980
      },
      "text": " This",
      "timestamps": {
        "from": "00:00:02,680",
        "to": "00:00:02,980"
      }
    },
    {
      "offsets": {
        "from": 2980,
        "to": 3150
      },
      "text": " is",
      "timestamps": {
        "from": "00:00:02,980",
        "to": "00:00:03,150"
      }
    },
    {
      "offsets": {
        "from": 3150,
        "to": 3430
      },
      "text": " just",
      "timestamps": {
        "from": "00:00:03,150",
        "to": "00:00:03,430"
      }
    },
    {
      "offsets": {
        "from": 3430,
        "to": 3500
      },
      "text": " a",
      "timestamps": {
        "from": "00:00:03,430",
        "to": "00:00:03,500"
      }
    },
    {
      "offsets": {
        "from": 3500,
        "to": 3830
      },
      "text": " test.",
      "timestamps": {
        "from": "00:00:03,500",
        "to": "00:00:04,000"
      }
    },
    {
      "offsets": {
        "from": 3830,
        "to": 4000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:03,830",
        "to": "00:00:04,000"
      }
    },
    {
      "offsets": {
        "from": 4000,
        "to": 4160
      },
      "text": " So",
      "timestamps": {
        "from": "00:00:04,000",
        "to": "00:00:04,160"
      }
    },
    {
      "offsets": {
        "from": 4160,
        "to": 4370
      },
      "text": " just",
      "timestamps": {
        "from": "00:00:04,160",
        "to": "00:00:04,370"
      }
    },
    {
      "offsets": {
        "from": 4370,
        "to": 4540
      },
      "text": " say",
      "timestamps": {
        "from": "00:00:04,370",
        "to": "00:00:04,540"
      }
    },
    {
      "offsets": {
        "from": 4540,
        "to": 4860
      },
      "text": " whatever",
      "timestamps": {
        "from": "00:00:04,540",
        "to": "00:00:04,860"
      }
    },
    {
      "offsets": {
        "from": 4860,
        "to": 4980
      },
      "text": " you",
      "timestamps": {
        "from": "00:00:04,860",
        "to": "00:00:04,980"
      }
    },
    {
      "offsets": {
        "from": 4980,
        "to": 5200
      },
      "text": " want.",
      "timestamps": {
        "from": "00:00:04,980",
        "to": "00:00:06,000"
      }
    },
    {
      "offsets": {
        "from": 5200,
        "to": 6000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:05,200",
        "to": "00:00:06,000"
      }
    },
    {
      "offsets": {
        "from": 6000,
        "to": 6920
      },
      "text": " Anything.",
      "timestamps": {
        "from": "00:00:06,000",
        "to": "00:00:07,000"
      }
    },
    {
      "offsets": {
        "from": 6920,
        "to": 7000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:06,920",
        "to": "00:00:07,000"
      }
    },
    {
      "offsets": {
        "from": 7000,
        "to": 8140
      },
      "text": " Okay.",
      "timestamps": {
        "from": "00:00:07,000",
        "to": "00:00:09,000"
      }
    },
    {
      "offsets": {
        "from": 8140,
        "to": 9000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:08,140",
        "to": "00:00:09,000"
      }
    },
    {
      "offsets": {
        "from": 9000,
        "to": 10000
      },
      "text": " Um",
      "timestamps": {
        "from": "00:00:09,000",
        "to": "00:00:10,000"
      }
    },
    {
      "offsets": {
        "from": 10000,
        "to": 10090
      },
      "text": ",",
      "timestamps": {
        "from": "00:00:10,000",
        "to": "00:00:10,090"
      }
    },
    {
      "offsets": {
        "from": 10090,
        "to": 10340
      },
      "text": " today's",
      "timestamps": {
        "from": "00:00:10,090",
        "to": "00:00:10,430"
      }
    },
    {
      "offsets": {
        "from": 10340,
        "to": 10430
      },
      "text": "'s",
      "timestamps": {
        "from": "00:00:10,340",
        "to": "00:00:10,430"
      }
    },
    {
      "offsets": {
        "from": 10430,
        "to": 10480
      },
      "text": " a",
      "timestamps": {
        "from": "00:00:10,430",
        "to": "00:00:10,480"
      }
    },
    {
      "offsets": {
        "from": 10480,
        "to": 10680
      },
      "text": " good",
      "timestamps": {
        "from": "00:00:10,480",
        "to": "00:00:10,680"
      }
    },
    {
      "offsets": {
        "from": 10680,
        "to": 10830
      },
      "text": " day.",
      "timestamps": {
        "from": "00:00:10,680",
        "to": "00:00:11,000"
      }
    },
    {
      "offsets": {
        "from": 10830,
        "to": 11000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:10,830",
        "to": "00:00:11,000"
      }
    },
    {
      "offsets": {
        "from": 11000,
        "to": 11200
      },
      "text": " I",
      "timestamps": {
        "from": "00:00:11,000",
        "to": "00:00:11,200"
      }
    },
    {
      "offsets": {
        "from": 11200,
        "to": 12000
      },
      "text": " hope",
      "timestamps": {
        "from": "00:00:11,200",
        "to": "00:00:12,000"
      }
    },
    {
      "offsets": {
        "from": 12000,
        "to": 12270
      },
      "text": " you",
      "timestamps": {
        "from": "00:00:12,000",
        "to": "00:00:12,270"
      }
    },
    {
      "offsets": {
        "from": 12270,
        "to": 13000
      },
      "text": " probably",
      "timestamps": {
        "from": "00:00:12,270",
        "to": "00:00:13,000"
      }
    },
    {
      "offsets": {
        "from": 13000,
        "to": 13180
      },
      "text": " good",
      "timestamps": {
        "from": "00:00:13,000",
        "to": "00:00:13,180"
      }
    },
    {
      "offsets": {
        "from": 13180,
        "to": 13320
      },
      "text": " day",
      "timestamps": {
        "from": "00:00:13,180",
        "to": "00:00:13,320"
      }
    },
    {
      "offsets": {
        "from": 13320,
        "to": 13510
      },
      "text": " into",
      "timestamps": {
        "from": "00:00:13,320",
        "to": "00:00:13,510"
      }
    },
    {
      "offsets": {
        "from": 13510,
        "to": 13780
      },
      "text": " you.",
      "timestamps": {
        "from": "00:00:13,510",
        "to": "00:00:14,000"
      }
    },
    {
      "offsets": {
        "from": 13780,
        "to": 14000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:13,780",
        "to": "00:00:14,000"
      }
    },
    {
      "offsets": {
        "from": 14000,
        "to": 14450
      },
      "text": " Thank",
      "timestamps": {
        "from": "00:00:14,000",
        "to": "00:00:14,450"
      }
    },
    {
      "offsets": {
        "from": 14450,
        "to": 14720
      },
      "text": " you.",
      "timestamps": {
        "from": "00:00:14,450",
        "to": "00:00:15,000"
      }
    },
    {
      "offsets": {
        "from": 14720,
        "to": 15000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:14,720",
        "to": "00:00:15,000"
      }
    },
    {
      "offsets": {
        "from": 15000,
        "to": 15000
      },
      "text": " All",
      "timestamps": {
        "from": "00:00:15,000",
        "to": "00:00:15,000"
      }
    },
    {
      "offsets": {
        "from": 15000,
        "to": 15000
      },
      "text": " right.",
      "timestamps": {
        "from": "00:00:15,000",
        "to": "00:00:15,000"
      }
    },
    {
      "offsets": {
        "from": 15000,
        "to": 15000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:15,000",
        "to": "00:00:15,000"
      }
    }
  ]
}

Attempt 1 Diarized

{
  "transcription": [
    {
      "speaker": null,
      "text": " All right so I'm",
      "timestamps": {
        "from": null,
        "to": "00:00:01,300"
      }
    },
    {
      "speaker": "SPEAKER_02",
      "text": " recording this. This is just a test. So just say whatever you want. Anything.",
      "timestamps": {
        "from": "00:00:01,300",
        "to": "00:00:07,000"
      }
    },
    {
      "speaker": "SPEAKER_00",
      "text": " Okay. Um today's a good day. I hope you probably good day into you.",
      "timestamps": {
        "from": "00:00:07,000",
        "to": "00:00:14,000"
      }
    },
    {
      "speaker": "SPEAKER_01",
      "text": " Thank you. All right.",
      "timestamps": {
        "from": "00:00:14,000",
        "to": "00:00:15,000"
      }
    }
  ]
}
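
As a possible stopgap until the matching step is fixed, the diarized output could be normalized before it is stored or displayed. The sketch below is only an illustration, not the project's code: `normalize_segments` and the `UNKNOWN` placeholder are hypothetical names, and it assumes that a missing `from` can safely be replaced by the previous segment's `to` (or `00:00:00,000` for the first segment).

```python
import json

# Assumed start-of-recording timestamp, in the same format Whisper emits.
START = "00:00:00,000"


def normalize_segments(segments, unknown_speaker="UNKNOWN"):
    """Return a copy of the segments with null 'from' and 'speaker' filled in.

    Hypothetical helper: a null 'from' becomes the previous segment's 'to'
    (or START for the first segment); a null speaker becomes a placeholder.
    """
    fixed = []
    prev_to = START
    for seg in segments:
        ts = dict(seg.get("timestamps") or {})
        if ts.get("from") is None:
            ts["from"] = prev_to
        new_seg = dict(seg, timestamps=ts)
        if new_seg.get("speaker") is None:
            new_seg["speaker"] = unknown_speaker
        if ts.get("to") is not None:
            prev_to = ts["to"]
        fixed.append(new_seg)
    return fixed


# First two segments of the diarized output above.
diarized = json.loads("""
[
  {"speaker": null, "text": " All right so I'm",
   "timestamps": {"from": null, "to": "00:00:01,300"}},
  {"speaker": "SPEAKER_02", "text": " recording this.",
   "timestamps": {"from": "00:00:01,300", "to": "00:00:07,000"}}
]
""")

fixed = normalize_segments(diarized)
print(json.dumps(fixed[0], indent=2))
```

This only hides the symptom for displayPane; the unlabeled first segment still has no real speaker assigned.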

Attempt 2 Undiarized

{
  "model": {
    "audio": {
      "ctx": 1500,
      "head": 6,
      "layer": 4,
      "state": 384
    },
    "ftype": 1,
    "mels": 80,
    "multilingual": false,
    "text": {
      "ctx": 448,
      "head": 6,
      "layer": 4,
      "state": 384
    },
    "type": "tiny",
    "vocab": 51864
  },
  "params": {
    "language": "en",
    "model": "/models/ggml-tiny.en.bin",
    "translate": false
  },
  "result": {
    "language": "en"
  },
  "systeminfo": "AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 | ",
  "transcription": [
    {
      "offsets": {
        "from": 0,
        "to": 7000
      },
      "text": " All right, so I'm recording this. This is just a test. So just say whatever you want. Anything.",
      "timestamps": {
        "from": "00:00:00,000",
        "to": "00:00:07,000"
      }
    },
    {
      "offsets": {
        "from": 7000,
        "to": 14000
      },
      "text": " Okay. Um, today's a good day. I hope you probably good day into you.",
      "timestamps": {
        "from": "00:00:07,000",
        "to": "00:00:14,000"
      }
    },
    {
      "offsets": {
        "from": 14000,
        "to": 15000
      },
      "text": " Thank you. All right.",
      "timestamps": {
        "from": "00:00:14,000",
        "to": "00:00:15,000"
      }
    }
  ]
}

Esnapp commented Dec 5, 2024

Looking at the RTTM generated by pyannote, I think I've found where the error occurs: the first 1.280 seconds have no speaker label. So when the RTTM is loaded and matched against the transcription from Whisper, the match fails and the timestamp and speaker are left as null.

SPEAKER iyx4k25z3jhygki-ffmpeg 1 1.280 4.016 <NA> <NA> SPEAKER_02 <NA> <NA>
SPEAKER iyx4k25z3jhygki-ffmpeg 1 7.760 1.299 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER iyx4k25z3jhygki-ffmpeg 1 10.443 1.603 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER iyx4k25z3jhygki-ffmpeg 1 12.434 1.249 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER iyx4k25z3jhygki-ffmpeg 1 14.020 1.114 <NA> <NA> SPEAKER_01 <NA> <NA>
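
One way the matching could tolerate such gaps is to fall back to the nearest speaker turn when a word overlaps no turn at all. The following is a minimal sketch under that assumption, not the project's actual matching code: `parse_rttm` and `speaker_for` are hypothetical names, and it relies only on the standard RTTM field order (start in field 4, duration in field 5, speaker name in field 8).

```python
RTTM = """\
SPEAKER iyx4k25z3jhygki-ffmpeg 1 1.280 4.016 <NA> <NA> SPEAKER_02 <NA> <NA>
SPEAKER iyx4k25z3jhygki-ffmpeg 1 7.760 1.299 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER iyx4k25z3jhygki-ffmpeg 1 10.443 1.603 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER iyx4k25z3jhygki-ffmpeg 1 12.434 1.249 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER iyx4k25z3jhygki-ffmpeg 1 14.020 1.114 <NA> <NA> SPEAKER_01 <NA> <NA>
"""


def parse_rttm(text):
    """Parse RTTM SPEAKER lines into sorted (start, end, speaker) tuples (seconds)."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank or non-SPEAKER lines
        start = float(fields[3])
        duration = float(fields[4])
        turns.append((start, start + duration, fields[7]))
    return sorted(turns)


def speaker_for(start, end, turns):
    """Pick the speaker whose turn overlaps [start, end] the most.

    If nothing overlaps (e.g. the unlabeled first 1.280 s), fall back to the
    nearest turn instead of returning None.
    """
    if not turns:
        return None
    best, best_overlap = None, 0.0
    for t_start, t_end, speaker in turns:
        overlap = min(end, t_end) - max(start, t_start)
        if overlap > best_overlap:
            best, best_overlap = speaker, overlap
    if best is not None:
        return best

    def distance(turn):
        # Size of the gap between the interval and the turn.
        t_start, t_end, _ = turn
        return max(t_start - end, start - t_end, 0.0)

    return min(turns, key=distance)[2]


turns = parse_rttm(RTTM)
print("first labeled second:", turns[0][0])
# The word " All" spans 0.070-0.250 s in the undiarized output, before any turn.
print("speaker for ' All':", speaker_for(0.070, 0.250, turns))
```

Whether the nearest turn is the right fallback is a design choice; a placeholder speaker label would avoid guessing but still keep the timestamp from being null.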
