Diarized Input Malformed Timestamp #38

Esnapp · 2024-12-05T19:31:15Z

When auto-diarization is enabled, the first segment of the transcription loaded into PocketBase has a "from" timestamp of null instead of 0. This causes displayPane to throw an async promise error, which keeps the output from being displayed. The first speaker is also null, but that does not seem to throw an error.

Attempt 1 Undiarized

{
  "model": {
    "audio": {
      "ctx": 1500,
      "head": 6,
      "layer": 4,
      "state": 384
    },
    "ftype": 1,
    "mels": 80,
    "multilingual": false,
    "text": {
      "ctx": 448,
      "head": 6,
      "layer": 4,
      "state": 384
    },
    "type": "tiny",
    "vocab": 51864
  },
  "params": {
    "language": "en",
    "model": "/models/ggml-tiny.en.bin",
    "translate": false
  },
  "result": {
    "language": "en"
  },
  "systeminfo": "AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 | ",
  "transcription": [
    {
      "offsets": {
        "from": 0,
        "to": 70
      },
      "text": "",
      "timestamps": {
        "from": "00:00:00,000",
        "to": "00:00:00,070"
      }
    },
    {
      "offsets": {
        "from": 70,
        "to": 250
      },
      "text": " All",
      "timestamps": {
        "from": "00:00:00,070",
        "to": "00:00:00,250"
      }
    },
    {
      "offsets": {
        "from": 250,
        "to": 680
      },
      "text": " right",
      "timestamps": {
        "from": "00:00:00,250",
        "to": "00:00:00,680"
      }
    },
    {
      "offsets": {
        "from": 680,
        "to": 850
      },
      "text": ",",
      "timestamps": {
        "from": "00:00:00,680",
        "to": "00:00:00,850"
      }
    },
    {
      "offsets": {
        "from": 850,
        "to": 1020
      },
      "text": " so",
      "timestamps": {
        "from": "00:00:00,850",
        "to": "00:00:01,020"
      }
    },
    {
      "offsets": {
        "from": 1020,
        "to": 1270
      },
      "text": " I'm",
      "timestamps": {
        "from": "00:00:01,020",
        "to": "00:00:01,300"
      }
    },
    {
      "offsets": {
        "from": 1270,
        "to": 1300
      },
      "text": "'m",
      "timestamps": {
        "from": "00:00:01,270",
        "to": "00:00:01,300"
      }
    },
    {
      "offsets": {
        "from": 1300,
        "to": 2040
      },
      "text": " recording",
      "timestamps": {
        "from": "00:00:01,300",
        "to": "00:00:02,040"
      }
    },
    {
      "offsets": {
        "from": 2040,
        "to": 2380
      },
      "text": " this.",
      "timestamps": {
        "from": "00:00:02,040",
        "to": "00:00:02,680"
      }
    },
    {
      "offsets": {
        "from": 2380,
        "to": 2680
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:02,380",
        "to": "00:00:02,680"
      }
    },
    {
      "offsets": {
        "from": 2680,
        "to": 2980
      },
      "text": " This",
      "timestamps": {
        "from": "00:00:02,680",
        "to": "00:00:02,980"
      }
    },
    {
      "offsets": {
        "from": 2980,
        "to": 3150
      },
      "text": " is",
      "timestamps": {
        "from": "00:00:02,980",
        "to": "00:00:03,150"
      }
    },
    {
      "offsets": {
        "from": 3150,
        "to": 3430
      },
      "text": " just",
      "timestamps": {
        "from": "00:00:03,150",
        "to": "00:00:03,430"
      }
    },
    {
      "offsets": {
        "from": 3430,
        "to": 3500
      },
      "text": " a",
      "timestamps": {
        "from": "00:00:03,430",
        "to": "00:00:03,500"
      }
    },
    {
      "offsets": {
        "from": 3500,
        "to": 3830
      },
      "text": " test.",
      "timestamps": {
        "from": "00:00:03,500",
        "to": "00:00:04,000"
      }
    },
    {
      "offsets": {
        "from": 3830,
        "to": 4000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:03,830",
        "to": "00:00:04,000"
      }
    },
    {
      "offsets": {
        "from": 4000,
        "to": 4160
      },
      "text": " So",
      "timestamps": {
        "from": "00:00:04,000",
        "to": "00:00:04,160"
      }
    },
    {
      "offsets": {
        "from": 4160,
        "to": 4370
      },
      "text": " just",
      "timestamps": {
        "from": "00:00:04,160",
        "to": "00:00:04,370"
      }
    },
    {
      "offsets": {
        "from": 4370,
        "to": 4540
      },
      "text": " say",
      "timestamps": {
        "from": "00:00:04,370",
        "to": "00:00:04,540"
      }
    },
    {
      "offsets": {
        "from": 4540,
        "to": 4860
      },
      "text": " whatever",
      "timestamps": {
        "from": "00:00:04,540",
        "to": "00:00:04,860"
      }
    },
    {
      "offsets": {
        "from": 4860,
        "to": 4980
      },
      "text": " you",
      "timestamps": {
        "from": "00:00:04,860",
        "to": "00:00:04,980"
      }
    },
    {
      "offsets": {
        "from": 4980,
        "to": 5200
      },
      "text": " want.",
      "timestamps": {
        "from": "00:00:04,980",
        "to": "00:00:06,000"
      }
    },
    {
      "offsets": {
        "from": 5200,
        "to": 6000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:05,200",
        "to": "00:00:06,000"
      }
    },
    {
      "offsets": {
        "from": 6000,
        "to": 6920
      },
      "text": " Anything.",
      "timestamps": {
        "from": "00:00:06,000",
        "to": "00:00:07,000"
      }
    },
    {
      "offsets": {
        "from": 6920,
        "to": 7000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:06,920",
        "to": "00:00:07,000"
      }
    },
    {
      "offsets": {
        "from": 7000,
        "to": 8140
      },
      "text": " Okay.",
      "timestamps": {
        "from": "00:00:07,000",
        "to": "00:00:09,000"
      }
    },
    {
      "offsets": {
        "from": 8140,
        "to": 9000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:08,140",
        "to": "00:00:09,000"
      }
    },
    {
      "offsets": {
        "from": 9000,
        "to": 10000
      },
      "text": " Um",
      "timestamps": {
        "from": "00:00:09,000",
        "to": "00:00:10,000"
      }
    },
    {
      "offsets": {
        "from": 10000,
        "to": 10090
      },
      "text": ",",
      "timestamps": {
        "from": "00:00:10,000",
        "to": "00:00:10,090"
      }
    },
    {
      "offsets": {
        "from": 10090,
        "to": 10340
      },
      "text": " today's",
      "timestamps": {
        "from": "00:00:10,090",
        "to": "00:00:10,430"
      }
    },
    {
      "offsets": {
        "from": 10340,
        "to": 10430
      },
      "text": "'s",
      "timestamps": {
        "from": "00:00:10,340",
        "to": "00:00:10,430"
      }
    },
    {
      "offsets": {
        "from": 10430,
        "to": 10480
      },
      "text": " a",
      "timestamps": {
        "from": "00:00:10,430",
        "to": "00:00:10,480"
      }
    },
    {
      "offsets": {
        "from": 10480,
        "to": 10680
      },
      "text": " good",
      "timestamps": {
        "from": "00:00:10,480",
        "to": "00:00:10,680"
      }
    },
    {
      "offsets": {
        "from": 10680,
        "to": 10830
      },
      "text": " day.",
      "timestamps": {
        "from": "00:00:10,680",
        "to": "00:00:11,000"
      }
    },
    {
      "offsets": {
        "from": 10830,
        "to": 11000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:10,830",
        "to": "00:00:11,000"
      }
    },
    {
      "offsets": {
        "from": 11000,
        "to": 11200
      },
      "text": " I",
      "timestamps": {
        "from": "00:00:11,000",
        "to": "00:00:11,200"
      }
    },
    {
      "offsets": {
        "from": 11200,
        "to": 12000
      },
      "text": " hope",
      "timestamps": {
        "from": "00:00:11,200",
        "to": "00:00:12,000"
      }
    },
    {
      "offsets": {
        "from": 12000,
        "to": 12270
      },
      "text": " you",
      "timestamps": {
        "from": "00:00:12,000",
        "to": "00:00:12,270"
      }
    },
    {
      "offsets": {
        "from": 12270,
        "to": 13000
      },
      "text": " probably",
      "timestamps": {
        "from": "00:00:12,270",
        "to": "00:00:13,000"
      }
    },
    {
      "offsets": {
        "from": 13000,
        "to": 13180
      },
      "text": " good",
      "timestamps": {
        "from": "00:00:13,000",
        "to": "00:00:13,180"
      }
    },
    {
      "offsets": {
        "from": 13180,
        "to": 13320
      },
      "text": " day",
      "timestamps": {
        "from": "00:00:13,180",
        "to": "00:00:13,320"
      }
    },
    {
      "offsets": {
        "from": 13320,
        "to": 13510
      },
      "text": " into",
      "timestamps": {
        "from": "00:00:13,320",
        "to": "00:00:13,510"
      }
    },
    {
      "offsets": {
        "from": 13510,
        "to": 13780
      },
      "text": " you.",
      "timestamps": {
        "from": "00:00:13,510",
        "to": "00:00:14,000"
      }
    },
    {
      "offsets": {
        "from": 13780,
        "to": 14000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:13,780",
        "to": "00:00:14,000"
      }
    },
    {
      "offsets": {
        "from": 14000,
        "to": 14450
      },
      "text": " Thank",
      "timestamps": {
        "from": "00:00:14,000",
        "to": "00:00:14,450"
      }
    },
    {
      "offsets": {
        "from": 14450,
        "to": 14720
      },
      "text": " you.",
      "timestamps": {
        "from": "00:00:14,450",
        "to": "00:00:15,000"
      }
    },
    {
      "offsets": {
        "from": 14720,
        "to": 15000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:14,720",
        "to": "00:00:15,000"
      }
    },
    {
      "offsets": {
        "from": 15000,
        "to": 15000
      },
      "text": " All",
      "timestamps": {
        "from": "00:00:15,000",
        "to": "00:00:15,000"
      }
    },
    {
      "offsets": {
        "from": 15000,
        "to": 15000
      },
      "text": " right.",
      "timestamps": {
        "from": "00:00:15,000",
        "to": "00:00:15,000"
      }
    },
    {
      "offsets": {
        "from": 15000,
        "to": 15000
      },
      "text": ".",
      "timestamps": {
        "from": "00:00:15,000",
        "to": "00:00:15,000"
      }
    }
  ]
}

Attempt 1 Diarized

{
  "transcription": [
    {
      "speaker": null,
      "text": " All right so I'm",
      "timestamps": {
        "from": null,
        "to": "00:00:01,300"
      }
    },
    {
      "speaker": "SPEAKER_02",
      "text": " recording this. This is just a test. So just say whatever you want. Anything.",
      "timestamps": {
        "from": "00:00:01,300",
        "to": "00:00:07,000"
      }
    },
    {
      "speaker": "SPEAKER_00",
      "text": " Okay. Um today's a good day. I hope you probably good day into you.",
      "timestamps": {
        "from": "00:00:07,000",
        "to": "00:00:14,000"
      }
    },
    {
      "speaker": "SPEAKER_01",
      "text": " Thank you. All right.",
      "timestamps": {
        "from": "00:00:14,000",
        "to": "00:00:15,000"
      }
    }
  ]
}
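
As a stopgap until the root cause is fixed, the null start time could be patched before the record is stored. The following is only a minimal sketch, assuming the diarized output has the shape shown above; the function name fill_missing_starts and the choice to fall back to the previous segment's end time (or zero for the first segment) are illustrative assumptions, not existing project code.

```python
from typing import Any

# Timestamp format used in the transcription JSON ("HH:MM:SS,mmm").
ZERO = "00:00:00,000"


def fill_missing_starts(segments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of `segments` in which a null "from" timestamp is
    replaced by the previous segment's "to" (or zero for the first segment).

    Hypothetical helper: the input list and its dicts are not mutated.
    """
    fixed = []
    prev_end = ZERO
    for seg in segments:
        ts = dict(seg.get("timestamps") or {})
        if ts.get("from") is None:
            ts["from"] = prev_end
        if ts.get("to") is not None:
            prev_end = ts["to"]
        fixed.append({**seg, "timestamps": ts})
    return fixed
```

This only works around the display error; the speaker of the first segment would still be null.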

Attempt 2 Undiarized

{
  "model": {
    "audio": {
      "ctx": 1500,
      "head": 6,
      "layer": 4,
      "state": 384
    },
    "ftype": 1,
    "mels": 80,
    "multilingual": false,
    "text": {
      "ctx": 448,
      "head": 6,
      "layer": 4,
      "state": 384
    },
    "type": "tiny",
    "vocab": 51864
  },
  "params": {
    "language": "en",
    "model": "/models/ggml-tiny.en.bin",
    "translate": false
  },
  "result": {
    "language": "en"
  },
  "systeminfo": "AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 | ",
  "transcription": [
    {
      "offsets": {
        "from": 0,
        "to": 7000
      },
      "text": " All right, so I'm recording this. This is just a test. So just say whatever you want. Anything.",
      "timestamps": {
        "from": "00:00:00,000",
        "to": "00:00:07,000"
      }
    },
    {
      "offsets": {
        "from": 7000,
        "to": 14000
      },
      "text": " Okay. Um, today's a good day. I hope you probably good day into you.",
      "timestamps": {
        "from": "00:00:07,000",
        "to": "00:00:14,000"
      }
    },
    {
      "offsets": {
        "from": 14000,
        "to": 15000
      },
      "text": " Thank you. All right.",
      "timestamps": {
        "from": "00:00:14,000",
        "to": "00:00:15,000"
      }
    }
  ]
}

Esnapp · 2024-12-05T20:07:09Z

Looking at the RTTM generated by pyannote, I think I've found where the error occurs: the first 1.280 seconds have no speaker label. So when the RTTM is loaded and matched against the transcription from Whisper, the matching runs into an error for that span and leaves the timestamp and speaker as null.

SPEAKER iyx4k25z3jhygki-ffmpeg 1 1.280 4.016 <NA> <NA> SPEAKER_02 <NA> <NA>
SPEAKER iyx4k25z3jhygki-ffmpeg 1 7.760 1.299 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER iyx4k25z3jhygki-ffmpeg 1 10.443 1.603 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER iyx4k25z3jhygki-ffmpeg 1 12.434 1.249 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER iyx4k25z3jhygki-ffmpeg 1 14.020 1.114 <NA> <NA> SPEAKER_01 <NA> <NA>
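
One way to make the matching tolerant of uncovered spans is to fall back to a placeholder speaker instead of null when no RTTM turn overlaps a segment. Below is a rough sketch of that idea, not the project's actual matching code: Turn, parse_rttm, speaker_for, and the "UNKNOWN" label are all hypothetical names, and picking the speaker by largest time overlap is an assumed strategy. It relies only on the standard RTTM column order (start time in the fourth field, duration in the fifth, speaker name in the eighth).

```python
from typing import NamedTuple


class Turn(NamedTuple):
    start: float    # seconds
    end: float      # seconds
    speaker: str


def parse_rttm(text: str) -> list[Turn]:
    """Parse SPEAKER lines of an RTTM file into (start, end, speaker) turns."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank or non-speaker lines
        start = float(fields[3])
        duration = float(fields[4])
        turns.append(Turn(start, start + duration, fields[7]))
    return turns


def speaker_for(start: float, end: float, turns: list[Turn],
                fallback: str = "UNKNOWN") -> str:
    """Return the speaker whose turn overlaps [start, end] the most,
    or `fallback` when no turn overlaps the segment at all."""
    best, best_overlap = fallback, 0.0
    for turn in turns:
        overlap = min(end, turn.end) - max(start, turn.start)
        if overlap > best_overlap:
            best, best_overlap = turn.speaker, overlap
    return best
```

With the RTTM above, a segment that ends at or before 1.280 seconds overlaps no turn and would get the fallback label, while the segment's own Whisper timestamps stay untouched rather than being replaced by null.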
