export and augmentation for speaker verification #8132
-
I have been looking into setting up a speaker verification system with NeMo's TitaNet model. The general idea is a system exposing a REST API with three endpoints:

```json
{
  "openapi": "3.0.0",
  "info": {
    "title": "speaker verification REST API",
    "version": "1.0.0"
  },
  "paths": {
    "/register": {
      "post": {
        "summary": "register a speaker by fine-tuning and exporting a TitaNet model with new recordings",
        "requestBody": {
          "required": true,
          "content": {
            "multipart/form-data": {
              "schema": {
                "type": "object",
                "properties": {
                  "speaker": {
                    "type": "string"
                  },
                  "recordings": {
                    "type": "array",
                    "items": {
                      "type": "string",
                      "format": "binary"
                    }
                  }
                },
                "required": ["speaker", "recordings"]
              }
            }
          }
        },
        "responses": {
          "200": {
            "description": "speaker registered successfully"
          }
        }
      }
    },
    "/verify": {
      "post": {
        "summary": "verify speaker identity with a recording",
        "requestBody": {
          "required": true,
          "content": {
            "multipart/form-data": {
              "schema": {
                "type": "object",
                "properties": {
                  "recording": {
                    "type": "string",
                    "format": "binary"
                  }
                },
                "required": ["recording"]
              }
            }
          }
        },
        "responses": {
          "200": {
            "description": "<speaker>"
          },
          "404": {
            "description": "unknown speaker"
          }
        }
      }
    },
    "/unregister": {
      "post": {
        "summary": "unregister a speaker by fine-tuning and exporting a TitaNet model without their recordings",
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "type": "object",
                "properties": {
                  "speaker": {
                    "type": "string"
                  }
                },
                "required": ["speaker"]
              }
            }
          }
        },
        "responses": {
          "200": {
            "description": "speaker unregistered successfully"
          },
          "404": {
            "description": "speaker not found"
          }
        }
      }
    }
  }
}
```

Now comes the difficult part, as export and augmentation are key to the overall quality of the system. After going through the speaker verification Jupyter notebook, I still have a few doubts.

export

For starters, model export is standard practice when deploying a machine learning model for inference, yet the Jupyter notebook does not mention it: its inference example uses the fine-tuned PyTorch or NeMo checkpoint directly. The only pointers I have found for TitaNet export to ONNX are #7245 and #6759, but the comments there do not make the recommended path clear.
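Independent of the export format, the `/verify` endpoint above ultimately reduces to comparing speaker embeddings. Below is a minimal numpy-only sketch of that decision rule; the helper names and the 0.7 threshold are placeholders of mine, not NeMo defaults:

```python
import numpy as np


def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(query, enrolled, threshold=0.7):
    """Return the best-matching speaker name, or None (-> HTTP 404)."""
    best_name, best_score = None, -1.0
    for name, emb in enrolled.items():
        score = cosine_similarity(query, emb)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# toy usage with random stand-ins for real embeddings
# (TitaNet-L produces 192-dimensional embeddings)
rng = np.random.default_rng(0)
alice = rng.normal(size=192)
enrolled = {"alice": alice, "bob": rng.normal(size=192)}
noisy_query = alice + 0.05 * rng.normal(size=192)
print(verify(noisy_query, enrolled))  # alice
```

In a real deployment the enrolled embeddings would come from averaging the model's output over each speaker's registration recordings, which also sidesteps re-exporting the model on every `/register` call.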
Is there any official reference for model export for speaker verification in NeMo?

augmentation

The Jupyter notebook suggests dataset augmentation for better performance, which I think would be even more critical in this scenario, since it would spare a person from having to make a large number of recordings for registration. Therefore: 1) what is the optimal number of recordings per speaker for speaker verification fine-tuning? 2) how much dataset augmentation is optimal, and which augmentation techniques would be most appropriate? A few examples are shared in the online augmentation Jupyter notebook, but it is not clear which of them fit this use case.
The Jupyter notebook repeatedly suggests opting for one-step offline augmentation to avoid excessive slowdown during training, but I could not find any official example of offline dataset augmentation in NeMo.
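For what it's worth, offline augmentation can also be rolled by hand before writing a new training manifest. Below is a minimal numpy-only sketch of what I have in mind; the SNR values and speed factors are illustrative guesses, not NeMo recommendations:

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Mix in white noise at the given signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(scale=np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def speed_perturb(signal, factor):
    """Naive speed perturbation via linear-interpolation resampling."""
    n_out = int(round(len(signal) / factor))
    x_old = np.arange(len(signal))
    x_new = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(x_new, x_old, signal)


rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of 440 Hz at 16 kHz
augmented = [add_noise(clean, snr_db, rng) for snr_db in (5, 10, 20)]
augmented += [speed_perturb(clean, f) for f in (0.9, 1.1)]
```

Each augmented array would then be written back to disk (e.g. with `soundfile` or the stdlib `wave` module) and listed in the training manifest alongside the originals.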
-
Any pointers on that, @titu1994?
-
Hi @DiTo97, in my experience (from about a year ago), inference with ONNX is definitely faster than with the native PyTorch checkpoint, but I don't recall exact numbers.
I suggest using online augmentation, as I did for TitaNet training.
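Online augmentation is configured through the `augmentor` section of `train_ds` in the training config. A sketch of such a section is below; the probabilities and ranges are illustrative, not recommended defaults, and `???` marks a value you must fill in:

```yaml
model:
  train_ds:
    augmentor:
      white_noise:
        prob: 0.5
        min_level: -90   # dB
        max_level: -46   # dB
      speed:
        prob: 0.5
        sr: 16000
        resample_type: kaiser_fast
        min_speed_rate: 0.95
        max_speed_rate: 1.05
      noise:
        prob: 0.5
        manifest_path: ???   # manifest of background-noise clips
        min_snr_db: 0
        max_snr_db: 15
```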
export
This is correct; that is the reason we use the current NeMo model: to reuse its preprocessor. However, you can just import the dataloader for that class and replicate the data loading yourself, to avoid having to load it from the model.
I am not currently working on ONNX export, so my knowledge may be outdated; adding @borisfom to answer this query.
Yes, we currently onl…