ShreyAgarwal11/Privacy-Preserving-Representation-for-Audio-Visual-Speech-Understanding

Privacy-Preserving-Representation-for-Audio-Visual-Speech-Understanding

Multimodal datasets can contain personally identifiable information. We propose a general framework for privacy-aware representation of audio-visual (AV) data.

Data

  • VidTIMIT (Video Dynamic TIMIT)
  • DeepfakeTIMIT
  • MSP-Improv (Multimodal Sensitive Periods Improvisation Corpus)

Method

  1. Feature Extraction Using AV-HuBERT
  2. Privacy Transformer
  3. Differential Privacy Filter
  4. Speaker Recognition
  5. Emotion Recognition
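
The README does not describe how the differential privacy filter in step 3 is built. As a rough illustration only, the sketch below applies the classic Gaussian mechanism to a single feature vector: clip its L2 norm, then add calibrated Gaussian noise. The function names (`clip_l2`, `gaussian_sigma`, `dp_filter`), the parameters `epsilon`, `delta`, and `clip_norm`, and the choice of the Gaussian mechanism itself are assumptions made for this example, not the repository's implementation.

```python
import math
import random


def clip_l2(vec, clip_norm):
    """Scale vec so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= clip_norm or norm == 0.0:
        return list(vec)
    scale = clip_norm / norm
    return [x * scale for x in vec]


def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classic Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def dp_filter(vec, epsilon, delta, clip_norm=1.0, rng=None):
    """Hypothetical privacy filter: clip one feature vector, then add noise.

    Any two clipped vectors differ by at most 2 * clip_norm in L2 norm,
    so that value is used as the sensitivity (an assumption of this sketch).
    """
    rng = rng or random.Random()
    clipped = clip_l2(vec, clip_norm)
    sigma = gaussian_sigma(epsilon, delta, 2.0 * clip_norm)
    return [x + rng.gauss(0.0, sigma) for x in clipped]


# Example: filter a toy 4-dimensional feature vector with a fixed seed.
features = [0.5, -1.2, 3.0, 0.1]
private = dp_filter(features, epsilon=0.5, delta=1e-5, rng=random.Random(0))
```

Smaller `epsilon` values give a larger noise scale, which would be expected to lower speaker recognition accuracy and emotion recognition accuracy together.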

Results

Speaker Recognition

Method                        Accuracy on VidTIMIT (%)   Evaluation
AV-HuBERT                     88.24                      batches of 2
Differential Privacy filter   50                         batches of 2
Transformer Privacy filter    58                         batches of 2

Emotion Recognition

Method                        F1 Score (%)   Accuracy (%)
AV-HuBERT                     41             41
Differential Privacy filter   22             22
Transformer Privacy filter    36             36
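
One way to read the speaker and emotion results together is as a privacy-utility trade-off: how far each filter lowers speaker recognition accuracy relative to the unfiltered AV-HuBERT features, and how much emotion recognition accuracy it keeps. The sketch below only does that arithmetic on the numbers reported in this section. The `tradeoff` helper and the dictionary layout are hypothetical and used for illustration.

```python
# Reported results from this section (accuracy values as listed).
results = {
    "AV-HuBERT": {"speaker_acc": 88.24, "emotion_acc": 41.0},
    "Differential Privacy filter": {"speaker_acc": 50.0, "emotion_acc": 22.0},
    "Transformer Privacy filter": {"speaker_acc": 58.0, "emotion_acc": 36.0},
}


def tradeoff(results, baseline="AV-HuBERT"):
    """Compare each filter against the unfiltered baseline.

    speaker_drop: accuracy points lost on speaker recognition (higher = more private).
    emotion_retained: fraction of baseline emotion accuracy kept (higher = more useful).
    """
    base = results[baseline]
    summary = {}
    for name, scores in results.items():
        if name == baseline:
            continue
        summary[name] = {
            "speaker_drop": base["speaker_acc"] - scores["speaker_acc"],
            "emotion_retained": scores["emotion_acc"] / base["emotion_acc"],
        }
    return summary


for name, row in tradeoff(results).items():
    print(f"{name}: speaker drop {row['speaker_drop']:.2f} points, "
          f"emotion retained {row['emotion_retained']:.0%}")
```

On these numbers the differential privacy filter removes about 38 points of speaker accuracy while keeping roughly 54% of the baseline emotion accuracy. The transformer filter removes about 30 points and keeps roughly 88%.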
