FaceViT: A multi-task Vision Transformer for face detection, age estimation, and gender prediction, demonstrating that the Vision Transformer can perform well across different tasks such as object detection and classification.

dimiz51/FaceViT

FaceViT: A lightweight multitask Vision Transformer for face detection, age prediction and gender classification

FaceViT is a small multi-task Vision Transformer trained from scratch for face detection, age estimation, and gender prediction. It demonstrates that a single Vision Transformer can perform well on an object detection task and multiple classification tasks simultaneously.
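
This README does not say how the three tasks are combined during training. One common approach for multi-task models is a weighted sum of per-task losses, sketched below in plain Python. The function names, the loss weights, and the treatment of age as classification over bins are all assumptions made for illustration (the last one is suggested by the Top-3 Age Accuracy metric reported further down), not the repository's actual implementation.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences (e.g. box coordinates)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(probs, true_index, eps=1e-12):
    """Negative log-likelihood of the true class under a probability vector."""
    return -math.log(max(probs[true_index], eps))


def multitask_loss(box_pred, box_true, age_probs, age_true,
                   gender_probs, gender_true,
                   w_box=1.0, w_age=1.0, w_gender=1.0):
    """Weighted sum of the three task losses.

    The weights are hypothetical defaults; in practice they would be tuned
    so that no single task dominates the gradient.
    """
    return (w_box * mse(box_pred, box_true)
            + w_age * cross_entropy(age_probs, age_true)
            + w_gender * cross_entropy(gender_probs, gender_true))


# Example: a perfect box, and 50% probability on the true age bin and gender.
loss = multitask_loss(
    box_pred=[0.1, 0.2, 0.8, 0.9], box_true=[0.1, 0.2, 0.8, 0.9],
    age_probs=[0.25, 0.5, 0.25], age_true=1,
    gender_probs=[0.5, 0.5], gender_true=0,
)
```

Because the box term is typically much smaller in magnitude than the cross-entropy terms, the relative weights matter; equal weights are only a starting point.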

Dataset

For this project I used the UTKFace dataset, which can be downloaded from Kaggle with a Kaggle account.

NOTE: This dataset is heavily imbalanced with respect to age and also contains many relatively low-quality images. Both factors could limit performance across the different tasks.
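
One way to see and partly counteract the age imbalance is to bin the ages and compute inverse-frequency class weights. The sketch below assumes the usual UTKFace file naming, in which the file name begins with the age followed by an underscore; the bin width, the helper names, and the example file names are hypothetical, and this project does not state that it uses class weighting.

```python
from collections import Counter


def age_from_filename(name):
    """Read the age from a UTKFace-style file name such as '25_0_2_example.jpg'."""
    return int(name.split("_", 1)[0])


def age_bin(age, width=10, max_age=100):
    """Map an age to a bin index; ages at or above max_age share the last bin."""
    return min(age, max_age) // width


def inverse_frequency_weights(bins):
    """Weight each bin by n_samples / (n_bins * count), so rare bins weigh more."""
    counts = Counter(bins)
    n, k = len(bins), len(counts)
    return {b: n / (k * c) for b, c in counts.items()}


# Hypothetical file names: three people in their twenties, one in their seventies.
names = ["25_0_2_a.jpg", "26_1_0_b.jpg", "27_0_1_c.jpg", "71_1_0_d.jpg"]
bins = [age_bin(age_from_filename(n)) for n in names]
weights = inverse_frequency_weights(bins)
```

Here the rare bin (ages 70 to 79) receives a weight three times larger than the common one. Such weights could be passed to a weighted classification loss, or used to drive a sampler that oversamples rare ages.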

Some predictions

[Figure: example prediction]

Training-time loss and metrics

Loss plot

[Figure: loss plot]

Metrics plot

[Figure: metrics plot]

NOTE: No data augmentation was used during this training experiment.

Evaluation metrics on validation set

| Metric Name | Value | Training Epochs |
| --- | --- | --- |
| Top-3 Age Accuracy | 61% | 43 |
| Face Bounding Box MSE | 0.0054 | 43 |
| Gender Accuracy | 75% | 43 |
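
For readers unfamiliar with the metrics above, the following plain-Python sketch shows how each one is conventionally computed. It is an illustration only: the function names are hypothetical, and details such as the age binning, the box coordinate format, and whether coordinates are normalized are not specified in this README.

```python
def top_k_accuracy(score_rows, labels, k=3):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for scores, label in zip(score_rows, labels):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def box_mse(pred_boxes, true_boxes):
    """Mean squared error averaged over every coordinate of every box."""
    squared = [(p - t) ** 2
               for pred, true in zip(pred_boxes, true_boxes)
               for p, t in zip(pred, true)]
    return sum(squared) / len(squared)


def binary_accuracy(probs, labels, threshold=0.5):
    """Fraction of samples where the thresholded probability matches the 0/1 label."""
    correct = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return correct / len(labels)


# Toy inputs: two age predictions over four bins, one box, three gender scores.
age_acc = top_k_accuracy([[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]], [0, 0], k=3)
bbox_err = box_mse([[0.0, 0.0, 1.0, 1.0]], [[0.0, 0.0, 1.0, 0.8]])
gender_acc = binary_accuracy([0.9, 0.2, 0.6], [1, 0, 0])
```

Top-k accuracy is more forgiving than exact-match accuracy, which suits age estimation where neighboring age groups are easily confused.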
