Face2Text is a model that takes the image of a face as its input and describes the face, covering facial features and emotional state, in the form of a grammatically coherent sentence.
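The overall idea can be sketched as an encoder-decoder captioning pipeline: a CNN encoder (Xception in the notebook) maps a face image to a feature vector, and a decoder emits one word at a time. The sketch below is purely illustrative, assuming a 2048-dimensional image encoding; the weights and tiny vocabulary are toy stand-ins, not the project's trained model.

```python
import numpy as np

# Toy vocabulary and random "weights" standing in for a trained decoder.
VOCAB = ["<start>", "<end>", "a", "smiling", "young", "woman", "man"]

rng = np.random.default_rng(0)
W_img = rng.normal(size=(2048, len(VOCAB)))        # image features -> vocab logits
W_prev = rng.normal(size=(len(VOCAB), len(VOCAB)))  # previous word -> vocab logits

def greedy_caption(image_features, max_len=10):
    """Greedily decode a caption from a 2048-d image feature vector."""
    words = ["<start>"]
    for _ in range(max_len):
        prev = np.eye(len(VOCAB))[VOCAB.index(words[-1])]
        logits = image_features @ W_img + prev @ W_prev
        nxt = VOCAB[int(np.argmax(logits))]
        if nxt == "<end>":
            break
        words.append(nxt)
    return " ".join(words[1:])

features = rng.normal(size=2048)  # stand-in for an Xception image encoding
print(greedy_caption(features))
```

In the actual model the decoder would be a trained recurrent network conditioned on the image encoding, and decoding can use beam search instead of the greedy argmax shown here.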
- Dataset used is the one described in the paper Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions
- It was made available to us on request. The data can be obtained by contacting either the RIVAL group or the authors of the aforementioned paper.
- The dataset consists of around 5,685 annotated images chosen randomly from the CelebA dataset.
- The annotations describe the images as naturally as possible and focus on capturing the person's facial expression and/or their emotional state.
- To our knowledge, we are the first to use this dataset for the task of face description.
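Loading the annotations might look like the sketch below. Note that the exact schema of raw.json is an assumption here: we assume each entry pairs a CelebA image filename with a list of free-text descriptions, so a small in-memory sample stands in for the real file.

```python
import json
import os
import tempfile

# Hypothetical sample mirroring the assumed raw.json schema:
# a list of {"filename": ..., "descriptions": [...]} entries.
sample = [
    {"filename": "000001.jpg",
     "descriptions": ["a smiling young woman with long dark hair"]},
    {"filename": "000002.jpg",
     "descriptions": ["a middle-aged man who looks surprised"]},
]

def load_annotations(path):
    """Return a dict mapping image filename -> list of description strings."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return {e["filename"]: e["descriptions"] for e in entries}

# Demonstrate on a temporary file standing in for data/raw.json.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)
    tmp_path = f.name

annotations = load_annotations(tmp_path)
os.unlink(tmp_path)
print(len(annotations))
```

Each filename can then be joined against the extracted CelebA images to build (image, description) training pairs.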
- Clone this repo.
- Get the data by contacting either the RIVAL group or the authors of the paper, then add the raw.json file to the data/ subdirectory.
- Download the CelebA dataset and extract it into the data/ subdirectory.
- Run the Jupyter notebook Face2Text_notebook_Xception.ipynb in the notebooks/ subdirectory.
- For real-time face description, run the Jupyter notebook RealTime.ipynb in the notebooks/ subdirectory.
The following improvements could further sharpen the model's predictions for image descriptions:
- Use a network pretrained for the face recognition task to obtain image encodings. This could better capture the nuances in facial features and improve performance.
- Fine-tune beam search.
- Perform hyperparameter tuning.
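The beam-search item above refers to the standard decoding procedure: rather than greedily taking the single best word at each step, keep the `beam_width` highest-scoring partial captions and expand each. A minimal sketch, with a toy fixed probability table standing in for the trained decoder:

```python
import math

def beam_search(step_fn, start="<start>", end="<end>", beam_width=3, max_len=10):
    """Return the highest log-probability word sequence under step_fn.

    step_fn(words) must return a dict {next_word: probability} given the
    partial caption so far.
    """
    beams = [([start], 0.0)]  # (words, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for words, score in beams:
            if words[-1] == end:
                candidates.append((words, score))  # finished beam carries over
                continue
            for word, prob in step_fn(words).items():
                candidates.append((words + [word], score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(w[-1] == end for w, _ in beams):
            break
    best_words, _ = beams[0]
    return [w for w in best_words if w not in (start, end)]

# Toy decoder: fixed conditional next-word probabilities (an assumption for demo).
TABLE = {
    "<start>": {"a": 0.9, "the": 0.1},
    "a": {"smiling": 0.6, "young": 0.4},
    "the": {"man": 1.0},
    "smiling": {"woman": 0.7, "man": 0.3},
    "young": {"man": 1.0},
    "woman": {"<end>": 1.0},
    "man": {"<end>": 1.0},
}

def toy_step(words):
    return TABLE[words[-1]]

print(beam_search(toy_step))  # → ['a', 'smiling', 'woman']
```

"Fine-tuning" here would mean sweeping `beam_width`, the maximum length, and possibly a length-normalization term on the cumulative score, all of which are hyperparameters of decoding rather than of the trained network.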