I built a fun little ML-powered app over the Christmas holidays, and I hope you have as much fun using it as I did building it. User research (LOL): the inspiration for building a tool like this comes from my mother's need to understand WhatsApp image forwards that have English text written over them. I believe there are others who face the same struggle with daily forwards that are not in their regional language. Ergo, I added support for 7 international languages.
Spaces is a new and extremely useful tool to deploy or showcase your ML apps to the world. To get a better idea of it, you can refer to these videos released by HuggingFace - Build and Deploy a Machine Learning App in 2 Minutes, or Building Machine Learning Applications Fast. Also, please refer to this wonderful blog post on how you can use HuggingFace Spaces and Gradio in a matter of a few lines of code.
In this article I'll try to explain how I built this fun app and how you can build one too. Let's go!
HuggingFace is a startup in the AI field, and their mission is to democratize good machine learning. It's an AI community trying to build a future in which everyone has equal opportunity and access to the benefits of the latest advances in AI. You can browse their model hub to discover, experiment with, and contribute new SOTA models, for example google-tapas, distilbert, facebook-wav2vec2, and so on. Or you can use their inference API to serve your models directly from HF infrastructure. The most famous artifact that HF has created so far is their Transformers library, which started as an NLP library but now supports other modalities as well. Transformers now provides thousands of pretrained models for tasks across text, vision, and audio.
Gradio is the fastest and easiest way to demo your ML models with a very friendly and feature-rich UI. Almost anyone can use it without a manual and with just a little intuition. You can install the Gradio library easily using pip install. I used both HuggingFace and Gradio on Colab, so installation was all the more straightforward. You can deploy your ML model online using just 5-10 lines of code, depending on the complexity of your implementation. Recently, Gradio has made it possible to embed your deployed ML model in any webpage of your choice. I have done the same at the end of this article, check it out. Gradio code also generates a public link for your deployed ML model/app, which you can then share with your friends, colleagues at work, or a potential employer or collaborator.
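To give a sense of how little code a Gradio demo needs, here is a minimal sketch. The `greet` function is a hypothetical placeholder standing in for a model's predict function, not part of my app, and the sketch assumes Gradio has been installed with `pip install gradio`.

```python
def greet(name: str) -> str:
    """Toy stand-in for a model's predict function."""
    return f"Hello, {name}!"


def build_demo():
    # Imported here so the rest of the file works even without Gradio installed.
    import gradio as gr

    # Interface wires a Python function to auto-generated input/output widgets.
    return gr.Interface(fn=greet, inputs="text", outputs="text")
```

Calling `build_demo().launch(share=True)` starts the app, and with `share=True` Gradio also prints a temporary public link.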
I built a fun project in the last couple of days using HuggingFace and Gradio functionalities. This project employs image analysis, language translation, and OCR techniques. A user can select an image of their choice with some English text over it as the input. For example, an image with some motivational text written over it, like the ones we all receive in our family WhatsApp groups all the time. The user then selects an output language from the 7 available - German, Spanish, French, Turkish, Hindi, Arabic, and Irish. The app then outputs the input image with its text translated into the language selected by the user.
I am using pytesseract to perform OCR on the input image. Once I have the text 'extracted' from the input image, I use the HuggingFace transformers library to load the desired translation model and tokenizer for inference. These translation models are open sourced by the Language Technology Research Group at the University of Helsinki, and you can access their account page and pre-trained models on HuggingFace's website. The extracted text is then translated into the selected language. For example, if you have selected German, the app will load the "Helsinki-NLP/opus-mt-en-de" translation model from the model hub and translate the OCR-extracted English text to German.
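The OCR and translation steps described above can be sketched roughly as follows. This is an illustration under assumptions, not the exact code of the app: the helper names are hypothetical, only the German model name is taken from this article (the Spanish and French names are assumed to follow the same naming pattern on the hub), and I picked the Marian classes from transformers as one way to load these models. It also assumes the Tesseract binary, pytesseract, Pillow, transformers, sentencepiece, and PyTorch are installed.

```python
import re

# Only the German entry is named in this article; the Spanish and French
# names are assumed to follow the same pattern on the hub.
MODEL_NAMES = {
    "German": "Helsinki-NLP/opus-mt-en-de",
    "Spanish": "Helsinki-NLP/opus-mt-en-es",
    "French": "Helsinki-NLP/opus-mt-en-fr",
}


def tidy_ocr_text(raw: str) -> str:
    """Collapse the line breaks and repeated spaces that OCR output often has."""
    return re.sub(r"\s+", " ", raw).strip()


def extract_text(image_path: str) -> str:
    """Run Tesseract OCR on an image file and return cleaned-up text."""
    import pytesseract
    from PIL import Image

    return tidy_ocr_text(pytesseract.image_to_string(Image.open(image_path)))


def translate(text: str, language: str) -> str:
    """Translate English text with the model mapped to `language`."""
    from transformers import MarianMTModel, MarianTokenizer

    name = MODEL_NAMES[language]
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)
    batch = tokenizer([text], return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

A call such as `translate(extract_text("forward.png"), "German")` would chain the two steps, where `forward.png` is a placeholder file name.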
Next, I am using the Keras-OCR library to extract the coordinates of the English text from the original input image. This library is based on Keras CRNN, a Keras implementation of a Convolutional Recurrent Neural Network for text recognition. Once I have these coordinates, I clean the text off the image using the OpenCV and Pillow libraries with just a couple of lines of code. This cleaning is inspired by Carlo Borella's incredible post.
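One way to do this kind of cleaning is to draw a thick line over each detected text box in a mask and let OpenCV inpaint the masked pixels. The sketch below is my illustration of that idea rather than the exact code of the app: `stroke_for_box` and `erase_text` are hypothetical names, the corner ordering is an assumption, and the inpainting radius of 7 is an arbitrary choice.

```python
import math


def stroke_for_box(box):
    """Return (start, end, thickness) of a line that covers one text box.

    `box` holds four (x, y) corners, assumed to be ordered top-left,
    top-right, bottom-right, bottom-left.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = box
    start = (int((x0 + x3) / 2), int((y0 + y3) / 2))  # middle of left edge
    end = (int((x1 + x2) / 2), int((y1 + y2) / 2))    # middle of right edge
    thickness = int(math.hypot(x2 - x1, y2 - y1))     # length of right edge
    return start, end, thickness


def erase_text(image_path: str):
    """Detect text with keras-ocr and inpaint over it; returns an RGB array."""
    import cv2
    import keras_ocr
    import numpy as np

    image = keras_ocr.tools.read(image_path)      # RGB uint8 array
    pipeline = keras_ocr.pipeline.Pipeline()      # downloads weights on first use
    predictions = pipeline.recognize([image])[0]  # list of (word, box) pairs

    mask = np.zeros(image.shape[:2], dtype="uint8")
    for _word, box in predictions:
        start, end, thickness = stroke_for_box(box)
        cv2.line(mask, start, end, 255, max(thickness, 1))
    # Fill the masked pixels from their surroundings.
    return cv2.inpaint(image, mask, 7, cv2.INPAINT_NS)
```

The geometry lives in its own small function so it can be checked without loading any models.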
After this, the next step is to copy the translated text onto the 'cleansed' image. The current implementation does not paste the translated text exactly in place of the original English text; however, I plan to do that and more in my next iterations.
My HuggingFace - Gradio app can be accessed on my account page on their website. It's open to the public and is available over here - Translate English Text to Your Regional Language In Your Forwarded Images. A demo in the form of an animation is provided below.
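Writing text onto the cleaned image can be done with Pillow. The following is a minimal sketch, assuming a fixed starting position and a simple character-based line wrap; the function names, default sizes, and the `font_path` parameter are hypothetical and not taken from the app.

```python
import textwrap


def wrap_text(text: str, width: int = 30) -> str:
    """Break text into lines of at most `width` characters."""
    return "\n".join(textwrap.wrap(text, width=width))


def draw_translation(image, text, font_path=None, font_size=32, origin=(20, 20)):
    """Draw wrapped text onto a PIL image and return the image."""
    from PIL import ImageDraw, ImageFont

    # The built-in default font only covers basic Latin; scripts such as
    # Hindi or Arabic need a TrueType font file that contains their glyphs.
    if font_path:
        font = ImageFont.truetype(font_path, font_size)
    else:
        font = ImageFont.load_default()
    ImageDraw.Draw(image).multiline_text(origin, wrap_text(text), fill="black", font=font)
    return image
```

Because the text is drawn at a fixed `origin`, this matches the limitation mentioned above: the translation does not land exactly where the original text was.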
HuggingFace Spaces is a cool new feature where anyone can host their AI models using one of two awesome SDKs - Streamlit or Gradio. Spaces are a simple way to host ML demo apps directly on your HF profile page or your organization’s profile. This empowers our ML community to create our own little ML project portfolios, to showcase our projects at conferences, to stakeholders, or to any interested parties, and to work collaboratively with other people in the ecosystem.
A few points to keep in mind for an easy passage while building a complex Gradio app like this one -
- All the required Python libraries should be listed in the requirements.txt file
- In case you have some Debian dependencies that you would normally install with sudo apt install, make sure you list those packages in the packages.txt file
- Make sure you copy all the supporting files (images/fonts) over to your app's Space repo
- Comment the code that you submit in the app.py file appropriately
- Try to load your model and tokenizers outside the inference function that you pass to gradio.Interface(). This helps speed up your app's response to the users.
- This app's app.py code can serve as inspiration in case you want to have multiple and different types of inputs and outputs (image/text/radio box etc.). It took me a while to figure out the right way.
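Two of the points above, loading models only once and mixing input types, can be sketched as follows. This is an illustration under assumptions: `get_cached`, `build_demo`, and `translate_image` are hypothetical names, and the component classes `gr.Image` and `gr.Radio` are the ones exposed by recent Gradio releases, so older versions may need a different spelling.

```python
_CACHE = {}


def get_cached(name, loader):
    """Return the object for `name`, calling `loader(name)` only on first use."""
    if name not in _CACHE:
        _CACHE[name] = loader(name)
    return _CACHE[name]


def build_demo(translate_image, languages):
    """Build an interface with an image plus a radio choice as inputs.

    `translate_image(image, language)` is a hypothetical function that
    returns the translated image.
    """
    import gradio as gr

    return gr.Interface(
        fn=translate_image,
        inputs=[
            gr.Image(type="pil"),
            gr.Radio(choices=languages, label="Output language"),
        ],
        outputs=gr.Image(type="pil"),
    )
```

Inside `translate_image`, a call such as `get_cached(model_name, load_model)` would then reuse an already loaded model on every request after the first, where `load_model` is whatever loading function you use.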
At the end of the day, strong community support helps you in learning about cool new avenues, understanding hard concepts, resolving your issues, and staying motivated to improve yourself and your skills.
There are many incredible folks out there building and helping ML communities day in and day out. I would like to take a moment and specially thank a few folks for all the effort that they put in daily. A lot of effort goes into replying to call-outs for help, writing easy-to-follow blogs, inspiring others by showcasing their own ML work, and doing everything that they do to put themselves out there. Reach out to them on Twitter and Discord over here -
- Abubakar Abid, Ali, and AK of Gradio labs
- Merve Noyan and Omar from HuggingFace
- Everyone who is active on HuggingFace Discord community
The app is still a bit rough around the edges and I plan to improve it in future iterations. For example, right now it might not process certain screenshots well, nor images in which the text is slanted a bit. I am planning to enable OCR for slanted text and for images in which text is present at multiple places. I will also be adding more languages to the mix. Lastly, I will try to insert the translated text at the same spot as in the original image, in a similar font style and font size.
Gradio helps in bridging the gap between developing your ML models and showcasing them to the world. In my humble opinion, this is a crucial step in two main themes of this new year - Democratizing AI and Productionizing AI.
My GitHub repo and code can be accessed over here - HugginFace_Gradio.
If you enjoyed this article, please feel free to connect with me on LinkedIn or Twitter. Do share your feedback and any other ML app ideas that you would want to implement yourself; I will be happy to help as much as I can.
Image source - Photo by Michał Kubalczyk on Unsplash