An application for fraud detection in medicine packages and tablets.
- Scraped the web with Beautiful Soup to gather a large dataset of images and information about medicine packages and tablets.
- Transformed the data and loaded it into an Excel file with Power Query Editor to keep the name and information of each package in a local dataset.
- Extracted text from images of medicine packages and tablets with OCR tools such as EasyOCR and Pytesseract.
- Applied Named Entity Recognition tagging to the extracted text to label each medicine by name, dosage, type and size, and trained a custom spaCy model on the processed data to predict labels on new text.
- Exported the labeled text to a CSV file and used Jaccard similarity scores to detect fraud by comparing the information on the packaging with the information on the tablets.
- Built a user-friendly dashboard with Streamlit.
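
The Jaccard comparison step can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the word-level tokenization, and the 0.5 threshold are all assumptions chosen for illustration. The score is the size of the intersection of the two token sets divided by the size of their union, so identical label text scores 1.0 and unrelated text scores close to 0.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Return |A ∩ B| / |A ∪ B| over the token sets of the two strings."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        # Two empty strings are treated as identical (an assumption).
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_suspicious(package_text: str, tablet_text: str, threshold: float = 0.5) -> bool:
    """Flag a pair whose similarity falls below a hypothetical threshold."""
    return jaccard_similarity(package_text, tablet_text) < threshold


if __name__ == "__main__":
    package = "Paracetamol 500 mg"
    tablet = "Ibuprofen 200 mg"
    print(jaccard_similarity(package, tablet))  # 1 shared token out of 5 -> 0.2
    print(is_suspicious(package, tablet))       # True
```

In practice the threshold would need tuning against known genuine and counterfeit pairs, and comparing field by field (name, dosage, type, size) rather than whole strings may give a clearer signal.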