Meme Retrieval Engine

Project Description

Technologies employed

Image Processing
Machine Learning
Natural Language Processing
Shell Scripting

Collection

The memes are collected from popular subreddits using a scraper script scrape/scraper.py

Standardization

The memes collected are put in raw folder and the script standard.py is run
Each file name is extracted and stored in a text file next to the new hex based filename generated for the image
The standardized images are stored in the processed folder

Query Extraction

The entered query is split into words and synonyms for each word is added to the list of related queries using the nltk library
We scan the database to match words with the words in related queries
This broadens the search area and minimizes zero output scenarios

Relevance to query

The memes are ordered in order of their relevance to the search query
This is done by assigning a score to each meme present in the database and then sorting in descending order of scores

Optical Character Recognition

OCR is done using Tesseract to extract text from the memes which is an essential part of the project
The extracted text are not perfectly accurate so the output from ocr is fed into the spellchecker of the Python autocorrect library
The spellchecker makes the conversion more accurate

Quick Testing

To run the GUI and test the functionalities, simply type

sudo bash run.sh

Collect and Run

To collect the memes from subreddits

sudo bash collect.sh

The bash script prepares the database which allows the Meme Engine to function properly
To run the Meme Retrieval Engine (Meme Finder) type

sudo bash run.sh

Enter the query in the text field and click on Go
The memes are sorted based on relevance
The selected memes can be browsed using the Next and Previous buttons

Add new subreddits to the list

sudo bash add.sh

Requirements

cv2 (OpenCV)
pytesseract
nltk
PIL
hashlib
shutil
autocorrect
pymongo

Future Improvements

Adding functionality to the progress bar
Correct the size scaling of memes for display on the canvas
Adding feature to flush stored memes
Storing popular meme templates and checking images for similarity and associating special keywords

Documentation

MemeFinder documentation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Meme Retrieval Engine

Project Description

Technologies employed

Collection

Standardization

Query Extraction

Relevance to query

Optical Character Recognition

Quick Testing

Collect and Run

Add new subreddits to the list

Requirements

Future Improvements

Documentation

Files

README.md

Latest commit

History

README.md

File metadata and controls

Meme Retrieval Engine

Project Description

Technologies employed

Collection

Standardization

Query Extraction

Relevance to query

Optical Character Recognition

Quick Testing

Collect and Run

Add new subreddits to the list

Requirements

Future Improvements

Documentation