imgrep

full text search your meme/screenshot folder with small LMs x traditional OCR

what does it do?

Transcribes a folder of images containing non-standard text (your folder of memes, screenshots, etc.) and makes them searchable.

how does it work?

Grabs two transcriptions per image: one with a small multimodal model (Moondream, 2B parameters) and one with conventional OCR (pytesseract). At search time, it ranks images by the average of the query's similarity to both transcriptions.
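
For illustration, a minimal sketch of the two-pass transcription (the prompt wording and wiring here are assumptions, not imgrep's actual code):

```python
# minimal sketch: two transcriptions per image
# (prompt wording and error handling are assumptions)
import ollama                      # pip install ollama; needs a running ollama server
import pytesseract                 # pip install pytesseract; needs the tesseract binary
from PIL import Image

def transcribe(path: str) -> tuple[str, str]:
    # pass 1: small multimodal model (Moondream) via the local ollama server
    model_text = ollama.generate(
        model="moondream",
        prompt="Transcribe all text in this image.",
        images=[path],
    )["response"]
    # pass 2: conventional OCR
    ocr_text = pytesseract.image_to_string(Image.open(path))
    return model_text, ocr_text
```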

The model can reconstruct deep-fried lines of text, or text made difficult to OCR by being superimposed over images; the OCR handles the majority of cases. Most images containing text can be transcribed well enough by at least one of the two to show up in search. Not a pretty solution, but it has a pretty high success rate.
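
To make the "average of similarity to both" concrete, here is a minimal sketch of the search scoring (the similarity metric and index layout are assumptions, not imgrep's actual code):

```python
# minimal sketch: rank images by the mean of the query's similarity
# to the model transcription and the OCR transcription
from difflib import SequenceMatcher

def similarity(query: str, text: str) -> float:
    # a stand-in metric in [0, 1]; imgrep may use something different
    return SequenceMatcher(None, query.lower(), text.lower()).ratio()

def score(query: str, model_text: str, ocr_text: str) -> float:
    # an image garbled for OCR can still rank well via the model text,
    # and vice versa
    return (similarity(query, model_text) + similarity(query, ocr_text)) / 2

def search(query: str, index: dict[str, tuple[str, str]]) -> list[str]:
    # index maps image path -> (model transcription, OCR transcription)
    return sorted(index, key=lambda p: score(query, *index[p]), reverse=True)
```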

how well does it work?

In active development. Basic functionality proven to work. Bugs abound.

how fast does it work?

benchmarks coming Soon(TM)

how do I install it?

```sh
curl -fsSL https://ollama.com/install.sh | sh
pip install -r requirements.txt
```

how do I run it?

```sh
ollama run moondream
python watcher_in_darkness.py
```

and then move or save your images into WATCH_PATH.

```sh
python search.py Text to query for
```

yes, you can run both simultaneously.
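
Under the hood, the watcher just has to notice new files landing in WATCH_PATH and index them. A generic sketch of that pattern (watchdog is an assumption here; watcher_in_darkness.py may be implemented differently):

```python
# generic folder-watching pattern; not imgrep's actual implementation
import time
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

WATCH_PATH = "./images"  # hypothetical value; point this at your own folder

class NewImageHandler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            # here imgrep would transcribe and index the new image
            print(f"new file: {event.src_path}")

observer = Observer()
observer.schedule(NewImageHandler(), WATCH_PATH, recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()
```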

how much compute do I need?

Less than you think. I've run Moondream inference CPU-only on an $80 laptop at ~30 seconds per image. Not optimal, and I wouldn't recommend it, but it works.

aren't there better open source models?

much better! for instance, llama3.2-vision consistently extracts text and responds with only text, even on screenshots of long paragraphs. But llama3.2-vision also takes up 4x as much disk space and needs significantly more compute and time.

this piece of software was assembled specifically to get a working, resilient pipeline without relying on the best currently available model to follow instructions accurately. If you have the GPUs on hand to run llama3.2-vision on everything you want to index, you already have access to much better solutions.

nor is this a problem you can fix by stepping up to a slightly larger model: llava-7b, for instance, performs worse than moondream across a wide variety of images when asked to extract text only.
