Does fscrawler docker image contain GPU acceleration support for OCR? #1662
-
I did a manual install some years back and had to make sure Tika or Tesseract (?) was compiled to use my GPU. I'm just wondering whether the version included in the Docker image will use my host's CUDA-enabled GPU if I pass it through. If it does, I think all I need is the following in my docker-compose.yml?
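(For reference, the usual Compose stanza for requesting host GPUs looks like the sketch below. It assumes the NVIDIA Container Toolkit is installed on the host; the service and image names are illustrative.)

```yaml
services:
  fscrawler:
    image: dadoonet/fscrawler
    # Requires the NVIDIA Container Toolkit on the host.
    # Exposes all host GPUs to the container.
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```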
-
I have absolutely no idea! If you can figure that out, documenting it (or modifying the Docker image) would be an awesome contribution.
-
So that was an adventure. It does not, by default.

To get un-containerized fscrawler to use the GPU (CUDA/OpenCL), I did a lot of mucking about on my host. Tesseract needed to be explicitly compiled with OpenCL support, and the host needed the OpenCL libraries, drivers, and so on. I did about a million things to my host system (I spent too much time fighting with ChatGPT, and not enough time just reading the actual source package docs ;) ) and finally got regular fscrawler to run on my GPU... so that every 30 seconds or so the GPU would blip up to 5% usage.

I don't think it would be worth getting this into the container. There's a whole host of extra stuff needed to pass the NVIDIA bits through to Docker so they're accessible to the container; the flag I noted above seems like pretty much the last thing you do, after a bunch of other host-system prep. And then we'd need to update the container with an OpenCL-compiled Tesseract.

My new adventure is to try to use all 12 CPU cores when running fscrawler. None of the hints from Bard or ChatGPT actually work with fscrawler (running `fscrawler -t 12` as the command, adding `threads: 12` to the _settings.yaml), but I am semi-sure I had it multi-threaded a few years back when I first tried it...
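For anyone retracing the host-side prep above, here is a rough sketch of the build. Package names are distro-specific (Debian/Ubuntu shown), and Tesseract's OpenCL support is an experimental configure-time option, so treat this as an outline rather than a recipe:

```shell
# Host OpenCL runtime, build tooling, and Tesseract's image library.
sudo apt-get install -y ocl-icd-opencl-dev clinfo \
    libleptonica-dev automake libtool pkg-config

# Sanity-check that the host actually exposes an OpenCL device.
clinfo | grep -i 'device name'

# Build Tesseract from source with the (experimental) OpenCL flag.
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure --enable-opencl
make -j"$(nproc)"
sudo make install
```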
Thanks for all the info!
No, it has never been multi-threaded, sadly. The way to bypass this today is to create as many instances of FSCrawler as you want and have each of them monitor a sub-part of your data directories.
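With the Docker image, that workaround could look something like the sketch below: one service per sub-directory, each with its own job. The job names, host paths, and mount points here are illustrative assumptions, not a prescribed layout:

```yaml
services:
  fscrawler_docs:
    image: dadoonet/fscrawler
    command: fscrawler job_docs       # config/job_docs/_settings.yaml
    volumes:
      - ./config:/root/.fscrawler
      - /data/docs:/tmp/es:ro         # this instance only sees /data/docs
  fscrawler_mail:
    image: dadoonet/fscrawler
    command: fscrawler job_mail       # config/job_mail/_settings.yaml
    volumes:
      - ./config:/root/.fscrawler
      - /data/mail:/tmp/es:ro         # this instance only sees /data/mail
```

Each instance crawls independently, so the work is spread across processes (and therefore cores) even though a single FSCrawler process stays single-threaded.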