Simple hassle-free image processing using OpenCV for large amount of images concurrently in C++.
This program can apply a list of filters on a group of images concurrently. For example, you can change red color to green, then yellow color to blue and then apply a gaussian blur in all images inside a directory. These operations will be in the most possible efficient way. Both image groups and every single filters on individual images are processed in parallel and concurrently.
Created using modern C++ 20 features to utilize concurrent and parallel programming and managing threads using asynchronous execution, futures and TBB lib.
No manual thread handling, no locks, no mutex, no semaphore.
- Parallel loading of images from directory
- Parallel manipulation of images in OpenCV
- Parallel manipulation of every single filter on individual images
- Preserve order of filters when applying concurrently
- Parallel saving of images to output directory
- Ability to buffer IO
- Asynchronous
- Using futures
- Using TBB
- Concurrent and parallel
- Sequential version for comparison
- No lock, mutex or semaphore
- No race conditions or deadlocks
- Simple and understandable code
- Documented code
For comparison, you can see a sequential manipulation of images takes too long:
mass-image-processing-sequential.mp4
But when using parallel execution everything is blazing fast:
mass-image-processing-parallel.mp4
This project is created using OpenCV and TBB libs, so you need to have those in your $PATH
in order to compile the
project.
After cloning repo simply run:
cmake .
make
Then you can find the mass_image_processing
in your current directory.
A minimal usage would look like this:
mass_image_processing ./imgs-in ./imgs-out
Where imgs-in
is a directory containing as many as jpeg images as you want and imgs-out
is an empty directory to
put the manipulated images.
There is also two more arguments that you can pass to the binary. First one is concurrency type and second one is IO type.
Concurrency type:
-p
(default) for parallel processing-s
for sequential processing
IO type:
-nb
(default) to not use buffered IO-b
to use buffered IO
Parameters order matters:
Usage: mass_image_processing <input_dir> <output_dir> [-p,-s] [-nb,-b]
-p: parallel processing (default)
-s: sequential processing
-nb: non-buffered I/O (default)
-b: buffered I/O
Run parallel processing with non-buffered IO (default):
mass_image_processing ./imgs-in ./imgs-out
or
mass_image_processing ./imgs-in ./imgs-out -p -nb
Run parallel processing with buffered IO:
mass_image_processing ./imgs-in ./imgs-out -p -b
Run sequential processing with non-buffered IO:
mass_image_processing ./imgs-in ./imgs-out -s -nb
Run sequential processing with buffered IO:
mass_image_processing ./imgs-in ./imgs-out -s -b
Depending on how you want to process images it will have a major effect on how your system resources will be utilized and how much time it will take to run the job.
We will compare all 4 possibilities below with different number of images.
All images are different from each other and are high-quality 4k images which every one of them has a size around 10MB.
These benchmarks are generated in an Arch Linux pc with 20 cores and 64GB of RAM.
Running with 10 images will be like this:
Parallel non-buffered | Parallel buffered | Sequential non-buffered | Sequential buffered |
---|---|---|---|
1.38 secs | 1.46 secs | 10.38 secs | 10.76 secs |
Running with 100 images will be like this:
Parallel non-buffered | Parallel buffered | Sequential non-buffered | Sequential buffered |
---|---|---|---|
20.39 secs | 20.28 secs | 170.40 secs | 170.22 secs |
Running with 1000 images will be like this:
Parallel non-buffered | Parallel buffered | Sequential non-buffered | Sequential buffered |
---|---|---|---|
131.16 secs | 129.61 secs | 16.99 mins | 16.93 mins |
By comparing runtime results we can see the parallel version is about 8 times faster than the sequential version.
Sequential version used a fixed amount of CPU and RAM because every image was roughly the same size and there was no parallel execution. Even one percent of CPU and RAM was not acquired by sequential version, so many of the system resources was not utilized and remained useless. But on the other hand the parallel version used as much CPU and RAM as it could acquire without degrading the performance of other parts of the system. This is because C++ asynchronous programming will manage threads in the most efficient way possible.
Enabling or disabling buffered IO seems to not much affect the performance, but when number of images increases it improves the performance. So we can say for low number of images (e.g. less than 1000) it's better to use non-buffered version but for higher amounts of images it's better to use buffered IO.