Multithreading HowTo #8

Open
masc4ii opened this issue Aug 10, 2019 · 29 comments

@masc4ii

masc4ii commented Aug 10, 2019

Thank you for sharing this great resizing library! Results look so good!
Now I would like to make it a little faster using multithreading, but I don't know how. In your documentation I found the class CImageResizerThreadPool... is this the right class for multithreading? Unfortunately I have no idea how to use it. Could you please share a small example of how to do multithreaded resizing with your library? Thanks in advance for your help!

@avaneev
Owner

avaneev commented Aug 10, 2019

It is an extremely simple front-end. You should first learn how "worker thread pools" work in general, then you will be able to implement it.

My own implementation looks like this, but it needs a custom thread pool and worker thread objects that call Workload->process().

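// CWorkerThreadPool, CResizeThread, CSystem and VOXERRSKIP are custom
// classes/macros that are not shown here; substitute your own equivalents.
class CThreadPool : public avir :: CImageResizerThreadPool,
	public CWorkerThreadPool
{
public:
	int MaxThreadCount; // The number of threads to use; <= 0 selects the processor count.

	// Tells the resizer how many workloads to create.
	virtual int getSuggestedWorkloadCount() const
	{
		return( MaxThreadCount <= 0 ? CSystem :: getProcessorCount() :
			MaxThreadCount );
	}

	// Wraps each workload in a worker thread that calls Workload -> process().
	virtual void addWorkload( CWorkload* const Workload )
	{
		VOXERRSKIP( add( new CResizeThread( Workload )));
	}

	virtual void startAllWorkloads()
	{
		startAll();
	}

	virtual void waitAllWorkloadsToFinish()
	{
		VOXERRSKIP( waitAllForFinish() );
	}

	virtual void removeAllWorkloads()
	{
		removeAllThreads();
	}
};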
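
For readers without a custom thread pool, here is a minimal sketch of the same idea using only std::thread. It is not the library's code: `WorkloadIface` and `ThreadPoolIface` are hypothetical stand-ins that mirror the five methods shown above, and `SlotWorkload` exists only for demonstration. In a real project you would presumably include avir.h and derive from avir::CImageResizerThreadPool instead. The sketch starts one thread per workload and keeps the workload list until removeAllWorkloads() is called.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the AVIR workload and thread pool interfaces,
// reduced to the methods named in this thread.
struct WorkloadIface
{
    virtual ~WorkloadIface() = default;
    virtual void process() = 0;
};

struct ThreadPoolIface
{
    virtual ~ThreadPoolIface() = default;
    virtual int getSuggestedWorkloadCount() const = 0;
    virtual void addWorkload(WorkloadIface* workload) = 0;
    virtual void startAllWorkloads() = 0;
    virtual void waitAllWorkloadsToFinish() = 0;
    virtual void removeAllWorkloads() = 0;
};

// Demonstration workload (an assumption): stores a value into a slot.
struct SlotWorkload : WorkloadIface
{
    int* slot;
    int value;
    SlotWorkload(int* s, int v) : slot(s), value(v) {}
    void process() override { *slot = value; }
};

class StdThreadPool : public ThreadPoolIface
{
public:
    explicit StdThreadPool(int maxThreadCount = 0)
        : maxThreadCount_(maxThreadCount) {}

    ~StdThreadPool() override
    {
        // Never destroy joinable threads.
        for (std::thread& t : threads_) t.join();
    }

    int getSuggestedWorkloadCount() const override
    {
        if (maxThreadCount_ > 0) return maxThreadCount_;
        const unsigned hw = std::thread::hardware_concurrency(); // may be 0
        return hw > 0 ? static_cast<int>(hw) : 1;
    }

    void addWorkload(WorkloadIface* workload) override
    {
        assert(workload != nullptr);
        workloads_.push_back(workload);
    }

    void startAllWorkloads() override
    {
        assert(threads_.empty()); // previous run must have been waited for
        for (WorkloadIface* w : workloads_)
            threads_.emplace_back([w] { w->process(); });
    }

    void waitAllWorkloadsToFinish() override
    {
        for (std::thread& t : threads_) t.join();
        threads_.clear(); // threads are gone, workloads stay registered
    }

    void removeAllWorkloads() override
    {
        assert(threads_.empty());
        workloads_.clear();
    }

private:
    int maxThreadCount_;
    std::vector<WorkloadIface*> workloads_;
    std::vector<std::thread> threads_;
};
```

Spawning fresh threads on every start is simple but not the cheapest option; a persistent pool would avoid the repeated thread creation.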

@masc4ii
Author

masc4ii commented Aug 11, 2019

Thanks for your answer. In the past I implemented a thread pool for another project, so I will check whether it helps here too. What I have not understood yet: does this strategy a) render a single picture faster, or b) render e.g. 4 pictures in nearly the same time as one (on a quad-core CPU)?
If a): I don't see how to divide the picture into parts. Wouldn't that be necessary somehow?

@avaneev
Owner

avaneev commented Aug 11, 2019

On a 4-core processor the resizing speed increases by a factor of 3.2, so it does help to resize a single image faster. The algorithm divides the image automatically.
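
To give a feel for what "dividing the image" can mean, here is a generic sketch that splits the scanlines of an image into contiguous bands, one per workload. How AVIR actually partitions its work is not described in this thread, so `RowBand` and `splitRows` are hypothetical names and the scheme is only an illustration.

```cpp
#include <cassert>
#include <vector>

// A contiguous range of scanlines: [first, first + count).
struct RowBand
{
    int first;
    int count;
};

// Splits `rows` scanlines into `parts` contiguous bands whose sizes
// differ by at most one row.
std::vector<RowBand> splitRows(int rows, int parts)
{
    assert(rows >= 0 && parts > 0);
    std::vector<RowBand> bands;
    bands.reserve(parts);
    const int base = rows / parts;  // rows every band gets
    const int extra = rows % parts; // leading bands that get one more
    int first = 0;
    for (int i = 0; i < parts; ++i)
    {
        const int count = base + (i < extra ? 1 : 0);
        bands.push_back({first, count});
        first += count;
    }
    return bands;
}
```

For example, splitRows(10, 4) yields bands of 3, 3, 2 and 2 rows. Each band could then be handed to one workload, which is why the pool only has to run whatever workloads it is given.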

@Ptomaine

Hello Aleksey,

Unfortunately, I've been unable to re-scale an image with my thread pool.

The result is the striped picture:
image

But it's good when re-scaled with a single thread:
image

The code of the re-scaling thread pool is the following:

using thread_pool_base = thread_pool;
class avir_scale_thread_pool : public avir::CImageResizerThreadPool, public thread_pool_base
{
public:
    virtual int getSuggestedWorkloadCount() const override
    {
        return thread_pool_base::size();
    }

    virtual void addWorkload(CWorkload *const workload) override
    {
        _workloads.push(workload);
    }

    virtual void startAllWorkloads() override
    {
        while (!std::empty(_workloads))
        {
            _tasks.emplace_back(thread_pool_base::enqueue([](auto workload){ workload->process(); }, _workloads.front()));
            _workloads.pop();
        }
    }

    virtual void waitAllWorkloadsToFinish() override
    {
        for (auto &task : _tasks) task.wait();
    }

private:
    std::deque<task_future<void>> _tasks;
    std::queue<CWorkload*> _workloads;
};

Could you please help me investigate why the result differs from what I expected?
Thanks in advance!

@avaneev
Owner

avaneev commented Aug 17, 2019

How do you initialize the thread pool?

@avaneev
Owner

avaneev commented Aug 17, 2019

Which thread library are you using?

@avaneev
Owner

avaneev commented Aug 17, 2019

It looks like not all threads are actually being executed, possibly because of a mistake in the workload queue.

@avaneev
Owner

avaneev commented Aug 17, 2019

You also probably need to remove items from _tasks if they are not autoremoved.

@Ptomaine

Here is the library that I use (attached to the message).
thread_pool.txt

Just rename it to *.hpp

@Ptomaine

The thread pool is utilized like this:

    nstd::avir_scale_thread_pool scaling_pool;
    nstd::avir::CImageResizerVars vars; vars.ThreadPool = &scaling_pool;
    nstd::avir::CImageResizerParamsUltra roptions;
    nstd::avir::CImageResizer<fpclass_dith> image_resizer { 8, 0, roptions};
    image_resizer.resizeImage(image, width, height, 0, new_image.get(), new_width, new_height, channels, 0, &vars);

@Ptomaine

Removing tasks didn't help:

    virtual void removeAllWorkloads()
    {
        _tasks.clear();
    }

@avaneev
Owner

avaneev commented Aug 17, 2019

Make sure thread_pool_base::size() returns the correct value: it should be the number of processors in the system. I doubt that the thread pool actually runs all workloads, so make sure it is functioning correctly. Test it by replacing workload->process(); with something like printf( "thread started\n" );. It should print this string thread_pool_base::size()*2 times.

@Ptomaine

It does. It returns the right number of cores.

@Ptomaine

the size of pool: 16 Workload... Workload... Workload... Workload... Workload... Workload... Workload... Workload... Workload... Workload... Workload... Workload... Workload... Workload... Workload...

@Ptomaine

It prints 15 times...

@Ptomaine

I changed the default value from:

std::max(std::thread::hardware_concurrency(), 2u) - 1u

to

std::max(std::thread::hardware_concurrency(), 2u)

It didn't change anything. The picture is still striped.

@Ptomaine

Ptomaine commented Aug 17, 2019

Okay. I've fixed it!
I looked into your code and saw that it runs the same workloads twice.
My mistake was removing the workloads right after the first execution.
The proper thread pool looks like this:

using thread_pool_base = thread_pool;
class avir_scale_thread_pool : public avir::CImageResizerThreadPool, public thread_pool_base
{
public:
    virtual int getSuggestedWorkloadCount() const override
    {
        return thread_pool_base::size();
    }

    virtual void addWorkload(CWorkload *const workload) override
    {
        _workloads.emplace_back(workload);
    }

    virtual void startAllWorkloads() override
    {
        for (auto &workload : _workloads) _tasks.emplace_back(thread_pool_base::enqueue([](auto workload){ workload->process(); }, workload));
    }

    virtual void waitAllWorkloadsToFinish() override
    {
        for (auto &task : _tasks) task.wait();
    }

    virtual void removeAllWorkloads() override
    {
        _tasks.clear();
        _workloads.clear();
    }

private:
    std::deque<std::future<void>> _tasks;
    std::deque<CWorkload*> _workloads;
};
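
The difference between the two versions can be checked in isolation, without AVIR and without the attached thread pool library. The sketch below is an assumption-laden simulation: `Workload`, `Pool`, `CountingWorkload`, `PoppingPool`, `KeepingPool` and `countProcessCalls` are hypothetical names, std::async stands in for the pool's enqueue(), and the caller is assumed to start and wait for the same workloads twice before removing them, as described above.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <queue>
#include <vector>

struct Workload
{
    virtual ~Workload() = default;
    virtual void process() = 0;
};

// Counts how often process() runs instead of resizing anything.
struct CountingWorkload : Workload
{
    std::atomic<int>* counter;
    explicit CountingWorkload(std::atomic<int>* c) : counter(c) {}
    void process() override { counter->fetch_add(1); }
};

struct Pool
{
    virtual ~Pool() = default;
    virtual void addWorkload(Workload* w) = 0;
    virtual void startAllWorkloads() = 0;
    virtual void waitAllWorkloadsToFinish() = 0;
    virtual void removeAllWorkloads() = 0;
};

// First attempt: the queue is drained on start, so a second start is a no-op.
class PoppingPool : public Pool
{
public:
    void addWorkload(Workload* w) override { workloads_.push(w); }
    void startAllWorkloads() override
    {
        while (!workloads_.empty())
        {
            Workload* w = workloads_.front();
            workloads_.pop();
            tasks_.push_back(std::async(std::launch::async, [w] { w->process(); }));
        }
    }
    void waitAllWorkloadsToFinish() override
    {
        for (auto& t : tasks_) t.wait();
        tasks_.clear();
    }
    void removeAllWorkloads() override { workloads_ = {}; }

private:
    std::queue<Workload*> workloads_;
    std::vector<std::future<void>> tasks_;
};

// Fixed version: workloads stay registered until removeAllWorkloads().
class KeepingPool : public Pool
{
public:
    void addWorkload(Workload* w) override { workloads_.push_back(w); }
    void startAllWorkloads() override
    {
        for (Workload* w : workloads_)
            tasks_.push_back(std::async(std::launch::async, [w] { w->process(); }));
    }
    void waitAllWorkloadsToFinish() override
    {
        for (auto& t : tasks_) t.wait();
        tasks_.clear();
    }
    void removeAllWorkloads() override { workloads_.clear(); }

private:
    std::vector<Workload*> workloads_;
    std::vector<std::future<void>> tasks_;
};

// Simulated caller (an assumption): two start/wait passes over the same workloads.
int countProcessCalls(Pool& pool, int workloadCount)
{
    assert(workloadCount > 0);
    std::atomic<int> counter{0};
    std::vector<CountingWorkload> jobs;
    jobs.reserve(workloadCount);
    for (int i = 0; i < workloadCount; ++i) jobs.emplace_back(&counter);
    for (CountingWorkload& j : jobs) pool.addWorkload(&j);
    for (int pass = 0; pass < 2; ++pass)
    {
        pool.startAllWorkloads();
        pool.waitAllWorkloadsToFinish();
    }
    pool.removeAllWorkloads();
    return counter.load();
}
```

With 4 workloads, the draining pool runs process() 4 times while the keeping pool runs it 8 times, which matches the "size()*2" check suggested earlier in the thread.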

@avaneev
Owner

avaneev commented Aug 17, 2019

I'm glad you've got it working. I'll leave this issue open for others to learn from.

@Ptomaine

Thank you!

@masc4ii
Author

masc4ii commented Aug 19, 2019

Thanks to all! Got it working now too!

@avaneev
Owner

avaneev commented May 2, 2021

@masc4ii Hello! I've registered myself on the magiclantern.fm forum, and posted a couple of messages there, still waiting for moderation approval. Anyway, to get the information faster to you, here's what I've posted there:

"Hi! Have the MLV App authors tried to apply non-linear "saturation" image transformations in a higher resolution, with a later downsizing step? This is not a common technique, but from the DSP standpoint it should look much better. "Aliasing" is not the whole story like in image resizing, there's also "harmonic distortion", which is not as apparent with images as it is with audio. Maybe worth a try.

A follow-up: the same actually applies to "linearization" or sRGB->linear conversion. It's a non-conventional approach and is resource-heavy, but probably it will fix the feel of all these gamma corrections being "not right"."

I would like to add that doing e.g. a 3x upsampling followed by a 3x downsampling is completely safe with AVIR regarding the dynamic range. What is affected is the frequency response, but this is mostly "visible" when resizing smaller images. In most cases, any high-resolution photo already lacks the highest frequencies, mainly due to the limitations of the lenses.

@avaneev
Owner

avaneev commented May 2, 2021

@masc4ii It's even a lot safer with a 2x upsample-downsample cycle (it retains an unbelievable 120 dB range), but for best results I think 3x is needed.

@avaneev
Owner

avaneev commented May 2, 2021

@masc4ii To optimize things in the pipeline, it may be useful to upsample first, then run the whole processing pipeline, then downsample. I do this in professional audio software; it's a very important feature for the users.

@masc4ii
Author

masc4ii commented May 2, 2021

Hi @avaneev ,

thank you! We'll see when the mods enable your account... 😄 ...yes, here I can read already.

Until now, all operations are done at the original resolution of the RAW image. For the final export, after processing, we upscale with AVIR and encode with ffmpeg. I can imagine that doing all those processing operations on an upscaled image could give better quality, but that would probably slow down the processing a lot, or am I wrong? One of the biggest downsides of our app for most users currently is processing time. In your example (3x upscaling) I would expect the application to be ~9x slower. Is that correct, or did I misunderstand?
What exactly do you mean by the sRGB->linear conversion? As far as I know, we do the opposite: RAW sensor data is linear and we convert it to "something good-looking or useful" like sRGB or those Log profiles.
@ilia3101 : most of the processing was your work - what do you say?! 👀

Do you own a ML capable camera? If you want, I could send you some MLV samples, so you could play with the app a little - if you like.

Thanks so much for your ideas!

@avaneev
Owner

avaneev commented May 2, 2021

Yes, that will slow down all processing, maybe not by a factor of 9, but easily by a factor of 7. The linear->sRGB conversion is the same situation: it's a non-linear sample mapping, so my proposal applies there, too.

I do not have an ML capable camera now, but I did a lot of "photography" in the past, when I had a mid-range Canon EOS camera and a couple of lenses. You can see my "photoworks" on my audio samples product pages: https://www.voxengo.com/group/drum-samples/

@avaneev
Owner

avaneev commented May 2, 2021

@masc4ii This option can be made switchable, e.g. 1x, 2x or 3x oversampling. For short videos or nightly renders one could select 2x or 3x, 1x for other cases. This is a "transparent" option, it does not change things too much, but probably improves perceived dynamic range and "smoothness" of the footage.
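
The upsample, process, downsample idea with a switchable factor can be sketched on a single row of samples. This is only an illustration under loud assumptions: `upsampleLinear` and `downsampleBox` are crude hypothetical stand-ins for a proper resizer such as AVIR (they are not AVIR calls), and `mapOversampled` is a made-up name. The point is the order of operations, not the filter quality.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Crude linear-interpolation upsampler (stand-in for a real resizer).
std::vector<double> upsampleLinear(const std::vector<double>& in, int factor)
{
    assert(factor >= 1 && !in.empty());
    std::vector<double> out;
    out.reserve(in.size() * static_cast<std::size_t>(factor));
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const double a = in[i];
        const double b = in[i + 1 < in.size() ? i + 1 : i]; // clamp at the edge
        for (int k = 0; k < factor; ++k)
            out.push_back(a + (b - a) * k / factor);
    }
    return out;
}

// Crude box-average downsampler (stand-in for a real resizer).
std::vector<double> downsampleBox(const std::vector<double>& in, int factor)
{
    assert(factor >= 1);
    const std::size_t f = static_cast<std::size_t>(factor);
    assert(in.size() % f == 0);
    std::vector<double> out(in.size() / f);
    for (std::size_t i = 0; i < out.size(); ++i)
    {
        double sum = 0.0;
        for (std::size_t k = 0; k < f; ++k) sum += in[i * f + k];
        out[i] = sum / factor;
    }
    return out;
}

// Applies a non-linear per-sample curve at `factor` times the resolution,
// then returns to the original sample count. factor == 1 maps directly.
std::vector<double> mapOversampled(const std::vector<double>& in, int factor,
                                   const std::function<double(double)>& curve)
{
    std::vector<double> work = upsampleLinear(in, factor);
    for (double& v : work) v = curve(v); // the non-linear step
    return downsampleBox(work, factor);
}
```

A call such as mapOversampled(row, 3, [](double v) { return std::sqrt(v); }) applies a gamma-like curve at 3x. In this 1D sketch the working buffer grows by the factor itself; for a 2D image it would grow by the factor squared, which is where the ~9x estimate for 3x comes from.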

@avaneev
Owner

avaneev commented May 2, 2021

@masc4ii One more note: my subjective feeling is that an image becomes noticeably "more vivid" when non-linear sample mapping is applied at an increased resolution, and indeed it looks "smoother", but not in a "blurry" sense. The dynamic range improvement is not very perceptible to me.

@avaneev
Owner

avaneev commented May 2, 2021

@masc4ii I would also say it looks more "cinematic", closer to "vintage" than to the "eye-popping" crispness of "modern".

@avaneev
Owner

avaneev commented May 2, 2021

@masc4ii Of course, it's important to apply any saturation/gamma transformation before the final resize; applying it afterwards won't make things look much better.


3 participants