Draft: Coroutines article #300

Open · wants to merge 11 commits into master
Conversation

koniarik (Contributor)

As discussed over email, I am making a PR with a draft of an article about coroutines. I am looking forward to the cooperation!

P.S.: the article's date is intentionally set to 2024, feel free to suggest a better date.

@bahildebrand (Member) left a comment

Overall I like the idea of the article. I think our readers would be interested in a more focused look at how to implement coroutines for asynchronous peripheral IO, and what the benefits over traditional threaded concurrency would look like. Maybe you could spin up an asynchronous driver for an MCU peripheral and use it as an example?

Let me know if you have any questions about any of my feedback.


Approach 1) has multiple potential issues: it might take a lot of code (a lot of state variables) to implement a complex exchange of data this way, and the approach is problematic, as there is a chance that one of the steps might take longer than expected and we can't prevent that.

Approach 2) has another set of issues. Each thread requires its own stack space (which might not scale), and we get all the problems of parallelism: exchanging data between threads can suffer from various concurrency issues.
Member

I'm not sure I agree with the thought that using coroutines avoids concurrency issues. Coroutines may not be parallel, but they are certainly concurrent. I think the main advantage would be the first part of this sentence: savings on stack size and context-switch penalties.

Contributor Author

Eeeeeh, it does. The point of coroutines is that you make a context switch at specific points in the code that are controlled by you, while in normal FreeRTOS-like threading the context switch can happen any time the timer fires and switches the threads.

This means that you do not have to worry about things like strictly atomic access to shared variables. Let's assume that `i` is a variable shared between parts of the system that execute concurrently. This is unsafe with threads, but is perfectly OK in the case of coroutines:

int j = i;
j += 1;
i = j;

That is, with threads a context switch can happen at any point in time, at any instruction, and you DO have to think about it and take care.
With coroutines, a context switch happens ONLY where you explicitly write in the code that the coroutine should suspend itself.
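
To make that concrete, here is a minimal sketch (not code from the article; the `task` and `yield_point` types and the `increment` function are illustrative names only) showing that the only place another coroutine can run is the explicit `co_await`:

```cpp
#include <coroutine>
#include <exception>

// Awaitable that always suspends; resuming is up to whoever holds the handle.
struct yield_point {
    bool await_ready() const noexcept { return false; }
    void await_suspend(std::coroutine_handle<>) const noexcept {}
    void await_resume() const noexcept {}
};

// Bare-bones coroutine return type, just enough to make the example compile.
struct task {
    struct promise_type {
        task get_return_object() {
            return {std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };
    std::coroutine_handle<promise_type> handle;
    ~task() { if (handle) handle.destroy(); }
};

int i = 0;  // shared between cooperatively scheduled coroutines

task increment() {
    // No other coroutine can run between these three statements:
    // the only suspension point in this body is the explicit co_await below.
    int j = i;
    j += 1;
    i = j;
    co_await yield_point{};  // control leaves this coroutine ONLY here
}

int main() {
    task t = increment();
    t.handle.resume();  // runs up to the co_await
    t.handle.resume();  // runs from the co_await to the end
}
```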

For me this is one of the biggest motivations to use coroutines over threads in general, as the mental overhead of managing concurrent access with this explicit cooperation is much smaller than with threads.

Anyway, I will try to re-think this part and find a better way to express it :)


That is, `coroutine` is just a `function` that can interrupt itself and be resumed by the caller.

### What does it look like
Member

It would be good to see some references throughout this section.

Contributor Author

I added a general reference to cppreference's coroutines page; I will think about this some more wherever I want to go into more detail. (Generally, cppreference is a good reference.)

Comment on lines +206 to +207
Coroutines have `allocator` support: we can provide the coroutine with an allocator that is used to get memory for the coroutine and hence avoid dynamic allocation (the approach I would suggest).
This is done by implementing custom `operator new` and `operator delete` on the `promise_type`, which allocate the `entire frame`.
Member

Is the suggestion to roll our own allocator? I'll admit I'm not very familiar with C++ coroutines. More examples and references here would be appreciated.

Contributor Author

Understood, I will think about expanding this part more clearly. Generally, you can use whatever you want to get/free memory, that is: malloc/free, your own memory resource, allocators, or anything else.
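
To make the hook itself more concrete, here is a minimal sketch of `operator new`/`operator delete` on the `promise_type`. The `frame_pool` and `pool_task` names are made up for illustration (the pool is just a trivial bump allocator); any malloc/free, memory resource, or allocator would plug into the same two functions:

```cpp
#include <coroutine>
#include <cstddef>
#include <exception>
#include <new>

// Stand-in for "whatever you want to get/free memory": a static bump
// allocator that never reuses freed frames. Illustration only.
struct frame_pool {
    alignas(std::max_align_t) static inline std::byte storage[1024];
    static inline std::size_t used = 0;

    static void* allocate(std::size_t n) {
        // round up so the next allocation stays max-aligned
        n = (n + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
        if (used + n > sizeof(storage)) throw std::bad_alloc{};
        void* p = storage + used;
        used += n;
        return p;
    }
    static void deallocate(void*, std::size_t) noexcept {}
};

struct pool_task {
    struct promise_type {
        // The compiler calls these to allocate/free the *entire* coroutine
        // frame, so no general-purpose dynamic allocation is involved.
        static void* operator new(std::size_t size) { return frame_pool::allocate(size); }
        static void operator delete(void* p, std::size_t size) noexcept {
            frame_pool::deallocate(p, size);
        }

        pool_task get_return_object() {
            return {std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    std::coroutine_handle<promise_type> handle;
    ~pool_task() { if (handle) handle.destroy(); }
};
```

The same hooks can also forward to `malloc`/`free`, or take the coroutine's own arguments as extra parameters of `operator new`, which is how an allocator passed to the coroutine as an argument can be picked up.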

Comment on lines +209 to +210
An alternative is to rely on the `HALO` optimization (Heap Allocation elision Optimization), which applies if the coroutine is implemented correctly and the parent function executes the entire coroutine in its context.
The compiler can then optimize away the dynamic allocation and just store the frame on the stack of the coroutine's parent.
Member

To me this seems cleaner than a custom allocator. Could you elaborate on why you prefer the custom allocator solution?

Contributor Author

I did update this in the article.

What I would suggest (and what I do) is to use coroutines for long-term processes and just build them during the initialization of the device.
Or just live with the dynamic memory.

(Note: if I could dream a bit here, I would want an explicit way of forcing the coroutine to live on the stack, which would have clearly defined behaviour of when and how it should happen, and with proper compiler errors if one fails to do so)
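
A rough sketch of that "build them during initialization" pattern, reusing the illustrative `task` type from the earlier sketch (so this is not standalone code, and the LED/superloop details are made up):

```cpp
// Reuses the illustrative `task` type from the sketch above; names are made up.
task blink_forever() {
    for (;;) {
        // ... toggle an LED, kick off a transfer, etc. ...
        co_await std::suspend_always{};   // yield back to the main loop
    }
}

// The single frame allocation happens here, once, during device initialization.
task blinker = blink_forever();

int main() {
    for (;;) {                    // superloop
        blinker.handle.resume();  // resume the long-lived coroutine each pass
    }
}
```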
Member

Yeah, this would be the preferred way :(. It might help to flesh out the HALO optimization more and why it isn't ideal, since by your description it looks like it accomplishes this.

Contributor Author

I will work on this more to make sure it is explained properly. HALO is an optimization... you can't force the compiler to do it, the compiler usually won't tell you why it did or did not do it, and it won't even tell you how exactly it decides.


#### Frame size is big

One more invisible caveat that appeared is that GCC is not yet smart enough about the frame size.
Member

Just a GCC problem, or do other compilers have this issue as well?

Contributor Author

I will try to check this
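
One way to check it (a sketch I'd use, not something from the article; `sized_task` and `probe` are made-up names): the frame size is exactly the `size` the compiler passes to the promise's `operator new`, so logging it lets the same source be compared on GCC and Clang:

```cpp
#include <coroutine>
#include <cstdio>
#include <cstdlib>
#include <exception>

// A coroutine type whose only job is to print the frame size the compiler
// asks for, so compilers can be compared on identical source.
struct sized_task {
    struct promise_type {
        static void* operator new(std::size_t size) {
            std::printf("coroutine frame size: %zu bytes\n", size);
            return std::malloc(size);
        }
        static void operator delete(void* p) noexcept { std::free(p); }

        sized_task get_return_object() {
            return {std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };
    std::coroutine_handle<promise_type> handle;
    ~sized_task() { if (handle) handle.destroy(); }
};

sized_task probe() {
    char buffer[64] = {};            // locals that live across a suspension point
    co_await std::suspend_always{};
    (void)buffer;
}

int main() {
    sized_task t = probe();   // prints the frame size on construction
    t.handle.resume();
    t.handle.resume();
}
```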

Comment on lines 285 to 286
Given the bus nature of `i2c`, it is also quite easy to achieve sharing of the `i2c bus` with multiple `device drivers` for various devices on the bus itself.
We can just implement the `i2c_coroutine round_robin_run(std::span<i2c_coroutine> coros)` coroutine that uses round robin to share access to the peripheral between multiple coroutines (devices).
Member

I think this section would be very interesting to elaborate on: other strategies for handling concurrency in this scenario, and how you would implement them.

Contributor Author

I added a round-robin implementation, will think about others.
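
For illustration only, one possible shape of such a runner, simplified to a plain function over `std::coroutine_handle<>` rather than the article's `i2c_coroutine` type (which isn't shown here):

```cpp
#include <coroutine>
#include <span>

// Resume each still-running coroutine in turn, so every device driver sharing
// the bus gets one slot per pass. Assumes each driver coroutine suspends
// itself after it is done with its current bus transaction.
inline void round_robin_run(std::span<std::coroutine_handle<>> coros) {
    bool any_running = true;
    while (any_running) {
        any_running = false;
        for (auto h : coros) {
            if (h && !h.done()) {
                h.resume();          // runs until the driver's next suspension point
                any_running = true;
            }
        }
    }
}
```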
