-
-
Notifications
You must be signed in to change notification settings - Fork 780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Alternate Manchester SWO implementation using DMA #1960
base: main
Are you sure you want to change the base?
Conversation
513b832
to
669659e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't got too far into reviewing the DMA code as the review notes we have are already going to be a fair bit of work and with them resolved reviewing the rest of the implementation will be easier, so we're submitting this as an early review.
@@ -58,7 +58,6 @@ bool swo_itm_decoding = false; | |||
uint8_t *swo_buffer; | |||
uint16_t swo_buffer_read_index = 0U; | |||
uint16_t swo_buffer_write_index = 0U; | |||
_Atomic uint16_t swo_buffer_bytes_available = 0U; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might have seemed redundant, but please consider what happens when the two indicies are equal to each other after the buffer fills up - this particular bug has happened when a target is particularly bursty at higher frequencies, and results in a full buffer of data getting dropped when just checking the indexes aren't equal. This is why this variable was introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have. It's not about redundancy - other than the (not insignificant) performance penalty, maintaining the extra variable introduces a higher chance of getting things wrong. When swo_buffer_bytes_available crossed the size of the buffer, it wasn't handled either - it just sent the buffered data twice. Just using the indexes on the other hand tends to "drop" the full buffer. The stream is corrupted in both cases, but I'd argue sending less data and recovering faster is preferable to sending more bad data.
@@ -130,13 +130,11 @@ void swo_uart_deinit(void) | |||
{ | |||
/* Disable the UART and halt DMA for it, grabbing the number of bytes left in the buffer as we do */ | |||
usart_disable(SWO_UART); | |||
const uint16_t space_remaining = dma_get_number_of_data(SWO_DMA_BUS, SWO_DMA_CHAN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TRM didn't make it clear if disabling the channel reset the value read to the initial value programmed - this will need testing particularly on a GD32F1-based probe to ensure this doesn't cause an unforced error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue it can not, it would violate a basic principle of DMA - disabling the channel is not doing anything, not even stopping pending operations, it just masks future requests. Re-enabling it has to continue where it left off.
I have moved it below jsut to make sure transfers are not going to continue for the few cycles after the readout, but I can keep it where it is.
src/platforms/native/Makefile.inc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include these changes in the Meson build system too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to, hopefully it works
// a proper break in the pulse sequence | ||
//#define SWO_ADVANCED_RECOVERY 1 | ||
|
||
#define FORCE_INLINE inline __attribute__((always_inline)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not do this. Let the compiler choose by just saying inline
. This actively pesimises the inlining vs code size balance and should only be done when absolutely required. Let the compiler do its job otherwise please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found another way to do this, but I cannot agree with the general sentiment :) I also prefer to not use force inlining, but in this case the compiler chose not to inline even when inlining made the code several times faster and smaller. There is a reason this attribute exists.
It might be unfortunate, but writing this kind of extreme performance code is an endless loop of making the compiler generate the assembly one would write on their own (while still being portable to other CPU architectures, maybe with less optimal performance). At the same time, it is also true the compiler is better at ordering the instructions to make better use of various CPU pipeline stages which makes it generally faster than handwritten assembly, not to mention more maintainable. But the optimization heuristics are simply not able to understand the intent/requirements in all cases.
uint16_t rx, t, q; | ||
uint8_t s; | ||
int32_t b; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use expressive variable names - what are these, what do they mean?! We are not limited on variable name length, so please use the name to describe the purpose.
It might seem a little harsh, but the idea is that you should be describing to the reader, who might be yourself in 6 months, what the intent of all this is and what these mean/do so it can be maintained and adjusted if there are bugs found. That's just not practical with single-letter variable names which is why this is a clang-tidy lint. Every attempt to read this code and figure out what it does becomes a reverse engineering exercise otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of a matter of opinion, and these "variables" are very clearly documented inside the main processing function, this is just a stash to store the state in the meantime. I'd argue the C-style long variables make code very hard to read because there is an abysmal signal to noise ratio.
Anyway, changing this, I don't wish to force my will on the project :)
if ((*USB_EP_REG(SWO_ENDPOINT) & USB_EP_TX_STAT) != USB_EP_TX_STAT_VALID) | ||
{ | ||
swo_send_buffer(usbdev, SWO_ENDPOINT); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The braces are not necessary, please drop them. Please also run this file through clang-format
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been run through clang-format. Also, I'm very hesitant to see code without braces - remember goto fail;?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-format is now really applied :) I incorrectly presumed make clang-format
does it since it changed a ton of files, but it goes only two levels deep.
Note: We are aware of the breakage on the lint pass from GitHub's deployment of Ubuntu 24.04 LTS, we will get this fixed in the mean time. Edit: This has now been fixed, when you rebase this PR on |
0a26202
to
7b87080
Compare
…ementation using DMA
7b87080
to
45fa106
Compare
45fa106
to
37d8ba2
Compare
Fast DMA-based SWO Manchester decoding
Detailed description
The current interrupt-based SWO decoding suffers (IMO) from several issues
This PR contains a full reimplementation of the SWO Manchester decoder using DMA that makes capture a lot more efficient:
The result is the ability to process SWO signal up to ~3 MHz with the probe being more resilient against higher frequency signals (it just fails to decode it properly). Continuous streaming at these speeds obviously makes USB a bottleneck (1.5 Mbit = ~180 kB/s, with the USB controller seemingly being able to handle up to about 70 kB/s without double buffering). But for reasonably intermittent flows, one big benefit is the lightened load on the target.
Your checklist for this pull request
Closing issues
None that I'm aware of