Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DWC2 DMA support #2576

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

DWC2 DMA support #2576

wants to merge 10 commits into from

Conversation

HiFiPhile
Copy link
Collaborator

@HiFiPhile HiFiPhile commented Apr 5, 2024

Describe the PR
Add Internal DMA support to DWC2.

Status

Model Core Rev Enumeration cdc_ual_ports audio_4_channel_mic video_capture_2ch
STM32F407
FS w/wo DMA
2.81a ✔️ ✔️ ✔️ ✔️
STM32F429
FS w/wo DMA
2.81a ✔️ ✔️ ✔️ ✔️
STM32F723
FS + HS w/wo DMA
3.30a ✔️ ✔️ ✔️ ✔️
STM32U5
HS w/wo DMA
4.11a ✔️ ✔️ ✔️ ✔️
Model Core Rev Enumeration cdc_msc_freertos audio_4_channel_mic_freertos hid_composite_freertos
ESP32-S3
FS w/wo DMA
4.00a ✔️ ✔️ ✔️ ✔️

Benchmark

STM32F723E-DISCO
IAR 9.50 High-Balanced
I-Cache enabled

Method used in #920, all buffers set to 2048b.

Throughput Total CPU Usage
DMA OFF 26.7 MB/s 55%
DMA ON 29.5 MB/s 26%

@HiFiPhile

This comment was marked as resolved.

@@ -67,18 +67,6 @@
// Debug level for DWC2
#define DWC2_DEBUG 2

#ifndef dcache_clean
#define dcache_clean(_addr, _size)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dcache code should be removed and add MPU suggestion to make usb ram non cacheable:

  • Not all internal buffers are aligned to cache line size and has size of multiple cache line size, speculative read or cache eviction of adjacent variables could lead to racing with DMA.
  • Frequent cache clean invalidate hurt performance
  • In most cases Cortex-M7 recommend use DTCM where cache is not used

static inline bool dma_enabled(uint8_t rhport)
{
// DMA doesn't support fifo transfer
#ifdef TUD_AUDIO_PREFER_RING_BUFFER
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't want to depend on class driver but maybe it's the simplest way.

@@ -351,6 +382,12 @@ static void bus_reset(uint8_t rhport) {
// Setup the control endpoint 0
_allocated_fifo_words_tx = 16;

// DMA needs extra space for processing
if(dma_enabled(rhport)) {
uint16_t reserved = _dwc2_controller[rhport].ep_fifo_size / 4- dwc2->ghwcfg3_bm.total_fifo_size;
Copy link
Collaborator Author

@HiFiPhile HiFiPhile Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kernel use dwc2->ghwcfg3_bm.total_fifo_size it should be reliable.
We can even remove _dwc2_controller[rhport].ep_fifo_size and use this field instead, except GD32VF103 looks strange.

@HiFiPhile HiFiPhile changed the title WIP: DWC2 DMA support DWC2 DMA support Apr 25, 2024
@HiFiPhile HiFiPhile marked this pull request as ready for review April 25, 2024 22:15
@leptun

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@leptun
Copy link
Contributor

leptun commented May 5, 2024

Just tested the fix in 02ec486 and it solved the issue on my end as well 👍
Also verified the dual-CDC example on a stm32f746-disco board with the HS ULPI phy and that enumerates as well

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@HiFiPhile

This comment was marked as resolved.

@leptun

This comment was marked as resolved.

@roma-jam
Copy link

Hi @HiFiPhile ,
May I ask how did you measure the total CPU load?

@hathach hathach self-requested a review September 18, 2024 10:59
@hathach
Copy link
Owner

hathach commented Sep 18, 2024

thank you very much @HiFiPhile again for another great PR. sorry for super late response, I am doing more test with this, seems like it have some issue with my h743 eval which uses external HS PHY. That is ok, I am doing more testing and reading hopefully we could fix and merge this soon.

@HiFiPhile
Copy link
Collaborator Author

HiFiPhile commented Sep 18, 2024

seems like it have some issue with my h743 eval which uses external HS PHY

Thanks for testing, it's the combination I'm missing.

May I ask how did you measure the total CPU load?

@roma-jam In short I'm using GPIO toggle in ISR and in event queue processing, then take the averaged sum of 10 cycles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants