Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitoring malloc usage #20363

Merged
merged 1 commit into from
Mar 26, 2024
Merged

Monitoring malloc usage #20363

merged 1 commit into from
Mar 26, 2024

Conversation

mguetschow
Copy link
Contributor

@mguetschow mguetschow commented Feb 8, 2024

This is still far from ready to be merged, but before investing more time into it I would like to gather some feedback, see below.

Contribution description

This is a first attempt of a malloc monitoring implementation that is meant to be used for debugging purposes when working with (libraries that make use of) malloc and friends. Compared to the MALLOC_TRACING module already available in RIOT, this is not designed to print the calls to stdout, but to keep track of the amount of memory that is currently dynamically allocated, as well as the all-time max. I was using this while playing with uzlib in #20293.

Discussion points

  • Does such an implementation make sense at all? Or can someone think of another, easier way of monitoring the dynamic memory usage? I tried using mallinfo() but that didn't report the all-time max correctly.
  • This just hooks into sys/malloc_thread_safe which is afaik not used on all boards, and definitely not on native. Is there a list somewhere that shows which boards use which malloc implementation? Would it make sense to hook into all of them?
  • I would probably implement most of this in a separate module. In that case, do we want MALLOC_TRACING to stay around or should it be removed in favor of this implementation? Pinging @benpicco @maribu here since this was originally his contribution.

Testing procedure

None yet.

Test provided in tests/sys/malloc_monitor.

@github-actions github-actions bot added the Area: sys Area: System label Feb 8, 2024
@maribu
Copy link
Member

maribu commented Feb 12, 2024

Cool! I'm almost certain that some standard c libs (looking avrlibc) do not have mallinfo anyway, so this would be a solution to add this globally.

What would also be quite useful is adding a backtrace to malloc_state_rm() when called on an address not traced (e.g. to detect double-free).

In that case, do we want MALLOC_TRACING to stay around or should it be removed in favor of this implementation?

I found that quite useful to trace down a call to free() with a wild pointer. If malloc_state_rm() where to print the caller address when called on an address not traced, this would work even better for that use case, as it would only print the offending side.

But I guess it can still be useful to resort to highest output verbosity in some cases and the implementation is pretty low-key to maintain, so I guess it could just remain. Maybe just rename that to malloc_tracing_verbose, so that the name malloc_tracing would be available again?

This just hooks into sys/malloc_thread_safe which is afaik not used on all boards, and definitely not on native.

This is correct. What malloc_thread_safe is trying to provide is thread-safety on a C lib that does not have reentrant syscalls. E.g. the authors avrlibc never imagined someone would be running multiple threads on an AVR, so that will never be thread safe. But also for newlib and picolibc the overhead of enabling reentrant syscalls was prohibitive (at least as a default). Still, corruption and crashes (as concurrent calls to malloc() previously caused) was not acceptable and this fixed the most pressing issue with relatively low overhead.

On native calls to malloc() are thread-safe anyway, so this doesn't wrap them. Enabling malloc_thread_safe there anyway (as dependency for malloc_tracing) should however not do any harm, other than a bit more overhead.

@mguetschow
Copy link
Contributor Author

Thanks for the feedback! I will then go forward and clean this up.

What would also be quite useful is adding a backtrace to malloc_state_rm() when called on an address not traced (e.g. to detect double-free).

With backtrace you mean basically the same type of print that is used in malloc_tracing currently? I can certainly add that.

But I guess it can still be useful to resort to highest output verbosity in some cases and the implementation is pretty low-key to maintain, so I guess it could just remain. Maybe just rename that to malloc_tracing_verbose, so that the name malloc_tracing would be available again?

If we want to have the option of high verbosity, I would rather offer a configuration option for the new module and integrate the prints there behind a compile-time flag. I would suppose that a configuration option for a module is nicer for the user than having to deal with two different (pseudo)modules.

I would have proposed malloc_monitor as module name, how does that sound? I'd argue that it is monitoring the dynamic heap instead of tracing calls as malloc_tracing is currently doing.

This just hooks into sys/malloc_thread_safe which is afaik not used on all boards, and definitely not on native.

This is correct.

Are there any more boards where sys/malloc_thread_safe is not used? Is enabling it always (also on boards other than native) as a dependency of the new module a good option?

@github-actions github-actions bot added Area: doc Area: Documentation Area: tests Area: tests and testing framework Area: build system Area: Build system Area: Kconfig Area: Kconfig integration labels Mar 11, 2024
@mguetschow mguetschow marked this pull request as ready for review March 11, 2024 12:34
@mguetschow
Copy link
Contributor Author

This is now ready for review.

The module malloc_monitor contains the functionality described in the initial post and can be configured with Kconfig and the corresponding cflags wrt the maximum number of pointers that can be monitored at once and the verbosity. For verbose output, this completely replaces the functionality of the pseudo-module malloc_tracing, which is therefore deprecated and scheduled for removal after 2024.07.

As discussed, malloc_monitor builds on and therefore includes malloc_thread_safe automatically. Shouldn't do any harm apart from some runtime overhead, but malloc_monitor is anyway not meant to be used in production code.

@mguetschow mguetschow added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Mar 11, 2024
@riot-ci
Copy link

riot-ci commented Mar 11, 2024

Murdock results

✔️ PASSED

f31f820 sys: add malloc_monitor, deprecate malloc_tracing

Success Failures Total Runtime
10031 0 10031 10m:50s

Artifacts

@mguetschow
Copy link
Contributor Author

The murdock failure for native64:llvm is apparently only a single example of all boards with TOOLCHAIN=llvm failing, other locally failing boards are native and nrf52840dk.

It seems the wrapping functions for malloc and friends are not properly handled by clang, which however also means that the thread-safety wrappers are ignored by that toolchain. Interestingly, tests/sys/malloc_thread_safety built with TOOLCHAIN=llvm does succeed also on nrf52840dk, where malloc_thread_safe is pulled in by default.

@mguetschow
Copy link
Contributor Author

I have to correct myself, it was a case of the compiler being smarter than the human again :P

Since the memory returned by malloc was never used, all calls to malloc and free where just silently optimized away. Fixed that now by declaring the memory as volatile for the sake of the test.

@maribu
Copy link
Member

maribu commented Mar 15, 2024

Yeah, we had a test for calloc() returning NULL when the multiplication was overflowing. LLVM notices that the requested memory was never actually used, so it optimized the allocation out and just claimed that the optimized out allocation succeeded (and calloc() returned a non-NULL pointer) and the test failed therefore.

@mguetschow
Copy link
Contributor Author

./dist/tools/doccheck/check.sh x
Error: reached end of file while inside a '```' block!

Unfortunately I can't reproduce this locally. What's the doxygen version that's used in the CI?

sys/malloc_monitor/doc.txt Outdated Show resolved Hide resolved
Copy link
Member

@maribu maribu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks in general good to me. One open issue I see is synchronization to avoid data races when accessing the monitor data.

On the malloc() side, we do already have a mutex called. However, a badly timed malloc_monitor_reset_high_watermark() would result in data corruption, even on systems where writing to malloc_monitor.high_watermark would be an atomic operation.

sys/malloc_monitor/malloc_monitor.c Outdated Show resolved Hide resolved
sys/malloc_monitor/malloc_monitor.c Outdated Show resolved Hide resolved
sys/malloc_monitor/malloc_monitor.c Outdated Show resolved Hide resolved
sys/malloc_monitor/malloc_monitor.c Outdated Show resolved Hide resolved
sys/malloc_monitor/malloc_monitor.c Outdated Show resolved Hide resolved
tests/sys/malloc_monitor/main.c Outdated Show resolved Hide resolved
sys/malloc_monitor/malloc_monitor.c Outdated Show resolved Hide resolved
@mguetschow
Copy link
Contributor Author

Thanks for the review!

One open issue I see is synchronization to avoid data races when accessing the monitor data.

Very good point o.O, thanks for pointing it out! Addressed with c5b3528, please have a look :)

@mguetschow
Copy link
Contributor Author

@maribu @gdoffe maybe you can have another look to include this in the next release :)

@gdoffe
Copy link
Contributor

gdoffe commented Mar 26, 2024

@maribu @gdoffe maybe you can have another look to include this in the next release :)

As a simple user of your work LGTM, good job ! 👍

@maribu maribu enabled auto-merge March 26, 2024 08:35
@maribu maribu added this pull request to the merge queue Mar 26, 2024
Merged via the queue into RIOT-OS:master with commit a9d052b Mar 26, 2024
25 checks passed
@mguetschow
Copy link
Contributor Author

Thank you!

@mguetschow mguetschow deleted the malloc-monitor branch March 26, 2024 13:29
@MrKevinWeiss MrKevinWeiss added this to the Release 2024.04 milestone Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: build system Area: Build system Area: doc Area: Documentation Area: Kconfig Area: Kconfig integration Area: sys Area: System Area: tests Area: tests and testing framework CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants