Many threads locking two mutexes can cause a crash #89331

Open
dewitt-garmin opened this issue Apr 30, 2025 · 1 comment
Labels: area: Kernel, Backport, bug, priority: medium


@dewitt-garmin
Contributor

Describe the bug
In some complex scenarios involving several threads and a few mutexes, we have experienced intermittent crashes that depend on system timing. I've narrowed this down to the scalable waitq implementation not re-inserting a pended thread into the wait queue's rb tree when that thread's priority changes, which can leave the tree unsorted and therefore invalid. Without the scalable waitq, the wrong thread may be run, which may or may not cause a problem depending on the application.
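For the non-scalable case, a sorted list is enough to show why the wrong thread gets run. The sketch below is a hypothetical model, not Zephyr's implementation: it assumes the wait list is sorted by priority when a thread pends and that its head is the next thread woken, and `struct waiter`, `list_reprioritize`, and `next_waiter_after_boost` are illustrative names. It replays steps 3 to 5 of the reproduction steps on m1 with A=1, B=2, C=3 (a lower value meaning a higher priority, as in Zephyr).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical priority-sorted wait list standing in for the non-scalable
 * wait queue. Lower prio value = higher priority; the head wakes next. */
struct waiter {
    const char *name;
    int prio;
    struct waiter *next;
};

/* Insert w so the list stays sorted; equal priorities keep FIFO order. */
static void list_add_sorted(struct waiter **head, struct waiter *w)
{
    while (*head != NULL && (*head)->prio <= w->prio) {
        head = &(*head)->next;
    }
    w->next = *head;
    *head = w;
}

/* Unlink w by identity; returns false if it is not in the list. */
static bool list_remove(struct waiter **head, struct waiter *w)
{
    for (; *head != NULL; head = &(*head)->next) {
        if (*head == w) {
            *head = w->next;
            w->next = NULL;
            return true;
        }
    }
    return false;
}

/* Priority change that keeps the list sorted: unlink, update, re-insert. */
static void list_reprioritize(struct waiter **head, struct waiter *w, int prio)
{
    bool removed = list_remove(head, w);
    assert(removed);
    (void)removed;
    w->prio = prio;
    list_add_sorted(head, w);
}

/* Steps 3-5 on m1's wait list; returns the thread D's unlock would wake. */
static const char *next_waiter_after_boost(bool reinsert)
{
    struct waiter b = { "B", 2, NULL };
    struct waiter c = { "C", 3, NULL };
    struct waiter *m1 = NULL;

    list_add_sorted(&m1, &c);            /* step 3: C pends on m1 */
    list_add_sorted(&m1, &b);            /* step 4: B pends on m1, ahead of C */

    if (reinsert) {
        list_reprioritize(&m1, &c, 1);   /* step 5, fixed: C moves to the head */
    } else {
        c.prio = 1;                      /* step 5, buggy: C stays behind B */
    }
    return m1->name;
}
```

`next_waiter_after_boost(false)` returns "B", because boosting C in place does not move it, while `next_waiter_after_boost(true)` returns "C". The list model does not crash because removal is by node identity rather than by a key search, which matches the report's observation that only the scalable waitq turns this into a crash.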

To Reproduce
This is not limited to mutexes, but it was most easily reproduced with mutexes and the scalable waitq enabled. Consider four threads in decreasing priority order, A, B, C, and D, along with two mutexes, m0 and m1:

  1. D locks m1
  2. C locks m0
  3. C pends on m1
  4. B pends on m1
  5. A pends on m0, which boosts C's priority (priority inheritance). C is not re-inserted into m1's wait tree, so that tree is no longer sorted.
  6. D unlocks m1 and hands it to the left-most thread in the tree, B. When removing B from the tree, the search goes to the right of C because of C's boosted priority, while B's node is actually on the left, so rb_remove silently fails to find it.
  7. B unlocks m1. The left-most thread in the tree is still B, so B tries to unpend itself, resulting in a NULL pointer dereference on B->base.pended_on.
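The failed removal in step 6 can be modeled with an ordinary binary search tree. This is a hypothetical, unbalanced stand-in for the red-black tree behind the scalable waitq (no rebalancing and no tie-breaking by insertion order), and `tree_remove`, `tree_reprioritize`, and `run_scenario` are illustrative names, not Zephyr APIs. Because removal locates a node by walking the tree with its key, boosting C in place sends the search for B the wrong way. Removing C under its old key and re-inserting it under the new one keeps the search path valid; that is assumed here to be the general shape of a fix, not a description of what #87235 actually changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical unbalanced BST: lower prio value = higher priority = further
 * left, so the left-most node is the next thread to wake. */
struct node {
    const char *name;
    int prio;
    struct node *left, *right;
};

static void tree_insert(struct node **root, struct node *n)
{
    while (*root != NULL) {
        root = (n->prio < (*root)->prio) ? &(*root)->left : &(*root)->right;
    }
    n->left = n->right = NULL;
    *root = n;
}

/* Find n by walking with its key, then unlink it.
 * Returns false if the search path never reaches n. */
static bool tree_remove(struct node **root, struct node *n)
{
    while (*root != NULL && *root != n) {
        root = (n->prio < (*root)->prio) ? &(*root)->left : &(*root)->right;
    }
    if (*root == NULL) {
        return false;                   /* searched the wrong subtree */
    }
    if (n->left == NULL) {
        *root = n->right;
    } else if (n->right == NULL) {
        *root = n->left;
    } else {
        /* Two children: splice in the left-most node of the right subtree. */
        struct node **succ = &n->right;
        while ((*succ)->left != NULL) {
            succ = &(*succ)->left;
        }
        struct node *s = *succ;
        *succ = s->right;
        s->left = n->left;
        s->right = n->right;
        *root = s;
    }
    n->left = n->right = NULL;
    return true;
}

static struct node *tree_first(struct node *root)
{
    while (root != NULL && root->left != NULL) {
        root = root->left;
    }
    return root;
}

/* Priority change that keeps the tree sorted: remove under the old key,
 * update the key, re-insert. */
static void tree_reprioritize(struct node **root, struct node *n, int prio)
{
    bool removed = tree_remove(root, n);
    assert(removed);
    (void)removed;
    n->prio = prio;
    tree_insert(root, n);
}

/* Steps 3-6 on m1 (A=1, B=2, C=3). *first receives the left-most waiter
 * when D unlocks m1; returns whether removing that waiter succeeded. */
static bool run_scenario(bool reinsert, const char **first)
{
    struct node b = { "B", 2, NULL, NULL };
    struct node c = { "C", 3, NULL, NULL };
    struct node *m1 = NULL;

    tree_insert(&m1, &c);               /* step 3: C pends, becomes root */
    tree_insert(&m1, &b);               /* step 4: B pends, left of C */

    if (reinsert) {
        tree_reprioritize(&m1, &c, 1);  /* step 5, fixed */
    } else {
        c.prio = 1;                     /* step 5, buggy: in-place boost */
    }

    struct node *next = tree_first(m1); /* step 6: D unlocks m1 */
    *first = next->name;
    return tree_remove(&m1, next);
}
```

With `reinsert` false, `run_scenario` reports B as the left-most waiter and the removal returns false, leaving B in the tree as in step 7. With `reinsert` true, C is left-most and is removed cleanly.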

Expected behavior
System does not crash.

Impact
Intermittent but fairly consistent crashes on our system.

Logs and console output
See test: https://github.com/zephyrproject-rtos/zephyr/pull/87235/files#diff-8adc688dcc6c66e2f0a064f4ed385d3ff51e325b66ab8e4e9b7570cf80d1bf22

Environment:

  • OS: WSL
  • Toolchain: gcc-arm-none-eabi
  • Version: v3.3, v3.7

Additional context
Fixed in #87235.
Looking to get it backported:
#87840
#87839
#87841

dewitt-garmin added the bug label Apr 30, 2025

Hi @dewitt-garmin! We appreciate you submitting your first issue for our open-source project. 🌟

Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙
