Many threads locking two mutexes can cause a crash #89331
Labels
area: Kernel
Backport
Backport PR and backport failure issues
bug
The issue is a bug, or the PR is fixing a bug
priority: medium
Medium impact/importance bug
Describe the bug
In some complex scenarios involving several threads and a few mutexes, we have experienced intermittent crashes that depend on system timing. I've narrowed this down to the scalable waitq implementation not re-inserting a thread when its priority changes, which can leave the rb tree invalid. Without the scalable waitq, the wrong thread may be run, which may or may not cause a problem depending on the application.
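To illustrate the failure mode, here is a minimal sketch of a priority-ordered wait queue. It is not the Zephyr implementation: a sorted array stands in for the rb tree, and every name in it (`struct waiter`, `struct waitq`, `waitq_insert`, `set_prio_in_place`, `set_prio_requeue`) is hypothetical. It assumes only that a lower numeric value means a higher priority, as in Zephyr. Changing the key of a queued entry in place breaks the ordering invariant; removing the entry, changing the key, and re-inserting keeps it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pended thread; not a Zephyr structure. */
struct waiter {
	int prio;  /* lower value = higher priority */
	char name;
};

#define MAX_WAITERS 8

/* Hypothetical wait queue: a sorted array stands in for the rb tree. */
struct waitq {
	struct waiter *slot[MAX_WAITERS];
	size_t count;
};

/* Insert while keeping the slots sorted by ascending prio value. */
static bool waitq_insert(struct waitq *q, struct waiter *w)
{
	if (q->count >= MAX_WAITERS) {
		return false;
	}
	size_t i = q->count;
	while (i > 0 && q->slot[i - 1]->prio > w->prio) {
		q->slot[i] = q->slot[i - 1];
		i--;
	}
	q->slot[i] = w;
	q->count++;
	return true;
}

/* Remove a waiter by identity; returns false if it was not queued. */
static bool waitq_remove(struct waitq *q, struct waiter *w)
{
	for (size_t i = 0; i < q->count; i++) {
		if (q->slot[i] == w) {
			for (size_t j = i + 1; j < q->count; j++) {
				q->slot[j - 1] = q->slot[j];
			}
			q->count--;
			return true;
		}
	}
	return false;
}

/* Check the ordering invariant that lookups rely on. */
static bool waitq_is_ordered(const struct waitq *q)
{
	for (size_t i = 1; i < q->count; i++) {
		if (q->slot[i - 1]->prio > q->slot[i]->prio) {
			return false;
		}
	}
	return true;
}

/* The waiter the queue believes is the highest priority one. */
static struct waiter *waitq_best(const struct waitq *q)
{
	return q->count > 0 ? q->slot[0] : NULL;
}

/* Problematic pattern: the key changes while the entry stays in place. */
static void set_prio_in_place(struct waiter *w, int prio)
{
	w->prio = prio;
}

/* Safer pattern: take the entry out, change the key, put it back. */
static void set_prio_requeue(struct waitq *q, struct waiter *w, int prio)
{
	bool was_queued = waitq_remove(q, w);

	w->prio = prio;
	if (was_queued) {
		waitq_insert(q, w);
	}
	assert(waitq_is_ordered(q));
}
```

With waiters B, C, and D queued and D then boosted above B via `set_prio_in_place`, `waitq_is_ordered` reports a broken invariant and `waitq_best` still returns B, which corresponds to the "wrong thread may be run" symptom. In a balanced tree the same stale key can also corrupt later inserts and removals, which is a plausible route to a crash.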
To Reproduce
This is not limited to mutexes, but the issue was most easily reproduced using mutexes with the scalable waitq enabled. Consider four threads in decreasing priority order (A, B, C, and D) along with two mutexes, m0 and m1:
Expected behavior
System does not crash.
Impact
Intermittent but fairly consistent crashes on our system.
Logs and console output
See test: https://github.com/zephyrproject-rtos/zephyr/pull/87235/files#diff-8adc688dcc6c66e2f0a064f4ed385d3ff51e325b66ab8e4e9b7570cf80d1bf22
Environment (please complete the following information):
Additional context
Fixed in #87235
Looking to get backported:
#87840
#87839
#87841