Fix SMP deadlock: consistent lock ordering in enif_monitor_process#2057
Fix SMP deadlock: consistent lock ordering in enif_monitor_process#2057bettio merged 1 commit intoatomvm:mainfrom
Conversation
|
There is another lock process_table > monitors lock order with So eventually, the lock should be the other way around or just no lock of both the process table and the resource type monitors in the |
src/libAtomVM/resources.c
Outdated
| return -1; | ||
| } | ||
|
|
||
| struct ListHead *monitors_head = synclist_wrlock(&resource_type->monitors); |
There was a problem hiding this comment.
If we revert this change in enif_monitor_process we should be good. I searched for potential leaks but didn't find any.
The process_info paths (from the nif or from the context_destroy, etc.) currently have the same lock order as enif_monitor_process had before this change: lock process table then lock monitors.
Besides, this current change does lock monitors -> lock process table -> release monitors -> release process table which I think we should avoid, I'd rather have nested locks (lock A lock B unlock B unlock A).
849694f to
a24fd78
Compare
https://ampcode.com/threads/T-019ba1c2-2a7f-77c7-bd33-ce9f303152a2 Fix SMP deadlock, due to ABBA deadlocks. Signed-off-by: Peter M <petermm@gmail.com>
a24fd78 to
7322265
Compare
User was reporting random but certain deadlocks when testing httpd webserver - this fixes the ABBA deadlock.
Entirely by AI:
https://ampcode.com/threads/T-019ba1c2-2a7f-77c7-bd33-ce9f303152a2
Verified as a fix, repeating load testing for multiple hours..
Summary
Fix a lock ordering inversion that causes deadlocks under SMP on ESP32 (and potentially other platforms) when sockets are used under heavy load.
Problem
enif_monitor_processandenif_demonitor_processacquire locks in opposite orders:enif_monitor_processprocesses_table→monitorsenif_demonitor_processmonitors→processes_tabledestroy_resource_monitorsmonitors→processes_tableThis creates an ABBA deadlock when two threads call these functions concurrently—one holds
processes_tablewaiting formonitors, while the other holdsmonitorswaiting forprocesses_table.The issue is triggered by
otp_socket.cwhich calls both monitor/demonitor from NIFs, the select thread, and monitor callbacks under load.With
AVM_NO_SMP,synclist_wrlockis a no-op so no deadlock occurs, which explains why disabling SMP works around the issue.Fix
Change
enif_monitor_processto acquire locks in the same order as the other functions:monitors→processes_table.Testing
These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).
SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later