Fix SMP deadlock: consistent lock ordering in enif_monitor_process #2057

Merged
bettio merged 1 commit into atomvm:main from petermm:smp-fix-ampcode on Jan 24, 2026
petermm (Contributor) commented Jan 10, 2026

A user reported intermittent but reproducible deadlocks when testing the httpd web server; this fixes the underlying ABBA deadlock.

Entirely by AI:
https://ampcode.com/threads/T-019ba1c2-2a7f-77c7-bd33-ce9f303152a2

Verified as a fix by repeating load testing for multiple hours.

Summary

Fix a lock ordering inversion that causes deadlocks under SMP on ESP32 (and potentially other platforms) when sockets are used under heavy load.

Problem

enif_monitor_process and enif_demonitor_process acquire locks in opposite orders:

Function                     Lock order
enif_monitor_process         processes_table → monitors
enif_demonitor_process       monitors → processes_table
destroy_resource_monitors    monitors → processes_table

This creates an ABBA deadlock when two threads call these functions concurrently—one holds processes_table waiting for monitors, while the other holds monitors waiting for processes_table.
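The interleaving described above can be replayed without actually hanging. The following is a minimal sketch, not AtomVM code: the two `pthread_mutex_t` objects and the function name are hypothetical stand-ins for the processes table and monitors locks, and `pthread_mutex_trylock` is used so that the stuck state is reported as `EBUSY` instead of blocking forever.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two locks named in the table above. */
static pthread_mutex_t abba_processes_table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t abba_monitors_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Replays the ABBA interleaving from a single thread. trylock returns EBUSY
 * for an already locked mutex instead of blocking, so the function can
 * report that both sides would be stuck and still return.
 */
bool abba_interleaving_blocks_both(void)
{
    /* "Thread 1" (monitor path) takes processes_table first. */
    pthread_mutex_lock(&abba_processes_table_lock);
    /* "Thread 2" (demonitor path) takes monitors first. */
    pthread_mutex_lock(&abba_monitors_lock);

    /* Thread 1 now needs monitors, thread 2 now needs processes_table. */
    bool thread1_blocked = pthread_mutex_trylock(&abba_monitors_lock) == EBUSY;
    bool thread2_blocked = pthread_mutex_trylock(&abba_processes_table_lock) == EBUSY;

    pthread_mutex_unlock(&abba_monitors_lock);
    pthread_mutex_unlock(&abba_processes_table_lock);
    return thread1_blocked && thread2_blocked;
}
```

With blocking `pthread_mutex_lock` calls in place of `trylock`, and the two halves running on separate threads, neither side could make progress once this state is reached.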

The issue is triggered by otp_socket.c, which calls both monitor and demonitor from NIFs, the select thread, and monitor callbacks under load.

With AVM_NO_SMP, synclist_wrlock is a no-op so no deadlock occurs, which explains why disabling SMP works around the issue.
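To illustrate why a build without SMP cannot hit this deadlock, here is a rough sketch of a lock wrapper that compiles to a no-op. It is not the AtomVM implementation: `SKETCH_NO_SMP`, `struct SketchSyncList`, `sketch_wrlock`, `sketch_unlock` and `sketch_touch_both` are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical switch standing in for AVM_NO_SMP; set to 0 to model an SMP build. */
#define SKETCH_NO_SMP 1

struct SketchSyncList
{
    pthread_mutex_t lock;
    int items;      /* placeholder for the protected list */
    int lock_count; /* how many times the real lock was taken */
};

#if SKETCH_NO_SMP
/* No SMP: locking compiles away, so no lock is ever held or waited on. */
static inline int *sketch_wrlock(struct SketchSyncList *list)
{
    return &list->items;
}
static inline void sketch_unlock(struct SketchSyncList *list)
{
    (void) list;
}
#else
static inline int *sketch_wrlock(struct SketchSyncList *list)
{
    pthread_mutex_lock(&list->lock);
    list->lock_count++;
    return &list->items;
}
static inline void sketch_unlock(struct SketchSyncList *list)
{
    pthread_mutex_unlock(&list->lock);
}
#endif

/* Touches both lists; the acquisition order is irrelevant when locking is a no-op. */
void sketch_touch_both(struct SketchSyncList *processes, struct SketchSyncList *monitors)
{
    int *m = sketch_wrlock(monitors);
    int *p = sketch_wrlock(processes);
    (*m)++;
    (*p)++;
    sketch_unlock(processes);
    sketch_unlock(monitors);
}
```

In this model the inconsistent ordering is still present in the source, it just has no effect until real locks are compiled in.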

Fix

Change enif_monitor_process to acquire locks in the same order as the other functions: monitors → processes_table.
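The general idea of the fix can be sketched with two threads that both take the locks in one agreed order. This is an illustration under assumptions, not the patch itself: the mutexes, the counter and the function names are hypothetical, and any single order works as long as every code path uses it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two locks. */
static pthread_mutex_t fix_monitors_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fix_processes_table_lock = PTHREAD_MUTEX_INITIALIZER;
static long fix_shared_counter = 0;

enum { FIX_ITERATIONS = 10000 };

/* Every path takes monitors first, then processes_table, and releases in reverse. */
static void *fix_worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < FIX_ITERATIONS; i++) {
        pthread_mutex_lock(&fix_monitors_lock);
        pthread_mutex_lock(&fix_processes_table_lock);
        fix_shared_counter++;
        pthread_mutex_unlock(&fix_processes_table_lock);
        pthread_mutex_unlock(&fix_monitors_lock);
    }
    return NULL;
}

/* Runs a "monitor" and a "demonitor" thread concurrently; returns the final count, or -1. */
long fix_run_two_paths(void)
{
    pthread_t monitor_thread, demonitor_thread;
    fix_shared_counter = 0;
    if (pthread_create(&monitor_thread, NULL, fix_worker, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&demonitor_thread, NULL, fix_worker, NULL) != 0) {
        pthread_join(monitor_thread, NULL);
        return -1;
    }
    pthread_join(monitor_thread, NULL);
    pthread_join(demonitor_thread, NULL);
    return fix_shared_counter;
}
```

Because neither thread can hold the second lock without already holding the first, the circular wait needed for an ABBA deadlock cannot form.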

Testing

  • Tested on ESP32 with SMP enabled under heavy socket load, by @schnittchen

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

pguyot (Collaborator) commented Jan 11, 2026

There is another place with the process table > monitors lock order: context_process_process_info_request_signal calls context_get_process_info with the process table locked, and context_get_process_info may lock the resource type monitors lock.

So eventually, the lock order should be the other way around, or enif_demonitor_process and destroy_resource_monitors should simply never hold both the process table lock and the resource type monitors lock at the same time.

return -1;
}

struct ListHead *monitors_head = synclist_wrlock(&resource_type->monitors);

If we revert this change in enif_monitor_process we should be good. I searched for potential leaks but didn't find any.

The process_info paths (from the nif or from the context_destroy, etc.) currently have the same lock order as enif_monitor_process had before this change: lock process table then lock monitors.

Besides, the current change does lock monitors -> lock process table -> release monitors -> release process table, which I think we should avoid; I'd rather have properly nested locks (lock A, lock B, unlock B, unlock A).
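The difference between the two release orders can be made concrete with a small trace. This is only a sketch: `lock_a`, `lock_b`, `trace_event` and the two section functions are hypothetical names, with A standing for monitors and B for the process table.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/* Appends a short event tag to a trace buffer so the order can be inspected. */
static void trace_event(char *buf, size_t size, const char *event)
{
    strncat(buf, event, size - strlen(buf) - 1);
}

/* Interleaved release, as in the change under review: A is dropped while B is still held. */
void interleaved_section(char *buf, size_t size)
{
    pthread_mutex_lock(&lock_a);   trace_event(buf, size, "+A ");
    pthread_mutex_lock(&lock_b);   trace_event(buf, size, "+B ");
    pthread_mutex_unlock(&lock_a); trace_event(buf, size, "-A ");
    pthread_mutex_unlock(&lock_b); trace_event(buf, size, "-B ");
}

/* Properly nested: the lock taken last is released first. */
void nested_section(char *buf, size_t size)
{
    pthread_mutex_lock(&lock_a);   trace_event(buf, size, "+A ");
    pthread_mutex_lock(&lock_b);   trace_event(buf, size, "+B ");
    /* ... work that needs both locks ... */
    pthread_mutex_unlock(&lock_b); trace_event(buf, size, "-B ");
    pthread_mutex_unlock(&lock_a); trace_event(buf, size, "-A ");
}
```

Both variants are valid uses of the mutex API; the nested form simply keeps the critical sections strictly contained in one another, which tends to be easier to reason about.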

https://ampcode.com/threads/T-019ba1c2-2a7f-77c7-bd33-ce9f303152a2

Fix SMP deadlock, due to ABBA deadlocks.

Signed-off-by: Peter M <petermm@gmail.com>
bettio merged commit e90d215 into atomvm:main on Jan 24, 2026
188 of 192 checks passed