Skip to content

Conversation

@cjen1-msft
Copy link
Contributor

This is an alternative to #7541 which introduces a close_on_error context.
The idea is that if an exception is thrown in the context, it closes the network that it wraps and then re-raises.

Our current test infra allows us to wrap a network (infra.network.network) where that network does not leak outside of the context.

However we have a bunch of tests which take a network, make some changes/tests and then return a network.
The above wrapper does not work for these tests, making it harder on occasion to debug failures.

This PR adds a new context wrapper which logs and closes the network whenever there is an exception before re-raising the exception.
ie:

network = Network(...)
with close_on_error(network):
  # use network
  if should_raise:
      raise Exception(...)
network.stop_all_nodes()

In this example if should_raise is true then we raise the exception which is caught, the network stopped etc, and then re-raised. Otherwise the code reaches network.stop_all_nodes().

@cjen1-msft cjen1-msft marked this pull request as ready for review December 30, 2025 11:12
@cjen1-msft cjen1-msft requested a review from a team as a code owner December 30, 2025 11:12
Copilot AI review requested due to automatic review settings December 30, 2025 11:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a reusable close_on_error context manager to handle network cleanup when exceptions occur during test execution. The context manager logs stack traces, optionally activates the debugger, and ensures proper network shutdown before re-raising exceptions.

Key Changes:

  • Added a new close_on_error context manager in tests/infra/network.py to encapsulate exception handling and cleanup logic
  • Refactored the existing network context manager to use close_on_error internally
  • Applied close_on_error to wrap critical sections in tests/recovery.py where the network is created and used but not managed by the network context manager

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tests/infra/network.py Introduces the close_on_error context manager and refactors the network context manager to use it
tests/recovery.py Wraps the recovered_network operations with close_on_error to ensure proper cleanup on exceptions
tests/e2e_operations.py Adds a clarifying comment to document that the network should still be live at that point in the test


# Closes the network on error, logging stack traces and optionally dropping into pdb
@contextmanager
def close_on_error(net, has_partitioner=False, pdb=False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look unused to me?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this's for partitions and other tests to be switched to this, but I'd say we just too altogether then, although not insisting on this, please feel free to resolve and merge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The has_partitioner and pdb get passed through from the network context to allow it to wrap this one (thus getting test coverage of it).
The has_partitioner gets set by the init_partitioner which is used in the partition tests, while pdb gets passed directly through from the root args.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we don't use this wrapper in those tests in this PR, do we?


LOG.info("Stopping network")
net.stop_all_nodes(skip_verification=True, accept_ledger_diff=True)
if has_partitioner:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the partitioner cleanup always be done by stop_all_nodes? It feels that way to me, why would you want to keep partitions in place once you've effectively killed the network.

if pdb:
import pdb

pdb.set_trace()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pdb.set_trace()
pdb.post_mortem()

https://docs.python.org/3/library/pdb.html#pdb.post_mortem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants