-
Notifications
You must be signed in to change notification settings - Fork 246
Add close_on_error context to wrap critical sections #7544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a reusable close_on_error context manager to handle network cleanup when exceptions occur during test execution. The context manager logs stack traces, optionally activates the debugger, and ensures proper network shutdown before re-raising exceptions.
Key Changes:
- Added a new
close_on_errorcontext manager intests/infra/network.pyto encapsulate exception handling and cleanup logic - Refactored the existing
networkcontext manager to useclose_on_errorinternally - Applied
close_on_errorto wrap critical sections intests/recovery.pywhere the network is created and used but not managed by thenetworkcontext manager
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/infra/network.py | Introduces the close_on_error context manager and refactors the network context manager to use it |
| tests/recovery.py | Wraps the recovered_network operations with close_on_error to ensure proper cleanup on exceptions |
| tests/e2e_operations.py | Adds a clarifying comment to document that the network should still be live at that point in the test |
Co-authored-by: Copilot <[email protected]>
|
|
||
| # Closes the network on error, logging stack traces and optionally dropping into pdb | ||
| @contextmanager | ||
| def close_on_error(net, has_partitioner=False, pdb=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These look unused to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this's for partitions and other tests to be switched to this, but I'd say we just too altogether then, although not insisting on this, please feel free to resolve and merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The has_partitioner and pdb get passed through from the network context to allow it to wrap this one (thus getting test coverage of it).
The has_partitioner gets set by the init_partitioner which is used in the partition tests, while pdb gets passed directly through from the root args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we don't use this wrapper in those tests in this PR, do we?
|
|
||
| LOG.info("Stopping network") | ||
| net.stop_all_nodes(skip_verification=True, accept_ledger_diff=True) | ||
| if has_partitioner: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the partitioner cleanup always be done by stop_all_nodes? It feels that way to me, why would you want to keep partitions in place once you've effectively killed the network.
| if pdb: | ||
| import pdb | ||
|
|
||
| pdb.set_trace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pdb.set_trace() | |
| pdb.post_mortem() |
This is an alternative to #7541 which introduces a close_on_error context.
The idea is that if an exception is thrown in the context, it closes the network that it wraps and then re-raises.
Our current test infra allows us to wrap a network (
infra.network.network) where that network does not leak outside of the context.However we have a bunch of tests which take a network, make some changes/tests and then return a network.
The above wrapper does not work for these tests, making it harder on occasion to debug failures.
This PR adds a new context wrapper which logs and closes the network whenever there is an exception before re-raising the exception.
ie:
In this example if
should_raiseis true then we raise the exception which is caught, the network stopped etc, and then re-raised. Otherwise the code reaches network.stop_all_nodes().