Skip to content

Conversation

@encukou
Copy link
Member

@encukou encukou commented Mar 18, 2025

There is some evidence that a delay in nextBuild can leak to unrelated workers.

This switches to canStartBuild, whose docs show an example to put the worker in “quarantine” and avoid the quarantine's exponential backoff mechanism.

I'd welcome some eyes on this before testing on the live box.

There is some evidence that a delay in nextBuild can leak to unrelated
workers.
Switch to canStartBuild, whose docs show an example to put the worker
in “quarantine” and avoid the quarantine's exponential backoff
mechanism.
@encukou encukou requested review from diegorusso and zware March 18, 2025 08:21
@encukou encukou changed the title Use canStartBuild rather than nextBuild for starting Use canStartBuild rather than nextBuild for avoiding perf runs Mar 18, 2025
Copy link
Contributor

@diegorusso diegorusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change, it looks OK to me.

builder.master.reactor.callLater(
int(delay),
deferred.callback,
requests[0],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is here, where the first element of requests is a "random" job that gets deferred.

Copy link
Member

@zware zware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me. I do wonder if pause rather than quarantine might be a cleaner solution overall (ideally controlled by the worker itself; I have a question in to the #buildbot IRC channel about whether there's a nice way to do that), but I think this is improvement enough to go ahead with it anyway.

@encukou encukou merged commit 65fa367 into python:main Mar 18, 2025
1 check passed
@encukou encukou deleted the dont-leak-nextBuild branch March 19, 2025 08:15
@encukou
Copy link
Member Author

encukou commented Mar 19, 2025

AFAICS, quarantine has a built-in timeout, but to un-pause a worker we'd need to reliably schedule running some un-pausing code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants