[release-2.7] vSphere: ignore canceled VMs when scheduling #1466

Open
wants to merge 1 commit into
base: release-2.7

Contributor

@mansam mansam commented Mar 22, 2025

Backport of #1465

Now that the migration runner attempts to schedule
as many VMs as it can in a single reconciliation,
it has exposed a bug in the scheduler where canceled
VMs are rescheduled endlessly in certain pathological
cases, preventing the reconciliation from terminating.

(For instance, if a VM is canceled before
it has been marked started, it will always appear
ready to schedule although there is nothing for
the plan controller to do with it.)

The endless rescheduling causes the controller to
loop infinitely and be unable to reconcile resources,
necessitating that the offending Migration resource
be deleted and the controller pod be restarted.

Checking for the `Canceled` condition on the VM is adequate
to prevent the problem. (Requesting cancelation of the VM
via the Migration resource will cause the VM to be
marked with the Canceled condition in memory as the Plan
is reconciled, which will cause it to be ignored by the
scheduler, and on the following reconcile it will be
cleaned up as intended.)

Signed-off-by: Sam Lucidi <[email protected]>
@mansam mansam requested review from mnecas and yaacov as code owners March 22, 2025 02:05
@mansam mansam changed the title [Backport release-2.7] vSphere: ignore canceled VMs when scheduling [release-2.7] vSphere: ignore canceled VMs when scheduling Mar 22, 2025