When running lxcfs inside a GCP VM, we encountered a strange situation after the VM was live-migrated to a different host.
It seems that the kernel missed some interrupts due to the live migration, which caused the thread responsible for calculating the loadavg values inside containers to fail to wake up. This resulted in a freeze in the loadavg values for about 70 minutes.
When we first observed the issue, we tried reloading lxcfs, hoping this would restart the loadavg thread. Instead, every attempt to read the files managed by lxcfs hung in the read call, and the reload never completed. We had to restart the service and all containers running on the VM to resume normal operations.
A kernel backtrace of the thread before the restart showed that it was sleeping:
We had a few more nodes with the same issue, and they all recovered (started recalculating the loadavg values again) within 70 minutes without any additional intervention.
Further investigation showed the following message in the kernel logs, indicating that we missed about 0.21 seconds:
kernel: hrtimer: interrupt took 210767669 ns
This led us to conclude that we missed the wake-up time for the timer started by the usleep in the loadavg thread. We calculated that it would take about 70 minutes for the timer to overflow and reach the same value again, allowing normal operations to continue.
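As a rough sanity check on the 70-minute figure, the sketch below computes how long a fixed-width counter takes to wrap. It assumes, purely for illustration, that the stuck timer behaves like a 32-bit counter of microseconds; that assumption and the helper name `wrap_period_minutes` are not taken from lxcfs or the kernel.

```c
#include <stdint.h>

/*
 * Minutes until a counter with `counter_bits` bits (must be < 64) wraps
 * around, when each tick lasts `tick_seconds`.
 *
 * Assumption for illustration only: the timer in question acts like a
 * 32-bit counter of microseconds. Nothing here is lxcfs or kernel code.
 */
static double wrap_period_minutes(unsigned counter_bits, double tick_seconds)
{
    /* The counter returns to the same value after 2^counter_bits ticks. */
    double ticks = (double)((uint64_t)1 << counter_bits);
    return ticks * tick_seconds / 60.0;
}
```

Calling `wrap_period_minutes(32, 1e-6)` gives roughly 71.58 minutes (2^32 microseconds is about 4295 seconds), which is in line with the recovery time we observed, though it does not prove this is the actual mechanism.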
This issue does not occur with every live migration, so we believe it only happens when the timer is about to expire at the moment the kernel stalls during the migration.
I am opening a PR to address the reload hanging for up to 70 minutes. Normally the reload signal interrupts a sleep only in the main thread (since the main thread handles it), so the loadavg thread keeps sleeping. Sending a signal to the loadavg thread as well seems to interrupt its sleep, which is enough for the reload to remediate the situation. We’ve tested this on another node that experienced the same issue, and it works.
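The general technique can be sketched as follows. This is a minimal standalone illustration, not the code from the PR: the names `worker`, `install_wake_handler`, and `wake_and_join`, the choice of `SIGUSR1`, and the one-hour sleep are all assumptions made for the example. A handler that does nothing is enough, because `nanosleep` returns early with `EINTR` when a handler runs in the sleeping thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Hypothetical flags for this sketch; lxcfs uses its own state. */
static atomic_int stop_requested = 0;
static atomic_int worker_done = 0;

/* Empty handler: its only job is to make the blocked sleep return. */
static void wake_handler(int signo) { (void)signo; }

/* Stand-in for a periodic thread that sleeps between updates. */
static void *worker(void *arg)
{
    (void)arg;
    struct timespec long_sleep = { .tv_sec = 3600, .tv_nsec = 0 };

    while (!atomic_load(&stop_requested)) {
        /* Returns -1 with errno == EINTR when wake_handler runs. */
        (void)nanosleep(&long_sleep, NULL);
    }
    atomic_store(&worker_done, 1);
    return NULL;
}

/* Install the handler process-wide; call before creating the worker. */
static int install_wake_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Ask the worker to stop and signal that specific thread until it exits.
 * Re-sending covers the window where the signal lands just before the
 * worker enters nanosleep.
 */
static int wake_and_join(pthread_t tid)
{
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 };

    atomic_store(&stop_requested, 1);
    while (!atomic_load(&worker_done)) {
        pthread_kill(tid, SIGUSR1);
        (void)nanosleep(&pause, NULL);
    }
    return pthread_join(tid, NULL);
}
```

A caller would run `install_wake_handler()`, start `worker` with `pthread_create`, and later call `wake_and_join(tid)`. The key point is `pthread_kill`, which targets one thread, whereas a signal sent to the process may be handled by any thread that has it unblocked.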
I would appreciate any advice on handling this situation in general (besides catching the kernel message and reloading lxcfs).
Regards,
Deyan