Description
What happened?
When the kubelet loses its connection, the node goes into the unknown state. Because the health check status of the pods can no longer be obtained through the kubelet, the node lifecycle controller marks the pods on that node as not ready by calling the markPodsNotReady function. This only happens when the node's Ready state transitions from true to unknown.
However, if the node is already in a failed state (for example, Ready is false because of a containerd failure) and then loses its connection, markPodsNotReady does not take effect.
kubernetes/pkg/controller/nodelifecycle/node_lifecycle_controller.go
Lines 883 to 888 in cac5388
In this case, the pods may incorrectly remain ready, so network traffic may still be forwarded to pods on this node.
What did you expect to happen?
Whenever a node has lost its connection for longer than the grace period, MarkPodsNotReady should always take effect.
How can we reproduce it (as minimally and precisely as possible)?
- Stop containerd and wait for the node Ready state to become false.
- Stop the kubelet or shut down the node, and wait for the node Ready state to become unknown.
- The pods on this node that have not been evicted remain ready indefinitely.
Anything else we need to know?
In the node lifecycle controller logic, MarkPodsNotReady is only triggered when a node goes from the true state to the unknown state. The correct behavior is to trigger it whenever the node enters the unknown state, regardless of whether its previous state was true.
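The difference between the two trigger conditions can be illustrated with a small, self-contained model. This is only a sketch of the behavior as described in this issue, not the controller's actual code: the ConditionStatus type and the describedTrigger and expectedTrigger helpers are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// ConditionStatus models the three values of a node's Ready condition.
// Hypothetical local type; the real controller uses the Kubernetes API types.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// describedTrigger models the behavior reported in this issue:
// pods are marked not ready only on a True -> Unknown transition.
func describedTrigger(previous, current ConditionStatus) bool {
	return previous == ConditionTrue && current == ConditionUnknown
}

// expectedTrigger models the behavior requested in this issue:
// pods are marked not ready whenever the node enters Unknown,
// regardless of whether the previous state was True or False.
func expectedTrigger(previous, current ConditionStatus) bool {
	return current == ConditionUnknown && previous != ConditionUnknown
}

func main() {
	// Each pair is {previous Ready state, current Ready state}.
	transitions := [][2]ConditionStatus{
		{ConditionTrue, ConditionUnknown},
		{ConditionFalse, ConditionUnknown}, // the case from the reproduction steps
		{ConditionTrue, ConditionFalse},
	}
	for _, t := range transitions {
		fmt.Printf("%s -> %s: described=%t expected=%t\n",
			t[0], t[1], describedTrigger(t[0], t[1]), expectedTrigger(t[0], t[1]))
	}
}
```

In this model, the False to Unknown transition from the reproduction steps fires only under expectedTrigger, which is the gap this issue reports.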
Kubernetes version
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.15", GitCommit:"1d79bc3bcccfba7466c44cc2055d6e7442e140ea", GitTreeState:"clean", BuildDate:"2022-09-22T06:03:36Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
$ uname -a
5.4.119-1-tlinux4-0008 #1 SMP Fri Nov 26 11:17:45 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)