kubernetes errors
April 23, 2023

Liveness probe failed and Readiness probe failed

These errors occur when a container in the pod fails its liveness and readiness checks.

The Readiness probe failed / Liveness probe failed errors, and the Back-off restarting failed container event that follows them, mean that the container did not pass its liveness and readiness checks. A failed readiness probe only takes the pod out of the Service endpoints. A failed liveness probe makes the kubelet kill the container and restart it, in the hope of reviving it, according to the pod's restartPolicy.
As a possible fix, try doubling the time the probe waits for a response from the container, that is, set timeoutSeconds to twice its current value. If that does not help, the problem is most likely a shortage of resources on the worker nodes, CPU first of all.
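
As a rough sketch of that change, the fragment below shows what the probes might look like in the controller container spec. The /healthz path and port 10254 are taken from the events in this post. The starting value of timeoutSeconds: 1 (the Kubernetes default) and all other numbers are assumptions, so check your own manifest or Helm values before copying anything.

```yaml
# Fragment of a container spec (spec.template.spec.containers[]) in the Deployment.
# Illustrative values only: timeoutSeconds is assumed to have been 1 (the Kubernetes default).
livenessProbe:
  httpGet:
    path: /healthz        # endpoint seen in the probe failure events
    port: 10254
  periodSeconds: 10       # assumed; Kubernetes default is 10
  timeoutSeconds: 2       # doubled from the assumed 1
  failureThreshold: 5     # assumed; Kubernetes default is 3
readinessProbe:
  httpGet:
    path: /healthz
    port: 10254
  periodSeconds: 10       # assumed
  timeoutSeconds: 2       # doubled from the assumed 1
  failureThreshold: 3     # assumed; Kubernetes default is 3
```

A longer timeout only helps when the endpoint is slow rather than broken: the "context deadline exceeded" events point to a slow response, while the "statuscode: 500" event means the endpoint answered with an error, which a timeout change will not fix.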

An example of what the Events block looks like for such a pod:

Events:
  Type     Reason     Age                   From                      Message
  ----     ------     ----                  ----                      -------
  Normal   Scheduled  30m                   default-scheduler         Successfully assigned ingress-nginx/ingress-nginx-controller-5578bd689b-lhrt4 to cl14vuhr5k8ulcdnhd42-uder
  Normal   Pulling    30m                   kubelet                   Pulling image "k8s.gcr.io/ingress-nginx/controller:v1.1.1@sha256:0bc88eb15f9e7f84e8e56c14fa5735aaa488b840983f87bd79b1054190e660de"
  Normal   Pulled     30m                   kubelet                   Successfully pulled image "k8s.gcr.io/ingress-nginx/controller:v1.1.1@sha256:0bc88eb15f9e7f84e8e56c14fa5735aaa488b840983f87bd79b1054190e660de" in 12.213938341s
  Normal   Created    30m                   kubelet                   Created container controller
  Normal   Started    30m                   kubelet                   Started container controller
  Normal   RELOAD     30m                   nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   Killing    28m                   kubelet                   Container controller failed liveness probe, will be restarted
  Warning  Unhealthy  28m (x5 over 29m)     kubelet                   Readiness probe failed: Get "http://10.113.160.12:10254/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Pulled     27m                   kubelet                   Container image "k8s.gcr.io/ingress-nginx/controller:v1.1.1@sha256:0bc88eb15f9e7f84e8e56c14fa5735aaa488b840983f87bd79b1054190e660de" already present on machine
  Normal   RELOAD     27m                   nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     24m                   nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     20m                   nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     17m                   nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     13m                   nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Warning  Unhealthy  10m (x70 over 28m)    kubelet                   Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   RELOAD     9m22s                 nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Normal   RELOAD     6m15s                 nginx-ingress-controller  NGINX reload triggered due to a change in configuration
  Warning  Unhealthy  5m27s (x36 over 29m)  kubelet                   Liveness probe failed: Get "http://10.113.160.12:10254/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    20s (x18 over 3m31s)  kubelet                   Back-off restarting failed container