Kubernetes Eviction Policies for Handling Low RAM and Disk Space Situations - Part 1

By Alwyn Botha, Alibaba Cloud Community Blog author.

For this tutorial to be successful, you need to run it on a dedicated Kubernetes node. If other users are running workloads on the node, they will affect the RAM-sizing eviction logic carefully planned out herein.

You define thresholds for low RAM and low disk space; Kubernetes eviction policies act when those thresholds are reached. Kubernetes evicts Pods from a node to fix low RAM and low disk space problems.

The kubelet has configuration settings for defining resource thresholds. There are settings for both disk space and RAM, but this tutorial focuses exclusively on RAM.

Disk space eviction policies work the same as RAM eviction policies. Once you understand RAM eviction you will be able to easily apply your knowledge to disk space eviction.

Minikube Extra Config for Kubelet

You pass eviction thresholds to kubelet using --extra-config when you start minikube.

--extra-config=<Kubernetes component>.<key>="<value>"
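For example, the hard eviction threshold used later in this tutorial is passed to the kubelet like this:

--extra-config=kubelet.eviction-hard="memory.available<600Mi"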

ExperimentalCriticalPodAnnotation defines that critical Pods must not be evicted.

You must use this setting on a single node Kubernetes cluster. If not, your critical Pods will be evicted and you end up with a broken Kubernetes node.

( On a multi-node cluster it is acceptable to evict critical Pods: redundant copies on other nodes will automatically take over the workload. )
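In the minikube start command below, this protection is enabled via a kubelet feature gate:

--extra-config=kubelet.feature-gates="ExperimentalCriticalPodAnnotation=true"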

eviction-hard="memory.available<600Mi"

Defines that when less than 600Mi RAM is available, Pods must be evicted HARD … immediately.

eviction-pressure-transition-period="30s"

From https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#oscillation-of-node-conditions

eviction-pressure-transition-period is the duration for which the kubelet has to wait before transitioning out of an eviction pressure condition: MemoryPressure.

We will see this in action below.

eviction-soft="memory.available<800Mi"

Defines that when less than 800Mi RAM is available, Pods may be evicted SOFT, but only after the grace period defined by eviction-soft-grace-period has elapsed.

eviction-soft-grace-period="memory.available=2m"

The node may stay below the eviction-soft memory.available threshold for the length of this grace period before Pods get evicted. In this case it is 2 minutes.

If available RAM drops below the SOFT threshold for less than these 2 minutes, no eviction is done. For example, if memory.available dips to 750Mi but recovers above 800Mi within a minute, nothing is evicted; if it stays below 800Mi for the full 2 minutes, soft evictions start.

Prerequisites

I am running this on a minikube VirtualBox virtual machine with 2200 MB of RAM. You must adjust these thresholds if your minikube node has a different amount of RAM.

You will ONLY get identical eviction results on a minikube node of identical size with nothing else running. ( This tutorial carefully calculated how many Pods, and of which sizes, will cause MemoryPressure conditions on an exactly 2200 MB minikube node. )

You must follow this tutorial on a dedicated Kubernetes node. If other people are running Pods, their workloads will break the RAM-sizing eviction logic carefully planned out herein.

( If you run this on a server with a different amount of RAM, you can cap the usable RAM with the kernel boot parameter: mem=MEMORY_LIMIT )

From the bootparam man page:

Linux uses this BIOS call at boot to determine how much memory is installed.

You can use this boot arg to tell Linux how much memory you have. The value is in decimal or hexadecimal (prefix 0x), and the suffixes ‘k’ (times 1024) or ‘M’ (times 1048576) can be used.
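Purely as an illustration ( assuming a GRUB 2 boot loader; the file and the command to regenerate the configuration differ per distribution ), capping a larger server to 2200 MB could look like this:

# /etc/default/grub -- append mem= to the existing kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet mem=2200M"

# regenerate the GRUB configuration and reboot
sudo update-grub        # Debian/Ubuntu; on RHEL-family: grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot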

Tips from This Tutorial

  • First time through, do not follow all the links: they will break your train of thought, since you could spend a day reading it all. The facts behind the links are not needed on a first pass; however, they will add to your understanding of the topics once you have done some practical exercises.
  • The eviction manager logs contain a great deal of information. First time through, do not try to decipher those logs on your server. I made a considerable effort here to make those logs VERY easy to read. Just have a quick peek and confirm that your logs contain information similar to what is shown in this tutorial.
  • You need to run the date command repeatedly if you want to reconcile your logs with the creation and eviction of your Pods.
  • You may hit your eviction thresholds earlier or later than I did. Read both parts of this tutorial in full before you attempt to follow it step by step. This way, if your experience differs, you will have some understanding of what is happening.

First Pod Eviction Exercise

minikube start \
  --extra-config=kubelet.eviction-hard="memory.available<600Mi" \
  --extra-config=kubelet.feature-gates="ExperimentalCriticalPodAnnotation=true" \
  --extra-config=kubelet.eviction-pressure-transition-period="30s" \
  --extra-config=kubelet.eviction-soft="memory.available<800Mi" \
  --extra-config=kubelet.eviction-soft-grace-period="memory.available=2m"

The Kubernetes developers provide a script that calculates available RAM in exactly the same way the kubelet does for its eviction decisions:

https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/memory-available.sh

You need to have this script on your node. We use it extensively in this tutorial. We are always only interested in the last line: memory.available_in_mb
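One way to get the script onto the node ( an assumption on my part; any way of copying the file into the minikube VM works ):

# inside the minikube VM ( minikube ssh )
curl -LO https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/memory-available.sh
chmod +x memory-available.sh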

source ./memory-available.sh | tail -n1
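The published script is the source of truth. In essence it derives memory.available the same way the kubelet does: node capacity minus the working set of the root memory cgroup, where the working set is usage minus the inactive file cache. A simplified sketch of that calculation, assuming cgroup v1 paths as used by this minikube version:

#!/bin/bash
# node capacity from /proc/meminfo; usage and inactive file cache from the root memory cgroup
memory_capacity_in_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
memory_usage_in_bytes=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
memory_total_inactive_file=$(grep total_inactive_file /sys/fs/cgroup/memory/memory.stat | awk '{print $2}')

memory_working_set_in_bytes=$((memory_usage_in_bytes - memory_total_inactive_file))
memory_available_in_bytes=$((memory_capacity_in_kb * 1024 - memory_working_set_in_bytes))
echo "memory.available_in_mb $((memory_available_in_bytes / 1024 / 1024))"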

You need to wait at least 1 minute ( after minikube start ) for all startup processes to complete before you start running eviction tests. ( Available RAM decreases during this first minute of startup turmoil. )

Create spec for our first Pod:

nano myrampod2.yaml

We use the image mytutorials/centos:bench, which I created and uploaded to Docker Hub.

It contains a simple CentOS 7 base operating system. It also includes stress, a benchmark and stress-test application.

command: ['sh', '-c', 'stress --vm 1 --vm-bytes 50M --vm-hang 3000 -t 3600']

We run the stress benchmark utility:

  • --vm 1 … spawn 1 memory-allocating worker ( a process here, not a virtual machine )
  • --vm-bytes 50M … the worker allocates 50 MB RAM
  • --vm-hang 3000 … the worker hangs on to its allocation for 3000 seconds, otherwise it re-allocates every second ( eating ALL CPU time )
  • -t 3600 … time out after 3600 seconds.
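The full myrampod2.yaml is not shown in the original post. A minimal sketch consistent with the details given in this tutorial ( the image, the command, and the 10Mi memory request mentioned later ) could look like this; the container name, imagePullPolicy and restartPolicy are my assumptions:

apiVersion: v1
kind: Pod
metadata:
  name: myram2
spec:
  containers:
  - name: myram-container          # assumed name, not from the original post
    image: mytutorials/centos:bench
    imagePullPolicy: IfNotPresent  # assumption: do not re-pull the image every time
    command: ['sh', '-c', 'stress --vm 1 --vm-bytes 50M --vm-hang 3000 -t 3600']
    resources:
      requests:
        memory: "10Mi"             # memory request referenced later in this tutorial
  restartPolicy: Never             # assumption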
Create the Pod:

kubectl create -f myrampod2.yaml
pod/myram2 created

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 1197

Check if node has MemoryPressure condition:

kubectl describe node minikube | grep MemoryPressure
MemoryPressure False Fri, 01 Feb 2019 08:02:50 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available

No, false … kubelet has sufficient memory available

( Remember … eviction-soft="memory.available<800Mi" )

Create another Pod that will use around 60 MB in total. The spec is the same as myrampod2.yaml, with only the Pod name changed:

nano myrampod3.yaml

Create:

kubectl create -f myrampod3.yaml
pod/myram3 created

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 1139
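Check the node condition again, with the same describe command as before:

kubectl describe node minikube | grep MemoryPressure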

No, false … kubelet has sufficient memory available

Create another Pod that will use around 60 MB in total, again changing only the Pod name:

nano myrampod4.yaml
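Create it, following the same pattern as before ( the create command is implied in the original walk-through ):

kubectl create -f myrampod4.yaml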

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 1085

Create another Pod that will use around 60 MB in total, again changing only the Pod name:

nano myrampod5.yaml
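Create it and check the node condition again; these commands follow the same pattern as before ( they are implied in the original walk-through ):

kubectl create -f myrampod5.yaml

kubectl describe node minikube | grep MemoryPressure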

MemoryPressure True … why is that?

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 635

Some other process uses 500 MB RAM. ( I could not see via top which one. ) This always happens at exactly this stage, hence the careful approach of adding Pods one by one.

kubectl get pods

Right now we have 4 running Pods. This will change within seconds, since the available RAM is far below the soft threshold and right at the hard threshold. HARD means immediate Pod eviction.

  • eviction-hard="memory.available<600Mi"
  • eviction-soft="memory.available<800Mi"
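If you want to watch the evictions happen in near real time, one option ( not part of the original walk-through ) is:

kubectl get pods --watch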

I used the minikube logs command to capture kubelet eviction manager logs. These logs were extensively edited since each line contained too much information.

minikube logs # kubelet eviction manager logs

From https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#evicting-end-user-pods

The kubelet ranks Pods for eviction.

The usage of the starved resource ( RAM ) by the 4 critical static Pods exceeds their requests by more than our 4 tiny 50 MB Pods do, so they rank high for eviction.

If we had not specified ExperimentalCriticalPodAnnotation above, these critical Pods would be evicted, resulting in a broken node.

minikube logs # kubelet eviction manager logs

Fortunately the critical Pods do not get evicted. Pod myram2_default got evicted successfully. ( Our 4 Pods all exceed their memory request of "10Mi" by the same amount, since each uses 50 MB. So they are all ranked similarly here. The kubelet eviction manager does not display eviction rank values in the logs. )

A second later the second cycle of evictions continues. ( It would have been helpful if the eviction manager showed memory.available_in_mb in the log while in an eviction cycle. We have to surmise that one eviction was not enough to bring available RAM back above the eviction threshold. )

minikube logs # kubelet eviction manager logs

Second Pod on the priority list gets evicted.

This is the status of our running Pods at this point.

kubectl get pods

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 696

100 MB above the hard threshold: immediate evictions are not needed anymore.

100 MB below the soft threshold: Pods will get evicted after the eviction-soft-grace-period of 2 minutes.

Status of Pods after a few minutes:

kubectl get pods

Pod myram3_default and Pod myram4_default got evicted.

minikube logs # kubelet eviction manager logs

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 677

The last 2 evictions were unable to raise available RAM to above soft threshold: more evictions are needed.

You have now seen 4 times that:

cannot evict a critical static pod kube-apiserver-minikube_kube-system

Therefore those lines will be hidden from all log output for the rest of this tutorial.

Those 4 critical Pods will also be removed from the "pods ranked for eviction:" list for the rest of the tutorial.

During the next few minutes other non-critical, but still Kubernetes-system, Pods are evicted.

metrics-server and kubernetes-dashboard get evicted.

minikube logs # kubelet eviction manager logs

The kube-proxy Pod gets evicted.

However, 30 seconds later a replacement Pod gets started. This replacement now ranks first for eviction.

It gets evicted, but 30 seconds later a replacement exists. This bad cycle continues ( probably forever ).

I edited the Pod name kube-proxy-12fy7_kube-system to the neater kube-proxy-11111_kube-system so you can more easily see the problem.

You can recognize minikube logs output by now, so it is no longer marked as such.

06:07:24 pods ranked for eviction:
kube-proxy-11111_kube-system,
kube-addon-manager-minikube_kube-system,
coredns-576cbf47c7-pf6gf_kube-system,
coredns-576cbf47c7-bz4hm_kube-system

MemoryPressure True forever.

kubectl describe node minikube | grep MemoryPressure

Check the kubelet-calculated memory.available_in_mb:

memory.available_in_mb 673

The problem is that a few minutes ago an unidentified process started using around 500 MB of additional RAM.

This RAM never gets released, so available RAM never rises back above the soft threshold.

Let’s try deleting all evicted Pods:

kubectl delete -f myrampod2.yaml
pod "myram2" deleted

Only 20 MB extra RAM available.

memory.available_in_mb 694

Still 100 MB below soft threshold.

kubectl describe node minikube | grep MemoryPressure
MemoryPressure True Fri, 01 Feb 2019 08:12:21 +0200 KubeletHasInsufficientMemory kubelet has insufficient memory available

Still under MemoryPressure.

It is now impossible to schedule new Pods on this node.
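One optional way to see why ( assuming the taint-based node condition behaviour of recent Kubernetes versions ): while MemoryPressure is True the node carries a memory-pressure taint that keeps new Pods off it.

kubectl describe node minikube | grep -i taint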

Let’s attempt creating myram2 again:

kubectl create -f myrampod2.yaml

Get Pods:

kubectl get pods

See the last few lines from describe command:

kubectl describe pod/myram2

Lesson learned: you cannot set RAM eviction thresholds higher than the RAM your node can realistically keep available. If you do, the node stays under MemoryPressure and new Pods cannot be scheduled.

memory.available_in_mb 700

eviction-soft="memory.available<800Mi"
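If you wanted to keep experimenting on this node, one way out would be to restart minikube with thresholds that sit below the roughly 700 MB it can keep available. The values below are purely illustrative and not from the original tutorial:

minikube start \
  --extra-config=kubelet.eviction-hard="memory.available<300Mi" \
  --extra-config=kubelet.eviction-soft="memory.available<400Mi" \
  --extra-config=kubelet.eviction-soft-grace-period="memory.available=2m"

Here, we simply stop minikube instead.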

Stop minikube

minikube stop
