Kubernetes Eviction Policies for Handling Low RAM and Disk Space Situations - Part 2

By Alwyn Botha, Alibaba Cloud Community Blog author.

Hard RAM Eviction Thresholds

Start minikube, this time with only hard thresholds.
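The exact command is not shown here, but it would be along these lines ( a sketch, assuming minikube's --extra-config flag is used to pass the kubelet settings; the 650Mi threshold and the 30-second transition period match the values discussed later in this tutorial ):

    minikube start \
      --extra-config=kubelet.eviction-hard="memory.available<650Mi" \
      --extra-config=kubelet.eviction-pressure-transition-period="30s"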

Give node startup processes time to complete. ( 2 minutes should be OK )

After 15 minutes in my case:

Get Pods:
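The standard command for this step:

    kubectl get pods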

Kubernetes kept the Pod spec definition we attempted to start at the end of the Part 1 tutorial.

Upon a fresh node start, Kubernetes starts up all Pods for which it still has specs.

Create myrampod3.yaml
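The original manifest is not reproduced here; based on the Pod sizes and requests used throughout this tutorial, it would look roughly like the sketch below. The Pod name follows the myramN pattern seen later, and the polinux/stress image and its arguments are illustrative ( it is the image used in the official Kubernetes memory examples; the tutorial's actual image may differ ).

    apiVersion: v1
    kind: Pod
    metadata:
      name: myram3
    spec:
      containers:
      - name: myram
        image: polinux/stress            # illustrative image
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "50M", "--vm-hang", "1"]   # allocate and hold ~50 MB
        resources:
          requests:
            memory: "10Mi"               # deliberately far below actual usage
      restartPolicy: Never

Create it with kubectl create -f myrampod3.yaml.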

Check the kubelet-calculated memory.available_in_mb:
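This figure comes from a helper script run on the node. The Kubernetes out-of-resource documentation publishes a script that reproduces the kubelet's memory.available calculation; the author's version may differ slightly, but it looks essentially like this ( run it inside minikube ssh; the paths assume cgroup v1 ):

    #!/bin/bash
    # Reproduce the kubelet's memory.available calculation against the root cgroup.
    memory_capacity_in_kb=$(cat /proc/meminfo | grep MemTotal | awk '{print $2}')
    memory_capacity_in_bytes=$((memory_capacity_in_kb * 1024))
    memory_usage_in_bytes=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
    memory_total_inactive_file=$(cat /sys/fs/cgroup/memory/memory.stat | grep total_inactive_file | awk '{print $2}')

    # working set = usage minus inactive file cache
    memory_working_set=$memory_usage_in_bytes
    if [ "$memory_working_set" -lt "$memory_total_inactive_file" ]; then
        memory_working_set=0
    else
        memory_working_set=$((memory_usage_in_bytes - memory_total_inactive_file))
    fi

    memory_available_in_bytes=$((memory_capacity_in_bytes - memory_working_set))
    memory_available_in_kb=$((memory_available_in_bytes / 1024))
    memory_available_in_mb=$((memory_available_in_kb / 1024))

    echo "memory.available_in_mb: $memory_available_in_mb"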

Another 60 MB used.

No MemoryPressure yet.

Create a third 50 MB Pod:

No MemoryPressure yet.

a minute later …

a minute later …

A mystery process uses 300 MB of RAM.

( Based on several reboot tests, I learned that after a few 50 MB Pods, Kubernetes needs to allocate some RAM for internal use. )

We now have a MemoryPressure condition.

The eviction-hard RAM threshold is 650 MiB ( about 680 MB ).

We expect Pods to be swiftly evicted.

Logs from the kubelet:
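One way to see these logs in a minikube environment ( assuming the kubelet runs under systemd inside the minikube VM ):

    minikube ssh
    # inside the VM:
    sudo journalctl -u kubelet --no-pager | grep "eviction manager"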

Check MemoryPressure status:
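A quick way to check the node condition:

    kubectl describe node minikube | grep MemoryPressure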

No MemoryPressure. 700 MB available; threshold is 680 MB.

5 minutes later: still no more Pods evicted.

Delete myram2 so that we can have a neat kubectl get pods list.
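That is a single command:

    kubectl delete pod/myram2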

Define a 100 MB Pod:
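A sketch of the manifest, in the same style as before ( image and arguments are again illustrative ):

    apiVersion: v1
    kind: Pod
    metadata:
      name: myram7
    spec:
      containers:
      - name: myram
        image: polinux/stress            # illustrative image, as before
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "100M", "--vm-hang", "1"]   # allocate and hold ~100 MB
        resources:
          requests:
            memory: "10Mi"
      restartPolicy: Never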

Create Pod:

Check the kubelet-calculated memory.available_in_mb:

Available RAM is below the 680 MB threshold. We have a MemoryPressure situation.

Seconds later two Pods got evicted.

kubelet logs:

  • myram4 Pod requests 10 MB, uses 50 MB
  • myram3 Pod requests 10 MB, uses 50 MB
  • myram7 Pod requests 10 MB, uses 100 MB

Based on https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#evicting-end-user-pods

pods are ranked by Priority, and then usage above request.

This is not what we observe here. The 9- and 8-minute-old Pods get evicted first.

Then the myram7 Pod, which used considerably more RAM than it requested, gets evicted. Based on the official Kubernetes documentation, I would have expected myram7 to be evicted first. See the logs below.

Check available RAM:

We no longer have a MemoryPressure condition. ( I forgot to run the actual grep command here. )

Let’s read the describe output to see how an eviction gets reported:
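For example, for the evicted myram7 Pod:

    kubectl describe pod/myram7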

The last two event lines explain it adequately.

Clean up other Pods:

Evictions and Priority Classes

You have heard before that pods are ranked for eviction by Priority.

This part of the tutorial demonstrates that.

Kubelet eviction thresholds as before:

We need 3 priority classes:
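Their manifests are not reproduced here; a sketch of what three such classes look like follows. The names and values are assumptions chosen to be consistent with the Pod summary below, and older clusters may need apiVersion scheduling.k8s.io/v1beta1 instead of v1.

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority
    value: 1000000
    globalDefault: false
    description: "High priority class for these eviction tests"
    ---
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: medium-priority
    value: 10000
    globalDefault: false
    description: "Medium priority class for these eviction tests"
    ---
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: low-priority
    value: 100
    globalDefault: false
    description: "Low priority class for these eviction tests"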

The YAML files for the 6 Pods all use priorityClassName. Create them all. ( A representative manifest is sketched after the summary below. )

Summary:

  • myhigh15 … High Priority Class … request memory: “1Mi” : use 15Mi
  • myhigh35 … High Priority Class … request memory: “1Mi” : use 35Mi
  • mymed15 … Medium Priority Class … request memory: “1Mi” : use 15Mi
  • mymed35 … Medium Priority Class … request memory: “1Mi” : use 35Mi
  • mylow15 … Low Priority Class … request memory: “1Mi” : use 15Mi
  • mylow35 … Low Priority Class … request memory: “1Mi” : use 35Mi
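A representative manifest for one of the six, mylow35 ( same illustrative image as earlier; the priorityClassName must match one of the classes created above ):

    apiVersion: v1
    kind: Pod
    metadata:
      name: mylow35
    spec:
      priorityClassName: low-priority    # assumed class name from the sketch above
      containers:
      - name: myram
        image: polinux/stress
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "35M", "--vm-hang", "1"]   # use ~35 MB
        resources:
          requests:
            memory: "1Mi"
      restartPolicy: Never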

MemoryPressure?

No.

a minute later … a mystery process uses an additional 200 MB of memory.

The eviction-hard RAM threshold is 650 MiB ( about 680 MB ).

You can specify eviction thresholds in MB:

From https://github.com/kubernetes/apimachinery/blob/master/pkg/api/resource/quantity.go

Binary-SI suffixes: Ki | Mi | Gi | Ti | Pi | Ei

Decimal-SI suffixes: k | M | G | T | P | E

List running Pods — no evictions — as expected.

Run our 100 MB Pod. That will surely push our node into MemoryPressure.

Mere seconds later: 2 Pods evicted ( Hard threshold acts immediately ).

Check RAM available:

We are in a MemoryPressure=True condition.

Let’s investigate the kubelet eviction manager to see how it ranks which Pods to evict.

  • Low priority Pods are listed first — will be evicted first
  • Medium priority Pods are listed second
  • High priority Pods are listed last — will be evicted last

35 MB Pods are listed before 15 MB Pods every time.
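The ranking appears in the kubelet log. One way to pull it out ( the exact wording of the log message may vary between Kubernetes versions ):

    minikube ssh
    # inside the VM:
    sudo journalctl -u kubelet --no-pager | grep "pods ranked for eviction"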

Official theory …

pods are ranked by Priority, and then usage above request.

EXACTLY what we see here.

Important tip: specify accurate RAM requests for your Pods; that way they are less likely to be evicted. Also prevent everyone from vastly overstating RAM requests just to avoid eviction.

Our myram7 Pod, which uses 100 MB, is listed right at the top as the one likely to be evicted first.

30 seconds later, more Pods are evicted.

Let’s investigate the kubelet eviction manager to see how it ranks which of these latter Pods to evict.

Based on what you just saw above, you can probably predict what the eviction ranking will look like:

Pods are evicted in ranked order:

  • mylow15_default
  • metrics-server-6486d4db88-t6krr_kube-system
  • kubernetes-dashboard-5bff5f8fb8-jxslb_kube-system
  • mymed35_default

Determine Pod status:

Check RAM available:

Still under MemoryPressure.

a minute later …

List Pods … all evicted.

It is worthwhile to investigate the eviction manager here, since there is actually a problem.

Several more repeating phrases were deleted from the output:

  • attempting to reclaim memory
  • must evict pod(s) to reclaim memory
  • evicted, waiting for pod to be cleaned up
  • successfully cleaned up

metrics-server, kubernetes-dashboard and mymed15 evicted — no problem so far.

Replacement metrics-server and kubernetes-dashboard Pods then needed eviction.

The system is spinning its wheels — 2 steps forward, then 2 steps back ( 2 Pods evicted, then replaced, then the 2 replacement Pods need eviction ).

Another 2 replacement Pods need eviction: kubernetes-dashboard, metrics-server.

During the next few minutes there was a brief moment when the available memory was sufficient ( by a mere 500 kilobytes ).

Delete evicted Pods to make RAM available:
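A convenient one-liner that deletes every Pod whose STATUS column shows Evicted ( it scrapes kubectl get pods output, so treat it as a convenience hack rather than a robust script ):

    kubectl get pods | grep Evicted | awk '{print $1}' | xargs kubectl delete pod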

Still not enough RAM available:

I did not do a before-and-after comparison, but it seems evicted Pods use very little RAM anyway.

Node now permanently under MemoryPressure:

Over 15 minutes … memory available stays below the threshold: kubelet.eviction-hard="memory.available<650Mi".

During the next 25 minutes, this is how the kubelet eviction manager tries to fix the low-RAM situation.

It repeatedly evicts non-critical Pods. However, those Pods get recreated automatically ( by their owning Deployments and DaemonSets ) since they are Kubernetes system Pods.

Kubernetes seems to learn somewhat from this situation, which is beyond any hope of repair.

Initially it evicts kube-proxy every 30 seconds. It then takes progressively longer to create replacement Pods. After 30 minutes, a kube-proxy eviction-and-recreation cycle takes 14 minutes.

From https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy

Exited Containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …) capped at five minutes.

This behavior seems similar to the exponential back-off delay of exited containers.

These are the top resident-RAM-usage processes.
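This listing was produced with a process list sorted by resident memory, something along these lines ( assuming a full procps ps is available on the node; the busybox ps in some minikube images does not support --sort ):

    ps aux --sort=-rss | head -15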

A 2200 MB RAM node minus around 1200 MB for the processes above — that leaves 1000 MB available.

However, our script shows 600 MB available. Let us call the missing ( in-use ) 400 MB Kubernetes and Linux system overhead.

These are the running Kubernetes services.

Restarting the kubelet does not fix the problem.

kube-apiserver uses around 500 MB of RAM upon startup, so it did not balloon out of control either.

Clean up: delete …

Eviction Thresholds Syntax

Reference : https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#eviction-thresholds

This tutorial specified memory thresholds using:

memory.available<600Mi

600 MiB = 629.1456 MB

You can also specify the threshold in MB:

memory.available<630M

Thresholds may also be specified in percentages, for example a 12 GB node may have:

memory.available<10%
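Whichever form you choose, it goes into the same kubelet flag, and multiple resources can be combined with commas, for example:

    --eviction-hard=memory.available<650Mi,nodefs.available<10%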

Eviction Monitoring Interval

From https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/

The kubelet evaluates eviction thresholds per its configured housekeeping interval.

housekeeping-interval is the interval between container housekeepings.

From https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

--housekeeping-interval duration

Interval between container housekeepings (default 10s)

This tutorial did not change this interval from its default value.

eviction-pressure-transition-period

From https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

--eviction-pressure-transition-period duration
Duration for which the kubelet has to wait before transitioning out of an eviction pressure condition. (default 5m0s)

From https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#oscillation-of-node-conditions

eviction-pressure-transition-period is the duration for which the kubelet has to wait before transitioning out of an eviction pressure condition.

The kubelet would ensure that it has not observed an eviction threshold being met for the specified pressure condition for the period specified before toggling the condition back to false.

We deliberately set this value to 30 seconds so that we could quickly see transitions into and out of MemoryPressure.

30 seconds is probably too low a value for production usage. The official Kubernetes default of 5 minutes seems a very good default.

Evicting Guaranteed Pods

From https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#evicting-end-user-pods

Guaranteed pods and Burstable pods whose usage is beneath requests are evicted last.

( Guaranteed Pods are guaranteed only when requests and limits are specified for all the containers and they are equal. )

Such pods are guaranteed to never be evicted because of another Pod’s resource consumption.

Summary of other conditions:

  • When the node only has Guaranteed or Burstable Pods ( using less than their requests ) remaining
  • a node under RAM pressure MUST still choose to evict such Pods
  • the kubelet eviction manager will then evict the Pods of lowest Priority first

This tutorial did not test any guaranteed Pods.
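If you do want to run such a test, a Guaranteed Pod only requires that every container specifies equal requests and limits for both CPU and memory. A minimal sketch, using the same illustrative image as before:

    apiVersion: v1
    kind: Pod
    metadata:
      name: myguaranteed
    spec:
      containers:
      - name: myram
        image: polinux/stress
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "50M", "--vm-hang", "1"]
        resources:
          requests:
            memory: "100Mi"
            cpu: "100m"
          limits:
            memory: "100Mi"              # equal to the request: QoS class Guaranteed
            cpu: "100m"
      restartPolicy: Never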

Devise a set of tests that will test these eviction conditions.

It is much easier to do two tests that investigate 2 conditions each.

Do not attempt to test all these conditions in one complex test: interactions will hide the simple cause-and-effect relationships you wish to observe.

You need to know ( from experience ) how guaranteed Pods are handled: they are guaranteed.

To really develop your mastery of this topic devise and run more tests where you:

  • investigate hard thresholds
  • investigate soft thresholds
  • set hard and soft thresholds in same test run
  • use the 3 priority classes above.

Original Source

Follow me to keep abreast of the latest technology news, industry insights, and developer trends.
