The Burgeoning Kubernetes Scheduling System — Part 3: Binpack Scheduling That Supports Batch Jobs

By Wang Qingcan and Zhang Kai


The Burgeoning Kubernetes Scheduling System series presents our experiences, technical thinking, and implementation details to Kubernetes users and developers. We hope the articles can help you understand the powerful capabilities and future trends of the Kubernetes scheduling system.

The first two articles in this series are The Burgeoning Kubernetes Scheduling System — Part 1: Scheduling Framework and The Burgeoning Kubernetes Scheduling System — Part 2: Coscheduling and Gang Scheduling That Support Batch Jobs.

These two articles introduce the Kubernetes Scheduling Framework and the implementation of Coscheduling and Gang Scheduling policies by extending the Scheduling Framework, respectively. When dealing with the batch job in the cluster, we need to focus on resource utilization. In particular, GPU cards are too expensive to waste resources. This article describes how to use Binpack to reduce resource fragmentation and improve GPU utilization during batch job scheduling.

Why Binpack?

The resource scheduling policy of Kubernetes is LeastRequestedPriority by default. The least-consumed nodes are scheduled first, so the resources of the entire cluster are distributed evenly among all nodes. However, this scheduling policy often produces more resource fragments on a single node.

The following is a simple example. As shown in the following figure, resources are used evenly among nodes. Each node uses three GPU cards with two nodes remaining on one GPU resource. At this time, a new job has applied for two GPUs and then is submitted to the scheduler. Due to insufficient resources, scheduling failure occurs.

As shown above, each node has one standby GPU, but it is out of service, leading to expensive resource waste. However, after using the Binpack resource scheduling policy, the resource fragmentation above is solved. Binpack fills the resources on the node first and then schedules the next node. A job that applies for two GPUs can be scheduled to a node successfully, which improves the resource usage of the cluster.

Implementation Method

Binpack implementation has been abstracted into the Score plug-in of Kubernetes Scheduler Framework (called RequestedToCapacityRatio.) It is used to score nodes in the preference phase according to self-defined configurations. The specific implementation can be divided into two parts, the constructing scoring function and scoring.

Constructing Scoring Function

The process of the constructing scoring function is simple. Users can define the score corresponding to different utilization rates to determine the decision-making process of scheduling.

If users set the definition like the following figure, higher resource utilization rates get higher scores. For example, if the resource utilization rate is 0, the score is 0, and if the resource utilization rate is 100, the score is 10. This is the resource allocation method of Binpack.

Users can also set the utilization rate of 0 with a score of 10 points and the utilization rate of 100 with 0 points. This means that the lower the resource utilization rate is, the higher the score is. This is the resource allocation method of spreading.

Users can add more points, and the corresponding relationship doesn’t have to be linear. For example, a score of 8 can be obtained when the resource utilization rate is 50. Thus, the score is divided into two intervals: 0–50 and 50–100.


Users can define referenced resources and weight values in the Binpack calculation. For example, they can set the values and weights of the GPU and CPU.

Then, during the scoring process, utilization rates of the corresponding resources are obtained according to the calculation results of (pod.Request + node.Allocated)/node.Total. Then, the utilization rates are added to the scoring function to obtain the corresponding scores. The final scores are acquired by weighing the weight value of all resources.

Binpack Usage


1. Create a/etc/kubernetes/scheduler-config.yaml. Users are allowed to configure other priorities policies according to their needs.

Demo Example

This is the result demonstration of Binpack after running the distributed work of TensorFlow. There are two 4-card GPU machines in the tested cluster.

1. Use Kubeflow’s Arena to deploy the tf-operator in an existing Kubernetes cluster.

Arena is one of the subprojects based on Kubeflow, an open-source community for machine learning systems in Kubernetes. Arena supports the main lifecycle management of machine learning jobs through command lines and SDKs. The lifecycle management includes environment installation, data preparation, model development, model training, and model prediction. The work efficiency of data scientists is improved with Arena.

Check the deployment status

2. The user submits a Tensorflow distributed job to the cluster that contains 1 PS and 4 workers. Each Worker requires 1 GPU.

3. When the user uses Binpack, 4 Workers are scheduled to the same GPU node: cn-shanghai. after the job is submitted.

4. When the user doesn’t use Binpack, 4 Workers are scheduled to two nodes: cn-shanghai. and cn-shanghai.192.168. 6.50g nodes after submitting the job. Thus, resource fragmentation occurs.


This article introduces how to use the native scheduling policy of Kubernetes (called RequestedToCapacityRatio) to enable Binpack Scheduling. By doing so, resource fragments can be reduced, and the GPU utilization rate can be improved. It is efficient, clear, and simple to use. The following articles of this series will discuss improving GPU resource utilization. We will focus on using GPU sharing scheduling to improve the GPU utilization rate.

Original Source:

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store