Advance Deep Learning with Alibaba Open-Source and Pluggable Scheduling Tool for GPU Sharing

Cluster Scheduling: Kubernetes GPU Sharing

  • Extended resource definition
  • Scheduler extender mechanism
  • Device plugin mechanism
  • Kubectl extension mechanism

User Scenarios

  • A cluster administrator: “I want to improve the GPU utilization of the cluster. During development, multiple users share the model development environment.”
  • An application developer: “I hope to be able to run multiple logic tasks on the Volta GPU at the same time.”


  • Users can describe a request for a shared resource through the API, and that resource can then be scheduled.


  • Isolation of the shared resource is not supported.
  • Overselling is not supported.

Design Principle

Detailed Design


Core Function Modules:

  • GPU Share Scheduler Extender: It uses the Kubernetes scheduler extension mechanism. When the global scheduler filters and binds, the extender determines whether a single GPU card on the node can provide enough GPU memory. At bind time, it records the GPU allocation result in the Pod Spec as annotations, so that subsequent filtering can check earlier allocation results.
  • GPU Share Device Plugin: It uses the Device Plugin mechanism and is called by Kubelet on the node. It allocates the GPU card according to the allocation result recorded by the Scheduler Extender.
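The filter rule described above can be sketched in a few lines. This is only an illustration, not the extender's actual code: the function name node_fits and the dictionary of per-card free memory are hypothetical. The point it shows is that the request must fit on one card, so the total free memory on the node is not the deciding number.

```python
from typing import Dict


def node_fits(free_mem_per_gpu: Dict[int, int], requested_mem: int) -> bool:
    """Return True if at least one single GPU card can hold the whole request.

    free_mem_per_gpu maps a GPU card index to its free memory (hypothetical
    input shape; units only need to match requested_mem).
    """
    return any(free >= requested_mem for free in free_mem_per_gpu.values())


# Two cards with 4096 free each: 8192 in total, but no single card fits 6144.
print(node_fits({0: 4096, 1: 4096}, 6144))
# One card with 8192 free is enough for the same request.
print(node_fits({0: 4096, 1: 8192}, 6144))
```

A node is kept during filtering only when this check passes, which is why a node with plenty of fragmented free memory can still be rejected.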

Detailed Process:

  • At bind time, the best GPU card ID on the node is found according to the binpack policy. “Best” here means that, among the GPU cards on the same node whose free resources satisfy the request, the card with the least remaining resources is preferred. The selected card ID is saved as ALIYUN_COM_GPU_MEM_IDX in the annotations of the Pod. The GPU memory requested by the Pod is also saved to the annotations as ALIYUN_COM_GPU_MEM_POD, together with ALIYUN_COM_GPU_MEM_ASSUME_TIME, and the Pod is bound to the selected node at this time.
  • The Kubernetes API is called to bind the Pod to the node.
  • On the node, all GPU Share Pods that are in the Pending state and have ALIYUN_COM_GPU_MEM_ASSIGNED set to false are listed.
  • The Pod whose ALIYUN_COM_GPU_MEM_POD value (in the Pod annotations) equals the amount requested in the Allocate call is selected. If multiple Pods meet the condition, the Pod with the earliest ALIYUN_COM_GPU_MEM_ASSUME_TIME is selected.
  • ALIYUN_COM_GPU_MEM_ASSIGNED in the Pod annotations is set to true, and the GPU information in the Pod annotations is converted into environment variables and returned to Kubelet so that the Pod can actually be created.
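The steps above can be sketched as follows. This is a simplified model under stated assumptions, not the project's implementation: Pods are plain dictionaries, the function names are hypothetical, writing ALIYUN_COM_GPU_MEM_ASSIGNED as "false" at bind time is assumed, and the environment variables simply reuse the annotation keys because the text does not name them.

```python
from typing import Dict, List, Optional

IDX = "ALIYUN_COM_GPU_MEM_IDX"
POD_MEM = "ALIYUN_COM_GPU_MEM_POD"
ASSUME_TIME = "ALIYUN_COM_GPU_MEM_ASSUME_TIME"
ASSIGNED = "ALIYUN_COM_GPU_MEM_ASSIGNED"


def pick_gpu_binpack(free_mem: Dict[int, int], requested: int) -> Optional[int]:
    """Binpack: among cards that fit the request, pick the one with the least free memory."""
    fitting = [(free, idx) for idx, free in free_mem.items() if free >= requested]
    return min(fitting)[1] if fitting else None


def bind_annotations(gpu_idx: int, requested: int, assume_time: int) -> Dict[str, str]:
    """Annotations recorded at bind time (ASSIGNED="false" is an assumption of this sketch)."""
    return {
        IDX: str(gpu_idx),
        POD_MEM: str(requested),
        ASSUME_TIME: str(assume_time),
        ASSIGNED: "false",
    }


def match_pending_pod(pods: List[dict], allocate_request: int) -> Optional[dict]:
    """Pick the Pending, unassigned Pod whose recorded memory equals the Allocate
    amount; if several match, the earliest assume time wins."""
    candidates = [
        p for p in pods
        if p["phase"] == "Pending"
        and p["annotations"].get(ASSIGNED) == "false"
        and int(p["annotations"].get(POD_MEM, "-1")) == allocate_request
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda p: int(p["annotations"][ASSUME_TIME]))


def assign(pod: dict) -> Dict[str, str]:
    """Mark the Pod as assigned and return the values to expose as environment variables."""
    annotations = pod["annotations"]
    annotations[ASSIGNED] = "true"
    return {IDX: annotations[IDX], POD_MEM: annotations[POD_MEM]}


# Card 1 is the tightest fit for a 2000-unit request, so binpack selects it.
free = {0: 8000, 1: 3000, 2: 1000}
chosen = pick_gpu_binpack(free, 2000)
pod_a = {"name": "a", "phase": "Pending", "annotations": bind_annotations(chosen, 2000, 200)}
pod_b = {"name": "b", "phase": "Pending", "annotations": bind_annotations(chosen, 2000, 100)}
selected = match_pending_pod([pod_a, pod_b], 2000)
print(selected["name"], assign(selected))
```

Matching by requested amount and assume time is needed because, as described above, the Allocate call is matched to a Pod through the amount it requests rather than through a Pod identity. The ALIYUN_COM_GPU_MEM_ASSIGNED flag keeps an already served Pod from being matched twice.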

Related Projects


Test Sample

apiVersion: apps/v1
kind: Deployment
metadata:
  name: binpack-1
  labels:
    app: binpack-1
spec:
  replicas: 1
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: binpack-1
  template: # define the pods specifications
    metadata:
      labels:
        app: binpack-1
    spec:
      containers:
      - name: binpack-1
        image: cheyang/gpu-player:v2
        resources:
          limits:
            # GPU memory in MiB
            aliyun.com/gpu-mem: 1024




  • Optional support for Nvidia MPS is available in the Device Plugin;
  • The solution can be deployed automatically in the Kubernetes cluster initiated by kubeadm;
  • Scheduler Extender availability is improved;
  • A general solution for GPU, RDMA and flexible network cards is provided.




