Alibaba Cloud cGPU Technology Improves GPU Usage, Boosts AI Efficiency, and Reduces Costs
Artificial intelligence (AI) has deeply influenced all walks of life, and deep learning has become the mainstream approach to implementing AI. Deep learning demands enormous and fluctuating computing power, and migrating data to the cloud has become the mainstream trend.
GPUs are an important source of AI computing power. Both Internet companies and traditional enterprises that run AI services need to rent GPU cloud servers for deep learning model training and inference.
As graphics card and semiconductor technologies continue to advance, the computing power of a single GPU card keeps rising, and so does its cost. However, many deep learning tasks do not fully occupy a single GPU card, so inflexible resource scheduling leaves GPU resources underused.
In this situation, scheduling the underlying GPU resources through containers is a good solution. Sharing one GPU card among multiple tenants (VMs) can be implemented with vGPU technology, while single-tenant, multi-threaded scenarios can be implemented with GPU container sharing technology. Deploying containers at high density on GPU cards splits GPU resources at a finer granularity and improves resource usage.
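To make the granularity argument concrete, here is a minimal sketch comparing whole-card allocation (one task per card) with first-fit packing of tasks onto cards by GPU memory request. It is an illustration under stated assumptions, not the cGPU scheduler: the 16 GiB card size and the per-task memory requests are hypothetical values, and the function names are invented for this example.

```python
from typing import List


def cards_whole_card(requests_gib: List[float]) -> int:
    """Whole-card scheduling: every task gets a dedicated GPU card."""
    return len(requests_gib)


def cards_first_fit(requests_gib: List[float], card_gib: float) -> List[List[float]]:
    """Fine-grained sharing: place each task on the first card with enough free memory."""
    cards: List[List[float]] = []  # each inner list holds the requests placed on one card
    for req in requests_gib:
        if req > card_gib:
            raise ValueError(f"request of {req} GiB exceeds card capacity of {card_gib} GiB")
        for card in cards:
            if sum(card) + req <= card_gib:
                card.append(req)
                break
        else:
            cards.append([req])  # no existing card has room, so open a new one
    return cards


def memory_usage(requests_gib: List[float], num_cards: int, card_gib: float) -> float:
    """Fraction of the allocated cards' total memory that tasks actually request."""
    return sum(requests_gib) / (num_cards * card_gib)


if __name__ == "__main__":
    CARD_GIB = 16.0  # hypothetical memory size of one GPU card
    requests = [4.0, 6.0, 2.0, 8.0, 3.0, 5.0]  # hypothetical per-task GPU memory needs
    whole = cards_whole_card(requests)
    packed = cards_first_fit(requests, CARD_GIB)
    print(f"whole-card: {whole} cards, usage {memory_usage(requests, whole, CARD_GIB):.0%}")
    print(f"first-fit:  {len(packed)} cards, usage {memory_usage(requests, len(packed), CARD_GIB):.0%}")
```

With these made-up numbers, whole-card scheduling needs 6 cards while first-fit packing fits the same tasks on 2. The model only tracks GPU memory; a real scheduler would also need to account for compute contention and isolation between co-located containers.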
The Alibaba Cloud heterogeneous computing team launched cGPU container sharing technology, which lets users schedule the underlying GPU resources through containers at a finer granularity. This technology can improve GPU resource usage, boost efficiency, and reduce costs.
GPU containers are commonly used in the industry. However, when containers share a GPU, applications in different containers may compete for memory and affect each other, because the containers are not completely isolated. For example, an application with a large memory footprint may occupy too much GPU memory, leaving insufficient memory for an application running in another container. In other words, this approach at best addresses compute preemption, but it does not isolate faults. Suppose an enterprise runs GPU inference applications in two containers: one is stable, and the other is still under development. Because no isolation is in place, a failure in one container can also cause the application in the other container to fail.
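The failure mode described above can be illustrated with a toy model of one card's memory shared by two containers, first with no limits and then with a per-container quota. This is plain Python bookkeeping, not a real GPU allocator and not how cGPU is implemented; the class, the 16 GiB pool, the container names, and the quota values are all hypothetical, and the model covers only memory exhaustion, not other kinds of faults.

```python
from typing import Dict, Optional


class OutOfMemory(Exception):
    """Raised when a toy allocation cannot be satisfied."""


class SharedGpuMemory:
    """Toy model of one GPU card's memory shared by several containers.

    If `quotas` is None, containers compete for the whole pool (no isolation).
    Otherwise each container may use at most its own quota; a container with
    no quota entry gets a limit of 0 GiB.
    """

    def __init__(self, total_gib: float, quotas: Optional[Dict[str, float]] = None):
        self.total_gib = total_gib
        self.quotas = quotas
        self.used: Dict[str, float] = {}

    def allocate(self, container: str, gib: float) -> None:
        in_use_total = sum(self.used.values())
        in_use_self = self.used.get(container, 0.0)
        if self.quotas is not None:
            limit = self.quotas.get(container, 0.0)
            if in_use_self + gib > limit:
                raise OutOfMemory(f"{container}: exceeds its {limit} GiB quota")
        if in_use_total + gib > self.total_gib:
            raise OutOfMemory(f"{container}: card memory exhausted")
        self.used[container] = in_use_self + gib


def run_scenario(gpu: SharedGpuMemory) -> Dict[str, str]:
    """A buggy 'dev' container leaks memory, then the 'stable' container allocates."""
    outcome: Dict[str, str] = {}
    try:
        for _ in range(100):  # leaking loop: keeps allocating 1 GiB until it fails
            gpu.allocate("dev", 1.0)
        outcome["dev"] = "ok"
    except OutOfMemory:
        outcome["dev"] = "failed"
    try:
        gpu.allocate("stable", 4.0)
        outcome["stable"] = "ok"
    except OutOfMemory:
        outcome["stable"] = "failed"
    return outcome


if __name__ == "__main__":
    # No isolation: the leak exhausts the card, so the stable container fails too.
    print(run_scenario(SharedGpuMemory(16.0)))
    # prints {'dev': 'failed', 'stable': 'failed'}
    # Per-container quotas: the leak is confined to the dev container's share.
    print(run_scenario(SharedGpuMemory(16.0, quotas={"dev": 8.0, "stable": 8.0})))
    # prints {'dev': 'failed', 'stable': 'ok'}
```

The quota check runs before the pool-wide check, so a misbehaving container hits its own limit first and the memory reserved for its neighbor stays available.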
Another approach used in the industry replaces or modifies the CUDA runtime library. One drawback of this approach is that users cannot integrate their own environments with the cloud vendor's environment; instead, they must adapt and modify the CUDA runtime library themselves.
The cGPU container technology launched by Alibaba Cloud isolates containers securely, preventing services from interfering with one another and errors from spreading between containers, which enhances security and stability. It also lets users flexibly schedule the underlying GPU resources through containers without modifying the CUDA runtime library.
The cGPU container technology will encourage more enterprises to schedule the underlying GPU resources through containers, improving GPU resource usage, boosting efficiency, and reducing costs.