Implementing Fractional GPUs in Kubernetes with Aliyun Scheduler
This blog provides a detailed approach to sharing a single GPU among multiple containers in a Kubernetes environment using open-source frameworks, so that each workload receives its own slice of GPU memory rather than monopolizing an entire device.
This guide is particularly beneficial for Machine Learning Engineers, Data Scientists, and AI researchers who aim to optimize their GPU resources for specific workload requirements.
Key Takeaways
- Deploying multiple containers with shared GPU resources.
- Understanding various methods' pros and cons.
- A step-by-step guide on using the Aliyun Gpushare Scheduler Extender.
Table of Contents
- Understanding Nvidia MIG and Its Limitations
- Recommendation: Aliyun Gpushare Scheduler Extender
- Step-by-Step Tutorial
- Advantages of Aliyun Gpushare Scheduler Extender
- Conclusion
Understanding Nvidia MIG and Its Limitations
The Nvidia Multi-Instance GPU (MIG) feature, available in NVIDIA's A100 and subsequent GPUs, allows a single A100 GPU to be divided into up to seven smaller GPUs, each equipped with its own memory, cache, and streaming multiprocessors. This feature is designed to enhance the utilization of GPU resources according to specific workload needs, such as machine learning, data analysis, or graphics processing. Particularly valuable in cloud and data center environments, MIG technology facilitates efficient GPU resource usage, offering flexibility and improved performance across a range of computing tasks. However, we have identified several limitations with the Nvidia MIG driver:
- Resource Partitioning: Dividing memory and compute cores among instances might limit resources for each instance, affecting performance for high-demand tasks.
- Potential Underutilization: There's a risk of mismatch between workload and resource partitioning, leading to resource underutilization.
- Compatibility and Support: MIG technology is limited to certain NVIDIA GPUs like the A100, excluding older models.
- Complexity in Management: Managing multiple GPU instances adds complexity, especially in large-scale deployments.
- Inter-Instance Communication: Communication challenges may arise due to the logical isolation of GPU instances.
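For context, MIG partitioning itself is driven entirely through nvidia-smi. A typical session on an A100 looks roughly like the following; the available profile names vary by GPU model, so treat the 3g.20gb profile below as illustrative:

```shell
# Enable MIG mode on GPU 0 (takes effect after a GPU reset)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this card supports
sudo nvidia-smi mig -lgip

# Create two GPU instances (two 3g.20gb slices) along with
# their default compute instances (-C)
sudo nvidia-smi mig -cgi 3g.20gb,3g.20gb -C

# Confirm the resulting instances
sudo nvidia-smi -L
```

Note how the partition sizes come from a fixed menu of profiles; this rigidity is exactly the bin-packing constraint the Aliyun approach avoids.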
Our Recommendation: Aliyun Gpushare Scheduler Extender
We highly recommend the Aliyun Gpushare Scheduler Extender (AliyunContainerService/gpushare-scheduler-extender on GitHub). Although it requires more advanced Kubernetes configuration, it proves to be the superior choice. The steps outlined below for AKS (Azure Kubernetes Service) are easily adaptable to other cloud providers.
Step-by-Step Tutorial
Here's a step-by-step guide to setting it up, using AKS (Azure Kubernetes Service) as an example, though it's applicable to other cloud providers as well:
Step 1: Configure Docker Runtime (Skip for Azure/GCP)
For non-Azure/GCP environments, ensure your /etc/docker/daemon.json is correctly configured:
sudo vi /etc/docker/daemon.json
Verify that the file contains the following configuration:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
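After editing daemon.json, restart Docker and confirm the default runtime took effect. As a quick sanity check, `docker info` can print the configured default runtime directly:

```shell
# Restart Docker so the new default runtime is picked up
sudo systemctl restart docker

# Should print: nvidia
docker info --format '{{.DefaultRuntime}}'
```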
Step 2: Set Up the Scheduler
SSH into a GPU node and prepare the scheduler configuration (this is only needed on one GPU node):
cd /etc/kubernetes
sudo curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/scheduler-policy-config.json
Update kube-scheduler.yaml to use the new config. Note that the file must be fetched from the raw GitHub URL (the blob URL returns an HTML page) and renamed to match the manifest name:
cd /tmp
wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/kube-scheduler-v1.23+.yaml -O kube-scheduler.yaml
sudo cp /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/kube-scheduler.yaml
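Because kube-scheduler runs as a static pod, the kubelet recreates it automatically once the manifest changes. You can verify it came back up with:

```shell
# The standard control-plane label for the scheduler pod
kubectl -n kube-system get pods -l component=kube-scheduler
```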
Step 3: Deploy Device Plugin and Scheduler Controller
Exit the GPU node. Then, from a machine where kubectl is configured for the cluster, deploy the necessary components:
curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/gpushare-schd-extender.yaml
wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml
kubectl create -f gpushare-schd-extender.yaml
kubectl create -f device-plugin-rbac.yaml
kubectl create -f device-plugin-ds.yaml
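Before labeling nodes, it is worth confirming that the components were created; note that the device plugin DaemonSet will not schedule any pods until a node carries the gpushare=true label. The resource names below come from the downloaded manifests:

```shell
# Scheduler extender deployment and device plugin DaemonSet live in kube-system
kubectl -n kube-system get deploy gpushare-schd-extender
kubectl -n kube-system get ds gpushare-device-plugin-ds
```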
Label the node so that the gpushare device plugin DaemonSet is scheduled onto it:
kubectl label node aks-<your_node_name>-xxxxxxxx-vmss000000 gpushare=true
Step 4: Verify GPU Status
Install and run the kubectl plugin to check GPU status:
sudo wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
sudo chmod 755 ./kubectl-inspect-gpushare
./kubectl-inspect-gpushare
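Optionally, move the binary onto your PATH so it is discovered as a regular kubectl plugin; kubectl resolves `kubectl inspect gpushare` to an executable named kubectl-inspect-gpushare:

```shell
# Install the binary where kubectl's plugin discovery can find it
sudo mv ./kubectl-inspect-gpushare /usr/local/bin/
kubectl inspect gpushare
```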
Step 5: Configure Scheduler for Node
Modify gpushare-schd-extender.yaml to run the scheduler extender on a specific node:
vi gpushare-schd-extender.yaml
Update the nodeSelector:
nodeSelector:
  kubernetes.io/hostname: aks-<your_node_pool_name>-xxxxxxxx-vmss00000<node_number>
Redeploy the scheduler:
kubectl delete -f gpushare-schd-extender.yaml
kubectl apply -f gpushare-schd-extender.yaml
To bring additional GPU nodes into the pool, label each of them the same way:
kubectl label node aks-<your_node_name>-xxxxxxxx-vmss00000<X> gpushare=true
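If the pool contains many GPU nodes, labeling them one by one is tedious. A small loop over kubectl output handles it; the grep pattern below is a placeholder for whatever naming convention your GPU node pool uses:

```shell
# Label every node whose name matches the GPU pool's naming pattern
for node in $(kubectl get nodes -o name | grep '<your_node_pool_name>'); do
  kubectl label "$node" gpushare=true --overwrite
done
```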
Step 6: Deploy a Test Pod
Create and deploy a test pod that requests 5 GiB of GPU memory (the aliyun.com/gpu-mem resource is expressed in GiB by default) and monitor utilization:
apiVersion: v1
kind: Pod
metadata:
  name: gpushare-test-pod
spec:
  restartPolicy: OnFailure
  containers:
    - name: gpushare-test-pod
      image: "cheyang/gpu-player:v2"
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "all"
      resources:
        limits:
          aliyun.com/gpu-mem: 5
Save the manifest above as test-pods.yaml and deploy it with:
kubectl apply -f test-pods.yaml
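Once the pod is running, the scheduler's view should reflect the 5 GiB allocation. Check the pod, its logs (the cheyang/gpu-player test image exercises GPU memory and logs the result), and re-run the inspect plugin:

```shell
kubectl get pod gpushare-test-pod
kubectl logs gpushare-test-pod

# Per-node GPU memory report; 5 GiB should now show as allocated
./kubectl-inspect-gpushare
```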
Advantages of Aliyun Gpushare Scheduler Extender
- Flexibility and Scalability: The scheduler excels at dynamically allocating GPU resources to pods based on their immediate requirements. It is particularly beneficial in environments with frequently changing workloads and demands.
- Performance: Performance varies with the number of workloads sharing the GPU and each pod's specific demands. The advantage, however, is that all models can use the GPU's cores, resulting in higher overall utilization.
- Resource Allocation: The scheduler facilitates dynamic allocation and balancing of GPU resources. This adaptability is crucial as it allows adjustments in line with changing workload demands. Its approach to bin packing is not constrained by partition sizes, offering greater flexibility.
- Compatibility and Support: It supports a broader range of GPUs and is commonly integrated with various virtualization software, enhancing its applicability and versatility.
Conclusion
The Aliyun plugin stands out as a highly effective solution for sharing GPU resources in Kubernetes environments. This guide provides detailed, step-by-step setup instructions and sheds light on the enhanced flexibility, efficiency, and compatibility that Aliyun offers. These qualities establish it as an indispensable tool in the management of complex cloud and data center infrastructures.
For those eager to expand their understanding, I am happy to share further insights. My professional focus is on helping organizations scale up GPU workloads.
Over my career, I have developed substantial expertise in deploying a variety of Large Language Models (LLMs) and diffusion models on Kubernetes, with a specialized focus on enabling fractional GPU usage, a strategy that significantly reduces the cost of running these workloads.