A Kubernetes cluster with a publicly accessible control plane endpoint
Each cluster node assigned a public (external) IP
Each cluster node labeled with its GPU type
If your cluster has the NVIDIA GPU Operator installed or you are using GKE or Karpenter, your nodes already have the necessary GPU labels. No further action is required.
Currently supported GPU labels:
nvidia.com/gpu.product: automatically created by the NVIDIA GPU Operator
cloud.google.com/gke-accelerator: used by GKE clusters
karpenter.k8s.aws/instance-gpu-name: used by Karpenter
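If none of these tools manages your nodes, the label has to be applied by hand. The sketch below is a hedged illustration only: it builds and prints a `kubectl label` command for one of the supported keys listed above, without touching the cluster. The helper name `gpu_label_command`, the node name, and the label value are hypothetical, and the exact value Komodo expects for a given GPU is an assumption you should confirm before applying it.

```shell
# Supported label keys, taken from the list above.
SUPPORTED_GPU_LABELS="nvidia.com/gpu.product cloud.google.com/gke-accelerator karpenter.k8s.aws/instance-gpu-name"

# Hypothetical helper: print the kubectl command that would label a node.
# It does not modify the cluster; review the output, then run it yourself.
gpu_label_command() {
  local node="$1" key="$2" value="$3"
  local supported
  for supported in $SUPPORTED_GPU_LABELS; do
    if [ "$key" = "$supported" ]; then
      echo "kubectl label node $node $key=$value"
      return 0
    fi
  done
  echo "Unsupported GPU label key: $key" >&2
  return 1
}

# Example (node name and label value are placeholders, not Komodo-verified):
#   gpu_label_command my-gpu-node nvidia.com/gpu.product NVIDIA-A100-SXM4-40GB
```

Printing the command instead of running it keeps the helper safe to try, since a wrong label value would otherwise be written straight to the node.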
To check if your cluster is compatible, run:
echo "Checking nodes for external IP addresses..."
nodes=$(kubectl get nodes -o=jsonpath="{.items[*]['status.addresses', 'metadata.annotations']}" | tr -d " \t\n\r")
FOUND_EXTERNAL_IP_FROM_STATUS=false
FOUND_EXTERNAL_IP_FROM_ANNOTATION=false
if [[ $nodes == *"ExternalIP"* ]]; then
  FOUND_EXTERNAL_IP_FROM_STATUS=true
fi
if [[ $nodes == *"rke.cattle.io/external-ip"* ]]; then
  FOUND_EXTERNAL_IP_FROM_ANNOTATION=true
fi
FOUND_EXTERNAL_IPS=""
if [ "$FOUND_EXTERNAL_IP_FROM_STATUS" = true ] || [ "$FOUND_EXTERNAL_IP_FROM_ANNOTATION" = true ]; then
  FOUND_EXTERNAL_IPS="FOUND"
fi
echo "Checking nodes for GPU labels..."
output=$(kubectl get nodes --show-labels | awk -F'[, ]' '{for (i=1; i<=NF; i++) if ($i ~ /nvidia.com\/gpu.product=|cloud.google.com\/gke-accelerator=|karpenter.k8s.aws\/instance-gpu-name=/) print $i}')
if [ -z "$output" ]; then
  echo "No valid GPU labels found."
else
  echo "GPU Labels found:"
  echo "$output"
fi
if [ -n "$FOUND_EXTERNAL_IPS" ] && [ -n "$output" ]; then
  echo "Your cluster is ready for GPU workloads with Komodo AI"
else
  echo "Your cluster is not ready for GPU workloads with Komodo AI."
  echo "Contact us at hello@komodoai.dev or join our Discord server at https://discord.gg/baJGK6RKZC for help."
fi
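The script above reports only whether some node in the cluster has an external IP. To find out which individual nodes lack one, a per-node listing may help. This is a minimal sketch, assuming the tab-separated `name<TAB>ip` output produced by the `kubectl` JSONPath query shown in the comment. The helper name `nodes_missing_external_ip` is hypothetical, and it inspects only `status.addresses`, not the `rke.cattle.io/external-ip` annotation.

```shell
# Hypothetical helper: read "node<TAB>external-ip" lines on stdin and print
# the names of nodes whose external-IP column is empty.
nodes_missing_external_ip() {
  awk -F'\t' '$1 != "" && $2 == "" { print $1 }'
}

# Usage against a live cluster (the JSONPath filter selects ExternalIP entries):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="ExternalIP")].address}{"\n"}{end}' \
#     | nodes_missing_external_ip
```

An empty result means every listed node reported an `ExternalIP` address.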
Instructions for setting up NVIDIA GPU Operator on your cluster can be found here
This will create resources in the default namespace. If you’d like to run your workloads in another namespace, download the YAML file above and change the namespace on every line marked with the comment "Change to your namespace if using a different one".
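The namespace change can also be scripted. The sketch below assumes the marked lines have the form `namespace: default # Change to your namespace if using a different one`; that layout is an assumption about the YAML file, and both the helper name `set_marked_namespace` and the namespace `ml-workloads` are hypothetical. Lines without the marker comment are left untouched.

```shell
# Hypothetical helper: rewrite "namespace: default" to the given namespace,
# but only on lines that carry the marker comment. Reads YAML on stdin.
set_marked_namespace() {
  local ns="$1"
  sed "/Change to your namespace if using a different one/ s/namespace: default/namespace: ${ns}/"
}

# Usage (file names are placeholders):
#   set_marked_namespace ml-workloads < service-account.yaml > service-account.ml-workloads.yaml
```

Review the rewritten file before applying it, since the pattern only matches if the file really uses the assumed layout.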
Copy this YAML to a file called service-account.yaml and then run:
Now that you have your control plane endpoint, service account token, and (optionally) your certificate authority data, you’re ready to connect your cluster to Komodo.
In the Komodo console, navigate to the Settings page, click Connect in the Kubernetes section and paste the values from Step 2 into the respective fields.
And that’s it! You can now launch your workflows on Komodo! Follow this tutorial to get started.
A Service Account in Kubernetes is an identity used by processes running within the cluster. It provides a way for pods to communicate securely with the Kubernetes API. Service accounts are used to manage permissions and control what actions can be performed by the processes within the cluster. In this guide, the Komodo service account is created to allow Komodo AI to interact with your Kubernetes cluster and launch workloads on your behalf.
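To see what a service account is actually allowed to do, you can ask the API server to evaluate a request as that account. Kubernetes identifies a service account by the username `system:serviceaccount:<namespace>:<name>`. The sketch below builds that username; the helper name `service_account_user` and the account name `komodo` are placeholders, since the real name comes from the YAML you applied.

```shell
# Build the API username Kubernetes assigns to a service account:
#   system:serviceaccount:<namespace>:<name>
service_account_user() {
  local namespace="$1" name="$2"
  echo "system:serviceaccount:${namespace}:${name}"
}

# Usage against a live cluster ("komodo" is a placeholder account name):
#   kubectl auth can-i create pods \
#     --as="$(service_account_user default komodo)" --namespace default
```

`kubectl auth can-i` prints `yes` or `no`, which gives a quick check of the permissions the account was granted.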
The Cluster Certificate Authority (CA) data is used to verify the identity of the Kubernetes control plane. By providing the CA data, you enable Komodo to verify the TLS certificate presented by the control plane, which prevents man-in-the-middle attacks on the connection between Komodo AI and your cluster. While optional, including it is highly recommended.
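Before pasting the CA data, it can be worth checking that the value really is a base64-encoded PEM certificate. This is a minimal sketch, assuming the CA data is in the same base64 form that a kubeconfig stores under `certificate-authority-data`; the helper name `is_pem_certificate_b64` is hypothetical. It checks only the PEM markers and does not validate the certificate itself.

```shell
# Hypothetical helper: succeed if the argument is base64 that decodes to
# text containing PEM certificate markers.
is_pem_certificate_b64() {
  local decoded
  decoded=$(printf '%s' "$1" | base64 --decode 2>/dev/null) || return 1
  case "$decoded" in
    *"-----BEGIN CERTIFICATE-----"*"-----END CERTIFICATE-----"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage, assuming your kubeconfig embeds the CA as certificate-authority-data:
#   ca_data=$(kubectl config view --raw --minify \
#     -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
#   is_pem_certificate_b64 "$ca_data" && echo "CA data looks like a PEM certificate"
```

If your kubeconfig references a CA file path instead of embedding the data, the queried field is empty and the check fails.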
Supported GPU labels are labels applied to your Kubernetes nodes that identify the type of GPU available on each node. Komodo AI relies on these labels to recognize the GPU resources in your cluster and allocate the appropriate ones to your AI and machine learning workloads.