# Deploy the dataplane

If you have not yet set up the required Nebius resources (MK8s cluster, Object Storage bucket, service account, access key), see [Prepare infrastructure](https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-nebius/selfmanaged-nebius/prepare-infra) first.

## Assumptions

* You have a Union.ai organization, and you know the control plane URL for your organization.
* You have a cluster name provided by or coordinated with Union.
* You have a Nebius Managed Kubernetes cluster running one of the three most recent minor Kubernetes versions. [Learn more](https://kubernetes.io/releases/version-skew-policy/)
* You have a Nebius Object Storage bucket, service account, and access key as described in [Prepare infrastructure](https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-nebius/selfmanaged-nebius/prepare-infra).

## Prerequisites

* Install [Helm 3](https://helm.sh/docs/intro/install/).
* Install [uctl](https://www.union.ai/docs/v2/union/deployment/api-reference/uctl-cli/_index).
* Install the [`flyte` CLI](https://www.union.ai/docs/v2/union/deployment/api-reference/flyte-cli) (used later to run a sample workflow).
* Install the [Nebius CLI](https://docs.nebius.com/cli) and authenticate with `nebius profile create`.

## Deploy the Union.ai operator

1. Set your `KUBECONFIG` to the Nebius MK8s cluster where you want to deploy the data plane:

   ```bash
   nebius mk8s cluster get-credentials --id --external
   export KUBECONFIG=
   ```

2. Configure the Union CLI and provision data plane resources:

   ```bash
   uctl config init --host=.union.ai
   uctl selfserve provision-dataplane-resources --clusterName --provider metal
   ```

   The command generates a YAML values file specific to the `metal` provider, including the secrets your data plane needs to communicate with Union's control plane.

3. Update the generated values file with your Nebius-specific storage configuration. Replace the placeholders with your actual credentials and settings.

   ```yaml
   host: .union.ai
   clusterName:
   orgName:
   provider: metal
   storage:
     accessKey:
     bucketName:
     endpoint: https://storage..nebius.cloud
     fastRegistrationBucketName:
     provider: compat
     region:
     secretKey:
   secrets:
     admin:
       create: true
       clientId:
       clientSecret:
   ```

   > [!NOTE]
   > The `uctl selfserve provision-dataplane-resources` command in step 2 generates the `clientId` and `clientSecret` values and writes them into the values file. Don't modify them.

4. Add the Union.ai Helm repo:

   ```bash
   helm repo add unionai https://unionai.github.io/helm-charts/
   helm repo update
   ```

5. Install the data plane, passing the path to the Helm values file you customized in step 3 to `--values`:

   ```bash
   helm upgrade --install unionai-dataplane unionai/dataplane \
     --namespace union --create-namespace \
     --values \
     --timeout 10m
   ```

6. Verify the pods are running:

   ```bash
   kubectl get pods -n union
   ```

   When the deployment succeeds, all pods show a `Running` status, including `union-operator-proxy`, `union-operator-buildkit`, and `executor`.

7. Verify the cluster is registered with the control plane:

   ```bash
   uctl get cluster
   ```

   The output is similar to the following:

   ```text
   NAME          ORG     STATE          HEALTH
   union-nebius  my-org  STATE_ENABLED  HEALTHY
   ```

8. Create an API key for your organization. The data plane requires this key to execute v2 workflows. If you have already created one, rerun the same command to propagate the key to the new cluster:

   ```bash
   uctl create apikey --keyName EAGER_API_KEY --org
   ```

   > [!NOTE]
   > If you receive a `PermissionDenied` error, contact [Union.ai support](https://www.union.ai/) to have the permission enabled for your organization.

## GPU node configuration (Nebius-specific)

Follow these steps to run GPU workloads on Nebius:

1. Ensure the NVIDIA device plugin is installed and your task definitions request GPU resources. Nebius MK8s pre-installs the NVIDIA GPU operator on GPU node groups, so no additional setup is typically required.
   Learn more about [how to add nodes with GPUs to a cluster](https://docs.nebius.com/kubernetes/gpu/set-up#how-to-add-nodes-with-gpus-to-a-cluster).

2. Configure the Union backend to inject the required tolerations and label selectors so that only tasks that request GPUs are scheduled on GPU-enabled nodes:

   1. Identify the node(s) that have GPU devices available:

      ```bash
      kubectl get nodes -o jsonpath='{range .items[?(@.status.allocatable.nvidia\.com/gpu)]}{.metadata.name}{"\n"}{end}'
      ```

   2. Get the labels of a GPU node:

      ```bash
      kubectl get node -o jsonpath='{.metadata.labels}' | jq
      ```

      Nebius nodes typically include a label that identifies the instance type. For example, for a node with NVIDIA H200 GPUs:

      ```text
      beta.kubernetes.io/instance-type=gpu-h200-sxm
      ```

   3. If the GPU device supports MIG partitions, the node typically also has a label indicating the partition profile. For example:

      ```text
      nvidia.com/gpu-partition-size: 2g.35gb
      ```

3. Update your Helm values file with the information gathered in the previous steps:

   ```yaml
   # all the existing content of your values file
   ...
   # ADD
   config:
     k8s:
       plugins:
         k8s:
           gpu-device-node-label: "beta.kubernetes.io/instance-type"
           accelerator-devices:
             - H200: "gpu-h200-sxm"
           gpu-partition-size-node-label: "nvidia.com/gpu-partition-size"
   ```

4. Update your installed release:

   ```bash
   helm upgrade unionai-dataplane unionai/dataplane \
     --namespace union \
     --values \
     --timeout 10m
   ```

5. After completing the preceding steps, request GPU devices or MIG partitions in the task environment of your Flyte tasks:

   ```python
   import flyte

   # Every task in this environment requests one H200 GPU and 64 GiB of memory
   env = flyte.TaskEnvironment(name="gpu_training", resources=flyte.Resources(gpu="H200:1", memory="64Gi"))

   @env.task
   def train_model() -> None:
       ...
   ```

## Working with the Nebius Container Registry

Flyte executions bundle your code and run it inside a container in the Nebius MK8s cluster. The contents of the image include the `flyte` package, your task code, and any other dependency your workflow requires.
Flyte automates building the image, using a layered mechanism to detect changes efficiently. You can decide where to store the images. This section covers the configuration needed to store your container images in Nebius Container Registry.

1. Obtain a long-lived token from Nebius as described in [Working in a CI/CD environment](https://docs.nebius.com/container-registry/authentication#working-in-a-ci/cd-environment).

2. Copy the static key token value from the previous step (it usually starts with `v1...`) and store it in a shell variable:

   ```bash
   TOKEN='v1.CmQK...'
   ```

3. Encode it into a docker config file (replace the registry region accordingly):

   ```bash
   cat > docker-config-nebius.json <"], )
   ```

## Test a workflow

To run a sample workflow, complete the following steps:

1. Create a Flyte CLI configuration file at `.flyte/config.yaml` in your project directory, setting `org` and `project` to your organization and project identifiers:

   ```yaml
   admin:
     endpoint: dns:///.union.ai
   image:
     builder: remote
   task:
     domain: development
     org:
     project:
   ```

2. Run a sample workflow:

   ```bash
   flyte run --image ghcr.io/flyteorg/flyte:py3.13-v2.0.2 \
     hello_world.py main --n 5
   ```

   > [!NOTE]
   > If the remote image builder isn't enabled for your organization, use the `--image` flag with a pre-built container image, as in the preceding `flyte run` example.

3. Check the run status, passing the workflow run identifier to `flyte get run`:

   ```bash
   flyte get run
   ```

   Look for `ACTION_PHASE_SUCCEEDED` in the output to confirm the workflow completed successfully.
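
If you want to automate the status check from step 3 (for example, in a CI job), the following is a minimal Python sketch using only the standard library. It assumes nothing about `flyte get run` beyond what step 3 states: that its output contains a phase token such as `ACTION_PHASE_SUCCEEDED`. The helper names `extract_phase`, `run_succeeded`, and `fetch_run_status` are hypothetical, and the exact output format may differ between CLI versions.

```python
import re
import subprocess
from typing import Optional

# Assumption: `flyte get run` prints the run phase as a token of the form
# ACTION_PHASE_<NAME> (for example, ACTION_PHASE_SUCCEEDED) in its output.
PHASE_PATTERN = re.compile(r"\bACTION_PHASE_[A-Z_]+\b")


def extract_phase(output: str) -> Optional[str]:
    """Return the first ACTION_PHASE_* token found in the CLI output, or None."""
    match = PHASE_PATTERN.search(output)
    return match.group(0) if match else None


def run_succeeded(output: str) -> bool:
    """Return True only when the reported phase is ACTION_PHASE_SUCCEEDED."""
    return extract_phase(output) == "ACTION_PHASE_SUCCEEDED"


def fetch_run_status(run_name: str) -> str:
    """Call `flyte get run <run_name>` and return its standard output.

    Hypothetical helper: requires the flyte CLI on PATH and the
    .flyte/config.yaml file from step 1 in the working directory.
    """
    result = subprocess.run(
        ["flyte", "get", "run", run_name],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return result.stdout
```

For example, `run_succeeded(fetch_run_status("<run-name>"))` returns `True` only if the CLI output reports `ACTION_PHASE_SUCCEEDED`. Keeping the parsing separate from the subprocess call lets you test the parsing logic without a live cluster.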

## Additional resources

For more information, see the following resources:

- [Nebius Managed Kubernetes documentation](https://docs.nebius.com/kubernetes)
- [Nebius Object Storage documentation](https://docs.nebius.com/object-storage)
- [Nebius IAM and service accounts](https://docs.nebius.com/iam)

---

**Source**: https://github.com/unionai/unionai-docs/blob/main/content/deployment/selfmanaged/selfmanaged-nebius/deploy-dataplane.md

**HTML**: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-nebius/deploy-dataplane/