Deploy the data plane

If you have not yet set up the required Nebius resources (MK8s cluster, Object Storage bucket, service account, access key), see Prepare infrastructure first.

Assumptions

You have a Union.ai organization, and you know the control plane URL for your organization.
You have a cluster name provided by or coordinated with Union.
You have a Nebius Managed Kubernetes cluster running one of the three most recent minor Kubernetes versions.
You have a Nebius Object Storage bucket, service account, and access key as described in Prepare infrastructure.

Prerequisites

Install Helm 3.
Install uctl.
Install the flyte CLI (used later to run a sample workflow).
Install the Nebius CLI and authenticate with nebius profile create.

Deploy the Union.ai operator

Set your KUBECONFIG to the Nebius MK8s cluster where you want to deploy the data plane:
```
nebius mk8s cluster get-credentials --id <CLUSTER_ID> --external
export KUBECONFIG=<PATH_TO_KUBECONFIG>
```

Configure the Union CLI and provision data plane resources:
```
uctl config init --host=<ORG_NAME>.union.ai
uctl selfserve provision-dataplane-resources --clusterName <CLUSTER_NAME> --provider custom
```
The command generates a YAML values file specific to the custom provider. The file includes the secrets your data plane needs to communicate with Union’s control plane.

Update the generated values file with your Nebius-specific storage configuration. Replace the placeholders with your actual credentials and settings.
```
host: <ORG_NAME>.union.ai
clusterName: <CLUSTER_NAME>
orgName: <ORG_NAME>
provider: custom

storage:
  accessKey: <YOUR_BUCKET_ACCESS_KEY>
  bucketName: <YOUR_STORAGE_BUCKET_NAME>
  endpoint: https://storage.<REGION>.nebius.cloud
  fastRegistrationBucketName: <YOUR_STORAGE_BUCKET_NAME>
  provider: compat
  region: <REGION>
  secretKey: <YOUR_BUCKET_SECRET_KEY>

secrets:
  admin:
    create: true
    clientId: <CLIENT_ID>
    clientSecret: <CLIENT_SECRET>
```

The uctl selfserve provision-dataplane-resources command that you ran earlier generates the <CLIENT_ID> and <CLIENT_SECRET> values and writes them into the values file. Don’t modify them.
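Before installing the chart, it can help to confirm that no placeholder was left unreplaced in the values file. The following is a minimal sketch, not part of the Union.ai tooling: the function name `find_placeholders` and the assumption that placeholders always look like `<UPPER_CASE_NAME>` are illustrative choices based on the template shown above.

```python
import re

# Assumption: placeholders are upper-case names in angle brackets,
# such as <REGION> or <YOUR_BUCKET_SECRET_KEY>.
PLACEHOLDER = re.compile(r"<[A-Z][A-Z0-9_]*>")


def find_placeholders(values_text: str) -> list[tuple[int, str]]:
    """Return (line_number, placeholder) pairs for every unreplaced placeholder."""
    found = []
    for line_number, line in enumerate(values_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            found.append((line_number, match.group(0)))
    return found


if __name__ == "__main__":
    # Hypothetical excerpt of a partially edited values file.
    sample = (
        "storage:\n"
        "  bucketName: my-bucket\n"
        "  endpoint: https://storage.<REGION>.nebius.cloud\n"
        "  region: eu-north1\n"
        "  secretKey: <YOUR_BUCKET_SECRET_KEY>\n"
    )
    for line_number, placeholder in find_placeholders(sample):
        print(f"line {line_number}: {placeholder} still needs a real value")
```

To check a real file, read its contents with `pathlib.Path(...).read_text()` and pass the text to `find_placeholders`. An empty result means no placeholder of this shape remains.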

Add the Union.ai Helm repo:
```
helm repo add unionai https://unionai.github.io/helm-charts/
helm repo update
```

Install the data plane. Replace <PATH_TO_VALUES_FILE> with the path to the Helm values file you customized earlier.
```
helm upgrade --install unionai-dataplane unionai/dataplane \
  --namespace union --create-namespace \
  --values <PATH_TO_VALUES_FILE> \
  --timeout 10m
```

Verify the pods are running:
```
kubectl get pods -n union
```
When the deployment succeeds, all pods show a Running status, including union-operator-proxy, union-operator-buildkit, and executor.
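The same check can be scripted. The sketch below assumes `kubectl` is on your PATH and uses the standard `kubectl get pods -o json` output, where each item carries `metadata.name` and `status.phase`. The helper names `pods_not_running` and `fetch_pods` are hypothetical. A Running phase does not by itself guarantee that every container is ready, so treat this as a first-pass check only.

```python
import json
import subprocess


def pods_not_running(pod_list: dict) -> list[str]:
    """Return 'name (phase)' for every pod whose phase is not Running."""
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase != "Running":
            problems.append(f"{name} ({phase})")
    return problems


def fetch_pods(namespace: str = "union") -> dict:
    """Run kubectl and parse its JSON output (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample standing in for real kubectl output.
    sample = {
        "items": [
            {"metadata": {"name": "executor-abc"}, "status": {"phase": "Running"}},
            {"metadata": {"name": "union-operator-proxy-xyz"}, "status": {"phase": "Pending"}},
        ]
    }
    print(pods_not_running(sample))
```

Against a live cluster, call `pods_not_running(fetch_pods())` instead of using the sample data.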

Verify the cluster is registered with the control plane:
```
uctl get cluster
```
The output is similar to the following:
```
NAME            ORG       STATE          HEALTH
union-nebius    my-org    STATE_ENABLED  HEALTHY
```

Required for Helm chart versions 2026.5.8 and earlier: create an API key for your organization. The key is required for v2 workflow executions on the data plane. If you have already created one, rerun the same command to propagate the key to the new cluster:
```
uctl create apikey --keyName EAGER_API_KEY --org <ORG_NAME>
```
If you receive a PermissionDenied error, contact Union.ai support to have the permission enabled for your organization.

GPU node configuration (Nebius-specific)

Follow these steps to run GPU workloads on Nebius:

Ensure the NVIDIA device plugin is installed and your task definitions request GPU resources. Nebius MK8s pre-installs the NVIDIA GPU operator on GPU node groups, so no additional setup is typically required. Learn more about how to add nodes with GPUs to a cluster.
Configure the Union backend to inject the required tolerations and label selectors so that only tasks that require GPUs land on GPU-enabled nodes:
1. Identify the node(s) that have GPU devices available:
  kubectl get nodes -o jsonpath='{range .items[?(@.status.allocatable.nvidia\.com/gpu)]}{.metadata.name}{"\n"}{end}'
2. Get the labels of a GPU node:
  kubectl get node <node-name> -o jsonpath='{.metadata.labels}' | jq
  Nebius nodes typically include a label that displays the instance type. For example, for a node with NVIDIA H200 GPUs:
  beta.kubernetes.io/instance-type=gpu-h200-sxm
3. If the GPU device supports MIG partitions, the node typically also has a label indicating the partition profile. For example:
  nvidia.com/gpu-partition-size: 2g.35gb
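The lookup in the steps above can be combined into one pass over `kubectl get nodes -o json` output. This is a minimal sketch: the function name `gpu_node_labels` is hypothetical, and the two label keys are the ones used as examples in this guide, so adjust them if your nodes carry different labels. Like the jsonpath filter above, it selects nodes that expose an `nvidia.com/gpu` entry under `status.allocatable`.

```python
# Label keys taken from the examples in this guide; adjust for your cluster.
LABELS_OF_INTEREST = (
    "beta.kubernetes.io/instance-type",
    "nvidia.com/gpu-partition-size",
)


def gpu_node_labels(node_list: dict) -> dict[str, dict[str, str]]:
    """Map each GPU node name to the subset of its labels listed above."""
    result = {}
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # Keep only nodes that advertise NVIDIA GPUs as an allocatable resource.
        if "nvidia.com/gpu" not in allocatable:
            continue
        labels = node["metadata"].get("labels", {})
        result[node["metadata"]["name"]] = {
            key: labels[key] for key in LABELS_OF_INTEREST if key in labels
        }
    return result


if __name__ == "__main__":
    # Hypothetical sample standing in for `kubectl get nodes -o json` output.
    sample = {
        "items": [
            {
                "metadata": {
                    "name": "gpu-node-1",
                    "labels": {
                        "beta.kubernetes.io/instance-type": "gpu-h200-sxm",
                        "kubernetes.io/os": "linux",
                    },
                },
                "status": {"allocatable": {"nvidia.com/gpu": "8"}},
            },
            {
                "metadata": {"name": "cpu-node-1", "labels": {}},
                "status": {"allocatable": {"cpu": "16"}},
            },
        ]
    }
    print(gpu_node_labels(sample))
```

To use it with a live cluster, pipe `kubectl get nodes -o json` into a script that loads the JSON with `json.load(sys.stdin)` and passes the result to `gpu_node_labels`.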

Update your Helm values file with the information gathered in the previous steps:
```
# all the existing content of your values file
...

# ADD
config:
  k8s:
    plugins:
      k8s:
        gpu-device-node-label: "beta.kubernetes.io/instance-type"
        accelerator-devices:
          - H200: "gpu-h200-sxm"
        gpu-partition-size-node-label: "nvidia.com/gpu-partition-size"
```

Update your installed release:
```
helm upgrade unionai-dataplane unionai/dataplane \
  --namespace union \
  --values <PATH_TO_VALUES_FILE> \
  --timeout 10m
```

After you complete the preceding steps, request GPU devices or MIG partitions directly from the Flyte task:
```
from flyte import Resources

# env is the flyte.TaskEnvironment defined elsewhere in your module.
@env.task(resources=Resources(gpu="H200:1", memory="64Gi"))
def train_model(...):
    ...
```

Working with the Nebius Container Registry

Flyte executions bundle your code and run it inside a container in the Nebius MK8s cluster. The contents of the image include the flyte package, your task code, and any other dependency your workflow requires.

Flyte automates building the image using an efficient layered mechanism to detect changes. You can decide where to store the images. This section covers the configuration if you plan to use Nebius Container Registry to store your container images.

Obtain a long-lived token from Nebius as described in Working in a CI/CD environment.
Get the static key token value from the previous step (it usually starts with v1...) and add it to an environment variable:
```
TOKEN='v1.CmQK...'
```

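Encode it into a Docker config file (replace the registry region accordingly):
```
# Strip newlines: GNU base64 wraps long output at 76 characters, which would break the JSON.
cat > docker-config-nebius.json <<EOF
{
  "auths": {
    "cr.eu-north1.nebius.cloud": {
      "auth": "$(echo -n "iam:${TOKEN}" | base64 | tr -d '\n')"
    }
  }
}
EOF
```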
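If you prefer not to rely on shell quoting, the same file can be produced with the Python standard library. This is a sketch under the same assumptions as the shell command above: the username is `iam`, the registry host is `cr.eu-north1.nebius.cloud`, and the function name `docker_config` is hypothetical. `base64.b64encode` never inserts line breaks, so the resulting `auth` value is always a single line.

```python
import base64
import json


def docker_config(registry: str, token: str, username: str = "iam") -> str:
    """Build a Docker config JSON document with one registry credential."""
    credentials = f"{username}:{token}".encode("utf-8")
    # Docker expects base64("username:password") in the "auth" field.
    auth = base64.b64encode(credentials).decode("ascii")
    return json.dumps({"auths": {registry: {"auth": auth}}}, indent=2)


if __name__ == "__main__":
    # Placeholder token for illustration only; use your real static key token.
    print(docker_config("cr.eu-north1.nebius.cloud", "v1.example-token"))
```

Write the returned string to `docker-config-nebius.json` to get a file equivalent to the one created by the shell command.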

Create an image pull secret:
```
flyte create secret --type image_pull nebius-image-secret \
  --from-docker-config \
  --docker-config-path docker-config-nebius.json \
  --registries cr.eu-north1.nebius.cloud
```

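Use it in your Flyte Image definition. Replace <your-secret-name> with the name of the secret you created (nebius-image-secret in the preceding command):
```
import flyte

custom_image = flyte.Image.from_debian_base(
    registry="cr.eu-north1.nebius.cloud/e00...",
    registry_secret="<your-secret-name>",
)
```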

Request the secret in your Flyte TaskEnvironment so tasks can pull the image:
```
env = flyte.TaskEnvironment(
    name="hello_v2",
    image=custom_image,
    secrets=["<your-secret-name>"],
)
```

Test a workflow

To run a sample workflow, complete the following steps:

Create a Flyte CLI configuration file at the path .flyte/config.yaml in your project directory. Replace <ORG_NAME> and <PROJECT_NAME> with your organization and project identifiers.
```
admin:
  endpoint: dns:///<ORG_NAME>.union.ai
image:
  builder: remote
task:
  domain: development
  org: <ORG_NAME>
  project: <PROJECT_NAME>
```
Run a sample workflow:
```
flyte run --image ghcr.io/flyteorg/flyte:py3.13-v2.0.2 \
  hello_world.py main --n 5
```
If the remote image builder isn’t enabled for your organization, use the --image flag with a pre-built container image as in the preceding flyte run example.
Check the run status. Replace <RUN_NAME> with the workflow run identifier.
```
flyte get run <RUN_NAME>
```
Look for ACTION_PHASE_SUCCEEDED in the output to confirm the workflow completed successfully.

Next: manage your cluster and pools

uctl selfserve provision-dataplane-resources provisions the data plane and registers this cluster with the control plane. Once the cluster is connected, you manage the cluster pool it belongs to and route work to it with queues, as described in the Cluster and workload management user guide:

Cluster pools: group clusters that share one data plane (object store, secrets, registry).
Clusters: inspect and manage the cluster records registered with the control plane.
Queues: route workloads to a pool and enforce concurrency, priority, and fairness.

Every organization is provisioned with a default pool that new clusters join automatically, so a single-cluster deployment needs no extra pool setup.

Additional resources

For more information, see the following resources: