DGXC Lepton Connector
This guide explains how to set up the DGXC Lepton connector in your Flyte deployment. The connector lets you deploy and manage AI inference endpoints on the Lepton AI platform directly from your Flyte workflows.
Prerequisites
Before setting up the DGXC Lepton connector, ensure you have:
- A Lepton AI account with appropriate access permissions
- Lepton API tokens configured for your deployment environment
- Access to a Kubernetes cluster with Flyte deployed
Specify connector configuration
flyte-binary
Edit the relevant YAML file to specify the connector.
kubectl edit configmap flyte-sandbox-config -n flyte
tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - k8s-array
      - connector-service
    default-for-task-types:
      - container: container
      - container_array: k8s-array
      - lepton_endpoint_deployment_task: connector-service
      - lepton_endpoint_deletion_task: connector-service
flyte-core
Create a file named values-override.yaml and add the following configuration to it:
configmap:
  enabled_plugins:
    tasks:
      task-plugins:
        enabled-plugins:
          - container
          - sidecar
          - k8s-array
          - connector-service
        default-for-task-types:
          container: container
          sidecar: sidecar
          container_array: k8s-array
          lepton_endpoint_deployment_task: connector-service
          lepton_endpoint_deletion_task: connector-service
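Before upgrading, it can help to confirm that both Lepton task types are routed to connector-service in the file you edited. The following is a minimal sketch and not part of Flyte: check_lepton_task_types is a hypothetical helper that does a plain-text match (it is not a YAML parser) and accepts both the list form shown for flyte-binary and the mapping form shown for flyte-core.

```shell
# Hypothetical helper: confirm a values file routes both Lepton task types
# to connector-service. This is a line-based text check, not a YAML parser.
check_lepton_task_types() {
  file="$1"
  for task in lepton_endpoint_deployment_task lepton_endpoint_deletion_task; do
    # Allow leading spaces and an optional "- " list marker before the key.
    if ! grep -Eq "^[[:space:]-]*${task}:[[:space:]]*connector-service[[:space:]]*\$" "$file"; then
      echo "missing mapping for ${task}" >&2
      return 1
    fi
  done
  echo "ok"
}
```

For example, `check_lepton_task_types values-override.yaml` prints ok when both mappings are present and returns a non-zero status otherwise.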
Configure DGXC Lepton connector service
Create a connector configuration file to specify the DGXC Lepton connector settings:
# dgxc-lepton-connector-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dgxc-lepton-connector-config
  namespace: flyte
data:
  config.yaml: |
    connectors:
      lepton_endpoint_deployment_task:
        endpoint: http://dgxc-lepton-connector:8000
        insecure: true
        timeout: 1800s
      lepton_endpoint_deletion_task:
        endpoint: http://dgxc-lepton-connector:8000
        insecure: true
        timeout: 600s
Apply the configuration:
kubectl apply -f dgxc-lepton-connector-config.yaml
Deploy DGXC Lepton connector service
Deploy the DGXC Lepton connector service to your Kubernetes cluster:
# dgxc-lepton-connector-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dgxc-lepton-connector
  namespace: flyte
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dgxc-lepton-connector
  template:
    metadata:
      labels:
        app: dgxc-lepton-connector
    spec:
      containers:
        - name: dgxc-lepton-connector
          image: your-registry/dgxc-lepton-connector:latest
          ports:
            - containerPort: 8000
          env:
            - name: LEPTON_WORKSPACE_ID
              valueFrom:
                secretKeyRef:
                  key: workspace_id
                  name: lepton-secrets
            - name: LEPTON_TOKEN
              valueFrom:
                secretKeyRef:
                  key: token
                  name: lepton-secrets
            - name: LEPTON_WORKSPACE_ORIGIN_URL
              valueFrom:
                secretKeyRef:
                  key: origin_url
                  name: lepton-secrets
            - name: DEBUG_MODE
              value: "true"
            - name: ROOT_LOG_LEVEL
              value: "WARNING"
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: dgxc-lepton-connector
  namespace: flyte
spec:
  selector:
    app: dgxc-lepton-connector
  ports:
    - port: 8000
      targetPort: 8000
  type: ClusterIP
Apply the deployment:
kubectl apply -f dgxc-lepton-connector-deployment.yaml
Configure DGXC Lepton API credentials
The DGXC Lepton connector requires specific credentials to authenticate with the Lepton AI platform. These credentials must be configured as Kubernetes secrets.
Required secrets
The connector requires the following secrets to be configured:
- origin_url: The base URL for the DGXC Lepton gateway (Base64 encoded)
- token: Your DGXC Lepton API token (Base64 encoded)
- workspace_id: Your DGXC Lepton workspace identifier (Base64 encoded)
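The Base64 encoding matters when you write the values into the data section of a Secret manifest by hand; kubectl create secret with --from-literal encodes the values for you. As a minimal sketch, the hypothetical helper encode_secret_value below (not part of any tool) shows one way to produce an encoded value for the manifest.

```shell
# Hypothetical helper: Base64-encode a value for the data: section of a
# Secret manifest. printf avoids the trailing newline that echo would add,
# and tr strips any line wrapping that base64 applies to long input.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

encode_secret_value 'https://gateway.dgxc-lepton.nvidia.com'
# prints aHR0cHM6Ly9nYXRld2F5LmRneGMtbGVwdG9uLm52aWRpYS5jb20=
```

The same call with your API token or workspace ID as the argument yields the strings to paste into the manifest.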
Setup instructions
- Create the DGXC Lepton secrets:
# Create the lepton-secrets with all required credentials
kubectl create secret generic lepton-secrets -n flyte \
  --from-literal=origin_url="https://gateway.dgxc-lepton.nvidia.com" \
  --from-literal=token="<YOUR_LEPTON_API_TOKEN>" \
  --from-literal=workspace_id="<YOUR_WORKSPACE_ID>"
Replace <YOUR_LEPTON_API_TOKEN> with your actual DGXC Lepton API token and <YOUR_WORKSPACE_ID> with your workspace identifier.
Alternatively, you can create the secret from a YAML file:
# lepton-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: lepton-secrets
  namespace: flyte
type: Opaque
data:
  origin_url: aHR0cHM6Ly9nYXRld2F5LmRneGMtbGVwdG9uLm52aWRpYS5jb20=  # Base64 encoded URL
  token: <BASE64_ENCODED_TOKEN>  # Your Base64 encoded API token
  workspace_id: <BASE64_ENCODED_WORKSPACE_ID>  # Your Base64 encoded workspace ID
kubectl apply -f lepton-secrets.yaml
- Install the flyteconnector pod using Helm (if not already installed):
helm repo add flyteorg https://flyteorg.github.io/flyte
helm install flyteconnector flyteorg/flyteconnector --namespace flyte
- Restart the deployments:
kubectl rollout restart deployment flyteconnector -n flyte
kubectl rollout restart deployment dgxc-lepton-connector -n flyte
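Once the secret exists, a quick check that all three keys are populated can catch typos before the connector starts. This is a sketch under stated assumptions: it uses the secret name lepton-secrets and the namespace flyte from the steps above, while the KUBECTL variable and the check_lepton_secret function are hypothetical names introduced here so the command can be swapped out. It only tests that each key is non-empty and never prints the values.

```shell
# Command used to reach the cluster; override KUBECTL to point elsewhere.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: verify that lepton-secrets holds a non-empty value
# for each key the connector reads. Secret values are never printed.
check_lepton_secret() {
  for key in origin_url token workspace_id; do
    # A failed lookup is treated the same as an empty value.
    value="$("$KUBECTL" get secret lepton-secrets -n flyte -o "jsonpath={.data.${key}}")" || value=""
    if [ -z "$value" ]; then
      echo "missing key: ${key}" >&2
      return 1
    fi
  done
  echo "all keys present"
}
```

Running `check_lepton_secret` against the cluster prints all keys present on success, or names the first missing key on standard error.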
Upgrade the Flyte Helm release
flyte-binary
helm upgrade <RELEASE_NAME> flyteorg/flyte-binary -n <YOUR_NAMESPACE> --values <YOUR_YAML_FILE>
Replace <RELEASE_NAME> with the name of your release (e.g., flyte-backend), <YOUR_NAMESPACE> with the name of your namespace (e.g., flyte), and <YOUR_YAML_FILE> with the name of your YAML file.
flyte-core
helm upgrade <RELEASE_NAME> flyte/flyte-core -n <YOUR_NAMESPACE> --values values-override.yaml
Replace <RELEASE_NAME> with the name of your release (e.g., flyte) and <YOUR_NAMESPACE> with the name of your namespace (e.g., flyte).
Verify the setup
After completing the setup, verify that the DGXC Lepton connector is working correctly:
- Check that the connector pods are running:
kubectl get pods -n flyte | grep dgxc-lepton-connector
- Check the connector logs for any errors:
kubectl logs -n flyte deployment/dgxc-lepton-connector
- Test the connector by running a simple DGXC Lepton workflow in your Flyte cluster.
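The first two checks can be combined into one step that waits for the rollout instead of polling by hand. The sketch below assumes the Deployment and Service names from this guide; the KUBECTL and NAMESPACE variables, the verify_connector function, the 120-second timeout, and the 50-line log tail are illustrative choices rather than requirements.

```shell
# Command and namespace used for the checks; both can be overridden.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-flyte}"

# Hypothetical helper: wait for the connector rollout, confirm its Service
# exists, then show recent log lines. Stops at the first failing step.
verify_connector() {
  # Block until all replicas are ready, or fail after the timeout.
  "$KUBECTL" rollout status deployment/dgxc-lepton-connector -n "$NAMESPACE" --timeout=120s || return 1
  # The Flyte connector config points at this Service on port 8000.
  "$KUBECTL" get service dgxc-lepton-connector -n "$NAMESPACE" || return 1
  # Recent logs; scan these for authentication or startup errors.
  "$KUBECTL" logs deployment/dgxc-lepton-connector -n "$NAMESPACE" --tail=50
}
```

Calling `verify_connector` returns a non-zero status if the rollout does not complete or the Service is missing, which makes it usable in a deployment script.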
Supported task types
The DGXC Lepton connector supports the following task types:
- lepton_endpoint_deployment_task: Deploy AI inference endpoints to the Lepton platform
- lepton_endpoint_deletion_task: Delete existing endpoints from the Lepton platform
Configuration options
The DGXC Lepton connector supports various configuration options including:
- Resource shapes: Specify CPU, GPU, and memory requirements
- Scaling policies: Configure auto-scaling based on traffic, GPU utilization, or queries per minute
- Environment variables: Set custom environment variables and secrets
- Mount configurations: Configure shared storage mounts for model caches and datasets
- Engine configurations: Support for vLLM, SGLang, NIM, and custom container deployments
For detailed usage examples and API reference, refer to the DGXC Lepton plugin documentation in your Flyte deployment.