Connectors

Connectors are long-running, stateless services that receive execution requests via gRPC and initiate jobs with the appropriate external or internal services. Each connector service is a Kubernetes deployment that receives gRPC requests when users trigger a particular type of task. (For example, the BigQuery connector is triggered by the invocation of a BigQuery task.) The connector service then initiates a job with the appropriate service.

When run locally, connectors are spawned in-process, so you can run them on your own machine as long as the appropriate connection secrets are available there.

Connectors are designed to be scalable and to handle large workloads efficiently. Because they run outside the core system, they also decrease the load on it. You can also test connectors locally without changing the backend configuration, which streamlines workflow development.

Connectors enable two key use cases:

Asynchronously launching jobs on hosted platforms (e.g. Databricks or Snowflake).
Calling external synchronous services, such as access control, data retrieval, or model inferencing.

This section covers all currently available connectors.

Creating a new connector

If none of the existing connectors meet your needs, you can implement your own connector.

There are two types of connectors: async and sync.

Async connectors enable long-running jobs that execute on an external platform over time. They communicate with external services that have asynchronous APIs that support create, get, and delete operations. The vast majority of connectors are async connectors.
Sync connectors enable request/response services that return immediate outputs (e.g. calling an internal API to fetch data or communicating with the OpenAI API).

Because connectors use a protobuf interface, they can in principle be written in any programming language. However, we currently support only Python connectors. We may support other languages in the future.

Async connector interface specification

To create a new async connector, extend the AsyncConnectorBase class and implement the create, get, and delete methods. These methods must be idempotent.

create: This method initiates a new job. The implementation can use gRPC, REST, or an SDK to create the job.
get: This method retrieves the job resource (job ID or output literal) associated with the task, such as a BigQuery job ID or Databricks task ID.
delete: This method sends a request to delete the corresponding job.
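The shape of these three methods, and what idempotency means for each, can be illustrated with a minimal sketch. It deliberately does not import the SDK: ExampleAsyncConnector, JobMeta, JobResource, and FakeJobService are hypothetical stand-ins invented for this illustration, and the method signatures are assumptions rather than the actual AsyncConnectorBase signatures. A real connector would extend AsyncConnectorBase and call a real platform.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class JobMeta:
    """Hypothetical handle that create() returns and get()/delete() accept."""
    job_id: str


@dataclass
class JobResource:
    """Hypothetical job resource: a phase plus optional outputs."""
    phase: str
    outputs: Optional[dict] = None


class FakeJobService:
    """In-memory stand-in for an external platform with an asynchronous API."""

    def __init__(self) -> None:
        self.jobs: Dict[str, dict] = {}

    def submit(self, job_id: str, query: str) -> None:
        # Submitting an existing job ID is a no-op, which keeps create() idempotent.
        self.jobs.setdefault(job_id, {"query": query, "phase": "RUNNING"})

    def status(self, job_id: str) -> str:
        job = self.jobs.get(job_id)
        return job["phase"] if job else "NOT_FOUND"

    def cancel(self, job_id: str) -> None:
        # Cancelling a missing job is a no-op, which keeps delete() idempotent.
        self.jobs.pop(job_id, None)


class ExampleAsyncConnector:
    """Stand-in with the create/get/delete shape (not the real base class)."""

    def __init__(self, service: FakeJobService) -> None:
        self.service = service

    def create(self, execution_id: str, query: str) -> JobMeta:
        # Derive the job ID from the execution ID so a retried request
        # reuses the same job instead of launching a duplicate.
        job_id = f"job-{execution_id}"
        self.service.submit(job_id, query)
        return JobMeta(job_id=job_id)

    def get(self, meta: JobMeta) -> JobResource:
        # Read-only lookup of the current job state.
        return JobResource(phase=self.service.status(meta.job_id))

    def delete(self, meta: JobMeta) -> None:
        self.service.cancel(meta.job_id)
```

In this sketch, calling create twice with the same execution ID returns the same JobMeta and leaves a single job in the fake service, and calling delete twice is harmless. That is the property the idempotency requirement asks for.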

For an example implementation, see the BigQuery connector code.

Sync connector interface specification

To create a new sync connector, extend the SyncConnectorBase class and implement a do method. This method must be idempotent.

do: This method executes the synchronous task. The worker in Union.ai is blocked until the method returns.
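A minimal sketch of the do shape follows. It does not import the SDK: ExampleSyncConnector and SyncResult are hypothetical names invented for this illustration, the in-memory lookup table stands in for a real request/response service, and the signature is an assumption rather than the actual SyncConnectorBase signature.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SyncResult:
    """Hypothetical result: a terminal phase plus the outputs."""
    phase: str
    outputs: Dict[str, str]


class ExampleSyncConnector:
    """Stand-in with the do shape (not the real base class)."""

    def __init__(self, lookup: Dict[str, str]) -> None:
        # `lookup` stands in for a request/response service such as an internal API.
        self.lookup = lookup

    def do(self, inputs: Dict[str, str]) -> SyncResult:
        # The caller blocks here until the result is returned.
        key = inputs["key"]
        if key not in self.lookup:
            return SyncResult(phase="FAILED", outputs={})
        # A pure read, so repeating the call yields the same result (idempotent).
        return SyncResult(phase="SUCCEEDED", outputs={"value": self.lookup[key]})
```

Because do in this sketch only reads from the lookup table, repeating a call with the same inputs returns an equal result. A sync connector that writes to an external service would need its own safeguard, such as a request ID, to stay idempotent.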

For an example implementation, see the ChatGPT connector code.

Testing your connector locally

To test your connector locally, create a class for the connector task that inherits from AsyncConnectorExecutorMixin. This mixin handles both asynchronous and synchronous tasks and allows Union to mimic the system’s behavior in calling the connector.
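As a rough picture of what mimicking the system's behavior could look like for an async connector, the sketch below calls create once, polls get until the job reaches a terminal phase, and calls delete if the run is interrupted or never finishes. This flow is an assumption made for illustration, not the actual implementation of AsyncConnectorExecutorMixin; run_locally, AsyncConnectorLike, and the phase names are hypothetical.

```python
import time
from typing import Protocol


class AsyncConnectorLike(Protocol):
    """Hypothetical minimal interface; phases are plain strings here."""

    def create(self, inputs: dict) -> str: ...
    def get(self, job_id: str) -> str: ...
    def delete(self, job_id: str) -> None: ...


# Assumed phase names, chosen only for this sketch.
TERMINAL_PHASES = {"SUCCEEDED", "FAILED"}


def run_locally(
    connector: AsyncConnectorLike,
    inputs: dict,
    poll_interval: float = 0.0,
    max_polls: int = 100,
) -> str:
    """Drive a connector in-process: create once, then poll until done."""
    job_id = connector.create(inputs)
    try:
        for _ in range(max_polls):
            phase = connector.get(job_id)
            if phase in TERMINAL_PHASES:
                return phase
            time.sleep(poll_interval)
    except KeyboardInterrupt:
        # Clean up the remote job if the local run is interrupted.
        connector.delete(job_id)
        raise
    # The job never reached a terminal phase: cancel it and report the timeout.
    connector.delete(job_id)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Running a loop like this against a fake connector exercises the same create, get, and delete calls without touching the backend configuration.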

For testing examples, see the BigQuery connector and Databricks connector documentation.

Enabling a connector in your Union.ai deployment

To enable a connector in your Union.ai deployment, contact the Union.ai team.