# Interactive sandboxes
> This bundle contains all pages in the Interactive sandboxes section.
> Source: https://www.union.ai/docs/v2/union/user-guide/sandboxing/interactive-sandboxes/

=== PAGE: https://www.union.ai/docs/v2/union/user-guide/sandboxing/interactive-sandboxes ===

# Interactive sandboxes

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](https://www.union.ai/docs/v2/union/user-guide/sandboxing/section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

> [!NOTE]
> Interactive sandboxes are in Beta. APIs may change between releases. Reach out on Slack with feedback or feature requests.

`unionai-sandbox` (`union.sandbox`) runs untrusted Python or shell commands in a live, multi-turn session. You open a sandbox, send it many commands, watch state evolve on its work dir, and close it when you're done.

```
session ┐
        │  work dir + venv persist across calls ──────────┐
        ├─ run("write data.json")        net: blocked     │
        ├─ run("uv pip install numpy")   net: allowlist   │
        ├─ put_bytes / get_bytes         (push/pull files)│
        ├─ run("load data, compute")     net: blocked     │
        └─ close                                          ┘
```

It's built for the workloads where a one-shot container falls apart: an agent that needs to iterate on its own code, a notebook-style app that runs a sequence of related commands, a tool that compiles, executes, then inspects the result. State persists between calls, output streams as it arrives, and the security posture (network and filesystem) is set per call, not baked in at construction time.

## What you get

- **Built for agents and multi-turn apps.** Untrusted code (LLM-generated, third-party, multi-tenant) gets a real interactive session, not a fresh container per turn. The work dir and the session venv are still there on the next `run()`.
- **Like a long-lived machine.** The session keeps one venv on its work dir, so installing is just a `run()`: `run("uv pip install X")` persists, and a later `run("import X", script_type="python")` sees it — install once, import anywhere. The work dir is the persistent disk; `/tmp` is reset per command.
- **Explicit isolation, no silent downgrade.** You pick the backend per session (`bubblewrap`, user namespaces) — there's no auto-detection and no fallback to weaker isolation. You opt into [gVisor](https://gvisor.dev/), dedicated pods, and stricter capability drops only when the threat model calls for it.
- **Per-call security knobs.** Flip `network_mode` between `blocked`, `allowlist`, and `open` on each `run()`. The same session can execute a network-isolated tool call, then do an allow-listed `uv pip install`, then drop back to blocked, without tearing anything down.
- **First-class integration.** A remote sandbox is a regular Flyte task: observable in the Union console, governed by the same RBAC and project/domain scoping, serializable across task boundaries, recoverable independently of the caller. (On-device sessions skip the extra pod entirely and aren't serializable; pick remote when you need those properties.)
- **Embeddable.** One pip install, one `async with`, drops into any async Python. The on-device transport has no daemon and no Docker requirement; the remote transport adds a one-time per-cluster deploy.

The library exposes one `Session` API over two transports: in-process (`union.sandbox.on_device`) for sandboxing inside the current container or task pod, and a remote `sandbox-server` pod for everything else. The call sites are nearly identical; the choice is documented in **Sandboxing > Interactive sandboxes > Deployment**.

## Quickstart

```sh
pip install 'unionai-sandbox[flyte]'
```

```python
import asyncio
from union import sandbox as sb

async def main():
    async with sb.on_device.session(backend="userns") as sbx:
        proc = await sbx.run("uname -a", stdout=True)
        out, _ = await proc.communicate_text()
        print(out)

asyncio.run(main())
```

That runs `uname -a` inside a sandboxed child process with no network and a restricted filesystem view. If that prints, your install works.

> [!NOTE] Pick the backend explicitly
> The on-device backend defaults to `"bubblewrap"`, which needs `CAP_SYS_ADMIN` + unconfined AppArmor on the pod. On a vanilla pod or a dev laptop, pass `backend="userns"` (as above) since it needs no extra capabilities. There's no auto-detection: an unavailable backend fails loudly rather than downgrading. See **Sandboxing > Interactive sandboxes > Security model**.

> [!IMPORTANT] On-device is for development, remote is for production
> `sb.on_device.session()` shares a container with the calling code, which makes it ideal for laptop, CI and install-check use, but it doesn't isolate the sandboxed process from your task's secrets and credentials. For production use cases (agent loops, multi-turn apps, anything running untrusted code in a real workload), use `sb.session()`, which runs in its own pod. See **Sandboxing > Interactive sandboxes > Security model** for the why and **Sandboxing > Interactive sandboxes > Deployment** for the deploy step.

> [!NOTE]
> Examples on this page use bare `asyncio.run(main())` to keep the code short. In a Union codebase you'll typically open the session inside a `@env.task` instead. Examples on **Sandboxing > Interactive sandboxes > Deployment** show that shape.

### A more involved example

This is the shape interactive sessions are good at: state persists across calls, the security posture changes per call, and a follow-up step uses what an earlier one produced. Shown with `sb.on_device.session()` for brevity; the same code is the body of a `@env.task` that opens `sb.session()` in production.

State that persists across `run()` calls lives in two places: the session **work dir** (the data file below) and the shared **session venv** (the `requests` install below). Each `run()` is otherwise its own isolated process. The writable scratch mounts (`/tmp`, `/dev/shm`) are a fresh tmpfs every command, so anything written to bare `/tmp` is gone by the next call. That's why the example writes its data file under the pinned work dir, not `/tmp`.

```python
import asyncio
import tempfile
from union import sandbox as sb

async def main():
    with tempfile.TemporaryDirectory() as work:
        WRITE_DATA = f"""
import json, pathlib
pathlib.Path("{work}/data.json").write_text(json.dumps([1, 2, 3, 4]))
"""
        COMPUTE = f"""
import json, statistics, requests   # the install from step 2 is visible here
data = json.loads(open("{work}/data.json").read())
print(f"requests={{requests.__version__}} mean={{statistics.mean(data)}}")
"""
        # Session-level network_mode sets the ceiling of what any run() can
        # reach. Per-call run(network_mode="blocked") narrows from there.
        async with sb.on_device.session(
            backend="userns",
            host_work_dir=work,                 # pin the work dir so we can interpolate it
            network_mode="allowlist",
            network_allowlist=sb.PYPI_HOSTS,
        ) as sbx:
            # 1. Write a file to the work dir (persists). Tighten to blocked.
            await sbx.run(WRITE_DATA, script_type="python", network_mode="blocked")

            # 2. Install into the shared session venv (uses the allow-list default).
            await sbx.run("uv pip install requests")

            # 3. Back to blocked. Use the file from step 1 and the package from step 2.
            out = await sbx.run_code(COMPUTE, network_mode="blocked")
            print(out.strip())  # e.g. "requests=2.32.3 mean=2.5"

asyncio.run(main())
```

The same code runs against a remote sandbox by swapping `sb.on_device.session(...)` for `await sb.session()`. The remote transport needs the deploy extra and a one-time per-cluster deploy:

```sh
pip install 'unionai-sandbox[deploy]'
unionai-sandbox-deploy
```

After that, `await sb.session(...)` works from any task. See **Sandboxing > Interactive sandboxes > Deployment** for the full picture.

## Choosing a sandbox

`unionai-sandbox` is one of three sandboxing options Flyte and Union ship. Pick by the shape of the workload, not by isolation strength:

- **One-shot, typed I/O.** Use [`flyte.sandbox.create()`](https://www.union.ai/docs/v2/union/user-guide/sandboxing/code-sandboxing/page.md). It builds an ephemeral container, runs one invocation with declared inputs and outputs, and discards it. Simpler when you don't need a live session.
- **Sandboxed orchestration.** Use [workflow sandboxing](https://www.union.ai/docs/v2/union/user-guide/sandboxing/workflow-sandboxing-flyte/page.md) when the thing you need to sandbox is the control flow (LLM-generated orchestration code that dispatches to known tools).
- **Interactive sessions.** This library. Pick it when you need state to persist between commands, want to stream output, or want per-call network and filesystem control.

The [main sandboxing index](https://www.union.ai/docs/v2/union/user-guide/sandboxing/_index) has a full decision matrix.

## What's on the rest of these pages

The transports share one API, so the docs are organized by concept. Each page covers both on-device and remote, and calls out where they differ.

- **Sandboxing > Interactive sandboxes > Security model**. Isolation backends (bubblewrap, user namespaces, sandbox-exec, gVisor), the explicit-backend contract, blast radius for on-device vs remote, and which posture to pick for which trust level.
- **Sandboxing > Interactive sandboxes > Running commands**. The `run()` call, the `exec()` / `run_code()` one-shot helpers, the shared-venv install model, output handling, script types, timeouts, and the error model.
- **Sandboxing > Interactive sandboxes > Networking**. Per-call `network_mode`, the deny-list and `CLOUD_METADATA_DENY`, bring-your-own proxy, and what the allow-list does and does not protect against.
- **Sandboxing > Interactive sandboxes > Filesystem**. `put_bytes` and `get_bytes`, the default allow-list, and how to extend it.
- **Sandboxing > Interactive sandboxes > Deployment**. When to pick on-device vs remote, `unionai-sandbox-deploy`, `SandboxEnvironment`, custom images and resources, ownership and reference mode, and detached-lifetime sessions.
- **Sandboxing > Interactive sandboxes > GPU**. Running CUDA / PyTorch in a sandbox, why GPU requires the bubblewrap backend, and resource sizing.
- **Sandboxing > Interactive sandboxes > Agents**. The agent-loop pattern: persistent state, per-call network flips, treating non-zero exits as signal, and secret handling.

=== PAGE: https://www.union.ai/docs/v2/union/user-guide/sandboxing/interactive-sandboxes/security-model ===

# Security model

> [!NOTE] On-device is for development, remote is for production
> An on-device sandbox shares a container with the code that launched it. That's fine for development on a laptop, CI or sanity-checking your install, but it doesn't isolate the sandboxed process from your task's own code, secrets and cloud credentials. This page is about production posture, which means a remote sandbox. The "blast radius" section below justifies why; the rest of the page assumes you're picking knobs on a remote `SandboxEnvironment`.

A production sandbox is built from two independent layers:

1. **The isolation backend** running inside the sandbox pod, which constrains the sandboxed process (filesystem, syscalls, capabilities, network namespace).
2. **The pod runtime**, which is whether the pod's syscalls hit the host kernel directly or go through a user-space kernel like gVisor.

These are independent. A sandbox pod can run `userns` inside a gVisor pod, or `bubblewrap` inside a vanilla container pod. Pick each layer for what it actually defends against.

## Isolation backends

The library reports the backend on each process as `proc.backend`. On a remote sandbox, set it with `SandboxEnvironment(sandbox_mode=...)`.

| Backend        | How it works                                                                                                        | Default?                                                                                              |
| -------------- | ------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- |
| `userns`       | `unshare(2)` + `prctl(NO_NEW_PRIVS)` + `capset` + `setrlimit`, plus Landlock and a seccomp BPF deny-list            | Remote default                                                                                        |
| `bubblewrap`   | `bwrap(1)` with `--unshare-all --die-with-parent --cap-drop ALL`, plus a Landlock ruleset as a kernel-side backstop | On-device default (needs `CAP_SYS_ADMIN` + unconfined AppArmor); opt-in for remote (`DEFAULT_SANDBOX_ENV_BWRAP` or `sandbox_mode="bwrap"`) |
| `sandbox-exec` | macOS wrapper around Apple's `sandbox-exec`; restricts writes to the work dir and can deny outbound sockets         | macOS on-device only                                                                                  |
| `none`         | `setpgid` + best-effort `setrlimit`; logs a warning                                                                 | Dev only (no isolation)                                                                               |

Both `userns` and `bubblewrap` layer namespaces, dropped capabilities, a [Landlock](https://docs.kernel.org/userspace-api/landlock.html) filesystem ruleset, and a seccomp BPF deny-list. They are not equally strong, though: `bubblewrap` is the stronger backend. With `CAP_SYS_ADMIN` it pivots into a fresh mount root, which closes the gap where the sandbox shares the pod's root filesystem. `userns` is the lite variant: it runs in a vanilla pod with no extra capabilities, but it leaves that shared-rootfs gap open, so its mount isolation is weaker.

The remote default is `userns` because it runs anywhere (`bwrap` needs `CAP_SYS_ADMIN` + unconfined AppArmor and isn't always present in minimal images). When you can grant the pod those capabilities and want the strongest in-pod isolation, choose `bubblewrap`.

## Pod security for the bubblewrap backend

`bubblewrap` runs as a non-root user via unprivileged user namespaces. However, the containerd default seccomp profile permits the `mount` / `pivot_root` / `setns` / `unshare` syscalls that `bwrap` needs only when the container's capability set includes `CAP_SYS_ADMIN`, and the AppArmor profile must be set to `unconfined` (rather than the runtime default) so those calls aren't blocked.

`flyte.PodTemplate().allow_nested_sandboxing()` grants exactly that: `CAP_SYS_ADMIN` plus unconfined AppArmor, with `allowPrivilegeEscalation: false`. How you apply it depends on the transport:

- On-device: put it on the task that opens the session, since the sandbox child runs in that pod.

  ```python
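  import flyte
  from union import sandbox as sb

  # Grant CAP_SYS_ADMIN + unconfined AppArmor to the task pod that hosts the sandbox.
  bwrap_env = flyte.TaskEnvironment(
      name="sandboxed-task",
      image=sb.base_sandbox_image,
      pod_template=flyte.PodTemplate().allow_nested_sandboxing(),
  )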

  @bwrap_env.task
  async def main() -> str:
      async with sb.on_device.session(backend="bubblewrap") as sbx:
          ...
  ```

- Remote: the `SandboxEnvironment` derives the pod template from `sandbox_mode` / `sys_cap_admin` for you, and `sandbox_mode="bwrap"` carries the grant automatically. See [Deployment](./deployment).

The `userns` backend needs none of this because it runs in a vanilla pod. Choose `userns` when you can't (or don't want to) grant the pod extra capabilities; choose `bubblewrap` when you can, for its stronger isolation — at the cost of the `CAP_SYS_ADMIN` + AppArmor grant above.

## Blast radius: why remote

The backend constrains the sandboxed process. What an _escaping_ process can reach is determined by where the sandbox runs.

> [!WARNING] An on-device sandbox shares the caller's container
> An on-device sandbox runs inside the same container as the code that launched it. If the backend is breached, the escaping process can reach your task's own code, mounted secrets, and service-account or cloud credentials. The pod boundary is the only thing still containing it; unless the task pod itself runs under gVisor, that boundary is the host kernel.
>
> This matters only when the sandboxed code is untrusted: for trusted code (your own prompts and tools, not exposed to end users) on-device is a perfectly good production choice. When the code is untrusted, prefer a remote sandbox so an escape lands in a throwaway pod, not your workload.

A remote sandbox runs in its own pod with:

- a typically minimal image (no caller code, no toolchain)
- its own service account (no task secrets, no cloud credentials)
- no access to whatever the caller mounted

The escape blast radius is the sandbox pod, not your workload. Hardening that pod with gVisor (below) further reduces what an escape can do to the host kernel.

## Pod runtime: gVisor

The pod runtime is independent of the in-pod backend. The two are set by separate knobs:

- `sandbox_mode` (`userns` or `bwrap`) selects the **in-pod** backend constraining the sandboxed process.
- `runtime` (`container` or `gvisor`) selects how the **pod itself** is run.

Setting `runtime="gvisor"` puts `runtimeClassName: gvisor` on the sandbox pod, so its syscalls go through the [gVisor](https://gvisor.dev/) application kernel rather than hitting the host kernel directly. Recommended whenever the sandboxed code is untrusted or the workload is multi-tenant.

```python
hardened = sb.SandboxEnvironment(
    name="hardened-sandbox",
    sandbox_mode="userns",
    runtime="gvisor",
)
```

> [!NOTE] gVisor must be enabled on the cluster
> `runtime="gvisor"` requires the `gvisor` RuntimeClass to be installed and enabled in your cluster. In most cases, talk to your Union solutions engineer to enable it.

## Choosing a posture

Backend choice does affect isolation strength: `bubblewrap` is stronger than `userns`-lite (it closes the shared-rootfs gap, as above), so prefer `bubblewrap` when the pod can carry `CAP_SYS_ADMIN` + AppArmor and `userns` when it can't. But both are solid process-level isolation that's fine for production in a normal container pod. You don't need gVisor to run a sandbox responsibly. The bigger lever as trust drops is the pod runtime (`container` vs `gvisor`) and tenant isolation (shared `SandboxEnvironment` vs one per tenant).

| Trust level                                                   | Pod runtime           | Tenant isolation                                               | Per-call notes                                                                                                            |
| ------------------------------------------------------------- | --------------------- | -------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| Trusted (your own code/prompts, not exposed to end users)     | `container`           | Shared env is fine                                             | Default `network_mode="blocked"`; allow-list when you need it. Process isolation is sufficient.                          |
| Semi-trusted (vetted third-party libraries, your own ML code) | `container`           | Shared env is fine                                             | Default `network_mode="blocked"`; allow-list when you need it.                                                            |
| Untrusted (LLM-generated from end-user input, user-submitted) | `gvisor` (recommended) | Shared env is fine                                             | Stage inputs via `put_bytes`; keep `network_mode="blocked"` unless a step needs egress.                                   |
| Multi-tenant, hostile inputs assumed                          | `gvisor`           | One `SandboxEnvironment` per tenant; no cross-tenant pod reuse | `network_mode="blocked"` on every `run()`; the proxy allow-list is not adversarial-safe (see [Networking](./networking)). |

The principle: **let the workload pick the floor, let the threat model pick the ceiling**. The default backend is the floor and is appropriate for the common case. Reach for gVisor when you're actually running hostile code or sharing the system across tenants — not as a blanket requirement.
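
The table above can be read as a lookup from trust level to posture. The following is a minimal sketch in plain Python with no `union.sandbox` imports. The `Posture` dataclass, the `POSTURES` mapping, the `posture_for` helper, and the trust-level keys are hypothetical names invented for illustration; only the `runtime` and `network_mode` values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posture:
    runtime: str          # value you would pass as SandboxEnvironment(runtime=...)
    per_tenant_env: bool  # True: one SandboxEnvironment per tenant
    network_mode: str     # default network_mode to use on run()


# Keys are illustrative labels for the table rows, not library constants.
POSTURES = {
    "trusted": Posture("container", False, "blocked"),
    "semi-trusted": Posture("container", False, "blocked"),
    "untrusted": Posture("gvisor", False, "blocked"),
    "multi-tenant": Posture("gvisor", True, "blocked"),
}


def posture_for(trust_level: str) -> Posture:
    """Return the recommended posture, failing loudly on an unknown level."""
    try:
        return POSTURES[trust_level]
    except KeyError:
        raise ValueError(f"unknown trust level: {trust_level!r}") from None
```

Failing on an unknown level, rather than defaulting to the weakest row, mirrors the explicit-backend contract: a typo should not silently pick a posture for you.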

## What's not in scope

- **Side-channel attacks** (timing, cache, Spectre-class) are not addressed by any backend here. If you need defense against them, you need hardware partitioning, not a sandbox.
- **Resource exhaustion** is bounded by `Resources(...)` on the sandbox pod and per-call `timeout_s` on `run()`. The backends do not prevent a sandboxed process from using all the CPU and memory the pod gives it.
- **The proxy-based network allow-list** is not a kernel-level firewall. See [Networking](./networking) for what it does and does not protect against.

## Related

- [Networking](./networking). Per-call `network_mode` and what the allow-list actually constrains.
- [Filesystem](./filesystem). Default filesystem allow-list and how to extend it.
- [Deployment](./deployment). `SandboxEnvironment`, custom images, and per-launch overrides.

=== PAGE: https://www.union.ai/docs/v2/union/user-guide/sandboxing/interactive-sandboxes/running-commands ===

# Running commands

Once you have an open session, everything below works the same whether you got it from `sb.on_device.session(...)` or `await sb.session(...)`. The `Session` API is identical across transports.

## Lifecycle

A session follows `open → run → close`. The recommended shape is an `async with` block:

```python
from union import sandbox as sb

async with sb.on_device.session(backend="userns") as sbx:  # userns: runs on a vanilla pod, no extra capabilities
    proc = await sbx.run("uname -a", stdout=True)
    out, _ = await proc.communicate_text()
# session closed automatically here
```

You can also manage the lifetime yourself:

```python
sbx = await sb.on_device.session(backend="userns").open()
try:
    proc = await sbx.run("uname -a", stdout=True)
    out, _ = await proc.communicate_text()
finally:
    await sbx.close()
```

> [!NOTE] Remote sessions open lazily
> For a remote `SandboxSession`, `async with sbx` (or `await sbx`) waits for the pod to become addressable, but the transport health-check is deferred to the first `run()` / `put_bytes` / `get_bytes` call. Your own setup work overlaps with pod startup. See [Deployment](./deployment) for the detached-lifetime pattern.

## `run()`

`run()` executes one command in the sandbox and returns a `SandboxProcess`, a subprocess-like handle you drain for output:

```python
proc = await sbx.run(
    "python -c 'import os; print(os.uname())'",
    stdout=True,                                    # PIPE | INHERIT | DEVNULL | False
    stderr=True,
    env={"FOO": "bar"},
    cwd="/tmp",                                     # under the sandbox work dir
    script_type="shell",                            # "shell" | "python"
    network_mode="allowlist",                       # "blocked" | "open" | "allowlist"
    network_allowlist=["pypi.org", "*.pythonhosted.org"],
    timeout_s=30,
)
```

| Argument | Type | Meaning |
|---|---|---|
| `cmd` | `str` | The command (shell) or script (`script_type="python"`) to run. |
| `stdout`, `stderr` | `True` / `False` / `PIPE` / `INHERIT` / `DEVNULL` | How to handle each stream. `True` captures (pipe); `False` discards. Constants are exported as `sb.PIPE`, `sb.INHERIT`, `sb.DEVNULL`. |
| `env` | `dict[str, str]` | Extra environment variables for the process. |
| `cwd` | `str` | Working directory, resolved under the sandbox work dir. |
| `script_type` | `"shell"` / `"python"` | Interpret `cmd` as a shell command or a Python script. |
| `network_mode` | `"blocked"` / `"open"` / `"allowlist"` | Network posture for this call. See [Networking](./networking). |
| `network_allowlist` | `list[str]` | CIDRs or DNS patterns, used only with `network_mode="allowlist"`. |
| `timeout_s` | `float` | Kill the process after this many seconds. |

> [!NOTE] `network_denylist` is set on the session, not per `run()`
> `run()` takes `network_mode` and `network_allowlist` per call, but not `network_denylist`. The deny-list is a session-level policy; pass it to `sb.on_device.session(...)` / `sb.session(...)`. See [Networking](./networking).

## One-shot commands: `exec()` and `run_code()`

`run()` returns a process handle that you manage directly. If you only need the command output, use one of the helper methods, which combine `run()` and `communicate()` into a single call.

`exec()` runs a command to completion and returns an `ExecResult` (exit code plus decoded streams):

```python
result = await sbx.exec("ls -la")
result.returncode   # int
result.stdout       # decoded str
result.stderr       # decoded str
result.ok           # True when returncode == 0

result = await sbx.exec("false", check=True)   # raises SandboxCommandError on non-zero
```

`run_code()` is the shortest path from Python source to its stdout. It runs `code` as a Python script (`script_type="python"`), raises `SandboxCommandError` on a non-zero exit, and returns the decoded stdout:

```python
out = await sbx.run_code("print(2 + 2)")   # "4\n"
```

Both take the same `env`, `cwd`, `network_mode`, `network_allowlist`, and `timeout_s` arguments as `run()`. `SandboxCommandError` carries the failing command and the full `ExecResult` (`err.result.stderr` for diagnostics); both are exported from `union.sandbox`.

Reach for `run()` when you need to stream output, branch on a non-zero exit without an exception, or inspect process metadata; reach for `exec()` / `run_code()` for the common capture-and-go case.
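
As an illustration of the capture-and-go case, the sketch below probes a list of candidate commands and returns the first one that exits cleanly. It assumes only the `exec()` call and the `ok` / `stdout` fields described above; `first_success` is a hypothetical helper name, and `sbx` is any open session.

```python
async def first_success(sbx, commands):
    """Return (command, stdout) for the first command that exits 0, else None."""
    for cmd in commands:
        # check is left at its default, so a non-zero exit comes back as a result.
        result = await sbx.exec(cmd)
        if result.ok:
            return cmd, result.stdout
    return None
```

A call such as `await first_success(sbx, ["python --version", "python3 --version"])` is one way to find out which interpreter name is available on the host.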

## Reading output

`SandboxProcess` gives you three ways to consume output:

```python
proc = await sbx.run("my-command", stdout=True, stderr=True)

# The options below are alternatives: use one per process.

# 1. Drain everything at once (bytes)
out, err = await proc.communicate()

# 2. Drain everything at once, decoded to str
out, err = await proc.communicate_text()

# 3. Stream lines as they arrive
async for line in proc.iter_stdout_lines():
    print(line)
async for line in proc.iter_stderr_lines():
    print(line)
```

After the process exits, inspect it:

```python
proc.returncode          # int exit code (None until it exits)
proc.runtime_ms          # wall-clock execution time
proc.backend             # "bubblewrap" | "userns" | "sandbox-exec" | "none"
proc.termination_reason  # "" on a clean exit, otherwise a reason string
```
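
Streaming and post-exit inspection combine naturally. The sketch below assumes only the `run()`, `iter_stdout_lines()`, `wait()`, and `returncode` members mentioned on this page; `stream_and_collect` and its `on_line` callback are hypothetical names, and `sbx` is any open session.

```python
async def stream_and_collect(sbx, cmd: str, on_line=print):
    """Stream stdout line by line, then return (returncode, lines)."""
    proc = await sbx.run(cmd, stdout=True)
    lines = []
    async for line in proc.iter_stdout_lines():
        on_line(line)       # react to output as it arrives
        lines.append(line)  # keep a copy for later inspection
    await proc.wait()       # returncode is None until the process exits
    return proc.returncode, lines
```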

## Script vs shell

`script_type="shell"` (the default) runs `cmd` through the sandbox shell. `script_type="python"` runs `cmd` as a Python script in the sandbox's interpreter, which is cleaner for multi-line code:

```python
import textwrap

proc = await sbx.run(
    # dedent() strips the common leading indentation so the script starts at column 0
    textwrap.dedent("""
        import json, pathlib
        data = json.loads(pathlib.Path("/tmp/my-job/in.json").read_text())
        print(sum(data["values"]))
    """),
    script_type="python",
    stdout=True,
)
```

> [!NOTE] `python` vs `python3` on the host
> These examples use `python`. The remote sandbox image is based on `flyte.Image.from_debian_base()`, which ships both `python` and `pip` on PATH, so `python` always works there. The on-device transport runs against the host's Python: stock macOS has no `python` symlink, so use `python3` there. For installs, prefer `uv pip install`, since the session venv is uv-managed and ships no `pip`.

## Installing packages: install is just a `run()`

A session keeps one shared virtualenv on its work dir, and every Python `run()` uses it. So there's no separate install API. Installing a package is an ordinary `run()`, and it persists for the life of the session:

```python
async with sb.on_device.session(
    backend="userns",
    network_mode="allowlist",
    network_allowlist=sb.PYPI_HOSTS,
) as sbx:
    await sbx.run("uv pip install requests")  # lands in the session venv
    out = await sbx.run_code("import requests; print(requests.__version__)")
```

Install once, import anywhere: the package a `run()` installs is visible to every later `run()` in the same session. The session venv is built with `--system-site-packages` so it can read the owner interpreter's packages, but installs go into the session venv only; the task's own Python is never mutated. The venv is `uv`-managed and ships no `pip`, so use `uv pip install` (not bare `pip`).

> [!NOTE] Only the work dir and the session venv persist
> Files written under the session work dir, and packages installed into the shared session venv, survive across `run()` calls. The writable scratch mounts (`/tmp`, `/dev/shm`) are a fresh tmpfs on every `run()`, so anything written to bare `/tmp` does not carry over; the rest of the filesystem is read-only. Note the default work dir lives at `/tmp/sandbox-work`. That's a separate persistent mount, distinct from the per-run `/tmp` scratch. See [Filesystem](./filesystem).

## Errors vs non-zero exits

A **non-zero exit of your own code is not an error.** It returns normally; branch on `proc.returncode`:

```python
proc = await sbx.run("exit 5", stdout=True)
await proc.communicate()
print(proc.returncode)   # 5, no exception raised
```

`communicate()` / `wait()` raise `SandboxExecutionError` only when the process never reached a real exit, for example a server-side spawn failure or a stream that died before termination:

```python
from union.sandbox import SandboxExecutionError

try:
    out, err = await proc.communicate()
except SandboxExecutionError as e:
    print(e.reason)        # e.g. "server crashed", "stream died"
    print(e.returncode)    # a fabricated -1 (no real exit occurred)
    print(e.backend)       # which backend was in use
    print(e.runtime_ms)    # how long before it failed
```

The split matters for agent loops: a sandboxed command failing its assertions is *signal*, and you handle it with `proc.returncode`. Only an exception means the sandbox itself misbehaved.
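
For an agent loop, that split suggests handing non-zero exits back to the model as data and letting exceptions propagate. The sketch below uses only members documented on this page (`run()`, `communicate_text()`, `returncode`); `run_step` and the keys of the returned dict are hypothetical.

```python
async def run_step(sbx, cmd: str, timeout_s: float = 30.0) -> dict:
    """Run one agent step; report a non-zero exit as data, not as an exception."""
    proc = await sbx.run(cmd, stdout=True, stderr=True, timeout_s=timeout_s)
    # SandboxExecutionError (the sandbox itself failing) is deliberately not caught.
    out, err = await proc.communicate_text()
    return {
        "ok": proc.returncode == 0,
        "returncode": proc.returncode,
        "stdout": out,
        "stderr": err,
    }
```

The caller can feed `stderr` and `returncode` back to the model on a failed step, and reserve retries or alerts for the exception path.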

## Timeouts

`timeout_s` on `run()` kills a single process; sessions get their own timeout (remote sessions take it as `sb.session(timeout=...)`, defaulting to one hour). When `timeout_s` fires, the process is signalled and `proc.returncode` reflects the termination; the session stays open and you can run more commands.

## Related

- [Networking](./networking). What `network_mode` and `network_allowlist` actually constrain.
- [Filesystem](./filesystem). `put_bytes` / `get_bytes` to move data in and out without shelling out.
- [Security model](./security-model). What the backend reported in `proc.backend` actually defends against.

=== PAGE: https://www.union.ai/docs/v2/union/user-guide/sandboxing/interactive-sandboxes/networking ===

# Networking

A sandbox session has two layers of network posture: a **session-level default** that bounds what the session can ever reach, and a **per-call override** on each `run()` that can tighten within that bound. Set the session-level posture to the broadest thing any `run()` in the session needs; per-call overrides narrow from there.

```python
async with await sb.session(
    network_mode="allowlist",
    network_allowlist=sb.PYPI_HOSTS,
) as sbx:
    await sbx.run("python my_tool.py", network_mode="blocked")       # tighten to blocked
    await sbx.run("uv pip install requests")                         # uses session default
    await sbx.run("python use_requests.py", network_mode="blocked")  # tighten again
```

`sb.PYPI_HOSTS` is an exported convenience list (`pypi.org`, `files.pythonhosted.org`, `*.pythonhosted.org`) for the common case of allowing `uv pip install` and nothing else.
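
One way to keep call sites honest about the session-level ceiling is a small pre-flight check. This is a sketch in plain Python: the ordering table and the `narrows` helper are illustrative assumptions, not part of `union.sandbox`, and how the library itself reacts to a per-call mode broader than the session's is not covered here.

```python
# Postures ordered from most restrictive to least restrictive.
_BREADTH = {"blocked": 0, "allowlist": 1, "open": 2}


def narrows(session_mode: str, call_mode: str) -> bool:
    """True when call_mode is no broader than the session-level mode."""
    return _BREADTH[call_mode] <= _BREADTH[session_mode]
```

The check compares modes only. It does not verify that a per-call `network_allowlist` is a subset of the session's list.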

> [!IMPORTANT] Per-call can narrow, not broaden
> On a remote sandbox, the pod's network namespace is committed at session open and can't be widened later. On-device sessions create a fresh network namespace per `run()` and don't have this constraint, but writing for both transports is simplest if you treat session-level as the ceiling everywhere.

## The three postures

| `network_mode`        | What the sandboxed process sees                                                                                                                 |
| --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| `"blocked"` (default) | A fresh network namespace with only loopback. Outbound connections fail at the kernel level.                                                    |
| `"open"`              | The host network. Use only when the sandboxed code is trusted.                                                                                  |
| `"allowlist"`         | A per-call pair of proxies — an HTTP CONNECT proxy and a SOCKS5 proxy — both enforcing the same filter. `HTTP_PROXY` / `HTTPS_PROXY` point at the HTTP proxy and `ALL_PROXY` at the SOCKS5 proxy; anything not on `network_allowlist` is refused. |

`network_allowlist` accepts CIDRs (`10.0.0.0/8`) and DNS patterns including wildcards (`*.pythonhosted.org`). It's only consulted when `network_mode="allowlist"`.

The two proxies exist so different clients can be filtered the same way: HTTP libraries (pip, curl, requests, boto3, huggingface_hub) honour `HTTPS_PROXY`, while non-HTTP TCP clients (git, ssh, database drivers) honour `ALL_PROXY` and route through the SOCKS5 proxy. Both apply the same deny-then-allow check.
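The deny-then-allow check can be sketched in a few lines. This is an illustration under assumptions, not the library's matcher: `is_permitted` and `_matches` are hypothetical names, DNS patterns are matched with shell-style globbing, and CIDR entries only match IP-literal targets (the sketch does no DNS resolution).

```python
import fnmatch
import ipaddress


def _matches(target: str, pattern: str) -> bool:
    """True if target (hostname or IP literal) matches a CIDR or DNS pattern."""
    try:
        network = ipaddress.ip_network(pattern, strict=False)
    except ValueError:
        # Not a CIDR: treat the pattern as a DNS name, possibly with wildcards.
        return fnmatch.fnmatchcase(target.lower(), pattern.lower())
    try:
        return ipaddress.ip_address(target) in network
    except ValueError:
        # A hostname never matches a CIDR entry in this sketch.
        return False


def is_permitted(target: str, allowlist: list[str], denylist: list[str] = ()) -> bool:
    """Deny-list is consulted first; a deny match wins over any allow match."""
    if any(_matches(target, pattern) for pattern in denylist):
        return False
    return any(_matches(target, pattern) for pattern in allowlist)
```

With an allow-list of `["pypi.org", "*.pythonhosted.org"]`, `files.pythonhosted.org` is permitted and `example.com` is refused. Adding a deny entry that covers an otherwise allowed target flips the answer to refused, which is the "deny wins" ordering.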

## The deny-list

`network_denylist` is the inverse of the allow-list: a set of CIDRs and DNS patterns that are blocked, checked before the allow-list (deny wins). It's a session-level policy. Pass it to `sb.on_device.session(...)` / `sb.session(...)`, not to `run()`, and it's valid with `network_mode="open"` or `"allowlist"` (it has no meaning under `"blocked"`, which already denies everything).

It unlocks two postures a plain allow-list can't express:

- Open egress with carve-outs — `network_mode="open"` plus `network_denylist=[...]`: full egress, except a few named destinations. Everything not denied is allowed.
- A hole punched in an allow-list — `network_mode="allowlist"` plus `network_denylist=[...]`: a host is blocked even when it matches an allow-list wildcard.

```python
async with await sb.session(
    network_mode="open",
    network_denylist=sb.CLOUD_METADATA_DENY,   # block cloud-metadata endpoints, allow the rest
) as sbx:
    ...
```

> [!WARNING] `open` mode does not auto-guard internal ranges
> When you ask for `network_mode="open"`, the internal-IP SSRF backstop is intentionally off since you asked for open egress. So a deny-list-only posture must name the sensitive endpoints explicitly. A bare `169.254.169.254` misses the rest of the link-local range and every IPv6 metadata endpoint, which is exactly what `sb.CLOUD_METADATA_DENY` exists for.

### Blocking cloud metadata

`sb.CLOUD_METADATA_DENY` is an exported list of the well-known cloud instance-metadata (IMDS) and link-local endpoints like the AWS/GCP/Azure IMDS address `169.254.169.254`, the wider IPv4 link-local range, the GCP `metadata.google.internal` hostname, and the AWS IPv6 IMDS endpoint. Splat it into your deny-list and add your own entries:

```python
network_denylist=[*sb.CLOUD_METADATA_DENY, "10.0.0.0/8"]
```

Like the allow-list, this is a guardrail for honest clients, not containment (see the warning below).

## Bring-your-own egress proxy

Instead of the built-in allow/deny proxy, you can route a session's egress through your own inspecting proxy (mitmproxy, squid, a sidecar). Set `network_proxy_url` on the session; the sandbox injects it into the child's `HTTP_PROXY` / `HTTPS_PROXY` and does not start its own proxy. Filtering and inspection are then entirely your proxy's responsibility. `network_allowlist` and `network_denylist` are not applied by the sandbox in this mode. Add a companion `network_socks_url` (`socks5h://...`) to back `ALL_PROXY` so non-HTTP clients route through it too. Both are only meaningful with `network_mode="open"` or `"allowlist"`.

```python
async with sb.on_device.session(
    backend="userns",
    network_mode="open",
    network_proxy_url="http://127.0.0.1:8080",   # your inspecting proxy
    network_socks_url="socks5h://127.0.0.1:1080",
) as sbx:
    ...
```

## What the allow-list actually constrains

> [!WARNING] The allow-list and deny-list are proxy-based, not a kernel wall
> `network_allowlist` and `network_denylist` constrain clients that honour the proxy environment variables: `HTTPS_PROXY` (pip, curl, requests, boto3, huggingface_hub, and most HTTP libraries) and `ALL_PROXY` (git, ssh, database drivers, and other SOCKS5-aware TCP clients). They are not a kernel-level firewall. Adversarial code that bypasses the proxies (raw sockets, DNS-over-UDP, anything that ignores env vars) will not be filtered.
>
> For a hard network boundary against untrusted code, use `network_mode="blocked"`. The allow-list and deny-list are for convenience under trust, not adversarial isolation.

So:

- You're installing dependencies or hitting a known API from your own code: `allowlist` is the right tool. Lower friction than tearing the session down, audit-friendly.
- You're running untrusted code and want to permit only certain egress: don't rely on `allowlist`. Use `blocked` and stage the data the sandboxed code needs via [`put_bytes`](./filesystem) before the call.
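To see why the proxy variables are advisory, it helps to look at how a well-behaved client discovers them. The sketch below uses the standard library's `urllib.request.getproxies_environment()`; the helper name `proxies_seen_by_urllib` and the proxy addresses are illustrative assumptions. A client has to perform this lookup voluntarily, and code that opens a raw socket never does.

```python
import os
import urllib.request


def proxies_seen_by_urllib(env: dict[str, str]) -> dict[str, str]:
    """Return the proxy map urllib derives from the given environment."""
    saved = dict(os.environ)
    os.environ.clear()
    os.environ.update(env)
    try:
        return urllib.request.getproxies_environment()
    finally:
        # Restore the caller's environment exactly as it was.
        os.environ.clear()
        os.environ.update(saved)


# Hypothetical values of the kind a proxy-based sandbox might inject.
sandbox_env = {
    "HTTP_PROXY": "http://127.0.0.1:3128",
    "HTTPS_PROXY": "http://127.0.0.1:3128",
    "ALL_PROXY": "socks5h://127.0.0.1:1080",
}
print(proxies_seen_by_urllib(sandbox_env))
```

Nothing in the kernel forces a process through those addresses, which is why `blocked` is the only posture to rely on for adversarial code.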

## Setting a default for the session

Pass `network_mode=` and `network_allowlist=` to `session(...)` to set the default for the whole session; individual `run()` calls still override it:

```python
async with sb.on_device.session(
    backend="userns",
    network_mode="allowlist",
    network_allowlist=sb.PYPI_HOSTS,
) as sbx:
    await sbx.run("uv pip install numpy")                        # uses session default
    await sbx.run("python untrusted.py", network_mode="blocked") # tightened
```

## Session default vs per-call override

Two places set network posture. Both accept `network_mode` and `network_allowlist`; `network_denylist` is session-level only:

| Where you set it                            | Arguments                             | What it controls                                                                                                                                  |
| ------------------------------------------- | ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| `sb.on_device.session(...)` / `sb.session(...)` | `network_mode=`, `network_allowlist=`, `network_denylist=` | Default for every `run()` in the session. On a remote sandbox this _also_ sets the pod-level network posture, so the per-call proxy can dial out. |
| `run(...)`                                  | `network_mode=`, `network_allowlist=` | The posture for this one call. Overrides the session default.                                                                                     |

> [!IMPORTANT] Session-level posture sets the pod's network on remote
> On a remote sandbox, the session-level `network_mode` _determines whether the sandbox-server pod has any network at all._ `network_mode="blocked"` at the session level means the pod has no egress, period. A per-call `run(network_mode="allowlist", ...)` will then fail with `Temporary failure in name resolution`, because the per-call proxy has nowhere to dial out from.
>
> Rule of thumb: set the session-level `network_mode` to the **broadest** posture any `run()` in the session needs, then tighten per call. If a session has even one step that needs `pypi.org`, set `network_mode="allowlist"` (or `"open"`) on `sb.session()`; every other `run()` can still pass `network_mode="blocked"` to tighten back down.
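The rule of thumb can be written down as a small pre-flight helper in your own code. This is a sketch under assumptions: `session_mode_for` and `is_tightening` are hypothetical names, not part of `union.sandbox`, and the ordering `blocked < allowlist < open` is inferred from the descriptions on this page.

```python
# Postures ordered from narrowest to broadest.
_RANK = {"blocked": 0, "allowlist": 1, "open": 2}


def session_mode_for(per_call_modes: list[str]) -> str:
    """Pick the broadest posture any planned run() needs."""
    return max(per_call_modes, key=_RANK.__getitem__, default="blocked")


def is_tightening(session_mode: str, call_mode: str) -> bool:
    """True if a per-call mode stays within the session-level ceiling."""
    return _RANK[call_mode] <= _RANK[session_mode]
```

For a plan of `["blocked", "allowlist", "blocked"]`, the helper picks `"allowlist"` for the session, and a later `run(network_mode="blocked")` stays within that ceiling.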

## How the proxies are implemented

The on-device transport spins up two short-lived proxies for the `run()` call: an HTTP CONNECT proxy (`HTTP_PROXY` / `HTTPS_PROXY`) and a SOCKS5 proxy (`ALL_PROXY`). Each checks the target against the deny-list first and then the allow-list, and either dials out or refuses (`403` for the HTTP proxy). There is no shared state between calls; each `run()` gets a fresh pair with the policy you passed in. On a remote sandbox the same filtering runs server-side in the sandbox-server pod.
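As a rough illustration of the HTTP side, the decision a CONNECT proxy makes for each request can be sketched as a pure function. Everything here is an assumption for illustration: `connect_response` is a hypothetical name, the policy is passed in as a callable, and only the `403` refusal comes from the description above (the `400` and `405` responses are choices made for this sketch).

```python
from typing import Callable


def connect_response(request_line: str, is_permitted: Callable[[str], bool]) -> str:
    """Map one proxy request line to the status line the proxy would send."""
    parts = request_line.strip().split()
    if len(parts) != 3 or parts[0].upper() != "CONNECT":
        return "HTTP/1.1 405 Method Not Allowed\r\n\r\n"
    # The CONNECT target has the form host:port (IPv6 literals are bracketed).
    host, _, port = parts[1].rpartition(":")
    if not host or not port.isdigit():
        return "HTTP/1.1 400 Bad Request\r\n\r\n"
    host = host.strip("[]")
    if not is_permitted(host):
        return "HTTP/1.1 403 Forbidden\r\n\r\n"
    # A real proxy would now dial the target and relay bytes in both directions.
    return "HTTP/1.1 200 Connection Established\r\n\r\n"
```

Because the function holds no state, building a fresh policy callable per call matches the "no shared state between calls" property described above.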

## Related

- [Security model](./security-model). What network isolation buys you and what it doesn't.
- [Running commands](./running-commands). Other `run()` arguments.
- [Filesystem](./filesystem). Staging data into a blocked sandbox with `put_bytes`.

=== PAGE: https://www.union.ai/docs/v2/union/user-guide/sandboxing/interactive-sandboxes/filesystem ===

# Filesystem

Each session has its own work dir on the sandbox filesystem. State written there during one `run()` is visible to the next, and you can push or pull bytes from outside the sandbox without shelling out.

## Moving bytes in and out

`put_bytes` and `get_bytes` are the simplest way to stage a file into a sandbox before a command, or retrieve a result after:

```python
import tempfile

with tempfile.TemporaryDirectory() as work:
    async with sb.on_device.session(host_work_dir=work, backend="userns") as sbx:
        await sbx.put_bytes(f"{work}/input.json", b'{"x": 1}')

        proc = await sbx.run(f"python {work}/process.py")
        await proc.communicate()

        result = await sbx.get_bytes(f"{work}/result.json", max_bytes=10 * 1024 * 1024)
```

Both methods take absolute paths that must resolve under the session's work directory. Pin it with `host_work_dir=...` when you want a known path to interpolate (as above); leave it unset and the session creates a fresh temp dir for its lifetime. Paths outside the work dir are refused, as are symlinks in the prefix. `max_bytes` caps the response size; the call raises if the file is larger.
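The path rules can be sketched with the standard library. This is a hedged illustration of the checks described above, not the library's code: `check_work_dir_path` and `read_capped` are hypothetical names, and the exception types are choices made for this sketch.

```python
import os
from pathlib import Path


def check_work_dir_path(work_dir: str, path: str) -> Path:
    """Accept only absolute, symlink-free paths that stay under work_dir."""
    if not os.path.isabs(path):
        raise ValueError(f"path must be absolute: {path!r}")
    root = Path(os.path.realpath(work_dir))
    candidate = Path(os.path.normpath(path))  # collapse "." and ".." lexically
    try:
        relative = candidate.relative_to(root)
    except ValueError:
        raise PermissionError(f"path is outside the work dir: {path!r}") from None
    # Walk down from the root and refuse any symlinked component.
    current = root
    for part in relative.parts:
        current = current / part
        if current.is_symlink():
            raise PermissionError(f"symlink in path: {current}")
    return current


def read_capped(path: Path, max_bytes: int) -> bytes:
    """Read a file, raising if it is larger than max_bytes."""
    size = path.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, over the {max_bytes} byte cap")
    return path.read_bytes()
```

Checking the size before reading means an oversized result fails fast instead of being partially transferred.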

The work-dir constraint is its own rule, separate from the filesystem allow-list (covered below). The allow-list governs what the sandboxed process can read or write at runtime; `put_bytes` / `get_bytes` operate on the work dir specifically. To make a file available at, say, `/tmp/input.json` inside the sandbox, write it to the work dir with `put_bytes` and have the sandboxed script read from there.

Use these together with `network_mode="blocked"` to keep an untrusted `run()` from doing any network I/O of its own: stage inputs via `put_bytes`, run with no network, collect outputs via `get_bytes`.

## The default allow-list

The sandbox grants the sandboxed process read-only access to system paths (`/usr`, `/lib`, `/etc`, `/proc`, `/sys`, and so on) and read-write access to `/tmp`, `/dev/shm`, and the per-session work dir. Everything else on the host is invisible.

These defaults are secure; you don't have to touch them for typical use.

> [!NOTE] What persists and what doesn't
> The writable mounts split into two kinds. `/tmp` and `/dev/shm` are per-run scratch: each `run()` gets a fresh tmpfs, so anything written there is gone on the next call. The work dir is the session's persistent disk: files written there survive across `run()` calls for the life of the session. The default work dir happens to live at `/tmp/sandbox-work`, but that is a separate persistent mount, not part of the per-run `/tmp` scratch. Write state you want to keep to the work dir, not bare `/tmp`.

## Extending the allow-list

You can add host paths to the allow-list. Additions never replace the secure defaults:

```python
sbx = sb.on_device.session(
    backend="userns",                        # vanilla pod; no extra capabilities
    read_only_paths=["/opt/models"],         # extend the read-only allow-list
    read_write_paths=["/data/scratch"],      # extend the read-write allow-list
    host_work_dir="/tmp/my-sandbox-work",    # pin the per-session work dir
)
```

`host_work_dir` is useful when you want a stable, inspectable location for the work dir (CI artifacts, post-mortem debugging). When omitted, the library picks a fresh directory.

One common use of `read_only_paths` is exposing a venv baked into the image so the sandboxed interpreter can import it — point it at the image's `VIRTUAL_ENV` (e.g. `/opt/venv`), which lives outside the default `/usr` allow-list. On the on-device transport you usually don't need to: when the session builds its shared venv (whenever `uv` is available), it bridges and mounts the image venv read-only for you. Reach for `read_only_paths` here only when `uv` isn't present or the framework lives at a path the auto-bridge won't mount.

Inspect the effective allow-list at any time:

```python
allowlist = sbx.fs_allowlist()
# {"read_only":  ["/usr", "/lib", ..., "/opt/models"],
#  "read_write": ["/tmp", "/dev/shm", ..., "/data/scratch", "/tmp/my-sandbox-work"]}
```

> [!NOTE] Allow-list additions on remote sandboxes
> `read_only_paths` and `read_write_paths` extend what the **sandboxed process** can see inside the sandbox-server pod. They do not mount host directories onto a remote pod; the pod's filesystem comes from its image. To make external data available to a remote sandbox, push it with `put_bytes` or mount it through the `SandboxEnvironment` (see [Deployment](./deployment)).

## Volumes (coming soon)

Persistent volume support is on the roadmap, so that sandbox sessions can attach durable storage (PVCs, shared scratch, model caches) without redeploying. Today, persistent state across sessions requires either `put_bytes` / `get_bytes` to your own storage, or baking the data into the sandbox image via a custom `SandboxEnvironment`.

If volumes block a use case you care about, let us know on Slack so we can prioritize accordingly.

## Related

- [Running commands](./running-commands). `cwd` lands under the work dir.
- [Networking](./networking). Pair `put_bytes` with `network_mode="blocked"` for isolated, hermetic runs.
- [Deployment](./deployment). `SandboxEnvironment` for baking data and dependencies into the remote sandbox image.

=== PAGE: https://www.union.ai/docs/v2/union/user-guide/sandboxing/interactive-sandboxes/deployment ===

# Deployment

`unionai-sandbox` ships two transports with identical `Session` APIs. This page covers how to install and deploy each, when to use which, and how to customise the remote-pod environment.

## Pick a transport

The two transports cover different stages of a sandbox workflow:

- On-device (`sb.on_device.session(...)`): development, CI, and install checks. Runs sandboxed child processes inside the current container or task pod, no separate sandbox-server. Needs no Union connection, no extra pod, no deploy. Lowest latency. Not for production untrusted code: it shares a container with your task's code and credentials. See [Security model](./security-model) for the blast-radius argument.
- Remote (`sb.session(...)`): production. Runs the sandbox in its own Flyte-task pod with a minimal image, its own service account, and an independent lifecycle. Serializable across task boundaries, observable in the UI, optionally hardened with gVisor.

In other words: develop on-device, ship remote. The call sites are nearly identical (`sb.on_device.session(...)` vs `await sb.session(...)`), so promoting a working on-device prototype to production is a one-token change plus a one-time deploy.

## On-device: install and go

```sh
pip install 'unionai-sandbox[flyte]'
```

The `[flyte]` extra brings in the Flyte SDK so `@env.task` and the rest of the recommended task-based shape work. (For purely Flyte-less scripts, bare `pip install unionai-sandbox` is enough.) `sb.on_device.session(...)` works inside any async Python: a notebook, a script on your laptop, a CI runner, a Flyte task you're iterating on before shipping.

> [!IMPORTANT] Choose the backend explicitly
> The on-device backend is selected with `backend=` and defaults to `"bubblewrap"`. There is no auto-detection and no silent fallback. An unavailable backend makes `run()` fail loudly rather than downgrade. `bubblewrap` needs `CAP_SYS_ADMIN` + unconfined AppArmor (see below). The chosen backend is reported on each process as `proc.backend`. See [Security model](./security-model#isolation-backends).

### Running an on-device script

If the script calls `asyncio.run(main())` at module scope, run it directly:

```sh
python my_agent.py
```

If the code is wrapped in a `flyte.TaskEnvironment` + `@env.task` (which is the recommended shape inside a Union codebase), the same file still runs as a plain script. Flyte's local executor picks up the `@env.task` and runs it in-process:

```sh
python my_agent.py
```

No Union cluster needed, no `flyte run` invocation. The on-device sandbox spawns inside whatever container or virtualenv you launched `python` in.

### Running an on-device sandbox in a task pod

On-device isn't only for the laptop. You can run the sandbox child inside a real task pod with no extra sandbox-server. The only thing that changes between the two backends is the pod:

```python
import flyte
from union import sandbox as sb

# userns: vanilla pod, no special capabilities.
userns_env = flyte.TaskEnvironment(
    name="sandboxed-userns",
    image=sb.base_sandbox_image,
)

# bwrap: same image, but the pod grants CAP_SYS_ADMIN + unconfined AppArmor.
bwrap_env = flyte.TaskEnvironment(
    name="sandboxed-bwrap",
    image=sb.base_sandbox_image,
    pod_template=flyte.PodTemplate().allow_nested_sandboxing(),
)

@userns_env.task
async def run_userns() -> str:
    async with sb.on_device.session(backend="userns") as sbx:
        proc = await sbx.run("uname -a", stdout=True)
        out, _ = await proc.communicate_text()
        return out

@bwrap_env.task
async def run_bwrap() -> str:
    async with sb.on_device.session(backend="bubblewrap") as sbx:
        proc = await sbx.run("uname -a", stdout=True)
        out, _ = await proc.communicate_text()
        return out
```

`flyte.PodTemplate().allow_nested_sandboxing()` grants exactly the `CAP_SYS_ADMIN` + unconfined-AppArmor posture `bubblewrap` needs and nothing more: the pod is not privileged. Without it, a `backend="bubblewrap"` session fails loudly. See [Security model](./security-model#pod-security-for-the-bubblewrap-backend).

## Remote: one-time deploy, then per-run sessions

Install the deploy extra:

```sh
pip install 'unionai-sandbox[deploy]'
```

Deploy the default sandbox task envs once per cluster:

```sh
unionai-sandbox-deploy
```

This runs `flyte deploy --all` against the installed `_server.py`.

After deploy, open sessions from any task. The caller task's image must have `unionai-sandbox` installed.

```python
import flyte
from datetime import timedelta
from union import sandbox as sb

env = flyte.TaskEnvironment(
    name="agent",
    image=flyte.Image.from_debian_base().with_pip_packages(
        "unionai-sandbox[remote]"
    ),
)

@env.task
async def main() -> str:
    async with await sb.session(timeout=timedelta(minutes=30)) as sbx:
        proc = await sbx.run("uname -a", stdout=True)
        out, _ = await proc.communicate_text()
        print(sbx.name, sbx.ip, sbx.created_at, sbx.url)
        return out
```

> [!IMPORTANT] Caller image must install `unionai-sandbox[remote]`.

Bringup is staged so your setup work overlaps with pod startup: `sb.session(...)` submits the run and returns immediately; `async with` (or `await sbx`) waits for the pod to become addressable; and the transport health-check is deferred to the first `run()`.

### Running a remote script

A script that opens `sb.session(...)` is invoked through the Flyte CLI, which dispatches the calling task to the cluster. The sandbox pod then comes up alongside it:

```sh
flyte run my_agent.py main
```

To target a specific project and domain:

```sh
flyte run --project my-project --domain development my_agent.py main
```

`my_agent.py` here is the file containing your `@env.task async def main(...)` definition; `main` is the task name. The `sb.session(...)` call inside `main` submits the deployed sandbox task as its own run.

A `SandboxSession` exposes this metadata:

| Field        | Meaning                                                     |
| ------------ | ----------------------------------------------------------- |
| `name`       | Session name (equals the Flyte run name).                   |
| `endpoint`   | URL the transport opens against.                            |
| `ip`         | Pod IP, once surfaced.                                      |
| `created_at` | UTC construction timestamp.                                 |
| `is_owner`   | `True` on the side that created the run (and can abort it). |
| `url`        | Union console URL for the run (owner side).                 |

`sb.session()` arguments worth knowing:

| Argument                         | Default               | What it does                                                                                                  |
| -------------------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------- |
| `environment`                    | `DEFAULT_SANDBOX_ENV` | The `SandboxEnvironment` to launch. See below.                                                                |
| `name`                           | random `sbx-<16hex>`  | Run name. Set to make the run discoverable in the UI.                                                         |
| `timeout`                        | `timedelta(hours=1)`  | Soft per-session timeout. Hard ceiling of 24h is baked into the task decorator as a safety net.               |
| `project`, `domain`              | inherited             | Where to launch the sandbox run.                                                                              |
| `resources`                      | env's default         | Per-launch override. Rewrites the deployed task's resources and resizes the in-pod sandbox cgroup ceiling.    |
| `network_mode`, `network_allowlist` | `"blocked"`, `None`   | Session default for every `run()`. On remote this also sets the pod-level network posture, so the per-call proxy can dial out. Per-call `run(network_mode=...)` still overrides for that one call. |
| `network_denylist`               | `None`                | Session-level deny-list (CIDRs / DNS patterns), checked before the allow-list. Valid with `network_mode="open"` or `"allowlist"`. See [Networking](./networking#the-deny-list). |

## Defining a custom `SandboxEnvironment`

`sb.session()` launches `sb.DEFAULT_SANDBOX_ENV` unless you pass your own. Define one to control the image, resources, secrets, and isolation.

The deploy CLI (`flyte deploy --all <file>`) discovers task envs by scanning a Python file for module-level objects. So a custom environment lives in two parts: the `SandboxEnvironment` itself, and a thin deploy module that exposes its `task_env` at module scope.

`my_sandboxes.py`: define the environment:

```python
import flyte
from union import sandbox as sb

ml_sandbox = sb.SandboxEnvironment(
    name="ml-sandbox",
    image=sb.base_sandbox_image.with_pip_packages("torch", "transformers"),
    resources=flyte.Resources(cpu="8", memory="32Gi", gpu="L4:1"),
    secrets=[flyte.Secret(group="hf", key="HF_TOKEN")],
    env_vars={"HF_HOME": "/tmp/hf"},
    sandbox_mode="userns",        # "userns" | "bwrap"
    runtime="container",          # "container" | "gvisor"
    description="ML inference sandbox",
)
```

`deploy_my_sandboxes.py`: the deploy entrypoint. Re-export the `task_env` at module scope so the deploy CLI can find it:

```python
from my_sandboxes import ml_sandbox

# Module-scope name; flyte deploy discovers this via isinstance(v, flyte.Environment).
ml_sandbox_env = ml_sandbox.task_env
```

Deploy once:

```sh
flyte deploy deploy_my_sandboxes.py ml_sandbox_env
```

Then launch sessions against it from any task:

```python
import flyte
from union import sandbox as sb
from my_sandboxes import ml_sandbox

env = flyte.TaskEnvironment(
    name="agent",
    image=flyte.Image.from_debian_base().with_pip_packages(
        "unionai-sandbox[remote]"
    ),
)

@env.task
async def run_inference() -> str:
    async with await sb.session(environment=ml_sandbox) as sbx:
        proc = await sbx.run(
            "python -c 'import torch; print(torch.__version__)'",
            stdout=True,
        )
        out, _ = await proc.communicate_text()
        return out
```

The built-in `unionai-sandbox-deploy` is exactly this pattern applied to the library's own defaults; your custom envs follow the same recipe.

| Parameter      | Notes                                                                                        |
| -------------- | -------------------------------------------------------------------------------------------- |
| `name`         | Task-environment identifier; `session()` resolves `{name}.sandbox_server`.                   |
| `image`        | Defaults to `sb.base_sandbox_image`; extend with `.with_pip_packages(...)` etc.              |
| `resources`    | Default per-session `flyte.Resources`. Override per launch with `sb.session(resources=...)`. |
| `secrets`      | `flyte.Secret`s forwarded to the sandbox pod.                                                |
| `env_vars`     | Environment variables forwarded to the pod.                                                  |
| `sandbox_mode` | In-pod isolation backend: `"userns"` (default) or `"bwrap"`. `"bwrap"` makes the deployed pod carry `CAP_SYS_ADMIN` + unconfined AppArmor. |
| `runtime`      | Pod runtime: `"container"` (default) or `"gvisor"`.                                          |
| `sys_cap_admin`| Explicit override of the `CAP_SYS_ADMIN` grant. `None` (default) grants it iff `sandbox_mode="bwrap"`; `True` always; `False` never. Use `False` to run `bwrap` on a cluster that already allows unprivileged user namespaces, or `True` for `userns` on a cluster whose seccomp profile blocks the userns syscalls. |

Two ready-built defaults are exported: `sb.DEFAULT_SANDBOX_ENV` (userns, container runtime) and `sb.DEFAULT_SANDBOX_ENV_BWRAP` (bubblewrap, container runtime).
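The tri-state `sys_cap_admin` rule from the table above reduces to a short function. This is a sketch of the documented behaviour only; `grants_cap_sys_admin` is a hypothetical name, not a library function.

```python
from typing import Optional


def grants_cap_sys_admin(sandbox_mode: str, sys_cap_admin: Optional[bool] = None) -> bool:
    """Decide whether the deployed pod carries CAP_SYS_ADMIN."""
    if sys_cap_admin is None:
        # Default: grant the capability only for the bubblewrap backend.
        return sandbox_mode == "bwrap"
    # An explicit True or False overrides the backend-based default.
    return sys_cap_admin
```

So `sandbox_mode="bwrap"` with `sys_cap_admin=False` yields a pod without the capability, and `sandbox_mode="userns"` with `sys_cap_admin=True` yields a pod with it.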

## Passing a sandbox between tasks

A `SandboxSession` is serializable, so the task that launches the sandbox can pass it to other tasks. The launcher is the **owner**; a receiver lands in **reference mode**.

```python
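import asyncio
from datetime import timedelta

import flyte
from union import sandbox as sb

# Task environment for both tasks; same shape as the caller env shown earlier on this page.
env = flyte.TaskEnvironment(
    name="agent",
    image=flyte.Image.from_debian_base().with_pip_packages(
        "unionai-sandbox[remote]"
    ),
)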

@env.task
async def child(sbx: sb.SandboxSession, script: str) -> dict:
    # Reference mode: no `async with` needed. Endpoint round-tripped via
    # serialization; first run() lazily opens the transport.
    proc = await sbx.run(script, stdout=True)
    out, _ = await proc.communicate_text()
    return {"script": script, "stdout": out, "returncode": proc.returncode}

@env.task
async def parent() -> list[dict]:
    # Owner mode: we launched the pod, so we own its lifetime and abort on exit.
    async with await sb.session(timeout=timedelta(minutes=15)) as sbx:
        return await asyncio.gather(
            child(sbx, "echo one"),
            child(sbx, "echo two"),
        )
```

Only the **owner** can abort the run. Calling `close()` on a reference-mode session shuts that receiver's transport only; the run keeps going until the owner aborts it (or the session times out).

## Detached lifetime

A remote `SandboxSession` doesn't require `async with`. Keep the handle and manage the lifetime yourself, useful for apps and services where the sandbox outlives a single block of code:

```python
import flyte
from datetime import timedelta
from union import sandbox as sb

env = flyte.TaskEnvironment(
    name="long-running-service",
    image=flyte.Image.from_debian_base().with_pip_packages(
        "unionai-sandbox[remote]"
    ),
)

@env.task
async def serve_user_session(user_id: str) -> str:
    sbx = await sb.session(timeout=timedelta(minutes=30))
    await sbx                       # wait for the pod to surface, fail fast on a bad launch
    try:
        proc = await sbx.run("uname -a", stdout=True)
        out, _ = await proc.communicate_text()
        return out
    finally:
        await sbx.close()           # closes the transport and aborts the run (owner side)
```

The same pattern works outside a task for on-device development. To attach to a `sandbox-server` you started yourself, use `sb.remote.session(endpoint=...)` instead of `sb.session()`.

## Per-session timeout vs hard ceiling

Two timeouts protect a sandbox pod from leaking forever:

- **Per-session soft timeout** (default 1 hour, settable via `sb.session(timeout=...)`). Enforced inside the task body. On expiry, the body signals the sandbox binary (SIGTERM, then SIGKILL after 10s) and exits cleanly.
- **Hard ceiling** (24 hours, baked into the task decorator). The Flyte runtime kills the action after this. If a session owner crashes without calling `close()` and the soft timeout doesn't fire, the action still terminates within 24h.

Design your soft timeouts to be well below the hard ceiling. The ceiling is a safety net, not a parameter.
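The SIGTERM-then-SIGKILL escalation that the soft timeout performs is a common shutdown pattern. A minimal standard-library sketch, assuming a POSIX host, looks like this. `stop_gracefully` is a hypothetical helper, not the sandbox's code; the 10-second default mirrors the grace period quoted above.

```python
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_s: float = 10.0) -> int:
    """Send SIGTERM, wait up to grace_s, then escalate to SIGKILL."""
    proc.terminate()  # SIGTERM: gives the process a chance to clean up
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL: cannot be caught or ignored
        proc.wait()   # reap the child so returncode is set
    return proc.returncode
```

On POSIX the return code is the negated signal number, so a cooperative child reports `-signal.SIGTERM` and one that ignores SIGTERM reports `-signal.SIGKILL` once the grace period runs out.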

## Related

- [Security model](./security-model). When to pick on-device vs remote, pod security for the bubblewrap backend, and when to enable gVisor.
- [Networking](./networking). Per-call `network_mode` and what the allow-list does and does not protect against.
- [Filesystem](./filesystem). `read_only_paths` and `read_write_paths` extensions, volumes roadmap.

=== PAGE: https://www.union.ai/docs/v2/union/user-guide/sandboxing/interactive-sandboxes/gpu ===

# GPU

A sandbox can run GPU workloads: `nvidia-smi`, a CUDA matmul, PyTorch inference, etc. GPU workloads run inside the same isolation boundary as everything else. The sandboxed process gets the GPU devices bound into its namespace, but no broader access to the host than a CPU sandbox would have.

## GPU requires the bubblewrap backend

CUDA's compute path (`cuInit`) does not initialize under the userns-lite backend's unprivileged user namespace. It fails with `cudaErrorOperatingSystem` even though `nvidia-smi` and NVML work and the `/dev/nvidia*` device nodes are visible. The bubblewrap backend runs the workload in a posture the NVIDIA driver accepts, so **GPU sandboxes must use bubblewrap**:

- Remote: set `sandbox_mode="bwrap"` on the `SandboxEnvironment` (or `sb.session(sandbox_mode="bwrap", ...)`).
- On-device: pass `backend="bubblewrap"` to `sb.on_device.session(...)`, which is already the default.

`sb.session()` fails fast if you request a GPU on a non-bubblewrap backend.

## Remote GPU sandbox

The supported path is a remote `SandboxEnvironment` that declares a GPU in its `resources` and bakes the GPU framework into its image. Schedule the pod with the GPU, set `sandbox_mode="bwrap"`, and the sandbox-server binds `/dev/nvidia*` into the sandboxed child.

```python
import flyte
from union import sandbox as sb

gpu_sandbox = sb.SandboxEnvironment(
    name="union-sandbox-gpu-l4",
    sandbox_mode="bwrap",                       # required for CUDA
    image=sb.base_sandbox_image.with_pip_packages("torch"),
    resources=flyte.Resources(cpu="4", memory="16Gi", gpu="L4:1"),
    description="Sandbox with one NVIDIA L4 and PyTorch preinstalled.",
)

env = flyte.TaskEnvironment(
    name="union-sandbox-remote-gpu-l4",
    # `kubernetes` is needed because constructing a SandboxEnvironment builds its
    # pod template (via kubernetes.client). CPU examples that only call
    # sb.session(sandbox_mode=...) without constructing one don't need it.
    image=flyte.Image.from_debian_base().with_pip_packages("kubernetes"),
    resources=flyte.Resources(cpu="500m", memory="512Mi"),
    # Declaring the sandbox env as a dependency means one `flyte deploy` of this
    # env also deploys gpu_sandbox's sandbox-server task, so there's something to launch.
    depends_on=[gpu_sandbox],
)

_TORCH_CHECK = """
import torch
print("cuda available:", torch.cuda.is_available())
assert torch.cuda.is_available(), "CUDA not visible inside sandbox"
dev = torch.device("cuda:0")
print("device:", torch.cuda.get_device_name(dev))
a = torch.randn(2048, 2048, device=dev)
b = torch.randn(2048, 2048, device=dev)
print("matmul sum:", (a @ b).sum().item())
"""

@env.task
async def main() -> dict:
    async with await sb.session(environment=gpu_sandbox) as sbx:
        smi = await sbx.exec("nvidia-smi")
        out = await sbx.run_code(_TORCH_CHECK)
        return {"nvidia_smi": smi.stdout, "torch": out}
```

Deploy the GPU image and sandbox task, then run:

```sh
flyte deploy --all examples/remote/tasks/torch_gpu_matmul.py   # build GPU image + deploy
flyte run examples/remote/tasks/torch_gpu_matmul.py main
```

### How the GPU reaches the sandbox

GPU access is fail-closed. The server's GPU ceiling defaults to zero, which denies the devices. When a session is launched against an environment whose `resources` include a GPU, the pod is scheduled with that GPU count and the ceiling is set to match, so each `run()` asks for the devices and the bubblewrap backend binds `/dev/nvidia*` into the sandboxed child. A sandbox launched on a non-GPU pod, or with a zero ceiling, simply can't see a GPU.
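
The fail-closed rule can be written down as a toy model. This is not the sandbox-server's implementation; `gpu_visible` and `ceiling_for` are hypothetical names that only restate the paragraph above as code.

```python
def ceiling_for(env_gpu_count: int) -> int:
    """The ceiling is set to match the GPU count declared in the environment's resources."""
    return max(env_gpu_count, 0)


def gpu_visible(pod_gpu_count: int, gpu_ceiling: int = 0) -> bool:
    """Devices are bound only when the pod has a GPU AND the ceiling is non-zero.

    The ceiling defaults to zero, so the default answer is "deny".
    """
    return pod_gpu_count > 0 and gpu_ceiling > 0
```

Both conditions have to hold, so a missing GPU declaration anywhere along the path results in no device access rather than accidental access.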

## Resource sizing: don't under-cap the sandbox

The in-pod sandbox runs under a memory/CPU ceiling derived from the pod's resources (less a small reserve for sandbox-server itself). If that ceiling is too low, `import torch` fails while memory-mapping libtorch.

To avoid this, size the `SandboxEnvironment.resources` for the framework, not just the GPU. The `cpu="4", memory="16Gi"` above is a reasonable floor for PyTorch on a single L4. When you set `resources` on the environment (or per launch via `sb.session(resources=...)`), the sandbox ceiling is derived from that value automatically. The fallback `mem_ceiling_mb` / `cpu_ceiling_milli` kwargs apply only when no resources are given.
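
To get a feel for how much headroom a given `resources` value leaves, the derivation can be approximated as below. This is a sketch under stated assumptions: the reserve sizes (`reserve_mb`, `reserve_milli`) are made-up placeholders, the parser handles only the `Mi`/`Gi` and millicore forms used on this page, and the real derivation lives in the sandbox-server.

```python
_MEMORY_FACTORS_MB = {"Mi": 1, "Gi": 1024}  # treats MiB as MB for illustration


def memory_mb(quantity: str) -> int:
    """Parse a memory quantity such as "16Gi" or "512Mi" into megabytes."""
    for suffix, factor in _MEMORY_FACTORS_MB.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


def cpu_milli(quantity: str) -> int:
    """Parse a CPU quantity such as "4" or "500m" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def sandbox_ceiling(cpu: str, memory: str,
                    reserve_mb: int = 256, reserve_milli: int = 100) -> dict:
    """Pod resources less an assumed reserve for sandbox-server itself."""
    return {
        "mem_ceiling_mb": max(memory_mb(memory) - reserve_mb, 0),
        "cpu_ceiling_milli": max(cpu_milli(cpu) - reserve_milli, 0),
    }
```

With these assumed reserves, `sandbox_ceiling("4", "16Gi")` leaves 16128 MB and 3900 millicores for the sandbox, while a `512Mi` pod leaves only 256 MB, which is the kind of ceiling that can make a large framework import fail.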

> [!NOTE] GPU type and count
> The `gpu="L4:1"` form selects the accelerator class and count via `flyte.Resources`. Use whatever GPU classes your cluster offers; the sandbox machinery is agnostic to the specific device.

## On-device GPU

Running a GPU workload on-device (in the calling task's own pod, no sandbox-server) works under the same constraint: `backend="bubblewrap"` on a pod that has a GPU and the bubblewrap prerequisites (`CAP_SYS_ADMIN` + unconfined AppArmor via `flyte.PodTemplate.allow_nested_sandboxing()`). The on-device sandbox runs `script_type="python"` against the current interpreter, so the GPU framework just needs to be importable from the task's own venv.

You usually don't have to expose that venv yourself. When the session builds its shared venv — which it does whenever `uv` is available, as in `base_sandbox_image` — it bridges the image venv's `site-packages` and mounts that venv read-only automatically, so `import torch` resolves with no `read_only_paths`:

```python
import flyte
from union import sandbox as sb

env = flyte.TaskEnvironment(
    name="union-sandbox-on-device-gpu",
    image=sb.base_sandbox_image.with_pip_packages("torch"),
    resources=flyte.Resources(cpu="4", memory="16Gi", gpu="L4:1"),
    pod_template=flyte.PodTemplate().allow_nested_sandboxing(),
)

@env.task
async def main() -> str:
    # base_sandbox_image ships uv, so the session venv is built and the image
    # venv (with torch) is auto-exposed read-only — no read_only_paths needed.
    async with sb.on_device.session(backend="bubblewrap") as sbx:
        return await sbx.run_code("import torch; print(torch.cuda.is_available())")
```

> [!NOTE] When you still need `read_only_paths`
> Add `read_only_paths=[sys.prefix]` only as a fallback: when `uv` isn't available (no session venv is built, so nothing is auto-bridged), or when the framework lives at a path the auto-bridge won't mount. `is_safe_to_mount` refuses broad roots like `/usr` (already in the default allow-list) but accepts a specific prefix such as `/opt/venv`.

Prefer remote for production GPU work: it isolates the GPU job in its own pod with its own credentials. This is the same blast-radius argument that applies to any remote sandbox (see [Security model](./security-model)).

## Related

- [Security model](./security-model). Why GPU sandboxes use bubblewrap, and the on-device blast-radius caveat.
- [Deployment](./deployment). Defining a custom `SandboxEnvironment`, per-launch `resources` overrides.
- [Running commands](./running-commands). `exec()` / `run_code()` used above.

=== PAGE: https://www.union.ai/docs/v2/union/user-guide/sandboxing/interactive-sandboxes/agents ===

# Agents

Interactive sandboxes are built for agent loops: an LLM (or agent framework) writes code, runs it, reads the result, and iterates — all against one live session whose filesystem and installed packages persist between turns. This page covers the patterns that matter when an agent generates code on the fly, whether that code is trusted (authored against your own prompts and tools) or untrusted (derived from end-user input).

## Why a session, not a container per turn

A one-shot container starts cold every turn: nothing the previous step wrote is there, nothing it installed is there. An agent that iterates on its own code wants the opposite: the file it wrote last turn, the package it installed two turns ago, and the ability to flip network access per step. A session gives it that:

- **State persists.** Files under the session work dir and packages in the shared session venv survive across `run()` calls (the writable scratch mounts like `/tmp` are fresh per command). So "write a script, run it, fix it, run it again" works without re-staging anything. See [Running commands](./running-commands#installing-packages-install-is-just-a-run).
- **Security posture is per call.** Flip `network_mode` between `blocked`, `allowlist`, and `open` on each `run()`. A turn that just executes the agent's code runs `blocked`; a turn that needs a package installs under `allowlist`; nothing is torn down in between. See [Networking](./networking).
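
One way to apply the per-call posture is a small policy function that picks the network mode from the command about to run. The sketch below is an assumption-laden example: `network_mode_for` and the list of install prefixes are hypothetical, and a real policy would likely be stricter about what counts as an install.

```python
import shlex

# Commands that need package-index access; everything else runs offline.
INSTALL_PREFIXES = (("uv", "pip", "install"), ("pip", "install"))


def network_mode_for(command: str) -> str:
    """Return "allowlist" for install commands and "blocked" for everything else."""
    tokens = tuple(shlex.split(command))
    for prefix in INSTALL_PREFIXES:
        if tokens[: len(prefix)] == prefix:
            return "allowlist"
    return "blocked"
```

The returned string would be passed as `network_mode` on each `run()`, so an install turn and an execute turn share one session but not one network posture.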

## The shape of an agent loop

A typical loop stages inputs, lets the agent's tool loop write and execute code inside the sandbox, then collects the result:

```python
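import tempfile
from union import sandbox as sb

# `_CSV` (bytes) and `_driver(work)` (returns the driver source as a str)
# are assumed to be defined elsewhere in this module.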

async def run_agent(env_key: str) -> bytes:
    with tempfile.TemporaryDirectory() as work:
        async with sb.on_device.session(
            host_work_dir=work,                       # pin so we can interpolate paths
            network_mode="allowlist",
            network_allowlist=["api.anthropic.com"],  # the model endpoint
            backend="userns",
            timeout_s=1200,
        ) as sbx:
            # 1. Stage inputs into the work dir.
            await sbx.put_bytes(f"{work}/data.csv", _CSV)
            await sbx.put_bytes(f"{work}/driver.py", _driver(work).encode())

            # 2. Run the agent's driver. Its tool loop writes Python, executes
            #    it in the sandbox, and iterates until it writes answer.json.
            proc = await sbx.run(
                f"python {work}/driver.py",
                env={"ANTHROPIC_API_KEY": env_key},   # see the secret note below
                stdout=True, stderr=True, timeout_s=600,
            )
            out, err = await proc.communicate_text()
            assert proc.returncode == 0, f"agent failed rc={proc.returncode}\n{err}"

            # 3. Collect the result.
            return await sbx.get_bytes(f"{work}/answer.json")
```

`put_bytes` / `get_bytes` move data across the boundary without shelling out; pinning `host_work_dir` lets you interpolate a known path into both the driver script and the prompt. See [Filesystem](./filesystem).
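
To make the path interpolation concrete, here is a minimal stand-in for a driver builder. `build_driver` is a hypothetical name, and its body is not the driver used in the example above: a real driver would run the agent's tool loop, whereas this one only counts the staged rows so that the input and output paths are exercised.

```python
import textwrap


def build_driver(work: str) -> str:
    """Return driver source with the pinned work dir baked into its paths."""
    data_path = f"{work}/data.csv"
    answer_path = f"{work}/answer.json"
    return textwrap.dedent(f"""\
        import csv
        import json

        # Stand-in for the agent's tool loop: count the staged rows.
        with open({data_path!r}, newline="") as f:
            rows = list(csv.DictReader(f))

        # Write the result where the caller will collect it.
        with open({answer_path!r}, "w") as f:
            json.dump({{"rows": len(rows)}}, f)
        """)
```

Because the work dir is pinned, the same string appears in the generated script, in the `put_bytes` / `get_bytes` calls, and in any prompt that tells the agent where its files live.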

## A failing command is signal, not an error

The single most important distinction for an agent loop: a non-zero exit of the sandboxed code is not an exception. When the agent writes code that raises, fails an assertion, or exits non-zero, `run()` still returns normally and you branch on `proc.returncode`. That's the feedback the agent learns from.

```python
proc = await sbx.run("python attempt.py", stdout=True, stderr=True)
out, err = await proc.communicate_text()
if proc.returncode != 0:
    feedback = err          # hand this back to the agent and let it retry
```

`communicate()` / `wait()` raise `SandboxExecutionError` only when the process never reached a real exit, e.g. a server-side spawn failure or a stream that died mid-flight. That means the sandbox itself misbehaved, which is a different category from "the agent's code failed its assertions" and usually warrants aborting the loop rather than retrying. (The one-shot `run_code()` and `exec(check=True)` helpers raise `SandboxCommandError` on a non-zero exit instead; use plain `run()` when you want the exit code as data.) See [Running commands](./running-commands#errors-vs-non-zero-exits).
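
Put together, the two categories suggest a loop like the sketch below: retry on a non-zero exit, and let anything raised by the sandbox itself propagate. The names here (`Attempt`, `iterate`, and the `execute` / `revise` callables) are hypothetical; with a real session, `execute` would wrap `sbx.run(...)` plus `communicate_text()`, and `revise` would call the model.

```python
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Attempt:
    returncode: int
    stderr: str


async def iterate(
    code: str,
    execute: Callable[[str], Awaitable[Attempt]],
    revise: Callable[[str, str], Awaitable[str]],
    max_turns: int = 3,
) -> Attempt:
    """Run code, feeding stderr back for revision until it exits 0 or turns run out.

    Exceptions raised by `execute` are deliberately not caught: they mean the
    sandbox misbehaved, which should abort the loop rather than trigger a retry.
    """
    attempt = await execute(code)
    turns = 1
    while attempt.returncode != 0 and turns < max_turns:
        code = await revise(code, attempt.stderr)  # non-zero exit is feedback
        attempt = await execute(code)
        turns += 1
    return attempt
```

The caller still inspects the final `returncode`, since the loop can end with the turn budget spent and the code still failing.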

## Match the isolation to the trust level

Pick the isolation by what the agent actually runs, not by the fact that it's an agent. For most agent workloads the process-level default is the right choice, and you only escalate when the inputs are genuinely hostile:

- **Trusted control flow** — the prompts, tools, and any generated code are authored or vetted by you and not exposed to end users or external input. The default backend (bubblewrap / userns) is the intended choice here: fast, cheap, and sufficient. Run on-device while developing; move to a remote session for production to get the pod boundary, observability, and an independent lifecycle.
- **Untrusted or multi-tenant** — the agent runs code derived from end-user input, or several tenants share the system. Use a remote session with `network_mode="blocked"`, stage inputs via `put_bytes`, and harden the pod with gVisor.

See [Security model](./security-model) for the full decision.
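
The two bullets above reduce to a small decision function. This is only a sketch of the guidance on this page, with hypothetical names (`Posture`, `choose_posture`); the Security model page remains the reference for the full decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posture:
    remote: bool                 # remote session vs. on-device
    force_blocked_network: bool  # always run with network_mode="blocked"
    gvisor: bool                 # harden the pod with gVisor


def choose_posture(untrusted_input: bool, multi_tenant: bool,
                   production: bool) -> Posture:
    """Map the trust level of what the agent runs to an isolation posture."""
    if untrusted_input or multi_tenant:
        # Hostile or shared inputs: remote pod, no network, gVisor.
        return Posture(remote=True, force_blocked_network=True, gvisor=True)
    # Trusted control flow: process-level default; remote only for production.
    return Posture(remote=production, force_blocked_network=False, gvisor=False)
```

Note that the inputs describe the code being run, not the fact that an agent is involved, which is the point of the section.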

## Secrets forwarded into the sandbox

Agents that call a model need a credential, and the natural move is to forward it with `run(env=...)`. One thing to keep in mind: the sandboxed process can read its own environment, and the network allow-list — a proxy that only cooperating clients honour — won't stop code that opens a raw socket from sending the key out. So a forwarded key should be treated as readable by the sandboxed code:

- Use a **scoped, short-lived, low-limit** credential.
- Forward it explicitly per run (`run(env={...})`) rather than relying on env inheritance. A remote sandbox-server does not inherit the caller pod's full env anyway.

To keep the credential out of the sandbox entirely, call the model from the parent task and pass only prompts and results across the boundary with `put_bytes` / `get_bytes` — then the agent's code never holds the key. This is the right pattern when the generated code is untrusted; for trusted agents, forwarding a scoped key is a reasonable, deliberate trade-off.
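
When you do forward a key, building the `env` mapping from an explicit list of names keeps the rest of the caller's environment out of the sandbox. The helper below is a hypothetical sketch (`explicit_env` is not part of `union.sandbox`); its result is what you would pass as `run(env=...)`.

```python
import os
from typing import Iterable, Mapping, Optional


def explicit_env(names: Iterable[str],
                 source: Optional[Mapping[str, str]] = None) -> dict:
    """Copy only the named variables from `source` (default: os.environ)."""
    source = os.environ if source is None else source
    names = list(names)
    missing = [name for name in names if name not in source]
    if missing:
        # Fail loudly rather than silently running without the credential.
        raise KeyError(f"missing environment variables: {missing}")
    return {name: source[name] for name in names}
```

For example, `explicit_env(["ANTHROPIC_API_KEY"])` yields a one-entry mapping, so nothing else the parent task holds is exposed to the sandboxed code.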

## Related

- [Running commands](./running-commands). The error model, `exec()` / `run_code()`, and the shared-venv install model the loop relies on.
- [Networking](./networking). Per-call network flips and what the allow-list does and does not protect against.
- [Filesystem](./filesystem). `put_bytes` / `get_bytes` for staging inputs and collecting results.
- [Security model](./security-model). Choosing a posture by trust level, and when gVisor is worth it.

