Autoresearch agent

Code available on GitHub.

This tutorial wraps an autonomous AI research loop in a single Flyte task. The task spins up a GPU container, installs the Claude Code CLI, clones a research repository, and points Claude Code at a program.md brief. The agent runs experiments to improve a model, writes results to disk, and the task then commits the changes and opens a pull request, with a progress plot rendered both in the PR and in the Flyte UI.

It’s an example of using Flyte as durable infrastructure for long-running, autonomous agent work:

A GPU TaskEnvironment with the API-key and GitHub secrets the agent needs.
report=True to stream a progress plot into the Flyte UI.
A reconnecting run.wait() loop in the driver so a dropped client connection doesn’t lose track of a multi-hour run.

This example drives a coding agent that executes arbitrary code and pushes commits to a GitHub repository. Run it against a repository you control, and review the constants described below before launching.

Define the container image

The image is kept in its own _image.py module so edits to the agent logic in run.py don’t invalidate the image cache. Node.js and the Claude Code CLI are installed at run time (see below) to keep the image small.

_image.py

import flyte

image = (
    flyte.Image.from_uv_script(__file__, name="autoresearch-agent", pre=True)
    .with_apt_packages("git")
)
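
Image.from_uv_script reads the inline script metadata (the PEP 723 `# /// script` header) of the file it is given, so _image.py is expected to begin with such a header. The sketch below shows what a header might look like and how the metadata can be extracted. The dependency list is an assumption for illustration (PyGithub is mentioned later in this tutorial; the rest are guesses), and read_script_metadata is a hypothetical helper, not part of Flyte.

```python
import re

# Hypothetical contents of _image.py; the dependency list is an assumption.
SCRIPT = '''\
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "flyte",
#     "PyGithub",
#     "matplotlib",
# ]
# ///
import flyte
'''

# Regular expression taken from the PEP 723 reference implementation.
BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def read_script_metadata(source: str) -> str:
    """Return the TOML text embedded in the '# /// script' block of source."""
    match = BLOCK.search(source)
    if match is None or match.group("type") != "script":
        raise ValueError("no '# /// script' block found")
    # Drop the leading "# " (or bare "#") from every metadata line.
    return "".join(
        line[2:] if line.startswith("# ") else line[1:]
        for line in match.group("content").splitlines(keepends=True)
    )


print(read_script_metadata(SCRIPT))
```

Keeping the dependencies in this header means a dependency change only touches _image.py, which fits the goal of not invalidating the image cache when run.py changes.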

Define the task environment

The task needs a GPU, a generous disk for the cloned repo and model weights, and two secrets: a GitHub token (to clone and push) and an Anthropic API key (for Claude Code).

run.py
                
autoresearch_env = flyte.TaskEnvironment(
    name="autoresearch-agent",
    resources=flyte.Resources(
        cpu=8,
        memory="32Gi",
        gpu="T4:1",
        disk="100Gi",
    ),
    secrets=[
        flyte.Secret(key="github_token", as_env_var="GITHUB_TOKEN"),
        flyte.Secret(key="internal-anthropic-api-key", as_env_var="ANTHROPIC_API_KEY"),
    ],
    image=autoresearch_image,  # the image defined in _image.py
)
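
Because each secret is exposed as an environment variable, the task can check for both before doing any expensive work. The following is a minimal sketch of such a check; require_env is a hypothetical helper, not part of Flyte or of the example's run.py.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, failing fast if any is unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing required secrets: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Inside the task this could be called as `require_env("GITHUB_TOKEN", "ANTHROPIC_API_KEY")`, so a misnamed secret surfaces as one clear error instead of a KeyError partway through the run.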

The agent targets a specific repository, identity, and branch via module-level constants. Update these to point at your own fork before running:

GITHUB_USERNAME = "<YOUR_GITHUB_USERNAME>"
GITHUB_EMAIL = "<YOUR_GITHUB_EMAIL>"
AUTORESEARCH_REPO_URL = "https://github.com/<YOUR_ORG>/<YOUR_REPO>.git"
AUTORESEARCH_REPO_FULL_NAME = "<YOUR_ORG>/<YOUR_REPO>"
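
The URL and the full name must describe the same repository, and it is easy to launch with a placeholder left in place. A small pre-flight check can guard against both. This is a sketch only; repo_full_name and unconfigured are hypothetical helpers that do not appear in the example's code.

```python
import re
from urllib.parse import urlparse

PLACEHOLDER = re.compile(r"<[A-Z_]+>")


def repo_full_name(repo_url: str) -> str:
    """Derive "owner/repo" from an HTTPS clone URL such as https://github.com/owner/repo.git."""
    path = urlparse(repo_url).path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    owner, _, repo = path.partition("/")
    if not owner or not repo or "/" in repo:
        raise ValueError(f"cannot derive owner/repo from {repo_url!r}")
    return f"{owner}/{repo}"


def unconfigured(constants: dict[str, str]) -> list[str]:
    """Return the names of constants that still contain a <PLACEHOLDER> token."""
    return [name for name, value in constants.items() if PLACEHOLDER.search(value)]
```

Deriving the full name from the URL, rather than maintaining both by hand, removes one way for the clone target and the pull request target to drift apart.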

Model the result

The task returns a typed result describing the pull request it created.

run.py
                
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoResearchResult:
    """Result of the autoresearch run."""

    pr_url: str
    pr_number: int
    branch_name: str
    files_changed: list[str]
    success: bool
    error_message: Optional[str] = None
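
To illustrate how the success and error_message fields might be used together, here is a sketch that builds a failure result from an exception. The dataclass is repeated so the block is self-contained. The failed_result helper and the convention of an empty pr_url with pr_number 0 on failure are assumptions, not something the example's code is known to do.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AutoResearchResult:
    """Result of the autoresearch run (copied from run.py)."""

    pr_url: str
    pr_number: int
    branch_name: str
    files_changed: list[str]
    success: bool
    error_message: Optional[str] = None


def failed_result(branch_name: str, error: Exception) -> AutoResearchResult:
    """Build a result for a run that ended before a pull request was opened."""
    return AutoResearchResult(
        pr_url="",          # assumed convention: no PR was created
        pr_number=0,        # assumed convention: no PR was created
        branch_name=branch_name,
        files_changed=[],
        success=False,
        error_message=f"{type(error).__name__}: {error}",
    )


result = failed_result("autoresearch/claude-run-0", RuntimeError("push rejected"))
print(asdict(result))
```

Returning a typed failure instead of raising lets the caller inspect the branch name and error message from the task's output.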

The autoresearch task

The task is a long, sequential procedure. It starts by installing Node.js and Claude Code at run time, cloning the repo, configuring git, creating a branch, and loading program.md as the prompt:

run.py

@autoresearch_env.task(report=True)
async def run_autoresearch() -> AutoResearchResult:
    """
    Run the autoresearch workflow end-to-end.

    Steps:
    - Clone the repository at AUTORESEARCH_REPO_URL
    - Configure git identity
    - Create a new branch
    - Run Claude Code CLI with program.md as the prompt
    - Commit results (CSV + train/ changes)
    - Push and open a PR against the autoresearch repo
    """
    github_token = os.environ["GITHUB_TOKEN"]
    anthropic_api_key = os.environ["ANTHROPIC_API_KEY"]

    # --- Install Node.js + Claude Code at runtime (keeps image small and submission fast) ---
    import tarfile
    import urllib.request as _urllib

    subprocess.run(["apt-get", "update", "-y"], check=False)
    subprocess.run(["apt-get", "install", "-y", "git"], check=False)

    node_url = "https://nodejs.org/dist/v20.19.0/node-v20.19.0-linux-x64.tar.gz"
    node_tar = Path("/tmp/node.tar.gz")
    print(f"Downloading Node.js from {node_url}...", flush=True)
    _urllib.urlretrieve(node_url, node_tar)
    size_mb = node_tar.stat().st_size / 1024 / 1024
    print(f"Downloaded {size_mb:.1f} MB to {node_tar}", flush=True)
    if size_mb < 1:
        raise RuntimeError(f"Node.js download appears empty/corrupt ({size_mb:.2f} MB) — network may be restricted")
    node_dir = Path("/tmp/node")
    node_dir.mkdir(exist_ok=True)
    print("Extracting Node.js...", flush=True)
    with tarfile.open(node_tar, "r:gz") as tar:
        # Strip the leading "node-v20.19.0-linux-x64/" component from every path
        # and skip the top-level directory entry itself.
        members = []
        for m in tar.getmembers():
            _, sep, rest = m.name.partition("/")
            if sep and rest:
                m.name = rest
                members.append(m)
        tar.extractall(str(node_dir), members=members)

    # Add node/npm to PATH for this process and all subprocesses
    node_bin = str(node_dir / "bin")
    os.environ["PATH"] = node_bin + ":" + os.environ.get("PATH", "")
    print(f"Node version: {subprocess.run(['node', '--version'], capture_output=True, text=True).stdout.strip()}", flush=True)

    npm_prefix = "/tmp/npm-global"
    Path(npm_prefix).mkdir(exist_ok=True)
    subprocess.run(["npm", "install", "-g", "--prefix", npm_prefix, "@anthropic-ai/claude-code"], check=True)
    os.environ["PATH"] = str(Path(npm_prefix) / "bin") + ":" + os.environ["PATH"]
    print("Node.js + Claude Code installed.", flush=True)

    # --- Clone repo ---
    work_dir = Path("/tmp/autoresearch_workspace")
    work_dir.mkdir(exist_ok=True, parents=True)
    repo_path = clone_repository(AUTORESEARCH_REPO_URL, work_dir, github_token)

    # --- Git identity ---
    subprocess.run(
        ["git", "config", "--global", "user.email", GITHUB_EMAIL], check=True
    )
    subprocess.run(
        ["git", "config", "--global", "user.name", GITHUB_USERNAME], check=True
    )

    # --- Create branch ---
    import time as _time
    branch_name = f"autoresearch/claude-run-{int(_time.time())}"
    try:
        subprocess.run(
            ["git", "checkout", "-b", branch_name],
            cwd=repo_path,
            check=True,
        )
    except subprocess.CalledProcessError:
        subprocess.run(
            ["git", "checkout", branch_name],
            cwd=repo_path,
            check=True,
        )

    # --- Read program.md to use as the Claude Code prompt ---
    program_md = repo_path / "program.md"
    if not program_md.exists():
        raise FileNotFoundError(
            f"program.md not found in {repo_path}. "
            "Make sure the autoresearch repo has a program.md at its root."
        )

    program_md_content = program_md.read_text()
    print(f"Loaded prompt from program.md ({len(program_md_content)} chars)")
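
The clone_repository helper called above is not shown in this excerpt. One plausible shape for it, assuming an HTTPS clone URL and token authentication embedded in that URL, is sketched below. The body, the authenticated_url helper, and the `x-access-token` user name are assumptions about how such a helper could work, not the example's actual implementation.

```python
import subprocess
from pathlib import Path
from urllib.parse import urlparse, urlunparse


def authenticated_url(repo_url: str, token: str) -> str:
    """Embed a token in an HTTPS clone URL (assumed auth scheme)."""
    parts = urlparse(repo_url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("expected an https:// clone URL")
    netloc = f"x-access-token:{token}@{parts.hostname}"
    return urlunparse(parts._replace(netloc=netloc))


def clone_repository(repo_url: str, work_dir: Path, token: str) -> Path:
    """Clone repo_url into work_dir (unless already present) and return the checkout path."""
    repo_name = Path(urlparse(repo_url).path).stem  # ".../repo.git" -> "repo"
    repo_path = work_dir / repo_name
    if not repo_path.exists():
        subprocess.run(
            ["git", "clone", authenticated_url(repo_url, token), str(repo_path)],
            check=True,
        )
    return repo_path
```

With this approach the token ends up in the remote URL stored in the checkout's .git/config, which is acceptable in a throwaway task container but worth keeping in mind if the workspace is ever persisted.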

From there the task:

Wraps the program.md brief with explicit logging and “write outputs to disk” instructions.
Disables the Claude Code sandbox (it conflicts with the Flyte pod’s container) and runs the CLI non-interactively, streaming its output to the Flyte logs in real time.
Collects the files the agent changed via git status, commits them, and force-pushes the branch.
Opens (or reuses) a pull request with PyGithub.
If the agent produced a results.tsv, renders a progress plot of validation bits-per-byte, attaches it to the PR, and streams it into the Flyte UI.
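
Two of these steps, streaming a subprocess's output as it arrives and turning git status output into a file list, can be sketched with the standard library alone. The code for these steps is not shown in this excerpt, so stream_command and changed_files are hypothetical helpers, and the demo runs a stand-in Python command rather than the Claude Code CLI.

```python
import subprocess
import sys
from typing import Optional


def stream_command(cmd: list[str], cwd: Optional[str] = None) -> tuple[int, list[str]]:
    """Run cmd, echoing each output line as it arrives; return (exit code, lines)."""
    lines: list[str] = []
    with subprocess.Popen(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
        bufsize=1,                 # line-buffered on our side of the pipe
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            print(line, end="", flush=True)  # shows up in the task logs immediately
            lines.append(line.rstrip("\n"))
    return proc.returncode, lines


def changed_files(porcelain: str) -> list[str]:
    """Parse `git status --porcelain` (v1) output into a list of paths."""
    files = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status characters and the separating space
        if " -> " in path:  # renames are reported as "old -> new"
            path = path.split(" -> ", 1)[1]
        files.append(path)
    return files


if __name__ == "__main__":
    code, out = stream_command([sys.executable, "-c", "print('step 1'); print('step 2')"])
    print(code, out)
```

The parser does not handle the quoted form git uses for paths with unusual characters. The porcelain output must also be passed in unstripped, since a leading space is part of the status field.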

run.py
                
if __name__ == "__main__":
    import time

    flyte.init_from_config()

    run = flyte.with_runcontext(mode="remote").run(run_autoresearch)

    print(f"AutoResearch run started: {run.url}")
    print("Waiting for completion...")

    while True:
        try:
            run.wait()
            break
        except Exception as e:
            print(f"Connection dropped ({e}), reconnecting in 30s...")
            time.sleep(30)

    print(f"Done! See run at: {run.url}")

The entry point submits the task in remote mode and reconnects automatically if the client connection drops during the long run.
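
The loop in the entry point retries forever on any exception. If run.wait() can also raise for reasons other than a dropped connection (an assumption; its exact behavior is not covered here), the driver would never exit. A bounded variant is sketched below; wait_with_reconnect is a hypothetical helper, and the limits are arbitrary defaults.

```python
import time
from typing import Callable


def wait_with_reconnect(
    wait: Callable[[], None],
    *,
    max_failures: int = 20,
    delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call wait() until it returns, retrying on exceptions.

    Returns the number of failed attempts before success and re-raises the
    last exception once max_failures attempts have failed.
    """
    failures = 0
    while True:
        try:
            wait()
            return failures
        except Exception as exc:
            failures += 1
            if failures >= max_failures:
                raise
            print(f"Connection dropped ({exc}), reconnecting in {delay_s:.0f}s...")
            sleep(delay_s)
```

In the entry point this would replace the while loop with `wait_with_reconnect(run.wait)`. The injectable sleep argument exists only so the retry logic can be exercised without real delays.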

Run the agent

Create secrets

Get an Anthropic API key from the Anthropic console and a GitHub personal access token with permission to push and open PRs on the target repository.

        
flyte create secret github_token <YOUR_GITHUB_TOKEN>
flyte create secret internal-anthropic-api-key <YOUR_ANTHROPIC_API_KEY>

See Secrets for scoping and file-based secrets.

Prepare the research repository

The target repository must contain a program.md at its root describing the research task for the agent. Point AUTORESEARCH_REPO_URL / AUTORESEARCH_REPO_FULL_NAME (and the git identity constants) at a repo you control.

Run remotely

From the example directory:

        
cd v2/tutorials/autoresearch
python run.py

This task runs remotely (it needs a GPU and network access). Follow the printed run URL to watch the agent’s logs stream in, and open the run’s report panel to see the progress plot once results are available. When the task finishes, the returned AutoResearchResult contains the pull request URL.