Introduction

What is Helios?

Helios is a computer-use agent framework that runs AI agents in isolated Docker containers and lets you watch them work in real time. Each task is a simple directory containing instructions and configuration, and results are verified programmatically by test scripts.

Quickstart

Get up and running with Helios in under 5 minutes

Core Concepts

Understand the architecture and key components

Create Tasks

Build your own agent tasks from scratch

API Reference

Use Helios programmatically in your Python code

Why Helios?

Safe Isolation

Every task runs in its own Docker container, isolated from your host system. Agents can execute arbitrary commands inside the container without touching your host environment.

Deterministic Verification

Tasks are verified using test scripts that check actual outcomes, not model self-assessment. Know with certainty whether an agent completed its task.

Real-time Observability

Watch agents work in real time through the web viewer. See every tool call, screenshot, and decision as it happens.

Multi-Provider Support

Use any major LLM provider: Anthropic (Claude), OpenAI, Google Gemini, or AWS Bedrock. Switch models with a single flag.

Parallel Execution

Run hundreds of tasks concurrently with controlled parallelism. Perfect for benchmarks, regression testing, and batch operations.
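The idea of controlled parallelism can be sketched in a few lines of Python. This is a minimal illustration, not the Helios scheduler: `run_task` is a hypothetical stand-in for launching one agent run, and `run_all`, `max_concurrent`, and `runner` are names introduced here only for the example. A semaphore caps how many tasks are in flight at once while the rest wait their turn.

```python
import asyncio


async def run_task(task_dir: str) -> bool:
    """Hypothetical placeholder for one agent run; not a Helios API."""
    await asyncio.sleep(0.01)  # simulate the time an agent run would take
    return True


async def run_all(task_dirs, max_concurrent=4, runner=run_task):
    """Run every task, never allowing more than max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(task_dir):
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return task_dir, await runner(task_dir)

    results = await asyncio.gather(*(guarded(d) for d in task_dirs))
    return dict(results)


if __name__ == "__main__":
    tasks = [f"tasks/task-{i}" for i in range(10)]
    outcomes = asyncio.run(run_all(tasks, max_concurrent=3))
    print(sum(outcomes.values()), "of", len(outcomes), "tasks passed")
```

Bounding concurrency this way keeps a large batch from exhausting host resources, which matters when each task owns a container.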

Cloud-Native

Deploy to Daytona cloud sandboxes for scalable execution without local Docker dependencies.

Define a Task

Create a directory with instruction.md (what to do) and task.toml (configuration)

Run the Agent

Execute with helios tasks/your-task; the framework provisions a container and starts the agent

Agent Executes

The agent uses tools (bash, computer, editor) to complete the task in the isolated environment

Verify Results

The test.sh script runs to verify the task was completed correctly
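The four steps above can be illustrated with a small, self-contained Python sketch. It rests on several assumptions that this page does not confirm: that test.sh sits at the top level of the task directory, that an exit code of 0 means the task passed, and that the script runs in the directory the agent worked in. The contents of task.toml are not described here, so the sketch writes only a placeholder comment. The helpers `create_task` and `verify` are hypothetical names, and the check runs on the host purely for illustration, whereas Helios runs tasks inside a container.

```python
import subprocess
import tempfile
from pathlib import Path


def create_task(root: Path, name: str, instruction: str, test_script: str) -> Path:
    """Lay out a task directory (file locations are assumptions)."""
    task_dir = (root / name).resolve()
    task_dir.mkdir(parents=True)
    (task_dir / "instruction.md").write_text(instruction)
    # Real task.toml keys are not covered on this page; placeholder only.
    (task_dir / "task.toml").write_text("# task configuration goes here\n")
    (task_dir / "test.sh").write_text(test_script)
    return task_dir


def verify(task_dir: Path, workdir: Path) -> bool:
    """Run test.sh in workdir; assume exit code 0 means success."""
    result = subprocess.run(
        ["sh", str(task_dir / "test.sh")],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        task = create_task(
            root,
            "hello-task",
            "Create a file named hello.txt containing the word hello.\n",
            "grep -q '^hello$' hello.txt\n",
        )
        work = root / "work"
        work.mkdir()
        print("before agent:", verify(task, work))
        # Simulate what a successful agent would have done.
        (work / "hello.txt").write_text("hello\n")
        print("after agent:", verify(task, work))
```

The point of the design is that the verdict comes from inspecting the resulting files, not from anything the agent reports about itself.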

Use Cases

Use Case	Description
Autonomous QA	Run regression tests with agents that interact with real applications
Data Extraction	Scrape websites, process files, and extract structured data
GUI Automation	Automate desktop applications through mouse and keyboard
OS Workflows	Execute system administration tasks and environment setup
Research & Benchmarking	Evaluate agent capabilities on standardized task suites

Featured: PDFBench

Helios includes PDFBench, a benchmark of 100 PDF form-filling tasks for evaluating computer-use agents. Each task requires the agent to fill out a PDF form based on conversational instructions.

Learn about PDFBench

Run the PDFBench benchmark suite on your models

Getting Started

Ready to run your first agent? Head to the Quickstart guide.
