What is Helios?

Helios is a Computer Use Agent Framework that executes AI agents in isolated Docker containers with real-time viewing capabilities. Agents run tasks defined as simple directories containing instructions and configuration, with results verified programmatically through test scripts.

Why Helios?

Every task runs in its own Docker container, providing complete isolation from your host system. Agents can execute arbitrary commands without risk to your environment.
Tasks are verified by test scripts that check actual outcomes rather than relying on the model's self-assessment, so you know whether an agent really completed its task.
Watch agents work in real time through the web viewer. See every tool call, screenshot, and decision as it happens.
Use any major LLM provider: Anthropic (Claude), OpenAI, Google Gemini, or AWS Bedrock. Switch models with a single flag.
Run hundreds of tasks concurrently with controlled parallelism. Perfect for benchmarks, regression testing, and batch operations.
Deploy to Daytona cloud sandboxes for scalable execution without local Docker dependencies.
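
The controlled parallelism mentioned above can be illustrated with a generic sketch. This is not Helios's implementation or API: run_task and run_all are hypothetical names, and the short sleep stands in for provisioning a container and running an agent. It only shows one common way to cap how many jobs run at once, using asyncio.Semaphore from the Python standard library.

```python
import asyncio


async def run_task(name: str, limit: asyncio.Semaphore) -> str:
    """Hypothetical stand-in for running one task in its own container."""
    async with limit:  # at most `max_parallel` tasks hold the semaphore at once
        await asyncio.sleep(0.01)  # placeholder for the real agent run
        return f"{name}: done"


async def run_all(names: list[str], max_parallel: int) -> list[str]:
    """Run every task, but never more than `max_parallel` at the same time."""
    limit = asyncio.Semaphore(max_parallel)
    # gather returns results in the same order as the input names
    return await asyncio.gather(*(run_task(n, limit) for n in names))


if __name__ == "__main__":
    names = [f"task-{i}" for i in range(10)]
    print(asyncio.run(run_all(names, max_parallel=3)))
```

All ten tasks are scheduled immediately, but only three are inside the semaphore at any moment; the rest wait their turn.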
1. Define a Task

Create a directory with instruction.md (what to do) and task.toml (configuration).

2. Run the Agent

Execute with helios tasks/your-task. The framework provisions a container and starts the agent.

3. Agent Executes

The agent uses tools (bash, computer, editor) to complete the task in the isolated environment.

4. Verify Results

The test.sh script runs to verify the task was completed correctly.
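
The task layout and verification step described above can be sketched in Python. The file names instruction.md, task.toml, and test.sh come from the text; everything else is an assumption for illustration: that test.sh sits at the top of the task directory, the placeholder file contents, the convention that exit code 0 means success, and the helper names scaffold_task and verify. The real task.toml schema is not shown on this page, so the sketch writes only a comment into it. It also runs test.sh locally with sh, which stands in for whatever the framework does inside the container.

```python
import subprocess
import tempfile
from pathlib import Path


def scaffold_task(root: Path, name: str, instruction: str) -> Path:
    """Create a task directory containing the three files named in the text."""
    task_dir = root / name
    task_dir.mkdir(parents=True, exist_ok=True)
    (task_dir / "instruction.md").write_text(instruction + "\n")
    # Assumption: the real configuration keys are not documented here, so leave a stub.
    (task_dir / "task.toml").write_text("# task configuration goes here\n")
    # Assumption: the verifier signals success by exiting 0.
    # This placeholder passes only if output.txt exists.
    (task_dir / "test.sh").write_text("#!/bin/sh\ntest -f output.txt\n")
    return task_dir


def verify(task_dir: Path) -> bool:
    """Run test.sh inside the task directory; exit code 0 counts as a pass."""
    result = subprocess.run(["sh", "test.sh"], cwd=task_dir)
    return result.returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task = scaffold_task(Path(tmp), "your-task", "Create output.txt")
        print("before:", verify(task))  # output.txt does not exist yet
        (task / "output.txt").write_text("done\n")  # stand-in for the agent's work
        print("after:", verify(task))
```

The point of the design is that the pass/fail signal comes from the state the agent leaves behind, not from anything the agent reports about itself.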

Use Cases

Use Case | Description
Autonomous QA | Run regression tests with agents that interact with real applications
Data Extraction | Scrape websites, process files, and extract structured data
GUI Automation | Automate desktop applications through mouse and keyboard
OS Workflows | Execute system administration tasks and environment setup
Research & Benchmarking | Evaluate agent capabilities on standardized task suites

Featured: PDFBench

Helios includes PDFBench, a benchmark of 100 PDF form-filling tasks for evaluating computer-use agents. Each task requires the agent to fill out a PDF form based on conversational instructions.

Learn about PDFBench

Run the PDFBench benchmark suite on your models

Getting Started

Ready to run your first agent? Head to the Quickstart guide.