What is Helios?
Helios is a Computer Use Agent Framework that executes AI agents in isolated Docker containers with real-time viewing capabilities. Agents run tasks defined as simple directories containing instructions and configuration, with results verified programmatically through test scripts.
Quickstart
Get up and running with Helios in under 5 minutes
Core Concepts
Understand the architecture and key components
Create Tasks
Build your own agent tasks from scratch
API Reference
Use Helios programmatically in your Python code
Why Helios?
Safe Isolation
Every task runs in its own Docker container, providing complete isolation from your host system. Agents can execute arbitrary commands without risk to your environment.
Deterministic Verification
Tasks are verified using test scripts that check actual outcomes, not model self-assessment. Know with certainty whether an agent completed its task.
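As a rough illustration of outcome-based verification, the sketch below runs a shell test script and treats its exit code as the verdict. The `run_verifier` helper, the `report.txt` check, and the "exit code 0 means pass" convention are assumptions made for this example, not Helios's documented behavior.

```python
import subprocess
import tempfile
from pathlib import Path


def run_verifier(script: Path, workdir: Path, timeout_s: float = 60.0) -> bool:
    """Run a test script in workdir and report pass/fail from its exit code.

    Assumption: exit code 0 means the task succeeded. Helios may use a
    different convention; this helper is hypothetical.
    """
    try:
        result = subprocess.run(
            ["sh", str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # A verifier that hangs is treated as a failure.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        # Hypothetical check: the task asked the agent to write "done" to report.txt.
        script = work / "test.sh"
        script.write_text('#!/bin/sh\ngrep -q "done" report.txt\n')

        print(run_verifier(script, work))  # False: report.txt does not exist yet
        (work / "report.txt").write_text("done\n")
        print(run_verifier(script, work))  # True: the expected outcome is present
```

The point of the design is that the verdict comes from inspecting the environment after the run, so an agent that merely claims success still fails the check.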
Real-time Observability
Watch agents work in real time through the web viewer. See every tool call, screenshot, and decision as it happens.
Multi-Provider Support
Use any major LLM provider: Anthropic (Claude), OpenAI, Google Gemini, or AWS Bedrock. Switch models with a single flag.
Parallel Execution
Run hundreds of tasks concurrently with controlled parallelism. Perfect for benchmarks, regression testing, and batch operations.
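The idea of controlled parallelism can be sketched with a semaphore that caps how many tasks are in flight at once. This is a generic illustration, not Helios's scheduler: `run_task`, `run_all`, and `max_parallel` are hypothetical names, and the sleep stands in for a real agent run.

```python
import asyncio


async def run_task(name: str) -> str:
    """Stand-in for one agent run; a real run would start a container."""
    await asyncio.sleep(0.01)
    return f"{name}: done"


async def run_all(names: list[str], max_parallel: int) -> list[str]:
    """Run every task with at most max_parallel running at the same time."""
    gate = asyncio.Semaphore(max_parallel)

    async def guarded(name: str) -> str:
        # Each task waits here until one of the max_parallel slots is free.
        async with gate:
            return await run_task(name)

    # gather returns results in the same order as the input names.
    return await asyncio.gather(*(guarded(n) for n in names))


if __name__ == "__main__":
    tasks = [f"tasks/example-{i}" for i in range(10)]
    results = asyncio.run(run_all(tasks, max_parallel=3))
    print(len(results))  # 10
```

Capping concurrency this way keeps a large batch from exhausting local resources while still finishing much faster than a sequential run.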
Cloud-Native
Deploy to Daytona cloud sandboxes for scalable execution without local Docker dependencies.
1. Define a Task
Create a directory with `instruction.md` (what to do) and `task.toml` (configuration).
2. Run the Agent
Execute with `helios tasks/your-task`. The framework provisions a container and starts the agent.
3. Agent Executes
The agent uses tools (bash, computer, editor) to complete the task in the isolated environment.
4. Verify Results
The `test.sh` script runs to verify the task was completed correctly.
Use Cases
| Use Case | Description |
|---|---|
| Autonomous QA | Run regression tests with agents that interact with real applications |
| Data Extraction | Scrape websites, process files, and extract structured data |
| GUI Automation | Automate desktop applications through mouse and keyboard |
| OS Workflows | Execute system administration tasks and environment setup |
| Research & Benchmarking | Evaluate agent capabilities on standardized task suites |
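Whatever the use case, a task starts as a directory holding `instruction.md`, `task.toml`, and a `test.sh` verifier. The sketch below scaffolds such a directory. It is a minimal example under stated assumptions: the `scaffold_task` helper is hypothetical, `test.sh` is assumed to live at the task root, and `task.toml` is left as a placeholder because its schema is not described in this overview.

```python
import tempfile
from pathlib import Path


def scaffold_task(root: Path, name: str, instruction: str) -> Path:
    """Create a task directory with the three files named in this overview.

    Hypothetical helper: file locations and contents are placeholders.
    """
    task_dir = root / name
    task_dir.mkdir(parents=True, exist_ok=True)

    # What the agent should do, in plain language.
    (task_dir / "instruction.md").write_text(instruction.rstrip() + "\n")

    # Placeholder only: the real task.toml keys are not shown in this overview.
    (task_dir / "task.toml").write_text("# TODO: add task configuration\n")

    # Verifier stub that fails until a real check replaces it.
    test_sh = task_dir / "test.sh"
    test_sh.write_text("#!/bin/sh\n# TODO: check the actual outcome of the task\nexit 1\n")
    test_sh.chmod(0o755)
    return task_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task = scaffold_task(Path(tmp), "your-task", "Create report.txt containing the word done.")
        print(sorted(p.name for p in task.iterdir()))
```

Starting the verifier as a failing stub means a freshly scaffolded task cannot pass by accident before a real check is written.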
Featured: PDFBench
Helios includes PDFBench, a benchmark of 100 PDF form-filling tasks for evaluating computer-use agents. Each task requires the agent to fill out a PDF form based on conversational instructions.
Learn about PDFBench
Run the PDFBench benchmark suite on your models