Running Tasks

This guide covers all the ways to run tasks with Helios, from single-task execution to interactive debugging.

Basic Execution

Run a task by specifying its directory path:

helios tasks/create-hello-file

The output shows agent progress and the final result:

Task: create-hello-file
Model: gemini/gemini-2.5-computer-use-preview-10-2025
Provider: docker

[Agent] Analyzing task...
[Tool] bash: echo "Hello World" > /home/hello.txt
[Agent] Task completed.
[Verify] Running test.sh...
[Result] PASS (reward: 1.0)

Duration: 12.3s
Output: output/create-hello-file/

With Web Viewer

Add --watch to start the real-time web viewer:

helios tasks/explore-desktop --watch

Then navigate to http://localhost:8080 in your browser.

The web viewer is especially useful for GUI tasks, where you can watch the agent interact with the desktop in real time via VNC.

Selecting Models

Specify a different model with -m or --model:

# Use Claude Sonnet
helios tasks/my-task -m claude-sonnet-4-20250514

# Use Claude Opus via Bedrock
helios tasks/my-task -m bedrock/global.anthropic.claude-opus-4-5-20251101-v1:0

# Use OpenAI
helios tasks/my-task -m openai/computer-use-preview

# Use Gemini (default)
helios tasks/my-task -m gemini/gemini-2.5-computer-use-preview-10-2025

See Model Configuration for all supported models and required API keys.
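To compare models on the same task, the -m and -o options can be combined in a loop. The following is a minimal sketch, not part of Helios: `model_slug`, `run_models`, and the `HELIOS_BIN` variable are hypothetical helpers introduced here, and the `results/<slug>/` layout is only one possible convention.

```shell
# model_slug MODEL
# Turns a model identifier into a directory-safe name by replacing "/" and ":".
model_slug() {
  printf '%s\n' "$1" | tr '/:' '__'
}

# run_models TASK_PATH MODEL...
# Runs the same task once per model, each into its own output directory.
# HELIOS_BIN is a hypothetical override (not a Helios setting) so the loop
# can be dry-run with HELIOS_BIN=echo.
run_models() (
  task_path=$1
  shift
  for model in "$@"; do
    "${HELIOS_BIN:-helios}" "$task_path" -m "$model" -o "results/$(model_slug "$model")/"
  done
)
```

For example, `run_models tasks/my-task claude-sonnet-4-20250514 openai/computer-use-preview` would invoke helios twice. Setting `HELIOS_BIN=echo` first prints the commands instead of running them.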

Interactive Mode

Enable interactive mode to pause and resume execution:

helios tasks/my-task -i

Key	Action
`p`	Pause/Resume execution

This is useful for:

Debugging agent behavior
Inspecting intermediate state
Understanding agent decision-making

Output Directory

Specify where to save execution outputs:

helios tasks/my-task -o results/experiment-1/

The output directory (`output` by default) contains one run folder per execution, for example:

output/<task_name>_<timestamp>/
├── agent/
│   ├── trajectory.json      # Complete execution trace (ATIF)
│   └── screenshots/
├── verifier/
├── config.json
└── result.json

See the full schema in the Trajectory Format doc.
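Because every execution gets its own `<task_name>_<timestamp>` folder, scripts often need to locate the most recent run. A minimal sketch follows, assuming the timestamp sorts lexicographically (as the `20250101_120000` example in this guide does). `latest_run_dir` is a hypothetical helper, not a Helios command.

```shell
# latest_run_dir OUTPUT_DIR TASK_NAME
# Prints the newest run folder for a task, assuming folder names follow
# <task_name>_<timestamp> and that timestamps sort lexicographically.
# Prints nothing and returns 1 if no run folder exists.
latest_run_dir() (
  output_dir=$1
  task_name=$2
  latest=""
  for dir in "$output_dir"/"$task_name"_*/; do
    [ -d "$dir" ] || continue   # skip the unexpanded pattern when nothing matches
    latest=${dir%/}             # matches expand in sorted order, so the last one wins
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
)
```

A call such as `latest_run_dir output create-hello-file` would then print the path of the newest matching folder, which can be passed to other tools to inspect `result.json` or `agent/trajectory.json`.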

Environment Providers

Choose between local Docker and cloud execution:

# Local Docker (default)
helios tasks/my-task --provider docker

# Daytona cloud sandboxes
helios tasks/my-task --provider daytona

See Daytona Configuration to set up Daytona for cloud execution.

CLI Reference

Run Command

helios [OPTIONS] TASK_PATH

Option	Short	Description
`--watch`		Start web viewer with live updates
`--model`	`-m`	Model identifier
`--interactive`	`-i`	Enable pause/resume with the `p` key
`--n-attempts`	`-k`	Number of attempts for pass@k evaluation
`--output`	`-o`	Output directory (default: `output`)
`--provider`		Environment provider: `docker` or `daytona`

Examples

Basic

# Simple execution
helios tasks/create-hello-file

# With viewer
helios tasks/create-hello-file --watch

With Model

# Claude Sonnet
helios tasks/my-task -m claude-sonnet-4-20250514

# With viewer
helios tasks/my-task -m claude-sonnet-4-20250514 --watch

pass@k

# Run 3 times, pass if any attempt succeeds
helios tasks/my-task -k 3

# Run 5 times with specific model
helios tasks/my-task -k 5 -m claude-sonnet-4-20250514

Advanced

# Full options
helios tasks/complex-task \
  -m bedrock/global.anthropic.claude-opus-4-5-20251101-v1:0 \
  -o results/run-001/ \
  --watch \
  --provider docker
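The "pass if any attempt succeeds" rule from the pass@k example can be stated precisely as a small function. This sketch only illustrates the rule on a list of outcome strings. `pass_at_k` is a hypothetical helper, and how Helios records per-attempt outcomes is not covered in this guide.

```shell
# pass_at_k OUTCOME...
# Returns 0 if at least one attempt outcome is PASS, and 1 otherwise
# (including when no outcomes are given).
pass_at_k() {
  for outcome in "$@"; do
    [ "$outcome" = "PASS" ] && return 0
  done
  return 1
}
```

With three attempts, `pass_at_k FAIL FAIL PASS` succeeds, while `pass_at_k FAIL FAIL FAIL` does not.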

Understanding Output

Console Output

Task: create-hello-file
Model: claude-sonnet-4-20250514
Provider: docker

[Agent] I'll create the file with the required content.
[Tool] bash: echo "Hello World" > /home/hello.txt
[Result] Command executed successfully
[Agent] The file has been created. Let me verify.
[Tool] bash: cat /home/hello.txt
[Result] Hello World
[Agent] Task completed successfully.

[Verify] Running test.sh...
[Verify] PASS: File exists with correct content

═══════════════════════════════════════════════════
  Result: PASS
  Reward: 1.0
  Duration: 8.4s
═══════════════════════════════════════════════════

Output Files

output/create-hello-file_20250101_120000/
├── agent/
│   ├── trajectory.json      # Complete execution trace (ATIF)
│   └── screenshots/
│       ├── step_001.png
│       ├── step_002.png
│       └── ...
├── verifier/
├── config.json
└── result.json

Troubleshooting

Task not found

Ensure the task directory contains both instruction.md and task.toml.

ls tasks/my-task/
# Should show: instruction.md task.toml
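The same check can be scripted so that it reports exactly which file is missing. This is a minimal sketch based only on the two required files named above. `check_task_dir` is a hypothetical helper, not a Helios command.

```shell
# check_task_dir TASK_DIR
# Returns 0 when TASK_DIR contains both instruction.md and task.toml.
# Otherwise lists each missing file on stderr and returns 1.
check_task_dir() (
  task_dir=$1
  rc=0
  for required in instruction.md task.toml; do
    if [ ! -f "$task_dir/$required" ]; then
      echo "missing: $task_dir/$required" >&2
      rc=1
    fi
  done
  return "$rc"
)
```

Running `check_task_dir tasks/my-task` before invoking helios gives a quick pre-flight check, for instance in a CI script.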

Docker image not found

Build required images first:

# For GUI tasks
docker build -t cua-desktop -f docker/Dockerfile.desktop .

# For PDFBench tasks
docker build -t pdfbench-base -f docker/Dockerfile.pdfbench .

API key errors

Ensure your API keys are set:

# Print the key for the provider you are using (empty output means it is not set)
echo $GEMINI_API_KEY
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
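Echoing a key prints the secret to the terminal and its scrollback. A variation that only reports whether each variable is exported is sketched below, assuming `printenv` is available (it is on most Linux and macOS systems). `report_api_keys` is a hypothetical helper name.

```shell
# report_api_keys
# Prints "set" or "unset" for each key variable named in this guide,
# without echoing the secret values themselves.
report_api_keys() (
  for name in GEMINI_API_KEY ANTHROPIC_API_KEY OPENAI_API_KEY; do
    # printenv only sees exported variables, which are the ones a child
    # process such as helios would inherit.
    if [ -n "$(printenv "$name")" ]; then
      echo "$name: set"
    else
      echo "$name: unset"
    fi
  done
)
```

A variable that is assigned but not exported shows up as unset here, which is often the actual cause of an API key error.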

Port 8080 in use

The web viewer uses port 8080. If it’s in use:

# Find what's using the port
lsof -i :8080

# Kill the process or use a different task

Agent timeout

Increase the timeout in task.toml:

[agent]
timeout_sec = 300.0  # 5 minutes

Next Steps

Batch Execution

Web Viewer
