This guide covers all the ways to run tasks with Helios, from single task execution to interactive debugging.

Basic Execution

Run a task by specifying its directory path:
helios tasks/create-hello-file
The output shows agent progress and the final result:
Task: create-hello-file
Model: gemini/gemini-2.5-computer-use-preview-10-2025
Provider: docker

[Agent] Analyzing task...
[Tool] bash: echo "Hello World" > /home/hello.txt
[Agent] Task completed.
[Verify] Running test.sh...
[Result] PASS (reward: 1.0)

Duration: 12.3s
Output: output/create-hello-file/

With Web Viewer

Add --watch to open the real-time web viewer:
helios tasks/explore-desktop --watch
Then navigate to http://localhost:8080 in your browser.
The web viewer is especially useful for GUI tasks, where you can watch the agent interact with the desktop in real time via VNC.

Selecting Models

Specify a different model with -m or --model:
# Use Claude Sonnet
helios tasks/my-task -m claude-sonnet-4-20250514

# Use Claude Opus via Bedrock
helios tasks/my-task -m bedrock/global.anthropic.claude-opus-4-5-20251101-v1:0

# Use OpenAI
helios tasks/my-task -m openai/computer-use-preview

# Use Gemini (default)
helios tasks/my-task -m gemini/gemini-2.5-computer-use-preview-10-2025
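To compare several models on the same task, the -m flag can be combined with -o so that each model writes to its own directory. The following is a minimal sketch, not part of Helios: the function name compare_models, the results/ layout, and the step that turns a model identifier into a directory name are all assumptions made for illustration.

```shell
# Hypothetical helper: run one task once per model, one output directory each.
# Only the helios command and its -m / -o flags come from this guide.
compare_models() {
  local task="$1"
  shift
  local model safe
  for model in "$@"; do
    # Model identifiers contain / and :, which are awkward in directory names.
    safe=$(printf '%s' "$model" | tr '/:' '__')
    helios "$task" -m "$model" -o "results/${safe}/"
  done
}
```

For example, compare_models tasks/my-task claude-sonnet-4-20250514 openai/computer-use-preview would start two runs in sequence.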

See Model Configuration for all supported models and required API keys.

Interactive Mode

Enable interactive mode to pause and resume execution:
helios tasks/my-task -i
Key   Action
p     Pause/Resume execution
This is useful for:
  • Debugging agent behavior
  • Inspecting intermediate state
  • Understanding agent decision-making

Output Directory

Specify where to save execution outputs:
helios tasks/my-task -o results/experiment-1/
The output directory contains a run folder per execution, for example:
output/<task_name>_<timestamp>/
├── agent/
│   ├── trajectory.json      # Complete execution trace (ATIF)
│   └── screenshots/
├── verifier/
├── config.json
└── result.json
See the full schema in the Trajectory Format doc.
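Because each execution gets its own run folder, scripts often need to locate the newest one. The sketch below is a hypothetical helper (latest_run is not a Helios command). It assumes the timestamp suffix sorts chronologically as plain text, which holds for a value such as 20250101_120000 but is not stated in this guide.

```shell
# Hypothetical helper: print the most recent run folder for a task.
# Assumes <timestamp> sorts chronologically when compared as text.
latest_run() {
  local out_dir="$1" task_name="$2"
  local dir latest=""
  for dir in "$out_dir"/"$task_name"_*/; do
    [ -d "$dir" ] || continue   # skip the literal pattern when nothing matches
    latest="${dir%/}"           # glob results are sorted, so the last one wins
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

Calling latest_run output create-hello-file prints the path of the newest matching folder, or returns a non-zero status if none exists.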

Environment Providers

Choose between local Docker or cloud execution:
# Local Docker (default)
helios tasks/my-task --provider docker

# Daytona cloud sandboxes
helios tasks/my-task --provider daytona

See Daytona Configuration to set up Daytona for cloud execution.

CLI Reference

Run Command

helios [OPTIONS] TASK_PATH
Option          Short   Description
--watch                 Start web viewer with live updates
--model         -m      Model identifier
--interactive   -i      Enable pause/resume with ‘p’ key
--n-attempts    -k      Number of attempts for pass@k evaluation
--output        -o      Output directory (default: output)
--provider              Environment provider: docker or daytona

Examples

# Simple execution
helios tasks/create-hello-file

# With viewer
helios tasks/create-hello-file --watch
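The -k option is the one flag from the options table without an example. A minimal sketch of a pass@k run follows. The wrapper run_attempts and its input check are illustrative assumptions; only the -k, -o, and --provider flags are taken from this guide.

```shell
# Hypothetical wrapper: run one task with several attempts for pass@k.
# Only the -k, -o, and --provider flags come from the options table.
run_attempts() {
  local task="$1" attempts="$2" out_dir="$3"
  # Reject anything that is not a positive whole number before starting a run.
  case "$attempts" in
    ''|*[!0-9]*|0)
      printf 'attempts must be a positive integer, got: %s\n' "$attempts" >&2
      return 2
      ;;
  esac
  helios "$task" -k "$attempts" -o "$out_dir" --provider docker
}
```

For example, run_attempts tasks/create-hello-file 5 results/pass-at-5/ is equivalent to typing the helios command with -k 5 directly.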

Understanding Output

Console Output

Task: create-hello-file
Model: claude-sonnet-4-20250514
Provider: docker

[Agent] I'll create the file with the required content.
[Tool] bash: echo "Hello World" > /home/hello.txt
[Result] Command executed successfully
[Agent] The file has been created. Let me verify.
[Tool] bash: cat /home/hello.txt
[Result] Hello World
[Agent] Task completed successfully.

[Verify] Running test.sh...
[Verify] PASS: File exists with correct content

═══════════════════════════════════════════════════
  Result: PASS
  Reward: 1.0
  Duration: 8.4s
═══════════════════════════════════════════════════

Output Files

output/create-hello-file_20250101_120000/
├── agent/
│   ├── trajectory.json      # Complete execution trace (ATIF)
│   └── screenshots/
│       ├── step_001.png
│       ├── step_002.png
│       └── ...
├── verifier/
├── config.json
└── result.json
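A quick way to sanity-check a finished run is to confirm that the files in the tree above are present. The helper below is a sketch under that assumption only: summarize_run is a hypothetical name, and it inspects file names without reading their contents, since the schemas are documented elsewhere.

```shell
# Hypothetical helper: report which expected files exist in a run folder
# and count the screenshots. File names follow the tree above.
summarize_run() {
  local run_dir="$1" f shots
  for f in agent/trajectory.json config.json result.json; do
    if [ -f "$run_dir/$f" ]; then
      printf '%-8s %s\n' found "$f"
    else
      printf '%-8s %s\n' missing "$f"
    fi
  done
  # Count step_*.png files; a missing screenshots folder counts as zero.
  shots=$(find "$run_dir/agent/screenshots" -name 'step_*.png' 2>/dev/null | wc -l | tr -d ' ')
  printf 'screenshots: %s\n' "$shots"
}
```

For example, summarize_run output/create-hello-file_20250101_120000 prints one line per expected file followed by the screenshot count.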

Troubleshooting

If Helios cannot find or load a task, ensure the task directory contains both instruction.md and task.toml:
ls tasks/my-task/
# Should show: instruction.md task.toml
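The same check can be scripted so that it fails loudly before a run starts. This is a minimal sketch; check_task_dir is a hypothetical helper, and it tests only for the two file names mentioned above.

```shell
# Hypothetical helper: verify a task directory has the two required files.
check_task_dir() {
  local task_dir="$1" f status=0
  for f in instruction.md task.toml; do
    if [ ! -f "$task_dir/$f" ]; then
      printf 'missing: %s\n' "$task_dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, check_task_dir tasks/my-task && helios tasks/my-task starts the run only when both files exist.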
If a required Docker image is missing, build it first:
# For GUI tasks
docker build -t cua-desktop -f docker/Dockerfile.desktop .

# For PDFBench tasks
docker build -t pdfbench-base -f docker/Dockerfile.pdfbench .
If a model request fails to authenticate, ensure the API key for your provider is set:
# Print each key to see which ones are set
echo $GEMINI_API_KEY
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
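Echoing a key prints the secret to the terminal and to shell history logs. As an alternative, the sketch below reports only whether each variable has a value. The function names are hypothetical; the three variable names are the ones listed above.

```shell
# Hypothetical helpers: report whether each key is set without printing it.
report_key() {
  # $1 is the variable name, $2 is its current value (possibly empty).
  if [ -n "$2" ]; then
    printf '%s: set\n' "$1"
  else
    printf '%s: not set\n' "$1"
  fi
}

check_api_keys() {
  report_key GEMINI_API_KEY "${GEMINI_API_KEY:-}"
  report_key ANTHROPIC_API_KEY "${ANTHROPIC_API_KEY:-}"
  report_key OPENAI_API_KEY "${OPENAI_API_KEY:-}"
}
```

Running check_api_keys prints three lines, one per variable, and never shows a key's value.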
The web viewer uses port 8080. If it’s in use:
# Find what's using the port
lsof -i :8080

# Kill the process or use a different task
If the agent runs out of time before finishing, increase the timeout in task.toml:
[agent]
timeout_sec = 300.0  # 5 minutes
