This guide covers all the ways to run tasks with Helios, from single-task execution to interactive debugging.
Basic Execution
Run a task by specifying its directory path:
helios tasks/create-hello-file
The output shows agent progress and the final result:
Task: create-hello-file
Model: gemini/gemini-2.5-computer-use-preview-10-2025
Provider: docker
[Agent] Analyzing task...
[Tool] bash: echo "Hello World" > /home/hello.txt
[Agent] Task completed.
[Verify] Running test.sh...
[Result] PASS (reward: 1.0)
Duration: 12.3s
Output: output/create-hello-file/
With Web Viewer
Add --watch to open the real-time web viewer:
helios tasks/explore-desktop --watch
Then navigate to http://localhost:8080 in your browser.
The web viewer is especially useful for GUI tasks where you can watch the agent interact with the desktop in real-time via VNC.
Selecting Models
Specify a different model with -m or --model:
# Use Claude Sonnet
helios tasks/my-task -m claude-sonnet-4-20250514
# Use Claude Opus via Bedrock
helios tasks/my-task -m bedrock/global.anthropic.claude-opus-4-5-20251101-v1:0
# Use OpenAI
helios tasks/my-task -m openai/computer-use-preview
# Use Gemini (default)
helios tasks/my-task -m gemini/gemini-2.5-computer-use-preview-10-2025
See Model Configuration for all supported models and their required API keys.
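To compare several models on one task, a loop over model identifiers that uses only the documented -m and -o options is enough. The bash sketch below is illustrative: the models list, the safe_name and compare_models helpers, and the results/<model>/ layout are assumptions, not Helios features.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the same task once per model, each with its own
# output directory. Only the documented -m and -o options are passed to helios.
models=(
  claude-sonnet-4-20250514
  openai/computer-use-preview
  gemini/gemini-2.5-computer-use-preview-10-2025
)

# Model IDs can contain "/" and ":", which are awkward in directory names,
# so replace both with "_".
safe_name() {
  local name="${1//\//_}"
  printf '%s\n' "${name//:/_}"
}

# Usage: compare_models tasks/my-task
compare_models() {
  local task="$1" model
  for model in "${models[@]}"; do
    helios "$task" -m "$model" -o "results/$(safe_name "$model")/"
  done
}
```

Calling compare_models tasks/my-task should place each model's run folders under its own results/ subdirectory, assuming -o sets the parent directory for run folders as described under Output Directory.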
Interactive Mode
Enable interactive mode with -i or --interactive to pause and resume execution:
helios tasks/my-task --interactive
While the task runs, press p to pause or resume execution.
This is useful for:
Debugging agent behavior
Inspecting intermediate state
Understanding agent decision-making
Output Directory
Specify where to save execution outputs:
helios tasks/my-task -o results/experiment-1/
The output directory contains a run folder per execution, for example:
output/<task_name>_<timestamp>/
├── agent/
│ ├── trajectory.json # Complete execution trace (ATIF)
│ └── screenshots/
├── verifier/
├── config.json
└── result.json
See the full schema in the Trajectory Format doc.
Environment Providers
Choose between local Docker and cloud execution:
# Local Docker (default)
helios tasks/my-task --provider docker
# Daytona cloud sandboxes
helios tasks/my-task --provider daytona
See Daytona Configuration to set up Daytona for cloud execution.
CLI Reference
Run Command
helios [OPTIONS] TASK_PATH
Option          Short  Description
--watch                Start the web viewer with live updates
--model         -m     Model identifier
--interactive   -i     Enable pause/resume with the 'p' key
--n-attempts    -k     Number of attempts for pass@k evaluation
--output        -o     Output directory (default: output)
--provider             Environment provider: docker or daytona
Examples
Basic
# Simple execution
helios tasks/create-hello-file
# With viewer
helios tasks/create-hello-file --watch
With Model
# Claude Sonnet
helios tasks/my-task -m claude-sonnet-4-20250514
# Claude Sonnet with viewer
helios tasks/my-task -m claude-sonnet-4-20250514 --watch
pass@k
# Run 3 times, pass if any attempt succeeds
helios tasks/my-task -k 3
# Run 5 times with a specific model
helios tasks/my-task -k 5 -m claude-sonnet-4-20250514
Advanced
# Full options
helios tasks/complex-task \
-m bedrock/global.anthropic.claude-opus-4-5-20251101-v1:0 \
-o results/run-001/ \
--watch \
--provider docker
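With -k, a task passes if any of its attempts succeeds. The hypothetical bash helper below applies that rule to a list of per-attempt rewards. It assumes an attempt counts as a success when its reward is at least 1.0, the value shown next to PASS in the sample output; Helios's own aggregation may use a different criterion.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply the "pass if any attempt succeeds" rule.
# Assumption: an attempt succeeds when its reward is >= PASS_THRESHOLD.
PASS_THRESHOLD="${PASS_THRESHOLD:-1.0}"

pass_at_k() {
  local reward
  for reward in "$@"; do
    # bash has no floating-point math, so awk does the comparison;
    # awk exits 0 when the reward meets the threshold
    if awk -v r="$reward" -v t="$PASS_THRESHOLD" 'BEGIN { exit !(r + 0 >= t + 0) }'; then
      echo "PASS"
      return 0
    fi
  done
  echo "FAIL"
  return 1
}

# Example: rewards from three attempts (helios tasks/my-task -k 3)
if pass_at_k 0.0 1.0 0.0 >/dev/null; then
  echo "pass@3: at least one attempt succeeded"
fi
```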
Understanding Output
Console Output
Task: create-hello-file
Model: claude-sonnet-4-20250514
Provider: docker
[Agent] I'll create the file with the required content.
[Tool] bash: echo "Hello World" > /home/hello.txt
[Result] Command executed successfully
[Agent] The file has been created. Let me verify.
[Tool] bash: cat /home/hello.txt
[Result] Hello World
[Agent] Task completed successfully.
[Verify] Running test.sh...
[Verify] PASS: File exists with correct content
═══════════════════════════════════════════════════
Result: PASS
Reward: 1.0
Duration: 8.4s
═══════════════════════════════════════════════════
Output Files
output/create-hello-file_20250101_120000/
├── agent/
│ ├── trajectory.json # Complete execution trace (ATIF)
│ └── screenshots/
│ ├── step_001.png
│ ├── step_002.png
│ └── ...
├── verifier/
├── config.json
└── result.json
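To open the newest run for a task without typing its timestamp, a small bash function can pick the last matching folder. This is a sketch that assumes run folders follow the <task_name>_<YYYYMMDD>_<HHMMSS> pattern shown in the tree above; latest_run is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest run folder for a task, or return 1.
# Assumes folders are named <task_name>_<YYYYMMDD>_<HHMMSS>, so the sorted
# glob order is also chronological order.
latest_run() {
  local task="$1" out_dir="${2:-output}"
  local -a runs
  runs=( "${out_dir}/${task}"_????????_??????/ )
  # If nothing matches, bash keeps the literal pattern; -d rejects it
  [ -d "${runs[0]}" ] || return 1
  # The last sorted entry has the latest timestamp
  printf '%s\n' "${runs[${#runs[@]}-1]}"
}

# Example: show the result of the most recent create-hello-file run
if run="$(latest_run create-hello-file)"; then
  cat "${run}result.json"
fi
```

If you passed -o, give the same directory as the second argument, for example latest_run my-task results/experiment-1.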
Troubleshooting
Ensure the task directory contains both instruction.md and task.toml:
ls tasks/my-task/
# Should show: instruction.md task.toml
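The same check can be scripted before a batch of runs. A minimal bash sketch (check_task_dir is a hypothetical helper) that prints each missing file and returns non-zero if any is absent:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: report required task files that are missing.
check_task_dir() {
  local dir="${1%/}" file missing=0   # strip a trailing slash from the path
  for file in instruction.md task.toml; do
    if [ ! -f "${dir}/${file}" ]; then
      echo "missing: ${dir}/${file}"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check a task directory before running it
if check_task_dir tasks/my-task; then
  echo "tasks/my-task has instruction.md and task.toml"
fi
```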
Build the required Docker images first:
# For GUI tasks
docker build -t cua-desktop -f docker/Dockerfile.desktop .
# For PDFBench tasks
docker build -t pdfbench-base -f docker/Dockerfile.pdfbench .
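To find out which of these images are already built, note that docker image inspect exits with a non-zero status when an image is not present locally. The check_images helper below is a hypothetical wrapper around that command:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given Docker images exist locally.
# `docker image inspect` fails for a missing image (and when Docker itself
# is unavailable), so both cases are reported as "missing".
check_images() {
  local image
  for image in "$@"; do
    if docker image inspect "$image" >/dev/null 2>&1; then
      echo "${image}: present"
    else
      echo "${image}: missing"
    fi
  done
}

check_images cua-desktop pdfbench-base
```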
Ensure your API keys are set:
# Check which key you need
echo $GEMINI_API_KEY
echo $ANTHROPIC_API_KEY
echo $OPENAI_API_KEY
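The echo commands above print the key values themselves to your terminal. If you only need to know which variables are set, this hypothetical bash helper reports their status by name without revealing the secrets:

```bash
#!/usr/bin/env bash
# Hypothetical helper: show which provider API keys are set, without values.
report_keys() {
  local var
  for var in GEMINI_API_KEY ANTHROPIC_API_KEY OPENAI_API_KEY; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var
    if [ -n "${!var:-}" ]; then
      echo "${var}: set"
    else
      echo "${var}: not set"
    fi
  done
}

report_keys
```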
The web viewer uses port 8080. If that port is already in use:
# Find what's using the port
lsof -i :8080
# Kill the process or use a different task
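You can also check the port before launching with --watch. This sketch uses bash's built-in /dev/tcp redirection instead of lsof, so it only probes 127.0.0.1 and requires a bash build with network redirections enabled (the default in most distributions); check_port is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if nothing accepts TCP connections on
# 127.0.0.1:PORT, fail if something does.
check_port() {
  local port="${1:-8080}"   # 8080 is the web viewer's port
  # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection;
  # the subshell closes it again as soon as it exits.
  if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
    echo "Port ${port} is in use"
    return 1
  fi
  echo "Port ${port} is free"
}

check_port 8080 || echo "Run 'lsof -i :8080' to find the process holding the port"
```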
Increase the timeout in task.toml:
[agent]
timeout_sec = 300.0  # 5 minutes
Next Steps