The helios CLI is the primary interface for running tasks.

Commands

helios (run)

Run a single task.
helios [OPTIONS] TASK_PATH
Arguments:
| Argument | Description |
| --- | --- |
| TASK_PATH | Path to the task directory |
Options:
| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --watch | | false | Start web viewer at localhost:8080 |
| --port | -p | 8080 | Port to run the web viewer on (when using --watch) |
| --model | -m | Gemini | Model identifier |
| --interactive | -i | false | Enable pause/resume with ‘p’ key |
| --n-attempts | -k | 1 | Number of attempts (for pass@k evaluation) |
| --output | -o | output | Output directory |
| --provider | | docker | Environment: docker or daytona |
Examples:
# Basic run
helios tasks/create-hello-file

# With web viewer
helios tasks/explore-desktop --watch

# With web viewer on custom port
helios tasks/explore-desktop --watch --port 3000

# With specific model
helios tasks/my-task -m claude-sonnet-4-20250514

# Interactive mode
helios tasks/my-task -i

# pass@k: run 3 times, pass if any attempt succeeds
helios tasks/my-task -k 3

# Custom output directory
helios tasks/my-task -o results/experiment-1/

# Using Daytona
helios tasks/my-task --provider daytona
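
The pass@k rule in the example above (a task passes if any of its k attempts succeeds) can be sketched as a small shell function. This is illustrative only: `pass_at_k` is a hypothetical helper and not part of helios, and it stops at the first success, whereas helios may run every attempt regardless.

```shell
# pass_at_k K CMD [ARGS...]: run CMD up to K times and succeed if any run succeeds.
# Hypothetical helper for illustration; helios implements -k itself.
pass_at_k() {
  _k="$1"
  shift
  _attempt=1
  while [ "$_attempt" -le "$_k" ]; do
    if "$@"; then
      return 0   # one passing attempt is enough
    fi
    _attempt=$((_attempt + 1))
  done
  return 1       # every attempt failed
}
```

For example, `pass_at_k 3 some-check` returns 0 as soon as `some-check` exits 0, and returns 1 only if all three runs fail.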

helios batch

Run multiple tasks in parallel.
helios batch [OPTIONS] DIRECTORY
Arguments:
| Argument | Description |
| --- | --- |
| DIRECTORY | Directory containing tasks |
Options:
| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --concurrent | -n | 2 | Number of concurrent tasks |
| --n-attempts | -k | 1 | Number of attempts per task (for pass@k) |
| --model | -m | Gemini | Model identifier |
| --output | -o | output | Output directory |
| --quiet | -q | false | Show only aggregate progress |
| --pattern | -p | **/task.toml | Glob pattern for finding tasks |
| --provider | | docker | Environment: docker or daytona |
Examples:
# Basic batch
helios batch tasks/ -n 4

# With model selection
helios batch tasks/ -n 4 -m claude-sonnet-4-20250514

# pass@k evaluation: run each task 3 times
helios batch tasks/ -n 4 -k 3

# Custom output
helios batch tasks/ -n 4 -o results/run-001/

# Quiet mode
helios batch tasks/ -n 4 --quiet

# Custom pattern
helios batch tasks/pdfbench/ -p "**/pdfbench_eyemed*/task.toml"

# High concurrency with Daytona
helios batch tasks/ -n 20 --provider daytona

helios dev

Run in development mode with mock data.
helios dev
This starts the web viewer with mock traces for UI development without LLM costs.

Environment Variables

LLM Providers

# Google Gemini (default)
GEMINI_API_KEY=your-key
# or
GOOGLE_API_KEY=your-key

# Anthropic
ANTHROPIC_API_KEY=your-key

# OpenAI
OPENAI_API_KEY=your-key

# AWS Bedrock
AWS_ACCESS_KEY_ID=your-key
AWS_SECRET_ACCESS_KEY=your-secret
AWS_REGION=us-east-1
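
A pre-flight check can catch a missing key before a run starts. The sketch below assumes only the variable names listed above. `require_any_var` is a hypothetical helper, not something helios provides.

```shell
# require_any_var NAME...: succeed if at least one named variable is set and non-empty.
# Hypothetical helper for illustration; not part of helios.
require_any_var() {
  for _name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $_name (POSIX-compatible).
    eval "_value=\${$_name:-}"
    if [ -n "$_value" ]; then
      return 0
    fi
  done
  echo "missing credentials: set one of: $*" >&2
  return 1
}

# Example (commented out so the snippet runs without helios installed):
# require_any_var GEMINI_API_KEY GOOGLE_API_KEY && helios tasks/my-task
```

The same function works for the other providers, for instance `require_any_var ANTHROPIC_API_KEY` before selecting a Claude model.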

Cloud Providers

# Daytona
DAYTONA_API_KEY=your-key

Debugging

# Log level
CUA_LOG_LEVEL=DEBUG  # DEBUG, INFO, WARNING, ERROR

Model Identifiers

| Provider | Model ID |
| --- | --- |
| Gemini | gemini/gemini-2.5-computer-use-preview-10-2025 |
| Anthropic | claude-sonnet-4-20250514 |
| Anthropic | claude-opus-4-20250514 |
| Bedrock | bedrock/global.anthropic.claude-sonnet-4-20250514-v1:0 |
| Bedrock | bedrock/global.anthropic.claude-opus-4-5-20251101-v1:0 |
| OpenAI | openai/computer-use-preview |

Output Structure

Single Task

output/task-name_20250101_120000/
├── agent/
│   ├── trajectory.json      # Complete execution trace (ATIF)
│   └── screenshots/
├── verifier/
├── config.json
└── result.json

Batch

output/batch_20250101_120000/
├── batch_summary.json
├── 001_task-1/
│   ├── agent/
│   │   └── trajectory.json
│   ├── verifier/
│   ├── config.json
│   └── result.json
├── 002_task-2/
│   ├── agent/
│   │   └── trajectory.json
│   ├── verifier/
│   ├── config.json
│   └── result.json
└── ...
For -k runs, each task folder includes attempt_001/, attempt_002/, etc.
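
To collect results after a run, it is usually enough to search the output directory for the result.json files shown in the trees above. The sketch below relies only on that file name. `list_results` is a hypothetical helper, and it makes no assumption about what the JSON contains or where result.json sits inside attempt folders.

```shell
# list_results DIR: print every result.json under a helios output directory, sorted by path.
# Hypothetical helper for illustration; not part of helios.
list_results() {
  find "$1" -type f -name result.json | sort
}
```

For example, `list_results output/batch_20250101_120000` prints one path per task folder, in the numbered order of the folders.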

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Success (all tasks passed) |
| 1 | Failure (one or more tasks failed) |
| 2 | Error (execution error) |
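
In a CI script, these codes make it possible to tell a failed task apart from a broken run. The following is a minimal sketch using only the three codes listed above. `describe_exit` is a hypothetical helper, not part of helios.

```shell
# describe_exit CODE: translate a helios exit code into a short message.
# Hypothetical helper for illustration; the meanings come from the table above.
describe_exit() {
  case "$1" in
    0) echo "success: all tasks passed" ;;
    1) echo "failure: one or more tasks failed" ;;
    2) echo "error: execution error" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Example use (commented out so the snippet runs without helios installed):
# status=0
# helios batch tasks/ -n 4 || status=$?
# describe_exit "$status"
# exit "$status"
```

Capturing the status with `|| status=$?` keeps the script alive under `set -e`, so the message is printed before the script exits with the original code.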

Tips

Use --watch for GUI tasks: the web viewer shows execution in real time.
Start batch runs with -n 2 and raise concurrency as system resources allow.
Run with -i, then press ‘p’ to pause and inspect agent state.
Write each day's results to a dated output directory:
helios batch tasks/ -n 4 -o results/$(date +%Y%m%d)/