Agents interact with the environment using three core tools. These are provided to the LLM as callable functions and executed by the Helios framework.

Overview

| Tool | Availability | Description |
|------|--------------|-------------|
| bash | All modes | Execute shell commands |
| editor | All modes | View and edit files |
| computer | GUI mode only | Mouse and keyboard interaction |

bash

The bash tool executes shell commands inside the container. This is the primary tool for file operations, package management, and running scripts.

Capabilities

  • Execute any shell command
  • Create, read, update, delete files
  • Install packages
  • Run scripts and programs
  • Chain commands with &&, ||, ;

Examples

# Create a file
echo "Hello World" > /home/hello.txt

# Read a file
cat /home/hello.txt

# Create a directory
mkdir -p /home/project/src

# Copy files
cp /home/source.txt /home/dest.txt

# Move/rename files
mv /home/old.txt /home/new.txt

# Delete files
rm /home/unwanted.txt

Best Practices

Always use absolute paths (/home/file.txt) rather than relative paths.
Use flags like -y for apt-get to avoid interactive prompts.
Use || true or proper error handling for commands that might fail.
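The execution code behind the bash tool is not shown on this page. As a rough illustration of the practices above, the following sketch runs a command through /bin/sh with Python's subprocess module: stdin is closed so a command that unexpectedly prompts gets end-of-file instead of waiting, a timeout bounds runaway commands, and the exit status shows how &&, || and ; behave. The helper name run_bash and its return shape are hypothetical, not the Helios API.

```python
import subprocess
from typing import Tuple


def run_bash(command: str, timeout: float = 60.0) -> Tuple[int, str, str]:
    """Run one shell command non-interactively; return (exit_code, stdout, stderr).

    Hypothetical helper for illustration only, not the Helios API.
    """
    completed = subprocess.run(
        ["/bin/sh", "-c", command],  # the shell handles &&, ||, ; and redirects
        stdin=subprocess.DEVNULL,    # commands that read input get end-of-file
        capture_output=True,
        text=True,
        timeout=timeout,             # raises subprocess.TimeoutExpired if exceeded
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # && runs the second command only if the first one succeeds
    print(run_bash("mkdir -p /tmp/demo/src && echo created"))
    # ls exits with a non-zero status for a missing path...
    print(run_bash("ls /tmp/demo/missing")[0])
    # ...but appending || true makes the overall exit status 0
    print(run_bash("ls /tmp/demo/missing || true")[0])
```

Note that || true hides the failure entirely, so reserve it for commands whose failure genuinely does not matter.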

editor

The editor tool provides structured file editing capabilities. It’s more reliable than shell commands for complex file modifications.

Operations

| Operation | Description |
|-----------|-------------|
| view | View file contents (optionally with line range) |
| create | Create a new file with content |
| str_replace | Replace a string in a file |
| insert | Insert text at a specific line |

Examples

# View entire file
editor(command="view", path="/home/script.py")

# View specific line range
editor(command="view", path="/home/script.py", view_range=[1, 50])
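Only the view operation has an example here. To make the semantics of all four operations concrete, the sketch below models them as plain Python functions acting on local files. It is not the Helios implementation: the parameter names other than path and view_range (file_text, old_str, new_str, insert_line), the 1-based inclusive view_range, the exactly-one-match rule for str_replace, insert placing text after the given line, and create overwriting an existing file are all assumptions.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical local model of the editor operations. Parameter names other than
# `path` and `view_range`, and all edge-case rules below, are assumptions.


def view(path: str, view_range: Optional[List[int]] = None) -> str:
    """Return numbered file lines, optionally only lines start..end (1-based, inclusive)."""
    lines = Path(path).read_text().splitlines()
    start, end = view_range if view_range else (1, len(lines))
    start, end = max(start, 1), min(end, len(lines))
    return "\n".join(f"{n}\t{lines[n - 1]}" for n in range(start, end + 1))


def create(path: str, file_text: str) -> None:
    """Write a file, creating parent directories as needed (overwrites if present)."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(file_text)


def str_replace(path: str, old_str: str, new_str: str) -> None:
    """Replace old_str with new_str, requiring exactly one occurrence."""
    p = Path(path)
    text = p.read_text()
    count = text.count(old_str)
    if count != 1:
        raise ValueError(f"expected exactly one occurrence of old_str, found {count}")
    p.write_text(text.replace(old_str, new_str, 1))


def insert(path: str, insert_line: int, new_str: str) -> None:
    """Insert new_str after line insert_line; 0 means the top of the file."""
    p = Path(path)
    lines = p.read_text().splitlines(keepends=True)
    if not 0 <= insert_line <= len(lines):
        raise ValueError(f"insert_line must be between 0 and {len(lines)}")
    if lines and insert_line == len(lines) and not lines[-1].endswith("\n"):
        lines[-1] += "\n"  # keep the old last line separate from the new text
    lines.insert(insert_line, new_str if new_str.endswith("\n") else new_str + "\n")
    p.write_text("".join(lines))


if __name__ == "__main__":
    create("/tmp/editor_demo/script.py", "a = 1\nb = 2\n")
    str_replace("/tmp/editor_demo/script.py", "b = 2", "b = 3")
    insert("/tmp/editor_demo/script.py", 1, "c = 4")
    print(view("/tmp/editor_demo/script.py", view_range=[1, 2]))
```

Requiring a unique match in str_replace is one reason a structured editor can be safer than sed: an ambiguous replacement fails loudly instead of silently editing the wrong place.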

When to Use Editor vs Bash

| Scenario | Recommended Tool |
|----------|------------------|
| Create small file | bash (echo) |
| Create multi-line file | editor (create) |
| Simple append | bash (>>) |
| Replace specific text | editor (str_replace) |
| View file contents | editor (view) |
| Complex text manipulation | editor |

computer

The computer tool enables GUI interaction through mouse and keyboard actions. It is only available when gui = true is set in task.toml.

Actions

| Action | Parameters | Description |
|--------|------------|-------------|
| left_click | coordinate: [x, y] | Click at position |
| right_click | coordinate: [x, y] | Right-click at position |
| double_click | coordinate: [x, y] | Double-click at position |
| triple_click | coordinate: [x, y] | Triple-click (select line) |
| middle_click | coordinate: [x, y] | Middle-click at position |
| mouse_move | coordinate: [x, y] | Move mouse to position |
| left_click_drag | coordinate: [x, y], endCoordinate: [x, y] | Drag from start to end |
| type | text: string | Type text |
| key | key: string | Press key |
| scroll | coordinate: [x, y], direction: "up"/"down" | Scroll at position |
| screenshot | (none) | Capture screen |
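One way to catch malformed calls before they reach the environment is to check them against the table above. The sketch below does that in plain Python; the names ACTION_PARAMS and check_action are hypothetical, and treating every listed parameter as required, rejecting unlisted ones, and expecting non-negative integer coordinates are assumptions rather than documented Helios behavior.

```python
from typing import Any, Dict, List, Set

# Parameters per action, transcribed from the table above. Treating every listed
# parameter as required, and rejecting unlisted ones, is an assumption.
ACTION_PARAMS: Dict[str, Set[str]] = {
    "left_click": {"coordinate"},
    "right_click": {"coordinate"},
    "double_click": {"coordinate"},
    "triple_click": {"coordinate"},
    "middle_click": {"coordinate"},
    "mouse_move": {"coordinate"},
    "left_click_drag": {"coordinate", "endCoordinate"},
    "type": {"text"},
    "key": {"key"},
    "scroll": {"coordinate", "direction"},
    "screenshot": set(),
}


def check_action(call: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a computer() call; empty means it looks valid."""
    action = call.get("action")
    if action not in ACTION_PARAMS:
        return [f"unknown action: {action!r}"]
    expected = ACTION_PARAMS[action]
    given = set(call) - {"action"}
    problems = [f"missing parameter: {p}" for p in sorted(expected - given)]
    problems += [f"unexpected parameter: {p}" for p in sorted(given - expected)]
    for name in ("coordinate", "endCoordinate"):
        xy = call.get(name)
        if xy is not None and not (
            isinstance(xy, (list, tuple))
            and len(xy) == 2
            and all(isinstance(v, int) and v >= 0 for v in xy)
        ):
            problems.append(f"{name} must be [x, y] with non-negative integers")
    if action == "scroll" and "direction" in call and call["direction"] not in ("up", "down"):
        problems.append("direction must be 'up' or 'down'")
    return problems
```

For example, check_action({"action": "left_click_drag", "coordinate": [100, 100]}) reports the missing endCoordinate.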

Examples

# Click a button
computer(action="left_click", coordinate=[500, 300])

# Right-click for context menu
computer(action="right_click", coordinate=[500, 300])

# Double-click to open
computer(action="double_click", coordinate=[100, 200])

# Drag and drop
computer(
    action="left_click_drag",
    coordinate=[100, 100],
    endCoordinate=[300, 300]
)

# Scroll down
computer(action="scroll", coordinate=[500, 400], direction="down")

Key Names

Common key names for the key action:
| Category | Keys |
|----------|------|
| Modifiers | ctrl, alt, shift, super |
| Navigation | Up, Down, Left, Right, Home, End, Page_Up, Page_Down |
| Editing | Return, Tab, BackSpace, Delete, Escape |
| Function | F1 through F12 |
Combine modifiers with +: ctrl+c, ctrl+shift+s, alt+F4
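The + convention can be checked mechanically: everything before the last + must be one of the modifier names in the table, and the last part is the key itself. The sketch below assumes names are case-sensitive exactly as tabled; parse_key_combo and key_action are hypothetical helpers, and the key part itself is not validated.

```python
from typing import Dict, List, Tuple

# Modifier names from the table above (assumed case-sensitive).
MODIFIERS = {"ctrl", "alt", "shift", "super"}


def parse_key_combo(combo: str) -> Tuple[List[str], str]:
    """Split 'ctrl+shift+s' into (['ctrl', 'shift'], 's'); a bare key has no modifiers."""
    parts = combo.split("+")
    if any(not p for p in parts):
        raise ValueError(f"malformed key combo: {combo!r}")
    *mods, key = parts
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"not a modifier: {unknown}")
    return mods, key


def key_action(mods: List[str], key: str) -> Dict[str, str]:
    """Build the keyword arguments for a computer(action="key", ...) call."""
    return {"action": "key", "key": "+".join([*mods, key])}
```

A call such as computer(**key_action(["ctrl", "shift"], "s")) would then send ctrl+shift+s.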

Coordinate System

Coordinates are in pixels from the top-left corner of the screen:
  • [0, 0] = top-left corner
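  • [1919, 1079] = bottom-right pixel on a 1920x1080 display (coordinates are zero-based, so the largest valid value is one less than the width or height)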
The default resolution for GUI tasks is 1024x768. Agents receive screenshots and can calculate coordinates based on visible elements.
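A common way to turn a visible element into a click target is to take the center of its bounding box. The sketch below does that, assuming zero-based pixel indices and the 1024x768 default. It also scales from screenshot pixels to screen pixels in case the screenshot an agent sees has been resized; whether Helios resizes screenshots is not stated on this page, so that part is an assumption. click_point is a hypothetical helper.

```python
from typing import List, Tuple

SCREEN_SIZE = (1024, 768)  # default GUI resolution stated above


def click_point(
    box: Tuple[int, int, int, int],
    shot_size: Tuple[int, int] = SCREEN_SIZE,
    screen_size: Tuple[int, int] = SCREEN_SIZE,
) -> List[int]:
    """Return an [x, y] click target at the center of box.

    box is (left, top, right, bottom) in screenshot pixels. If the screenshot size
    differs from the screen size, the point is scaled into screen pixels, then
    clamped to 0..width-1 and 0..height-1 (zero-based pixel indices).
    """
    left, top, right, bottom = box
    scale_x = screen_size[0] / shot_size[0]
    scale_y = screen_size[1] / shot_size[1]
    x = round((left + right) / 2 * scale_x)
    y = round((top + bottom) / 2 * scale_y)
    return [min(max(x, 0), screen_size[0] - 1), min(max(y, 0), screen_size[1] - 1)]
```

For a button spanning (480, 290) to (520, 310), click_point((480, 290, 520, 310)) gives [500, 300], which can be passed as the coordinate of a left_click.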

GUI Workflow

A typical GUI interaction pattern:
# 1. Take a screenshot to see the current state
computer(action="screenshot")

# 2. Identify target coordinates from the screenshot
# 3. Perform action
computer(action="left_click", coordinate=[500, 300])

# 4. Type if needed
computer(action="type", text="search query")

# 5. Press Enter
computer(action="key", key="Return")

# 6. Take another screenshot to verify
computer(action="screenshot")
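Steps 3 and 6 above can be wrapped into a small act-then-verify loop that retries when the follow-up screenshot does not show the expected result. In this sketch both computer (whatever dispatches the tool call) and looks_done (whatever inspects the screenshot) are hypothetical callables supplied by the caller; the retry policy is an illustrative choice, not documented Helios behavior.

```python
from typing import Any, Callable, Dict


def act_and_verify(
    computer: Callable[..., Any],
    action: Dict[str, Any],
    looks_done: Callable[[Any], bool],
    max_attempts: int = 3,
) -> bool:
    """Perform an action, screenshot, and check the result; retry up to max_attempts.

    `computer` dispatches a tool call and `looks_done` inspects the screenshot;
    both are hypothetical hooks supplied by the caller.
    """
    for _ in range(max_attempts):
        computer(**action)                          # e.g. left_click at a coordinate
        screenshot = computer(action="screenshot")  # capture the resulting state
        if looks_done(screenshot):
            return True
    return False
```

Only retry actions that are safe to repeat, such as clicking a field to focus it; retrying type or a submitting key press would enter the text or submit the form twice.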

Tool Selection Guidelines

Use bash when...

  • Installing packages
  • Running scripts
  • File system operations
  • Command-line tools
  • Non-interactive tasks

Use editor when...

  • Creating multi-line files
  • Precise text replacement
  • Viewing file contents
  • Structured code editing

Use computer when...

  • Browser interaction required
  • Desktop app automation
  • Visual verification needed
  • Form filling
  • GUI-only applications

Combine tools when...

  • Complex workflows
  • Mixing CLI and GUI
  • Verifying GUI actions with bash
