Overview
| Tool | Availability | Description |
|---|---|---|
| `bash` | All modes | Execute shell commands |
| `editor` | All modes | View and edit files |
| `computer` | GUI mode only | Mouse and keyboard interaction |
bash
The `bash` tool executes shell commands inside the container. It is the primary tool for file operations, package management, and running scripts.
Capabilities
- Execute any shell command
- Create, read, update, delete files
- Install packages
- Run scripts and programs
- Chain commands with `&&`, `||`, `;`
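The three chaining operators differ in how they react to the exit status of the preceding command. The following is a minimal sketch of that behavior; the scratch directory `/tmp/chain_demo` and the file names in it are hypothetical and chosen only for illustration.

```shell
# Scratch directory for the demo (hypothetical path).
demo_dir=/tmp/chain_demo

# `&&` runs the next command only if the previous one succeeded.
mkdir -p "$demo_dir" && echo "created" > "$demo_dir/status.txt"

# `||` runs the next command only if the previous one failed.
# The file does not exist, so cat fails and the echo runs.
cat "$demo_dir/missing.txt" 2>/dev/null || echo "fallback" > "$demo_dir/fallback.txt"

# `;` runs the next command regardless of the previous exit status.
false ; echo "ran anyway" > "$demo_dir/semicolon.txt"

# `|| true` discards a failure so the surrounding script can continue.
rm "$demo_dir/does_not_exist" 2>/dev/null || true
```

After this runs, all three marker files exist, which shows that each operator took the branch described in its comment.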
Examples
- File Operations
- Package Management
- Running Scripts
- Chained Commands
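As an illustration of the four categories above, here is a short sketch of commands an agent might pass to the `bash` tool. The directory `/tmp/bash_tool_demo`, the file names, and the `install_packages` helper are hypothetical. The package step is defined but not invoked, since it would need root privileges and network access.

```shell
# File operations: create, update, read, copy, delete (absolute paths throughout).
work=/tmp/bash_tool_demo             # hypothetical scratch directory
mkdir -p "$work"
echo "hello" > "$work/notes.txt"     # create
echo "world" >> "$work/notes.txt"    # update (append)
cat "$work/notes.txt"                # read
cp "$work/notes.txt" "$work/notes.bak"
rm "$work/notes.txt"                 # delete

# Package management: -y answers apt-get's confirmation prompt automatically.
# Defined only; call it as: install_packages curl jq
install_packages() {
    apt-get update && apt-get install -y "$@"
}

# Running scripts: write a small script, then run it with sh.
cat > "$work/greet.sh" <<'EOF'
#!/bin/sh
echo "Hello, $1"
EOF
sh "$work/greet.sh" agent > "$work/greet.out"

# Chained commands: count lines only if the backup file exists.
[ -f "$work/notes.bak" ] && wc -l < "$work/notes.bak" > "$work/lines.out"
```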
Best Practices
Use absolute paths
Always use absolute paths (`/home/file.txt`) rather than relative paths.
Chain related commands
Suppress interactive prompts
Use flags like `-y` for `apt-get` to avoid interactive prompts.
Handle errors gracefully
Use `|| true` or proper error handling for commands that might fail.
editor
The `editor` tool provides structured file editing capabilities. It’s more reliable than shell commands for complex file modifications.
Operations
| Operation | Description |
|---|---|
| `view` | View file contents (optionally with line range) |
| `create` | Create a new file with content |
| `str_replace` | Replace a string in a file |
| `insert` | Insert text at a specific line |
Examples
- View
- Create
- Replace
- Insert
When to Use Editor vs Bash
| Scenario | Recommended Tool |
|---|---|
| Create small file | bash (echo) |
| Create multi-line file | editor (create) |
| Simple append | bash (>>) |
| Replace specific text | editor (str_replace) |
| View file contents | editor (view) |
| Complex text manipulation | editor |
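For the two rows where bash is the recommended tool, a minimal sketch might look like the following. The path `/tmp/editor_vs_bash/config.ini` and its contents are hypothetical.

```shell
conf_dir=/tmp/editor_vs_bash         # hypothetical scratch directory
mkdir -p "$conf_dir"

# Create small file: a single echo with output redirection.
echo "debug=false" > "$conf_dir/config.ini"

# Simple append: >> adds a line at the end without touching existing content.
echo "verbose=true" >> "$conf_dir/config.ini"

cat "$conf_dir/config.ini"
```

Changing `debug=false` to another value in place is the kind of targeted edit for which the table recommends `str_replace` instead.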
computer
The `computer` tool enables GUI interaction through mouse and keyboard actions. It is only available when `gui = true` is set in `task.toml`.
Actions
| Action | Parameters | Description |
|---|---|---|
| `left_click` | coordinate: [x, y] | Click at position |
| `right_click` | coordinate: [x, y] | Right-click at position |
| `double_click` | coordinate: [x, y] | Double-click at position |
| `triple_click` | coordinate: [x, y] | Triple-click (select line) |
| `middle_click` | coordinate: [x, y] | Middle-click at position |
| `mouse_move` | coordinate: [x, y] | Move mouse to position |
| `left_click_drag` | coordinate: [x, y], endCoordinate: [x, y] | Drag from start to end |
| `type` | text: string | Type text |
| `key` | key: string | Press key |
| `scroll` | coordinate: [x, y], direction: "up"/"down" | Scroll at position |
| `screenshot` | - | Capture screen |
Examples
- Mouse Actions
- Keyboard Actions
- Screenshots
Key Names
Common key names for the `key` action:
| Category | Keys |
|---|---|
| Modifiers | ctrl, alt, shift, super |
| Navigation | Up, Down, Left, Right, Home, End, Page_Up, Page_Down |
| Editing | Return, Tab, BackSpace, Delete, Escape |
| Function | F1 - F12 |
Combine keys with `+`: `ctrl+c`, `ctrl+shift+s`, `alt+F4`
Coordinate System
Coordinates are in pixels from the top-left corner of the screen:
- `[0, 0]` = top-left corner
- `[1920, 1080]` = bottom-right on a 1080p display
The default resolution for GUI tasks is 1024x768. Agents receive screenshots and can calculate coordinates based on visible elements.
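One common way to turn a visible element into a click coordinate is to take the center of its bounding box. The sketch below assumes the agent has already estimated the box from a screenshot. The `click_center` helper and the example box values are hypothetical and not part of the tool.

```shell
# Print the center of a bounding box as "[x, y]".
# Arguments: left edge, top edge, width, height (all in pixels,
# measured from the top-left corner of the screen).
click_center() {
    x=$(( $1 + $3 / 2 ))   # left edge plus half the width
    y=$(( $2 + $4 / 2 ))   # top edge plus half the height
    echo "[$x, $y]"
}

# Hypothetical button at (400, 300), 120 px wide and 40 px tall.
click_center 400 300 120 40
```

The printed pair can then be used as the `coordinate` parameter of an action such as `left_click`.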
GUI Workflow
A typical GUI interaction pattern:
Tool Selection Guidelines
Use bash when...
- Installing packages
- Running scripts
- File system operations
- Command-line tools
- Non-interactive tasks
Use editor when...
- Creating multi-line files
- Precise text replacement
- Viewing file contents
- Structured code editing
Use computer when...
- Browser interaction required
- Desktop app automation
- Visual verification needed
- Form filling
- GUI-only applications
Combine tools when...
- Complex workflows
- Mixing CLI and GUI
- Verifying GUI actions with bash
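As a sketch of verifying a GUI action with bash, the check below confirms that a file supposedly saved through a desktop application exists and contains the expected text. The `verify_saved_file` helper and the path `/tmp/gui_verify_demo/report.txt` are hypothetical. The `echo` line stands in for the GUI save step so the example can run on its own.

```shell
# Return 0 when the file exists, is non-empty, and contains the expected text.
# Arguments: absolute path, expected substring.
verify_saved_file() {
    [ -s "$1" ] && grep -qF -- "$2" "$1"
}

# Stand-in for the GUI step: in a real task the application would write
# this file after a save action triggered through the computer tool.
mkdir -p /tmp/gui_verify_demo
echo "Quarterly report" > /tmp/gui_verify_demo/report.txt

if verify_saved_file /tmp/gui_verify_demo/report.txt "Quarterly"; then
    echo "save verified"
else
    echo "save not verified"
fi
```

A check like this is cheaper than taking another screenshot and does not depend on reading text from pixels.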