Task Structure
Every task is a directory with the following structure:
Only instruction.md and task.toml are required. The environment/ and tests/ directories are optional but recommended.
Quick Start: Create Your First Task
1
Create the task directory
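A minimal sketch of this step, assuming tasks live under a tasks/ folder in the current directory and that my-first-task is the task name used in the following steps:

```shell
# Create the task directory plus the optional (but recommended) tests/ folder.
mkdir -p tasks/my-first-task/tests

# The two required files; their contents are filled in by the later steps.
touch tasks/my-first-task/instruction.md tasks/my-first-task/task.toml
```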
2
Write the instruction
Create tasks/my-first-task/instruction.md:
3
Create the configuration
Create tasks/my-first-task/task.toml:
4
Add verification
Create tasks/my-first-task/tests/test.sh:
5
Run the task
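Steps 2 through 4 can be sketched as one shell session. Everything here is illustrative rather than the guide's original listing: the instruction wording, the /tmp/hello.txt target, the timeout value, and the 1/0 reward convention are assumptions. Only the [agent] timeout_sec key and the /logs/verifier/reward.txt path are taken from this guide, and a real task.toml may need more fields.

```shell
# Hypothetical task location, matching the paths used in the steps above.
TASK_DIR="tasks/my-first-task"
mkdir -p "$TASK_DIR/tests"

# Step 2: a specific, checkable instruction (wording is illustrative).
cat > "$TASK_DIR/instruction.md" <<'EOF'
Create the file /tmp/hello.txt containing exactly the text "Hello, world!".
EOF

# Step 3: [agent] timeout_sec is named in this guide; 300 is a placeholder value.
cat > "$TASK_DIR/task.toml" <<'EOF'
[agent]
timeout_sec = 300
EOF

# Step 4: a verifier that records its result at the reward path this guide requires.
cat > "$TASK_DIR/tests/test.sh" <<'EOF'
#!/bin/sh
# Assumption: the reward is 1 on success and 0 on failure.
mkdir -p /logs/verifier
if [ "$(cat /tmp/hello.txt 2>/dev/null)" = "Hello, world!" ]; then
  echo 1 > /logs/verifier/reward.txt
else
  echo 0 > /logs/verifier/reward.txt
fi
EOF
chmod +x "$TASK_DIR/tests/test.sh"
```

The verifier creates /logs/verifier itself so that the reward file can be written even when the directory does not exist yet.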
Task Types
Headless Tasks (CLI/Terminal)
For tasks that don’t require a graphical interface:
- Create files and directories
- Install and configure software
- Run scripts and process data
- Interact with APIs
GUI Tasks (Desktop)
For tasks requiring mouse and keyboard interaction:
1
Build the desktop image (once)
2
Configure the task
- Browser automation
- Desktop application interaction
- Form filling
- Visual testing
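The two GUI setup steps can be sketched as a pair of shell helpers. The docker build command and the gui = true setting are the ones quoted in the Troubleshooting section; the function names are hypothetical, and the check deliberately does not assume which section of task.toml the gui key belongs to.

```shell
# Step 1 (run once, from the repository root): build the desktop image.
build_desktop_image() {
  docker build -t cua-desktop -f docker/Dockerfile.desktop .
}

# Step 2: succeed only if the given task.toml contains a line setting gui = true.
gui_enabled() {
  grep -Eq '^[[:space:]]*gui[[:space:]]*=[[:space:]]*true' "$1"
}
```

For example, `gui_enabled tasks/my-first-task/task.toml || echo "GUI is not enabled"` reports a task that has not opted in.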
PDF Form Tasks
For PDF form-filling tasks (part of PDFBench):
1
Build the PDFBench base image (once)
2
Configure the task
Custom Dockerfiles
For tasks requiring specific software, create a custom Dockerfile:
Example: Python Environment
environment/Dockerfile:
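As a rough sketch only (not the guide's original listing), a Python-oriented Dockerfile could be written into the task from the shell as shown here. The ubuntu:22.04 base image is the one mentioned under Troubleshooting; the package list and the my-first-task path are assumptions.

```shell
# Hypothetical task path; adjust to your own task directory.
mkdir -p tasks/my-first-task/environment

cat > tasks/my-first-task/environment/Dockerfile <<'EOF'
# Base image named in the Troubleshooting section; packages are illustrative.
FROM ubuntu:22.04
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 python3-pip \
 && rm -rf /var/lib/apt/lists/*
EOF
```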
task.toml:
Example: Node.js Environment
environment/Dockerfile:
Example: GUI with Additional Tools
environment/Dockerfile:
Writing Good Instructions
Be Specific and Measurable
- Bad
- Good
Include Success Criteria
- Bad
- Good
Specify File Paths
- Bad
- Good
Example Tasks
Example 1: File Creation (Easy)
Example 2: Web Scraping (Medium)
Example 3: GUI Browser Task (Hard)
Troubleshooting
Task not found
Ensure your task directory contains both instruction.md and task.toml.
Verification always fails
- Test your test.sh script manually inside a container
- Check that the /logs/verifier/ directory is created
- Verify the reward file path is exactly /logs/verifier/reward.txt
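The last two checks can be scripted. This is a small sketch with a hypothetical helper name, check_reward, meant to be run inside the container after executing test.sh by hand; it defaults to the reward path given above and accepts another path for local testing.

```shell
# Report whether a verifier run left a non-empty reward file behind.
check_reward() {
  reward_file="${1:-/logs/verifier/reward.txt}"
  reward_dir=$(dirname "$reward_file")
  if [ ! -d "$reward_dir" ]; then
    echo "missing directory: $reward_dir" >&2
    return 1
  fi
  if [ ! -s "$reward_file" ]; then
    echo "missing or empty reward file: $reward_file" >&2
    return 1
  fi
  echo "reward: $(cat "$reward_file")"
}
```

A non-zero exit status points at the first problem found: either the directory was never created or the script wrote its result somewhere else.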
Docker build fails
- Check your Dockerfile syntax
- Ensure base images exist (docker pull ubuntu:22.04)
- Increase build_timeout_sec for complex builds
Agent times out
Increase timeout_sec in the [agent] section, or simplify the task.
GUI not working
- Ensure gui = true in task.toml
- Build the desktop image: docker build -t cua-desktop -f docker/Dockerfile.desktop .
- Use --watch to see the live view