Agent Types
Dari provides two types of browser agents, each with unique capabilities:
- Browser Use Agent: DOM-based automation with unlimited steps
- Computer Use Agent: Vision-based control with extended reasoning
Browser Use Agent
Step type: dom_browser_use_agent
The Browser Use Agent executes complex multi-step workflows using DOM inspection and autonomous AI decision-making.
How It Works
- Step Limit: None - runs until task completion
- State Management: Maintains agent state and history between steps
- Resumability: Can resume if interrupted
- Timeout: 600 seconds per invocation
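The behavior listed above (no step cap, state carried between steps, resumable, bounded by a per-invocation timeout) can be pictured as a small loop. This is a minimal sketch and not Dari's implementation: `AgentState`, `run_invocation`, `demo_step`, and the `step` callback are hypothetical names introduced for illustration. Only the 600-second figure comes from the text.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List

INVOCATION_TIMEOUT_S = 600  # per-invocation timeout stated above


@dataclass
class AgentState:
    """Hypothetical container for state carried between steps."""
    history: List[str] = field(default_factory=list)
    done: bool = False


def run_invocation(
    state: AgentState,
    step: Callable[[AgentState], bool],
    timeout_s: float = INVOCATION_TIMEOUT_S,
    clock: Callable[[], float] = time.monotonic,
) -> AgentState:
    """Run steps until the task completes or the invocation times out.

    There is no step counter: the only stop conditions are task
    completion and the time budget of this invocation.
    """
    deadline = clock() + timeout_s
    while not state.done and clock() < deadline:
        state.done = step(state)
    return state


def demo_step(state: AgentState) -> bool:
    """Stand-in for one agent step: record it, finish after five steps."""
    state.history.append(f"step {len(state.history) + 1}")
    return len(state.history) >= 5


if __name__ == "__main__":
    state = run_invocation(AgentState(), demo_step)
    print(state.done, len(state.history))
```

Because the state object outlives the call, an invocation that hits its timeout can be resumed by passing the same `AgentState` to `run_invocation` again; the history collected so far is kept.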
When to Use
Perfect for:
- Complex multi-page workflows
- Tasks requiring conditional logic and adaptation
- Login flows with unpredictable steps
- Multi-step form filling with validation
- Tasks where you don’t know exactly how many steps are needed
Not ideal for:
- Simple 1-2 action tasks (use Browser Actions instead)
- Precise data extraction (use Browser Code instead)
- Tasks requiring exact pixel-perfect control
- When cost optimization is critical
Cost
Higher than Browser Actions because there is no fixed step limit. Cost scales with workflow complexity.
Example Use Cases
Complete Checkout Flow
Multi-Page Signup
Complex Shopping Task
Key Features
- State Persistence: Maintains context across steps
- Task Completion Detection: Knows when the task is done
- Login Support: Can use stored credentials via tools
- Adaptive Behavior: Adjusts to unexpected page states
Computer Use Agent
Step type: browser_use_agent
The Computer Use Agent uses Claude’s Computer Use capabilities for vision-based browser control.
How It Works
- Vision-Based: Takes screenshots and uses vision to understand page state
- Extended Thinking: 1024-token thinking budget for complex reasoning
- Custom Tools: screenshot, click, type, scroll, key presses, mouse movement
- Perception: Image-based understanding of page layout
- Timeout: 300 seconds
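To make the thinking budget concrete: in Anthropic's Messages API, extended thinking is enabled with a `thinking` field of the form `{"type": "enabled", "budget_tokens": N}`, where the budget must be at least 1024 tokens and smaller than `max_tokens`. The sketch below only builds those two request fields. How Dari actually passes them is not described here, and the helper name `thinking_config` and the `max_tokens` default of 4096 are assumptions made for illustration.

```python
from typing import Any, Dict

THINKING_BUDGET_TOKENS = 1024  # budget stated above
MIN_THINKING_BUDGET = 1024     # minimum budget accepted by the Messages API


def thinking_config(
    budget_tokens: int = THINKING_BUDGET_TOKENS,
    max_tokens: int = 4096,  # assumed value, not taken from the text
) -> Dict[str, Any]:
    """Build the `max_tokens` and `thinking` fields for a request."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError("thinking budget must be at least 1024 tokens")
    if max_tokens <= budget_tokens:
        raise ValueError("max_tokens must be larger than the thinking budget")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


if __name__ == "__main__":
    print(thinking_config())
```

A budget of 1024 is the smallest the API accepts, so the agent's reasoning per turn is deliberately short; the validation above simply makes that constraint explicit.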
When to Use
Perfect for:
- Complex visual tasks requiring layout understanding
- Tasks where text-based DOM inspection isn’t enough
- Multi-step flows requiring visual verification
- Scenarios needing precise element identification by appearance
- Pages with complex visual hierarchies
Not ideal for:
- Simple DOM-based interactions (use Browser Use Agent instead)
- When vision overhead isn’t necessary
- Extremely time-sensitive operations
- When cost optimization is paramount
Cost
Similar to Browser Use Agent, with additional vision API overhead.
Example Use Cases
Visual Button Finding
Layout-Based Form Filling
Visual Verification
Key Features
- Vision Understanding: Understands visual layout and design
- Extended Reasoning: More complex decision-making capability
- Visual Verification: Can verify results visually
- Flexible Interaction: Works with any visual interface
Choosing Between Agent Types
| Scenario | Browser Use Agent | Computer Use Agent |
|---|---|---|
| DOM elements are clearly accessible | ✅ Recommended | ⚠️ Overkill |
| Visual layout understanding needed | ⚠️ Limited | ✅ Recommended |
| Multi-step workflow | ✅ Excellent | ✅ Excellent |
| Complex reasoning required | ✅ Good | ✅ Better |
| Cost optimization important | ✅ Better | ⚠️ Higher cost |
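The table can be condensed into a small selection helper. This is an illustrative sketch only: `Scenario` and `choose_agent` are hypothetical names, and treating "DOM not accessible" as a reason to pick the Computer Use Agent is an inference from the guidance above, not a documented rule. The two step type strings are the ones given earlier on this page.

```python
from dataclasses import dataclass

BROWSER_USE_AGENT = "dom_browser_use_agent"  # step type listed for the Browser Use Agent
COMPUTER_USE_AGENT = "browser_use_agent"     # step type listed for the Computer Use Agent


@dataclass(frozen=True)
class Scenario:
    """Hypothetical description of a task, mirroring the table rows."""
    needs_visual_layout: bool = False
    dom_accessible: bool = True


def choose_agent(scenario: Scenario) -> str:
    """Return the step type suggested by the comparison table."""
    # Vision is recommended only when layout understanding is needed
    # or the DOM alone is not enough; otherwise it is overkill.
    if scenario.needs_visual_layout or not scenario.dom_accessible:
        return COMPUTER_USE_AGENT
    return BROWSER_USE_AGENT


if __name__ == "__main__":
    print(choose_agent(Scenario()))
    print(choose_agent(Scenario(needs_visual_layout=True)))
```

Defaulting to the DOM-based agent reflects the cost row of the table: the vision-based agent is chosen only when a scenario actually calls for it.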
Best Practices
Do ✅
- Provide clear task descriptions
- Use agents for genuinely complex workflows
- Let the agent adapt to unexpected states
- Use stored credentials for login flows
- Monitor execution logs for optimization
Don’t ❌
- Use agents for simple 1-2 step tasks
- Over-specify every single action
- Assume agents will handle impossible tasks
- Ignore timeout limits
- Use agents when Browser Code would be more reliable
Configuration Best Practices
Clear Task Descriptions
With Variables
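As one way to picture a clear task description that takes variables, the sketch below uses Python's standard `string.Template`. Dari's own variable syntax is not shown in this section, so the `$name` placeholders, the `render_task` helper, and the example values are all assumptions made for illustration.

```python
from string import Template

# A specific, goal-oriented task description with two placeholders.
TASK_TEMPLATE = Template(
    "Sign up for a new account on $site_url using the email $email, "
    "then stop once the confirmation page is shown."
)


def render_task(template: Template, **variables: str) -> str:
    """Fill in the placeholders of a task description.

    substitute() raises KeyError when a variable is missing, so an
    incomplete task description is caught before it is used.
    """
    return template.substitute(**variables)


if __name__ == "__main__":
    print(render_task(
        TASK_TEMPLATE,
        site_url="https://example.com",
        email="user@example.com",
    ))
```

The description states the goal and the stopping condition but leaves the individual clicks to the agent, in line with the advice not to over-specify every action.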
When to Downgrade
Consider switching to Browser Actions when:
- The task is actually simple and predictable
- You’re always doing the same 1-2 actions
- Caching would provide significant cost savings
- State management isn’t needed
- You need precise, reliable logic
- Data extraction is the primary goal
- The agent’s decisions are too unpredictable
- You can write explicit logic
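The criteria above can be turned into a simple checklist. This is a sketch under stated assumptions: `TaskProfile` and `alternatives` are hypothetical names, and mapping the last four criteria (precise logic, data extraction, unpredictable decisions, explicit logic) to Browser Code is inferred from the earlier "use Browser Code instead" guidance.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical flags, one per criterion in the lists above."""
    simple_and_predictable: bool = False
    same_one_or_two_actions: bool = False
    caching_saves_cost: bool = False
    needs_state: bool = True
    needs_precise_logic: bool = False
    extraction_is_primary: bool = False
    agent_too_unpredictable: bool = False
    logic_can_be_explicit: bool = False


def alternatives(profile: TaskProfile) -> List[str]:
    """List the lighter-weight step types worth considering."""
    suggestions: List[str] = []
    if (profile.simple_and_predictable
            or profile.same_one_or_two_actions
            or profile.caching_saves_cost
            or not profile.needs_state):
        suggestions.append("Browser Actions")
    if (profile.needs_precise_logic
            or profile.extraction_is_primary
            or profile.agent_too_unpredictable
            or profile.logic_can_be_explicit):
        suggestions.append("Browser Code")
    return suggestions


if __name__ == "__main__":
    print(alternatives(TaskProfile(extraction_is_primary=True)))
```

An empty list means none of the downgrade criteria apply and an agent remains a reasonable choice. The helper returns every matching alternative rather than ranking them, since the text gives no priority between the two.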