Browser Agents execute complex, multi-step workflows using autonomous AI decision-making with no step limits.

Agent Types

Dari provides two types of browser agents, each with unique capabilities:

  • Browser Use Agent: DOM-based automation with unlimited steps
  • Computer Use Agent: Vision-based control with extended reasoning

Browser Use Agent

Step type: dom_browser_use_agent

The Browser Use Agent executes complex multi-step workflows using DOM inspection and autonomous AI decision-making.

How It Works

  • Step Limit: None - runs until task completion
  • State Management: Maintains agent state and history between steps
  • Resumability: Can resume if interrupted
  • Timeout: 600 seconds per invocation

When to Use

Perfect for:
  • Complex multi-page workflows
  • Tasks requiring conditional logic and adaptation
  • Login flows with unpredictable steps
  • Multi-step form filling with validation
  • Tasks where you don’t know in advance how many steps are needed
Not suitable for:
  • Simple 1-2 action tasks (use Browser Actions instead)
  • Precise data extraction (use Browser Code instead)
  • Tasks requiring exact pixel-perfect control
  • When cost optimization is critical

Cost

Higher than Browser Actions because there is no fixed step limit; cost scales with workflow complexity.

Example Use Cases

Complete Checkout Flow

{
  "type": "dom_browser_use_agent",
  "task": "Complete the entire checkout process from cart to confirmation"
}

Multi-Page Signup

{
  "type": "dom_browser_use_agent",
  "task": "Navigate through multi-page signup wizard with email verification"
}

Complex Shopping Task

{
  "type": "dom_browser_use_agent",
  "task": "Search for {{product_category}}, filter by price under {{max_price}}, and add 3 items to cart"
}

Key Features

  • State Persistence: Maintains context across steps
  • Task Completion Detection: Knows when the task is done
  • Login Support: Can use stored credentials via tools
  • Adaptive Behavior: Adjusts to unexpected page states

Computer Use Agent

Step type: browser_use_agent

The Computer Use Agent uses Claude’s Computer Use capabilities for vision-based browser control.

How It Works

  • Vision-Based: Takes screenshots and uses vision to understand page state
  • Extended Thinking: 1024 token thinking budget for complex reasoning
  • Custom Tools: screenshot, click, type, scroll, key presses, mouse movement
  • Perception: Image-based understanding of page layout
  • Timeout: 300 seconds
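
The two step types and their documented timeouts can be kept in a small lookup for client-side checks before a workflow is submitted. The following is a minimal Python sketch, not part of Dari's API: the names `AGENT_TIMEOUTS_SECONDS` and `validate_agent_step` are hypothetical, and only the step type strings and timeout values are taken from this page.

```python
# Hypothetical client-side helper; not part of Dari's API.
# Step type names and timeout values are taken from this page.
AGENT_TIMEOUTS_SECONDS = {
    "dom_browser_use_agent": 600,  # Browser Use Agent
    "browser_use_agent": 300,      # Computer Use Agent
}


def validate_agent_step(step: dict) -> int:
    """Check a step's basic shape and return its documented timeout in seconds."""
    step_type = step.get("type")
    if step_type not in AGENT_TIMEOUTS_SECONDS:
        raise ValueError(f"unknown agent step type: {step_type!r}")
    task = step.get("task")
    if not isinstance(task, str) or not task.strip():
        raise ValueError("agent steps need a non-empty 'task' string")
    return AGENT_TIMEOUTS_SECONDS[step_type]


step = {
    "type": "browser_use_agent",
    "task": "Find and click the green 'Confirm' button in the modal",
}
print(validate_agent_step(step))  # 300
```

A check like this may help catch a mistyped step type early, since the two names differ only by the `dom_` prefix.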

When to Use

Perfect for:
  • Complex visual tasks requiring layout understanding
  • Tasks where text-based DOM inspection isn’t enough
  • Multi-step flows requiring visual verification
  • Scenarios needing precise element identification by appearance
  • Pages with complex visual hierarchies
Not suitable for:
  • Simple DOM-based interactions (use Browser Use Agent instead)
  • When vision overhead isn’t necessary
  • Extremely time-sensitive operations
  • When cost optimization is paramount

Cost

Similar to Browser Use Agent, with additional vision API overhead.

Example Use Cases

Visual Button Finding

{
  "type": "browser_use_agent",
  "task": "Find and click the green 'Confirm' button in the modal"
}

Layout-Based Form Filling

{
  "type": "browser_use_agent",
  "task": "Fill out the multi-column form based on visual layout with these values: {{form_data}}"
}

Visual Verification

{
  "type": "browser_use_agent",
  "task": "Navigate through image-based CAPTCHA verification"
}

Key Features

  • Vision Understanding: Understands visual layout and design
  • Extended Reasoning: More complex decision-making capability
  • Visual Verification: Can verify results visually
  • Flexible Interaction: Works with any visual interface

Choosing Between Agent Types

Scenario | Browser Use Agent | Computer Use Agent
DOM elements are clearly accessible | ✅ Recommended | ⚠️ Overkill
Visual layout understanding needed | ⚠️ Limited | ✅ Recommended
Multi-step workflow | ✅ Excellent | ✅ Excellent
Complex reasoning required | ✅ Good | ✅ Better
Cost optimization important | ✅ Better | ⚠️ Higher cost

Start with Browser Use Agent for most multi-step tasks. Upgrade to Computer Use Agent only when visual understanding is truly necessary.
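
The guidance above amounts to a simple default: use the DOM-based agent unless visual understanding is required. A minimal Python sketch of that rule follows; `build_agent_step` and its `needs_visual_understanding` flag are hypothetical names introduced for illustration, and only the two step type strings come from this page.

```python
# Hypothetical helper; only the two step type strings come from this page.
def build_agent_step(task: str, needs_visual_understanding: bool = False) -> dict:
    """Build an agent step, defaulting to the DOM-based Browser Use Agent."""
    if needs_visual_understanding:
        step_type = "browser_use_agent"      # Computer Use Agent (vision-based)
    else:
        step_type = "dom_browser_use_agent"  # Browser Use Agent (DOM-based)
    return {"type": step_type, "task": task}


print(build_agent_step("Complete the entire checkout process from cart to confirmation"))
print(build_agent_step("Find and click the green 'Confirm' button in the modal",
                       needs_visual_understanding=True))
```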

Best Practices

Do ✅

  • Provide clear task descriptions
  • Use agents for genuinely complex workflows
  • Let the agent adapt to unexpected states
  • Use stored credentials for login flows
  • Monitor execution logs for optimization

Don’t ❌

  • Use agents for simple 1-2 step tasks
  • Over-specify every single action
  • Assume agents will handle impossible tasks
  • Ignore timeout limits
  • Use agents when Browser Code would be more reliable

Configuration Best Practices

Clear Task Descriptions

{
  "type": "dom_browser_use_agent",
  "task": "Log in to the admin panel, navigate to settings, and enable dark mode"
}

With Variables

{
  "type": "dom_browser_use_agent",
  "task": "Search for '{{product_name}}', add to cart, and proceed to checkout",
  "variables": {
    "product_name": "wireless headphones"
  }
}
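
To make the placeholder syntax concrete, the sketch below shows how a `{{name}}` placeholder in a task could be filled from the step's `variables` map. It is an illustration only: how Dari resolves variables internally is not described on this page, and `render_task` is a hypothetical helper, written here in Python.

```python
import re

# Hypothetical illustration of {{name}} substitution; Dari's own resolver may differ.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_task(step: dict) -> str:
    """Return the step's task with each {{name}} replaced from step['variables']."""
    variables = step.get("variables", {})

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, step["task"])


step = {
    "type": "dom_browser_use_agent",
    "task": "Search for '{{product_name}}', add to cart, and proceed to checkout",
    "variables": {"product_name": "wireless headphones"},
}
print(render_task(step))
# Search for 'wireless headphones', add to cart, and proceed to checkout
```

Raising on a missing variable, rather than leaving the placeholder in the text, is a design choice of this sketch: it keeps an unresolved `{{name}}` from being passed to the agent as literal task text.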

When to Downgrade

Consider switching to Browser Actions when:
  • The task is actually simple and predictable
  • You’re always doing the same 1-2 actions
  • Caching would provide significant cost savings
  • State management isn’t needed
Consider switching to Browser Code when:
  • You need precise, reliable logic
  • Data extraction is the primary goal
  • The agent’s decisions are too unpredictable
  • You can write explicit logic