Browser Agents

Browser Agents use autonomous AI decision-making to execute complex, multi-step workflows without a fixed step limit.

Agent Types

Dari provides two types of browser agents, each with unique capabilities:

Browser Use Agent

DOM-based automation with unlimited steps

Computer Use Agent

Vision-based control with extended reasoning

Browser Use Agent

Step type: dom_browser_use_agent

The Browser Use Agent executes complex, multi-step workflows using DOM inspection and autonomous AI decision-making.

How It Works

Step Limit: None; the agent runs until the task is complete
State Management: Maintains agent state and history between steps
Resumability: Can resume if interrupted
Timeout: 600 seconds per invocation
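Taken together, these properties (no step cap, state carried between steps, resumption after interruption, and a per-invocation time budget) can be pictured as a loop like the one below. This is a minimal sketch, not Dari's implementation: `AgentState`, `run_invocation`, and the `decide` callback are hypothetical names, and the clock is injectable so the example can be tested without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentState:
    """Hypothetical agent state: the actions taken so far and a completion flag."""
    history: List[str] = field(default_factory=list)
    done: bool = False


def run_invocation(
    state: AgentState,
    decide: Callable[[AgentState], Optional[str]],
    timeout_s: float = 600.0,  # mirrors the documented 600 s per invocation
    clock: Callable[[], float] = time.monotonic,
) -> AgentState:
    """Run steps until the task completes or the time budget is used up.

    There is no step counter: the only stopping conditions are task
    completion (decide returns None) and the per-invocation timeout.
    The returned state can be passed back in to resume.
    """
    deadline = clock() + timeout_s
    while not state.done:
        if clock() >= deadline:
            return state  # interrupted: history is kept so a later call can resume
        action = decide(state)
        if action is None:
            state.done = True  # task completion detected
        else:
            state.history.append(action)  # a real agent would execute the action here
    return state
```

Passing the returned state into a second call models resuming an interrupted run; the first call's history is preserved, so the agent continues rather than starting over.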

When to Use

Perfect for:

Complex multi-page workflows
Tasks requiring conditional logic and adaptation
Login flows with unpredictable steps
Multi-step form filling with validation
Tasks where you don’t know in advance how many steps are needed

Not suitable for:

Simple 1-2 action tasks (use Browser Actions instead)
Precise data extraction (use Browser Code instead)
Tasks requiring exact pixel-perfect control
When cost optimization is critical

Cost

Higher than Browser Actions because there is no fixed step limit; more complex workflows take more steps and therefore cost more.

Example Use Cases

Complete Checkout Flow

{
  "type": "dom_browser_use_agent",
  "task": "Complete the entire checkout process from cart to confirmation"
}

Multi-Page Signup

{
  "type": "dom_browser_use_agent",
  "task": "Navigate through multi-page signup wizard with email verification"
}

Complex Shopping Task

{
  "type": "dom_browser_use_agent",
  "task": "Search for {{product_category}}, filter by price under {{max_price}}, and add 3 items to cart"
}

Key Features

State Persistence: Maintains context across steps
Task Completion Detection: Knows when the task is done
Login Support: Can use stored credentials via tools
Adaptive Behavior: Adjusts to unexpected page states

Computer Use Agent

Step type: browser_use_agent

The Computer Use Agent uses Claude’s Computer Use capabilities for vision-based browser control.

How It Works

Vision-Based: Takes screenshots and uses vision to understand page state
Extended Thinking: 1,024-token thinking budget for complex reasoning
Custom Tools: screenshot, click, type, scroll, key presses, mouse movement
Perception: Image-based understanding of page layout
Timeout: 300 seconds
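The vision-based cycle described above (screenshot, interpret, act, repeat) can be sketched as follows. This is a hypothetical illustration, not Dari's or Anthropic's API: the action schema in `REQUIRED_FIELDS`, the field names (`x`, `y`, `text`, `dx`, `dy`, `key`), and the `"done"` completion signal are all assumptions made for the example; only the tool categories and the 300-second timeout come from this page.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical action schema: the fields each tool listed above would need.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "screenshot": [],
    "click": ["x", "y"],
    "type": ["text"],
    "scroll": ["dx", "dy"],
    "key": ["key"],
    "mouse_move": ["x", "y"],
}


def validate_action(action: Dict[str, Any]) -> str:
    """Return the tool name if the action is well-formed, else raise ValueError."""
    name = action.get("action")
    if name not in REQUIRED_FIELDS:
        raise ValueError(f"unknown tool: {name!r}")
    missing = [f for f in REQUIRED_FIELDS[name] if f not in action]
    if missing:
        raise ValueError(f"{name} is missing fields: {missing}")
    return name


def vision_loop(
    take_screenshot: Callable[[], bytes],
    choose_action: Callable[[bytes], Dict[str, Any]],
    perform: Callable[[Dict[str, Any]], None],
    timeout_s: float = 300.0,  # mirrors the documented 300 s timeout
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Screenshot, let the model pick an action, perform it; repeat until done."""
    deadline = clock() + timeout_s
    performed: List[str] = []
    while clock() < deadline:
        image = take_screenshot()        # perception: the page as pixels, not DOM
        action = choose_action(image)    # stands in for the vision model's decision
        if action.get("action") == "done":  # hypothetical completion signal
            return performed
        performed.append(validate_action(action))
        perform(action)
    raise TimeoutError("vision loop exceeded its time budget")
```

Validating each action before performing it keeps a malformed model response from reaching the browser.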

When to Use

Perfect for:

Complex visual tasks requiring layout understanding
Tasks where text-based DOM inspection isn’t enough
Multi-step flows requiring visual verification
Scenarios needing precise element identification by appearance
Pages with complex visual hierarchies

Not suitable for:

Simple DOM-based interactions (use Browser Use Agent instead)
When vision overhead isn’t necessary
Extremely time-sensitive operations
When cost optimization is paramount

Cost

Similar to the Browser Use Agent, plus the additional overhead of vision API calls.

Example Use Cases

Visual Button Finding

{
  "type": "browser_use_agent",
  "task": "Find and click the green 'Confirm' button in the modal"
}

Layout-Based Form Filling

{
  "type": "browser_use_agent",
  "task": "Fill out the multi-column form based on visual layout with these values: {{form_data}}"
}

Visual Verification

{
  "type": "browser_use_agent",
  "task": "Navigate through image-based CAPTCHA verification"
}

Key Features

Vision Understanding: Understands visual layout and design
Extended Reasoning: More complex decision-making capability
Visual Verification: Can verify results visually
Flexible Interaction: Works with any visual interface

Choosing Between Agent Types

Scenario	Browser Use Agent	Computer Use Agent
DOM elements are clearly accessible	✅ Recommended	⚠️ Overkill
Visual layout understanding needed	⚠️ Limited	✅ Recommended
Multi-step workflow	✅ Excellent	✅ Excellent
Complex reasoning required	✅ Good	✅ Better
Cost optimization important	✅ Better	⚠️ Higher cost

Start with the Browser Use Agent for most multi-step tasks. Switch to the Computer Use Agent only when visual understanding is truly necessary.
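The step-type names are easy to mix up: the DOM-based Browser Use Agent is `dom_browser_use_agent`, while the vision-based Computer Use Agent is `browser_use_agent`. The guidance above can be encoded in a small helper that builds a step config. `Scenario` and `choose_step` are hypothetical names for illustration; only the two `type` values and the `task` key come from this page.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Scenario:
    dom_accessible: bool = True         # target elements are reachable via DOM inspection
    needs_visual_layout: bool = False   # the task depends on appearance, position, or images


def choose_step(task: str, scenario: Scenario) -> Dict[str, str]:
    """Build a step config, defaulting to the cheaper DOM-based agent."""
    # Browser Use Agent (DOM-based) has step type "dom_browser_use_agent".
    step_type = "dom_browser_use_agent"
    # Escalate to the Computer Use Agent (vision-based), whose step type is
    # "browser_use_agent", only when visual understanding is genuinely needed.
    if scenario.needs_visual_layout or not scenario.dom_accessible:
        step_type = "browser_use_agent"
    return {"type": step_type, "task": task}
```

For example, `choose_step("Find and click the green 'Confirm' button in the modal", Scenario(needs_visual_layout=True))` yields a `browser_use_agent` step, matching the Visual Button Finding example.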

Best Practices

Do ✅

Provide clear task descriptions
Use agents for genuinely complex workflows
Let the agent adapt to unexpected states
Use stored credentials for login flows
Monitor execution logs to find optimization opportunities

Don’t ❌

Use agents for simple 1-2 step tasks
Over-specify every single action
Assume agents will handle impossible tasks
Ignore timeout limits
Use an agent when Browser Code would be more reliable

Configuration Best Practices

Clear Task Descriptions

{
  "type": "dom_browser_use_agent",
  "task": "Log in to the admin panel, navigate to settings, and enable dark mode"
}

With Variables

{
  "type": "dom_browser_use_agent",
  "task": "Search for '{{product_name}}', add to cart, and proceed to checkout",
  "variables": {
    "product_name": "wireless headphones"
  }
}
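Dari's exact substitution rules are not described on this page, but resolving `{{name}}` placeholders from a `variables` object can be sketched as below, under stated assumptions: placeholder names are identifiers, a missing value is an error rather than being left in the task text, and non-string values (such as a `{{form_data}}` object) are JSON-encoded. `placeholders` and `render_task` are hypothetical helpers.

```python
import json
import re
from typing import Any, Dict, Set

# Matches {{name}} with optional inner whitespace; name must be an identifier.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def placeholders(task: str) -> Set[str]:
    """Names of all {{variable}} placeholders in a task string."""
    return set(PLACEHOLDER.findall(task))


def render_task(step: Dict[str, Any]) -> str:
    """Substitute step["variables"] into step["task"]; fail on missing names."""
    variables = step.get("variables", {})
    missing = placeholders(step["task"]) - set(variables)
    if missing:
        raise KeyError(f"no value for: {sorted(missing)}")

    def value_for(match: re.Match) -> str:
        value = variables[match.group(1)]
        # Assumption: non-string values (e.g. a {{form_data}} object) are JSON-encoded.
        return value if isinstance(value, str) else json.dumps(value)

    return PLACEHOLDER.sub(value_for, step["task"])
```

Failing on missing names surfaces a forgotten variable before the agent runs, instead of sending it a task that still contains a literal `{{...}}`.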

When to Downgrade

Consider switching to Browser Actions when:

The task is actually simple and predictable
You’re always doing the same 1-2 actions
Caching would provide significant cost savings
State management isn’t needed

Consider switching to Browser Code when:

You need precise, reliable logic
Data extraction is the primary goal
The agent’s decisions are too unpredictable
You can write explicit logic
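The downgrade criteria above can be read as a simple decision procedure. The sketch below is illustrative only: `Workflow` and `suggest_step_kind` are hypothetical, and it returns guide names rather than step types, since the step types for Browser Actions and Browser Code are not given on this page.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    predictable: bool                  # simple, and the same 1-2 actions every run
    needs_state: bool                  # context must carry across steps
    primary_goal_is_extraction: bool   # data extraction is the main purpose
    logic_is_explicit: bool            # you can write the decision logic yourself


def suggest_step_kind(w: Workflow) -> str:
    """Map the downgrade criteria to a guide name (hypothetical helper)."""
    # Simple, repeatable, stateless work does not need an agent at all.
    if w.predictable and not w.needs_state:
        return "Browser Actions"
    # Extraction or explicitly codable logic favors deterministic code.
    if w.primary_goal_is_extraction or w.logic_is_explicit:
        return "Browser Code"
    # Otherwise the adaptive, multi-step agent remains the right tool.
    return "Browser Agent"
```

Checking for Browser Actions first reflects that it is the simplest option; the agent is the fallback only when neither downgrade applies.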

Related Guides

Browser Actions

For simple, quick interactions

Browser Code

For precise programmatic control
