Chaining Actions: Building Complex Multi-Step Web Workflows with a Simple API

Web automation has always promised to save us time and effort, but the reality is often a tangled mess of brittle scripts. Traditional tools force us to rely on fragile CSS selectors and XPath queries that break with the slightest website update. Building a simple workflow, like logging into a site and downloading a report, can turn into a maintenance nightmare. What if you could just describe what you wanted to do, and an AI agent would handle the rest?

At browse.do, we're turning that "what if" into a reality. Our AI-powered agent lets you transform complex browser actions into simple API calls. You provide a high-level objective, and our agent navigates, clicks, types, and extracts data just like a human would.

Today, we're moving beyond single commands to explore one of the most powerful features of browse.do: building robust, multi-step workflows.

The Old Way: A Chain of Breakable Links

Think about the classic approach to automating a multi-step task, like checking an order status on an e-commerce site. Your script would look something like this:

Navigate to the login page.
Wait for the username field (input#user-email) to be visible.
Type the username.
Wait for the password field (input[name='password']) to be visible.
Type the password.
Click the submit button (button.btn-primary.login).
Wait for the page to navigate to the dashboard.
Click the "My Orders" link (a[href='/orders']).
Find the latest order (div.order-history > .order:first-child).
Extract the status text from a specific child element (span.status-text).

This entire chain is a house of cards. If a developer changes a class name from .btn-primary to .btn-main, your script breaks. If a new marketing modal pops up, your "wait" logic fails. This isn't automation; it's a constant, reactive chore.
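To make that fragility concrete, here is a small TypeScript simulation. It is a sketch, not a browser script. The "page" is a hypothetical list of the CSS selectors that currently match an element, and `Step` and `firstBrokenStep` are illustrative names. Renaming `.btn-primary` to `.btn-main` breaks the third step, and every later step depends on that one succeeding.

```typescript
// Hypothetical model of a selector-driven script. The "page" is only the list
// of CSS selectors that currently match an element; no browser is involved.
type Page = readonly string[];

interface Step {
  action: "type" | "click" | "find" | "extract";
  selector: string;
}

// The order-status chain described above, reduced to its selectors.
const steps: Step[] = [
  { action: "type", selector: "input#user-email" },
  { action: "type", selector: "input[name='password']" },
  { action: "click", selector: "button.btn-primary.login" },
  { action: "click", selector: "a[href='/orders']" },
  { action: "find", selector: "div.order-history > .order:first-child" },
  { action: "extract", selector: "span.status-text" },
];

// Walks the chain in order and returns the index of the first step whose
// selector no longer matches, or -1 if every step would succeed.
function firstBrokenStep(page: Page, chain: Step[]): number {
  for (let i = 0; i < chain.length; i++) {
    if (page.indexOf(chain[i].selector) === -1) {
      return i;
    }
  }
  return -1;
}

// Before the redesign, every selector in the script matches.
const before: Page = steps.map((s) => s.selector);

// A redesign renames the class .btn-primary to .btn-main; nothing else changes.
const after: Page = before.map((sel) => sel.replace(".btn-primary", ".btn-main"));

console.log(firstBrokenStep(before, steps)); // -1
console.log(firstBrokenStep(after, steps)); // 2: the submit click fails, so later steps never run
```

A one-word change in the site's stylesheet is enough to stop the whole workflow at step three.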

The browse.do Approach: From Scripts to Stories

With browse.do, you stop thinking in selectors and start thinking in objectives. Instead of providing a rigid set of instructions, you tell the AI a story about what you want to accomplish. The agent understands context, handles interruptions, and intelligently finds the elements it needs to complete the task.

Let's see how we can chain actions together into a single, powerful command.

Example: Automating a SaaS Quote Request

Imagine you want to automate the process of getting a quote from a competitor's website. The workflow involves multiple steps: navigating to the pricing page, selecting a plan, filling out a form, and capturing the confirmation.

With traditional tools, this is a complex, multi-part script. With browse.do, it's a single objective.

Here’s how you'd do it with our API:

import { browse } from "@do-inc/agents";

async function getSaaSQuote() {
  const objective = `
    Go to the pricing page, find the "Enterprise" plan, and click 
    the button to get a quote. On the contact form, fill in the 
    following details:
    - Company Name: ACME Industries
    - Number of Employees: 500+
    - Work Email: lead@acme-industries.com
    
    After submitting the form, find the confirmation message and 
    return its text.
  `;

  const result = await browse.do({
    url: "https://fictional-saas-website.com",
    objective: objective,
  });

  // The AI agent returns structured data based on your request.
  console.log(result.data);
  // Example output:
  // {
  //   "confirmationMessage": "Thank you for your request! Our sales team will be in touch shortly."
  // }

  return result.data;
}

getSaaSQuote().catch(console.error);

That’s it. You described the entire multi-step workflow in plain English.
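Because the objective is ordinary text, it can also be generated from structured data instead of being hard-coded. The sketch below shows one way to do that. `QuoteRequest` and `buildQuoteObjective` are hypothetical names, not part of the browse.do API, and the function only assembles a string; its output could be passed as the `objective` field of the `browse.do` call in `getSaaSQuote()`.

```typescript
// Hypothetical input shape for a quote request. Fields are label/value pairs
// so they appear in the objective in a fixed, explicit order.
interface QuoteRequest {
  plan: string;
  fields: Array<[string, string]>;
}

// Assembles a plain-English objective like the one in getSaaSQuote().
// This is string building only; it does not call any API.
function buildQuoteObjective(req: QuoteRequest): string {
  const fieldLines = req.fields.map(([label, value]) => `- ${label}: ${value}`);
  return [
    `Go to the pricing page, find the "${req.plan}" plan, and click the button to get a quote.`,
    "On the contact form, fill in the following details:",
    ...fieldLines,
    "After submitting the form, find the confirmation message and return its text.",
  ].join("\n");
}

// Example: the same values used in the quote-request objective.
const objective = buildQuoteObjective({
  plan: "Enterprise",
  fields: [
    ["Company Name", "ACME Industries"],
    ["Number of Employees", "500+"],
    ["Work Email", "lead@acme-industries.com"],
  ],
});

console.log(objective);
```

Ordered pairs are used instead of a plain object so the field order in the objective always matches the order you list them in.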

How Does It Work? The Power of AI-Powered Navigation

When browse.do receives this objective, it doesn't just look for keywords. Our AI agent performs a series of intelligent actions in a full, stateful headless browser environment:

Understands the Goal: It parses the natural language objective into a sequence of sub-tasks: [Navigate -> Find Plan -> Click -> Fill Form -> Submit -> Extract Confirmation].
Maintains Context: The agent knows that after clicking "Get a Quote," it should expect to be on a new page with a form. It manages sessions and cookies automatically, just like a real browser.
Locates Elements Intelligently: It doesn't need input#company-name. It looks for a form field semantically labeled "Company Name" and interacts with it. This makes the automation resilient to minor UI and code changes.
Handles Dynamic Content: Because it operates in a full browser, it can render JavaScript, wait for SPAs to load, and handle dynamic content that would trip up scrapers that only fetch raw HTML.
Returns Structured Data: Finally, it understands that the goal is to "return the text" of the "confirmation message" and structures the output into a clean, predictable JSON object.
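Several of the ideas above can be illustrated in a few lines of TypeScript. This is an illustrative model only, not browse.do's implementation: `SubTask`, `findByLabel`, `run`, and the in-memory `MockPage` are hypothetical, and the mock covers only the form portion of the workflow. It shows the objective as an ordered list of sub-tasks, fields located by their visible label rather than an id such as `input#company-name`, and extracted values gathered into a structured result. The single page object shared across steps stands in for the browser state a real agent would carry.

```typescript
// Illustrative model only; none of these types describe browse.do internals.

// A form field as an agent might perceive it: a visible label plus an id
// that is an implementation detail and may change between releases.
interface Field {
  id: string;
  label: string;
  value?: string;
}

// In-memory stand-in for the contact-form page and its state.
interface MockPage {
  fields: Field[];
  confirmation?: string;
}

// The sub-tasks an objective might be decomposed into (form portion only).
type SubTask =
  | { kind: "fill"; label: string; value: string }
  | { kind: "submit" }
  | { kind: "extract"; key: string };

// Normalizes label text so "Company name:" and "Company Name" compare equal.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Locates a field by what it says on screen, ignoring its id entirely.
function findByLabel(page: MockPage, label: string): Field | undefined {
  const wanted = normalize(label);
  return page.fields.filter((f) => normalize(f.label) === wanted)[0];
}

// Executes sub-tasks in order against the same page object, so state from
// earlier steps (filled values) is visible to later ones, and collects
// extracted values into a structured result.
function run(page: MockPage, plan: SubTask[]): Record<string, string> {
  const result: Record<string, string> = {};
  for (const task of plan) {
    switch (task.kind) {
      case "fill": {
        const field = findByLabel(page, task.label);
        if (!field) {
          throw new Error(`No field labeled "${task.label}"`);
        }
        field.value = task.value;
        break;
      }
      case "submit":
        // Mock behavior: the form accepts the submission only if every field is filled.
        if (page.fields.every((f) => f.value !== undefined)) {
          page.confirmation = "Thank you for your request!";
        }
        break;
      case "extract":
        // Mock behavior: the only extractable value on this page is the confirmation.
        result[task.key] = page.confirmation ?? "";
        break;
    }
  }
  return result;
}

// The ids are arbitrary; only the labels matter to findByLabel.
const page: MockPage = {
  fields: [
    { id: "f-91a", label: "Company name:" },
    { id: "f-7c2", label: "Number of employees" },
    { id: "f-03d", label: "Work email" },
  ],
};

const plan: SubTask[] = [
  { kind: "fill", label: "Company Name", value: "ACME Industries" },
  { kind: "fill", label: "Number of Employees", value: "500+" },
  { kind: "fill", label: "Work Email", value: "lead@acme-industries.com" },
  { kind: "submit" },
  { kind: "extract", key: "confirmationMessage" },
];

console.log(run(page, plan));
```

Because `findByLabel` never looks at `id`, changing `f-91a` to any other value leaves the run unaffected. Renaming the visible label to something unrelated would still break this exact-match sketch; tolerating that kind of change is where a language-model agent goes further than simple string normalization.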

By chaining actions within a single, descriptive objective, you're not just automating a task—you're encapsulating an entire human workflow into one simple, robust API call. This is the future of Robotic Process Automation (RPA) and data extraction.

Ready to stop wrestling with brittle selectors and start building powerful web automation that just works?