Wrestling with raw HTML is a rite of passage for many developers. Whether you're building a data pipeline, a monitoring tool, or an automation script, the first step is always the messiest: diving into the chaotic world of nested divs, cryptic class names, and inconsistent DOM structures. Traditional web scraping feels less like engineering and more like digital archaeology—painstakingly digging for data and hoping the site's structure doesn't shift and bury your work overnight.
What if you could skip the digging? What if you could simply ask a website for the information you want and get a clean, perfectly structured JSON object in return?
This isn't a far-off dream; it's the core principle behind browse.do. Our AI-powered agent is designed to be your intelligent intermediary, transforming the unstructured web into developer-friendly data. Let's explore how we turn messy HTML into clean JSON, automatically.
For years, web data extraction has followed a rigid and fragile pattern: inspect the page's DOM by hand, hunt down the right CSS or XPath selectors, write parsing code around them, and then patch everything again when the site's markup changes.
This process is tedious, error-prone, and creates a significant maintenance burden. Your code becomes tightly coupled to the website's presentation layer, making it brittle by design.
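To make the coupling concrete, here is a minimal sketch of the selector-driven approach. The markup shapes below are illustrative (real scrapers would use a proper HTML parser rather than a regex), but the failure mode is exactly what happens in practice:

```typescript
// A sketch of the traditional approach: extract the top Hacker News
// title by matching the markup directly. The regex is tightly coupled
// to one specific shape of HTML, e.g. <a class="titlelink" href="...">.
function extractTopTitle(html: string): string | null {
  const match = html.match(/<a class="titlelink" href="[^"]*">([^<]+)<\/a>/);
  return match ? match[1] : null;
}

// Works against today's markup...
const html = '<a class="titlelink" href="https://example.com">Some Story</a>';
console.log(extractTopTitle(html)); // "Some Story"

// ...but a single renamed class returns null and silently breaks the pipeline.
const redesigned = '<a class="story-title" href="https://example.com">Some Story</a>';
console.log(extractTopTitle(redesigned)); // null
```

Nothing in the scraper's output tells you the site changed; the data just stops arriving, and the debugging session begins.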
browse.do throws out the old playbook. Instead of telling the machine how to find the data with selectors, you simply tell it what data you want in plain English.
Our AI agent handles the "how."
Consider this simple task: getting the top story from Hacker News. With a traditional scraper, you'd be hunting for the right <tr> and a.titlelink. With browse.do, the process is as simple as writing a function call:
import { browse } from "@do-inc/agents";

async function getTopHackerNewsStory() {
  const result = await browse.do({
    url: "https://news.ycombinator.com",
    objective: "Find the title of the top story and its URL."
  });

  // The magic is in result.data
  console.log(result.data);
  // Expected output:
  // {
  //   "title": "Some Fascinating Tech Story",
  //   "url": "https://example.com/fascinating-story"
  // }

  return result.data;
}

getTopHackerNewsStory();
Notice what's happening here. The objective, "Find the title of the top story and its URL", is a high-level instruction. The agent, running in a full headless browser environment, interprets this goal and produces a structured JSON object. You didn't write a single selector.
So, how does the agent turn that simple sentence into a clean JSON object? It's a combination of semantic understanding, contextual awareness, and intelligent inference.
A human looking at Hacker News immediately understands the concept of a "top story." It's the one at the top, it's numbered '1', and it's stylistically presented as the main item. Our AI agent mimics this human intuition. It analyzes the page holistically, considering the item's position on the page, its visible rank marker, its visual prominence, and the semantic structure of the surrounding markup.
Because it understands the meaning, it's not reliant on a fragile id or class. If the site's CSS changes but the semantic structure remains, browse.do adapts.
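A toy illustration of the difference (this is not the agent's actual mechanism, just a stand-in for extracting by meaning): keying off the visible "1." rank marker instead of a class name means a pure CSS rename changes nothing, because the text content the extraction depends on is still there.

```typescript
// Illustrative only: extract the top item by its rank marker ("1.")
// rather than by any class or id, so a CSS rename has no effect.
function topItemByRank(html: string): string | null {
  // Strip tags, collapse whitespace, then read the text between "1." and "2.".
  const text = html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ");
  const match = text.match(/\b1\.\s+(.+?)\s+2\./);
  return match ? match[1].trim() : null;
}

const before =
  '<span class="rank">1.</span><a class="titlelink">First Story</a>' +
  '<span class="rank">2.</span><a class="titlelink">Second Story</a>';

// Same page after a hypothetical redesign: every class renamed.
const after =
  '<span class="item-rank">1.</span><a class="item-title">First Story</a>' +
  '<span class="item-rank">2.</span><a class="item-title">Second Story</a>';

console.log(topItemByRank(before)); // "First Story"
console.log(topItemByRank(after));  // "First Story"
```

The regex-on-text trick is far cruder than what a real agent does, but it captures the principle: extraction anchored to meaning survives changes that kill extraction anchored to presentation.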
The most powerful feature is the agent's ability to infer the desired JSON structure directly from your request.
Let's imagine a more complex workflow on an e-commerce site.
Objective: "On this product page, get the product name, the price, and a list of the 3 most recent customer reviews. For each review, I need the reviewer's name and their star rating."
The result.data from browse.do would look something like this, created automatically:
{
  "productName": "High-Performance Wireless Mouse",
  "price": "$79.99",
  "customerReviews": [
    {
      "reviewerName": "Alice",
      "starRating": 5
    },
    {
      "reviewerName": "Bob",
      "starRating": 4
    },
    {
      "reviewerName": "Charlie",
      "starRating": 5
    }
  ]
}
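On the consuming side, it pays to pin that inferred shape down. The field names below simply mirror the sample response for this particular objective (they are not a fixed browse.do schema), so a small type and runtime guard let you trust the data downstream:

```typescript
// Field names mirror the sample response above; treat them as an
// assumption about this particular objective, not a fixed schema.
interface CustomerReview {
  reviewerName: string;
  starRating: number;
}

interface ProductData {
  productName: string;
  price: string;
  customerReviews: CustomerReview[];
}

// Narrow unknown JSON to ProductData before using it downstream.
function isProductData(value: unknown): value is ProductData {
  const v = value as ProductData;
  return (
    typeof v?.productName === "string" &&
    typeof v?.price === "string" &&
    Array.isArray(v?.customerReviews) &&
    v.customerReviews.every(
      (r) => typeof r?.reviewerName === "string" && typeof r?.starRating === "number"
    )
  );
}

const sample: unknown = {
  productName: "High-Performance Wireless Mouse",
  price: "$79.99",
  customerReviews: [{ reviewerName: "Alice", starRating: 5 }],
};

console.log(isProductData(sample));            // true
console.log(isProductData({ productName: 1 })); // false
```

One guard at the boundary is still far less code than the element-by-element parsing and stitching it replaces.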
This automatic structuring is the key. You save countless lines of code that would otherwise be spent parsing disparate elements and painstakingly stitching them together into a coherent object.
By moving from imperative selectors to declarative objectives, browse.do offers a fundamentally better approach to web automation and data extraction.
Stop wrestling with HTML. Start working with clean, structured data.
Ready to turn any website into a clean API? Try browse.do today and experience the future of web automation.