How to Use Browserless with CrewAI

CrewAI agents are great at automating tasks, but API requests can quickly rack up costs when they need live web data. Browserless solves this by letting agents browse the web directly, reducing token usage and avoiding unnecessary API calls.

This guide will show you how to connect CrewAI with Browserless, reuse sessions to save resources, and optimize scraping with BrowserQL for faster, more efficient data collection.

Setting Up Browserless with CrewAI

Why Browserless Enhances CrewAI

CrewAI agents often need to access live web data for research, fact-checking, and scraping real-time information. While large language models (LLMs) are great for processing existing knowledge, they don’t have built-in browsing capabilities. That’s where Browserless can help, as it allows CrewAI agents to interact with web pages just like humans, gathering fresh, up-to-date information from the Internet.

Let's walk through building a few AI agents that scrape eBay listings and turn the results into a market report.

Step 1: Set Up & Install Required Dependencies

To start building a web-scraping AI crew, install CrewAI itself along with the Python dependencies that support agent execution, tool integration, and API access:


pip install crewai pydantic requests python-dotenv PyYAML

Step 2: Create a New CrewAI Project

CrewAI has a CLI that scaffolds a project and sets up the recommended structure: configs, source folders, tasks, agents, tools, etc.

Run the following:


crewai create crew ebay-scraper-crew

This will prompt you to:

  • Select a language model provider (e.g., openai)
  • Choose a model (like gpt-4 or gpt-4o)
  • Input your API key (for OpenAI or other providers)

Step 3: Set the OpenAI Provider and Model

To enable your agents to generate intelligent reports and scrape live websites, you'll need to connect both OpenAI (for the LLM) and Browserless (for headless web automation). CrewAI uses .env to manage these secrets securely and simply.

Set Up OpenAI Access

When initializing your CrewAI project, you’ll be prompted to select your LLM provider and model:


1. gpt-4
2. gpt-4o
3. gpt-4o-mini

Choose a model based on your needs:

  • gpt-4: Best for detailed reasoning and complex logic.
  • gpt-4o: Cheaper and faster, great for most use cases.
  • gpt-4o-mini: Lowest cost; fine for simple extraction and summarization tasks.

After selecting a model, you’ll be prompted to enter your OpenAI API Key. This key is automatically saved into your local .env file:


OPENAI_API_KEY=sk-xxxxxx…

CrewAI will read this key automatically via environment variables; no extra config is needed.

Sign Up for Browserless

Head to browserless.io and create a free account. Once inside the dashboard, you’ll find your Browserless API Key.

Add it to your .env file like this:


BROWSERLESS_API_KEY=YOUR_BROWSERLESS_API_KEY

This token allows your scraping tool to send headless browsing commands using the Browserless BQL API.
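
Since python-dotenv was installed in Step 1, here's a minimal sketch of loading both keys at runtime (it assumes your .env file sits in the project root):


import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

openai_key = os.getenv("OPENAI_API_KEY")
browserless_key = os.getenv("BROWSERLESS_API_KEY")
assert browserless_key, "BROWSERLESS_API_KEY is missing from .env"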

Step 4: Configure Your Custom Tool

To enable real-time eBay scraping, you need to create a custom CrewAI tool that sends BQL (Browserless Query Language) requests to Browserless.

You have two main options: either return cleaned HTML for your agent to parse, or write code that extracts just the product data yourself. It's a tradeoff between flexibility and token cost. For this example, we'll use manual parsing for cost efficiency.

This example will navigate to eBay, extract product data (titles, prices, and links), and return structured results.

You’ll configure this inside custom_tool.py.


import os
import json
import requests
from typing import Type
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
from dotenv import load_dotenv

load_dotenv()  # pull BROWSERLESS_API_KEY in from your .env file

# Define input schema — the tool requires a string argument for compatibility
class MyCustomToolInput(BaseModel):
    argument: str = Field(..., description="Search term for eBay (currently unused but required).")

# Define the custom CrewAI scraping tool
class MyCustomTool(BaseTool):
    name: str = "EbayScraperTool"
    description: str = (
        "This tool scrapes eBay for iPhone listings using Browserless BQL, "
        "returning a JSON array of product title, price, and link."
    )
    args_schema: Type[BaseModel] = MyCustomToolInput

    def _run(self, argument: str) -> str:
        # Browserless BQL endpoint
        endpoint = "https://production-sfo.browserless.io/chrome/bql"

        # Query string includes your Browserless token and scraping behavior
        query_string = {
            "token": "YOUR_BROWSERLESS_API_KEY", Replace with your own token or load from env
            "proxy": "residential",
            "proxySticky": "true",
            "proxyCountry": "us",
            "humanlike": "true",
        }

        headers = {
            "Content-Type": "application/json",
        }

        # This BQL query visits eBay and maps over product listings
        payload = {
            "query": """mutation ScrapeEbayProductListings {
  goto(url: "https://www.ebay.com/sch/i.html?_nkw=iphone", waitUntil: firstContentfulPaint) {
    status
  }
  waitForTimeout(time: 3000) {
    time
  }
  listings: mapSelector(selector: "li.s-item") {
    title: mapSelector(selector: ".s-item__title", wait: true) {
      text: innerText
    }
    price: mapSelector(selector: ".s-item__price", wait: true) {
      amount: innerText
    }
    link: mapSelector(selector: ".s-item__link") {
      url: attribute(name: "href") {
        value
      }
    }
  }
}
""",
            "operationName": "ScrapeEbayProductListings",
        }

        # Make the actual request to Browserless
        response = requests.post(endpoint, params=query_string, headers=headers, json=payload)

        # Handle error cases
        if response.status_code != 200:
            return f"Failed to scrape eBay. Status code: {response.status_code}"

        # Parse and simplify the JSON output
        data = response.json()
        raw_listings = data.get("data", {}).get("listings", [])[:10]  # Limit to 10 items for brevity

        simplified = []
        for item in raw_listings:
            title = item.get("title", [{}])[0].get("text")
            price = item.get("price", [{}])[0].get("amount")
            link = item.get("link", [{}])[0].get("url", {}).get("value")
            if title and price and link:
                simplified.append({
                    "title": title,
                    "price": price,
                    "link": link
                })

        # Return as a formatted JSON string
        return json.dumps(simplified, indent=2)

What This Is Doing

  • BrowserQL: Sends a GraphQL mutation to a headless browser instance to load eBay, wait for the page to render, and extract product details.
  • Data Selection: Targets specific CSS selectors (like .s-item__title, .s-item__price) to extract text content.
  • Humanlike Behavior: The humanlike: true setting reduces bot detection risks.
  • Proxy Rotation: Ensures requests mimic real traffic from US-based residential IPs.
  • Data Return: The tool limits results to 10 listings and formats them as clean, readable JSON for downstream analysis.
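
Before wiring the tool into a crew, it's worth sanity-checking it in isolation. A quick local test sketch, calling the tool's _run method directly (it assumes BROWSERLESS_API_KEY is set in your .env):


from scraping_ebay_crew.tools.custom_tool import MyCustomTool

tool = MyCustomTool()
print(tool._run(argument="iphone"))  # prints a JSON array of up to 10 listings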

Step 5: Update Config Files

To enable the scraping and reporting flow, you need to define two agents and two tasks in CrewAI’s configuration system. These go inside agents.yaml and tasks.yaml under the config/ directory.

CrewAI reads from these files to understand what each agent does and how tasks are orchestrated, making this a key step in modularizing your crew’s logic.

Define Your Agents

This file sets up two specialized agents: a scraper and a market analyst. Both are given clear roles, goals, and backstories that guide how CrewAI prompts the LLM to behave.


# config/agents.yaml

scraper:
  role: >
    eBay Product Data Scraper
  goal: >
    Scrape product titles and prices from eBay for the given {query}
  backstory: >
    You are a browser automation expert skilled in extracting structured data from real-time websites.
    You specialize in bypassing front-end complexity to quickly gather product titles and pricing
    information from pages like eBay using headless browsers and modern scraping tools.

report_generator:
  role: >
    eBay Product Market Analyst
  goal: >
    Analyze eBay product data to extract trends, price insights, and generate a strategic markdown report.
  backstory: >
    You are a data-savvy product analyst specializing in e-commerce marketplaces like eBay.
    Your expertise lies in identifying pricing trends, common product variants, and market opportunities.
    You transform raw scraped data into actionable insights — such as average prices per iPhone model,
    standout deals, and patterns that could inform buying, selling, or marketing strategies.
    Your reports are structured, concise, and useful to product managers or resellers looking to make data-driven decisions.

These agents are reusable across different tasks and can be swapped or extended without touching core logic.

Define the Workflow

Now, assign each agent a specific task in the flow. The scraper gathers data from eBay, and the report generator creates a markdown report with grouped insights, trends, and business takeaways.


# config/tasks.yaml

scrape_task:
  description: >
    Visit eBay, search for {query}, and scrape product titles and prices using Browserless.
    Return a structured JSON block with at least 10 product results.
  expected_output: >
    A JSON array of product objects, each including a title and price field extracted from eBay.
  agent: scraper

report_task:
  description: >
    Using the eBay scrape results, generate a comprehensive markdown report that summarizes key product listings,
    groups data by iPhone model, highlights average prices, identifies standout deals or pricing outliers,
    and extracts any notable trends or opportunities. 

    Include a professional title, executive summary, insights section with markdown tables, and a conclusion that offers
    actionable suggestions (e.g., best model to resell, models with highest demand, etc.).

  expected_output: >
    A structured markdown report with the following sections:
    - A clear title and brief intro
    - A table grouped by iPhone model with average prices and count of listings
    - A section highlighting best deals or pricing trends
    - A business insight or recommendation summary based on the data
  agent: report_generator
  context:
    - scrape_task
  output_file: output/ebay_report.md

What This Is Doing

  • agents.yaml sets up domain-specific personas for scraping and analysis.
  • tasks.yaml defines a sequential pipeline: scrape first, then analyze and report.
  • The {query} parameter (like "iphone") is injected when running the crew, keeping things flexible.

Step 6: Connect Agents & Tasks in Code

Now that we’ve defined our agents and tasks in YAML, we can wire them into a CrewAI Python project by creating the execution logic.

This happens in two core files:

  • crew.py: The brain of your project; it connects agents, tasks, and their execution order.
  • main.py: The local entry point to run, test, and manage your crew.

Define the Crew Logic

This is where we register our agents and tasks, and link them into a coherent pipeline using CrewAI’s decorators.


# src/scraping_ebay_crew/crew.py

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from scraping_ebay_crew.tools.custom_tool import MyCustomTool  # Custom Browserless scraper tool

@CrewBase
class ScrapingEbayCrew():
    """ScrapingEbayCrew crew"""

    agents_config = 'config/agents.yaml'
    tasks_config = 'config/tasks.yaml'

    @agent
    def scraper(self) -> Agent:
        return Agent(
            config=self.agents_config['scraper'],
            verbose=True,
            tools=[MyCustomTool()]  # 🛠️ Attach scraper tool
        )

    @agent
    def report_generator(self) -> Agent:
        return Agent(
            config=self.agents_config['report_generator'],
            verbose=True
        )

    @task
    def scrape_task(self) -> Task:
        return Task(
            config=self.tasks_config['scrape_task']
        )

    @task
    def report_task(self) -> Task:
        return Task(
            config=self.tasks_config['report_task']
        )

    @crew
    def crew(self) -> Crew:
        """Creates and orchestrates the crew pipeline"""
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,  # Scrape first, then report
            verbose=True
        )

What this does:

This class-based setup auto-registers each agent and task using decorators. When .crew().kickoff() is called, it runs the scraping followed by the reporting step.

Launch and Test the Crew

main.py is your execution harness. It defines how to run, train, test, or replay crew sessions using the same project configuration.


# src/scraping_ebay_crew/main.py

#!/usr/bin/env python
import sys
import warnings
from datetime import datetime
from scraping_ebay_crew.crew import ScrapingEbayCrew

warnings.filterwarnings("ignore", category=SyntaxWarning, module="pysbd")

def run():
    """
    Run the crew locally with sample inputs.
    """
    inputs = {
        'query': 'iphone'  # This will fill the {query} variable in tasks.yaml
    }

    try:
        ScrapingEbayCrew().crew().kickoff(inputs=inputs)
    except Exception as e:
        raise Exception(f"An error occurred while running the crew: {e}")

# Additional utility functions for testing and debugging (optional)
def train():
    ...
def replay():
    ...
def test():
    ...

Step 7: Run the Crew & Generate Your Markdown Report

With everything configured (agents, tasks, custom tool, and orchestration logic), it's time to run your CrewAI pipeline and generate actionable output.

In your project root, simply run:


crewai run

What This Is Doing

This command executes your multi-agent system using the main.py file under the hood. It kicks off the crew defined in crew.py, passing in the query input ('iphone' in our case), which flows into your YAML-defined task templates.
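
If you'd rather skip the CLI, you can also call the run() helper from main.py yourself; a quick sketch, assuming the scaffold has installed your project package into the environment:


# equivalent local invocation, bypassing the crewai CLI
from scraping_ebay_crew.main import run

run()  # kicks off the crew with the sample 'iphone' query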

Output: Strategic eBay Markdown Report

Once the crew finishes execution, you’ll find a generated report under:


output/ebay_report.md

Here’s a sample of what that output looks like:

eBay iPhone Listings Markdown Report


## Executive Summary

This report provides a detailed markdown analysis of iPhone listings on eBay. The main purpose of this report is to summarize key listings, highlight average prices, identify standout deals and pricing outliers, and present any notable trends or market opportunities.

## iPhone Listing Analysis

Here's a summary of our product listing analysis, grouped by iPhone model:

---

#### iPhone Models

| Model | Count | Average Price |
|---|---|---|
| iPhone 8 | 1 | $174.99 |
| iPhone SE 2nd Gen | 1 | $149.99 |
| iPhone SE 3rd Gen | 1 | $194.99 |
| iPhone 11 Pro Max | 1 | $359.99 |
...

It includes:

  • Executive summary explaining purpose and scope
  • Dynamic table of iPhone models, average prices, and listing counts
  • Pricing analysis section highlighting low-cost options and high-value models
  • Business insights suggesting actionable decisions for buyers or resellers

This example showcases just a fraction of what's possible with CrewAI and Browserless. While we’ve used two agents, a scraper and a market analyst, you can easily expand the crew with additional roles like a pricing strategist, competitor tracker, or even a seller recommendation engine.

You’re also not locked into OpenAI; models like DeepSeek, Mistral, or localized options via Ollama can be plugged in with minor adjustments, giving you full control over privacy, speed, and compute cost. This modularity makes CrewAI an ideal foundation for real-time market monitoring, lead generation, or custom research workflows.

What makes this setup efficient is Browserless’s BQL (Browserless Query Language). It abstracts away all the complexities of navigating dynamic, JavaScript-heavy sites and handling bot detection; there's no need for messy Selenium scripts or manual DOM parsing.

BQL acts like a headless browser API with built-in resilience. It lets you write declarative, readable queries while it handles CAPTCHAs, waits for content, and mimics human-like behavior. This means less boilerplate, fewer bugs, and more focus on crafting agent logic that delivers real, structured insight.

Optimizing CrewAI Agents with Browserless Reconnect

When working with CrewAI agents and Browserless, spinning up a brand-new browser session for every task isn't just wasteful; it slows things down and racks up unnecessary API usage.

Thankfully, Browserless lets us create persistent sessions so agents can reconnect and reuse the same browser instance across multiple steps. That means less loading, fewer tokens, and smoother multi-step scraping workflows.

To start, you’ll just run a simple startSession mutation. It gives you a sessionId that you can stash and reuse whenever needed:


mutation {
  startSession { id }
}

Once you’ve got that session ID, it’s as easy as passing it into your next query. Just add it as a parameter like this:


query_string = {
  "token": "YOUR_BROWSERLESS_API_KEY",
  "sessionId": self.session_id,  # Reuse instead of reloading!
  ...
}

Adding session reuse to your custom_tool.py makes a big difference. Your agent can now click, scroll, or scrape multiple pages without relaunching the browser every time. This is perfect for pagination, follow-up actions, or making your tools feel faster and more reliable without bloated logic or extra complexity. If you want to go further, you could even persist the session ID across tasks or store it in a temporary file to make reuse seamless.
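
Putting the pieces together, here's a sketch of what session reuse could look like in Python, following the startSession mutation above (it assumes the response body mirrors the mutation's shape):


import os
import requests
from dotenv import load_dotenv

load_dotenv()
ENDPOINT = "https://production-sfo.browserless.io/chrome/bql"
TOKEN = os.getenv("BROWSERLESS_API_KEY")

def start_session() -> str:
    """Start a persistent browser session and return its id."""
    resp = requests.post(
        ENDPOINT,
        params={"token": TOKEN},
        json={"query": "mutation { startSession { id } }"},
    )
    resp.raise_for_status()
    return resp.json()["data"]["startSession"]["id"]

session_id = start_session()

# Later queries pass the same sessionId to reuse the browser instance
query_string = {"token": TOKEN, "sessionId": session_id}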

Reducing Token Usage When Agents Scrape the Web

Scraping with CrewAI agents is powerful, but pulling in unnecessary resources like images, media, or third-party scripts can slow things down and waste tokens.

The Browserless reject mutation gives you a clean way to reduce bloat. You can block specific resource types, like images or videos, before they load, keeping sessions lightweight and focused on the data you care about.

Here’s a simple BQL example that tells Browserless to reject image and media requests before hitting the page:


mutation RejectAssets {
  reject(type: [image, media]) {
    enabled
    time
  }
  goto(url: "https://www.ebay.com/sch/i.html?_nkw=iphone", waitUntil: firstContentfulPaint) {
    status
    time
  }
}

To integrate this into your custom CrewAI tool, drop the reject block into your GraphQL payload. Here's how that part of the mutation looks inside the custom_tool.py logic:


"query": """
  mutation ScrapeEbayProductListings {
    reject(type: [image, media]) {
      enabled
      time
    }
    goto(url: "https://www.ebay.com/sch/i.html?_nkw=iphone", waitUntil: firstContentfulPaint) {
      status
    }
    ...
  }
"""

This tweak keeps your scraping sessions fast and efficient. Paired with session reuse and selective JavaScript execution, it’s one of the easiest ways to minimize load time and reduce your token bill.

Using Proxy Rotation & Smart Browsing

Scraping at scale means dealing with IP bans, CAPTCHAs, and detection systems. You'll get flagged fast if your CrewAI agents hit eBay or similar sites too frequently from a single IP.

That’s where Browserless’s built-in residential proxy support comes in. With just a few config tweaks, you can filter which requests go through the proxy, assign proxies by country, and enable sticky sessions to maintain continuity when needed.

Enabling proxy rotation in Browserless is as simple as passing options in your GraphQL query. Here’s how you’d set it up to route all traffic through a US residential IP:


mutation ScrapeWithProxy {
  proxy(
    type: [document, xhr],
    country: US
    sticky: true
  ) {
    time
  }
  goto(
    url: "https://www.ebay.com/sch/i.html?_nkw=iphone"
    waitUntil: firstContentfulPaint
  ) {
    status
  }
}

Using proxies smartly helps you stay under the radar, reduce request failures, and scrape reliably without wasting time (or tokens) retrying blocked pages. It’s a must-have for any serious CrewAI scraping workflow.

Conclusion

If you've made it this far, you've seen how much Browserless can level up your CrewAI agents. From reusing sessions to rotating proxies, blocking junk assets, and trimming unnecessary JavaScript, it all adds up to faster scrapes, fewer errors, and far less token burn. What's great is that you don't need to reinvent the wheel: Browserless handles the heavy lifting (stealth, optimization, and performance) so you can focus on building smart agents and getting clean, useful data. If you're curious, grab a free Browserless trial and plug it into your CrewAI flows.

FAQ

How does Browserless improve CrewAI's ability to scrape live web data?

Browserless allows CrewAI agents to interact with web pages like humans, bypassing JavaScript rendering issues, bot detection, and CAPTCHAs. This gives agents reliable access to live web content without relying on expensive APIs or outdated datasets.

Can Browserless reduce token usage for CrewAI agents?

Yes, Browserless helps CrewAI agents significantly reduce token consumption by reusing browser sessions, blocking unnecessary assets like images and ads, and batching actions together in a single request. This means fewer tokens are needed to complete scraping tasks efficiently.

What are the best practices for optimizing CrewAI with Browserless?

The best practices include reusing sessions, rotating proxies, limiting JavaScript execution, blocking unwanted page elements, and caching frequently accessed pages. These strategies prevent token waste and improve scraping speed and reliability.

Why is proxy rotation important when using CrewAI and Browserless?

Proxy rotation prevents agents from being flagged or blocked by websites for repeated requests from the same IP. By rotating residential IPs and mimicking human browsing behavior, agents can access more data consistently without getting rate-limited or blacklisted.
