CrewAI agents are great at automating tasks, but API requests can quickly rack up costs when they need live web data. Browserless solves this by letting agents browse the web directly, reducing token usage and avoiding unnecessary API calls.
This guide will show you how to connect CrewAI with Browserless, reuse sessions to save resources, and optimize scraping with BrowserQL for faster, more efficient data collection.
Setting Up Browserless with CrewAI
Why Browserless Enhances CrewAI
CrewAI agents often need to access live web data for research, fact-checking, and scraping real-time information. While large language models (LLMs) are great for processing existing knowledge, they don’t have built-in browsing capabilities. That’s where Browserless can help, as it allows CrewAI agents to interact with web pages just like humans, gathering fresh, up-to-date information from the Internet.
Let's walk through building a small crew of AI agents that scrapes eBay listings and turns them into a market report.
Step 1: Set Up & Installing Required Dependencies
To start building a web-scraping AI crew with CrewAI, you'll install a few Python dependencies (for example, `pip install crewai crewai-tools requests beautifulsoup4 python-dotenv`) to support agent execution, tool integration, and API access.
Step 2: Create a New CrewAI Project
CrewAI has a CLI that scaffolds a project and sets up the recommended structure: configs, source folders, tasks, agents, tools, etc.
Run the scaffolding command, for example `crewai create crew ebay_market_crew`.
This will prompt you to:

- Select a language model provider (e.g., `openai`)
- Choose a model (like `gpt-4` or `gpt-4o`)
- Input your API key (for OpenAI or other providers)
Step 3: Set the OpenAI Provider and Model
To enable your agents to generate intelligent reports and scrape live websites, you'll need to connect both OpenAI (for the LLM) and Browserless (for headless web automation). CrewAI uses .env to manage these secrets securely and simply.
Set Up OpenAI Access
When initializing your CrewAI project, you’ll be prompted to select your LLM provider and model:
Choose a model based on your needs:

- `gpt-4`: Best for detailed reasoning and complex logic.
- `gpt-4o`: Cheaper and faster, great for most use cases.

After selecting a model, you’ll be prompted to enter your OpenAI API key, which is automatically saved into your local `.env` file.
CrewAI will read this key automatically via environment variables; no extra config is needed.
Sign Up for Browserless
Head to browserless.io and create a free account. Once inside the dashboard, you’ll find your Browserless API Key.
Add it to your `.env` file like this:
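Assuming your custom tool reads the token from an environment variable named `BROWSERLESS_API_KEY` (a name invented for this tutorial; `OPENAI_API_KEY` is the variable CrewAI itself expects), the `.env` file ends up looking like:

```shell
OPENAI_API_KEY=sk-your-openai-key           # saved by the CrewAI setup prompt
BROWSERLESS_API_KEY=your-browserless-token  # from the Browserless dashboard
```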
This token allows your scraping tool to send headless browsing commands using the Browserless BQL API.
Step 4: Configure Your Custom Tool
To enable real-time eBay scraping, you need to create a custom CrewAI tool that sends BQL (Browserless Query Language) requests to Browserless.
You have two main options: either extract cleaned HTML for your agent to parse, or write code that extracts just the product data yourself. It's a tradeoff between flexibility and token cost. For this example, we'll use manual parsing for cost efficiency.
This example will navigate to eBay, extract product data (titles, prices, and links), and return structured results.
You’ll configure this inside `custom_tool.py`.
What This Is Doing
- BrowserQL: Sends a GraphQL mutation to a headless browser instance to load eBay, wait for the page to render, and extract product details.
- Data Selection: Targets specific CSS selectors (like `.s-item__title` and `.s-item__price`) to extract text content.
- Humanlike Behavior: The `humanlike: true` setting reduces bot detection risks.
- Proxy Rotation: Ensures requests mimic real traffic from US-based residential IPs.
- Data Return: The tool limits output to 10 listings and formats them as clean, readable JSON for downstream analysis.
Step 5: Update Config Files
To enable the scraping and reporting flow, you need to define two agents and two tasks in CrewAI’s configuration system. These go inside `agents.yaml` and `tasks.yaml` under the `config/` directory.
CrewAI reads from these files to understand what each agent does and how tasks are orchestrated, making this a key step in modularizing your crew’s logic.
Define Your Agents
This file sets up two specialized agents: a scraper and a market analyst. Both are given clear roles, goals, and backstories that guide how CrewAI prompts the LLM to behave.
These agents are reusable across different tasks and can be swapped or extended without touching core logic.
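CrewAI's agent config maps each agent name to a `role`, `goal`, and `backstory`. A sketch of what `agents.yaml` might look like (the wording and the `scraper`/`market_analyst` names are illustrative, but `{query}` must match the input you pass at kickoff):

```yaml
scraper:
  role: >
    eBay Listings Scraper
  goal: >
    Collect live eBay listings for {query} using the Browserless scraping tool
  backstory: >
    A meticulous data collector who knows how to pull clean, structured
    listings out of noisy marketplace pages.

market_analyst:
  role: >
    eBay Market Analyst
  goal: >
    Turn raw listing data for {query} into pricing insights and business takeaways
  backstory: >
    A pragmatic analyst who spots pricing trends and summarizes them
    clearly for buyers and resellers.
```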
Define the Workflow
Now, assign each agent a specific task in the flow. The scraper gathers data from eBay, and the report generator creates a markdown report with grouped insights, trends, and business takeaways.
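A matching sketch for `tasks.yaml`: each task needs a `description`, an `expected_output`, and the `agent` that runs it, and `output_file` is an optional way to write the final markdown to disk (names here are illustrative):

```yaml
scrape_task:
  description: >
    Scrape eBay for current {query} listings and return the raw results.
  expected_output: >
    A JSON list of up to 10 listings, each with a title, price, and link.
  agent: scraper

report_task:
  description: >
    Analyze the scraped {query} listings and group them into a markdown
    report with trends and business takeaways.
  expected_output: >
    A markdown report with a summary table, pricing analysis, and insights.
  agent: market_analyst
  output_file: report.md
```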
What This Is Doing
- `agents.yaml` sets up domain-specific personas for scraping and analysis.
- `tasks.yaml` defines a sequential pipeline: scrape first, then analyze and report.
- The `{query}` parameter (like `"iphone"`) is injected when running the crew, keeping things flexible.
Step 6: Connect Agents & Tasks in Code
Now that we’ve defined our agents and tasks in YAML, we can wire them into a CrewAI Python project by creating the execution logic.
This happens in two core files:

- `crew.py`: The brain of your project; it connects agents, tasks, and their execution order.
- `main.py`: The local entry point to run, test, and manage your crew.
Define the Crew Logic
This is where we register our agents and tasks, and link them into a coherent pipeline using CrewAI’s decorators.
What this does:

This class-based setup auto-registers each agent and task using decorators. When `.crew().kickoff()` is called, it runs the scraping step followed by the reporting step.
Launch and Test the Crew
`main.py` is your execution harness. It defines how to run, train, test, or replay crew sessions using the same project configuration.
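A minimal sketch of the run entry point (the scaffold also generates `train`, `test`, and `replay` variants). Here `ebay_crew` is a hypothetical package name, and the crew import is deferred into `run()` so the input helper stays independently testable:

```python
import sys


def build_inputs(argv: list) -> dict:
    """Pick the search query off the command line, defaulting to 'iphone'."""
    return {"query": argv[1] if len(argv) > 1 else "iphone"}


def run() -> None:
    # Hypothetical package path from the scaffold; adjust to your project name.
    from ebay_crew.crew import EbayCrew

    EbayCrew().crew().kickoff(inputs=build_inputs(sys.argv))
```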
Step 7: Run the Crew & Generate Your Markdown Report
With everything configured, agents, tasks, custom tool, and orchestration logic, it’s time to run your CrewAI pipeline and generate actionable output.
In your project root, simply run `crewai run`.
What This Is Doing
This command executes your multi-agent system using the `main.py` file under the hood. It kicks off the crew defined in `crew.py`, passing in the `query` input (`'iphone'` in our case), which flows into your YAML-defined task templates.
Output: Strategic eBay Markdown Report
Once the crew finishes execution, you’ll find the generated markdown report in the output location configured in `tasks.yaml`.
Here’s a sample of what that output looks like:
eBay iPhone Listings Markdown Report
It includes:
- Executive summary explaining purpose and scope
- Dynamic table of iPhone models, average prices, and listing counts
- Pricing analysis section highlighting low-cost options and high-value models
- Business insights suggesting actionable decisions for buyers or resellers
This example showcases just a fraction of what's possible with CrewAI and Browserless. While we’ve used two agents, a scraper and a market analyst, you can easily expand the crew with additional roles like a pricing strategist, competitor tracker, or even a seller recommendation engine.
You’re also not locked into OpenAI; models like DeepSeek, Mistral, or localized options via Ollama can be plugged in with minor adjustments, giving you full control over privacy, speed, and compute cost. This modularity makes CrewAI an ideal foundation for real-time market monitoring, lead generation, or custom research workflows.
What makes this setup efficient is Browserless’s BQL (Browserless Query Language). It abstracts away all the complexities of navigating dynamic, JavaScript-heavy sites and handling bot detection, no need for messy Selenium scripts or manual DOM parsing.
BQL acts like a headless browser API with built-in resilience. It lets you write declarative, readable queries while it handles CAPTCHAs, waits for content, and mimics human-like behavior. That means less boilerplate, fewer bugs, and more focus on crafting agent logic that delivers real, structured insight.
Optimizing CrewAI Agents with Browserless Reconnect
When working with CrewAI agents and Browserless, spinning up a brand-new browser session for every task isn't just wasteful; it can slow things down and rack up unnecessary API usage.
Thankfully, Browserless lets us create persistent sessions so agents can reconnect and reuse the same browser instance across multiple steps. That means less loading, fewer tokens, and smoother multi-step scraping workflows.
To start, you’ll just run a simple `startSession` mutation. It gives you a `sessionId` that you can stash and reuse whenever needed. Once you’ve got that session ID, it’s as easy as passing it into your next query as a parameter:
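In Python, that flow might look like the sketch below. The `startSession`/`sessionId` names follow the description above, and reusing the session via a `sessionId` query parameter is an assumption; double-check both against the current BQL schema before relying on them.

```python
import os
from typing import Optional

import requests

BQL_URL = "https://production-sfo.browserless.io/chromium/bql"

# Assumption: mutation and field names as described above.
START_SESSION = "mutation { startSession { sessionId } }"


def build_request(query: str, token: str, session_id: Optional[str] = None):
    """Assemble the URL, params, and JSON body for a BQL POST."""
    params = {"token": token}
    if session_id is not None:
        params["sessionId"] = session_id  # reuse an existing browser session
    return BQL_URL, params, {"query": query}


def run_query(query: str, session_id: Optional[str] = None) -> dict:
    url, params, body = build_request(query, os.environ["BROWSERLESS_API_KEY"], session_id)
    resp = requests.post(url, params=params, json=body)
    resp.raise_for_status()
    return resp.json()["data"]


# First call starts a session; later calls pass the sessionId back in:
# session_id = run_query(START_SESSION)["startSession"]["sessionId"]
# run_query('mutation { goto(url: "https://www.ebay.com") { status } }', session_id)
```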
Adding session reuse to your `custom_tool.py` makes a big difference. Your agent can now click, scroll, or scrape multiple pages without relaunching the browser every time. This is perfect for pagination, follow-up actions, and making your tools feel faster and more reliable without bloated logic or extra complexity. If you want to go further, you could persist the session ID across tasks or store it in a temporary file to make reuse seamless.
Reducing Token Usage When Agents Scrape the Web
Scraping with CrewAI agents is powerful, but pulling in unnecessary resources like images, media, or third-party scripts can slow things down and waste tokens.
The Browserless reject mutation gives you a clean way to reduce bloat. You can block specific resource types, like images or videos, before they load, keeping sessions lightweight and focused on the data you care about.
Here’s a simple BQL pattern that tells Browserless to reject image and media requests before hitting the page. To integrate it into your custom CrewAI tool, drop the `reject` block into the GraphQL payload inside your `custom_tool.py` logic:
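Sketched as the mutation string your tool would send; the resource-type enum values and the `enabled` return field are assumptions based on the Browserless docs, so verify them against the current BQL schema:

```python
# Assumption: "reject" and its resource-type enum values match the current
# BQL schema. The reject step runs before goto so the rules apply to the load.
REJECT_AND_SCRAPE = """
mutation ScrapeWithoutMedia($url: String!) {
  reject(type: [image, media]) {
    enabled
  }
  goto(url: $url, waitUntil: networkIdle) {
    status
  }
  html {
    html
  }
}
"""
```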
This tweak keeps your scraping sessions fast and efficient. Paired with session reuse and selective JavaScript execution, it’s one of the easiest ways to minimize load time and reduce your token bill.
Using Proxy Rotation & Smart Browsing
Scraping at scale means dealing with IP bans, CAPTCHAs, and detection systems. You'll get flagged fast if your CrewAI agents hit eBay or similar sites too frequently from a single IP.
That’s where Browserless’s built-in residential proxy support comes in. With just a few config tweaks, you can filter which requests go through the proxy, assign proxies by country, and enable sticky sessions to maintain continuity when needed.
Enabling proxy rotation in Browserless is as simple as passing options in your GraphQL query. Here’s how you’d set it up to route all traffic through a US residential IP:
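One way to express that is a `proxy` step inside the mutation itself. The argument names (`country`, `sticky`, and the request-type filter) are assumptions based on the Browserless docs, so confirm them against the current BQL schema:

```python
# Assumption: the "proxy" mutation and its arguments follow the current BQL
# schema. Filtering to document/xhr keeps static assets off the proxy.
PROXIED_SCRAPE = """
mutation ProxiedScrape($url: String!) {
  proxy(country: US, sticky: true, type: [document, xhr]) {
    time
  }
  goto(url: $url, waitUntil: networkIdle) {
    status
  }
  html {
    html
  }
}
"""
```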
Using proxies smartly helps you stay under the radar, reduce request failures, and scrape reliably without wasting time (or tokens) retrying blocked pages. It’s a must-have for any serious CrewAI scraping workflow.
Conclusion
If you’ve made it this far, you’ve seen how much Browserless can level up your CrewAI agents. From reusing sessions to rotating proxies, blocking junk requests, and trimming unnecessary JavaScript, it all adds up to faster scrapes, fewer errors, and far less token burn. Best of all, you don’t need to reinvent the wheel: Browserless handles the heavy lifting of stealth, optimization, and performance so you can focus on building smart agents and getting clean, useful data. If you’re curious, grab a free Browserless trial and plug it into your CrewAI flows.
FAQ
How does Browserless improve CrewAI's ability to scrape live web data?
Browserless allows CrewAI agents to interact with web pages like humans, bypassing JavaScript rendering issues, bot detection, and CAPTCHAs. This gives agents reliable access to live web content without relying on expensive APIs or outdated datasets.
Can Browserless reduce token usage for CrewAI agents?
Yes, Browserless helps CrewAI agents significantly reduce token consumption by reusing browser sessions, blocking unnecessary assets like images and ads, and batching actions together in a single request. This means fewer tokens are needed to complete scraping tasks efficiently.
What are the best practices for optimizing CrewAI with Browserless?
The best practices include reusing sessions, rotating proxies, limiting JavaScript execution, blocking unwanted page elements, and caching frequently accessed pages. These strategies prevent token waste and improve scraping speed and reliability.
Why is proxy rotation important when using CrewAI and Browserless?
Proxy rotation prevents agents from being flagged or blocked by websites for repeated requests from the same IP. By rotating residential IPs and mimicking human browsing behavior, agents can access more data consistently without getting rate-limited or blacklisted.