Keep Puppeteer browsers alive with our Reconnect API


If you use Puppeteer, you probably know that it opens a fresh Chrome instance every time it runs.

That’s okay, but it does introduce challenges, such as:

  • You’ll need to redo any required logins
  • Any bot detectors will re-check your credentials
  • Cached assets, such as every piece of CSS, are reloaded from scratch each time
  • You can’t easily run a sequence of scripts

That’s why we’ve launched the Reconnect API, which makes it easy to reuse a browser.

Downsides of puppeteer.connect()

Sure, you could use the puppeteer.connect() method. It does allow you to attach Puppeteer to an existing browser instance.

But it requires you to manage ports and browserURLs yourself.

It can work for a small, local deployment, but gets unruly when you’re running thousands of automations in a cloud environment.
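To make the port juggling concrete, here is a minimal sketch of the manual approach. It assumes you have already started Chrome yourself with the --remote-debugging-port flag. The helper name connectToLocalChrome and the default port 9222 are illustrative choices, not part of any Browserless API, and the puppeteer object is passed in so the helper works with either puppeteer or puppeteer-core.

```javascript
// Sketch: attach Puppeteer to a Chrome you started yourself, for example with
//   chrome --remote-debugging-port=9222
// `puppeteer` is passed in by the caller (puppeteer or puppeteer-core).
// The function name and the default port are illustrative assumptions.
async function connectToLocalChrome(puppeteer, port = 9222) {
  // browserURL is the HTTP address of the debugging port; Puppeteer uses it
  // to discover the browser's WebSocket endpoint.
  const browserURL = `http://127.0.0.1:${port}`;
  return puppeteer.connect({ browserURL });
}
```

After importing puppeteer from 'puppeteer-core', you would call connectToLocalChrome(puppeteer, 9222). Every extra browser needs its own port and its own bookkeeping, which is the part that stops scaling.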

A fully revamped API for Browserless v2

Last year we launched Browserless v2, which is more deeply involved at the CDP level. One of the big advantages of this was always going to be improved session management, of which reconnections are the first step.

If you’re familiar with v1, you may have come across the keepAlive function. It involved GraphQL calls and a trackingID, so there was room for improvement.

With the new Reconnect API, minimal coding is required. Here’s a basic example from the docs that sets a timeout, enables reconnection and then reattaches to the same browser:


import puppeteer from 'puppeteer-core';

// Small helper to pause between steps.
const sleep = (ms) => new Promise((res) => setTimeout(res, ms));

// Query string sent with every connection: API token plus overall session timeout (ms).
const queryParams = new URLSearchParams({
  token: 'YOUR_API_KEY',
  timeout: 60000,
}).toString();

(async () => {
  // First connection: open a page and load the site.
  const browser = await puppeteer.connect({
    browserWSEndpoint: `wss://chrome.browserless.io/chromium?${queryParams}`,
  });
  const page = await browser.newPage();
  const cdp = await page.createCDPSession();
  await page.goto('https://www.example.com');

  // Allow this browser to run for 1 minute, then shut down if nothing connects to it.
  // Defaults to the overall timeout set on the instance, which is 5 minutes if not specified.
  const { error, browserWSEndpoint } = await cdp.send('Browserless.reconnect', {
    timeout: 60000,
  });

  if (error) throw error;
  console.log(`${browserWSEndpoint}?${queryParams}`);

  await browser.close();

  // Reconnect using the browserWSEndpoint that was returned from the CDP command.
  const browserReconnect = await puppeteer.connect({
    browserWSEndpoint: `${browserWSEndpoint}?${queryParams}`,
  });

  // The page opened before reconnecting is still there, so reuse it.
  const [pageReconnect] = await browserReconnect.pages();
  await sleep(2000);
  await pageReconnect.screenshot({
    path: 'reconnected.png',
    fullPage: true,
  });
  await browserReconnect.close();
})().catch((e) => {
  console.error(e);
  process.exit(1);
});

This lets a new script connect to the same browser instance until the timeout is reached. In the background, we’re pinging the function to keep it alive, then making sure there are no memory leaks once it closes.
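As a rough sketch of what that second script could look like, the helpers below attach to an endpoint that an earlier script saved somewhere. The names withToken and attachToSession are hypothetical, and how you hand the endpoint from one script to the next (an environment variable, a file, a queue) is left as an assumption.

```javascript
// Hypothetical helper: add the API token to the endpoint returned by
// Browserless.reconnect, whether or not it already has a query string.
function withToken(browserWSEndpoint, token) {
  const url = new URL(browserWSEndpoint);
  url.searchParams.set('token', token);
  return url.toString();
}

// Hypothetical helper: attach a separate script to the browser that an
// earlier script left running. `puppeteer` is passed in by the caller.
async function attachToSession(puppeteer, browserWSEndpoint, token) {
  const browser = await puppeteer.connect({
    browserWSEndpoint: withToken(browserWSEndpoint, token),
  });
  // Reuse the first tab that is already open instead of creating a new one.
  const [page] = await browser.pages();
  return { browser, page };
}
```

A follow-up script might then call attachToSession(puppeteer, process.env.RECONNECT_ENDPOINT, 'YOUR_API_KEY'), where RECONNECT_ENDPOINT is an assumed variable name holding the value logged by the first script.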

You can customize this by setting custom authentication, or by restarting the timer each time Browserless.reconnect is called.
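Restarting the timer could be sketched as a small helper that re-issues the same CDP command used in the example above. The name extendSession and its default timeout are assumptions for illustration. It simply returns whatever endpoint the command reports.

```javascript
// Hypothetical helper: call Browserless.reconnect again to restart the
// keep-alive timer for the browser behind `page`.
async function extendSession(page, timeoutMs = 60000) {
  const cdp = await page.createCDPSession();
  const { error, browserWSEndpoint } = await cdp.send('Browserless.reconnect', {
    timeout: timeoutMs,
  });
  if (error) throw error;
  // Hand the endpoint back so the next script knows where to connect.
  return browserWSEndpoint;
}
```

Each script in a sequence could call extendSession(page) just before it disconnects, so the browser stays available for the next one.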

Using reconnect with our captcha solving or Hybrid automations

Browser reconnections are ideal for use in combination with our other CDP APIs.

With Hybrid Automations you can create human-in-the-loop scripts, where you stream a login window to your user. With the reconnect API, you can maintain this logged-in session for use with multiple automation scripts.

Similarly, getting past bot detectors with our captcha unblocking or solving inevitably adds a delay while the detectors make a decision. You can use the reconnect API to avoid repeating this delay by reusing the same session for tasks such as scraping a set of pages.
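Scraping a set of pages in one session might look something like the sketch below. It assumes you already hold a browser obtained by reconnecting, and the scrapeTitles name and the choice to collect page titles are purely illustrative.

```javascript
// Hypothetical helper: visit several URLs in one already-connected browser,
// so logins and bot-detector checks are not repeated for each page.
async function scrapeTitles(browser, urls) {
  // Reuse an open tab if there is one, otherwise open a new tab.
  const [existing] = await browser.pages();
  const page = existing ?? (await browser.newPage());

  const results = [];
  for (const url of urls) {
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    results.push({ url, title: await page.title() });
  }
  return results;
}
```

Because every URL is loaded in the same tab, cookies and any solved challenges carry over from one page to the next.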

Want to use the reconnect API in your automations?

Due to infrastructure limitations, the reconnect API is only available for our enterprise users.

Please get in touch if you would like to discuss your use case with our team.

