In this article we'll look at managing sessions and their data when using Puppeteer. That includes:
Storing and reusing cookies
Reconnecting to browsers
Using captcha-approved endpoints
Managing the --user-data-dir
These techniques are essential if you're performing complex workflows behind login pages or want to avoid repeating actions such as getting past bot detectors. Because automation libraries start every browser from a clean slate, it's up to you to attach cookies to instances, secure credentials, synchronize localStorage and much more.
So let's dive into how to manage sessions.
Freshly baked cookies
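Currently, one of the most common approaches is to log in once, then save the session data to disk. A typical session-saving script writes the cookies and localStorage to a file after login, and every later run has to read that file back in before doing anything else.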
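A minimal sketch of such a script, assuming Puppeteer's `page.cookies()` and `page.setCookie()` (recent Puppeteer releases also offer browser-level cookie methods) and a session file path of your choosing. `saveSession` and `restoreSession` are illustrative names, not a library API.

```javascript
const fs = require('fs/promises');

// After logging in: dump cookies and localStorage from the page to a JSON file.
async function saveSession(page, file) {
  const cookies = await page.cookies();
  // localStorage lives inside the page, so it is read with page.evaluate().
  const storage = await page.evaluate(() => ({ ...localStorage }));
  await fs.writeFile(file, JSON.stringify({ cookies, storage }));
}

// On a later run: load the file back into a fresh page.
async function restoreSession(page, file, url) {
  const { cookies, storage } = JSON.parse(await fs.readFile(file, 'utf8'));
  // Cookies can be set before navigating, as long as each one carries its domain.
  await page.setCookie(...cookies);
  // localStorage is scoped per origin, so visit the site before writing to it.
  await page.goto(url);
  await page.evaluate((entries) => {
    for (const [key, value] of Object.entries(entries)) {
      localStorage.setItem(key, value);
    }
  }, storage);
}
```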
On top of that, if you have multiple instances, you also need logic to synchronize that data among them. HTTP-only cookies are another trap: page scripts can't read them through document.cookie, so they have to be pulled through the browser's own cookie APIs instead. It can get very messy very quickly.
Luckily, that's a problem we recognized at Browserless, so we built a painkiller that tackles it cleanly and with ease.
Reconnects with Browserless
With Browserless, you can keep the browser session alive and kicking in the background, while your instances connect remotely to run with the existing cookies, localStorage and sessionStorage.
Reconnecting to an existing browser is as easy as sending the Browserless.reconnect command over a CDP session, which returns the WebSocket URL for connecting back to that same browser.
```javascript
const puppeteer = require('puppeteer-core');

const init = async () => {
  const browser = await puppeteer.connect({
    browserWSEndpoint: `wss://chrome.browserless.io/chromium?token=YOUR_TOKEN`,
  });
  const page = await browser.newPage();
  // Browserless-specific commands are sent over a CDP session
  const cdp = await page.createCDPSession();

  /* Your login code */

  // Allow this browser to run for 1 minute, then shut down if nothing connects to it.
  // Defaults to the overall timeout set on the instance, which is 5 minutes if not specified.
  const { browserWSEndpoint } = await cdp.send('Browserless.reconnect', {
    timeout: 60000,
  });

  await browser.close();

  // Use browserWSEndpoint to reconnect to this browser
  return browserWSEndpoint;
};
```
This keeps the session alive in the background for 60000 ms (1 minute), even after all your instances disconnect. It can be loading something or just idling while it waits for new connections; it won't judge you.
Let's look at a real example of staying logged in.
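A minimal sketch of such an example, assuming you have a Hacker News account and that HN's login form uses inputs named `acct` and `pw` plus a submit button with the value `login` (these selectors are assumptions and may change). Unlike the earlier snippet, this `init` takes the Puppeteer module, your token and your credentials as arguments; `loginToHackerNews` is a hypothetical helper name.

```javascript
// Log in to Hacker News on an open page.
// Assumption: the login form uses inputs named "acct"/"pw" and a submit button valued "login".
async function loginToHackerNews(page, username, password) {
  await page.goto('https://news.ycombinator.com/login', { waitUntil: 'domcontentloaded' });
  // page.type() targets the first matching element, which is the login form
  // (the account-creation form further down reuses the same input names).
  await page.type('input[name="acct"]', username);
  await page.type('input[name="pw"]', password);
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'domcontentloaded' }),
    page.click('input[type="submit"][value="login"]'),
  ]);
}

// `puppeteer` is the puppeteer-core module (or anything exposing a compatible connect()).
async function init(puppeteer, token, username, password) {
  const browser = await puppeteer.connect({
    browserWSEndpoint: `wss://chrome.browserless.io/chromium?token=${token}`,
  });
  const page = await browser.newPage();
  const cdp = await page.createCDPSession();
  await loginToHackerNews(page, username, password);
  // Ask Browserless to keep this logged-in browser around for 1 minute.
  const { browserWSEndpoint } = await cdp.send('Browserless.reconnect', { timeout: 60000 });
  await browser.close();
  return browserWSEndpoint;
}
```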
In this example, the init method creates a new connection, logs in to Hacker News and returns the reconnection endpoint. Let's reconnect to this existing browser.
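A sketch of the reconnect step, taking the Puppeteer module as an argument. Two assumptions: that your Browserless setup expects the API token on the reconnect URL (the hypothetical `withToken` helper adds it; skip it if the returned endpoint connects as-is), and that HN renders a `#logout` link only for logged-in users. `reconnectAndCheck` is likewise a hypothetical name.

```javascript
// Hypothetical helper: put the API token on the endpoint returned by Browserless.reconnect
// (assumption: your setup requires it).
function withToken(endpoint, token) {
  const url = new URL(endpoint);
  url.searchParams.set('token', token);
  return url.toString();
}

// Reconnect to the kept-alive browser and check whether Hacker News still sees us as logged in.
// `puppeteer` is the puppeteer-core module or anything exposing a compatible connect().
async function reconnectAndCheck(puppeteer, endpoint, token) {
  const browser = await puppeteer.connect({ browserWSEndpoint: withToken(endpoint, token) });
  const page = await browser.newPage();
  await page.goto('https://news.ycombinator.com/', { waitUntil: 'domcontentloaded' });
  // Assumption: the #logout link only exists for logged-in users.
  const loggedIn = (await page.$('#logout')) !== null;
  // Disconnect this client instead of sending a close command to the browser.
  await browser.disconnect();
  return loggedIn;
}
```

Pass it the endpoint that init returned; since the cookies live in the remote browser, no login code runs on this side.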
There are some things to consider, though. These connections are essentially "new tabs" on a single browser, not isolated instances starting from a clean slate. That brings all the advantages and disadvantages of sharing one browser, such as ad-tracking cookies piling up across sites and every instance or tab being able to "speak" with the others.
Rather than piling all your work and session data onto a single remote browser, we'd advise splitting it across several remote browsers, each holding a small amount of session data.
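A minimal sketch of that split, assuming one kept-alive browser per account or site: a hypothetical `SessionRegistry` maps each key to the endpoint returned by Browserless.reconnect, together with the timeout it was requested with, so an endpoint whose keep-alive window has passed is not handed out again.

```javascript
// Hypothetical registry: one remote browser (reconnect endpoint) per account or site,
// instead of a single browser holding every session.
class SessionRegistry {
  constructor(now = () => Date.now()) {
    this.now = now; // injectable clock, in milliseconds
    this.sessions = new Map();
  }

  // Record an endpoint from Browserless.reconnect and the timeout (ms) it was requested with.
  set(key, endpoint, timeoutMs) {
    this.sessions.set(key, { endpoint, expiresAt: this.now() + timeoutMs });
  }

  // Return the endpoint for `key`, or null if there is none or its keep-alive window has passed.
  get(key) {
    const entry = this.sessions.get(key);
    if (!entry) return null;
    if (this.now() >= entry.expiresAt) {
      this.sessions.delete(key);
      return null;
    }
    return entry.endpoint;
  }
}
```

Whenever you reconnect and call Browserless.reconnect again, store whatever endpoint that latest call returns with `set()`, so the recorded expiry moves forward with it.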
Connecting to a captcha-approved session
If you are working with bot-protected sites, you can take advantage of our /unblock endpoint. It uses a range of advanced tactics to hide automation fingerprints and get past strict bot detectors.
It can then return the approval cookies generated by the captcha, or a WebSocket endpoint for the approved session. Let's try it out.
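A minimal sketch of calling /unblock with the built-in `fetch` of Node 18+. The base URL, the `/unblock` path and the request fields (`url`, `browserWSEndpoint`, `cookies`, `content`, `screenshot`, `ttl`) reflect our understanding of the /unblock API and should be checked against the current Browserless docs; `unblock` itself is a hypothetical helper name.

```javascript
// Ask Browserless to load `targetUrl` past bot detection and hand back the approved session.
// Assumption: the endpoint path and body fields below match your Browserless version.
async function unblock(targetUrl, token, {
  baseUrl = 'https://chrome.browserless.io',
  ttl = 30000, // how long (ms) to keep the unblocked browser alive for a reconnect
  fetchImpl = fetch,
} = {}) {
  const endpoint = new URL('/unblock', baseUrl);
  endpoint.searchParams.set('token', token);
  const res = await fetchImpl(endpoint.toString(), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url: targetUrl,
      browserWSEndpoint: true, // ask for a WebSocket URL to the approved session
      cookies: true,           // ask for the cookies set while getting past the check
      content: false,
      screenshot: false,
      ttl,
    }),
  });
  if (!res.ok) throw new Error(`/unblock failed with HTTP ${res.status}`);
  // Assumption: the JSON body carries `browserWSEndpoint` and `cookies`.
  return res.json();
}
```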
Now you can reconnect to the unblocked session with the returned browserWSEndpoint, just like in the previous example, or plant the returned cookies in new connections.
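For the cookie route, a sketch assuming the cookies returned by /unblock come in a shape that Puppeteer's `page.setCookie()` accepts (an assumption; map or filter the fields if they don't). `openWithCookies` is a hypothetical helper that plants the cookies on a fresh page before navigating.

```javascript
// Open a fresh page that carries the approval cookies from an /unblock call.
async function openWithCookies(browser, cookies, url) {
  const page = await browser.newPage();
  // Set cookies before navigating so the very first request already sends them.
  await page.setCookie(...cookies);
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  return page;
}
```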
Reusing the data directory
Another option for maintaining session configurations is to reuse the user data directory. Browserless supports the --user-data-dir argument, allowing you to create a new browser using an existing data directory. Let's take a look.
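A sketch of building such a connection URL. It assumes Browserless accepts launch flags through a JSON-encoded `launch` query parameter (the style used by Browserless v2) and that `--user-data-dir` passed in `args` is honored; `profileEndpoint` and the `/tmp/hn-profile` directory are hypothetical. Check your Browserless version's docs for the exact parameter.

```javascript
// Build a Browserless WebSocket URL that launches Chrome with a persistent profile directory.
// Assumption: launch flags go in a JSON-encoded `launch` query parameter.
function profileEndpoint(token, userDataDir, base = 'wss://chrome.browserless.io/chromium') {
  const url = new URL(base);
  url.searchParams.set('token', token);
  url.searchParams.set('launch', JSON.stringify({ args: [`--user-data-dir=${userDataDir}`] }));
  return url.toString();
}
```

You would then connect with `puppeteer.connect({ browserWSEndpoint: profileEndpoint('YOUR_TOKEN', '/tmp/hn-profile') })`.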
In this example we specify the --user-data-dir argument, which means that once the browser is closed, the data directory won't be deleted and can be reused for a future connection. Let's create a new browser with the same user data directory.
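A minimal sketch of that second connection, assuming `browserWSEndpoint` is a Browserless URL that launches with the same --user-data-dir both times (how the flag is passed depends on your Browserless version). `runWithProfile`, the `lastRun` key and `https://example.com` are placeholders: each run reads the marker left by the previous run, then leaves a fresh timestamp in localStorage.

```javascript
// Run one short session and return the marker a previous run left in localStorage.
// With the same --user-data-dir behind the endpoint, a second run should see the first run's marker.
async function runWithProfile(puppeteer, browserWSEndpoint) {
  const browser = await puppeteer.connect({ browserWSEndpoint });
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'domcontentloaded' });
  const previous = await page.evaluate(() => localStorage.getItem('lastRun'));
  await page.evaluate((stamp) => localStorage.setItem('lastRun', stamp), new Date().toISOString());
  // The data directory is kept after closing, so the next browser can pick it up.
  await browser.close();
  return previous;
}
```

Calling it twice with the same endpoint should return null the first time and the first run's timestamp the second time.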