How to Run Puppeteer Within Chrome to Create Hybrid Automations

July 18, 2024


You’ve likely dealt with multiple roadblocks while using Puppeteer and Chrome for browser automation. One example is the inability to get past logins or two-factor authentication (2FA) with a fully automated script.

Even if you bypass 2FA, what about those unpredictable CAPTCHA challenges? Or those pesky “confirm your identity” pop-ups?

You’re left with a choice: abandon automation for these scenarios or resort to storing sensitive user credentials. Neither is ideal.

It’s time to focus on a semi-automated approach instead. At our recent conference, I discussed a hybrid automation approach to overcome this issue.

To go along with the recap, here is a guide complete with code blocks.

Why user-in-the-loop automations are a safer alternative

Also known as “human-in-the-loop” (HITL) automation, it’s a way to automate most of the process while involving human actions when needed. It’s important to take this approach when you run into issues like:

Automations that need to run in a logged-in account
Unexpected security prompts or CAPTCHAs
Multi-factor authentication (MFA) challenges
The need to handle sensitive credentials in automation systems
The risk of violating security protocols

For example, if you attempt to automate access to Gmail, you'll need a username and password followed by a two-step verification. This is a common scenario where HITL automation would be helpful.

Hybrid automation allows your user to step in only when necessary while automating the rest of your workflows.

This way, you don’t have to worry about your automation breaking; there’s a level of human oversight to prevent issues. Many companies are increasing security measures or feeling pressure to comply with regulatory requirements. So, if you’re automating browser interactions while dealing with such systems, you can do that without compromising security.

There are many use cases for user-in-the-loop (UITL) automation, such as:

Human resources: Teams whose onboarding processes span multiple external systems
Finance: Teams that automate payroll through secure financial platforms
Customer service: Teams that need automated access to different account management systems
IT operations: Teams that need access to multiple cloud/SaaS platforms to do their work

Note: If you don't want to build this yourself, use Browserless's Hybrid Automations

Browserless has Hybrid Automations set up and ready to deploy for Enterprise Users. We handle all of the browsers-within-browsers and potential security concerns such as preventing memory leaks.

Check out the announcement article to learn more:
Stream login windows with our Hybrid Automations

Okay, on to the guide...

5 steps to build a user-in-the-loop automation script

In this tutorial, we’ll use the example workflow of:

capturing a video stream in a headless browser
sending it to the user’s browser
capturing the user’s keyboard and mouse events
sending the events back to the headless browser.

To follow along, the code for this example is available in this demo repo.

Step 1: Set up Puppeteer within your browser

The first thing we’ll do is get Puppeteer running in the browser. Don’t import all the files because it could cause issues while running the script. Instead, import only the necessary packages using ECMAScript modules (ESM).

We’re doing this because Puppeteer has a native ESM build—so if you only use “connect,” we don’t have to polyfill anything. Plus, modern browsers can read this and load the subsequent dependencies.

You can do this by following these steps:

Import connect (exported as connectToCdpBrowser) from puppeteer-core/lib/esm/puppeteer/cdp/BrowserConnector.js.
Import BrowserWebSocketTransport from puppeteer-core/lib/esm/puppeteer/common/BrowserWebSocketTransport.js.
Import only these parts of Puppeteer, using ECMAScript modules (ESM), rather than the whole package.


// Loading from node_modules isn't recommended but is done here for demonstration purposes.
// PLUS: No need for a bundler like webpack or others
import { BrowserWebSocketTransport } from './node_modules/puppeteer-core/lib/esm/puppeteer/common/BrowserWebSocketTransport.js';
import { connectToCdpBrowser as connect } from './node_modules/puppeteer-core/lib/esm/puppeteer/cdp/BrowserConnector.js';

Now that we have our dependencies, we’ll connect our user’s browser to our headless browser.

You need a headless browser running with an open WebSocket debug port; its address is the browser WebSocket endpoint (browserWSEndpoint in the code below). We pass that endpoint to the connection transport, BrowserWebSocketTransport.


const connectionTransport = await BrowserWebSocketTransport.create(
  this.browserWSEndpoint,
);
this.browser = await connect(
  connectionTransport,
  this.browserWSEndpoint,
  HybridPuppeteer.cdpOptions,
);

Note: Most APIs are available except those related to file systems, which you use to generate PDFs or screenshots.

Step 2: Set up Chrome DevTools Protocol (CDP)

Here’s how you can do this in your system:

Connect to a headless browser instance with an open WebSocket debug port.
Use Puppeteer’s ’connect’ method to establish a connection to the headless browser.
Pass the WebSocket endpoint and the BrowserWebSocketTransport to the connect method.
Once connected, you have a browser object that gives access to CDP.
Use CDP to access lower-level browser capabilities that are not available through Puppeteer’s high-level API.


this.page = await this.browser.newPage();
this.cdp = this.page._client.call(this.page);
await this.page.goto(this.url);

You should also subscribe to screencast frame messages from CDP. In this case, CDP sends your browser a new screen frame every time it gets drawn—and since it’s a high FPS, it’s network-heavy.
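The snippet above reaches into the private _client member of the page, which may change between Puppeteer releases. As a hedged alternative, here is a minimal sketch that obtains a CDP session through Puppeteer's public createCDPSession() API instead. The helper name openCdpSession is hypothetical and is not part of the demo repo.

```javascript
// Hypothetical helper (not from the demo repo): obtain a CDP session
// through Puppeteer's public API instead of the private `_client` member.
async function openCdpSession(page) {
  // Newer Puppeteer versions expose createCDPSession() directly on the page.
  if (typeof page.createCDPSession === 'function') {
    return page.createCDPSession();
  }
  // Older versions expose it on the page's target instead.
  return page.target().createCDPSession();
}
```

With this in place, the assignment could become this.cdp = await openCdpSession(this.page), and the rest of the guide would use this.cdp in the same way.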

Step 3: Stream images from the browser

Next, tell Chrome you want to receive the screencast frames. Specify a format and a quality, for example, JPEG with a quality value from 0 to 100.


this.cdp.on('Page.screencastFrame', this.onScreencastFrame);
await this.cdp.send('Page.startScreencast', {
  format: 'jpeg',
  quality: this.quality,
});

On every screencast frame, we need to acknowledge to Chrome that we received it. Otherwise, we may not get future frames from the screencasting session.

So, set up an event listener for Page.screencastFrame events, then acknowledge receipt to CDP using Page.screencastFrameAck. Chrome sends each frame as base64-encoded image data.

Step 4: Draw images and confirm the script works

You need a canvas element to capture and display the streamed images. Here’s how you can do it:

Create a canvas element in the user’s browser to display the streamed images.
For each received frame, create an image object from the base64 image data.
Set the image’s src to a data URL built from that data.
When the image loads, draw it onto the canvas.

Use this code to execute it:


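onScreencastFrame = ({ data, sessionId }) => {
  // Acknowledge the frame so Chrome keeps sending new ones; ignore ack errors.
  this.cdp.send('Page.screencastFrameAck', { sessionId }).catch(() => {});
  // Draw the decoded frame once the image has loaded.
  this.img.onload = () =>
    this.$ctx.drawImage(
      this.img,
      0,
      0,
      this.$canvas.width,
      this.$canvas.height,
    );
  // The screencast was started with format: 'jpeg', so use the matching MIME type.
  this.img.src = 'data:image/jpeg;base64,' + data;
};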

You can add visual feedback or logging to confirm frames are being received and drawn correctly. Also, consider adding an FPS (Frames Per Second) counter to monitor performance.
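As one way to build the FPS counter mentioned above, here is a minimal sketch that counts frames over a sliding one-second window. The FpsCounter class is a hypothetical helper, not part of the demo repo, and the injectable now function is an assumption made so the counter can be exercised without real timers.

```javascript
// Hypothetical helper: counts frames over a sliding one-second window.
// `now` is injectable so the counter can be tested without real timers.
class FpsCounter {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.timestamps = [];
  }

  // Call once per drawn frame; returns the frames seen in the last 1000 ms.
  tick() {
    const t = this.now();
    this.timestamps.push(t);
    // Drop timestamps that have fallen out of the one-second window.
    while (this.timestamps.length && t - this.timestamps[0] >= 1000) {
      this.timestamps.shift();
    }
    return this.timestamps.length;
  }
}
```

You could call tick() inside the image’s onload callback and write the returned value to a small overlay or to the console.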

Step 5: Capture mouse and keyboard events

CDP has two APIs for doing keyboard and mouse events. For this example, we’ll unify that into a single object. Basically, you have to combine DOM-based APIs with CDP-based APIs and create a translation layer to mirror frames.

First, you need to bind keyboard events. Set event listeners for keydown, keyup, and keypress on the body or canvas element.


bindKeyEvents = () => {
  document.body.addEventListener('keydown', this.emitKeyEvent, true);
  document.body.addEventListener('keyup', this.emitKeyEvent, true);
  document.body.addEventListener('keypress', this.emitKeyEvent, true);
};
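The mouse listeners later in this guide reference an unbindKeyEvents counterpart that isn’t shown. Here is a minimal sketch of what that pair could look like as standalone functions. The function signatures and the KEY_EVENT_TYPES constant are assumptions for illustration; the demo repo’s version may differ.

```javascript
// Hypothetical bind/unbind pair. removeEventListener only matches when the
// type, handler reference, and capture flag are identical to those that
// were passed to addEventListener.
const KEY_EVENT_TYPES = ['keydown', 'keyup', 'keypress'];

function bindKeyEvents(target, handler) {
  for (const type of KEY_EVENT_TYPES) {
    target.addEventListener(type, handler, true);
  }
}

function unbindKeyEvents(target, handler) {
  for (const type of KEY_EVENT_TYPES) {
    target.removeEventListener(type, handler, true);
  }
}
```

Unbinding when the pointer leaves the canvas keeps key presses meant for the rest of the page from being forwarded to the headless browser.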

All three events are routed to the same handler, which covers each case at a high level. Use the code below to execute it:


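emitKeyEvent = (event) => {
  // Assumed mapping from DOM event types to CDP key event types; the
  // original excerpt does not show how `type` is derived.
  const types = { keydown: 'keyDown', keyup: 'keyUp', keypress: 'char' };
  const type = types[event.type];
  if (!type) return;
  const text =
    type === 'char' ? String.fromCharCode(event.charCode) : undefined;
  const data = {
    autoRepeat: false,
    code: event.code,
    isKeypad: false,
    isSystemKey: false,
    key: event.key,
    keyIdentifier: event.keyIdentifier,
    nativeVirtualKeyCode: event.keyCode,
    text,
    type,
    unmodifiedText: text ? text.toLowerCase() : undefined,
    windowsVirtualKeyCode: event.keyCode,
  };
  this.cdp.send('Input.dispatchKeyEvent', data);
};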

Note: Refer to the CDP docs in case it doesn’t work, as it requires some trial and error.

The method we send to CDP is Input.dispatchKeyEvent, which triggers the headless browser to replay that key press as a native key press event.

Next, set up listeners for the mouse events: mousedown, mouseup, mousemove, mousewheel, mouseenter, and mouseleave.


addListeners = () => {
  this.$canvas.addEventListener('mousedown', this.emitMouse, false);
  this.$canvas.addEventListener('mouseup', this.emitMouse, false);
  this.$canvas.addEventListener('mousewheel', this.emitMouse, false);
  this.$canvas.addEventListener('mousemove', this.emitMouse, false);
  this.$canvas.addEventListener('mouseenter', this.bindKeyEvents, false);
  this.$canvas.addEventListener('mouseleave', this.unbindKeyEvents, false);
};

Similar to the keyboard events, you need to construct an object. The click count, modifiers, and x/y coordinates need to be relayed to the same spot in the headless browser. So, we render the headless browser at the same width and height as the end user’s browser.


// Excerpt from the mouse handler: x, y, buttons, and types are assumed
// to be defined earlier in the handler.
const data = {
  button: event.type === 'mousewheel' ? 'none' : buttons[event.which],
  clickCount: 1,
  modifiers: HybridPuppeteer.getModifiersForEvent(event),
  type: types[event.type],
  x,
  y,
};
if (event.type === 'mousewheel') {
  data.deltaX = event.wheelDeltaX || 0;
  data.deltaY = event.wheelDeltaY || event.wheelDelta;
}
this.cdp.send('Input.emulateTouchFromMouseEvent', data);
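The excerpt above calls HybridPuppeteer.getModifiersForEvent without showing it. As a rough sketch of what such a helper could look like, the function below packs the DOM event’s modifier flags into the bit field that CDP’s Input domain expects (Alt=1, Ctrl=2, Meta/Command=4, Shift=8). This standalone version is a hypothetical stand-in; the demo repo’s implementation may differ.

```javascript
// Hypothetical stand-in for HybridPuppeteer.getModifiersForEvent.
// Bit values follow the CDP Input domain:
// Alt=1, Ctrl=2, Meta/Command=4, Shift=8.
function getModifiersForEvent(event) {
  return (
    (event.altKey ? 1 : 0) |
    (event.ctrlKey ? 2 : 0) |
    (event.metaKey ? 4 : 0) |
    (event.shiftKey ? 8 : 0)
  );
}
```

For example, a Ctrl+Shift click would produce 2 | 8, which is 10.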

Best practices and tips to run HITL scripts seamlessly

HITL scripts let you bypass the problems with fully automated ones. That said, there are potential issues you could face or things you need to do to run them without any hassle.

Here are a few tips from Griffith’s experience:

1. Adjust quality settings for reduced bandwidth

Adjusting quality settings (e.g., JPEG quality in CDP’s screencast) reduces network bandwidth usage and keeps the stream smooth.

Set a lower quality parameter in Page.startScreencast and tune it until you find a balance between image clarity and bandwidth. As a result, you can run the script without worrying about excessive resource consumption.
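To make the tuning concrete, here is a small sketch that builds the Page.startScreencast parameters with a clamped quality value. The helper name buildScreencastParams and the default of 60 are assumptions for illustration. The everyNthFrame parameter is part of CDP’s Page.startScreencast and lets you skip frames to save further bandwidth.

```javascript
// Hypothetical helper: build Page.startScreencast parameters with a clamped
// JPEG quality. The default quality of 60 is an arbitrary starting point.
function buildScreencastParams({ quality = 60, everyNthFrame = 1 } = {}) {
  // CDP expects a quality in the range 0 to 100.
  const clampedQuality = Math.min(100, Math.max(0, Math.round(quality)));
  return {
    format: 'jpeg',
    quality: clampedQuality,
    // Send only every n-th frame; 1 means every frame.
    everyNthFrame: Math.max(1, Math.floor(everyNthFrame)),
  };
}
```

You could then call this.cdp.send('Page.startScreencast', buildScreencastParams({ quality: 40, everyNthFrame: 2 })) and adjust the numbers while watching the stream.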

2. Match dimensions for better coordinate mapping

Matching dimensions lets you accurately map coordinates between the user’s browser and the headless browser. This prevents issues when you’re running scripts for browsers on different devices.

Set the headless browser’s viewport to match the user’s browser size and handle resize events to maintain synchronisation.
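As an illustration of the mapping, here is a minimal sketch that converts a mouse event’s position on the canvas into coordinates in the headless page’s viewport. The function name toViewportCoords is hypothetical. It assumes rect comes from the canvas’s getBoundingClientRect() and viewport is the headless page’s { width, height }.

```javascript
// Hypothetical helper: map a mouse event's client coordinates on the canvas
// to coordinates in the headless page's viewport.
// `rect` is the canvas's getBoundingClientRect(); `viewport` is the headless
// page's { width, height }.
function toViewportCoords(event, rect, viewport) {
  const scaleX = viewport.width / rect.width;
  const scaleY = viewport.height / rect.height;
  return {
    x: Math.round((event.clientX - rect.left) * scaleX),
    y: Math.round((event.clientY - rect.top) * scaleY),
  };
}
```

When the dimensions match, both scale factors are 1 and the mapping reduces to subtracting the canvas offset, which is why matching sizes keeps things simple.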

3. Handle resize events to maintain proper coordinate mapping

You must handle resize events to map coordinates accurately when the user’s browser size changes. If you don’t account for this, actions like clicking or hovering will no longer land on the actual position of the page element, giving you incorrect results.

Use the resize event listeners on the user’s browser, update the headless browser’s viewport accordingly, and recalculate coordinate translations for consistent user interactions.
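Here is a rough sketch of that resize handling, using Puppeteer’s page.setViewport. The helper name createResizeHandler, the getSize callback, and the 150 ms debounce delay are all assumptions for illustration. The debounce is there so a drag-resize doesn’t send a viewport update for every intermediate size.

```javascript
// Hypothetical helper: returns a debounced resize handler that pushes the
// user's current size to the headless page via page.setViewport.
// `getSize` returns { width, height }; `delayMs` is an arbitrary default.
function createResizeHandler(page, getSize, delayMs = 150) {
  let timer = null;
  return () => {
    // Restart the timer so only the last resize in a burst is applied.
    clearTimeout(timer);
    timer = setTimeout(async () => {
      const { width, height } = getSize();
      try {
        await page.setViewport({
          width: Math.max(1, Math.floor(width)),
          height: Math.max(1, Math.floor(height)),
        });
      } catch (err) {
        console.error('Failed to resize headless viewport', err);
      }
    }, delayMs);
  };
}
```

In the user’s browser, this could be wired up with window.addEventListener('resize', createResizeHandler(this.page, () => ({ width: window.innerWidth, height: window.innerHeight }))).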

4. Prepare for high-network bandwidth support

Due to frequent event transmissions, you need a high network bandwidth for this setup. If not, you won’t be able to:

Send large data payloads efficiently
Quickly transfer data between the browser and your script
Detect and respond to user input faster

You can optimise the script implementation process by:

Compressing data
Using WebRTC for efficient streaming
Warning users about potential data usage
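Another way to reduce traffic, beyond the options above, is to limit how often high-frequency events such as mousemove are forwarded. The sketch below is a simple leading-edge throttle. The throttle function and its injectable now clock are assumptions for illustration, not part of the demo repo.

```javascript
// Hypothetical helper: forward at most one call per `intervalMs` and drop
// the rest. Suited to high-frequency events such as mousemove.
// `now` is injectable so the behaviour can be tested without real timers.
function throttle(fn, intervalMs, now = () => Date.now()) {
  let last = -Infinity;
  return (...args) => {
    const t = now();
    if (t - last < intervalMs) return false; // dropped
    last = t;
    fn(...args);
    return true; // forwarded
  };
}
```

Because dropped calls are discarded, this is best applied only to mousemove. Clicks, key presses, and wheel events should still be forwarded individually so no user action is lost.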

5. Create and maintain a list of tests for workflow verification

CDP and DOM APIs can change over time. You risk breaking your implementation, so create a set of validation tests to confirm if your setup works every time.

You can develop a test suite that covers user interactions, browser behaviours, and edge cases. Every time you run the script, run the test to catch and address potential blockers immediately.

6. Wait for Puppeteer APIs to be launched

If you find implementing the script challenging, consider waiting until Puppeteer launches native APIs. This will help you save development time and reduce the maintenance overhead that occurs when you work with CDP.

Keep a close eye on Puppeteer’s roadmap and consider using high-level APIs to avoid CDP-related bottlenecks.

Hybrid automation is the way forward

Browser automation comes with a whole set of issues when you’re trying to bypass inflexible or tightly secured browsers and apps. However, you shouldn’t have to forego automation altogether.

Consider incorporating a hybrid automation or UITL automation script to overcome these issues.

If you want to build it yourself, check out our example repository to access the code from this tutorial.

Or, if you'd rather have an out-of-the-box solution, contact our team about getting up and running with our Hybrid Automations API.
