Web automation and testing have never been more accessible, thanks to tools like Puppeteer. When combined with cloud platforms like Netlify, the potential for scalable, efficient web applications becomes boundless.
This guide focuses on using puppeteer-core, a lightweight version of Puppeteer. It lets you control a separately hosted Chrome instance, enabling more flexibility and efficiency in your automations.
Understanding puppeteer-core and Netlify
The default Puppeteer package comes bundled with Chromium. This might sound good at first glance, but it can cause a range of challenges around scaling and resource management.
That's why we always recommend using puppeteer-core instead. It's essentially Puppeteer without its bundled Chromium, allowing developers to deploy the browsers separately. This approach is particularly useful for cloud environments where you might want to manage browsers on their own server for cost and performance reasons.
Netlify, on the other hand, offers an easy-to-use, serverless platform for deploying web applications and automating workflows. Its Functions feature supports serverless backend functions, perfect for automation scripts using Puppeteer.
By combining the two, developers can create serverless web automation solutions without the challenge of trying to host Chrome on Netlify.
Setting up your project
1. Initial setup:
- Create a Project Folder: Begin by creating a new folder for your project. This folder will contain all your project files, ensuring a clean and organized workspace.
- Initialize Node.js Project: Open a terminal, navigate to your project folder, and run npm init -y. This command creates a package.json file, which will track your project's dependencies and scripts. Think of it as the blueprint of your project.
2. Installing dependencies:
Install puppeteer-core and dotenv by running npm install puppeteer-core dotenv. The dotenv library loads environment variables from a .env file, so you can configure the Chrome instance's WebSocket endpoint without hard-coding it.
3. Host a Chrome instance
Decide how you'll host Chrome. For development, you might run Chrome on your machine, but for production you'll want to host it yourself on your chosen cloud platform, or use Browserless's managed browsers.
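For local development, one way to find the WebSocket endpoint is to start Chrome with its --remote-debugging-port flag and read the JSON it serves at /json/version. The sketch below assumes port 9222 and a Node.js version with a global fetch (18 or later); the helper names are hypothetical and not part of puppeteer-core.

```javascript
// Hypothetical helper: pulls the WebSocket URL out of the JSON that Chrome
// serves at /json/version when started with --remote-debugging-port.
function extractWsEndpoint(versionInfo) {
  const endpoint = versionInfo && versionInfo.webSocketDebuggerUrl;
  if (typeof endpoint !== 'string' || !endpoint.startsWith('ws')) {
    throw new Error('No webSocketDebuggerUrl found in /json/version response');
  }
  return endpoint;
}

// Assumes Chrome was launched locally with --remote-debugging-port=9222
// and that a global fetch is available (Node 18+).
async function discoverLocalEndpoint(debugUrl = 'http://127.0.0.1:9222') {
  const response = await fetch(`${debugUrl}/json/version`);
  if (!response.ok) {
    throw new Error(`Chrome debug endpoint returned HTTP ${response.status}`);
  }
  return extractWsEndpoint(await response.json());
}
```

The value returned by discoverLocalEndpoint() is what you would store as CHROME_WS_ENDPOINT in your .env file during development.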
Writing your first puppeteer-core script
Let's craft a simple script that opens a web page and takes a screenshot. This example provides a hands-on introduction to Puppeteer's capabilities:
The script starts by loading environment variables, which include your Chrome instance's WebSocket endpoint (CHROME_WS_ENDPOINT). This provides a secure way to store sensitive information outside your code.
Your script then connects to Chrome so it can navigate to a webpage and take a screenshot.
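The steps described above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the function names, the target URL, and the output filename are placeholders, and it assumes puppeteer-core and dotenv are installed and that a .env file defines CHROME_WS_ENDPOINT.

```javascript
// Reads the endpoint from the environment and fails early if it is missing.
function getWsEndpoint(env = process.env) {
  const endpoint = env.CHROME_WS_ENDPOINT;
  if (!endpoint) {
    throw new Error('CHROME_WS_ENDPOINT is not set');
  }
  return endpoint;
}

// `puppeteer` is passed in so the function can also be exercised with a stub.
async function takeScreenshot(puppeteer, { wsEndpoint, url, path }) {
  // connect() attaches to an already running browser instead of launching one.
  const browser = await puppeteer.connect({ browserWSEndpoint: wsEndpoint });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.screenshot({ path }); // the file is written where Node.js runs
    await page.close();
  } finally {
    // disconnect() leaves the remote Chrome running for other jobs;
    // use browser.close() instead if each job should own its browser.
    await browser.disconnect();
  }
  return path;
}

async function main() {
  require('dotenv').config(); // load variables from .env into process.env
  const puppeteer = require('puppeteer-core');
  const saved = await takeScreenshot(puppeteer, {
    wsEndpoint: getWsEndpoint(),
    url: 'https://example.com', // placeholder target page
    path: 'example.png',        // placeholder output file
  });
  console.log(`Screenshot saved to ${saved}`);
}

// To run this as a script, call main() at the end of the file, for example:
// main().catch((err) => { console.error(err); process.exit(1); });
```

Passing the puppeteer object into takeScreenshot is a design choice made for this sketch: it keeps the browser logic separate from configuration, so the same function can be reused inside a serverless handler.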
If you would like an example of a more complex export, check out our guide on generating professional-looking PDF reports using Puppeteer.
Integrating puppeteer-core with Netlify Functions
Netlify Functions allows you to run serverless backend code without provisioning or managing servers. Integrating puppeteer-core into a Netlify Function enables you to execute browser automation tasks triggered by HTTP requests or scheduled events.
- Prepare Netlify Functions:
Create a netlify directory at the root of your project with a functions directory inside it (netlify/functions). By default, Netlify looks in this location to recognize and deploy your serverless functions.
- Function Script:
Inside the functions directory, create a JavaScript file for your function, e.g., screenshot.js. This file will contain the code to execute puppeteer-core tasks.
The exports.handler is the entry point for your Netlify Function. It's triggered whenever the function is invoked, which can be through an HTTP request or a scheduled event.
Inside the handler, Puppeteer connects to the Chrome instance, navigates to a page, and takes a screenshot, similar to our standalone script.
Then, the function returns a success message, indicating the operation's outcome. In practical applications, you might return the screenshot itself or a link to where it's stored.
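A function along the lines described above might look like the sketch below. It assumes the file lives at netlify/functions/screenshot.js, that puppeteer-core is installed, and that CHROME_WS_ENDPOINT is set in the site's environment variables. The createHandler factory, the target URL, and the response shape are illustrative choices, not something Netlify or Puppeteer requires.

```javascript
const TARGET_URL = 'https://example.com'; // placeholder page to capture

// Factory so the handler logic can be tested with a stubbed puppeteer object.
function createHandler(puppeteer, env = process.env) {
  return async function handler(event) {
    if (!env.CHROME_WS_ENDPOINT) {
      return { statusCode: 500, body: JSON.stringify({ error: 'CHROME_WS_ENDPOINT is not set' }) };
    }
    let browser;
    try {
      // Attach to the separately hosted Chrome instance.
      browser = await puppeteer.connect({ browserWSEndpoint: env.CHROME_WS_ENDPOINT });
      const page = await browser.newPage();
      await page.goto(TARGET_URL, { waitUntil: 'networkidle2' });
      // Keep the image in memory as base64 instead of writing to disk.
      const image = await page.screenshot({ encoding: 'base64' });
      await page.close();
      return {
        statusCode: 200,
        body: JSON.stringify({ message: 'Screenshot taken', base64Length: image.length }),
      };
    } catch (err) {
      return { statusCode: 500, body: JSON.stringify({ error: err.message }) };
    } finally {
      // Release the connection even when something above failed.
      if (browser) await browser.disconnect();
    }
  };
}

// Netlify entry point: the real library is loaded when the function is invoked.
exports.handler = (event) => createHandler(require('puppeteer-core'))(event);
```

The handler here only reports the size of the captured image. To return the screenshot itself, you could send the base64 string as the body with isBase64Encoded set to true and an image/png Content-Type header, or upload it to storage and return a link.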
Best practices for hosting Chrome yourself
Managing a Chrome deployment comes with challenges. Whether you're hosting Chrome on AWS, GCP, Azure or another platform, we would recommend giving thought to:
- Health checks to regularly ensure your instances are responsive, and to restart or replace any that are unresponsive or have high memory usage.
- Access control using firewalls or network access control lists to prevent unauthorized use and protect against attacks.
- Function timeouts to ensure tasks aren't prematurely terminated, using async processing where suitable.
- Data storage, both to work around Netlify's memory limits and to make sure temp files don't fill up your servers.
- Browser updates, making sure your Chrome instance and necessary libraries are all up to date, keeping up with the regular version updates.
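As an illustration of the health check and timeout points in the list above, here is a minimal sketch. It treats an instance as healthy if a connection can be opened and the browser reports its version within a time limit. The helper names and the 5-second default are assumptions chosen for the example; a production check would likely also look at memory usage.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical health check: returns true if Chrome answers in time.
async function isChromeHealthy(puppeteer, wsEndpoint, ms = 5000) {
  let browser;
  try {
    browser = await withTimeout(
      puppeteer.connect({ browserWSEndpoint: wsEndpoint }),
      ms,
      'connect'
    );
    await withTimeout(browser.version(), ms, 'version');
    return true;
  } catch (err) {
    return false;
  } finally {
    // Ignore errors while disconnecting from an already broken instance.
    if (browser) await browser.disconnect().catch(() => {});
  }
}
```

A scheduler or monitoring job could call isChromeHealthy for each instance and restart or replace the ones that return false.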
Or, you can use our browsers.
Using Puppeteer with our fleet of managed browsers
Browserless lets you avoid the hassle of managing your own Chrome instances. We host thousands of browsers that are ready for anyone to use with their Playwright or Puppeteer scripts.
To use Netlify with Browserless, all you need to do is change the endpoint.
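In practice, that means pointing CHROME_WS_ENDPOINT at a Browserless WebSocket URL that carries your API token. The helper below is a small sketch of building such a URL. The host shown is a placeholder and the token query parameter is an assumption here, so check the Browserless docs for the exact endpoint for your account.

```javascript
// Appends an API token to a WebSocket endpoint as a `token` query parameter.
// The parameter name is an assumption; confirm it against the provider's docs.
function buildEndpoint(baseUrl, token) {
  if (!token) {
    throw new Error('An API token is required');
  }
  const url = new URL(baseUrl);
  url.searchParams.set('token', token); // value is URL-encoded automatically
  return url.toString();
}

// Placeholder host, not a real Browserless address.
const endpoint = buildEndpoint('wss://browserless.example', 'YOUR_API_TOKEN');
// Use `endpoint` as the browserWSEndpoint value passed to puppeteer.connect().
```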
Your scripts will then run without you ever having to worry about memory leaks or issues from version updates.
For more info about using Puppeteer with Browserless, check out our docs.
Closing thoughts about Netlify and Puppeteer
Puppeteer and Netlify are a great combination, especially for tasks such as generating screenshots and PDFs. By using puppeteer-core you can use browsers hosted elsewhere, either on your servers or ones managed by Browserless.
To get started with Browserless, just go ahead and grab a 7-day trial.