Azure Functions and Puppeteer are a great combination for browser automation. Functions provide a serverless environment that lets you scale these tasks without worrying about the infrastructure.
If you're interested in learning how to do this, we'll show you how to deploy Puppeteer on an Azure Functions app's Consumption plan. That includes how to keep Puppeteer within the Function's file size limit.
You'll get instructions for either connecting to an existing browser pool or hosting one yourself.
How to use Azure Functions Apps to deploy Puppeteer
We'll create an Azure function with a consumption plan that uses a website’s URL as its input. As an example use case, we'll include instructions for taking a screenshot with Puppeteer.
Using puppeteer-core and a separate Chrome browser, the function will visit the provided website, capture a screenshot, and save it to Blob Storage. This process helps us check that the function works.
A quick note about hosting browsers
Running on a Function comes with limitations. The most important is the file size limit, as Chrome is too big by default to fit in a function.
That's why this guide uses puppeteer-core with a separate compressed browser, instead of the default puppeteer package, which downloads its own copy of Chrome on install.
Plus, if you're going to perform large-scale parallel testing or automation, constraints such as usage restrictions and execution speed limits make it hard to keep using Azure Functions.
The DIY Option: Running Puppeteer & Chrome on Azure Functions
Step 1: Install dependencies
You need to install puppeteer-core within the Azure Functions app as a dependency. Do this by navigating to the function's directory and running npm install puppeteer-core @azure/storage-blob from the command line.
This command installs Puppeteer Core, the version of Puppeteer that doesn't automatically download Chromium. It also installs the Azure Storage Blob SDK, a required dependency for the Azure function.
Step 2: Create the package.json file
The package.json file lists the dependencies, and it's created when you install them.
The file should list all the dependencies. Keep in mind that the version numbers can be different for your function.
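As a rough illustration of the shape to expect, the sketch below mirrors the dependencies section as a JavaScript object and checks that both packages are listed. The project name and the version strings are placeholders, and missingDependencies is a hypothetical helper rather than part of any library.

```javascript
// Expected shape of package.json after Step 1. The version strings are
// placeholders, not real releases; npm writes the actual ranges on install.
const examplePackageJson = {
  name: "puppeteer-screenshot-function", // hypothetical project name
  version: "1.0.0",
  dependencies: {
    "@azure/storage-blob": "<installed version>",
    "puppeteer-core": "<installed version>",
  },
};

// The two packages this guide relies on.
const REQUIRED_DEPENDENCIES = ["puppeteer-core", "@azure/storage-blob"];

// Returns the names of required packages that are absent from the
// "dependencies" section of a parsed package.json object.
function missingDependencies(pkg, required = REQUIRED_DEPENDENCIES) {
  const installed = pkg.dependencies || {};
  return required.filter((name) => !(name in installed));
}

// Logs an empty array when both packages are listed.
console.log(missingDependencies(examplePackageJson));
```

To check a real project, you could pass the parsed file instead, for example missingDependencies(require("./package.json")).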
Step 3: Download Chromium
We're using puppeteer-core, which does not include a version of the Chrome browser. So, we'll have to manually download Chrome and provide its path through the executablePath option.
We'll use the Sparticuz/chromium project from GitHub, a compressed version of Chrome with the code needed to decompress the Brotli package.
Download the repository and use its Makefile to build the binaries.
You can then move the binaries into the project folder. Make a note of their path so you can use it in the main code.
Note: This step needs regular updates and maintenance, because a given build might not be compatible with newer releases of these packages.
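To show where the noted path ends up, here is a minimal sketch that builds the options object for puppeteer-core's launch() call. It assumes the binary is already decompressed and executable; buildLaunchOptions is a hypothetical helper, and the bin/chromium default location and the sandbox flags are assumptions to adapt to your own setup.

```javascript
const fs = require("fs");
const path = require("path");

// Builds the options object for puppeteer-core's launch() call.
// `chromiumPath` is wherever you placed the Chromium binary from this step;
// the "bin/chromium" default used here is a hypothetical location.
function buildLaunchOptions(chromiumPath = path.join(__dirname, "bin", "chromium")) {
  const executablePath = path.resolve(chromiumPath);
  // Fail early with a clear message instead of a cryptic launch error.
  if (!fs.existsSync(executablePath)) {
    throw new Error(`Chromium binary not found at ${executablePath}`);
  }
  return {
    executablePath,
    headless: true,
    // Flags often needed in sandboxed serverless containers (assumption).
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  };
}

// Usage inside the function (requires puppeteer-core to be installed):
//   const puppeteer = require("puppeteer-core");
//   const browser = await puppeteer.launch(buildLaunchOptions());
```

Checking for the file up front makes a wrong or outdated path easier to spot when the function runs.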
Step 4: Write the Azure Function
You can create the Azure function using Azure's console and Node.js runtime. Allocate at least 512 MB of RAM to run the function.
Azure Functions' default timeout of 5 minutes is sufficient for the task of taking a screenshot. You can increase it to 10 minutes if needed by setting functionTimeout in host.json.
Note: Update the Chromium path in the code before running this step.
Below, you can find the index.js code:
Step 5: Trigger and test the code
Trigger the code using the Azure console's test/run button. Use the website's name as the input.
Our code uses "google.com" as the default if you don't provide input. The function stores the website's screenshot in Blob Storage if it runs successfully.
Avoid squeezing Chrome into a Function and let Browserless manage your browsers
Automating browser tasks in the cloud has too many moving parts. Puppeteer, Chromium and the cloud's serverless backend work independently — so they require constant monitoring and updates to ensure compatibility.
Instead of going through the hassle of doing this all the time, use our pool of hosted browsers.
We host thousands of browsers that are ready to use with Puppeteer and don't come with these headaches.
Take it for a spin using our free trial.
The Easy Option: Deploy Puppeteer-Core On Azure And Let Us Host the Browsers
Puppeteer is simple to deploy; it's Chrome that causes difficulties. Browsers aren't designed to run in the cloud, with frequently updating dependencies, excessive memory leaks and poor file management.
For a simpler path, use Browserless.
We host a pool of managed browsers that you connect to by pointing puppeteer.connect() at our endpoint. The rest of your code then stays the same.
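The endpoint change could be sketched roughly as follows. The environment variable names and the helper names are hypothetical, and passing the API token as a token query parameter is an assumption; confirm the exact endpoint format in the docs before relying on it.

```javascript
// Builds the WebSocket endpoint for a hosted browser.
// Both values come from your own account settings; the "token" query
// parameter is an assumption, so confirm the exact format in the docs.
function buildWsEndpoint(baseUrl, token) {
  const endpoint = new URL(baseUrl);
  endpoint.searchParams.set("token", token);
  return endpoint.toString();
}

async function screenshotViaHostedBrowser(url) {
  const puppeteer = require("puppeteer-core"); // loaded lazily; no local Chrome needed
  // BROWSERLESS_WS_URL and BROWSERLESS_TOKEN are hypothetical app setting names.
  const browser = await puppeteer.connect({
    browserWSEndpoint: buildWsEndpoint(
      process.env.BROWSERLESS_WS_URL,
      process.env.BROWSERLESS_TOKEN
    ),
  });
  try {
    // From here on, the code is the same as with a locally launched browser.
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle2" });
    return await page.screenshot({ type: "png", fullPage: true });
  } finally {
    await browser.close();
  }
}
```

Only the way the browser object is obtained differs from a self-hosted setup: connect() attaches to a remote browser, so no Chromium binary has to ship with the function.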
For more details on using Puppeteer with Browserless, check out the docs. We also have REST APIs for common tasks such as extracting HTML, exporting PDFs and downloading files.