How to Deploy Playwright on GCP Compute Engine

October 29, 2024

Deploying Playwright on a Google Cloud Compute Engine VM is a powerful solution for browser automation, but it comes with challenges.

In this guide, we’ll go through the steps for setting up Playwright on a GCP VM, including selecting the ideal VM configuration, installing necessary dependencies, configuring the environment, and running a sample script to capture screenshots.

Choosing the Right VM

When deploying Playwright on Google Cloud, the VM size largely determines how smoothly the browsers run. An e2-medium (4 GB of RAM) or e2-standard-2 (8 GB of RAM) instance generally provides sufficient resources to run Playwright.

For storage, allocating around 10 GB is recommended to handle browser binaries and temporary files generated during automation tasks. We will use Ubuntu as the operating system, since it's officially supported by Playwright.

Launch and Connect to VM

To get started, create a VM instance on Google Cloud with sufficient storage and connect to it using SSH. You’ll install Node.js and Playwright using the commands in the following section. Playwright comes with bundled browser binaries and dependencies; during installation, it will prompt you to install these automatically. If everything installs correctly, additional dependency installation is generally unnecessary, but commands are included below in case they’re needed.

Installing packages and libraries

First, install Node.js:


# Refresh the package index, then install curl if not already installed
sudo apt-get update
sudo apt-get install -y curl

# Install Node.js (v22.x): Download and run the setup script
curl -fsSL https://deb.nodesource.com/setup_22.x -o nodesource_setup.sh
sudo -E bash nodesource_setup.sh

# Install Node.js
sudo apt-get install -y nodejs

# Verify Node.js installation
node -v

# Google Cloud Storage client library (a local install, so sudo is not needed
# and node_modules stays owned by your user)
npm install @google-cloud/storage

Next, install Playwright and its dependencies:


npm init playwright@latest

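This command prompts you to install the browsers and their system dependencies. If either step fails, you can install them separately with the following commands:


# Install the OS-level libraries the browsers need
sudo npx playwright install-deps

# Download the browser binaries
npx playwright install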
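Before moving on, it can help to confirm that a browser actually launches on the VM. The following is a minimal sketch rather than part of the official setup: `smokeTest` is a hypothetical helper name, and it assumes the browser type (for example `chromium`) is passed in by the caller.

```javascript
// Smoke test sketch: launch a browser, read its version, and always close it.
// `browserType` is injected so the same helper works with chromium, firefox, or webkit.
async function smokeTest(browserType) {
  let browser;
  try {
    browser = await browserType.launch({ headless: true });
    return browser.version();
  } finally {
    // Close the browser even if launch succeeded but a later step failed
    if (browser) {
      await browser.close();
    }
  }
}

// Hypothetical usage on the VM, after the install steps above:
//   const { chromium } = require('playwright');
//   smokeTest(chromium).then((v) => console.log(`Chromium ${v} launched OK`));
```

If the launch fails with a missing shared library error, that usually points back to the system dependencies rather than to Playwright itself.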

Configuring a GCP bucket to store a screenshot

The code in the next section will store screenshots in Google Cloud Storage. First, set up a Cloud Storage bucket and configure the necessary authentication for the VM to save screenshots. Your VM instance also requires the Google Cloud Storage library, which we’ve already installed in the previous section.

Setting up the Cloud Storage Bucket: In Google Cloud Storage, create a bucket to store screenshots.
Enable the Cloud Storage API: In the Google Cloud Console, go to APIs & Services > Library and enable the Cloud Storage API.
Assign Permissions to the VM: The simplest way to manage permissions is to use the default service account attached to your VM instance. Grant this service account the "Storage Object Creator" role, either project-wide in IAM & Admin > IAM or on the bucket itself from the bucket's Permissions tab. If the VM was created with the default access scopes, also check that its Cloud Storage access scope allows writes, since the default scope is read-only.

Granting the VM's service account these permissions allows it to upload screenshots directly to Google Cloud Storage.
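To check the role assignment before running a full capture, you can ask Cloud Storage whether the current credentials may create objects in the bucket. This is a sketch under assumptions: `canUploadTo` is a hypothetical helper, and it expects a `Bucket` object from @google-cloud/storage. It only checks IAM, so other restrictions, such as the VM's access scopes, may still block an upload.

```javascript
// Returns true when the current credentials may create objects in the bucket.
// `bucket` is expected to be a Bucket from @google-cloud/storage.
async function canUploadTo(bucket) {
  const permission = 'storage.objects.create';
  // testPermissions resolves to [permissionMap, apiResponse]
  const [granted] = await bucket.iam.testPermissions([permission]);
  return granted[permission] === true;
}

// Hypothetical usage on the VM (replace the bucket name with your own):
//   const { Storage } = require('@google-cloud/storage');
//   canUploadTo(new Storage().bucket('your-bucket-name'))
//     .then((ok) => console.log(ok ? 'Upload allowed' : 'Missing storage.objects.create'));
```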

Writing code

Once the environment is set up, write the following JavaScript code, which captures a screenshot of the webpage given as input and uploads it to the GCP bucket. This keeps the screenshots easily accessible for testing and validation.

Create a new screenshot.js JavaScript file and paste the following code. Update the bucket name to your bucket name.



const { chromium } = require('playwright');
const { Storage } = require('@google-cloud/storage');
const fs = require('fs');
const path = require('path');

// Get URL from command-line arguments
const url = process.argv[2];
if (!url) {
  console.error('Please provide a URL as the first argument \n Ex - https://www.example.com');
  process.exit(1);
}

// GCP Storage configuration
const bucketName = 'your-bucket-name'; // Replace with your GCP bucket name

async function captureScreenshot(url, outputPath, viewportSize = { width: 1920, height: 1080 }) {
  let browser;
  try {
    // Launch a headless browser
    browser = await chromium.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'],
    });

    const context = await browser.newContext({
      viewport: viewportSize,
      userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    });
    const page = await context.newPage();

    // Navigate to the website with a timeout
    await page.goto(url, { waitUntil: 'networkidle', timeout: 60000 });

    // Capture and save the screenshot to a local temp path on the VM
    await page.screenshot({ path: outputPath, fullPage: true });
    console.log(`Screenshot saved to ${outputPath}`);

    // Initialize GCP Storage client (no keyFilename needed)
    const storage = new Storage();

    // Define the screenshot name (filename in GCP bucket)
    const fileName = `screenshots/${url.replace(/https?:\/\//, '').replace(/\//g, '_')}.png`;

    // Upload the screenshot to Google Cloud Storage
    await storage.bucket(bucketName).upload(outputPath, {
      destination: fileName,
      metadata: {
        contentType: 'image/png',
      },
    });

    console.log(`Screenshot uploaded to GCP bucket at ${fileName}`);

  } catch (error) {
    console.error('An error occurred:', error);
    process.exitCode = 1; // Signal failure to the calling shell or scheduler
  } finally {
    // Ensure the browser is closed even if an error occurs
    if (browser) {
      await browser.close();
    }
  }
}

// Usage: pass the URL as the first command-line argument
(async () => {
  const outputPath = '/tmp/screenshot.png'; // Temp location to save the screenshot before uploading
  await captureScreenshot(url, outputPath);
})();

Use the following command to run the code (note the input format with https):


node screenshot.js https://www.example.com

Then check your GCP bucket: the screenshot is stored under the screenshots/ prefix.
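The script derives the object name by stripping the scheme and replacing slashes, which leaves characters such as `?` and `&` in the name when the URL has a query string. A stricter variant might look like the sketch below. `urlToObjectName` and the 200-character cap are illustrative assumptions, not part of the original script.

```javascript
// Turn a URL into a bucket object name such as screenshots/example.com_a_b.png.
// Any run of characters outside [a-zA-Z0-9.-] becomes a single underscore.
function urlToObjectName(rawUrl, maxLength = 200) {
  const { hostname, pathname, search } = new URL(rawUrl);
  const safe = `${hostname}${pathname}${search}`
    .replace(/[^a-zA-Z0-9.-]+/g, '_') // collapse runs of unsafe characters
    .replace(/^_+|_+$/g, '')          // trim leading and trailing underscores
    .slice(0, maxLength);             // keep the name to a manageable length
  return `screenshots/${safe}.png`;
}

console.log(urlToObjectName('https://www.example.com'));
// prints "screenshots/www.example.com.png"
```

Because `new URL` throws on malformed input, this also rejects arguments that are missing the https:// prefix before a browser is launched.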

Maintenance tips and challenges

Playwright and VMs are a great combination, but they require careful maintenance.

One of the most important aspects is dependency management. Playwright, along with its browser binaries, frequently releases updates to stay in sync with modern web standards.

These updates often bring new features, optimizations, and security fixes, which makes it essential to regularly update your Playwright version to avoid potential compatibility issues or vulnerabilities.

You will also need to keep an eye out for memory leaks. Issues such as zombie processes and browsers not closing properly can gradually increase the resources needed to keep the automations running smoothly.
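One way to reduce the risk of lingering browsers is to route every task through a wrapper that always attempts to close the browser and does not wait forever if the close hangs. The following is a sketch, not a complete fix: `withBrowser` and the 10-second default are assumptions, and a browser that ignores `close()` would still need to be killed at the process level.

```javascript
// Run `task` with a freshly launched browser and always attempt to close it.
// `launch` is any function returning a Promise of an object with close().
async function withBrowser(launch, task, closeTimeoutMs = 10000) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    let timer;
    const timeout = new Promise((resolve) => {
      timer = setTimeout(() => resolve('timeout'), closeTimeoutMs);
    });
    // Wait for whichever finishes first: the close call or the timeout
    const outcome = await Promise.race([
      browser.close().then(() => 'closed', () => 'close-failed'),
      timeout,
    ]);
    clearTimeout(timer);
    if (outcome !== 'closed') {
      console.warn(`Browser shutdown did not complete cleanly: ${outcome}`);
    }
  }
}

// Hypothetical usage:
//   const { chromium } = require('playwright');
//   withBrowser(() => chromium.launch(), async (browser) => {
//     const page = await browser.newPage();
//     await page.goto('https://www.example.com');
//   });
```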

Run Playwright with Browserless to Keep Things Simple

To take the hassle out of scaling your scraping, screenshotting or other automations, try Browserless.

It takes a quick connection change to use our thousands of concurrent Chrome browsers. Try it today with a free trial.

The Easy Option: Connect Playwright to Our Browser Pool

Hosting Playwright is easy; it's the browsers that cause the issues. To simplify your setup, use our pool of thousands of concurrent browsers with just a change in endpoint.



// Launching Chrome locally
const localBrowser = await playwright.chromium.launch();

// Connecting to Firefox via Browserless
const remoteBrowser = await playwright.firefox.connect(`wss://production-sfo.browserless.io/firefox/playwright?token=GOES_HERE`);

You can either host just playwright-core without the browsers, or use our REST APIs. There are residential proxies, stealth options, HTML exports, and other commonly needed features.
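To keep the token out of source control, the endpoint can be built at runtime from an environment variable. This is only a sketch: `browserlessEndpoint` and the `BROWSERLESS_TOKEN` variable name are hypothetical, and the `wss://` scheme is an assumption based on Playwright's `connect` expecting a WebSocket endpoint.

```javascript
// Build the Firefox connection URL from a token supplied at runtime.
function browserlessEndpoint(token) {
  if (!token) {
    throw new Error('Missing Browserless token');
  }
  const endpoint = new URL('wss://production-sfo.browserless.io/firefox/playwright');
  endpoint.searchParams.set('token', token); // URL-encodes the token if needed
  return endpoint.toString();
}

// Hypothetical usage with playwright-core:
//   const { firefox } = require('playwright-core');
//   const browser = await firefox.connect(browserlessEndpoint(process.env.BROWSERLESS_TOKEN));
```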

Check out the docs
