How to Download Files from Remote Browser Connections, With Puppeteer and Playwright


In a local environment, accessing downloaded files is as easy as opening the downloads folder. Remote browsers add a challenge: the files land on the remote machine, so they must be sent to your local machine or uploaded to cloud storage.

Driving a remote browser is particularly useful for websites that require authentication, where requesting the file URLs directly is not possible. By automating the browser, we can log in as a user would, simulate the interactions that trigger a download, and capture the files we need.

That's why in this post, we'll explore how to automate file downloads using Puppeteer and Playwright by connecting to a remote browser.

Sample script workflow

Each of these sample scripts will follow the same pattern.

  1. Connect to a remote browser
  2. Navigate to the target site and log in
  3. Listen for download events
  4. Trigger the downloads
  5. Save the downloaded files to disk
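
Step 1 depends on a WebSocket URL that embeds an API token. As a minimal sketch, the helper below builds that URL and fails early when the token is missing, since an unset environment variable would otherwise be interpolated as the string "undefined". The host and paths are the ones used in the scripts in this post; the function name `buildEndpoint` and its parameters are hypothetical.

```javascript
// Hypothetical helper: builds the WebSocket endpoint used to connect in step 1.
// The host and default path mirror the endpoints used in this post.
function buildEndpoint(token, path = '/chromium/stealth') {
  if (!token) {
    // Without this check the URL would silently contain "token=undefined"
    throw new Error('Missing Browserless token');
  }
  const url = new URL(path, 'wss://production-sfo.browserless.io');
  // searchParams takes care of encoding the token value
  url.searchParams.set('token', token);
  return url.toString();
}
```

A script could then call `buildEndpoint(process.env.BROWSERLESS_TOKEN)` for the Puppeteer and CDP examples, or pass `'/chromium/playwright'` as the second argument for the Playwright one.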

How to save files with Puppeteer


import puppeteer from 'puppeteer-core';
import fs from 'fs';

// Establish remote connection
const browserWSEndpoint = `wss://production-sfo.browserless.io/chromium/stealth?token=${process.env.BROWSERLESS_TOKEN}`;
const browser = await puppeteer.connect({
  browserWSEndpoint
});
const [page] = await browser.pages();

// Create cdp session 
const cdp = await page.createCDPSession();

// Navigate to target site and login if needed
await page.goto("https://slackmojis.com/", { waitUntil: "networkidle0" });

// Configure Network Interception
await cdp.send('Network.enable');
await cdp.send('Network.setRequestInterception', {
  patterns: [
    {
      urlPattern: '*',
      interceptionStage: 'HeadersReceived',
    },
  ],
});

// Stream an intercepted response body from the remote browser to a local file
const downloadFileFromInterceptedResponse = async (interceptionId, fileName) => {
  const { stream: streamHandle } = await cdp.send('Network.takeResponseBodyForInterceptionAsStream', {
    interceptionId,
  });
  const writer = fs.createWriteStream(fileName);
  while (true) {
    const read = await cdp.send('IO.read', {
      handle: streamHandle,
    });
    // Write the chunk before checking eof: the final read can still carry data
    if (read.data) {
      writer.write(Buffer.from(read.data, read.base64Encoded ? 'base64' : 'utf8'));
    }
    if (read.eof)
      break;
  }
  // Release the remote stream and wait until the file is flushed to disk
  await cdp.send('IO.close', { handle: streamHandle });
  await new Promise((resolve) => writer.end(resolve));
  // Once the body is taken as a stream, the request can no longer be continued as is, so abort it.
  await cdp.send('Network.continueInterceptedRequest', {
    interceptionId,
    errorReason: 'Aborted',
  });
};

// Listen for intercepted requests
const downloadPromises = [];
cdp.on('Network.requestIntercepted', async (event) => {
  if (event.isDownload) {
    // When event is a download we call our download function
    const fileName = event.request.url.split('/').pop();
    downloadPromises.push(downloadFileFromInterceptedResponse(event.interceptionId, fileName));
  } else {
    await cdp.send('Network.continueInterceptedRequest', {
      interceptionId: event.interceptionId,
    });
  }
});

// Trigger the downloads
await page.click('ul>li:nth-child(2)');
await page.click('ul>li:nth-child(3)');
await page.click('ul>li:nth-child(4)');
await page.click('ul>li:nth-child(5)');

// Give the download events time to fire, then wait for every file to finish saving
await new Promise(r => setTimeout(r, 3000));
await Promise.all(downloadPromises);
await browser.close();
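
The script above takes the file name from the last segment of the request URL, which would also include any query string the URL carries. A minimal sketch of a more defensive version follows; the helper name `fileNameFromUrl` and the fallback name are hypothetical, and the example URLs in the comments are placeholders.

```javascript
// Hypothetical helper: derives a local file name from a download URL.
function fileNameFromUrl(rawUrl, fallback = 'download.bin') {
  // URL.pathname excludes the query string and the fragment
  const { pathname } = new URL(rawUrl);
  const lastSegment = pathname.split('/').filter(Boolean).pop();
  if (!lastSegment) return fallback;

  let decoded;
  try {
    decoded = decodeURIComponent(lastSegment);
  } catch (err) {
    // Malformed percent-encoding: keep the raw segment
    decoded = lastSegment;
  }
  // An encoded slash would otherwise turn the name into a path
  return decoded.replace(/[\\/]/g, '_');
}
```

Inside the `Network.requestIntercepted` handler, `fileNameFromUrl(event.request.url)` could stand in for the `split('/').pop()` expression.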


How to save files with Playwright


import playwright from "playwright-core";

// Establish remote connection
const browserWSEndpoint = `wss://production-sfo.browserless.io/chromium/playwright?token=${process.env.BROWSERLESS_TOKEN}`;
const browser = await playwright.chromium.connect(browserWSEndpoint);
const page = await browser.newPage();

// Navigate to target site and login if needed
await page.goto("https://slackmojis.com/", { waitUntil: "networkidle" });

// Listen for downloads. page.on fires for every download, whereas
// page.waitForEvent resolves after the first matching event only.
const downloadPromises = [];
page.on("download", (download) => {
  downloadPromises.push(download.saveAs(download.suggestedFilename()));
});

// Trigger the downloads
await page.click('ul>li:nth-child(2)');
await page.click('ul>li:nth-child(3)');
await page.click('ul>li:nth-child(4)');
await page.click('ul>li:nth-child(5)');

// Give the download events time to fire, then wait for every file to finish saving
await new Promise(r => setTimeout(r, 3000));
await Promise.all(downloadPromises);
await browser.close();
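
The script above saves each file under its suggested file name in the working directory, so two downloads that share a name would overwrite each other. One possible way to avoid that is sketched below; the helper `uniqueFileName` and its numbering scheme are hypothetical and not part of Playwright.

```javascript
// Hypothetical helper: returns `name`, or a numbered variant if `name` was already used.
// `taken` is a Set that the caller keeps for the lifetime of the script.
function uniqueFileName(name, taken) {
  if (!taken.has(name)) {
    taken.add(name);
    return name;
  }
  // Split "party.gif" into "party" and ".gif"; names without an extension keep an empty suffix
  const dot = name.lastIndexOf('.');
  const base = dot > 0 ? name.slice(0, dot) : name;
  const ext = dot > 0 ? name.slice(dot) : '';

  let counter = 1;
  let candidate = `${base} (${counter})${ext}`;
  while (taken.has(candidate)) {
    counter += 1;
    candidate = `${base} (${counter})${ext}`;
  }
  taken.add(candidate);
  return candidate;
}
```

In the download listener, the path passed to `download.saveAs` could then be `uniqueFileName(download.suggestedFilename(), taken)`, with `taken` created once as `new Set()`.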


How to save files with a Playwright CDP Session


import playwright from "playwright-core";
import fs from 'fs';

// Establish remote connection
const browserWSEndpoint = `wss://production-sfo.browserless.io/chromium/stealth?token=${process.env.BROWSERLESS_TOKEN}`;
const browser = await playwright.chromium.connectOverCDP(browserWSEndpoint);
const browserContext = browser.contexts()[0];
const [page] = browserContext.pages();

// Create cdp session
const cdp = await browserContext.newCDPSession(page);

// Navigate to target site and login if needed
await page.goto("https://slackmojis.com/", { waitUntil: "networkidle" });

// Configure Network Interception
await cdp.send('Network.enable');
await cdp.send('Network.setRequestInterception', {
  patterns: [
    {
      urlPattern: '*',
      interceptionStage: 'HeadersReceived',
    },
  ],
});

// Stream an intercepted response body from the remote browser to a local file
const downloadFileFromInterceptedResponse = async (interceptionId, fileName) => {
  const { stream: streamHandle } = await cdp.send('Network.takeResponseBodyForInterceptionAsStream', {
    interceptionId,
  });
  const writer = fs.createWriteStream(fileName);
  while (true) {
    const read = await cdp.send('IO.read', {
      handle: streamHandle,
    });
    // Write the chunk before checking eof: the final read can still carry data
    if (read.data) {
      writer.write(Buffer.from(read.data, read.base64Encoded ? 'base64' : 'utf8'));
    }
    if (read.eof)
      break;
  }
  // Release the remote stream and wait until the file is flushed to disk
  await cdp.send('IO.close', { handle: streamHandle });
  await new Promise((resolve) => writer.end(resolve));
  // Once the body is taken as a stream, the request can no longer be continued as is, so abort it.
  await cdp.send('Network.continueInterceptedRequest', {
    interceptionId,
    errorReason: 'Aborted',
  });
};

// Listen for intercepted requests
const downloadPromises = [];
cdp.on('Network.requestIntercepted', async (event) => {
  if (event.isDownload) {
    // When event is a download, we call our download function
    const fileName = event.request.url.split('/').pop();
    downloadPromises.push(downloadFileFromInterceptedResponse(event.interceptionId, fileName));
  } else {
    await cdp.send('Network.continueInterceptedRequest', {
      interceptionId: event.interceptionId,
    });
  }
});

// Trigger the downloads
await page.click('ul>li:nth-child(2)');
await page.click('ul>li:nth-child(3)');
await page.click('ul>li:nth-child(4)');
await page.click('ul>li:nth-child(5)');

// Give the download events time to fire, then wait for every file to finish saving
await new Promise(r => setTimeout(r, 3000));
await Promise.all(downloadPromises);
await browser.close();
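
Both CDP scripts rely on the same read loop: call `IO.read` repeatedly, decode each chunk, and stop when `eof` is reported. The sketch below isolates that loop so it can be exercised without a browser. The `send` parameter stands in for `cdp.send`, and `createFakeSend` is a hypothetical test double that replays canned replies; neither helper is part of Puppeteer or Playwright.

```javascript
// Reads a CDP IO stream to completion and returns its bytes as a Buffer.
// `send` is any function with the same shape as cdp.send(method, params).
async function readStreamToBuffer(send, handle) {
  const chunks = [];
  while (true) {
    const { data, base64Encoded, eof } = await send('IO.read', { handle });
    // Decode before checking eof so data in the final reply is not dropped
    if (data) {
      chunks.push(Buffer.from(data, base64Encoded ? 'base64' : 'utf8'));
    }
    if (eof) break;
  }
  await send('IO.close', { handle });
  return Buffer.concat(chunks);
}

// Hypothetical test double: replays canned IO.read replies and records each call
function createFakeSend(replies) {
  const calls = [];
  const send = async (method) => {
    calls.push(method);
    return method === 'IO.read' ? replies.shift() : {};
  };
  return { send, calls };
}

// Two replies, the last one carrying both data and eof
const { send } = createFakeSend([
  { data: Buffer.from('hello ').toString('base64'), base64Encoded: true, eof: false },
  { data: Buffer.from('world').toString('base64'), base64Encoded: true, eof: true },
]);
readStreamToBuffer(send, 'fake-handle').then((buffer) => {
  console.log(buffer.toString()); // logs "hello world"
});
```

Buffering the whole body in memory keeps the sketch short; for large files, writing each chunk to a stream as the scripts above do is the safer choice.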


These scripts are fully compatible with Browserless and can be used in a production environment with minor improvements like error handling or custom behaviour.

Conclusion

We’ve explored how to download files using both Puppeteer and Playwright, connecting remotely to a browser instance via Browserless. Whether you're automating file downloads from a web page or capturing network traffic, these tools provide powerful methods for controlling the browser and interacting with its network stack. You can further enhance this process by integrating error handling, dynamic waits, or adding advanced download logic.
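
As one example of a dynamic wait, the fixed three-second pause in the scripts could be replaced by polling until the expected number of downloads has started. This is a minimal sketch: the helper name `waitUntil`, its option names, and the default values are all hypothetical.

```javascript
// Hypothetical helper: polls `predicate` until it returns true,
// or rejects once `timeoutMs` has elapsed.
async function waitUntil(predicate, { timeoutMs = 10000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (!predicate()) {
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Sleep briefly before checking again
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With four clicks triggering four downloads, `await waitUntil(() => downloadPromises.length === 4)` followed by `await Promise.all(downloadPromises)` would finish as soon as every file is saved, and would fail loudly if a download never starts.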


Want an easy way to run remote browsers? Try Browserless

With Browserless, you never have to worry about version updates or memory leaks causing issues with your remote browsers.

Let us manage all the details, while you connect to our browser pool with a quick change in endpoint.
