Case study – How Semji create HTML snapshots for their SEO platform

Introducing Semji’s AI for SEO

Semji is an SEO platform that uses artificial intelligence to help businesses produce and optimize their web content. They help writers improve their articles or shopping pages so that they rank better on Google for a given keyword.

They extract content from the selected site to provide feedback and potential keywords, along with insight into how competitors are behaving, what content to write and various other factors. AI is an important part of their workflow, with content generation coming from OpenAI.

The user then gets metrics on how the optimization performed, so they can see any improvements.

Going beyond a blank page

The editor is a key part of Semji. It’s where users write their article and see their optimizations.

Instead of writing on a blank page, Semji wanted to recreate the user’s page inside of the editor. It needed to be a full representation, complete with context such as images.

They tried generating an image or extracting the HTML. This was tricky: the assets had to be snapshotted as well, because CSS or JavaScript files sometimes had hashed URLs, and those assets would later disappear.

Semji quickly realised that the only option was to snapshot the full page contents within an HTML script.
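To illustrate why a self-contained snapshot helps, here is a minimal sketch (not Semji's actual code) of inlining stylesheets so the saved HTML no longer depends on hashed asset URLs that may vanish later. The function name `inlineStylesheets` and the pre-fetched `assets` record are hypothetical, and the regex only handles simple `<link rel="stylesheet" href="...">` tags.

```typescript
// Hypothetical helper: replaces <link rel="stylesheet"> tags with inline
// <style> blocks, using CSS text that was fetched at snapshot time.
// Assumption: `assets` maps each stylesheet URL, exactly as written in the
// href attribute, to the CSS text captured for it.
function inlineStylesheets(html: string, assets: Record<string, string>): string {
  // Matches simple tags such as <link rel="stylesheet" href="/app.3f9a1c.css">.
  const linkTag = /<link\b[^>]*\brel=["']stylesheet["'][^>]*>/gi;
  return html.replace(linkTag, (tag: string): string => {
    const href = /\bhref=["']([^"']+)["']/i.exec(tag);
    const url = href ? href[1] : undefined;
    const css: string | undefined = url !== undefined ? assets[url] : undefined;
    // Leave the tag untouched if the stylesheet was not captured.
    return css === undefined ? tag : `<style>${css}</style>`;
  });
}

// Example input with a hashed stylesheet URL (made up for illustration).
const original = '<head><link rel="stylesheet" href="/app.3f9a1c.css"></head>';
const assets: Record<string, string> = { "/app.3f9a1c.css": "body{margin:0}" };
const snapshot = inlineStylesheets(original, assets);
```

Once the CSS lives inside the document, the snapshot keeps rendering even after `/app.3f9a1c.css` is replaced by a newer build. Images and scripts would need similar treatment.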

Scripting their own solution

After looking for off-the-shelf solutions, they decided to write a simple script using Puppeteer.

Using Puppeteer let them meet their requirements of:

  • Screenshotting a full page
  • Waiting for delayed images to load
  • Removing any cookie banners
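The three requirements above can be sketched as follows. This is an illustration under stated assumptions rather than Semji's script: the `SnapshotPage` interface lists only the Puppeteer `Page` methods the sketch calls (a real `Page` is assumed to fit it), and the cookie banner selectors and the five-second image timeout are made up.

```typescript
// The subset of Puppeteer's Page API this sketch relies on. A real Puppeteer
// `Page` is assumed to satisfy this shape; a stub works for dry runs.
interface SnapshotPage {
  goto(url: string, options?: { waitUntil?: "load" | "networkidle0" | "networkidle2" }): Promise<unknown>;
  evaluate(script: string): Promise<unknown>;
  screenshot(options?: { path?: string; fullPage?: boolean }): Promise<unknown>;
}

// Hypothetical selectors; real cookie banners vary from site to site.
const COOKIE_BANNER_SELECTORS = ["#cookie-banner", ".cookie-consent"];

// Browser-side script: scroll to the bottom to trigger lazy loading, then wait
// until every <img> has loaded or failed, giving up after 5 s per image.
const WAIT_FOR_IMAGES = `(async () => {
  window.scrollTo(0, document.body.scrollHeight);
  await Promise.all(Array.from(document.images)
    .filter((img) => !img.complete)
    .map((img) => new Promise((resolve) => {
      img.addEventListener("load", resolve, { once: true });
      img.addEventListener("error", resolve, { once: true });
      setTimeout(resolve, 5000);
    })));
})()`;

// Browser-side script: remove every element matching a banner selector.
const REMOVE_BANNERS = `document
  .querySelectorAll(${JSON.stringify(COOKIE_BANNER_SELECTORS.join(","))})
  .forEach((el) => el.remove())`;

// Loads the page, waits for delayed images, strips banners, then captures
// the whole scrollable page rather than just the viewport.
async function captureFullPage(page: SnapshotPage, url: string, path: string): Promise<void> {
  await page.goto(url, { waitUntil: "networkidle2" });
  await page.evaluate(WAIT_FOR_IMAGES);
  await page.evaluate(REMOVE_BANNERS);
  await page.screenshot({ path, fullPage: true });
}
```

With Puppeteer installed, a page from `browser.newPage()` could be passed as the first argument. A single scroll to the bottom may not trigger every lazy loader, so a production script would likely scroll in steps.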

The script ran headless Chrome through Puppeteer, but the setup wasn’t very stable. Chrome was awkward to manage, with the container needing regular reboots.

So while the scripts and Puppeteer were good, they looked at other infrastructure options.

Managing headless Chrome with Browserless

A quick search led their CTO to Browserless.

Setup was an easy process, with only a few lines of config required to connect over the WebSocket. Within a few hours, everything was running smoothly.
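As a rough idea of what "a few lines of config" can look like, the sketch below connects to a remote browser over a WebSocket instead of launching Chrome locally. The host name, the token value and the `token` query parameter are assumptions for illustration, and the `connect` function is injected so the sketch runs without Puppeteer installed.

```typescript
// Hypothetical values: the host and token are placeholders, and the `token`
// query parameter is an assumption about how the endpoint authenticates.
function browserlessEndpoint(host: string, token: string): string {
  return `wss://${host}?token=${encodeURIComponent(token)}`;
}

// Shape of the options accepted by a connect function such as
// Puppeteer's `puppeteer.connect`.
interface ConnectOptions {
  browserWSEndpoint: string;
}

// Connects to a remote browser instead of launching Chrome locally.
// `connect` is passed in so the caller decides which library provides it.
async function connectToRemoteChrome<B>(
  connect: (options: ConnectOptions) => Promise<B>,
  host: string,
  token: string,
): Promise<B> {
  return connect({ browserWSEndpoint: browserlessEndpoint(host, token) });
}
```

With Puppeteer installed, the call might look like `connectToRemoteChrome((o) => puppeteer.connect(o), "browserless.example.com", token)`, taking the place of an earlier `puppeteer.launch()`. The rest of the script can stay as it was, which fits the small setup effort described above.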

Years later, they have barely had to touch the code, which is a great sign.

Expanding into PDF exports

A later challenge was creating PDF exports for users.

Their engineers had previously used PHP libraries to handle PDF exports, but they were a pain to work with.

Turning HTML into a PDF with Puppeteer proved to be straightforward. From there, they again connected to Browserless to handle the exports.
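A minimal sketch of the HTML-to-PDF step, assuming a Puppeteer-style page object, might look like this. The `PdfPage` interface lists only the two `Page` methods the sketch calls, and the A4 format and `printBackground` setting are illustrative choices, not Semji's configuration.

```typescript
// The subset of Puppeteer's Page API used for PDF export. A real Puppeteer
// `Page` is assumed to satisfy this shape.
interface PdfPage {
  setContent(html: string, options?: { waitUntil?: "load" | "networkidle0" }): Promise<unknown>;
  pdf(options?: { format?: "A4" | "Letter"; printBackground?: boolean }): Promise<Uint8Array>;
}

// Renders an HTML string in the page and returns the PDF bytes.
async function htmlToPdf(page: PdfPage, html: string): Promise<Uint8Array> {
  // Wait for network activity to settle so images and fonts are in place.
  await page.setContent(html, { waitUntil: "networkidle0" });
  // printBackground keeps CSS background colours and images in the output.
  return page.pdf({ format: "A4", printBackground: true });
}
```

Because the function only needs a page object, the same code works whether that page comes from a locally launched Chrome or from a remote browser reached over a WebSocket.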

Happily hands off with Browserless

Now that they have Browserless up and running, they are happily letting it run with minimal maintenance in the background.

If you're using Puppeteer or Playwright to create screenshots or PDFs and you'd also like to try Browserless, go ahead and grab a free trial.
