Why HTTP load tests fail to catch critical errors

Load testing with HTTP requests or GET calls to API endpoints is great, but has its limits.

It won’t surface problems with third-party integrations, CDN requests, or memory leaks. So let’s look at these blind spots in more detail, and at other ways to catch the issues they hide.

The shortcomings of testing with HTTP requests

Those tests can tell you whether requests get the expected response at 10, 100, or even 10,000 concurrent users, and how long each response takes. They don’t tell you whether a user gets the intended experience at that volume.

For the same reason you use browsers in end-to-end testing, it’s a good idea to use them for load testing too. You want to make sure you’re really testing mission-critical code and understanding how load impacts the user experience—not just whether HTTP requests are being fulfilled as expected.

What can browser-based load testing do?

1. Validate third-party service performance

You don’t just need to test the parts of the system that you have direct ownership of. Your web application likely connects to a number of third-party services, each of which needs to work at different thresholds of traffic. 

For example, does your payment vendor operate as expected under pressure, so you don’t end up with abandoned carts? 

If you use a CDN, keep in mind that providers like Cloudflare or Akamai are usually reliable at scale, but they’re not infallible. A good live browser load test would surface any issues with serving a stylesheet or other asset from your provider.

2. Understand what the user sees when your load test passes

Without browser-based load testing, your load test may pass because the expected 200 status code came back, but you won’t know if an asset failed to load or the CDN request failed, resulting in a broken user experience. If you don’t have a visual reference for what is happening when the test passes, you haven’t truly tested it. 

Aren’t you just talking about Lighthouse metrics?

Lighthouse does give you a lot of useful stats and performance timings (time to first meaningful paint, etc.). But the report is probably more in-depth than you need for load testing your app’s frontend performance, and it won’t necessarily tell you what worked and what didn’t, or how long it took to reach the specific stats you care about.

3. Measure performance consistency under constant pressure

While concurrency is important for load testing, it’s not always the most impactful metric. A lot of times, degradation, timeouts, or memory leaks aren’t a result of a spike—it takes a long time to get there, without necessarily higher-than-average traffic. You don’t have to hit your website or app so hard all at once to get to that spot where it fails—it’s just consistent pressure over time, not peak pressure.

For example, if you hold 100 users on the app across a 10-minute interval (an approach known as soak testing), does the app stay up? Do memory leaks appear? What else surfaces at that threshold?
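One way to answer the memory-leak question is to sample memory during the soak run and check whether it trends upward. The following is a minimal sketch rather than a definitive detector: the function names, the 1 MB-per-minute threshold, and the sample readings are all made up for illustration, and it assumes you already collect (minutes elapsed, memory in MB) pairs from your own monitoring.

```python
from typing import Sequence


def memory_growth_per_minute(samples: Sequence[tuple[float, float]]) -> float:
    """Least-squares slope of memory (MB) against time (minutes)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def looks_like_leak(
    samples: Sequence[tuple[float, float]],
    max_growth_mb_per_min: float = 1.0,  # assumed threshold, tune for your app
) -> bool:
    """Flag a run whose memory grows faster than the allowed rate."""
    return memory_growth_per_minute(samples) > max_growth_mb_per_min


# Hypothetical readings from two 10-minute soak runs at constant load.
steady = [(0, 512), (2, 518), (4, 509), (6, 515), (8, 511), (10, 514)]
leaking = [(0, 512), (2, 540), (4, 571), (6, 598), (8, 630), (10, 655)]

print(looks_like_leak(steady))   # memory hovers around the same level
print(looks_like_leak(leaking))  # memory climbs steadily under the same load
```

A trend line is a blunt instrument: a cache warming up also grows for a while, so a longer run usually gives a more trustworthy slope than a short one.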

If you follow good memory management practices, or your runtime handles garbage collection for you, you don’t have to worry about pure volume as much. The exception is when your underlying infrastructure is written in a language that requires manual memory management (lower-level languages like C or C++). Keep in mind that garbage-collected languages such as Java or Scala can still leak memory when references are never released.

Why isn’t browser-based load testing prioritized?

So, we understand why live browser testing or browser-based load testing is important. Why does it tend to be skipped over? As one Hacker News comment put it:

"A point that I feel is very important and yet is often overlooked in real-world web load testing; emulating an actual client behavior under different real-world scenarios. Hitting a page or pages X times usually bears almost no resemblance to how a real-world application is used and therefore gives no indication of actual performance and capacity.

"To test a web app properly you need the ability to script and/or replay real-world session browsing behaviors, simulating typical interactions. This can be both very time consuming and difficult to set up but yields far more useful and realistic results."

If you look up the Agile Testing Pyramid, you’re unlikely to find this type of testing included. It’s easy to think that this case is covered by other tests, like end-to-end testing, component testing, device testing, and so on.

If you think about testing as a spectrum:

  • At one end, you have something like ab (ApacheBench), which just makes simple GET calls.
  • At the other is comprehensive device testing with a service like BrowserStack or Sauce Labs, which is geared more towards validating that your website or app renders on a specific device, operating system, or browser.

Device-testing services run your website or app on actual end-user hardware. This is great for fidelity, but it limits how many sessions you can run at one time and it is costly, so it’s not really suitable for load testing. These services also may not support the library or language you want to use.

Using a headless browser (like Browserless) for browser-based load testing is somewhere in the middle: You can do a lot more and instrument it however you want, using all our APIs and libraries, but it’s not the same as using an actual device. You can also run tests on multiple browsers at a scale that would be prohibitive with device testing. 

Most of the current open source load testing tools (like JMeter, Locust, and Gatling) don’t use a browser. You have to manually fire up a web browser, run a test, and return a result. There are a lot of ways to go about this, from instrumentation, to writing the code and finding the request. Assuming you get that far, now you have to start thinking about infrastructure.

Standing up infrastructure and resourcing

If you wanted to run a million, or even just a thousand, concurrent events for your browser-based load testing, we’re talking on the order of potentially thousands of machines. You have to do resource planning to make sure you can launch virtual machines or lease new hardware, data center space, and so on. The company you work for may not have the resources, or may not give you the allocation, to do this (we probably don’t need to belabor the challenges here).

Even if you’re not trying to test at a massive scale, 100 web browsers can take a lot of machines to simulate. You have to do a lot of planning and due diligence to ensure they function the way you want them to.
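The planning arithmetic can be roughed out before you provision anything. This sketch assumes RAM is the limiting resource and ignores CPU entirely; the function name, the 500 MB-per-browser figure, the 8 GB machine size, and the 20% headroom are placeholder assumptions, not measurements, so substitute numbers from your own runs.

```python
import math


def machines_needed(
    concurrent_browsers: int,
    ram_per_browser_mb: float,
    machine_ram_mb: float,
    headroom: float = 0.2,  # fraction of RAM left for the OS and spikes
) -> int:
    """Estimate how many machines a browser fleet needs, by RAM only."""
    usable_mb = machine_ram_mb * (1 - headroom)
    browsers_per_machine = int(usable_mb // ram_per_browser_mb)
    if browsers_per_machine < 1:
        raise ValueError("a single browser does not fit on this machine size")
    return math.ceil(concurrent_browsers / browsers_per_machine)


# Placeholder figures: 500 MB per browser on machines with 8 GB of RAM.
for browsers in (100, 1_000, 10_000):
    print(browsers, "browsers ->", machines_needed(browsers, 500, 8192), "machines")
```

Under these assumed numbers each machine holds 13 browsers, so the machine count grows almost linearly with concurrency. Heavier pages push the per-browser figure up and the estimate with it.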

Wait, can’t we just use a container for this?

You can grab a Docker container or really any portable image format, but that only gets you so far. Now you have to manage a fleet of them, and a lot of cloud services limit how many you can have running at a time, or how much CPU and RAM they can use.

So, yes, having the container image gets you on your way. But now you have to bring a load balancer into the picture, and it may need to understand WebSockets (some load balancers and cloud services support them, some don’t). Your problem just gets bigger and bigger, and then you have to tear it all down once you’re done.

That’s without considering niche requirements, like wanting a GPU to render a site with lots of visuals. Getting GPUs to function in a Docker container is not exactly turnkey.

Even once you have a fleet of browsers set up to load test your web app, you have to ensure you’re using the right size of machine to get accurate load test results. If you put your browser on too big or too small of a machine, your results aren’t going to match real-world performance.

The assumptions you make, what machines you use—these can all create a lot of unnecessary work if you get too far down the road with writing and running tests without considering the conditions you’re trying to simulate.

When you want browser-based load testing 

If you’re just looking to test that the page renders in a certain amount of time, something like a screenshot API or our Lighthouse API will confirm that the page renders and give you visual feedback that it loads properly. 

One of our larger customers has a use case where they’re looking for a specific request to be fired from the browser, which indicates that an SLA is met for their product team. With our /function API they were able to write a compact script that gets them exactly what they need to know and nothing else, without running Node or anything else: just a simple REST call and response. It’s easy to use a library and headless browser for this type of requirement. Grab Playwright or Puppeteer, connect to Browserless, and look for that request. Then you can get a stat back confirming how many seconds it takes to get that response.
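The check at the end of that workflow is small once the browser has reported its requests. Here is a sketch of just that step, assuming you have already recorded each request as a (seconds since navigation started, URL) pair through your library’s request events. The function names, the example.com URLs, and the 2-second SLA are hypothetical, and this is not the customer’s script or the /function API itself.

```python
from typing import Iterable, Optional


def seconds_until_request(
    events: Iterable[tuple[float, str]], url_fragment: str
) -> Optional[float]:
    """Time of the first request whose URL contains url_fragment, else None."""
    matches = [elapsed for elapsed, url in events if url_fragment in url]
    return min(matches) if matches else None


def meets_sla(
    events: Iterable[tuple[float, str]], url_fragment: str, sla_seconds: float
) -> bool:
    """True only if the request was fired, and fired within the SLA."""
    elapsed = seconds_until_request(events, url_fragment)
    return elapsed is not None and elapsed <= sla_seconds


# Hypothetical recording of one page load.
events = [
    (0.0, "https://example.com/"),
    (0.4, "https://example.com/static/app.js"),
    (1.8, "https://example.com/api/checkout/ready"),
    (2.5, "https://example.com/api/checkout/ready"),
]

print(seconds_until_request(events, "/api/checkout/ready"))
print(meets_sla(events, "/api/checkout/ready", sla_seconds=2.0))
```

Treating a request that never fires as an SLA failure, rather than skipping it, matters under load: a missing request is often the first symptom of a page that only half rendered.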

Beyond load testing, you can use a headless browser for continuous automated tests as well, which aren’t necessarily about performance and statistics. Did the page render, did all the images show up, are there any broken requests? For that kind of use case it’s definitely easier to use a library together with a headless browser, because you can just watch all network calls and see if any of them are not returning a 200 response code.
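As a rough illustration of that network check, the sketch below filters a list of recorded responses for anything that broke. It assumes your browser library has already handed you each response’s URL, status, and resource type. The RecordedResponse type, its field names, and the sample data are invented for the example. It also flags only statuses of 400 and above plus requests with no response at all, which is a looser rule than "anything other than 200", because codes like 204 and 304 are usually healthy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordedResponse:
    url: str
    status: Optional[int]  # None when the request never received a response
    resource_type: str     # e.g. "document", "stylesheet", "image"


def broken_responses(responses: list[RecordedResponse]) -> list[RecordedResponse]:
    """Return responses that errored (status >= 400) or never completed."""
    return [r for r in responses if r.status is None or r.status >= 400]


# Hypothetical page load: the document returns 200, yet the page is broken.
recorded = [
    RecordedResponse("https://example.com/", 200, "document"),
    RecordedResponse("https://cdn.example.com/site.css", 503, "stylesheet"),
    RecordedResponse("https://cdn.example.com/logo.png", 304, "image"),
    RecordedResponse("https://cdn.example.com/hero.jpg", None, "image"),
]

for r in broken_responses(recorded):
    print(r.resource_type, r.url, r.status)
```

An HTTP-only load test would have looked at the first entry and passed. The stylesheet and hero image failures only show up when something actually loads the page.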

A small plug: Why use Browserless for browser-based load testing?

A lot of the alternative tools you might use are specific to a language or framework. You can end up in a situation where you’re trying to pool results from multiple tests, making it hard to get holistic results. The advantage of using Browserless for these tests is that you can bring your own tooling—Node, Python, Puppeteer, Playwright, whatever library you’re comfortable with. You can simulate any setup you want with any or all of our browser APIs, so you don’t have to try to mock those situations in your load test. 

If you want to just visit a page and run JavaScript (and not bother with a library), there are small APIs that do just that, and you get back some information about page load. If you want to simulate thousands of requests to a website, our Lighthouse API can give you a very nice output of all the performance characteristics of the site.

You can do as much or as little as you want, and Browserless is not opinionated about how you get it done. You can implement what you want and leave the rest.

