Is your API actually ready for user traffic?

You've built a fantastic API that powers your web application, but how do you know it can handle the load when users start to flood in?

Introducing k6

k6 is a modern, open-source load testing tool that makes it easy to test your API's performance. It's easy to use and can be run from the command line or integrated into your CI/CD pipeline.

It runs on Windows, macOS, and Linux, and supports HTTP/1.1, HTTP/2, and WebSockets.
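
Installing it is straightforward; for example, with one of the package managers listed in the official docs:

# macOS
brew install k6

# Windows
choco install k6

# Docker image
docker pull grafana/k6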

With k6 you write your tests in JavaScript and run them through a terminal command. Here's an example test that sends a GET request to an API endpoint:

// script.js
import http from 'k6/http'

export default function () {
  http.get('https://api.example.com/')
}

You can run this test by saving it to a file (e.g., script.js) and running the following command:

k6 run script.js

When you run your test, k6 outputs detailed results, including response times, throughput, and error rates. You can use these results to identify bottlenecks in your API and make improvements to handle more traffic.

But before you start implementing your tests, you need to ask yourself:

What does performance mean for your API?

This will influence the type of tests you write and the metrics you track. For example, you might care more about response times than throughput, or you might be more concerned with error rates than response times.

Defining what performance means for your API will also help you decide the success and failure criteria for your tests and ensure you're testing the right things.

In general, you want to define your API's normal traffic and acceptable response times, and then test to see if your API can handle more traffic than that without degrading performance.

Let's write a test that simulates 10 virtual users for 5 minutes and checks that API responses stay under 200ms:

// script.js
import http from 'k6/http'
import { sleep, check } from 'k6'

// options must be exported at the top level, not from inside the function
export const options = {
  vus: 10,
  duration: '5m',
}

export default function () {
  const res = http.get('https://api.example.com/')
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time is less than 200ms': (r) => r.timings.duration < 200,
  })
  // pause for a second between iterations to simulate a real user
  sleep(1)
}

This will give you an initial indication of how your system performs under normal conditions. You can add this test to your CI/CD pipeline to ensure that your API's performance doesn't degrade over time. This type of testing is called Load testing.

Load testing

For your test to be able to fail, you need to set some thresholds. But it's not as simple as saying that if a single request fails during the Load test, you can't deploy to production: whenever network calls are involved, many factors can cause an individual request to fail. You need to give yourself some flexibility, which is why thresholds are defined as percentages.

For example, you can say that if 99% of the requests succeed and their response time stays under 200ms, then you are good to go and can deploy to production.

// script.js
import http from 'k6/http'
import { sleep, check } from 'k6'

export const options = {
  vus: 10,
  duration: '5m',
  thresholds: {
    http_req_failed: ['rate<0.01'], // http errors should be less than 1%
    http_req_duration: ['p(99)<200'], // 99% of requests should be below 200ms
  },
}

export default function () {
  const res = http.get('https://api.example.com/')
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time is less than 200ms': (r) => r.timings.duration < 200,
  })
  sleep(1)
}
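
Thresholds also make the test easy to wire into your CI/CD pipeline: when a threshold is crossed, k6 finishes with a non-zero exit code, so the pipeline step fails automatically. A minimal step could look like this (assuming k6 is installed on the CI runner):

# fails the build when any threshold in script.js is crossed
k6 run script.js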

But Load testing is not enough. It only tells you how your system behaves under a single, fixed load (in this case, normal conditions).

But in a real application there will be times when you receive more requests than usual, and you want to know how your API behaves when that happens.

Stages

With stages, you can gradually push your API to increasing levels of traffic and see how your system behaves at each one.

For example, you can gradually ramp the number of virtual users up to 100 and hold it there, then ramp up to 200 and stabilize, then to 500 and stabilize, and finally ramp back down to 0.

// script.js
import http from 'k6/http'
import { sleep, check } from 'k6'

export const options = {
  thresholds: {
    http_req_failed: ['rate<0.01'], // http errors should be less than 1%
    http_req_duration: ['p(99)<200'], // 99% of requests should be below 200ms
  },
  // stages replaces the fixed vus/duration pair from the previous examples
  stages: [
    // level 1
    { duration: '1m', target: 100 },
    { duration: '2m', target: 100 },
    // level 2
    { duration: '1m', target: 200 },
    { duration: '2m', target: 200 },
    // level 3
    { duration: '1m', target: 500 },
    { duration: '2m', target: 500 },
    // cool down
    { duration: '1m', target: 0 },
  ],
}

export default function () {
  const res = http.get('https://api.example.com/')
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time is less than 200ms': (r) => r.timings.duration < 200,
  })
  sleep(1)
}

Stress testing

This method of testing your API is called Stress testing.

In this type of test you push your API well beyond its normal level to see what happens. You might want to raise the response time you're willing to accept, or even the percentage of expected errors, because it's only fair to assume that performance degrades when your API is under a lot of stress. What you want to assess is whether performance is still acceptable under those extreme conditions.
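
For example, you might relax the thresholds for a Stress test. The numbers below are purely illustrative; tune them to what's acceptable for your system:

// script.js (stress test with looser thresholds; example values only)
import http from 'k6/http'

export const options = {
  thresholds: {
    http_req_failed: ['rate<0.05'], // tolerate up to 5% errors under stress
    http_req_duration: ['p(99)<500'], // accept slower responses under stress
  },
  stages: [
    { duration: '1m', target: 500 },
    { duration: '2m', target: 500 },
    { duration: '1m', target: 0 },
  ],
}

export default function () {
  http.get('https://api.example.com/')
}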

Spike testing

Let's imagine you're building the API for a concert ticketing platform. When a very popular event goes on sale, you get extremely high volumes of requests over a short period of time, and then almost nothing once the tickets are sold out.

In this case there's no real concept of normal traffic, so Load testing doesn't make much sense, and neither does Stress testing with its slow ramp-up and sustained traffic levels. What you want to test is how your API copes with a sudden surge of requests over a very short period of time.

What you need in that case is a Spike test.

You start with a stage that has a very low number of virtual users, add a stage that drastically increases the number of users over a very short period of time, sustain that high load briefly, and then drop the number of virtual users back down to simulate the end of the spike.

// script.js
import http from 'k6/http'
import { sleep, check } from 'k6'

export const options = {
  thresholds: {
    http_req_failed: ['rate<0.01'], // http errors should be less than 1%
    http_req_duration: ['p(99)<200'], // 99% of requests should be below 200ms
  },
  stages: [
    // warm up
    { duration: '30s', target: 100 },
    // spike
    { duration: '1m', target: 2_000 },
    { duration: '10s', target: 2_000 },
    { duration: '1m', target: 100 },
    // cool down
    { duration: '30s', target: 0 },
  ],
}

export default function () {
  const res = http.get('https://api.example.com/')
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time is less than 200ms': (r) => r.timings.duration < 200,
  })
  sleep(1)
}

The importance of defining performance

As you can see, it's really important to define what performance means for your API in order to implement the right testing strategy.

In some cases you might only need Spike testing; in others you might need Load testing and Stress testing, or even all three.
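
If you do need more than one of them, you can keep each test as a separate script, or combine them in a single script using k6's scenarios option. The sketch below shows one possible setup; the scenario names, numbers, and URL are illustrative:

// combined.js: running a Load test and a Spike test from one script
import http from 'k6/http'

export const options = {
  scenarios: {
    // steady, normal-traffic load
    load: {
      executor: 'constant-vus',
      vus: 10,
      duration: '5m',
    },
    // sudden spike, started once the load scenario is done
    spike: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '30s', target: 2_000 },
        { duration: '1m', target: 0 },
      ],
      startTime: '5m',
    },
  },
}

export default function () {
  http.get('https://api.example.com/')
}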

Soak testing

There is one kind of problem that none of those three tests can spot. Imagine a memory leak that causes the performance of your API to slowly degrade over time. You won't catch it with a test that only runs for a few minutes.

What you need instead is to run your test over several hours and check that your API maintains stable response times and resource usage.

// script.js
import http from 'k6/http'
import { sleep, check } from 'k6'

export const options = {
  thresholds: {
    http_req_failed: ['rate<0.01'], // http errors should be less than 1%
    http_req_duration: ['p(99)<200'], // 99% of requests should be below 200ms
  },
  stages: [
    // warm up
    { duration: '1m', target: 200 },
    // sustained load over a long time
    { duration: '4h', target: 200 },
    // cool down
    { duration: '1m', target: 0 },
  ],
}

export default function () {
  const res = http.get('https://api.example.com/')
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time is less than 200ms': (r) => r.timings.duration < 200,
  })
  sleep(1)
}

This type of testing is called Soak testing. You can check the k6 output at the end of the run for the usual success-rate metrics, and if you also gather server-side metrics such as memory and CPU usage, or record request traces, you get a very good picture of how your API behaves and can identify the weak links in your system that are performance bottlenecks.
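
On the k6 side, you can also export the raw test metrics for analysis alongside those server-side metrics, for example with the built-in JSON output:

k6 run --out json=results.json script.js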

Conclusion

By using k6 you can test your API under different conditions and ensure that it can handle the load when users start to flood in. By defining what performance means for your API, you can write tests that simulate real-world conditions and identify bottlenecks in your system.

So, is your API ready for user traffic?

Further reading