Stress Tests

Locust (locust.io) is a load-testing tool written in Python that integrates smoothly with a FastAPI project: it drives plain HTTP requests against your app, so no special setup is required.

Configure the class for the test (in this case, only 3 routes are tested):

from locust import HttpUser, task, between
import random

class FastAPIUser(HttpUser):
    # Waits between 1 and 5 seconds between task executions for a user
    wait_time = between(1, 5)
    # host = "[http://127.0.0.1:8000](http://127.0.0.1:8000)" # Replace with your app's URL (can also be set via CLI)

    @task
    def get_root(self):
        # Example: GET request to the root
        self.client.get("/")

    @task(3) # This task is executed 3 times more often than others
    def get_items_by_id(self):
        # Example: GET request to a common endpoint
        item_id = random.randint(1, 100)
        # Use 'name' to group similar URLs in the report
        self.client.get(f"/items/{item_id}", name="/items/[id]")

    @task
    def create_item(self):
        # Example: POST request to create a resource
        new_item = {"name": f"LocustItem_{random.randint(1000, 9999)}", "price": random.uniform(10.0, 500.0)}
        self.client.post("/items/", json=new_item, name="/items/create")  # Assuming a POST /items/ endpoint

    def on_start(self):
        # Code executed when a simulated user starts (e.g., login)
        # print("Simulated user starting...")
        # Example login:
        # response = self.client.post("/login", json={"username":"test_user", "password":"password"})
        # if response.ok:
        #    self.token = response.json().get("access_token")
        #    if self.token:
        #       self.client.headers = {'Authorization': f'Bearer {self.token}'}
        # else:
        #    print(f"Login failed: {response.status_code}")
        print("Simulated user starting...")


    def on_stop(self):
        # Code executed when a simulated user stops (e.g., logout)
        print("Simulated user stopping.")

Then run the locustfile you just created:

locust -f your_locust_file.py --host=http://127.0.0.1:8000  # Replace your_locust_file.py with your file name

(The target host can be set either with the host attribute in the user class or with the --host command-line option, as shown above.)

At this point, open the Locust web UI at:

http://localhost:8089

where you can start the test and monitor the results in real time.
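
Alternatively, for scripted or CI runs the web UI can be skipped. A minimal sketch of a headless run, assuming the same locustfile and target; the user count, spawn rate, and duration are example values to tune for your scenario:

locust -f your_locust_file.py --headless \
       --users 100 --spawn-rate 10 --run-time 10m \
       --host http://127.0.0.1:8000 \
       --csv results  # writes results_*.csv summary files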

  1. Define Objectives: what do you want to measure?
    • Performance: what is the acceptable average/p95/p99 response time for endpoint X under Y concurrent users? What is the maximum throughput (RPS - Requests Per Second) the app can sustain?
    • Stress: at what number of users/RPS does the application start returning errors > 1%? At what load does latency become unacceptable? How do memory/CPU behave under extreme stress? Does the app recover after stress?
  2. Identify Key Scenarios: don’t just test the root (/). Test the most used endpoints, those critical for the business, and those you suspect might be slow (e.g., complex queries, I/O-intensive operations). Simulate realistic user flows if possible (e.g., login -> read data -> modify data); a sketch of such a flow follows this list.
  3. Prepare the Test Environment: CRUCIAL!
    • Isolated: never run load/stress tests on the production environment.
    • Similar to Production: hardware (CPU, RAM), network configuration, database version, amount of data in the DB, and FastAPI/Uvicorn/Gunicorn configuration (number of workers!) should mirror the production environment as closely as possible. Otherwise, the results will not be significant.
  4. Establish a Baseline: measure performance with low or no load to have a reference point.
  5. Run Tests (Iteratively):
    • Performance/Load: start with a low load and gradually increase it (e.g., 10 users, 50, 100, 200…). Let each load level run for a sufficient time (e.g., 5-15 minutes) to stabilize measurements and to surface memory leaks or degradation over time (a step-load sketch that automates this ramp follows the list).
    • Stress: increase the load aggressively until the application shows clear signs of distress (errors, very high latency) or crashes. Observe the breaking point and behavior.
  6. Monitor Key Metrics: during execution, monitor both client-side (from the test tool) and server-side.
    • Client-Side (Test Tool):
      • Throughput (RPS): successfully completed requests per second.
      • Latency/Response Time: average, median (p50), p90, p95, p99. High percentiles (p95/p99) show the experience of your slowest users.
      • Error Rate (%): percentage of failed requests (HTTP 5xx, 4xx, timeouts).
    • Server-Side (FastAPI App, Database, OS):
      • CPU Usage: one of the first bottlenecks.
      • Memory Usage: watch for constant growth (possible leaks).
      • Network I/O: bandwidth saturation.
      • Disk I/O: relevant if the app or DB makes heavy disk access.
      • Database Metrics: slow queries, active connections, pool usage.
      • FastAPI/Uvicorn-Specific Metrics: request queues, event loop utilization (if monitorable). Collect these server-side metrics with monitoring tools (Prometheus + Grafana, Datadog, New Relic) or OS commands (top, htop, vmstat).
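
For the realistic user flows mentioned in point 2 (login -> read data -> modify data), Locust's SequentialTaskSet runs its tasks in declaration order. A minimal sketch, assuming hypothetical /login and /profile endpoints (adapt them to your own routes and auth scheme):

from locust import HttpUser, SequentialTaskSet, task, between


class UserFlow(SequentialTaskSet):
    # Tasks run in declaration order: login -> read -> modify, then the flow repeats

    @task
    def login(self):
        # Hypothetical auth endpoint; adapt to your own scheme
        response = self.client.post("/login", json={"username": "test_user", "password": "password"})
        if response.ok:
            token = response.json().get("access_token")
            if token:
                self.client.headers["Authorization"] = f"Bearer {token}"

    @task
    def read_data(self):
        # Hypothetical read endpoint
        self.client.get("/profile", name="/profile")

    @task
    def modify_data(self):
        # Hypothetical write endpoint
        self.client.put("/profile", json={"bio": "updated by locust"}, name="/profile [update]")


class FlowUser(HttpUser):
    wait_time = between(1, 5)
    tasks = [UserFlow]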

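For the gradual ramp in point 5, the load steps can be scripted instead of adjusted by hand in the web UI. A minimal sketch using Locust's LoadTestShape, with arbitrary example stages; placed in the same locustfile as the user class, it is picked up automatically:

from locust import LoadTestShape


class StepLoadShape(LoadTestShape):
    # Each stage is held until `duration` seconds have elapsed since the test started
    stages = [
        {"duration": 300, "users": 10, "spawn_rate": 10},
        {"duration": 600, "users": 50, "spawn_rate": 10},
        {"duration": 900, "users": 100, "spawn_rate": 10},
        {"duration": 1200, "users": 200, "spawn_rate": 10},
    ]

    def tick(self):
        run_time = self.get_run_time()
        for stage in self.stages:
            if run_time < stage["duration"]:
                return stage["users"], stage["spawn_rate"]
        return None  # returning None stops the test after the last stage
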
Analysis and Optimization:

  1. Identify Bottlenecks: where does the application get stuck?
    • CPU-bound? (CPU at 100%) -> Optimize Python code, increase workers (cautiously), scale horizontally.
    • Memory-bound? (Memory exhausted, OOM Killer) -> Optimize memory usage, look for leaks, increase RAM.
    • I/O-bound? (Low CPU but long waits) -> Optimize DB queries, use async I/O correctly, improve disk/network performance, use caching.
    • Blocks in code? (Blocking synchronous calls in async code) -> Ensure all I/O is awaited; offload heavy synchronous code with run_in_executor (see the sketch after this list).
    • Uvicorn/Gunicorn configuration? -> The number of workers (--workers) is fundamental. A common starting point is (2 * NUM_CPU_CORES) + 1, but it must be validated by testing (example commands after this list).
    • DB connection pool exhausted? -> Increase the pool size, shorten how long connections are held, or reuse connections more efficiently.
  2. Optimize: modify code, queries, configuration, infrastructure.
  3. Retest: verify if the changes have improved performance and moved (or eliminated) the bottleneck.
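
As mentioned under "Blocks in code", heavy synchronous work should be kept off the event loop. A minimal sketch using run_in_executor, where blocking_report is a hypothetical stand-in for your own slow synchronous call:

import asyncio
import time

from fastapi import FastAPI

app = FastAPI()


def blocking_report(item_id: int) -> dict:
    # Hypothetical slow synchronous call (legacy ORM, pandas, requests, ...)
    time.sleep(2)
    return {"item_id": item_id, "status": "done"}


@app.get("/reports/{item_id}")
async def get_report(item_id: int):
    loop = asyncio.get_running_loop()
    # Offload the blocking call to the default thread pool so the event loop stays responsive
    return await loop.run_in_executor(None, blocking_report, item_id)

Alternatively, declaring the path operation with a plain def (instead of async def) lets FastAPI run it in its own thread pool automatically.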
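
For the worker configuration mentioned above, these are typical ways to run multiple processes; main:app is a hypothetical module path, and 5 workers corresponds to (2 * NUM_CPU_CORES) + 1 on a 2-core machine:

# Uvicorn with multiple worker processes
uvicorn main:app --workers 5

# Or Gunicorn as the process manager with Uvicorn workers
gunicorn main:app --workers 5 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000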