Stress Tests

Locust (locust.io) is a load-testing tool written in Python that integrates smoothly with a FastAPI project: it drives plain HTTP requests against your app, so no special setup is required.

Configure the class for the test (in this case, only 3 routes are tested):

from locust import HttpUser, task, between
import random

class FastAPIUser(HttpUser):
    # Waits between 1 and 5 seconds between task executions for a user
    wait_time = between(1, 5)
    # host = "[http://127.0.0.1:8000](http://127.0.0.1:8000)" # Replace with your app's URL (can also be set via CLI)

    @task
    def get_root(self):
        # Example: GET request to the root
        self.client.get("/")

    @task(3) # This task is executed 3 times more often than others
    def get_items_by_id(self):
        # Example: GET request to a common endpoint
        item_id = random.randint(1, 100)
        # Use 'name' to group similar URLs in the report
        self.client.get(f"/items/{item_id}", name="/items/[id]")

    @task
    def create_item(self):
        # Example: POST request to create a resource
        new_item = {"name": f"LocustItem_{random.randint(1000, 9999)}", "price": random.uniform(10.0, 500.0)}
        self.client.post("/items/", json=new_item, name="/items/create")  # Assuming a POST /items/ endpoint

    def on_start(self):
        # Code executed when a simulated user starts (e.g., login)
        # print("Simulated user starting...")
        # Example login:
        # response = self.client.post("/login", json={"username":"test_user", "password":"password"})
        # if response.ok:
        #    self.token = response.json().get("access_token")
        #    if self.token:
        #       self.client.headers = {'Authorization': f'Bearer {self.token}'}
        # else:
        #    print(f"Login failed: {response.status_code}")
        print("Simulated user starting...")


    def on_stop(self):
        # Code executed when a simulated user stops (e.g., logout)
        print("Simulated user stopping.")

Then run the locustfile you just created:

locust -f your_locust_file.py --host=http://127.0.0.1:8000  # Replace your_locust_file.py with your file name

(The target host can be set either with the host attribute in the user class or with the --host command-line option, as shown above.)

At this point, open the Locust web UI at:

http://localhost:8089

where you can start the test and monitor the results in real time.
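
Alternatively, for scripted or CI runs the web UI can be skipped. A minimal sketch of a headless run, assuming the same locustfile and target; the user count, spawn rate, and duration are example values to tune for your scenario:

locust -f your_locust_file.py --headless \
       --users 100 --spawn-rate 10 --run-time 10m \
       --host http://127.0.0.1:8000 \
       --csv results  # writes results_*.csv summary files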

  1. Define Objectives: what do you want to measure?
    • Performance: what is the acceptable average/p95/p99 response time for endpoint X under Y concurrent users? What is the maximum throughput (RPS - Requests Per Second) the app can sustain?
    • Stress: at what number of users/RPS does the application start returning errors > 1%? At what load does latency become unacceptable? How do memory/CPU behave under extreme stress? Does the app recover after stress?
  2. Identify Key Scenarios: don’t just test the root (/). Test the most used endpoints, those critical for the business, and those you suspect might be slow (e.g., complex queries, I/O-intensive operations). Simulate realistic user flows if possible (e.g., login -> read data -> modify data); a sketch of such a flow follows this list.
  3. Prepare the Test Environment: CRUCIAL!
    • Isolated: never run load/stress tests on the production environment.
    • Similar to Production: hardware (CPU, RAM), network configuration, database version, amount of data in the DB, and FastAPI/Uvicorn/Gunicorn configuration (number of workers!) should mirror the production environment as closely as possible. Otherwise, the results will not be significant.
  4. Establish a Baseline: measure performance with low or no load to have a reference point.
  5. Run Tests (Iteratively):
    • Performance/Load: start with a low load and gradually increase it (e.g., 10 users, 50, 100, 200…). Let each load level run for a sufficient time (e.g., 5-15 minutes) to stabilize measurements and to surface memory leaks or degradation over time (a step-load sketch that automates this ramp follows the list).
    • Stress: increase the load aggressively until the application shows clear signs of distress (errors, very high latency) or crashes. Observe the breaking point and behavior.
  6. Monitor Key Metrics: during execution, monitor both client-side (from the test tool) and server-side.
    • Client-Side (Test Tool):
      • Throughput (RPS): successfully completed requests per second.
      • Latency/Response Time: average, median (p50), p90, p95, p99. High percentiles (p95/p99) show the experience of your slowest users.
      • Error Rate (%): percentage of failed requests (HTTP 5xx, 4xx, timeouts).
    • Server-Side (FastAPI App, Database, OS):
      • CPU Usage: one of the first bottlenecks.
      • Memory Usage: watch for constant growth (possible leaks).
      • Network I/O: bandwidth saturation.
      • Disk I/O: relevant if the app or DB makes heavy disk access.
      • Database Metrics: slow queries, active connections, pool usage.
      • FastAPI/Uvicorn-Specific Metrics: request queues, event loop utilization (if monitorable). Collect these server-side metrics with monitoring tools (Prometheus + Grafana, Datadog, New Relic) or OS commands (top, htop, vmstat).
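
For the realistic user flows mentioned in point 2 (login -> read data -> modify data), Locust's SequentialTaskSet runs its tasks in declaration order. A minimal sketch, assuming hypothetical /login and /profile endpoints (adapt them to your own routes and auth scheme):

from locust import HttpUser, SequentialTaskSet, task, between


class UserFlow(SequentialTaskSet):
    # Tasks run in declaration order: login -> read -> modify, then the flow repeats

    @task
    def login(self):
        # Hypothetical auth endpoint; adapt to your own scheme
        response = self.client.post("/login", json={"username": "test_user", "password": "password"})
        if response.ok:
            token = response.json().get("access_token")
            if token:
                self.client.headers["Authorization"] = f"Bearer {token}"

    @task
    def read_data(self):
        # Hypothetical read endpoint
        self.client.get("/profile", name="/profile")

    @task
    def modify_data(self):
        # Hypothetical write endpoint
        self.client.put("/profile", json={"bio": "updated by locust"}, name="/profile [update]")


class FlowUser(HttpUser):
    wait_time = between(1, 5)
    tasks = [UserFlow]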

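For the gradual ramp in point 5, the load steps can be scripted instead of adjusted by hand in the web UI. A minimal sketch using Locust's LoadTestShape, with arbitrary example stages; placed in the same locustfile as the user class, it is picked up automatically:

from locust import LoadTestShape


class StepLoadShape(LoadTestShape):
    # Each stage is held until `duration` seconds have elapsed since the test started
    stages = [
        {"duration": 300, "users": 10, "spawn_rate": 10},
        {"duration": 600, "users": 50, "spawn_rate": 10},
        {"duration": 900, "users": 100, "spawn_rate": 10},
        {"duration": 1200, "users": 200, "spawn_rate": 10},
    ]

    def tick(self):
        run_time = self.get_run_time()
        for stage in self.stages:
            if run_time < stage["duration"]:
                return stage["users"], stage["spawn_rate"]
        return None  # returning None stops the test after the last stage
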
Analysis and Optimization:

  1. Identify Bottlenecks: where does the application get stuck?
    • CPU-bound? (CPU at 100%) -> Optimize Python code, increase workers (cautiously), scale horizontally.
    • Memory-bound? (Memory exhausted, OOM Killer) -> Optimize memory usage, look for leaks, increase RAM.
    • I/O-bound? (Low CPU but long waits) -> Optimize DB queries, use async I/O correctly, improve disk/network performance, use caching.
    • Blocks in code? (Blocking synchronous calls in async code) -> Ensure all I/O is awaited; offload heavy synchronous code with run_in_executor (see the sketch after this list).
    • Uvicorn/Gunicorn configuration? -> The number of workers (--workers) is fundamental. A common starting point is (2 * NUM_CPU_CORES) + 1, but it must be validated by testing (example commands after this list).
    • DB connection pool exhausted? -> Increase the pool size, shorten how long connections are held, or reuse connections more efficiently.
  2. Optimize: modify code, queries, configuration, infrastructure.
  3. Retest: verify if the changes have improved performance and moved (or eliminated) the bottleneck.
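
As mentioned under "Blocks in code", heavy synchronous work should be kept off the event loop. A minimal sketch using run_in_executor, where blocking_report is a hypothetical stand-in for your own slow synchronous call:

import asyncio
import time

from fastapi import FastAPI

app = FastAPI()


def blocking_report(item_id: int) -> dict:
    # Hypothetical slow synchronous call (legacy ORM, pandas, requests, ...)
    time.sleep(2)
    return {"item_id": item_id, "status": "done"}


@app.get("/reports/{item_id}")
async def get_report(item_id: int):
    loop = asyncio.get_running_loop()
    # Offload the blocking call to the default thread pool so the event loop stays responsive
    return await loop.run_in_executor(None, blocking_report, item_id)

Alternatively, declaring the path operation with a plain def (instead of async def) lets FastAPI run it in its own thread pool automatically.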
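
For the worker configuration mentioned above, these are typical ways to run multiple processes; main:app is a hypothetical module path, and 5 workers corresponds to (2 * NUM_CPU_CORES) + 1 on a 2-core machine:

# Uvicorn with multiple worker processes
uvicorn main:app --workers 5

# Or Gunicorn as the process manager with Uvicorn workers
gunicorn main:app --workers 5 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000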