Scaling FastAPI from 180 to 1300 Requests/sec: What Actually Worked

This content originally appeared on DEV Community and was authored by Winson GR

Most FastAPI performance issues aren't caused by the framework - they're caused by architecture, blocking I/O, and database query patterns.

I refactored a FastAPI backend that was stuck at ~180 requests/sec with p95 latency over 4 seconds. After a series of changes, it handled ~1300 requests/sec at under 200ms p95 - on the same hardware.

No vertical scaling. No extra cloud spend. Just removing bottlenecks.

The Starting Point

The system had grown fast. Speed was prioritized over structure - until it wasn’t.

By the time performance became a problem, the backend had 14+ microservices.

In practice:

  • Auth logic duplicated across 6 services
  • Each service maintained its own DB connection pool
  • A single request triggered 4–5 internal API hops
  • Middleware inconsistently applied

The latency wasn’t coming from slow code. It was coming from the architecture.

Fix 1: Kill the Service Fragmentation

14+ repos → 4 domain-focused services:

Before                   After
auth, token, session     identity-service
report, export, pdf      jobs-service
user, profile, prefs     user-service
scattered endpoints      core-api

Before:

Client → core-api → auth → user → report → export

After:

Client → core-api → identity / user / jobs

Result: Internal hops dropped ~4 → ~1

→ ~35% latency reduction

Fix 2: The Stack Wasn't Actually Async

@app.get("/users/{user_id}")
async def get_user(user_id: int):
    result = db.execute(...)  # synchronous driver call: blocks the event loop
    return result

Async endpoint ≠ async execution.

Fix:

  • asyncpg instead of psycopg2
  • httpx.AsyncClient instead of requests

async with httpx.AsyncClient() as client:
    result = await client.get(...)

Result: ~3x worker concurrency
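The difference is easy to demonstrate without FastAPI at all. In this minimal standard-library sketch, two coroutines that `await asyncio.sleep` overlap, while a synchronous `time.sleep` inside an `async def` forces them to run back to back:

```python
# Sketch: why a blocking call inside "async def" still serializes work.
import asyncio
import time

async def blocking_task():
    time.sleep(0.2)  # synchronous sleep: blocks the whole event loop

async def async_task():
    await asyncio.sleep(0.2)  # yields control back to the loop

def timed_gather(task_factory):
    # Run two copies of the task concurrently and measure wall time.
    async def main():
        start = time.monotonic()
        await asyncio.gather(task_factory(), task_factory())
        return time.monotonic() - start
    return asyncio.run(main())

blocking_elapsed = timed_gather(blocking_task)  # ~0.4s: sleeps run back to back
overlap_elapsed = timed_gather(async_task)      # ~0.2s: sleeps overlap
```

This is exactly what a sync database driver does to every other request sharing the worker's event loop.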

Fix 3: Remove Heavy Work from Requests

Problem:

  • Emails
  • PDFs
  • Webhooks

All inside request lifecycle.

Fix:

send_email.delay(order_id)
generate_invoice.delay(order_id)

Rule:
If the user doesn’t need it before the 200 OK → move it out of the request path.

Result:

800ms → 80ms endpoints
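Celery's `.delay()` hands the job to a separate worker process via a broker. As an in-process sketch of the same "respond first, work later" shape (the queue and `worker` here are stand-ins, not Celery internals):

```python
# Sketch: the handler enqueues the job and returns immediately;
# a worker drains the queue afterwards, like a Celery worker would.
import queue
import threading

jobs = queue.Queue()
done = []

def worker():
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut the worker down
            break
        done.append(f"emailed order {job}")
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

def handle_order(order_id):
    jobs.put(order_id)       # analogous to send_email.delay(order_id)
    return {"status": "ok"}  # 200 OK goes out before the email is sent

resp = handle_order(42)
jobs.join()      # wait for the background work (only for the demo)
jobs.put(None)
t.join()
```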

Fix 4: Fix the Database

N+1 Queries

# Before
for user_id in user_ids:
    await db.fetchrow(...)

# After
await db.fetch("SELECT ... WHERE id = ANY($1)", user_ids)
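The same before/after can be run locally with SQLite standing in for Postgres (SQLite has no `ANY($1)`, so the batched query uses `IN` with placeholders instead):

```python
# Sketch: N+1 loop vs. one batched query, on an in-memory SQLite table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "ada"), (2, "bob"), (3, "cy")])

user_ids = [1, 3]

# Before: one round trip per id (N+1)
before = [conn.execute("SELECT name FROM users WHERE id = ?", (i,)).fetchone()[0]
          for i in user_ids]

# After: a single batched round trip
placeholders = ",".join("?" * len(user_ids))
after = [row[0] for row in conn.execute(
    f"SELECT name FROM users WHERE id IN ({placeholders}) ORDER BY id",
    user_ids)]
```

Both return the same rows; the win is round trips, not result shape.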

Missing Index

CREATE INDEX idx_events_user_created
ON events(user_id, created_at DESC);

Overfetching

Pulled only required columns.

Result:

  • Query time ↓ 60–70%
  • DB handled ~4x load

Fix 5: Cache What Doesn't Change

cached = await redis.get(key)
if cached:
    return cached

value = await load_value(key)  # cache miss: load_value is a stand-in for the DB read
await redis.setex(key, 300, value)
return value

Result:
~90% reduction in DB hits
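A runnable sketch of the same cache-aside pattern, with a dict plus expiry timestamps standing in for Redis `GET`/`SETEX` (`load_user` is a hypothetical DB read):

```python
# Sketch: cache-aside with TTL. Second fetch for the same key
# never touches the backing store.
import time

cache = {}  # key -> (value, expires_at)
calls = 0   # counts backing-store reads

def load_user(key):
    global calls
    calls += 1
    return f"user:{key}"

def fetch(key, ttl=300):
    entry = cache.get(key)
    if entry and entry[1] > time.monotonic():
        return entry[0]                               # cache hit
    value = load_user(key)                            # cache miss
    cache[key] = (value, time.monotonic() + ttl)      # like redis.setex(key, 300, value)
    return value

a = fetch("7")
b = fetch("7")  # served from cache; load_user is not called again
```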

Fix 6: Runtime Tuning (Last)

  • uvloop
  • httptools
  • worker tuning

Impact: ~10–15%
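As a hedged example, these toggles map directly to uvicorn CLI flags; the worker count below is just a starting point for the 4 vCPU box described later, not a rule:

```shell
# uvloop event loop + httptools HTTP parser, one worker per vCPU to start
uvicorn app:app --loop uvloop --http httptools --workers 4
```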

Architecture fixes gave ~85% of gains.

Final Numbers

(4 vCPU / 8GB, k6 load test)

Metric        Before     After
RPS           ~180       ~1300
p95 latency   ~4200ms    ~180ms
DB queries    14         2
Services      14+        4

Production traffic:
~900–1400 req/sec depending on load

What Breaks Next

At ~1500 RPS:

  • DB connection pool saturation
  • Celery backlog
  • Redis CPU spikes

Next steps:

  • read replicas
  • queue sharding
  • rate limiting

What Actually Matters

Order matters:

  1. Architecture
  2. Async correctness
  3. Background work
  4. Database
  5. Caching
  6. Runtime tuning

Most scaling problems aren’t framework problems.

They’re architecture and DB problems.

Before You Go

If this helped, share it with one engineer hitting the same bottleneck.

🔗 LinkedIn: https://www.linkedin.com/in/winsongr/

🐦 X: https://x.com/winsongr

💻 GitHub: https://github.com/winsongr

