
Scaling PostgreSQL to ChatGPT Scale: Lessons from OpenAI’s 800 Million User Database
In the world of massive-scale applications, few stories are as compelling as how OpenAI powers ChatGPT and its API platform for over 800 million users. Surprisingly, at the heart of this behemoth is PostgreSQL—not some exotic distributed database, but a single-primary instance handling all the heavy lifting. If you’ve ever wondered whether good old Postgres can scale to internet-level traffic without sharding or fancy rewrites, OpenAI’s approach proves it’s possible with smart optimizations, rigorous discipline, and a focus on read-heavy workloads.
In this blog post, we’ll dive into the strategies OpenAI used to scale PostgreSQL to millions of queries per second (QPS), the challenges they faced, and the best practices that any team building high-traffic apps can apply. Whether you’re running a startup database or architecting for enterprise growth, these insights could save you from premature sharding or migration headaches.
The Architecture: Single Primary, Global Replicas
OpenAI’s setup is deceptively simple: one primary Azure PostgreSQL Flexible Server instance manages all writes, while nearly 50 read replicas distributed across multiple regions handle the bulk of reads. This design ensures low-latency access for users worldwide—think single-digit milliseconds for most queries—without the complexity of sharding.
Why no sharding? Sharding would require re-architecting hundreds of application endpoints, introducing massive operational overhead. Instead, they keep Postgres unsharded for now, migrating only shardable, write-heavy workloads to alternatives like Azure Cosmos DB. The primary runs in high-availability (HA) mode with a hot standby replica ready for quick promotion, minimizing downtime during failures.
This architecture shines for read-dominant apps like ChatGPT, where user data retrieval (e.g., conversation history) far outpaces writes. By offloading reads to replicas, the primary stays focused on what it does best: consistent, ACID-compliant writes.
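To make the split concrete, here is a minimal sketch of application-level read/write routing. The hostnames, pool sizes, and the choice of psycopg2 as the driver are all assumptions for illustration; OpenAI hasn't published client code, so treat this as the pattern, not their implementation.

```python
# Minimal read/write routing sketch. Hostnames, pool sizes, and the choice
# of psycopg2 are illustrative assumptions, not OpenAI's actual stack.
import random

from psycopg2.pool import SimpleConnectionPool

PRIMARY_DSN = "host=pg-primary.example.com dbname=app user=app"    # hypothetical
REPLICA_DSNS = [
    "host=pg-replica-eastus.example.com dbname=app user=app",     # hypothetical
    "host=pg-replica-westeu.example.com dbname=app user=app",     # hypothetical
]

primary_pool = SimpleConnectionPool(1, 10, dsn=PRIMARY_DSN)
replica_pools = [SimpleConnectionPool(1, 10, dsn=dsn) for dsn in REPLICA_DSNS]

def run_write(sql, params=()):
    """All writes go to the single primary."""
    conn = primary_pool.getconn()
    try:
        with conn, conn.cursor() as cur:  # commits on success, rolls back on error
            cur.execute(sql, params)
    finally:
        primary_pool.putconn(conn)

def run_read(sql, params=()):
    """Reads fan out to replicas; in production you'd pick the nearest region."""
    pool = random.choice(replica_pools)
    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    finally:
        conn.rollback()  # don't return a connection that's idle in transaction
        pool.putconn(conn)
```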
Key Optimizations for Massive Scale
Scaling to 800 million users isn’t about throwing hardware at the problem—it’s about ruthless efficiency. Here’s how OpenAI squeezed every ounce of performance from Postgres:
1. Reducing Load on the Primary
- Route Reads to Replicas: Most read queries bypass the primary entirely, getting served from replicas. This keeps the primary’s CPU and I/O free for writes.
- Migrate Write-Heavy Workloads: Anything that can be horizontally partitioned moves to sharded systems. For the rest, application-level tweaks like lazy writes and bug fixes eliminate redundant operations.
- Rate Limiting Everywhere: Applied at the app, proxy, and query levels to stop traffic spikes from overwhelming the system. During data backfills, strict limits deliberately stretch operations over days or weeks rather than risking a crash; a token-bucket sketch follows this list.
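A token bucket is one common way to implement the query-level variant; this sketch paces a backfill job. The rates, names, and helper function are illustrative, not OpenAI's actual limits:

```python
# Minimal token-bucket rate limiter sketch; all names and limits are made up.
import threading
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec       # tokens added per second
        self.capacity = burst          # maximum burst size
        self.tokens = float(burst)
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        """Return True if the caller may proceed, False to shed load."""
        with self.lock:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

# e.g., cap a backfill at 200 statements/sec with a burst of 50 (hypothetical)
backfill_limiter = TokenBucket(rate_per_sec=200, burst=50)

def run_backfill_statement(cur, sql):
    while not backfill_limiter.allow():
        time.sleep(0.01)  # back off instead of hammering the primary
    cur.execute(sql)
```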
2. Query Optimization
- Hunt Down Expensive Queries: Multi-table joins were a common culprit, saturating CPU during peaks. OpenAI moved complex logic to the application layer and scrutinized ORM-generated SQL to avoid inefficiencies.
- Timeouts and Autovacuum: Settings like idle_in_transaction_session_timeout terminate sessions that sit idle inside an open transaction; long-lived transactions would otherwise block autovacuum from reclaiming dead rows and cause table bloat (see the sketch after this list).
- Avoid ORM Pitfalls: A single bad query joining 12 tables caused incidents; now, they enforce reviews to catch these early.
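A minimal sketch of both habits, assuming the pg_stat_statements extension is installed and using made-up connection details and an example timeout value:

```python
# Sketch: find the most expensive statements and cap idle-in-transaction
# sessions. Hostnames and the 30s threshold are illustrative assumptions.
import psycopg2

conn = psycopg2.connect("host=pg-primary.example.com dbname=app user=app")
conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
with conn.cursor() as cur:
    # Top 10 statements by mean execution time (PostgreSQL 13+ column names).
    cur.execute("""
        SELECT calls, round(mean_exec_time::numeric, 2) AS mean_ms, query
        FROM pg_stat_statements
        ORDER BY mean_exec_time DESC
        LIMIT 10
    """)
    for calls, mean_ms, query in cur.fetchall():
        print(f"{calls:>10} calls  {mean_ms:>8} ms  {query[:80]}")

    # Terminate sessions idle inside a transaction so autovacuum can
    # reclaim dead rows; 30s is an example value, not OpenAI's setting.
    cur.execute("ALTER SYSTEM SET idle_in_transaction_session_timeout = '30s'")
    cur.execute("SELECT pg_reload_conf()")
```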
3. Connection Pooling with PgBouncer
Connections are a silent killer at scale. OpenAI deploys PgBouncer in statement or transaction mode to reuse connections, slashing setup time from 50ms to 5ms. They co-locate poolers with clients and replicas regionally, and tune idle timeouts to prevent exhaustion. This handles thousands of concurrent connections without choking the database.
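For reference, a minimal pgbouncer.ini along these lines might look like the following; every value here is illustrative rather than OpenAI's configuration:

```ini
; Minimal pgbouncer.ini sketch; values are examples, not OpenAI's settings.
[databases]
app = host=pg-primary.example.com port=5432 dbname=app

[pgbouncer]
listen_port = 6432
pool_mode = transaction      ; or "statement" for the strictest reuse
max_client_conn = 5000       ; clients the pooler will accept
default_pool_size = 50       ; server connections per user/database pair
server_idle_timeout = 60     ; close idle server connections (seconds)
client_idle_timeout = 120    ; drop idle clients to prevent exhaustion
```

Note that in transaction mode, session-level state (prepared statements, temporary tables, SET values) can't be relied on across transactions; that is the usual price of aggressive connection reuse.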
4. Caching and Workload Isolation
- Smart Caching: A caching layer fronts most reads, with “cache locking” ensuring only one request hits the database during a miss, preventing thundering herds that could spike load (sketched after this list).
- Tiered Traffic: High-priority queries go to dedicated instances, isolating them from low-priority or noisy workloads. This “noisy neighbor” prevention keeps critical paths reliable.
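Here is one way the cache-locking idea can be implemented. The use of Redis, the key names, and the TTLs are all assumptions; the post doesn't specify OpenAI's cache stack, so the pattern, not the tooling, is the point:

```python
# "Cache locking" sketch: on a miss, only one request recomputes from
# Postgres while the others wait for the cache fill. Redis, key names,
# and TTLs are hypothetical choices for illustration.
import time

import redis

r = redis.Redis()

def get_conversation(conv_id: str, fetch_from_db) -> bytes:
    key = f"conv:{conv_id}"
    lock_key = f"lock:{key}"
    for _ in range(50):                      # ~5s worst-case wait
        cached = r.get(key)
        if cached is not None:
            return cached                    # cache hit
        # SET NX acts as the lock: only one request wins and queries the DB.
        if r.set(lock_key, b"1", nx=True, ex=10):
            try:
                value = fetch_from_db(conv_id)   # the single DB hit
                r.set(key, value, ex=300)
                return value
            finally:
                r.delete(lock_key)
        time.sleep(0.1)                      # losers poll the cache instead
    raise TimeoutError("cache fill took too long")
```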
These tweaks deliver millions of QPS with p99 latencies in the low double-digit milliseconds and near-zero replication lag, even across continents.
Challenges and How They Were Overcome
No scaling story is without warts. PostgreSQL’s multiversion concurrency control (MVCC) is a double-edged sword: great for reads, but every UPDATE writes a new row version instead of modifying in place, leading to table bloat, index maintenance overhead, and read amplification under heavy updates.
- Write Spikes: Mitigated by migrating workloads and optimizing apps to minimize writes.
- Replica Overload: Streaming WAL to nearly 50 replicas stresses the primary’s CPU and network. OpenAI is testing cascading replication, where intermediate replicas relay WAL to others, to scale beyond 100 replicas.
- Schema Changes: Limited to non-disruptive operations (e.g., adding columns without table rewrites) under a 5-second lock timeout; a sketch follows this list. No new tables go into Postgres: everything new starts sharded.
- Failovers and Incidents: Only one major outage in a year (during ChatGPT’s image generation launch), thanks to hot standbys and close collaboration with Azure’s team.
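The schema-change rule is easy to encode. In this sketch the table and column names are hypothetical; the 5-second value mirrors the timeout described above:

```python
# Sketch of a "safe" schema change: a short lock_timeout means the ALTER
# gives up quickly instead of queueing behind traffic and blocking it.
# Table/column names are made up; 5s mirrors the timeout described above.
import psycopg2

conn = psycopg2.connect("host=pg-primary.example.com dbname=app user=app")
with conn, conn.cursor() as cur:
    cur.execute("SET lock_timeout = '5s'")
    # Adding a nullable column without a default is metadata-only in
    # modern PostgreSQL: no table rewrite, so it's cheap even on huge tables.
    cur.execute("ALTER TABLE conversations ADD COLUMN archived_at timestamptz")
```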
The key lesson? Discipline matters more than technology. OpenAI enforces rules like no retry storms (short, aggressive retries amplify failures) and defaults every new workload to sharded databases.
Best Practices for Your Own Postgres Scaling Journey
Drawing from OpenAI’s playbook, here are actionable tips for scaling Postgres in large apps:
- Start with Vertical Scaling: Beef up your primary (CPU, RAM, storage) before going horizontal. It’s simpler and often sufficient up to millions of users.
- Embrace Read Replicas Early: Distribute reads globally for low latency, but monitor WAL traffic.
- Optimize Before Migrating: Query tuning, connection pooling, and caching can delay sharding indefinitely for read-heavy apps.
- Rate Limit Aggressively: Protect against spikes at every layer.
- Hybrid Approach: Use Postgres for what it’s great at (relational, ACID ops), and shard/migrate the rest.
- Monitor and Iterate: Track CPU saturation, replication lag, and query performance religiously; a lag-check sketch follows this list. Tools like PgBouncer and Azure’s managed services make this easier.
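For the replication-lag item, two queries cover most of what you need; the hostnames here are placeholders:

```python
# Sketch: two replication-lag numbers worth alerting on.
import psycopg2

# On the primary: how far behind each replica's replay position is, in bytes.
with psycopg2.connect("host=pg-primary.example.com dbname=app user=app") as conn:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT application_name,
                   pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
            FROM pg_stat_replication
        """)
        for name, lag_bytes in cur.fetchall():
            print(f"{name}: {lag_bytes} bytes behind")

# On a replica: how stale the last replayed transaction is, in wall-clock time.
with psycopg2.connect("host=pg-replica-eastus.example.com dbname=app user=app") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT now() - pg_last_xact_replay_timestamp()")
        print("replay delay:", cur.fetchone()[0])
```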
Remember, OpenAI achieved five-nines availability with this setup—proof that Postgres isn’t just for small apps.
Conclusion: Postgres Can Handle the Big Leagues
OpenAI’s scaling of PostgreSQL for 800 million ChatGPT users shatters the myth that you need distributed databases for massive scale. By focusing on optimizations, isolation, and selective migration, they’ve built a reliable backbone for one of the world’s most popular AI platforms. If your app is growing fast, take a page from their book: optimize first, shard last.
