Why Rate Limiting Matters More Than Ever
Every public API eventually meets the same fate: someone, somewhere, fires off 10,000 requests per second and either crashes the database or bankrupts you in cloud bills. Rate limiting is the safety valve that keeps the lights on. Done right, legitimate users never notice it. Done wrong, you turn away paying customers and still get clobbered by scrapers.
This guide walks through battle-tested patterns you can copy-paste today, starting with the simplest one-line fixes and ending with planet-scale distributed counters. No math degrees required.
The Three Questions You Must Answer First
Before touching code, decide:
- What are you protecting? CPU, memory, database connections, third-party bill, or all of the above?
- Who gets how much? Free tier, pro tier, enterprise, internal microservice?
- What happens when the limit is hit? Hard reject, queue, slowdown, or surcharge?
Write the answers in plain English and store them in the repo readme. Future you will thank present you when the 3 A.M. pages start.
Algorithm Cheat Sheet: Pick One in 60 Seconds
Algorithm | Memory | Precision | Burst Friendly | Code Complexity
---|---|---|---|---
Fixed Window | Low | Bad | No | 1/5
Sliding Window Log | High | Perfect | Yes | 4/5
Sliding Window Counter | Medium | Good | Yes | 3/5
Token Bucket | Low | Good | Yes | 2/5
Leaky Bucket | Low | Good | No | 2/5
Rule of thumb: start with token bucket for APIs, fixed window for cron jobs, sliding window counter for user-facing dashboards.
Token Bucket in 20 Lines of Node.js
No external deps, no Redis, works on a $5 VPS.
const buckets = new Map(); // uid -> { tokens, lastRefill }

function isAllowed(uid, capacity = 10, refillRate = 1) {
  const now = Date.now();
  let bucket = buckets.get(uid);
  if (!bucket) {
    bucket = { tokens: capacity, lastRefill: now };
    buckets.set(uid, bucket);
  }
  // Refill fractionally based on elapsed time; flooring to whole seconds and
  // resetting lastRefill would silently starve the bucket under steady sub-second traffic.
  const elapsedSeconds = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSeconds * refillRate);
  bucket.lastRefill = now;
  if (bucket.tokens >= 1) {
    bucket.tokens -= 1;
    return true;
  }
  return false;
}
Wrap it in middleware:
app.use((req, res, next) => {
  const ok = isAllowed(req.ip, 60, 2); // 60 burst, 2/s steady
  if (!ok) return res.status(429).json({ error: 'Too Many Requests' });
  next();
});
The map keeps data in RAM, so a restart wipes state, and the map grows by one entry per unique key. Good enough for side projects; production needs persistence.
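Because nothing ever removes old entries, a periodic sweep is worth adding even for toy deployments. A minimal sketch, reusing the buckets map and refill semantics from the snippet above:

// Every minute, drop buckets that have been idle long enough to be full again;
// a fresh bucket behaves identically, so nothing is lost by deleting them.
setInterval(() => {
  const now = Date.now();
  for (const [uid, bucket] of buckets) {
    const idleSeconds = (now - bucket.lastRefill) / 1000;
    if (idleSeconds > 60) buckets.delete(uid); // tune to capacity / refillRate
  }
}, 60 * 1000);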
Scaling Up With Redis
Once you run on two containers, in-memory maps diverge. Redis gives atomic increments and per-key TTL. Install ioredis, then:
import Redis from 'ioredis';

const redis = new Redis();

async function rateLimit(key, limit, windowSeconds) {
  // INCR is atomic, so concurrent requests never lose an increment.
  const count = await redis.incr(key);
  if (count === 1) {
    // Start the TTL only on the first hit in the window; re-setting it on
    // every request would keep the key alive (and the user blocked) forever.
    await redis.expire(key, windowSeconds);
  }
  return count <= limit;
}
Call it with rateLimit(`api:${userId}`, 1000, 3600) for 1000 requests per hour. INCR is atomic, so concurrent workers never lose counts, even at 50k req/s.
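A thin middleware wrapper mirrors the in-memory version. This sketch keys by user id when available and falls back to IP; req.user.id is an assumption about your auth layer:

app.use(async (req, res, next) => {
  const key = `api:${req.user?.id ?? req.ip}`;
  const ok = await rateLimit(key, 1000, 3600);
  if (!ok) return res.status(429).json({ error: 'Too Many Requests' });
  next();
});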
Sliding Window Log With Redis Lua
Fixed windows reset on the boundary, so an abuser who straddles it gets up to 2x capacity. A sliding window log fixes that by keeping the timestamp of every recent request in a sorted set. Ship it as a Lua script to avoid round-trips and keep the whole check atomic:
local key = KEYS[1]
local window = tonumber(ARGV[1])  -- window length in seconds
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])     -- current time in seconds
local member = ARGV[4]            -- unique per request so same-second hits don't overwrite each other
-- Drop entries that have slid out of the window.
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count >= limit then
  return 0
end
redis.call('ZADD', key, now, member)
redis.call('EXPIRE', key, window)
return 1
Load it once with SCRIPT LOAD and evaluate by SHA with EVALSHA (or let your client cache it for you). You get per-second precision and an exact request count; each check is O(log n) in time, at the cost of one sorted-set entry per request still inside the window.
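With ioredis, defineCommand handles the SCRIPT LOAD/EVALSHA dance for you. A minimal sketch of wiring it up, assuming the Lua source above is held in a string named slidingWindowScript and the fourth argument is the unique member:

import Redis from 'ioredis';
import { randomUUID } from 'node:crypto';

const redis = new Redis();
redis.defineCommand('slidingWindowLimit', {
  numberOfKeys: 1,
  lua: slidingWindowScript, // the Lua source shown above, as a string
});

async function isAllowedSliding(userId, limit = 100, windowSeconds = 60) {
  const now = Math.floor(Date.now() / 1000);
  const member = `${now}-${randomUUID()}`; // unique member per request
  const result = await redis.slidingWindowLimit(
    `sw:${userId}`, windowSeconds, limit, now, member
  );
  return result === 1;
}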
HTTP Headers Developers Expect
Return consistent headers so front-end teams can build retries without reading docs:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1710000000
Retry-After: 58
Include Retry-After in 429 responses; well-behaved clients and retry libraries honor it. Use a Unix timestamp for X-RateLimit-Reset to avoid timezone pain.
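Setting them is one line per header. This sketch assumes the limiter hands back a { limit, remaining, resetUnix } object, which is an illustrative shape rather than any library's API:

function sendRateLimitHeaders(res, { limit, remaining, resetUnix }) {
  res.set('X-RateLimit-Limit', String(limit));
  res.set('X-RateLimit-Remaining', String(Math.max(0, remaining)));
  res.set('X-RateLimit-Reset', String(resetUnix));
  if (remaining <= 0) {
    // Seconds until the window resets, which is what Retry-After expects.
    res.set('Retry-After', String(Math.max(0, resetUnix - Math.floor(Date.now() / 1000))));
  }
}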
Graceful Degradation Strategies
Hard rejects hurt UX. Instead, downgrade:
- Return cached data with a Warning: 299 - "Rate limit exceeded" header
- Enqueue request for later processing and email result
- Serve a lighter payload (fewer fields, compressed images)
- Switch to read-only replica
Log every downgrade; if revenue drops, raise the limit.
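A sketch of the first option, assuming a simple in-process cache keyed by URL; cacheGet is a hypothetical lookup, not a library call:

function degradeOrReject(req, res) {
  const cached = cacheGet(req.originalUrl); // hypothetical: returns stale data or undefined
  if (cached) {
    res.set('Warning', '299 - "Rate limit exceeded, serving cached data"');
    return res.status(200).json(cached);
  }
  res.set('Retry-After', '60');
  return res.status(429).json({ error: 'Too Many Requests' });
}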
Different Limits for Different Actors
One size does not fit all. Common buckets:
- IP address: stops dumb bots, but breaks behind corporate NAT
- API key / JWT sub: fair per-customer quota
- IP + user combo: stops users sharing one pro key with the whole campus
- Endpoint + method: GET /prices can be 10x higher than POST /order
- GraphQL field: deep nesting costs more; limit by complexity score
Store rules in a config file and hot-reload without deploy:
{
  "default": { "window": 60, "max": 100 },
  "POST /upload": { "window": 60, "max": 5 },
  "role:premium": { "multiplier": 5 }
}
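A minimal hot-reload sketch, assuming the rules above live in a limits.json next to the server; the file name and rule shapes are illustrative:

import fs from 'node:fs';

let rules = JSON.parse(fs.readFileSync('limits.json', 'utf8'));
fs.watch('limits.json', () => {
  try {
    rules = JSON.parse(fs.readFileSync('limits.json', 'utf8'));
  } catch {
    // Keep the previous rules if the edited file fails to parse.
  }
});

function resolveLimit(method, path, role) {
  const rule = rules[`${method} ${path}`] ?? rules.default;
  const multiplier = rules[`role:${role}`]?.multiplier ?? 1;
  return { window: rule.window, max: rule.max * multiplier };
}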
Distributed Systems: The 1000 Node Problem
When your API runs in 12 regions, Redis becomes a single point of latency. Options:
- Local token bucket on each edge node plus periodic Redis sync; 95% of traffic stays local.
- Cell-based partitioning: shard users by UID mod 1024, with each cell owning its own Redis shard.
- Eventually consistent counters built on CRDTs; accept roughly 1% over-limit to save 200 ms of cross-region latency.
Cloudflare uses a similar cell design to handle 45 million requests per second.
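A rough sketch of the first option: each node decides from memory and reconciles with a shared Redis counter every few seconds. The key names, the five-second interval, and the per-minute window are all assumptions for illustration:

import Redis from 'ioredis';

const redis = new Redis();
const GLOBAL_LIMIT = 600;        // per user, per minute, across all nodes
const local = new Map();         // uid -> { tokens, pendingUsage }

function allowLocally(uid) {
  const entry = local.get(uid) ?? { tokens: GLOBAL_LIMIT, pendingUsage: 0 };
  local.set(uid, entry);
  if (entry.tokens < 1) return false;
  entry.tokens -= 1;
  entry.pendingUsage += 1;
  return true;
}

// Every 5 seconds, push local usage to Redis and shrink the local allowance
// to whatever is left of the global budget for the current minute.
setInterval(async () => {
  const windowKey = `usage:${Math.floor(Date.now() / 60000)}`;
  for (const [uid, entry] of local) {
    const key = `${windowKey}:${uid}`;
    const globalCount = await redis.incrby(key, entry.pendingUsage);
    await redis.expire(key, 120);
    entry.pendingUsage = 0;
    entry.tokens = Math.max(0, GLOBAL_LIMIT - globalCount);
  }
}, 5000);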
Observability: Alerts That Actually Page
Export three metrics:
- rate_limit_rejected_total: counter with labels {key, route}
- rate_limit_utilization_ratio: gauge (0-1) of current/limit
- rate_limit_duration_seconds: histogram of checker latency
Alert when:
- Rejection rate > 5% for > 5 min (possible DDoS)
- p99 checker latency > 10 ms (Redis dying)
- Top user > 80% of global limit (review quota)
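A sketch of the export side using prom-client, a common Prometheus client for Node. The wrapper shape (a check function resolving to { allowed, used, limit }) is an assumption, not part of any library:

import client from 'prom-client';

const rejected = new client.Counter({
  name: 'rate_limit_rejected_total',
  help: 'Requests rejected by the rate limiter',
  labelNames: ['key', 'route'],
});
const utilization = new client.Gauge({
  name: 'rate_limit_utilization_ratio',
  help: 'Current usage divided by the configured limit',
  labelNames: ['key'],
});
const duration = new client.Histogram({
  name: 'rate_limit_duration_seconds',
  help: 'Latency of the rate-limit check itself',
});

async function checkWithMetrics(key, route, check) {
  const stopTimer = duration.startTimer();
  const result = await check();            // { allowed, used, limit }
  stopTimer();
  utilization.set({ key }, result.used / result.limit);
  if (!result.allowed) rejected.inc({ key, route });
  return result.allowed;
}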
Common Pitfalls Checklist
- Returning 403 instead of 429 misuses HTTP semantics and hurts SEO: crawlers read 403 as a block, not a back-off signal.
- Forgetting to expire Redis keys leaks memory until OOM.
- Counting bytes instead of requests invites gzip bombs.
- Applying limits after expensive auth middleware wastes CPU; run a cheap IP check first (see the sketch after this list).
- Relying solely on IP breaks on mobile networks, where IPv6 addresses rotate rapidly.
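For the ordering pitfall, the usual pattern is two layers: a cheap IP-keyed limiter registered before authentication and a per-user limiter after it. The middleware names here are placeholders:

app.use(ipRateLimit);     // cheap: in-memory token bucket keyed by req.ip
app.use(authenticate);    // expensive: verifies the JWT / hits the session store
app.use(userRateLimit);   // precise: keyed by req.user.id, backed by Redis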
Testing Your Limits
Unit test the core function, but also run integration chaos:
const autocannon = require('autocannon');

autocannon({
  url: 'http://localhost:3000/api',
  connections: 50,
  duration: 10,
  headers: { 'Authorization': 'Bearer eyJ...' }
}, console.log);
Expect 200s until the configured limit is hit, then 429s. Plot the latency; it should stay flat even once requests are being rejected.
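For the unit-test side mentioned above, exhausting a small bucket is enough to pin down the core behaviour. A sketch with node:test, assuming isAllowed() from the in-memory example is exported (the module path is hypothetical):

import test from 'node:test';
import assert from 'node:assert';
import { isAllowed } from './token-bucket.js'; // hypothetical module path

test('bucket rejects once capacity is exhausted', () => {
  const uid = 'user-under-test';
  for (let i = 0; i < 3; i++) {
    assert.equal(isAllowed(uid, 3, 1), true);  // capacity 3, refill 1/s
  }
  assert.equal(isAllowed(uid, 3, 1), false);   // fourth call in the same instant is rejected
});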
Open-Source Libraries You Can Trust
- Express: express-rate-limit, express-slow-down
- Fastify: @fastify/rate-limit (uses Redis or memory)
- NGINX: limit_req_zone (handled in the proxy, before requests ever reach your app)
- Envoy: local_rate_limit + global_rate_limit filters
- Kong: rate-limiting plugin with cluster sync
Read their source; most are under 300 lines and easy to audit.
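For Express, the first library on the list takes about two lines to adopt. The options below follow its long-documented windowMs/max shape; check the current docs, since option names have shifted between major versions:

const rateLimit = require('express-rate-limit');

app.use(rateLimit({
  windowMs: 15 * 60 * 1000,  // 15-minute window
  max: 100,                  // 100 requests per window per client
}));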
Conclusion: Start Simple, Evolve Later
Ship a token bucket in memory today. When the first TechCrunch spike hits, move the counter to Redis. When you expand to three continents, partition by cell. At every step, monitor, log, and talk to your users. Rate limiting is not a one-time config; it is a living contract between you and everyone who depends on your API.