Scaling Database Performance: From MVP to Production
Your database is often the bottleneck as your application grows. What works for an MVP with 100 users won't work for 100,000 users. Here's how to scale your database performance from prototype to production.
Start with the Right Foundation
Choose the Right Database
PostgreSQL: Excellent for complex queries, ACID compliance, and relational data. Great default choice.
MongoDB: Good for flexible schemas, document storage, and rapid iteration. Less ideal for complex joins.
MySQL: Widely supported, good for read-heavy workloads. Less feature-rich than PostgreSQL.
Redis: In-memory cache and key-value store. Perfect for sessions, caching, and real-time data.
Design Your Schema Well
Normalize, but not too much:
- Normalize to reduce redundancy
- Denormalize for read performance when needed
Use appropriate data types:
- Don't use VARCHAR(255) for everything
- Use integers for IDs, not strings
- Use timestamps, not strings for dates
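For example, a minimal sketch of these guidelines applied to a hypothetical orders table:
SQL
-- Compact, sortable types that the database can validate
CREATE TABLE orders (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- integer ID, not a string
  status VARCHAR(20) NOT NULL,                        -- sized for the data, not a reflexive VARCHAR(255)
  total_cents INTEGER NOT NULL,                       -- whole cents avoid floating-point money errors
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()       -- a real timestamp, not a string
);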
Index strategically:
- Index foreign keys
- Index frequently queried columns
- Don't over-index (slows writes)
Indexing Strategies
Primary and Foreign Keys
Always index primary keys and foreign keys:
SQL
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255) UNIQUE NOT NULL
);
CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  title VARCHAR(255)
);
-- PostgreSQL does not index foreign key columns automatically
CREATE INDEX idx_posts_user_id ON posts(user_id);
Composite Indexes
Create composite indexes for queries that filter on multiple columns:
SQL
-- Query: SELECT * FROM posts WHERE user_id = ? AND status = 'published'
CREATE INDEX idx_user_status ON posts(user_id, status);
Order matters: an index can only serve queries that filter on a leftmost prefix of its columns, so lead with the column every query filters on; among those, put the more selective column first.
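With the index above, the leftmost-prefix rule determines which queries can use it:
SQL
-- Can use idx_user_status (filters start at the leading column)
SELECT * FROM posts WHERE user_id = 42;
SELECT * FROM posts WHERE user_id = 42 AND status = 'published';
-- Cannot use it efficiently (skips the leading column)
SELECT * FROM posts WHERE status = 'published';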
Partial Indexes
Index only a subset of rows:
SQL
-- Only index active users
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
Query Optimization
Avoid N+1 Queries
Bad:
JAVASCRIPT
const users = await db.query('SELECT * FROM users');
for (const user of users) {
  // One extra query per user: the classic N+1 pattern
  const posts = await db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
}
Good:
JAVASCRIPT
// Two queries total, no matter how many users there are
const users = await db.query('SELECT * FROM users');
const userIds = users.map(u => u.id);
const posts = await db.query('SELECT * FROM posts WHERE user_id IN (?)', [userIds]);
Or use JOINs:
SQL
SELECT u.*, p.*
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.id IN (?)
Use EXPLAIN
Always use EXPLAIN to understand query execution:
SQL
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
Look for:
- Sequential scans on large tables (usually bad; fine on tiny ones)
- Index scans (good)
- High execution times
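For orientation, the two plan shapes look roughly like this in PostgreSQL (costs and timings below are illustrative, not real measurements):
SQL
-- Without an index on email: sequential scan over the whole table
--   Seq Scan on users  (cost=0.00..1693.00 rows=1 width=72) (actual time=12.31..12.35 rows=1 loops=1)
-- With the unique index: direct lookup
--   Index Scan using users_email_key on users  (cost=0.28..8.29 rows=1 width=72) (actual time=0.04..0.05 rows=1 loops=1)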
Limit Results
Always use LIMIT:
SQL
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20;
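For deep pages, note that OFFSET still scans and discards every skipped row. Keyset (cursor) pagination stays fast by filtering on the last value seen instead:
SQL
-- Next page: pass the oldest created_at from the current page
SELECT * FROM posts
WHERE created_at < ?
ORDER BY created_at DESC
LIMIT 20;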
Select Only What You Need
Don't use SELECT *:
SQL
-- Bad
SELECT * FROM users;
-- Good
SELECT id, name, email FROM users;
Connection Pooling
Use connection pooling to reuse database connections:
JAVASCRIPT
// Using pg (PostgreSQL)
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // Maximum connections in the pool
  idleTimeoutMillis: 30000, // Close idle connections after 30s
  connectionTimeoutMillis: 2000, // Fail fast if no connection is free
});
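With pg, pool.query transparently borrows a connection, runs the statement, and returns the connection to the pool:
JAVASCRIPT
// No manual connect/release needed for one-off queries
const { rows } = await pool.query('SELECT id, name, email FROM users WHERE id = $1', [userId]);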
Caching Strategies
Application-Level Caching
Cache frequently accessed data:
JAVASCRIPT
// Using Redis
const cacheKey = `user:${userId}`;
let user = await redis.get(cacheKey);
if (user) {
  user = JSON.parse(user); // Cache hit: value was stored as JSON
} else {
  user = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
  await redis.setex(cacheKey, 3600, JSON.stringify(user)); // Cache for 1 hour
}
Query Result Caching
Cache expensive query results:
JAVASCRIPT
const cacheKey = 'popular_posts';
let posts = await redis.get(cacheKey);
if (posts) {
  posts = JSON.parse(posts); // Cache hit
} else {
  posts = await db.query('SELECT * FROM posts WHERE views > 1000 ORDER BY views DESC LIMIT 10');
  await redis.setex(cacheKey, 300, JSON.stringify(posts)); // Cache for 5 minutes
}
Cache Invalidation
Invalidate cache when data changes:
JAVASCRIPT
async function updateUser(userId, data) {
  await db.query('UPDATE users SET ... WHERE id = ?', [userId]);
  await redis.del(`user:${userId}`); // Invalidate the stale cached copy
}
Read Replicas
Use read replicas to distribute read load. Remember that replicas lag slightly behind the primary, so route reads that must see a just-completed write to the primary:
JAVASCRIPT
// Write to primary
await primaryDb.query('INSERT INTO users ...');
// Read from replica
const users = await replicaDb.query('SELECT * FROM users');
Partitioning
Partition large tables by date or range:
SQL
-- Partition by month
CREATE TABLE events (
  id SERIAL,
  created_at TIMESTAMP,
  data JSONB
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2024_01 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
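The payoff is partition pruning: a query that filters on the partition key scans only the matching partitions, which you can confirm with EXPLAIN:
SQL
-- The plan touches only events_2024_01, not every partition
EXPLAIN SELECT * FROM events
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';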
Monitoring and Profiling
Slow Query Log
Enable slow query logging:
SQL
-- PostgreSQL: log queries that run longer than 1 second (value in ms)
ALTER SYSTEM SET log_min_duration_statement = 1000;
SELECT pg_reload_conf(); -- Apply without a restart
Database Metrics
Monitor:
- Query execution time
- Connection pool usage
- Cache hit rates
- Disk I/O
- CPU usage
Use APM Tools
Tools like New Relic, Datadog, or Sentry can help identify slow queries.
Scaling Strategies by Stage
MVP (0-1K users)
- Single database instance
- Basic indexing
- Simple caching (if needed)
Growth (1K-10K users)
- Connection pooling
- Query optimization
- Application-level caching
- Database monitoring
Scale (10K-100K users)
- Read replicas
- Advanced indexing
- Query result caching
- Database tuning
Enterprise (100K+ users)
- Database sharding (see the sketch after this list)
- Partitioning
- CDN for static data
- Dedicated database team
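Sharding deserves its own article, but the core mechanic is routing each query to one of several databases by a stable hash of a key. A minimal sketch using pg (the two-shard layout and SHARD_*_URL variables are hypothetical):
JAVASCRIPT
const crypto = require('crypto');
const { Pool } = require('pg');

// Hypothetical shard pools; real deployments load these from config
const shards = [
  new Pool({ connectionString: process.env.SHARD_0_URL }),
  new Pool({ connectionString: process.env.SHARD_1_URL }),
];

// Stable hash so a given user always maps to the same shard
function shardFor(userId) {
  const digest = crypto.createHash('md5').update(String(userId)).digest();
  return shards[digest.readUInt32BE(0) % shards.length];
}

const user = await shardFor(userId).query('SELECT * FROM users WHERE id = $1', [userId]);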
Common Mistakes
1. Not using indexes: Causes full table scans
2. Over-fetching data: SELECT * everywhere
3. N+1 queries: Loading related data in loops
4. No connection pooling: Creating new connections per request
5. Ignoring slow queries: Not monitoring performance
6. Premature optimization: Optimizing before measuring
7. No caching strategy: Hitting database for everything
Tools and Resources
- pgAdmin: PostgreSQL administration
- Redis Insight: Redis management
- EXPLAIN ANALYZE: Query analysis
- pg_stat_statements: PostgreSQL query statistics (example query below)
- New Relic / Datadog: Application performance monitoring
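For example, once the pg_stat_statements extension is enabled, one query surfaces your most expensive statements (mean_exec_time is the PostgreSQL 13+ column name; older versions call it mean_time):
SQL
-- Top 10 statements by average execution time
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;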
Conclusion
Database performance is critical for application scalability. Start with good schema design and indexing, optimize queries, implement caching, and scale horizontally when needed.
Remember: Measure first, optimize second. Use monitoring tools to identify actual bottlenecks before optimizing.
---
*Struggling with database performance? Contact us to discuss optimization strategies for your application.*