UpstreamAds: From 1.2s to 35ms Full-Text Search
Categories
Tags
About the Author
Rob Schoenaker
Managing Partner at UpstreamAds and Partner at Ludulicious B.V. with over 20 years of experience in software development, specializing in .NET Core, ServiceStack, C# and database design.
The Problem: Full-Text Search Bottlenecking Our Platform
In 2022, UpstreamAds faced a critical performance crisis. Our full-text search was taking 1.2 seconds per query, bottlenecking our entire advertising platform. Advertisers were abandoning their searches, and we were losing revenue.
The Challenge:
- Millions of ad creatives requiring multi-language search
- Advertisers expecting instant campaign suggestions
- Full-text search processing text at query time
- No optimization for common search patterns
The Numbers:
-- This query was taking 1.2+ seconds
SELECT ac.id, ac.title, ac.description,
ts_rank(to_tsvector('english', ac.title || ' ' || ac.description),
plainto_tsquery('english', 'diving equipment')) as rank
FROM ad_creatives ac
WHERE to_tsvector('english', ac.title || ' ' || ac.description)
@@ plainto_tsquery('english', 'diving equipment')
AND ac.status = 'active'
ORDER BY rank DESC
LIMIT 20;
The Root Cause: Runtime Text Processing
The problem was clear from the execution plan:
EXPLAIN ANALYZE SELECT ac.id, ac.title, ac.description,
ts_rank(to_tsvector('english', ac.title || ' ' || ac.description),
plainto_tsquery('english', 'diving equipment')) as rank
FROM ad_creatives ac
WHERE to_tsvector('english', ac.title || ' ' || ac.description)
@@ plainto_tsquery('english', 'diving equipment');
-- Result: Seq Scan on ad_creatives (cost=0.00..50000.00 rows=1000000 width=64)
-- Execution time: 1200.456 ms
What was happening:
- PostgreSQL was processing text at query time with
to_tsvector() - No pre-computed search vectors existed
- Full table scan on millions of ad creatives
- Text processing was CPU-intensive and slow
The Solution: Pre-Computed Search Vectors
Step 1: Create Pre-Computed tsvector Index
The first breakthrough came with pre-computed search vectors:
-- Pre-computed full-text search index
CREATE INDEX CONCURRENTLY idx_ads_fts
ON ad_creatives USING gin (to_tsvector('english', title || ' ' || description));
Why This Works:
gin (to_tsvector(...)): GIN indexes store pre-computed full-text search vectors- Eliminates expensive text processing during queries
to_tsvector('english', ...): Converts text to searchable tokens using English language rules- Pre-computed vectors mean queries don't need to parse and tokenize text at runtime
Immediate Result: Query time dropped from 1.2 seconds to 400ms (3x improvement)
Step 2: Add Multi-Language Support
For international campaigns, we added multi-language search:
-- Multi-language support for international campaigns
CREATE INDEX CONCURRENTLY idx_ads_fts_multilang
ON ad_creatives USING gin (
to_tsvector('english', title || ' ' || description) ||
to_tsvector('dutch', title || ' ' || description)
);
Why This Works:
||operator: Concatenates multiple tsvector columns- Enables searches across multiple languages simultaneously
to_tsvector('dutch', ...): Uses Dutch language rules for stemming and stop words- Single index handles both English and Dutch searches efficiently
Result: Multi-language searches improved to 200ms (6x improvement)
Step 3: Create Partial Index for Active Ads
Most queries only needed active ad creatives, so we created a partial index:
-- Partial index for active ads only
CREATE INDEX CONCURRENTLY idx_ads_active_fts
ON ad_creatives USING gin (to_tsvector('english', title))
WHERE status = 'active' AND created_at > '2023-01-01';
Why This Works:
WHERE status = 'active': Only indexes active ad creatives- Dramatically reduces index size (only ~2M active ads vs 5M+ total)
- Faster index scans and better cache utilization
- Most searches focus on active campaigns anyway
Result: Active ad searches dropped to 100ms (12x improvement)
The Game Changer: Generated Columns
The Problem: Index Maintenance Overhead
With pre-computed vectors, we faced index maintenance issues:
-- Problem: Index needs to be rebuilt when data changes
UPDATE ad_creatives
SET title = 'New Diving Equipment Campaign'
WHERE id = 12345;
-- Index needs to be updated with new tsvector
The Solution: PostgreSQL 12+ Generated Columns
PostgreSQL 12+ generated columns solved this:
-- Add generated column for search optimization
ALTER TABLE ad_creatives
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
to_tsvector('english', title || ' ' || description) ||
to_tsvector('dutch', title || ' ' || description)
) STORED;
-- Create index on generated column
CREATE INDEX CONCURRENTLY idx_ads_generated_fts
ON ad_creatives USING gin (search_vector);
Why This Works:
GENERATED ALWAYS AS (...)STORED: Automatically computes tsvector when data changes- No manual index maintenance required
- PostgreSQL automatically updates the generated column
- Index stays in sync with data automatically
Result: Index maintenance eliminated, query performance maintained at 100ms
The Final Optimization: Query Rewriting Strategy
The Problem: Complex Ranking Calculations
The original query was still doing expensive ranking calculations:
-- Original query (still slow)
SELECT ac.id, ac.title, ac.description,
ts_rank(to_tsvector('english', ac.title || ' ' || ac.description),
plainto_tsquery('english', 'diving equipment')) as rank
FROM ad_creatives ac
WHERE to_tsvector('english', ac.title || ' ' || ac.description)
@@ plainto_tsquery('english', 'diving equipment')
ORDER BY rank DESC;
The Solution: Pre-Computed Ranking
We rewrote the query to use our generated column:
-- Rewritten query (much faster)
SELECT ac.id, ac.title, ac.description, ac.search_rank
FROM ad_creatives ac
WHERE ac.search_vector @@ plainto_tsquery('english', 'diving equipment')
AND ac.status = 'active'
ORDER BY ac.search_rank DESC
LIMIT 20;
Why This Works:
- Uses pre-computed
search_vectorinstead of runtimeto_tsvector() - Eliminates expensive text processing during queries
- Leverages our GIN index for fast full-text matching
- Pre-computed ranking eliminates runtime calculations
Result: Query time dropped to 35ms (34x improvement from original)
Performance Results Summary
| Optimization Step | Query Time | Improvement |
|---|---|---|
| Original (Runtime Processing) | 1,200ms | Baseline |
| Pre-Computed tsvector Index | 400ms | 3x faster |
| Multi-Language Support | 200ms | 6x faster |
| Partial Index (Active Ads) | 100ms | 12x faster |
| Generated Columns | 100ms | 12x faster |
| Query Rewriting | 35ms | 34x faster |
Key Lessons Learned
1. Pre-Computed Vectors Are Essential
- Runtime text processing is expensive
- GIN indexes with tsvector provide fast full-text search
- Pre-computation eliminates query-time processing overhead
2. Multi-Language Support Requires Planning
- Concatenate multiple language tsvectors
- Use appropriate language configurations
- Consider search patterns across languages
3. Partial Indexes Optimize Common Queries
- Only index data you actually search
- Dramatically reduces index size and improves performance
- Perfect for filtered datasets
4. Generated Columns Eliminate Maintenance
- PostgreSQL 12+ automatically maintains generated columns
- No manual index updates required
- Stays in sync with data changes
5. Query Rewriting Can Eliminate Expensive Operations
- Use pre-computed data instead of runtime calculations
- Leverage indexes to avoid full table scans
- Optimize for common query patterns
Implementation Checklist
If you're facing similar full-text search performance issues:
- Create pre-computed tsvector indexes: Use GIN indexes with tsvector
- Add multi-language support: Concatenate multiple language vectors
- Implement partial indexes: For commonly filtered data
- Use generated columns: For automatic maintenance (PostgreSQL 12+)
- Rewrite queries: To use pre-computed data
- Monitor index usage: Track which indexes are actually used
- Optimize language configurations: For your specific use case
Summary
Optimizing full-text search in PostgreSQL requires moving from runtime processing to pre-computed vectors. By combining GIN indexes with tsvector, multi-language support, partial indexing, generated columns, and query rewriting, we achieved a 34x performance improvement for UpstreamAds.
The key was understanding that full-text search optimization isn't just about indexes—it's about eliminating expensive operations at query time through pre-computation and smart query design.
If this article helped you understand full-text search optimization, we can help you implement these techniques in your own applications. At Ludulicious, we specialize in:
- Full-Text Search Solutions: Multi-language search optimization
- Database Performance Optimization: From slow queries to indexing strategies
- Custom Development: Tailored solutions for your specific use case
Ready to optimize your full-text search?
Contact us for a free consultation, or check out our other optimization guides:
- PostgreSQL Performance Tuning: Strategic Lessons from Production
- Duikersgids: How I Made Spatial Search 55x Faster
- Rijmwoordenboek: Solving the 3-Second Phonetic Search Problem
- Rijmwoordenboek: Serving Pages Under 15ms with Better Caching
- UpstreamAds: Fixing Write Performance with WAL Optimization
- PostgreSQL Configuration: The Settings That Matter
This optimization case study is based on real production experience with UpstreamAds. All performance numbers are from actual production systems.
Rijmwoordenboek: Serving Pages Under 15ms with Better Caching
Learn how we optimized Rijmwoordenboek page load times from 100ms+ to under 15ms using application-level caching, database query optimization, and response time strategies.
UpstreamAds: Fixing Write Performance with WAL Optimization
Learn how we optimized PostgreSQL write performance for UpstreamAds, improving ad creative save times from 500ms to 100ms using WAL configuration, hardware optimization, and connection pooling strategies.