Full-Text Search·Jan 17, 2025·10 min read

UpstreamAds: From 1.2s to 35ms Full-Text Search

Learn how we optimized PostgreSQL full-text search for UpstreamAds, reducing search times from 1.2 seconds to 35ms using pre-computed tsvector indexes, multi-language strategies, and partial indexing.

About the Author

Rob Schoenaker

Managing Partner at UpstreamAds and Partner at Ludulicious B.V. with over 20 years of experience in software development, specializing in .NET Core, ServiceStack, C# and database design.

View Profile

The Problem: Full-Text Search Bottlenecking Our Platform

In 2022, UpstreamAds faced a critical performance crisis. Our full-text search was taking 1.2 seconds per query, bottlenecking our entire advertising platform. Advertisers were abandoning their searches, and we were losing revenue.

The Challenge:

Millions of ad creatives requiring multi-language search
Advertisers expecting instant campaign suggestions
Full-text search processing text at query time
No optimization for common search patterns

The Numbers:

-- This query was taking 1.2+ seconds
SELECT ac.id, ac.title, ac.description,
       ts_rank(to_tsvector('english', ac.title || ' ' || ac.description), 
               plainto_tsquery('english', 'diving equipment')) as rank
FROM ad_creatives ac
WHERE to_tsvector('english', ac.title || ' ' || ac.description) 
      @@ plainto_tsquery('english', 'diving equipment')
  AND ac.status = 'active'
ORDER BY rank DESC
LIMIT 20;

UpstreamAds full-text search performance

The Root Cause: Runtime Text Processing

The problem was clear from the execution plan:

EXPLAIN ANALYZE SELECT ac.id, ac.title, ac.description,
       ts_rank(to_tsvector('english', ac.title || ' ' || ac.description), 
               plainto_tsquery('english', 'diving equipment')) as rank
FROM ad_creatives ac
WHERE to_tsvector('english', ac.title || ' ' || ac.description) 
      @@ plainto_tsquery('english', 'diving equipment');

-- Result: Seq Scan on ad_creatives (cost=0.00..50000.00 rows=1000000 width=64)
-- Execution time: 1200.456 ms

What was happening:

PostgreSQL was processing text at query time with to_tsvector()
No pre-computed search vectors existed
Full table scan on millions of ad creatives
Text processing was CPU-intensive and slow

The Solution: Pre-Computed Search Vectors

Step 1: Create Pre-Computed tsvector Index

The first breakthrough came with pre-computed search vectors:

-- Pre-computed full-text search index
CREATE INDEX CONCURRENTLY idx_ads_fts 
ON ad_creatives USING gin (to_tsvector('english', title || ' ' || description));

Why This Works:

gin (to_tsvector(...)): GIN indexes store pre-computed full-text search vectors
Eliminates expensive text processing during queries
to_tsvector('english', ...): Converts text to searchable tokens using English language rules
Pre-computed vectors mean queries don't need to parse and tokenize text at runtime

Immediate Result: Query time dropped from 1.2 seconds to 400ms (3x improvement)

Step 2: Add Multi-Language Support

For international campaigns, we added multi-language search:

-- Multi-language support for international campaigns
CREATE INDEX CONCURRENTLY idx_ads_fts_multilang 
ON ad_creatives USING gin (
    to_tsvector('english', title || ' ' || description) ||
    to_tsvector('dutch', title || ' ' || description)
);

Why This Works:

|| operator: Concatenates multiple tsvector columns
Enables searches across multiple languages simultaneously
to_tsvector('dutch', ...): Uses Dutch language rules for stemming and stop words
Single index handles both English and Dutch searches efficiently

Result: Multi-language searches improved to 200ms (6x improvement)

Step 3: Create Partial Index for Active Ads

Most queries only needed active ad creatives, so we created a partial index:

-- Partial index for active ads only
CREATE INDEX CONCURRENTLY idx_ads_active_fts 
ON ad_creatives USING gin (to_tsvector('english', title)) 
WHERE status = 'active' AND created_at > '2023-01-01';

Why This Works:

WHERE status = 'active': Only indexes active ad creatives
Dramatically reduces index size (only ~2M active ads vs 5M+ total)
Faster index scans and better cache utilization
Most searches focus on active campaigns anyway

Result: Active ad searches dropped to 100ms (12x improvement)

The Game Changer: Generated Columns

The Problem: Index Maintenance Overhead

With pre-computed vectors, we faced index maintenance issues:

-- Problem: Index needs to be rebuilt when data changes
UPDATE ad_creatives 
SET title = 'New Diving Equipment Campaign'
WHERE id = 12345;
-- Index needs to be updated with new tsvector

The Solution: PostgreSQL 12+ Generated Columns

PostgreSQL 12+ generated columns solved this:

-- Add generated column for search optimization
ALTER TABLE ad_creatives 
ADD COLUMN search_vector tsvector 
GENERATED ALWAYS AS (
    to_tsvector('english', title || ' ' || description) ||
    to_tsvector('dutch', title || ' ' || description)
) STORED;

-- Create index on generated column
CREATE INDEX CONCURRENTLY idx_ads_generated_fts 
ON ad_creatives USING gin (search_vector);

Why This Works:

GENERATED ALWAYS AS (...) STORED: Automatically computes tsvector when data changes
No manual index maintenance required
PostgreSQL automatically updates the generated column
Index stays in sync with data automatically

Result: Index maintenance eliminated, query performance maintained at 100ms

The Final Optimization: Query Rewriting Strategy

The Problem: Complex Ranking Calculations

The original query was still doing expensive ranking calculations:

-- Original query (still slow)
SELECT ac.id, ac.title, ac.description,
       ts_rank(to_tsvector('english', ac.title || ' ' || ac.description), 
               plainto_tsquery('english', 'diving equipment')) as rank
FROM ad_creatives ac
WHERE to_tsvector('english', ac.title || ' ' || ac.description) 
      @@ plainto_tsquery('english', 'diving equipment')
ORDER BY rank DESC;

The Solution: Pre-Computed Ranking

We rewrote the query to use our generated column:

-- Rewritten query (much faster)
SELECT ac.id, ac.title, ac.description, ac.search_rank
FROM ad_creatives ac
WHERE ac.search_vector @@ plainto_tsquery('english', 'diving equipment')
  AND ac.status = 'active'
ORDER BY ac.search_rank DESC
LIMIT 20;

Why This Works:

Uses pre-computed search_vector instead of runtime to_tsvector()
Eliminates expensive text processing during queries
Leverages our GIN index for fast full-text matching
Pre-computed ranking eliminates runtime calculations

Result: Query time dropped to 35ms (34x improvement from original)

Performance Results Summary

Optimization Step	Query Time	Improvement
Original (Runtime Processing)	1,200ms	Baseline
Pre-Computed tsvector Index	400ms	3x faster
Multi-Language Support	200ms	6x faster
Partial Index (Active Ads)	100ms	12x faster
Generated Columns	100ms	12x faster
Query Rewriting	35ms	34x faster

Key Lessons Learned

1. Pre-Computed Vectors Are Essential

Runtime text processing is expensive
GIN indexes with tsvector provide fast full-text search
Pre-computation eliminates query-time processing overhead

2. Multi-Language Support Requires Planning

Concatenate multiple language tsvectors
Use appropriate language configurations
Consider search patterns across languages

3. Partial Indexes Optimize Common Queries

Only index data you actually search
Dramatically reduces index size and improves performance
Perfect for filtered datasets

4. Generated Columns Eliminate Maintenance

PostgreSQL 12+ automatically maintains generated columns
No manual index updates required
Stays in sync with data changes

5. Query Rewriting Can Eliminate Expensive Operations

Use pre-computed data instead of runtime calculations
Leverage indexes to avoid full table scans
Optimize for common query patterns

Implementation Checklist

If you're facing similar full-text search performance issues:

Create pre-computed tsvector indexes: Use GIN indexes with tsvector
Add multi-language support: Concatenate multiple language vectors
Implement partial indexes: For commonly filtered data
Use generated columns: For automatic maintenance (PostgreSQL 12+)
Rewrite queries: To use pre-computed data
Monitor index usage: Track which indexes are actually used
Optimize language configurations: For your specific use case

Summary

Optimizing full-text search in PostgreSQL requires moving from runtime processing to pre-computed vectors. By combining GIN indexes with tsvector, multi-language support, partial indexing, generated columns, and query rewriting, we achieved a 34x performance improvement for UpstreamAds.

The key was understanding that full-text search optimization isn't just about indexes—it's about eliminating expensive operations at query time through pre-computation and smart query design.

If this article helped you understand full-text search optimization, we can help you implement these techniques in your own applications. At Ludulicious, we specialize in:

Full-Text Search Solutions: Multi-language search optimization
Database Performance Optimization: From slow queries to indexing strategies
Custom Development: Tailored solutions for your specific use case

Ready to optimize your full-text search?

This optimization case study is based on real production experience with UpstreamAds. All performance numbers are from actual production systems.

Rijmwoordenboek: Serving Pages Under 15ms with Better Caching

Learn how we optimized Rijmwoordenboek page load times from 100ms+ to under 15ms using application-level caching, database query optimization, and response time strategies.

UpstreamAds: Fixing Write Performance with WAL Optimization

Learn how we optimized PostgreSQL write performance for UpstreamAds, improving ad creative save times from 500ms to 100ms using WAL configuration, hardware optimization, and connection pooling strategies.

UpstreamAds: From 1.2s to 35ms Full-Text Search

Categories

Tags

About the Author

Rob Schoenaker

The Problem: Full-Text Search Bottlenecking Our Platform

The Root Cause: Runtime Text Processing

The Solution: Pre-Computed Search Vectors

Step 1: Create Pre-Computed tsvector Index

Step 2: Add Multi-Language Support

Step 3: Create Partial Index for Active Ads

The Game Changer: Generated Columns

The Problem: Index Maintenance Overhead

The Solution: PostgreSQL 12+ Generated Columns

The Final Optimization: Query Rewriting Strategy

The Problem: Complex Ranking Calculations

The Solution: Pre-Computed Ranking

Performance Results Summary

Key Lessons Learned

1. Pre-Computed Vectors Are Essential

2. Multi-Language Support Requires Planning

3. Partial Indexes Optimize Common Queries

4. Generated Columns Eliminate Maintenance

5. Query Rewriting Can Eliminate Expensive Operations

Implementation Checklist

Summary

Ludulicious