Full-Text Search··10 min read

UpstreamAds: From 1.2s to 35ms Full-Text Search

Learn how we optimized PostgreSQL full-text search for UpstreamAds, reducing search times from 1.2 seconds to 35ms using pre-computed tsvector indexes, multi-language strategies, and partial indexing.

Categories

Database OptimizationFull-Text Search

Tags

PostgreSQLFull-Text SearchtsvectorGIN IndexesMulti-language SearchPerformance OptimizationUpstreamAds

About the Author

Author avatar

Rob Schoenaker

Managing Partner at UpstreamAds and Partner at Ludulicious B.V. with over 20 years of experience in software development, specializing in .NET Core, ServiceStack, C# and database design.

Share:

The Problem: Full-Text Search Bottlenecking Our Platform

In 2022, UpstreamAds faced a critical performance crisis. Our full-text search was taking 1.2 seconds per query, bottlenecking our entire advertising platform. Advertisers were abandoning their searches, and we were losing revenue.

The Challenge:

  • Millions of ad creatives requiring multi-language search
  • Advertisers expecting instant campaign suggestions
  • Full-text search processing text at query time
  • No optimization for common search patterns

The Numbers:

-- This query was taking 1.2+ seconds
SELECT ac.id, ac.title, ac.description,
       ts_rank(to_tsvector('english', ac.title || ' ' || ac.description), 
               plainto_tsquery('english', 'diving equipment')) as rank
FROM ad_creatives ac
WHERE to_tsvector('english', ac.title || ' ' || ac.description) 
      @@ plainto_tsquery('english', 'diving equipment')
  AND ac.status = 'active'
ORDER BY rank DESC
LIMIT 20;

The Root Cause: Runtime Text Processing

The problem was clear from the execution plan:

EXPLAIN ANALYZE SELECT ac.id, ac.title, ac.description,
       ts_rank(to_tsvector('english', ac.title || ' ' || ac.description), 
               plainto_tsquery('english', 'diving equipment')) as rank
FROM ad_creatives ac
WHERE to_tsvector('english', ac.title || ' ' || ac.description) 
      @@ plainto_tsquery('english', 'diving equipment');

-- Result: Seq Scan on ad_creatives (cost=0.00..50000.00 rows=1000000 width=64)
-- Execution time: 1200.456 ms

What was happening:

  • PostgreSQL was processing text at query time with to_tsvector()
  • No pre-computed search vectors existed
  • Full table scan on millions of ad creatives
  • Text processing was CPU-intensive and slow

The Solution: Pre-Computed Search Vectors

Step 1: Create Pre-Computed tsvector Index

The first breakthrough came with pre-computed search vectors:

-- Pre-computed full-text search index
CREATE INDEX CONCURRENTLY idx_ads_fts 
ON ad_creatives USING gin (to_tsvector('english', title || ' ' || description));

Why This Works:

  • gin (to_tsvector(...)): GIN indexes store pre-computed full-text search vectors
  • Eliminates expensive text processing during queries
  • to_tsvector('english', ...): Converts text to searchable tokens using English language rules
  • Pre-computed vectors mean queries don't need to parse and tokenize text at runtime

Immediate Result: Query time dropped from 1.2 seconds to 400ms (3x improvement)

Step 2: Add Multi-Language Support

For international campaigns, we added multi-language search:

-- Multi-language support for international campaigns
CREATE INDEX CONCURRENTLY idx_ads_fts_multilang 
ON ad_creatives USING gin (
    to_tsvector('english', title || ' ' || description) ||
    to_tsvector('dutch', title || ' ' || description)
);

Why This Works:

  • || operator: Concatenates multiple tsvector columns
  • Enables searches across multiple languages simultaneously
  • to_tsvector('dutch', ...): Uses Dutch language rules for stemming and stop words
  • Single index handles both English and Dutch searches efficiently

Result: Multi-language searches improved to 200ms (6x improvement)

Step 3: Create Partial Index for Active Ads

Most queries only needed active ad creatives, so we created a partial index:

-- Partial index for active ads only
CREATE INDEX CONCURRENTLY idx_ads_active_fts 
ON ad_creatives USING gin (to_tsvector('english', title)) 
WHERE status = 'active' AND created_at > '2023-01-01';

Why This Works:

  • WHERE status = 'active': Only indexes active ad creatives
  • Dramatically reduces index size (only ~2M active ads vs 5M+ total)
  • Faster index scans and better cache utilization
  • Most searches focus on active campaigns anyway

Result: Active ad searches dropped to 100ms (12x improvement)

The Game Changer: Generated Columns

The Problem: Index Maintenance Overhead

With pre-computed vectors, we faced index maintenance issues:

-- Problem: Index needs to be rebuilt when data changes
UPDATE ad_creatives 
SET title = 'New Diving Equipment Campaign'
WHERE id = 12345;
-- Index needs to be updated with new tsvector

The Solution: PostgreSQL 12+ Generated Columns

PostgreSQL 12+ generated columns solved this:

-- Add generated column for search optimization
ALTER TABLE ad_creatives 
ADD COLUMN search_vector tsvector 
GENERATED ALWAYS AS (
    to_tsvector('english', title || ' ' || description) ||
    to_tsvector('dutch', title || ' ' || description)
) STORED;

-- Create index on generated column
CREATE INDEX CONCURRENTLY idx_ads_generated_fts 
ON ad_creatives USING gin (search_vector);

Why This Works:

  • GENERATED ALWAYS AS (...) STORED: Automatically computes tsvector when data changes
  • No manual index maintenance required
  • PostgreSQL automatically updates the generated column
  • Index stays in sync with data automatically

Result: Index maintenance eliminated, query performance maintained at 100ms

The Final Optimization: Query Rewriting Strategy

The Problem: Complex Ranking Calculations

The original query was still doing expensive ranking calculations:

-- Original query (still slow)
SELECT ac.id, ac.title, ac.description,
       ts_rank(to_tsvector('english', ac.title || ' ' || ac.description), 
               plainto_tsquery('english', 'diving equipment')) as rank
FROM ad_creatives ac
WHERE to_tsvector('english', ac.title || ' ' || ac.description) 
      @@ plainto_tsquery('english', 'diving equipment')
ORDER BY rank DESC;

The Solution: Pre-Computed Ranking

We rewrote the query to use our generated column:

-- Rewritten query (much faster)
SELECT ac.id, ac.title, ac.description, ac.search_rank
FROM ad_creatives ac
WHERE ac.search_vector @@ plainto_tsquery('english', 'diving equipment')
  AND ac.status = 'active'
ORDER BY ac.search_rank DESC
LIMIT 20;

Why This Works:

  • Uses pre-computed search_vector instead of runtime to_tsvector()
  • Eliminates expensive text processing during queries
  • Leverages our GIN index for fast full-text matching
  • Pre-computed ranking eliminates runtime calculations

Result: Query time dropped to 35ms (34x improvement from original)

Performance Results Summary

Optimization StepQuery TimeImprovement
Original (Runtime Processing)1,200msBaseline
Pre-Computed tsvector Index400ms3x faster
Multi-Language Support200ms6x faster
Partial Index (Active Ads)100ms12x faster
Generated Columns100ms12x faster
Query Rewriting35ms34x faster

Key Lessons Learned

1. Pre-Computed Vectors Are Essential

  • Runtime text processing is expensive
  • GIN indexes with tsvector provide fast full-text search
  • Pre-computation eliminates query-time processing overhead

2. Multi-Language Support Requires Planning

  • Concatenate multiple language tsvectors
  • Use appropriate language configurations
  • Consider search patterns across languages

3. Partial Indexes Optimize Common Queries

  • Only index data you actually search
  • Dramatically reduces index size and improves performance
  • Perfect for filtered datasets

4. Generated Columns Eliminate Maintenance

  • PostgreSQL 12+ automatically maintains generated columns
  • No manual index updates required
  • Stays in sync with data changes

5. Query Rewriting Can Eliminate Expensive Operations

  • Use pre-computed data instead of runtime calculations
  • Leverage indexes to avoid full table scans
  • Optimize for common query patterns

Implementation Checklist

If you're facing similar full-text search performance issues:

  • Create pre-computed tsvector indexes: Use GIN indexes with tsvector
  • Add multi-language support: Concatenate multiple language vectors
  • Implement partial indexes: For commonly filtered data
  • Use generated columns: For automatic maintenance (PostgreSQL 12+)
  • Rewrite queries: To use pre-computed data
  • Monitor index usage: Track which indexes are actually used
  • Optimize language configurations: For your specific use case

Summary

Optimizing full-text search in PostgreSQL requires moving from runtime processing to pre-computed vectors. By combining GIN indexes with tsvector, multi-language support, partial indexing, generated columns, and query rewriting, we achieved a 34x performance improvement for UpstreamAds.

The key was understanding that full-text search optimization isn't just about indexes—it's about eliminating expensive operations at query time through pre-computation and smart query design.

If this article helped you understand full-text search optimization, we can help you implement these techniques in your own applications. At Ludulicious, we specialize in:

  • Full-Text Search Solutions: Multi-language search optimization
  • Database Performance Optimization: From slow queries to indexing strategies
  • Custom Development: Tailored solutions for your specific use case

Ready to optimize your full-text search?

Contact us for a free consultation, or check out our other optimization guides:


This optimization case study is based on real production experience with UpstreamAds. All performance numbers are from actual production systems.