Duikersgids: How I Made Spatial Search 55x Faster
Categories
Tags
About the Author
Rob Schoenaker
Managing Partner at UpstreamAds and Partner at Ludulicious B.V. with over 20 years of experience in software development, specializing in .NET Core, ServiceStack, C# and database design.
The Problem: Spatial Queries Killing User Experience
In 2019, Duikersgids.nl was struggling with a critical performance issue. Users searching for dive sites near their location were waiting 2.5 seconds for results. For a location-based application, this was unacceptable.
The Challenge:
- 50,000+ dive sites with geographic coordinates
- Complex spatial queries using
ST_DWithinandST_Distance - Users expecting instant location-based results
- Server struggling under load
The Numbers:
-- This query was taking 2.5+ seconds
SELECT ds.name, ds.location, ds.difficulty_level
FROM dive_sites ds
WHERE ST_DWithin(ds.location, ST_Point(4.5, 52.0), 10000)
AND ds.status = 'active'
ORDER BY ST_Distance(ds.location, ST_Point(4.5, 52.0))
LIMIT 20;
The Root Cause: Missing Spatial Indexes
The problem was clear from the execution plan:
EXPLAIN ANALYZE SELECT * FROM dive_sites
WHERE ST_DWithin(location, ST_Point(4.5, 52.0), 10000);
-- Result: Seq Scan on dive_sites (cost=0.00..1250.00 rows=5000 width=64)
-- Execution time: 2500.123 ms
What was happening:
- PostgreSQL was doing a full table scan on 50,000+ records
- No spatial indexes existed for geographic queries
- Every spatial operation was calculated from scratch
- No optimization for common query patterns
The Solution: Strategic Spatial Indexing
Step 1: Create GiST Spatial Index
The first breakthrough came with a proper spatial index:
-- The GiST index that changed everything
CREATE INDEX CONCURRENTLY idx_dive_sites_location
ON dive_sites USING GIST (location);
Why This Works:
GIST (location): GiST indexes are specifically optimized for geometric data types- Enables fast spatial operations like
ST_DWithin,ST_Distance, andST_Contains - Uses R-tree structure internally for efficient spatial range queries
CONCURRENTLYallows index creation without blocking writes
Immediate Result: Query time dropped from 2.5 seconds to 800ms (3x improvement)
Step 2: Add Partial Index for Active Sites
Most queries only needed active dive sites, so we created a partial index:
-- Partial index for active sites (huge win!)
CREATE INDEX CONCURRENTLY idx_dive_sites_active
ON dive_sites (created_at)
WHERE status = 'active';
Why This Works:
WHERE status = 'active': Only indexes rows matching the condition- Dramatically reduces index size (only ~40,000 active sites vs 50,000+ total)
- Faster index scans and better cache utilization
- Most queries filter by active status anyway
Result: Query time improved to 400ms (6x improvement from original)
Step 3: Create Covering Index for Common Queries
Our most common query pattern needed location, dive type, and difficulty level:
-- Composite index for our most common query pattern
CREATE INDEX CONCURRENTLY idx_dive_sites_location_type
ON dive_sites USING GIST (location)
INCLUDE (dive_type, difficulty_level);
Why This Works:
INCLUDE (dive_type, difficulty_level): Covering index includes additional columns- Eliminates table lookups - all needed data comes from the index
- GiST index handles spatial operations, included columns provide additional data
- Perfect for queries that need location + metadata
Result: Query time dropped to 200ms (12x improvement from original)
The Game Changer: PostgreSQL Partitioning
The Problem: Time-Based Queries Still Slow
Even with spatial indexes, queries filtering by date were still slow:
-- This query was still scanning everything
SELECT * FROM dive_sites
WHERE ST_DWithin(location, ST_Point(4.5, 52.0), 10000)
AND created_at > '2023-01-01'; -- Still scanned all partitions!
The Solution: Native Partitioning
PostgreSQL 10+ native partitioning solved this:
-- PostgreSQL 10+: Native partitioning
CREATE TABLE dive_sites_partitioned (
id SERIAL,
name VARCHAR(255),
location GEOGRAPHY(POINT, 4326),
created_at TIMESTAMP WITH TIME ZONE
) PARTITION BY RANGE (created_at);
CREATE TABLE dive_sites_2023 PARTITION OF dive_sites_partitioned
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE dive_sites_2024 PARTITION OF dive_sites_partitioned
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
Why This Works:
PARTITION BY RANGE (created_at): Splits table by date ranges- Each partition is a separate physical table
- PostgreSQL automatically determines which partitions to scan
- Queries with date filters only scan relevant partitions
Result: Time-filtered queries dropped to 100ms (25x improvement)
The Final Optimization: Parallel Processing
The Problem: Complex Spatial Joins CPU-Bound
Complex queries finding nearby dive sites were using only one CPU core:
-- This query was using only one CPU core
SELECT ds1.name, ds2.name,
ST_Distance(ds1.location, ds2.location) as distance
FROM dive_sites ds1, dive_sites ds2
WHERE ds1.id < ds2.id
AND ST_DWithin(ds1.location, ds2.location, 1000)
ORDER BY distance;
The Solution: Enhanced Parallel Processing
PostgreSQL 16+ parallel processing configuration:
-- PostgreSQL 16+: Better parallel execution
SET max_parallel_workers_per_gather = 8;
SET parallel_leader_participation = off;
SET parallel_tuple_cost = 0.05; -- Lower cost threshold
-- Same query now uses all CPU cores
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ds1.name, ds2.name,
ST_Distance(ds1.location, ds2.location) as distance
FROM dive_sites ds1, dive_sites ds2
WHERE ds1.id < ds2.id
AND ST_DWithin(ds1.location, ds2.location, 1000)
ORDER BY distance;
Why This Works:
max_parallel_workers_per_gather = 8: Allows up to 8 worker processesparallel_leader_participation = off: Reduces coordination overheadparallel_tuple_cost = 0.05: Makes PostgreSQL more likely to choose parallel plans- Complex spatial joins split across multiple CPU cores
Result: Complex queries improved to 45ms (55x improvement from original)
Performance Results Summary
| Optimization Step | Query Time | Improvement |
|---|---|---|
| Original (No Indexes) | 2,500ms | Baseline |
| GiST Spatial Index | 800ms | 3x faster |
| Partial Index (Active Sites) | 400ms | 6x faster |
| Covering Index | 200ms | 12x faster |
| Native Partitioning | 100ms | 25x faster |
| Parallel Processing | 45ms | 55x faster |
Key Lessons Learned
1. Spatial Data Requires Special Indexes
- Regular B-tree indexes don't work for geographic data
- GiST indexes are essential for spatial operations
- Consider the specific spatial operations you need
2. Partial Indexes Are Powerful
- Only index data you actually query
- Dramatically reduces index size and improves performance
- Perfect for filtered datasets
3. Covering Indexes Eliminate Table Lookups
- Include frequently accessed columns in indexes
- Reduces I/O operations significantly
- Especially valuable for read-heavy workloads
4. Partitioning Solves Time-Based Queries
- Native partitioning provides automatic partition pruning
- Essential for time-series spatial data
- Scales well as data grows
5. Parallel Processing Scales Complex Operations
- Modern PostgreSQL versions excel at parallel execution
- Configure settings for your hardware
- Monitor CPU utilization to verify effectiveness
Implementation Checklist
If you're facing similar spatial query performance issues:
- Analyze your queries: Use
EXPLAIN ANALYZEto identify bottlenecks - Create GiST indexes: For all geographic columns
- Add partial indexes: For commonly filtered data
- Consider covering indexes: For queries needing additional columns
- Implement partitioning: For time-based spatial data
- Configure parallel processing: For complex spatial operations
- Monitor performance: Track query times and resource usage
Summary
Optimizing spatial queries in PostgreSQL requires a multi-layered approach. By combining GiST indexes, partial indexing, covering indexes, native partitioning, and parallel processing, we achieved a 55x performance improvement for Duikersgids.nl.
The key was understanding that spatial data has unique requirements and requires specialized optimization techniques. Generic database optimization approaches won't work for geographic queries.
If this article helped you understand spatial query optimization, we can help you implement these techniques in your own applications. At Ludulicious, we specialize in:
- Spatial Data Solutions: Geographic queries and location-based applications
- Database Performance Optimization: From slow queries to indexing strategies
- Custom Development: Tailored solutions for your specific use case
Ready to optimize your spatial queries?
Contact us for a free consultation, or check out our other optimization guides:
- PostgreSQL Performance Tuning: Strategic Lessons from Production
- Rijmwoordenboek: Solving the 3-Second Phonetic Search Problem
- UpstreamAds: From 1.2s to 35ms Full-Text Search
- PostgreSQL Configuration: The Settings That Matter
This optimization case study is based on real production experience with Duikersgids.nl. All performance numbers are from actual production systems.
PostgreSQL Performance Tuning: Strategic Lessons from Production
Learn PostgreSQL performance optimization strategies from real production workloads. From version 9.6 to 17, discover the techniques that improved our database performance by 10-55x across multiple applications.
Rijmwoordenboek: Solving the 3-Second Phonetic Search Problem
Learn how we optimized PostgreSQL phonetic search for Van Dale Rijmwoordenboek, reducing search times from 3.2 seconds to 85ms using multi-layered indexing strategies and B-tree deduplication.