Spatial Data·Jan 17, 2025·10 min read

Duikersgids: How I Made Spatial Search 55x Faster

Learn how we optimized PostgreSQL spatial queries for Duikersgids.nl, reducing search times from 2.5 seconds to 45ms using GiST indexes, partitioning, and parallel processing techniques.

About the Author

Rob Schoenaker

Managing Partner at UpstreamAds and Partner at Ludulicious B.V. with over 20 years of experience in software development, specializing in .NET Core, ServiceStack, C# and database design.

View Profile

The Problem: Spatial Queries Killing User Experience

In 2019, Duikersgids.nl was struggling with a critical performance issue. Users searching for dive sites near their location were waiting 2.5 seconds for results. For a location-based application, this was unacceptable.

The Challenge:

50,000+ dive sites with geographic coordinates
Complex spatial queries using ST_DWithin and ST_Distance
Users expecting instant location-based results
Server struggling under load

The Numbers:

-- This query was taking 2.5+ seconds
SELECT ds.name, ds.location, ds.difficulty_level
FROM dive_sites ds
WHERE ST_DWithin(ds.location, ST_Point(4.5, 52.0), 10000)
  AND ds.status = 'active'
ORDER BY ST_Distance(ds.location, ST_Point(4.5, 52.0))
LIMIT 20;

Duikersgids performance monitoring

The Root Cause: Missing Spatial Indexes

The problem was clear from the execution plan:

EXPLAIN ANALYZE SELECT * FROM dive_sites 
WHERE ST_DWithin(location, ST_Point(4.5, 52.0), 10000);

-- Result: Seq Scan on dive_sites (cost=0.00..1250.00 rows=5000 width=64)
-- Execution time: 2500.123 ms

What was happening:

PostgreSQL was doing a full table scan on 50,000+ records
No spatial indexes existed for geographic queries
Every spatial operation was calculated from scratch
No optimization for common query patterns

The Solution: Strategic Spatial Indexing

Step 1: Create GiST Spatial Index

The first breakthrough came with a proper spatial index:

-- The GiST index that changed everything
CREATE INDEX CONCURRENTLY idx_dive_sites_location 
ON dive_sites USING GIST (location);

Why This Works:

GIST (location): GiST indexes are specifically optimized for geometric data types
Enables fast spatial operations like ST_DWithin, ST_Distance, and ST_Contains
Uses R-tree structure internally for efficient spatial range queries
CONCURRENTLY allows index creation without blocking writes

Immediate Result: Query time dropped from 2.5 seconds to 800ms (3x improvement)

Step 2: Add Partial Index for Active Sites

Most queries only needed active dive sites, so we created a partial index:

-- Partial index for active sites (huge win!)
CREATE INDEX CONCURRENTLY idx_dive_sites_active 
ON dive_sites (created_at) 
WHERE status = 'active';

Why This Works:

WHERE status = 'active': Only indexes rows matching the condition
Dramatically reduces index size (only ~40,000 active sites vs 50,000+ total)
Faster index scans and better cache utilization
Most queries filter by active status anyway

Result: Query time improved to 400ms (6x improvement from original)

Step 3: Create Covering Index for Common Queries

Our most common query pattern needed location, dive type, and difficulty level:

-- Composite index for our most common query pattern
CREATE INDEX CONCURRENTLY idx_dive_sites_location_type 
ON dive_sites USING GIST (location) 
INCLUDE (dive_type, difficulty_level);

Why This Works:

INCLUDE (dive_type, difficulty_level): Covering index includes additional columns
Eliminates table lookups - all needed data comes from the index
GiST index handles spatial operations, included columns provide additional data
Perfect for queries that need location + metadata

Result: Query time dropped to 200ms (12x improvement from original)

The Game Changer: PostgreSQL Partitioning

The Problem: Time-Based Queries Still Slow

Even with spatial indexes, queries filtering by date were still slow:

-- This query was still scanning everything
SELECT * FROM dive_sites 
WHERE ST_DWithin(location, ST_Point(4.5, 52.0), 10000)
  AND created_at > '2023-01-01';  -- Still scanned all partitions!

The Solution: Native Partitioning

PostgreSQL 10+ native partitioning solved this:

-- PostgreSQL 10+: Native partitioning
CREATE TABLE dive_sites_partitioned (
    id SERIAL,
    name VARCHAR(255),
    location GEOGRAPHY(POINT, 4326),
    created_at TIMESTAMP WITH TIME ZONE
) PARTITION BY RANGE (created_at);

CREATE TABLE dive_sites_2023 PARTITION OF dive_sites_partitioned
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE dive_sites_2024 PARTITION OF dive_sites_partitioned
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

Why This Works:

PARTITION BY RANGE (created_at): Splits table by date ranges
Each partition is a separate physical table
PostgreSQL automatically determines which partitions to scan
Queries with date filters only scan relevant partitions

Result: Time-filtered queries dropped to 100ms (25x improvement)

The Final Optimization: Parallel Processing

The Problem: Complex Spatial Joins CPU-Bound

Complex queries finding nearby dive sites were using only one CPU core:

-- This query was using only one CPU core
SELECT ds1.name, ds2.name, 
       ST_Distance(ds1.location, ds2.location) as distance
FROM dive_sites ds1, dive_sites ds2 
WHERE ds1.id < ds2.id 
  AND ST_DWithin(ds1.location, ds2.location, 1000)
ORDER BY distance;

The Solution: Enhanced Parallel Processing

PostgreSQL 16+ parallel processing configuration:

-- PostgreSQL 16+: Better parallel execution
SET max_parallel_workers_per_gather = 8;
SET parallel_leader_participation = off;
SET parallel_tuple_cost = 0.05;  -- Lower cost threshold

-- Same query now uses all CPU cores
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ds1.name, ds2.name, 
       ST_Distance(ds1.location, ds2.location) as distance
FROM dive_sites ds1, dive_sites ds2 
WHERE ds1.id < ds2.id 
  AND ST_DWithin(ds1.location, ds2.location, 1000)
ORDER BY distance;

Why This Works:

max_parallel_workers_per_gather = 8: Allows up to 8 worker processes
parallel_leader_participation = off: Reduces coordination overhead
parallel_tuple_cost = 0.05: Makes PostgreSQL more likely to choose parallel plans
Complex spatial joins split across multiple CPU cores

Result: Complex queries improved to 45ms (55x improvement from original)

Performance Results Summary

Optimization Step	Query Time	Improvement
Original (No Indexes)	2,500ms	Baseline
GiST Spatial Index	800ms	3x faster
Partial Index (Active Sites)	400ms	6x faster
Covering Index	200ms	12x faster
Native Partitioning	100ms	25x faster
Parallel Processing	45ms	55x faster

Key Lessons Learned

1. Spatial Data Requires Special Indexes

Regular B-tree indexes don't work for geographic data
GiST indexes are essential for spatial operations
Consider the specific spatial operations you need

2. Partial Indexes Are Powerful

Only index data you actually query
Dramatically reduces index size and improves performance
Perfect for filtered datasets

3. Covering Indexes Eliminate Table Lookups

Include frequently accessed columns in indexes
Reduces I/O operations significantly
Especially valuable for read-heavy workloads

4. Partitioning Solves Time-Based Queries

Native partitioning provides automatic partition pruning
Essential for time-series spatial data
Scales well as data grows

5. Parallel Processing Scales Complex Operations

Modern PostgreSQL versions excel at parallel execution
Configure settings for your hardware
Monitor CPU utilization to verify effectiveness

Implementation Checklist

If you're facing similar spatial query performance issues:

Analyze your queries: Use EXPLAIN ANALYZE to identify bottlenecks
Create GiST indexes: For all geographic columns
Add partial indexes: For commonly filtered data
Consider covering indexes: For queries needing additional columns
Implement partitioning: For time-based spatial data
Configure parallel processing: For complex spatial operations
Monitor performance: Track query times and resource usage

Summary

Optimizing spatial queries in PostgreSQL requires a multi-layered approach. By combining GiST indexes, partial indexing, covering indexes, native partitioning, and parallel processing, we achieved a 55x performance improvement for Duikersgids.nl.

The key was understanding that spatial data has unique requirements and requires specialized optimization techniques. Generic database optimization approaches won't work for geographic queries.

If this article helped you understand spatial query optimization, we can help you implement these techniques in your own applications. At Ludulicious, we specialize in:

Spatial Data Solutions: Geographic queries and location-based applications
Database Performance Optimization: From slow queries to indexing strategies
Custom Development: Tailored solutions for your specific use case

Ready to optimize your spatial queries?

This optimization case study is based on real production experience with Duikersgids.nl. All performance numbers are from actual production systems.

PostgreSQL Performance Tuning: Strategic Lessons from Production

Learn PostgreSQL performance optimization strategies from real production workloads. From version 9.6 to 17, discover the techniques that improved our database performance by 10-55x across multiple applications.

Rijmwoordenboek: Solving the 3-Second Phonetic Search Problem

Learn how we optimized PostgreSQL phonetic search for Van Dale Rijmwoordenboek, reducing search times from 3.2 seconds to 85ms using multi-layered indexing strategies and B-tree deduplication.

Duikersgids: How I Made Spatial Search 55x Faster

Categories

Tags

About the Author

Rob Schoenaker

The Problem: Spatial Queries Killing User Experience

The Root Cause: Missing Spatial Indexes

The Solution: Strategic Spatial Indexing

Step 1: Create GiST Spatial Index

Step 2: Add Partial Index for Active Sites

Step 3: Create Covering Index for Common Queries

The Game Changer: PostgreSQL Partitioning

The Problem: Time-Based Queries Still Slow

The Solution: Native Partitioning

The Final Optimization: Parallel Processing

The Problem: Complex Spatial Joins CPU-Bound

The Solution: Enhanced Parallel Processing

Performance Results Summary

Key Lessons Learned

1. Spatial Data Requires Special Indexes

2. Partial Indexes Are Powerful

3. Covering Indexes Eliminate Table Lookups

4. Partitioning Solves Time-Based Queries

5. Parallel Processing Scales Complex Operations

Implementation Checklist

Summary

Ludulicious