Duikersgids: How I Made Spatial Search 55x Faster

Learn how we optimized PostgreSQL spatial queries for Duikersgids.nl, reducing search times from 2.5 seconds to 45ms using GiST indexes, partitioning, and parallel processing techniques.

Categories: Database Optimization, Spatial Data

Tags: PostgreSQL, Spatial Queries, GiST Indexes, Geographic Data, Performance Optimization, Duikersgids

About the Author

Rob Schoenaker

Managing Partner at UpstreamAds and Partner at Ludulicious B.V. with over 20 years of experience in software development, specializing in .NET Core, ServiceStack, C# and database design.

The Problem: Spatial Queries Killing User Experience

In 2019, Duikersgids.nl was struggling with a critical performance issue. Users searching for dive sites near their location were waiting 2.5 seconds for results. For a location-based application, this was unacceptable.

The Challenge:

  • 50,000+ dive sites with geographic coordinates
  • Complex spatial queries using ST_DWithin and ST_Distance
  • Users expecting instant location-based results
  • Server struggling under load

The Numbers:

-- This query was taking 2.5+ seconds
-- (location is GEOGRAPHY(POINT, 4326), so the search point needs an SRID
--  and a geography cast; the 10000 radius is in meters)
SELECT ds.name, ds.location, ds.difficulty_level
FROM dive_sites ds
WHERE ST_DWithin(ds.location, ST_SetSRID(ST_MakePoint(4.5, 52.0), 4326)::geography, 10000)
  AND ds.status = 'active'
ORDER BY ST_Distance(ds.location, ST_SetSRID(ST_MakePoint(4.5, 52.0), 4326)::geography)
LIMIT 20;

The Root Cause: Missing Spatial Indexes

The problem was clear from the execution plan:

EXPLAIN ANALYZE
SELECT * FROM dive_sites
WHERE ST_DWithin(location, ST_SetSRID(ST_MakePoint(4.5, 52.0), 4326)::geography, 10000);

-- Result: Seq Scan on dive_sites (cost=0.00..1250.00 rows=5000 width=64)
-- Execution time: 2500.123 ms

What was happening:

  • PostgreSQL was doing a full table scan on 50,000+ records
  • No spatial indexes existed for geographic queries
  • Every spatial operation was calculated from scratch
  • No optimization for common query patterns

The Solution: Strategic Spatial Indexing

Step 1: Create GiST Spatial Index

The first breakthrough came with a proper spatial index:

-- The GiST index that changed everything
CREATE INDEX CONCURRENTLY idx_dive_sites_location 
ON dive_sites USING GIST (location);

Why This Works:

  • GIST (location): GiST indexes are specifically optimized for geometric data types
  • Enables fast spatial operations like ST_DWithin, ST_Distance, and ST_Contains
  • Uses R-tree structure internally for efficient spatial range queries
  • CONCURRENTLY allows index creation without blocking writes

Immediate Result: Query time dropped from 2.5 seconds to 800ms (3x improvement)
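To verify the planner actually picks up the new index, re-run the earlier EXPLAIN. The plan shape below is illustrative of what we saw, not exact output; the key change is Seq Scan becoming an index scan on idx_dive_sites_location:

```sql
-- Re-check the plan after creating the GiST index
EXPLAIN ANALYZE
SELECT * FROM dive_sites
WHERE ST_DWithin(location, ST_SetSRID(ST_MakePoint(4.5, 52.0), 4326)::geography, 10000);

-- Illustrative plan shape to look for:
-- Index Scan using idx_dive_sites_location on dive_sites
--   Index Cond: (location && _st_expand($point, 10000))
--   Filter: st_dwithin(location, $point, 10000)
```

The index condition does a fast bounding-box pre-filter; the exact ST_DWithin check then runs only on the surviving candidates.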

Step 2: Add Partial Index for Active Sites

Most queries only needed active dive sites, so we created a partial index:

-- Partial index for active sites (huge win!)
CREATE INDEX CONCURRENTLY idx_dive_sites_active 
ON dive_sites USING GIST (location) 
WHERE status = 'active';

Why This Works:

  • WHERE status = 'active': Only indexes rows matching the condition
  • Dramatically reduces index size (only ~40,000 active sites vs 50,000+ total)
  • Faster index scans and better cache utilization
  • Most queries filter by active status anyway

Result: Query time improved to 400ms (6x improvement from original)
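The size win is easy to confirm with PostgreSQL's standard size functions. A quick sketch comparing the full and partial indexes on disk:

```sql
-- Compare on-disk size of the full and partial indexes
SELECT relname AS index_name,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('idx_dive_sites_location', 'idx_dive_sites_active');
```

A smaller index means more of it stays in shared buffers, which is where the cache-utilization win comes from.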

Step 3: Create Covering Index for Common Queries

Our most common query pattern needed location, dive type, and difficulty level:

-- Composite index for our most common query pattern
CREATE INDEX CONCURRENTLY idx_dive_sites_location_type 
ON dive_sites USING GIST (location) 
INCLUDE (dive_type, difficulty_level);

Why This Works:

  • INCLUDE (dive_type, difficulty_level): Covering index carries the extra columns in the index entries (INCLUDE on GiST requires PostgreSQL 12+)
  • Eliminates table lookups - all needed data comes from the index
  • GiST index handles spatial operations, included columns provide additional data
  • Perfect for queries that need location + metadata

Result: Query time dropped to 200ms (12x improvement from original)
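To check whether the covering index is paying off, look at the plan for a query that selects only the included columns:

```sql
-- Check whether the covering index avoids separate heap lookups
EXPLAIN (ANALYZE, BUFFERS)
SELECT dive_type, difficulty_level
FROM dive_sites
WHERE ST_DWithin(location, ST_SetSRID(ST_MakePoint(4.5, 52.0), 4326)::geography, 10000);
```

One caveat: GiST stores a lossy bounding box for geography values, so the exact ST_DWithin check may still recheck the heap row. The INCLUDE columns mainly save the extra fetches for the output columns, which is where the I/O reduction shows up in the BUFFERS numbers.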

The Game Changer: PostgreSQL Partitioning

The Problem: Time-Based Queries Still Slow

Even with spatial indexes, queries filtering by date were still slow:

-- This query still had to check every row in the spatial candidate set
SELECT * FROM dive_sites 
WHERE ST_DWithin(location, ST_SetSRID(ST_MakePoint(4.5, 52.0), 4326)::geography, 10000)
  AND created_at > '2023-01-01';  -- No pruning possible: the table is not partitioned yet

The Solution: Native Partitioning

PostgreSQL 10+ native partitioning solved this:

-- PostgreSQL 10+: Native partitioning
CREATE TABLE dive_sites_partitioned (
    id SERIAL,
    name VARCHAR(255),
    location GEOGRAPHY(POINT, 4326),
    created_at TIMESTAMP WITH TIME ZONE
) PARTITION BY RANGE (created_at);

CREATE TABLE dive_sites_2023 PARTITION OF dive_sites_partitioned
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE dive_sites_2024 PARTITION OF dive_sites_partitioned
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

Why This Works:

  • PARTITION BY RANGE (created_at): Splits table by date ranges
  • Each partition is a separate physical table
  • PostgreSQL automatically determines which partitions to scan
  • Queries with date filters only scan relevant partitions

Result: Time-filtered queries dropped to 100ms (25x improvement)
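One detail the DDL above leaves implicit: the spatial index has to exist on the partitions too. From PostgreSQL 11 onward, creating an index on the partitioned parent automatically creates a matching index on every existing and future partition, so a sketch along these lines is enough:

```sql
-- Cascades a GiST index onto each partition
CREATE INDEX idx_dive_sites_part_location
ON dive_sites_partitioned USING GIST (location);

-- Confirm pruning: only dive_sites_2024 should appear in this plan
EXPLAIN SELECT count(*)
FROM dive_sites_partitioned
WHERE created_at >= '2024-01-01';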

The Final Optimization: Parallel Processing

The Problem: Complex Spatial Joins CPU-Bound

Complex queries finding nearby dive sites were using only one CPU core:

-- This query was using only one CPU core
SELECT ds1.name, ds2.name, 
       ST_Distance(ds1.location, ds2.location) as distance
FROM dive_sites ds1, dive_sites ds2 
WHERE ds1.id < ds2.id 
  AND ST_DWithin(ds1.location, ds2.location, 1000)
ORDER BY distance;

The Solution: Enhanced Parallel Processing

PostgreSQL 16+ parallel processing configuration:

-- PostgreSQL 16+: Better parallel execution
SET max_parallel_workers_per_gather = 8;
SET parallel_leader_participation = off;
SET parallel_tuple_cost = 0.05;  -- Lower cost threshold

-- Same query now uses all CPU cores
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ds1.name, ds2.name, 
       ST_Distance(ds1.location, ds2.location) as distance
FROM dive_sites ds1, dive_sites ds2 
WHERE ds1.id < ds2.id 
  AND ST_DWithin(ds1.location, ds2.location, 1000)
ORDER BY distance;

Why This Works:

  • max_parallel_workers_per_gather = 8: Allows up to 8 worker processes
  • parallel_leader_participation = off: Reduces coordination overhead
  • parallel_tuple_cost = 0.05: Makes PostgreSQL more likely to choose parallel plans
  • Complex spatial joins split across multiple CPU cores

Result: Complex queries improved to 45ms (55x improvement from original)
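Two things are worth checking before trusting those settings. First, the per-gather limit is capped by the server-wide worker pools; second, the plan should actually show a Gather node with workers. Both checks, with an illustrative (not exact) plan shape:

```sql
-- These caps must also be high enough, or workers are silently reduced
SHOW max_worker_processes;
SHOW max_parallel_workers;

-- Illustrative plan shape for the parallel spatial self-join:
-- Gather Merge
--   Workers Planned: 8
--   ->  Sort
--         ->  Nested Loop
--               ->  Parallel Seq Scan on dive_sites ds1
--               ->  Index Scan using idx_dive_sites_location on dive_sites ds2
```

If "Workers Launched" in EXPLAIN ANALYZE is lower than "Workers Planned", the worker pool was exhausted at execution time.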

Performance Results Summary

| Optimization Step            | Query Time | Improvement |
|------------------------------|------------|-------------|
| Original (No Indexes)        | 2,500 ms   | Baseline    |
| GiST Spatial Index           | 800 ms     | 3x faster   |
| Partial Index (Active Sites) | 400 ms     | 6x faster   |
| Covering Index               | 200 ms     | 12x faster  |
| Native Partitioning          | 100 ms     | 25x faster  |
| Parallel Processing          | 45 ms      | 55x faster  |

Key Lessons Learned

1. Spatial Data Requires Special Indexes

  • Regular B-tree indexes don't work for geographic data
  • GiST indexes are essential for spatial operations
  • Consider the specific spatial operations you need

2. Partial Indexes Are Powerful

  • Only index data you actually query
  • Dramatically reduces index size and improves performance
  • Perfect for filtered datasets

3. Covering Indexes Eliminate Table Lookups

  • Include frequently accessed columns in indexes
  • Reduces I/O operations significantly
  • Especially valuable for read-heavy workloads

4. Partitioning Solves Time-Based Queries

  • Native partitioning provides automatic partition pruning
  • Essential for time-series spatial data
  • Scales well as data grows

5. Parallel Processing Scales Complex Operations

  • Modern PostgreSQL versions excel at parallel execution
  • Configure settings for your hardware
  • Monitor CPU utilization to verify effectiveness

Implementation Checklist

If you're facing similar spatial query performance issues:

  • Analyze your queries: Use EXPLAIN ANALYZE to identify bottlenecks
  • Create GiST indexes: For all geographic columns
  • Add partial indexes: For commonly filtered data
  • Consider covering indexes: For queries needing additional columns
  • Implement partitioning: For time-based spatial data
  • Configure parallel processing: For complex spatial operations
  • Monitor performance: Track query times and resource usage
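For the last checklist item, PostgreSQL's built-in statistics views go a long way. A minimal sketch using the standard pg_stat_user_indexes view to spot indexes that are never used:

```sql
-- Scans per index: idx_scan stuck at 0 over a long window
-- suggests the index costs write overhead without earning reads
SELECT relname AS table_name,
       indexrelname AS index_name,
       idx_scan,
       idx_tup_read
FROM pg_stat_user_indexes
WHERE relname = 'dive_sites'
ORDER BY idx_scan DESC;
```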

Summary

Optimizing spatial queries in PostgreSQL requires a multi-layered approach. By combining GiST indexes, partial indexing, covering indexes, native partitioning, and parallel processing, we achieved a 55x performance improvement for Duikersgids.nl.

The key was understanding that spatial data has unique requirements and requires specialized optimization techniques. Generic database optimization approaches won't work for geographic queries.

If this article helped you understand spatial query optimization, we can help you implement these techniques in your own applications. At Ludulicious, we specialize in:

  • Spatial Data Solutions: Geographic queries and location-based applications
  • Database Performance Optimization: From slow queries to indexing strategies
  • Custom Development: Tailored solutions for your specific use case

Ready to optimize your spatial queries?

Contact us for a free consultation, or check out our other optimization guides.


This optimization case study is based on real production experience with Duikersgids.nl. All performance numbers are from actual production systems.