Skip to content

Performance Benchmarks

gpio includes a benchmark suite for measuring performance and detecting regressions across versions.

Quick Start

Run benchmarks comparing current version against a previous release:

# Run benchmarks on current version
python scripts/version_benchmark.py --version-label "current" -o results_current.json

# Compare against previous results
python scripts/version_benchmark.py --compare results_baseline.json results_current.json

Benchmark Operations

The suite tests these operations covering most gpio capabilities:

Extract Operations

Operation Description
inspect Read and display file metadata
extract-limit Extract first 100 rows
extract-columns Extract specific columns (includes geometry)
extract-bbox Spatial bounding box filtering

Add Column Operations

Operation Description
add-bbox Add bounding box column
add-quadkey Add quadkey column (resolution 12)
add-h3 Add H3 cell ID column (resolution 8)

Sort Operations

Operation Description
sort-hilbert Sort by Hilbert curve for spatial locality
sort-quadkey Sort by quadkey spatial index

Transform Operations

Operation Description
reproject Reproject to Web Mercator (EPSG:3857)

Partition Operations

Operation Description
partition-quadkey Partition by quadkey (resolution 4)
partition-h3 Partition by H3 cells (resolution 4)

Convert/Export Operations

Operation Description
convert-geojson Convert to GeoJSON format
convert-flatgeobuf Convert to FlatGeobuf format
convert-geopackage Convert to GeoPackage format

Import Operations

Operation Description
import-geojson Import from GeoJSON to GeoParquet
import-geopackage Import from GeoPackage to GeoParquet

Note: Import operations only run on tiny/small file sizes as source format files are only available in those sizes.

Chain Operations (Multi-step Workflows)

Operation Description
chain-extract-bbox-sort Extract columns → Add bbox → Hilbert sort
chain-filter-sort Bbox filter → Hilbert sort

Operation Presets

Preset Operations
quick inspect, extract-limit, add-bbox
standard inspect, extract-limit, extract-columns, add-bbox, sort-hilbert
full All 19 operations including imports and chains

Test Data

Benchmark files are hosted on source.coop with different size tiers:

Tier Rows Geometry CRS Source
tiny 1,000 Polygon EPSG:4326 Overture Buildings (Singapore)
small 10,000 Polygon EPSG:4326 Overture Buildings (Singapore)
medium 100,000 Polygon EPSG:4326 Overture Buildings (Singapore)
large 809,000 Polygon EPSG:3794 fiboa field boundaries (Slovenia)
points-tiny 1,000 Point EPSG:3857 Building centroids (Web Mercator)
points-small 10,000 Point EPSG:3857 Building centroids (Web Mercator)

The points files provide variation in geometry type and CRS for regression testing.

File Presets

Preset Files
quick tiny, small
standard small, medium
full tiny, small, medium, large, points-tiny, points-small

Files are automatically downloaded and cached locally in /tmp/gpio-benchmark-cache/.

Running Benchmarks Locally

Version Comparison Script

The scripts/version_benchmark.py script works with any gpio version:

# Run full benchmarks (all files, all operations)
python scripts/version_benchmark.py --version-label "v0.9.0" -o results.json

# Run quick benchmarks (smaller file set, fewer operations)
python scripts/version_benchmark.py --version-label "v0.9.0" -o results.json --files quick --ops quick

# Run benchmarks with more iterations for accuracy
python scripts/version_benchmark.py --version-label "main" -o results.json -n 5

# Run specific operations on specific files
python scripts/version_benchmark.py --version-label "test" --files small,medium --ops add-bbox,sort-hilbert

# Compare two result files
python scripts/version_benchmark.py --compare results_baseline.json results_current.json

# Analyze trends across multiple baselines (oldest to newest)
python scripts/version_benchmark.py --trend results_v0.7.0.json results_v0.8.0.json results_v0.9.0.json

# Customize degradation threshold for trend detection (default: 0.05 = 5%)
python scripts/version_benchmark.py --trend baseline1.json baseline2.json baseline3.json --trend-threshold 0.10

# Skip local caching (test remote file performance)
python scripts/version_benchmark.py --version-label "remote-test" --no-cache

Managing Historical Baselines

Use the scripts/manage_baselines.py tool to work with baselines stored in GitHub artifacts:

# List available baselines
uv run python scripts/manage_baselines.py list

# Download specific baseline versions
uv run python scripts/manage_baselines.py download v0.9.0 v0.8.0

# Compare specific baselines (downloads if needed)
uv run python scripts/manage_baselines.py compare v0.8.0 v0.9.0

# Analyze trends across multiple versions
uv run python scripts/manage_baselines.py trends v0.7.0 v0.8.0 v0.9.0

# Use custom degradation threshold
uv run python scripts/manage_baselines.py trends v0.7.0 v0.8.0 v0.9.0 --threshold 0.10

Authentication: - Requires GitHub token: set GITHUB_TOKEN or authenticate with gh auth login - Auto-detects repository from git remote - Downloads baselines to baselines/ directory by default

### Sample Output

**Point-in-time comparison:**
====================================================================== Comparison: v0.9.0 vs main ======================================================================

Operation File v0.9.0 main Delta

inspect tiny 0.468s 0.440s -5.8% faster extract-limit tiny 0.543s 0.540s -0.5% faster add-bbox large 0.378s 0.408s +8.1% slower sort-hilbert large 27.366s 26.946s -1.5% faster

**Trend analysis across releases:**
====================================================================== Trend Analysis Across Releases ====================================================================== Versions: v0.7.0 → v0.8.0 → v0.9.0 Baselines: 3 Operations tracked: 42

Overall Statistics: Average change: +1.23% Max regression: +12.5% Max improvement: -8.3%

⚠️ Gradual Degradation Detected (2 operations):

• extract-limit (small): 7.2% avg degradation over last 2 releases • partition-quadkey (medium): 6.1% avg degradation over last 2 releases

🚀 Consistent Improvements (3 operations):

• add-bbox (large): 8.5% avg improvement over last 2 releases • sort-hilbert (small): 5.9% avg improvement over last 2 releases • inspect (tiny): 5.2% avg improvement over last 2 releases

### CLI Benchmark Commands

gpio also includes built-in benchmark commands:

```bash
# Run benchmark suite on specific files
gpio benchmark suite --files path/to/file.parquet --operations core

# Run quick benchmark (single operation, timing only)
gpio benchmark run inspect path/to/file.parquet

Profiling Integration

When benchmarks identify performance regressions, profiling helps diagnose which code paths are responsible.

Enabling Profiling

Add the --profile flag to enable cProfile integration:

# Run benchmarks with profiling enabled
gpio benchmark suite \
  --files path/to/file.parquet \
  --operations core \
  --profile \
  --profile-dir ./profiles

# Profile specific operations
gpio benchmark suite \
  --files large.parquet \
  --operations add-bbox,sort-hilbert \
  --profile
from geoparquet_io.core.benchmark_suite import run_benchmark_suite
from pathlib import Path

# Run benchmarks with profiling enabled
result = run_benchmark_suite(
    input_files=[Path('path/to/file.parquet')],
    operations=['add-bbox', 'extract', 'inspect'],
    iterations=3,
    profile=True,
    profile_dir=Path('./profiles'),
    verbose=True
)

# Profile files are saved in ./profiles/
print(f"Generated {len(result.results)} benchmark results")

This generates .prof files in the specified directory (default: ./profiles/).

Analyzing Profile Data

View profile interactively:

uv run python -m pstats profiles/add-bbox_large_1.prof
# Then use commands like:
# - stats 20  (show top 20 functions)
# - sort cumtime  (sort by cumulative time)
# - callers duckdb  (show callers of duckdb functions)
from geoparquet_io.benchmarks.profile_report import format_profile_stats

# Show top 20 slowest functions
summary = format_profile_stats('profiles/add-bbox_large_1.prof', top_n=20)
print(summary)

Sample profile output:

Profile: add-bbox_large_1.prof
================================================================================

Top 20 functions by cumulative time:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002   12.456   12.456 geoparquet_io/core/add_column.py:45(add_bbox_column)
        1    0.001    0.001   11.234   11.234 duckdb.py:123(execute)
      100    5.678    0.057    9.876    0.099 duckdb.py:234(_fetch_arrow)
    10000    2.345    0.000    3.456    0.000 pyarrow.lib:456(cast)

Profiling Overhead

  • Profiling adds ~5-15% overhead to benchmark timing
  • Profile files are typically 50-500KB each
  • Disabled by default to keep benchmarks fast

CI Integration

The benchmark workflow automatically suggests profiling when regressions are detected:

⚠️ Performance regression detected (+25% slower on sort-hilbert)

💡 To diagnose, run locally with profiling:
   gpio benchmark suite --files large.parquet --operations sort-hilbert --profile

Profile artifacts are uploaded with 30-day retention when profiling is enabled.

GitHub Actions Workflows

PR Benchmarks (Opt-in)

Benchmarks run on PRs only when the benchmark label is added:

  1. Add the benchmark label to your PR
  2. The workflow runs automatically
  3. Results are posted as a comment on the PR

Manual Benchmark Run

Run benchmarks manually from the Actions tab:

  1. Go to ActionsBenchmark Suite
  2. Click Run workflow
  3. Configure options:
  4. iterations: Number of runs per operation (default: 3)
  5. files: File preset or comma-separated list (default: full)
  6. ops: Operation preset or comma-separated list (default: full)
  7. compare_version: Optional version to compare against (e.g., v0.9.0)
  8. View results in the workflow summary

Release Benchmarks

When a release is created, benchmarks automatically:

  1. Run on the new release version
  2. Compare against the previous release tag
  3. Detect regressions (>25% slower)
  4. Fetch historical baselines from up to 5 previous releases
  5. Analyze performance trends across releases
  6. Append results to the release notes

Results include: - Point-in-time comparison table showing performance delta - Performance trends across multiple releases - Warning for significant regressions (>25% in single release) - Warning for gradual degradation (>5% per release for 2+ consecutive releases) - Detailed benchmark data in collapsible section

Baseline Storage: - Baselines are stored as GitHub Actions artifacts - Retention: 400 days (covers ~5-10 releases) - Artifact naming: release-benchmark-{version} - Contains: benchmark results JSON, comparison text, trend analysis

Where Results Are Published

Trigger Results Location
PR with benchmark label Comment on PR
Manual workflow run Workflow summary + artifacts
Release Appended to release notes

All runs also upload JSON artifacts for historical tracking.

Interpreting Results

Regression Thresholds

Point-in-time (single release comparison):

Severity Threshold Action
Normal variance ±10% No action needed
Warning +10-25% Investigate cause
Regression >+25% Flagged in release notes

Trend analysis (across multiple releases):

Pattern Threshold Action
Gradual degradation >5% per release for 2+ consecutive releases Warning flagged
Consistent improvement >5% per release for 2+ consecutive releases Highlighted
Single spike One-time regression/improvement Ignored (not a trend)

Trend analysis helps detect gradual performance drift that might be missed when comparing only adjacent releases.

Expected Variance

  • Small files (<10K rows): High variance (±20%) due to startup overhead
  • Large files (>100K rows): Low variance (±5%), most reliable for comparison
  • CI environment: May differ from local; compare CI-to-CI results

Known Performance Characteristics

Operation Notes
inspect Slower since v0.6.0 due to geometry type detection
add-bbox 75x faster since v0.6.0 for large files
extract with geometry Slow due to WKB serialization; use --exclude-cols geometry if not needed
sort-hilbert Scales linearly with row count

Pre-Release Checklist

Before releasing a new version:

  1. Run benchmarks locally against the previous release:

    # Install previous version
    git checkout v0.9.0 && pip install -e .
    python scripts/version_benchmark.py --version-label "v0.9.0" -o baseline.json -n 5
    
    # Install new version
    git checkout main && pip install -e .
    python scripts/version_benchmark.py --version-label "new" -o current.json -n 5
    
    # Compare
    python scripts/version_benchmark.py --compare baseline.json current.json
    

  2. Check for regressions (>25% slower on large files)

  3. Document known changes in release notes if performance differs intentionally

  4. Create release - the release-benchmark workflow will automatically verify and append results