# Performance Benchmarks
gpio includes a benchmark suite for measuring performance and detecting regressions across versions.
## Quick Start
Run benchmarks comparing current version against a previous release:
```bash
# Run benchmarks on current version
python scripts/version_benchmark.py --version-label "current" -o results_current.json

# Compare against previous results
python scripts/version_benchmark.py --compare results_baseline.json results_current.json
```
## Benchmark Operations
The suite tests the following operations, covering most gpio capabilities:
### Extract Operations

| Operation | Description |
|---|---|
| `inspect` | Read and display file metadata |
| `extract-limit` | Extract the first 100 rows |
| `extract-columns` | Extract specific columns (includes geometry) |
| `extract-bbox` | Spatial bounding box filtering |
### Add Column Operations

| Operation | Description |
|---|---|
| `add-bbox` | Add a bounding box column |
| `add-quadkey` | Add a quadkey column (resolution 12) |
| `add-h3` | Add an H3 cell ID column (resolution 8) |
### Sort Operations

| Operation | Description |
|---|---|
| `sort-hilbert` | Sort by Hilbert curve for spatial locality |
| `sort-quadkey` | Sort by quadkey spatial index |
### Transform Operations

| Operation | Description |
|---|---|
| `reproject` | Reproject to Web Mercator (EPSG:3857) |
### Partition Operations

| Operation | Description |
|---|---|
| `partition-quadkey` | Partition by quadkey (resolution 4) |
| `partition-h3` | Partition by H3 cells (resolution 4) |
### Convert/Export Operations

| Operation | Description |
|---|---|
| `convert-geojson` | Convert to GeoJSON format |
| `convert-flatgeobuf` | Convert to FlatGeobuf format |
| `convert-geopackage` | Convert to GeoPackage format |
### Import Operations

| Operation | Description |
|---|---|
| `import-geojson` | Import from GeoJSON to GeoParquet |
| `import-geopackage` | Import from GeoPackage to GeoParquet |
Note: Import operations run only on the tiny and small file sizes, since source-format files are only available in those tiers.
### Chain Operations (Multi-step Workflows)

| Operation | Description |
|---|---|
| `chain-extract-bbox-sort` | Extract columns → Add bbox → Hilbert sort |
| `chain-filter-sort` | Bbox filter → Hilbert sort |
### Operation Presets

| Preset | Operations |
|---|---|
| `quick` | inspect, extract-limit, add-bbox |
| `standard` | inspect, extract-limit, extract-columns, add-bbox, sort-hilbert |
| `full` | All 19 operations, including imports and chains |
## Test Data
Benchmark files are hosted on source.coop with different size tiers:
| Tier | Rows | Geometry | CRS | Source |
|---|---|---|---|---|
| tiny | 1,000 | Polygon | EPSG:4326 | Overture Buildings (Singapore) |
| small | 10,000 | Polygon | EPSG:4326 | Overture Buildings (Singapore) |
| medium | 100,000 | Polygon | EPSG:4326 | Overture Buildings (Singapore) |
| large | 809,000 | Polygon | EPSG:3794 | fiboa field boundaries (Slovenia) |
| points-tiny | 1,000 | Point | EPSG:3857 | Building centroids (Web Mercator) |
| points-small | 10,000 | Point | EPSG:3857 | Building centroids (Web Mercator) |
The points files provide variation in geometry type and CRS for regression testing.
### File Presets
| Preset | Files |
|---|---|
| `quick` | tiny, small |
| `standard` | small, medium |
| `full` | tiny, small, medium, large, points-tiny, points-small |
Files are automatically downloaded and cached locally in `/tmp/gpio-benchmark-cache/`.
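The download-and-cache behavior can be sketched with the standard library alone. Note that `cache_target` and `fetch` below are illustrative helpers under that assumption, not the suite's actual API:

```python
import urllib.request
from pathlib import Path

# Cache location used by the benchmark suite
CACHE_DIR = Path("/tmp/gpio-benchmark-cache")

def cache_target(url: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Map a remote benchmark file URL to its local cache path."""
    return cache_dir / url.rsplit("/", 1)[-1]

def fetch(url: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Download a benchmark file only if it is not already cached."""
    target = cache_target(url, cache_dir)
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, target)
    return target
```

Because the cache key is just the file name, re-running a benchmark with the same files skips the download entirely; `--no-cache` exists precisely to bypass this and measure remote-read performance.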
## Running Benchmarks Locally
### Version Comparison Script
The `scripts/version_benchmark.py` script works with any gpio version:
```bash
# Run full benchmarks (all files, all operations)
python scripts/version_benchmark.py --version-label "v0.9.0" -o results.json

# Run quick benchmarks (smaller file set, fewer operations)
python scripts/version_benchmark.py --version-label "v0.9.0" -o results.json --files quick --ops quick

# Run benchmarks with more iterations for accuracy
python scripts/version_benchmark.py --version-label "main" -o results.json -n 5

# Run specific operations on specific files
python scripts/version_benchmark.py --version-label "test" --files small,medium --ops add-bbox,sort-hilbert

# Compare two result files
python scripts/version_benchmark.py --compare results_baseline.json results_current.json

# Analyze trends across multiple baselines (oldest to newest)
python scripts/version_benchmark.py --trend results_v0.7.0.json results_v0.8.0.json results_v0.9.0.json

# Customize degradation threshold for trend detection (default: 0.05 = 5%)
python scripts/version_benchmark.py --trend baseline1.json baseline2.json baseline3.json --trend-threshold 0.10

# Skip local caching (test remote file performance)
python scripts/version_benchmark.py --version-label "remote-test" --no-cache
```
### Managing Historical Baselines
Use the `scripts/manage_baselines.py` tool to work with baselines stored in GitHub artifacts:
```bash
# List available baselines
uv run python scripts/manage_baselines.py list

# Download specific baseline versions
uv run python scripts/manage_baselines.py download v0.9.0 v0.8.0

# Compare specific baselines (downloads if needed)
uv run python scripts/manage_baselines.py compare v0.8.0 v0.9.0

# Analyze trends across multiple versions
uv run python scripts/manage_baselines.py trends v0.7.0 v0.8.0 v0.9.0

# Use a custom degradation threshold
uv run python scripts/manage_baselines.py trends v0.7.0 v0.8.0 v0.9.0 --threshold 0.10
```
**Authentication:**

- Requires a GitHub token: set `GITHUB_TOKEN` or authenticate with `gh auth login`
- Auto-detects the repository from the git remote
- Downloads baselines to the `baselines/` directory by default
### Sample Output
**Point-in-time comparison:**
```
Operation       File    v0.9.0    main      Delta
inspect         tiny    0.468s    0.440s    -5.8% faster
extract-limit   tiny    0.543s    0.540s    -0.5% faster
add-bbox        large   0.378s    0.408s    +8.1% slower
sort-hilbert    large   27.366s   26.946s   -1.5% faster
```
**Trend analysis across releases:**
```
Overall Statistics:
  Average change:  +1.23%
  Max regression:  +12.5%
  Max improvement: -8.3%

⚠️ Gradual Degradation Detected (2 operations):
  • extract-limit (small): 7.2% avg degradation over last 2 releases
  • partition-quadkey (medium): 6.1% avg degradation over last 2 releases

🚀 Consistent Improvements (3 operations):
  • add-bbox (large): 8.5% avg improvement over last 2 releases
  • sort-hilbert (small): 5.9% avg improvement over last 2 releases
  • inspect (tiny): 5.2% avg improvement over last 2 releases
```
### CLI Benchmark Commands
gpio also includes built-in benchmark commands:
```bash
# Run benchmark suite on specific files
gpio benchmark suite --files path/to/file.parquet --operations core

# Run quick benchmark (single operation, timing only)
gpio benchmark run inspect path/to/file.parquet
```
## Profiling Integration
When benchmarks identify performance regressions, profiling helps diagnose which code paths are responsible.
### Enabling Profiling
Add the `--profile` flag to enable cProfile integration:
```bash
# Run benchmarks with profiling enabled
gpio benchmark suite \
    --files path/to/file.parquet \
    --operations core \
    --profile \
    --profile-dir ./profiles

# Profile specific operations
gpio benchmark suite \
    --files large.parquet \
    --operations add-bbox,sort-hilbert \
    --profile
```
The same suite can be driven from the Python API:

```python
from pathlib import Path

from geoparquet_io.core.benchmark_suite import run_benchmark_suite

# Run benchmarks with profiling enabled
result = run_benchmark_suite(
    input_files=[Path('path/to/file.parquet')],
    operations=['add-bbox', 'extract', 'inspect'],
    iterations=3,
    profile=True,
    profile_dir=Path('./profiles'),
    verbose=True,
)

# Profile files are saved in ./profiles/
print(f"Generated {len(result.results)} benchmark results")
```
This generates `.prof` files in the specified directory (default: `./profiles/`).
### Analyzing Profile Data
View profile interactively:
uv run python -m pstats profiles/add-bbox_large_1.prof
# Then use commands like:
# - stats 20 (show top 20 functions)
# - sort cumtime (sort by cumulative time)
# - callers duckdb (show callers of duckdb functions)
Or summarize a profile programmatically:

```python
from geoparquet_io.benchmarks.profile_report import format_profile_stats

# Show the top 20 slowest functions
summary = format_profile_stats('profiles/add-bbox_large_1.prof', top_n=20)
print(summary)
```
Sample profile output:
```
Profile: add-bbox_large_1.prof
================================================================================
Top 20 functions by cumulative time:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002   12.456   12.456 geoparquet_io/core/add_column.py:45(add_bbox_column)
        1    0.001    0.001   11.234   11.234 duckdb.py:123(execute)
      100    5.678    0.057    9.876    0.099 duckdb.py:234(_fetch_arrow)
    10000    2.345    0.000    3.456    0.000 pyarrow.lib:456(cast)
```
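Since the `.prof` files are standard cProfile dumps, they can also be inspected with nothing but the standard library. The snippet below generates and reads a dump from a toy workload, standing in for a real `profiles/add-bbox_large_1.prof`:

```python
import cProfile
import pstats
import tempfile
from pathlib import Path

def busy_work(n: int) -> int:
    """Toy workload standing in for a benchmark operation."""
    return sum(i * i for i in range(n))

# Profile the workload and dump stats, mimicking a benchmark-generated .prof file
prof_path = Path(tempfile.mkdtemp()) / "example.prof"
profiler = cProfile.Profile()
profiler.enable()
busy_work(100_000)
profiler.disable()
profiler.dump_stats(str(prof_path))

# Load the dump with the stdlib pstats module, exactly as you would for a
# real profiles/<operation>_<file>_<iteration>.prof artifact
stats = pstats.Stats(str(prof_path))
stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time
```

This is handy when you want to diff two profiles or script a report without installing anything beyond Python itself.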
### Profiling Overhead
- Profiling adds ~5-15% overhead to benchmark timing
- Profile files are typically 50-500KB each
- Disabled by default to keep benchmarks fast
### CI Integration
The benchmark workflow automatically suggests profiling when regressions are detected:
```
⚠️ Performance regression detected (+25% slower on sort-hilbert)

💡 To diagnose, run locally with profiling:
   gpio benchmark suite --files large.parquet --operations sort-hilbert --profile
```
Profile artifacts are uploaded with 30-day retention when profiling is enabled.
## GitHub Actions Workflows
### PR Benchmarks (Opt-in)
Benchmarks run on PRs only when the `benchmark` label is added:
1. Add the `benchmark` label to your PR
2. The workflow runs automatically
3. Results are posted as a comment on the PR
### Manual Benchmark Run
Run benchmarks manually from the Actions tab:
1. Go to **Actions** → **Benchmark Suite**
2. Click **Run workflow**
3. Configure options:
    - `iterations`: Number of runs per operation (default: 3)
    - `files`: File preset or comma-separated list (default: full)
    - `ops`: Operation preset or comma-separated list (default: full)
    - `compare_version`: Optional version to compare against (e.g., `v0.9.0`)
4. View results in the workflow summary
### Release Benchmarks
When a release is created, benchmarks automatically:
- Run on the new release version
- Compare against the previous release tag
- Detect regressions (>25% slower)
- Fetch historical baselines from up to 5 previous releases
- Analyze performance trends across releases
- Append results to the release notes
Results include:

- Point-in-time comparison table showing the performance delta
- Performance trends across multiple releases
- Warning for significant regressions (>25% in a single release)
- Warning for gradual degradation (>5% per release for 2+ consecutive releases)
- Detailed benchmark data in a collapsible section
**Baseline Storage:**

- Baselines are stored as GitHub Actions artifacts
- Retention: 400 days (covers ~5-10 releases)
- Artifact naming: `release-benchmark-{version}`
- Contents: benchmark results JSON, comparison text, trend analysis
### Where Results Are Published
| Trigger | Results Location |
|---|---|
| PR with `benchmark` label | Comment on PR |
| Manual workflow run | Workflow summary + artifacts |
| Release | Appended to release notes |
All runs also upload JSON artifacts for historical tracking.
## Interpreting Results
### Regression Thresholds
**Point-in-time** (single-release comparison):
| Severity | Threshold | Action |
|---|---|---|
| Normal variance | ±10% | No action needed |
| Warning | +10-25% | Investigate cause |
| Regression | >+25% | Flagged in release notes |
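The point-in-time thresholds above can be expressed as a tiny classifier. This is an illustrative sketch; `classify_delta` is not a function from the gpio codebase:

```python
def classify_delta(baseline_s: float, current_s: float) -> str:
    """Classify a timing change using the point-in-time thresholds:
    within ±10% is normal variance, +10-25% is a warning, >+25% is a regression."""
    delta = (current_s - baseline_s) / baseline_s  # relative change
    if delta > 0.25:
        return "regression"
    if delta > 0.10:
        return "warning"
    return "normal"
```

For example, `add-bbox` going from 0.378s to 0.408s is a +8.1% change, which falls inside normal variance and needs no action.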
**Trend analysis** (across multiple releases):
| Pattern | Threshold | Action |
|---|---|---|
| Gradual degradation | >5% per release for 2+ consecutive releases | Warning flagged |
| Consistent improvement | >5% per release for 2+ consecutive releases | Highlighted |
| Single spike | One-time regression/improvement | Ignored (not a trend) |
Trend analysis helps detect gradual performance drift that might be missed when comparing only adjacent releases.
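As a sketch of the trend rule (and of what `--trend-threshold` controls), gradual degradation can be detected by checking whether each of the last few release-over-release changes exceeds the threshold. The helper below is illustrative, not the suite's actual implementation:

```python
def gradual_degradation(times_s: list[float],
                        threshold: float = 0.05,
                        min_releases: int = 2) -> bool:
    """Flag an operation when each of its last `min_releases`
    release-over-release changes exceeds `threshold` (default 5%,
    matching --trend-threshold).

    `times_s` holds one operation's timing per release, oldest to newest.
    """
    # Relative change between each pair of adjacent releases
    deltas = [(b - a) / a for a, b in zip(times_s, times_s[1:])]
    recent = deltas[-min_releases:]
    # A one-time spike does not qualify: every recent delta must degrade
    return len(recent) >= min_releases and all(d > threshold for d in recent)
```

Requiring every recent delta to exceed the threshold is what separates a trend from a single spike: one bad release followed by a recovery returns `False`.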
### Expected Variance
- Small files (<10K rows): High variance (±20%) due to startup overhead
- Large files (>100K rows): Low variance (±5%), most reliable for comparison
- CI environment: May differ from local; compare CI-to-CI results
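One way to check whether a local run is stable enough to compare is the relative spread of the iteration timings (an illustrative stdlib helper, not part of the suite):

```python
import statistics

def relative_spread(samples_s: list[float]) -> float:
    """Coefficient of variation of iteration timings: stdev / mean.
    Against the variance figures above, a spread well past ~0.05 (5%)
    on a large file suggests a noisy run worth repeating with -n 5."""
    return statistics.stdev(samples_s) / statistics.mean(samples_s)
```
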
### Known Performance Characteristics
| Operation | Notes |
|---|---|
| `inspect` | Slower since v0.6.0 due to geometry type detection |
| `add-bbox` | 75x faster since v0.6.0 for large files |
| `extract` with geometry | Slow due to WKB serialization; use `--exclude-cols geometry` if not needed |
| `sort-hilbert` | Scales linearly with row count |
## Pre-Release Checklist
Before releasing a new version:
- [ ] Run benchmarks locally against the previous release:

  ```bash
  # Install previous version
  git checkout v0.9.0 && pip install -e .
  python scripts/version_benchmark.py --version-label "v0.9.0" -o baseline.json -n 5

  # Install new version
  git checkout main && pip install -e .
  python scripts/version_benchmark.py --version-label "new" -o current.json -n 5

  # Compare
  python scripts/version_benchmark.py --compare baseline.json current.json
  ```

- [ ] Check for regressions (>25% slower on large files)
- [ ] Document known changes in the release notes if performance differs intentionally
- [ ] Create the release - the release-benchmark workflow will automatically verify and append results