Quick Start
Get started with geoparquet-io in 5 minutes.
Installation
# Install the CLI as a standalone tool
uv tool install geoparquet-io
# Or add the library to the current project
uv add geoparquet-io
See the Installation Guide for more options.
Reading and Writing
# Convert Shapefile/GeoJSON/GeoPackage to optimized GeoParquet
gpio convert input.shp output.parquet
# Inspect file structure
gpio inspect myfile.parquet
# Rows: 1,523,847 | Size: 245.3 MB | CRS: EPSG:4326
# Preview first 5 rows
gpio inspect myfile.parquet --head 5
import geoparquet_io as gpio
# Read a file
table = gpio.read('data.parquet')
table.num_rows
# 1523847
# Convert from other formats
table = gpio.convert('data.shp')
table.write('output.parquet')
Transforming Data
# Add bbox column for faster spatial queries
gpio add bbox input.parquet output.parquet
# Sort using Hilbert curve for spatial locality
gpio sort hilbert input.parquet sorted.parquet
# Chain with pipes (no intermediate files)
gpio add bbox input.parquet | gpio sort hilbert - output.parquet
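A bbox column helps because a query can compare four numbers per row against the query window before decoding any geometry. The following is a minimal pure-Python sketch of that prefilter idea, not geoparquet-io's implementation; the `BBox` type, the helper functions, and the sample features are illustrative assumptions.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box (hypothetical helper, not a gpio type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def bbox_of(coords):
    """Compute the bounding box of a sequence of (x, y) coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return BBox(min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """Return True when two boxes overlap or touch."""
    return (
        a.xmin <= b.xmax and b.xmin <= a.xmax
        and a.ymin <= b.ymax and b.ymin <= a.ymax
    )


# Made-up features: name -> coordinate list
features = {
    "park": [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0)],
    "lake": [(10.0, 10.0), (12.0, 11.0)],
}
boxes = {name: bbox_of(coords) for name, coords in features.items()}

# Only features whose box overlaps the query window need a full geometry test
query = BBox(1.0, 0.5, 3.0, 3.0)
candidates = [name for name, box in boxes.items() if intersects(box, query)]
print(candidates)
```

Features whose box misses the window are skipped without touching their geometry, which is where the time is saved.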
import geoparquet_io as gpio
# Chain operations fluently
gpio.read('input.parquet') \
    .add_bbox() \
    .sort_hilbert() \
    .write('output.parquet')
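To see why Hilbert sorting gives spatial locality, it can help to look at the curve itself: cells that are neighbors along the curve are also neighbors on the grid. The sketch below uses the standard textbook grid-to-curve mapping on a small grid with made-up points. It illustrates the concept only and makes no claim about how `gpio sort hilbert` is implemented; `hilbert_index` is a hypothetical helper name.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) on an n-by-n grid (n a power of two) to its
    position along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Made-up points already snapped to a 4x4 grid
points = [(3, 0), (0, 0), (1, 1), (0, 3), (2, 2)]
ordered = sorted(points, key=lambda p: hilbert_index(4, *p))
print(ordered)
```

Sorting rows by such an index tends to place nearby features in the same part of the file, so a spatial query reads fewer scattered regions.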
Adding Spatial Indices
# H3 hexagonal cells
gpio add h3 input.parquet output.parquet --resolution 9
# S2 spherical cells
gpio add s2 input.parquet output.parquet --level 13
# Chain multiple indices
gpio add bbox input.parquet | gpio add h3 -r 9 - | gpio sort hilbert - output.parquet
import geoparquet_io as gpio
gpio.read('input.parquet') \
    .add_bbox() \
    .add_h3(resolution=9) \
    .sort_hilbert() \
    .write('output.parquet')
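When a piped CLI chain like the one above needs to be generated from a script, one option is to assemble the pipeline string from argument lists. This is a small sketch using only the standard library; `build_pipeline` is a hypothetical helper, and the commands and flags are copied from the examples in this section. It only builds the string and does not run anything.

```python
import shlex


def build_pipeline(stages):
    """Join argv lists into one shell pipeline string, quoting each argument."""
    return " | ".join(shlex.join(stage) for stage in stages)


stages = [
    ["gpio", "add", "bbox", "input.parquet"],
    ["gpio", "add", "h3", "-r", "9", "-"],
    ["gpio", "sort", "hilbert", "-", "output.parquet"],
]
pipeline = build_pipeline(stages)
print(pipeline)
```

With `gpio` installed, the resulting string could be passed to `subprocess.run(pipeline, shell=True, check=True)`. Quoting through `shlex.join` keeps file names with spaces intact.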
Partitioning
# Partition by H3 cells
gpio partition h3 input.parquet output_dir/ --resolution 6
# Preview first
gpio partition h3 input.parquet --resolution 6 --preview
import geoparquet_io as gpio
gpio.read('input.parquet') \
    .add_h3(resolution=9) \
    .partition_by_h3('output/', resolution=6)
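Conceptually, partitioning groups rows by a shared cell key and writes each group separately, and a preview reports the group sizes before anything is written. The sketch below shows that grouping with made-up cell keys. The function names are hypothetical, the keys are not real H3 cell IDs, and nothing here reflects geoparquet-io's actual output layout.

```python
from collections import Counter, defaultdict

# Made-up rows: (feature id, coarse cell key). Real H3 cell IDs look different.
rows = [
    ("a", "cell-1"),
    ("b", "cell-2"),
    ("c", "cell-1"),
    ("d", "cell-3"),
    ("e", "cell-1"),
]


def preview_partitions(rows):
    """Count rows per partition key without writing anything."""
    return Counter(key for _, key in rows)


def partition(rows):
    """Group feature ids by partition key, one bucket per future output file."""
    buckets = defaultdict(list)
    for feature_id, key in rows:
        buckets[key].append(feature_id)
    return dict(buckets)


print(preview_partitions(rows))
print(partition(rows))
```

Checking the counts first shows whether a resolution yields many tiny partitions or a few oversized ones before any files are written.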
Performance: CLI vs Python
| Approach | Time (75 MB file) | Notes |
|---|---|---|
| CLI (file-based) | 34s | Each command writes an intermediate file |
| CLI (piped) | 16s | Arrow IPC streaming between commands |
| Python API | 7s | In-memory, no I/O overhead |
The Python API is fastest because data stays in memory. Use the CLI for shell scripts and one-off commands; use the Python API for applications and maximum performance.
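The timings in the table can be turned into relative speedups with a little arithmetic. The snippet below uses only the three figures reported above, with the file-based CLI run as the baseline; the variable names are illustrative.

```python
# Timings in seconds, copied from the table above (75 MB file)
timings = {
    "CLI (file-based)": 34,
    "CLI (piped)": 16,
    "Python API": 7,
}

baseline = timings["CLI (file-based)"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

For this single benchmark, piping is roughly 2.1x faster than writing intermediate files, and the Python API roughly 4.9x faster. Other files and machines may give different ratios.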
Next Steps
- User Guide - Detailed feature documentation
- Python API Reference - Full API documentation
- CLI Reference - Complete command reference