Converting Between Formats

The convert command transforms between GeoParquet and other vector formats with automatic format detection and optimization.

CLI vs Python Behavior

The CLI gpio convert applies Hilbert sorting by default for optimal spatial queries. The Python gpio.convert() does NOT sort by default; chain .sort_hilbert() explicitly if needed.

Basic Usage

gpio convert input.shp output.parquet

Automatically applies:

  • ZSTD compression (level 15)
  • Row groups of 100,000 rows
  • Bbox column with proper metadata
  • Hilbert spatial ordering
  • GeoParquet 1.1.0 metadata

import geoparquet_io as gpio

# Convert with Hilbert sorting (recommended)
gpio.convert('input.shp').sort_hilbert().write('output.parquet')

# Or without sorting (faster but less optimal for spatial queries)
gpio.convert('input.shp').write('output.parquet')
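
To make the bbox column listed above more concrete, here is a minimal standard-library sketch of what a per-feature bounding box holds and why it helps spatial filtering. The field names xmin, ymin, xmax, ymax follow the GeoParquet 1.1 bbox covering convention; the function names feature_bbox and bbox_intersects and the coordinate-list input are illustrative assumptions, not part of geoparquet_io.

```python
from typing import Iterable, Tuple


def feature_bbox(coords: Iterable[Tuple[float, float]]) -> dict:
    """Return the bounding box of (x, y) coordinate pairs as a dict."""
    xs, ys = zip(*coords)
    return {"xmin": min(xs), "ymin": min(ys), "xmax": max(xs), "ymax": max(ys)}


def bbox_intersects(a: dict, b: dict) -> bool:
    """Cheap pre-filter: True if two boxes overlap or touch."""
    return (
        a["xmin"] <= b["xmax"]
        and b["xmin"] <= a["xmax"]
        and a["ymin"] <= b["ymax"]
        and b["ymin"] <= a["ymax"]
    )


# Hypothetical triangle and query window
triangle = feature_bbox([(0.0, 0.0), (2.0, 1.0), (1.0, 3.0)])
window = {"xmin": 1.5, "ymin": 0.5, "xmax": 4.0, "ymax": 2.0}
print(triangle, bbox_intersects(triangle, window))
```

A reader can compare four numbers per feature before decoding any geometry, which is the reason a bbox column tends to speed up spatial filters.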

Supported Formats

Input Formats (to GeoParquet)

Auto-detected by file extension:

  • Shapefile (.shp)
  • GeoJSON (.geojson, .json)
  • GeoPackage (.gpkg)
  • FlatGeobuf (.fgb)
  • File Geodatabase (.gdb)
  • CSV/TSV (.csv, .tsv, .txt) - See CSV/TSV Support below

Any format supported by DuckDB's spatial extension (50+ formats) can be read.

Output Formats (from GeoParquet)

Auto-detected from output file extension:

  • GeoParquet (.parquet) - Optimized cloud-native format
  • GeoPackage (.gpkg) - SQLite-based OGC standard
  • FlatGeobuf (.fgb) - Cloud-native streaming format
  • CSV (.csv) - Tabular with WKT geometry
  • Shapefile (.shp) - Legacy ESRI format
  • GeoJSON (.geojson, .json) - Web-friendly JSON format
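
As an illustration of extension-based detection, the sketch below maps the extensions listed above to format names. It is not the library's implementation: the function name detect_output_format, the table name EXTENSION_FORMATS, and the label "geoparquet" are assumptions (the other names mirror the explicit CLI subcommands).

```python
from pathlib import PurePosixPath

# Extension -> format name, taken from the output format list above.
EXTENSION_FORMATS = {
    ".parquet": "geoparquet",
    ".gpkg": "geopackage",
    ".fgb": "flatgeobuf",
    ".csv": "csv",
    ".shp": "shapefile",
    ".geojson": "geojson",
    ".json": "geojson",
}


def detect_output_format(path: str) -> str:
    """Infer the output format from the file extension (case-insensitive)."""
    suffix = PurePosixPath(path).suffix.lower()
    try:
        return EXTENSION_FORMATS[suffix]
    except KeyError:
        raise ValueError(
            f"Cannot infer format from {path!r}; specify the format explicitly"
        ) from None


print(detect_output_format("output.gpkg"))
print(detect_output_format("s3://bucket/roads.FGB"))
```

An unrecognized extension such as .dat raises an error in this sketch, which is the situation where an explicit format is needed.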

Converting FROM GeoParquet

Convert GeoParquet to other formats with automatic format detection:

# Auto-detects format from extension
gpio convert data.parquet output.gpkg      # → GeoPackage
gpio convert data.parquet output.fgb       # → FlatGeobuf
gpio convert data.parquet output.csv       # → CSV with WKT
gpio convert data.parquet output.shp       # → Shapefile
gpio convert data.parquet output.geojson   # → GeoJSON

# Use explicit subcommand
gpio convert geopackage data.parquet output.gpkg
gpio convert flatgeobuf data.parquet output.fgb
gpio convert csv data.parquet output.csv
gpio convert shapefile data.parquet output.shp
gpio convert geojson data.parquet output.geojson

import geoparquet_io as gpio

# Load and convert
table = gpio.read('data.parquet')

# Auto-detects from extension
table.write('output.gpkg')      # → GeoPackage
table.write('output.fgb')       # → FlatGeobuf
table.write('output.csv')       # → CSV with WKT
table.write('output.shp')       # → Shapefile
table.write('output.geojson')   # → GeoJSON

# Or use explicit format
table.write('output.dat', format='csv')

Format-Specific Options

GeoPackage:

# Custom layer name
gpio convert data.parquet output.gpkg --layer-name buildings

# Overwrite existing
gpio convert data.parquet output.gpkg --overwrite

Shapefile:

# Custom encoding (default: UTF-8)
gpio convert data.parquet output.shp --encoding ISO-8859-1

# Overwrite existing
gpio convert data.parquet output.shp --overwrite

Shapefile Limitations

  • Column names truncated to 10 characters
  • File size limit of 2GB
  • Limited data type support
  • Creates multiple files (.shp, .shx, .dbf, .prj)
  • Consider using GeoPackage or FlatGeobuf instead
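
Before exporting to Shapefile, it can help to check which column names would become indistinguishable after truncation to 10 characters. The sketch below assumes plain prefix truncation; real writers may rename colliding fields differently, and the function name truncation_collisions is hypothetical.

```python
from collections import defaultdict

DBF_NAME_LIMIT = 10  # maximum field-name length noted in the limitations above


def truncation_collisions(columns):
    """Group column names that share the same first 10 characters."""
    groups = defaultdict(list)
    for name in columns:
        groups[name[:DBF_NAME_LIMIT]].append(name)
    # Keep only truncated names claimed by more than one column.
    return {short: names for short, names in groups.items() if len(names) > 1}


# Hypothetical column list
cols = ["population_2020", "population_2021", "name"]
print(truncation_collisions(cols))
```

Renaming the reported columns before conversion avoids ambiguous field names in the resulting .dbf file.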

Remote Shapefile Storage

When writing shapefiles to remote storage (S3, GCS, Azure), all sidecar files (.shp, .shx, .dbf, .prj, etc.) are automatically packaged into a single .shp.zip archive before upload. This ensures atomic uploads and avoids incomplete multi-file uploads.

# Local: Creates output.shp, output.shx, output.dbf, etc.
gpio convert data.parquet output.shp

# Remote: Uploads output.shp.zip containing all files
gpio convert data.parquet s3://bucket/output.shp
# → Creates s3://bucket/output.shp.zip
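
The packaging step can be pictured with the standard zipfile module. This is only a sketch of the idea, not gpio's own code: the function name bundle_shapefile is hypothetical, and the list of sidecar suffixes (including .cpg) is an assumption about what "etc." covers.

```python
import zipfile
from pathlib import Path

# Assumed sidecar set; extend it if your writer emits other companion files.
SIDECAR_SUFFIXES = (".shp", ".shx", ".dbf", ".prj", ".cpg")


def bundle_shapefile(shp_path) -> Path:
    """Zip a shapefile and its existing sidecars into <name>.shp.zip."""
    shp = Path(shp_path)
    archive = shp.with_name(shp.name + ".zip")  # output.shp -> output.shp.zip
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for suffix in SIDECAR_SUFFIXES:
            part = shp.with_suffix(suffix)
            if part.exists():
                # Store files flat so the archive unpacks side by side.
                zf.write(part, arcname=part.name)
    return archive
```

Uploading the single archive means a reader never sees a .shp without its matching .dbf or .shx.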

CSV:

# Include WKT geometry (default)
gpio convert data.parquet output.csv

# Exclude geometry
gpio convert data.parquet output.csv --no-wkt

# Exclude bbox column
gpio convert data.parquet output.csv --no-bbox

GeoJSON:

# Custom precision (default: 7)
gpio convert data.parquet output.geojson --precision 5

# Include bbox for each feature
gpio convert data.parquet output.geojson --write-bbox

# Use specific field as feature ID
gpio convert data.parquet output.geojson --id-field osm_id

# Pretty-print JSON
gpio convert data.parquet output.geojson --pretty
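
To get a feel for the --precision option, the sketch below assumes it means the number of decimal places kept per coordinate and estimates the ground resolution that implies. The helper names are hypothetical, and the 111,320 m per degree figure is an approximation that holds for latitude and for longitude at the equator.

```python
METERS_PER_DEGREE = 111_320  # approximate length of one degree at the equator


def round_coords(coords, precision=7):
    """Round (x, y) pairs to the given number of decimal places."""
    return [(round(x, precision), round(y, precision)) for x, y in coords]


def approx_resolution_m(precision):
    """Rough ground distance represented by the last kept decimal place."""
    return METERS_PER_DEGREE * 10 ** -precision


print(round_coords([(12.3456789, -45.9876543)], precision=5))
print(f"{approx_resolution_m(5):.2f} m at precision 5")
print(f"{approx_resolution_m(7):.4f} m at precision 7")
```

Under these assumptions, 7 decimals resolve roughly a centimeter and 5 decimals roughly a meter, so lowering precision mainly trades file size against detail.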

Cloud Output Support

All formats support cloud destinations via upload:

# Write local then upload
gpio convert data.parquet local.gpkg
gpio publish upload local.gpkg s3://bucket/output.gpkg

import geoparquet_io as gpio

# Write locally first
table = gpio.read('data.parquet')
table.write('local.gpkg')

# Upload to cloud
gpio.upload('local.gpkg', 's3://bucket/output.gpkg')

Remote Files

Read from cloud storage or HTTPS:

# Convert remote file
gpio convert https://example.com/data.geojson local.parquet

# Convert from S3
gpio convert s3://bucket/input.parquet local-optimized.parquet

# Convert remote to local format
gpio convert s3://bucket/data.parquet local.gpkg

See Remote Files Guide for authentication setup.

Options

Skip Hilbert Ordering

For faster conversion when spatial ordering isn't critical:

gpio convert large.gpkg output.parquet --skip-hilbert

Trade-off: Faster conversion but less optimal for spatial queries.
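
For readers curious why Hilbert ordering helps, the following sketch computes a Hilbert curve position for cells on a small grid using the classic bit-manipulation algorithm. It is a teaching example only; the function name hilbert_index is an assumption, and the tool's own implementation may differ.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) on an n x n grid (n a power of two) to its curve position."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so each sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Sorting cells by Hilbert index keeps spatial neighbors close together.
cells = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(cells, key=lambda c: hilbert_index(4, *c))
print(ordered)
```

Every consecutive pair in the sorted list is a grid neighbor, so rows that end up in the same row group tend to cover a compact area. Computing and sorting by this key is the extra work that --skip-hilbert avoids.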

Custom Compression

Control compression type and level:

# GZIP compression
gpio convert input.shp output.parquet --compression GZIP --compression-level 6

# Uncompressed (not recommended)
gpio convert input.geojson output.parquet --compression UNCOMPRESSED

Available compression types:

  • ZSTD (default, level 15) - Best compression + speed balance
  • GZIP (level 1-9) - Wide compatibility
  • BROTLI (level 1-11) - High compression
  • LZ4 - Fastest decompression
  • SNAPPY - Fast compression
  • UNCOMPRESSED - No compression

Verbose Output

Track progress and see detailed information:

gpio convert input.gpkg output.parquet --verbose

Shows:

  • Geometry column detection
  • Dataset bounds calculation
  • Bbox column creation
  • Hilbert ordering progress
  • File size and validation

Examples

Basic Shapefile Conversion

gpio convert buildings.shp buildings.parquet

Output:

Converting buildings.shp...
Done in 2.3s
Output: buildings.parquet (4.2 MB)
✓ Output passes GeoParquet validation

import geoparquet_io as gpio

gpio.convert('buildings.shp').sort_hilbert().write('buildings.parquet')

Large Dataset Without Hilbert

gpio convert large_dataset.gpkg output.parquet --skip-hilbert

import geoparquet_io as gpio

# Python doesn't sort by default, so just skip sort_hilbert()
gpio.convert('large_dataset.gpkg').write('output.parquet')

Skips Hilbert ordering for faster processing on large files.

Custom Compression Settings

gpio convert roads.geojson roads.parquet \
  --compression ZSTD \
  --compression-level 22 \
  --verbose

Maximum ZSTD compression with progress tracking.

Convert and Inspect

# Convert
gpio convert input.shp output.parquet

# Verify
gpio inspect output.parquet

# Validate
gpio check all output.parquet
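
When the convert, inspect, and check steps need to run from a script, one option is to drive the CLI through subprocess. The sketch below uses only the commands and the --skip-hilbert flag shown in this guide; the function names build_commands and run_pipeline are hypothetical.

```python
import shutil
import subprocess


def build_commands(src: str, dst: str, skip_hilbert: bool = False):
    """Return the argv lists for convert, inspect, and check, in order."""
    convert = ["gpio", "convert", src, dst]
    if skip_hilbert:
        convert.append("--skip-hilbert")
    return [convert, ["gpio", "inspect", dst], ["gpio", "check", "all", dst]]


def run_pipeline(src: str, dst: str, skip_hilbert: bool = False) -> None:
    """Run each step in turn, stopping at the first failure."""
    if shutil.which("gpio") is None:
        raise RuntimeError("gpio executable not found on PATH")
    for cmd in build_commands(src, dst, skip_hilbert):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit


for cmd in build_commands("input.shp", "output.parquet"):
    print(" ".join(cmd))
```

Passing argument lists rather than a shell string avoids quoting problems with file names that contain spaces.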

CSV/TSV Support

The converter auto-detects geometry columns. WKT columns (wkt, geometry, geom) are checked first, then lat/lon pairs (lat/lon, latitude/longitude).

# Auto-detect WKT or lat/lon
gpio convert points.csv points.parquet

# Explicit columns
gpio convert data.csv out.parquet --wkt-column geom_wkt
gpio convert data.csv out.parquet --lat-column lat --lon-column lng

# Custom delimiter
gpio convert data.txt out.parquet --delimiter "|"
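
The detection order described above (WKT names first, then lat/lon pairs) can be sketched as follows. This is not the tool's implementation: the function name detect_geometry_columns is hypothetical, and case-insensitive matching is an assumption.

```python
WKT_NAMES = ("wkt", "geometry", "geom")
LATLON_PAIRS = (("lat", "lon"), ("latitude", "longitude"))


def detect_geometry_columns(header):
    """Return ('wkt', col), ('latlon', lat_col, lon_col), or None."""
    # Assumption: names are matched case-insensitively.
    lowered = {name.lower(): name for name in header}
    for candidate in WKT_NAMES:
        if candidate in lowered:
            return ("wkt", lowered[candidate])
    for lat, lon in LATLON_PAIRS:
        if lat in lowered and lon in lowered:
            return ("latlon", lowered[lat], lowered[lon])
    return None


print(detect_geometry_columns(["id", "lat", "lon", "geom"]))
print(detect_geometry_columns(["id", "Latitude", "Longitude"]))
print(detect_geometry_columns(["id", "lat", "lng"]))
```

The last call returns None in this sketch, which corresponds to the case where explicit --lat-column and --lon-column options are needed.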

CRS and Validation

Default: WGS84 (EPSG:4326). Override with --crs for WKT data:

gpio convert projected.csv out.parquet --crs EPSG:3857

Validates lat/lon ranges (-90 to 90, -180 to 180). Warns on large coordinates suggesting projected CRS.
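
A minimal sketch of the two checks, for anyone who wants to pre-screen a CSV before converting it. The function names and the 180-degree threshold used for the projected-CRS hint are assumptions; the tool's actual heuristic is not documented here.

```python
def check_lat_lon(lat: float, lon: float) -> bool:
    """True if the pair is a valid WGS84 latitude/longitude."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def looks_projected(x: float, y: float, threshold: float = 180.0) -> bool:
    """Hypothetical heuristic: values beyond any valid degree suggest a projected CRS."""
    return abs(x) > threshold or abs(y) > threshold


print(check_lat_lon(45.0, 120.0))
print(check_lat_lon(120.0, 45.0))          # swapped columns fail the latitude range
print(looks_projected(500000.0, 4649776.0))
```

A failed range check on otherwise plausible numbers often means the latitude and longitude columns are swapped.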

Invalid Geometries

Fails on invalid WKT by default. Skip with --skip-invalid:

gpio convert messy.csv out.parquet --skip-invalid

With this flag, rows with invalid geometries are skipped and Hilbert ordering is disabled. Mixed geometry types are supported.

Delimiters

Auto-detects comma and tab. Override with --delimiter for semicolon, pipe, or any single character.

gpio convert data.csv out.parquet --delimiter ";"
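
One simple way to picture comma/tab auto-detection with an explicit override is sketched below using the standard csv module. The counting heuristic and the function names detect_delimiter and read_rows are assumptions, not the tool's actual logic.

```python
import csv
import io


def detect_delimiter(header_line: str) -> str:
    """Choose between comma and tab by counting occurrences in the header line."""
    return "\t" if header_line.count("\t") > header_line.count(",") else ","


def read_rows(text: str, delimiter=None):
    """Parse delimited text, auto-detecting comma or tab unless overridden."""
    first_line = text.splitlines()[0] if text else ""
    delim = delimiter or detect_delimiter(first_line)
    return list(csv.reader(io.StringIO(text), delimiter=delim))


print(read_rows("id\tlat\tlon\n1\t40.7\t-73.9\n"))
print(read_rows("id;name\n1;a\n", delimiter=";"))
```

Semicolon- or pipe-separated files are not guessed in this sketch, mirroring the need to pass --delimiter for them.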

Performance

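The convert command uses DuckDB's spatial extension, which is among the fastest options for GeoParquet conversion, especially for large files. In the benchmarks below it is close to ogr2ogr on the smaller dataset and clearly fastest on the larger one.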

Benchmarks on representative datasets:

Dataset             Size     Features   DuckDB   PyOGRIO   ogr2ogr   Fiona
GAUL L2 Shapefile   739 MB   45k        4.6s     5.9s      4.1s      187s
Argentina Roads     1.1 GB   3.5M       30s      66s       117s      349s

DuckDB also uses significantly less memory than alternatives (near-zero vs 600MB-2GB for GeoPandas).

To run your own benchmarks:

gpio benchmark input.geojson --iterations 3

See gpio benchmark for details.

See Also