Skip to content

benchmark Command

Benchmark GeoParquet conversion and operation performance.

Subcommands

Subcommand Description
compare Compare converter performance on a single file
explain Show DuckDB query plan analysis (EXPLAIN ANALYZE)
suite Run comprehensive benchmark suite
report View and compare benchmark results

gpio benchmark compare

Compare converter performance on a single file.

Quick Reference

gpio benchmark compare INPUT_FILE [OPTIONS]

Available Converters

Converter Description Install
duckdb DuckDB spatial extension Always available
geopandas_fiona GeoPandas with Fiona engine geopandas, fiona
geopandas_pyogrio GeoPandas with PyOGRIO engine geopandas, pyogrio
gdal_ogr2ogr GDAL ogr2ogr CLI System GDAL installation

Install all optional converters:

uv pip install geoparquet-io[benchmark]
# or: pip install geoparquet-io[benchmark]

Options

Option Default Description
--iterations N 3 Number of iterations per converter
--converters LIST all available Comma-separated list of converters to run
--output-json PATH - Save results to JSON file
--keep-output DIR temp (cleaned up) Directory to save converted files
--warmup/--no-warmup enabled Run warmup iteration before timing
--format table\|json table Output format
--quiet - Suppress progress output

Examples

# Basic comparison
gpio benchmark compare input.geojson

# Specific converters with more iterations
gpio benchmark compare input.shp --converters duckdb,geopandas_pyogrio --iterations 5

# Save results and keep output files
gpio benchmark compare input.gpkg --output-json results.json --keep-output ./output

# JSON output
gpio benchmark compare input.geojson --format json

Output

Table Format (default)

======================================================================
BENCHMARK RESULTS
======================================================================

File: ARG.geojson
  Format: .geojson
  Features: 3,486,802
  Size: 1120.15 MB
  Geometry: LINESTRING

Converter                 Time (s)           Memory (MB)
-------------------------------------------------------------
DuckDB                    29.751 +/- 0.443   0.0 +/- 0.0
GeoPandas (PyOGRIO)       59.957 +/- 1.078   1196.7 +/- 0.0

Fastest: DuckDB (29.751s)

gpio benchmark explain

Show DuckDB query plan analysis using EXPLAIN ANALYZE.

Quick Reference

gpio benchmark explain INPUT_FILE [OPTIONS]

Options

Option Default Description
--query, -q SELECT * Custom SQL query (use {file} as placeholder)
--format table Output format: table or json
--output, -o - Save results to JSON file

Examples

# Basic query plan analysis
gpio benchmark explain input.parquet

# JSON output
gpio benchmark explain input.parquet --format json

# Custom query with filter (to test pushdown)
gpio benchmark explain input.parquet --query "SELECT * FROM read_parquet('{file}') WHERE id > 10"

# Save to file
gpio benchmark explain input.parquet --output plan.json

Output

The explain command shows:

  • Operators: Query plan operators with timing and row counts
  • Filter pushdown: Whether filters are pushed to the Parquet reader
  • Row group pruning: Whether row groups are skipped based on metadata

Table Format (default)

======================================================================
QUERY PLAN ANALYSIS
======================================================================

Operator                            Time (s)     Rows
-----------------------------------------------------------
PROJECTION                          0.000500     100
  PARQUET_SCAN                      0.001000     100
    File: input.parquet
    Filters: id>10
    Row Groups: 1/3

Total time: 0.001500s

Observations:
  Filter pushdown: detected
  Row group pruning: detected

gpio benchmark suite

Run comprehensive benchmark suite across multiple operations.

gpio benchmark suite [OPTIONS]

Runs a configurable suite of gpio operations (convert, add, sort, partition) on test files and generates detailed reports.

gpio benchmark report

View and compare benchmark results from previous runs.

gpio benchmark report [OPTIONS] [RESULT_FILES]...

Examples

# View single result file
gpio benchmark report results.json

# Compare multiple runs
gpio benchmark report results/*.json

Interpreting Results

  • Time: Mean elapsed seconds +/- standard deviation
  • Memory: Peak memory usage in MB (Python tracemalloc for in-process, psutil for external)
  • DuckDB shows 0 MB because it manages memory outside Python's allocator

See Also