Troubleshooting¶

Common issues and solutions when using geoparquet-io.

Installation Issues¶

DuckDB Installation Fails¶

Symptom: Error installing DuckDB on certain platforms.

Solution: Upgrade pip and try again:

pip install --upgrade pip
pip install duckdb

On Apple Silicon (M1/M2/M3), make sure you are running a native ARM build of Python rather than an x86_64 build under Rosetta.
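One way to check which build you are running is to ask the interpreter itself. The sketch below is illustrative only: the function name `describe_python_arch` is hypothetical, and it relies on `platform.machine()` reporting `arm64` for native builds and `x86_64` for Intel builds (including those running under Rosetta).

```python
import platform


def describe_python_arch(system=None, machine=None):
    """Classify the running interpreter's architecture on macOS."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system != "Darwin":
        return "not macOS"
    # Native Apple Silicon builds report arm64; x86_64 builds report x86_64
    # even when they run under Rosetta on an ARM machine.
    if machine == "arm64":
        return "native ARM"
    return "x86_64 build (may be running under Rosetta)"


print(describe_python_arch())
```

If this reports an x86_64 build on an Apple Silicon machine, installing an ARM build of Python and recreating the environment is a reasonable next step.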

PyArrow Version Conflicts¶

Symptom: Version conflicts with other geospatial packages.

Solution: Use a fresh virtual environment:

python -m venv gpio-env
source gpio-env/bin/activate  # Windows: gpio-env\Scripts\activate
pip install geoparquet-io
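To confirm that the fresh environment is active and to see which versions were actually installed, a short script such as the following may help. It is a sketch using only the standard library; the helper names are hypothetical, and the package list simply mirrors the packages mentioned in this section.

```python
import sys
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def in_virtualenv():
    # Inside a venv, sys.prefix points at the environment and
    # sys.base_prefix at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print(installed_versions(["geoparquet-io", "duckdb", "pyarrow"]))
```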

File Access Issues¶

"File not found" for Remote URLs¶

Symptom: Error accessing S3, GCS, or HTTPS files.

Solutions:

Verify the URL is correct and accessible
Check authentication (see below)
For S3, ensure the bucket region is correct
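A quick structural check of the URL can catch typos such as a missing slash before any network request is made. The following is a minimal sketch, not part of gpio; `split_remote_url` is a hypothetical helper and the accepted schemes are limited to the ones named above.

```python
from urllib.parse import urlparse


def split_remote_url(url):
    """Return (scheme, bucket_or_host, key_or_path) for a remote URL."""
    parsed = urlparse(url)
    if parsed.scheme not in {"s3", "gs", "https", "http"}:
        raise ValueError(f"unexpected scheme in {url!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket or host in {url!r}")
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


print(split_remote_url("s3://bucket/file.parquet"))
```

This only validates the shape of the URL. Whether the object exists and is readable still depends on permissions and region.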

S3 Authentication Errors¶

Symptom: Access denied or credentials errors for S3 files.

Solutions:

# Option 1: Use AWS profile
gpio inspect s3://bucket/file.parquet --profile my-profile

# Option 2: Set environment variables
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
gpio inspect s3://bucket/file.parquet

# Option 3: Use default credentials
aws configure  # Set up ~/.aws/credentials
gpio inspect s3://bucket/file.parquet
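Before retrying a command, it can be useful to see which of these credential sources are present on the machine. The sketch below checks only for the environment variables and the `~/.aws/credentials` file mentioned above. The function name is hypothetical, and it makes no claim about the order in which gpio or DuckDB resolves credentials.

```python
import os
from pathlib import Path


def s3_credential_sources(env=None, credentials_file=None):
    """List which of the credential sources above are present."""
    env = os.environ if env is None else env
    if credentials_file is None:
        credentials_file = Path.home() / ".aws" / "credentials"
    found = []
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        found.append("environment variables")
    if Path(credentials_file).is_file():
        found.append("credentials file")
    return found


print(s3_credential_sources() or "no S3 credentials found")
```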

Azure Blob Storage Issues¶

Symptom: Cannot read from Azure Blob Storage.

Solution: Set Azure credentials:

export AZURE_STORAGE_ACCOUNT_NAME=myaccount
export AZURE_STORAGE_ACCOUNT_KEY=mykey
# Or use SAS token
export AZURE_STORAGE_SAS_TOKEN=mytoken

GCS Requires HMAC Keys¶

Symptom: GCS authentication fails with service account.

Solution: DuckDB requires HMAC keys for GCS:

# Generate HMAC keys at: https://console.cloud.google.com/storage/settings
export GCS_ACCESS_KEY_ID=your_access_key
export GCS_SECRET_ACCESS_KEY=your_secret_key
gpio inspect gs://bucket/file.parquet

Windows-Specific Issues¶

File Locking Errors¶

Symptom: "The process cannot access the file because it is being used by another process"

Cause: DuckDB keeps file handles open, preventing cleanup. Unlike Unix, Windows doesn't allow deleting files that have open handles.

Solutions for Users:

Close any other applications accessing the file
Use unique output filenames (avoid overwriting)
Run operations sequentially, not in parallel

Solutions for Developers/Contributors:

When writing tests that create temporary files and use DuckDB to read them:

Use safe_unlink from conftest.py instead of Path.unlink():

from tests.conftest import safe_unlink

# In your test cleanup:
safe_unlink(temp_path)  # Retries with delays for Windows

Use fixtures like temp_output_dir which handle cleanup automatically:

import os

def test_example(temp_output_dir):
    output_file = os.path.join(temp_output_dir, "output.parquet")
    # ... test code ...
    # Cleanup handled automatically by fixture

Use safe_rmtree for directories:

from tests.conftest import safe_rmtree
safe_rmtree(temp_dir)  # Retries with delays for Windows

The safe_unlink and safe_rmtree functions retry file deletion up to 5 times with 0.1s delays between attempts, which gives Windows time to release file handles.
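The retry behaviour described above can be sketched as follows. This is not the project's actual `safe_unlink` implementation, only an illustration of the pattern under the stated parameters (5 attempts, 0.1s delay); `unlink_with_retry` is a hypothetical name.

```python
import os
import tempfile
import time


def unlink_with_retry(path, attempts=5, delay=0.1):
    """Delete a file, retrying when another process still holds it open."""
    for attempt in range(attempts):
        try:
            os.unlink(path)
            return True
        except FileNotFoundError:
            return True  # already gone, nothing to clean up
        except PermissionError:
            # Windows raises PermissionError while a handle is still open.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False


fd, temp_path = tempfile.mkstemp(suffix=".parquet")
os.close(fd)  # release our own handle before deleting
print(unlink_with_retry(temp_path))
```

Returning `False` instead of raising lets test cleanup continue even when a handle is never released. Whether that trade-off suits a given test suite is a design choice.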

Path Issues with Spaces¶

Symptom: Commands fail when file paths contain spaces.

Solution: Quote paths with spaces:

gpio inspect "C:\Users\My Name\data file.parquet"
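When gpio is launched from a Python script rather than typed into a shell, passing the arguments as a list avoids manual quoting altogether. A minimal sketch, using a hypothetical path:

```python
import shlex

# Hypothetical path containing spaces, used only for illustration.
path = "/data/My Name/data file.parquet"

# One list element per argument: spaces inside an element are preserved,
# so no manual quoting is needed when this list is passed to subprocess.run.
args = ["gpio", "inspect", path]

# shlex.join shows the equivalent POSIX shell command line.
print(shlex.join(args))
```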

Performance Issues¶

Slow Operations on Large Files¶

Symptom: Commands take a long time on large files.

Solutions:

Skip Hilbert for conversion: gpio convert input.shp output.parquet --skip-hilbert
Use --limit for testing: gpio extract input.parquet sample.parquet --limit 1000
Process locally: For very large remote files (> 10 GB), download them before processing

Out of Memory Errors¶

Symptom: Process killed or memory errors on large files.

Solutions:

Use the default write strategy - gpio automatically streams data with constant memory:
```
gpio extract large_file.parquet output.parquet --bbox -122.5,37.5,-122.0,38.0
```
Set explicit memory limit - For containerized environments or tight constraints:
```
gpio extract input.parquet output.parquet --write-memory 512MB
```

Try streaming strategy - Alternative if default still uses too much memory:

gpio extract input.parquet output.parquet --write-strategy streaming --write-memory 256MB

Process in chunks using partitioning - For extremely large datasets:

gpio partition input.parquet output/ --by-quadkey --quadkey-resolution 4

Increase system swap space or use a machine with more RAM

Slow Writes in Containers¶

Symptom: Write operations are slow in Docker or Kubernetes.

Cause: gpio may not correctly detect container memory limits in some configurations.

Solutions:

Set explicit memory limit - Tell gpio exactly how much memory to use:
```
gpio extract input.parquet output.parquet --write-memory 1GB
```

Verify container memory limits - Ensure your container has enough memory:

docker run -m 4g my-gpio-image gpio extract input.parquet output.parquet

Check cgroup version - gpio supports both cgroup v1 and v2. Verify your container runtime is configured correctly.
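To see which memory limit the container actually exposes, you can read the cgroup files directly. The sketch below is an assumption about how such a check might look, not gpio's own detection logic. It uses the standard cgroup v2 (`memory.max`) and v1 (`memory/memory.limit_in_bytes`) locations, and `container_memory_limit` is a hypothetical name.

```python
from pathlib import Path


def container_memory_limit(cgroup_root="/sys/fs/cgroup"):
    """Return the cgroup memory limit in bytes, or None if none is found."""
    candidates = [
        Path(cgroup_root) / "memory.max",  # cgroup v2
        Path(cgroup_root) / "memory" / "memory.limit_in_bytes",  # cgroup v1
    ]
    for path in candidates:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue
        if raw == "max":  # cgroup v2 spelling of "unlimited"
            return None
        # cgroup v1 reports "unlimited" as a very large number; it is
        # returned unchanged here.
        return int(raw)
    return None


print(container_memory_limit())
```

If the reported limit is `None` or far larger than the memory the container really has, passing an explicit `--write-memory` value is the safer option.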

Write Strategy Selection¶

Symptom: Need to choose between different write strategies.

When to use each strategy:

| Scenario | Strategy | Command |
|---|---|---|
| Default (any file size) | `duckdb-kv` | (no flag needed) |
| Verify output correctness | `in-memory` | `--write-strategy in-memory` |
| DuckDB has issues | `streaming` | `--write-strategy streaming` |
| Maximum compatibility | `disk-rewrite` | `--write-strategy disk-rewrite` |

Example - Debugging output differences:

# 1. Write with default
gpio extract input.parquet output_default.parquet --bbox 0,0,10,10

# 2. Write with in-memory to verify
gpio extract input.parquet output_verify.parquet --bbox 0,0,10,10 --write-strategy in-memory

# 3. Compare
gpio inspect output_default.parquet --stats
gpio inspect output_verify.parquet --stats

See the Write Strategies Guide for detailed information.

GeoParquet Issues¶

"No geometry column found"¶

Symptom: Error about missing geometry column.

Solutions:

Verify that the file is actually GeoParquet: gpio inspect file.parquet
Check whether the geometry column has a different name
Specify the geometry column explicitly if the command supports it

CRS Warning: Coordinates Look Wrong¶

Symptom: Warning about coordinate ranges not matching CRS.

Cause: The data may be in a projected CRS while the metadata declares WGS84 (or vice versa).

Solutions:

Check actual coordinate ranges: gpio inspect file.parquet --stats
Convert with correct CRS: gpio convert data.csv output.parquet --crs EPSG:3857
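Once you have the coordinate ranges, a simple bounds test tells you whether they could plausibly be degrees. This is a rough heuristic sketch with a hypothetical function name and illustrative values. It cannot identify which projected CRS the data uses, only that values outside the longitude/latitude ranges are not WGS84 degrees.

```python
def looks_like_degrees(xmin, ymin, xmax, ymax):
    """True when a bounding box fits inside longitude/latitude ranges."""
    return -180 <= xmin <= xmax <= 180 and -90 <= ymin <= ymax <= 90


# Illustrative values: a bbox in degrees, then a roughly equivalent
# bbox in Web Mercator metres.
print(looks_like_degrees(-122.5, 37.5, -122.0, 38.0))
print(looks_like_degrees(-13636000, 4509000, -13580000, 4579000))
```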

Bbox Column Exists But No Covering Metadata¶

Symptom: gpio check bbox warns about missing covering metadata.

Solution: Add just the metadata (doesn't rewrite data):

gpio add bbox-metadata myfile.parquet

Command-Specific Issues¶

Extract WHERE Clause Errors¶

Symptom: SQL syntax errors with special column names.

Solution: Quote column names with special characters:

# Columns with colons, dashes, dots need double quotes in SQL
gpio extract data.parquet output.parquet --where '"crop:name" = '\''wheat'\'''

# Use --dry-run to preview the SQL
gpio extract data.parquet output.parquet --where "status = 'active'" --dry-run
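If the WHERE clause is assembled in a script, building it programmatically avoids hand-written shell escapes. The sketch below applies standard SQL quoting rules (double quotes for identifiers, single quotes for string literals, each doubled when embedded). The helper names are hypothetical and not part of gpio.

```python
import shlex


def quote_identifier(name):
    """Wrap a column name in double quotes, doubling embedded ones."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Wrap a string value in single quotes, doubling embedded ones."""
    return "'" + value.replace("'", "''") + "'"


where = f"{quote_identifier('crop:name')} = {quote_literal('wheat')}"
print(where)
# shlex.quote produces a form that is safe to paste into a POSIX shell.
print("gpio extract data.parquet output.parquet --where", shlex.quote(where))
```

Running the printed command with `--dry-run` first is a cheap way to confirm the generated SQL is what you intended.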

Partition Preview Shows No Output¶

Symptom: gpio partition --preview shows no partitions.

Cause: The partition column is empty or contains only null values.

Solution: Check column values first:

gpio inspect file.parquet --stats

Convert Fails on CSV with WKT¶

Symptom: Error parsing WKT geometry from CSV.

Solutions:

Check WKT syntax is valid

Use --skip-invalid to skip bad rows:

gpio convert data.csv output.parquet --skip-invalid
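To find the offending rows before converting, a rough pre-check can flag WKT values that do not start with a known geometry type or have unbalanced parentheses. This sketch is not a full WKT parser and does not reflect how gpio validates input. The column name `wkt` and the helper name are assumptions for illustration.

```python
import csv
import io
import re

WKT_PREFIX = re.compile(
    r"^\s*(POINT|LINESTRING|POLYGON|MULTIPOINT|MULTILINESTRING|"
    r"MULTIPOLYGON|GEOMETRYCOLLECTION)\b",
    re.IGNORECASE,
)


def suspect_wkt_rows(csv_file, column="wkt"):
    """Return 1-based data row numbers whose WKT fails a rough check."""
    bad = []
    for number, row in enumerate(csv.DictReader(csv_file), start=1):
        text = row.get(column) or ""
        balanced = text.count("(") == text.count(")")
        if not (WKT_PREFIX.match(text) and balanced):
            bad.append(number)
    return bad


# Row 2 is truncated and row 3 has no geometry at all.
sample = io.StringIO('id,wkt\n1,"POINT (1 2)"\n2,"POLYGON ((0 0, 1 1"\n3,\n')
print(suspect_wkt_rows(sample))
```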

Getting Help¶

Debug Information¶

Use --verbose for detailed output:

gpio convert input.shp output.parquet --verbose

Use --dry-run to preview SQL without executing:

gpio extract data.parquet output.parquet --where "x > 1" --dry-run

Reporting Issues¶

When reporting issues, include:

Command you ran
Error message
Output of gpio --version
Python version: python --version
Operating system
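The version and platform items in this list can be gathered in one step. The following is a small convenience sketch using only the standard library. `collect_report_info` is a hypothetical helper, and it assumes `gpio --version` prints its version to standard output or standard error.

```python
import platform
import shutil
import subprocess
import sys


def collect_report_info():
    """Gather version and platform details for an issue report."""
    info = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "gpio": "not found on PATH",
    }
    if shutil.which("gpio"):
        result = subprocess.run(
            ["gpio", "--version"], capture_output=True, text=True, check=False
        )
        info["gpio"] = (result.stdout or result.stderr).strip()
    return info


for key, value in collect_report_info().items():
    print(f"{key}: {value}")
```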

File issues at: GitHub Issues