Uploading to Cloud Storage¶
The gpio publish upload command uploads files and directories to cloud object storage, supporting S3, GCS, Azure, and HTTP destinations.
Quick Start¶
# Upload single file to S3
gpio publish upload data.parquet s3://bucket/path/data.parquet
# Upload directory (preserves structure)
gpio publish upload output/ s3://bucket/dataset/
# With AWS profile
gpio publish upload data.parquet s3://bucket/data.parquet --aws-profile my-profile
from geoparquet_io import Table
table = Table("data.parquet")
# Upload to S3
table.upload("s3://bucket/path/data.parquet", aws_profile="my-profile")
Supported Destinations¶
| Destination | URL Format | Example |
|---|---|---|
| Amazon S3 | s3:// |
s3://my-bucket/path/file.parquet |
| Google Cloud Storage | gs:// |
gs://my-bucket/path/file.parquet |
| Azure Blob Storage | az:// |
az://container/path/file.parquet |
| HTTP/HTTPS | http:// or https:// |
https://api.example.com/upload |
Directory Uploads¶
When uploading directories, gpio preserves the directory structure and uploads files in parallel:
# Upload all files
gpio publish upload output/ s3://bucket/dataset/
# Only parquet files
gpio publish upload output/ s3://bucket/dataset/ --pattern "*.parquet"
# Increase parallelism
gpio publish upload output/ s3://bucket/dataset/ --max-files 8
AWS Configuration¶
Using AWS Profiles¶
gpio publish upload data.parquet s3://bucket/data.parquet --aws-profile source-coop
S3-Compatible Endpoints¶
For MinIO, Wasabi, or other S3-compatible storage:
gpio publish upload data.parquet s3://bucket/data.parquet \
--s3-endpoint minio.example.com:9000 \
--s3-region us-east-1
Disable SSL¶
For local development or non-SSL endpoints:
gpio publish upload data.parquet s3://bucket/data.parquet \
--s3-endpoint localhost:9000 \
--s3-no-ssl
Multipart Uploads¶
Large files are automatically uploaded using multipart uploads:
# Customize chunk settings
gpio publish upload large.parquet s3://bucket/large.parquet \
--chunk-size 104857600 \
--chunk-concurrency 12
Error Handling¶
By default, directory uploads continue on errors. Use --fail-fast to stop on first error:
gpio publish upload output/ s3://bucket/dataset/ --fail-fast
Dry Run¶
Preview what would be uploaded without actually uploading:
gpio publish upload output/ s3://bucket/dataset/ --dry-run
CLI Reference¶
See the CLI Reference for complete options.