Uploading to Cloud Storage

The gpio publish upload command uploads files and directories to cloud object storage, supporting S3, GCS, Azure, and HTTP destinations.

Quick Start

# Upload single file to S3
gpio publish upload data.parquet s3://bucket/path/data.parquet

# Upload directory (preserves structure)
gpio publish upload output/ s3://bucket/dataset/

# With AWS profile
gpio publish upload data.parquet s3://bucket/data.parquet --aws-profile my-profile

Python API

from geoparquet_io import Table

table = Table("data.parquet")

# Upload to S3
table.upload("s3://bucket/path/data.parquet", aws_profile="my-profile")

Supported Destinations

| Destination          | URL Format           | Example                          |
|----------------------|----------------------|----------------------------------|
| Amazon S3            | s3://                | s3://my-bucket/path/file.parquet |
| Google Cloud Storage | gs://                | gs://my-bucket/path/file.parquet |
| Azure Blob Storage   | az://                | az://container/path/file.parquet |
| HTTP/HTTPS           | http:// or https://  | https://api.example.com/upload   |
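The backend is chosen from the URL scheme. A minimal sketch of that dispatch using only the Python standard library (the mapping and function below are illustrative, not gpio's internal code):

```python
from urllib.parse import urlparse

# Illustrative scheme-to-backend mapping; gpio's actual dispatch may differ.
SCHEMES = {
    "s3": "Amazon S3",
    "gs": "Google Cloud Storage",
    "az": "Azure Blob Storage",
    "http": "HTTP",
    "https": "HTTP",
}

def detect_backend(url: str) -> str:
    """Return the backend name for a destination URL, or raise on unknown schemes."""
    scheme = urlparse(url).scheme
    if scheme not in SCHEMES:
        raise ValueError(f"Unsupported destination: {url}")
    return SCHEMES[scheme]
```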

Directory Uploads

When uploading directories, gpio preserves the directory structure and uploads files in parallel:

# Upload all files
gpio publish upload output/ s3://bucket/dataset/

# Only parquet files
gpio publish upload output/ s3://bucket/dataset/ --pattern "*.parquet"

# Increase parallelism
gpio publish upload output/ s3://bucket/dataset/ --max-files 8
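The behavior behind these flags can be sketched with the standard library: walk the source tree, keep paths relative to the root (so structure is preserved), filter filenames against the glob pattern, and run uploads in a thread pool. The upload callable is a stand-in, not gpio's implementation:

```python
import fnmatch
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def plan_keys(src: Path, dest_prefix: str, pattern: str = "*") -> dict:
    """Map destination keys to local files, preserving directory structure."""
    keys = {}
    for path in sorted(src.rglob("*")):
        if path.is_file() and fnmatch.fnmatch(path.name, pattern):
            rel = path.relative_to(src).as_posix()
            keys[dest_prefix.rstrip("/") + "/" + rel] = path
    return keys

def upload_all(src: Path, dest_prefix: str, upload_one, pattern="*", max_files=4):
    """Upload matching files concurrently; upload_one(path, key) is a stand-in."""
    keys = plan_keys(src, dest_prefix, pattern)
    with ThreadPoolExecutor(max_workers=max_files) as pool:
        # Consume the iterator so any upload error is raised here.
        list(pool.map(lambda kv: upload_one(kv[1], kv[0]), keys.items()))
    return sorted(keys)
```

Here `pattern` plays the role of --pattern and `max_files` the role of --max-files.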

AWS Configuration

Using AWS Profiles

gpio publish upload data.parquet s3://bucket/data.parquet --aws-profile source-coop

S3-Compatible Endpoints

For MinIO, Wasabi, or other S3-compatible storage:

gpio publish upload data.parquet s3://bucket/data.parquet \
  --s3-endpoint minio.example.com:9000 \
  --s3-region us-east-1
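Conceptually, the endpoint flag names a host to contact instead of the default AWS endpoint, and the SSL flag (see below) picks the scheme. A small sketch of that normalization, assuming a host[:port] input; gpio's exact handling may differ:

```python
def endpoint_url(host: str, use_ssl: bool = True) -> str:
    """Build a full endpoint URL from a host[:port], roughly as
    --s3-endpoint and --s3-no-ssl combine (illustrative only)."""
    if host.startswith(("http://", "https://")):
        return host  # already a full URL; keep as-is
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}"
```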

Disable SSL

For local development or non-SSL endpoints:

gpio publish upload data.parquet s3://bucket/data.parquet \
  --s3-endpoint localhost:9000 \
  --s3-no-ssl

Multipart Uploads

Large files are automatically uploaded using multipart uploads:

# 100 MB chunks (104857600 bytes), 12 concurrent parts per file
gpio publish upload large.parquet s3://bucket/large.parquet \
  --chunk-size 104857600 \
  --chunk-concurrency 12
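The relationship between file size and chunk size is just ceiling division: the file is split into fixed-size parts, with a smaller final part. A quick sketch of that arithmetic (not gpio's code):

```python
import math

def part_count(file_size: int, chunk_size: int) -> int:
    """Number of multipart parts for a file: ceiling division, minimum 1."""
    return max(1, math.ceil(file_size / chunk_size))

# With 100 MB (104857600-byte) chunks, a 1 GiB file splits into 11 parts:
# ten full chunks plus one partial final chunk.
```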

Error Handling

By default, directory uploads continue when individual files fail. Use --fail-fast to stop at the first error:

gpio publish upload output/ s3://bucket/dataset/ --fail-fast
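The two modes can be sketched as a loop that either collects failures and keeps going, or re-raises the first one. The upload callable is again a stand-in:

```python
def upload_batch(files, upload_one, fail_fast=False):
    """Attempt each upload. Without fail_fast, collect (file, error) pairs
    and continue; with fail_fast, re-raise the first error immediately."""
    failures = []
    for f in files:
        try:
            upload_one(f)
        except Exception as exc:
            if fail_fast:
                raise
            failures.append((f, exc))
    return failures
```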

Dry Run

Preview what would be uploaded without actually uploading:

gpio publish upload output/ s3://bucket/dataset/ --dry-run
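A dry run amounts to computing the transfer plan without performing any transfer. A minimal sketch of that idea (illustrative; gpio's output format is its own):

```python
from pathlib import Path

def plan_upload(src: Path, dest_prefix: str):
    """List (local file, destination key) pairs without uploading anything --
    the essence of a dry run."""
    prefix = dest_prefix.rstrip("/")
    return [
        (p, f"{prefix}/{p.relative_to(src).as_posix()}")
        for p in sorted(src.rglob("*"))
        if p.is_file()
    ]
```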

CLI Reference

See the CLI Reference for complete options.