
Configuration Reference

All ByteFreezer services read configuration from a YAML file at startup. Docker images ship with defaults that work for Docker Compose + MinIO deployments — only secrets need to be provided.

Config File Locations

Service Config Path Default Endpoint
Proxy /etc/bytefreezer-proxy/config.yaml minio:9000
Receiver /etc/bytefreezer-receiver/config.yaml minio:9000
Piper /etc/bytefreezer-piper/config.yaml minio:9000
Packer /etc/bytefreezer-packer/config.yaml minio:9000

Bundled Docker Defaults

Docker images include a config.yaml with defaults ready for Docker Compose + MinIO deployments:

  • S3 endpoint set to minio:9000 (Docker service name)
  • Receiver URL set to http://receiver:8080 (Docker service name)
  • All secrets (API keys, S3 credentials) are empty strings
  • Correct bucket names, ports, and intervals

When deploying with Docker Compose, mount a config file to override the bundled defaults, or pass secrets via environment variables. Config files generated by the MCP tools or installer override these defaults entirely.
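For example, a Compose service definition that mounts a local config file over the bundled default might look like the following sketch (the image name and local file path are illustrative; the container path comes from the table above):

```yaml
# Hypothetical docker-compose.yml fragment: a local config file is
# mounted read-only over the bundled /etc/bytefreezer-proxy/config.yaml.
services:
  proxy:
    image: bytefreezer/proxy:latest   # image name is an assumption
    volumes:
      - ./config/proxy.yaml:/etc/bytefreezer-proxy/config.yaml:ro
```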

Config File vs Volume Mount Priority

  1. Bundled config (baked into image) — lowest priority
  2. Volume-mounted config (e.g., ./config/proxy.yaml:/etc/bytefreezer-proxy/config.yaml:ro) — overrides bundled
  3. Environment variables — not currently supported for nested YAML keys due to underscore mapping limitations

Environment Variable Limitation

Services use koanf for config loading. The environment-variable mapping converts every underscore to a dot, so keys that themselves contain underscores are mangled (e.g., access_key becomes access.key). Use config files for S3 credentials and other keys containing underscores.
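To illustrate the limitation (the exact env var prefix is an assumption): an env var intended for s3destination.access_key would map to a nonexistent nested key, so credentials belong in the mounted config file instead:

```yaml
# An env override such as S3DESTINATION_ACCESS_KEY would map to
# s3destination.access.key (every "_" becomes "."), NOT to
# s3destination.access_key -- so it would be silently ignored.
# Set credentials in the config file instead:
s3destination:
  access_key: "AKIA..."   # placeholder value
  secret_key: "wJal..."   # placeholder value
```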

Complete Config Keys

Proxy

Key Type Default Description
server.api_port int 8008 API and health endpoint port
account_id string "" Account ID for control plane auth
bearer_token string "" API key for all auth
control_url string "" Control plane URL
config_mode string "control-only" local-only or control-only
receiver.base_url string "http://receiver:8080" Receiver webhook URL
config_polling.enabled bool true Poll control for config changes
config_polling.interval_seconds int 30 Poll interval
config_polling.cache_directory string "/var/cache/bytefreezer-proxy" Local config cache path
batching.enabled bool true Batch data before sending
batching.max_bytes int 10485760 Max batch size (10MB)
batching.timeout_seconds int 30 Flush timeout
batching.compression_enabled bool true Compress batches
spooling.enabled bool true Spool failed uploads
spooling.directory string "/var/spool/bytefreezer-proxy" Spool directory
spooling.max_size_bytes int 1073741824 Max spool size (1GB)
spooling.retry_attempts int 5 Upload retry count
health_reporting.enabled bool true Report health to control plane
health_reporting.report_interval int 30 Report interval (seconds)
tenant_validation.enabled bool true Proactive tenant status checks
error_tracking.enabled bool true Report errors to control plane
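Putting the table together, a sketch of the secret-bearing proxy keys (all values are placeholders; remaining keys take the defaults listed above):

```yaml
# Proxy config sketch. Note: a volume-mounted file replaces the
# bundled config entirely, so also include any non-default keys
# you rely on, not just the secrets.
account_id: "acct-123"                        # placeholder
bearer_token: "bf_xxxxxxxx"                   # placeholder
control_url: "https://control.example.com"    # placeholder
receiver:
  base_url: "http://receiver:8080"
```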

Receiver

Key Type Default Description
server.api_port int 8081 API and health endpoint port
protocols.webhook.enabled bool true Enable webhook ingestion
protocols.webhook.port int 8080 Webhook listen port
protocols.webhook.max_payload_size int 10485760 Max payload (10MB)
bytefreezer.upload_worker_count int 10 S3 upload workers
bytefreezer.spool_path string "/var/spool/bytefreezer-receiver" Spool directory
s3destination.bucket_name string "bytefreezer-intake" Intake bucket name
s3destination.endpoint string "minio:9000" S3 endpoint
s3destination.region string "us-east-1" S3 region
s3destination.ssl bool false Use HTTPS for S3
s3destination.access_key string "" S3 access key
s3destination.secret_key string "" S3 secret key
control_service.enabled bool true Enable control plane integration
control_service.control_url string "" Control plane URL
control_service.api_key string "" Service API key
control_service.account_id string "" Account ID
dlq.enabled bool true Enable dead letter queue
dlq.retry_attempts int 3 Upload retry attempts
housekeeping.enabled bool true Enable housekeeping
housekeeping.interval_seconds int 300 Housekeeping interval
health_reporting.report_interval int 30 Health report interval
error_tracking.enabled bool true Report errors

control_service is required

Without the control_service section (control_url, api_key, account_id), the receiver cannot validate tenant IDs and returns HTTP 410 on all POSTs. The health_reporting section is a separate concern: it only handles health check reporting.
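A sketch of the required section with placeholder values:

```yaml
control_service:
  enabled: true
  control_url: "https://control.example.com"  # placeholder
  api_key: "bf_xxxxxxxx"                      # placeholder
  account_id: "acct-123"                      # placeholder
```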

Piper

Key Type Default Description
server.api_port int 8082 API and health endpoint port
s3_source.bucket_name string "bytefreezer-intake" Source bucket (from receiver)
s3_source.endpoint string "minio:9000" S3 endpoint
s3_source.poll_interval string "30s" S3 poll interval (duration)
s3_source.access_key string "" S3 access key
s3_source.secret_key string "" S3 secret key
s3_destination.bucket_name string "bytefreezer-piper" Destination bucket (for packer)
s3_destination.endpoint string "minio:9000" S3 endpoint
s3_destination.access_key string "" S3 access key
s3_destination.secret_key string "" S3 secret key
processing.max_concurrent_jobs int 10 Parallel processing jobs
processing.job_timeout_seconds int 600 REQUIRED — job timeout
processing.retry_attempts int 3 Job retry count
control_service.enabled bool true Enable control plane integration
control_service.control_url string "" Control plane URL
control_service.api_key string "" Service API key
control_service.account_id string "" Account ID
housekeeping.enabled bool true Enable housekeeping
housekeeping.interval_seconds int 300 Housekeeping interval
health_reporting.report_interval int 30 Health report interval
error_tracking.enabled bool true Report errors

job_timeout_seconds is required

processing.job_timeout_seconds must be a non-zero value (recommended: 600). Piper validates this at startup and refuses to start if zero. Without it, all S3 and API operations during processing fail with "context deadline exceeded".
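The corresponding section with the recommended value:

```yaml
processing:
  max_concurrent_jobs: 10
  job_timeout_seconds: 600   # must be non-zero; piper refuses to start at 0
  retry_attempts: 3
```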

Packer

Key Type Default Description
server.api_port int 8083 API and health endpoint port
bytefreezer.spool_path string "/var/spool/bytefreezer-packer" Spool directory
bytefreezer.cache_path string "/var/cache/bytefreezer-packer" Cache directory (must be writable, uid 1000)
s3source.bucket_name string "bytefreezer-piper" Source bucket (from piper)
s3source.endpoint string "minio:9000" S3 endpoint
s3source.access_key string "" S3 access key
s3source.secret_key string "" S3 secret key
control_service.enabled bool true Enable control plane integration
control_service.control_url string "" Control plane URL
control_service.api_key string "" Service API key
control_service.account_id string "" Account ID
parquet.max_file_size_mb int 64 Target Parquet file size
parquet.timeout_seconds int 1200 Accumulation timeout (20min)
parquet.compression string "zstd" Compression algorithm
parquet.streaming_mode bool true Stream processing mode
parquet.atomic_upload bool true Atomic S3 uploads
housekeeping.enabled bool true Enable housekeeping
housekeeping.interval_seconds int 300 Housekeeping interval
housekeeping.cleanup.enabled bool true Clean up stale files
health_reporting.report_interval int 30 Health report interval
error_tracking.enabled bool true Report errors
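For reference, the parquet tuning keys from the table above, written out with their documented defaults:

```yaml
parquet:
  max_file_size_mb: 64     # target Parquet file size
  timeout_seconds: 1200    # accumulation timeout (20 min)
  compression: "zstd"
  streaming_mode: true
  atomic_upload: true
```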

Common Gotchas

Config Key Names Differ Between Services

S3 config keys are not consistent across services:

Service S3 Source Key S3 Destination Key
Receiver — (webhook ingest, no S3 source) s3destination (no underscore)
Piper s3_source (underscore) s3_destination (underscore)
Packer s3source (no underscore) — (per-tenant via control)
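In the config files themselves the difference looks like this (bucket names taken from the tables above; fragments shown together for comparison, not as one file):

```yaml
# receiver config: destination block, no underscore
s3destination:
  bucket_name: "bytefreezer-intake"

# piper config: underscored source and destination blocks
s3_source:
  bucket_name: "bytefreezer-intake"
s3_destination:
  bucket_name: "bytefreezer-piper"

# packer config: source block, no underscore
s3source:
  bucket_name: "bytefreezer-piper"
```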

Health Reporting Interval Key

All services use health_reporting.report_interval (NOT interval_seconds). Using the wrong key name silently defaults to 0, meaning no health reports are sent.
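The correct spelling, with the common mistake shown for contrast:

```yaml
health_reporting:
  report_interval: 30      # correct key name
  # interval_seconds: 30   # WRONG: silently ignored, interval stays 0
```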

Docker Health Checks

Alpine-based images do not include curl. Health checks must use wget:

healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:8008/api/v1/health"]

Docker Volumes for Packer

Packer containers run as uid 1000 and require writable volumes:

volumes:
  - packer-spool:/var/spool/bytefreezer-packer
  - packer-cache:/var/cache/bytefreezer-packer

Container ID Detection

Pass HOST_HOSTNAME environment variable for human-readable instance IDs in the control plane:

environment:
  HOST_HOSTNAME: ${HOST_HOSTNAME}

Set HOST_HOSTNAME=myhostname in your .env file. Instance IDs will appear as myhostname:containerid.