On-Premises Deployment¶
Deploy ByteFreezer data processing on your infrastructure while using the managed control plane.
Overview¶
| Location | Components |
|---|---|
| Your Infrastructure | Proxy, Receiver, Piper, Packer, Query, Connector |
| ByteFreezer Managed | Control Plane, UI |
- Data sovereignty: Raw data stays in your environment
- Compliance: Data never leaves your network
- Central management: UI and configuration via the managed control plane
- Simplified operations: No need to manage control plane infrastructure
Architecture¶
┌─────────────────────────────────────────────────────────┐
│ Your Infrastructure │
│ │
│ ┌─────────┐ ┌──────────┐ ┌─────────┐ │
│ │ Proxy │────▶│ Receiver │────▶│ Piper │ │
│ └─────────┘ └──────────┘ └────┬────┘ │
│ │ │
│ ┌────────────────┼───────────┐ │
│ ▼ ▼ │ │
│ ┌─────────┐ ┌─────────┐ │ │
│ │ Packer │ │ Query │ │ │
│ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │
│ ▼ ▼ │ │
│ ┌─────────────────────────────┐ │ │
│ │ Your S3 Storage │ │ │
│ │ (MinIO, AWS S3, etc) │ │ │
│ └──────────────┬──────────────┘ │ │
│ │ │ │
│ ▼ │ │
│ ┌──────────────────┐ │ │
│ │ Connector │──▶ Elasticsearch │
│ │ (data export) │──▶ Splunk │
│ └──────────────────┘──▶ Webhooks │
│ │ │
└─────────────────────────────────────────────────────┘ │
│ │
│ API Key Authentication │
▼ │
┌─────────────────────────────────────────────────────────┘
│ ByteFreezer Managed Control
│
│ ┌─────────────────────────────────────────────┐
│ │ Control Plane │
│ │ - Configuration management │
│ │ - Dataset definitions │
│ │ - Transformation rules │
│ │ - User management │
│ └─────────────────────────────────────────────┘
│
│ ┌─────────────────────────────────────────────┐
│ │ Web UI (bytefreezer.com) │
│ └─────────────────────────────────────────────┘
│
└──────────────────────────────────────────────────────────
Prerequisites¶
- ByteFreezer Account: Sign up at bytefreezer.com
- Infrastructure: Linux servers or Kubernetes cluster
- S3 Storage: MinIO, AWS S3, or compatible object storage
- Network: Outbound HTTPS (443) to api.bytefreezer.com
Service Ports¶
| Component | API Port | Data Port | Notes |
|---|---|---|---|
| Proxy | 8008 | 514, 6343, 4739, etc. | Data ports depend on configured plugins |
| Receiver | 8081 | 8080 (webhook) | Proxy sends data to webhook port |
| Piper | 8082 | — | Reads from S3, writes to S3 |
| Packer | 8083 | — | Reads from S3, writes Parquet to S3 |
| Query | 8000 | — | Optional, SQL + AI queries |
| Connector | 8090 | — | Optional, data export to external systems |
Step 1: Generate an API Key¶
API keys authenticate your on-prem services with the managed control plane.
- Log into bytefreezer.com
- Navigate to API Keys in the sidebar
- Click Generate New Key
- Select Service as the key type
- Enter a descriptive name (e.g., "production-datacenter-1")
- Copy the key immediately — it will only be shown once
Key Security
The service key provides full API access for your account. Treat it like a password:
- Never commit to version control
- Use environment variables or secrets management
- Rotate keys periodically
- Revoke immediately if compromised
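As a sketch, the key can live in the shell environment rather than on disk; the `BYTEFREEZER_API_KEY` variable name matches the verification command in Step 3:

```shell
# Export the key for the current session; replace the placeholder with your key
export BYTEFREEZER_API_KEY='<your-service-key>'
# Confirm it is set by printing only the 8-character key prefix, not the full value
printf '%.8s...\n' "$BYTEFREEZER_API_KEY"
```

For long-lived deployments, prefer your platform's secrets mechanism over a plain exported variable.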
Step 2: Deploy Using the Installer¶
The installer project provides ready-to-use deployment packages for all supported platforms.
Deployment Options¶
| Platform | Method | Directory | Best For |
|---|---|---|---|
| Kubernetes | Helm charts | helm/ | Production clusters |
| Docker | Docker Compose | docker/ | Single-host, dev/test |
| Bare metal | Ansible | ansible/ | VMs, systemd services |
| AWS | ECS Fargate | ecs/ | Serverless on AWS |
| GCP | GKE / Cloud Run | gcp/ | Google Cloud |
| Azure | AKS / Container Instances | azure/ | Microsoft Azure |
Quick Start — Docker Compose¶
git clone https://github.com/bytefreezer/installer.git
cd installer/docker/bytefreezer
# Configure
cp .env.example .env
# Edit .env — set CONTROL_URL and CONTROL_API_KEY
# Start with bundled MinIO
docker compose --profile with-minio up -d
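For reference, a minimal `.env` for this setup might contain only the two variables named in the comment above (the real `.env.example` may define more):

```
CONTROL_URL=https://api.bytefreezer.com
CONTROL_API_KEY=<your-service-key>
```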
Quick Start — Kubernetes (Helm)¶
git clone https://github.com/bytefreezer/installer.git
cd installer/helm
# Deploy processing stack
helm install bytefreezer ./bytefreezer \
--set minio.enabled=true \
--set controlService.url=https://api.bytefreezer.com \
--set controlService.apiKey=YOUR_API_KEY
# Deploy proxy (at edge)
helm install proxy ./proxy \
--set receiver.url=http://bytefreezer-receiver:8080 \
--set controlService.url=https://api.bytefreezer.com \
--set controlService.apiKey=YOUR_API_KEY
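To keep the key off the command line (where it can end up in shell history), the same settings can go in a values file. A sketch mirroring the `--set` flags above:

```yaml
# values.yaml: mirrors the --set flags from the command above
minio:
  enabled: true
controlService:
  url: https://api.bytefreezer.com
  apiKey: YOUR_API_KEY
```

Then install with `helm install bytefreezer ./bytefreezer -f values.yaml`.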
Quick Start — Ansible (Bare Metal)¶
git clone https://github.com/bytefreezer/installer.git
cd installer/ansible/bytefreezer
cp inventory.yml.example inventory.yml
# Edit with your servers
cp vars/secrets.yml.example vars/secrets.yml
# Edit with API key
ansible-vault encrypt vars/secrets.yml
ansible-playbook -i inventory.yml playbook.yml --ask-vault-pass
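The inventory maps your servers to the playbook. An illustrative sketch only (the group and host names here are assumptions; follow `inventory.yml.example` for the real layout):

```yaml
all:
  children:
    bytefreezer:            # group name is an assumption
      hosts:
        bf-node-1:
          ansible_host: 10.0.0.10
        bf-node-2:
          ansible_host: 10.0.0.11
```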
AI-Assisted Deployment (MCP)¶
The ByteFreezer MCP server can generate deployment packages for any platform. Connect it to your AI assistant and use:
- `bf_generate_docker_compose` — Docker Compose + config files
- `bf_generate_helm_values` — Helm values.yaml
- `bf_generate_systemd` — systemd install script
- `bf_generate_standalone` — standalone shell script
Step 3: Verify Connectivity¶
Check that services registered with the control plane:
- Log into bytefreezer.com
- Navigate to Health in the sidebar
- All deployed services should show as Healthy
Or test via API:
curl -s -w "\nHTTP: %{http_code}\n" \
-H "Authorization: Bearer $BYTEFREEZER_API_KEY" \
https://api.bytefreezer.com/api/v1/health
Configuration Reference¶
Each service reads its configuration from a YAML file. The Docker images ship with defaults that work for Docker Compose + MinIO deployments — only secrets need to be provided. See Configuration for the full reference.
Below are the actual config key names for each service.
Proxy¶
app:
name: "bytefreezer-proxy"
server:
api_port: 8008
config_mode: "control-only"
account_id: "your-account-id"
bearer_token: "your-api-key"
control_url: "https://api.bytefreezer.com"
receiver:
base_url: "http://your-receiver:8080"
config_polling:
enabled: true
interval_seconds: 60
cache_directory: "/var/cache/bytefreezer-proxy"
batching:
enabled: true
max_bytes: 10485760
timeout_seconds: 60
compression_enabled: true
spooling:
enabled: true
directory: "/var/spool/bytefreezer-proxy"
max_size_bytes: 1073741824
health_reporting:
enabled: true
report_interval: 30
register_on_startup: true
error_tracking:
enabled: true
Receiver¶
app:
name: "bytefreezer-receiver"
server:
api_port: 8081
protocols:
webhook:
enabled: true
port: 8080
max_payload_size: 10485760
bytefreezer:
upload_worker_count: 10
spool_path: "/var/spool/bytefreezer-receiver"
s3destination:
bucket_name: "bytefreezer-intake"
region: "us-east-1"
endpoint: "minio:9000"
ssl: false
access_key: "your-s3-key"
secret_key: "your-s3-secret"
control_service:
enabled: true
control_url: "https://api.bytefreezer.com"
api_key: "your-service-key"
account_id: "your-account-id"
dlq:
enabled: true
directory: "/var/spool/bytefreezer-receiver"
retry_attempts: 3
retry_interval_seconds: 60
housekeeping:
enabled: true
interval_seconds: 300
health_reporting:
enabled: true
report_interval: 30
register_on_startup: true
error_tracking:
enabled: true
Receiver requires control_service section
The control_service section is required for tenant ID validation. Without it, all POSTs return HTTP 410 "Tenant not found or inactive". The health_reporting section is only for health check reporting — it does NOT replace control_service.
Piper¶
app:
name: "bytefreezer-piper"
server:
api_port: 8082
s3_source:
bucket_name: "bytefreezer-intake"
region: "us-east-1"
endpoint: "minio:9000"
ssl: false
access_key: "your-s3-key"
secret_key: "your-s3-secret"
poll_interval: "30s"
s3_destination:
bucket_name: "bytefreezer-piper"
region: "us-east-1"
endpoint: "minio:9000"
ssl: false
access_key: "your-s3-key"
secret_key: "your-s3-secret"
processing:
max_concurrent_jobs: 10
job_timeout_seconds: 600
retry_attempts: 3
control_service:
enabled: true
control_url: "https://api.bytefreezer.com"
api_key: "your-service-key"
account_id: "your-account-id"
housekeeping:
enabled: true
interval_seconds: 300
health_reporting:
enabled: true
report_interval: 30
register_on_startup: true
error_tracking:
enabled: true
job_timeout_seconds is required
processing.job_timeout_seconds must be set to a non-zero value (recommended: 600). If zero or missing, all S3 and API operations during processing will fail immediately with "context deadline exceeded".
Packer¶
app:
name: "bytefreezer-packer"
server:
api_port: 8083
bytefreezer:
spool_path: "/var/spool/bytefreezer-packer"
cache_path: "/var/cache/bytefreezer-packer"
s3source:
bucket_name: "bytefreezer-piper"
region: "us-east-1"
endpoint: "minio:9000"
ssl: false
access_key: "your-s3-key"
secret_key: "your-s3-secret"
control_service:
enabled: true
control_url: "https://api.bytefreezer.com"
api_key: "your-service-key"
account_id: "your-account-id"
parquet:
max_file_size_mb: 64
timeout_seconds: 1200
compression: "zstd"
streaming_mode: true
atomic_upload: true
housekeeping:
enabled: true
interval_seconds: 300
cleanup:
enabled: true
health_reporting:
enabled: true
report_interval: 30
register_on_startup: true
error_tracking:
enabled: true
Packer paths
bytefreezer.spool_path and bytefreezer.cache_path are required. In Docker, these must be writable volumes (uid 1000). The cache directory is used for intermediate processing files.
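In Docker Compose terms, that means mounting both paths as writable volumes. A sketch (the service name `packer` is an assumption; the container paths match the config above):

```yaml
services:
  packer:
    volumes:
      - packer-spool:/var/spool/bytefreezer-packer   # bytefreezer.spool_path
      - packer-cache:/var/cache/bytefreezer-packer   # bytefreezer.cache_path
volumes:
  packer-spool:
  packer-cache:
```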
Query (Optional)¶
app:
name: "bytefreezer-query"
server:
port: 8000
control:
url: "https://api.bytefreezer.com"
api_key: "your-api-key"
account_id: "your-account-id"
health_reporting:
enabled: true
report_interval: 30
# LLM for natural language queries (optional)
# Leave provider empty to disable NL queries — raw SQL always works
llm:
provider: ""
api_key: ""
model: ""
limits:
max_time_range_hours: 720
max_row_limit: 10000
allow_order_by: true
Connector (Optional)¶
The Connector exports subsets of your Parquet data to external systems (Elasticsearch, Splunk, webhooks). See Connector for full documentation.
server:
port: 8090
control:
url: "https://api.bytefreezer.com"
api_key: "your-service-key"
account_id: "your-account-id"
health_reporting:
enabled: true
report_interval: 30
# Query config (required for batch/watch modes)
query:
tenant_id: ""
dataset_id: ""
sql: "SELECT * FROM read_parquet('PARQUET_PATH', hive_partitioning=true, union_by_name=true) LIMIT 100"
destination:
type: stdout # stdout, elasticsearch, webhook
config: {}
schedule:
interval_seconds: 60 # Watch mode poll interval
batch_size: 1000
S3 Credentials
S3 access keys and the control API key are provided via environment variables, .env files, or Kubernetes Secrets depending on your deployment method. The installer handles this for you. See the installer project for details.
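For Kubernetes, one way to provide these is a Secret. A sketch only (the Secret name and key names are assumptions; the installer's Helm charts may use different ones):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: bytefreezer-credentials
type: Opaque
stringData:
  CONTROL_API_KEY: "<your-service-key>"
  S3_ACCESS_KEY: "<your-s3-key>"
  S3_SECRET_KEY: "<your-s3-secret>"
```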
Managing API Keys¶
Viewing Keys¶
In the UI, navigate to API Keys to see all keys. Service keys show:
- Key name
- Key prefix (first 8 characters for identification)
- Creation date
- Last used timestamp
Revoking Keys¶
To revoke a compromised or unused key:
- Navigate to API Keys
- Find the service key in the list
- Click Revoke
- Confirm revocation
Immediate Effect
Revoked keys stop working immediately. All services using that key will lose access to the control plane.
Key Rotation¶
- Generate a new service key
- Update service configurations (or `.env` / Secrets)
- Restart services to pick up the new key
- Verify connectivity via the Health page
- Revoke the old key
Troubleshooting¶
"401 Unauthorized" errors¶
- Verify API key is correct and not revoked
- Check the key is passed in the `Authorization: Bearer <key>` header
- Ensure the API key environment variable is set
"Connection refused" to control plane¶
- Verify outbound HTTPS (443) is allowed to api.bytefreezer.com
- Check for proxy/firewall blocking the connection
- Test with:
curl -v https://api.bytefreezer.com/api/v1/health
Services not processing data¶
- Check logs for authentication errors
- Verify S3 credentials and bucket access
- Confirm control plane connectivity from each service
- Check the Health page in the UI for status details
- Verify `processing.job_timeout_seconds` is set (Piper)
- Verify the `control_service` section exists (Receiver)
Packer not producing Parquet files¶
- Verify `bytefreezer.spool_path` and `bytefreezer.cache_path` are set and writable
- Check `housekeeping.enabled: true`
- A Packer instance started before the tenant was created may need a restart
Next Steps¶
- Configuration Reference — Full config key reference for all services
- Data Model — Understand tenants and datasets
- Transformations — Configure data processing
- API Reference — Full API documentation