Architecture
ByteFreezer is a distributed system with specialized components for each stage of the data pipeline.
Component Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│                                DATA SOURCES                                 │
│  UDP │ TCP │ Syslog │ sFlow │ IPFIX │ HTTP │ SQS │ Kafka │ NATS │ Kinesis   │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                    PROXY                                    │
│ • Receives data from all source types                                       │
│ • Batches and compresses for efficiency                                     │
│ • Forwards to Receiver via HTTP                                             │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                  RECEIVER                                   │
│ • HTTP webhook endpoint                                                     │
│ • Stores raw data to S3/MinIO                                               │
│ • Maintains data durability                                                 │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                    PIPER                                    │
│ • Applies transformations                                                   │
│ • Runs enrichers (geo-tagging, custom)                                      │
│ • Filters and samples data                                                  │
│ • Routes to destinations                                                    │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                   PACKER                                    │
│ • Reads processed data                                                      │
│ • Converts to Parquet format                                                │
│ • Auto-partitions by time                                                   │
│ • Handles schema evolution                                                  │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                             STORAGE (S3/MinIO)                              │
│ • Parquet files with columnar storage                                       │
│ • Auto-partitioned by account/tenant/dataset/time                           │
│ • Schema evolution supported                                                │
│ • Ready for AI model integration                                            │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                          ┌─────────┴──────────┐
                          ▼                    ▼
┌──────────────────────────────────┐ ┌────────────────────────────────────────┐
│           QUERY LAYER            │ │               CONNECTOR                │
│ • DuckDB for SQL queries         │ │ • Export subsets to external systems   │
│ • AI agents for NL queries       │ │ • Elasticsearch, Splunk, webhooks      │
│ • BYOA (Bring Your Own AI)       │ │ • SQL-based filtering                  │
└──────────────────────────────────┘ └────────────────────────────────────────┘
Data Sovereignty
ByteFreezer is designed so that your data never leaves your infrastructure.
- Proxy runs in your network — data is collected at your edge
- Receiver, Piper, Packer run in your infrastructure — processing happens on your compute
- Storage is your S3 or MinIO bucket — you own every byte
- The control plane (managed by ByteFreezer) stores only metadata: dataset configurations, health status, and user accounts. It never touches raw event data.
The control plane communicates with your components over HTTPS. Components pull configuration from Control and report health status back. No raw data flows through the control plane.
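The pull-config / push-health cycle can be illustrated with a minimal sketch. It is not ByteFreezer's client: the `Component` class, the `fetch_config` and `send_health` hooks, and the report fields (`status`, `events_processed`, `config_version`) are hypothetical, and a real component would make HTTPS requests to Control endpoints that are not documented here. The point it shows is that the outbound report carries counters and status only, never event payloads.

```python
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical transport hooks. In a real deployment these would be HTTPS calls
# to the control plane; plain callables keep this sketch runnable offline.
FetchConfig = Callable[[str], dict]      # component name -> configuration
SendHealth = Callable[[str, str], None]  # (component name, JSON report) -> None


@dataclass
class Component:
    name: str
    fetch_config: FetchConfig
    send_health: SendHealth
    config: dict = field(default_factory=dict)
    events_processed: int = 0

    def sync(self) -> None:
        """One control-plane round trip: pull config, then push a health report."""
        # Pull: the component asks Control for its current configuration.
        self.config = self.fetch_config(self.name)
        # Push: the report holds only metadata (status, counters, config version),
        # never event payloads.
        report = {
            "component": self.name,
            "status": "healthy",
            "events_processed": self.events_processed,
            "config_version": self.config.get("version"),
        }
        self.send_health(self.name, json.dumps(report))


# Usage with in-memory stand-ins for the control plane.
reports: list[str] = []
proxy = Component(
    name="proxy",
    fetch_config=lambda name: {"version": 3, "batch_size": 500},
    send_health=lambda name, body: reports.append(body),
)
proxy.events_processed = 1200
proxy.sync()
```

In practice `sync` would run on a timer; keeping the transport injectable is just a convenient way to show the data flow without a network.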
See Security Model for a detailed breakdown of what you run vs. what we run.
Components
Proxy
The Proxy is the data collection point. It accepts data from many source types, batches and compresses it, and forwards it to the Receiver over HTTP.
| Feature | Description |
|---|---|
| Multi-protocol | UDP, TCP, Syslog, sFlow, IPFIX, HTTP |
| Message queues | SQS, Kafka, NATS, Kinesis |
| Batching | Groups events for efficient transfer |
| Compression | Reduces bandwidth usage |
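Batching and compression can be sketched in a few lines. This is a minimal illustration under assumptions, not the Proxy's code: the batch size of 500, newline-delimited JSON framing, and gzip are illustrative choices, since the Proxy's actual batch limits, wire format, and codec are not specified here. A production batcher would typically also flush on a timer so that a slow source does not hold a partial batch indefinitely; this sketch batches by count only.

```python
import gzip
import json
from typing import Iterable, Iterator


def batch_events(events: Iterable[dict], max_events: int = 500) -> Iterator[list[dict]]:
    """Group a stream of events into batches of at most max_events."""
    batch: list[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) >= max_events:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def encode_batch(batch: list[dict]) -> bytes:
    """Serialize a batch as newline-delimited JSON, then gzip it for transfer."""
    ndjson = "\n".join(json.dumps(e, separators=(",", ":")) for e in batch)
    return gzip.compress(ndjson.encode("utf-8"))


events = ({"seq": i, "msg": "connection accepted"} for i in range(1200))
batches = list(batch_events(events, max_events=500))
payloads = [encode_batch(b) for b in batches]
print([len(b) for b in batches])  # [500, 500, 200]
```

Each element of `payloads` would become the body of one HTTP request to the Receiver.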
Receiver
The Receiver is an HTTP webhook endpoint that stores raw data to object storage.
| Feature | Description |
|---|---|
| HTTP endpoint | RESTful webhook interface |
| S3 compatible | Works with S3, MinIO, or any S3-compatible storage |
| Durability | 3-stage spooling: memory → disk → S3 |
| Multi-tenant | Routes data by tenant and dataset |
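The three spooling stages can be illustrated with a minimal sketch. It is not the Receiver's implementation: the `Spool` class, the `mem_limit` threshold, the spool-file naming, and the `raw/{tenant}/{dataset}/...` object key are all hypothetical, and `upload` stands in for an S3 or MinIO client call. The design point it shows is ordering: data reaches fsync'd local disk before it is uploaded, and the local copy is deleted only after the upload returns.

```python
import os
import tempfile
from typing import Callable

# Hypothetical uploader standing in for an S3/MinIO PutObject call: (key, body) -> None.
Uploader = Callable[[str, bytes], None]


class Spool:
    """Illustrative memory -> disk -> object-store spool, keyed by (tenant, dataset)."""

    def __init__(self, spool_dir: str, upload: Uploader, mem_limit: int = 100):
        self.spool_dir = spool_dir
        self.upload = upload
        self.mem_limit = mem_limit
        self.memory: dict[tuple[str, str], list[bytes]] = {}
        self.seq = 0

    def accept(self, tenant: str, dataset: str, payload: bytes) -> None:
        # Stage 1: buffer the payload in memory under its tenant/dataset.
        buf = self.memory.setdefault((tenant, dataset), [])
        buf.append(payload)
        if len(buf) >= self.mem_limit:
            self._spill((tenant, dataset))

    def _spill(self, key: tuple[str, str]) -> None:
        # Stage 2: write the buffer to a local file and fsync it so it survives a crash.
        tenant, dataset = key
        self.seq += 1
        name = f"{tenant}__{dataset}__{self.seq:08d}.spool"  # assumes names contain no "__"
        with open(os.path.join(self.spool_dir, name), "wb") as f:
            f.write(b"\n".join(self.memory.pop(key)))
            f.flush()
            os.fsync(f.fileno())

    def drain(self) -> None:
        # Spill whatever is still in memory, then run stage 3.
        for key in list(self.memory):
            self._spill(key)
        # Stage 3: upload each spooled file; delete it only after the upload returns.
        for name in sorted(os.listdir(self.spool_dir)):
            if not name.endswith(".spool"):
                continue
            tenant, dataset, _ = name.split("__")
            path = os.path.join(self.spool_dir, name)
            with open(path, "rb") as f:
                body = f.read()
            self.upload(f"raw/{tenant}/{dataset}/{name}", body)
            os.remove(path)


uploaded: dict[str, bytes] = {}
with tempfile.TemporaryDirectory() as spool_dir:
    spool = Spool(spool_dir, upload=uploaded.__setitem__, mem_limit=2)
    spool.accept("acme", "firewall", b'{"a":1}')
    spool.accept("acme", "firewall", b'{"a":2}')  # hits mem_limit: spilled to disk
    spool.accept("acme", "dns", b'{"q":"x"}')     # still in memory
    spool.drain()
    leftover = os.listdir(spool_dir)
```

If `upload` raises, the spool file stays on disk and the next `drain` retries it, which is what gives this layout its durability.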
Piper
The Piper is the data processing engine. After the Receiver has stored the raw data, Piper applies transformations and enrichments to it.
| Feature | Description |
|---|---|
| Transformations | Modify, rename, parse fields |
| Enrichers | Geo-tagging, custom lookups |
| Filters | Drop unwanted data |
| Sampling | Reduce volume while maintaining visibility |
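The four stage types in the table compose naturally as a chain of functions over events. The following is a minimal sketch under assumed semantics, not Piper's configuration model: the stage constructors (`rename`, `lookup_enricher`, `drop_if`, `sample`) are hypothetical, the static lookup table stands in for a real geo-IP source, and sampling here is plain uniform random sampling. Any stage can drop an event by returning `None`.

```python
import random
from typing import Callable, Iterable, Iterator, Optional

Event = dict
Stage = Callable[[Event], Optional[Event]]  # a stage returns None to drop the event


def rename(old: str, new: str) -> Stage:
    """Transformation: rename a field if it is present."""
    def stage(e: Event) -> Event:
        if old not in e:
            return e
        out = dict(e)  # copy so the input event is not mutated
        out[new] = out.pop(old)
        return out
    return stage


def lookup_enricher(field: str, table: dict, target: str) -> Stage:
    """Enricher: add `target` from a static lookup table (stand-in for a geo-IP database)."""
    def stage(e: Event) -> Event:
        return {**e, target: table.get(e.get(field), "unknown")}
    return stage


def drop_if(predicate: Callable[[Event], bool]) -> Stage:
    """Filter: drop events that match the predicate."""
    def stage(e: Event) -> Optional[Event]:
        return None if predicate(e) else e
    return stage


def sample(rate: float, rng: random.Random) -> Stage:
    """Sampling: keep each event with probability `rate`."""
    def stage(e: Event) -> Optional[Event]:
        return e if rng.random() < rate else None
    return stage


def run(events: Iterable[Event], stages: list[Stage]) -> Iterator[Event]:
    """Apply stages in order; an event dropped by any stage is not emitted."""
    for e in events:
        for s in stages:
            e = s(e)
            if e is None:
                break
        else:
            yield e


events = [
    {"ip": "203.0.113.7", "level": "debug"},
    {"ip": "198.51.100.4", "level": "error"},
]
stages = [
    rename("ip", "src_ip"),
    drop_if(lambda e: e.get("level") == "debug"),
    lookup_enricher("src_ip", {"198.51.100.4": "DE"}, "geo_country"),
    sample(1.0, random.Random(42)),  # rate 1.0 keeps everything; lower it to shed volume
]
print(list(run(events, stages)))
# [{'level': 'error', 'src_ip': '198.51.100.4', 'geo_country': 'DE'}]
```

Placing filters before enrichers, as here, avoids spending lookups on events that will be discarded anyway.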
Packer
The Packer converts processed data into optimized Parquet files for long-term storage.
| Feature | Description |
|---|---|
| Parquet conversion | Columnar format for efficient queries |
| Auto-partitioning | Partitions by account/tenant/dataset/time |
| Schema evolution | Handles changing data structures |
| Compression | Snappy/Zstd for space efficiency |
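Auto-partitioning amounts to deriving an object key from an event's place in the hierarchy plus its timestamp. The sketch below assumes a Hive-style `year=/month=/day=/hour=` layout at hourly granularity (UTC) and a `part-NNNNN.parquet` file name; the layout and granularity Packer actually uses are not specified here. It only computes keys and groups events; writing the Parquet files themselves is out of scope. Hive-style keys are a common convention that query engines such as DuckDB can expose as partition columns.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(account: str, tenant: str, dataset: str, ts: datetime, part: int = 0) -> str:
    """Object key for one Parquet file, partitioned by account/tenant/dataset/hour (UTC)."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    ts = ts.astimezone(timezone.utc)
    return (
        f"{account}/{tenant}/{dataset}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"
        f"part-{part:05d}.parquet"
    )


def group_by_partition(account: str, tenant: str, dataset: str,
                       events: list[dict]) -> dict[str, list[dict]]:
    """Bucket events (each with a Unix-seconds "ts" field) by the file they belong in."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for e in events:
        ts = datetime.fromtimestamp(e["ts"], tz=timezone.utc)
        groups[partition_key(account, tenant, dataset, ts)].append(e)
    return dict(groups)


print(partition_key("acct-1", "prod", "firewall",
                    datetime(2024, 3, 9, 14, 5, tzinfo=timezone.utc)))
# acct-1/prod/firewall/year=2024/month=03/day=09/hour=14/part-00000.parquet
```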
Control
The Control plane manages configuration and coordinates all components. This is the only component managed by ByteFreezer.
| Feature | Description |
|---|---|
| Configuration | Centralized config management |
| Health monitoring | Tracks component health |
| API gateway | Unified API for management |
| UI | Web interface for configuration and monitoring |
Data Hierarchy
ByteFreezer organizes data in a hierarchical structure:
Account (Organization)
└── Tenant (Data Source / Environment)
    └── Dataset (Data Stream / Collection)
        └── Events (Individual Records)
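The nesting maps onto a simple containment model. This is an in-memory illustration only (in ByteFreezer, events end up as Parquet files in your bucket, not in objects like these), and the class and method names are hypothetical. In this model dataset names are scoped to their tenant, so two tenants can each have a `firewall` dataset without colliding.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str                                         # data stream / collection
    events: list[dict] = field(default_factory=list)  # individual records


@dataclass
class Tenant:
    name: str                                         # data source / environment
    datasets: dict[str, Dataset] = field(default_factory=dict)


@dataclass
class Account:
    name: str                                         # organization
    tenants: dict[str, Tenant] = field(default_factory=dict)

    def dataset(self, tenant: str, dataset: str) -> Dataset:
        """Return a dataset, creating the tenant and dataset entries on first use."""
        t = self.tenants.setdefault(tenant, Tenant(tenant))
        return t.datasets.setdefault(dataset, Dataset(dataset))


acct = Account("acme")
acct.dataset("production", "firewall").events.append({"action": "deny"})
acct.dataset("staging", "firewall").events.append({"action": "allow"})
print(sorted(acct.tenants))  # ['production', 'staging']
```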
See Data Model for details.
Deployment Topologies
On-Premises
You run all data-plane components in your infrastructure; ByteFreezer manages the control plane:
Your Infrastructure                     ByteFreezer
├── Proxy (data collection)             ├── Control (config + health)
├── Receiver (webhook endpoint)         └── UI (management dashboard)
├── Piper (processing)
├── Packer (Parquet conversion)
├── Query (analytics)
├── Connector (data export)
└── S3/MinIO (your storage)
Enterprise
White-glove deployment for regulated environments, with air-gapped installation, custom integrations, and dedicated support.
See Deployment Options for details.