Architecture
ByteFreezer is a distributed system with specialized components for each stage of the data pipeline.
Component Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│                                DATA SOURCES                                 │
│  UDP │ TCP │ Syslog │ sFlow │ IPFIX │ HTTP │ SQS │ Kafka │ NATS │ Kinesis   │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                    PROXY                                    │
│  • Receives data from all source types                                      │
│  • Batches and compresses for efficiency                                    │
│  • Forwards to Receiver via HTTP                                            │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                  RECEIVER                                   │
│  • HTTP webhook endpoint                                                    │
│  • Stores raw data to S3/MinIO                                              │
│  • Maintains data durability                                                │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                   PACKER                                    │
│  • Reads raw data from S3                                                   │
│  • Converts to Parquet format                                               │
│  • Auto-partitions by time                                                  │
│  • Handles schema evolution                                                 │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                    PIPER                                    │
│  • Applies transformations                                                  │
│  • Runs enrichers (geo-tagging, custom)                                     │
│  • Filters and samples data                                                 │
│  • Routes to destinations                                                   │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                             STORAGE (S3/MinIO)                              │
│  • Parquet files with columnar storage                                      │
│  • Auto-partitioned by account/tenant/dataset/time                          │
│  • Schema evolution supported                                               │
│  • Ready for AI model integration                                           │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                 QUERY LAYER                                 │
│  • DuckDB for SQL queries                                                   │
│  • AI agents for natural language                                           │
│  • Grafana integration                                                      │
│  • BYOA (Bring Your Own AI)                                                 │
└─────────────────────────────────────────────────────────────────────────────┘
Components
Proxy
The Proxy is the data collection point. It accepts data from network protocols and message queues, batches and compresses it, and forwards it to the Receiver over HTTP.
| Feature | Description |
|---|---|
| Multi-protocol | UDP, TCP, Syslog, sFlow, IPFIX, HTTP |
| Message queues | SQS, Kafka, NATS, Kinesis |
| Batching | Groups events for efficient transfer |
| Compression | Reduces bandwidth usage |
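The batching and compression features can be illustrated with a minimal Python sketch. It is not ByteFreezer's implementation: the `Batcher` class, the `max_events` threshold, the newline-delimited JSON encoding, and the use of gzip are all assumptions made for the example.

```python
import gzip
import json


class Batcher:
    """Hypothetical batcher: groups events and emits one gzip payload per batch."""

    def __init__(self, max_events=3):
        self.max_events = max_events  # assumed flush threshold, not a documented setting
        self.pending = []

    def add(self, event):
        """Buffer an event; return a compressed payload once the batch is full, else None."""
        self.pending.append(event)
        if len(self.pending) >= self.max_events:
            return self.flush()
        return None

    def flush(self):
        """Encode buffered events as newline-delimited JSON and gzip the result."""
        if not self.pending:
            return None
        body = "\n".join(json.dumps(e) for e in self.pending).encode("utf-8")
        self.pending = []
        return gzip.compress(body)


def decode(payload):
    """Inverse of flush(), as a receiving side might apply it."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]
```

A production collector would typically also flush on a timer, so that a slow source does not hold events in the buffer indefinitely.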
Receiver
The Receiver is an HTTP webhook endpoint that stores raw data to object storage.
| Feature | Description |
|---|---|
| HTTP endpoint | RESTful webhook interface |
| S3 compatible | Works with S3, MinIO, or any S3-compatible storage |
| Durability | Ensures data is safely stored before acknowledging |
| Multi-tenant | Routes data by tenant and dataset |
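The durability and multi-tenant rows describe a store-then-acknowledge pattern. The sketch below shows one way that could look in Python; the `handle_webhook` function, the `InMemoryStore` stand-in, the status codes, and the `raw/<tenant>/<dataset>/...` key layout are hypothetical and are not taken from the ByteFreezer API.

```python
import time
import uuid


class InMemoryStore:
    """Stand-in for S3/MinIO; a real deployment would use an S3 client instead."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        self.objects[key] = body


def handle_webhook(store, tenant, dataset, body):
    """Hypothetical handler: persist first, acknowledge only after the write succeeds."""
    if not tenant or not dataset:
        return 400, "tenant and dataset are required"
    # Assumed key layout: raw/<tenant>/<dataset>/<unix-timestamp>-<random-suffix>
    key = f"raw/{tenant}/{dataset}/{int(time.time())}-{uuid.uuid4().hex}"
    try:
        store.put(key, body)
    except Exception as exc:
        # No success response on failure, so the sender knows it should retry.
        return 503, f"storage unavailable: {exc}"
    return 200, key
```

Returning a success status only after `put` completes means a sender that retries on any non-success response does not lose data when storage is briefly unavailable.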
Packer
The Packer converts raw data into optimized Parquet files.
| Feature | Description |
|---|---|
| Parquet conversion | Columnar format for efficient queries |
| Auto-partitioning | Partitions by account/tenant/dataset/time |
| Schema evolution | Handles changing data structures |
| Compression | Snappy/Zstd for space efficiency |
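As an illustration of partitioning by account/tenant/dataset/time, the following Python sketch builds an object-key prefix from those four values. The `partition_prefix` function, the hourly granularity, and the `year=/month=/day=/hour=` naming are assumptions for the example; the actual layout the Packer writes is not specified here.

```python
from datetime import datetime, timezone


def partition_prefix(account, tenant, dataset, ts):
    """Build a hypothetical key prefix partitioned by account/tenant/dataset/hour."""
    # Normalise to UTC so the partition does not depend on the writer's local time zone.
    ts = ts.astimezone(timezone.utc)
    return (
        f"{account}/{tenant}/{dataset}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"
    )


if __name__ == "__main__":
    when = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)
    print(partition_prefix("acme", "prod", "firewall", when))
```

Putting time last in the prefix lets a query engine skip whole directories when a query is restricted to one dataset and a time range.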
Piper
The Piper is the data processing engine that applies transformations and enrichments.
| Feature | Description |
|---|---|
| Transformations | Modify, rename, parse fields |
| Enrichers | Geo-tagging, custom lookups |
| Filters | Drop unwanted data |
| Sampling | Reduce volume while maintaining visibility |
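The four features in the table can be thought of as steps applied to each event in order. The Python sketch below models that idea; the step names (`rename_field`, `drop_if`, `sample`), the convention that a step returns `None` to drop an event, and the hash-based sampling are illustrative assumptions, not the Piper's actual configuration or API.

```python
import zlib


def rename_field(old, new):
    """Transformation: rename a field if it is present."""
    def step(event):
        if old in event:
            event = dict(event)  # copy so the caller's event is not mutated
            event[new] = event.pop(old)
        return event
    return step


def drop_if(predicate):
    """Filter: returning None drops the event."""
    def step(event):
        return None if predicate(event) else event
    return step


def sample(rate, key):
    """Deterministic sampling: keep roughly `rate` of events, chosen by a hash of event[key]."""
    def step(event):
        bucket = zlib.crc32(str(event.get(key, "")).encode("utf-8")) % 10_000
        return event if bucket < rate * 10_000 else None
    return step


def run_pipeline(events, steps):
    """Apply steps in order; stop processing an event as soon as a step drops it."""
    out = []
    for event in events:
        for step in steps:
            event = step(event)
            if event is None:
                break
        else:
            out.append(event)
    return out
```

Hashing a stable field rather than drawing a random number keeps the sampling decision consistent for a given key, so related events are kept or dropped together.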
Control
The Control plane manages configuration and coordinates all components.
| Feature | Description |
|---|---|
| Configuration | Centralized config management |
| Health monitoring | Tracks component health |
| API gateway | Unified API for management |
| Multi-deployment | Manages on-prem and managed instances |
Data Hierarchy
ByteFreezer organizes data in a hierarchical structure:
Account (Organization)
└── Tenant (Data Source / Environment)
    └── Dataset (Data Stream / Collection)
        └── Events (Individual Records)
See Data Model for details.
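The hierarchy above maps naturally onto nested containers. The following Python sketch is one hypothetical way to model it; the class and field names are chosen for illustration and do not reflect ByteFreezer's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A data stream or collection holding individual event records."""
    name: str
    events: list = field(default_factory=list)


@dataclass
class Tenant:
    """A data source or environment, containing datasets keyed by name."""
    name: str
    datasets: dict = field(default_factory=dict)


@dataclass
class Account:
    """An organization, containing tenants keyed by name."""
    name: str
    tenants: dict = field(default_factory=dict)

    def add_event(self, tenant, dataset, event):
        """Store an event, creating the tenant and dataset on first use."""
        t = self.tenants.setdefault(tenant, Tenant(tenant))
        d = t.datasets.setdefault(dataset, Dataset(dataset))
        d.events.append(event)
```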
Deployment Topologies
On-Premises
All components run in your infrastructure:
Your Network
├── Proxy (data collection point)
├── Receiver (webhook endpoint)
├── Piper (processing)
├── Packer (Parquet conversion)
├── Control (management)
└── S3/MinIO (your storage)
Managed
ByteFreezer runs the compute components; you provide data collection and storage:
Your Network                     ByteFreezer Cloud
├── Proxy (your site)            ├── Receiver
└── S3/MinIO (your storage) ◄──  ├── Piper
                                 ├── Packer
                                 └── Control
Hybrid
A mix of on-premises and managed deployments, chosen per environment.
See Deployment Options for details.