Architecture

ByteFreezer is a distributed system with specialized components for each stage of the data pipeline.

Component Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                              DATA SOURCES                                    │
│   UDP │ TCP │ Syslog │ sFlow │ IPFIX │ HTTP │ SQS │ Kafka │ NATS │ Kinesis  │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                               PROXY                                          │
│   • Receives data from all source types                                      │
│   • Batches and compresses for efficiency                                    │
│   • Forwards to Receiver via HTTP                                            │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              RECEIVER                                        │
│   • HTTP webhook endpoint                                                    │
│   • Stores raw data to S3/MinIO                                              │
│   • Maintains data durability                                                │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                               PACKER                                         │
│   • Reads raw data from S3                                                   │
│   • Converts to Parquet format                                               │
│   • Auto-partitions by time                                                  │
│   • Handles schema evolution                                                 │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                PIPER                                         │
│   • Applies transformations                                                  │
│   • Runs enrichers (geo-tagging, custom)                                     │
│   • Filters and samples data                                                 │
│   • Routes to destinations                                                   │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            STORAGE (S3/MinIO)                                │
│   • Parquet files with columnar storage                                      │
│   • Auto-partitioned by account/tenant/dataset/time                          │
│   • Schema evolution supported                                               │
│   • Ready for AI model integration                                           │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           QUERY LAYER                                        │
│   • DuckDB for SQL queries                                                   │
│   • AI agents for natural language                                           │
│   • BYOA (Bring Your Own AI)                                                 │
└─────────────────────────────────────────────────────────────────────────────┘

Components

Proxy

The Proxy is the data collection point. It accepts data from network protocols and message queues, batches and compresses it, and forwards it to the Receiver over HTTP.

Feature	Description
Multi-protocol	UDP, TCP, Syslog, sFlow, IPFIX, HTTP
Message queues	SQS, Kafka, NATS, Kinesis
Batching	Groups events for efficient transfer
Compression	Reduces bandwidth usage
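
The batching and compression rows above can be illustrated with a minimal sketch. It assumes newline-delimited JSON as the wire format and gzip as the codec; neither is confirmed by this page, and the function names and the `max_batch` default are hypothetical.

```python
import gzip
import json
from typing import Dict, Iterable, Iterator, List


def batch_events(events: Iterable[Dict], max_batch: int = 500) -> Iterator[List[Dict]]:
    """Yield lists of at most max_batch events (hypothetical batch size)."""
    batch: List[Dict] = []
    for event in events:
        batch.append(event)
        if len(batch) >= max_batch:
            yield batch
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        yield batch


def encode_batch(batch: List[Dict]) -> bytes:
    """Serialize a batch as newline-delimited JSON, then gzip it.

    NDJSON and gzip are assumptions made for this sketch.
    """
    ndjson = "\n".join(json.dumps(e, separators=(",", ":")) for e in batch)
    return gzip.compress(ndjson.encode("utf-8"))


def decode_batch(payload: bytes) -> List[Dict]:
    """Inverse of encode_batch, as a receiving side might apply it."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]
```

Each encoded payload would then be sent to the Receiver as the body of a single HTTP request, so the per-request overhead is paid once per batch rather than once per event.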

Receiver

The Receiver is an HTTP webhook endpoint that stores raw data to object storage.

Feature	Description
HTTP endpoint	RESTful webhook interface
S3 compatible	Works with S3, MinIO, or any S3-compatible storage
Durability	Ensures data is safely stored before acknowledging
Multi-tenant	Routes data by tenant and dataset
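
The durability and multi-tenant rows describe a store-then-acknowledge pattern. The sketch below shows one way that could look; the `InMemoryStore` stand-in, the object key layout, and the status codes are all assumptions for illustration and not the Receiver's actual interface.

```python
from typing import Dict, Tuple


class InMemoryStore:
    """Stand-in for an S3-compatible client (hypothetical); put() raises on failure."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body


def raw_object_key(tenant: str, dataset: str, batch_id: str) -> str:
    """Route a batch by tenant and dataset. The key layout is an assumption."""
    return f"raw/{tenant}/{dataset}/{batch_id}.ndjson.gz"


def handle_webhook(store, tenant: str, dataset: str,
                   batch_id: str, body: bytes) -> Tuple[int, str]:
    """Return (status, detail). Success is reported only after the write returns."""
    if not tenant or not dataset:
        return 400, "tenant and dataset are required"
    key = raw_object_key(tenant, dataset, batch_id)
    try:
        store.put(key, body)
    except Exception as exc:
        # The write failed, so do not acknowledge; the sender can retry.
        return 503, f"storage write failed: {exc}"
    return 200, key
```

Because the success status is returned only after `put` completes, a sender that retries on any non-success response does not lose a batch when storage is briefly unavailable.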

Packer

The Packer converts raw data into optimized Parquet files.

Feature	Description
Parquet conversion	Columnar format for efficient queries
Auto-partitioning	Partitions by account/tenant/dataset/time
Schema evolution	Handles changing data structures
Compression	Snappy/Zstd for space efficiency
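
To make auto-partitioning and schema evolution concrete, here is a small sketch. The hourly `year=/month=/day=/hour=` layout, the fallback to `string` on a type conflict, and both function names are assumptions; this page states only that data is partitioned by account/tenant/dataset/time and that changing structures are handled.

```python
from datetime import datetime, timezone
from typing import Dict


def partition_path(account: str, tenant: str, dataset: str, ts: datetime) -> str:
    """Build a partition prefix from the hierarchy and an event timestamp.

    Expects a timezone-aware datetime; the hourly granularity is an assumption.
    """
    ts = ts.astimezone(timezone.utc)
    return (
        f"{account}/{tenant}/{dataset}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"
    )


def merge_schemas(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Union two column-name -> type maps.

    New columns are appended; a column whose type changed falls back to
    "string" (one possible policy, chosen here only for illustration).
    """
    merged = dict(old)
    for name, typ in new.items():
        if name not in merged:
            merged[name] = typ
        elif merged[name] != typ:
            merged[name] = "string"
    return merged
```

For example, an event stamped 2024-03-05 14:30 UTC for account `acme`, tenant `prod`, dataset `firewall` would map to `acme/prod/firewall/year=2024/month=03/day=05/hour=14` under this layout.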

Piper

The Piper is the data processing engine that applies transformations and enrichments.

Feature	Description
Transformations	Modify, rename, parse fields
Enrichers	Geo-tagging, custom lookups
Filters	Drop unwanted data
Sampling	Reduce volume while maintaining visibility
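
The four features above compose naturally as a chain of per-event steps. The following sketch models each step as a function that returns the (possibly modified) event, or `None` to drop it. All names are hypothetical, and the hash-based sampler is one possible approach, not necessarily the one Piper uses.

```python
import zlib
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Event = Dict[str, object]
Step = Callable[[Event], Optional[Event]]  # a step returns None to drop the event


def rename(old: str, new: str) -> Step:
    """Transformation: rename a field, leaving the input event untouched."""
    def step(event: Event) -> Optional[Event]:
        if old in event:
            event = dict(event)
            event[new] = event.pop(old)
        return event
    return step


def enrich_from_lookup(key: str, table: Dict[str, object], target: str) -> Step:
    """Enricher: add `target` from a lookup table keyed by event[key]."""
    def step(event: Event) -> Optional[Event]:
        value = table.get(event.get(key))
        return {**event, target: value} if value is not None else event
    return step


def drop_if(predicate: Callable[[Event], bool]) -> Step:
    """Filter: drop events that match the predicate."""
    def step(event: Event) -> Optional[Event]:
        return None if predicate(event) else event
    return step


def sample(key: str, keep_one_in: int) -> Step:
    """Sampling: keep roughly 1 in N events, chosen deterministically by hashing a field."""
    def step(event: Event) -> Optional[Event]:
        bucket = zlib.crc32(str(event.get(key)).encode("utf-8")) % keep_one_in
        return event if bucket == 0 else None
    return step


def run_pipeline(events: Iterable[Event], steps: List[Step]) -> Iterator[Event]:
    """Apply steps in order; stop processing an event as soon as a step drops it."""
    for event in events:
        for step in steps:
            event = step(event)
            if event is None:
                break
        else:
            yield event
```

Hashing a field rather than drawing a random number keeps sampling repeatable: the same key is always kept or always dropped, so all events sharing that key stay together in the sample.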

Control

The Control plane manages configuration and coordinates all components.

Feature	Description
Configuration	Centralized config management
Health monitoring	Tracks component health
API gateway	Unified API for management
Multi-deployment	Manages on-prem and managed instances

Data Hierarchy

ByteFreezer organizes data in a hierarchical structure:

Account (Organization)
  └── Tenant (Data Source / Environment)
        └── Dataset (Data Stream / Collection)
              └── Events (Individual Records)

See Data Model for details.
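
As a rough illustration, the hierarchy can be modeled with nested containers. The class and method names below are hypothetical and say nothing about how ByteFreezer stores this structure internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """A data stream or collection holding individual event records."""
    name: str
    events: List[dict] = field(default_factory=list)


@dataclass
class Tenant:
    """A data source or environment containing datasets."""
    name: str
    datasets: Dict[str, Dataset] = field(default_factory=dict)


@dataclass
class Account:
    """An organization containing tenants."""
    name: str
    tenants: Dict[str, Tenant] = field(default_factory=dict)

    def add_event(self, tenant: str, dataset: str, event: dict) -> None:
        """File an event under tenant/dataset, creating both levels on demand."""
        t = self.tenants.setdefault(tenant, Tenant(tenant))
        d = t.datasets.setdefault(dataset, Dataset(dataset))
        d.events.append(event)
```

Every event is therefore addressed by three names (account, tenant, dataset), the same three levels that appear in the storage partitioning scheme.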

Deployment Topologies

On-Premises

All components run in your infrastructure:

Your Network
├── Proxy (data collection point)
├── Receiver (webhook endpoint)
├── Piper (processing)
├── Packer (Parquet conversion)
├── Control (management)
└── S3/MinIO (your storage)

Managed

ByteFreezer runs the compute components (Receiver, Piper, Packer, Control); you run the Proxy and provide the storage:

Your Network                    ByteFreezer Cloud
├── Proxy (your site)           ├── Receiver
└── S3/MinIO (your storage) ◄── ├── Piper
                                ├── Packer
                                └── Control
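
The on-premises and managed listings above can also be written down as data, which makes it easy to check what you need to operate yourself in each topology. The location labels `customer` and `bytefreezer_cloud` and the helper name are made up for this sketch.

```python
from typing import Dict, List

# Component placement copied from the topology listings on this page.
# The location labels are hypothetical names used only in this sketch.
TOPOLOGIES: Dict[str, Dict[str, str]] = {
    "on_premises": {
        "proxy": "customer",
        "receiver": "customer",
        "piper": "customer",
        "packer": "customer",
        "control": "customer",
        "storage": "customer",
    },
    "managed": {
        "proxy": "customer",
        "storage": "customer",
        "receiver": "bytefreezer_cloud",
        "piper": "bytefreezer_cloud",
        "packer": "bytefreezer_cloud",
        "control": "bytefreezer_cloud",
    },
}


def components_at(topology: str, location: str) -> List[str]:
    """List the components that run at a given location, sorted by name."""
    placement = TOPOLOGIES[topology]
    return sorted(name for name, loc in placement.items() if loc == location)
```

For instance, `components_at("managed", "customer")` yields only the Proxy and the storage, matching the managed listing.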

Hybrid

A mix of on-premises and managed deployments, with the choice made per environment.

See Deployment Options for details.