Architecture

ByteFreezer is a distributed system with specialized components for each stage of the data pipeline.

Component Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                              DATA SOURCES                                    │
│   UDP │ TCP │ Syslog │ sFlow │ IPFIX │ HTTP │ SQS │ Kafka │ NATS │ Kinesis  │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                               PROXY                                          │
│   • Receives data from all source types                                      │
│   • Batches and compresses for efficiency                                    │
│   • Forwards to Receiver via HTTP                                            │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              RECEIVER                                        │
│   • HTTP webhook endpoint                                                    │
│   • Stores raw data to S3/MinIO                                              │
│   • Maintains data durability                                                │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                               PACKER                                         │
│   • Reads raw data from S3                                                   │
│   • Converts to Parquet format                                               │
│   • Auto-partitions by time                                                  │
│   • Handles schema evolution                                                 │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                PIPER                                         │
│   • Applies transformations                                                  │
│   • Runs enrichers (geo-tagging, custom)                                     │
│   • Filters and samples data                                                 │
│   • Routes to destinations                                                   │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            STORAGE (S3/MinIO)                                │
│   • Parquet files with columnar storage                                      │
│   • Auto-partitioned by account/tenant/dataset/time                          │
│   • Schema evolution supported                                               │
│   • Ready for AI model integration                                           │
└───────────────────────────────────┬─────────────────────────────────────────┘
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           QUERY LAYER                                        │
│   • DuckDB for SQL queries                                                   │
│   • AI agents for natural language                                           │
│   • Grafana integration                                                      │
│   • BYOA (Bring Your Own AI)                                                 │
└─────────────────────────────────────────────────────────────────────────────┘

Components

Proxy

The Proxy is the data collection point. It accepts data from all supported source types, batches and compresses it, and forwards it to the Receiver over HTTP.

Feature         Description
Multi-protocol  UDP, TCP, Syslog, sFlow, IPFIX, HTTP
Message queues  SQS, Kafka, NATS, Kinesis
Batching        Groups events for efficient transfer
Compression     Reduces bandwidth usage
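
The batching and compression steps can be sketched as follows. This is a minimal illustration, not ByteFreezer's implementation: the newline-delimited JSON encoding, the use of gzip, the batch size, and the function names `batch_events`, `encode_batch`, and `decode_batch` are all assumptions made for the example.

```python
import gzip
import json
from typing import Iterable, Iterator


def batch_events(events: Iterable[dict], max_batch: int = 3) -> Iterator[list[dict]]:
    """Group events into lists of at most max_batch items (hypothetical size)."""
    batch: list[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) >= max_batch:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def encode_batch(batch: list[dict]) -> bytes:
    """Serialize a batch as newline-delimited JSON and gzip it (assumed wire format)."""
    payload = "\n".join(json.dumps(e, sort_keys=True) for e in batch).encode("utf-8")
    return gzip.compress(payload)


def decode_batch(body: bytes) -> list[dict]:
    """Inverse of encode_batch, as the receiving side might apply it."""
    text = gzip.decompress(body).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


events = [{"seq": i, "msg": "link up"} for i in range(7)]
batches = list(batch_events(events))
bodies = [encode_batch(b) for b in batches]  # one body per HTTP request
```

Each element of `bodies` would be the body of one HTTP request to the Receiver. Larger batches generally compress better, at the cost of holding events longer before they are sent.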

Receiver

The Receiver is an HTTP webhook endpoint that stores raw data to object storage.

Feature        Description
HTTP endpoint  RESTful webhook interface
S3 compatible  Works with S3, MinIO, or any S3-compatible storage
Durability     Ensures data is safely stored before acknowledging
Multi-tenant   Routes data by tenant and dataset
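
The "store before acknowledging" behavior can be sketched as a handler that returns success only after the write has completed. Everything here is hypothetical: the `ObjectStore` interface, the in-memory stand-in for S3/MinIO, the `raw/<tenant>/<dataset>/` key layout, and the specific status codes are assumptions for illustration, not the Receiver's actual API.

```python
from typing import Protocol


class ObjectStore(Protocol):
    def put(self, key: str, body: bytes) -> None: ...


class InMemoryStore:
    """Stand-in for S3/MinIO so the sketch runs without a network."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body


def handle_webhook(store: ObjectStore, tenant: str, dataset: str,
                   batch_id: str, body: bytes) -> int:
    """Return an HTTP-style status code; success only after the write returns."""
    if not tenant or not dataset:
        return 400  # cannot route without both tenant and dataset
    key = f"raw/{tenant}/{dataset}/{batch_id}.gz"  # hypothetical key layout
    try:
        store.put(key, body)
    except Exception:
        return 503  # not stored, so the sender should retry
    return 200  # acknowledged only once the object is stored


store = InMemoryStore()
status = handle_webhook(store, "tenant-a", "firewall-logs", "batch-0001", b"payload")
```

Because the acknowledgment follows the write, a sender that never receives a success response can safely resend the batch.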

Packer

The Packer converts raw data into optimized Parquet files.

Feature             Description
Parquet conversion  Columnar format for efficient queries
Auto-partitioning   Partitions by account/tenant/dataset/time
Schema evolution    Handles changing data structures
Compression         Snappy/Zstd for space efficiency
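
To make partitioning and schema evolution concrete, here is a small sketch using only the standard library. The Hive-style `year=/month=/day=/hour=` layout, the rule that new fields are simply appended to the known schema, and the names `partition_prefix` and `merge_schema` are assumptions; the Packer's real layout and type handling may differ.

```python
from datetime import datetime, timezone


def partition_prefix(account: str, tenant: str, dataset: str, ts: datetime) -> str:
    """Build an object-key prefix partitioned by account/tenant/dataset/time.

    Expects a timezone-aware timestamp; the time layout is an assumption.
    """
    ts = ts.astimezone(timezone.utc)
    return (f"{account}/{tenant}/{dataset}/"
            f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/")


def merge_schema(known: dict[str, str], record: dict) -> dict[str, str]:
    """Add fields seen for the first time; keep the type of existing fields."""
    merged = dict(known)
    for name, value in record.items():
        merged.setdefault(name, type(value).__name__)
    return merged


ts = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)
prefix = partition_prefix("acme", "prod", "netflow", ts)
schema = merge_schema({"src_ip": "str"}, {"src_ip": "10.0.0.1", "bytes": 512})
```

With time encoded in the key prefix, a query engine can skip whole partitions that fall outside the requested time range.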

Piper

The Piper is the data processing engine that applies transformations and enrichments.

Feature          Description
Transformations  Modify, rename, parse fields
Enrichers        Geo-tagging, custom lookups
Filters          Drop unwanted data
Sampling         Reduce volume while maintaining visibility
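
The four features above compose naturally as a chain of stages, each of which either returns a (possibly modified) event or drops it. The sketch below is one way to model that; the `Stage` type, the helper names, and hash-based sampling are assumptions for illustration and do not describe the Piper's configuration format or internals.

```python
import zlib
from typing import Callable, Optional

Stage = Callable[[dict], Optional[dict]]  # a stage returns None to drop the event


def rename(old: str, new: str) -> Stage:
    """Transformation: move the value of field `old` to field `new`."""
    def stage(event: dict) -> Optional[dict]:
        if old in event:
            event = {**event, new: event[old]}
            del event[old]
        return event
    return stage


def enrich(field: str, table: dict, target: str) -> Stage:
    """Custom lookup enricher: copy table[event[field]] into event[target]."""
    def stage(event: dict) -> Optional[dict]:
        value = table.get(event.get(field))
        return {**event, target: value} if value is not None else event
    return stage


def drop_if(predicate: Callable[[dict], bool]) -> Stage:
    """Filter: drop events for which the predicate is true."""
    return lambda event: None if predicate(event) else event


def sample(key: str, keep_one_in: int) -> Stage:
    """Deterministic sampling: keep events whose hashed key lands in bucket 0."""
    def stage(event: dict) -> Optional[dict]:
        digest = zlib.crc32(str(event.get(key)).encode("utf-8"))
        return event if digest % keep_one_in == 0 else None
    return stage


def run(stages: list[Stage], event: dict) -> Optional[dict]:
    """Apply stages in order, stopping as soon as one drops the event."""
    for stage in stages:
        result = stage(event)
        if result is None:
            return None
        event = result
    return event


pipeline = [
    rename("src", "src_ip"),
    enrich("src_ip", {"10.0.0.1": "core-router"}, "host"),
    drop_if(lambda e: e.get("level") == "debug"),
    sample("src_ip", keep_one_in=1),  # 1 keeps everything; raise it to thin the stream
]
result = run(pipeline, {"src": "10.0.0.1", "level": "info"})
```

Hashing a field rather than drawing a random number means the same key is always kept or always dropped, so all events for a sampled source stay visible together.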

Control

The Control plane manages configuration and coordinates all components.

Feature            Description
Configuration      Centralized config management
Health monitoring  Tracks component health
API gateway        Unified API for management
Multi-deployment   Manages on-prem and managed instances

Data Hierarchy

ByteFreezer organizes data in a hierarchical structure:

Account (Organization)
  └── Tenant (Data Source / Environment)
        └── Dataset (Data Stream / Collection)
              └── Events (Individual Records)

See Data Model for details.
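
As a rough mental model, the hierarchy maps onto nested containers. The classes below are purely illustrative; the class names, the fields, and the `add_event` helper are assumptions and not part of any ByteFreezer API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    events: list[dict] = field(default_factory=list)


@dataclass
class Tenant:
    name: str
    datasets: dict[str, Dataset] = field(default_factory=dict)


@dataclass
class Account:
    name: str
    tenants: dict[str, Tenant] = field(default_factory=dict)

    def add_event(self, tenant: str, dataset: str, event: dict) -> None:
        """File an event under tenant/dataset, creating the levels on first use."""
        t = self.tenants.setdefault(tenant, Tenant(tenant))
        d = t.datasets.setdefault(dataset, Dataset(dataset))
        d.events.append(event)


acct = Account("acme")
acct.add_event("prod", "netflow", {"bytes": 512})
acct.add_event("prod", "syslog", {"msg": "link up"})
```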

Deployment Topologies

On-Premises

All components run in your infrastructure:

Your Network
├── Proxy (data collection point)
├── Receiver (webhook endpoint)
├── Piper (processing)
├── Packer (Parquet conversion)
├── Control (management)
└── S3/MinIO (your storage)

Managed

ByteFreezer runs the compute components (Receiver, Piper, Packer, Control); you run the Proxy for data collection and provide the storage:

Your Network                    ByteFreezer Cloud
├── Proxy (your site)           ├── Receiver
└── S3/MinIO (your storage) ◄── ├── Piper
                                ├── Packer
                                └── Control

Hybrid

A mix of on-premises and managed deployments, with different environments using different topologies.

See Deployment Options for details.