Architecture

ByteFreezer is a distributed system with specialized components for each stage of the data pipeline.

Component Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                              DATA SOURCES                                    │
│   UDP │ TCP │ Syslog │ sFlow │ IPFIX │ HTTP │ SQS │ Kafka │ NATS │ Kinesis  │
└───────────────────────────────────┬─────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                               PROXY                                          │
│   • Receives data from all source types                                      │
│   • Batches and compresses for efficiency                                    │
│   • Forwards to Receiver via HTTP                                            │
└───────────────────────────────────┬─────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                              RECEIVER                                        │
│   • HTTP webhook endpoint                                                    │
│   • Stores raw data to S3/MinIO                                              │
│   • Maintains data durability                                                │
└───────────────────────────────────┬─────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                                PIPER                                         │
│   • Applies transformations                                                  │
│   • Runs enrichers (geo-tagging, custom)                                     │
│   • Filters and samples data                                                 │
│   • Routes to destinations                                                   │
└───────────────────────────────────┬─────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                               PACKER                                         │
│   • Reads processed data                                                     │
│   • Converts to Parquet format                                               │
│   • Auto-partitions by time                                                  │
│   • Handles schema evolution                                                 │
└───────────────────────────────────┬─────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                            STORAGE (S3/MinIO)                                │
│   • Parquet files with columnar storage                                      │
│   • Auto-partitioned by account/tenant/dataset/time                          │
│   • Schema evolution supported                                               │
│   • Ready for AI model integration                                           │
└───────────────────────────────────┬─────────────────────────────────────────┘
                          ┌─────────┴──────────┐
                          ▼                    ▼
┌──────────────────────────────────┐ ┌────────────────────────────────────────┐
│          QUERY LAYER             │ │             CONNECTOR                    │
│   • DuckDB for SQL queries       │ │   • Export subsets to external systems   │
│   • AI agents for NL queries     │ │   • Elasticsearch, Splunk, webhooks     │
│   • BYOA (Bring Your Own AI)     │ │   • SQL-based filtering                 │
└──────────────────────────────────┘ └────────────────────────────────────────┘

Data Sovereignty

ByteFreezer is designed so that your data never leaves your infrastructure.

  • Proxy runs in your network — data is collected at your edge
  • Receiver, Piper, Packer run in your infrastructure — processing happens on your compute
  • Storage is your S3 or MinIO bucket — you own every byte
  • Control plane (managed by ByteFreezer) stores only metadata: dataset configurations, health status, and user accounts. It never touches raw event data.

The control plane communicates with your components over HTTPS. Components pull configuration from Control and report health status back. No raw data flows through the control plane.
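The "metadata only" rule can be pictured as an allow-list applied to anything a component reports to Control. The sketch below is illustrative only: the field names, the `ALLOWED_FIELDS` set, and the `build_health_report` helper are hypothetical and do not describe ByteFreezer's actual health-report schema or API.

```python
import json

# Hypothetical allow-list: the only keys a health report may carry.
ALLOWED_FIELDS = {"component", "status", "events_processed", "reported_at"}


def build_health_report(**fields):
    """Serialize a health report, rejecting any field outside the allow-list."""
    unexpected = set(fields) - ALLOWED_FIELDS
    if unexpected:
        # Anything not explicitly listed (e.g. a raw event payload) is refused.
        raise ValueError(f"not metadata: {sorted(unexpected)}")
    return json.dumps(fields, sort_keys=True)
```

Under these assumptions, a report such as `build_health_report(component="proxy", status="ok")` serializes normally, while passing a field like `payload=...` raises before anything is sent.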

See Security Model for a detailed breakdown of what you run vs. what we run.

Components

Proxy

The Proxy is the data collection point. It accepts data from network protocols and message queues, then batches, compresses, and forwards it to the Receiver over HTTP.

Feature          Description
Multi-protocol   UDP, TCP, Syslog, sFlow, IPFIX, HTTP
Message queues   SQS, Kafka, NATS, Kinesis
Batching         Groups events for efficient transfer
Compression      Reduces bandwidth usage
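Batching and compression can be illustrated with a minimal Python sketch. The newline-delimited JSON encoding, the gzip codec, the `max_events` threshold, and the `Batcher` class itself are assumptions made for illustration; this page does not specify the Proxy's wire format or flush policy.

```python
import gzip
import json


class Batcher:
    """Hypothetical batcher: buffers events, emits one compressed payload per batch."""

    def __init__(self, max_events=3):
        self.max_events = max_events  # assumed flush threshold
        self.buffer = []

    def add(self, event):
        """Buffer one event; return a compressed batch once the buffer is full."""
        self.buffer.append(event)
        if len(self.buffer) >= self.max_events:
            return self.flush()
        return None

    def flush(self):
        """Encode buffered events as NDJSON, gzip them, and reset the buffer."""
        if not self.buffer:
            return None
        body = "\n".join(json.dumps(e, sort_keys=True) for e in self.buffer)
        self.buffer = []
        return gzip.compress(body.encode("utf-8"))


def decode_batch(payload):
    """Inverse of flush(): recover the list of events from a compressed batch."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]
```

Sending one payload per batch instead of one request per event is what reduces both request overhead and bandwidth; a real collector would typically also flush on a timer so that slow sources are not held back indefinitely.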

Receiver

The Receiver is an HTTP webhook endpoint that stores raw data to object storage.

Feature         Description
HTTP endpoint   RESTful webhook interface
S3 compatible   Works with S3, MinIO, or any S3-compatible storage
Durability      3-stage spooling: memory → disk → S3
Multi-tenant    Routes data by tenant and dataset
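The memory → disk → S3 spooling idea can be sketched as follows. This is not the Receiver's implementation: the `Spool` class, its `memory_limit`, the flat file naming, and the `upload` callable (which stands in for an S3 PUT) are all hypothetical.

```python
import os
import tempfile


class Spool:
    """Hypothetical three-stage spool: memory -> disk -> object storage."""

    def __init__(self, upload, spool_dir=None, memory_limit=2):
        self.upload = upload  # callable(key, data); stands in for an S3 PUT
        self.spool_dir = spool_dir or tempfile.mkdtemp(prefix="spool-")
        self.memory_limit = memory_limit
        self.memory = []  # (key, data) pairs not yet written to disk
        os.makedirs(self.spool_dir, exist_ok=True)

    def accept(self, key, data):
        """Stage 1: hold the payload in memory, spilling to disk when full."""
        self.memory.append((key, data))
        if len(self.memory) >= self.memory_limit:
            self.spill()

    def spill(self):
        """Stage 2: write buffered payloads to disk (keys must be flat file names)."""
        for key, data in self.memory:
            with open(os.path.join(self.spool_dir, key), "wb") as f:
                f.write(data)
        self.memory = []

    def drain(self):
        """Stage 3: upload spooled files; delete each only after its upload succeeds."""
        uploaded = 0
        for name in sorted(os.listdir(self.spool_dir)):
            path = os.path.join(self.spool_dir, name)
            with open(path, "rb") as f:
                data = f.read()
            try:
                self.upload(name, data)
            except OSError:
                break  # keep the file on disk and retry on the next drain
            os.remove(path)
            uploaded += 1
        return uploaded
```

The design point is that a payload is removed from one stage only after the next stage has it, so an object-storage outage leaves data on local disk rather than dropping it.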

Piper

The Piper is the data processing engine. After the Receiver has stored the raw data, the Piper applies transformations and enrichments, filters and samples events, and routes the results to their destinations.

Feature           Description
Transformations   Modify, rename, parse fields
Enrichers         Geo-tagging, custom lookups
Filters           Drop unwanted data
Sampling          Reduce volume while maintaining visibility
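A transform, filter, and sample chain of this kind can be sketched as a list of small step functions. The helper names (`rename_field`, `drop_if`, `sample_every`, `run_pipeline`) and the keep-one-in-n sampling rule are assumptions for illustration, not the Piper's configuration language or its sampling algorithm.

```python
def rename_field(old, new):
    """Transformation: return a step that renames one field if it is present."""
    def step(event):
        if old in event:
            event = dict(event)  # copy so the caller's event is not mutated
            event[new] = event.pop(old)
        return event
    return step


def drop_if(predicate):
    """Filter: return a step that drops events matching the predicate."""
    def step(event):
        return None if predicate(event) else event
    return step


def sample_every(n):
    """Sampling: return a step that keeps the 1st, (n+1)th, (2n+1)th ... event."""
    seen = 0

    def step(event):
        nonlocal seen
        seen += 1
        return event if (seen - 1) % n == 0 else None
    return step


def run_pipeline(events, steps):
    """Apply steps in order; a step returning None drops the event."""
    out = []
    for event in events:
        for step in steps:
            event = step(event)
            if event is None:
                break
        else:
            out.append(event)
    return out
```

Order matters in a chain like this: filtering before sampling means the sampler counts only events that survived the filter, so the sampled output stays representative of the data you actually want to keep.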

Packer

The Packer converts processed data into optimized Parquet files for long-term storage.

Feature              Description
Parquet conversion   Columnar format for efficient queries
Auto-partitioning    Partitions by account/tenant/dataset/time
Schema evolution     Handles changing data structures
Compression          Snappy/Zstd for space efficiency
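To make the account/tenant/dataset/time partitioning concrete, here is a small sketch that derives an object key from those four values. The Hive-style `year=/month=/day=/hour=` segments, the hourly granularity, the default file name, and the `partition_key` function are assumptions; the actual key layout used by the Packer is not given on this page.

```python
from datetime import datetime, timezone


def partition_key(account, tenant, dataset, ts, filename="part-0000.parquet"):
    """Build a hypothetical object key partitioned by account/tenant/dataset/time."""
    if ts.tzinfo is None:
        # Naive timestamps are ambiguous; require an explicit time zone.
        raise ValueError("ts must be timezone-aware")
    ts = ts.astimezone(timezone.utc)  # normalize so partitions are in UTC
    return (
        f"{account}/{tenant}/{dataset}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/{filename}"
    )
```

With a layout along these lines, a query restricted to one dataset and one time range only needs to read the objects under the matching prefixes.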

Control

The Control plane manages configuration and coordinates all components. This is the only component managed by ByteFreezer.

Feature             Description
Configuration       Centralized config management
Health monitoring   Tracks component health
API gateway         Unified API for management
UI                  Web interface for configuration and monitoring

Data Hierarchy

ByteFreezer organizes data in a hierarchical structure:

Account (Organization)
  └── Tenant (Data Source / Environment)
        └── Dataset (Data Stream / Collection)
              └── Events (Individual Records)
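The hierarchy maps naturally onto nested containers. The sketch below models it with Python dataclasses; the class and method names are illustrative only and are not ByteFreezer's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    events: list = field(default_factory=list)  # individual records


@dataclass
class Tenant:
    name: str
    datasets: dict = field(default_factory=dict)  # dataset name -> Dataset


@dataclass
class Account:
    name: str
    tenants: dict = field(default_factory=dict)  # tenant name -> Tenant

    def add_event(self, tenant, dataset, event):
        """Store an event under account -> tenant -> dataset, creating levels as needed."""
        t = self.tenants.setdefault(tenant, Tenant(tenant))
        d = t.datasets.setdefault(dataset, Dataset(dataset))
        d.events.append(event)
```

Every event is therefore addressed by three names (account, tenant, dataset), the same three levels that appear ahead of time in the storage partitioning.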

See Data Model for details.

Deployment Topologies

On-Premises

You run all data-plane components in your infrastructure. The control plane is managed by ByteFreezer:

Your Infrastructure                 ByteFreezer
├── Proxy (data collection)         ├── Control (config + health)
├── Receiver (webhook endpoint)     └── UI (management dashboard)
├── Piper (processing)
├── Packer (Parquet conversion)
├── Query (analytics)
├── Connector (data export)
└── S3/MinIO (your storage)

Enterprise

White-glove deployment for regulated environments, including air-gapped installations, custom integrations, and dedicated support.

See Deployment Options for details.