# Sources
ByteFreezer accepts data from a wide variety of sources. The Proxy component handles data collection and forwarding.
## Supported Protocols

### Network Protocols
| Protocol | Port | Description |
|---|---|---|
| UDP | Configurable | High-throughput, connectionless |
| TCP | Configurable | Reliable, connection-oriented |
| Syslog | 514 (UDP/TCP) | Standard logging protocol (RFC 5424) |
| sFlow | 6343 (UDP) | Network traffic sampling |
| IPFIX | 4739 (UDP/TCP) | IP Flow Information Export |
### HTTP/Webhook
| Method | Endpoint | Description |
|---|---|---|
| POST | /webhook/{tenant}/{dataset} | Direct HTTP ingestion |
| POST | /v1/events | Batch event submission |
### Message Queues
| Queue | Description |
|---|---|
| SQS | AWS Simple Queue Service |
| Kafka | Apache Kafka topics |
| NATS | NATS messaging |
| Kinesis | AWS Kinesis streams |
## Configuration
### UDP/TCP Sources
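No configuration snippet is shown for this section. A minimal sketch, assuming the same `sources:` list syntax as the message queue example below; the `type`, `port`, and `tls` keys are assumptions, not confirmed ByteFreezer options:

```yaml
# Hypothetical keys -- check your Proxy's actual schema.
sources:
  - type: udp
    port: 5140        # UDP/TCP ports are configurable (see protocol table)
  - type: tcp
    port: 5141
    tls: true         # assumed flag; enable if the Proxy supports TLS
```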
### sFlow/IPFIX Sources
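No configuration snippet is shown for this section either. A hedged sketch using the ports from the protocol table above; the `type` values are assumptions:

```yaml
# Hypothetical keys -- ports match the defaults in the protocol table above.
sources:
  - type: sflow
    port: 6343        # sFlow default (UDP)
  - type: ipfix
    port: 4739        # IPFIX default (UDP/TCP)
```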
### Message Queue Sources
```yaml
sources:
  - type: kafka
    brokers:
      - kafka1:9092
      - kafka2:9092
    topic: security-events
    group_id: bytefreezer
  - type: sqs
    queue_url: https://sqs.us-east-1.amazonaws.com/123456789/events
    region: us-east-1
```
## Webhook Endpoint
For HTTP-based ingestion, send events directly to the Receiver:
```bash
# Single event
curl -X POST https://receiver.bytefreezer.com/webhook/{tenant}/{dataset} \
  -H "Content-Type: application/json" \
  -d '{"timestamp": "2024-01-15T10:30:00Z", "level": "info", "message": "Event"}'

# Batch events (JSON Lines)
curl -X POST https://receiver.bytefreezer.com/webhook/{tenant}/{dataset} \
  -H "Content-Type: application/x-ndjson" \
  -d '{"timestamp": "2024-01-15T10:30:00Z", "event": "login"}
{"timestamp": "2024-01-15T10:30:01Z", "event": "logout"}'
```
## Data Formats
ByteFreezer accepts multiple data formats:
| Format | Content-Type | Description |
|---|---|---|
| JSON | application/json | Single JSON object |
| JSON Lines | application/x-ndjson | One JSON object per line |
| Syslog | N/A (network) | RFC 5424 structured data |
| Raw | text/plain | Plain text (parsed downstream) |
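To show what a JSON Lines body looks like on the wire, a small Python sketch that builds the same two-event batch used in the curl example above (the endpoint and Content-Type come from this page; the event fields are illustrative):

```python
# Build a JSON Lines (NDJSON) batch payload for the webhook endpoint.
import json

events = [
    {"timestamp": "2024-01-15T10:30:00Z", "event": "login"},
    {"timestamp": "2024-01-15T10:30:01Z", "event": "logout"},
]

# One compact JSON object per line, newline-separated -- no surrounding
# array brackets or commas, which is what application/x-ndjson expects.
body = "\n".join(json.dumps(e, separators=(",", ":")) for e in events)
headers = {"Content-Type": "application/x-ndjson"}

print(body)
```

The resulting `body` string can be sent with any HTTP client (e.g. `urllib.request` or curl) as the POST payload.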
## Best Practices

### High-Throughput Sources
For high-volume sources like sFlow or IPFIX:
- **Deploy Proxy close to the source** - minimize network latency
- **Use UDP where possible** - lower overhead than TCP
- **Enable batching** - group events before forwarding
- **Consider sampling** - use Piper to sample if volume is too high
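As an illustration of the batching recommendation, a sketch under stated assumptions: the `batch` block and its keys are hypothetical, not documented ByteFreezer options:

```yaml
sources:
  - type: sflow
    port: 6343
    batch:
      max_events: 5000   # flush after this many events...
      max_wait_ms: 500   # ...or after this long, whichever comes first
```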
### Reliable Sources
For critical data that must not be lost:
- **Use TCP or HTTP** - ensures delivery confirmation
- **Enable TLS** - encrypt data in transit
- **Use message queues** - SQS/Kafka provide persistence
### Multi-Source Environments
When collecting from multiple sources:
- **Use separate tenants** - one tenant per source type
- **Tag at collection** - add source metadata in the Proxy
- **Consistent schemas** - normalize data with Piper transformations
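To illustrate tagging at collection, a hedged sketch: the `tags` key and its placement are assumptions about the Proxy's source schema, not a confirmed option:

```yaml
sources:
  - type: syslog
    port: 514
    tags:               # assumed key: metadata attached to every event
      source: firewall
      site: us-east
```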