# Transformations
Transform your data in real time with ByteFreezer's pipeline system.
## Overview
Transformations are processing pipelines that modify your data as it flows through ByteFreezer. Each dataset can have its own transformation pipeline consisting of one or more filters.
## Pipeline Structure
A transformation pipeline is a JSON configuration with an ordered list of filters:
```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "filter_type", "config": { ... } },
    { "type": "another_filter", "config": { ... } }
  ]
}
```
Filters are applied in order: the output of one filter becomes the input to the next.
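For example, because output feeds forward, a `rename_field` filter placed before `geoip` lets the enrichment step read the renamed field. This is a minimal sketch: the `rename_field` config keys (`from`, `to`) are illustrative assumptions, while the `geoip` keys match the example pipeline later on this page.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "rename_field", "config": { "from": "src_ip", "to": "clientip" } },
    { "type": "geoip", "config": { "source_field": "clientip", "target_field": "geo" } }
  ]
}
```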
## Available Filter Categories
### Field Manipulation

| Filter | Description |
|---|---|
| `add_field` | Add new fields with static or computed values |
| `remove_field` | Remove unwanted fields |
| `rename_field` | Rename fields |
| `mutate` | Convert types, split/merge fields |
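A sketch of a field-manipulation stage. Only the `remove_field` config matches the example pipeline below; the `add_field` and `rename_field` keys are assumptions for illustration, so check the filter reference for the exact schemas.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "add_field", "config": { "field": "environment", "value": "production" } },
    { "type": "rename_field", "config": { "from": "msg", "to": "message" } },
    { "type": "remove_field", "config": { "fields": ["debug_info"] } }
  ]
}
```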
### Pattern Matching & Parsing

| Filter | Description |
|---|---|
| `grok` | Parse unstructured text with grok patterns |
| `regex_replace` | Find and replace with regex |
| `kv` | Parse key-value pairs |
| `date_parse` | Parse date strings into timestamps |
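A sketch of a parsing stage, assuming `kv` and `regex_replace` take config keys analogous to the documented `grok` filter; the key names here (`field_split`, `value_split`, `pattern`, `replacement`) are illustrative, not confirmed.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "kv", "config": { "source_field": "message", "field_split": " ", "value_split": "=" } },
    { "type": "regex_replace", "config": { "field": "card_number", "pattern": "\\d{12}(\\d{4})", "replacement": "****$1" } }
  ]
}
```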
### Data Enrichment

| Filter | Description |
|---|---|
| `geoip` | Add geographic data from IP addresses |
| `dns` | Perform DNS lookups (reverse/forward) |
| `useragent` | Parse user agent strings |
| `enricher` | Look up data from CSV enrichment tables |
| `fingerprint` | Generate hash fingerprints |
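A sketch of an enrichment stage. The `useragent`, `enricher`, and `fingerprint` config keys are assumptions for illustration (including the `asset_owners` table name); only the `source_field`/`target_field` pattern is drawn from the documented `geoip` example.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "useragent", "config": { "source_field": "agent", "target_field": "ua" } },
    { "type": "enricher", "config": { "table": "asset_owners", "lookup_field": "hostname", "target_field": "owner" } },
    { "type": "fingerprint", "config": { "fields": ["clientip", "agent"], "target_field": "event_id" } }
  ]
}
```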
### Filtering & Sampling

| Filter | Description |
|---|---|
| `include` | Keep only matching events |
| `exclude` | Drop matching events |
| `drop` | Drop events based on conditions |
| `sample` | Keep a percentage of events |
| `conditional` | Apply filters conditionally |
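A sketch that keeps only server errors and then retains a fraction of them; the `include` and `sample` config keys are illustrative assumptions.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "include", "config": { "field": "status", "values": ["500", "502", "503"] } },
    { "type": "sample", "config": { "percentage": 10 } }
  ]
}
```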
### JSON Processing

| Filter | Description |
|---|---|
| `json_validate` | Validate JSON structure |
| `json_flatten` | Flatten nested objects |
| `uppercase_keys` | Convert keys to uppercase |
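A sketch of a JSON-processing stage; the config keys (`source_field`, `separator`) are illustrative assumptions.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "json_validate", "config": { "source_field": "message" } },
    { "type": "json_flatten", "config": { "separator": "." } },
    { "type": "uppercase_keys", "config": {} }
  ]
}
```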
## Example Pipeline

This pipeline parses Apache access logs, enriches client IPs with geographic data, removes sensitive fields, and parses the timestamp into `@timestamp`:
```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    {
      "type": "grok",
      "config": {
        "source_field": "message",
        "pattern": "%{COMBINEDAPACHELOG}"
      }
    },
    {
      "type": "geoip",
      "config": {
        "source_field": "clientip",
        "target_field": "geo"
      }
    },
    {
      "type": "remove_field",
      "config": {
        "fields": ["auth", "rawrequest"]
      }
    },
    {
      "type": "date_parse",
      "config": {
        "field": "timestamp",
        "format": "02/Jan/2006:15:04:05 -0700",
        "target": "@timestamp"
      }
    }
  ]
}
```
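For orientation, here is a hypothetical result (illustrative values, not captured output; the exact `geo` sub-fields depend on the GeoIP database). Given an input line like `203.0.113.9 - frank [10/Oct/2023:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"`, the pipeline would emit an event roughly like this, abbreviated:

```json
{
  "clientip": "203.0.113.9",
  "verb": "GET",
  "request": "/index.html",
  "response": "200",
  "bytes": "2326",
  "agent": "Mozilla/5.0",
  "geo": { "country_code": "US" },
  "@timestamp": "2023-10-10T20:55:36Z"
}
```

Note that `auth` (`frank`) is absent: it was dropped by the `remove_field` stage.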
## Testing Transformations

Before activating a pipeline, test it with sample data:

1. Navigate to your dataset's transformation page
2. Configure your pipeline
3. Click **Test** to run against sample events
4. Review the output to verify correctness
5. Click **Activate** when ready
**AI Assistant:** Use the AI Assistant to help build pipelines from natural language descriptions. Click the AI button in the transformation editor.
## Best Practices
| Practice | Description |
|---|---|
| Filter early | Place `include`/`exclude` filters early to reduce processing load (see the sketch after this table) |
| Parse then enrich | Extract fields before enriching them |
| Remove unwanted data | Drop unnecessary fields to reduce storage |
| Test thoroughly | Always test with representative sample data before activating |
| Version your configs | Use meaningful version numbers for tracking changes |
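As a sketch of the "filter early" practice, a cheap `exclude` drops health-check noise before the more expensive grok parse runs, and the bumped version number records the change. The `exclude` config keys are illustrative assumptions.

```json
{
  "enabled": true,
  "version": "1.1.0",
  "filters": [
    { "type": "exclude", "config": { "field": "message", "pattern": "GET /healthz" } },
    { "type": "grok", "config": { "source_field": "message", "pattern": "%{COMBINEDAPACHELOG}" } }
  ]
}
```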
## Performance Considerations
- Grok patterns can be CPU-intensive; prefer specific patterns over broad ones
- DNS lookups add latency; use them sparingly
- Enricher lookups are fast, but keep enrichment tables reasonably sized
- Conditional filters help skip unnecessary processing (see the sketch below)
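A sketch of that last point, with assumed config keys for `conditional` (a `condition` object plus a nested `filters` list, both illustrative): the expensive grok parse runs only for events already tagged as Apache logs.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    {
      "type": "conditional",
      "config": {
        "condition": { "field": "log_type", "equals": "apache" },
        "filters": [
          { "type": "grok", "config": { "source_field": "message", "pattern": "%{COMBINEDAPACHELOG}" } }
        ]
      }
    }
  ]
}
```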