# Transformations
Transform your data in real time with ByteFreezer's pipeline system.
## Overview
Transformations are processing pipelines that modify your data as it flows through ByteFreezer. Each dataset can have its own transformation pipeline consisting of one or more filters.
## Pipeline Structure
A transformation pipeline is a JSON configuration with an ordered list of filters:
```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "filter_type", "config": { ... } },
    { "type": "another_filter", "config": { ... } }
  ]
}
```
Filters are applied in order: the output of one filter becomes the input to the next.
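For example, because output feeds forward, a `rename_field` filter placed before `geoip` lets the enrichment step read the renamed field. This is a minimal sketch: the `rename_field` config keys (`from`, `to`) are illustrative assumptions, while the `geoip` keys match the example pipeline later on this page.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "rename_field", "config": { "from": "src_ip", "to": "clientip" } },
    { "type": "geoip", "config": { "source_field": "clientip", "target_field": "geo" } }
  ]
}
```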
## Available Filter Categories
### Field Manipulation

| Filter | Description |
|---|---|
| `add_field` | Add new fields with static or computed values |
| `remove_field` | Remove unwanted fields |
| `rename_field` | Rename fields |
| `mutate` | Convert types, split/merge fields |
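A sketch of a field-manipulation stage. Only the `remove_field` config matches the example pipeline below; the `add_field` and `rename_field` keys are assumptions for illustration, so check the filter reference for the exact schemas.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "add_field", "config": { "field": "environment", "value": "production" } },
    { "type": "rename_field", "config": { "from": "msg", "to": "message" } },
    { "type": "remove_field", "config": { "fields": ["debug_info"] } }
  ]
}
```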
### Pattern Matching & Parsing

| Filter | Description |
|---|---|
| `grok` | Parse unstructured text with grok patterns |
| `regex_replace` | Find and replace with regex |
| `kv` | Parse key-value pairs |
| `date_parse` | Parse date strings into timestamps |
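A sketch of a parsing stage, assuming `kv` and `regex_replace` take config keys analogous to the documented `grok` filter; the key names here (`field_split`, `value_split`, `pattern`, `replacement`) are illustrative, not confirmed.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "kv", "config": { "source_field": "message", "field_split": " ", "value_split": "=" } },
    { "type": "regex_replace", "config": { "field": "card_number", "pattern": "\\d{12}(\\d{4})", "replacement": "****$1" } }
  ]
}
```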
### Data Enrichment

| Filter | Description |
|---|---|
| `geoip` | Add geographic data from IP addresses |
| `dns` | Perform DNS lookups (reverse/forward) |
| `useragent` | Parse user agent strings |
| `enricher` | Look up data from CSV enrichment tables |
| `fingerprint` | Generate hash fingerprints |
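A sketch of an enrichment stage. The `useragent`, `enricher`, and `fingerprint` config keys are assumptions for illustration (including the `asset_owners` table name); only the `source_field`/`target_field` pattern is drawn from the documented `geoip` example.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "useragent", "config": { "source_field": "agent", "target_field": "ua" } },
    { "type": "enricher", "config": { "table": "asset_owners", "lookup_field": "hostname", "target_field": "owner" } },
    { "type": "fingerprint", "config": { "fields": ["clientip", "agent"], "target_field": "event_id" } }
  ]
}
```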
### Filtering & Sampling

| Filter | Description |
|---|---|
| `include` | Keep only matching events |
| `exclude` | Drop matching events |
| `drop` | Drop events based on conditions |
| `sample` | Keep a percentage of events |
| `conditional` | Apply filters conditionally |
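A sketch that keeps only server errors and then retains a fraction of them; the `include` and `sample` config keys are illustrative assumptions.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "include", "config": { "field": "status", "values": ["500", "502", "503"] } },
    { "type": "sample", "config": { "percentage": 10 } }
  ]
}
```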
### JSON Processing

| Filter | Description |
|---|---|
| `json_validate` | Validate JSON structure |
| `json_flatten` | Flatten nested objects |
| `uppercase_keys` | Convert keys to uppercase |
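A sketch of a JSON-processing stage; the config keys (`source_field`, `separator`) are illustrative assumptions.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "json_validate", "config": { "source_field": "message" } },
    { "type": "json_flatten", "config": { "separator": "." } },
    { "type": "uppercase_keys", "config": {} }
  ]
}
```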
## Example Pipeline

This pipeline parses Apache access logs, enriches client IPs with geographic data, removes sensitive fields, and parses the timestamp into `@timestamp`:
```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    {
      "type": "grok",
      "config": {
        "source_field": "message",
        "pattern": "%{COMBINEDAPACHELOG}"
      }
    },
    {
      "type": "geoip",
      "config": {
        "source_field": "clientip",
        "target_field": "geo"
      }
    },
    {
      "type": "remove_field",
      "config": {
        "fields": ["auth", "rawrequest"]
      }
    },
    {
      "type": "date_parse",
      "config": {
        "field": "timestamp",
        "format": "02/Jan/2006:15:04:05 -0700",
        "target": "@timestamp"
      }
    }
  ]
}
```
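For orientation, here is a hypothetical result (illustrative values, not captured output; the exact `geo` sub-fields depend on the GeoIP database). Given an input line like `203.0.113.9 - frank [10/Oct/2023:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"`, the pipeline would emit an event roughly like this, abbreviated:

```json
{
  "clientip": "203.0.113.9",
  "verb": "GET",
  "request": "/index.html",
  "response": "200",
  "bytes": "2326",
  "agent": "Mozilla/5.0",
  "geo": { "country_code": "US" },
  "@timestamp": "2023-10-10T20:55:36Z"
}
```

Note that `auth` (`frank`) is absent: it was dropped by the `remove_field` stage.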
## Testing Transformations

Before activating a pipeline, test it with sample data:

1. Navigate to your dataset's transformation page
2. Configure your pipeline
3. Click **Test** to run against sample events
4. Review the output to verify correctness
5. Click **Activate** when ready
**AI Assistant:** Use the AI Assistant to help build pipelines from natural language descriptions. Click the AI button in the transformation editor.
## Best Practices
| Practice | Description |
|---|---|
| Filter early | Place `include`/`exclude` filters early to reduce processing load (see the sketch after this table) |
| Parse then enrich | Extract fields before enriching them |
| Remove unwanted data | Drop unnecessary fields to reduce storage |
| Test thoroughly | Always test with representative sample data before activating |
| Version your configs | Use meaningful version numbers for tracking changes |
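As a sketch of the "filter early" practice, a cheap `exclude` drops health-check noise before the more expensive grok parse runs, and the bumped version number records the change. The `exclude` config keys are illustrative assumptions.

```json
{
  "enabled": true,
  "version": "1.1.0",
  "filters": [
    { "type": "exclude", "config": { "field": "message", "pattern": "GET /healthz" } },
    { "type": "grok", "config": { "source_field": "message", "pattern": "%{COMBINEDAPACHELOG}" } }
  ]
}
```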
## Performance Considerations
- Grok patterns can be CPU-intensive; prefer specific patterns over broad ones
- DNS lookups add latency; use them sparingly
- Enricher lookups are fast, but keep enrichment tables reasonably sized
- Conditional filters help skip unnecessary processing (see the sketch below)
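A sketch of that last point, with assumed config keys for `conditional` (a `condition` object plus a nested `filters` list, both illustrative): the expensive grok parse runs only for events already tagged as Apache logs.

```json
{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    {
      "type": "conditional",
      "config": {
        "condition": { "field": "log_type", "equals": "apache" },
        "filters": [
          { "type": "grok", "config": { "source_field": "message", "pattern": "%{COMBINEDAPACHELOG}" } }
        ]
      }
    }
  ]
}
```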