
Transformations

Transform your data in real time with ByteFreezer's pipeline system.

Overview

Transformations are processing pipelines that modify your data as it flows through ByteFreezer. Each dataset can have its own transformation pipeline consisting of one or more filters.

Pipeline Structure

A transformation pipeline is a JSON configuration with an ordered list of filters:

{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "filter_type", "config": { ... } },
    { "type": "another_filter", "config": { ... } }
  ]
}

Filters are applied in order—the output of one filter becomes the input to the next.
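
For example, in this sketch the first filter renames msg to message, so the grok filter that follows can parse the renamed field (the rename_field config keys shown here are illustrative; check the filter reference for exact names):

{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    { "type": "rename_field", "config": { "from": "msg", "to": "message" } },
    { "type": "grok", "config": { "source_field": "message", "pattern": "%{COMBINEDAPACHELOG}" } }
  ]
}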

Available Filter Categories

Field Manipulation

Filter        Description
add_field     Add new fields with static or computed values
remove_field  Remove unwanted fields
rename_field  Rename fields
mutate        Convert types, split/merge fields
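
A sketch combining these filters to add a static field, rename one, and drop another (the add_field and rename_field config keys are illustrative; remove_field's "fields" key matches the example pipeline below):

"filters": [
  { "type": "add_field", "config": { "field": "environment", "value": "production" } },
  { "type": "rename_field", "config": { "from": "src_ip", "to": "source_ip" } },
  { "type": "remove_field", "config": { "fields": ["debug_info"] } }
]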

Pattern Matching & Parsing

Filter         Description
grok           Parse unstructured text with grok patterns
regex_replace  Find and replace with regex
kv             Parse key-value pairs
date_parse     Parse date strings into timestamps
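
For instance, a kv filter could turn a line like "ts=2024-05-01T12:00:00Z level=info" into fields, and date_parse could then normalize the timestamp (the kv splitter keys are illustrative; the date_parse keys match the example pipeline below):

"filters": [
  { "type": "kv", "config": { "source_field": "message", "field_split": " ", "value_split": "=" } },
  { "type": "date_parse", "config": { "field": "ts", "format": "2006-01-02T15:04:05Z07:00", "target": "@timestamp" } }
]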

Data Enrichment

Filter       Description
geoip        Add geographic data from IP addresses
dns          DNS lookups (reverse/forward)
useragent    Parse user agent strings
enricher     Look up data from CSV enrichment tables
fingerprint  Generate hash fingerprints
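
A sketch that parses a user agent string and then fingerprints the event (the useragent config follows the source_field/target_field pattern shown for geoip below; the fingerprint keys are illustrative):

"filters": [
  { "type": "useragent", "config": { "source_field": "agent", "target_field": "ua" } },
  { "type": "fingerprint", "config": { "fields": ["clientip", "agent"], "target_field": "event_id" } }
]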

Filtering & Sampling

Filter       Description
include      Keep only matching events
exclude      Drop matching events
drop         Drop events based on conditions
sample       Keep a percentage of events
conditional  Apply filters conditionally
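
A sketch that drops debug events first and then keeps 10% of what remains (all config keys here are illustrative; see the filter reference for exact names):

"filters": [
  { "type": "exclude", "config": { "field": "level", "value": "debug" } },
  { "type": "sample", "config": { "percentage": 10 } }
]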

JSON Processing

Filter          Description
json_validate   Validate JSON structure
json_flatten    Flatten nested objects
uppercase_keys  Convert keys to uppercase
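
For example, a json_flatten filter (config keys illustrative) could turn a nested object like {"request": {"path": "/api"}} into a flat field such as request.path:

{ "type": "json_flatten", "config": { "source_field": "payload", "separator": "." } }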

Example Pipeline

This pipeline parses Apache logs, adds geographic data, removes sensitive fields, and normalizes the timestamp:

{
  "enabled": true,
  "version": "1.0.0",
  "filters": [
    {
      "type": "grok",
      "config": {
        "source_field": "message",
        "pattern": "%{COMBINEDAPACHELOG}"
      }
    },
    {
      "type": "geoip",
      "config": {
        "source_field": "clientip",
        "target_field": "geo"
      }
    },
    {
      "type": "remove_field",
      "config": {
        "fields": ["auth", "rawrequest"]
      }
    },
    {
      "type": "date_parse",
      "config": {
        "field": "timestamp",
        "format": "02/Jan/2006:15:04:05 -0700",
        "target": "@timestamp"
      }
    }
  ]
}
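
Note the ordering: grok runs first so that the clientip and timestamp fields exist before the geoip and date_parse filters reference them.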

Testing Transformations

Before activating a pipeline, test it with sample data:

  1. Navigate to your dataset's transformation page
  2. Configure your pipeline
  3. Click Test to run against sample events
  4. Review the output to verify correctness
  5. Click Activate when ready

AI Assistant

Use the AI Assistant to help build pipelines from natural language descriptions. Click the AI button in the transformation editor.

Best Practices

Practice              Description
Filter early          Place include/exclude filters early to reduce processing load
Parse then enrich     Extract fields before enriching them
Remove unwanted data  Drop unnecessary fields to reduce storage
Test thoroughly       Always test with representative sample data before activating
Version your configs  Use meaningful version numbers for tracking changes
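
As a sketch of "filter early" (the include config keys are illustrative), a cheap include filter drops non-matching events before the CPU-heavier grok filter runs:

{
  "enabled": true,
  "version": "1.1.0",
  "filters": [
    { "type": "include", "config": { "field": "source", "value": "nginx" } },
    { "type": "grok", "config": { "source_field": "message", "pattern": "%{COMBINEDAPACHELOG}" } }
  ]
}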

Performance Considerations

  • Grok patterns can be CPU-intensive—use specific patterns when possible
  • DNS lookups add latency—use sparingly
  • Enricher lookups are fast, but keep enrichment tables reasonably sized
  • Conditional filters help skip unnecessary processing
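
For example, a conditional filter (the condition shape here is a guess; consult the filter reference) could run json_flatten only on events that actually carry JSON:

{
  "type": "conditional",
  "config": {
    "condition": { "field": "content_type", "value": "application/json" },
    "filters": [
      { "type": "json_flatten", "config": { "source_field": "payload" } }
    ]
  }
}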