Module entity_churn

Multi-day entity churn analysis with intelligent ephemeral pattern detection.

Tracks entity lifecycle across multiple audit log files to identify:

  • New entities appearing each day
  • Returning vs. churned entities
  • Entity persistence patterns
  • Authentication method usage trends
  • Ephemeral entities using data-driven pattern learning

§Usage

# Analyze entity churn across a week
vault-audit entity-churn day1.log day2.log day3.log day4.log day5.log day6.log day7.log

# With baseline for accurate new entity detection
vault-audit entity-churn *.log --baseline baseline_entities.json

# With entity mappings for enriched display names
vault-audit entity-churn *.log --baseline baseline.json --entity-map entity_mappings.json

# Export detailed churn data with ephemeral analysis
vault-audit entity-churn *.log --output entity_churn.json

# Export as CSV format
vault-audit entity-churn *.log --output entity_churn.csv --format csv

§Ephemeral Pattern Detection

The command runs a two-pass analysis to detect ephemeral entities (e.g., CI/CD pipeline entities, temporary build entities) and assigns each one a confidence score:

Pass 1: Data Collection

  • Track all entities across log files
  • Record first/last seen times and files
  • Count login activity per entity
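A minimal sketch of what this first pass might look like. The `LoginEvent` and `EntityStats` types and the `collect_file` function are hypothetical names introduced for illustration, not the module's actual API. The sketch assumes files are processed in chronological order and that events within a file are already sorted by time.

```rust
use std::collections::HashMap;

/// One request parsed from an audit log (hypothetical shape).
pub struct LoginEvent<'a> {
    pub entity_id: &'a str,
    pub path: &'a str,
    pub time: &'a str,
}

/// Per-entity state accumulated during the first pass (hypothetical shape).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct EntityStats {
    pub first_seen_file: String,
    pub first_seen_time: String,
    pub last_seen_file: String,
    pub last_seen_time: String,
    pub files_appeared: Vec<String>,
    pub total_logins: u64,
}

/// Fold the events of one log file into the running per-entity map.
pub fn collect_file(
    stats: &mut HashMap<String, EntityStats>,
    file: &str,
    events: &[LoginEvent<'_>],
) {
    for ev in events {
        // Only login operations are counted.
        if !ev.path.ends_with("/login") {
            continue;
        }
        // First sighting: remember where and when the entity appeared.
        let entry = stats
            .entry(ev.entity_id.to_string())
            .or_insert_with(|| EntityStats {
                first_seen_file: file.to_string(),
                first_seen_time: ev.time.to_string(),
                ..EntityStats::default()
            });
        // Every sighting: overwrite the "last seen" markers.
        entry.last_seen_file = file.to_string();
        entry.last_seen_time = ev.time.to_string();
        // Record each file once, in processing order.
        if entry.files_appeared.last().map(String::as_str) != Some(file) {
            entry.files_appeared.push(file.to_string());
        }
        entry.total_logins += 1;
    }
}
```

Calling `collect_file` once per log file, in order, leaves one `EntityStats` per entity that the second pass can classify.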

Pass 2: Pattern Learning & Classification

  • Learn patterns from entities that appeared on only 1-2 days
  • Identify naming patterns (e.g., github-repo:org/repo:ref:branch)
  • Calculate confidence scores (0.0-1.0) based on:
    • Days active (1 day = high confidence, 2 days = medium)
    • Similar entities on same mount path
    • Activity levels (low login counts)
    • Gaps in activity (reduces confidence for sporadic access)
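The scoring factors above could be combined roughly as follows. This is a sketch only: the `EphemeralSignals` struct, the `ephemeral_confidence` function, and every weight and threshold in it are assumptions chosen for illustration, not the values the command actually uses.

```rust
/// Signals gathered for one entity (hypothetical field names).
pub struct EphemeralSignals {
    /// Number of days (log files) the entity appeared in.
    pub days_active: u32,
    /// Other short-lived entities seen on the same mount path.
    pub similar_on_mount: u32,
    /// Total login count across all files.
    pub total_logins: u64,
    /// True if the entity was absent between two appearances.
    pub has_activity_gap: bool,
}

/// Combine the signals into a score in the range 0.0-1.0.
/// All weights below are illustrative assumptions.
pub fn ephemeral_confidence(s: &EphemeralSignals) -> f64 {
    // Days active sets the base: 1 day is a strong signal, 2 days a medium one.
    let mut score: f64 = match s.days_active {
        1 => 0.6,
        2 => 0.4,
        _ => return 0.0, // longer-lived entities are not candidates
    };
    // Many similar entities on the same mount path suggest a shared pattern.
    if s.similar_on_mount >= 5 {
        score += 0.2;
    }
    // Low login counts are typical of short-lived entities.
    if s.total_logins <= 10 {
        score += 0.1;
    }
    // Gaps suggest sporadic access rather than ephemerality.
    if s.has_activity_gap {
        score -= 0.2;
    }
    score.clamp(0.0, 1.0)
}
```

Under these assumed weights, an entity seen on a single day scores higher than an otherwise identical entity seen on two days, and an activity gap lowers the score.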

§Output

§Entity Lifecycle Classification:

  • new_day_N: Entities first seen on day N (not in baseline)
  • pre_existing_baseline: Entities that existed before analysis period
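As a rough illustration of how these two labels could be assigned, the sketch below checks the baseline first and otherwise formats the day number. The `lifecycle` function and its parameters are hypothetical, and the sketch assumes days are numbered from 1 in the order the log files are given.

```rust
use std::collections::HashSet;

/// Return the lifecycle label for an entity.
/// `first_seen_day` is assumed to be 1-based; `baseline` holds the
/// entity IDs known before the analysis period.
pub fn lifecycle(entity_id: &str, first_seen_day: usize, baseline: &HashSet<String>) -> String {
    if baseline.contains(entity_id) {
        // Known before the analysis window, so not new on any day.
        "pre_existing_baseline".to_string()
    } else {
        format!("new_day_{}", first_seen_day)
    }
}
```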

§Activity Patterns:

  • consistent: Appeared in most/all log files
  • sporadic: Appeared intermittently with gaps
  • declining: Activity decreased over time
  • single_burst: Appeared only once
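One way to derive these four labels from per-file login counts is sketched below. The `activity_pattern` function, the order of the checks, and the "late activity is less than half of early activity" rule for `declining` are all assumptions made for illustration; the command's real heuristics may differ.

```rust
/// Classify an entity from its login counts per log file, in file order
/// (0 means the entity did not appear in that file).
/// Returns `None` if the entity never logged in.
pub fn activity_pattern(logins_per_file: &[u64]) -> Option<&'static str> {
    // Indices of the files in which the entity was active.
    let present: Vec<usize> = logins_per_file
        .iter()
        .enumerate()
        .filter_map(|(i, &n)| if n > 0 { Some(i) } else { None })
        .collect();
    let (first, last) = match (present.first(), present.last()) {
        (Some(&f), Some(&l)) => (f, l),
        _ => return None,
    };
    if present.len() == 1 {
        return Some("single_burst");
    }
    // A gap exists when the active span covers more files than the
    // entity actually appeared in.
    if last - first + 1 > present.len() {
        return Some("sporadic");
    }
    // Compare the first and second halves of the (gap-free) active span.
    let span = &logins_per_file[first..=last];
    let half = span.len() / 2;
    let early: u64 = span[..half].iter().sum();
    let late: u64 = span[span.len() - half..].iter().sum();
    if late * 2 < early {
        return Some("declining");
    }
    Some("consistent")
}
```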

§Ephemeral Detection:

  • Confidence levels: High (≥70%), Medium (50-69%), Low (40-49%)
  • Detailed reasoning for each classification
  • Top ephemeral entities by confidence
  • Pattern statistics and mount path analysis
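The confidence bands listed above map onto the 0.0-1.0 score as in the following sketch. The `ConfidenceLevel` enum and `confidence_level` function are hypothetical names; treating scores below 0.40 as "no level" is an assumption inferred from the lowest band.

```rust
/// Reporting band for an ephemeral confidence score.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfidenceLevel {
    High,
    Medium,
    Low,
}

/// Map a 0.0-1.0 score to its band: High (>= 0.70), Medium (0.50-0.69),
/// Low (0.40-0.49). Scores below 0.40 (and NaN) yield `None`.
pub fn confidence_level(score: f64) -> Option<ConfidenceLevel> {
    if score >= 0.70 {
        Some(ConfidenceLevel::High)
    } else if score >= 0.50 {
        Some(ConfidenceLevel::Medium)
    } else if score >= 0.40 {
        Some(ConfidenceLevel::Low)
    } else {
        None
    }
}
```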

§JSON Output Fields

When using --output, each entity record includes:

  • entity_id: Vault entity identifier
  • display_name: Human-readable name
  • first_seen_file / first_seen_time: When first observed
  • last_seen_file / last_seen_time: When last observed
  • files_appeared: List of log files in which the entity was active
  • total_logins: Total login count across all files
  • lifecycle: Entity lifecycle classification
  • activity_pattern: Behavioral pattern classification
  • is_ephemeral_pattern: Boolean flag for ephemeral detection
  • ephemeral_confidence: Confidence score (0.0-1.0)
  • ephemeral_reasons: Array of human-readable reasons
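A consumer of the exported JSON might model each record with a struct like the one below. The field names come from the list above; the struct name `EntityChurnRecord`, the Rust types chosen for each field, and the `is_consistent` helper are assumptions, since the exact JSON types are not spelled out here.

```rust
/// One entity record from the exported churn data (types are assumed).
#[derive(Debug, Clone, PartialEq)]
pub struct EntityChurnRecord {
    pub entity_id: String,
    pub display_name: String,
    pub first_seen_file: String,
    pub first_seen_time: String,
    pub last_seen_file: String,
    pub last_seen_time: String,
    pub files_appeared: Vec<String>,
    pub total_logins: u64,
    pub lifecycle: String,
    pub activity_pattern: String,
    pub is_ephemeral_pattern: bool,
    pub ephemeral_confidence: f64,
    pub ephemeral_reasons: Vec<String>,
}

impl EntityChurnRecord {
    /// Sanity checks a consumer could run after parsing a record:
    /// the score lies in 0.0-1.0, and a tracked entity has at least
    /// one login and appears in at least one file.
    pub fn is_consistent(&self) -> bool {
        (0.0..=1.0).contains(&self.ephemeral_confidence)
            && self.total_logins >= 1
            && !self.files_appeared.is_empty()
    }
}
```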

The command tracks only entities that performed login operations (request paths ending in /login).

Functions§

run