Skip to content

Graph Algorithms

Traversal, pathfinding, and analysis algorithms in jnkn.

Architectural Overview

jnkn's graph engine uses a robust, interface-driven architecture backed by high-performance rustworkx. This design decouples analysis logic (Blast Radius, Trace, Diff) from storage and processing, enabling backend agnosticism while leveraging Rust's speed for heavy traversals.

The IGraph Interface

All graph operations are defined by the IGraph protocol, ensuring consistent behavior across different backends (e.g., in-memory rustworkx, persistent stores, or test mocks).

class IGraph(Protocol):
    """Abstract contract for dependency graph implementations."""

    def get_descendants(self, node_id: str, max_depth: int = -1) -> Set[str]: ...
    def get_ancestors(self, node_id: str, max_depth: int = -1) -> Set[str]: ...
    def get_impacted_nodes(self, source_ids: List[str], max_depth: int = -1) -> Set[str]: ...
    def trace(self, source_id: str, target_id: str) -> List[List[str]]: ...

Core Data Structure: DependencyGraph

The DependencyGraph class implements IGraph using rustworkx.PyDiGraph for speed, with secondary indexes for O(1) lookups.

class DependencyGraph(IGraph):
    def __init__(self):
        self._graph = rx.PyDiGraph()
        # Map external string IDs to internal integer indices
        self._id_to_idx: Dict[str, int] = {}
        self._idx_to_node: Dict[int, Node] = {}
        # Inverted index for fuzzy stitching
        self.token_index = TokenIndex()

Semantic Impact Analysis

Standard graph traversal (downstream/upstream) is insufficient for infrastructure-as-code analysis because dependency direction often opposes data flow direction.

The Problem:

  • Code: App.py --(READS)--> ENV_VAR
  • Graph Edge: App -> Env
  • Impact: If Env changes, App breaks.
  • Standard Descendants: Env has no outgoing edges to App.

The Solution: get_impacted_nodes jnkn implements a custom semantic BFS that traverses edges based on their logical data flow, not just graph direction.

def get_impacted_nodes(self, source_ids: List[str], max_depth: int = -1) -> Set[str]:
    """
    Calculate impact by traversing:
    1. Downstream for data flow (PROVIDES, WRITES, FLOWS_TO)
    2. Upstream for dependencies (READS, DEPENDS_ON) - Reverse traversal
    """
    FORWARD_TYPES = {"provides", "writes", "flows_to", "provisions"}
    REVERSE_TYPES = {"reads", "depends_on", "calls"}

    # ... BFS implementation that flips direction based on edge type ...

This ensures that changing an infrastructure resource correctly "blasts" upwards to the applications that read its outputs.

Algorithms & Implementations

1. Blast Radius (Semantic BFS)

Finds all artifacts semantically affected by a change.

  • Algorithm: Semantic Breadth-First Search (BFS)
  • Complexity: $O(V + E)$
  • Logic:
    • Start at changed nodes.
    • Follow PROVIDES edges forward.
    • Follow READS edges backward.
    • Repeat until max_depth or exhaust.

2. Lineage Tracing (Pathfinding)

Finds the path explaining why Node A affects Node B.

  • Algorithm: rustworkx.all_simple_paths
  • Complexity: $O(V + E)$ (highly optimized in Rust)
  • Implementation:
    1. Attempt forward semantic trace (following data flow).
    2. Fallback to reverse dependency check if direct flow is broken but a dependency exists.
def trace(self, source_id: str, target_id: str) -> List[List[str]]:
    src_idx = self._id_to_idx[source_id]
    tgt_idx = self._id_to_idx[target_id]
    # Optimized Rust call
    return rx.all_simple_paths(self._graph, src_idx, tgt_idx)

3. Stitching (Fuzzy Matching)

Connects nodes across different domains (e.g., Terraform -> Python) using token overlap.

  • Structure: TokenIndex (Inverted Index)
  • Complexity: $O(1)$ lookup per token
  • Logic:
    • Tokenize node names (e.g., PAYMENT_DB_HOST -> payment, db, host).
    • Index nodes by tokens.
    • Query index to find candidates sharing significant tokens.

Performance Improvements

Moving to rustworkx provided significant gains over the previous NetworkX implementation:

Operation NetworkX (Old) rustworkx (New) Improvement
Graph Construction Python Loop Rust Vector Ops \~5x faster
Pathfinding Pure Python BFS Parallel Rust \~10-50x faster
Memory Usage Heavy Python Objects Compact Rust Structs \~3x reduction
Descendants Recursive Python Optimized Native \~100x faster

Future Enhancements

1. Weighted Impact Scoring

Enhance get_impacted_nodes to decay impact score over distance: $$\text{Score}(node) = \frac{\text{Confidence}(path)}{1 + \text{Distance}}$$

2. Cycle Detection & breaking

Use rustworkx.simple_cycles to detect circular dependencies (e.g., A reads B, B reads A) that might cause infinite analysis loops or deployment deadlocks.

3. Parallel Subgraph Analysis

For massive monorepos, partition the graph into disjoint subgraphs (using rustworkx.connected_components) and run semantic analysis on each component in parallel threads.