Files
clutch/docs/DATABASE_LOGICAL_STRUCTURE.md
2026-02-05 23:26:03 +08:00

3.6 KiB
Raw Blame History

Database Logical Structure (ER Diagram)

This diagram illustrates the logical relationships and data flow between the storage layers (L1, L2, L3) in the optimized architecture.

erDiagram
    %% ==========================================
    %% L1 LAYER: RAW DATA (Data Lake)
    %% ==========================================
    
    L1A_raw_iframe_network {
        string match_id PK
        json content "Raw API Response"
        timestamp processed_at
    }

    L1B_tick_snapshots_parquet {
        string match_id FK
        int tick
        int round
        json player_states "Positions, HP, Equip"
        json bomb_state
        string file_path "Parquet File Location"
    }

    %% ==========================================
    %% L2 LAYER: DATA WAREHOUSE (Structured)
    %% ==========================================

    dim_players {
        string steam_id_64 PK
        string username
        float rating
        float avg_clutch_win_rate
    }

    dim_maps {
        int map_id PK
        string map_name "de_mirage"
        string nav_mesh_path
    }

    fact_matches {
        string match_id PK
        int map_id FK
        timestamp start_time
        int winner_team
        int final_score_ct
        int final_score_t
    }

    fact_rounds {
        string round_id PK
        string match_id FK
        int round_num
        int winner_side
        string win_reason "Elimination/Bomb/Time"
    }

    L2_Spatial_NavMesh {
        string map_name PK
        string zone_id
        binary distance_matrix "Pre-calculated paths"
    }

    %% ==========================================
    %% L3 LAYER: FEATURE STORE (AI Ready)
    %% ==========================================

    L3_Offline_Features {
        string snapshot_id PK
        float feature_tpi "Time Pressure Index"
        float feature_crossfire "Tactical Score"
        float feature_equipment_diff
        int label_is_win "Target Variable"
    }

    %% ==========================================
    %% RELATIONSHIPS
    %% ==========================================

    %% L1 -> L2 Flow
    L1A_raw_iframe_network ||--|{ fact_matches : "Extracts to"
    L1A_raw_iframe_network ||--|{ dim_players : "Extracts to"
    L1B_tick_snapshots_parquet }|--|| fact_matches : "Belongs to"
    L1B_tick_snapshots_parquet }|--|| fact_rounds : "Details"

    %% L2 Relations
    fact_matches }|--|| dim_maps : "Played on"
    fact_rounds }|--|| fact_matches : "Part of"
    
    %% L2 -> L3 Flow (Feature Engineering)
    L3_Offline_Features }|--|| L1B_tick_snapshots_parquet : "Computed from"
    L3_Offline_Features }|--|| L2_Spatial_NavMesh : "Uses Physics from"
    L3_Offline_Features }|--|| dim_players : "Enriched with"

结构说明 (Structure Explanation)

  1. L1 源数据层:

    • 左上 (L1A): 传统的数据库表,存储比赛结果元数据。
    • 左下 (L1B): 虚线框表示的文件系统。虽然物理上是 Parquet 文件但在逻辑上它是一张巨大的“Tick 级快照表”,通过 match_id 与其他层关联。
  2. L2 数仓层:

    • 核心 (Dim/Fact): 标准的星型模型。fact_matches 是核心事实表,关联 dim_players (人) 和 dim_maps (地)。
    • 空间 (Spatial): 独立的查找表逻辑,为每一张 dim_maps 提供物理距离计算支持。
  3. L3 特征层:

    • 右侧 (Features): 这是宽表Wide Table每一行直接对应模型的一个训练样本。它不存储原始数据而是存储计算后的数值 (如 TPI 指数),直接由 L1B (位置) + L2 Spatial (距离) + Dim Players (能力) 融合计算而来。