Trigger: A data scientist or ML engineer initiates a new model training job, referencing a dataset in a feature store or data lake.
Context Pulled: The integration agent captures the dataset URI, job ID, and user context. It queries the data catalog (e.g., Alation, Collibra) to retrieve the existing business glossary terms, data quality scores, and PII classification tags already associated with the source tables.
Agent Action: Using the lineage capabilities of the governance platform (or a tool like MANTA), the agent automatically constructs and publishes a new lineage record. This record links the training dataset to its source systems and documents the transformation logic (e.g., SQL query, feature engineering notebook). It also propagates critical metadata from the source tables to the new record: classification labels (e.g., Contains_PII), data steward contacts, and retention policies.
System Update: The new lineage artifact is stored in the governance platform with its type set to training_dataset. The agent then updates the model registry (e.g., MLflow, Weights & Biases) via API, attaching a link to this governance record.
Human Review Point: For datasets tagged as high-risk (e.g., containing sensitive PII or used for regulated models), the workflow automatically generates a task for the assigned data steward in the governance platform (e.g., Collibra) to review and approve the lineage before training proceeds.
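The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the API of any governance platform: the class names, field names, the HIGH_RISK_TAGS set (including Regulated_Model), and the review status values are all hypothetical. Only the training_dataset type and the Contains_PII label come from the workflow description. Calls to the catalog, the governance platform, and the model registry are deliberately left out.

```python
from dataclasses import dataclass, asdict
from typing import List

# Hypothetical tag vocabulary; a real catalog defines its own classifications.
HIGH_RISK_TAGS = {"Contains_PII", "Regulated_Model"}


@dataclass
class CatalogContext:
    """Metadata already attached to the source tables in the data catalog."""
    source_tables: List[str]
    classification_tags: List[str]
    steward_contact: str
    retention_policy: str


@dataclass
class LineageRecord:
    """Illustrative lineage artifact; field names are assumptions, not a platform schema."""
    job_id: str
    dataset_uri: str
    source_tables: List[str]
    transformation_ref: str  # e.g., path to a SQL file or feature engineering notebook
    classification_tags: List[str]
    steward_contact: str
    retention_policy: str
    record_type: str = "training_dataset"
    review_status: str = "not_required"


def build_lineage_record(job_id: str, dataset_uri: str,
                         transformation_ref: str, ctx: CatalogContext) -> LineageRecord:
    """Link the training dataset to its sources and propagate catalog metadata."""
    # Any overlap with the high-risk tags means a steward must approve first.
    needs_review = bool(HIGH_RISK_TAGS & set(ctx.classification_tags))
    return LineageRecord(
        job_id=job_id,
        dataset_uri=dataset_uri,
        source_tables=list(ctx.source_tables),
        transformation_ref=transformation_ref,
        classification_tags=list(ctx.classification_tags),
        steward_contact=ctx.steward_contact,
        retention_policy=ctx.retention_policy,
        review_status="pending_steward_review" if needs_review else "not_required",
    )


def may_start_training(record: LineageRecord) -> bool:
    """Training proceeds only if no review is needed or the steward has approved."""
    return record.review_status in {"not_required", "approved"}
```

In this sketch, asdict(record) yields a plain dictionary that could serve as the payload for the governance platform, and the job orchestrator would call may_start_training before launching the run. Keeping the review decision inside the record means the model registry link and the steward task both refer to the same artifact.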