AI Integration for Informatica Data Catalog
A practical guide to embedding AI agents and LLMs into Informatica Enterprise Data Catalog (EDC) workflows for automated metadata enrichment, governance, and discovery.

Where AI Fits into Informatica EDC
AI integration for Informatica EDC focuses on three core surfaces: the metadata ingestion pipeline, the business glossary, and the data lineage graph. Instead of replacing EDC, AI agents act as co-pilots that listen for new asset discovery events, analyze technical metadata and sample data, and then propose enrichments via EDC's REST API or CLI. This allows teams to automate the generation of column descriptions, PII classification tags, suggested business terms, and data quality rule recommendations as assets are cataloged, turning a passive inventory into an intelligent, self-documenting system.
Implementation typically involves a lightweight service that is notified when EDC discovers new assets (for example, through an ASSET_DISCOVERED-style event or a post-scan webhook) or that polls EDC for them. For each new table or file, the service uses an LLM to analyze the asset's name, column names, data types, and a statistical sample. It then generates structured proposals: a technical summary for the asset description, confidence-scored PII classifications (such as PII.Email), and suggested mappings to existing glossary terms. These proposals are posted back to EDC as draft suggestions that require steward approval in EDC's UI or a separate workflow tool, keeping a human in the loop. This pattern keeps the catalog's authority intact while speeding up how quickly the catalog is populated and documented.
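The proposal step can be sketched as a small data model plus a parser that turns the LLM's JSON reply into draft suggestions. This is a minimal sketch under stated assumptions: the JSON shape the LLM is asked to return, the `AssetProposal` and `PiiClassification` names, and the `DRAFT` status are hypothetical illustrations, not EDC's own data model.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PiiClassification:
    column: str
    label: str          # e.g. "PII.Email"
    confidence: float   # 0.0 to 1.0, as reported by the LLM


@dataclass
class AssetProposal:
    asset_id: str
    description: str
    pii: list = field(default_factory=list)
    glossary_terms: list = field(default_factory=list)
    status: str = "DRAFT"  # hypothetical status; a steward must approve before promotion


def parse_llm_proposal(asset_id: str, raw_reply: str) -> AssetProposal:
    """Parse an LLM reply assumed to be JSON of the form:
    {"description": str,
     "pii": [{"column": str, "label": str, "confidence": float}],
     "glossary_terms": [str]}
    """
    data = json.loads(raw_reply)
    pii = [
        PiiClassification(p["column"], p["label"], float(p["confidence"]))
        for p in data.get("pii", [])
        # Drop classifications whose confidence is outside [0, 1]
        if 0.0 <= float(p["confidence"]) <= 1.0
    ]
    return AssetProposal(
        asset_id=asset_id,
        description=data.get("description", "").strip(),
        pii=pii,
        glossary_terms=list(data.get("glossary_terms", [])),
    )
```

Keeping every proposal in `DRAFT` until a steward acts on it is what preserves the human-in-the-loop guarantee described above.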
For rollout, start with a pilot on a single, high-value data domain—such as customer or product data. Configure the AI service to only propose enrichments for assets tagged with that domain. Use EDC's custom attributes and workflow capabilities to create an approval queue for AI suggestions. This controlled approach mitigates risk, allows for tuning of prompts and confidence thresholds, and builds trust with data stewards. Over time, the system can be expanded to automate more complex tasks, like using the enriched lineage graph to generate impact analysis reports in natural language or to identify undocumented data dependencies for migration projects. For a deeper dive on governing these automated workflows, see our guide on AI Integration for Informatica Data Governance.
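Scoping the pilot can be as simple as filtering discovered assets by a domain tag before any LLM call is made. A minimal sketch, assuming each asset arrives as a dict with a `domains` list populated from an EDC custom attribute; the attribute name and the `PILOT_DOMAINS` value are hypothetical.

```python
from typing import Iterable, Iterator

# Hypothetical pilot configuration: only these domains receive AI proposals.
PILOT_DOMAINS = frozenset({"Customer"})


def in_pilot_scope(asset: dict, pilot_domains: frozenset = PILOT_DOMAINS) -> bool:
    """Return True if the asset carries at least one pilot domain tag."""
    # 'domains' is an assumed field, e.g. read from an EDC custom attribute.
    return bool(pilot_domains.intersection(asset.get("domains", [])))


def assets_to_enrich(assets: Iterable[dict]) -> Iterator[dict]:
    """Yield only in-scope assets that do not already have a description."""
    for asset in assets:
        if in_pilot_scope(asset) and not asset.get("description"):
            yield asset
```

Widening the rollout later only means changing `PILOT_DOMAINS`, which keeps each expansion step explicit and reviewable.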
Key Integration Surfaces in Informatica EDC
Automating Technical Metadata Generation
LLMs can be integrated into Informatica EDC's discovery and profiling workflows to generate human-readable summaries of data assets. When EDC scans new databases, files, or APIs, an AI agent can process the raw schema information—table names, column data types, and sample values—to produce concise descriptions.
Example Workflow:
- EDC's discovery job completes, registering new assets.
- A webhook triggers an AI service with the asset's technical metadata.
- The LLM generates a suggested `business_description` for the table and `column_definition` for key fields.
- The enriched metadata is posted back to EDC's REST API (`/api/v2/catalog/assets/{id}`) for steward review or auto-approval.
This reduces the time data stewards spend on manual documentation, accelerating catalog usability.
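The hand-off from the discovery job to the LLM mostly comes down to turning technical metadata into a constrained prompt. A minimal sketch, assuming the webhook delivers a JSON body with `asset_name` and a `columns` list of `name`/`type`/`samples` entries; that payload shape and the `MAX_SAMPLES` cap are assumptions, and the write-back call is omitted because its exact contract depends on your EDC version.

```python
import json

MAX_SAMPLES = 3  # keep prompts small; samples are context, not the subject


def build_description_prompt(event_body: str) -> str:
    """Build an LLM prompt from a (hypothetical) discovery webhook payload."""
    asset = json.loads(event_body)
    lines = [f"Table: {asset['asset_name']}", "Columns:"]
    for col in asset["columns"]:
        samples = ", ".join(str(s) for s in col.get("samples", [])[:MAX_SAMPLES])
        lines.append(f"- {col['name']} ({col['type']}): samples [{samples}]")
    # Ask for the two fields named in the workflow above, as JSON.
    lines.append(
        'Return JSON with keys "business_description" (one or two sentences) '
        'and "column_definition" (an object mapping column name to a definition).'
    )
    return "\n".join(lines)
```

Asking for fixed JSON keys makes the reply easy to validate before anything is posted back to EDC.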
High-Value AI Use Cases for EDC
Transform Informatica Enterprise Data Catalog from a passive inventory into an intelligent, self-enriching system. These AI integration patterns automate the most manual and high-value metadata workflows.
Automated Technical Metadata Summaries
Use LLMs to read table DDL, column names, and sample data, then generate plain-English descriptions for assets in the catalog. Workflow: Trigger on asset discovery or update, call the LLM API with schema context, and write the summary back to EDC via the REST API. Value: Shrinks the manual documentation backlog so the catalog becomes useful to new data consumers much sooner.
Business Glossary Term Suggestion
Analyze column names, data profiles, and existing glossary to suggest new terms and map assets automatically. Workflow: AI reviews unclassified columns, proposes term definitions and relationships, presents to stewards for approval in EDC UI. Value: Accelerates governance programs and improves term coverage without exhaustive manual review.
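Before asking an LLM to propose new terms, a cheap first pass can map columns to terms that already exist in the glossary. A minimal sketch using Python's standard `difflib` for fuzzy matching; the abbreviation table and the 0.6 cutoff are illustrative assumptions to tune, and columns left unmatched would go to the LLM for a new-term proposal.

```python
import difflib

# Hypothetical abbreviation expansions common in column names.
ABBREVIATIONS = {"cust": "customer", "acct": "account", "num": "number", "dt": "date"}


def normalize_column(name: str) -> str:
    """Expand abbreviations, e.g. 'cust_acct_num' -> 'customer account number'."""
    parts = name.lower().split("_")
    return " ".join(ABBREVIATIONS.get(p, p) for p in parts)


def suggest_existing_term(column: str, glossary_terms: list, cutoff: float = 0.6):
    """Return the closest existing glossary term, or None if nothing is close enough."""
    lookup = {t.lower(): t for t in glossary_terms}
    matches = difflib.get_close_matches(
        normalize_column(column), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None
```

Matching against existing terms first keeps the glossary from filling up with near-duplicates of terms stewards have already approved.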
PII and Sensitive Data Identification
Augment pattern-based scanners with LLM context to identify non-standard PII, sensitive narratives in comment fields, and inferred data classes. Workflow: AI analyzes sample data and metadata, assigns confidence-scored classifications, and tags assets in EDC for policy enforcement. Value: Closes compliance gaps missed by regex rules, especially in unstructured or free-text fields.
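One way to combine pattern-based scanning with LLM context is to run cheap regexes locally and forward only ambiguous columns (free-text fields with no regex hits) for LLM review. A minimal sketch; the two regexes are deliberately simple illustrations rather than production-grade detectors, and the confidence here is just the fraction of samples that matched.

```python
import re

PATTERNS = {
    "PII.Email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PII.Phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def classify_column(samples: list, is_free_text: bool) -> dict:
    """Classify a column locally; flag free-text columns without hits for LLM review."""
    hits = {}
    for label, pattern in PATTERNS.items():
        matched = sum(1 for s in samples if pattern.search(str(s)))
        if matched:
            # Confidence = share of sample values that matched the pattern.
            hits[label] = matched / len(samples)
    needs_llm = is_free_text and not hits
    return {"classifications": hits, "needs_llm_review": needs_llm}
```

Only the columns flagged `needs_llm_review` would be sent onward, which keeps both cost and data exposure down.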
Data Lineage Gap Analysis & Enrichment
Use AI to infer missing lineage links by analyzing job names, SQL logs, and data movement patterns, suggesting connections for steward review. Workflow: AI processes EDC lineage graphs and operational metadata, proposes probable missing edges, integrates approved links. Value: Creates more complete, trustworthy lineage for impact analysis and regulatory reporting.
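A deterministic pre-pass can extract candidate edges from SQL logs before an LLM is asked to reason about harder cases. A minimal sketch, assuming simple `INSERT INTO ... SELECT ... FROM/JOIN` statements; the regexes ignore CTEs, subqueries, and quoted identifiers, which is where LLM assistance or a proper SQL parser would come in. Edges already present in the lineage graph are subtracted so only gaps are proposed.

```python
import re

TARGET_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def candidate_edges(sql: str) -> set:
    """Return (source, target) table pairs implied by one INSERT ... SELECT statement."""
    target = TARGET_RE.search(sql)
    if not target:
        return set()
    tgt = target.group(1).lower()
    return {(src.lower(), tgt) for src in SOURCE_RE.findall(sql) if src.lower() != tgt}


def missing_edges(sql_log: list, known_edges: set) -> set:
    """Edges seen in SQL but absent from the catalog's lineage graph, for steward review."""
    proposed = set()
    for statement in sql_log:
        proposed |= candidate_edges(statement)
    return proposed - known_edges
```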
Natural Language Catalog Search & Q&A
Deploy a RAG-powered agent over EDC's metadata so users can ask questions such as 'What tables contain customer revenue for the EU region?' in plain language. Workflow: Embed EDC metadata and store the vectors in a vector DB; at query time, retrieve the most relevant assets and have the LLM interpret the question and answer from them. Value: Shortens data discovery considerably, especially for non-technical business users.
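The retrieval half of that workflow can be sketched without committing to a specific embedding model. In this minimal sketch a bag-of-words vector stands in for a real embedding and an in-memory dict stands in for the vector DB; both are placeholders, and in practice the top hits would be passed to the LLM as context for its answer.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Placeholder embedding: lowercase token counts (swap in a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, assets: dict, k: int = 3) -> list:
    """Rank catalog assets (id -> metadata text) by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), asset_id) for asset_id, text in assets.items()]
    return [asset_id for score, asset_id in sorted(scored, reverse=True)[:k] if score > 0]
```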
Stewardship Workflow Automation
Automate ticket creation, assignment, and escalation for data quality issues, term approval requests, and certification workflows detected by AI. Workflow: AI monitors data quality scores and user requests, creates and routes tasks in EDC, follows up via email integration. Value: Ensures governance processes are executed, not just documented, improving data trust.
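The monitoring-and-routing step can be sketched as a rule that turns low data quality scores into stewardship tasks. A minimal sketch; the two thresholds, the owner lookup, and the task dictionary shape are assumptions, and creating the task in EDC (or a ticketing tool) is left to whichever API your deployment exposes.

```python
from datetime import date

DQ_THRESHOLD = 0.9         # hypothetical minimum acceptable quality score
HIGH_PRIORITY_BELOW = 0.7  # hypothetical cut-off for escalation


def build_dq_tasks(scores: dict, owners: dict, fallback_owner: str = "governance-team") -> list:
    """Create one stewardship task per asset whose quality score is below the threshold."""
    tasks = []
    for asset_id, score in sorted(scores.items()):
        if score < DQ_THRESHOLD:
            tasks.append({
                "asset_id": asset_id,
                "assignee": owners.get(asset_id, fallback_owner),
                "priority": "high" if score < HIGH_PRIORITY_BELOW else "normal",
                "summary": f"Data quality score {score:.2f} below {DQ_THRESHOLD:.2f}",
                "created": date.today().isoformat(),
            })
    return tasks
```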
Example AI-Augmented Workflows
These workflows illustrate how LLM-powered agents can automate high-effort, manual tasks within Informatica Enterprise Data Catalog (EDC), turning passive metadata into active intelligence. Each pattern connects to EDC's APIs and data model to drive measurable efficiency gains.
Trigger: A new data asset (table, file, API endpoint) is discovered and ingested into EDC.
Context Pulled: The agent retrieves the asset's raw metadata from EDC's REST API: object name, column names, data types, and sample data (if profiling is enabled).
Agent Action: An LLM analyzes the column names, data types, and sample values to generate:
- A concise, business-friendly description of the asset's purpose.
- Descriptive, plain-language explanations for each column (e.g., `cust_id` → "Unique identifier for the customer record, used as the primary key in the CRM system").
- Inferred data classifications (e.g., `PII`, `Financial`, `Operational`).
System Update: The agent uses EDC's API to write the generated descriptions and suggested classifications back to the asset's metadata properties. It can also create or link to suggested business glossary terms.
Human Review Point: Suggested PII classifications and new glossary terms are placed in a stewardship queue within EDC for a data owner to review and approve before final application.
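The split between what the agent writes directly and what waits for a steward can be made explicit in code. A minimal sketch, assuming descriptions are written straight to the asset while PII classifications and new glossary terms are always queued, as the workflow above describes; the suggestion `kind` values are hypothetical.

```python
# Suggestion kinds that must never be applied without steward approval.
REVIEW_REQUIRED = {"pii_classification", "new_glossary_term"}


def route_suggestions(suggestions: list) -> tuple:
    """Split suggestions into (write_now, steward_queue) lists."""
    write_now, steward_queue = [], []
    for s in suggestions:
        (steward_queue if s["kind"] in REVIEW_REQUIRED else write_now).append(s)
    return write_now, steward_queue
```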
Implementation Architecture & Data Flow
A practical blueprint for integrating LLMs with Informatica Enterprise Data Catalog (EDC) to automate metadata enrichment and governance workflows.
The integration connects to Informatica EDC's REST API and metadata database to process discovered assets. A typical flow begins by extracting technical metadata—table names, column definitions, and data lineage—for assets lacking business context. This raw metadata is sent to an LLM service (like Azure OpenAI or Anthropic) via a secure, queued API layer. The LLM is prompted to generate human-readable summaries, suggest business glossary terms, and identify potential PII patterns based on column names, sample values, and existing catalog classifications.
Generated enrichments are returned to a governance service that applies validation rules and, if configured, routes suggestions to designated data stewards in Informatica Axon for approval via webhook. Approved metadata is then written back to EDC, updating asset descriptions and custom properties and tagging PII columns with the appropriate classifications. This creates a closed loop: approved suggestions improve catalog quality, and steward decisions feed back into prompt refinement and, optionally, fine-tuning models on your specific data landscape.
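The validation step in the governance service might look like the following. A minimal sketch; the rules (non-empty description, a length cap, classifications drawn from an allowed set) and the allowed labels are assumptions to replace with your own policy.

```python
# Hypothetical policy values; align these with your classification scheme.
ALLOWED_CLASSIFICATIONS = {"PII.Email", "PII.Phone", "PII.Name", "Financial", "Operational"}
MAX_DESCRIPTION_CHARS = 500


def validate_enrichment(enrichment: dict) -> list:
    """Return a list of validation errors; an empty list means the enrichment may proceed."""
    errors = []
    description = enrichment.get("description", "").strip()
    if not description:
        errors.append("description is empty")
    elif len(description) > MAX_DESCRIPTION_CHARS:
        errors.append(f"description exceeds {MAX_DESCRIPTION_CHARS} characters")
    for label in enrichment.get("classifications", []):
        if label not in ALLOWED_CLASSIFICATIONS:
            errors.append(f"unknown classification: {label}")
    return errors
```

Returning a list of errors, rather than raising on the first one, lets the service attach every problem to the rejected suggestion for later prompt tuning.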
For rollout, we recommend starting with a pilot on a single business domain or data source. Implement the integration as a scheduled batch job (e.g., nightly) to avoid impacting EDC performance. Key governance controls include: logging all AI-generated content with a source:ai_enrichment tag, maintaining an audit trail of changes, and setting confidence thresholds for auto-application versus steward review. This architecture ensures the catalog becomes a living, AI-augmented system of record, dramatically reducing the manual effort of curating technical metadata at scale.
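The auto-apply versus steward-review decision and the `source:ai_enrichment` tag can be combined in one small function that a nightly job calls per suggestion. A minimal sketch; the 0.9 threshold and the `tags`/`action` field names are assumptions.

```python
AUTO_APPLY_THRESHOLD = 0.9  # hypothetical; tune during the pilot


def decide(suggestion: dict, threshold: float = AUTO_APPLY_THRESHOLD) -> dict:
    """Tag an AI suggestion and decide whether it is auto-applied or sent for review."""
    action = "auto_apply" if suggestion["confidence"] >= threshold else "steward_review"
    # Return a new dict so the original suggestion stays unchanged for the audit trail.
    return {**suggestion, "tags": ["source:ai_enrichment"], "action": action}
```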
Code & Payload Examples
Automating Business Term Creation
Use LLMs to analyze technical column names and sample data from Informatica EDC, generating candidate business terms and definitions. This automates the initial population of the business glossary, which stewards can then review and approve.
A common pattern is to trigger this enrichment after a new data source is profiled. The payload to the LLM includes the asset name, column metadata, and a few sample values for context. The response is formatted to create or update glossary objects via the Informatica EDC REST API.
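```python
# Example: Generate business term suggestions for a discovered column
import json
import os

import requests
from openai import OpenAI  # openai>=1.0 client

# Assumed configuration; the EDC endpoint path below is illustrative and should be
# checked against the REST API documentation for your EDC version.
EDC_BASE_URL = os.environ["EDC_BASE_URL"]
EDC_AUTH = (os.environ["EDC_USER"], os.environ["EDC_PASSWORD"])

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def get_edc_column_metadata(connection_id: str, column_name: str) -> dict:
    """Hypothetical helper: return {"id", "name", "type", "samples"} for a column."""
    raise NotImplementedError("Implement against your EDC REST API")


# Fetch column metadata from Informatica EDC
column_metadata = get_edc_column_metadata(connection_id="conn_123", column_name="cust_acct_num")

prompt = f"""
Given this database column metadata, suggest a business glossary term and definition.
Column Name: {column_metadata['name']}
Data Type: {column_metadata['type']}
Sample Values: {column_metadata['samples']}
Respond only with JSON: {{"term": "suggested term", "definition": "clear definition"}}
"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
suggestion = json.loads(response.choices[0].message.content)

# Post the suggestion to EDC as a draft for steward review
resp = requests.post(
    f"{EDC_BASE_URL}/api/v2/glossary/terms/draft",
    auth=EDC_AUTH,
    json={
        "name": suggestion["term"],
        "definition": suggestion["definition"],
        "associatedAssets": [column_metadata["id"]],
    },
    timeout=30,
)
resp.raise_for_status()
```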
Realistic Time Savings & Operational Impact
How LLM integration transforms manual metadata management and data discovery workflows within Informatica Enterprise Data Catalog (EDC).
| Data Catalog Task | Manual Process | AI-Assisted Process | Implementation Notes |
|---|---|---|---|
| Technical Metadata Summarization | Analyst spends 5-10 minutes writing each asset description | LLM generates draft summaries in seconds | Human curator reviews & refines; scales to thousands of assets |
| Business Glossary Term Suggestion | Stewards manually review data to propose terms | AI scans column names & sample data to suggest candidate terms | Stewards approve, reject, or modify; reduces initial review by 60-70% |
| PII & Sensitive Data Identification | Rule-based scans plus manual sampling for context | LLM analyzes unstructured comments & data patterns to flag potential PII | Combines with existing scanners; catches context-dependent sensitive data |
| Data Lineage Documentation | Manual interviews and spreadsheet mapping for business logic | AI parses SQL & ETL code to infer and draft business-friendly lineage notes | Accelerates initial documentation; data architect validates connections |
| Stale Asset Identification & Triage | Periodic manual reports on last access date | AI analyzes usage patterns, lineage, and project changes to recommend archival candidates | Focuses steward effort on high-impact cleanup decisions |
| Cross-System Relationship Discovery | Manual comparison of schemas across source systems | LLM suggests potential foreign key & semantic relationships based on naming and data patterns | Generates hypotheses for stewards to validate, speeding up integration projects |
| Data Quality Rule Generation | Stewards manually profile data to define rules | AI profiles sample data and suggests statistical & pattern-based quality rules | Stewards select and tune rules; jumpstarts quality program setup |
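The metadata-summarization row translates into a quick capacity estimate. A back-of-the-envelope sketch: the 5-10 minutes per asset comes from the table, while the 1-2 minutes of steward review per AI draft is a hypothetical assumption to replace with figures measured during your pilot.

```python
def effort_hours(assets: int, minutes_per_asset: float) -> float:
    """Convert per-asset minutes into total hours."""
    return assets * minutes_per_asset / 60


ASSETS = 1_000
manual = (effort_hours(ASSETS, 5), effort_hours(ASSETS, 10))
review = (effort_hours(ASSETS, 1), effort_hours(ASSETS, 2))  # assumed review time per draft
print(f"Manual writing: {manual[0]:.0f}-{manual[1]:.0f} h; AI draft review: {review[0]:.0f}-{review[1]:.0f} h")
# prints "Manual writing: 83-167 h; AI draft review: 17-33 h"
```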
Governance, Security, and Phased Rollout
Integrating AI with Informatica EDC requires a controlled approach to ensure metadata quality, security, and user trust.
A production integration typically uses a dedicated service account with role-based access to the Informatica EDC REST API and a secure, isolated environment for the LLM. The AI agent acts as a suggester, not an auto-applier. All generated metadata—technical summaries, glossary term suggestions, or PII classifications—should be written to a staging table or a dedicated "AI Suggestions" custom object within EDC. This creates a clear audit trail and requires a data steward's review and approval before promotion to production metadata fields, ensuring human oversight and maintaining data governance integrity.
Start with a pilot on a single, well-understood data domain, such as a Customer or Product subject area. Focus the AI on a single high-value task, like generating column descriptions for newly discovered tables. This limits scope, allows for quality benchmarking against manual efforts, and builds stakeholder confidence. Subsequent phases can expand to business glossary term suggestion using approved terminology, and finally to sensitive data identification across broader asset inventories. Each phase should include a feedback loop where steward approvals and rejections are used to fine-tune the AI's prompts and improve suggestion relevance.
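The feedback loop can start with simple per-prompt acceptance statistics computed from steward decisions. A minimal sketch, assuming each logged decision records a `prompt_version` and an `action` of `accept`, `modify`, or `reject`; comparing rates across versions shows whether a prompt change actually helped.

```python
from collections import defaultdict


def acceptance_rates(decisions: list) -> dict:
    """Share of suggestions accepted unchanged, per prompt version."""
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for d in decisions:
        totals[d["prompt_version"]] += 1
        if d["action"] == "accept":
            accepted[d["prompt_version"]] += 1
    return {v: accepted[v] / totals[v] for v in totals}
```

Tracking `modify` separately from `reject` would be a natural next step, since heavily edited suggestions point to different prompt problems than outright rejections.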
Governance is paramount. All AI interactions should be logged, including the source asset ID, the prompt used, the raw LLM output, and the steward's final action (accept, modify, reject). This traceability is critical for compliance audits and for continuously improving the system. Furthermore, minimize the business data sent to the LLM: pass technical metadata (column names, data types) and existing business glossary context, and include sample values from profiling only after they have been masked or hashed. For PII detection, use pattern matching locally where possible and send only ambiguous cases to the LLM, always masking or hashing actual data values before any external API call.
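A masking step and an audit-log entry along those lines might look like this. A minimal sketch: masking replaces letters with `A` and digits with `9` so the LLM still sees a value's format without the value itself, and the log record carries the fields listed above plus a SHA-256 of the prompt for correlating repeated requests; the field names and the JSON Lines format are assumptions.

```python
import hashlib
import json
import re
from datetime import datetime, timezone


def mask_value(value: str) -> str:
    """Replace letters with 'A' and digits with '9' so only the format reaches the LLM."""
    return re.sub(r"\d", "9", re.sub(r"[^\W\d_]", "A", value))


def audit_record(asset_id: str, prompt: str, llm_output: str, steward_action: str) -> str:
    """Serialize one AI interaction as a JSON Lines entry for the audit log."""
    record = {
        "asset_id": asset_id,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt": prompt,
        "llm_output": llm_output,
        "steward_action": steward_action,  # accept | modify | reject
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record)
```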
Frequently Asked Questions
Practical answers for data governance teams planning to augment Informatica Enterprise Data Catalog (EDC) with generative AI for metadata enrichment, glossary management, and compliance automation.
AI integration typically connects via Informatica's RESTful APIs and leverages its extensible metadata model. Key touchpoints include:
- Asset API: To fetch discovered technical metadata (tables, columns, files, reports) for AI processing.
- Glossary API: To create, update, or suggest business terms and categories.
- Lineage API: To read and potentially enhance data flow relationships with business context.
- Custom Properties: To write AI-generated summaries, PII classifications, or confidence scores back to assets as extended attributes.
A common pattern uses a middleware service (like a Python app) that:
- Polls EDC for newly discovered or updated assets.
- Sends asset metadata (e.g., column names, sample data, data types) to an LLM via a secure API call.
- Processes the LLM's response (e.g., a business description, PII flag).
- Writes the enriched metadata back to EDC via the Asset API.
This keeps the AI logic decoupled from the core EDC application, allowing for controlled rollouts and easy updates to prompts or models.
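A skeleton of that middleware loop follows. It is a minimal sketch: `fetch_assets_since`, `call_llm`, and `write_back` are hypothetical callables you would implement against your EDC REST endpoints and LLM provider, and each asset is assumed to carry a timezone-aware `modified` timestamp used as a high-water mark.

```python
from datetime import datetime, timezone
from typing import Callable


def run_once(
    since: datetime,
    fetch_assets_since: Callable[[datetime], list],
    call_llm: Callable[[dict], dict],
    write_back: Callable[[str, dict], None],
) -> datetime:
    """Enrich assets changed after `since`; return the new high-water mark."""
    high_water = since
    for asset in fetch_assets_since(since):
        enrichment = call_llm(asset)         # e.g. {"description": ..., "pii": ...}
        write_back(asset["id"], enrichment)  # update the asset via the Asset API
        high_water = max(high_water, asset["modified"])
    return high_water
```

A scheduler (cron, or a loop with `time.sleep`) would call `run_once` repeatedly, persisting the returned timestamp between runs so no asset is processed twice.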

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over the past 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.