Inferensys

AI for Custodian Identification and Ranking

A practical integration blueprint for using AI to analyze communication patterns, content, and metadata to automatically identify and prioritize key custodians during legal hold and collection, outputting ranked lists and profiles directly into e-discovery platform custodian modules.
ARCHITECTURE AND ROLLOUT

Where AI Fits in Custodian Identification

A technical blueprint for integrating AI into the custodian identification workflow within e-discovery platforms like Relativity, Everlaw, DISCO, and Nuix.

AI integration for custodian identification typically connects to three platform surfaces: the custodian management module, the data processing pipeline, and the review workspace. The AI agent ingests communication metadata and content (from email servers, Microsoft 365, and Slack exports), then analyzes patterns such as communication volume, network centrality, topic clusters, and keyword frequency. Results are pushed back to the platform as a ranked custodian list, often by creating or updating custom objects (e.g., Custodian records in Relativity) with AI-generated fields such as RelevanceScore, KeyTopics, and CommunicationNetworkRole.

Implementation involves a background service that either polls the platform's API for new data collections or responds to webhooks fired when processing completes. For each custodian candidate, the AI runs a multi-step analysis: 1) Network Analysis to map who communicates with whom, 2) Content Scoring based on case-relevant terms and concepts, and 3) Temporal Analysis to flag activity around key dates. The output is a structured payload sent via the platform's REST API to update custodian records, often triggering platform-native workflows for legal hold issuance or collection prioritization. In practice, this can reduce manual analyst work from days of spreadsheet analysis to hours of AI-assisted review.
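The temporal analysis step can be illustrated with a minimal sketch. It assumes messages have already been reduced to (sender, date sent) pairs; the function name `activity_near_key_dates`, the 7-day window, and the sample data are hypothetical and not part of any platform API.

```python
from collections import Counter
from datetime import date, timedelta


def activity_near_key_dates(messages, key_dates, window_days=7):
    """Count each sender's messages sent within window_days of any key date."""
    window = timedelta(days=window_days)
    counts = Counter()
    for sender, sent_on in messages:
        if any(abs(sent_on - key) <= window for key in key_dates):
            counts[sender] += 1
    return counts


# Hypothetical sample data: (sender, date sent)
messages = [
    ("alice@example.com", date(2023, 3, 1)),
    ("alice@example.com", date(2023, 3, 4)),
    ("bob@example.com", date(2023, 3, 20)),
    ("carol@example.com", date(2023, 6, 15)),
]
key_dates = [date(2023, 3, 3)]  # e.g., the date of a disputed decision

flagged = activity_near_key_dates(messages, key_dates, window_days=7)
print(flagged.most_common())
```

Senders with no activity inside the window simply do not appear in the result, so the output can be joined to the network and content scores without extra filtering.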

Rollout should start with a pilot on a single matter, comparing AI rankings against a senior reviewer's manual list to calibrate scoring thresholds. Governance is critical: all AI-generated scores and tags must be auditable, with the underlying analysis (e.g., "why was this custodian ranked #1?") accessible via a drill-down report or linked annotation. Integrate a human-in-the-loop approval step before the ranked list updates the production custodian module, ensuring the legal team maintains final control. This phased approach de-risks the integration while demonstrating concrete time savings in the critical early phase of discovery.
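One way to make the pilot comparison concrete is to measure how much of the AI's top-ranked list overlaps with the senior reviewer's manual list. The sketch below uses precision and recall at a cutoff k; the metric choice, function names, and sample names are assumptions for illustration, not a prescribed calibration method.

```python
def precision_at_k(ai_ranked, manual_custodians, k):
    """Fraction of the AI's top-k custodians that the reviewer also listed."""
    top_k = ai_ranked[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for name in top_k if name in manual_custodians)
    return hits / len(top_k)


def recall_at_k(ai_ranked, manual_custodians, k):
    """Fraction of the reviewer's custodians found in the AI's top-k."""
    if not manual_custodians:
        return 0.0
    hits = len(set(ai_ranked[:k]) & set(manual_custodians))
    return hits / len(manual_custodians)


# Hypothetical pilot data
ai_ranked = ["alice", "bob", "carol", "dave", "erin"]
manual_custodians = {"alice", "carol", "frank"}

for k in (3, 5):
    p = precision_at_k(ai_ranked, manual_custodians, k)
    r = recall_at_k(ai_ranked, manual_custodians, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

Custodians the reviewer listed but the AI missed (here, "frank") are the cases worth examining first when adjusting scoring thresholds.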

CUSTODIAN IDENTIFICATION AND RANKING

Platform-Specific Integration Surfaces

Targeting Native Custodian Objects

AI for custodian ranking integrates directly with the custodian or person management modules within e-discovery platforms. In Relativity, this means enriching the Custodian object or related custom objects via the REST API, adding fields like AI_RelevanceScore, AI_CommunicationVolume, and AI_KeyTopicTags. For Everlaw, integration focuses on the People feature, using its API to append analysis results as properties or link to smart tags.

Key surfaces include:

  • Custodian data grids: Inject AI-generated scores and tags for sortable, filterable columns.
  • Custodian reports: Automate generation of custodian heatmaps and prioritization lists.
  • Hold notification workflows: Trigger communications based on AI-ranked tiers, integrating with platform email or export features.

The goal is to make AI outputs native, searchable, and actionable within the platform's existing custodian workflow, avoiding external reports that create silos.

INTEGRATION PATTERNS

High-Value AI Use Cases for Custodian Identification and Ranking

Custodian identification is a critical, time-intensive phase in e-discovery. These AI integration patterns connect directly to platform custodian management modules to analyze communication patterns, content relevance, and organizational roles, outputting prioritized lists and risk scores.

01 Communication Network Analysis

AI analyzes email To/CC/BCC fields, chat participants, and meeting invites to map communication volume and centrality. Integrates with the platform's custodian object to auto-populate relationship strength and influence scores, highlighting key hubs beyond the org chart.

Initial mapping: 1 sprint

02 Content Relevance Scoring

LLMs evaluate custodian document sets against case issues and keywords. Scores for topic prevalence, sentiment shifts, and privilege likelihood are written back as custom fields in the custodian record, enabling sortable, data-driven prioritization for legal hold.

Scoring workflow: Batch -> Real-time

03 Role & Tenure Enrichment

AI agents cross-reference custodian names with HRIS data (via secure APIs) to append job function, project history, and employment timeline. This context is injected into the platform to flag custodians in sensitive roles or during critical periods.

Data enrichment: Same day

04 Risk-Based Custodian Tiering

A composite AI model synthesizes network, content, and role data to assign custodians to Tier 1 (Critical), Tier 2 (Relevant), or Tier 3 (Peripheral). Results populate a platform dashboard or custom object, driving collection strategy and resource allocation.

Tier assignment: Hours -> Minutes

05 Dynamic Custodian List Maintenance

As new data is ingested, AI continuously re-evaluates custodian rankings. Webhooks or platform event handlers trigger re-scoring, and the custodian management interface is updated automatically, ensuring the legal team works from a current, evidence-based list.

List refresh: Batch -> Real-time

06 Integration with Legal Hold Modules

Prioritized custodian lists and risk scores are formatted and pushed directly into the platform's legal hold issuance workflow (e.g., Relativity Legal Hold, Everlaw's Custodian Manager). This automates the creation of hold groups and tracks custodian responsiveness.

Hold group setup: Hours -> Minutes
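The risk-based tiering pattern (04) can be sketched as a weighted combination of the network, content, and role signals. The weights, thresholds, and the `CustodianSignals` type below are illustrative assumptions; a real deployment would calibrate them against reviewed matters.

```python
from dataclasses import dataclass


@dataclass
class CustodianSignals:
    name: str
    network: float  # network centrality signal, scaled to 0..1
    content: float  # content relevance signal, scaled to 0..1
    role: float     # role/tenure sensitivity signal, scaled to 0..1


# Assumed weights and cutoffs, chosen only for illustration
WEIGHTS = {"network": 0.4, "content": 0.4, "role": 0.2}
TIER_1_MIN = 0.7
TIER_2_MIN = 0.4


def composite_score(s: CustodianSignals) -> float:
    return (WEIGHTS["network"] * s.network
            + WEIGHTS["content"] * s.content
            + WEIGHTS["role"] * s.role)


def assign_tier(score: float) -> str:
    if score >= TIER_1_MIN:
        return "Tier 1 (Critical)"
    if score >= TIER_2_MIN:
        return "Tier 2 (Relevant)"
    return "Tier 3 (Peripheral)"


candidates = [
    CustodianSignals("alice", network=0.9, content=0.8, role=1.0),
    CustodianSignals("bob", network=0.5, content=0.4, role=0.5),
    CustodianSignals("carol", network=0.1, content=0.2, role=0.0),
]
tiers = {c.name: assign_tier(composite_score(c)) for c in candidates}
print(tiers)
```

Keeping the weights and cutoffs as named constants makes them easy to record alongside each score for later audit.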
IMPLEMENTATION PATTERNS

Example AI-Powered Custodian Workflows

These workflows demonstrate how AI agents can be integrated into e-discovery platforms to automate the identification, ranking, and management of custodians. Each pattern connects to platform APIs for data ingestion, analysis, and result output, creating a closed-loop system that reduces manual investigation time from weeks to days.

Trigger: A new matter is created in the e-discovery platform (e.g., Relativity, Everlaw) and a data source (like an Office 365 tenant) is staged for collection.

Workflow:

  1. An AI agent is triggered via a platform webhook or scheduled job.
  2. The agent queries the platform's API for the matter's metadata and uses system connectors to pull communication logs (Exchange Online, Teams, Slack exports) before full collection.
  3. Using graph analysis and LLM-powered pattern recognition, the agent analyzes:
    • Communication Volume & Centrality: Who sends/receives the most messages?
    • Topic Clustering: Which individuals are central to discussions about key case concepts (e.g., "merger," "pricing")?
    • Temporal Analysis: Who was active during critical event periods?
  4. The agent generates a ranked list of custodians with confidence scores and supporting rationale.
  5. It creates Custom Objects or populates a custodian management worksheet within the platform via API, pre-populating fields like Custodian Name, Employee ID, Data Source, Relevance Score, and Key Rationale.

Human Review Point: The legal team reviews the AI-generated list in the platform's UI, adjusting the preservation order and collection scope before issuing legal holds.
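As a simplified stand-in for the topic analysis in step 3, the sketch below counts how many of each sender's messages mention a key case concept. It uses plain keyword matching rather than LLM-based clustering, and the function name and sample messages are hypothetical.

```python
import re
from collections import defaultdict


def concept_hits_by_sender(messages, concepts):
    """Count, per sender, the messages that mention each concept as a whole word."""
    patterns = {
        c: re.compile(rf"\b{re.escape(c)}\b", re.IGNORECASE) for c in concepts
    }
    hits = defaultdict(lambda: defaultdict(int))
    for sender, body in messages:
        for concept, pattern in patterns.items():
            if pattern.search(body):
                hits[sender][concept] += 1
    return {sender: dict(per_concept) for sender, per_concept in hits.items()}


# Hypothetical sample data: (sender, message body)
messages = [
    ("alice", "Draft merger terms attached."),
    ("alice", "Pricing for Q3 and the merger timeline"),
    ("bob", "Lunch on Friday?"),
    ("carol", "Updated pricing sheet"),
]

hits = concept_hits_by_sender(messages, ["merger", "pricing"])
print(hits)
```

Counts like these can feed the rationale field that reviewers see at the human review point, since each number traces back to specific messages.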

FROM DATA SOURCES TO RANKED CUSTODIAN LISTS

Implementation Architecture: Data Flow and APIs

A production-ready architecture for ingesting communication data, applying AI analysis, and outputting prioritized custodian lists directly into your e-discovery platform's management modules.

The integration connects to communication data sources—typically Microsoft 365 Exchange Online, Google Workspace, Slack Enterprise Grid, or on-premises email archives—via their respective APIs or secure data exports. A dedicated ingestion service pulls metadata (To, From, CC, BCC, Timestamps) and content, normalizing it into a unified schema. This raw corpus is then processed by an AI analysis pipeline that performs entity resolution (matching email addresses to known employee records), communication graph analysis (identifying central nodes and clusters), and content salience scoring (using LLMs to flag messages containing key case terms, sensitive topics, or urgent sentiment).
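The entity resolution step can be sketched as address normalization followed by a directory lookup. The directory mapping, employee IDs, and function names here are assumptions; a production pipeline would typically also handle aliases, distribution lists, and fuzzy name matching.

```python
from collections import defaultdict
from email.utils import parseaddr


def normalize_address(raw):
    """Strip any display name and lowercase the address."""
    _, addr = parseaddr(raw)
    return addr.strip().lower()


def resolve_entities(raw_addresses, directory):
    """Group addresses by employee ID; return (resolved, unresolved)."""
    resolved = defaultdict(set)
    unresolved = []
    for raw in raw_addresses:
        addr = normalize_address(raw)
        employee_id = directory.get(addr)
        if employee_id is None:
            unresolved.append(addr)
        else:
            resolved[employee_id].add(addr)
    return dict(resolved), unresolved


# Hypothetical employee directory: address (including known aliases) -> employee ID
directory = {
    "alice.smith@example.com": "E001",
    "a.smith@example.com": "E001",
    "bob.jones@example.com": "E002",
}
raw_addresses = [
    "Alice Smith <Alice.Smith@Example.com>",
    "a.smith@example.com",
    "BOB.JONES@example.com",
    "vendor@outside.test",
]

resolved, unresolved = resolve_entities(raw_addresses, directory)
print(resolved)
print(unresolved)
```

Unresolved addresses are returned rather than dropped so that external parties can be reviewed separately instead of silently disappearing from the graph.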

Results are structured into a custodian profile object for each individual, containing calculated metrics like: message_volume, centrality_score, topic_relevance_score, and timeline_coverage. These profiles are pushed via the e-discovery platform's API—such as the Relativity Custodian Manager API, Everlaw's People API, or DISCO's Custodians endpoint—to create or update custodian records. The AI system can auto-populate fields for Collection Priority (High/Medium/Low), Suggested Search Terms, and Notes with the rationale for the ranking, turning a manual, spreadsheet-driven process into an API-driven workflow that updates in near real-time as new data is ingested.
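A custodian profile and its write-back payload might look like the following sketch. The metric names come from the text above; the priority cutoffs, the payload layout, and the helper names are assumptions and do not reflect any specific platform's schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CustodianProfile:
    employee_id: str
    message_volume: int
    centrality_score: float       # assumed to be scaled 0..1
    topic_relevance_score: float  # assumed to be scaled 0..1
    timeline_coverage: float      # assumed to be scaled 0..1


def collection_priority(profile):
    """Map the average of the three scaled scores to High/Medium/Low (assumed cutoffs)."""
    combined = (profile.centrality_score
                + profile.topic_relevance_score
                + profile.timeline_coverage) / 3
    if combined >= 0.66:
        return "High"
    if combined >= 0.33:
        return "Medium"
    return "Low"


def build_update_payload(profile):
    """Assemble a hypothetical payload; real field names depend on the platform."""
    notes = (f"centrality={profile.centrality_score:.2f}, "
             f"topic_relevance={profile.topic_relevance_score:.2f}, "
             f"timeline_coverage={profile.timeline_coverage:.2f}")
    return {
        "profile": asdict(profile),
        "fields": {
            "Collection Priority": collection_priority(profile),
            "Notes": notes,
        },
    }


profile = CustodianProfile("E001", 1240, 0.8, 0.9, 0.7)
payload = build_update_payload(profile)
print(json.dumps(payload, indent=2))
```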

For governance, all AI-generated scores and recommendations are logged with versioning and audit trails, allowing legal teams to review the rationale for a custodian's rank. The system can be configured to operate in an assistive mode, requiring a reviewer's approval before updating platform records, or a fully automated mode for low-risk, high-volume matters. This architecture ensures the AI acts as a force multiplier for legal teams, moving custodian identification from a multi-day, manual analysis task to a process that delivers a continuously refined, data-driven priority list within hours of data receipt.
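A versioned audit entry could be as simple as the sketch below, which stores the scores, rationale, and model version together with a content hash so later tampering is detectable. The record layout and the `audit_record` helper are assumptions; a real system would write to append-only storage rather than an in-memory list.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(custodian_id, scores, rationale, model_version):
    """Build an audit entry whose hash covers the scored content only."""
    content = {
        "custodian_id": custodian_id,
        "scores": scores,
        "rationale": rationale,
        "model_version": model_version,
    }
    digest = hashlib.sha256(
        json.dumps(content, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {
        **content,
        "content_sha256": digest,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


audit_log = []
audit_log.append(audit_record(
    "E001", {"centrality_score": 0.82}, "Bridges finance and sales threads", "ranker-v1"))
audit_log.append(audit_record(
    "E001", {"centrality_score": 0.91}, "Re-scored after new Slack export", "ranker-v2"))

for entry in audit_log:
    print(entry["model_version"], entry["content_sha256"][:12], entry["logged_at"])
```

Because the timestamp is excluded from the hashed content, re-running the same analysis with the same model version yields the same digest, which makes duplicate or altered entries easy to spot.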

AI FOR CUSTODIAN IDENTIFICATION AND RANKING

Code and Payload Examples

Analyzing Email and Chat Networks

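This Python example sketches how communication metadata fetched over the Relativity REST API can feed a network analysis that identifies central custodians. The script calculates betweenness centrality and communication volume to score each custodian's potential importance. Treat the endpoint path, query payload, response shape, and field names as placeholders to verify against the API documentation for your Relativity version.

python
import requests
import networkx as nx
import pandas as pd

# Placeholder connection details. The endpoint path and payload shape below are
# illustrative; confirm them against your Relativity version's REST API docs.
relativity_api_url = "https://your-instance.relativity.com/Relativity.REST/api/"
auth_token = "YOUR_OAUTH_TOKEN"
workspace_id = 123456

# Query for email metadata (From, To, CC, Date)
query_payload = {
    "condition": "('Document Type' EQUALS 'Email')",
    "fields": ["Control Number", "Extracted Email From", "Extracted Email To", "Extracted Email CC", "Date Sent"],
    "length": 10000
}

response = requests.post(
    f"{relativity_api_url}workspace/{workspace_id}/documents/query",
    headers={"Authorization": f"Bearer {auth_token}", "Content-Type": "application/json"},
    json=query_payload,
    timeout=60,
)
response.raise_for_status()  # Fail fast on authentication or query errors


def split_addresses(value):
    """Split a semicolon-delimited address field, tolerating missing values."""
    return [a.strip().lower() for a in (value or "").split(";") if a.strip()]


# Build a weighted graph: one edge per sender/recipient pair, weight = message count
G = nx.Graph()
for doc in response.json().get("Objects", []):
    senders = split_addresses(doc.get("Extracted Email From"))
    if not senders:
        continue  # Skip documents with no usable sender
    sender = senders[0]
    recipients = split_addresses(doc.get("Extracted Email To")) + split_addresses(doc.get("Extracted Email CC"))
    for recipient in recipients:
        if recipient == sender:
            continue  # Ignore self-addressed copies
        if G.has_edge(sender, recipient):
            G[sender][recipient]["weight"] += 1
        else:
            G.add_edge(sender, recipient, weight=1)

# networkx treats the weight argument of betweenness_centrality as a distance,
# so invert message counts: frequent correspondents should be "closer".
for _, _, data in G.edges(data=True):
    data["distance"] = 1.0 / data["weight"]

# Calculate centrality and volume scores
centrality_scores = nx.betweenness_centrality(G, weight="distance")
volume_scores = dict(G.degree(weight="weight"))
max_volume = max(volume_scores.values(), default=0) or 1  # Avoid division by zero

# Combine into a custodian ranking DataFrame (the 0.7/0.3 weights are illustrative)
columns = ["Custodian", "Betweenness_Centrality", "Communication_Volume", "Score"]
df_ranking = pd.DataFrame(
    [
        {
            "Custodian": custodian,
            "Betweenness_Centrality": centrality_scores[custodian],
            "Communication_Volume": volume_scores[custodian],
            "Score": centrality_scores[custodian] * 0.7 + (volume_scores[custodian] / max_volume) * 0.3,
        }
        for custodian in G.nodes
    ],
    columns=columns,
).sort_values("Score", ascending=False)

print(df_ranking.head(10))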

This analysis identifies custodians who act as communication hubs or bridges between teams, who are often critical for legal hold.

Realistic Time Savings and Operational Impact

This table illustrates the operational impact of integrating AI into the custodian identification workflow within platforms like Relativity, Everlaw, DISCO, and Nuix. It compares manual processes against AI-assisted workflows, showing realistic time savings and improvements in accuracy and strategic focus.

| Workflow Stage | Manual / Traditional Process | AI-Assisted Process | Impact & Implementation Notes |
| --- | --- | --- | --- |
| Initial Custodian List Generation | Manual compilation from HR directories and interview notes over 2-3 days | AI analyzes org charts, communication metadata, and content to propose a ranked list in 2-4 hours | Reduces foundational legwork; output is a CSV or JSON for import into platform custodian modules |
| Communication Pattern Analysis | Manual review of sample email threads to map relationships (weeks) | AI models analyze entire corpus to map communication frequency, centrality, and topic clusters (hours) | Uncovers hidden key players and informal networks not visible in org charts |
| Relevance Scoring & Prioritization | Subjective ranking based on custodian title and initial interviews | AI scores custodians based on volume of relevant comms, keyword density, and connection to key issues | Creates a data-driven priority queue for legal hold and collection, reducing collection scope by 20-40% |
| Legal Hold Notice Drafting | Generic notices drafted manually for all custodians | AI generates personalized notice summaries referencing their likely relevant data types and preservation duties | Increases compliance and understanding; integrates with platform's notice tracking via API |
| Collection Scope Definition | Broad collection mandates based on custodian role | AI recommends specific date ranges, data sources (email, Slack, cloud drives), and search terms per custodian | Focuses collection efforts, cutting downstream processing and review volume significantly |
| Integration with Platform Custodian Module | Manual data entry of custodian details and manual tagging | Automated API sync of AI-generated profiles, scores, and recommended tags into platform custodian objects | Ensures a single source of truth; enables dynamic reporting and workflow triggers based on custodian rank |
| Ongoing Custodian Re-ranking | Static list; re-evaluation only upon new manual discovery | Continuous re-scoring as new data is ingested, with alerts for custodians rising in relevance | Maintains investigation agility; implemented via platform event handlers or scheduled scripts |
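The "alerts for custodians rising in relevance" row can be sketched as a comparison between two ranking snapshots. The function name, the three-position threshold, and the sample rankings are assumptions for illustration.

```python
def rising_custodians(previous_rank, current_rank, min_jump=3):
    """Return (name, old_position, new_position) for new or fast-rising custodians.

    Both arguments map custodian name -> rank position, where 1 is the top.
    """
    alerts = []
    for name, new_pos in current_rank.items():
        old_pos = previous_rank.get(name)
        if old_pos is None:
            alerts.append((name, None, new_pos))  # newly identified custodian
        elif old_pos - new_pos >= min_jump:
            alerts.append((name, old_pos, new_pos))
    return sorted(alerts, key=lambda alert: alert[2])


# Hypothetical snapshots before and after a new data ingestion
previous_rank = {"alice": 1, "bob": 2, "dave": 5, "carol": 8}
current_rank = {"alice": 1, "carol": 2, "bob": 3, "erin": 4, "dave": 6}

alerts = rising_custodians(previous_rank, current_rank)
print(alerts)
```

A scheduled script or platform event handler could run this after each re-scoring pass and route the resulting alerts to the case team.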

CONTROLLED DEPLOYMENT FOR SENSITIVE LEGAL WORK

Governance, Security, and Phased Rollout

A secure, phased implementation ensures AI-driven custodian analysis enhances—rather than disrupts—existing legal hold and collection workflows.

Implementation begins by connecting to the e-discovery platform's custodian management module via its API (e.g., Relativity's Custodian object, Everlaw's Custodians endpoint). The AI agent ingests communication metadata—From/To/CC, Date, Subject—and content from a controlled, pre-processed dataset. It analyzes patterns like communication volume, centrality in networks, and topic relevance to key issues, outputting a ranked list with confidence scores and supporting evidence. These results are written back as custom fields (e.g., AI_Custodian_Rank, AI_Key_Topics) or to a dedicated reporting object, never overwriting existing legal team classifications.
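The rule that AI output never overwrites legal team classifications can be enforced in code by restricting writes to a dedicated field prefix. This sketch assumes the `AI_` naming used above; the record layout and the `merge_ai_fields` helper are hypothetical.

```python
AI_PREFIX = "AI_"


def merge_ai_fields(existing_record, ai_fields):
    """Return a copy of the record with AI_-prefixed fields added or updated.

    Any attempt to write a field outside the AI_ namespace is rejected.
    """
    rejected = [name for name in ai_fields if not name.startswith(AI_PREFIX)]
    if rejected:
        raise ValueError(f"Refusing to overwrite non-AI fields: {rejected}")
    merged = dict(existing_record)
    merged.update(ai_fields)
    return merged


# Hypothetical custodian record with a classification set by the legal team
record = {"Name": "Alice Smith", "Priority": "Medium"}

updated = merge_ai_fields(
    record, {"AI_Custodian_Rank": 1, "AI_Key_Topics": "merger; pricing"})
print(updated)

try:
    merge_ai_fields(record, {"Priority": "High"})
except ValueError as exc:
    print(exc)
```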

Security is paramount. The AI service operates within the same secure environment as the e-discovery platform, with all data in transit and at rest encrypted. Access is governed by the platform's native RBAC; the AI only processes data the authenticated user can already see. All AI interactions—queries, model calls, result writes—are logged to a dedicated audit trail, creating a defensible record of the automated analysis for judicial or regulatory scrutiny. For highly sensitive matters, a human-in-the-loop approval step can be configured, where the AI's custodian list is presented in a staging area for a senior reviewer or case manager to approve before promotion to the live custodian list.
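The staging-area approval step might be modeled as follows. This is a minimal sketch under stated assumptions: the `StagedCustodianList` type, the in-memory `live` dictionary, and the reviewer address are all hypothetical, and a real integration would persist approvals in the platform and log them to the audit trail.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class StagedCustodianList:
    matter_id: str
    ranked_custodians: tuple
    approved_by: Optional[str] = None


def approve(staged, reviewer):
    """Return an approved copy; the original staged list is left unchanged."""
    return replace(staged, approved_by=reviewer)


def promote(staged, live_lists):
    """Copy a staged list into the live lists only if a reviewer approved it."""
    if staged.approved_by is None:
        raise PermissionError("Staged list has not been approved by a reviewer")
    live_lists[staged.matter_id] = list(staged.ranked_custodians)
    return live_lists


live = {}
staged = StagedCustodianList("MATTER-001", ("alice@example.com", "bob@example.com"))

try:
    promote(staged, live)
except PermissionError as exc:
    print(f"Blocked: {exc}")

promote(approve(staged, "senior.reviewer@example.com"), live)
print(live)
```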

A phased rollout minimizes risk. Phase 1 (Pilot): The AI runs in a parallel, non-production workspace on a closed set of historical matters. The legal team compares its rankings against known outcomes to calibrate confidence thresholds. Phase 2 (Assisted): The AI is enabled for active matters but its outputs are presented as "recommendations" alongside manual custodian lists within the platform's UI, allowing teams to build trust. Phase 3 (Integrated): For validated workflows, the AI automatically updates custodian priority scores and tags, triggering platform-native alerts or workflow rules to expedite collection. This controlled approach turns a powerful capability into a reliable, governed component of the legal process.

Frequently Asked Questions

Practical questions for legal and technical teams implementing AI to identify and prioritize key custodians within e-discovery platforms like Relativity, Everlaw, DISCO, and Nuix.

The AI agent ingests and analyzes multiple data streams to build a comprehensive communication and content profile for each potential custodian:

  • Communication Metadata: From email headers, chat logs (Slack, Teams), and calendar invites to map To/From/CC patterns, frequency, and timing.
  • Content Analysis: The body of emails, chat messages, and documents to identify topics discussed, project names, technical jargon, and sentiment.
  • Platform-Specific Data: Native fields from the e-discovery platform, such as:
    • Relativity: Custodian Manager fields, Email Threading results, DtSearch index metadata.
    • Everlaw: Custodian object properties, Communication Analysis data.
    • DISCO: Custodian attributes from the processing engine.
    • Nuix: People entities identified during processing.
  • External HR Data (if integrated): Job titles, departments, and tenure from systems like Workday to enrich the analysis.

The agent uses this data to calculate metrics like communication centrality, topic authority, and temporal relevance to the matter's key dates.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.