Inferensys

Guide

Setting Up Multi-Vendor Product Data Normalization

A technical guide to ingesting and standardizing disparate product data from multiple suppliers into a unified schema for AI agent consumption using Apache Airflow ETL pipelines.

Learn how to transform disparate supplier data into a unified, AI-ready product catalog.

Multi-vendor product data normalization is the ETL process that ingests raw data from diverse suppliers and maps it to a unified schema. Without this step, AI agents cannot reliably compare products. Suppliers use conflicting attribute names (e.g., 'color' vs. 'colour'), mismatched units, and inconsistent categorization. Normalization is a foundational requirement for agentic commerce, where autonomous AI buyers research and purchase products. The output is a clean, structured product catalog that serves as the single source of truth for all downstream AI applications.
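
An alias table can resolve the attribute-name conflict by folding each vendor spelling onto one canonical key. The following is a minimal sketch. The alias entries and the `canonicalize_attributes` helper are hypothetical, and a real pipeline would likely load the table from configuration rather than hard-code it.

```python
from typing import Dict, List, Tuple

# Hypothetical alias table: lowercased vendor attribute name -> canonical attribute name
ATTRIBUTE_ALIASES = {
    "color": "color",
    "colour": "color",
    "item_name": "name",
    "producttitle": "name",
}


def canonicalize_attributes(record: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Rename known attributes to canonical keys.

    Returns the renamed record plus the original names that had no alias,
    so they can be reviewed instead of silently passed through.
    """
    canonical: Dict[str, object] = {}
    unknown: List[str] = []
    for key, value in record.items():
        target = ATTRIBUTE_ALIASES.get(key.strip().lower())
        if target is None:
            unknown.append(key)
        else:
            canonical[target] = value
    return canonical, unknown
```

Consider `canonicalize_attributes({"Colour": "Red", "ProductTitle": "Desk Lamp", "Voltage": "220V"})`. It returns `({"color": "Red", "name": "Desk Lamp"}, ["Voltage"])`. Returning the unknown names, rather than dropping them, gives you a worklist when a vendor adds a new field.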

You will implement this with an orchestration tool such as Apache Airflow, which schedules and monitors the data pipelines. The core task is writing mapping rules: logic that transforms vendor-specific fields into your canonical schema. For example, you'll write rules that convert all weight measurements to kilograms and map each vendor's category tree to a standard taxonomy. The normalized data is then exposed through an agent-ready API, a prerequisite for guides such as "How to Architect an AI Buyer-Ready Product API."
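
A category-mapping rule can be as simple as a lookup keyed by vendor and vendor category. This sketch assumes a hypothetical canonical taxonomy stored as path tuples. The rule entries and the `map_category` function are illustrative only. Unmatched categories return `None`, so they can be routed to review instead of guessed.

```python
from typing import Optional, Tuple

# Hypothetical mapping rules: (vendor_id, normalized vendor category) -> canonical path
CATEGORY_RULES = {
    ("vendor_a", "electronics > computers"): ("Electronics", "Computers"),
    ("vendor_b", "pc"): ("Electronics", "Computers"),
    ("vendor_b", "notebook"): ("Electronics", "Computers", "Laptops"),
}


def map_category(vendor_id: str, raw_category: str) -> Optional[Tuple[str, ...]]:
    """Return the canonical category path for a vendor category, or None if unmapped."""
    # Normalize spacing around ">" and letter case so "Electronics >Computers" still matches
    parts = [part.strip() for part in raw_category.split(">")]
    key = " > ".join(parts).lower()
    return CATEGORY_RULES.get((vendor_id, key))
```

Both `map_category("vendor_a", "Electronics > Computers")` and `map_category("vendor_b", "PC")` return `("Electronics", "Computers")`. `map_category("vendor_b", "Tablets")` returns `None`.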

ETL TRANSFORMATIONS

Vendor Schema Mapping Examples

A comparison of normalization strategies for common product attributes from disparate vendor APIs.

| Product Attribute | Vendor A (JSON) | Vendor B (XML) | Unified Schema (Target) |
|---|---|---|---|
| Product Identifier | sku | ProductCode | product_id |
| Product Name | item_name | ProductTitle | name |
| Price | price_usd | Cost | price (USD) |
| Weight | weight_lbs | ShippingWeightOz | weight_kg |
| In Stock | Y | available | |
| Lead Time | 3-5 days | 5 | lead_time_days |
| Category | Electronics > Computers | PC | category_path |
| Dimensions | 10x5x2 | Length:10,Width:5,Height:2 | dimensions_cm |
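
The mappings in this table can be expressed as one normalization function per vendor. The sketch below uses only the standard library and rests on several assumptions:

- The Vendor A keys `in_stock`, `lead_time`, `category`, and `dimensions` are hypothetical, because the table shows sample values rather than field names for those rows.
- The Vendor B tags `Availability`, `LeadTime`, `Category`, and `Dimensions` are hypothetical for the same reason.
- The `in_stock` target field and the Vendor B category lookup are hypothetical.
- Both vendors are assumed to report dimensions in inches. Confirm the real units against each vendor's documentation.

```python
import re
import xml.etree.ElementTree as ET

LB_TO_KG = 0.45359237        # exact definition of the avoirdupois pound
OZ_TO_KG = 0.028349523125    # 1/16 lb
IN_TO_CM = 2.54              # ASSUMPTION: both vendors report dimensions in inches

# Hypothetical lookups; the table does not define these
IN_STOCK_VALUES = {"y", "yes", "true", "available", "in stock"}
VENDOR_B_CATEGORY_PATHS = {"PC": ("Electronics", "Computers")}


def parse_lead_time_days(raw: str) -> int:
    """'3-5 days' -> 5 (conservative upper bound); '5' -> 5."""
    numbers = [int(n) for n in re.findall(r"\d+", raw)]
    if not numbers:
        raise ValueError(f"unparseable lead time: {raw!r}")
    return max(numbers)


def parse_dimensions_cm(raw: str) -> tuple:
    """Parse '10x5x2' or 'Length:10,Width:5,Height:2' into (L, W, H) in centimeters."""
    if ":" in raw:
        # Vendor B style: named components, so their order in the string does not matter
        named = {k.strip(): v for k, v in (pair.split(":", 1) for pair in raw.split(","))}
        values = [named["Length"], named["Width"], named["Height"]]
    else:
        # Vendor A style: assumed to be L x W x H
        values = raw.lower().split("x")
    if len(values) != 3:
        raise ValueError(f"expected three dimensions, got {raw!r}")
    return tuple(round(float(v) * IN_TO_CM, 2) for v in values)


def normalize_vendor_a(item: dict) -> dict:
    """Map a Vendor A JSON record (already decoded to a dict) to the unified schema."""
    return {
        "product_id": item["sku"],
        "name": item["item_name"],
        "price": float(item["price_usd"]),
        "weight_kg": round(float(item["weight_lbs"]) * LB_TO_KG, 3),
        # Hypothetical keys below: the table shows only sample values for these rows
        "in_stock": item["in_stock"].strip().lower() in IN_STOCK_VALUES,
        "lead_time_days": parse_lead_time_days(item["lead_time"]),
        "category_path": tuple(p.strip() for p in item["category"].split(">")),
        "dimensions_cm": parse_dimensions_cm(item["dimensions"]),
    }


def _required_text(element: ET.Element, tag: str) -> str:
    """Return the stripped text of a child element, failing loudly if it is missing."""
    value = element.findtext(tag)
    if value is None:
        raise KeyError(f"missing <{tag}> element")
    return value.strip()


def normalize_vendor_b(xml_text: str) -> dict:
    """Map a Vendor B XML <Product> record to the unified schema."""
    product = ET.fromstring(xml_text)
    return {
        "product_id": _required_text(product, "ProductCode"),
        "name": _required_text(product, "ProductTitle"),
        "price": float(_required_text(product, "Cost")),
        "weight_kg": round(float(_required_text(product, "ShippingWeightOz")) * OZ_TO_KG, 3),
        # Hypothetical tag names for the remaining fields
        "in_stock": _required_text(product, "Availability").lower() in IN_STOCK_VALUES,
        "lead_time_days": parse_lead_time_days(_required_text(product, "LeadTime")),
        # None signals an unmapped vendor category that needs review
        "category_path": VENDOR_B_CATEGORY_PATHS.get(_required_text(product, "Category")),
        "dimensions_cm": parse_dimensions_cm(_required_text(product, "Dimensions")),
    }


SAMPLE_VENDOR_B_XML = """<Product>
  <ProductCode>B-100</ProductCode>
  <ProductTitle>Mini PC</ProductTitle>
  <Cost>499.00</Cost>
  <ShippingWeightOz>32</ShippingWeightOz>
  <Availability>available</Availability>
  <LeadTime>5</LeadTime>
  <Category>PC</Category>
  <Dimensions>Length:10,Width:5,Height:2</Dimensions>
</Product>"""
```

Take a Vendor A record with `weight_lbs` of `"2"` and the sample Vendor B record (32 oz). After normalization, their `weight_kg` (0.907), `lead_time_days`, `category_path`, and `dimensions_cm` values match, which is what lets an agent compare the two offers directly. Taking the upper bound of a lead-time range is a deliberately conservative choice; you may prefer to store both bounds.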

TROUBLESHOOTING

Common Mistakes

When setting up multi-vendor product data normalization, developers often hit the same pitfalls, which break pipelines or produce data that AI agents cannot use. This section covers the most frequent errors and a concrete fix for each.

One frequent mistake is an Extract, Transform, Load (ETL) pipeline that fails to standardize vendor-specific units (e.g., 'lbs' vs. 'kg', 'each' vs. 'unit'). AI agents cannot compare products accurately when the same attribute arrives in mixed units.

How to fix it:

  • Create a master unit mapping table in your transformation logic.
  • Use a library like pint for Python to handle unit conversion programmatically.
  • Implement a validation step that flags unmapped units for manual review.
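```python
# Example unit normalization function
import pint

ureg = pint.UnitRegistry()

# Map vendor abbreviations to unit names that pint recognizes
UNIT_MAP = {"lbs": "pound", "lb": "pound", "kg": "kilogram", "oz": "ounce"}


def normalize_weight(raw_value, raw_unit):
    """Return the weight in kilograms, or None if it cannot be converted."""
    try:
        unit_text = str(raw_unit).strip()
        standard_unit = UNIT_MAP.get(unit_text.lower(), unit_text)
        quantity = ureg.Quantity(float(raw_value), standard_unit)
        # Convert all weights to kilograms
        return quantity.to("kilogram").magnitude
    except (ValueError, TypeError, pint.UndefinedUnitError, pint.DimensionalityError):
        # Bad number, unknown unit, or a non-mass unit such as "cm":
        # return None so the record is flagged for data quality review
        return None
```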
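
The first and third fixes, a master unit mapping table and a validation step that flags unmapped units, can also be implemented without a unit library. In this minimal sketch, the table entries, the `validate_units` function, and the record shape (`value` and `unit` keys) are hypothetical. Records whose unit is missing from the table, or whose value is not numeric, are set aside for manual review instead of being passed downstream.

```python
from dataclasses import dataclass, field

# Hypothetical master unit mapping table:
# lowercased vendor unit spelling -> (canonical unit, multiplier into the canonical unit)
UNIT_TABLE = {
    "kg": ("kg", 1.0),
    "kgs": ("kg", 1.0),
    "g": ("kg", 0.001),
    "lb": ("kg", 0.45359237),
    "lbs": ("kg", 0.45359237),
    "oz": ("kg", 0.028349523125),
    "each": ("unit", 1.0),
    "ea": ("unit", 1.0),
    "unit": ("unit", 1.0),
}


@dataclass
class UnitValidationReport:
    normalized: list = field(default_factory=list)
    needs_review: list = field(default_factory=list)


def validate_units(records: list) -> UnitValidationReport:
    """Convert records with known units; flag the rest for manual review."""
    report = UnitValidationReport()
    for record in records:
        unit_key = str(record.get("unit", "")).strip().lower()
        mapping = UNIT_TABLE.get(unit_key)
        if mapping is None:
            report.needs_review.append({**record, "review_reason": "unmapped unit"})
            continue
        try:
            value = float(record["value"])
        except (KeyError, TypeError, ValueError):
            report.needs_review.append({**record, "review_reason": "invalid value"})
            continue
        canonical_unit, multiplier = mapping
        report.normalized.append({**record, "value": value * multiplier, "unit": canonical_unit})
    return report
```

Consider `[{"sku": "A-1", "value": "2", "unit": "lbs"}, {"sku": "B-7", "value": "3", "unit": "stone"}]`. A-1 is normalized to about 0.907 kg. B-7 lands in `needs_review`, because "stone" is absent from the table even though it is a real unit. The table, not a general unit library, decides what the pipeline accepts.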

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. For more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on turning complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.