RML

RML (RDF Mapping Language) is a generic framework and language for defining mappings from heterogeneous data structures (JSON, CSV, XML) to the RDF data model, enabling semantic data integration.


SEMANTIC DATA FABRIC

What is RML?

RML (RDF Mapping Language) is a declarative language and framework for mapping heterogeneous data sources—including JSON, CSV, XML, and relational databases—directly into an RDF knowledge graph.

RML is a generic extension of the W3C standard R2RML, designed to handle non-relational data structures. It uses mapping rules defined in Turtle syntax to specify how source data fields are transformed into RDF triples, linking them to classes and properties in a target ontology. This enables the creation of a virtual or materialized knowledge graph from raw, disparate data without manual conversion, forming the core of a semantic data fabric.

The language operates through logical sources, subject maps, and predicate-object maps to generate unique URIs, literals, and typed relationships. By providing a standardized, reusable mapping definition, RML automates semantic integration pipelines, ensuring consistent data provenance and enabling federated queries across originally siloed systems. It is a foundational tool for building enterprise knowledge graphs that serve as a deterministic layer for retrieval-augmented generation (RAG) and other reasoning systems.

RML

Core Components of an RML Mapping

An RML mapping document is a declarative specification that defines how data from heterogeneous sources is transformed into RDF. It consists of several key, interrelated components that work together to produce a deterministic graph.

Logical Source

The Logical Source (rml:logicalSource) defines the raw data to be processed. It specifies where the data comes from (rml:source, e.g., a file path, database query, or API endpoint), the reference formulation (rml:referenceFormulation) that dictates how values are addressed within the data structure (e.g., JSONPath for JSON, XPath for XML, or column names for CSV), and, for hierarchical formats, an iterator (rml:iterator) that splits the data into individual logical records. This component abstracts the physical data location and format, making the mapping reusable across different data retrieval scenarios.
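To make the role of the reference formulation and iterator concrete, here is a minimal Python sketch, assuming the raw data is inlined as a string and that a dot-separated key path can stand in for a JSONPath iterator. The `LogicalSource` class and its fields are hypothetical illustrations, not the API of any RML processor.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Iterator


@dataclass
class LogicalSource:
    """Hypothetical, simplified stand-in for an rml:logicalSource.

    source: raw data inlined as a string (instead of a file path, query, or URL).
    reference_formulation: "CSV" or "JSONPath" (plain labels, not the real ql: IRIs).
    iterator: for JSON, a dot-separated key path to an array of records, standing
              in for a JSONPath iterator such as "$.people[*]".
    """
    source: str
    reference_formulation: str
    iterator: str = ""

    def records(self) -> Iterator[dict]:
        """Yield one dict per logical record, whatever the source format."""
        if self.reference_formulation == "CSV":
            # Each CSV row is a logical record; column names act as references.
            yield from csv.DictReader(io.StringIO(self.source))
        elif self.reference_formulation == "JSONPath":
            node = json.loads(self.source)
            # Walk the simplified iterator path down to the array of records.
            for key in filter(None, self.iterator.split(".")):
                node = node[key]
            yield from node
        else:
            raise ValueError(f"unsupported formulation: {self.reference_formulation}")


csv_source = LogicalSource("ID,name\n1,Ann\n2,Bo\n", "CSV")
json_source = LogicalSource('{"people": [{"ID": 1, "name": "Ann"}]}', "JSONPath", "people")
print(list(csv_source.records()))   # [{'ID': '1', 'name': 'Ann'}, {'ID': '2', 'name': 'Bo'}]
print(list(json_source.records()))  # [{'ID': 1, 'name': 'Ann'}]
```

Both sources yield plain dicts, so the maps applied downstream do not need to know which format the record came from, which is the abstraction the Logical Source is meant to provide.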

Subject Map

A Subject Map (rr:subjectMap) defines the rule for generating the subject of each RDF triple. It is the central node to which all predicates and objects are attached. The subject is typically created using a template (rr:template) that concatenates a base IRI with values from the source data (e.g., http://example.com/person/{ID}). Alternatively, it can be a constant IRI or a blank node. Each Triples Map must have exactly one Subject Map.
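Template expansion for a subject map such as http://example.com/person/{ID} can be sketched as follows, assuming each logical record is a Python dict. `expand_template` is a hypothetical helper; it follows the R2RML rule that no term is produced when a referenced value is missing, but simplifies the IRI-safe percent-encoding and ignores escaped braces.

```python
import re
from typing import Optional
from urllib.parse import quote

# Matches {reference} placeholders in an rr:template-style string.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_template(template: str, record: dict) -> Optional[str]:
    """Expand a subject template against one logical record.

    Returns None if any referenced value is missing, following the R2RML rule
    that no term is generated when a referenced value is NULL. Values are
    percent-encoded for IRI safety; quote() is stricter than the spec's
    IRI-safe encoding (it also encodes non-ASCII characters), and
    backslash-escaped braces are not handled.
    """
    values = {}
    for ref in PLACEHOLDER.findall(template):
        value = record.get(ref)
        if value is None:
            return None
        values[ref] = quote(str(value), safe="")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


print(expand_template("http://example.com/person/{ID}", {"ID": 42}))         # http://example.com/person/42
print(expand_template("http://example.com/person/{ID}", {"ID": "Ann Lee"}))  # http://example.com/person/Ann%20Lee
print(expand_template("http://example.com/person/{ID}", {"name": "Ann"}))    # None
```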

Predicate-Object Map

The Predicate-Object Map (rr:predicateObjectMap) associates one or more predicates and their corresponding objects to the subject. It contains:

A Predicate Map (rr:predicateMap): Defines the property (predicate) IRI, often a constant like foaf:name.
An Object Map (rr:objectMap): Defines the value (object). When it points to another Triples Map through rr:parentTriplesMap, it is called a Referencing Object Map. The object can be:
- A literal generated from a source column/field.
- An IRI, created via a template.
- A blank node.
- Another subject, via a Referencing Object Map, which creates a relationship between two entities defined in different Triples Maps.
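The literal and IRI cases above can be illustrated with a small Python sketch that applies three predicate-object maps to one record and returns N-Triples terms. The function names, the `page` field, and the example.com page IRIs are hypothetical; FOAF and XSD are the real vocabulary namespaces.

```python
from typing import List, Optional, Tuple

FOAF = "http://xmlns.com/foaf/0.1/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value: str, datatype: Optional[str] = None) -> str:
    """Serialize an N-Triples literal, escaping backslashes, quotes, and newlines."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def predicate_object_triples(subject: str, record: dict) -> List[Tuple[str, str, str]]:
    """Apply three hypothetical predicate-object maps to one logical record."""
    s = f"<{subject}>"
    triples = [
        # Plain literal from a source field (like rml:reference "name").
        (s, f"<{FOAF}name>", literal(record["name"])),
        # Typed literal: same reference, plus an rr:datatype of xsd:integer.
        (s, f"<{FOAF}age>", literal(str(record["age"]), XSD + "integer")),
    ]
    # IRI object built from a template; skipped when the field is absent.
    if record.get("page"):
        triples.append((s, f"<{FOAF}homepage>", f"<http://example.com/pages/{record['page']}>"))
    return triples


for triple in predicate_object_triples("http://example.com/person/1", {"name": "Ann", "age": 34}):
    print(" ".join(triple) + " .")
```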

Triples Map

The Triples Map (rr:TriplesMap) is the core container unit of an RML mapping. It binds together exactly one Logical Source, exactly one Subject Map, and zero or more Predicate-Object Maps. Each Triples Map defines a rule for generating a set of RDF triples (subject-predicate-object) from each logical record (row, object, element) in the source. A complete mapping document is composed of multiple, potentially interconnected Triples Maps.
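A Triples Map can be pictured as a loop over logical records that applies one subject rule and a list of predicate-object rules to each record. This Python sketch assumes dict records and plain-literal objects; `TriplesMap` and its fields are hypothetical, and IRI percent-encoding and literal escaping are omitted to keep the structure visible.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class TriplesMap:
    """Hypothetical, simplified triples map.

    subject_template: template with {field} placeholders (no percent-encoding here).
    subject_class: optional class IRI, emitted as rdf:type (like rr:class).
    predicate_objects: (predicate IRI, source field) pairs producing plain literals.
    """
    subject_template: str
    subject_class: str = ""
    predicate_objects: List[Tuple[str, str]] = field(default_factory=list)

    def run(self, records: Iterable[dict]) -> List[str]:
        """Apply the map to every logical record and return N-Triples lines."""
        lines = []
        for record in records:
            subject = "<" + self.subject_template.format(**record) + ">"
            if self.subject_class:
                lines.append(f"{subject} <{RDF_TYPE}> <{self.subject_class}> .")
            for predicate, source_field in self.predicate_objects:
                value = record.get(source_field)
                if value is not None:  # missing values produce no triple
                    lines.append(f'{subject} <{predicate}> "{value}" .')
        return lines


person_map = TriplesMap(
    subject_template="http://example.com/person/{ID}",
    subject_class="http://xmlns.com/foaf/0.1/Person",
    predicate_objects=[("http://xmlns.com/foaf/0.1/name", "name")],
)
for line in person_map.run([{"ID": "1", "name": "Ann"}, {"ID": "2", "name": None}]):
    print(line)
```

The second record still receives its rdf:type triple, while its missing name simply produces no triple, which matches how mappers treat absent values.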

Term Map & Reference

Term Maps (rr:TermMap) are the abstract rules for generating RDF terms (subjects, predicates, objects). They specify the term type (rr:termType: IRI, blank node, or literal) and how the value is obtained: from a constant (rr:constant), from a template (rr:template), or from a reference. A key subtype is the reference-valued Term Map, where the term's value is read directly from the source data using a reference (rml:reference). For example, with a JSON source iterated by "$.people[*]", the reference "firstName" instructs the mapper to extract that field from each selected record to create a literal object.
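The three ways a term map can obtain its value, and the effect of the term type, can be sketched in Python. This is an illustration under simplifying assumptions (dict records, no IRI percent-encoding, blank-node labels taken as-is); the `TermMap` class is hypothetical and only mirrors the R2RML/RML vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TermMap:
    """Hypothetical term map: set exactly one of constant, reference, or template.

    term_type mirrors rr:termType and is "IRI", "Literal", or "BlankNode".
    """
    constant: Optional[str] = None
    reference: Optional[str] = None
    template: Optional[str] = None
    term_type: str = "IRI"

    def generate(self, record: dict) -> Optional[str]:
        """Return the N-Triples form of the term for one record, or None."""
        if self.constant is not None:
            value = self.constant  # constant-valued: same term for every record
        elif self.reference is not None:
            value = record.get(self.reference)  # reference-valued: read one field
        elif self.template is not None:
            try:
                value = self.template.format(**record)  # template-valued (no encoding)
            except KeyError:
                value = None
        else:
            raise ValueError("a term map needs a constant, reference, or template")
        if value is None:
            return None  # missing source value: no term is generated
        if self.term_type == "IRI":
            return f"<{value}>"
        if self.term_type == "BlankNode":
            return f"_:{value}"  # assumes the value is a valid blank-node label
        return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


record = {"firstName": "Ann", "ID": 7}
print(TermMap(reference="firstName", term_type="Literal").generate(record))  # "Ann"
print(TermMap(template="http://example.com/person/{ID}").generate(record))   # <http://example.com/person/7>
print(TermMap(constant="http://xmlns.com/foaf/0.1/name").generate(record))   # <http://xmlns.com/foaf/0.1/name>
```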

Function Map & Data Transformations

Function Maps (fnml:functionValue) enable complex data transformations within the mapping process. They apply functions, described with the Function Ontology (FnO), to source values before those values become RDF terms. Common use cases include:

String manipulation: Concatenation, case conversion, substring extraction.
Data type conversion: Casting strings to xsd:dateTime or xsd:integer.
Mathematical operations: Calculations on numerical data.
Conditional logic: Generating different values based on source data conditions.

This moves mapping beyond simple direct copying into the realm of data cleansing and enrichment.
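The idea behind a function-valued term (read one or more referenced fields, then apply a named function before the value becomes an RDF term) can be sketched in Python. The registry and its example.org IRIs are hypothetical placeholders that only imitate FnO-style function identifiers; they are not the IRIs of any published function library.

```python
from datetime import datetime
from typing import Callable, Dict, List

# Hypothetical registry keyed by FnO-style function IRIs. The example.org IRIs
# are placeholders, not the identifiers of any published function library.
FUNCTIONS: Dict[str, Callable[..., str]] = {
    "http://example.org/fn#toUpperCase": lambda s: s.upper(),
    "http://example.org/fn#concat": lambda a, b: f"{a} {b}",
    # Convert a dd/mm/yyyy string into the lexical form of an xsd:date.
    "http://example.org/fn#toXsdDate": lambda s: datetime.strptime(s, "%d/%m/%Y").date().isoformat(),
}


def function_value(function_iri: str, record: dict, parameter_refs: List[str]) -> str:
    """Read the referenced fields from the record, then apply the named function."""
    args = [record[ref] for ref in parameter_refs]
    return FUNCTIONS[function_iri](*args)


row = {"first": "ann", "last": "lee", "dob": "07/03/1990"}
print(function_value("http://example.org/fn#toUpperCase", row, ["last"]))      # LEE
print(function_value("http://example.org/fn#concat", row, ["first", "last"]))  # ann lee
print(function_value("http://example.org/fn#toXsdDate", row, ["dob"]))         # 1990-03-07
```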

SEMANTIC DATA FABRIC

How RML Works: The Mapping Process

RML (RDF Mapping Language) is a declarative framework for transforming heterogeneous data formats into a unified RDF knowledge graph through a structured mapping process.

The RML mapping process begins with a mapping document, itself an RDF document (usually written in Turtle), that defines rules for converting source data such as JSON, CSV, or XML into RDF triples. Each rule specifies a logical source (the data file or database query), a subject map (how to generate the IRI for each new entity), and predicate-object maps (how to assign properties and values, including links to other entities or typed literals). This declarative approach separates transformation logic from the underlying data, enabling reusable, maintainable integration pipelines.

An RML processor, or mapper, executes these rules. It parses the source data, iterates over its logical records (rows, objects, or elements), and instantiates the defined triples into a target RDF dataset or knowledge graph. The process supports complex operations like join conditions across multiple sources, value transformation with functions, and the generation of RDF-star annotations for provenance. By standardizing this transformation, RML enables the deterministic creation of a virtual or materialized knowledge graph from raw, siloed enterprise data.
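The join conditions mentioned above link records from two different sources. The Python sketch below shows one way a mapper could evaluate a referencing object map with a single join condition, using a hash join (index the parent records, then probe with each child). The function name, field names, and example.org IRIs are hypothetical, and comparing values as strings is a choice made for this sketch.

```python
from typing import Dict, List


def join_triples(child_records: List[dict], parent_records: List[dict],
                 child_ref: str, parent_ref: str, predicate: str,
                 child_template: str, parent_template: str) -> List[str]:
    """Link child and parent subjects whose join values are equal.

    Works like a referencing object map with one join condition
    (child reference = parent reference). Parents are indexed first, i.e. a
    hash join, so each child is matched without scanning every parent.
    Values are compared as strings so a CSV "10" matches a JSON 10.
    """
    index: Dict[str, List[dict]] = {}
    for parent in parent_records:
        index.setdefault(str(parent[parent_ref]), []).append(parent)
    triples = []
    for child in child_records:
        for parent in index.get(str(child[child_ref]), []):
            triples.append(f"<{child_template.format(**child)}> <{predicate}> "
                           f"<{parent_template.format(**parent)}> .")
    return triples


employees = [{"id": "e1", "dept": "10"}, {"id": "e2", "dept": "99"}]  # e.g. rows from a CSV file
departments = [{"code": 10, "name": "R&D"}]                            # e.g. objects from a JSON API
print(join_triples(employees, departments, "dept", "code", "http://example.org/worksIn",
                   "http://example.org/emp/{id}", "http://example.org/dept/{code}"))
# ['<http://example.org/emp/e1> <http://example.org/worksIn> <http://example.org/dept/10> .']
```

Employee e2 has no matching department, so it produces no link, rather than a dangling reference.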

RDF MAPPING LANGUAGE

Primary Use Cases for RML

RML (RDF Mapping Language) is a widely used framework for declaratively mapping heterogeneous data formats, including JSON, CSV, XML, and relational databases, into a unified RDF knowledge graph. Its primary applications center on building semantic data fabrics and enabling deterministic data integration.

Building Enterprise Knowledge Graphs

RML is a foundational tool for constructing enterprise knowledge graphs from siloed operational data. It provides a declarative mapping language to define how records in source systems correspond to RDF triples (subject-predicate-object). This process, known as knowledge graph materialization, creates a persistent, queryable graph that serves as a single source of truth.

Key Activity: Defining mappings from CSV customer files, JSON API responses, and XML product catalogs into a unified ontology.
Outcome: Enables complex SPARQL queries across previously disconnected data domains.

Creating Virtual Knowledge Graphs

Instead of physically materializing all data, RML mappings can power a virtual knowledge graph (VKG). In this architecture, a query engine uses RML definitions to federate queries in real-time across the original source systems. The graph is an on-demand, integrated view without massive data duplication.

Key Benefit: Provides immediate semantic integration for analytics without lengthy ETL processes.
Technical Mechanism: A virtualization engine that interprets RML mappings translates a SPARQL query into source-specific queries (e.g., SQL, MongoDB queries) and maps the results back to RDF.
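Query rewriting, the core of a virtual knowledge graph, can be illustrated in miniature with Python's built-in sqlite3 module. This sketch handles only a single triple pattern of the form (?s predicate ?o); the `MAPPING` dict, the example.org email property, and the `answer_pattern` function are hypothetical, and real engines support full SPARQL, joins, and many source types.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical mapping for one table: how subjects are minted and which
# column backs each predicate (the information an R2RML/RML mapping carries).
MAPPING = {
    "table": "person",
    "key": "id",
    "subject": "http://example.com/person/{}",
    "predicates": {
        "http://xmlns.com/foaf/0.1/name": "name",
        "http://example.org/vocab/email": "email",
    },
}


def answer_pattern(conn: sqlite3.Connection, predicate: str) -> List[Tuple[str, str, str]]:
    """Answer the triple pattern (?s <predicate> ?o) directly against the source.

    The pattern is rewritten into SQL over the mapped column, executed, and each
    row is turned back into a triple; nothing is materialized beforehand.
    """
    column = MAPPING["predicates"][predicate]
    sql = (f'SELECT "{MAPPING["key"]}", "{column}" FROM "{MAPPING["table"]}" '
           f'WHERE "{column}" IS NOT NULL ORDER BY "{MAPPING["key"]}"')
    return [(MAPPING["subject"].format(key), predicate, value) for key, value in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT, email TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Ann", "ann@example.com"), (2, "Bo", None)])
print(answer_pattern(conn, "http://example.org/vocab/email"))
```

Because the SQL runs at query time, an update to the person table is visible in the next answer without re-running a materialization job.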

Semantic Data Fabric Implementation

RML is the core mapping engine within a semantic data fabric. It operationalizes the fabric's semantic layer by providing the executable instructions that transform raw data into contextualized, ontology-aligned knowledge. This bridges the gap between physical data storage and business-centric conceptual models.

Architectural Role: Sits within the semantic integration pipeline, often alongside tools for ontology alignment and entity resolution.
Business Value: Delivers semantic interoperability, allowing different departments to access data with consistent, shared meaning.

Data Product Publication

Domain teams use RML to publish data products as linked data. By mapping a domain-owned dataset (e.g., a product catalog in a PostgreSQL database) to a shared enterprise ontology, the team exposes its data as a standards-compliant RDF graph. This graph can be consumed directly via SPARQL or integrated into the broader knowledge graph.

Data Mesh Alignment: Enforces the data-as-a-product principle by providing a standardized interface (RDF/SPARQL) for domain data.
Governance: Mappings document the exact relationship between source schema and target ontology, providing clear data lineage.

Enabling Graph-Based RAG

RML can supply deterministic, factual grounding for Retrieval-Augmented Generation (RAG) systems. By transforming enterprise data into a knowledge graph, RML creates a verifiable source of facts. A Graph RAG architecture can then traverse this graph to retrieve connected subgraphs as context for a large language model, which can reduce hallucinations compared with ungrounded generation.

Process: Databases and structured files (CSV, JSON, XML) → RML → Knowledge Graph → Graph Retrieval → LLM Context.
Advantage over Vector Search: Provides explicit relationships and logical provenance for retrieved information.
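The "Graph Retrieval" step can be sketched as a breadth-first walk that collects every triple within a fixed number of hops of a seed entity and renders each one as a line of prompt context. The prefixed names such as ex:ann and the `k_hop_context` function are hypothetical; production Graph RAG systems typically add ranking, size limits, and SPARQL-based retrieval.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def k_hop_context(triples: List[Triple], seed: str, hops: int = 2) -> List[str]:
    """Breadth-first collection of triples within `hops` edges of `seed`.

    Edges are followed in both directions; the result is one line per fact,
    ready to be placed in a prompt as explicit, attributable context.
    """
    frontier = deque([(seed, 0)])
    seen: Set[str] = {seed}
    selected: Dict[Triple, None] = {}  # insertion-ordered set of triples
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for s, p, o in triples:
            if node not in (s, o):
                continue
            selected[(s, p, o)] = None
            neighbour = o if s == node else s
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return [f"{s} {p} {o}" for s, p, o in selected]


kg = [("ex:ann", "ex:worksIn", "ex:rnd"), ("ex:rnd", "ex:partOf", "ex:acme"),
      ("ex:acme", "ex:locatedIn", "ex:berlin"), ("ex:bo", "ex:worksIn", "ex:sales")]
print(k_hop_context(kg, "ex:ann", hops=2))  # ['ex:ann ex:worksIn ex:rnd', 'ex:rnd ex:partOf ex:acme']
```

Each retrieved line is an explicit edge in the graph, which is what gives this approach the traceable provenance that nearest-neighbour vector search lacks.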

Legacy System Modernization

RML acts as a modernization wrapper for legacy systems (mainframes, old SQL databases) by exposing their data as linked data. Organizations can thus integrate decades-old data into modern AI and analytics platforms without replacing the core transactional systems. The mappings serve as a declarative abstraction layer.

Typical Source: Relational databases, mapped with the R2RML constructs that RML inherits.
Strategic Outcome: Unlocks legacy data for use in digital twins, advanced analytics, and compliance reporting without disruptive migration.

MAPPING LANGUAGE STANDARDS

RML vs. R2RML: A Technical Comparison

A feature-by-feature comparison of the RDF Mapping Language (RML) and its foundational standard, R2RML, highlighting their capabilities for semantic data integration.

Feature / Specification	R2RML (W3C Standard)	RML (Community Specification)
Primary Scope & Data Source	Relational Databases (RDBMS)	Multiple Heterogeneous Sources (CSV, JSON, XML, RDBMS, APIs)
Core Standardization Body	World Wide Web Consortium (W3C) Recommendation	Community-driven specification by the RML Community
Mapping Definition Target	RDF Dataset (Graph / Named Graph)	RDF Dataset (Graph / Named Graph)
Logical Table Definition	SQL query or base table name	Reference to data source + iterator (e.g., JSONPath, XPath, SQL)
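Support for Nested Data Structures	Not supported (tabular query results only)	Supported via iterators (e.g., JSONPath, XPath)
Function Support for Data Transformation	None built in (transformations are expressed inside the SQL query)	Via the FNML extension, with functions described in FnO (string, numeric, date operations)
Referencing Object Maps (Foreign Keys)	rr:parentTriplesMap with optional rr:joinCondition over SQL columns	Same construct (rr:parentTriplesMap with rr:joinCondition), with references into any source format
Specification of Data Source Format & Access	Outside the mapping (connection configured in the processor, e.g., via JDBC)	Explicit via logical source declarations (rml:source, rml:referenceFormulation)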
Primary Use Case	Lifting relational enterprise data to RDF	Building a semantic data fabric from diverse, modern data sources

RML

Frequently Asked Questions

RML (RDF Mapping Language) is a core standard for mapping heterogeneous data into a unified knowledge graph. These FAQs address its purpose, mechanics, and role in enterprise semantic architectures.

What is RML and how does it work?

RML (RDF Mapping Language) is a declarative, rule-based language for defining mappings from diverse, heterogeneous data sources, including JSON, CSV, XML, and relational databases, into the RDF (Resource Description Framework) data model. It works by specifying mapping rules that define how each field or element in a source document corresponds to RDF triples (subject-predicate-object). A processor then executes these rules to transform the raw source data into a structured knowledge graph, creating globally unique identifiers (IRIs) for entities and linking them via properties defined in an ontology. RML is an extension of the W3C standard R2RML, generalized to support non-relational data formats.

SEMANTIC DATA FABRIC

Related Terms

RML operates within a broader ecosystem of technologies and methodologies for building semantic data fabrics. These related concepts define the standards, systems, and processes that enable the transformation of raw data into interconnected, meaningful knowledge.

R2RML

R2RML (RDB to RDF Mapping Language) is the W3C standard upon which RML is based. It provides a declarative language for defining mappings from relational database tables to RDF datasets.

Core Focus: Exclusively maps from relational schemas (SQL) to RDF.
Standardization: A formal W3C recommendation, ensuring vendor interoperability.
Foundation: RML extends R2RML's mapping logic to support non-relational, heterogeneous sources like JSON, CSV, and XML.

Semantic Integration

Semantic integration is the overarching process of combining data from disparate sources by resolving schematic and data-level conflicts. It uses shared ontologies and semantic mappings (like those defined in RML) to create a unified, meaningful view.

Goal: Achieve semantic interoperability, where exchanged data has unambiguous, shared meaning.
Role of Mappings: RML mappings are the executable specifications that drive this integration, transforming raw data into a coherent knowledge graph.

Virtual Knowledge Graph (VKG)

A Virtual Knowledge Graph is a system that provides a unified, graph-based view over heterogeneous data sources in real-time using mapping definitions, without requiring the physical materialization of the entire graph.

On-Demand Access: Data is transformed into RDF at query time via mappings (RML/R2RML).
Key Benefit: Avoids the storage overhead and staleness of batch ETL, offering a live, integrated view.
Architecture: The VKG engine uses RML documents to understand how to query and map each underlying source.

SPARQL

SPARQL is the W3C standard query language for RDF knowledge graphs. It is used to retrieve and manipulate data stored in RDF format, which is the target output of an RML mapping.

Query Target: The integrated RDF graph created via RML mappings is queried using SPARQL.
Federated Query: SPARQL can query across multiple distributed endpoints, which can themselves be virtualized views created by RML mappings.
Complementary Role: RML defines the data transformation; SPARQL defines the data interrogation.

Ontology

An ontology is a formal, explicit specification of a shared conceptualization. It defines the classes, properties, and relationships (the vocabulary) for a particular domain, providing the semantic schema for a knowledge graph.

Mapping Target: RML mappings transform data into instances (individuals) of the classes defined in a target ontology (typically expressed in OWL or RDFS), linked by that ontology's properties.
Semantic Context: The ontology gives meaning to the generated RDF triples, ensuring all data conforms to a consistent, logical model.

Semantic Pipeline

A semantic pipeline is an automated workflow that ingests, transforms, enriches, and integrates raw data into a knowledge graph. RML is a core component within such pipelines, responsible for the declarative transformation and mapping stage.

Typical Stages: Ingestion → Cleaning → Mapping (RML) → Entity Linking/Lifting → Materialization/Storage.
Orchestration: RML processors (like the RMLMapper or SDM-RDFizer) are executed as steps within larger data pipeline frameworks (e.g., Apache Airflow, Nextflow).
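How the mapping step slots into a larger pipeline can be sketched as a chain of stage functions, each consuming the previous stage's output. Everything here is hypothetical: the stage names and sample data are illustrative, the mapping stage is a stand-in for invoking a real processor such as the RMLMapper, and the entity-linking and storage stages are omitted.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Stage = Callable[[list], list]


def ingest(_: list) -> List[Record]:
    # Stand-in for reading a file, API response, or database extract.
    return [{"id": "1", "name": "  Ann "}, {"id": "2", "name": ""}]


def clean(records: List[Record]) -> List[Record]:
    # Trim whitespace and drop records without a name.
    return [{**r, "name": r["name"].strip()} for r in records if r["name"].strip()]


def map_to_rdf(records: List[Record]) -> List[str]:
    # Stand-in for the RML step; a real pipeline would call an RML processor here.
    return [f'<http://example.com/person/{r["id"]}> <http://xmlns.com/foaf/0.1/name> "{r["name"]}" .'
            for r in records]


def run_pipeline(stages: List[Stage]) -> list:
    """Run each stage on the output of the previous one."""
    data: list = []
    for stage in stages:
        data = stage(data)
    return data


triples = run_pipeline([ingest, clean, map_to_rdf])
print(triples)  # ['<http://example.com/person/1> <http://xmlns.com/foaf/0.1/name> "Ann" .']
```

In an orchestrator such as Apache Airflow, each of these functions would typically become a separate task, so a failed mapping run can be retried without re-ingesting the source data.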

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.
