RML is a generalization of the W3C Recommendation R2RML, designed to also handle non-relational data structures. It uses mapping rules, typically written in Turtle syntax, to specify how source data fields are transformed into RDF triples, linking them to classes and properties in a target ontology. This enables the creation of a virtual or materialized knowledge graph from raw, disparate data without manual conversion, forming the core of a semantic data fabric.
What is RML?
RML (RDF Mapping Language) is a declarative language and framework for mapping heterogeneous data sources—including JSON, CSV, XML, and relational databases—directly into an RDF knowledge graph.
The language operates through logical sources, subject maps, and predicate-object maps to generate unique IRIs, literals, and typed relationships. By providing a standardized, reusable mapping definition, RML automates semantic integration pipelines, keeps provenance traceable (each generated triple can be traced back to the mapping rule and source field that produced it), and enables federated queries across originally siloed systems. It is a foundational tool for building enterprise knowledge graphs that serve as a deterministic layer for retrieval-augmented generation (RAG) and other reasoning systems.
Core Components of an RML Mapping
An RML mapping document is a declarative specification that defines how data from heterogeneous sources is transformed into RDF. It consists of several key, interrelated components that work together to produce a deterministic graph.
Logical Source
The Logical Source (rml:logicalSource) defines the raw data to be processed. It specifies where the data comes from (rml:source, e.g., a file path, database, or API endpoint), the reference formulation (rml:referenceFormulation) that dictates how values are addressed within the data structure (e.g., JSONPath for JSON, XPath for XML, or column names for CSV), and, for hierarchical formats, an iterator (rml:iterator) that selects the individual records to process, such as each element of a JSON array. This component abstracts the physical data location and format, making the mapping reusable across different data retrieval scenarios.
Subject Map
A Subject Map (rr:subjectMap) defines the rule for generating the subject of each RDF triple. It is the central node to which all predicates and objects are attached. The subject is typically created using a template (rr:template) that concatenates a base IRI with values from the source data (e.g., http://example.com/person/{ID}). Alternatively, it can be a constant IRI or a blank node. Each Triples Map must have exactly one Subject Map.
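Template expansion is easy to get subtly wrong, so here is a minimal Python sketch of how a processor might expand an `rr:template` for a subject map. The function names are hypothetical. The behaviour follows two R2RML rules: if a referenced value is missing (NULL), no subject is generated at all, and substituted values are percent-encoded so that, for example, a space cannot break the IRI. The encoding here is an ASCII-only approximation, and backslash-escaped braces in templates are not handled. `rr:class`, which adds an `rdf:type` triple for each listed class, is included because it is also declared on the subject map.

```python
import re
from typing import List, Optional, Tuple
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_iri_template(template: str, record: dict) -> Optional[str]:
    """Fill {field} placeholders in an rr:template from one logical record.

    Returns None if any referenced field is missing, since a template with a
    NULL value produces no term in R2RML (and hence no subject).
    """
    missing = False

    def substitute(match) -> str:
        nonlocal missing
        value = record.get(match.group(1))
        if value is None:
            missing = True
            return ""
        # quote() keeps letters, digits and "_.-~" and percent-encodes the rest;
        # R2RML's IRI-safe rule would additionally keep non-ASCII letters.
        return quote(str(value), safe="")

    iri = _PLACEHOLDER.sub(substitute, template)
    return None if missing else iri


def subject_triples(template: str, classes: List[str], record: dict) -> List[Tuple[str, str, str]]:
    """Generate the subject and one rdf:type triple per rr:class value."""
    subject = expand_iri_template(template, record)
    if subject is None:
        return []
    return [(subject, RDF_TYPE, cls) for cls in classes]
```

For example, `expand_iri_template("http://example.com/person/{ID}", {"ID": "Ada Lovelace"})` returns `"http://example.com/person/Ada%20Lovelace"`, while a record without an `ID` field returns `None` and contributes no triples.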
Predicate-Object Map
The Predicate-Object Map (rr:predicateObjectMap) associates one or more predicates and their corresponding objects with the subject. It contains:
- A Predicate Map (rr:predicateMap): Defines the property (predicate) IRI, often a constant like foaf:name.
- An Object Map (rr:objectMap), or a Referencing Object Map (an object map with rr:parentTriplesMap): Defines the value (object). The object can be:
  - A literal generated from a source column/field.
  - An IRI, created via a template.
  - A blank node.
  - The subject of another Triples Map, via a Referencing Object Map, which creates a relationship between two entities defined in different Triples Maps.
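One rule that is easy to miss: when a predicate-object map lists several predicate maps and several object maps, every combination is emitted. The Python sketch below shows that cross product, together with the default term types (a reference yields a literal, a template yields an IRI) and the skipping of values missing from a record. The dictionary encoding of object maps and the `apply_pom` name are hypothetical simplifications; template values are substituted with `str.format_map` and without the percent-encoding a real processor applies, and the output is minimal N-Triples.

```python
from itertools import product
from typing import List, Optional

FOAF = "http://xmlns.com/foaf/0.1/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def object_term(spec: dict, record: dict) -> Optional[str]:
    """Evaluate a simplified object map to an N-Triples term, or None."""
    if "reference" in spec:  # reference-valued: a plain literal by default
        value = record.get(spec["reference"])
        if value is None:
            return None
        text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{text}"'
    if "template" in spec:  # template-valued: an IRI by default
        try:
            return "<" + spec["template"].format_map(record) + ">"
        except KeyError:
            return None
    raise ValueError(f"unsupported object map: {spec}")


def apply_pom(subject: str, predicates: List[str], object_maps: List[dict], record: dict) -> List[str]:
    """Turn one predicate-object map into N-Triples lines for one record.

    Every predicate is paired with every generated object (a cross product);
    object maps whose value is missing from the record are skipped.
    """
    objects = [t for t in (object_term(m, record) for m in object_maps) if t is not None]
    return [f"<{subject}> <{p}> {o} ." for p, o in product(predicates, objects)]


record = {"ID": "1", "firstName": "Ada", "slug": "ada"}
subject = "http://example.com/person/1"
name_lines = apply_pom(subject, [FOAF + "name", RDFS_LABEL], [{"reference": "firstName"}], record)
page_lines = apply_pom(subject, [FOAF + "homepage"], [{"template": "http://example.com/home/{slug}"}], record)
```

Here `name_lines` contains two triples (one for `foaf:name`, one for `rdfs:label`) that share the literal `"Ada"`, and `page_lines` contains a single IRI-valued triple.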
Triples Map
The Triples Map (rr:TriplesMap) is the core container unit of an RML mapping. It binds together a Logical Source, a Subject Map, and zero or more Predicate-Object Maps (in practice, usually at least one). Each Triples Map defines a rule for generating a set of RDF triples (subject-predicate-object) from each logical record (row, object, element) in the source. A complete mapping document is composed of multiple, potentially interconnected Triples Maps.
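The loop a processor runs for each Triples Map can be sketched in Python as follows. The classes are hypothetical, the logical source is replaced by an in-memory list of records, and the subject template uses Python's `str.format_map` without IRI encoding. The point is the shape of the computation: one subject per record, one triple per predicate-object map, records without a subject skipped, and a result that is a set, so a repeated record adds nothing new.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class PredicateObjectMap:
    predicate: str  # constant predicate IRI
    reference: str  # record field whose value becomes a literal object


@dataclass
class TriplesMap:
    records: List[dict]    # stands in for the records of the logical source
    subject_template: str  # e.g. "http://example.com/person/{ID}"
    poms: List[PredicateObjectMap] = field(default_factory=list)


def run_triples_map(tm: TriplesMap) -> Set[Triple]:
    """Apply the Triples Map once per logical record and collect a set of triples."""
    graph: Set[Triple] = set()
    for record in tm.records:
        try:
            subject = tm.subject_template.format_map(record)
        except KeyError:
            continue  # no subject can be generated, so the record yields nothing
        for pom in tm.poms:
            value = record.get(pom.reference)
            if value is not None:
                # Literal objects are wrapped in quotes to tell them apart from IRIs.
                graph.add((subject, pom.predicate, f'"{value}"'))
    return graph


FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
person_map = TriplesMap(
    records=[{"ID": "1", "name": "Ada"}, {"ID": "2", "name": "Alan"}, {"ID": "1", "name": "Ada"}],
    subject_template="http://example.com/person/{ID}",
    poms=[PredicateObjectMap(FOAF_NAME, "name")],
)
graph = run_triples_map(person_map)
```

Three input records produce two triples, because the duplicated record maps to triples that are already in the graph.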
Term Map & Reference
Term Maps (rr:TermMap) are the abstract rules for generating RDF terms (subjects, predicates, objects). They specify the term type (IRI, blank node, or literal) and how the value is obtained: from a constant (rr:constant), a template (rr:template), or a reference into the source data. A key subtype is the Reference-valued Term Map, where the term's value is extracted from each record using a reference (rml:reference) that is evaluated according to the logical source's reference formulation. For example, in a JSON source, rml:reference might be "$.firstName", instructing the mapper to extract the value from that JSONPath location to create a literal object.
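A minimal Python sketch of evaluating a reference-valued term map follows. It resolves only simple dotted paths, so `"$.firstName"` and `"firstName"` behave the same; this is a hypothetical simplification of full JSONPath. The dictionary keys `reference`, `termType`, and `datatype` loosely mirror `rml:reference`, `rr:termType`, and `rr:datatype`. The sketch shows the R2RML defaulting rule: a reference in an object map yields a literal unless an IRI term type is requested, while in a subject or predicate position it yields an IRI.

```python
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


def resolve_reference(record: dict, reference: str) -> Optional[object]:
    """Resolve a simple dotted reference such as '$.firstName' or 'address.city'."""
    path = reference[2:] if reference.startswith("$.") else reference
    node = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def reference_term(term_map: dict, record: dict, position: str = "object") -> Optional[str]:
    """Evaluate a reference-valued term map to an N-Triples term, or None if no value."""
    value = resolve_reference(record, term_map["reference"])
    if value is None:
        return None  # a missing value generates no term, and so no triple
    # Default term type: literal for object maps, IRI for subject/predicate maps.
    default = "literal" if position == "object" else "iri"
    if term_map.get("termType", default) == "iri":
        return f"<{value}>"
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    datatype = term_map.get("datatype")
    return f'"{text}"^^<{datatype}>' if datatype else f'"{text}"'


record = {"firstName": "Ada", "age": 36, "homepage": "http://example.com/ada",
          "address": {"city": "London"}}
```

With this record, `{"reference": "$.firstName"}` yields the plain literal `"Ada"`, and adding `"datatype": XSD + "integer"` to a reference to `age` yields `"36"^^<http://www.w3.org/2001/XMLSchema#integer>`.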
Function Map & Data Transformations
Function Maps (fnml:functionValue) enable complex data transformations within the mapping process. They apply functions, described declaratively with the Function Ontology (FnO), to source values before they become RDF terms. Common use cases include:
- String manipulation: Concatenation, case conversion, substring extraction.
- Data type conversion: Casting strings to xsd:dateTime or xsd:integer.
- Mathematical operations: Calculations on numerical data.
- Conditional logic: Generating different values based on source data conditions.
This moves mapping beyond simple direct copying into the realm of data cleansing and enrichment.
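The following Python sketch imitates what a Function Map does: parameters are bound from references or constants, function maps can be nested to compose transformations, and the result is then turned into an RDF term, here a typed literal. The function identifiers such as `ex:toUpperCase` and the dictionary layout are hypothetical; in a real mapping the functions are described with FnO and the processor links those descriptions to executable implementations.

```python
from typing import Callable, Dict, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical function identifiers; real mappings point to FnO-described functions.
FUNCTIONS: Dict[str, Callable[..., object]] = {
    "ex:toUpperCase": lambda s: s.upper(),
    "ex:concat": lambda a, b, sep=" ": f"{a}{sep}{b}",
    "ex:toInteger": lambda s: int(s.strip()),
    "ex:ifEquals": lambda value, expected, then, otherwise: then if value == expected else otherwise,
}


def evaluate_function_map(fmap: dict, record: dict) -> Optional[object]:
    """Bind each parameter (reference, nested function map, or constant), then call."""
    kwargs = {}
    for name, binding in fmap["params"].items():
        if "reference" in binding:
            value = record.get(binding["reference"])
        elif "function" in binding:  # nesting composes transformations
            value = evaluate_function_map(binding, record)
        else:
            value = binding["constant"]
        if value is None:
            return None  # a missing input produces no term
        kwargs[name] = value
    return FUNCTIONS[fmap["function"]](**kwargs)


def as_literal(value: object, datatype: Optional[str] = None) -> str:
    """Wrap a function result as an N-Triples literal, optionally typed."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^<{datatype}>' if datatype else f'"{text}"'


record = {"first": "ada", "last": "Lovelace", "age": " 36 ", "country": "UK"}
name_map = {
    "function": "ex:concat",
    "params": {
        "a": {"function": "ex:toUpperCase", "params": {"s": {"reference": "first"}}},
        "b": {"reference": "last"},
    },
}
age_map = {"function": "ex:toInteger", "params": {"s": {"reference": "age"}}}
region_map = {
    "function": "ex:ifEquals",
    "params": {
        "value": {"reference": "country"},
        "expected": {"constant": "UK"},
        "then": {"constant": "Europe"},
        "otherwise": {"constant": "Other"},
    },
}
```

The three maps cover string manipulation (`name_map` gives `"ADA Lovelace"`), type conversion (`age_map` turns `" 36 "` into the integer `36`, ready for an `xsd:integer` literal), and conditional logic (`region_map` gives `"Europe"`).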
How RML Works: The Mapping Process
RML (RDF Mapping Language) is a declarative framework for transforming heterogeneous data formats into a unified RDF knowledge graph through a structured mapping process.
The RML mapping process begins with a mapping document, an RDF file written in Turtle syntax that defines rules for converting source data—like JSON, CSV, or XML—into RDF triples. Each rule specifies a logical source (the data file or database query), a subject map (how to generate the URI for each new entity), and predicate-object maps (how to assign properties and values, including links to other entities or literal data types). This declarative approach separates transformation logic from the underlying data, enabling reusable, maintainable integration pipelines.
An RML processor, or mapper, executes these rules. It parses the source data, iterates over its logical records (rows, objects, or elements), and instantiates the defined triples into a target RDF dataset or knowledge graph. The process supports complex operations like join conditions across multiple sources, value transformation with functions, and the generation of RDF-star annotations for provenance. By standardizing this transformation, RML enables the deterministic creation of a virtual or materialized knowledge graph from raw, siloed enterprise data.
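Join conditions let one Triples Map point at entities produced by another, even when the two come from different sources. Here is a Python sketch of that step for a referencing object map with a single join condition. The function and field names are hypothetical, subject templates use `str.format_map` without IRI encoding, and join values are compared as strings. That last choice is an assumption that lets a CSV string match a JSON number; how a given processor compares values is implementation-specific.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def join_references(
    child_records: List[dict],
    child_subject: str,
    parent_records: List[dict],
    parent_subject: str,
    predicate: str,
    child_field: str,
    parent_field: str,
) -> Set[Triple]:
    """Link child subjects to parent subjects wherever the join condition holds.

    Models a referencing object map (rr:parentTriplesMap) with a single
    rr:joinCondition (rr:child = child_field, rr:parent = parent_field).
    """
    # Index parent subjects by join value so each child needs only one lookup.
    # Values are compared as strings, so a CSV "10" matches a JSON 10.
    index: Dict[str, List[str]] = {}
    for parent in parent_records:
        key = parent.get(parent_field)
        if key is not None:
            index.setdefault(str(key), []).append(parent_subject.format_map(parent))
    triples: Set[Triple] = set()
    for child in child_records:
        key = child.get(child_field)
        if key is None:
            continue
        for parent_iri in index.get(str(key), []):
            triples.add((child_subject.format_map(child), predicate, parent_iri))
    return triples


employees = [  # e.g. rows from a CSV file
    {"id": "e1", "name": "Ada", "company": "10"},
    {"id": "e2", "name": "Alan", "company": "99"},
]
companies = [{"code": 10, "label": "Analytical Engines Ltd"}]  # e.g. objects from a JSON API
links = join_references(
    employees, "http://example.com/employee/{id}",
    companies, "http://example.com/company/{code}",
    "http://example.com/worksFor", "company", "code",
)
```

Only `e1` gets a `worksFor` link; `e2` references a company code that no parent record has, so it produces no triple rather than a dangling IRI.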
Primary Use Cases for RML
RML (RDF Mapping Language) is a widely used framework for declaratively mapping heterogeneous data formats, including JSON, CSV, XML, and relational databases, into a unified RDF knowledge graph. Its primary applications center on building semantic data fabrics and enabling deterministic data integration.
Building Enterprise Knowledge Graphs
RML is a foundational tool for constructing enterprise knowledge graphs from siloed operational data. It provides a declarative mapping language to define how records in source systems correspond to RDF triples (subject-predicate-object). This process, known as knowledge graph materialization, creates a persistent, queryable graph that serves as a single source of truth.
- Key Activity: Defining mappings from CSV customer files, JSON API responses, and XML product catalogs into a unified ontology.
- Outcome: Enables complex SPARQL queries across previously disconnected data domains.
Creating Virtual Knowledge Graphs
Instead of physically materializing all data, RML mappings can power a virtual knowledge graph (VKG). In this architecture, a query engine uses RML definitions to federate queries in real time across the original source systems. The graph is an on-demand, integrated view that avoids massive data duplication.
- Key Benefit: Provides immediate semantic integration for analytics without lengthy ETL processes.
- Technical Mechanism: The VKG engine uses the mappings to rewrite a SPARQL query into source-specific queries (e.g., SQL, MongoDB queries) and maps the results back to RDF.
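To illustrate the rewriting idea, here is a deliberately tiny Python sketch that answers a single triple pattern over a SQLite table. The `MAPPINGS` dictionary is a hypothetical stand-in for real mapping documents, and real VKG engines handle full SPARQL (joins, optional patterns, query optimization), which this does not. What it does show are the two halves of the mechanism: the triple pattern is translated into SQL, with a bound object pushed down as a `WHERE` filter, and the SQL rows are mapped back into triples without the graph ever being materialized.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical mapping: predicate IRI -> (table, key column, value column, subject IRI prefix).
MAPPINGS = {
    "http://xmlns.com/foaf/0.1/name": ("person", "id", "name", "http://example.com/person/"),
    "http://example.com/city": ("person", "id", "city", "http://example.com/person/"),
}


def rewrite(predicate: str, obj: Optional[str] = None) -> Tuple[str, tuple]:
    """Rewrite the pattern (?s <predicate> ?o), or one with a bound object, into SQL."""
    table, key_col, value_col, _ = MAPPINGS[predicate]
    # Identifiers come from the trusted mapping; only data values are parameters.
    sql = f"SELECT {key_col}, {value_col} FROM {table} WHERE {value_col} IS NOT NULL"
    params: tuple = ()
    if obj is not None:  # a bound object is pushed down as a SQL filter
        sql += f" AND {value_col} = ?"
        params = (obj,)
    return sql + f" ORDER BY {key_col}", params


def answer(conn: sqlite3.Connection, predicate: str,
           obj: Optional[str] = None) -> List[Tuple[str, str, str]]:
    """Run the rewritten query and map each SQL row back to an RDF triple."""
    sql, params = rewrite(predicate, obj)
    prefix = MAPPINGS[predicate][3]
    return [(f"{prefix}{key}", predicate, value) for key, value in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [(1, "Ada", "London"), (2, "Alan", None)])
```

Asking for everyone whose city is `"London"` becomes `SELECT id, city FROM person WHERE city IS NOT NULL AND city = ? ORDER BY id`, and the single matching row comes back as a triple about `http://example.com/person/1`. The `NULL` city of person 2 simply produces no triple.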
Semantic Data Fabric Implementation
RML provides the core mapping layer within a semantic data fabric. It operationalizes the fabric's semantic layer by providing the executable instructions that transform raw data into contextualized, ontology-aligned knowledge. This bridges the gap between physical data storage and business-centric conceptual models.
- Architectural Role: Sits within the semantic integration pipeline, often alongside tools for ontology alignment and entity resolution.
- Business Value: Delivers semantic interoperability, allowing different departments to access data with consistent, shared meaning.
Data Product Publication
Domain teams use RML to publish data products as linked data. By mapping a domain-owned dataset (e.g., a product catalog in a PostgreSQL database) to a shared enterprise ontology, the team exposes its data as a standards-compliant RDF graph. This graph can be consumed directly via SPARQL or integrated into the broader knowledge graph.
- Data Mesh Alignment: Enforces the data-as-a-product principle by providing a standardized interface (RDF/SPARQL) for domain data.
- Governance: Mappings document the exact relationship between source schema and target ontology, providing clear data lineage.
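Because the mapping is itself machine-readable, lineage questions can be answered by reading it rather than by reverse-engineering ETL code. The Python sketch below uses a hypothetical, flattened dictionary in place of a parsed RML document (in practice the Turtle mapping would be loaded with an RDF library and queried) and reports which source field feeds each ontology property. The source names and schema.org properties are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical, flattened view of a parsed mapping document:
# Triples Map name -> its logical source and (source field, predicate) pairs.
MAPPING: Dict[str, dict] = {
    "ProductMap": {
        "source": "postgres://catalog/products",
        "poms": [("name", "http://schema.org/name"), ("price_eur", "http://schema.org/price")],
    },
    "SupplierMap": {
        "source": "suppliers.csv",
        "poms": [("legal_name", "http://schema.org/legalName")],
    },
}


def lineage(mapping: Dict[str, dict]) -> List[Tuple[str, str, str, str]]:
    """Return sorted (predicate, triples map, source, field) rows for every object map."""
    rows = [
        (predicate, tm_name, tm["source"], field_name)
        for tm_name, tm in mapping.items()
        for field_name, predicate in tm["poms"]
    ]
    return sorted(rows)


def trace(mapping: Dict[str, dict], predicate: str) -> List[str]:
    """Answer 'which source field feeds this ontology property?'"""
    return [f"{src}#{fld} via {tm}" for p, tm, src, fld in lineage(mapping) if p == predicate]
```

Calling `trace(MAPPING, "http://schema.org/price")` returns `["postgres://catalog/products#price_eur via ProductMap"]`, which is the kind of answer a data steward needs when a published price looks wrong.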
Enabling Graph-Based RAG
RML supplies deterministic factual grounding for Retrieval-Augmented Generation (RAG) systems. By transforming enterprise data into a knowledge graph, RML creates a verifiable source of facts. A Graph RAG architecture can then traverse this graph to retrieve connected subgraphs as context for a large language model, which can substantially reduce hallucinations.
- Process: Raw documents and databases → RML → Knowledge Graph → Graph Retrieval → LLM Context.
- Advantage over Vector Search: Provides explicit relationships and logical provenance for retrieved information.
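The "Graph Retrieval → LLM Context" step can be as simple as collecting the neighbourhood of an entity. This Python sketch is one hypothetical way to do it: a breadth-first walk over an in-memory list of triples up to a hop limit, followed by rendering the triples as prompt text. Production Graph RAG systems would typically query a triple store, rank or prune the subgraph, and index triples by node instead of scanning the whole list at every step as done here.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("ex:ada", "ex:name", '"Ada"'),
    ("ex:ada", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:london"),
    ("ex:london", "ex:country", "ex:uk"),
    ("ex:bob", "ex:worksFor", "ex:other"),
]


def k_hop_subgraph(triples: Iterable[Triple], seed: str, hops: int = 2) -> List[Triple]:
    """Collect the triples within `hops` edges of `seed`, following edges both ways."""
    triples = list(triples)
    visited: Set[str] = {seed}
    frontier = deque([(seed, 0)])
    selected: Set[Triple] = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for s, p, o in triples:  # linear scan; a real store would use an index
            if node != s and node != o:
                continue
            selected.add((s, p, o))
            neighbour = o if s == node else s
            # Do not expand from literals: they are values, not entities.
            if neighbour not in visited and not neighbour.startswith('"'):
                visited.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return sorted(selected)


def to_context(subgraph: List[Triple]) -> str:
    """Render triples as plain lines to place in an LLM prompt."""
    return "\n".join(f"{s} {p} {o}." for s, p, o in subgraph)
```

With two hops from `ex:ada`, the context contains Ada's name, her employer, and the employer's location, but not the country (three hops away) and nothing about `ex:bob`. Every line in the prompt is an explicit edge that can be traced back to a mapped source record.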
Legacy System Modernization
RML acts as a modernization wrapper for legacy systems (mainframes, old SQL databases) by exposing their data as linked data. Organizations can thus integrate decades-old data into modern AI and analytics platforms without replacing the core transactional systems. The mappings serve as a declarative abstraction layer.
- Typical Source: Relational databases mapped using RML's basis in the R2RML standard.
- Strategic Outcome: Unlocks legacy data for use in digital twins, advanced analytics, and compliance reporting without disruptive migration.
RML vs. R2RML: A Technical Comparison
A feature-by-feature comparison of the RDF Mapping Language (RML) and its foundational standard, R2RML, highlighting their capabilities for semantic data integration.
| Feature / Specification | R2RML (W3C Standard) | RML (Community Specification) |
|---|---|---|
| Primary Scope & Data Source | Relational Databases (RDBMS) | Multiple Heterogeneous Sources (CSV, JSON, XML, RDBMS, APIs) |
| Core Standardization Body | World Wide Web Consortium (W3C) Recommendation | Community-driven specification by the RML Community |
| Mapping Definition Target | RDF Dataset (Graph / Named Graph) | RDF Dataset (Graph / Named Graph) |
| Logical Table / Source Definition | SQL query or base table name | Reference to data source + iterator (e.g., JSONPath, XPath, SQL) |
| Support for Nested Data Structures | No (tabular data only) | Yes (via iterators and references such as JSONPath or XPath) |
| Data Transformation Functions | None built in (transformations are written in SQL via rr:sqlQuery) | Via the FNML extension, using FnO-described functions (string, numeric, date operations) |
| Referencing Object Maps (Foreign Keys) | rr:parentTriplesMap with optional rr:joinCondition, within one database | Same mechanism; join conditions can link Triples Maps over different sources |
| Specification of Data Source Format & Access | Outside the mapping (the processor supplies the database connection, e.g., via JDBC) | Explicit via logical source declarations (rml:source, rml:referenceFormulation) |
| Primary Use Case | Lifting relational enterprise data to RDF | Building a semantic data fabric from diverse, modern data sources |
Frequently Asked Questions
RML (RDF Mapping Language) is a core standard for mapping heterogeneous data into a unified knowledge graph. These FAQs address its purpose, mechanics, and role in enterprise semantic architectures.
RML (RDF Mapping Language) is a declarative, rule-based language for defining mappings from diverse, heterogeneous data sources—including JSON, CSV, XML, and relational databases—into the RDF (Resource Description Framework) data model. It works by specifying mapping rules that define how each field or element in a source document corresponds to RDF triples (subject-predicate-object). A processor engine then executes these rules to transform the raw source data into a structured knowledge graph, creating globally unique identifiers (URIs) for entities and linking them via properties defined in an ontology. RML is an extension of the W3C standard R2RML, generalized to support non-relational data formats.
Related Terms
RML operates within a broader ecosystem of technologies and methodologies for building semantic data fabrics. These related concepts define the standards, systems, and processes that enable the transformation of raw data into interconnected, meaningful knowledge.
Semantic Integration
Semantic integration is the overarching process of combining data from disparate sources by resolving schematic and data-level conflicts. It uses shared ontologies and semantic mappings (like those defined in RML) to create a unified, meaningful view.
- Goal: Achieve semantic interoperability, where exchanged data has unambiguous, shared meaning.
- Role of Mappings: RML mappings are the executable specifications that drive this integration, transforming raw data into a coherent knowledge graph.
Virtual Knowledge Graph (VKG)
A Virtual Knowledge Graph is a system that provides a unified, graph-based view over heterogeneous data sources in real-time using mapping definitions, without requiring the physical materialization of the entire graph.
- On-Demand Access: Data is transformed into RDF at query time via mappings (RML/R2RML).
- Key Benefit: Eliminates the latency and storage overhead of ETL, offering a live, integrated view.
- Architecture: The VKG engine uses RML documents to understand how to query and map each underlying source.
Ontology
An ontology is a formal, explicit specification of a shared conceptualization. It defines the classes, properties, and relationships (the vocabulary) for a particular domain, providing the semantic schema for a knowledge graph.
- Mapping Target: RML mappings transform data into instances (individuals) of the classes and properties defined in a target ontology (e.g., one expressed in OWL or RDFS).
- Semantic Context: The ontology gives meaning to the generated RDF triples, ensuring all data conforms to a consistent, logical model.
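In practice this conformance has to be checked: an RML processor emits whatever IRIs the mapping names, including a misspelled property. The Python sketch below performs the simplest version of such a check. The ontology is reduced to hypothetical sets of class and property IRIs; a real pipeline would read them from the OWL or RDFS files and might go further with SHACL validation.

```python
from typing import Iterable, List, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = Tuple[str, str, str]

# Hypothetical ontology vocabulary.
ONTOLOGY_CLASSES = {"http://example.com/onto#Customer"}
ONTOLOGY_PROPERTIES = {"http://example.com/onto#name"}


def vocabulary_violations(
    triples: Iterable[Triple], classes: Set[str], properties: Set[str]
) -> List[str]:
    """Report predicates and rdf:type classes that the ontology does not declare."""
    problems = []
    for s, p, o in triples:
        if p == RDF_TYPE:
            if o not in classes:
                problems.append(f"undeclared class {o} on {s}")
        elif p not in properties:
            problems.append(f"undeclared property {p} on {s}")
    return problems


generated = [
    ("http://example.com/customer/1", RDF_TYPE, "http://example.com/onto#Customer"),
    ("http://example.com/customer/1", "http://example.com/onto#name", '"Ada"'),
    ("http://example.com/customer/1", "http://example.com/onto#nmae", '"Ada"'),  # typo in a mapping
]
```

Running `vocabulary_violations(generated, ONTOLOGY_CLASSES, ONTOLOGY_PROPERTIES)` flags only the `onto#nmae` triple, pointing straight at the mapping rule that needs fixing.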
Semantic Pipeline
A semantic pipeline is an automated workflow that ingests, transforms, enriches, and integrates raw data into a knowledge graph. RML is a core component within such pipelines, responsible for the declarative transformation and mapping stage.
- Typical Stages: Ingestion → Cleaning → Mapping (RML) → Entity Linking/Lifting → Materialization/Storage.
- Orchestration: RML processors (like the RMLMapper or SDM-RDFizer) are executed as steps within larger data pipeline frameworks (e.g., Apache Airflow, Nextflow).

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. For more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.