Inferensys

Glossary

RML

RML (RDF Mapping Language) is a generic framework and language for defining mappings from heterogeneous data structures (JSON, CSV, XML) to the RDF data model, enabling semantic data integration.
SEMANTIC DATA FABRIC

What is RML?

RML (RDF Mapping Language) is a declarative language and framework for mapping heterogeneous data sources—including JSON, CSV, XML, and relational databases—directly into an RDF knowledge graph.

RML is a generic extension of the W3C standard R2RML, designed to handle non-relational data structures. It uses mapping rules defined in Turtle syntax to specify how source data fields are transformed into RDF triples, linking them to classes and properties in a target ontology. This enables the creation of a virtual or materialized knowledge graph from raw, disparate data without manual conversion, forming the core of a semantic data fabric.

The language operates through logical sources, subject maps, and predicate-object maps to generate unique URIs, literals, and typed relationships. By providing a standardized, reusable mapping definition, RML automates semantic integration pipelines, ensuring consistent data provenance and enabling federated queries across originally siloed systems. It is a foundational tool for building enterprise knowledge graphs that serve as a deterministic layer for retrieval-augmented generation (RAG) and other reasoning systems.

Core Components of an RML Mapping

An RML mapping document is a declarative specification that defines how data from heterogeneous sources is transformed into RDF. It consists of several key, interrelated components that work together to produce a deterministic graph.

01

Logical Source

The Logical Source (rml:logicalSource) defines the raw data to be processed. It specifies the source itself (rml:source, e.g., a file path, database query, or API endpoint), the reference formulation (rml:referenceFormulation) that dictates how values are addressed within the data structure (e.g., JSONPath for JSON, XPath for XML, or column names for CSV), and, for nested formats, an iterator (rml:iterator) that selects the logical records to loop over. This component abstracts the physical data location and format, making the mapping reusable across different data retrieval scenarios.
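
As a rough illustration of what a logical source abstracts, the Python sketch below (standard library only) yields logical records from a CSV text and from a JSON text through one interface. The function name `iter_records`, the labels "csv" and "json", and the `iterator_key` parameter are assumptions made for this example; a real RML processor would evaluate full JSONPath or XPath iterators rather than a single top-level key.

```python
import csv
import io
import json


def iter_records(text, formulation, iterator_key=None):
    """Yield one dict per logical record.

    formulation: "csv" or "json" (hypothetical stand-ins for reference
    formulations such as ql:CSV and ql:JSONPath).
    iterator_key: for JSON, the top-level key holding the record array
    (a simplified stand-in for an iterator like "$.people[*]").
    """
    if formulation == "csv":
        # Each row is a record; column names act as references.
        yield from csv.DictReader(io.StringIO(text))
    elif formulation == "json":
        data = json.loads(text)
        yield from (data[iterator_key] if iterator_key else data)
    else:
        raise ValueError(f"unsupported formulation: {formulation}")


CSV_TEXT = "ID,name\n1,Ada\n2,Alan\n"
JSON_TEXT = '{"people": [{"ID": "1", "name": "Ada"}, {"ID": "2", "name": "Alan"}]}'

csv_records = [dict(r) for r in iter_records(CSV_TEXT, "csv")]
json_records = list(iter_records(JSON_TEXT, "json", iterator_key="people"))
```

Both calls produce the same list of records, which is the point of the abstraction: the rest of a mapping does not need to know which format the data came from.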

02

Subject Map

A Subject Map (rr:subjectMap) defines the rule for generating the subject of each RDF triple. It is the central node to which all predicates and objects are attached. The subject is typically created using a template (rr:template) that concatenates a base IRI with values from the source data (e.g., http://example.com/person/{ID}). Alternatively, it can be a constant IRI or a blank node. Each Triples Map must have exactly one Subject Map.
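
To make the template mechanism concrete, here is a minimal Python sketch of template expansion. The helper name `expand_template` is hypothetical. It percent-encodes inserted values so that the result stays a valid IRI, and it returns None when a referenced field is missing, mirroring the R2RML rule that a term map generates no term for a NULL value. As a simplification, non-ASCII characters are also percent-encoded here.

```python
import re
from urllib.parse import quote

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand_template(template, record):
    """Fill each {field} placeholder with the percent-encoded record value.

    Returns None if any referenced field is absent or None, in which case
    no subject (and therefore no triple) would be generated.
    """
    fields = PLACEHOLDER.findall(template)
    if any(record.get(field) is None for field in fields):
        return None
    return PLACEHOLDER.sub(
        lambda m: quote(str(record[m.group(1)]), safe=""), template
    )


subject = expand_template("http://example.com/person/{ID}", {"ID": "42"})
```

Here `subject` is `http://example.com/person/42`; a value such as `a b/c` would be inserted as `a%20b%2Fc`.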

03

Predicate-Object Map

The Predicate-Object Map (rr:predicateObjectMap) associates one or more predicates and their corresponding objects to the subject. It contains:

  • A Predicate Map (rr:predicateMap): Defines the property (predicate) IRI, often a constant like foaf:name.
  • An Object Map (rr:objectMap), or a Referencing Object Map that points to another Triples Map via rr:parentTriplesMap: Defines the value (object). The object can be:
    • A literal generated from a source column/field.
    • An IRI, created via a template.
    • A blank node.
    • Another subject, via a Referencing Object Map, which creates a relationship between two entities defined in different Triples Maps.
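
The difference between a literal object and a referencing object can be sketched in a few lines of Python. Everything named here (`referencing_objects`, the sample records, the `http://example.com/ns#worksIn` property) is invented for illustration, and `str.format` is used as a shortcut for template expansion without IRI-safe encoding.

```python
def referencing_objects(child, parents, child_field, parent_field, parent_template):
    """Return the subject IRIs of all parent records that join with child."""
    key = child.get(child_field)
    if key is None:
        return []
    return [
        parent_template.format(**parent)
        for parent in parents
        if parent.get(parent_field) == key
    ]


employees = [{"id": "1", "name": "Ada", "dept": "D1"}]
departments = [
    {"dept_id": "D1", "title": "Research"},
    {"dept_id": "D2", "title": "Ops"},
]

triples = []
for emp in employees:
    subject = "http://example.com/person/{id}".format(**emp)
    # Literal object taken directly from a source field.
    triples.append((subject, "http://xmlns.com/foaf/0.1/name", emp["name"]))
    # IRI objects produced by joining with the parent records.
    for dept_iri in referencing_objects(
        emp, departments, "dept", "dept_id", "http://example.com/dept/{dept_id}"
    ):
        triples.append((subject, "http://example.com/ns#worksIn", dept_iri))
```

The join plays the role of a join condition between a child and a parent Triples Map: the object of the second triple is the subject that the parent map would generate for the matching department.
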
04

Triples Map

The Triples Map (rr:TriplesMap) is the core container unit of an RML mapping. It binds together a Logical Source, exactly one Subject Map, and zero or more Predicate-Object Maps. Each Triples Map defines a rule for generating a set of RDF triples (subject-predicate-object) from each logical record (row, object, element) in the source. A complete mapping document is composed of multiple, potentially interconnected Triples Maps.

05

Term Map & Reference

Term Maps (rr:TermMap) are the abstract rules for generating RDF terms (subjects, predicates, objects). They specify the term type (IRI, blank node, or literal) and its value. A key subtype is a Reference-valued Term Map, where the term's value is dynamically generated from the source data using a reference (rml:reference). For example, in a JSON source, rml:reference might be "$.firstName", instructing the mapper to extract the value from that JSONPath location to create a literal object.
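
A reference-valued term map can be pictured as a small lookup against the current record. The sketch below handles only the dotted `$.key.key` subset of JSONPath; the function name `resolve_reference` is an assumption for this example, and a real processor would delegate to a complete JSONPath (or XPath) implementation.

```python
def resolve_reference(record, reference):
    """Resolve a dotted reference such as "$.address.city" against a dict.

    Returns None when any step of the path is missing, in which case no
    term would be generated for this record.
    """
    path = reference[2:] if reference.startswith("$.") else reference
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


person = {"firstName": "Ada", "address": {"city": "London"}}
first_name = resolve_reference(person, "$.firstName")
city = resolve_reference(person, "$.address.city")
```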

06

Function Map & Data Transformations

Function Maps enable complex data transformations within the mapping process. Declared through the fnml:functionValue property, they apply functions described with the Function Ontology (FnO) to source values before those values become RDF terms. Common use cases include:

  • String manipulation: Concatenation, case conversion, substring extraction.
  • Data type conversion: Casting strings to xsd:dateTime or xsd:integer.
  • Mathematical operations: Calculations on numerical data.
  • Conditional logic: Generating different values based on source data conditions.

This moves mapping beyond simple direct copying into the realm of data cleansing and enrichment.
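
The effect of such transformations can be sketched in plain Python. The registry below is hypothetical: the function names are illustrative labels, not FnO IRIs, and literal escaping is omitted for brevity.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical function registry standing in for FnO-described functions.
FUNCTIONS = {
    "trim": lambda value: value.strip(),
    "to_upper": lambda value: value.upper(),
    "to_integer": lambda value: int(value.strip()),
}


def apply_functions(value, names):
    """Apply the named functions to a source value, left to right."""
    for name in names:
        value = FUNCTIONS[name](value)
    return value


def to_literal(value):
    """Serialize a value as a literal; integers get an xsd:integer datatype."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    return f'"{value}"'


def status_label(record):
    """Conditional logic: derive a value from a condition on the source data."""
    return "minor" if int(record["age"]) < 18 else "adult"


name_literal = to_literal(apply_functions("  ada ", ["trim", "to_upper"]))
age_literal = to_literal(apply_functions(" 42 ", ["to_integer"]))
```

In a mapping, each of these steps would be declared rather than coded, so the transformation logic stays alongside the rest of the mapping rules.
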

How RML Works: The Mapping Process

RML (RDF Mapping Language) is a declarative framework for transforming heterogeneous data formats into a unified RDF knowledge graph through a structured mapping process.

The RML mapping process begins with a mapping document, an RDF file written in Turtle syntax that defines rules for converting source data—like JSON, CSV, or XML—into RDF triples. Each rule specifies a logical source (the data file or database query), a subject map (how to generate the URI for each new entity), and predicate-object maps (how to assign properties and values, including links to other entities or literal data types). This declarative approach separates transformation logic from the underlying data, enabling reusable, maintainable integration pipelines.

An RML processor, or mapper, executes these rules. It parses the source data, iterates over its logical records (rows, objects, or elements), and instantiates the defined triples into a target RDF dataset or knowledge graph. The process supports complex operations such as join conditions across multiple sources, value transformation with functions, and, with the RML-star extension, the generation of RDF-star annotations for provenance. By standardizing this transformation, RML enables the deterministic creation of a virtual or materialized knowledge graph from raw, siloed enterprise data.
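
The following toy example walks through that loop end to end for a single rule. It is a sketch, not an RML processor: the `MAPPING_TTL` string shows roughly what the rule could look like in Turtle using the original RML vocabulary and is included for orientation only, while `run_mapping` hard-codes the same rule in Python instead of parsing the mapping. The file name `people.csv`, the `example.com` IRIs, and the use of FOAF are assumptions.

```python
import csv
import io

# For orientation only; the code below does not parse this mapping.
MAPPING_TTL = """
@base <http://example.com/mapping/> .
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:   <http://semweb.mmlab.be/ns/ql#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<#PersonMap> a rr:TriplesMap ;
  rml:logicalSource [ rml:source "people.csv" ;
                      rml:referenceFormulation ql:CSV ] ;
  rr:subjectMap [ rr:template "http://example.com/person/{ID}" ;
                  rr:class foaf:Person ] ;
  rr:predicateObjectMap [ rr:predicate foaf:name ;
                          rr:objectMap [ rml:reference "name" ] ] .
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


def run_mapping(csv_text):
    """Apply the hard-coded rule to each CSV row and return N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Subject map: template expansion (no IRI-safe encoding in this toy).
        subject = f"<http://example.com/person/{row['ID']}>"
        # rr:class yields an rdf:type triple.
        lines.append(f"{subject} <{RDF_TYPE}> <{FOAF}Person> .")
        # Predicate-object map: literal from the "name" column, with
        # backslashes and double quotes escaped.
        name = row["name"].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{FOAF}name> "{name}" .')
    return lines


ntriples = run_mapping("ID,name\n1,Ada\n2,Alan\n")
```

Each input row produces two triples, one from the class declaration in the subject map and one from the predicate-object map.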

RDF MAPPING LANGUAGE

Primary Use Cases for RML

RML (RDF Mapping Language) is a widely used, community-specified framework for declaratively mapping heterogeneous data formats (including JSON, CSV, XML, and relational databases) into a unified RDF knowledge graph. Its primary applications center on building semantic data fabrics and enabling deterministic data integration.

01

Building Enterprise Knowledge Graphs

RML is a foundational tool for constructing enterprise knowledge graphs from siloed operational data. It provides a declarative mapping language to define how records in source systems correspond to RDF triples (subject-predicate-object). This process, known as knowledge graph materialization, creates a persistent, queryable graph that can serve as a single source of truth.

  • Key Activity: Defining mappings from CSV customer files, JSON API responses, and XML product catalogs into a unified ontology.
  • Outcome: Enables complex SPARQL queries across previously disconnected data domains.
02

Creating Virtual Knowledge Graphs

Instead of physically materializing all data, RML mappings can power a virtual knowledge graph (VKG). In this architecture, a query engine uses RML definitions to federate queries in real time across the original source systems. The graph is an on-demand, integrated view that avoids large-scale data duplication.

  • Key Benefit: Provides immediate semantic integration for analytics without lengthy ETL processes.
  • Technical Mechanism: An RML processor acts as a virtual RDFizer, translating a SPARQL query into source-specific queries (e.g., SQL, MongoDB queries) and mapping the results back to RDF.
03

Semantic Data Fabric Implementation

RML provides the core mapping layer within a semantic data fabric. It operationalizes the fabric's semantic layer by supplying the executable instructions that transform raw data into contextualized, ontology-aligned knowledge. This bridges the gap between physical data storage and business-centric conceptual models.

  • Architectural Role: Sits within the semantic integration pipeline, often alongside tools for ontology alignment and entity resolution.
  • Business Value: Delivers semantic interoperability, allowing different departments to access data with consistent, shared meaning.
04

Data Product Publication

Domain teams use RML to publish data products as linked data. By mapping a domain-owned dataset (e.g., a product catalog in a PostgreSQL database) to a shared enterprise ontology, the team exposes its data as a standards-compliant RDF graph. This graph can be consumed directly via SPARQL or integrated into the broader knowledge graph.

  • Data Mesh Alignment: Enforces the data-as-a-product principle by providing a standardized interface (RDF/SPARQL) for domain data.
  • Governance: Mappings document the exact relationship between source schema and target ontology, providing clear data lineage.
05

Enabling Graph-Based RAG

RML supplies deterministic factual grounding for Retrieval-Augmented Generation (RAG) systems. By transforming enterprise data into a knowledge graph, RML creates a verifiable source of facts. A Graph RAG architecture can then traverse this graph to retrieve connected subgraphs as context for a large language model, which can substantially reduce hallucinations.

  • Process: Raw documents and databases → RML → Knowledge Graph → Graph Retrieval → LLM Context.
  • Advantage over Vector Search: Provides explicit relationships and logical provenance for retrieved information.
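
The graph retrieval step in that pipeline can be approximated with a breadth-first walk over the generated triples. The sketch below is one simple way to collect a connected subgraph around an entity; the function name `neighborhood`, the `ex:` identifiers, and the hop limit are assumptions, and production Graph RAG systems typically use a graph store and more selective traversal.

```python
from collections import deque


def neighborhood(triples, start, max_hops=1):
    """Return the triples within max_hops of start, following edges both ways."""
    seen = {start}
    frontier = deque([(start, 0)])
    result = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for s, p, o in triples:
            if node in (s, o):
                if (s, p, o) not in result:
                    result.append((s, p, o))
                other = o if s == node else s
                if other not in seen:
                    seen.add(other)
                    frontier.append((other, depth + 1))
    return result


GRAPH = [
    ("ex:ada", "ex:worksIn", "ex:research"),
    ("ex:research", "ex:locatedIn", "ex:london"),
    ("ex:alan", "ex:worksIn", "ex:ops"),
]

context = neighborhood(GRAPH, "ex:ada", max_hops=2)
```

The returned triples carry their predicates with them, so the context passed to the language model states explicitly how each fact relates to the starting entity.
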
06

Legacy System Modernization

RML acts as a modernization wrapper for legacy systems (mainframes, old SQL databases) by exposing their data as linked data. Organizations can thus integrate decades-old data into modern AI and analytics platforms without replacing the core transactional systems. The mappings serve as a declarative abstraction layer.

  • Typical Source: Relational databases mapped using RML's basis in the R2RML standard.
  • Strategic Outcome: Unlocks legacy data for use in digital twins, advanced analytics, and compliance reporting without disruptive migration.
MAPPING LANGUAGE STANDARDS

RML vs. R2RML: A Technical Comparison

A feature-by-feature comparison of the RDF Mapping Language (RML) and its foundational standard, R2RML, highlighting their capabilities for semantic data integration.

| Feature / Specification | R2RML (W3C Standard) | RML (Community Specification) |
|---|---|---|
| Primary Scope & Data Source | Relational Databases (RDBMS) | Multiple Heterogeneous Sources (CSV, JSON, XML, RDBMS, APIs) |
| Core Standardization Body | World Wide Web Consortium (W3C) Recommendation | Community-driven specification by the RML Community |
| Mapping Definition Target | RDF Dataset (Graph / Named Graph) | RDF Dataset (Graph / Named Graph) |
| Logical Table Definition | SQL query or base table name | Reference to data source + iterator (e.g., JSONPath, XPath, SQL) |
| Support for Nested Data Structures | No | Yes |
| Built-in Function Library for Data Transformation | Limited (transformations expressed in the SQL query of the logical table) | Extended (via FnO - Function Ontology for string, numeric, date ops) |
| Referencing Object Maps (Foreign Keys) | Via parent triples map (rr:parentTriplesMap) | Via join condition with parent triples map |
| Specification of Data Source Format & Access | Implicit, via the database connection (e.g., JDBC) | Explicit via logical source declarations (rml:source, rml:referenceFormulation) |
| Primary Use Case | Lifting relational enterprise data to RDF | Building a semantic data fabric from diverse, modern data sources |

Frequently Asked Questions

RML (RDF Mapping Language) is a community specification for mapping heterogeneous data into a unified knowledge graph. These FAQs address its purpose, mechanics, and role in enterprise semantic architectures.

RML (RDF Mapping Language) is a declarative, rule-based language for defining mappings from diverse, heterogeneous data sources—including JSON, CSV, XML, and relational databases—into the RDF (Resource Description Framework) data model. It works by specifying mapping rules that define how each field or element in a source document corresponds to RDF triples (subject-predicate-object). A processor engine then executes these rules to transform the raw source data into a structured knowledge graph, creating globally unique identifiers (URIs) for entities and linking them via properties defined in an ontology. RML is an extension of the W3C standard R2RML, generalized to support non-relational data formats.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.