Unofficial Draft
Copyright © 2021-2022 the document editors/authors. Text is available under the Creative Commons Attribution 4.0 International Public License; additional terms may apply.
This document describes Logical Source and Logical Target to access data sources and targets.
A Logical Source is a formal model and common representation for describing access to data sources. A Logical Target is a formal model and a common representation for specifying how a Knowledge Graph should be exported to a given target.
Logical Source and Logical Target reuse existing data access descriptions and are therefore not limited to a specific set of targets or data sources. The current document describes the Logical Source and Logical Target concepts through definitions and examples.
The version of this document is v0.1.
This document is a draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organization.
This is an early draft, yet efforts are made to keep things stable.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, and MUST NOT in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This section is non-normative.
This document specifies Logical Source and Logical Target. A Logical Source is a description specifying how a data source should be accessed. A Logical Source description is not limited to a specific Source, which allows accessing any type of Source, and it provides a reference formulation to refer to data inside the Source.
A Logical Target is a description defining how a generated RDF [RDF-Concepts] knowledge graph must be exported. A Logical Target description is not tailored towards a specific Target, which allows exporting the generated RDF triples to any type of Target, and it provides fine-grained control over where each RDF triple is exported.
Logical Source and Logical Target leverage existing data access descriptions such as DCAT [DCAT], VoID [VoID], and SD [SD].
In this document, examples assume the following namespace prefix bindings unless otherwise stated:
The examples are contained in pink colored boxes:
# This box contains the example's Logical Source description.
# This box contains the example's Logical Target description.
The Logical Source vocabulary namespace is http://semweb.mmlab.be/ns/rml-source# and its prefix is rml.
The Logical Source vocabulary consists of 2 classes:
- rml:LogicalSource describes how data of a source can be referenced.
- rml:Source describes how a source can be accessed; it is part of an rml:LogicalSource.
A Logical Source is any data source providing data to be mapped to RDF triples.
A Logical Source (rml:LogicalSource) MUST contain the following properties:
- rml:source specifies how a source is accessed through an rml:Source.
- rml:referenceFormulation defines the reference formulation used to refer to the elements of a data source. The reference formulation must be specified in the case of databases, CSV, TSV, XML, and JSON data sources. The defaults are rr:SQL2008 for databases, ql:CSV for CSV and TSV data sources, XPath for XML, and JSONPath for JSON and JSONL data sources.
The following properties MAY be specified in a Logical Source:
- rml:iterator defines the iteration loop used to map the data of the input source. If not specified, the iterator is considered a "row".
The Logical Source definition requires only the source (rml:source) to be specified; all other properties are optional.
If a property is specified, it MUST NOT be specified multiple times.
Property | Domain | Range |
---|---|---|
rml:source | rml:LogicalSource | Source |
rml:referenceFormulation | rml:LogicalSource | ql:ReferenceFormulation |
rml:iterator | rml:LogicalSource | Literal |
Each Logical Source has a reference formulation to define how to refer to elements of the data of the input source.
Several reference formulations (rml:ReferenceFormulation) are defined in this specification:
- rr:SQL2008: SQL 2008 standard for relational databases
- ql:CSV: CSV or TSV data sources
- ql:JSONPath: JSON documents
- ql:XPath: XML documents, a shortcut for ql:XPathReferenceFormulation with default parameters
- ql:XPathReferenceFormulation: XML documents, optionally with the definition of XML namespaces used in references. By default, no namespaces are defined.
ql:XPathReferenceFormulation may specify zero or more ql:namespace properties with a ql:Namespace.
A ql:Namespace contains the following required properties:
- ql:namespacePrefix: A Literal with the prefix used for the XML namespace.
- ql:namespaceURL: A Literal with the URL identifying the XML namespace.
@prefix dcat: <http://www.w3.org/ns/dcat#> .
<#XMLNamespace> a rml:LogicalSource;
  rml:source [ a rml:Source;
    rml:access [ a dcat:Dataset;
      dcat:distribution [ a dcat:Distribution;
        dcat:accessURL <file:///path/to/data.xml>;
      ];
    ];
  ];
  rml:referenceFormulation [ a ql:XPathReferenceFormulation;
    ql:namespace [ a ql:Namespace;
      ql:namespacePrefix "ex";
      ql:namespaceURL "http://example.org";
    ];
  ];
  rml:iterator "/xpath/ex:namespace/expression";
.
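Because ql:XPathReferenceFormulation may carry zero or more ql:namespace properties, a reference formulation that declares two XML namespaces could look like the sketch below. The prefixes a and b, their namespace URLs, the file path, the iterator expression, and the @base IRI are hypothetical, and the ql: prefix is assumed to be bound to http://semweb.mmlab.be/ns/ql#.

```turtle
# Hypothetical base IRI so that <#...> resolves when parsed standalone.
@base <http://example.org/mapping> .
@prefix rml:  <http://semweb.mmlab.be/ns/rml-source#> .
@prefix ql:   <http://semweb.mmlab.be/ns/ql#> .          # assumed binding
@prefix dcat: <http://www.w3.org/ns/dcat#> .

<#XMLTwoNamespaces> a rml:LogicalSource;
  rml:source [ a rml:Source;
    rml:access [ a dcat:Dataset;
      dcat:distribution [ a dcat:Distribution;
        dcat:accessURL <file:///path/to/data.xml>;
      ];
    ];
  ];
  rml:referenceFormulation [ a ql:XPathReferenceFormulation;
    # Two ql:Namespace nodes, each with both required properties.
    ql:namespace
      [ a ql:Namespace;
        ql:namespacePrefix "a";
        ql:namespaceURL "http://example.org/a";
      ],
      [ a ql:Namespace;
        ql:namespacePrefix "b";
        ql:namespaceURL "http://example.org/b";
      ];
  ];
  # The iterator uses both declared prefixes.
  rml:iterator "/a:catalog/b:item";
.
```

Each prefix used in the iterator has to match a ql:namespacePrefix declared on the reference formulation, since no namespaces are defined by default.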
SQL databases require a SQL query to be performed to retrieve a table or view
from the database. This is specified through the rr:SQL2008
reference formulation from the W3C R2RML recommendation.
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
<#SQLDatabase> a rml:LogicalSource;
  rml:source [ a rml:Source;
    rml:access [ a d2rq:Database;
      d2rq:jdbcDSN "jdbc:mysql://localhost/example";
      d2rq:jdbcDriver "com.mysql.jdbc.Driver";
      d2rq:username "user";
      d2rq:password "password";
    ];
  ];
  rml:referenceFormulation rr:SQL2008;
  rml:query "SELECT name FROM student;"
.
Tabular data are widely used and described by existing standards such as the W3C CSVW recommendation. Such data can be referred to by column name through the ql:CSV reference formulation.
In the following example, a CSV file is accessed, but the CSV reference formulation is not limited to files. Other types of data sources in a CSV format can use the same reference formulation.
@prefix csvw: <http://www.w3.org/ns/csvw#> .
<#CSVFile> a rml:LogicalSource;
  rml:source [ a rml:Source;
    rml:access [ a csvw:Table;
      csvw:url "file:///data/file.csv";
      csvw:dialect [ a csvw:Dialect;
        csvw:delimiter ";";
        csvw:encoding "UTF-8";
        csvw:header "1"^^xsd:boolean;
      ];
    ];
  ];
  rml:referenceFormulation ql:CSV;
.
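To illustrate that ql:CSV is not tied to local files, the sketch below describes CSV data retrieved over HTTP through a DCAT distribution, following the same rml:source / rml:access pattern as the other examples. The download URL, the resource name, and the @base IRI are hypothetical, and the ql: prefix is assumed to be bound to http://semweb.mmlab.be/ns/ql#.

```turtle
# Hypothetical base IRI so that <#...> resolves when parsed standalone.
@base <http://example.org/mapping> .
@prefix rml:  <http://semweb.mmlab.be/ns/rml-source#> .
@prefix ql:   <http://semweb.mmlab.be/ns/ql#> .          # assumed binding
@prefix dcat: <http://www.w3.org/ns/dcat#> .

<#CSVOverHTTP> a rml:LogicalSource;
  rml:source [ a rml:Source;
    # Access is described with DCAT instead of CSVW; only the access part changes.
    rml:access [ a dcat:Dataset;
      dcat:distribution [ a dcat:Distribution;
        dcat:downloadURL <http://example.org/data.csv>;
      ];
    ];
  ];
  # Columns are still referenced by name through the CSV reference formulation.
  rml:referenceFormulation ql:CSV;
.
```

Only the access description differs from the file-based example; the reference formulation stays the same.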
JSON data is hierarchical and can be referred to using JSONPath, which is specified through the ql:JSONPath reference formulation.
In the following example, a JSON file is accessed, but the JSONPath reference formulation is not limited to files. Other types of data sources in a JSON format can use the same reference formulation.
@prefix dcat: <http://www.w3.org/ns/dcat#> .
<#JSONFile> a rml:LogicalSource;
  rml:source [ a rml:Source;
    rml:access [ a dcat:Dataset;
      dcat:distribution [ a dcat:Distribution;
        dcat:downloadURL <http://example.org/file.json>;
      ];
    ];
  ];
  rml:referenceFormulation ql:JSONPath;
.
XML data is hierarchical and can be referred to using XPath, which is specified through the ql:XPath reference formulation.
If an XML namespace needs to be specified, the ql:XPathReferenceFormulation class can be used, which allows defining one or multiple XML namespaces.
In the following example, an XML file is accessed, but the XPath reference formulation is not limited to files. Other types of data sources in an XML format can use the same reference formulation.
@prefix dcat: <http://www.w3.org/ns/dcat#> .
<#XMLNamespace> a rml:LogicalSource;
  rml:source [ a rml:Source;
    rml:access [ a dcat:Dataset;
      dcat:distribution [ a dcat:Distribution;
        dcat:accessURL <file:///path/to/data.xml>;
      ];
    ];
  ];
  rml:referenceFormulation [ a ql:XPathReferenceFormulation;
    ql:namespace [ a ql:Namespace;
      ql:namespacePrefix "ex";
      ql:namespaceURL "http://example.org";
    ];
  ];
  rml:iterator "/xpath/ex:namespace/expression";
.
A Source (rml:Source) defines how a data source should be accessed.
It MUST contain the following properties:
- rml:access: the access description of the data source, a URI or Literal.
Optionally, the following properties MAY be specified:
- rml:encoding: the encoding of the data source, enc:UTF-8 if not specified.
- rml:null: a value to be treated as NULL. Where the data source has a default null, this one is used together with the ones specified through rml:null.
- rml:compression: the compression applied to the data source.
<#JSON> a rml:LogicalSource;
  rml:source [ a rml:Source;
    rml:access [ a dcat:Dataset;
      dcat:distribution [ a dcat:Distribution;
        dcat:accessURL <file:///path/to/data.json.gz>;
      ];
    ];
    rml:null ""; # empty string as NULL besides default null character
    rml:compression comp:gzip; # GZip compression
    rml:encoding enc:UTF-16; # UTF-16 encoding
  ];
  rml:referenceFormulation ql:JSONPath;
  rml:iterator "$.jsonpath.expression";
.
Property | Domain | Range |
---|---|---|
rml:access | rml:Source | URI or Literal |
rml:encoding | rml:Source | enc:Encoding |
rml:null | rml:Source | Literal |
rml:compression | rml:Source | comp:Compression |
The following example shows a Source of a CSV file.
<#CSV> a rml:LogicalSource;
  rml:source [ a csvw:Table;
    csvw:url "/path/to/data.csv";
  ];
  rml:referenceFormulation ql:CSV;
.
Note that no rml:iterator is present because its default is a row.
The following example shows a Source specified for a database.
<#RDB> a rml:LogicalSource;
  rml:source [ a d2rq:Database;
    d2rq:jdbcDSN "jdbc:mysql://localhost/example";
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:username "user";
    d2rq:password "password";
  ];
  rml:referenceFormulation rr:SQL2008;
.
Note that no rml:iterator is present because its default is a row.
The following example shows a Source of an XML file:
<#XML> a rml:LogicalSource;
  rml:source [ a dcat:Dataset;
    dcat:distribution [ a dcat:Distribution;
      dcat:accessURL <file:///path/to/data.xml>;
    ];
  ];
  rml:referenceFormulation ql:XPath;
  rml:iterator "/xpath/iterator/expression";
.
The following example shows a Source of a JSON file:
<#JSON> a rml:LogicalSource;
  rml:source [ a dcat:Dataset;
    dcat:distribution [ a dcat:Distribution;
      dcat:accessURL <file:///path/to/data.json>;
    ];
  ];
  rml:referenceFormulation ql:JSONPath;
  rml:iterator "$.jsonpath.expression";
.
The Target vocabulary namespace is http://semweb.mmlab.be/ns/rml-target# and its prefix is rml.
The Target vocabulary consists of a single class: rml:LogicalTarget
to describe how a knowledge graph must be exported after generation.
A Target is any target to which RDF triples are exported.
A Target (rml:LogicalTarget) contains the following properties:
- rml:target locates the output target. It is a URI [RFC3986] or Literal [RDF-Concepts] that represents the target's location. External vocabularies such as DCAT, VoID, and SD are allowed here. Each rml:LogicalTarget MUST have one rml:target property. The target MAY be a Literal containing the path of the file to which the knowledge graph is exported; this is allowed to stay backwards compatible with existing data access descriptions.
- rml:serialization MAY specify the serialization format for exporting a knowledge graph. The serialization format is described using the W3C formats namespace. By default, the serialization format is N-Quads [N-Quads].
- rml:compression MAY describe the compression algorithm to apply when exporting a knowledge graph. The compression format is specified through the comp namespace. By default, no compression is applied.
- rml:encoding MAY specify which encoding must be used when exporting a knowledge graph. The encoding is specified through the enc namespace.
The Target definition requires only the target (rml:target) to be specified; all other properties are optional.
Property | Domain | Range |
---|---|---|
rml:target | rml:LogicalTarget | URI or Literal |
rml:serialization | rml:LogicalTarget | formats:Format |
rml:compression | rml:LogicalTarget | comp:Compression |
rml:encoding | rml:LogicalTarget | enc:Encoding |
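As a minimal sketch of the backwards-compatible form, a Target whose rml:target is a plain Literal file path could be written as follows. The resource name, the file path, and the @base IRI are hypothetical. Since no other property is given, the defaults stated above would apply: N-Quads serialization and no compression.

```turtle
# Hypothetical base IRI so that <#...> resolves when parsed standalone.
@base <http://example.org/mapping> .
@prefix rml: <http://semweb.mmlab.be/ns/rml-target#> .

<#FileDump> a rml:LogicalTarget;
  # Literal path to the output file instead of a DCAT, VoID, or SD description.
  rml:target "/data/dump.nq";
  # No rml:serialization: N-Quads is the default.
  # No rml:compression: no compression is applied by default.
.
```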
The following example shows a Target of an RDF dump in Turtle [Turtle] format with GZip compression and UTF-8 encoding:
<#VoIDDump> a rml:LogicalTarget;
  rml:target [ a void:Dataset;
    void:dataDump <file:///data/dump.ttl>;
  ];
  rml:serialization formats:Turtle;
  rml:compression comp:gzip;
  rml:encoding enc:UTF-8;
.
The following example shows a Target of a [SPARQL] endpoint with SPARQL UPDATE:
<#SPARQLEndpoint> a rml:LogicalTarget;
  rml:target [ a sd:Service;
    sd:endpoint <http://example.com/sparql-update>;
    sd:supportedLanguage sd:SPARQL11Update;
  ];
.
The following example shows a Target of a DCAT dataset in N-Quads format with Zip compression:
<#DCATDump> a rml:LogicalTarget;
  rml:target [ a dcat:Dataset;
    dcat:distribution [ a dcat:Distribution;
      dcat:accessURL <http://example.org/dcat-access-url>;
    ];
  ];
  rml:serialization formats:N-Quads;
  rml:compression comp:zip;
.
The following example shows a Target of an MQTT stream in N-Quads format without compression:
<#MQTTStream> a rml:LogicalTarget;
  rml:target [ a td:Thing;
    td:hasPropertyAffordance [
      td:hasForm [
        # URL and content type
        hctl:hasTarget "mqtt://localhost/topic";
        hctl:forContentType "application/n-quads";
        # Set MQTT parameters through W3C WoT Binding Template for MQTT
        mqv:controlPacketValue "SUBSCRIBE";
        mqv:options ([ mqv:optionName "qos"; mqv:optionValue "1" ] [ mqv:optionName "dup" ]);
      ];
    ];
  ];
  rml:serialization formats:N-Quads;
.
The following example shows a Target of a TCP stream in N-Quads format without compression:
<#TCPStream> a rml:LogicalTarget;
  rml:target [ a td:Thing;
    td:hasPropertyAffordance [
      td:hasForm [
        # URL and content type
        hctl:hasTarget "tcp://localhost:1234/topic";
        hctl:forContentType "application/n-quads";
      ];
    ];
  ];
  rml:serialization formats:N-Quads;
.
The following example shows a Target of a Kafka stream in N-Quads format without compression:
<#KafkaStream> a rml:LogicalTarget;
  rml:target [ a td:Thing;
    td:hasPropertyAffordance [
      td:hasForm [
        # URL and content type
        hctl:hasTarget "kafka://localhost:8089/topic";
        hctl:forContentType "application/n-quads";
        # Kafka parameters through W3C WoT Binding Template for Kafka
        kafka:groupId "MyAwesomeGroup";
      ];
    ];
  ];
  rml:serialization formats:N-Quads;
.
The following example shows a Target of an HTTP Server Sent Events stream in N-Quads format without compression, with the HTTP method and headers set explicitly:
<#HTTPSSEStream> a rml:LogicalTarget;
  rml:target [ a td:Thing;
    td:hasPropertyAffordance [
      td:hasForm [
        # URL and content type
        hctl:hasTarget "http://localhost:4242/";
        hctl:forContentType "application/n-quads";
        # Set HTTP method and headers through W3C WoT Binding Template for HTTP
        htv:methodName "POST";
        htv:headers ([
          htv:fieldName "User-Agent";
          htv:fieldValue "Processor";
        ]);
        # Max-Age CoAP property has number 14. Value is in seconds RFC7252
        cov:options ([ cov:optionName "14"; cov:optionValue "360" ]);
      ];
    ];
  ];
  rml:serialization formats:N-Quads;
.
The following example shows a Target of an HTTP Server Sent Events stream in N-Quads format without compression:
<#HTTPSSEStream> a rml:LogicalTarget;
  rml:target [ a td:Thing;
    td:hasPropertyAffordance [
      td:hasForm [
        # URL and content type
        hctl:hasTarget "http://localhost:4242/";
        hctl:forContentType "text/event-stream";
      ];
    ];
  ];
  rml:serialization formats:N-Quads;
.
The following example shows a Target of a WebSocket in N-Quads format without compression:
<#WebSocketStream> a rml:LogicalTarget;
  rml:target [ a td:Thing;
    td:hasPropertyAffordance [
      td:hasForm [
        # URL and content type
        hctl:hasTarget "ws://localhost:5555/";
        hctl:forContentType "application/n-quads";
      ];
    ];
  ];
  rml:serialization formats:N-Quads;
.