
DataIO

Unofficial Draft

More details about this document
Latest published version:
https://www.w3.org/dataio/
Latest editor's draft:
https://rml.io/specs/dataio
Editors:
(Ghent University – imec – IDLab)
(KU Leuven, Department of Computer Science)
This Version
https://rml.io/specs/dataio/20220518/
Previous Version
https://rml.io/specs/dataio/20220517/
Website
https://rml.io/

Abstract

This document describes Logical Source and Logical Target, which are used to access data sources and targets.

A Logical Source is a formal model and common representation for describing access to data sources. A Logical Target is a formal model and a common representation for specifying how a Knowledge Graph should be exported to a given target.

Logical Source and Logical Target reuse existing data access descriptions and are therefore not limited to a specific set of targets or data sources. The current document describes the Logical Source and Logical Target concepts through definitions and examples.

The version of this document is v0.1.

Status of This Document

This document is a draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organization.

This is an early draft, yet efforts are made to keep things stable.

1. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY, MUST, and MUST NOT in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Overview

This section is non-normative.

This document specifies Logical Source and Logical Target. A Logical Source is a description that specifies how a data source should be accessed. A Logical Source description is not limited to a specific Source, which makes it possible to access any type of Source, and it provides a reference formulation to refer to data inside the Source.

A Logical Target is a description that defines how a generated RDF [RDF-Concepts] knowledge graph must be exported. A Logical Target description is not tailored to a specific Target, which makes it possible to export the generated RDF triples to any type of Target, and it provides fine-grained control over where each RDF triple is exported.

Logical Source and Logical Target leverage existing data access descriptions such as DCAT [DCAT], VoID [VoID], SD [SD], etc.

In this document, examples assume the following namespace prefix bindings unless otherwise stated:

Prefix Namespace
rml http://semweb.mmlab.be/ns/rml#
formats https://www.w3.org/ns/formats/
comp http://semweb.mmlab.be/ns/rml-compression#
void http://rdfs.org/ns/void#
sd http://www.w3.org/ns/sparql-service-description#
dcat http://www.w3.org/ns/dcat#
td https://www.w3.org/2019/wot/td#
hctl https://www.w3.org/2019/wot/hypermedia#
htv http://www.w3.org/2011/http#
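
Written out in Turtle, the bindings in the table above correspond to the prefix declarations below. This is a direct transcription of the table and nothing more; prefixes that appear in later examples but are not listed in the table are not declared here.

```turtle
# Prefix bindings transcribed one-to-one from the table above.
@prefix rml:     <http://semweb.mmlab.be/ns/rml#> .
@prefix formats: <https://www.w3.org/ns/formats/> .
@prefix comp:    <http://semweb.mmlab.be/ns/rml-compression#> .
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix sd:      <http://www.w3.org/ns/sparql-service-description#> .
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix td:      <https://www.w3.org/2019/wot/td#> .
@prefix hctl:    <https://www.w3.org/2019/wot/hypermedia#> .
@prefix htv:     <http://www.w3.org/2011/http#> .
```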

The examples are contained in pink colored boxes:

# This box contains the example's Logical Source description.
# This box contains the example's Logical Target description.

3. Logical Source vocabulary

The Logical Source vocabulary namespace is http://semweb.mmlab.be/ns/rml-source# and its prefix is rml.

The Logical Source vocabulary consists of 2 classes:

  1. rml:LogicalSource describes how data of a source can be referenced.
  2. rml:Source describes how a source can be accessed; it is part of an rml:LogicalSource.

3.1 Defining Logical Sources

A Logical Source is any data source providing data to be mapped to RDF triples.

A Logical Source (rml:LogicalSource) MUST contain the following property: rml:source, which refers to the Source to access.

The following properties MAY be specified in a Logical Source: rml:referenceFormulation and rml:iterator.

If no iterator is specified, the iterator is considered a row by default.

The Logical Source definition requires only the source (rml:source) to be specified; all other properties are optional. If a property is specified, it MUST NOT be specified multiple times.

Property Domain Range
rml:source rml:LogicalSource Source
rml:referenceFormulation rml:LogicalSource ql:ReferenceFormulation
rml:iterator rml:LogicalSource Literal
Figure 1: The structure of Source
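
As a minimal sketch of the rules in this section, the following Logical Source specifies only rml:source and leaves out both optional properties, so the iterator defaults to a row. The identifier <#MinimalCSV> and the file path are illustrative assumptions, and the rml prefix is bound as in the prefix table of the Overview. How a processor selects a reference formulation when none is given is not stated in this section.

```turtle
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .

# Hypothetical example: only the required rml:source is present.
# rml:referenceFormulation and rml:iterator are omitted,
# so the iterator is considered a row.
<#MinimalCSV> a rml:LogicalSource;
     rml:source [ a rml:Source;
       rml:access [ a csvw:Table;
         csvw:url "/path/to/data.csv";
       ];
     ];
.
```

Adding a second rml:source, rml:referenceFormulation, or rml:iterator to the same Logical Source would conflict with the rule that a property MUST NOT be specified multiple times.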

3.2 Reference formulations

Each Logical Source has a reference formulation that defines how to refer to elements of the data in the input source. Several reference formulations (rml:ReferenceFormulation) are described in this specification: rr:SQL2008, ql:CSV, ql:JSONPath, ql:XPath, and ql:XPathReferenceFormulation.

A ql:XPathReferenceFormulation may specify zero or more ql:namespace properties, each with a ql:Namespace as its value. A ql:Namespace contains the following required properties: ql:namespacePrefix and ql:namespaceURL.

@prefix dcat: <http://www.w3.org/ns/dcat#> .
<#XMLNamespace> a rml:LogicalSource;
     rml:source [ a rml:Source;
       rml:access [ a dcat:Dataset;
         dcat:distribution [ a dcat:Distribution;
           dcat:accessURL <file:///path/to/data.xml>;
         ];
       ];
     ];
     rml:referenceFormulation [ a ql:XPathReferenceFormulation;
       ql:namespace [ a ql:Namespace;
         ql:namespacePrefix "ex";
         ql:namespaceURL "http://example.org";
       ];
     ];
     rml:iterator "/xpath/ex:namespace/expression";
.

3.2.1 SQL databases

SQL databases require a SQL query to be performed to retrieve a table or view from the database. This is specified through the rr:SQL2008 reference formulation from the W3C R2RML recommendation.

@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
<#SQLDatabase> a rml:LogicalSource;
     rml:source [ a rml:Source;
       rml:access [ a d2rq:Database;
          d2rq:jdbcDSN "jdbc:mysql://localhost/example";
          d2rq:jdbcDriver "com.mysql.jdbc.Driver";
          d2rq:username "user";
          d2rq:password "password";
       ];
     ];
     rml:referenceFormulation rr:SQL2008;
     rml:query "SELECT name FROM student;"
.

3.2.2 Tabular CSV & TSV data

Tabular data is widely used and described by existing standards such as the W3C CSVW recommendation. Such data can be referred to by column name through the ql:CSV reference formulation.

In the following example, a CSV file is accessed, but the CSV reference formulation is not limited to files. Other types of data sources in a CSV format can use the same reference formulation.

@prefix csvw: <http://www.w3.org/ns/csvw#> .
<#CSVFile> a rml:LogicalSource;
     rml:source [ a rml:Source;
       rml:access [ a csvw:Table;
         csvw:url "file:///data/file.csv" ;
         csvw:dialect [ a csvw:Dialect;
           csvw:delimiter ";";
           csvw:encoding "UTF-8";
           csvw:header "true"^^xsd:boolean;
         ];
       ];
     ];
     rml:referenceFormulation ql:CSV;
.
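
The section title also covers TSV data. As a hedged sketch, a tab-separated file could be described by changing only the delimiter in the CSVW dialect. The identifier <#TSVFile> and the file path are illustrative, the ql prefix binding is an assumption because it is not listed in the prefix table, and reusing ql:CSV for TSV is inferred from the section title rather than stated explicitly.

```turtle
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:   <http://semweb.mmlab.be/ns/ql#> .    # assumed binding
@prefix csvw: <http://www.w3.org/ns/csvw#> .

# Hypothetical TSV variant of the CSV example: same structure,
# but the dialect declares a tab character as delimiter.
<#TSVFile> a rml:LogicalSource;
     rml:source [ a rml:Source;
       rml:access [ a csvw:Table;
         csvw:url "file:///data/file.tsv";
         csvw:dialect [ a csvw:Dialect;
           csvw:delimiter "\t";    # tab-separated columns
           csvw:encoding "UTF-8";
           csvw:header true;       # first row holds the column names
         ];
       ];
     ];
     rml:referenceFormulation ql:CSV;
.
```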

3.2.3 JSON data

JSON data is hierarchical and can be referred to using JSONPath, which is specified through the ql:JSONPath reference formulation.

In the following example, a JSON file is accessed, but the JSONPath reference formulation is not limited to files. Other types of data sources in a JSON format can use the same reference formulation.

@prefix dcat: <http://www.w3.org/ns/dcat#> .
<#JSONFile> a rml:LogicalSource;
     rml:source [ a rml:Source;
       rml:access [ a dcat:Dataset;
         dcat:distribution [ a dcat:Distribution;
           dcat:downloadURL <http://example.org/file.json>;
         ];
       ];  
     ];
     rml:referenceFormulation ql:JSONPath;
.

3.2.4 XML data

XML data is hierarchical and can be referred to using XPath, which is specified through the ql:XPath reference formulation. If an XML namespace needs to be specified, the ql:XPathReferenceFormulation class can be used, which allows one or multiple XML namespaces to be defined.

In the following example, an XML file is accessed, but the XPath reference formulation is not limited to files. Other types of data sources in an XML format can use the same reference formulation.

@prefix dcat: <http://www.w3.org/ns/dcat#> .
<#XMLNamespace> a rml:LogicalSource;
     rml:source [ a rml:Source;
       rml:access [ a dcat:Dataset;
         dcat:distribution [ a dcat:Distribution;
           dcat:accessURL <file:///path/to/data.xml>;
         ];
       ];
     ];
     rml:referenceFormulation [ a ql:XPathReferenceFormulation;
       ql:namespace [ a ql:Namespace;
         ql:namespacePrefix "ex";
         ql:namespaceURL "http://example.org";
       ];
     ];
     rml:iterator "/xpath/ex:namespace/expression";
.
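
Because a ql:XPathReferenceFormulation can carry more than one namespace, the following sketch declares two. The identifier <#XMLTwoNamespaces>, the second namespace (Dublin Core elements, prefix dc), the iterator expression, and the ql prefix binding are illustrative assumptions and not taken from this specification.

```turtle
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:   <http://semweb.mmlab.be/ns/ql#> .    # assumed binding
@prefix dcat: <http://www.w3.org/ns/dcat#> .

# Hypothetical example: two ql:namespace values on one reference formulation.
<#XMLTwoNamespaces> a rml:LogicalSource;
     rml:source [ a rml:Source;
       rml:access [ a dcat:Dataset;
         dcat:distribution [ a dcat:Distribution;
           dcat:accessURL <file:///path/to/data.xml>;
         ];
       ];
     ];
     rml:referenceFormulation [ a ql:XPathReferenceFormulation;
       # Each ql:Namespace pairs a prefix with a namespace URL.
       ql:namespace [ a ql:Namespace;
         ql:namespacePrefix "ex";
         ql:namespaceURL "http://example.org";
       ], [ a ql:Namespace;
         ql:namespacePrefix "dc";
         ql:namespaceURL "http://purl.org/dc/elements/1.1/";
       ];
     ];
     # Both prefixes can then be used inside the XPath iterator.
     rml:iterator "/ex:catalog/ex:record[dc:title]";
.
```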

3.3 Source

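A Source (rml:Source) defines how a data source should be accessed. It MUST contain the following property: rml:access, which refers to the access description of the data source.

Optionally, the following properties MAY be specified: rml:encoding, rml:null, and rml:compression.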

<#JSON> a rml:LogicalSource;
     rml:source [ a rml:Source;
       rml:access [ a dcat:Dataset;
         dcat:distribution [ a dcat:Distribution;
           dcat:accessURL <file:///path/to/data.json.gz>;
         ];
       ];
       rml:null ""; # empty string as NULL besides default null character
       rml:compression comp:gzip; # GZip compression
       rml:encoding enc:UTF-16; # UTF-16 encoding
     ];
     rml:referenceFormulation ql:JSONPath;
     rml:iterator "$.jsonpath.expression";
.
Property Domain Range
rml:access rml:Source URI or Literal
rml:encoding rml:Source enc:Encoding
rml:null rml:Source Literal
rml:compression rml:Source comp:Compression
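
The optional properties are independent of each other. As a rough sketch, the Source below sets rml:null and rml:compression but leaves out rml:encoding. The identifier <#ZippedCSV>, the file path, the "N/A" marker, and the ql prefix binding are illustrative assumptions.

```turtle
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:   <http://semweb.mmlab.be/ns/ql#> .    # assumed binding
@prefix comp: <http://semweb.mmlab.be/ns/rml-compression#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .

# Hypothetical example: a Source with some, but not all, optional properties.
<#ZippedCSV> a rml:LogicalSource;
     rml:source [ a rml:Source;
       rml:access [ a csvw:Table;        # required access description
         csvw:url "/path/to/data.csv.zip";
       ];
       rml:null "N/A";                   # treat the value N/A as NULL
       rml:compression comp:zip;         # Zip compression
       # rml:encoding is omitted because it is optional
     ];
     rml:referenceFormulation ql:CSV;
.
```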

3.4 Examples

The following example shows a Source of a CSV file.

<#CSV> a rml:LogicalSource;
     rml:source [ a csvw:Table;
         csvw:url "/path/to/data.csv";
     ];
     rml:referenceFormulation ql:CSV;
.

Note that no rml:iterator is present because the iterator defaults to a row.

The following example shows a Source specified for a database.

<#RDB> a rml:LogicalSource;
     rml:source [ a d2rq:Database;
        d2rq:jdbcDSN "jdbc:mysql://localhost/example";
        d2rq:jdbcDriver "com.mysql.jdbc.Driver";
        d2rq:username "user";
        d2rq:password "password";
     ];
     rml:referenceFormulation rr:SQL2008;
.

Note that no rml:iterator is present because the iterator defaults to a row.

The following example shows a Source of an XML file:

<#XML> a rml:LogicalSource;
     rml:source [ a dcat:Dataset;
       dcat:distribution [ a dcat:Distribution;
         dcat:accessURL <file:///path/to/data.xml>;
       ];
     ];
     rml:referenceFormulation ql:XPath;
     rml:iterator "/xpath/iterator/expression";
.

The following example shows a Source of a JSON file:

<#JSON> a rml:LogicalSource;
     rml:source [ a dcat:Dataset;
       dcat:distribution [ a dcat:Distribution;
         dcat:accessURL <file:///path/to/data.json>;
       ];
     ];
     rml:referenceFormulation ql:JSONPath;
     rml:iterator "$.jsonpath.expression";
.

4. Target vocabulary

The Target vocabulary namespace is http://semweb.mmlab.be/ns/rml-target# and its prefix is rml.

The Target vocabulary consists of a single class: rml:LogicalTarget to describe how a knowledge graph must be exported after generation.

4.1 Defining Targets

A Target is any destination to which RDF triples are exported.

A Target (rml:LogicalTarget) contains the following properties: rml:target, rml:serialization, rml:compression, and rml:encoding.

The Target definition requires only the target (rml:target) to be specified; all other properties are optional.

Property Domain Range
rml:target rml:LogicalTarget URI or Literal
rml:serialization rml:LogicalTarget formats:Format
rml:compression rml:LogicalTarget comp:Compression
rml:encoding rml:LogicalTarget enc:Encoding
Figure 2: The structure of Target
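
As a minimal sketch of the rule that only rml:target is required, the following Target gives a VoID data dump location and nothing else. The identifier <#MinimalDump> and the file location are illustrative assumptions. Which serialization, compression, and encoding apply when these properties are omitted is not stated in this section.

```turtle
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix void: <http://rdfs.org/ns/void#> .

# Hypothetical example: only the required rml:target is present.
# rml:serialization, rml:compression and rml:encoding are omitted.
<#MinimalDump> a rml:LogicalTarget;
     rml:target [ a void:Dataset;
         void:dataDump <file:///data/output>;
     ];
.
```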

4.2 Examples

The following example shows a Target of an RDF dump in Turtle [Turtle] format with GZip compression and UTF-8 encoding:

<#VoIDDump> a rml:LogicalTarget;
     rml:target [ a void:Dataset;
         void:dataDump <file:///data/dump.ttl>;
     ];
     rml:serialization formats:Turtle;
     rml:compression comp:gzip;
     rml:encoding enc:UTF-8;
.

The following example shows a Target of a SPARQL [SPARQL] endpoint with SPARQL UPDATE:

<#SPARQLEndpoint> a rml:LogicalTarget;
     rml:target [ a sd:Service;
       sd:endpoint  <http://example.com/sparql-update>;
       sd:supportedLanguage sd:SPARQL11Update ;
     ];
.

The following example shows a Target of a DCAT dataset in N-Quads format with Zip compression:

<#DCATDump> a rml:LogicalTarget;
     rml:target [ a dcat:Dataset;
       dcat:distribution [ a dcat:Distribution;
         dcat:accessURL <http://example.org/dcat-access-url>;
       ];
     ];
     rml:serialization formats:N-Quads;
     rml:compression comp:zip;
.

The following example shows a Target of an MQTT stream in N-Quads format without compression:

<#MQTTStream> a rml:LogicalTarget;
     rml:target [ a td:Thing;
       td:hasPropertyAffordance [
         td:hasForm [
           # URL and content type
           hctl:hasTarget "mqtt://localhost/topic";
           hctl:forContentType "application/n-quads";
           # Set MQTT parameters through W3C WoT Binding Template for MQTT
           mqv:controlPacketValue "SUBSCRIBE";
           mqv:options ([ mqv:optionName "qos"; mqv:optionValue "1" ] [ mqv:optionName "dup" ]);
         ];
       ];
     ];
     rml:serialization formats:N-Quads;
.

The following example shows a Target of a TCP stream in N-Quads format without compression:

<#TCPStream> a rml:LogicalTarget;
     rml:target [ a td:Thing;
       td:hasPropertyAffordance [
         td:hasForm [
           # URL and content type
           hctl:hasTarget "tcp://localhost:1234/topic";
           hctl:forContentType "application/n-quads";
         ];
       ];
     ];
     rml:serialization formats:N-Quads;
.

The following example shows a Target of a Kafka stream in N-Quads format without compression:

<#KafkaStream> a rml:LogicalTarget;
     rml:target [ a td:Thing;
       td:hasPropertyAffordance [
         td:hasForm [
           # URL and content type
           hctl:hasTarget "kafka://localhost:8089/topic";
           hctl:forContentType "application/n-quads";
           # Kafka parameters through W3C WoT Binding Template for Kafka
           kafka:groupId "MyAwesomeGroup";
         ];
       ];
     ];
     rml:serialization formats:N-Quads;
.

The following example shows a Target of an HTTP endpoint that receives N-Quads through POST requests, without compression:

<#HTTPStream> a rml:LogicalTarget;
     rml:target [ a td:Thing;
       td:hasPropertyAffordance [
         td:hasForm [
           # URL and content type
           hctl:hasTarget "http://localhost:4242/";
           hctl:forContentType "application/n-quads";
           # Set HTTP method and headers through W3C WoT Binding Template for HTTP
           htv:methodName "POST";
           htv:headers ([
             htv:fieldName "User-Agent";
             htv:fieldValue "Processor";
           ]);
           # Max-Age CoAP property has number 14. Value is in seconds RFC7252
           cov:options ([ cov:optionName "14"; cov:optionValue "360" ]);
         ];
       ];
     ];
     rml:serialization formats:N-Quads;
.

The following example shows a Target of an HTTP Server-Sent Events stream in N-Quads format without compression:

<#HTTPSSEStream> a rml:LogicalTarget;
     rml:target [ a td:Thing;
       td:hasPropertyAffordance [
         td:hasForm [
           # URL and content type
           hctl:hasTarget "http://localhost:4242/";
           hctl:forContentType "text/event-stream";
         ];
       ];
     ];
     rml:serialization formats:N-Quads;
.

The following example shows a Target of a WebSocket in N-Quads format without compression:

<#WebSocketStream> a rml:LogicalTarget;
     rml:target [ a td:Thing;
       td:hasPropertyAffordance [
         td:hasForm [
           # URL and content type
           hctl:hasTarget "ws://localhost:5555/";
           hctl:forContentType "application/n-quads";
         ];
       ];
     ];
     rml:serialization formats:N-Quads;
.

A. References

A.1 Normative references

[DCAT]
Data Catalog Vocabulary (DCAT) - Version 2. W3C. 04 February 2020. W3C Recommendation. URL: https://www.w3.org/TR/vocab-dcat/
[N-Quads]
RDF 1.1 N-Quads. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/n-quads/
[RDF-Concepts]
Resource Description Framework (RDF): Concepts and Abstract Syntax. Graham Klyne; Jeremy Carroll. W3C. 10 February 2004. W3C Recommendation. URL: https://www.w3.org/TR/rdf-concepts/
[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119
[RFC3986]
Uniform Resource Identifier (URI): Generic Syntax. T. Berners-Lee; R. Fielding; L. Masinter. IETF. January 2005. Internet Standard. URL: https://tools.ietf.org/html/rfc3986
[RFC8174]
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc8174
[SD]
SPARQL 1.1 Service Description. W3C. 21 March 2013. W3C Recommendation. URL: https://www.w3.org/TR/sparql11-service-description/
[SPARQL]
SPARQL 1.1 Overview. W3C. 21 March 2013. W3C Recommendation. URL: https://www.w3.org/TR/sparql11-overview/
[Turtle]
RDF 1.1 Turtle. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/turtle/
[VoID]
Describing Linked Datasets with the VoID Vocabulary. W3C. 03 March 2011. W3C Interest Group Note. URL: https://www.w3.org/TR/void/