KGC Open Challenges v.1

[C1] Language Tags and Datatype

This challenge covers both language tags and datatypes; ideally, the solutions for the two should be aligned.

[C1a] Language

Challenge resources:

https://github.com/kg-construct/mapping-challenges/tree/main/challenges/language-map

CG discussion:
https://github.com/kg-construct/mapping-challenges/issues/23

R2RML: The language tag is represented by the rr:language property on a term map. If present, its value must be a valid language tag. ( https://www.w3.org/TR/r2rml/#language-tags )

RML: The language tag can be specified using either a language map or a language property. A specified language tag causes generated literals to be language-tagged plain literals. A language map is represented by the rml:languageMap property on a term map. If present, its value must be a term map. The value generated by the language map is used as a language tag on the created term. ( https://rml.io/specs/rml/#language-tag )

RML Processor: The language of a generated term is determined by the following precedence rules: IF there is a language map AND its generated value is a valid language tag, use that value; ELSE IF there is a language property, use that value; ELSE do not specify a language.
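The precedence rules above can be sketched in a few lines of Python. This is only an illustration, not taken from any RML processor: the function names are hypothetical, and the validity check is a simplified approximation of the language tag syntax rather than full validation.

```python
import re
from typing import Optional

# Assumption: a simplified shape check for language tags (e.g. "en", "nl-BE").
# Real validation would follow BCP 47 and the IANA subtag registry.
_LANG_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_valid_language_tag(tag: Optional[str]) -> bool:
    """Return True if the value looks like a well-formed language tag."""
    return tag is not None and _LANG_TAG.fullmatch(tag) is not None


def resolve_language(language_map_value: Optional[str],
                     language_property: Optional[str]) -> Optional[str]:
    """Apply the precedence rules: language map first, then language property.

    language_map_value is the value generated by the language map
    (None if the term map has no language map).
    """
    if is_valid_language_tag(language_map_value):
        return language_map_value
    if language_property is not None:
        return language_property
    return None  # no language is specified


# Usage: the language map wins when it yields a valid tag.
chosen = resolve_language("nl", "en")
fallback = resolve_language("not a tag!", "en")
```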

[C1b] Datatype

Challenge resources:

https://github.com/kg-construct/mapping-challenges/tree/main/challenges/datatype-map

CG discussion:

R2RML: A datatypeable term map is a term map with a term type of rr:Literal that does not have a specified language tag. Datatypeable term maps may generate typed literals. The datatype of these literals can be automatically determined based on the SQL datatype of the underlying logical table column (producing a natural RDF literal), or it can be explicitly overridden using rr:datatype, whose value is an IRI (producing a datatype-override RDF literal).
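The choice between a natural and a datatype-override literal can be sketched as follows. The function name is hypothetical, and the table is only a subset of the natural mapping of SQL datatypes given in the R2RML Recommendation; character string types are deliberately absent because they yield plain literals.

```python
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Subset of the natural mapping from SQL datatypes to XSD datatypes.
NATURAL_DATATYPES = {
    "SMALLINT": XSD + "integer",
    "INTEGER": XSD + "integer",
    "BIGINT": XSD + "integer",
    "DECIMAL": XSD + "decimal",
    "NUMERIC": XSD + "decimal",
    "FLOAT": XSD + "double",
    "REAL": XSD + "double",
    "DOUBLE PRECISION": XSD + "double",
    "BOOLEAN": XSD + "boolean",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
}


def resolve_datatype(sql_type: str,
                     rr_datatype: Optional[str] = None) -> Optional[str]:
    """Return the datatype IRI for a literal, or None for a plain literal.

    An explicit rr:datatype value overrides the natural mapping.
    """
    if rr_datatype is not None:
        return rr_datatype  # datatype-override RDF literal
    return NATURAL_DATATYPES.get(sql_type.upper())  # natural RDF literal


natural = resolve_datatype("integer")
overridden = resolve_datatype("INTEGER", XSD + "string")
```

In both cases the datatype is fixed per term map (or per column), which is why it cannot currently depend on values in the data.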

RML: RML follows the same behavior as R2RML.

problem: How can we extend the datatype tag so that it can be generated from values in one or more data sources?

related to: [C1a], [C5c]

[C2] Iterators

This challenge refers to iteration patterns.

[C2a] access field outside the iteration

Challenge resources:

https://github.com/kg-construct/mapping-challenges/tree/main/challenges/access-fields-outside-iteration

CG discussion:
https://github.com/kg-construct/mapping-challenges/issues/20

other relevant discussion:
https://github.com/RMLio/rmlmapper-java/issues/28

R2RML: R2RML assumes a default iteration pattern of one iteration per row of the SQL table.

RML: The iteration pattern can be defined explicitly with the rml:iterator property.

RML Processor:

problem: Sometimes a data value ‘outside’ the iteration pattern needs to be referenced. xR2RML proposes one solution, using the "pushDown" property. Are there other alternatives?
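To make the problem concrete, here is a minimal sketch on hypothetical nested data. The iteration runs over the nested books, but each book also needs the name of its enclosing author, which lies outside the iterated items. The helper copies the parent field into every item; it only illustrates the general idea and is not the xR2RML syntax or semantics.

```python
from typing import Any, Dict, Iterator, List


def iterate_with_parent_field(records: List[Dict[str, Any]],
                              child_key: str,
                              parent_field: str,
                              alias: str) -> Iterator[Dict[str, Any]]:
    """Iterate over nested child items, adding one parent field to each item."""
    for parent in records:
        for child in parent.get(child_key, []):
            item = dict(child)                  # copy, keep the source intact
            item[alias] = parent[parent_field]  # value from outside the iteration
            yield item


# Hypothetical source data: the iterator would target the books only.
authors = [
    {"name": "Alice", "books": [{"title": "A1"}, {"title": "A2"}]},
    {"name": "Bob", "books": [{"title": "B1"}]},
]

rows = list(iterate_with_parent_field(authors, "books", "name", "author_name"))
```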

related to: [C5]

[C2b] iteration over multi-value references

Challenge resources:

https://github.com/kg-construct/mapping-challenges/issues/20

CG discussion:

R2RML: R2RML assumes a default iteration pattern of one iteration per row of the SQL table.

RML: The iteration pattern can be defined explicitly with the rml:iterator property.

problem: Sometimes the multiple values returned by a reference need to be iterated over individually. How can this be made possible? How should its definition differ from the case where all returned values are processed in the same way?

related to: [C2a], [C3], [C5]

[C3] Multi-Value references

This challenge refers to multi-value references.

[C3a] 1 reference - N values

R2RML: R2RML expects that only one value is returned from a reference to the table.

RML: RML allows multiple values to be returned from a reference to a data source.

RML Processor:

problem: RML does not clearly explain how multiple values should be handled. By default, an RML processor is expected to generate multiple RDF terms. However, what happens if each returned value needs to be processed separately? What happens if functions need to be included? What should be the default behavior of an RML processor? Would one need to opt out? If so, when and how?
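The default behavior described above, one RDF term per returned value, could look like the sketch below. The function names, the record, and the example IRIs are hypothetical; the point is that every value is treated identically, with no hook for per-value processing.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]


def generate_objects(record: Dict[str, Any], reference: str) -> List[str]:
    """Return one object value per value returned by the reference."""
    value = record.get(reference)
    if value is None:
        return []
    values = value if isinstance(value, list) else [value]
    return [str(v) for v in values]


def generate_triples(record: Dict[str, Any], subject_template: str,
                     predicate: str, reference: str) -> List[Triple]:
    """Generate one triple per returned value, all sharing the same subject."""
    subject = subject_template.format(**record)
    return [(subject, predicate, obj)
            for obj in generate_objects(record, reference)]


record = {"id": "1", "tags": ["a", "b"]}
triples = generate_triples(record, "http://example.org/item/{id}",
                           "http://example.org/tag", "tags")
```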

related to: [C2b], [C4]

Challenge resources:

https://github.com/kg-construct/mapping-challenges/tree/main/challenges/multivalue-references

CG discussion:
https://github.com/kg-construct/mapping-challenges/issues/20 ,
https://github.com/kg-construct/mapping-challenges/issues/19

other relevant discussion:
https://github.com/RMLio/rml-implementation-report/issues/11 ,
https://stackoverflow.com/questions/61751174/is-there-a-solution-in-rml-for-multiple-complex-entities-in-one-data-element-ce

[C3b] multi-value references for language/datatype tags

R2RML:

RML:

problem: What happens when multiple values are returned and a language tag needs to be applied? What happens when multiple values are returned and different language tags need to be applied?

related to: [C2b], [C3a]

Challenge resources:
https://github.com/kg-construct/mapping-challenges/tree/main/challenges/generate-multiple-values

CG discussion:
https://github.com/kg-construct/mapping-challenges/issues/10

other relevant discussion:
https://github.com/RMLio/rmlmapper-java/issues/65

[C4] RDF Collections and Containers

This challenge refers to RDF collections and containers.

R2RML: R2RML does not foresee the generation of RDF collections and containers.

RML: RML inherits R2RML’s limitations.

RML Processor:

problem: How can we generate RDF collections and containers with RML? xR2RML already offers a solution: http://i3s.unice.fr/~fmichel/xr2rml_specification.html#_Toc466307471. Are there alternatives?
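As a reminder of what a processor would have to emit, the sketch below builds the rdf:first / rdf:rest chain of an RDF collection from an ordered list of values. The function name and the blank node labels are hypothetical, and the sketch says nothing about how such a list would be declared in a mapping.

```python
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]


def build_rdf_list(values: List[str],
                   bnode_prefix: str = "_:list") -> Tuple[str, List[Triple]]:
    """Return (head node, triples) for an rdf:List holding the values in order."""
    if not values:
        return RDF + "nil", []  # the empty list is rdf:nil itself
    nodes = [f"{bnode_prefix}{i}" for i in range(len(values))]
    triples: List[Triple] = []
    for i, value in enumerate(values):
        # The last cell points to rdf:nil, every other cell to the next one.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", value))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


head, list_triples = build_rdf_list(["a", "b"])
```

Unlike the multi-value case, the order of the values matters here, which is one reason the two challenges are related.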

related to: [C2b], [C3]

Challenge resources:

https://github.com/kg-construct/mapping-challenges/tree/main/challenges/rdf-collections

CG discussion:
https://github.com/kg-construct/mapping-challenges/issues/10

other relevant discussion:
https://github.com/kg-construct/mapping-challenges/issues/20

[C5] Joins

Several cases need to be considered for joins:

[C5a] Joins without conditions

R2RML: R2RML requires that every Referencing Object Map has exactly 1 Parent Triples Map and 0 or more join conditions. An R2RML processor executes a Referencing Object Map as follows: If the Referencing Object Map has no join condition and the Parent Triples Map’s and the Child Triples Map’s logical table/query are the same, a resource should be created for each distinct value in the Child Triples Map.
( https://www.w3.org/TR/r2rml/#dfn-child-query )

RML: RML follows the R2RML specification, but it also adds the following case: If a Referencing Object Map has no join condition and the Parent Triples Map’s and the Child Triples Map’s logical sources are NOT the same, a resource should be created for each distinct value in the Child Triples Map (Cartesian product).
However, it seems that the RMLMapper, RML’s reference implementation, does not follow the R2RML specification.
( https://rml.io/specs/rml/#relationships-among-logical-sources-rr-parenttriplesmap-rr-joincondition-rr-child-and-rr-parent )

RML Processor:

problem: It needs to be clarified how processors should interpret a Referencing Object Map in all cases: (i) whether a join condition is defined or not, and (ii) whether the logical source is the same or not. Currently, if there is no join condition and the logical source is the same, there is no way to generate the Cartesian product. The default and edge cases still need to be clarified.
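The cases can be laid out in a small sketch. It encodes one possible reading, stated here as an assumption: with no join condition and the same source, each record is paired only with itself; otherwise every child/parent pair that satisfies all join conditions is kept, which degenerates to the Cartesian product when there are no conditions. The function name and the data are hypothetical.

```python
from itertools import product
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def referencing_object_pairs(child_rows: List[Row],
                             parent_rows: List[Row],
                             conditions: Sequence[Tuple[str, str]] = (),
                             same_source: bool = False) -> List[Tuple[Row, Row]]:
    """Return the (child, parent) record pairs a Referencing Object Map links.

    conditions holds (child_field, parent_field) pairs that must all match.
    """
    if not conditions and same_source:
        # Assumed reading: no join needed, each record is its own parent.
        return [(row, row) for row in child_rows]
    return [
        (child, parent)
        for child, parent in product(child_rows, parent_rows)
        if all(child.get(c) == parent.get(p) for c, p in conditions)
    ]


people = [{"id": 1, "city": "Ghent"}, {"id": 2, "city": "Madrid"}]
cities = [{"name": "Ghent"}, {"name": "Madrid"}]

joined = referencing_object_pairs(people, cities, [("city", "name")])
cartesian = referencing_object_pairs(people, cities)
row_wise = referencing_object_pairs(people, people, same_source=True)
```

Under this reading, the combination of same source and no condition can never produce the Cartesian product, which is the gap the problem statement points out.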

related to: [C2a], [C2b]

Challenge resources:

CG discussion:

other relevant discussion:
https://github.com/RMLio/rmlmapper-java/issues/28

[C5b] Joins on literals

R2RML: A Referencing Object Map returns the RDF term generated by the Parent Triples Map’s Subject Map. Thus, a resource is generated by default and there is no way to opt out, that is, to generate a literal instead.

RML: RML inherits the same behavior as R2RML.

problem: Currently, neither R2RML nor RML allows generating a triple whose object is a literal that comes from another data source.

related to:

Challenge resources:

CG discussion:

other relevant discussion:

[C5c] Joins in the language or datatype tag

R2RML: In R2RML, a term map with a term type of rr:Literal may have a specified language tag. It is represented by the rr:language property, and its value must be a valid language tag specified as a literal. A datatypeable term map is a term map with a term type of rr:Literal that does not have a specified language tag. The datatype of these literals can be automatically determined based on the SQL datatype of the underlying logical table column (producing a natural RDF literal), or it can be explicitly overridden using the rr:datatype property, whose value is an IRI.

RML: RML follows R2RML’s behavior for the datatype but introduces rml:LanguageMap, which allows the language tag to be defined either as a constant value, as in R2RML, or as a template or reference to the data source, in the same way a Term Map is defined (even though a language tag is not an RDF term). However, it is still not possible to generate a language or datatype tag from another data source, as happens when an Object Map is generated using a Referencing Object Map.

problem:

related to: [C1], [C5c] (in the case of language tags)

Challenge resources:

CG discussion:
https://github.com/kg-construct/mapping-challenges/issues/20

other relevant discussion: