Knowledge graph construction from heterogeneous data has seen considerable uptake in the last decade, from compliance testing to performance optimization with respect to execution time. However, metrics beyond execution time, e.g. CPU or memory usage, are rarely considered when comparing knowledge graph construction systems. This challenge aims to spark interest among RDF graph construction systems in complying with the new RML specification and its modules, while benchmarking them regarding execution time, CPU usage, memory usage, or a combination of these metrics.
We thank Orange for generously sponsoring Virtual Machines for the Challenge. Each participant can now use a dedicated machine, which ensures that all participants run on exactly the same hardware.
The task is to comply with the new RML specification and its modules while also aiming for an efficient implementation in terms of execution time and computing resources, e.g. CPU and memory usage. The challenge is thus not only about execution time (the fastest pipeline) but also about computing resources (the most efficient pipeline).
We provide a tool that executes such pipelines end-to-end and collects and aggregates the metrics required for this challenge, such as execution time, CPU usage, and memory usage, as CSV files. It also records information about the hardware used during execution, allowing fair comparison of different pipelines. To run with the tool, your pipeline should consist of Docker images executable on Linux.
We strongly encourage you to use this tool for participating in the challenge. If you prefer a different tool, or ours imposes technical requirements you cannot meet, please contact us directly.
Click here to go to the website of the tool.
Click here to go to the website of the resources.
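To illustrate how the tool's per-run metric CSVs could be summarized for a submission, here is a minimal sketch. The column names (`execution_time_s`, `cpu_percent`, `memory_mb`) are assumptions for illustration; the tool's actual CSV layout may differ.

```python
import csv
from statistics import mean, median

def aggregate_metrics(path):
    """Summarize a metrics CSV (hypothetical columns) into mean/median per metric."""
    rows = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            rows.append({
                "execution_time_s": float(row["execution_time_s"]),
                "cpu_percent": float(row["cpu_percent"]),
                "memory_mb": float(row["memory_mb"]),
            })
    return {
        metric: {
            "mean": mean(r[metric] for r in rows),
            "median": median(r[metric] for r in rows),
        }
        for metric in ("execution_time_s", "cpu_percent", "memory_mb")
    }
```

Aggregating over repeated runs like this (rather than reporting a single run) makes comparisons between pipelines less sensitive to outliers.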
Submission workflow for the Challenge:
Do you want to ask questions? Join us on Slack.
At least one author of each tool must present the results during the workshop (virtual presentations are not allowed).
Test compliance of an engine with all new RML modules:
These parameters are evaluated using synthetically generated data to gain more insight into their influence on the pipeline. Data is provided in CSV (with an SQL schema) and mappings in R2RML.
Data Parameters
Mappings Parameters
The KGC Parameters are also available on Zenodo, where the official version used during the Challenge is published: https://zenodo.org/doi/10.5281/zenodo.10721874
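To make concrete what an R2RML mapping expresses over such CSV data, the following sketch materializes triples by applying a subject template and predicate-object mappings to each row of a logical table. The template, column names, and vocabulary below are hypothetical examples, not taken from the benchmark:

```python
import csv
import io

# Hypothetical subject template and predicate map, in the spirit of an
# R2RML TriplesMap: one subject IRI per row, one triple per mapped column.
SUBJECT_TEMPLATE = "http://example.com/person/{id}"
PREDICATE_MAP = {
    "name": "http://xmlns.com/foaf/0.1/name",
}

def materialize(csv_text):
    """Turn CSV rows into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = SUBJECT_TEMPLATE.format(**row)
        for column, predicate in PREDICATE_MAP.items():
            triples.append((subject, predicate, row[column]))
    return triples
```

The benchmark's data and mapping parameters (e.g. number of rows, columns, or predicate-object maps) directly scale how many such template instantiations and triples an engine must produce, which is why they influence execution time and memory usage.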
The GTFS-Madrid-Bench provides insight into the pipeline using real data from the public transport domain in Madrid. Mappings are provided in R2RML for the scaling tasks and in RML for the heterogeneity tasks.
Scaling
Heterogeneity
The GTFS-Madrid-Bench is also available on Zenodo, where the official version used during the Challenge is published: https://zenodo.org/doi/10.5281/zenodo.10721874
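The heterogeneity dimension means the same logical records are served in different source formats, which an RML engine must handle uniformly. As a minimal sketch (the field names are illustrative GTFS-style columns, not necessarily the benchmark's exact schema), the same tabular record can be re-serialized as JSON:

```python
import csv
import io
import json

def csv_to_json_records(csv_text):
    """Re-serialize CSV rows as a JSON array of objects: the same logical
    records in a different source format, as in the heterogeneity tasks."""
    return json.dumps(list(csv.DictReader(io.StringIO(csv_text))))
```

An engine complying with the RML modules should produce the same RDF graph regardless of whether such records arrive as CSV, JSON, XML, or rows in a relational database.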
Submissions must evaluate the following metrics: