Knowledge graph construction from heterogeneous data has seen wide adoption in the last decade, driven by use cases ranging from compliance to performance optimization with respect to execution time. However, execution time is typically the only metric used to compare knowledge graph construction systems; other metrics, e.g., CPU or memory usage, are not considered. This challenge aims to benchmark systems to determine which RDF graph construction system optimizes for metrics such as execution time, CPU usage, memory usage, or a combination of these metrics.
The task is to reduce and report the execution time and computing resources (CPU and memory usage) for the parameters listed in this challenge, compared to the state of the art of existing tools and the baseline results provided by this challenge. The challenge is not limited to execution time, i.e., building the fastest pipeline, but also considers computing resources, i.e., achieving the most efficient pipeline.
We provide a tool that executes such pipelines end-to-end. The tool also collects and aggregates the metrics required for this challenge (execution time, CPU, and memory usage) as CSV files. Moreover, information about the hardware used during the execution of the pipeline is recorded as well, to allow a fair comparison between pipelines. Your pipeline should consist of Docker images that can be executed on Linux by the tool.
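As a rough illustration of how the per-run CSV metrics could be aggregated for reporting, the sketch below parses a small metrics file and computes summary statistics. Note that the column names (`run`, `duration_s`, `cpu_percent`, `memory_mb`) and the values are hypothetical, not the tool's actual output schema.

```python
import csv
import io
import statistics

# Hypothetical metrics CSV, as the challenge tool might produce per pipeline
# run. Columns and values are illustrative only.
sample_csv = """run,duration_s,cpu_percent,memory_mb
1,120.4,85.2,512.0
2,118.9,83.7,498.5
3,121.7,86.1,520.3
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Aggregate over runs: mean execution time, mean CPU load, peak memory.
avg_duration = statistics.mean(float(r["duration_s"]) for r in rows)
avg_cpu = statistics.mean(float(r["cpu_percent"]) for r in rows)
peak_memory = max(float(r["memory_mb"]) for r in rows)

print(f"avg duration: {avg_duration:.1f} s")
print(f"avg CPU: {avg_cpu:.1f} %")
print(f"peak memory: {peak_memory:.1f} MB")
```

Reporting an aggregate over several runs, rather than a single measurement, reduces the impact of noise when comparing pipelines.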
We strongly encourage you to use this tool to participate in this challenge. If you prefer a different tool, or if our tool imposes technical requirements you cannot meet, please contact us directly.
See the tool's website for the tool itself and the resources of the benchmark.
Workflow for submissions:
Do you want to ask questions? Join us on Slack.
At least one author of each tool needs to present the results during the workshop (virtual presentations are not allowed).
These parameters are evaluated using synthetically generated data to gain more insight into their influence on the pipeline. The data is provided as CSV (with its SQL schema) and the mappings in R2RML.
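For readers unfamiliar with R2RML, a minimal mapping looks like the sketch below. The table name (`PERSON`), columns, and vocabulary are hypothetical examples, not part of the challenge datasets:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/> .

<#PersonMapping>
    rr:logicalTable [ rr:tableName "PERSON" ] ;
    rr:subjectMap [
        rr:template "http://example.org/person/{ID}" ;
        rr:class ex:Person
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "NAME" ]
    ] .
```

Each row of the `PERSON` table yields one subject IRI built from the `ID` column, typed as `ex:Person`, with an `ex:name` triple taken from the `NAME` column.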
Data Parameters
Mappings Parameters
The GTFS-Madrid-Bench provides insights into the pipeline with real data from the public transport domain in Madrid. Mappings are provided in R2RML for the scaling experiments and in RML for the heterogeneity experiments.
Scaling
Heterogeneity
Submissions must evaluate the following metrics:
All resources and the ground truth are openly available on Zenodo: https://doi.org/10.5281/zenodo.7837289. However, when using the provided tool (https://github.com/kg-construct/challenge-tool), everything is set up automatically.