SPARQL-Generate - query heterogeneous documents and generate RDF
Generate RDF from web documents in XML, JSON, CSV, HTML, CBOR, and plain text with regular expressions.
SPARQL-Generate is an extension of SPARQL 1.1 for querying not only RDF datasets but also documents in arbitrary formats. It offers a simple template-based option to generate RDF Graphs from documents, and presents the following advantages:
- Anyone familiar with SPARQL can easily learn SPARQL-Generate.
- SPARQL-Generate leverages the expressivity of SPARQL 1.1: Aggregates, Solution Sequences and Modifiers, and SPARQL functions together with their extension mechanism.
- It integrates seamlessly with existing standards and tools for consuming Semantic Web data, such as SPARQL or Semantic Web programming frameworks.
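As an illustration of the template-based approach, a query of the following shape reads a JSON document and emits one RDF resource per array element. This is a minimal sketch, not a query taken from the project's tests: the document URL, the `ex:` vocabulary, and the JSON layout (an object with a `people` array whose items carry a `name` field) are hypothetical, and the `iter:` and `fun:` prefixes are assumed to be the project's iterator and binding function namespaces.

```
PREFIX iter: <http://w3id.org/sparql-generate/iter/>
PREFIX fun:  <http://w3id.org/sparql-generate/fn/>
PREFIX ex:   <http://example.org/>

# Template: the RDF to generate for each solution of the WHERE clause.
GENERATE {
  ?personIri a ex:Person ;
             ex:name ?name .
}
# Load the (hypothetical) JSON document and bind its content to ?source.
SOURCE <http://example.org/people.json> AS ?source
# Iterate over the array items; each item is bound to ?person in turn.
ITERATOR iter:JSONPath(?source, "$.people[*]") AS ?person
WHERE {
  # Binding function: extract one value from the current item.
  BIND(fun:JSONPath(?person, "$.name") AS ?name)
  # Standard SPARQL 1.1 functions build the subject IRI.
  BIND(IRI(CONCAT("http://example.org/person/", ENCODE_FOR_URI(?name))) AS ?personIri)
}
```

Apart from the `GENERATE`, `SOURCE`, and `ITERATOR` clauses, the `WHERE` part uses only standard SPARQL 1.1 constructs, which is why existing SPARQL knowledge carries over directly.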
Maxime Lefrançois, Antoine Zimmermann, Noorani Bakerally. A SPARQL extension for generating RDF from heterogeneous formats. In Proc. Extended Semantic Web Conference (ESWC), May 2017, Portoroz, Slovenia (long paper).
Maxime Lefrançois, Antoine Zimmermann, Noorani Bakerally. Flexible RDF generation from RDF and heterogeneous data sources with SPARQL-Generate. In Proc. 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW), Nov 2016, Bologna, Italy (demo track).
Maxime Lefrançois, Antoine Zimmermann, Noorani Bakerally. Génération de RDF à partir de sources de données aux formats hétérogènes. In Proc. 17th conference Extraction et Gestion des Connaissances (EGC), Jan 2017, Grenoble, France.
Use SPARQL-Generate as:
See our predefined SPARQL binding functions and SPARQL-Generate iterator functions. You can also leverage the SPARQL 1.1 extension mechanism and implement your own functions to support any other format.
Test, evaluate, contribute
Our test report contains tests from related work and more. You can request a new unit test, binding function, or iterator function via the mailing list or the issue tracker. We also conducted a comparative evaluation with the RML reference implementation.
This work has been partly funded by the ITEA2 12004 SEAS (Smart Energy Aware Systems) project, the ANR 14-CE24-0029 OpenSensingCity project, and a bilateral research contract with ENGIE R&D.