Abstract

In the Resource Description Framework (RDF), a literal is composed of a Unicode string (the lexical form), a datatype IRI, and, when the datatype IRI is rdf:langString, a language tag. Any IRI can be used as a datatype IRI, but the specification defines the precise meaning of a literal only when the datatype IRI belongs to a predefined subset. Custom datatypes are reportedly in use on the web of data, but RDF processors rarely support them, and such support is implementation-specific. We propose a mechanism for generic support of custom datatypes. By following simple guidelines, (i) definitions of arbitrary custom datatypes can be published on the web, and (ii) a generic RDF processor or SPARQL query engine can discover datatypes on the fly and perform operations on them uniformly.

This document provides:

Status of This Document

This document is merely a public working draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organisation.

1. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MUST and SHOULD are to be interpreted as described in [RFC2119].

Custom Datatypes MUST conform to these guidelines.

RDF processors and SPARQL query engines SHOULD exercise caution with the executable code they retrieve and process.

2. Recommendations for Custom Datatype Publishers

Recommendations for Custom Datatype Publishers are as follows.

  1. Use an HTTP IRI aaa to identify custom datatype Da.
  2. When an RDF processor looks up aaa, let it retrieve a document that contains executable code.
  3. The executable code implements CustomDatatypeFactory, an interface with a single function getDatatype( iri ), whose behaviour is described below.
  4. When called with parameter aaa, function getDatatype( iri ) returns an object that implements the interface CustomDatatype described below.
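
The lookup described in the steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the HTTP retrieval is replaced by an in-memory function, the caching of datatype objects is a design choice of this sketch rather than a requirement of this document, and the names UnknownDatatypeError, PlaceholderDatatype, ExampleFactory, Processor, fake_fetch, and the example.org IRI are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List


class UnknownDatatypeError(Exception):
    """Hypothetical exception: the factory does not define the requested IRI."""


class PlaceholderDatatype:
    """Stands in for an object implementing CustomDatatype (only getIri shown)."""

    def __init__(self, iri: str) -> None:
        self._iri = iri

    def getIri(self) -> str:
        return self._iri


class ExampleFactory:
    """Plays the role of the executable code published at the datatype IRI."""

    def __init__(self, iris: Iterable[str]) -> None:
        self._iris = set(iris)

    def getDatatype(self, iri: str) -> PlaceholderDatatype:
        if iri not in self._iris:
            raise UnknownDatatypeError(iri)
        return PlaceholderDatatype(iri)


class Processor:
    """Discovers datatypes on the fly and caches the resulting objects."""

    def __init__(self, fetch_factory: Callable[[str], ExampleFactory]) -> None:
        self._fetch_factory = fetch_factory  # assumption: stands in for an HTTP lookup
        self._cache: Dict[str, PlaceholderDatatype] = {}

    def datatype_for(self, iri: str) -> PlaceholderDatatype:
        if iri not in self._cache:
            factory = self._fetch_factory(iri)           # step 2: retrieve the code
            self._cache[iri] = factory.getDatatype(iri)  # step 4: obtain the object
        return self._cache[iri]


EXAMPLE_IRI = "http://example.org/datatypes#length"  # hypothetical IRI
fetch_log: List[str] = []


def fake_fetch(iri: str) -> ExampleFactory:
    """Records each simulated retrieval and returns a factory for EXAMPLE_IRI."""
    fetch_log.append(iri)
    return ExampleFactory([EXAMPLE_IRI])


processor = Processor(fake_fetch)
datatype = processor.datatype_for(EXAMPLE_IRI)
```

With this arrangement, a second call to processor.datatype_for(EXAMPLE_IRI) reuses the cached object and does not trigger another retrieval.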

3. The Application Programming Interface

This API provides mechanisms that enable developers to process custom datatypes and custom literals uniformly.

3.1 Interface CustomDatatypeFactory

Implementations of interface CustomDatatypeFactory MUST provide the following method.

[Constructor]
interface CustomDatatypeFactory {
    CustomDatatype getDatatype (String iri);
};

3.1.1 Methods

getDatatype

In the context of the code contained in a custom datatype specification file, i.e., an instance of interface CustomDatatypeFactory, getDatatype(iri) returns an instance of interface CustomDatatype or throws an exception.

Parameter  Type    Nullable  Optional  Description
iri        String  no        no
Return type: CustomDatatype

3.2 Interface CustomDatatype

Implementations of interface CustomDatatype MUST provide the following methods.

[Constructor]
interface CustomDatatype {
    String   getIri ();
    Boolean  isWellFormed (String lexicalForm);
    Boolean  recognisesDatatype (String datatypeIri);
    String[] getRecognisedDatatypes ();
    Boolean  isEqual (String lexicalForm1, String lexicalForm2, optional String datatypeIri2);
    Integer  compare (String lexicalForm1, String lexicalForm2, optional String datatypeIri2);
    String   getNormalForm (String lexicalForm);
    String   importLiteral (String lexicalForm, String datatypeIri);
    String   exportLiteral (String lexicalForm, String datatypeIri);
};

3.2.1 Methods

compare

Answers whether the value of the literal with lexical form lexicalForm1 and this datatype is lower than, equal to, or greater than the value of the literal with lexical form lexicalForm2 and the datatype identified by IRI datatypeIri2.

Parameter     Type    Nullable  Optional  Description
lexicalForm1  String  no        no
lexicalForm2  String  no        no
datatypeIri2  String  no        yes
Return type: Integer

exportLiteral

Answers the lexical form of a literal with the datatype identified by datatypeIri whose value is equal to that of the literal with lexical form lexicalForm and this datatype.

Parameter    Type    Nullable  Optional  Description
lexicalForm  String  no        no
datatypeIri  String  no        no
Return type: String

getIri

Returns the IRI of this custom datatype.

No parameters.
Return type: String

getNormalForm

Answers the normalized lexical form of the literal with lexical form lexicalForm and this datatype.

Parameter    Type    Nullable  Optional  Description
lexicalForm  String  no        no
Return type: String

getRecognisedDatatypes

Answers an array of the IRIs of the datatypes that this custom datatype recognises.

No parameters.
Return type: String[]

importLiteral

Answers the lexical form of a literal with this datatype whose value is equal to that of the literal with lexical form lexicalForm and the datatype identified by datatypeIri.

Parameter    Type    Nullable  Optional  Description
lexicalForm  String  no        no
datatypeIri  String  no        no
Return type: String

isEqual

Answers whether the literal with lexical form lexicalForm1 and this datatype has the same value as the literal with lexical form lexicalForm2 and the datatype identified by IRI datatypeIri2.

Parameter     Type    Nullable  Optional  Description
lexicalForm1  String  no        no
lexicalForm2  String  no        no
datatypeIri2  String  no        yes
Return type: Boolean

isWellFormed

Answers whether the lexical form lexicalForm is well formed for this datatype.

Parameter    Type    Nullable  Optional  Description
lexicalForm  String  no        no
Return type: Boolean

recognisesDatatype

Answers whether this custom datatype recognises the datatype with the given IRI.

Parameter    Type    Nullable  Optional  Description
datatypeIri  String  no        no
Return type: Boolean
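
To make the methods above concrete, here is a minimal Python sketch of a toy length datatype. It is not the implementation referenced in appendix A.1, and several things are assumptions made only for illustration: the lexical space (a decimal number followed by one of mm, cm, m, km), the normal form (a value in metres), the convention that compare returns -1, 0, or 1, the choice of the DBpedia unit datatypes as recognised datatypes, and the names ToyLength and the example.org IRI.

```python
import re
from decimal import Decimal
from typing import List, Optional

DBPDT = "http://dbpedia.org/datatype/"
# Assumed unit symbols and their factors to metres.
_FACTORS = {"mm": Decimal("0.001"), "cm": Decimal("0.01"),
            "m": Decimal("1"), "km": Decimal("1000")}
# Assumed recognised datatypes: a number interpreted in the named unit.
_FOREIGN = {DBPDT + "millimetre": "mm", DBPDT + "centimetre": "cm",
            DBPDT + "metre": "m", DBPDT + "kilometre": "km"}
_LEXICAL = re.compile(r"\s*([+-]?\d+(?:\.\d+)?)\s*(mm|cm|m|km)\s*")


class ToyLength:
    def __init__(self, iri: str) -> None:
        self._iri = iri

    def getIri(self) -> str:
        return self._iri

    def isWellFormed(self, lexicalForm: str) -> bool:
        return _LEXICAL.fullmatch(lexicalForm) is not None

    def _metres(self, lexicalForm: str) -> Decimal:
        # Value space of this sketch: a length expressed in metres.
        match = _LEXICAL.fullmatch(lexicalForm)
        if match is None:
            raise ValueError(f"ill-formed length: {lexicalForm!r}")
        return Decimal(match.group(1)) * _FACTORS[match.group(2)]

    def getNormalForm(self, lexicalForm: str) -> str:
        return f"{self._metres(lexicalForm).normalize():f} m"

    def recognisesDatatype(self, datatypeIri: str) -> bool:
        return datatypeIri in _FOREIGN

    def getRecognisedDatatypes(self) -> List[str]:
        return sorted(_FOREIGN)

    def importLiteral(self, lexicalForm: str, datatypeIri: str) -> str:
        # e.g. "5" typed as dbpdt:kilometre becomes "5 km" in this datatype
        return f"{Decimal(lexicalForm):f} {_FOREIGN[datatypeIri]}"

    def exportLiteral(self, lexicalForm: str, datatypeIri: str) -> str:
        factor = _FACTORS[_FOREIGN[datatypeIri]]
        return f"{(self._metres(lexicalForm) / factor).normalize():f}"

    def _value(self, lexicalForm: str, datatypeIri: Optional[str]) -> Decimal:
        if datatypeIri is None or datatypeIri == self._iri:
            return self._metres(lexicalForm)
        return self._metres(self.importLiteral(lexicalForm, datatypeIri))

    def compare(self, lexicalForm1: str, lexicalForm2: str,
                datatypeIri2: Optional[str] = None) -> int:
        a = self._metres(lexicalForm1)
        b = self._value(lexicalForm2, datatypeIri2)
        return (a > b) - (a < b)  # assumed convention: -1, 0, or 1

    def isEqual(self, lexicalForm1: str, lexicalForm2: str,
                datatypeIri2: Optional[str] = None) -> bool:
        return self.compare(lexicalForm1, lexicalForm2, datatypeIri2) == 0


length = ToyLength("http://example.org/datatypes#length")  # hypothetical IRI
print(length.getNormalForm("1.50 km"))
print(length.compare("4 m", "5", DBPDT + "kilometre"))
```

Using Decimal rather than binary floating point keeps comparisons such as "100 cm" against "1m" exact in this sketch.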

3.3 Intra-conformance Constraints for interface CustomDatatype

Let da be the specification object of a custom datatype Da identified by IRI aaa, i.e., an instance of interface CustomDatatype returned by a call to method getDatatype(aaa). da is intra-conformant if and only if all of the following are true:

Method getIri MUST be such that:
Method isWellFormed MUST be such that:
Method getNormalForm MUST be such that:
Method recognisesDatatype MUST be such that:
Method getRecognisedDatatypes MUST be such that:
Method importLiteral MUST be such that:
Method exportLiteral MUST be such that:
Method isEqual MUST be such that:
Method compare MUST be such that:

3.4 Extra-conformance Constraints for interface CustomDatatype

Let da be the specification object of a custom datatype Da identified by IRI aaa. da is extra-conformant if and only if, for every bbb in da.getRecognisedDatatypes(), all of the following are true:

Method importLiteral MUST be such that:
Method exportLiteral MUST be such that:
Method isEqual MUST be such that:
Method compare MUST be such that:

3.5 Inter-conformance Constraints for interface CustomDatatype

Let da be the specification object of a custom datatype Da identified by IRI aaa. da is inter-conformant if and only if, for every bbb in da.getRecognisedDatatypes() for which a specification object db of the custom datatype Db identified by IRI bbb can be retrieved, db is conformant and all of the following are true:

Method importLiteral MUST be such that:
Method isEqual MUST be such that:
Method compare MUST be such that:

3.6 Conformance Constraints for interface CustomDatatype

Let da be the specification object of a custom datatype Da identified by IRI aaa. da is conformant if and only if it is intra-conformant, extra-conformant, and inter-conformant.

A. Implementations

A.1 Implementation of Custom Datatype Length

The implementation of Custom Datatype Length is available at URL http://w3id.org/lindt/v1/custom_datatypes

A.2 Implementation in the Jena RDF processor and the ARQ SPARQL engine

The implementation of Jena and ARQ with support for on-the-fly recognition of custom datatypes is available at URL https://github.com/thesmartenergy/jena

B. Experiments

B.1 Resources

The dataset files are based on the DBpedia 2014 English specific mapping-based properties dataset.

The test program is bundled in the Maven project DBpediaLengthQueries.

B.2 SPARQL Queries

The following SPARQL queries [sparql11-query] are evaluated, one on each dataset.

Dataset dbpedia
          PREFIX dbpdt: <http://dbpedia.org/datatype/>
          PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
          SELECT ?x ?prop ?length ?metres WHERE {
            VALUES (?factor ?unit)
            { (0.001 dbpdt:millimetre)
              (0.01 dbpdt:centimetre)
              (1 dbpdt:metre)
              (1000 dbpdt:kilometre)
            }
            ?x ?prop ?length .
            BIND (?factor*xsd:decimal(?length) as ?metres)
            FILTER( datatype(?length) = ?unit )
            FILTER( ?metres < 5 )
          }
          ORDER BY DESC (?metres)
          LIMIT 100
                
Dataset custom
          PREFIX cdt: <http://w3id.org/lindt/v1/custom_datatypes#>
          SELECT ?x ?prop ?length WHERE {
            ?x ?prop ?length .
            FILTER(datatype(?length) = cdt:length )
            FILTER( ?length < "5m"^^cdt:length )
          } 
          ORDER BY DESC (?length)
          LIMIT 100
                
Dataset qudt
          PREFIX qudt: <http://qudt.org/schema/qudt#>
          PREFIX qudt-unit: <http://qudt.org/vocab/unit#>
          SELECT ?x ?prop ?length (?factor*?length as ?metres) WHERE {
            VALUES (?factor ?unit)
            { (0.001 qudt-unit:millimetre)
              (0.01 qudt-unit:centimetre)
              (1 qudt-unit:metre)
              (1000 qudt-unit:kilometre)
            }
            ?x ?prop [
              qudt:quantityValue [
                qudt:numericValue ?length ;
                qudt:unit ?unit ] ] .
            FILTER( ?factor*?length < 5 )
          }
          ORDER BY DESC (?metres)
          LIMIT 100
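
The arithmetic that the dbpedia and qudt queries above spell out by hand (multiply by a unit factor, keep values under 5 metres, sort descending, keep the first 100) can be sketched in Python as follows. This is an illustration only: the function name short_lengths and the sample literals are hypothetical, and the sketch mirrors the dbpedia query, where the unit is carried by the datatype IRI.

```python
from decimal import Decimal
from typing import List, Tuple

DBPDT = "http://dbpedia.org/datatype/"
# Same (factor, unit) pairs as the VALUES block of the dbpedia query.
FACTORS = {
    DBPDT + "millimetre": Decimal("0.001"),
    DBPDT + "centimetre": Decimal("0.01"),
    DBPDT + "metre": Decimal("1"),
    DBPDT + "kilometre": Decimal("1000"),
}


def short_lengths(literals: List[Tuple[str, str]],
                  threshold: Decimal = Decimal("5"),
                  limit: int = 100) -> List[Tuple[str, str, Decimal]]:
    """Return (lexical form, datatype IRI, metres) rows for lengths under threshold."""
    rows = []
    for lexical, datatype in literals:
        factor = FACTORS.get(datatype)
        if factor is None:                  # FILTER(datatype(?length) = ?unit)
            continue
        metres = factor * Decimal(lexical)  # BIND(?factor * xsd:decimal(?length))
        if metres < threshold:              # FILTER(?metres < 5)
            rows.append((lexical, datatype, metres))
    rows.sort(key=lambda row: row[2], reverse=True)  # ORDER BY DESC(?metres)
    return rows[:limit]                              # LIMIT 100


# Hypothetical sample literals, given as (lexical form, datatype IRI) pairs.
sample = [
    ("3", DBPDT + "metre"),
    ("4500", DBPDT + "millimetre"),
    ("0.006", DBPDT + "kilometre"),
    ("12", "http://example.org/other"),
]
for row in short_lengths(sample):
    print(row)
```

In the custom query, by contrast, this conversion logic does not appear in the query text at all: the comparison against "5m"^^cdt:length is left to the datatype itself.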
                

B.3 Instructions to reproduce the experiment

  1. Clone the Jena fork with support for on-the-fly recognition of custom datatypes.
  2. Run Maven clean and build on project Jena - Core.
  3. Run Maven clean and build on project Jena - ARQ.
  4. Download and unzip the Maven project DBpediaLengthQueries.
  5. Download the dataset files, which are based on the DBpedia 2014 English specific mapping-based properties dataset. Unzip them in directory DBpediaLengthQueries/dbpedia.
  6. Run Maven clean and build on project DBpediaLengthQueries. Note: the number of iterations has been reduced to 10 in the source file.
  7. Run the Java program lindt.dbpedialengthqueries.Main.
  8. Results are available in file DBpediaLengthQueries/results.txt.

C. References

C.1 Normative references

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119

C.2 Informative references

[sparql11-query]
Steven Harris; Andy Seaborne. SPARQL 1.1 Query Language. 21 March 2013. W3C Recommendation. URL: http://www.w3.org/TR/sparql11-query/