Toolbox 80 - Ingénierie et Interopérabilité des Systèmes Informatiques (I2SI)

Département Informatique et Systèmes intelligents - Institut Henri Fayol

Data interoperability and Semantics (20h)

Program

  • Fri 14/09/2018 08:00 - 12:15 - EF lecture in room S2.20, then tutorial/lab (TD/TP) in room S2.23 (ML)

  • Fri 28/09/2018 08:00 - 12:15 - EF lecture in room S2.03, then lab (TP) in room S2.23 (ML)

  • Fri 05/10/2018 08:00 - 12:15 - EF lecture in room S2.03, then lab (TP) in room S2.23 (ML)

  • Fri 19/10/2018 08:00 - 12:15 - EF lecture in room S2.03, then lab (TP) in room S2.23 (AZ)

  • Fri 26/10/2018 08:00 - 12:15 - EF lecture in room S2.03, then lab (TP) in room S2.23 (AZ)

Session 1

Download the following zip archive, which contains various documents (all file extensions were replaced by .txt). Explore these files and discuss the formats you recognize and the specific features of each document. Can you group the files by format? Do some of them seem more complicated to you? What do you think are the pros and cons of each format?

Let us review some of them.

  • XML

    • Short introduction

    • Reference specification: https://www.w3.org/TR/2006/REC-xml11-20060816/

    • Exercise: We want to exchange documents that describe the members of a given family (birthdate, height, weight), along with the history of the trips these members made during a given day, reporting the locations, times, and means of transportation (on foot, bicycle, bus, car, train, plane, boat, underground, your choice)

  • CSV

    • no standard, just many different implementation choices
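For the XML exercise above, here is a minimal sketch of how one possible family document could be built with the JDK's DOM API and serialized with indentation (which is also one way to "prettify" it). The vocabulary (`family`, `member`, `trip`, attributes such as `heightCm` or `mode`), the class name `FamilyXmlWriter`, and the sample values are hypothetical choices, not a prescribed answer; deciding between elements and attributes is part of the exercise.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class FamilyXmlWriter {

    /** Builds a small family document (hypothetical vocabulary) and returns it as indented XML. */
    static String buildFamilyXml() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();

            Element family = doc.createElement("family");
            doc.appendChild(family);

            // A member with birthdate (xsd:date lexical form), height in cm and weight in kg.
            Element member = doc.createElement("member");
            member.setAttribute("id", "alice");
            member.setAttribute("birthdate", "1990-04-12");
            member.setAttribute("heightCm", "168");
            member.setAttribute("weightKg", "61.5");
            family.appendChild(member);

            // One trip of that member: locations, times (with offset) and means of transportation.
            Element trip = doc.createElement("trip");
            trip.setAttribute("from", "Home");
            trip.setAttribute("to", "Office");
            trip.setAttribute("departure", "2018-09-14T07:30:00+02:00");
            trip.setAttribute("arrival", "2018-09-14T07:55:00+02:00");
            trip.setAttribute("mode", "bicycle");
            member.appendChild(trip);

            // Serialize with line breaks and indentation ("prettify").
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            // Indent width: Xalan-specific property, honored by the JDK's built-in transformer.
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Cannot build the XML document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildFamilyXml());
    }
}
```

The same `Transformer` trick can be used to re-indent any existing document, by parsing it into a DOM first.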

Practical session

Validate the documents you proposed with an online validator, and prettify them.

In your favorite text editor, look for tools (possibly additional plugins) to:

  • highlight the code

  • validate the syntax

  • prettify the code

For your favorite programming language(s), look for libraries to manipulate (export, import, prettify, validate) XML, JSON, and CSV. Develop small projects to test these libraries. Ask me for help or recommendations whenever needed.
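Before reaching for a CSV library, it helps to see why `line.split(",")` is not enough. The sketch below splits one record following the quoting conventions described in RFC 4180 (an informational RFC, not a standard): a field may be enclosed in double quotes, and a doubled quote inside a quoted field stands for one quote character. It assumes a comma separator and no line breaks inside quoted fields; the class name `CsvRecordParser` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvRecordParser {

    /** Splits one CSV line into fields, with RFC 4180-style quoting and a comma separator. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes: one literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are plain data inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("alice,\"Saint-Etienne, France\",\"say \"\"hi\"\"\",168"));
    }
}
```

Libraries such as Apache Commons CSV or OpenCSV handle these cases, and the many dialect variations (separator, quote character, line breaks in fields, header rows), for you.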

Develop a class that computes something from your document (for example, the average height of the family members).
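As one illustration, here is a sketch that parses an XML document with the JDK's DOM parser and computes the average height of the family members. It assumes a hypothetical vocabulary where each family member is a `member` element carrying a `heightCm` attribute; adapt the element and attribute names to the document you designed.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class FamilyStatistics {

    /** Returns the average of the heightCm attributes of all <member> elements. */
    static double averageHeightCm(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList members = doc.getElementsByTagName("member");
            if (members.getLength() == 0) {
                throw new IllegalArgumentException("No <member> element found");
            }
            double sum = 0;
            for (int i = 0; i < members.getLength(); i++) {
                Element member = (Element) members.item(i);
                sum += Double.parseDouble(member.getAttribute("heightCm"));
            }
            return sum / members.getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot parse the document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<family>"
                + "<member id='alice' heightCm='168'/>"
                + "<member id='bob' heightCm='182'/>"
                + "</family>";
        System.out.println(averageHeightCm(xml)); // 175.0
    }
}
```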

Important
Add, as the first child of the root element of your document, an element with information about the developers of the document, and upload the document to the Google Drive of this session.

Session 2

During this session, you will read and learn from the reference specifications, and test them with Java projects. Using the Maven build tool, and possibly Maven plugins, is recommended. You will gather everything in a report that you will send me at the end of the session. You are left (almost) alone to learn new things from online resources that you have to find yourself, because, well, that’s how we all learn new things at the end of the day. Improve your ability to learn from the original specifications, to find additional resources (slides, tutorials), to cross-check information, and to learn through projects.

You will reuse and enhance the XML examples from the last session.

XML Datatypes 1.1 (starting point: the XML Schema Definition Language 1.1 Datatypes spec)

  • Generalities

    • Learn about datatypes, value and lexical spaces, and fundamental and constraining facets

  • Times

    • Read and understand the specification of XML Datatypes 1.1 for Dates and Times (primitive types and other built-in datatypes)

    • Read and understand how java.text.SimpleDateFormat patterns are formed

    • Learn how to use the Java 8 java.time package (dive into the package documentation and the documentation of its classes and their methods)

    • Develop methods to generate XML dateTimeStamp literals from instances of java.time.Instant, and to parse such literals back into instances of java.time.Instant

    • Develop a method to parse an XML duration and add it to an XML dateTimeStamp

  • base64Binary (the profile pictures of the family members)

    • Read and understand the specification of XML base64Binary literals. The specification references another specification (an RFC). Browse it and compare Base64 with Base32 and Base16

    • Find and learn how to use relevant libraries to encode/decode BaseXX data from an OutputStream/InputStream in Java

    • Develop a method to read a picture from disk and generate its Base64 representation

    • What are the pros and cons of these encodings? When and why is it worth embedding the pictures in an XML document rather than sending them as separate documents? Btw, how would you do this?

  • Quantity values

    • Learn about the forthcoming javax.measure API (Units of Measurement)

    • Find and browse the UCUM (Unified Code for Units of Measure) specification

    • Develop methods to manipulate lengths using the javax.measure API and the UCUM specification (find and use the relevant APIs)

    • Learn about our proposed Custom Datatypes

    • Propose datatypes for (i) quantity values for alternating-current electricity (complex values, either with real/imaginary parts or with modulus/phase), and (ii) position and orientation in a frame of reference (Cartesian and polar coordinate systems)

    • Write a library to manipulate, compare, and exchange literals of these datatypes in an interoperable manner (ideally, also generate appropriate Javadoc and a website for your library)
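To make the lexical/value space distinction from the Generalities item concrete: several literals can denote the same value, and Java's own equality or parsing does not always follow XSD semantics. The sketch below (hypothetical class name `LexicalVsValue`) compares `xsd:decimal` literals in the value space and parses the four `xsd:boolean` literals. It only trims surrounding whitespace, and `BigDecimal` also accepts exponent notation (e.g. `1E3`), which is not a valid `xsd:decimal` literal, so a real implementation should check the lexical form first.

```java
import java.math.BigDecimal;

final class LexicalVsValue {

    /** xsd:decimal: "1.0", "1" and "01.00" are different literals for the same value. */
    static boolean sameDecimalValue(String lexical1, String lexical2) {
        // BigDecimal.equals also compares the scale ("1.0" vs "1"), so compare with compareTo.
        return new BigDecimal(lexical1.trim()).compareTo(new BigDecimal(lexical2.trim())) == 0;
    }

    /** xsd:boolean has four literals; Boolean.parseBoolean only recognizes "true". */
    static boolean parseXsdBoolean(String lexical) {
        switch (lexical.trim()) {
            case "true": case "1": return true;
            case "false": case "0": return false;
            default: throw new IllegalArgumentException("Not an xsd:boolean literal: " + lexical);
        }
    }

    public static void main(String[] args) {
        System.out.println(new BigDecimal("1.0").equals(new BigDecimal("1"))); // false
        System.out.println(sameDecimalValue("1.0", "01.00"));                 // true
        System.out.println(Boolean.parseBoolean("1"));                        // false
        System.out.println(parseXsdBoolean("1"));                             // true
    }
}
```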
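For the Times items, here is a possible starting point, assuming a time zone is always present (as `xsd:dateTimeStamp` requires). It converts between `java.time.Instant` and dateTimeStamp literals, and delegates the addition of an `xsd:duration` to `javax.xml.datatype.XMLGregorianCalendar.add`, whose Javadoc refers to the "adding durations to dateTimes" algorithm of XML Schema Part 2. The class name `XsdTime` is hypothetical. Known gaps: `OffsetDateTime.parse` rejects some valid XSD literals (e.g. `24:00:00`, more than nine fractional digits), and the generated literal is always in UTC.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

final class XsdTime {

    private static final DatatypeFactory FACTORY = newFactory();

    private static DatatypeFactory newFactory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Instant to xsd:dateTimeStamp literal, always in UTC, e.g. 2018-09-28T06:00:00Z. */
    static String toDateTimeStamp(Instant instant) {
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    /** xsd:dateTimeStamp literal ("Z" or "+hh:mm" time zone required) to Instant. */
    static Instant parseDateTimeStamp(String lexical) {
        return OffsetDateTime.parse(lexical.trim()).toInstant();
    }

    /** Adds an xsd:duration (e.g. "PT4H15M", "P1M") to an xsd:dateTimeStamp literal. */
    static String addDuration(String dateTimeStamp, String duration) {
        XMLGregorianCalendar calendar = FACTORY.newXMLGregorianCalendar(dateTimeStamp.trim());
        calendar.add(FACTORY.newDuration(duration.trim())); // mutates the calendar in place
        return calendar.toXMLFormat();
    }

    public static void main(String[] args) {
        Instant start = parseDateTimeStamp("2018-09-28T08:00:00+02:00");
        System.out.println(toDateTimeStamp(start)); // 2018-09-28T06:00:00Z
        System.out.println(addDuration("2018-09-28T08:00:00+02:00", "PT4H15M"));
    }
}
```

Using `XMLGregorianCalendar` for the addition avoids choosing between `java.time.Duration` (days and time only) and `java.time.Period` (years, months, days), since an `xsd:duration` may mix both.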
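For the base64Binary items, the JDK provides `java.util.Base64` since Java 8 (Base32 is not in the JDK; Apache Commons Codec has a `Base32` class, for instance). The sketch below (hypothetical class name `BinaryEncodings`, hypothetical default file name `profile.jpg`) reads a picture, produces its Base64 text, shows a streaming variant based on `Base64.Encoder.wrap`, and computes a Base16 (hexadecimal, as in `xsd:hexBinary`) form so the output sizes can be compared.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

final class BinaryEncodings {

    /** Base64 text without line breaks, usable as an xsd:base64Binary literal. */
    static String base64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Base16 (upper-case hexadecimal), usable as an xsd:hexBinary literal. */
    static String base16(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Reads a picture from disk and returns its Base64 representation. */
    static String pictureToBase64(Path picture) throws IOException {
        return base64(Files.readAllBytes(picture));
    }

    /** Streaming variant for large files: Base64-encodes everything read from in into out. */
    static void encode(InputStream in, OutputStream out) throws IOException {
        // Closing the wrapping stream writes the final padding and also closes out.
        try (OutputStream b64 = Base64.getEncoder().wrap(out)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                b64.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path picture = Paths.get(args.length > 0 ? args[0] : "profile.jpg"); // hypothetical file
        byte[] bytes = Files.readAllBytes(picture);
        System.out.println("raw bytes:    " + bytes.length);
        System.out.println("Base64 chars: " + base64(bytes).length()); // about 4/3 of the raw size
        System.out.println("Base16 chars: " + base16(bytes).length()); // exactly twice the raw size
    }
}
```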
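For datatype (i), one possible design is sketched below; only case (i) is covered. Everything here is a hypothetical choice made for illustration: the class name `AcQuantity`, the two lexical forms (rectangular `3+4j V` and polar `5@53.13deg V`, with `@` rather than `<` so that literals need no escaping in XML), the tolerance-based comparison, and the fixed-precision output form. The unit is kept as an opaque UCUM code string instead of being handled by a javax.measure implementation.

```java
import java.util.Locale;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical datatype for alternating-current quantities (phasors) with two lexical forms. */
final class AcQuantity {

    private static final String NUM = "[+-]?\\d+(?:\\.\\d+)?";
    // rectangular: real part, signed imaginary part, "j", space, UCUM unit code
    private static final Pattern RECTANGULAR =
            Pattern.compile("(" + NUM + ")([+-]\\d+(?:\\.\\d+)?)j (\\S+)");
    // polar: modulus, "@", phase in degrees, "deg", space, UCUM unit code
    private static final Pattern POLAR =
            Pattern.compile("(" + NUM + ")@(" + NUM + ")deg (\\S+)");

    final double re;    // real part
    final double im;    // imaginary part
    final String unit;  // UCUM code, e.g. "V" or "A"

    AcQuantity(double re, double im, String unit) {
        this.re = re;
        this.im = im;
        this.unit = Objects.requireNonNull(unit);
    }

    /** Lexical-to-value mapping: accepts either lexical form. */
    static AcQuantity parse(String lexical) {
        String s = lexical.trim();
        Matcher m = RECTANGULAR.matcher(s);
        if (m.matches()) {
            return new AcQuantity(Double.parseDouble(m.group(1)),
                    Double.parseDouble(m.group(2)), m.group(3));
        }
        m = POLAR.matcher(s);
        if (m.matches()) {
            double modulus = Double.parseDouble(m.group(1));
            double phase = Math.toRadians(Double.parseDouble(m.group(2)));
            return new AcQuantity(modulus * Math.cos(phase), modulus * Math.sin(phase), m.group(3));
        }
        throw new IllegalArgumentException("Not an AC quantity literal: " + lexical);
    }

    double modulus() { return Math.hypot(re, im); }

    double phaseDegrees() { return Math.toDegrees(Math.atan2(im, re)); }

    /** Value-space comparison: same unit, components equal within a tolerance. */
    boolean sameValueAs(AcQuantity other, double tolerance) {
        return unit.equals(other.unit)
                && Math.abs(re - other.re) <= tolerance
                && Math.abs(im - other.im) <= tolerance;
    }

    /** Value-to-lexical mapping: rectangular form with six decimals. */
    String toRectangular() {
        return String.format(Locale.ROOT, "%.6f%+.6fj %s", re, im, unit);
    }

    public static void main(String[] args) {
        AcQuantity a = parse("3+4j V");
        AcQuantity b = parse("5@53.1301deg V");
        System.out.println(a.sameValueAs(b, 1e-3)); // true: two literals, (nearly) the same value
        System.out.println(a.toRectangular());      // 3.000000+4.000000j V
        System.out.println(a.phaseDegrees());
    }
}
```

Because the polar to rectangular conversion goes through floating-point trigonometry, exact equality is not meaningful here; how to define equality and a canonical form for such datatypes is precisely one of the interoperability questions of the exercise.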

XML Schema 1.1 (starting point: the XML Schema Definition Language 1.1 spec)

  • Browse the spec and get a (shallow) understanding of it (look for external slide sets and tutorials, and cross-check them)

    • Write an XML Schema definition for the use case from two weeks ago

    • Find a Maven plugin that generates Java classes from your XML Schema during the generate-sources lifecycle phase

    • Learn about POJOs and JSR 222, i.e. JAXB (hints: SlideShare, Javadoc, jmdoudoux)
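Once you have an XML Schema, documents can be checked against it from Java with the standard `javax.xml.validation` API. The schema embedded below is a hypothetical fragment (a single `member` element with a required `xs:date` birthdate and a bounded height), not a solution to the exercise, and the class name `XsdValidation` is also hypothetical. Note that the JDK's built-in validator implements XML Schema 1.0; features specific to 1.1, such as `xs:assert`, require another processor.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class XsdValidation {

    /** Hypothetical schema fragment for one family member. */
    static final String MEMBER_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "  <xs:element name='member'>"
          + "    <xs:complexType>"
          + "      <xs:attribute name='birthdate' type='xs:date' use='required'/>"
          + "      <xs:attribute name='heightCm'>"
          + "        <xs:simpleType><xs:restriction base='xs:decimal'>"
          + "          <xs:minInclusive value='0'/><xs:maxInclusive value='300'/>"
          + "        </xs:restriction></xs:simpleType>"
          + "      </xs:attribute>"
          + "    </xs:complexType>"
          + "  </xs:element>"
          + "</xs:schema>";

    /** Returns an empty Optional if xml is valid against xsd, else the first error message. */
    static Optional<String> validate(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Invalid schema: " + e.getMessage(), e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(e.getMessage()); // without an ErrorHandler, the first error is thrown
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate(MEMBER_XSD, "<member birthdate='1990-04-12' heightCm='168'/>"));
        System.out.println(validate(MEMBER_XSD, "<member birthdate='12/04/1990'/>"));
    }
}
```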

Session 3

  • JSON-Path

    • http://jsonpath.com/

    • Exercise: From the two JSON documents about weather forecasts, write JSON Path queries to retrieve:

      • (i) the latitude and longitude

      • (ii) the icon

  • JSON-Schema

    • Browse the spec and get a (shallow) understanding of it - https://json-schema.org/understanding-json-schema/

    • Exercise: Write a JSON Schema to represent the family members and their trips (our running example)

      • Define required keys for objects, minimum and maximum values, a regular expression for the birthdate, and an enumeration for the transportation type

  • JSON-Schema - Java

    • Find and test a library to generate Java classes from a JSON Schema during the Maven generate-sources lifecycle phase
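Before embedding a regular expression in the `pattern` keyword of your JSON Schema, it is convenient to test it on sample values. JSON Schema patterns use ECMA-262 syntax and are not implicitly anchored (hence `^` and `$`); the pattern below only uses constructs shared by ECMA-262 and `java.util.regex`, and it checks the YYYY-MM-DD shape only (it still accepts 2018-02-31). The class name `TripSchemaHelpers` and the enum values, which mirror the means of transportation listed in Session 1, are hypothetical; the sketch also generates the `enum` part of the schema from the Java enum so that the two stay in sync.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class TripSchemaHelpers {

    /** Candidate value for the "pattern" of the birthdate (in JSON, write each \ as \\). */
    static final String BIRTHDATE_REGEX =
            "^\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])$";
    private static final Pattern BIRTHDATE = Pattern.compile(BIRTHDATE_REGEX);

    /** Hypothetical means of transportation. */
    enum TransportMode { FOOT, BICYCLE, BUS, CAR, TRAIN, PLANE, BOAT, UNDERGROUND }

    /** find() rather than matches(), like a JSON Schema validator: anchoring comes from ^ and $. */
    static boolean isValidBirthdate(String value) {
        return BIRTHDATE.matcher(value).find();
    }

    /** Builds the JSON Schema fragment for the transportation type from the Java enum. */
    static String transportModeSchema() {
        return Arrays.stream(TransportMode.values())
                .map(mode -> "\"" + mode.name().toLowerCase(Locale.ROOT) + "\"")
                .collect(Collectors.joining(", ", "{\"type\": \"string\", \"enum\": [", "]}"));
    }

    public static void main(String[] args) {
        System.out.println(isValidBirthdate("1990-04-12")); // true
        System.out.println(isValidBirthdate("12/04/1990")); // false
        System.out.println(transportModeSchema());
        // {"type": "string", "enum": ["foot", "bicycle", "bus", "car", "train", "plane", "boat", "underground"]}
    }
}
```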

Within a consortium, agreeing on JSON Schemas for the documents to be exchanged is one way to achieve data interoperability.

  • Learn about the FIWARE Data Models for the Smart City

  • Find the sources of the FIWARE Data Models on GitHub, with the JSON Schemas

  • Check out the Data Models for the Transportation subdomain, and write a script or program to generate the Java classes for them

Session 4

The slides for the course on Knowledge Modeling are:

Additional slide sets are available. Studying them is not mandatory.