Data Layer API

A new update of the information is available here

Only the class diagrams have been updated!

This page is a proposal specification for the LarKC Data Layer API to be implemented in the first prototype. The Data Layer API is used by all LarKC platform plugins to persistently store and exchange information. An important distinctive feature is the use of triplesets as a primitive for passing arbitrary sets of RDF statements by reference.

The specification first presents the conceptual model we aim to implement, described in plain text, and then a formal API to realize it.

Conceptual model

The LarKC data layer is designed to operate with RDF data provided by four different types of sources:

  1. RDF store
    • Data modification
    • SPARQL query
    • Pattern query
    • Tripleset [optional?]
  2. RDF published on the web
    • No data modification (not able to modify the remote content)
    • SPARQL query (possible?)
    • Pattern query
    • No tripleset (no serialization format!)
  3. RDF exposed via SPARQL endpoint
    • No data modification (SPARQL does not support it for now)
    • SPARQL query
    • No pattern query (blank nodes cannot be expressed)
    • Tripleset [optional?]
  4. Distributed Query processor (do we intend to build/integrate such a component?)
    • No data modification
    • SPARQL query executed over multiple endpoints
    • No pattern query (blank nodes cannot be expressed)
    • Tripleset [optional?]
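
The capability list above can be read as a small matrix of source types against operations. The following is a minimal sketch of that matrix, not part of the LarKC API: the class, enum, and method names are hypothetical, and the capabilities marked "optional?" or "possible?" above are assumed to be unsupported until that question is settled.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: these types are illustrative and not part of the LarKC API.
class SourceCapabilitiesSketch {
    enum Capability { DATA_MODIFICATION, SPARQL_QUERY, PATTERN_QUERY, TRIPLESET }

    // Assumption: capabilities marked "optional?" or "possible?" are left out.
    enum SourceType {
        RDF_STORE(EnumSet.of(Capability.DATA_MODIFICATION,
                Capability.SPARQL_QUERY, Capability.PATTERN_QUERY)),
        WEB_RDF(EnumSet.of(Capability.PATTERN_QUERY)),
        SPARQL_ENDPOINT(EnumSet.of(Capability.SPARQL_QUERY)),
        DISTRIBUTED_QUERY_PROCESSOR(EnumSet.of(Capability.SPARQL_QUERY));

        private final Set<Capability> capabilities;

        SourceType(Set<Capability> capabilities) {
            this.capabilities = capabilities;
        }

        boolean supports(Capability capability) {
            return capabilities.contains(capability);
        }
    }

    public static void main(String[] args) {
        // Print which source types accept triple pattern queries.
        for (SourceType type : SourceType.values()) {
            System.out.println(type + " supports pattern queries: "
                    + type.supports(Capability.PATTERN_QUERY));
        }
    }
}
```

A plugin could consult such a table before choosing between a SPARQL query and a pattern query for a given source.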

DataSet is a SPARQL RDF dataset as defined in http://www.w3.org/TR/rdf-sparql-query/#rdfDataset. It contains one or more graphs that are merged into the default graph (used to evaluate graph patterns with no specified graph name) and zero or more named graphs.

SELECT *
FROM <urn:graph:default:1>
FROM <urn:graph:default:2>
FROM NAMED <urn:graph:named:1>
FROM NAMED <urn:graph:named:2>
WHERE
{
   ?s1 ?p1 ?o1 . # selects from urn:graph:default:*
   GRAPH ?g { ?s2 ?p2 ?o2 } # selects from urn:graph:named:*
}
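
To make the scoping rules of the query above concrete, here is a minimal in-memory sketch. It is not the eu.larkc.core.data.DataSet type: the class and method names are hypothetical, and statements are modelled as plain strings for brevity. It shows that graphs listed with FROM are merged for patterns outside a GRAPH clause, while graphs listed with FROM NAMED stay separate.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model; not the eu.larkc.core.data.DataSet type.
class DataSetSketch {
    final Set<String> defaultGraphs = new LinkedHashSet<String>();
    final Set<String> namedGraphs = new LinkedHashSet<String>();

    // Statements visible to a pattern outside any GRAPH clause:
    // the merge of all graphs listed with FROM.
    Set<String> defaultStatements(Map<String, Set<String>> store) {
        Set<String> merged = new LinkedHashSet<String>();
        for (String graph : defaultGraphs) {
            Set<String> statements = store.get(graph);
            if (statements != null) {
                merged.addAll(statements);
            }
        }
        return merged;
    }

    // Statements visible to GRAPH ?g { ... }: kept separately per named graph.
    Map<String, Set<String>> namedStatements(Map<String, Set<String>> store) {
        Map<String, Set<String>> result = new LinkedHashMap<String, Set<String>>();
        for (String graph : namedGraphs) {
            Set<String> statements = store.get(graph);
            result.put(graph, statements == null
                    ? Collections.<String>emptySet() : statements);
        }
        return result;
    }

    public static void main(String[] args) {
        // A toy quad store: graph name -> statements in that graph.
        Map<String, Set<String>> store = new LinkedHashMap<String, Set<String>>();
        store.put("urn:graph:default:1", Collections.singleton("urn:a urn:p urn:b"));
        store.put("urn:graph:default:2", Collections.singleton("urn:b urn:p urn:c"));
        store.put("urn:graph:named:1", Collections.singleton("urn:c urn:p urn:d"));

        DataSetSketch ds = new DataSetSketch();
        ds.defaultGraphs.add("urn:graph:default:1");
        ds.defaultGraphs.add("urn:graph:default:2");
        ds.namedGraphs.add("urn:graph:named:1");

        System.out.println(ds.defaultStatements(store));
        System.out.println(ds.namedStatements(store));
    }
}
```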

TripleSet is a non-strict extension of the SPARQL RDF dataset that allows an arbitrary set of statements to be passed by reference. It is a non-standard SPARQL endpoint extension, so triplesets are not encoded in the SPARQL syntax but are passed as a separate endpoint argument.

SELECT *
FROM <urn:graph:default:1>
FROM <urn:graph:default:2>
FROM NAMED <urn:graph:named:1>
FROM NAMED <urn:graph:named:2>
WHERE
{
   ?s1 ?p1 ?o1 . # selects from urn:graph:default:* AND associated with <urn:tripleset:1>
   GRAPH ?g { ?s2 ?p2 ?o2 } # selects from urn:graph:named:* AND associated with <urn:tripleset:1>
}
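
The additional tripleset restriction in the query above can be sketched as a filter over labelled statements. This is an illustrative in-memory model, not the eu.larkc.core.data.TripleSet type: the class, the `LabelledStatement` helper, and the `scope` method are hypothetical names, and it assumes each statement carries the set of triplesets it is associated with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model; not the eu.larkc.core.data.TripleSet type.
class TripleSetScopeSketch {
    static final class LabelledStatement {
        final String triple;
        final String graph;
        final Set<String> tripleSets;

        LabelledStatement(String triple, String graph, String... tripleSets) {
            this.triple = triple;
            this.graph = graph;
            this.tripleSets = new HashSet<String>(Arrays.asList(tripleSets));
        }
    }

    // Keep statements that are in one of the dataset graphs
    // AND are associated with the given tripleset.
    static List<String> scope(List<LabelledStatement> store,
                              Set<String> graphs, String tripleSet) {
        List<String> visible = new ArrayList<String>();
        for (LabelledStatement st : store) {
            if (graphs.contains(st.graph) && st.tripleSets.contains(tripleSet)) {
                visible.add(st.triple);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<LabelledStatement> store = Arrays.asList(
                new LabelledStatement("urn:a urn:p urn:b", "urn:graph:default:1", "urn:tripleset:1"),
                new LabelledStatement("urn:b urn:p urn:c", "urn:graph:default:1"),
                new LabelledStatement("urn:c urn:p urn:d", "urn:graph:other", "urn:tripleset:1"));
        Set<String> graphs = new HashSet<String>(Arrays.asList("urn:graph:default:1"));

        // Only the first statement is both in the graph and in the tripleset.
        System.out.println(scope(store, graphs, "urn:tripleset:1"));
    }
}
```

In this model the tripleset acts as a second, independent selection criterion alongside the graph name, which is why it cannot be expressed with FROM clauses alone.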

Issue: Do we need to support conjunction/disjunction of triplesets? No; this will probably be done with a new SPARQL construct.

Formal API specification

This section formally presents the types that define the concepts introduced in the previous section.

eu.larkc.core.data.RdfGraph - this is an abstract type to represent an RDF multi-graph (collection of quads).

eu.larkc.core.data.TripleSetStoreConnection - this is an ORDI data connection [RDF type 1 with TripleSet support]

eu.larkc.core.data.RdfStoreConnection - this type is a minimalistic interface to be implemented by quad stores. [RDF type 1 with no TripleSet support]

eu.larkc.core.data.RemoteRdfGraph - RDF data published in a RESTful way; could be extended with microformats conversion [RDF type 2]

eu.larkc.core.data.SPARQLEndpoint - everything that might provide a SPARQLEndpointService [RDF type 3 with/without TripleSet support]

eu.larkc.core.data.DistributedQueryProcessor - query processor which operates over multiple SPARQL endpoints. This could be an integrated existing component or a LarKC implementation. [RDF type 4]

LarKC Data Layer API.png

eu.larkc.core.data.DataSet - this represents a SPARQL RDF dataset as defined in http://www.w3.org/TR/rdf-sparql-query/#rdfDataset. The type contains one or more graphs that are merged into the default graph (used to evaluate graph patterns with no specified graph name) and zero or more named graphs.

eu.larkc.core.data.TripleSet - this represents an extension of the SPARQL RDF dataset that allows an arbitrary set of statements to be passed by reference. It is a non-standard SPARQL endpoint extension, so triplesets are not encoded in the SPARQL syntax but are passed as a separate endpoint argument.

As a reminder, this is the LarKC plugin API:

LarKC plugins.png

Functionality to be considered:

1. Do we need, and is it possible, to implement a transparent mechanism to pass RdfGraph by reference or by value, depending on its size and on non-functional parameters?

2. Do we need to implement a URI/URL resolution mechanism in the data layer? Currently the DataFactory requires a URL, not a URI!
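
For the first question above, one possible shape of a transparent mechanism is a size-based policy. The sketch below is only an illustration of the idea and not a proposal for the actual API: the class name, the `Mode` enum, the `choose` method, and the threshold parameter are all hypothetical, and other non-functional parameters are ignored.

```java
// Hypothetical policy sketch; not part of the LarKC API.
class PassingModeSketch {
    enum Mode { BY_VALUE, BY_REFERENCE }

    // Assumption: graphs up to the threshold are cheap enough to copy,
    // larger ones are passed by name (URL, DataSet or TripleSet reference).
    static Mode choose(long statementCount, long threshold) {
        return statementCount <= threshold ? Mode.BY_VALUE : Mode.BY_REFERENCE;
    }

    public static void main(String[] args) {
        System.out.println(choose(10, 1000));
        System.out.println(choose(5000000L, 1000));
    }
}
```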

Examples

RDF Data Provider

This example demonstrates how to open a remote RDF graph accessible via a 303 redirect. It supports only simple triple pattern queries or retrieval of all statements.

        URL url = new URL("http://dbpedia.org/data/Rock_and_Roll:"
                        + "_an_Introduction_to_The_Velvet_Underground.rdf");
        RemoteRdf remote = DataFactory.INSTANCE.createRemoteRdf(url);
        TriplePatternQuery query = new TriplePatternQueryImpl(null,
                        new URIImpl("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
                        null);

        CloseableIterator<Statement> i = remote.search(query);
        while (i.hasNext()) {
                System.out.println(i.next());
        }

This example demonstrates how to set up a connection to a tripleset repository and create a SPARQL CONSTRUCT query, then how to add a new statement associated with a tripleset, and finally how to restrict the scope of the query to that specific tripleset.

        TripleSetStoreConnection con = DataFactory.INSTANCE
                        .createTripleSetStoreConnectionn(new URL("http://whatever.com"));

        SPARQLQuery query = DataFactory.INSTANCE
                        .createSPARQLQuery("construct {?s ?p ?o} where {?s ?p ?o}");

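        // Add a statement and associate it with the tripleset urn:ts:ts (last argument)
        con.addStatement(new URIImpl("urn:a"), new URIImpl("urn:a"),
                        new URIImpl("urn:a"), new URIImpl("urn:a"), new URIImpl("urn:ts:ts"));

        // Restrict the scope of the query to the tripleset urn:ts:ts
        query.setDataSet(DataFactory.INSTANCE.createTripleSet(new URIImpl(
                        "urn:ts:ts")));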
        RdfGraph result = con.getSPARQLEndpoint().executeGraph(query);
        CloseableIterator<Statement> b = result.getStatements();
        while (b.hasNext()) {
                System.out.println(b.next());
        }

Pass RdfGraph by value or reference

The examples can be run as a test case from https://svn.gforge.hlrs.de/svn/larkc/trunk/platform_v04/src/tests/test/eu/larkc/core/data/Examples.java

How to pass RdfGraph by value

        String topic = "http://dbpedia.org/data/Rock_and_Roll:"
                        + "_an_Introduction_to_The_Velvet_Underground.rdf";
        URL url = new URL(topic);

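        // Access RDF graph exposed according to linked data principles
        // (df is assumed to be the DataFactory.INSTANCE singleton used in the examples above)
        DataFactory df = DataFactory.INSTANCE;
        RemoteRdf remote = df.createRemoteRdf(url);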

        // Transfer all remote data
        CloseableIterator<Statement> iter = remote.getStatements();
        Set<Statement> statements = new HashSet<Statement>();
        while (iter.hasNext()) {
                Statement s = iter.next();
                statements.add(s);
        }

        // Create a RdfGraph passed by value
        URI graphName = new URIImpl(topic);
        RdfGraph graph = df.createRdfGraph(graphName, statements);

How to pass RdfGraph by HTTP reference

RDF data published on the web could be passed by reference using the URL:

        String topic = "http://dbpedia.org/data/Rock_and_Roll:"
                        + "_an_Introduction_to_The_Velvet_Underground.rdf";
        URL url = new URL(topic);

        // Access RDF graph exposed according to linked data principles
        RdfGraph remote = df.createRemoteRdf(url);

How to pass RdfGraph by DataSet/TripleSet reference

Another way to pass a collection of statements is by SPARQL dataset (limited to complete named graphs) or by TripleSet (any arbitrary set of statements).

        // Collect the names of the graphs that make up the DataSet
        String topic = "http://dbpedia.org/data/Rock_and_Roll:"
                        + "_an_Introduction_to_The_Velvet_Underground.rdf";
        URI uri = new URIImpl(topic);
        Set<URI> graphs = new HashSet<URI>();
        graphs.add(uri);

        // Create the DataSet from the graph names
        DataSet ds = df.createDataSet(graphs);

        // Create an RdfGraph that refers to the DataSet instead of copying its statements
        RdfGraph graph = df.createDataSetGraph(uri, ds);

LarkcProject/WP5/DataLayerAPI (last edited 2010-02-24 12:44:37 by VassilMomtchev)