LarKC Plug-in API - First Draft

1. Introduction

This early API is based on the discussions at the API workshop in Ljubljana, Slovenia, in August 2008. This document represents v0.0.1 of the LarKC Plug-in API. The API is at a very early stage and should not yet be used for developing plug-ins for the LarKC platform, as it will almost certainly change. No backward compatibility will be provided for any version of the LarKC API until v1.0 has been reached. You have been warned! :)

1.1. The Data Objects Present in the LarKC Platform

[Figure: The Data Objects Present in the LarKC Platform]

1.2. The API of the major Plugins in the LarKC Platform

[Figure: The API of the major Plugins in the LarKC Platform]

2. Overview

The following is an overview of each of the plugin types within the LarKC API. The aim of this is to provide more details on the UML diagram above, to provide some examples of plugins and how they would use the API, and to provide rationales for the decisions taken in the API.

Note: Throughout this document the term “user” or “end-user” is used. By this term we mean the person or entity that has submitted a SPARQL query to the LarKC platform.

3. Identify (Formerly known as Retrieve)

The role of the Identify task is to identify resources that could be used for answering the end user’s query and make the locations of these resources available to the LarKC platform. The identify component has the following interface:

    public Collection<DataSet> identify(Query theQuery);

The identify component, given a query, returns a collection of data sets, each of which is a pointer to a resource. These resources may be triple stores, RDF resources on the web, or natural language documents.
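The interface above can be illustrated with a minimal sketch. The shapes of Query and DataSet are assumptions made for illustration only (the real data objects are defined in the platform's object model), and TextQuery, Identifier and LookupIdentifier are hypothetical names that do not appear in the API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for the LarKC data objects (assumed shapes, not the real types).
interface Query {
    String getText();
}

class TextQuery implements Query {
    private final String text;
    TextQuery(String text) { this.text = text; }
    @Override public String getText() { return text; }
}

class DataSet {
    private final String location; // pointer to a resource, e.g. a URL
    DataSet(String location) { this.location = location; }
    String getLocation() { return location; }
}

// The Identify interface as listed above.
interface Identifier {
    Collection<DataSet> identify(Query theQuery);
}

// Toy identifier (hypothetical): returns every registered location whose
// keyword occurs in the query text.
class LookupIdentifier implements Identifier {
    private final Map<String, String> locationsByKeyword = new LinkedHashMap<>();

    void register(String keyword, String location) {
        locationsByKeyword.put(keyword, location);
    }

    @Override
    public Collection<DataSet> identify(Query theQuery) {
        Collection<DataSet> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : locationsByKeyword.entrySet()) {
            if (theQuery.getText().contains(entry.getKey())) {
                result.add(new DataSet(entry.getValue()));
            }
        }
        return result;
    }
}
```

Note that the identifier only returns pointers to resources; it does not load any triples itself.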

3.1. Examples

Example 1 TripleStoreIdentifier: This identifier would take a SPARQL query as input and locate triple stores that hold triples relevant to that query.

Example 2 Sindice: This identifier would take a keyword query and execute it against the Sindice search engine. The result would be a collection of URLs for RDF documents on the web that match the keyword query.

Note: The keyword query would be generated from the SPARQL query submitted to the LarKC platform. This generation would happen in a QueryTransformer (see below). Integrating Sindice into the LarKC platform would thus involve creating both an Identify and a Transform plugin. The rationale is that moving this transformation outside the Identify component encourages reuse, as other identify components could reuse the SPARQL-to-keyword transformer written for Sindice.

Example 3 Google: This identifier would take a keyword query and execute it against the Google search engine. The result would be a collection of URLs for RDF documents and natural language documents on the web.

Note: The same type of keyword transformation as in example 2 is required. Also, because the identify component finds natural language documents, a transformation from natural language documents to triples will need to be applied later in the pipeline (prior to selection).

4. Transform (Formerly Abstract)

The role of the transform task is to provide a means for transforming different items within the LarKC platform. The transform task is split across two distinct types of transform components: the QueryTransformer, which transforms queries from one format to another, and the DataSetTransformer, which transforms data sets from one format to another. The two components have the following interfaces:

QueryTransformer

    public Set<Query> transform(Query theQuery);
    public boolean canTransform(Query theQuery);

DataSetTransformer

    public DataSet transform(DataSet theDataSet);
    public boolean canTransform(DataSet theDataSet);

Note: The purpose of the canTransform method is to enable a decider component to check that a given Query or DataSet can be transformed by this plugin. This may be pure type checking, e.g. canTransform() returns true if the query is a SPARQLQuery, or it may involve further checking, e.g. canTransform() returns true only if the query is a SPARQLQuery and has a particular structure.
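As a rough illustration of the type-checking variant of canTransform, the following sketch shows a SPARQL-to-keyword QueryTransformer. The Query, SPARQLQuery and KeywordQuery classes are assumed stand-ins for the platform's data objects, and the keyword extraction (taking the local part of each prefixed name) is a deliberately naive placeholder, not a proposed algorithm.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins for the LarKC query objects (assumed shapes).
interface Query {
    String getText();
}

class SPARQLQuery implements Query {
    private final String text;
    SPARQLQuery(String text) { this.text = text; }
    @Override public String getText() { return text; }
}

class KeywordQuery implements Query {
    private final String text;
    KeywordQuery(String text) { this.text = text; }
    @Override public String getText() { return text; }
}

// The QueryTransformer interface as listed above.
interface QueryTransformer {
    Set<Query> transform(Query theQuery);
    boolean canTransform(Query theQuery);
}

// Naive transformer (hypothetical): the local names of prefixed names such as
// foaf:knows become the keywords.
class SparqlToKeywordTransformer implements QueryTransformer {
    private static final Pattern PREFIXED_NAME =
            Pattern.compile("\\b[A-Za-z][A-Za-z0-9]*:([A-Za-z][A-Za-z0-9]*)");

    @Override
    public boolean canTransform(Query theQuery) {
        // Pure type check; a stricter plugin could also inspect the query structure.
        return theQuery instanceof SPARQLQuery;
    }

    @Override
    public Set<Query> transform(Query theQuery) {
        if (!canTransform(theQuery)) {
            throw new IllegalArgumentException("Only SPARQL queries can be transformed");
        }
        Set<String> keywords = new LinkedHashSet<>();
        Matcher matcher = PREFIXED_NAME.matcher(theQuery.getText());
        while (matcher.find()) {
            keywords.add(matcher.group(1));
        }
        return Collections.<Query>singleton(new KeywordQuery(String.join(" ", keywords)));
    }
}
```

A decider would call canTransform first and only hand the query to transform when it returns true.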

4.1. Examples

Example 1 SPARQL to Keyword QueryTransformer: Certain identifiers will not accept SPARQL, so a means is needed to transform the SPARQL query sent to the LarKC platform into something meaningful for the identify component. In this example a SPARQLQuery would be the input to the transformer and the output would be a KeywordQuery, which could then be passed to an identifier, e.g. a Google or Sindice based identifier.

Example 2 SPARQL to SPARQL Terminology QueryTransformer: In order to find more resources for answering the user’s query it may be necessary to translate the terminology of the query into other vocabularies. This sort of transformer would rewrite terms based on an existing or learned mapping. A good example is a query specified in terms of the FOAF ontology. It may be desirable to express this query in terms of the ontology used by a Facebook RDF export. The transformer could thus rewrite every foaf#knows in the query to facebook#friend in order to find more data sources.

Example 3 TripleSet to TripleSet Terminology DataSetTransformer: Following on from the transformer in example 2, the resources that come back from the identifier could be expressed in terms of different ontologies, e.g. FOAF and Facebook. In order to enable selection and reasoning, the triples from the Facebook resource need to be transformed to the FOAF terminology. This transformer would thus take a stream of triples and replace all references to facebook#friend with foaf#knows.
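The terminology mapping in example 3 could look roughly like the sketch below. It assumes that a TripleSet is a kind of DataSet holding an in-memory list of triples; the Triple class and the PredicateRenamingTransformer name are hypothetical, and a real implementation would more likely work on a stream than on a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory stand-ins for the LarKC data objects (assumed shapes).
class Triple {
    final String subject;
    final String predicate;
    final String object;
    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

interface DataSet { }

class TripleSet implements DataSet {
    private final List<Triple> triples;
    TripleSet(List<Triple> triples) { this.triples = new ArrayList<>(triples); }
    List<Triple> getTriples() { return Collections.unmodifiableList(triples); }
}

// The DataSetTransformer interface as listed above.
interface DataSetTransformer {
    DataSet transform(DataSet theDataSet);
    boolean canTransform(DataSet theDataSet);
}

// Replaces one predicate with another and leaves all other triples untouched.
class PredicateRenamingTransformer implements DataSetTransformer {
    private final String from;
    private final String to;

    PredicateRenamingTransformer(String from, String to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public boolean canTransform(DataSet theDataSet) {
        return theDataSet instanceof TripleSet;
    }

    @Override
    public DataSet transform(DataSet theDataSet) {
        if (!canTransform(theDataSet)) {
            throw new IllegalArgumentException("Only triple sets can be transformed");
        }
        List<Triple> renamed = new ArrayList<>();
        for (Triple t : ((TripleSet) theDataSet).getTriples()) {
            renamed.add(t.predicate.equals(from) ? new Triple(t.subject, to, t.object) : t);
        }
        return new TripleSet(renamed);
    }
}
```

The input set is not modified; the transformer returns a new TripleSet, which keeps the original available to other plugins.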

Example 4 Natural Language Document to TripleSet DataSetTransformer: As mentioned earlier, the Identify step may find natural language documents on the web that do not contain RDF triples. This transformer would enable the generation of RDF triples from the content in the natural language document.

Note: When building a pipeline with a DataSetTransformer it is important that this transformer is placed as late in the pipeline as possible (i.e. where possible it should run after the select task rather than before it). This ensures that only the data sampled by the select task is transformed and not, say, the entire 1 billion triples within a store. Of course, in cases like Example 4 above, the transformation must happen before the select component, as otherwise there would be no triple sets available for the select component to select from.

5. Select (Could be renamed to Sample?)

The role of the select task is to take a sample of the triples that have been found by the identify task and to make this sample available to the reason task. The select component has the following interface:

    public TripleSet select(Collection<TripleSet> theTripleSets);

Thus the selector takes a collection of resources that contain triples as input and provides a single resource as output, containing the sample to be reasoned over.
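To make the shape of this interface concrete, here is one deliberately simple sketch. The sampling strategy (take at most a fixed number of triples from the start of each input set) is an arbitrary assumption chosen only for illustration and is not a strategy proposed by this draft; Triple, the in-memory TripleSet, Selector and FirstNSelector are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory stand-ins for the LarKC data objects (assumed shapes).
class Triple {
    final String subject;
    final String predicate;
    final String object;
    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

class TripleSet {
    private final List<Triple> triples;
    TripleSet(List<Triple> triples) { this.triples = new ArrayList<>(triples); }
    List<Triple> getTriples() { return Collections.unmodifiableList(triples); }
}

// The select interface as listed above.
interface Selector {
    TripleSet select(Collection<TripleSet> theTripleSets);
}

// Illustrative strategy only: keep at most perSetLimit triples from each input set.
class FirstNSelector implements Selector {
    private final int perSetLimit;

    FirstNSelector(int perSetLimit) {
        if (perSetLimit < 0) {
            throw new IllegalArgumentException("perSetLimit must be non-negative");
        }
        this.perSetLimit = perSetLimit;
    }

    @Override
    public TripleSet select(Collection<TripleSet> theTripleSets) {
        List<Triple> sample = new ArrayList<>();
        for (TripleSet set : theTripleSets) {
            List<Triple> triples = set.getTriples();
            sample.addAll(triples.subList(0, Math.min(perSetLimit, triples.size())));
        }
        return new TripleSet(sample);
    }
}
```

Whatever strategy is used, the contract is the same: many triple sets in, one sample out.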

5.1. Examples

Examples for the selector component are required here. Does anyone have ideas about how a selector would do its job? Can you give an example?

6. Reason (Formerly Infer)

The role of the reason task is to provide reasoning over the sample obtained by the select component and to provide answers to the end user’s input query. Its interface is that of a SPARQL endpoint and thus supports the four main SPARQL query forms as methods:

    public VariableBinding sparqlSelect(SPARQLQuery theQuery, TripleSet theTripleSet);
    public TripleSet sparqlConstruct(SPARQLQuery theQuery, TripleSet theTripleSet);
    public TripleSet sparqlDescribe(SPARQLQuery theQuery, TripleSet theTripleSet);
    public boolean sparqlAsk(SPARQLQuery theQuery, TripleSet theTripleSet);

6.1. Examples

Example 1 Jena: Need to talk to WP4 about this

Example 2 OWL: Need to talk to WP4 about this

Example 3 WSML2Reasoner: Need to talk to WP4 about this

Example 4 Cyc: Could Cyc provide an example here of how they would meet the four interfaces?

7. Decide

The role of the decide task is to manage the whole process of answering the user’s query, from resource identification through to reasoning. The decide component is also responsible for checking the quality of service parameters provided by the end user. If too few results are found and more time is available, the decide component may start another loop through the components in order to find more results. The interface of the decide component is the same as the interface of LarKC itself; in other words, it is a SPARQL endpoint. However, it provides additional parameters to the SPARQL entry points:

    public VariableBinding sparqlSelect(SPARQLQuery theQuery);
    public VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters);
    public VariableBinding sparqlSelect(SPARQLQuery theQuery, TripleSet theTripleSet, QoSParameters theQoSParameters);

    public TripleSet sparqlConstruct(SPARQLQuery theQuery);
    public TripleSet sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters);
    public TripleSet sparqlConstruct(SPARQLQuery theQuery, TripleSet theTripleSet, QoSParameters theQoSParameters);

    public TripleSet sparqlDescribe(SPARQLQuery theQuery);
    public TripleSet sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters);
    public TripleSet sparqlDescribe(SPARQLQuery theQuery, TripleSet theTripleSet, QoSParameters theQoSParameters);

    public boolean sparqlAsk(SPARQLQuery theQuery);
    public boolean sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters);
    public boolean sparqlAsk(SPARQLQuery theQuery, TripleSet theTripleSet, QoSParameters theQoSParameters);

    public void reportStatus();

For each of the SPARQL entry points there are three methods. The first lets the user specify just a SPARQL query, to be executed against whatever the LarKC platform can find. The second is the same but constrains LarKC by some QoSParameters. The third additionally constrains the location in which LarKC may look for triples to answer the end user’s query.
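One way to keep the three variants of each entry point consistent is to let the shorter overloads delegate to the most specific one. The sketch below shows this for sparqlAsk only. It rests on two assumptions that are not part of this draft: that a null TripleSet means "no restriction on where to look", and that QoSParameters can express "unconstrained" (the maxMillis field and the unconstrained() factory are invented for illustration). AbstractDecider is a hypothetical name.

```java
// Hypothetical stand-ins for the LarKC data objects (assumed shapes).
class SPARQLQuery {
    final String text;
    SPARQLQuery(String text) { this.text = text; }
}

class TripleSet { }

class QoSParameters {
    final long maxMillis; // invented example parameter: time budget for answering

    QoSParameters(long maxMillis) { this.maxMillis = maxMillis; }

    // Assumed convention: no time limit at all.
    static QoSParameters unconstrained() { return new QoSParameters(Long.MAX_VALUE); }
}

// Only the most specific overload has to be implemented by a concrete decider.
abstract class AbstractDecider {

    public boolean sparqlAsk(SPARQLQuery theQuery) {
        return sparqlAsk(theQuery, QoSParameters.unconstrained());
    }

    public boolean sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters) {
        // Assumed convention: null means "look wherever the platform can find data".
        return sparqlAsk(theQuery, null, theQoSParameters);
    }

    public abstract boolean sparqlAsk(SPARQLQuery theQuery, TripleSet theTripleSet,
                                      QoSParameters theQoSParameters);
}
```

The same delegation pattern would apply to sparqlSelect, sparqlConstruct and sparqlDescribe.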

The reportStatus method is currently under-specified but is intended to provide a means for plugins to report their status back to the decider, e.g. "I am nearly finished", "I have a problem I can't solve", or "I am finished". The decider can use this feedback from the plugins to aid it in making further decisions.

7.1. Examples

Example 1 Fixed order Decider: The most basic decider is one which is hard-coded to execute a given set of plugins in a given order, based on a configuration file.

Example 2 Meta reasoning based Decider: The most complicated decider is one which uses the meta-data provided by each of the plugins registered with the LarKC platform to build a pipeline that can be used to answer the user’s query and dynamically control/re-organise these components.
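The fixed order decider of example 1 might be sketched as follows, restricted to sparqlAsk for brevity. The plugins are passed to the constructor here in place of a configuration file, the data object classes are assumed stand-ins, and the step that keeps only those data sets that are already triple sets is an assumption (a fuller pipeline would run a DataSetTransformer at that point). FixedOrderDecider, Identifier, Selector and Reasoner are hypothetical names for the interfaces described in the earlier sections.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-ins for the LarKC data objects (assumed shapes).
interface Query { }

class SPARQLQuery implements Query {
    final String text;
    SPARQLQuery(String text) { this.text = text; }
}

interface DataSet { }

class TripleSet implements DataSet {
    final List<String> triples; // triples kept as plain strings for brevity
    TripleSet(List<String> triples) { this.triples = triples; }
}

// Plugin interfaces reduced to the methods this sketch needs.
interface Identifier {
    Collection<DataSet> identify(Query theQuery);
}

interface Selector {
    TripleSet select(Collection<TripleSet> theTripleSets);
}

interface Reasoner {
    boolean sparqlAsk(SPARQLQuery theQuery, TripleSet theTripleSet);
}

// Runs identify, then select, then reason, always in that order.
class FixedOrderDecider {
    private final Identifier identifier;
    private final Selector selector;
    private final Reasoner reasoner;

    FixedOrderDecider(Identifier identifier, Selector selector, Reasoner reasoner) {
        this.identifier = identifier;
        this.selector = selector;
        this.reasoner = reasoner;
    }

    public boolean sparqlAsk(SPARQLQuery theQuery) {
        Collection<DataSet> found = identifier.identify(theQuery);

        // Assumption: data sets that are not triple sets are skipped in this sketch.
        List<TripleSet> tripleSets = new ArrayList<>();
        for (DataSet dataSet : found) {
            if (dataSet instanceof TripleSet) {
                tripleSets.add((TripleSet) dataSet);
            }
        }

        TripleSet sample = selector.select(tripleSets);
        return reasoner.sparqlAsk(theQuery, sample);
    }
}
```

A meta-reasoning decider as in example 2 would replace the hard-coded sequence with a pipeline built at run time.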

8. Plugin

All of the plugin interfaces listed above are sub-interfaces of the Plugin interface, which provides methods common to all plugins. The Plugin interface is as follows:

    public String getIdentifier();
    public QoSInformation getQoSInformation();
    public void setInputQuery(Query theQuery);
    public MetaData getMetaData();

getIdentifier provides the identifier of the plugin. getQoSInformation allows the plugin to tell the decider how it executes in terms of quality of service (this is currently under-specified and needs to be refined later). setInputQuery gives every plugin access to the original query posed by the end user, which it may need for its internal function. getMetaData enables a plugin to provide metadata about its functionality, which can be used by a more complicated decide component such as the one described in example 2 of the decider section.
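The relationship between Plugin and its sub-interfaces can be sketched as below, using QueryTransformer as the example. The contents of QoSInformation and MetaData are not specified in this draft, so they are left empty or given an invented description field here; PassThroughTransformer, its identifier string and its getInputQuery accessor are hypothetical.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-ins for the LarKC data objects (assumed shapes).
interface Query { }

class QoSInformation { }

class MetaData {
    final String description; // invented field for illustration
    MetaData(String description) { this.description = description; }
}

// The Plugin interface as listed above.
interface Plugin {
    String getIdentifier();
    QoSInformation getQoSInformation();
    void setInputQuery(Query theQuery);
    MetaData getMetaData();
}

// Each plugin type extends Plugin and adds its own methods.
interface QueryTransformer extends Plugin {
    Set<Query> transform(Query theQuery);
    boolean canTransform(Query theQuery);
}

// Trivial plugin that returns every query unchanged.
class PassThroughTransformer implements QueryTransformer {
    private Query inputQuery; // the end user's original query, set by the platform

    @Override public String getIdentifier() { return "example:pass-through-transformer"; }

    @Override public QoSInformation getQoSInformation() { return new QoSInformation(); }

    @Override public void setInputQuery(Query theQuery) { this.inputQuery = theQuery; }

    @Override public MetaData getMetaData() { return new MetaData("Returns the query unchanged"); }

    Query getInputQuery() { return inputQuery; }

    @Override public boolean canTransform(Query theQuery) { return theQuery != null; }

    @Override public Set<Query> transform(Query theQuery) {
        return Collections.singleton(theQuery);
    }
}
```

Because every plugin type shares these four methods, a decider can inspect any registered plugin without knowing its concrete type.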

LarkcProject/WP5/docs/platform/ApiFirstDraft (last edited 2008-09-10 10:43:38 by BarryBishop)