LarKC WP5 Plugin Workshop

18-20 August 2008, Ljubljana (Slovenia)

Meeting minutes

During the meeting, both bottom-up and top-down approaches were discussed and combined.

Presentation of CycEur platform code

(Bottom-up approach)

Workshop participants had the opportunity to see the current status of the Cyc platform and to “play” with the code, programming their own example plugins and trying to plug them into the platform. The CycEur code is in SVN: https://svn.gforge.hlrs.de/svn/larkc (download via SVN checkout).

Cycorp Europe strongly requests that people outside WP5 avoid using the code in its current state because:

  1. it is not independently buildable without support that was available at the workshop;
  2. it is very preliminary and should not be taken to reflect anything like the final (or initial release) APIs;
  3. it is tricky to work with without the information that was given at the workshop;
  4. most of it could be replaced at any moment by a new automatic translation, and we are not yet ready to avoid overwriting other people's changes.

Working with the code in its current state outside WP5 would, in all probability, be a frustrating mistake.

Discussion on API considerations document (by Frank&Annette)

(Top-down approach)

Discussion of the LarKC plugins, their functionality and their interfaces, taking the API considerations document as a basis.

Different phases when using LarKC

  1. PLUGIN CONSTRUCTION: write a single plugin
  2. PLATFORM CONFIGURATION: combining existing plugins to solve a task
  3. PLATFORM DEPLOYMENT: execute queries using a particular plugin configuration

QUESTION: should we allow phases 2 and 3 to be mixed through dynamic registration of plugins?

Names for the people involved in each of these phases: 1 = plugin writer; 2 = configuration designer; 3 = end user.

What does the LarKC platform do (not the plugins, but the platform)

  1. Facilitate writing a single plugin
    1. data-storage and access;
    2. required I/O data-type definition (triples (pointers & concrete), triple sets, SPARQL queries, QoS info);

    3. registration facility.
  2. Configuring/combining existing plugins (= write a D-box, hence all of the single-plugin facilities plus:)
    1. express control-flow over plugins;
    2. including remote & parallel & anytime;

    3. plugin registration facility.

In summary:

How flexible should the LarKC "pipeline" be?

(See “Control flow in the pipeline” 3 pictures)

The third picture (from the Annette & Frank document) is now accepted in its option 3, that is, the one with a “meta” role for the decision component (from now on, “D-box”).

This means

QUESTION: how convinced are we about the universality of these four plugin types? Why?

After a discussion of the “Informal online definitions” of the plugins, a change of terminology was agreed and recommended. This must be communicated to the LarKC “plugin WPs”:

Retrieve => Identify (I)

Abstract => Transform (T)

Select => Select (S)

Infer => Reason (R)

Decide => Decide (D)

General functionality of LarKC

"SPARQL endpoints on steroids", steroids = justifications, QoS, anytime, large-scale,...

How to deploy LarKC

Different options:

Analyse the implications of the different combinations. Remote-execution issues are to be analysed (see action points), including how they interact with the registration of new plugins and even with dynamic registration at execution time. Think about this!

Considerations about plugins features

Are we TRANSFORMing the query and/or abstracting the data?

The current API allows the TRANSFORM component to transform both query and data;

Passing around the query

It seems that the original query must be passed around as an argument to pretty much every plugin-type.

QUESTION: if the query is being transformed, should we pass around also (or: only?) the transformed query?

Different TRANSFORMs per dataset?

Each dataset could need different transformations. This could be done either by calling different TRANSFORMs on different datasets (intelligence in the DECIDE box), or by letting the TRANSFORM plugin do different things on different datasets (intelligence in the TRANSFORM box). The choice is up to the configuration designer.
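The two options can be contrasted in a minimal sketch. Everything here is hypothetical and not taken from the CycEur code: the dataset shape (a "format" tag plus a list of items), the function names and the two toy transformations are assumptions made only for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical dataset shape: a format tag plus a list of raw items.
Dataset = Dict[str, Any]


def normalise_text(items: List[str]) -> List[str]:
    """Illustrative transformation for free-text datasets."""
    return [item.strip().lower() for item in items]


def normalise_rdf(items: List[str]) -> List[str]:
    """Illustrative transformation for RDF datasets: trim and drop blank lines."""
    return [item.strip() for item in items if item.strip()]


# Option 1: intelligence in the DECIDE box, which picks a TRANSFORM per dataset.
TRANSFORM_BY_FORMAT: Dict[str, Callable[[List[str]], List[str]]] = {
    "text": normalise_text,
    "rdf": normalise_rdf,
}


def decide_then_transform(dataset: Dataset) -> List[str]:
    transform = TRANSFORM_BY_FORMAT[dataset["format"]]
    return transform(dataset["items"])


# Option 2: intelligence in the TRANSFORM box, which branches internally.
def adaptive_transform(dataset: Dataset) -> List[str]:
    if dataset["format"] == "text":
        return normalise_text(dataset["items"])
    if dataset["format"] == "rdf":
        return normalise_rdf(dataset["items"])
    raise ValueError(f"unsupported dataset format: {dataset['format']}")
```

Both variants produce the same output for a given dataset; they differ only in where the per-dataset knowledge lives, which is the decision left to the configuration designer.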

Data storage functionality

Will the platform provide a (default) data-storage functionality? Or is this always the responsibility of each plugin?

Related: is potential data-caching functionality the responsibility of individual plugins (e.g. REASONER), or is it built into the platform?

Pairwise dependencies between plugins

Not all plugins are expected to interoperate with all other plugins; there will be mutual dependencies between plugins. Depending on its input / output, a plugin may only be combined with compatible plugins (“not any combination is permitted”), e.g. a reasoner that assumes OWL Lite will only interoperate with a RETRIEVE that only retrieves OWL Lite. @@other example??@@ The API constraints are aimed at maximising interoperability, but won't guarantee it. The API doesn't even guarantee type compatibility (syntactic interoperability), because of different subtypes for the dataset type.

QUESTION: Which kinds of data types are we going to consider? Must the data type always be RDF triples? => Not necessarily. Can we have inputs such as text documents? And outputs?
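A declared-compatibility check of this kind could look roughly as follows. This is a sketch under stated assumptions: the PluginDescription record, its field names and the can_follow function are invented for illustration and are not part of any agreed API. As noted above, passing such a check would still not guarantee real interoperability.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PluginDescription:
    """Hypothetical self-description of a plugin's data expectations."""
    name: str
    kind: str                 # e.g. "IDENTIFY", "REASON"
    consumes: FrozenSet[str]  # dataset subtypes accepted as input
    produces: FrozenSet[str]  # dataset subtypes possibly emitted


def can_follow(upstream: PluginDescription, downstream: PluginDescription) -> bool:
    """True if everything upstream may produce is accepted downstream."""
    return upstream.produces <= downstream.consumes


owl_lite_identify = PluginDescription(
    "owl-lite-identify", "IDENTIFY", frozenset(), frozenset({"OWL-Lite"}))
generic_identify = PluginDescription(
    "generic-identify", "IDENTIFY", frozenset(), frozenset({"OWL-Lite", "OWL-Full"}))
owl_lite_reasoner = PluginDescription(
    "owl-lite-reasoner", "REASON", frozenset({"OWL-Lite"}), frozenset({"OWL-Lite"}))
```

Here can_follow(owl_lite_identify, owl_lite_reasoner) holds, while generic_identify is rejected because it may emit data the reasoner does not assume.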

Division of work between plugins

Apart from the API restrictions, there are only soft rules for deciding which functionality to put in which type of plugin.

Example: Frank/Zhisheng's incremental inconsistency reasoner can be done either as a single REASONer, or as an iteration over SELECT and REASON. In order to maximise re-use of plugins we should try to make these "soft" rules as clear as possible.

Similarly, there is a choice between putting functionality in a single large plugin and splitting it up over multiple plugins of the same type.

Example: a single composite SELECT plugin vs a chain of multiple SELECTs, each selecting from the result of the previous.
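The two designs in this example can be compared in a short sketch. The triple representation (plain tuples), the two toy selection criteria and the chain helper are assumptions made for illustration only.

```python
from functools import reduce
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]
Select = Callable[[List[Triple]], List[Triple]]


def select_types(triples: List[Triple]) -> List[Triple]:
    """Small SELECT: keep only rdf:type statements."""
    return [t for t in triples if t[1] == "rdf:type"]


def select_drugs(triples: List[Triple]) -> List[Triple]:
    """Small SELECT: keep only statements whose object is ex:Drug."""
    return [t for t in triples if t[2] == "ex:Drug"]


def chain(*selects: Select) -> Select:
    """Chain SELECTs, each selecting from the result of the previous one."""
    def chained(triples: List[Triple]) -> List[Triple]:
        return reduce(lambda current, select: select(current), selects, triples)
    return chained


def composite_select(triples: List[Triple]) -> List[Triple]:
    """Single large SELECT doing both filters at once."""
    return [t for t in triples if t[1] == "rdf:type" and t[2] == "ex:Drug"]
```

The chained form makes each small SELECT reusable in other configurations; the composite form keeps the logic in one place.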

Which query language

All plugins are required to handle SPARQL; they may optionally handle other query languages or variants of SPARQL.

QoS vocabulary

There is no agreement anywhere in CS on good QoS concepts/vocab, hence we leave this open and define what we need as we go along.

First choices will be "max response time" and "min number of answers".

The QoS vocabulary of the user will differ from the QoS vocabulary of the plugins. It is the role of the DECIDEr to translate between them.

In: SPARQL query + QoS constraints

QUESTION: Shall we also include in the output the QoS parameters that have been achieved? A first tentative answer would be yes. Analyse whether this is convenient.
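A minimal sketch of what "SPARQL query + QoS constraints in, answers + achieved QoS out" might look like, using only the two first-choice parameters named above. All class, field and function names are hypothetical, and the evaluate callable stands in for whatever plugin configuration actually answers the query.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class QoSConstraints:
    max_response_time_s: float
    min_answers: int


@dataclass(frozen=True)
class QoSAchieved:
    response_time_s: float
    answers: int

    def satisfies(self, constraints: QoSConstraints) -> bool:
        return (self.response_time_s <= constraints.max_response_time_s
                and self.answers >= constraints.min_answers)


@dataclass(frozen=True)
class QueryResult:
    answers: List[str]
    achieved: QoSAchieved
    constraints_met: bool


def answer_query(sparql: str, constraints: QoSConstraints,
                 evaluate: Callable[[str], List[str]]) -> QueryResult:
    """Run the query, measure what was achieved and report it with the answers."""
    start = time.monotonic()
    answers = evaluate(sparql)
    achieved = QoSAchieved(time.monotonic() - start, len(answers))
    return QueryResult(answers, achieved, achieved.satisfies(constraints))
```

This sketch only reports the achieved values after the fact; it does not enforce the time limit while the query runs.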

Load-balancing

- load balancing for a single user (between plugins)

- load balancing for multiple users (between user jobs)

First, only deal with the single-user case.

Anytime behaviour

Possibility to have intermediate answers reported to the users, before the final answer is obtained.

Should we impose a standard "listener paradigm" on plugins so that their anytime behaviours can be composed?
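One possible shape for such a listener paradigm is sketched below. It is an assumption for discussion, not an agreed protocol: the AnytimePlugin class, the listener signature (answers so far, plus a flag marking the final answer) and the batch-based run method are all invented here.

```python
from typing import Callable, Iterable, List

# Hypothetical listener signature: (answers so far, is this the final answer?)
Listener = Callable[[List[str], bool], None]


class AnytimePlugin:
    """Reports intermediate answers to registered listeners as they arrive."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def run(self, batches: Iterable[List[str]]) -> List[str]:
        pending = list(batches)
        answers: List[str] = []
        for index, batch in enumerate(pending):
            answers.extend(batch)
            is_final = index == len(pending) - 1
            for listener in self._listeners:
                # Pass a copy so listeners cannot mutate the plugin's state.
                listener(list(answers), is_final)
        return answers
```

Because a listener is just a callable, the listener of one plugin can feed another plugin or the user interface, which is one way anytime behaviours could be composed.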

Example of a simple pipeline (see UML diagram from Mick and whiteboard picture from Barry)

Try to build the pipeline in order to identify any problem with the current definition of interfaces:

IDENTIFY (query) (SINDICE Plugin)

Find 10 RDF docs

TRANSFORM

SELECT (TripleSet[])

1 tripleSet

REASON (Triple-including “knowledge”)
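The pipeline above can be walked through in a toy sketch. None of this is the CycEur API: the Triple class and the four functions are hypothetical, the SINDICE lookup is replaced by an in-memory stub returning three documents, the query is a space-separated list of terms rather than SPARQL, and REASON applies a single subclass rule.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


TripleSet = FrozenSet[Triple]


def identify(query: str) -> List[TripleSet]:
    """IDENTIFY: stub standing in for a SINDICE-style document lookup."""
    return [
        frozenset({Triple("ex:aspirin", "rdf:type", "ex:Drug")}),
        frozenset({Triple("ex:Drug", "rdfs:subClassOf", "ex:Compound")}),
        frozenset({Triple("ex:ljubljana", "rdf:type", "ex:City")}),
    ]


def transform(docs: List[TripleSet]) -> List[TripleSet]:
    """TRANSFORM: identity here; a real plugin might change representation."""
    return list(docs)


def select(query: str, docs: List[TripleSet]) -> TripleSet:
    """SELECT: merge the documents that mention a query term into one set."""
    terms = set(query.split())
    chosen = [d for d in docs
              if any(t.subject in terms or t.obj in terms for t in d)]
    return frozenset().union(*chosen)


def reason(triples: TripleSet) -> TripleSet:
    """REASON: x type C and C subClassOf D gives x type D, to a fixpoint."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for t in list(inferred):
            if t.predicate != "rdf:type":
                continue
            for s in list(inferred):
                if s.predicate == "rdfs:subClassOf" and s.subject == t.obj:
                    new = Triple(t.subject, "rdf:type", s.obj)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return frozenset(inferred)


def run_pipeline(query: str) -> TripleSet:
    # The original query is passed to every stage that needs it.
    return reason(select(query, transform(identify(query))))
```

Even this toy version shows two of the interface questions raised above: SELECT needs the query as well as the data, and the several triple sets coming out of IDENTIFY collapse into a single triple set before REASON.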

Review of available storyboards from use cases WPs

Match storyboards with plugins pipeline

Storyboard WP7a

1. Offline - finding and processing new additional information (both texts and ontologies) as the base process to keep the repository updated. (Continuously independent of story unique knowledge collection.)

QUESTION: Shall LarKC support this offline process? => Yes.

QUESTION: Is the input of this process also a SPARQL query? Should we have another kind of interface to LarKC for this? => The input should also be a SPARQL query. SPARQL defines a CONSTRUCT query form; consider using it.

We need to write this up in detail as a LarKC scenario and sit together with Sheffield to discuss it and make it concrete. Here we don't need QoS considerations (QoS == infinite time,...), as this is offline.

2. User interaction - finding and processing additional information (both texts and ontologies) identified to develop hypotheses.

We need example user queries for this! Here we DO need QoS considerations.

3. User interaction – user explores hypotheses by formulating queries using some high-level query language

4. User interaction - enhancing knowledge bases from findings

5. User interaction – outcome meets expectations? If yes, ready; otherwise return to (2)

After this first USEFUL analysis, we must now sit down with the use case partners / WPs to make our questions and doubts concrete and resolve them.

Remember: the results of previous steps are very important for them. This raises the issue of intermediate storage. Shall we grant permission to write to the storage, in order to enlarge it with the results obtained in previous executions of the plugins? Where: in the original storage or in a separate one?

When working with new compounds, new questions are continuously generated. LarKC will be evaluated as a tool to come up with hypotheses to explain how findings are related to chemical structure, the target, etc.

Ask them what they mean by this “come up with hypotheses...”

5.2.3 Inconsistency should be found and improve the next loading. What do they mean by “improve the next loading”?

5.2.4 - Information in the repository must include provenance information. To be able to evaluate the value of a result you need to be able to know the source for used data. Do we need to keep track of the source of the original data? When we reach a conclusion, we need to know which original triple sets were involved in the process. As the process is defined now, this information would be lost in the selection step.
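One way to avoid losing the source during selection is to carry it along with each triple. The sketch below is only an illustration of that idea: the SourcedTriple record, the function names and the use of a document name as source identifier are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

RawTriple = Tuple[str, str, str]


@dataclass(frozen=True)
class SourcedTriple:
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical identifier of the original triple set


def select_with_provenance(
    triple_sets: Dict[str, List[RawTriple]],
    keep: Callable[[RawTriple], bool],
) -> List[SourcedTriple]:
    """Merge several triple sets into one, tagging each kept triple with its source."""
    selected: List[SourcedTriple] = []
    for source, triples in triple_sets.items():
        for triple in triples:
            if keep(triple):
                selected.append(SourcedTriple(*triple, source))
    return selected


def sources_used(triples: List[SourcedTriple]) -> Set[str]:
    """Which original triple sets contributed to this collection of triples?"""
    return {t.source for t in triples}
```

A REASON step would still have to propagate these tags to its conclusions for the provenance requirement to be met end to end.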

5.3 Data sources to be integrated – Currently Ontotext is supporting AstraZeneca to transform their documents into RDF triples

Other relevant plugins issues

  • Registration of plugins in the platform – review the process currently used in the Cyc platform. Think about doing a web-services registration and modelling the plugins as web services. For plugins that are not web services, think about wrapping them with a web-service interface. Have a standard wrapper / standard pattern to turn plugins into web services; consider using SA-WSDL (semantic annotations), WSDL-S. Enable invocation of remote plugins as web services (but don't require it) (AP: HLRS)
  • Self-description of plugins: metadata describing the features of the plugin – we need to have it by PM13 at the latest (for the public release). First draft to be done by CycEur; HLRS will contribute to the definition of the content that must be there. The representation of knowledge will be done by Cyc. (AP: CycEur, HLRS)

    • find place/deliverable for work on how to self-describe plugins, registering & finding available plugins on the web, in a registry, using web-service discovery, etc. (make this as OWL descriptions, as a refinement of some OpenCyc branch)

    • question: some of this information is potentially specific for particular D-boxes
  • Discussion about a top-level decider in the platform. The user can choose another decider, in which case this information must be included in the query. Should we always consider a “top-level decider” as part of the entry point of the platform?
  • Collect a QoS vocabulary/language for formulating QoS constraints. Standard vocabulary to talk about performance parameters, network topology, interconnection links,... (AP: HLRS look at such standards, compare, ... Put information in the wiki)

Plan until Amsterdam meeting, Oct 13 – Action Points

(see also D1.2.1 Timeline and Action Points)

  1. coding meetings (biweekly, Tuesdays at 15:00, starting 26th Aug.)

    1. developers meeting (Georgina: check collaborative tool, organize meeting)

    2. catchup with Sheffield(WP2) (Jose), < 26/8

    3. inform WP3 (Frank), < 26/8

    4. WP4 is already in the loop (Amsterdam)

  2. paper exercise API (< 13 Sep) (API against the use case scenarios)

    1. new version by next Tuesday (Mick) – upload to wiki and sourceforge

    2. test against small scenarios (all)

    3. test against storyboards & confront

      1. WP6 (Marko)

      2. WP7a (Amsterdam)

      3. WP7b

  3. code implementation of API (Cyc Eur) (<13 Oct)

    1. hide/replace current subL interfaces behind/with new API

  4. discussion on asynchronous events / listener protocols (Innsbruck) (“anytime behaviour”) (< 13 Sep)

  5. discussion on remote execution (Stuttgart) (< 13 Sep)

  6. make feature roadmap (Amsterdam) (< 13 Sep)

  7. Stutt meeting, Oct. 2008: start on Monday morning (13 Oct.) to do a next Plugin workshop – AP: Georgina to check facilities

  8. Possibility to have a working initial prototype for next Oct meeting? SPARQL endpoint. Look at SPARQL protocol (defined by W3C) – (CycEur + WP5 partners)

  9. Include in sourceforge the “how to build” procedure of the Cyc platform (suggestion from Stefano Bertolo) (AP: Luka)

  10. As soon as we have a stable API and platform, make a “How to write a plugin” manual for plugin developers (AP: CycEur)

  11. During Stutt meeting, present the conclusions of this meeting in a ppt in a plenary session (WP5 session plenary?)
  12. Stuttgart meeting – rediscuss / identify the right reviewers for deliverables
  13. Make timeline for the D5.1 and communicate to WP5 partners. Finalize doc and distribute to partners. Communicate date to reviewer (Hamish) (AP: Georgina)
  14. For the first external release (Month 14): consider having a periodic build with new external plugins so that people external to the project can try their plugins => lower the barrier to entry for external “users” (suggestion from Stefano Bertolo). Think about how people can contribute plugins, experiment with plugins, and call remote plugins

  15. Include the results of discussions during this meeting on interfaces, apis, architecture,... in the next deliverable of WP1, D1.2.1. See Time plan and Action Points corresponding to that deliverable!!

Plugin workshop/Meeting minutes (last edited 2008-08-25 13:12:48 by GeorginaGallizo)