LarKC WP5 Plugin Workshop
18-20 August 2008, Ljubljana (Slovenia)
Meeting minutes
During the meeting, both Bottom-up and Top-down approaches were discussed and combined.
Presentation of CycEur platform code
(Bottom-up approach)
Workshop participants had the opportunity to see the current status of the Cyc platform and “play” with the code, programming their own example plugins and trying to plug them into the platform. The CycEur code is on SVN: https://svn.gforge.hlrs.de/svn/larkc (download via SVN checkout)
Cycorp Europe strongly requests that people not on WP5 avoid using the code in its current state because: 1) it is not independently buildable without some support that was available at the workshop; 2) it is very preliminary, and should not be taken to reflect anything like the final (or initial release) APIs; 3) it is tricky to work with without the information that was given at the workshop; 4) most of it could be replaced at any moment by a new automatic translation, and we are not yet ready to avoid overwriting other people's changes. Working with the code in its current state outside WP5 would, in all probability, be a frustrating mistake.
Discussion on API considerations document (by Frank&Annette)
(Top-down approach)
Discussion of the LarKC plugins, their functionality and their interfaces, taking the API considerations document as a basis.
Different phases when using LarKC
- PLUGIN CONSTRUCTION: write a single plugin
- PLATFORM CONFIGURATION: combining existing plugins to solve a task
- PLATFORM DEPLOYMENT: execute queries using a particular plugin configuration
QUESTION: should we allow phases 2 (platform configuration) and 3 (platform deployment) to be mixed through dynamic registration of plugins?
Names for the people in each of these phases: 1 = plugin writer; 2 = configuration designer; 3 = end user.
What does the LarKC platform do (not the plugins, but the platform)
- Facilitate writing a single plugin
  - data storage and access;
  - required I/O data-type definition (triples (pointers & concrete), triple sets, SPARQL queries, QoS info);
  - registration facility.
- Configuring/combining existing plugins (= write a D-box, hence all of the single-plugin facilities plus:)
  - express control flow over plugins, including remote & parallel & anytime;
  - plugin registration facility.
In summary:
- facilities for data storage & access of individual plugins
  - including a scalable data storage layer
- facilities for data flow between plugins
  - including standard ways for passing around triples (pointers & concrete), triple sets, SPARQL queries
  - including standard ways for passing around QoS
- facilities for control flow between plugins
  - including remote & parallel & anytime
- facilities for plugin registration
How flexible should the LarKC "pipeline" be?
(See “Control flow in the pipeline” 3 pictures)
- fixed data-flow and fixed control-flow (= original proposal), or
- fixed data-flow and flexible control-flow, or
- flexible data-flow and flexible control-flow
The third picture (from the Annette & Frank document) is now accepted in its option 3, that is, the one with a “meta” role for the decision component (from now on, “D-box”).
This means
- other workflows than I;T;S;R are allowed (see acronyms below),
- multiple plugins of the same type are allowed in the same workflow,
- stacking of architectures is allowed (one plugin being realised by another LarKC-configuration)
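The three consequences listed above can be illustrated with a small Python sketch. It is only a toy under stated assumptions: plugins are modelled as plain functions over lists of integers rather than triples, and the names `Plugin`, `make_workflow`, `select_even`, `select_small` and `reason_sum` are hypothetical, not part of any LarKC API. The code that assembles the workflows plays the role of the D-box.

```python
from typing import Callable, List

# Hypothetical model: a plugin is any function from a list of items to a list of items.
Plugin = Callable[[List[int]], List[int]]


def make_workflow(steps: List[Plugin]) -> Plugin:
    """Chain plugins in the given order; the result is itself usable as a plugin."""
    def run(data: List[int]) -> List[int]:
        for step in steps:
            data = step(data)
        return data
    return run


# Toy plugins working on integers instead of triples.
def select_even(data: List[int]) -> List[int]:
    return [x for x in data if x % 2 == 0]


def select_small(data: List[int]) -> List[int]:
    return [x for x in data if x < 10]


def reason_sum(data: List[int]) -> List[int]:
    return [sum(data)]


# A workflow other than I;T;S;R, with two SELECT plugins in it.
double_select = make_workflow([select_even, select_small])
# Stacking: the inner workflow is used as a single step of an outer one.
stacked = make_workflow([double_select, reason_sum])

result = stacked(list(range(20)))
```

Because a workflow has the same shape as a plugin, stacking needs no extra machinery in this sketch; whether the real platform should treat configurations and plugins uniformly is a design question left open here.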
QUESTION: how convinced are we about the universality of these four plugin types? Why?
After a discussion of the “Informal online definitions” of the plugins, a change of terminology was agreed and recommended. This must be communicated to the LarKC “plugin WPs”:
Retrieve => Identify (I)
Abstract => Transform (T)
Select => Select (S)
Infer => Reason (R)
Decide => Decide (D)
General functionality of LarKC
"SPARQL endpoints on steroids", steroids = justifications, QoS, anytime, large-scale,...
How to deploy LarKC
Different options:
- platform installation locally OR remote
- plugin execution at the platform-site OR elsewhere
- use own plugins (after registering and uploading (if needed)) OR use other people's plugins (e.g. found in a registry).
Analyse the implications of the different combinations. Remote issues are to be analysed (see action points). Also think about how remote issues interact with the registration of new plugins, and even with dynamic registration at execution time!!
Considerations about plugins features
Are we TRANSFORMing the query and/or abstracting the data?
The current API allows the TRANSFORM component to transform both query and data.
Passing around the query
It seems that the original query must be passed around as an argument to pretty much every plugin-type.
QUESTION: if the query is being transformed, should we also (or only?) pass around the transformed query?
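One possible answer to this question, sketched in Python, is to pass around a small context object that always keeps the original query and optionally carries a transformed one. This is an assumption for discussion only; `QueryContext`, `effective` and `transform_plugin` are hypothetical names, and the "transformation" shown (appending a LIMIT clause) is just a placeholder.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class QueryContext:
    """Hypothetical argument handed to every plugin type."""
    original: str                      # the query exactly as the user posed it
    transformed: Optional[str] = None  # set by a TRANSFORM plugin, if any

    @property
    def effective(self) -> str:
        """The query a plugin should work on: transformed if present, else original."""
        return self.transformed if self.transformed is not None else self.original


def transform_plugin(ctx: QueryContext) -> QueryContext:
    """Toy TRANSFORM: returns a new context, leaving the original query untouched."""
    return replace(ctx, transformed=ctx.original + " LIMIT 10")


ctx = QueryContext("SELECT ?s WHERE { ?s ?p ?o }")
new_ctx = transform_plugin(ctx)
```

With this shape, downstream plugins can read `effective` by default while still being able to consult `original`, for example to report results in terms of the user's query.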
Different TRANSFORMs per dataset?
Each dataset could need different transformations. This could be done either by calling different TRANSFORMs on different datasets (intelligence in the DECIDE box), or by letting the TRANSFORM plugin do different things on different datasets (intelligence in the TRANSFORM box). Up to the configuration designer.
Data storage functionality
Will the platform provide a (default) data-storage functionality? Or is this always the responsibility of each plugin?
Related: is potential data-caching functionality the responsibility of individual plugins (e.g. REASONER), or built into the platform?
Pairwise dependencies between plugins
Not all plugins are expected to interoperate with all other plugins; there will be mutual dependencies between plugins. Depending on its input/output, a plugin can be combined only with compatible plugins (“not any combination is permitted”), e.g. a reasoner that assumes OWL Lite will only interoperate with a RETRIEVE that only retrieves OWL Lite. @@other example??@@ The API constraints are aimed at maximising interoperability, but won't guarantee it. The API doesn't even guarantee type compatibility (syntactic interoperability), because of different subtypes of the dataset type.
QUESTION: Which kinds of data types are we going to consider? Must the data type always be RDF triples? => Not necessarily. Can we have inputs like text documents? And outputs?
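The OWL Lite example above suggests that a configuration designer could check a plugin chain before running it. The following Python sketch assumes, purely for illustration, that each plugin declares the dataset subtype it consumes and produces as a plain string; `PluginSpec`, `incompatible_links` and the subtype labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PluginSpec:
    """Hypothetical declaration of what a plugin accepts and emits."""
    name: str
    consumes: str  # dataset subtype expected as input, e.g. "owl-lite"
    produces: str  # dataset subtype emitted as output


def incompatible_links(chain: List[PluginSpec]) -> List[str]:
    """Return the adjacent pairs whose output/input subtypes do not match."""
    problems = []
    for upstream, downstream in zip(chain, chain[1:]):
        if upstream.produces != downstream.consumes:
            problems.append(f"{upstream.name} -> {downstream.name}")
    return problems


retrieve_owl_lite = PluginSpec("retrieve_owl_lite", consumes="query", produces="owl-lite")
retrieve_any_rdf = PluginSpec("retrieve_any_rdf", consumes="query", produces="rdf")
reasoner_owl_lite = PluginSpec("reasoner_owl_lite", consumes="owl-lite", produces="bindings")

ok_chain = incompatible_links([retrieve_owl_lite, reasoner_owl_lite])
bad_chain = incompatible_links([retrieve_any_rdf, reasoner_owl_lite])
```

Exact string equality is the simplest possible rule; a real check would probably need a subtype relation (for instance, OWL Lite data is also valid RDF), which is exactly where the API alone cannot guarantee interoperability.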
Division of work between plugins
Apart from the API restrictions, there are only soft rules for deciding which functionality to put in which type of plugin.
Example: Frank/Zhisheng's incremental inconsistency reasoner can be done either as a single REASONer or as an iteration over SELECT and REASON. In order to maximise re-use of plugins we should try to make these "soft" rules as clear as possible.
Similarly, there is a choice between putting functionality in a single large plugin and splitting it up over multiple plugins of the same type.
Example: a single composite SELECT plugin vs a chain of multiple SELECTs, each selecting from the result of the previous.
Which query language
All plugins are required to handle SPARQL; they may optionally handle other query languages or variants of SPARQL.
QoS vocabulary
There is no agreement anywhere in CS on good QoS concepts/vocab, hence we leave this open and define what we need as we go along.
First choice will be "max response time" and "min number of answers".
The QoS vocabulary of the user will differ from the QoS vocabulary of the plugins. It is the role of the DECIDEr to translate between these.
In: SPARQL query + QoS constraints
QUESTION: Shall we also include in the output the QoS parameters that have been achieved? A first tentative answer would be yes. Analyse whether this is worthwhile.
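To make the "query + QoS in, answers + achieved QoS out" idea concrete, here is a minimal Python sketch using only the two initial QoS parameters mentioned above. Everything in it is an assumption for discussion: the names `QoSConstraints`, `QoSAchieved`, `answer_query` and `satisfied` are hypothetical, and the "query evaluation" simply iterates over pre-computed candidate answers until the time budget runs out.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class QoSConstraints:
    max_response_time_s: float  # "max response time"
    min_answers: int            # "min number of answers"


@dataclass(frozen=True)
class QoSAchieved:
    response_time_s: float
    answers: int


def answer_query(sparql: str, qos: QoSConstraints,
                 candidates: Iterable[str]) -> Tuple[List[str], QoSAchieved]:
    """Toy evaluation: collect candidate answers until the time budget is used up."""
    start = time.monotonic()
    results: List[str] = []
    for candidate in candidates:
        if time.monotonic() - start > qos.max_response_time_s:
            break
        results.append(candidate)
    achieved = QoSAchieved(time.monotonic() - start, len(results))
    return results, achieved


def satisfied(qos: QoSConstraints, achieved: QoSAchieved) -> bool:
    """Compare the requested QoS with what was actually delivered."""
    return (achieved.response_time_s <= qos.max_response_time_s
            and achieved.answers >= qos.min_answers)


qos = QoSConstraints(max_response_time_s=5.0, min_answers=2)
results, achieved = answer_query("SELECT ?s WHERE { ?s ?p ?o }", qos, ["a", "b", "c"])
```

Returning the achieved values alongside the answers lets the caller (or a DECIDEr) judge whether the constraints were met without having to measure anything itself.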
Load-balancing
- load balancing for single user (between plugins)
- load balancing for multiple users (between user jobs)
First, only deal with a single user.
Anytime behaviour
Possibility to have intermediate answers reported to the users, before the final answer is obtained.
Should we impose a standard "listener paradigm" on plugins so that their anytime behaviours can be composed?
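A listener paradigm of this kind could look roughly like the Python sketch below. It is an assumption for the discussion, not an agreed interface: a listener is modelled as a callback receiving the answers so far plus a flag saying whether they are final, and `Listener` and `anytime_reason` are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical listener signature: (answers_so_far, is_final) -> None
Listener = Callable[[List[str], bool], None]


def anytime_reason(candidates: Iterable[str], listener: Listener) -> None:
    """Toy anytime plugin: reports every intermediate answer set, then the final one."""
    answers: List[str] = []
    for candidate in candidates:
        answers.append(candidate)
        listener(list(answers), False)  # intermediate result, copied for safety
    listener(list(answers), True)       # final result


# A trivial listener that records every notification it receives.
events: List[Tuple[List[str], bool]] = []
anytime_reason(["x", "y"], lambda answers, final: events.append((answers, final)))
```

If every plugin both accepts a listener and can act as one, intermediate results can be pushed down a chain of plugins, which is one way their anytime behaviours might compose.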
Example of a simple pipeline (see UML diagram from Mick and whiteboard picture from Barry)
Try to build the pipeline in order to identify any problem with the current definition of interfaces:
IDENTIFY (query) (SINDICE Plugin)
Find 10 RDF docs
The KB is identified in the Identify step
TRANSFORM
NONE
SELECT (TripleSet[])
1 tripleSet
REASON (Triple-including “knowledge”)
Variable bindings
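The pipeline above can be walked through with a toy Python sketch. It does not call Sindice or any real plugin: `identify`, `select` and `reason` are hypothetical stand-ins, the "RDF documents" are hard-coded one-triple lists, TRANSFORM is omitted as in the example, and the reasoner only matches a single triple pattern.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
TripleSet = List[Triple]


def identify(query: str) -> List[TripleSet]:
    """Stand-in for the IDENTIFY step: returns 10 tiny hard-coded 'RDF documents'."""
    return [[(f"ex:doc{i}", "ex:mentions", "ex:LarKC")] for i in range(10)]


def select(triple_sets: List[TripleSet]) -> TripleSet:
    """SELECT: merge the identified documents into one triple set."""
    return [triple for triple_set in triple_sets for triple in triple_set]


def reason(pattern: Triple, triples: TripleSet) -> List[Dict[str, str]]:
    """REASON (toy): match one triple pattern; terms starting with '?' are variables."""
    bindings = []
    for triple in triples:
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable bound to two different values
                binding[term] = value
            elif term != value:
                break      # constant term does not match
        else:
            bindings.append(binding)
    return bindings


documents = identify("SELECT ?doc WHERE { ?doc ex:mentions ex:LarKC }")
triple_set = select(documents)
variable_bindings = reason(("?doc", "ex:mentions", "ex:LarKC"), triple_set)
```

Even this toy version surfaces the interface questions raised in the minutes: the query has to be handed to both IDENTIFY and REASON, and SELECT has to agree with both neighbours on what a triple set is.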
Review of available storyboards from use cases WPs
Match storyboards with plugins pipeline
Storyboard WP7a
1. Offline - finding and processing new additional information (both texts and ontologies) as the base process to keep the repository updated. (Continuously independent of story unique knowledge collection.)
QUESTION: Shall LarKC support this offline process? => Yes. QUESTION: Is the input of this process also a SPARQL query? Should we have another kind of interface to LarKC for this? => The input should also be a SPARQL query. SPARQL defines a CONSTRUCT query form; consider using it.
a. Retrieve data for LLD (retrieval)
b. Transform to appropriate representations
c. Integrate related LLD sources (reasoning)
d. Retrieve texts (retrieval)
e. Annotate texts using LLD (retrieval of new knowledge)
We need to write this up in detail for a LarKC scenario and sit together with Sheffield to discuss it and make it concrete. Here we don’t need QoS considerations (QoS == infinite time,...), as this is offline.
2. User interaction - finding and processing additional information (both texts and ontologies) identified to develop hypotheses.
a. Retrieve data for LLD (retrieval)
b. Transform to appropriate representations
c. Integrate related LLD sources (reasoning)
d. Retrieve texts (retrieval)
e. Annotate texts using LLD (retrieval of new knowledge)
We need examples of user queries for this!! Here we do need QoS considerations.
3. User interaction – the user explores hypotheses by formulating queries in some high-level query language
a. Retrieve triples matching the query (selection, retrieval)
b. Present associated information: author networks, conceptual links etc. (reasoning, retrieval)
c. Navigate to related triples and texts (reasoning, selection, retrieval)
4. User interaction - enhancing knowledge bases from findings
a. Annotate texts using LLD (retrieval of new knowledge)
b. Link to existing knowledge bases (reasoning, retrieval)
5. User interaction – does the outcome meet expectations? If yes, ready; otherwise return to (2)
After this first USEFUL analysis, we must now sit down with the use-case partners/WPs to make our questions and doubts concrete and resolve them. Remember: the results of previous steps are very important for them. This raises the issue of intermediate storage. Shall we give authorisation to write to the storage, so that it can be enlarged with the results obtained in previous executions of the plugins? Where: the original storage or a separate one?
When working with new compounds, new questions are continuously generated; LarKC will be evaluated as a tool to come up with hypotheses to explain how findings are related to chemical structure, the target, etc.
Ask them what they mean by this “come up with hypotheses...”
5.2.3 – Inconsistency should be found and improve the next loading. What do they mean by “improve the next loading”?
5.2.4 - Information in the repository must include provenance information. To be able to evaluate the value of a result you need to be able to know the source of the data used. Do we need to keep track of the source of the original data? When we reach a conclusion, we need to know which original triple sets were involved in this process.
As the process is defined now, in the selection, this info would be lost.
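One way to avoid losing this information in the selection step would be to tag each selected triple with the identifier of the triple set it came from. The Python sketch below only illustrates that idea; `SourcedTriple`, `select_with_provenance`, `sources_of` and the `urn:example:` identifiers are hypothetical, and the data is made up for the example.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]


class SourcedTriple(NamedTuple):
    """A triple that remembers which original triple set it was selected from."""
    subject: str
    predicate: str
    obj: str
    source: str


def select_with_provenance(triple_sets: Dict[str, List[Triple]],
                           predicate: str) -> List[SourcedTriple]:
    """Toy SELECT: keep triples with the given predicate and record their source."""
    selected = []
    for source, triples in triple_sets.items():
        for s, p, o in triples:
            if p == predicate:
                selected.append(SourcedTriple(s, p, o, source))
    return selected


def sources_of(triples: List[SourcedTriple]) -> Set[str]:
    """Which original triple sets contributed to a result?"""
    return {t.source for t in triples}


# Made-up input: two triple sets keyed by a hypothetical source identifier.
inputs = {
    "urn:example:setA": [("ex:c1", "ex:binds", "ex:t1"), ("ex:c1", "ex:label", "c1")],
    "urn:example:setB": [("ex:c2", "ex:binds", "ex:t1")],
}
selected = select_with_provenance(inputs, "ex:binds")
```

A reasoner working on `selected` could then report, for each conclusion, the set of sources behind the triples it used.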
5.3 Data sources to be integrated – Currently Ontotext is supporting AstraZeneca in transforming their documents into RDF triples
Other relevant plugins issues
- Registration of plugins in the platform – review the process currently used in the Cyc platform. Think about doing a Web services registration and modelling the plugins as web services. For those plugins that are not web services, think about wrapping them with a web-service interface. Have a standard wrapper / standard pattern to turn plugins into web services; consider using SA-WSDL (semantic annotations), WSDL-S. Enable invocation of remote plugins as web services (but don't require it) (AP: HLRS)
- Self-description of plugins: metadata describing the features of the plugin – we need to have it by PM13 at the latest (for the public release). First draft to be done by CycEur; HLRS will contribute to the definition of the content that must be there. The representation of knowledge will be done by Cyc. (AP: CycEur, HLRS)
- Find a place/deliverable for work on how to self-describe plugins, registering & finding available plugins on the web, in a registry, using web-service discovery, etc. (make this as OWL descriptions, as a refinement of some OpenCyc branch)
- question: some of this information is potentially specific for particular D-boxes
- Discussion about top level decider in the platform. The user can choose another decider and in this case, this info must be included in the query. Should we always consider a “top level decider” as part of the entry point of the platform??
- Collect a QoS vocabulary/language for formulating QoS constraints. Standard vocabulary to talk about performance parameters, network topology, interconnection links,... (AP: HLRS look at such standards, compare, ... Put information in the wiki)
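For the registration and self-description items above, a first paper exercise could use something like the following Python sketch. It is a placeholder for discussion only: the minutes foresee OWL descriptions and possibly web-service registration, whereas this sketch uses a plain in-memory registry, and `PluginDescription`, `Registry` and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PluginDescription:
    """Hypothetical minimal self-description of a plugin."""
    name: str
    plugin_type: str                                # IDENTIFY, TRANSFORM, SELECT, REASON or DECIDE
    query_languages: Tuple[str, ...] = ("SPARQL",)  # SPARQL is mandatory, others optional
    endpoint: str = ""                              # empty = local; otherwise a remote address


class Registry:
    """Toy in-memory plugin registry."""

    def __init__(self) -> None:
        self._plugins: Dict[str, PluginDescription] = {}

    def register(self, description: PluginDescription) -> None:
        if description.name in self._plugins:
            raise ValueError(f"plugin already registered: {description.name}")
        self._plugins[description.name] = description

    def find(self, plugin_type: str) -> List[str]:
        """Names of all registered plugins of the given type, sorted alphabetically."""
        return sorted(d.name for d in self._plugins.values()
                      if d.plugin_type == plugin_type)


registry = Registry()
registry.register(PluginDescription("toy_reasoner", "REASON"))
registry.register(PluginDescription("toy_selector", "SELECT"))
registry.register(PluginDescription("other_reasoner", "REASON",
                                    endpoint="http://example.org/plugins/other"))
reasoners = registry.find("REASON")
```

Registering at run time with the same `register` call is what "dynamic registration" would amount to in this sketch; the open question about D-box-specific metadata would show up as extra, optional fields in the description.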
Plan until Amsterdam meeting, Oct 13 – Action Points
(see also D1.2.1 Timeline and Action Points)
coding meetings (biweekly, Tuesdays at 15:00, starting 26th Aug.)
developers meeting (Georgina: check collaborative tool, organize meeting)
catchup with Sheffield(WP2) (Jose), < 26/8
inform WP3 (Frank), < 26/8
WP4 is already in the loop (Amsterdam)
paper exercise API (< 13 Sep) (API against the use case scenarios)
new version by next Tuesday (Mick) – upload to wiki and sourceforge
test against small scenarios (all)
test against storyboards & confront
WP6 (Marko)
WP7a (Amsterdam)
WP7b
code implementation of API (Cyc Eur) (<13 Oct)
hide/replace current SubL interfaces behind/with new API
discussion on asynchronous events / listener protocols (Innsbruck) (“anytime behaviour”) (< 13 Sep)
discussion on remote execution (Stuttgart) (< 13 Sep)
make feature roadmap (Amsterdam) (< 13 Sep)
Stuttgart meeting, Oct. 2008: start on Monday morning (13 Oct.) with the next Plugin workshop – AP: Georgina to check facilities
Possibility to have a working initial prototype for next Oct meeting? SPARQL endpoint. Look at SPARQL protocol (defined by W3C) – (CycEur + WP5 partners)
Include in sourceforge the “how to build” procedure of the Cyc platform (suggestion from Stefano Bertolo) (AP: Luka)
As soon as we have a stable API and platform, make a “How to write a plugin” manual for plugin developers (AP: CycEur)
- During the Stuttgart meeting, present the conclusions of this meeting as a ppt in a plenary session (WP5 plenary session?)
- Stuttgart meeting – rediscuss / identify the right reviewers for deliverables
- Make timeline for the D5.1 and communicate to WP5 partners. Finalize doc and distribute to partners. Communicate date to reviewer (Hamish) (AP: Georgina)
For the first external release (Month 14): consider having a periodic build with new external plugins so that people external to the project can try their plugins => lower the barrier to entry for external “users” (suggestion from Stefano Bertolo). Think about how people can contribute plugins, experiment with plugins, and call remote plugins.
Include the results of the discussions during this meeting on interfaces, APIs, architecture,... in the next deliverable of WP1, D1.2.1. See the time plan and action points corresponding to that deliverable!!
