2009-12-03 Joint WP1-WP2 Phone Conference
AGENDA:
- Who is there
- Brainstorming session to discuss possible ways of integrating work in these two WPs
- Discuss development of "user interests based selection plugin" (Yi Zeng)
- Discuss plug-ins developed by USFD in the context of the LarKC's use cases (Mark)
- Discuss joint work between USFD and Onto on selection methods (Johann Petrak)
- Plug-ins being developed by CEFRIEL (Emanuele)
PARTICIPANTS:
Gaston Tagni (VUA), Barry Bishop (STI), Emanuele Della Valle (CEFRIEL), Yi Zeng (WICI), Danica Damljanovic (USFD), Mark Greenwood (USFD), Johann Petrak (USFD), Lael Schooler (MPG), Jose Quesada (MPG)
INTRODUCTION:
The purpose of this joint session between members of WP1 and WP2 is to bring the work of the two WPs closer together, with the aim of designing an improved operational framework for problem solving in LarKC, which is the main goal of deliverable D1.2.2 "Improved Operational Framework". See work on D1.2.2 for more details.
In this deliverable we will attempt to map out the landscape for combining techniques from multiple disciplines for the purpose of achieving web scale reasoning. No easy task!
The platform and plug-ins we have delivered so far are too coarse-grained as can be seen from the fact that we have mostly wrapped existing components. For the future we should be looking at exposing more fine-grained elements of plug-ins for re-use, i.e. individual functions, algorithms, data-structures, etc. For example, I could imagine a deductive reasoner replacing a sorted index with a cognitively inspired selection algorithm.
The main goal of this deliverable should be to identify as many situations as possible where re-use of fine-grained elements is feasible. These situations will be identified in close consultation with the technical work packages 2, 3 and 4.
METHODOLOGY
We agreed that a good place to start would be with a survey of all LarKC personnel actively involved in plug-in creation. This will include WP 2,3,4 leaders, plus all people project-wide who write plug-ins or wrap existing components as plug-ins. We should ask them to list:
- all potentially re-usable components (and sub-components) that their plug-ins do or could expose
- all missing functionality (e.g. from platform, data layer or missing utilities) that they would have liked to use/re-use
People will need to be imaginative and should not limit themselves just to source code. Components for re-use could be anything from database structures and java classes, to 3rd party libraries and services.
CONCRETE DISCUSSION
The purpose of these discussions is to get insights into the specific plug-ins being developed in each WP (WP2/3/4) and to better understand the type of problems/tasks these plug-ins address. This should show which features can be reused among plug-ins and what functionality the plug-ins expect from the LarKC platform in order to achieve their goals. On the basis of these discussions we should be able to come up with a matrix of functionality/features for each plug-in and then integrate them into a coherent operational framework.
To make things more concrete, we would like to know which plug-ins are being developed in these WPs and for each of them things like:
- What is the functionality they provide?
- What is the specific task/problem being solved by the plug-in?
- How is the (selection/reasoning/transformation) problem (and the solution implemented by the plug-in) modeled? This should help in identifying common features among plug-ins that can be abstracted into the operational framework.
- What features/functions do these plug-ins expect/require from the platform? E.g. to be able to get feedback from the execution of other plug-ins, or to be able to invoke other plug-ins without the need to send a request to a decider plug-in.
- What other issues are relevant for each plug-in? E.g. trust, data privacy, etc.
MINUTES:
Notes from Barry/Gaston:
- Yi: Capture context of user, i.e. save information about how they interact with a system. Then use this information when executing a query by using the context to re-write the query. (Re-uses other WP2 components).
- Yi: User's preferences are specified as constraints in the SPARQL query and extracted by the plug-in upon execution.
- Barry: Perhaps we can exploit OWLIM's ability to attach any number of labels to any statement (triple), e.g. a simple case of attaching a username to triples that the user retrieves often.
- Lael: Is it possible to put activation data (e.g. in the form of labels) not only on a single triple but on a molecule, i.e. a set of triples or subgraph? Molecules may have their own activation rather than computing it as a function of the activation of their constituent triples.
- Gaston: molecules could be explicitly defined or "emerge" and change as a function of time and usage.
- Johann: How to evaluate Yi's approach?
- Yi: initial evaluation may consist in comparing the results of a "standard" selection (query w/o user's profile) with those obtained from a selection based on the re-written SPARQL query.
- Yi: Context capture is temporal and recent context is used to modify queries. Field study with users - 100% positive feedback.
- Mark: WP7b - select MEDLINE articles relevant to SNIPs. Currently only small pieces of text are used, but the plan is to expand to full abstracts, perhaps using keywords to select docs and then an abstract search.
- Barry: Perhaps text search can be pushed into the data-layer/triple-store? Can this be done using SPARQL? Are there current techniques for doing this efficiently?
- Emanuele: Add timestamp to triples => quads, query engine uses a window-based, stream-based selection, very high-bandwidth
- Emanuele: Push filter down to stream management system (even aggregation). For notes on evaluation and more details on the work on this plug-in see: D2.6.1 (Strategies & Design for Selection methods from Stream management Systems) and D3.1 (Survey of relevant literature on abstraction and learning), D3.3 (Description of strategy and design for Data Stream Management approaches)
- Yi: A minor addition to the note on what I said. The context is captured considering both the frequency and the recency of user interests. User interests decay continuously if they do not show up for some time, and the decay mechanism is very much like the forgetting mechanism of cognitive memory, so a power-law-like model is adopted. This is one strategy that we developed around February and March. We have since developed more strategies to extract interests considering different factors, such as how long an interest lasts.
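The frequency-and-recency idea in Yi's note can be illustrated with a minimal sketch. It assumes a power-law form in which each past occurrence of an interest contributes age^(-decay) to its score; the function name, the exponent 0.5 and the day-based time unit are illustrative assumptions, not details of WICI's actual model.

```python
def interest_score(occurrence_times, now, decay=0.5):
    """Sum a power-law-decayed contribution for each past occurrence.

    occurrence_times: times (e.g. in days) at which the interest showed up.
    now: current time, in the same unit.
    decay: hypothetical decay exponent; larger values forget faster.
    """
    score = 0.0
    for t in occurrence_times:
        age = now - t
        if age <= 0:
            continue  # ignore occurrences that are not strictly in the past
        score += age ** (-decay)
    return score


# An interest seen often and recently outranks one seen once, long ago.
recent_and_frequent = interest_score([7, 8, 9], now=10)
old_and_rare = interest_score([1], now=10)
```

With this form, an interest that stops showing up never drops to zero but keeps losing weight, which is the forgetting-like behaviour described in the note.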
PLUG-IN DESCRIPTION
WICI's selection plug-in
(updated on Dec 22nd, 2009)
Functionality provided by the plug-in:
- We are in the process of developing a user interests based selection plugin. The plugin is supposed to:
- extract users’ previous interests from the RDF in the data layer using various interest acquisition models proposed by WICI;
- before a specific query from a user arrives, gradually select RDF subsets based on user interests (each time, the top 1-2 interests are involved);
- when the query from the user arrives, query the selected RDF subsets one by one, and provide the results to the user one subset at a time.
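The select-then-query steps listed above could look roughly like the following sketch. It is not the plug-in's implementation: triples are modeled as plain tuples, a "query" is reduced to a predicate over a single triple, and the names `select_subsets`, `query_subsets` and `batch_size` are hypothetical.

```python
def select_subsets(triples, ranked_interests, batch_size=2):
    """Before the query arrives: build one RDF subset per batch of top interests.

    triples: iterable of (subject, predicate, object) tuples.
    ranked_interests: interest terms, highest-ranked first.
    batch_size: how many top interests drive each selection step.
    """
    triples = list(triples)
    seen = set()
    subsets = []
    for start in range(0, len(ranked_interests), batch_size):
        batch = set(ranked_interests[start:start + batch_size])
        # Keep not-yet-selected triples that mention any interest in the batch.
        subset = [t for t in triples if t not in seen and batch.intersection(t)]
        seen.update(subset)
        subsets.append(subset)
    return subsets


def query_subsets(subsets, matches_query):
    """When the query arrives: answer it on each subset, one subset at a time."""
    for subset in subsets:
        yield [t for t in subset if matches_query(t)]


data = [
    ("paper1", "topic", "reasoning"),
    ("paper2", "topic", "retrieval"),
    ("paper3", "topic", "streams"),
]
subsets = select_subsets(data, ["reasoning", "streams", "retrieval"])
results = list(query_subsets(subsets, lambda t: t[1] == "topic"))
```

Because `query_subsets` is a generator, the results for the subset built from the highest-ranked interests can be shown to the user before the remaining subsets are queried.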
What is the specific task/problem being solved by the plug-in?
- Selecting more relevant RDF subsets, based on users’ previous interests, for the later querying process.
How is the (selection/reasoning/transformation) problem (and the solution implemented by the plug-in) modeled?
- We assume that a user's current query is closely related to his or her previous interests, and we observe that many user queries are very vague (they do not include the user's background information as context for the query). So selecting subsets related to user interests for the later querying process is needed. Different selection strategies are provided for the interleaving process.
What features/functions does this plug-in expect/require from the platform?
- The decider needs to pass on information about whether the user is satisfied with the search results. If not, the selector needs to change its selection strategy.
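One way to picture this requirement is the sketch below, in which the selector holds several strategies and moves to the next one whenever the decider reports that the user was not satisfied. The class and method names are hypothetical and do not correspond to the LarKC platform API; the round-robin switching policy is likewise only an assumption.

```python
class FeedbackDrivenSelector:
    """Selector that changes strategy when the decider reports dissatisfaction."""

    def __init__(self, strategies):
        if not strategies:
            raise ValueError("at least one selection strategy is required")
        self._strategies = list(strategies)
        self._current = 0

    def select(self, triples):
        """Apply the currently active selection strategy."""
        return self._strategies[self._current](triples)

    def feedback(self, satisfied):
        """Called by the decider; advance to the next strategy if unsatisfied."""
        if not satisfied:
            self._current = (self._current + 1) % len(self._strategies)


# Two toy strategies: keep the first triple only, or keep everything.
selector = FeedbackDrivenSelector([lambda ts: ts[:1], lambda ts: list(ts)])
first_try = selector.select(["t1", "t2", "t3"])
selector.feedback(satisfied=False)
second_try = selector.select(["t1", "t2", "t3"])
```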
What other issues are relevant for this plug-in? E.g. trust, data privacy, etc.
- None for now
