WP5 Sessions during LarKC Meeting – Berlin 23-25 Sept 2009
Minutes taken by: Georgina Gallizo
http://wiki.larkc.eu/BerlinMeeting09
WP5 agenda and minutes: http://wiki.larkc.eu/LarkcProject/WP5/Minutes/20090923-25
WP5 Sessions
Wednesday 23.09.2009 (09:30-11:00)
09:30-11:00 (incl. ppts + discussion) - Distributed Data Layer
Data Layer + SOA4ALL-like approach: HLRS proposal & open issues (Georgina) (20 min)
P2P systems (and their application to LarKC: Marvin lessons learned, combination with SOA4ALL approach,...) (Spyros) (20 min)
Feedback from Ontotext and open discussion within WP5
- Spyros (VUA):
JXTA is not scalable enough; it doesn’t scale beyond 20 peers. Pastry / FreePastry are better options.
- Problems with P2P approach:
- Remote joins are still a big issue
- Lack of robustness, load-balancing problems
- Naso (Onto): smooth degradation of consistency and completeness would also be interesting for LarKC. We don’t aim for 100% completeness and consistency, REMEMBER!
- Privacy: We'll not deal with it. So far no use case requires it.
- High loads in the Data Layer are foreseen not only for the number of (end-)users, but also for the number of plug-ins operating over it at the same time.
- AGREED: 3 main issues we definitely need to address:
- Replication
- Data migration (aka Scheduling, Data warming up) (this is partially related to replication as well)
- Real distribution (chopping up data and distributing it among remote / distributed nodes)
- Provenance: the current data layer allows keeping track of explicit data modifications, etc. Provenance should be done at plug-in level, i.e. the decider (or another plug-in) should make use of the tools the data layer offers for that. At platform level it is low priority for now, as no use case has requested it so far.
- Workflow Views: in Ontotext terminology they are called “named graphs”.
- Metadata graph describing the named graphs
- Possibility to get unique URI for your own workflow
- A user could get another user's graph URI if needed.
- What about a WF with branches? The decider would need to request 2 URIs to guarantee isolation
Named graph + metadata graph is equivalent to SOA4ALL Spaces => Dieter: Why call them differently?
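To make the workflow-view idea concrete, here is a minimal Java sketch of a registry that hands out one unique named-graph URI per workflow (or per branch) and records the owner in a map standing in for the metadata graph. The class name, method names and the base namespace are assumptions for illustration only; they are not part of the data layer API.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry; names and namespace are illustrative assumptions. */
final class WorkflowGraphRegistry {
    // Assumed base namespace, not a real LarKC URI scheme.
    private static final String BASE = "http://example.org/larkc/graph/";

    // Stands in for the metadata graph: named-graph URI -> owning workflow id.
    private final Map<URI, String> metadata = new ConcurrentHashMap<>();

    /** Mints a fresh, unique graph URI and records which workflow owns it. */
    URI newGraph(String workflowId) {
        URI graph = URI.create(BASE + UUID.randomUUID());
        metadata.put(graph, workflowId);
        return graph;
    }

    /** Looks up all graph URIs owned by a workflow, e.g. on behalf of another user. */
    List<URI> graphsOf(String workflowId) {
        List<URI> result = new ArrayList<>();
        for (Map.Entry<URI, String> e : metadata.entrySet()) {
            if (e.getValue().equals(workflowId)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

A workflow with two branches would simply call `newGraph` twice, so each branch writes into its own isolated graph.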
- Exposed features to Data Layer users:
- RDFdataset
- MetadataRDFgraph
- Complex Queries
Does OWLIM implement distributed query execution in the Data Layer => NO
- Currently it must be split manually by the workflow designer!!
- No plans to change it for now.
- Metadata Handling
- VOID Annotation, following Review report recommendation
- Plan for Short term (6 months): annotation of datasets
- Plan for Longer term: describe generic data profile of a plug-in
- P2P strategies – see Spyros’ ppt.
RDF storage on top of DHTs, based on content. Data distribution is not yet solved. In order to have joins, we need to have the data at the same node => overload of 1 node
- As we're doing incomplete reasoning, maybe we can live with overload and increase the time of computation in moving data through peers. We can do anytime behaviour moving data in streaming fashion.
- Implementation of the P2P mgmt overlay will be more curiosity driven than use case driven (for now). But we should investigate it.
- Start by deploying use cases and measuring performance.
- And identify bottlenecks!!
- We should define performance metrics and goals, in order to see how the p2p solution is improving the use cases and eliminating the bottlenecks.
- We need a demanding use case in terms of data:
- Urban Computing WP6 is a good candidate. Distributed data sources + big datasets:
- Traffic data (big!!!)
- Maps
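The overload problem noted under P2P strategies (content-based RDF storage on a DHT, with joins needing data on the same node) can be illustrated with a small Java sketch. It assumes a simple hash-modulo placement over a fixed peer list, which is a simplification and not how Pastry or FreePastry actually route keys; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical content-based placement; a simplification of real DHT routing. */
final class TriplePlacement {
    /** Maps a term (e.g. the join key of a triple) to a peer by hashing its content. */
    static String peerFor(String term, List<String> peers) {
        CRC32 crc = new CRC32();
        crc.update(term.getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is a non-negative long, so the remainder is a valid index.
        return peers.get((int) (crc.getValue() % peers.size()));
    }
}
```

Because every triple sharing the same key term hashes to the same peer, joins on that term are local, but a very frequent term concentrates its load on one node.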
Thursday 24.09.2009 (14:00-15:30)
WP5 Parallel Session
CODE RE-STRUCTURE
- Code restructure. Create one project per plug-in + project for platform + workflow projects
- We need to setup a process for this
- When do we release the restructured code on SourceForge?
- No parameters in the constructor. The initialization method should take these parameters.
- Next steps:
- We still need class-loaders even if we have separate folders
- For the deployment environment, without Eclipse or Ant, we need to have the class-loaders. This is something Eclipse and Ant do behind the scenes.
- For every plug-in we need to come up with:
- Jar file
- Lib folder
- Xml doc, machine readable with plug-in description + specific things of the java implementation (order in which libraries need to be loaded,…)
- For each plug-in: plugin.xml holds the configuration options to plug the plug-in into the platform, as support for plug-in developers
- PluginX.Larkc after compilation
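The per-plug-in class-loader discussed above could look roughly like the following Java sketch, which builds a `URLClassLoader` over the plug-in jar plus every jar in its lib folder. The class name `PluginLoader` and the alphabetical library order are assumptions; the minutes say the actual load order would come from the XML descriptor.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of the LarKC platform code. */
final class PluginLoader {
    /** Builds a class loader over the plug-in jar and all *.jar files in libDir. */
    static URLClassLoader loaderFor(File pluginJar, File libDir) {
        try {
            List<URL> urls = new ArrayList<>();
            urls.add(pluginJar.toURI().toURL());
            File[] libs = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
            if (libs != null) {        // listFiles returns null if libDir does not exist
                Arrays.sort(libs);     // assumed order: alphabetical, for determinism
                for (File lib : libs) {
                    urls.add(lib.toURI().toURL());
                }
            }
            // The platform's own loader is the parent, so shared API classes resolve there first.
            return new URLClassLoader(urls.toArray(new URL[0]),
                                      PluginLoader.class.getClassLoader());
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A plug-in class obtained through such a loader could then be instantiated with its no-argument constructor and configured afterwards through its initialization method.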
Server where we do the nightly builds, …: SourceForge
- We should release at least 1 version of download with only the platform
- Another download with platform and all plug-ins available
- Application downloads, with platform+plug-ins corresponding to one concrete application example
- Agreement of new structure!!!
- It'll be implemented by Alexey / Axel. Date: before ISWC.
- Mick / Alexey will prepare an email for the official migration announcement
DISTRIBUTED MODEL
- Plug-in manager:
- Handle control flow
- Handle data flow
- Handle events
- Check licensing policy of COMPs!! – AP: Alexey
- Is it mature enough??
- First experiments with more than 1 plug-in
Deadline: 1 week to clarify the licensing issue; 1 more week to make an assessment and experiment on whether it is to be used in LarKC!!
Session on Alignment of strategic issues
- www.oasis-open.org/committees/tc_home.php?wg_abbrev=semantic-ex. See email from Mick with ppt on OASIS SEE TC.
- Georgina + Barry : Plug-in annotation language including update of non-functional properties
- Ioan Toma will develop the ontology as part of his phd thesis.
- NFP: if we find out that we need to add properties, make a proposal to WSMO
- Data layer: Dieter will not accept a non aligned data layer!!
- AP: Naso will organize a workshop in Sofia (before the end of the year) on how to align the concepts of clouds and P2P structured semantic spaces with the poor reality of a single and monolithic space
- Second alignment workshop, also including SEALS, at FIS, Sept 2010, Berlin
Friday 25.09.2009 (09:30-11:30)
Plug-in State
3 general approaches
- Input
- Session object (not part of the operations of the service, but in the header)
- Service maintain a repository of states, identified with ids
One possibility would be to leave it up to the plug-ins to decide whether to be stateless or stateful.
Caching, efficiency and persistence
- AGREEMENT: we don’t want to pass state as input!
- Decider is managing the lifecycle of plug-ins
- Plug-ins can be stateful if they want. We should have a flag indicating whether they are or not (in the wsdl)
- Take context parameter in the API for this purpose!!
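The agreement above (no state passed as input, an optional stateful flag, and a context parameter in the API) might be sketched in Java as follows. The interface and class names are hypothetical and do not reflect the actual LarKC plug-in API; the counter is just a stand-in for whatever a real plug-in would cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical plug-in contract; not the actual LarKC API. */
interface SketchPlugin {
    /** The flag that would be advertised in the plug-in description (e.g. the WSDL). */
    boolean isStateful();

    /** The context id identifies the session; the state itself is never an input. */
    String invoke(String input, String contextId);
}

/** A stateful plug-in keeping its own repository of states, keyed by context id. */
final class CountingPlugin implements SketchPlugin {
    private final Map<String, Integer> callsPerContext = new HashMap<>();

    @Override
    public boolean isStateful() {
        return true;
    }

    @Override
    public String invoke(String input, String contextId) {
        // Increment the per-context call counter and tag the result with it.
        int n = callsPerContext.merge(contextId, 1, Integer::sum);
        return input + "#" + n;
    }
}
```

A stateless plug-in would return `false` from `isStateful()` and simply ignore the context id.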
Multiplexers
- Splits: same data to two plug-ins
Split half and half data => we will not consider it for now
- Join: merge data coming from two plug-ins
Simple merge followed by a reasoner plug-in? Or should we do inference as part of the merge?
Simple union and 1 tree replication.
In the data layer, inference always happens when we put data together (manifestation).
We need to look at synchronization!!
First draft:
- Split (Queue, Queue [])
- Join (Queue [], Queue)
AP: Luka will implement a first version. Update to be made by Alexey, integrated with the distributed model solution.
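The first-draft signatures `Split (Queue, Queue [])` and `Join (Queue [], Queue)` could be read as in the Java sketch below: split copies the same data to every output, and join is a simple union without inference. This is a single-threaded illustration under those assumptions; it does not address the synchronization question raised above, and the class name is hypothetical.

```java
import java.util.Queue;

/** Hypothetical multiplexer helpers; single-threaded, no synchronization. */
final class Multiplexers {
    /** Split: drains the input and copies every item to each output queue. */
    static <T> void split(Queue<T> in, Queue<T>[] outs) {
        T item;
        while ((item = in.poll()) != null) {
            for (Queue<T> out : outs) {
                out.add(item);
            }
        }
    }

    /** Join: simple union, draining the inputs one after another into the output. */
    static <T> void join(Queue<T>[] ins, Queue<T> out) {
        for (Queue<T> in : ins) {
            T item;
            while ((item = in.poll()) != null) {
                out.add(item);
            }
        }
    }
}
```

In a concurrent setting the queues would more likely be `java.util.concurrent.BlockingQueue` instances, with an explicit end-of-stream marker instead of relying on `poll()` returning `null`.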
Exceptions handling
- Currently we pass Control messages and data flows.
- Should a failing plug-in also inform the plug-ins before and after it, besides the decider?
- We need to analyse the queue-based architecture and decide whether queues will be the right way of implementation!!
- We need a demanding use case to test scalability of this design
- We need to test the event handling, making one plug-in fail.
- Make first tests with Alpha Urban LarKC.
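The event-handling test mentioned above (making one plug-in fail) could start from something like this Java sketch, where a failing stage produces a control message for the decider instead of propagating the exception. The pipeline is modelled as a plain list of functions and the message format is invented for illustration; whether neighbouring plug-ins should also be notified is left open, as in the discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical test harness; stages stand in for plug-ins. */
final class FailureProbe {
    /** Runs the stages in order and returns the control messages the decider would receive. */
    static List<String> run(List<UnaryOperator<String>> stages, String input) {
        List<String> controlMessages = new ArrayList<>();
        String data = input;
        for (int i = 0; i < stages.size(); i++) {
            try {
                data = stages.get(i).apply(data);
            } catch (RuntimeException e) {
                // Report the failure as a control message and stop the data flow.
                controlMessages.add("FAILED stage " + i + ": " + e.getMessage());
                break;
            }
        }
        return controlMessages;
    }
}
```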
Timeline
(see wiki, WP5 meeting, Berlin Session) - http://wiki.larkc.eu/LarkcProject/WP5/Minutes/20090923-25
