WP5 sessions in Milano meeting, 18-21 May 2009
Monday 18 May 2009
Plenary session: WP5 status presentation
Tuesday 19 May 2009
- 10:00 – 10:20 (20 min) - Requirements analysis: final check of the LarKC requirements (D5.3.1) and approval by WP5 partners
- o Mick: Chapter 1, sections 2.1, 2.2, 2.3, 4.3, chapter 5; Table 1, table 8, Table 10, table 14
- o Vassil: Sections 2.5.3, 2.5.4, chapter 5, Table 8, Table 9, Table 15, Table 16
- o Michael: Sections 2.5.3, 2.5.4, 4.2, 4.5, chapter 6; Table 4, table 6, Table 7, table 11
- o Gulay: Sections 2.5.8, 2.5.9, chapter 6; table 3, Table 12, Table 13
- o Luka: Sections 2.5.5, 2.5.6, 2.5.7, 2.5.9, 4.4; table 3, table 5, table 14, table 16
- o Eyal: Sections 2.4, 2.5 introduction, 2.5.1, 2.5.2, 4.1, chapter 5; table 2, table 4, table 10, table 15
- o Alexey: Sections 2.4, 2.5 introduction, 2.5.1, 2.5.2, 4.6; table 2, table 5, table 6, table 7
- o Spyros: Sections 2.5.5, 2.5.6, 2.5.7; table 9, table 11, Table 12, Table 13
- 10:20 - 10:25 (5 min) – Timeline definition for opening and announcing early adopters environment (GForge, support list,…)
- 10:25 – 11:45 (1h20 min) - Other issues related to Platform Support:
- (UBL) Merge and pull results: How to approach, in general, the merge of several SELECT plug-ins results, which should be input to a REASON plug-in (Platform vs special Plug-in vs Distributed SPARQL query)?
- Mick is implementing a solution for UBL (as part of the Platform's PSS)
- Distributed SPARQL query? (Distributed reasoning? Does this imply reasoning over different SPARQL endpoints?)
- Distributed database problem. Frank: it would be the "test" to have an incomplete solution
- 1st solution: send SPARQL queries to the different SPARQL endpoints and then join the results (REMOTE JOIN). This would give us incomplete results. Virtuoso on EC2 is doing something similar.
- 2nd solution: federated RDF store. Most expensive in terms of performance; it needs big resources.
- Data Layer should offer the utility to do that, but plug-ins must decide how to solve it (making use of that utility).
We should have WP5-Plug-ins sessions in order to have this kind of discussions.
- The data layer has to "invent" the heuristics on how to do the join! Nobody can do arbitrary distributed joins in an optimal way.
- To pass datasets (by an identifier) we need to have a SPARQL endpoint (either remote or local). Is it possible to change the implementation of the dataset and pass it not only by reference but also by value? This would solve a lot of problems.
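The "1st solution" (REMOTE JOIN) discussed above can be illustrated with a minimal sketch. It assumes each endpoint has already returned its SPARQL solutions as a list of variable bindings; the `remote_join` function, the `Binding` type and the sample data are hypothetical and not part of the LarKC platform. The result is only as complete as the partial answers that each endpoint returns, which is the incompleteness noted in the discussion.

```python
from typing import Dict, List, Tuple

Binding = Dict[str, str]  # one SPARQL solution: variable name -> value


def remote_join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Join two result sets on the variables they share (simple hash join).

    Assumption: every row within one result set binds the same variables.
    If no variable is shared, the result is the cross product.
    """
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Index the right-hand rows by the values of the shared variables.
    index: Dict[Tuple[str, ...], List[Binding]] = {}
    for row in right:
        index.setdefault(tuple(row[v] for v in shared), []).append(row)
    joined: List[Binding] = []
    for row in left:
        for match in index.get(tuple(row[v] for v in shared), []):
            joined.append({**row, **match})
    return joined


# Hypothetical results from two SELECT plug-ins / endpoints.
endpoint_a = [
    {"person": "ex:alice", "city": "ex:milano"},
    {"person": "ex:bob", "city": "ex:sofia"},
]
endpoint_b = [{"city": "ex:milano", "country": "ex:italy"}]
merged = remote_join(endpoint_a, endpoint_b)
```

Here `merged` contains only the row for `ex:alice`, because the second endpoint has no information about `ex:sofia`.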
- (UBL) Data caching: who should take care of the data caching or removing (Platform vs DECIDE)?
- data is kept in data layer between iterations. This may lead to wrong / different solutions
- Eyal:
- Data isolation (conflicts between iterations): put the data in different graphs, independent for each user / iteration. This would fill up the data layer if nobody removes it. You can put different labels on the same data to avoid duplicating it. This must be done inside the plug-in. Include guidelines in the best practices.
- Cache data in the data layer: once a user finishes its execution, delete the data. You can use lazy deletion and just mark it as deleted. This would cost less for the next user needing this data.
Clean up: add a clean-up method to the plug-in API. AP: Mick will add the method to every plug-in.
- Georgina: who should keep the state and context of all running plug-ins (should it be PPS and Plug-in managers)? Mick: Decider!
AP: Mick has some ideas. Mick will develop a first implementation of this and then we must check whether it works fine.
We still have the problem of centralization. We are putting more and more functionality into the Data Layer in a centralized approach (single point). We need to investigate alternative solutions, e.g. a small data store local to every plug-in, where it can process its data. But then we have the problem of how to ship data between plug-ins. This is something to investigate during Y2.
Query answering in a p2p system: proposal for the next meeting to have a ppt given by Spyros.
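Eyal's two suggestions (one graph per user / iteration, and lazy deletion) could look roughly like the sketch below. It is a toy in-memory stand-in for the data layer, not the LarKC API: `GraphStore`, `graph_label` and the `urn:example:` label scheme are assumptions made only for illustration.

```python
def graph_label(user, iteration):
    """Hypothetical naming scheme: one graph per user and iteration."""
    return f"urn:example:{user}:iter{iteration}"


class GraphStore:
    """Toy data layer: named graphs with lazy deletion."""

    def __init__(self):
        self._graphs = {}      # graph label -> set of triples
        self._deleted = set()  # labels marked as deleted, not yet purged

    def add(self, label, triple):
        if label in self._deleted:
            # Writing to a graph that was marked deleted starts it afresh.
            self._graphs[label] = set()
            self._deleted.discard(label)
        self._graphs.setdefault(label, set()).add(triple)

    def triples(self, label):
        """Return a copy of the graph; deleted graphs look empty."""
        if label in self._deleted:
            return set()
        return set(self._graphs.get(label, set()))

    def mark_deleted(self, label):
        """Lazy delete: hide the graph but keep its data cached."""
        if label in self._graphs:
            self._deleted.add(label)

    def restore(self, label):
        """Reuse cached data if it has not been purged yet."""
        if label in self._deleted:
            self._deleted.remove(label)
            return True
        return False

    def purge(self):
        """Clean-up step: physically remove everything marked deleted."""
        for label in self._deleted:
            del self._graphs[label]
        self._deleted.clear()
```

A clean-up method in the plug-in API, as proposed in the action point, could call something like `mark_deleted` when a user's execution ends, leaving `purge` to a later housekeeping pass.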
- (UBL) Raising and catching exceptions: general approach for raising exceptions and catching them. Who should catch plug-in exceptions (platform vs DECIDE)?
* DECIDE should catch exceptions from plug-ins and decide what to do with them (Plug-in => Platform (Plug-in Manager + PSS) => DECIDE). Create a notification system.
Think how to allow plug-ins to ask questions to the DECIDE!! => To be further discussed!! Open issue in the tracking system
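The chain Plug-in => Platform => DECIDE with a notification step might be sketched as follows. All names here (`PluginError`, `Decider`, `Platform`, the `"skip"` / `"abort"` policies) are hypothetical and chosen for illustration; the minutes do not fix any API.

```python
class PluginError(Exception):
    """Wraps whatever a plug-in raised, plus the plug-in's name."""

    def __init__(self, plugin_name, cause):
        super().__init__(f"{plugin_name} failed: {cause}")
        self.plugin_name = plugin_name
        self.cause = cause


class Decider:
    """Stand-in for DECIDE: receives notifications and picks an action."""

    def __init__(self, policy="skip"):
        self.policy = policy  # hypothetical: "skip" or "abort"
        self.notifications = []

    def notify(self, error):
        self.notifications.append(error)
        return self.policy


class Platform:
    """Stand-in for the plug-in manager: catches and forwards exceptions."""

    def __init__(self, decider):
        self.decider = decider

    def run(self, name, plugin, data):
        try:
            return plugin(data)
        except Exception as exc:
            error = PluginError(name, exc)
            action = self.decider.notify(error)
            if action == "skip":
                return None  # continue the pipeline without this result
            raise error from exc  # "abort": propagate to the caller
```

The platform never decides by itself in this sketch; it only wraps the exception and asks the decider, which matches the direction agreed in the discussion.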
- (UBL) Plug-in config files: location and accessibility of plug-in config files. Should the platform offer a repository for that purpose?
- Related to the library conflicts: where we store the libraries for a plug-in, we will also store its config files.
- relevant plug-in data for the DECIDE must be in the Meta-data
- config files are only accessible by the plug-in itself. Plug-in must ask platform where its content is located and the platform must offer a method to access it.
- Data Preparation Plug-in: shall we create a new type, shall it be TRANSFORM, other solution?
- query-driven vs data-driven. Analysis whether every plug-in type can be both
- TRANSFORM or REASONER?
AP: make a list of all plug-in types and try to figure out the data-driven and the query-driven implementation of each one. Eyal to start the discussion on the mailing list. This would clean up the definition of the API. A preparation "plug-in" is a chain of data-driven plug-ins.
- We should create an input to the LarKC platform that is query independent (currently it is only possible to have a query as an input: SPARQL endpoint on steroids)
- Other pending issues from the requirements tables (D5.3.1): monitoring and performance measurement, benchmarking and performance logging,…
- 11:45 – 12:00 (15 min) - Best practices for coding in LarKC: completeness of files, libraries conflicts, class-loaders,…
- We must freeze a stable version before the Early Adopters workshop.
- Restrict who can commit to the platform!
Create a branch for the early adopters and freeze it, next Monday afternoon. AP: Fix branch on Thursday (Alexey will do).
- Class-loaders: all libraries and config files from a plug-in will go to its own class-loader. We then need a separate project per plug-in if we want to run it with Eclipse. The class-loader will be in the plug-in manager
AP: Alexey will outline a plan on how to proceed (think about OSGi) (5 June 2010). Luka will support him in a first investigation of the class-loaders issue.
- Best practices: keep track of components, every developer to provide certain info according to Axel's email.
Wednesday 20 May 2009
- 9:00 – 9:30 (30 min) - Pre-review feedback and how WP5 is affected: should we change our direction in anything? - discussed in the plenary session, where Frank gave us feedback.
- 9:30 – 10:30 (60 min) - Distribution and remote execution: distributed containers, streaming, IBIS, cluster,…
- Split of plug-in manager:
- ship plug-in code to the remote location
- ship input data (with JavaGAT, using GridFTP or other)
- 1st step: have it running remotely at all
- 2nd step: which functionality the platform must have to decide when it is worthwhile to run it remotely.
- Anytime behaviour: platform should provide facility to check whether the result file has changed or not.
- *** We need intermediate results back and also intermediate input: streaming input and streaming output.
- How to pass from remote to remote without passing through the decider?
- Can you start the job and at runtime get the input data before having to send it in advance?
- Pull vs push data.
- We need to analyse how the data is managed in the local case and see if it works for the remote one.
- Remote plug-in manager is sent, wrapping the plug-in itself.
- *** Job is submitted when the decide plug-in builds the pipeline. For this purpose, we need to reserve the resources for a longer time. We would be wasting resources until input data arrives.
- *** Data federation. Passing data by reference. Issue for Data Layer. (complete this issue description!!)
- Two kinds of information must be passed to the plug-in: control and data
- Local plug-in manager general + Remote plug-in manager specific for the concrete environment
- *** Control flow messages
- *** Using queues: data queue + control queue (notifications are sent through this queue)
- How to solve the scheduling issue? Running a short-lived plug-in makes no sense if we need to wait 10 min to start execution.
- How to cope with unexpected issues: we suddenly need to communicate with a new plug-in or use a new data storage
- Mick is currently changing PPS and Plug-in manager for the local case. HLRS must keep an eye on the changes in order to stay in line with the remote implementation.
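The idea of a data queue plus a control queue, with streaming input and streaming output, can be sketched with Python's standard `queue` and `threading` modules. This is a local, in-process illustration only: `remote_plugin`, the `STOP` message and the doubling "work" are assumptions, and a real remote set-up would replace the in-memory queues with whatever transport is chosen.

```python
import queue
import threading

STOP = "STOP"  # hypothetical control message


def remote_plugin(data_in, control, results):
    """Consume streamed input until a STOP control message arrives.

    Intermediate results are pushed to `results` as soon as they exist,
    so the caller does not have to wait for the whole job to finish.
    """
    while True:
        # Control messages are checked first, on their own queue.
        try:
            if control.get_nowait() == STOP:
                break
        except queue.Empty:
            pass
        # Then wait briefly for the next piece of input data.
        try:
            item = data_in.get(timeout=0.05)
        except queue.Empty:
            continue
        results.put(item * 2)  # placeholder for the plug-in's real work


data_in, control, results = queue.Queue(), queue.Queue(), queue.Queue()
worker = threading.Thread(target=remote_plugin, args=(data_in, control, results))
worker.start()

for value in (1, 2, 3):
    data_in.put(value)  # input is streamed while the job is running

streamed = [results.get(timeout=2) for _ in range(3)]
control.put(STOP)  # notification sent through the control queue
worker.join(timeout=2)
```

Keeping control on a separate queue means a notification is not stuck behind a backlog of data, which is one possible reading of the "data queue + control queue" point above.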
- 10:30 – 11:30 (60 min) - General Platform discussion: *** Not discussed during the meeting finally ***
- Platform as a Server vs. SW component
- Extra support Always on vs. Only when needed
- Plug-in registry: how much it depends on Cyc, how registration and discovery work, degree of automation and how much plug-in developers should learn about it, …
- Use of Semantic Web Service technology
- 11:30 – 11:50 (20 min) - AOB
- 11:50 – 12:00 (10 min) - Summary Next Steps
Thursday 21 May 2009
1st LarKC review rehearsal => cancelled. Brought forward to the previous days.
