WP5 sessions in Milano meeting, 18-21 May 2009

Monday 18 May 2009

Tuesday 19 May 2009

  1. 10:00 – 10:20 (20 min) - Requirements analysis: final check of the LarKC requirements (D5.3.1) and approval by WP5 partners
    1. Mick: Chapter 1, Sections 2.1, 2.2, 2.3, 4.3, Chapter 5; Tables 1, 8, 10, 14
    2. Vassil: Sections 2.5.3, 2.5.4, Chapter 5; Tables 8, 9, 15, 16
    3. Michael: Sections 2.5.3, 2.5.4, 4.2, 4.5, Chapter 6; Tables 4, 6, 7, 11
    4. Gulay: Sections 2.5.8, 2.5.9, Chapter 6; Tables 3, 12, 13
    5. Luka: Sections 2.5.5, 2.5.6, 2.5.7, 2.5.9, 4.4; Tables 3, 5, 14, 16
    6. Eyal: Sections 2.4, 2.5 introduction, 2.5.1, 2.5.2, 4.1, Chapter 5; Tables 2, 4, 10, 15
    7. Alexey: Sections 2.4, 2.5 introduction, 2.5.1, 2.5.2, 4.6; Tables 2, 5, 6, 7
    8. Spyros: Sections 2.5.5, 2.5.6, 2.5.7; Tables 9, 11, 12, 13
  2. 10:20 - 10:25 (5 min) – Timeline definition for opening and announcing early adopters environment (GForge, support list,…)
  3. 10:25 – 11:45 (1h20 min) - Other issues related to Platform Support:
    • (UBL) Merge and pull results: how should we approach, in general, merging the results of several SELECT plug-ins that are the input to a REASON plug-in (Platform vs. special plug-in vs. distributed SPARQL query)?
      • Mick is implementing a solution for UBL (as part of the Platform's PSS).
      • Distributed SPARQL query? (Distributed reasoning? Does this imply reasoning over different SPARQL endpoints?)
        • Distributed database problem. Frank: it would be the "test" to have an incomplete solution.
        • 1st solution: send SPARQL queries to the different SPARQL endpoints and then join the results (REMOTE JOIN). This would give us incomplete results. Virtuoso on EC2 is doing something similar.
        • 2nd solution: federated RDF store. This is the most expensive option in terms of performance and needs big resources.
        • The Data Layer should offer the utility to do that, but plug-ins must decide how to solve it (making use of that utility).
        • We should have WP5-Plug-ins sessions in order to have this kind of discussion.

        • data layer has to "invent" the heuristics on how to do the join! Nobody can do arbitrary distributed joins in an optimal way.
      • To pass datasets (by an identifier) we need a SPARQL endpoint (either remote or local). Is it possible to change the implementation of the dataset so that it can be passed not only by reference but also by value? This would solve a lot of problems.
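The "remote join" option above can be illustrated with a minimal sketch. It assumes each endpoint has already returned its variable bindings as a list of maps (variable name to value); the class and method names are hypothetical and not part of the LarKC API, and no real SPARQL calls are made.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: joins binding sets that were fetched separately
// from two SPARQL endpoints. A row is a map from variable name to value.
class RemoteJoinSketch {

    /** Nested-loop natural join on the variables the two sides share. */
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                if (compatible(l, r)) {
                    Map<String, String> merged = new HashMap<>(l);
                    merged.putAll(r);
                    out.add(merged);
                }
            }
        }
        return out;
    }

    /** Two rows are compatible if every shared variable has the same value. */
    static boolean compatible(Map<String, String> l, Map<String, String> r) {
        for (Map.Entry<String, String> e : l.entrySet()) {
            String other = r.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

Each endpoint answers only from its own data, which is one way to read the "incomplete results" caveat noted above. The nested loop is chosen for brevity; the heuristics for doing such joins efficiently are the open question raised in the discussion.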
    • (UBL) Data caching: who should take care of the data caching or removing (Platform vs DECIDE)?
      • data is kept in data layer between iterations. This may lead to wrong / different solutions
      • Eyal:
        • Data isolation (conflicts between iterations): put the data in different graphs, independent for each user/iteration. This would fill up the data layer if nobody removes it. You can put different labels on the same data to avoid duplicating it. This must be done inside the plug-in. Include guidelines in the best practices.
        • Cache data in the data layer: once a user finishes its execution, delete the data. Deletion can be lazy: just mark the data as deleted. This would cost less for the next user needing the same data.
        • Clean-up: add a clean-up method to the plug-in API. AP: Mick will add the method to every plug-in.

      • Georgina: who should keep the state and context of all running plug-ins (should it be the PPS and the plug-in managers)? Mick: the Decider!
      • AP: Mick has some ideas. He will develop a first implementation, and then we must check whether it works well.

      • We still have the problem of centralization. We are putting more and more functionality into the Data Layer in a centralized approach (single point). We need to investigate alternative solutions, e.g. a small data store local to every plug-in, where it can process its data. But then we have the problem of how to ship data between plug-ins. This is something to investigate during Y2.

      • Query answering in a P2P system: proposal for the next meeting to have a presentation given by Spyros.
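Eyal's suggestions in the data caching discussion above (one labelled graph per user/iteration, lazy deletion, an explicit clean-up step) can be sketched as follows. This is only an in-memory illustration under assumed names: `GraphCacheSketch` and its methods are hypothetical, and triples are modelled as plain strings rather than through the Data Layer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of per-iteration graphs with lazy deletion.
class GraphCacheSketch {
    private final Map<String, Set<String>> graphs = new HashMap<>(); // label -> triples
    private final Set<String> deleted = new HashSet<>();             // labels marked as deleted

    /** Stores triples under a label that is unique to one user/iteration. */
    void put(String label, Set<String> triples) {
        graphs.put(label, new HashSet<>(triples));
        deleted.remove(label);
    }

    /** Lazy delete: only marks the graph, the data itself stays in place. */
    void markDeleted(String label) {
        if (graphs.containsKey(label)) {
            deleted.add(label);
        }
    }

    /** Normal read: graphs marked as deleted are hidden (returns null). */
    Set<String> get(String label) {
        return deleted.contains(label) ? null : graphs.get(label);
    }

    /** Revives a marked graph for a later user; returns null if it is gone. */
    Set<String> reuse(String label) {
        Set<String> graph = graphs.get(label);
        if (graph != null) {
            deleted.remove(label);
        }
        return graph;
    }

    /** Clean-up: physically removes all marked graphs; returns how many. */
    int purge() {
        int removed = deleted.size();
        graphs.keySet().removeAll(deleted);
        deleted.clear();
        return removed;
    }
}
```

In this sketch the `purge` call plays the role of the clean-up method proposed for the plug-in API; who calls it (platform or DECIDE) is exactly the question left open in the minutes.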

    • (UBL) Raising and catching exceptions: general approach for raising exceptions and catching them. Who should catch plug-in exceptions (Platform vs. DECIDE)?
      • DECIDE should catch exceptions from plug-ins and decide what to do with them (Plug-in => Platform (Plug-in Manager + PSS) => DECIDE). Create a notification system.

      • Think how to allow plug-ins to ask questions to the DECIDE!! => To be further discussed!! Open issue in the tracking system
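The exception route described above (Plug-in => Platform => DECIDE, with a notification system) could look roughly like the sketch below. All names are hypothetical stand-ins, not the LarKC API: a plug-in is modelled as a `Callable`, and the decider registers a listener with the plug-in manager.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch: the platform catches plug-in exceptions and
// notifies the decider, which then chooses how to react.
class ExceptionRoutingSketch {

    /** Callback a decider registers in order to hear about failures. */
    interface FailureListener {
        void onPluginFailure(String pluginName, Exception cause);
    }

    /** Stand-in for the plug-in manager: runs plug-ins, forwards failures. */
    static class PluginManager {
        private final List<FailureListener> listeners = new ArrayList<>();

        void addListener(FailureListener listener) {
            listeners.add(listener);
        }

        /** Runs the plug-in; on failure notifies listeners and returns null. */
        <T> T invoke(String pluginName, Callable<T> plugin) {
            try {
                return plugin.call();
            } catch (Exception e) {
                for (FailureListener listener : listeners) {
                    listener.onPluginFailure(pluginName, e);
                }
                return null;
            }
        }
    }
}
```

Returning `null` on failure is a simplification made here; the same listener channel could, in principle, also carry the questions from plug-ins to DECIDE that remain an open issue.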

    • (UBL) Plug-in config files: location and accessibility of plug-in config files. Should the platform offer a repository for that purpose?
      • Related to the library conflicts: wherever we store the libraries for a plug-in, we will also store its config files.
      • Plug-in data relevant to DECIDE must be in the meta-data.
      • Config files are only accessible by the plug-in itself. The plug-in must ask the platform where its content is located, and the platform must offer a method to access it.
    • Data Preparation Plug-in: shall we create a new type, shall it be TRANSFORM, other solution?
      • Query-driven vs. data-driven: analyse whether every plug-in type can be both.
      • TRANSFORM or REASONER?
      • AP: make a list of all plug-in types and try to figure out the data-driven and the query-driven implementation of each one. Eyal will start the discussion on the mailing list. This would clean up the definition of the API. A preparation "plug-in" is a chain of data-driven plug-ins.

      • We should create an input to the LarKC platform that is query-independent (currently it is only possible to have a query as an input: SPARQL endpoint on steroids).
    • Other pending issues from the requirements tables (D5.3.1): monitoring and performance measurement, benchmarking and performance logging,…
  4. 11:45 – 12:00 (15 min) - Best practices for coding in LarKC: completeness of files, library conflicts, class-loaders,…
    • We must freeze a stable version before the Early Adopters workshop.
    • Restrict who can commit to the platform!
    • Create a branch for the early adopters and freeze it, next Monday afternoon. AP: Fix branch on Thursday (Alexey will do).

    • Class-loaders: all libraries and config files of a plug-in will go into its own class-loader. We then need a separate project per plug-in if we want to run it with Eclipse. The class-loader will be in the plug-in manager.
    • AP: Alexey will outline a plan on how to proceed (think about OSGi) (5 June 2010). Luka will support him in carrying out initial research on the class-loader issue.

    • Best practices: keep track of components, every developer to provide certain info according to Axel's email.
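The class-loader arrangement discussed above (one loader per plug-in, holding its libraries and config files) can be sketched with the standard `java.net.URLClassLoader`. The directory layout and the names `PluginLoaderSketch`, `loaderFor` and `readConfig` are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical sketch: one class-loader per plug-in directory, so each
// plug-in resolves its own classes and config files.
class PluginLoaderSketch {

    /** Creates a loader rooted at an existing plug-in directory (assumed layout). */
    static URLClassLoader loaderFor(Path pluginDir) throws IOException {
        URL[] urls = { pluginDir.toUri().toURL() };
        return new URLClassLoader(urls, PluginLoaderSketch.class.getClassLoader());
    }

    /** Reads a config file through the plug-in's own loader; null if missing. */
    static String readConfig(ClassLoader loader, String name) throws IOException {
        try (InputStream in = loader.getResourceAsStream(name)) {
            return in == null ? null : new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

With this layout, two plug-ins that both ship a file with the same name each read their own copy. Note that `URLClassLoader` delegates to its parent first, so a library that is also on the platform class path would still win over the plug-in's copy; stricter isolation is the kind of thing the OSGi investigation would have to cover.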

Wednesday 20 May 2009

  1. 9:00 – 9:30 (30 min) - Pre-review feedback and how WP5 is affected: should we change our direction in anything? - discussed in the plenary session, where Frank gave us feedback.
  2. 9:30 – 10:30 (60 min) - Distribution and remote execution: distributed containers, streaming, IBIS, cluster,…
    • Split of plug-in manager:
      • ship plug-in code to the remote location
      • ship input data (with JavaGAT, using GridFTP or other)
    • 1st step: get plug-ins running remotely at all.
    • 2nd step: determine which functionality the platform needs in order to decide when it is worth running a plug-in remotely.
    • Anytime behaviour: platform should provide facility to check whether the result file has changed or not.
    • *** We need intermediate results back and also intermediate input: streaming input and streaming output.
    • How to pass from remote to remote without passing through the decider?
    • Can you start the job and fetch the input data at runtime, instead of having to send it in advance?
    • Pull vs push data.
    • We need to analyse how the data is managed in the local case and see if it works for the remote one.
    • Remote plug-in manager is sent, wrapping the plug-in itself.
    • *** The job is submitted when the DECIDE plug-in builds the pipeline. For this purpose, we need to reserve the resources for a longer time. We would be wasting resources until the input data arrives.
    • *** Data federation. Passing data by reference. Issue for Data Layer. (complete this issue description!!)
    • Two kinds of information must be passed to the plug-in: control and data.
    • A general local plug-in manager + a remote plug-in manager specific to the concrete environment.
    • *** Control flow messages
    • *** Using queues: data queue + control queue (notifications are sent through this queue)
    • How to solve the scheduling issue? Running a plug-in that takes only a short time makes no sense if we need to wait 10 min to start execution.
    • How to cope with unexpected issues: we suddenly need to communicate with a new plug-in or use a new data storage.
    • Mick is currently changing the PPS and the plug-in manager for the local case. HLRS must keep an eye on the changes in order to stay in line with the remote implementation.
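The "data queue + control queue" idea above, combined with the need for streaming input and intermediate results, can be sketched in a few lines. The class `QueueChannelSketch`, the `STOP` message and the upper-casing "processing" step are all hypothetical placeholders; a real remote plug-in manager would run the loop in its own thread and use a transport such as the ones discussed for remote execution.

```java
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a plug-in wrapper fed by a data queue and steered
// by a separate control queue, emitting results as soon as they exist.
class QueueChannelSketch {
    enum Control { STOP }

    final BlockingQueue<String> dataIn = new LinkedBlockingQueue<>();
    final BlockingQueue<Control> control = new LinkedBlockingQueue<>();
    final BlockingQueue<String> dataOut = new LinkedBlockingQueue<>();

    /**
     * One iteration of the worker loop. Control messages are checked first;
     * returns false once STOP has been received, true otherwise.
     */
    boolean step() {
        if (control.poll() == Control.STOP) {
            return false;
        }
        String item = dataIn.poll(); // streaming input: may be empty for now
        if (item != null) {
            dataOut.offer(process(item)); // intermediate result, available at once
        }
        return true;
    }

    /** Placeholder for the actual plug-in work. */
    static String process(String item) {
        return item.toUpperCase(Locale.ROOT);
    }
}
```

A worker would simply call `while (channel.step()) { }`. Because input arrives through `dataIn` at runtime, the job can be started before its data exists, which is the trade-off against reserved-but-idle resources raised in the discussion.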
  3. 10:30 – 11:30 (60 min) - General Platform discussion: *** In the end, not discussed during the meeting ***
    • Platform as a Server vs. SW component
    • Extra support Always on vs. Only when needed
    • Plug-in registry: how much it depends on Cyc, how registration and discovery work, the degree of automation, and how much plug-in developers should learn about it, …
    • Use of Semantic Web Service technology
  4. 11:30 – 11:50 (20 min) - AOB
  5. 11:50 – 12:00 (10 min) - Summary Next Steps

Thursday 21 May 2009

LarkcProject/WP5/Minutes/20090518-21 (last edited 2009-05-20 23:02:54 by GeorginaGallizo)