Meeting: WP1 Breakout Session
Agenda
Deliverable 1.4.1: Initial framework for measuring and evaluating heuristics problem solving
1. Discuss the status of contributions to D1.4.1
- Theory of Evaluation (MPG)
- Individual plugin evaluation (WP2, WP3 and WP4 plugins)
- Evaluation of the overall platform
- Quality measures for different types of reasoning (e.g. anytime, approximate reasoning)
- Integration of different evaluation criteria from different plugins into a coherent evaluation framework
- Definition of gold standards for each plugin
2. Terminology of evaluation theory (evaluation, benchmarking, validation of platform components)
- The goal is to agree on a common terminology for evaluation and benchmarking of components (shared with evaluation approaches in other WPs)
3. Set up a work plan for the coming months (responsibilities and tasks)
Deliverable 1.2.2: Improved Operational Framework (month 24)
- Status of deliverable
Deliverable 1.1.4: Improved Knowledge Representation Language (month 24)
- Status of deliverable
Minutes
- To evaluate each plug-in type, we should consider generic criteria that apply to every plug-in type as well as task-specific or plug-in-specific criteria, e.g.:
- Relevant criteria for selection (selection-specific criteria) (TODO WP2 Sheffield):
- precision
- recall
- ranking (possibly based on different aspects)
- Relevant criteria for Reasoners (TODO VUA):
- soundness
- completeness
- criteria for anytime and approximate reasoning methods
- Relevant criteria for transformation plug-ins (TODO Cefriel and Siemens):
- Evaluation criteria relevant for every plug-in type (TODO ALL)
- performance
- accuracy
- resource usage (time, memory, etc.)
- scalability (can be inferred from the measures of several other parameters/criteria)
- look at other QoS parameters from D1.3.1 Initial Plug-in Annotation Language
- We should also report on common techniques for evaluating selection/transformation/reasoning strategies/methods used both in general and specifically in the context of LarKC (if any)
- For the specification of the initial evaluation framework we will focus on different evaluation criteria for each plug-in type, including evaluation criteria required/imposed by the different use-cases and criteria typically used for evaluation of selection, transformation and reasoning strategies.
- This deliverable will also specify an initial framework for combining the previous evaluation criteria into a method for evaluating the entire platform.
- The first draft will focus on the different evaluation criteria for each plug-in type while the second version of the document will focus on the definition of the initial framework.
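As an illustration of how some of the criteria listed above could be measured, here is a minimal Python sketch. It assumes a gold standard is available as a set of expected items, computes precision and recall against it, and records wall-clock time and peak Python memory for a single run. Under this reading, a reasoner that is sound with respect to the gold standard has precision 1 and a complete one has recall 1, which gives one possible scale for approximate reasoning methods. The names `precision_recall`, `measure` and `toy_selector` are hypothetical and are not part of the LarKC platform.

```python
import time
import tracemalloc


def precision_recall(returned, gold):
    """Compare the items a plug-in returned with a gold-standard set."""
    returned, gold = set(returned), set(gold)
    correct = returned & gold
    # Assumption of this sketch: an empty result counts as precision 1.0,
    # and an empty gold standard counts as recall 1.0.
    precision = len(correct) / len(returned) if returned else 1.0
    recall = len(correct) / len(gold) if gold else 1.0
    return precision, recall


def measure(plugin, *args):
    """Run a callable once; return its result, elapsed seconds, peak bytes."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = plugin(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def toy_selector(items):
    """Hypothetical stand-in for a selection plug-in: keeps even numbers."""
    return [i for i in items if i % 2 == 0]


gold = {2, 4, 5}  # made-up gold standard, for illustration only
selected, seconds, peak_bytes = measure(toy_selector, range(1, 7))
p, r = precision_recall(selected, gold)
print(f"precision={p:.2f} recall={r:.2f} time={seconds:.6f}s peak={peak_bytes}B")
```

Note that `tracemalloc` only tracks memory allocated by the Python interpreter, so a plug-in running in another process or runtime would need a different measurement approach.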
Tentative Structure of the Deliverable
- Introduction
- Overview of LarKC Platform
- Theory of Evaluation from the Cognitive Science's point of view (MPG)
- Use of Heuristics from the Computer Science's point of view
- Mention the interpretation of the pipeline as a Heuristic Problem Solving Method
- Evaluation Techniques and Criteria for Selection/Identify Plug-ins (Sheffield)
- Evaluation Techniques and Criteria for Transformation Plug-ins (Siemens)
- Evaluation Techniques and Criteria for Reasoning Plug-ins (VUA)
- Specification of the Initial Evaluation Framework (ALL)
- Conclusion
Timeline and Responsibilities
- July 29th: Deadline for first draft version (includes plug-in evaluation criteria and methods)
- August 14th (soft deadline): Deadline for 2nd version (includes initial framework specification plus modifications and improvements to the previous version)
- August 28th (hard deadline): Deadline for 2nd version (includes initial framework specification plus modifications and improvements to the previous version)
- Sept. 2nd: Send to deliverable leader for control
- Sept. 9th: Send to reviewers (2) (TODO: decide on reviewers)
- Sept. 16th: Send to WP leader for quality control
- Sept. 23rd: Send to Frank
- Sept. 30th: Send to PO
Scheduled Telcos
- 3rd week of July
