Teleconference: Integration of WP2, WP3 and WP7a
Tue 27th October, 13:00-14:00 CET
Agenda
- how we could use ML methods for selection (25 min)
- using active learning for selection?
- how we could evaluate selection methods on AZ usecase (25 min):
Interfaces: replicate the GoPubMed (http://www.gopubmed.com) interface, using for example the faceted interface from Ontotext
Dataset: use LinkedLifeData
- Give starting problem to scientists (They use the system)
- We produce hypotheses. They cross out the ones that are silly. We use this as training data for ML.
- F2F meeting (end Nov/beginning Dec, Munich, Amsterdam, Sheffield)? (10 min)
- would this be useful?
- shall we organize it together with WP4 (end Nov/beginning Dec in Amsterdam)?
- shall we invite use cases as well?
Minutes
Participants:
- Siemens: Volker Tresp, Markus Bundschus, Yi Huang, and Achim Rettinger
- USFD: Johann Petrak, Danica Damljanovic
- Ontotext: Vassil Momchev
- IRF: Mihai Lupu
- AZ: Gunnar Engström, Bo Andersson
- MPG: Jose Quesada
- WICI: Yan Wang
Using ML methods for selection:
- using Active Learning (AL) in order to reduce the cost of training
- but you would need to have the training set for each query?
- what would be the decision the AL would make? (we don't have the final answer to this)
- we are trying to decide whether a triple is relevant to the query or not, since selection is query-specific; however, it seems that we would need a model for each query, because the classification problem is query-dependent
- "selection could be seen as classification problem: relevant vs irrelevant"
- people seem to have different views on this, especially with regard to the discussion in Berlin, where it was mentioned that it is not just relevant vs. irrelevant but also less relevant, more relevant, etc.; however, in the end, for selection, we will have to decide where the threshold is, so that we end up with 2 groups: relevant and irrelevant
- how does selection use ML? what would be the benefit of using ML for selection? finding similar triples? we need to define this more precisely
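The thresholding step discussed above (graded relevance collapsed into two groups) could be sketched roughly as follows. This is only an illustration: the function name `select`, the example triples, the scores and the threshold value are all made up here, and the minutes do not say how the scores would be produced.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def select(scores: Dict[Triple, float],
           threshold: float) -> Tuple[List[Triple], List[Triple]]:
    """Split triples into (relevant, irrelevant) for ONE query.

    `scores` holds a graded relevance score per triple; where these
    scores come from (e.g. a query-specific classifier) is left open.
    """
    relevant = [t for t, s in scores.items() if s >= threshold]
    irrelevant = [t for t, s in scores.items() if s < threshold]
    return relevant, irrelevant


# Hypothetical graded scores for a single query (values are invented).
scores = {
    ("doc:1", "mentions", "term:WBC"): 0.9,
    ("doc:2", "mentions", "term:lungFunction"): 0.6,
    ("doc:3", "mentions", "term:other"): 0.1,
}

relevant, irrelevant = select(scores, threshold=0.5)
```

Because selection is query-specific, both the scores and possibly the threshold would have to be recomputed per query under this sketch.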
- another interesting approach would be experimental design
- using Random Projection (Random Indexing)
- various views, but broad agreement that using RI for selection is not really applicable to RDF
- it would be more suitable to use RI for presenting results to the user than for the subsetting task
Vassil: a possible use case example
so, we don't want to think about the SPARQL/RDF layer; in this example there is a search with the predicate "mentions", so we don't really want to apply RI to RDF, but we would probably benefit from using the RDF schema and documents (such as PubMed abstracts in this case)
- RDF Molecules: not all RDF resources should be molecules; it might be more useful to apply selection to the RDF schema rather than to RDF
- concrete scenario: you see the document and the terms which are mentioned in that document
- Johann: wouldn't this document be perfect for using RI, as you have a list of relevant terms? With RI you could feed in semantically related terms which are not in the document; ideally, we would like to combine semantics with search
- various views, but broad agreement that using RI for selection is not really applicable to RDF
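Johann's suggestion above (using RI over documents to find semantically related terms that a given document does not mention) could look roughly like the sketch below. It is a toy, document-based Random Indexing variant written for illustration only; the dimensionality, the number of non-zero entries, the function names and the three tiny "abstracts" are all assumptions, not anything agreed in the call.

```python
import math
import random
from collections import defaultdict

DIM = 64      # size of the reduced space (assumed value)
NONZERO = 4   # non-zero entries per index vector (assumed value)


def index_vector(rng):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((-1, 1))
    return vec


def build_term_vectors(docs, seed=0):
    """A term's vector is the sum of the index vectors of the docs it occurs in."""
    rng = random.Random(seed)
    term_vecs = defaultdict(lambda: [0] * DIM)
    for doc_id in sorted(docs):
        doc_vec = index_vector(rng)
        for term in docs[doc_id]:
            tv = term_vecs[term]
            for i, x in enumerate(doc_vec):
                tv[i] += x
    return dict(term_vecs)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def related_terms(term, term_vecs):
    """Other terms ranked by cosine similarity to `term`, best first."""
    scored = [(other, cosine(term_vecs[term], vec))
              for other, vec in term_vecs.items() if other != term]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Invented toy corpus: document id -> set of terms mentioned in it.
docs = {
    "d1": {"wbc", "inflammation"},
    "d2": {"wbc", "inflammation", "lung"},
    "d3": {"lung", "asthma"},
}
term_vecs = build_term_vectors(docs)
ranking = related_terms("wbc", term_vecs)
```

In this toy corpus "wbc" and "inflammation" occur in exactly the same documents, so they receive identical vectors and "inflammation" ranks first for "wbc". Terms ranked this way could be offered to the user alongside the search results, in line with the view that RI is better suited to presenting results than to subsetting.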
Evaluation of selection methods on AZ usecase:
Two ideas arose, and they should be complementary. The first one evolved from the discussion in Berlin, and the second one comes from today's discussion based on the AZ usecase document.
- Idea 1:
Interfaces: replicate the GoPubMed (http://www.gopubmed.com) interface, using for example the faceted interface from Ontotext
Dataset: use LinkedLifeData
- Give starting problem to scientists (They use the system)
- We produce hypotheses. They cross out the ones that are silly. We use this as training data for ML.
- Idea 2:
- Example:
- IR gets you the 100 docs that are related
- a human would produce a hypothesis
- WP3 would take Ho and produce these bits (0, 1, -1)
Example: abstract12314, Ho: WBC -> lowLungFunction, bit: -1 (not related)
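One possible way to record the per-abstract bits from the example above is sketched below. The record layout and the names `Evidence` and `tally` are assumptions made for illustration. The minutes only spell out that -1 means "not related", so the sketch merely counts the bits per hypothesis and does not interpret 0 or 1. Apart from the first record, the abstract ids are placeholders.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Evidence(NamedTuple):
    abstract_id: str
    hypothesis: str
    bit: int  # one of 0, 1, -1; only -1 ("not related") is defined in the minutes


def tally(records):
    """Count how often each bit value was assigned, per hypothesis."""
    counts = defaultdict(Counter)
    for rec in records:
        if rec.bit not in (0, 1, -1):
            raise ValueError(f"unexpected bit {rec.bit!r} for {rec.abstract_id}")
        counts[rec.hypothesis][rec.bit] += 1
    return counts


records = [
    # The one record given in the minutes:
    Evidence("abstract12314", "WBC -> lowLungFunction", -1),
    # Placeholder records, invented for this sketch:
    Evidence("abstract-A", "WBC -> lowLungFunction", 1),
    Evidence("abstract-B", "WBC -> lowLungFunction", 0),
    Evidence("abstract-C", "WBC -> lowLungFunction", -1),
]

counts = tally(records)
```

A table of such counts per hypothesis would be one simple starting point for judging whether the retrieved abstracts contain evidence for a relation.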
AZ scientists often have observations about some relations, but they have no way to prove them. Example in this document: AstraZeneca usecase
- it would be interesting to find out if relations (arrows in the document) exist, or even the direction of these relations
- Jose: related field which would be good for solving this problem is sentiment analysis
- is there evidence in abstracts for these relations?
- Example:
The general feeling is that we need usecases in order to formulate the problem precisely, but it is also true that the usecases need to give more precise requirements. However, from the discussion today, we should have enough material to start thinking about problem solving and the draft paper. For now, we could also do some experiments with LDSR.
F2F meeting (between 20 Nov and 4th Dec, probably Munich)?
- we don't want too many people, so it might be just WP2+WP3, probably organized separately from WP2+WP4
- the AZ usecase is currently collecting ideas for concrete requirements; after they refine these, we might look at them and decide whether any of those usecase problems could be solved by selection; we then decide whether it would be useful for AZ or Ontotext
