D7a.2.1 Pathway and Interaction Knowledge Base
Type: other (knowledge base). Scope: public. Delivery date: M18. Review schedule: 09-16 September 09 (1 week): Quality Assessor (a person outside the WP in which the deliverable is produced); 16-23 September 09 (1 week): Quality Controller (WP leader); 23-30 September 09 (1 week): buffer/check by Frank van Harmelen; 30 September 09: submission to EC.
Introduction (Problem description)
Questions
The questions are currently collected on the wiki page LarkcProject/WP7a/Questions.
Type of entities, relations and data sources
This chapter outlines the core entity types and relations, and the data sources used to generate them. Some of the information will be derived from an information extraction process applied to various textual documents.
Basic concept types

| Concept | Identifier | Data source | Example (all URIs should be resolvable!) | Comment |
|---|---|---|---|---|
| Gene | EntrezGene identifier | EntrezGene | | |
| Protein | Uniprot primary accession number | Uniprot | | |
| Pathway + classification | | PathwayCommons (Intact, Reactome, BioGRID, NCI-Nature) + Pathway Ontology (SKOS) | | |
| Disease and disorder | Disease ontology identifier | Disease ontology (OBO) | http://linkedlifedata.com/resource/diseaseontology/id/DOID:766 | The category may be too broad |
| Drug Active Substance | DrugBank identifier / CAS number | DrugBank (LODD) | http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB00002 | |
| Symptom | Symptom ontology identifier | Symptom ontology (OBO) | ? | |
| Phenotype | Human phenotype ontology identifier | Human phenotype ontology (OBO) | ? | |
| Anatomical loci | ? | NCI (OBO) | ? | ? |
| Tissue | ? | BRENDA tissue / enzyme source? (OBO) | ? | |
| Cell type | ? | BRENDA tissue / enzyme source? (OBO) + ontology population + Cell type (OBO) | ? | BioTagger has a trained model to recognize the concepts |
| Cell line | ? | BRENDA tissue / enzyme source? (OBO) + ontology population | ? | BioTagger has a trained model to recognize the concepts |
| Company | Wikipedia page | DBPedia | | |
| Cellular component | GO identifier | GO (OBO/RDF?) | http://linkedlifedata.com/resource/GeneOntology/id/GO:0005739 | |
| Biological process | GO identifier | GO (OBO/RDF?) | http://linkedlifedata.com/resource/GeneOntology/id/GO:0030154 | |
| Molecular function | GO identifier | GO (OBO/RDF?) | http://linkedlifedata.com/resource/GeneOntology/id/GO:0006355 | |
| Document | PubMed id + other? | PubMed + other? | Other documents could also be integrated | |
| Author | PubMed authors | Generated from the first name, last name and initials | | Duplications are possible |
| Chemical | CAS | ? | ? | ? |
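The Example column pairs source identifiers with URIs that should resolve. A minimal sketch of how identifiers could be turned into such URIs and checked, assuming the URI patterns can be generalized from the example URIs given in the tables of this page; the template names (`disease`, `drug_active_substance`, `drug_target`, `go_term`) and both helper functions are hypothetical, and the resolvability check is only one possible way to test the "all URIs should be resolvable" requirement.

```python
import urllib.request

# Hypothetical templates generalized from the example URIs on this page.
URI_TEMPLATES = {
    "disease": "http://linkedlifedata.com/resource/diseaseontology/id/{id}",
    "drug_active_substance": "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/{id}",
    "drug_target": "http://www4.wiwiss.fu-berlin.de/drugbank/resource/targets/{id}",
    "go_term": "http://linkedlifedata.com/resource/GeneOntology/id/{id}",
}


def entity_uri(concept_type, identifier):
    """Build a URI from a concept type and its source identifier (hypothetical helper)."""
    if concept_type not in URI_TEMPLATES:
        raise ValueError(f"no URI template for concept type {concept_type!r}")
    return URI_TEMPLATES[concept_type].format(id=identifier)


def is_resolvable(uri, timeout=10.0):
    """Send an HTTP HEAD request and report whether the URI answers without an error status."""
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


print(entity_uri("disease", "DOID:766"))
# http://linkedlifedata.com/resource/diseaseontology/id/DOID:766
```

Concept types without an example URI (Gene, Protein, Symptom, ...) are deliberately left out, since their URI patterns are still open.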
Relations (all could be the subject of an information extraction process; the arguments may be optional)

| Relation | Identifier | Data source | Comment |
|---|---|---|---|
| Interaction (PhysicalEntity(Gene/Protein)+, Pathway?) | ? | PathwayCommons? | Possibly redundant |
| Target (Gene/Protein, Drug active substance) | DrugBank target identifier | DrugBank | http://www4.wiwiss.fu-berlin.de/drugbank/resource/targets/228 |
| Drug Product (Company name, Drug Active Substance+, Route of Administration, Region) | ? | Wikipedia/FDA/Dailymed? | |
| Indication (Drug Product/Drug Active Substance?, Disease/Symptom) | ? | DrugBank | Information extraction from the indication field |
| Gene function (Gene, Molecular function/Cellular component/Biological process) | ? | EntrezGene / Uniprot / GO | |
| Treatment (???) | ? | ? | Very unclear concept; it may be better to replace it with clinical trial |
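The relations above are n-ary, with "+" marking an argument that takes one or more values and "?" an optional one. A minimal sketch of how two of them could be represented as data structures, assuming that reading of the notation and assuming (per "the arguments may be optional") that every argument without "+" may be left empty; the class and field names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interaction:
    """Interaction (PhysicalEntity(Gene/Protein)+, Pathway?)."""
    participants: List[str]          # "+": one or more genes/proteins
    pathway: Optional[str] = None    # "?": optional

    def __post_init__(self):
        if not self.participants:
            raise ValueError("an interaction needs at least one physical entity")


@dataclass
class DrugProduct:
    """Drug Product (Company name, Drug Active Substance+, Route of Administration, Region)."""
    active_substances: List[str]     # "+": at least one active substance
    company: Optional[str] = None    # assumed optional
    route_of_administration: Optional[str] = None
    region: Optional[str] = None

    def __post_init__(self):
        if not self.active_substances:
            raise ValueError("a drug product needs at least one active substance")


product = DrugProduct(active_substances=["DB00002"])
print(product)
# DrugProduct(active_substances=['DB00002'], company=None, route_of_administration=None, region=None)
```

In RDF, the same relations would typically become a node of their own with one property per argument, since a single triple cannot hold more than two arguments.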
Concepts or relations that have been excluded from the initial list:
- Concept (the original idea was to model semantic annotations; this would be difficult if mixed with the system types)
- Country / target region (for now it is not clear in which type of relation it could be used)
- Drug type (it will be part of the drug meta-data)
- Investigation (unclear)
- Metabolizing enzyme (unclear)
- Molecular profile (unclear)
- Physiological process (in general, it is unclear how to model processes above the molecular level)
- Target function general (unclear)
- Target function specific (unclear)
Data source transformation & instance mappings
To be described: the OBO to SKOS transformations.
To be described: the other transformations required in order to "link data".
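As an illustration of what an OBO to SKOS transformation could look like, a minimal Python sketch follows. The mapping ([Term] stanza to skos:Concept, name to skos:prefLabel, is_a to skos:broader), the base URI and the EX: identifiers are assumptions made for the example, not the transformation WP7a will adopt; synonyms, definitions, relationship tags and obsolete terms are ignored.

```python
# Assumed mapping (illustrative only): [Term] -> skos:Concept,
# name -> skos:prefLabel, is_a -> skos:broader. Other stanzas and tags are ignored.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def parse_obo_terms(text):
    """Collect the tag/value pairs of every [Term] stanza in an OBO text."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current.setdefault(tag.strip(), []).append(value.strip())
    return terms


def obo_to_skos(text, base_uri):
    """Turn OBO terms into (subject, predicate, object) triples; labels stay plain strings."""
    triples = []
    for term in parse_obo_terms(text):
        if "id" not in term:
            continue
        subject = base_uri + term["id"][0]
        triples.append((subject, RDF_TYPE, SKOS + "Concept"))
        for name in term.get("name", []):
            triples.append((subject, SKOS + "prefLabel", name))
        for parent in term.get("is_a", []):
            # Drop the trailing "! label" comment that OBO allows after is_a.
            parent_id = parent.split("!", 1)[0].strip()
            triples.append((subject, SKOS + "broader", base_uri + parent_id))
    return triples


SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: parent term

[Term]
id: EX:0000002
name: child term
is_a: EX:0000001 ! parent term

[Typedef]
id: part_of
name: part of
"""

for triple in obo_to_skos(SAMPLE, "http://example.org/id/"):
    print(triple)
```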
Reasoning schema & requirements
Schema reasoning
To successfully implement the WP7a M18 prototype, we require at least support for the SKOS schema, which involves heavy use of:
- Symmetric properties (skos:related)
- Transitive properties (skos:broaderTransitive/skos:narrowerTransitive)
Example:
<A> skos:broader <B> .
<B> skos:broader <C> .
entails
<A> skos:broaderTransitive <B> .
<B> skos:broaderTransitive <C> .
<A> skos:broaderTransitive <C> .
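The entailments in this example come from two ingredients: the SKOS axiom that skos:broader is a sub-property of skos:broaderTransitive, and the transitivity of skos:broaderTransitive. A minimal forward-chaining sketch, assuming triples are plain (subject, predicate, object) tuples with prefixed names; it covers only the symmetric and transitive rules listed above plus the broader/narrower sub-property axioms, not the full SKOS semantics, and its naive fixpoint loop is meant for illustration rather than for large data sets.

```python
from itertools import product

# Only the rules discussed in this section; not a complete SKOS reasoner.
SYMMETRIC = {"skos:related"}
TRANSITIVE = {"skos:broaderTransitive", "skos:narrowerTransitive"}
# SKOS axiom: broader/narrower are sub-properties of their transitive versions.
SUB_PROPERTY = {
    "skos:broader": "skos:broaderTransitive",
    "skos:narrower": "skos:narrowerTransitive",
}


def closure(triples):
    """Apply the rules repeatedly until no new triple is produced (naive fixpoint)."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p in SUB_PROPERTY:
                new.add((s, SUB_PROPERTY[p], o))
            if p in SYMMETRIC:
                new.add((o, p, s))
        # Transitivity: (s p m) and (m p o) give (s p o); O(n^2) pairs per round.
        for (s1, p1, o1), (s2, p2, o2) in product(inferred, repeat=2):
            if p1 == p2 and p1 in TRANSITIVE and o1 == s2:
                new.add((s1, p1, o2))
        if new <= inferred:
            return inferred
        inferred |= new


facts = {("A", "skos:broader", "B"), ("B", "skos:broader", "C")}
for triple in sorted(closure(facts) - facts):
    print(triple)
# ('A', 'skos:broaderTransitive', 'B')
# ('A', 'skos:broaderTransitive', 'C')
# ('B', 'skos:broaderTransitive', 'C')
```

The three printed triples are exactly the entailments listed in the example; a production reasoner would use semi-naive evaluation instead of re-joining all triples every round.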
Another example, relevant to semantic data integration, is the alignment of different biomedical thesauri:
<A> skos:broadMatch <B> .
entails
<A> skos:mappingRelation <B> .
<A> skos:broader <B> .
<A> skos:broaderTransitive <B> .
<A> skos:semanticRelation <B> .
<A> rdf:type skos:Concept .
<B> rdf:type skos:Concept .
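The skos:broadMatch entailments follow from the SKOS sub-property hierarchy (skos:broadMatch is a sub-property of skos:mappingRelation and skos:broader, and both eventually lead up to skos:semanticRelation) together with the domain and range of skos:semanticRelation, which is skos:Concept. A minimal Python sketch, assuming only the subset of SKOS axioms needed for this example and prefixed names as plain strings:

```python
# Subset of the SKOS sub-property axioms needed for the broadMatch example.
SUB_PROPERTY_OF = {
    "skos:broadMatch": {"skos:mappingRelation", "skos:broader"},
    "skos:narrowMatch": {"skos:mappingRelation", "skos:narrower"},
    "skos:mappingRelation": {"skos:semanticRelation"},
    "skos:broader": {"skos:broaderTransitive"},
    "skos:narrower": {"skos:narrowerTransitive"},
    "skos:broaderTransitive": {"skos:semanticRelation"},
    "skos:narrowerTransitive": {"skos:semanticRelation"},
}
# SKOS: skos:semanticRelation has domain and range skos:Concept.
CONCEPT_DOMAIN_RANGE = {"skos:semanticRelation"}


def super_properties(prop):
    """All direct and indirect super-properties of prop in SUB_PROPERTY_OF."""
    seen, stack = set(), [prop]
    while stack:
        for parent in SUB_PROPERTY_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def entail(s, p, o):
    """Triples entailed by one (s, p, o) triple under the axioms above, input included."""
    props = {p} | super_properties(p)
    result = {(s, q, o) for q in props}
    if props & CONCEPT_DOMAIN_RANGE:
        result |= {(s, "rdf:type", "skos:Concept"), (o, "rdf:type", "skos:Concept")}
    return result


for triple in sorted(entail("A", "skos:broadMatch", "B")):
    print(triple)
```

Besides the input triple, the result contains the six triples listed in the example.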
Inconsistency rules
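Example (violates the SKOS integrity condition that a resource has no more than one value of skos:prefLabel per language tag):
<Love> skos:prefLabel "love"@en ;
    skos:prefLabel "adoration"@en .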
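SKOS requires that a resource has no more than one skos:prefLabel per language tag, which is why the <Love> example is inconsistent. A minimal check for this rule, assuming triples are plain tuples and literals are represented as hypothetical (text, language tag) pairs; language tags are compared case-insensitively, as RDF language tags are.

```python
from collections import defaultdict


def preflabel_violations(triples):
    """Return {(resource, lang): labels} for every resource with more than one
    skos:prefLabel in the same language. Literals are (text, lang) tuples."""
    labels = defaultdict(set)
    for s, p, o in triples:
        if p == "skos:prefLabel":
            text, lang = o
            labels[(s, lang.lower())].add(text)
    return {key: texts for key, texts in labels.items() if len(texts) > 1}


love = [
    ("Love", "skos:prefLabel", ("love", "en")),
    ("Love", "skos:prefLabel", ("adoration", "en")),
]
print(preflabel_violations(love))
```

The same pattern (group by resource and language, then count) extends to other SKOS integrity conditions, for example the disjointness of skos:prefLabel and skos:altLabel.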
