Linked Life Data 0.2.x

Data integration process

FIXME: Improve the section

  1. Define the requirements for each database (stable vs. unstable identifiers)
  2. Plan the cross-links between data sources
  3. Develop a test SPARQL endpoint
  4. Test the cross-links between data sources
  5. Consolidate the data in a single repository
  6. Calculate the inference closure
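Step 6 can be illustrated with a minimal forward-chaining sketch. It assumes triples are plain Python tuples with hypothetical prefixed names, and it applies only two RDFS entailment rules (rdfs11, transitivity of rdfs:subClassOf, and rdfs9, propagation of rdf:type along rdfs:subClassOf) until no new triples appear. This is not the repository's actual reasoner; it only shows what "closure" means: repeat the rules until a fixpoint is reached.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def inference_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS entailment rules repeatedly until no new triples appear."""
    closure = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in closure if p == SUBCLASS]
        new: Set[Triple] = set()
        # rdfs11: A subClassOf B and B subClassOf C entail A subClassOf C.
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, SUBCLASS, d))
        # rdfs9: x type A and A subClassOf B entail x type B.
        for s, p, o in closure:
            if p == TYPE:
                for a, b in subclass:
                    if o == a:
                        new.add((s, TYPE, b))
        new -= closure
        if not new:
            return closure  # fixpoint reached: the closure is complete
        closure |= new


# Hypothetical example data.
facts = {
    ("ex:Enzyme", SUBCLASS, "ex:Protein"),
    ("ex:Protein", SUBCLASS, "ex:Molecule"),
    ("ex:p1", TYPE, "ex:Enzyme"),
}
closed = inference_closure(facts)
for triple in sorted(closed - facts):
    print(triple)
```

The naive loop re-derives everything on each pass; real repositories use incremental or semi-naive evaluation, but the result is the same set of triples.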

Data sources overview

Concept type | Primary data source | Secondary data source
Gene | Entrez-Gene | HGNC, Ensembl, DBpedia
Protein | UniProt | DBpedia
SNP | dbSNP | HapMap
Target | DrugBank | HMDB (Human Metabolome Database)
Interaction/Pathway | IntAct | BioPAX 2.0, BioGRID, DIP
Disease | UMLS (Unified Medical Language System) (SNOMEDCT, ICD10) | OMIM, MedPedia
Drug | DrugBank | DBpedia, DailyMed, HMDB (Human Metabolome Database)
Clinical trials (patient) | LinkedCT | -
Adverse event (patient) | FDA AERS (Adverse Events Reporting System) | -

Additional data sources

Database | Description
PubMed |
Gene Ontology |
NCBI Taxonomy |
iProClass |

Possible cross-links between data sources

Protein concept database links

Database | Category | Link to database | Link to concept | Comment
Uniprot | GeneID | Entrez Gene | Gene |
Uniprot | Features | dbSNP | SNP |
Uniprot | Binary Interactions | Intact | Interaction |
Uniprot | Ontologies | Uniprot | Disease | Keywords assigned to proteins involved in a specific disease
Uniprot | Comments | ? | Disease | Concepts occurring in a textual field
Uniprot | Other Resources | Drugbank | Drug |
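Step 4 of the integration process (testing the cross-links) can be approached by checking how many link targets actually resolve to entities loaded from the target database. The sketch below assumes links are held in memory as a mapping from source identifiers to lists of target identifiers; all identifiers are hypothetical placeholders standing in for UniProt accessions and Entrez Gene IDs. It reports the coverage ratio and the dangling links that point at nothing.

```python
from typing import Dict, List, Set, Tuple


def link_coverage(
    links: Dict[str, List[str]], target_ids: Set[str]
) -> Tuple[float, List[Tuple[str, str]]]:
    """Return the share of link targets present in the target dataset and the dangling links."""
    total = 0
    dangling: List[Tuple[str, str]] = []
    for source, targets in links.items():
        for target in targets:
            total += 1
            if target not in target_ids:
                dangling.append((source, target))
    # With no links at all there is nothing to miss, so report full coverage.
    coverage = 1.0 if total == 0 else (total - len(dangling)) / total
    return coverage, dangling


# Hypothetical UniProt -> Entrez Gene links and hypothetical loaded gene IDs.
uniprot_to_gene = {"UP_A": ["G1"], "UP_B": ["G2", "G3"], "UP_C": ["G9"]}
loaded_genes = {"G1", "G2", "G3"}
coverage, dangling = link_coverage(uniprot_to_gene, loaded_genes)
print(f"coverage={coverage:.2f} dangling={dangling}")
```

Against a test SPARQL endpoint the same check would be expressed as a query for link targets lacking any description; the in-memory version just makes the logic explicit.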

Gene concept database links

Database | Category | Link to database | Link to concept | Comment
Entrez Gene | mRNA and Protein(s) | Uniprot | Protein |
Entrez Gene | Genotypes | dbSNP | SNP |
Entrez Gene | Interactions | BIND | interaction |
Entrez Gene | See related | OMIM | Disease |
Entrez Gene | Phenotypes | OMIM | Disease |
Entrez Gene | Additional Links | OMIM | Disease |

SNP (Single Nucleotide Polymorphisms) concept database links

Database | Category | Link to database | Link to concept | Comment
dbSNP | geneID | Entrez Gene | Gene |
Hapmap | dbSNP report | dbSNP | SNP |
Hapmap | Ensembl SNPview | Ensembl | SNP |
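Because HapMap links the same SNP to both dbSNP and Ensembl, consolidating the data (step 5) requires grouping identifiers that the cross-links declare equivalent. A minimal sketch using a union-find (disjoint-set) structure is shown below. It assumes each cross-link is a pair of identifiers for the same entity (in RDF such pairs are often expressed with owl:sameAs, though whether this repository does so is an assumption); all identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the walked path at the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def equivalence_classes(links: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group identifiers connected by equivalence links into sorted classes."""
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    groups: Dict[str, List[str]] = defaultdict(list)
    for x in list(uf.parent):
        groups[uf.find(x)].append(x)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical HapMap -> dbSNP / Ensembl links.
links = [
    ("hapmap:snp-A", "dbsnp:rs-A"),
    ("hapmap:snp-A", "ensembl:snp-A"),
    ("hapmap:snp-B", "dbsnp:rs-B"),
]
print(equivalence_classes(links))
```

Each resulting class can then be assigned one canonical identifier in the consolidated repository, which keeps duplicate SNP descriptions from multiplying during inference.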

Drug concept database links

Database | Category | Link to database | Link to concept | Comment
Drugbank | Target Gene Name | Drugbank | Target | Links to internal database identifiers
Drugbank | Target UniprotKB/Swiss-Prot ID | Drugbank | Target | Links to internal database identifiers
Drugbank | Target Gene Name | dbSNP | Drugbank | Links to internal database identifiers
Drugbank | Indication | UMLS | Disease | Concepts occurring in a textual field
Drugbank | Indication | OMIM | Disease | Concepts occurring in a textual field
Drugbank | FDA label | FDA | Adverse event | Concepts occurring in PDF documents
Drugbank | CAS Registry Number | DBpedia | Drug |
DBpedia | drugbank | Drugbank | Drug |
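Several links above come from concepts occurring in a textual field, such as disease names mentioned in a DrugBank indication. How those concepts were recognized is not stated here; the sketch below swaps in the simplest technique, case-insensitive dictionary matching that prefers the longest term at each position. The vocabulary and concept identifiers are hypothetical stand-ins for UMLS or OMIM entries.

```python
import re
from typing import Dict, List, Tuple


def find_concepts(text: str, vocabulary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (matched text, concept id) pairs, preferring the longest term at each position."""
    if not vocabulary:
        return []
    # Longer terms first: regex alternation takes the first alternative that matches,
    # so "type 2 diabetes" wins over "diabetes" at the same position.
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    lookup = {t.lower(): cid for t, cid in vocabulary.items()}
    return [(m.group(0), lookup[m.group(0).lower()]) for m in pattern.finditer(text)]


# Hypothetical vocabulary and indication text.
vocab = {"hypertension": "DIS:1", "type 2 diabetes": "DIS:2", "diabetes": "DIS:3"}
indication = "For the treatment of hypertension and Type 2 diabetes."
print(find_concepts(indication, vocab))
```

Dictionary matching misses synonyms and misspellings, so a real pipeline would normalize text or use a dedicated concept recognizer; the sketch only shows how a textual field becomes a set of concept links.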

Drug targets concept database links

Database | Category | Link to database | Link to concept | Comment
Drugbank | Target Gene Name | Entrez Gene | Gene |
Drugbank | Target UniprotKB/Swiss-Prot ID | Uniprot | protein |
Drugbank | Target Gene Name | dbSNP | SNP | Via Entrez-Gene cross-reference links
Drugbank | Target UniprotKB/Swiss-Prot ID | Intact, DIP | Interaction | Via Uniprot cross-reference links
Drugbank | Indication | UMLS | Disease | Concepts occurring in a textual field
Drugbank | Indication | OMIM | Disease | Concepts occurring in a textual field
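The links marked "Via Entrez-Gene cross-reference links" and "Via Uniprot cross-reference links" are indirect: a DrugBank target reaches dbSNP only by following the target-to-gene link and then the gene-to-SNP link. The sketch below derives such links by composing two link relations held as dictionaries of sets. The identifiers are hypothetical; in the repository the same join could equally be written as a two-hop SPARQL graph pattern.

```python
from typing import Dict, Set


def compose(first: Dict[str, Set[str]], second: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Derive a -> c links wherever a -> b exists in `first` and b -> c exists in `second`."""
    derived: Dict[str, Set[str]] = {}
    for a, bs in first.items():
        cs: Set[str] = set()
        for b in bs:
            cs |= second.get(b, set())
        if cs:  # keep only sources that actually reach something
            derived[a] = cs
    return derived


# Hypothetical identifiers: DrugBank targets -> Entrez Gene IDs -> dbSNP IDs.
target_to_gene = {"target-1": {"gene-10"}, "target-2": {"gene-20"}}
gene_to_snp = {"gene-10": {"rs-100", "rs-101"}}
target_to_snp = compose(target_to_gene, gene_to_snp)
print(target_to_snp)
```

Materializing the derived links trades storage for simpler queries; leaving them implicit keeps the repository smaller but makes every target-to-SNP query pay for the join.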

Protein-Protein Interactions concept database links

Database | Category | Link to database | Link to concept | Comment
Intact | Interacting molecules (Identifier) | Uniprot | Protein |
BioGRID | Links | Uniprot | Protein |
BioGRID | Links | Entrez-Gene | Gene |
BioGRID | Links | HGNC | Gene |
BioGRID | Links | OMIM | Disease |
DIP | Cross Reference (Swiss-Prot) | Uniprot | Protein |
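Linking interaction participants to UniProt means turning a cross-reference string into the URI of the corresponding UniProt resource. The sketch below assumes participant identifiers arrive as "db:accession" strings (as in PSI-MITAB exports, for example "uniprotkb:P12345") and uses the purl.uniprot.org namespace under which UniProt publishes its RDF. The set of accepted prefixes is a hypothetical choice, and stripping isoform suffixes such as "-2" is a design decision so that links point at the canonical entry.

```python
from typing import Optional

# Namespace used for UniProt entries in UniProt RDF.
UNIPROT_NS = "http://purl.uniprot.org/uniprot/"

# Hypothetical set of database prefixes treated as UniProt references.
UNIPROT_PREFIXES = {"uniprotkb", "uniprot", "swiss-prot"}


def participant_to_uniprot_uri(identifier: str) -> Optional[str]:
    """Map a 'db:accession' identifier to a UniProt URI, or None if it is not a UniProt reference."""
    db, sep, accession = identifier.partition(":")
    if not sep or db.strip().lower() not in UNIPROT_PREFIXES:
        return None
    # Design choice: drop isoform suffixes so all isoforms link to the canonical entry.
    accession = accession.strip().split("-")[0]
    return UNIPROT_NS + accession


print(participant_to_uniprot_uri("uniprotkb:P12345"))
print(participant_to_uniprot_uri("intact:EBI-0000"))
```

Identifiers that return None (non-UniProt participants such as small molecules) would need their own mapping rules or stay unlinked.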

Adverse events concept database links

Database | Category | Link to database | Link to concept | Comment
FDA AERS | Drug | Drugbank | Drug | Concepts occurring in a textual field
FDA AERS | Indications | UMLS | Disease | Concepts occurring in a textual field

TODO: Generate datasources integration schema image
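One way to produce the integration schema image is to generate a Graphviz DOT description from the cross-link rows and render it. The sketch below assumes each row is reduced to a (source database, target database, linked concept) triple; only a handful of rows from the cross-link tables are included, and duplicate rows (such as the several Entrez Gene to OMIM categories) collapse into one edge.

```python
from typing import Iterable, Tuple


def quote(name: str) -> str:
    """Quote a DOT identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(links: Iterable[Tuple[str, str, str]]) -> str:
    """Render (source db, target db, concept) cross-links as a Graphviz DOT digraph."""
    lines = ["digraph datasources {", "  rankdir=LR;"]
    seen = set()
    for source, target, concept in links:
        if (source, target, concept) in seen:
            continue  # one edge per distinct link
        seen.add((source, target, concept))
        lines.append(f"  {quote(source)} -> {quote(target)} [label={quote(concept)}];")
    lines.append("}")
    return "\n".join(lines)


# A few rows from the cross-link tables (source, target, linked concept).
links = [
    ("Uniprot", "Entrez Gene", "Gene"),
    ("Entrez Gene", "OMIM", "Disease"),
    ("Entrez Gene", "OMIM", "Disease"),
    ("Drugbank", "Entrez Gene", "Gene"),
    ("FDA AERS", "UMLS", "Disease"),
]
dot = to_dot(links)
print(dot)
```

Saving the output as schema.dot and running `dot -Tpng schema.dot -o schema.png` with Graphviz installed produces the image.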

Database specific information

Database name | Concept topic | Last processed release | Download link | Schema | Converter | Size as N-Triples (MB)
Entrez-Gene | gene | Sep 23, 2008 | Gene download | EntrezGene.owl (custom) | | 500
Uniprot | protein | 14.4 | UniProt RDF download | core.owl (original by the provider) | | 56400
BioGRID | Interaction/Pathway | 2.0.39 | BioGRID download | biopax-level2.owl (original by the provider) | | 322
dbSNP | SNP | | | | |
Drugbank | Drug, Target | 2.5 | DrugBank download | | |
LinkedCT | Clinical Trials | | LinkedCT download | | |
FDA AERS | adverse events | | AERS download | | |
DBpedia | Drug, Protein, Gene | - | DBpedia download | dbpedia-ontology.owl (original by the provider) | |
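The sizes above are given for the N-Triples serialization, where every triple is one line of the form `<subject> <predicate> <object> .`. How the listed sizes were measured is not stated; the sketch below shows one way to estimate such a size by serializing triples and counting UTF-8 bytes. It assumes IRIs are already valid and need no escaping, and escapes only the characters N-Triples requires in literals (backslash, double quote, line feed, carriage return). The example IRIs are hypothetical.

```python
from typing import Iterable, Tuple


def iri(value: str) -> str:
    """Wrap an (assumed valid) IRI in angle brackets."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Serialize a plain string literal with the escapes N-Triples requires."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def ntriples_size(triples: Iterable[Tuple[str, str, str]]) -> int:
    """Return the UTF-8 size in bytes of already-serialized terms written as N-Triples lines."""
    return sum(len(f"{s} {p} {o} .\n".encode("utf-8")) for s, p, o in triples)


# Hypothetical triple: a gene resource with an rdfs:label.
triples = [
    (
        iri("http://example.org/gene/1"),
        iri("http://www.w3.org/2000/01/rdf-schema#label"),
        literal('gene "one"'),
    )
]
print(ntriples_size(triples), "bytes")
```

Because N-Triples repeats full IRIs on every line, these sizes are much larger than the same data in a compact syntax such as Turtle, which is worth keeping in mind when comparing the figures.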

LarkcProject/WP7a/DataRepository2 (last edited 2009-03-13 13:37:54 by DeyanPeychev)