Data Repository

This page describes the LinkedLifeData data source analysis process. It serves as the main page for discussions between WP7a and WP7b about selecting individual structured databases and transforming them into RDF.

Selected list of data sources

The table below summarizes the data sources and their current status. Some data sources are very large in terms of the variety of information they contain, so the Dataset column indicates which types of data we focus on. The Status column takes one of the following values: "evaluation" (the structure and data are being reviewed), "under development" (the transformation process is being implemented), "completed" (ready to be deployed on LinkedLifeData), or "revised" (problems have been detected and need to be fixed).
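If the status column is ever processed by a script (for example, to list the sources that are ready to deploy), the four values above can be treated as a closed vocabulary. The following is a minimal sketch under that assumption; the `Status` enum and `parse_status` helper are hypothetical names, not part of any existing LinkedLifeData tool. It ignores parenthesised remarks such as "(to be verified only)".

```python
from enum import Enum


class Status(Enum):
    """Status values used in the data source table (hypothetical encoding)."""
    EVALUATION = "evaluation"                # structure and data are being reviewed
    UNDER_DEVELOPMENT = "under development"  # transformation is being implemented
    COMPLETED = "completed"                  # ready to be deployed on LinkedLifeData
    REVISED = "revised"                      # problems detected that need fixing


def parse_status(cell: str) -> Status:
    """Map a table cell to a Status, dropping any parenthesised remark.

    Raises ValueError for values outside the vocabulary, so typos in the
    table are caught instead of silently accepted.
    """
    base = cell.split("(", 1)[0].strip().lower()
    return Status(base)
```

For example, `parse_status("under development (to be verified only)")` yields `Status.UNDER_DEVELOPMENT`.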

Database | Dataset | Schema | Description | Status
--- | --- | --- | --- | ---
UniProt | Curated entries | Original by the provider | Protein sequences and annotations | completed
Entrez-Gene | Complete | Custom RDF schema | Genes and annotation | completed
iProClass | Complete | Custom RDF schema | Protein cross-references | completed
Gene Ontology | Complete | Schema by the provider | Gene and gene product annotation thesaurus | completed
BioGRID | Complete | BioPAX 2.0 (custom generated) | Protein interactions extracted from the literature | completed
National Cancer Institute - Pathway Interaction Database | Complete | BioPAX 2.0 (original by the provider) | Human pathway interaction database | completed
The Cancer Cell Map | Complete | BioPAX 2.0 (original by the provider) | Cancer pathways database | completed
Reactome | Complete | BioPAX 2.0 (original by the provider) | Human pathways and interactions | completed
BioCarta | Complete | BioPAX 2.0 (original by the provider) | Pathway database | completed
KEGG | Complete | BioPAX 1.0 (original by the provider) | Metabolic pathways | completed
BioCyc | Complete | BioPAX 1.0 (original by the provider) | Metabolic pathways | completed
NCBI Taxonomy | Complete | Custom RDF schema | Organisms | completed
Medline | Complete | Custom schema | Medline citations | under development (to be verified only)
UMLS | SNOMED (no significant effort needed to include others as well) | Custom schema | Metathesaurus | completed

TODO: Remove the differences between the schemata used by the LifeSKIM application and those provided with LinkedLifeData.

TODO: Add an extra column to group the knowledge sources by LinkedLifeData variant (e.g., PIKB).

Transformation to RDF

This section is aimed at LinkedLifeData development contributors. It discusses how the different sources can be recreated. We should consider an additional service that allows already generated data sources to be downloaded.
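Several sources below are distributed as delimited database dumps rather than RDF, so recreating them means reading records and emitting triples. The sketch below illustrates that step under stated assumptions: a tab-separated dump whose first two fields are an identifier and a name (mirroring the DrugBank note that the converter reads the first two fields), output in N-Triples, and a hypothetical `http://example.org/lld/` namespace. It is not the project's actual converter.

```python
from urllib.parse import quote

# Hypothetical namespace for generated resources; not the project's real URIs.
BASE = "http://example.org/lld/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def dump_to_ntriples(lines, skipped):
    """Yield one rdfs:label triple per record.

    Only the first two tab-separated fields (identifier, name) are read;
    line numbers of records with fewer fields or an empty identifier are
    appended to `skipped` instead of aborting the conversion.
    """
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0].strip():
            skipped.append(lineno)
            continue
        ident = quote(fields[0].strip(), safe="")  # make the identifier IRI-safe
        name = escape_literal(fields[1].strip())
        yield f'<{BASE}{ident}> <{RDFS_LABEL}> "{name}" .'
```

Collecting the skipped line numbers rather than failing keeps formatting problems in a dump visible without blocking conversion of the remaining records.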

Database name | Last processed release | Download link | ORDI descriptor | RDF schema | Converter | Short comment
--- | --- | --- | --- | --- | --- | ---
UniProt | 14.0 | ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf | no (distributed in RDF) | ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/core.owl | | The converter filters out non-curated entries. Revise the changes in blank node generation.
Entrez-Gene | ? | ftp://ftp.ncbi.nlm.nih.gov/gene/DATA | ? | ? | no (database dump) | We also need the database schema and the script to import the data.
Gene Ontology | 5.631 | http://archive.geneontology.org/latest-termdb/ | ? | ? | no (database dump) | We also need the database schema and the script to import the data.
Taxonomy | ? | ftp://ftp.ncbi.nih.gov/pub/taxonomy/ | ? | ? | no (database dump) | We also need the database schema and the script to import the data.
UMLS | 2008AA | http://umlsks.nlm.nih.gov/uPortal/tag.645d76c156782213.render.userLayoutRootNode.uP?uP_fname=umls-download | ?, ? | ? | no (database dump) | We also need the database schema and the script to import the data.
DrugBank | ? | http://drugbank.ca/downloads | ? | ? | ? | The converter reads the first two fields; the dump has formatting problems.
BioGRID | 2.0.39 | http://www.thebiogrid.org/downloads.php | ? | http://www.biopax.org/release/biopax-level2.owl# | no (database dump) | The database is aligned to the BioPAX schema.
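Several rows above note that the database schema and import script are still missing. As an illustration of what such a script could look like, the sketch below loads the `names.dmp` file from the NCBI Taxonomy dump (fields separated by `\t|\t`, each line terminated by `\t|`) into a relational table. SQLite is used only so the example is self-contained; the table name `taxon_name` and its columns are hypothetical and do not reflect the schema actually used for LinkedLifeData.

```python
import sqlite3

# Hypothetical table layout mirroring the four columns of names.dmp.
SCHEMA = """
CREATE TABLE IF NOT EXISTS taxon_name (
    tax_id      INTEGER NOT NULL,
    name_txt    TEXT    NOT NULL,
    unique_name TEXT,
    name_class  TEXT    NOT NULL
)
"""


def parse_dmp_line(line: str) -> list:
    """Split one taxonomy .dmp line: '\\t|\\t' between fields, '\\t|' at the end."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def import_names(conn: sqlite3.Connection, lines) -> int:
    """Create the table if needed, insert all name records and return their count."""
    conn.execute(SCHEMA)
    rows = []
    for line in lines:
        tax_id, name_txt, unique_name, name_class = parse_dmp_line(line)[:4]
        # An empty unique name is stored as NULL rather than ''.
        rows.append((int(tax_id), name_txt, unique_name or None, name_class))
    conn.executemany("INSERT INTO taxon_name VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

A typical call would open the extracted `names.dmp` with UTF-8 encoding and pass the file object as `lines`; the same pattern (explicit schema plus a small loader) would apply to the other dumps.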

TODO: Complete the list with Medline and all BioPAX sources that require transformation.
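For sources such as BioGRID, whose BioPAX 2.0 representation is custom generated, the transformation maps each interaction record onto BioPAX classes. The sketch below shows one possible mapping of a binary interaction to BioPAX Level 2 N-Triples, using the class and property names as we understand them from Level 2 (`physicalInteraction`, `physicalEntityParticipant`, `protein`, `PARTICIPANTS`, `PHYSICAL-ENTITY`, `NAME`); these should be checked against biopax-level2.owl. The instance namespace and the `interaction_triples` function are hypothetical, and this is not the converter that produced the BioGRID data.

```python
from urllib.parse import quote

BP = "http://www.biopax.org/release/biopax-level2.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/biogrid/"  # hypothetical instance namespace


def iri(local: str) -> str:
    """Build an IRI in the instance namespace, percent-encoding the local part."""
    return f"<{BASE}{quote(local, safe='')}>"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def interaction_triples(interaction_id: str, protein_a: str, protein_b: str) -> list:
    """Describe one binary protein interaction as BioPAX Level 2 N-Triples."""
    inter = iri(f"interaction_{interaction_id}")
    triples = [f"{inter} <{RDF_TYPE}> <{BP}physicalInteraction> ."]
    for i, name in enumerate((protein_a, protein_b), start=1):
        # Level 2 links interactions to proteins via participant nodes.
        part = iri(f"participant_{interaction_id}_{i}")
        prot = iri(f"protein_{name}")
        triples += [
            f"{inter} <{BP}PARTICIPANTS> {part} .",
            f"{part} <{RDF_TYPE}> <{BP}physicalEntityParticipant> .",
            f"{part} <{BP}PHYSICAL-ENTITY> {prot} .",
            f"{prot} <{RDF_TYPE}> <{BP}protein> .",
            f"{prot} <{BP}NAME> {literal(name)} .",
        ]
    return triples
```

Each participant gets its own node so that interaction-specific details could later be attached to it without touching the shared protein resource.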

TODO: Revise the code of the ORDI descriptors/converters and upload the new versions to the LarKC version control repository at SourceForge.

LarkcProject/WP7a/DataRepository (last edited 2008-11-20 14:03:41 by AngusRoberts)