Information extraction
This page describes the information extraction process to be applied to LinkedLifeData resources. Our understanding is that the documents to be analyzed, together with their metadata, will also be part of the knowledge base.
Corpora
See also WP7b information extraction
To perform information extraction correctly, we need good evaluation criteria for measuring precision and recall.
Corpus | Comment | Size | Converter to GATE |
GENIA | | 2000 MEDLINE abstracts: terms “blood cells”, “human”, “transcription factors” | no |
BioCreative I | | 15000 (including testing data) | yes |
BioCreative II | | 15000 from BioCreative I + 5000 new | yes |
Penn BioIE CYTP450 | | 1100 Files, (370784) Base Annotations, (53875) Specific Annotations, (0) Relations, (1147) Chains | no |
Penn BioIE Malignancy | | 1157 Files, (341767) Base Annotations, (31886) Specific Annotations, (12) Relations, (1251) Chains | no |
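The precision/recall measurement mentioned above can be sketched as follows. This is a minimal illustration, not the evaluation procedure of the project: it assumes exact-match scoring (same character offsets and same entity type), which this page does not specify, and the names `Annotation` and `precision_recall` are hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple


class Annotation(NamedTuple):
    """One entity mention: character offsets plus an entity type (hypothetical layout)."""
    start: int
    end: int
    label: str


def precision_recall(gold: Iterable[Annotation],
                     predicted: Iterable[Annotation]) -> Tuple[float, float, float]:
    """Return exact-match precision, recall and F1 for two annotation sets."""
    gold_set, predicted_set = set(gold), set(predicted)
    # A prediction counts as correct only if offsets and label both match.
    true_positives = len(gold_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: three gold mentions, two predictions, one of them correct.
gold = [Annotation(0, 4, "Gene"), Annotation(10, 18, "Cell Line"), Annotation(25, 28, "DNA")]
predicted = [Annotation(0, 4, "Gene"), Annotation(10, 18, "DNA")]

p, r, f = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy data the second prediction has the right span but the wrong type, so under exact matching it counts as both a false positive and a missed gold mention. A more lenient scheme (for example, partial span overlap) would give different figures.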
Possible biomedical entities to be extracted
See also WP7b information extraction
Entity | Type | GENIA | BioCreative I | BioCreative II | Penn BioIE CYTP450 | Penn BioIE Malignancy |
DNA | Named Entity | + (no mapping to database entries) | | | | |
RNA | Named Entity | + (no mapping to database entries) | | | | |
Cell Line | Named Entity | + (no mapping to database entries) | | | | |
Cell Culture | Named Entity | + (no mapping to database entries) | | | | |
Gene | Named Entity | + (no mapping to database entries) | + (document level mappings to Entrez-Gene; fly, mouse, yeast) | + (document level mapping to Entrez-Gene; human) | | + (categories: gene-protein?, gene-rna?, gene-generic?) |
CYP450? | Named Entity | | | | ? | |
Substance | Named Entity | | | | any protein, chemical etc. | |
Quantitative Measurements | Named Entity | | | | + (units, value, quantity) | + (units) |
Malignancy | Named Entity | | | | | + |
Biological Process (hierarchy) | Relation | + | | | | |
Artificial Process | Relation | + | | | | |
Correlation (co-occurrence?) | Relation | + | | | | |
Gene Variations | Relation | | | | | + (type, location, state-original, event) |
Protein Protein Interaction | Relation | | | + | | |
POS | Lexical | + | | | + | + |
Token | Lexical | + | | | + | + |
Sentence | Lexical | + | + | + | + | + |
Treebank | Lexical | + | | | + | + |
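As an illustration of how the coverage table above could be used when choosing evaluation corpora for a given entity type, the sketch below encodes a few of its rows as a lookup structure. The names `CORPORA`, `COVERAGE` and `corpora_for` are hypothetical, and only a subset of the rows is included.

```python
# Corpus names in the column order of the table above.
CORPORA = [
    "GENIA",
    "BioCreative I",
    "BioCreative II",
    "Penn BioIE CYTP450",
    "Penn BioIE Malignancy",
]

# Excerpt of the table: entity -> corpora marked with "+" for that entity.
COVERAGE = {
    "Gene": {"GENIA", "BioCreative I", "BioCreative II", "Penn BioIE Malignancy"},
    "Quantitative Measurements": {"Penn BioIE CYTP450", "Penn BioIE Malignancy"},
    "Malignancy": {"Penn BioIE Malignancy"},
    "Protein Protein Interaction": {"BioCreative II"},
    "Sentence": set(CORPORA),
}


def corpora_for(entity: str) -> list:
    """Return the corpora annotating `entity`, in table column order."""
    annotated = COVERAGE.get(entity, set())
    return [corpus for corpus in CORPORA if corpus in annotated]


print(corpora_for("Gene"))
print(corpora_for("Protein Protein Interaction"))
```

Entities missing from the excerpt return an empty list, so a caller can tell "no corpus available" apart from a lookup error.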
