Statistical semantics in LarKC
Note: If you are working with large corpora, you may want to look at this page, where HLRS describes how Random Indexing can be run on the cluster using their MPI implementation on top of the Airhead library.
This page describes the application of statistical semantics methods, such as LSA and Random Indexing, to RDF graphs. Applying these methods to an RDF graph involves the following steps:
Generate virtual documents.
Generate semantic index.
Search. Given a topic of interest, find related URIs/literals.
From this page, you can access virtual documents and semantic indexes that have already been generated, as well as the code used to generate them. There are also references to the code used for search (see the bottom of this page).
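To make the first step more concrete, here is a minimal sketch of generating virtual documents from RDF triples. It assumes that the virtual document for a node is simply the list of predicates and objects of the triples in which the node is the subject; the RDFToText Transformer plugin may define virtual documents differently. The function name `build_virtual_documents` and the toy triples are hypothetical and serve only as illustration.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object). The facts are illustrative only,
# not taken from the LLD1/LLD2 datasets.
TRIPLES = [
    ("http://dbpedia.org/resource/Belgrade",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Belgrade"),
    ("http://dbpedia.org/resource/Belgrade",
     "http://dbpedia.org/property/country",
     "http://dbpedia.org/resource/Serbia"),
    ("http://dbpedia.org/resource/Beograd",
     "http://dbpedia.org/ontology/wikiPageRedirects",
     "http://dbpedia.org/resource/Belgrade"),
]


def build_virtual_documents(triples):
    """Map each subject URI to the tokens of its outgoing triples.

    Assumption: a virtual document holds the predicates and objects of all
    triples that share the same subject, in input order.
    """
    docs = defaultdict(list)
    for subject, predicate, obj in triples:
        docs[subject].extend([predicate, obj])
    return dict(docs)


if __name__ == "__main__":
    for uri, tokens in build_virtual_documents(TRIPLES).items():
        print(uri, "->", " ".join(tokens))
```

Each resulting document can then be treated like an ordinary text document when building the semantic index.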
Examples
Dataset: DBpedia 3.6. Method: Random Indexing doc-doc cosine similarity search
Input URI: http://dbpedia.org/property/spouse
Output:
0.73833865:http://dbpedia.org/property/spouse 0.63593835:http://mpii.de/yago/resource/Ernest_Lehman 0.6295992:http://mpii.de/yago/resource/Chazz_Palminteri 0.6290349:http://mpii.de/yago/resource/Sidney_Morgan 0.62235934:http://mpii.de/yago/resource/Richard_Neill 0.6209119:http://mpii.de/yago/resource/Alfred_Herman 0.6178053:http://mpii.de/yago/resource/Herbert_Rawlinson 0.61268044:http://mpii.de/yago/resource/Alice_Pearce 0.6112338:http://mpii.de/yago/resource/Vincent_Sherman 0.6104505:http://mpii.de/yago/resource/Elmo_Williams 0.61020947:http://dbpedia.org/resource/Barbara_Hayden 0.60907245:http://mpii.de/yago/resource/Bernard_Herzbrun 0.6087178:http://mpii.de/yago/resource/Holmes_Herbert 0.6072172:http://en.wikipedia.org/wiki/Jane_Lawrence 0.60721505:http://mpii.de/yago/resource/Barry_Levinson ....
Input URI: http://dbpedia.org/resource/Belgrade
Output:
0.65238595:http://mpii.de/yago/resource/Belgrade 0.614219:http://dbpedia.org/class/yago/Singi 0.59367216:http://dbpedia.org/resource/Prinz-Eugenstadt 0.5876824:http://dbpedia.org/resource/N%C3%A1ndorfeh%C3%A9rv%C3%A1r 0.58644086:http://dbpedia.org/resource/Capital_of_Serbia_and_Montenegro 0.5798055:http://dbpedia.org/resource/Bgd 0.57569927:http://dbpedia.org/resource/Belograd 0.5717645:http://dbpedia.org/resource/Belgrade,_Serbia 0.5700993:http://dbpedia.org/resource/Beograd 0.57004863:http://dbpedia.org/resource/UN/LOCODE:RSBEG 0.569855:http://dbpedia.org/resource/Belgrade_District 0.5643723:http://dbpedia.org/resource/Nandorfehervar 0.56242085:http://dbpedia.org/resource/Prinzeugenstadt 0.5620222:http://en.wikipedia.org/wiki/Belgrade 0.5614954:http://dbpedia.org/resource/Capital_of_Yugoslavia 0.55826:http://dbpedia.org/resource/Europe/Belgrade 0.55814195:http://dbpedia.org/resource/Belgrad 0.5569123:http://dbpedia.org/resource/%D0%91%D0%B5%D0%BE%D0%B3%D1%80%D0%B0%D0%B4 0.5506616:http://dbpedia.org/resource/City_of_Belgrade 0.5500857:http://dbpedia.org/resource/UN/LOCODE:CSBEG 0.54836565:http://dbpedia.org/resource/Belgrade,_Yugoslavia 0.54535824:http://dbpedia.org/resource/Capital_of_Serbia ...
Input URI: http://dbpedia.org/resource/Bill_Clinton
Output:
0.53870606:http://dbpedia.org/resource/I_did_not_inhale 0.5370285:http://dbpedia.org/resource/Category:Hillary_Rodham_Clinton 0.529921:http://en.wikipedia.org/wiki/Bill_Clinton 0.52962655:http://dbpedia.org/resource/President_Clinton 0.52633643:http://dbpedia.org/resource/Clinton_Gore_Administration 0.5246629:http://dbpedia.org/resource/William_J._Blythe_III 0.52457106:http://dbpedia.org/resource/President_Bill_Clinton 0.52424276:http://mpii.de/yago/resource/Bill_Clinton 0.5193237:http://dbpedia.org/resource/Buddy_%28Clinton%27s_dog%29 0.51768357:http://dbpedia.org/resource/William_Jefferson_Blythe_IV 0.51679724:http://dbpedia.org/resource/Klin-ton 0.51374197:http://dbpedia.org/resource/William_J_Clinton 0.51278436:http://dbpedia.org/resource/William_J._Clinton 0.5085358:http://dbpedia.org/resource/Bill_Blythe_IV 0.5065459:http://dbpedia.org/resource/William_Clinton 0.5056521:http://dbpedia.org/resource/42nd_President_of_the_United_States 0.5040903:http://dbpedia.org/resource/William_J._Blythe 0.5034488:http://dbpedia.org/resource/Billl_Clinton 0.50080174:http://dbpedia.org/resource/William_Jefferson_Blythe_III 0.49960986:http://dbpedia.org/resource/Bill_Klinton 0.498533:http://dbpedia.org/resource/Billll_Clinton 0.49698466:http://dbpedia.org/resource/William_Jefferson_Clinton 0.4968173:http://dbpedia.org/resource/Bill_J._Clinton 0.49598023:http://dbpedia.org/resource/Clinton,_Bill 0.49558133:http://dbpedia.org/resource/Billy_Clinton 0.4950436:http://dbpedia.org/resource/Bill_Clinton%5C 0.49453247:http://dbpedia.org/resource/BillClinton 0.49413666:http://dbpedia.org/resource/WilliamJeffersonClinton 0.49355277:http://dbpedia.org/resource/Bill_clinton 0.4904687:http://dbpedia.org/resource/Willam_Jefferson_Blythe_III 0.49022478:http://dbpedia.org/resource/William_%22Bill%22_Clinton 0.49005547:http://dbpedia.org/resource/William_Jefferson_%22Bill%22_Clinton 0.48986775:http://dbpedia.org/resource/Putting_People_First 0.48948374:http://dbpedia.org/resource/Bil_Clinton 0.48931384:http://dbpedia.org/resource/William_Blythe_III 0.48818415:http://dbpedia.org/resource/Bull_Clinton ....
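The listings above pair a cosine similarity score with a URI, separated by a colon. As a rough sketch of the search step, the following code ranks the vectors of an index by cosine similarity to a query vector and parses a result line in the `score:URI` format shown above. The function names (`cosine`, `search`, `parse_results`) and the tiny index are hypothetical; this is not the code of the LarKC plugins or of SemanticVectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(index, query_vector, top_n=10):
    """Return up to top_n (score, uri) pairs, highest score first."""
    scored = [(cosine(query_vector, vec), uri) for uri, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def parse_results(line):
    """Parse 'score:URI score:URI ...' into (score, uri) pairs.

    Tokens without a leading numeric score (such as a trailing '....')
    are skipped.
    """
    results = []
    for token in line.split():
        parts = token.split(":", 1)  # split on the first colon only
        if len(parts) != 2:
            continue
        try:
            results.append((float(parts[0]), parts[1]))
        except ValueError:
            continue
    return results


if __name__ == "__main__":
    # Hypothetical two-dimensional index, for illustration only.
    toy_index = {"urn:a": [1.0, 0.0], "urn:b": [0.0, 1.0], "urn:c": [1.0, 1.0]}
    for score, uri in search(toy_index, [1.0, 0.0]):
        print(f"{score}:{uri}")
```

Splitting on the first colon only matters here because every URI itself contains colons.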
Data
- Virtual documents:
Download here the virtual documents generated from the LLD1 dataset (size: ~231M).
Download here the virtual documents generated from the LLD2 dataset (size: ~1.3G).
- Random Indexing
Download here the semantic index generated from the LLD1 dataset using the following parameters:
- dimensionality: 500
- seed length: 10
- minimum term frequency: 1
Download here the semantic index generated from the LLD2 dataset using the following parameters:
- dimensionality: 500
- seed length: 10
- minimum term frequency: 1
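To illustrate what the three parameters above control, here is a minimal sketch of one common Random Indexing variant. It assumes that each term gets a sparse random vector with `seed_length` non-zero entries (half +1, half -1) in a space of `dim` dimensions, that terms occurring fewer than `min_term_freq` times are ignored, and that a document vector is the sum of the vectors of its terms. SemanticVectors and Airhead may build their indexes differently; the function names are hypothetical.

```python
import random
from collections import Counter


def elemental_vector(dim, seed_length, rng):
    """Sparse ternary vector: seed_length non-zero entries, half +1 and half -1."""
    vec = [0.0] * dim
    positions = rng.sample(range(dim), seed_length)
    for i, pos in enumerate(positions):
        vec[pos] = 1.0 if i < seed_length // 2 else -1.0
    return vec


def build_index(docs, dim=500, seed_length=10, min_term_freq=1, seed=0):
    """Build document vectors from docs, a dict of doc_id -> list of tokens."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    term_freq = Counter(tok for tokens in docs.values() for tok in tokens)
    # One random vector per sufficiently frequent term.
    elemental = {
        term: elemental_vector(dim, seed_length, rng)
        for term in sorted(term_freq)
        if term_freq[term] >= min_term_freq
    }
    index = {}
    for doc_id, tokens in docs.items():
        vec = [0.0] * dim
        for tok in tokens:
            term_vec = elemental.get(tok)
            if term_vec is None:
                continue  # term was filtered out by min_term_freq
            for i, value in enumerate(term_vec):
                if value:
                    vec[i] += value
        index[doc_id] = vec
    return index


if __name__ == "__main__":
    toy_docs = {"doc1": ["x", "y"], "doc2": ["x", "y"], "doc3": ["z"]}
    toy_index = build_index(toy_docs)
    print(len(toy_index["doc1"]))  # one entry per dimension
```

Documents that share many terms end up with similar vectors, which is what the cosine similarity search relies on.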
- Latent Semantic Analysis
Download here the semantic index generated from the LLD1 dataset using the following parameters:
- dimensionality: 300
Download here the semantic index generated from the LLD2 dataset using the following parameters:
- dimensionality: 300
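For orientation, the dimensionality parameter can be read in terms of the standard formulation of LSA as a truncated singular value decomposition. This is a sketch under that assumption; the page does not state which term weighting or SVD implementation was used. Here $X$ is the term-by-document matrix built from the virtual documents and $k$ is the dimensionality:

```latex
X \approx X_k = U_k \Sigma_k V_k^{\top}, \qquad k = 300,
\qquad
\mathrm{sim}(d_i, d_j) = \frac{\hat{d}_i \cdot \hat{d}_j}{\lVert \hat{d}_i \rVert \, \lVert \hat{d}_j \rVert},
\qquad
\hat{d}_j = \Sigma_k V_k^{\top} e_j
```

In this notation, $U_k$ and $V_k$ hold the first $k$ left and right singular vectors, $\Sigma_k$ holds the $k$ largest singular values, and $e_j$ selects document $j$, so each document is represented by a vector of 300 components.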
Code
The code used to apply statistical semantics within LarKC is available as plugins of the LarKC platform from http://sourceforge.net/projects/larkc/. In particular, the RDFToText Transformer plugin (LarKC version 1.0) generates virtual documents from an RDF graph, and the Random Indexing Selecter (LarKC version 1.0) applies Random Indexing to the virtual documents. This code wraps the open source SemanticVectors and Airhead libraries.
The latest plugins and workflows that use statistical semantics are available from: http://wiki.larkc.eu/LarkcProject/WP2/workflows
The semantic spaces that can be accessed from this page use the SemanticVectors library.
Read more
- Reports
- Publications
D. Damljanovic, J. Petrak, M. Lupu, H. Cunningham, M. Carlsson, G. Engstrom, B. Andersson: Random Indexing for Finding Similar Nodes within Large RDF graphs. In: Proceedings of the Fourth International Workshop on Resource Discovery, Collocated with the 8th Extended Semantic Web Conference (ESWC 2011). Heraklion, Greece (June 2011) PDF
M. Assel, A. Cheptsov, B. Czink, D. Damljanovic, J. Quesada: MPI Realization of High Performance Search for Querying Large RDF Graphs using Statistical Semantics. In: Proceedings of the 1st Workshop on High-Performance Computing for the Semantic Web, Collocated with the 8th Extended Semantic Web Conference (ESWC 2011). Heraklion, Greece (June 2011) PDF
D. Damljanovic, J. Petrak, H. Cunningham: Random Indexing for Searching Large RDF Graphs. In: Proceedings of the 7th Extended Semantic Web Conference (ESWC 2010), Poster session. Springer Verlag, Heraklion, Greece (May 31-June 3, 2010) PDF
