Statistical semantics in LarKC

Note: If you are working with large corpora, you may want to look at this page, where HLRS describes how Random Indexing can be run on the cluster using their MPI implementation built on top of the Airhead library.

This page describes the application of statistical semantics methods, such as LSA and Random Indexing, to RDF graphs. Applying these methods to an RDF graph involves generating virtual documents from the graph and then building a semantic index over those documents, which can then be searched.

From this page, you can access the virtual documents and semantic indexes that have already been generated, as well as the code that was used. References to the code used for search are at the bottom of this page.
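To illustrate what a virtual document can look like, here is a minimal sketch in Python. It is not the RDFToText Transformer code, and the way that plugin actually selects and weights text is not described on this page. The sketch assumes, purely for illustration, that the virtual document of a URI is the bag of local names and literal words found in the triples where that URI is the subject. The function names and the `example.org` triples are hypothetical.

```python
from collections import defaultdict


def local_name(uri: str) -> str:
    """Return the part of a URI after its last '/' or '#'."""
    cut = max(uri.rfind("/"), uri.rfind("#"))
    return uri[cut + 1:]


def build_virtual_documents(triples):
    """Map each subject URI to a list of tokens taken from its triples.

    Assumption: predicates and URI objects contribute their local name,
    literal objects contribute their whitespace-separated words.
    """
    docs = defaultdict(list)
    for subject, predicate, obj in triples:
        docs[subject].append(local_name(predicate))
        if obj.startswith(("http://", "https://")):
            docs[subject].append(local_name(obj))
        else:
            docs[subject].extend(obj.split())
    return dict(docs)


# Hypothetical toy triples, not taken from DBpedia.
triples = [
    ("http://example.org/resource/Belgrade",
     "http://example.org/ontology/country",
     "http://example.org/resource/Serbia"),
    ("http://example.org/resource/Belgrade",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Belgrade"),
]

virtual_docs = build_virtual_documents(triples)
for uri, tokens in virtual_docs.items():
    print(uri, tokens)
```

Each resulting token list can then be written out as one text document per URI and passed to an indexing method such as LSA or Random Indexing.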

Examples

Dataset: DBpedia 3.6. Method: Random Indexing doc-doc cosine similarity search

Input URI: http://dbpedia.org/property/spouse

Output:

0.73833865:http://dbpedia.org/property/spouse
0.63593835:http://mpii.de/yago/resource/Ernest_Lehman
0.6295992:http://mpii.de/yago/resource/Chazz_Palminteri
0.6290349:http://mpii.de/yago/resource/Sidney_Morgan
0.62235934:http://mpii.de/yago/resource/Richard_Neill
0.6209119:http://mpii.de/yago/resource/Alfred_Herman
0.6178053:http://mpii.de/yago/resource/Herbert_Rawlinson
0.61268044:http://mpii.de/yago/resource/Alice_Pearce
0.6112338:http://mpii.de/yago/resource/Vincent_Sherman
0.6104505:http://mpii.de/yago/resource/Elmo_Williams
0.61020947:http://dbpedia.org/resource/Barbara_Hayden
0.60907245:http://mpii.de/yago/resource/Bernard_Herzbrun
0.6087178:http://mpii.de/yago/resource/Holmes_Herbert
0.6072172:http://en.wikipedia.org/wiki/Jane_Lawrence
0.60721505:http://mpii.de/yago/resource/Barry_Levinson
....

Input URI: http://dbpedia.org/resource/Belgrade

Output:

0.65238595:http://mpii.de/yago/resource/Belgrade
0.614219:http://dbpedia.org/class/yago/Singi
0.59367216:http://dbpedia.org/resource/Prinz-Eugenstadt
0.5876824:http://dbpedia.org/resource/N%C3%A1ndorfeh%C3%A9rv%C3%A1r
0.58644086:http://dbpedia.org/resource/Capital_of_Serbia_and_Montenegro
0.5798055:http://dbpedia.org/resource/Bgd
0.57569927:http://dbpedia.org/resource/Belograd
0.5717645:http://dbpedia.org/resource/Belgrade,_Serbia
0.5700993:http://dbpedia.org/resource/Beograd
0.57004863:http://dbpedia.org/resource/UN/LOCODE:RSBEG
0.569855:http://dbpedia.org/resource/Belgrade_District
0.5643723:http://dbpedia.org/resource/Nandorfehervar
0.56242085:http://dbpedia.org/resource/Prinzeugenstadt
0.5620222:http://en.wikipedia.org/wiki/Belgrade
0.5614954:http://dbpedia.org/resource/Capital_of_Yugoslavia
0.55826:http://dbpedia.org/resource/Europe/Belgrade
0.55814195:http://dbpedia.org/resource/Belgrad
0.5569123:http://dbpedia.org/resource/%D0%91%D0%B5%D0%BE%D0%B3%D1%80%D0%B0%D0%B4
0.5506616:http://dbpedia.org/resource/City_of_Belgrade
0.5500857:http://dbpedia.org/resource/UN/LOCODE:CSBEG
0.54836565:http://dbpedia.org/resource/Belgrade,_Yugoslavia
0.54535824:http://dbpedia.org/resource/Capital_of_Serbia
...

Input URI: http://dbpedia.org/resource/Bill_Clinton

Output:

0.53870606:http://dbpedia.org/resource/I_did_not_inhale
0.5370285:http://dbpedia.org/resource/Category:Hillary_Rodham_Clinton
0.529921:http://en.wikipedia.org/wiki/Bill_Clinton
0.52962655:http://dbpedia.org/resource/President_Clinton
0.52633643:http://dbpedia.org/resource/Clinton_Gore_Administration
0.5246629:http://dbpedia.org/resource/William_J._Blythe_III
0.52457106:http://dbpedia.org/resource/President_Bill_Clinton
0.52424276:http://mpii.de/yago/resource/Bill_Clinton
0.5193237:http://dbpedia.org/resource/Buddy_%28Clinton%27s_dog%29
0.51768357:http://dbpedia.org/resource/William_Jefferson_Blythe_IV
0.51679724:http://dbpedia.org/resource/Klin-ton
0.51374197:http://dbpedia.org/resource/William_J_Clinton
0.51278436:http://dbpedia.org/resource/William_J._Clinton
0.5085358:http://dbpedia.org/resource/Bill_Blythe_IV
0.5065459:http://dbpedia.org/resource/William_Clinton
0.5056521:http://dbpedia.org/resource/42nd_President_of_the_United_States
0.5040903:http://dbpedia.org/resource/William_J._Blythe
0.5034488:http://dbpedia.org/resource/Billl_Clinton
0.50080174:http://dbpedia.org/resource/William_Jefferson_Blythe_III
0.49960986:http://dbpedia.org/resource/Bill_Klinton
0.498533:http://dbpedia.org/resource/Billll_Clinton
0.49698466:http://dbpedia.org/resource/William_Jefferson_Clinton
0.4968173:http://dbpedia.org/resource/Bill_J._Clinton
0.49598023:http://dbpedia.org/resource/Clinton,_Bill
0.49558133:http://dbpedia.org/resource/Billy_Clinton
0.4950436:http://dbpedia.org/resource/Bill_Clinton%5C
0.49453247:http://dbpedia.org/resource/BillClinton
0.49413666:http://dbpedia.org/resource/WilliamJeffersonClinton
0.49355277:http://dbpedia.org/resource/Bill_clinton
0.4904687:http://dbpedia.org/resource/Willam_Jefferson_Blythe_III
0.49022478:http://dbpedia.org/resource/William_%22Bill%22_Clinton
0.49005547:http://dbpedia.org/resource/William_Jefferson_%22Bill%22_Clinton
0.48986775:http://dbpedia.org/resource/Putting_People_First
0.48948374:http://dbpedia.org/resource/Bil_Clinton
0.48931384:http://dbpedia.org/resource/William_Blythe_III
0.48818415:http://dbpedia.org/resource/Bull_Clinton

....
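The result listings above use one `score:URI` pair per line. If you want to post-process such output, the following Python sketch shows one way to parse it. The function names are hypothetical, and the sketch assumes the score always comes first and is separated from the URI by the first colon on the line. Splitting only once matters because the URIs themselves contain colons.

```python
def parse_result_line(line):
    """Split a 'score:URI' line at the first colon only."""
    score_text, uri = line.strip().split(":", 1)
    return float(score_text), uri


def parse_results(lines):
    """Parse result lines, skipping blank lines and '...' elision markers."""
    results = []
    for line in lines:
        stripped = line.strip()
        if not stripped or set(stripped) == {"."}:
            continue
        results.append(parse_result_line(stripped))
    return results


# A few lines copied from the Belgrade example above.
sample = [
    "0.65238595:http://mpii.de/yago/resource/Belgrade",
    "0.614219:http://dbpedia.org/class/yago/Singi",
    "0.57004863:http://dbpedia.org/resource/UN/LOCODE:RSBEG",
    "...",
]

parsed = parse_results(sample)
for score, uri in parsed:
    print(f"{score:.4f}  {uri}")
```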

Data

Code

The code used to apply statistical semantics within LarKC is available as plugins of the LarKC platform from http://sourceforge.net/projects/larkc/. In particular, the RDFToText Transformer plugin (LarKC version 1.0) generates virtual documents from an RDF graph, and the Random Indexing Selecter (LarKC version 1.0) applies Random Indexing to those virtual documents. This code wraps the open-source SemanticVectors and Airhead libraries.
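To make the idea behind Random Indexing with doc-doc cosine similarity search more concrete, here is a small self-contained Python sketch. It does not use or reproduce the SemanticVectors, Airhead, or LarKC plugin code. The dimensionality, the number of non-zero entries, the function names, and the toy documents are all assumptions chosen for illustration. Each term gets a sparse random index vector of +1/-1 entries, a document vector is the sum of the index vectors of its tokens, and documents are ranked by cosine similarity to the query document.

```python
import math
import random

DIM = 512      # size of the reduced space (assumed value)
NONZERO = 8    # number of +1/-1 entries per index vector (assumed value)


def index_vector(term, dim=DIM, nonzero=NONZERO):
    """Sparse ternary random vector, seeded by the term so it is reproducible."""
    rng = random.Random(term)
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def document_vector(tokens, dim=DIM):
    """Sum the index vectors of all tokens in a (virtual) document."""
    vec = [0.0] * dim
    for token in tokens:
        for i, value in enumerate(index_vector(token, dim)):
            vec[i] += value
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_id, doc_vectors, top_k=5):
    """Rank all documents by cosine similarity to the query document."""
    query = doc_vectors[query_id]
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in doc_vectors.items()]
    return sorted(scored, reverse=True)[:top_k]


# Hypothetical toy virtual documents.
docs = {
    "doc:a": ["capital", "serbia", "danube"],
    "doc:b": ["capital", "serbia", "sava"],
    "doc:c": ["actor", "film", "director"],
}
doc_vectors = {doc_id: document_vector(tokens) for doc_id, tokens in docs.items()}

for score, doc_id in search("doc:a", doc_vectors):
    print(f"{score}:{doc_id}")
```

In this toy setting a document is always most similar to itself, and documents that share tokens rank above those that do not. A real semantic space may weight terms and build its vectors differently.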

The latest plugins and workflows that use statistical semantics are available from: http://wiki.larkc.eu/LarkcProject/WP2/workflows

The semantic spaces that can be accessed from this page use the Semantic Vectors library.

Read more

Questions?

Email us.

LarkcProject/statisticalSemantics (last edited 2011-09-12 11:13:49 by DanicaDamljanovic)