Random Indexing Plugins and Workflows within the LarKC Platform
This page describes the plugins and workflows available with the LarKC Platform that are based on Random Indexing.
Plugins and workflows for the LarKC Platform 2.0/2.5
The plugins described below (except the RISubsettingPlugin) are used in the Query Expansion workflow described at http://wiki.larkc.eu/LarkcProject/WP2/workflows#QueryExpansionWorkflow
RISearchPlugin
Given a SPARQL query, this plugin extracts its URIs/literals and uses them to search for contextually related URIs/literals within a predefined semantic space. By default it returns 20 similar URIs/literals; this number can be changed through the numOfWords parameter. The SPARQL query is expected to follow this pattern:
SELECT ?s ?p ?o WHERE { { ?s ?p ?o . ?s ?p "asthma"} }
Note that you need to build your own semantic space using the AirHead S-Space Package, which contains a collection of algorithms for building semantic spaces (http://code.google.com/p/airhead-research/). You can also download some existing semantic spaces from http://wiki.larkc.eu/LarkcProject/statisticalSemantics
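The search step can be pictured as a nearest-neighbour lookup. The following is a minimal sketch, assuming a tiny in-memory semantic space (term to vector) and plain cosine similarity; SPACE, extract_terms and search are hypothetical names, not the plugin's API, and num_of_words only mirrors the role of the numOfWords parameter.

```python
import math
import re

# Hypothetical toy semantic space (term -> vector); a real one would be built
# offline, e.g. with the AirHead S-Space Package, and loaded from disk.
SPACE = {
    "asthma": [0.9, 0.1, 0.0],
    "bronchitis": [0.8, 0.2, 0.1],
    "wheezing": [0.7, 0.3, 0.0],
    "diabetes": [0.1, 0.9, 0.2],
}


def extract_terms(sparql: str) -> list[str]:
    """Collect quoted literals and <...> URIs (naive regexes, enough for the pattern above)."""
    return re.findall(r'"([^"]*)"', sparql) + re.findall(r"<([^>\s]*)>", sparql)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(sparql: str, space: dict[str, list[float]], num_of_words: int = 20) -> dict[str, list[str]]:
    """For each query term present in the space, return its num_of_words nearest neighbours."""
    results = {}
    for term in extract_terms(sparql):
        if term not in space:
            continue  # terms unknown to the semantic space are skipped
        others = [w for w in space if w != term]
        others.sort(key=lambda w: cosine(space[term], space[w]), reverse=True)
        results[term] = others[:num_of_words]
    return results
```

With this toy space, searching the "asthma" query above with num_of_words=2 yields bronchitis and wheezing, the two vectors closest in direction to asthma.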
QueryExpansionPlugin
This plugin expands the original SPARQL query by adding UNION statements that are dynamically generated from the list of contextually similar URIs/literals found by the RISearchPlugin. The input SPARQL query is expected to follow this pattern:
SELECT ?s ?p ?o WHERE { { ?s ?p ?o . ?s ?p "asthma"} }
This will be expanded to something like the following:
SELECT ?s ?p ?o WHERE { { ?s ?p ?o . ?s ?p "asthma"} UNION { ?s ?p ?o . ?s ?p "a contextually similar URI/literal to asthma"} UNION { ?s ?p ?o . ?s ?p "a contextually similar URI/literal to asthma"} }
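A minimal string-based sketch of this expansion, assuming the single-literal pattern shown above; expand_query is a hypothetical name, and a production implementation would more likely rewrite a parsed query than rely on a regular expression. The group pattern holding the literal is copied once per similar term, with the literal swapped in, and the copies are joined with UNION.

```python
import re


def expand_query(sparql: str, similar: list[str]) -> str:
    """Add one UNION block per similar term, copying the group that holds the literal."""
    # Innermost group pattern containing a quoted literal, e.g. { ?s ?p ?o . ?s ?p "asthma"}
    match = re.search(r'\{([^{}]*"([^"{}]*)"[^{}]*)\}', sparql)
    if match is None:
        return sparql  # nothing to expand
    body, literal = match.group(1), match.group(2)
    blocks = ["{" + body + "}"]
    for term in similar:
        # Escape backslashes and quotes so the new literal stays valid SPARQL.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        blocks.append("{" + body.replace(f'"{literal}"', f'"{escaped}"') + "}")
    return sparql[: match.start()] + " UNION ".join(blocks) + sparql[match.end():]
```

For example, expand_query('SELECT ?s ?p ?o WHERE { { ?s ?p ?o . ?s ?p "asthma"} }', ["wheezing"]) returns the original group followed by UNION { ?s ?p ?o . ?s ?p "wheezing"}, matching the shape shown above.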
LLDReasoner
This plugin evaluates the SPARQL query against the http://www.linkedlifedata.com SPARQL endpoint.
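Under the W3C SPARQL 1.1 Protocol, a query can be sent to an endpoint as the URL-encoded query parameter of an HTTP GET. The following minimal sketch only builds such a request without sending it; the endpoint path is a hypothetical placeholder rather than a documented Linked Life Data URL, and build_sparql_request is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder; check Linked Life Data for the actual endpoint path.
ENDPOINT = "http://www.linkedlifedata.com/sparql"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    # The query string goes URL-encoded into the "query" parameter.
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for results in the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(ENDPOINT, 'SELECT ?s ?p ?o WHERE { { ?s ?p ?o . ?s ?p "asthma"} }')
```

Passing req to urllib.request.urlopen would send the query and return the endpoint's response.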
RISubsettingPlugin
This plugin accepts a SPARQL query as input and produces RDF statements as output. It performs the following steps:
1) extracts the keywords from the query and appends them to a single string;
2) projects the string (sentence) created in the previous step into a multi-dimensional semantic space (created using the Semantic Vectors library, see http://code.google.com/p/semanticvectors/) that contains 1,027,355 Wikipedia documents, thereby creating a document vector that represents the string in that semantic space;
3) calculates the cosine similarity between the document vector created in the previous step and all 1,027,355 Wikipedia articles, returning the 10 (default value) documents most similar to the query document;
4) creates RDF triples from the titles and URIs of the most similar documents found.
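The four steps can be sketched end to end on toy data. This is a minimal sketch, assuming small in-memory dictionaries in place of termsvectors.bin and docvectors.bin, keywords taken from the quoted literals only, and a query vector built as the normalized sum of its term vectors (a common way to form such a vector, not necessarily the plugin's exact procedure). All names are hypothetical, and deriving the Wikipedia URI from the title is an assumption; the plugin itself takes document URIs from its index.

```python
import math
import re

# Hypothetical toy stand-ins for termsvectors.bin and docvectors.bin.
TERM_VECTORS = {"social": [1.0, 0.0, 0.0], "anarchism": [0.0, 1.0, 0.0]}
DOC_VECTORS = {
    "Social anarchism": [1.0, 1.0, 0.0],
    "Sam Dolgoff": [0.2, 1.0, 0.1],
    "Asthma": [0.0, 0.0, 1.0],
}


def keywords(sparql: str) -> str:
    """Step 1: join the quoted literals of the query into a single string."""
    return " ".join(re.findall(r'"([^"]*)"', sparql))


def normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def project(text: str, term_vectors: dict[str, list[float]]) -> list[float]:
    """Step 2: query-document vector = normalized sum of the known term vectors."""
    dim = len(next(iter(term_vectors.values())))
    acc = [0.0] * dim
    for word in text.lower().split():
        vec = term_vectors.get(word)
        if vec is not None:
            acc = [a + b for a, b in zip(acc, vec)]
    return normalize(acc)


def most_similar(query_vec: list[float], doc_vectors: dict[str, list[float]], k: int = 10) -> list[str]:
    """Step 3: rank documents by cosine similarity (dot product of unit vectors)."""
    scores = {
        title: sum(q * d for q, d in zip(query_vec, normalize(vec)))
        for title, vec in doc_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def to_triples(titles: list[str]) -> list[str]:
    """Step 4: one mpib:/skos:related triple per document (URI derived from title: an assumption)."""
    return [
        f"mpib:'{t}' skos:related http://en.wikipedia.org/wiki/{t.replace(' ', '_')}"
        for t in titles
    ]
```

For the "anarchism" query, the projected vector points along the anarchism axis, so with k=2 the ranking is Sam Dolgoff, then Social anarchism, and the Asthma document is dropped.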
The required parameters are:
titleIndexPath: the path to the directory containing the Wikipedia titles Lucene index.
pathToTermvectors: the path to the semantic-space term vectors file (termsvectors.bin).
pathToDocvetors: the path to the semantic-space document vectors file (docvectors.bin).
numberOfSimilarDocs: the number of similar documents to return (default 10).
The required files ("termsvectors.bin", "docvectors.bin" and "titles_index.zip") need to be copied to the machine where the plugin runs. They can be downloaded from .
Unzip the "titles_index.zip" archive after downloading; this creates a directory.
An input SPARQL query such as
SELECT ?s ?p ?o WHERE { { ?s ?p ?o . ?s ?p "anarchism"} }
will produce output like the following:
mpib:'Victor Yarros' skos:related http://en.wikipedia.org/wiki/Victor_Yarros
mpib:'Social anarchism' skos:related http://en.wikipedia.org/wiki/Social_anarchism
mpib:'Sam Dolgoff' skos:related http://en.wikipedia.org/wiki/Sam_Dolgoff
mpib:'Lifestyle anarchism' skos:related http://en.wikipedia.org/wiki/Lifestyle_anarchism
Plugins and workflows for the LarKC Platform 1.0
Random Indexing Transformer
Given a SPARQL query, this plugin generates an expanded SPARQL query by adding UNION statements with relevant literals/URIs found by the Random Indexing method.
Input: SPARQL query
Output: expanded SPARQL query
Random Indexing Identifier
This plugin evaluates the expanded SPARQL query against www.linkedlifedata.com.
Input: expanded SPARQL query
Output: set of statements found in the www.linkedlifedata.com repository
Random Indexing Decider
This plugin sets up a workflow that starts with a SPARQL query and ends with the results of the expanded SPARQL query. The workflow is intended for cases where the original SPARQL query does not return satisfactory results: it applies the Random Indexing method to the RDF graph and expands the query with UNION statements that take into account literals/URIs similar to those appearing in the original query. It should not be used for SPARQL queries that already return a large number of results. It is based on the SemanticVectors Random Indexing library (http://code.google.com/p/semanticvectors/) and the AirHead S-Space Package, which contains a collection of algorithms for building semantic spaces (http://code.google.com/p/airhead-research/).
Input: SPARQL query
Output: variable bindings
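Random Indexing assigns each context a sparse random "index vector" (a few +1/-1 entries in a high-dimensional space) and builds each item's "context vector" by summing the index vectors of the contexts it occurs with. Items that share contexts end up with similar context vectors. The following is a minimal sketch of one way to apply this to an RDF graph, assuming that the subject and object of a triple serve as each other's context and that predicates are ignored. This is an illustrative assumption, not necessarily how the Decider or the SemanticVectors library builds its space, and all names and triples are hypothetical.

```python
import math
import random


def index_vector(dim: int, nonzeros: int, rng: random.Random) -> list[int]:
    """Sparse ternary index vector: `nonzeros` random positions set to +1 or -1."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((1, -1))
    return vec


def random_indexing(triples, dim=512, nonzeros=8, seed=42):
    """Context vector of a node = sum of the index vectors of the nodes it shares a triple with."""
    rng = random.Random(seed)
    nodes = sorted({n for s, _, o in triples for n in (s, o)})
    index = {n: index_vector(dim, nonzeros, rng) for n in nodes}
    context = {n: [0] * dim for n in nodes}
    for s, _, o in triples:  # the predicate is ignored in this sketch
        context[s] = [c + i for c, i in zip(context[s], index[o])]
        context[o] = [c + i for c, i in zip(context[o], index[s])]
    return context


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


TRIPLES = [
    ("asthma", "associatedWith", "cough"),
    ("wheezing", "associatedWith", "cough"),
    ("asthma", "affects", "lung"),
    ("wheezing", "affects", "lung"),
    ("diabetes", "affects", "pancreas"),
]
vectors = random_indexing(TRIPLES)
```

In this toy graph, asthma and wheezing share all of their neighbours and so receive identical context vectors (cosine 1.0), while diabetes shares none. Ranking nodes by cosine against a query term's context vector would then supply the similar literals/URIs used for expansion.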
