Random Indexing Plugins and Workflows within LarKC Platform

This page describes the plugins and workflows for the LarKC Platform that are based on Random Indexing.

Plugins and workflows for the LarKC Platform 2.0/2.5

The plugins described below (except the RISubsettingPlugin) are used within the Query Expansion workflow described at http://wiki.larkc.eu/LarkcProject/WP2/workflows#QueryExpansionWorkflow

RISearchPlugin

Given a SPARQL query, this plugin extracts its URIs/literals and uses them to search for contextually related URIs/literals within a predefined semantic space. By default it returns 20 similar URIs/literals; this number can be changed through the _numOfWords_ parameter. The SPARQL query is expected to follow the pattern:

Note that you need to build your own semantic space using the AirHead S-Space Package, which contains a collection of algorithms for building semantic spaces (http://code.google.com/p/airhead-research/). You can also download some of the existing semantic spaces from http://wiki.larkc.eu/LarkcProject/statisticalSemantics
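The search step can be illustrated with a minimal Python sketch. It is not the plugin's code: the regular expressions, the function names and the three-dimensional toy space are assumptions made for illustration, and a real semantic space built with the S-Space Package would hold far larger vectors. Only the default of 20 results mirrors the _numOfWords_ parameter described above.

```python
import math
import re


def extract_terms(sparql):
    """Return the URIs (<...>) and quoted literals found in a SPARQL string."""
    uris = re.findall(r"<([^<>\s]+)>", sparql)
    literals = re.findall(r'"([^"]*)"', sparql)
    return uris + literals


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_terms(term, space, num_of_words=20):
    """Return up to num_of_words entries of the space closest to term."""
    if term not in space:
        return []
    target = space[term]
    scored = [(cosine(target, vec), other)
              for other, vec in space.items() if other != term]
    scored.sort(reverse=True)
    return [other for _, other in scored[:num_of_words]]


# Hypothetical toy semantic space: term -> vector.
TOY_SPACE = {
    "asthma": [0.9, 0.1, 0.0],
    "bronchitis": [0.8, 0.2, 0.1],
    "wheezing": [0.7, 0.3, 0.0],
    "insulin": [0.0, 0.1, 0.9],
}

query = 'SELECT ?s ?p ?o WHERE { { ?s ?p ?o . ?s ?p "asthma"} }'
terms = extract_terms(query)
neighbours = similar_terms(terms[0], TOY_SPACE, num_of_words=2)
print(terms, neighbours)
```

With this toy data the literal "asthma" is extracted and its two nearest neighbours are "bronchitis" and "wheezing".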

QueryExpansionPlugin

This plugin expands the original SPARQL query by adding UNION statements that are dynamically generated from the list of contextually similar URIs/literals found by the RISearchPlugin. The expected SPARQL query follows the pattern:

 SELECT ?s ?p ?o WHERE { { ?s ?p ?o . ?s ?p "asthma"} }

and this will be expanded to something similar to the following:

 SELECT ?s ?p ?o WHERE { { ?s ?p ?o . ?s ?p "asthma"} UNION { ?s ?p ?o . ?s ?p "a contextually similar URI/literal to asthma"} UNION { ?s ?p ?o . ?s ?p "a contextually similar URI/literal to asthma"} }
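The expansion above can be sketched in a few lines of Python. This is an illustrative reconstruction and not the plugin's implementation: the function name `expand_query` is hypothetical, and the sketch handles literals only, in the fixed triple pattern shown above.

```python
def expand_query(literal, similar_terms):
    """Build a SELECT query with one UNION branch per similar literal."""

    def branch(value):
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '{ ?s ?p ?o . ?s ?p "%s"}' % escaped

    branches = [branch(literal)] + [branch(term) for term in similar_terms]
    return "SELECT ?s ?p ?o WHERE { " + " UNION ".join(branches) + " }"


expanded = expand_query("asthma", ["bronchitis", "wheezing"])
print(expanded)
```

Passing an empty list of similar terms returns the original single-branch query unchanged, so the function is safe to call when the search step finds nothing.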

LLDReasoner

This plugin evaluates the SPARQL query against the http://www.linkedlifedata.com SPARQL endpoint.
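As a rough illustration of what evaluating a query against a remote endpoint involves, the Python sketch below builds a SPARQL Protocol GET request with the standard library. The endpoint path `/sparql` and the function name are assumptions and are not taken from this page; the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint location; check the Linked Life Data site for the real one.
ASSUMED_ENDPOINT = "http://linkedlifedata.com/sparql"


def build_sparql_request(query, endpoint=ASSUMED_ENDPOINT):
    """Return a GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SPARQL results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request('SELECT ?s ?p ?o WHERE { ?s ?p "asthma" } LIMIT 10')
print(request.full_url)
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and parsing the JSON response.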

RISubsettingPlugin

This plugin accepts a SPARQL query as input and produces RDF statements as output. It works in four steps:

1) extracts the keywords from the query and appends them into a single string;

2) projects that string (sentence) into a multi-dimensional semantic space (created using the Semantic Vectors library, see http://code.google.com/p/semanticvectors/) containing 1027355 Wikipedia documents, thereby creating a document that represents the string in that semantic space;

3) calculates the cosine similarity between the document created in the previous step and all 1027355 Wikipedia articles, and returns the 10 (default value) documents most similar to the query document;

4) creates RDF triples from the titles and URIs of the most similar documents found.
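The four steps can be sketched in Python as follows. This is a toy reconstruction under stated assumptions, not the plugin's code: keywords are taken to be the quoted literals of the query, the query document is formed by summing term vectors, the `rdfs:label` predicate is a guess at how titles might be attached, and the two-dimensional vectors stand in for the real Wikipedia space.

```python
import math
import re

# Assumed predicate for linking a document URI to its title.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def keywords_string(sparql):
    """Step 1: collect quoted literals and join them into one string."""
    return " ".join(re.findall(r'"([^"]*)"', sparql))


def project(text, term_vectors, dims):
    """Step 2: represent the string as the sum of its known term vectors."""
    vec = [0.0] * dims
    for word in text.lower().split():
        for i, x in enumerate(term_vectors.get(word, [0.0] * dims)):
            vec[i] += x
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_docs(query_vec, docs, limit=10):
    """Step 3: rank documents by cosine similarity, keep the top 'limit'."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vector"]),
                    reverse=True)
    return ranked[:limit]


def to_triples(docs):
    """Step 4: one (subject, predicate, object) triple per document."""
    return [(d["uri"], RDFS_LABEL, d["title"]) for d in docs]


# Hypothetical toy data standing in for the Wikipedia semantic space.
TERM_VECTORS = {"asthma": [1.0, 0.0], "insulin": [0.0, 1.0]}
DOCS = [
    {"uri": "http://en.wikipedia.org/wiki/Asthma", "title": "Asthma", "vector": [0.9, 0.1]},
    {"uri": "http://en.wikipedia.org/wiki/Insulin", "title": "Insulin", "vector": [0.1, 0.9]},
    {"uri": "http://en.wikipedia.org/wiki/Bronchitis", "title": "Bronchitis", "vector": [0.7, 0.3]},
]

text = keywords_string('SELECT ?s WHERE { ?s ?p "asthma" }')
query_vec = project(text, TERM_VECTORS, dims=2)
triples = to_triples(most_similar_docs(query_vec, DOCS, limit=2))
print(triples)
```

With `limit=2` the toy run yields triples for the Asthma and Bronchitis documents, in that order.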

The parameters that need to be passed are

An input SPARQL query like

Plugins and workflows for the LarKC Platform 1.0

Random Indexing Transformer

Given a SPARQL query, this plugin generates an expanded SPARQL query by adding UNION statements with relevant literals/URIs as found by the Random Indexing method.

Input: SPARQL query

Output: expanded SPARQL query

Random Indexing Identifier

This plugin evaluates the expanded SPARQL query against www.linkedlifedata.com.

Input: expanded SPARQL query

Output: set of statements as found in the www.linkedlifedata.com repository

Random Indexing Decider

This plugin sets up a workflow which starts with a SPARQL query and ends with the results of the expanded SPARQL query. The workflow should be used when the original SPARQL query does not return satisfying results: it applies the Random Indexing method to the RDF graph and expands the query by adding UNION statements that take into account literals/URIs similar to those appearing in the original query. It should not be used for SPARQL queries which already return a large number of results. It is based on the SemanticVectors Random Indexing library (http://code.google.com/p/semanticvectors/) and the AirHead S-Space Package, which contains a collection of algorithms for building semantic spaces (http://code.google.com/p/airhead-research/).

Input: SPARQL query

Output: variable bindings
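The usage guidance for this workflow (expand only when the original query returns too little) could be applied on the caller's side as in the Python sketch below. This is not the Decider's own logic, which this page does not describe in that detail; the function name, the `min_results` threshold and the two callables are all hypothetical.

```python
def run_with_fallback(query, evaluate, expand, min_results=1):
    """Evaluate the query; fall back to the expanded query if results are scarce.

    evaluate: callable taking a query string and returning a list of results.
    expand:   callable taking a query string and returning an expanded query.
    """
    results = evaluate(query)
    if len(results) >= min_results:
        return results  # the original query is good enough, do not expand
    return evaluate(expand(query))


# Stand-in callables for demonstration only.
def fake_evaluate(q):
    return ["binding"] if "UNION" in q else []


def fake_expand(q):
    return q + " UNION { ... }"


bindings = run_with_fallback("SELECT ...", fake_evaluate, fake_expand)
print(bindings)
```

In this demonstration the original query yields nothing, so the expanded query is evaluated and its single binding is returned.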

LarkcProject/RandomIndexingPlugins (last edited 2011-06-28 13:43:26 by rvidal)