Workflows
Running workflows in LarKC
These are brief instructions on how to run workflows in LarKC. If you need more detail, refer to the user manual available from SourceForge: http://sourceforge.net/projects/larkc/support.
Build and run the platform and plugins first
- check out larkc (svn co https://larkc.svn.sourceforge.net/svnroot/larkc larkc)
- cd trunk/platform
- mvn clean install -Dmaven.test.skip=true
NOTE: if you want to skip checking out the latest version of LarKC from trunk and building it, download the binary version from SourceForge instead and launch it using run_larkc.bat/.sh.
- build the plugin that you need for your workflow
- copy the plugin jars from THE-PLUGIN-FOLDER/target into platform/plugins
- cd platform
- set up the larkc project in Eclipse (see the manual for how to do this; installing the m2eclipse plugin is also recommended); NOTE: it is currently not possible to run larkc using the run_larkc.bat/.sh script if you are building everything yourself
- right click on the eu.larkc.core.Larkc class >> Run as Application
- after a while LarKC will be initialised on port 8182
- go to http://localhost:8182 to see the larkc management interface
Initialise and run workflow using the management interface (currently broken, use the larkc designer instead)
- copy and paste the workflow description and click submit (do not change the link localhost:8182/workflow)
- this will return the workflow id
- go back to localhost:8182/workflows
- this will show your workflow and the endpoint: note that the endpoint is shown with an IP address, which you need to change to localhost if you are running on your own machine
- go to localhost:8182, enter the query and the endpoint URL (e.g. http://localhost:8184) and click submit
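The endpoint rewrite described above can be sketched in a few lines of Python. This is only an illustration: the IP address is a made-up example, and the helper name to_localhost is hypothetical and not part of LarKC.

```python
from urllib.parse import urlsplit, urlunsplit


def to_localhost(endpoint_url: str) -> str:
    """Replace the host of an endpoint URL with localhost, keeping port, path and query."""
    parts = urlsplit(endpoint_url)
    netloc = "localhost" if parts.port is None else f"localhost:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical endpoint as it might be listed on the workflows page.
print(to_localhost("http://192.168.0.17:8184"))
```

The result is the URL to enter in the endpoint field when the platform runs on your own machine.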
Initialise and run workflow using the larkc designer (recommended)
- you need to have tomcat installed
- copy trunk/wfd/Larkc.war into TOMCAT-DIR/webapps
- start tomcat (catalina run)
- go to localhost:8080/LarKC and the plugins should appear on the left
- drag them into the workspace in the order you need to connect them
- add input and output
- connect input and output and plugins (it is important to make the connections properly)
- click initialise to initialise the workflow
- after successful initialisation enter a SPARQL query and click Run
Query Expansion Workflow
What is the purpose of the workflow?/Which problem does it solve? This workflow will expand the result set of the original SPARQL query with the additional statements that are 'relevant' according to the Random Indexing method.
What is it good for?/Why should I use it? This workflow should be used in cases when the original SPARQL query does not return enough/satisfying results - it will expand the original SPARQL query by adding UNION statements which take into account similar literals/URIs to those which appear in the original query, thus increasing recall.
When should I not use it? The workflow should not be used for SPARQL queries which already return a large number of results.
What dataset does it use? This workflow uses http://linkedlifedata.com dataset (life science).
Can I use the same workflow with a different dataset? Yes, it is possible to use a different dataset, but there are two preconditions:
- You need to build your own semantic space using the AirHead S-Space Package, which contains a collection of algorithms for building Semantic Spaces (http://code.google.com/p/airhead-research/). See http://wiki.larkc.eu/LarkcProject/statisticalSemantics
- You need to use a different plugin (or implement a new one) instead of LLDReasoner, because LLDReasoner works by remotely accessing http://linkedlifedata.com
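To make the idea of expansion by UNION more concrete, here is a minimal Python sketch. It is not the QueryExpansion plugin's code: the function name, the pattern template and the similar term "bronchitis" are assumptions chosen for illustration. In the real workflow the similar terms come from the Random Indexing semantic space.

```python
from typing import Sequence


def expand_with_union(pattern_template: str, original: str,
                      similar_terms: Sequence[str]) -> str:
    """Build a SELECT query with one UNION branch per term (original term first)."""
    terms = [original] + [t for t in similar_terms if t != original]
    branches = ["{ " + pattern_template.format(term=t) + " }" for t in terms]
    return "SELECT ?s ?p ?o WHERE { " + " UNION ".join(branches) + " }"


# "bronchitis" is a hypothetical similar term, not actual plugin output.
template = '?s ?p ?o . ?s ?p "{term}"'
print(expand_with_union(template, "asthma", ["bronchitis"]))
```

Each additional branch can only add results, which is why this kind of expansion increases recall and is a poor fit for queries that already return many results.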
How to get Query Expansion workflow running?
The required plugins
- cd plugins/RISearchPlugin
- mvn clean install
- cd plugins/QueryExpansion
- mvn clean install
- cd plugins/LLDReasoner
- mvn clean install
- cd plugins/SOStoVBTransformer
- mvn clean install
Initialise and run workflow (using the larkc designer)
- drag the 4 plugins into the workspace in the same order as you built them
- connect them: 1-2-3-4
- add input and connect it to 1 and 2 (important to connect input to both plugins otherwise it will not work!)
- add output and connect 4 to output
- for RISearchPlugin click 'add' to add parameter and enter the full path to the semantic space, e.g.:
- larkc:inputPath "c:/projects/larkc/branches/RandomIndexing/MPI-Search/use_cases/lld1-ah-params-1000-4/lld1-docs-vectors.sspace"
- click initialise to initialise the workflow
- after successful initialisation enter a SPARQL query and click Run, e.g.
- SELECT ?s ?p ?o WHERE { { ?s ?p ?o . ?s ?p "asthma"} }
Need help?
Did it work for you? Email danica.damljanovic@gmail.com or larkc-dev-support@lists.sourceforge.net for any comments regarding these instructions or to get help.
Baseline Selection workflow
What is the purpose of the workflow?/Which problem does it solve? The purpose of the workflow is to provide a baseline for selection methods. It selects RDF molecules that are related to the keywords mentioned in the input query.
What is it good for?/Why should I use it? The workflow is good for evaluation/benchmarks.
When should I not use it? For any purpose other than plain information retrieval.
What dataset does it use? Any (for example http://www.w3.org/TR/owl-guide/wine.rdf)
Can I use the same workflow with a different dataset? Yes.
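As a rough illustration of keyword-based selection, the Python sketch below keeps every triple in which one of the keywords occurs. It is an assumption-laden toy, not the BaseLineFTSelecter implementation: it works on single triples rather than RDF molecules, uses plain case-insensitive substring matching, and the ex: sample data is invented.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def select_by_keywords(triples: Iterable[Triple], keywords: Iterable[str]) -> List[Triple]:
    """Keep triples in which any keyword occurs in subject, predicate or object."""
    wanted = [k.lower() for k in keywords]
    return [t for t in triples
            if any(k in term.lower() for term in t for k in wanted)]


# Invented sample data in the spirit of the wine ontology.
triples = [
    ("ex:Chardonnay", "rdfs:label", "Chardonnay white wine"),
    ("ex:Merlot", "rdfs:label", "Merlot red wine"),
    ("ex:Bordeaux", "rdfs:label", "Bordeaux region"),
]
print(select_by_keywords(triples, ["merlot"]))
```

A selection method under evaluation can then be compared against the subset returned by such a baseline.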
How to get this workflow running?
The required plugins
BaseLineFTSelecter
Initialise and run workflow
- start the platform
- open the workflow designer
- pipe together input -> BaseLineFTSelecter -> output
- add a parameter to the BaseLineFTSelecter plugin instance and set its value to one of the following:
<http://larkc.eu/schema#graph> <http://www.w3.org/TR/owl-guide/wine.rdf> . - if you want to specify input RDF graph
<http://larkc.eu/schema#endpoint> <http://path/to/a/sparql/endpoint> . - if you want input to be obtained from a SPARQL endpoint instead
- initialize workflow
- execute query
Need help?
Email ivan.peikov@ontotext.com or larkc-dev-support@lists.sourceforge.net for any comments regarding these instructions or to get help.
Spreading Activation workflow
What is the purpose of the workflow?/Which problem does it solve? The workflow selects the statements in the local data store that remain active (primed) after the input query has been used as the spreading activation source (with default parameters) and the activation has faded away.
What is it good for?/Why should I use it? It is good for large data sources that need to be reduced before use. For instance, when the input data has to be used in an algorithm that is exponential in the input size, such a reduction could make the computation feasible.
When should I not use it? When spreading activation is not appropriate for the graph reduction.
What dataset does it use? Any
Can I use the same workflow with a different dataset? Yes
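The general idea of spreading activation can be sketched in Python as follows. This is a generic textbook-style variant, not the SASelecter algorithm: the decay factor, the number of steps, the threshold and the toy graph are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def spread_activation(graph: Dict[str, List[str]], seeds: Iterable[str],
                      decay: float = 0.5, steps: int = 2,
                      threshold: float = 0.2) -> Set[str]:
    """Return the nodes whose accumulated activation is at least the threshold."""
    activation = defaultdict(float)
    for seed in seeds:
        activation[seed] = 1.0
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            # Each node passes a decayed, evenly split share to its neighbours.
            share = decay * energy / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return {node for node, value in activation.items() if value >= threshold}


# Toy graph; the seed plays the role of a term taken from the input query.
graph = {"asthma": ["lung", "inhaler"], "lung": ["organ"], "inhaler": [], "organ": []}
print(sorted(spread_activation(graph, ["asthma"])))
```

In this toy run the node "organ" receives too little activation to pass the threshold, so the selected subgraph is smaller than the input, which is the kind of reduction the workflow is meant to provide.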
How to get this workflow running?
The required plugins
SASelecter and SOStoVBTransformer.
Initialise and run workflow
- start the platform
- open the workflow designer
- pipe together input -> SASelecter -> SOStoVBTransformer -> output
- initialize workflow
- execute query
Need help?
Email ivan.peikov@ontotext.com or larkc-dev-support@lists.sourceforge.net for any comments regarding these instructions or to get help.
Active Academic Visit Recommendation workflow
What is the purpose of the workflow?/Which problem does it solve?
The purpose of this workflow is to recommend interesting persons or organizations for academic visits to end users, chosen from a large number of possible candidates. It is aimed at demonstrating the power of interests-based selection for large scale reasoning.
Description of the workflow: The workflow can be downloaded from http://www.wici-lab.org/wici/larkc/workflows/aavra/
For interests-based selection, what is it good for?/Why should I use it?
It can shorten the selection time while keeping the selected triples highly relevant to the user's background. In addition, it can reduce the number of results so that users do not have to browse too many irrelevant ones.
When should I not use it? If you do not want to get query results that are related to your previous or recent interests.
What dataset does it use? It uses triples from identifiers. In our samples, the data are from Semantic Web Dog Food and triplified Twitter data.
Can I use the same workflow with a different dataset? Yes.
How to get this workflow running?
Required plugins
(1) Personal Twitter Identifier : http://wiki.larkc.eu/LarkcPlugins/Personal-Twitter-Identifier
(2) Interests-based Selecter : http://wiki.larkc.eu/LarkcPlugins/Interests-based-Selecter
(3) SPARQL Query Evaluation Reasoner
- Input: stated in the workflow description, such as a specific user name, e.g. "Frank van Harmelen"
- Output: triples that are related to the user query, each of which is also related to "Frank van Harmelen" as identified in real time from Twitter.
To be continued.
Need help?
Did it work for you? Email Yi Zeng ( yizeng@bjut.edu.cn ) or larkc-dev-support@lists.sourceforge.net for any comments regarding these instructions or to get help.
Subsetting Workflow
What is the purpose of the workflow?/Which problem does it solve? This workflow extracts keywords from the original SPARQL query, uses those keywords to create a document in a Wikipedia semantic space, compares the created document with all other documents in that space (over 1 million documents, each being a Wikipedia article), and creates RDF statements from the most similar documents. The created RDF statements have the form <wikipedia title> skos:related <wikipedia URL>.
What is it good for?/Why should I use it? This workflow should be used in cases when you want to get similar docs to your query.
When should I not use it?
What dataset does it use? This workflow uses the http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia dataset (a Wikipedia dump). We removed all stopwords and all non-alphabetic characters from the set. After that we created a semantic space with all documents using the "semantic vectors" library (see http://code.google.com/p/semanticvectors). The parameters for the space creation were: dimension=1000, seedlength=2.
Can I use the same workflow with a different dataset? At the moment the plugin works only with the provided dataset, as some steps involved in the creation of additional files are not explained here (for example the creation of an extra Lucene index, i.e. an index that maps document numbers to document titles). To download the ready-to-go semantic space and the required Lucene index, see the instructions on the RISubsetting plugin page (http://wiki.larkc.eu/LarkcProject/RandomIndexingPlugins#RISubsettingPlugin).
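The comparison step can be pictured as ranking document vectors by their similarity to a query vector. The Python sketch below uses cosine similarity on tiny two-dimensional vectors. It is an illustration only: the plugin relies on the semantic vectors library and a 1000-dimensional space, the vectors here are invented, and the choice of cosine similarity is an assumption of this sketch.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query_vec: Sequence[float],
                 doc_vecs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the titles of the k documents most similar to the query vector."""
    ranked = sorted(doc_vecs,
                    key=lambda title: cosine(query_vec, doc_vecs[title]),
                    reverse=True)
    return ranked[:k]


# Invented toy vectors; real document vectors come from the semantic space.
docs = {
    "Asthma": [0.0, 1.0],
    "Sam Dolgoff": [1.0, 1.0],
    "Social anarchism": [1.0, 0.0],
}
print(most_similar([1.0, 0.0], docs, 2))
```

The parameter k corresponds to the number of similar documents that the workflow is configured to use when creating RDF statements.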
How to get Subsetting workflow running?
The required plugins
- cd plugins/RISubsettingPlugin
- mvn clean install
- cd plugins/SOStoVBTransformer
- mvn clean install
Initialise and run workflow (using the larkc designer)
- drag the 2 plugins into the workspace in the same order as you built them
- connect them: RISubsettingPlugin to the SOStoVBTransformer
- add input and connect it to the RISubsettingPlugin
- add output and connect the SOStoVBTransformer to the output
- click 'add' to add parameters and enter the full path to the semantic space, the path to the titles index directory and the number of similar docs in the semantic space that should be used to create the RDF statements, e.g.:
- larkc:titleIndexPath "/path/2/Index"
- larkc:pathToTermvectors "/path/2/term/vectors"
- larkc:pathToDocvetors "/path/2/doc/vectors"
- larkc:numberOfSimilarDocs "?NrOfDocs"
- click initialise to initialise the workflow
- after successful initialisation enter a SPARQL query and click Run, e.g.
- SELECT ?s ?p ?o WHERE { { ?s ?p ?o . ?s ?p "anarchism"} }
- The plugin should return results like
mpib:'Victor Yarros' skos:related http://en.wikipedia.org/wiki/Victor_Yarros
mpib:'Social anarchism' skos:related http://en.wikipedia.org/wiki/Social_anarchism
mpib:'Sam Dolgoff' skos:related http://en.wikipedia.org/wiki/Sam_Dolgoff
mpib:'Lifestyle anarchism' skos:related http://en.wikipedia.org/wiki/Lifestyle_anarchism
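The result lines above follow a regular pattern that can be reproduced with a small Python helper. This is a sketch only: the function name is hypothetical, and deriving the URL by replacing spaces with underscores is an assumption inferred from the four sample results, not taken from the plugin source.

```python
def to_statement(title: str) -> str:
    """Format one result line of the form <title> skos:related <Wikipedia URL>."""
    url = "http://en.wikipedia.org/wiki/" + title.replace(" ", "_")
    return f"mpib:'{title}' skos:related {url}"


for title in ["Victor Yarros", "Social anarchism"]:
    print(to_statement(title))
```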
Need help?
Did it work for you? Email R.Vidal (brandao.vidal@mpib-berlin.mpg.de) or larkc-dev-support@lists.sourceforge.net for any comments regarding these instructions or to get help.
Want to add a new workflow describing one of your plugins?
Follow the template below.
What is the purpose of the workflow?/Which problem does it solve?
What is it good for?/Why should I use it?
When should I not use it?
What dataset does it use?
Can I use the same workflow with a different dataset?
How to get this workflow running?
The required plugins
Initialise and run workflow
* Provide an easy testing example i.e. if I download larkc and plugins how do I use the workflow within few minutes?
Need help?
Did it work for you? Email NAME (EMAIL) or larkc-dev-support@lists.sourceforge.net for any comments regarding these instructions or to get help.
