Parallelisation for identification and selection

As mentioned on this page, in order to do within-plugin parallelisation, we need to prepare the following and send it to HLRS:

1. Algorithm (in general, the source code of the plug-in, whether it is already embedded within a pipeline/workflow, whether it implements the LarKC API, etc.; or the source code of the complete workflow, if it exists)

2. Data set

3. Input query

4. Expected output

5. Performance metrics and some evaluation results (before parallelization)

6. Guidelines on how to test the code (together with the software to do so)

Random Indexing

Both USFD and MPG are running experiments with Random Indexing (using the semanticVectors and airHead libraries). Generating vectors takes a long time for datasets as large as those we experiment with in LarKC. Calculating the inner product between the query vector and the vectors in the semantic space is also slow. Could parallelisation help here?
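As a rough illustration of why the search step is a candidate for parallelisation: the inner product of the query vector with each term vector is independent of all the others, so the semantic space can be split into partitions that are scored separately and then merged. The following is a minimal Python sketch under that assumption. It does not use the semanticVectors or airHead APIs; the names `parallel_search` and `score_chunk` and the toy dict-based space are hypothetical. The thread pool only shows the partitioning; in CPython it will not speed up pure-Python arithmetic, and a real speed-up would need separate processes, native code, or a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def dot(u, v):
    """Inner product of two equal-length dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_chunk(query, chunk):
    """Score one partition of the semantic space against the query."""
    return [(dot(query, vec), term) for term, vec in chunk]


def parallel_search(query, space, k=3, workers=4):
    """Return the k (term, score) pairs with the largest inner product.

    `space` is a hypothetical in-memory stand-in for a semantic space:
    a dict mapping each term to its vector.
    """
    items = list(space.items())
    size = max(1, -(-len(items) // workers))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each partition is scored independently of the others.
        partials = pool.map(lambda c: score_chunk(query, c), chunks)
        scored = [pair for part in partials for pair in part]
    # Merge step: keep only the k best scores over all partitions.
    return [(term, s) for s, term in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    toy_space = {"a": [1, 0, 0], "b": [0, 1, 0], "c": [1, 1, 0], "d": [0, 0, 1]}
    print(parallel_search([1, 1, 0], toy_space, k=2))
```

Because the partitions share nothing except the read-only query vector, the same split could in principle be distributed over several machines, with only the per-partition top-k lists sent back for merging.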

SemanticVectors (Proposal by MPG)

Prepare a self-contained example and send it to HLRS so that they can parallelise the semanticVectors library?

AirHead

Requirement from: USFD

USFD Participants: Danica Damljanovic

HLRS Participants: Georgina Gallizo

Both generating vectors and searching are time-consuming on the subsets of LLD with which we are experimenting. We are currently trying to parallelize these with help from HLRS.

Another possible case for parallelisation is the generation of text from an RDF graph.
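For vector generation, one possible decomposition is to shard the documents, let each worker accumulate partial term vectors for its shard, and then add the partial results together. The sketch below assumes the basic document-based form of Random Indexing, where a term vector is the sum of the sparse random index vectors of the documents the term occurs in. It is not taken from the semanticVectors or airHead code; `DIM`, `NONZERO`, `index_vector`, `build_partial`, `merge` and `build_space` are hypothetical names and values. Index vectors are derived from the document id alone, so workers need no shared state. As with any pure-Python thread pool, this shows the structure rather than a real speed-up.

```python
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

DIM = 64      # vector dimensionality (assumed; real spaces are much larger)
NONZERO = 4   # number of +1/-1 entries per index vector (assumed)


def index_vector(doc_id):
    """Sparse ternary random vector, reproducible from the document id."""
    rng = random.Random(doc_id)  # string seeds are deterministic
    vec = [0] * DIM
    for n, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1 if n % 2 == 0 else -1
    return vec


def build_partial(docs):
    """Accumulate term vectors for one shard of (doc_id, tokens) pairs."""
    partial = defaultdict(lambda: [0] * DIM)
    for doc_id, tokens in docs:
        iv = index_vector(doc_id)
        for term in tokens:  # a repeated term adds the index vector again
            tv = partial[term]
            for i, x in enumerate(iv):
                tv[i] += x
    return partial


def merge(partials):
    """Element-wise sum of the partial term vectors from all shards."""
    total = defaultdict(lambda: [0] * DIM)
    for partial in partials:
        for term, vec in partial.items():
            tv = total[term]
            for i, x in enumerate(vec):
                tv[i] += x
    return dict(total)


def build_space(docs, workers=4):
    """Build term vectors from a list of (doc_id, tokens) pairs."""
    shards = [docs[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return merge(pool.map(build_partial, shards))
```

Since vector addition is associative and commutative, the merged result does not depend on how the documents are sharded, which is what makes this step safe to distribute.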

User interests based Selection and Query Refinement (by WICI)

Requirement from: WICI

WICI Participants: Yi Zeng, Yan Wang

HLRS Participants: Georgina Gallizo

For user-interest-based selection, three main sets of tasks could be parallelized:

(1) User interest extraction and calculation;

(2) Interest-based query refinement processing;

(3) Interest-based selection.

For (2), currently, there are two types of query refinement processing that could be parallelized:

The materials prepared for HLRS (already sent to Katharina) are attached here as WICI-query-parallelization.rar, including:

  1. Algorithm;
  2. Dataset (2);
  3. Input query;
  4. Expected output samples;
  5. Performance metrics and some evaluation results;
  6. Guidelines on how to test the code;
  7. How it can be parallelized;

Currently, the program is running under the following environment.

LarkcProject/WP2/parallelisation (last edited 2010-03-29 13:02:57 by KatharinaBenkert)