This deliverable presents two methods for subsetting large knowledge bases. The first method is statistical semantics, in particular random indexing (RI) and explicit semantic analysis (ESA). These techniques assume that meaning can be represented in a metric space in which each concept is assigned a vector. The dataset we use for testing is the largest reason-able semantic repository: LDSR [Kiryakov et al., 2009]. The second method is based on user interests. In the second part, WICI continues the work begun in D2.3.1: using models that resemble cognitive memory retention, it extracts users' interests and uses them to refine queries so that more personalized results can be obtained. WICI has recently found that a more expressive interest vocabulary is needed to standardize the description of user interests across use cases. This deliverable therefore first introduces our recent effort to standardize the description of user interests by extending the FOAF vocabulary.
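
The vector-space assumption behind random indexing can be illustrated with a minimal sketch. It shows the general technique only, not the implementation used in this deliverable: every term receives a sparse random index vector, and a term's context vector is the sum of the index vectors of its neighbours. The dimensionality, the number of non-zero entries, the window size, the function names and the toy corpus are all assumptions chosen for illustration, and the sketch works on plain tokens rather than on LDSR concepts.

```python
import math
import random

DIM = 64      # dimensionality of the reduced space (illustrative assumption)
NONZERO = 4   # number of +1/-1 entries per index vector (illustrative assumption)


def index_vector(rng):
    """Return a sparse ternary vector: mostly 0, with a few +1/-1 entries."""
    vec = [0] * DIM
    for i, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def random_indexing(sentences, window=2, seed=0):
    """Build one context vector per term by summing its neighbours' index vectors."""
    rng = random.Random(seed)
    vocab = sorted({term for sent in sentences for term in sent})
    index = {term: index_vector(rng) for term in vocab}
    context = {term: [0] * DIM for term in vocab}
    for sent in sentences:
        for i, term in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue  # a term is not its own neighbour
                for k, value in enumerate(index[sent[j]]):
                    context[term][k] += value
    return context


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Hypothetical toy corpus: "cat" and "dog" occur in identical contexts.
    corpus = [
        ["the", "cat", "sleeps"],
        ["the", "dog", "sleeps"],
        ["markets", "fall", "sharply"],
    ]
    ctx = random_indexing(corpus)
    print("cat ~ dog:    ", cosine(ctx["cat"], ctx["dog"]))
    print("cat ~ markets:", cosine(ctx["cat"], ctx["markets"]))
```

Terms that occur in the same contexts accumulate the same index vectors and therefore end up close together in the space, which is what makes distance usable as a criterion for selecting a subset. Because the index vectors are random, the similarity between unrelated terms is only approximately zero and varies with the seed.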

The full text is here: ALL-larkc232.pdf (2010-03-30 07:08:04, 7862.9 KB)

LarkcProject/WP2/D2.3.2 (last edited 2010-03-30 07:14:26 by JoseQuesada)