Towards distributed scalable (high-throughput) reasoning
This page presents our first approaches and concepts for distributed, scalable (high-throughput) reasoning over very large data sets. There is a strong need for such systems: existing (mainly local) solutions scale very well, but only up to a certain number of RDF triples (≈ 10 billion).
However, with the steadily growing number of RDF triples available on the Internet, local reasoners need to be redesigned to process RDF statements in parallel and on distributed machines. Promising approaches for processing complex tasks concurrently can be found in parallel programming models such as MPI and MapReduce.
So far, colleagues from VUA (WP5) have investigated the use of MapReduce for large-scale reasoning. They built a distributed reasoning framework on top of the Hadoop MapReduce implementation. Initial results were presented at ISWC and ESWC.
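To illustrate how a reasoning rule can be phrased as a MapReduce job, here is a minimal single-process sketch in Python. It is not the VUA framework's code and does not use Hadoop; it only simulates the map, shuffle and reduce phases in memory for one RDFS rule (if `x rdf:type C` and `C rdfs:subClassOf D`, then `x rdf:type D`). All function names (`map_triple`, `reduce_class`, `run_job`, `closure`) and the `ex:` example data are assumptions made for this illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
RDFS_SUBCLASSOF = "rdfs:subClassOf"


def map_triple(triple):
    """Map phase: emit (join key, tagged value) pairs for one triple."""
    s, p, o = triple
    if p == RDF_TYPE:
        # Instance triple: join on the class the instance belongs to.
        yield o, ("instance", s)
    elif p == RDFS_SUBCLASSOF:
        # Schema triple: join on the subclass.
        yield s, ("superclass", o)


def reduce_class(key, values):
    """Reduce phase: pair every instance of a class with its superclasses."""
    instances = [v for tag, v in values if tag == "instance"]
    superclasses = [v for tag, v in values if tag == "superclass"]
    for inst in instances:
        for sup in superclasses:
            yield (inst, RDF_TYPE, sup)


def run_job(triples):
    """Simulate one MapReduce job: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for triple in triples:
        for key, value in map_triple(triple):
            groups[key].append(value)
    derived = set()
    for key, values in groups.items():
        derived.update(reduce_class(key, values))
    return derived


def closure(triples):
    """Repeat the job until no new triples are derived (fixpoint)."""
    known = set(triples)
    while True:
        new = run_job(known) - known
        if not new:
            return known
        known |= new


if __name__ == "__main__":
    data = {
        ("ex:alice", RDF_TYPE, "ex:Student"),
        ("ex:Student", RDFS_SUBCLASSOF, "ex:Person"),
        ("ex:Person", RDFS_SUBCLASSOF, "ex:Agent"),
    }
    for t in sorted(closure(data) - data):
        print(t)
```

In a real cluster setting the grouping step would be carried out by the MapReduce runtime's shuffle, and each pass of the loop in `closure` would correspond to one submitted job; how an actual framework schedules and optimises these passes is outside the scope of this sketch.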
Plans for the upcoming months:
- Rerun computation on HLRS cluster (March 2010)
- Provide simple plug-in to start computation remotely (July 2010)
- Investigate usage of MapReduce for complete workflow parallelisation
