The LarKC Platform - Software Requirements Specification

Executive summary

This software requirements specification (SRS) document represents the consortium’s mutual understanding of the requirements of the LarKC platform in terms of its intended behaviour and interoperability with other LarKC software components - specifically the retrieval/selection/abstraction/learning/reasoning/deciding plug-ins to be realised in work packages 2, 3 and 4.

The application of parallel and distributed techniques has significant implications for the developers of the LarKC components that must be considered at the beginning of the design phase.

1. Introduction

1.1. Purpose

This software requirements specification (SRS) document represents the consortium’s mutual understanding of the requirements of the LarKC platform in terms of its intended behaviour and interoperability with other LarKC software components - specifically the retrieval/selection/abstraction/learning/reasoning/deciding plug-ins to be realised in work packages 2, 3 and 4.

1.2. Audience

All partners in work package 5 will contribute to the creation and evolution of this document. Furthermore, partners in work packages 2, 3 and 4 will be consulted and may also contribute as their understanding of the behaviour of plug-in components becomes better defined.

1.3. Document Conventions

The following styles and colours are used to indicate areas where requirements have not been fully agreed:

| Style | Indication |
|---|---|
| LarKC must run on an iPhone (shown struck through) | Strikethrough. This style is used to indicate a requirement that has been removed by agreement, but is retained for historical reasons. |
| The platform will be written in 100% pure Java, OR the platform will allow Fortran code to be linked in. (shown in red) | Red. Indicates where agreement has not been reached. This will normally be used for requirements that have been suggested, but not agreed, where a number of options apply and one has yet to be selected, or where some open question indicates an area for future discussion. |
| The selection process shall include probabilistic techniques. | Normal. Indicates a (currently) agreed upon requirement. |

1.4. Origin of Requirements

The Description of Work (DoW) outlines some goals of the LarKC consortium, which are necessarily vague at the commencement of a research project. However, over time, these goals will be more concretely defined in this SRS as the consortium collectively works to achieve a better understanding of the research goals.

1.5. Development of Requirements

Normally, an SRS is the result of the initial product development phase, in which information is gathered about which requirements are needed and which are not. Typically, this information-gathering stage can include onsite visits, questionnaires, surveys, interviews, and perhaps a return-on-investment (ROI) analysis or needs analysis of the customer or client's current business environment. Consequently, the SRS is usually written after the requirements have been gathered and analysed.

In the case of LarKC, there is neither client nor product development group. The concrete LarKC platform requirements (D5.3.1 – M12) will come after the release of the initial prototype (D5.2.1 – M10). Nevertheless, there are many important issues that will impact the ultimate design of the platform and it is important to be aware of these as early as possible. This SRS document serves the role of identifying the important issues and providing a medium for discussion.

1.6. Documents that will Reference this SRS

It is anticipated that subsequent project management documents, such as design specifications, software architecture specifications, test plans and user documentation will refer back to this SRS.

1.7. Goals of the SRS

This SRS document shall state in precise and explicit language those functions and capabilities that the LarKC platform must provide, as well as any constraints by which the system must abide. It is important to note that this document contains functional and non-functional requirements only; it does not offer design suggestions, possible solutions to technology issues, or any information other than what the consortium understands the system requirements to be.

This document has the following goals:

2. Overall Description

To quote the DoW:

The following sections aim to answer such questions as:

2.1. Platform Architecture

| Id | Requirement | Source |
|---|---|---|
| | The platform must be implemented on a high-performance computing cluster, i.e. one that utilises a ‘massively’ parallel infrastructure of many processors, shared memory and shared file-system. | DoW, page 2 |
| | The platform must be implemented on a distributed platform, i.e. one that utilises the processing resources of many network connected machines – many processors, but no shared memory or file-system. Otherwise known as ‘computing@home’ or ‘thinking@home’. | DoW, page 2 |
| | The platform will perform ‘massive inference’ by distributing problems across heterogeneous computing resources, i.e. breaking up a single search/reasoning task to run in parallel. | DoW, page 6 |
| | The platform will utilise a variety of reasoning methods (not only logic) from the areas of cognitive science (human heuristics), economics (limited rationality and cost/benefit trade-offs), information retrieval (recall/precision trade-offs), and databases (very large datasets). | DoW, page 5 |
| | The platform shall comprise a pluggable architecture that will ensure that computational methods from different fields can be coherently integrated. Integrated with the platform? OR integrated with each other? OR integrated because they are represented with the same formalism? OR integrated because a retrieval plug-in is connected to a reasoning plug-in? | DoW, page 5 |
| | The platform shall support the processing of many concurrent user requests? (Discuss!) | |
| | The platform will manage data distribution when deployed on a thinking@home architecture? | |
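
The ‘massive inference’ requirement above describes breaking a single search/reasoning task into parts that run in parallel. The following is a minimal sketch of that idea only. Python is used purely for illustration, since this SRS does not fix an implementation language. The names `split_task`, `match_partition` and `parallel_search` are hypothetical, and a thread pool on one machine stands in for the cluster or thinking@home resources.

```python
from concurrent.futures import ThreadPoolExecutor


def split_task(triples, parts):
    """Partition the data into `parts` roughly equal slices (hypothetical helper)."""
    return [triples[i::parts] for i in range(parts)]


def match_partition(partition, predicate):
    """Search one slice for triples whose predicate matches."""
    return [t for t in partition if t[1] == predicate]


def parallel_search(triples, predicate, parts=4):
    """Run the same search over every slice concurrently and merge the results."""
    slices = split_task(triples, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # A local thread pool is an assumption standing in for remote resources.
        partial = list(pool.map(match_partition, slices, [predicate] * parts))
    return sorted(t for chunk in partial for t in chunk)


# Toy data, invented for this example.
TRIPLES = [
    ("larkc", "type", "platform"),
    ("owlim", "type", "store"),
    ("larkc", "uses", "owlim"),
    ("plugin", "type", "component"),
]
```

Calling `parallel_search(TRIPLES, "type")` returns the three matching triples in sorted order, whichever slice each one was found in. The sketch assumes the sub-tasks are independent; whether sub-divided problems may communicate with each other is one of the open questions in section 2.10.7.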

2.2. Plug-in Architecture

| Id | Requirement | Source |
|---|---|---|
| | A reasoning plug-in is a software component that captures the functionality of a single reasoning/search technique. | |
| | A plug-in will interface to the LarKC platform using a defined, standard interface and implement such behaviour as required by the platform. | |
| | A plug-in will interoperate with the LarKC platform when deployed on any of its architectures (parallel/cluster AND thinking@home)? | |
| | Communication between the platform and the plug-ins will be via: Web Service, OR Java RMI, OR some other RPC mechanism? | |
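
To make the idea of a "defined, standard interface" concrete, here is a minimal sketch of what such a contract could look like. It is an illustration under assumptions, not the agreed LarKC interface: the class names `Plugin`, `KeywordSelector` and `Platform`, and the methods `name`, `invoke`, `register` and `run`, are all hypothetical. Python is used only for illustration, and the call is in-process because the communication mechanism is still undecided.

```python
from abc import ABC, abstractmethod


class Plugin(ABC):
    """Hypothetical contract that every plug-in would implement."""

    @abstractmethod
    def name(self):
        """Return a unique identifier for this plug-in."""

    @abstractmethod
    def invoke(self, statements, query):
        """Process the statements for the given query and return a result."""


class KeywordSelector(Plugin):
    """Toy selection plug-in: keeps statements that mention the query."""

    def name(self):
        return "keyword-selector"

    def invoke(self, statements, query):
        return [s for s in statements if query in s]


class Platform:
    """Toy platform that only knows plug-ins through the Plugin contract."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        if not isinstance(plugin, Plugin):
            raise TypeError("plug-ins must implement the Plugin interface")
        self._plugins[plugin.name()] = plugin

    def run(self, plugin_name, statements, query):
        # In-process call; Web Service, RMI or other RPC would replace this.
        return self._plugins[plugin_name].invoke(statements, query)
```

Because the platform depends only on the abstract contract, the same plug-in could in principle be reused on either target architecture, which is the question raised in section 2.10.2.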

2.3. Storage

| Id | Requirement | Source |
|---|---|---|
| | The platform shall utilise massive heterogeneous information sources | DoW, page 4 |
| | Typically reasoning over 10 billion RDF triples (RDF Schema and OWL-Lite) shall be accomplished in approximately 100 ms | DoW, page 5, 7 |
| | Move well beyond existing formalisms such as RDF, RDF-S and OWL | DoW, page 8 |
| | SwiftOWLIM and BigOWLIM shall be the basis of the storage and querying technology | DoW, page 12 |
| | The storage technology will be capable of being updated in real-time. | DoW, page 9 |

2.4. Reasoning

| Id | Requirement | Source |
|---|---|---|
| | The system shall facilitate the integration of reasoning and Web-search that scales to billions (zillions) of facts. | DoW, page 1 |
| | Inference (RDF Schema and OWL-Lite) on data-sets of tens of billions of triples shall be achievable with real-time response times. | DoW, page 7 |
| | LarKC shall fuse reasoning (logic) with search (retrieval) | DoW, page 4 |
| | LarKC shall exploit techniques and heuristics from diverse areas | DoW, page 4 |
| | LarKC shall provide sufficient conceptual integration of diverse, heterogeneous approaches in order to seamlessly integrate reasoning components from diverse fields. | DoW, page 7 |
| | LarKC shall trade quality for computational cost by embracing incompleteness and unsoundness | DoW, page 5 |
| | The overall LarKC system shall adapt to varying requirements of quality and scale | DoW, page 5 |
| | The selection process shall include probabilistic techniques. | DoW, page 7 |
| | The rule-based reasoning component shall be capable of performing inference using very many axioms. | DoW, page 9 |
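
The requirement to trade quality for computational cost can be illustrated with a budgeted inference loop: the caller fixes how much work may be done and accepts a possibly incomplete answer. This is a minimal sketch under assumptions, not a proposed design. The function `bounded_reachability`, the step budget `max_steps` and the toy `SUBCLASS_OF` data are hypothetical, and the sketch shows incompleteness only, not unsoundness.

```python
from collections import deque


def bounded_reachability(edges, start, max_steps):
    """Infer everything reachable from `start`, expanding at most `max_steps` nodes.

    Returns (found, complete), where `complete` is True only if the search
    finished before the budget ran out.
    """
    found = {start}
    frontier = deque([start])
    steps = 0
    while frontier and steps < max_steps:
        node = frontier.popleft()
        steps += 1
        for nxt in edges.get(node, ()):
            if nxt not in found:
                found.add(nxt)
                frontier.append(nxt)
    return found, not frontier


# Toy subclass hierarchy, invented for this example.
SUBCLASS_OF = {"poodle": ["dog"], "dog": ["mammal"], "mammal": ["animal"]}
```

With a budget of one step, `bounded_reachability(SUBCLASS_OF, "poodle", 1)` finds only `poodle` and `dog` and reports that the answer is incomplete; with a generous budget it finds the whole chain. Raising or lowering the budget is one simple way a system could adapt to varying requirements of quality and scale.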

2.5. Operating environment

| Id | Requirement | Source |
|---|---|---|
| | The plug-in configuration for an instance of a LarKC platform shall be: statically defined or based on automatic plug-in selection? | |

2.6. User environment

| Id | Requirement | Source |
|---|---|---|
| | Users will interact with LarKC using: a web browser, web service, client application, OR something else? | |
| | How will users interact with LarKC? Real-time query submission and fast response, OR batched query submission? | |

2.7. User Classes and Characteristics

| Id | Requirement | Source |
|---|---|---|
| | Users of LarKC will fall into the following categories: Human OR Machine OR Both | |

2.8. Design/implementation constraints

2.9. Assumptions and dependencies

2.10. Open Questions

2.10.1. Architecture

What is the motivation for supporting both distributed and parallel architectures?

Is it that LarKC must be usable no matter what hardware is available or is it that the hardware is chosen to suit the problem domain?

In other words, should we aim for all our use-cases to function well on both architectures?

Is it enough to show that one architecture better supports a particular problem domain and that an instance of LarKC running on that architecture shows an increased performance over the state-of-the-art?

2.10.2. Plug-in architecture

Write plug-in once only for all architectures?

Design platform accordingly?

Parallelism inside plug-ins only?

How will a plug-in interoperate with distributed architecture?

Considering a thinking@home environment, surely plug-in writers should not have to deal with the intricacies of distributing compute tasks to remote nodes, synchronising their responses, resending failed tasks, etc.

Who controls the plug-ins’ resource allocation?

Will the platform attempt to give a plug-in all the resources it requests (threads/memory)?
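
One way to read the thinking@home concern above is that distribution, ordering of responses and resending of failed tasks would live in the platform, so a plug-in only supplies the tasks. The following is a minimal sketch of that division of labour. The functions `run_with_retries` and `distribute` are hypothetical, remote nodes are simulated by plain Python callables, and a `ConnectionError` stands in for any node failure.

```python
def run_with_retries(task, nodes, max_attempts=3):
    """Try the task on successive nodes until one succeeds (platform-side)."""
    last_error = None
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]
        try:
            return node(task)
        except ConnectionError as exc:
            # Assumption: node failures surface as ConnectionError.
            last_error = exc
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def distribute(tasks, nodes):
    """Fan tasks out and return results in task order; plug-ins never see retries."""
    return [run_with_retries(task, nodes) for task in tasks]


def offline_node(task):
    """Simulated node that is always unreachable."""
    raise ConnectionError("node unreachable")


def healthy_node(task):
    """Simulated node that doubles its input."""
    return task * 2
```

Here `distribute([1, 2, 3], [offline_node, healthy_node])` returns `[2, 4, 6]`: each task first fails on the offline node and is resent to the healthy one without the caller being involved. The sketch runs tasks one after another for clarity; a real platform would dispatch them concurrently.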

2.10.3. Multi-user/Single-user?

Platform shall support processing of multiple concurrent user requests?

Frank says “no”.

Barry says “why not”?

HLRS say that there might be interesting behaviour regarding how one query affects another (already cached data, etc.)

Barry thinks: the synchronisation issues will likely be taken care of already due to the nature of the target architectures. However, there will be other issues regarding resource allocation between the two query jobs.

2.10.4. Configuration/Integration

Plug-in configuration, statically defined or selected automatically based on the request?

If statically defined, i.e. a platform is configured to use reasoning plug-in X, then how can there be any ‘coherent integration’ of different reasoning techniques?

Are we expecting to build plug-ins that utilise several other reasoning techniques (plug-ins), i.e. compose new plug-ins from more ‘atomic’ ones? Is this the new ground that the research project hopes to cover?
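
The two configuration options discussed in this section can be contrasted in a few lines. This is an illustrative sketch only: the registry contents, the plug-in names and the functions `select_static` and `select_automatic` are all hypothetical, and a real selection mechanism would need far richer metadata.

```python
# Hypothetical registry describing which kinds of request each plug-in handles.
REGISTRY = {
    "rdfs-reasoner": {"handles": {"subclass", "type"}},
    "keyword-search": {"handles": {"keyword"}},
}


def select_static(config):
    """Static configuration: the plug-in is fixed when the platform is set up."""
    return config["plugin"]


def select_automatic(request_kind, registry=REGISTRY):
    """Automatic selection: pick a plug-in that declares support for the request."""
    candidates = sorted(
        name for name, meta in registry.items() if request_kind in meta["handles"]
    )
    if not candidates:
        raise LookupError(f"no plug-in handles request kind {request_kind!r}")
    return candidates[0]
```

In the static case every request goes to the one configured plug-in, whatever its kind. In the automatic case `select_automatic("keyword")` picks `keyword-search` while `select_automatic("subclass")` picks `rdfs-reasoner`, so different techniques can serve different requests on the same platform instance.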

2.10.5. Pipe-line

Is this always sequential, or can abstraction/reasoning start while retrieval is still in progress, i.e. some kind of parallel and streaming set-up? In this case, rather than a loop around the pipeline, it just needs a signal from the last pipe-line element to indicate when to stop (assuming each previous element is not exhausted).
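
The streaming alternative described above can be sketched with lazy generators: each stage processes items as they arrive, and the last stage decides when to stop, at which point no further retrieval takes place. This is a minimal single-process illustration, not a proposed design. The stage functions `retrieve`, `abstract` and `reason`, the `log` parameter and the toy data are hypothetical.

```python
def retrieve(source, log):
    """First stage: yield items one at a time, recording what was fetched."""
    for item in source:
        log.append(item)
        yield item


def abstract(items):
    """Middle stage: transform each item as soon as it arrives."""
    for item in items:
        yield item.lower()


def reason(items, wanted):
    """Last stage: stop pulling from the pipeline once enough answers exist."""
    answers = []
    for item in items:
        if item.startswith("a"):
            answers.append(item)
        if len(answers) >= wanted:
            break  # this is the stop signal; upstream stages are never resumed
    return answers


# Toy data, invented for this example.
SOURCE = ["Apple", "Banana", "Avocado", "Cherry", "Apricot"]
```

Running `reason(abstract(retrieve(SOURCE, log)), wanted=2)` with an empty list as `log` returns `["apple", "avocado"]`, and `log` afterwards contains only the first three source items: the remaining two were never retrieved. The stages here interleave rather than run in parallel; truly concurrent stages would need queues or a similar mechanism between them.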

2.10.6. Data distribution for thinking@home

Compute tasks can be distributed to many machines with available processor cycles, but will it be necessary to distribute data also? If so, what mechanism/technology will be used to manage the grid-like distribution of data?

2.10.7. Communication for thinking@home

What communications technology will be used to distribute compute tasks? Will sub-divided problems require ‘inter-division’ communication, i.e. if a problem is broken down into 100 parts and each part runs on a remote node, will these parts need to communicate with each other while they are executing? Do we forbid this? If it is allowed, then how is it achieved?

3. Detailed System Features

To be completed as more detailed requirements of specific system features are agreed.

4. Abbreviations/Glossary

DoW: Description of Work

Dx.y.z: Deliverable x.y.z

LarKC: The Large Knowledge Collider

Plug-in: A reasoning/search component that the LarKC platform uses to answer queries

SRS: Software Requirements Specification

WPx: Work Package x

LarkcProject/WP5/docs/platform/development/Software_Requirements_Specification (last edited 2008-11-22 00:06:40 by WitbrockMichael)