The LarKC Platform - Software Requirements Specification
Executive summary
This software requirements specification (SRS) document represents the consortium’s mutual understanding of the requirements of the LarKC platform in terms of its intended behaviour and interoperability with other LarKC software components, specifically the retrieval/selection/abstraction/learning/reasoning/deciding plug-ins to be realised in work packages 2, 3 and 4.
The application of parallel and distributed techniques has significant implications for the developers of LarKC components, and these implications must be considered at the beginning of the design phase.
1. Introduction
1.1. Purpose
This software requirements specification (SRS) document represents the consortium’s mutual understanding of the requirements of the LarKC platform in terms of its intended behaviour and interoperability with other LarKC software components, specifically the retrieval/selection/abstraction/learning/reasoning/deciding plug-ins to be realised in work packages 2, 3 and 4.
1.2. Audience
All partners in work package 5 will contribute to the creation and evolution of this document. Furthermore, partners in work packages 2, 3 and 4 will be consulted and may also contribute as their understanding of the behaviour of plug-in components becomes better defined.
1.3. Document Conventions
The following styles and colours are used to indicate the agreement status of requirements, in particular areas where requirements have not been fully agreed:
Style | Indication |
LarKC must run on an iPhone | Strikethrough. This style is used to indicate a requirement that has been removed by agreement, but is retained for historical reasons. |
The platform will be written in 100% pure Java, OR | Red. Indicates where agreement has not been reached. This will normally be used for requirements that have been suggested but not agreed, where a number of options apply and one has yet to be selected, or where an open question indicates an area for future discussion. |
The selection process shall include probabilistic techniques. | Normal. Indicates a (currently) agreed-upon requirement. |
1.4. Origin of Requirements
The Description of Work (DoW) outlines some goals of the LarKC consortium, which are necessarily vague at the commencement of a research project. However, over time, these goals will be more concretely defined in this SRS as the consortium collectively works to achieve a better understanding of the research goals.
1.5. Development of Requirements
Normally, an SRS is the result of the initial product development phase, in which information is gathered about which requirements are needed and which are not. This information-gathering stage typically includes onsite visits, questionnaires, surveys, interviews, and perhaps a return-on-investment (ROI) analysis or needs analysis of the customer or client's current business environment. Consequently, the SRS is usually written after the requirements have been gathered and analysed.
In the case of LarKC, there is neither client nor product development group. The concrete LarKC platform requirements (D5.3.1 – M12) will come after the release of the initial prototype (D5.2.1 – M10). Nevertheless, there are many important issues that will impact the ultimate design of the platform and it is important to be aware of these as early as possible. This SRS document serves the role of identifying the important issues and providing a medium for discussion.
1.6. Documents that will Reference this SRS
It is anticipated that subsequent project management documents, such as design specifications, software architecture specifications, test plans and user documentation, will refer back to this SRS.
1.7. Goals of the SRS
This SRS document shall state in precise and explicit language the functions and capabilities that the LarKC platform must provide, as well as any constraints by which the system must abide. It is important to note that this document contains functional and non-functional requirements only; it does not offer design suggestions, possible solutions to technology issues, or any information other than what the consortium understands the system requirements to be.
This document has the following goals:
- It provides feedback to the consortium. This SRS is the consortium’s assurance that development work is conducted such that the software created addresses the issues or problems described here. Therefore, the SRS should be written in natural language, in an unambiguous manner that may also include charts, tables, data flow diagrams, decision tables, and so on.
- It decomposes the problem into component parts. The simple act of writing down software requirements in a well-designed format organizes information, places borders around the problem, solidifies ideas, and helps break down the problem into its component parts in an orderly fashion.
- It serves as an input to the design specification. As mentioned previously, the SRS serves as the parent document to subsequent documents, such as the software design specification and statement of work. Therefore, the SRS must contain sufficient detail in the functional system requirements so that a design solution can be devised.
- It serves as a product validation check. The SRS also serves as the parent document for testing and validation strategies that will be applied to the requirements for verification.
2. Overall Description
To quote the DoW:
- "The Large Knowledge Collider (LarKC): a platform for integrated reasoning and Web-search that scales to zillions of facts"
The following sections aim to answer such questions as:
- What is meant by ‘platform’?
- How many is ‘zillions’?
- What hardware architectures does this platform run on?
- Where does this platform execute?
- What is the platform responsible for?
- What other software does the platform interface to?
- How is the platform started?
- How is the platform configured?
- Who uses the platform?
- What is the platform used for?
- What can the platform do?
- What are the limitations of the platform?
- What is a ‘plug-in’?
- How does the platform interact with a plug-in?
- What is a plug-in responsible for?
2.1. Platform Architecture
Id | Requirement | Source |
| The platform must be implemented on a high-performance computing cluster, i.e. one that utilises a ‘massively’ parallel infrastructure of many processors, shared memory and a shared file-system. | DoW, page 2 |
| The platform must be implemented on a distributed infrastructure, i.e. one that utilises the processing resources of many network-connected machines (many processors, but no shared memory or file-system), otherwise known as ‘computing@home’ or ‘thinking@home’. | DoW, page 2 |
| The platform will perform ‘massive inference’ by distributing problems across heterogeneous computing resources, i.e. breaking up a single search/reasoning task into parts that run in parallel. | DoW, page 6 |
| The platform will utilise a variety of reasoning methods (not only logic) from the areas of cognitive science (human heuristics), economics (limited rationality and cost/benefit trade-offs), information retrieval (recall/precision trade-offs), and databases (very large datasets). | DoW, page 5 |
| The platform shall comprise a pluggable architecture that ensures computational methods from different fields can be coherently integrated. | DoW, page 5 |
| The platform shall support the processing of many concurrent user requests? (Discuss!) | |
| The platform will manage data distribution when deployed on a thinking@home architecture? | |
2.2. Plug-in Architecture
Id | Requirement | Source |
| A reasoning plug-in is a software component that captures the functionality of a single reasoning/search technique. | |
| A plug-in will interface with the LarKC platform through a defined, standard interface and implement the behaviour required by the platform. | |
| A plug-in will interoperate with the LarKC platform when deployed on any of its architectures (parallel/cluster AND thinking@home)? | |
| Communication between the platform and the plug-ins will be via: | |
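The table leaves the standard plug-in interface undefined. Purely as an illustration of what "a defined, standard interface" might look like, the following minimal sketch is written in Java (used here only for illustration) and assumes a hypothetical `LarkcPlugin` interface with a single `process` method over a simplified, hypothetical `Triple` record. None of these names come from the DoW or from an agreed design.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified RDF statement; not an agreed LarKC data model.
record Triple(String subject, String predicate, String object) {}

// Hypothetical standard interface that every plug-in would implement.
interface LarkcPlugin {
    // Identifier the platform could use when configuring an instance.
    String name();

    // Transforms an input set of triples into an output set.
    List<Triple> process(List<Triple> input);
}

// Example plug-in capturing a single technique: selecting triples by predicate.
final class PredicateSelectionPlugin implements LarkcPlugin {
    private final String predicate;

    PredicateSelectionPlugin(String predicate) {
        this.predicate = predicate;
    }

    @Override
    public String name() {
        return "select:" + predicate;
    }

    @Override
    public List<Triple> process(List<Triple> input) {
        // Keep only the triples whose predicate matches the configured one.
        return input.stream()
                .filter(t -> t.predicate().equals(predicate))
                .collect(Collectors.toList());
    }
}
```

Keeping the interface to a single processing method means the platform could call any plug-in in the same way, whatever technique it wraps.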
2.3. Storage
Id | Requirement | Source |
| The platform shall utilise massive heterogeneous information sources. | DoW, page 4 |
| Reasoning over 10 billion RDF triples (RDF Schema and OWL-Lite) shall typically be accomplished in approximately 100 ms. | DoW, pages 5 and 7 |
| The platform shall move well beyond existing formalisms such as RDF, RDF-S and OWL. | DoW, page 8 |
| SwiftOWLIM and BigOWLIM shall be the basis of the storage and querying technology. | DoW, page 12 |
| The storage technology will be capable of being updated in real time. | DoW, page 9 |
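To make the scale of the 100 ms target concrete, here is a back-of-the-envelope calculation under a deliberately simple assumption that is not taken from the DoW: that reasoning must touch every triple exactly once, with the work spread evenly over $p$ processors. It shows only the throughput such an exhaustive pass would imply, not a measured or expected figure.

```latex
\text{aggregate rate} = \frac{10^{10}\ \text{triples}}{100\ \text{ms}}
  = \frac{10^{10}\ \text{triples}}{10^{-1}\ \text{s}}
  = 10^{11}\ \text{triples/s},
\qquad
\text{rate per processor} = \frac{10^{11}}{p}\ \text{triples/s}
  \quad (\text{e.g. } p = 1000 \Rightarrow 10^{8}\ \text{triples/s}).
```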
2.4. Reasoning
Id | Requirement | Source |
| The system shall facilitate the integration of reasoning and Web-search that scales to billions (zillions) of facts. | DoW, page 1 |
| Inference (RDF Schema and OWL-Lite) on data-sets of tens of billions of triples shall be achievable with real-time response times. | DoW, page 7 |
| LarKC shall fuse reasoning (logic) with search (retrieval). | DoW, page 4 |
| LarKC shall exploit techniques and heuristics from diverse areas. | DoW, page 4 |
| LarKC shall provide sufficient conceptual integration of diverse, heterogeneous approaches to seamlessly integrate reasoning components from diverse fields. | DoW, page 7 |
| LarKC shall trade quality for computational cost by embracing incompleteness and unsoundness. | DoW, page 5 |
| The overall LarKC system shall adapt to varying requirements of quality and scale. | DoW, page 5 |
| The selection process shall include probabilistic techniques. | DoW, page 7 |
| The rule-based reasoning component shall be capable of performing inference using very many axioms. | DoW, page 9 |
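One common way to trade quality for computational cost is an anytime computation: it can be stopped when its budget runs out and still return an incomplete answer, together with an indication of how complete that answer is. The sketch below combines this with a probabilistic selection step by visiting the data in random order, so that a partial pass acts as a random sample whose count is scaled up to an estimate. It is an illustration only; `AnytimeCounter`, `AnytimeResult` and the use of a fixed step budget in place of a wall-clock deadline (chosen so the behaviour is reproducible) are assumptions, not techniques prescribed by the DoW.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Result of a budgeted computation: an estimate plus the fraction of data examined.
record AnytimeResult(double estimate, double fractionExamined) {}

final class AnytimeCounter {
    // Estimates how many items satisfy 'test' while examining at most 'budget' items.
    // Items are visited in a random order, so a partial pass behaves like a random sample.
    static <T> AnytimeResult estimateCount(List<T> items, Predicate<T> test, int budget, long seed) {
        if (items.isEmpty()) {
            return new AnytimeResult(0.0, 1.0);
        }
        List<T> order = new ArrayList<>(items);
        Collections.shuffle(order, new Random(seed));
        int examined = Math.min(Math.max(budget, 0), order.size());
        if (examined == 0) {
            // Nothing examined: no information, so report zero with zero coverage.
            return new AnytimeResult(0.0, 0.0);
        }
        int matches = 0;
        for (int i = 0; i < examined; i++) {
            if (test.test(order.get(i))) {
                matches++;
            }
        }
        // Scale the sample count up to the whole data set (exact only when everything was examined).
        double estimate = (double) matches * order.size() / examined;
        return new AnytimeResult(estimate, (double) examined / order.size());
    }
}
```

With a budget equal to the data size the answer is exact; smaller budgets cost less and return an estimate whose accuracy depends on the sample drawn.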
2.5. Operating environment
Id | Requirement | Source |
| The plug-in configuration for an instance of the LarKC platform shall be either statically defined or based on automatic plug-in selection? | |
2.6. User environment
Id | Requirement | Source |
| Users will interact with LarKC using a web browser, a web service, a client application, OR something else? | |
| How will users interact with LarKC: real-time query submission with fast response, OR batched query submission? | |
2.7. User Classes and Characteristics
Id | Requirement | Source |
| Users of LarKC will fall into the following categories: human OR machine OR both. | |
2.8. Design/implementation constraints
2.9. Assumptions and dependencies
2.10. Open Questions
2.10.1. Architecture
What is the motivation for supporting both distributed and parallel architectures?
Is it that LarKC must be usable no matter what hardware is available or is it that the hardware is chosen to suit the problem domain?
In other words, should we aim for all our use-cases to function well on both architectures?
Is it enough to show that one architecture better supports a particular problem domain and that an instance of LarKC running on that architecture shows an increased performance over the state-of-the-art?
2.10.2. Plug-in architecture
Should a plug-in be written once only, for all architectures?
Should the platform be designed accordingly?
Should parallelism exist inside plug-ins only?
How will a plug-in interoperate with a distributed architecture?
Considering a thinking@home environment, surely plug-in writers should not have to deal with the intricacies of distributing compute tasks to remote nodes, synchronising their responses, resending failed tasks, etc.
Who controls the plug-ins’ resource allocation?
Will the platform attempt to give a plug-in all the resources it requests (threads/memory)?
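The concern that plug-in writers should not have to handle distribution and failures suggests a split in which plug-in code only supplies independent tasks, while the platform runs them and resends any that fail. The sketch below illustrates that split under stated assumptions: a local thread pool (`java.util.concurrent.ExecutorService`) stands in for remote thinking@home nodes, `PlatformTaskRunner` is a hypothetical name, and a node failure is modelled as an exception thrown by the task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical platform-side helper: plug-in code supplies independent tasks,
// and the platform runs them in parallel and resends any that fail.
final class PlatformTaskRunner {
    static <R> List<R> runAll(List<Callable<R>> tasks, int threads, int maxAttempts) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every task up front so that they run in parallel.
            List<Future<R>> futures = new ArrayList<>();
            for (Callable<R> task : tasks) {
                futures.add(pool.submit(task));
            }
            // Collect results in task order, resending failed tasks.
            List<R> results = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                Future<R> future = futures.get(i);
                int attempt = 1;
                while (true) {
                    try {
                        results.add(future.get());
                        break;
                    } catch (ExecutionException e) {
                        if (attempt >= maxAttempts) {
                            throw new IllegalStateException(
                                    "task " + i + " failed after " + attempt + " attempts", e.getCause());
                        }
                        // On a real thinking@home set-up this might go to a different node.
                        attempt++;
                        future = pool.submit(tasks.get(i));
                    }
                }
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Here the plug-in author writes only the body of each `Callable`; scheduling, result collection and retries stay on the platform side.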
2.10.3. Multi-user/Single-user?
Shall the platform support the processing of multiple concurrent user requests?
Frank says “no”.
Barry says “why not?”
HLRS say that there might be interesting behaviour regarding how one query affects another (e.g. already cached data).
Barry thinks that the synchronisation issues will likely already be taken care of by the nature of the target architectures; however, there will be other issues regarding resource allocation between concurrent query jobs.
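HLRS's point that one query can affect another through already cached data can be illustrated with a cache shared by concurrent queries. In the sketch below, `SharedQueryCache` is a hypothetical name and the loader function stands in for an expensive retrieval step; `ConcurrentHashMap.computeIfAbsent` ensures that the loader runs at most once per key, so later queries reuse the result of the first.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache shared by all queries running on one platform instance.
final class SharedQueryCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for an expensive retrieval step
    private final AtomicInteger loads = new AtomicInteger();

    SharedQueryCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value; computeIfAbsent applies the loader at most once per key,
    // even when several queries ask for the same key concurrently.
    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Number of times the expensive step actually ran.
    int loadCount() {
        return loads.get();
    }
}
```

The cache removes duplicate work, but it does not decide which query bears the cost of the first load; that remains a resource-allocation question.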
2.10.4. Configuration/Integration
Should the plug-in configuration be statically defined or selected automatically based on the request?
If statically defined, i.e. a platform is configured to use reasoning plug-in X, then how can there be any ‘coherent integration’ of different reasoning techniques?
Are we expecting to build plug-ins that utilise several other reasoning techniques (plug-ins), i.e. compose new plug-ins from more ‘atomic’ ones? Is this the new ground that the research project hopes to cover?
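The composition question can be made concrete with a small sketch. Assuming (hypothetically) that an atomic plug-in can be modelled as a function from a list of facts to a list of facts, a composite plug-in is a sequence of atomic ones, and automatic selection amounts to choosing a configuration per request, with a statically defined configuration as the fallback. `CompositePlugin`, `PluginSelector` and the request-type keys are illustrative names, not an agreed design.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// A hypothetical atomic plug-in is modelled as a function from facts to facts.
// A composite plug-in applies its component plug-ins one after another.
final class CompositePlugin implements UnaryOperator<List<String>> {
    private final List<UnaryOperator<List<String>>> parts;

    CompositePlugin(List<UnaryOperator<List<String>>> parts) {
        this.parts = List.copyOf(parts);
    }

    @Override
    public List<String> apply(List<String> input) {
        List<String> current = input;
        for (UnaryOperator<List<String>> part : parts) {
            current = part.apply(current); // output of one plug-in feeds the next
        }
        return current;
    }
}

// Automatic selection picks a configuration per request type; unknown request
// types fall back to a statically defined default configuration.
final class PluginSelector {
    private final Map<String, UnaryOperator<List<String>>> byRequestType;
    private final UnaryOperator<List<String>> staticDefault;

    PluginSelector(Map<String, UnaryOperator<List<String>>> byRequestType,
                   UnaryOperator<List<String>> staticDefault) {
        this.byRequestType = Map.copyOf(byRequestType);
        this.staticDefault = staticDefault;
    }

    UnaryOperator<List<String>> select(String requestType) {
        return byRequestType.getOrDefault(requestType, staticDefault);
    }
}
```

Because a composite has the same type as its parts, composites can themselves be composed or selected, which is one possible reading of "coherent integration".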
2.10.5. Pipeline
Is the pipeline always sequential, or can abstraction/reasoning start while retrieval is still in progress, i.e. some kind of parallel, streaming set-up? In that case, rather than a loop around the pipeline, the platform only needs a signal from the last pipeline element to indicate when to stop (assuming none of the earlier elements has been exhausted).
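A parallel, streaming set-up of this kind can be sketched with a bounded queue between two stages. In this illustration, which assumes local Java threads rather than any agreed LarKC mechanism, a hypothetical "retrieval" thread streams facts into a `java.util.concurrent.BlockingQueue` while a "reasoning" stage consumes them concurrently. A simple suffix test stands in for reasoning, and the last stage raises a stop signal once it has the answers it wants, instead of the pipeline looping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Two-stage streaming pipeline: 'retrieval' produces facts while 'reasoning'
// consumes them concurrently; the last stage signals when to stop.
final class StreamingPipeline {
    // Marker placed on the queue when retrieval has no more facts.
    private static final String END = "<end-of-stream>";

    // Collects up to 'wanted' facts ending in 'suffix' (a stand-in for a reasoning step).
    static List<String> run(List<String> source, String suffix, int wanted) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        AtomicBoolean stop = new AtomicBoolean(false);

        Thread retrieval = new Thread(() -> {
            try {
                for (String fact : source) {
                    if (stop.get()) {
                        return; // the last stage has signalled that it has enough
                    }
                    queue.put(fact); // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // woken by the stop signal
            }
        });
        retrieval.start();

        List<String> answers = new ArrayList<>();
        try {
            while (answers.size() < wanted) {
                String fact = queue.take();
                if (fact.equals(END)) {
                    break; // retrieval exhausted before enough answers were found
                }
                if (fact.endsWith(suffix)) {
                    answers.add(fact);
                }
            }
            // Stop signal: tell retrieval to finish early and unblock it if it is waiting.
            stop.set(true);
            retrieval.interrupt();
            retrieval.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return answers;
    }
}
```

The end-of-stream marker covers the other case, in which retrieval is exhausted before the last stage is satisfied.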
2.10.6. Data distribution for thinking@home
Compute tasks can be distributed to many machines with available processor cycles, but will it be necessary to distribute data also? If so, what mechanism/technology will be used to manage the grid-like distribution of data?
2.10.7. Communication for thinking@home
What communications technology will be used to distribute compute tasks? Will sub-divided problems require ‘inter-division’ communication, i.e. if a problem is broken down into 100 parts and each part runs on a remote node, will these parts need to communicate with each other while they are executing? Do we forbid this? If it is allowed, then how is it achieved?
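The option of forbidding ‘inter-division’ communication corresponds to an embarrassingly parallel decomposition: each of the (here, 100) parts works only on its own slice of the data, and results are combined once all parts have finished. The sketch below illustrates that one option, not an agreed choice; a parallel stream of local threads stands in for remote nodes, and `Partitioner` and the suffix-matching "task" are hypothetical.

```java
import java.util.List;
import java.util.stream.IntStream;

// Splits work into independent parts that never communicate while running;
// partial results are combined only after every part has finished.
final class Partitioner {
    // Returns 'parts' contiguous [start, end) ranges covering [0, size) with near-equal lengths.
    static int[][] ranges(int size, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        int[][] out = new int[parts][2];
        for (int p = 0; p < parts; p++) {
            out[p][0] = (int) ((long) size * p / parts);
            out[p][1] = (int) ((long) size * (p + 1) / parts);
        }
        return out;
    }

    // Each part counts the facts ending in 'suffix' within its own slice only.
    static long countMatches(List<String> data, String suffix, int parts) {
        int[][] r = ranges(data.size(), parts);
        return IntStream.range(0, parts)
                .parallel() // local threads stand in for remote nodes
                .mapToLong(p -> data.subList(r[p][0], r[p][1]).stream()
                        .filter(s -> s.endsWith(suffix))
                        .count())
                .sum(); // the only point at which the parts' results meet
    }
}
```

If parts did need to exchange intermediate results while running, this simple split-and-sum structure would no longer suffice, which is why the question matters for the choice of communications technology.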
3. Detailed System Features
To be completed as more detailed requirements of specific system features are agreed.
4. Abbreviations/Glossary
DoW | Description of Work |
Dx.y.z | Deliverable x.y.z |
LarKC | The Large Knowledge Collider |
Plug-in | A reasoning/search component that the LarKC platform uses to answer queries |
SRS | Software Requirements Specification |
WPx | Work Package x |
