FlowControllers
This page documents the motivation, the initial design, and the ongoing implementation of FlowController components (naming up for discussion!) as additional, reusable pipeline control elements.
Motivation
Parallelization and distribution require partitioning of data and, more generally, more control over workflows. For example, after data has been split and distributed for parallel computation, it has to be merged again for further processing in the workflow. The platform currently does not support this well, since a plug-in has exactly one previous element in a workflow.
There are several generic ways to split data or a workflow and merge it again. The precise way to do this depends on:
- the dataset (which might not be trivial to split, e.g. due to data interdependencies),
- involved plug-ins,
- the workflow/use-case.
Use case scenarios that require this functionality are:
- WP7b: Semantic annotation: The information extraction application (GATE) has to process large numbers of text documents. This calls for splitting the input data and processing individual documents in parallel, followed by merging the results (see following picture).
- WP6: Branching within workflows, routing of data to different plug-ins.
Therefore, this operation should not be performed by plug-ins (or rather plug-in managers) but by distinct components that only affect workflow execution and, if required, split the data. This achieves a separation of concerns (aspects): distribution (remote communication, etc.) stays encapsulated within plug-in managers, just as it is now. The only responsibility of a FlowController is workflow manipulation, e.g. splitting workflows, merging workflows, conditional execution of a branch, etc.
Many different implementations are possible once appropriate interfaces are in place. Hopefully, this also results in granular components that can be re-used in different workflows, since basic splitting and merging are fairly generic.
Initial Design
The basic design goal is to introduce fine-grained components that can be used within workflows. The main functionality of a FlowController is to direct the execution flow between plug-ins in ways that cannot be achieved with the current plug-in design. This isolates flow control from other aspects, such as processing (handled within a plug-in) and distribution/communication (handled by a plug-in manager).
Thus the functionality of a FlowController is:
- Split data and/or execution flow to several plug-ins
- Merge data and/or execution flow
- Conditional execution paths
- etc.
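The split, merge, and conditional cases listed above can be illustrated with a minimal sketch. It is not the platform's API: the function names `split`, `merge`, `run_split_merge`, and `route` are hypothetical, a plug-in is modelled as a plain callable, and a thread pool stands in for whatever distribution the plug-in managers provide.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split(items: Sequence[T], parts: int) -> List[List[T]]:
    """Split items into at most `parts` contiguous, near-equal chunks."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size, extra = divmod(len(items), parts)
    chunks: List[List[T]] = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are few items
            chunks.append(list(items[start:end]))
        start = end
    return chunks


def merge(branch_results: Iterable[List[R]]) -> List[R]:
    """Concatenate per-branch results, preserving branch order."""
    merged: List[R] = []
    for branch in branch_results:
        merged.extend(branch)
    return merged


def run_split_merge(items: Sequence[T], plugin: Callable[[T], R], parts: int) -> List[R]:
    """Split the input, run `plugin` on each chunk in parallel, merge the results."""
    chunks = split(items, parts)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # map() yields results in submission order, so the merge is deterministic.
        results = pool.map(lambda chunk: [plugin(x) for x in chunk], chunks)
        return merge(results)


def route(item: T, condition: Callable[[T], bool],
          if_true: Callable[[T], R], if_false: Callable[[T], R]) -> R:
    """Conditional execution path: send the item to one of two plug-ins."""
    return if_true(item) if condition(item) else if_false(item)
```

In this sketch the plug-in itself never sees the split or the merge, which matches the separation of concerns described in the motivation. How data is actually partitioned depends on the dataset and the plug-ins involved, so contiguous chunking is only one assumed strategy.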
Design issues:
Plug-ins need to be connectable either directly or through one or more intermediate FlowControllers.
A FlowController shares functionality with a PluginManager. For example, both accept messages and both have previous elements in a workflow (although a FlowController can have several of those). For this reason we introduce a new abstract base class PipelineElement (TBC: WorkflowElement) that contains this common functionality. This has no direct impact on the current implementation of plug-in managers. It only means that at some point the current plug-in manager implementation will be refactored and some elements will be pushed up into the common base class. This does not create direct dependencies between FlowController and PluginManager implementations. See rough class diagram below.
Control messages (e.g. to indicate shutdown) between workflow elements are currently simply propagated to the previous element in the workflow. For more complex workflows (involving loops, branches, parallel sections) this will cause problems. A more elaborate way to handle control messages is needed; they can also serve as valuable diagnostic information. This is a general concern and not something that can per se be solved through a FlowController.
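The relationship between the three classes can be sketched as follows. Only the names PipelineElement, PluginManager, and FlowController come from this page; the methods `add_previous` and `accept`, the `MergeController` subclass, and the modelling of a plug-in as a callable are assumptions made for illustration, and the sketch is in Python rather than the platform's implementation language.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, List


class PipelineElement(ABC):
    """Common base: keeps the previous elements and accepts messages."""

    def __init__(self) -> None:
        self.previous: List["PipelineElement"] = []

    def add_previous(self, element: "PipelineElement") -> None:
        self.previous.append(element)

    @abstractmethod
    def accept(self, message: Any) -> Any:
        """Handle a message coming from a previous element."""


class PluginManager(PipelineElement):
    """Wraps one plug-in and keeps the current single-predecessor rule."""

    def __init__(self, plugin: Callable[[Any], Any]) -> None:
        super().__init__()
        self.plugin = plugin

    def add_previous(self, element: PipelineElement) -> None:
        if self.previous:
            raise ValueError("a plug-in manager has only one previous element")
        super().add_previous(element)

    def accept(self, message: Any) -> Any:
        return self.plugin(message)


class FlowController(PipelineElement):
    """Marker base for elements that only manipulate the workflow."""


class MergeController(FlowController):
    """Hypothetical FlowController that collects messages from several branches."""

    def __init__(self) -> None:
        super().__init__()
        self.received: List[Any] = []

    def accept(self, message: Any) -> Any:
        self.received.append(message)
        return list(self.received)
```

Neither subclass refers to the other; both depend only on the base class, which is the property the design aims for.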
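To make the problem concrete, here is one possible way to propagate a control message upstream when elements have several previous elements. It is not a proposed design, only a sketch: the function `propagate_upstream` and the representation of the workflow as a dictionary from element name to previous element names are assumptions. The point is that each element has to be notified exactly once, even when it is reachable over several branches or through a loop.

```python
from typing import Dict, List, Set


def propagate_upstream(start: str, previous: Dict[str, List[str]]) -> List[str]:
    """Return the elements that receive a control message sent from `start`.

    `previous` maps each element to its previous elements. An element is
    notified once, no matter how many paths lead to it.
    """
    notified: List[str] = []
    seen: Set[str] = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for upstream in previous.get(current, []):
            if upstream not in seen:  # guards against branches and loops
                seen.add(upstream)
                notified.append(upstream)
                stack.append(upstream)
    return notified


# A split/merge "diamond": reader -> split -> (a, b) -> merge
diamond = {
    "merge": ["a", "b"],
    "a": ["split"],
    "b": ["split"],
    "split": ["reader"],
}
shutdown_targets = propagate_upstream("merge", diamond)
```

Without the `seen` set, `split` and `reader` would each receive the shutdown message twice in the diamond above, and a workflow containing a loop would never finish propagating.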
Implementation
- First implementation by end of April 2010

