WP7a feedback on the LarKC platform based on the M18 prototype implementation
Vassil shares his experience of implementing the WP7a workflow within the LarKC platform
This page presents the WP7a feedback on the current LarKC platform implementation. We evaluate how well the LarKC concept and the concrete Java API apply in practice to the WP7a use case. The platform conceptually defines plug-ins of the following types:
- Decider orchestrates the plug-in execution
- Identifier finds relevant information
- InformationSetTransformer transforms one information model to another
- QueryTransformer transforms a list of queries to another
- Reasoner implements the execution of SPARQL queries
- Selecter filters/expands the scope of the queried RDF data
The concept seems fine until we take a closer look at the current Java API implementation:
- Decider - input: SPARQL query; output: SPARQL result - i.e., every interaction with the LarKC server must be done with a SPARQL query (?!)
- Identifier - input: any sort of query; output: collection of serializable data objects - you can pass nearly everything as input/output
- InformationSetTransformer - input: serializable data; output: serializable data - once again you can pass nearly everything as input/output
- QueryTransformer - input: a query; output: collection of queries
- Reasoner - input: a SPARQL query & RDF data; output: SPARQL result
- Selecter - input: RDF data; output: RDF data
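To make the two groups of signatures easier to compare, here is a minimal sketch of how the interfaces listed above might look. All type and method names (SparqlQuery, SparqlResult, RdfData, decide, reason, and so on) are my own placeholders and are not copied from the LarKC code base; only the input/output shapes follow the list.

```java
import java.io.Serializable;
import java.util.Collection;

// Hypothetical stand-ins for the platform types; names are illustrative only.
interface SparqlQuery extends Serializable {}
interface SparqlResult extends Serializable {}
interface RdfData extends Serializable {}

// Strongly typed group: RDF and SPARQL are visible in the signatures.
interface Decider { SparqlResult decide(SparqlQuery query); }
interface Reasoner { SparqlResult reason(SparqlQuery query, RdfData data); }
interface Selecter { RdfData select(RdfData data); }

// Weakly typed group: the compiler only knows that values are Serializable.
interface Identifier { Collection<Serializable> identify(Serializable query); }
interface InformationSetTransformer { Serializable transform(Serializable data); }
interface QueryTransformer { Collection<Serializable> transform(Serializable query); }

// Compiles although it turns any value into an Integer: nothing in the
// signature says what kind of information goes in or comes out.
final class LengthTransformer implements InformationSetTransformer {
    @Override
    public Serializable transform(Serializable data) {
        return Integer.valueOf(String.valueOf(data).length());
    }
}
```

The LengthTransformer class is only there to show that a weakly typed plug-in accepts and returns nearly anything without the compiler objecting.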
The first obvious issue is the separation between:
- Plug-ins that use RDF and SPARQL queries: Decider, Reasoner, Selecter
- Plug-ins that use just serializable data: Identifier, InformationSetTransformer, QueryTransformer (note that these plug-ins could also pass RDF data)
There is a clear conflict between the idea of strong types (everything fits and clicks at compile time) and weak types (no need to cover all the borderline scenarios that will never happen). Also, in every realistic workflow the Decider has to take the role of the component that blindly down-casts (casts data to RDF and queries to SPARQL) and parallelizes the execution flow (forks collections and merges them into a single input value). Unfortunately, this always leads to tight coupling between the Decider, the plug-ins, and their order.
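The down-cast and merge duties described above can be sketched as follows. This is only an illustration under my own assumptions: RdfGraph and CouplingSketch are hypothetical names, and triples are reduced to plain strings to keep the example self-contained.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical RDF container; triples are plain strings for brevity.
final class RdfGraph implements Serializable {
    final List<String> triples;

    RdfGraph(List<String> triples) {
        this.triples = new ArrayList<>(triples);
    }
}

// Hypothetical helper showing the glue code a Decider ends up owning.
final class CouplingSketch {

    // Down-cast: the Decider must simply know that the previous plug-in
    // produced RDF, because the signature only promises Serializable.
    static RdfGraph castToRdf(Serializable item) {
        if (!(item instanceof RdfGraph)) {
            throw new IllegalStateException(
                    "Expected RDF but got " + item.getClass().getName());
        }
        return (RdfGraph) item;
    }

    // Merge: collapse a collection of results into the single input value
    // that the next plug-in expects.
    static RdfGraph merge(Collection<? extends Serializable> items) {
        List<String> all = new ArrayList<>();
        for (Serializable item : items) {
            all.addAll(castToRdf(item).triples);
        }
        return new RdfGraph(all);
    }
}
```

A wrong plug-in order only shows up at run time as an IllegalStateException, which is exactly the coupling problem: the Decider has to encode which plug-in produces what.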
LarKC plug-ins: do we need so many (if weak types are used)?
This section describes the overlaps between the different plug-ins as Java interfaces and the consequences of strongly versus weakly typed interfaces. It seems to me that in the LarKC platform we have already given up the idea of workflows that communicate through strongly typed method signatures (and catch all the problems at compile time). I agree that weak typing is a far easier and more flexible (indeed!) approach to realizing a platform applicable to a broad range of use cases. (Note: I am a big fan of strong types if you can afford them!)
Using the current plug-in interfaces and the idea of a tightly coupled Decider, I can reprogram any workflow to use only one plug-in Java type: InformationSetTransformer (the input is serializable data and the output is serializable data, i.e., more or less everything). Here is a simple example:
- Create a new type PluginMessage that extends InformationSet with the fields sourcePlugin and data
- The Decider reads the sourcePlugin field, tries to perform the cast, and checks the type of the destination plug-in (whether it takes a single object or a list of objects)
The described pairing of InformationSetTransformer plug-ins with such a Decider can cover every scenario that is realizable with the current implementation.
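A minimal sketch of this single-type workflow is shown below. Everything except the names PluginMessage, InformationSet, InformationSetTransformer, sourcePlugin and data is an assumption of mine: InformationSet is modelled as a marker interface, and IdentifierLike, UpperCaser and SingleTypeDecider are hypothetical classes invented for the illustration.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

// Hypothetical marker type standing in for the platform's InformationSet.
interface InformationSet extends Serializable {}

interface InformationSetTransformer {
    InformationSet transform(InformationSet input);
}

final class PluginMessage implements InformationSet {
    final String sourcePlugin;  // name of the plug-in that produced the message
    final Serializable data;    // arbitrary payload

    PluginMessage(String sourcePlugin, Serializable data) {
        this.sourcePlugin = sourcePlugin;
        this.data = data;
    }
}

// Emulates an Identifier: turns a query string into a list of hits.
final class IdentifierLike implements InformationSetTransformer {
    @Override
    public InformationSet transform(InformationSet input) {
        String query = (String) ((PluginMessage) input).data;
        ArrayList<String> hits =
                new ArrayList<>(Arrays.asList(query + "#1", query + "#2"));
        return new PluginMessage("identifier", hits);
    }
}

// Emulates a later step that accepts one item at a time.
final class UpperCaser implements InformationSetTransformer {
    @Override
    public InformationSet transform(InformationSet input) {
        String item = (String) ((PluginMessage) input).data;
        return new PluginMessage("upperCaser", item.toUpperCase(Locale.ROOT));
    }
}

final class SingleTypeDecider {
    static List<Serializable> run(String query) {
        InformationSetTransformer first = new IdentifierLike();
        InformationSetTransformer second = new UpperCaser();
        PluginMessage found =
                (PluginMessage) first.transform(new PluginMessage("decider", query));
        List<Serializable> results = new ArrayList<>();
        // The Decider knows that "identifier" emits a list while the next
        // plug-in takes a single object, so it forks over the items.
        if ("identifier".equals(found.sourcePlugin) && found.data instanceof Collection) {
            for (Object item : (Collection<?>) found.data) {
                PluginMessage out = (PluginMessage) second.transform(
                        new PluginMessage(found.sourcePlugin, (Serializable) item));
                results.add(out.data);
            }
        }
        return results;
    }
}
```

Every cast in the sketch is unchecked by the compiler, so the workflow works only because the Decider hard-codes what each plug-in produces and consumes.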
Is InformationSetTransformer the right plug-in then?
The short answer is a firm no! This section explains the shortcomings of the approach described in the previous section.
Of course we can use it, but in the modern world, where RDF is a de facto standard and the linked data concept is gaining momentum, it does not sound very serious to speak about exchanging serializable Java types. We have specifications like JSR 311 (JAX-RS, the Java API for RESTful Web Services), and Java objects can be serialized to RDF. My idea is to have one single plug-in Java type that takes:
- Input: RDF data + a minimal schema-level agreement that models the plug-in contracts (note: it can evolve for different setups)
- Output: RDF data + a minimal schema-level agreement that models the plug-in contracts
Please note that the schema may repeat the method signatures of the current API. It can also extend them by adding predicates that state whether the information is sent by reference or by value (i.e., whether a GET operation is needed to load the data or you receive the actual data).
I believe that these changes will make the LarKC platform much more open to its developers and allow a smoother implementation of workflows like:
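As a rough illustration of such a contract, the sketch below models RDF as a plain list of triples and uses a made-up vocabulary under http://example.org/plugin-contract#. The names Triple, RdfPlugin, Contract, TripleCounter and all predicate URIs are hypothetical; a real setup would use an RDF library and an agreed schema.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal triple holder; a real implementation would use an RDF library.
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// The single plug-in type: RDF in, RDF out.
interface RdfPlugin {
    List<Triple> invoke(List<Triple> input);
}

// Hypothetical schema-level agreement between plug-ins.
final class Contract {
    static final String NS = "http://example.org/plugin-contract#";
    static final String PASSED_BY = NS + "passedBy";
    static final String BY_VALUE = NS + "byValue";
    static final String BY_REFERENCE = NS + "byReference";
    static final String PAYLOAD = NS + "payload";
    static final String LOCATION = NS + "location";

    // Returns the first object for (subject, predicate), or null if absent.
    static String objectOf(List<Triple> graph, String subject, String predicate) {
        for (Triple t : graph) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                return t.object;
            }
        }
        return null;
    }

    // Describes how the receiver obtains the data: a GET on the location
    // when passed by reference, otherwise the inline payload itself.
    static String resolve(List<Triple> graph, String message) {
        if (BY_REFERENCE.equals(objectOf(graph, message, PASSED_BY))) {
            return "GET " + objectOf(graph, message, LOCATION);
        }
        return objectOf(graph, message, PAYLOAD);
    }
}

// Example plug-in: reports the size of its input graph, passed by value.
final class TripleCounter implements RdfPlugin {
    @Override
    public List<Triple> invoke(List<Triple> input) {
        List<Triple> out = new ArrayList<>();
        out.add(new Triple("urn:example:result", Contract.PASSED_BY, Contract.BY_VALUE));
        out.add(new Triple("urn:example:result", Contract.PAYLOAD,
                String.valueOf(input.size())));
        return out;
    }
}
```

The sketch does not perform the GET itself; it only shows that the by-reference or by-value decision can live in the data rather than in the Java signature.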
- Transform OBO files to RDF
- Validate the RDF file (we want to check that the generated file is correct both syntactically and semantically)
- Import statements to RDF repository
- Generate mappings between the plug-ins
- Create semantic annotations between the data
- etc.
Implementing such a workflow does not seem trivial unless the plug-in concepts are totally misused (e.g., by using only Selecter or InformationSetTransformer).
