Deploying Plugins Using JEE Technology

1. Purpose

This page is intended to record the results of attempting to deploy LarKC plug-ins using JEE technology and to serve as a focus for discussion of the issues raised. The basic approach is as follows:

  1. Create servlet classes (containers) for each of the plug-in types (query transformer, identifier, data transformer, selecter, reasoner)
  2. Create a converter utility that takes the existing <plugin_name>.larkc files and converts them to Web Application Archive (.war) files that can be deployed to an application server, e.g. Apache Tomcat.
  3. Create client-side plug-in proxy classes that communicate with the plug-in containers to instantiate, invoke and destroy plug-in instances.
  4. For testing purposes, extend the simple workflow demo to use plug-ins deployed in this way.
  5. For further experimentation, rework the simple workflow demo as a standalone Web application (servlet).

2. Motivation

In order for LarKC to achieve its goals for reasoning at Web scale, some form of parallel and/or distributed execution is necessary. While parallel execution is being explored in a cluster environment, this alternative approach achieves parallelism by distributing LarKC components (plug-ins) using industry standard Web technology. The perceived advantages are as follows:

3. Implementation

All the source code for using JEE servlets with LarKC components is located in the JEE extensions project in the LarKC Subversion repository:

https://larkc.svn.sourceforge.net/svnroot/larkc/trunk/jee_extensions

The project contains an ant build script that will build the following components:

  1. The servlet classes that wrap each type of plug-in, e.g. SelectPluginServlet. These classes are all packaged into larkc_jee_server.jar.
  2. Proxy plug-in classes that implement the plug-in interfaces and redirect method invocations to their remote counterparts, i.e. the servlet wrappers above. These are drop-in replacements for all the different plug-in types and are packaged into larkc_jee_client.jar.
  3. A converter utility that converts a .larkc file to a .war file ready for deployment. The converter needs to know the input file (.larkc), the output file (.war), the plug-in class and the plug-in type. It then repackages the .larkc file, adds some platform and data layer jar files, creates a web.xml and also adds an experimental property editor servlet. The converter utility is packaged into a standalone executable jar file called larkc_jee_converter.jar.
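To make the converter's web.xml step more concrete, here is a minimal sketch of the kind of deployment descriptor it might generate for a single plug-in. It assumes one servlet per .war, mapped to a URL pattern named after the plug-in type; this is consistent with the proxy URLs used later on this page (e.g. .../SindiceTriplePatternIdentifier/identifier, where Tomcat derives the context path from the .war file name). The servlet class name IdentifyPluginServlet, its package and the 'pluginClass' init parameter are hypothetical and are not taken from the converter's source.

```java
/** Hypothetical sketch of a web.xml a converter could emit for one wrapped plug-in. */
public class WebXmlSketch {

    /** Builds a minimal Servlet 2.5 deployment descriptor mapping one servlet to "/<pluginType>". */
    public static String webXml(String servletClass, String pluginClass, String pluginType) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<web-app xmlns=\"http://java.sun.com/xml/ns/javaee\" version=\"2.5\">\n");
        sb.append("  <servlet>\n");
        sb.append("    <servlet-name>").append(pluginType).append("</servlet-name>\n");
        sb.append("    <servlet-class>").append(servletClass).append("</servlet-class>\n");
        // Assumption: the wrapper servlet learns which plug-in to load from an init parameter
        sb.append("    <init-param>\n");
        sb.append("      <param-name>pluginClass</param-name>\n");
        sb.append("      <param-value>").append(pluginClass).append("</param-value>\n");
        sb.append("    </init-param>\n");
        sb.append("  </servlet>\n");
        sb.append("  <servlet-mapping>\n");
        sb.append("    <servlet-name>").append(pluginType).append("</servlet-name>\n");
        sb.append("    <url-pattern>/").append(pluginType).append("</url-pattern>\n");
        sb.append("  </servlet-mapping>\n");
        sb.append("</web-app>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical servlet class name, by analogy with SelectPluginServlet
        System.out.print(webXml("eu.larkc.jee.server.IdentifyPluginServlet",
                "eu.larkc.plugin.identify.sindice.SindiceTriplePatternIdentifier",
                "identifier"));
    }
}
```

With the .war named SindiceTriplePatternIdentifier.war, this mapping would make the servlet reachable at http://localhost:8080/SindiceTriplePatternIdentifier/identifier on a default Tomcat installation.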

3.1. Protocol

Invocation of the plug-in's methods is facilitated using the HTTP protocol, with Java objects serialised into HTTP message bodies. At the moment, standard Java object streams are used, but for compatibility and flexibility it would be possible to change to an XML serialisation protocol. Client-side plug-in proxy classes can be used to interact with the deployed plug-ins; to the caller, these proxies are indistinguishable from 'local' plug-in classes.
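A minimal sketch of the client side of this protocol, assuming a servlet that reads one serialised request object from the POST body and writes one serialised result back. ObjectOverHttp and its methods are hypothetical illustrations, not the code in larkc_jee_client.jar; only standard JDK classes (ObjectOutputStream, ObjectInputStream, HttpURLConnection) are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical client-side helper: Java object streams carried in HTTP POST bodies. */
public class ObjectOverHttp {

    /** Serialises one object with a standard Java object stream. */
    public static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Reverses toBytes(). */
    public static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    /** POSTs a serialised request to a plug-in servlet and deserialises its reply. */
    public static Object invoke(String servletUrl, Serializable request)
            throws IOException, ClassNotFoundException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(servletUrl).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-java-serialized-object");
            try (OutputStream body = conn.getOutputStream()) {
                body.write(toBytes(request));
            }
            // getInputStream() throws an IOException for HTTP error responses (4xx/5xx)
            try (InputStream reply = conn.getInputStream()) {
                return fromBytes(reply.readAllBytes());
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A proxy such as IdentifierJeeProxy would presumably wrap each interface method call in a request object and pass it to something like invoke(). Because all encoding is confined to toBytes() and fromBytes(), switching to an XML serialisation protocol would only touch those two methods in a design like this.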

The current platform uses the plug-in registry/factory to instantiate plug-ins. With the JEE extensions, the platform will be able to instantiate plug-in proxy classes passing only the URL of the remotely deployed 'real' plug-in. Further work will be required to extend the plug-in registry with a database of URLs of known plug-in deployments.

In the meantime, existing demo code that instantiates plug-ins directly has been extended to cover remotely deployed plug-ins; see https://bazbishop237@larkc.svn.sourceforge.net/svnroot/larkc/trunk/demos/simple_pipeline.

The important interactions between the client plug-in proxy and the remotely deployed plug-in are shown in the following diagram.

[Figure: servlet_plugin_protocol.PNG, interactions between client plug-in proxy and remotely deployed plug-in]

4. Issues discovered

A number of issues have been uncovered when following this strategy, ranging from the slightly inconvenient to serious problems. Issues are grouped and discussed below.

4.1. API/platform

4.1.1. Plugin instances should have a unique identity

Plugin instances should have a unique identity, i.e. the platform should give them an ID when they are instantiated.

Motivation: In an environment where several plug-ins of the same type are running, it would be very useful for plug-ins to be able to use their unique ID to maintain separation. For example, if two plug-ins are reading/writing triples to the same data store, they can use this ID to label their triples and avoid interfering with each other.

I suggest adding the following method to the 'Plugin' Java interface:

    interface Plugin {
        // ...
        // Assigned by the platform when the plug-in instance is created
        void setId( URI instanceId );
        // ...
    }
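A minimal sketch of how the platform might assign such identities, assuming that a random UUID expressed as a urn:uuid: URI is unique enough. java.net.URI stands in for whichever URI type the real interface would use, and the getId() accessor and ExamplePlugin class are hypothetical additions for illustration only.

```java
import java.net.URI;
import java.util.UUID;

/** Hypothetical sketch: the platform gives each new plug-in instance a unique URI. */
public class PluginIdSketch {

    /** Cut-down version of the proposed Plugin interface. */
    interface Plugin {
        void setId(URI instanceId);

        URI getId(); // assumption: accessor added for illustration only
    }

    static class ExamplePlugin implements Plugin {
        private URI id;

        public void setId(URI instanceId) { this.id = instanceId; }

        public URI getId() { return id; }
    }

    /** Creates a URN from a random (version 4) UUID, e.g. urn:uuid:3f2c... */
    static URI newInstanceId() {
        return URI.create("urn:uuid:" + UUID.randomUUID());
    }

    public static void main(String[] args) {
        Plugin a = new ExamplePlugin();
        Plugin b = new ExamplePlugin();
        // What the platform would do right after instantiating each plug-in
        a.setId(newInstanceId());
        b.setId(newInstanceId());
        // Two instances of the same type now have distinct identities, which they
        // could use, e.g., to label the triples they write to a shared store.
        System.out.println(a.getId() + " != " + b.getId());
    }
}
```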

4.1.2. Plugins should be factories for their own context

The original motivation for the Context interface was as follows:

  1. It was agreed that plug-ins should be stateless.
  2. Plug-ins that need to maintain state between invocations can record their state in an instance of an object that implements the 'Context' interface.
  3. The platform will pass the same context object to a plug-in every time it is invoked as part of the same workflow.
  4. The plug-in can modify the context object as appropriate.
  5. The platform never needs to know anything about the concrete type of the context object (it is dependent on the plug-in type).

Consequently, a context parameter has been added to the invocation methods for each of the plug-in types. However, for the platform to remain completely unaware of the concrete type of the context object, someone else has to instantiate it and this should be the plug-in itself, i.e. the entity that will record its own state in it.

I suggest adding the following method to the 'Plugin' Java interface:

    interface Plugin {
        // ...
        // Returns a new, plug-in specific state object (or null if stateless)
        Context createContext();
        // ...
    }

UPDATE! This has been implemented across the entire code-base. Plug-ins which do not store their state are not required to return an object, i.e. plug-ins are allowed to return 'null'.
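A sketch of the createContext() pattern under stated assumptions: Context is modelled here as a Serializable marker interface (so that it could travel to a remote plug-in), and CountingSelecter, StatelessReasoner and the select() method are hypothetical. It shows that only the plug-in knows the concrete context type; the platform just keeps the opaque object and hands it back on each invocation.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: plug-ins act as factories for their own Context objects. */
public class ContextSketch {

    /** Opaque to the platform; assumed Serializable so it can travel to a remote plug-in. */
    interface Context extends Serializable {}

    interface Plugin {
        /** May return null for plug-ins that keep no state. */
        Context createContext();
    }

    /** A stateful plug-in that remembers the queries it has been given. */
    static class CountingSelecter implements Plugin {

        /** Concrete context type, known only to this plug-in. */
        static class SeenQueries implements Context {
            private static final long serialVersionUID = 1L;
            final List<String> queries = new ArrayList<String>();
        }

        public Context createContext() {
            return new SeenQueries();
        }

        /** Invocation method: casts the opaque context back to its own type. */
        int select(String query, Context context) {
            SeenQueries seen = (SeenQueries) context;
            seen.queries.add(query);
            return seen.queries.size();
        }
    }

    /** A stateless plug-in simply returns null. */
    static class StatelessReasoner implements Plugin {
        public Context createContext() {
            return null;
        }
    }

    public static void main(String[] args) {
        CountingSelecter selecter = new CountingSelecter();
        // The platform asks for a context once per workflow and passes it back on every call.
        Context ctx = selecter.createContext();
        selecter.select("q1", ctx);
        System.out.println(selecter.select("q2", ctx)); // prints 2
    }
}
```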

4.1.3. Plugin managers should take care to create serialisable 'Contract' objects

In order for Contract objects to be passed to plug-ins during invocation, they must be serialisable. Even though Contract objects are not being used at present, it became clear that the existing practice of implementing them as anonymous inner classes is not appropriate. Anonymous inner classes are by definition not 'static' classes: an instance created inside an instance method holds a reference to its enclosing object, so the enclosing types would need to be serialisable as well, which for the plug-in managers includes the inner thread classes and the plug-in managers themselves.

The solution adopted has been to create a separate contract class that implements the Contract interface: eu.larkc.core.pluginManager.local.SimpleContract. This class is also derived from the Java Properties class and would be suitable for passing key-value pairs to plug-ins.
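The following self-contained sketch reproduces the problem with stand-in types; the real Contract interface and plug-in managers are not used, and Contract is assumed to extend Serializable. The anonymous contract reads a field of its enclosing manager, so it carries a hidden reference to that non-serialisable object and serialisation fails, whereas a static, Properties-based class in the style of SimpleContract serialises without trouble.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: why anonymous Contract implementations break serialisation. */
public class ContractSketch {

    /** Stand-in for the LarKC Contract interface (assumed Serializable). */
    interface Contract extends Serializable {}

    /** Stand-in for SimpleContract: a static class backed by Properties. */
    static class SimpleContract extends Properties implements Contract {
        private static final long serialVersionUID = 1L;
    }

    /** Stand-in for a plug-in manager; note that it is NOT Serializable. */
    static class PluginManager {
        private String name = "local plug-in manager";

        Contract anonymousContract() {
            return new Contract() {
                @Override
                public String toString() {
                    // Reads PluginManager.this.name, so this object holds a hidden
                    // reference to the (non-serialisable) enclosing manager.
                    return "contract issued by " + name;
                }
            };
        }
    }

    /** Returns false if any object reachable from o does not implement Serializable. */
    static boolean serialises(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SimpleContract contract = new SimpleContract();
        contract.setProperty("maxResults", "100"); // hypothetical key-value pair
        System.out.println(serialises(new PluginManager().anonymousContract())); // false
        System.out.println(serialises(contract));                                // true
    }
}
```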

4.2. Data layer

4.2.1. Serialisation

A few more unserialisable classes were discovered, but most were easily fixed. However, eu.larkc.core.data.DataSetImpl uses org.openrdf.query.impl.DatasetImpl, and this class is not serialisable and not 'owned' by us either. The net effect is that it is not possible to pass instances of DataSet to/from remote processes (at least not using serialisation).

This is quite serious, so are there any suggestions for fixing this?

Should we avoid using this class altogether?

Should we write our own implementation?

Can we modify the sesame code-base?
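One possible direction for the 'write our own implementation' option, sketched under the assumption that a data set only needs to carry the URIs of its default and named graphs: keep those in serialisable JDK types and rebuild a Sesame DatasetImpl locally wherever a query is evaluated. SerializableDataSet is a hypothetical name, and the conversion to and from Sesame is deliberately left out so the sketch has no external dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical data set description built only from serialisable JDK types. */
public class SerializableDataSet implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Set<URI> defaultGraphs = new LinkedHashSet<URI>();
    private final Set<URI> namedGraphs = new LinkedHashSet<URI>();

    public void addDefaultGraph(URI graph) { defaultGraphs.add(graph); }

    public void addNamedGraph(URI graph) { namedGraphs.add(graph); }

    public Set<URI> getDefaultGraphs() { return Collections.unmodifiableSet(defaultGraphs); }

    public Set<URI> getNamedGraphs() { return Collections.unmodifiableSet(namedGraphs); }

    // A Sesame DatasetImpl could be rebuilt from these two sets on either side of
    // the wire just before query evaluation (not shown).

    /** Round-trips the object through Java object streams, as the servlet protocol would. */
    static SerializableDataSet roundTrip(SerializableDataSet ds) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(ds);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableDataSet) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```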

4.2.2. Multiple instances of a DataFactory

The LarKC DataFactory class is a singleton that hides an instance of an ORDI data model. When all plug-ins and the platform run in the same Java virtual machine (JVM), all components use and share the same underlying ORDI instance.

However, when plug-ins execute remotely, they use their own instance of the ORDI data model, namely the one behind the DataFactory singleton in their own JVM. This is a different instance, and strange effects start to become apparent. For example, statements retrieved from RdfStoreConnection.search() are not available after the connection is closed, or in a different JVM after serialising/deserialising. The reason is that they use a lazy instantiation optimisation, which avoids creating objects that may never be used. However, this optimisation means that a statement object serialised to another JVM will never be able to retrieve its state, because it has no access to the ORDI data model it came from.

Is there an easy way to force statement contents in to memory?

How difficult/intrusive is it to share a single ORDI data model between JVMs/networked machines?
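One way to 'force statement contents into memory' would be to copy each statement into an eager, self-contained value object while its originating connection is still open, and serialise that copy instead. The sketch below uses hypothetical stand-ins (LazyStatement, EagerStatement, and a Map playing the role of the data model); it does not use ORDI's actual statement classes.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-ins illustrating an eager copy of lazily backed statements. */
public class EagerStatementSketch {

    /** Like a lazy statement: holds only an ID and looks up its parts on demand. */
    static class LazyStatement {
        private final long id;
        private final Map<Long, String[]> store; // stands in for the local data model

        LazyStatement(long id, Map<Long, String[]> store) {
            this.id = id;
            this.store = store;
        }

        String subject()   { return field(0); }
        String predicate() { return field(1); }
        String object()    { return field(2); }

        private String field(int i) {
            String[] triple = store.get(id);
            if (triple == null) {
                throw new IllegalStateException("statement " + id + " is no longer reachable");
            }
            return triple[i];
        }
    }

    /** Eager, self-contained copy that is safe to serialise to another JVM. */
    static final class EagerStatement implements Serializable {
        private static final long serialVersionUID = 1L;
        final String subject;
        final String predicate;
        final String object;

        EagerStatement(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Must be called while the originating store/connection is still available. */
    static EagerStatement materialise(LazyStatement st) {
        return new EagerStatement(st.subject(), st.predicate(), st.object());
    }

    public static void main(String[] args) {
        Map<Long, String[]> store = new HashMap<Long, String[]>();
        store.put(1L, new String[] {"ex:s", "ex:p", "ex:o"});
        LazyStatement lazy = new LazyStatement(1L, store);
        EagerStatement eager = materialise(lazy); // copy while the store is still open
        store.clear();                            // simulate closing the connection
        System.out.println(eager.object);         // prints ex:o
        // lazy.object() would now throw IllegalStateException
    }
}
```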

4.2.3. DataFactory singleton

DataFactory is an interface that collects together methods for obtaining access to the underlying storage component (ORDI) and a few other utilities. It has a single implementation class, DataFactoryImpl, which exposes the static singleton method getInstance().

In addition, DataFactory declares a static data member called 'INSTANCE', which is statically initialised like this:

    public interface DataFactory {
        /**
         * To consider the implementation of DefaultFactory, which to automated the
         * creation of the different factories.
         */
        public final static DataFactory INSTANCE = DataFactoryImpl.getInstance();
        //...
    }

The current implementation makes it hard to pass configuration information to the DataFactory (and the underlying ORDI data model) implementation. A specific requirement arising from wrapping plug-ins in servlets is that the location of the ORDI storage files must be configurable, in order to prevent two instances running in separate JVMs from attempting to write to the same storage files.

My suggestion is to remove the DataFactory.INSTANCE data member and instead use DataFactoryImpl.getInstance() directly. This would then allow another access method to be added, e.g. DataFactoryImpl.getInstance( Map<Object,Object> ), so that the first caller in a JVM can pass ORDI-specific configuration parameters, e.g. the location of the storage files.

Vassil is considering this problem and will suggest a solution in the near future.
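A minimal sketch of the suggested getInstance( Map<Object,Object> ) access method, assuming 'first caller wins' semantics: the first call in a JVM fixes the configuration, and every later call, including the no-argument variant, returns the same instance. DataFactoryImplSketch and getSetting() are hypothetical names, and the ORDI initialisation is only indicated by a comment.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a configurable singleton in the style proposed for DataFactoryImpl. */
public final class DataFactoryImplSketch {

    private static DataFactoryImplSketch instance;

    private final Map<Object, Object> config;

    private DataFactoryImplSketch(Map<Object, Object> config) {
        this.config = Collections.unmodifiableMap(new HashMap<Object, Object>(config));
        // A real implementation would initialise the ORDI data model here,
        // e.g. using a storage directory taken from the configuration.
    }

    /** The first call in the JVM creates the singleton with the given configuration. */
    public static synchronized DataFactoryImplSketch getInstance(Map<Object, Object> config) {
        if (instance == null) {
            instance = new DataFactoryImplSketch(config);
        }
        return instance; // later configurations are ignored
    }

    /** Existing-style accessor: falls back to an empty (default) configuration. */
    public static synchronized DataFactoryImplSketch getInstance() {
        return getInstance(Collections.<Object, Object>emptyMap());
    }

    public Object getSetting(Object key) {
        return config.get(key);
    }
}
```

A servlet wrapper could call getInstance(...) from its init() method with a per-deployment storage directory before any plug-in code touches the data layer. This only works if nothing triggers the no-argument variant first, which is exactly why the statically initialised DataFactory.INSTANCE member gets in the way.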

5. Wrapping LarKC plug-ins inside a tomcat servlet container

This section guides you through the process of wrapping a LarKC plug-in inside a .war file, which can then be deployed on a servlet container. In the following example the servlet container used is Apache Tomcat.

5.1. Preliminary steps

Before any plug-in can be wrapped for a Tomcat servlet container, make sure that both the platform and the plug-in to be wrapped can be built.

1) Run the ant build script inside the LarKC root directory (see note 1).

cd LarKC/
ant

This will build the LarKC platform, the plug-ins and the simple_pipeline demo. Make sure your plug-in is located inside one of the directories of the plugins folder and contains a valid build.xml file, so that it will be built automatically.

2) Copy larkc-platform.jar to the lib folders of jee_extensions.

cp platform/dist/bin/larkc-platform.jar jee_extensions/lib/ 
cp platform/dist/bin/larkc-platform.jar jee_extensions/webapp_lib/

Building the platform generates a dist folder (platform/dist/), and larkc-platform.jar is located in platform/dist/bin/. Copy this jar to both the lib/ and the webapp_lib/ directories of the JEE extensions.

3) Build the JEE extensions

cd jee_extensions
ant

5.2. Wrap .larkc plug-ins inside a .war file

If the preliminary steps were completed successfully one or more LarKC plug-ins can be wrapped inside a .war file.

4) Convert .larkc plug-ins to .war files using the larkc_jee_converter of the JEE extensions

Usage:

java -jar larkc_jee_converter.jar \
    {plugin.larkc} \
    {plugin.war} \
    {full.java.class.name} \
    {query_transformer|identifier|data_transformer|selecter|reasoner}

Example: The following command is run inside the LarKC root directory and converts the SindiceTriplePatternIdentifier plug-in.

java -jar jee_extensions/build/jar/larkc_jee_converter.jar \
    platform/dist/plugins/SindiceTriplePatternIdentifier.larkc \
    SindiceTriplePatternIdentifier.war \
    eu.larkc.plugin.identify.sindice.SindiceTriplePatternIdentifier \
    identifier

5.3. Deploy the .war file

5) Copy generated .war file to the tomcat webapps folder

cp SindiceTriplePatternIdentifier.war /var/lib/tomcat6/webapps/

Note: /var/lib/tomcat6/webapps/ is the default webapps path of tomcat6 on Ubuntu 9.10 as well as Ubuntu 10.04. Please change the path accordingly if your tomcat server is installed in a different directory.

The .war file will be deployed automatically if your tomcat server allows autodeploy. If this is not the case please deploy the .war file manually.

5.4. Invoke remote plug-ins using JEE proxies

Plug-ins wrapped inside a Tomcat servlet container are not instantiated in the same way as conventional LarKC plug-ins. To use the plug-ins deployed in step 5), you have to use JEE proxy classes. There is a JEE proxy for every plug-in type, and all of them are contained in larkc_jee_client.jar.

The following example shows how to create a simple workflow using a local instance of SimpleAnytimeDecider, which constructs a QueryTransformer → Identifier → Selecter → Reasoner pipeline whose plug-ins are executed on a Tomcat server.

    // Each proxy is given the URL of the servlet wrapping the real plug-in
    SimpleAnytimeDecider decider = new SimpleAnytimeDecider(
        new QueryTransformerJeeProxy(
            "http://localhost:8080/SPARQLToTriplePatternQueryTransformer/query_transformer" ),
        new IdentifierJeeProxy(
            "http://localhost:8080/SindiceTriplePatternIdentifier/identifier" ),
        new SelecterJeeProxy(
            "http://localhost:8080/GrowingDataSetSelecter/selecter" ),
        new ReasonerJeeProxy(
            "http://localhost:8080/SparqlQueryEvaluationReasoner/reasoner" ) );

Note: The four remote plug-ins are deployed on a local tomcat6 server running on port 8080.

Please keep in mind that you have to adjust the security policies of your Tomcat installation in order to enable communication between the plug-ins. More information can be found in the Security Manager HOW-TO of the Tomcat documentation.
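Because the proxy URLs in the example follow a regular pattern (host, port, the .war file name as the context path, then the plug-in type), a small helper can build them and check that something answers before the pipeline is assembled. PluginUrls is a hypothetical convenience class, not part of larkc_jee_client.jar. isDeployed() only distinguishes 'HTTP 404 or unreachable' from 'some servlet answered', since a plug-in servlet may legitimately reject HEAD requests (e.g. with 405).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper for building and probing JEE plug-in URLs. */
public class PluginUrls {

    /** e.g. http://localhost:8080/SindiceTriplePatternIdentifier/identifier */
    public static String pluginUrl(String host, int port, String warName, String pluginType) {
        return "http://" + host + ":" + port + "/" + warName + "/" + pluginType;
    }

    /** True if some servlet answered at the URL; false on HTTP 404 or if the server is unreachable. */
    public static boolean isDeployed(String url) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            // A 405 (e.g. a servlet that only implements doPost) still means the plug-in is deployed.
            return conn.getResponseCode() != HttpURLConnection.HTTP_NOT_FOUND;
        } catch (IOException e) {
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

For example, new IdentifierJeeProxy( PluginUrls.pluginUrl( "localhost", 8080, "SindiceTriplePatternIdentifier", "identifier" ) ) would produce the same proxy as in the workflow example.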

  1. The LarKC root directory is the directory that contains the platform, plugins and jee_extensions directories, among others (e.g. demos, development_kit).

Last edited 2010-06-24 12:24:30 by Christoph.