Early Adopters Workshop, Hands-on session - Rule-based Reasoner Scenario

1. Description of the plug-in

A rule engine can be used to compute inferences for the RDF, RDFS, eRDFS and similar entailment regimes; see http://www.w3.org/TR/rdf-mt/#rules.

A prototype implementation of such a component based on the IRIS reasoner has been created in the larkc_plugins project in package src/eu.larkc.plugin.reason.rulebased. This reasoner treats input triples as belonging to an arity 3 relation and then applies the specified entailment rules to materialise the inferred triples.

Entailment rules differ widely in computational complexity, and industrial implementations often omit the computationally expensive ones.

2. Purpose

The purpose of this scenario is to:

3. Step-by-Step

3.1. Setting up LarKC with a rule-based reasoner

This tutorial requires checking out (i) the LarKC platform itself (https://svn.gforge.hlrs.de/svn/larkc/trunk/platform/) and (ii) the wrapped IRIS reasoner plugin (https://svn.gforge.hlrs.de/svn/larkc/trunk/plugins/).

The second step is to build the platform by invoking the ant build script (see <futurelink>) and then to build the plugins as well. A correct setup can be validated by running the test cases in eu.larkc.plugin.reason.rulebased.IrisRuleBasedReasonerTest.

Example data sets can be found in the plugin/reason/rulebased/files/example subfolder. They include the wine ontology (http://www.w3.org/TR/owl-guide/wine.rdf) and RDF data about ESWC2008 (http://data.semanticweb.org/conference/eswc/2008/html), including information about authors, papers, ...

3.2. Execution

The rule-based IRIS reasoner plugin loads a set of RDF triples along with a set of predefined entailment rules. These rules, which can capture e.g. RDF, L2, or a potentially even larger subset of OWL (e.g. OWL 2 RL), are then used to infer new information. If no rule set is given, the L2 rule set is used by default.

A practical example of such an entailment rule is:

triple(?v, _iri('owl:equivalentClass'), ?w) :- triple(?v, _iri('rdfs:subClassOf'), ?w), triple(?w, _iri('rdfs:subClassOf'), ?v).
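To make the reading of this rule concrete, the following minimal Java sketch applies it once to a small in-memory set of triples. It is only an illustration and does not use the IRIS or LarKC APIs: the class EquivalentClassRuleSketch, its methods and the ex: class names are all hypothetical, and a triple is simply modelled as a three-element list.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of the LarKC plugin or of IRIS.
class EquivalentClassRuleSketch {

    // A triple is modelled as an immutable list: subject, predicate, object.
    static List<String> triple(String s, String p, String o) {
        return List.of(s, p, o);
    }

    // One application of the rule: whenever both (v subClassOf w) and
    // (w subClassOf v) are present, derive (v equivalentClass w).
    static Set<List<String>> applyRule(Set<List<String>> facts) {
        Set<List<String>> derived = new LinkedHashSet<>();
        for (List<String> t : facts) {
            if (!t.get(1).equals("rdfs:subClassOf")) {
                continue; // the rule body only matches subClassOf triples
            }
            List<String> reverse = triple(t.get(2), "rdfs:subClassOf", t.get(0));
            if (facts.contains(reverse)) {
                derived.add(triple(t.get(0), "owl:equivalentClass", t.get(2)));
            }
        }
        return derived;
    }

    public static void main(String[] args) {
        Set<List<String>> facts = new LinkedHashSet<>();
        // Made-up example data, not taken from the workshop data sets.
        facts.add(triple("ex:Car", "rdfs:subClassOf", "ex:Automobile"));
        facts.add(triple("ex:Automobile", "rdfs:subClassOf", "ex:Car"));
        facts.add(triple("ex:Car", "rdfs:subClassOf", "ex:Vehicle"));
        for (List<String> t : applyRule(facts)) {
            System.out.println(t);
        }
    }
}
```

With this made-up input the rule fires in both directions for ex:Car and ex:Automobile, while the one-directional ex:Vehicle triple derives nothing.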

Overall there are two main configuration possibilities for the rule-based reasoning plugin:

A test file to easily start experimenting with the plugin is available in eu.larkc.reason.rulebased.earlyAdopters.IrisRuleBasedReasonerWorkshopExample. In the main() method the reasoner is constructed, and the behavior of the reasoner can be adjusted by changing the constructor parameters as mentioned. For example,

reasoner = new IrisRuleBasedReasoner(IrisRuleBasedReasoner.L2);

starts the reasoner with the L2 entailment rules (defined in files/l2_entailment.rules).

reasoner = new IrisRuleBasedReasoner(null);

starts the plugin without performing any actual reasoning.

After that it is possible to define a SPARQL query. Some example queries are already given in the class, e.g.:

PREFIX wine: <http://wine-light.org/wine#>

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?x

WHERE {?x wine:hasFlavor wine:Moderate. ?x wine:locatedIn wine:CaliforniaRegion}

This query is then evaluated against a predefined ontology, e.g.

String theRDFFile = new String(RDF_TEST_WINE_LIGHT);

3.3. What is happening internally

In the following, the internals of a typical query evaluation based on the rule-based reasoner are explained, starting from the example IrisRuleBasedReasonerWorkshopExample.

  1. Initially the reasoner is set up and initialized with the set of entailment rules to be used (line 55). These rulesets are loaded from external text files as specified in the constants in the IrisRuleBasedReasoner class. Inside the constructor (line 171 in IrisRuleBasedReasoner) the variable sttemRDF records the number of triples loaded (it is an interesting value to watch!). After this, stepping down into the setEntailmentRules method shows how a specific set of entailment rules is parsed as a Datalog program by using an IRIS-specific parser const

  2. After this the reasoner plugin can accept SPARQL queries, according to the basic eu.larkc.plugin.reason.reasoner interface. This interface contains methods for the different types of SPARQL queries, e.g. SELECT, CONSTRUCT, ...

  3. Once the plugin is invoked with a concrete query the internal rule engine (IRIS) is started.
  4. Subsequently IRIS performs reasoning according to the chosen evaluation strategy. In the case of bottom-up evaluation this corresponds to inferring and materializing additional triples, based on the input dataset and the given entailment rules.
  5. Afterwards the SPARQL query itself is evaluated.
  6. Finally results are returned.

3.4. Effect of changing rules

The effect of different rule sets becomes very visible in certain scenarios. A good example is the simplified wine dataset described below, which clearly shows the benefit of reasoning. Executing it with only RDF entailment yields far fewer triples in total and also delivers far fewer answers to relevant queries.

In this example the advantage of employing reasoning becomes apparent in the following way: locatedIn is a transitive property which can be used to relate different regions. There are wines which are produced in a given region (say NapaRegion, located in California). Performing a simple SPARQL query which asks for all wines from a specified region (e.g. CaliforniaRegion) without entailment rules does not produce very comprehensive results. Enabling a suitable degree of reasoning in turn delivers a more complete result (try it with different rulesets!). The specific entailment rule which captures transitivity in this case is:

triple(?u, ?p, ?w) :- triple(?u, ?p, ?v), triple(?v, ?p, ?w), triple(?p,  "rdf:type", "owl:TransitiveProperty").
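The effect of this rule on the wine scenario can be sketched in a few lines of plain Java. The sketch below is a naive fixpoint loop over an in-memory set of triples and makes no claim about how IRIS evaluates the rule internally. The class TransitivitySketch, its methods and the resource wine:ExampleZinfandel are hypothetical names chosen for illustration; only locatedIn, NapaRegion and CaliforniaRegion come from the text above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of the LarKC plugin or of IRIS.
class TransitivitySketch {
    static final String TYPE = "rdf:type";
    static final String TRANSITIVE = "owl:TransitiveProperty";

    // A triple is modelled as an immutable list: subject, predicate, object.
    static List<String> triple(String s, String p, String o) {
        return List.of(s, p, o);
    }

    // Applies the transitivity rule repeatedly until no new triple appears.
    static Set<List<String>> materialise(Set<List<String>> input) {
        Set<List<String>> facts = new LinkedHashSet<>(input);
        boolean changed = true;
        while (changed) {
            List<List<String>> fresh = new ArrayList<>();
            for (List<String> a : facts) {
                // Only predicates declared transitive take part in the rule.
                if (!facts.contains(triple(a.get(1), TYPE, TRANSITIVE))) {
                    continue;
                }
                for (List<String> b : facts) {
                    // Match (u p v) with (v p w) and derive (u p w).
                    if (b.get(1).equals(a.get(1)) && b.get(0).equals(a.get(2))) {
                        fresh.add(triple(a.get(0), a.get(1), b.get(2)));
                    }
                }
            }
            changed = facts.addAll(fresh); // true only if something new was added
        }
        return facts;
    }

    // Stand-in for the SPARQL query: all x with (x wine:locatedIn region).
    static Set<String> locatedIn(Set<List<String>> facts, String region) {
        Set<String> result = new LinkedHashSet<>();
        for (List<String> t : facts) {
            if (t.get(1).equals("wine:locatedIn") && t.get(2).equals(region)) {
                result.add(t.get(0));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<List<String>> facts = new LinkedHashSet<>();
        facts.add(triple("wine:locatedIn", TYPE, TRANSITIVE));
        // wine:ExampleZinfandel is a made-up wine used only in this sketch.
        facts.add(triple("wine:ExampleZinfandel", "wine:locatedIn", "wine:NapaRegion"));
        facts.add(triple("wine:NapaRegion", "wine:locatedIn", "wine:CaliforniaRegion"));

        System.out.println("without rule: " + locatedIn(facts, "wine:CaliforniaRegion"));
        System.out.println("with rule:    " + locatedIn(materialise(facts), "wine:CaliforniaRegion"));
    }
}
```

Without the rule, only the Napa region is reported as located in California. After materialisation the wine is found as well, which mirrors the difference between running the workshop example with and without a suitable ruleset.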

In general there are several rules which have a big impact on reasoning. The IRIS rule based reasoner plugin comes with several preconfigured rule sets of increasing complexity.

None of the predefined rulesets so far uses RDF axiomatic triples. Adding them generally contributes few interesting query results (but might be interesting to try!).

In general L2 produces the most triples. (Try the example query above, but change the entailment rules used in the demo example by passing a different constant to the constructor!)

Next remove rule rdfp5ab from the L2 ruleset. It is a rule that generates a lot of triples and helps to capture individual equality. Since it is currently still implemented in a naive way, it causes quite some bloat. Now rerun the example query; the output should be exactly the same. The same can be done with rule rdfp7.

As a next step, switch to the RDFS ruleset. Specific rules in this set introduce new blank nodes during inference and are difficult to implement efficiently. There are several approaches to deal with this.

<Solution>

3.5. Different evaluation strategies

Unfortunately, the current prototype does not yet convert SPARQL queries to disjunctive Datalog queries; see "From SPARQL to Rules (and back)" by Axel Polleres. Hence, the current prototype must:

However, there are still opportunities for experimenting with different evaluation strategies.

Bottom-up evaluation options include the naive and semi-naive rule evaluation algorithms. Unfortunately, the magic sets optimisation cannot be used, because the user's SPARQL query is not applied until after the inferences have been computed; magic sets can only be applied when constants appear in the query.
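The difference between the two bottom-up algorithms can be illustrated with a small self-contained Java sketch. It is an assumption-laden toy and not the IRIS implementation: the triple relation is reduced to a binary edge relation, the only rule is path(x, z) :- path(x, y), edge(y, z), and the class SemiNaiveSketch with its counter joinMatches is hypothetical. Naive evaluation re-joins all known facts in every round, whereas semi-naive evaluation joins only the facts derived in the previous round.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of the LarKC plugin or of IRIS.
class SemiNaiveSketch {
    // Number of rule-body matches found during the last run (for comparison only).
    static long joinMatches;

    static List<String> edge(String from, String to) {
        return List.of(from, to);
    }

    // Joins left(x, y) with right(y, z) and returns the pairs (x, z).
    static Set<List<String>> join(Set<List<String>> left, Set<List<String>> right) {
        Set<List<String>> out = new HashSet<>();
        for (List<String> l : left) {
            for (List<String> r : right) {
                if (l.get(1).equals(r.get(0))) {
                    joinMatches++;
                    out.add(edge(l.get(0), r.get(1)));
                }
            }
        }
        return out;
    }

    // Naive: every round joins the complete set of known facts with the base edges.
    static Set<List<String>> naive(Set<List<String>> base) {
        joinMatches = 0;
        Set<List<String>> all = new HashSet<>(base);
        while (all.addAll(join(all, base))) {
            // repeat until a round adds nothing new
        }
        return all;
    }

    // Semi-naive: every round joins only the facts that were new in the previous round.
    static Set<List<String>> semiNaive(Set<List<String>> base) {
        joinMatches = 0;
        Set<List<String>> all = new HashSet<>(base);
        Set<List<String>> delta = new HashSet<>(base);
        while (!delta.isEmpty()) {
            Set<List<String>> derived = join(delta, base);
            derived.removeAll(all); // keep only genuinely new facts
            all.addAll(derived);
            delta = derived;
        }
        return all;
    }

    public static void main(String[] args) {
        // A made-up chain a -> b -> c -> d -> e.
        Set<List<String>> base = new HashSet<>(List.of(
                edge("a", "b"), edge("b", "c"), edge("c", "d"), edge("d", "e")));

        Set<List<String>> n = naive(base);
        long naiveMatches = joinMatches;
        Set<List<String>> s = semiNaive(base);
        long semiNaiveMatches = joinMatches;

        System.out.println("same result: " + n.equals(s) + ", facts: " + n.size());
        System.out.println("body matches, naive: " + naiveMatches
                + ", semi-naive: " + semiNaiveMatches);
    }
}
```

Both strategies compute the same set of facts. On this small chain the naive loop evaluates 20 rule-body matches and the semi-naive loop 6, and the gap widens as the data grows.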

Top-down evaluation can also be examined. This may not be the most efficient method when computing all possible inferences, but a comparison of evaluation times with the bottom-up strategies might prove interesting. Both OLDT and SLDNF strategies should work for simple rulesets. SLDNF may fail if any combination of rules is recursive.

LarkcProject/1stEarlyAdoptersWorkshop/Scenarios/RuleBasedReasoner (last edited 2009-05-12 00:06:43 by FlorianFischer)