Why we need a fixed syntax, but no fixed (minimal) semantics
Frank, Annette, 17 Sep 2008
Summary of the conclusions of this note
(arguments follow below).
CONCLUSION: we should choose RDF (or some light extension of it) as the required syntactic form.
CONCLUSION: we need some lightweight meta-vocabulary to indicate the concrete syntax that plugins use for their I/O (if they do any triple transmission at all).
CONCLUSION: we don't see the need for enforcing a minimally required semantic interpretation (and in fact, it would positively hurt).
CONCLUSION: just as we do not want to have a minimal requirement on the REASONing component (previous point), we see no reason to (and indeed: no way to) define minimal functional requirements on any of the other plugin types (besides them fitting the I/O signature as defined in the API, of course).
CONCLUSION: LarKC cannot do without a vocabulary for indicating the semantic interpretation that plugins give to their input & output.
Heterogeneity vs. interoperability
In principle, we could decide to not enforce any agreements on syntactic format and semantic interpretation of the I/O of plugins. The plugins could then simply indicate using some meta-vocabulary what their syntactic format and their semantic interpretation are, and it would then be the job of a human user or a DECIDE plugin to piece together different compatible plugins. This would allow the maximum flexibility for plugin developers. However, in practice this would limit the chance that two independently developed plugins would ever be compatible.
Thus, the trick will be to make some restrictions on both syntactic format and semantic interpretation to maximise "serendipitous interoperability" while limiting as little as possible the design flexibility for individual plugins.
Fixed syntax, flexible semantics?
In the current Semantic Web world, the situation is asymmetric between syntax and semantics:
- There is little if any agreement on what the "right" semantic interpretation is, and it seems unlikely that there will ever be such agreement (different use-cases have different semantic requirements). Thus, we will want to maximise the semantic freedom between the LarKC components, while still somehow allowing them to interoperate (possibly with loss of completeness and soundness).
- There is much more agreement on the syntactic format: even if people hate it, most have learned to live with encoding their data and knowledge as RDF graphs (although with different semantic interpretations, see previous point).
Fixed syntax: RDF
Thus, a reasonable option would be to enforce a single syntactic format (or only a small variety of variations on a single format), but to be much more liberal on the semantic interpretation.
Even here, there is not full agreement. Should we use triples, or quads? And should we then use the fourth field for modules? Provenance? Weights? Or should we allow arbitrary N-tuples with a fixed interpretation on the first 3 as <S,P,O> and the rest open? Should we allow reification or not? etc. The question then remains whether in LarKC we enforce a single syntactic form (answering all these questions once and for all) or leave the syntactic format somewhat "extensible".
CONCLUSION: we should choose RDF (or some light extension of it) as the required syntactic form.
[NOTE: the "light extension of it" is hiding a real problem on quads etc, see next note].
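To make the N-tuple option concrete, here is a minimal sketch, assuming a statement is just a sequence of strings. The names `Statement`, `from_fields` and `as_triple` are hypothetical and not part of any LarKC API. The first three fields are read as <S,P,O>; whatever follows is kept but given no interpretation.

```python
from typing import NamedTuple, Sequence, Tuple


class Statement(NamedTuple):
    """An N-tuple whose first three fields are fixed as <S,P,O> (hypothetical type)."""
    subject: str
    predicate: str
    obj: str
    extra: Tuple[str, ...] = ()  # open fields: graph name, provenance, weight, ...


def from_fields(fields: Sequence[str]) -> Statement:
    """Build a Statement from a flat sequence of at least three fields."""
    if len(fields) < 3:
        raise ValueError("a statement needs at least subject, predicate and object")
    s, p, o, *rest = fields
    return Statement(s, p, o, tuple(rest))


def as_triple(statement: Statement) -> Tuple[str, str, str]:
    """Project any statement down to a plain triple, dropping the open fields."""
    return (statement.subject, statement.predicate, statement.obj)


quad = from_fields(["ex:paper1", "ex:hasAuthor", "ex:frank", "ex:graph1"])
triple = from_fields(["ex:paper1", "ex:hasAuthor", "ex:frank"])
```

In this sketch a plugin that only understands triples can still consume quads via `as_triple`, at the cost of losing whatever the fourth field meant.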
[NOTE: the argument for a fixed syntactic format is not a theoretical or technical argument, but a pragmatic/social one, based on current practice in the Semantic Web community (widespread use of RDF). Asking the same question (should we allow syntactic heterogeneity) 15 years ago in the AI community would have received a different answer: people were representing knowledge as formula-trees, S-expressions, ASCII-files, etc. In such a community, one would choose to allow syntactic heterogeneity and then start building a set of syntactic translators to deal with it. Fortunately, this is not AI in the 90s, but Semantic Web now].
By the way, when we say "syntax" here, we mean the abstract syntax of RDF (labelled graphs with blank nodes and URIs or strings). Of course there are different forms of concrete syntax for RDF (RDF/XML, N3, Turtle, OWL2-XML, etc). It is a simple matter of serialising and parsing to transform from one to the other (in the case when we want to transmit the triples; if plugins just communicate by pointing to RDF graphs, the problem of the concrete syntax disappears altogether). In the cases when the concrete syntax does matter (when "transmitting" triples between the plugins), the plugins need to be able to indicate what their concrete I/O syntax is, and they need to be able to call conversion software (e.g. from N3 to RDF/XML), but this is a simple issue: these converters exist, and some simple form of meta-vocabulary to indicate the concrete syntax is all that is needed. This is typical stuff that would be arranged either by the DECIDE component at run-time, or by a human user at configuration time (in case of a hardwired DECIDEr).
CONCLUSION: we need some lightweight meta-vocabulary to indicate the concrete syntax that plugins use for their I/O (if they do any triple transmission at all).
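As an illustration of how little such a meta-vocabulary needs to say, the following sketch lets each plugin declare its concrete input and output syntax and works out where a converter would have to be called. Everything here is an assumption made for illustration: the class `SyntaxDeclaration`, the plain-string syntax labels and the plugin names are hypothetical, and a real vocabulary would more likely use URIs or media types.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SyntaxDeclaration:
    """What a plugin declares about its concrete I/O syntax (hypothetical)."""
    plugin: str
    input_syntax: str
    output_syntax: str


def needs_conversion(producer: SyntaxDeclaration,
                     consumer: SyntaxDeclaration) -> Optional[Tuple[str, str]]:
    """Return None if the output can be passed on unchanged,
    else the (source, target) syntax pair a converter must handle."""
    if producer.output_syntax == consumer.input_syntax:
        return None
    return (producer.output_syntax, consumer.input_syntax)


def plan_conversions(pipeline: List[SyntaxDeclaration]) -> List[Tuple[str, str, str, str]]:
    """List every hand-over in the pipeline that requires a conversion step,
    as (producer, consumer, source syntax, target syntax)."""
    steps = []
    for producer, consumer in zip(pipeline, pipeline[1:]):
        conversion = needs_conversion(producer, consumer)
        if conversion is not None:
            steps.append((producer.plugin, consumer.plugin) + conversion)
    return steps


pipeline = [
    SyntaxDeclaration("identify", "n3", "n3"),
    SyntaxDeclaration("transform", "n3", "rdfxml"),
    SyntaxDeclaration("reason", "turtle", "turtle"),
]
conversions = plan_conversions(pipeline)
```

This is the kind of bookkeeping a DECIDE component could do at run-time, or a human could do by hand at configuration time; the actual conversion is left to existing converters.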
Now that we have settled the syntactic side, we explore some questions about how liberal the semantic interpretation should be (do we need a minimally required semantics? How to indicate the semantic interpretation of a given dataset).
Why a minimally required semantics at all?
Why would we need agreement on a minimal logical semantics anyway? Presumably because plugins would be required to handle this semantics in a sound and complete way. (We don't know what other reason there could be for agreeing on a minimal logical semantics. If plugins are allowed to be incomplete/unsound even on the minimal semantics, then why would we need a minimal semantics at all?)
It would seem much too strong to us to require that *every* LarKC plugin must guarantee soundness and completeness on some particular minimal semantics. We don't really see the need for it, and it would limit the design space of plugin developers without reason.
For example, there could be use for a REASONing plugin that does database-style interpretation of RDF domain/range statements, interpreting them as constraints instead of implications (Dieter would like this plugin a lot :-). Such a plugin would be impossible if all plugins had to be sound and complete on (say) at least the RDF semantics.
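The difference between the two readings can be sketched in a few lines. This is a deliberately simplified illustration, not a reasoner: triples are plain tuples of strings, only domain statements are handled (no range, no subclass reasoning), and the `ex:` names and both function names are hypothetical.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def domain_statements(triples):
    """Collect (property, class) pairs from the domain statements in the data."""
    return {(s, o) for s, p, o in triples if p == RDFS_DOMAIN}


def infer_domain_types(triples):
    """Implication reading: using a property entails that the subject
    has the domain class. Returns the newly derived type triples."""
    inferred = set()
    for s, p, o in triples:
        for prop, cls in domain_statements(triples):
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))
    return inferred - set(triples)


def check_domain_constraints(triples):
    """Constraint reading (database-style): using a property is only allowed
    if the subject is already asserted to have the domain class.
    Returns the violations as (subject, property, missing class)."""
    asserted = set(triples)
    violations = set()
    for s, p, o in triples:
        for prop, cls in domain_statements(triples):
            if p == prop and (s, RDF_TYPE, cls) not in asserted:
                violations.add((s, p, cls))
    return violations


data = [
    ("ex:hasAuthor", RDFS_DOMAIN, "ex:Document"),
    ("ex:paper1", "ex:hasAuthor", "ex:frank"),
]
derived = infer_domain_types(data)
violations = check_domain_constraints(data)
```

On the same two triples, the first reading quietly derives that `ex:paper1` is a document, while the second reports the data as invalid. A plugin taking the second reading is, by design, not complete with respect to the first.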
CONCLUSION: we don't see the need for enforcing a minimally required semantic interpretation (and in fact, it would positively hurt).
Minimal semantics for which plugin-type?
Among all the different plugin-types (IDENTIFY, TRANSFORM, SELECT, REASON, DECIDE), the demand to be "sound-and-complete wrt the minimal semantics" only makes sense for the REASONing plugins. What would the minimal quality requirements be for IDENTIFY, TRANSFORM, SELECT and DECIDE?
CONCLUSION: just as we do not want to have a minimal requirement on the REASONing component (previous point), we see no reason to (and indeed: no way to) define minimal functional requirements on any of the other plugin types (besides them fitting the I/O signature as defined in the API, of course).
The need for "formal-semantics tags"
The obvious requirement in a pipeline is that plugins can understand each other's input & output. If we don't go for a single fixed semantics (which would seem too limited in any case), such mutual understanding is no longer guaranteed. The DECIDE component must know how the plugins interpret their input and output in order to decide which ones to combine. The role of the formal-semantics tags is to allow components to state what their assumptions are on the semantic interpretation of their input and output, and these can be matched by the DECIDE plugin for compatibility.
Examples:
- if the DECIDE plugin has to choose between two different REASONers, it is important to know that (a) the TRANSFORM component produced OWL data, and that (b) one REASONer interprets OWL semantics and the other REASONer interprets only RDF semantics. The DECIDE component can in principle use both on the OWL dataset, but the RDF REASONer will be incomplete wrt the OWL semantics of the dataset.
- if a TRANSFORM component does vocabulary mapping on SKOS data, the DECIDE plugin must choose an IDENTIFY plugin that indeed returns SKOS data.
(the formal-semantics tags are needed in these examples for sophisticated DECIDE plugins that do meta-reasoning over the selection of other plugins, but the tags are already needed for simple hardcoded script DECIDErs, because the human coder of the script needs to know the same compatibility information).
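The first example above can be sketched as a small matching routine. This is only an illustration under stated assumptions: the tags are plain strings, the `COVERS` table (which tag subsumes which) is a hypothetical simplification and not a claim about the precise relation between the RDF, RDFS and OWL semantics, and the names `Reasoner`, `classify` and `choose_reasoner` are invented for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical table: each tag maps to the set of semantics it is assumed to cover.
COVERS = {
    "RDF": {"RDF"},
    "RDFS": {"RDF", "RDFS"},
    "OWL": {"RDF", "RDFS", "OWL"},
}


@dataclass(frozen=True)
class Reasoner:
    """A REASON plugin together with its declared formal-semantics tag."""
    name: str
    semantics: str


def covered(tag: str) -> set:
    """Semantics covered by a tag; an unknown tag covers only itself."""
    return COVERS.get(tag, {tag})


def classify(data_semantics: str, reasoner: Reasoner) -> str:
    """Compare the tag on the data with the tag declared by the reasoner."""
    if data_semantics in covered(reasoner.semantics):
        return "complete"
    if reasoner.semantics in covered(data_semantics):
        return "incomplete"  # usable, but will miss some entailments
    return "incompatible"


def choose_reasoner(data_semantics: str, reasoners: List[Reasoner]) -> Optional[Reasoner]:
    """Prefer a complete reasoner, fall back to an incomplete one, else None."""
    ranking = {"complete": 0, "incomplete": 1}
    usable = [r for r in reasoners if classify(data_semantics, r) in ranking]
    return min(usable, key=lambda r: ranking[classify(data_semantics, r)], default=None)


reasoners = [Reasoner("rdf-reasoner", "RDF"), Reasoner("owl-reasoner", "OWL")]
chosen = choose_reasoner("OWL", reasoners)
```

A hardcoded script DECIDEr would not run such a routine, but its author would have to perform the same comparison by hand, which is why the tags are needed in both cases.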
CONCLUSION: LarKC cannot do without a vocabulary for indicating the semantic interpretation that plugins give to their input & output.
