Requirements by Example

In this section, we investigate the reasoning requirements of traffic management, claiming that they can be extended to urban computing as a whole. As argued before, we are particularly interested in the reasoning requirements for ontology-based semantic techniques. The discussion is intended to give logicians and engineers working on semantic technologies some hints for developing new forms of reasoning for the Semantic Web.

Coping with Heterogeneity

Dealing with heterogeneous data has long been a concern in many areas of computer science and engineering, including database systems, multimedia applications, network systems, and artificial intelligence. Here, we propose a comprehensive notion of heterogeneity processing for semantic technologies. We distinguish the following levels of heterogeneity: Representational Heterogeneity, Semantic Heterogeneity, and Default Heterogeneity.

Representational Heterogeneity

Representational Heterogeneity means that semantic data are represented using different specification languages. Systems supporting representational heterogeneity allow semantic data to be specified in multiple semantic languages, rather than in a single metadata or ontology language such as OWL or RDF/RDFS. Note, however, that different representations of semantic data do not necessarily have different semantics; that case is Semantic Heterogeneity, discussed below.

Examples

Urban computing-related data can come from different and independent data sources (see ?/PublicAvailableDataSources), which can be developed with traditional technologies and modeling methods (e.g., relational DBMS) or expressed with "semantic" formats and languages (e.g., RDF/S, OWL, WSML). For example,

The integration and reuse of those data therefore require a conversion or translation step before the data can be used together.
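As a minimal sketch of such a conversion step, the following Python fragment translates rows from a relational table into subject-predicate-object triples. The namespace `http://example.org/traffic/`, the `detector` table, its columns, and the street names are hypothetical and chosen only for illustration; a real deployment would use an established mapping approach and vocabulary.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace, used only for illustration.
BASE = "http://example.org/traffic/"


def rows_to_triples(table: str, key: str,
                    rows: Iterable[Dict[str, object]]) -> List[Triple]:
    """Translate relational rows into subject-predicate-object triples."""
    triples: List[Triple] = []
    for row in rows:
        # The primary key identifies the resource described by the row.
        subject = f"{BASE}{table}/{row[key]}"
        for column, value in row.items():
            if column == key or value is None:
                # The key is already encoded in the subject; NULLs yield no triple.
                continue
            triples.append((subject, f"{BASE}{table}#{column}", str(value)))
    return triples


# Hypothetical rows, as they might come from a relational DBMS.
detectors = [
    {"id": 17, "street": "Via Roma", "lanes": 2},
    {"id": 18, "street": "Corso Italia", "lanes": None},
]
triples = rows_to_triples("detector", "id", detectors)
```

Skipping NULL columns rather than emitting an empty literal is a design choice of this sketch: under the open world assumption, a missing triple simply leaves the value unknown.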

Semantic Heterogeneity

Semantic Heterogeneity means that the system allows for multiple reasoning paradigms. For instance, many urban computing applications may need different reasoners for temporal, spatial, and causal reasoning. This does not necessarily mean that we have to develop a single, powerful reasoner that covers all of those reasoning tasks. A system that supports semantic heterogeneity would instead find a way to let multiple single-paradigm reasoners work together to achieve the overall result.

Examples

Therefore, different kinds of techniques and reasoners are required to deal with those kinds of data; a further requirement is a system that dynamically selects and runs a specific reasoner on the basis of the available data and the desired processing tasks.
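The dynamic selection described above could be sketched as a small registry in which each reasoner declares the task it handles and the fields it needs in the data. Everything here is an assumption made for illustration: the task names (`ordering`, `proximity`), the field names, and the stub reasoners, which only return a description instead of performing real inference.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional

Record = Dict[str, object]


@dataclass(frozen=True)
class ReasonerEntry:
    name: str
    task: str                          # processing task the reasoner supports
    required_fields: FrozenSet[str]    # fields every record must provide
    run: Callable[[List[Record]], str]


def run_temporal(data: List[Record]) -> str:
    # Stub standing in for a real temporal reasoner.
    return f"temporal reasoning over {len(data)} records"


def run_spatial(data: List[Record]) -> str:
    # Stub standing in for a real spatial reasoner.
    return f"spatial reasoning over {len(data)} records"


REGISTRY = [
    ReasonerEntry("temporal", "ordering", frozenset({"timestamp"}), run_temporal),
    ReasonerEntry("spatial", "proximity", frozenset({"lat", "lon"}), run_spatial),
]


def select_reasoner(task: str, data: List[Record]) -> Optional[ReasonerEntry]:
    """Return the first reasoner matching both the task and the available data."""
    for entry in REGISTRY:
        fits_data = all(entry.required_fields.issubset(r.keys()) for r in data)
        if entry.task == task and fits_data:
            return entry
    return None


readings = [{"timestamp": 1, "count": 12}, {"timestamp": 2, "count": 15}]
chosen = select_reasoner("ordering", readings)
```

In this sketch the task and the data are checked separately, so a request for `proximity` over the same readings finds no reasoner because the records carry no coordinates.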

Note from Siemens: two things are mixed up here and used interchangeably: paradigms and tasks. It seems that the selection of a paradigm depends only on the data, and that data and tasks are treated as the same thing, since the word DATA in the examples can be replaced by TASK.

Default Heterogeneity

By Default Heterogeneity, we mean that systems support various specification defaults for semantic data. Well-known specification defaults are the closed world assumption, the open world assumption, the unique name assumption, and the non-unique name assumption. In the Semantic Web community, it is widely accepted that semantic data for the Web should follow the open world assumption and the non-unique name assumption, as adopted by the popular ontology language OWL.

Examples

However, as we have observed in many urban computing applications, we should not commit to any single specification default. Take traffic and transportation ontologies as an example: in many cases we can adopt the open world assumption and the non-unique name assumption, because our knowledge of and information about the data is limited. Sometimes, however, it is much more convenient to adopt a "local" closed world assumption. For example, for the timetable of a bus station, it is reasonable to assume that the schedule information is locally complete, in the sense that if the timetable lists no bus at a specific time, then no bus is scheduled for that time. The same applies to a city map: if the map shows no road directly connecting two streets, then no such road exists.
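The difference between the two assumptions in the timetable example can be illustrated with a three-valued query. This is only a sketch: the `Answer` type, the `is_scheduled` function, and the bus line and times are hypothetical names and values introduced here.

```python
from enum import Enum
from typing import Set, Tuple


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


Fact = Tuple[str, str]  # (bus line, departure time)


def is_scheduled(fact: Fact, known_facts: Set[Fact], closed_world: bool) -> Answer:
    """Answer a schedule query under a closed or an open world assumption."""
    if fact in known_facts:
        return Answer.YES
    # Closed world: absence of the fact means it is false.
    # Open world: absence of the fact only means we do not know.
    return Answer.NO if closed_world else Answer.UNKNOWN


# Hypothetical timetable of one bus station, assumed to be locally complete.
timetable = {("line 90", "08:00"), ("line 90", "08:30")}

local_cwa = is_scheduled(("line 90", "08:15"), timetable, closed_world=True)
owa = is_scheduled(("line 90", "08:15"), timetable, closed_world=False)
```

Under the local closed world assumption the missing 08:15 departure is answered with `Answer.NO`, while under the open world assumption the same query yields `Answer.UNKNOWN`.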

The same applies to the Unique Name Assumption. Consider the Story Board, and in particular the fact that Repubblica is the name of the metro station co-located with the Milano Piazza Repubblica train station. If you have to calculate a trip and the user is aware that multiple means of transportation will be used, then you can ignore the fact that the two Repubblica stations are not exactly the same place. If, on the contrary, the user wants to use only the subway, then you cannot assume that the two Repubblica stations are one physical place.
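A minimal sketch of this context-dependent identity follows. The identifiers `metro:Repubblica` and `rail:Repubblica`, the `COLOCATED` set, and the `same_station` function are hypothetical and introduced only to illustrate the idea; they do not come from the Story Board.

```python
from typing import FrozenSet, Set

# Hypothetical co-location links between differently named station resources.
COLOCATED: Set[FrozenSet[str]] = {
    frozenset({"metro:Repubblica", "rail:Repubblica"}),
}


def same_station(a: str, b: str, merge_colocated: bool) -> bool:
    """Decide whether two station names denote the same place for this trip."""
    if a == b:
        return True
    # With the unique name assumption (merge_colocated=False), different
    # names always denote different places. Without it, co-located stations
    # may be treated as one node in the trip graph.
    return merge_colocated and frozenset({a, b}) in COLOCATED


multimodal_trip = same_station("metro:Repubblica", "rail:Repubblica", True)
subway_only_trip = same_station("metro:Repubblica", "rail:Repubblica", False)
```

The flag is set per query rather than per dataset, which reflects the point of the example: the appropriate naming assumption depends on what the user wants from the trip.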

The examples above show that semantic systems for urban computing should support multiple specification defaults. They should give users and knowledge engineers the freedom to state any data under any reasoning assumption: some parts of the semantic data may be based on the open world assumption, and other parts on the closed world assumption.

Coping with Scale

The advent of Pervasive Computing and Web 2.0 technologies has led to a constantly growing amount of data about urban environments, such as information coming from multiple sensors (traffic detectors, public transportation, pollution monitors, etc.) as well as from citizens' observations (black points, ratings of commercial activities, event organization, etc.). As a result, the amount of data available for use and integration is not manageable by state-of-the-art technologies and tools, and scalability must be treated as a primary concern. For example, intelligent methods for data sampling or selection should be applied before employing traditional reasoning techniques, e.g. to select the traffic data used in predictions.
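As one possible baseline for the sampling step, the sketch below uses reservoir sampling to keep a fixed-size uniform sample of readings from an input of unknown length. This is a swapped-in standard technique, not a method prescribed by this document, and it is deliberately simple: the intelligent selection called for above would also take the content of the readings into account. The function name and the integer stand-ins for traffic readings are assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    sample: List[T] = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random slot with probability k / (index + 1).
            j = rng.randint(0, index)  # both bounds inclusive
            if j < k:
                sample[j] = item
    return sample


# Hypothetical stand-in for a large set of traffic readings.
rng = random.Random(0)
selected = reservoir_sample(range(1000), 10, rng)
```

The memory used depends only on `k`, not on the number of readings, so the sample can be passed to a traditional reasoner regardless of how much data arrives.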

MORE TO BE ADDED HERE

Coping with Time-Dependency

Knowledge and data can change over time. For instance, in Urban Computing the names of streets, landmarks, kinds of events, etc. change very slowly, whereas the number of cars that pass a traffic detector in five minutes changes very quickly.

This means that the system must have a notion of observation period, defined as the period during which the system is subject to querying.

Moreover, within a given observation period, the system must consider the following four different types of knowledge and data:

Periodically changing data and event-driven changing data are best represented as data streams, i.e. unbounded sequences of time-varying data elements. Data streams occur in a variety of modern applications, such as network monitoring, traffic engineering, sensor networks, RFID tag applications, telecom call records, financial applications, Web logs, click-streams, etc. The very nature of Traffic Management can be explained by means of data streams, representing real objects that are monitored at given locations: cars, trains, crowds, ambulances, parking spaces, and so on.
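Because a stream is unbounded, queries over it are usually evaluated on a finite window. The sketch below counts the cars seen by one traffic detector in the last five minutes. The class name `SlidingWindowCounter` and the timestamps are hypothetical, and the code assumes that observations arrive in non-decreasing time order.

```python
from collections import deque
from typing import Deque


class SlidingWindowCounter:
    """Count the events whose timestamp lies in (now - window, now]."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def observe(self, timestamp: float) -> None:
        # Assumes timestamps arrive in non-decreasing order.
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


# Five-minute window; each observation is one car passing the detector.
counter = SlidingWindowCounter(window_seconds=300)
for passage_time in (0, 100, 290, 310):
    counter.observe(passage_time)
cars_at_310 = counter.count(now=310)
```

Only the events inside the window are stored, so memory stays bounded by the traffic seen in five minutes rather than by the length of the stream.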

The processing of data streams has been extensively investigated in the last decade: specialized Stream Database Management Systems (SDMS) have been developed, and SDMS features are increasingly supported by major database products such as Oracle and DB2. While reasoners are scaling up year after year in the classical, time-invariant domain of ontological knowledge, reasoning over rapidly changing information has been neglected.

Coping with Noisy, Uncertain and Inconsistent Data

We distinguish the following different levels of data uncertainty and inconsistency.

Examples

Traffic data, and especially floating car data, are a very good example of such data.

LarkcProject/WP6/WorkInProgress/RequirementsByExample (last edited 2008-11-04 18:57:51 by ?DanieleDellAglio)