Requirements by Example
In this section, we investigate the reasoning requirements of traffic management, claiming that they can be extended to urban computing as a whole. As argued before, we are particularly interested in the reasoning requirements for ontology-based semantic techniques. The discussion is intended to give logicians and engineers working on semantic technologies some hints for exploring new forms of reasoning for the Semantic Web.
Coping with Heterogeneity
Dealing with heterogeneous data has long been a concern in many areas of computer science and engineering, including database systems, multimedia applications, network systems, and artificial intelligence. Here, we propose a comprehensive notion of heterogeneity processing for semantic technologies. We distinguish the following levels of heterogeneity: Representational Heterogeneity, Semantic Heterogeneity, and Default Heterogeneity.
Representational Heterogeneity
Representational Heterogeneity means that semantic data are represented using different specification languages. Systems supporting representational heterogeneity allow semantic data to be specified in multiple semantic languages, rather than in a single metadata or ontology language such as OWL or RDF/RDFS. Note, however, that different representations of semantic data do not necessarily have different semantics. That case is covered by Semantic Heterogeneity, discussed in the next paragraph.
Examples
Urban computing-related data can come from different and independent data sources (see ?/PublicAvailableDataSources), which can be developed with traditional technologies and modeling methods (e.g., relational DBMS) or expressed with "semantic" formats and languages (e.g., RDF/S, OWL, WSML). For example,
- geographic data are usually expressed in some kind of geographic standard;
- event details are published on the Web in the form of RSS feeds (compare, for instance, the event search responses of Eventful and Upcoming);
- traffic data are stored in databases; etc.
Integrating and reusing those data therefore requires a conversion/translation process before the data become useful together.
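The conversion step described above can be illustrated with a minimal sketch that maps two differently represented sources onto one common triple form. Everything specific here is an assumption made for illustration: the RSS snippet, the `traffic` table and its columns, and the `event:`/`road:` identifiers are hypothetical and do not come from any of the data sources mentioned.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical RSS snippet standing in for an event feed.
RSS = """<rss version="2.0"><channel>
  <item><title>Concert at San Siro</title><pubDate>Sat, 07 Jun 2008 21:00:00 GMT</pubDate></item>
</channel></rss>"""


def triples_from_rss(xml_text):
    """Yield (subject, predicate, object) triples for each RSS item."""
    root = ET.fromstring(xml_text)
    for i, item in enumerate(root.iter("item")):
        subject = f"event:{i}"
        yield (subject, "title", item.findtext("title"))
        yield (subject, "date", item.findtext("pubDate"))


def triples_from_db(conn):
    """Yield triples for each row of a hypothetical traffic table."""
    query = "SELECT road, vehicles FROM traffic ORDER BY road"
    for road, vehicles in conn.execute(query):
        yield (f"road:{road}", "vehicleCount", vehicles)


# In-memory database standing in for a traffic data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (road TEXT, vehicles INTEGER)")
conn.execute("INSERT INTO traffic VALUES ('ViaTorino', 42)")

# Both sources now share one representation and can be processed together.
merged = list(triples_from_rss(RSS)) + list(triples_from_db(conn))
```

The sketch only addresses representation: it does not attempt to align the meaning of the predicates, which is a separate concern.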
Semantic Heterogeneity
Semantic Heterogeneity means that the system allows for multiple reasoning paradigms. For instance, many urban computing applications may need different reasoners for temporal reasoning, spatial reasoning, and causal reasoning. This does not necessarily mean that we have to develop a single, powerful reasoner covering all of those reasoning tasks. A system that supports semantic heterogeneity would instead let multiple single-paradigm reasoners cooperate to achieve the same result.
Examples
- data that need precise and consistent inference
  - knowing whether two roads are connected for a given kind of vehicle (e.g., at a given junction all vehicles except public transport must go straight)
  - checking whether private cars are allowed to enter a specific urban area
- data that need approximate reasoning or imperfect estimations
  - calculating the probability of a traffic jam given the current traffic conditions and the past history
Therefore, different kinds of techniques and reasoners are required to deal with those kinds of data; a further requirement is a system that dynamically selects and runs a specific reasoner on the basis of the available data and the desired processing tasks.
Note from Siemens: two things are mixed up here and used interchangeably: paradigms and tasks. It seems that the choice of paradigm depends only on the data, and that the data are in effect tasks, since the word DATA in the examples can be replaced by TASK.
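The requirement of dynamically selecting a reasoner can be sketched as a simple dispatch table that routes each task to a single-paradigm reasoner. This is only an illustrative sketch: the task format, the `ACCESS` facts, the two toy reasoners, and the fallback value 0.5 for "no evidence" are all assumptions, not part of any existing system.

```python
from typing import Callable, Dict

# Hypothetical road-access facts: (junction, vehicle kind) -> allowed manoeuvres.
ACCESS = {
    ("J1", "car"): {"straight"},
    ("J1", "bus"): {"straight", "left", "right"},
}


def crisp_reasoner(task: dict) -> bool:
    """Exact yes/no inference: may this vehicle make this manoeuvre?"""
    allowed = ACCESS.get((task["junction"], task["vehicle"]), set())
    return task["manoeuvre"] in allowed


def probabilistic_reasoner(task: dict) -> float:
    """Approximate inference: frequency of jams under similar past conditions."""
    # task["history"] is a list of (condition, jam_observed) pairs.
    similar = [jam for cond, jam in task["history"] if cond == task["condition"]]
    # 0.5 is an assumed placeholder for "no evidence either way".
    return sum(similar) / len(similar) if similar else 0.5


# The selection step: each kind of task is bound to one reasoning paradigm.
REASONERS: Dict[str, Callable[[dict], object]] = {
    "access": crisp_reasoner,
    "jam_probability": probabilistic_reasoner,
}


def run(task: dict):
    """Select the reasoner registered for the task kind and run it."""
    return REASONERS[task["kind"]](task)
```

In this sketch the selection depends on the kind of task rather than on the data alone, which is one possible reading of the requirement.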
Default Heterogeneity
By Default Heterogeneity, we mean that systems support various specification defaults of semantic data. Well-known specification defaults are the closed world assumption, the open world assumption, the unique name assumption and the non-unique name assumption. In the Semantic Web community, it is widely accepted that semantic data for the Web should take the open world assumption and the non-unique name assumption, as adopted by the popular ontology language OWL.
Examples
However, as we have observed in many urban computing applications, we should not commit to any single specification default. Take traffic and transportation ontologies as an example: in many cases we can adopt the open world assumption and the non-unique name assumption, because our knowledge of and information about the data are limited. Sometimes, however, it is much more convenient to adopt a "local" closed world assumption. For example, for the timetable of a bus station, it is reasonable to assume that the schedule information is locally complete, in the sense that if the timetable lists no bus at a specific time, then no bus is scheduled for that time. The same applies to a city map: if the map shows no road directly connecting two streets, then no such road exists.
The same applies to the unique name assumption. Consider the Story Board, and in particular the fact that Repubblica is the name of the metro station co-located with the train station of Milano Piazza Repubblica. If you have to calculate a trip and the user is aware that multiple means of transportation will be used, then you can ignore the fact that the two Repubblica stations are not exactly the same one. If, on the contrary, the user wants to use only the subway, then you cannot assume that the two Repubblica stations are one physical place.
The examples above show that semantic systems for urban computing should support multiple specification defaults. They should leave users and knowledge engineers free to state any data under any reasoning assumption: some parts of the semantic data may be based on the open world assumption, and others on the closed world assumption.
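The difference between the two defaults can be made concrete with a small sketch in which only selected predicates are treated as locally complete. The timetable entries, the predicate names `scheduled` and `delayed`, and the three-valued answer (True, False, or None for "unknown") are assumptions chosen for illustration.

```python
from typing import Optional

# Hypothetical bus timetable: (line, departure time) pairs for one stop.
TIMETABLE = {("94", "08:00"), ("94", "08:15")}

# Known facts, grouped by predicate.
FACTS = {
    "scheduled": TIMETABLE,
    "delayed": {("94", "08:15")},
}

# Predicates declared locally complete (closed world); all others stay open.
CLOSED_WORLD = {"scheduled"}


def holds(predicate: str, args: tuple) -> Optional[bool]:
    """Answer a ground query with True, False, or None (unknown)."""
    if args in FACTS.get(predicate, set()):
        return True
    # Absence of a fact means "false" only under the closed world assumption;
    # under the open world assumption it merely means "not known".
    return False if predicate in CLOSED_WORLD else None
```

With these assumptions, a query for a bus at 09:00 is answered "no" because the timetable is locally complete, while a query about an unrecorded delay is answered "unknown".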
Coping with Scale
The advent of Pervasive Computing and Web 2.0 technologies has led to a constantly growing amount of data about urban environments, such as information coming from multiple sensors (traffic detectors, public transportation, pollution monitors, etc.) as well as from citizens' observations (black points, ratings of commercial activities, event organization, etc.). As a result, the amount of data available to be used and integrated is not manageable by state-of-the-art technologies and tools, and scalability issues demand serious attention. For example, intelligent methods for data sampling or selection should be adopted before employing traditional reasoning techniques, e.g. to select the traffic data to be used in predictions.
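As one possible illustration of selecting data before reasoning, the sketch below keeps only the latest reading of each relevant sensor within a recent time window, so that a reasoner sees a small, bounded input. The `Reading` record, its fields, and the selection criteria are hypothetical; real selection methods would likely be more sophisticated.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Reading(NamedTuple):
    sensor: str
    minute: int    # minutes since the start of the observation period
    vehicles: int  # vehicles counted by the sensor


def select_recent(
    readings: Iterable[Reading], now: int, window: int, sensors: Set[str]
) -> Dict[str, Reading]:
    """Keep only the latest reading per relevant sensor inside the window."""
    latest: Dict[str, Reading] = {}
    for r in readings:
        in_window = now - window <= r.minute <= now
        if r.sensor in sensors and in_window:
            if r.sensor not in latest or r.minute > latest[r.sensor].minute:
                latest[r.sensor] = r
    return latest
```

The output size is bounded by the number of relevant sensors, regardless of how many readings arrive.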
Coping with time-dependency
Knowledge and data can change over time. For instance, in urban computing the names of streets, landmarks, kinds of events, etc. change very slowly, whereas the number of cars that pass a traffic detector in five minutes changes very fast.
This means that the system must have a notion of observation period, defined as the period during which the system is subject to querying.
Moreover, within a given observation period, the system must consider the following four types of knowledge and data:
Invariable knowledge: it includes
- obvious terminological knowledge (e.g., an address is made up of a street name, a civic number, a city name and a ZIP code), and
- less obvious nomological knowledge that describes how the world is expected
  - to be (e.g., given traffic lights are switched off or certain streets are closed during the night), or
  - to evolve (e.g., traffic jams appear more often when it rains or when important sport events take place).
Invariable data: they do not change during the observation period, e.g. the names and lengths of the roads.
Periodically changing data: they change according to a temporal law that can be
- a pure periodic law, e.g. the fact that the Milan west-side overpass closes every night at 10 pm; or
- a probabilistic law, e.g. the fact that a traffic jam is present on the west side of Milan due to bad weather or because a soccer match is taking place at the San Siro stadium.
Event-driven changing data: they are updated as a consequence of some external event and can be further characterized by the mean time between changes:
- Fast: for example, the intensity of traffic (as monitored by sensors) for each street in a city;
- Medium: for example, roads closed because of accidents, or congestion due to traffic;
- Slow: for example, roads closed for scheduled works.
Periodically changing data and event-driven changing data are best represented as data streams, i.e. unbounded sequences of time-varying data elements. Data streams occur in a variety of modern applications, such as network monitoring, traffic engineering, sensor networks, RFID tag applications, telecom call records, financial applications, Web logs, click-streams, etc. The very nature of Traffic Management can be explained by means of data streams representing real objects that are monitored at given locations: cars, trains, crowds, ambulances, parking spaces, and so on.
The processing of data streams has been widely investigated in the last decade: specialized Stream Database Management Systems (SDMS) have been developed, and SDMS features are becoming supported by major database products, such as Oracle and DB2. While reasoners are scaling up year after year in the classical, time-invariant domain of ontological knowledge, reasoning upon rapidly changing information has been neglected or forgotten.
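To make the notion of a data stream more tangible, the following minimal sketch processes an unbounded sequence of (timestamp, vehicle count) elements with a sliding time window, a typical operation in stream processing. The element format and the function name `sliding_counts` are assumptions, and the sketch assumes that timestamps arrive in non-decreasing order and that the window width is positive.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_counts(
    stream: Iterable[Tuple[int, int]], width: int
) -> Iterator[Tuple[int, int]]:
    """For each (timestamp, vehicles) element, yield the timestamp and the
    total number of vehicles seen in the interval (timestamp - width, timestamp].
    """
    window: Deque[Tuple[int, int]] = deque()
    total = 0
    for ts, vehicles in stream:
        window.append((ts, vehicles))
        total += vehicles
        # Evict elements that have fallen out of the window.
        while window[0][0] <= ts - width:
            _, old = window.popleft()
            total -= old
        yield ts, total
```

Because the function is a generator, it never needs the whole stream in memory: only the elements inside the current window are retained.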
Coping with Noisy, Uncertain and Inconsistent Data
We distinguish the following kinds of data uncertainty and inconsistency.
Noisy data. Part of the data is completely useless or semantically meaningless.
Inconsistent data. Part of the data is logically self-contradictory or semantically impossible.
Uncertain data. The semantics of the data is partial or incomplete, or spans a range of multiple possibilities.
Examples
Traffic data, and especially floating car data, are a very good example of such data.
- different sensors observing the same road area may give apparently inconsistent information; e.g. a traffic camera may report that the road is empty, whereas an inductive loop traffic detector may report that 100 vehicles went over it. The two pieces of information may be coherent if one considers that a traffic camera transmits one image per second with a delay of 15-30 seconds, whereas an inductive loop traffic detector reports the number of vehicles that went over it in 5 minutes, and that information may arrive 5-10 minutes later.
- a single datum coming from a sensor at a given moment may have no certain meaning; e.g. consider an inductive loop traffic detector that reports that 0 cars went over it. What does it mean? Is the road empty? Is the traffic completely stuck? Has somebody parked a car above the sensor? Is the sensor broken? Combining multiple pieces of information from multiple sensors in a given time window may be the only reasonable way to reduce the uncertainty.
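The camera versus inductive loop example above can be sketched as a check on whether two observations actually describe overlapping periods before they are treated as contradictory. The `Observation` record and its fields are hypothetical, and the sketch assumes that the transmission delay and the measurement duration of each sensor are known.

```python
from typing import NamedTuple, Tuple


class Observation(NamedTuple):
    source: str
    received_at: float  # seconds: arrival time at the system
    delay: float        # seconds between end of measurement and arrival
    duration: float     # seconds covered by the measurement
    vehicles: int

    def interval(self) -> Tuple[float, float]:
        """Return (start, end) of the period the measurement describes."""
        end = self.received_at - self.delay
        return end - self.duration, end


def comparable(a: Observation, b: Observation) -> bool:
    """Two observations can contradict each other only if the periods
    they describe overlap."""
    a_start, a_end = a.interval()
    b_start, b_end = b.interval()
    return a_start <= b_end and b_start <= a_end


# A camera image (1 s, 20 s delay) and a loop count (5 min, 7 min delay)
# that arrive at the same moment describe different periods.
camera = Observation("camera", received_at=1000, delay=20, duration=1, vehicles=0)
loop = Observation("loop", received_at=1000, delay=420, duration=300, vehicles=100)
```

Here `comparable(camera, loop)` is false, so the "empty road" and "100 vehicles" reports need not be inconsistent; only observations whose intervals overlap are candidates for a genuine conflict.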
