Applying temporal databases to geographical data analysis

M.-C. Fauvet, S. Chardonnel, M. Dumas, P.-C. Scholl and P. Dumolard

Laboratoire LSR-IMAG, Université de Grenoble, BP 72, 38402 St Martin d'Hères cedex, France
Laboratoire SEIGAD, Université de Grenoble, Espace Serge Martin, BP 53, 38041 Grenoble cedex 9, France
[email protected]

Abstract

This paper reports an experience in which a temporal database was used to analyze the results of a survey on human behaviors and displacements in a ski resort. This survey was part of a broader study, based on the time-geography methodology, about the use of the resort's infrastructure. As such, the experience presented here may be seen as an attempt to implement some concepts of time geography using temporal database technology. Throughout the paper, some shortcomings of current temporal data models with respect to the analysis of human displacements are pointed out, and possible solutions are briefly sketched.

Keywords: temporal databases, time-geography, human displacements analysis.
Accepted to DEXA Workshop on Spatio-Temporal DML’99 - Florence (Italy)

1 Introduction

1.1 Motivation and context

It is well known that space and time are ubiquitous components of data in general, and of geographical data in particular. For nearly two decades, two separate branches of database research, known respectively as Spatial and Temporal Databases, have studied these two aspects of data independently, with little effort made to integrate their results. Recently, many research projects and networks have been created with the goal of either integrating proposals from these two branches or proposing database models tailor-made for spatio-temporal applications.

In this context, the MUST project¹ adopts an empirical approach: it intends to study the contributions and limits of temporal and constraint data models with respect to concrete applications that analyze time-varying geographical data. Up to now, two models have been used for this experiment: the constraint database framework DEDALE [7] and the temporal object model TEMPOS [3, 5].

The study reported here is part of this project. It deals more specifically with an application managing data about the use of resources and space over time in the ski resort "Valloire", located in the French Alps. This application is part of a broader study on the use of the resort's infrastructures, which aims at formulating proposals for introducing new activities and improving the use of the existing environment. Concretely, a survey of 200 tourists and inhabitants was conducted in the ski resort. During this survey, people described their activities, companionships and movements during the day(s) preceding the survey. Once this information had been digitized, no adequate ready-to-use tool was found to analyze this kind of data, essentially because of the tight connection between its spatial and temporal components. This was the main reason for choosing this application as a testbed for starting the work around MUST.

¹ Partially funded by GDR CASSINI, the French research network on GIS, MUST gathers geographers from the SEIGAD lab and computer scientists from the CNAM, INRIA and LSR-IMAG labs.
1.2 Application requirements and limitations of existing solutions

From a spatial analysis viewpoint, what is required is to understand the relationship between two multidimensional spaces: (1) a reference space (XYT) and (2) a domain where attributes take their values. These two spaces cannot be merged into a single one because each of them plays a particular role. The solutions that can be envisaged are of two main kinds: (1) approximate this relationship by means of statistical functions, fuzzy sets or AI techniques, or (2) explore exact pattern matchings by way of neural nets or GIS. During our work, we chose the latter solution. However, when developing it, we realized that the spatio-temporal analysis facilities of current GIS are quite poor. Indeed, neither raster nor vector commercial GIS really support spatio-temporal querying based on a unique XYT reference [9, 10].

Raster GIS (or raster modules of vector GIS, such as GRID in Arc Info) are extremely limited regarding spatio-temporal data management. In our ski resort application, each itinerary could be represented by a set of layers where pixels are annotated with the duration of a stay or crossing. Time-dependent attributes, such as the activity or the companionship of people, can be added by introducing still more layers per interviewed person. Since these layers are numeric tables and raster GIS provide matrix operators, it is admittedly possible to compare spatio-temporal itineraries, but not in a simple, comfortable and generalizable way.

Vector GIS, on the other hand, inherit from their underlying DBMS models many drawbacks regarding temporal and, a fortiori, spatio-temporal data management. Indeed, although most modern DBMS provide some built-in temporal datatypes modeling instants, durations and intervals, they still lack facilities for managing time-varying entities and associations. As a result, when the evolution of data must be maintained, its history has to be managed at the application level, which increases the complexity of both data and programs. Temporal data models and DBMS [11, 4], and in particular the TEMPOS data model that we used in our work, aim at overcoming this lack of functionality.

Many research efforts are currently directed towards integrating proposals from spatial and temporal databases, e.g. the TEMPESTA project [10] and the CHOROCHRONOS network [6]. However, there is still much to be done before these proposals are integrated into a spatio-temporal GIS.
2 Application description

The purpose of this application is to produce an overall view of the actions (activities, movements) performed by people in the public areas of the resort and to compare this view with the resort's architecture and resources [2]. More precisely, the goal of the study is to answer questions such as:

- Which groups of people perform which activities and under what conditions?
- Which are the main space-time trajectories? How and when do individual trajectories meet? What interactions between individuals and groups take place?
- How is the flow of individuals' activities organized in space and time? What are the relevant spatial and temporal constraints that apply during the execution of some activities?

We started by gathering spatio-temporal data on individuals' activities. Two hundred interviews allowed us to follow space-time itineraries over a period of one to three days. Interviewed people described their previous day(s) by answering questions such as "what were you doing, when, where, with whom?" and by showing their paths and stays on a map. Each interview also included information on the social status of individuals and the conditions of their sojourn in Valloire. From this information, a set of time series of activities, places and companionships was produced.

The methodology used to define the survey protocol and to analyze the results is based on the Time Geography approach [8]. Traditionally, geographers study change and immobility of entities by comparing their state at different instants, at aggregated and discrete levels. Time Geography, on the other hand, considers space and time as a unique continuous reference in which natural and social processes take place. Time Geography is based upon a view of a permanently moving world where actors must interact with each other and with their environment in order to achieve their goals. Experience shows that, from this perspective, complex relations between individuals and natural and artificial objects become easier to retrieve. In the context of our study, the principles of time geography helped us to compare trajectories, so as to establish profiles of space-time usages.
2.1 Conceptual Modeling

The following simplified statement summarizes the semantics of the data to be modeled: at a given instant, an individual is alone or accompanied, and he/she performs some activity at some location which is part of a possibly broader place. A place has a functionality and a geographical extension. We therefore identify the following entities:

Person: a person (or individual) participating in the survey is uniquely identified. For each person, some "profile" information is stored: age, sex, origin, etc.

Activity: "lunch", "sleep", "walk", etc. are examples of activities. The set of activities is organized in a hierarchy (winter sports is less general than sports and more general than ski).

Companionship: "family", "friends", "kids", etc. are possible companionships of an individual. At a given instant, an individual may have none, one or several kinds of companionships.

Place: the resort's geographical extension is divided into places, the geometry of which may be represented by a point, a polyline or a polygon. Each place is characterized by its functionality: street, shopping center, living accommodation, etc.

Location: a location models some specific zone in the resort, contained in a possibly broader place. For instance, a location may be a piece of street or some part of a mall.

Figure 1 provides a UML class diagram modeling a snapshot view of the application data (i.e. the temporal aspects of data are omitted). The associations whose evolution is to be observed are represented by an ad hoc UML stereotype. A possible definition of this stereotype in the case of the activity association is given in figure 2. To simplify, we omitted in this figure the constraints ensuring that the timestamps of the snapshots composing the history of activities of an individual do not overlap.
[Figure 1: UML snapshot class diagram. Classes shown: Person, Activity, Companionship, Location, Place and Geometry (specialized into Point, Polyline and Polygon). Associations shown: activity, accompaniments, location, geometry, and a {contained in} constraint. Attribute labels visible in the diagram: name, surveyDate, sex, ageCategory, address, description, function, public?.]
[Figure 2: UML stereotype diagram for temporal association. Classes shown: Person, Activity, Snapshot and Interval. Other labels visible in the diagram: name, surveyDate, sex, ageCategory, address, and the role activities with multiplicity *.]

It is worth noting that the conceptual modeling of the application in UML allowed us to point out some flaws in the survey protocol. For instance, the possibility that a surveyed individual refused to describe part of his/her activities was originally not considered, and the question only arose when choosing the cardinalities of the roles in the activity association.

2.2 Queries

The first step of the data analysis aimed at outlining the time-budget of the individuals. To this end, we issued several queries which performed a quantitative analysis of the time spent by individuals on each activity and/or place. A typical example of this kind of query is: "How long do people spend on each activity (and/or each place)?" By analyzing the results of these queries, we were able to classify activities and places in terms of the amount of time people spent on them, and hence to determine which activities (respectively places) are globally the most practiced (respectively visited).

A similar study was conducted to analyze the global spatial behaviour of people and, more precisely, to determine concentrations of individuals over some places at some moment. A typical query issued to this end was: "What is the average time people stay in the commercial zones?" Other similar queries aimed at establishing temporal correlations between activities and places.

The second step focused on finding typical spatio-temporal routines. At this point, we were first interested in establishing an overall partition of the daytime into "slots". A useful query for this purpose was: "What are the typical moments at which individuals change their activity (resp. location)?" After determining such "daytime slots", all the queries formulated during the first step were revisited by restricting them to each of these slots. For instance, the query "How long do people spend on each place?" became: "How long do people spend on each place between 10 a.m. and 1 p.m.?"

Finally, in order to find some internal structure in the daily schedule of individuals, we formulated several queries which retrieved frequent sequences of activities. A typical query of this kind is: "Which is the most frequent sequence of three activities?" Conversely, we also formulated several queries which retrieved the frequency of occurrence of a given sequence of activities.

3 Logical modeling and querying
In this section, we discuss how the TEMPOS data model was used to model and query the data from the application. Throughout the discussion, we point out some shortcomings of TEMPOS that appeared during this experience. We believe that these shortcomings are not specific to TEMPOS, but rather apply to most of the existing temporal data models.

Like other proposed temporal extensions of ODMG [1], TEMPOS is made up of three main components:

- A collection of datatypes modeling temporal values (instants, durations, sets of instants) and temporal associations (histories).
- An extension of ODMG's object model defining temporal counterparts of the concepts of class and property, as well as update operators over them.
- Extensions of ODMG's data definition and query languages (resp. TempODL and TempOQL) integrating the main features of the above models.

In the following paragraph, we use the second of these components to model the data of the "Valloire" application. Later, in section 3.3, we express some of the queries presented in the previous section using TempOQL.
3.1 Temporal properties & semantic assumptions

In ODMG, a property is defined as an attribute or a traversal path of a binary relationship attached to some class. For instance, possible properties of the class Person in the "Valloire" application are name and activities. TEMPOS extends ODMG's notion of property by distinguishing temporal properties from fleeting ones. A property is temporal if its successive values are meaningful and thus recorded; it is fleeting if only its most recent value is meaningful (e.g. attributes name and ageCategory of the class Person depicted in figure 1).

The value of a temporal property is a history, i.e. a function from a finite set of instants observed at a fixed granularity to a set of values of a given type. The domain and the range of a history are respectively called its temporal and structural domain. In the sequel, we distinguish the historical value of a temporal property (which is a history) from its value at some fixed instant.

For a given object, the historical value of a temporal property is built from an observation domain, an effective history, and a semantic assumption. The observation domain is the set of instants during which the modeled phenomenon is intended to be observed. For instance, given an object of the class Person, the observation domain of attribute activities with respect to this object is the set of instants during which the corresponding person described his/her activities. The effective history, on the other hand, is a history whose temporal domain is included in (but not necessarily equal to) the observation domain, and which contains the input timestamped values attached to the property with respect to the corresponding object. Given an object of the class Person, the effective history of attribute activities for this person is exactly the raw time series of activities of this person extracted from the survey (see section 2).

As stated above, the temporal domain of the effective history is included in, but not necessarily equal to, the corresponding observation domain. The difference between these two sets of instants is called the potential domain of a temporal property with respect to an object. A semantic assumption is a characteristic of a temporal property which determines how the values taken by this property at the instants in the potential domain are calculated.
In the current definition of TEMPOS, there are three possible semantic assumptions: discrete, stepwise and linearly interpolated. Under the discrete semantic assumption, the value of a property's history at the instants not described by its effective history is equal to the neutral element of the structural domain type (e.g. 0 if it is integer, Nil if it is Object or a subclass of it, etc.). Under the stepwise semantic assumption, the value of the property's history remains constant between two successive values in the effective history. Attributes activities, companionships and locations of the class Person are examples of stepwise properties. Finally, the linearly interpolated assumption applies only to numerically valued temporal properties and uses a simple linear interpolation between two successive points in the effective history.
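The three semantic assumptions can be illustrated with a small sketch. The Python below is not TEMPOS code: the function name `value_at`, the representation of an effective history as a dictionary, and the use of integer instants are all assumptions made for illustration. For simplicity the sketch ignores the observation domain and returns the neutral element outside the span of recorded values.

```python
from bisect import bisect_right

def value_at(effective, instant, assumption, neutral=None):
    """Value of a history at `instant`, derived from its effective history.

    `effective` maps instants (ints at a fixed granularity) to values.
    `assumption` is one of "discrete", "stepwise" or "linear".
    """
    if instant in effective:
        return effective[instant]
    if assumption == "discrete":
        # Instants not described by the effective history get the neutral element.
        return neutral
    instants = sorted(effective)
    pos = bisect_right(instants, instant)
    if pos == 0 or pos == len(instants):
        # Before the first or after the last recorded value (simplification).
        return neutral
    before, after = instants[pos - 1], instants[pos]
    if assumption == "stepwise":
        # The value remains constant until the next recorded value.
        return effective[before]
    if assumption == "linear":
        # Linear interpolation between the two surrounding recorded points.
        ratio = (instant - before) / (after - before)
        return effective[before] + ratio * (effective[after] - effective[before])
    raise ValueError(f"unknown semantic assumption: {assumption}")
```

With an effective history such as `{0: "ski", 10: "lunch"}`, the stepwise assumption reports "ski" at instant 7, whereas the discrete assumption reports the neutral element.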
3.2 Spatio-temporal semantic assumptions?

Our experience with the "Valloire" application highlighted the need to introduce in TEMPOS semantic assumptions other than the three discussed above, and especially semantic assumptions dealing with spatially-valued temporal properties. Indeed, the attribute location of the class Person is a temporal attribute with a spatial structural domain type (a location has a geometry associated with it). The semantic assumption of this temporal attribute was taken above as stepwise, which means that, for a given object, the value of this property remains constant between two successive instants in the effective history. Clearly, this cannot accurately model continuous and arbitrary movement. Instead, under this approach, a movement over some interval of time is modeled by associating a segment or polyline denoting a path to each instant in this interval.

In the setting of the Valloire application, this fitted the initial requirements well: in the surveys, people represented their movements by polylines whose segments were annotated with time intervals, and no assumption was fixed a priori on the kind of movement people had. However, it became clear at the end of our experience that the application data and queries should be revisited under other semantic assumptions. For example, under some conditions, it would be natural to assume that individuals move at constant speed between the two endpoints of a path. Alternatively, this assumption could be replaced with a more complex one, which takes into account the fact that people in a car move more slowly downtown than on an open road. Still more sophisticated semantic assumptions could be envisaged, but probably none of them really captures the complexity of human displacements. A corollary of this statement is that the answers to queries involving the attribute location in the "Valloire" application are generally not exact.
A possible approach to circumvent this problem would be to formulate such queries under different semantic assumptions, and to compare (and possibly merge) the results thus obtained. To implement this approach, TEMPOS (or any other temporal data model intended to cope with this problem) should be enhanced in two ways:

- Incorporate new semantic assumptions on spatially-valued temporal attributes and/or provide extensibility mechanisms so that application developers may do so.
- Parameterize query expressions by the semantic assumptions of the involved interpolated properties.
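As an illustration of what one such spatial semantic assumption could look like, the sketch below computes the position of an individual along a polyline under the constant-speed assumption mentioned above. It is only a hypothetical example in Python: the function `position_at`, the planar coordinates and the numeric instants are assumptions, not part of TEMPOS.

```python
from math import hypot

def position_at(path, t_start, t_end, t):
    """Position at time t of someone walking `path` at constant speed.

    `path` is a list of (x, y) vertices. The walk starts on the first vertex
    at t_start and ends on the last vertex at t_end.
    """
    if not t_start <= t <= t_end:
        raise ValueError("t lies outside the annotated interval")
    lengths = [hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(path, path[1:])]
    total = sum(lengths)
    if total == 0 or t_end == t_start:
        return path[0]
    # Distance covered so far is proportional to the elapsed time.
    remaining = total * (t - t_start) / (t_end - t_start)
    for (x1, y1), (x2, y2), length in zip(path, path[1:], lengths):
        if length > 0 and remaining <= length:
            f = remaining / length
            return (x1 + f * (x2 - x1), y1 + f * (y2 - y1))
        remaining -= length
    return path[-1]
```

Under the stepwise assumption, a query about any instant of the interval can only return the whole polyline; under this constant-speed assumption it returns a single point, so the two assumptions may yield different answers to the same query.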
3.3 Queries on histories

We now express some of the previously discussed queries in TempOQL, an extension of OQL integrating datatypes such as instant, interval and history together with operators over them. Figure 3 describes the TempOQL operators used in these queries. In addition, we use some basic operators on spatial datatypes and assume that the reader is familiar with them. Most of these queries run on the current version of the TEMPOS prototype. For the sake of readability, some of them have been deliberately simplified.

We consider five fundamental kinds of queries: "history restriction" (a kind of selection), "history join", "succession", "pattern-matching" and "temporal grouping". The first query that we present combines a spatial restriction over a region with a temporal restriction over a set of instants. Such queries usually arise when the user needs to focus on some portion of the temporal data.

Q1: Spatial and temporal restrictions. Let R be a polygon; retrieve each person's locations when she/he is located in R between 10am and 1pm on 12/12/1998.

    select struct (person: p,
                   location: (p.location during [@"12/12/98 10am"..@"12/12/98 1pm"]) as loc
                             when loc.geometry.containedIn (R))
    from ThePersons as p

    { @ is an instant constructor and [i1..i2] denotes an interval bounded by the two instants i1 and i2. "containedIn" is a boolean spatial operator which checks that a spatial object is contained in another one. }
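The two restriction operators used in Q1 can be mimicked on a toy representation of histories. The following Python is a minimal sketch, not the TEMPOS implementation: histories are assumed to be dictionaries from integer instants (minutes since midnight) to values, and `contained_in` is a hypothetical stand-in for the spatial operator, restricted to points and axis-aligned rectangles.

```python
def during(history, instants):
    """Restrict a history (dict: instant -> value) to a set of instants."""
    return {i: v for i, v in history.items() if i in instants}

def when(history, predicate):
    """Keep only the snapshots whose value satisfies the predicate."""
    return {i: v for i, v in history.items() if predicate(v)}

def contained_in(point, rect):
    """Hypothetical spatial test: is the point inside the rectangle?"""
    (x, y), (xmin, ymin, xmax, ymax) = point, rect
    return xmin <= x <= xmax and ymin <= y <= ymax

# A toy location history and the restriction expressed by Q1.
location = {590: (0, 0), 600: (2, 2), 700: (9, 9), 790: (2, 3)}
R = (1, 1, 5, 5)
ten_to_one = range(600, 781)  # closed interval [10am..1pm], in minutes
q1 = when(during(location, ten_to_one), lambda p: contained_in(p, R))
```

The temporal restriction is applied first and the spatial predicate is evaluated inside `when`, which mirrors the way the spatial operator is nested inside the temporal ones in the query expression.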
As may be seen from section 2.2, the queries on the "Valloire" application often involve comparisons of synchronous values taken by several histories. Depending on the query, this may be achieved either by using set operators extended to histories (i.e. intersect and except) or by using the temporal join operator, which combines two histories.

Q2: History combination. Retrieve the pairs of persons who never got closer to each other than 200 meters.

    select *
    from ThePersons as p1, ThePersons as p2
    where p1 != p2 and
          not exists (join (local1: p1.location, local2: p2.location) as c
                      when c.local1.geometry.within ("200m", c.local2.geometry))

    { "exists(h)" is true iff history "h" is not empty }
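The temporal join of Q2 pairs the values that two histories take at the same instants. A minimal Python sketch of this idea follows, assuming (as an illustration only) that histories are dictionaries from instants to planar points and that the `within` test is a plain Euclidean distance comparison; the function names are hypothetical.

```python
from math import hypot

def join(h1, h2):
    """Temporal join: pair the values of two histories at their common instants."""
    return {i: (h1[i], h2[i]) for i in h1.keys() & h2.keys()}

def never_closer_than(loc1, loc2, limit):
    """True if no common instant finds the two locations less than `limit` apart."""
    for (x1, y1), (x2, y2) in join(loc1, loc2).values():
        if hypot(x1 - x2, y1 - y2) < limit:
            return False  # the filtered history would not be empty
    return True
```

Only instants present in both histories are compared, so the answer depends on how the histories are completed between recorded values, i.e. on the semantic assumption discussed in section 3.2.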
Notice that in this expression, there is a temporal join condition (expressed by the TempOQL join construct) and a spatial join condition (embedded in the when clause). More generally, the above two query expressions show how operators on spatial objects may be embedded into temporal operators in order to express some spatio-temporal queries. Two questions arise at this point. First, what kinds of spatio-temporal queries may be expressed using this query expression paradigm, and is the expressive power of this paradigm sufficient for a given kind of application? Second, how are the interactions between temporal and spatial operators handled at the query evaluation level? We believe that answers to these questions are fundamental for integrating spatial and temporal data models.

As stated in section 2.2, the second phase of the data analysis aimed at examining sequences of activities. To express the queries formulated during this phase, it was often necessary to reason about successive values of histories and their correlations. The beforefirst operator on histories provides such a facility. Informally, this operator selects all snapshots preceding the "first" snapshot which satisfies a given predicate.

Q3: Succession. Retrieve those persons who first were at hotel h1, then moved to hotel h2 (for simplicity we assume that h1 and h2 designate instances of the class Location).

    select *
    from ThePersons as p
    where h1 in range (p.location as loc beforefirst loc = h2)

    { h1 is in the range of the history where person p stayed before the first time she/he stayed at h2. }
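The behaviour of beforefirst can be sketched in a few lines of Python. This is an illustrative reading of the informal description above, not the TEMPOS implementation; histories are again assumed to be dictionaries from instants to values, and the function name simply reuses the operator's name.

```python
def beforefirst(history, predicate):
    """Snapshots preceding the first snapshot whose value satisfies the predicate."""
    result = {}
    for instant in sorted(history):
        if predicate(history[instant]):
            break  # first satisfying snapshot reached: stop collecting
        result[instant] = history[instant]
    return result

def went_from_to(location, origin, destination):
    """Rough analogue of the test in Q3, on a toy location history."""
    before = beforefirst(location, lambda loc: loc == destination)
    return origin in before.values()
```

Under this reading, a history in which the predicate is never satisfied is returned whole, so a faithful translation of Q3 might additionally check that h2 occurs in the history.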
More generally, queries about succession in time deal intensively with the sequenced structure of histories. TEMPOS offers a pattern-matching language that takes this structure into account, thereby providing a powerful tool for expressing this kind of query [3]. The main originality of the pattern description language provided by TEMPOS with respect to previous similar approaches lies in the use of regular expression operators, i.e. sequencing (followed by), repetition with or without time constraints (several, at most, at least) and disjunction. TempOQL provides a construct (matches) that checks whether a history contains at least one occurrence of a given pattern.

Q4: Pattern-matching. Retrieve all persons who went shopping, then skied for at least 3 hours and then, at most one hour after ceasing skiing, went shopping again.

    select p
    from ThePersons as p
    where p.activity as a matches
          a.name = "shopping" followed by
          a.name = "skiing" at least ♯"3 hours" followed by
          at most ♯"1 hour" followed by
          a.name = "shopping"
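The regular-expression flavour of such patterns can be conveyed with a rough analogue in Python. The sketch below does not use the TEMPOS pattern language: it assumes a history given as a list of (activity, minutes) runs, encodes it as one character per 30-minute slot (an arbitrary granularity), and relies on Python's standard `re` module. The names `SLOT`, `encode` and `matches_q4` are hypothetical.

```python
import re

SLOT = 30  # minutes represented by one character; an assumed granularity

def encode(runs, codes):
    """Encode (activity, minutes) runs as one character per SLOT minutes."""
    return "".join(codes.get(activity, "o") * (minutes // SLOT)
                   for activity, minutes in runs)

# shopping, then at least 3 h of skiing, then at most 1 h of anything
# other than skiing, then shopping again
Q4_PATTERN = re.compile(r"s+k{%d,}[^k]{0,%d}s" % (180 // SLOT, 60 // SLOT))

def matches_q4(runs):
    """True if the encoded history contains an occurrence of the Q4 pattern."""
    codes = {"shopping": "s", "skiing": "k"}
    return Q4_PATTERN.search(encode(runs, codes)) is not None
```

Encoding time constraints as repetition counts only works at a fixed granularity, which is one reason why a dedicated pattern language with explicit duration bounds is more convenient than plain regular expressions.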
Another class of operators which was quite useful in the context of the Valloire application concerns temporal grouping. These operators structure a history into a history of histories according to some temporal criteria. The following query illustrates a particular kind of grouping based on time units.

Q5: Temporal grouping. For each person and for each day, how long did this person ski?

    select struct (person: p,
                   skied: map duration (partition)
                          on p.activity as a when a.name = "ski"
                          group by Day)
    from ThePersons as p

In the above example, the history of the activities of a person is restricted to those instants when she/he was skiing. The result is grouped by days, and for each generated partition, the duration is computed, yielding a history at the granularity of the day which associates to each day the total amount of time that the person spent skiing.
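The grouping performed by Q5 can be sketched as follows. This Python fragment is only an illustration under stated assumptions: the history is a dictionary from `datetime` instants, one per fixed time step, to activity names, and `duration_by_day` is a hypothetical helper rather than a TEMPOS operator.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def duration_by_day(history, predicate, step):
    """Total time per day during which the value satisfies the predicate.

    `history` maps datetime instants (one per `step`) to values.
    """
    totals = defaultdict(timedelta)
    for instant, value in history.items():
        if predicate(value):
            # instant.date() plays the role of approx(instant, Day)
            totals[instant.date()] += step
    return dict(totals)
```

The result is itself a history at the granularity of the day, which is the shape of the answer described above; days on which the person did not ski simply do not appear in it.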
Operator and specification:

    h during S
        $\{ \langle i, v \rangle \mid \langle i, v \rangle \in h \wedge i \in S \}$

    h as x when P(x)
        $\{ \langle i, v \rangle \mid \langle i, v \rangle \in h \wedge P(v) \}$

    h as x beforefirst P(x)
        $\{ \langle i, v \rangle \mid \langle i, v \rangle \in h \wedge \neg \exists \langle i', v' \rangle \in h \; (P(v') \wedge i \geq i') \}$

    join (a1: h1, a2: h2)
        $\{ \langle i, \langle a_1: v_1, a_2: v_2 \rangle \rangle \mid \langle i, v_1 \rangle \in h_1 \wedge \langle i, v_2 \rangle \in h_2 \}$

    h as x matches P
        There is an occurrence of pattern P in history h (see [3] for a formal definition)

    map f(partition) on H group by U
        $\{ \langle I, f(subh) \rangle \mid \exists I' \in domain(H),\ I = approx(I', U) \wedge subh = H \text{ during } expand(I, Unit(H)) \}$

"domain(h)" yields the domain of history "h" seen as a function; "approx" and "expand" are conversion functions over instants such that approx(@"1/12/98", Month) = @"12/98" and expand(@"12/98", Day) = [@"1/12/98"..@"31/12/98"].

Figure 3: Specification of TempOQL major operators

4 Conclusion

The work presented throughout this paper is the result of a tight collaboration between geographers and database researchers, which led to several mutual contributions. For instance, during the design of the survey protocol (which took place before the collaboration started), some decisions were constrained by the limitations of the statistical tool that was to be used to analyze the results of the survey. The requirements analysis and the UML conceptual modeling of the application developed during this collaboration provide a faithful and detailed model of the data, which could be quite useful for future investigations.

From the querying viewpoint, this experience showed that the operators on histories provided in TEMPOS (which are representative of those provided in most temporal data models) match the needs of the studied application and, perhaps, some of the basic paradigms of time geography. In particular, operators such as the history restriction (during and when) and the history join (join) make it possible to address questions related to the context in which individuals perform their activities, whereas operators for reasoning about succession in time (e.g. beforefirst and matches) are useful when analyzing the chains of activities that individuals perform over time.

We plan to continue our research by studying design and implementation issues related to the management of interpolated histories of spatial objects such as points or regions. As discussed in section 3.2, this is one of the major current limitations of existing temporal data models regarding spatio-temporal data handling. From the querying viewpoint, on the other hand, the limits of the approach that consists in embedding spatial operators into temporal ones for expressing spatio-temporal queries should be studied further.

Acknowledgments. We wish to thank Michel Scholl for his relevant comments on an early draft of this paper.
References

[1] R.G.G. Cattell and D. Barry, editors. The Object Database Standard: ODMG 2.0. Morgan Kaufmann, 1997.
[2] S. Chardonnel. Emplois du temps et de l'espace - Pratique des populations d'une station touristique de montagne. PhD thesis, Université Joseph Fourier, Grenoble (France), January 1999.
[3] M. Dumas, M.-C. Fauvet, and P.-C. Scholl. Handling temporal grouping and pattern-matching queries in a temporal object model. In Proc. of the CIKM International Conference, Bethesda, MD (USA), November 1998.
[4] O. Etzion, S. Jajodia, and S.M. Sripada, editors. Temporal Databases: Research and Practice. Springer Verlag, LNCS 1399, 1998.
[5] M.-C. Fauvet, J.-F. Canavaggio, and P.-C. Scholl. Modelling histories in object DBMS. In Proc. of the 8th Int. Conference on Database and Expert Systems Applications (DEXA), Toulouse (France), September 1997. Springer Verlag, LNCS 1308.
[6] A. Frank and S. Winter. First CHOROCHRONOS Intensive Workshop (CIW'97). Technical Report CH-97-02, http://www.dbnet.ece.ntua.gr/~choros, CHOROCHRONOS, November 1997.
[7] S. Grumbach, P. Rigaux, and L. Segoufin. The DEDALE System for Complex Spatial Queries. In Proc. of the ACM-SIGMOD International Conference on Management of Data, Seattle, USA, June 1998.
[8] T. Hägerstrand. Time geography: Focus on the corporeality of Man, Society and Environment. The Science and Praxis of Complexity, 1985.
[9] G. Langran. Time in Geographic Information Systems. Taylor & Francis, 1993.
[10] D. Peuquet and L. Qian. An integrated database design for temporal GIS. In 7th International Symposium on Spatial Data Handling, August 1996.
[11] A. U. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass, editors. Temporal Databases. The Benjamin/Cummings Publishing Company, 1993.