Mining spatiotemporal associations using queries

July 3, 2017 | Author: Hana Alouaoui | Category: Data Mining, Database Systems, Databases, Spatio Temporal Data Mining (Data Mining)


International Conference on Information Technology and e-Services 2012

Hana Alouaoui, Sami Yassine Turki, Sami Faiz
LTSIRS Laboratory of Remote Sensing and Spatial Information Systems, ENIT National Engineering School of Tunis, Tunis El Manar University, Tunis, Tunisia

Abstract: In this paper, we present our approach for mining spatiotemporal knowledge. The proposed method is based on the computation of neighborhood relationships between geographical objects during a time interval. This kind of information is not explicitly stored in a spatio-temporal database and is extracted by means of special mining queries enriched with time management parameters. The general aim of our approach is to develop a method that uses the inherent structure of spatiotemporal information, as well as its rich semantics, to derive spatio-temporal association rules in order to improve the decision-making process about land changes and the resulting risks.

Keywords: spatio-temporal data mining; data mining query languages; spatio-temporal predicates.

I. INTRODUCTION
The wide availability of huge databases, rich in hidden information that is beyond human ability to retrieve manually, and the pressing need to extract information and knowledge from such data have demanded considerable effort from the scientific community. Exploring these huge data sets with existing querying techniques is a challenging task. To alleviate this problem and offer additional tools for analysis, data mining and knowledge discovery in databases (KDD) provide techniques to extract useful, implicit information from large databases [7].

However, these techniques have mostly been applied to transactional and relational data. On a lesser scale, spatial applications have been explored in a minority of research works [12], [1], [17]. The comparatively small number of data mining techniques available for spatial and temporal information systems can be explained not only by the more widespread use of other data types, but also by the complexity of spatial and spatio-temporal data compared with relational data. Consequently, extracting interesting and useful patterns from spatial and temporal data sets is more difficult than extracting the corresponding patterns from traditional data, owing to the complexity of spatial and temporal data types and to spatial relationships that change over time.

The present research proposes an approach for mining spatio-temporal association rules (STAR). These rules will be useful in improving the decision-making process related to risk prediction. To reach this goal, we must take into account the spatio-temporal knowledge derived in the current phase of the work. The contribution presented in this paper is to process the spatiotemporal components by computing neighborhood relationships between geographical objects during given time intervals. This step is achieved by spatial data mining queries enriched with time management tools.

This paper is organized as follows: in Section II, we introduce the spatio-temporal concepts and give an overview of the existing data mining query languages used in KDD. Section III describes our proposed approach for mining spatio-temporal knowledge.

II. SPATIO-TEMPORAL KNOWLEDGE EXTRACTION
A. Spatio-temporal Knowledge
Spatio-temporal data is usually modeled by extending temporal databases or spatial databases. This extension can be achieved in two ways: spatial properties and operations can be added to temporal databases, or temporal properties and operations can be added to spatial databases.

In addition, spatio-temporal data is usually complex because it merges the concepts of time and space, which makes it difficult to process spatial relationships that evolve over time. Furthermore, spatio-temporal data mining is distinguished from classical, general data mining by special characteristics such as the kinds of rules that can be mined from it, similar patterns of change, and spatio-temporal evolution patterns.

1) Spatio-temporal database
A spatio-temporal database (STDB) is a powerful tool that embodies spatial, temporal and spatio-temporal concepts [20]. An ideal STDB, in addition to the normal functions of a spatial database, can also keep track of data that has changed over time.

2) Spatio-temporal object
A spatio-temporal object can be represented as a four-dimensional vector (ai, gi, pi, ti), where ai, gi, pi and ti describe attributes, geometrical positions, topological relations and temporal dimensions, respectively [11]. Each element of this vector can itself have several dimensions: the geometrical position can have from one to three dimensions, and the temporal dimension can have one, two (bi-temporal) or many (multi-temporal) dimensions [19].
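The representation of a spatio-temporal object as the vector (ai, gi, pi, ti) can be illustrated with a small record type. This is only a sketch: the class name `STObject`, the field names, the choice of a single valid-time dimension and the sample parcel are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class STObject:
    """Hypothetical record for the vector (a_i, g_i, p_i, t_i)."""
    attributes: Dict[str, object]        # a_i: descriptive attributes
    geometry: List[Tuple[float, float]]  # g_i: geometrical position (2-D vertices here)
    topology: Dict[str, List[str]] = field(default_factory=dict)  # p_i: relation -> related object ids
    valid_time: Tuple[date, date] = (date.min, date.max)          # t_i: one temporal dimension

    def valid_during(self, start: date, end: date) -> bool:
        """Return True if the valid time overlaps the interval [start, end]."""
        return self.valid_time[0] <= end and start <= self.valid_time[1]


# Hypothetical land parcel that exists from 1999 to 2002.
parcel = STObject(
    attributes={"name": "parcel-1"},
    geometry=[(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)],
    topology={"touches": ["parcel-2"]},
    valid_time=(date(1999, 1, 1), date(2002, 12, 31)),
)
```

A bi-temporal variant would simply carry a second interval (for example a transaction time) next to `valid_time`.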

B. Data mining query languages for KDD
A number of contributions have dealt with different aspects of this problem by proposing structured languages for KDD specification. These languages follow SQL patterns and provide techniques for data preprocessing such as accessing, cleaning, transforming, deriving and mining data [3].

These languages can integrate background knowledge, such as concept hierarchies, and can define thresholds (e.g. support and confidence in the case of association rule extraction) in order to extract only the most interesting patterns [3]. A Data Mining Query Language (DMQL) was proposed in [9] for mining association rules using concept hierarchies [10] as background knowledge.

In [18], the authors based their work on a new operator, named MINE RULE, designed as an extension of the SQL language for discovering association rules.

Other languages have been built on the principles of relational databases [14], [21], [16]. They follow the SQL patterns, with resources for accessing, cleaning, transforming, deriving and mining data, in addition to knowledge manipulation.

Knowledge discovery from spatial databases is another important field that nevertheless needs more attention. Complex data types, intrinsic relations between spatial and non-spatial components, and relationships among the data themselves make spatial data mining more difficult. This explains the small number of data mining query languages that have been proposed for spatial data.

The GMQL (Geographic Mining Query Language) proposed in [8] is an extension of DMQL that supports spatial data mining.

Another approach, based on the transformation of a spatial database into an inductive one, was proposed in [17]. However, the proposed language requires complex data preprocessing tasks in order to formulate the queries.

A spatiotemporal data mining query language was proposed in [4]. This language, SQLST, sees reality as instantaneous sequences of moving objects [15], [2], [6] and is limited to mining knowledge from trajectories evolving in space and time. It is built on the basis of a temporal data mining query language proposed in [5].

All of these languages deal with traditional, temporal, or spatial data, and they treat space and time separately. The proposed languages that merge the space and time aspects are limited to the trajectories of moving objects. To the best of our knowledge, no data mining query language has been proposed to cope with the discrete evolution of spatial data over time. Our problem is to mine knowledge from discretely evolving objects, such as parcels or rivers whose shape changes over long time intervals. Our proposed queries rely on a combination of spatial mining queries and time features.

III. PROPOSED APPROACH FOR MINING SPATIOTEMPORAL KNOWLEDGE
Our approach is a spatiotemporal data mining process that aims to mine spatiotemporal association rules. According to our proposal, the determination of the STAR goes through three phases:

Phase 1: Calculation of spatio-temporal predicates (spatial relationships between geographical entities over time).

Phase 2: Generation of frequent item sets: an item set is frequent if its support is at least equal to a minimum threshold (minsup).

Phase 3: Extraction of spatiotemporal association rules.

In this research, we focus on the first phase, which is typically considered the largest one, because the effectiveness and efficiency of the extracted rules depend on these relationships.

To enhance this first phase of ST knowledge extraction, we must deal with several requirements. We need to represent the objects taking into account their positions in space and their existence in time.

We also need to specify the type of application we are interested in. In our case, we do not study continuous change, which results in motion (for example, the position of a moving person). Instead, we treat the case of discrete change, such as a change in the shape of objects over time. For example, the position of a land parcel can change when a new parcel is attached to it: the shape change causes a position change.

These points are important in our study and facilitate the computation of spatiotemporal associations (spatial relationships among objects in time).

A. Computing neighborhood relationships during time intervals
This phase, summarized in Fig. 1, is the first step of our proposal. We start by specifying the reference object [13] on which the discovery will be performed and the relevant task objects, which are related to the reference object through spatial relationships. The reference object and the relevant objects are different tables of the ST-database. The result of this phase is illustrated by the examples in TAB. 1 and TAB. 2. Each instance of the reference object is represented by one row (e.g. the instances Refobj1-1 and Refobj1-2 of the reference object Refobj1), and the columns are the spatiotemporal predicates searched by the query. They express the neighborhood relationships between the reference object and the relevant task objects computed during a time interval (e.g. CLOSE-TO-refobj1-1-relvob1-T1).

Our architecture is based on a Query Interpreter Module (Q.I.M), which contains the spatiotemporal queries, and a Spatiotemporal Feature Extractor Module (STFEM), which organizes and stores the knowledge generated by the Q.I.M. The spatiotemporal predicates to be mined must have the following form, indicating the topological relationship between two geographical objects A and B during the time T:

ST_Spatial_relationship (A, B, T)

E.g. ST_DISJOINT (A, B, T), ST_CLOSE_TO (A, B, T), etc.
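A predicate of the form ST_Spatial_relationship (A, B, T) can be held as a small immutable record and rendered back into that textual form. The sketch below is illustrative only; the type name `STPredicate` and the method `as_term` are hypothetical.

```python
from typing import NamedTuple


class STPredicate(NamedTuple):
    """Hypothetical container for one spatiotemporal predicate."""
    relation: str  # spatial relationship, e.g. "DISJOINT" or "CLOSE_TO"
    a: str         # reference object
    b: str         # relevant task object
    t: str         # label of the time interval

    def as_term(self) -> str:
        """Render the predicate as ST_<relation>(A, B, T)."""
        return f"ST_{self.relation}({self.a}, {self.b}, {self.t})"


close = STPredicate("CLOSE_TO", "A", "B", "T")
print(close.as_term())  # ST_CLOSE_TO(A, B, T)
```

Because the record is hashable, predicates of this kind can be stored in sets or used as dictionary keys when they are later grouped per reference object.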


Figure 1: The proposed architecture for ST-predicate discovery

1) Query interpreter module
The aim of this module is to extract spatio-temporal predicates (spatio-temporal associations). This objective is accomplished by spatiotemporal queries.

a) ST-Query for ST-predicates extraction
The proposed query has the following form and is used to compute the ST-predicates (ST-associations) describing the neighborhood relationships during a given time interval. The nature of the relationship is defined on the basis of distance measures. Those measures are given by domain experts.

Select Att1_X_t1, Att2_X_t1, ..., Attn_X_t1,
       Nom_Y as Rel_vois_X_Y_t1,
       Nom_Z as Rel_vois_X_Z_t1
From "TableX" X, "TableY" Y, "TableZ" Z
Where contains(buffer(X.the_geom, distance), Y.the_geom)
  and contains(buffer(X.the_geom, distance), Z.the_geom)
  and X.Xvalidtime between 'yyyy-mm-dd' and 'yyyy-mm-dd'
  and Y.Yvalidtime between 'yyyy-mm-dd' and 'yyyy-mm-dd'
  and Z.Zvalidtime between 'yyyy-mm-dd' and 'yyyy-mm-dd';

Validtime is the time interval verifying the validity of the computed features. Each type of neighborhood relationship is computed during many time intervals in order to capture the generated changes.

TAB. 1 SAMPLES OF NEIGHBORHOOD RELATIONSHIPS COMPUTED DURING A GIVEN TIME INTERVAL T1

Columns: reference object instance; Close-to-t1; Intersects-t1; Near-to-t1
Refobj1-1: Relvobj 1; Relvobj 5; Relvobj 6
Refobj1-2: Relvobj 3; Relvobj 8, Relvobj 6
Refobj1-3: Relvobj 9
...
Refobj1-n: Relvobj 4; Relvobj 10

TAB. 2 SAMPLES OF NEIGHBORHOOD RELATIONSHIPS COMPUTED DURING A GIVEN TIME INTERVAL T2

Columns: reference object instance; Close-to-t2; Intersects-t2; Near-to-t2
Refobj1-1: Relvobj 1; Relvobj 5; Relvobj 6
Refobj1-2: Relvobj 3; Relvobj 2; Relvobj 10
Refobj1-3: Relvobj 9
...
Refobj1-n: Relvobj 8

b) Case study
We want to compute the CLOSE-TO, INTERSECT and NEAR-TO neighborhood relationships during three time intervals (TAB. 3). To do so, we apply three queries for each relationship, one per time interval, for a total of nine queries. In the following examples, Town is the reference object, and River and Parc are the two relevant task objects. The spatio-temporal predicates are computed during the three time intervals [1999; 2002], [2003; 2006] and [2007; 2010].
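The count of nine queries follows from pairing each of the three relationships with each of the three time intervals. A minimal sketch of that enumeration is given below; the function name `query_plan` and the mapping of year bounds to 1 January and 31 December are assumptions made for illustration, not details given in the paper.

```python
from itertools import product

RELATIONS = ["CLOSE_TO", "INTERSECT", "NEAR_TO"]
INTERVALS = [(1999, 2002), (2003, 2006), (2007, 2010)]


def query_plan(relations, intervals):
    """Return one parameter set per (relationship, time interval) pair."""
    return [
        # Assumption: an interval [y1; y2] runs from 1 January y1 to 31 December y2.
        {"relation": rel, "start": f"{start}-01-01", "end": f"{end}-12-31"}
        for rel, (start, end) in product(relations, intervals)
    ]


plan = query_plan(RELATIONS, INTERVALS)
print(len(plan))  # 9
```

Each entry supplies the two dates that would be substituted into the valid-time conditions of the corresponding query.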

In the ST-query, X is the reference object, and Y and Z are the relevant task objects. Rel_vois_X_Y_t1 and Rel_vois_X_Z_t1 are the neighborhood relationships valid during the time interval t1.

Further predicates belonging to other time intervals can be computed simply by changing the dates of the validity interval in the query.

Att1_X_t1, Att2_X_t1, ..., Attn_X_t1 (the attributes listed in the Select clause of the ST-query) are the other descriptors of X at t1.

2) ST feature extractor module
The results of the queries are collected and organized into spatiotemporal predicates that describe the dependencies between spatial objects computed during time intervals.
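One way to picture this organizing step is to group raw query results into one row per reference object instance, with one column per relationship and time interval, as in the first row of TAB. 1. The sketch below is an assumption about how such grouping could be done; the function name `organize` and the tuple layout of `results` are hypothetical.

```python
from collections import defaultdict


def organize(results):
    """Group (reference, relation, interval, relevant) tuples into one row per reference instance."""
    table = defaultdict(lambda: defaultdict(list))
    for ref, relation, interval, relevant in results:
        # The column name combines the relationship and the time interval label.
        table[ref][f"{relation}-{interval}"].append(relevant)
    return {ref: dict(columns) for ref, columns in table.items()}


# Values taken from the first row of TAB. 1.
results = [
    ("Refobj1-1", "Close-to", "t1", "Relvobj 1"),
    ("Refobj1-1", "Intersects", "t1", "Relvobj 5"),
    ("Refobj1-1", "Near-to", "t1", "Relvobj 6"),
]
rows = organize(results)
```

Keeping a list per cell allows a reference object to be related to several relevant objects in the same interval.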

In the ST-query, contains(buffer(X.the_geom, distance), Z.the_geom) is a predefined function used to compute the relationship on the basis of distance measurements. For example, two objects Oi and Oj have a closeTo relationship if the distance between them is ≤ ε, where ε is a user-specified distance threshold.
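For point geometries, testing whether a buffer of radius `distance` around one object contains another reduces to checking that their Euclidean distance is at most `distance`. The sketch below mirrors the ST-query under that simplifying assumption, in plain Python rather than in a spatial DBMS; the record layout, the sample coordinates and the threshold are hypothetical.

```python
from datetime import date
from math import hypot


def within_buffer(p, q, distance):
    """Point analogue of the buffer-containment test: distance(p, q) <= distance."""
    return hypot(p[0] - q[0], p[1] - q[1]) <= distance


def neighbors(reference, relevant, distance, start, end):
    """Yield (reference name, relevant name) pairs that pass the distance test
    and whose valid-time stamps both fall inside [start, end]."""
    for ref in reference:
        if not (start <= ref["validtime"] <= end):
            continue
        for rel in relevant:
            in_time = start <= rel["validtime"] <= end
            if in_time and within_buffer(ref["geom"], rel["geom"], distance):
                yield ref["name"], rel["name"]


towns = [{"name": "town-1", "geom": (0.0, 0.0), "validtime": date(2000, 6, 1)}]
rivers = [
    {"name": "river-1", "geom": (30.0, 40.0), "validtime": date(2001, 1, 1)},    # 50 units away
    {"name": "river-2", "geom": (300.0, 400.0), "validtime": date(2001, 1, 1)},  # 500 units away
    {"name": "river-3", "geom": (3.0, 4.0), "validtime": date(2005, 1, 1)},      # outside the interval
]
pairs = list(neighbors(towns, rivers, 100.0, date(1999, 1, 1), date(2002, 12, 31)))
```

With real polygon geometries the distance test would be delegated to the spatial functions of the database, as in the query itself.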

Example:
@relation 'Town_neighbor'
@attribute RIVER_CLOSE_TO_99_2002 {no}
@attribute RIVER_INTERSECTS_99_2002 {no}
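Header lines of this kind, which follow the ARFF relation and attribute syntax, can be produced mechanically from a list of predicate names. The helper below is a hypothetical sketch; the attribute names and the nominal domain {no} are copied from the example, and nothing else about the authors' export step is assumed.

```python
def arff_header(relation, attributes):
    """Build the @relation and @attribute lines for nominal attributes."""
    lines = [f"@relation '{relation}'"]
    for name, values in attributes:
        domain = ",".join(values)  # nominal values listed inside braces
        lines.append(f"@attribute {name} {{{domain}}}")
    return "\n".join(lines)


header = arff_header(
    "Town_neighbor",
    [
        ("RIVER_CLOSE_TO_99_2002", ["no"]),
        ("RIVER_INTERSECTS_99_2002", ["no"]),
    ],
)
print(header)
```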

After 3 years, the distance has changed because the town has gradually grown toward the river. This can be explained by the expansion of the town and the increasing number of buildings in the zone near the river. As a consequence, the river is no longer NEAR_TO the town but is CLOSE_TO it (0m
