Inconsistency Quality Concerns For Spatial Database

May 29, 2017 | Author: Barkha Bahl | Category: Computer Graphics, Social Networking, Spatial Databases

Proceedings of the 9th INDIACom; INDIACom-2015; IEEE Conference ID: 35071; 2015 2nd International Conference on "Computing for Sustainable Global Development", 11th-13th March, 2015, Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi (INDIA)

Barkha Bahl
Professor & Director, Delhi Institute of Advanced Studies
Affiliated to GGSIPU, Dwarka, New Delhi-78, INDIA

Abstract- One of the major challenges that Geographic Information System applications face today relates to the quality of data. The better the quality of the data, the more efficient the application. Data quality has multiple dimensions, namely completeness, validity, consistency, timeliness and accuracy, which together make data appropriate for a specific use. Quality refers to the degree of excellence exhibited by the data in portraying actual scenarios. Inconsistency in spatial databases is a major quality concern because it undermines data integrity. This paper deliberates on quality concerns with reference to inconsistencies in spatial data. It has been observed that most inconsistencies are addressed through either algorithms or software. However, inconsistencies arising in the data collection and preservation phase are better handled while designing the database. The paper discusses geometric and topological inconsistencies and their assessment through the proposed Triangular Pyramid Framework.

Keywords- Data Quality, Data Model, Topological inconsistency, Geometrical inconsistency, Triangular Pyramid, Spatial Database

I. INTRODUCTION

Today Geographic Information Systems (GIS) are increasingly employed for a wide range of applications such as comprehensive planning, zoning, transportation, utilities, flood management, urbanization, coastal resource management, mapping of vegetation distributions and their impact on the environment, and the use of demographics in making public policy [8]. Large amounts of spatial data for these applications are derived from map digitizing, aerial photos, GPS data, remote sensing imagery etc. [16], and as a result the spatial data are usually inconsistent. This leads to ineffective spatial analysis, spatial query and spatial decision making. For accurate decision making, quality aspects should be kept in mind at every stage, i.e. data collection and preservation, data inputting and conversion, and data analysis and processing [15]. In this paper we discuss inconsistency issues in detail with respect to data collection and preservation, and evaluate them in the data processing phase through test cases applied to the "Terzatto Tool", developed using the proposed Triangular Pyramid Framework. The framework corrects inconsistency of geographic data in vector format for spatial analysis.

Geometric and topological inconsistencies have been considered for spatial analysis. Geometric inconsistency refers to the geometric part of geographical features (shapes and coordinates). Three kinds of geometric inconsistency errors are (1) repeated point: two or more points in the same position of a feature (line or polygon); (2) repeated segment: two or more segments in the same position of a feature (line or polygon); and (3) overlapping boundaries without the same coordinates: the shared boundary of two features has different vertices, i.e. a different number of vertices or different vertex coordinates.

Topological inconsistency in spatial relations can be defined in terms of the eight topological relations that can be realized between two spatial regions A and B. They are Equals, Disjoint, Meet (Touches), Inside (Within), Contains, Covers (Intersects), Overlaps, and Coveredby (Crosses) [8].

TABLE II. TOPOLOGICAL PREDICATES AND THEIR CORRESPONDING MEANINGS AFTER THE DIMENSIONALLY EXTENDED NINE-INTERSECTION MODEL (FROM [7]).

Topological Predicate: Meaning
Equals: The Geometries are topologically equal
Disjoint: The Geometries have no point in common
Intersects: The Geometries have at least one point in common (the inverse of Disjoint)
Touches: The Geometries have at least one boundary point in common, but no interior points
Crosses: The Geometries share some but not all interior points, and the dimension of the intersection is less than that of at least one of the Geometries
Overlaps: The Geometries share some but not all points in common, and the intersection has the same dimension as the Geometries themselves
Within: Geometry A lies in the interior of Geometry B
Contains: Geometry B lies in the interior of Geometry A (the inverse of Within)
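To make the first two geometric error kinds named in the Introduction concrete, the following minimal Python sketch flags repeated points and repeated segments in the vertex list of a single feature. It is not taken from the paper's tool: the function names, the exact-equality comparison of coordinates (a real dataset might need a tolerance) and the treatment of a closing ring vertex are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def repeated_points(vertices: List[Point], closed: bool = False) -> List[int]:
    """Return indices of vertices whose position already occurred earlier in the feature."""
    seen = set()
    dupes = []
    last = len(vertices) - 1
    for i, p in enumerate(vertices):
        # Assumption: a closed polygon ring may legitimately repeat its first vertex at the end.
        if closed and i == last and p == vertices[0]:
            continue
        if p in seen:
            dupes.append(i)
        seen.add(p)
    return dupes


def repeated_segments(vertices: List[Point]) -> List[int]:
    """Return indices i where segment (v[i], v[i+1]) repeats an earlier segment in either direction."""
    seen = set()
    dupes = []
    for i in range(len(vertices) - 1):
        a, b = vertices[i], vertices[i + 1]
        key = (a, b) if a <= b else (b, a)  # direction-independent key
        if key in seen:
            dupes.append(i)
        seen.add(key)
    return dupes


# Hypothetical sample features (coordinates invented for the example).
line_with_repeat = [(0, 0), (1, 0), (1, 0), (2, 0)]
backtracking_line = [(0, 0), (1, 0), (0, 0), (1, 0)]
```

With these hypothetical inputs, `repeated_points(line_with_repeat)` reports the third vertex, and `repeated_segments(backtracking_line)` reports the second and third segments, since both retrace the first one.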

The objective of this research was to assess the practical usefulness of the proposed framework in supporting database design for GIS applications. Spatial data quality aspects related to geometric and topological

Copy Right © INDIACom-2015; ISSN 0973-7529; ISBN 978-93-80544-14-4


inconsistencies directly affect the correctness and efficiency of spatial analysis, spatial query and spatial decision making for GIS applications. These inconsistencies between two regions can arise in the data acquisition process, when geographic data are collected from different datasets, or in the results of spatial analysis [17]. Owing to the relevance of data quality, its nature, and the variety of data types and information systems, achieving data quality is a complex, multidisciplinary area of investigation. It involves several research topics and real-life application areas [13]. Research issues concern models, techniques, tools, methodologies and frameworks.

In this paper, Section II discusses related work on the available tools, models and techniques for spatial data quality. Section III discusses the methodology for resolving geometric and topological inconsistencies using the Triangular Pyramid Model. Section IV resolves and assesses the inconsistency issues through an indigenous tool. Section V concludes the paper.

II. RELATED WORK

Spatial data quality concerns related to data accuracy, precision, consistency and completeness are key issues in Geographic Information Systems [3]. Research reveals that various models, techniques, methodologies, tools and frameworks have been developed to meet these data quality dimensions [13]. Dimensions are applied with different roles in models, techniques, tools, and frameworks. With reference to the inconsistency dimension, geometric and topological inconsistencies have been handled either by using a node snapping algorithm [14] or through software written in AutoCAD LISP [4]. These works mainly discuss geographical boundary inconsistency [17], caused when the geographical data come from different data sets or from the results of spatial analysis.

The inconsistencies between two adjacent geographic boundaries arise either because the two boundaries have the same number of vertices but not the same coordinates, or because they have different numbers of vertices. To resolve this inconsistency, a generalized node snapping algorithm is used to find matching vertices, and the inconsistent boundaries are standardized by vertical projection. Topological inconsistencies such as intersection, separation and interlaced intersection have been corrected by using Delaunay triangulation [12], which is used to obtain adjacent areas so that topological inconsistencies can be removed. Topological error correction of GIS vector data has been accomplished with AutoCAD VE AutoLISP [6]. It eliminates floating or short lines, overlapping lines, overshoots and undershoots, unclosed and weird polygons, dangle nodes, nodes and pseudo nodes, slivers and gap errors etc. The software automatically checks for these errors and makes the necessary corrections for accurate spatial analysis. In another paper a GeoExpert framework [1] has been proposed for data quality in spatial databases. It is a cleansing tool for spatial data that integrates the spatial data visualization and analysis capabilities of the ArcGIS Engine for an expert system.

All of the above work addresses inconsistencies that can be resolved either algorithmically or through software design. However, the inconsistencies arising during digitization and from multiple inputs may be best resolved at the data model level. Although this concept has been recommended [17], its implementation details were not evident. The next section discusses the proposed Triangular Pyramid Framework, followed by the methodology it adopts for handling inconsistencies while designing the spatial database.

III. TRIANGULAR PYRAMID FRAMEWORK

A. Introduction

A GIS processes spatial information, which is the information derived from spatial data in a database. To work sensibly with these systems, we need models of spatial information as a framework for database design. These models address the spatial and thematic dimensions of real-world phenomena while remaining geometrically and topologically consistent. The proposed framework likewise addresses the spatial and thematic dimensions while remaining consistent with geometric and topological constraints. The data model being developed has three levels of abstraction: the Object Component (highest level), the Geometric Component (middle level), and the Location Component (lowest level) [2]. A diagrammatic representation is shown in Figure 1.

B. The Object Component (Highest Level)

A map is a combination of different types of layers. These maps and layers are called objects, as they are real-life entities having both attributes and behavior. The object level is thus the highest level of abstraction, and at this level geographic data is represented by layers representing the relative position of spatial objects.

C. The Geometric Component (Middle Level)

The Geometric Component is the middle level, as it is the interface between the object and its actual spatial existence on the earth. Each geographic object at the higher level has its corresponding geometric object. The information at the geometric level represents the shape of the geographic object, which falls into one of three categories: point, line or polygon. The first is point data, where each object is associated with a single location, for example a city, district, school or hospital. The second is line data, where the location is described by a string of points, for example a road, river, drainage or national highway. The third is polygon data, where the location of the object is represented by a closed string of coordinates. They are thus


associated with areas over a defined space, for example blocks, villages etc.

D. The Location Component (Lowest Level)

The lowest level of the proposed data model is the location component, which represents the actual screen coordinate values of the geometric objects at the middle level. This third level is what enables inconsistency removal for the spatial data stored using the proposed framework. Section III.E discusses how the framework resolves inconsistencies.

E. Methodology adopted in the Triangular Pyramid Framework for inconsistency removal while designing the spatial database

Spatial features on a map need to be converted into a digital format so that the digitized data can be used for GIS applications. The point, line and area features [6] that form a map are converted into X, Y coordinates and stored in tabular form along with some other attributes during digitization. Digitizing involves manually tracing all features on a map. While digitizing, responses are entered from either the keyboard attached to the computer or the keypad attached to the digitizing table. The resultant database may contain topologically and geometrically inconsistent relations. These inconsistencies can be resolved while transforming the digitized data through the proposed Triangular Pyramid data model.

Existing vector data available in shapefile format has been transformed into a triangular pyramid database model in order to handle the inconsistencies. Topological inconsistencies (disjoint, meet, contains, covers, equals, overlap, inside and coveredby) have been handled by introducing separate and dynamic line, point and polygon databases for each layer in the proposed framework. The framework has introduced three components, i.e. the Object Component, the Geometric Component, and the Location Component. Map information in the object component and point, line and polygon layer information in the geometric component are stored independently in separate relations, which removes topological inconsistencies between two spatial regions.

The lowest level of the proposed data model is the location component, which represents the actual screen coordinate values of the geometric objects at the middle level. In the proposed model, maps with the same extent are considered, i.e. they have the same (Xmin, Ymin) and (Xmax, Ymax) screen coordinates. Screen x, y coordinates are stored in the table by clicking on any location in the layer, which helps maintain geometric consistency while designing the database dynamically. Geometric inconsistencies arise from the geometric part of geographical features (shape and coordinates). Three types of geometric inconsistency, i.e. Repeat Point, Repeat Segment and Repeat Polygon, are removed by storing the location's screen X, Y coordinates and generating a unique id in the common location table (location component), as shown in Table 1, which points to the point, line and polygon tables respectively. A repeated click on the same location will not store the coordinates in the table again once they have been stored. So there will be no repeated point, line or polygon information in the database [14]. The framework has been tested by developing a suitable GIS product, demonstrated in the next section.

IV. DEPLOYMENT OF INDIGENOUS TOOL TO ASSESS INCONSISTENCY THROUGH TRIANGULAR PYRAMID FRAMEWORK

The Triangular Pyramid Framework [2] has been implemented using a data transformation tool, the "Terzatto Tool". The existing vector data in shapefile format available at the National Informatics Centre, GIS department, India, has been successfully brought to the level of prescribed standards. The standards mean that the software should provide a common method to acquire, manage, and display information with no inconsistency (geometric or topological). For this, VB.NET and MapWinGIS have been used, with PostgreSQL for data storage. A GIS-capable tool for assessing the quality of spatial data in terms of inconsistencies has been developed. Currently, the tool can handle spatial data in the shapefile format. After the sample data of Gautam Budh Nagar, Noida, Uttar Pradesh, India, is loaded with MapWinGIS, the map is drawn. On a click, the data is transformed into the Triangular Pyramid Framework. The object-relational schema of the framework is shown in Figure 2.

Fig. 2. Database Design

The Map table in the above schema contains path and title information and is connected to Map2layer. Map2layer contains mapid and lid (layer id). The lid key is in turn connected to the layers table, which contains the layer name (lname) and shape information. Since layers can be point, line or polygon, the data corresponding to each layer can be saved in separate layers to resolve inconsistencies. Details of how the framework helps in resolving inconsistencies are explained in Sections IV.A and IV.B.

A. Resolving and assessing geometric inconsistencies (i.e. No repeated polygon)

Geometric inconsistencies have been handled by including a common reference table in the Triangular Pyramid Framework. The common reference table stores the screen coordinates for the map. It is called common because different maps with a common extent share common screen coordinates. Once the data from a click has been saved in the respective layer database, it should not be saved again on another click on the layer or on the specific location. This has been handled with checks applied to the common reference table and the respective layer database. Handling this type of inconsistency is referred to as no repeat geometric inconsistency.

The working of the tool for resolving and assessing geometric inconsistency has been demonstrated with the data available for Gautam Budh Nagar. On selecting the village layer, the layer is loaded, and on adding a label, it is labeled. Selecting the identify option then shows the information in the data grid, as shown in Fig. 3. Clicking again shows the same data-grid entries. This has been achieved by generating the comloc table with the common identification (cid) as primary key along with the x and y coordinates. The second table, layer, contains layerid (primary key), layername and layershape, and the third table, map2layer, contains mapid and layerid. The schema is shown in Tables 1, 2, and 3.

TABLE II. LAYER TABLE
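The paper does not list the checks that stop a second click from being stored, so the following is only a minimal in-memory Python sketch of one way such a check could work. The class name `PyramidStore`, the method `record_click`, the choice to key the check on the identified feature name, and the sample coordinates are all assumptions; only the comloc columns (cid, x, y) and the village name ARTHALA come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class PyramidStore:
    """Hypothetical in-memory stand-in for the comloc table and one layer table."""
    comloc: Dict[int, Tuple[float, float]] = field(default_factory=dict)  # cid -> (x, y)
    layer_rows: Dict[str, int] = field(default_factory=dict)              # feature name -> cid
    next_cid: int = 1

    def record_click(self, feature: str, x: float, y: float) -> int:
        """Store a click on a feature once; later clicks return the existing cid."""
        # Assumed check: if the feature already has a cid, nothing new is written,
        # even when the click lands on different screen coordinates.
        if feature in self.layer_rows:
            return self.layer_rows[feature]
        cid = self.next_cid
        self.next_cid += 1
        self.comloc[cid] = (x, y)
        self.layer_rows[feature] = cid
        return cid


store = PyramidStore()
first_cid = store.record_click("ARTHALA", 120.0, 85.0)   # first click: stored
second_cid = store.record_click("ARTHALA", 123.5, 90.2)  # repeated click: reused
```

Under these assumptions the second click returns the same cid as the first and the common location table still holds a single row, which is the behaviour the section describes for a repeated click on the same polygon.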

Fig. 3. Resolving Geometric Inconsistencies (no repeated polygon)

The screenshot of village Arthala showing no repeated polygon is presented in Figure 3.

TABLE I. COMMON LOCATION TABLE

TABLE III. MAP2LAYER

The generated cid and id go into the row of the layer table (village layer) where the "ARTHALA" information is stored. When the user clicks on village ARTHALA again, there is no repeated entry in the common location table or in the corresponding row of the village table in the database, because cid is the primary key of the common location table and holds a unique value. Even though x and y differ on the second click, the unique cid cannot have a repeated entry in the database. The table in the database after clicking again is shown below:

TABLE IV. COMMON LOCATION TABLE

This shows that there is no repeated geometric inconsistency in the database, i.e. no repeated polygon, even if x and y are different when clicking again on a different map load.

DDL for Common Location Table

    CREATE TABLE comloc (
        cid integer NOT NULL,
        x double precision,
        y double precision,
        CONSTRAINT "Common reference_pkey" PRIMARY KEY (cid)
    )

B. Resolving and Assessing Geometric Inconsistencies (i.e. No Repeated Line)

When the user selects drainage from the Select Layers option, the drainage (line) layer is loaded, and its label can be added by choosing Add Label from the menu bar. It can be identified by clicking on the identify option. Its information appears in the data grid as shown in Figure 4. On clicking the same drainage again, the values do not repeat. This has been achieved with the same logic applied for polygons in Section IV.A. The screenshot for no repeated lines is shown in Figure 4 and its table entries in Tables 5, 6 and 7.
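The effect of the primary key in the comloc DDL can be tried out with a small Python sketch. As an assumption made purely so the example is self-contained, SQLite (via the standard sqlite3 module) stands in for the PostgreSQL database used by the tool, the column types are SQLite equivalents, and the constraint is renamed to `comloc_pkey`; the helper `try_insert` and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite approximation of the paper's PostgreSQL DDL (types and constraint name adapted).
conn.execute(
    "CREATE TABLE comloc ("
    " cid INTEGER NOT NULL,"
    " x REAL,"
    " y REAL,"
    " CONSTRAINT comloc_pkey PRIMARY KEY (cid))"
)


def try_insert(cid: int, x: float, y: float) -> bool:
    """Insert one row; return False if the primary key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO comloc (cid, x, y) VALUES (?, ?, ?)", (cid, x, y)
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert(1, 120.0, 85.0)    # new cid: stored
second = try_insert(1, 123.5, 90.2)   # same cid, different x and y: rejected
rows = conn.execute("SELECT cid, x, y FROM comloc").fetchall()
```

The key constrains cid only, so this protection relies on the application presenting the same cid for a repeated click. A row with a new cid would be accepted even at identical coordinates, unless an additional uniqueness rule were added, which the paper's DDL does not show.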


Fig. 4. Resolving Geometric Inconsistencies (No Repeated Lines)

The cid, X, Y generated in the comloc table are as shown below.

TABLE V. COMMON LOCATION TABLE

The Layer table entries are as shown below.

TABLE VI. LAYER TABLE

The map2layer table entries are as shown below.

TABLE VII. MAP2LAYER TABLE

C. Resolving Topological Inconsistencies

In the proposed framework, separate database relations are created dynamically for each point, line and polygon layer; hence the inconsistencies related to topological relations between two spatial regions, such as disjoint, covered by and contains, are handled automatically. Meet inconsistency, in contrast, has been resolved by storing the common boundary location information of both blocks in the common location relation with different cids and ids. This inconsistency is assessed when the user selects blocks from the Select Layers option: the block layer is loaded, and its label can be added by choosing Add Label from the menu bar. On identifying, the information of the blocks can be viewed. Meet inconsistency can be checked by clicking on the boundary where the two blocks BISARKHA and DADRI meet; the information of both blocks then appears in the relations, as shown in Tables 8, 9, 10, and 11. Figure 5 demonstrates the handling of meet inconsistency.

Fig. 5. Resolving Topological Inconsistency (Meet)

TABLE VIII. COMMON LOCATION TABLE

The Layer table entries are as shown below.

TABLE IX. LAYER TABLE

TABLE X. MAP2LAYER TABLE

The generated cid and id go into the rows of the blocks table where the information for those blocks is stored.

TABLE XI. BLOCK TABLE (COMMON BOUNDARY INFORMATION)
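The handling of a click on a shared boundary can be sketched in Python as follows. This is an illustrative reconstruction, not the tool's code: the helper names `on_segment` and `blocks_meeting_at`, the tolerance `eps`, and the rectangular outlines given to BISARKHA and DADRI are assumptions. The sketch finds every block whose boundary passes through the clicked point and gives each one its own cid for the same location, as the text describes for the meet case.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def on_segment(p: Point, a: Point, b: Point, eps: float = 1e-9) -> bool:
    """True if p lies on the segment a-b (within eps)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if abs(cross) > eps:
        return False  # p is not collinear with a and b
    return (min(a[0], b[0]) - eps <= p[0] <= max(a[0], b[0]) + eps
            and min(a[1], b[1]) - eps <= p[1] <= max(a[1], b[1]) + eps)


def blocks_meeting_at(p: Point, blocks: Dict[str, List[Point]]) -> List[str]:
    """Names of all blocks whose boundary ring passes through p."""
    hits = []
    for name, ring in blocks.items():
        edges = zip(ring, ring[1:] + ring[:1])  # pair each vertex with the next, closing the ring
        if any(on_segment(p, a, b) for a, b in edges):
            hits.append(name)
    return hits


# Hypothetical outlines: two rectangles sharing the edge x = 4.
blocks = {
    "BISARKHA": [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],
    "DADRI": [(4.0, 0.0), (8.0, 0.0), (8.0, 3.0), (4.0, 3.0)],
}
click = (4.0, 1.5)

comloc: Dict[int, Point] = {}     # cid -> clicked location
block_rows: Dict[str, int] = {}   # block name -> cid
for cid, name in enumerate(blocks_meeting_at(click, blocks), start=1):
    comloc[cid] = click
    block_rows[name] = cid
```

With these made-up outlines, the click on the shared edge yields one common location row per block, each with a different cid, while a click elsewhere on BISARKHA's outline would match that block alone.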


In the comloc table, cid is made the primary key.

    CREATE TABLE comloc (
        cid integer NOT NULL,
        x double precision,
        y double precision,
        CONSTRAINT "Common reference_pkey" PRIMARY KEY (cid)
    ) WITH ( OIDS=FALSE );

When the user clicks on those blocks again, there is no repeated entry in the comloc table or in the corresponding rows of the blocks table in the database, because cid is the primary key of the comloc table and holds a unique value. Even though x and y differ on the second click, the unique cid cannot have a repeated entry in the database. After clicking again on a different form load, the comloc table appears as:

TABLE XII. COMLOC TABLE

The blocks table appears as:

TABLE XIII. BLOCK TABLE (COMMON BOUNDARY INFORMATION)

This shows that there is no topological inconsistency, i.e. at the boundary where two polygons meet, the cid generated is unique at each click even though X, Y are different, and the entries go into their respective rows of the table.

V. CONCLUSION

The research undertaken strengthens the usefulness of the Triangular Pyramid Framework by assessing quality concerns related to topological and geometrical inconsistencies in spatial databases. The Triangular Pyramid Framework has been assessed for resolving inconsistencies with the help of the "Terzatto Tool". The tool has been used to transform and preserve the data available in shapefile format into the Triangular Pyramid Framework. The experimental assessment showed that, for any Geographic Information System application, the quality parameters related to topologically and geometrically consistent data can be made available at the initial data collection and preservation phase. Future research will focus on improving the "Terzatto Tool" further so that the digitized data can be kept directly in the Triangular Pyramid Framework, and so that others may integrate the Triangular Pyramid Framework into their applications to perform operations on consistent data.

REFERENCES

[1] Tadakaluru, A. Karla, A. Ernest, "GeoExpert-A Framework for Data Quality in Spatial Databases", Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation, IEEE. http://doi.ieeecomputersociety.org/10.1109/CIMCA.2005.1631527
[2] Bahl, N. Rajpal, V. Sharma, "Triangular Pyramid Framework for Enhanced Object relational Dynamic Data model for GIS", IJCSI International Journal of Computer Science Issues, vol. 8, issue 1, January 2011, ISSN (online): 1694-0814, pg. 320-328, 2011.
[3] Jun, Li Chengming, Li Zhilin, Chris Gold, "A Voronoi-based 9-intersection model for spatial relations", Int Journal of Geographical Information Science, vol. 15, no. 3, 201-220, Taylor & Francis, 2001.
[4] D.J. Cowen, "GIS versus CAD versus DBMS: what are the differences?", Introductory Readings in Geographic Information System, Taylor & Francis Ltd., Burgess Science Press, London, pp. 52-61, 1990.
[5] L. Warneckle, J. Beattie, C. Kollin, W. Lyday, N. S. French, "GIS In Cities & Counties: A Nationwide Assessment", 1998, URISA. http://www.urisa.org/node/533
[6] L. Xiulin, H. Weigen, A. Shi, T. Junhua, "Raster to Vector Conversion of Classified Remote Sensing Image Zhejiang", Nature Science Foundation (China), pp. 3656-3658, IEEE, 2005.
[7] M. Davis and J. Aquino, "JTS Topology Suite Technical Specifications", Vivid Solutions, Victoria, British Columbia, 2003.
[8] M. Egenhofer, J. Sharma, "Assessing the Consistency of Complete and Incomplete Topological Information", Geographical Systems, vol. 1, pg. 47-68, 1993.
[9] O. Kersting, J. Dollner, "Interactive 3d visualization of vector data in gis", in Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems (ACMGIS 2002), ISBN 1-58113, pg. 107-112.
[10] P. Jeffrey and B. Kate, "Visualization of Spatial Data Quality for the Decision-Maker: A Data-Quality Filter", URISA Journal, vol. 6, no. 2, pg. 25-34, 1994.
[11] R.Y. Wang, D.M. Strong, "What Data Quality Means to Data Consumers", Journal of Management Information Systems 12, 4, 1996.
[12] T. H. Ai, H. H. Wu, "Consistency Correction of Shared Boundary between Adjacent Polygons", Journal of Wuhan Technical University of Surveying and Mapping, Wuhan, 25(5), pp. 426-431, 2000.
[13] V.C. Storey, R.Y. Wang, "An Analysis of Quality requirements in Database Design", in Proc. 4th International Conference on Information Quality (IQ 1998).
[14] W. B. Liu, Z. G. Xia and X. G. Cui, "A New Generalized Algorithm of Node Snapping and a Universal Model of Error Propagating", Acta Geodaetica et Cartographica Sinica, 30(2), pg. 140-147, 2001.
[15] Y. Wang, F. Guo, L. Zhou, H. Wang, C. Ge, "Research on the Quality Control Method of Urban Engineering Geological Database", First International Conference on Information Science and Engineering, 2009, pg. 963-966.
[16] Y. Zheng, "Standardization Guide of Urban Geographic Information System", 1998, BeiJing Science Press, pp. 158-162.
[17] Z. Xie, G. Tian, L. Wu, L. Xia, "A framework for correcting geographical boundary inconsistency", in Proceedings of Geoinformatics, 2010, pg. 1-5.
[18] Y. Wand, R.Y. Wang, "Anchoring Data Quality Dimensions in Ontological Foundations", Communications of the ACM 39, 11, 1996.
[19] Davis M and Aquino J (2003): JTS Topology Suite Technical Specifications. Vivid Solutions, Victoria, British Columbia.

Fig. 1. Triangular Pyramid Framework (from [2])
