Quality assessment for geo‐spatial objects derived from remotely sensed data


International Journal of Remote Sensing Vol. 26, No. 14, 20 July 2005, 2953–2974

QINGMING ZHAN*†‡§, MARTIEN MOLENAAR‡, KLAUS TEMPFLI‡ and WENZHONG SHI§

†School of Urban Studies, Wuhan University, Wuhan 430072, China
‡International Institute for Geo-Information Science and Earth Observation (ITC), 7500 AA Enschede, The Netherlands
§Advanced Research Centre for Spatial Information Technology, Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong, China

Airborne laser scanners and multi-spectral scanners provide information on height and spectra that offers exciting possibilities for extracting features in complicated urban areas. We apply an object-based approach to building extraction from image data that differs from conventional per-pixel approaches. Since image objects are extracted based on the thematic and geometric components of objects, quality assessments have to be made object-based with respect to these components. The known per-pixel methods for assessing quality, and their limitations, have been examined in this new situation. A new framework for carrying out quality assessments by measuring the similarity between the results of feature extraction and reference data is proposed in this paper. The proposed framework consists of both per-object and per-pixel measures of quality, thus providing qualitative and quantitative measurements of object quality from thematic and geometric aspects. The proposed framework and measures of quality have been applied to an assessment of the results of object-based building extraction using high-resolution laser data and multi-spectral data in two test cases. The results show that the per-object method of assessing quality provides information additional to that of conventional per-pixel, attribute-only assessment methods.

1. Introduction

Many classifiers have been widely applied in traditional per-pixel classifications, such as the maximum likelihood classifier (MLC), the k nearest-neighbour classifier (k-NN), and the neural network classifier (NN). In per-pixel classification, the individual pixel is treated as the basic unit throughout the whole process: the selection of samples, the training of classifiers, the making of classifications, the preparation of reference data, and the assessment of accuracy and uncertainty. Many efforts have been made in accuracy and uncertainty assessment that are based mainly on per-pixel approaches (Congalton and Mead 1983, Janssen 1994, Congalton and Green 1999, Skidmore 1999, Foody 2000). In these approaches, only uncertainty at a particular location (per-pixel) of the variable has been discussed. Many applications, however, require predictions about multi-pixel regions (objects), and issues of

*Corresponding author. Email: [email protected]

International Journal of Remote Sensing
ISSN 0143-1161 print/ISSN 1366-5901 online © 2005 Taylor & Francis Group Ltd
http://www.tandf.co.uk/journals
DOI: 10.1080/01431160500057764

2954

Q. Zhan et al.

uncertainty become more complicated in such circumstances (Dungan 2002). In our cases, objects are evaluated and detected as image regions in an object-based approach. Thus, we are interested in additional per-object measures of quality that are valid for objects rather than for individual pixels. Such measures can be derived based on the different properties of objects, such as size, location, and so forth, and we should consider per-object measures of classification such as the correctness and completeness of the extracted objects.

Per-object quality assessment can be traced back to the 1970s and early 1980s, when the visual interpretation method was mainly used. In these cases, manually delineated polygons were used as sample units for quality assessment. This assumes that the polygons are homogeneous by crown closure and land cover type class, accurately delineated, and free of errors of omission, and so an object-by-object comparison is often applied to assess quality (Congalton and Green 1999). Per-field classification uses GIS data to delineate the boundaries of objects in an image, and the assumption is that objects have the same boundaries in both the GIS and the remote sensing data. Quality assessments of per-field classifications are often made by a direct pixel-by-pixel comparison (Curran and Williamson 1988, Pedley and Curran 1991, Aplin et al. 1999). In these two cases, polygons and fields were used as 'hard' sample units (objects), with boundaries considered 'static' or unchangeable and interiors considered homogeneous. In our object-based approach, however, object boundaries are obtained via the evaluation of membership functions, so they vary when different characteristics and parameters are applied. We need to know not only whether the expected objects have been correctly extracted (classification quality) but also whether their spatial extent and location (geometric quality) are accurately identified.
Therefore, in this study, we have to pay more attention to assessments of quality using both pixel-by-pixel and object-by-object approaches with regard to objects that are automatically or semi-automatically extracted from digital images using object-based approaches, and in terms of various object properties. The term 'object' used in this paper refers to the raster representation of entities in the real world, such as buildings, roads, and lakes. We also use the terms 'image objects' or 'image regions' to refer to the raster representation of real-world entities. Since high-resolution data with a spatial resolution of 1 m are used in both study cases presented in this paper, we assume that our target objects are larger than a pixel (1 m2). In fact, we consider that a building should contain at least a room, which should have a minimum size of 10 m2 (about 3 m by 3 m), or contain at least 10 pixels in both study cases.

The error matrix or confusion matrix is often used to compute measures of quality such as user's accuracy, producer's accuracy, overall accuracy, and the Kappa coefficient for assessing the quality of classification results obtained by visual interpretation or by using a per-pixel approach. In the case of a visual interpretation, operators usually interpret an object, delineate the boundaries of the object, and then label it according to a designated class. To assess the quality of the interpretation, a field visit is normally made to check whether the assigned classes are correct for some randomly selected sample objects, and to count the number of objects that are correctly classified and the number of objects that are misclassified for each class. The results of a quality assessment are represented as an error matrix in order to compute the specified measures of quality.
In the case of the digital classification of images by computer, the results of a quality assessment are also represented by an error matrix in order to compute the specified measures of

Integrated remote sensing and GIS

2955

quality, but often using randomly selected sample pixels or other sampling methods. In the former case, the obtained measures of quality indicate the quality of the classification from the perspective of the objects (per-object). In the latter case, the obtained measures of quality indicate the quality of the classification in relation to the locations of the objects (per-pixel). It has been recognized that geo-spatial data may contain uncertainties due to errors in value, space, time, consistency, and completeness (Sinton 1978, Chrisman 1991, Gahegan and Ehlers 2000). In this paper, we will not touch upon issues of quality that are caused by temporal factors. In our object-based land-cover and land-use classification, we consider that the acquired objects (extracted and classified from images) contain both classification errors and geometric errors. Assessments of quality concerning classification errors can be divided into two aspects: correctness and completeness. Correctness measures the percentage of extracted objects that are correctly classified. Completeness measures the percentage of objects in the real world, as given by the reference data, that are explained by the extracted objects. Geometric errors can be divided into two categories: errors in terms of an object's location and errors in terms of the spatial extent of an object. We define the per-object error of location as the positional difference between the centre of mass of an extracted object and the centre of mass of the same object in the reference data. In the following sections of this paper, we will discuss classification and geometric measures of per-object quality. Classification quality will be measured mainly by per-object measures with regard to different object properties. The geometric quality of objects will be measured by per-pixel measures that are mainly concerned with the spatial extent of objects.
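The per-object location error just defined can be sketched in a few lines, assuming for illustration that a raster object is represented as a set of (x, y) pixel coordinates; the function names are ours, not the paper's:

```python
def centroid(pixels):
    """Centre of mass of a raster object given as a set of (x, y) pixel coordinates."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)

def location_error(extracted, reference):
    """Euclidean distance between the centres of mass of an extracted object
    and the same object in the reference data (the per-object location error)."""
    (xc, yc), (xr, yr) = centroid(extracted), centroid(reference)
    return ((xc - xr) ** 2 + (yc - yr) ** 2) ** 0.5

# A 3x3 extracted object shifted one pixel to the right of its reference object:
ext = {(x, y) for x in range(1, 4) for y in range(3)}
ref = {(x, y) for x in range(0, 3) for y in range(3)}
print(location_error(ext, ref))  # 1.0
```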
Both measures of quality will be utilized in our unified framework, which is proposed and developed based on a comparison of the similarities between extracted objects and objects in the reference data by applying Tversky's feature contrast model (Tversky 1977). A short review of the error matrix and related measures, as well as certain limitations, is presented and discussed in Section 2. The proposed framework, which is expected to utilize per-object and per-pixel measures of quality, is presented and discussed in Section 3. A brief introduction of the object-based building extraction approach and an assessment of the quality of the extracted buildings of two test sites are presented in Section 4. The paper closes with a discussion and concluding remarks.

2. A review and analysis of existing methods of assessing quality

Quality is a very broad issue that may relate to a variety of properties. Most frequently, the property of interest is the accuracy of maps or classifications (Foody

Table 1. Error matrix for assessing quality.

                       Reference data
Classified data      A          B          C          Total     User's accuracy
A                    nAA        nAB        nAC        nA+       nAA/nA+
B                    nBA        nBB        nBC        nB+       nBB/nB+
C                    nCA        nCB        nCC        nC+       nCC/nC+
Total                n+A        n+B        n+C        n
Producer's accuracy  nAA/n+A    nBB/n+B    nCC/n+C


2000). The accuracy of classification is typically taken to mean the degree to which the derived thematic map agrees with reality (Campbell 1996). The error matrix or confusion matrix defined in table 1 is a popular means for assessing the quality of classification results. Based on the error matrix, a number of measures of quality can be derived, such as overall accuracy, user's accuracy, producer's accuracy, and the Kappa coefficient:

$$\text{User's accuracy} = \frac{n_{kk}}{n_{k+}} \qquad (1)$$

$$\text{Producer's accuracy} = \frac{n_{kk}}{n_{+k}} \qquad (2)$$

$$\text{Overall accuracy} = \frac{1}{n}\sum_{k=1}^{m} n_{kk} \qquad (3)$$

$$\text{Kappa coefficient} = \frac{n\sum_{k=1}^{m} n_{kk} - \sum_{k=1}^{m} n_{k+}\,n_{+k}}{n^{2} - \sum_{k=1}^{m} n_{k+}\,n_{+k}} \qquad (4)$$
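All four measures can be computed from any square error matrix. The sketch below is an illustration (the function name is ours, not the paper's); the 2 × 2 matrix used reproduces the figures quoted for the building example in table 2:

```python
def accuracy_measures(matrix):
    """User's/producer's accuracy per class, overall accuracy and Kappa
    from a square error matrix (rows = classified, columns = reference)."""
    m = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]                             # n_k+
    col_tot = [sum(matrix[i][k] for i in range(m)) for k in range(m)]  # n_+k
    diag = sum(matrix[k][k] for k in range(m))
    ua = [matrix[k][k] / row_tot[k] for k in range(m)]                 # eq. (1)
    pa = [matrix[k][k] / col_tot[k] for k in range(m)]                 # eq. (2)
    oa = diag / n                                                      # eq. (3)
    chance = sum(row_tot[k] * col_tot[k] for k in range(m))
    kappa = (n * diag - chance) / (n * n - chance)                     # eq. (4)
    return ua, pa, oa, kappa

# Building / non-building error matrix from table 2:
ua, pa, oa, kappa = accuracy_measures([[26, 9], [7, 958]])
print(round(ua[0], 3), round(pa[0], 3), oa, round(kappa, 3))
# 0.743 0.788 0.984 0.756
```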

2.1 Need to assess the quality of single-class cases

A single-class case deals with the situation in which only one class of objects is of interest for detection, such as buildings or vegetation, while other objects are not considered interesting. To examine whether the known measures of quality are still applicable in our object case, we assume a possible test site where 70% of the area is water and 30% is land, with buildings covering only 3%. We take building extraction as an example of a classification. Figures such as those presented in table 2 are likely to be obtained from 1000 randomly selected samples; the elements of the error matrix are computed as shown in the table. A problem can be observed from these figures: a very large number of pixels fall in the cell representing non-buildings in both the classified data and the reference data, due to the large portion of water and other non-building areas (e in table 2). This indicates that the objects of interest cover only a small portion of the scene. Samples falling in areas that do not contain objects will not make much sense in an

Table 2. Error matrix for assessing the quality of extracted buildings from a possible test site, based on 1000 randomly selected samples.

                       Reference data
Classified data      Building     Non-building   Total   User's accuracy (%)
Building             26 (a)       9 (b)          35      74.3 (c)
Non-building         7 (d)        958 (e)        965     99.3 (f)
Total                33           967            1000
Producer's accuracy  78.8% (g)    99.1% (h)

Overall accuracy = 98.4%; Kappa = 75.6%.


assessment of quality, since we are only interested in the extracted objects. As a consequence, the error matrix shows an overestimated overall accuracy. The very high user's accuracy and producer's accuracy for non-buildings (f and h in table 2) are meaningless to our objective, building extraction. The user's accuracy and the producer's accuracy for buildings (c and g in table 2), however, can still be considered reasonable measures for assessing quality, since they are not influenced by the large number of pixels in non-object areas. We also consider the Kappa coefficient to be valid, since the Kappa coefficient takes into account agreement contributed by chance: it considers that the frequency of a sample appearing in a class is proportional to the percentage of locations (pixels) that this class covers among all possible locations (the total size of the image). In addition, we can give a different interpretation of the figures in the error matrix. The user's accuracy and the producer's accuracy for the object-related cells (c and g in table 2) are calculated based on pixels falling in the object-related cells (a, b and d in table 2). Figures falling in these cells can be understood as correctly classified (a in table 2), wrongly detected (b in table 2), and undetected (d in table 2). When we use an error matrix to assess the quality of the single-class case (i.e. buildings) with a per-object approach, as is usually done when assessing the results of visual interpretation, we will likely obtain the following figures: the number of objects that have been correctly classified, the number of objects that have been wrongly classified, and the number of objects that have not been classified or remain undetected. We will not be able to produce the Kappa coefficient based on these three figures, but we still need a per-object overall measure of quality for single-class cases.

2.2 Differences between per-object and per-pixel measures

Dungan (2002) stated that uncertainty may change when one is talking about a single pixel or multiple pixels. A confidence statement about the limited area represented by a single pixel may differ from a confidence statement about a large area of which the pixel forms only a part. This implies that current per-pixel measures are inadequate for assessing the quality of objects extracted from images, because our spatial unit has changed from an individual pixel to an individual object or multi-pixel region, whereas in image classifications the error matrix and related measures are usually location-based (per-pixel). Additional per-object measures are needed to assess the quality of extracted objects from different perspectives, such as position, size, and shape in terms of per-object geometric properties, as well as overall quality, correctness, and completeness in terms of per-object thematic properties. To apply per-object measures, we must solve the problem of matching objects. In this research, we consider an extracted object (such as a building) to match an object in the reference data if the two overlap by at least 50% and the overlapping part is larger than or equal to 10 pixels (we consider that a building should have a size of at least 10 m2, and we wish to avoid cases where the first criterion is satisfied but the overlapping portion is very small). We have chosen these values considering that the ratio criterion of 50% alone is often insufficient for small objects that consist of only a few pixels. Figure 1 illustrates several permutations of two matched objects. All four cases shown in figure 1 are considered correctly classified by the above criteria. The per-object measures, however, should also be capable of assessing the differences in the spatial extent of matching objects.
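The matching rule above can be sketched as a predicate over pixel sets. Note one assumption: the paper does not specify the denominator of the 50% ratio, so the sketch below takes it, for illustration, with respect to the smaller of the two objects:

```python
def is_match(extracted, reference, ratio=0.5, min_pixels=10):
    """Decide whether an extracted object matches a reference object.
    Objects are sets of (x, y) pixel coordinates. The paper requires an
    overlap of at least 50% and an overlapping part of >= 10 pixels; the
    ratio's denominator is assumed here to be the smaller object."""
    overlap = len(extracted & reference)
    return overlap >= min_pixels and overlap >= ratio * min(len(extracted), len(reference))

# A 4x4 extracted building sharing 12 of the 16 reference pixels:
ext = {(x, y) for x in range(4) for y in range(4)}
ref = {(x, y) for x in range(1, 5) for y in range(4)}
print(is_match(ext, ref))  # True
```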


Figure 1. Four matched cases of an extracted object (the matched region is shown in orange; green indicates an extracted region that is not explained by the reference data; blue indicates a region in the reference data that is not extracted): (a) a more than 50% match; (b) the extracted and reference objects have the same shape and size but differ in position; (c) and (d) the extracted and reference objects have the same position but differ in spatial extent.

2.3 Need to assess the quality of object classifications

A per-pixel classification is normally made for each pixel by using a classifier solely based on the values of the pixel, without considering the spatial extent of a geo-spatial object. Therefore, per-pixel classification can be regarded as a per-location system of classification, since class labels are assigned to each pixel or each location in the geographical space. In traditional assessments of the quality of per-pixel classifications, the user's accuracy and producer's accuracy are often used to describe errors of commission and omission based on figures presented in the error matrix table. In per-object classification, the terms consistency (the consistency of the obtained results as compared with the reference data, also called correctness) and completeness are much more directly meaningful to users than the user's accuracy and producer's accuracy, given how extracted objects are used. Consistency can be related to the user's accuracy, since consistent results will only be obtained when the user's accuracy is high. We use correctness instead of consistency in the remainder of this paper to measure how good the extracted results are as compared with the reference data. Completeness can be related to the producer's accuracy, since complete results will only be obtained when the producer's accuracy is high. Both can be varied by applying different membership functions and strategies for extracting objects in order to acquire results that are more accurate or more complete. This implies that a strategy has to be chosen according to the user's requirements and the context in which extracted results are used. In many cases, the user's accuracy (correctness) and producer's accuracy (completeness) cannot be high at the same time. For instance, membership functions can be constructed in more restricted terms, and we can use more conservative thresholds, if consistency is more important than simply minimizing overall errors in classification.
On the other hand, more potential objects should be extracted if we want to meet the requirement for completeness; as a result, many extracted objects may not meet the requirement for consistency at the same time. In the next section, we examine a number of existing measures to determine whether they are still valid for assessing quality in such a case, and propose and examine several new measures.

2.4 Need for the quality of various object properties to be assessed

In many cases, we are interested in quality in terms of properties of the desired objects, such as the size and position of the extracted objects in addition to simply assessing whether the desired objects have been correctly classified. A per-object measure of quality related to position can be used to assess situations such as that


in case B in figure 1, where the extracted object is not in the same position as the corresponding reference object, while their sizes are identical. This measure can be associated with a registration error between the image and the reference map. A per-object measure of quality related to the size of the object can be used to assess situations such as those presented in cases C and D in figure 1. Case C is an example of the extracted object being smaller than the reference object, while in case D the extracted object is larger than the corresponding object in the reference data. In both cases, there is no error in position, but the objects differ in size. Therefore, new measures are needed that can quantitatively assess quality based on different properties of objects, such as in the situations illustrated in figure 1. Based on the example presented in table 2, we have shown that the existing methods are limited in that a number of per-pixel measures of quality are often over-estimated or biased in the single-class case (buildings in this case), in which only one class of objects is of interest for detection while other objects are not considered interesting. In addition, many issues of quality that are directly associated with different properties of an object (such as its size and location) cannot be measured by per-pixel methods. Therefore, a framework is needed to assess the quality of results that are obtained using the object-based approach. In the new framework, we will propose a set of per-object measures of quality in order to obtain more comprehensive measurements of the quality of extracted objects. The new framework should be able to solve the problems presented in this section; the existing per-pixel measures are also covered and explained in the new framework.

3. A framework and measures of quality for geo-spatial objects

To avoid the limitations of the existing methods and to better describe the quality of geo-spatial objects from different perspectives, we will present a quality assessment framework developed based on Tversky's feature contrast model. A number of per-object measures of quality will also be proposed in this section. We expect that both per-pixel and per-object measures of quality can be integrated in this framework.

3.1 A framework for assessing quality based on the feature contrast model

To develop a framework for assessing quality, we consider the use and extension of Tversky's feature contrast model (Tversky 1977) to measure the degree of similarity between the results of classification and reference data from different aspects, and to use these similarities as measures of quality suitable for both per-object and per-pixel cases. The more features that match between the results of the classification and the reference data that supposedly represent reality, the higher we consider the quality of those results to be. This also applies when reality is subjectively described by definitions such as land-use classes.

$$\text{Similarity} = \frac{f(C \cap R)}{f(C \cap R) + \alpha\, f(C - R) + \beta\, f(R - C)} \qquad (5)$$

The similarity between classified data (C) and reference data (R) based on a specific feature is expressed as a function (f) of three arguments: f(C∩R), the features that are common to both C and R; f(C−R), the features that belong to C but not to R;


and f(R–C), the features that belong to R but not to C. a and b denote the weights for f(C–R) and f(R–C), respectively. a5b only if C and R are symmetric. We can relate this model to the error matrix. For an error matrix of those classes (see table 1), nAA can be regarded as f(C>R); nAB and nAC can be treated as f(C–R); and nBA and nCA can be treated as f(R–C). This similarity ratio model can be extended and applied to assess the quality of extracted objects, since many features can be selected for such a comparison. We will now explain the existing measures of quality and propose some new measures within the framework of similarity of features. 3.2

Explanation of the existing measures of quality under the new framework

The two parameters α and β in the feature contrast model can be regarded as weights for two aspects of mismatch. In most cases, we consider α = β = 1. Figures in the diagonal cells of an error matrix are regarded as matched features, i.e. f(C∩R), and figures in the off-diagonal cells are regarded as mismatched features, i.e. f(C−R) and f(R−C). For the single-class assessment, we introduce the overall quality (OQ) (please note that this is different from accuracy as defined earlier). The overall quality can be understood as the percentage of matched objects among the total number of objects in the classification result and the reference data.

$$OQ_k = \frac{f(C_k \cap R_k)}{f(C_k \cap R_k) + f(C_k - R_k) + f(R_k - C_k)}, \quad k = 1, \ldots, m \qquad (6)$$

where k denotes a designated class (land cover or land use), and m denotes the total number of designated classes. Thus, the overall quality, both for the results of visual interpretation (per-object) and for digital image classification made by computer (per-pixel), can be expressed as:

$$OQ \text{ for class } k = \frac{N(C_k \cap R_k)}{N(C_k \cap R_k) + N(C_k - R_k) + N(R_k - C_k)} = \frac{n_{kk}}{n_{kk} + (n_{k+} - n_{kk}) + (n_{+k} - n_{kk})} \qquad (7)$$

where N is a function of object numbers (the number of objects (No) in the case of visual interpretation, the number of pixels (Np) in digital image classification), n denotes the actual number of objects, and k denotes a designated class. Similarly, the user's accuracy (UA) and producer's accuracy (PA) can be expressed as follows:

$$UA \text{ for class } k = \frac{N(C_k \cap R_k)}{N(C_k \cap R_k) + N(C_k - R_k)} = \frac{n_{kk}}{n_{kk} + (n_{k+} - n_{kk})} = \frac{n_{kk}}{n_{k+}} \qquad (8)$$

$$PA \text{ for class } k = \frac{N(C_k \cap R_k)}{N(C_k \cap R_k) + N(R_k - C_k)} = \frac{n_{kk}}{n_{kk} + (n_{+k} - n_{kk})} = \frac{n_{kk}}{n_{+k}} \qquad (9)$$
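Equations (5), (7), (8) and (9) are all instances of one similarity ratio under different weights: α = 1, β = 0 yields the user's accuracy, α = 0, β = 1 the producer's accuracy, and α = β = 1 the overall quality. A minimal sketch of this observation (the function name is ours; the numbers reused are the building pixel counts of table 2):

```python
def similarity(n_kk, n_k_plus, n_plus_k, alpha=1.0, beta=1.0):
    """Tversky similarity ratio (eq. 5) written in error-matrix terms:
    f(C∩R) = n_kk, f(C−R) = n_k+ − n_kk, f(R−C) = n_+k − n_kk."""
    match = n_kk
    c_only = n_k_plus - n_kk   # classified but not in the reference data
    r_only = n_plus_k - n_kk   # in the reference data but not classified
    return match / (match + alpha * c_only + beta * r_only)

# Building figures from table 2: n_kk = 26, n_k+ = 35, n_+k = 33
ua = similarity(26, 35, 33, alpha=1, beta=0)  # user's accuracy, eq. (8)
pa = similarity(26, 35, 33, alpha=0, beta=1)  # producer's accuracy, eq. (9)
oq = similarity(26, 35, 33)                   # overall quality, eq. (7)
print(round(ua, 3), round(pa, 3), round(oq, 3))
# 0.743 0.788 0.619
```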

In the proposed framework, the problem of overestimation in assessments of quality when using the error matrix for a single class (the example presented in Section 2) can be solved as follows. Both the user’s accuracy and producer’s accuracy are useful measures for assessing the quality of a single class, and so is the


overall quality that combines the user's accuracy and the producer's accuracy. The overall quality is suggested as a replacement for overall accuracy.

$$\text{Per-pixel } OQ \text{ for extracted buildings} = \frac{N_p(C_b \cap R_b)}{N_p(C_b \cap R_b) + N_p(C_b - R_b) + N_p(R_b - C_b)} = \frac{n_{bb}}{n_{bb} + (n_{b+} - n_{bb}) + (n_{+b} - n_{bb})} = \frac{n_{bb}}{n_{b+} + n_{+b} - n_{bb}} \qquad (10)$$

where n_bb is the number of matching pixels (within the random sample).

$$UA \text{ for extracted buildings} = \frac{N_p(C_b \cap R_b)}{N_p(C_b \cap R_b) + N_p(C_b - R_b)} = \frac{n_{bb}}{n_{b+}} \qquad (11)$$

$$PA \text{ for extracted buildings} = \frac{N_p(C_b \cap R_b)}{N_p(C_b \cap R_b) + N_p(R_b - C_b)} = \frac{n_{bb}}{n_{+b}} \qquad (12)$$

The concepts of UA and PA have been applied in assessing the quality of different types of objects, e.g. assessing the quality of buffer analysis (Shi et al. 2003) and of extracted linear features such as roads (Heipke et al. 1997, Wiedemann et al. 1998). The overall quality can also be expressed in terms of UA and PA:

$$OQ = \frac{1}{\dfrac{1}{UA} + \dfrac{1}{PA} - 1} \qquad (13)$$
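Equation (13) holds because 1/UA + 1/PA − 1 = (n_k+ + n_+k − n_kk)/n_kk, the reciprocal of the overall quality. A quick numerical check using the building figures of table 2:

```python
ua = 26 / 35                    # user's accuracy for buildings (c in table 2)
pa = 26 / 33                    # producer's accuracy for buildings (g in table 2)
oq = 1 / (1 / ua + 1 / pa - 1)  # eq. (13)
direct = 26 / (35 + 33 - 26)    # n_bb / (n_b+ + n_+b - n_bb), eq. (10)
print(round(oq, 3), round(direct, 3))
# 0.619 0.619
```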

The quality assessment results obtained in this case are considered measures of quality in terms of spatial extent or location in the new framework, since pixels with random locations are used to compute the results. Following the same line of thinking, an assessment of quality can be applied by counting the number of objects to produce per-object overall quality, correctness, and completeness. A per-object assessment of quality can also be made based on various properties of the extracted objects, such as an object's size, location, and so on. In the remainder of this section, we will define a number of per-object measures of quality based on the properties of objects.

3.3 Quality measures based on different properties of objects

3.3.1 Per-object overall quality. The formula remains the same for per-object overall quality, but what is counted is the number of objects instead of the number of pixels. The per-object overall quality will produce the same figure as when the accuracy of a visual interpretation is measured.

$$\text{Per-object } OQ \text{ for class } k = \frac{N_o(C_k \cap R_k)}{N_o(C_k \cap R_k) + N_o(C_k - R_k) + N_o(R_k - C_k)} = \frac{n_{kk}}{n_{kk} + (n_{k+} - n_{kk}) + (n_{+k} - n_{kk})} = \frac{n_{kk}}{n_{k+} + n_{+k} - n_{kk}} \qquad (14)$$


where n_kk is now the number of matching objects.

3.3.2 Per-object user's accuracy: correctness. The per-object user's accuracy can be regarded as the correctness of the obtained results and can be computed with the same formula as the user's accuracy.

$$\text{Correctness} = \frac{N_o(C_k \cap R_k)}{N_o(C_k \cap R_k) + N_o(C_k - R_k)} = \frac{n_{kk}}{n_{k+}} \qquad (15)$$

3.3.3 Per-object producer's accuracy: completeness. The per-object producer's accuracy can be regarded as the completeness of the obtained results and can be computed with the same formula as the producer's accuracy.

$$\text{Completeness} = \frac{N_o(C_k \cap R_k)}{N_o(C_k \cap R_k) + N_o(R_k - C_k)} = \frac{n_{kk}}{n_{+k}} \qquad (16)$$
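The three per-object measures follow directly from object counts. A minimal sketch, using hypothetical counts chosen for illustration (they are not figures from the paper's test sites):

```python
def per_object_quality(matched, extracted_only, reference_only):
    """Per-object overall quality (eq. 14), correctness (eq. 15) and
    completeness (eq. 16), computed from object counts rather than pixels."""
    oq = matched / (matched + extracted_only + reference_only)
    correctness = matched / (matched + extracted_only)
    completeness = matched / (matched + reference_only)
    return oq, correctness, completeness

# Hypothetical run: 80 buildings matched, 10 extracted objects without a
# reference counterpart, 15 reference buildings missed.
oq, corr, comp = per_object_quality(80, 10, 15)
print(round(oq, 3), round(corr, 3), round(comp, 3))
# 0.762 0.889 0.842
```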

3.3.4 Per-object measure of quality defined in terms of the similarity in the size of the objects. In many cases, we wish to find out how good or how similar the extracted objects are in terms of size (Sim_Size) as compared with the reference data. A measure of the quality of an object in terms of size is proposed. Here, what is measured is the similarity between the size of an extracted object and the size of the corresponding object in the reference data.

$$Sim\_Size(O_i) = \frac{\min\left(Size\_C(O_i),\ Size\_R(O_i)\right)}{\max\left(Size\_C(O_i),\ Size\_R(O_i)\right)} \qquad (17)$$

$$\mathrm{MeanSim\_Size} = \frac{1}{n}\sum_{i=1}^{n} Sim\_Size(O_i) \qquad (18)$$

$$\mathrm{StdSim\_Size} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Sim\_Size(O_i) - \mathrm{MeanSim\_Size}\right)^2} \qquad (19)$$

where Size_C(Oi) denotes the size of the extracted object Oi, and Size_R(Oi) denotes the size of the corresponding object presented in the reference data. The function min(a, b) returns the smaller value of the two arguments. If min(a, b) is a, then max(a, b) is b. Sim_Size(Oi) measures similarity in terms of the size of object Oi. MeanSim_Size is a measure that indicates the average similarity in terms of the size of all extracted objects. StdSim_Size is the standard deviation of similarity in terms of the size of all extracted objects. 3.3.5 Per-object measure of quality defined in terms of the location of an object. In many cases, we wish to determine how good the extracted objects are in terms of their location as compared with the reference data. A measure of the quality of an individual object’s location (Q_Loc) is proposed, based on the Euclidean distance


between the centres of mass of an extracted object and the corresponding object in the reference data.

$$Q\_Loc(O_i) = \sqrt{\left(X_C(O_i) - X_R(O_i)\right)^2 + \left(Y_C(O_i) - Y_R(O_i)\right)^2} \qquad (20)$$

$$\mathrm{MeanQ\_Loc} = \frac{1}{n}\sum_{i=1}^{n} Q\_Loc(O_i) \qquad (21)$$

$$\mathrm{StdQ\_Loc} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q\_Loc(O_i) - \mathrm{MeanQ\_Loc}\right)^2} \qquad (22)$$

where X_C(O_i) and Y_C(O_i) denote the x and y coordinates of the centre of mass of an extracted object O_i, and X_R(O_i) and Y_R(O_i) denote the x and y coordinates of the centre of mass of the corresponding object presented in the reference data. MeanQ_Loc is a measure that indicates the average quality in terms of the location of all extracted objects. StdQ_Loc is the standard deviation of the measure of the quality of the location of all extracted objects.

4. Examples of the application of assessments of quality for objects extracted using the object-based approach

4.1 Test sites

To test the proposed measures for assessing quality, two areas, one in Amsterdam and one in Ravensburg, Germany, were selected for case studies.

4.1.1 Amsterdam test site. A study area of 3 km × 3 km in south-east Amsterdam was selected for the experiment. Approximately 200,000 people live in this suburban district. Several types of residential and commercial areas, as well as natural features such as parks, lakes, and canals, can be found in the study area. The landscape of this test site is generally flat, but elevated roads obstruct the straightforward extraction of buildings from laser data or images with existing approaches. The laser data are provided in a raster with a resolution of 1 m, obtained by TopoSys I (TopoSys 2004) in 1998. The multi-spectral IKONOS image (Space Imaging 2004), captured in 2000, has a resolution of 4 m.

4.1.2 Ravensburg test site. An area of 1 km × 1 km in the south-west of Ravensburg, Germany, was chosen for a further study of the proposed methods. This is a difficult area for extracting buildings because it contains many small one- to two-storey houses, often with gable roofs. Some tall trees stand very close to the buildings, and the site lies in a hilly area with various types of vegetation. Both urban and rural land-use types can be found in this area. For this site, we used high-resolution data produced simultaneously by a laser scanner and a four-channel multi-spectral scanner: a digital surface model acquired from the first pulse of the laser beam (DSM1), a digital surface model acquired from the second pulse (DSM2), colour-infrared images, and true-colour images, obtained by TopoSys II (TopoSys 2004). A detailed description of these study areas and the data used can be found in Zhan (2003). Based on the proposed framework for assessing quality, we obtain quality assessment results for various aspects of the buildings extracted from the Amsterdam and Ravensburg test sites.

4.2 Object-based approach to extracting buildings

In this section, we review an object-based approach developed by the authors for extracting buildings (Zhan et al. 2002). Many algorithms have been developed for extracting buildings from images or laser data (Brunn and Weidner 1997, Hug and Wehr 1997, Haala and Brenner 1999). Many of them are per-pixel approaches that use features such as edges, texture, and profiles. In our approach, we look at properties of image regions or segments rather than pixels. We slice the digital surface model (DSM) at a fixed vertical interval (1 m in this case) to obtain image segments (regions or objects) at various levels, which are then subject to reasoning. The underlying assumption is that certain properties of the image object of a building hardly change from one level to the next (see figure 2). In the present study, we detect buildings based on two properties: the vertical change in the size of an image segment and the shift of its centre of mass. To this end, we link the image objects in the different layers by a tree structure. The change in size from the bottom segment to higher levels can be plotted as a curve that starts at a very large value (the size of the whole image). The size of the segment is gradually reduced up to a certain point, namely the ground floor of a building; this is at an elevation of 1 m for curve a and of 2 m for curve b (see figure 3). Onward from this breakpoint, the size of the segment stabilizes as the slices hit the vertical wall of a building. The curve convincingly illustrates the potential of the change in segment size for distinguishing buildings from other protruding objects. A small change in segment size between adjacent elevation levels (15% in this case) and a reasonable size (10–5000 m²) are indicators for identifying a building. This approach works quite well in areas where relatively large buildings are present, as in our test site in Amsterdam (Zhan et al. 2002).
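The size-change test described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it tracks only the total above-threshold area of a toy DSM rather than individually labelled segments linked in a tree, and the thresholds (15% change, 10–5000 m²) are taken from the text.

```python
import numpy as np

def slice_sizes(dsm, interval=1.0):
    """Pixel count of the above-threshold region at each elevation slice."""
    levels = np.arange(0.0, dsm.max() + interval, interval)
    return levels, np.array([(dsm >= h).sum() for h in levels])

def is_building(sizes, max_change=0.15, min_size=10, max_size=5000):
    """A segment is building-like if its size stabilises between adjacent
    slices (relative change <= max_change) within a plausible size range,
    i.e. the slices have hit a vertical wall."""
    for a, b in zip(sizes, sizes[1:]):
        if b == 0:
            break
        if abs(a - b) / a <= max_change and min_size <= b <= max_size:
            return True
    return False

# Toy DSM: one 4 m high block with vertical walls on flat ground
dsm = np.zeros((40, 40))
dsm[10:20, 10:25] = 4.0           # 10 x 15 = 150 m^2 footprint at 1 m resolution
levels, sizes = slice_sizes(dsm)  # sizes: 1600, 150, 150, 150, 150
print(is_building(sizes))         # -> True
```

A full implementation would label connected components at each slice and follow each segment individually through the tree structure described above.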
In this test site, most buildings were successfully extracted when a threshold of 15% was applied to check the ratio of the change in size between segments from two adjacent layers based on laser data. A small portion of the original laser data and the extracted buildings are shown in figures 4 and 5. Buildings either lower than an elevated road (a) or at the top of an elevated road (b) were correctly extracted. These two cases are considered

Figure 2. Schematic profile of a real world (a), laser image (b), and profile of image segments (c) for reasoning about buildings.


Figure 3. Plots of differences in the size of the vertical image segments of two buildings (curves a and b correspond to buildings a and b, respectively, shown in figures 4 and 5).

difficult for many existing methods that have been developed to extract buildings because of the presence of elevated roads at a short distance. The curves of the change in size of the two buildings marked a and b in figures 4 and 5 are plotted in figure 3. In our second test site in Ravensburg, Germany, things are different. There are many small houses with gabled roofs; many trees are very close to relatively small and low houses; and the terrain varies in relief. In this case, a large threshold (30%) is used to obtain segments that may potentially contain buildings. As a result, extracted segments based on slices of elevation may contain other features that have characteristics similar to buildings, such as trees. The second feature, NDVI (Normalized Difference Vegetation Index), derived from a multi-spectral image is then used to refine segments extracted in the previous stage. This is based on the

Figure 4. Original laser data (small portion of the Amsterdam test site).


Figure 5. Extracted buildings (small portion of the Amsterdam test site).

general assumption that the roofs of buildings should not contain vegetation. The third feature, based on the assumption that the roofs of buildings are solid in comparison with other features such as vegetation, can be useful as well. The difference between the first and last laser pulses (dDSM) provides information on whether surfaces are solid. Thus, segments and the pixels within them are considered buildings only when they meet these criteria simultaneously. A detailed description of the object-extraction approach can be found in Zhan et al. (2002) and Zhan (2003). By incorporating the previously introduced features (vertical walls, non-vegetation, and solid surfaces), buildings are extracted from the image segments obtained in the first step, with refinements made using NDVI data and dDSM data through the corresponding fuzzy membership functions. The overall fuzzy membership function (MF_OA) is computed as:

MF_OA = MF_Seg · MF_NDVI  or  MF_OA = MF_Seg · MF_dDSM        (23)
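Equation (23) can be illustrated with a minimal sketch. The linear ramp and the NDVI thresholds (0.0 and 0.3) below are illustrative assumptions, as are the mask values; Zhan (2003) specifies the actual Z-shaped curves.

```python
import numpy as np

def z_membership(x, a, b):
    """Z-shaped membership: 1 below a, 0 above b, ramp in between.
    A linear ramp stands in for the smooth Z-curve of Zhan (2003)."""
    t = np.clip((np.asarray(x, dtype=float) - a) / (b - a), 0.0, 1.0)
    return 1.0 - t

# MF_Seg: binary mask of segments containing a vertical wall (assumed values)
mf_seg = np.array([1, 1, 0, 1])
ndvi = np.array([-0.2, 0.6, 0.1, 0.05])     # high NDVI indicates vegetation
mf_ndvi = z_membership(ndvi, a=0.0, b=0.3)  # membership in 'non-vegetation'
mf_oa = mf_seg * mf_ndvi                    # equation (23), NDVI variant
print(mf_oa)                                # third pixel is masked out by MF_Seg
```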

MF_Seg is a binary image that assigns the value 1 to pixels in segments determined to contain a vertical wall. MF_NDVI and MF_dDSM are 'Z-shape' fuzzy membership functions for non-vegetation based on NDVI and for solid surfaces based on dDSM, respectively (see details in Zhan 2003). The extracted results and the corresponding reference data (prepared for the Ravensburg test site by visual interpretation and manual delineation) are shown in figures 6 and 7 for the Amsterdam and Ravensburg test sites, respectively.

4.3 Assessing the quality of the spatial extent of buildings by using randomly generated sample pixels

The per-pixel overall quality computed for buildings extracted from the Amsterdam test site, based on 100,000 sample pixels, is 76.4%, according to the figures shown in table 3. These figures are obtained from a comparison of the extracted buildings with the reference map, as shown in figure 6. Based on the same figures, the Kappa


Figure 6. Comparison of extracted buildings with the reference data, Amsterdam test site (portion).

Figure 7. Comparison of extracted buildings with the reference data, Ravensburg test site (portion).

Table 3. Error matrix for assessing the quality of buildings extracted from the Amsterdam test site, based on 100,000 randomly selected samples.

                        Reference data
Classified data         Building    Non-building    Total      User's accuracy (%)
Building                9123        2032            11,155     81.8
Non-building            789         88,056          88,845     99.1
Total                   9912        90,088          100,000
Producer's accuracy     92.0%       97.7%

Overall accuracy = 97.2%; Kappa = 85.0%.
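The quantities in table 3 follow directly from the matrix counts. The following sketch (the function name is ours) computes overall accuracy, Cohen's Kappa, user's and producer's accuracy, and the per-pixel overall quality reported in the text.

```python
def error_matrix_measures(tp, fp, fn, tn):
    """Per-pixel measures from a 2x2 error matrix.
    tp: classified and reference building; fp: classified building only;
    fn: reference building only; tn: neither (counts of sample pixels)."""
    n = tp + fp + fn + tn
    overall_acc = (tp + tn) / n
    # chance agreement for Cohen's Kappa, from row and column totals
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (overall_acc - pe) / (1 - pe)
    users_acc = tp / (tp + fp)      # share of 'building' pixels that are correct
    producers_acc = tp / (tp + fn)  # share of reference building pixels found
    quality = tp / (tp + fp + fn)   # per-pixel overall quality of the class
    return overall_acc, kappa, users_acc, producers_acc, quality

# Figures from table 3 (Amsterdam test site, 100,000 sample pixels)
print(error_matrix_measures(tp=9123, fp=2032, fn=789, tn=88056))
# -> approximately (0.972, 0.850, 0.818, 0.920, 0.764)
```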


coefficient, user's accuracy, and producer's accuracy are calculated as 85.0%, 81.8%, and 92.0%, respectively, for the Amsterdam test site. The main causes of error in the spatial extent of buildings on the Amsterdam test site are several large parking garages. These were not extracted because they are directly connected to the nearby road and do not show the characteristic vertical walls. Moreover, a number of metro stations were extracted as buildings but are not shown on the reference map. The per-pixel overall quality computed for buildings extracted from the Ravensburg test site, based on 100,000 sample pixels, is 73.0%, according to the figures shown in table 4. These figures are obtained from a comparison of the extracted buildings with the reference map, as shown in figure 7. Based on the same figures, the Kappa coefficient, user's accuracy, and producer's accuracy are calculated as 83.7%, 86.9%, and 82.0%, respectively, for the Ravensburg test site. The main causes of error in the spatial extent of the buildings on the Ravensburg test site are the many small houses with gable roofs and the tall trees standing very close to low-rise buildings.

4.4 Assessing quality by counting the number of objects

Per-pixel measures provide information on quality, but they deal with quality at individual locations. Measures of quality are still needed from an object perspective. Counting the number of objects that are correctly detected, wrongly detected, and not detected provides information about the quality of the extracted objects.

4.4.1 Amsterdam test site. The error matrix shown in table 6 is obtained for per-object assessments of quality according to the figures presented in table 5. These figures are obtained from a comparison of the extracted objects with the reference map, as shown in figure 6. The overall quality of the buildings extracted from the Amsterdam test site is calculated as 683/(683 + 25 + 26) = 93.1%. The correctness and completeness of the extracted buildings are computed as 96.5% and 96.3%, respectively.

4.4.2 Ravensburg test site. The error matrix shown in table 7 is obtained for per-object assessments of quality according to the figures presented in table 8. The figures in table 8 are obtained from a comparison of the extracted objects with the reference map, as shown in figure 7. The overall quality of the buildings

Table 4. Error matrix for assessing the quality of buildings extracted from the Ravensburg test site, based on 100,000 randomly selected samples.

                        Reference data
Classified data         Building    Non-building    Total      User's accuracy (%)
Building                3177        479             3656       86.9
Non-building            699         95,645          96,344     99.3
Total                   3876        96,124          100,000
Producer's accuracy     82.0%       99.5%

Overall accuracy = 98.8%; Kappa = 83.7%.


Table 5. Accuracy assessment of buildings extracted from the Amsterdam test site based on the number of objects in the updated map.

Number of extracted buildings:
  Total number          708
  Correct number        683
  Incorrect number      25
  Correct (%)           96.5
  Incorrect (%)         3.5

Number of buildings from map:
  Total number          730
  Correctly detected    704
  Not detected          26
  Correct (%)           96.4
  Mistake (%)           3.6

Table 6. Error matrix for assessing the quality of buildings extracted from the Amsterdam test site in terms of the number of objects.

                  Reference data
Classified data   Building    Others    Total    Correctness
Building          683         25        708      96.5%
Others            26
Total             709
Completeness      96.3%

Overall quality = 93.1%.

Table 7. Error matrix for assessing the quality of buildings extracted from the Ravensburg test site in terms of the number of objects.

                  Reference data
Classified data   Building    Others    Total    Correctness
Building          150         7         157      95.5%
Others            23
Total             173
Completeness      86.7%

Overall quality = 83.3%.

Table 8. Quality assessment of buildings extracted from the Ravensburg test site based on the number of objects.

Number of extracted buildings:
  Total number          157
  Correct number        150
  Incorrect number      7
  Correct (%)           95.5
  Incorrect (%)         4.5

Number of buildings from visual interpretation:
  Total number          177
  Correctly detected    154
  Not detected          23
  Correct (%)           87.0
  Mistake (%)           13.0

Note: Four buildings are spatially separate in the reference data, but they are merged in the extracted results.

extracted from the Ravensburg test site is calculated as 150/(150 + 7 + 23) = 83.3%. The correctness and completeness of the extracted buildings are computed as 95.5% and 86.7%, respectively.
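The per-object figures above follow from three counts (correctly detected, wrongly detected, not detected); as a sketch:

```python
def per_object_quality(n_correct, n_wrong, n_missed):
    """Per-object correctness, completeness, and overall quality."""
    correctness = n_correct / (n_correct + n_wrong)
    completeness = n_correct / (n_correct + n_missed)
    overall = n_correct / (n_correct + n_wrong + n_missed)
    return correctness, completeness, overall

# Ravensburg site, tables 7 and 8: 150 correct, 7 wrong, 23 missed
print(per_object_quality(150, 7, 23))   # -> approximately (0.955, 0.867, 0.833)
# Amsterdam site, table 6: 683 correct, 25 wrong, 26 missed
print(per_object_quality(683, 25, 26))  # -> approximately (0.965, 0.963, 0.931)
```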


4.4.3 Relationship between correctness and completeness in the case of the Ravensburg test site. A test was carried out to determine how these measures respond to the removal of small objects. When all small objects are retained, completeness is high, whereas correctness is lowered because many of the small objects are in fact non-buildings. As small objects are progressively removed, correctness rises while completeness drops, as shown in figure 8. The overall quality indicates a point of balance between correctness and completeness, which is useful for general applications such as planning. For the Ravensburg test site, overall quality reaches a peak when objects smaller than 40 m² are removed, as indicated in figure 8. In extreme cases, such as providing building data to locate people for a rescue operation, completeness is far more important than correctness, since no building should be missed under such circumstances.

4.5 Assessing quality in terms of similarities in the size of objects

In many cases, we wish to determine how similar the extracted objects are to the reference data in terms of their size. As proposed earlier, a per-object measure of quality in terms of size similarity is obtained for the buildings extracted from the Amsterdam and Ravensburg test sites.

4.5.1 Amsterdam test site. A mean size similarity of 0.88 and a standard deviation of 0.13 are calculated from the classification result and the reference data shown in figure 6. On a scale of 0 to 1, a mean size similarity of 0.88 indicates a high degree of similarity in building size, meaning that the major parts of the extracted buildings match the corresponding buildings in the reference data. The low standard deviation of 0.13 shows that consistent results are obtained for the extracted buildings.

4.5.2 Ravensburg test site. A mean size similarity of 0.86 and a standard deviation of 0.11 are calculated from the classification result and the reference data shown in figure 7. These figures likewise indicate a high degree of similarity in building size; the major parts of the extracted buildings match the corresponding buildings in the reference data, and the low standard deviation of 0.11 shows that consistent results are obtained.

Figure 8. Effects of removing small objects on correctness, completeness, and overall quality.

4.6 Assessing quality in terms of the locations of objects

The Euclidean distance between the centres of mass (gravity centres) of an extracted object and the corresponding object in the reference data is computed as a measure of the positional quality of the extracted objects. The mean distance between the centres of mass of corresponding buildings and the standard deviation computed for the Amsterdam test site are 2.0 m and 4.4 m, which means that the positions of the extracted buildings have shifted about two pixels on average from their positions in the reference data. For the Ravensburg test site, the mean distance and standard deviation are 1.1 m and 0.9 m, a shift of one to two pixels on average. The results obtained for both test sites show that correctly extracted buildings are well located (1–2 pixels, or 1–2 m). Both the average location error and the standard deviation for the Amsterdam test site (2.0 m and 4.4 m, respectively) are higher than those for the Ravensburg test site (1.1 m and 0.9 m, respectively). This can be explained by higher image registration errors in the Amsterdam case, because the extracted buildings and the buildings in the reference map were acquired from different sources (laser data and topographical mapping). A further reason for the higher standard deviation in the Amsterdam test site is that building outlines are defined differently in the extracted results and the reference map: the extracted buildings are delineated by their footprints, including forecourts, whereas only the main or raised parts of buildings were delineated as outlines in the Amsterdam reference map.
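As a sketch, the size-similarity measure and the centre-of-mass distance of equation (20) can be computed per object as follows; the object sizes and centroids below are hypothetical.

```python
import numpy as np

def sim_size(size_c, size_r):
    """Size similarity: min/max ratio in [0, 1]; 1 means identical size."""
    return min(size_c, size_r) / max(size_c, size_r)

def q_loc(centroid_c, centroid_r):
    """Equation (20): Euclidean distance between centres of mass (m)."""
    (xc, yc), (xr, yr) = centroid_c, centroid_r
    return ((xc - xr) ** 2 + (yc - yr) ** 2) ** 0.5

# Hypothetical (size in m^2, centroid) pairs: extracted vs reference object
pairs = [((150.0, (12.0, 30.0)), (160.0, (13.0, 30.5))),
         ((90.0, (40.0, 18.0)), (100.0, (40.0, 19.0)))]

sims = [sim_size(c[0], r[0]) for c, r in pairs]
locs = [q_loc(c[1], r[1]) for c, r in pairs]
print(np.mean(sims), np.std(sims))  # MeanSim_Size and StdSim_Size
print(np.mean(locs), np.std(locs))  # MeanQ_Loc and StdQ_Loc, equations (21)-(22)
```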
In the Ravensburg case, the reference buildings were delineated on the computer screen by visual interpretation of the same image.

4.7 A comparison between the results obtained from the Amsterdam and Ravensburg test sites

To make a more comprehensive comparison of the building-extraction results for the two test sites, table 9 lists all quality assessment results obtained with the proposed measures. Based on these figures, we can conclude that the extracted result for the Amsterdam test site is generally better than that for the Ravensburg test site according to several measures of overall quality, such as per-pixel and per-object overall quality and the Kappa coefficient. The main reason is that the Amsterdam test site contains many large buildings, which are easier to extract than the many small residential buildings of the Ravensburg test site, especially as the latter are mixed with trees. The measures of quality in terms of building size show very similar results for the two sites.

Table 9. Comparison of quality assessment results for buildings extracted from the two test sites.

Quality measures                           Amsterdam    Ravensburg
Per-pixel (spatial extent of objects)
  Overall quality (%)                      76.4         73.0
  Kappa coefficient (%)                    85.0         83.7
  User's accuracy (%)                      81.8         86.9
  Producer's accuracy (%)                  92.0         82.0
Per-object (other per-object properties)
  Overall quality (%)                      93.1         83.3
  Correctness (%)                          96.5         95.5
  Completeness (%)                         96.3         86.7
  MeanSim_Size                             0.88         0.86
  StdSim_Size                              0.13         0.11
  MeanQ_Loc (m)                            2.0          1.1
  Error_RangeLoc (m)                       0–6.4        0–2.0

The measures of quality in terms of the location of objects show that the result obtained for the Ravensburg test site is better than that obtained for the Amsterdam test site. The main reason is likely that the reference data for the Amsterdam test site were obtained by digitizing the large-scale base map: many building forecourts were not delineated as parts of buildings in this base map but were extracted as parts of buildings in the extracted result. The reference data for the Ravensburg test site were acquired by visual interpretation and screen digitizing, based on the same image used for extracting buildings.

5. Conclusions and discussion

Assessing the quality of objects extracted directly from images is still at an initial stage. In this paper, several existing per-pixel measures have been tested in the object environment and in single-class cases. Some of them are no longer valid, while others may be reused, although they require a different interpretation in an object-based environment. A framework for object-based quality assessment is therefore proposed, integrating per-pixel and per-object measures and providing measures of quality for single-class cases. Within this framework, several existing measures of quality are explained, and a number of per-object measures are proposed. We have examined the new framework and measures by assessing the quality of extracted land-cover objects (buildings) with respect to correct classification, size, and position. These buildings were obtained by an object-based approach for detecting and extracting buildings, applied to our two test sites. The test results show that the proposed framework, which integrates existing per-pixel measures and the proposed per-object measures, provides more comprehensive information on quality by checking different properties of an object, whereas existing per-pixel approaches provide quality information mainly for individual locations (pixels). The proposed framework, developed on the basis of the feature-similarity model, is an open framework that can be extended to any feature or property of an object. Additional measures are therefore expected to be proposed as further features and properties become available for quality assessment.

Acknowledgements

This research was supported by the SUS-DSO project, jointly funded by the Chinese and Netherlands governments, and further supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region (Project No. 3-ZB40) and The Hong Kong Polytechnic University (Project No. 1.34.9709). The test data used in this study were provided by the Survey Department, Ministry of Transport and Public Works (Rijkswaterstaat, Meetkundige Dienst), The Netherlands, and TopoSys GmbH, Germany. The authors are grateful to the referees for their critical examination of the paper and their valuable suggestions, which led to considerable improvements.

References

APLIN, P., ATKINSON, P.M. and CURRAN, P.J., 1999, Fine spatial resolution simulated sensor imagery for land cover mapping in the United Kingdom. Remote Sensing of Environment, 68, pp. 206–216.

BRUNN, A. and WEIDNER, U., 1997, Extracting buildings from digital surface models. International Archives of Photogrammetry and Remote Sensing, 32, pp. 27–34.

CAMPBELL, B.J., 1996, Introduction to Remote Sensing, 2nd ed. (New York: Guilford Press).

CHRISMAN, N.R., 1991, The error component in spatial data. In Geographical Information Systems, Vol. 1: Principles, D.J. Maguire, M.F. Goodchild and D.W. Rhind (Eds), pp. 165–174 (Harlow, UK: Longman).

CONGALTON, R.G. and GREEN, K., 1999, Assessing the Accuracy of Remotely Sensed Data: Principles and Practices (Boca Raton, FL: Lewis).

CONGALTON, R.G. and MEAD, R.A., 1983, A quantitative method to test for consistency and correctness in photo-interpretation. Photogrammetric Engineering and Remote Sensing, 49, pp. 69–74.

CURRAN, P.J. and WILLIAMSON, H.D., 1988, Selecting a spatial resolution for estimation of per-field green leaf area index. International Journal of Remote Sensing, 9, pp. 1243–1250.

DUNGAN, J.L., 2002, Toward a comprehensive view of uncertainty in remote sensing analysis. In Uncertainty in Remote Sensing and GIS, G.M. Foody and P.M. Atkinson (Eds), pp. 25–35 (Chichester, UK: Wiley).

FOODY, G.M., 2000, Accuracy of thematic maps derived from remote sensing. In Proceedings of the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, G.B.M. Heuvelink and M.J.P.M. Lemmens (Eds), pp. 217–224 (Delft, Netherlands: Delft University Press).

GAHEGAN, M. and EHLERS, M., 2000, A framework for the modelling of uncertainty between remote sensing and geographic information systems. ISPRS Journal of Photogrammetry and Remote Sensing, 55, pp. 176–188.

HAALA, N. and BRENNER, C., 1999, Extraction of buildings and trees in urban environments. ISPRS Journal of Photogrammetry and Remote Sensing, 54, pp. 130–137.

HEIPKE, C., MAYER, H., WIEDEMANN, C. and JAMET, O., 1997, Evaluation of automatic road extraction. International Archives of Photogrammetry and Remote Sensing, 32, pp. 47–56.

HUG, C. and WEHR, A., 1997, Detecting and identifying topographic objects in imaging laser altimeter data. International Archives of Photogrammetry and Remote Sensing, 32, pp. 19–26.

JANSSEN, L.L.F., 1994, Methodology for updating terrain object data from remote sensing data: the application of Landsat TM data with respect to agricultural fields. PhD thesis, Wageningen Agricultural University and ITC, Enschede, The Netherlands.

PEDLEY, M.I. and CURRAN, P.J., 1991, Per-field classification: an example using SPOT HRV imagery. International Journal of Remote Sensing, 12, pp. 2181–2192.


SHI, W., CHEUNG, C.K. and ZHU, C., 2003, Modelling error propagation in vector-based buffer analysis. International Journal of Geographical Information Science, 17, pp. 251–271.

SINTON, D.F., 1978, The inherent structure of information as a constraint to analysis: mapped thematic data as a case study. In Harvard Papers on Geographic Information Systems, Vol. 6, G. Dutton (Ed.), pp. 43–59 (Reading, MA: Addison-Wesley).

SKIDMORE, A.K., 1999, Accuracy assessment of spatial information. In Spatial Statistics for Remote Sensing, A. Stein, F. van der Meer and B. Gorte (Eds), pp. 197–209 (Dordrecht, Netherlands: Kluwer).

SPACE IMAGING, 2004, Available online at: http://ww.spaceimaging.com (accessed July 2004).

TOPOSYS, 2004, Available online at: http://ww.toposys.com (accessed July 2004).

TVERSKY, A., 1977, Features of similarity. Psychological Review, 84, pp. 327–352.

WIEDEMANN, C., HEIPKE, C., MAYER, H. and JAMET, O., 1998, Empirical evaluation of automatically extracted road axes. In Empirical Evaluation Techniques in Computer Vision, K. Bowyer and P. Phillips (Eds), pp. 172–187 (Los Alamitos, CA: IEEE Computer Society).

ZHAN, Q., 2003, A hierarchical object-based approach for urban land-use classification from remote sensing data. PhD thesis, Wageningen University/ITC, The Netherlands.

ZHAN, Q., MOLENAAR, M. and TEMPFLI, K., 2002, Building extraction from laser data by reasoning on image segments in elevation slices. International Archives of Photogrammetry and Remote Sensing, 34, pp. 305–308.
