PDB: a pictorial database oriented to data analysis

July 21, 2017 | Author: Domenico Tegolo | Category: Data Analysis


SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 23(1), 55–73 (JANUARY 1993)

PDB: a Pictorial Database Oriented to Data Analysis

MARIA CONCETTA MACCARONE AND MARIO TRIPICIANO
Istituto di Fisica Cosmica e Applicazioni dell’Informatica, C.N.R., Piazza G. Verdi 6, 90138 Palermo, Italy

AND

VITO DI GESÙ AND DOMENICO TEGOLO
Dipartimento di Matematica ed Applicazioni, University of Palermo, Via Archirafi 34, 90124 Palermo, Italy

SUMMARY

The paper describes a new pictorial database oriented to image analysis, implemented inside the MIDAS data analysis system. Pictorial databases need expressive data structures in order to represent a wide class of information from the numerical to the visual. The model of the database is relational; however, a full normalization is not achievable, owing to the complexity of the visual information. The paper reports the general design and notes on the software implementation. Preliminary experiments show the performance of the pictorial database.

KEY WORDS Information retrieval; Relational databases; Pictorial databases; Management and query; Data analysis systems; Image analysis

INTRODUCTION

Data analysis systems (DAS) require the integrated management of data (measurements and results) and methods (algorithms and procedures of analysis); moreover, a suitable organization of the data is necessary to reach efficiency in both the management and the retrieval phases. A pictorial database (PDB) plays a relevant role in the design of a DAS whenever the applications include treatment of images. In this case, numerical, alphanumeric, graphical and pictorial information are present; each of these plays a different role, depending on the particular situation: non-visual information can be used to extend the description of the pictorial information; pictorial information is, on the other hand, an effective communication tool because it is a more immediate, expressive and readable way to represent intermediate and/or final data in the analysis phases.[1]

PDBs must realize a uniform and integrated view of heterogeneous information (visual and textual).[2] Pictorial queries should be formulated in an expressive and efficient fashion; for this purpose many solutions have been proposed, such as Pictorial SQL (PSQL),[3] PICQUERY[4] and PROBE.[5] Furthermore, several pictorial databases have been produced to satisfy the demand in different areas of interest, as in the area of office automation,[6] where textual, visual and acoustic information must be integrated in a multimedia information system; in computer-aided design;[7] in cartography and in typical geographical information systems, where the integration is between geographical entities (e.g. socio-economic data, city names, ...) and spatial entities (e.g. maps, raster images derived from remote-sensing equipment).[8-10]

On the other hand, a DAS should allow choice among a wide class of different methods that can be applied to the analysis of the same data set.[11] The choice clearly depends on the final goal of the investigation. Two main procedure paradigms can be identified whenever the analysis is carried out by using multiple methods:

1. Co-operation, which consists of the combination of the results obtained from different analyses. For example, a ‘sparse image’ may be the input data of both a cluster analysis and a restoration algorithm; the results of the clustering can be used to confirm those ‘signals’ which are enhanced after the restoration. In this case the combination is parallel: the methods are applied at notionally the same time and then their results are used for the final decision. Other combinations may be sequential: two or more methods are applied one after the other, and the results of each algorithm in the sequence are used as input to the next method.

2. Validation. In this case the analyst needs to compare different methodologies which produce the same type of output information. For example, a test image can be segmented by using histogram thresholding, mathematical morphology operators, or some other filtering technique. The resulting segmentations are then analysed further to evaluate the ‘best’ one. The evaluation of the ‘best’ method is based on the user’s a priori knowledge. Usually this decision is very hard and a satisfactory answer is not always reached. The validation procedure is always parallel.
Each analysis clearly requires objective and/or subjective conditions to be satisfied by the data: statistical conditions, congruence with results of previous analyses on the same data or of other applications as known from the literature, and also not-well-defined conditions, such as those coming from the direct experience of the analyst, whose knowledge cannot always be described formally because mathematical and heuristic reasoning are usually intermixed.[11,12] Therefore, a suitable PDB included in a DAS must provide tools for efficient handling of queries related to the types of analyses outlined above. It should allow a system to keep track of all operations that have been performed on an image (analysis, parameters and conditions used, output results, ...), and to handle images as structured objects where visual and non-visual information are mutually related.

In this paper we describe a new PDB oriented to image-data analysis. The environment chosen for its implementation is the MIDAS system,[13,14] which was first developed to analyse astronomical data. MIDAS has been chosen because of its portability: it is based on Unix, and its image data are essentially represented by FITS files,[15] which allows complete system independence. The last feature is relevant in distributed PDB organization. The MIDAS system is an open system that includes pictorial functions, standard image-analysis primitives, and specialized applications; moreover, users can design and implement their own applications in it. The PDB has been designed in order to permit inclusion of ‘knowledge’; in this respect the PDB may also be considered as a consultant system.

The criteria introduced in the design of the proposed PDB are intended to promote generality and extensibility: for example, the retrieval operations are designed to support a uniform view of heterogeneous information (visual and non-visual), and they should be capable of extension to deal with uncertainties, errors and intrinsic indeterminacies of the data.

Subsequent sections of this paper cover: the fundamental data structures supported by the MIDAS system; the description of the PDB design, management, and query processing; some implementation notes; a brief comparison between our PDB and existing image database systems; and final remarks. The Appendix describes the syntax of the PDB commands.

DATA STRUCTURE IN MIDAS

The MIDAS system recognizes three basic abstract data structures: keywords (implemented as a global data structure in the environment to provide communication among different main programs or to store intermediate results), descriptors and data values. The combination of data and their descriptors (variables associated with the data) yields two composite data structures: bulk data frames (usually called images, also named bdf) and tables (also named tbl). Other data structures defined in MIDAS are not used in the implementation of the present PDB. A set of standard interfaces performs the I/O operations on all the abstract and composite data structures; standard interfaces also support communication with the user, with the image displays and the interactive graphics devices, as well as with the command language.

Images, derived from the FITS format,[15] consist of a set of virtual blocks on the disk, logically divided into three parts, as shown in Figure 1(a):

(a) a frame-control block, containing internal information, e.g. the total number of virtual blocks in the file, and used only by MIDAS system routines
(b) a values part containing the real homogeneous data (for example, the intensity of each pixel in the image) stored line by line; the number of these blocks is a fixed number N, proportional to the number of pixels in the image
(c) a descriptive part, holding the descriptors of the image; a minimal set of these, called standard descriptors, must exist for every image: number of axes (NAXIS), number of pixels along each axis (NPIX(NAXIS)), world-co-ordinate value of the first pixel of each axis (START(NAXIS)), size of each pixel in the world-co-ordinate system (STEP(NAXIS)), minimum and maximum data values (LHCUTS(1), LHCUTS(2)), plus others.

A user can add variables to this descriptive part, related to various kinds of information in the image (displaying, statistics, creation date, ...). The creation-date field is useful because it facilitates selection of image sets for different dates, which is relevant in time-analysis applications. The number M of virtual blocks in the descriptive part is variable, depending on the number and length of the descriptors of the image at a given time. This composite data structure is not fully homogeneous and, as will be shown, it does not allow a full normalization of the PDB.

Tables are collections of heterogeneous data, ordered by rows and columns; each column contains elements of the same type, associated with the same property of the object to which they are related (semantic integrity). The internal structure of a table file is similar to that of an image, with a first virtual control block followed by the data, stored column by column, and by a set of various descriptors. Columns in a table correspond to scan lines in the values part, with the number of columns and rows actually defined by the user; this number may also be less than the total allocated size in the values part, as shown in Figure 1(b); undefined values are set to ‘*’.

Figure 1. Internal format of data structures in MIDAS: (a) scheme of a bulk data frame; (b) scheme of a table. ‘s’ indicates 0 or 1 in the selection of rows (column zero); ‘v’ indicates some value in the table element; ‘*’ stands for NULL values

There are three special columns in a table: the sequence, selection and reference columns:

1. The sequence number is a virtual column associated with each row, computed ‘on the fly’ without being allocated in any physical position in the table.
2. The selection column is allocated physically as column number zero and is used to select subsets of the table.
3. Each entry in the table is associated with a key, called ‘reference’, to allow direct access to the information in the table: it plays the role of ‘primary key’, as defined in database theory, and it can be any of the user-defined columns in the table.

MIDAS is a command-driven system in which the user enters commands followed by parameters. MIDAS commands fall into three categories:

(a) primitive commands (data I/O, image-editing, displaying, plotting, image-algebra, zooming, rotation, segmentation, edge-extraction, cursor operations, ...)
(b) application commands, developed for special purposes (filtering, centring, clustering, FFT, deconvolution, inventory, fractal geometry, ...)
(c) procedure-control commands, to write programs for the MIDAS system.

Applications, generally created by the users, are organized in contexts, as understood in many data-analysis systems, which contain all the special programs and commands needed in the given application. Examples of contexts in MIDAS are: the MVA context for multivariate data analysis, including cluster, discriminant and principal component analysis; the STATIST context for statistical tests on tables; the KNN context for the restoration of sparse images and computational geometry; the FUZ context for the extension of mathematical morphological analysis to fuzzy images. Contexts are opened and closed so that the memory space is not overloaded whenever users create their own applications.

PDB DATA ORGANIZATION

A database may be defined as a set of structured data and relations grouped together in a homogeneous environment, to be used by different applications under the supervision of a suitable database management system. In the following, we shall use both the terms of the relational model and their synonyms, as listed in Table I.

A PDB is a collection of images, which are structured as MIDAS entities, non-homogeneous data, and tables on which various kinds of queries are performed. The queries have different levels of depth, and they act on several types of information: numerical, alphanumeric, graphical and pictorial.

The basic model that we choose to satisfy the requirements for the efficient management and querying of such a PDB is the relational one.[16,17] Numerical and non-numerical data are both naturally represented in tables that allow the establishment of relations among data, and a ‘query’ is a formula to be evaluated with respect to this model.
Note that in this model each relation is a set and it therefore does not contain duplications: each tuple is unique with respect to the relation, and the combination of all the attributes has the property of unique identification.

Table I. Relational terminology

Relational term               Synonyms
Relation                      table, archive, entity
Tuple                         row, record
Position of the tuple         field
Attribute                     column, class
Attribute-value in a tuple    column-value in a row
Attribute-domain              set of values of a column

The PDB handles three different types of relations:

(a) the collection of image data on which to perform management and query, denoted by (PDB_1, PDB_2, ..., PDB_i, ..., PDB_n) or by pdbname
(b) the application-code table (ACT), containing references and information on the contexts
(c) the application-result tables (ARTs), containing the output of operations performed by a context on the image data.

In Figure 2 the general organization of the PDB is shown, together with the links between the different types of relations. Note that the ACT may be considered as a way to realize an associative model for memory access; as a matter of fact, its entries are strictly related to the names of the ARTs. Referring to Figure 2, image dodo, contained in PDB_1 and in PDB_zeta, can access the results tables FUZ_dodo and MVA_dodo via the codes FUZ_ and MVA_ present in the ACT. The ACT represents a many-to-many relation between the PDB and ARTs, and it can be very useful whenever management and query of the PDB have to be performed in an iconic environment.
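The associative role of the ACT can be illustrated with a minimal sketch. Python is used here only for illustration (the PDB itself lives inside MIDAS); the dictionaries and function names are hypothetical stand-ins, and the sketch assumes that each ACT code already carries its trailing underscore, so that an ART name is simply the code followed by the image name, as in FUZ_dodo.

```python
# Hypothetical in-memory stand-ins for the relations of Figure 2;
# these names are illustrative and are not MIDAS data structures.
ACT = {  # CODE -> APPIDEN
    "CLU_": "cluster analysis",
    "FUZ_": "fuzzy morphology",
    "MVA_": "multivariate analysis",
}
PDB = {  # pdbname -> names of the images it contains
    "PDB_1": {"dodo", "alfa"},
    "PDB_zeta": {"dodo"},
}
ART_NAMES = {"FUZ_dodo", "MVA_dodo", "CLU_alfa"}  # result tables assumed on disk


def arts_for_image(image):
    """Map each ACT code to the ART that holds results for `image`, if any."""
    return {code: code + image for code in sorted(ACT) if code + image in ART_NAMES}


def images_analysed_by(code, pdbname):
    """List the images of one PDB_i for which the given application left an ART."""
    return sorted(img for img in PDB[pdbname] if code + img in ART_NAMES)


print(arts_for_image("dodo"))
print(images_analysed_by("FUZ_", "PDB_1"))
```

The two helper functions read the same link in both directions (from an image to its results, and from an application to the images it has processed), which is the many-to-many behaviour the text attributes to the ACT.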

Figure 2. General organization of the PDB. Double arrows relate images in the PDB_i(s) to ART(s) via the codes in the ACT relation


Table II. Structure of a PDB_i relation

IMAGE     IDENT       FATHER    NAXIS   NPIX1   NPIX2   NPIX3
myima     real data   ip000     2       128     128     *
myima1    spectrum    myima     1       200     *       *
ip000     real data   *         2       300     300     *
......    ......      ......    ....    ....    ....    ......

PDB_i

Each PDB_i is a relation that contains the name (IMAGE), identifier (IDENT) and dimensions (NAXIS, NPIX1, NPIX2, NPIX3) of the images belonging to it:

PDB_i = 〈IMAGE, IDENT, FATHER, NAXIS, NPIX1, NPIX2, NPIX3〉

The column FATHER specifies the name of the image from which the IMAGE included in PDB_i originates; for example, in Table II, the image IMAGE=myima could be the result of an operation performed on the image FATHER=ip000. Whenever a FATHER does not exist, the attribute value is set to ‘*’ (this notation is always used for undefined fields). This attribute plays a central role in the retrieval of all the information related to a given analysis. For example, very often the analyst works with many items of data and needs to retrieve the path of the analysis by following the data-chain produced by starting from the origin or from a given point.

Note that the relation PDB_i is built by using the information stored in the descriptive part of the bdf data. Moreover, the attribute IMAGE refers to a ‘bdf’ data-item which is a composite data structure, as previously described; in this sense the relations PDB_i are not fully normalized. Many PDB_i’s can obviously reside in the working disk area; however, only one is active at any one time in the present implementation.

ACT

The ACT relation, ACT = 〈CODE, APPIDEN, INFO〉, is shown in Table III. It contains the reference codes (CODE) of the existing applications, the full names of which are contained in the attribute APPIDEN. The attribute INFO gives supplementary information about the corresponding application.

Table III. Structure of the application code relation ACT

CODE     APPIDEN             INFO
CLU_     cluster analysis    1D, 2D, MST, ...
FUZ_     fuzzy morphology    2D multilevel images, ...
KNN_     knn restoration     2D sparse images, ...
......   .........           .........


More specifically, the column INFO contains information about the type of data and the algorithms available in the context of a given application. This feature helps the user to plan the application, and it could be a good starting-point for the development of a consultant system. For example, in Table III the cluster analysis can only be performed on mono- and bi-dimensional data (1D and 2D), for which the minimum spanning tree (MST) algorithm is available. The user could also fill the column INFO with comments related to a specific task or analysis algorithm. The ACT relation can always be updated with new applications, both in the user view of the PDB and, upon request, at the system level; in the latter case, the update holds for the whole PDB.

ARTs

Pictorial databases must support application sessions where algorithms act on image data and results are stored automatically. Moreover, the user needs to keep track of the whole sequence of operations performed during the analysis. The ART relations contain the results of the operations performed by the contexts on image data, and are the lowest level of the PDB. The schema of an ART relation is variable: it depends on the given context, which is defined by the user, and it can be updated with new attributes. A set of ART relations can be associated with each image, with names related to the entries in the ACT relation (attribute CODE), as follows:

name of the ART relation == 〈ACT:CODE〉_〈imagename〉

For example, if the fuzzy morphology context FUZ_ has been applied to the image alfa, the results will be stored in the ART relation FUZ_alfa, as shown in Table IV, in which the first three rows contain the results (clos1, clos2, clos3) of the application of a given fuzzy operator (closing) at different numbers of iterations (1, 2, 3) on the input image alfa. The syntax defined for building the name of the ART relations plays a relevant role in the query of the PDB; in fact it allows us to construct a simple mechanism which links data with application algorithms and results via the ACT relation.

Table IV. Example of an ART for the fuzzy morphological application. If it is related to the image alfa, its name will be FUZ_alfa

OPER      STREL     ITER   OUT     METR   FUZZY   F1     F2     F3
closing   maxstru   1      clos1   min    norm    *      *      *
closing   maxstru   2      clos2   min    norm    *      *      *
closing   maxstru   3      clos3   min    norm    *      *      *
.....     .....     ..     .....   .....  .....   ....   ....   ....
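The variable schema of an ART can be pictured with a short sketch, given here in Python purely as an illustration. The row layout is a hypothetical list of dictionaries, reduced to four of the columns of Table IV; the helper names are assumptions, and the only convention taken from the text is that undefined fields are written as ‘*’.

```python
UNDEFINED = "*"  # the paper's marker for undefined fields


def add_attribute(art_rows, name):
    """Add a new column to an ART; rows that lack it get the undefined marker."""
    for row in art_rows:
        row.setdefault(name, UNDEFINED)
    return art_rows


def outputs(art_rows, oper):
    """Return the OUT images produced by one operator, ordered by iteration."""
    hits = [row for row in art_rows if row["OPER"] == oper]
    return [row["OUT"] for row in sorted(hits, key=lambda row: row["ITER"])]


# First three rows of Table IV (ART FUZ_alfa), reduced to four columns.
FUZ_alfa = [
    {"OPER": "closing", "STREL": "maxstru", "ITER": i, "OUT": f"clos{i}"}
    for i in (1, 2, 3)
]

add_attribute(FUZ_alfa, "F1")          # schema update with a new attribute
print(outputs(FUZ_alfa, "closing"))    # results of the closing operator
```

Extending the schema leaves the existing tuples valid, because every row simply receives ‘*’ in the new column until a context fills it in.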

MANAGEMENT OF THE PDB

The management system includes commands to provide both the definition of the PDB_i and ACT schemes and the data-handling of the PDB. These commands are designed to preserve the integrity of the PDB when they operate on a user’s database. They are subdivided into the following classes:

(a) creation commands: to create a PDB_i, or a new application code table; or to set the active PDB_i
(b) control commands: to check and update the given PDB_i with respect to the existing images on the disk
(c) update commands: to add images to (and subtract images from) a given PDB_i
(d) deletion commands: to delete the chosen PDB_i physically from the disk area
(e) execution commands: to execute procedures on the selected entries of the chosen PDB_i.

All the management commands, listed in Table V, together with the query commands, described in the next section, are stored in a special MIDAS context, named PDB. A brief description of the PDB context commands is given in the Appendix.

Table V. List of the PDB management commands

Command classes (in table order): creation commands; control commands; updating commands; deletion command; execution commands
Commands (in table order): CREATE/PDB, CREATE/ACT, SET/PDB, CHECK/PDB, UPDATE/ACT, ADD/PDB, SUBTRACT/PDB, DELETE/PDB, EXECUTE/PDB

QUERIES ON THE PDB

Pictorial databases must provide tools to perform queries by using:

(a) attribute conditions (AC) defined on the attribute values; for example:
    query 1: List all the image names in PDB_test with NAXIS = 2.
    query 2: Print all the image names in PDB_test with NPIX1 = 128 and with NPIX2 = 256.
(b) values conditions (VC) defined on the image data values; for example:
    query 3: List all the image names in PDB_alfa with value equal to 2·5 in the pixel [10,20] (here expressed in the world co-ordinate system).
    query 4: List all the image names in PDB_alfa with minimum value of the pixel intensity greater than 2·5 and maximum value less than 50.
(c) application conditions (APC) defined on the applications performed on the images; for example, a query may be required to select images for a given analysis, by excluding those for which the same (or other) analysis has already been performed:
    query 5: Select all the images in PDB_test on which the cluster analysis has been performed.
    query 6: Print all images in PDB_test on which an erosion operator has been applied during a fuzzy analysis.

The retrieval of information is mainly performed via conditions provided to the selection command SELECT/PDB, which supports logical (AND, OR, NOT) and relational operators (=, <, >, ≠, ≥, ≤). Inside the conditions, functions (SQRT, LN, LOG10, EXP, SIN, COS, TAN, ASIN, ACOS, ATAN, SINH, COSH, TANH, ABS, INT) and arithmetic operators (+, –, /, *) may be applied to the numerical values. In addition, a complete set of operators taken from relational algebra (union, projection, product, intersection, join, quotient, difference) has been implemented to handle tables.[18,19] Two logical variables, select and all, are used to include the previously selected mask and to restore the complete PDB_i, respectively. The logical variable select represents the truth value of all the previous conditions performed on the PDB_i under examination. The query syntax is:

〈query〉 ::= 〈command〉 〈pdbname〉 〈condition〉
〈command〉 ::= 〈pdbcommand〉/PDB
〈pdbcommand〉 ::= SELECT | LIST | PRINT | EXECUTE
〈pdbname〉 ::= PDB_xxx
〈condition〉 ::= 〈simple condition〉
〈condition〉 ::= 〈simple condition〉.〈logop〉.〈simple condition〉
〈condition〉 ::= [〈simple condition〉.〈logop〉.〈simple condition〉]*
〈condition〉 ::= select.〈logop〉.〈condition〉
〈condition〉 ::= all
〈simple condition〉 ::= AC | VC | APC .〈relop〉. AC | VC | APC
AC | VC | APC ::= 〈function〉 and/or 〈arop〉 on attributes | values | applications attributes | application values
〈logop〉 ::= AND | OR | NOT
〈relop〉 ::= EQ | GT | LT | NE | GE | LE
〈arop〉 ::= + | - | / | *
〈function〉 ::= 〈function〉[〈function〉]* | SQRT | LN | LOG10 | EXP | SIN | COS | TAN | ASIN | ACOS | ATAN | SINH | COSH | TANH | ABS | INT | user_〈function name〉
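To make the dotted condition syntax concrete, the following is a minimal sketch of a reader for a small subset of it, written in Python for illustration only; it is not the MIDAS parser. It assumes flat attribute conditions of the form :ATTRIBUTE.relop.value joined by .AND. or .OR., evaluates them strictly left to right without precedence, and strips all blanks (including any inside quoted strings). The names parse_condition and evaluate are hypothetical.

```python
import operator
import re

RELOPS = {"EQ": operator.eq, "NE": operator.ne, "GT": operator.gt,
          "GE": operator.ge, "LT": operator.lt, "LE": operator.le}

# :ATTRIBUTE.relop.value, where value is a quoted string or a number
SIMPLE = re.compile(r':(\w+)\.(EQ|NE|GT|GE|LT|LE)\.("[^"]*"|[-+]?\d+(?:\.\d+)?)')
LOGOP = re.compile(r'\.(AND|OR)\.')


def parse_condition(text):
    """Split a flat attribute condition into simple conditions and logical operators."""
    text = text.replace(" ", "")
    pos, terms, logops = 0, [], []
    while True:
        match = SIMPLE.match(text, pos)
        if match is None:
            raise ValueError(f"expected a simple condition at position {pos}")
        attr, relop, raw = match.groups()
        value = raw[1:-1] if raw.startswith('"') else float(raw)
        terms.append((attr, relop, value))
        pos = match.end()
        if text[pos:] in ("", "."):  # tolerate a trailing decimal point, e.g. "50."
            return terms, logops
        match = LOGOP.match(text, pos)
        if match is None:
            raise ValueError(f"expected .AND. or .OR. at position {pos}")
        logops.append(match.group(1))
        pos = match.end()


def evaluate(row, terms, logops):
    """Evaluate the parsed condition on one tuple, strictly left to right."""
    attr, relop, value = terms[0]
    result = RELOPS[relop](row[attr], value)
    for logop, (attr, relop, value) in zip(logops, terms[1:]):
        test = RELOPS[relop](row[attr], value)
        result = (result and test) if logop == "AND" else (result or test)
    return result


terms, logops = parse_condition(":NPIX1.EQ.128.AND.:NPIX2.EQ.256")
print(evaluate({"NPIX1": 128, "NPIX2": 256}, terms, logops))
```

Values conditions, application conditions, NOT, functions and the select/all variables are deliberately left out of the sketch; they would need access to the image files and to the ACT and ART relations.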

Several operations are allowed on the selected tuples of the given PDB_i, or on the complete PDB; for example, it is possible to list or to print it, or to execute a procedure on it, or physically to create a new reduced database. The most important commands related to the query are SELECT/PDB, LIST/PDB, PRINT/PDB and EXECUTE/PDB; they permit retrieval of a subset of the given PDB_i. Some additional commands have been implemented to extract information about the existence of images in the PDB and the applications performed on them (SEARCH/PDB and TREE/PDB), or about the codes in the ACT relation (LIST/ACT and PRINT/ACT). Such commands can also be useful in management tasks.

EXAMPLES

Owing to the fact that the pictorial database is built on the basis of a tabular structure, all MIDAS commands that refer to tables are incorporated in our design.
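Because a PDB_i is an ordinary table, its FATHER column can be followed like any other column. As an illustration of retrieving the path of an analysis, the Python sketch below walks the data-chain of Table II from a given image back to its origin. It is only a sketch under stated assumptions: the dictionary is a hypothetical reduction of a PDB_i to its IMAGE and FATHER columns, and analysis_path is not a PDB command.

```python
UNDEFINED = "*"  # marker used in the PDB for a missing FATHER

# Hypothetical reduction of a PDB_i to IMAGE -> FATHER (values from Table II)
FATHER = {"myima": "ip000", "myima1": "myima", "ip000": UNDEFINED}


def analysis_path(image, father=FATHER):
    """Return the chain of images from the origin down to `image`."""
    path, seen = [], set()
    current = image
    while current != UNDEFINED:
        if current in seen:
            raise ValueError(f"cycle in FATHER chain at {current!r}")
        if current not in father:
            raise KeyError(f"{current!r} is not in the PDB")
        seen.add(current)
        path.append(current)
        current = father[current]
    return list(reversed(path))


print(analysis_path("myima1"))
```

The cycle check is a defensive assumption of the sketch; the source does not say how such inconsistencies are handled.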


In terms of the MIDAS command syntax and the query syntax presented above, the queries 1-6 of the previous section are expressed as follows:

query 1: SELECT/PDB PDB_test :NAXIS.EQ.2
         LIST/PDB   PDB_test
query 2: SELECT/PDB PDB_test :NPIX1.EQ.128.AND.:NPIX2.EQ.256
         PRINT/PDB  PDB_test
query 3: SELECT/PDB PDB_alfa :IMAGE[10.,20.].EQ.2.5
         LIST/PDB   PDB_alfa
query 4: SELECT/PDB PDB_alfa :IMAGE,LHCUTS(1).GT.2.5.AND.:IMAGE,LHCUTS(2).LT.50.
         LIST/PDB   PDB_alfa
query 5: SELECT/PDB PDB_test CLU_
query 6: SELECT/PDB PDB_test FUZ_:OPER.EQ."erosion"
         PRINT/PDB  PDB_test
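A values condition such as the one in query 3 has to translate a world co-ordinate into a pixel position using the standard descriptors START, STEP and NPIX. The Python sketch below illustrates one way this could work; it is not the MIDAS routine. It assumes that START is the co-ordinate of the first pixel, that the nearest pixel is taken, and that the values are held line by line with the second axis as the row index. The image dictionaries and function names are hypothetical.

```python
def world_to_index(world, start, step, npix):
    """Map a world co-ordinate to a 0-based pixel index along one axis."""
    index = round((world - start) / step)
    if not 0 <= index < npix:
        raise IndexError(f"co-ordinate {world} lies outside the image axis")
    return index


def pixel_value(image, x, y):
    """Read the pixel at world position (x, y) of a 2D image."""
    ix = world_to_index(x, image["START"][0], image["STEP"][0], image["NPIX"][0])
    iy = world_to_index(y, image["START"][1], image["STEP"][1], image["NPIX"][1])
    return image["values"][iy][ix]  # values are stored line by line


def query3(images, x, y, wanted):
    """Names of the images whose pixel at world position (x, y) equals `wanted`."""
    names = []
    for name, image in images.items():
        try:
            if pixel_value(image, x, y) == wanted:
                names.append(name)
        except IndexError:
            pass  # the position is not covered by this image
    return sorted(names)


images = {
    "a": {"NPIX": (3, 2), "START": (10.0, 20.0), "STEP": (1.0, 1.0),
          "values": [[2.5, 0.0, 0.0], [0.0, 0.0, 0.0]]},
    "b": {"NPIX": (3, 2), "START": (9.0, 19.0), "STEP": (1.0, 1.0),
          "values": [[0.0, 0.0, 0.0], [0.0, 2.5, 0.0]]},
    "c": {"NPIX": (2, 2), "START": (0.0, 0.0), "STEP": (1.0, 1.0),
          "values": [[2.5, 2.5], [2.5, 2.5]]},
}
print(query3(images, 10.0, 20.0, 2.5))
```

Images "a" and "b" hold the value 2.5 at the same world position but at different pixel indices, which is why the condition cannot be answered from the PDB_i attributes alone.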

Note that, in queries 3 and 4, it is necessary to go down into the structure of the images present in the given PDB_i, owing to the fact that neither the pixel value nor the minimum and maximum intensities are declared explicitly in the PDB_i. By contrast, query 5 needs only to read the ACT relation and the active PDB_i, and query 6 also needs to read the ART FUZ_'ima' related to each image name ima in the active PDB_i.

Complex queries with logical operators (AND, NOT, OR) can be computed via recursive selections, by splitting them into a sufficient number of selections before performing the final action. Obviously, the order of selection is not relevant. For example, query 4 can also be expressed as:

query 4: SELECT/PDB PDB_alfa :IMAGE,LHCUTS(1).GT.2.5
         SELECT/PDB PDB_alfa select.AND.:IMAGE,LHCUTS(2).LT.50.
         LIST/PDB   PDB_alfa
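The effect of the select variable can be sketched as a Boolean mask that each new selection combines with the previous one. The Python fragment below is an illustration only; the rows, the predicate functions and the select helper are hypothetical, with the LHCUTS pair standing for the minimum and maximum values read from each image.

```python
def select(rows, predicate, mask=None):
    """Return a new selection mask; a previous mask, if given, is ANDed in."""
    if mask is None:  # no previous selection: start from the complete PDB_i
        mask = [True] * len(rows)
    return [kept and predicate(row) for kept, row in zip(mask, rows)]


def low_ok(row):
    return row["LHCUTS"][0] > 2.5   # :IMAGE,LHCUTS(1).GT.2.5


def high_ok(row):
    return row["LHCUTS"][1] < 50.0  # :IMAGE,LHCUTS(2).LT.50.


pdb_alfa = [
    {"IMAGE": "a", "LHCUTS": (3.0, 40.0)},
    {"IMAGE": "b", "LHCUTS": (1.0, 40.0)},
    {"IMAGE": "c", "LHCUTS": (3.0, 60.0)},
]

mask = select(pdb_alfa, low_ok)          # first SELECT/PDB
mask = select(pdb_alfa, high_ok, mask)   # second SELECT/PDB with select.AND.
swapped = select(pdb_alfa, low_ok, select(pdb_alfa, high_ok))

print([row["IMAGE"] for row, kept in zip(pdb_alfa, mask) if kept])
```

Applying the two conditions in the opposite order gives the same mask, in line with the remark that the order of selection is not relevant for conditions joined by AND.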

The combined query: list all the image names in PDB_alfa with NPIX1 = 512 and NPIX2 = 512, with values ranging from 1 to 325, on which some cluster analysis has been performed and for which the fuzzy morphological operators erosion and dilation have been applied with a cross structuring element (with reference to the ART relation shown in Table IV), would then be expressed as:

SELECT/PDB PDB_alfa :NPIX1.EQ.512.AND.:NPIX2.EQ.512
SELECT/PDB PDB_alfa select.AND.:IMAGE,LHCUTS(1).EQ.1.
SELECT/PDB PDB_alfa select.AND.:IMAGE,LHCUTS(2).EQ.325.
SELECT/PDB PDB_alfa select.AND.CLU_
SELECT/PDB PDB_alfa select.AND.FUZ_:OPER.EQ."erosion"
SELECT/PDB PDB_alfa select.AND.FUZ_:OPER.EQ."dilation"
SELECT/PDB PDB_alfa select.AND.FUZ_:STREL.EQ."cross"
LIST/PDB   PDB_alfa


IMPLEMENTATION NOTES AND EVALUATION

MIDAS is a system developed for Unix and DEC VMS operating systems in an X-Windows environment. MIDAS is available for several computer systems (VAX, Apollo, Bull, Sun, IBM RISC/6000, Macintosh, Stellar, Alliant, PCS, HP). For technical and implementation details, see References 13 and 14.

The set of PDB commands has been implemented as a context inside a portable version of MIDAS under Unix, on a Sun-4/110 workstation running the Sun/OS 4.1 operating system, with a disk subsystem (average seek time 18 ms); like all the other existing MIDAS contexts, it has been developed in FORTRAN 77. Essentially, the implementation of each PDB command requires the creation of three different file types:

1. Source-code file ('filename'.for). It contains the source code of the PDB command.
2. Procedure file ('filename'.prg). It contains the parser to check the syntax of the command line, and to assign the appropriate set of parameters to MIDAS keywords. It may also contain MIDAS commands, e.g. those used to run an executable code.
3. Help file ('filename'.hlp). It provides the on-line explanation of PDB commands.

All the commands are defined in a context file (pdb.ctx), which contains the names of the new commands and the links to the related procedure files.

Images and tables are organized in composite data structures derived from the FITS format, as previously described. At present, MIDAS allows tables with 256 columns at most, while the number of rows is limited by the mass storage available. Attribute values can be of integer, real, double-precision, logical or character type. In principle, image data may have an unlimited number of dimensions (NAXIS); nevertheless, the actual implementation of the MIDAS system only allows 1D, 2D and 3D real data. Descriptors can be of integer, real, double-precision, logical and character type. A graphics interface allows the user to extract subimages for query purposes and to display results. Image operators allow the merging of pairs of images and the performance of standard algebraic operations. The last feature is very useful in expressing complex queries.

The performance of the proposed PDB has been evaluated by testing query and management commands under variable conditions of data-set size. In Figure 3, curve (a), the CPU times needed to create PDB_i’s with increasing numbers of entries (0, 100, ..., 500) are shown. For zero entries, the CPU time is still greater than zero (~0·723 s); this happens because the command CREATE/PDB generates a table of 250 empty rows by default. The experimental points fit a straight line, as expected. The performance of the SUBTRACT/PDB command has been evaluated by deleting 100 elements from the tail (worst case) of a PDB_i with varying numbers of entries (100, 200, ..., 500). Figure 3, curve (b), shows that the points fit a straight line.

Figure 3. CPU time (in seconds) required to create PDBs with different numbers of entries (curve (a)) and to delete 100 entries from the same PDBs (curve (b))

The performance of the ADD/PDB command has been evaluated in two situations: (a) add 100 elements to a PDB_i containing 0, 100, ..., 500 entries (Figure 4, curve (a)); (b) add 100, 200, ..., 500 elements to a PDB_i with 500 entries (Figure 4, curve (b)). In both cases, a sorting is performed after the addition; this implies a total time complexity of O(N + N log N). Figure 4 shows the results of these experiments; the variations in curve (a) are due to the internal management of MIDAS tables, which provides an automatic extension of the number of rows, as is occasionally required by the memory mechanism. Nevertheless, this effect is not relevant (~5 per cent of the total CPU time).

The performance of the SEARCH/PDB command has been evaluated in two situations: (a) search 10 entries at the end of PDBs with increasing size (100, ..., 1000) (Figure 5, curve (a)); (b) search 10, ..., 500 entries at the end of a PDB_i with 1000 entries (Figure 5, curve (b)). Figure 5, curve (a), shows a non-linear dependence due to the fact that, when few entries have to be searched, the system service routines play a more relevant role. Figure 5, curve (b), shows a linear trend due to the fact that the PDB_i size (1000 entries) is the main contributor to the search time.

The SELECT/PDB command has a computation time complexity similar to that of SEARCH/PDB, but in this case the proportionality factor is bigger because the selection is a more general command: in fact, it can select multiple entries from any attribute (column) of the PDB_i, while the SEARCH/PDB command works on only one entry in the primary attribute (the :IMAGE column). For example, the selection of 10 elements in a PDB_i with 1000 entries requires 0·93 s, whereas the search, under equal conditions, needs 0·53 s.


Figure 4. CPU time (in seconds) required to execute the ADD/PDB command under two different conditions (see text)
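The add-then-sort behaviour described for ADD/PDB can be sketched as follows, in Python and for illustration only. The sketch assumes that the table is kept ordered on the IMAGE column and that duplicate image names are rejected (consistent with a relation being a set); neither detail is stated for the actual command, and add_entries is a hypothetical name.

```python
def add_entries(pdb_rows, new_rows, key="IMAGE"):
    """Append rows to a PDB_i table, then re-sort it on the reference column."""
    names = {row[key] for row in pdb_rows}
    for row in new_rows:                       # one pass over the additions
        if row[key] in names:
            raise ValueError(f"{row[key]!r} is already in the PDB")
        names.add(row[key])
        pdb_rows.append(row)
    pdb_rows.sort(key=lambda row: row[key])    # comparison sort, O(N log N)
    return pdb_rows


pdb = [{"IMAGE": "ip000"}, {"IMAGE": "myima"}]
add_entries(pdb, [{"IMAGE": "zeta"}, {"IMAGE": "alfa"}])
print([row["IMAGE"] for row in pdb])
```

The linear pass plus the sort gives the O(N + N log N) cost quoted in the text, with N the size of the table after the addition.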

COMPARISON WITH EXISTING IMAGE DATABASE SYSTEMS

Most of the data-management systems oriented to pictorial information are developed for use in specific application areas. Consequently, their query languages depend strongly on these applications. This is particularly evident whenever a pictorial database is set up to hold experimental scientific data. The problem can be partially solved by designing modular and open systems in which application programs belonging to different contexts can be included in query statements. This reasoning underlies the PROBE architecture [5]. The goal of PROBE is to provide a general-purpose database system for applications involving spatial and temporal data and other kinds of data with complex structure.

The PROBE data model has two basic types of data objects: entities and functions. An entity is a data object that denotes some individual thing. Functions are relationships between entities or operations on entities. Thus, in order to access the properties of an entity, or other entities related to it, one must evaluate a function having the entity as an argument. For example, consider a proximity query: find all images in PDB_alfa that lie within a given positive threshold, THRESH, of a given TEST image. Instead of requiring an object class capable of processing this entire query, PROBE only requires the object class to provide a function that indicates whether a pair of objects satisfies the selection condition. This functionality is included in PDB, where the previous PROBE query is performed as follows:

SELECT/PDB PDB_alfa user_dist1(:IMAGE,TEST).LE.THRESH
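The idea behind this query, a user-supplied function evaluated on each (image, TEST) pair and compared with a threshold, can be sketched in Python as follows. This is only an illustration under stated assumptions: the paper does not define user_dist1, so the mean absolute pixel difference used here is a hypothetical stand-in, and images are modelled as flat sequences of pixel values rather than MIDAS bdf files.

```python
from typing import Callable, Dict, List, Sequence

Image = Sequence[float]  # assumption: an image is a flat sequence of pixels


def user_dist1(a: Image, b: Image) -> float:
    """Hypothetical user distance: mean absolute pixel difference."""
    if len(a) == 0 or len(a) != len(b):
        raise ValueError("images must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_within(pdb: Dict[str, Image], test: Image, thresh: float,
                  dist: Callable[[Image, Image], float] = user_dist1
                  ) -> List[str]:
    """Names of the images whose distance from `test` is <= thresh."""
    return [name for name, img in pdb.items() if dist(img, test) <= thresh]


if __name__ == "__main__":
    pdb_alfa = {
        "img1": [0.0, 0.0, 0.0, 0.0],
        "img2": [1.0, 1.0, 1.0, 1.0],
        "img3": [5.0, 5.0, 5.0, 5.0],
    }
    test_image = [1.0, 1.0, 1.0, 1.0]
    print(select_within(pdb_alfa, test_image, thresh=1.0))
```

The database layer only needs the pairwise function; any other distance with the same signature can be passed through the `dist` parameter without changing the selection logic.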


Figure 5. CPU time required to execute the SEARCH/PDB command under two different conditions (see text)

Another relevant aspect is man-machine communication via a graphical interface. PICQUERY [4] is a non-procedural tabular query language beyond QPE (query by pictorial examples) for logically enhancing on-line access of robust pictorial database management systems. PICQUERY commands may operate on the whole pictorial database or on a set of picture-object identifiers; this opportunity is also present in PDB commands, as shown in the previous sections. For each required operation, PICQUERY displays a related table, the columns of which have to be filled in by the user to perform the operation under the desired conditions. This is particularly efficient for managing pictures with predefined objects, related to simple procedures. At present PDB is a procedural command language without user-dialogue tables: in scientific applications the procedures can become very complex and interrelated, so that it is not easy to define unambiguous dialogue tables for many structured operations. We intend to work on this issue in the future.

Pictorial and alphanumeric databases must be integrated to provide a uniform interface, but their representation and processing must be clearly distinguished. PSQL [3] is a system that allows pictures to be represented, stored and queried in their analogue form. PSQL also allows the user to manipulate the pictorial database directly. Alphanumeric data associated with pictures can be displayed on the picture to assist the user. PDB can operate, like PSQL, on both pictorial and alphanumeric domains; nevertheless, it does not provide basic pictorial domains such as points, line segments and regions.

Pictorial databases like EIDES (ETL image database for experimental studies) offer unified methods of access to a standardized image-file organization, where raw data accessible by pointers are integrated with alphanumeric symbolic information. In addition, PDB supports a hierarchical organization of images and of the applications performed on them: the hierarchy links the results of the applications to the input images [20,21].

PDB incorporates, at the same time, most of the properties individually included in the systems mentioned above. In particular, because PDB is implemented in MIDAS, it is naturally interfaced with all the existing application contexts, sharing data and functions with them. This implies two advantages:
1. Powerful and very complex queries can be built, based on the user’s procedures and functions. This feature makes PDB a very open system.
2. The import/export of data from/to PDB is very easy and transparent within any MIDAS context.
Moreover, PDB allows the user to retrieve the history of an application session by means of the ARTs previously described; this feature is very useful to the analyst who performs many iterated analyses on the same data set. Finally, the portability of PDB among Unix-based computer systems is guaranteed by the portability of MIDAS.

FINAL REMARKS

The organization of the image data in a pictorial database is crucial for achieving efficiency in the analysis, selection and retrieval phases at the same time. The integration of image and tabular data in PDB allows visual and non-visual information to be shared among different applications, and hence promotes such efficiency. The design of PDB is oriented particularly to the analysis of experimental scientific data: it is portable, well integrated with the applications, and the user can keep track of the history of any working session.
It should be noted that in a wide range of application areas, including γ-ray and X-ray astronomy, the handling and retrieval of pictorial information must take into account that image data are characterized by an intrinsic indeterminacy, mainly due to the low statistical quality of the information, to the diffuse morphology of the objects, and to intrinsic experimental errors. Therefore, an information-retrieval system should ideally also support the evaluation of data containing statistical uncertainties, and this is a requirement that conventional pictorial databases do not fully satisfy [11,12]. Future developments of the PDB system presented here will address this problem.

APPENDIX

In the following, all the commands related to the management and query of the PDB are described briefly, as implemented in the MIDAS system. Each command line is structured as:

command/qualifier par1 par2 ...
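As an illustration of this command-line shape, the following Python sketch splits a line into its command, qualifier and parameters. It is a hypothetical helper, not part of MIDAS: it assumes whitespace-separated tokens and a single "/" between command and qualifier, and it does not model any other rule of the MIDAS command language.

```python
from typing import List, NamedTuple


class CommandLine(NamedTuple):
    command: str       # general action, e.g. "ADD"
    qualifier: str     # object of the action, e.g. "PDB"
    params: List[str]  # remaining parameters, in order


def parse_command_line(line: str) -> CommandLine:
    """Split 'command/qualifier par1 par2 ...' into its three parts."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty command line")
    command, sep, qualifier = tokens[0].partition("/")
    if not sep or not command or not qualifier:
        raise ValueError(f"expected command/qualifier, got {tokens[0]!r}")
    return CommandLine(command, qualifier, tokens[1:])


if __name__ == "__main__":
    print(parse_command_line("ADD/PDB img1 PDB_alfa"))
    print(parse_command_line("CLEAR/CONTEXT"))
```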

The commands describe the general action to be performed, and the qualifiers specify the object of that action. The parameters hold all other information needed to perform the required action. The commands related to the PDB are stored in a special MIDAS application, named the PDB context. The name of the active PDB_i is stored in the PDB_ON keyword. The PDB files are completely equivalent to table files in MIDAS. They have the same extension ‘.tbl’; therefore all commands operating on tables (displaying, editing, plotting, statistics and relational algebra commands) can be applied to PDB_i tables, ACTs and ARTs.

SET/CONTEXT PDB
activate all commands in the PDB MIDAS context.

CLEAR/CONTEXT
deactivate all commands of the current MIDAS context.

CREATE/PDB pdbname list-of-images
create a new pdbname database and activate it.

CREATE/ACT
interactively create a new application code table ACT.

UPDATE/ACT
update the application code table ACT via interactive editing.

SET/PDB pdbname
set the name of the active database and check its integrity with respect to the existing bdf images on the disk.

CHECK/PDB pdbname correction_flag
check whether all images in the given PDB_i still physically exist in the working directory on the disk, and erase all the entries concerning images no longer present in it.

ADD/PDB list-of-images pdbname
add an image (or a list of images) entry to the given PDB_i.

SUBTRACT/PDB list-of-images pdbname
subtract an image (or a list of images) entry from the given PDB_i.

DELETE/PDB pdbname
physically delete the given PDB_i from the working directory. (This is a very risky operation, like ...* in Unix!)

EXECUTE/PDB pdbname proc.prg par1 par2 ...
EXECUTE/PDB pdbname command/qualifier par1 par2 ...
execute a MIDAS procedure or command on all the selected images in the given PDB_i.

SEARCH/PDB list-of-images list-of-pdbname out-table disp-flag
search for an image (or a list of images) in all the PDB_i present in the user’s working directory, and store the results in an output table.

TREE/PDB image-name
display on the terminal the code and the identification of all applications (if present) performed on the given image. The ACT relation is used.


LIST/ACT
display on the terminal the applications present in the application code table ACT.

PRINT/ACT
print the application code table ACT.

SELECT/PDB pdbname condition
select entries in the given pdbname under the conditions specified.

LIST/PDB pdbname
display on the terminal all the selected elements of the PDB_i table.

PRINT/PDB pdbname
print all the selected elements of the PDB_i table.

REFERENCES

1. V. Di Gesù and M. C. Maccarone, ‘An approach to random image analysis’, in V. Cantoni, V. Di Gesù and S. Levialdi (eds), Image Analysis and Processing II, Plenum Press, New York, 1988, pp. 111–118.
2. H. Tamura and N. Yokoya, ‘Image database systems: a survey’, Pattern Recognition, 17, (1), 29–43 (1984).
3. N. Roussopoulos, C. Faloutsos and T. Sellis, ‘An efficient pictorial database system for PSQL’, IEEE Trans. Software Eng., 14, (5), 639–650 (1988).
4. T. Joseph and A. F. Cardenas, ‘PICQUERY: a high level query language for pictorial database management’, IEEE Trans. Software Eng., 14, (5), 631–638 (1988).
5. J. A. Orenstein and F. A. Manola, ‘PROBE spatial data modeling and query processing in image database application’, IEEE Trans. Software Eng., 14, (5), 611–629 (1988).
6. M. M. Zloof, ‘QBE/OBE: a language for office and business automation’, Computer, 14, 13–22 (1981).
7. M. L. Baird, ‘A computer vision data base for the industrial bin of parts problem’, GM Research Publication GMR-2502, GM Research Laboratories, 1977.
8. M. J. Jackson, ‘Digital cartography, image analysis, and remote sensing, towards an integrated approach’, Interdisciplinary Science Reviews, 12, (1), 33–40 (1987).
9. A. L. Zobrist and G. Nagy, ‘Pictorial information processing of Landsat data for geographic analysis’, Computer, 14, (1), 34–41 (1981).
10. D. T. Lauer, ‘Applications of Landsat data and the data base approach’, Photogrammetric Eng. and Remote Sensing, 52, (8), 1193–1199 (1986).
11. M. C. Maccarone and R. Buccheri, ‘Decision problems in the search for periodicities in gamma-ray astronomy. How can A.I. help?’, in A. Heck and F. Murtagh (eds), Knowledge-based Systems in Astronomy, Springer-Verlag, Series ‘Lecture Notes in Physics’, 1989, pp. 79–87.
12. V. Di Gesù, M. C. Maccarone, D. Ponz, D. Tegolo and M. Tripiciano, ‘Pictorial information retrieval with uncertain knowledge’, in V. Cantoni, L. P. Cordella, S. Levialdi and G. Sanniti di Baja (eds), Progress in Image Analysis and Processing, World Publishing Corp., 1990, pp. 583–589.
13. K. Banse, D. Ponz, Ch. Ounnas, P. Grosbol and R. Warmels, ‘The MIDAS image processing system’, in L. B. Robinson (ed.), Instrumentation for Ground-Based Optical Astronomy, Springer-Verlag, New York, 1988, pp. 431–442.
14. ESO Image Processing Group, The MIDAS Environment, 1986.
15. D. C. Wells and E. W. Greisen, ‘FITS: a flexible image transport system’, Kitt Peak National Observatory, 1979.
16. C. J. Date, An Introduction to Database Systems, 2nd edn, Addison-Wesley, Reading, Mass., 1977.
17. J. D. Ullman, Principles of Database Systems, 2nd edn, Computer Science Press, Rockville, MD, 1982.
18. M. C. Maccarone, N. Robba and O. Di Rosa, ‘A support environment in the X-ray astronomical data analysis: some preliminaries’, in F. Murtagh and A. Heck (eds), Astronomy from Large Databases, ESO Conf. and Workshop Proc. no. 28, 1988, pp. 383–388.
19. V. Di Gesù, M. C. Maccarone, D. Ponz and N. R. Robba, ‘REAL: relational algebra package in MIDAS’, ESO Messenger, 1989.


20. D. M. McKeown, Jr. and D. R. Reddy, ‘A hierarchical symbolic representation for an image database’, Proc. 1977 IEEE Workshop on Picture Data Description and Management, 1977, pp. 40–44.
21. T. A. Pavlidis and S. L. Tanimoto, ‘A hierarchical data structure for picture processing’, Comp. Graphics & Image Processing, 4, 104–119 (1975).
