protégé as a vehicle for developing medical terminological systems

June 23, 2017 | Author: Ronald Cornet | Categories: Knowledge Representation, Knowledge Based System, INTENSIVE CARE, Conceptual Framework, Knowledge Acquisition, Knowledge Modeling, Spectrum


ARTICLE IN PRESS

Int. J. Human-Computer Studies 62 (2005) 639–663 www.elsevier.com/locate/ijhcs

PROTÉGÉ as a vehicle for developing medical terminological systems

Ameen Abu-Hanna a,*, Ronald Cornet a, Nicolette de Keizer a, Monica Crubézy b, Samson Tu b

a Department of Medical Informatics, AMC-University of Amsterdam, Meibergdreef 15, 1105 AZ Amsterdam, The Netherlands
b Stanford Medical Informatics, Stanford University School of Medicine, 251 Campus Drive, Stanford, CA 94305-5479, USA

Available online 21 April 2005

Abstract

A medical terminological system (TS) is essentially an ontology consisting of concepts, attributes and relationships pertaining to medical terms. There are many TSs around today, most of which are essentially frame-based. Various efforts have been made to get a better understanding of the requirements and the conceptual and formal structures of TSs. However, TSs have so far been implemented with ad hoc approaches that start from scratch, and the ad hoc semantics of their representations diminishes the interoperability of the represented knowledge with external applications. In recent years, PROTÉGÉ has been gaining popularity as a software environment for the development of knowledge-based systems. It provides an architecture for integrating frame-based ontologies with knowledge acquisition and other applications operating on these ontologies. In its recent version, PROTÉGÉ provides the ability to specify meta-classes and meta-slots. This contributes to an explicit separation of knowledge levels and allows for increased modeling flexibility. These properties, and the fact that it complies with a standard knowledge model, make PROTÉGÉ an attractive candidate for the implementation of frame-based TSs. This paper investigates how to specify a TS in PROTÉGÉ and demonstrates this in a specific application in the domain of intensive care. Our approach is characterized by the use of a conceptual framework for understanding TSs and by mapping its components onto PROTÉGÉ constructs. This results in specifications of knowledge components for the implementation of terminological systems. The significance of our work stems from the generality of these specifications, which facilitates their reuse and leads to a principled process for developing terminological systems for a broad spectrum of medical domains.

Keywords: Medical ontologies; Terminological systems; PROTÉGÉ; Knowledge representation

* Corresponding author. Tel.: +31 20 5665959; fax: +31 20 6919840.
E-mail addresses: [email protected] (A. Abu-Hanna), [email protected] (R. Cornet), [email protected] (N. de Keizer), [email protected] (M. Crubézy), [email protected] (S. Tu).
1071-5819/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.ijhcs.2005.02.005

1. Introduction

Medical care is an information-intensive process. Patient information about diagnosis, treatment or prognosis is recorded by health care professionals and communicated to others. It is also used by managers and researchers to analyse health care. An important building block of medical information is formed by medical terms such as ‘Hepatitis’ or ‘Coronary artery bypass graft’, which, in this case, correspond to a medical condition and a surgical procedure, respectively. In accordance with de Keizer et al. (2000), we will refer to the model of the concepts and relationships associated with medical terms as a terminological system (TS), a system in the sense that it imposes order on its elements. A TS can range from a simple list of terms to a full-fledged ontology. A terminological server provides services using the TS. In particular, the server uses the concepts, terms and relationships in the TS to provide services supporting the delivery, management and analysis of medical care. For example, one would often like to communicate about the condition of a patient using the preferred term for that condition, regardless of the specific term that was recorded. A simple functional architecture that puts these notions in perspective is shown in Fig. 1. In practice these components are often packaged together, that is, without an explicit distinction between the TS and the terminological server. In addition, they often lack an explicit description of the underlying representation formalism.

[Figure: a Terminological System (e.g. Intensive Care Ontology), expressed in a Representation Formalism (e.g. Frames, Description Logic), is maintained through an interface by a user such as a knowledge engineer; a Terminological Server offers services (service . . . service) through an interface to a user such as a physician.]
Fig. 1. A functional architecture that distinguishes the notions of terminological system and terminological server.

There are many “terminological packages” in clinical use or research today, such as ICD (WHO, 1993), SNOMED (Rothwell, 1995), UMLS (Lindberg et al., 1993) and Galen (Rector et al., 1997). They differ in their aims, in their realizations of the services and in their representation of the terminological system. We recently suggested a framework for understanding the concepts behind this variety of TSs (de Keizer et al., 2000; de Keizer and Abu-Hanna, 2000). In this paper we will refer to this framework as FUTS (Framework for Understanding Terminological Systems). The FUTS is a conceptual framework, not a tool; it provides a typology of TSs and formalizes perceived requirements on TSs in model fragments, or patterns. The FUTS relies on a representation formalism based on conceptual and formal languages, namely Entity-Relationship (ER) modeling (Chen, 1976) and first-order logic (FOL). We use this formalism to describe existing terminological systems in order to understand them better and to compare their structure with the model fragments that formalize TS requirements. Alternatively, the model fragments can be used as a basis for the design of a new TS in which these fragments are reused and extended. This has been the case for the DICE TS (de Keizer et al., 1999), which was designed at the Department of Medical Informatics of the University of Amsterdam. The DICE TS is a representation of concepts and relationships associated with terms referring to reasons for admission to an intensive care unit.

In the development of a new terminological system, e.g. in a specialized medical area, the analysis and design phases are followed by the implementation of the TS and the terminological services in some programming language or environment. For example, the DICE TS and the services built on top of it have been implemented in Java.
Although the developer of the DICE TS has complete control over the implementation, this realization of the DICE TS has a number of important drawbacks. First, the DICE TS has been developed from scratch, which required extensive effort. This effort included making various representational and design decisions about what to express and how to express it (for example, whether and how to facilitate concept definitions and concept compositions). Second, the implementation of the DICE TS has not relied on a standard knowledge model that provides the meaning of the primitives used in the representation (e.g. frame slots). Instead, the meaning of the representational primitives is defined procedurally by the proprietary programs, such as service programs, that access them. This hinders the interoperability of the TS as well as of the services operating on it: the TS cannot easily be accessed by third-party applications, and the current service programs cannot be assumed to work with other TSs. In the last decade PROTÉGÉ (Gennari et al., 2003) has been continuously evolving to provide an ontology engineering environment that supports the implementation of knowledge-based systems. Its recent version includes features that make it an attractive candidate for the development of a medical TS, which can be considered an ontology from PROTÉGÉ’s point of view. The architecture of PROTÉGÉ enables the integration of ontologies and applications, meaning that service applications can operate on the ontology in the same environment. Other
important features of PROTÉGÉ include its compliance with the Open Knowledge-Based Connectivity (OKBC) knowledge model (Chaudhri et al., 1998) and its ability to support the definition of meta-classes and meta-slots (Noy et al., 2000). PROTÉGÉ’s compliance with the OKBC knowledge model makes the frame semantics more explicit and understandable to other systems, hence increasing the interoperability of ontologies developed with PROTÉGÉ. The ability to define application-specific meta-classes and meta-slots allows the explicit structuring of ontologies at different knowledge levels. It also provides modeling flexibility, allowing for the creation of user-defined constructs that mirror useful conceptual structures, such as those proposed in FUTS. In this paper we investigate how the PROTÉGÉ methodology can be harnessed to implement medical TS structures, exemplified by the DICE TS. We chose DICE for two reasons. First, it is designed based on common desiderata of TSs, and hence the results can be (re)used as a basis for building new TSs. Second, the DICE TS includes a variety of essential TS types, in particular a thesaurus allowing for synonyms, a vocabulary that enables (partial) definitions and a nomenclature allowing the composition of new concepts. Our approach is characterized by the use of the FUTS as the starting point and by developing PROTÉGÉ constructs that correspond to the framework’s components. The paper provides a proof of concept for building TSs in PROTÉGÉ, and it results in specifications and specification patterns that can be used as the basis for the implementation of various frame-based TSs in PROTÉGÉ. The paper is organized as follows. The next section describes the materials and methods used. These include the FUTS, the DICE TS, which is based on this framework, and the PROTÉGÉ environment. Section 3 reports on results in terms of a set of specifications for a PROTÉGÉ ontology corresponding to the DICE TS.
Section 4 discusses our approach and concludes this paper.

2. Materials and methods

An end user of terminological services usually observes terms and codes. However, the true power of a TS that lies behind these services emanates from its ability to explicitly represent the concepts designated by these terms and codes, and the relationships between these concepts. For example, the medical concept of the heart organ can be designated by the term ‘Heart’ in English or ‘Cœur’ in French, and may be coded as, e.g. ‘900’. This concept can be specified as a sub-concept of the more general concept ‘Body Organ’ by relating these two concepts through an ‘is_a’ relationship. The materials and methods in this paper rely on the notions of concept, relationship, term, language and code.

2.1. FUTS: a framework for understanding TSs

A TS, whether based on an explicit or implicit conceptual representation, can be of various types. Fig. 2, adapted from de Keizer et al. (2000), shows the most important TS types. A basic TS is essentially a representation of terms without explicit relationships among these terms (e.g. the is_synonym_of relationship) or among the concepts (e.g. the is_a relationship). Composite TSs are obtained by imposing some ordering on concepts and terms. A composite TS may be one or more of the following: a Thesaurus, if synonymous terms are clustered together and indexed; a Classification, when it uses the generic is_a relationship between concepts; a Vocabulary, when it includes definitions of concepts; a Nomenclature, when it includes rules for composing new concepts and terms; and a Coding system, when it includes a coding scheme for designating concepts.

[Figure: Terminological System specializes (isa, disjoint) into Basic Terminological System and Composed Terminological System; Composed Terminological System specializes (isa, overlapping) into Thesaurus (index, synonyms), Classification (generic relation), Vocabulary (definitions), Nomenclature (composition scheme) and Coding system (coding scheme).]
Fig. 2. An entity-relationship model of types of TSs. Diamonds denote aggregations, “d” denotes disjointness and “o” denotes possible overlap between the specializations of an entity.

In the literature, there have been suggestions for requirements and desiderata on medical TSs (Cimino et al., 1994; Campbell et al., 1997; Chute et al., 1998; Cimino, 1998). These requirements imply constraints on the representation formalism and on what should or should not be represented. Important desiderata include the ability to: explicitly represent concepts; label relationships; cover the medical domain at hand; compose new concepts from existing ones; have multiple classifications of a
concept; use multilingual terms (i.e. terms in different languages) and synonyms; assign a concept identifier that does not disclose the meaning of the concept; and assign a unique code to each concept. Conversely, the following should be avoided: redundancy (different concepts with the same meaning); ambiguity (a concept corresponding to more than one object in reality); vagueness (a concept that is incomplete in meaning and hence does not correspond to one specific object in reality); restriction of the depth or breadth of the concept hierarchy; and the use of residual categories (like ‘not otherwise specified’). Many of these desiderata concern the actual content of the TS (e.g. that the concepts cover the domain at hand), hence they are difficult or impossible to formalize. Other desiderata, however, concern the representation formalism itself (e.g. the ability to classify a concept under multiple concepts) or concern constraints on the model (e.g. provide for multilingual terms and synonyms). A framework that is meant, among other things, to cluster and formalize this latter category of desiderata is the FUTS (the first part is described by de Keizer et al. (2000) and the second by de Keizer and Abu-Hanna (2000)). The starting point of the FUTS is that at the heart of any TS is a pattern consisting of a relationship between a concept, a language and a term, on which (formal) constraints are specified. These constraints, together with the concept-language-term model, express many of the desiderata and requirements on TSs. In FUTS, the ER formalism (Chen, 1976) is used for conceptual modeling and FOL is used to express the constraints. The concept-language-term (hereafter C-L-T) model is shown in the upper part of Fig. 3. The C-L-T model can be extended to describe various types of TSs, such as a vocabulary.
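
To make the C-L-T pattern concrete, it can be read as a set of Description triples, each carrying a synonym type. The Python sketch below is only an illustration of that reading, not part of FUTS or of the DICE implementation; it also shows how the preferred-term service mentioned in the introduction follows from the triples. The class and method names, the concept identifiers ‘c1’ and ‘c2’ and the English synonym pair are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SynonymType(Enum):
    PREFERRED = "preferred"
    SYNONYM = "synonym"


@dataclass(frozen=True)
class Description:
    """One C-L-T triple: a term that designates a concept in a language."""
    concept: str          # meaning-free concept identifier (hypothetical IDs below)
    language: str
    term: str
    synonym_type: SynonymType


class CLTModel:
    """Hypothetical in-memory store of Description triples."""

    def __init__(self, descriptions):
        # Description is a relation, so duplicate triples collapse into one.
        self.descriptions = set(descriptions)

    def concepts_for(self, term, language):
        """All concepts that a term designates in the given language."""
        return {d.concept for d in self.descriptions
                if d.term == term and d.language == language}

    def preferred_term(self, concept, language):
        """The preferred term of a concept in a language, or None if absent."""
        for d in self.descriptions:
            if (d.concept == concept and d.language == language
                    and d.synonym_type is SynonymType.PREFERRED):
                return d.term
        return None

    def normalise(self, recorded_term, language):
        """Map a recorded term to the preferred term(s) of the concept(s) it designates."""
        return {self.preferred_term(c, language)
                for c in self.concepts_for(recorded_term, language)}


# Illustrative content only; identifiers and synonyms are made up for the example.
clt = CLTModel([
    Description("c1", "en", "Myocardial infarction", SynonymType.PREFERRED),
    Description("c1", "en", "Heart attack", SynonymType.SYNONYM),
    Description("c2", "en", "Heart", SynonymType.PREFERRED),
    Description("c2", "fr", "Cœur", SynonymType.PREFERRED),
])

print(clt.normalise("Heart attack", "en"))  # {'Myocardial infarction'}
print(clt.preferred_term("c2", "fr"))       # Cœur
```

The recorded synonym ‘Heart attack’ is resolved to its concept and then to that concept’s preferred term. Returning a set rather than a single string keeps ambiguity (one term designating several concepts) visible instead of silently picking one answer.
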
It is important to note that ‘Model Concept’ is in fact a meta-concept that is instantiated by any model concept, whether it is ‘Disease’, ‘Organ system’ or ‘Aetiology’, and also by relationships such as ‘causes’. Note that ‘Aetiology’ and ‘Disease’ are instances (of the meta-concept ‘Model Concept’) and concepts at the same time. Hence the C-L-T model is essentially a meta-model. For simplicity, in this paper we will not make the effort to represent relationships, such as ‘causes’, as instances of the meta-class ‘Model Concept’. This is because in most applications relationships are merely used for navigating between concepts, and not for their own terms and codes, to which ‘Model Concept’ provides access.

[Figure: ER diagram relating Language, Model Concept and Model Term through Description, with the attributes Synonym type, Name and code; below it, ‘Aetiology’ and ‘Disease’ are each instance_of Model Concept and are linked by ‘causes’.]
Fig. 3. An ER diagram of the C-L-T model capturing the relationship between terms and model concepts in the TS. It also shows how relationships such as ‘causes’ are represented. Although ‘Aetiology’ and ‘Disease’ are instances of ‘Model Concept’, they are also concepts themselves.

In FUTS, constraints implied by the desiderata are imposed on the C-L-T model by cardinality constraints in ER diagrams and are formalized in FOL. The FOL representation can express all the cardinality constraints in ER diagrams, as well as constraints that cannot be expressed in these diagrams. Below is a list of important constraints that capture the following requirements: non-redundancy, the ability to represent multilingual terms, and the ability to represent synonyms with one preferred term per concept for each language. For brevity we use ‘Concept’ to refer to ‘Model Concept’.

$$\mathit{synonym\_type} : \mathit{Description} \rightarrow \{\mathit{preferred}, \mathit{synonym}\}$$

A description can be preferred or synonymous. Loosely speaking, this means that a term may be preferred or synonymous for a concept in a given language.

$$\forall c \in \mathit{Concept}\ \forall l \in \mathit{Language}\ \exists!\, t \in \mathit{Term}\ \big(\mathit{Description}(c, l, t) \wedge \mathit{synonym\_type}(\langle c, l, t \rangle) = \mathit{preferred}\big)$$

For every language there is exactly one preferred term for each concept. Here $\exists!\, y\, A$ denotes that exactly one $y$ exists for which $A$ is true.

$$\forall c_1 \in \mathit{Concept}\ \forall l \in \mathit{Language}\ \forall t \in \mathit{Term}\ \Big(\big(\mathit{Description}(c_1, l, t) \wedge \mathit{synonym\_type}(\langle c_1, l, t \rangle) = \mathit{preferred}\big) \rightarrow \neg \exists c_2 \in \mathit{Concept}\ \big(\mathit{Description}(c_2, l, t) \wedge \mathit{synonym\_type}(\langle c_2, l, t \rangle) = \mathit{preferred} \wedge c_2 \neq c_1\big)\Big)$$

For each language a term can be preferred for just one concept, i.e. different concepts cannot have the same preferred term.

In Section 3 we will show how C-L-T-like meta-models and the constraints are represented in PROTÉGÉ.

2.2. Putting the FUTS into use

FUTS, which implies using the combined ER and FOL formalisms, can be used in two major ways. First, it provides concepts and relationships that form the basic blocks for describing (e.g. reverse-engineering) an existing TS. Second, it provides modeling patterns that conform to some requirements and desiderata that can be used as a basis for designing a new TS. The C-L-T model and its corresponding constraints are an example of such a pattern. Another implied use of such patterns is inspecting a reverse-engineered TS to see how it conforms to, or deviates from, the patterns. In the next subsection we will illustrate the use of FUTS in reverse engineering a part of the well-known UMLS (Lindberg et al., 1993). Then, in Section 2.2.2, we show how to use the C-L-T patterns in designing a new TS, the DICE TS. Our eventual goal in this paper is to show how PROTÉGÉ can be used to implement existing or new TSs using FUTS constructs (conceptual (meta-)models, constraints and patterns).
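
The conformance inspection mentioned above can be mechanized once an existing TS has been reverse-engineered into C-L-T form. The Python sketch below is one possible, hypothetical realization, not part of FUTS or PROTÉGÉ: it assumes the descriptions are available as (concept, language, term, synonym_type) tuples and tests the three C-L-T constraints of Section 2.1 literally. The function name and the message texts are our own.

```python
from collections import defaultdict
from itertools import product

ALLOWED_SYNONYM_TYPES = {"preferred", "synonym"}


def check_clt(concepts, languages, descriptions):
    """Check (concept, language, term, synonym_type) tuples against the C-L-T constraints.

    Returns a list of human-readable violation messages (empty if none).
    """
    descriptions = set(descriptions)  # Description is a relation: ignore duplicates
    violations = []

    # synonym_type : Description -> {preferred, synonym}
    for c, l, t, s in descriptions:
        if s not in ALLOWED_SYNONYM_TYPES:
            violations.append(f"invalid synonym type {s!r} for ({c}, {l}, {t})")

    preferred = {(c, l, t) for c, l, t, s in descriptions if s == "preferred"}

    # Exactly one preferred term per concept, for every language.
    terms_per_cl = defaultdict(set)
    for c, l, t in preferred:
        terms_per_cl[(c, l)].add(t)
    for c, l in sorted(product(concepts, languages)):
        n = len(terms_per_cl[(c, l)])
        if n != 1:
            violations.append(f"concept {c} has {n} preferred terms in {l}")

    # A term is preferred for at most one concept in a given language.
    concepts_per_lt = defaultdict(set)
    for c, l, t in preferred:
        concepts_per_lt[(l, t)].add(c)
    for (l, t), cs in sorted(concepts_per_lt.items()):
        if len(cs) > 1:
            violations.append(f"term {t!r} is preferred for {sorted(cs)} in {l}")

    return violations


# Illustrative violation: two concepts share the same preferred English term.
print(check_clt({"c1", "c2"}, {"en"}, [
    ("c1", "en", "Hepatitis", "preferred"),
    ("c2", "en", "Hepatitis", "preferred"),
]))
# ["term 'Hepatitis' is preferred for ['c1', 'c2'] in en"]
```

Because the uniqueness constraint quantifies over every language, a concept without a preferred term in, say, ‘fr’ is reported whenever ‘fr’ is among the languages checked; a TS that covers only some languages per concept would need a weaker variant of that rule.
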

2.2.1. The FUTS for describing an existing TS

As an example of the suitability of FUTS for modeling existing TSs, we provide an ER model corresponding to a part of the UMLS. The UMLS interlinks medical concepts, and hence terms, from various existing TSs. It primarily supports search and retrieval activities (e.g. bibliographic retrieval) and does not, for example, contain rules to compose new complex concepts. The UMLS includes the Metathesaurus, a Semantic Network of semantic types, and the SPECIALIST Lexicon. We will focus on the Metathesaurus due to its relevance to existing TSs and hence to the FUTS patterns. The Metathesaurus provides information about concepts, terms, string-names and the relationships among them, drawn from established terminological systems such as ICD-9-CM/ICD-10, SNOMED and MeSH, which are called the source TSs. The Semantic Network consists of concepts denoting semantic types and relationships (called links) among these concepts. Fig. 4 shows a part of a meta-model of the Metathesaurus and the Semantic Network. By inspecting the model, one observes, for example, that the Metathesaurus represents the ‘broader’, ‘narrower’ and ‘other’ relationships between different concepts, modeled in the figure as attribute values of ‘UMLS relation’. Another observation is the overlap between the C-L-T pattern and this model. The UMLS meta-model also clearly refines the C-L-T model. For example, although a concept in the UMLS is described by one preferred and possibly one or more synonymous terms, as in the C-L-T model, a term is in turn linked to multiple term strings (plurals, etc.). UMLS concepts have an attribute ‘definition’ whose value is a textual definition describing the meaning of the concept. Each concept in the Metathesaurus is assigned to the most specific semantic type(s) available in the Semantic Network. Semantic types are further grouped (not shown in the figure) into Physical objects (e.g. organisms), Conceptual entities (e.g. findings), Activities (e.g. behavior) and Phenomena and processes (e.g. biological function).
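
The Metathesaurus refinements just described (a concept with one preferred and possibly several synonymous terms, each term with several strings, and at least one assigned semantic type) can be pictured as a small data structure. The Python sketch below is our own simplification, not the UMLS file format or any UMLS API: identifiers, source names and example strings are invented, languages are ignored, and only the Assigned_to requirement is checked.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str
    strings: list[str]   # lexical variants of the term, e.g. plurals
    source: str          # source TS the term was drawn from (hypothetical names)


@dataclass
class UMLSConcept:
    concept_id: str
    preferred: Term
    synonyms: list[Term] = field(default_factory=list)
    semantic_types: set[str] = field(default_factory=set)   # Assigned_to
    definition: str | None = None

    def all_strings(self) -> list[str]:
        """Every string designating this concept, preferred term first."""
        return [s for term in [self.preferred, *self.synonyms] for s in term.strings]


def unassigned(concepts):
    """IDs of concepts violating: each concept is assigned to >= 1 semantic type."""
    return [c.concept_id for c in concepts if not c.semantic_types]


# Illustrative data only.
hep = UMLSConcept(
    concept_id="X1",
    preferred=Term("T1", ["Hepatitis"], source="SRC-A"),
    synonyms=[Term("T2", ["Liver inflammation", "Liver inflammations"], source="SRC-B")],
    semantic_types={"Disease or Syndrome"},
)
orphan = UMLSConcept("X2", Term("T3", ["Liver"], source="SRC-A"))

print(hep.all_strings())         # ['Hepatitis', 'Liver inflammation', 'Liver inflammations']
print(unassigned([hep, orphan]))  # ['X2']
```
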

[Figure: ER diagram in two parts. Metathesaurus: UMLS concept, Term, Term string, Language and source Terminological system, related through Description ({preferred, synonym}), String_description and UMLS relation ({broader, narrower, other}). Semantic Network: Semantic net concept and Semantic net relationship (generic isa and non-generic), with UMLS concepts Assigned_to semantic net concepts.]
Fig. 4. A partial model of the Semantic Network and Metathesaurus in the UMLS.

For brevity we will not show all the FOL constraints on this model, but they conform to, and extend, the FOL constraints imposed on the C-L-T model presented above. These include constraints expressing that in each language there is exactly one preferred term per concept and that a term can be preferred for just one concept in a given language. There are additional FOL constraints, such as the following one:

$$\forall c \in \mathit{UMLS\_concept}\ \exists s \in \mathit{Sem\_net\_concept}\ \mathit{Assigned\_to}(c, s)$$

meaning that each Metathesaurus concept is assigned to at least one semantic net concept. This FUTS model of the UMLS is helpful in understanding the UMLS and in comparing it with the C-L-T model. Moreover, when implemented in a tool such as PROTÉGÉ, it can be used to check the constraints imposed on the model.

2.2.2. The FUTS for designing the DICE TS

The DICE TS is a terminological system, currently consisting of about 2500 concepts and relationships, which we developed at the Department of Medical Informatics of the University of Amsterdam to represent reasons for admission of patients to the intensive care unit. We have specified and designed the DICE TS according to the FUTS and to the knowledge architecture described by Abu-Hanna and Jansweijer (1994). Fig. 5 shows the knowledge architecture of the DICE TS, which we briefly explain and then illustrate below. The ‘Meta model’ compartment in the figure is equivalent to the C-L-T meta-model that appears in Fig. 3. The domain model compartment includes 8 concepts and 14 relationships from clinical medicine that are also relevant to intensive care, of which only the essence is shown in Fig. 6. In this domain model a Disease is related to a Body System, Anatomical Component, Dysfunction and Aetiology.
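
The kernel of Fig. 6 can be read as typed relationships on Disease, which the IC model then populates with instances. The Python sketch below is a hypothetical illustration of that reading, not the DICE Java implementation: it covers only the four kernel relationships (the full domain model has 14), and the pairing of the labels system, component, dysfunction and cause with their target concepts is our assumption, read off the figure. The instance names are those the paper uses as IC-model examples.

```python
from dataclasses import dataclass

# Disease relationship -> domain-model concept its filler must instantiate.
# Labels follow Fig. 6; the pairing of labels with targets is an assumption.
DISEASE_RELATIONSHIPS = {
    "system": "Body System",
    "component": "Anatomical Component",
    "dysfunction": "Dysfunction",
    "cause": "Aetiology",
}


@dataclass(frozen=True)
class Instance:
    """An IC-model entity and the domain-model concept it instantiates."""
    name: str
    concept: str


def check_disease(disease, fillers):
    """Return type errors for a Disease instance and its relationship fillers."""
    errors = []
    if disease.concept != "Disease":
        errors.append(f"{disease.name} instantiates {disease.concept}, not Disease")
    for relationship, filler in fillers.items():
        expected = DISEASE_RELATIONSHIPS.get(relationship)
        if expected is None:
            errors.append(f"unknown relationship {relationship!r}")
        elif filler.concept != expected:
            errors.append(f"{relationship}: {filler.name} instantiates "
                          f"{filler.concept}, expected {expected}")
    return errors


hepatitis_b = Instance("Hepatitis-B", "Disease")
fillers = {
    "system": Instance("Digestive system", "Body System"),
    "component": Instance("Liver", "Anatomical Component"),
    "dysfunction": Instance("Infection", "Dysfunction"),
    "cause": Instance("Hepatitis-B virus", "Aetiology"),
}
print(check_disease(hepatitis_b, fillers))  # []
print(check_disease(hepatitis_b, {"component": Instance("Infection", "Dysfunction")}))
```

The second call reports that ‘Infection’ cannot fill the component relationship, because it instantiates Dysfunction rather than Anatomical Component.
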
The intensive care (IC) model instantiates these concepts, for example: Hepatitis-B, Digestive system, Liver, Infection and Hepatitis-B virus (these are, respectively, instances of Disease, Body System, Anatomical Component, Dysfunction and Aetiology). The compartment ‘Formal specifications in FOL’ includes constraint specifications in FOL, such as those referring to the C-L-T meta-model. The vocabulary compartment includes (parts of) definitions of concepts. The nomenclature compartment consists of rules for the composition of new concepts in the intensive care domain.

[Figure: the DICE knowledge compartments: Meta model, Domain model, Formal specifications in FOL, Vocabulary, Nomenclature and Intensive Care model.]
Fig. 5. Knowledge compartments in DICE. Except for the IC model, these compartments are quite generic.

[Figure: Disease related to Body System, Anatomical Component, Dysfunction and Aetiology through relationships labelled system, component, dysfunction and cause.]
Fig. 6. The kernel of the domain model. Rectangles and lines denote concepts and relationships, respectively.

Disease | Anatomical Component | Dysfunction | Aetiology
Viral Hepatitis | Def: Liver | Def: Infection | OR: Hep-V, Epst-Barr-V, Cytomegalo-V
Fig. 7. A simplified view of the data structure representing the vocabulary and nomenclature compartments. The header includes concepts from the domain model. One entry is shown for the (partial) definition and composition rule for the concept ‘Viral Hepatitis’.

The vocabulary and the nomenclature in DICE are implemented in one data structure, illustrated in Fig. 7 (for simplicity we omit ‘Body System’). The central concept in Fig. 7 is that of Disease. The disease described by the row shown is ‘Viral Hepatitis’. The specification states that the anatomical component of Viral Hepatitis is the liver and that this knowledge is definitional (denoted by the ‘Def:’ qualifier). The Viral Hepatitis entry also specifies that the dysfunction is, by definition, infection. Using the ‘Def:’ qualifier means that ‘Viral Hepatitis’ logically entails the anatomical component ‘Liver’ and the dysfunction ‘Infection’. In other words, ‘Liver’ and ‘Infection’ are necessary conditions for ‘Viral Hepatitis’. Applications can use this definitional knowledge to prevent attempts at composing concepts such as ‘Viral Hepatitis which is located in liver’, as these would be redundant. The ‘Aetiology’ column specifies the qualifier ‘OR:’. The ‘OR:’ qualifier means that ‘Viral Hepatitis’ may be used to create a new composite concept using one or more of the causes given (Hepatitis virus, Epstein-Barr virus and Cytomegalo virus) or any of their sub-concepts. For example, the ‘OR:’ qualifier allows one to compose ‘Viral Hepatitis with aetiology [Hepatitis-B virus and Cytomegalo virus]’, where ‘Hepatitis-B virus’ (not shown in the figure) is a sub-concept of ‘Hep-V’ (Hepatitis
virus). In contrast, an ‘XOR:’ qualifier would mean that only one cause may be chosen. Another qualifier is ‘Only Descendants:’, which means that only sub-concepts, but not the concept itself, may be used in compositions. In this example, if the qualifier ‘Only Descendants:’ had been specified for Aetiology alongside the ‘OR:’ qualifier, one would not have been allowed to use the Hepatitis virus (Hep-V) itself in the composition. More than one row may be required to describe the same disease. This is due to the limited flexibility of qualifiers such as ‘Only Descendants:’, which apply to all entries of a column instead of to selected ones. Another particularity of the DICE TS is that all diseases appearing in the first column of the table are assumed to be complete in meaning (i.e. non-vague). This implies, for example, that a physician may record the disease term as such without being obliged to refine it (i.e. make it more specific by composition) first. Refinement is hence always optional. In general, though, one might think of situations in which one would want to specify a useful disease category that is not complete in meaning and therefore must be refined before it may be used (for example, before being recorded in an electronic patient record). In the Results section we provide a general approach to tackle such situations. We have implemented the DICE TS in Java according to the specifications above. The DICE TS, together with server and client applications, is planned to run in a pilot setting at some Dutch IC units.

2.3. PROTÉGÉ

PROTÉGÉ is a platform-independent environment for creating knowledge bases. A knowledge base is represented as a frame-based ontology consisting of classes and slots. A graphical user interface (GUI) is provided to develop and maintain the hierarchy of classes and to create and assign slots to them (corresponding to class attributes or binary relationships with other classes). To support the acquisition of knowledge by end users such as physicians, PROTÉGÉ automatically generates forms for creating instances of classes and acquiring their slot values. For example, suppose that the class ‘Health_Problem’ is specified in the ontology as a frame with two slots: ‘HP_name’, which is specified to take string values (also called slot fillers), and ‘HP_severity’, which may take one, and only one, of the values ‘mild’, ‘moderate’ and ‘severe’. The specification of this example in PROTÉGÉ is shown in Fig. 8. When a user indicates that he or she wants to create a new instance of the class ‘Health_Problem’, PROTÉGÉ automatically creates a form that provides two items: a field for acquiring the string value of the slot ‘HP_name’ and a pull-down menu with the values ‘mild’, ‘moderate’ and ‘severe’. This is illustrated in Fig. 9, where ‘Asthma’ is the string value of ‘HP_name’ and ‘mild’ is chosen from the pull-down menu of the three possible values of ‘HP_severity’. Similarly, PROTÉGÉ provides different standard graphical interface behaviors for the different primitive data types (e.g. boolean) and for slots whose value is an instance or a class. Common cardinality constraints, such as 1:1 and 1:N, are automatically supported by PROTÉGÉ.

Fig. 8. The class Health_Problem and its slots, as specified in PROTÉGÉ.

Fig. 9. Acquiring attribute values (slot fillers) for an instance of the class Health_Problem.

Besides managing ontologies, PROTÉGÉ is
also an extensible environment that allows the user to integrate other applications seamlessly. Examples of common applications are those that visualize or reason with the ontologies (see http://protege.stanford.edu). For the objectives of this paper, the most relevant aspects of PROTÉGÉ are its compliance with the Open Knowledge-Based Connectivity (OKBC) knowledge model, the ability to specify meta-classes and meta-slots, and the PROTÉGÉ Axiom Language (PAL). The first aspect, compliance with OKBC, enhances the interoperability of the knowledge base. OKBC is an application programming interface (API) for accessing knowledge bases. It provides a uniform model of knowledge representation systems based on a common conceptualization of classes, individuals, slots, slot facets and inheritance. The OKBC model is specified using logical axioms, so that there is a clear understanding of what each notion means. The second aspect, the ability to specify user-defined meta-classes and meta-slots, allows an explicit separation in the model between meta-knowledge and knowledge at a lower level, as well as flexibility in modeling. Meta-classes in PROTÉGÉ are classes whose instances are themselves classes (hereafter often referred to as instance classes to stress this fact). Meta-classes are implemented as
sub-classes of the default meta-class ‘:STANDARD-CLASS’. Following OKBC, PROTE´GE´ makes the distinction between ‘template slots’ and ‘own slots’. Own slots are associated with frames in the system and are not inherited in any way. A template slot of a class is propagated to its subclasses and inherited by an instance of the class as an own slot. Slots can be attached to meta-class, instance class and instance levels. As illustrated in Fig. 10, the template-slot ‘severity’ of the meta-class ‘Meta_asthma’ will be inherited by the class ‘Asthma_type’ as an own slot because ‘Asthma_type’ is an instance of ‘Meta_asthma’. The slot ‘severity’ is also inherited as an own slot by ‘Mild_asthma’ because this class is a sub-class of ‘Asthma_type’. However, ‘severity’ does not appear at the instance ‘John’s_Mild_asthma’. Because slots are important in mapping TS constructs into PROTE´GE´ we now illustrate an aspect of their usage. In analogy to Fig. 10, consider the class ‘Asthma_M’ in Fig. 11. By being a sub-class of ‘:STANDARD-CLASS’ it becomes a new meta-class available for modeling in PROTE´GE´. It has a template slot ‘severity_type’ which must have one of the values ‘mild’, ‘moderate’ and ‘severe’. Now let us create the class ‘Asthma_type’ which is an instance of ‘Asthma_M’ (as well as a sub-class of ‘:THING’). As is illustrated in Fig. 12, this class inherits the severity_type slot as its own slot. A pull-down menu, labeled ‘Severity Type’, is provided to specify one of the three possible values. We can now create three subclasses of ‘Asthma_types’, namely ‘mild_asthma’, ‘moderate_asthma’ and ‘severe_asthma’ with the corresponding values for their ‘severity_type’ slot. This last

[Fig. 10 diagram, four levels: meta-class ‘Meta_asthma’ (own slot: name; template slot: severity); instance class ‘Asthma_type’, instance_of ‘Meta_asthma’ (own slot: severity; template slot: progression); instance class ‘Mild_asthma’, is_a ‘Asthma_type’ (own slot: severity = Mild; template slot: progression); instance ‘John’s_Mild_asthma’, instance_of ‘Mild_asthma’ (own slot: progression).]

Fig. 10. Slots attached at various levels. A template slot of the meta-class ‘Meta_asthma’ becomes an own slot of the classes ‘Asthma_type’ and ‘Mild_asthma’, but does not appear at the instance ‘John’s_Mild_asthma’. However, ‘John’s_Mild_asthma’ inherits the template slots of ‘Mild_asthma’.


Fig. 11. ‘Meta_asthma’ defined as a meta-class with the meta-slot ‘severity_type’.

Fig. 12. Three asthma types are created by using their own slot ‘severity_type’ inherited from the metaclass ‘Meta_asthma’.

example illustrates how the template slots of a meta-class correspond to own slots at the instance-class level and, as such, how they become integrated in the graphical user interface. In mapping the definition and nomenclature constructs onto PROTÉGÉ in Section 3, we will see the use of another type of slot: meta-slots. In PROTÉGÉ, slots are frames in their own right: a slot is an instance of the meta-class ‘:STANDARD-SLOT’. Constraints on slot values, such as value type and cardinality, are called the facets of a slot. These facets are implemented as (template) slots of the ‘:STANDARD-SLOT’ meta-class. Users can define their own meta-slots by creating sub-classes of ‘:STANDARD-SLOT’. When new facets are needed, one can define a new meta-slot and attach new template slots to it; these become facets of every slot that is an instance of the new meta-slot. For example, to add a Boolean facet ‘slot.type’ alongside standard facets such as ‘name’ and ‘cardinality’, one would define a new meta-slot as a sub-class of ‘:STANDARD-SLOT’, say ‘Qualifier_slot’, and give it a Boolean template slot called ‘slot.type’. One then creates an instance of ‘Qualifier_slot’, say ‘cause’. When


‘cause’ is attached to any class, ‘slot.type’ will appear as a facet of ‘cause’. PROTÉGÉ will also provide the appropriate GUI element, a check-box in this case, to acquire the value of this facet, just as it acquires other facet types, for instance by providing a pull-down menu for the value type of a slot. The third relevant aspect of PROTÉGÉ, the add-in language PAL (PROTÉGÉ Axiom Language), provides support for expressing constraints. PROTÉGÉ can be instructed to check whether the modeled knowledge is consistent with the formulated constraints. PAL can express constraints beyond the simple cardinality constraints often used in conceptual modeling languages such as the entity-relationship (ER) model. This is because PAL is not restricted to statements about classes related by a binary relationship; instead, one may specify constraints between seemingly unrelated classes.
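The own-slot/template-slot behavior of Fig. 10 can be imitated in a few lines of Python. This is a minimal sketch under our own assumptions, not the PROTÉGÉ or OKBC API: the classes ‘Cls’ and ‘Instance’ are hypothetical, each class has at most one is_a parent, and a sub-class of an instance class is treated as an instance of the same meta-class. It shows why ‘severity’ reaches ‘Asthma_type’ and ‘Mild_asthma’ as an own slot but not ‘John’s_Mild_asthma’, which receives ‘progression’ instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cls:
    """Hypothetical stand-in for a class frame (not the PROTÉGÉ API)."""
    name: str
    metaclass: Cls | None = None    # the class this class is an instance of
    superclass: Cls | None = None   # single is_a parent, for brevity
    template_slots: list[str] = field(default_factory=list)
    own_values: dict[str, object] = field(default_factory=dict)

    def all_template_slots(self) -> list[str]:
        # Template slots propagate down the is_a hierarchy.
        inherited = self.superclass.all_template_slots() if self.superclass else []
        return inherited + [s for s in self.template_slots if s not in inherited]

    def effective_metaclass(self) -> Cls | None:
        # Assumption: a sub-class of an instance class is an instance of the
        # same meta-class, as 'Mild_asthma' is in Fig. 10.
        if self.metaclass is not None:
            return self.metaclass
        return self.superclass.effective_metaclass() if self.superclass else None

    def own_slots(self) -> list[str]:
        # The own slots of a class are the template slots of its meta-class.
        meta = self.effective_metaclass()
        return meta.all_template_slots() if meta else []


@dataclass
class Instance:
    """Hypothetical individual: its own slots are its class's template slots."""
    name: str
    of: Cls

    def own_slots(self) -> list[str]:
        return self.of.all_template_slots()


meta_asthma = Cls("Meta_asthma", template_slots=["severity"])
asthma_type = Cls("Asthma_type", metaclass=meta_asthma, template_slots=["progression"])
mild_asthma = Cls("Mild_asthma", superclass=asthma_type, own_values={"severity": "Mild"})
johns = Instance("John's_Mild_asthma", of=mild_asthma)

print(asthma_type.own_slots())  # ['severity']
print(mild_asthma.own_slots())  # ['severity']
print(johns.own_slots())        # ['progression']
```

The propagation rule is kept deliberately simple; real OKBC semantics also cover multiple parents, facets and default values.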

3. Results

In this section we report on the set of PROTÉGÉ-based constructs that we designed to model the DICE compartments, in the following order: the C-L-T meta-model, the PAL logical statements, the specification of the domain model of which the IC model is an instantiation, and the vocabulary and nomenclature structures. First, however, we report on the suitability of the knowledge representation in PROTÉGÉ for modeling a TS.

3.1. The knowledge representation formalism in PROTÉGÉ

PROTÉGÉ provides a knowledge representation that meets many desiderata for TSs:

- It is a concept-oriented formalism.
- It allows for an explicit representation of concepts and attributes, including binary relationships, by means of classes and slots.
- It allows for the specification of explicit relationships by reification.
- It supports the ‘is_a’ relationship and its semantics, due to its compliance with the OKBC knowledge model.
- It supports multiple classifications of a concept, also due to OKBC compliance.
- It does not restrict the depth or breadth of the concept hierarchy (as, e.g., the ICD does).
- It allows classes to be marked as ‘Abstract’ to prevent their instantiation. This can be used to prevent the creation of “vague” concepts, which do not yet refer to a specific object in reality.

However, PROTÉGÉ does not have built-in representations of compositional rules, synonymous terms or multilingual terms. Moreover, as a frame-based formalism it does not support automatic inference such as classification and the detection of redundant or inconsistent concepts. Such inference requires a more expressive and formal language combined with a reasoning facility. Description Logics constitute a family


of such formalisms. Cornet and Abu-Hanna (2003) provide an example of how to use Description Logics for detecting modeling errors and redundant concepts.

3.2. The C-L-T model

For the implementation of the C-L-T meta-model, the meta-classes and meta-slots of PROTÉGÉ seem the most natural choice. Fig. 13 shows the class hierarchy in which ‘Description’, ‘Language’, ‘Model Term’ and ‘Model Concept’ are defined as sub-classes of PROTÉGÉ’s predefined meta-class ‘:STANDARD-CLASS’, which makes these four sub-classes meta-classes as well. Note that ‘Description’ is a reified relationship represented as a (meta-)class; it is common to reify non-binary relationships in object- and concept-oriented representations. The specification of the template slots of ‘Description’ is shown in Fig. 14. The slots correspond to relationships between the C-L-T concepts. Note that the value types of the slots ‘description.modelterm’, ‘description.language’ and ‘description.modelconcept’ are all specified as instances of ‘Model_Concept’, in other words, as instance classes. When ‘Description’ is instantiated, these slots become own slots of its instances. We also specify inverse relationships (indicated in the figure by a superscript ‘I’) from each of the other C-L-T concepts back to ‘Description’. For example, the inverse of ‘description.modelterm’ is a slot of ‘Model_Term’ pointing back to ‘Description’.

3.3. Using PAL statements to express FOL constraints

Once the C-L-T model is in place, the constraints on this model need to be expressed and enforced. These constraints have already been expressed in FOL, and the best way to represent them in PROTÉGÉ is to use PAL. Fig. 15 shows a PROTÉGÉ window with the various fields that constitute a PAL constraint. The displayed constraint expresses the uniqueness of codes. A code

Fig. 13. The PROTÉGÉ class hierarchy with the C-L-T meta-concepts.


Fig. 14. Template slots of the Description meta-concept.

Fig. 15. A PROTÉGÉ window showing the name, description, statement and range fields that go into defining a PAL constraint. Displayed is the constraint expressing the uniqueness of codes.

here is associated with a description and, hence, with a unique combination of a concept, term and language. This is a generalization of the common view of a code as an identifier of a concept. Somewhat more challenging is expressing the following two requirements: that there is exactly one preferred term for each (model) concept in a language, and that a term is preferred by only one (model) concept in each language. The PAL statement corresponding to the first constraint appears in Fig. 16. Noteworthy is the straightforward correspondence between the PAL statement and the equivalent FOL constraint described in Section 2.1. Although constraints can be checked on demand, PROTÉGÉ does not provide an option to check constraints automatically and warn the user about constraint violations. Hence a violation of a constraint specification may result in a cascade of errors that are detected only once the user requests constraint checking. At the same time, this on-demand checking in PAL accommodates the inconsistencies that inevitably arise during the construction of an ontology.

3.4. Specification of the domain model

The next step is the specification of the domain model (see Fig. 6). We suggest that the concepts in this model be implemented as classes that are instances of the meta-classes in the C-L-T meta-model. ‘Disease’, ‘Dysfunction’, ‘Anatomical Component’


Fig. 16. PAL statement expressing the requirement “Each concept has exactly one preferred term in each language”.

[Fig. 17 diagram: the meta-classes ‘Model Concept’, ‘Language’, ‘Description’ and ‘Model Term’ each have an instance class (‘Disease’, ‘c_Language’, ‘c_Description’ and ‘c_Term’, respectively), which in turn has a sub-class (‘Viral_Hepatitis’, ‘English_Lang’, ‘Descr_viral_hep’ and ‘Term_viral_hep’, respectively).]

Fig. 17. Instantiating the C-L-T model.

and ‘Aetiology’ will be defined as instances of the meta-class ‘Model Concept’. There will be three classes, ‘c_Language’, ‘c_Term’ and ‘c_Description’, which are instances of the meta-classes ‘Language’, ‘Model Term’ and ‘Description’, respectively, as shown in Fig. 17. To create the intensive-care model, or an ontology in any other specific application domain, the end-user creates sub-classes of the above-mentioned classes (e.g. ‘Descr_viral_hepatitis’, ‘Descr_meningitis’, etc.). Because ‘Description’ interrelates the other relevant concepts, creating a new sub-class of ‘c_Description’ automatically sets in motion the machinery for acquiring all the other relevant information pertaining to the C-L-T model, by navigating the relationships and their inverses in


this model. Recall that the template slots of ‘Description’ in Fig. 14 now apply as own slots of ‘c_Description’, as outlined in Fig. 10. Each time a sub-class of ‘c_Description’ is created, the system will “ask” for these (own) slots, which are integrated in the acquisition form. For example, for the slot ‘synonym_type’ PROTÉGÉ will automatically create a pull-down menu with the options ‘preferred’ and ‘synonym’, in the same manner that ‘HP Severity’ was treated in Fig. 9. Consider now the case in which one wants to add the concept ‘Viral_Hepatitis’. One may then create a new sub-class of the class ‘Disease’ (which is an instance of the meta-class ‘Model Concept’). Due to the inverse relationship from ‘Disease’ to ‘c_Description’, the system will ask for the creation of a concept of type ‘c_Description’, say ‘Descr_viral_hep’ (see the lower part of Fig. 17). Once created, this concept in turn provides the links to the language and term concepts (either existing ones or ones to be created). In this way the slots, together with their cardinality constraints, whose violation is automatically indicated by the GUI, guide the knowledge acquisition process.

3.5. Modeling the vocabulary and the nomenclature

Finally, we show how to implement the vocabulary and nomenclature structure. One can interpret the second, third and fourth columns in Fig. 7 as a way to further describe ‘Disease’. After all, ‘Disease’ is connected to the classes appearing in these columns by the relationships appearing in Fig. 6. Using each column (system, component, dysfunction, or aetiology) as a simple slot of ‘Disease’ would leave the user no room to express additional information about these attributes, for example whether their value belongs to a concept definition. This leads to the insight that these columns can be modeled by means of a meta-slot. Recall that a meta-slot can be used to introduce additional slot facets. To implement this idea we exploit the structure shown in Fig. 18.
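The structure of Fig. 18 can be approximated with plain Python objects to show the core idea: the template slots of a meta-slot become facets of every slot created from it, next to the built-in ones. This is a sketch under our own assumptions, not PROTÉGÉ's API: the classes ‘Slot’ and ‘MetaSlot’, the facet keys ‘value_type’, ‘min’ and ‘max’, the default value ‘Definition’ and the facet values chosen for ‘Viral_Hepatitis’ are all hypothetical.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class Slot:
    """Hypothetical slot frame; its facets come from its meta-slot."""
    name: str
    meta_slot: str
    facets: dict[str, object] = field(default_factory=dict)


@dataclass
class MetaSlot:
    """Hypothetical meta-slot (conceptually a sub-class of ':STANDARD-SLOT')."""
    name: str
    # Template slots of the meta-slot with default values; on instantiation
    # they become extra facets of the new slot.
    extra_facets: dict[str, object]

    def instantiate(self, slot_name: str, value_type: str) -> Slot:
        # Built-in facets (simplified) plus the user-defined ones.
        facets: dict[str, object] = {"value_type": value_type, "min": 0, "max": None}
        facets.update(self.extra_facets)
        return Slot(slot_name, self.name, facets)


qualifier_slot = MetaSlot(
    "Qualifier slot", {"slot.type": "Definition", "only.descendants": False}
)

# The four nomenclature columns become slots attached to 'Disease'.
disease = {
    name: qualifier_slot.instantiate(name, value_type)
    for name, value_type in [
        ("system", "System"),
        ("component", "Component"),
        ("dysfunction", "Dysfunction"),
        ("aetiology", "Aetiology"),
    ]
}

# A sub-class such as 'Viral_Hepatitis' sets facet values for itself
# (illustrative values, not taken from DICE).
viral_hepatitis = copy.deepcopy(disease)
viral_hepatitis["aetiology"].facets.update({"slot.type": "Refinable", "min": 0})

print(sorted(viral_hepatitis["component"].facets))
# ['max', 'min', 'only.descendants', 'slot.type', 'value_type']
```

Copying the slots per disease class is only a stand-in for PROTÉGÉ's class-level facet overrides; it keeps the facet values of ‘Disease’ and ‘Viral_Hepatitis’ independent.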
We create a new meta-slot called ‘Qualifier slot’ as a sub-class of ‘:STANDARD-SLOT’ in PROTÉGÉ. Viewed as a frame, like nearly everything in PROTÉGÉ, the meta-slot has two slots: ‘slot.type’, which holds a qualifier type such as ‘Definition’, and ‘only.descendants’, which can be true or false (indicating whether applications may select only sub-concepts of the slot value or the value itself as well). We then make four instances of ‘Qualifier slot’, (body) ‘system’, (anatomical) ‘component’, ‘dysfunction’ and ‘aetiology’, and attach these slots to ‘Disease’. We commented earlier that in DICE a disease in the first column of the table must be non-vague, in the sense that its term is legitimate to use without any refinement. However, we want to allow for the general case in which one may also specify a disease that must be refined. This is achieved by allowing ‘slot.type’ to take one of two values. The value ‘Definition’ means that the value of the attribute (system, component, dysfunction, or aetiology) is part of the disease definition. The value ‘Refinable’ means that the attribute value may be used in the refinement: if the cardinality specifies min = X with X ≥ 1, refinement is obligatory; otherwise it is not. Cardinality is simply implemented with PROTÉGÉ’s built-in cardinality facet. If the attribute is refinable then the slot


[Fig. 18 diagram: the meta-slot ‘Qualifier slot’ (slot.type: {Definition, Refinable}; only.descendants: Boolean; cardinality: (min, max)); the instance class ‘Disease’, an instance of the meta-class ‘Model Concept’, with the slots system: System, component: Component, dysfunction: Dysfunction and aetiology: Aetiology, which are instances of ‘Qualifier slot’; and the sub-class ‘Viral_Hepatitis’ of ‘Disease’.]

Fig. 18. The infrastructure for the implementation of the vocabulary and nomenclature shown in Fig. 7.

‘only.descendants’ becomes relevant: if it is true, only sub-concepts of the attribute values (slot fillers) may be used in the composition; otherwise the attribute values themselves may be used as well. Separating the specification of cardinality constraints from the specification of refinability, as implemented here, provides a more general solution than the approach adopted in the DICE TS, in which the two types of specification are mixed. Moreover, the cardinality constraints can express more general conditions than the simple ‘OR:’ and ‘XOR:’. Let us now consider what happens when a new disease, ‘Viral_Hepatitis’, is created. ‘Viral_Hepatitis’ is created as a sub-class of ‘Disease’. Because ‘Disease’ is an instance of ‘Model Concept’, the system enforces the whole machinery around the C-L-T model shown above (asking for the language and terms, etc.). In addition, creating a Disease class such as ‘Viral_Hepatitis’ allows the specification of the composition rule for that disease, which is part of the nomenclature. This happens through the specification of the slots ‘system’, ‘component’, ‘dysfunction’, or ‘aetiology’. Because these slots are instances of the meta-slot ‘Qualifier slot’, the user-defined ‘slot.type’ and ‘only.descendants’ are now facets of these four slots, just like the cardinality facets. Together, these facets specify the composition rules. For example, when the slot ‘component’ is selected, as shown in Fig. 19, a pull-down menu is automatically provided for acquiring the value of the facet ‘slot.type’, in order to specify whether the (anatomical) component is definitional or refinable. In our case, the component is ‘Liver’ and the qualifier is chosen to be ‘Definition’. The ‘only.descendants’ and cardinality facets are irrelevant here and should be ignored, since no composition should be attempted by an application operating on the ontology.
If, instead, we select ‘aetiology’, one can specify ‘slot.type’ as ‘Refinable’


Fig. 19. Specification of the slot ‘component’. Notice the graphical interface of the user-defined slot facets ‘Slot Type’ and ‘Only Descendants’.

and ‘only.descendants’ as False (to allow, e.g., the use of ‘Cytomegalo-V’ itself in the composition). The cardinality should be specified as min = 0 (and not min ≥ 1), because the user should not be forced to refine the disease. Standard GUI behavior, a check-box in this case, is then provided to acquire the Boolean value of this user-defined facet. We have thus shown how to specify for each disease both the vocabulary (simply by providing the ‘Definition’ qualifier) and the nomenclature (by providing the other qualifiers). This specification of the slots corresponds to one entry in Fig. 7. The concept composition step occurs at the moment that a disease such as ‘Viral Hepatitis’ (which is a class) is itself instantiated, as illustrated in Fig. 20. In this case PROTÉGÉ creates an appropriate graphical interface to account for the system (here Digestive_system), component (Liver), dysfunction (Infection) and aetiology (the figure shows the possible choices as a tree representation of the hierarchy). Note that although PROTÉGÉ can enforce some of the semantics of these elements (e.g. cardinality), it cannot enforce the semantics of qualifier values such as ‘slot.type = Definition’ and ‘only.descendants = True’, as it does not interpret them.
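Since PROTÉGÉ stores but does not interpret ‘slot.type’ and ‘only.descendants’, an external application would have to enforce them at composition time. The following sketch shows one way such a check could look. The aetiology hierarchy, the function names ‘is_descendant’ and ‘check_composition’, and the treatment of the cardinality bounds are hypothetical illustrations, not part of DICE or PROTÉGÉ.

```python
from __future__ import annotations

# Hypothetical fragment of an aetiology hierarchy: concept -> parent concept.
PARENT: dict[str, str | None] = {
    "Micro_organism": None,
    "Virus": "Micro_organism",
    "Cytomegalo_Virus": "Virus",
    "Hepatitis_B_Virus": "Virus",
}


def is_descendant(concept: str, ancestor: str) -> bool:
    """True if `concept` is a proper sub-concept of `ancestor`."""
    parent = PARENT.get(concept)
    while parent is not None:
        if parent == ancestor:
            return True
        parent = PARENT.get(parent)
    return False


def check_composition(
    filler: str,
    chosen: list[str],
    slot_type: str,
    only_descendants: bool,
    min_card: int = 0,
    max_card: int | None = None,
) -> list[str]:
    """Return the reasons why `chosen` is not a valid refinement of one slot."""
    if slot_type == "Definition":
        # Definitional values belong to the concept itself: no composition.
        return ["definitional slot cannot be refined"] if chosen else []
    errors: list[str] = []
    if len(chosen) < min_card:
        errors.append(f"at least {min_card} value(s) required")
    if max_card is not None and len(chosen) > max_card:
        errors.append(f"at most {max_card} value(s) allowed")
    for value in chosen:
        # With only.descendants = False the filler itself is also admissible.
        ok = is_descendant(value, filler) or (value == filler and not only_descendants)
        if not ok:
            errors.append(f"{value} is not admissible for filler {filler}")
    return errors


# 'Viral Hepatitis with aetiology Cytomegalo_Virus' (cf. Fig. 20):
print(check_composition("Virus", ["Cytomegalo_Virus"], "Refinable", False))  # []
print(check_composition("Virus", ["Virus"], "Refinable", True))
# ['Virus is not admissible for filler Virus']
```

Returning a list of reasons, rather than a single Boolean, mirrors PROTÉGÉ's on-demand constraint reports, which show which constraints are violated.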

4. Discussion and conclusions

In this work, we have aimed to find a systematic approach for the implementation of frame-based TSs in PROTÉGÉ. In particular, we have adopted an


Fig. 20. An instance of ‘Viral Hepatitis’ is being created to allow for concept composition. For example, to create the composite concept ‘Viral Hepatitis with aetiology Cytomegalo_Virus’ the aetiology ‘Cytomegalo_Virus’ would be selected.

approach that creates PROTÉGÉ counterparts of constructs stemming from a conceptual framework for understanding TSs. The advantage of our FUTS framework is that it makes the implicit information in TSs explicit. This facilitates communication about a TS and the design of such systems, as we have done with the DICE TS. We have shown how the knowledge compartments of the DICE TS are implemented in PROTÉGÉ. We may conclude that the framework upon which DICE has been designed proved to be a very useful instrument for our objectives. Hence, although this was not the primary aim of the paper, the demonstrated usefulness of FUTS has in effect been extended to its use as a starting point for the implementation of TSs in PROTÉGÉ. As for PROTÉGÉ, our assumption that its characteristics make it an attractive vehicle for building frame-based TSs turned out to be justified. These characteristics are discussed below. First, as a representation formalism that is compliant with the OKBC knowledge model, PROTÉGÉ allows us to: represent concepts and attributes by means of classes and slots; label relationships explicitly; support the ‘is_a’ relationship and multiple classifications of concepts; extend the class hierarchy without restricting its depth or breadth; and mark classes as abstract to avoid the instantiation of “vague” concepts. The compliance of PROTÉGÉ with the OKBC knowledge model has another important advantage for developing TSs, namely that the frame representation of a TS follows a standard semantics (e.g. of the ‘is_a’ relationship) and allows for the


retrieval of its contents through the OKBC application programming interface. The lack of such standard semantics and access was a disadvantage of the DICE TS, whose Java implementation relied on procedural semantics. Implementing DICE in PROTÉGÉ will facilitate the retrieval of its contents by different servers. Second, the ability of PROTÉGÉ to define meta-classes and meta-slots turned out to be a major advantage. It allows for a conceptual separation between meta-knowledge and instance-level knowledge. Moreover, it proved to be quite flexible, as is evident from the heavy reliance on these constructs in our solutions. This allowed us to specify the C-L-T meta-model and the vocabulary and nomenclature constructs in a straightforward manner. The use of meta-slots means, in effect, that these constructs can be treated as slot facets at the instance level. Finally, PROTÉGÉ uses its GUI to represent and acquire these slot facets at the instance-class level (e.g. it automatically provides check-boxes, pull-down menus, etc.). In other words, one benefits from the standard GUI behavior. All these aspects make the use of these constructs in PROTÉGÉ advantageous. Third, the constraint language of PROTÉGÉ, PAL, also proved adequate to represent and verify the relevant constraints on the representation, as is evident from expressing the C-L-T constraints in PAL. A major advantage here is that translating a constraint expressed in FOL into a PAL constraint is quite straightforward. This is not surprising, as PAL relies on KIF (Genesereth and Fikes, 1992), which is itself based on FOL. PROTÉGÉ does not check these constraints during knowledge acquisition, but it can do so on demand, showing which constraints are, and are not, satisfied. Although PROTÉGÉ seems suitable for the implementation of TSs, it does not have built-in support for compositional rules or for synonymous and multilingual terms.
It does indeed provide the infrastructure to define these constructs, as we have seen in this paper, but it cannot enforce all their semantics. For example, it does not understand the meaning of qualifier values such as ‘slot.type = Definition’, and hence the composition of concepts cannot be guaranteed to be consistent with the nomenclature specifications. This means that “external” applications must be written; such applications can, however, be integrated in the PROTÉGÉ environment. There are, of course, things that PROTÉGÉ, as a purely frame-based formalism, simply cannot do. It provides no way to give intensional definitions of concepts. Hence it cannot automatically classify concepts or detect ambiguous, redundant, or inconsistent ones. More powerful formalisms such as Description Logics are needed for these tasks (Cornet and Abu-Hanna, 2003). As is often the case, there are many possible ways to implement the constructs of a TS in a frame-based representation such as PROTÉGÉ. The guiding principle behind the solutions proposed in this paper is that the constructs of the conceptual TS representation (that of the DICE TS in our case) should map straightforwardly onto their counterparts in PROTÉGÉ. In other words, the PROTÉGÉ solutions should preserve the conceptual character of the TS as much as possible. This paper can be seen as a proof of concept for the suitability of PROTÉGÉ to represent frame-based TSs comparable to the DICE TS. Because the DICE TS represents ideas common to most TSs in use today, and because FUTS is a general


framework whose constructs have been translated into PROTÉGÉ, it is fair to assume that PROTÉGÉ should be able to represent a wide range of medical TSs. Moreover, existing TSs such as the UMLS can be implemented in PROTÉGÉ in the same manner, following our guiding principle of preserving their FUTS conceptual structure. For example, one may take the FUTS meta-model of the UMLS described in this paper and map it onto PROTÉGÉ. This model can then be used, for example, to check the integrity of the contents of the UMLS.
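Integrity checks of this kind can also be run by an external program over the descriptions of a TS. The sketch below checks three C-L-T constraints from Section 3.3 (unique codes; exactly one preferred term per concept and language; a term preferred by only one concept per language) over flattened description records. It is not a translation of the PAL statements; the ‘Description’ record, the function ‘check_clt’ and the reading of “in each language” as “in each language in which the concept has a description” are our own assumptions.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """Hypothetical flat record of one reified C-L-T 'Description'."""
    code: str
    concept: str
    term: str
    language: str
    synonym_type: str  # 'preferred' or 'synonym'


def check_clt(descriptions: list[Description]) -> list[str]:
    """Return the violations of three C-L-T constraints as readable strings."""
    violations: list[str] = []

    # 1. Codes are unique: each code identifies a single description.
    for code, n in sorted(Counter(d.code for d in descriptions).items()):
        if n > 1:
            violations.append(f"code {code} is used by {n} descriptions")

    preferred = [d for d in descriptions if d.synonym_type == "preferred"]

    # 2. Exactly one preferred term per concept in each language in which
    #    the concept has descriptions (our reading of 'in each language').
    terms_of: dict[tuple[str, str], set[str]] = defaultdict(set)
    for d in preferred:
        terms_of[(d.concept, d.language)].add(d.term)
    for concept, language in sorted({(d.concept, d.language) for d in descriptions}):
        n = len(terms_of[(concept, language)])
        if n != 1:
            violations.append(f"{concept} has {n} preferred terms in {language}")

    # 3. A term is the preferred term of at most one concept per language.
    concepts_of: dict[tuple[str, str], set[str]] = defaultdict(set)
    for d in preferred:
        concepts_of[(d.term, d.language)].add(d.concept)
    for (term, language), concepts in sorted(concepts_of.items()):
        if len(concepts) > 1:
            violations.append(f"'{term}' is preferred by {len(concepts)} concepts in {language}")

    return violations


records = [
    Description("D1", "Viral_Hepatitis", "viral hepatitis", "English", "preferred"),
    Description("D2", "Viral_Hepatitis", "hepatitis, viral", "English", "synonym"),
    Description("D3", "Meningitis", "meningitis", "English", "synonym"),
]
print(check_clt(records))  # ['Meningitis has 0 preferred terms in English']
```

The record values are invented for illustration; in a PROTÉGÉ-based TS the same data would be read from the instances of ‘c_Description’.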

Acknowledgements

This work was initiated when the first author was visiting the Stanford Medical Informatics department. Thanks to Mark Musen for his help and hospitality. We would also like to thank our anonymous reviewers for their valuable comments. This work has been partially funded by the Netherlands Organization for Scientific Research for the projects titled “Terminology and Semantics: Making semantics explicit” (Number 014-18-014) and “I-Catcher: Intensive-Care Access to Terminology and Course of Health Exploration and Retrieval” (Number 634.000.020).

References

Abu-Hanna, A., Jansweijer, W.H.N., 1994. Modeling application domain knowledge using explicit conceptualization. IEEE Expert 9 (5), 53–64.
Campbell, J., Carpenter, P., Sneiderman, C., et al., 1997. Phase II evaluation of clinical coding schemes: completeness, definitions and clarity. Journal of the American Medical Informatics Association 4, 238–251.
Chaudhri, V.K., Farquhar, A., Fikes, R., Karp, P.D., Rice, J.P., 1998. OKBC: a programmatic foundation for knowledge base interoperability. The 10th Conference on Innovative Applications of AI (IAAI-98). AAAI Press, Menlo Park, CA, pp. 600–607.
Chen, P., 1976. The entity-relationship model: towards a unified view of data. ACM Transactions on Database Systems 1 (1), 9–36.
Chute, C., Cohn, S., Campbell, J., 1998. A framework for comprehensive health terminology systems in the United States: development guidelines, criteria for selection and public policy implications. Journal of the American Medical Informatics Association 5 (6), 503–510.
Cimino, J.J., 1998. Desiderata for controlled medical vocabularies in the twenty-first century. Methods of Information in Medicine 37, 394–403.
Cimino, J.J., Clayton, P.D., Hripcsak, G., Johnson, S.B., 1994. Knowledge-based approaches to the maintenance of a large controlled medical terminology. Journal of the American Medical Informatics Association 1, 35–50.
Cornet, R., Abu-Hanna, A., 2003. Using description logics for managing medical terminologies. In: Dojat, M., Keravnou, E., Barahona, P. (Eds.), Artificial Intelligence in Medicine. Proceedings of the Ninth Conference on Artificial Intelligence in Medicine in Europe (AIME), pp. 61–70.
Genesereth, M.R., Fikes, R.E., 1992. Knowledge Interchange Format version 3.0 reference manual. Technical Report Logic-92-1, Stanford University, Stanford, CA.
Gennari, J.H., Musen, M.A., Fergerson, R.W., Grosso, W.E., Crubézy, M., Eriksson, H., Noy, N.F., Tu, S.W., 2003. The evolution of Protégé: an environment for knowledge-based systems development. International Journal of Human-Computer Studies 58 (1), 89–123.
International Classification of Diseases, 1993. Manual of the International Statistical Classification of Diseases, Injuries and Causes of Death: 10th revision. World Health Organisation.
de Keizer, N.F., Abu-Hanna, A., 2000. Understanding terminological systems (II): experience with conceptual and formal representation of structure. Methods of Information in Medicine 39, 22–29.
de Keizer, N.F., Abu-Hanna, A., Cornet, R., Zwetsloot, J.H.M., Stoutenbeek, C., 1999. Analysis and design of an ontology for intensive care diagnoses. Methods of Information in Medicine 38 (2), 102–112.
de Keizer, N.F., Abu-Hanna, A., Zwetsloot, J.H.M., 2000. Understanding terminological systems (I): terminology and typology. Methods of Information in Medicine 39, 16–21.
Lindberg, D., Humphreys, B., McCray, A., 1993. The unified medical language system. Methods of Information in Medicine 34, 281–291.
Noy, N.F., Fergerson, R.W., Musen, M.A., 2000. The knowledge model of PROTÉGÉ: combining interoperability and flexibility. Second International Conference on Knowledge Engineering and Knowledge Management (EKAW).
Rector, A., Bechhofer, S., Goble, C., Horrocks, I., Nowlan, W., Solomon, W., 1997. The Grail concept modelling language for medical terminology. Artificial Intelligence 9, 139–171.
Rothwell, D., 1995. SNOMED-based knowledge representation. Methods of Information in Medicine 34, 209–213.
