Temporal similarity measures for querying clinical workflows

Share Embed


Descrição do Produto

ARTMED-1026; No of Pages 18 Artificial Intelligence in Medicine (2008) xxx, xxx—xxx

http://www.intl.elsevierhealth.com/journals/aiim

Temporal similarity measures for querying clinical workflows Carlo Combi a, Matteo Gozzi a, Barbara Oliboni a, Jose M. Juarez b,*, Roque Marin b a

Department of Computer Science, University of Verona, Strada le Grazie 15, I-37134 Verona, Italy Department of Information and Communication Engineering, University of Murcia, Campus de Espinardo, 30100 Murcia, Spain

b

Received 18 December 2007; received in revised form 28 July 2008; accepted 29 July 2008

KEYWORDS Clinical workflows; Temporal similarity; Clinical guidelines; Temporal constraint networks

Summary Objective: In this paper, we extend a preliminary proposal and discuss in a deeper and more formal way an approach to evaluate temporal similarity between clinical workflow cases (i.e., executions of clinical processes). More precisely, we focus on (i) the representation of clinical processes by using a temporal conceptual workflow model; (ii) the definition of ad hoc temporal constraint networks to formally represent clinical workflow cases; (iii) the definition of temporal similarity for clinical workflow cases based on the comparison of temporal constraint networks; (iv) the management of the similarity of clinical processes related to the Italian guideline for stroke prevention and management (SPREAD). Background: Clinical processes are composed by clinical activities to be done by given actors in a given order satisfying given temporal constraints. This description means that clinical processes can be seen as organizational processes, and modeled by workflow schemata. When a workflow schema represents a clinical process, its cases represent different instances derived from dealing with different patients in different situations. With respect to all the cases related to a workflow schema, each clinical case can be different with respect to its structure and to its temporal aspects. Clinical cases can be stored in clinical databases and information retrieval can be done evaluating the similarity between workflow cases. Methodology: We first describe a possible approach to the conceptual modeling of a clinical process, by using a temporally extended workflow model. Then, we define how a workflow case can be represented as a set of activities, and show how to express them through temporal constraint networks. Once we have built temporal constraint networks related to the cases to compare, we propose a similarity function able to evaluate the differences between the considered cases with respect to the order and

* Corresponding author at: Departamento de Ingenieria de la Informacion y las Comunicaciones, Facultad de Informatica, Campus de Espinardo, Universidad de Murcia, 30100, Spain. Tel.: +34 968 367345; fax: +34 968 364151. E-mail address: [email protected] (J.M. Juarez). 0933-3657/$ — see front matter # 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.artmed.2008.07.013

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

2

C. Combi et al. duration of corresponding activities, and with respect to the presence/absence of some activities. Results: In this work, we propose an approach to evaluate temporal similarity between workflow cases. The proposed approach can be used (i) to query clinical databases storing clinical cases representing activities related to the management of different patients in different situations; (ii) to evaluate the quality of the service comparing the similarity between a (possibly synthetic) case, perceived as the good one with respect to a given clinical situation, and the other clinical cases; and (iii) to retrieve a particular class of cases similar to an interesting one. # 2008 Elsevier B.V. All rights reserved.

1. Introduction In health care organizations, clinical (business) processes are becoming of particular importance: they allow the healthcare actors to focus on crucial aspects as evaluating the quality of healthcare services, suggesting the more suitable clinical pathway, controlling the budget for each clinical activity [1]. Clinical business processes are related to the medical care and can be considered and studied from different perspectives [2—5]. From a more clinicaloriented point of view, clinical guidelines and protocols may be seen as specific medical processes, having a widely acknowledged structure: clinical guidelines describe, in natural language, the recommended behaviour of a medical team, the activities to apply to the patient, and their fulfilment with respect to the time and to the state of patient health, for defining the best way to manage patients. The definition and management of clinical processes — based on guidelines, protocols, or on specific clinical and therapeutical actions — provide a support to physicians in their daily activities and help physicians to improve patient care and health outcomes for their patients by providing recommendations usually based on scientific evidence and expert clinical opinion. According to this scenario, in the next years there will be a huge amount of clinical process-related data available for several different purposes: evaluating the quality of the provided care, extracting medical knowledge, assessing clinical procedures, and so on. In a more general perspective, clinical processes can be managed by means of business modeling tools such as workflow management systems (WfMSs) [2,3]. Workflows are processes involving the coordinated execution of single atomic activities (named tasks), assigned and executed by processing entities (named agents) to achieve a common goal. Clinical processes can be modeled by workflow schemata and enacted by suitable software systems, called WfMSs. Workflow cases, instances of the same workflow schema, can be different with respect to the

structure, i.e., to the activities composing the cases, and to their (temporal) order and length. When a workflow schema represents a clinical process, its cases represent different instances deriving from dealing with different patients in different situations. The best (worst) application of a clinical process can be represented by means of workflow cases and can be used to evaluate the quality of the service comparing the similarity between several (possibly synthetic) cases, perceived as the good ones with respect to different clinical situations, and the other clinical cases. Moreover, a given case, representing something of interest, can be used to retrieve a particular class of cases similar to the given one. Thus, information retrieval can be done evaluating the similarity between workflow cases. The evaluation of temporal similarity seems to be an important issue in the clinical context, where cases are slightly different according to the patient situation. In this work, we extend the proposal described in [6] and discuss in a deeper and more formal way an approach to evaluate temporal similarity between workflow cases. More precisely, in this paper we face some important methodological issues, which need to be suitably solved before the design and implementation of clinical systems for managing and retrieving clinical process data. In particular, we focus on the following methodological aspects:  use of a temporal conceptual workflow model to represent clinical processes;  definition of ad hoc temporal constraint networks to formally represent clinical workflow cases;  definition of temporal similarity for clinical workflow cases based on the comparison of temporal constraint networks;  use as a proof-of-concept of real world clinical processes derived from modeling some fragments of the Italian guideline for stroke prevention and management (SPREAD) [7]. The structure of the paper is as follows: Section 2 provides some background on the modeling of clin-

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

Temporal similarity measures for querying clinical workflows ical processes by workflow systems and on the concept of temporal similarity. Section 3 presents a portion of the considered guideline and its conceptual schema obtained through a workflow temporal conceptual model. Section 4 defines the main concepts we propose to evaluate similarity between workflow cases and provides an overall example of similarity evaluation for clinical cases. Finally, Section 5 sketches some concluding remarks and future research directions.

2. Background Any health or clinical process requires the coordinated execution of single activities to achieve a common goal: as an example, we may consider a general process related to the intensive care of patients, where diagnosis-related and therapeutic tasks, possibly involving several different people such as physicians, nursery, and technicians, have to be coordinated. Clinical guidelines may be considered as an informal, natural language specification of a clinical process for a given category of patients; guidelines are acknowledged by the medical community as the recommendation for dealing in a sound way with the specified patients: for these reasons, issues related to the formal representation of guidelines have been considered by several research teams [1,3—5]. In general, there are some similarities between guidelines and organizational processes: on one hand, in the clinical context, guidelines describe a sequence of activities to be done; on the other hand, in the business context, an organizational process can be defined as a description of tasks and consists of subprocesses, decisions and activities. In both cases, a sequence of activities must be done to reach a (given) goal, in the former case to manage in a correct way the patient situation, while in the latter case to satisfy the business needs. This means that guidelines can be seen as processes, and can be managed by means of business modeling tools such as workflow management systems [3]. In general, WfMSs are software systems able to support the specification and the coordination of organizational processes [8]: a workflow formally describes process activities, including criteria to assign every single activity to an executing unit. We name process model or schema the structure of the workflow, defining how the single atomic activities are coordinated and enacted in sequence over time. The workflow designer specifies the process model, along with criteria to assign tasks to the executing units, named agents. Given a process model, several instances or cases can be run, each

3

one owning its specific data [8]. In general, WfMSs specify, control, and coordinate the flow of work cases (sequences of activities which form an organizational process). WfMSs support the execution of workflow instances and need the management of huge quantities of data. To the best of our knowledge, no WfMS as such escapes from the use of a database management system (DBMS): all of the adopted DBMSs are based on the relational model. Recently, some research efforts have been focused on using WfMSs for managing clinical or health processes [2,3,9]. In the clinical workflow context, time is an important aspect to consider. For instance, activities described in a guideline must be executed according to given coordination rules, involving also some temporal constraints. Thus, a workflow schema may contain qualitative and quantitative temporal constraints. In clinical workflows, and in most models for clinical scenarios, it is fundamental to find the best way for representing time, processing temporal data, and comparing temporal information by similarity techniques. Similarity measures, taken from the analogical reasoning, can be used to quantify how similar two elements are. These measures are traditionally defined by mathematical functions, stating the properties of reflexivity and symmetry. In general, there are two kinds of medical temporal data: time series (biosignals), and temporal sequences (time-stamped clinical data). Proposals of similarity measures for time series usually work on the raw time series data (e.g., ECG or EEG directly obtained from monitoring) and aim to derive the most representative features from a large amount of data [10]. Some of the most successful strategies are based on the dimensionality reduction (Discrete Fourier Transform, Discrete Wavelet, Time Warping Transformations), in order to obtain a feature vector or model parameters [10,11]. Temporal sequences are collections of occurrences of different event types, as, for example, the set of test results of a patient during a week in the Intensive Care Unit. Occurrences are usually associated to single time points. The work described in [12,13] defines the distance between two sequences using the Euclidean distance, by viewing a sequence as a point in a suitable multidimensional space. In [14], the similarity evaluates the relative position of an event occurrence within a window context. That is, event occurrences are similar if they occur in a similar context, and contexts are defined as the set of events happening within a predefined time window. Furthermore, it is common to find sequences composed by facts holding on intervals, such as

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

4

C. Combi et al.

the description of protocols, treatments, patient symptoms, or parameters abstracted from biosignals (e.g., ST-segment elevation of an ECG). This kind of temporal data is called interval sequence and is of increasing interest in many research fields, as in temporal clinical abstraction [15,16], and in temporal data mining [17,18]. Focusing on the few proposals dealing with similarity for interval sequences, in [19], the authors discuss different solutions for defining similarity between two temporal sequences, according to the distance between their composing intervals. In [20], the authors consider the issue of recognizing similar clinical scenarios, composed by both events (point-based) and facts (interval-based); as the scenarios are represented through temporal constraint networks, similarity is led back to the fusion of both networks. Temporal Constraint Networks are a powerful approach for representing and querying temporal information. They are represented as a constraint satisfaction problem (CSP) [21], where variables denote event types and constraints represent the temporal relations amongst them. The interval algebra (IA), introduced by James Allen to represent and manage interval relations [22], is one of the most considered, used and suitably modified/extended models. For instance, in [23] a general framework for qualitative—quantitative interval constraints is described. As for the stroke management domain, in [24], the authors analyse the discovery of time dependency patterns in clinical pathways related to the care of patients having had stroke, proposing a temporal similarity measure between pathways considering the common edges between the time dependency networks.

Synthesis 9.1: A stroke victim should rapidly be assessed after hospitalization (T1), by means of a general examination (T2) and [. . .] Recommendation 9.1 and 9.2: an early and standardized neurological evaluation (T3) is recommended in the setting of a qualitatively adequate management of acute stroke (COND-1). Recommendation 9.4: [. . .] the following blood exams (T4) are recommended: complete blood count including platelets, [. . .], and coagulation tests [. . .] Recommendation 9.6: The electrocardiogram (T5) is recommended in all suspected stroke victims who are admitted to an Emergency Room. Recommendation 9.7: A non-contrast CT scan (T7) is recommended as soon as possible in the emergency care, in any case not later than 24 h from stroke onset. Recommendation 9.8: The use of adequate technical parameters and positioning criteria (T6) is recommended for CT scan assessment of acute stroke (COND-2). Recommendation 9.9: Digital subtraction angiography (T8) is recommended in the acute stroke only if pre-procedural to an intra-arterial thrombolytic approach (COND-3). Otherwise, the study of arterial occlusion may be obtained by means of MRA (magnetic resonance angiography: T10) or CTA (computed tomography angiography: T9).

3. Representing clinical temporal workflows

Recommendation 9.10: The repetition of non-contrast CT scan (T11) is suggested within 48 h and anyhow not later than 7 days from stroke onset. It is particularly recommended when stroke is severe (COND-5) [. . .].

In this section, we consider a possible approach for conceptually modeling clinical workflows. We represent clinical workflows by using a temporally extended workflow model, which allows one to simply show the required clinical tasks (i.e., activities), the execution flow of tasks, and the temporal constraints on them [2]. Throughout the paper, we will deal with the representation and the management of workflows related to some fragments of the Italian guideline for stroke prevention and management (SPREAD). The fragments we will consider to practically explain our approach will be the following ones:

Fig. 1 depicts the considered portion of the guideline by means of a workflow schema. The schema is a directed graph, called workflow graph, where nodes correspond to activities and edges represent control flows that define the execution order of activities. There are two different activity types: task and connector. Tasks are the elementary work units that collectively achieve the process goal. Each task has one incoming edge and one outgoing edge and is graphically represented as a box. For example, task T1 in Fig. 1 models the hospital admission activity. A connector can be initial (startcase), final (endcase), or intermediate. Each workflow graph must

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

Temporal similarity measures for querying clinical workflows

5

Figure 1 The workflow schema of the considered guideline portion. Dashed arrows represent a constraint on the time distance between two non-subsequent tasks, where ½B=E; x; B=Egranularity quantifies the maximum duration (x) between the begin (B) or end (E) of the tasks.

have only one startcase and at least one endcase. When any endcase is reached, the execution of the process is completed: more precisely, when the earliest endcase is performed, the execution is considered successful and all the other tasks, which are still active, are cancelled. Graphically, startcase and endcase are represented by two parallel lines. Intermediate connectors are exploited to model different execution paths. Paths that are (possibly) executed concurrently are splitted by the Total

connector and then joined by the And connector. Similarly, paths that are executed alternatively are splitted by the Cond connector and then joined by the Or connector. Conditional connectors are graphically represented as diamonds. It is worth noting that data required and/or produced during the performance of different tasks may be related to several and different databases: for example, data acquired during tasks T3 and T5, referring to the neurological assessment and to ECG,

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

6

C. Combi et al.

respectively, are part of the electronic medical record of the considered patient, while data derived from task T6, referring to the tuning of parameters for CT scan, could be either only operational data, not stored permanently, or technical data stored in some radiology document. The modeling of guideline activities is enriched by additional temporal information, such as allowed durations and delays or temporal constraints. The duration is a temporal property of both nodes and edges and is always expressed by using intervals. A node duration represents the temporal span of the corresponding activity and is the temporal distance between the beginning and ending instants of the activity itself. For instance, the allowed duration of task T5 is within 3 and 12 min. The edge duration, called delay, denotes the interval between the ending instant of the predecessor and the beginning instant of the successor. For instance, the allowed delay between the end of T4 and the beginning of T5 is between 5 and 7 min. In some cases, only maximum or minimum duration may be defined: consequently intervals could be only partially defined. In our model, duration and delay bounds are integer values if the interval is completely specified, otherwise we use the symbol U to represent undefined values. For instance, the maximum delay between the end of T3 and the beginning of T4 is unbounded. Another important temporal feature is the capability of constraining the time distance (duration) between the beginning/ending instants of two nonsubsequent tasks. It is expressed as a dashed arrow connecting the two non-subsequent tasks and a label [IF , TimeDistance, IL ] Granule that denotes that TimeDistance must be the maximum span of time between the beginning or ending execution instant (IF ) of the first task and the beginning or ending execution instant (IL ) of the last task. Fig. 1 provides an example of a relative constraint,

Figure 2

which fixes to 48 h the maximum time distance between the beginning of task T7 and the end of T11, as required by the considered fragment of the guideline. Moreover, this constraint can be used to link the task execution to external events, such as the Stroke Event. In our model, an event is represented as a rounded clock and indicates a time instant that could bias the execution of activities. For instance, task T7 must be completed not later than 24 h from the stroke event. Further constructs of the workflow conceptual model are discussed in [2]. It is worth noting that expressing a clinical guideline through a workflow schema arises several issues related to the interpretation of natural language sentences, requiring a sound background of medical knowledge. Another important issue is to verify the temporal consistency of a clinical guideline: by temporal consistency we mean that the guideline could be executed satisfying all the given temporal constraints. Checking the temporal consistency of a clinical guideline described through a workflow schema involves the study of some computational issues, which are out of the scope of this paper. In the following, we will assume that these aspects have been deeply considered and that the resulting workflow schema is sound and describes the considered guideline in a proper way. According to the specified schema, there are several possible workflow cases, which correspond to the clinical path of different patients. During the execution of workflow cases, a temporal WfMS is able to detect violations of the specified constraints and either to possibly perform some compensation tasks or to stop the execution of the current case [25]. A workflow case is defined as follows: Definition 1. Being C the set of cases corresponding to a given workflow schema, a workflow case

Examples of workflow cases of five different patient scenarios.

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

Temporal similarity measures for querying clinical workflows CASE 2 C is an ordered set of labeled task intervals. A labeled task interval is a pair (taskName, t) where taskName is a task label (e.g., T1, T2, T3), and t is the task interval. The task interval t can be expressed as ðt ; tþ Þ where t and tþ are timestamps describing the beginning and ending times of the task interval, respectively. CASE ¼ < ðtaskInterval1 Þ; ðtaskInterval2 Þ; . . . ; ðtaskIntervaln Þ > þ ¼ < ðtaskName1 ; ðt 1 ; t1 ÞÞ; ðtaskName2 ; þ ðt 2 ; t2 ÞÞ; þ . . . ; ðtaskNamen ; ðt n ; tn ÞÞ >  such that 8 ðtaskNamei ; ti Þ 2 CASE;t i  tiþ1 ; i ¼ 1; . . . ; n  1.

Fig. 2 depicts five different workflow cases for the workflow schema shown in Fig. 1: each case is represented as a sequence of intervals labeled by the corresponding task name on the timeline having the beginning of the case as origin. In general, cases are only temporally constrained by the specification of the workflow schema. Thus, cases of the same schema could differ with respect to the order and duration of tasks, and with respect to the presence/absence of some tasks due to alternative paths. For instance, cases CASE_1and CASE_3 in Fig. 2 differ on the order between tasks T1 and T2. CASE_3 differs from all the other cases on the absence of tasks but T1 and T2, due to the alternative paths induced by the connector COND-1. CASE_4 differs from the other cases because it has task T10. According to this scenario, a huge amount of cases for the same guideline will be stored in a database by a hospital stroke unit. Querying and analyzing this database is, thus, extremely important for several clinical applications: for example, to evaluate the quality of the provided care, we could compare the similarity between a given case, considered as a good one, and the real clinical cases in the database. Moreover, a given case, representing something clinically interesting, may be used to retrieve a set of cases similar to the given one. In the next section we propose a suitable definition of (temporal) similarity for comparing different cases.

7

4. A temporal similarity measure for clinical cases To evaluate temporal similarity between clinical workflow cases, we propose to (i) compare corresponding performed tasks; (ii) compare qualitative/ quantitative temporal relations between corresponding tasks; and (iii) consider the presence/ absence of some tasks. Our proposal compares workflow cases by means of an interval similarity function. The given cases are represented through constraint networks, and the similarity is evaluated by considering the distance between intervals representing corresponding tasks, and the distance for the relations between corresponding tasks.

4.1. Augmented basic interval network: ABIN Temporal constraint networks (TCNs) are a temporal modeling approach that provides an explicit representation of temporal relations for a given scenario. This is an advantage when two temporal scenarios must be compared since, instead of comparing temporal event data, we could directly identify the overall differences between temporal relations. Moreover, these constraints can also contain different aspects of the temporal information (e.g., quantitative/qualitative, crisp/fuzzy) enriching its description and providing a flexible representation for different purposes. The interval algebra (IA) [22] deals with variables representing intervals and qualitative constraints between them. The interval algebra assumes 13 basic temporal relations (BTR): before (b), meets (m), overlaps (o), starts (s), during (d), finishes (f), their inverses, and equal (e). However, there are some IA aspects that, from the point of view of our particular domain, must be reviewed. On the one hand, IA is able to represent combinations of BTRs (213 possible disjunctions), making the consistency algorithms NP-Complete because non-convex disjunctions are allowed; intuitively, a disjunction of relations between two intervals is convex if they can be directly transformed into one another by con-

Figure 3 Examples of BTR disjunction types. Graphically, in the non-convex disjunction (left), moving the interval A from left to right, from the relation AbB we reach AaB, by passing through several other relations, such as, for example, AmB. It is not the case for the convex disjunction (right), where it is possible to move from AbB to AmB, by simply moving A. Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

8

C. Combi et al.

Figure 4 Example of expressivity of ABIN: sequence, language, and network. In the network at the right side, nodes represent intervals (with the range of allowed durations) while edges represent interval constraints (i.e., a qualitative relation followed by a quantitative constraint).

tinuously deforming the intervals (Fig. 3 depicts an example of non-convex and convex disjunctions). On the other hand, IA lacks of quantitative information, which is useful to measure temporal similarity. Our approach is based on augmented basic interval networks. An augmented basic interval network (ABIN) allows one to represent in an explicit way the temporal information useful to measure temporal similarity between sequences of intervals. An ABIN is an interval constraint network that permits a constrained set of the qualitative interval relations, namely BTRs and reducible disjunctions (those disjunctions which can be translated into conjunctions of relations between the bounds of the intervals [26]), and quantitative constraints for intervals, i.e., constraints on interval durations and constraints on the distance between interval bounds. We define an augmented basic interval network as follows: Definition 2. Given a time domain isomorphic to real numbers R, an augmented basic interval network (ABIN) is a structure S ABIN ¼ hT; R; DðTÞi formed by a set T of intervals, a set R of augmented interval constraints, and a set of unary metric constraints DðTÞ where:  T is the set of intervals ft1 ;    ; tn g. Each t j 2 T is represented as t j ¼ ðtj ; tþj Þ and tj ; tþj 2 R characterize begin and end of the interval, respectively.  R ¼ fr i; j jti ; t j 2 Tg is the set of augmented interval constraints, with r i; j ¼ ðbi; j ; mi; j Þ, where:  [  bullet] bi; j 2 Rb is either a BTR or a reducible disjunction of BTRs between ti ; t j 2 T;  [  bullet] mi; j 2 M : M ¼ f½x; yg with x; y 2 R, is the set of quantitative constraints between  tþ i and t j .  DðTÞ ¼ fd j jfor each t j 2 Tg where d j ¼ ½x; y with x; y 2 R; d j represents the range of the allowed durations for interval t j , i.e. tþj  tj .

This formalism provides a flexible language to express temporal qualitative—quantitative information on intervals. For instance, Fig. 4 (left) depicts an example with three potential scenarios of intervals A, B, and C, and, on the right, shows how they can be expressed using a single ABIN network. In the example, any depicted range has the same value for upper and lower bounds, since there is no imprecision in the temporal scenario. Note that an ABIN describes the relevant information of the workflow case, discarding the absolute references of the temporal sequence (i.e., the time-stamped information of the begin-end information of each interval). Our approach requires to build a temporal constraint network for each clinical workflow case. To this end, we propose three essential steps to obtain an ABIN network, from a given workflow case CASE i 2 C: 1. Obtain the intervals of an ABIN network from the intervals of the clinical workflow case. 2. Obtain the unary constraints (interval durations). 3. Obtain the rest of relations of an ABIN network from the temporal data of the workflow case. Let us to suppose that we want to describe an ABIN network S ABIN ¼ hT; R; DðTÞi from the following workflow case: CASE i

¼ < ðtaskIntervali1 Þ; ðtaskIntervali2 Þ; . . . ; ðtaskIntervalin Þ > þ ¼ < ðtaskNamei1 ; ðt i1 ; ti1 ÞÞ; ðtaskNamei2 ; þ ðt i2 ; ti2 ÞÞ; þ . . . ; ðtaskNamein ; ðt in ; tin ÞÞ >

At the first step, we can state, by definition, a direct match between each task interval of the workflow case (taskIntervali j 2 CASE i), and a corresponding interval of the ABIN network (t 2 T). Formally we can state that: 8 taskIntervali j 2 CASE i; 9 t j 2 T with d j 2 DðTÞ; d j þ  þ ¼ ½jt i j  ti j j; jti j  ti j j

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

Temporal similarity measures for querying clinical workflows Then, we must state which are the relations (r 2 R) of the ABIN network. At this point, an essential decision must be taken concerning the number of relations that have to be obtained. That is, we must choose between these two approaches: (i) to obtain the complete network (by considering the relations between all pairs of intervals) and (ii) to obtain a partial network. The main advantage of the first option is the fact that all relations are obtained directly from the temporal data of the workflow case, where the complexity of the algorithm, considering jCASE ij ¼ n, is Oðn2 Þ. Furthermore, the inference of new relations will not be needed. The main disadvantage of the calculus of a complete network is the increment of temporal information redundancy compared to the information provided by the sequence of task intervals. For instance, given the intervals tk ; t j 2 T, one of relations rk; j and r j;k is redundant. Despite that this particular type of redundancy can be avoided, the complexity order of the algorithms is the same. In our application domain, if the workflow case similarity must be measured, we consider that is more convenient to use only partial networks, avoiding temporal information redundancy. Because workflow cases have, by definition, a total task order (see Definition 1), the most simple mechanisms to obtain partial networks is to obtain only relations between consecu-

9

tive tasks. This derivation has linear complexity with respect to the number of tasks of the workflow case. The obtained network is composed by nodes and edges: nodes represent task intervals, while edges stand for qualitative relations, enriched by some quantitative information, between two task intervals. Nodes are labeled by the corresponding task name and by the related (upper and lower bounds of) interval durations, while edges are labeled by a single Allen’s interval relation enriched with some quantitative data. Each task interval has a corresponding node in the network, while edges are introduced only for relations between each task taskIntervali and its successive one taskIntervaliþ1 . For example, Fig. 5 depicts the networks corresponding to cases reported in Fig. 2: in the ABIN corresponding to CASE_1, tasks T1and T2are represented by labeled nodes and the relation before between them is represented as a directed edge labeled by b and by the interval [1,1], describing the (minimum and maximum) delay between the end of T1 and the beginning of T2. It is worth noting that, as the network corresponding to a case has no uncertainty for task durations and delays, it is redundant to have range values for them and edges are labeled only by a single Allen’s relation. However, this expressiveness is useful in other scenarios,

Figure 5 The temporal networks obtained from the workflow cases of Fig. 2. Continuous edges represent relations between consecutive tasks of the case, while dashed edges are derived to compare the networks. The label m½0; 0 for edges is shortened as m. Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

10 not considered in this paper, such as representing imprecise clinical records or describing general clinical pathways. For instance, the history of a patient could register that the electrocardiogram test (T5) lasted between 4 and 7 min. Moreover, uncertainty could arise from missing data, when some task features are not completely stored by the clinical information system: in this case, some flexible and context-related techniques should be designed and implemented to allow one to estimate a range of possible values for missing data. These aspects of the model and their applications will be considered as part of our future work. One important facet of the use of a TCN is the tractability of the defined problem. The ABIN model is based on the enrichment of Allen’s interval algebra. However, despite that deciding consistency of interval constraint networks is NP-Complete [21] for the general problem (non-convex disjunctive relations allowed), there are tractable subclasses of the problem. In particular, as described in [21,27], there are three tractable subclasses:  The pointisable subclass can be represented by an interval algebra allowing the use of interval relations that can be expressed as a disjunction of the point algebra relations (f < ; ¼; > g) between interval bounds. The consistency problem associated to this class can be solved using path consistency algorithms in Oðn4 Þ.  The continuous endpoint subclass corresponds to an interval algebra allowing the use of interval relations that can be expressed as a disjunction of the point relations excluding 6¼ , i.e., f < ; > g. The consistency problem associated to this class can be solved using path consistency algorithms in Oðn3 Þ.  The ORD-Horn subclass corresponds to the maximal tractable fragment of IA, where a 5-consistency algorithm Oðn5 Þ is required to obtain the minimal network. In an ABIN, the model restricts the qualitative relations of the problem to BTRs and to the 181 reducible disjunctions proposed by Van Beek [26]. Thus, the problem is part of the pointisable subclass and, therefore, it can be reduced into a point constraint network, which is a simple temporal problem (i.e., it can be solved in polynomial time). Moreover, quantitative aspects of ABIN imply the introduction of metric point constraints into the point constraint network. That is, the duration (d i 2 DðTÞ) expresses a metric constraint between the begin and end points of the interval, while the quantitative parameters of interval relations (mi; j ) are basic metric constraints between the end and the begin of the two involved intervals.

C. Combi et al. Theorem 1. An augmented simple interval network (ABIN) is a structure S ABIN ¼ hT; R; DðTÞi that can be reduced to a simple temporal problem. Proof. A simple temporal problem (STP) is a problem which can be expressed by a metric constraint network, where nodes stand for time points and directed edges are labeled by a single interval of allowed distances between the connected nodes; consistency of STPs may be checked in polynomial time [28]. In order to prove that an ABIN network can be reduced to an STP, the following statements must be considered: 1. In S ABIN the intervals of the network (t 2 T) can be expressed by a pair of points describing the bounds of the interval (t and tþ ). 2. In S ABIN , d j 2 DðTÞ is, by definition, a range of allowed values for the distance between the bounds of interval t j , 8 t j 2 T. 3. In S ABIN , 8 ti ; t j 2 T, r i; j 2 R can be decomposed into two constraints mi; j and bi; j . The relation mi; j ¼ ½x; y is a range of allowed values for the  (module of the) distance between tþ i and t j by definition. The constraint bi; j can be translated into four metric constraints between the four nodes (time points) that describe the two intervals ti and t j : these metric constraints are the quantitative version of the qualitative relations between the limits of the interval. Therefore, when the transformations from 1 to 3 have been applied, S ABIN is reduced to a STP. & Once the transformations are applied, all STP algorithms proposed by Dechter and Meiri [28] can be used. Thus, the classical path consistency algorithms can be used to obtain the minimal network.

4.2. Evaluating similarity measure Our proposal is based on the steps defined in Algorithm 1, and compares workflow cases, represented as ABINs, by means of an interval similarity function.

Intuitively, the algorithm expresses the two given cases (line 1) through interval constraint networks (lines 2 and 3), evaluates the distance between

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

Temporal similarity measures for querying clinical workflows intervals representing corresponding tasks (line 4), and the distance for relations between corresponding tasks (line 5), and uses intra-task and inter-task distances to evaluate the overall similarity by considering also possible dissimilarities of cases with regard to the occurring tasks (line 6). As we already discussed, in this paper we do not deal with uncertainty for task durations and for delays between tasks. Thus, hereinafter we will use a simplified notation for the quantitative part of ABIN constraints: mi; j and d j will be considered as single real numbers instead of ranges, in describing similarity measures. Once workflow cases are translated into temporal networks, in order to perform a comparison between cases, we need to establish a correspondence between nodes and edges of two different networks. As task names univocally identify tasks within a case, the correspondence between nodes is built through task names; the correspondence between edges is built up on the correspondence of the connected nodes. When two connected nodes are consecutive in a case and are not consecutive in the other one, we need to derive the missing edge in order to compare corresponding (possibly derived) edges. For example, for comparing CASE_1 and CASE_4 we need to derive the edges between T2 and T1, and between T1 and T3 in the temporal network representing CASE_1, and the edges between T1 and T2, and between T2 and T3 in the temporal network representing CASE_4. In Fig. 5 derived edges are represented through dashed edges.

4.3. Intra-task distance The intra-task distance provides a direct method to evaluate the similarity of corresponding tasks, with respect to their durations, and to other atemporal features within the workflow process (such as the agent that performed the task). The intra-task distance (d in ) is based on the duration of the task interval (Dt ¼ ðtþ  t Þ) and is defined as follows: Definition 3. Given two workflow cases having two tasks with the same name, represented by two nodes with the same name in the networks S 0 and S 00 , obtained from the considered workflow cases, and given the intervals t0 2 T 0 and t00 2 T 00 related to the two nodes, the intra-task distance function is defined as follows: d in : T  T ! ½0; 1Þ jðD 0  Dt00 Þj d in ðt ; t Þ ¼ a t jDt0 j þ jDt00 j 0

00

þð1  aÞd wf ðt0 ; t00 Þ

11

where a 2 ½0; 1 is the weight of the duration in the function, and the distance function d wf measures other workflow parameters not related to the temporal dimension. The distance function d in fulfills the following properties:

Property 1 (Reflexivity). Given ðTÞi and t0 2 T, d in ðt0 ; t0 Þ ¼ 0.

S ABIN ¼ hT; R; D

Property 2. Given  S 0ABIN ¼ hT 0 ; R0 ; D0 ðT 0 Þi  S 00ABIN ¼ hT 00 ; R00 ; D00 ðT 00 Þi  t0 2 T 0 ; t00 2 T 00 then d in ðt0 ; t00 Þ ¼ d in ðt00 ; t0 Þ (assuming that the distance function d wf is symmetric). The proof of Property 2 is trivial since jðDt0  Dt00 Þj ¼ kðDt00  Dt0 Þj where Dt0 ; Dt00 2 R.

4.4. Inter-task distance After considering intra-task distance, we have to take into account the inter-task distance. The inter-task similarity deals with similarities for temporal relations between corresponding tasks. At this aim, we define an inter-task distance function (d IN ), which takes into account both qualitative (Q) and quantitative (q) components of the edge labels in the temporal network, by using functions Q and q, respectively (as shown in Fig. 6). Function Q measures the distance between two qualitative temporal relations. The distance between two Allen’s relations is evaluated according to the distance of the considered relations on one of the neighbour graphs proposed by Freksa in [29]: two interval relations between the same intervals are neighbours if it is possible to directly move from one relation to the other one, by continuously deforming the intervals (i.e., shortening, shifting,. . .). For example, if we have a before relation between two intervals, we can move from before to meets, by simply shifting the first interval to be contiguous to the second one: in this case, the distance between before and meets is 1. In this work, we adopted, without loss of generality, the A-neighbours graph depicted in Fig. 7.

Definition 4. Given two qualitative temporal relations b0 , b00 , the distance function Q can be defined as follows:

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

12

C. Combi et al.

Figure 6

Example of ABIN reduction to point constraint network.

Q : BTR  BTR ! ½0; 1 pathðb0 ; b00 ; GÞ Q ðb0 ; b00 Þ ¼ max fpathð; ; GÞg

Property 4. Given ri;0 j 2 R0 Q ðb0i; j ; b00i; j Þ ¼ Q ðb00i; j ; b0i; j Þ.

Being G the neighbours graph between interval relations, function pathðb0 ; b00 ; GÞ measures the shortest path on the graph G between the relations b0 and b00 . Function Q is normalized considering the longest path on the graph G.

The proof of Property 4 is trivial since pathðbi ; b j ; GÞ is the length of the shortest path between the nodes bi and b j of the graph G and max fpathð; ; GÞg is constant regarding G. Function q takes into account the quantitative aspects of an ABIN network, and is defined as follows:

Function Q has the following properties: Property 3 (Reflexivity). Given ri;0 j 2 R0 ri;0 j ¼ ðb0i; j ; m0i; j Þ, then Q ðb0i; j ; b0i; j Þ ¼ 0.

Figure 7

and

and ri;00 j 2 R00

then

Definition 5. Given two quantitative temporal constraints between time points in M (i.e., the set of all

The A-neighbours graph proposed by Freksa and an example of the Q function.

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

Temporal similarity measures for querying clinical workflows possible allowed ranges), the distance function q is defined as follows:

4.5. Overall distance: d ABIN The distance function d ABIN evaluates the overall distance (inter-task and intra-task) between two ABIN networks as follows:

q : M  M ! ½0; 1 jm0  m00 j qðm0 ; m00 Þ ¼ 0 jm þ m00 j Function q has the following properties: Property 5 (Reflexivity). Given ri;0 j 2 R0 and ri;0 j ¼ ðb0i; j ; m0i; j Þ then qðm0i; j ; m0i; j Þ ¼ 0. Property 6. Given ri;0 j 2 R0 qðm0i; j ; m00i; j Þ ¼ qðm00i; j ; m0i; j Þ.

13

and ri;00 j 2 R00

then

Definition 7. Given S 0 ¼ hT 0 ; R0 ; D0 ðT 0 Þi; S 00 ¼ 00 00 00 00 hT ; R ; D ðT Þi, the function d ABIN : ABIN  ABIN ! ½0; 1 is:  If jT 0 \ T 00 j 6¼ 0 (and, thus, jR0 \ R00 j 6¼ 0), where jT 0 \ T 00 j is the number of common tasks, and jR0 \ R00 j is the number of common (even derived) relations: d ABIN ðS 0 ; S 00 Þ ¼ d

The proof of Property 6 is trivial since 8 m0 ; m00 2 M; jm0  m00 j ¼ jm00  m0 j and jm0 þ m00 j ¼ jm00 þ m0 j. Once the distance functions Q and q have been defined, the inter-task distance can be defined as follows: Definition 6. Given two corresponding relations r 0 , r 00 in two ABIN networks representing two different clinical workflow cases, the inter-task distance function d IN is defined as follows: d IN : R0  00R ! ½0; 1 0 00 Q ðb ; b Þ if Q ðb ; b Þ > 0 d IN ðr ; r Þ ¼ bqðm0 ; m00 Þ if Q ðb0 ; b00 Þ ¼ 0 0

00

where r 0 ¼ ðb0 ; m0 Þ represents the relation between the tasks of the first case, through a qualitative value b0 , i.e., one of the Allen’s relations, and a quantitative value m0 , i.e., the distance between the end time of the first task and the beginning of the second one; and where r 00 ¼ ðb00 ; m00 Þ represents analogously the relation between the tasks of the second case. b is a weight for the quantitative component within the function. Moreover, the distance function d IN has the following properties: Property 7 (Reflexivity). Given ri;0 j 2 R0 , then d IN ðri;0 j ; ri;0 j Þ ¼ 0.

S 0 2 ABIN

and

Property 8. Given S 0 ; S 00 2 ABIN and ri;0 j 2 R0 , ri;00 j 2 R00 then d IN ðri;0 ; ri;00 j Þ ¼ d IN ðri;00 j ; ri;0 j Þ: The proof of Property 8 is trivial since both Q and q, are reflexive and symmetric.

P

0 00 t0 2 T 0 ;t00 2 T 00 d in ðt ; t Þ 0 00

P  dÞ

jT \ T j

r 0 2 R0 ;r00 2 R00 d IN ðr 0 00

0

þ ð1

; r 00 Þ

jR \ R j

where t0 and t00 stand for intervals of corresponding tasks, while r 0 and r 00 stand for relations between corresponding tasks.  If jT 0 \ T 00 j ¼ 0 (and, thus, jR0 \ R00 j ¼ 0) then d ABIN ¼ 1.

The function d ABIN has the following properties: Property 9 (Reflexivity). Given S 0 ¼ hT 0 ; R0 ; 0 0 0 0 D ðT Þi, then d ABIN ðS ; S Þ ¼ 0 due to d in ðt; tÞ ¼ 0 and d IN ðr; rÞ ¼ 0 8 t 2 T 0 ; r 2 R0 . Property 10. Given S 0 ; S 00 , then d ABIN ðS 0 ; S 00 Þ ¼ d ABIN ðS 00 ; S 0 Þ.

The proof of Property 10 is trivial since d in ðt0 ; t00 Þ ¼ d in ðt00 ; t0 Þ and d IN ðr 0 ; r 00 Þ ¼ d IN ðr 00 ; r 0 Þ for each corresponding t0 , t00 and r 0 , r00 where t0 2 T 0 ; r 0 2 R0 and t00 2 T 00 ; r 00 2 R00 .

4.6. Similarity measure: SABIN In general, the similarity is inversely proportional to the distance between elements (i.e., tasks and temporal relations), to be compared. Until now, we have defined the concept of distance between corresponding elements. In our case, when defining the overall similarity between two clinical workflow cases, we have to consider also the presence of tasks in one workflow case without corresponding tasks in the other workflow case. Such a presence of non-

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

14

C. Combi et al.

corresponding tasks is suitably represented in the overall similarity function we will define for clinical workflow cases, by the function p. The function p is defined as follows: Definition 8. Given two workflow cases CASE0 and CASE00 , represented by sets T 0 , T 00 of tasks in the corresponding ABINs, the function p measures the presence of non-common nodes as follows: pðT 0 ; T 00 Þ ¼

jðT 0 [ T 00 ÞnðT 0 \ T 00 Þj jT 0 j þ jT 00 j

Moreover, pðT 0 ; T 0 Þ ¼ 0 and pðT 0 ; T 00 Þ ¼ pðT 00 ; T 0 Þ because of the commutative property of the intersection and union operators of sets. Once function p is defined, we define the overall similarity measure using functions d ABIN and p. Definition 9. Given two cases CASE0 and CASE00 , represented by S 0 ¼ hT 0 ; R0 ; D0 ðT 0 Þi and 00 00 00 00 00 S ¼ hT ; R ; D ðT Þi, the overall similarity is defined as: SABIN : C  C : ! ½0; 1 SABIN ðCASE0 ; CASE00 Þ ¼

1  ðgd ABIN ðS 0 ; S 00 Þ þð1  gÞ pðT 0 ; T 00 ÞÞ

where g 2 ½0; 1 is the distance weight and S 0 ; S 00 are the ABIN networks that represents cases CASE0 and CASE00 , respectively. The similarity function SABIN has the following properties: Property 11 (Reflexivity). Given CASE0 2 C, 0 0 SABIN ðCASE ; CASE Þ ¼ 1, because d ABIN ðS 0 ; S 0 Þ ¼ 0 and pðABIN0 ; ABIN0 Þ ¼ 0. 0

00

0

Property 12. Given CASE ; CASE 2 C, SABIN ðCASE ; CASE00 Þ ¼ SABIN ðCASE00 ; CASE0 Þ. The proof of Property 12 is trivial since d ABIN ðS 0 ; S 00 Þ ¼ d ABIN ðS 00 ; S 0 Þ and pðABIN0 ; ABIN00 Þ ¼ pðABIN00 ; ABIN0 Þ.

4.7. An overall example of similarity evaluation In this section, we show how to evaluate similarity among cases reported in Fig. 2. We consider the CASE_1as a reference case of the clinical workflow, and thus, given the workflow cases of different patients (CASE_2, CASE_3, CASE_4, and

Table 1 Intra-task distance between CASE_1 and CASE_2 CASE_2

d in

T1 T2 T3 T4 T5 T6 T7 T9 P

0 0.20 0 0 0.25 0 0 0 0.45

d in

CASE_5), we could be interested to state which of them is closer to the workflow reference case. The similarity evaluation starts from the computation of the intra-task distance between CASE_1and all the other cases. To this end, we quantify the distance between the performed task of each case with respect to the reference case. Tables 1—4 show the intra-task distance values (d in ) between CASE_1 and the rest of cases, by fixing a ¼ 1. Once we have compared the local properties of the performed tasks (intra-task), the next step is to measure the difference between the order in which tasks were performed in the reference case CASE_1 and the order in the other cases. Let us suppose that, in this medical scenario, the relative order of tasks with respect to the other ones is more relevant than the quantitative aspects. For instance, the fact that the acquisition of blood sample for testing was done before the ECG test is more significant than the fact that between them there is a delay of 5 min. Therefore, parameter b must be near 0. Tables 5—8 Table 2 Intra-task distance between CASE_1 and CASE_3 CASE_3

d in

T1 T2 P

0.5 0 0.5

d in

Table 3 Intra-task distance between CASE_1 and CASE_4 CASE_4

d in

T1 T2 T3 T4 T5 T6 T7 P

0 0 0 0 0 0 0 0

d in

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

Temporal similarity measures for querying clinical workflows Table 4 Intra-task distance between CASE_1 and CASE_5 CASE_5

d in

T1 T2 T3 T4 T5 T6 T7 T9 P

0.5 0 0 0 0.14286 0 0 0 0.64286

d in

Table 5 Inter-interval distance between CASE_1 and CASE_2 CASE_2 r 1;2 r 2;3 r 3;4 Q q d IN

0 0 0

0 0 0

r 4;5

r 5;6

r 6;7 r 7;9

0.16666 0 0 0 — 0.09090 0.00893 0 0.16666 0.00909 0.00089 0

0 0.00509 0.00051

15

Once we have computed d in and d IN , we can easily obtain a similarity measure between the reference case (CASE_1) and the other cases. Let us assume two arbitrary values g ¼ 0:5 and d ¼ 0:5; the similarity between CASE_1 and cases CASE_2, CASE_3, CASE_4, and CASE_5 can be calculated as reported in the following. The similarity measure between CASE_1 and CASE_2 is: 0:45 0:17714 þ 0:5 8 7 ¼ 0:04078

d ABIN ðCASE 1; CASE 2Þ ¼ 0:5

pðCASE 1; CASE 2Þ jfT1; T2; T3; T4; T5; T6; T7; T9gn 0 T1; T2; T3; T4; T5; T6; T7; T9gj ¼0 ¼ ¼ 16 j8j þ j8j SABIN ðCASE 1; CASE 2Þ ¼ 1  ð0:5  d ABIN ðCASE 1; CASE 2Þ þ 0:5  pðCASE 1; CASE 2ÞÞ

Table 6 Inter-interval distance between CASE_1 and CASE_3

The similarity measure between CASE_1 and CASE_3 is:

CASE_3

r 1;2

r 2;1

Q q d IN

0.66667 — 0.66667

0.66667 — 0.66667

Table 7 Inter-interval distance between CASE_1 and CASE_4 CASE_4 r 1;2 r 1;3 Q q d IN

1 — 1

r 2;3

r 2;1 r 3;4 r 4;5 r 5;6 r 6;7

0 0 1 0.66667 0.25 — 0.06667 0.025 1

0 0 0

0 0 0

0 0 0

0 0 0

Table 8 Inter-interval distance between CASE_1 and CASE_5 CASE_5 r 1;2 Q q d IN

r 2;3 r 3;4 r 4;5 r 5;6

0.33333 0 — 0 0.33333 0

0 0 0

0 0 0

r 6;7 r 7;9

0 0 0.00444 0 0.00044 0

¼ 1  ð0:5  0:04078 þ 0:5  0Þ ¼ 0:97961

0 0 0

show the inter-interval distance values of the considered example between CASE_1 and the other cases, by fixing b ¼ 0:1. In Tables 5—8, r i;k stands for corresponding relations (for example, r1;2 stands for the relation between tasks T1 and T2).

0:5 1:33334 þ 0:5 2 2 ¼ 0:45834

d ABIN ðCASE 1; CASE 3Þ ¼ 0:5

pðCASE 1; CASE 3Þ jfT1; T2; T3; T4; T5; T6; T7; T9gnfT1; T2gj ¼ j8j þ j2j 6 ¼ 0:6 ¼ 10 SABIN ðCASE 1; CASE 3Þ ¼ 1  ð0:5  d ABIN ðCASE 1; CASE 3Þ þ0:5  pðCASE 1; CASE 3ÞÞ ¼ 1  ð0:5  0:45834 þ 0:5  0:6Þ ¼ 0:47083 The similarity measure between CASE_1 and CASE_4 is: 0 2:09167 d ABIN ðCASE 1; CASE 4Þ ¼ 0:5 þ 0:5 7 8 ¼ 0:13073

pðCASE 1; CASE 4Þ ¼

2 ¼ 0:125 16

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

16

C. Combi et al.

SABIN ðCASE 1; CASE 4Þ ¼ 1  ð0:5  d ABIN ðCASE 1; CASE 4Þ þ 0:5  pðCASE 1; CASE 4ÞÞ ¼ 1  ð0:5  0:12656 þ 0:5  0:125Þ ¼ 0:87422 The similarity measure between CASE_1 and CASE_5 is: d ABIN ðCASE 1; CASE 5Þ ¼ 0:5

0:64286 0:33377 þ 0:5 8 7

¼ 0:06402

pðCASE 1; CASE 5Þ ¼

1 ¼ 0:05882 17

SABIN ðCASE 1; CASE 5Þ ¼ 1  ð0:5  dðCASE 1; CASE 5Þ þ 0:5  pðCASE 1; CASE 5ÞÞ ¼ 1  ð0:5  0:06402 þ 0:5  0:05882Þ ¼ 0:93858 We can conclude, therefore, that CASE_2 is the closest case to CASE_1 since it has obtained the highest value of similarity (0.97961). This result seems reasonable since both workflow cases followed the same path of the workflow schema and have a similar performance. Other observations could be derived from this example. For instance, the second most similar case is CASE_5 because, unlike CASE_4, its performed tasks have practically the same order than CASE_1. Finally, CASE_3 is the less similar case since it followed radically different paths of the workflow schema.

5. Discussion and conclusions This work deals with the representation of clinical processes by using temporally extended workflow modeling techniques. In particular, we propose an approach to evaluate the temporal similarity between workflow cases representing different applications of the same guideline. The similarity measure proposed in this paper provides a simple but powerful way to compare workflow cases by the use of temporal constraint networks, providing explicit temporal information about interval distances. Therefore, we can propose similarity measures based on temporal relations of temporal constraint networks. Related proposals in the literature deal with event sequences [12—14], or interval sequences, as in the case of the quantitative approach proposed in [19]. The authors state a direct comparison between intervals of two sequences by the overall

sum of the offsets of intervals (comparing the start and end components). Unlike our proposal, these sequence similarity approaches do not consider (or consider partially) the relative order position of intervals within the overall sequence. In [24], a temporal similarity measure between time dependency patterns for clinical pathways was proposed for the stroke management domain. The described similarity function measures only the cardinality of common edges between time dependency networks. In our proposals, the similarity measure covers this aspect and also considers the presence and duration of tasks represented as nodes. We present the similarity function as a linear combination of functions to differentiate intra-task distance (matching and comparing individually task intervals) and inter-task distance (considering relative positions with respect to other task intervals). These formulae can be easily configured by only four parameters (a, b, g, and d), allowing a flexible way to set up the quantitative and qualitative aspects independently. Parameters a and b represent relevance of qualitative and quantitative aspects, respectively. In particular, a states the importance of durations when considering corresponding tasks at the expense of other atemporal factors. The parameter b indicates the relevance of the quantitative similarity, when we have the same qualitative relation between corresponding tasks. For instance, in some medical situations the fact that one symptom is present before another one could be more relevant than the exact number of minutes or hours between them (or vice versa). Parameters d and g regard a more general criterion. The parameter d states whether the direct comparison between intervals (durations) is more relevant than the order within the task interval sequence of the case. For instance, let us suppose d ¼ 1: thus, our proposal is similar to those proposals in the literature that only consider the direct comparison of corresponding task intervals to measure the similarity. Finally, the parameter g is used to weight the fact that the two considered cases have non-matching tasks. In this sense, the use of weights is a strategy broadly used in many similarity-based approaches, such as in case-based reasoning or information retrieval [30,31]. This strategy provides the capability to modify the function behaviour depending on the knowledge domain. The main disadvantage is that the weight configuration demands a knowledge acquisition process; especially the direct weight assignment by physicians is very costly in terms of resources and time. This work focuses on the definition of similarity measures, and weight settings experiments in a concrete domain have not been

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

Temporal similarity measures for querying clinical workflows carried out. Tuning this parameters in real clinical settings deserves further investigation and a strong involvement of clinicians. Another relevant aspect of our approach is the potential capability of managing and inferring temporal knowledge from the ABIN network. Moreover, the use of temporal constraint networks also provides a flexible representation for evaluating the similarity of uncompleted and imprecise descriptions of a temporal scenario (e.g., when it is not possible to obtain a crisp duration of tasks in a workflow case). In [20], temporal constraint networks represent clinical scenarios and the consistency of the fusion of networks (incompatible, compatible, or satisfactory) is used for a qualitative similarity evaluation. In this sense, our proposal also covers this aspect, considering the absence of some tasks in the scenario. In our proposal, the ABIN network is reduced into a time point constraint network when consistency and inferring capabilities are required. The translation of intervals into points (beginning and ending points of the intervals) is a classical approach to reduce the algorithmic complexity of many interval problems [21]. Therefore, for each pair of intervals that must be compared, four time points must be considered. This fact implies a sensible increment of the number of relations in the time point constraint network. However, using this approach we are able to reuse a wealth of algorithms to check consistency and derive the minimal network in a polynomial order [28]. The ABIN network only allows a reduced number of interval relations between its elements in order to deal with a tractable interval problem. Despite ABIN network is less expressive than the whole IA, it is expressive enough to describe any kind of workflow case (sequence of crisp intervals) for similarity measure purposes. It is worth mentioning that in those medical situations, far from our case of study (workflows), where the interval of cases includes vague information, the ABIN network is also expressive enough to represent fuzziness by the use of imprecise values of the metric (unary and binary) relations. For instance, the ABIN network could be used to represent a generic path, performed by a workflow case (i.e., subsets of the workflow schema without branches). An immediate practical application of this expressiveness is the representation of both workflow cases and generic paths for similarity purposes, which is part of our future research. The main focus of this paper is related to some important methodological issues, which need to be suitably solved before the design and implementa-

17

tion of clinical systems for managing and retrieving clinical process data. This proposal is a basic step towards the use of a formal approach to evaluate similarity between workflow cases representing clinical processes. This means that real data coming from clinical domains are needed to apply this proposal to a concrete medical domain such as the one described in the SPREAD guideline. Other future work will focus on the description of specific temporal constraint network models to obtain more efficient similarity functions and their evaluation in concrete medical domains.

Acknowledgements This work has been partially supported by contributions from the Spanish MEC under the FPU National Plan (Grant Ref. AP2003-4476), the National Project TIN2006-15460-C04-01 and the PETRI project PET2006-406, and the Department of Computer Science of the University of Verona.

References [1] Adlassnig KP, Combi C, Das AK, Keravnou ET, Pozzi G. Temporal representation and reasoning in medicine: research directions and challenges. Artif Intell Med 2006;38(2):101— 13. [2] Combi C, Gozzi M, Jua ´rez JM, Oliboni B, Pozzi G. Conceptual modeling of temporal clinical workflows. In: Goranko V, Wang XS, editors. Proceedings of the 14th International Symposium on Temporal Representation and Reasoning (TIME 2007). Los Alamitos: IEEE Computer Society; 2007 . p. 70—81. [3] Panzarasa S, Stefanelli M. Workflow management systems for guideline implementation. Neurol Sci 2006;27:245—9. [4] Peleg M, Tu S, Bury J, Ciccarese P, Fox J, Greenes RA. Comparing computer-interpretable guideline models: a case-study approach. J Am Med Inform Assoc: JAMIA 2003;10(1). [5] Quaglini S, Ciccarese P. Models for guideline representation. Neurol Sci 2006;27. [6] Combi C, Gozzi M, Jua ´rez JM, Marı´n R, Oliboni B. Querying clinical workflows by temporal similarity. In: Bellazzi R, AbuHanna A, Hunter J, editors. Artificial Intelligence in Medicine, 11th Conference on Artificial Intelligence in Medicine, AIME 2007, Proceedings, volume 4594 of Lecture Notes in Computer Science. Berlin: Springer; 2007. p. 469—78. [7] The Stroke Prevention and Educational Awareness Diffusion (SPREAD) Collaboration. The Italian guidelines for stroke prevention. Neurol Sci 2000;21. [8] Workflow Management Coalition and David Hollingsworth. The workflow reference model. http://www.wfmc.org/ standards/framework.htm(Accessed 22 July 2008); 1995. [9] Panzarasa S, Quaglini S, Micieli G, Marcheselli S, Pessina M, Pernice C. Improving compliance to guidelines through workflow technology: implementation and results in a stroke unit. In: Leong TY, editor. Proceedings of the 12th World Congress on Health (Medical) Informatics;

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

ARTMED-1026; No of Pages 18

18

[10]

[11]

[12]

[13]

[14]

[15]

[16] [17] [18]

[19]

C. Combi et al. Building Sustainable Health Systems. Amsterdam: IOS Press; 2007 . p. 834—9. Keogh E, Chakrabarti K, Pazzani M, Mehrotra S. Dimensionality reduction for fast similarity search in large time series databases. Knowl Inf Syst 2001;3(3):263—86. Roddick JF, Spiliopoulou M. A survey of temporal knowledge discovery paradigms and methods. Knowl Data Eng 2002;14(4):750—67. Agrawal R, Faloutsos C, Swami AN. Efficient similarity search in sequence databases. In: Lomet D, editor. Proceedings of the 4th International Conference of Foundations of Data Organization and Algorithms (FODO). Berlin: Springer; 1993. p. 69—84. Kahveci T, Singh AK, Gurel A. Similarity searching for multiattribute sequences. In: Proceedings of the 14th International Conference on Scientific and Statistical Database Management (SSDBM02). Los Alamitos: IEEE Computer Society; 2002. p. 175. Mannila H, Moen P. Similarity between event types in sequences. In: Mohania MK, Tjoa AM, editors. DaWaK’99: Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery, volume 1676 of Lecture Notes in Computer Science. Berlin: Springer; 1999. p. 271—80. Chittaro L, Combi C. Visualizing queries on databases of temporal histories: new metaphors and their evaluation. Data Knowl Eng 2003;44(2):239—64. Shahar Y, Musen MA. Knowledge-based temporal abstraction in clinical domains. Artif Intell Med 1996;8(3):267—98. Ho ¨ppner F, Klawonn F. Finding informative rules in interval sequences. Intell Data Anal 2002;6(3):237—55. Villafane R, Hua KA, Tran D, Maulik B. Knowledge discovery from series of interval events. J Intell Inf Syst 2000;15(1): 71—89. Yi BK, Roh JW. Similarity search for interval time sequences. In: Lee YJ, Li J, Whang KY, Lee D, editors. Database Systems for Advances Applications, 9th Interna-

[20]

[21]

[22] [23]

[24] [25]

[26] [27]

[28] [29] [30] [31]

tional Conference, DASFAA 2004, Proceedings, volume 2973 of Lecture Notes in Computer Science. Berlin: Springer; 2004 . p. 232—43. Dojat M, Ramaux N, Fontaine D. Scenario recognition for temporal reasoning in medical domains. Artif Intell Med 1998;14(1—2):139—55. Vilain M, Kautz H. Constraint propagation algorithms for temporal reasoning. In: Proceedings of the 5th National Conference on Artificial Intelligence (AAAI-86). San Francisco: Morgan Kaufmann; 1986. p. 377—82. Allen JF. Maintaining knowledge about temporal intervals. Commun ACM 1983;26(11):832—43. Pujari AK, Sattar A. A new framework for reasoning about points, intervals and durations. In: Dean T, editor. Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, IJCAI 99. San Francisco: Morgan Kaufmann; 1999. p. 1259—67. Lin F, Chou S, Pan S, Chen Y. Mining time dependency patterns in clinical pathways. Int J Med Inform 2001;62: 11—25. Combi C, Pozzi G. Architectures for a temporal workflow management system. In: Haddad H, Omicini A, Wainwright RL, Liebrock LM, editors. Proceedings of the 2004 ACM Symposium on Applied Computing (SAC). New York: ACM; 2004. p. 659—66. van Beek P, Cohen R. Exact and approximate reasoning about temporal relations. Comput Intell 1990;6(3):132—44. Nebel B, Bu ¨rckert HJ. Reasoning about temporal relations: a maximal tractable subclass of Allen’s interval algebra. J ACM 1995;42(1):43—66. Dechter R, Meiri I, Pearl J. Temporal constraint networks. Artif Intell 1991;49(1—3):61—95. Freksa C. Temporal reasoning based on semi-intervals. Artif Intell 1992;54(1):199—227. Finnie G, Sun Z. Similarity and metrics in case-based reasoning. Int J Intell Syst 2002;17:273—87. Pal SK, Shiu SCK. Foundations of soft case-based reasoning. San Francisco: John Wiley & Sons; 2004.

Please cite this article in press as: Combi C, et al. Temporal similarity measures for querying clinical workflows. Artif Intell Med (2008), doi:10.1016/j.artmed.2008.07.013

Lihat lebih banyak...

Comentários

Copyright © 2017 DADOSPDF Inc.