Querying Clinical Workflows by Temporal Similarity

June 16, 2017 | Author: Matteo Gozzi | Category: Artificial Intelligence, Health Care Quality, Temporal Constraints

Carlo Combi (1), Matteo Gozzi (1), Jose M. Juarez (2), Roque Marin (2), and Barbara Oliboni (1)

(1) Department of Computer Science – University of Verona – Italy, {combi|gozzi|oliboni}@sci.univr.it
(2) Dept. of Information and Communication Engineering – Universidad de Murcia – Spain, {jmjuarez|roque}@dif.um.es

Abstract. The degree of fulfillment of clinical guidelines is considered a key factor when evaluating the quality of a clinical service. Guidelines can be seen as processes describing the sequence of activities to be done. Consequently, workflow formalisms seem to be a valid approach to model the flow of actions in the guideline and their temporal aspects. The application of a guideline to a specific patient (guideline instance) can be modeled by means of a workflow case. The best (worst) application of a guideline, represented as a reference workflow case, can be used to evaluate the quality of the service, by comparing the optimal case with specific patient instances. On the other hand, the correct application of a guideline to a patient involves the fulfillment of the guideline temporal constraints. Thus, the evaluation of the temporal similarity degree between different workflow cases is a key aspect in evaluating health care quality. In this work, we represent a portion of the stroke guideline using a temporal workflow schema and we propose a method to evaluate the temporal similarity between workflow cases. Our proposal, based on temporal constraint networks, consists of a linear combination of functions to differentiate intra-task and inter-task temporal distances.

1 Introduction

In recent years, clinical guidelines have received increasing attention in the medical community, and also in the academic context for research issues related to clinical guideline modeling [9]. Clinical guidelines describe, in natural language, the recommended behaviour of a medical team, the activities to apply to the patient, and their fulfillment with respect to time and to the state of the patient's health, in order to define the best way to manage patients. The number of clinical guidelines, covering almost all major branches of medicine, is growing, together with their updates delivered after regular reviews and new scientific discoveries. In this work we consider the Italian guideline for stroke prevention and management (SPREAD) [8]. This guideline aims to provide knowledge and recommendations about primary and secondary prevention of stroke in clinical practice. The diffusion of guidelines in electronic form is spreading, and allows physicians to compare and evaluate clinical guidelines coming from different countries but focusing on the same clinical activities. Despite the diffusion of electronic versions

This work was partially supported by the Spanish MEC under the FPU national plan (grant ref. AP2003-4476) and the national projects TIC2003-09400-C04 / TIN2006-15460-C04-01.

R. Bellazzi, A. Abu-Hanna, and J. Hunter (Eds.): AIME 2007, LNAI 4594, pp. 469–478, 2007. © Springer-Verlag Berlin Heidelberg 2007


of guidelines, their consultation and interpretation may be very difficult. The textual version can be very long, some information can be scattered throughout the document, and the sequence of medical activities can be difficult to understand and may be ambiguous. For these reasons, issues related to the formal representation of guidelines have been considered by several research teams [7,9].

On the one hand, in the clinical context, guidelines describe a sequence of activities to be done. On the other hand, in the business context, a business process can be defined as a description of tasks and consists of subprocesses, decisions and activities. In both cases, a sequence of activities must be done to reach a given goal: in the former case to manage the patient's situation correctly, in the latter case to satisfy the business needs. This means that guidelines can be seen as processes, and can be managed by means of business modeling tools such as Workflow Management Systems (WfMSs) [7]. In general, WfMSs allow one to specify, control, and coordinate the flow of work cases (sequences of activities which form a business process).

In the clinical workflow context, an important aspect to consider is time: activities described in a guideline must be done satisfying coordination rules expressing constraints with respect to time. The rules a process has to follow are described in the workflow schema and must be satisfied by the instances (cases) of the process itself. A workflow schema describes the structure of the cases with respect to the coordination of the activities (parallel activities, activity sequences, total/partial fork, and total/partial join). Moreover, the schema may contain qualitative and quantitative temporal constraints. Workflow cases, instances of the same workflow schema, can differ with respect to the structure, i.e., to the activities composing the cases, and to their (temporal) order and length.

When a workflow schema represents a guideline, its cases represent different instances deriving from the application of the guideline to different patients in different situations. The best (worst) application of the guideline can be represented by means of a workflow case and can be used to evaluate the quality of the service by comparing the similarity between the optimal case and the other clinical cases. Moreover, a given case, representing something interesting, can be used to retrieve a particular class of cases similar to the given one. Thus, information retrieval can be done by evaluating the similarity between workflow cases. The evaluation of temporal similarity is an important issue in the clinical context, where instances (cases) are slightly different according to the patient situation.

In this work, we propose an approach to evaluate temporal similarity between instances (cases) of the workflow schema by the use of temporal constraint networks, and, as a motivating scenario, we represent a portion of the SPREAD guideline by means of a workflow schema. The structure of the paper is as follows: Section 2 reviews some of the soundest approaches to temporal similarity for clinical scenarios. Section 3 describes a portion of the considered guideline represented by a workflow model and the execution of its tasks. Section 4 describes the main aspects of our approach to evaluate similarity between workflow cases. Finally, Section 5 offers the conclusions and future work.


2 Related Work

In clinical workflows, and in most models for clinical scenarios, it is fundamental to find the best way of representing time, processing temporal data, and comparing temporal information by similarity techniques. In general, there are two kinds of medical temporal data: time series (biosignals), and temporal sequences (time-stamped clinical data).

Time series similarity proposals usually work with raw time series data (e.g. ECG or EEG directly obtained from monitoring) and aim to derive the most representative features from a large amount of data [5]. Some of the most successful strategies are based on dimensionality reduction (Discrete Fourier Transform, Discrete Wavelet, Time Warping Transformations), in order to obtain a feature vector or model parameters [5].

Temporal sequences are collections of occurrences of different event types, as, for example, the set of test results of a patient during a week in the Intensive Care Unit. Occurrences are usually associated with single time points. In [6], the similarity evaluates the relative position of an event occurrence within a window context. That is, event occurrences are similar if they occur in a similar context, and contexts are defined as the set of events happening within a predefined time window.

Furthermore, it is common to find sequences composed of facts holding on intervals, such as the description of protocols, treatments, patient symptoms, or parameters abstracted from biosignals (e.g., ST-segment elevation of an ECG). This kind of temporal data is called an interval sequence and it is of increasing interest in many research fields, as in temporal clinical abstraction [2,10]. Focusing on the few proposals dealing with similarity for interval sequences, in [12], the authors discuss different solutions for defining similarity between two temporal sequences, according to the distance between their composing intervals. In [3], the authors consider the issue of recognizing similar clinical scenarios, composed of both events (point-based) and facts (interval-based); as the scenarios are represented through temporal constraint networks, similarity is reduced to the fusion of both networks.

Temporal Constraint Networks are a powerful approach for representing and querying temporal information. They are represented as a Constraint Satisfaction Problem (CSP) [11], where variables denote event types and constraints represent the temporal relations amongst them. The interval algebra (IA), introduced by James Allen to represent and manage interval relations [1], is one of the most widely studied models and has yielded a large number of theoretical results.

3 A Motivating Problem

In this paper, we consider the problem of properly managing a patient possibly having a stroke. In particular, we focus on the Italian Guideline for Stroke Prevention and Management [8]. We will represent the suitable portion of the considered guideline by using a temporally extended workflow model, which allows one to simply show the required clinical tasks (i.e., activities), the flow of the execution of tasks, and the temporal constraints on them. Let us consider the following fragments of the guideline: “Synthesis 9.1: A stroke victim should rapidly be assessed after hospitalization (T1), by means of a general examination and [...]


Recommendation 9.1 and 9.2: an early and standardized neurological evaluation (T2) is recommended in the setting of a qualitatively adequate management of acute stroke (Cond1). Recommendation 9.4: [...] the following blood exams are recommended: complete blood count including platelets (T3), [...], and coagulation tests (T4) [...] Recommendation 9.6: The electrocardiogram (T5) is recommended in all suspected stroke victims who are admitted to an Emergency Room (Cond2).” In Figure 1 we show the considered portion of the guideline by means of a workflow schema, where boxes represent tasks (T1, T2, ...), ovals represent connectors specifying different possible flows (Cond1, AND, ...), arrows associate successive tasks/connectors, and a double line is for the start point of the workflow. This representation is enriched by additional temporal information, such as minimum and maximum duration of a task (e.g., [1,2] is the allowed interval for duration of task T3) or the minimum and maximum delay between two consecutive tasks. According to the specified schema, there are several possible workflow instances (hereinafter cases), which correspond to the clinical treatment of different patients. Figure 2 depicts three different cases for the discussed workflow schema: each case is represented as a sequence of intervals labeled by the corresponding task name on the timeline having the start of the case as origin.

Fig. 1. The workflow schema of the considered guideline portion enriched by additional temporal information

In general, cases are only temporally constrained by the specification of the workflow schema. Thus, cases of the same schema could differ with respect to the order and duration of tasks, and with respect to the presence of different tasks due to alternative paths. For instance, the first and the second cases in Figure 2 differ in the order between tasks T3 and T4, while the first (second) case and the third one differ in their tasks, due to the alternative paths induced by the connector Cond1. Note that the correct application of clinical guidelines and protocols is considered a quality of service indicator. In order to measure this indicator, one essential factor is the temporal dimension. Thus, according to the given temporal scenario, a large number of cases for the same guideline will be stored by a hospital stroke unit. Querying and analyzing this database is, thus, extremely important for several clinical applications: for example, to evaluate the quality of the provided care, we could compare the similarity between the best case and the real clinical cases in the database. Moreover, a


Fig. 2. Three examples of workflow cases

given case, representing something clinically interesting, may be used to retrieve a set of cases similar to the given one. A proper definition of (temporal) similarity for cases needs to be deeply studied.
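The retrieval scenario described above can be sketched as a ranking of stored cases by a similarity function. This is only an illustration under stated assumptions: the function `most_similar`, the patient identifiers, and the toy `jaccard` measure are hypothetical and are not part of the paper; in particular, `jaccard` is a stand-in that looks only at task names, not the temporal similarity developed in Section 4.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def most_similar(reference, stored: Dict[str, object],
                 similarity: Callable[[object, object], float],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Return the k stored cases with the highest similarity to the reference."""
    scored = ((case_id, similarity(reference, case))
              for case_id, case in stored.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def jaccard(a: frozenset, b: frozenset) -> float:
    """Toy stand-in similarity: overlap of task names only (an assumption)."""
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical database: each case is reduced to the set of tasks it contains.
database = {
    "patient-01": frozenset({"T1", "T2", "T3", "T4", "T5"}),
    "patient-02": frozenset({"T1"}),
    "patient-03": frozenset({"T1", "T2", "T3"}),
}
best_case = frozenset({"T1", "T2", "T3", "T4", "T5"})

# Rank the stored cases against the reference ("best") case.
ranking = most_similar(best_case, database, jaccard, k=2)
```

Passing the similarity as a parameter keeps the query independent of how similarity is defined, so a temporal measure can replace the stand-in without changing the ranking code.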

4 A Similarity Proposal for Clinical Cases

In this section, we propose an approach to evaluate the similarity between two clinical (workflow) cases considering: (i) the comparison between the corresponding performed tasks; (ii) the comparison between the qualitative/quantitative temporal relations between corresponding tasks; and (iii) the presence/absence of some task. Our proposal compares workflow cases by means of an interval similarity function and is based on the following steps:

1. express both clinical cases through interval constraint networks;
2. evaluate the intra-task distance, i.e. the distance between intervals representing corresponding tasks;
3. evaluate the inter-task distance, i.e. the distance between the relations between corresponding tasks;
4. compute the overall similarity, by considering possible dissimilarities of cases with regard to the occurring tasks.

In the following, we will present and discuss the details of each step. A workflow case ($C$) is a set of labeled task intervals. A labeled task interval ($t$) is a triple $(taskName, t^-, t^+)$: $taskName$ is a task label (e.g. T1, T2, T3), and $t^-$, $t^+$ are timestamps describing the beginning and ending time of the task interval. Moreover, $C$ is an ordered set of task intervals:

$$C = \{(taskName_1, t_1^-, t_1^+), (taskName_2, t_2^-, t_2^+), \ldots, (taskName_n, t_n^-, t_n^+)\}$$

where $\forall t_i \in C,\ t_i^- \le t_{i+1}^-,\ i = 1, \ldots, n-1$.

Temporal constraint networks are a temporal modeling approach that provides an explicit representation of temporal relations for a given scenario. This is an advantage when two temporal scenarios must be compared. Moreover, these constraints can


also contain different aspects of the temporal information (e.g. quantitative/qualitative, crisp/fuzzy), enriching its description and providing a flexible representation for different purposes. The first step of our approach, therefore, is to obtain a temporal constraint network for each clinical workflow case. The considered network is composed of nodes and edges: nodes represent task intervals, while edges stand for qualitative relations, enriched by some quantitative information, between two task intervals. Nodes are labeled by the corresponding task name and by the related (upper and lower bounds of) interval durations. Multiple instances of a single task (i.e., from a loop in the workflow schema) are labeled by different identifiers (e.g. instances of task T1 would be named T1.1, T1.2, and so on). Edges are labeled by a single Allen's interval relation enriched with some quantitative data. Each task interval has a corresponding node in the network, while edges are introduced only for relations between each task $t_i$ and its successor $t_{i+1}$. For example, Figure 3 depicts the three networks corresponding to the cases reported in Figure 2: as for CASE 1, tasks T1 and T2 are represented by labeled nodes and their relation is represented as a directed edge, labeled by b, standing for the relation before, and by the interval [1,1], describing the (minimum and maximum) delay between the end of T1 and the beginning of T2 (see footnote 1).

Once the workflow cases are translated into temporal networks, in order to perform a comparison between cases, we need to establish a correspondence between nodes and edges of two different networks. As the task names uniquely identify tasks within a case, the correspondence between nodes is built through task names; the correspondence between edges is built on the correspondence of the connected nodes; when two connected nodes are consecutive in one case and are not consecutive in the other, we need to derive the missing edge. For example, in Figure 3 derived edges are represented through dashed edges.

Fig. 3. The temporal networks obtained from the workflow cases of Figure 2
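The translation of a case into a network described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: the names `TaskInterval`, `allen_relation`, and `build_network` are hypothetical, intervals are assumed to have positive duration, and the timestamps in the example are invented (only the delay of 1 between T1 and T2 follows the CASE 1 description). Derived (dashed) edges between non-consecutive tasks are not covered.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TaskInterval:
    name: str     # task label, e.g. "T1"
    start: float  # t-, beginning of the task
    end: float    # t+, end of the task

    @property
    def duration(self) -> float:
        return self.end - self.start


def allen_relation(a: TaskInterval, b: TaskInterval) -> str:
    """Allen relation of interval a with respect to interval b."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    # From here on the two intervals share interior points.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if a.start < b.start:
        return "overlaps" if a.end < b.end else "contains"
    return "during" if a.end < b.end else "overlapped-by"


def build_network(case: List[TaskInterval]):
    """Nodes: task name -> duration. Edges: consecutive pair -> (relation, delay)."""
    ordered = sorted(case, key=lambda t: t.start)
    nodes: Dict[str, float] = {t.name: t.duration for t in ordered}
    edges: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for a, b in zip(ordered, ordered[1:]):
        # Delay = distance between the end of the first task and the start of the second.
        edges[(a.name, b.name)] = (allen_relation(a, b), b.start - a.end)
    return nodes, edges


# Invented timestamps for illustration only.
case = [TaskInterval("T1", 0, 1), TaskInterval("T2", 2, 4), TaskInterval("T3", 5, 6)]
nodes, edges = build_network(case)
```

Because a concrete case has no uncertainty, a single duration and a single delay are stored instead of the redundant [min, max] ranges used in the graphical notation.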

The intra-task distance provides a direct method to evaluate the similarity of corresponding tasks, with respect to their durations and to other atemporal features within the workflow process (such as the agent that performed the task).

Footnote 1: Note that the network corresponding to a case has no uncertainty for task durations and delays, and therefore the given ranges are redundant. We maintain this redundancy in the graphical notation to adhere to the usual notation for temporal networks and to adopt the related algorithms.


The intra-task distance ($d_{in}$) is based on the duration of the task interval ($D_t = t^+ - t^-$). Given two corresponding tasks $t'$ and $t''$, having the same task name, the intra-task distance function is defined as follows:

$$d_{in}(t', t'') = \alpha \frac{|D_{t'} - D_{t''}|}{|D_{t'}| + |D_{t''}|} + (1 - \alpha)\, d_{wf}(t', t'')$$

where $\alpha \in [0, 1]$ is the weight of the duration in the function. The distance function $d_{wf}$ measures other workflow parameters not related to the temporal dimension. Table 1 shows the intra-task distance values of the problem example between the cases CASE 1 and CASE 2, with $\alpha = 1$. Between the cases CASE 1 and CASE 3 ($\alpha = 1$), $d_{in}(T1_{CASE 1}, T1_{CASE 3}) = 0$.

Table 1. Example of the intra-task distance between CASE 1 and CASE 2

          T1     T2     T3     T4     T5     Σ d_in
d_in      0      0      1/3    0      3/7    0.7619
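The intra-task distance above can be checked with a short sketch. The function name `d_in` mirrors the paper's symbol, but the signature is an assumption: the atemporal distance `d_wf` is passed in as a number because the paper leaves it open, and a 0/0 ratio is treated as 0. The task durations are hypothetical, chosen only so that the per-task values match Table 1 (the actual durations are given in Figure 2, which is not reproduced here).

```python
from fractions import Fraction


def d_in(dur_a, dur_b, alpha=1, d_wf=0):
    """Intra-task distance between two corresponding tasks.

    dur_a, dur_b: durations (t+ - t-) of the two task intervals.
    alpha: weight of the temporal component, in [0, 1].
    d_wf: value of the atemporal workflow distance (supplied by the caller).
    """
    total = abs(dur_a) + abs(dur_b)
    # Assumption: two zero-length tasks are considered identical in duration.
    temporal = abs(dur_a - dur_b) / total if total else 0
    return alpha * temporal + (1 - alpha) * d_wf


# Hypothetical (CASE 1, CASE 2) durations that reproduce the Table 1 entries.
durations = {
    "T1": (Fraction(1), Fraction(1)),
    "T2": (Fraction(2), Fraction(2)),
    "T3": (Fraction(1), Fraction(2)),  # |1 - 2| / (1 + 2) = 1/3
    "T4": (Fraction(1), Fraction(1)),
    "T5": (Fraction(2), Fraction(5)),  # |2 - 5| / (2 + 5) = 3/7
}
per_task = {name: d_in(a, b, alpha=1) for name, (a, b) in durations.items()}
total = sum(per_task.values())  # 1/3 + 3/7 = 16/21, i.e. 0.7619 to four decimals
```

Exact fractions are used so that the sum 16/21 can be compared with the rounded value reported in the table without floating-point noise.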

After considering intra-task distances, we have to take into consideration the inter-task similarity. It deals with similarities for temporal relations between corresponding tasks. To this end, we define an inter-task distance function ($d_{IN}$), which takes into account both quantitative and qualitative components of the edge labels in the temporal network, by using functions $q$ and $Q$, respectively. Given two corresponding relations $r'$, $r''$ in two different workflow clinical cases, the inter-task distance function is defined as follows:

$$d_{IN}(r', r'') = \begin{cases} Q(v', v'') & \text{if } Q(v', v'') > 0 \\ \beta\, q(m', m'') & \text{if } Q(v', v'') = 0 \end{cases}$$

where $r' = (v', m')$ ($r'' = (v'', m'')$) represents the relation between the tasks of the first (second) case, through a qualitative value $v'$ ($v''$), i.e., one of Allen's relations, and a quantitative value $m'$ ($m''$), i.e., the distance between the end time of the first task and the beginning of the second one. $\beta$ is a weight for the quantitative component within the function.

The distance between two Allen's relations is evaluated according to the distance of the considered relations on one of the neighbour graphs proposed by Freksa in [4]: two interval relations between the same intervals are neighbours if it is possible to move directly from one relation to the other by continuously deforming the intervals (i.e. shortening, moving...). For example, if we have a before relation between two intervals, we can move from before to meets by simply moving the first interval to be contiguous to the second one: in this case, the distance between before and meets is 1. In this work, we adopted, without loss of generality, the A-neighbours graph (a particular neighbour graph obtained by fixing 3 of the 4 bounds of the two intervals), as depicted in Figure 4.


Fig. 4. The A-neighbours graph proposed by Freksa and an example of the Q function
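The neighbourhood distance described above can be sketched as a breadth-first search over a graph whose nodes are Allen's relations. The edge list below is an assumption: it is a reconstruction obtained by moving one interval bound while keeping the other three fixed, and it may not coincide with the graph drawn in Figure 4, so values normalised with it need not match those reported in the paper. The names `EDGES`, `adjacency`, `path`, and `longest_shortest_path` are hypothetical.

```python
from collections import deque
from itertools import combinations

# Assumed neighbour pairs (b=before, m=meets, o=overlaps, s=starts, d=during,
# f=finishes, eq=equals; a trailing "i" marks the inverse relation).
EDGES = [
    ("b", "m"), ("m", "o"), ("o", "fi"), ("o", "s"),
    ("fi", "di"), ("fi", "eq"), ("s", "eq"), ("s", "d"),
    ("di", "si"), ("eq", "si"), ("eq", "f"), ("d", "f"),
    ("si", "oi"), ("f", "oi"), ("oi", "mi"), ("mi", "bi"),
]


def adjacency(edges):
    """Build an undirected adjacency map from a list of neighbour pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def path(v1, v2, graph):
    """Length of the shortest path between two relations (breadth-first search)."""
    dist = {v1: 0}
    queue = deque([v1])
    while queue:
        node = queue.popleft()
        if node == v2:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError(f"{v2!r} is not reachable from {v1!r}")


def longest_shortest_path(graph):
    """Maximum of path(x, y, graph) over all pairs of nodes."""
    return max(path(x, y, graph) for x, y in combinations(graph, 2))


G = adjacency(EDGES)
before_to_meets = path("b", "m", G)  # the example from the text: distance 1
```

Keeping the graph as plain data means a different neighbour graph can be substituted by changing only the edge list.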

The function $Q$, evaluating the distance between two qualitative temporal relations, and the function $q$, considering the quantitative part, are defined as:

$$Q(v', v'') = \frac{path(v', v'', G)}{\max(\{path(\cdot, \cdot, G)\})}, \qquad q(m', m'') = \frac{|m' - m''|}{|m' + m''|}$$

where $G$ is the chosen neighbours graph, $path(v', v'', G)$ stands for the length of the shortest path in $G$ between relations $v'$ and $v''$, and normalization is performed with respect to the longest path among the shortest ones in $G$ (i.e., the maximum value of the function $path(x, y, G)$ for any pair of nodes $(x, y)$ in $G$). Table 2 shows the inter-interval distance values of the problem example between the cases CASE 1 and CASE 2, where $\beta = 0.1$.

Table 2. Example of the inter-interval distance. $r_{i,k}$ stands for corresponding relations (for example, $r_{1,2}$ stands for the corresponding relations between tasks T1 and T2). A dash marks a value of $q$ that is not used because $Q > 0$.

          r1,2   r2,3   r2,4   r3,4   r3,5   r4,3   r4,5
Q         0      1/6    0      2/3    0      2/3    0
q         0      -      0      -      2/7    -      0
d_IN      0      1/6    0      2/3    2/70   2/3    0
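The case distinction in the inter-task distance can be sketched as follows, taking the qualitative value $Q$ as an input. The function names `q` and `d_inter` and the treatment of a 0/0 ratio as 0 are assumptions. The delay values used for $r_{3,5}$ are hypothetical, chosen only so that $q = 2/7$ as in Table 2; the remaining numbers are the table's own entries.

```python
from fractions import Fraction


def q(m1, m2):
    """Quantitative distance between two delays."""
    total = abs(m1 + m2)
    # Assumption: two zero delays (e.g. both relations are "meets") give 0.
    return abs(m1 - m2) / total if total else 0


def d_inter(Q_value, m1, m2, beta):
    """Inter-task distance: qualitative part if non-zero, else weighted quantitative part."""
    return Q_value if Q_value > 0 else beta * q(m1, m2)


beta = Fraction(1, 10)

# r3,5: same Allen relation in both cases (Q = 0), hypothetical delays 5/2 and 9/2.
r35 = d_inter(0, Fraction(5, 2), Fraction(9, 2), beta)   # 0.1 * 2/7 = 2/70

# r2,3: different Allen relations (Q = 1/6), so the delays are ignored.
r23 = d_inter(Fraction(1, 6), 1, 3, beta)                # 1/6

# Sum of the d_IN row of Table 2, counting r3,4 but not its inverse r4,3.
d_IN_row = [0, Fraction(1, 6), 0, Fraction(2, 3), Fraction(2, 70), 0]
d_IN_sum = sum(d_IN_row)  # 181/210, i.e. 0.8619 to four decimals
```

With $\beta = 0.1$, a purely quantitative difference contributes far less than any qualitative change of relation, which is visible in the $r_{3,5}$ entry.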

In general, the similarity is inversely proportional to the distance between the elements (i.e., tasks and temporal relations), to be compared. Until now, we have defined the concept of distance between corresponding elements. In our case, when defining the overall similarity between two clinical workflow cases, we have to consider also the presence of tasks in one workflow case without corresponding tasks in the other workflow case. Such a presence of non corresponding tasks is suitably represented in the overall similarity function we will define for clinical workflow cases, by the function p.


Given two workflow cases $C'$ and $C''$, represented by sets $T'$, $T''$ of tasks, and by sets $R'$, $R''$ of temporal relations in the corresponding temporal networks, the overall similarity is defined as:

$$similarity(C', C'') = 1 - (\gamma\, d(C', C'') + (1 - \gamma)\, p(C', C''))$$

where $\gamma \in [0, 1]$ is the weight for distance similarities and

$$d(C', C'') = \delta\, \frac{\sum_{t' \in T',\, t'' \in T''} d_{in}(t', t'')}{|T' \cap T''|} + (1 - \delta)\, \frac{\sum_{r' \in R',\, r'' \in R''} d_{IN}(r', r'')}{|R' \cap R''|}$$

while

$$p(C', C'') = \frac{|(T' \cup T'') \setminus (T' \cap T'')|}{|T'| + |T''|}$$

Note that the function $d$ must compute the inter-interval distance without measuring redundant temporal information. For instance, in Figure 2, nodes i3 and i4 (CASE 1 and CASE 2) both carry the temporal redundancy of $r_{3,4}$ and $r_{4,3}$, which obviously have the same $d_{IN}$ value (see Table 2). In such cases, the $d$ function considers only one of the two constraints (ignoring the inverse). In the motivating example, assuming that $\gamma = 0.5$ and $\delta = 0.5$, the similarity between cases CASE 1 and CASE 2 can be calculated from $d_{in}$ and $d_{IN}$ (results calculated previously in this paper) as follows:

$$d(CASE\,1, CASE\,2) = 0.5 \cdot \frac{0.761904}{5} + 0.5 \cdot \frac{0.8619}{5} = 0.16238$$

$$p(CASE\,1, CASE\,2) = \frac{0}{5 + 5} = 0$$

$$similarity(CASE\,1, CASE\,2) = 1 - (0.5 \cdot d(CASE\,1, CASE\,2)) = 0.9188$$

And the similarity measure between CASE 1 and CASE 3 is:

$$d(CASE\,1, CASE\,3) = 0, \qquad p(CASE\,1, CASE\,3) = \frac{|\{T1, T2, T3, T4, T5\} \setminus \{T1\}|}{5 + 1} = \frac{2}{3}$$

$$similarity(CASE\,1, CASE\,3) = 1 - (0.5 \cdot p) \approx 0.67$$
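The worked example above can be reproduced with a short sketch. The function names and signatures are assumptions, and an empty set of corresponding tasks or relations is treated as contributing a distance of 0, which is how the CASE 1 versus CASE 3 value $d = 0$ is obtained here. The summed distances 0.761904 and 0.8619 are the values reported in the text.

```python
def presence_penalty(tasks_a: set, tasks_b: set) -> float:
    """p: share of tasks that occur in only one of the two cases."""
    return len(tasks_a ^ tasks_b) / (len(tasks_a) + len(tasks_b))


def case_distance(sum_d_in, n_tasks, sum_d_IN, n_relations, delta=0.5):
    """d: weighted mean of intra-task and inter-task distances."""
    # Assumption: with no corresponding tasks/relations the term is 0.
    intra = sum_d_in / n_tasks if n_tasks else 0.0
    inter = sum_d_IN / n_relations if n_relations else 0.0
    return delta * intra + (1 - delta) * inter


def similarity(d, p, gamma=0.5):
    """Overall similarity between two workflow cases."""
    return 1 - (gamma * d + (1 - gamma) * p)


all_tasks = {"T1", "T2", "T3", "T4", "T5"}

# CASE 1 vs CASE 2: same five tasks, five corresponding relations.
d12 = case_distance(0.761904, 5, 0.8619, 5)
p12 = presence_penalty(all_tasks, all_tasks)
sim12 = similarity(d12, p12)          # about 0.9188

# CASE 1 vs CASE 3: only T1 is shared and no relation corresponds.
d13 = case_distance(0.0, 1, 0.0, 0)
p13 = presence_penalty(all_tasks, {"T1"})
sim13 = similarity(d13, p13)          # 1 - 0.5 * 2/3 = 2/3
```

The second comparison shows the role of $p$: the only shared task is identical in both cases, so the whole dissimilarity comes from the four tasks missing in CASE 3.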

5 Discussion and Conclusions

This work deals with the representation of clinical guidelines by using temporally extended workflow modeling techniques. In particular, we propose an approach to evaluate the temporal similarity between workflow cases representing different applications of the same guideline. The similarity measure proposed in this paper provides a simple but powerful way to compare workflow cases by using temporal constraint networks, providing explicit temporal information about interval distances. We propose a general method that can also be applied to non-medical applications; however, its use in clinical domains is essential due to the importance of the temporal dimension in many clinical procedures.


Related proposals in the literature concern event sequences [6] or interval sequences [12]. Unlike our proposal, these sequence similarity approaches do not consider (or consider only partially) the relative order of intervals within the overall sequence. Another relevant aspect of our approach is the potential capability of managing temporal knowledge by inferring temporal information from the temporal network. Moreover, the use of temporal constraint networks also provides a flexible representation for evaluating the similarity of incomplete and imprecise descriptions of a temporal scenario (e.g. when it is not possible to obtain a crisp duration of the tasks in a workflow case). In [3], temporal constraint networks represent clinical scenarios and the consistency of the fusion of networks (incompatible, compatible, or satisfactory) is used as a qualitative similarity evaluation. In this sense, our proposal also covers this aspect, while additionally considering the absence of some tasks in the scenario. Our future work will focus on the description of specific temporal constraint network models to obtain an efficient similarity function and on its evaluation in a concrete medical domain.

References

1. Allen, J.F.: Maintaining knowledge about temporal intervals. Communications of the ACM 26, 832–843 (1983)
2. Chittaro, L., Combi, C.: Visualizing queries on databases of temporal histories: new metaphors and their evaluation. Data Knowl. Eng. 44(2), 239–264 (2003)
3. Dojat, M., Ramaux, N., Fontaine, D.: Scenario recognition for temporal reasoning in medical domains. Artificial Intelligence in Medicine 14(1-2), 139–155 (1998)
4. Freksa, C.: Temporal reasoning based on semi-intervals. Artificial Intelligence 54(1), 199–227 (1992)
5. Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S.: Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems 3(3), 263–286 (2001)
6. Mannila, H., Moen, P.: Similarity between event types in sequences. In: DaWaK '99: Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery, London, UK, pp. 271–280. Springer, Heidelberg (1999)
7. Panzarasa, S., Stefanelli, M.: Workflow management systems for guideline implementation. Neurological Sciences 27 (2006)
8. The Stroke Prevention and Educational Awareness Diffusion (SPREAD) Collaboration: The Italian guidelines for stroke prevention. Neurological Sciences 21 (2000)
9. Quaglini, S., Ciccarese, P.: Models for guideline representation. Neurological Sciences 27 (2006)
10. Shahar, Y., Musen, M.A.: Knowledge-based temporal abstraction in clinical domains. Artificial Intelligence in Medicine 8(3), 267–298 (1996)
11. Vilain, M., Kautz, H.: Constraint propagation algorithms for temporal reasoning. In: Proceedings of the National Conference on Artificial Intelligence (AAAI-86), USA, vol. 6, pp. 132–144 (1986)
12. Yi, B.-K., Roh, J.-W.: Similarity search for interval time sequences. In: DASFAA, vol. 2973, pp. 232–243 (2004)
