PASTEL: A Semantic Platform for Assisted Clinical Trial Patient Recruitment

May 25, 2017 | Author: Tim Van Den Bulcke | Category: Translational Research, Semantic Technology, Internet


Conference Paper · September 2013 · DOI: 10.1109/ICHI.2013.39


David Damen∗†, Kim Luyckx∗†, Geert Hellebaut∗ and Tim Van den Bulcke∗†
∗ biomedical informatics group, Antwerp University Hospital, Edegem, Belgium
Email: {david.damen,kim.luyckx,geert.hellebaut,tim.vandenbulcke}@uza.be
† biomina, Antwerp University Hospital - University of Antwerp, Antwerp, Belgium

Abstract: Clinical trial recruitment encompasses many challenging tasks. Chief amongst those is the fast and reliable recruitment of eligible participants for the study. Much of this selection is still performed manually, despite the possibility of missing eligible patients (up to 60% according to some studies). To mitigate this issue, a Topic Maps-based semantic platform was developed at the Antwerp University Hospital to assist in the recruitment of clinical trial participants from the full patient population. The platform consists of (1) a web-based editor for the creation of ontology-based clinical trial representations, (2) a patient evaluator that connects to structured and unstructured hospital data sources to determine eligibility for a clinical trial, and (3) a web-based analytics module for reviewing evaluation results. The semantic nature of the clinical trial representation allows for generic formalization as well as for local adaptation of the study protocol to accommodate a specific hospital IT infrastructure.

Keywords: clinical trial; semantic technology; health informatics platform; translational research; electronic screening

I. INTRODUCTION

A main concern in clinical trial management is the timely and accurate selection of candidates that conform to the clinical trial protocol. Most trials (86%) experience delays of between 1 and 6 months, and some even longer [1], [2]. Eligibility screening is usually done manually through a labour-intensive and inefficient process [3], and it is often limited in time (e.g. screening only currently admitted patients) and in scale (e.g. only individual departments rather than hospital-wide). Through the large-scale adoption of electronic health record (EHR) systems in recent years, much information has become available electronically. Using this information in computer-assisted eligibility screening (CAS) applications could make it possible to process a larger candidate population, make the recruitment process more consistent, and significantly reduce the cost of clinical trial recruitment [4].

In a first use case, a CAS application can be used to exclude non-eligible patients and to suggest candidates that meet all criteria. As a result, a much more focused set of candidates can be presented for manual review than would be the case in a full manual review of all patient files. In a second use case, the application can be used to conduct a feasibility study to gauge a hospital’s ability to provide the required number of participants prior to the start of the recruitment process at that site. A digital integrated platform can provide an assessment of such numbers in a shorter time frame and with greater accuracy.

Unfortunately, the information required (e.g. lab results, clinical notes, medical codes, imaging reports) is often stored in heterogeneous, loosely linked, and sometimes proprietary databases. A clinical data warehouse that integrates the different hospital data sources can alleviate many of these issues.

We present a novel framework for assisted patient recruitment for clinical trials, named PASTEL (Platform for Assisted Semantic clinical Trial ELigibility), that addresses four crucial elements:
• A strong underlying semantic representation framework that allows for formal representation of eligibility criteria and efficient reuse of previously defined criteria in new trials.
• The ability to quickly respond to requests for supporting additional expressions in eligibility criteria.
• Visual analytics of eligibility evaluation results at both study and candidate level.
• The evaluation of heterogeneous data sources, including structured clinical data and unstructured textual data.

While current state-of-the-art systems [3], [5] address one or more of these elements well, to our knowledge none of them combines all four. In this paper, we describe the conceptual framework in Section II, covering the semantic representation, the clinical trial ontology, and eligibility evaluation; the PASTEL architecture (editor, evaluator, results viewer, and data sources) in Section III; and in Section IV we discuss the implications of these choices with respect to criterion complexity and the quality and availability of data sources in a real-life setting at the Antwerp University Hospital.


II. CONCEPTUAL FRAMEWORK

To enable the formal evaluation of a patient’s eligibility for a clinical trial, the following conceptual framework is defined. The starting point is always the (free-text) description of the clinical trial protocol provided by the study sponsor, which also contains the inclusion and exclusion criteria that a patient needs to meet to be considered eligible for the trial. First, a formal computer-readable representation of these inclusion and exclusion criteria is created. In a second step, a specific patient or set of patients is evaluated to determine their eligibility for this (formal) trial representation. In the following sections, we discuss the conceptual choices for the semantic representation, clinical trial ontology, and eligibility evaluation.

Figure 1. Example Concepts: a Lab Concept (Thrombocyte count; lab test ids 142002 and 142005), a Report Concept (Metastases in report; queries ‘metastase OR metastases’ and ‘uitzaaiing OR uitzaaiingen’), and a Demographics Concept (Age).

A. Semantic representation

The main goal of creating a semantic representation for clinical trials is to avoid ambiguity in the interpretation of the concepts and ideas of that domain. Our aim is to formally represent the trial’s eligibility criteria in a machine-readable format that can be shared between different sites and that can be adapted to a specific data infrastructure without loss or corruption of the original trial information.

The most widely used standard for semantic descriptions is RDF (Resource Description Framework) [6]. In PASTEL, however, we have chosen Topic Maps (ISO/IEC 13250:2003) for the underlying semantic layer, because Topic Maps provides us with more flexibility and richer constructs than RDF. First, in contrast to RDF, its subject-centric view provides a natural fit for describing a knowledge domain. Second, associations are always reified and can as such play roles in other associations, which has subtle implications for extensibility and robustness. Finally, RDF representations can still be ambiguous as to the precise target of a URI (uniform resource identifier), while Topic Maps provides a stronger identification layer. For more information on our choice for Topic Maps in this platform, we refer to Damen and Van den Bulcke [7].

Topic Maps is a semantic technology for a subject-centric description of any knowledge domain. Topics can represent any subject and they can be related to each other through Associations. Topics have one or more Names and can have properties in the form of Occurrences. Additionally, Scopes can be used to indicate when Names, Occurrences, or Associations are valid. Topic Maps has been described in Pepper [8] and Garshol [9].

B. Clinical Trial Ontology

To facilitate sharing of semantic representations of information from the same knowledge domain, common structures are used for particular knowledge domains. Such a structure is called an ontology, and we introduce an ontology for clinical trials in this section. Note that while the term ontology has many different meanings in the literature, its core description is a model that consists of types, properties, and most importantly relationships for describing the world [9].

Our clinical trial ontology provides the semantic structure for a knowledge representation of a clinical trial and contains the following main Topic types: Concept, Cell, Group, Clinical Trial, and Institution. In the next paragraphs, we summarize the most important aspects. A full description of the underlying semantic framework of PASTEL can be found in Damen and Van den Bulcke [7].

A Concept is the smallest building block in the ontology. It designates an abstract, functional representation of a single information source. A Concept is part of the formal representation of a clinical trial and can be linked with the internal IT infrastructure of the hospital. For every type of Concept, there is a plugin in the PASTEL evaluator that knows how to connect to a specific hospital database and retrieve that kind of data. Currently, five types of Concepts are supported: Demographic, Lab, Diagnostic Code, Medication, and Report Concepts. Instances of the Concept type then include Topics such as date of birth and Hb A1C lab test. Figure 1 further illustrates this with some example concepts.

A Cell performs an arbitrarily complex aggregation of the values retrieved from a Concept to a ternary logic value: true, false, or unknown. It typically represents an elementary inclusion or exclusion criterion in a clinical trial protocol. Figure 2 illustrates how an abstract Cell links the clinical trial representation to a real hospital infrastructure: a Cell representing a high thrombocyte count can be answered by evaluating lab data (stored in a specific format and unit), and the lab value should be higher than 256 to account for the way lab values are stored in this specific hospital.

A Group can be thought of as a meaningful logical

Figure 2. An example Cell: the Cell ‘Thrombocyte count > 2.5 ULN’ relates the Lab Concept ‘Thrombocyte count’ to the Constant 256 (value 256, @numeric) through a ‘greater than’ comparison, scoped to @CARDIOS and @UZA.

grouping of multiple criteria (e.g. ‘Diabetes mellitus type 2’, ‘heart problem’ or ‘no previous chemotherapy treatment’). Groups combine Cells and other Groups through a logical expression, thus bringing additional structure to the inclusion and exclusion criteria.

A Clinical Trial Topic is included in every clinical trial representation. This Topic is used as Scope to distinguish between different clinical trials. The Clinical Trial Topic instance is also linked to the Groups that make up its inclusion and exclusion criteria in a hierarchical, tree-like fashion.

Instances of the Institution Topic are also used as Scope in a clinical trial representation to indicate how a specific hospital or medical center implemented the study protocol. For instance, a criterion that represents diabetes mellitus type 2 might be implemented through diagnostic codes (e.g. ICD-9-CM codes) at one site and through lab tests at another.

Concepts, but also Cells and Groups, only need to be defined once and can be reused across different clinical trial representations and different institutions. Additional subjects and constraints that support a specific clinical trial representation are also defined in the ontology.

C. Eligibility evaluation

The expression of absent or negative information poses some specific challenges when formulating the eligibility criteria. Under a closed-world assumption, if something is not known to be true, it is considered false. An eligibility criterion ‘last HbA1c measurement > 100 nmol/L’ would therefore evaluate to false if there are no lab measurements for a particular patient. However, the criterion should evaluate to an inconclusive state, since the statement can still be either true or false depending on the outcome of the lab test. This is reflected by an open-world assumption, where a statement is considered unknown unless it is explicitly stated as either

Figure 3. The PASTEL Platform architecture.

true or false. Our evaluation engine processes a Logical Expression using a ternary logic based on Kleene logic [10] with the following truth states: true, false, and unknown. In this logic, unknown stands for a value that may be either true or false. Operations that involve an unknown value but whose outcome is unambiguous regardless of that value resolve to true or false accordingly. E.g. ‘true or unknown’ resolves to ‘true’, whereas ‘true and unknown’ resolves to ‘unknown’. The logic values of the Cells are combined in Logical Expressions using the above rules. Intermediate results trickle up to the top-level ‘Eligible’, which collects the overall eligibility result for a specific patient.

III. THE PASTEL PLATFORM

The PASTEL platform is primarily aimed at clinicians and study nurses tasked with the recruitment of patients for clinical trials. A high-level overview of the PASTEL platform is given in Figure 3. In a first step, the inclusion and exclusion criteria in a clinical trial protocol are formalized in the PASTEL editor, which in turn generates a machine-readable and ontology-based representation of the criteria. This formal representation is then submitted to a message queue in combination with a patient list of interest (e.g. patients of a specific department and/or a specific time period). A number of PASTEL evaluator processes subscribe to this queue and pick up new evaluation requests. After the evaluators have finished processing these requests, the results of the evaluation can be investigated in the PASTEL viewer. The viewer provides both a top-level view of evaluation results for the study, indicating how many patients conform to individual criteria, and a drill-down view of individual patients to review their eligibility.
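The ternary evaluation described in Section II-C can be illustrated with a minimal Python sketch. It is not the PASTEL implementation: the function names `k_not`, `k_and`, and `k_or` are hypothetical, and representing unknown by `None` is an assumption made purely for illustration.

```python
from typing import Optional

# Ternary truth value: True, False, or None (standing in for "unknown").
Ternary = Optional[bool]


def k_not(a: Ternary) -> Ternary:
    """Kleene negation: unknown stays unknown."""
    return None if a is None else (not a)


def k_and(a: Ternary, b: Ternary) -> Ternary:
    """Kleene conjunction: a single False settles the result."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a: Ternary, b: Ternary) -> Ternary:
    """Kleene disjunction: a single True settles the result."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False
```

With these definitions, `k_or(True, None)` yields `True` and `k_and(True, None)` yields `None`, matching the two examples given in Section II-C.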

Figure 4. PASTEL editor screenshot (left: overview of all studies, middle: formalized clinical trial protocol, right: detail view of a single cell).

A. The PASTEL editor

Several representation formats for the exchange of topic maps exist, of which the Compact Topic Maps (CTM) notation is used in PASTEL. In order to support the creation of correct topic maps in CTM by end users, a web-based editor was developed using Google Web Toolkit [11] and the Smart GWT library [12]. The PASTEL editor provides an intuitive graphical user interface for study nurses and clinicians to enter the inclusion and exclusion criteria of a study protocol and generates a formal semantic representation of the clinical trial.

The PASTEL editor, as shown in Figure 4, consists of three panels. On the left, a list of clinical trials previously entered by the user is shown. The middle panel contains a tree that shows the overall formalized representation of the study protocol down to the Cell level. Selecting a node of this tree opens a custom detailed editor on the right. The contents of this editor change depending on the type of node that was selected. E.g. in the case of Groups, a text field is shown where the name of the Group can be changed. The detailed editor is also customized based on the type of Concept that is used in the Cell. E.g. for a Report Concept, text queries can be defined through a text field, and for a Lab Concept multiple complex filtering and aggregation steps can be specified.

Finally, after the user has finished entering the study protocol, they can submit it to the PASTEL evaluator. When doing so, they also provide a list of patient IDs for which eligibility according to the protocol is to be determined. The PASTEL viewer can then be opened to track the progress of

the eligibility evaluation and review both intermediate and final results.

B. The PASTEL evaluator

The input for the evaluation engine consists of a clinical trial topic map, an Institution topic as scope, and a list of patient identifiers. The engine then performs a postorder traversal of the criteria tree. When a Cell instance is encountered, the engine calls a (center-specific) service for the particular Concept involved, thereby providing the service with the patient identifier(s) and the occurrences of this Concept. The output of the service is then aggregated to a logic value by filtering and/or comparing the output with one or more Constants. During this evaluation, a logic value (true, false or unknown) is assigned to every cell for each patient.

A PASTEL evaluator is a self-contained process that listens to a message queue for incoming requests to determine the eligibility of a cohort of patients for a specific clinical trial. An evaluation request contains a clinical trial ID, a list of patients, and an (optional) start and/or end date. These dates can be used to limit eligibility evaluation to only use data captured before or after specific dates. Being able to supply an end date is particularly useful in retrospective comparisons where the performance of the platform needs to be evaluated against the actual patient recruitment for a past clinical trial.

After accepting the evaluation request, the PASTEL evaluator will load the Topic Maps representation of the clinical trial and walk through its tree structure. When a Cell is reached, data for the cohort of patients is retrieved and each

Figure 5. PASTEL viewer screenshot (left: overview of all studies, middle: lists of eligible, potentially eligible, and non-eligible patients, right: detailed visualization of evaluation results at the study level).

patient is assigned one of the following ternary logic values for that Cell:

true: The data retrieved for the patient matches the comparison represented by the Cell.
false: The data retrieved for the patient does not match the comparison represented by the Cell.
unknown: No data could be retrieved for the patient, hence eligibility according to the criterion in the Cell could not be determined.
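As a rough illustration of how a Cell might map retrieved data onto these three values, consider the thrombocyte example of Figure 2. The sketch below is hypothetical: the function name `evaluate_lab_cell` is invented, and comparing only the most recent value is an assumed aggregation rule, not necessarily the one PASTEL applies.

```python
from typing import Optional, Sequence


def evaluate_lab_cell(values: Sequence[float], threshold: float) -> Optional[bool]:
    """Map lab values for one patient onto true / false / unknown (None).

    Assumption: `values` is ordered from oldest to newest and only the
    most recent measurement is compared against the threshold.
    """
    if not values:
        # No data could be retrieved, so the criterion is inconclusive.
        return None
    return values[-1] > threshold


# Thrombocyte count Cell with the Constant 256 from Figure 2.
print(evaluate_lab_cell([120.0, 300.0], 256))
print(evaluate_lab_cell([], 256))
```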

Individual Cell results are propagated back up the tree so that a single eligibility result (true, false or unknown) can be assigned to each patient for the study as a whole. The tree walker has been implemented using the Visitor pattern, which enables consistent access to and flow through the clinical trial tree structure, and on which additional services beyond eligibility evaluation can be built.

Multiple PASTEL evaluators, all listening to the same message queue, can be spun up to deal dynamically with different eligibility evaluation loads, such as sudden request spikes. This allows for a gradual introduction of the system at new sites and provides a greater degree of control over the trade-off between the speed at which results are returned and the cost of the infrastructure needed to run the system. At Antwerp University Hospital, more than 400 clinical trials are performed annually (overall, not only via PASTEL). Depending on the trial, between 1,000 and 100,000 patients are typically evaluated in a PASTEL run.
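The propagation of Cell results up the criteria tree can be sketched with a small Visitor in Python. This is an illustrative assumption, not PASTEL code: the classes `Cell`, `Group`, and `EligibilityVisitor`, the restriction to "and"/"or" operators, and the use of `None` for unknown are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Dict, List, Optional, Union

Ternary = Optional[bool]  # None stands in for "unknown"


def k_and(a: Ternary, b: Ternary) -> Ternary:
    if a is False or b is False:
        return False
    return None if (a is None or b is None) else True


def k_or(a: Ternary, b: Ternary) -> Ternary:
    if a is True or b is True:
        return True
    return None if (a is None or b is None) else False


@dataclass
class Cell:
    name: str

    def accept(self, visitor: "EligibilityVisitor") -> Ternary:
        return visitor.visit_cell(self)


@dataclass
class Group:
    name: str
    operator: str  # "and" or "or" (hypothetical simplification)
    children: List[Union["Group", Cell]] = field(default_factory=list)

    def accept(self, visitor: "EligibilityVisitor") -> Ternary:
        return visitor.visit_group(self)


class EligibilityVisitor:
    """Post-order walk: children are evaluated before their parent Group."""

    def __init__(self, cell_results: Dict[str, Ternary]) -> None:
        self.cell_results = cell_results

    def visit_cell(self, cell: Cell) -> Ternary:
        # A Cell without a stored result is treated as unknown.
        return self.cell_results.get(cell.name)

    def visit_group(self, group: Group) -> Ternary:
        child_results = [child.accept(self) for child in group.children]
        if group.operator == "and":
            return reduce(k_and, child_results, True)
        return reduce(k_or, child_results, False)


trial = Group("Eligible", "and", [
    Cell("adult"),
    Group("diabetes mellitus type 2", "or",
          [Cell("HbA1c criterion"), Cell("diagnostic code")]),
])
patient = {"adult": True, "HbA1c criterion": None, "diagnostic code": True}
print(trial.accept(EligibilityVisitor(patient)))
```

Keeping the traversal in a visitor means that other services, for instance counting patients per criterion, could reuse the same tree classes with a different visitor.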

C. The PASTEL viewer

As soon as the PASTEL evaluators start determining the eligibility of the patients in the initially provided list, the PASTEL viewer can be opened to review the intermediate and ultimately final evaluation results. The goal of the PASTEL viewer is to provide detailed insights, such as:
• Which patients are eligible, potentially eligible, and non-eligible for a clinical trial?
• Why is an individual patient eligible or not?
• What is the impact of individual and grouped criteria on the final results?

The PASTEL viewer, as displayed in Figure 5, consists of three panels. On the left, a list of all clinical trials available to the user is shown, as is the case in the PASTEL editor. Clicking one of those clinical trials loads three sets of patients in the middle panel: a list of eligible patients, a list of potentially eligible patients, and a list of non-eligible patients. At the same time, a detailed overview of the clinical trial is shown in the right panel. The detailed overview shows the same tree structure of the study protocol as in the PASTEL editor. Additionally, for each Cell and Group in the study protocol, the number of patients matching, potentially matching, and not matching is listed. As a result, an assessment can be made regarding the impact of certain criteria on the overall eligibility rates, thereby giving clinicians the tools to adapt the strictness of criteria during the study definition phase. Furthermore, selecting a patient in the middle panel updates the study detail view to show how the patient’s eligibility was evaluated for each Cell and Group of the trial formalization. The data

that was used to determine eligibility for that particular patient is displayed as well. This greatly increases insight into why an individual is (not) eligible for the trial, as all necessary information is presented in a single but detailed overview.

D. Data sources

The inclusion and exclusion criteria as found in a typical clinical trial are based on structured as well as unstructured information scattered across the entire hospital organization. Often, this information is stored in a plurality of loosely linked databases. Even in the case of structured data, transforming this information into an interoperable form that can be queried in a patient recruitment environment is quite a challenge. The PASTEL framework can be formally linked with any set of data sources (even outside the biomedical domain). At Antwerp University Hospital, it is currently linked with two major sources: a clinical data warehouse and a text-search platform.

1) Clinical Data Warehouse: At Antwerp University Hospital, many structured clinical data sources are centralized in a single clinical data warehouse, namely an instance of i2b2 [13], an open-source and widely used platform for clinical (research) data. The data sources currently integrated within the i2b2 platform at Antwerp University Hospital include patient demographic data, patient visit data, laboratory results, ICD-9-CM diagnoses, ICD-9-CM procedures, and Anatomical Therapeutic Chemical-based (ATC) medication prescriptions for more than 1.2 million patients. As much as possible, internal coding systems were mapped to international coding systems to guarantee interoperability between different data sources. Where applicable, the clinical data warehouse receives nearly real-time data from operational HL7 message streams. Data sources without HL7 exporting functionality are unlocked by traditional data warehouse and Extract-Transform-Load (ETL) techniques.

In addition to the existing i2b2 application programming interfaces, we developed an additional RESTful web services API that was optimized for and integrated in the PASTEL patient recruitment engine. While most i2b2 APIs are based on a patient-centric paradigm, the additional functionality provided by the developed REST API enables patient cohort querying.

2) Solr text search platform: Apart from the structured data sources stored in a clinical data warehouse, the PASTEL platform also has access to unstructured data in the form of clinical texts in the hospital’s EHR system. These documents are indexed and stored in an instance of Apache Solr [14], a powerful open-source platform for full-text indexing and search. After sets of documents and selected meta information (time stamp, patient ID, department ID, etc.) have been indexed, they can be queried efficiently through the Solr web interface or HTTP requests.
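An HTTP request of this kind might be assembled as in the following Python sketch. The `q`, `fq`, and `wt` parameters and the `/select` handler are standard Solr; everything else is assumed for illustration, in particular the base URL, the core name `reports`, and the field names `text`, `patient_id`, and `timestamp`, none of which are taken from the PASTEL index.

```python
from typing import Optional
from urllib.parse import urlencode


def build_solr_url(base_url: str, text_query: str, patient_id: str,
                   since: Optional[str] = None) -> str:
    """Build a Solr select URL for one patient's clinical texts.

    Field names are hypothetical; `since` is an ISO 8601 UTC timestamp.
    """
    params = [
        ("q", f"text:({text_query})"),       # full-text query on the report body
        ("fq", f"patient_id:{patient_id}"),  # filter on indexed meta information
    ]
    if since is not None:
        # Open-ended time range filter, e.g. [2012-01-01T00:00:00Z TO *].
        params.append(("fq", f"timestamp:[{since} TO *]"))
    params.append(("wt", "json"))
    return f"{base_url}/select?{urlencode(params)}"


url = build_solr_url(
    "http://localhost:8983/solr/reports",  # hypothetical server and core
    "metastase OR metastases",
    "12345",
    since="2012-01-01T00:00:00Z",
)
print(url)
```

The text query reuses the Report Concept query from Figure 1; `urlencode` takes care of escaping spaces, colons, and brackets.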

The Solr server at Antwerp University Hospital holds an index of over 4 million clinical texts (e.g. discharge letters, radiology reports, clinical notes; all written in Dutch) from the hospital’s EHR system, dating back to the early 2000s. An open-source RTF parser [15] was used to extract the text fields from the original RTF documents. No additional text preprocessing (e.g. stop word removal or stemming) was done at the time of indexing, to ensure close resemblance to the original clinical notes. Using the Solr Query Syntax [16], various types of queries can be processed, such as terms, phrases (“subdural hematoma”), wildcard searches, fuzzy searches (diabetes~0.8), proximity searches (“diabetes obesitas”~4), etc., as well as combinations of these using Boolean operators or grouping. These textual queries can be combined with queries on the meta information in the index, for instance for time range queries (e.g. [2012-01-01T00:00:00Z TO *]). Section IV briefly describes our current efforts to extend the keyword search currently in PASTEL to a concept search.

IV. DISCUSSION

The PASTEL platform as described above distinguishes itself from current state-of-the-art systems, e.g. [5], by offering an integrated answer to four crucial issues.

First, our machine-readable representation of clinical trials is based on a Topic Maps ontology that allows for the definition of a study protocol with easy reuse of criteria. The focus on subject identity in Topic Maps promotes unambiguous subject definitions and their reuse. For instance, it is not uncommon for a hospital to update its lab tests when faster or more accurate tests become available. A new or updated lab test implies the creation of a new lab test ID to track it. Therefore, a conceptual lab test, such as Hb A1C measurements, might have multiple lab test IDs that identify actual Hb A1C measurements. Our ontology encapsulates those kinds of mappings in Lab Concepts that can be reused for other studies and by other Principal Investigators or study nurses. Moreover, such mappings could be created and updated by a knowledge engineer, with the added benefits of (1) more up-to-date mappings after new lab test additions and (2) allowing clinicians and study nurses to reason only over conceptual lab tests without being concerned with local storage and data definitions.

Second, developing a knowledge representation for clinical trials and an evaluation platform implementation in tandem provides us with the ability to quickly add new constraint expressions and subsequently test them within the platform. For example, PASTEL currently supports a criterion such as ‘diabetes mellitus type 2’, which can be formalized to a representation such as ‘at least 2 Hb A1c values above 6.5%’, i.e. a filter ‘values above 6.5%’ followed by a comparison ‘at least 2’. An alternative formalization could be ‘80% of the last 10 Hb A1c values above 6.5%’, which introduces a comparison of the form ‘x% of the last y’.
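The two formalizations can be sketched as a filter followed by a comparison. The Python below is a hypothetical illustration: the function names are invented, and treating a patient with fewer than y values by using whatever values exist is an assumption that the text does not specify.

```python
from typing import Optional, Sequence


def at_least_n_above(values: Sequence[float], threshold: float,
                     n: int) -> Optional[bool]:
    """'At least n values above threshold'; None (unknown) without data."""
    if not values:
        return None
    hits = sum(v > threshold for v in values)  # filter: values above threshold
    return hits >= n                           # comparison: at least n


def fraction_of_last_above(values: Sequence[float], threshold: float,
                           fraction: float, last: int) -> Optional[bool]:
    """'x% of the last y values above threshold' (values oldest to newest).

    Assumption: if fewer than `last` values exist, all of them are used.
    """
    if not values or last <= 0:
        return None
    recent = values[-last:]
    hits = sum(v > threshold for v in recent)
    return hits / len(recent) >= fraction


# 'at least 2 Hb A1c values above 6.5%'
print(at_least_n_above([6.0, 6.8, 7.1], 6.5, 2))
# '80% of the last 10 Hb A1c values above 6.5%'
print(fraction_of_last_above([7.0] * 8 + [6.0] * 2, 6.5, 0.8, 10))
```

Both functions share the same filter step, which suggests why adding a new comparison form can remain a small, localized change.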

Extending the model to support these expressions requires small and localized extensions in the platform.

Third, the PASTEL viewer provides clinicians and study nurses with a powerful visual analytics tool for the review of eligibility evaluation results. The viewer and editor use the same tree structure of the study protocol so that individual criteria can easily be retrieved. By providing intermediate evaluation results for all criteria and groups of criteria, we facilitate pinpointing bottlenecks in the study protocol design, e.g. where a disproportionate number of candidates might be accepted or eliminated. The ability to quickly identify why individual patients are eligible enhances the end user’s understanding of the evaluation results.

Fourth, the PASTEL platform relies on heterogeneous data sources for eligibility evaluation, including lab tests, clinical codes, and demographics as well as clinical free-text reports, instead of relying solely on structured data. Despite the importance of textual information [17], one of the few patient screening tools to provide keyword search functionality is the ASAP tool [18]. However, ASAP has a clearly different focus than PASTEL, as it only offers end users a tool to manually search the EHR, whereas PASTEL performs automated evaluation of patient eligibility, complemented with human review of these evaluations. In other words, accurate natural language processing (NLP) of clinical free text is more important in a platform such as PASTEL, as it directly influences the results being reviewed by the end users.

To adopt the PASTEL framework in a different hospital environment, only the concept plugins for the local data sources need to be (re)defined. Existing clinical trial formalizations that are generically defined can be reused up to the cell level. E.g. a cell that defines ‘at least 2 Hb A1c values above 6.5%’ is defined independently of local or hospital-specific infrastructure. The concept mappings themselves are in principle hospital-specific and need to be defined (once) locally, e.g. the mapping of the ‘Hb A1c’ concept to specific lab tests at UZA. However, the PASTEL framework allows concepts to be annotated with additional information (e.g. LOINC codes for lab tests, ATC codes for medication, ...), which could enable fully automated mapping of annotated concepts to local data queries if the local database systems are compliant with these annotation standards.

Studies have shown that e-screening methods improve throughput in initial screening attempts compared to the manual review of patient records [3]. The platform could produce a list of trial candidates much faster and over a much larger patient population. Whereas typically only patients of the department of the study’s Principal Investigator (PI) would be taken into account, now patients from other departments or even from the entire hospital population can be considered.

There are a number of real-world limitations to any

platform for electronic eligibility screening. First, most clinical trials have one or more criteria that refer to consent, willingness, or ability of a candidate to participate in the trial. The required information to answer these criteria is difficult and often impossible to capture electronically prior to the trial. This implies that suitable candidates can be suggested by the PASTEL platform, whereas actual inclusion will still require manual evaluation of consent, willingness or ability-related criteria. Second, the data used in the automated eligibility evaluation might be deemed too old by study sponsors and additional tests might be ordered to ensure candidate patients still conform to the study criteria. We therefore see the PASTEL platform as a tool for assisted patient recruitment rather than automated recruitment. Third, the availability and quality of data sources directly impacts eligibility evaluation results. In our case, for instance, a number of lab tests are not performed in-house but are outsourced to specialized external laboratories. Those lab results are not always stored electronically, making it impossible for our platform to analyze them. Another area where up-to-date information is often missing, is diagnostic codes. Diagnostic codes are provided by the clinical coding department of the hospital and primarily for financial and reporting purposes. A lagging time of several months is not uncommon, reducing their usefulness in eligibility evaluation for ongoing clinical trials. Finally, the level of analysis of clinical free-text reports directly influences the ability of the platform to reliably evaluate patient eligibility. PASTEL currently supports a keyword search on the hospitals EHR system through a Solr platform (cf. Section III-D). 
Expanding keyword search to concept search, where for example keywords linked to hypertension are formalized as instances of cardiovascular disease, is an important step towards more intelligent natural language processing of clinical reports. In addition, accurate analysis of linguistic features such as negation, modality, and hedging (e.g. ‘Not a case of epilepsy’, ‘Patient may be suffering from epilepsy’) is crucial for the correct evaluation of a criterion.

V. CONCLUSION

In this paper, we presented a platform for assisted eligibility evaluation for clinical trials. The PASTEL platform is built on a strong underlying semantic representation framework using Topic Maps that allows for formal representations of study protocols and reuse of eligibility criteria within and across institutions. The integrated approach of developing the ontology definition and the platform implementation together enables fast turnaround times for adding filters and comparison constraints. Additionally, the use of asynchronous message queues enables the platform to scale comfortably to larger evaluation loads. The platform provides a powerful visual analytics environment for analyzing the results of large eligibility evaluations.

ACKNOWLEDGMENT

This research was conducted at and sponsored by Antwerp University Hospital as part of the innovative ICT programme. We thank Dr. Tim Van den Wyngaert, Katrien Lesage, and Paul Vanden Broucke for the helpful interactions.

REFERENCES

[1] J. Sullivan, “Subject recruitment and retention: Barriers to success,” Applied Clinical Trials, pp. 50–54, 2004.
[2] M. Campbell, C. Snowdon, D. Francis, D. Elbourne, A. McDonald, R. Knight, V. Entwistle, J. Garcia, I. Roberts, A. Grant, and A. Grant, “Recruitment to randomised trials: Strategies for trial enrollment and participation study. The STEPS study,” Health Technology Assessment, vol. 11, no. 48, 2007.
[3] S. R. Thadani, C. Weng, J. T. Bigger, J. F. Ennever, and D. Wajngurt, “Electronic screening improves efficiency in clinical trial recruitment,” Journal of the American Medical Informatics Association, vol. 16, no. 6, pp. 869–873, 2009.
[4] E. Fink, P. Kokku, S. Nikiforou, L. Hall, D. Goldgof, and J. Krischer, “Selection of patients for clinical trials: an interactive web-based system,” Artificial Intelligence in Medicine, vol. 31, no. 3, pp. 241–254, 2004.
[5] C. Weng, S. W. Tu, I. Sim, and R. Richesson, “Formal representation of eligibility criteria: a literature review,” Journal of Biomedical Informatics, vol. 43, no. 3, pp. 451–467, Jun. 2010.
[6] R. W. Group, “Resource Description Framework (RDF),” 2004. [Online]. Available: http://www.w3.org/RDF/
[7] D. Damen and T. Van den Bulcke, “Towards a flexible semantic framework for clinical trial eligibility using topic maps,” in Proceedings of the ACM SIGKDD Workshop on Health Informatics, HI-KDD ’12. New York, NY, USA: ACM, 2012.
[8] S. Pepper, “The TAO of Topic Maps - Finding the Way in the Age of Infoglut,” in Proceedings of XML Europe 2000, 2000. [Online]. Available: http://www.ontopia.net/topicmaps/materials/tao.html
[9] L. M. Garshol, “Metadata? Thesauri? Taxonomies? Topic Maps! Making Sense of it all,” Journal of Information Science, vol. 30, no. 4, pp. 378–391, Aug. 2004.
[10] M. Fitting, “Kleene’s three-valued logics and their children,” Fundamenta Informaticae, vol. 20, no. 1, pp. 113–131, Jan. 1994.
[11] “Google Web Toolkit,” https://developers.google.com/webtoolkit/.
[12] “Smart GWT,” http://www.smartclient.com/product/smartgwt.jsp.
[13] “Informatics for Integrating Biology & the Bedside, Partners Healthcare System,” http://www.i2b2.org.
[14] “Apache Solr,” http://lucene.apache.org/solr/. Apache Solr, Solr, Apache Lucene, Lucene and their logos are trademarks of the Apache Software Foundation.
[15] “Pyth v0.5.6 ‘Python text markup and conversion’,” http://pypi.python.org/pypi/pyth.
[16] “Apache Solr Query Syntax,” http://wiki.apache.org/solr/SolrQuerySyntax.
[17] L. Li, H. Chase, C. Patel, C. Friedman, and C. Weng, “Comparing ICD9-encoded diagnoses and NLP-processed discharge summaries for clinical trials pre-screening: A case study,” in Proceedings of the AMIA Annual Symposium, 2008, pp. 404–408.
[18] T. Pressler, P. Yen, J. Ding, J. Liu, P. Embi, and P. Payne, “Computational challenges and human factors influencing the design and use of clinical research participant eligibility prescreening tools,” BMC Medical Informatics and Decision Making, vol. 12, no. 47, 2012.
