Special Issue: Enabling Technologies for Parkinson's Disease Management


IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 19, NO. 6, NOVEMBER 2015


Guest Editorial
Enabling Technologies for Parkinson’s Disease Management

PARKINSON’S disease (PD) is the most common neurological movement disorder, with a prevalence of up to 2% in the elderly. The cardinal motor symptoms bradykinesia, rigidity, tremor, and postural instability define the diagnosis of PD [1], [2]. These motor features make PD an ideal model disease to establish, validate, and clinically apply sensor-based movement diagnostic approaches. The Unified Parkinson’s Disease Rating Scale (UPDRS) is the most widely used clinical score to rate symptoms, in particular, motor symptoms (UPDRS Part III). Measurement of the motor symptoms provides a major treatment target parameter, but the high interrater variability reported for the standardized clinical assessment is a problem [3], which could be reduced with sensor-based measurements. Patients typically live with this disease for decades, and the motor symptoms gradually increase during disease progression, steadily reducing quality of life. In the advanced stages, motor symptoms fluctuate during the course of a day, generating a specific need for home-based monitoring. Clinical assessments throughout the course of the disease consume substantial resources, and frequently repeated assessments are generally impractical; they are performed only as a gold standard for some drug studies and still provide only a snapshot of the patients’ daily life impairments. From a clinical point of view, three different general applications of sensor-based movement diagnostics are feasible and could be useful. First, instrumented versions of the tests that are already used as a part of the standardized motor function analysis performed by the physician could help reduce the interrater variability. Second, patients can be guided by a clinician through a video connection in their own homes. 
It could greatly improve the quality of the assessment to have instrument-defined movements evaluated in this semisupervised (or even unsupervised) environment and permit semicontinuous quantitative assessment of the impairment in a home monitoring approach. Finally, continuous analysis of the motor movements during everyday living could allow an unobtrusive assessment and monitoring under real-life conditions. This objective information on the quality of life and daily functioning could complement the diagnostic workup to greatly enhance the disease management and precisely address the patients’ needs, ultimately also reducing the healthcare costs. Developments in the field of portable sensors are being increasingly introduced into clinical diagnostics and treatment of movement disorders such as PD [4]. Improved sensors and algorithms are not only able to simply measure the distinct movement impairment, but also become available to solve clinical questions and to address the patients’ needs [5]. This progress


Digital Object Identifier 10.1109/JBHI.2015.2488158

and the initial commercialization of such technologies are the driving force to apply modern sensor systems to aid in the diagnosis, tracking, and treatment of diseases. Successful technology applications for this disease model will be generalizable to a variety of other neurological and musculoskeletal conditions. In contrast to activity monitoring technologies in fitness and health, there are additional hurdles to cross and fundamentally different requirements to meet for applications in medicine and disease. First, for sensor-based assessment, it is necessary to identify the specific impaired motor function before properly quantifying its impairment. Normal quantification of the healthy body function cannot be assumed to be applicable, as the biomechanical quality of the distinct motor function that sensors measure is impaired, and the impairment may even be disease specific. It is also important to note that the biomechanical measurements are only a part of the technology contribution. The mathematical modeling that converts the data into useful information about disease status is at least as important as the development of the measurement devices and must also be validated. Second, there are substantial ethical and legal requirements that underlie the medical applications of technical devices. Unlike a fitness monitor, there are specific requirements for adequate validation of the quantification of distinct motor impairments, especially where it will be used to guide medical decisions. A substantial challenge for medical–technical translational research is the combination of technical expertise with medical needs. On the one hand, fascinating technical developments do not necessarily optimally address the needs of the patient or doctor. On the other hand, many clinicians lack an appreciation of what engineering technologies can do to help them and their patients. 
Many articles are appearing that use PD as a disease model for the technology development and applications, but duplicative activity measurement studies submitted to technical journals and research grant study sections suggest a lack of awareness of relevant published work and a poor understanding of the disease research gaps. Therefore, the objective of this special issue is to improve and accelerate the productive research in this area, solving clinical needs with enabling technologies. The studies presented here are a representative sample that defines the edge between technical abilities and clinical applications. They demonstrate the usefulness of PD as a model disease opportunity for “convergence” science that will open the door for important technology applications to many other neurological and musculoskeletal movement disorders, as well as defining and improving the healthy patterns of the daily life performance [6]. The papers and their context are briefly summarized here. Using several body-worn inertial motion sensors within a clinical facility, Parisi et al. showed that several distinct items of the UPDRS could be objectively measured even matching

2168-2194 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


the clinicians’ rating, but with a reduced interrater variability. An aggregated UPDRS score derived from inertial sensor analysis was proposed for remote monitoring of PD patients. Impaired gait is one of the most biomechanically characteristic motor symptoms of PD, and is frequently observed and described by specialized physicians during routine diagnostics at the hospital or outpatient units. Wahid et al. presented changes in spatio-temporal gait parameters, assessed by a video motion analysis system and instrumented force platforms in a small patient cohort by deductive calculation and multiple parameter regression analysis. These findings must still be validated, but the data clearly showed that distinct spatio-temporal gait parameters support the clinical characterization of gait impairment in PD. Demonceau et al. used trunk accelerometer systems to detect similar spatio-temporal gait parameter alterations, and also detected changes in gait regularity and symmetry, adding useful and objective information to the symptoms of shuffling gait, short steps, and gait asymmetry and instability. In an interventional approach, Jellish et al. showed that treadmill-based systems using real-time feedback to improve distinct gait and posture parameters could increase the step length and improve posture in PD patients. Another typical symptom representing neuromuscular dysfunction is the speech impairment in PD, whose clinical presentation differs from other forms of dysphonia. Orozco-Arroyave et al. applied a complex analysis of the voice signals to distinguish PD-specific speech impairment from laryngeal pathologies of other origins. Importantly, they showed in a small cohort of patients that different speech impairments require unique algorithmic analysis to distinguish different entities. Translating technical devices for care outside of medical facilities, in the patients’ home environment, is a significant challenge also addressed in this issue. 
In a longitudinal study, Memedi et al. investigated a telemetry test battery to manage advanced-stage PD, where continuous dopaminergic treatment is achieved by intestinal drug pumps. Constant surveillance of treatment effects and patients’ wellbeing is necessary but challenging in traditional care. This study combined self-rating with instrumented motor tests of upper extremity function; an overall score compared well with distinct motor and nonmotor examinations and, in particular, showed significant changes in patients responding to this therapy. In addition to fine motor skills, tremor can also be a target symptom for instrumented analysis. Kostikis et al. used the inertial sensors from smartphones to detect and classify the tremor in PD patients in a home-based study, also supporting a strategy to keep patients connected with their treating neurologist. Gait impairments are the most frequent and limiting lower extremity dysfunction in PD. In advanced stages of the disease, freezing of gait impairs the patient’s mobility and becomes a major challenge for treatment. Mazilu et al. combined electrocardiography and skin conductance measures in PD patients with an algorithm-based prediction of freezing of gait episodes based on defined motor tests, including walking and turning gait sequences. The generated algorithms predicted freezing episodes; this may enable biofeedback-based preemptive therapy approaches. Killiane et al. applied a combination of a virtual reality scenario with a balance board under dual-task conditions to lessen the freezing episodes in PD patients. They showed that diagnostic devices could also be used as an intervention, extending therapeutic strategies. Three more manuscripts reviewed or demonstrated the use of wearable sensors in everyday life scenarios. Stamford et al. reviewed technology applications and patients’ needs that extend beyond motor symptoms, in particular, considering nonmotor symptoms that are so important in the daily life of PD patients. Pasluosta et al. summarized how the internet of things is moving into the management of PD patients. Future diagnostic and communication strategies between patients, caregivers, and doctors were discussed, along with projections for wearable technologies shifting treatment and care paradigms. Cook et al. applied these concepts, joining information obtained from smart home and wearable sensors in PD patients and healthy older adults to identify the differences in activity patterns using machine learning classification. In summary, this collection of papers illustrates the wide variety of approaches that can be taken when leveraging technology to monitor and treat PD. In the wider context of chronic neurologic and musculoskeletal disorders, the contributions in this special issue give rise to new ideas and introduce new technologies with respect to the application of modern sensor systems in aiding diagnosis, tracking, and treatment of these diseases.

J. KLUCKEN, Guest Editor
Universitätsklinikum Erlangen
Friedrich-Alexander University Erlangen-Nürnberg
Erlangen 91054, Germany
[email protected]

K. E. FRIEDL, Guest Editor
University of California
San Francisco, CA 94143 USA
[email protected]

B. M. ESKOFIER, Guest Editor
Friedrich-Alexander University Erlangen-Nürnberg
Erlangen 91058, Germany
[email protected]

J. M. HAUSDORFF, Guest Editor
Tel Aviv Sourasky Medical Center
Tel Aviv 64239, Israel
[email protected]

REFERENCES
[1] J. Parkinson, An Essay on the Shaking Palsy. London, U.K.: Neely & Jones, 1817.
[2] M. M. Hoehn and M. D. Yahr, “Parkinsonism: Onset, progression and mortality,” Neurology, vol. 17, pp. 427–442, 1967.
[3] M. Richards, K. Marder, L. Cote, and R. Mayeux, “Interrater reliability of the unified Parkinson’s disease rating scale motor examination,” Movement Disorders, vol. 9, pp. 89–91, 1994.
[4] A. Mirelman, N. Giladi, and J. M. Hausdorff, “Body-fixed sensors for Parkinson disease,” J. Amer. Med. Assoc., vol. 314, pp. 873–874, 2015.
[5] J. Klucken, J. Barth, P. Kugler, J. Schlachetzki, T. Henze, F. Marxreiter, et al., “Unbiased and mobile gait analysis detects motor impairment in Parkinson’s disease,” PLoS One, vol. 8, p. e56956, 2013.
[6] P. A. Sharp and R. Langer, “Research agenda: Promoting convergence in biomedical science,” Science, vol. 333, p. 527, 2011.


Body-Sensor-Network-Based Kinematic Characterization and Comparative Outlook of UPDRS Scoring in Leg Agility, Sit-to-Stand, and Gait Tasks in Parkinson’s Disease Federico Parisi, Student Member, IEEE, Gianluigi Ferrari, Senior Member, IEEE, Matteo Giuberti, Member, IEEE, Laura Contin, Veronica Cimolin, Corrado Azzaro, Giovanni Albani, and Alessandro Mauro

Abstract— Recently, we have proposed a body-sensor-network-based approach, composed of a few body-worn wireless inertial nodes, for automatic assignment of Unified Parkinson’s Disease Rating Scale (UPDRS) scores in the following tasks: Leg agility (LA), Sit-to-Stand (S2S), and Gait (G). Unlike our previous works and the majority of the published studies, where UPDRS tasks were the sole focus, in this paper, we carry out a comparative investigation of the LA, S2S, and G tasks. In particular, after providing an accurate description of the features identified for the kinematic characterization of the three tasks, we comment on the correlation between the most relevant kinematic parameters and the UPDRS scoring. We analyzed the performance achieved by the automatic UPDRS scoring system and compared the estimated UPDRS evaluation with the one performed by neurologists, showing that the proposed system compares favorably with typical interrater variability. We then investigated the correlations between the UPDRS scores assigned to the various tasks by both the neurologists and the automatic system. The results, based on a limited number of subjects with Parkinson’s disease (PD) (34 patients, 47 clinical trials), show poor-to-moderate correlations between the UPDRS scores of different tasks, highlighting that the patients’ motor performance may vary significantly from one task to another, since different tasks relate to different aspects of the disease. An aggregate UPDRS score is also considered as a concise parameter, which can provide additional information on the overall level of the motor impairments of a Parkinson’s patient. Finally, we discuss a possible

Manuscript received April 15, 2015; revised July 15, 2015; accepted August 16, 2015. Date of publication August 25, 2015; date of current version November 3, 2015. This work was supported in part by the Italian Ministry of Health (RF2009-1472190). F. Parisi and G. Ferrari are with the CNIT Research Unit of Parma and the Department of Information Engineering, University of Parma, I-43124 Parma, Italy (e-mail: [email protected]; [email protected]). M. Giuberti was with the CNIT Research Unit of Parma and the Department of Information Engineering, University of Parma, I-43124 Parma, Italy. He is now with Xsens Technologies B.V., 7500 AN Enschede, The Netherlands (e-mail: [email protected]). L. Contin is with Research & Prototyping, Telecom Italia, 10148 Turin, Italy (e-mail: [email protected]). V. Cimolin is with the Department of Electronics, Information, and Bioengineering, Politecnico di Milano, 20133 Milano, Italy (e-mail: veronica.cimolin@polimi.it). C. Azzaro and G. Albani are with the Division of Neurology and Neurorehabilitation, Istituto Auxologico Italiano, IRCCS, I-28824 Piancavallo (VB), Italy (e-mail: [email protected]; [email protected]). A. Mauro is with the Division of Neurology and Neurorehabilitation, Istituto Auxologico Italiano, IRCCS, I-28824 Piancavallo (VB), Italy, and also with the Department of Neurosciences, University of Turin, 10125 Turin, Italy (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JBHI.2015.2472640

implementation of a practical e-health application for the remote monitoring of PD patients. Index Terms—Body sensor network (BSN), gait (G) task, inertial measurement unit (IMU), leg agility (LA) task, Parkinson’s disease (PD), sit-to-stand (S2S) task, unified Parkinson’s disease rating scale (UPDRS).

I. INTRODUCTION

PARKINSON’S disease (PD) is a progressive, chronic, neurodegenerative condition that is responsible for a gradual motor impairment. Therapies based on the use of dopaminergic drugs, such as L-dopa, are useful to manage the early PD motor symptoms, but their efficacy worsens over time, causing additional motor complications, such as dyskinesia and motor fluctuations, which can further impair the patients’ quality of life. Accurate and continuous monitoring of the symptoms’ progression and of the treatment effect is required in order to define an effective therapy, but neurologists can often rely only on qualitative and sporadic clinical observations, which may not be representative of the actual disease status. A more objective evaluation can be achieved using semiquantitative rating scales, such as the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) [1]. Although several studies have pointed out a good test–retest reliability for the global UPDRS motor score, the latter has some limits in clinical trials. Some of these limitations are: the need for a trained neurologist to assign the UPDRS score; the interrater/intrarater variability [2]; the discrete nature of the UPDRS (scores from 0 to 4), which is not optimal to detect minimal changes during the disease progression [3]; and the difficulty of conveying a concise score, especially when several movement components (such as speed, amplitude, hesitations, etc.) should be taken into account for the evaluation. In order to improve the UPDRS assessment reliability, raising the detection rate between disease-modifying and symptomatic effects over a specific treatment regime, the sample size and the clinical trial duration should be increased [4]. Obviously, this solution is not always feasible or practical. The clinical judgment of the disease stage in PD patients has benefited from having the patients keep motor diaries while at home. 
However, this tool is often unreliable because of nonoptimal compliance in the patient record keeping, recall bias, or weak self-assessment skill due to cognitive impairment, which is often associated with late stages of PD [5].


The advent of new motion-sensing devices, which allow accurate movement measurement through a kinematic analysis, has enabled the design of “man versus machine” clinical trials, with relevant implications in practical terms. In recent years, the use of integrated computerized systems, such as body sensor networks (BSNs), has proliferated in clinical environments, allowing easier acquisition of objective and quantitative measurements that can be repeated several times on a daily basis at the discretion of both patients and neurologists [6]. Recently, a unified approach to the evaluation of specific UPDRS motor tasks, namely, the leg agility (LA) [7], the sit-to-stand (S2S) [8], and gait (G) tasks [9], [10], has been proposed. The designed system relies on a simple BSN formed by three inertial measurement units (IMUs) (two on the thighs and one on the chest) and aims at characterizing the considered tasks by extracting and analyzing the kinematic features associated with their typical movement patterns, in both time and frequency domains. The automatic data acquisition sessions have been carried out concurrently with clinical evaluations, performed by neurologists with expertise in PD and according to the standards of the MDS. The extracted features and the subjective evaluations of the neurologists were then used to train an automatic UPDRS scoring system, with the aim of automatically assessing the patients’ motor performance while matching the medical evaluation criteria as closely as possible. Unlike our previous studies [7]–[10] and the majority of the existing literature, in which UPDRS tasks are analyzed individually, in this paper, we focus on the comparative evaluation of the LA, S2S, and G tasks. An experimental analysis of the data from 34 PD patients and the UPDRS evaluations of three expert neurologists was carried out. 
The most relevant features for the kinematic characterization of each task were identified, and the motor performance of patients belonging to the different UPDRS classes was analyzed. We also considered the performance achieved by the designed automatic classification system in the three tasks, proposing a comparative outlook with the interneurologist assessment. Then, we investigated the correlation between the UPDRS scores assigned to the tasks by both the neurologists and our automatic system, highlighting that the latter shows a performance which is compliant with typical interneurologist variability, i.e., it behaves correctly. Furthermore, we introduce an aggregate UPDRS score, simply defined as the sum of the scores obtained in each task, as a significant concise metric which can provide additional information to neurologists for deriving insights on the overall level of impairments of patients and on the relative “weight” of each task in the assessment of the severity of the symptoms. Finally, the feasibility of an application for remote rehabilitation and monitoring of PD patients in a telemedicine environment is discussed, and a possible efficient implementation approach is proposed. The paper is structured as follows. In Section II, preliminaries and an overview on related works are given. The hardware configuration and the set of subjects considered in the experiments are presented in Section III. In Section IV, we describe the methods used for the kinematic features extraction through the inertial BSN in each task and for the automatic UPDRS scoring system implementation. The experimental results are shown in Section V and discussed in Section VI, together with a possible application of the proposed system for the management of PD patients in a telemedicine scenario. Conclusions are presented in Section VII.

II. PRELIMINARIES AND RELATED WORK

A. UPDRS Tasks

The guidelines of the MDS for the evaluation of PD motor tasks are described in Part III of the UPDRS document [1]. In this paper, we focus on the items 3.8, 3.9, and 3.10, which correspond to the LA, Arising from Chair,¹ and G tasks. The choice of these particular tasks was influenced by the need to keep the BSN as simple as possible, maximizing at the same time the number of tasks which could be analyzed without changing the sensors’ placement. The selected tasks are particularly suitable for the considered unified analysis and clinically relevant for a comprehensive evaluation of the patients’ symptoms, as they refer to different aspects of PD (for example, LA is related to bradykinesia, while S2S and G are associated with posture/ambulation symptoms). We now briefly summarize the basic characteristics of each of these tasks.

1) LA Task: The LA exercise consists of alternately raising and stomping the feet on the ground, as high and as fast as possible. Ten repetitions per leg must be performed while sitting on the chair in order to test each leg separately (in the following, we will distinguish between right LA (RLA) and left LA (LLA) tasks).² The significant parameters that have to be measured, independently for each leg, are the speed, the regularity, and the amplitude of the movement.

2) S2S Task: In the S2S task, the patient is asked to sit on a straight-backed chair with armrests. The exercise consists of crossing the arms across the chest (in order to avoid their use in the movement) and getting up from the chair. In the case of failure, the patient can retry rising up to two more times. If still unsuccessful, the patient can move forward on the chair to facilitate the movement, or in case of another failure, he/she can use the armrests to stand up.

3) G Task: In the G task, the patient is asked to walk, at his/her preferred speed, away from the examiner for at least 10 m and in a straight line, then to turn around and return to the starting point. The parameters of interest are those strictly related to the gait characteristics, such as the stride/step amplitude and speed, the cadence, the gait cycle time (GCT), parameters related to the turning phase, the variability between left and right steps, and the arm swing. Freezing of gait should be evaluated separately. The assessment of the upper limbs (e.g., the arm swing) will not be considered in this paper, as no sensor is placed on the arms in the designed BSN. An extension of the current approach including sensing devices on the arms represents an interesting research direction.

¹ For consistency with our previous work [8], in the following, we denote the Arising from Chair task as S2S task.
² When not specified, LA refers to the general task, including both RLA and LLA trials.

4) UPDRS Evaluation: As mentioned in Section I, the UPDRS allows assignment of an integer score to a patient’s motor


TABLE I
UPDRS MAPPING FOR THE CONSIDERED TASKS

LA task
UPDRS | Amplitude | Hesitations | Interruptions | Slowing | Freezing
0 | nearly constant | 0 | 0 | no | 0
1 | decrements near the end | ≥1 | 1, 2 | slight | 0
2 | decrements midway | − | 3, 4, 5 | mild | 0
3 | decrements after first tap | − | ≥6 | moderate | ≥1
4 | always minimal or null | − | always | severe | −

S2S task
UPDRS | Failed attempts | Use of armrests | Slowing | Move forward on chair
0 | 0 failed attempts | no | no | no
1 | ≥1 failed attempts | no | yes | yes
2 | 0 failed attempts | yes | − | −
3 | ≥1 failed attempts | yes | − | −
4 | not able to stand up alone

G task
UPDRS | Independent walking | Impairments level
0 | yes | no impairments
1 | yes | minor impairments
2 | yes | substantial impairments
3 | no | assistance device needed for safe walking
4 | no | cannot walk at all or only with another person’s assistance

performance in a specific task, ranging from 0, which means that the patient is able to perform the task normally and with no impairments, to 4, which means that the patient has major difficulties in performing the exercise or is not able to perform it at all. Table I maps the characteristics of each task that the examiner should consider when assessing the patient’s performance to the UPDRS scores.

B. Related Work

The kinematic analysis of specific motor tasks through different motion capture and sensing technologies, such as optoelectronic systems and inertial BSNs, has been widely studied for various clinical applications. With regard to PD, the majority of the existing literature has focused on the quantitative kinematic characterization of single motor tasks and/or on the evaluation of patients’ performance in different PD conditions. In this context, the LA task [7], [11], the S2S task [8], [12], the G task [10], [13], [14], and tremors [15] have been analyzed. To the best of our knowledge, only limited attention has been devoted to investigating the relationship between different UPDRS tasks characterized through motion capture technologies. Stochl et al. [16] investigated the structure of PD symptoms in terms of the motor symptom evaluations defined in UPDRS Part III. Five main latent symptom factors were identified, and the correlations between the UPDRS scores assigned to the various tasks were reported. Similarly, in [17], a statistical analysis of the UPDRS motor scores was performed, using classical evaluation methods by neurologists and considering PD patients in both ON (i.e., the intervals during which the medication is effective) and OFF (i.e., the intervals during which the medication is not effective) conditions, in order to identify latent relationships between UPDRS tasks and combine the tasks in “macrogroups” related to similar PD symptoms. 
Five of these groups, denoted as “factors,” have been identified (namely, gait/posture, tremor, rigidity, left extremities bradykinesia, and right extremities bradykinesia). The correlations 1) between the UPDRS scores assigned by neurologists to the introduced “macro-groups” of tasks and 2) between them and an aggregate

Fig. 1. (a) Inertial BSN designed for the evaluation of the three UPDRS tasks of interest (LA, S2S, G): the subsets of nodes used in each task are marked with different colors. (b) Shimmer device (IMU) and its reference coordinate system.

UPDRS score are also presented, showing that the macrogroups can be assessed separately and provide information about different aspects of the disease. From our point of view, carrying out a comparative analysis of the LA, S2S, and G tasks, using 1) the same BSN for the kinematic characterization of each task and 2) the same approach for the design of an automatic UPDRS scoring system, allows analysis of the correlations between the UPDRS values of single tasks and a total UPDRS score, thus opening an interesting research direction.

III. EXPERIMENTAL SETUP

A. Hardware

The BSN designed for the unified evaluation of the LA, S2S, and G tasks is formed by only three IMUs, one on the chest and one per thigh, attached to the body with Velcro straps, as shown in Fig. 1(a). Each node is a Shimmer device (http://www.shimmersensing.com/, [18]), which is a small (dimensions: 53 mm × 32 mm × 25 mm; weight: 22 g) and low-power wireless IMU, equipped with a triaxial accelerometer, a triaxial gyroscope, and a triaxial magnetometer. A Shimmer node and its reference coordinate system are shown in Fig. 1(b). The sampling rate is set to 102.4 Hz, which is the closest, in the


set of the sampling rates supported by the Shimmer platform, to the sampling frequency (namely, 100 Hz) of the optoelectronic reference system (Vicon, Oxford, U.K.) used for the validation of the inertial signals’ accuracy. The acquired data are streamed wirelessly (via a Bluetooth radio interface) to a personal computer, where signal processing and automatic classification (described in detail in Section IV) are performed. The placement of the sensors was chosen for two main reasons: 1) the need to analyze the three tasks without changing the configuration of the nodes in order to minimize the patients’ stress and simplify the acquisition procedure, allowing sequential execution of the three tasks; and 2) the higher accuracy and reliability of IMUs in measuring inclinations and accelerations, rather than positions or displacements. Indeed, with the current BSN configuration, all the kinematic parameters in the LA and S2S tasks are extracted from inclinations and/or angular velocities measured with the nodes on the thighs and on the chest, respectively; conversely, in the G task, the majority of the features are extracted from acceleration signals directly measured by the sensor placed on the trunk. For validation purposes, the data acquired with the inertial BSN and the extracted kinematic features have been compared with those measured with the Vicon optoelectronic system. In particular, in [7], we first demonstrated the equivalence between heel and thigh kinematics. More precisely, the 3-D orientations of the Shimmer nodes placed on the thighs are estimated, with reference to the Earth frame, through an orientation estimation filter [19]. For each leg, the orientation component in the sagittal plane, corresponding to the inclination θ (dimension: [deg]) of the thigh, is extracted, together with the thigh’s angular velocity ω (dimension: [deg/s]). 
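Because the inertial nodes sample at 102.4 Hz and the optoelectronic reference at 100 Hz, any sample-wise comparison of the two needs a common time base. The following is a minimal sketch of one way to do this, not the authors' implementation: it assumes linear interpolation onto the reference time base, perfectly synchronized recordings, and synthetic sinusoidal inclination signals. The function names `resample_to_reference` and `pearson_correlation` are hypothetical.

```python
import numpy as np

FS_IMU = 102.4  # Hz, Shimmer sampling rate stated in the text
FS_REF = 100.0  # Hz, optoelectronic reference rate stated in the text


def resample_to_reference(signal_imu, fs_imu=FS_IMU, fs_ref=FS_REF):
    """Linearly interpolate an IMU signal onto the reference time base.

    Assumes both recordings start at the same instant (an assumption of
    this sketch; real data would need an explicit synchronization step).
    """
    signal_imu = np.asarray(signal_imu, dtype=float)
    t_imu = np.arange(len(signal_imu)) / fs_imu
    n_ref = int(np.floor(t_imu[-1] * fs_ref)) + 1
    t_ref = np.arange(n_ref) / fs_ref
    return np.interp(t_ref, t_imu, signal_imu)


def pearson_correlation(a, b):
    """Pearson correlation over the overlapping samples of two signals."""
    n = min(len(a), len(b))
    return float(np.corrcoef(a[:n], b[:n])[0, 1])


# Synthetic example: a 1 Hz, 30 deg inclination oscillation seen by both systems.
t_imu = np.arange(1024) / FS_IMU   # 10 s of IMU samples
t_ref = np.arange(1000) / FS_REF   # 10 s of reference samples
theta_imu = 30.0 * np.sin(2.0 * np.pi * 1.0 * t_imu)
theta_ref = 30.0 * np.sin(2.0 * np.pi * 1.0 * t_ref)

theta_resampled = resample_to_reference(theta_imu)
r = pearson_correlation(theta_resampled, theta_ref)
```

With real recordings, the same two helpers would be applied to the measured θ (or ω) and to the corresponding quantity derived from the optical markers; the interpolation scheme and the synchronization strategy are design choices that the text does not specify.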
These signals are then compared with those estimated with the optoelectronic system from the 3-D positions and velocities of reflective markers placed on the subject’s heels. The results presented in [7] show a strong correlation (approximately 0.98) between the heels’ optical data and the thighs’ inertial data, motivating the use of θ and ω for the kinematic characterization of the LA task. The same approach has been applied, in the S2S task, to determine the accuracy, with respect to the Vicon system, of the trunk inclination estimated with a Shimmer node, and similar accuracy results have been obtained. The validation of kinematic parameters for the G task is discussed in [9] and [10]. The average errors are comparable with those obtained in other studies in the literature, such as [13], [20], [21], and are sufficiently low to be considered almost negligible for the purpose of this study. For example, the average errors (mean ± standard deviation) in the estimation of temporal parameters, such as the heel-strike (HS), toe-off (TO), and GCT, are 8.22 ± 17.6, 6.83 ± 26.3, and 8.87 ± 23.7 ms, respectively, whereas for spatial parameters, such as the stride length and the step length, the average errors are 4.23 ± 4.94 and 3.15 ± 7.34 cm, respectively.

B. Subjects and Acquisition Procedure

The subjects in these studies were 34 PD patients, including 22 males and 12 females with average age equal to 67.4 years (ages between 31 and 79 years) and standard deviation equal

to 11.6 years. The average Modified Hoehn and Yahr Scale score for the subjects was 1.6 (standard deviation equal to 0.47, minimum score equal to 1, maximum score equal to 3) on the 1-to-5 scale (higher scores indicate more severe impairments and more advanced stages of the disease). The sensing devices were placed on the patient’s body as shown in Fig. 1(a), trying to align the x-, y-, and z-axes of the node coordinate reference system, shown in Fig. 1(b), with the upward–downward, right–left, and forward–backward directions, respectively. The alignment of the sensors with respect to the anatomical structure is aided by the developed acquisition software. Before the acquisition procedure begins, the sensors’ placement is checked, considering both the gravity direction and the 3-D orientation of the Shimmer nodes in the Earth frame: if the alignment is within a (heuristically defined) confidence range, the examiner is allowed to proceed with the acquisition; otherwise, a warning message is shown and the procedure is stopped until the examiner corrects the sensors’ placement. The data acquisitions were carried out by asking the patient to execute the LA, S2S, and G tasks sequentially. In each task, only the signals recorded by a proper subset of nodes of the BSN were considered; the subsets of Shimmer devices used for the evaluation of the single tasks are shown in Fig. 1(a) using different colors. For the LA task acquisitions, only the two devices placed on the thighs were used, whereas for the S2S task, only the trunk-mounted node was considered. In the G task, all the BSN IMUs were used to achieve a complete characterization of the complex gait movement.
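The gravity-direction part of the placement check described above could look roughly like the following sketch. It is only an illustration under stated assumptions: the function names, the use of a single static accelerometer reading, and the 15-degree threshold are all hypothetical (the paper only says the confidence range is heuristically defined), and the additional check on the 3-D orientation in the Earth frame is not covered.

```python
import math


def tilt_from_vertical_deg(accel_xyz):
    """Angle (degrees) between the node x-axis and the measured gravity line.

    accel_xyz is one static accelerometer reading (x, y, z) in any unit.
    The node x-axis is assumed to be the one aligned upward-downward.
    """
    ax, ay, az = accel_xyz
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("acceleration vector has zero length")
    # abs() keeps the check independent of the sign convention of the x-axis.
    cos_angle = min(1.0, abs(ax) / norm)
    return math.degrees(math.acos(cos_angle))


def placement_ok(accel_xyz, max_tilt_deg=15.0):
    """True if the tilt is within the (hypothetical) confidence range."""
    return tilt_from_vertical_deg(accel_xyz) <= max_tilt_deg
```

In such a scheme, an acquisition program would call `placement_ok` on a reading taken while the patient stands still and show the warning message whenever it returns `False`.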
Although we studied 34 patients, a total of 47 trials per task (94 for the LA task, considering RLA and LLA separately) have been acquired, since some of the patients performed the tasks in distinct PD conditions, i.e., in ON/OFF states or at different times corresponding to different motor fluctuation phases. The motor performance of the same subject in these situations and, consequently, the recorded kinematic patterns and the assessment by the neurologists may vary considerably. To avoid distortion in the results, multiple trials by the same patient are included in our dataset only when substantial variations in motor performance have been observed between them, allowing us to consider each trial as a single sample in the following analysis. We remark that a detailed analysis of PD patients’ motor performance, distinguishing between ON and OFF conditions, represents a relevant extension of our work. All the trials have been assessed independently by three neurologists with expertise in movement disorders, using a noninteger scale with intermediate scores (·.5) to label the trials in which the neurologists were undecided between consecutive (integer) UPDRS classes. The consensus score, denoted as UPDRS_Mean, is defined as the arithmetic average of the scores assigned by the three neurologists to each trial, rounded to the nearest (integer or intermediate) UPDRS value. This methodology has already been used in the literature to combine the assessments of multiple neurologists into a single concise score [22], enhancing the robustness of the evaluation and reducing the distortion caused by the interrater variability [2]. In Fig. 2, the distributions of UPDRS_Mean and of the UPDRS scores assigned by the three neurologists to the 47 trials are shown. It can be observed that there are slight

PARISI et al.: BODY-SENSOR-NETWORK-BASED KINEMATIC CHARACTERIZATION AND COMPARATIVE OUTLOOK OF UPDRS SCORING

Fig. 2. Trials distribution for all the considered tasks.

differences among neurologists, in both the assessment criteria (for example, neurologist 1 uses the intermediate scores more often than the other two, whereas neurologist 2 tends to assign higher scores than the other two in the LA task) and the distribution of the UPDRS scores in the three tasks. In particular, the latter appears to be a Gaussian-like distribution centered on a dominant UPDRS class, which depends on both the neurologist and the task. Finally, the distribution of the UPDRS_Mean score is, as expected, smoother than those of the single neurologists’ UPDRS scores, and the scores in the LA, S2S, and G tasks show Gaussian-like distributions centered at 1.5, 0.5, and 1, respectively. We note that in an “ideal” scenario, all UPDRS classes should have a similar number of patients, i.e., the UPDRS distribution should be almost uniform.

IV. METHODS

A. Kinematic Features Extraction

The unified approach proposed in this study, which aims to automatically assign UPDRS scores to the LA, S2S, and G tasks through a single BSN, relies on the same data processing and automatic classification methods presented in the single-task analyses proposed in [7]–[10]. In the following, a concise description of the feature extraction procedure is provided. For ease of visualization, a summary of the most relevant features identified for each task is provided in Table II. The names of the parameters considered in the experimental results are highlighted in bold.

1) LA Task: As anticipated in Section III-A, both the inclination (θ, dimension: [deg]) and the angular velocity (ω, dimension: [deg/s]) of the thighs in the sagittal plane, measured by the Shimmer nodes on the thighs, are considered for the kinematic analysis of the LA task. An illustrative portion of the inclination signal θ of one thigh, for two consecutive LA repetitions, denoted as the rth and (r + 1)th (r ∈ {1, 2, ..., 9}), is shown in Fig. 3(a).
Through an automatic segmentation procedure, three fundamental time epochs, denoted as t_S(r), t_E(r), and t_P(r) and associated, respectively, with the start, the end, and the epoch of maximal thigh inclination of the rth LA repetition, are identified. Starting from these epochs, various parameters can


be straightforwardly calculated. The results obtained in [7] have shown that, in the time domain, the most relevant features for the kinematic characterization of the LA task are: the angular amplitude Θ (dimension: [deg]), the angular speed of execution Ω (dimension: [deg/s]), the pause between consecutive executions P (dimension: [s]), the regularity of execution R (dimension: [s]), and the repetition frequency F (dimension: [Hz]). In the frequency domain, upon the computation of the amplitude spectra of the discrete Fourier transforms of θ and ω, denoted as X_θ and X_ω, respectively, the powers of the inclination spectrum (P_Xθ) and of the angular velocity spectrum (P_Xω) are shown to be relevant features. For the rth LA repetition (r ∈ {1, 2, ..., 10}), the expressions of the most relevant features outlined above are shown in the first part of Table II. When not specified, in the following, we refer to the average values of the features Θ, Ω, P, and R, obtained by averaging the per-repetition values over all consecutive repetitions and denoted, respectively, as Θ_mean, Ω_mean, P_mean, and R_mean.

2) S2S Task: The S2S task is the simplest one among those considered in this paper. For this reason, the body movements during the execution of the task can be characterized using only the inclination signal estimated through the chest-mounted sensor. As for the thighs’ nodes in the LA task, the 3-D orientation of the Shimmer device placed on the trunk is estimated and the inclination of the torso, denoted as θ (dimension: [deg]), is measured [8]. The typical shape of θ during the S2S task is shown in Fig. 3(b).
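As a rough illustration of the time-domain LA features listed earlier in this section (Θ, Ω, P, R, and F, with the definitions summarized in Table II), the sketch below computes their trial-level values from already segmented epochs. Several points are assumptions made only for this example: the function name and argument layout are hypothetical, the ascending and descending amplitudes are taken as θ(t_P) − θ(t_S) and θ(t_P) − θ(t_E), the repetition duration is taken as t_E − t_S, and the repetition frequency is computed as the number of repetitions divided by the total execution time.

```python
def la_features(t_s, t_p, t_e, theta_s, theta_p, theta_e):
    """Trial-level leg-agility features from per-repetition epochs.

    All arguments are equal-length lists with one entry per repetition:
    start, peak and end times [s], and thigh inclination at those times [deg].
    """
    n = len(t_s)
    if n < 2:
        raise ValueError("at least two repetitions are needed")
    amplitude, speed = [], []
    for r in range(n):
        ascent = theta_p[r] - theta_s[r]    # assumed ascending amplitude
        descent = theta_p[r] - theta_e[r]   # assumed descending amplitude
        duration = t_e[r] - t_s[r]          # assumed repetition duration
        amplitude.append((ascent + descent) / 2.0)
        speed.append((ascent + descent) / duration)
    # Pause and regularity are defined between consecutive repetitions.
    pause = [t_s[r + 1] - t_e[r] for r in range(n - 1)]
    regularity = [t_p[r + 1] - t_p[r] for r in range(n - 1)]
    frequency = n / (t_e[-1] - t_s[0])

    def mean(values):
        return sum(values) / len(values)

    return {
        "theta_mean": mean(amplitude),   # [deg]
        "omega_mean": mean(speed),       # [deg/s]
        "p_mean": mean(pause),           # [s]
        "r_mean": mean(regularity),      # [s]
        "f": frequency,                  # [Hz]
    }
```

The spectral features (the powers of X_θ and X_ω) are not covered by this sketch.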
The following relevant time labels can be identified with a simple automatic segmentation procedure: 1) the starting epoch t_S of the S2S (i.e., when the chest starts bending forward); 2) the epoch of maximal bending of the chest t_P (placed around the middle of the S2S exercise); and 3) the ending epoch t_E of the S2S (i.e., when the chest returns to the vertical position). Once these time instants have been identified, the 12 features shown in the central part of Table II can be calculated. In particular, the features refer to the duration T (dimension: [s]), the angular amplitude Θ (dimension: [deg]), and the speed of execution Ω (dimension: [deg/s]) of the S2S exercise. Moreover, the difference D between the forward and the backward bending phases is computed for all the considered variables. In [8], the subset of the 12 extracted features that turned out to be the most significant for the characterization of the S2S task includes T, T_B, T_F, D_T, Θ, and Ω.

3) G Task: The movements involved in the G task are inherently more complex than those associated with the LA and S2S tasks and, for this reason, the characterization of gait through kinematic features is more challenging. In [9] and [10], an in-depth kinematic analysis of Parkinsonian gait is performed, considering features in both the time and frequency domains. A novel approach for the segmentation of gait cycle phases, through proper processing of the accelerometric signal of the chest-mounted inertial node, is presented. Considering the typical patterns in trunk accelerations shown in Fig. 3(c), the fundamental events, which identify a complete gait cycle and all the associated gait phases, namely, the HS (i.e., the instant at which the foot


TABLE II
SUMMARY OF THE MOST RELEVANT FEATURES CONSIDERED FOR EACH TASK

Task | Name | Definition | Dimension | r-value
LA | Angular amplitude | $\Theta(r) \triangleq \frac{\Theta_A(r) + \Theta_D(r)}{2}$ | [deg] | n.s.
LA | Angular speed of execution | $\Omega(r) \triangleq \frac{\Theta_A(r) + \Theta_D(r)}{T(r)}$ | [deg/s] | −0.50
LA | Pause of execution | $P(r) \triangleq t_S(r+1) - t_E(r)$ | [s] | 0.27
LA | Regularity of execution | $R(r) \triangleq t_P(r+1) - t_P(r)$ | [s] | 0.49
LA | Repetition frequency | $F \triangleq \frac{10}{t_E(10) - t_S(1)}$ | [Hz] | −0.36
LA | Thigh inclination spectrum power | $P_{X_\theta} \triangleq \frac{1}{N} \sum_{h=0}^{N-1} (X_{\theta,h})^2$ | adimensional | −0.46
LA | Thigh angular velocity spectrum power | $P_{X_\omega} \triangleq \frac{1}{N} \sum_{h=0}^{N-1} (X_{\omega,h})^2$ | adimensional | −0.34
S2S | Forwards bending duration | $T_F \triangleq t_P - t_S$ | [s] | 0.58
S2S | Backwards bending duration | $T_B \triangleq t_E - t_P$ | [s] | 0.56
S2S | Total duration | $T \triangleq T_F + T_B = t_E - t_S$ | [s] | 0.62
S2S | Forwards/backwards duration difference | $D_T \triangleq T_F - T_B$ | [s] | 0.47
S2S | Forwards bending amplitude | $\Theta_F \triangleq \theta(t_P) - \theta(t_S)$ | [deg] | 0.35
S2S | Backwards bending amplitude | $\Theta_B \triangleq \theta(t_P) - \theta(t_E)$ | [deg] | 0.25
S2S | Average bending amplitude | $\Theta \triangleq \frac{\Theta_F + \Theta_B}{2}$ | [deg] | 0.33
S2S | Forwards/backwards bending amplitude difference | $D_\Theta \triangleq \Theta_F - \Theta_B$ | [deg] | 0.20
S2S | Forwards bending speed | $\Omega_F \triangleq \frac{\Theta_F}{T_F}$ | [deg/s] | −0.33
S2S | Backwards bending speed | $\Omega_B \triangleq \frac{\Theta_B}{T_B}$ | [deg/s] | −0.21
S2S | Average bending speed | $\Omega \triangleq \frac{\Theta_F + \Theta_B}{T} = \frac{\Omega_F T_F + \Omega_B T_B}{T}$ | [deg/s] | −0.32
S2S | Forwards/backwards bending speed difference | $D_\Omega \triangleq \Omega_F - \Omega_B$ | [deg/s] | −0.14
Gait | Gait Cycle Time | $GCT_{R/L}(k) = HS_{R/L}(k+1) - HS_{R/L}(k)$ | [s] | 0.30
Gait | Stance Time | $ST_{R/L}(k) \triangleq 100 \times \frac{TO_{R/L}(k) - HS_{R/L}(k)}{GCT_{R/L}(k)}$ | % of GCT | n.s.
Gait | Initial Double Support | $IDS(k) \triangleq 100 \times \frac{TO_L(k) - HS_R(k)}{GCT(k)}$ | % of GCT | 0.29
Gait | Terminal Double Support | $TDS(k) \triangleq 100 \times \frac{TO_R(k) - HS_L(k)}{GCT(k)}$ | % of GCT | n.s.
Gait | Double Support | $DS(k) \triangleq IDS(k) + TDS(k)$ | % of GCT | n.s.
Gait | Limp | $Limp(k) \triangleq |IDS(k) - TDS(k)|$ | % of GCT | 0.30
Gait | Step Length | $StepL_{R/L}(k) \triangleq K \cdot 2\sqrt{2 \ell h_{R/L}(k) - h_{R/L}(k)^2}$ | % of height | −0.59 (mean)
Gait | Stride Length | $SL(k) \triangleq StepL_R(k) + StepL_L(k)$ | % of height | −0.60
Gait | Step Velocity | $StepV_R(k) \triangleq \frac{StepL_R(k)}{HS_L(k) - HS_R(k)}$ | % of height/s | −0.59
Gait | Thigh Range of Rotation | $ThighRoR_R(k) \triangleq \max_i \theta(i) - \min_i \theta(i), \; i \in GCT_R(k)$ | [deg] | −0.49
Gait | Cadence | $C \triangleq 60 f_d$ | [steps/min] | n.s.
Gait | Step Regularity | $R_{step} \triangleq A_{unbiased}(d_1)$ | adimensional | −0.58
Gait | Symmetry | $S \triangleq \frac{R_{step}}{R_{stride}}$ | adimensional | n.s.
Gait | Spectrum power for accelerations | $P_{a_{vert/y/z}} \triangleq \frac{1}{N} \sum_{k=0}^{N-1} (X_{a_{vert/y/z},k})^2$ | adimensional | −0.38 (mean)
Gait | Total spectrum power | $P_{sum} \triangleq P_{a_{vert}} + P_{a_y} + P_{a_z}$ | adimensional | −0.43

The names of the parameters taken into account for the experimental results are marked in bold. In the last column, the correlation coefficients between the features and the neurologist-assigned UPDRS scores are shown (the best for each task is highlighted in bold). Correlation coefficients (r -values) with associated p-values (not shown here) greater than 0.05 are considered as nonsignificant (n.s.).

touches the ground) and the TO (i.e., the instant at which the foot leaves the ground), have been identified. Once all the HSs and the TOs for both legs are known, temporal parameters, such as gait cycle time (GCT, dimension: [s]), Stance Time (ST, dimension: [% of GCT]), double support time (DS, dimension: [% of GCT]), and Limp (dimension: [% of GCT]), can be calculated following the approaches of classical gait analysis [13], [23]. Spatial parameters, such as Stride Length/Velocity (SL/SV, dimension: [% of patient’s height]/[% of patient’s height/s]) and Step Length/Velocity (StepL/StepV, dimension: [% of patient’s height]/[% of patient’s height/s]), are estimated by modeling gait as an inverted pendulum and using the vertical displacement (h) of the trunk and the leg length (ℓ) to compute the forward distance (D) traveled at each step [20]. Important information about the mobility of the lower limbs is extracted using the gyroscopes of the Shimmer devices placed on the thighs. By integrating the angular rate measured by the gyroscopes, the instantaneous inclination of the thighs (θ) can be estimated. The angular amplitude

of a thigh’s flexion/extension movement, denoted as thigh range of rotation (ThighRoR, dimension: [deg]), is then computed considering the maximum and the minimum inclination values in each gait cycle. Moreover, we calculate and analyze the unbiased autocorrelation (A_unbiased) of the acceleration signals recorded with the trunk-mounted IMU to obtain, in a simple way, additional information about the regularity and periodicity of patients’ walking patterns, such as the Step Regularity (R_step, adimensional) and the step/stride Symmetry (S, adimensional) [24]. Finally, similarly to the LA case, in the frequency domain we compute the spectra of the three components of the trunk acceleration, denoted, respectively, as X_a_vert, X_a_z, and X_a_y. The power associated with each spectrum is then calculated and their sum, denoted as P_sum (adimensional), is a relevant kinematic feature. For ease of visualization and according to the results obtained in [10], in the following, only a reduced set of features is considered. In particular, the parameters with right and left


Fig. 3. Typical patterns in (a) the inclination signal θ of one thigh during the LA task (for two consecutive repetitions), (b) the torso inclination signal θ during the S2S task, and (c) the trunk acceleration signals during the G task. In (a) and (b), the fundamental time events are denoted with red crosses and intuitive representations of the most relevant features are also shown. In (c), the circles in the linear vertical acceleration a_vert identify the search region in the frontal acceleration a_z, within which the HS and TO events, denoted, respectively, with triangles and asterisks, are identified. In the mediolateral acceleration a_y, HS points are connected with consecutive TO points by a line whose slope makes it possible to discriminate left leg features (blue line, positive slope) from right leg features (red line, negative slope).

components are replaced with the arithmetic average of the two values. The reduced set of features for the G task is the following: {GCT_mean, ST_mean, DS, Limp, SL, StepV_mean, C, R_step, S, ThighRoR_mean, P_sum}.

B. Automatic Classification Approach

As mentioned in Section I, a key tool behind this study is the design and implementation of a single system for the automatic UPDRS scoring of the LA, S2S, and G tasks, based on the assessment of the relevant kinematic features outlined in Section IV-A. In our previous works [7], [8], [10], although each task is analyzed individually, the same approach for data processing, automatic classification, and performance analysis is used. In the following, a brief description of the methods used is presented.

1) Principal Component Analysis: In order to reduce the features’ dimensionality and redundancy, while retaining most of the information content of the original data, principal component analysis (PCA) is applied to the collections of kinematic features defined for each task. Before applying PCA, the original data are first centered at their means (so that each feature has zero mean) and rescaled to have unit standard deviation. For the automatic classification procedure presented in the following, both original and “PCA-projected” data will be considered as input.

2) Classification Algorithms and Performance Analysis: The automatic UPDRS scoring system relies on the use of the consensus UPDRS score (UPDRS_Mean), i.e., on the arithmetic average of the UPDRS scores assigned by the three neurologists to each trial. Three well-known classifiers have been considered: nearest centroid classifier (NCC), k nearest neighbors (kNN), and support vector machine (SVM) [25]. The chosen classification algorithms have different characteristics in terms of complexity and effectiveness and, thus, represent a good starting point to evaluate the feasibility of the proposed system. In order

to avoid biasing the classification performance and considering the amount of data available, the leave-one-out cross-validation method is chosen. In particular, each point of both the original and “PCA-projected” datasets is used, in turn, as a new (unknown) point to be classified, while the remaining samples are used to train the classifiers. The result of the classification procedure is an estimated UPDRS value, generally denoted as $\hat{u}_M$, for every trial of each task. The system performance is evaluated considering the absolute³ classification error $e_M$, defined as follows:

$$e_M \triangleq |\hat{u}_M - u_M|$$

where $u_M$ is the value of UPDRS_Mean for the considered trial ($u_M, \hat{u}_M \in \{0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4\}$). In [7], [8], and [10], the automatic UPDRS scoring systems designed for the three tasks of interest have been exhaustively tested in order to determine the configurations, in terms of feature combinations and system parameters, which achieve the best classification performance. For each task, the cumulative distribution function (CDF) of the error $e_M$ is computed considering the results of the classification procedure obtained using, as inputs for the three classifiers (NCC, kNN, and SVM): 1) all the possible combinations of kinematic features with original data; and 2) an increasing number of principal components (up to the number of features in the original dataset) when PCA-projected data are considered. Furthermore, when kNN is used as the classification algorithm, values of k between 1 and 10 are used. The area under the curve (AuC) of the CDF of $e_M$ is selected as a representative performance optimization metric, since maximizing this value corresponds to minimizing

³ The absolute value of the classification error is considered because we are interested in quantifying the absolute deviation between automatically estimated and neurologist-assigned UPDRS scores.


Fig. 4. Radar plots of the average normalized features grouped by UPDRS class in the LA, S2S, and G tasks considering the UPDRS scores by neurologist 1 [cases (LA-1), (S2S-1), (G-1)], neurologist 2 [cases (LA-2), (S2S-2), (G-2)], neurologist 3 [cases (LA-3), (S2S-3), (G-3)], and the UPDRS_Mean [cases (LA-M), (S2S-M), (G-M)].

the overall absolute classification error. Among all the CDFs obtained for all the considered parameter combinations, those which maximize the AuC are selected to determine the system configuration that achieves the best classification performance.

V. RESULTS

A. Kinematic Characterization

As already discussed in [7], [8], and [10], in each task, a strong relationship can be identified between some of the extracted kinematic features and the UPDRS score assigned by the neurologists. In Fig. 4, the average values, over all trials belonging to each UPDRS class (from 0 to 4), of the most relevant features for the LA, S2S, and G tasks are shown, through radar plots, considering the UPDRS scores given by the three neurologists

and the UPDRS_Mean. Each parameter has been normalized and rescaled to assume a value between 0 and 1, where 0 represents the worst case and 1 the best case (best and worst are feature-specific). As expected, it can be observed that, beyond the interneurologist variability, in each task, the overall motor performance of the patients, in terms of its associated kinematic parameters, tends to worsen for increasing UPDRS scores. Indeed, the values of the parameters corresponding to UPDRS class 0 achieve the best performance in almost all the considered features and in every task, whereas the values belonging to higher UPDRS classes (from 2.5 to 3.5) tend to lie in the region of the plot near the origin, which is thus associated with the worst performance. In order to show how strongly every single feature is related to the UPDRS score, we compute the Pearson’s correlation coefficient (denoted also as r-value)


between the kinematic parameters and the UPDRS_Mean values of each task. In the last column of Table II, the r-values for all the most relevant features are shown, highlighting in bold those which have the highest correlation with the UPDRS_Mean score of the considered task. We note that correlations with a corresponding p-value⁴ higher than 0.05 are considered nonsignificant (n.s.). Furthermore, the sign of the correlation coefficient makes it possible to discern whether the value of the considered feature increases (positive sign) or decreases (negative sign) for increasing UPDRS score. For the LA task, the average values per UPDRS class are obtained considering both RLA and LLA trials. Looking at Fig. 4 (LA-1), (LA-2), (LA-3), and (LA-M), it can be observed that the decreasing trend for increasing UPDRS score is evident in all the considered features. Values belonging to the extreme UPDRS classes (e.g., UPDRS scores equal to 0 or 3/3.5) are clearly separated from the others, while values associated with the intermediate UPDRS classes (e.g., UPDRS scores equal to 1, 1.5, or 2) tend to overlap, in some cases, in the same region of the plot. This behavior is consistent with clinical evaluations of PD patients, in which distinguishing between intermediate levels of impairment may be difficult. Considering the correlations between LA features and UPDRS score, the parameter with the highest (absolute) r-value is Ω_mean. Similarly to the LA case, in the S2S task, the performance degradation for increasing UPDRS score is evident, as shown in Fig. 4 (S2S-1), (S2S-2), (S2S-3), and (S2S-M). The parameters for which this trend is clearest are those related to the duration of the single S2S rising, namely, T, T_F, and T_B. As expected, these parameters achieve the highest correlation values with respect to the UPDRS score and are those that best characterize the S2S task.
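The r-values discussed above are ordinary Pearson correlation coefficients between a feature and the consensus score over the trials of a task. A minimal, dependency-free sketch of that computation is given below; the function name is hypothetical and the code is not taken from the authors’ software. It returns only r. The associated p-value requires the Student t distribution with n − 2 degrees of freedom and is left out here; SciPy’s `scipy.stats.pearsonr`, for instance, returns both quantities.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A negative result means that the feature decreases as the UPDRS score increases, which is the case reported in Table II for most speed and amplitude related features.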
In the G task, the trends of the features are more difficult to interpret due to the higher complexity of the body movements involved in this exercise. Considering the temporal parameters, such as GCT, ST, DS, and Limp, the performance decreases for increasing UPDRS score. Indeed, patients with gait impairments tend to walk more slowly than normally walking subjects, increasing the duration of GCT and the time spent in the DS phase. This normally corresponds to lower values of C, since the number of steps that a patient can perform in a minute decreases. Nevertheless, in Fig. 4 (G-1), (G-2), (G-3), and (G-M), it can be observed that the “best” performance in some of the temporal features is achieved by the average values associated with UPDRS class 3. This is due to the fact that PD patients who present festinating gait, i.e., a gait pattern alteration typical of Parkinsonians, characterized by a quickening and shortening of normal strides, perform short steps with a very high cadence, thus leading to values of the temporal parameters and cadence that may be interpreted as “good” even for high UPDRS scores. Spatial parameters, flexion/extension excursions of the thighs, and step regularity show similar behaviors, with a clear decreasing trend for increasing UPDRS values. This result is consistent

⁴ We recall that the p-value represents the probability of observing a correlation at least as strong as the one measured in the sample data if it were due only to random sampling error and not to a true relationship in the population [26].


TABLE III
COMBINATION OF PARAMETERS CORRESPONDING TO THE BEST PERFORMANCE OF THE AUTOMATIC UPDRS SCORING SYSTEM FOR EACH TASK

Task | Classifier | Set of features
LA | kNN (k = 3) | {Ω_mean, F}
S2S | kNN (k = 3) | T
G | kNN (k = 6) | {DS, R_step, ThighRoR_mean}

with clinical observations of Parkinsonian walking, in which patients with increasing gait impairments perform shorter steps, with reduced velocity and regularity, revealing, in general, a more limited movement range in the lower limbs [27]. This observation is also confirmed by the good correlation values (absolute values approximately equal to 0.6) between kinematic parameters, such as SL, StepV_mean, R_step, and ThighRoR_mean, and the UPDRS_Mean score of the G task. The S parameter maintains a low variability across all UPDRS values, except for the subjects with UPDRS score equal to 3: this seems to be related more to the walking characteristics of single subjects than to the entire scoring cluster. Finally, the feature P_sum, which is representative of the overall “power” associated with the patient’s movements during gait, clearly decreases monotonically from UPDRS 0 to UPDRS 3, showing also a moderate correlation (r-value equal to −0.43) with the UPDRS score.

B. Automatic Detection

As anticipated in Section IV-B2, an exhaustive testing of the automatic UPDRS scoring system for each task has been carried out in previous papers [7], [8], [10] in order to find the parametric configuration which achieves, for each task, the best classification performance, i.e., which maximizes the AuC of the CDF of the error e_M. In Table III, the optimal system configuration, including the best classification algorithm and the best combination of features, is shown for each task. The observed results show that in all tasks the best classifier is kNN, with k set to 3 (LA and S2S) or 6 (G). Furthermore, it can be observed that the number of features associated with the best performance increases for increasing complexity of the task movement patterns. In Fig. 5(a), the CDFs corresponding to the optimized parametric configurations of the automatic UPDRS scoring system in each task are shown.
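To make the evaluation pipeline concrete, the sketch below strings together the steps named above: kNN classification with leave-one-out cross-validation, the absolute error e_M, its CDF on the 0.5-spaced UPDRS grid, and the area under that CDF. It is a simplified illustration, not the authors’ implementation. The function names, the Euclidean distance, the tie-breaking rule (lowest score among the most-voted ones), and the trapezoidal rule for the AuC are all assumptions, and feature standardization and PCA are omitted.

```python
import math
from collections import Counter

UPDRS_GRID = (0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4)


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    top = max(votes.values())
    # Assumed tie-break: the lowest score among the most-voted ones.
    return min(label for label, count in votes.items() if count == top)


def loo_abs_errors(x, y, k):
    """Leave-one-out absolute errors |estimated score - consensus score|."""
    errors = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        errors.append(abs(knn_predict(train_x, train_y, x[i], k) - y[i]))
    return errors


def error_cdf(errors, grid=UPDRS_GRID):
    """Fraction of trials whose error does not exceed each grid value."""
    n = len(errors)
    return [sum(e <= g for e in errors) / n for g in grid]


def cdf_auc(cdf, step=0.5):
    """Area under the CDF using the trapezoidal rule (an assumption)."""
    return sum((a + b) / 2.0 * step for a, b in zip(cdf, cdf[1:]))
```

With feature vectors given as a list of tuples `x` and consensus scores as a list `y`, `error_cdf(loo_abs_errors(x, y, k=3))[0]` corresponds to the accuracy (the CDF value at e = 0), and repeating the computation over feature subsets and values of k while keeping the configuration with the largest `cdf_auc` mirrors the selection procedure described in the text.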
The accuracy of the automatic system (corresponding to the CDFs’ values at e = 0) ranges from approximately 43% in the LA and S2S tasks to 62% in the G task. The percentage of trials classified with e ≤ 0.5 is at least 81% in all the considered tasks (LA: 83%; S2S: 81%; and G: 94%), while at least 94% of the samples are classified with e ≤ 1 (LA: 97%; S2S: 94%; and G: 98%). Moreover, it can be observed that the classification error is never greater than 1.5. To characterize the classification performance in more detail, in Tables IV–VI, the confusion matrices associated with the automatic scoring procedures of the LA, S2S, and G tasks are shown, respectively. It is easy to observe that the automatic system tends, in all cases, to bias the estimated


Fig. 5. CDFs of (a) the automatic classification error e_M for the best performance achieved in each task and (b) the (absolute) difference d in the UPDRS scoring between neurologists 1 and 2.

TABLE IV
CONFUSION MATRIX FOR THE LA TASK AUTOMATIC UPDRS SCORING PROCEDURE

[%] | 0 | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 | 3.5
0 | 0 | 50 | 50 | 0 | 0 | 0 | 0 | 0
0.5 | 8 | 54 | 23 | 15 | 0 | 0 | 0 | 0
1 | 0 | 23 | 18 | 53 | 6 | 0 | 0 | 0
1.5 | 0 | 5 | 16 | 65 | 11 | 3 | 0 | 0
2 | 0 | 17 | 0 | 58 | 25 | 0 | 0 | 0
2.5 | 0 | 0 | 0 | 50 | 10 | 40 | 0 | 0
3 | 0 | 0 | 0 | 50 | 0 | 50 | 0 | 0
3.5 | 0 | 0 | 0 | 0 | 0 | 100 | 0 | 0

TABLE V
CONFUSION MATRIX FOR THE S2S TASK AUTOMATIC UPDRS SCORING PROCEDURE

[%] | 0 | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 | 3.5
0 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0
0.5 | 47 | 31 | 1 | 1 | 0 | 0 | 0 | 0
1 | 40 | 40 | 20 | 0 | 0 | 0 | 0 | 0
1.5 | 34 | 0 | 22 | 22 | 22 | 0 | 0 | 0
2 | 34 | 0 | 0 | 66 | 0 | 0 | 0 | 0
2.5 |  |  |  |  | 100 | 0 | 0 | 0
3 | - | - | - | - | - | - | - | -
3.5 | - | - | - | - | - | - | - | -

TABLE VI
CONFUSION MATRIX FOR THE G TASK AUTOMATIC UPDRS SCORING PROCEDURE

[%] | 0 | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 | 3.5
0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 | 0
0.5 | 0 | 45 | 55 | 0 | 0 | 0 | 0 | 0
1 | 0 | 0 | 78 | 22 | 0 | 0 | 0 | 0
1.5 | 0 | 6 | 25 | 69 | 0 | 0 | 0 | 0
2 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0
2.5 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0
3 | - | - | - | - | - | - | - | -
3.5 | - | - | - | - | - | - | - | -

UPDRS values around a dominant class, in accordance with the Gaussian-like distribution of the UPDRS_Mean score observed in Section III-B. The UPDRS classes with a small number of samples are almost “ignored” by the classifier and the samples associated with them are labeled with a UPDRS value nearer to the dominant class, thus determining a general underestimation of the actual UPDRS score. This behavior is a clear consequence of the non-homogeneity of the UPDRS evaluations of the various neurologists. From the presented confusion matrices, other per-class performance indexes, such as the precision, the sensitivity, and the specificity, can be calculated; they are not shown here due to lack of space. We only remark that the average values (across all the UPDRS classes) of these indexes are equal to (precision, sensitivity, specificity) 34.55%, 25.17%, and 84.52% in the LA task, 28.00%, 25.63%, and 77.51% in the S2S task, and 66.48%, 31.83%, and 88.03% in the G task. Finally, in order to evaluate the operational correctness of our automatic scoring system, we compare the achieved performance with the inter-rater variability of the neurologists in the UPDRS task assessment. In Fig. 5(b), the CDFs of the (absolute) difference d between the UPDRS scores assigned by neurologists 1 and 2 in the three tasks are shown. The agreement in the evaluations ranges from 33% in the G task to 52% in the S2S task, whereas the difference in the UPDRS scores between the two neurologists is lower than or equal to 1 in approximately 90% of the trials (100% for the G task). Similar results have been obtained from the comparison of the evaluations by neurologists 1 and 3 and by neurologists 2 and 3; the corresponding CDFs are not shown here for lack of space. Comparing Fig. 5(a) and (b), it is possible to observe very similar trends in the CDFs of the estimation error e_M and of the difference d between the neurologists’ evaluations.
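The per-class indexes mentioned above follow from a confusion matrix in the usual one-versus-rest way. The sketch below shows one way to compute them from a matrix of trial counts; the function name and the convention that rows hold the neurologist-assigned class and columns the estimated class are assumptions. An index is returned as None when its denominator is zero (for example, the precision of a class that is never predicted); how such cases were treated when averaging in the paper is not stated.

```python
def per_class_metrics(cm):
    """Per-class (precision, sensitivity, specificity) from a count matrix.

    cm[i][j] is the number of trials of true class i estimated as class j
    (assumed layout). Undefined ratios are returned as None.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c, estimated otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # other classes, estimated c
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else None
        sensitivity = tp / (tp + fn) if tp + fn else None
        specificity = tn / (tn + fp) if tn + fp else None
        metrics.append((precision, sensitivity, specificity))
    return metrics
```

Note that Tables IV–VI report percentages rather than counts, so the per-class trial counts would be needed to reproduce the averaged values quoted in the text.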
It can thus be concluded that the variability in the UPDRS scoring between the automatic system and the neurologists seems to be comparable with the inter-rater variability between clinicians. In other words, the proposed automatic classification system is “accurate” enough to mimic the evaluation performance of medical personnel. Obviously, a larger set of patients, a more uniform distribution of the patients in all the UPDRS classes, and additional evaluations by more neurologists would make the proposed system performance analysis more meaningful from a statistical perspective. However, this goes beyond the scope of the paper,

PARISI et al.: BODY-SENSOR-NETWORK-BASED KINEMATIC CHARACTERIZATION AND COMPARATIVE OUTLOOK OF UPDRS SCORING


Fig. 6. Comparison between UPDRS values assigned to each trial for different pairs of UPDRS tasks considering: in the upper row [(a)–(f)], the UPDRS_Mean scores; in the lower row [(g)–(l)], the UPDRS values estimated through the automatic scoring system. Each point corresponds to a “cluster” of trials labeled with the same pair of UPDRS values and its size is proportional to the number of trials belonging to the same cluster. (a) RLA versus LLA. (b) S2S versus RLA. (c) S2S versus LLA. (d) G versus RLA. (e) G versus LLA. (f) G versus S2S. (g) RLA versus LLA auto. (h) S2S versus RLA auto. (i) S2S versus LLA auto. (j) G versus RLA auto. (k) G versus LLA auto. (l) G versus S2S auto.

which focuses on proposing a novel approach, rather than an exhaustive medical investigation.

C. Correlations

In Section V-A, we have investigated the correlations between the most relevant kinematic features in each task and the UPDRS scores assigned by the neurologists. Since we are investigating the feasibility of a single system able to automatically assess different motor tasks of Parkinsonians and to automatically assign UPDRS scores to patients’ trials, we now focus on the analysis of the correlations between the UPDRS evaluation carried out by doctors and the one performed by our automatic UPDRS scoring system. Looking at Fig. 2, the differences in the distribution of the UPDRS values in the three tasks, performed by the same group of patients, can be easily distinguished both in the (single) UPDRS scores assigned by the neurologists and in the UPDRS_Mean score. As expected, this means that the performance achieved by patients in the three tasks and, consequently, the UPDRS evaluations by doctors, may vary significantly from one task to another. In Fig. 6, a comparison between the UPDRS_Mean scores [cases from (a) to (f)] and the UPDRS values assigned by the automatic system [cases from (g) to (l)] is shown considering pairs of tasks (for the LA task, comparisons with both RLA and LLA UPDRS scores are shown separately). Each trial is labeled with a pair of UPDRS values, corresponding to the UPDRS scores assigned to it in the two considered tasks. In each subplot, the “clusters” of trials labeled with the same pair of UPDRS scores and, thus, overlapping on the same portion of the plane, are shown as black points, whose size is proportional to the number of trials belonging to the cluster. Moreover, the linear regression line, i.e., the best-fitting straight line obtained with the least squares method, is also shown. For all the considered pairs of tasks, an increasing

trend in UPDRS scores for both tasks can be observed, although for each UPDRS value assigned to one task, several UPDRS values can be assigned to the other task. For some pairs of tasks, such as those shown in Fig. 6(a) and (d)–(f), the linear relationship between the UPDRS scores is evident, and consequently, the regression line lies near the diagonal. In these cases, the values assigned in both tasks are likely to be similar, indicating both comparable difficulties experienced by patients in the two tasks and a uniform metric used by the neurologists to assess them. In other cases, especially those including the S2S task, it can be observed that, while in one task (either LA or G) the trials are labeled with increasing UPDRS scores, in the S2S task the patient is still able to achieve a good performance, and consequently, the trials are labeled with a UPDRS equal to 0 or 0.5. This leads to an accumulation of points near the 0 and 0.5 UPDRS classes for the S2S task, which also influences the slope of the linear regression line, making it “flatter.” This behavior is even more visible considering the automatically estimated UPDRS scores [cases from (g) to (l)]. It can be observed that the linear relationship between tasks is always weaker than in the neurologist-assessed cases. This is due to the fact that the automatic scoring system, as anticipated in Section V-B, tends to slightly underestimate the values of the UPDRS scores with respect to those assigned by doctors, increasing the concentration of the scores around the dominant UPDRS class of each task. To reinforce the considerations made observing the pairs of tasks in Fig. 6, we now analyze numerically the correlation between the considered tasks. We also introduce an aggregate UPDRS score, denoted as UPDRS_Total, given by the sum of the UPDRS_Mean scores assigned to the RLA, LLA, S2S, and G tasks.
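The quantities used in this comparison (cluster sizes, least squares regression line, and aggregate score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the function names and the toy score lists `toy_g` and `toy_s2s` are hypothetical.

```python
from collections import Counter


def cluster_sizes(scores_a, scores_b):
    """Count how many trials share the same pair of UPDRS values."""
    return Counter(zip(scores_a, scores_b))


def least_squares_line(x, y):
    """Slope and intercept of the best-fitting straight line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # undefined if all x values coincide
    intercept = mean_y - slope * mean_x
    return slope, intercept


def updrs_total(rla, lla, s2s, g):
    """Aggregate score per trial: sum of the scores of the four tasks."""
    return [a + b + c + d for a, b, c, d in zip(rla, lla, s2s, g)]


# Hypothetical scores: G grows across trials while S2S stays near 0 or 0.5.
toy_g = [0.5, 1.0, 1.0, 1.5, 2.0, 2.0]
toy_s2s = [0.0, 0.0, 0.0, 0.5, 0.5, 0.5]

toy_clusters = cluster_sizes(toy_g, toy_s2s)
toy_slope, toy_intercept = least_squares_line(toy_g, toy_s2s)
```

With these toy values the slope is well below 1, i.e., the regression line is “flatter” than the diagonal, which is the effect described above for pairs involving the S2S task.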
The summation of the UPDRS scores assigned to different combinations of tasks [13], [17], [28] is a common practice in the literature to predict the overall functional capabilities


TABLE VII
CORRELATIONS BETWEEN UPDRS_Mean SCORES

         Total   RLA     LLA     S2S     G
Total    1.00
RLA      0.82    1.00
LLA      0.87    0.75    1.00
S2S      0.76    0.37    0.46    1.00
G        0.82    0.51    0.60    0.66    1.00

TABLE VIII
CORRELATIONS BETWEEN UPDRS SCORES ASSIGNED BY THE AUTOMATIC UPDRS SCORING SYSTEM

         Total   RLA     LLA     S2S     G
Total    1.00
RLA      0.77    1.00
LLA      0.74    0.63    1.00
S2S      0.58    n.s.    n.s.    1.00
G        0.54    0.29    n.s.    n.s.    1.00

n.s.: not significant.

TABLE IX
CORRELATIONS BETWEEN UPDRS_Mean SCORES AND AUTOMATICALLY ESTIMATED UPDRS SCORES

Total    RLA     LLA     S2S     G
0.79     0.74    0.54    0.55    0.60

of PD patients. From our viewpoint, UPDRS_Total is a concise parameter representing the overall level of motor impairments of a patient in the considered tasks. In Table VII, the correlations between the UPDRS_Mean scores in the various tasks and between them and UPDRS_Total are shown. As in Section V-A, Pearson’s correlation coefficient (i.e., the r-value) is used, considering as significant only the values with an associated p-value ≤ 0.05. It can be observed that the correlation between the UPDRS scores of each task and the aggregate score is high (from 0.76 to 0.87), indicating that UPDRS_Total concisely represents the motor performance level measured by each task. The correlations between pairs of tasks, instead, range from 0.75 (between RLA and LLA, which are likely to be strongly correlated since they refer to the same exercise) to 0.37 (between RLA and S2S, which are poorly correlated). The latter result is expected because, from a medical viewpoint, each UPDRS task aims to assess a specific PD symptom and the “performance” achieved by patients may vary considerably from one exercise to another [17], [29]. However, in some cases, such as in the comparison between S2S and G UPDRS scores, the r-value is still relevant because both tasks refer to the same “macrogroup” of PD symptoms (in this case to the “Gait/Posture” group) and may share some characteristics [17]. In the same way, in Table VIII, the correlation values (obtained by pairwise comparisons of the UPDRS scores estimated with the automatic UPDRS scoring system in each task, and the associated UPDRS_Total) are presented. The obtained results still show relatively high r-values (although slightly lower than those of the neurologist-assessed cases) between the tasks and the UPDRS_Total score. Also, the correlation between RLA and LLA remains high. For almost all the other pairs of tasks, instead, the correlations become nonsignificant.
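The following minimal sketch shows one way to produce entries of the kind reported in Tables VII and VIII: Pearson’s r, replaced by “n.s.” when the associated p-value exceeds 0.05. How the p-values were computed is not stated in this section, so the sketch swaps in a two-sided permutation test as a stand-in. The function names, the number of permutations, the fixed seed, and the toy data are all assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either sequence is constant


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for r (stand-in for the unstated test)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so that floating-point noise does not hide ties.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def table_entry(x, y, alpha=0.05):
    """Return r rounded to two decimals, or 'n.s.' if not significant."""
    r = pearson_r(x, y)
    return round(r, 2) if permutation_p_value(x, y) <= alpha else "n.s."


# Hypothetical data: a perfectly linear pair and an uncorrelated pair.
toy_x = [0.5 * i for i in range(12)]
toy_linear = [2 * v + 1 for v in toy_x]
toy_sym_x = [-2, -1, 0, 1, 2]
toy_sym_y = [4, 1, 0, 1, 4]
```

Calling `table_entry(toy_x, toy_linear)` yields a numeric entry, while `table_entry(toy_sym_x, toy_sym_y)` yields "n.s." because the linear correlation of the symmetric pair is exactly zero.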
This is due to the fact that, as observed in Fig. 6(h)–(l), the automatically estimated UPDRS scores are often underestimated and tend to overlap in correspondence to the dominant UPDRS class of each task, making

Fig. 7. Bar plots representing the values of UPDRS_Total for all the trials considering (a) the associated UPDRS_Mean scores and (b) the UPDRS values assigned by the automatic scoring system. The contributions of each task are shown with different colors.

the linear dependence between the UPDRS values in the pairs of tasks weaker. For completeness, Table IX shows the correlations between the neurologist-assigned and the automatically assigned UPDRS scores. In the UPDRS_Total and RLA cases, the correlation is high (0.79 and 0.74, respectively), while in the LLA, S2S, and G tasks, lower values are observed (0.54, 0.55, and 0.60, respectively), indicating a higher sensitivity to classification errors. Finally, in Fig. 7, the UPDRS_Total values, calculated considering both (a) the UPDRS_Mean scores and (b) the scores assigned to the trials by the automatic system, are shown. The trials are sorted in ascending order of UPDRS_Total to highlight the data trend. The contributions of the UPDRS scores assigned to the single tasks, for each trial, are shown with different colors. The (red) exponential curves, obtained by minimum mean square error fitting, represent smoothed versions of the aggregate scores’ trends. First of all, it can be noticed that, as mentioned


TABLE X
REDUCTIONS IN THE AUC OF THE CDF OF e_M COMPARING THE VARIOUS PARAMETRIC CONFIGURATIONS WITH THE OPTIMAL ONE

Task   Number of features      kNN        kNN-PCA    SVM        SVM-PCA
       or principal comp.
LA     1                       −2.79%     −5.20%     −3.76%     −10.99%
LA     2                       Best LA    −5.01%     −5.31%     −11.57%
LA     3                       −0.29%     −7.23%     −4.82%     −12.72%
LA     4                       −1.25%     −5.49%     −5.59%     −14.27%
LA     5                       −1.15%     −4.53%     −6.07%     −14.94%
S2S    1                       Best S2S   −0.19%     −2.92%     −3.50%
S2S    2                       −0.77%     −2.92%     −4.28%     −4.48%
S2S    3                       −0.97%     −1.94%     −2.92%     −3.31%
S2S    4                       −1.55%     −1.55%     −3.11%     −5.84%
S2S    5                       −1.36%     −1.55%     −4.67%     −3.70%
G      1                       −1.11%     −4.46%     −6.69%     −8.36%
G      2                       −0.37%     −4.08%     −5.20%     −7.06%
G      3                       Best G     −4.83%     −4.27%     −9.47%
G      4                       −0.19%     −3.71%     −3.71%     −7.80%
G      5                       −0.37%     −4.27%     −4.08%     −8.00%

above, the automatic classification system tends to assign UPDRS scores in accordance with the dominant UPDRS class of each task: for the G task, the majority of the scores are equal to 1; for the S2S task, they are equal to 0; and for the LA task, most of the trials are labeled with UPDRS scores equal to 1.5. This behavior leads to generally lower values of UPDRS_Total and to a reduced variability in the aggregate score for the automatic system, which is also reflected in the lower slope of the exponential fitting curve in Fig. 7(b). Another important observation concerns the contribution of the S2S scores to UPDRS_Total: it is almost negligible (between 0 and 0.5 in 90% of the trials) for aggregate scores lower than 5, especially when the automatic system is used. These experimental observations confirm the effectiveness of the LA and G tasks in representing the progression of motor impairments in PD patients and also highlight the “nonchallenging” nature of the S2S task for patients with mild symptoms. Nevertheless, the UPDRS score assigned to the S2S task becomes very important to distinguish between Parkinsonians with moderate and severe motor complications when UPDRS_Total is used as the evaluation metric.

VI. DISCUSSION

A. On the Performance of the Proposed Automatic Classification System

The results obtained in Section V-A clearly indicate that, in each task, some of the extracted kinematic features are strongly related to the UPDRS scores. The parameters that have turned out to be the most significant are the angular speed of execution (Ω) for the LA task, the total duration (T) for the S2S task, and the stride length (SL) for the G task. Regarding the automatic UPDRS scoring system, the results presented in Section V-B have highlighted that the best classification performance is achieved using kNN as the classifier on


the selected kinematic features, using an increasing number of features and increasing values of k for increasing complexity of the tasks. The accuracy of the system ranges from 43% (in the LA and S2S tasks) to 62% (in the G task), but the classification error is lower than or equal to 1 in more than 94% of the cases. The comparison between the evaluation error of the automatic system and the interrater variability of the neurologists has shown similar performance trends, allowing us to consider the accuracy of the proposed UPDRS scoring system acceptable. Nevertheless, the automatic system tends to underestimate the actual UPDRS scores and to concentrate the predicted UPDRS values around the dominant UPDRS classes. This is not an intrinsic limitation of the proposed approach. In fact, it could be overcome by considering a statistically more significant dataset (i.e., a larger set of patients, with a more homogeneous distribution across all UPDRS classes). In addition, increasing the number of neurologists involved in the evaluation could reduce the bias in the assessment of UPDRS motor tasks. In Section V-C, the comparative analysis of the correlations suggests that the motor performance of PD patients may vary considerably between different tasks, and thus, the associated correlations may range from low (0.37, poor correlation) to high (0.75, good correlation) values. The correlation between UPDRS scores in distinct tasks becomes weaker (almost always negligible) when the automatic system is considered. This is another consequence of the fact that the automatic system tends to bias the evaluation toward the dominant UPDRS classes. However, this result agrees with findings in the medical literature [17], according to which the correlation between different tasks is likely to be poor or moderate because each task has been defined to evaluate a specific aspect or symptom of PD.
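The three quantities discussed above (exact-match accuracy, the fraction of trials with absolute error within a given bound, and the tendency to underestimate) can be computed as in the following minimal sketch. The score lists are made up for illustration and are not data from this study; the function names and the threshold grid are assumptions.

```python
def accuracy(reference, predicted):
    """Fraction of trials whose predicted score equals the reference score."""
    hits = sum(1 for t, p in zip(reference, predicted) if t == p)
    return hits / len(reference)


def error_cdf(reference, predicted, thresholds=(0.0, 0.5, 1.0, 1.5, 2.0)):
    """Empirical CDF of the absolute error, evaluated at the given thresholds."""
    errors = [abs(t - p) for t, p in zip(reference, predicted)]
    n = len(errors)
    return {th: sum(1 for e in errors if e <= th) / n for th in thresholds}


def mean_signed_error(reference, predicted):
    """Average of (predicted - reference); negative values mean underestimation."""
    return sum(p - t for t, p in zip(reference, predicted)) / len(reference)


# Hypothetical scores: predictions concentrate around a dominant class (1.0-1.5).
toy_reference = [0.0, 0.5, 1.0, 1.0, 1.5, 2.0, 2.5, 3.0]
toy_predicted = [0.0, 1.0, 1.0, 1.0, 1.0, 1.5, 1.5, 1.5]

toy_accuracy = accuracy(toy_reference, toy_predicted)
toy_cdf = error_cdf(toy_reference, toy_predicted)
toy_bias = mean_signed_error(toy_reference, toy_predicted)
```

In this toy case the exact-match accuracy is modest, while most errors stay within one UPDRS point and the mean signed error is negative. This is the same qualitative pattern reported above for the automatic system.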
Finally, our results have shown a good correlation between UPDRS_Total and the UPDRS scores of all the tasks (with slightly lower correlations for the automatically assessed tasks). This concise index can provide neurologists with useful information about the overall condition and the functional capabilities of patients, complementing the evaluations based on the single UPDRS scores in each task. The contribution of the S2S task to the aggregate score, for example, seems to be significant in distinguishing patients with slight and mild symptoms from those who manifest moderate or severe impairments. This observation can be related to the characteristics of the movement typical of the S2S task, which involves the simultaneous activation of several control mechanisms (visual, posture, and balance) and worsens with the progress of the disease. A further investigation of the role of the S2S task goes beyond the scope of this work.

B. Application to Telerehabilitation

The characteristics of the designed system make it suitable for real applications in the e-health scenario, such as telemedicine systems for remote monitoring of PD. The objective recording of motor fluctuations in a home environment throughout the day, unlike a time-limited and “randomly timed” clinical evaluation in an out-patient environment, could provide more reliable information to neurologists and allow a more accurate assessment and management of the symptoms. These goals could be


Fig. 8. CDFs associated with different system configurations in the (a) LA, (b) S2S, and (c) G tasks. The best CDF is represented with a solid black line.

achieved, for example, by implementing software that guides patients in performing different activities, such as the UPDRS tasks for the evaluation of PD symptoms or rehabilitation exercises, in the comfort of their own home and perhaps several times per day. The information about the patient’s performance, acquired by the BSN, could be evaluated automatically or sent to the neurologist for further analysis, and should be supported by a concurrent video acquisition that clinicians can check to ensure that the exercises are performed correctly. With this arrangement, the supervision of a movement disorder specialist would not be required, although the assistance of a relative or a nurse could be useful to check that each task is executed safely. A tool with these functionalities may bring several advantages 1) to patients, who would feel more comfortable and motivated to do their exercises in a familiar environment and would save time and money by not having to travel to an outpatient clinic for each visit, and 2) to clinicians, who could assist a larger number of subjects and rely on more accurate and up-to-date clinical pictures. Since the e-health systems market is quite fragmented and poorly standardized, the best strategy for a practical implementation of the proposed PD monitoring application could be to integrate it into an existing e-health platform [30], [31], exploiting the APIs provided by the platform. The integration could be done, for example, at the gateway level (i.e., the level at which the data recorded by the sensing devices are collected, analyzed, and then forwarded through the web to the cloud-based core of the e-health platform), by adding a proper software component for the analysis of the data recorded by the IMU-based BSN. The processed data could then be managed according to the platform requirements, in the same way as the other data collected by the platform.

C. Efficient Implementation

In the previous sections, we have designed and tested our system in order to investigate its feasibility and performance, without considering constraints in terms of computational power or time consumption. In a real application scenario, such as the one discussed in Section VI-B, the system would be subject to several limitations. An efficient design and implementation would thus be required in order to reduce the complexity and, consequently, the required computational resources, while still

retaining the ability to achieve good (although not optimal) performance. To this end, we now evaluate and discuss the differences in performance between the optimal system configuration and other “suboptimal,” but simpler, alternatives. In Table X, the performance reduction (in percentage), obtained by comparing the AuC of the CDF of the error e_M associated with the best parametric configuration with the AuCs of “suboptimal” alternatives, is shown. The best configurations obtained for the three tasks, previously shown in Table III, are denoted in Table X as Best LA (AuC equal to 92.29%), Best S2S (AuC equal to 90.96%), and Best G (AuC equal to 95.39%) and highlighted in bold. Moreover, in Fig. 8, the CDFs associated with the kNN algorithm applied to an increasing number of features (from 1 to 4) of the original data and the best configurations obtained using 1) kNN and PCA, 2) SVM on the original dataset, and 3) SVM on the “PCA-projected” data, are shown to provide a visual counterpart of the data presented in Table X. As shown in Section V-B, the best classification algorithm is kNN in all the considered tasks. This kind of classifier does not require an explicit training phase, as it keeps all the dataset points (true neurologist-based scores) to make decisions in the “online” phase. This so-called lazy learning approach, however, requires substantial memory and computational power, since all the training set elements must be checked in each classification round. On the other hand, the SVM algorithm achieves a performance similar to (actually, slightly worse than) that of the kNN method but with a more efficient learning procedure. In fact, SVM, after a more complex training phase, builds a compact classification model which can be used to simply identify the class of the test data.
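To make the cost of the lazy learning approach concrete, the following minimal sketch implements a plain majority-vote kNN classifier. It is an illustration under stated assumptions and not the implementation evaluated in this paper: the class name `LazyKNN`, the Euclidean distance, the tie handling, and the toy feature values (loosely meant as stride lengths in meters) are all hypothetical.

```python
import math
from collections import Counter


class LazyKNN:
    """Majority-vote k-nearest-neighbor classifier (illustrative sketch)."""

    def __init__(self, k):
        self.k = k
        self.features = []
        self.labels = []

    def fit(self, features, labels):
        # "Training" only memorizes the labeled trials: no model is built.
        self.features = [tuple(f) for f in features]
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Every prediction scans the whole training set, so memory use and
        # time per query grow with the number of stored trials.
        distances = sorted(
            (math.dist(f, query), label)
            for f, label in zip(self.features, self.labels)
        )
        votes = Counter(label for _, label in distances[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical one-feature training set: (stride length,) -> UPDRS score.
toy_features = [(1.2,), (1.1,), (0.9,), (0.8,), (0.5,), (0.4,)]
toy_labels = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]

toy_model = LazyKNN(k=3).fit(toy_features, toy_labels)
toy_prediction = toy_model.predict((0.85,))
```

The `fit` step is trivial, whereas `predict` computes one distance per stored trial. A trained SVM, by contrast, only needs its compact model at prediction time, which is the trade-off discussed above.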
The relative reduction (in percentage), considering the best configuration in the SVM case for all the three tasks, is very limited, ranging from 2.92% (S2S task) to 3.76% (LA task). Another possible strategy to lower the complexity of the automatic classification procedure is to reduce the dimensionality of the feature dataset used as input for the classifiers. Looking at Table X, it can be observed that the performance reduction, considering only one or two features (original data) for the kNN algorithm, is always below 2.80%. In the SVM case as well, the performance degradation, with respect to the best SVM case, using only one feature is minimal. Dimensionality reduction through PCA tends to worsen the overall performance with both kNN and SVM methods. From these


considerations, it can be concluded that, although the best classification performance is achieved by the kNN classifier, the SVM classifier, applied to features of the original dataset, could be a more attractive choice for real-world applications because of 1) its higher efficiency (in terms of computational resource consumption) and 2) its minimal reduction in classification accuracy with respect to kNN.

VII. CONCLUSION

We have proposed an innovative approach for kinematic characterization of the LA, S2S, and G tasks through the same BSN with three nodes, together with automatic UPDRS assessment of the trials carried out by PD patients. Building on this unified approach, we investigated the intertask correlations and the correlation, per task, between the proposed automatic scoring system and the neurologists’ scoring. The main findings of our analysis can be summarized as follows.

1) Different UPDRS tasks aim to assess distinct aspects of the disease and show poor or moderate correlations with each other, especially with the automatic classification system.
2) On the basis of the scores given by three expert neurologists, our results show that the performance of the proposed automatic classification system compares favorably with typical interrater variability.
3) The aggregate UPDRS score (UPDRS_Total) represents a good concise indicator of the overall level of a patient’s motor impairments.
4) The UPDRS scores associated with the LA and G tasks are effective in representing the progression of motor impairments in PD patients and contribute to the UPDRS_Total score proportionally to the level of difficulty experienced by the patient. The S2S task, instead, appears to be “nonchallenging” for patients with light or mild symptoms, but its contribution to the aggregate UPDRS score becomes important in identifying the subjects with the most severe impairments.
The integration of the proposed system in a real cloud-based e-health platform for the development of a telemedicine application for continuous monitoring of PD patients has then been discussed, with focus on a possible system architecture and possible strategies for an efficient implementation of the proposed functionalities. The analyses and findings discussed in this paper can be further investigated to overcome their current limitations. In particular, the proposed automatic UPDRS scoring system can be improved by considering a larger and, thus, statistically more relevant dataset, together with clinical evaluations of additional neurologists. Moreover, a motor performance evaluation of PD patients in UPDRS tasks considering separately ON and OFF states represents an interesting research direction.

ACKNOWLEDGMENT

The anonymous Reviewers provided relevant feedback, which led to a significant improvement of this work. The authors would like to thank Dr. L. G. Pradotto (Istituto Auxologico


Italiano IRCCS, Piancavallo (VB), Italy) for his contribution in the clinical evaluation of the subjects considered in this research. They are also grateful to Prof. K. Friedl (University of California, San Francisco, CA, USA) for carefully revising the final version of this manuscript and for his valuable suggestions.

REFERENCES

[1] C. G. Goetz, S. Fahn, P. Martinez-Martin, W. Poewe, C. Sampaio, G. T. Stebbins, M. B. Stern, B. C. Tilley, R. Dodel, B. Dubois, R. Holloway, J. Jankovic, J. Kulisevsky, A. E. Lang, A. Lees, S. Leurgans, P. A. LeWitt, D. Nyenhuis, C. W. Olanow, O. Rascol, A. Schrag, J. A. Teresi, J. J. V. Hilten, and N. LaPelle, “Movement disorder society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS): Process, format, and clinimetric testing plan,” Movement Disorders, vol. 22, no. 1, pp. 41–47, Jan. 2007.
[2] B. Post, M. P. Merkus, R. M. de Bie, R. J. Haan, and J. D. Speelman, “Unified Parkinson’s disease rating scale motor examination: Are ratings of nurses, residents in neurology, and movement disorders specialists interchangeable?” Movement Disorders, vol. 20, pp. 1577–1584, Dec. 2005.
[3] J. E. Ahlskog and R. J. Uitti, “Rasagiline, Parkinson neuroprotection, and delayed-start trials,” Neurology, vol. 74, pp. 197–203, Jun. 2010.
[4] K. Kieburtz, “Issues in neuroprotection clinical trials in Parkinson’s disease,” Neurology, vol. 66, pp. 550–557, May 2006.
[5] R. A. Hauser, F. Deckers, and P. Lehert, “Parkinson’s disease home diary: Further validation and implications for clinical trials,” Movement Disorders, vol. 19, pp. 1409–1413, Dec. 2004.
[6] W. Maetzler, J. Domingos, K. Srulijes, J. J. Ferreira, and B. R. Bloem, “Quantitative wearable sensors for objective assessment of Parkinson’s disease,” Movement Disorders, vol. 28, pp. 1628–1637, Oct. 2013.
[7] M. Giuberti, G. Ferrari, L. Contin, V. Cimolin, C. Azzaro, G. Albani, and A. Mauro, “Assigning UPDRS scores in the leg agility task of Parkinsonians: Can it be done through BSN-based kinematic variables?” IEEE Internet Things J., vol. 2, no. 1, pp. 41–51, Feb. 2015.
[8] M. Giuberti, G. Ferrari, L. Contin, V. Cimolin, C. Azzaro, G. Albani, and A. Mauro, “Automatic UPDRS evaluation in the sit-to-stand task of Parkinsonians: Kinematic analysis and comparative outlook on the leg agility task,” IEEE J. Biomed. Health Informat., vol. 19, no. 3, pp. 803–814, May 2015.
[9] F. Parisi, G. Ferrari, M. Giuberti, L. Contin, V. Cimolin, C. Azzaro, G. Albani, and A. Mauro, “Low-complexity inertial sensor-based characterization of the UPDRS score in the gait task of Parkinsonians,” presented at the 9th Int. Conf. Body Area Netw., London, U.K., Sep./Oct. 2014.
[10] F. Parisi, G. Ferrari, M. Giuberti, L. Contin, V. Cimolin, C. Azzaro, G. Albani, and A. Mauro. (2015, Aug.). “Inertial BSN-based characterization and automatic UPDRS evaluation of the gait task of Parkinsonians,” IEEE Trans. Affective Comput. [Online]. Available: http://www.tlc.unipr.it/ferrari/TAFFC_Parisi_et_al_Feb2015.pdf
[11] D. A. Heldman, D. E. Filipkowski, D. E. Riley, C. M. Whitney, B. L. Walter, S. A. Gunzler, J. P. Giuffrida, and T. O. Mera, “Automated motion sensor quantification of gait and lower extremity bradykinesia,” in Proc. 34th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., San Diego, CA, USA, Aug. 2012, pp. 1956–1959.
[12] D. Giansanti, G. Maccioni, F. Benvenuti, and V. Macellari, “Inertial measurement units furnish accurate trunk trajectory reconstruction of the sit-to-stand manoeuvre in healthy subjects,” Med. Biol. Eng. Comput., vol. 45, no. 10, pp. 969–976, Oct. 2007.
[13] A. Salarian, H. Russmann, F. J. G. Vingerhoets, C. Dehollain, Y. Blanc, P. R. Burkhard, and K. Aminian, “Gait assessment in Parkinson’s Disease: Toward an ambulatory system for long-term monitoring,” IEEE Trans. Biomed. Eng., vol. 51, no. 8, pp. 1434–1443, Aug. 2004.
[14] S. T. Moore, H. G. MacDougall, J. Gracies, H. S. Cohen, and W. G. Ondo, “Long-term monitoring of gait in Parkinson’s disease,” Gait Posture, vol. 26, no. 2, pp. 200–207, Jul. 2007.
[15] A. Salarian, H. Russmann, C. Wider, P. R. Burkhard, F. J. G. Vingerhoets, and K. Aminian, “Quantification of tremor and bradykinesia in Parkinson’s disease using a novel ambulatory monitoring system,” IEEE Trans. Biomed. Eng., vol. 54, no. 2, pp. 313–322, Feb. 2007.
[16] J. Stochl, A. Boomsma, E. Ruzicka, H. Brozova, and P. Blahus, “On the structure of motor symptoms of Parkinson’s disease,” Movement Disorders, vol. 23, no. 9, pp. 1307–1312, 2008.
[17] S. D. Vassar, Y. M. Bordelon, R. D. Hays, N. Diaz, R. Rausch, C. Mao, and B. G. Vickrey, “Confirmatory factor analysis of the motor unified Parkinson’s disease rating scale,” Parkinson’s Disease, vol. 2012, p. 719167, 2012.
[18] A. Burns, B. R. Greene, M. J. McGrath, T. J. O’Shea, B. Kuris, S. M. Ayer, F. Stroiescu, and V. Cionca, “SHIMMER – A wireless sensor platform for noninvasive biomedical research,” IEEE Sens. J., vol. 10, no. 9, pp. 1527–1534, Sep. 2010.
[19] S. O. H. Madgwick, A. J. L. Harrison, and R. Vaidyanathan, “Estimation of IMU and MARG orientation using a gradient descent algorithm,” in Proc. IEEE Int. Conf. Rehabil. Robot., Zurich, Switzerland, Jun. 2011, pp. 1–7.
[20] W. Zijlstra and A. L. Hof, “Assessment of spatio-temporal gait parameters from trunk accelerations during human walking,” Gait Posture, vol. 18, no. 2, pp. 1–10, Oct. 2003.
[21] A. Köse, A. Cereatti, and U. Della Croce, “Bilateral step length estimation using a single inertial measurement unit attached to the pelvis,” J. Neuroeng. Rehabil., vol. 9, pp. 1–9, Jan. 2012.
[22] J. Stamatakis, J. Ambroise, J. Crémers, H. Sharei, V. Delvaux, B. Macq, and G. Garraux, “Finger tapping clinimetric score prediction in Parkinson’s disease using low-cost accelerometers,” Comput. Intell. Neurosci., vol. 2013, p. 717853, Jan. 2013.
[23] K. Aminian, B. Najafi, C. Büla, P. F. Leyvraz, and P. Robert, “Spatiotemporal parameters of gait measured by an ambulatory system using miniature gyroscopes,” J. Biomech., vol. 35, no. 5, pp. 689–699, May 2002.
[24] R. Moe-Nilssen and J. L. Helbostad, “Estimation of gait cycle characteristics by trunk accelerometry,” J. Biomech., vol. 37, pp. 121–126, Jan. 2004.
[25] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification and Scene Analysis, 2nd ed. New York, NY, USA: Wiley-Interscience, 2000.
[26] R. A. Fisher, Statistical Methods for Research Workers. Edinburgh, U.K.: Oliver and Boyd, 1925.
[27] M. Švehlík, E. B. Zwick, G. Steinwender, W. E. Linhart, P. Schwingenschuh, P. Katschnig, E. Ott, and C. Enzinger, “Gait analysis in patients with Parkinson’s disease off dopaminergic therapy,” Arch. Phys. Med. Rehabil., vol. 90, no. 11, pp. 1880–1886, 2009.
[28] J. Song, B. Fisher, G. Petzinger, A. Wu, J. Gordon, and G. J. Salem, “The relationships between the unified Parkinson’s disease rating scale and lower extremity functional performance in persons with early-stage Parkinson’s disease,” Neurorehabilitation Neural Repair, vol. 23, no. 7, pp. 657–661, 2009.
[29] K. J. Brusse, S. Zimdars, K. R. Zalewski, and T. M. Steffen, “Testing functional performance in people with Parkinson disease,” Phys. Therapy, vol. 85, pp. 134–141, Feb. 2005.
[30] Telecom Italia: Sanità Digitale. (2015). [Online]. Available: http://nuvolaitaliana.impresasemplice.it/digitalizzazione/sanit%C3%A0digitale
[31] Qualcomm Life: 2net Platform. (2015). [Online]. Available: http://www.qualcommlife.com/wireless-health

Federico Parisi (S’15) received the B.Sc. and M.Sc. degrees (summa cum laude) in computer engineering from the University of Parma, Parma, Italy, in 2010 and 2013, respectively. Since January 2014, he has been working toward the Ph.D. degree in information technologies at the Information Engineering Department, University of Parma. He is also a Member of the Wireless Ad-Hoc and Sensor Networks Laboratory, University of Parma, supervised by Prof. G. Ferrari. His research interests include body sensor networks, inertial motion capture and motion reconstruction, gait analysis, mobile applications for e-health and telerehabilitation systems, and pedestrian positioning systems. Mr. Parisi was a Member of the winning team of the first Body Sensor Networks Hackathon, organized at the 12th Annual Body Sensor Networks Conference, Boston, MA, USA, 2015.

Gianluigi Ferrari (S’96–M’98–SM’12) received his “Laurea” (5-year program, summa cum laude) and Ph.D. degrees from the University of Parma, Italy, in 1998 and 2002, respectively. He is currently an Associate Professor of telecommunications with the University of Parma, Parma, Italy. He was a Visiting Researcher with the University of Southern California, Los Angeles, CA, USA, from 2000 to 2001; Carnegie Mellon University, Pittsburgh, PA, USA, from 2002 to 2004; King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand, in 2007; and Université Libre de Bruxelles, Brussels, Belgium, in 2010. Since 2006, he has been the Coordinator with the Wireless Ad-Hoc and Sensor Networks Laboratory, Department of Information Engineering, University of Parma. His research interests include wireless ad-hoc and sensor networking, adaptive digital signal processing, and communication theory. Prof. Ferrari currently serves on the Editorial Board of several international journals. He was the Recipient of the Paper/Technical Awards at the 3rd International Workshop on Wireless Ad-hoc and Sensor Networks 2006, the 2nd International Conference on Emerging Network Intelligence 2010, the 8th International Conference on Body Sensor Networks 2011, the 11th International Conference on ITS Telecommunications 2011, the 1st International Conference on Sensor Networks 2012, the track on Application of Nature-inspired Techniques for Communication Networks and other Parallel and Distributed Systems 2013, the 11th International Conference on Wearable and Implantable Body Sensor Networks 2014, the IEEE 22nd International Conference on Software, Telecommunications and Computer Networks, and the track on Application of Nature-inspired Techniques for Communication Networks and other Parallel and Distributed Systems 2015.

Matteo Giuberti (GSM’13–M’14) received the B.Sc. and M.Sc. degrees (summa cum laude) in telecommunications engineering and the Ph.D. degree in information technologies from the University of Parma, Parma, Italy, in 2008, 2010, and 2014, respectively. He is currently a Research Engineer with Xsens Technologies B.V., Enschede, The Netherlands. His research interests include data processing in wireless sensor networks and, more specifically, body sensor networks, body area networks, inertial motion capture and posture recognition, motion reconstruction, activity classification, orientation estimation algorithms, telerehabilitation and mobile health applications, and wireless (indoor) localization. Dr. Giuberti was the recipient of the first Body Sensor Network Contest and ranked second in one task of the Opportunity Challenge in 2011. He also received a Paper Award at the 11th International Conference on Wearable and Implantable Body Sensor Networks 2014.

Laura Contin received the degree in Mathematics from the University of Turin, Turin, Italy, in 1985. She is currently a Senior Researcher with the Strategy and Innovation Department, Telecom Italia, Turin, Italy. She has researched and published extensively in multimedia telecommunications and is currently involved in R&D projects related to active aging, remote rehabilitation, and remote monitoring of neurological patients. She has authored numerous international conference papers, journal papers, and patents. Her research interests include body sensor networks, algorithms for motion tracking, and ambient assisted living.


Veronica Cimolin received the Master’s and Ph.D. degrees in bioengineering (with a focus on the quantitative analysis of movement for the assessment of functional limitation in children with Cerebral Palsy) from the Politecnico di Milano, Milan, Italy, in 2002 and 2007, respectively. She is currently a Researcher with the Department of Electronics, Information, and Bioengineering, Politecnico di Milano. She is with the “Luigi Divieti” Posture and Movement Analysis Laboratory. She is involved in the activity of several movement analysis laboratories of national and international institutes. She teaches the course “Impianti ospedalieri e sicurezza” and assists in the teaching of “Laboratorio di valutazione funzionale.” She has authored several peer-reviewed international papers. Her research interests include quantitative movement analysis for clinical and rehabilitative applications.

Corrado Azzaro received the Degree in medicine and surgery from the University of Palermo, Palermo, Italy, in 2002. He completed his specialty training in clinical neurology with the Neurology Department, “Molinette Hospital,” Turin, Italy, in 2011. He was a Visiting Doctor with the “Centro de Evaluación Neurologica Para Niños y Adolecentes”, Monterrey, Mexico, from 2002 to 2003, and with the Neurology Department, “Hospital Virgen del Rocio,” Sevilla, Spain, in 2003. He is currently an Assistant with the Department of Neurology, “San Giuseppe” Hospital, Clinical Research Institute “Istituto Auxologico Italiano,” Piancavallo, Verbania, Italy. His research interests include neurophysiology, movement analysis in virtual reality environments, and applied medical informatics.


Giovanni Albani received the Degree in medicine and surgery from the University of Pavia, Pavia, Italy, in 1992, and completed the School of Neurophysiopathology, Neurological National Institute “Casimiro Mondino,” Pavia, Italy, in 1997. He has been a Visiting Researcher with the Hospital Clinico (1990) and Bellvitge, Barcelona, Spain (1992), and Neurologische Clinic University of Zurich, “Paul Sherrer Institute” Nuclear Medicine Centre Villigen, Switzerland, and “Balgrist” Swiss Paraplegic Zentrum, Zurich, Switzerland (from 1995 to 1997). He was Chief of Service of Neurophysiology at the Clinical Hospital “Beato Matteo,” Vigevano, Italy (1998–2005), and is currently an Assistant with the Department of Neurology, “San Giuseppe” Hospital, Clinical Research Institute “Istituto Auxologico Italiano,” Piancavallo, Verbania, Italy. His research interests include movement disorders, movement analysis, neurophysiology, and virtual reality.

Alessandro Mauro received the Medical Doctor degree from the faculty of medicine and surgery, University of Torino, Turin, Italy, in 1978 and specialized in Neurology in 1982. He has been a Full Professor of neurology with the Department of Neurosciences, Medical School, University of Torino, Turin, Italy, since 2007. Since 2000, he has been the Medical Director with the University Unit of Neurology and Neurorehabilitation, and with the Laboratory of Neuropathology and Clinical Neurobiology, San Giuseppe Hospital—Istituto Auxologico Italiano—IRCCS, Piancavallo, Oggebbio (VB), Italy. Since 2010, he has been the Medical Director with the Centre of Sleep Medicine of the same institution. His research interests include clinical neurology, neurogenetics and molecular neuropathology of neurodegenerative diseases, neurorehabilitation, and sleep disorders. Prof. Mauro is an active Member of National and International Scientific Societies, including the Italian Association of Neuropathology and Clinical Neurobiology (AINPeNC) of which he had been the President since 2011 (current past-president and a Member of the Board of Governors).


IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 19, NO. 6, NOVEMBER 2015

Classification of Parkinson’s Disease Gait Using Spatial-Temporal Gait Features
Ferdous Wahid, Rezaul K. Begg, Chris J. Hass, Saman Halgamuge, and David C. Ackland

Abstract—Quantitative gait assessment is important in diagnosis and management of Parkinson’s disease (PD); however, gait characteristics of a cohort are dispersed by patient physical properties including age, height, body mass, and gender, as well as walking speed, which may limit capacity to discern some pathological features. The aim of this study was twofold: first, to use a multiple regression normalization strategy that accounts for subject age, height, body mass, gender, and self-selected walking speed to identify differences in spatial-temporal gait features between PD patients and controls; and second, to evaluate the effectiveness of machine learning strategies in classifying PD gait after gait normalization. Spatial-temporal gait data during self-selected walking were obtained from 23 PD patients and 26 age-matched controls. Data were normalized using standard dimensionless equations and multiple regression normalization. Machine learning strategies were then employed to classify PD gait using the raw gait data, data normalized using dimensionless equations, and data normalized using the multiple regression approach. After normalizing data using the dimensionless equations, only stride length, step length, and double support time were significantly different between PD patients and controls (p < 0.05); however, normalizing data using the multiple regression method revealed significant differences in stride length, cadence, stance time, and double support time. Random Forest resulted in a PD classification accuracy of 92.6% after normalizing gait data using the multiple regression approach, compared to 80.4% (support vector machine) and 86.2% (kernel Fisher discriminant) using raw data and data normalized using dimensionless equations, respectively. Our multiple regression normalization approach will assist in diagnosis and treatment of PD using spatial-temporal gait data.
Index Terms—Biomechanics, machine learning, regression model, walking.

I. INTRODUCTION
Parkinson’s disease (PD) is a neurodegenerative disorder of the central nervous system that affects control of body movement including posture and balance [1]. Gait disturbance, which may present as short, shuffling steps, gait freezing, lurching unsteady gait, or spontaneous falls, is recognized as a contributing diagnostic criterion for PD [2], [3]. Despite advances in medical care, gait disturbances are known to worsen

Manuscript received December 14, 2014; revised March 18, 2015; accepted March 28, 2015. Date of publication June 29, 2015; date of current version November 3, 2015. F. Wahid, S. Halgamuge, and D. C. Ackland are with the Mechanical Engineering Department, The University of Melbourne, Parkville, Vic. 3010, Australia (e-mail: [email protected]; [email protected]; [email protected]). R. K. Begg is with the College of Sport and Exercise Science, Institute of Sport, Exercise and Active Living, Victoria University, Footscray, Vic. 8001, Australia (e-mail: [email protected]). C. J. Hass is with the Department of Applied Physiology and Kinesiology, Center for Movement Disorders and Neurorestoration, University of Florida, Gainesville, FL 32611-8205 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/JBHI.2015.2450232

as the disease progresses, contributing to loss of independence, falls, and poor quality of life [4], [5]. PD is clinically characterized using patient medical history and neurological examination; however, diagnosis may be difficult in the early stages of the disease, and can often take many years due to the variability of presenting signs, symptoms, and frequency. Quantitative assessment of spatial-temporal gait characteristics in PD patients, including walking speed, stride length, and double support time, has been previously performed in an attempt to elucidate gait control pathways affected by PD, assist in disease diagnosis, and evaluate time-dependent disease progression [5], [6]. Spatial-temporal gait features may be measured quickly, easily, and at low cost using technology such as pressure-sensitive GaitRite mats, or foot switches. In contrast, 3-D gait analysis requires a considerable amount of preparation time and costly digital optical motion capture technology [7]. Analysis of spatial-temporal data in gait analysis of PD patients may therefore be useful in clinics where 3-D gait analysis technology is not available. At present, however, there is no established procedure for reliably classifying PD based on spatial-temporal gait characteristics alone. Machine learning strategies in human movement, including neural networks and support vector machines (SVM), have gained popularity as they offer an objective approach to identifying or differentiating subgroups of individuals with movement disorders and quantifying outcomes of surgical or therapeutic intervention [8]. Using ground reaction force (GRF) data, PD has been classified with 77.3% accuracy using neural networks with weighted fuzzy membership functions [9], 84.4% using projection-based learning with a metacognitive radial basis function (RBF) network [10], and 91.2% using SVM with chi-square distance kernel [3]. 
Spatial-temporal gait features have also been used in SVM approaches to classify PD with an accuracy of between 75.6% and 90.32% [11]–[13]. While the outcomes of these studies show some promise, these approaches are inherently limited in their classification accuracy due to dispersion of raw gait data as a consequence of variability in patient age and physical features. Spatial-temporal gait characteristics of an individual are affected by ageing [2], and physical differences between subjects, including height, body mass, and gender, may increase gait variability and limit the degree to which pathological trends may be reliably discerned. Spatial-temporal gait features are also a function of and are greatly influenced by walking speed [14]–[16]. In a study of 17 healthy elderly subjects (age 68.9 ± 7 years), average self-selected walking speed was 111.92 ± 21.22 cm/s [17] and considerably lower and more variable than the walking speed of healthy young adults (141.4 ± 3.1 cm/s) [18]. Greater between-subject variations in walking speed

2168-2194 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications standards/publications/rights/index.html for more information.


increase dispersion of spatial-temporal gait variables, and reduce power to detect between-group differences in a given gait feature. Spatial-temporal gait features may also be influenced by variations in a subject’s own self-selected walking speed, which may vary substantially in a given testing session [19], [20]. In a recent study of a young adult subject walking at their preferred walking speed in seven different laboratories, the mean self-selected speed varied between 1.38 ± 0.07 and 1.57 ± 0.03 m/s [20]. Ultimately, extrinsic factors such as laboratory environment and cognitive state may have a substantial effect on a subject’s self-selected gait speed, and therefore spatial-temporal gait parameters. Minimizing the effect of between-subject differences in physical properties and walking speed on spatial-temporal gait data may reduce data dispersion, leading to improved gait classification accuracy. It has been shown that the accuracy of cerebral palsy diagnosis using an SVM approach improves from 83.3% to 96.8% when spatial-temporal gait data are first normalized using leg length and subject age [8]. However, gait normalization in machine learning has received little attention in the Parkinsonian gait literature, and the influence of walking speed on gait classification accuracy has not been explored. The aim of this study was twofold: first, to use a multiple regression (MR) normalization strategy to identify differences in spatial-temporal gait features between PD patients and controls; and second, to evaluate the effectiveness of machine learning strategies in classifying PD gait after MR normalization.
By applying an MR normalization to spatial-temporal gait measurements using a range of patient characteristics including age, height, body mass, gender, and self-selected walking speed, we hypothesized that classification of pathological gait using machine learning approaches would be significantly more accurate than with conventional single-parameter normalization strategies [21]. The results of this study have future implications for early PD diagnosis strategies, and the way gait may be clinically evaluated in PD patients.
II. MATERIALS AND METHODS

A. Experimental Protocol
Gait data for 23 PD patients, age: 68.5 ± 6.1 years (60–80 years), height: 1.8 ± 0.1 m (1.6–1.9 m), body mass: 80.1 ± 15.5 kg (52.3–120.2 kg), male/female: 20/3, were selected retrospectively from an existing database [22]. Gait data for 26 age-matched healthy controls, age: 69.5 ± 5.0 years (60–79 years), height: 1.7 ± 0.1 m (1.6–1.8 m), body mass: 76.7 ± 14.7 kg (55.0–106.4 kg), male/female: 13/13, were acquired from the same laboratory database. PD patients were mildly affected by the disease (Hoehn and Yahr [23] score of 2) and were in an optimally medicated state. Fifteen reflective markers were attached to each subject according to a modified Helen Hayes marker set [24]. Subjects walked at their preferred walking speed ten times across an 8 m walkway. The average walking speed of the PD patients and controls was 1.1 ± 0.2 m/s (0.8–1.6 m/s) and 1.2 ± 0.2 m/s (0.7–1.8 m/s), respectively. Marker positions were measured using an 8-camera video motion analysis system (Vicon, Oxford Metrics, Ltd., Oxford), while GRFs were recorded simultaneously using two instrumented force platforms (Kistler, Switzerland and Advanced Mechanical Technology, Inc., Watertown, MA). Video and analog force-plate data were sampled at 100 and 1000 Hz, respectively. Marker data were filtered with a lowpass, fourth-order Butterworth filter using a cut-off frequency of 4 Hz. Approval for the study was obtained from the relevant Human Research Ethics Committees, and written informed consent was provided according to the Committees’ guidelines.

B. Data Normalization
Spatial-temporal gait features including stride length, step length, step width, cadence, double support time, stance time, swing time, step time, and stride time were averaged across all trials and normalized to subject height using the following dimensionless (DS) equations reported by Hof [21]:

$$ l_n = \frac{l_r}{D} \tag{1} $$

$$ f_n = \frac{f_r}{60} \sqrt{\frac{D}{g}} \tag{2} $$

$$ t_n = \frac{t_r}{\sqrt{D/g}} \tag{3} $$

where $l_r$ and $l_n$ represent the raw and normalized gait length feature, respectively (stride length, step length, or step width); $f_r$ and $f_n$ represent the raw and normalized gait cadence, respectively; $t_r$ and $t_n$ represent the raw and normalized gait time feature, respectively (double support time, stance time, swing time, step time, or stride time); $D$ is subject height; and $g$ is the gravitational acceleration. Spatial-temporal gait features were normalized using (1)–(3) in both the PD patients and controls. Cadence was divided by 60 to produce units of steps/second. All normalized gait features were dimensionless. Spatial-temporal gait data for the PD patients and controls were also normalized using an MR approach which calculates the ratio of the original and fitted values as follows:

$$ y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{i,j} + \varepsilon_i \tag{4} $$

where $y_i$ represents the dependent spatial-temporal gait feature (stride length, step length, step width, cadence, double support time, stance time, swing time, step time, or stride time) for the ith observation; $x_{i,j}$ represents the jth physical property (speed, age, height, body mass, or gender) for the ith observation; $\beta_0$ represents the intercept of the linear regression; $\beta_j$ represents the coefficient for the jth physical property; and $\varepsilon_i \sim N(0, \sigma^2)$ represents the independent residual error for the ith observation. The model’s coefficients can be estimated using the control subject data and (4) rewritten as follows:

$$ y_i = \hat{y}_i + \hat{\varepsilon}_i \tag{5} $$

where $\hat{y}_i$ represents the best fitted value from the regression model for the ith observation and $\hat{\varepsilon}_i$ represents the observed


residual error for the ith observation. Gait features are normalized by dividing the value of the original dependent gait feature, $y_i$, by $\hat{y}_i$ according to

$$ y_i^n = \frac{y_i}{\hat{y}_i} \tag{6} $$

where $y_i^n$ represents the normalized gait feature for the ith observation. Using a Taylor series expansion, it can be shown that the expected value of the control group normalized using this method is approximately 1 (i.e., $E(y^n) \approx 1$). The normalized gait feature is dimensionless as both $y_i$ and $\hat{y}_i$ have the same units. New gait data may be subsequently normalized using (6), where $\hat{y}_i$ is the predicted value of the gait data that is obtained from the regression model.

C. Machine Learning
Five machine learning strategies were employed to classify PD gait: kernel Fisher discriminant (KFD) [25], naïve Bayesian approach (BA) [26], k-nearest neighbor (kNN) [27], SVM [28], and Random Forest (RF) [29]. These machine learning strategies were applied using: 1) raw spatial-temporal gait features; 2) spatial-temporal gait features normalized using the DS equations; and 3) spatial-temporal gait features normalized using the MR approach. The classifiers’ internal parameters were optimized using an exhaustive grid search and tenfold cross-validation. An RBF kernel was used for both KFD and SVM, where the grid search ranges used for the cost regularization (C) and shape (gamma) parameters were from $10^{-3}$ to $10^{2}$ and from $10^{-3}$ to $10^{1}$, respectively (see the Appendix). All spatial-temporal gait features were scaled to have zero mean and unit variance in order to avoid the phenomenon of large value dominance [30]. The performance of the machine learning was determined by evaluating mean accuracy [31] and area under curve (AUC) [32], where 1 represents perfect classification, and 0.5 represents the worst possible classification.

D. Data Analysis
Variance inflation factors (VIF) for the independent variables were first computed to examine the multicollinearity among the subject physical properties and self-selected speed. The VIFs for all independent variables were found to be less than 5 [33] (see the Appendix); therefore, all independent variables were considered in development of the MR models. For each gait feature, backward stepwise elimination was used to select the regression model where p < 0.1 for all independent variables. Finally, robust fitted models for each gait feature were computed using a “bisquare” weight function [34]. Statistical assumptions including linearity, normality, and homoscedasticity for the MR models were met for all variables. Coefficient of variation (CV) was used as a measure of data dispersion before and after normalizing, and Spearman’s rank order correlation coefficient (ρ) was used to assess the influence of speed and physical properties including age, height, and body mass on the spatial-temporal gait features. To compute the correlations between the gait features and gender, a point biserial coefficient of correlation method was adopted [35]. 95% confidence intervals and two-tailed Student’s t-tests were used to compare the differences between the control and PD subjects. Standard error and 95% confidence intervals were computed as described previously [36]. For multiple comparisons, p-values were adjusted according to Holm [37]. Statistical significance was set at p < 0.05. All calculations were performed using MATLAB R2013b (MathWorks, Natick, MA).

III. RESULTS
Using raw data (i.e., before normalization), the only significant differences in spatial-temporal gait features between the PD patients and controls were stride length (mean difference: 0.09 m, 95% CI: [0.00, 0.19], p = 0.03), and double support time (mean difference: 0.09 s, 95% CI: [0.05, 0.13], p < 0.001), which were significantly smaller in the PD patients [see Fig. 1(a)]. When the raw data were normalized using the DS equations, significant differences between the PD patients and controls were observed in stride length (mean difference: 0.09, 95% CI: [0.03, 0.15], p < 0.01), step length (mean difference: 0.03, 95% CI: [0.00, 0.06], p < 0.01), and double support time (mean difference: 0.23, 95% CI: [0.15, 0.32], p < 0.001) [see Fig. 1(b)]. After normalizing using the MR approach, stride length (mean difference: 0.04, 95% CI: [0.00, 0.08], p = 0.03), stance time (mean difference: 0.06, 95% CI: [0.01, 0.11], p < 0.01), and double support time (mean difference: 0.39, 95% CI: [0.31, 0.48], p < 0.001) were significantly smaller in the PD patients, whereas cadence (mean difference: –0.04, 95% CI: [–0.08, –0.00], p = 0.045) was significantly larger compared to controls [see Fig. 1(c)]. All raw spatial-temporal gait features were strongly correlated with speed (|ρ| > 0.68) except swing time (ρ = −0.33) and step width (ρ = −0.02), which were weakly correlated (see Table I). Cadence, stance time, double support time, step time, and stride time were moderately correlated with age (|ρ| > 0.30).
Most temporal gait features were weakly correlated with height, body mass, and gender (|ρ| < 0.28); however, swing time was moderately correlated with gender (ρ = 0.35). Step width was moderately correlated with height, body mass, and gender (ρ > 0.42), while both stride length and step length had weak correlations with all physical properties (|ρ| < 0.20). After normalization using the DS equations, correlations between temporal gait features and age increased (see Table I). While correlations between the other gait features and height, body mass, and gender remained weak after normalization using the DS equations (|ρ| < 0.22), moderate correlations were still observed between step width and height, body mass, and gender (0.35 < |ρ| < 0.47). Spatial-temporal gait features remained moderately to highly correlated with speed after normalization using the DS equations. Normalization using the MR approach reduced most correlations between spatial-temporal gait features and physical properties to trivial values (|ρ| < 0.1), although some weak correlations were still observed after normalization (|ρ| < 0.25) (see Table I). After normalization using the MR approach, spatial-temporal gait features were no longer substantially correlated with speed (|ρ| < 0.16). Normalization using the MR approach


Fig. 1. (a) Raw spatial-temporal data, (b) spatial-temporal data normalized using DS equations, and (c) spatial-temporal data normalized using an MR approach. Data are shown for PD patients and age-matched controls, with standard deviation (SD) values given (whiskers). Significant differences in gait features between PD patients and controls are indicated with one asterisk (p < 0.05), two asterisks (p < 0.01), and three asterisks (p < 0.001). In order to fit all data onto a single plot, data were scaled between 0 and 1. All data are dimensionless.
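As an illustration of the DS normalization in (1)–(3), the following Python sketch normalizes a length, a cadence, and a time feature to subject height. The function names, the example subject, and the value g = 9.81 m/s² are assumptions made for illustration only; the study itself reports using MATLAB.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2


def normalize_length(l_r, height):
    """Eq. (1): raw length feature (m) divided by subject height D (m)."""
    return l_r / height


def normalize_cadence(f_r, height, g=G):
    """Eq. (2): cadence in steps/min converted to steps/s, times sqrt(D/g)."""
    return (f_r / 60.0) * math.sqrt(height / g)


def normalize_time(t_r, height, g=G):
    """Eq. (3): raw time feature (s) divided by sqrt(D/g)."""
    return t_r / math.sqrt(height / g)


# Hypothetical subject: height 1.70 m, stride length 1.30 m,
# cadence 110 steps/min, stance time 0.65 s.
stride_n = normalize_length(1.30, 1.70)
cadence_n = normalize_cadence(110.0, 1.70)
stance_n = normalize_time(0.65, 1.70)
print(stride_n, cadence_n, stance_n)
```

Because each equation scales only by subject height, two subjects of equal height keep exactly the same relative differences after normalization, which is consistent with the limited change in dispersion reported for the DS equations.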

TABLE I
SPEARMAN’S RANK ORDER CORRELATION COEFFICIENTS (ρ) ASSOCIATED WITH GAIT FEATURES, SPEED AND PHYSICAL PROPERTIES OF SUBJECTS

| Property | Data | Stride length | Cadence | Stance time | Swing time | Double support | Step length | Step time | Stride time | Step width |
|---|---|---|---|---|---|---|---|---|---|---|
| Speed | Raw | 0.87 | 0.70 | –0.72 | –0.33 | –0.78 | 0.83 | –0.69 | –0.70 | –0.02 |
| Speed | DS | 0.90 | 0.67 | –0.75 | –0.34 | –0.78 | 0.88 | –0.65 | –0.67 | 0.00 |
| Speed | MR | 0.00 | 0.00 | 0.03 | 0.02 | 0.11 | 0.04 | 0.00 | 0.00 | 0.15 |
| Age | Raw | –0.19 | –0.31 | 0.43 | 0.04 | 0.41 | –0.19 | 0.31 | 0.31 | –0.19 |
| Age | DS | –0.14 | –0.41 | 0.44 | 0.16 | 0.41 | –0.10 | 0.38 | 0.41 | –0.20 |
| Age | MR | 0.14 | –0.03 | –0.10 | 0.02 | 0.23 | 0.20 | 0.07 | 0.10 | –0.13 |
| Height | Raw | 0.08 | –0.19 | 0.18 | 0.10 | 0.19 | 0.14 | 0.21 | 0.19 | 0.55 |
| Height | DS | –0.19 | 0.02 | –0.01 | –0.12 | 0.03 | –0.17 | –0.02 | –0.02 | 0.46 |
| Height | MR | 0.12 | –0.03 | 0.14 | –0.17 | –0.05 | 0.13 | 0.08 | 0.09 | 0.11 |
| Body mass | Raw | 0.01 | –0.10 | 0.16 | 0.00 | 0.23 | 0.07 | 0.11 | 0.10 | 0.54 |
| Body mass | DS | –0.21 | 0.04 | 0.05 | –0.17 | 0.11 | –0.14 | –0.04 | –0.04 | 0.46 |
| Body mass | MR | 0.13 | 0.10 | 0.15 | –0.24 | 0.05 | 0.24 | 0.03 | 0.03 | –0.03 |
| Gender∗ | Raw | 0.16 | –0.27 | 0.18 | 0.35 | 0.09 | 0.17 | 0.26 | 0.26 | 0.43 |
| Gender∗ | DS | –0.08 | –0.07 | 0.04 | 0.09 | 0.01 | –0.09 | 0.06 | 0.06 | 0.36 |
| Gender∗ | MR | 0.05 | –0.19 | 0.03 | –0.05 | –0.11 | –0.01 | 0.05 | 0.05 | 0.18 |

Correlations are shown for raw spatial-temporal data, spatial-temporal data normalized using the DS equations, and spatial-temporal data normalized using the MR approach. ∗ Point biserial coefficient of correlation computed according to Lev [35].
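The kind of correlation reported in Table I can be sketched in a few lines of Python. The helper names and toy data below are hypothetical. Tied values receive their average rank, and the point-biserial coefficient is computed as Pearson's r against a 0/1 coded variable, which is a standard equivalence rather than a detail stated in the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def point_biserial(binary, y):
    """Point-biserial correlation: Pearson's r with a 0/1 coded variable."""
    return pearson([float(b) for b in binary], y)


# Hypothetical data: walking speed (m/s), stride length (m), gender (0/1).
speed = [0.8, 1.0, 1.1, 1.3, 1.6]
stride = [0.95, 1.10, 1.22, 1.35, 1.60]
gender = [0, 0, 1, 1, 1]
rho_speed = spearman(speed, stride)
r_gender = point_biserial(gender, stride)
print(rho_speed, r_gender)
```

Working on ranks makes the coefficient insensitive to monotone rescaling of a feature, so any difference between the Raw and DS rows of Table I reflects the height-dependent scaling rather than a change of units.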

reduced the dispersion of all spatial-temporal gait features relative to that of the raw data and data normalized using the DS equations; however, significantly reduced dispersion was observed in stride length, stance time, double support time, step length, and step time (p < 0.05). Normalization using the DS equations had little effect on reducing data dispersion, and in some cases slightly increased dispersion of data (see Table II). Classification accuracy of PD using machine learning was lowest when using the raw spatial-temporal data, and highest when the data were normalized using the MR approach (see Table III). RF yielded a mean classification accuracy of 92.6% in predicting PD gait after data normalization using the MR approach, whereas the classification accuracies using KFD,

BA, kNN, and SVM after normalization with the MR approach were 87.4%, 82.0%, 84.4%, and 86.0%, respectively. Machine learning using BA, kNN, and RF resulted in classification accuracy of less than 80% with raw data or data normalized using the DS equations; however, KFD when applied to the data normalized using the DS equations resulted in a PD classification accuracy of 86.2%. The maximum AUC value (0.96) was determined from SVM and RF when applied to data normalized using the MR approach, while an AUC value of 0.95 was obtained from KFD when applied to data normalized using DS equations. An AUC value of 0.85 was obtained from SVM when applied to the raw data. The performance of BA, kNN, and RF produced an AUC value of less


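TABLE II
CV VALUES COMPUTED USING RAW SPATIAL-TEMPORAL DATA, SPATIAL-TEMPORAL DATA NORMALIZED USING THE DS EQUATIONS, AND SPATIAL-TEMPORAL DATA NORMALIZED USING THE MR APPROACH

| Gait feature | Data | CV mean (%) | 95% CI | Significance | SE |
|---|---|---|---|---|---|
| Stride length | Raw | 12.87 | (10.07:17.91) | c | 2.00 |
| Stride length | DS | 13.34 | (10.43:18.57) | c | 2.08 |
| Stride length | MR | 4.73 | (3.71:6.54) | a,b | 0.72 |
| Cadence | Raw | 8.21 | (6.43:11.37) | | 1.26 |
| Cadence | DS | 7.76 | (6.08:10.75) | | 1.19 |
| Cadence | MR | 4.72 | (3.70:6.52) | | 0.72 |
| Stance time | Raw | 10.08 | (7.89:13.98) | c | 1.55 |
| Stance time | DS | 9.72 | (7.61:13.47) | c | 1.50 |
| Stance time | MR | 4.81 | (3.77:6.65) | a,b | 0.73 |
| Swing time | Raw | 6.36 | (4.98:8.79) | | 0.97 |
| Swing time | DS | 6.35 | (4.98:8.79) | | 0.97 |
| Swing time | MR | 5.25 | (4.11:7.25) | | 0.80 |
| Double support time | Raw | 19.90 | (15.50:27.99) | c | 3.19 |
| Double support time | DS | 19.37 | (15.09:27.22) | c | 3.09 |
| Double support time | MR | 10.20 | (7.98:14.14) | a,b | 1.57 |
| Step length | Raw | 12.16 | (9.52:16.91) | c | 1.89 |
| Step length | DS | 12.53 | (9.80:17.43) | c | 1.95 |
| Step length | MR | 5.33 | (4.18:7.36) | a,b | 0.81 |
| Step time | Raw | 8.18 | (6.41:11.33) | c | 1.26 |
| Step time | DS | 7.89 | (6.18:10.93) | c | 1.21 |
| Step time | MR | 4.56 | (3.57:6.30) | a,b | 0.70 |
| Stride time | Raw | 8.06 | (6.31:11.15) | | 1.24 |
| Stride time | DS | 7.78 | (6.10:10.77) | | 1.19 |
| Stride time | MR | 4.55 | (3.57:6.29) | | 0.69 |
| Step width | Raw | 37.35 | (28.61:55.33) | | 6.82 |
| Step width | DS | 35.51 | (27.26:52.22) | | 6.37 |
| Step width | MR | 33.94 | (26.10:49.62) | | 6.00 |

a, b, and c indicate a significant difference with raw data, data normalized using DS equations, and data normalized using the MR approach, respectively (p < 0.05).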
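To make the comparison in Table II concrete, the sketch below fits the regression of (4) on control data, forms the ratio in (6), and compares the CV before and after normalization. It is a simplified illustration under stated assumptions: it uses ordinary least squares with two predictors (speed and height) rather than the robust bisquare fit and backward stepwise selection described in Section II-D, and all data values and function names are hypothetical.

```python
import statistics


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(x_rows, y):
    """Least-squares estimate of (beta_0, beta_1, ..., beta_p) in Eq. (4)."""
    design = [[1.0] + list(row) for row in x_rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, row):
    """Fitted value y_hat for one observation."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def mr_normalize(beta, x_rows, y):
    """Eq. (6): ratio of each observed value to its fitted value."""
    return [yi / predict(beta, row) for row, yi in zip(x_rows, y)]


def cv_percent(values):
    """Coefficient of variation in percent (sample SD over mean)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical control data: (speed in m/s, height in m) and stride length in m.
controls_x = [(1.0, 1.60), (1.1, 1.75), (1.2, 1.65),
              (1.3, 1.80), (1.4, 1.70), (1.5, 1.85)]
controls_stride = [1.33, 1.42, 1.51, 1.58, 1.67, 1.76]

beta = fit_ols(controls_x, controls_stride)
normalized = mr_normalize(beta, controls_x, controls_stride)
raw_cv = cv_percent(controls_stride)
mr_cv = cv_percent(normalized)
print(raw_cv, mr_cv)
```

New observations, for example from patients, would be normalized by calling `mr_normalize` with the coefficients fitted on controls, so that values far from 1 indicate deviation from what the control model predicts for that speed and height.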

TABLE III
MEAN MACHINE LEARNING CLASSIFICATION ACCURACIES USING RAW SPATIAL-TEMPORAL DATA, SPATIAL-TEMPORAL DATA NORMALIZED USING DS EQUATIONS, AND SPATIAL-TEMPORAL DATA NORMALIZED USING THE MR APPROACH

| Machine learning strategy | Raw: Mean | Raw: (95% CI) | Raw: SD | DS: Mean | DS: (95% CI) | DS: SD | MR: Mean | MR: (95% CI) | MR: SD |
|---|---|---|---|---|---|---|---|---|---|
| KFD | 0.794 | (0.76:0.83) | 0.18 | 0.862 | (0.83:0.89) | 0.14 | 0.874 | (0.84:0.90) | 0.15 |
| BA | 0.738 | (0.70:0.77) | 0.17 | 0.748 | (0.71:0.79) | 0.19 | 0.820 | (0.79:0.85) | 0.15 |
| kNN | 0.702 | (0.66:0.74) | 0.20 | 0.778 | (0.74:0.81) | 0.18 | 0.844 | (0.81:0.88) | 0.16 |
| SVM | 0.804 | (0.77:0.84) | 0.18 | 0.806 | (0.77:0.84) | 0.17 | 0.860 | (0.83:0.89) | 0.16 |
| RF | 0.734 | (0.70:0.77) | 0.17 | 0.760 | (0.72:0.80) | 0.19 | 0.926 | (0.90:0.95) | 0.12 |

Data for KFD, naïve BA, kNN, SVM, and RF are shown, along with SD and 95% confidence intervals.
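The two performance measures used here, accuracy and AUC, can be illustrated with a short sketch. The labels, scores, threshold, and function names below are hypothetical, and the AUC is computed with the rank-based (pairwise comparison) definition, which is one common formulation and not necessarily the routine used in the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier output: 1 = PD, 0 = control.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

example_acc = accuracy(y_true, y_pred)
example_auc = auc(y_true, scores)
print(example_acc, example_auc)
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two measures can order classifiers differently.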

than 0.9 when applied to raw data or data normalized using the DS equations. Optimum machine learning performances (both classification accuracies and AUC values) were observed when using data normalized by the MR approach (see the Appendix for classifier rankings). Specificities and sensitivities of machine learning are also presented in the Appendix.

IV. DISCUSSION
The aims of this study were to use an MR normalization strategy to identify differences in spatial-temporal gait features between PD patients and controls, and assess the capacity

of machine learning strategies to classify PD gait after data normalization. Normalization and classification results were compared to those obtained using DS equations. Detrending normalization has also been previously proposed in which a polynomial model for each gait feature is developed using a highly correlated physical property such as age or height [38]; however, this method was not used in this study due to the weak correlations between the gait features and subject physical properties (see Table I). When normalizing data using the DS equations, only stride length, step length, and double support time were significantly different between the PD patients and controls; however, normalizing data using the MR approach revealed significant differences in stride length, cadence, stance time, and double support time. We hypothesized that MR normalization of spatial-temporal gait data would improve the performance of classification of PD using machine learning compared to the single parameter normalization method. We found that classification of PD using RF resulted in an accuracy of 92.6% after normalizing gait data using the MR approach, compared to maximum classifier accuracies of 80.4% (SVM) and 86.2% (KFD) using raw data and data normalized using DS equations, respectively. We showed that correlations among physical properties, speed, and gait features explained most of the dispersion of spatial-temporal gait features, and prevented a number of pathology-related differences in gait features between PD patients and controls from being discerned. In particular, variations in self-selected speed accounted for the majority of the dispersion of the gait data (see the Appendix). Normalizing spatial-temporal gait data using DS equations did not substantially reduce gait data dispersion, and in some cases, actually increased the data dispersion (see Table II). 
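As a rough illustration of how a regression-based normalization can remove speed-driven dispersion, the following minimal sketch fits a linear model of one gait feature on subject covariates and divides each observation by its predicted value. Several things here are assumptions rather than the study's procedure: the ratio form of the normalization, the use of only height and speed as covariates (the study also considers age, body mass, and gender), the function names, and the synthetic data, which stand in for real control measurements.

```python
import numpy as np

def fit_mr_model(covariates, feature):
    """Least-squares fit of feature ~ intercept + covariates (reference group)."""
    X = np.column_stack([np.ones(len(feature)), covariates])
    coef, *_ = np.linalg.lstsq(X, feature, rcond=None)
    return coef

def mr_normalize(covariates, feature, coef):
    """Divide observed values by the regression prediction (assumed ratio form)."""
    X = np.column_stack([np.ones(len(feature)), covariates])
    return feature / (X @ coef)

def make_synthetic_controls(n=200, seed=0):
    """Hypothetical control data: stride length driven mostly by walking speed."""
    rng = np.random.default_rng(seed)
    height = rng.normal(1.70, 0.08, n)   # metres
    speed = rng.normal(1.20, 0.20, n)    # metres per second
    stride = 0.2 + 0.4 * height + 0.5 * speed + rng.normal(0.0, 0.05, n)
    return np.column_stack([height, speed]), stride

covariates, stride = make_synthetic_controls()
coef = fit_mr_model(covariates, stride)
stride_norm = mr_normalize(covariates, stride, coef)

# Correlation of stride length with speed before and after normalization.
speed = covariates[:, 1]
corr_raw = np.corrcoef(speed, stride)[0, 1]
corr_norm = np.corrcoef(speed, stride_norm)[0, 1]
print(f"corr(speed, raw stride)        = {corr_raw:.2f}")
print(f"corr(speed, normalized stride) = {corr_norm:.2f}")
```

In this synthetic setting the raw feature is strongly correlated with speed, while the normalized feature is dimensionless, centred near 1, and nearly uncorrelated with speed. In practice, the model would be fitted on a reference group and then applied to both groups, so that remaining differences are less likely to reflect body size or walking speed.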
In contrast, normalizing the data using the MR approach greatly reduced correlations among age, height, body mass, gender, speed, and spatial-temporal gait features, thus reducing data dispersion, and significantly improving the performances of machine learning strategies to classify PD gait. The results demonstrate that MR normalization using patient physical properties and speed may be used to evaluate gait variability and quantify clinical gait aberrations associated with PD. This may have implications in early detection of PD, the monitoring of disease progression, and in the evaluation of the effectiveness of disease modifying interventions.

The PD classification accuracy reported in this study was somewhat higher than some classification accuracies reported in the literature. We reported that a maximum classification accuracy of 92.6% using RF was obtained after normalizing gait data with the MR approach. Using SVM and RBF kernels, previous studies have reported PD classification accuracy of 75.6% using raw gait features including speed, step length, cadence, and stride time [12], and an accuracy of 89.3% using only raw temporal gait features [11]. A classification accuracy of 90.32% was also reported using least-squares SVM on temporal gait features [13]. However, in these studies, datasets were small (