Segmenting Triaxial Accelerometer Data via Data Programming


Haque Ishfaq
Department of Statistics
Stanford University
[email protected]

Abstract

Measurement of physical activity in individuals is important for better understanding the causes of diabetes, cardiovascular disease, obesity, and osteoporosis, and for their prevention and control. Monitoring human movement over extended periods of time using wrist- and hip-worn accelerometers allows us to investigate long-term patterns of activity and their effect on a subject's health and fitness. Accurate analysis of real-world accelerometer data requires identifying time intervals when the device is not worn. The similarity among non-wear periods, inactive periods, and sleep makes this task hard to address. Moreover, generating the large amounts of labeled data needed to train a discriminative model is hard, time-consuming, error-prone, and expensive. To create training data from unlabeled data for our problem, we use data programming, which leverages simple heuristic labeling functions to weakly label data. We perform two separate experiments to assess the efficacy of the data programming paradigm for our problem: (1) using existing expert algorithms for non-wear time detection as labeling functions, and (2) using heuristically defined threshold-based labeling functions.

1 Introduction

Physical activity can have a positive impact on a wide range of medical conditions such as cardiovascular disease, obesity, diabetes, and osteoporosis. To understand the causes of these medical conditions and design effective prevention and control measures, we need to measure physical activity and analyze it properly. Hip-worn and wrist-worn accelerometers are routinely used to collect physical activity data over extended periods of time. Proper analysis of these data allows us to understand the relationship between physical activity patterns and various medical conditions and fitness. But in order to get an accurate analysis, we first need to identify time intervals when the accelerometer device is not worn. The similarity between sedentary behavior, inactive states, and the non-wear state makes this task particularly difficult. Moreover, training any discriminative model to classify wear time and non-wear time would require properly curated training data with true labels. Collecting large amounts of properly labeled training data that capture real-world day-to-day physical activity is infeasible, and designing such a data collection experiment would introduce artificiality into the data, making it unrepresentative of noisy real-world data. There have been extensive studies proposing algorithms for the accelerometer wear-time detection problem. For example, the method described in Choi et al. [3] uses vector magnitude. The Padaco software [5] applies smoothing and thresholds on each axis separately. Troiano et al. [8] detect non-wear intervals using thresholds on the length of zero sequences on each axis separately. But when applied to real-world accelerometer data, these methods often produce conflicting label predictions. Their performance also varies considerably across subjects, and no single algorithm consistently outperforms the others on all subjects.

Stanford University Class Project: Artificial Intelligence (CS221).

In this paper, we introduce an algorithm based on data programming [6] to classify wear time and non-wear time. It uses the predictions of several activity labeling methods and automatically denoises the resulting training label set by learning the accuracies of the labeling functions and their correlation structure. We can then use this denoised training set to train a noise-aware discriminative model. Note that throughout the whole process, we do not need true training labels.

2 Related Work

There have been numerous studies on activity data segmentation and classification [7, 4, 1]. These studies considered a range of base classifiers including decision trees, support vector machines, and neural networks. Most of these studies used the ActiGraph sensor device for data collection, worn at the wrist [7, 9], hip [9, 4], or lower leg [1]. Usually these data sets are collected under close monitoring and carefully annotated. In some cases, feature engineering and data preprocessing were performed to get better results with standard classification algorithms [7, 9, 2]. Most of these prior studies are limited in the sense that their data sets were collected in controlled settings, under close monitoring, for specific sets of activities. Those data sets thus suffer from artificiality and do not represent real-world raw accelerometer data. And since the data sets were collected under close monitoring, non-wear status was essentially never considered. This makes most prior studies inadequate for our purposes, since raw data collected in the field lack annotations of when the sensor device is worn and cover a wider range of activities than controlled studies. Moreover, collecting carefully annotated natural activity data at larger scale is infeasible due to high cost and the artificiality it introduces. Our approach to the wear-time prediction problem is novel in the sense that we use the distant supervision and data programming paradigms, which, to the best of our knowledge, have not been applied to this problem before. Moreover, to the best of our knowledge, this is the first time the data programming paradigm has been studied on time-series data; prior applications involved relation extraction from text data [6].

Figure 1: Triaxial ActiGraph data. Red, green, and blue signals represent each of the three axes. The magenta and black signals represent vector magnitude and step counts, respectively.

3 Methodology

We describe the distant supervision paradigm in which the data programming method operates. Using data programming, our goal is to model a generative process that describes the triaxial accelerometer data and helps us create a training set with noisy labels. We then use this noisy training data to train a noise-aware discriminative model, which can later be used to label new data.

3.1 Problem Setting

Before going into the details of data programming [6], we first describe the problem setting in which it operates. Consider a binary classification problem in which we are concerned with a distribution π over object and class pairs (x, y) ∈ X × {−1, 1}, and we have features f(x) and a family of M labeling functions Λ(x) ∈ {−1, 0, 1}^M, each of which encodes a noisy or weak guess for the true label y. We are given a set of N training examples x ∈ X and have access to their features and labeling function outputs, but not their true labels, during the training process. Our goal is to output a classifier that accurately estimates y.
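To make this setting concrete, the following minimal Python sketch shows one way the labeling functions and the resulting label matrix Λ might be represented for windowed accelerometer data; the window representation and function names are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

# A labeling function maps one example (here, a window of accelerometer
# data) to a vote: +1 (wear), -1 (non-wear), or 0 (abstain).
ABSTAIN, NONWEAR, WEAR = 0, -1, 1

def lf_zero_vector_magnitude(window):
    # Hypothetical heuristic: vote non-wear if the vector magnitude is
    # (almost) always zero in this window, wear if it is rarely zero,
    # and abstain otherwise.
    frac_zero = np.mean(np.asarray(window["vm"]) == 0)
    if frac_zero > 0.95:
        return NONWEAR
    if frac_zero < 0.05:
        return WEAR
    return ABSTAIN

def build_label_matrix(windows, labeling_functions):
    """Apply M labeling functions to N examples, giving an N x M matrix
    Lambda with entries in {-1, 0, +1}."""
    return np.array([[lf(w) for lf in labeling_functions] for w in windows])
```

Each column of the resulting matrix corresponds to one labeling function, and zeros mark the examples on which that function abstains.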

Figure 2: A toy schematic of the data programming paradigm: heuristic labeling functions label a subset of the data, which implicitly defines a generative model. The predictions of this generative model are then used to train a discriminative model.

3.2 Data Programming

It is often the case that there is a particular discriminative model that we would like to train for our problem, since it has performed well on similar problems, but we cannot do so because not enough training data is available. And in the absence of ground truth during the training process, there is no way to naively apply a standard discriminative method such as logistic regression or an SVM. Ratner et al. [6] recently proposed the data programming paradigm to handle such situations. In this paradigm, we first learn a generative model that uses the labeling functions to predict labels for the training set. Then we train a noise-aware discriminative model using the features f(X) and the noisy predicted labels Y_G. The discriminative model fits a classifier Y_D : f(X) → {−1, 1}, which can be used to classify all objects x ∈ X, including the objects not labeled by the labeling functions. The generative model describes the relationship between the labeling functions Λ and the true class Y using a probability distribution of the form:

    π_φ(Λ, Y) = (1/Z) exp(φ^T Λ Y)    (1)

where Z is a partition function that makes π a distribution. During the generative model fitting phase, data programming learns the parameter φ ∈ R^M by maximizing the marginal likelihood of the observed labeling functions Λ. After estimating the parameter φ̂, the generative model can compute π_φ̂(Y | Λ(x)) and assign predicted labels, in the form of marginal likelihoods, to each object x in the training set. Then we train a discriminative model D by minimizing the empirical noise-aware regularized logistic loss described in [6], given the marginal likelihoods from the generative model and the features f(X):

    ℓ_φ̂(w) = E_{(Λ,Y) ∼ π_φ̂} [ log(1 + exp(−w^T f(X) Y)) | Λ ]    (2)
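As an illustration of the generative step, the sketch below fits the parameters of the independent (noise-aware naive Bayes) model in Equation (1) by gradient ascent on the marginal likelihood of the observed label matrix, and then computes the label marginals used to supervise the discriminative model. It is a minimal sketch of the idea under the independence assumption, not the exact optimization procedure of [6].

```python
import numpy as np

def fit_generative_model(L, n_iter=1000, lr=0.01):
    """Fit phi in pi_phi(Lambda, Y) = exp(phi^T Lambda Y) / Z by gradient
    ascent on the marginal log-likelihood of the observed label matrix
    L (shape N x M, entries in {-1, 0, +1}); true labels Y are never used.

    Under this independent model the marginal likelihood has a closed
    form: pi_phi(Lambda) = 2*cosh(phi^T Lambda) / Z, with
    Z = 2 * prod_j (1 + 2*cosh(phi_j)).
    """
    N, M = L.shape
    phi = np.zeros(M)
    for _ in range(n_iter):
        s = L @ phi                                  # phi^T Lambda_i for each example
        grad_data = L.T @ np.tanh(s)                 # d/dphi of sum_i log cosh(phi^T Lambda_i)
        grad_logZ = N * 2 * np.sinh(phi) / (1 + 2 * np.cosh(phi))
        phi += lr * (grad_data - grad_logZ) / N      # ascend the average log-likelihood
    return phi

def posterior_wear_prob(L, phi):
    """Marginal pi_phi(Y = +1 | Lambda) = sigmoid(2 * phi^T Lambda)."""
    return 1.0 / (1.0 + np.exp(-2.0 * (L @ phi)))
```

Under this model both the marginal π_φ(Λ) and the posterior π_φ(Y | Λ) have closed forms, which is what makes the simple gradient ascent above possible.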

Figure 3: Performance of expert algorithms can vary highly across different subjects. Panels plot per-subject F1 scores for (a) Choi and (b) Padaco(X).

We can use gradient descent to solve the above logistic regression problem. In a similar manner, we can convert any discriminative model into a noise-aware one by changing its loss function. The data programming setting differs from the standard supervised learning setting in that we do not need access to true labels for training; instead, we only require noisy heuristic labeling functions. It differs from general weakly supervised learning in that, at training time, we have no access to ground truth for any example.
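A minimal sketch of that gradient descent step is given below, assuming the generative marginals p_i = π_φ̂(Y = +1 | Λ(x_i)) have already been computed; the learning rate, regularization strength, and function names are illustrative choices, not the settings used in our experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_noise_aware_logreg(F, p_pos, n_iter=500, lr=0.1, reg=1e-3):
    """Minimize the noise-aware logistic loss of Equation (2): each example's
    logistic loss is averaged over Y ~ pi(Y | Lambda), i.e. weighted by the
    generative model's marginal p_pos = P(Y = +1 | Lambda).

    F      : N x d feature matrix f(X)
    p_pos  : length-N vector of marginals from the generative model
    """
    N, d = F.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        scores = F @ w
        # Expected gradient of log(1 + exp(-Y * w^T f)) under Y ~ Bernoulli(p_pos)
        # reduces to sigma(w^T f) - p_pos per example.
        grad = F.T @ (sigmoid(scores) - p_pos) / N + reg * w
        w -= lr * grad
    return w

def predict_wear(F, w):
    # +1 = wear, -1 = non-wear
    return np.where(F @ w >= 0, 1, -1)
```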

4 ActiGraph Wear Time Detection

We applied the data programming method to labeling physical activity monitoring data. Specifically, we label sensor wear time and non-wear time based on raw accelerometer data.

4.1 Data Description

We use raw triaxial ActiGraph data collected for 260 children, aged 7-11, over a 7-day period using hip-worn accelerometers. To validate the efficacy of the data programming method for time series labeling, we used data from a monitored sleep study (30 subjects).
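For illustration, the sketch below shows one plausible way such a recording could be loaded and split into the fixed-length windows that the labeling functions operate on; the file name, column names, and epoch length are assumptions, not the actual format of the study data.

```python
import numpy as np
import pandas as pd

# Hypothetical layout of one subject's ActiGraph export: per-epoch activity
# counts on the three axes plus step and lux channels (the column names are
# assumptions for illustration only).
df = pd.read_csv("subject_001.csv", parse_dates=["timestamp"])

# Vector magnitude per epoch, used by several labeling functions.
df["vm"] = np.sqrt(df["x"] ** 2 + df["y"] ** 2 + df["z"] ** 2)

# Split the 7-day recording into fixed-length context windows (e.g. 30
# one-minute epochs) that the labeling functions will vote on.
window_len = 30
windows = [df.iloc[i:i + window_len]
           for i in range(0, len(df) - window_len + 1, window_len)]
```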

4.2 Learning a Generative Model using Expert Algorithms as Labeling Functions

For our first experiment, we employed several state-of-the-art wear-time detection techniques as experts, i.e. as labeling functions, to learn the generative model. Specifically, we use the methods described in Troiano et al. [8], Padaco [5], and Choi et al. [3]. For the first two methods, we apply them to different time series channels: the x, y, and z axes, step counts, and vector magnitude. As shown in Figure 3, the performance of these methods varies considerably across subjects. Then, using the labels generated by these labeling functions, we fit a generative model (noise-aware naive Bayes, as described in [6]) on data aggregated over all subjects. We regard the individual expert labeling functions as baselines and compare our generative model's performance against them. We regard the true known labels (which we never use during any of our training steps) as the oracle. The generative model's accuracy is evaluated on a held-out validation set. Table 1 compares the F1 score, precision, and recall of the baselines and the fitted generative model. We see that, in terms of F1 score, precision, and recall, the generative model outperforms all of the baseline labeling functions. However, the margin between the generative model and the best performing individual expert algorithm, Padaco, is quite slim. Our hypothesis is that this is due to the relatively poor accuracy of the other labeling functions, so that the generative model is largely influenced by the Padaco-based weak classifiers.
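A sketch of how an existing wear-time detection algorithm might be wrapped as a labeling function is shown below; the `detect_nonwear_intervals` interface and the majority-vote rule are illustrative assumptions, since the published Choi, Troiano, and Padaco implementations differ in their inputs and parameters.

```python
import numpy as np

NONWEAR, WEAR = -1, 1

def make_expert_lf(detect_nonwear_intervals, channel):
    """Wrap an existing wear-time detection algorithm as a labeling function.

    `detect_nonwear_intervals` is assumed to take a 1-D count series and
    return a boolean mask marking non-wear epochs (a stand-in interface).
    """
    def lf(window):
        mask = detect_nonwear_intervals(np.asarray(window[channel]))
        # Vote by the majority status of the epochs in this window.
        return NONWEAR if mask.mean() > 0.5 else WEAR
    return lf

# One labeling function per expert method / channel combination, mirroring
# the Choi, Troiano(X/Y/Z/S), and Padaco(X/Y/Z/Magnitude) experts in Table 1.
# `choi_nonwear`, `troiano_nonwear`, and `padaco_nonwear` are placeholders
# for the corresponding implementations:
# labeling_functions = [
#     make_expert_lf(choi_nonwear, "vm"),
#     make_expert_lf(troiano_nonwear, "x"),
#     make_expert_lf(padaco_nonwear, "x"),
# ]
```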

Table 1: Comparison of wear-time detection algorithms (baselines and the generative model).

Method              F1 Score  Precision  Recall
Truth (Oracle)      1.00      1.00       1.00
Choi                0.130     0.980      0.069
Troiano(X)          0.105     0.972      0.055
Troiano(Y)          0.095     0.969      0.050
Troiano(Z)          0.102     0.971      0.054
Troiano(S)          0.062     0.959      0.032
Padaco(X)           0.885     0.997      0.795
Padaco(Y)           0.885     0.997      0.795
Padaco(Z)           0.884     0.997      0.794
Padaco(Magnitude)   0.885     0.997      0.796
Generative Model    0.887     0.997      0.798

The generative model ensemble was able to boost wear-time detection accuracy so that it is better than any of the individual state-of-the-art methods, and it equals the accuracy of an oracle that selects the best predictor for each window.

4.3 Learning a Generative Model using Heuristic Labeling Functions

One of the main advantages of data programming is that it can incorporate simple heuristic labeling functions as weak classifiers in the generative model learning phase. This way, it can extract value from almost any heuristic. So far, this approach has proven highly successful in text relation extraction problems [6], but we would like to see how heuristically defined labeling functions perform on time series data such as triaxial accelerometer data. We therefore define five simple heuristic labeling functions. Each labeling function looks at the 30-minute context window around the time point we want to label, restricted to the channel specific to that labeling function, and calculates the percentage of zeros in that series. Depending on a threshold, it labels the data point as either wear time or non-wear time.
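The following is a minimal sketch of such a zero-fraction labeling function; the threshold value is an illustrative placeholder rather than the one used in our experiments.

```python
import numpy as np

NONWEAR, WEAR = -1, 1

def make_zero_fraction_lf(channel, threshold=0.9):
    """Heuristic labeling function of the kind described above: within the
    30-minute context window, compute the fraction of zero values on one
    channel and vote non-wear if it exceeds a threshold."""
    def lf(window):
        series = np.asarray(window[channel])
        frac_zero = np.mean(series == 0)
        return NONWEAR if frac_zero >= threshold else WEAR
    return lf

# Five heuristic labeling functions, one per channel, as in Table 2.
heuristic_lfs = [make_zero_fraction_lf(c) for c in ("x", "y", "z", "lux", "vm")]
```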

Figure 4: Performance of heuristic labeling functions across different subjects. Panels plot per-subject F1 scores for (a) LF(Lux) and (b) LF(Vector Magnitude).

As shown in Figure 4, our heuristic labeling functions also show varied accuracy across different subjects. Among the five labeling functions, the one that thresholds the fraction of zeros in the vector magnitude performs most consistently across subjects. The other heuristics showed poor performance in general, regardless of the subject. As in the first experiment, we regard the individual labeling functions as baselines and the true known labels as the oracle. The learned generative model's accuracy is evaluated on held-out validation data. From Table 2, we see that in terms of F1 score and recall, the generative model performs much worse than most of the baselines. We suspect that the poor quality of the heuristic labeling functions is preventing the generative model from learning properly.

Even though LF5 (vector magnitude) performs well on its own, its performance does not carry over to the generative model because of the very poor quality of the other four labeling functions.

Table 2: Performance of the generative model trained using heuristically defined labeling functions.

Method                  F1 Score  Precision  Recall
Truth (Oracle)          1.00      1.00       1.00
LF1 (x axis)            0.049     0.967      0.025
LF2 (y axis)            0.003     0.883      0.001
LF3 (z axis)            0.057     0.970      0.029
LF4 (Lux)               0.106     0.969      0.056
LF5 (Vector magnitude)  0.826     0.995      0.707
Generative Model        0.048     0.966      0.024

5 Conclusion and Future Work

We have shown how the data programming paradigm can be applied to time series data labeling, specifically wear-time and non-wear-time detection in triaxial accelerometer data. We showed that using existing state-of-the-art wear-time detection techniques as labeling functions allows us to learn, via data programming, a high-performance generative model that beats all the baselines. We also showed that using very naive and simple heuristic methods as labeling functions yields a poor generative model. Our experiments indicate that data programming has the potential to be highly useful for time-series data, given high-quality labeling functions. For future work, we hope to explore and improve the design process for heuristic labeling functions on time series data. Figuring out how to design robust and high-performing heuristics for time series data will allow us to run a wider range of experiments using data programming. We would also like to explore the potential of noise-aware Tree Augmented Naive Bayes (TAN) and Chow-Liu Bayes nets for learning the generative model. A drawback of the current noise-aware naive Bayes is that it relies on a strong independence assumption among labeling functions and is thus unable to account for potential dependencies among them. TAN and Chow-Liu Bayes nets would allow us to handle this situation. Once we improve our generative model, we will focus on incorporating discriminative models for time series data into data programming. We specifically want to examine the efficacy of a noise-aware Long Short-Term Memory (LSTM) neural network as the discriminative model when trained on the denoised training data generated by the generative model.

Acknowledgments

Thanks to Jason Fries, Madalina Fiterau, Alex Ratner, Jennifer Hicks, and the CS221 course staff for their helpful conversations and feedback.


References

[1] Hamzah S AlZubi, Simon Gerrard-Longworth, Waleed Al-Nuaimy, Yannis Goulermas, and Stephen Preece. Human activity classification using a single accelerometer. In 2014 14th UK Workshop on Computational Intelligence (UKCI), pages 1-6. IEEE, 2014.
[2] Yen-Ping Chen, Jhun-Ying Yang, Shun-Nan Liou, Gwo-Yun Lee, and Jeen-Shing Wang. Online classifier construction algorithm for human activity detection using a tri-axial accelerometer. Applied Mathematics and Computation, 205(2):849-860, 2008.
[3] Leena Choi, Zhouwen Liu, Charles E Matthews, and Maciej S Buchowski. Validation of accelerometer wear and nonwear time classification algorithm. Medicine and Science in Sports and Exercise, 43(2):357, 2011.
[4] Markus Hagenbuchner, Dylan P Cliff, Stewart G Trost, Nguyen Van Tuc, and Gregory E Peoples. Prediction of activity type in preschool children using machine learning techniques. Journal of Science and Medicine in Sport, 18(4):426-431, 2015.
[5] Moore Hyatt. Padaco, 2016 (accessed December 16, 2016). http://web.stanford.edu/~hyatt4/software/padaco.
[6] Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, and Christopher Ré. Data programming: Creating large training sets, quickly. arXiv preprint arXiv:1605.07723, 2016.
[7] Raghavendiran Srinivasan, Chao Chen, and Diane Cook. Activity recognition using actigraph sensor. In Proceedings of the Fourth Int. Workshop on Knowledge Discovery from Sensor Data (ACM SensorKDD'10), Washington, DC, July, pages 25-28. Citeseer, 2010.
[8] Richard P Troiano, David Berrigan, Kevin W Dodd, Louise C Masse, Timothy Tilert, Margaret McDowell, et al. Physical activity in the United States measured by accelerometer. Medicine and Science in Sports and Exercise, 40(1):181, 2008.
[9] Yonglei Zheng, Weng-Keen Wong, Xinze Guan, and Stewart Trost. Physical activity recognition from accelerometer data using a multi-scale ensemble method. In IAAI. Citeseer, 2013.

