Practical reliability data analysis


Reliability Engineering and System Safety 28 (1990) 337-356

J. I. Ansell, Department of Management Systems and Sciences, University of Hull, Hull HU6 7RX, UK

& M. J. Phillips, Department of Mathematics, University of Leicester, Leicester LE1 7RH, UK

(Received and accepted 30 August 1989)

ABSTRACT

The practical problems encountered in Reliability Data Analysis are considered, particularly when censoring is an important feature. Approaches to Statistical Data Analysis rather than specific techniques are described. Careful examination of the data is recommended using simple plotting methods. The importance of the objective of a Reliability Study is emphasized as this must always be kept in mind. Statistical methods should be a tool to achieve a given end. Three examples of Reliability Studies from three different areas of application are used for illustration. These examples consist of data collected from the field and do not cover laboratory-controlled testing.

1 INTRODUCTION

Beale1 commented about the formulation of Linear Programming that 'a general discussion on such topics often promises to be interesting, but in fact turns out to consist of WAFFLE'. Being aware of such sentiments it is to be hoped that this pitfall can be avoided in attempting to describe the broader area of Reliability Data Analysis. For any particular data set, depending on the context and the objective of the analysis, it is possible to describe a specific approach to the data. However, there is still a need for general guidance. Also there have been

developments recently in techniques applied to data which are worth describing and commenting on.

This paper does not intend to cover all possible applications of Statistics or Probability to Reliability, nor does it intend to attempt to cover the whole field of Reliability Analysis. The objective is to describe elements of Statistical Data Analysis of use in Reliability, it is hoped in a manner which gives the flavour of modern approaches. Rather than displaying a set of techniques on their own it is more important to demonstrate the techniques through case studies. This heightens the questions which surround the approach to Data Analysis as well as the application itself. Obviously a single paper cannot describe all the possible techniques, nor for each example can all plausible analyses be explored. Hence one approach may be applied to only one set of data, though if applicable it could equally have been applied to the others.

Before embarking on the detail it is important to decide the role of Statistical Data Analysis within a Reliability Study. However, to do this it is necessary to suggest what the aims of a Reliability Study might be. Again the aim will depend on several factors, such as who has instigated it, why they have done so, etc. An analysis may be to establish whether a system has achieved a specific level of performance, or it may be to assess the likelihood of achieving a successful mission. Identification of where improvement can be made is also a frequent objective. This does not cover all possible aims, but they highlight the differing goals which will often imply different analyses. Many of the plausible goals in Reliability Studies will be achieved without the need of any Statistical Analysis; others will require specific forms. Rarely will the Statistical Analysis play the central role. Where Statistical Analysis is required, its role may be in assessment, identification or prediction.

Assessment is either the estimation of the component or system lifetimes or the probability of successful completion of a task. Identification is the process of identifying the significant features which affect lifetimes, or of components in order to enhance performance. Prediction is the extrapolation into the future based on the past history. Some may disagree with such comments but they should reflect whether they are extending statistics beyond its bound or unwisely limiting its role. The paper starts with some general comments on approaches to the analysis and then considers the case studies.

2 GENERAL APPROACHES TO RELIABILITY DATA ANALYSIS

It will be assumed that the analyst has an initial objective. This will define the strategy for analysing the data and may also suggest the end point of the


analysis. Ascher & Feingold2 produced a flow-chart to indicate possible stages in an analysis. Such an approach is too rigid. It is also doomed to failure because of the very size of the flow-chart required to cover all eventualities. Our preferred approach is closer to Tukey's Exploratory Data Analysis.3 Simple plots and statistics should be used at the initial stage, followed by more detailed analyses based on the earlier work. Obviously the data should not be allowed to define the whole analysis; the objective must always play a significant role.

The initial stage of data analysis is inspection and validation. The data have to be checked where possible for inconsistencies and possible errors. This should include checking that observed lifetimes are usually less than the period of study, that for renewable systems the sum of lifetimes are less than the period of study, etc. Repetitions of values should be checked to see whether they are genuine repeats. Similarly, associated values should be checked.

The next stage is to plot the data. Fortunately plotting has a long history in Reliability Analysis but it is still important to emphasize that one should start with simple plots. Even the stemleaf can be very informative in a Reliability Analysis, for it will possibly indicate the homogeneous nature of the data and whether it is plausible that there is more than one population. It may also indicate the presence of 'outliers' and will show the skewness of the sample. Other simple plots such as dot plots (see Section 4.1) and Pareto diagrams (see Section 4.2) can be very helpful. Depending on the data it may be possible to investigate bivariate relationships. For repairable systems a bivariate scatterplot of lifetimes against lagged lifetimes may shed some light on dependency. Usually one starts in such cases with a lag of one and, through a series of graphs, increases the lag.

There will be loss of information as the lagging increases, which means eventually the plotting becomes meaningless. Similarly, if there is more than one variable in the data, then again scatterplots or box-whisker diagrams may indicate differences in the data. The statistics should again start with the basics such as mean, median and variance (or standard deviation). The index of dispersion is also invaluable in giving guidance on IFR (Increasing Failure Rate) or DFR (Decreasing Failure Rate). For repairable systems data, the next stage might be to test for trend and dependency. There are a number of tests in the literature for trend, e.g. Laplace4 and MIL-HDBK-189.5 Unfortunately no test provides general defence against all departures from constancy. As Stephens6 has pointed out, the Laplace test may indicate trend when there is none if the coefficient of variation is large, e.g. the Weibull and Gamma distributions with shape


parameters less than unity. The objective of the study may guide one to the most appropriate test or tests. For dependency Cox & Lewis4 suggested that the correlation coefficients provide an adequate indication.

At this stage more specific distributional forms will become appropriate to the study. Again, plotting techniques such as Weibull or TTT (Total Time on Test) plots are useful. Obviously the plotting techniques chosen will assume some distributional form. Interpretation of such plots has many pitfalls. It should be remembered that the objective of the analysis must play a very significant role in interpretation. One of the most serious pitfalls is reading too much into the plot. Certainly the departure of the extreme points from any predicted line is usually not significant. TTT-plots, suggested by Barlow & Campo,7 have recently received attention since they may be helpful in deciding whether a distribution is DFR or IFR. Suppose that X(1), X(2), ..., X(n) is an ordered sample of failure times, X(1) < X(2) < ... < X(n). The ith TTT statistic T_{i,n} is defined by

T_{i,n} = X(1) + X(2) + ... + X(i-1) + (n - i + 1) X(i)

and the ith scaled TTT statistic u_{i,n} is defined by u_{i,n} = T_{i,n} / T_{n,n}.

The TTT-plot is then obtained by plotting u_{i,n} against i/n. If the distribution is IFR (DFR) then the plot should be concave (convex). However Barlow8 has suggested that such plots can be very misleading when the data are censored, as frequently happens in survival analysis. Lawless9 indicated that similar information can be obtained from the standard Weibull plot. Beyond this stage any analysis becomes too specific for the particular data set. The case studies will now be considered.

3 CASE STUDIES

For the purpose of this paper the data consist of lifetimes which have been collected from the field. There is an important role for laboratory-controlled data when they are available but a considerable amount of Reliability data are from the field. The data may be collected on a component or on a system. As well as the lifetimes there may also be associated data on other variables. These accompanying data may qualitatively describe the context or quantitatively describe covariates. Data may be left- and right-censored and values may be missing. Right censoring is usually unavoidable in the analysis of field (service) data, and techniques applied to this kind of data must take this into account. It is common that a high percentage of the observations are censored.
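For a complete (uncensored) sample the statistics above are straightforward to compute. The sketch below is an illustration of the definition only, not of any code used in the paper, and as noted censored data need more care.

```python
import numpy as np

def scaled_ttt(times):
    """Scaled total-time-on-test statistics u_{i,n} = T_{i,n} / T_{n,n}
    for a complete (uncensored) sample, following the definition above."""
    x = np.sort(np.asarray(times, dtype=float))
    n = len(x)
    # For the (i+1)-th order statistic x[i] (0-based indexing):
    # T_{i+1,n} = x[0] + ... + x[i-1] + (n - i) * x[i]
    t = np.array([x[:i].sum() + (n - i) * x[i] for i in range(n)])
    return t / x.sum()  # T_{n,n} is the sum of all observations
```

Plotting these values against i/n gives the TTT-plot: points near the diagonal suggest the exponential, while a concave (convex) plot suggests IFR (DFR).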


Three sets of data are considered though they do not cover all the features that Reliability data may possess. As far as possible for each data set the practical setting is described as well as the client's objective. It is unfortunately the case for the development of the application of statistics to Reliability that commercial sensitivity is often used as an excuse for the lack of published data. This is being corrected, but still many novel and useful techniques are not explored through lack of data.

3.1 Process event data

A company wishes to be able to predict the rates of events which affect the reliability of their processes. The rate of occurrence is assumed to be related to certain variables, either directly measured or derivable. The aim of the study is to ascertain whether the rate was dependent on these variables and to predict the rate of occurrence. The data consist of dates of specific events during a six-month period with measures on two covariates. For economy of presentation only the days are presented in Table 1; the covariates can be obtained directly from the authors.

TABLE 1
Days on which an Event Took Place for the Event Process with Covariates

30   31   34   47   55   62   64   66   68  101
102  103  104  105  108  112  115  117  118  122
124  125  126  127  129  130  131  132  135  136
137  139  141  142  143  150  152  157  165  166
167  170  171  173  180

Source: Ref. 20.
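As an illustration of the trend testing discussed in Section 2, the Laplace statistic for the Table 1 events can be computed directly. The 183-day observation window is an assumption on our part (the text says only that the study covered a six-month period), and the sign of the statistic depends on the centering convention; the paper quotes -3.4, and the magnitude below agrees.

```python
import math

# Event days from Table 1.
days = [30, 31, 34, 47, 55, 62, 64, 66, 68, 101,
        102, 103, 104, 105, 108, 112, 115, 117, 118, 122,
        124, 125, 126, 127, 129, 130, 131, 132, 135, 136,
        137, 139, 141, 142, 143, 150, 152, 157, 165, 166,
        167, 170, 171, 173, 180]

def laplace_statistic(event_times, period):
    """Laplace trend test for a point process observed on (0, period].
    Under a homogeneous Poisson process the statistic is approximately
    standard normal; a large absolute value suggests trend."""
    n = len(event_times)
    mean_time = sum(event_times) / n
    return (mean_time - period / 2) / (period * math.sqrt(1 / (12 * n)))

u = laplace_statistic(days, 183)  # assumed six-month window of 183 days
```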

3.2 Electronic system data

A manufacturer of electronic equipment decides to assess the performance of the systems supplied to a customer. The supplier's data in this case will consist of the failure times. The customer's data will consist of the failure-free times. This is because whilst the manufacturer is under an obligation to repair failed systems for his customer and will therefore have information about these failed systems, the customer is under no obligation to supply information about systems which are performing satisfactorily. This information must be obtained from the customer in some direct or indirect way. The information from both sources is combined to produce the data for 100 systems. The method of data collection is summarized in Table 2.

TABLE 2
Method of Data Collection for the Repairable Electronic Systems (a)

Collector      Data                            Statistics
Manufacturer   Failures (bad news)             Events
Customer       Failure-free time (good news)   Censored times

(a) The systems were progressively introduced into service and not operated continuously or in a uniform manner.

Another aspect of the data is that calendar time is not a useful metric since the system is not used throughout the period. The time taken to repair the system is not taken into consideration as these times are short compared with the average time between failures. The aim of the study is to decide whether the systems satisfy specification, or need a major modification or a complete redesign. When a failure of the system occurs in service, the electronic module (subsystem) which caused the failure is identified and is then replaced by a new module and the system is returned to service. So there is a record for each


Fig. 1. Example of the data from the repairable electronic systems. ×, failure; ®, last time withdrawn; --, failure-free time. (Time scale in days.)

Fig. 2. Method of data collection for the mechanical equipment fitted to a fleet.


system which consists of the failure times for each failure with the appropriate serial number of the replaced module. The record also contains an estimate of the last time at which the system was withdrawn from service before the end of the period of data collection. An example of these data for a subset of 10 of the 100 systems is presented in Fig. 1.

3.3 Fleet mechanical equipment data

The operator of a fleet decides to assess the performance of mechanical equipment fitted to the ships from data which have been collected from the operation of the fleet over a number of years. The aim of the study is to decide about the future replacement or repair of the fleet equipment. The problem was described by Triner.10 The data consist of the number of events (failures) X(t_{i-1}, t_i) in the ith (1 ≤ i ≤ s) period (t_{i-1}, t_i), assuming instantaneous repairs. The number of failures in the initial period (0, t_0) is not known. The method of data collection is shown in Fig. 2.

4 ANALYSIS OF RELIABILITY DATA

In Section 2 general approaches to the data analysis were considered. In this section the approach is more specific for each data set. The same analyses for each data set will not be repeated unless they are important. Further suggestions for analysis which may be carried out on the data sets will be made. This is done for two reasons: firstly to illustrate that statistical analysis is an iterative process. Having applied a technique to a data set it is often the case that another technique will become appropriate. Secondly the objective of the analysis may have already been achieved and intellectual curiosity becomes the only reason for continuation.

4.1 Process event data

As previously indicated, the initial stage is to examine the raw data. Figure 3 is the stemleaf for the times between events. The plot is positively skewed and does indicate the possibility of two 'outliers'. An 'outlier' may be due to errors in collection but it may also be very informative about the process. At such an early stage of the analysis it would seem a little unwise to eliminate these points, until further evidence weighs against them. From the stemleaf it is not possible to assess whether the data are under- or over-dispersed compared with the exponential. The index of dispersion indicates that the


data are over-dispersed with the two 'outliers' and under-dispersed without them (Fig. 3 gives the stemleaf of the times between the process events). The nature of the process under study and the dot plot of the data in Fig. 4 do indicate quite clearly that there is a non-constant failure rate for the process. Calculation of the Laplace statistic yields a value of -3.4 which does indicate trend. A non-stationary model is appropriate.

The objective of the analysis is to assess the frequency of events, and to ascertain whether these are related to the supplied covariates or other factors. Since the covariates vary with time, a model based on them will be non-stationary. There has been considerable interest throughout the statistical literature on the effect of covariates on lifetimes.11,12 This has arisen from the papers of Cox13,14 on Proportional Hazards Modelling. A number of other authors have made a significant contribution.15,16 There have been a number of applications in Reliability. The Proportional Hazards Model is a specific model, not a general one. It therefore has to be treated with a degree of care when applied to data.
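The index of dispersion used above can be taken, for inter-event times, as the squared coefficient of variation, which equals 1 for the exponential; values above 1 suggest over-dispersion (DFR-like behaviour) and values below 1 under-dispersion. A minimal sketch (the toy data are ours, not the paper's):

```python
def dispersion_index(intervals):
    """Squared coefficient of variation of inter-event times:
    1 for the exponential, > 1 over-dispersed, < 1 under-dispersed."""
    n = len(intervals)
    mean = sum(intervals) / n
    var = sum((x - mean) ** 2 for x in intervals) / n
    return var / mean ** 2
```

As with the Table 1 data, removing one or two large 'outlying' gaps can flip the classification from over- to under-dispersed.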

Fig. 4. The dot plot of the times between the process events. (Horizontal axis: days between occurrence of events, 0.0 to 30.0.)


The technique derived from the model is usually applied in an exploratory manner. The aim is usually to highlight the significant variables or factors which affect lifetimes. Only secondarily has it been used to remove the effect of covariates in order to estimate the underlying distribution of the lifetimes (see Ref. 17). Fortunately work by Solomon18 appears to indicate that the technique will detect the significance of variables when the model is not appropriate. The result does not cover all possible departures from the model, but it reinforces the use of the technique.

Application of Proportional Hazards Modelling needs care. The current data consist of two covariates which are measured daily throughout the period. The events are recorded as having occurred or not occurred on a day. Should the covariates be treated as affecting the events only on the day of occurrence or cumulatively? The first model suggests an instantaneous effect of the covariates, the second suggests a building up of the effect over time. The nature of the process is such that either model is appropriate. The analysis must reflect this and therefore includes both the covariates on the day of event and the cumulative measure. Also, since there is a belief that there is some time dependency, a term for time is incorporated. The results in Table 3 seem to suggest that there is an effect to be noted.

Having established that the covariates have an effect, which achieves one objective, it is necessary to contemplate their incorporation so as to be able to make predictions. This second objective proves to be more problematic. The Proportional Hazards Model is a specific model, as stated earlier, and if it is to be used for prediction it is necessary to establish its appropriateness. The model assumes the proportionality of the hazard function to the covariates. This can be examined by considering the residuals derived from the model (see Ref. 19).
The model is found to be inappropriate since if the model was correct the residuals should be independent and identically distributed with the exponential distribution. This is not the case, as can be seen from Fig. 5; hence an alternative model needs to be sought. Even if the Proportional Hazards Model had proved to be acceptable, applying it would not be easy. One can estimate the underlying distribution function, and it may be possible then to fit a distributional form. The next step is to incorporate the covariates into the distribution. This proves far from easy when they vary with time. A solution involving simulation may be the only way forward. Given the client's desire for a solution, an alternative model which seems adequate is the logistic. This can be formulated as follows:

P(event on ith day) = exp(βᵀz) / (1 + exp(βᵀz))

TABLE 3
Results of Proportional Hazards Analysis for the Event Process with Covariates (a)

Model                 bP             bQ             Interaction     Deviance
Null                  -              -              -               274.23
P + Q                 3.00 (2.56)    -0.05 (0.09)   -               266.01
P + Q + P.log(t)      6.68 (2.76)    6e-3 (0.09)    -5.80 (1.54)    204.87
P + Q + Q.log(t)      2.26 (2.12)    0.27 (0.11)    -0.35 (0.08)    232.13

                      bPc            bQc
Pc + Qc               -2.54 (0.69)   -0.09 (0.03)   -               196.14
Pc + Qc + Pc.log(t)   -1.29 (1.46)   -0.06 (0.03)   -0.77 (0.80)    195.10
Pc + Qc + Qc.log(t)   -2.54 (0.69)   -0.09 (0.05)   0.01 (0.03)     196.10

Entries are coefficients (standard errors).

(a) The Proportional Hazards Model has a hazard λ(t; z) at time t with covariates z given by

λ(t; z) = exp(bᵀz) λ0(t)

where λ0(t) is the base-line hazard and b are the parameters to be estimated. The two covariates are denoted by P and Q, and the respective cumulative measures are denoted by Pc and Qc. Source: Ref. 20.
Fig. 5. Residual plots of the event process with covariates data for the proportional hazards model: (a) when fitting P and Q; (b) when fitting Pc and Qc. (Source: Ref. 20.)


where z is a column vector representing the conditions on the ith day and βᵀ is a row vector of parameters. Estimation of the parameters can again be carried out by Maximum Likelihood methods using the statistical package GLIM. An alternative analysis which proves equally fruitful is to consider using a non-stationary model as described by Ansell & Phillips.20 One can then compare the change in failure rate with the values of the covariates to see if there is a relationship. This has a similar effect to the above model.
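A sketch of the kind of maximum-likelihood fit GLIM would perform for the logistic model, via Newton-Raphson. The covariate values here are synthetic (the real P and Q series are not reproduced in the paper), so only the mechanics are illustrative.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    X: (n, p) design matrix including an intercept column; y: 0/1 events."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # P(event) = e^{b'z}/(1+e^{b'z})
        W = p * (1.0 - p)                          # iteratively reweighted least squares weights
        H = X.T @ (W[:, None] * X)                 # observed information matrix
        beta += np.linalg.solve(H, X.T @ (y - p))  # Newton step
    return beta

# Synthetic daily data standing in for the unavailable covariates.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = np.hstack([np.ones((200, 1)), z])
true_beta = np.array([-1.0, 2.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
beta_hat = fit_logistic(X, y)
```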

4.2 Electronic system data

There are two possible levels of analysis, the systems level and the module level. Since the failures are occasioned by the modules it seems appropriate to start the analysis at that level. Starting with simple plots, a Pareto plot of the total number of failures of a module against module number ranked by frequency of failure seems initially most helpful; see Fig. 6. This immediately highlights for the management of the manufacturing company the modules which are performing badly and hence those where initially resources should be focused. Four modules contributed 50% of the failures, whereas three modules did not fail at all during the study. This may be regarded as having already achieved one aim of the study.


Fig. 6. Cumulative number of failures for modules of the repairable electronic systems.
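The Pareto ranking behind Fig. 6 amounts to sorting modules by failure count and accumulating shares of the total. Per-module counts appear in Table 4 only for modules 1-4; summing all the failure counts in Table 4 gives 277, so the 50% claim for the four worst modules can be checked directly.

```python
def pareto_shares(failures_per_module):
    """Modules ranked by failure count with the cumulative share of the
    total: the coordinates of a cumulative Pareto plot like Fig. 6."""
    ordered = sorted(failures_per_module.items(), key=lambda kv: -kv[1])
    total = sum(failures_per_module.values())
    out, running = [], 0
    for module, count in ordered:
        running += count
        out.append((module, count, running / total))
    return out

# Modules 1-4 from Table 4; 277 is the sum of all failure counts in Table 4.
top_four = 52 + 41 + 27 + 19
share = top_four / 277
```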


The next stage is to see what else may be gained from the current data. Obviously initially it is best to concentrate on the modules with most failures. This would mean initially producing stemleaf diagrams, means, variances and indices of dispersion. Having assumed nothing of note in this initial analysis it is necessary to delve more deeply. The next stage may be to consider the distributional form and whether it is IFR or DFR. Could the data be treated as being from a Weibull population? Figures 7 and 8 are the TTT-plot and the Weibull plot, respectively, of the first module. Both plots indicate strongly that the data come from a DFR distribution. The Weibull plot is similar to many empirical Weibull plots and the Cramer-von Mises statistic suggested by Koziol & Green21 indicates that the Weibull distribution is appropriate. Since the modules are DFR, then one should replace on failure and not before.

At this stage it is appropriate to comment again on over-interpretation. A Weibull plot for the second module was given by Ansell & Phillips.20 It is plausible to suggest from that plot that the data illustrate the 'traditional history' of a system/component: an early burn-in period followed by a period of near-constant failure and then a wear-out phase. Alternatively one could suggest a non-homogeneous population. The apparent pattern might be accounted for by differing batches. Such comments cannot unfortunately

Fig. 7. Total time on test (TTT) plot of failure data of module 1 of the repairable electronic systems. (I is the index of the ordered TTT statistic.)

Fig. 8. Weibull probability plot of failure data of module 1 of the repairable electronic systems with best-fit line. (Time scale in days.)

be supported by the evidence available from the data. Without supportive evidence the danger arises of fitting too complex and possibly inappropriate models. If one accepts either model one would split the data into three or more groups of observations, fitting separate models to each group. Optimistic estimates may well arise from such an analysis. It is also far removed from the aim of the study. The failure data for another 17 of the modules were analysed in a similar way to that outlined above for the first module. As a result of this analysis of the 18 highest failing modules the Maximum Likelihood Estimates (MLEs) indicate that there is no significant difference between the failure time distributions of some of these modules. The four highest failing modules (1-4) were treated separately but the next 14 highest failing modules (5-18) were grouped into four groups. The previous analysis of failure times was repeated for these four groups and the results are given in Table 4. The remaining 14 modules had four or fewer recorded failures per module. With such sparse data it did not seem reasonable to perform any elaborate model fitting. Instead these modules were assumed to all have failure times from a common negative exponential distribution. This result is also included in Table 4. There are two groups (A and B) of fairly reliable modules which can be fitted by negative exponential distributions with medians of about 800 and

TABLE 4
MLEs of the Parameters of the Weibull and Negative Exponential Distributions for the Failure Times of the Grouped Modules of the Repairable Electronic Systems

Group  Serial no. of module  Number of failures  Shape parameter (S.E.)  Scale parameter  Median
-      1                     52                  0.7102 (0.0852)         275.3            164.3
-      2                     41                  0.8996 (0.1194)         283.3            188.5
-      3                     27                  0.7564 (0.1282)         593.8            365.8
-      4                     19                  0.9266 (0.1821)         623.4            419.8
A      7, 9, 14              26                  1.2200 (0.2083)         792.3            586.8
B      8, 10, 13, 15, 16     36                  0.9114 (0.1361)         1854.5           1240.4
C      5, 6, 12              31                  0.6428 (0.1061)         3462.0           1957.4
D      11, 17, 18            20                  0.5261 (0.1094)         17039.6          8490.1
-      19-32                 25                  1.0                     5845.3           4051.7
1000. Then there are two groups (C and D) of reliable modules, which exhibit a high initial hazard rate, which can be fitted by Weibull distributions with shape parameters of about 0.6 and 0.5 respectively and medians of about 2000 and 8500.

The analysis of these failure data illustrates some practical problems. Firstly any system of electronic modules will typically contain many modules and some of these will have few failures. In practice, to do any parameter estimation it is necessary to group together similar modules. There are some dangers in doing this on the basis of the data rather than for prior physical/engineering reasons. Secondly there is the problem of highly censored data. Any plotting methods used for goodness-of-fit must take this into account. Thirdly it is possible to obtain estimates of the parameters of Weibull distributions easily but obtaining standard errors of the estimates is not straightforward. If this model fitting is acceptable, the final problem is obtaining the failure rate for each module and combining them to obtain the failure rate of the system assuming that the modules operate independently. The failure rate is far from easy to obtain for certain distributions. Baxter et al.22 have produced tables for the Weibull distribution but these unfortunately are incomplete and not easily accessible. Hence for many distributions the only possibility for a parametric approach is to use asymptotic formulae23 which may not be sufficiently accurate. This was done by Ansell & Phillips.20 A Superposed Renewal Process model for the system was assumed: hence summing over the five negative exponential distributions gave an estimated failure rate of 0.014, i.e. 14 failures per 1000 time units. For the four Weibull


distributions the estimate of H(t), the expected cumulative number of failures at time t, using the asymptotic formula was given by

H(t) ≈ 0.005t + 8.4

So there was a failure rate of 0.005, i.e. 5 failures per 1000 time units, plus 8.4 failures due to early failures because of the form of the Weibull distribution. Combining the two failure rates, the estimate of the total expected cumulative number of failures for the electronic system was given by

Ĥ(t) ≈ 0.019t + 8.4

So the estimate of the asymptotic failure rate for the system was 19 failures per 1000 time units. An alternative is to use a non-parametric approach. Nelson24 suggested plotting the mean cumulative number of failures. The plot for the electronic data is given in Fig. 9 with upper and lower limits. These have been obtained by using a method of obtaining a non-parametric equivalent of the Kaplan-Meier estimate and Greenwood's variance for recurrence data suggested by Nelson. The limits are calculated by adding and subtracting twice the standard errors. This plot suggests that the estimate of the failure rate of 19 per 1000 time units is about right, but that the 8.4 failures due to early failures is much too large (pessimistic).
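The combination of rates can be reproduced from Table 4. Which fits were counted among the 'five negative exponential distributions' is our reading of the text (modules 2 and 4, whose shape parameters are close to 1, groups A and B, and the pooled modules 19-32); with that reading the quoted rates of 0.014, 0.005 and 0.019 all come out.

```python
from math import gamma

# Grouped fits from Table 4, as (number of modules, scale) for the
# exponential components and (number of modules, shape, scale) for the
# Weibull components. The split into exponential/Weibull is an assumption.
exponential = [(1, 283.3), (1, 623.4), (3, 792.3), (5, 1854.5), (14, 5845.3)]
weibull = [(1, 0.7102, 275.3), (1, 0.7564, 593.8),
           (3, 0.6428, 3462.0), (3, 0.5261, 17039.6)]

# Asymptotic failure rate of a superposed renewal process: one term 1/mean
# per module, where the Weibull mean is scale * Gamma(1 + 1/shape).
rate_exp = sum(n / scale for n, scale in exponential)
rate_wei = sum(n / (scale * gamma(1 + 1 / shape)) for n, shape, scale in weibull)
total = rate_exp + rate_wei
```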

Fig. 9. Mean cumulative number of failures for the electronic data, with upper and lower limits. (Time scale in days.)


This analysis illustrates the problem that estimating the failure rate is difficult for the Weibull distribution. The usefulness of the asymptotic formulae applied by Ansell & Phillips20 depends on whether they can be used for reasonably small t or not. As they have been obtained by considering different failure rates and the rate of convergence varies over the different functions it is difficult in any application to be sure. But until methods of easily evaluating the failure rate are readily available this seems to be the best that can be done using a parametric approach. The alternative is to use a non-parametric approach.
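Nelson's mean cumulative function is simple to compute: at each failure time the estimate rises by the number of failures divided by the number of systems still under observation. A sketch, with illustrative data of our own rather than the paper's:

```python
def mean_cumulative_function(histories):
    """Nelson's non-parametric mean cumulative number of failures.
    `histories` is a list of (failure_times, censoring_time) pairs, one per
    system; at each failure time t the MCF increases by
    (failures at t) / (systems still under observation at t)."""
    events = sorted(t for times, _ in histories for t in times)
    mcf, value = [], 0.0
    for t in events:
        at_risk = sum(1 for _, c in histories if c >= t)
        value += 1.0 / at_risk
        mcf.append((t, value))
    return mcf
```

Confidence limits like those in Fig. 9 additionally need Nelson's variance estimate, which is not sketched here.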

4.3 Fleet mechanical equipment data

This final set of data illustrates that often after the initial analysis a very specific form of analysis is chosen. In this case a specific stochastic model is chosen because of the nature of the data and also because the model seemed to be plausible within the physical context. The model chosen for the analysis is a Birth-Immigration process with two parameters: α, the rate of failures due to external causes (the immigration rate), and β, the rate of failures due to wearing out (the birth rate). For this Birth-Immigration process model the MLEs α̂ and β̂ were found by Triner & Phillips,25 who obtained α̂ = 0.5391 and β̂ = 0.2117.

The objective of the study is to enable decisions to be made about the repair/replacement policy. For this reason it would seem useful to have an estimate of the failure rate h(t) at time t. Hence, using the above MLEs, the estimated failure rate is given by

ĥ(t) = 2.1564 exp(0.2117t)

This estimate of the failure rate is plotted for t in the interval [0, 12] in Fig. 10. The distributions of the MLEs α̂ and β̂ were investigated by simulation of the Birth-Immigration process with parameters α = 0.5391 and β = 0.2117, the observed values of the MLEs. The results of simulations indicate that, though β̂ is likely to be an unbiased estimate, α̂ is likely to be a biased estimate. The two MLEs have a negative correlation coefficient. An approximation for the standard deviation of ln(ĥ(t)) as a function of α and β was obtained in terms of var(α̂), var(β̂) and cov(α̂, β̂). Ignoring any effect of bias for α̂ and β̂, upper and lower limits are given in Fig. 10 for h(t). This is done by using ln(ĥ(t)) and adding and subtracting twice the standard error of ln(ĥ(t)) using the simulated values obtained by Triner & Phillips.25 This shows a slowly increasing interval between the limits up to about time 6 which then rapidly increases after time 9. Estimates of the variances of MLEs can usually be obtained using

Fig. 10. Estimated failure rate, with upper and lower limits, for the aggregate data from the mechanical equipment fitted to a fleet. (Time scale in years.)

standard methods from the information matrix. However, for the Birth-Immigration process the usual condition of independence of observations does not apply. Sweeting 26 indicated that the use of the information matrix may still be justified because of alternative 'asymptotics'.
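The standard recipe referred to here is to invert the observed information matrix (the negative Hessian of the log-likelihood at the MLE) to approximate the covariance matrix of the estimates. The numerical matrix below is hypothetical and for illustration only; note that a positive off-diagonal information entry yields a negative covariance between the two estimates, consistent with the negative correlation of α̂ and β̂ reported above.

```python
import numpy as np

def mle_covariance(observed_information):
    """Approximate covariance matrix of the MLEs as the inverse of the
    observed information matrix (negative Hessian of the log-likelihood
    evaluated at the MLE)."""
    return np.linalg.inv(observed_information)

# Hypothetical observed information for (alpha, beta), illustration only.
info = np.array([[40.0, 12.0],
                 [12.0, 25.0]])

cov = mle_covariance(info)
se_alpha, se_beta = np.sqrt(np.diag(cov))
print(cov, se_alpha, se_beta)
```

For independent observations this approximation is justified by the usual large-sample theory; for the Birth-Immigration process it rests on the alternative asymptotics indicated by Sweeting. 26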

5 CONCLUSIONS

A general review of Reliability Data Analysis in a brief paper is an ambitious project. The pitfalls are obvious: either the inclusion of too many unconnected ideas and concepts, or the exclusion of too much. However, the task gives the authors time to reflect on the current state and the reader the opportunity to disagree. The views presented will, of course, always be subjective. It is to be hoped that others will continue the debate initiated by this paper. This paper describes approaches to Statistical Data Analysis rather than pursuing specific techniques; hence the significance attached in the present paper to the objective, or objectives, of a study, which is often missing in statistical reviews. The paper does not widen its scope to cover the definition of the objective, as this impinges on the management of Reliability Studies. Other authors may feel more confident in addressing that topic.


Given a clear definition of the objectives, some attempt can be made to achieve them. As an attempt is being made to describe an approach rather than techniques, the need for practical examples is crucial. The examples are taken from the authors' experience of Data Analysis and, where possible, an attempt is made to describe the context. The context, like the objective, plays a major role in the analysis. The physical context must always be used as an aid to analysis.

The next point to be emphasized is the need for careful examination of the data before choosing a technique. The emphasis is on simple approaches initially, as basic statistics can frequently expose more than over-powerful techniques. However, when censoring is an important feature, this must be taken account of in the methods used. A 'cookbook' approach is not to be advocated, but the following points should be highlighted.

(a) There should be a careful check (validation) of the data.
(b) Simple plots of the data should be produced.
(c) Simple statistics such as means, variances and indices of dispersion should be computed.
(d) Trend and dependency should be investigated.
(e) Plots of failure time distributions should be obtained.
(f) Finally, more specific modelling techniques such as Proportional Hazards, Multivariate Analysis or Time Series Analysis can be used.

Before attempting (f) it is important to have spent time and resources on (a) to (e). Finally, it is necessary to reiterate that Statistical Analysis is solely a tool to achieve a given end. The objective should always be kept in mind in the analysis. The objective is not the Statistical Analysis but the Reliability Study.
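Step (d) above can be illustrated with the standard Laplace test for trend in a series of events (Cox & Lewis 4): under a homogeneous Poisson process on (0, t_end] the statistic is approximately standard normal, and large positive values indicate an increasing rate of occurrence of failures. The event times below are hypothetical.

```python
import math

def laplace_trend_statistic(event_times, t_end):
    """Laplace test statistic for trend in a point process on (0, t_end]:
    u = (mean(event_times) - t_end/2) / (t_end * sqrt(1 / (12 n))).
    Approximately N(0, 1) under a homogeneous Poisson process."""
    n = len(event_times)
    mean_time = sum(event_times) / n
    return (mean_time - t_end / 2.0) / (t_end * math.sqrt(1.0 / (12.0 * n)))

# Hypothetical failure times bunched late in a 10-unit observation window,
# suggesting a deteriorating (increasing-rate) system.
print(laplace_trend_statistic([8.0, 9.0, 9.5], 10.0))
```

A statistic of this size (well above 1.96) would prompt a trend model such as a non-homogeneous process in step (f), rather than fitting a single failure time distribution as if the events were identically distributed.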

REFERENCES

1. Beale, E. M. L., Mathematical Programming in Practice. Pitman, London, 1968.
2. Ascher, H. & Feingold, H., Repairable Systems Reliability. Marcel Dekker, New York, 1984.
3. Tukey, J., Exploratory Data Analysis. Addison-Wesley, Reading, MA, USA, 1977.
4. Cox, D. R. & Lewis, P. A. W., The Statistical Analysis of Series of Events. Methuen, London, 1966.
5. US Army Communications Research and Development Command, Reliability growth management. MIL-HDBK-189, US Army, Fort Monmouth, NJ, 1981.


6. Stephens, M. A., Contribution to the discussion of Ansell, J. I. & Phillips, M. J., Practical problems in the statistical analysis of reliability data. Appl. Statist., 38 (1989) 235-6.
7. Barlow, R. E. & Campo, R. E., Total time on test processes and applications to failure data analysis. In Reliability and Fault Tree Analysis, ed. R. E. Barlow, J. B. Fussell & N. D. Singpurwalla. SIAM, Philadelphia, USA, 1975, pp. 451-81.
8. Barlow, R. E., Contribution to the discussion of Ansell, J. I. & Phillips, M. J., Practical problems in the statistical analysis of reliability data. Appl. Statist., 38 (1989) 239.
9. Lawless, J. F., Contribution to the discussion of Ansell, J. I. & Phillips, M. J., Practical problems in the statistical analysis of reliability data. Appl. Statist., 38 (1989) 236-7.
10. Triner, D. A., The assessment of fleet equipment reliability. Reliabil. Eng., 14 (1986) 63-74.
11. Kalbfleisch, J. D. & Prentice, R. L., The Statistical Analysis of Failure Time Data. Wiley, New York, 1980.
12. Lawless, J. F., Statistical Models and Methods for Lifetime Data. Wiley, New York, 1982.
13. Cox, D. R., Regression models and life tables (with discussion). J. R. Statist. Soc. B, 34 (1972) 187-220.
14. Cox, D. R., Partial likelihood. Biometrika, 62 (1975) 269-76.
15. Bendell, A. & Wightman, D. M., The practical application of proportional hazards modelling. In Proc. 5th Nat. Reliabil. Conf., Birmingham. UKAEA, Warrington, 1985.
16. Dale, C. J., Application of the proportional hazards model in the reliability field. In Proc. 4th Nat. Reliabil. Conf., Birmingham. UKAEA, Warrington, 1983.
17. Ansell, R. O. & Ansell, J. I., Modelling the reliability of sodium sulphur cells. Reliabil. Eng., 17 (1987) 127-37.
18. Solomon, P. J., Effect of misspecification of regression models in the analysis of survival data. Biometrika, 71 (1984) 291-8.
19. Kay, R., Proportional hazards regression models and analysis of censored survival data. Appl. Statist., 26 (1977) 227-37.
20. Ansell, J. I. & Phillips, M. J., Practical problems in the statistical analysis of reliability data (with discussion). Appl. Statist., 38 (1989) 205-47.
21. Koziol, J. A. & Green, S. B., A Cramér-von Mises statistic for randomly censored data. Biometrika, 63 (1976) 465-74.
22. Baxter, L. A., Scheuer, E. M., McConalogue, D. J. & Blischke, W. R., Renewal tables: tables of functions arising in Renewal Theory. Technical Report, University of Southern California, 1981.
23. Cox, D. R., Renewal Theory. Methuen, London, 1962.
24. Nelson, W., Graphical analysis of system repair data. J. Qual. Tech., 20 (1988) 24-35.
25. Triner, D. A. & Phillips, M. J., The reliability of equipment fitted to a fleet of ships. In Proc. 9th Adv. in Reliabil. Tech. Symp., Bradford. UKAEA, Warrington, 1986.
26. Sweeting, T. J., Contribution to the discussion of Ansell, J. I. & Phillips, M. J., Practical problems in the statistical analysis of reliability data. Appl. Statist., 38 (1989) 234-5.
