System Reliability Modeling – A Comparative Study

May 24, 2017 | Author: Haim Livni | Categories: Safety Engineering, Markov chains, Reliability, FTA


Haim Livni, M.Sc. C.R.E

Advanced Logistic Design Ltd. (A.L.D.), [email protected]

Abstract

Reliability block diagrams (RBD), fault tree analysis (FTA), Markov models, Boolean methods and Monte Carlo simulation are the most common approaches to system reliability modeling.

All of them are in use, which in itself indicates that each has its own advantages and disadvantages. If one method had only advantages over the others, the others would have disappeared.

Verma et al. [1] compared Monte Carlo simulation with all the other approaches, treated as a group of analytical methods.

Regarding the analytical methods, it is common practice in the system reliability modeling literature to justify the selection of a specific method.

Kim and Seong [2], dealing with nuclear protection systems, consider FTA the most advantageous model. Yu et al. [3] prefer the same method for multi-input, multi-output systems with a large number of components.

Smith et al. [4] chose Markov chains for their server availability study.

Bennetts [5] indicates that RBD has its advantages for relatively simple systems, but that complex systems require a conditional probability result (Bayes' theorem). He advocates an alternative method: treating the whole problem as a Boolean one.

Is it accidental that safety studies prefer FTA, server availability studies favor Markov models, and very complex systems call for Boolean methods?

This paper addresses that question and reveals the logic behind these preferences.

Introduction

The simplest reliability/availability modeling case involving redundancy is a pair of redundant units, for example two servers, A and B. The RBD representation of this system is the parallel configuration of Figure 1.

Solving the problem requires further information about the redundancy mechanism. Some variations of this configuration are:

1. A and B are connected in a cold standby redundancy configuration.

2. A and B are in a Master-Slave configuration.

RBD, FTA, Markov models, Boolean methods and Monte Carlo simulation are tools that can be applied to solve this problem. The fact that all of them survived the evolutionary process indicates that each one is useful in some situations.

This paper investigates when each of these tools is advantageous.

The results indicate that there is a rational explanation for the fact that reliability studies prefer Markov models while safety studies mostly apply FTA.

First, perfect testability cases are treated. Then imperfect testability is introduced for reliability and safety issues, and the complexity of the considered system is gradually increased. The adequacy of each tool is then discussed, and finally conclusions are drawn.

Perfect testability

The simplest interpretation of the above configurations assumes perfect testability. The resulting system failure scenarios are:

In a hot standby configuration:

1. Both servers are active all the time.

2. One server, say A, fails. Operation is supported by B.

3. Server A is repaired during a given repair time (usually the MTTR of A).

4. If B fails while A is under repair, the system fails.

5. Otherwise, the cycle restarts from step 1.
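The hot standby scenario above can be written as a small continuous-time Markov model. The following sketch assumes constant rates (failure rate `lam` per unit, repair rate `mu` = 1/MTTR), a single repair crew and perfect testability. These assumptions, the function names and the numbers are illustrative and are not taken from this study. The first function gives the mean time to first system failure, which is the scenario as described (step 4 ends the run). The second gives the steady-state unavailability when a failed system is also repaired.

```python
# Two-unit hot standby (active parallel) pair: a minimal Markov sketch.
# Assumptions (not stated in the paper): constant failure rate lam per unit,
# exponential repair with rate mu = 1/MTTR, a single repair crew, and perfect
# testability (every failure is detected immediately).


def hot_standby_mttf(lam: float, mu: float) -> float:
    """Mean time to first system failure, starting with both units up.

    States: 2 = both up, 1 = one unit under repair, 0 = system failed
    (absorbing). Solving
        T2 = 1/(2*lam) + T1
        T1 = 1/(lam + mu) + mu/(lam + mu) * T2
    gives T2 = (3*lam + mu) / (2*lam**2).
    """
    return (3 * lam + mu) / (2 * lam ** 2)


def hot_standby_unavailability(lam: float, mu: float) -> float:
    """Steady-state probability that both units are down.

    Here state 0 is not absorbing: the single crew restores one unit at a
    time (0 -> 1 at rate mu). The balance equations of this birth-death chain
    give p1 = p2 * 2*lam/mu and p0 = p1 * lam/mu.
    """
    r = lam / mu
    p2, p1, p0 = 1.0, 2 * r, 2 * r ** 2  # unnormalized probabilities
    return p0 / (p2 + p1 + p0)


if __name__ == "__main__":
    lam = 1e-3     # failures per hour (illustrative value)
    mu = 1 / 10.0  # repair rate for an MTTR of 10 h (illustrative value)
    print(f"MTTF           = {hot_standby_mttf(lam, mu):.0f} h")
    print(f"Unavailability = {hot_standby_unavailability(lam, mu):.2e}")
```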

In a cold standby configuration:

Cold standby scenario 1

1. One server, say A, is active.

2. If A fails, B is activated. During the activation time, the system may be considered failed or functional, depending on the application.

3. Server A is repaired during a given repair time (usually the MTTR of A).

4. If B fails while A is under repair, the system fails.

5. Otherwise, the cycle restarts from step 1.
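Cold standby scenario 1 can be sketched the same way. An extra state for the activation period turns the choice "failed or functional during activation" into a flag. The following are all assumptions for illustration: the rates, the exponential activation time with mean `tau`, the rule that the cold unit cannot fail, and the function name.

```python
# Cold standby (scenario 1) with a non-zero activation time: a Markov sketch.
# Assumptions (not stated in the paper): the active unit fails at constant
# rate lam, the cold standby unit cannot fail, activation (switchover) takes
# an exponential time with mean tau, repair is exponential with rate mu, one
# repair crew, repair of the failed unit starts once the switchover is done,
# and testability is perfect.
#
# States: "2" = one unit active, the other cold and good
#         "S" = active unit failed, standby being activated
#         "1" = one unit active, the other under repair
#         "0" = both units failed
# Solving the balance equations gives the unnormalized probabilities
#   p2 = 1, pS = lam*tau, p1 = lam/mu, p0 = (lam/mu)**2.


def cold_standby_unavailability(lam: float, mu: float, tau: float,
                                activation_is_downtime: bool = True) -> float:
    """Steady-state unavailability; the flag decides whether the system
    counts as failed during activation, which depends on the application."""
    r = lam / mu
    p = {"2": 1.0, "S": lam * tau, "1": r, "0": r ** 2}
    down = p["0"] + (p["S"] if activation_is_downtime else 0.0)
    return down / sum(p.values())


if __name__ == "__main__":
    lam, mu, tau = 1e-3, 0.1, 0.5  # per hour, per hour, hours (illustrative)
    for flag in (True, False):
        u = cold_standby_unavailability(lam, mu, tau, flag)
        print(f"activation counted as downtime={flag}: U = {u:.2e}")
```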

Cold standby scenario 2

(applicable only when the standby unit can also fail; this configuration is sometimes referred to as "warm standby"):

1. One server, say A, is active.

2. If B fails, it is repaired during a given repair time (usually the MTTR of B).

3. If A fails while B is under repair, the system fails.

4. Otherwise, the cycle restarts from step 1.
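Cold standby scenarios 1 and 2 and the hot standby case differ mainly in how fast the waiting unit fails. One compact way to see this is a single MTTF formula with a hypothetical standby failure rate `lam_s`. It assumes instantaneous perfect switching and one repair crew, neither of which is stated in the paper. Setting `lam_s = 0` gives the cold standby value (2λ + μ)/λ², and `lam_s = lam` gives the hot standby value (3λ + μ)/(2λ²).

```python
def standby_mttf(lam: float, lam_s: float, mu: float) -> float:
    """Mean time to system failure of a two-unit standby pair (a sketch).

    Assumptions (not from the paper): the active unit fails at rate lam, the
    waiting standby unit fails at rate lam_s (0 = cold, 0 < lam_s < lam =
    warm, lam = hot), switching is perfect and instantaneous, repair is
    exponential with rate mu, one repair crew, perfect testability, and
    "both units down" is absorbing.

    States: 2 = both good, 1 = one unit under repair, 0 = system failed.
        T2 = 1/(lam + lam_s) + T1
        T1 = 1/(lam + mu) + mu/(lam + mu) * T2
    """
    return (2 * lam + lam_s + mu) / (lam * (lam + lam_s))


if __name__ == "__main__":
    lam, mu = 1e-3, 0.1  # illustrative rates per hour
    for label, lam_s in (("cold", 0.0), ("warm", 0.3 * lam), ("hot", lam)):
        print(f"{label:>4}: MTTF = {standby_mttf(lam, lam_s, mu):.0f} h")
```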

Under perfect testability, a Master-Slave configuration is similar to the cold standby configuration above.

Imperfect testability

A real system must allow for undetected failures. Experience indicates that test coverage, i.e. the probability that a failure is detected, is a major factor determining the availability of redundant systems. Indeed, if a failure is not detected, both switching to the standby unit and the start of repair activities are inhibited. The effectiveness of redundancy is thus significantly reduced.
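To illustrate how strongly test coverage drives the effectiveness of redundancy, the following sketch extends a two-unit hot standby Markov model with a coverage `c`, the fraction of unit failures that are detected. It assumes, for illustration only, that detected failures are repaired at rate `mu` by one crew. It also assumes that undetected failures are never repaired, so the pair fails at the next unit failure. Rates and names are hypothetical.

```python
def hot_standby_mttf_with_coverage(lam: float, mu: float, c: float) -> float:
    """MTTF of a two-unit hot standby pair when only a fraction c of unit
    failures is detected (a sketch; assumptions are not from the paper).

    Assumptions: constant failure rate lam per unit, repair rate mu, one
    crew, a detected failure is repaired, an undetected failure is never
    repaired (no periodic test), so the pair fails at the next unit failure.

    States: 2 = both up, 1d = detected failure under repair,
            1u = latent (undetected) failure, 0 = system failed (absorbing).
        T2  = 1/(2*lam) + c*T1d + (1 - c)*T1u
        T1d = 1/(lam + mu) + mu/(lam + mu) * T2
        T1u = 1/lam
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("coverage c must lie in [0, 1]")
    rhs = 1 / (2 * lam) + c / (lam + mu) + (1 - c) / lam
    return rhs * (lam + mu) / (lam + (1 - c) * mu)


if __name__ == "__main__":
    lam, mu = 1e-3, 0.1  # illustrative rates per hour
    for c in (1.0, 0.99, 0.9):
        mttf = hot_standby_mttf_with_coverage(lam, mu, c)
        print(f"c = {c:.2f}: MTTF = {mttf:.0f} h")
```

With these illustrative rates, c = 0.99 roughly halves the MTTF of the perfect-coverage case, and c = 0.9 cuts it by nearly an order of magnitude.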

Configurations

In a hot standby configuration there is no need to activate a standby unit while the active unit is functional. But if repair is not initiated upon an undetected failure of the standby unit, the system will fail as soon as the next failure occurs.

For a standby configuration, undetected failures are associated with the following events causing system failure:

1. The active unit fails in an undetectable failure mode, so the standby unit is not activated.

2. The standby unit fails in an undetectable failure mode. When the failed standby unit is activated, after a failure of the active unit, the system fails. The vulnerability of standby redundancy is increased by the fact that failures in a standby unit are rather difficult to detect.

In a Master-Slave configuration, the following case must be added to the system failure causes listed for the standby configuration:

3. The Slave "recognizes" a false failure in the Master, and both units attempt to act as masters.

The failure coverage of each unit depends on the division of functions (tasks) between Master and Slave. If the Master performs most functions, it will have a high coverage rate and the Slave a low one, and vice versa.
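The two standby events listed above (items 1 and 2) can be combined into a probability per failure of the active unit. The modeling here is an assumption for illustration. The parameters are hypothetical: the coverage `c_active` of the active unit, a constant latent failure rate `lam_s_u` of the standby unit, and the time `t_since_check` since the standby was last known to be good. The Master-Slave false-failure case (item 3) is not modeled.

```python
import math


def p_switchover_fails(c_active: float, lam_s_u: float,
                       t_since_check: float) -> float:
    """Probability that a failure of the active unit ends in system failure
    (a sketch of the two standby events; the model is an assumption).

    Event 1: the active unit's failure is undetected (probability
             1 - c_active), so the standby unit is never activated.
    Event 2: the failure is detected, but the standby unit already carries a
             latent failure, arising at constant rate lam_s_u since it was
             last known to be good t_since_check hours ago.
    """
    p_latent = -math.expm1(-lam_s_u * t_since_check)  # 1 - exp(-lam*t)
    return (1.0 - c_active) + c_active * p_latent


if __name__ == "__main__":
    # Illustrative values: 95 percent coverage of the active unit, standby
    # latent failure rate 1e-5 per hour, standby last verified 1000 h ago.
    print(f"{p_switchover_fails(0.95, 1e-5, 1000.0):.4f}")
```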

Testability considerations

Imperfect testability is further complicated by the following considerations:

1. A failed module with an undetectable failure mode can develop a further, detectable failure. In this case, the module will be replaced and the undetectable failure will disappear.

2. When a system failure occurs (i.e. both modules fail), in most cases the system failure will ultimately be detected. (If a failure is never detected, it probably does not cause any problem.) However, the term "ultimately" is problematic.
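Consideration 1 adds a path out of the latent state: a module with an undetected failure may later be revealed and repaired. Such extensions quickly lose simple closed forms, so the following sketch solves the mean-time-to-failure equations of a small absorbing Markov chain numerically. The state names, the coverage `c` and the reveal rate `rho` are illustrative assumptions, not parameters defined in this study.

```python
def mean_time_to_absorption(rates, transient):
    """Mean time to absorption from each transient state of a CTMC.

    rates: {(from_state, to_state): rate}; states not listed in `transient`
    are absorbing. For each transient state i:
        q_i * T_i - sum_j q_ij * T_j = 1   (j over transient states),
    with q_i the total exit rate of i; solved by Gauss-Jordan elimination
    with partial pivoting.
    """
    n = len(transient)
    idx = {s: k for k, s in enumerate(transient)}
    a = [[0.0] * n + [1.0] for _ in range(n)]  # augmented matrix [A | 1]
    for (i, j), q in rates.items():
        if i in idx and i != j:
            a[idx[i]][idx[i]] += q
            if j in idx:
                a[idx[i]][idx[j]] -= q
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(n):
            if row != col:
                factor = a[row][col] / a[col][col]
                for k in range(col, n + 1):
                    a[row][k] -= factor * a[col][k]
    return {s: a[k][n] / a[k][k] for s, k in idx.items()}


def latent_failure_mttf(lam, mu, c, rho):
    """Hot standby pair with coverage c; a latent failure is revealed at the
    hypothetical rate rho (e.g. by a further, detectable failure of the same
    module), after which the module is repaired like a detected failure."""
    rates = {
        ("2", "1d"): 2 * lam * c,        # detected failure
        ("2", "1u"): 2 * lam * (1 - c),  # latent (undetected) failure
        ("1d", "2"): mu,                 # repair completed
        ("1d", "0"): lam,                # survivor fails during repair
        ("1u", "1d"): rho,               # latent failure revealed
        ("1u", "0"): lam,                # survivor fails: system down
    }
    return mean_time_to_absorption(rates, ["2", "1d", "1u"])["2"]


if __name__ == "__main__":
    for rho in (0.0, 1e-3, 1e-2):
        mttf = latent_failure_mttf(1e-3, 0.1, 0.9, rho)
        print(f"rho = {rho:g}/h: MTTF = {mttf:.0f} h")
```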

Consider two examples of a pair of redundant servers:

1. In a control system, one expects the production line to stop when an out-of-control state is detected. However, if the out-of-control state is not detected by either of the servers, the production line will continue to run out of control. We can assume that the problem will eventually be detected by some QC operation at a later stage, or by one or more dissatisfied customers. This kind of late detection has severe consequences, and the severity depends on the duration of the latency state.

2. A safety protection system detects a hazardous situation (e.g. high temperature) and should stop the unit to prevent a fire from breaking out. If both units fail to detect the hazard, detection will come only after damage (death, injuries, high losses) or an undesired hazardous state has occurred.

Periodic tests, performed at various levels and intervals, eventually complement on-line testing by revealing the failure modes it cannot detect. As the efficiency and frequency of these tests increase, the availability of the system approaches that of the perfect testability case.
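For a single unit, the benefit of periodic testing can be quantified with the usual time-averaged unavailability of a periodically tested component. The sketch assumes a constant undetectable failure rate `lam_u` and a perfect, instantaneous test and repair every `T` hours; both are assumptions for illustration. Shorter test intervals shrink this latent-failure contribution roughly in proportion to T.

```python
import math


def mean_unavailability(lam_u: float, T: float) -> float:
    """Time-averaged unavailability of a unit whose undetectable failures
    (constant rate lam_u) are revealed only by a periodic test every T hours.

    Sketch assumptions: the test is perfect and instantaneous, and a failure
    found by the test is repaired at once, so q(t) = 1 - exp(-lam_u * t)
    restarts at every test. Averaging over one interval gives
        1 - (1 - exp(-lam_u*T)) / (lam_u*T)  ~=  lam_u*T/2  for small lam_u*T.
    """
    x = lam_u * T
    if x == 0.0:
        return 0.0
    return 1.0 - (-math.expm1(-x)) / x


if __name__ == "__main__":
    lam_u = 1e-5  # undetectable failure rate per hour (illustrative)
    for T in (8760.0, 730.0, 168.0):  # yearly, monthly, weekly tests
        print(f"T = {T:6.0f} h: exact {mean_unavailability(lam_u, T):.2e}, "
              f"approx {lam_u * T / 2:.2e}")
```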

Complex systems

Markov models are easily solved with algebraic equations when steady-state solutions are required and failure rates are constant. When this is not the case, the Markov equations require a numerical solution. And if a numerical solution is involved anyway, why not use Monte Carlo simulation?
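The question above can be made concrete. A Markov model solves the two-unit hot standby pair algebraically, but the same pair can also be simulated directly. The simulation keeps working when, for instance, repair times are fixed rather than exponential. The sketch is illustrative: the rates, function names and trial count are assumptions. The analytical comparison value (3λ + μ)/(2λ²) holds only for exponential repair with one crew.

```python
import random


def simulate_hot_standby_mttf(lam, sample_repair, n_trials=20000, seed=1):
    """Monte Carlo estimate of the MTTF of the two-unit hot standby pair.

    Sketch assumptions: unit failures are exponential with rate lam; repair
    durations come from sample_repair(rng), so a non-exponential law such as
    a fixed MTTR can be plugged in; a repaired unit is as good as new.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        t = 0.0
        while True:
            t += rng.expovariate(2 * lam)    # first failure in the pair
            repair = sample_repair(rng)      # time to repair that unit
            survivor = rng.expovariate(lam)  # memoryless remaining life
            if survivor < repair:            # both down: system failure
                t += survivor
                break
            t += repair                      # back to both units up
        total += t
    return total / n_trials


if __name__ == "__main__":
    lam, mttr = 1e-2, 10.0  # illustrative values
    exponential = simulate_hot_standby_mttf(
        lam, lambda rng: rng.expovariate(1 / mttr))
    fixed = simulate_hot_standby_mttf(lam, lambda rng: mttr)
    markov = (3 * lam + 1 / mttr) / (2 * lam ** 2)
    print(f"Markov {markov:.0f} h, MC exponential repair {exponential:.0f} h, "
          f"MC fixed repair {fixed:.0f} h")
```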

Reality is usually much more complex than the situations described above. Figure 1 represents the simplest possible configuration involving redundancy, whereas a ship or an aircraft is a far more complex system.

But even if Figure 1 represented a system of interest, treating the availability of a field consisting of N such systems by Markov models becomes difficult. If spare parts, maintenance echelons, varying turnaround times, etc. are added, the model becomes extremely complex.
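One way to see why a field of N systems strains a Markov treatment is to count joint states. The sketch uses a deliberately simplified, hypothetical model: each two-unit system has three states, and the systems share a spare pool of 0 to `max_spares` units. It also shows the simple binomial fleet availability. That formula holds only when the systems are independent, and shared spares and maintenance resources break exactly that independence.

```python
from math import comb


def joint_state_count(n_systems: int, max_spares: int = 0) -> int:
    """Size of a joint Markov state space for n two-unit systems (3 states
    each: 2, 1 or 0 units up) sharing a spare pool of 0..max_spares units.
    Illustrative only: real models add repair queues, echelons, etc."""
    return 3 ** n_systems * (max_spares + 1)


def fleet_availability(n: int, k: int, a: float) -> float:
    """P(at least k of n systems up), assuming the systems are independent
    and each has steady-state availability a. Shared spares or repair crews
    break exactly this independence assumption."""
    return sum(comb(n, i) * a ** i * (1 - a) ** (n - i)
               for i in range(k, n + 1))


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(f"N = {n:2d}: {joint_state_count(n, max_spares=5):,} joint states")
    print(f"P(>= 8 of 10 up) = {fleet_availability(10, 8, 0.99):.6f}")
```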

Tools adequacy

Perfect Testability

It is easy to verify that under the perfect testability assumption all analytical methods (RBD, Markov, FTA, Boolean) give the same results.
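As a quick check of this equivalence for the two-unit hot standby pair, the sketch compares two results: a fault tree AND gate over two independent repairable units, and the steady-state Markov result. Constant rates are assumed, and the names and numbers are illustrative. The two agree exactly when each unit has its own repair crew, which is the independence the AND gate presumes. With a single shared crew, the Markov value is about twice as high for small λ/μ. The models must therefore describe the same maintenance assumptions for the results to coincide.

```python
def markov_pair_unavailability(lam: float, mu: float, crews: int) -> float:
    """Steady-state P(both units down) for the two-unit hot standby pair,
    from the birth-death Markov chain with 1 or 2 repair crews (a sketch:
    constant rates, perfect testability)."""
    if crews not in (1, 2):
        raise ValueError("crews must be 1 or 2")
    r = lam / mu
    p2, p1 = 1.0, 2 * r
    p0 = p1 * lam / (crews * mu)  # balance between states 1 and 0
    return p0 / (p2 + p1 + p0)


def fta_pair_unavailability(lam: float, mu: float) -> float:
    """Fault tree: AND gate over two independent repairable basic events,
    each with steady-state unavailability q = lam / (lam + mu)."""
    q = lam / (lam + mu)
    return q * q


if __name__ == "__main__":
    lam, mu = 1e-3, 0.1  # illustrative rates per hour
    print("FTA            :", fta_pair_unavailability(lam, mu))
    print("Markov, 2 crews:", markov_pair_unavailability(lam, mu, 2))
    print("Markov, 1 crew :", markov_pair_unavailability(lam, mu, 1))
```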

Notes:

1. Markov state solutions for these cases are readily available in most reliability handbooks.

2. RBD is not quite a method. It is a graphical representation of the redundancy relations, which in most applications (e.g. [6]) is solved using readily available solutions (usually based on Markov state formulas or Monte Carlo simulations).

Consequently, an available RBD tool is the simplest solution for the perfect testability case.

Imperfect Testability

RBD is not an efficient tool for problems with imperfect testability. It recognizes only repairable or non-repairable units, and so cannot represent a failure that remains undetected, and therefore unrepaired.

Markov models are efficient when the probability of a failed unit remaining undetected until system failure is not negligible. This is the case in most reliability problems, where typical unavailability requirements range from 10^-2 to 10^-6, since such levels can be achieved with no or infrequent preventive maintenance (PM).

FTA is efficient when preventive maintenance is frequent enough that failures remaining undetected or unrepaired until system failure can be neglected. For safety problems this is very often the case: the system failure probabilities characteristic of safety are extremely low (10^-6 to 10^-14), and such figures require frequent preventive maintenance.
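In FTA, latent failures under periodic preventive maintenance are usually handled through time-averaged basic-event unavailabilities. For a redundant pair tested together, averaging the product q(t)² over the test interval gives about (λT)²/3. Multiplying the two single-unit averages gives (λT/2)² = (λT)²/4, an underestimate of 25%. The sketch assumes constant latent failure rates, independent units, and a perfect, instantaneous common test every `T` hours; the values are illustrative.

```python
import math


def pair_mean_unavailability(lam_u: float, T: float) -> float:
    """Time-averaged unavailability of two redundant units whose latent
    failures (rate lam_u each) are found only by a common periodic test every
    T hours (sketch: perfect, instantaneous test and repair, independent
    units). Per unit q(t) = 1 - exp(-lam_u*t); the AND gate gives q(t)**2,
    averaged over one interval:
        1 - 2(1 - e^-x)/x + (1 - e^-2x)/(2x),  x = lam_u*T  (~= x**2/3).
    Cancellation limits the accuracy of this form for very small x.
    """
    x = lam_u * T
    return 1.0 - 2.0 * (-math.expm1(-x)) / x + (-math.expm1(-2.0 * x)) / (2.0 * x)


if __name__ == "__main__":
    lam_u, T = 1e-6, 730.0  # illustrative: monthly test
    x = lam_u * T
    print(f"exact average      {pair_mean_unavailability(lam_u, T):.3e}")
    print(f"(lam*T)**2 / 3     {x * x / 3:.3e}")
    print(f"(lam*T/2)**2 naive {(x / 2) ** 2:.3e}")
```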

Graph theory and the associated Boolean algebra are necessary to solve complex network-type redundancies (e.g. [7]). However, once the cut sets of the network are established, the problem can be solved using FTA models.
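The cut-set route can be sketched on a small hypothetical bridge network, not taken from this study or from [7]. Minimal cut sets are found by enumeration and then evaluated with the first-order FTA approximation (the sum of cut-set products). The result is compared with the exact value obtained by enumerating all link states. Independent link failures with equal probability `q` are assumed. Exhaustive enumeration only scales to small networks; here it stands in for the graph-theoretic methods mentioned above.

```python
from itertools import combinations

# Hypothetical bridge network (not from the paper): terminals s and t,
# inner nodes a and b, five undirected links that fail independently.
EDGES = {"e1": ("s", "a"), "e2": ("s", "b"), "e3": ("a", "b"),
         "e4": ("a", "t"), "e5": ("b", "t")}


def connected(working, src="s", dst="t"):
    """Graph search from src over the working links only."""
    neighbours = {}
    for name in working:
        u, v = EDGES[name]
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    seen, stack = {src}, [src]
    while stack:
        for nxt in neighbours.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return dst in seen


def minimal_cut_sets():
    """Smallest sets of failed links that disconnect s from t, found by
    checking failure sets in order of increasing size."""
    names = set(EDGES)
    cuts = []
    for size in range(1, len(names) + 1):
        for failed in map(set, combinations(sorted(names), size)):
            if any(cut <= failed for cut in cuts):
                continue  # contains a smaller cut, so not minimal
            if not connected(names - failed):
                cuts.append(failed)
    return cuts


def unreliability(q):
    """Exact P(s, t disconnected) by enumerating all 2**5 link states, and
    the first-order cut-set (rare-event) approximation used in FTA."""
    names = set(EDGES)
    exact = 0.0
    for size in range(len(names) + 1):
        for failed in map(set, combinations(sorted(names), size)):
            if not connected(names - failed):
                exact += q ** size * (1 - q) ** (len(names) - size)
    approx = sum(q ** len(cut) for cut in minimal_cut_sets())
    return exact, approx


if __name__ == "__main__":
    print("minimal cut sets:", [sorted(c) for c in minimal_cut_sets()])
    print("exact, approx:", unreliability(0.01))
```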

Monte Carlo simulation [8] is the practical solution for complex systems or fields, such as those discussed in the previous section.

Conclusions

Table 1. Recommended tool

Configuration         | Detectability | Typical unavailability | Typical application   | Recommended tool
                      |               | requirement            |                       |
----------------------+---------------+------------------------+-----------------------+----------------------------
Simple redundancies   | Perfect       |                        | Preliminary studies   | RBD
                      | Imperfect     | >1e-6                  | Reliability           | Markov
Networks              |               | 1e-6                   | Communication,        | Boolean, graph theory + FTA
                      |               |                        | transportation, etc.  |
Fleets, multiple      |               | >1e-6                  | Reliability + ILS     | Monte Carlo
maintenance echelons  |               |                        |                       |

References

1. Ajit Kumar Verma, Srividya Ajit, Durga Rao Karanki, Reliability and Safety Engineering, 2010.

2. Man Cheol Kim and Poong Hyun Seong, "An Integrated Model for Reliability Estimation of Digital Nuclear Protection Systems Based on Fault Tree and Software Control Flow Methodologies," Proceedings of the Korean Nuclear Society Autumn Meeting, Taejon, Korea, October 2000.

3. Rongrong Yu, Yao Chen, Jiuping Pan, Richard W. Vesel, "Generic Reliability Evaluation Method for Industrial Grids with Variable Frequency Drives," Energy and Power Engineering, Vol. 5, No. 48, July 2013.

4. W. E. Smith, K. S. Trivedi, L. A. Tomek, J. Ackaret, "Availability Analysis of Blade Server Systems," IBM Systems Journal, 8 August 2008.

5. R. G. Bennetts, "Analysis of Reliability Block Diagrams by Boolean Techniques," Cirrus Computers Limited, 29-30 High Street, Fareham, Hants PO16 7AD, UK.

6. Soft Scout, the Business Software Encyclopedia, "RAM Commander" by A.L.D. Ltd.

7. H. Livni and Y. Bar Ness, "Reliability Analysis of Telecommunication Networks Using Cutset Approach," Microelectronics Reliability, Vol. 18, Issue 3, 1978, pp. 285–289.

8. Murray Wiseman, "Monte Carlo Simulation," Optimal Maintenance Decisions (OMDEC) Inc., 2006.
Figure 1. Parallel configuration (RBD of redundant units A and B)
