Special Feature: Fault Diagnosis of Microprocessor Systems

August 6, 2017 | Author: Vidhya Srini | Category: Fault Detection, Control Systems, Fault diagnosis, Digital System Testing, Computer


Vason P. Srini, Virginia Polytechnic Institute and State University

Computer, January 1977

Introduction

The rapid growth in the number of microprocessor-based intelligent terminals (e.g., POS terminals, communication preprocessors, and I/O processors for large computer systems) has brought with it an increased awareness of the importance of guaranteeing the correct operation of the microprocessor without disturbing the use of the system. This in turn has focused attention on fault detection, location, and repair in such systems. One way to guarantee the correct operation of the microprocessor is to use a set of diagnostic programs that check the units of the microprocessor system for correct operation. This paper explores diagnostic systems for microprocessors: resident diagnostic programs (permanently stored in the microprocessor system), non-resident diagnostic programs (loaded into memory), and the diagnostic supervisor (the collection of programs controlling the execution of the diagnostic programs).

The diagnostic system

A typical microprocessor system consists of a microprocessor chip, ROM and RAM chips for memory, I/O controllers, communications controllers, interface chips, and a communications bus (see Figure 1). The diagnostic system for the microprocessor system consists of diagnostic programs that are executed by the microprocessor in the system. Executing the diagnostic programs either indicates that the units operate correctly or identifies the faulty units and their location. The functions of the diagnostic system are the following: (1) to provide confidence-level testing (power-up testing, routine testing of the system when it is idle, testing of all loading operations, and peripheral and other optional testing); (2) to locate faults at the LSI chip level (processor chip, RAM chip, communications chip, etc.); and (3) to perform automatic testing of the microprocessor system. The first two of these functions are accomplished by the diagnostic programs that detect and locate the faults in the microprocessor system; the third is realized by the diagnostic supervisor.

The diagnostic supervisor has three states: program state, idle state, and machine check state. The application and user programs of the microprocessor system are executed in the program state, and the diagnostic supervisor has no control over their execution in this state. If no user programs are waiting to be executed, the supervisor changes to the idle state, in which the diagnostic programs performing fault detection are executed under its control. The diagnostic supervisor changes to the machine check state if a fault is detected in the idle state. The machine check state indicates the presence of a fault in the microprocessor system; in this state the diagnostic programs are executed under the control of the supervisor to locate the faults. The state transitions are shown in Figure 2.

Figure 1. Typical microprocessor system. [Block diagram: ROM, RAM, interfaces, a communications controller connected to a telephone line, and I/O controllers with their I/O devices, all attached to the communications bus.]

Resident diagnostics

The resident diagnostic programs, stored in ROM, are the first programs executed after power is turned on, and they provide confidence-level testing of the system on power-up. They are then executed under the control of the diagnostic supervisor in the idle state. The resident diagnostic programs detect the fundamental types of faults in the various units of the microprocessor system. The microprocessor, RAM memory, I/O controllers, and communications controller each have a collection of resident diagnostic programs. The microprocessor-resident diagnostic program tests for the following faults: (1) single stuck-at 0(1) fault on the microprocessor data lines, (2) single stuck-at 0(1) fault on the address lines, (3) single stuck-at 0(1) fault on the registers, (4) single stuck-at fault in the arithmetic and logic unit, and (5) timing fault on a single data line or address line.14,15

The tests for the ALU fault (4)17 are, in general, lengthy and time consuming.2,9,16,17,24,26 Applying fault equivalence23,27 and fault folding31 to the data lines and address lines of the microprocessor reduces the number of tests and the time required; however, the tests are still prohibitive for use in the resident diagnostics. Therefore, single stuck-at faults are tested through the ALU functions rather than through the ALU circuits. For example, the addition of two numbers, one stored in the accumulator and the other in some other register, is tested by adding the two registers with their ith bits set to 00, 01, 10, or 11 while all other bits are 0s (1s), and checking the outcomes.

The RAM-resident diagnostic program detects faulty memory by testing the functional units of the RAM chip.1,22,29 The faults considered are the following: (1) single stuck-at 0(1) fault on the data-in/data-out lines of the RAM chip, (2) single stuck-at 0(1) fault on the address lines, (3) single stuck-at 0(1) fault in the row and column decoders of the RAM chip, and (4) single stuck-at 0(1) fault in the storage elements. Most RAM chips are constructed using square storage arrays,20,28,29 and the above faults are detected by one test.29 This test writes the word 111...1 (000...0) in location (i,i), the address of a diagonal element of the symmetric storage array. It then writes the word 000...0 (111...1) at all other locations in the RAM chip, reads the results, and repeats with a new value of i until all values of i are covered. If any of the above faults is present, a word not equal to 111...1 (000...0) will be observed in location (i,i), or a word not equal to 000...0 (111...1) will be observed in at least one of the remaining locations.

The diagnostic programs for I/O controllers and communications controllers detect single stuck-at 0(1) faults on the data lines in these devices. In many situations, faults in the various units of the microprocessor system manifest themselves as logical stuck-at 0(1) faults on the data lines and address lines of the system, and these could result in an incorrect computation and, consequently, incorrect operation of the microprocessor system. In many cases, therefore, the resident diagnostic programs detect single faults of this type in the microprocessor system.
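The single RAM test described above (a diagonal word, its complement everywhere else, repeated for every i and for both polarities) can be sketched in Python against a simulated RAM. The `SimulatedRAM` class, its two fault-injection parameters, and the 4 x 4 array of 8-bit words are hypothetical. They are chosen only to show that an injected stuck-at fault produces a mismatch, and this is a sketch under those assumptions rather than the procedure of Reference 29.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


class SimulatedRAM:
    """Hypothetical n x n array of `width`-bit words with optional injected faults."""

    def __init__(self, n: int, width: int,
                 stuck_cell: Optional[Tuple[Cell, int, int]] = None,
                 stuck_row_bit: Optional[Tuple[int, int]] = None) -> None:
        self.n = n
        self.width = width
        self.mem = [[0] * n for _ in range(n)]
        # ((row, col), bit, value): one storage element stuck at `value`.
        self.stuck_cell = stuck_cell
        # (row-address bit, value): one row-decoder input stuck at `value`.
        # Assumes n is a power of two so redirected rows stay in range.
        self.stuck_row_bit = stuck_row_bit

    def _decode_row(self, row: int) -> int:
        if self.stuck_row_bit is None:
            return row
        bit, value = self.stuck_row_bit
        return row | (1 << bit) if value else row & ~(1 << bit)

    def write(self, row: int, col: int, word: int) -> None:
        self.mem[self._decode_row(row)][col] = word

    def read(self, row: int, col: int) -> int:
        r = self._decode_row(row)
        word = self.mem[r][col]
        if self.stuck_cell is not None:
            (sr, sc), bit, value = self.stuck_cell
            if (r, col) == (sr, sc):
                word = word | (1 << bit) if value else word & ~(1 << bit)
        return word


def diagonal_test(ram: SimulatedRAM) -> List[Cell]:
    """Return the addresses whose read-back word differed from the expected word."""
    ones = (1 << ram.width) - 1
    failures = set()
    for diag_word in (ones, 0):            # 111...1 on the diagonal, then 000...0
        other_word = ones ^ diag_word      # the complement everywhere else
        for i in range(ram.n):
            ram.write(i, i, diag_word)
            for r in range(ram.n):
                for c in range(ram.n):
                    if (r, c) != (i, i):
                        ram.write(r, c, other_word)
            for r in range(ram.n):
                for c in range(ram.n):
                    expected = diag_word if (r, c) == (i, i) else other_word
                    if ram.read(r, c) != expected:
                        failures.add((r, c))
    return sorted(failures)


print(diagonal_test(SimulatedRAM(4, 8)))                              # []
print(diagonal_test(SimulatedRAM(4, 8, stuck_cell=((1, 2), 0, 1))))  # [(1, 2)]
```

A stuck storage element is reported at its own address. A stuck row-decoder input makes two rows alias, so the write of the complement word overwrites the diagonal word and the read-back fails.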

Figure 2. State transitions of the diagnostic supervisor. [State diagram: program state, idle state, and machine check state; the transition into the idle state is labeled "No user or application programs in the waiting list."] The diagnostic programs detecting faults in the microprocessor are executed under the control of the supervisor in the idle state. These programs are preempted by an interrupt indicating that an application or user program is awaiting execution. The detection of a fault in the idle state results in a transition to the machine check state. The faulty units of the microprocessor system are located by the diagnostic programs executed in the machine check state. The system is then powered down, the faulty units are replaced, and the system is powered up again.
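The supervisor's three states and the transitions described for Figure 2 can be captured as a small transition table. The event names are hypothetical labels for the conditions in the text. The return from the machine check state to the program state after repair and power-up is an assumption, because the text only says the system is powered up again. Pairs not listed leave the state unchanged, which mirrors the statement that the supervisor has no control in the program state.

```python
from enum import Enum, auto
from typing import Dict, List, Tuple


class State(Enum):
    PROGRAM = auto()
    IDLE = auto()
    MACHINE_CHECK = auto()


class Event(Enum):
    WAITING_LIST_EMPTY = auto()       # no user or application programs waiting
    PROGRAM_WAITING = auto()          # interrupt: a program awaits execution
    FAULT_DETECTED = auto()           # an idle-state diagnostic found a fault
    REPAIRED_AND_POWERED_UP = auto()  # assumption: resume normal operation


TRANSITIONS: Dict[Tuple[State, Event], State] = {
    (State.PROGRAM, Event.WAITING_LIST_EMPTY): State.IDLE,
    (State.IDLE, Event.PROGRAM_WAITING): State.PROGRAM,
    (State.IDLE, Event.FAULT_DETECTED): State.MACHINE_CHECK,
    (State.MACHINE_CHECK, Event.REPAIRED_AND_POWERED_UP): State.PROGRAM,
}


def step(state: State, event: Event) -> State:
    """Next supervisor state; unlisted (state, event) pairs leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: List[Event], start: State = State.PROGRAM) -> List[State]:
    """Return the sequence of states visited while processing `events`."""
    history = [start]
    for event in events:
        history.append(step(history[-1], event))
    return history


trace = run([Event.WAITING_LIST_EMPTY, Event.FAULT_DETECTED,
             Event.REPAIRED_AND_POWERED_UP])
print([s.name for s in trace])  # ['PROGRAM', 'IDLE', 'MACHINE_CHECK', 'PROGRAM']
```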

Non-resident diagnostics

This set of programs, loaded into RAM when the diagnostic supervisor is in the machine check state, performs more thorough testing of the microprocessor system to detect faults and locate them at the LSI chip level. The programs are executed under the control of the diagnostic supervisor after a fault has been detected by the resident diagnostic programs.

The faults considered in developing the non-resident diagnostic programs are the following: (1) multiple stuck-at 0(1) faults on the data lines and the address lines, (2) multiple stuck-at 0(1) faults in the functional units of the microprocessor (processor registers, adder, multiplier, program counter, etc.), (3) stuck-at 0(1) faults on the control lines of the various units of the microprocessor system, (4) intermittent faults that temporarily appear as faults of types (1), (2), and (3), and (5) pattern-sensitive faults in the RAM chips. The non-resident diagnostic programs could therefore be longer and take more time to execute than the resident diagnostic programs. In addition, executing these programs requires an understanding of the details of the system. The microprocessor and RAM chips have only a small number of pins for accessing their circuitry and a high density of devices, and this increases the number of tests needed in developing the non-resident diagnostic programs.

The tests for detecting faults (2) and (3) can be generated by using the representative functions approach3,24,32 or the Boolean difference approach.18,24 By combining these approaches with the techniques discussed by Kamal and Page,14 the tests for detecting fault (4) can be generated. These tests are implemented by writing programs using the instructions of the microprocessor.
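The Boolean difference approach is only named here. A minimal sketch of the textbook idea on a hypothetical three-input function F = (a AND b) OR c follows. It is not a circuit from this paper. The Boolean difference dF/dx_k = F(x_k = 0) XOR F(x_k = 1) is 1 exactly when a change on x_k reaches the output. A vector detects "x_k stuck at v" if it drives x_k to the opposite of v and makes dF/dx_k = 1. Exhaustive enumeration is used here for clarity and is feasible only for small n.

```python
from itertools import product
from typing import Callable, List, Tuple

Vector = Tuple[int, ...]


def boolean_difference(f: Callable[..., int], k: int, v: Vector) -> int:
    """dF/dx_k evaluated at v: F with x_k = 0 XOR F with x_k = 1."""
    lo = list(v)
    hi = list(v)
    lo[k], hi[k] = 0, 1
    return f(*lo) ^ f(*hi)


def stuck_at_tests(f: Callable[..., int], n: int, k: int,
                   stuck_value: int) -> List[Vector]:
    """All input vectors that detect input k stuck at `stuck_value`."""
    tests = []
    for v in product((0, 1), repeat=n):
        # Excite the fault (x_k opposite to the stuck value) and require
        # dF/dx_k = 1 so the error propagates to the output.
        if v[k] != stuck_value and boolean_difference(f, k, v) == 1:
            tests.append(v)
    return tests


def example_circuit(a: int, b: int, c: int) -> int:
    """Hypothetical circuit F = (a AND b) OR c."""
    return (a & b) | c


print(stuck_at_tests(example_circuit, 3, 0, 0))  # [(1, 1, 0)]
print(stuck_at_tests(example_circuit, 3, 2, 0))  # [(0, 0, 1), (0, 1, 1), (1, 0, 1)]
```

For this function dF/da = b AND (NOT c), so the only test for a stuck-at-0 sets a = 1, b = 1, c = 0. As the text notes, each such vector would then be realized as a short program using the microprocessor's own instructions.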

Detecting the faulty microprocessor

Detecting microprocessor faults by executing the programs on the microprocessor itself (i.e., by microprocessor self-test9,21) presents something of a vicious circle: a fault in the microprocessor can prevent the microprocessor from detecting its own faults. One way to break this loop is to provide a "reliability measure" for the instructions of the microprocessor based on the characteristics of the processor. The diagnostic programs are devised as a sequence of programs, with the programs at the beginning of the sequence using the "most reliable" instructions. In essence, the "reliable instructions" of the microprocessor are tested first, these instructions are then used to test other instructions, and so on.

Each instruction of the microprocessor is assigned a weight by constructing a table with a row for each instruction. The columns of the table are the following: (1) a column for each type of instruction: register-register, register-immediate, register-memory, register-index, memory-immediate, memory-memory, flag-oriented, etc.; (2) a column for each of the available operations: AND, OR, NOT, COMPLEMENT, SHIFT, ADD, SUBTRACT, MULTIPLY, DIVIDE, MOVE, INPUT, OUTPUT, etc.; (3) a column for each of the data, address, and control lines; (4) a column for each of the timing cycles (for example, the Intel 8080 has the timing cycles T1, T2, T3, T4, and T5); and (5) a column for each flag bit. The entry in the ith row and jth column is 1 if the instruction corresponding to the ith row has the property corresponding to the jth column, and 0 otherwise. The number of 1s in the ith row defines the weight w(i) of the corresponding instruction. The instruction with the least weight is the "most reliable" one.

The microprocessor is divided into several functional units (e.g., adder, multiplier, incrementer, decrementer, input buffer, output buffer, program counter, stack pointer, etc.). These units are assigned weights in a similar manner, using the circuit schematics, design automation files, or logic diagrams of the microprocessor. A table is constructed with a row for each functional unit and four columns recording, respectively, the number of gate levels, the number of feedback paths, the number of instructions using the unit, and the number of clocks in the unit. The sum of the entries in the ith row, u(i), defines the weight of the corresponding functional unit. The weight of a functional unit is a measure of its complexity, and the unit with the highest weight is the most complex one. Fault-detection tests are devised for the functional units.3,14,15,18,24,32 These tests are programmed using the weighted instructions of the microprocessor and executed on the microprocessor system.

Let U(1), U(2), ..., U(n) be the functional units in the microprocessor and P(1), P(2), ..., P(m) be a minimum set of programs testing the faults in the functional units of the microprocessor. The sum of the weights of the instructions in the program P(i) may be expressed as

$$w(P(i)) = \sum_{j \in P(i)} w(j)$$

If $U_{Q_1}, U_{Q_2}, \ldots, U_{Q_r}$ are the units tested by the program P(i), then the weight of the program P(i) is

$$W_{P(i)} = w(P(i)) \cdot \max\{u(U_{Q_1}), u(U_{Q_2}), \ldots, u(U_{Q_r})\}$$
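The weighting scheme can be sketched directly from these definitions. Here w(i) counts the 1s in an instruction's row, u(i) sums a unit's four columns, and W_P(i) multiplies the summed instruction weights by the largest weight among the units tested. The instruction names, their property sets, and the unit column values below are hypothetical placeholders and are not taken from any data sheet. Each listed instruction is counted once per appearance in a program. That is also an assumption, because the text does not say whether repeated instructions count.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical instruction table: each set names the columns holding a 1
# (instruction type, operation, lines used, timing cycles, flags).
INSTRUCTION_PROPERTIES: Dict[str, Set[str]] = {
    "move_rr": {"register-register", "MOVE", "data-bus", "T1", "T2"},
    "add_rr": {"register-register", "ADD", "data-bus", "T1", "T2", "T3",
               "flag:zero", "flag:carry"},
    "and_ri": {"register-immediate", "AND", "data-bus", "address-bus",
               "T1", "T2", "T3", "flag:zero"},
}

# Hypothetical unit table: (gate levels, feedback paths,
#                           instructions using the unit, clocks in the unit).
UNIT_COLUMNS: Dict[str, Tuple[int, int, int, int]] = {
    "adder": (6, 1, 12, 2),
    "incrementer": (3, 0, 5, 1),
}


def instruction_weight(name: str) -> int:
    """w(i): the number of 1s in the instruction's row."""
    return len(INSTRUCTION_PROPERTIES[name])


def unit_weight(name: str) -> int:
    """u(i): the sum of the entries in the unit's row."""
    return sum(UNIT_COLUMNS[name])


def program_weight(instructions: List[str], units_tested: List[str]) -> int:
    """W_P(i) = w(P(i)) * max u(U) over the units tested by the program."""
    w_p = sum(instruction_weight(i) for i in instructions)
    return w_p * max(unit_weight(u) for u in units_tested)


def most_reliable() -> str:
    """The instruction with the least weight."""
    return min(INSTRUCTION_PROPERTIES, key=instruction_weight)


print(program_weight(["move_rr", "add_rr"], ["adder"]))  # (5 + 8) * 21 = 273
print(most_reliable())                                     # move_rr
```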

The diagnostic program for the microprocessor is the set of m programs forming a minimal cover of the faults in the units U(j), 1 ≤ j ≤ n, such that the sum of the weights of the programs,

$$\sum_{i=1}^{m} W_{P(i)},$$

is a minimum. Note that the test for a functional unit may be programmed in several ways; we select the program using the "reliable instructions", i.e., the program with the minimum weight. Since the functional units in the microprocessor are not accessible for repair, the non-resident diagnostic programs for the microprocessor need only detect faults. Microprocessors with microprogrammed control have recently been proposed.13 Fault-detection tests for such machines can be developed easily using microdiagnostics.21 The above-mentioned strategy can then be used in developing the microprograms to test the microprocessor.
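Choosing the set of programs whose weights sum to a minimum while still testing every unit is a weighted set-cover problem. The sketch below solves small instances by exhaustive search over subsets of candidate programs. The candidate names, weights, and unit sets are hypothetical. Exhaustive search is exponential in the number of candidates, so this illustrates the selection criterion rather than a practical procedure for large instances.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set, Tuple

Candidate = Tuple[int, FrozenSet[str]]  # (W_P, units tested by the program)


def min_weight_cover(candidates: Dict[str, Candidate],
                     units: Set[str]) -> Optional[Tuple[int, Tuple[str, ...]]]:
    """Return (total weight, program names) of the cheapest cover, or None."""
    names = sorted(candidates)
    best: Optional[Tuple[int, Tuple[str, ...]]] = None
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            covered = set().union(*(candidates[p][1] for p in combo))
            if covered >= units:  # every unit is tested by some program
                total = sum(candidates[p][0] for p in combo)
                if best is None or total < best[0]:
                    best = (total, combo)
    return best


# Hypothetical candidates: several ways of programming tests for U1 and U2.
CANDIDATES: Dict[str, Candidate] = {
    "P1": (78, frozenset({"U1"})),
    "P2": (40, frozenset({"U2"})),
    "P3": (100, frozenset({"U1", "U2"})),
    "P4": (30, frozenset({"U1"})),
}

print(min_weight_cover(CANDIDATES, {"U1", "U2"}))  # (70, ('P2', 'P4'))
```

Here the two cheap single-unit programs (total 70) beat the one program that tests both units (100). This matches the preference in the text for programs built from low-weight, "reliable" instructions.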

Locating the faulty RAM chips

The memory in the microprocessor system uses a number of identical RAM chips, most of which have a symmetric storage array and a single transistor per storage element.10,20 States 0 and 1 of a storage element correspond to the presence or absence of electric charge above a threshold level. The interconnection of the storage elements and their layout in the storage array result in charge interaction.11,19 Charge interaction between a storage element and its four neighbors (one on each side of it along the two dimensions of the array, as shown in Figure 3) is more likely than charge interaction with other storage elements in the array. This charge interaction could change the state of a storage element (0 to 1 or 1 to 0) when only the states of its neighbors are altered. This pattern-sensitive fault is called the adjacent pattern interference fault.29

Figure 3. A storage element and its four neighbors. The charge interaction between a storage element and its four neighbors results in a pattern-sensitive fault in the RAM chip, called an "adjacent pattern interference fault."
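A brute-force check for adjacent pattern interference faults follows from the definition. For each base element and each base value, the check writes every pattern of its (up to four) neighbors and verifies that the base element kept its value. The sketch below assumes a one-bit-per-element n x n array. It uses a hypothetical fault model in which one victim element is pulled to 1 whenever a write leaves all four of its neighbors holding 1. It is not the six-experiment scheme of Reference 29, and its exhaustive pattern enumeration is chosen for clarity, not efficiency.

```python
from itertools import product
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


class BitArray:
    """Hypothetical n x n single-bit storage array with an optional victim cell."""

    def __init__(self, n: int, victim: Optional[Cell] = None) -> None:
        self.n = n
        self.bits = [[0] * n for _ in range(n)]
        self.victim = victim  # pulled to 1 when all its neighbors hold 1

    def neighbors(self, r: int, c: int) -> List[Cell]:
        """The up-to-four elements adjacent along the two array dimensions."""
        steps = ((-1, 0), (1, 0), (0, -1), (0, 1))
        return [(r + dr, c + dc) for dr, dc in steps
                if 0 <= r + dr < self.n and 0 <= c + dc < self.n]

    def write(self, r: int, c: int, b: int) -> None:
        self.bits[r][c] = b
        if self.victim is not None:
            vr, vc = self.victim
            nbrs = self.neighbors(vr, vc)
            # The fault is triggered only by writes to the victim's neighbors.
            if (r, c) in nbrs and all(self.bits[x][y] == 1 for x, y in nbrs):
                self.bits[vr][vc] = 1

    def read(self, r: int, c: int) -> int:
        return self.bits[r][c]


def apif_test(mem: BitArray) -> List[Cell]:
    """Return base elements whose value changed while only neighbors were written."""
    failures = set()
    for r in range(mem.n):
        for c in range(mem.n):
            nbrs = mem.neighbors(r, c)
            for base in (0, 1):
                for pattern in product((0, 1), repeat=len(nbrs)):
                    mem.write(r, c, base)
                    for (x, y), b in zip(nbrs, pattern):
                        mem.write(x, y, b)
                    if mem.read(r, c) != base:
                        failures.add((r, c))
    return sorted(failures)


print(apif_test(BitArray(3)))                 # []
print(apif_test(BitArray(3, victim=(1, 1))))  # [(1, 1)]
```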

The non-resident diagnostic programs for the memory in the microprocessor system include a test for adjacent pattern interference faults. The diagnosis of semiconductor memory has been widely discussed by several authors.5,29,31 One aspect of this, the location of faulty RAM chips on a memory board, is described in Reference 29. The scheme consists of a sequence of six experiments. The first three experiments locate faulty interconnecting lines and address lines. The next two detect faulty address decoders, faulty storage elements, and faulty refresh and sense amplifiers in the RAM chips. The last experiment locates RAM chips with adjacent pattern interference faults. The non-resident diagnostic programs for the memory perform these experiments. The non-resident diagnostics for I/O controllers and communications controllers are performed by devising tests that use several different techniques.3,24,32 Programming is accomplished using the instructions of the microprocessor, and the programs are executed in the sequence determined by Chang's algorithm.4

Structure of the diagnostic programs

The resident and non-resident diagnostic programs for each unit in the microprocessor system consist of a collection of programs, a test matrix, and a fault table. The test matrix provides information on the sub-units tested by each program. It has a row for each program in the collection and a column for each sub-unit tested by the collection of programs. The (i,j)th element of the test matrix is 1 if the ith program is a test for the jth sub-unit, and 0 otherwise. The test matrix of the non-resident diagnostic programs of the microprocessor, shown in Figure 4, provides a typical example. The fault table is a record of the faults diagnosed by the diagnostic programs. It has a row for each program; the ith row contains information on the sub-units detected as faulty by the ith program and, if possible, the location of those sub-units. That is, the fault table contains the diagnostic message. The collection of programs is executed in sequence by the microprocessor in the system. The completion of the ith program in the sequence of programs P(1), P(2), ..., P(i), P(i+1), ..., P(m) results in a jump to the start of P(i+1), 1 …

Figure 4. Test matrix. [Matrix with one row per program P(1), ..., P(m), one 0/1 column per unit U(1), ..., U(n), and a "Weight of programs" column listing W_P(1), ..., W_P(m), with the total weight at the bottom.] The weight of each program is computed and entered in a column. The non-resident diagnostic programs for the microprocessor are a collection of m programs whose sum of weights, $\sum_{i=1}^{m} W_{P(i)}$, is a minimum, where P(1), P(2), ..., P(m) is a minimal cover of U(1), U(2), ..., U(n).

… ment Department at NCR, the Electrical Engineering Department at Virginia Polytechnic Institute and State University, and the Electrical Engineering Department at Tennessee Technological University. He received a BE from the University of Madras and an MSEE from Tennessee Technological University, and is working on his PhD in computer science at VPI and State University. His research interests are in the diagnosis of digital systems, artificial intelligence, automata theory, and software reliability.
