Masking Wrong-successor Control Flow Errors Employing Data Redundancy


2015 5th International Conference on Computer and Knowledge Engineering (ICCKE)


Javad Yousefi1, Yasser Sedaghat2, Mohammadreza Rezaee3

Dependable Distributed Embedded Systems (DDEmS) Laboratory, Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran [email protected], [email protected], [email protected]

Abstract- Advances in CMOS technology reduce transistor size and operating voltage levels, which makes transistors more sensitive to cosmic rays. CMOS devices such as memory (i.e., RAMs) are therefore more likely to be hit by transient faults. Up to 77% of transient faults cause Control Flow Errors (CFEs). One type of CFE is the wrong-successor CFE, which is caused by faults in data variables resident in RAM. Previous control flow checking techniques neither detect nor correct this type of error. This paper proposes a technique that is able to mask wrong-successor CFEs. Since these errors are induced by faults in data variables that affect the program execution flow (control variables), the proposed technique first distinguishes the control variables from the other variables. This step is followed by a traditional fault masking technique that is applied to the control variables. To evaluate the proposed technique, it was applied to five benchmarks of the MiBench package. The experimental results demonstrate that the proposed technique is able to mask all 50,000 faults injected into control variables, with almost 21% performance overhead and 6% memory overhead. Because of its complete wrong-successor CFE correction coverage and low overheads, it is reasonable and feasible to apply this technique on top of earlier control flow checking techniques.

Keywords- Wrong-successor control flow error, Error correction, Control variables, Fault masking.

I. INTRODUCTION

Safety-critical systems are systems whose failure may result in catastrophic consequences such as loss of life or serious injury to people, severe damage to property and environmental harm. Medical devices, aircraft flight control systems and railway signaling systems are well-known examples of safety-critical systems that have a considerable impact on daily human life. These systems usually operate in harsh environments; therefore, fault tolerance is a key requirement of these systems [1]. As CMOS technology continues to reduce the dimensions and operating voltage levels of computer electronics, the radiation sensitivity of components such as memories increases considerably; therefore, safety-critical systems are much more fault prone [2].

Categorizing faults by their duration yields three main types that may threaten computer systems: permanent, intermittent and transient faults. Transient faults, or soft errors, occur most often, with a probability 10 to 30 times greater than that of the other types [3]. A transient fault is a fault whose effect fades away over time; a component affected by a transient fault is therefore able to function normally once this time has passed [4]. Transient faults that originate from alpha particles emitted by decaying radioactive impurities in packaging and interconnect materials, and from high-energy neutrons in cosmic radiation, are among the most important threats to memory devices [5, 6]. These faults can cause a bit-flip in memory cells. A bit-flip is an unwanted change in the state of a memory cell, from 0 to 1 or from 1 to 0 [7]. This change can modify an instruction of the program or alter data stored in the memory. These modifications may cause a Control Flow Error (CFE) or a data error in the program. A CFE occurs when the processor executes a sequence of instructions that differs from the intended one. A data error arises when the value of a variable in the program is altered erroneously [7, 8]. For example, a fault in the code segment of the program may change the opcode of a non-control instruction into that of a control instruction, so the fault results in a CFE. If the same fault occurs in the data segment of the program, it may change the value of a variable, which causes a data error [9]. It is reported that up to 77% of transient faults cause CFEs. Some types of CFEs occur due to faults in data. For example, if a fault occurs in a variable stored in memory and a branch decision depends on the value of the faulty variable, the fault may cause a CFE.
This type of CFE is called a wrong-successor CFE: a valid yet incorrect branch is taken. When an invalid branch is taken instead, the CFE is called not-successor [10, 11]. Control flow checking techniques have been widely applied to detect CFEs since the 1980s. These techniques are categorized into three major groups: software-based, hardware-based and hybrid [11, 12]. The signature monitoring method is the foundation of most of these control flow checking techniques. In this method the program is divided into basic blocks. A basic block is a maximal sequence of instructions whose execution begins at the first instruction and terminates at the last instruction. There is no branching instruction in a basic block except possibly for the last one. The control flow of the program can be represented as a graph known as the Control Flow Graph (CFG). Basic blocks constitute the vertices of the graph while edges represent the branches [13]. Fig. 1 shows the basic blocks and an example of how the CFG of the insertion sort procedure is generated.

    void Insertion_Sort(int n, int x[])
    {
        int key, i;
        for (int j = 1; j < n; j++)
        {
            key = x[j];
            i = j - 1;
            while (i >= 0 && x[i] > key)
            {
                x[i + 1] = x[i];
                i--;
            }
            x[i + 1] = key;
        }
    }

Figure 1. Basic blocks and CFG of insertion sort procedure

978-1-4673-9280-8/15/$31.00 ©2015 IEEE

In signature monitoring methods, once the control flow graph has been built at compile time, a signature is assigned to every vertex of the graph. At execution time, a run-time signature is generated and stored in a manner that depends on the control flow checking technique. This signature always identifies the basic block that the system is currently executing. When control leaves a basic block, the run-time signature is compared with the assigned signature and is then updated with the signature of the destination block. If the two signatures differ, a control flow error has occurred and these techniques detect it. In hardware-based control flow checking techniques, creating the run-time signature and comparing it with the assigned signature are the duty of redundant hardware such as a watchdog processor. In software-based control flow checking techniques, however, this is done purely in software by adding extra code to the program to create and compare the signatures [11, 13]. Although earlier control flow checking techniques achieve a high rate of control flow error detection, their evaluations used some form of targeted fault injection with no fault injection into data; therefore, wrong-successor CFEs were not fully considered, and the capability of these techniques to detect wrong-successor errors was not completely evaluated. Moreover, since control flow checking techniques focus on the control flow of the program, they are only able to detect not-successor control flow errors, not wrong-successor ones [10].

This paper focuses on masking wrong-successor CFEs, which are caused by faults in data variables. These data variables, called control variables, affect the execution flow of the program. The idea behind this paper has two main phases. In the first phase, the control variables of the program are identified; in the second phase, a traditional fault masking technique is applied to the control variables. Implementations of this idea show that it is possible to mask all wrong-successor CFEs with a slight performance and memory overhead compared to control flow checking techniques. The rest of this paper is organized as follows: the next section presents the fault model; Section III introduces and analyzes the proposed technique in detail; Section IV describes the fault injection technique and evaluates the experimental results of the proposed method; finally, Section V gives some conclusions.

II. FAULT MODEL

Memory faults manifest themselves in various ways. Two parameters determine whether an error stops the application from behaving normally or causes no discernible harm. The first is the memory cell affected by the fault, and the second is the time at which the fault takes effect during program execution. For instance, the system will not crash if a fault occurs in an unused memory location. However, if a fault occurs in the data or code segment of the program, it may cause errors such as a data error, control flow error, segmentation fault, bus error or illegal instruction. System hardware can detect only segmentation faults, bus errors and illegal instructions; the operating system then catches them and maps them to a signal [14]. Among the mentioned types of errors, control flow errors and data errors are the most likely to occur. Operating systems are not able to detect these kinds of errors; therefore, safety-critical systems need mechanisms to detect and correct them. As mentioned before, the common approach in control flow checking techniques is to divide the source code into basic blocks. On closer inspection of these techniques, a fault in the system may cause one of the following six types of CFEs [11]:
1- Illegal branches from the end of any basic block to the beginning of another one.
2- Legal but incorrect branches from the end of any basic block to the beginning of another basic block. As mentioned before, this type of CFE is named wrong-successor CFE.
3- A jump from the end of a basic block to any point of another basic block.
4- A jump from any point of a basic block to any point of another basic block.
5- A jump from any point within a basic block to any point of the same basic block.
6- A jump from any point of a basic block to an unused memory space.
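The difference between the first two types can be made concrete with a small sketch. The C fragment below is illustrative only and is not taken from the paper: the four-block CFG, the table `legal_edge` and the function `check_transition` are hypothetical names. It models what a CFG-based checker can decide when control passes from the end of one basic block to the beginning of another, which covers types 1 and 2 only.

```c
#include <assert.h>
#include <stdbool.h>

enum { NUM_BLOCKS = 4 };

/* Hypothetical CFG of a small loop (an assumption for illustration):
 * B0 = entry, B1 = loop test, B2 = loop body, B3 = code after the loop.
 * legal_edge[f][t] is true when the CFG contains the edge Bf -> Bt. */
static const bool legal_edge[NUM_BLOCKS][NUM_BLOCKS] = {
    /* to:     B0     B1     B2     B3   */
    /* B0 */ { false, true,  false, false },
    /* B1 */ { false, false, true,  true  },
    /* B2 */ { false, true,  false, false },
    /* B3 */ { false, false, false, false },
};

typedef enum { CHECK_PASS, CHECK_CFE_DETECTED } check_result;

/* Decision available at a block boundary: the transition is flagged
 * only if the CFG has no such edge (a type 1 error). */
static check_result check_transition(int from, int to)
{
    assert(from >= 0 && from < NUM_BLOCKS);
    assert(to >= 0 && to < NUM_BLOCKS);
    return legal_edge[from][to] ? CHECK_PASS : CHECK_CFE_DETECTED;
}
```

With this table, `check_transition(1, 0)` reports a CFE (type 1), whereas `check_transition(1, 3)` passes even when the fault-free program would have gone from B1 to B2. A check of this kind sees only that the edge exists in the CFG, which is why a type 2 error has to be addressed at the level of the data that drives the branch.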


Generally, CFEs of the second type, known as wrong-successor CFEs, are due to faults in data (control variables). Fig. 2 shows a hypothetical scenario. Assume that, before the "while" instruction executes, the value of the variable x is 5 and a transient fault occurs at the memory location of this variable. The fault flips the fourth bit of x from 0 to 1, which changes the value of x from 5 (0101) to 13 (1101) and leads the control flow of the program to execute block 2 instead of block 1. This scenario illustrates how a wrong-successor CFE occurs.

    while (x < 10)
    x = 0101  --SEU-->  x = 1101

Figure 2. Example of wrong-successor CFE
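The scenario of Fig. 2 can be reproduced in a few lines of C. This is a minimal sketch, not the paper's fault injector: `inject_seu` and `successor_block` are hypothetical helper names, the variable is assumed to be 8 bits wide, and bits are counted from 0 at the least significant end, so the "fourth bit" is bit 3.

```c
#include <assert.h>
#include <stdint.h>

/* Model an SEU as a single bit flip; bit 0 is the least significant bit. */
static uint8_t inject_seu(uint8_t value, unsigned bit)
{
    assert(bit < 8);                       /* 8-bit variable assumed in this sketch */
    return (uint8_t)(value ^ (1u << bit));
}

/* Successor selected by `while (x < 10)`:
 * block 1 is the loop body, block 2 is the code after the loop. */
static int successor_block(uint8_t x)
{
    return (x < 10) ? 1 : 2;
}
```

`inject_seu(5, 3)` yields 13, and `successor_block` then selects block 2 where the fault-free value selects block 1. Both outcomes are legal successors of the loop test, so the error is of the wrong-successor type.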

Because loop instructions account for a large proportion of a program's execution time, control variables have a much higher probability of being hit by a fault than variables in instructions that execute only once. Normally, the variables used in a program are classified into two groups: control variables and non-control variables. A variable that may alter the control flow of the program is a control variable. There are two types of control variables: first, variables used in the condition of a branch instruction; second, variables that do not appear in any branch instruction directly but on which the value of at least one control variable depends. In both cases, the variable affects the control flow of the program, directly or indirectly. Fig. 3 shows the matrix multiplication procedure. In this procedure {i, m, j, p, k, n} are the control variables and {the arrays a, b, c and sum} are the non-control variables. Since the main goal of this paper is to detect wrong-successor CFEs, faults must be injected into the system to create wrong-successor errors. As mentioned before, wrong-successor errors are caused by faults in control variables. Therefore, to evaluate the proposed technique, the effect of a Single Event Upset (SEU) on control variables must be modeled using fault injection.

for (i = 0; i < m; i++) { for (j = 0; j
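As an illustration of how data redundancy can mask a fault in a control variable, the sketch below triplicates one control variable and reads it through a bitwise majority vote. The text above only refers to "a traditional fault masking technique", so the choice of triplication with voting is an assumption here, and `tmr_u32`, `tmr_write`, `tmr_read`, `tmr_inject_seu` and `count_iterations` are hypothetical names. Only the loop bound m is protected to keep the sketch short; the loop counter would need the same treatment.

```c
#include <assert.h>
#include <stdint.h>

/* Three redundant copies of one control variable (hypothetical layout). */
typedef struct {
    uint32_t copy[3];
} tmr_u32;

static void tmr_write(tmr_u32 *v, uint32_t value)
{
    v->copy[0] = v->copy[1] = v->copy[2] = value;
}

/* Bitwise majority vote. The voted value is written back so that a later
 * fault in a different copy can still be outvoted. */
static uint32_t tmr_read(tmr_u32 *v)
{
    uint32_t a = v->copy[0], b = v->copy[1], c = v->copy[2];
    uint32_t voted = (a & b) | (a & c) | (b & c);
    tmr_write(v, voted);
    return voted;
}

/* Fault injection: flip one bit of one copy, modelling an SEU in RAM. */
static void tmr_inject_seu(tmr_u32 *v, unsigned which, unsigned bit)
{
    assert(which < 3 && bit < 32);
    v->copy[which] ^= UINT32_C(1) << bit;
}

/* Loop of the form `for (i = 0; i < m; i++)` with a protected bound m.
 * An SEU hits one copy of m just before the loop test of iteration
 * fault_iter. Returns the number of loop bodies executed. */
static unsigned count_iterations(uint32_t m_value, unsigned fault_iter)
{
    tmr_u32 m;
    unsigned iterations = 0;

    tmr_write(&m, m_value);
    for (uint32_t i = 0; ; i++) {
        if (i == fault_iter)
            tmr_inject_seu(&m, 1, 3);   /* flip bit 3 of the second copy */
        if (!(i < tmr_read(&m)))        /* loop test uses the voted value */
            break;
        iterations++;
    }
    return iterations;
}
```

In this sketch `count_iterations(5, 2)` executes the loop body five times although one copy of m is corrupted from 5 to 13 before the third loop test, because the two intact copies outvote the faulty one. The cost is three stores per write and a vote per read, which is the kind of memory and performance overhead that data redundancy trades for masking.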