Circular N Redundancy: A New Approach for EGAT Control System


International Journal of Automation and Control Engineering (IJACE) Volume 3 Issue 2, May 2014 doi: 10.14355/ijace.2014.0302.04

www.seipub.org/ijace

Vittawus Prueksasri1, Thumanoon Paukatong*2

Transmission Control System Development Department, Electricity Generating Authority of Thailand
53 Moo 2, Charan Sanitwong Rd., Bangkruai, Nonthaburi 11130, Thailand

1[email protected]; *[email protected]

Received 21 Jan, 2014; Accepted 1 Mar, 2014; Published 10 May, 2014 © 2014 Science and Engineering Publishing Company

Abstract

EGAT CCS is an in-house Computerized Control System (CCS) developed by the Electricity Generating Authority of Thailand (EGAT), mainly for internal use, over the past two decades. The first generation of EGAT CCS had no redundancy, and the second generation suffered from several redundancy-related problems because redundancy was not considered in the design phase. These problems were very difficult to solve within the initial design structure. This paper presents the Circular N Redundancy (CNR) technique implemented in the third generation of EGAT CCS, which alleviates the redundancy problems of the second generation. It introduces a design concept of decentralization with redundancy, in which distributed redundancy modules monitor one another in a loop pattern. Earlier types of redundancy require a centralized watchdog, which remains a limitation because the redundancy fails when this overseer fails. To overcome this constraint, the distributed concept is also applied to the watchdog: each unit has its own watchdog, and each unit looks after another unit. Redundancy capability is thereby extended to N modules. The more units are added to the system, the higher its reliability, although this gain has to be weighed against the investment cost. Finally, conformance tests verify both the switching time and the validity of the process, and a performance test is carried out with an in-house IEC61850 communication module. CNR meets the reliability requirement of EGAT CCS generation III.

Keywords

Redundancy; SCADA; Control System

Introduction

Electricity Generating Authority of Thailand (EGAT) developed a computerized control system (CCS), mainly for internal use, two decades ago. Originally, EGAT CCS was designed without redundancy. The first generation, built about 20 years ago, was a DOS-based control system for a single substation and did not include any kind of redundancy. The second generation, a Windows-based system called EGAT SCADA, was built for the control center (Paukatong et al., 2005). It followed a centralized design in which all applications were supervised by a System Manager (SM). To improve reliability, redundancy capability was added to the system three years after the core system implementation in 2003. However, several difficulties were found in the process of adding this function. The number of redundant units was limited to two machines. Moreover, if the SM failed, the whole system crashed. Sometimes the SM agent also made errors in prioritizing the sequence of application operations. These problems arose because redundancy was not included in the initial design phase, and they were very difficult to solve within that design structure. Complete redundancy was therefore expected in the new version of EGAT CCS.

The third generation of EGAT CCS has been under development for IEC61850 smart substations since 2012. IEC61850 is one of the main smart grid protocols used in the modern control systems of electric utilities. Unlike the previous versions, the third generation is based on a distributed concept, and redundancy is a major capability by design.

Redundant System

Generally, redundancy is a common approach to improving the reliability and availability of a system. There are three main software models: Standby, N Modular, and 1:N.

Standby redundancy, also known as backup redundancy, uses one or more identical units to back up the primary unit (Wilson, 2005; Anonymous, 2008). It comes in “Hot” and “Cold” variants, which differ in the power status of the standby unit (“on” for “Hot”, “off” for “Cold”). However, this model requires a third party to act as a watchdog that supervises all units and promotes a selected one to be the operating unit.

The second model is “N Modular Redundancy”, which comprises “Dual”, “Triple”, and “Quadruple” configurations with a voter. The “Dual” type requires two units, while “Triple” and “Quadruple” need three and four units respectively (Habinc, 2002; Beckman, 2004). All of them require a voter. The more units there are, the higher both the reliability and the cost.

The third model is “1:N Redundancy”, in which one unit serves as a backup for N units, with a voter (Anonymous, 2008).

In addition, there is another kind of redundancy known as “N-version programming” (NVP), a software engineering method in which multiple functionally equivalent programs are independently generated from the same initial specification (Chen and Avizienis, 1995).

Circular N Redundancy

All of the redundancy types above require either a third party or a voter to act as a watchdog. This remains a limitation, because the redundancy fails if this overseer fails. To overcome this constraint, the distributed concept should be applied to the watchdog: each unit has its own watchdog, and each unit looks after another unit in a circular manner.

Redundancy Needs for EGAT Control System

The reliability of the system can be improved by adding more redundancy units, since the probability of total system failure depends on the number of redundancy units. Reliability can be improved further by reducing the switching time, that is, the time the promotion process takes to make a unit ready for use. EGAT requires at least 99.9% availability for the control system, which means that the total downtime must be less than about 8.8 hours per year of continuous (24/7) operation.
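As a quick check of the downtime figure, the yearly downtime budget implied by an availability target can be computed directly. The sketch below is illustrative only: the function names are hypothetical, a non-leap year of 8,760 hours is assumed, and the second function assumes that units fail independently, which the paper does not state.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, continuous (24/7) operation


def max_downtime_hours(availability: float) -> float:
    """Return the yearly downtime budget (hours) for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


def all_units_down_probability(unit_unavailability: float, n_units: int) -> float:
    """Probability that all n units are down at the same time.

    Assumption (not from the paper): units fail independently.
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return unit_unavailability ** n_units


if __name__ == "__main__":
    print(f"{max_downtime_hours(0.999):.2f} hours per year")
    for n in (1, 2, 3):
        print(n, all_units_down_probability(0.001, n))
```

For 99.9% availability this gives about 8.76 hours per year, which the paper rounds to 8.8 hours. Under the independence assumption, each added unit multiplies the chance of a total outage by the unavailability of a single unit, which is the sense in which more units improve reliability.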
Given this availability target, a switching time of one second or less is preferred, with at least two redundancy units. Moreover, as the technology trend is moving toward distributed systems, Circular N Redundancy is the preferred choice. Unlike the earlier generations, the third generation of EGAT CCS has been developed according to the distributed concept. Although both application and network redundancy are provided in this version of EGAT CCS, this paper focuses mainly on application redundancy.

Concept of Circular N Redundancy (CNR)

Circular N Redundancy is a standby redundancy arranged as a loop chain with a distributed watchdog. As shown in Figure 1, unit no. 1 is the active unit that performs the expected function of the system. Unit no. 2 runs as a standby unit whose watchdog checks the health of unit no. 1. Unit no. 3 follows the same procedure as unit no. 2 but monitors the health of unit no. 2. In the same way, standby unit no. “n” monitors the health of unit no. “n-1”. To close the loop, unit no. 1 acts as the health checker of unit no. “n”.

FIG. 1 CIRCULAR N REDUNDANCY CONCEPT
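The loop-chain arrangement in Figure 1 can be written down as a simple mapping from each unit to the unit it watches. This is only a minimal sketch: the function name `watch_targets` and the use of 1-based unit numbers are illustrative assumptions, not part of EGAT CCS.

```python
from typing import Dict


def watch_targets(n_units: int) -> Dict[int, int]:
    """Map each unit number to the unit whose health it checks.

    Units are numbered 1..n. Unit k (k >= 2) watches unit k-1, and
    unit 1 closes the loop by watching unit n.
    """
    if n_units < 2:
        raise ValueError("a redundancy ring needs at least two units")
    return {unit: (unit - 1 if unit > 1 else n_units)
            for unit in range(1, n_units + 1)}


if __name__ == "__main__":
    for watcher, target in watch_targets(4).items():
        print(f"unit {watcher} watches unit {target}")
```

With three units the mapping is 1 to 3, 2 to 1, and 3 to 2, so every unit is watched by exactly one other unit and no central watchdog is needed.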

At first, Transmission Control Protocol (TCP) communication was used for checking the health of a unit, but the time spent exceeded the 1-second requirement. User Datagram Protocol (UDP) communication, which has lower overhead than TCP, was therefore substituted and meets this requirement. The simple reason is that UDP has no acknowledgment (ACK) packets, which permits a continuous packet stream, whereas TCP acknowledges sets of packets determined by the TCP window size and the round-trip time (RTT). The health check for EGAT CCS redundancy therefore currently uses UDP. In the future, should EGAT need a shorter switching time, a preferred option could be OSI Layer 2 communication such as Generic Object Oriented Substation Events (GOOSE), defined in IEC61850-8-1.

CNR also requires a configuration file that describes the arrangement of the redundancy units. This file is located on every unit to avoid the single point of failure of a centralized arrangement. Promoting any redundancy unit changes the configuration file on the promoted unit. To give all other units the same updated configuration, the promoted unit writes the configuration files of all other units. This process ensures the validity of the configuration file held by every unit.

Flow of CNR Design Concept

The flow chart in Figure 2 explains how CNR works.

FIG. 2 CIRCULAR N REDUNDANCY FLOW CHART

1. When an application with redundancy starts, it checks the configuration file located on the same computer to determine whether it is the primary (active) or a backup (standby) unit. The file contains the IP addresses of all computers on which the applications reside, and the sequence of IP addresses indicates the priority of each unit. The active application has the highest priority, the runner-up unit (the first backup unit) has the second highest priority, and the remaining units follow the same pattern.
2. If it is the active application (highest priority), the program performs its expected function at once. It also checks the availability of the lowest-priority redundancy unit.
3. If it is a standby unit, the application loads all required initial parameters and starts working as a backup unit. It is also responsible for monitoring the availability of the unit whose priority is immediately higher than its own, whether that is the primary unit or another backup unit.
4. The availability checks of the redundancy system follow the circular chain inspection of CNR shown in Figure 1. Applying the distributed concept in this way removes the weakness of a centralized watchdog.
5. When any unit fails, the redundancy process works in one of two ways.
a. When a backup unit stops working, the system restarts this backup unit so that it acts as a standby unit again.
b. When the active unit becomes unavailable for any reason, the first runner-up standby unit promotes itself to active unit and performs the operational task. It also restarts the ex-active unit to work as the new lowest-priority standby unit.

Conformance Test

Before it was integrated with the other applications in EGAT CCS generation III, the CNR application was tested against two requirements: the switching time and the CNR concept.

1) Justification

The satisfaction criteria for CNR are the following.

- Switching time is less than 1 second.
- The system is able to perform as described in “Flow of CNR Design Concept”.

Besides the above two criteria, the minimum number “N” in CNR has to be identified in order to ensure the reliability of the control system through redundancy. The availability of the standby unit is the issue of concern.

- For a normally important environment, such as within a substation, the standby unit may be unavailable for no more than a single-digit number of seconds, according to interviews with users and in comparison with the investment cost.
- For a vital environment, such as within the control center, unavailability of the standby unit is unacceptable.

2) Testing Process

For the first requirement, two units of the CNR application are set up and connected to a Local Area Network (LAN), as illustrated in Figure 3. The number of redundancy units (N) is set to two because the switching time of concern occurs only between the failure of the active unit and the promotion of the first runner-up standby unit. The unavailability of an application is simulated by turning off that application. The time at which the active unit becomes unavailable is recorded, and the time is recorded again when the standby unit has completely promoted itself to active unit. The switching time is calculated as the difference between these two times.

TABLE 1 SWITCHING TIME TEST

FIG. 3 CNR CONFIGURATION FOR SWITCHING TIME TEST (two units, Active and Standby, connected over the LAN)
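The switching time measurement amounts to subtracting two timestamps and comparing the result with the 1-second criterion. The following sketch shows that arithmetic only; the function names and the example timestamps are hypothetical, and it does not reproduce the authors' test harness or any value from Table 1.

```python
SWITCHING_TIME_LIMIT_S = 1.0  # criterion from the Justification section


def switching_time(active_down_at: float, standby_promoted_at: float) -> float:
    """Return the switching time in seconds between two timestamps."""
    if standby_promoted_at < active_down_at:
        raise ValueError("promotion cannot precede the failure")
    return standby_promoted_at - active_down_at


def meets_requirement(duration_s: float,
                      limit_s: float = SWITCHING_TIME_LIMIT_S) -> bool:
    """True if the measured switching time is below the limit."""
    return duration_s < limit_s


if __name__ == "__main__":
    # Hypothetical timestamps in seconds, e.g. read from time.monotonic().
    elapsed = switching_time(active_down_at=100.00, standby_promoted_at=100.75)
    print(elapsed, meets_requirement(elapsed))
```

In a real test the two timestamps would have to come from the same clock, or from clocks synchronized between the two machines, for the difference to be meaningful.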

For the second requirement, three units of the CNR application are set up and connected to the LAN, as depicted in Figure 4. Each of the following cases is performed three times in order to complete a full round of the CNR loop.

- The active application is turned off.
- The first runner-up standby application is turned off.
- The last standby application is turned off.
- Both the active and the first runner-up standby applications are turned off.
- Both the active and the last standby applications are turned off.
- Both the first runner-up and the last standby applications are turned off.

TABLE 2 REDUNDANCY CASE TEST

FIG. 4 CNR CONFIGURATION FOR CASE VERIFICATION (three units: Active, Standby#1, and Standby#2)
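The six cases can be replayed against a toy model of the promotion rules from “Flow of CNR Design Concept”. This is a minimal sketch, not the EGAT implementation: the function `apply_failures` is hypothetical, the unit names are taken from Figure 4, and the handling of simultaneous failures is an assumption, since the paper spells out only the single-failure rules.

```python
from typing import List, Set


def apply_failures(priority: List[str], failed: Set[str]) -> List[str]:
    """Return the new priority order (index 0 = active) after failures.

    Rules sketched from "Flow of CNR Design Concept":
    - a failed standby unit is restarted and keeps its place as a standby;
    - if the active unit fails, the highest-priority surviving unit is
      promoted, and every failed unit that ranked above it is restarted
      as a lowest-priority standby (assumption for multi-unit failures).
    """
    unknown = failed - set(priority)
    if unknown:
        raise ValueError(f"unknown units: {sorted(unknown)}")
    survivors = [unit for unit in priority if unit not in failed]
    if not survivors:
        raise RuntimeError("all units failed; no unit can be promoted")
    cut = priority.index(survivors[0])
    # Units from the new active onward keep their relative order;
    # failed units that ranked above it rejoin at the end of the list.
    return priority[cut:] + priority[:cut]


if __name__ == "__main__":
    units = ["Active", "Standby#1", "Standby#2"]
    cases = [
        {"Active"},
        {"Standby#1"},
        {"Standby#2"},
        {"Active", "Standby#1"},
        {"Active", "Standby#2"},
        {"Standby#1", "Standby#2"},
    ]
    for failed in cases:
        print(sorted(failed), "->", apply_failures(units, failed))
```

Turning off the active unit alone returns the order Standby#1, Standby#2, Active, which matches rule 5b: the first runner-up is promoted and the ex-active unit rejoins with the lowest priority. The returned list is also what the promoted unit would write to every unit's configuration file in this model.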

Result of the Conformance Test

1. The major concern of redundancy is the switching time. The test was performed as described in the previous section, and the resulting switching times are recorded in Table 1. None of the switching times in Table 1 is greater than one second. The mean switching time is 0.6292795 seconds, with a standard deviation of 0.160768 seconds. According to the IEC 61850-5 Edition 2 standard requirement, the application recovery delay is 0.8 sec. Therefore, the first requirement of redundancy is satisfied by CNR.

2. The second test is a redundancy case verification covering six cases. The result of this test is shown in Table 2.

CNR is proven in all six cases. The system is able to perform as specified in “Flow of CNR Design Concept”.

3. In addition to the above two requirements, the availability of the standby unit is also crucial. The durations of the standby unit restarting process are recorded in Table 3. The duration of less than six seconds (mean: 5.72523, standard deviation: 0.140678) is taken into account in selecting the number of units “N”. It shows that the redundancy system is degraded for about 6 seconds. If “N” is equal to 2, there is a chance of having no standby unit for 6 seconds. Since this is within the single-digit-second limit set for the substation environment, 2 units of redundancy are acceptable in the scope of a substation control system. However, this circumstance is very dangerous for the reliability of the control system in the control center. Therefore, “N” should preferably be at least 3 to ensure the highest level of reliability.

TABLE 3 REDUNDANCY RESTARTING TEST

Performance Test (with IEC61850)

Since CNR is intended to be the redundancy system of the third generation of EGAT CCS, its performance was tested with an in-house prototype. In this prototype, two sets of Human Machine Interface (HMI), each with an IEC61850 communication module integrated with CNR, were connected to the LAN. The IEC61850 communication module in this test used the IEC61850-8-1 mapping to Manufacturing Message Specification (MMS) as its communication protocol. Each set was connected to two subnets of the LAN. However, the IEC61850 Intelligent Electronic Device (IED) available to us could connect to only one network at a time. Therefore, two identical IEC61850 IEDs were used in this test: one IED was connected to one network and the other to the second network, as shown in Figure 5. These two identical IEDs acted as one IED connected to two networks.

FIG. 5 CNR WITH IEC61850 COMMUNICATION TEST (Active HMI and Standby HMI connected through Network#1 and Network#2 to the 1st IEC61850 IED and the 2nd IEC61850 IED)

In this performance test, both CNR and network redundancy were investigated. For the network redundancy test, the connection from the “Active HMI” was switched between the “1st IEC61850 IED” and the “2nd IEC61850 IED”. The result was positive. Loss of “Network#1” caused the connection to change to one between the “Active HMI” and the “2nd IEC61850 IED” over “Network#2”. Conversely, on loss of “Network#2”, the connection was switched to “Network#1”, and the connection between the “Active HMI” and the “1st IEC61850 IED” was established.

For the CNR performance test in the IEC61850 environment, the loss of the “Active HMI” was simulated several times. The outcome was acceptable. Immediately after the “Active HMI” was lost, the “Standby HMI” was promoted to be the new “Active HMI”. It then connected to the “1st IEC61850 IED” if “Network#1” was available, or to the “2nd IEC61850 IED” if “Network#1” was lost.

Conclusions

CNR is selected as the redundancy system of the third generation of EGAT CCS, and the concept of a distributed watchdog is implemented with it. In the scope of a substation control system, 2 units of redundancy are acceptable. However, the degraded period in which no standby unit is available is very dangerous for the reliability of the control system in the control center, so “N” should preferably be at least 3 there. CNR has also been tested with “N” equal to 4 and 5, with the same result. Therefore, “N” in the CNR method can in theory be extended without limit. Since more than one application can reside on a machine, the number of actual machines used in the control system stays the same, and there is no extra investment cost for machines. Moreover, the redundancy is proven in both switching time and validity of the process. Finally, the performance test with IEC61850 yields a positive result. CNR is thus proven as a redundancy system for EGAT CCS.

However, the rollback function and the memory for commands sent during the failure of an active unit are not yet included in the prototype. To make the redundancy system complete, these two functions will be integrated in the future.

REFERENCES

Anonymous. “Redundant System Basic Concepts.” National Instruments, January 11, 2008. http://www.ni.com/white-paper/6874/en.

Beckman, Lawrence V. “The New Quad Architecture: Explanation and Evaluation.” 2004. http://www.eic2.com/pdf/hima_quad_architecture.pdf.

Chen, Liming and A. Avizienis. “N Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation.” Paper presented at the Twenty-Fifth International Symposium on Fault Tolerant Computing, June 27-30, 1995.

Habinc, Sandi. “Functional Triple Modular Redundancy (FTMR): VHDL Design Methodology for Redundancy in Combinatorial and Sequential Logic.” 2002. http://klabs.org/richcontent/fpga_content/DesignNotes/seu_hardening/functional_tmr_fpga_003_01-0-2.pdf.

Paukatong, T., J. Puangnak, and T. Luewan. “From Real-Time Data to a Power Network Display: an EGAT-SCADA’s Conceptual Design.” Paper presented at Cigré/IEEE-PES Symposium II, Congestion Management in a Market Environment, Texas, USA, 2005.

Wilson, Frank R. “How Much Redundancy of Communications in Control Systems is Really Necessary?” Paper presented at ISA EXPO 2005, McCormick Place Lakeside Center, Chicago, Illinois, USA, 2005.

Vittawus Prueksasri was born in Bangkok, Thailand, in 1987. He received a Bachelor's degree in Telecommunication Engineering from King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand, in 2010 and a Master’s degree in Computer Engineering from Chulalongkorn University, Bangkok, Thailand, in 2012. In late 2012 he joined EGAT. He now works as a member of the EGAT-SCADA Software Development team in the Transmission Control System Development Department, Control and Protection Division.

Thumanoon Paukatong received Bachelor’s degrees in Engineering and Business Administration in 1992. He also holds an M.S. in Electrical Engineering (1994) from the University of Southern California, an M.B.A. (1996) from Kansas State University, and a Ph.D. in Management of Technology (2006) from the Asian Institute of Technology. He currently works for EGAT in the Transmission Control System Development Department, Control and Protection Division, where he leads the EGAT-SCADA Software Development team. He is responsible for the development of EGAT Control System Generation III and for EGAT-SCADA security. He received the first-rank research excellence award from the Kamton Sintavanon Foundation in 2002.
