A Controller-Based Autonomic Defense System


Derek Armstrong ALPHATECH, Inc

Gregory Frazier ALPHATECH, Inc

Sam Carter ALPHATECH, Inc

Tiffany Frazier ALPHATECH, Inc

Abstract

We will be demonstrating the results of our research into the implementation of a host-based Autonomic Defense System (ADS) using a Partially-Observable Markov Decision Process (PO-MDP). The goal of an ADS is to “reflexively” respond to an attack, thwarting it to the extent that humans have time to form a tactical response. A defensive system that automatically responds to an attack must meet two criteria: it must select the correct response in the face of an attack, and it must not take action against attacks that are not there. This challenge is exacerbated by the fact that, in order to detect never-before-seen attacks, the ADS must use anomaly detectors for its sensor input, and anomaly detectors typically have relatively high false positive and false negative rates. Thus, key to an ADS is a controller that can obtain a valid signal from a noisy sensor. The ALPHATECH Lightweight Autonomic Defense System (αLADS) is a prototype ADS constructed around a PO-MDP stochastic controller. The state model allows the controller to filter out the false positives from the anomaly sensor such that authorized processes are not killed and false alerts are not issued. We will demonstrate αLADS defending against Internet worms operating in real time.

1. Introduction

The ALPHATECH Lightweight Autonomic Defense System (αLADS) is a prototype host-based intrusion detection and response system. The underlying technology is a stochastic feedback controller based on Partially-Observable Markov Decision Processes (PO-MDPs). The controller takes its input from a commercially available anomaly sensor, calculates the probability that the system is in an attack state, and invokes actuators to respond to a perceived attack. This prototype demonstrates both the facility of this system and the potential of a stochastic controller to direct the actions of an autonomic system.

The technical approach adopted by this project was to develop a real-time controller that takes inputs from sensors embedded in the operating system kernel and invokes actuators in a timely manner to defend the computing system. A controller was developed in C++. A shared-memory module was developed to provide a high-bandwidth, low-latency connection between the sensors and the controller. In parallel with the development of the controller, an off-line system was developed in Matlab to process the training data and develop the state transition probability matrices and observation probability matrices that are the PO-MDP models at the core of αLADS. The off-line system is capable not only of generating the models but also of performing rudimentary simulations of the controller to assist in controller evaluation. However, given the complexity and real-time nature of αLADS, the evaluation of its performance must be empirical. As such, we have begun to perform a series of experiments to investigate the intricacies of controller behavior. These experiments and their analysis are inherently complex: because the controller’s behavior is stochastic, it can be very difficult to associate a controller action with the system activity that may have caused it. Thus, we pursued as simple a design as possible to facilitate the analysis.

A brief summary of the accomplishments of the αLADS program:

• The development of a real-time, feedback-based controller integrated with a commercial sensor package.
• The implementation of actuators that protect a workstation from attack.
• Development of an off-line analysis package that processes data extracted from an operational system to create a PO-MDP model.
• The acquisition of a suite of system attacks that allows us to experiment with system defense.
• A Cyber Workbench system for performing an empirical analysis of controller behavior. It tracks the “ground truth” of the system, allowing comparison of what was really happening with what the controller may have thought was happening on the system.

As the experimental results presented in this report show, we have successfully developed an autonomic defense system that accomplishes sub-second response to automated computer system attacks.

Ongoing research includes:

• Continuing to refine our understanding of controller and anomaly sensor behavior.
• The development of sophisticated actuators that provide a more comprehensive defensive capability.
• Distributed and hierarchical autonomic systems that coordinate in the defense of organizational computing enterprises, or even the entire Internet.

Proceedings of the DARPA Information Survivability Conference and Exposition (DISCEX’03) 0-7695-1897-4/03 $17.00 © 2003 IEEE
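The off-line model-building step can be pictured as counting over labeled training data. The following is a minimal Python sketch, not the project's Matlab code, whose algorithm is not described here. It assumes a training trace in which every time step carries a ground-truth state label, the sensor observation, and the control that was invoked. The function name, the tuple layout, and the additive smoothing are all hypothetical choices made for illustration.

```python
import numpy as np

def estimate_pomdp_matrices(trace, n_states, n_controls, n_obs, smoothing=1.0):
    """Estimate PO-MDP matrices from a labeled trace by counting.

    trace: list of (state, observation, control) integer tuples, one per
           time step k, i.e. (x_k, z_k, u_k). This layout is an assumption.
    Returns:
      T[u, x, x2] ~ Pr(x_{k+1} = x2 | x_k = x, u_k = u)
      O[u, x, z]  ~ Pr(z_k = z | x_k = x, u_{k-1} = u)
    """
    # Additive smoothing keeps unseen events from getting probability zero
    # (an illustrative choice, not taken from the paper).
    T = np.full((n_controls, n_states, n_states), float(smoothing))
    O = np.full((n_controls, n_states, n_obs), float(smoothing))

    # Walk over consecutive pairs of time steps (k, k+1).
    for (x, _z, u), (x_next, z_next, _u_next) in zip(trace, trace[1:]):
        T[u, x, x_next] += 1       # transition observed under control u_k
        O[u, x_next, z_next] += 1  # observation z_{k+1} in x_{k+1} after u_k

    # Normalize each row into a probability distribution.
    T /= T.sum(axis=2, keepdims=True)
    O /= O.sum(axis=2, keepdims=True)
    return T, O

# Toy trace: states 0 = normal, 1 = attack; observations 0 = quiet,
# 1 = anomaly; controls 0 = observe, 1 = kill. All values are made up.
toy_trace = [(0, 0, 0), (0, 0, 0), (1, 1, 1), (0, 0, 0)]
T_est, O_est = estimate_pomdp_matrices(toy_trace, n_states=2, n_controls=2, n_obs=2)
print(T_est)
print(O_est)
```

With `smoothing=0` the estimate reduces to raw relative frequencies, but any (control, state) pair that never occurs in the trace would then produce a division by zero, which is why the sketch defaults to a positive value.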

2. Summary of PO-MDP

This section provides a brief overview of what a PO-MDP control system is and how an ADS based on a stochastic control system works.

Consider a system and a controller that exerts some degree of control over the system. The state of the system varies over time in response to external stimuli. The controller possesses controls that can be invoked to affect the current state of the system. The goal of the controller is to minimize a cost imposed on the system over time.

A number of factors affect the design of such a controller. One is observability: the extent to which the controller can determine the true state of the system. If the system is only partially observable (i.e., there is not a direct mapping between sensor observations and system state), the controller must maintain an estimate of the system state rather than knowing the true state. A second factor is controllability: the ability of the controller to affect the state of the system via its controls. The system is completely controllable if the controls deterministically set the state of the system. For a partially controllable system, the controls have a probabilistic effect on future states of the system.

Given that intrusion detection sensors generate both false positives and false negatives, one cannot determine with certainty from sensor input whether a computer is under attack (i.e., the system is partially observable). Further, while controls can be designed to oppose an attack, it is not certain that invoking these controls will return the system to a normal state (i.e., the system is partially controllable). A PO-MDP (Fig. 1) efficiently models such a system.

A PO-MDP is a generalization of a Markov Decision Process (MDP) that addresses partially controllable and partially observable control problems. For an MDP, the next state and expected cost depend only on the current state and the last control invoked. The transition probabilities and expected cost are independent of all other previous states and controls applied. If the controller cannot access the exact state, but instead receives one sensor observation z_k prior to selecting a control, then determining the current state and selecting which control to invoke is more problematic. First, the PO-MDP must generate a vector of probabilities to characterize its belief with regard to the current state of the system. At each time step, this vector is updated by combining the previous belief distribution, the control previously invoked (which determines the state transition probabilities), and the current sensor reading. The result is a new belief vector (B_k in Fig. 1), which is passed to the response selector. The response selector must choose a response based on the cost of the controls, the cost of being in the individual states, the probability that the system is in each of those states, and the probability that the control will move the system to a more desirable state. [1] provides a mathematically rigorous description of the PO-MDP ADS.

[Figure 1: PO-MDP Model. Block diagram. The system, characterized by the transition probabilities Pr(x_{k+1} | x_k, u_k) and the observation probabilities Pr(z_k | x_k, u_{k-1}), passes the measurement z_k to the stationary PO-MDP controller. Inside the controller, the recursive estimator computes B_k = ϕ(z_k, u_{k-1}, B_{k-1}) and the response selector computes u_k = µ(B_k); delay elements supply u_{k-1} and B_{k-1}. Notation: k, time step; x, system state; z, measurements; u, invoked control; B, state estimate; µ, control law; ϕ, state estimation function.]
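The belief update performed by the recursive estimator can be illustrated with the standard discrete Bayes filter: predict with the transition probabilities for the previously invoked control, weight by the likelihood of the new observation, and renormalize. The Python sketch below is one common way to write that step and is not taken from the αLADS implementation. The two-state model (normal, attack), the single Observe control, and every probability in it are assumed values chosen only for illustration.

```python
import numpy as np

def update_belief(belief, control, observation, T, O):
    """One step of the recursive estimator, B_k = phi(z_k, u_{k-1}, B_{k-1}).

    belief:      B_{k-1}, probability vector over states
    control:     index of u_{k-1}
    observation: index of z_k
    T[u, x, x2]: Pr(x_k = x2 | x_{k-1} = x, u_{k-1} = u)
    O[u, x, z]:  Pr(z_k = z | x_k = x, u_{k-1} = u)
    """
    # Prediction: push the old belief through the transition model.
    predicted = belief @ T[control]
    # Correction: weight each state by how well it explains z_k.
    unnormalized = O[control, :, observation] * predicted
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total

# Hypothetical model: states 0 = normal, 1 = attack; one control (Observe).
OBSERVE, QUIET, ANOMALY = 0, 0, 1
T = np.array([[[0.99, 0.01],    # normal -> normal / attack
               [0.10, 0.90]]])  # attack -> normal / attack
O = np.array([[[0.90, 0.10],    # normal: 10% false-positive rate (assumed)
               [0.30, 0.70]]])  # attack: 30% false-negative rate (assumed)

belief = np.array([1.0, 0.0])   # start certain that the system is normal
for step in range(3):
    belief = update_belief(belief, OBSERVE, ANOMALY, T, O)
    print(f"after anomaly {step + 1}: Pr(attack) = {belief[1]:.3f}")
```

Under these assumed numbers, a single anomalous reading moves the attack belief only slightly, while a run of consecutive anomalies drives it up quickly. This is the mechanism by which a belief-based controller can discount isolated false positives from a noisy sensor.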

3. Experiments and Results

To demonstrate the effectiveness of the PO-MDP approach, we constructed a prototype ADS and evaluated it using two criteria: its ability to respond to and thwart attacks in real time, and its success in ameliorating false positives. The experiments were conducted by attacking an i386-based personal computer running Red Hat Linux 6.2 with the Apache/1.3.12 web server and the wu-2.6.0(1) FTP server. This platform was chosen based on the availability of both a kernel-based anomaly sensor and a suite of known attacks (we did not apply the patches that would foil the selected attacks). The prototype ADS was equipped with only four controls:

Observe   Wait for the next measurement.
Notify    Alert the system about the attack.
Kill      Kill a recent anomalous process.
Halt      Shut the computer down.

This is enough controls that we could experiment with cost models and response selection algorithms, yet the controls are few and simple enough that we could concentrate on understanding the controller’s behavior, which was the focus of our research.
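To make the interplay between the four controls, a cost model, and response selection concrete, here is a minimal Python sketch of a one-step (myopic) response selector. It picks the control that minimizes the control cost plus the expected cost of the next state under the current belief. This greedy rule is a simplification chosen for illustration and is not the response selection algorithm used in αLADS. The two-state model, the transition probabilities, and all cost values are hypothetical.

```python
from enum import IntEnum
import numpy as np

class Control(IntEnum):
    OBSERVE = 0  # wait for the next measurement
    NOTIFY = 1   # alert the system about the attack
    KILL = 2     # kill a recent anomalous process
    HALT = 3     # shut the computer down

def select_response(belief, T, control_cost, state_cost):
    """Greedy one-step selector: argmin_u c(u) + E[state cost at k+1 | B_k, u]."""
    expected = [
        control_cost[u] + belief @ T[u] @ state_cost
        for u in Control
    ]
    return Control(int(np.argmin(expected)))

# Hypothetical model with states 0 = normal, 1 = attack.
# T[u, x, x2] = Pr(next state x2 | state x, control u); all values assumed.
T = np.array([
    [[0.99, 0.01], [0.05, 0.95]],  # OBSERVE: an attack mostly persists
    [[0.99, 0.01], [0.05, 0.95]],  # NOTIFY: no direct effect on the state
    [[0.99, 0.01], [0.80, 0.20]],  # KILL: usually stops the attack
    [[1.00, 0.00], [1.00, 0.00]],  # HALT: always stops the attack
])
control_cost = np.array([0.0, 1.0, 5.0, 50.0])  # assumed cost of each control
state_cost = np.array([0.0, 100.0])             # assumed cost of each state

print(select_response(np.array([0.99, 0.01]), T, control_cost, state_cost).name)
print(select_response(np.array([0.20, 0.80]), T, control_cost, state_cost).name)
```

With these assumed numbers the selector keeps observing while the attack belief is low and switches to Kill once the belief is high; Halt is reserved for cost models in which Kill is less effective. Notify is never chosen here because this toy model gives it a cost but no effect on the state, so a realistic cost model would need to credit the value of alerting an operator.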

3.1 Using Noisy Sensors

An important goal for the PO-MDP approach is to be able to obtain a signal from one or more noisy sensors. Specifically, the state model allows the controller to filter out the false positives from the anomaly sensor such that authorized processes are not killed and false alerts are not issued. To validate this, the victim box was run with a normal load (no attacks) for 90 minutes. In one case, the system was protected by the PO-MDP ADS prototype. In the other, the Kill actuator was directly triggered by anomalous sensor readings (a static controller).

Table 1: Behavior when there is no attack. Total number of anomalies and kills in 90 minutes, and the average number of kills per minute.

            PO-MDP   Static
anomalies   543      26
kills       1        26
kills/min   0.011    0.289

Table 1 presents the results of this experiment. Even under normal loads there will be some anomalous behavior: activity on the system whose distance from normal exceeds the established threshold (that is, false positives). The PO-MDP controller received 543 anomalous inputs over the 90-minute span. However, since most of the anomalies did not map to its attack model, only a single process was killed during that time. The static controller saw significantly fewer anomalies because it killed the processes that excited the anomaly detector. The conclusions that can be drawn from this table are that an ADS built on a feedback controller is less likely than a static controller to respond inappropriately to authorized system activity, and is thus able to make effective use of a more sensitive anomaly detector.

3.2 Responding to Attacks

While correctly handling false positives is important, it is equally important that the ADS respond to a real attack. It is a necessary aspect of the PO-MDP technology that the controller be trained against some data set. Ultimately, one would want to train an ADS against all known instances of a class of attack that it is to defend the system against. Given the constraints of our project, we trained the controller against a single worm, and then tested its ability to defend against that worm (an attack against an FTP server) and two similar worms that the controller had never seen before (attacks against the named and rpc.statd servers).

Table 2: Summary of results for 30 attacks. The average time value for the Notify control is from the beginning of the scan portion of the attack to the notify. The number thwarted is how many of the thirty attacks were stopped by Kill actions. The average time of Kill actions is from the start of the bypass to the first kill attempt. # comp is the number of times the attack succeeded in compromising the system, and # recvr is the number of times the Halt control was invoked.

Attack      notifies   avg time   thwarted   avg time   total kills   # comp   # recvr
ftpd        30         1.80 s     30         0.90 s     63            0        0
rpc.statd   30         1.90 s     30         5.12 s     203           0        2
named       30         1.96 s     30         1.73 s     153           0        2

Table 2 shows the results of attacking the victim box thirty times with each of the three worms. Given that the controller was trained against the FTP attack, it is unsurprising that the ADS had its best results against that worm. The ADS issued a significantly higher number of kills per attack against the other two worms. Also, for two runs of each of those attacks, the ADS invoked the Halt control even though the attack had been thwarted. That being said, the ADS was highly effective at thwarting these previously unseen attacks.
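The rate figures in the no-attack table, and the per-attack kill rates implied by the 30-attack table, follow from simple division. The short Python check below uses only the counts reported in the two tables. The kills-per-attack values are derived here for illustration and do not appear in the original tables, and the variable names are arbitrary.

```python
# Counts taken from the two result tables.
MINUTES = 90            # duration of the no-attack run
ATTACKS_PER_WORM = 30   # number of attack runs per worm

no_attack_kills = {"PO-MDP": 1, "Static": 26}
total_kills = {"ftpd": 63, "rpc.statd": 203, "named": 153}

# Kills per minute under a normal load (reported in the no-attack table).
kills_per_min = {name: kills / MINUTES for name, kills in no_attack_kills.items()}

# Kills per attack (derived, not reported in the original table).
kills_per_attack = {worm: kills / ATTACKS_PER_WORM for worm, kills in total_kills.items()}

for name, rate in kills_per_min.items():
    print(f"{name}: {rate:.3f} kills/min")
for worm, rate in kills_per_attack.items():
    print(f"{worm}: {rate:.2f} kills per attack")
```

The kills-per-minute values round to the 0.011 and 0.289 shown in the no-attack table, and the derived per-attack rates show that the two unseen worms drew more than twice as many Kill actions per attack as the FTP worm the controller was trained on.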

4. The Demonstration

We will demonstrate the experiment described in §3.2. The victim computer will run a “normal” workload, as well as αLADS. A second computer will launch an attack against the first. We will first show that the attack fails. A graphical display will show the signals generated by the anomaly detector during the attack. After the attack, we will use a visualization tool to show the belief states that the αLADS controller went through, and what it believed about the state of the system at the points in time when it responded to the attack.
