
TOPICS IN NETWORK AND SERVICE MANAGEMENT

Toward Decentralized Probabilistic Management

Alberto Gonzalez Prieto, Cisco Systems
Daniel Gillblad and Rebecca Steinert, Swedish Institute of Computer Science
Avi Miron, Israel Institute of Technology

ABSTRACT

In recent years, data communication networks have grown to immense size and have been diversified by the mobile revolution. Existing management solutions are based on a centralized deterministic paradigm, which is appropriate for networks of moderate size operating in relatively stable conditions. However, it is becoming increasingly apparent that these management solutions are not able to cope with the large dynamic networks that are emerging. In this article, we argue that the adoption of a decentralized and probabilistic paradigm for network management will be crucial to meet the challenges of future networks, such as efficient resource usage, scalability, robustness, and adaptability. We discuss the potential of decentralized probabilistic management and its impact on management operations, and illustrate the paradigm with three example solutions for real-time monitoring and anomaly detection.

INTRODUCTION

The work presented in this article was done while the first author was at the Royal Institute of Technology (KTH), Sweden.

Current network management solutions typically follow a centralized and deterministic paradigm, under which a dedicated station executes all management tasks in order to manage a set of devices [1]. This paradigm has proved successful for communication networks of moderate size operating under relatively stable network conditions. However, the growing complexity of communication networks, with millions of network elements operating under highly dynamic network conditions, poses new challenges to network management, including efficient resource usage, scalability, robustness, and adaptability. Scalability and robustness are essential for coping with increasing network sizes while providing resilience against failures and disturbances. Adaptability of management solutions will be crucial, as networks will increasingly operate under conditions that vary over time. Furthermore, since communication and computational capacities are limited, it is critical that management solutions use network resources efficiently.

In order to meet these challenges, new approaches to network management must be considered. We believe that two key approaches will become increasingly important: decentralized management, where management algorithms operate locally at the managed nodes without centralized control; and probabilistic management, in which management decisions and policies are not based on deterministic and guaranteed measurements or objectives. As we argue in this article, decentralized and probabilistic approaches are well positioned to solve the critical network management challenges listed above, and therefore a move toward decentralized probabilistic management seems likely. This move, however, means a fundamental shift in how network performance is viewed and controlled, and introduces operational changes for those who configure and maintain networks.

In this article we discuss decentralized probabilistic management, describing how it can solve pressing challenges that large-scale dynamic networks bring to management, and how the adoption of this approach impacts network operations. We discuss the benefits, drawbacks, and impact of moving toward decentralized probabilistic management, provide evidence of and directions for current developments, and present three concrete examples that demonstrate the merits of decentralized probabilistic management.

DECENTRALIZED PROBABILISTIC MANAGEMENT

ASPECTS OF PROBABILISTIC MANAGEMENT

Management solutions can be probabilistic in one or several aspects. A probabilistic management algorithm:

• Makes use of probabilistic models for representing the network state. That is, the algorithm represents the network state as probability distributions, using, for example, simple parametric representations [2], more complex graphical or mixture models [3, 4], or sampling from generative models.

• Does not necessarily use deterministic objectives, but rather objectives that are specified in terms of probabilities and uncertainties [2, 5]. This could, for example, mean specifying a bound on the service failure probability.

• Might provide its output in terms of probability distributions instead of deterministic values. As an example, the average link load over a network could be reported with an expected mean value and variance instead of a single value [6].

• Might implement a probabilistic sampling algorithm, say, by randomly turning management functions on and off [6], relying on a random subset of nodes to estimate an aggregate value for all nodes [7], or sampling relevant network parameters at random rather than regular intervals [8].

• Might perform network control actions, or explore the consequences of control actions, using an element of randomness [9]. For example, the efficiency of different routing strategies could be continuously explored to adapt to changing network conditions.

The first of these aspects is inherent to all probabilistic management solutions, whereas the other aspects may or may not be present in a specific algorithm.
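
As a minimal illustration of the first three aspects, consider the following Python sketch (our own illustrative example; the class, the parameter values, and the Gaussian assumption are not taken from any of the cited solutions). It maintains a simple parametric model of a link delay, evaluates a probabilistic objective (a bound on the probability that the delay exceeds a limit), and reports the state as a mean and variance rather than a single value.

    import math
    import random

    class LinkDelayModel:
        """Illustrative Gaussian model of a link's probe delay (an assumed representation)."""
        def __init__(self):
            self.n, self.mean, self.m2 = 0, 0.0, 0.0   # Welford running statistics

        def update(self, sample_ms):
            self.n += 1
            delta = sample_ms - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (sample_ms - self.mean)

        @property
        def variance(self):
            return self.m2 / (self.n - 1) if self.n > 1 else 0.0

        def prob_delay_exceeds(self, limit_ms):
            """P(delay > limit) under the fitted Gaussian model."""
            if self.variance == 0.0:
                return 0.0 if self.mean <= limit_ms else 1.0
            z = (limit_ms - self.mean) / math.sqrt(self.variance)
            return 0.5 * math.erfc(z / math.sqrt(2.0))

    # Probabilistic objective: the probability of the delay exceeding 50 ms must stay below 1%.
    model = LinkDelayModel()
    for _ in range(1000):
        model.update(random.gauss(20.0, 5.0))      # stand-in for probe measurements
    print("state: mean=%.1f ms, variance=%.1f" % (model.mean, model.variance))
    print("objective met:", model.prob_delay_exceeds(50.0) < 0.01)

A deterministic counterpart would instead compare each sample against a fixed limit and report a single number; here, both the objective and the reported state are distributions.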

POTENTIAL BENEFITS AND DRAWBACKS

Decentralized approaches have several benefits: they scale very well with increasing network size, adapt well to churn, and avoid single points of failure, and thus can be significantly more robust than centralized solutions.

Probabilistic approaches can be significantly more resource-efficient than deterministic ones. By exploiting the slack provided by probabilistic rather than deterministic objectives (which often assume worst-case scenarios), the amount of bandwidth and processing resources consumed by the solutions can be significantly reduced. This feature is highly valuable for volatile networks, such as those found in cloud computing. Probabilistic approaches are also highly suitable for efficiently managing uncertainty and noise, thereby improving the robustness of algorithm performance. Furthermore, probabilistic management changes the way networks are configured, as the goals of the management algorithms can be stated as acceptable probabilities or distributions of performance metrics, allowing for intuitive interaction with the operator.

A possible drawback of decentralized approaches is that the lack of centralized control typically leads to suboptimal solutions to management problems. For probabilistic approaches, a possible drawback follows from the introduction of uncertainty: they do not provide operators with deterministic control over the state of managed devices. Note, however, that while probabilistic management cannot provide hard guarantees on the accuracy of network measurements, capturing an accurate snapshot of today's dynamic and volatile networks is already impossible. Managers must already accept some level of uncertainty in the data collected for management operations, and the introduction of probabilistic approaches might therefore come at an acceptable cost in this regard.
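
As a small numeric illustration of the resource-efficiency argument above (an assumed example with made-up numbers, not taken from the cited studies), the following snippet compares worst-case link provisioning with provisioning for a 1 percent overload probability under a Gaussian approximation of the aggregate load.

    import math

    n_flows   = 100    # independent flows sharing a link (assumed figures)
    mean_mbps = 1.0    # mean rate per flow
    peak_mbps = 5.0    # worst-case rate per flow
    std_mbps  = 1.0    # per-flow standard deviation

    # Deterministic objective: never exceed capacity, so assume every flow peaks at once.
    worst_case_capacity = n_flows * peak_mbps

    # Probabilistic objective: allow a 1% overload probability. By the central limit
    # theorem the aggregate load is roughly Normal(n*mean, n*std^2).
    z_99 = 2.326   # 99th percentile of the standard normal distribution
    probabilistic_capacity = n_flows * mean_mbps + z_99 * math.sqrt(n_flows) * std_mbps

    print("worst-case capacity:    %.0f Mb/s" % worst_case_capacity)       # 500 Mb/s
    print("99%%-objective capacity: %.0f Mb/s" % probabilistic_capacity)    # ~123 Mb/s

The gap between the two numbers is the slack that a probabilistic objective makes available to the management system.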

The combined benefits of decentralized and probabilistic approaches amount to management solutions that operate in a failure-resilient and resource-efficient manner. The potential drawbacks are less detailed control over managed devices and suboptimal solutions.

IMPACT ON NETWORK MANAGEMENT

The introduction of decentralized probabilistic management approaches will bring a paradigm shift in how we specify network objectives and view network performance. Specifically, terms such as risk and uncertainty must now be taken into account when configuring and running networks. For example, although probabilistic management solutions may achieve better performance on average, they may, with some small probability, miss objectives or perform badly. Managers can no longer rely on strict guarantees on network performance, and need to view the goals of the network as an expected service quality while allowing for some variance. Additionally, the information that management and configuration decisions are based on is no longer guaranteed to be within deterministic bounds, and could, with a low probability, be out of the expected range.

With the new paradigm, network managers will not only be utilizing strict rules, but will also be specifying objectives that include uncertainties. In addition to monitoring network performance, the role of a network manager is likely to also include monitoring the performance of decentralized probabilistic management solutions. Related to this, managers will have to analyze problems in the network differently, as the increased autonomy and uncertainty of decentralized probabilistic management solutions could allow some underlying problems to remain undetected.

In summary, network managers will need to think about the network in terms of uncertainties and likely scenarios while considering the expected cost and cost variability. This is a very different approach from current practices, where managers focus on upholding strict guarantees on parameters staying within predetermined limits.

FURTHER CONSIDERATIONS

Autonomous mechanisms are an important part of efficient management processes in complex networks. Naturally, adaptability is key to achieving the autonomy necessary to reduce operational costs in future networks. Although facilitated by a probabilistic approach, adaptability is by no means an inherent property and must be considered during method development. However, the combination of decentralized and probabilistic approaches allows for the design of highly adaptable solutions.

Increased autonomy and adaptability will likely lead to less detailed control of managed resources for network operators. In our view, this means that decentralized probabilistic solutions need to be developed with an additional property in mind to gain acceptance: the solutions must provide managers with an accurate prediction of the performance of a management solution (e.g., the amount of resources it consumes), and managers need to be able to control this performance. As illustrated by the specific examples provided later, this type of performance control and prediction is rather straightforward to express in the adaptive probabilistic models we envision.

CURRENT DEVELOPMENT

This section presents prior work in the area of decentralized probabilistic management systems, discussing, in turn, the management framework and management operations in the areas of monitoring, diagnosis, and traffic management.

Probabilistic approaches can be used to enhance the management infrastructure in terms of resource efficiency [6, 10]. A probabilistic decentralized framework for network management is presented in [6], in which management functions are randomly turned on or off, thereby effectively exploiting redundancy in those functions. By means of simulation, the study demonstrates reduced effort and resources required for performance and fault management operations, while still achieving a sound level of accuracy in the overall network view. Relying on probabilistic estimates, self-organization of sensor and ad hoc networks is discussed in [10]. The study makes use of connectivity probability information in order to select the management clusters that can efficiently carry out the management tasks.

A further example makes use of random sampling among network entities, where network monitoring operations benefit from gossip-based solutions for aggregating network management data. This can be carried out by nodes that collaborate with randomly selected neighbors [11]. Neighbor sampling during discovery results in more efficient data dissemination than non-gossip flooding schemes. A more recent study [12] further improves the efficiency of the neighbor discovery process by implementing a probabilistic eyesight direction function, which narrows the directions in which neighbors are sought.

Network diagnosis is explored in multiple studies using probabilistic representations, such as graphical models, to infer faults; two of these studies also add decentralized processing [3, 4]. Moreover, [4] implements a collaborative approach based on a Bayesian network model. The scheme effectively handles uncertainty in dynamic networks, which was demonstrated in three different scenarios. The probabilistic approach makes it possible to provide diagnostics with a limited amount of information, although the more evidence is available, the greater the certainty of the diagnosis. The study reported in [3] presents an extensible Bayesian-inference architecture for Internet diagnosis, which deploys distributed diagnostic agents and includes a component ontology-based probabilistic approach to diagnosis. The proposed architecture was successfully demonstrated with real-world Internet failures over a prototype network.

Traffic management operations can also take advantage of probabilistic approaches. In [8], the authors propose sampling the data such that only a small number of packets are actually captured and reported, thereby reducing the problem to a more manageable size.
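
To make the gossip-based aggregation idea mentioned above concrete, the sketch below shows push-sum style averaging, in which every node repeatedly shares half of its running sum and weight with a randomly chosen neighbor. It is a generic illustration of the principle under an assumed static topology, not the specific protocol of [11].

    import random

    def push_sum(values, neighbors, rounds=50):
        """Gossip-based estimate of the network-wide average of per-node values.
        neighbors[i] lists the nodes that node i can exchange messages with."""
        s = [float(v) for v in values]     # running sums
        w = [1.0] * len(values)            # running weights
        for _ in range(rounds):
            new_s, new_w = [0.0] * len(s), [0.0] * len(w)
            for i in range(len(s)):
                j = random.choice(neighbors[i])          # random gossip partner
                for target in (i, j):                    # keep half, push half
                    new_s[target] += 0.5 * s[i]
                    new_w[target] += 0.5 * w[i]
            s, w = new_s, new_w
        return [si / wi for si, wi in zip(s, w)]         # each node's local estimate

    # Ring of eight nodes, each monitoring a local metric (e.g., a device counter).
    values = [10, 20, 30, 40, 50, 60, 70, 80]
    neighbors = [[(i - 1) % 8, (i + 1) % 8] for i in range(8)]
    print(push_sum(values, neighbors))    # every estimate approaches the true average, 45.0

No node needs a global view: each estimate converges to the network-wide average using only randomly chosen local exchanges.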

End-user congestion control mechanisms are presented in [13], interacting with probabilistic active queue management (AQM) of flows. The model captures the packet-level dynamics and the probabilistic nature of the marking mechanism in order to investigate the bottleneck link and profile the queue fluctuations, eventually providing a better understanding of the dynamics of the queue and means to cope with congestion.

The sample studies presented above demonstrate preliminary positive experience gained with decentralized probabilistic approaches for network management, and exemplify directions for further evolution toward decentralized probabilistic management.

Probabilistic practices are already widely used in various networking and communications processes. Examples include carrier sense multiple access with collision detection (CSMA/CD), the classic Ethernet medium access protocol, which arbitrates access to the shared wire through carrier sensing, collision detection, and randomized backoff; and statistical multiplexing in IP networks for better resource efficiency. These mechanisms cope with vast amounts of information and limited resource availability by implementing probabilistic methods in a distributed manner. There is no question that such probabilistic practices are successful, as shown by the fact that they are standardized and widely deployed. As networks increase in scale, it is likely that network management operations will also start deploying decentralized probabilistic approaches. The studies reported above represent early examples of such a development.
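
For reference, the randomized element of CSMA/CD is easy to state: after the c-th successive collision, a station defers for a random number of slot times drawn uniformly from 0 to 2^min(c,10) - 1 (truncated binary exponential backoff). The sketch below is a textbook simplification, not an implementation of the IEEE 802.3 state machine.

    import random

    SLOT_TIME_US = 51.2   # slot time of classic 10 Mb/s Ethernet

    def backoff_delay_us(collisions):
        """Truncated binary exponential backoff after `collisions` successive collisions."""
        if collisions > 16:
            raise RuntimeError("frame dropped after 16 failed attempts")
        k = min(collisions, 10)
        slots = random.randint(0, 2 ** k - 1)   # each station picks independently at random
        return slots * SLOT_TIME_US

    print([backoff_delay_us(c) for c in (1, 2, 3, 15)])

Because every station draws its delay independently, contention is resolved without any central coordinator, which is exactly the kind of distributed probabilistic mechanism the text refers to.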

PROBABILISTIC MANAGEMENT ALGORITHMS

In this section we present three algorithms to illustrate different aspects of decentralized probabilistic management and applications within fault and performance monitoring. Such algorithms are responsible for estimating the network state, and are crucial functions of a complete management system as they support other management tasks, including fault, configuration, accounting, performance, and security management.

First, we discuss a probabilistic approach to anomaly detection and localization, which makes use of local probabilistic models to adapt to local network conditions, while setting the management objectives in probabilistic terms. Second, we describe a tree-based algorithm for probabilistic estimation of global metrics, in which both objectives and reported results are probabilistic in nature. Finally, we present an alternative scheme for the estimation of network metrics by counting the number of nodes in a group, which makes use of random sampling over network entities in addition to the use of probabilistic objectives and outputs.

PROBABILISTIC ANOMALY DETECTION

We have devised a distributed monitoring algorithm that provides autonomous detection and localization of faults and disturbances, while adapting its use of network resources to local network conditions [2, 14].

The decentralized approach is based on local probabilistic models created from monitored quality of service (QoS) parameters, such as drop rate and link latency, measured by probing. Based on the probabilistic models, the distributed algorithm provides link disturbance monitoring and localization of faults and anomalies.

The anomaly detection approach serves as an example of several aspects of probabilistic management. The algorithm operates on probabilistic objectives as input, instead of deterministically set parameters. This significantly reduces the need for manual configuration, of either individual network components or network-wide parameters, even when network conditions are highly variable. Given such probabilistic objectives, the estimated probability models are used for adjusting the relevant low-level algorithm parameters. The autonomous adjustment of low-level parameters to match the set of probabilistic objectives enables predictive control of the algorithm performance within probabilistic guarantees. Moreover, the probabilistic models can be used for prediction and decision making, as estimated model parameters can be extracted from the algorithm. This enables other parts of network management (e.g., traffic management) to take advantage of the estimated probabilistic models for autonomous configuration of network parameters.

To run the algorithm, the managing operator specifies a number of high-level management requirements in terms of network resources. Here, the high-level requirements are expressed as probabilities related to probing traffic and detection delays. Low-level parameters, such as probing rates and probing intervals, autonomously adapt to current network conditions and the management requirements. Specifically, the operator sets the acceptable fraction of false alarms among detected anomalies, and specifies a fraction of the estimated probability mass of observed probe response delays on a link [2]. Thereby, accuracy and probing rates are specified as probabilities rather than as a fixed probing rate across the entire network, providing a typical example of configurations expressed as probabilistic goals rather than deterministic limits. These probabilistic parameters effectively determine the normally observed probing rates for each link and how quickly action is taken to confirm a suspected failure, while losses and delays are accounted for [2].

Figure 1 depicts examples of the algorithm behavior for one network link. We observe that the obtained rate of false alarms meets the management objective of the specified acceptable false alarm rate for different rates of packet drop (Fig. 1a). In fact, the acceptable rate of false alarms is here an upper limit on the expected number of false positives on an individual link, which exemplifies the difference between strict performance guarantees and predictive performance control when using probabilistic management algorithms. Similarly, the number of probes needed for detecting a failure adapts to the observed network conditions in order to meet the same requirement on the rate of acceptable false alarms (Fig. 1b), exemplifying how the estimated probabilistic models can be used for autonomous adjustment of low-level parameters.

Figure 1. a) Rate of false alarms given a fraction of acceptable false alarms and drop; b) adaptive probe rates given a fraction of acceptable false alarms and drop.

TREE-BASED PROBABILISTIC ESTIMATION OF GLOBAL METRICS

The Accuracy-Generic Aggregation Protocol (A-GAP) is a monitoring algorithm that provides a management station with a continuous estimate of a global metric for given performance objectives [5]. A global metric denotes the result of computing a multivariate function (e.g., sum, average, or max) whose variables are local metrics from nodes across the networked system (e.g., device counters or local protocol states). Examples of global metrics in the context of the Internet are the total number of VoIP flows in a domain or the list of the 50 subscribers with the longest end-to-end delay.

A-GAP computes global metrics in a distributed manner using a mechanism we refer to as in-network aggregation. It uses a spanning tree, whereby each node holds information about its children in the tree, in order to incrementally compute the global metric. The computation is push-based in the sense that updates of monitored metrics are sent toward the management station along the spanning tree. In order to achieve efficiency, we combine the concepts of in-network aggregation and filtering. Filtering drops updates that are not significant when computing a global metric for a given accuracy objective, reducing the management overhead.

A key part of A-GAP is the model it uses for the distributed monitoring process, which is based on discrete-time Markov chains. The model allows us to describe the behavior of individual nodes in their steady state and relates performance metrics to control parameters. The model has been instrumental in designing a monitoring protocol that is controllable and achieves given performance objectives.
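
The combination of in-network aggregation and filtering can be sketched as follows: each node adds its local metric to the partial aggregates last reported by its children and pushes an update to its parent only when the new partial aggregate differs from the previously reported one by more than its filter width. The SUM aggregate, the static tree, and the fixed filter widths are simplifying assumptions; A-GAP computes its filters from a stochastic model of the monitoring process and adapts them continuously [5].

    class Node:
        """One node of the aggregation tree (illustrative, not the A-GAP protocol)."""
        def __init__(self, name, filter_width, parent=None):
            self.name, self.filter_width, self.parent = name, filter_width, parent
            self.local = 0.0            # locally monitored metric (e.g., a counter)
            self.child_partials = {}    # last partial aggregate reported by each child
            self.last_reported = None   # last partial aggregate sent to the parent

        def partial_aggregate(self):
            return self.local + sum(self.child_partials.values())

        def maybe_report(self):
            """Push an update toward the root only if it passes the local filter."""
            agg = self.partial_aggregate()
            if self.last_reported is None or abs(agg - self.last_reported) > self.filter_width:
                self.last_reported = agg
                if self.parent is not None:
                    self.parent.child_partials[self.name] = agg
                    self.parent.maybe_report()

    # The root is co-located with the management station; other nodes monitor local metrics.
    root = Node("root", filter_width=0.0)
    a = Node("a", filter_width=2.0, parent=root)
    b = Node("b", filter_width=2.0, parent=root)

    a.local = 10.0; a.maybe_report()
    b.local = 5.0;  b.maybe_report()
    print(root.partial_aggregate())    # 15.0: current estimate of the global SUM
    a.local = 11.0; a.maybe_report()   # change of 1 < filter width: no update propagates
    print(root.partial_aggregate())    # still 15.0; the error stays within the filter widths

Widening the filters suppresses more updates at the cost of a larger possible estimation error at the root, which is the accuracy/overhead trade-off that A-GAP controls.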

The managing operator specifies the global metric of interest and then the desired accuracy as a probability distribution of the estimation error. Examples of accuracy objectives A-GAP supports include the average error, percentile errors, and the maximum error. As a consequence, our design permits any administrator with a basic understanding of performance metrics to use our solution, without the need for detailed knowledge of its internals. Based on this input, the algorithm continuously adapts its configuration (e.g., the filters), providing the global metric with the required accuracy.

The output of the algorithm is a continuous estimate of the global metric with the required accuracy, together with predictions of the error distribution and the traffic overhead. This output is provided in real time at the root node of the spanning tree.

A-GAP has proved to make efficient use of resources. Figure 2 shows a representative example of A-GAP's performance: its maximum overhead increases sublinearly with the network size (for the same relative accuracy). In addition, the overall management overhead scales logarithmically with the number of internal (i.e., aggregating) nodes in the spanning tree. This behavior is consistent in all our experiments, where we have used both synthetic and real traces and a wide range of network topologies [5]. Our experiments include both simulation and testbed implementations.

Figure 2. Overall management overhead as a function of the number of aggregating nodes (for a network with 200 nodes).

PROBABILISTIC ESTIMATION OF GROUP SIZES

Not All at Once! (NATO!) is a probabilistic algorithm for precisely estimating the size of a group of nodes meeting an arbitrary criterion without explicit notification from every node [7]. The algorithm represents an example of a probabilistic sampling approach and provides an alternative to aggregation techniques. It can be used to collect information such as the number of nodes with a high packet rate, indicating emerging congestion.

By not having each node report its above-normal metrics independently, available capacity and resources are used efficiently, avoiding excessive amounts of traffic at the ingress channel of the management station. The scheme provides control over the trade-offs among data accuracy, the time required for data collection, and the amount of overhead incurred.

NATO! is an example of a family of algorithms that implement probabilistic polling for estimating the size of a population. It implements a distributed scheme in which nodes periodically and synchronously send reports, but only if their metrics exceed a threshold, and only after waiting a random amount of time sampled from an agreed time distribution function. The management station waits until it has received a sufficient number of reports to estimate the total number of nodes with the desired precision, and then broadcasts a stop message, notifying the nodes that have not yet reported not to send their reports. The station analyzes the transmission times of the received reports, defines a likelihood function, and computes the number of affected nodes for which the likelihood function is maximized. Typically, with only 10 report messages coming from a group of 1000 or 10,000 nodes, the estimation error is practically eliminated. This significant reduction in network load is achieved at the expense of marginal computational load at each node and the broadcast messages. The scheme is an effective monitoring platform that demonstrates efficient resource usage, scalability, robustness, adaptability, autonomy, and ease of use.

Network managers control and configure NATO! by means of a number of high-level parameters: the metrics to monitor and their threshold values, the acceptable overhead (specified as the maximum allowed rate of incoming messages), the time allowed for concluding how many nodes experience an abnormal condition, and the desired accuracy of the estimation. A simple heuristic translates these parameters into a specific time distribution function, a time interval, and the frequency at which NATO! is implicitly invoked. This configuration controls the trade-off between accuracy, timeliness, and overhead, and can be dynamically adapted when network conditions or management objectives change: faster estimations can be delivered by setting a shorter time interval for the time distribution function, at the expense of a higher density of incoming messages; for faster reaction to changes in network conditions, the frequency at which NATO! is invoked can be increased; when in-depth analysis is required, threshold values of network metrics can be changed and new network metrics can be added; and for better fault localization, local NATO! managers can be assigned to collect data in their subnetworks. All of these configuration changes become active by means of a broadcast message from the management station, which adapts the monitoring task to the current needs for best performance under acceptable cost.
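
One way to carry out the likelihood-maximization step can be sketched as follows, assuming that report delays are drawn uniformly from an agreed interval [0, T]: the arrival time of the k-th report is then the k-th order statistic of N uniform draws, and the station picks the N that maximizes its likelihood. The uniform delay distribution, the use of only the k-th arrival time, and the brute-force search are simplifying assumptions; [7] defines its own likelihood function and estimation heuristics.

    import math

    def log_choose(n, k):
        return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

    def estimate_group_size(k, t_k, horizon, n_max=100000):
        """Maximum-likelihood estimate of the number of affected nodes, given that the
        k-th report arrived at time t_k and delays are uniform on [0, horizon]."""
        p = min(t_k / horizon, 1.0 - 1e-12)    # probability that a node reports by t_k
        best_n, best_ll = k, -math.inf
        for n in range(k, n_max + 1):
            # log-likelihood of n affected nodes (terms that do not depend on n are dropped)
            ll = log_choose(n, k) + k * math.log(p) + (n - k) * math.log1p(-p)
            if ll > best_ll:
                best_n, best_ll = n, ll
        return best_n

    # Ten reports received, the tenth arriving after 1% of the agreed time horizon:
    # roughly 10 / 0.01 = 1000 nodes are estimated to exceed the threshold.
    print(estimate_group_size(k=10, t_k=0.01, horizon=1.0))

Because the stop message is broadcast after only a handful of reports, the load on the station's ingress channel stays essentially independent of the actual group size.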

Due to its probabilistic nature, the scheme is practically scalable to any network size. There is only an insignificant incremental overhead at the egress channel for larger network domains, caused by delivering the broadcast messages from the management station to a larger group of nodes; this overhead fans out quickly along the topology tree, without any detrimental effect on the already stressed ingress channel of the management station.

LEVERAGING DECENTRALIZED PROBABILISTIC MANAGEMENT

All three algorithms presented in this section demonstrate the benefits of decentralized probabilistic approaches to network management: they enable efficient resource usage compared to deterministic approaches, exploiting currently available resources, while taking noise and variations in the network into account. The algorithms are scalable, robust, and adaptive to network conditions.

The decentralized approach for anomaly detection enables scalable and efficient usage of resources. The use of adaptive probabilistic models allows for capturing the local network behavior and predicting the algorithm performance. For A-GAP, decentralization enables efficiency, and probabilistic management provides performance control. Computing metrics in a distributed fashion along an aggregation tree permits reducing the monitoring traffic compared to a centralized approach. The use of probabilistic models permits A-GAP to predict its performance and therefore meet the objectives of the network manager. Thanks to its decentralized and probabilistic nature, NATO! is scalable to any network size, avoiding congestion at the ingress channel of the management station while effectively controlling the trade-off between accuracy, timeliness, and overhead.

CONCLUSIONS

In this article we have advocated the adoption of a decentralized and probabilistic paradigm for network management. We have argued that solutions based on it can meet the challenges posed by large-scale dynamic networks, including efficient resource usage, scalability, robustness, and adaptability. We have exemplified this with three specific solutions and discussed how they address such challenges.

A key challenge in the adoption of this paradigm is acceptance by network managers. They will need to think about the network in terms of uncertainties and likely scenarios while considering the expected cost and cost variability. This is a very different approach from current practices, where managers focus on upholding strict guarantees on parameters staying within predetermined limits. Operators must look at the state of their networks in terms of probabilities and accept a certain degree of uncertainty. Solutions that can quantify that uncertainty are, from this point of view, of great relevance.

This paradigm shift is also likely to cause some reluctance among network operators to deploy this type of solution. We believe that in order to mitigate this reluctance, it will be key to show that decentralized probabilistic approaches can reduce operational expenditures. This reduction is enabled by their higher degree of automation compared to traditional approaches. At this point, however, this is a conjecture and must be supported by developing use cases that quantify the potential savings in different scenarios. For this purpose, experimental evaluations in large-scale testbeds and production networks are a must.

While we expect some reluctance in adopting a new management paradigm, we strongly believe that not doing so would have a major negative impact on the ability to manage larger and more complex networks, and as the need for solutions for such networks increases, we are likely to see more widespread adoption.

ACKNOWLEDGEMENT

This work was supported in part by the European Union through the 4WARD and SAIL projects (http://www.4ward-project.eu/, http://www.sail-project.eu/) in the 7th Framework Programme. The authors would like to thank Reuven Cohen (Technion), Björn Levin (SICS), Danny Raz (Technion), and Rolf Stadler (KTH) for their valuable input to this work.

REFERENCES

[1] G. Pavlou, "On the Evolution of Management Approaches, Frameworks and Protocols: A Historical Perspective," J. Network and Sys. Mgmt., vol. 15, no. 4, Dec. 2007, pp. 425–45.
[2] R. Steinert and D. Gillblad, "Towards Distributed and Adaptive Detection and Localisation of Network Faults," AICT 2010, Barcelona, Spain, May 2010.
[3] G. J. Lee, "CAPRI: A Common Architecture for Distributed Probabilistic Internet Fault Diagnosis," Ph.D. dissertation, CSAIL, MIT, Cambridge, MA, 2007.
[4] F. J. Garcia-Algarra et al., "A Lightweight Approach to Distributed Network Diagnosis under Uncertainty," INCOS '09, Barcelona, Spain, Nov. 2009.
[5] A. Gonzalez Prieto, "Adaptive Real-Time Monitoring for Large-Scale Networked Systems," Ph.D. dissertation, Dept. Elect. Eng., Royal Institute of Technology (KTH), Stockholm, Sweden, 2008.
[6] M. Brunner et al., "Probabilistic Decentralized Network Management," Proc. IEEE IM '09, New York, NY, 2009.
[7] R. Cohen and A. Landau, "Not All At Once! - A Generic Scheme for Estimating the Number of Affected Nodes While Avoiding Feedback Implosion," INFOCOM 2009 Mini-Conf., Rio de Janeiro, Brazil, Apr. 2009.
[8] K. C. Claffy, G. C. Polyzos, and H.-W. Braun, "Application of Sampling Methodologies to Network Traffic Characterization," ACM SIGCOMM Comp. Commun. Rev., vol. 23, no. 4, Oct. 1993, pp. 194–203.
[9] E. Stevens-Navarro, Y. Lin, and V. W. S. Wong, "An MDP-Based Vertical Handoff Decision Algorithm for Heterogeneous Wireless Networks," IEEE Trans. Vehic. Tech., vol. 57, no. 2, 2008.
[10] R. Badonnel, R. State, and O. Festor, "Probabilistic Management of Ad Hoc Networks," Proc. NOMS '06, Vancouver, Canada, Apr. 2006, pp. 339–50.
[11] A. G. Dimakis, A. D. Sarwate, and M. Wainwright, "Geographic Gossip: Efficient Aggregation for Sensor Networks," IPSN 2006, Nashville, TN, Apr. 2006.
[12] L. Guardalben et al., "A Cooperative Hide and Seek Discovery over In Network Management," IEEE/IFIP NOMS Wksps. '10, Osaka, Japan, Apr. 2010, pp. 217–24.
[13] P. Tinnakornsrisuphap and R. J. La, "Characterization of Queue Fluctuations in Probabilistic AQM Mechanisms," Proc. ACM SIGMETRICS, 2004, pp. 283–94.
[14] R. Steinert and D. Gillblad, "Long-Term Adaptation and Distributed Detection of Local Network Changes," IEEE GLOBECOM, Miami, FL, Dec. 2010.

BIOGRAPHIES

ALBERTO GONZALEZ PRIETO ([email protected]) received his M.Sc. in electrical engineering from the Universidad Politecnica de Cataluña, Spain, and his Ph.D. in electrical engineering from the Royal Institute of Technology (KTH), Stockholm, Sweden. He has been with Cisco Systems since 2010. He was an intern at NEC Network Laboratories, Heidelberg, Germany, in 2001, and at AT&T Labs Research, Florham Park, New Jersey, in 2007. His research interests include management of large-scale networks, real-time network monitoring, and distributed algorithms.

DANIEL GILLBLAD ([email protected]) has a background in statistical machine learning and data analysis, and has extensive experience in applying such methods in industrial systems. He holds an M.Sc. in electrical engineering and a Ph.D. in computer science, both from KTH. He has been with the Swedish Institute of Computer Science (SICS) since 1999, where he currently manages the network management and diagnostics group within the Industrial Applications and Methods (IAM) laboratory. His research interests are currently focused around network management, diagnostics, data mining, and mobility modeling.

AVI MIRON ([email protected]) is a researcher at the Computer Science Department of the Israel Institute of Technology (Technion). A graduate of the University of Southern California, Los Angeles, he has participated in several EU-funded research projects, including BIONETS and 4WARD, and currently SAIL and ETICS. He is an experienced high-tech executive and an entrepreneur in the area of tele/data communications, in both Israel and the United States.

REBECCA STEINERT ([email protected]) has been with the IAM laboratory at SICS since 2006. She has a background in statistical machine learning and data mining, and in 2008 she received her M.Sc. from KTH in computer science with emphasis on autonomous systems. Since the beginning of 2010, she has been pursuing her Ph.D. at KTH, focusing on statistical approaches to network fault management. She has worked in the EU project 4WARD and is currently involved in SAIL, focusing on fault management in cloud computing and network virtualization. She also contributes to the network management research within the SICS Center for Networked Systems.
