Towards Decentralized Self-adaptive Component-based Systems

Luciano Baresi, Sam Guinea, and Giordano Tamburrelli
Dipartimento di Elettronica e Informazione – Politecnico di Milano
via Golgi 40 – 20133 Milano, Italy
(baresi | guinea | tamburrelli)@elet.polimi.it
ABSTRACT
Modern software systems challenge software engineers since they must adapt effectively and efficiently to the environment in which they are deployed. To this end, the paper outlines an architecture, supported by special-purpose languages and aspect-oriented techniques, for the design of component-based, distributed self-adaptive systems. Software artifacts are dynamically grouped, and constantly supervised by a network of ad-hoc components. The supervision mechanism exploits special-purpose languages that define data collection, correlation, aggregation, and analysis, to reason on properties defined at different levels of pervasiveness (from component-wide to system-wide). Its goal is to identify situations that trigger adaptations. Our approach is characterized by a decentralized architecture, in which there is no single point of failure or bottleneck, and by a clear separation of concerns between business logic and adaptation mechanisms. These concepts are demonstrated on a fantasy example that offers a general abstraction for distributed load balancing problems.
Categories and Subject Descriptors
K.6.4 [Management of Computing and Information Systems]: System Management—Centralization/decentralization; D.2.11 [Software Engineering]: Software Architectures—Languages

General Terms
Design, Languages, Management

Keywords
Self-adaptability, Decentralization, Aspect Oriented Programming, Distributed Supervision

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SEAMS'08, May 12–13, 2008, Leipzig, Germany. Copyright 2008 ACM 978-1-60558-037-1/08/05 ...$5.00.

1. INTRODUCTION

Adaptability [7] is becoming a key feature of modern software systems, which are not only supposed to behave correctly, but must also be able to adjust their behavior to cope with changes in the surrounding environment. Behind the term adaptability, we actually find very different capabilities, conceived for different systems — from robotic controllers to mechatronic systems and fully distributed, context-aware applications. In some cases, the set of components is fixed, and adaptability boils down to the runtime adjustment of some parameters, while in other cases, the world is open and the application is capable of discovering and exploiting the "best" functionality available. Generally speaking, adaptability is often based on the notion of a control (feedback) loop, where sensors, which can be dedicated software elements, provide field data, special-purpose components analyze them and decide what to do, and actuators apply planned reactions on the system. IBM coined the term autonomic [8] to identify these capabilities and proposed the monitor-analyze-plan-execute loop to succinctly describe how such systems work. Autonomic features can either be built in at design time, or be added afterwards, on top of existing systems. In the former case, software elements usually embed reasoning capabilities: some special-purpose platforms (e.g., SelfLet [6]) provide adaptation capabilities and require that users program their applications as suitable customizations of these features. In contrast, the latter idea considers adaptation as an extrinsic cross-cutting feature that does not affect the system's nature and that can be added subsequently [10]. In both cases, two kinds of supervision are possible. First, systems can be supervised by a centralized (or hierarchical) controller [20], where a single entity is in charge of taking decisions on how the different elements must adapt their behavior.

Second, the cooperation (and coordination) among the components can support decentralized supervision based on peer-to-peer solutions (e.g., [4]). The amount of flexibility embedded in these systems, and their underlying communication infrastructure, are usually the key drivers when choosing one solution over the other. Among the different alternatives briefly sketched above, the paper addresses the adaptability of context-aware applications, and considers adaptation a cross-cutting concern. We think that this choice is fundamental for a clearer separation of concerns and for a more flexible approach to adaptability. We address fully distributed, highly dynamic applications whose components behave and cooperate to carry out a well-defined business goal. The loose cooperation
among components, and the fact that they enter and leave the application freely, without any re-deployment or re-configuration, are two other distinctive features. This dynamism imposes a decentralized solution. A centralized approach would not guarantee the desired degree of scalability and fault tolerance. Moreover, its need to maintain an updated image of the system's state may easily turn the central control into a bottleneck for the entire application. Application components play different roles¹: besides those devoted to actual computations, the control loop requires components that act as software reifications of sensors and actuators, and others that oversee the whole supervision process. In other words, some components gather context information, others decide what to do, and the last set works on the outcome (e.g., they provide suitable information to the user or control some external devices). In the solution proposed in the paper, decentralization is obtained by grouping software elements into homogeneous clusters using application-specific metrics. We have chosen to experiment with clustering techniques as a means to improve and optimize the creation of the different groups, and to limit the setup phase when a component enters or leaves the application. Each cluster is headed by a particular component, in charge of supervising it and of gathering information from the rest of the system. This information is exchanged among the different clusters by exploiting a network that connects them through their head components. We adopt aspect-oriented techniques [11] (AOP) as a means to specify and add adaptation features to the components in our system. In our model we distinguish between local adaptation, which does not require system-wide decisions, and global adaptation, where the whole system is involved in planning possible reactions.

We also introduce a third level of adaptation in which the information gathered from a supervised cluster is used to inject new behaviours into its components. Since this paper mainly describes work in progress, we use a fantasy-based example to introduce and illustrate the different elements of our proposal. The scenario is inspired by dungeons and dragons games. Rooms are magical, and each has a number of traps and creatures that it can use to confront adventurers. However, to activate these traps and creatures, the room needs a certain amount of magical power, which is guaranteed by the presence of wizards. Since adventurers can kill wizards, the dungeon can respond by moving wizards among rooms, and by occasionally re-spawning a dead wizard. The goal of the dungeon is to maintain an adequate number of wizards in each room. Despite the fantastical nature of our example, it offers a fun abstraction for distributed and dynamic load balancing problems. Indeed, the dungeon continuously moves its wizards among the rooms to satisfy load balancing constraints. More precisely, the mobility, death, and re-spawning of our wizards correspond to component migration, failure, and replacement. The paper is organized as follows. Section 2 introduces the model behind our proposal. Section 3 explains the different languages used in the control loop. Section 4 surveys some related approaches, and Section 5 concludes the paper.
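The load-balancing abstraction behind the example can be made concrete with a small sketch. The class and method names (DungeonBalancer, planMoves) are ours, not part of the paper's model; the code only illustrates the invariant the dungeon pursues: each room should hold roughly the dungeon-wide average number of wizards.

```java
import java.util.*;

// Hypothetical sketch of the dungeon's balancing goal: move wizards
// between rooms until each room holds (roughly) the dungeon-wide average.
public class DungeonBalancer {

    // Returns how many wizards each room should gain (positive) or
    // lose (negative) to approach the dungeon-wide average.
    static Map<String, Integer> planMoves(Map<String, Integer> wizardsPerRoom) {
        int total = wizardsPerRoom.values().stream().mapToInt(Integer::intValue).sum();
        int avg = total / wizardsPerRoom.size();  // integer average, as a simplification
        Map<String, Integer> delta = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : wizardsPerRoom.entrySet()) {
            delta.put(e.getKey(), avg - e.getValue());
        }
        return delta;
    }

    public static void main(String[] args) {
        Map<String, Integer> rooms = new LinkedHashMap<>();
        rooms.put("Cavern", 8);
        rooms.put("Librarium", 2);
        rooms.put("Morgue", 12);
        rooms.put("Lair", 10);
        // The average is 8: the Librarium should gain 6 wizards,
        // the Morgue should give up 4, the Lair 2, the Cavern none.
        System.out.println(planMoves(rooms));
    }
}
```

Component migration, failure, and replacement then map onto applying these deltas by moving, losing, and re-spawning wizards.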
2. MODEL
The model presented in this paper aims at supporting the self-adaptability of distributed, component-based applications. We assume a peer-to-peer network of components [17], and we organize them in:

• Supervised elements, which are the components in charge of executing the business logic. These components may also play the role of sensors and actuators with respect to the deployment context.

• Supervisors, which are special-purpose components in charge of overseeing how the other components behave and of deciding on non-local adaptation strategies.

Given these two classes of components, our goals are: (1) the injection of self-adaptability into supervised elements as a cross-cutting concern, and (2) the adoption of a decentralized approach. The first goal is met by augmenting supervised elements with aspect-oriented probes [11], which act as both sensors and actuators within the software components. Aspects alone would be able to add extrinsic local adaptability, that is, each component acts and reacts in isolation, but when we want to embed reactions based on system-wide policies, we need to complement them with something more. This is why supervisors come into play. Indeed, these components are in charge of: (1) grouping together, (2) monitoring, and (3) adapting supervised components. The grouping leads to the definition of a logical cluster, that is, of a group of elements led by a supervisor. Each time an element is added to the system, its AOP advice uses a clustering algorithm to choose the cluster that optimizes its application-specific metric. The metric can be as simple as the physical distance between peers, or any other information acquired by the system (e.g., some set preferences or other similarities). Once an element has joined a cluster, its AOP advice opens a communication stream with the cluster's supervisor. To manage a cluster, membership information is distributed using an event-based paradigm.

In particular, our model notifies the entrance, exit, and failure of any element. This way a supervisor always knows who is in the cluster, but elements can also adapt locally to supervisor failures. Supervisors (and supervised elements) exploit a subscription/notification protocol [3] to set the events sent by the probes. Each supervisor subscribes to the set of events of interest by asking the probes to notify them as soon as they occur in the business logic they are overseeing. Supervisors are thus always aware of the elements' execution, and they can also dynamically adjust the amount of information they want to receive. The whole set of supervised elements is partitioned into (meaningful) slices, each controlled by a supervisor, but we still need a way to make the different slices communicate and recreate a complete view of the system. This is why we also group the supervisors into a federation and exploit the same subscription/notification protocol to make them interact. Once again, an event-based paradigm is used to manage federation membership accordingly. Every supervisor in the federation subscribes to the set of events of interest, to receive them as soon as they occur. Conceptually, there is no difference between the events flowing from the supervised elements to the supervisors and the
1 The assumption that a component only plays one role is nothing but a simplification adopted in this paper for the sake of clarity. Components playing different roles would not change the approach, but they would only complicate its presentation.
events flowing among the supervisors themselves. In fact, from a supervisor's point of view the two kinds of events are indistinguishable.
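The subscription/notification idea can be sketched in a few lines; all names here (EventBus, Supervisor, the Kind enum) are our own assumptions, not the paper's API. A supervisor subscribes to entrance, exit, and failure events and thus always knows who is in its cluster.

```java
import java.util.*;
import java.util.function.Consumer;

// Minimal sketch of membership maintained via subscription/notification.
public class MembershipSketch {
    enum Kind { ENTER, EXIT, FAIL }
    record Event(Kind kind, String elementId) {}

    static class EventBus {
        private final Map<Kind, List<Consumer<Event>>> subs = new EnumMap<>(Kind.class);
        void subscribe(Kind k, Consumer<Event> c) {
            subs.computeIfAbsent(k, x -> new ArrayList<>()).add(c);
        }
        void publish(Event e) {
            subs.getOrDefault(e.kind(), List.of()).forEach(c -> c.accept(e));
        }
    }

    static class Supervisor {
        final Set<String> members = new LinkedHashSet<>();
        Supervisor(EventBus bus) {
            // Subscribe to the membership events of interest.
            bus.subscribe(Kind.ENTER, e -> members.add(e.elementId()));
            bus.subscribe(Kind.EXIT,  e -> members.remove(e.elementId()));
            bus.subscribe(Kind.FAIL,  e -> members.remove(e.elementId()));
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        Supervisor sup = new Supervisor(bus);
        bus.publish(new Event(Kind.ENTER, "wizard-1"));
        bus.publish(new Event(Kind.ENTER, "wizard-2"));
        bus.publish(new Event(Kind.FAIL, "wizard-1"));
        System.out.println(sup.members);  // [wizard-2]
    }
}
```

The same mechanism serves at the federation level, with supervisors publishing and subscribing among themselves.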
Figure 1: Clusters and Federations.

After sketching the topology of our self-adaptive systems, and explaining the information flow among the components, we can now explain how adaptation works. We already explained local (component-wide) adaptation, but now we can introduce two other options:

• Cluster-wide adaptation works at the level of each single cluster and can decide upon reactions that involve all the supervised elements of a cluster. Supervisors exploit the languages described in Section 3 to recognize patterns of events received from the elements in their clusters and trigger adaptation in the supervised elements.

• Application-wide adaptation considers the federation of supervisors and is able to enforce reaction policies that involve the whole system. Global reasoning requires information gathered through the federation, which in turn implies event flows among clusters. More precisely, the distribution of knowledge is enabled through the subscription/notification protocol.

The concepts introduced in this section enable a control loop for supervised elements. This control loop comprises event collection, correlation, aggregation, analysis, and reaction. More precisely, we can identify two different control loops. Aspect-oriented probes implement a tight control loop with respect to individual elements, while supervisors enable cluster-wide and application-wide adaptations using a looser control loop. The control loops themselves make the proposed solution robust. For example, when a cluster loses its supervisor, it no longer exists from a logical standpoint. Thanks to a supervisor failure event, the elements in the cluster use a tight control loop to choose the next best cluster, as if they were connecting to the system for the first time. On the other hand, a supervisor's failure is also managed globally at the federation level. The loose control loop, in fact, can result in the creation of a new supervisor, and in new cluster membership optimizations. All these different initiatives (parallel adaptations) must be coordinated to reach a single common goal. Although the more subtle cases are still open and part of our continuous refinement, more details are presented in Section 3.5.

These concepts allow us to model our running example in the following way. Every room in the dungeon is characterized by a name. In our example, we have the following four rooms: Cavern, Librarium, Morgue, and Dragon's Lair. Wizards are connected at startup (or after a re-spawning) to a supervisor. If we imagine that supervisors are in charge of assuring an application-specific property (e.g., "Every wizard must be in the room he has to defend."), then they are able to adapt the wizards and move them by assigning them a room to defend. Section 3 continues this example and describes the languages provided by our approach.
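The tight control loop an element runs when (re)joining the system — choosing the cluster that optimizes an application-specific metric — could look like the following sketch. The metric used here (plain geometric distance to the supervisor) and all names are our assumptions for illustration.

```java
import java.util.*;

// Sketch of cluster selection: pick the surviving cluster whose
// supervisor minimizes an application-specific metric.
public class ClusterSelection {
    record SupervisorRef(String id, double x, double y) {}

    static SupervisorRef chooseCluster(double ex, double ey, List<SupervisorRef> alive) {
        // Minimize squared distance to each candidate supervisor.
        return alive.stream()
            .min(Comparator.comparingDouble(
                s -> (s.x() - ex) * (s.x() - ex) + (s.y() - ey) * (s.y() - ey)))
            .orElseThrow();  // no supervisor left: the federation must create one
    }

    public static void main(String[] args) {
        List<SupervisorRef> alive = List.of(
            new SupervisorRef("Cavern", 0, 0),
            new SupervisorRef("Morgue", 10, 0));
        // An element at (2, 1) is closest to the Cavern supervisor.
        System.out.println(chooseCluster(2, 1, alive).id());  // Cavern
    }
}
```

On a supervisor-failure event, each orphaned element would rerun this selection, exactly as if it were connecting for the first time.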
3. CONTROL LOOP
Now we concentrate on the control loop used for cluster-wide and application-wide adaptation, and on the languages provided in our approach. The two types of adaptation only differ in the sources of information considered when adapting. Therefore, we focus on application-wide adaptation, since it needs to capture global information about the system. The control loop (shown in Figure 2) is typically executed by supervisor components, and consists of five main steps: Collection, Correlation, Aggregation, Analysis, and Reaction. The capabilities required to perform these steps are encoded directly into supervisors. However, their actual behaviours are configured externally using our languages.

The control loop starts when events of interest are received by a supervisor. These events can be sent either by supervised elements or by other supervisors. In the former case, communication can be synchronous or asynchronous. If synchronous, whenever an element's AOP probe sends a message to its supervisor, it waits for the entire control loop to complete before allowing the business logic to continue. This means that if adaptation is necessary, it is performed as soon as possible. If asynchronous, the element's business logic is only stopped for the amount of time needed to send a new event to the supervisor. When the control loop completes, adaptations are performed asynchronously on the supervised element. Communication with the other supervisors is always performed asynchronously. The Collection step is responsible for receiving the events and for creating the finite sequences of events that are considered throughout the control loop. These sequences are then passed on to the Correlation step, in which the system uses a correlation property to match events belonging to different sequences. Each correlation property defines a new sequence of event tuples that satisfy the property.
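The synchronous and asynchronous probe modes just described can be sketched as follows; ControlLoop, syncNotify, and asyncNotify are hypothetical names, not part of the paper's implementation.

```java
import java.util.concurrent.*;

// Sketch of the two probe modes: a synchronous probe blocks the business
// logic until the supervisor's control loop answers; an asynchronous
// probe only pays the cost of handing the event over.
public class ProbeModes {
    interface ControlLoop { String run(String event); }  // returns a reaction

    static String syncNotify(ControlLoop loop, String event) {
        // Business logic waits for the whole loop, so any adaptation
        // happens before execution resumes.
        return loop.run(event);
    }

    static Future<String> asyncNotify(ExecutorService pool, ControlLoop loop, String event) {
        // The reaction (if any) is applied later, off the caller's thread.
        return pool.submit(() -> loop.run(event));
    }

    public static void main(String[] args) throws Exception {
        ControlLoop loop = e -> "reaction-to:" + e;
        System.out.println(syncNotify(loop, "low-power"));
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> f = asyncNotify(pool, loop, "low-power");
        System.out.println(f.get());
        pool.shutdown();
    }
}
```

The trade-off is the one stated above: the synchronous mode guarantees adaptation before the business logic continues, at the price of blocking it for the whole loop.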
The third step performs Aggregation, a broad term we use to indicate data aggregation, elimination, or re-arrangement, in preparation for the Analysis. Once the system has the sequences it needs, the Analysis checks whether these sequences satisfy set properties. If they do not, they are forwarded to the Reaction step, which takes action and induces new behaviours in the system's components. However, not all of these steps are strictly required. In fact, the correlation and aggregation steps can be skipped entirely, enabling much simpler control loops. The execution of the control loop is obviously time-consuming. Therefore we must be sure that the adaptation
only takes place in a situation that is consistent with the one seen at the beginning of the control loop. For example, in the meanwhile the federation may have changed due to supervisor failures. In this case, supervisors receive an appropriate event from the federation and stop their control loops. The only control loops that are allowed to proceed in an inconsistent state are those in charge of trying to fix the federation. As soon as the federation is fixed, the adaptation loop is restarted.

Figure 2: The supervision and reaction loop.

In our example, application-wide adaptation could be used to help maintain the following property:
Each dungeon room should have a number of wizards equal to the average number of wizards per room calculated over the entire dungeon.

In this case, the control loop must be performed by all four room supervisors. Due to the lack of space, we cannot go into the details of our languages; we can only introduce them through examples. However, they all borrow from our previous work [2], an XML-based constraint language used for web service specifications. The language mixes XML technology, such as XPath (used to select data from within a complex event message), and typical boolean, relational, and arithmetic operators. It also provides data-type specific functions (e.g., string-length), aggregate functions (e.g., sum, avg, min), and existential and universal quantification. Since events are defined as XML fragments in our model, the language fits quite well.

3.1 Collection

During the Collection step, a supervisor receives events from the rest of the system (from other supervisors and from its own supervised elements), and produces sequences. A sequence is defined as a set of events captured within a certain time- or cardinality-based window, and that satisfy a given filtering property. When an event is captured, it is timestamped by the supervisor. Notice that in our model, events are XML fragments. In our running example there is one supervisor for each dungeon room. Each one needs to know the number of wizards in the other rooms. Therefore, it subscribes to receive events from the other supervisors. Each one also needs to gain information from the supervised elements in its cluster, so it listens to events created when an element enters or leaves a cluster. Using these events, a subscriber will create four sequences. For example, the Librarium's supervisor will create sequences seqCavern, seqMorgue, and seqLair, and sequence seqRoom, which will contain all the events received from the room's wizards (i.e., the supervised elements). The supervisor uses a cardinality-based window in the first three cases:

create seqCavern = $notif:space(1)[$notif/source=='Cavern'];
create seqMorgue = $notif:space(1)[$notif/source=='Morgue'];
create seqLair = $notif:space(1)[$notif/source=='Lair'];

and a time-based window (30 seconds) in the last case:

create seqRoom = $presence:time(30);

In these examples, $notif and $presence are event message types, space(1) defines a cardinality-based window, time(30) defines a time-based window, and the square brackets contain filtering properties. In our example, the events received from a supervisor contain a first-level child node called source, which holds the message sender's identity. Therefore, the filtering properties state that we are only interested in events that were sent by the Cavern supervisor in the first case, the Morgue supervisor in the second case, and the Lair supervisor in the last case.

3.2 Correlation

The Correlation step receives events from the Collection, and groups them into tuples of events that satisfy a given correlation property. This is achieved in two substeps. First of all, not all the events in a sequence are considered for correlation. The events that are to be considered are chosen using an arithmetic or boolean expression. The former, when calculated, gives the position of an event in the sequence. Events are numbered from 0 up to #sequence-1, where the unary operator # gives the cardinality of the sequence. The latter, on the other hand, makes use of the free variable i, and selects the events that satisfy a given boolean property. In these properties, i indicates an event's position within the sequence. Once we have defined our subsets, selected events are grouped into tuples using the cartesian product between subsets. This operation excludes tuples with permutated events. Finally, only those tuples that satisfy a special cross-event correlation property are kept.
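These substeps — subset selection, cartesian product, and a cross-event check — can be sketched for a pair of sequences. Event and correlate are our own names, and the timestamp check anticipates, in simplified form, the within predicate introduced below.

```java
import java.util.*;

// Sketch of correlation: form the cartesian product of two event subsets
// and keep only the pairs satisfying a cross-event property - here,
// "B occurs at most k seconds after A".
public class CorrelationSketch {
    record Event(String source, long timestampSec) {}

    static List<List<Event>> correlate(List<Event> a, List<Event> b, long k) {
        List<List<Event>> tuples = new ArrayList<>();
        for (Event ea : a) {
            for (Event eb : b) {   // cartesian product of the two subsets
                long delta = eb.timestampSec() - ea.timestampSec();
                if (delta >= 0 && delta <= k) {
                    tuples.add(List.of(ea, eb));  // pair passes the cross-event check
                }
            }
        }
        return tuples;
    }

    public static void main(String[] args) {
        List<Event> cavern = List.of(new Event("Cavern", 100));
        List<Event> morgue = List.of(new Event("Morgue", 120), new Event("Morgue", 200));
        // Only the first Morgue event falls within 30 seconds of the Cavern one.
        System.out.println(correlate(cavern, morgue, 30).size());  // 1
    }
}
```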
Correlation properties are given in a temporal extension of the filtering language presented earlier. In particular, we introduce three temporal predicates: within, at, and after. within(A, B, K) is true if B is true when A is true, or if B becomes true less than (or equal to) K seconds after A is true. at(A, B, K) is true if B is true exactly K seconds after A is true. after(A, B, K) is true if B is true K (or more) seconds after A is true; nothing is said about the time interval going from A being true to K seconds later. In our running example, we have four sequences. Let us start by correlating the events coming from the other supervisors. When creating the sequences, we chose to use a cardinality-based window. This means that the events in these sequences may have been timestamped at very distant moments. This is not acceptable, since we need to avoid correlating events that would give us an incorrect snapshot of the overall system. First, we create three subsets of events, one for each sequence. Each chooses the single event in its sequence using an arithmetic expression (i.e., we indicate that we want the events in position 0), while our correlation property uses the within predicate to check when the events were collected:
create seqSup = $c in seqCavern[0],
                $m in seqMorgue[0], $l in seqLair[0]
where (within($c, $m, 30) && within($c, $l, 30)) ||
      (within($m, $l, 30) && within($m, $c, 30)) ||
      (within($l, $c, 30) && within($l, $m, 30));

The result is a new sequence (called seqSup) that contains a single tuple holding the three correlated events. Notice that the definition of the subsets (first three lines) also creates aliases for the correlation property (after the where keyword). In our example, we can decide to leave the sequence seqRoom untouched, and to forward it directly to the next step.

3.3 Aggregation

In the Aggregation step, data is finalized for analysis. This is achieved at a fine-grained level, by leveraging XML technology to pick and manipulate an event's internal data. Since each event is an XML tree, a correlated tuple can be seen as a virtual root node that contains the correlated events as subtrees, while a sequence can be seen as a sequence of XML trees. Data manipulation has two possible behaviours. In the former, each event in a sequence is manipulated and transformed to present a new data structure. In this case we dive deep into an event to pick out the data we want to keep, and then create a new tree structure to hold it. To do this, we use three ad-hoc tree creation functions:

• node(T), which generates an XML node with the tag name T,

• node(T, v), which generates an XML node called T that contains the value v, and

• tree(r, n1, n2, ..., nn), which generates an XML tree in which r is the root node and n1 to nn are children nodes or XML subtrees.

Figure 3 illustrates the event manipulation performed on the single correlated tuple contained in sequence seqSup. We pick a subset of the data in the tuple, and create a simpler structure that only considers what is strictly necessary for analysis.

create seqAggSup = for $r in seqSup[0] do
  tree (node('rooms'),
        node('wizards', $r/notif[0]/wizards),
        node('wizards', $r/notif[1]/wizards),
        node('wizards', $r/notif[2]/wizards));

Figure 3: Aggregation as manipulation.

Notice that the alias $r is used to refer to the virtual node containing the three events in the tuple. The second possible behaviour adds true aggregation. In fact, we provide a special function for collapsing all the trees in a sequence into a single XML structure containing them all. This is achieved using function collapse(seq), where seq identifies the sequence being collapsed. If we want to further manipulate the collapsed tree, we can then use the same manipulation techniques shown above. Due to the fine-grained nature of the aggregation step, things can become complex. This is why the tree construction language also provides special syntax for code that needs to be repeated a given number of times. This is achieved by combining the use of curly brackets and the ... symbol, as we shall see in the example. Figure 4 shows how we collapse all the events in seqRoom into a single tree structure. We then simplify the new structure by eliminating the data we do not need.

create seqAggRoom = for $w in collapse(seqRoom) do
  tree (node('wizards'),
        {node('wizard', $w/wizard[0]/id),
         node('wizard', $w/wizard[1]/id),
         ...
         node('wizard', $w/wizard[#seqRoom-1]/id)});

Notice that $w refers to the root node of the collapsed tree. At this point, should we want to aggregate the data from both sequences seqAggSup and seqAggRoom into a single tree, we could specify:

create seqAgg = for $s in seqAggSup[0],
                    $r in seqAggRoom[0] do
  tree (node('global'), $s, $r);
Once again, $s and $r are aliases that refer to the roots of our events of interest. Notice that, when defining the construction of the new tree, we can refer directly to the
61
wizard
wizard
wizards nodes. The function calculates the average value of an expression (in our case $r itself). Using this notation we check the property once for each node in the sequence. To check that the property holds for all nodes in the sequence we susbstitute the keyword for with forall, while to check that the property holds for at least one node in the sequence we susbstitute it with exists. Finally, to predicate over a tuple and not a single event, we use the ‘,’ notation to introduce more than one alias, as we did in the correlation and aggregation steps. When the analysis is complete, the truth values are sent, together with the final aggregated data, over to the Reaction step.
wizard
id
power
id
power
id
power
1
10
2
20
3
30
$w
wizard
wizard
wizard
id
power
id
power
id
power
1
10
2
20
3
30
3.5
wizards
wizard
wizard
wizard
1
2
3
Figure 4: Aggregation with the collapse of a sequence. global
rooms
wizards
wizards
wizards
wizards
wizard
wizard
wizard
8
12
10
1
2
3
Figure 5: Final aggregation. tree pointed to by an alias, or to one of its subtrees, by appending a simple XPath expression. Figure 5 shows the final aggregation.
3.4
Reaction
We do not provide any special-purpose language for reactions, but use Java code instead. The reason is that we want reactions to be able to exploit the full power of the programming language. This choice may hamper the generality of our approach, but Java can easily be substituted by other similar programming languages. The reaction code can be seen as a pre-defined plan that is distributed across the supervisor and its elements. In general, a first part of the plan is executed by the supervisor, which delegates further parts of the plan to the elements in its cluster. When designing our reactions we must avoid parallel adaptations that can lead to inconsistent states. For example, there are cases in which we might want only one of the supervisors to be in charge of fixing the problem. In this particular case the problem could be solved using negotiation at the federation level. However, since the reaction can be quite complex, and since it hevaily depends on an application’s requirements, it is difficult to provide a general solution. This is why, in this phase of our ongoing work, this aspect is left to the application designer. In other words, currently the convergence of the adaptation depends on the designer’s plans. On the other hand, once the adaptation is clearly defined at the global level, supervisors can take action by pushing instructions towards their supervised elements. Depending on the nature of the communication between the supervised element and the supervisor, the reaction code to be executed at the element level is activated either synchronously or asynchronously. In the synchronous approach (see Figure 6), the element will be waiting for a response from the control loop. In this case, the reaction code is executed before allowing the business logic to resume. This is an intrusive approach that can have a deep impact on performance, but at the same time can be useful to avoid further problems. 
Asynchronous reactions have a much lower impact on supervised elements. When a supervised element is started, AOP kicks in to create a parallel thread. This thread will constantly wait for analysis results from the supervisor, while the supervised element is free to continue its own execution. When the analysis results arrive, pre-defined reaction code is executed. This code can use Java’s synchronization techniques to change an element’s behavior consistently, avoiding invalid configurations. This has two implications. First, designers must have sufficient knowledge of the business logic they want to impact. Second, the supervised element must be open to and aware of adaptation. Indeed, the validity of this solution depends on the application at hand.
Analysis
The Analysis step is responsible for checking certain properties against aggregated sequences. Once again, we use an extension to the filtering language presented earlier. In this step, however, we do not support temporal predicates. During analysis we can once again decide to treat aggregated sequences seperately or jointly. In the first case we predicate over the events in a sequence, in the second case we predicate over tuples created using cartesian products. Whatever the choice, we can define properties that must hold for all the events (or tuples), properties that must hold for at least one event (or tuple), or properties that need to be applied to every event (or tuple) and produce multiple truth values. In our running example, we have a single aggregated event (seqAgg) containing all the data we need. Therefore, we can write (see Figure 5): for $a in seqAgg[0] check count($w in $a/wizards/wizard; true) > avg($r in $a/rooms/wizards; $r);
First of all, we count the number of wizard nodes present in the supervised room (under node wizards). Notice that function count only considers the nodes that satisfy a given property; in our case the property is true, so all wizards are counted. The result is compared to the average number of wizards calculated over the other rooms. This is expressed using function avg, which defines an alias $r over a range of nodes (here, the $a/rooms/wizards nodes).
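Under made-up data, the check above can be mirrored in plain Java; the structure and numbers below are illustrative only, standing in for the XML events the real analysis step consumes:

```java
import java.util.List;

// Sketch of the analysis check from the running example: the number of
// wizards in the supervised room is compared against the average number
// of wizards in the other rooms (the data is invented for illustration).
public class AnalysisCheckSketch {

    static boolean check(List<String> wizardsInRoom, List<Integer> wizardsPerOtherRoom) {
        // count(...; true): every wizard node satisfies the property 'true'.
        long count = wizardsInRoom.stream().filter(w -> true).count();
        // avg($r in ...; $r): average over the other rooms' wizard counts.
        double avg = wizardsPerOtherRoom.stream()
                .mapToInt(Integer::intValue).average().orElse(0.0);
        return count > avg;
    }

    public static void main(String[] args) {
        // Supervised room has 2 wizards; the other rooms have 4 and 6.
        System.out.println(check(List.of("w1", "w2"), List.of(4, 6)));
        // Supervised room has 6 wizards; 6 exceeds the average of 5.
        System.out.println(check(List.of("w1", "w2", "w3", "w4", "w5", "w6"), List.of(4, 6)));
    }
}
```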
4. RELATED WORK
Kramer and Magee [13] have analyzed the architectural challenges that arise in the development of self-managed software. They propose a three-level architectural model. The first layer is the component control layer, which manages the interconnections among a system's components. The second is the change management layer, which contains a set of pre-planned reactions to foreseeable changes of state or environment. The third is the goal management layer, in charge of managing potentially unforeseeable system goals, defined at a high level of abstraction by human stakeholders. With respect to this model, we provide solutions for the first two layers: using AOP we adapt behaviors at the component level, while, thanks to our supervisors and federations, we gather consistent views of the overall system and push adaptations towards our components. Our approach differs from similar existing works in many ways. First of all, we propose a decentralized solution: our architecture avoids the single points of failure and bottlenecks introduced by centralized components, and can thus successfully manage applications in which components frequently enter or leave the system. Silva et al. [20] describe a stream processing infrastructure in which a central component is in charge of job orchestration, optimization, and resource management; failures are dealt with by means of centralized and persistent check-pointing. Ruth et al. [19] propose another example in which a shared distributed infrastructure is formed by federating the computing resources of multiple domains, but a single centralized component is in charge of reallocating the different tasks. A key element of our proposal is the extrinsic monitoring [9] used to oversee the execution of supervised elements. Besides aspect orientation [1, 11], there are several other techniques that can be adopted to supervise components.
A widely adopted solution for querying the current status of supervised components is the adoption of standard interfaces (e.g., [14]). This approach forbids code-level monitoring, and requires that a proper wrapper be developed for each managed resource. The development of kernel modules (e.g., [16]) allows for the monitoring of specific resources used by supervised components and does not require wrappers. However, this technique only monitors low-level resources (e.g., CPU utilization) and assures a particular quality of service, but it does not allow fine-grained, application-specific monitoring. Runtime verification frameworks perform monitoring with techniques similar to ours. Indeed, Kim et al. [12] and Chen et al. [5] designed and developed two different tools for runtime verification that collect events and data from the supervised program by transparently instrumenting the source code. However, these tools do not support system-wide application steering through adaptations, and their architectures are based on centralized components in charge of detecting property violations. A significant amount of the work presented in this paper borrows its basis from the ideas behind complex event processing. For example, both our languages and the work by Luckham et al. [15] aim at discovering complex patterns among streams of events. In both cases, we start from raw events, filter them, and map them against user-defined constraints. In our case, everything is also rendered in XML and tailored to the adaptability of supervised elements.
Supervised entity (AOP code):
    sendEvent();
    analysisResult = receiveAnalysis();
    if (analysisResult == true)
        react();
Figure 6: Synchronous reaction.

Another option is to have the code activate new AOP hooks in the element (see Figure 7). This way, when the supervised element's logic reaches that point, the reaction code is finally executed. Although this approach has a smaller performance impact on the supervised element, it does not guarantee that the reaction will occur as soon as we would like.
Supervised entity (AOP code):
    addReactionHook();
Figure 7: Asynchronous reaction.

In our running example, the supervisor takes action when its own room does not contain enough wizards. In this case, it executes the following plan. First, it uses the federation to communicate how many wizards it needs, asks for help, and waits for answers from the other supervisors, which respond with the number of wizards they can make available2. The supervisor needing additional wizards can then make explicit requests to the others. If a supervisor receives a request for X wizards, it chooses them from its cluster and pushes down further adaptations. Each wizard is built to have a list of room preferences; this is the information used to cluster wizards. Since the communication between the supervisor and the element is asynchronous, the local adaptation uses Java's synchronization techniques to modify the list of room preferences in favor of the room to which the wizard needs to be sent. The adaptation then detaches the element from its cluster, which results in a new clusterization phase for that element. The solution is robust, since clusterization tries to put the wizard in the optimal room; should that room no longer exist, it would choose the second-best option.
2 A supervisor only offers wizards if it knows the promise can be held.
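The steps of this plan can be sketched as follows; the classes, the offer policy, and the in-process hand-off are simplifying assumptions, since the real protocol runs over the federation:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the federation-level wizard-balancing plan from the running
// example (RoomSupervisor, Wizard, and the offer policy are hypothetical).
public class WizardBalancingSketch {

    static class Wizard {
        final List<String> roomPreferences = new ArrayList<>();
        boolean detached = false;
        Wizard(List<String> prefs) { roomPreferences.addAll(prefs); }
        // Local adaptation: favour the destination room, then detach so that
        // a new clusterization phase places the wizard there.
        synchronized void moveTowards(String room) {
            roomPreferences.remove(room);
            roomPreferences.add(0, room);
            detached = true;
        }
    }

    static class RoomSupervisor {
        final String room;
        final List<Wizard> cluster = new ArrayList<>();
        RoomSupervisor(String room) { this.room = room; }
        // A supervisor only offers wizards it can actually give away
        // (here: all but one, an invented policy for illustration).
        int offer() { return Math.max(0, cluster.size() - 1); }
        // Answer an explicit request for x wizards from another supervisor.
        List<Wizard> release(int x, String destination) {
            List<Wizard> sent = new ArrayList<>();
            for (int i = 0; i < x && offer() > 0; i++) {
                Wizard w = cluster.remove(cluster.size() - 1);
                w.moveTowards(destination);
                sent.add(w);
            }
            return sent;
        }
    }

    public static void main(String[] args) {
        RoomSupervisor needy = new RoomSupervisor("hall");
        RoomSupervisor donor = new RoomSupervisor("tower");
        donor.cluster.add(new Wizard(List.of("tower", "hall")));
        donor.cluster.add(new Wizard(List.of("tower", "hall")));

        // 'hall' asks the federation for one wizard; 'tower' offered one.
        List<Wizard> moved = donor.release(1, needy.room);
        System.out.println(moved.size());
        System.out.println(moved.get(0).roomPreferences.get(0));
        System.out.println(moved.get(0).detached);
    }
}
```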
Another important player in complex event processing is Esper [21], which proposes a complete but quite complex SQL-like notation. Although fine-grained selection and event creation are possible, Esper does not explicitly encourage incremental approaches. Even though our supervision can also be considered complex, recall that only the collection and analysis steps are required, and that we explicitly support incremental development. Finally, Esper is proposed as a tool integral to the design and development of a system; thanks to AOP, our approach can instead easily be used with legacy components.
5. CONCLUSIONS AND FUTURE WORK
This paper outlines our ongoing work on an innovative approach to the component-based design and implementation of distributed self-adaptive systems. The approach introduces a component model and languages for triggering adaptation at different levels of pervasiveness. In this paper we only tackle supervision aspects, while the application's logic is out of scope. Therefore, we do not consider what links or dependencies our supervised elements may have with the rest of the world, including with elements in other clusters; these aspects are only considered if the designer decides to modify them through local adaptations. The solution is decentralized, robust with respect to failures and bottlenecks, and provides a dynamic and flexible infrastructure for application domains in which self-adaptivity is a critical requirement. Moreover, our approach guarantees a sharp separation of concerns between business logic and adaptation logic. Although we adopted a fantasy example, the case study actually provides a general abstraction for distributed load balancing tasks.

Future work will consist of a further refinement of our model and languages, to achieve a higher degree of generality and expressiveness. We are also investigating the different technological enablers we can exploit: currently, we are using JBoss AOP [1] to inject Java code into our components, we are evaluating the JXTA framework [17] for peer-to-peer networks, and we are studying Shoal [18] as a clustering infrastructure. The AOP code we introduce for clustering, monitoring, and adaptation is currently defined statically, prior to execution, and only switched on at runtime. However, we are also interested in studying planning techniques, since there is no reason to believe that this code cannot be created, at least partially, on the fly. Finally, we plan to continue our ongoing implementation and to use a set of real case studies to assess the hypotheses made in our approach.
6. REFERENCES

[1] JBoss AOP. http://labs.jboss.com/jbossaop.
[2] L. Baresi and S. Guinea. Dynamo and Self-Healing BPEL Compositions. International Conference on Software Engineering, pages 69–70, 2007.
[3] A. Carzaniga, D. S. Rosenblum, and A. L. Wolf. Design and evaluation of a wide-area event notification service. ACM Transactions on Computer Systems, 19(3):332–383, August 2001.
[4] A. J. Chakravarti, G. Baumgartner, and M. Lauria. The organic grid: Self-organizing computation on a peer-to-peer network. In International Conference on Autonomic Computing, 2004.
[5] F. Chen and G. Rosu. Java-MOP: A monitoring oriented programming environment for Java. Proceedings of the Eleventh International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 546–550, 2005.
[6] D. Devescovi, E. Di Nitto, and R. Mirandola. An infrastructure for autonomic system development: the SelfLet approach. Proceedings of the Twenty-second IEEE/ACM International Conference on Automated Software Engineering, pages 449–452, 2007.
[7] M. Fayad and M. P. Cline. Aspects of software adaptability. Communications of the ACM, 39(10):58–59, 1996.
[8] P. Horn. Autonomic computing: IBM's perspective on the state of information technology. IBM T.J. Watson Labs, October 2001.
[9] G. Kaiser, P. Gross, G. Kc, J. Parekh, and G. Valetto. An Approach to Autonomizing Legacy Systems. Defense Technical Information Center, 2005.
[10] G. Kaiser, J. Parekh, P. Gross, and G. Valetto. Kinesthetics eXtreme: An external infrastructure for monitoring distributed legacy systems. In IEEE 5th Annual International Active Middleware Workshop, 2003.
[11] G. Kiczales, J. Lamping, C. Lopes, J. Hugunin, E. Hilsdale, and C. Boyapati. Aspect-oriented programming, October 15, 2002. US Patent 6,467,086.
[12] M. Kim, S. Kannan, I. Lee, and O. Sokolsky. Java-MaC: a Run-time Assurance Tool for Java. Proceedings of Runtime Verification (RV'01), 55, 2001.
[13] J. Kramer and J. Magee. Self-Managed Systems: an Architectural Challenge. International Conference on Software Engineering, pages 259–268, 2007.
[14] H. Liu, M. Parashar, and S. Hariri. A Component Based Programming Framework for Autonomic Applications. Proceedings of the 1st International Conference on Autonomic Computing, 2004.
[15] D. Luckham. The Power of Events: An Introduction to Complex Event Processing. Addison-Wesley, 2002.
[16] C. Poellabauer, H. Abbasi, and K. Schwan. Cooperative run-time management of adaptive applications and distributed resources. In 10th ACM Multimedia Conference, 2002.
[17] JXTA Project. https://jxta.dev.java.net/.
[18] Shoal Project. https://shoal.dev.java.net/.
[19] P. Ruth, J. Rhee, D. Xu, R. Kennell, and S. Goasguen. Autonomic live adaptation of virtual computational environments in a multi-domain infrastructure. In International Conference on Autonomic Computing, 2006.
[20] G. J. Silva, J. Challenger, L. Degenaro, J. Giles, and R. Wagle. Towards autonomic fault recovery in System-S. In International Conference on Autonomic Computing, 2007.
[21] Esper: Event Stream and Complex Event Processing. http://esper.codehaus.org/index.html.