Dynamic power management of multiprocessor systems

June 30, 2017 | Autor: Dong-in Kang | Categoria: Power Management, Energy Management, Real Time Systems, Frequency, Adaptive System, Power allocation, Job shop scheduling, Dynamic power management, Power allocation, Job shop scheduling, Dynamic power management

Share Embed

Denunciar este link

Descrição do Produto

Dynamic Power Management of Multiprocessor Systems∗ Jinwoo Suh, Dong-In Kang, and Stephen P. Crago University of Southern California/Information Sciences Institute 3811 N. Fairfax Drive, Suite 200, Arlington, VA 22203 {jsuh, dkang, crago}@isi.edu

Abstract Power management is critical to power-constrained real-time systems. In this paper, we present a dynamic power management algorithm. Unlike other approaches that focus on the tradeoff between power and performance, our algorithm maximizes the power utilization and performance. Our algorithm considers the dynamic nature of the environment such as changes on the available energy and adapts system parameters such as the operating voltage, frequency, and the number of processors. In our algorithm, we divide the power management problem into three sub-problems: initial power allocation, system parameter computation based on the allocated power, and dynamic update of the power and system parameters at run time. Initial power allocation minimizes wasted energy by using extra energy for useful work. It also avoids the undersupplied power situation by reducing power usage before such situations happen. The system parameters are computed to maximize the performance for a given power. During runtime, the system parameters are updated continuously to accommodate differences between the expected and real situations. The simulation results of the algorithm for a satellite system using eight Processor-In-Memory (PIM) processors are presented.

1. Introduction Many embedded systems in power-constrained environments such as satellite systems, hand-held devices, and solar powered systems operate using rechargeable batteries. Even though technology advances in rechargeable batteries have improved the energy storage capability continuously, the power requirements for many mission critical systems still far exceed the storage capacity. Therefore, efficient power management techniques are critical for the operation of these systems. In this paper, we consider a multiprocessor system that has a rechargeable battery to store energy and external power sources. In this system, not only low-power design is important, but also high power utilization and performance are important to achieve. The power management algorithm needs to find an initial power management schedule, and, during run time, it needs to determine system parameters and update the schedule to accommodate the variances of the planned schedule and real schedule. Recently, hardware/interface support for efficient power management has become available (e.g. frequency scaling, voltage scaling, and OS-level industry standards). The StrongARM processor supports frequency scaling by providing a clock frequency change register. However, it does not support dynamic voltage scaling. Some efforts have added voltage scaling capability to StrongARM based systems using external circuits [17][23]. Crusoe chips provide both voltage and frequency scaling [6]. The Power-Aware Multiprocessor Architecture (PAMA) effort implemented a multiprocessor system that provides frequency scaling [14]. Industrial standards include Advanced Configuration and Power Interface (ACPI) [12] and OnNow [16]. Using these hardware/interface supports, many power management techniques have been proposed. The simplest and most widely used technique for dynamic power management is the time-out method, in which components are turned off after a fixed amount of idling time. Advanced methods include more sophisticated techniques to estimate expected idling time to minimize the power usage. For example, the idling history can be used to estimate the future status of a system [10][15][25]. Low-power task scheduling algorithms for uniprocessor [3][13][22][24][27] and multiple devices [14] have been proposed. In these algorithms, the service requests are clustered to make the busy time and idle time long

∗

Effort sponsored by Defense Advanced Research Projects Agency (DARPA) through the Air Force Research Laboratory, USAF, under agreement number F30602-00-2-0548. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not interpreted as necessarily representing the official policies or endorsement, either expressed or implied, of the Defense Advanced Research Projects Agency (DARPA), Air Force Research Laboratory, or the U.S. Government.

1

Expected Charging Schedule

Remaining Energy

Expected Event Schedule

Allowable Power Estimation

Calculate System Parameters -Number of processors -Operating voltages -Operating frequencies

Event Statistics

System Parameters

Figure 1.

Dynamic power management algorithm

enough to minimize the state transitions and hence reduce power usage. However, these algorithms cannot save power if the input data is fed into the system continuously since the system is never turned off. Stochastic control approaches are used in [1][20][21]. In these approaches, the dynamic power management problem is formulated as a policy optimization. In [1], a system is modeled as a discrete-time Markov decision process. In [20], a dynamic power management algorithm based on continuous-time Markov decision processes is proposed. In [21], the system is extended to include multiple service providers. These algorithms calculate power management policies for system components that minimize power usage for given delay constraints. Thus, if power is oversupplied, i.e., the supplied energy is more than the rechargeable battery can store, then the energy is wasted. Other related areas such as instruction-level power optimization, variable voltage techniques, and approximate signal processing are extensively surveyed in [5]. The previous work mainly focuses on minimizing the energy consumption or maximizing the performance when the energy is given. In this paper, we describe a comprehensive dynamic power management algorithm for multiprocessor systems to maximize the performance and power utilization. To maximize both power utilization and performance, we first maximize the power utilization, which results in an initial power allocation schedule. Then, the performance is maximized for the given schedule. In our algorithm, we consider both environment and computing resources together. Environmental factors include the dynamic nature of the energy source, the status of the energy currently available from a rechargeable battery, and the dynamic nature of the events. The parameters of the computing resource include operating voltage, frequency, and the number of processors dynamically in run-time. Unlike most previous algorithms that are designed for uniprocessor systems, our algorithm is for multiprocessor systems. Figure 1 shows our approach. It first estimates the initial power allocation. The estimation of the initial power allocation maximizes power utilization and performance by avoiding oversupplied or undersupplied power conditions by using or saving energy before such conditions occur. Then, based on the power allocation, the system parameters are computed. The parameters are computed to maximize the performance for a given power allocation. Then, during run-time, the variances of the externally supplied energy and power usage are computed dynamically and are used to recalculate the power allowance and system parameters. The rest of the paper is organized as follows: In Section 2, the dynamic power management problem is defined. The performance and power usage models are briefly described in Section 3. Section 4 presents our algorithm. Simulation results are shown in Section 5. Section 6 concludes the paper.

2. Problem definition The system considered in this paper consists of N homogeneous processors. Each processor can operate at a frequency between fmin and fmax and can be turned off independently. The supplied voltage to each processor is

2

between vmin and vmax. The power is supplied from a rechargeable battery that is charged by an external power source that has a periodic power supply schedule. For example, a signal processing system on a satellite operates using energy from a solar panel mounted on the satellite. The charging property is periodic due to periodic orbiting. Let us denote the period as T. For a given time t, 0 ≤ t < T, the following are defined: Expected charging schedule c(t). The c(t) is an estimated external power that will be supplied at time t. The schedule may be derived theoretically or empirically. For example, the recorded charging power for the previous period or weighted average of the several previous periods can be used. Expected event rate schedule u(t): The event rate is the rate of the events that initiate the computation on the system. As in the expected charging schedule, u(t) can be derived using any reasonable method such as predicting from data for the previous periods and mathematically computed supply power. For example, if the event is related to weather conditions, weather forecast data can be used for the estimation of u(t). Weight function w(t): The weight function is user input that is used to emphasize some portion of the period. For example, if we want to process data more intensively during commute time in a traffic monitoring system, then the period is given a higher weight value. Maximum charging capacity cmax: The maximum capacity of the rechargeable battery. Even if the externally supplied energy is available for charging, if the energy charged to the battery is equal to cmax, then the energy cannot be stored. Thus, the additional supplied energy is wasted. Minimum charging capacity cmin: The minimum charge that should be maintained at all times. Energy utilization: (Energy used for computation)/(Energy available) for the period T The goal is to compute the following parameters of the multiprocessor system to maximize the energy utilization and maximize the performance for the given inputs. Number of processors nt: The number of active processors on the system at time t. Frequncy of processors (f0, f1, … fn-1)t: The frequency of each active processor at time t. Voltage of processors (v0, v1, … vn-1)t: The voltage of each active processor at time t.

3. Performance and Power model The performance of the system as a function of frequency f and voltage v is modeled as follows1: Perf ( f , v) ∝ min( f , g (v)) , f min ≤ f ≤ f max , vmin ≤ v ≤ vmax

(1)

, where g(v) is the maximum frequency that can be obtained when voltage v is applied. The equation indicates that the performance is proportional to the frequency if f is less than g(v). The applications that we consider in T1

S

T2

E

TN

Figure 2.

Task graph

1

Note that the equation is simplified for analysis by assuming other factors such as memory latency and data dependency are all scalable to the frequency.

3

this paper are parallel applications with initial and final stages. The task graph is shown in Figure 2. The performance of the system for the applications using n homogeneous processors is derived from Amdahl’s law [1]. Perf (n) = c0

(2)

1 Tt − Ts Ts + n

where c0 is a constant, Tt is a total execution time when n = 1, and Ts is the execution time of the portion that cannot be parallelized. When we consider both the frequency and voltage, the overall performance equation is Perf (n, f ) = c1

min( f , g (v)) T − Ts Ts + t n

(3)

The power consumption model for a uniprocessor [9] is Power (n, f , v) ∝ f i vi

(4)

2

Thus, the power consumption for N processors is n −1

Power ( n, f , v) = c2 å f i vi

2

(5)

i =0

, where c2 is a constant2. For a homogeneous system in which all the processors operate at the same clock frequency and voltage, then the power consumption is

(6)

Power(n, f , v) = c2 nfv 2

This equation indicates the power is proportional to the number of processors, operating frequency, and square of operating voltage.

4. Dynamic Power Management Technique In this section, our dynamic power management technique is described. First, the initial power allocation computation is presented. Then, the system parameter computation algorithm based on the allocated power is shown. Then, the dynamic update of the power allocation and system parameters during run-time is described.

4.1. Initial Power Allocation Computation To compute initial power allocation, the weighted power usage function (WPUF) is computed. The WPUF is a desired power allocation based on the event rate schedule and weight function.

WPUF(t) = u(t) w(t) for all 0 ≤ t ≤ T

(7)

To maintain balance between externally supplied energy and dissipated power for a period, WPUF is multiplied by a constant as follows. T

u new =

u (t ) w(t ) ò c(t )dt 0

ò

T

0

The surplus power function is

(8)

w(t )u (t )dt

c(t ) − u new (t )

(9)

The integration of the above function shows the power available from rechargeable battery (not including externally supplied power) at time t. t

Poriginal (t ) = ò c(v) − u new (v)dv

(10)

0

2

Note that the cost of communication is ignored. This is true for applications that have no communications among processors. However, the simplified model does not limit the applicability of the algorithms presented in this paper except Equation (18).

4

This function may exceed Cmax or be less than Cmin. If it exceeds Cmax, then the externally supplied energy available cannot be charged to the battery, which results in the waste of available energy. If it is less than Cmin, then computation may not be performed until the battery is recharged. Thus, if the power available at time t exceeds Cmax, then it is desirable to dissipate some power before time t for useful tasks to obtain better energy utilization. If the power available at time t is less than Cmin, then the power needs to be saved before time t by using less energy before time t. The adjustment of power allocation can be done using the following algorithm. Algorithm 1: Adjust power dissipation schedule

dPoriginal (t )

= 0 and ( Poriginal (t ) < C min or Poriginal (t ) > C max ) };

1

S ← {t|

2 3 4

Sort S based on t in ascending order; for all elements in S if two consecutive elements, t0 and t1 satisfy

5

Remove an element that has larger

6

if two consecutive elements, t0 and t1 satisfy

7 8 9 10 11 12 13

Poriginal (t 0 ) < C min and Poriginal (t1 ) < C min then Poriginal (t ) ;

Remove an element that has smaller t0← the first element of S; temp ← t0; S ← S – {t0}; while S is not empty t1 ← the first element of S; if Poriginal (t 0 ) < C min and

14 15

dt

Pinit (t ) = else if

Poriginal (t 0 ) > C max and Poriginal (t1 ) > C max then Poriginal (t ) ;

Poriginal (t1 ) > C max then

C max − C min ( Poriginal (t ) − Poriginal (t 0 )) + C min , t 0 ≤ t ≤ t1 ; Poriginal (t1 ) − Poriginal (t 0 )

Poriginal (t ) > C max and Poriginal (t1 ) < C min then C max − C min ( Poriginal (t ) − Poriginal (t1 )) + C min , t 0 ≤ t ≤ t1 ; Poriginal (t 0 ) − Poriginal (t1 )

16

Pinit (t ) =

17 18 19 20

t0 ← t1; S ← S- {t1};

t1 ← temp; Adjust Pinit for

0 ≤ t < t1 , t 0 ≤ t < T using Lines 13 – 16 by considering two segments as a one contiguous time segment t 0 ≤ t < T + t1 . In lines 1 – 7, the algorithm first identifies the times at which the original power allocation reaches maximum or minimum. These times are used to adjust the power allocation. From lines 8 to 18, the algorithm adjusts the power allocation by allocating more power or less power based on this information. In the algorithm, the amount of stored energy depends on the original power allocation. However, other ways of adjusting can be used. For example, the power can be evenly distributed. From lines 19 – 20, the power allocation is adjusted between 0 ≤ t < t1 , t 0 ≤ t < T .

4.2. Initial Parameter Computation For the given initial power allocation, the system parameters need to be computed. The status of the system at time t can be denoted using (f0, … fn-1, v0, … vn)t, where fi denotes the frequency of the processor i and vi denotes the voltage of the processor i. When a processor is inactive, the processor is denoted using zero voltage or frequency. The frequency and voltage are such that fi ∈F and vi ∈ V, where F and V are sets of frequency values and voltages values, respectively. The number of active processors can be easily derived by counting processors that have non-zero frequency or voltage.

5

The parameter space can be reduced by the following observations. The first observation is that if the frequency of a processor is lower than f0, where f 0 = g (v min ) at time t, then, changing frequency will change the performance; However, changing operating voltage does not change performance. The second observation is that if the frequency of a processor, f1, is higher than or equal to f0, where f 0 = g (v min ) , then, the operating voltage to choose for the best performance/power ratio is v1 = g −1 ( f1 ) . From these observations, the best operating voltage for a given frequency is determined by: ì g −1 ( f ), if g −1 ( f ) ≥ v min v=í −1 î v min , if g ( f ) < v min

(11)

Therefore, the system parameter space can be reduced to frequency only since voltage can be derived from frequency using the above equation. Thus, in those systems, the system parameters can be represented using (f0, … fn). In many systems, the same clock frequency is used for all processors. Thus, we can further simplify the problem for those systems by using the same frequency for all processors. Then, the parameter can be represented by (n, f). If the parameter space is continuous and there is no overhead for changing frequency, voltage, and the number of processors, then the parameters can be computed as follows for two cases: (i) f < g(vmin) and (ii) f ≥ g(vmin).

Case 1: f < g(vmin)

The partial derivatives of the performance with respect to power when either n or f is constant and f < g(vmin) are: ∂Perf (n, f ) ∂Power

∂Perf (n, f ) ∂Power

n =const

f = const

Power ö æ ç c1 ÷ c2 n ÷ c1 ∂ ç = = ç ÷ T − T ∂Power c2 (nTs +T t −Ts ) t s ç Ts + ÷ n ø è æ ö ç ÷ ç ÷ ç ÷ c1 (Tt − Ts ) c1 f ∂ = = ç Tt − Ts ÷ c 2 (nTs +T t −Ts ) 2 ∂Power ç ÷ Ts + Power ÷ ç ç c 2 f ÷ø è

(12)

(13)

To determine which is better to change to obtain a higher performance/power ratio, we compare the two equations as follows. c1 ∂Perf (n, f ) ∂Power n =const nTs c (nTs +T t −Ts ) = 2 = +1 > 1 c1 (Tt − Ts ) ∂Perf (n, f ) T t −Ts ∂Power f =const c 2 (nTs +T t −Ts ) 2

(3

nTs >0) T t −Ts

(14)

Therefore, better performance per power can be obtained by changing frequency than the number of processors when f < g(vmin). Case 2: f ≥ g(vmin)

The partial derivatives of the performance with respect to power when either n or f is constant and f ≥ g(vmin) are: ∂Perf (n, f ) ∂Power n =const

6

æ æ Power ö1 / 3 ö çc ç ÷ ÷ ç 1 çè c2 n ÷ø ÷ c1 ∂ = ÷= ç Tt − Ts ÷ 3c 2 f 2 (nTs +T t −Ts ) ∂Power ç T + ÷ ç s n ø è

(15)

∂Perf (n, f ) ∂Power

f = const

æ ç ç ç c1 f ∂ = ç T − Ts ∂Power ç Ts + t Power ç ç c2 f è

ö ÷ ÷ ÷ c1 (Tt − Ts ) ÷= 2 c ( nT 2 s +T t −Ts ) ÷ ÷ ÷ ø

(16)

To determine which is better to change, we compare the two equations. ∂Perf ( n, f ) ∂Power n =const nTs 1 = + ∂Perf (n, f ) 3(T t −Ts ) 3 ∂Power f =const

(17)

Thus, if Tt - Ts is zero, i.e., no parallel computation can be performed, then it is obvious that there is no need to increase the number of processors. Thus, we do not consider this case anymore. If nTs > 2 , then changing (Tt − Ts )

frequency will obtain better performance/power than changing the number of processors. If

nTs ≤2, (Tt − Ts )

changing the number of processors is better than changing frequency. Thus, (n, f) can be determined as follows. ì Powerinit (t ) ), ï (1, c v 2 2 min ï ï ê Powerallowed (t ) ú ï ( ê c g (v )v 2 ú, g (vmin )), ï ëê 2 min min ûú ( n, f ) = í é æ öù ï( ê2ç Tt − 1÷ú, g (v)), ï ê çè Ts ÷øú ï ï ( ê Powerallowed (t ) ú, g (v )), max ï êê c g (v )v 2 úú î ë 2 min min û

if Powerinit (t ) < c2 g (vmin )v 2min é æT öù if c2 g (vmin )v 2min ≤ Powerinit (t ) < ê2c2 çç t − 1÷÷ú g (vmin )v 2min ê è Ts øú é æ Tt é æT öù öù if ê2c2 çç − 1÷÷ú g (vmin )v 2min ≤ Powerinit (t ) < ê2c2 çç t − 1÷÷ú g (vmax )v 2max ê è Ts øú ê è Ts øú

é æT öù if ê2c2 çç t − 1÷÷ú g (vmax )v 2max ≤ Powerinit (t ) ê è Ts øú

(18)

In real systems, it is obvious that the number of processors cannot change continuously. Also, in many systems, the frequency can be chosen from a set of pre-selected frequency values, and there is an overhead for changing parameters. For example, when a processor is changed to active mode from stand-by mode, it needs to be initialized before it can process tasks. Let us denote the overhead due to the change in the number of processors and frequency change as OHn and OHf, respectively. We assume that the system parameters can be changed at time t =iτ, i = 0, 1, 2, …, T/τ-1, where τ is a constant. The following algorithm computes the system parameters so that the power usage closely follows the allocated power schedule while obtaining the best performance/power ratio. Algorithm 2: Determining the number of processors and operating frequency 1 for all n and f 2 Compute power-performance pair (Power(n, f), Perf(n, f)); 3 for all (nv, fv) and (nw, fw), where nv ≠ nw and fv ≠ fw 4 if Power(nv, fv) ≥ Power (nw, fw) and Perf(nv,fv) τ Poweroverhead then (nt, ft) ← (ntmp, ftmp); else then (nt, ft) ← (nt - τ, ft - τ); t = t + τ;

Lines 1 - 2 compute the parameter-power pair table. Lines 3 – 5 check the table and if a pair shows less performance with the same power, then the pair that shows less performance is removed. Lines 6 – 9 calculate the parameter for time 0. In lines 10 – 22, the parameters for the rest of the periods are computed. In line 11, the Pinit is recomputed to accommodate the difference of the initial power allocation and the power usage due to the discrete nature of the computed parameters. Lines 12 – 13 compute the new parameters. Lines 14 – 22 compute the overhead due to the change of the number of processors and frequency. If they are worth changing, then the parameters are changed; otherwise, the current parameters are maintained. Algorithm 3 Dynamic update of the power allocation t

1 2 3 4 5

Ediff ←

òP

init

(v) − Pactual (v)dv

t −τ

if Ediff > 0 then Find time w such that

Pinit (v) = C max ;

for all v such that t ≤ v < w

Pinit (v) = min( E diff

Pinit (v)

òP

init

6 7 8 9

if Ediff < 0 then Find time w such that

, C max ) ;

w

(v)dv

t

Pinit (v) = C min ;

for all v such that t ≤ v < w

Pinit (v) = E diff

Pinit (v) w

òP

init

;

(v)dv

t

In Algorithm 3, the deviation during initial system parameter computation due to the discrete parameter space is accommodated. The deviation is computed after each interval τ. If less energy was used than planned, then energy is allocated to the next computations. If more energy was dissipated than expected, then less power is allocated to the next computations.

8

4.3. Dynamic Recomputation of the Power Allocation and System Parameters Since the actual event rate and externally supplied energy can be changed during the run-time, there can be deviation of the energy from the initial expectation. Algorithm 3 is used for the recomputation of the power allocation during run time. In this case, the Pactual in line 1 is the real power used for the previous computations.

5. Simulation Results: An Example System We implemented a simplified Fast On-Orbit Recording of Transient Events (FORTE) [18] signal processing application that detects radio frequency events from a satellite. In FORTE, when the signals from sensors trigger an analogue threshold circuit, digital signal processing is performed to check if it has the characteristics of an interesting RF event. Most of the computation for the system is an FFT that takes about 60% of the execution time. In this implementation, we used only FFT portion of the application. Since our platform does not support floating-point operations, we implemented fixed-point FFT operations. The system simulated is based on the PAMA (Power-Aware Multiprocessor Architecture) board parameter values [4]. The board is a modified version of the System Level Intelligent Intensive Computing (SLIIC) Quick Look board [26]. The board consists of eight Processor-In-Memory (PIM) processors and two FPGAs for interconnection. The PIM processors are M32R/D chips each of which contains 2 MB DRAM. The clock speed of the M32R/D can go up to 80 MHz internally with 20 MHz I/O. The network in the FPGA is a unidirectional ring network. A power measurement board is used to measure real-time power consumption. The processors can be in either active, sleep, or stand-by mode [18]. In active mode, the full circuit is active (546 mW typical). In sleep mode, only the memory in the chip is working (393 mW typical). In stand-by mode, all circuits are stopped except the interrupt monitoring circuit (6.6 mW typical). In this simulation, the sleep mode is not used. The board supports changing the number of processors and frequency. Possible frequencies are 20, 40, and 80 MHz. We used one of the processors as a controller processor and the rest of them are used for FORTE signal processing. The controller processor computes the Pinit and updates it. Based on Pinit, it sends frequency and active/stand-by mode change commands to other processors. Each processor checks the command from controller processor after each computation. If the command is stand-by, then it changed its mode to stand-by. When the controller processor decides to wake-up a processor, it sends the wake-up signal to the processor. When the command is a frequency change, then a processor changes its frequency. The frequency change is performed by writing the frequency data to a specific address that is mapped to the adjacent FPGA. The FPGA changes the supplied clock frequency to the processor. After writing the frequency data, the processor goes to stand-by. Then, after a fixed time (10 cycles), the FPGA automatically wakes up the processor so that the processor continues the computation with the changed frequency. Therefore, in our current implementation, changing the frequency takes more time than changing the mode. Currently, the specification of the M32R/D recommends 80 MHz. However, our experiments found that the processor can operate at 20 – 80 MHz clock frequency. Thus, we set the possible frequency choices in this simulation are 0, 20, 40, and 80 MHz. We set the current voltage (3.3 V) to vmin = vmax. The measurement of the execution time for the 2K sample FFT at 20 MHz was 4.8 sec. Thus, we set the τ as 4.8 sec. T is set to 57.6 sec. Thus, there are 12 parameter updates in a period. The run-time trace of the Pinit is simulated for two scenarios. Figure 2 and Figure 3 show charging schedules and power usage schedules for two different scenarios. The simulated results are shown in Tables 2 - 5. Table 2 and Table 4 show the computation of the initial power allocations. Rows show the iteration of the computation. In both scenarios, after five iterations, the integration of the initial power allocation is more than the minimum requirement (0.098). Table 3 and Table 5 show dynamic update of the power allocations during run-time. Rows correspond to time. Used energy means the energy used by the system based on the parameter setting derived from the allocated energy. Expected charge is the energy expected at the time and supplied energy is the real energy supplied. The tables show that whenever there is a difference in power usage and energy supplied between estimated and actual values, the power allocation is recalculated. In this simulation, we assumed no overheads for changing the number of processors and frequency.

9

3

Charging schedule Use schedule

Power (W)

2.5 2 1.5 1 0.5 0 0

4.8 9.6 14.4 19.2 24 28.8 33.6 38.4 43.2 48 52.8 Time (Sec)

Figure 3.

Charging and use schedule for scenario I

4

Charging schedule Use schedule

3.5

Power (W)

3 2.5 2 1.5 1 0.5 0 0

4.8 9.6 14.4 19.2 24 28.8 33.6 38.4 43.2 48 52.8 Time (Sec)

Figure 4.

Charging and use schedule for scenario II

Two metrics are computed for comparison: energy wasted and undersupplied energy. The energy wasted is energy that was not used for useful computation. This happens when the battery is fully charged while the external source has energy to supply. Undersupplied energy means the energy needed for computation but not available at that time. For comparison, we implemented a static algorithm. Since no overhead for changing the number of processors or frequency is assumed, the system is turned off while there is no input data to process. If the externally supplied energy is more than the usage, then, the difference is charged to a rechargeable battery. If more energy is used than supplied energy, then, the difference is supplied from battery. Table 1 Comparison of algorithms Algorithm Metric Proposed Wasted energy Undersupplied energy Static Wasted energy Undersupplied energy

Scenario1 13.68 J 23.11 J 40.93 J 39.33 J

10

Scenario 2 6.18 J 6.27 J 69.33 J 67.91 J

The results are shown in Table 1. The results show that the proposed algorithm reduces the wasted energy by more than a factor of ten compared with the optimal time-out algorithm. Also, since it allocates less power if it anticipates a situation when the energy is undersupplied, it lowers the probability of the undersupplied situation.

6. Conclusions and Future Work We have described a dynamic power management technique for a multiprocessor system. By computing the initial power allocation, our technique can minimize the wasted energy. Also, system parameters are calculated to maximize the performance for a given power. We have simulated the algorithm for a satellite signal processing system based on the Power-Aware Multiprocessor Architecture (PAMA). The results show that the dynamic algorithm reduces the wasted energy significantly. Also, it prevents a situation where the supplied power is not enough to maintain minimum requirement of the power supply. In the future, we plan to implement the algorithms on the PAMA board. Also, we plan to extend the algorithm to allow different frequency and voltage for each processor in the system. Finally, we plan to extend the algorithm to a heterogeneous system in which each component has different processing characteristics.

7. References [1] G. M. Amdahl, “Validity of Single-Processor Approach to Achieving Large-Scale Computing Capability,” AFIPS Conference, Reston, VA, 1967. [2] L. Benini, A. Bogliolo, G. Paleologo, and G. De Micheli, “Policy Optimization for Dynamic Power Management,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,” Vol. 18, No. 6, pp. 813-833, June 1999. [3] J. J. Brow, D. Z. Chen, G. W. Greenwood, X. Hu, and R. W. Taylor, “Scheduling for Power Reduction in a Real-Time System,” International Symposium on Low Power Electronics and Design, 1997. [4] S. Crago, D.-I. Kang, C. Li, J. Suh, K. McCabe, and M. Gokhale, “Power Aware Multiprocessor Architecture,” DARPA PI meeting, Annapolis, MD, November, 2000. [5] L. Benini and G. De Micheli, “System-Level Power Optimization: Techniques and Tools,” International Workshop on Low Power Electronics and Design, 1999. [6] DARPA, Power Aware Computation/Communication, http://www.darpa.mil/ito/research /pacc/index.html, 2001. [7] L. Geppert and T. S. Perry, “Transmeta’s Magic Show,” IEEE Spectrum, Vol. 37, No. 5, 2000. [8] J. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 2nd Edition, Morgan Kaufmann Publishers, Inc., 1996. [9] I. Hong, G. Qu, M. Pokonjak, and M. B. Srivastava, “Synthetic Techniques for Low-Power Hard Real-Time Systems on Variable Voltage Processors,” IEEE Real-Time Systems Symposium, December 1998. [10] C.-H. Hwang and A. Wu, “A Predictive System Shutdown Method for Energy Saving of Event-driven Computation,” International Conference on Computer Aided Design,” November 1997. [11] Intel, “Intel StrongARM 1100 Microprocessor Developers Manual,” August 1999. [12] Intel, Microsoft, and Toshiba, “Advanced Configuration and Power Interface Specification,” http://www.teleport.com/~acpi/, 2000. [13] J. R. Lorch and A. J. Smith, “Scheduling Techniques for Reducing Processor Energy Use in MacOS,” Wireless Networks, Vol. 3, Num. 5, pp. 311-324, 1997. [14] Y.-H. Lu, L. Benimi, and G. De Micheli, “Low-power Task Scheduling for Multiple Devices,” International Workshop on Hardware/Software Codesign, 2000. [15] Y. Lu and G. De Micheli, “Adaptive Hard Disk Power Management on Personal Computers,” IEEE Great Lakes Symposium on VLSI, February 1999. [16] Microsoft, “Benefits of OnNow,” http://msdn.microsoft.com/training/offers/ WINVBO_BLD /Topics/winvb00325.htm, 2000. [17] R. Min, M. Bhardwaj, S.-H. Cho, A. Sinha, E. Shih, A. Wang, and Anantha Chandrakassan, “An Architecture for a Power-aware Distributed Microsensor Node,” IEEE Workshop on Signal Processing Systems, 2000. [18] Mitsubishi Microcomputers, M32000D4BFP-80 Data Book, http://www.mitsubishichips.com/data/datasheets/mcus/ mcupdf/ds/e32r80.pdf.

11

[19] K. R. Moore, J. F. Wilkerson, D. Call, S. Johnson, T. Payne, W. Ford, K. Spencer, and C. Baumgart, “A Space-Based Classification System for RF Transients,” International Workshop on Artificial Intelligence in Solar-Terrestrial Physics, Lund, Sweden, 1993. [20] Q. Qiu and M. Pedram, “Dynamic Power Management Based on Continuous-Time Markov Decision Processes,” Design Automation Conference, June 1999. [21] Q. Qui, Q. Wu, and M. Pedram, “Dynamic Power Management of Complex Systems Using Generalized Stochastic Petri Nets,” Design Automation Conference, June 2000. [22] G. Qu and M Potkonjak, “Power Minimization using System Level Partitioning of Applications with Quality of Service Requirement,” International Conference on Computer Aided Design , 1999. [23] J. Pouwelse, K. Langendoen, and H. Sips, “Dynamic Voltage Scaling on a Low-Power Microprocessor,” UbiComTechnical Report, April 2000. [24] Y. Shin and K. Choi, “Power Conscious Fixed Priority Scheduling for Hard Real-Time Systems,” Design Automation Conference, 1999. [25] M. Srivastava, A. Chandrakasan, and R. Brodersen, “Predictive System Shutdown and Other Architectural Techniques for Energy Efficient Programmable Computation,” IEEE Transactions on VLSI System, Vol. 4, pp. 42-55, March 1996. [26] J. Suh, C. Li, S. P. Crago, and R. Parker, “A PIM-Based Multiprocessor System,” International Parallel nd Distributed Processing Symposium, San Francisco, CA, April 2001. [27] M. Weiser, B. Welch, A. Demers, and S. Shenker, “Scheduling for Reduced CPU Energy,” Symposium on Operating Systems Design and Implementation, 1994.

8. Acknowledgment The authors appreciate the help of Maya Gokhale and Kevin McCabe at Los Alamos National Laboratory in providing FORTE information.

9. Appendix Table 2 Initial power allocation computation Time (Sec) Iteration 1 Pinit Integration 2 Pinit Integration 3 Pinit Integration 4 Pinit Integration 5 Pinit Integration

0 4.8 9.614.419.2 2428.833.638.443.2 4852.8 1.891.210.320.321.212.03 1.91.210.320.321.212.03 0.471.623.655.696.847.165.274.063.733.41 2.20.17 2.362.011.121.122.012.36 1.91.210.320.321.212.03 00.351.582.813.163.161.270.06 -0.3 -0.6 -1.8 -3.8 2.262.011.121.122.012.36 1.91.210.320.321.212.03 0.10.451.682.913.263.261.370.16 -0.2 -0.5 -1.7 -3.7 2.262.241.351.352.242.36 1.91.210.320.321.212.03 0.10.221.232.242.362.360.46 -0.8 -1.1 -1.4 -2.6 -4.6 2.262.241.351.352.241.160.94 0.60.160.16 0.6 1 0.10.221.232.242.363.552.622.021.86 1.7 1.1 0.1

Table 3 Dynamic update of the power allocation (W) Used t (Sec) Pinit(t) Power 0.0 4.8 9.6 14.4 19.2 24.0 28.8 33.6 38.4 43.2 48.0

2.26 2.24 1.33 1.38 2.21 1.29 1.19 0.86 0.52 0.52 0.80

2.36 2.36 1.18 1.38 2.36 1.18 1.18 0.79 0.49 0.49 0.79

Supplied Charging Power 2.36 2.36 2.36 2.36 2.36 2.36 0.00 0.00 0.00 0.00 0.00

Pinit (0)

Pinit (1)

Pinit (2)

Pinit (3)

Pinit (4)

Pinit (5)

Pinit (6)

Pinit (7)

Pinit (8)

Pinit Pinit Pinit (9) (10) (11)

2.26 2.26 2.27 2.27 2.27 2.28 2.11 2.04 2.01 1.94 1.83

2.24 2.19 2.2 2.21 2.21 2.22 2.06 1.99 1.97 1.9 1.79

1.35 1.33 1.38 1.45 1.45 1.52 1.41 1.37 1.35 1.3 1.23

1.35 1.33 1.38 1.45 1.45 1.52 1.41 1.37 1.35 1.3 1.23

2.24 2.19 2.2 2.21 2.21 2.22 2.06 1.99 1.97 1.9 1.79

1.16 1.14 1.21 1.29 1.29 1.37 1.28 1.24 1.22 1.18 1.11

0.94 0.92 1 1.09 1.09 1.19 1.11 1.07 1.06 1.02 0.97

0.6 0.59 0.69 0.8 0.8 0.92 0.86 0.83 0.82 0.8 0.75

0.16 0.16 0.28 0.42 0.42 0.58 0.54 0.52 0.52 0.5 0.48

0.16 0.16 0.28 0.42 0.42 0.58 0.54 0.52 0.52 0.5 0.48

12

0.6 0.59 0.69 0.8 0.8 0.92 0.86 0.83 0.82 0.8 0.75

1 1 1.08 1.17 1.17 1.26 1.17 1.13 1.12 1.08 1.02

52.8 57.6 62.4 67.2 72.0 76.8 81.6 86.4 91.2 96.0 101.0 106.0 110.0

1.02 1.74 1.73 1.28 1.35 1.83 1.36 1.32 1.10 0.88 0.84 0.94 1.00

0.98 1.57 1.57 1.18 1.38 1.97 1.38 1.18 0.98 0.79 0.79 0.98 0.98

0.00 2.36 2.36 2.36 2.36 2.36 2.36 0.00 0.00 0.00 0.00 0.00 0.00

1.74 1.77 1.8 1.83 1.87 1.88 1.91 1.78 1.69 1.6 1.51 1.4 1.28

1.7 1.73 1.76 1.79 1.83 1.85 1.88 1.75 1.66 1.57 1.48 1.37 1.26

1.17 1.23 1.28 1.35 1.42 1.45 1.5 1.4 1.33 1.26 1.19 1.1 1.02

1.17 1.23 1.28 1.35 1.42 1.45 1.5 1.4 1.33 1.26 1.19 1.1 1.02

1.7 1.73 1.76 1.79 1.83 1.85 1.88 1.75 1.66 1.57 1.48 1.37 1.26

1.06 1.12 1.18 1.25 1.33 1.36 1.42 1.32 1.26 1.19 1.13 1.05 0.96

0.92 0.99 1.06 1.14 1.22 1.26 1.32 1.23 1.17 1.11 1.05 0.98 0.9

0.72 0.8 0.88 0.96 1.06 1.1 1.18 1.1 1.04 0.99 0.94 0.87 0.8

0.46 0.55 0.64 0.74 0.85 0.9 0.99 0.92 0.88 0.84 0.79 0.74 0.68

0.46 0.55 0.64 0.74 0.85 0.9 0.99 0.92 0.88 0.84 0.79 0.74 0.68

0.72 0.8 0.88 0.96 1.06 1.1 1.18 1.1 1.04 0.99 0.94 0.87 0.8

0.97 1.04 1.11 1.18 1.26 1.29 1.36 1.27 1.2 1.14 1.08 1 0.92

Table 4 Initial power allocation computation Time (Sec) Iteration 1 Pinit Integration 2 Pinit Integration 3 Pinit Integration 4 Pinit Integration 5 Pinit Integration

0 0.59 2.65 2.43 0.81 2.43 0.81 2.43 0.81 2.43 0.81

4.8 0.88 5.31 2.73 1.62 2.73 1.62 2.73 1.62 2.73 1.62

9.6 0.88 7.96 2.73 2.43 2.73 2.43 2.73 2.43 2.73 2.43

14.4 0.59 10.9 2.43 3.54 2.43 3.54 2.43 3.54 2.43 3.54

19.2 3.54 8.25 3.54 0.88 1.81 2.61 1.53 2.9 1.53 2.9

24 3.54 4.72 3.54 -2.7 1.81 0.79 1.53 1.37 1.53 1.37

28.8 33.6 38.4 43.2 48 52.8 2.95 0 0.59 1.77 2.95 2.36 1.77 1.77 2.06 1.18 0 0 2.95 0 0.59 1.77 2.95 2.36 -5.6 -5.6 -5.3 -6.2 -7.4 -7.4 1.51 0 0.3 0.91 1.51 2.36 -0.7 -0.7 -0.1 -0.2 0.1 0.1 1.27 0 0.3 0.91 1.51 2.36 0.1 0.1 0.68 0.66 0.91 0.91 1.27 0 0.3 0.91 1.51 2.36 0.1 0.1 0.68 0.66 0.91 0.91

Table 5 Dynamic update of the power allocation (W) Used t (Sec) Pinit(t) Power 0.0 4.8 9.6 14.4 19.2 24.0 28.8 33.6 38.4 43.2 48.0 52.8 57.6 62.4 67.2 72.0 76.8 81.6 86.4 91.2 96.0 101.0 106.0 110.0

2.43 2.93 3.04 2.43 1.53 1.50 1.17 0.01 0.31 0.90 1.39 2.08 2.39 2.68 2.78 2.30 1.75 1.72 1.44 0.62 0.78 1.13 1.45 1.93

2.36 2.95 2.95 2.36 1.57 1.38 1.18 0.00 0.29 0.79 1.38 2.06 2.36 2.75 2.75 2.36 1.57 1.57 1.38 0.59 0.79 0.98 1.38 1.97

Supplied Charging Power 3.24 3.54 3.54 3.54 0.88 0.00 0.00 0.00 0.88 0.88 1.77 2.36 3.24 3.54 3.54 3.54 0.88 0.00 0.00 0.00 0.88 0.88 1.77 2.36

Pinit (0)

Pinit (1)

Pinit (2)

Pinit (3)

Pinit (4)

Pinit (5)

Pinit (6)

Pinit (7)

Pinit (8)

Pinit Pinit Pinit (9) (10) (11)

2.72 2.72 2.72 2.72 2.66 2.48 2.32 2.33 2.37 2.36 2.36 2.39 2.44 2.47 2.5 2.55 2.51 2.33 2.17 2.13 2.13 2.11 2.12 2.14

2.93 3.04 3.04 3.04 2.98 2.77 2.59 2.6 2.63 2.62 2.62 2.64 2.68 2.71 2.73 2.77 2.72 2.53 2.35 2.31 2.31 2.29 2.3 2.31

2.93 3.04 3.13 3.13 3.07 2.85 2.67 2.68 2.71 2.69 2.7 2.72 2.75 2.78 2.79 2.83 2.78 2.58 2.41 2.36 2.36 2.34 2.35 2.36

2.43 2.43 2.43 2.43 2.38 2.22 2.08 2.09 2.14 2.13 2.13 2.17 2.22 2.27 2.3 2.36 2.32 2.15 2.01 1.97 1.97 1.95 1.97 1.99

1.53 1.53 1.53 1.53 1.5 1.4 1.31 1.33 1.4 1.4 1.4 1.46 1.54 1.61 1.65 1.75 1.72 1.6 1.49 1.46 1.47 1.46 1.47 1.5

1.53 1.53 1.53 1.53 1.5 1.4 1.31 1.33 1.4 1.4 1.4 1.46 1.54 1.61 1.65 1.75 1.72 1.6 1.49 1.46 1.47 1.46 1.47 1.5

1.27 1.27 1.27 1.27 1.25 1.17 1.09 1.12 1.2 1.19 1.2 1.25 1.35 1.42 1.47 1.58 1.55 1.44 1.35 1.32 1.33 1.32 1.33 1.37

0 0 0 0 0 0.01 0.01 0.05 0.16 0.16 0.18 0.25 0.39 0.5 0.56 0.72 0.71 0.66 0.62 0.61 0.62 0.62 0.64 0.69

0.3 0.3 0.3 0.3 0.3 0.28 0.27 0.31 0.41 0.41 0.42 0.49 0.62 0.72 0.78 0.92 0.91 0.85 0.8 0.78 0.79 0.78 0.8 0.85

0.91 0.91 0.91 0.91 0.89 0.83 0.78 0.81 0.9 0.9 0.91 0.97 1.07 1.16 1.21 1.33 1.31 1.22 1.14 1.12 1.13 1.12 1.13 1.17

13

1.51 1.51 1.51 1.51 1.48 1.38 1.3 1.32 1.39 1.39 1.39 1.44 1.53 1.6 1.64 1.74 1.71 1.59 1.49 1.46 1.46 1.45 1.46 1.5

2.36 2.36 2.36 2.36 2.31 2.15 2.02 2.03 2.08 2.07 2.08 2.11 2.17 2.21 2.24 2.31 2.27 2.11 1.97 1.93 1.93 1.91 1.93 1.95

Lihat lebih banyak...

Dynamic power management of multiprocessor systems

Descrição do Produto

Comentários