Logarithmic Arithmetic for Low-Power Adaptive Control Systems

Circuits, Systems, and Signal Processing. DOI 10.1007/s00034-016-0486-1

Uroš Lotrič · Patricio Bulić

Received: 5 April 2016 / Revised: 21 December 2016 / Accepted: 23 December 2016 © Springer Science+Business Media New York 2016

Abstract To reduce the power dissipation in adaptive control systems, we propose replacing the exact arithmetic hardware units with approximate ones. As a case study, an adaptive control system for object tracking based on the Kalman filter is implemented in an FPGA. A thorough analysis of the Kalman filter's circuitry on real-world object tracks acquired by an aviation radar system showed that adaptive control systems can successfully compensate for the calculation errors introduced by the approximate arithmetic units. The main contributions of this paper are that the introduction of the approximate arithmetic circuits to the adaptive control system (1) preserves the required accuracy and (2) significantly reduces the power dissipation and the size of the adaptive system's circuitry.

Keywords Kalman filter · Adaptive control systems · Approximate arithmetic · FPGA · Low-power design

1 Introduction

Adaptive control is an active field in the design of control systems that deal with uncertainties. The key advantage of an adaptive controller is its ability to adjust itself to cope with unknown model uncertainties. Recently, much effort has been focused on adaptive control, in both theory and applications.

Patricio Bulić, [email protected] · Uroš Lotrič, [email protected]

Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000 Ljubljana, Slovenia


The robustness issues associated with adaptive control under non-ideal conditions, such as external disturbances and unmodelled dynamics, have been addressed, resulting in many different robust adaptive control algorithms [20,25–27]. Over the years, there has been an ever-increasing trend towards using adaptive systems in a variety of applications, from signal processing to artificial intelligence and control [7,17,18,33,40,44,47,48,50]. For example, in [15] model identification and adaptive control design were performed on the Denavit–Hartenberg model of a humanoid robot. Adaptive control systems used in autonomous vehicles have also received great attention [5,23,34] in recent years.

With the ever-increasing need for omnipresence and miniaturization, the cost of control systems is becoming very important. The authors of [9] aimed to solve the optimal tracking control problem. They used a fuzzy logic system to approximate the long-term utility function, and the Lyapunov analysis approach was utilized to prove the stability of the chaotic system. In such a way, they managed to reduce the cost of the controller. Furthermore, in [24], an adaptive control technique based on Lyapunov functions was developed for a class of uncertain nonlinear parametric systems.

No matter whether adaptive systems are integrated in high-performance devices or in mobile devices, an important aspect addressed in their design is to gain as much computational power as possible while keeping the power consumption and the device size as small as possible. It is very common to implement adaptive algorithms on general-purpose computers. However, it is easier to achieve the aforementioned goals by implementing the algorithms in hardware, using designs tailored to the specific processing needs of the target application [32].

Object-tracking systems require a large number of complex arithmetic operations, which result in significant power dissipation. In applications where power dissipation is extremely important, approximate solutions to multiplication and other arithmetic operations can be used if their introduction does not critically affect the application's performance. The most common approaches include the adaptation of general equations to application specifics, linearization and fixed-point computation. Considering the cost and implementation complexity, fixed-point solutions are generally more efficient than their floating-point counterparts [22]. Moreover, the exact and complex fixed-point units can be replaced by approximate and simpler ones [3,13,30,31]. An appropriate selection of the number system can also reduce the power dissipation, because it can reduce the number of operations, the strength of the operators and the activity of the data [35,36,42]. The logarithmic number system has recently attracted the interest of researchers because of its low power consumption [21,35,36,42], which arises from the simplification of the arithmetic operations.

This paper focuses on the use of the logarithmic number system for the implementation of intensively used arithmetic operations in adaptive control systems. Since adaptive algorithms can adjust to changes in the environment, it is anticipated that they will also be able to compensate for the computational errors produced internally by the approximate arithmetic units. The study [45] proposed a small and energy-efficient implementation of embedded, real-time model predictive control that uses a 16-bit logarithmic number system arithmetic unit.
Research in the field of machine learning [28] showed that the replacement of exact multipliers by approximate counterparts in highly adaptive neural network models does not have any noticeable impact on their processing and learning accuracy. Similar behavior is expected in the more rigorous adaptive object-tracking applications.

The Kalman filter is one of the most common adaptive algorithms for object-tracking applications. It is able to iteratively update and estimate the underlying system state given a series of inaccurate and uncertain measurements. It can grasp the dynamics of a given system solely by observing its location over time. The equations governing the Kalman filter include many matrix multiplications and reciprocals, which can be efficiently processed in parallel, thus reducing the processing time [37,38,41,46,49]. As the number of concurrently executed arithmetic operations increases, any reduction in the power dissipation as well as in the usage of the hardware resources should be even more pronounced.

The rest of this article is organized as follows: Sect. 2 presents the Kalman filter model for object tracking, and Sect. 3 introduces the basic approximate arithmetic units. Section 4 presents the design of the Kalman filter circuitry, while Sect. 5, with its comprehensive theoretical and experimental analyses, gives an overview of the parameter settings of the Kalman filter circuitry. The paper concludes with the main findings on the applicability of approximate logarithmic arithmetic in adaptive control systems.

2 The Kalman Filter

The discrete Kalman filter [14,39,51,52] is a powerful linear estimator. It can be used to estimate stationary and non-stationary processes that are described by the linear state transition model

x_{k+1} = F x_k + w_k,    (1)
z_k = H x_k + v_k,    (2)

where x_k is the state of the process at a discrete time k and F is the transition matrix. The process can be measured through the observable z_k, obtained from the state of the process by the measurement matrix H. The process noise w_k and the measurement noise v_k are considered to be independent, zero-mean and Gaussian, with covariance matrices Q and R, respectively.

The Kalman filter recursively estimates the state of the process from the previous state estimate and the new measurement. The estimation process can be described with the equations of the predicted state estimate and the predicted estimate covariance

\hat{x}_k^- = F \hat{x}_{k-1},    (3)
P_k^- = F P_{k-1} F^T + Q,    (4)

the optimal Kalman gain equation

G_k = P_k^- H^T (H P_k^- H^T + R)^{-1},    (5)

and the equations for the updated state estimate and the updated estimate covariance


\hat{x}_k = \hat{x}_k^- + G_k (z_k - H \hat{x}_k^-),    (6)
P_k = P_k^- - G_k H P_k^-.    (7)

The predicted and the updated values of the observable can be computed, respectively, as

\hat{z}_k^- = H \hat{x}_k^-  and  \hat{z}_k = H \hat{x}_k.    (8)
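Equations (3)-(8) map directly onto a few lines of code. The following NumPy sketch (ours, in floating point and for illustration only; the circuitry described later works in fixed point) performs one predict-update cycle, with variable names mirroring the symbols above.

```python
import numpy as np

def kalman_step(x_prev, P_prev, z, F, H, Q, R):
    """One iteration of the discrete Kalman filter, Eqs. (3)-(8)."""
    # prediction, Eqs. (3) and (4)
    x_pred = F @ x_prev
    P_pred = F @ P_prev @ F.T + Q
    # optimal Kalman gain, Eq. (5)
    G = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    # update, Eqs. (6) and (7)
    x_upd = x_pred + G @ (z - H @ x_pred)
    P_upd = P_pred - G @ H @ P_pred
    # predicted and updated observables, Eq. (8)
    return x_upd, P_upd, H @ x_pred, H @ x_upd
```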

The Kalman filter has been successfully applied in a series of applications from the field of navigation, trajectory estimations of mobile robots, the positions of vehicles and the localization of mobile devices [8,11,43]. The proposed implementation uses the Kalman filter to refine the position of the object recorded by a two-dimensional radar. A very similar form of matrices is also obtained for the improved localization of a mobile device based on the distance to the mobile network's base stations [8].

Most radar systems give the object position in terms of the range r(t) and the bearing or angle θ(t). In the constant-velocity model [10], the range rate v(t) and angle rate ω(t) should be constant. However, in the real world both rates will undergo some perturbations, i.e., dv(t)/dt = a(t) and dω(t)/dt = α(t), where a(t) and α(t) are the radial and angular accelerations. Under the assumption that both perturbations are non-additive, zero-mean and Gaussian, E[a(t)] = E[α(t)] = 0, E[a(t)a(t')] = a^2 δ(t − t'), and E[α(t)α(t')] = α^2 δ(t − t'), the movement of the object is described as

\frac{\partial x(t)}{\partial t} = A x(t) + B(t),    (9)

with the state vector x(t) = [r(t), θ(t), v(t), ω(t)]^T and the matrices

A = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},
B(t) = \begin{bmatrix} 0 \\ 0 \\ a(t) \\ \alpha(t) \end{bmatrix}.    (10)

When the system is observed at regular sampling instances x_k = x(k \Delta t), where k ∈ N_0 and \Delta t is the sampling period, we can express (9) in terms of the state transition model given by (1). The integration of (9) over one sampling period [10] gives the transition matrix and the process-noise covariance matrix

F = \begin{bmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},
Q = \begin{bmatrix} a^2 \Delta t^3/3 & 0 & a^2 \Delta t^2/2 & 0 \\ 0 & \alpha^2 \Delta t^3/3 & 0 & \alpha^2 \Delta t^2/2 \\ a^2 \Delta t^2/2 & 0 & a^2 \Delta t & 0 \\ 0 & \alpha^2 \Delta t^2/2 & 0 & \alpha^2 \Delta t \end{bmatrix}.    (11)

Usually, radar systems only provide measurements of an object's position in terms of independent range and angle. Thus, the measurement matrix and the measurement covariance matrix equal

H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix},
R = \begin{bmatrix} \sigma_r^2 & 0 \\ 0 & \sigma_\theta^2 \end{bmatrix},    (12)

where σ_r^2 and σ_θ^2 are the measurement variances of the range and angle, respectively. To initialize the Kalman filter, we set P_0 = Q and derive the range and angular rates of the initial state vector \hat{x}_0 from the first two measurements, v(0) = (r(\Delta t) − r(0))/\Delta t and ω(0) = (θ(\Delta t) − θ(0))/\Delta t, respectively.
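To make (11) and (12) concrete, the sketch below (ours, floating point) builds the model matrices and an initial state. The function names and the choice of taking the second measurement as the initial position are our assumptions; the text only specifies how the rates and P_0 = Q are derived.

```python
import numpy as np

def radar_model(dt, a2, alpha2, sigma_r2, sigma_theta2):
    """Constant-velocity radar model: F and Q of Eq. (11), H and R of Eq. (12)."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    Q = np.array([[a2 * dt**3 / 3, 0,                  a2 * dt**2 / 2, 0],
                  [0,              alpha2 * dt**3 / 3, 0,              alpha2 * dt**2 / 2],
                  [a2 * dt**2 / 2, 0,                  a2 * dt,        0],
                  [0,              alpha2 * dt**2 / 2, 0,              alpha2 * dt]])
    H = np.array([[1., 0., 0., 0.],
                  [0., 1., 0., 0.]])
    R = np.diag([sigma_r2, sigma_theta2])
    return F, Q, H, R

def initial_state(z0, z1, dt):
    """Initial state estimate; the rates come from the first two measurements."""
    v0 = (z1[0] - z0[0]) / dt            # range rate v(0)
    w0 = (z1[1] - z0[1]) / dt            # angle rate omega(0)
    return np.array([z1[0], z1[1], v0, w0])   # assumption: start from the second measurement
```

Setting P_0 = Q, as stated above, completes the initialization.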

3 Approximate Arithmetic Units

As multiplication and reciprocal are complex operations that are heavily used in the Kalman filter, we propose a simplification of these operations with approximate logarithmic methods. One of the oldest methods to approximate multiplication and division was suggested by Mitchell [30]. Later, several methods for improving the accuracy of Mitchell's method were proposed in the literature [1,2,4,13,29]. Recently, researchers suggested an iterative logarithmic multiplier [4] that iteratively refines a result. Similarly, the reciprocal can be obtained iteratively by using multiplications.

3.1 Iterative Logarithmic Multiplier

Babić et al. [4] proposed a low-power iterative logarithmic multiplier (ILM) for product approximation. It has the ability to refine a result to an arbitrary precision. The basic idea is as follows. The non-negative integer numbers M_j and N_j, j ∈ N, can be written as

M_j = 2^{l(M_j)} + M_{j+1}  and  N_j = 2^{l(N_j)} + N_{j+1},    (13)

where the function l(x) = \lfloor \log_2 x \rfloor gives the position of the leading one in the binary representation of a number, while the numbers M_{j+1} and N_{j+1} represent the binary numbers M_j and N_j without the leading ones, respectively. Using the above representation, we can write the product of the numbers M_j and N_j as

P_j = M_j N_j = C_j + P_{j+1},    (14)

where

C_j = 2^{l(M_j)+l(N_j)} + M_{j+1} 2^{l(N_j)} + N_{j+1} 2^{l(M_j)}.    (15)

It is important to note that the hardware implementation of the term C_j requires only a few shift and add operations, while the implementation of the term P_{j+1} = M_{j+1} N_{j+1} requires an exact multiplication. Discarding the term P_{j+1} from (14) gives an approximate product, which can underestimate the exact product by at most 25%. However, by repeating the above procedure on the term P_{j+1}, we can approximate the product to an arbitrary precision without using the exact multiplier. By repeating the above procedure p times, the approximation of the product P_1 of the numbers M_1 and N_1 becomes


P_1^{(p)} = \sum_{j=1}^{p} C_j

and introduces an error equal to P_1 − P_1^{(p)} = P_{p+1}. Babić et al. [4] showed that in the worst-case scenario the relative error P_{p+1}/P_1 decays exponentially, with a rate of 2^{-2} per pass.

Our implementation of the iterative logarithmic multiplier is presented in Fig. 1a. The control unit orchestrates the operation of the iterative logarithmic multiplier and determines the number of passes p through the ILM unit. In the j-th pass, the ILM unit calculates the term C_j according to (15). It starts by calculating the two characteristic numbers, l(M_j) and l(N_j), and the two residues, M_{j+1} and N_{j+1}, using the leading-one detectors (LOD) [1], the simple encoders and the exclusive-or gates, respectively. Then it uses an adder and two barrel shifters (SHL) to obtain the terms l(M_j) + l(N_j), M_{j+1} 2^{l(N_j)} and N_{j+1} 2^{l(M_j)}. Next it calculates 2^{l(M_j)+l(N_j)} and M_{j+1} 2^{l(N_j)} + N_{j+1} 2^{l(M_j)} and adds them to obtain the term C_j. The unit also checks whether M_j or N_j equals zero in any of the passes. In this case, the unit immediately sets C_j to zero and suppresses any further processing.
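A bit-level software model of the ILM, written by us under the assumption of non-negative integer operands, is given below; Python's bit_length() plays the role of the leading-one detector.

```python
def ilm_multiply(m, n, passes=2):
    """Approximate product of non-negative integers by the iterative
    logarithmic multiplier, Eqs. (13)-(15), with p = passes."""
    product = 0
    for _ in range(passes):
        if m == 0 or n == 0:                       # a zero operand stops further passes
            break
        lm, ln = m.bit_length() - 1, n.bit_length() - 1       # l(M_j), l(N_j)
        m, n = m - (1 << lm), n - (1 << ln)                   # residues M_{j+1}, N_{j+1}
        product += (1 << (lm + ln)) + (m << ln) + (n << lm)   # C_j of Eq. (15)
    return product
```

For example, ilm_multiply(6, 7, passes=1) returns 36, an underestimate of the exact product 42 by about 14%; a second pass recovers 42 exactly, because the remaining error term P_3 is zero.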

3.2 Reciprocal Unit Based on Iterative Logarithmic Multiplier

There are many digital schemes for fixed-point division and reciprocal [6,12,13,16,19,30]. To decrease the power dissipation, we propose a unit that calculates the reciprocal of a fixed-point number using the Newton–Raphson algorithm and the iterative logarithmic multiplier. The unit depicted in Fig. 1b is designed to perform an arbitrary number of iterations and to use any size of the initial approximation table.

Due to its quadratic convergence, the Newton–Raphson method is commonly used to find the root of a function. By applying it to the function f(X) = X^{-1} − N, with the root at the reciprocal of the number N, the Newton–Raphson iteration equation becomes

X_{j+1} = X_j − \frac{f(X_j)}{f'(X_j)} = (2 − N X_j) X_j,    (16)

where X_j is the current and X_{j+1} is the next estimate of the root.

The iteration part of the proposed reciprocal unit, depicted in Fig. 1b, assumes that the number N is represented by b bits as 1.n_{b−2} ... n_1 n_0. In such a way, the number N is confined to the interval [1, 2) and its reciprocal approximation X_j to the interval (0.5, 1]. The representation of the reciprocal approximation X_j is of the form 0.x_{b−1} ... x_1 x_0. In the first phase of the iteration, an approximation of the reciprocal is multiplied by the number N, resulting in the product P = N X_j, represented as p_{2b−1}.p_{2b−2} ... p_0. We use only the b most significant bits of the product. The term (2 − N X_j) can be replaced with the one's complement of the b most significant bits if we approximate the constant 2 by b bits as 1.1...1. This approximation introduces a negligible error of 2^{-(b-1)}. The result is then saved to a b-bit register. In the second phase, the multiplication of the one's complement with X_j yields the new approximation of the reciprocal X_{j+1} = (2 − N X_j) X_j, represented by the b most significant fractional bits. The approximation of the reciprocal can be further improved by repeating both phases, where in the first phase the initial approximation is replaced by the new approximation X_{j+1}.


Fig. 1 Block diagrams of basic circuits. a Iterative logarithmic multiplier, b reciprocal unit

After q iterations, the unit outputs the final approximation X_q. In most cases, the implementations use two iterations.

The initial approximation X_0 is obtained from the initial approximation table. A common method when designing the initial approximation table is to implement a piecewise-constant approximation of the reciprocal function. The table comprises 2^s entries of size b bits, where s ∈ N_0. Each entry i ∈ [0, 2^s − 1] in the table is obtained from the corresponding interval midpoint V_i = 1 + i 2^{-s} + 2^{-(s+1)} as T_i = \lfloor V_i^{-1} 2^{b+1} + 2^{-1} \rfloor / 2^{b+1}. The method takes the table entry T_i, with index i = \sum_{j=0}^{s-1} n_{b-s+j-1} 2^j, as the initial approximation of the reciprocal.

More generally, we want to obtain the reciprocal of the number M represented with one integer bit and b − 1 fractional bits in the fixed-point notation, i.e., its format is m_{b−1}.m_{b−2} ... m_0. The number M must be initially scaled to the interval [1, 2) by shifting it left by b − 1 − l(M) bits. The shift-left operation is accomplished with the barrel shifter (SHL), while l(M) is obtained using a leading-one detector (LOD) and a simple encoder. The shifted number N = M 2^{b−1−l(M)} enters the iteration part of the reciprocal unit. The approximation of the reciprocal of M can be expressed as M^{-1} ≈ X_q 2^{b−1−l(M)}. Thus, M^{-1} is obtained by shifting the approximation left by b − 1 − l(M) bits. If we use the reciprocal unit to implement division, we obtain a higher precision when we postpone the shift until after the multiplication of the reciprocal of the divisor and the dividend. Therefore, the reciprocal unit also outputs the value l(M).
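The following sketch (ours) mimics the fixed-point data path just described: scaling into [1, 2), a piecewise-constant initial approximation, and q Newton–Raphson iterations per Eq. (16). The rounding of the table entries and the exact bit-level behaviour are simplifications of the hardware; the multiplier defaults to an exact product, but the ilm_multiply sketch from Sect. 3.1 can be passed in instead.

```python
def reciprocal_fixed(M, b, s=2, q=2, mul=lambda x, y: x * y):
    """Approximate reciprocal of M, a positive fixed-point number with one
    integer and b-1 fractional bits; returns (X_q, shift), 1/M ~ X_q * 2**(shift - b)."""
    lM = M.bit_length() - 1
    N = M << (b - 1 - lM)                       # scale into [1, 2): N = M * 2^(b-1-l(M))
    idx = (N >> (b - 1 - s)) & ((1 << s) - 1)   # the s bits right after the leading one
    V = 1 + idx * 2.0 ** -s + 2.0 ** -(s + 1)   # interval midpoint V_i
    X = min(round((1 << b) / V), (1 << b) - 1)  # X_0 ~ 1/V_i with b fractional bits
    for _ in range(q):
        P = mul(N, X) >> b                      # b MSBs of N*X_j (1 integer, b-1 fractional bits)
        comp = ((1 << b) - 1) ^ P               # one's complement, approximates 2 - N*X_j
        X = mul(comp, X) >> (b - 1)             # X_{j+1} = (2 - N*X_j) * X_j
    return X, b - 1 - lM                        # shift needed to de-scale the result
```

For instance, with the default exact multiplier, reciprocal_fixed(192, b=8) treats 192 as 1.5 and returns X ≈ 170, i.e., about 0.664 against the exact reciprocal 0.667; plugging in mul=lambda x, y: ilm_multiply(x, y, passes=2) models the approximate unit.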


Fig. 2 Mean relative error of the reciprocal unit, averaged over all possible operands, as a function of the size s of the initial approximation table, for the configurations p = 2, q = 1; p = 2, q = 2; and p = 0, q = 0

Figure 2 shows the relative error of the reciprocal unit, averaged over all possible operands, for different parameter settings. The setting p = 0, q = 0 denotes the case in which the reciprocals are obtained solely from the initial approximation table, without the iterative logarithmic multiplier and the Newton–Raphson iterations. As expected, the error caused by the approximate multiplication limits the convergence. For example, when the iteration part of the reciprocal unit uses p = 2 passes through the ILM unit, increasing the size of the initial approximation table beyond s = 9 does not improve the result. Furthermore, for s ≥ 5 the iteration part of the unit does not improve the initial approximation.

4 Hardware Implementation of the Kalman Filter

The Kalman filter equations comprise a large number of matrix multiplications and a matrix inverse. In our model, the most demanding are the computations over matrices of size 4 × 4, which appear in (4) and (7). To make these computations efficient, we propose an appropriate operand representation and two special-purpose hardware units, a dot-product unit and a multiply-and-divide unit. The dot-product unit exploits the fact that any matrix multiplication can be decomposed into a series of independent dot products. In our application, the matrices and vectors are rather small, so we integrated four dot-product units into the larger multiply-and-divide (MAD) unit, which is capable of computing four dot products in parallel. The MAD unit also includes two reciprocal units that serve in the computation of the inverse.

4.1 Operand Representation and its Impact on the Circuitry

All the operands in our design of the Kalman filter use the same fixed-point representation. As the computation of the Kalman filter involves a large number of multiplications, we decided to use the sign-and-magnitude representation, with one bit used for the sign (s_a) and b bits used for the magnitude. The magnitude is specified as a fixed-point number with f fractional bits and b − f integer bits. Thus, the bit-level representation of an arbitrary operand A equals s_a a_{b−1} ... a_f . a_{f−1} ... a_0.


To multiply two such operands, the iterative logarithmic multiplier can be used. The sign bits of the operands do not enter the iterative logarithmic multiplier, but are guided only to the exclusive-or logic gate that outputs the sign bit of the product. Only the magnitude bits enter the iterative logarithmic multiplier, which outputs the magnitude of their product, represented by 2(b − f) integer bits and 2f fractional bits. To obtain the product in the required fixed-point representation, we must keep the middle bits: the b − f least significant integer bits and the f most significant fractional bits.

A similar situation occurs with the reciprocal unit. The sign of the reciprocal is the same as the sign of the operand, so only the magnitude bits enter the reciprocal unit. The reciprocal unit requires operands with b − 1 fractional bits. Therefore, an arbitrary operand B with f fractional bits must be scaled to M = B 2^{-(b-1-f)}. As B^{-1} = M^{-1} 2^{-(b-1-f)} ≈ X_q 2^{f−l(M)}, we shift the output of the reciprocal unit X_q to obtain the reciprocal in the required representation. The quotient is obtained by multiplication of the dividend A and the reciprocal of the divisor B. To obtain the quotient with the maximum accuracy, we first multiply the dividend by X_q. The result A X_q has b − f integer bits and b + f fractional bits. To obtain the quotient in the required representation, we need to shift the result right by b − f + l(M) bits and take the b least significant bits.

The sign-and-magnitude representation of the operands is appropriate for the multiplication and the computation of the reciprocal, but not for the summation of operands with opposite signs. To obtain the exact sum, we should convert the operands to the two's complement representation, add them, and convert the sum back to the sign-and-magnitude representation. To simplify the process, we convert the operands to the one's complement and use the two's complement addition. In this way, we introduce an error at the least significant bit. The same idea is appropriate for the subtraction of equal-sign operands.
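A behavioural sketch of this operand handling (ours; overflow and the exact rounding behaviour of the hardware are ignored, and the bit widths are example values) could look as follows.

```python
B_MAG, F_FRAC = 24, 16                 # magnitude bits b and fractional bits f (example values)
MASK = (1 << B_MAG) - 1

def fx_mul(sa, a, sb, b, mul=lambda x, y: x * y):
    """Sign-and-magnitude fixed-point product: XOR the signs, multiply the
    magnitudes (exactly here, or with the ILM), keep the middle bits."""
    prod = mul(a, b)                           # 2(b-f) integer and 2f fractional bits
    return sa ^ sb, (prod >> F_FRAC) & MASK    # back to b-f integer and f fractional bits

def fx_add(sa, a, sb, b):
    """Approximate signed addition as in the proposed AS unit: operands are taken
    to one's complement and added with an ordinary adder, which may be off by one LSB."""
    x = a if sa == 0 else (~a & MASK)
    y = b if sb == 0 else (~b & MASK)
    s = (x + y) & MASK
    # convert back to sign and magnitude (overflow is ignored in this sketch)
    return (0, s) if s < (1 << (B_MAG - 1)) else (1, ~s & MASK)
```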

4.2 The Kalman Filter Circuitry

The dot-product (DP) unit consists of four iterative logarithmic multipliers and three adders connected in a tree-like structure, as shown in Fig. 3a. The adders take care of the conversion between both operand representations. The DP unit accepts two vectors A = [A_1, ..., A_4] and B = [B_1, ..., B_4] with four elements each and outputs their dot product DP = \sum_{j=1}^{4} A_j B_j. One of the matrix multiplications in (7) involves a dot-product computation of vectors with two elements. For their efficient calculation, the DP unit also outputs the partial sums PS = [PS_1, PS_2]. It also outputs the vector of products Q = [Q_1, ..., Q_4], properly scaled for use after the multiplication with the reciprocal.

The multiply-and-divide (MAD) unit consists of four DP units and is capable of computing four dot products at a time. It contains two reciprocal units that support the calculation of the Kalman gain matrix G_k. The form of the matrices H and R in (5) ensures that the matrix H P_k^- H^T + R is a 2 × 2 diagonal matrix. Thus, we can easily obtain the Kalman gain matrix G_k by taking the reciprocals of both diagonal elements and multiplying the elements of the matrix P_k^- H^T by them.
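The two units can be modelled in a few lines (ours, floating point, for illustration only): the DP unit as four products reduced by an adder tree, and the Kalman gain computed with two reciprocals instead of a general 2 × 2 matrix inverse.

```python
import numpy as np

def dp_unit(A, B):
    """Behavioural model of the DP unit: four products, partial sums and their total."""
    Q = [a * b for a, b in zip(A, B)]          # outputs of the four multipliers
    PS = [Q[0] + Q[1], Q[2] + Q[3]]            # partial sums for 2-element dot products
    return PS[0] + PS[1], PS, Q

def kalman_gain(P_pred, H, R):
    """Kalman gain of Eq. (5) exploiting the 2x2 diagonal form of H P- H^T + R:
    two reciprocals and column scaling replace a general matrix inverse."""
    S = H @ P_pred @ H.T + R                            # diagonal by construction (Sect. 4.2)
    recip = np.array([1.0 / S[0, 0], 1.0 / S[1, 1]])    # the two reciprocal units
    return (P_pred @ H.T) * recip                       # scale the columns of P- H^T
```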


Fig. 3 Block diagrams of the proposed hardware units. Where not specifically marked, the line width is b + 1 bits. a Dot-product unit, b multiply-and-divide unit

Although the MAD unit can compute four dot products in a single step, the multiplication of large matrices requires several invocations of the MAD unit. Some processing steps utilize only half of the resources in the MAD unit (two DP units out of four); even so, the average utilization of the MAD unit is more than 90% per step. We could have implemented a fully utilized MAD unit with only two DP units, but it would require a larger control unit and a longer computation time. Our experimental results showed that such a unit dissipates only 78% of the power, but requires 80% more clock cycles, resulting in a more than 40% higher energy consumption. This confirms the appropriateness of the proposed design of the MAD unit.

The Kalman filter circuitry depicted in Fig. 4 consists of the MAD unit, the addition/subtraction (AS) unit, the random access memory (RAM), the read-only memory (ROM), the multiplexers and the control unit. The MAD unit and the AS unit perform all the arithmetic calculations. The RAM, with a size of 16 vectors of four operands, is directly connected to the data bus and functions primarily as storage for the intermediate results. The ROM, connected through a multiplexer to the AS unit, stores the constants involved in the computation. The control unit orchestrates the operation of the Kalman filter circuitry. By switching the multiplexers, it selects the appropriate operands from the data bus, routes them to the MAD and other units, chooses the type of operation in the units and stores the results in the RAM.


Fig. 4 Kalman filter circuitry. Where not specifically marked, the data width is 4 × (b + 1) bits

5 Experimental Work

In this section, we present the methodology used to design the proper circuitry. In the first subsection, we show how we normalized the data to simplify the design of the circuitry and how we estimated the measurement uncertainty. In the second subsection, based on the obtained constraints, we search for the optimal Kalman filter circuitry parameters. The analysis is twofold: from Student's T-statistics, we first devise the minimum requirements, which we later tighten by observing the maximum error between the updated measurement values. The analysis is broad enough to show the influence of particular parameters on the behavior of the circuitry. The third subsection shows the effect of the operand size on the hardware-related performance aspects. Finally, the last subsection gives an integral analysis of the circuit's performance in terms of accuracy, energy consumption and circuitry size.

5.1 Experimental Data

Slovenia Control, Ltd., provided live tracks from one of its radar stations. The radar outputs a binary stream in the ASTERIX (All purpose STructured Eurocontrol suRveillance Information eXchange) format. From the radar stream, we extracted eight data sets, each with a few hundred range and bearing measurements. Selected data sets are depicted in Fig. 5. The radar has a maximum range of 370 km and a scan time \Delta t = 4 s. The accuracy of its range sensor is 160 m, and the accuracy of its bearing sensor is 0.1°, i.e., 1.75 × 10^{-3} rad.


Fig. 5 Object trajectories in selected experimental data sets (data sets 2, 5, 7 and 8); x- and y-positions are given in km, and diamonds indicate starting positions

The typical range of the observed objects in the eight data sets is about r_t = 100 km, and their maximum acceleration is a_max = 2 × 9.8 m s^{-2}. From the above values, we calculated the measurement-noise variances of the matrix R (12) as σ_r^2 = 2.56 × 10^4 m^2 and σ_θ^2 = 3.06 × 10^{-6} rad^2. We estimated the process-noise variances of the matrix Q (11) as a^2 = a_max^2/3 = 128 m^2 s^{-4} and α^2 = (a_max/r_t)^2/3 = 1.28 × 10^{-8} rad^2 s^{-4}.

In order to use the same fixed-point representation throughout the whole Kalman filter circuitry, we normalized the input data to the interval [0, 100] by multiplying the ranges by the factor f_r = 1/3700 m^{-1} and the bearings by the factor f_θ = 100/2π rad^{-1}. Accordingly, we multiplied the range-related variances by the factor f_r^2 and the bearing-related variances by the factor f_θ^2.
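The numbers quoted above follow directly from the sensor characteristics; the short calculation below (ours) reproduces them together with the normalisation factors.

```python
import math

dt, r_max = 4.0, 370e3          # scan time [s] and maximum range [m]
r_t, a_max = 100e3, 2 * 9.8     # typical range [m] and maximum acceleration [m/s^2]

# measurement variances of R, Eq. (12)
sigma_r2 = 160.0 ** 2           # = 2.56e4 m^2
sigma_theta2 = 1.75e-3 ** 2     # ~ 3.06e-6 rad^2

# process-noise variances of Q, Eq. (11)
a2 = a_max ** 2 / 3             # ~ 128 m^2 s^-4
alpha2 = (a_max / r_t) ** 2 / 3 # ~ 1.28e-8 rad^2 s^-4

# normalisation factors mapping ranges and bearings to [0, 100]
f_r = 1 / 3700.0                # m^-1  (370 km -> 100)
f_theta = 100 / (2 * math.pi)   # rad^-1 (2*pi -> 100)

# range-related variances scale with f_r^2, bearing-related ones with f_theta^2
sigma_r2_n, a2_n = sigma_r2 * f_r ** 2, a2 * f_r ** 2
sigma_theta2_n, alpha2_n = sigma_theta2 * f_theta ** 2, alpha2 * f_theta ** 2
```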


5.2 Parameters of the Kalman Filter Circuitry

With the above settings, none of the calculated values in the Kalman filter circuitry exceeded 128, which suggests a fixed-point representation of the operands with b − f = 7 integer bits. The variances were additionally multiplied by 2^13, which kept the calculated values below 128 but importantly reduced the number of fractional bits needed in the fixed-point representation. This operation scaled up the predicted and updated estimate covariances in (4) and (7), but had no effect on any other quantity of the Kalman filter.

5.2.1 Student's T-test

We set the parameters of the Kalman filter circuitry in such a way that the updated measurement values obtained with the fixed-point processing, \hat{z}_k^{fixed}, were similar to the updated measurement values obtained with the double-precision floating-point processing, \hat{z}_k^{float}. We compared the distributions of the Euclidean distances between the measured and updated values of both models, d_k^{fixed} = \|z_k − \hat{z}_k^{fixed}\| and d_k^{float} = \|z_k − \hat{z}_k^{float}\|, in terms of Student's T-test. The T-statistic tests whether the means of the distributions obtained with the fixed-point and floating-point models are different and is calculated as

T = \frac{\bar{d}^{fixed} − \bar{d}^{float}}{\sqrt{(v^{fixed} + v^{float})/K}},

where \bar{d}^{fixed} and \bar{d}^{float} are the corresponding sample means, v^{fixed} and v^{float} are the corresponding sample variances, and K is the number of samples. The null hypothesis states that the means are equal; it can be rejected at the 90% confidence level if |T| > 1.645.

Figure 6 shows the Student's T-statistics of the Kalman filter circuitry using an exact multiplier (EM) and of the Kalman filter circuitry using an iterative logarithmic multiplier. As the number of magnitude bits b used in the operand representation increases, the output of the fixed-point Kalman filter circuitry becomes more similar to the floating-point ideal, which is reflected in the T-statistics approaching zero. However, at some point the convergence stops due to the error in the multiplication. Each additional ILM pass p importantly improves the convergence. The design of the approximate reciprocal unit also affects the convergence. The convergence improves with an increasing number of reciprocal iterations q and with an enlarged initial approximation table, but is limited by the accuracy of the approximate multiplier. A similar response can be observed for any other data set. To satisfy the above statistical criterion on all the data sets, the Kalman filter circuitry needs at least p = 2 ILM passes without the iteration part (q = 0), b = 19 magnitude bits and a reciprocal unit with a small initial approximation table (s = 2).

5.2.2 Maximum Error Between the Updated Measurement Values

To further analyze the configuration of the Kalman filter circuitry, we imposed additional requirements in terms of the maximum error between the updated measurement values,

e_max = \max_k \|\hat{z}_k^{fixed} − \hat{z}_k^{float}\|.    (17)

For the smallest circuitry that complies with the Student's T-statistics criterion, the maximum error is less than 60 m, which is well below the radar's accuracy.
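A compact way to compute both comparison criteria for one data set is sketched below (ours); z, z_fixed and z_float are K × 2 arrays of measured and updated measurement values, and the use of the unbiased sample variance is our assumption.

```python
import numpy as np

def compare_models(z, z_fixed, z_float):
    """Student's T statistic and maximum error e_max, Eq. (17), between the
    fixed-point and floating-point updated measurement values."""
    d_fixed = np.linalg.norm(z - z_fixed, axis=1)     # distances d_k^fixed
    d_float = np.linalg.norm(z - z_float, axis=1)     # distances d_k^float
    K = len(z)
    T = (d_fixed.mean() - d_float.mean()) / np.sqrt(
        (d_fixed.var(ddof=1) + d_float.var(ddof=1)) / K)
    e_max = np.max(np.linalg.norm(z_fixed - z_float, axis=1))
    reject = abs(T) > 1.645        # null hypothesis rejected at the 90% level
    return T, e_max, reject
```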


Fig. 6 Student's T-statistics on data set 4 with respect to: a different numbers of magnitude bits b (at q = 2, s = 8), and b–d different sizes s of the initial approximation table for q = 0, q = 1 and q = 2 reciprocal iterations (at b = 24). In all plots, two variants of the ILM unit (p = 1 and p = 2) are compared to the exact multiplier (EM). Dotted lines denote the 90% confidence level

Table 1 contains the optimal values of the Kalman filter circuitry parameters that keep the error e_max below the given value on all data sets, for e_max below 60 m as well as below 30 m and 15 m. For reference, the circuit configuration that satisfies the T-statistics criterion is marked with an asterisk. Table 1 shows that increased demands on accuracy are reflected in a more complex Kalman filter circuitry. Implementations of the Kalman filter circuitry with p = 1 ILM pass, regardless of the number of reciprocal iterations, fail to go below e_max = 60 m. When we halve the allowed maximum error e_max to 30 m, the implementations need larger initial approximation tables. Besides, the implementations with less accurate multiplication require additional magnitude bits. To keep the error e_max below 15 m, all the implementations that use approximate arithmetic require additional magnitude bits. The implementation with p = 2 ILM passes that estimates the reciprocals solely from the initial approximation table (q = 0) fails to meet this requirement. However, with at least q = 1 reciprocal iteration the performance improves.

Table 1 Analysis of the Kalman filter circuitry parameters

           b/s for e_max
p/q      < 60 m     < 30 m     < 15 m
E/2      19/0       19/1       20/0
1/q      -          -          -
2/0      19/2*      20/7       -
2/1      19/0       20/5       22/7
2/2      19/0       20/0       22/1

Parameter p denotes the number of passes through the ILM unit, q is the number of reciprocal unit iterations, b is the number of bits for the operand magnitude, and s is the size of the initial approximation table. The parameter p equal to E stands for the exact multiplier; a dash marks configurations that do not reach the required accuracy. The entry marked with an asterisk is the reference configuration that satisfies the T-statistics criterion.

5.3 Power Dissipation and Device Utilization

We designed the Kalman filter circuitry in the VHDL language with the Xilinx ISE Design Suite 14.3. We synthesized the Kalman filter circuitry for the Kintex-7 XC7K160T field-programmable gate array device and analyzed the power dissipation with the Xilinx XPower Analyzer 14.3. Field-programmable gate array (FPGA) devices contain programmable logic components called logic cells and a hierarchy of reconfigurable interconnects that allow the cells to be connected. A cell consists of a 4-input look-up table (LUT), which can implement any 4-input Boolean function, and a D-type flip-flop. In such a way, the cells can be interconnected and configured to perform complex digital functions. In order to evaluate the power dissipation, we tested the Kalman filter circuitry at a 25 MHz frequency with a signal rate of 12.5% and an output load of 5 pF.

Figure 7 presents the power dissipation, the critical path (the minimal required clock period) and the device utilization for various implementations of the Kalman filter circuitry. The application of approximate arithmetic units reduces the power dissipation of the Kalman filter circuitry. At b = 18 bits, the implementations with approximate arithmetic dissipate two-thirds of the power of the implementations with exact arithmetic, while at b = 22 bits, the introduction of approximate arithmetic reduces the power dissipation to only around 55% of the exact arithmetic. It is also noticeable that the power dissipation of the implementations that use approximate arithmetic increases at a much slower rate. The critical path increases linearly with the number of magnitude bits; the slope is smaller when approximate arithmetic is used. We can also observe that, with an increasing number of magnitude bits, the size of the Kalman filter circuitry, given as the number of exploited look-up tables (LUTs), increases linearly. Again, the slope is smaller when approximate arithmetic is used. By using approximate arithmetic, we can reduce the usage of hardware resources to two-thirds of that of the Kalman filter circuitry with exact arithmetic. The multipliers prevail in our design, which diminishes the importance of the size of the initial approximation table used for the reciprocal units.


Fig. 7 Impact of the approximate circuits on a power dissipation, b speed and c device utilization of the Kalman filter circuitry, as a function of the number of bits b, for the E/2 and 2/2 configurations

5.4 Energy Consumption

To select the best approximate design for the application's needs, we considered the energy consumption as well as the accuracy. We estimated the energy consumption of an implementation as the product of its power dissipation, the number of required clock cycles and the clock period, and its accuracy in terms of the maximum error e_max (17) observed over all eight data sets. The Kalman filter circuitry was running at a 25 MHz frequency.

Figure 8 shows the behavior of the Kalman filter circuitry implementations given in Table 1 in terms of energy consumption and accuracy. Figure 8a depicts the correlation between the accuracy and the energy consumption, while Fig. 8b depicts the correlation between the accuracy and the device utilization (number of LUTs). For example, the minimum circuit with an acceptable error below 60 m requires at least p = 2 ILM passes without the iteration part (q = 0), b = 19 magnitude bits, and the reciprocal


Fig. 8 Correlation between a accuracy and energy consumption, and b accuracy and size (in thousands of LUTs) of the Kalman filter circuitry implementations specified in Table 1; points are labelled with the circuitry parameters p/q/b/s

unit with a small initial approximation table (s = 2), as we already found at the end of Sect. 5.2.1. If we want the accuracy below 30 m, then the Kalman filter should be implemented with the ILM with at least p = 2 passes. The minimum such circuit in terms of energy consumption and device utilization uses the reciprocal unit with zero iterations (q = 0), operands represented with b = 20 bits, and an initial approximation table of size s = 7. Furthermore, if we wish the error to be below 15 m, then we


should use p = 2 ILM passes, a reciprocal unit with at least q = 1 iteration, b = 22 bits for the magnitude, and a rather large initial approximation table (s = 7). The measurements in Fig. 8a, b are distributed around the diagonals, which indicates rather strong correlations between the observed quantities. Generally, as the required accuracy increases, the Kalman filter circuitry becomes larger or needs more iterations and thus consumes more energy. The approximate units dissipate less power than the exact ones, but they usually need more clock cycles for the calculation. Nevertheless, the implementations with p = 2 ILM passes are smaller and more energy efficient than the implementations with exact arithmetic. For them, the energy consumption per Kalman filter iteration reduces to 70–80% of that with exact arithmetic. The largest savings are shown by the implementations in which the reciprocal is estimated solely from the initial approximation table. However, the invocation of the iteration part might refine the results at the expense of a slightly higher energy consumption.

6 Conclusion

Object-tracking systems appear in many applications, ranging from mobile devices to complex track-while-scan radar systems. Whereas the focus of the former is on low power consumption, the latter demand high computational throughput. In the present work, these problems are addressed by using logarithmic approximations in the arithmetic units.

The proposed object-tracking system uses the Kalman filter as a case study. The performance of the proposed object-tracking system is compared to an equivalent system that uses exact arithmetic on real-world radar data sets. The consumption of fewer resources per approximate arithmetic unit results in more resource- and energy-efficient designs, with up to 30% savings in size and power dissipation, while still attaining the required accuracy. The power analysis proves that the use of approximate arithmetic in the logarithmic number system has a notable impact on the power dissipation. Moreover, the approximate arithmetic units reduce the critical path by up to 40%. The results confirm that, for our application, the adaptive nature of the Kalman filter compensates for the erroneous calculations introduced by the approximate arithmetic.

The approximate arithmetic in adaptive systems leads to a slimmer, low-power design of the circuits, which can considerably increase the throughput of the whole adaptive system. We will therefore continue our research in two directions: (1) the study of more efficient circuits for approximate arithmetic operations and (2) the study of the impact of these circuits on more complex and computationally demanding adaptive systems, such as deep neural networks.

Acknowledgements This research was supported by the Slovenian Research Agency under Grants P2-0359 (National research program Pervasive computing) and P2-0241 (National research program Synergetics of complex systems and processes), and by the Slovenian Research Agency and the Ministry of Civil Affairs, Bosnia and Herzegovina, under Grant BI-BA/16-17-029.


References 1. K. Abed, R. Siferd, CMOS VLSI implementation of a low-power logarithmic converter. IEEE Trans. Comput. 52(11), 1421–1433 (2003). doi:10.1109/TC.2003.1244940 2. K. Abed, R. Siferd, VLSI implementation of a low-power antilogarithmic converter. IEEE Trans. Comput. 52(9), 1221–1228 (2003). doi:10.1109/TC.2003.1228517 3. F. Auger, Z. Lou, B. Feuvrie, F. Li, Multiplier-free divide, square root, and log algorithms (DSP tips and tricks). Sig. Process. Mag. IEEE 28(4), 122–126 (2011). doi:10.1109/MSP.2011.941101 4. Z. Babi´c, A. Avramovi´c, P. Buli´c, An iterative logarithmic multiplier. Microprocess. Microsyst. 35(1), 23–33 (2011). doi:10.1016/j.micpro.2010.07.001 5. I. Baturone, F.J. Moreno-Velo, S. Sanchez-Solano, A. Ollero, Automatic design of fuzzy controllers for car-like autonomous robots. IEEE Trans. Fuzzy Syst. 12(4), 447–465 (2004). doi:10.1109/TFUZZ. 2004.832532 6. D. Chen, B. Zhou, Z. Guo, P. Nilsson, Design and implementation of reciprocal unit, in Circuits and Systems, 2005. 48th Midwest Symposium on, vol. 2 (2005), pp. 1318–1321. doi:10.1109/MWSCAS. 2005.1594352 7. S. Dixit, D. Nagaria, Design and analysis of cascaded lms adaptive filters for noise cancellation. Circuits Syst. Signal Process. (2016). doi:10.1007/s00034-016-0332-5 8. J.P. Dubois, J.S. Daba, M. Naderm, E.C. Ferkh, GSM position tracking using a Kalman filter. World Acad. Sci. Eng. Technol. 68, 1610–1619 (2012) 9. Y. Gao, Y.J. Liu, Adaptive fuzzy optimal control using direct heuristic dynamic programming for chaotic discrete-time system. J. Vib. Control 22(2), 595–603 (2016). doi:10.1177/1077546314534286 10. M.S. Grewal, A.P. Andrews, Kalman Filtering: Theory and Practice Using MATLAB, 3rd edn. (Wiley, Hoboken, 2011) 11. X.V. Ha, C. Ha, J. Lee, Trajectory estimation of a tracked mobile robot using the sigma-point Kalman filter with an IMU and optical encoder, in ICIC (1) Lecture Notes in Computer Science, vol. 7399, ed. by D.S. Huang, C. Jiang, V. Bevilacqua, J.C.F. García (Springer, Berlin, 2012), pp. 415–422 12. A. Habegger, A. Stahel, J. Goette, M. Jacomet, An efficient hardware implementation for a reciprocal unit. In: Fifth IEEE International Symposium on Electronic Design, Test and Application, 2010. DELTA ’10 (2010), pp. 183–187. doi:10.1109/DELTA.2010.65 13. E. Hall, D. Lynch, S. Dwyer, Generation of products and quotients using approximate binary logarithms for digital filtering applications. IEEE Trans. Comput. C–19(2), 97–105 (1970). doi:10.1109/T-C.1970. 222874 14. S. Haykin, Kalman Filtering and Neural Networks, 2nd edn. (Wiley, Hoboken, 2001) 15. W. He, W. Ge, Y. Li, Y.J. Liu, C. Yang, C. Sun, Model identification and control design for a humanoid robot. IEEE Trans. Syst. Man Cybern. Syst. PP(99), 1–13 (2016). doi:10.1109/TSMC.2016.2557227 16. S.F. Hsiao, C.S. Wen, M.Y. Tsai, Low-cost design of reciprocal function units using shared multipliers and adders for polynomial approximation and Newton Raphson interpolation. in 2010 International Symposium on Next-Generation Electronics (ISNE) (2010) pp. 40–43. doi:10.1109/ISNE.2010. 5669204 17. S. Khan, I. Naseem, R. Togneri, M. Bennamoun, A novel adaptive kernel for the rbf neural networks. Circuits Syst. Signal Process. (2016). doi:10.1007/s00034-016-0375-7 18. A.K. Kohli, D.S. Kapoor, Adaptive filtering techniques using cyclic prefix in OFDM systems for multipath fading channel prediction. Circuits Syst. Signal Process. 35(10), 3595–3618 (2016). doi:10. 1007/s00034-015-0214-2 19. U. Kucukkabak, A. 
Akkas, Design and implementation of reciprocal unit using table look-up and Newton–Raphson iteration. in Digital System Design, 2004. DSD 2004. Euromicro Symposium on (2004), pp. 249–253. doi:10.1109/DSD.2004.1333284 20. G. Lai, Z. Liu, Y. Zhang, C.L.P. Chen, S. Xie, Y.J. Liu, Fuzzy adaptive inverse compensation method to tracking control of uncertain nonlinear systems with generalized actuator dead zone. IEEE Trans. Fuzzy Syst. PP(99), 1–1 (2016). doi:10.1109/TFUZZ.2016.2554152 21. M. Lastras, B. Parhami, A logarithmic approach to energy-efficient GPU arithmetic for mobile devices. in Signals, Systems and Computers (ASILOMAR), 2013 Conference Record of the Forty Seventh Asilomar Conference on (2013) pp. 1–4 22. D. Lee, A. Gaffar, R. Cheung, O. Mencer, W. Luk, G. Constantinides, Accuracy-guaranteed bit-width optimization. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 25(10), 1990–2000 (2006). doi:10. 1109/TCAD.2006.873887

23. T.H.S. Li, C.Y. Chen, K.C. Lim, Combination of fuzzy logic control and back propagation neural networks for the autonomous driving control of car-like mobile robot systems. in SICE Annual Conference 2010, Proceedings of (2010), pp. 2071–2076 24. Y. Liu, S. Tong, Barrier Lyapunov functions-based adaptive control for a class of nonlinear pure-feedback systems with full state constraints. Automatica 64, 70–75 (2016). doi:10.1016/j.automatica.2015.10.034 25. Y.J. Liu, Y. Gao, S. Tong, Y. Li, Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with dead-zone. IEEE Trans. Fuzzy Syst. 24(1), 16–28 (2016). doi:10.1109/TFUZZ.2015.2418000 26. Y.J. Liu, S. Tong, Adaptive fuzzy control for a class of nonlinear discrete-time systems with backlash. IEEE Trans. Fuzzy Syst. 22(5), 1359–1365 (2014). doi:10.1109/TFUZZ.2013.2286837 27. Y.J. Liu, S. Tong, Adaptive fuzzy identification and control for a class of nonlinear pure-feedback MIMO systems with unknown dead zones. IEEE Trans. Fuzzy Syst. 23(5), 1387–1398 (2015). doi:10.1109/TFUZZ.2014.2360954 28. U. Lotrič, P. Bulić, Applicability of approximate multipliers in hardware neural networks. Neurocomputing 96, 57–65 (2012). doi:10.1016/j.neucom.2011.09.039 29. V. Mahalingam, N. Ranganathan, Improving accuracy in Mitchell's logarithmic multiplication using operand decomposition. IEEE Trans. Comput. 55(12), 1523–1535 (2006). doi:10.1109/TC.2006.198 30. J.N. Mitchell, Computer multiplication and division using binary logarithms. IRE Trans. Electron. Comput. EC–11(4), 512–517 (1962). doi:10.1109/TEC.1962.5219391 31. M. Monajati, S.M. Fakhraie, E. Kabir, Approximate arithmetic for low-power image median filtering. Circuits Syst. Signal Process. 34(10), 3191–3219 (2015). doi:10.1007/s00034-015-9997-4 32. E. Monmasson, L. Idkhajine, M. Cirstea, I. Bahri, A. Tisan, Mw Naouar, FPGAs in industrial control applications. IEEE Trans. Industr. Inf. 7(2), 224–243 (2011). doi:10.1109/TII.2011.2123908 33. L. Murali, D. Chitra, T. Manigandan, B. Sharanya, An efficient adaptive filter architecture for improving the seizure detection in EEG signal. Circuits Syst. Signal Process. 35(8), 2914–2931 (2016). doi:10.1007/s00034-015-0178-2 34. J.E. Naranjo, C. Gonzalez, R. Garcia, T. de Pedro, Lane-change fuzzy control in autonomous vehicles for the overtaking maneuver. IEEE Trans. Intell. Transp. Syst. 9(3), 438–450 (2008). doi:10.1109/TITS.2008.922880 35. V. Paliouras, T. Stouraitis, Low-power properties of the logarithmic number system, in Computer Arithmetic, 2001. Proceedings. 15th IEEE Symposium on (2001) pp. 229–236. doi:10.1109/ARITH.2001.930124 36. V. Paliouras, T. Stouraitis, Signal activity and power consumption reduction using the logarithmic number system, in Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Symposium on, vol. 2 (2001), pp. 653–656. doi:10.1109/ISCAS.2001.921155 37. B. Robinson, D. Hernandez-Garduno, M. Saquib, Fixed and floating-point implementations of linear adaptive techniques for predicting physiological hand tremor in microsurgery. IEEE J. Sel. Topics Signal Process. 4(3), 659–667 (2010). doi:10.1109/JSTSP.2010.2048240 38. O. Rosen, A. Medvedev, T. Wigren, Parallelization of the Kalman filter on multicore computational platforms. Control Eng. Pract. 21(9), 1188–1194 (2013). doi:10.1016/j.conengprac.2013.03.008 39. I.W. Sandberg, Robust Kalman filter design for discrete time-delay systems. Circuits Syst. Signal Process.
21(3), 337–343 (2002) 40. Z. Shen, Y. Yu, T. Huang, Normalized subband adaptive filter algorithm with combined step size for acoustic echo cancellation. Circuits Syst. Signal Process. (2016). doi:10.1007/s00034-016-0429-x 41. S. So, A.E.W. George, R. Ghosh, K.K. Paliwal, Kalman filter with sensitivity tuning for improved noise reduction in speech. Circuits Syst. Signal Process. (2016). doi:10.1007/s00034-016-0363-y 42. T. Stouraitis, V. Paliouras, Considering the alternatives in low-power design. IEEE Circuits Devices Mag. 17(4), 22–29 (2001). doi:10.1109/101.950050 43. L. Tamas, G. Lazea, R. Robotin, C. Marcu, S. Herle, Z. Szekely, State estimation based on Kalman filtering techniques in navigation, in Automation, Quality and Testing, Robotics, 2008. AQTR 2008. IEEE International Conference on, vol. 2 (2008), pp. 147–152. doi:10.1109/AQTR.2008.4588811 44. A.A. Vidal, V.G. Tavares, J.C. Príncipe, An adaptive signal processing framework for PV power maximization. Circuits Syst. Signal Process. 34(9), 2973–2992 (2015). doi:10.1007/s00034-015-9972-0 45. P. Vouzis, M. Kothare, L. Bleris, M. Arnold, A system-on-a-chip implementation for embedded realtime model predictive control. IEEE Trans. Control Syst. Technol. 17(5), 1006–1017 (2009). doi:10. 1109/TCST.2008.2004503

46. C. Wu, X. Wang, Y. Guo, Q. Fu, Y. Yan, Robust uncertainty control of the simplified Kalman filter for acoustic echo cancelation. Circuits Syst. Signal Process. 35(12), 4584–4595 (2016). doi:10.1007/s00034-016-0263-1 47. S.W. Xu, P.L. Shui, X.Y. Yan, Y.H. Cao, Combined adaptive normalized matched filter detection of moving target in sea clutter. Circuits Syst. Signal Process. (2016). doi:10.1007/s00034-016-0413-5 48. L. Yu, S. Fei, L. Sun, J. Huang, Y. Zhu, Fuzzy approximation of a novel nonlinear adaptive switching controller design. Circuits Syst. Signal Process. 34(2), 377–391 (2015). doi:10.1007/s00034-014-9866-6 49. W.J. Zhang, S.Y. Wang, Y.L. Feng, Novel simplex Kalman filters. Circuits Syst. Signal Process. (2016). doi:10.1007/s00034-016-0323-6 50. X. Zhao, Y. Yin, H. Yang, R. Li, Adaptive control for a class of switched linear systems using state-dependent switching. Circuits Syst. Signal Process. 34(11), 3681–3695 (2015). doi:10.1007/s00034-015-0029-1 51. X. Zhu, Y.C. Soh, L. Xie, Robust Kalman filter design for discrete time-delay systems. Circuits Syst. Signal Process. 21(3), 319–335 (2002) 52. Y. Zou, M. Sheng, N. Zhong, S. Xu, A generalized Kalman filter for 2D discrete systems. Circuits Syst. Signal Process. 23(5), 351–364 (2004)
